Aergia: Exploiting Packet Latency Slack in On-Chip Networks

Reetuparna Das; Onur Mutlu; Thomas Moscibroda; Chita Das

ISCA 2010: 37th International Symposium on Computer Architecture (ISCA), Saint-Malo, France | June 2010

Published by ACM

Selected as a 2011 Top Pick paper by IEEE Micro

Traditional Networks-on-Chip (NoCs) employ simple arbitration strategies, such as round-robin or oldest-first, to decide which packets should be prioritized in the network. This is sub-optimal since different packets can have very different effects on system performance due to, e.g., different levels of memory-level parallelism (MLP) of applications. Certain packets may be performance-critical because they cause the processor to stall, whereas others may be delayed for a number of cycles with no effect on application-level performance as their latencies are hidden by other outstanding packets’ latencies. In this paper, we define slack as a key measure that characterizes the relative importance of a packet. Specifically, the slack of a packet is the number of cycles the packet can be delayed in the network with no effect on execution time. This paper proposes new router prioritization policies that exploit the available slack of interfering packets in order to accelerate performance-critical packets and thus improve overall system performance. When two packets interfere with each other in a router, the packet with the lower slack value is prioritized. We describe mechanisms to estimate slack, prevent starvation, and combine slack-based prioritization with other recently proposed application-aware prioritization mechanisms.
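The prioritization rule described above (the packet with the lower slack value wins when packets interfere in a router) can be illustrated with a minimal Python sketch. Everything beyond that rule is an assumption made for illustration: the `Packet` fields, the `arbitrate` function, the oldest-first tie-break, and the age-based starvation guard with its `starvation_limit` parameter are hypothetical and are not taken from the paper, whose actual slack-estimation and starvation-prevention mechanisms are not detailed in this abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    packet_id: int
    slack: int        # estimated cycles the packet can be delayed without affecting execution time
    injected_at: int  # cycle at which the packet entered the network


def arbitrate(contenders, now, starvation_limit=1000):
    """Choose one packet among those competing for the same router output.

    Rule from the abstract: the packet with the lower slack is prioritized.
    Assumptions (not from the paper): ties are broken oldest-first, and any
    packet that has waited at least `starvation_limit` cycles is served
    before slack is considered, oldest first.
    """
    if not contenders:
        raise ValueError("no packets to arbitrate")

    # Hypothetical starvation guard: long-waiting packets bypass the slack ordering.
    starving = [p for p in contenders if now - p.injected_at >= starvation_limit]
    if starving:
        return min(starving, key=lambda p: (p.injected_at, p.packet_id))

    # Slack-based prioritization: lowest slack first, then oldest, then lowest id.
    return min(contenders, key=lambda p: (p.slack, p.injected_at, p.packet_id))


if __name__ == "__main__":
    latency_tolerant = Packet(packet_id=1, slack=12, injected_at=90)
    critical = Packet(packet_id=2, slack=0, injected_at=95)
    winner = arbitrate([latency_tolerant, critical], now=100)
    print("winner:", winner.packet_id)
```

In this sketch the critical packet wins even though it is younger, which is the behavior that distinguishes slack-based arbitration from oldest-first; the guard only takes over once a packet has waited past the assumed limit.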

We evaluate slack-based prioritization policies on a 64-core CMP with an 8×8 mesh NoC using a suite of 35 diverse applications. For a representative set of case studies, our proposed policy increases average system throughput by 21.0% over the commonly used round-robin policy. Averaged over 56 randomly generated multi-programmed workload mixes, the proposed policy improves system throughput by 10.3%, while also reducing application-level unfairness by 30.8%.