Last update:
Fri Jan 10 10:12:27 MST 2025
C. Alvarez and J. Corbal and E. Salami and M. Valero Initial Results on Fuzzy Floating Point Computation for Multimedia Processors 1--1 A. Gordon-Ross and S. Cotterell and F. Vahid Exploiting Fixed Programs in Embedded Systems: a Loop Cache Example . . . . . 2--2 Jin-Hyuck Choi and Jung-Hoon Lee and Seh-Woong Jeong and Shin-Dug Kim and C. Weems A Low Power TLB Structure for Embedded Systems . . . . . . . . . . . . . . . . 3--3 B. Towles and W. J. Dally Worst-case Traffic for Oblivious Routing Functions . . . . . . . . . . . . . . . 4--4 O. S. Unsal and C. M. Krishna and C. A. Moritz Cool-Fetch: Compiler-Enabled Power-Aware Fetch Throttling . . . . . . . . . . . . 5--5 Li Shang and L. Peh and N. K. Jha Power-efficient Interconnection Networks: Dynamic Voltage Scaling with Links . . . . . . . . . . . . . . . . . 6--6 A. J. KleinOsowski and D. J. Lilja MinneSPEC: a New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research . . . . . . . . . 7--7 H. Vandierendonck and K. De Bosschere An Address Transformation Combining Block- and Word-Interleaving . . . . . . 8--8 S. Tambat and S. Vajapeyam Page-Level Behavior of Cache Contention 9--9 Philo Juang and P. Diodato and S. Kaxiras and K. Skadron and Zhigang Hu and M. Martonosi and D. W. Clark Implementing Decay Techniques using 4T Quasi-Static Memory Cells . . . . . . . 10--10 YoungChul Sohn and NaiHoon Jung and Seungryoul Maeng Request Reordering to Enhance the Performance of Strict Consistency Models 11--11 K. A. Shaw and W. J. Dally Migration in Single Chip Multiprocessors 12--12
K.-H. Sihn and Joonwon Lee and Jung-Wan Cho A Speculative Coherence Scheme using Decoupling Synchronization for Multiprocessor Systems . . . . . . . . . 1--1 R. Kumar and K. Farkas and N. P. Jouppi and P. Ranganathan and D. M. Tullsen Processor Power Reduction Via Single-ISA Heterogeneous Multi-Core Architectures 2--2 R. Sendag and Peng-fei Chuang and D. J. Lilja Address Correlation: Exceeding the Limits of Locality . . . . . . . . . . . 3--3 A. Milenkovic and M. Milenkovic Stream-Based Trace Compression . . . . . 4--4 Chuanjun Zhang and F. Vahid and Jun Yang and W. Walid A Way-Halting Cache for Low-Energy High-Performance Systems . . . . . . . . 5--5 A. Cohen and F. Finkelstein and A. Mendelson and R. Ronen and D. Rudoy On Estimating Optimal Performance of CPU Dynamic Thermal Management . . . . . . . 6--6 A. Cristal and J. F. Martinez and J. Llosa and M. Valero A case for resource-conscious out-of-order processors . . . . . . . . 7--7
D. Citron Exploiting Low Entropy to Reduce Wire Delay . . . . . . . . . . . . . . . . . 1--1 A. Singh and W. J. Dally and B. Towles and A. K. Gupta Globally Adaptive Load-Balanced Routing on Tori . . . . . . . . . . . . . . . . 2--2 M. E. Gomez and J. Duato and J. Flich and P. Lopez and A. Robles and N. A. Nordbotten and O. Lysne and T. Skeie An Efficient Fault-Tolerant Routing Methodology for Meshes and Tori . . . . 3--3 J. M. Stine and N. P. Carter and J. Flich Comparing Adaptive Routing and Dynamic Voltage Scaling for Link Power Reduction 4--4 B. Robatmili and N. Yazdani and S. Sardashti and M. Nourani Thread-Sensitive Instruction Issue for SMT Processors . . . . . . . . . . . . . 5--5 Yue Luo and L. K. John Efficiently Evaluating Speedup Using Sampled Processor Simulation . . . . . . 6--6 L. Ceze and K. Strauss and J. Tuck and J. Renau and J. Torrellas CAVA: Hiding L2 Misses with Checkpoint-Assisted Value Prediction . . 7--7 A. Singh and W. J. Dally Buffer and Delay Bounds in High Radix Interconnection Networks . . . . . . . . 8--8 A. L. Holloway and G. S. Sohi Characterization of Problem Stores . . . 9--9
Y. Sazeides and R. Kumar and D. M. Tullsen and T. Constantinou The Danger of Interval-Based Power Efficiency Metrics: When Worst Is Best 1--1 O. Mutlu and Hyesoon Kim and J. Stark and Y. N. Patt On Reusing the Results of Pre-Executed Instructions in a Runahead Execution Processor . . . . . . . . . . . . . . . 2--2
Chuanjun Zhang Balanced instruction cache: reducing conflict misses of direct-mapped caches through balanced subarray accesses . . . 2--5 G. Ottoni and R. Rangan and A. Stoler and M. J. Bridges and D. I. August From sequential programs to concurrent threads . . . . . . . . . . . . . . . . 6--9 A. K. Gupta and W. J. Dally Topology optimization of interconnection networks . . . . . . . . . . . . . . . . 10--13 J.-L. Gaudiot and Y. Patt and K. Skadron Foreword . . . . . . . . . . . . . . . . 11--11 T. Y. Morad and U. C. Weiser and A. Kolodny and M. Valero and E. Ayguade Performance, power efficiency and scalability of asymmetric cluster chip multiprocessors . . . . . . . . . . . . 14--17 N. Riley and C. Zilles Probabilistic counter updates for predictor hysteresis and bias . . . . . 18--21 Huiyang Zhou A case for fault tolerance and performance enhancement using chip multi-processors . . . . . . . . . . . . 22--25 Moon-Sang Lee and Sang-Kwon Lee and Joonwon Lee and Seung-Ryoul Maeng Adopting system call based address translation into user-level communication . . . . . . . . . . . . . 26--29 Jung Ho Ahn and W. J. Dally Data parallel address architecture . . . 30--33 N. Eisley and Li-Shiuan Peh and Li Shang In-network cache coherence . . . . . . . 34--37 R. Srinivasan and J. Cook and O. Lubeck Performance modeling using Monte Carlo simulation . . . . . . . . . . . . . . . 38--41
O. Ergin and O. Unsal and X. Vera and A. Gonzalez Exploiting Narrow Values for Soft Error Tolerance . . . . . . . . . . . . . . . 12--12 W. Li and S. Mohanty and K. Kavi A Page-based Hybrid (Software--Hardware) Dynamic Memory Allocator . . . . . . . . 13--13 J. Donald and M. Martonosi An Efficient, Practical Parallelization Methodology for Multicore Architecture Simulation . . . . . . . . . . . . . . . 14--14 A. Bracy and K. Doshi and Q. Jacobson Disintermediated Active Communication 15--15 A. Mallik and B. Lin and G. Memik and P. Dinda and R. P. Dick User-Driven Frequency Scaling . . . . . 16--16 C. Blundell and E. C. Lewis and M. M. K. Martin Subtleties of transactional memory atomicity semantics . . . . . . . . . . 17--17 G. Price and M. Vachharajani A Case for Compressing Traces with BDDs 18--18
M. Moreto Planas and F. Cazorla and A. Ramirez and M. Valero Explaining Dynamic Cache Partitioning Speed Ups . . . . . . . . . . . . . . . 1--4 N. Enright Jerger and M. Lipasti and L. Peh Circuit-Switched Coherence . . . . . . . 5--8 S. Kodakara and J. Kim and D. Lilja and D. Hawkins and W. Hsu and P. Yew CIM: a Reliable Metric for Evaluating Program Phase Classifications . . . . . 9--12 W. R. Dieter and A. Kaveti and H. G. Dietz Low-Cost Microarchitectural Support for Improved Floating-Point Accuracy . . . . 13--16 Y. Etsion and D. G. Feitelson Probabilistic Prediction of Temporal Locality . . . . . . . . . . . . . . . . 17--20 Z. Guz and I. Keidar and A. Kolodny and U. Weiser Nahalal: Cache Organization for Chip Multiprocessors . . . . . . . . . . . . 21--24
J. A. Joao and O. Mutlu and H. Kim and Y. N. Patt Dynamic Predication of Indirect Jumps 25--28 A. Das and S. Ozdemir and G. Memik and J. Zambreno and A. Choudhary Microarchitectures for Managing Chip Revenues under Process Variations . . . 29--32 J. Zebchuk and A. Moshovos A Building Block for Coarse-Grain Optimizations in the On-Chip Memory Hierarchy . . . . . . . . . . . . . . . 33--36 J. Kim and J. Balfour and W. J. Dally Flattened Butterfly Topology for On-Chip Networks . . . . . . . . . . . . . . . . 37--40 X. Xiao and J. Lee A Novel Parallel Deadlock Detection Algorithm and Hardware for Multiprocessor System-on-a-Chip . . . . 41--44 D. August and J. Chang and S. Girbal and D. Gracia-Perez and G. Mouchard and D. A. Penry and O. Temam and N. Vachharajani UNISIM: an Open Simulation Environment and Library for Complex Architecture Design and Collaborative Development . . 45--48 R. Sendag and J. Yi and P. Chuang Branch Misprediction Prediction: Complementary Branch Predictors . . . . 49--52 G. Yalcin and O. Ergin Using tag-match comparators for detecting soft errors . . . . . . . . . 53--56
J. A. Joao and O. Mutlu and H. Kim and Y. N. Patt Dynamic Predication of Indirect Jumps 1--4 A. Das and S. Ozdemir and G. Memik and J. Zambreno and A. Choudhary Microarchitectures for Managing Chip Revenues under Process Variations . . . 5--8 A. Roth Physical register reference counting . . 9--12 J. Flich and J. Duato Logic-Based Distributed Routing for NoCs 13--16 J. H. Yoon and E. H. Nam and Y. J. Seong and H. Kim and B. Kim and S. L. Min and Y. Cho Chameleon: a High Performance Flash/FRAM Hybrid Solid State Disk Architecture . . 17--20 A. Biswas and P. Racunas and J. Emer and S. Mukherjee Computing Accurate AVFs using ACE Analysis on Performance Models: a Rebuttal . . . . . . . . . . . . . . . . 21--24 S. Cho and R. Melhem Corollaries to Amdahl's Law for Energy 25--28 J. Balfour and W. Dally and D. Black-Schaffer and V. Parikh and J. Park An Energy-Efficient Processor Architecture for Embedded Systems . . . 29--32
Anonymous [Front cover] . . . . . . . . . . . . . c1--c1 Anonymous Editorial Board [Cover2] . . . . . . . . c2--c2 D. Pao and W. Lin and B. Liu Pipelined Architecture for Multi-String Matching . . . . . . . . . . . . . . . . 33--36 R. Sunkam Ramanujam and B. Lin Randomized Partially-Minimal Routing on Three-Dimensional Mesh Networks . . . . 37--40 D. Black-Schaffer and J. Balfour and W. Dally and V. Parikh and J. Park Hierarchical Instruction Register Organization . . . . . . . . . . . . . . 41--44 J. Lee and X. Xiao A Parallel Deadlock Detection Algorithm with $ O(1) $ Overall Run-time Complexity . . . . . . . . . . . . . . . 45--48 C. Gomez Requena and F. Gilabert Villamon and M. Gomez and P. Lopez and J. Duato Beyond Fat-tree: Unidirectional Load--Balanced Multistage Interconnection Network . . . . . . . . 49--52 Z. Li and C. Zhu and L. Shang and R. Dick and Y. Sun Transaction-Aware Network-on-Chip Resource Reservation . . . . . . . . . . 53--56 S. Fide and S. Jenks Proactive Use of Shared L3 Caches to Enhance Cache Communications in Multi-Core Processors . . . . . . . . . 57--60 I. Walter and I. Cidon and A. Kolodny BENoC: a Bus-Enhanced Network on-Chip for a Power Efficient CMP . . . . . . . 61--64 A. Golander and S. Weiss and R. Ronen DDMR: Dynamic and Scalable Dual Modular Redundancy with Short Validation Intervals . . . . . . . . . . . . . . . 65--68 Anonymous Information for authors . . . . . . . . c3--c3 Anonymous IEEE Computer Society [Cover 4] . . . . c4--c4
Rohit Sunkam Ramanujam and Bill Lin Weighted Random Routing on Torus Networks . . . . . . . . . . . . . . . . 1--4 Jung Ho Ahn and Jacob Leverich and Robert S. Schreiber and Norman P. Jouppi Multicore DIMM: an Energy Efficient Memory Module with Independently Controlled DRAMs . . . . . . . . . . . . 5--8 Po-Han Wang and Yen-Ming Chen and Chia-Lin Yang and Yu-Jung Cheng A Predictive Shutdown Technique for GPU Shader Processors . . . . . . . . . . . 9--12 Christopher Barnes and Pranav Vaidya and Jaehwan John Lee An XML-Based ADL Framework for Automatic Generation of Multithreaded Computer Architecture Simulators . . . . . . . . 13--16 Carlos Luque and Miquel Moreto and Francisco J. Cazorla and Roberto Gioiosa and Alper Buyuktosunoglu and Mateo Valero CPU Accounting in CMP Processors . . . . 17--20 Vassos Soteriou and Rohit Sunkam Ramanujam and Bill Lin and Li-Shiuan Peh A High-Throughput Distributed Shared-Buffer NoC Router . . . . . . . . 21--24 Zvika Guz and Evgeny Bolotin and Idit Keidar and Avinoam Kolodny and Avi Mendelson and Uri C. Weiser Many-Core vs. Many-Thread Machines: Stay Away From the Valley . . . . . . . . . . 25--28 Aniruddha Desai and Jugdutt Singh Architecture Independent Characterization of Embedded Java Workloads . . . . . . . . . . . . . . . 29--32 Elisardo Antelo A Comment on ``Beyond Fat-tree: Unidirectional Load-Balanced Multistage Interconnection Network'' . . . . . . . 33--34 Anonymous [Advertisement] . . . . . . . . . . . . 35--35 Anonymous Ad --- IEEE Computer Society Digital Library . . . . . . . . . . . . . . . . 36--36 Anonymous Editorial Board [Cover2] . . . . . . . . c2--c2 Anonymous [Front cover] . . . . . . . . . . . . . c1--c1 Anonymous Information for authors . . . . . . . . c3--c3 Anonymous IEEE Computer Society [Cover4] . . . . . c4--c4
Jean-Luc Gaudiot Introducing the New Editor-in-Chief of IEEE Computer Architecture Letters . . . . . . . . . . . . . . . . 37--38 K. Skadron Letter from the Editor . . . . . . . . . 39--39 Kevin Skadron Untitled . . . . . . . . . . . . . . . . 39--39 Jing Xin and Russ Joseph Exploiting Locality to Improve Circuit-level Timing Speculation . . . . 40--43 Arvind Sudarsanam and Ramachandra Kallam and Aravind Dasu PRR--PRR Dynamic Relocation . . . . . . 44--47 Jacob Leverich and Matteo Monchiero and Vanish Talwar and Partha Ranganathan and Christos Kozyrakis Power Management of Datacenter Workloads Using Per-Core Power Gating . . . . . . 48--51 Enric Musoll A Process-Variation Aware Technique for Tile-Based, Massive Multicore Processors 52--55 Alexandro Baldassin and Felipe Klein and Guido Araujo and Rodolfo Azevedo and Paulo Centoducatte Characterizing the Energy Consumption of Software Transactional Memory . . . . . 56--59 James Balfour and R. Curtis Harting and William J. Dally Operand Registers and Explicit Operand Forwarding . . . . . . . . . . . . . . . 60--63 Derek Chiou and Hari Angepat and Nikhil A. Patil and Dam Sunwoo Accurate Functional-First Multicore Simulators . . . . . . . . . . . . . . . 64--67 Anonymous [Advertisement] . . . . . . . . . . . . 68--68 Anonymous [Advertisement] . . . . . . . . . . . . 69--69 Anonymous [Advertisement] . . . . . . . . . . . . 70--70 Anonymous [Advertisement] . . . . . . . . . . . . 71--71 Anonymous [Advertisement] . . . . . . . . . . . . 72--72 Anonymous Editorial Board [Cover2] . . . . . . . . c2--c2 Anonymous [Front cover] . . . . . . . . . . . . . c1--c1 Anonymous Information for authors . . . . . . . . c3--c3 Anonymous IEEE Computer Society [Cover4] . . . . . c4--c4
Shruti Patil and David J. Lilja Using Resampling Techniques to Compute Confidence Intervals for the Harmonic Mean of Rate-Based Performance Metrics 1--4 Andre Seznec A Phase Change Memory as a Secure Main Memory . . . . . . . . . . . . . . . . . 5--8 Seon-yeong Park and Euiseong Seo and Ji-Yong Shin and Seungryoul Maeng and Joonwon Lee Exploiting Internal Parallelism of Flash-based SSDs . . . . . . . . . . . . 9--12 Hari Subramoni and Fabrizio Petrini and Virat Agarwal and Davide Pasetto Intra-Socket and Inter-Socket Communication in Multi-core Systems . . 13--16 Giang Hoang and Chang Bae and John Lange and Lide Zhang and Peter Dinda and Russ Joseph A Case for Alternative Nested Paging Models for Virtualized Systems . . . . . 17--20 Evgeni Krimer and Robert Pawlowski and Mattan Erez and Patrick Chiang Synctium: a Near-Threshold Stream Processor for Energy-Constrained Parallel Applications . . . . . . . . . 21--24 Andrew Hilton and Amir Roth SMT-Directory: Efficient Load-Load Ordering for SMT . . . . . . . . . . . . 25--28 Mohammad Hammoud and Sangyeun Cho and Rami G. Melhem A Dynamic Pressure-Aware Associative Placement Strategy for Large Scale Chip Multiprocessors . . . . . . . . . . . . 29--32 Hyungjun Kim and Paul V. Gratz Leveraging Unused Cache Block Words to Reduce Power in CMP Interconnect . . . . 33--36 Anonymous Editorial Board [Cover2] . . . . . . . . c2--c2 Anonymous [Front cover] . . . . . . . . . . . . . c1--c1 Anonymous Information for authors . . . . . . . . c3--c3 Anonymous IEEE Computer Society [Cover4] . . . . . c4--c4
K. Skadron Editorial: Letter from the Editor-in-Chief . . . . . . . . . . . . 37--44 Kevin Skadron Untitled . . . . . . . . . . . . . . . . 37--44 Syed Muhammad Zeeshan Iqbal and Yuchen Liang and Hakan Grahn ParMiBench --- an Open-Source Benchmark for Embedded Multiprocessor Systems . . 45--48 Zhen Fang and Erik G. Hallnor and Bin Li and Michael Leddige and Donglai Dai and Seung Eun Lee and Srihari Makineni and Ravi Iyer Boomerang: Reducing Power Consumption of Response Packets in NoCs with Minimal Performance Impact . . . . . . . . . . . 49--52 Michael J. Lyons and Mark Hempstead and Gu-Yeon Wei and David Brooks The Accelerator Store framework for high-performance, low-power accelerator-based systems . . . . . . . 53--56 Ran Manevich and Israel Cidon and Avinoam Kolodny and Isask'har Walter Centralized Adaptive Routing for NoCs 57--60 Meng Zhang and Alvin R. Lebeck and Daniel J. Sorin Fractal Consistency: Architecting the Memory System to Facilitate Verification 61--64 Anonymous Advertisement --- IEEE Transactions on Computers Celebrates 60 Years . . . . . . . . . . . . . . . . . 65--65 Anonymous 2011 IEEE Computer Society Simulator Design Competition . . . . . . . . . . . 66--66 Anonymous Advertisement --- Special Student Offer 67--67 Anonymous Advertisement --- Distinguish Yourself With the CSDP . . . . . . . . . . . . . 68--68 Anonymous Conference Proceedings Services (CPS) [advertisement] . . . . . . . . . . . . 69--69 Anonymous IEEE Computer Society Jobs . . . . . . . 70--70 Anonymous Advertisement --- Stay Connected to the IEEE Computer Society . . . . . . . . . 71--71 Anonymous Advertisement --- Computer Society Digital Library . . . . . . . . . . . . 72--72 Anonymous Editorial Board [Cover2] . . . . . . . . c2--c2 Anonymous [Front cover] . . . . . . . . . . . . . c1--c1 Anonymous Information for authors . . . . . . . . c3--c3 Anonymous IEEE Computer Society [Cover4] . . . . . c4--c4
K. Skadron Editorial: Letter from the Editor-in-Chief . . . . . . . . . . . . 1--3 Kevin Skadron Untitled . . . . . . . . . . . . . . . . 1--3 Hans Vandierendonck and Andre Seznec Fairness Metrics for Multi-Threaded Processors . . . . . . . . . . . . . . . 4--7 Jie Tang and Shaoshan Liu and Zhimin Gu and Chen Liu and Jean-Luc Gaudiot Prefetching in Embedded Mobile Systems Can Be Energy-Efficient . . . . . . . . 8--11 Omer Khan and Mieszko Lis and Yildiz Sinangil and Srinivas Devadas DCC: a Dependable Cache Coherence Multicore Architecture . . . . . . . . . 12--15 Paul Rosenfeld and Elliott Cooper-Balis and Bruce Jacob DRAMSim2: a Cycle Accurate Memory System Simulator . . . . . . . . . . . . . . . 16--19 Chunyang Gou and Georgi N. Gaydadjiev Exploiting SPMD Horizontal Locality . . 20--23 Xiaoqun Wang and Zhenzhou Ji and Chen Fu and Mingzeng Hu GCMS: a Global Contention Management Scheme in Hardware Transactional Memory 24--27 Anonymous 2010 Reviewers List . . . . . . . . . . 28--28 Anonymous 2010 Annual Index . . . . . . . . . . . ?? Anonymous Cover 2 . . . . . . . . . . . . . . . . c2--c2 Anonymous Cover 3 . . . . . . . . . . . . . . . . c3--c3 Anonymous Cover 4 . . . . . . . . . . . . . . . . c4--c4 Anonymous [Front cover] . . . . . . . . . . . . . c1--c1
Jason Mars and Lingjia Tang and Robert Hundt Heterogeneity in ``Homogeneous'' Warehouse-Scale Computers: a Performance Opportunity . . . . . . . . . . . . . . 29--32 George Michelogiannakis and Nan Jiang and Daniel U. Becker and William J. Dally Packet Chaining: Efficient Single-Cycle Allocation for On-Chip Networks . . . . 33--36 Chen-Han Ho and Garret Staus and Aaron Ulmer and Karthikeyan Sankaralingam Exploring the Interaction Between Device Lifetime Reliability and Security Vulnerabilities . . . . . . . . . . . . 37--40 Carles Hernandez and Antoni Roca and Jose Flich and Federico Silla and Jose Duato Fault-Tolerant Vertical Link Design for Effective 3D Stacking . . . . . . . . . 41--44 Inseok Choi and Minshu Zhao and Xu Yang and Donald Yeung Experience with Improving Distributed Shared Cache Performance on Tilera's Tile Processor . . . . . . . . . . . . . 45--48 Pablo Prieto and Valentin Puente and Jose-Angel Gregorio Multilevel Cache Modeling for Chip-Multiprocessor Systems . . . . . . 49--52 Kostas Siozios and Dimitrios Rodopoulos and Dimitrios Soudris On Supporting Rapid Thermal Analysis . . 53--56 Anonymous Cover 3 . . . . . . . . . . . . . . . . c3--c3 Anonymous [Front cover] . . . . . . . . . . . . . c1--c1 Anonymous IEEE Computer Society [society information] . . . . . . . . . . . . . . c4--c4 Anonymous Publication information . . . . . . . . c2--c2
Simha Sethumadhavan and Ryan Roberts and Yannis Tsividis A Case for Hybrid Discrete-Continuous Architectures . . . . . . . . . . . . . 1--4 Ji Kong and Peilin Liu and Yu Zhang Atomic Streaming: a Framework of On-Chip Data Supply System for Task-Parallel MPSoCs . . . . . . . . . . . . . . . . . 5--8 Abhishek Deb and Josep Maria Codina and Antonio Gonzalez A HW/SW Co-designed Programmable Functional Unit . . . . . . . . . . . . 9--12 Roberta Piscitelli and Andy D. Pimentel A High-Level Power Model for MPSoC on FPGA . . . . . . . . . . . . . . . . . . 13--16 Ian Finlayson and Gang-Ryung Uh and David Whalley and Gary Tyson An Overview of Static Pipelining . . . . 17--20 Lisa Wu and Martha A. Kim and Stephen A. Edwards Cache Impacts of Datatype Acceleration 21--24 Anonymous 2011 Reviewers List . . . . . . . . . . 25--26 Anonymous There now is a quick and easy way to find out about our collection of Transactions [Advertisement] 26--26 Anonymous Advertisement --- Conference Publishing Services (CPS) . . . . . . . . . . . . . 28--28 Anonymous 2011 Annual Index . . . . . . . . . . . ?? Anonymous [Cover2] . . . . . . . . . . . . . . . . c2--c2 Anonymous [Cover3] . . . . . . . . . . . . . . . . c3--c3 Anonymous [Front cover and table of contents] . . c1--c1 Anonymous IEEE Computer Society [Back cover] . . . c4--c4
John D. Davis and Suzanne Rivoire and Moises Goldszmidt and Ehsan K. Ardestani Including Variability in Large-Scale Cluster Power Models . . . . . . . . . . 29--32 Nagesh B. Lakshminarayana and Jaekyu Lee and Hyesoon Kim and Jinwoo Shin DRAM Scheduling Policy for GPGPU Architectures Based on a Potential Function . . . . . . . . . . . . . . . . 33--36 Yaohua Wang and Shuming Chen and Kai Zhang and Jianghua Wan and Xiaowen Chen and Hu Chen and Haibo Wang Instruction Shuffle: Achieving MIMD-like Performance on SIMD Architectures . . . 37--40 Reena Panda and Paul V. Gratz and Daniel A. Jiménez B-Fetch: Branch Prediction Directed Prefetching for In-Order Processors . . 41--44 Timothy N. Miller and Renji Thomas and Radu Teodorescu Mitigating the Effects of Process Variation in Ultra-low Voltage Chip Multiprocessors using Dual Supply Voltages and Half-Speed Units . . . . . 45--48 Yong Li and Rami Melhem and Alex K. Jones Leveraging Sharing in Second Level Translation-Lookaside Buffers for Chip Multiprocessors . . . . . . . . . . . . 49--52 Christina Delimitrou and Sriram Sankar and Kushagra Vaid and Christos Kozyrakis Decoupling Datacenter Storage Studies from Access to Large-Scale Applications 53--56 Jie Chen and Guru Venkataramani and Gabriel Parmer The Need for Power Debugging in the Multi-Core Environment . . . . . . . . . 57--60 Justin Meza and Jichuan Chang and HanBin Yoon and Onur Mutlu and Parthasarathy Ranganathan Enabling Efficient and Scalable Hybrid Memories Using Fine-Granularity DRAM Cache Management . . . . . . . . . . . . 61--64 Tsahee Zidenberg and Isaac Keslassy and Uri Weiser MultiAmdahl: How Should I Divide My Heterogeneous Chip? . . . . . . . . . . 65--68 Anonymous [Back cover] . . . . . . . . . . . . . . c4--c4 Anonymous [Back inside cover] . . . . . . . . . . c3--c3 Anonymous [Front inside cover] . . . . . . . . . . c2--c2
Kevin Skadron Introducing the New Editor-in-Chief of the IEEE Computer Architecture Letters . . . . . . . . . . . . . . . . 1--1 Anonymous 2012 Annual Index . . . . . . . . . . . 1--4 Lieven Eeckhout A Message from the New Editor-in-Chief and Introduction of New Associate Editors . . . . . . . . . . . . . . . . 2--2 J. Martinez A Message from the New Editor-in-Chief and Introduction of New Associate Editors . . . . . . . . . . . . . . . . 2--4 Arash Tavakkol and Mohammad Arjomand and Hamid Sarbazi-Azad Network-on-SSD: a Scalable and High-Performance Communication Design Paradigm for SSDs . . . . . . . . . . . 5--8 Guang Sun and Chia-Wei Chang and Bill Lin A New Worst-Case Throughput Bound for Oblivious Routing in Odd Radix Mesh Network . . . . . . . . . . . . . . . . 9--12 I. Burak Karsli and Pedro Reviriego and M. Fatih Balli and Oğuz Ergin and J. A. Maestro Enhanced Duplication: a Technique to Correct Soft Errors in Narrow Values . . 13--16 Michael Lyons and Gu-Yeon Wei and David Brooks Shrink-Fit: a Framework for Flexible Accelerator Sizing . . . . . . . . . . . 17--20 Nam Duong and Alexander V. Veidenbaum Compiler-Assisted, Selective Out-Of-Order Commit . . . . . . . . . . 21--24 Siddharth Nilakantan and Steven Battle and Mark Hempstead Metrics for Early-Stage Modeling of Many-Accelerator Architectures . . . . . 25--28 Christina Delimitrou and Christos Kozyrakis The Netflix Challenge: Datacenter Edition . . . . . . . . . . . . . . . . 29--32 Anonymous 2012 reviewers list . . . . . . . . . . 33--34 Anonymous IEEE Open Access Publishing . . . . . . 35--35 Anonymous IEEE Transactions Newsletter 36--36
J. F. Martinez Editorial . . . . . . . . . . . . . . . 37--38 Xun Jian and John Sartori and Henry Duwe and Rakesh Kumar High Performance, Energy Efficient Chipkill Correct Memory with Multidimensional Parity . . . . . . . . 39--42 Rakan Maddah and Sangyeun Cho and Rami Melhem Data Dependent Sparing to Manage Better-Than-Bad Blocks . . . . . . . . . 43--46 Hanjoon Kim and Yonggon Kim and John Kim Clumsy Flow Control for High-Throughput Bufferless On-Chip Networks . . . . . . 47--50 Yi Kai and Yi Wang and Bin Liu GreenRouter: Reducing Power by Innovating Router's Architecture . . . . 51--54 Yongsoo Joo and Sangsoo Park A Hybrid PRAM and STT--RAM Cache Architecture for Extending the Lifetime of PRAM Caches . . . . . . . . . . . . . 55--58 Emily Blem and Hadi Esmaeilzadeh and Renee St Amant and Karthikeyan Sankaralingam and Doug Burger Multicore Model from Abstract Single Core Inputs . . . . . . . . . . . . . . 59--62 Pierre Michaud Demystifying Multicore Throughput Metrics . . . . . . . . . . . . . . . . 63--66 Priyanka Tembey and Augusto Vega and Alper Buyuktosunoglu and Dilma Da Silva and Pradip Bose SMT Switch: Software Mechanisms for Power Shifting . . . . . . . . . . . . . 67--70 Anonymous IEEE Open Access Publishing . . . . . . 71--71 Anonymous Stay Connected to the IEEE Computer Society . . . . . . . . . . . . . . . . 72--72 Anonymous [Back cover] . . . . . . . . . . . . . . c4--c4 Anonymous [Back inside cover] . . . . . . . . . . c3--c3 Anonymous [Front cover] . . . . . . . . . . . . . c1--c1 Anonymous [Front inside cover] . . . . . . . . . . c2--c2
Angelos Arelakis and Per Stenström A Case for a Value-Aware Cache . . . . . 1--4 Zheng Chen and Huaxi Gu and Yintang Yang and Luying Bai and Hui Li A Power Efficient and Compact Optical Interconnect for Network-on-Chip . . . . 5--8 Emilio G. Cota and Paolo Mantovani and Michele Petracca and Mario R. Casu and Luca P. Carloni Accelerator Memory Reuse in the Dark Silicon Era . . . . . . . . . . . . . . 9--12 Yu-Liang Chou and Shaoshan Liu and Eui-Young Chung and Jean-Luc Gaudiot An Energy and Performance Efficient DVFS Scheme for Irregular Parallel Divide-and-Conquer Algorithms on the Intel SCC . . . . . . . . . . . . . . . 13--16 Nadav Rotem and Yosi Ben Asher Block Unification IF-conversion for High Performance Architectures . . . . . . . 17--20 Aleksandar Ilic and Frederico Pratas and Leonel Sousa Cache-aware Roofline model: Upgrading the loft . . . . . . . . . . . . . . . . 21--24 Rotem Efraim and Ran Ginosar and C. Weiser and Avi Mendelson Energy Aware Race to Halt: a Down to EARtH Approach for Platform Energy Management . . . . . . . . . . . . . . . 25--28 Yaman Çakmakçi and Oğuz Ergin Exploiting Virtual Addressing for Increasing Reliability . . . . . . . . . 29--32 Yuhao Zhu and Aditya Srikanth and Jingwen Leng and Vijay Janapa Reddi Exploiting Webpage Characteristics for Energy-Efficient Mobile Web Browsing . . 33--36 Amir Morad and Tomer Y. Morad and Leonid Yavits and Ran Ginosar and Uri Weiser Generalized MultiAmdahl: Optimization of Heterogeneous Multi-Accelerator SoC . . 37--40 Shahar Kvatinsky and Yuval H. Nacson and Yoav Etsion and Eby G. Friedman and Avinoam Kolodny and Uri C. Weiser Memristor-Based Multithreading . . . . . 41--44 Joseph G. Wingbermuehle and Ron K. Cytron and Roger D. Chamberlain Optimization of Application-Specific Memories . . . . . . . . . . . . . . . . 45--48 Yunlong Xu and Rui Wang and Nilanjan Goswami and Tao Li and Depei Qian Software Transactional Memory for GPU Architectures . . . . . . . . . . . . . 49--52 Keun Sup Shim and Mieszko Lis and Omer Khan and Srinivas Devadas Thread Migration Prediction for Distributed Shared Caches . . . . . . . 53--56 Anonymous Table of Contents . . . . . . . . . . . C1--C4 Anonymous IEEE Transactions on Pattern Analysis and Machine Intelligence Editorial Board . . . . . . . . . . . . C2--C2 Anonymous IEEE Transactions on Pattern Analysis and Machine Intelligence Information for Authors . . . . . . . . C3--C3 Anonymous IEEE Computer Society . . . . . . . . . C4--C4
Maysam Lavasani and Hari Angepat and Derek Chiou An FPGA-based In-Line Accelerator for Memcached . . . . . . . . . . . . . . . 57--60 Xiang Song and Jian Yang and Haibo Chen Architecting Flash-based Solid-State Drive for High-performance I/O Virtualization . . . . . . . . . . . . . 61--64 Carole-Jean Wu Architectural Thermal Energy Harvesting Opportunities for Sustainable Computing 65--68 Leonid Yavits and Amir Morad and Ran Ginosar Cache Hierarchy Optimization . . . . . . 69--72 Sadegh Yazdanshenas and Marzieh Ranjbar Pirbasti and Mahdi Fazeli and Ahmad Patooghy Coding Last Level STT-RAM Cache For High Endurance And Low Power . . . . . . . . 73--76 Jan Kasper Martinsen and Hakan Grahn and Anders Isberg Heuristics for Thread-Level Speculation in Web Applications . . . . . . . . . . 77--80 Vivek S. Nandakumar and Małgorzata Marek-Sadowska On Optimal Kernel Size for Integrated CPU--GPUs --- a Case Study . . . . . . . 81--84 Qixiao Liu and Victor Jimenez and Miquel Moreto and Jaume Abella and Francisco J. Cazorla and Mateo Valero Per-task Energy Accounting in Computing Systems . . . . . . . . . . . . . . . . 85--88 Hamid Mahmoodi and Sridevi Srinivasan Lakshmipuram and Manish Arora and Yashar Asgarieh and Houman Homayoun and Bill Lin and Dean M. Tullsen Resistive Computation: a Critique . . . 89--92 Stijn Eyerman and Lieven Eeckhout Restating the Case for Weighted-IPC Metrics to Evaluate Multiprogram Workload Performance . . . . . . . . . . 93--96 Sonya R. Wolff and Ronald D. Barnes Revisiting Using the Results of Pre-Executed Instructions in Runahead Processors . . . . . . . . . . . . . . . 97--100 Youngsok Kim and Jaewon Lee and Donggyu Kim and Jangwoo Kim ScaleGPU: GPU Architecture for Memory-Unaware GPU Programming . . . . . 101--104 Sriram Sankar and Sudhanva Gurumurthi Soft Failures in Large Datacenters . . . 105--108 Daehoon Kim and Hwanju Kim and Jaehyuk Huh vCache: Providing a Transparent View of the LLC in Virtualized Environments . . 109--112 Anonymous Table of Contents . . . . . . . . . . . C1--C1 Anonymous IEEE Computer Architecture Letters Editorial Board . . . . . . . . C2--C2 Anonymous IEEE Computer Architecture Letters Information for Authors . . . . C3--C3 Anonymous IEEE Computer Society [advertisement] C4--C4
Jianwei Liao and Fengxiang Zhang and Li Li and Guoqiang Xiao Adaptive Wear-Leveling in Flash-Based Memory . . . . . . . . . . . . . . . . . 1--4 Anonymous 2014 Index \booktitleIEEE Computer Architecture Letters Vol. 13 . . . . . . 1--5 Jie Chen and Guru Venkataramani A Hardware-Software Cooperative Approach for Application Energy Profiling . . . . 5--8 Dae-Hyun Kim and Prashant J. Nair and Moinuddin K. Qureshi Architectural Support for Mitigating Row Hammering in DRAM Memories . . . . . . . 9--12 Ralph Nathan and Daniel J. Sorin Argus-G: Comprehensive, Low-Cost Error Detection for GPGPU Cores . . . . . . . 13--16 Seongil O and Sanghyuk Kwon and Young Hoon Son and Yujin Park and Jung Ho Ahn CIDR: a Cache Inspired Area-Efficient DRAM Resilience Architecture against Permanent Faults . . . . . . . . . . . . 17--20 O. Seongil and Sanghyuk Kwon and Young Hoon Son and Yujin Park and Jung Ho Ahn CIDR: a Cache Inspired Area-Efficient DRAM Resilience Architecture against Permanent Faults . . . . . . . . . . . . 17--20 Ujjwal Gupta and Umit Y. Ogras Constrained Energy Optimization in Heterogeneous Platforms Using Generalized Scaling Models . . . . . . . 21--25 Amin Farmahini-Farahani and Jung Ho Ahn and Katherine Morrow and Nam Sung Kim DRAMA: an Architecture for Accelerated Processing Near Memory . . . . . . . . . 26--29 Trevor E. Carlson and Siddharth Nilakantan and Mark Hempstead and Wim Heirman Epoch Profiles: Microarchitecture-Based Application Analysis and Optimization 30--33 Jason Power and Joel Hestness and Marc S. Orr and Mark D. Hill and David A. Wood gem5-gpu: a Heterogeneous CPU--GPU Simulator . . . . . . . . . . . . . . . 34--36 Dilan Manatunga and Joo Hwan Lee and Hyesoon Kim Hardware Support for Safe Execution of Native Client Applications . . . . . . . 37--40 Longjun Liu and Chao Li and Hongbin Sun and Yang Hu and Jingmin Xin and Nanning Zheng and Tao Li Leveraging Heterogeneous Power for Improving Datacenter Efficiency and Resiliency . . . . . . . . . . . . . 
. . 41--45 Rui Wang and Wangyuan Zhang and Tao Li and Depei Qian Leveraging Non-Volatile Storage to Achieve Versatile Cache Optimizations 46--49 Milad Mohammadi and Song Han and Tor M. Aamodt and William J. Dally On-Demand Dynamic Branch Prediction . . 50--53 Leonid Azriel and Avi Mendelson and Uri Weiser Peripheral Memory: a Technique for Fighting Memory Bandwidth Bottleneck . . 54--57 Zhaoguo Wang and Han Yi and Ran Liu and Mingkai Dong and Haibo Chen Persistent Transactional Memory . . . . 58--61 Enric Gibert and Raul Martínez and Carlos Madriles and Josep M. Codina Profiling Support for Runtime Managed Code: Next Generation Performance Monitoring Units . . . . . . . . . . . . 62--65 Daecheol You and Ki-Seok Chung Quality of Service-Aware Dynamic Voltage and Frequency Scaling for Embedded GPUs 66--69 Sungjin Lee and Jihong Kim and Arvind Refactored Design of I/O Architecture for Flash Storage . . . . . . . . . . . 70--74 Fengkai Yuan and Zhenzhou Ji and Suxia Zhu Set-Granular Regional Distributed Cooperative Caching . . . . . . . . . . 75--78 Junghee Lee and Youngjae Kim and Jongman Kim and Galen M. Shipman Synchronous I/O Scheduling of Independent Write Caches for an Array of SSDs . . . . . . . . . . . . . . . . . . 79--82 Anonymous Rock Stars of Wearables . . . . . . . . 83--83 Anonymous Rock Stars of Cybersecurity 2015 Conference . . . . . . . . . . . . . . . 84--84 Anonymous Table of Contents . . . . . . . . . . . C1--C1 Anonymous IEEE Computer Architecture Letters Editorial Board . . . . . . . . C2--C2 Anonymous IEEE Computer Architecture Letters Information for Authors . . . . C3--C3 Anonymous IEEE Computer Society . . . . . . . . . C4--C4
Qingchuan Shi and Henry Hoffmann and Omer Khan A Cross-Layer Multicore Architecture to Tradeoff Program Accuracy and Resilience Overheads . . . . . . . . . . . . . . . 85--89 Zhong Zheng and Zhiying Wang and Mikko Lipasti Adaptive Cache and Concurrency Allocation on GPGPUs . . . . . . . . . . 90--93 Tony Nowatzki and Venkatraman Govindaraju and Karthikeyan Sankaralingam A Graph-Based Program Representation for Analyzing Hardware Specialization Approaches . . . . . . . . . . . . . . . 94--98 Seung Hun Kim and Dohoon Kim and Changmin Lee and Won Seob Jeong and Won Woo Ro and Jean-Luc Gaudiot A Performance-Energy Model to Evaluate Single Thread Execution Acceleration . . 99--102 William Song and Saibal Mukhopadhyay and Sudhakar Yalamanchili Architectural Reliability: Lifetime Reliability Characterization and Management of Many-Core Processors . . . 103--106 Pavan Poluri and Ahmed Louri A Soft Error Tolerant Network-on-Chip Router Pipeline for Multi-Core Systems 107--110 Canwen Xiao and Yue Yang and Jianwen Zhu A Sufficient Condition for Deadlock-Free Adaptive Routing in Mesh Networks . . . 111--114 Sparsh Mittal and Jeffrey S. Vetter AYUSH: a Technique for Extending Lifetime of SRAM--NVM Hybrid Caches . . 115--118 Rajit Manohar Comparing Stochastic and Deterministic Computing . . . . . . . . . . . . . . . 119--122 Bon-Keun Seo and Seungryoul Maeng and Joonwon Lee and Euiseong Seo DRACO: a Deduplicating FTL for Tangible Extra Capacity . . . . . . . . . . . . . 123--126 Vivek Seshadri and Kevin Hsieh and Amirali Boroumand and Donghyuk Lee and Michael A. Kozuch and Onur Mutlu and Phillip B. Gibbons and Todd C. Mowry Fast Bulk Bitwise AND and OR in DRAM . . 127--131 Muhammad Shoaib Bin Altaf and David A. Wood LogCA: a Performance Model for Hardware Accelerators . . . . . . . . . . . . . . 132--135 Dionysios Diamantopoulos and Sotirios Xydis and Kostas Siozios and Dimitrios Soudris Mitigating Memory-Induced Dark Silicon in Many-Accelerator Architectures . . . 
136--139 Matthew Poremba and Tao Zhang and Yuan Xie NVMain 2.0: a User-Friendly Memory Simulator to Model (Non-) Volatile Memory Systems . . . . . . . . . . . . . 140--143 Hans Vandierendonck and Ahmad Hassan and Dimitrios S. Nikolopoulos On the Energy-Efficiency of Byte-Addressable Non-Volatile Memory . . 144--147 Leonid Yavits and Shahar Kvatinsky and Amir Morad and Ran Ginosar Resistive Associative Processor . . . . 148--151 Suk Chan Kang and Chrysostomos Nicopoulos and Ada Gavrilovska and Jongman Kim Subtleties of Run-Time Virtual Address Stacks . . . . . . . . . . . . . . . . . 152--155 Dimitrios Rodopoulos and Francky Catthoor and Dimitrios Soudris Tackling Performance Variability Due to RAS Mechanisms with PID-Controlled DVFS 156--159 Nikola Markovic and Daniel Nemirovsky and Osman Unsal and Mateo Valero and Adrian Cristal Thread Lock Section-Aware Scheduling on Asymmetric Single-ISA Multi-Core . . . . 160--163 Gennady Pekhimenko and Evgeny Bolotin and Mike O'Connor and Onur Mutlu and Todd C. Mowry and Stephen W. Keckler Toggle-Aware Compression for GPUs . . . 164--168 Anonymous Table of Contents . . . . . . . . . . . C1--C1 Anonymous IEEE Computer Architecture Letters Editorial Board . . . . . . . . C2--C2 Anonymous IEEE Computer Architecture Letters Information for Authors . . . . C3--C3 Anonymous IEEE Computer Society . . . . . . . . . C4--C4
Wo-Tak Wu and Ahmed Louri A Methodology for Cognitive NoC Design 1--4 Anonymous 2015 Index IEEE Computer Architecture Letters Vol. 14 . . . . . . 1--6 Seyyed Hossein Seyyedaghaei Rezaei and Abbas Mazloumi and Mehdi Modarressi and Pejman Lotfi-Kamran Dynamic Resource Sharing for High-Performance 3-D Networks-on-Chip 5--8 Miguel Gorgues and Jose Flich End-Point Congestion Filter for Adaptive Routing with Congestion-Insensitive Performance . . . . . . . . . . . . . . 9--12 Biswabandan Panda and Shankar Balachandran Expert Prefetch Prediction: an Expert Predicting the Usefulness of Hardware Prefetchers . . . . . . . . . . . . . . 13--16 Abdulaziz Eker and Oğuz Ergin Exploiting Existing Copies in Register File for Soft Error Correction . . . . . 17--20 Matthew Maycock and Simha Sethumadhavan Hardware Enforced Statistical Privacy 21--24 Dongdong Li and Tor M. Aamodt Inter-Core Locality Aware Memory Scheduling . . . . . . . . . . . . . . . 25--28 Libei Pu and Kshitij Doshi and Ellis Giles and Peter Varman Non-Intrusive Persistence with a Backend NVM Controller . . . . . . . . . . . . . 29--32 P. Garcia and T. Gomes and J. Monteiro and A. Tavares and M. Ekpanyapong On-Chip Message Passing Sub-System for Embedded Inter-Domain Communication . . 33--36 Minghua Li and Guancheng Chen and Qijun Wang and Yonghua Lin and Peter Hofstee and Per Stenstrom and Dian Zhou PATer: a Hardware Prefetching Automatic Tuner on IBM POWER8 Processor . . . . . 37--40 Mohammad Alian and Daehoon Kim and Nam Sung Kim pd-gem5: Simulation Infrastructure for Parallel/Distributed Computer Systems 41--44 Yoongu Kim and Weikun Yang and Onur Mutlu Ramulator: a Fast and Extensible DRAM Simulator . . . . . . . . . . . . . . . 45--49 Lena E. Olson and Simha Sethumadhavan and Mark D. Hill Security Implications of Third-Party Accelerators . . . . . . . . . . . . . . 
50--53 Bruce Jacob The Case for VLIW--CMP as a Building Block for Exascale . . . . . . . . . . . 54--57 Marios Kleanthous and Yiannakis Sazeides and Emre Ozer and Chrysostomos Nicopoulos and Panagiota Nikolaou and Zacharias Hadjilambrou Toward Multi-Layer Holistic Evaluation of System Designs . . . . . . . . . . . 58--61 Bhavya K. Daya and Li-Shiuan Peh and Anantha P. Chandrakasan Towards High-Performance Bufferless NoCs with SCEPTER . . . . . . . . . . . . . . 62--65 Anonymous Introducing IEEE Collabratec . . . . . . 66--66 Anonymous Experience the Newest and Most Advanced Thinking in Big Data Analytics . . . . . 67--67 Anonymous IEEE Cyber Security . . . . . 68--68 Anonymous Table of Contents . . . . . . . . . . . C1--C1 Anonymous Cover . . . . . . . . . . . . . . . . . C2--C2 Anonymous Cover . . . . . . . . . . . . . . . . . C3--C3 Anonymous [Back cover] . . . . . . . . . . . . . . C4--C4
Shuang Liang and Shouyi Yin and Leibo Liu and Yike Guo and Shaojun Wei A Coarse-Grained Reconfigurable Architecture for Compute-Intensive MapReduce Acceleration . . . . . . . . . 69--72 Bo-Cheng Charles Lai and Luis Garrido Platero and Hsien-Kai Kuo A Quantitative Method to Data Reuse Patterns of SIMT Applications . . . . . 73--76 Yaman Çakmakçi and Will Toms and Javier Navaridas and Mikel Lujan Cyclic Power-Gating as an Alternative to Voltage and Frequency Scaling . . . . . 77--80 Erik Tomusk and Christophe Dubach and Michael O'Boyle Diversity: a Design Goal for Heterogeneous Processors . . . . . . . . 81--84 Milad Hashemi and Debbie Marr and Doug Carmean and Yale N. Patt Efficient Execution of Bursty Applications . . . . . . . . . . . . . . 85--88 Sudarsun Kannan and Moinudin Qureshi and Ada Gavrilovska and Karsten Schwan Energy Aware Persistence: Reducing the Energy Overheads of Persistent Memory 89--92 Alejandro Valero and Negar Miralaei and Salvador Petit and Julio Sahuquillo and Timothy M. Jones Enhancing the L1 Data Cache Design to Mitigate HCI . . . . . . . . . . . . . . 93--96 Rathijit Sen and David A. Wood GPGPU Footprint Models to Estimate per-Core Power . . . . . . . . . . . . . 97--100 Daejin Jung and Sheng Li and Jung Ho Ahn Large Pages on Steroids: Small Ideas to Accelerate Big Memory Applications . . . 101--104 Javier Verdu and Alex Pajuelo Performance Scalability Analysis of JavaScript Applications with Web Workers 105--108 Christina Delimitrou and Christos Kozyrakis Security Implications of Data Mining in Cloud Scheduling . . . . . . . . . . . . 109--112 Zhenning Wang and Jun Yang and Rami Melhem and Bruce Childers and Youtao Zhang and Minyi Guo Simultaneous Multikernel: Fine-Grained Sharing of GPUs . . . . . . . . . . . . 113--116 Chulian Zhang and Hamed Tabkhi and Gunar Schirner Studying Inter-Warp Divergence Aware Execution on GPUs . . . . . . . . . . . 
117--120 Arash Tavakkol and Pooyan Mehrvarzy and Hamid Sarbazi-Azad TBM: Twin Block Management Policy to Enhance the Utilization of Plane-Level Parallelism in SSDs . . . . . . . . . . 121--124 Bruce Jacob The 2 PetaFLOP, 3 Petabyte, 9 TB/s, 90 kW Cabinet: a System Architecture for Exascale and Big Data . . . . . . . . . 125--128 He Xiao and Wen Yueh and Saibal Mukhopadhyay and Sudhakar Yalamanchili Thermally Adaptive Cache Access Mechanisms for 3D Many-Core Architectures . . . . . . . . . . . . . 129--132 Qi Hu and Peng Liu and Michael C. Huang Threads and Data Mapping: Affinity Analysis for Traffic Reduction . . . . . 133--136 Anonymous Table of Contents . . . . . . . . . . . C1--C1 Anonymous Cover . . . . . . . . . . . . . . . . . C2--C2 Anonymous Cover . . . . . . . . . . . . . . . . . C3--C3 Anonymous Table of contents [back cover] . . . . . C4--C4
Nathan Beckmann and Daniel Sanchez Cache Calculus: Modeling Caches through Differential Equations . . . . . . . . . 1--5 Anonymous 2016 Index IEEE Computer Architecture Letters Vol. 15 . . . . . . 1--6 Xin Zhan and Reza Azimi and Svilen Kanev and David Brooks and Sherief Reda CARB: a C-State Power Management Arbiter for Latency-Critical Workloads . . . . . 6--9 Dong-Ik Jeon and Ki-Seok Chung CasHMC: a Cycle-Accurate Simulator for Hybrid Memory Cube . . . . . . . . . . . 10--13 Hao Wu and Fangfei Liu and Ruby B. Lee Cloud Server Benchmark Suite for Evaluating New Hardware Architectures 14--17 Seyed Mohammad Seyedzadeh and Alex K. Jones and Rami Melhem Counter-Based Tree Structure for Row Hammering Mitigation in DRAM . . . . . . 18--21 Hoda Naghibijouybari and Nael Abu-Ghazaleh Covert Channels on GPGPUs . . . . . . . 22--25 Wonjun Song and Hyung-Joon Jung and Jung Ho Ahn and Jae W. Lee and John Kim Evaluation of Performance Unfairness in NUMA System Architecture . . . . . . . . 26--29 Uri Verner and Avi Mendelson and Assaf Schuster Extending Amdahl's Law for Multicores with Turbo Boost . . . . . . . . . . . . 30--33 Hiroshi Sasaki and Fang-Hsiang Su and Teruo Tanimoto and Simha Sethumadhavan Heavy Tails in Program Structure . . . . 34--37 Liang Feng and Hao Liang and Sharad Sinha and Wei Zhang HeteroSim: a Heterogeneous CPU--FPGA Simulator . . . . . . . . . . . . . . . 38--41 Xia Zhao and Yuxi Liu and Almutaz Adileh and Lieven Eeckhout LA-LLC: Inter-Core Locality-Aware Last-Level Cache to Exploit Many-to-Many Traffic in GPGPUs . . . . . . . . . . . 42--45 Amirali Boroumand and Saugata Ghose and Minesh Patel and Hasan Hassan and Brandon Lucia and Kevin Hsieh and Krishna T. Malladi and Hongzhong Zheng and Onur Mutlu LazyPIM: an Efficient Cache Coherence Mechanism for Processing-in-Memory . . . 
46--50 Mark Gottscho and Mohammed Shoaib and Sriram Govindan and Bikash Sharma and Di Wang and Puneet Gupta Measuring the Impact of Memory Errors on Application Performance . . . . . . . . 51--55 Almutaz Adileh and Stijn Eyerman and Aamer Jaleel and Lieven Eeckhout Mind The Power Holes: Sifting Operating Points in Power-Limited Heterogeneous Multicores . . . . . . . . . . . . . . . 56--59 Hiroshi Sasaki and Alper Buyuktosunoglu and Augusto Vega and Pradip Bose Mitigating Power Contention: a Scheduling Based Approach . . . . . . . 60--63 David Gonzalez Marquez and Adrian Cristal Kestelman and Esteban Mocskos Mth: Codesigned Hardware/Software Support for Fine Grain Threads . . . . . 64--67 Tomer Y. Morad and Gil Shomron and Mattan Erez and Avinoam Kolodny and Uri C. Weiser Optimizing Read-Once Data Flow in Big-Data Applications . . . . . . . . . 68--71 Ali Yasoubi and Reza Hojabr and Mehdi Modarressi Power-Efficient Accelerator Design for Neural Networks Using Computation Reuse 72--75 Young Hoon Son and Hyunyoon Cho and Yuhwan Ro and Jae W. Lee and Jung Ho Ahn SALAD: Achieving Symmetric Access Latency with Asymmetric DRAM Architecture . . . . . . . . . . . . . . 76--79 Patrick Judd and Jorge Albericio and Andreas Moshovos Stripes: Bit-Serial Deep Neural Network Computing . . . . . . . . . . . . . . . 80--83 Gokul Subramanian Ravi and Mikko Lipasti Timing Speculation in Multi-Cycle Data Paths . . . . . . . . . . . . . . . . . 84--87
Samira Khan and Chris Wilkerson and Donghyuk Lee and Alaa R. Alameldeen and Onur Mutlu A Case for Memory Content-Based Detection and Mitigation of Data-Dependent Failures in DRAM . . . . 88--93 Sparsh Mittal and Jeffrey S. Vetter and Lei Jiang Addressing Read-Disturbance Issue in STT--RAM by Data Compression and Selective Duplication . . . . . . . . . 94--98 Mohammad Bakhshalipour and Pejman Lotfi-Kamran and Hamid Sarbazi-Azad An Efficient Temporal Data Prefetcher for L1 Caches . . . . . . . . . . . . . 99--102 Jorge A. Martínez and Juan Antonio Maestro and Pedro Reviriego A Scheme to Improve the Intrinsic Error Detection of the Instruction Set Architecture . . . . . . . . . . . . . . 103--106 Rujia Wang and Sparsh Mittal and Youtao Zhang and Jun Yang Decongest: Accelerating Super-Dense PCM Under Write Disturbance by Hot Page Remapping . . . . . . . . . . . . . . . 107--110 Teruo Tanimoto and Takatsugu Ono and Koji Inoue and Hiroshi Sasaki Enhanced Dependence Graph Model for Critical Path Analysis on Modern Out-of-Order Processors . . . . . . . . 111--114 Junghee Lee and Kalidas Ganesh and Hyuk-Jun Lee and Youngjae Kim FESSD: a Fast Encrypted SSD Employing On-Chip Access-Control Memory . . . . . 115--118 Abdel-Hameed A. Badawy and Donald Yeung Guiding Locality Optimizations for Graph Computations via Reuse Distance Analysis 119--122 Yue Zha and Jing Li IMEC: a Fully Morphable In-Memory Computing Fabric Enabled by Resistive Crossbar . . . . . . . . . . . . . . . . 123--126 Li-Jhan Chen and Hsiang-Yun Cheng and Po-Han Wang and Chia-Lin Yang Improving GPGPU Performance via Cache Locality Aware Thread Block Scheduling 127--131 James Garland and David Gregg Low Complexity Multiply Accumulate Unit for Weight-Sharing Convolutional Neural Networks . . . . . . . . . . . . . . . . 132--135 Myoungsoo Jung NearZero: an Integration of Phase Change Memory with Multi-Core Coprocessor . . . 136--140 Leonid Yavits and Uri Weiser and Ran Ginosar Resistive Address Decoder . . . . . . 
. 141--144 Madhavan Manivannan and Miquel Pericàs and Vassilis Papaefstathiou and Per Stenström Runtime-Assisted Global Cache Management for Task-Based Parallel Programs . . . . 145--148 Arthur Perais and Andre Seznec Storage-Free Memory Dependency Prediction . . . . . . . . . . . . . . . 149--152 Amirhossein Mirhosseini and Aditya Agrawal and Josep Torrellas Survive: Pointer-Based In-DRAM Incremental Checkpointing for Low-Cost Data Persistence and Rollback-Recovery 153--157 Sandro Pinto and Jorge Pereira and Tiago Gomes and Mongkol Ekpanyapong and Adriano Tavares Towards a TrustZone-Assisted Hypervisor for Real-Time Embedded Systems . . . . . 158--161 Trevor E. Carlson and Kim-Anh Tran and Alexandra Jimborean and Konstantinos Koukos and Magnus Själander and Stefanos Kaxiras Transcending Hardware Limits with Software Out-of-Order Processing . . . . 162--165 Hossein Ahmadvand and Maziar Goudarzi Using Data Variety for Efficient Progressive Big Data Processing in Warehouse-Scale Computers . . . . . . . 166--169 Dan Zhang and Xiaoyu Ma and Derek Chiou Worklist-Directed Prefetching . . . . . 170--173
Alberto Scionti and Somnath Mazumdar and Stephane Zuckerman Enabling Massive Multi-Threading with Fast Hashing . . . . . . . . . . . . . . 1--4 Anonymous 2017 Index IEEE Computer Architecture Letters Vol. 16 . . . . . . 1--6 Dong-Ik Jeon and Kyeong-Bin Park and Ki-Seok Chung HMC-MAC: Processing-in Memory Architecture for Multiply--Accumulate Operations with Hybrid Memory Cube . . . 5--8 Sam Van den Steen and Lieven Eeckhout Modeling Superscalar Processor Memory-Level Parallelism . . . . . . . . 9--12 Srdjan Durkovic and Zoran Cica Birkhoff--von Neumann Switch Based on Greedy Scheduling . . . . . . . . . . . 13--16 Binh Pham and Derek Hower and Abhishek Bhattacharjee and Trey Cain TLB Shootdown Mitigation for Low-Power Many-Core Servers with L1 Virtual Caches 17--20 Leonid Yavits and Ran Ginosar Accelerator for Sparse Machine Learning 21--24 Eleftherios-Iordanis Christoforidis and Sotirios Xydis and Dimitrios Soudris CF-TUNE: Collaborative Filtering Auto-Tuning for Energy Efficient Many-Core Processors . . . . . . . . . . 25--28 Amjad F. Almatrood and Harpreet Singh Design of Generalized Pipeline Cellular Array in Quantum-Dot Cellular Automata 29--32 Yue Zha and Jing Li CMA: a Reconfigurable Complex Matching Accelerator for Wire-Speed Network Intrusion Detection . . . . . . . . . . 33--36 Myoungsoo Jung and Jie Zhang and Ahmed Abulila and Miryeong Kwon and Narges Shahidi and John Shalf and Nam Sung Kim and Mahmut Kandemir SimpleSSD: Modeling Solid State Drives for Holistic System Simulation . . . . . 37--41 Zamshed Chowdhury and Jonathan D. Harms and S. Karen Khatamifard and Masoud Zabihi and Yang Lv and Andrew P. Lyle and Sachin S. Sapatnekar and Ulya R. Karpuzcu and Jian-Ping Wang Efficient In-Memory Processing Using Spintronics . . . . . . . . . . . . . . 42--46 Mohammadamin Ajdari and Pyeongsu Park and Dongup Kwon and Joonsung Kim and Jangwoo Kim A Scalable HW-Based Inline Deduplication for SSD Arrays . . . . . . . . . . . . . 
47--50 Morteza Hoseinzadeh Flow-Based Simulation Methodology . . . 51--54 Stijn Eyerman and Wim Heirman and Kristof Du Bois and Ibrahim Hur Multi-Stage CPI Stacks . . . . . . . . . 55--58 Guowei Zhang and Daniel Sanchez Leveraging Hardware Caches for Memoization . . . . . . . . . . . . . . 59--63 Armin Vakil-Ghahani and Sara Mahdizadeh-Shahri and Mohammad-Reza Lotfi-Namin and Mohammad Bakhshalipour and Pejman Lotfi-Kamran and Hamid Sarbazi-Azad Cache Replacement Policy Based on Expected Hit Count . . . . . . . . . . . 64--67 Zacharias Hadjilambrou and Shidhartha Das and Marco A. Antoniades and Yiannakis Sazeides Sensing CPU Voltage Noise Through Electromagnetic Emanations . . . . . . . 68--71 Daejin Jung and Sunjung Lee and Wonjong Rhee and Jung Ho Ahn Partitioning Compute Units in CNN Acceleration for Statistical Memory Traffic Shaping . . . . . . . . . . . . 72--75 Joshua San Miguel and Karthik Ganesan and Mario Badr and Natalie Enright Jerger The EH Model: Analytical Exploration of Energy-Harvesting Architectures . . . . 76--79 Jihun Kim and Joonsung Kim and Pyeongsu Park and Jong Kim and Jangwoo Kim SSD Performance Modeling Using Bottleneck Analysis . . . . . . . . . . 80--83 Kevin Angstadt and Jack Wadden and Vinh Dang and Ted Xie and Dan Kramp and Westley Weimer and Mircea Stan and Kevin Skadron MNCaRT: an Open-Source, Multi-Architecture Automata-Processing Research and Execution Ecosystem . . . . 84--87 Hao Zheng and Ahmed Louri EZ-Pass: an Energy & Performance-Efficient Power-Gating Router Architecture for Scalable NoCs 88--91 Leila Delshadtehrani and Schuyler Eldridge and Sadullah Canakci and Manuel Egele and Ajay Joshi Nile: a Programmable Monitoring Coprocessor . . . . . . . . . . . . . . 92--95 Eojin Lee and Sukhan Lee and G. Edward Suh and Jung Ho Ahn TWiCe: Time Window Counter Based Row Refresh to Prevent Row-Hammering . . . . 96--99
Joydeep Rakshit and Kartik Mohanram LEO: Low Overhead Encryption ORAM for Non-Volatile Memories . . . . . . . . . 100--104 Sang Wook Stephen Do and Michel Dubois Core Reliability: Leveraging Hardware Transactional Memory . . . . . . . . . . 105--108 Manolis Kaliorakis and Athanasios Chatzidimitriou and George Papadimitriou and Dimitris Gizopoulos Statistical Analysis of Multicore CPUs Operation in Scaled Voltage Conditions 109--112 Soroosh Khoram and Yue Zha and Jing Li An Alternative Analytical Approach to Associative Processing . . . . . . . . . 113--116 S. Karen Khatamifard and M. Hassan Najafi and Ali Ghoreyshi and Ulya R. Karpuzcu and David J. Lilja On Memory System Design for Stochastic Computing . . . . . . . . . . . . . . . 117--121 Dimitris Mouris and Nektarios Georgios Tsoutsos and Michail Maniatakos TERMinator Suite: Benchmarking Privacy-Preserving Architectures . . . . 122--125 Esha Choukse and Mattan Erez and Alaa Alameldeen CompressPoints: an Evaluation Methodology for Compressed Memory Systems . . . . . . . . . . . . . . . . 126--129 Seikwon Kim and Wonsang Kwak and Changdae Kim and Jaehyuk Huh Zebra Refresh: Value Transformation for Zero-Aware DRAM Refresh Reduction . . . 130--133 Youngeun Kwon and Minsoo Rhu A Case for Memory-Centric HPC System Architecture for Training Deep Neural Networks . . . . . . . . . . . . . . . . 134--138 Engin Ipek and Florian Longnos and Shihai Xiao and Wei Yang Bit-Level Load Balancing: a New Technique for Improving the Write Throughput of Deeply Scaled STT-MRAM . . 139--142 Konstantinos Iliakis and Sotirios Xydis and Dimitrios Soudris Decoupled MapReduce for Shared-Memory Multi-Core Architectures . . . . . . . . 143--146 Zhaoshi Li and Leibo Liu and Yangdong Deng and Shouyi Yin and Shaojun Wei Breaking the Synchronization Bottleneck with Reconfigurable Transactional Execution . . . . . . . . . . . . . . . 
147--150 Engin Ipek and Florian Longnos and Shihai Xiao and Wei Yang Vertical Writes: Closing the Throughput Gap between Deeply Scaled STT-MRAM and DRAM . . . . . . . . . . . . . . . . . . 151--154 Yu Gan and Christina Delimitrou The Architectural Implications of Cloud Microservices . . . . . . . . . . . . . 155--158 Ofir Shwartz and Yitzhak Birk Distributed Memory Integrity Trees . . . 159--162 Ji-Tae Yun and Su-Kyung Yoon and Jeong-Geun Kim and Bernd Burgstaller and Shin-Dug Kim Regression Prefetcher with Preprocessing for DRAM--PCM Hybrid Main Memory . . . . 163--166 Jiangwei Zhang and Donald Kline, Jr. and Long Fang and Rami Melhem and Alex K. Jones RETROFIT: Fault-Aware Wear Leveling . . 167--170 Neeraj Kulkarni and Feng Qi and Christina Delimitrou Leveraging Approximation to Improve Datacenter Resource Efficiency . . . . . 171--174 Laith M. AlBarakat and V. Paul Gratz and Daniel A. Jiménez MTB-Fetch: Multithreading Aware Hardware Prefetching for Chip Multiprocessors . . 175--178 Thiruvengadam Vijayaraghavan and Amit Rajesh and Karthikeyan Sankaralingam MPU--BWM: Accelerating Sequence Alignment . . . . . . . . . . . . . . . 179--182 Sander De Pestel and Sam Van den Steen and Shoaib Akram and Lieven Eeckhout RPPM: Rapid Performance Prediction of Multithreaded Applications on Multicore Hardware . . . . . . . . . . . . . . . . 183--186 Wenyi Zhao and Quan Chen and Minyi Guo KSM: Online Application-Level Performance Slowdown Prediction for Spatial Multitasking GPGPU . . . . . . . 187--191 Shivam Swami and Kartik Mohanram ARSENAL: Architecture for Secure Non-Volatile Memories . . . . . . . . . 192--196 Abanti Basak and Xing Hu and Shuangchen Li and Sang Min Oh and Yuan Xie Exploring Core and Cache Hierarchy Bottlenecks in Graph Processing Workloads . . . . . . . . . . . . . . . 197--200 S. Karen Khatamifard and Longfei Wang and Selcuk Köse and Ulya R. Karpuzcu A New Class of Covert Channels Exploiting Power Management Vulnerabilities . . . . . . . . . . . . 
201--204 Sushant Kondguli and Michael Huang Bootstrapping: Using SMT Hardware to Improve Single-Thread Performance . . . 205--208 Donald Kline, Jr. and Rami Melhem and Alex K. Jones Counter Advance for Reliable Encryption in Phase Change Memory . . . . . . . . . 209--212 Debiprasanna Sahoo and Swaraj Sha and Manoranjan Satpathy and Madhu Mutyam ReDRAM: a Reconfigurable DRAM Cache for GPGPUs . . . . . . . . . . . . . . . . . 213--216 Susumu Mashimo and Ryota Shioya and Koji Inoue VMOR: Microarchitectural Support for Operand Access in an Interpreter . . . . 217--220 Seungwon Min and Mohammad Alian and Wen-Mei Hwu and Nam Sung Kim Semi-Coherent DMA: an Alternative I/O Coherency Management for Embedded Systems . . . . . . . . . . . . . . . . 221--224 Negin Nematollahi and Mohammad Sadrosadati and Hajar Falahati and Marzieh Barkhordar and Hamid Sarbazi-Azad Neda: Supporting Direct Inter-Core Neighbor Data Exchange in GPUs . . . . . 225--229 Hamza Omar and Halit Dogan and Brian Kahne and Omer Khan Multicore Resource Isolation for Deterministic, Resilient and Secure Concurrent Execution of Safety-Critical Applications . . . . . . . . . . . . . . 230--234 Farzaneh Zokaee and Hamid R. Zarandi and Lei Jiang AligneR: a Process-in-Memory Architecture for Short Read Alignment in ReRAMs . . . . . . . . . . . . . . . . . 235--238 Qian Lou and Lei Jiang BRAWL: a Spintronics-Based Portable Basecalling-in-Memory Architecture for Nanopore Genome Sequencing . . . . . . . 239--242 Donghyun Min and Donggyu Park and Jinwoo Ahn and Ryan Walker and Junghee Lee and Sungyong Park and Youngjae Kim Amoeba: an Autonomous Backup and Recovery SSD for Ransomware Attack Defense . . . . . . . . . . . . . . . . 243--246 Chinam Kim and Hyukjun Lee A High-Bandwidth PCM-Based Memory System for Highly Available IP Routing Table Lookup . . . . . . . . . . . . . . . . . 246--249
Jiho Kim and Jehee Cha and Jason Jong Kyu Park and Dongsuk Jeon and Yongjun Park Improving GPU Multitasking Efficiency Using Dynamic Resource Sharing . . . . . 1--5 Anonymous 2018 Index IEEE Computer Architecture Letters Vol. 17 . . . . . . 1--8 Sheng Xu and Xiaoming Chen and Ying Wang and Yinhe Han and Xuehai Qian and Xiaowei Li PIMSim: a Flexible and Detailed Processing-in-Memory Simulator . . . . . 6--9 Gil Shomron and Uri Weiser Spatial Correlation and Value Prediction in Convolutional Neural Networks . . . . 10--13 Ujjwal Gupta and Sumit K. Mandal and Manqing Mao and Chaitali Chakrabarti and Umit Y. Ogras A Deep Q-Learning Approach for Dynamic Management of Heterogeneous Processors 14--17 Samuel Rogers and Joshua Slycord and Ronak Raheja and Hamed Tabkhi Scalable LLVM-Based Accelerator Modeling in gem5 . . . . . . . . . . . . . . . . 18--21 Berkin Akin and Alaa R. Alameldeen A Case For Asymmetric Processing in Memory . . . . . . . . . . . . . . . . . 22--25 Konstantinos Tovletoglou and Lev Mukhanov and Dimitrios S. Nikolopoulos and Georgios Karakonstantis Shimmer: Implementing a Heterogeneous-Reliability DRAM Framework on a Commodity Server . . . . . . . . . 26--29 Chanchal Kumar and Sidharth Singh and Gregory T. Byrd Hybrid Remote Access Protocol . . . . . 30--33 Yicheng Wang and Yang Liu and Peiyun Wu and Zhao Zhang Detect DRAM Disturbance Error by Using Disturbance Bin Counters . . . . . . . . 34--37 Xinfeng Xie and Xing Hu and Peng Gu and Shuangchen Li and Yu Ji and Yuan Xie NNBench-X: Benchmarking and Understanding Neural Network Workloads for Accelerator Designs . . . . . . . . 38--42 Asif Ali Khan and Fazal Hameed and Robin Bläsing and Stuart Parkin and Jeronimo Castrillon RTSim: a Cycle-Accurate Simulator for Racetrack Memories . . . . . . . . . . . 43--46 Yiming Gan and Yuxian Qiu and Jingwen Leng and Yuhao Zhu SVSoC: Speculative Vision Systems-on-a-Chip . . . . . . . . . . . 
47--50 Ting-Ru Lin and Yunfan Li and Massoud Pedram and Lizhong Chen Design Space Exploration of Memory Controller Placement in Throughput Processors with Deep Learning . . . . . 51--54 Yehia Arafa and Abdel-Hameed A. Badawy and Gopinath Chennupati and Nandakishore Santhi and Stephan Eidenbenz PPT--GPU: Scalable GPU Performance Modeling . . . . . . . . . . . . . . . . 55--58 Bradley Denby and Brandon Lucia Orbital Edge Computing: Machine Inference in Space . . . . . . . . . . . 59--62 He Liu and Jianhui Han and Youhui Zhang A Unified Framework for Training, Mapping and Simulation of ReRAM-Based Convolutional Neural Network Acceleration . . . . . . . . . . . . . . 63--66 Tian Tan and Eriko Nurvitadhi and Derek Chiou Dark Wires and the Opportunities for Reconfigurable Logic . . . . . . . . . . 67--70 Ajeya Naithani and Josue Feliu and Almutaz Adileh and Lieven Eeckhout Precise Runahead Execution . . . . . . . 71--74 V. Agrawal and M. A. Dinani and Y. Shui and M. Ferdman and N. Honarmand Massively Parallel Server Processors . . 75--78 H. Golestani and G. Gupta and R. Sen Performance Modeling and Bottleneck Analysis of EDGE Processors Using Dependence Graphs . . . . . . . . . . . 79--82 J. Leng and A. Buyuktosunoglu and R. Bertran and P. Bose and V. J. Reddi Asymmetric Resilience for Accelerator-Rich Systems . . . . . . . . 83--86
E. Sadredini and R. Rahimi and V. Verma and M. Stan and K. Skadron A Scalable and Efficient In-Memory Interconnect Architecture for Automata Processing . . . . . . . . . . . . . . . 87--90 A. Yasin and A. Mendelson and Y. Ben-Asher Tuning Performance via Metrics with Expectations . . . . . . . . . . . . . . 91--94 L. Wang and M. Jahre and A. Adileh and Z. Wang and L. Eeckhout Modeling Emerging Memory-Divergent GPU Applications . . . . . . . . . . . . . . 95--98 G. Shomron and T. Horowitz and U. Weiser SMT-SA: Simultaneous Multithreading in Systolic Arrays . . . . . . . . . . . . 99--102 D. Masouros and S. Xydis and D. Soudris Rusty: Runtime System Predictability Leveraging LSTM Neural Networks . . . . 103--106 S. Kim and H. Jung and W. Shin and H. Lee and H. Lee HAD-TWL: Hot Address Detection-Based Wear Leveling for Phase-Change Memory Systems with Low Latency . . . . . . . . 107--110 H. Zhou and G. T. Byrd Quantum Circuits for Dynamic Runtime Assertions in Quantum Computation . . . 111--114 J. Rao and T. Ao and K. Dai and X. Zou ARCE: Towards Code Pointer Integrity on Embedded Processors Using Architecture-Assisted Run-Time Metadata Management . . . . . . . . . . . . . . . 115--118 K. Bhardwaj and M. Havasi and Y. Yao and D. M. Brooks and J. M. H. Lobato and G. Wei Determining Optimal Coherency Interface for Many-Accelerator SoCs Using Bayesian Optimization . . . . . . . . . . . . . . 119--123 Ali Ansari and Pejman Lotfi-Kamran and Hamid Sarbazi-Azad Code Layout Optimization for Near-Ideal Instruction Cache . . . . . . . . . . . 124--127 Kiran Ranganath and AmirAli Abdolrashidi and Shuaiwen Leon Song and Daniel Wong Speeding up Collective Communications Through Inter-GPU Re-Routing . . . . . . 128--131 Dylan Stow and Amin Farmahini-Farahani and Sudhanva Gurumurthi and Michael Ignatowski and Yuan Xie Power Profiling of Modern Die-Stacked Memory . . . . . . . . . . . . . . . . . 
132--135 Seyed Morteza Nabavinejad and Hassan Hafez-Kolahi and Sherief Reda Coordinated DVFS and Precision Control for Deep Neural Networks . . . . . . . . 136--140 Seunghak Lee and Nam Sung Kim and Daehoon Kim Exploiting OS-Level Memory Offlining for DRAM Power Management . . . . . . . . . 141--144 Theodoros Marinakis and Iraklis Anagnostopoulos Performance and Fairness Improvement on CMPs Considering Bandwidth and Cache Utilization . . . . . . . . . . . . . . 145--148 Adarsha Balaji and Shihao Song and Anup Das and Nikil Dutt and Jeff Krichmar and Nagarajan Kandasamy and Francky Catthoor A Framework to Explore Workload-Specific Performance and Lifetime Trade-offs in Neuromorphic Computing . . . . . . . . . 149--152 Hyeran Jeon and Hodjat Asghari Esfeden and Nael B. Abu-Ghazaleh and Daniel Wong and Sindhuja Elango Locality-Aware GPU Register File . . . . 153--156 Chen Li and Yifan Sun and Lingling Jin and Lingjie Xu and Zheng Cao and Pengfei Fan and David Kaeli and Sheng Ma and Yang Guo and Jun Yang Priority-Based PCIe Scheduling for Multi-Tenant Multi-GPU Systems . . . . . 157--160 Jian Weng and Sihao Liu and Vidushi Dadu and Tony Nowatzki DAEGEN: a Modular Compiler for Exploring Decoupled Spatial Accelerators . . . . . 161--165 Konstantinos Iliakis and Sotirios Xydis and Dimitrios Soudris LOOG: Improving GPU Efficiency With Light-Weight Out-Of-Order Execution . . 166--169 Reoma Matsuo and Ryota Shioya and Hideki Ando Improving the Instruction Fetch Throughput with Dynamically Configuring the Fetch Pipeline . . . . . . . . . . . 170--173 Vamsee Reddy Kommareddy and Baogang Zhang and Fan Yao and Rickard Ewetz and Amro Awad Are Crossbar Memories Secure? New Security Vulnerabilities in Crossbar Memories . . . . . . . . . . . . . . . . 174--177 Kristin Barber and Anys Bacha and Li Zhou and Yinqian Zhang and Radu Teodorescu Isolating Speculative Data to Prevent Transient Execution Attacks . . . . . . 178--181
Ki-Dong Kang and Gyeongseo Park and Nam Sung Kim and Daehoon Kim Network Packet Processing Mode-Aware Power Management for Data Center Servers 1--4 Mustafa Cavus and Mohammed Shatnawi and Resit Sendag and Augustus K. Uht Exploring Prefetching, Pre-Execution and Branch Outcome Streaming for In-Memory Database Lookups . . . . . . . . . . . . 5--8 Rahul Bodduna and Vinod Ganesan and Patanjali SLPSK and Kamakoti Veezhinathan and Chester Rebeiro Brutus: Refuting the Security Claims of the Cache Timing Randomization Countermeasure Proposed in CEASER . . . 9--12 Minsub Kim and Jaeha Kung and Sungjin Lee Towards Scalable Analytics with Inference-Enabled Solid-State Drives . . 13--17 Congmiao Li and Jean-Luc Gaudiot Challenges in Detecting an Evasive Spectre . . . . . . . . . . . . . . . . 18--21 Mingyu Yan and Zhaodong Chen and Lei Deng and Xiaochun Ye and Zhimin Zhang and Dongrui Fan and Yuan Xie Characterizing and Understanding GCNs on GPU . . . . . . . . . . . . . . . . . . 22--25 Chanchal Kumar and Aayush Chaudhary and Shubham Bhawalkar and Utkarsh Mathur and Saransh Jain and Adith Vastrad and Eric Rotenberg Post-Silicon Microarchitecture . . . . . 26--29 Stijn Eyerman and Wim Heirman and Sam Van den Steen and Ibrahim Hur Breaking In-Order Branch Miss Recovery 30--33 Zhi-Gang Liu and Paul N. Whatmough and Matthew Mattina Systolic Tensor Array: an Efficient Structured-Sparse GEMM Accelerator for Mobile CNN Inference . . . . . . . . . . 34--37 Srivatsan Krishnan and Zishen Wan and Kshitij Bhardwaj and Paul Whatmough and Aleksandra Faust and Gu-Yeon Wei and David Brooks and Vijay Janapa Reddi The Sky Is Not the Limit: a Visual Performance Model for Cyber-Physical Co-Design in Autonomous Machines . . . . 38--42 Pierre Michaud Exploiting Thermal Transients With Deterministic Turbo Clock Frequency . . 43--46 Zhufei Chu and Huiming Tian and Zeqiang Li and Yinshui Xia and Lunyao Wang A High-Performance Design of Generalized Pipeline Cellular Array . . . . . . . . 
47--50 Lingjun Zhu and Lennart Bamberg and Anthony Agnesina and Francky Catthoor and Dragomir Milojevic and Manu Komalan and Julien Ryckaert and Alberto Garcia-Ortiz and Sung Kyu Lim Heterogeneous 3D Integration for a RISC-V System With STT-MRAM . . . . . . 51--54 Tony Mason and Thaleia Dimitra Doudali and Margo Seltzer and Ada Gavrilovska Unexpected Performance of Intel Optane DC Persistent Memory . . . . . . . . . . 55--58 Zhihui Zhang and Jingwen Leng and Lingxiao Ma and Youshan Miao and Chao Li and Minyi Guo Architectural Implications of Graph Neural Networks . . . . . . . . . . . . 59--62 Anderson L. Sartor and Anish Krishnakumar and Samet E. Arda and Umit Y. Ogras and Radu Marculescu HiLITE: Hierarchical and Lightweight Imitation Learning for Power Management of Embedded SoCs . . . . . . . . . . . . 63--67 Harsh Desai and Brandon Lucia A Power-Aware Heterogeneous Architecture Scaling Model for Energy-Harvesting Computers . . . . . . . . . . . . . . . 68--71 Bo-Cheng Lai and Chun-Yen Chen and Yi-Da Hsin and Bo-Yen Lin A Two-Directional BigData Sorting Architecture on FPGAs . . . . . . . . . 72--75 Peng Gu and Benjamin S. Lim and Wenqin Huangfu and Krishan T. Malladi and Andrew Chang and Yuan Xie NMTSim: Transaction-Command Based Simulator for New Memory Technology Devices . . . . . . . . . . . . . . . . 76--79 Seyyed Hossein SeyyedAghaei Rezaei and Mehdi Modarressi and Rachata Ausavarungnirun and Mohammad Sadrosadati and Onur Mutlu and Masoud Daneshtalab NoM: Network-on-Memory for Inter-Bank Data Transfer in Highly-Banked Memories 80--83 Anonymous 2019 Index IEEE Computer Architecture Letters Vol. 18 . . . . . . 1--8
Alberto Ros and Alexandra Jimborean The Entangling Instruction Prefetcher 84--87 Rahul Singh and Gokul Subramanian Ravi and Mikko Lipasti and Joshua San Miguel Value Locality Based Approximation With ODIN . . . . . . . . . . . . . . . . . . 88--91 Jie Zhang and Miryeong Kwon and Sanghyun Han and Nam Sung Kim and Mahmut Kandemir and Myoungsoo Jung FastDrain: Removing Page Victimization Overheads in NVMe Storage Stack . . . . 92--96 Junsu Im and Hanbyeol Kim and Yumin Won and Jiho Oh and Minjae Kim and Sungjin Lee Probability-Based Address Translation for Flash SSDs . . . . . . . . . . . . . 97--100 Ahmed Samara and James Tuck The Case for Domain-Specialized Branch Predictors for Graph-Processing . . . . 101--104 Reza Mirosanlou and Danlu Guo and Mohamed Hassan and Rodolfo Pellizzoni MCsim: an Extensible DRAM Memory Controller Simulator . . . . . . . . . . 105--109 Shang Li and Zhiyuan Yang and Dhiraj Reddy and Ankur Srivastava and Bruce Jacob DRAMsim3: a Cycle-Accurate, Thermal-Capable DRAM Simulator . . . . . 106--109 Joo Hwan Lee and Hui Zhang and Veronica Lagrange and Praveen Krishnamoorthy and Xiaodong Zhao and Yang Seok Ki SmartSSD: FPGA Accelerated Near-Storage Data Analytics on SSD . . . . . . . . . 110--113 Purab Ranjan Sutradhar and Mark Connolly and Sathwika Bavikadi and Sai Manoj Pudukotai Dinakarrao and Mark A. Indovina and Amlan Ganguly pPIM: a Programmable Processor-in-Memory Architecture With Precision-Scaling for Deep Learning . . . . . . . . . . . . . 118--121 Wonkyo Choe and Jonghyeon Kim and Jeongseob Ahn A Study of Memory Placement on Hardware-Assisted Tiered Memory Systems 122--125 Nada Lachtar and Abdulrahman Abu Elkhail and Anys Bacha and Hafiz Malik A Cross-Stack Approach Towards Defending Against Cryptojacking . . . . . . . . . 126--129 Fatemeh Golshan and Mohammad Bakhshalipour and Mehran Shakerinava and Ali Ansari and Pejman Lotfi-Kamran and Hamid Sarbazi-Azad Harnessing Pairwise-Correlating Data Prefetching With Runahead Metadata . . . 
130--133 Nikita Lazarev and Neil Adit and Shaojie Xiang and Zhiru Zhang and Christina Delimitrou Dagger: Towards Efficient RPCs in Cloud Microservices With Near-Memory Reconfigurable NICs . . . . . . . . . . 134--138 Ali Jahanshahi and Hadi Zamani Sabzi and Chester Lau and Daniel Wong GPU-NEST: Characterizing Energy Efficiency of Multi-GPU Inference Servers . . . . . . . . . . . . . . . . 139--142 Darya Mikhailenko and Yujin Nakamoto and Ben Feinberg and Engin Ipek Adapting In Situ Accelerators for Sparsity with Granular Matrix Reordering 143--146 Yasuo Ishii and Jaekyu Lee and Krishnendra Nathella and Dam Sunwoo Rebasing Instruction Prefetching: an Industry Perspective . . . . . . . . . . 147--150 Newton and Virendra Singh and Trevor E. Carlson PIM-GraphSCC: PIM-Based Graph Processing Using Graph's Community Structures . . . 151--154 Zamshed I. Chowdhury and S. Karen Khatamifard and Zhaoyong Zheng and Tali Moreshet and R. Iris Bahar and Ulya R. Karpuzcu Voltage Noise Mitigation With Barrier Approximation . . . . . . . . . . . . . 155--158 Yuezhi Che and Yuanzhou Yang and Amro Awad and Rujia Wang A Lightweight Memory Access Pattern Obfuscation Framework for NVM . . . . . 163--166 Elaheh Sadredini and Reza Rahimi and Kevin Skadron Enabling In-SRAM Pattern Processing With Low-Overhead Reporting Architecture . . 167--170 Ferdous Sharifi and Nezam Rohbani and Shaahin Hessabi Aging-Aware Context Switching in Multicore Processors Based on Workload Classification . . . . . . . . . . . . . 159--162
Anonymous 2020 Index IEEE Computer Architecture Letters Vol. 19 . . . . . . 1--7 Hyoukjun Kwon and Michael Pellauer and Angshuman Parashar and Tushar Krishna Flexion: a Quantitative Metric for Flexibility in DNN Accelerators . . . . 1--4 Byeongho Kim and Jaehyun Park and Eojin Lee and Minsoo Rhu and Jung Ho Ahn TRiM: Tensor Reduction in Memory . . . . 5--8 Nirmal Kumar Boran and Shubhankit Rathore and Meet Udeshi and Virendra Singh Fine-Grained Scheduling in Heterogeneous-ISA Architectures . . . . 9--12 Salonik Resch and Swamit Tannu and Ulya R. Karpuzcu and Moinuddin Qureshi A Day In the Life of a Quantum Error . . 13--16 Mohsin Shan and Omer Khan Accelerating Concurrent Priority Scheduling Using Adaptive in-Hardware Task Distribution in Multicores . . . . 17--21 Arthur Perais A Case for Speculative Strength Reduction . . . . . . . . . . . . . . . 22--25 Marta Navarro and Lucia Pons and Julio Sahuquillo Hy-Sched: a Simple Hyperthreading-Aware Thread to Core Allocation Strategy . . . 26--29 Mohammad Alian and Jongmin Shin and Ki-Dong Kang and Ren Wang and Alexandros Daglis and Daehoon Kim and Nam Sung Kim IDIO: Orchestrating Inbound Network Data on Server Processors . . . . . . . . . . 30--33 Hweesoo Kim and Sunjung Lee and Jaewan Choi and Jung Ho Ahn Row-Streaming Dataflow Using a Chaining Buffer and Systolic Array+ Structure . . 34--37 Hans Kasan and John Kim The Case for Dynamic Bias in Global Adaptive Routing . . . . . . . . . . . . 38--41 Parth Shah and Ranjal Gautham Shenoy and Vaidyanathan Srinivasan and Pradip Bose and Alper Buyuktosunoglu TokenSmart: Distributed, Scalable Power Management in the Many-Core Era . . . . 42--45 Qian Li and Bin Li and Pietro Mercati and Ramesh Illikkal and Charlie Tai and Michael Kishinevsky and Christos Kozyrakis RAMBO: Resource Allocation for Microservices Using Bayesian Optimization . . . . . . . . . . . . . . 
46--49 Sunghwan Kim and Gyusun Lee and Jiwon Woo and Jinkyu Jeong Zero-Copying I/O Stack for Low-Latency SSDs . . . . . . . . . . . . . . . . . . 50--53 Chao Yu and Sihang Liu and Samira Khan MultiPIM: a Detailed and Configurable Multi-Stack Processing-In-Memory Simulator . . . . . . . . . . . . . . . 54--57 Tian Tan and Eriko Nurvitadhi and Aravind Dasu and Martin Langhammer and Derek Chiou FlexScore: Quantifying Flexibility . . . 58--61 Arindam Sarkar and Newton Singh and Varun Venkitaraman and Virendra Singh DAM: Deadblock Aware Migration Techniques for STT-RAM-Based Hybrid Caches . . . . . . . . . . . . . . . . . 62--65 Han Li and Mingyu Yan and Xiaocheng Yang and Lei Deng and Wenming Li and Xiaochun Ye and Dongrui Fan and Yuan Xie Hardware Acceleration for GCNs via Bidirectional Fusion . . . . . . . . . . 66--69 Yongjoo Jang and Sejin Kim and Daehoon Kim and Sungjin Lee and Jaeha Kung Deep Partitioned Training From Near-Storage Computing to DNN Accelerators . . . . . . . . . . . . . . 70--73 Salonik Resch and Husrev Cilasun and Ulya R. Karpuzcu Cryogenic PIM: Challenges & Opportunities 74--77 Wim Heirman and Stijn Eyerman and Kristof Du Bois and Ibrahim Hur RIO: ROB-Centric In-Order Modeling of Out-of-Order Processors . . . . . . . . 78--81
Aporva Amarnath and Subhankar Pal and Hiwot Tadese Kassa and Augusto Vega and Alper Buyuktosunoglu and Hubertus Franke and John-David Wellman and Ronald Dreslinski and Pradip Bose Heterogeneity-Aware Scheduling on SoCs for Autonomous Vehicles . . . . . . . . 82--85 Lei Wang and Xingwang Xiong and Jianfeng Zhan and Wanling Gao and Xu Wen and Guoxin Kang and Fei Tang WPC: Whole-Picture Workload Characterization Across Intermediate Representation, ISA, and Microarchitecture . . . . . . . . . . . 86--89 Stijn Eyerman and Wim Heirman and Ibrahim Hur Modeling DRAM Timing in Parallel Simulators With Immediate-Response Memory Model . . . . . . . . . . . . . . 90--93 Hajar Falahati and Masoud Peyro and Hossein Amini and Mehran Taghian and Mohammad Sadrosadati and Pejman Lotfi-Kamran and Hamid Sarbazi-Azad Data-Aware Compression of Neural Networks . . . . . . . . . . . . . . . . 94--97 Benjamin Wu and Trishita Tiwari and G. Edward Suh and Aaron B. Wagner Guessing Outputs of Dynamically Pruned CNNs Using Memory Access Patterns . . . 98--101 Mingi Yoo and Jaeyong Song and Jounghoo Lee and Namhyung Kim and Youngsok Kim and Jinho Lee Making a Better Use of Caches for GCN Accelerators with Feature Slicing and Automatic Tile Morphing . . . . . . . . 102--105 Bongjoon Hyun and Jiwon Lee and Minsoo Rhu Characterization and Analysis of Deep Learning for 3D Point Cloud Analytics 106--109 Alexander Rucker and Muhammad Shahbaz and Kunle Olukotun Chopping off the Tail: Bounded Non-Determinism for Real-Time Accelerators . . . . . . . . . . . . . . 110--113 Jiya Su and Linfeng He and Peng Jiang and Rujia Wang Exploring PIM Architecture for High-Performance Graph Pattern Mining 114--117 Yunjae Lee and Youngeun Kwon and Minsoo Rhu Understanding the Implication of Non-Volatile Memory for Large-Scale Graph Neural Network Training . . . . . 118--121 Francisco Muñoz-Martínez and José L. Abellán and Manuel E. 
Acacio and Tushar Krishna STONNE: Enabling Cycle-Level Microarchitectural Simulation for DNN Inference Accelerators . . . . . . . . . 122--125 Nima Shoghi and Andrei Bersatti and Moinuddin Qureshi and Hyesoon Kim SmaQ: Smart Quantization for DNN Training by Exploiting Value Clustering 126--129 Haris Volos The Case for Replication-Aware Memory-Error Protection in Disaggregated Memory . . . . . . . . . . . . . . . . . 130--133 Truls Asheim and Boris Grot and Rakesh Kumar BTB-X: a Storage-Effective BTB Organization . . . . . . . . . . . . . . 134--137 Pratik Kumar and Chavhan Sujeet Yashavant and Biswabandan Panda DAMARU: a Denial-of-Service Attack on Randomized Last-Level Caches . . . . . . 138--141 Fatemeh Ghasemi and Magnus Jahre Modeling Periodic Energy-Harvesting Computing Systems . . . . . . . . . . . 142--145 Neelu Shivprakash Kalani and Biswabandan Panda Instruction Criticality Based Energy-Efficient Hardware Data Prefetching . . . . . . . . . . . . . . 146--149 Jiho Kim and Myoungsoo Jung and John Kim Decoupled SSD: Reducing Data Movement on NAND-Based Flash SSD . . . . . . . . . . 150--153 Hyeon Gyu Lee and Minwook Kim and Juwon Lee and Eunji Lee and Bryan S. Kim and Sungjin Lee and Yeseong Kim and Sang Lyul Min and Jin-Soo Kim Learned Performance Model for SSD . . . 154--157 Sudhanva Gurumurthi and Kijun Lee and Munseon Jang and Vilas Sridharan and Aaron Nygren and Yesin Ryu and Kyomin Sohn and Taekyun Kim and Hoeju Chung HBM3 RAS: Enhancing Resilience at Scale 158--161 Pavlos Aimoniotis and Christos Sakalis and Magnus Själander and Stefanos Kaxiras Reorder Buffer Contention: a Forward Speculative Interference Attack for Speculation Invariant Instructions . . . 162--165 Seyed Morteza Nabavinejad and Sherief Reda BayesTuner: Leveraging Bayesian Optimization For DNN Inference Configuration Selection . . . . . . . . 
166--170 Hyungkyu Ham and Hyunuk Cho and Minjae Kim and Jueon Park and Jeongmin Hong and Hyojin Sung and Eunhyeok Park and Euicheol Lim and Gwangsun Kim Near-Data Processing in Memory Expander for DNN Acceleration on GPUs . . . . . . 171--174 Wenjie Liu and Wim Heirman and Stijn Eyerman and Shoaib Akram and Lieven Eeckhout Scale-Model Simulation . . . . . . . . . 175--178
Anonymous 2021 Index IEEE Computer Architecture Letters Vol. 20 . . . . . . 1--8 Xinfeng Xie and Peng Gu and Jiayi Huang and Yufei Ding and Yuan Xie MPU-Sim: a Simulator for In-DRAM Near-Bank Processing Architectures . . . 1--4 Mo Zou and Mingzhe Zhang and Rujia Wang and Xian-He Sun and Xiaochun Ye and Dongrui Fan and Zhimin Tang Accelerating Graph Processing With Lightweight Learning-Based Data Reordering . . . . . . . . . . . . . . . 5--8 Kristin Barber and Moein Ghaniyoun and Yinqian Zhang and Radu Teodorescu A Pre-Silicon Approach to Discovering Microarchitectural Vulnerabilities in Security Critical Applications . . . . . 9--12 Dusol Lee and Duwon Hong and Wonil Choi and Jihong Kim MQSim-E: an Enterprise SSD Simulator . . 13--16 Benjamin J. Lucas and Ali Alwan and Marion Murzello and Yazheng Tu and Pengzhou He and Andrew J. Schwartz and David Guevara and Ujjwal Guin and Kyle Juretus and Jiafeng Xie Lightweight Hardware Implementation of Binary Ring-LWE PQC Accelerator . . . . 17--20
Yongwon Shin and Juseong Park and Jeongmin Hong and Hyojin Sung Runtime Support for Accelerating CNN Models on Digital DRAM Processing-in-Memory Hardware . . . . . 33--36 Hoyong Jin and Donghun Jeong and Taewon Park and Jong Hwan Ko and Jungrae Kim Multi-Prediction Compression: an Efficient and Scalable Memory Compression Framework for GP-GPU . . . . 37--40 Argyris Kokkinis and Dionysios Diamantopoulos and Kostas Siozios Dynamic Optimization of On-Chip Memories for HLS Targeting Many-Accelerator Platforms . . . . . . . . . . . . . . . 41--44 Sungmin Yun and Byeongho Kim and Jaehyun Park and Hwayong Nam and Jung Ho Ahn and Eojin Lee GraNDe: Near-Data Processing Architecture With Adaptive Matrix Mapping for Graph Convolutional Networks 45--48 Rui Ma and Evangelos Georganas and Alexander Heinecke and Sergey Gribok and Andrew Boutros and Eriko Nurvitadhi FPGA-Based AI Smart NICs for Scalable Distributed AI Training Systems . . . . 49--52 Fazal Hameed and Asif Ali Khan and Sebastien Ollivier and Alex K. Jones and Jeronimo Castrillon DNA Pre-Alignment Filter Using Processing Near Racetrack Memory . . . . 53--56 Ling Yang and Libo Huang and Run Yan and Nong Xiao and Sheng Ma and Li Shen and Weixia Xu Stride Equality Prediction for Value Speculation . . . . . . . . . . . . . . 57--60 Jeongmin Hong and Sungjun Cho and Gwangsun Kim Overcoming Memory Capacity Wall of GPUs With Heterogeneous Memory Stack . . . . 61--64 Luca Piccolboni and Davide Giri and Luca P. Carloni Accelerators & Security: The Socket Approach . . . . . . . . . . . . . . . . 65--68 Mingyu Yan and Mo Zou and Xiaocheng Yang and Wenming Li and Xiaochun Ye and Dongrui Fan and Yuan Xie Characterizing and Understanding HGNNs on GPUs . . . . . . . . . . . . . . . . 69--72 Cecil Accetti and Rendong Ying and Peilin Liu Structured Combinators for Efficient Graph Reduction . . . . . . . . . . . . 
73--76 Yu Omori and Keiji Kimura Open-Source Hardware Memory Protection Engine Integrated With NVMM Simulator 77--80 Minjae Kim and Bryan S. Kim and Eunji Lee and Sungjin Lee A Case Study of a DRAM-NVM Hybrid Memory Allocator for Key--Value Stores . . . . 81--84 Zhengrong Wang and Christopher Liu and Tony Nowatzki Infinity Stream: Enabling Transparent and Automated In-Memory Computing . . . 85--88 Lingxi Wu and Rasool Sharifi and Ashish Venkat and Kevin Skadron DRAM-CAM: General-Purpose Bit-Serial Exact Pattern Matching . . . . . . . . . 89--92 Salonik Resch and Ulya Karpuzcu On Variable Strength Quantum ECC . . . . 93--96 Peter Salvesen and Magnus Jahre LMT: Accurate and Resource-Scalable Slowdown Prediction . . . . . . . . . . 97--100 Gyeongcheol Shin and Junsoo Kim and Joo-Young Kim OpenMDS: an Open-Source Shell Generation Framework for High-Performance Design on Xilinx Multi-Die FPGAs . . . . . . . . . 101--104 Majid Jalili and Mattan Erez Managing Prefetchers With Deep Reinforcement Learning . . . . . . . . . 105--108 Marzieh Lenjani and Alif Ahmed and Kevin Skadron Pulley: an Algorithm/Hardware Co-Optimization for In-Memory Sorting 109--112 Yongye Zhu and Shijia Wei and Mohit Tiwari Revisiting Browser Performance Benchmarking From an Architectural Perspective . . . . . . . . . . . . . . 113--116 Donghyun Gouk and Seungkwan Kang and Miryeong Kwon and Junhyeok Jang and Hyunkyu Choi and Sangwon Lee and Myoungsoo Jung PreGNN: Hardware Acceleration to Take Preprocessing Off the Critical Path in Graph Neural Networks . . . . . . . . . 117--120 Yinshen Wang and Wenming Li and Tianyu Liu and Liangjiang Zhou and Bingnan Wang and Zhihua Fan and Xiaochun Ye and Dongrui Fan and Chibiao Ding Characterization and Implementation of Radar System Applications on a Reconfigurable Dataflow Architecture . . 
121--124 Xiaofeng Hou and Cheng Xu and Jiacheng Liu and Xuehan Tang and Lingyu Sun and Chao Li and Kwang-Ting Cheng Characterizing and Understanding End-to-End Multi-Modal Neural Networks on GPUs . . . . . . . . . . . . . . . . 125--128 Jared Nye and Omer Khan SSE: Security Service Engines to Accelerate Enclave Performance in Secure Multicore Processors . . . . . . . . . . 129--132 Gino A. Chacon and Charles Williams and Johann Knechtel and Ozgur Sinanoglu and Paul V. Gratz Hardware Trojan Threats to Cache Coherence in Modern 2.5D Chiplet Systems 133--136 Lieven Eeckhout A First-Order Model to Assess Computer Architecture Sustainability . . . . . . 137--140 Ranyang Zhou and Sepehr Tabrizchi and Arman Roohi and Shaahin Angizi LT-PIM: an LUT-Based Processing-in-DRAM Architecture With RowHammer Self-Tracking . . . . . . . . . . . . . 141--144 Jongwon Park and Jinkyu Jeong Speculative Multi-Level Access in LSM Tree-Based KV Store . . . . . . . . . . 145--148 Marjan Fariborz and Mahyar Samani and Terry O'Neill and Jason Lowe-Power and S. J. Ben Yoo and Venkatesh Akella A Model for Scalable and Balanced Accelerators for Graph Processing . . . 149--152 Jianming Huang and Yu Hua Ensuring Data Confidentiality in eADR-Based NVM Systems . . . . . . . . . 153--156 Sejin Kim and Jungwoo Kim and Yongjoo Jang and Jaeha Kung and Sungjin Lee SEMS: Scalable Embedding Memory System for Accelerating Embedding-Based DNNs 157--160
Daniel A. Jiménez and Elvira Teran and Paul V. Gratz Last-Level Cache Insertion and Promotion Policy in the Presence of Aggressive Prefetching . . . . . . . . . . . . . . 17--20 Yaebin Moon and Wanju Doh and Kwanhee Kyung and Eojin Lee and Jung Ho Ahn ADT: Aggressive Demotion and Promotion for Tiered Memory . . . . . . . . . . . 21--24 Gyeongseo Park and Ki-Dong Kang and Minho Kim and Daehoon Kim CoreNap: Energy Efficient Core Allocation for Latency-Critical Workloads . . . . . . . . . . . . . . . 1--4 Joonseop Sim and Soohong Ahn and Taeyoung Ahn and Seungyong Lee and Myunghyun Rhee and Jooyoung Kim and Kwangsik Shin and Donguk Moon and Euiseok Kim and Kyoung Park Computational CXL-Memory Solution for Accelerating Memory-Intensive Applications . . . . . . . . . . . . . . 5--8 Burkhard Ringlein and Francois Abel and Dionysios Diamantopoulos and Beat Weiss and Christoph Hagleitner and Dietmar Fey Advancing Compilation of DNNs for FPGAs Using Operation Set Architectures . . . 9--12 Seonho Lee and Ranggi Hwang and Jongse Park and Minsoo Rhu HAMMER: Hardware-Friendly Approximate Computing for Self-Attention With Mean-Redistribution and Linearization 13--16 Hanyeoreum Bae and Donghyun Gouk and Seungjun Lee and Jiseon Kim and Sungjoon Koh and Jie Zhang and Myoungsoo Jung Intelligent SSD Firmware for Zero-Overhead Journaling . . . . . . . . 25--28 Xia Zhao and Guangda Zhang and Lu Wang and Yangmei Li and Yongjun Zhang RouteReplies: Alleviating Long Latency in Many-Chip-Module GPUs . . . . . . . . 29--32 Kevin Weston and Farabi Mahmud and Vahid Janfaza and Abdullah Muzahid SmartIndex: Learning to Index Caches to Improve Performance . . . . . . . . . . 33--36 Soroosh Khoram and Kyle Daruwalla and Mikko Lipasti Energy-Efficient Bayesian Inference Using Bitstream Computing . . . . . . . 37--40 Jennifer Brana and Brian C. Schwedock and Yatin A. Manerkar and Nathan Beckmann Kobold: Simplified Cache Coherence for Cache-Attached Accelerators . . . . . . 41--44
Kiseok Jeon and Junghee Lee and Bumsoo Kim and James J. Kim Hardware Accelerated Reusable Merkle Tree Generation for Bitcoin Blockchain Headers . . . . . . . . . . . . . . . . 69--72 Hwanjun Lee and Seunghak Lee and Yeji Jung and Daehoon Kim T-CAT: Dynamic Cache Allocation for Tiered Memory Systems With Memory Interleaving . . . . . . . . . . . . . . 73--76 Ipoom Jeong and Jiaqi Lou and Yongseok Son and Yongjoo Park and Yifan Yuan and Nam Sung Kim LADIO: Leakage-Aware Direct I/O for I/O-Intensive Workloads . . . . . . . . 77--80 Chandana S. Deshpande and Arthur Perais and Frédéric Pétrot Toward Practical 128-Bit General Purpose Microarchitectures . . . . . . . . . . . 81--84 Achilleas Tzenetopoulos and Dimosthenis Masouros and Dimitrios Soudris and Sotirios Xydis DVFaaS: Leveraging DVFS for FaaS Workflows . . . . . . . . . . . . . . . 85--88 Hwayong Nam and Seungmin Baek and Minbok Wi and Michael Jaemin Kim and Jaehyun Park and Chihun Song and Nam Sung Kim and Jung Ho Ahn X-ray: Discovering DRAM Internal Structure and Error Characteristics by Issuing Memory Commands . . . . . . . . 89--92 Ahmed Nematallah and Chang Hyun Park and David Black-Schaffer Exploring the Latency Sensitivity of Cache Replacement Policies . . . . . . . 93--96 Fernando Mosquera and Krishna Kavi and Gayatri Mehta and Lizy John Guard Cache: Creating Noisy Side-Channels . . . . . . . . . . . . . 97--100 Jason Mars and Yiping Kang and Roland Daynauth and Baichuan Li and Ashish Mahendra and Krisztian Flautner and Lingjia Tang The Jaseci Programming Paradigm and Runtime Stack: Building Scale-Out Production Applications Easy and Fast 101--104 Naorin Hossain and Alper Buyuktosunoglu and John-David Wellman and Pradip Bose and Margaret Martonosi SoCurity: a Design Approach for Enhancing SoC Security . . . . . . . . . 
105--108 Justin Feng and Fatemeh Arkannezhad and Christopher Ryu and Enoch Huang and Siddhant Gupta and Nader Sehatbakhsh Simulating Our Way to Safer Software: a Tale of Integrating Microarchitecture Simulation and Leakage Estimation Modeling . . . . . . . . . . . . . . . . 109--112 Jaewan Choi and Jaehyun Park and Kwanhee Kyung and Nam Sung Kim and Jung Ho Ahn Unleashing the Potential of PIM: Accelerating Large Batched Inference of Transformer-Based Generative Models . . 113--116 Yonghae Kim and Anurag Kar and Jaewon Lee and Jaekyu Lee and Hyesoon Kim Hardware-Assisted Code-Pointer Tagging for Forward-Edge Control-Flow Integrity 117--120 Gururaj Saileshwar and Moinuddin Qureshi The Mirage of Breaking MIRAGE: Analyzing the Modeling Pitfalls in Emerging Attacks on MIRAGE . . . . . . . . . . . 121--124 Yun-Chen Lo and Yu-Chih Tsai and Ren-Shuo Liu LV: Latency-Versatile Floating-Point Engine for High-Performance Deep Neural Networks . . . . . . . . . . . . . . . . 125--128 Maziar Goudarzi and Reza Azimi and Julian Humecki and Faizaan Rehman and Richard Zhang and Chirag Sethi and Tanishq Bomman and Yuqi Yang By-Software Branch Prediction in Loops 129--132 Yugyoung Yun and Eunhyeok Park Fast Performance Prediction for Efficient Distributed DNN Training . . . 133--136 Meng Wu and Mingyu Yan and Xiaocheng Yang and Wenming Li and Zhimin Zhang and Xiaochun Ye and Dongrui Fan Characterizing and Understanding Defense Methods for GNNs on GPUs . . . . . . . . 137--140 Pratyush Patel and Zibo Gong and Syeda Rizvi and Esha Choukse and Pulkit Misra and Thomas Anderson and Akshitha Sriraman Towards Improved Power Management in Cloud GPUs . . . . . . . . . . . . . . . 
141--144 Shiqing Zhang and Mahmood Naderan-Tahan and Magnus Jahre and Lieven Eeckhout Balancing Performance Against Cost and Sustainability in Multi-Chip-Module GPUs 145--148 Chanyoung Park and Chun-Yi Liu and Kyungtae Kang and Mahmut Kandemir and Wonil Choi Design of a High-Performance, High-Endurance Key-Value SSD for Large-Key Workloads . . . . . . . . . . 149--152 Jie Liu and Zhongyuan Zhao and Zijian Ding and Benjamin Brock and Hongbo Rong and Zhiru Zhang An Intermediate Language for General Sparse Format Customization . . . . . . 153--156 Seunghak Lee and Ki-Dong Kang and Gyeongseo Park and Nam Sung Kim and Daehoon Kim NoHammer: Preventing Row Hammer With Last-Level Cache Management . . . . . . 157--160 Pau Escofet and Anabel Ovide and Carmen G. Almudever and Eduard Alarcón and Sergi Abadal Hungarian Qubit Assignment for Optimized Mapping of Quantum Circuits on Multi-Core Architectures . . . . . . . . 161--164 Lingfei Lu and Yudi Qiu and Shiyan Yi and Yibo Fan A Flexible Embedding-Aware Near Memory Processing Architecture for Recommendation System . . . . . . . . . 165--168 Hailong Li and Jaewan Choi and Yongsuk Kwon and Jung Ho Ahn A Hardware-Friendly Tiled Singular-Value Decomposition-Based Matrix Multiplication for Transformer-Based Models . . . . . . . . . . . . . . . . . 169--172 Adam Hastings and Ryan Piersma and Simha Sethumadhavan Architectural Security Regulation . . . 173--176 Theodoros Trochatos and Chuanqi Xu and Sanjay Deshpande and Yao Lu and Yongshan Ding and Jakub Szefer A Quantum Computer Trusted Execution Environment . . . . . . . . . . . . . . 177--180 Peiyun Wu and Trung Le and Zhichun Zhu and Zhao Zhang Redundant Array of Independent Memory Devices . . . . . . . . . . . . . . . . 181--184 Jonathan Garcia-Mallen and Shuohao Ping and Alex Miralles-Cordal and Ian Martin and Mukund Ramakrishnan and Yipeng Huang Towards an Accelerator for Differential and Algebraic Equations Useful to Scientists . . . . . . . . . . . . . . . 185--188
João Vieira and Nuno Roma and Gabriel Falcao and Pedro Tomás gem5-accel: a Pre-RTL Simulation Toolchain for Accelerator Architecture Validation . . . . . . . . . . . . . . . 1--4 Atiyeh Gheibi-Fetrat and Negar Akbarzadeh and Shaahin Hessabi and Hamid Sarbazi-Azad Tulip: Turn-Free Low-Power Network-on-Chip . . . . . . . . . . . . 5--8 Yosuke Ueno and Yuna Tomida and Teruo Tanimoto and Masamitsu Tanaka and Yutaka Tabuchi and Koji Inoue and Hiroshi Nakamura Inter-Temperature Bandwidth Reduction in Cryogenic QAOA Machines . . . . . . . . 9--12 Hyeseong Kim and Yunjae Lee and Minsoo Rhu FPGA-Accelerated Data Preprocessing for Personalized Recommendation Systems . . 7--10 Christodoulos Peltekis and Vasileios Titopoulos and Chrysostomos Nicopoulos and Giorgos Dimitrakopoulos DeMM: a Decoupled Matrix Multiplication Engine Supporting Relaxed Structured Sparsity . . . . . . . . . . . . . . . . 17--20 Caden Corontzos and Eitan Frachtenberg Direct-Coding DNA With Multilevel Parallelism . . . . . . . . . . . . . . 21--24 Ramin Ayanzadeh and Moinuddin Qureshi Enhancing the Reach and Reliability of Quantum Annealers by Pruning Longer Chains . . . . . . . . . . . . . . . . . 25--28 Courtney Golden and Dan Ilan and Caroline Huang and Niansong Zhang and Zhiru Zhang and Christopher Batten Supporting a Virtual Vector Instruction Set on a Commercial Compute-in-SRAM Accelerator . . . . . . . . . . . . . . 29--32 Samuel Thomas and Kidus Workneh and Ange-Thierry Ishimwe and Zack McKevitt and Phaedra Curlin and R. Iris Bahar and Joseph Izraelevitz and Tamara Lehman Baobab Merkle Tree for Efficient Secure Memory . . . . . . . . . . . . . . . . . 33--36 Minsik Cho and Keivan A. Vahid and Qichen Fu and Saurabh Adya and Carlo C. Del Mundo and Mohammad Rastegari and Devang Naik and Peter Zatloukal eDKM: an Efficient and Accurate Train-Time Weight Clustering for Large Language Models . . . . . . . . . . . . 
37--40 Yang-Gon Kim and Yun-Ki Han and Jae-Kang Shin and Jun-Kyum Kim and Lee-Sup Kim Accelerating Deep Reinforcement Learning via Phase-Level Parallelism for Robotics Applications . . . . . . . . . . . . . . 41--44 Yuxin Yang and Xiaoming Chen and Yinhe Han JANM-IK: Jacobian Argumented Nelder--Mead Algorithm for Inverse Kinematics and its Hardware Acceleration 45--48 Mohammad Hafezan and Ehsan Atoofian Improving Energy-Efficiency of Capsule Networks on Modern GPUs . . . . . . . . 49--52 Mahita Nagabhiru and Gregory T. Byrd Achieving Forward Progress Guarantee in Small Hardware Transactions . . . . . . 53--56 Rui Ma and Jia-Ching Hsu and Ali Mansoorshahi and Joseph Garvey and Michael Kinsner and Deshanand Singh and Derek Chiou Primate: a Framework to Automatically Generate Soft Processors for Network Applications . . . . . . . . . . . . . . 57--60 Loïc France and Florent Bruguier and David Novo and Maria Mushtaq and Pascal Benoit Reducing the Silicon Area Overhead of Counter-Based Rowhammer Mitigations . . 61--64 L. Yavits DRAMA: Commodity DRAM Based Content Addressable Memory . . . . . . . . . . . 65--68 Deepanjali Mishra and Konstantinos Kanellopoulos and Ashish Panwar and Akshitha Sriraman and Vivek Seshadri and Onur Mutlu and Todd C. Mowry Address Scaling: Architectural Support for Fine-Grained Thread-Safe Metadata Management . . . . . . . . . . . . . . . 69--72 Changmin Shin and Taehee Kwon and Jaeyong Song and Jae Hyung Ju and Frank Liu and Yeonkyu Choi and Jinho Lee A Case for In-Memory Random Scatter--Gather for Fast Graph Processing . . . . . . . . . . . . . . . 73--77 Lieven Eeckhout R.I.P. Geomean Speedup Use Equal-Work (Or Equal-Time) Harmonic Mean Speedup Instead . . . . . . . . . . . . . . . . 78--82 Z. Jahshan and L. Yavits MajorK: Majority Based kmer Matching in Commodity DRAM . . . . . . . . . . . . . 
83--86 Shiyan Yi and Yudi Qiu and Lingfei Lu and Guohao Xu and Yong Gong and Xiaoyang Zeng and Yibo Fan GATe: Streamlining Memory Access and Communication to Accelerate Graph Attention Network With Near-Memory Processing . . . . . . . . . . . . . . . 87--90 Mrinmay Sasmal and Tresa Joseph and Bindiya T. S. Approximate Multiplier Design With LFSR-Based Stochastic Sequence Generators for Edge AI . . . . . . . . . 91--94 Varun Gohil and Sundar Dev and Gaurang Upasani and David Lo and Parthasarathy Ranganathan and Christina Delimitrou The Importance of Generalizability in Machine Learning for Systems . . . . . . 95--98 Nikhil Agarwal and Mitchell Fream and Souradip Ghosh and Brian C. Schwedock and Nathan Beckmann UDIR: Towards a Unified Compiler Framework for Reconfigurable Dataflow Architectures . . . . . . . . . . . . . 99--103 Kyriaki Tsantikidou and Nicolas Sklavos An Area Efficient Architecture of a Novel Chaotic System for High Randomness Security in e-Health . . . . . . . . . . 104--107 Yongmo Park and Subhankar Pal and Aporva Amarnath and Karthik Swaminathan and Wei D. Lu and Alper Buyuktosunoglu and Pradip Bose Dramaton: a Near-DRAM Accelerator for Large Number Theoretic Transforms . . . 108--111 Haocong Luo and Yahya Can Tuğrul and F. Nisa Bostancı and Ataberk Olgun and A. Giray Yağlıkçı and Onur Mutlu Ramulator 2.0: a Modern, Modular, and Extensible DRAM Simulator . . . . . . . 112--116 Hyungyo Kim and Gaohan Ye and Nachuan Wang and Amir Yazdanbakhsh and Nam Sung Kim Exploiting Intel Advanced Matrix Extensions (AMX) for Large Language Model Inference . . . . . . . . . . . . 117--120 Tianzheng Li and Enfang Cui and Yuting Wu and Qian Wei and Yue Gao TeleVM: a Lightweight Virtual Machine for RISC-V Architecture . . . . . . . . 121--124 Yingjie Qi and Jianlei Yang and Ao Zhou and Tong Qiao and Chunming Hu Architectural Implications of GNN Aggregation Programming Abstractions . . 125--128 Asif Ali Khan and Fazal Hameed and Taha Shahroodi and Alex K. 
Jones and Jeronimo Castrillon Efficient Memory Layout for Pre-Alignment Filtering of Long DNA Reads Using Racetrack Memory . . . . . . 129--132 Saurav Maji and Kyungmi Lee and Anantha P. Chandrakasan SparseLeakyNets: Classification Prediction Attack Over Sparsity-Aware Embedded Neural Networks Using Timing Side-Channel Information . . . . . . . . 133--136 Seyyed Hossein SeyyedAghaei Rezaei and Parham Zilouchian Moghaddam and Mehdi Modarressi Smart Memory: Deep Learning Acceleration in 3D-Stacked Memories . . . . . . . . . 137--141
Hossein Katebi and Navidreza Asadi and Maziar Goudarzi FullPack: Full Vector Utilization for Sub-Byte Quantized Matrix--Vector Multiplication on General Purpose CPUs 142--145 Erika S. Alcorta and Mahesh Madhav and Richard Afoakwa and Scott Tetrick and Neeraja J. Yadwadkar and Andreas Gerstlauer Characterizing Machine Learning-Based Runtime Prefetcher Selection . . . . . . 146--149 Andreas Kosmas Kakolyris and Dimosthenis Masouros and Sotirios Xydis and Dimitrios Soudris SLO-Aware GPU DVFS for Energy-Efficient LLM Inference Serving . . . . . . . . . 150--153 Dongho Yoon and Taehun Kim and Jae W. Lee and Minsoo Rhu A Quantitative Analysis of State Space Model-Based Large Language Model: Study of Hungry Hungry Hippos . . . . . . . . 154--157 Mohammadamin Ajdari and Behrang Montazerzohour and Kimia Abdi and Hossein Asadi Empirical Architectural Analysis on Performance Scalability of Petascale All-Flash Storage Systems . . . . . . . 158--161 Ali Mohammadpur-Fard and Sina Darabi and Hajar Falahati and Negin Mahani and Hamid Sarbazi-Azad Exploiting Direct Memory Operands in GPU Instructions . . . . . . . . . . . . . . 162--165 Pablo Andreu and Pedro Lopez and Carles Hernandez Hashing ATD Tags for Low-Overhead Safe Contention Monitoring . . . . . . . . . 166--169 Deniz Gurevin and Caiwen Ding and Omer Khan Exploiting Intrinsic Redundancies in Dynamic Graph Neural Networks for Processing Efficiency . . . . . . . . . 170--174 Reoma Matsuo and Toru Koizumi and Hidetsugu Irie and Shuichi Sakai and Ryota Shioya TURBULENCE: Complexity-Effective Out-of-Order Execution on GPU With Distance-Based ISA . . . . . . . . . . . 175--178 Dongjae Lee and Bongjoon Hyun and Taehun Kim and Minsoo Rhu Analysis of Data Transfer Bottlenecks in Commercial PIM Systems: a Study With UPMEM--PIM . . . . . . . . . . . . . . . 179--182 Seunghyuk Yu and Hyeonu Kim and Kyoungho Jeun and Sunyoung Hwang and Eojin Lee Architecting Compatible PIM Protocol for CPU--PIM Collaboration . . . . . . . . . 
183--186 Yazheng Tu and Pengzhou He and Chip-Hong Chang and Jiafeng Xie LTE: Lightweight and Time-Efficient Hardware Encoder for Post-Quantum Scheme HQC . . . . . . . . . . . . . . . . . . 187--190 Mohamed Hossam and Salah Hessien and Mohamed Hassan Octopus: a Cycle-Accurate Cache System Simulator . . . . . . . . . . . . . . . 191--194 Paresh Baidya and Rourab Paul and Swagata Mandal and Sumit Kumar Debnath Efficient Implementation of Knuth Yao Sampler on Reconfigurable Hardware . . . 195--198 Rui Xie and Asad Ul Haq and Linsen Ma and Krystal Sun and Sanchari Sen and Swagath Venkataramani and Liu Liu and Tong Zhang SmartQuant: CXL-Based AI Model Store in Support of Runtime Configurable Weight Quantization . . . . . . . . . . . . . . 199--202 Haeyoon Cho and Hyojun Son and Jungmin Choi and Byungil Koh and Minho Ha and John Kim Proactive Embedding on Cold Data for Deep Learning Recommendation Model Training . . . . . . . . . . . . . . . . 203--206 Hyesung Ji and Sangpyo Kim and Jaewan Choi and Jung Ho Ahn Accelerating Programmable Bootstrapping Targeting Contemporary GPU Microarchitecture . . . . . . . . . . . 207--210 Yuya Degawa and Shota Suzuki and Junichiro Kadomoto and Hidetsugu Irie and Shuichi Sakai Cycle-Oriented Dynamic Approximation: Architectural Framework to Meet Performance Requirements . . . . . . . . 211--214 Md Tareq Mahmud and Ke Wang A Flexible Hybrid Interconnection Design for High-Performance and Energy-Efficient Chiplet-Based Systems 215--218 Hyungkyu Ham and Wonhyuk Yang and Yunseon Shin and Okkyun Woo and Guseul Heo and Sangyeop Lee and Jongse Park and Gwangsun Kim ONNXim: a Fast, Cycle-Level Multi-Core NPU Simulator . . . . . . . . . . . . . 219--222 Shizhuo Zhu and Illia Shkirko and Jacob Levinson and Zhengrong Wang and Tony Nowatzki SPGPU: Spatially Programmed GPU . . . . 223--226 Eunyeong Cho and Jehyeon Bang and Minsoo Rhu Characterization and Analysis of Text-to-Image Diffusion Models . . . . . 
227--230 Farid Samandi and Natheesan Ratnasegar and Michael Ferdman A Case for Hardware Memoization in Server CPUs . . . . . . . . . . . . . . 231--234 Hanna Cha and Sungchul Lee and Yeonan Ha and Hanhwi Jang and Joonsung Kim and Youngsok Kim GCStack: a GPU Cycle Accounting Mechanism for Providing Accurate Insight Into GPU Performance . . . . . . . . . . 235--238 Hongtao Wang and Peiquan Jin ZoneBuffer: an Efficient Buffer Management Scheme for ZNS SSDs . . . . . 239--242 Samuel Coulon and Tianyou Bao and Jiafeng Xie SCALES: SCALable and Area-Efficient Systolic Accelerator for Ternary Polynomial Multiplication . . . . . . . 243--246 Navnil Choudhury and Chao Lu and Kanad Basu Quantum Assertion Scheme for Assuring Qudit Robustness . . . . . . . . . . . . 247--250
Haseung Bong and Nahyeon Kang and Youngsok Kim and Joonsung Kim and Hanhwi Jang IntervalSim++: Enhanced Interval Simulation for Unbalanced Processor Designs . . . . . . . . . . . . . . . . 1--4 Myoungjun Chun and Jaeyong Lee and Inhyuk Choi and Jisung Park and Myungsuk Kim and Jihong Kim Straw: a Stress-Aware WL-Based Read Reclaim Technique for High-Density NAND Flash-Based SSDs . . . . . . . . . . . . 5--8 Chaithanya Krishna Vadlamudi and Bahar Asgari Electra: Eliminating the Ineffectual Computations on Bitmap Compressed Matrices . . . . . . . . . . . . . . . . 9--12