Last update:
Tue Oct 1 07:12:31 MDT 2024
Caxton C. Foster A review of dynamic memories with enhanced data access by Harold S. Stone. IEEETC Vol. C-21, #4, p 359--386, April 1972 . . . . . . . . . . . . . . . . . . 3--7 M. Bataille Something old: the Gamma 60 the computer that was ahead of its time . . . . . . . 10--15 Caxton C. Foster Something new: the Intel MCS-4 micro computer set . . . . . . . . . . . . . . 16--17 J. A. N. Lee My next compiler . . . . . . . . . . . . 17--19 Michael J. Flynn and Mrs. Carol Rogers Computer architecture at Johns Hopkins 21--33
R. F. Vaughan and R. A. Collins On computer architecture, software portability & microprogramming . . . . . 14--15 James C. Brakefield An optimal floating point format . . . . 16--17 J. E. Brewer Recent doctoral dissertations of interest to SIGARCH . . . . . . . . . . 18--20
C. W. Bettcher Thread standardization and relative cost 9--9 Richard L. Sites Floating point significance interrupt proposal . . . . . . . . . . . . . . . . 10--12 Caxton Foster Computer architecture . . . . . . . . . 13--18
Louis S. Adler A mini-computer configuration for CAI: a systems engineering view . . . . . . . . 10--19 W. M. Gentleman and B. A. Wichmann Timing on computers . . . . . . . . . . 20--23 Karl Schank Architectural assistance to software debugging aids . . . . . . . . . . . . . 37--38
Dileep P. Bhandarkar and Samuel H. Fuller Markov chain models for analyzing memory interference in multiprocessor computer systems . . . . . . . . . . . . . . . . 1--6 George A. Anderson Interconnecting a distributed processor system for avionics . . . . . . . . . . 11--16 L. Rodney Goke and G. J. Lipovski Banyan networks for partitioning multiprocessor systems . . . . . . . . . 21--28 Harry F. Jordan and Burton J. Smith Structure of digital system description languages . . . . . . . . . . . . . . . 31--34 John A. N. Lee VDL---a definition system for all levels 41--48 Charles H. Radoy and George P. Copeland, Jr. and G. J. Lipovski A methodology for parallel processing design tradeoffs . . . . . . . . . . . . 51--56 S. F. Reddaway DAP---a distributed array processor . . 61--65 Peter M. Kogge Maximal rate pipelined solutions to recurrence problems . . . . . . . . . . 71--76 Tilak Agerwala and Mike Flynn Comments on capabilities, limitations and ``correctness'' of Petri nets . . . 81--86 Wayne E. Omohundro and James H. Tracey Flowware---a flow charting procedure to describe digital networks . . . . . . . 91--97 Mario R. Barbacci and Daniel P. Siewiorek Automated exploration of the design space for register transfer (RT) systems 101--106 T. A. Laliotis Implementation aspects of the symbol hardware compiler . . . . . . . . . . . 111--115 George P. Copeland, Jr. and G. J. Lipovski and Stanley Y. W. Su The architecture of CASSM: a cellular system for non-numeric processing . . . 121--128 John M. Hemphill and S. A. Szygenda Deriving design guidelines for diagnosable computer systems . . . . . . 131--135 Behrooz Parhami and Algirdas Avizienis Design of fault-tolerant associative processors . . . . . . . . . . . . . . . 141--145 M. A. Fischler and O. Firschein A fault tolerant multiprocessor architecture for real-time control applications . . . . . . . . . . . . . . 151--157 G. J. Lipovski A varistructured fail-soft cellular computer . . . . . . . . . . . . . . . . 
161--165 Jean Vaucher and Christian Rey A hardware laboratory for computer architecture research . . . . . . . . . 171--175 P. J. Knoke Simulation exercises for computer architecture education . . . . . . . . . 181--185 M. E. Sloan Computer architecture courses in electrical engineering departments . . . 191--195 R. Hartenstein Increasing hardware complexity---a challenge to computer architecture education . . . . . . . . . . . . . . . 201--206 George Rossmann Review of the Workshop on Computer Architecture Education . . . . . . . . . 211--214 Richard G. Cooper Micromodules: Microprogrammable building blocks for hardware development . . . . 221--226 S. H. Fuller and D. P. Siewiorek and R. J. Swan Computer Modules: an architecture for large digital modules . . . . . . . . . 231--237 Rodnay Zaks A microprogrammed architecture for front end processing . . . . . . . . . . . . . 241--246 Z. G. Vranesic and V. C. Hamacher and Y. Y. Leung Design of a fully variable-length structured minicomputer . . . . . . . . 251--255 Orin E. Marvel HAPPE: Honeywell Associative Parallel Processing Ensemble . . . . . . . . . . 261--267 Mario R. Schaffner A computer architecture and its programming language . . . . . . . . . . 271--277
John Shore Conjecture corner . . . . . . . . . . . 3--6 W. M. McKeeman Computer design evaluation using programming language primitives . . . . 7--18 Reiner W. Hartenstein Letter to membership from incoming chairman (CAN, Oct. 73) . . . . . . . . 19--22
David Stryker and David Weiss Secure system architecture . . . . . . . 37--38
Stephen Y. H. Su Book review of Logic and Logic Design by B. Girling and H. G. Morning. International Textbook Company Limited 1973 . . . . . . . . . . . . . . . . . . 2--3 John Shore Conjecture corner . . . . . . . . . . . 4--9
L. Nisnevich and E. Strasbourger Decentralized priority control in data communication . . . . . . . . . . . . . 1--6 Cecil C. Reames and Ming T. Liu A loop network for simultaneous transmission of variable-length messages 7--12 James F. Callan The architecture of the Picture System 13--16 John Staudhammer and Jeffrey F. Eastman and James N. England A fast display-oriented processor . . . 17--22 Jeffrey F. Eastman and John Staudhammer Computer display of colored three-dimensional objects . . . . . . . 23--27 Henry D. Kerr A microprogrammed processor for interactive computer graphics . . . . . 28--33 C. V. W. Armstrong Functional memory techniques applied to the microprogrammed control of an associative processor . . . . . . . . . 34--40 James F. Wade and Paul D. Stigall Instruction design to minimize program size . . . . . . . . . . . . . . . . . . 41--44 James O. Bondi and Paul D. Stigall HMO, a hardware microcode optimizer . . 45--51 A. M. Peskin The computer aided design of processor architectures . . . . . . . . . . . . . 51--55 W. H. Huen and D. P. Siewiorek Intermodule protocol for register transfer level modules: representation and analytic tools . . . . . . . . . . . 56--62 Portia Isaacson Picture systems, PS, and the design of a channel-to-channel computer interface 63--70 Lennart Löfgren Reference concepts in a tree structured address space . . . . . . . . . . . . . 71--79 Judith A. Anderson and G. J. Lipovski A virtual memory for microprocessors . . 80--84 R. E. Brundage and A. P. Batson The performance enhancement of descriptor-based virtual memory systems through the use of associative registers 85--90 Orin E. Marvel SPEAC: special purpose electronic area correlator . . . . . . . . . . . . . . . 91--94 James M. Satterfield Architectural advances of the space shuttle orbiter avionics computer system 95--98 Uno R. Kodres and William L. McCracken Design study of an avionics navigation microcomputer . . . . . . . . . . . . . 99--105 Gerald R. 
Kane An iteratively structured information processor . . . . . . . . . . . . . . . 106--112 H. Richards, Jr. and A. E. Oldehoeft Hardware-software interactions in SYMBOL-2R's operating system . . . . . . 113--118 Pierre Sylvain and Maniel Vineberg The design and evaluation of the array machine: a high-level language processor 119--125 Jack B. Dennis and David P. Misunas A preliminary architecture for a basic data-flow processor . . . . . . . . . . 126--132 K. J. Berkling Reduction languages for reduction machines . . . . . . . . . . . . . . . . 133--140 Willis K. King and Fulvio Carbonaro Output devices sharing by minicomputers 141--145 S. Rannem and V. C. Hamacher and S. G. Zaky and P. Connolly On relating small computer performance to design parameters . . . . . . . . . . 146--151 Harold W. Lawson, Jr. and Bengt Magnhagen Advantages of structured hardware . . . 152--158 Peter Kornerup Concepts of the MATHILDA system . . . . 159--164 Caxton C. Foster SOCRATES . . . . . . . . . . . . . . . . 165--169 Donald F. Wann and Robert A. Ellis Conjoined computer systems: an architecture for laboratory data processing and instrument control . . . 170--175 E. Douglas Jensen A distributed function computer for real-time control . . . . . . . . . . . 176--182 C. H. Radoy and G. J. Lipovski Switched multiple instruction, multiple data stream processing . . . . . . . . . 183--187 Robert J. Lechner Sequentially encoded data structures that support bidirectional scanning . . 188--194 Martin Freeman An instruction class for an extensible interpreter . . . . . . . . . . . . . . 195--200 W. K. Giloi and H. Berg STARLET: a computer concept based on ordered sets as primitive data types . . 201--206 R. G. Cornell and H. C. Torng A cellular general purpose computer . . 207--213 Barry C. Goldstein and Thomas W. Scrutchin A machine-oriented resource management architecture . . . . . . . . . . . . . . 214--219 M. E. Sloan A design-oriented computer engineering program . . . . . . . . . . . . 
. . . . 220--224 Janis Beitch Baron and D. E. Atkins An educational laboratory in contemporary digital design . . . . . . 225--231
W. R. Smith AADC computer family architecture program . . . . . . . . . . . . . . . . 4--8 Åmund Lunde More data on the O/W ratios: a note on a paper by Flynn . . . . . . . . . . . . . 9--13 G. Jack Lipovski and Stanley Y. W. Su On non-numeric architecture . . . . . . 14--29
Guy G. Boulaye Structured design for structured computer architecture . . . . . . . . . 8--17
D. L. Parnas Evaluation criteria for abstract machines with unknown applications . . . 2--9 William R. Smith AADC computer family architecture questions and answers . . . . . . . . . 15--21 Stephen Y. H. Su An introduction to CHDL (computer hardware description languages) . . . . 22--23 R. W. Doran The International Computers Ltd. ICL2900 computer architecture . . . . . . . . . 24--47
Gordon Bell and William D. Strecker Computer structures: What have we learned from the PDP-11? . . . . . . . . 1--14 Helmut Kerner and Werner Beyerle A PMS level language for performance evaluation modelling (V-PMS) . . . . . . 15--19 M. Moalla and G. Saucier and J. Sifakis and M. Zachariades A design tool for the multilevel description and simulation of systems of interconnected modules . . . . . . . . . 20--27 Jonathan Allen A course in computer structures . . . . 28--32 George E. Rossmann The IEEE Computer Society task force on computer architecture . . . . . . . . . 33--33 Lawrence C. Widdoes, Jr. The Minerva multi-microprocessor . . . . 34--39 R. G. Arnold and E. W. Page A hierarchical, restructurable multi-microprocessor architecture . . . 40--45 Robert McGill and John Steinhoff A multimicroprocessor approach to numerical analysis: An application to gaming problems . . . . . . . . . . . . 46--51 John E. Jensen and Jean-Loup Baer A model of interference in a shared resource multiprocessor . . . . . . . . 52--57 Clement K. C. Leung and David P. Misunas and Andrij Neczwid and Jack B. Dennis A computer simulation facility for packet communication architecture . . . 58--63 S. L. Rege Cost, performance and size tradeoffs for different levels in a memory hierarchy 64--67 Paul E. Dworak and Alice C. Parker An input interface for a real-time digital sound generation system . . . . 68--73 Michael C. Mulder and Patrick P. Fasang A microprocessor oriented data acquisition and control system for power system control . . . . . . . . . . . . . 74--78 H. M. Gladney and G. Hochweller Multiprogramming for real-time applications . . . . . . . . . . . . . . 79--85 Theodore H. Kehl Basil architecture --- an HLL minicomputer . . . . . . . . . . . . . . 86--92 Harold W. Lawson, Jr. Function distribution in computer system architectures . . . . . . . . . . . . . 93--97 Chris A. Vissers Interface, a dispersed architecture . . 98--104 A. Thomasian and A. 
Avizienis A design study of a shared resource computing system . . . . . . . . . . . . 105--112 W. S. Ford and V. C. Hamacher Hardware support for inter-process communication and processor sharing . . 113--118 Ulrich Trambacz and Georg Hyla A taxonomy of display processors . . . . 119--120 W. E. Kluge Traversing binary tree structures with shift register memories (recent results) 121.1--121.1 Eduardo B. Fernandez and Rita C. Summers and Charles D. Coleman Architectural support for system protection (recent results) . . . . . . 121.2--121.2 James W. Gault and Alice C. Parker The design of a user-programmable digital interface (recent results) . . . 121.3--121.3 Serge Fournier and Ming T. Liu System design of a grammar-programmable high-level language machine . . . . . . 122.4--122.4 Ch. Kuznia and R. Kober and H. Kopp SMS 101 --- a structured multi microprocessor system with deadlock-free operation scheme . . . . . . . . . . . . 122.5--122.5 Philip S. Liu and Frederic J. Mowle Selection schemes for dynamically microcoding Fortran programs . . . . . . 122.6--122.6 S. H. Fuller and D. P. Siewiorek and R. J. Swan The design of a multi-micro-computer system . . . . . . . . . . . . . . . . . 123--123 Cecil C. Reames and Ming T. Liu Design and simulation of the distributed loop computer network (DLCN) . . . . . . 124--129 Paolo Franchi Distribution of functions and control in RPCNET . . . . . . . . . . . . . . . . . 130--135 Larry D. Wittie Efficient message routing in Mega-Micro-Computer networks . . . . . . 136--140 Terry A. Welch An investigation of descriptor oriented architecture . . . . . . . . . . . . . . 141--146 E. A. Feustel Tagged architecture and the semantics of programming languages: Extensible types 147--150 A. P. Batson and R. E. Brundage and J. P. Kearns Design data for Algol-60 machines . . . 151--154 William D. Strecker Cache memories for PDP-11 family computers . . . . . . . . . . . . . . . 155--158 Janak H. Patel and Edward S. 
Davidson Improving the throughput of a pipeline by insertion of delays . . . . . . . . . 159--164 A. M. Abd-Alla and Laird H. Moffett On-line architecture tuning using microcapture . . . . . . . . . . . . . . 165--171 Leonard D. Healy A character-oriented context-addressed segment-sequential storage . . . . . . . 172--177 J. A. Bush and G. J. Lipovski and S. Y. W. Su and J. K. Watson and S. J. Ackerman Some implementations of segment sequential functions . . . . . . . . . . 178--185 Manlio DeMartinis and G. Jack Lipovski and Stanley Y. W. Su and J. K. Watson A Self Managing Secondary Memory system 186--194 Samuel H. Fuller Price/performance comparison of C.mmp and the PDP-10 . . . . . . . . . . . . . 195--202
Lars-Erik Thorelli Representation of arrays in computers 6--9 Helmut Berndt Evolutionary computer architecture: the Unidata 7.000 series . . . . . . . . . . 10--16 Jack B. Dennis Computer architecture and the cost of software . . . . . . . . . . . . . . . . 17--21 George Lindamood On navel contemplation and the art of computer maintenance . . . . . . . . . . 22--23
S. H. Fuller and G. A. Mathew Implementing microprogram storage with PLA's . . . . . . . . . . . . . . . . . 6--11 D. R. Hicks A generalized queue scheme for process synchronization and communication . . . 12--14 Glen G. Langdon Book reviews: Review of Introduction to Computer Architecture by Harold S. Stone . . . . . . . . . . . . . . . . . 17--19
Kenneth J. Thurber ARPS: a new real-time computer . . . . . 6--16 Alan B. Salisbury MCF: a military computer family for computer-based systems . . . . . . . . . 17--20 Frederic N. Ris A unified decimal floating-point architecture for the support of high-level languages . . . . . . . . . . 21--31 G. Jack Lipovski A question of style . . . . . . . . . . 32--38 G. Chroust Data interfaces versus control interfaces: a half-baked conjecture . . 39--40
Glen G. Langdon Considerations on the ``figure of merit'' technique for storage hierarchy design . . . . . . . . . . . . . . . . . 25--28 Edward F. Miller Book Reviews: Review of High-Level Language Computer Architecture by Yaohan Chu. Academic Press, New York, 1975 . . 29--29
Yaohan Chu Architecture of a hardware data interpreter . . . . . . . . . . . . . . 1--9 Subrata Dasgupta The design of some language constructs for horizontal microprogramming . . . . 10--16 E. Douglas Jensen and Richard Y. Kain The Honeywell Modular Microprogram Machine: M3 . . . . . . . . . . . . . . 17--28 Richard R. Ramseyer and Andries van Dam A multi-microprocessor implementation of a general purpose pipelined CPU . . . . 29--34 C. V. Ravi and Torben Moller A hierarchical microcomputer system for hardware and software development . . . 35--40 J. Archer Harris and David R. Smith Hierarchical multiprocessor organizations . . . . . . . . . . . . . 41--48 K. Hurakami and S. Nishikawa and M. Sato Poly-Processor System analysis and design . . . . . . . . . . . . . . . . . 49--56 Guy Mazare A few examples of how to use a symmetrical multi-micro-processor . . . 57--62 Peter M. Kogge The microprogramming of pipelined processors . . . . . . . . . . . . . . . 63--69 Howard Jay Siegel The universality of various types of SIMD machine interconnection networks 70--79 Ramakrishna B. Rau and George E. Rossmann The effect of instruction fetch strategies upon the performance of pipelined instruction units . . . . . . 80--89 S. R. Ahuja and J. R. Jump A modular memory scheme for array processing . . . . . . . . . . . . . . . 90--94 Leonard S. Haynes The architecture of an ALGOL 60 computer implemented with distributed processors 95--104 Herbert Sullivan and T. R. Bashkow A large scale, homogeneous, fully distributed parallel machine, I . . . . 105--117 Herbert Sullivan and Theodore R. Bashkow and David Klappholz A Large Scale, Homogeneous, Fully Distributed Parallel Machine, II . . . . 118--124 G. Jack Lipovski On virtual memories and micronetworks 125--134 Jon C. Strauss and Kenneth J. Thurber Considerations for new tactical computer systems . . . . . . . . . . . . . . . . 135--140 Kenneth J. Thurber and Peter C. Patton and Robert C. Deward and Jon C. Strauss and Thomas W. 
Petschauer An advanced tactical computer concept 141--146 Gary J. Nutt Microprocessor implementation of a parallel processor . . . . . . . . . . . 147--152 Paul Dworak and Alice C. Parker and Richard Blum The design and implementation of a real-time sound generation system . . . 153--158 A. C. Parker and A. W. Nagle Hardware/software tradeoffs in a variable word width, variable queue length buffer memory . . . . . . . . . . 159--164 Bernard L. Peuto and Leonard J. Shustek An instruction timing model of CPU performance . . . . . . . . . . . . . . 165--178 Cornelis H. Hoogendoorn Reduction of memory interference in multiprocessor systems . . . . . . . . . 179--183 D. W. Hammerstrom and E. S. Davidson Information content of CPU memory referencing behavior . . . . . . . . . . 184--192 Ming T. Liu and Cecil C. Reames Message communication protocol and operating system design for the Distributed Loop Computer Network (DLCN) 193--200 G. H. Poujoulat Architecture of the CORAIL building block system . . . . . . . . . . . . . . 201--204 H. L. Tredennick and T. A. Welch High-speed buffering for variable length operands . . . . . . . . . . . . . . . . 205--210
Rod Steel Another general purpose computer architecture . . . . . . . . . . . . . . 5--11 George E. Lindamood What's in a name? . . . . . . . . . . . 12--14 Conrad Schneiker The microprocessors of the future . . . 15--16 Edward F. Miller, Jr. Book review: Review of Large-Scale Computer Architecture: Parallel and Associative Processors by Kenneth J. Thurber, Hayden Book Company, Rochelle Park, New Jersey 1976 . . . . . . . . . 17--17
William M. Conner and Edward R. Dirling Input/Output considerations in look-ahead processing . . . . . . . . . 7--12 Robert F. Rosin The significance of microprogramming . . 14--19 Mario J. Gonzalez Book review: Review of Microprogramming Primer by Harry Katzan, Jr., McGraw-Hill 1977 . . . . . . . . . 29--30
Maniel Vineberg Implementation of character string pattern matching on a multiprocessor . . 1--7 R. M. Bird and J. C. Tu and R. M. Worthy Associative/parallel processors for searching very large textual data bases 8--9 G. J. Lipovski On imaginary fields, token transfers and floating codes in intelligent secondary memories . . . . . . . . . . . . . . . . 17--22 S. G. Zaky Microprocessors for non-numeric processing . . . . . . . . . . . . . . . 23--30 David K. Hsiao and Krishnamurthi Kannan The architecture of a database computer --- a summary . . . . . . . . . . . . . 31--33 Robert S. Rosenthal The data management machine, a classification . . . . . . . . . . . . . 35--39 Ken J. McDonell Trends in non-software support for input-output functions . . . . . . . . . 40--47 R. Cerretti and D. Jasilli and D. R. Matteucci Ulisse: An Italian project for a multifunctional terminal system . . . . 48--50 Olin H. Bray Data management requirements: The similarity of memory management, database systems, and message processing 68--76 Barry M. Landson and Robert G. Sargent A comparison of sequential and associative computing of priority queues . . . . . . 77--78
Glenford J. Myers The case against stack-oriented instruction sets . . . . . . . . . . . . 7--10 Andrew S. Tanenbaum Ambiguous machine architecture and program efficiency . . . . . . . . . . . 11--13 D. R. Hicks Microprogramming with a content-addressable read-only-memory . . 14--15 D. R. Hicks Multitasking as a program structuring primitive . . . . . . . . . . . . . . . 16--18
G. Chroust Book reviews: Review of Digital System Implementation by Gerrit A. Blaauw, Prentice Hall, Series in Automatic Computation 1976 . . . . . . . 27--28
R. A. Hagan and C. S. Wallace A virtual memory system for the Hewlett Packard 2100A . . . . . . . . . . . . . 5--13 Forest Baskett More on microprocessors of the future 14--17 Yaohan Chu Direct-execution computer architecture 18--23 Peter U. Schulthess and Eduard P. Mumprecht Reply to the case against stack-oriented instruction sets . . . . . . . . . . . . 24--27
John B. Mountain and Philip H. Enslow Application of the military computer family architecture selection criteria to the PR1ME P400 . . . . . . . . . . . 3--17 G. Jack Lipovski Just a few more words on microprocessors of the future . . . . . . . . . . . . . 18--21 J. L. Keedy On the use of stacks in the evaluation of expressions . . . . . . . . . . . . . 22--28 Andrew S. Tanenbaum Review of Processor Architecture by S. H. Lavington, NCC Publications, Manchester 1976 . . . . . . . . . . . . 31--31 A. E. Whiteside Book reviews: Review of The Architecture of Concurrent Programs by Per Brinch Hansen, Prentice-Hall 1977 32--32
Dileep P. Bhandarkar and J. Egil Juliussen Semiconductor technology: trends and implications . . . . . . . . . . . . . . 4--14 A. J. Payne A computer console design to help the operator . . . . . . . . . . . . . . . . 15--22 Daniel R. McGlynn Review of Content Addressable Parallel Processors by Caxton C. Foster. Van Nostrand Reinhold Co. 1976 . . . . . 23--23 C. V. Ramamoorthy Review of Structured Computer Organization by Andrew S. Tanenbaum, Prentice-Hall 1976 . . . . . . . . . . . 23--23 W. Buchholz Review of Computer System Architecture by M. Morris Mano, Prentice-Hall 1976 . . . . . . . . . . . 24--24 Z. G. Vranesic Book reviews: Review of Content Addressable Parallel Processors by Caxton C. Foster, Van Nostrand Reinhold Co. 1976 . . . . . . . . . . . . . . . . 24--24
R. R. Korfhage and W. H. E. Day and L. L. Beck and W. F. Appelbe Data physics: an unorthodox view of data and its implications in data processors 1--7 George P. Copeland String storage and searching for data base applications: implementation on the INDY backend kernel . . . . . . . . . . 8--17 Allen J. Otis and George P. Copeland Editing requirements for data base applications and their implementation on the INDY backend kernel . . . . . . . . 18--29 G. Jack Lipovski Semantic paging on intelligent discs . . 30--34 Rhon Williams A multiprocessing system for the direct execution of LISP . . . . . . . . . . . 35--41 R. M. Bird and J. B. Newsbaum and J. L. Trefftzs Text file inversion: an evaluation . . . 42--50 David C. Roberts A specialized computer architecture for text retrieval . . . . . . . . . . . . . 51--59 M. J. Stucki and J. R. Cox and G. C. Roman and P. N. Turcu Coordinating concurrent access in a distributed database architecture . . . 60--64 Mohamed G. Gouda A hierarchical controller for concurrent accessing of distributed databases . . . 65--70 Bezalel Gavish and Harvey Koch An extensible architecture for data flow processing . . . . . . . . . . . . . . . 71--76 J. B. Harvill Functional parallelism in an operand state saving computer . . . . . . . . . 77--84 J. S. Hutchison and W. G. Roman Madman machine . . . . . . . . . . . . . 85--90 Jayanta Banerjee and David K. Hsiao The use of a database machine for supporting relational databases . . . . 91--98 Paul J. Sadowski and S. A. Schuster Exploiting parallelism in a Relational Associative Processor . . . . . . . . . 99--109 Hsu Chang Bubbles for relational database . . . . 110--116 A. El Masri and J. Rohmer and D. Tusera A machine for information retrieval . . 117--120 Dante R. Matteucci A distributed structure for the automization of the Catalog of the National Cultural Heritage: experiences and proposals . . . . . . . . . . . . . 121--133
Kenneth J. Thurber Computer communication techniques . . . 7--16 Hal W. Jennings A variation on the PDP 11 . . . . . . . 17--26
Per Brinch Hansen Multiprocessor architectures for concurrent programs . . . . . . . . . . 4--23 J. L. Keedy On the evaluation of expressions using accumulators, stacks and store-to-store instructions . . . . . . . . . . . . . . 24--27 Rahul Chattergy In the current literature . . . . . . . 30--30
Harvey G. Cragon An evaluation of code space requirements and performance of various architectures 5--21 Kenneth J. Thurber and Harvey A. Freeman A bibliography of local computer network architectures . . . . . . . . . . . . . 22--27
Lyle A. Cox, Jr. The nature of ``computer architecture'' 8--12 Jan L. A. van de Snepscheut and Gert A. Slavenburg Introducing the notion of processes to hardware . . . . . . . . . . . . . . . . 13--23 D. E. Atkins Review of Advances in Computer Architecture by Glenford J. Myers. Wiley-Interscience Division of John Wiley and Sons 1978 . . . . . . . . . . 25--26 Kevin W. Bowyer Book review of The Structure of Computers and Computations: Volume One by David J. Kuck. John Wiley & Sons 1978 27--30
Randall Gibson and Paul Anderson Technical overview of the Renaissance Octobus system . . . . . . . . . . . . . 2--9 Johan W. Stevenson and Andrew S. Tanenbaum Efficient encoding of machine instructions . . . . . . . . . . . . . . 10--17 J. L. Keedy More on the use of stacks in the evaluation of expressions . . . . . . . 18--22 G. E. Quick Intelligent memory: ``a parallel processing concept'' . . . . . . . . . . 23--28
Ronald L. Rivest The BLIZZARD computer architecture . . . 2--10 J. L. Keedy A technique for passing reference parameters in an information-hiding architecture . . . . . . . . . . . . . . 11--15
Krishna M. Kavipurapu and Dennis J. Frailey Quantification of architectures using software science . . . . . . . . . . . . 2--6 Trevor Turton A proposed high-speed computer design 7--21 Computer Architecture News staff In the current literature . . . . . . . 22--22
Dana Richards On a ``Counter--Example'' . . . . . . . 2--3 Peter J. Denning Why not innovations in computer architecture? . . . . . . . . . . . . . 4--7 G. W. Gerrity Hardware detection of undefined references . . . . . . . . . . . . . . . 8--11 Peter J. Denning and T. Don Dennis On minimizing contention at semaphores 12--19
Jack B. Dennis and G. Andrew Boughton and Clement K. C. Leung Building blocks for data flow prototypes 1--8 Edward S. Davidson A multiple stream microprocessor prototype system: AMP-1 . . . . . . . . 9--16 F. Andre and J. P. Banâtre and H. Leroy and G. Paget and F. Ployette and J. P. Routeau KENSUR: An architecture oriented towards programming languages translation . . . 17--22 J. G. Kuhl and S. M. Reddy Distributed fault-tolerance for large multiprocessor systems . . . . . . . . . 23--30 Miroslaw Malek A comparison connection assignment for diagnosis of multiprocessor systems . . 31--36 K. E. Grosspietsch and J. Kaiser and E. Nett A concept for test and reconfiguration of a fault-tolerant VLSI processor system . . . . . . . . . . . . . . . . . 37--43 Jean-Paul Brassard and Jan Gecsei Path building in cellular partitioning networks . . . . . . . . . . . . . . . . 44--50 Robert J. McMillen and Howard Jay Siegel MIMD machine communication using the augmented data manipulator network . . . 51--60 John P. Shen and John P. Hayes Fault tolerance of a class of connecting networks . . . . . . . . . . . . . . . . 61--71 E. G. Coffman, Jr. and Kimming So On the comparison between single and multiple processor systems . . . . . . . 72--79 V. Carl Hamacher and Gerald S. Shedler Performance of a collision-free local bus network having asynchronous distributed control . . . . . . . . . . 80--87 W. M. Zuberek Timed Petri nets and preliminary performance evaluation . . . . . . . . . 88--96 David R. Ditzel and David A. Patterson Retrospective on high-level language computer architecture . . . . . . . . . 97--104 J. P. Sansonnet and M. Castan and C. Percebois M3L: a list-directed architecture . . . 105--112 Yasushi Hibino A Practical Parallel Garbage Collection Algorithm and Its Implementation . . . . 113--120 Philip C. Treleaven and Geoffrey F. Mole A multi-processor reduction machine for user-defined reduction languages . . . . 121--130 Jeffrey M. 
Tobias A single user multiprocessor incorporating processor manipulation facilities . . . . . . . . . . . . . . . 131--138 Robert H. Halstead, Jr. and Stephen A. Ward The MuNet: a scalable decentralized architecture for parallel computation 139--145 Butler W. Lampson and Kenneth A. Pier A processor for a high-performance personal computer . . . . . . . . . . . 146--160 D. B. G. Edwards and A. E. Knowles and J. V. Woods MU6-G: a new design to achieve mainframe performance from a mini-sized computer 161--167 Kenneth E. Batcher Architecture of a massively parallel processor . . . . . . . . . . . . . . . 168--173 John Palmer The Intel 8087 numeric data processor 174--181 Robert H. Kuhn Efficient mapping of algorithms to single-stage interconnections . . . . . 182--189 David Nassimi and Sartaj Sahni A self routing Benes network . . . . . . 190--195 H. von Issendorff and W. Grünewald An adaptable network for functional distributed systems . . . . . . . . . . 196--201 Mokhtar Boshra Riad A combination of field and current access techniques for efficient and cost-effective bubble memories . . . . . 202--210 K. S. Trivedi Designing linear storage hierarchies so as to maximize reliability subject to cost and performance constraints . . . . 211--217 Sudhir R. Ahuja and Charles S. Roberts An associative/parallel processor for partial match retrieval using superimposed codes . . . . . . . . . . . 218--227 M. D. Ruggiero and S. G. Zaky A microprocessor-based virtual memory system . . . . . . . . . . . . . . . . . 228--235 Anand Jagannathan A technique for the architectural implementation of software subsystems 236--244 Viktors Berstis Security and protection of data in the IBM System/38 . . . . . . . . . . . . . 245--252 Miguel García Hoffmann Hardware implementation of communication protocols: a formal approach . . . . . . 253--263 P. Guillier and D. Slosberg An architecture with comprehensive facilities of inter-process synchronization and communication . . . 
264--270 Robert M. Lougheed and David L. McCubbrey The cytocomputer: a practical pipelined image processor . . . . . . . . . . . . 271--277 C. Halatsis and A. van Dam and J. Joosten and M. Letheren Architectural considerations for a microprogrammable emulating engine using bit-slices . . . . . . . . . . . . . . . 278--291 Mary Jane Irwin and Don Heller Online pipeline systems for recursive numeric computations . . . . . . . . . . 292--299 M. J. Foster and H. T. Kung Design of special-purpose VLSI chips: Example and opinions . . . . . . . . . . 300--307 Anshul Kumar and P. C. P. Bhatt A structured language for CAD of digital systems . . . . . . . . . . . . . . . . 308--316 Uwe Hercksen and Rainer Klar and Wolfgang Kleinöder Hardware-measurements of storage access conflicts in the processor array EGPA(1) 317--324 Mario Tokoro and Kiichiro Tamaru and Masaaki Mizuno and Masao Hori A high level multi-lingual multiprocessor KMP/II . . . . . . . . . 325--333
Ken Aupperle A real innovation in computer architecture . . . . . . . . . . . . . . 6--7 John R. Galloway, Jr. Architectural innovation round: round #3 8--10 John A. Sharp Some thoughts on data flow architectures 11--21 Mary Payne and Dileep Bhandarkar VAX floating point: a solid foundation for numerical computation . . . . . . . 22--33 Lloyd Dickman Treasurer's report . . . . . . . . . . . 37--38 Computer Architecture News staff Current literature: abstracts of articles of interest... . . . . . . . 48--48
Julian Davies Clock architecture and management . . . 3--6 G. Chroust and J. R. Mühlbacher Rivalling multiprocessor organization: a hardware/speed trade-off . . . . . . . . 7--10 David Stevenson A report on the proposed IEEE Floating Point Standard (IEEE Task p754) . . . . 11--12
Justin Rattner and George Cox Object-based computer architecture . . . 4--11 G. J. Myers and B. R. S. Buckingham A hardware implementation of capability-based addressing . . . . . . 12--24 David A. Patterson and David R. Ditzel The case for the reduced instruction set computer . . . . . . . . . . . . . . . . 25--33 Douglas W. Clark and William D. Strecker Comments on ``The Case for the Reduced Instruction Set Computer,'' by Patterson and Ditzel . . . . . . . . . . . . . . . 34--38 James C. Brakefield Is 32 bits of address too much? . . . . 39--40 James C. Brakefield The peripheral bus . . . . . . . . . . . 41--43 Trevor Mudge Book reviews: Review of The Structure of Computers and Computation, Vol. I by David J. Kuck, John Wiley & Sons 1978 . . . . . . . . . . . . . . . 44--45 Computer Architecture News Staff Current literature: abstracts of articles of interest... . . . . . . . 46--46
Karl Reed The way forward in computer architecture research . . . . . . . . . . . . . . . . 3--7 John Gilmore Suggested enhancements to the Motorola MC68000 . . . . . . . . . . . . . . . . 8--14 John F. Wakerly Pascal extensions for describing computer instruction sets . . . . . . . 15--23 Krishna M. Kavi Semantics of an algorithm . . . . . . . 24--26 Philip C. Treleaven VLSI: machine architecture and very high level languages . . . . . . . . . . . . 27--38
Lloyd Dickman SIGARCH business . . . . . . . . . . . . 7--8
Martin L. De Prycker A new index mode for the VAX-11 . . . . 10--11 David Stevenson The Phoenix Project . . . . . . . . . . 12--15 E. M. J. C. Van Oost Multi-processor system description and simulation using structured multi-programming languages . . . . . . 16--32 John Wakerly Book review: Review of 'The Computers that Saved Metropolis, by DC Comics and Radio Shack', July 1980 . . . . . . . . 33--34
Arvind and V. Kathail A Multiple Processor Data Flow Machine that Supports Generalized Procedures . . ??
G. W. Gerrity On processes and interrupts . . . . . . 4--14 Dwight D. Hill A hardware mechanism for supporting range checks . . . . . . . . . . . . . . 15--21 Vladimir S. Cherniavsky The computing memory another distributed computer architecture . . . . . . . . . 22--24 James E. Thornton 8th Annual Symposium on Computer Architecture: Heterogeneous Computer Architecture . . . . . . . . . . . . . . 25--33 Computer Architecture News Staff Errata for two publications . . . . . . 34--34
Donald C. Lindsay Cache memory for microprocessors . . . . 6--13 Krishna M. Kavi Innovative architectures and commercial computers: a summary of the panel discussion at NCC 1981 . . . . . . . . . 14--16 R. M. Jenevein and ?. DeGroot and G. Jack Lipovski Errata: ``A hardware support mechanism for scheduling resources in parallel machine environment'': (from Proceedings of the 8th Annual Symposium on Computer Architecture, p. 57) . . . . . . . . . . 17--17
C. K. Yuen Extending the power of short-wordlength processors by means of context-dependent machine instructions . . . . . . . . . . 9--15 Allan Gottlieb and Clyde P. Kruskal Coordinating parallel processors: a partial unification . . . . . . . . . . 16--24 Anonymous Errata: Structured machine design: an ongoing experiment . . . . . . . . . . . 25--25
Charlie McDowell Protection at the micromachine level . . 4--8 Edward A. Feustel Protected procedure call on the PRIME(TM) machines . . . . . . . . . . . 9--22 Hossam El-Halabi and Dharma P. Agrawal Some remarks on direct execution computers . . . . . . . . . . . . . . . 23--27 Daniel T. Fitzpatrick and John K. Foderaro and Manolis G. H. Katevenis and Howard A. Landman and David A. Patterson and James B. Peek and Zvi Peshkess and Carlo H. Séquin and Robert W. Sherburne and Korbin S. Van Dyke A RISCy approach to VLSI . . . . . . . . 28--32
Justin Rattner Hardware/software cooperation in the iAPX-432 . . . . . . . . . . . . . . . . 1--1 John Hennessy and Norman Jouppi and Forest Baskett and Thomas Gross and John Gill Hardware/software tradeoffs for increased performance . . . . . . . . . 2--11 James W. Rymarczyk Coding guidelines for pipelined processors . . . . . . . . . . . . . . . 12--19 Richard K. Johnsson and John D. Wick An overview of the mesa processor architecture . . . . . . . . . . . . . . 20--29 Alan D. Berenbaum and Michael W. Condry and Priscilla M. Lu The operating system and language support features of the BELLMAC(TM)-32 microprocessor . . . . . . . . . . . . . 30--38 George Radin The 801 minicomputer . . . . . . . . . . 39--47 David R. Ditzel and H. R. McLellan Register allocation for free: The C machine stack cache . . . . . . . . . . 48--56 Samuel P. Harbison An architectural alternative to optimizing compilers . . . . . . . . . . 57--65 Butler W. Lampson Fast procedure calls . . . . . . . . . . 66--76 Douglas W. Jones Systematic protection mechanism design 77--80 Karl Reed On a general property of memory mapping tables . . . . . . . . . . . . . . . . . 81--86 Robert P. Cook and Nitin Donde An experiment to improve operand addressing . . . . . . . . . . . . . . . 87--91 Akira Fusaoka and Masaharu Hirayama Compiler chip: a hardware implementation of compiler . . . . . . . . . . . . . . 92--95 B. R. Rau and C. D. Glaeser and E. M. Greenawalt Architectural support for the efficient generation of code for horizontal architectures . . . . . . . . . . . . . 96--99 R. E. McLear and D. M. Scheibelhut and E. Tammaru Guidelines for creating a debuggable processor . . . . . . . . . . . . . . . 100--106 M. V. Wilkes Hardware support for memory protection: Capability implementations . . . . . . . 107--116 Fred J. Pollack and George W. Cox and Dan W. Hammerstrom and Kevin C. Kahn and Konrad K. Lai and Justin R. Rattner Supporting Ada memory management in the iAPX-432 . . . . . . . . . . . . . . .
. 117--131 J. P. Sansonnet and M. Castan and C. Percebois and D. Botella and J. Perez Direct execution of Lisp on a list-directed architecture . . . . . . . 132--139 Mark Scott Johnson Some requirements for architectural support of software debugging . . . . . 140--148 C. A. Middelburg The effect of the PDP-11 architecture on code generation for chill . . . . . . . 149--157 Richard E. Sweet and James G. Sandman, Jr. Empirical analysis of the mesa instruction set . . . . . . . . . . . . 158--166 Gene McDaniel An analysis of a mesa instruction set using dynamic instruction frequencies 167--176 Cheryl A. Wiecek A case study of VAX-11 instruction set usage for compiler execution . . . . . . 177--184 Mamoru Maekawa and Ken Sakamura and Chiaki Ishikawa Firmware structure and architectural support for monitors, vertical migration and user microprogramming . . . . . . . 185--194 N. Kamibayashi and H. Ogawana and K. Nagayama and H. Aiso Heart: an operating system nucleus machine implemented by firmware . . . . 195--204 Sudhir R. Ahuja and Abhaya Asthana A multi-microprocessor architecture with hardware support for communication and scheduling . . . . . . . . . . . . . . . 205--209
David A. Patterson and Richard S. Piepho RISC assessment: a high-level language experiment . . . . . . . . . . . . . . . 3--8 Douglas W. Clark and Henry M. Levy Measurement and analysis of instruction use in the VAX-11/780 . . . . . . . . . 9--17 Krishna Kavi and Boumediene Belkhouche and Evelyn Bullard and Lois Delcambre and Stephen Nemecek HLL architectures: Pitfalls and predilections . . . . . . . . . . . . . 18--23 Allan Gottlieb and Ralph Grishman and Clyde P. Kruskal and Kevin P. McAuliffe and Larry Rudolph and Marc Snir The NYU Ultracomputer---designing a MIMD, shared-memory parallel machine (extended abstract) . . . . . . . . . . 27--42 King-Hang Chu and King-Sun Fu VLSI architectures for high speed recognition of context-free languages and finite-state languages . . . . . . . 43--49 Mark A. Franklin and Donald F. Wann Asynchronous and clocked control structures for VLSI based interconnection networks . . . . . . . . 50--59 Robert J. McMillen and Howard Jay Siegel Performance and fault tolerance improvements in the Inverse Augmented Data Manipulator network . . . . . . . . 63--72 D. S. Parker and C. S. Raghavendra The Gamma network: a multiprocessor interconnection network with redundant paths . . . . . . . . . . . . . . . . . 73--80 R. M. Jenevein and J. C. Browne A control processor for a reconfigurable array computer . . . . . . . . . . . . . 81--89 Laxmi N. Bhuyan and Dharma P. Agrawal A general class of processor interconnection strategies . . . . . . . 90--98 F. J. Burkowski Instruction set design issues relating to a static dataflow computer . . . . . 101--111 James E. Smith Decoupled access/execute computer architectures . . . . . . . . . . . . . 112--119 L. J. Caluwaerts and J. Debacker and J. A. Peperstraete A data flow architecture with a paged memory system . . . . . . . . . . . . . 120--127 B. Ramakrishna Rau and Christopher D. Glaeser and Raymond L. 
Picard Efficient code generation for horizontal architectures: Compiler techniques and architectural support . . . . . . . . . 131--139 Gene C. Barton Sentry: a novel hardware implementation of classic operating system mechanisms 140--147 M. Abramovici and Y. H. Levendel and P. R. Menon A logic simulation machine . . . . . . . 148--157 Subrata Dasgupta and Marius Olafsson Towards a family of languages for the design and implementation of machine architectures . . . . . . . . . . . . . 158--167 Yann-Hang Lee and Kang G. Shin Rollback propagation detection and performance evaluation of FTMR2M---a fault-tolerant multiprocessor . . . . . 171--180 Woei Lin and Chuan-lin Wu Design of a $ 2 \times 2 $ fault-tolerant switching element . . . . 181--189 Donald Fussell and Peter Varman Fault-tolerant wafer-scale architectures for VLSI . . . . . . . . . . . . . . . . 190--198 Sakti Pramanik Database filters . . . . . . . . . . . . 201--210 Mario Tokoro and Takashi Takizuka On the semantic structure of information --- a proposal of the abstract storage architecture . . . . . . . . . . . . . . 211--217 Yasunori Dohi and Akira Suzuki and Noriyuki Matsui Hardware sorter and its application to data base machine . . . . . . . . . . . 218--225 Philip C. Treleaven and Richard P. Hopkins A recursive computer architecture for VLSI . . . . . . . . . . . . . . . . . . 229--238 M. Castan and E. I. Organick $ \mu $3L: an HLL-RISC processor for parallel execution of FP-language programs . . . . . . . . . . . . . . . . 239--247 F. Hommes The heap/substitution concept --- an implementation of functional operations on data structures for a reduction machine . . . . . . . . . . . . . . . . 248--256 Paul F. Reynolds, Jr. A shared resource algorithm for distributed simulation . . . . . . . . . 259--266 Bijendra N. Jain Duplication of packets and their detection in X.25 communication protocols . . . . . . . . . . . . . . . 
267--273 Pauline Markenscoff A multiple processor system for real time control tasks . . . . . . . . . . . 274--280 Leslie Jill Miller A heterogeneous multiprocessor design and the distributed scheduling of its task group workload . . . . . . . . . . 283--290 George H. Goble and Michael H. Marsh A dual processor VAX 11/780 . . . . . . 291--298 Michel Dubois and Fayé A. Briggs Effects of cache coherency in multiprocessors . . . . . . . . . . . . 299--308 T. N. Mudge and B. A. Makrucki Probabilistic analysis of a crossbar switch . . . . . . . . . . . . . . . . . 311--320 Steven P. Levitan and Caxton C. Foster Finding an extremum in a network . . . . 321--325 U. V. Premkumar and J. C. Browne Resource allocation in rectangular SW banyans . . . . . . . . . . . . . . . . 326--333 Anonymous List of authors . . . . . . . . . . . . 335--335
Alastair J. W. Mayer The architecture of the Burroughs B5000: 20 years later and still ahead of the times? . . . . . . . . . . . . . . . . . 3--10 James C. Brakefield From the other side of the Atlantic: how to improve upon the MU5 design . . . . . 11--16 Paul M. Hansen and Mark A. Linton and Robert N. Mayo and Marguerite Murphy and David A. Patterson A performance evaluation of the Intel iAPX 432 . . . . . . . . . . . . . . . . 17--26 Miquel Huguet The protection of the processor status word of the PDP-11/60 . . . . . . . . . 27--30 James Brakefield Just what is an op-code?: or a universal computer design . . . . . . . . . . . . 31--34
J. D. Knott and T. W. Crockett Fair dynamic arbitration for a multiprocessor communications bus . . . 4--9 James R. Larus A comparison of microcode, assembly code, and high-level languages on the VAX-11 and RISC I . . . . . . . . . . . 10--15 David A. Patterson A performance evaluation of the Intel 80286 . . . . . . . . . . . . . . . . . 16--18 Rod Egan The effect of VLSI on computer architecture . . . . . . . . . . . . . . 19--22 Thomas Benzie Book reviews: Review of Microcomputer Architecture and Programming by John F. Wakerly, John Wiley & Sons, Inc., 1981 . . . . . . . . 23--23
Henry M. Levy and Douglas W. Clark On the use of benchmarks for measuring system performance . . . . . . . . . . . 5--8 Peter Schulthess and Fritz Vonaesch OPA: a new architecture for Pascal-like languages . . . . . . . . . . . . . . . 9--20 James C. Brakefield Talk on interpreters . . . . . . . . . . 21--28 D. W. Doran Main frame computer trends . . . . . . . 29--44
Daniel Gajski and David Kuck and Duncan Lawrie and Ahmed Sameh CEDAR: a large scale multiprocessor . . 7--11 Elaine French and Hugh Glaser TUKI: a data flow processor . . . . . . 12--18 Nenad Marovac A systematic approach to the design and implementation of a computer instruction set . . . . . . . . . . . . . . . . . . 19--24 Harvey Cragon Executable instruction set specification 25--43 Robert P. Colwell and Charles Y. Hitchcock and E. Douglas Jensen Peering through the RISC/CISC fog: an outline of research . . . . . . . . . . 44--50 G. W. Gorsline Review of Advances in Computer Architecture by Glenford J. Myers, John Wiley & Sons, Inc. 1982 . . . . . . . . . 55--55 M. W. Sachs Book reviews: Review of Microcomputer Interfacing by G. Jack Lipovski, Lexington Books 1980 . . . . . 55--55
David Abramson and John Rosenberg Hardware support for program debuggers in a paged virtual memory . . . . . . . 8--19 Dennis J. Frailey Word length of a computer architecture definitions and applications . . . . . . 20--26 Lee A. Hollaar Book reviews: Review of Computer Design by Glen G. Langdon, Computeach Press . . . . . . . . . . . . . . . . . 27--28
Maurice V. Wilkes Size, power, and speed (keynote address) 2--4 W. K. Giloi Towards a taxonomy of computer architecture based on the machine data type view . . . . . . . . . . . . . . . 6--15 Algirdas Avižienis Framework for a taxonomy of fault-tolerance attributes in computer systems . . . . . . . . . . . . . . . . 16--21 Björn Pehrson and Joachim Parrow Caddie an interactive design environment 24--31 Subrata Dasgupta On the verification of computer architectures using an architecture description language . . . . . . . . . . 32--38 Richard M. King Research on synthesis of concurrent computing systems (extended abstract) 39--46 Allan L. Fisher and H. T. Kung and Louis M. Monier and Yasunori Dohi Architecture of the PSC---a programmable systolic chip . . . . . . . . . . . . . 48--53 Allan L. Fisher and H. T. Kung Synchronizing large VLSI processor arrays . . . . . . . . . . . . . . . . . 54--58 Robert A. Wagner The Boolean Vector Machine [BVM] . . . . 59--66 M. A. Bonuccelli and E. Lodi and F. Luccio and P. Maestrini and L. Pagli A VLSI tree machine for relational data bases . . . . . . . . . . . . . . . . . 67--73 L. J. Caluwaerts and J. Debacker and J. A. Peperstraete Implementing streams on a data flow computer system with paged memory . . . 76--83 Joseph E. Requa The Piecewise Data Flow architecture control flow and register management . . 84--89 Mario Tokoro and J. R. Jagannathan and Hideki Sunahara On the working set concept for data-flow machines . . . . . . . . . . . . . . . . 90--97 R. W. Marczyński and J. Milewski A data driven system based on a microprogrammed processor module . . . . 98--106 David A. Patterson and Phil Garrison and Mark Hill and Dimitris Lioupis and Chris Nyberg and Tim Sippel and Korbin Van Dyke Architecture of a VLSI instruction cache for a RISC . . . . . . . . . . . . . . . 108--116 Phil C. C. Yeh and Janak H. Patel and Edward S. Davidson Performance of shared cache for parallel-pipelined computer systems . . 117--123 James R. 
Goodman Using cache memory to reduce processor-memory traffic . . . . . . . . 124--131 James E. Smith and James R. Goodman A study of instruction cache organizations and replacement policies 132--137 Joseph A. Fisher Very Long Instruction Word architectures and the ELI-512 . . . . . . . . . . . . 140--150 Shinji Tomita and Kiyoshi Shibayama and Toshiaki Kitamura and Toshiyuki Nakata and Hiroshi Hagiwara A user-microprogrammable, local host computer with low-level parallelism . . 151--157 Richard H. Gumpertz Combining tags with error codes . . . . 160--165 Young Gil Park and Jung Wan Cho Fault diagnosis of bit-slice processor 166--172 M. A. Fiol and I. Alegre and J. L. A. Yebra Line digraph iterations and the (d,k) problem for directed graphs . . . . . . 174--177 Eli Opper and Miroslaw Malek and G. Jack Lipovski Resource allocation in rectangular CC-banyans . . . . . . . . . . . . . . . 178--184 František Soviš Uniform theory of the shuffle-exchange type permutation networks . . . . . . . 185--191 Vason P. Srini and Jorge F. Asenjo Analysis of Cray-1S architecture . . . . 194--206 Harry F. Jordan Performance measurements on HEP --- a pipelined MIMD computer . . . . . . . . 207--212 Hideharu Amano and Takaichi Yoshida and Hideo Aiso (SM)2-Sparse Matrix Solving Machine . . 213--220 R. Kalyana Krishnan and A. K. Rajasekar and C. S. Moghe An experimental system for Computer Science instruction . . . . . . . . . . 222--227 Klaus Kronlöf Execution control and memory management of a Data Flow Signal Processor . . . . 230--235 Masasuke Kishi and Hiroshi Yasuhara and Yasusuke Kawamura DDDP---a Distributed Data Driven Processor . . . . . . . . . . . . . . . 236--242 Naohisa Takahashi and Makoto Amamiya A data flow processor array system: Design and analysis . . . . . . . . . . 243--250 Kenneth A. Pier A retrospective on the Dorado, a high-performance personal computer . . . 252--269 Robert J. 
Dugan System/370 extended architecture: a program view of the channel subsystem 270--276 Richard L. Norton and Jacob A. Abraham Adaptive interpretation as a means of exploiting complex instruction sets . . 277--282 Manoj Kumar and Daniel M. Dias and J. R. Jump Switching strategies in a class of packet switching networks . . . . . . . 284--300 Benjamin W. Wah A comparative study of distributed resource sharing on multiprocessors . . 301--308 W. Kent Fuchs and Jacob A. Abraham and Kuang-Hua Huang Concurrent error detection in VLSI interconnection networks . . . . . . . . 309--315 W. K. Giloi and P. Behr Hierarchical function distribution --- a design principle for advanced multicomputer architectures . . . . . . 318--325 Luigi Stringa EMMA-an industrial experience on large multiprocessing architectures . . . . . 326--333 Lars Philipson and Bo Nilsson and Bjorn Breidegard A communication structure for a multiprocessor computer with distributed global memory . . . . . . . . . . . . . 334--340 Hiromu Hayashi and Akira Hattori and Haruo Akimoto ALPHA---a high-performance LISP machine equipped with a new stack structure and garbage collection system . . . . . . . 342--348 Shinji Umeyama and Koichiro Tamura A parallel execution model of logic programs . . . . . . . . . . . . . . . . 349--355 Claudia Schmittgen and Werner Kluge A system architecture for the concurrent evaluation of applicative program expressions . . . . . . . . . . . . . . 356--362 Yoshinori Yamaguchi and Kenji Toda and Toshitsugu Yuba A performance evaluation of a Lisp-based data-driven machine (EM-3) . . . . . . . 363--369 Steven L. Tanimoto A pyramidal approach to parallel processing . . . . . . . . . . . . . . . 372--378 Gérard Gaillat The design of a parallel processor for image processing on-board satellites: an application oriented approach . . . . . 
379--386 Hitoshi Nishimura and Hiroshi Ohno and Toru Kawata and Isao Shirakawa and Koichi Omura Links-1 --- a parallel pipelined multimicrocomputer system for image creation . . . . . . . . . . . . . . . . 387--394 T. Ericsson and P. E. Danielsson LIPP --- a SIMD multiprocessor architecture for image processing . . . 395--400 Philip C. Treleaven The new generation of computer architecture . . . . . . . . . . . . . . 402--409 Shunichi Uchida Inference machine: From sequential to parallel . . . . . . . . . . . . . . . . 410--416 Tohru Moto-oka Overview to the Fifth Generation Computer System project . . . . . . . . 417--422 Kunio Murakami and Takeo Kakuta and Nobuyoshi Miyazaki and Shigeki Shibayama and Haruo Yokota A relational data base machine: First step to knowledge base machine . . . . . 423--425 Arvind and Robert A. Iannucci A critique of multiprocessing von Neumann style . . . . . . . . . . . . . 426--436
Dwight D. Hill An analysis of C machine support for other block-structured languages . . . . 6--16 Nenad Marovac On interprocess interaction in distributed architectures . . . . . . . 17--22 Robert J. Schalkoff Towards an efficient, dedicated architecture for a Digital Geometric Image Transformer (DGIT) . . . . . . . . 23--29 Arieh Plotkin and Daniel Tabak A Tree Structured Architecture for semantic gap reduction . . . . . . . . . 30--44
Maurice V. Wilkes Keeping jump instructions out of the pipeline of a RISC-like computer . . . . 5--7 Jeremy Jones Puzzling with microcode . . . . . . . . 8--12 Wayne Amsbury A code-splitting algorithm . . . . . . . 13--21 Jack J. Dongarra Performance of various computers using standard linear equations software in a Fortran environment . . . . . . . . . . 22--27 M. R. Bhujade On the design of Always Compatible Instruction Set Architecture (ACISA) . . 28--30
J. L. Heath Re-evaluation of the RISC I . . . . . . 3--10 David A. Patterson RISC watch . . . . . . . . . . . . . . . 11--19 Michael Beeler Beyond the Baskett benchmark . . . . . . 20--31 Edward A. Feustel Process exchange on the PR1ME family of computers . . . . . . . . . . . . . . . 32--43 P. M. Fenwick Addressing operations for automatic data structure accessing . . . . . . . . . . 44--57 C. K. Yuen Some applications of the implicit register reference . . . . . . . . . . . 58--63 Krishna M. Kavi and K. Krishnamohan Architecture quality . . . . . . . . . . 64--72
Dharma P. Agrawal and Winser E. Alexander B-HIVE: a heterogeneous, interconnected, versatile and expandable multicomputer system . . . . . . . . . . . . . . . . . 7--13
F. J. Burkowski A vector and array multiprocessor extension of the sylvan architecture . . 4--11 Alejandro Kapauan and J. Timothy Field and Dennis B. Gannon and Lawrence Snyder The Pringle parallel computer . . . . . 12--20 Mehrad Yasrebi and G. J. Lipovski A state-of-the-art SIMD two-dimensional FFT array processor . . . . . . . . . . 21--27 Y. W. Ma and R. Krishnamurti The architecture of Replica: a special-purpose computer system for active multi-sensory perception of $3$-dimensional objects . . . . . . . . 30--37 Samuel M. Goldwasser A generalized object display processor architecture . . . . . . . . . . . . . . 38--47 Katsura Kawakami and Shigeo Shimazaki A special purpose LSI processor using the DDA algorithm for image transformation . . . . . . . . . . . . . 48--54 Benjamin W. Wah and Guo-Jie Li and Chee-Fen Yu The status of MANIP --- a multicomputer architecture for solving, combinatorial extremum-search problems . . . . . . . . 56--63 R. Gonzalez-Rubio and J. Rohmer and D. Terral The SCHUSS filter: a processor for non-numerical data processing . . . . . 64--73 Carl Ebeling and Andrew Palay The design and implementation of a VLSI chess move generator . . . . . . . . . . 74--80 Manjai Lee and Chuan-lin Wu Performance analysis of circuit switching, baseline interconnection networks . . . . . . . . . . . . . . . . 82--90 Clyde P. Kruskal and Marc Snir The importance of being square . . . . . 91--98 Chi-Yuan Chin and Kai Hwang Connection principles for multipath, packet switching networks . . . . . . . 99--108 Shlomo Weiss and James E. Smith Instruction issue logic for pipelined supercomputers . . . . . . . . . . . . . 110--118 Robert G. Wedig and Marc A. Rose The reduction of branch instruction execution overhead using structured control flow . . . . . . . . . . . . . . 119--125 Utpal Banerjee and Daniel D. Gajski Fast execution of loops with if statements . . . . . . . . . . . . . . . 
126--132 Daniel Gajski and Won Kim and Shinya Fushimi A parallel pipelined relational query processor: an architectural overview . . 134--141 Arun K. Somani and Vinod K. Agarwal An efficient VLSI dictionary machine . . 142--150 Allan L. Fisher Dictionary machines with a small number of processors . . . . . . . . . . . . . 151--156 Mark D. Hill and Alan Jay Smith Experimental evaluation of on-chip microprocessor cache memories . . . . . 158--166 James R. Goodman and Men-chow Chiang The use of static column RAM as a memory hierarchy . . . . . . . . . . . . . . . 167--173 I. J. Haikala Cache hit ratios with geometric task switch intervals . . . . . . . . . . . . 175--175 Yutaka Ishikawa and Mario Tokoro The design of an object oriented architecture . . . . . . . . . . . . . . 178--187 David Ungar and Ricki Blau and Peter Foley and Dain Samples and David Patterson Architecture of SOAR: Smalltalk on a RISC . . . . . . . . . . . . . . . . . . 188--197 Pradip Bose and Edward S. Davidson Design of instruction set architectures for support of high-level languages . . 198--206 Patrice Quinton Automatic synthesis of systolic arrays from uniform recurrent equations . . . . 208--214 Chang nian Zhang and David Y. Y. Yun Multi-dimensional systolic networks, for Discrete Fourier Transform . . . . . . . 215--222 J. A. B. Fortes and D. I. Moldovan Data broadcasting in linearly scheduled array processors . . . . . . . . . . . . 224--231 I. V. Ramakrishnan and P. J. Varman Modular matrix multiplication on a linear array . . . . . . . . . . . . . . 232--238 T. R. N. Rao Joint encryption and error correction schemes . . . . . . . . . . . . . . . . 240--241 Bella Bose Unidirectional error correction/detection for VLSI memory . . 242--244 C. L. Chen Error-correcting codes for semiconductor memories . . . . . . . . . . . . . . . . 245--247 Khaled Abdel Ghaffar and Robert J. McEliece Soft error correction for increased densities in VLSI memories . . . . . . . 248--250 Richard M. 
King and Robert A. Wagner Combining speed with alpha-particle induced memory, error tolerance in a large Boolean vector machine . . . . . . 251--253 Laxmi N. Bhuyan On the performance of loosely coupled multiprocessors . . . . . . . . . . . . 256--262 Ravi Mehrotra and Sarosh N. Talukdar Scheduling of tasks for distributed processors . . . . . . . . . . . . . . . 263--270 Krishna M. Kavi and Edward W. Banios and Bruce D. Shriver Message repository definitional facility: an architectural model for interprocess communication . . . . . . . 271--278 Prithviraj Banerjee and Jacob A. Abraham Fault-secure algorithms for multiple-processor systems . . . . . . . 279--287 Lubomir Bic Execution of logic programs on a dataflow architecture . . . . . . . . . 290--296 W. G. Rudd and Duncan A. Buell and Donald M. Chiarulli A high performance factoring machine . . 297--300 Joel S. Emer and Douglas W. Clark A characterization of processor performance in the VAX-11/780 . . . . . 301--310 W. D. Moeller and G. Sandweg The peripheral processor PP4, a highly regular VLSI processor . . . . . . . . . 312--318 Lars Philipson VLSI based design principles for MIMD multiprocessor computers with distributed memory management . . . . . 319--327 M. R. Samatham and D. K. Pradhan A multiprocessor network suitable for single-chip VLSI implementation . . . . 328--339 Larry Rudolph and Zary Segall Dynamic decentralized cache schemes for MIMD parallel processors . . . . . . . . 340--347 Mark S. Papamarcos and Janak H. Patel A low-overhead coherence solution for multiprocessors with private cache memories . . . . . . . . . . . . . . . . 348--354 James Archibald and Jean Loup Baer An economical solution to the cache coherence problem . . . . . . . . . . . 355--362 Ilkka J. Haikala Cache hit ratios with geometric task switch intervals . . . . . . . . . . . . 364--371
Gilman D. Chesley A wafer microcomputer . . . . . . . . . 4--6 Howard Jay Siegel and Thomas Schwederski and Nathaniel J. Davis IV and James T. Kuehn PASM: a reconfigurable parallel system for image processing . . . . . . . . . . 7--19
Javaid Aslam Methodology for designing a computer architecture . . . . . . . . . . . . . . 4--11 Peter C. J. Graham Providing architectural support for expert systems . . . . . . . . . . . . . 12--18
Jack J. Dongarra Performance of various computers using standard linear equations software in a Fortran environment . . . . . . . . . . 3--11 T. M. Hor and C. K. Yuen The design and programming of a powerful short wordlength processor using context-dependent machine instructions 12--26 E. N. Miya Multiprocessor/distributed processing bibliography (in machine-readable form) 27--29
Weiming Hu Dataflow architecture for EEG patient monitor . . . . . . . . . . . . . . . . 3--10 A. G. Tagg Speculations on the evolution of an architecture . . . . . . . . . . . . . . 11--18 Brian Randell Hardware/software tradeoffs: a general design principle? . . . . . . . . . . . 19--21
V. K. Prasanna Kumar and C. S. Raghavendra Array processor with multiple broadcasting . . . . . . . . . . . . . . 2--10 G. Wolf and J. R. Jump Matrix multiplication in an interleaved array processing architecture . . . . . 11--17 J. R. Goodman and Jian-tu Hsieh and Koujuch Liou and Andrew R. Pleszkun and P. B. Schechter and Honesty C. Young PIPE: a VLSI decoupled architecture . . 20--27 Peter Y. T. Hsu and Joseph T. Rahmeh and Edward S. Davidson and Jacob A. Abraham TIDBITS: speedup via time-delay bit-slicing in ALU design for VLSI technology . . . . . . . . . . . . . . . 29--35 James E. Smith and Andrew R. Pleszkun Implementation of precise interrupts in pipelined processors . . . . . . . . . . 36--44 Herb Schwetman and Daniel Gajski and Dennis Gannon and Daniel Hills and Jacob Schwartz and James Browne Classification of parallel processor architectures (invited tutorial session) 45--45 Makoto Hasegawa and Yoshiharu Shigei High-speed top-of-stack scheme for VLSI processor: a management algorithm and its analysis . . . . . . . . . . . . . . 48--54 Charles Y. Hitchcock III and H. M. Brinkley Sprunt Analyzing multiple register sets . . . . 55--63 Alan Jay Smith Cache evaluation and the impact of workload choice . . . . . . . . . . . . 64--73 David A. Moon Architecture of the Symbolics 3600 . . . 76--83 Ashwin Ram and Janak H. Patel Parallel garbage collection without synchronization overhead . . . . . . . . 84--90 Gurindar S. Sohi and Edward S. Davidson and Janak H. Patel An efficient LISP-execution architecture with a new representation for list structures . . . . . . . . . . . . . . . 91--98 Hideharu Amano and Taisuke Boku and Tomohiro Kudoh and Hideo Aiso (SM)2-II: a new version of the sparse matrix solving machine . . . . . . . . . 100--107 John Beetem and Monty Denneau and Don Weingarten The GF11 supercomputer . . . . . . . . . 108--115 Bradley Warren Smith and Howard Jay Siegel Models for use in the design of macro-pipelined parallel processors . . 
116--123 Jan Edler and Allan Gottlieb and Clyde P. Kruskal and Kevin P. McAuliffe and Larry Rudolph and Marc Snir and Patricia J. Teller and James Wilson Issues related to MIMD shared-memory computers: the NYU Ultracomputer approach . . . . . . . . . . . . . . . . 126--135 R. N. Ibbett and P. C. Capon and N. P. Topham MU6V: a parallel vector processing system . . . . . . . . . . . . . . . . . 136--144 Stephen F. Lundstrom A decentralized control, highly concurrent multiprocessor . . . . . . . 145--151 William J. Dally and James T. Kajiya An object oriented architecture . . . . 154--161 Edward F. Gehringer and J. Leslie Keedy Tagged architecture: how compelling are its advantages? . . . . . . . . . . . . 162--170 S. Nanba and N. Ohno and H. Kubo and H. Morisue and T. Ohshima and H. Yamagishi VM/4: ACOS-4 virtual machine architecture . . . . . . . . . . . . . . 171--178 T. P. Dobry and A. M. Despain and Y. N. Patt Performance studies of a Prolog machine architecture . . . . . . . . . . . . . . 180--190 Ryosei Nakazaki and Akihiko Konagaya and Shin'ichi Habata and Hideo Shimazu and Mamoru Umemutra and Masahiro Yamamoto and Minoru Yokota and Takashi Chikayama Design of a high-speed Prolog machine (HPM) . . . . . . . . . . . . . . . . . 191--197 Nam Sung Woo A hardware unification unit: design and analysis . . . . . . . . . . . . . . . . 198--205 Nicholas Matelan The FLEX/32 multicomputer . . . . . . . 209--213 J. Rattner Commercial multiprocessors (title only) 214--214 Dick Naedel Closely coupled asynchronous hierarchical and parallel processing in an open architecture . . . . . . . . . . 215--220 Jim Savage Parallel processing as a language design problem . . . . . . . . . . . . . . . . 221--224 David P. Rodgers Improvements in multiprocessor system design . . . . . . . . . . . . . . . . . 225--231 Peter B. Mark The Sequoia computer: a fault-tolerant tightly-coupled multiprocessor architecture . . . . . . . . . . . . . . 
232--232 Elliot Nestle and Armond Inselberg The SYNAPSE N+1 System: architectural characteristics and performance data of a tightly-coupled multiprocessor system 233--239 Robert W. Horst and Timothy C. K. Chou An architecture for high volume transaction processing . . . . . . . . . 240--245 Harold Stone and Eric Manning and Harriet Rigas and Philip Treleaven The fifth generation computer systems projects (invited session) . . . . . . . 247--247 Shigeo Kamiya and Susumu Matsuda and Kazuhide Iwata and Shigeki Shibayama and Hiroshi Sakai and Kunio Murakami A hardware pipeline algorithm for relational database operation . . . . . 250--257 Dik Lun Lee A distributed multiple-response resolver for value-order retrieval . . . . . . . 258--265 John Feo and Roy Jenevein and J. C. Browne Dynamic, distributed resource configuration on SW-banyans . . . . . . 268--275 R. H. Katz and S. J. Eggers and D. A. Wood and C. L. Perkins and R. G. Sheldon Implementing a cache consistency protocol . . . . . . . . . . . . . . . . 276--283 Zhiyuan Li and Walid Abu-Sufah A technique for reducing synchronization overhead in large scale multiprocessors 284--291 Colin Whitby-Strevens The transputer . . . . . . . . . . . . . 292--300 A. R. Hurson and B. Shirazi A systolic multiplier unit and its VLSI design . . . . . . . . . . . . . . . . . 302--309 Rami Melhem A language for the simulation of systolic architectures . . . . . . . . . 310--314 Henry Y. H. Chuang and Guo He A versatile systolic array for matrix computations . . . . . . . . . . . . . . 315--322 Rex Vedder and Dennis Finn The Hughes Data Flow Multiprocessor: architecture for efficient signal and data processing . . . . . . . . . . . . 324--332 Kenneth R. Traub An abstract parallel graph reduction machine . . . . . . . . . . . . . . . . 333--341 Bruno R. Preiss and V. C. Hamacher Data flow on a queue machine . . . . . . 342--351 J. L. Gaudiot Methods for handling structures in data-flow systems . . . . . . . . . . . 352--358 M. R. 
Samatham and D. K. Pradhan The de Bruijn multiprocessor network: a versatile sorting network . . . . . . . 360--367 Nian-Feng Tzeng and Pen-Chung Yew and Chun-Qi Zhu A fault-tolerant scheme for multistage interconnection networks . . . . . . . . 368--375 V. P. Kumar and S. M. Reddy Design and analysis of fault-tolerant multistage interconnection networks with low link complexity . . . . . . . . . . 376--386 Nathaniel J. Davis IV and Howard Jay Siegel The performance analysis of partitioned circuit switched multistage interconnection networks . . . . . . . . 387--394 Dalibor Vrsalovic and Edward F. Gehringer and Zary Z. Segall and Daniel P. Siewiorek The influence of parallel decomposition strategies on the performance of multiprocessor systems . . . . . . . . . 396--405 Walid Abu-Sufah and Alex Y. Kwok Performance prediction tools for Cedar: a multiprocessor supercomputer . . . . . 406--413 José M. Llabería Griñó and Mateo Valero Cortés and Enrique Herrada Lillo and Jesús Labarta Mancho Analysis and simulation of multiplexed single-bus networks with and without buffering . . . . . . . . . . . . . . . 414--421 J. Sanguinetti and B. Kumar Performance of a message-based multiprocessor . . . . . . . . . . . . . 424--425
J.-Fr. Hake PDOC --- a database on parallel processing literature . . . . . . . . . 2--7 Mark Rockey The dataflow architecture: a suitable base for the implementation of expert systems . . . . . . . . . . . . . . . . 8--14 Harvey G. Cragon An architecture design system . . . . . 15--21 Miquel Huguet and Tomás Lang A reduced register file for RISC architectures . . . . . . . . . . . . . 22--31
Cedell A. Alexander and William M. Keshlear and Faye Briggs Translation buffer performance in a UNIX environment . . . . . . . . . . . . . . 2--14 Rosanna Lee On ``hot spot'' contention . . . . . . . 15--20
Nam Sung Woo and Richard O'Keefe A comment on ``A hardware unification unit: design and analysis'' . . . . . . 2--3 A. B. Ruighaver Design aspects of the Delft Parallel Processor DPP84 and its programming system . . . . . . . . . . . . . . . . . 4--8 Dan Hammerstrom and David Maier and Shreekant Thakkar The Cognitive Architecture Project . . . 9--21 Alan Jay Smith Bibliography and reading on CPU cache memories and related topics . . . . . . 22--42
H. Yokota and H. Itoh A model and an architecture for a relational knowledge base . . . . . . . 2--9 M. Amamiya and M. Takesue and R. Hasegawa and H. Mikami Implementation and evaluation of a list-processing-oriented data flow machine . . . . . . . . . . . . . . . . 10--19 K. Takahashi and H. Yamada and H. Nagai and K. Matsumi A new string search hardware architecture for VLSI . . . . . . . . . 20--27 A. Gupta and C. Forgy and A. Newell and R. Wedig Parallel algorithms and architectures for rule-based systems . . . . . . . . . 28--37 R. R. Halstead, Jr. and T. L. Anderson and R. B. Osborne and T. L. Sterling Concert: design of a multiprocessor development system . . . . . . . . . . . 40--48 H. T. Kung Memory requirements for balanced computer architectures . . . . . . . . . 49--54 Y. C. Hong and T. H. Payne and L. B. O. Ferguson Graph allocation in static dataflow systems . . . . . . . . . . . . . . . . 55--64 P. Agrawal and R. Agrawal Software implementation of a recursive fault tolerance algorithm on a network of computers . . . . . . . . . . . . . . 65--72 T. Nojiri and S. Kawasaki and K. Sakoda Microprogrammable processor for object-oriented architecture . . . . . . 74--81 S. S. Thakkar and W. E. Hostmann An instruction fetch unit for a graph reduction machine . . . . . . . . . . . 82--91 E. F. Gehringer and R. P. Colwell Fast object-oriented procedure calls: lessons from the Intel 432 . . . . . . . 92--101 D. M. Dias and B. R. Iyer and P. S. Yu On coupling many small systems for transaction processing . . . . . . . . . 104--110 M. I. Malkawi and J. H. Patel Performance measurement of paging behavior in multiprogramming systems . . 111--118 A. Agarwal and R. L. Sites and M. Horowitz ATUM: a new technique for capturing address traces using microcode . . . . . 119--127 M. J. Wise Experimenting with EPILOG: some results and preliminary conclusions . . . . . . 119--127 Y. Shobatake and H. 
Aiso A unification processor based on a uniformly structured cellular hardware 128--139 N. Ito and M. Sato and E. Kuno and K. Rokusawa The architecture and preliminary evaluation results of the experimental parallel inference machine PIM-D . . . . 149--156 A. Seznec An efficient routing control for the SIGMA network $ \Sigma (4) $ . . . . . . 158--168 J. D. Nicoud and K. Skala REYSM, a high performance, low power multi-processor bus . . . . . . . . . . 169--174 K. Y. Lee and W. Hegazy The extra stage gamma network . . . . . 175--182 M. Yuhara and A. Hattori and M. Niwa and M. Kishimoto and H. Hayashi Evaluation of the FACOM ALPHA Lisp machine . . . . . . . . . . . . . . . . 184--190 A. R. Pleszkun and M. J. Thazhuthaveetil An architecture for efficient Lisp list access . . . . . . . . . . . . . . . . . 191--198 T. Nakata and N. Koike A functional level simulation engine of MAN-YO: a special purpose parallel machine for logic design automation . . 202--208 E. H. Frank Exploiting parallelism in a switch-level simulation machine . . . . . . . . . . . 209--215 T. S. Anantharaman and R. Bisiani A hardware accelerator for speech recognition algorithms . . . . . . . . . 216--223 T. Shimada and K. Hiraki and K. Nishida and S. Sekiguchi Evaluation of a prototype data flow processor of the SIGMA-1 for scientific computations . . . . . . . . . . . . . . 226--234 J. Sargeant and C. C. Kirkham Stored data structures on the Manchester dataflow machine . . . . . . . . . . . . 235--242 K. Kawakami and J. R. Gurd A scalable dataflow structure store . . 243--250 M. Hasegawa and Y. Shigei $ A T^2 = O(N \log^4 N), T = O(\log N) $ Fast Fourier Transform in a light connected $3$-dimensional VLSI . . . . . 252--260 K. Sapiecha and R. Jarocki Modular architecture for high performance implementation of FFT algorithm . . . . . . . . . . . . . . . 261--270 J. J. Navarro and J. M. Llaberia and M. Valero Computing size-independent matrix problems on systolic array processors 271--278 S. 
Tomita and K. Shibayama and T. Nakata and S. Yuasa and H. Hagiwara A computer with low-level parallelism QA-2: its applications to $3$-D graphics and Prolog/Lisp machines . . . . . . . . 280--289 M. Hirayama VLSI oriented asynchronous architecture 290--296 W. Hwu and Y. N. Patt HPSm, a high performance restricted data flow architecture having minimal functionality . . . . . . . . . . . . . 297--306 K. Onaga and T. Takechi On design of rotary array communication and wavefront-driven algorithms for solving large-scale band-limited matrix equations . . . . . . . . . . . . . . . 308--315 L. M. Napolitano, Jr. A computer architecture for dynamic finite element analysis . . . . . . . . 316--323 D. T. Harper III and J. R. Jump Performance evaluation of vector accesses in parallel memories using a skewed storage scheme . . . . . . . . . 324--328 T. Kondo and T. Tsuchiya and T. Kitamura and Y. Sugiyama and T. Kimura Pseudo MIMD array processor---AAP2 . . . 330--337 A. L. Fisher Scan line array processors for image computation . . . . . . . . . . . . . . 338--345 M. Annaratone and E. Arnould and T. Gross and H. T. Kung and M. S. Lam Warp architecture and implementation . . 346--356 D. A. Wood and S. J. Eggers and G. Gibson and M. D. Hill and J. M. Pendleton An in-cache address translation mechanism . . . . . . . . . . . . . . . 358--365 D. R. Cheriton and G. A. Slavenburg and P. D. Boyle Software-controlled caches in the VMP multiprocessor . . . . . . . . . . . . . 366--374 J. R. Goodman and W. C. Hsu On the use of registers vs. cache to minimize memory traffic . . . . . . . . 375--383 P. Y. T. Hsu and E. S. Davidson Highly concurrent scalar processing . . 386--395 S. McFarling and J. Hennessy Reducing the cost of branches . . . . . 396--403 S. R. Kunkel and J. E. Smith Optimal pipelining in supercomputers . . 404--411 P. Sweazey and A. J. Smith A class of compatible cache consistency protocols and their support by the IEEE Futurebus . . . . . . . . . . . . . . . 414--423 P. 
Bitar and A. M. Despain Multiprocessor cache synchronization: issues, innovations, evolution . . . . . 424--433 M. Dubois and C. Scheurich and F. Briggs Memory access buffering in multiprocessors . . . . . . . . . . . . 434--442 G. S. Taylor and P. N. Hilfinger and J. R. Larus and D. A. Patterson and B. G. Zorn Evaluation of the SPUR Lisp architecture 444--452
Nam Sung Woo A reply to comments ``A Comment on 'A Hardware Unification Unit: Design and Analysis' '' . . . . . . . . . . . . . 2--4 D. K. DuBose and D. K. Fotakis and D. Tabak A microcoded RISC . . . . . . . . . . . 5--16 Tomás Lang and Miquel Huguet Reduced register saving/restoring in single-window register files . . . . . . 17--26 Larry O'Neal Rouse The twisted double helix: a minimum distance architecture for 5th generation computing . . . . . . . . . . . . . . . 27--33 David M. Harland A recursively microcodable tagged architecture . . . . . . . . . . . . . . 34--40 Cedell Alexander and William Keshlear and Furrokh Cooper and Faye Briggs Cache memory performance in a Unix environment . . . . . . . . . . . . . . 41--61
Roger Stokes Traces for hardware verification . . . . 7--14 Claudio Kirner and Eduardo Marques Design of a distributed system support based on a centralized parallel bus . . 15--26 Mary Jane Irwin Secretary/Treasurer's Report . . . . . . 28--28
David M. Harland and Bruno Beloff Microcoding an object-oriented instruction set . . . . . . . . . . . . 3--12 William Stallings An annotated bibliography on reduced instruction set computers . . . . . . . 13--19
Robert H. Halstead, Jr. Overview of Concert MultiLisp: a multiprocessor symbolic computing system 5--14 Dave Patterson A progress report on SPUR: February 1, 1987 . . . . . . . . . . . . . . . . . . 15--21 A. Despain and Y. Patt and V. Srini and P. Bitar and W. Bush and C. Chien and W. Citrin and B. Fagin and W. Hwu and S. Melvin and R. McGeer and A. Singhal and M. Shebanow and P. Van Roy Aquarius . . . . . . . . . . . . . . . . 22--34 Madhur Kohli and Mark E. Giuliano and Jack Minker An overview of the PRISM project . . . . 35--42 M. V. Hermenegildo and R. A. Warren Designing a high performance parallel logic programming system . . . . . . . . 43--52 Jonathan W. Mills Coming to grips with a RISC: a report of the progress of the LOW RISC design group . . . . . . . . . . . . . . . . . 53--62 Brian Short Use of instruction set simulators to evaluate the LOW RISC . . . . . . . . . 63--67 Kurt M. Gutzmann Optimal dimension of hypercubes for sorting . . . . . . . . . . . . . . . . 68--72 Gilman Chesley Addressable WSI: a non-redundant approach . . . . . . . . . . . . . . . . 73--80 Nripendra N. Biswas and S. Srinivas and Trishala Dharanendra A centrally controlled shuffle network for reconfigurable and fault-tolerant architecture . . . . . . . . . . . . . . 81--87
D. R. Ditzel and H. R. McLellan Branch folding in the CRISP microprocessor: reducing branch delay to zero . . . . . . . . . . . . . . . . . . 2--8 J. A. DeRosa and H. M. Levy An evaluation of branch architectures 10--16 W. W. Hwu and Y. N. Patt Checkpoint repair for out-of-order execution machines . . . . . . . . . . . 18--26 G. S. Sohi and S. Vajapeyam Instruction issue logic for high-performance, interruptible pipelined processors . . . . . . . . . . 27--34 J. Swensen and Y. Patt Fast temporary storage for serial and parallel execution . . . . . . . . . . . 35--43 K. Wong and M. A. Franklin Performance analysis and design of a logic simulation machine . . . . . . . . 46--55 K. Doshi and P. Varman A modular systolic architecture for image convolutions . . . . . . . . . . . 56--63 S. Fujita and R. Aibara and M. Yamashita and T. Ae A template matching algorithm using optically-connected $3$-D VLSI architecture . . . . . . . . . . . . . . 64--70 B. Mendelson and G. M. Silberman Mapping data flow programs on a VLSI array of processors . . . . . . . . . . 72--80 D. Ghosal and L. N. Bhuyan Analytical modeling and architectural modifications of a dataflow computer . . 81--89 M. Takesue A unified resource management and execution control mechanism for data flow machines . . . . . . . . . . . . . 90--97 S. Abe and T. Bandoh and S. Yamaguchi and K. Kurosawa and K. Kiriyama High performance integrated Prolog processor IPP . . . . . . . . . . . . . 100--107 B. S. Fagin and A. M. Despain Performance studies of a parallel Prolog architecture . . . . . . . . . . . . . . 108--116 P. L. Civera and F. Maddaleno and G. L. Piccinini and M. Zamboni An experimental VLSI Prolog interpreter: preliminary measurements and results . . 117--126 O. Ridoux Deterministic and stochastic modeling of parallel garbage collection: towards real-time criteria . . . . . . . . . . . 128--136 C. Sun and Y. Tsu The sharing of environment in AND--OR-parallel execution of logic programs . . . . . . . . . . 
. . . . . . 137--144 A. Guha and R. Ramnarayan and M. Derstine Architectural issues in designing symbolic processors in optics . . . . . 145--151 A. Varma and C. S. Raghavendra Rearrangeability of multistage shuffle/exchange networks . . . . . . . 154--162 R. Beivide and E. Herrada and J. L. Balcazar and J. Labarta Optimized mesh-connected networks for SIMD and MIMD architectures . . . . . . 163--170 D. T. Harper III and J. R. Jump Performance evaluation of reduced bandwidth multistage interconnection networks . . . . . . . . . . . . . . . . 171--175 U. Ramachandran and M. Solomon and M. Vernon Hardware support for interprocess communication . . . . . . . . . . . . . 178--188 W. J. Dally and L. Chao and A. Chien and S. Hassoun and W. Horwat and J. Kaplan and P. Song and B. Totty and S. Wills Architecture of a message-driven processor . . . . . . . . . . . . . . . 189--196 M. Kumar Effect of storage allocation/reclamation methods on parallelism and storage requirements . . . . . . . . . . . . . . 197--205 J. H. Chang and H. Chao and K. So Cache design of a sub-micron CMOS System/370 . . . . . . . . . . . . . . . 208--213 M. Freeman An architectural perspective on a memory access controller . . . . . . . . . . . 214--223 K. Cheung and G. Sohi and K. Saluja and D. Pradhan Organization and analysis of a gracefully-degrading interleaved memory system . . . . . . . . . . . . . . . . . 224--231 C. Scheurich and M. Dubois Correct memory operation of cache-based multiprocessors . . . . . . . . . . . . 234--243 A. W. Wilson, Jr. Hierarchical cache/bus architecture for shared memory multiprocessors . . . . . 244--252 R. L. Lee and P. C. Yew and D. H. Lawrie Multiprocessor cache design considerations . . . . . . . . . . . . . 253--262 R. J. Eickemeyer and J. H. Patel Performance evaluation of multiple register sets . . . . . . . . . . . . . 264--271 T. J. Stanley and R. G. Wedig A performance analysis of automatically managed top of stack buffers . . . . . . 272--281 B. 
Moore and A. Padegs and R. Smith and W. Buchholz Concepts of the System/370 vector architecture . . . . . . . . . . . . . . 282--288 A. R. Pleszkun and J. R. Goodman and W. C. Hsu and R. T. Joersz and G. Bier and P. Woest and P. B. Schechter WISQ: a restartable architecture using queues . . . . . . . . . . . . . . . . . 290--299 P. Chow and M. Horowitz Architectural tradeoffs in the design of MIPS-X . . . . . . . . . . . . . . . . . 300--308 D. R. Ditzel and H. R. McLellan and A. D. Berenbaum The hardware architecture of the CRISP microprocessor . . . . . . . . . . . . . 309--319
Matthew Moore and Charles McDowell Bi-directional networks for large parallel processors . . . . . . . . . . 3--4 Ian Kaplan The LDF 100: a large grain dataflow parallel processor . . . . . . . . . . . 5--12 Stanley Lass Wide channel computers . . . . . . . . . 13--16 Reinder J. Bril An implementation independent approach to cache memories . . . . . . . . . . . 17--24 Reinder J. Bril On cacheability of lock-variables in tightly coupled multiprocessor systems 25--32
J. K. Iliffe A forward-looking method of Cache memory control . . . . . . . . . . . . . . . . 4--10 Amitava Bandyopadhyay and Yuan F. Zheng Combining both microcode and hardwired control in RISC . . . . . . . . . . . . 11--15 Martin Dowd An example RISC vector machine architecture . . . . . . . . . . . . . . 16--22 Sanjiv K. Bhatia and A. G. Starling Multilayered Illiac network scheme . . . 23--31 Lothar Nowak SAMP: a general purpose processor based on a self-timed VLIW structure . . . . . 32--39 Peter J. Ashenden and Chris J. Barter and Chris D. Marlin The Leopard workstation project . . . . 40--51 Y. P. Chiang and M. L. Manwaring Direct execution Lisp and cell memory 52--57 J. M. Terry Flow-control machines: the structured execution architecture (SXA) . . . . . . 58--69
Niklaus Wirth Hardware architectures for programming languages and programming languages for hardware architectures . . . . . . . . . 2--8 Bob Beck and Bob Kasten and Shreekant Thakkar VLSI assist for a multiprocessor . . . . 10--20 Roberto Bisiani and Alessandro Forin Architectural support for multilanguage parallel programming on heterogeneous systems . . . . . . . . . . . . . . . . 21--30 Richard Rashid and Avadis Tevanian and Michael Young and David Golub and Robert Baron Machine-independent virtual memory management for paged uniprocessor and multiprocessor architectures . . . . . . 31--39 John R. Hayes and Martin E. Fraeman and Robert L. Williams and Thomas Zaremba An architecture for the direct execution of the Forth programming language . . . 42--49 Peter Steenkiste and John Hennessy Tags and type checking in LISP: hardware and software approaches . . . . . . . . 50--59 Jack W. Davidson and Richard A. Vaughan The effect of instruction set complexity on program size and memory performance 60--64 Russell R. Atkinson and Edward M. McCreight The dragon processor . . . . . . . . . . 65--69 James R. Goodman Coherency for multiprocessor virtual address caches . . . . . . . . . . . . . 72--81 T. A. Cargill and B. N. Locanthi Cheap hardware support for software debugging and profiling . . . . . . . . 82--83 C. J. Georgiou and S. L. Palmer and P. L. Rosenfeld An experimental coprocessor for implementing persistent objects on an IBM 4381 . . . . . . . . . . . . . . . . 84--87 Daniel J. Magenheimer and Liz Peters and Karl Pettis and Dan Zuras Integer multiplication and division on the HP precision architecture . . . . . 90--99 David W. Wall and Michael L. Powell The Mahler experience: using an intermediate language as the machine description . . . . . . . . . . . . . . 100--104 Shlomo Weiss and James E. Smith A study of scalar compilation techniques for pipelined supercomputers . . . . . . 105--109 William R. Bush and A. Dain Samples and David Ungar and Paul N. 
Hilfinger Compiling Smalltalk-80 to a RISC . . . . 112--116 F. Chow and S. Correll and M. Himelstein and E. Killian and L. Weber How many addressing modes are enough? 117--121 Henry Massalin Superoptimizer: a look at the smallest program . . . . . . . . . . . . . . . . 122--126 Kazuo Taki and Katzuto Nakajima and Hiroshi Nakashima and Morihiro Ikeda Performance and architectural evaluation of the PSI machine . . . . . . . . . . . 128--135 Gaetano Borriello and Andrew R. Cherenson and Peter B. Danzig and Michael N. Nelson RISCs vs. CISCs for Prolog: a case study 136--145 Richard B. Kieburtz A RISC architecture for symbolic computation . . . . . . . . . . . . . . 146--155 David R. Ditzel and Hubert R. McLellan and Alan D. Berenbaum Design tradeoffs to support the C programming language in the CRISP microprocessor . . . . . . . . . . . . . 158--163 Charles P. Thacker and Lawrence C. Stewart Firefly: a multiprocessor workstation 164--172 Douglas W. Clark Pipelining and performance in the VAX 8800 processor . . . . . . . . . . . . . 173--177 Robert P. Colwell and Robert P. Nix and John J. O'Donnell and David B. Papworth and Paul K. Rodman A VLIW architecture for a trace scheduling compiler . . . . . . . . . . 180--192 Adam Levinthal and Pat Hanrahan and Mike Paquette and Jim Lawson Parallel computers for graphics applications . . . . . . . . . . . . . . 193--198 J. E. Smith and G. E. Dermer and B. D. Vanderwarn and S. D. Klinger and C. M. Rozewski The ZS-1 central processor . . . . . . . 199--204
E. E. E. Frietman and A. B. Ruighaver An electro-optic data communication system for the Delft parallel processor 2--8 G. B. Shippen and J. K. Archibald A tagged token dataflow machine for computing small, iterative algorithms 9--18
Clif Penn Preface to the Special issue on Neural Networks . . . . . . . . . . . . . . . . 6--6 Richard P. Lippmann An introduction to computing with neural nets . . . . . . . . . . . . . . . . . . 7--25 James A. Anderson and Edward J. Wisniewski and Susan R. Viscuso Software for neural networks . . . . . . 26--36 Simon Garth and Danny Pike An integrated system for neural network simulations . . . . . . . . . . . . . . 37--44 A. Jean Maren Conference report: IEEE First International Conference on Neural Networks . . . . . . . . . . . . . . . . 45--46 Jack J. Dongarra Performance of various computers using standard linear equations software in a FORTRAN environment . . . . . . . . . . 47--69 Wm. A. Wulf The WM computer architecture . . . . . . 70--84 Daniel Tabak Logarithmic indices for multiprocessor evaluation . . . . . . . . . . . . . . . 85--90 Martin Dowd An example RISC vector machine architecture . . . . . . . . . . . . . . 91--99 Martin Dowd RISC vector CPU's and crossbars in desktops . . . . . . . . . . . . . . . . 100--102 Stanley Lass Multiple instructions/operands per access to cache memory . . . . . . . . . 103--103 Wanda Gass Workshop report: synthesis of foo bars 104--108 F. Joel Ferguson Book Review: Logic Design Principles by Edward J. McCluskey, Prentice-Hall Publishers, Englewood Cliffs, New Jersey, 549 pp., $39.95 . . . . . . . 109--109
J. Ghosh and K. Hwang Critical issues in mapping neural networks on message-passing multicomputers . . . . . . . . . . . . . 3--11 Y. Takefuji and R. Jannarone and Y. B. Cho and T. Chen Multinomial conjunctoid statistical learning machines . . . . . . . . . . . 12--17 A. Louri and K. Hwang A bit-plane architecture for optical computing with two-dimensional symbolic substitution . . . . . . . . . . . . . . 18--27 S. Fiske and W. J. Dally The reconfigurable arithmetic processor 30--36 A. R. Pleszkun and G. S. Sohi The performance potential of multiple functional unit processors . . . . . . . 37--44 W. W. Hwu and P. P. Chang Exploiting parallel microprocessor microarchitectures with a compiler code generator . . . . . . . . . . . . . . . 45--53 G. D. McNiven and E. S. Davidson Analysis of memory referencing behavior for design of local memories . . . . . . 56--63 R. J. Eickemeyer and J. H. Patel Performance evaluation of on-chip register and cache organizations . . . . 64--72 J.-L. Baer and W.-H. Wang On the inclusion properties for multi-level cache hierarchies . . . . . 73--80 R. T. Short and H. M. Levy A simulation study of two-level caches 81--88 E. Chow and H. Madan and J. Peterson and D. Grunwald and D. Reed Hyperswitch network for the hypercube computer . . . . . . . . . . . . . . . . 90--99 D. C. Winsor and T. N. Mudge Analysis of bus hierarchies for multiprocessors . . . . . . . . . . . . 100--107 S. Wei and G. Lee Extra group network: a cost-effective fault-tolerant multistage interconnection network . . . . . . . . 108--115 H. Jiang and K. C. Smith A partial-multiple-bus computer structure with improved cost effectiveness . . . . . . . . . . . . . 116--122 I. Watson and V. Woods and P. Watson and R. Banach and M. Greenberg and J. Sargeant Flagship: a parallel architecture for declarative programming . . . . . . . . 124--130 R. A. Iannucci Toward a dataflow/von Neumann hybrid architecture . . . . . . . . . . . . . . 131--140 D. E. 
Culler and Arvind Resource requirements of dataflow programs . . . . . . . . . . . . . . . . 141--150 B. Sprunt and D. Kirk and L. Sha Priority-driven, preemptive I/O controllers for real-time systems . . . 152--159 S. B. Shukla and D. P. Agrawal A kernel-independent, pipelined architecture for real-time $2$-D convolution . . . . . . . . . . . . . . 160--166 W. Liu and T.-F. Yeh and W. E. Batchelor and R. Cavin Exploiting bit level concurrency in real-time geometric feature extractions 167--174 D. W. Clark and P. J. Bannon and J. B. Keller Measuring VAX 8800 performance with a histogram hardware monitor . . . . . . . 176--185 R. L. Sites and A. Agarwal Multiprocessor cache analysis using ATUM 186--195 S. Ng and D. Lang and R. Selinger Trade-offs between devices and paths in achieving disk interleaving . . . . . . 196--201 K. Jainandunsing and E. F. Deprettere Design of a concurrent computer for solving systems of linear equations . . 204--211 A. Wolfe and M. Breternitz, Jr. and C. Stephens and A. L. Ting and D. B. Kirk and R. P. Bianchini, Jr. and J. P. Shen The white dwarf: a high-performance application-specific processor . . . . . 212--222 J. L. Gaudiot and C. M. Lin and M. Hosseiniyar Solving partial differential equations in a data-driven multiprocessor environment . . . . . . . . . . . . . . 223--230 D. Lee Scrambled storage for parallel memory systems . . . . . . . . . . . . . . . . 232--239 V. Krishnaswamy and S. Ahuja and N. Carriero and D. Gelernter The architecture of a Linda coprocessor 240--249 H. T. Kung Deadlock avoidance for systolic communication . . . . . . . . . . . . . 252--260 K. So and V. Zecca Cache performance of vector processors 261--268 M. K. Vernon and U. Manber Distributed round-robin and first-come first-serve protocols and their applications to multiprocessor bus arbitration . . . . . . . . . . . . . . 269--279 A. Agarwal and R. Simoni and J. Hennessy and M. Horowitz An evaluation of directory schemes for cache coherence . . . . . . . 
. . . . . 280--289 S. Przybylski and M. Horowitz and J. Hennessy Performance tradeoffs in cache design 290--298 H. Cheong and A. V. Veidenbaum A cache coherence scheme with fast selective invalidation . . . . . . . . . 299--307 M. K. Vernon and E. D. Lazowska and J. Zahorjan An accurate and efficient performance analysis technique for multiprocessor snooping cache-consistency protocols . . 308--315 D. Rau and J. A. B. Fortes and H. J. Siegel Destination tag routing techniques based on a state model for the LADM network 318--324 D. W. Kim and G. J. Lipovski and A. Hartmann and R. Jenevein Regular CC-banyan networks . . . . . . . 325--332 R. M. Jenevein and T. Mookken Traffic analysis of rectangular SW-banyan networks . . . . . . . . . . . 333--342 Y. Tamir and G. L. Frazier High-performance multi-queue buffers for VLSI communications switches . . . . . . 343--354 B. R. Preiss and V. C. Hamacher A cache-based message passing scheme for a shared-bus multiprocessor . . . . . . 358--364 T. Boku and S. Nomura and H. Amano IMPULSE: a high performance processing unit for multiprocessors for scientific calculation . . . . . . . . . . . . . . 365--372 S. J. Eggers and R. H. Katz A characterization of sharing in parallel programs and its application to coherency protocol evaluation . . . . . 373--382 G. J. Lipovski and P. Vaughan A fetch-and-op implementation for parallel computers . . . . . . . . . . . 384--392 A. Seznec and Y. Jégou Synchronizing processors through memory requests in a tightly coupled multiprocessor . . . . . . . . . . . . . 393--400 R. M. Fujimoto and J.-J. Tsai and G. Gopalakrishnan Design and performance of special purpose hardware for time warp . . . . . 401--409 D. R. Cheriton and A. Gupta and P. D. Boyle and H. A. Goosen The VMP multiprocessor: initial experience, refinements, and performance evaluation . . . . . . . . . . . . . . . 410--421 J. R. Goodman and P. J. Woest The Wisconsin multicube: a new large-scale cache-coherent multiprocessor . . . . . 
. . . . . . . . 422--431 E. Tick Data buffer performance for sequential Prolog architectures . . . . . . . . . . 434--442 R. H. Halstead, Jr. and T. Fujita MASA: a multithreaded processor architecture for parallel symbolic computing . . . . . . . . . . . . . . . 443--451 P. L. Butler and J. D. Allen, Jr. and D. W. Bouldin Parallel architecture for OPS5 . . . . . 452--457
David R. Cheriton and Pat Boyle and Gert A. Slavenburg Comments on ``Coherency for multiprocessor virtual addresses caches'' by James R. Goodman . . . . . . 3--6 James R. Goodman Reply to David R. Cheriton's, Pat Boyle's, and Gert A. Slavenburg's ``Comments on 'Coherency for multiprocessor virtual addressed caches''\,' by James R. Goodman . . . . 7--7 Guy Rabbat and Borko Furht and Ron Kibler Three-dimensional computers and measuring their performance . . . . . . 9--16 M. Castan and A. Contessa and E. Cousin and C. Coustet and B. Lecussan MaRs: a parallel graph reduction multiprocessor . . . . . . . . . . . . . 17--24 Alessandro Contessa An approach to fault tolerance and error recovery in a parallel graph reduction machine: MaRS---a case study . . . . . . 25--32 Chuck Crawford Evolution of the Harris H-series computers and speculations on their future . . . . . . . . . . . . . . . . . 33--39 Philip L. Good Structuring an instruction cache . . . . 40--43 Eric E. Johnson Completing an MIMD multiprocessor taxonomy . . . . . . . . . . . . . . . . 44--47 Douglas W. Jones The ultimate RISC . . . . . . . . . . . 48--55 Douglas W. Jones A minimal CISC . . . . . . . . . . . . . 56--63 Stanley Lass Shared cache multiprocessing with pack computers . . . . . . . . . . . . . . . 64--70 Norman P. Jouppi Superscalar vs. superpipelined machines 71--80 Lorne H. Schachter Book review of \em High-Performance Computer Architecture by Harold S. Stone. Addison-Wesley 1987 . . . . . . . 81--84
Umakishore Ramachandran Preface to the Special Issue on Architectural Support for Operating Systems . . . . . . . . . . . . . . . . 11--11 A. Asthana and H. V. Jagadish and J. A. Chandross and D. Lin and S. C. Knauer An intelligent memory system . . . . . . 12--20 Monica Beltrametti and Kenneth Bobey and John R. Zorbas The control mechanism for the Myrias parallel computer system . . . . . . . . 21--30 Raphael Finkel and Debra Hengsen YACKOS on a shared-memory multiprocessor 31--36 Marc F. Pucci and J. L. Alberi Optimized communication in an extended remote procedure call model . . . . . . 37--46 Jordi Cortadella and Teodor Jové Dynamic RAM for on-chip instruction caches . . . . . . . . . . . . . . . . . 45--50 M. Naderi Modelling and performance evaluation of multiprocessors organization with shared memories . . . . . . . . . . . . . . . . 51--74 Edward Gehringer and Janne Abullarade and Michael H. Gulyn A survey of commercial parallel processors . . . . . . . . . . . . . . . 75--107 Mark Lease and Mac Lively Comparing production system architectures . . . . . . . . . . . . . 108--116 Ivor Page and Jeff Niehaus The Flex architecture, a high speed graphics processor . . . . . . . . . . . 117--129 Kazuaki Murakami and Akira Fukuda and Toshinori Sueyoshi and Shinji Tomita An overview of the Kyushu University reconfigurable parallel processor . . . 130--137 Ora E. Percus and J. K. Percus Some results concerning clock-regulated queues . . . . . . . . . . . . . . . . . 138--144 Fleur Liane Williams Should SCC set condition codes? . . . . 145--149 Gordon B. Steven A novel effective address calculation mechanism for RISC microprocessors . . . 150--156 Behrooz Parhami From defects to failures: a view of dependable computing . . . . . . . . . . 157--168 David A. Patterson RISCY patents . . . . . . . . . . . . . 169--191 Helen C. Takacs Book review: \em A VLSI Architecture for Concurrent Data Structures by William J. Dally (Kluwer 1988) . . . . . . . . . . 192--193 Robert P. 
Colwell Book review: \em Computer Architecture and Organization, 2nd ed. by John P. Hayes (McGraw Hill, 1988) . . . . . . . 193--195 Charles E. McDowell Book review: \em Supercomputer Architectures by Paul B. Schneck (Kluwer Academic Publishers) . . . . . . . . . . 195--196
Herbert H. J. Hum and Guang R. Gao Summary of the workshop on frontiers in functional programming and dataflow architecture . . . . . . . . . . . . . . 12--19 Andre M. van Tilborg Instrumentation for distributed computing systems . . . . . . . . . . . 20--25 Glenn W. Griffin The ultimate ultimate RISC . . . . . . . 26--32 Douglas W. Jones Risks of comparing RISCs . . . . . . . . 33--34 M. Naderi Modelling and performance evaluation of multiprocessors, organizations with multi-memory units . . . . . . . . . . . 35--51 Peter Kogge and John Oldfield and Mark Brule and Charles Stormon VLSI and rule-based systems . . . . . . 52--65 Behrooz Parhami Book review: \em Memory Storage Patterns in Parallel Processing by Mary A. Mace (Kluwer Academic Publishers, Boston, 1987, 139 pp.) . . . . . . . . . . . . . 76--76
J. P. Moskowitz and C. Jousselin An algebraic memory model . . . . . . . 55--62 W. F. Wong A stack addressing scheme based on windowing . . . . . . . . . . . . . . . 63--69 Anonymous Pipelining through Dynamic Control ROM 70--72 Stanley E. Lass Some innovations in computer architecture . . . . . . . . . . . . . . 73--77 Philip Bitar Book reviews: Review of \em Parallel Execution of Logic Programs by John Conery. Kluwer Academic Publishers 1987 81--82
Robert Cohn and Thomas Gross and Monica Lam Architecture and compiler tradeoffs for a long instruction word processor . . . 2--14 Gurindar S. Sohi and Sriram Vajapeyam Tradeoffs in instruction format design for horizontal architectures . . . . . . 15--25 James C. Dehnert and Peter Y.-T. Hsu and Joseph P. Bratt Overlapped loop support in the Cydra 5 26--38 F. J. Burkowski and G. V. Cormack and G. D. P. Dueck Architectural support for synchronous task communication . . . . . . . . . . . 40--53 Rajiv Gupta The fuzzy barrier: a mechanism for high speed synchronization of processors . . 54--63 James R. Goodman and Mary K. Vernon and Philip J. Woest Efficient synchronization primitives for large-scale cache-coherent multiprocessors . . . . . . . . . . . . 64--75 J. M. Mellor-Crummey and T. J. LeBlanc A software instruction counter . . . . . 78--86 Z. Aral and I. Gertner and G. Schaffer Efficient debugging primitives for multiprocessors . . . . . . . . . . . . 87--95 M. E. Staknis Sheaved memory: architectural support for state saving and restoration in paged systems . . . . . . . . . . . . . 96--102 M. A. Holliday Reference history, page size, and migration daemons in local/remote architectures . . . . . . . . . . . . . 104--112 D. L. Black and R. F. Rashid and D. B. Golub and C. R. Hill Translation lookaside buffer consistency: a software approach . . . . 113--122 G. A. Gibson and L. Hellerstein and R. M. Karp and D. A. Patterson Failure correction techniques for large disk arrays . . . . . . . . . . . . . . 123--132 N. P. Jouppi and J. Bertoni and D. W. Wall A unified vector/scalar floating-point architecture . . . . . . . . . . . . . . 134--143 H. Mulder Data buffering: run-time versus compile-time support . . . . . . . . . . 144--151 T. L. Adams and R. E. Zimmerman An analysis of 8086 instruction set usage in MS DOS programs . . . . . . . . 152--160 J. Roos A real-time support processor for Ada tasking . . . . . . . . . . . . . . . . 162--171 Steven R.
Vegdahl and Uwe F. Pleban The runtime environment for Scheme, a Scheme implementation on the 88000 . . . 172--182 S. McFarling Program optimization for instruction caches . . . . . . . . . . . . . . . . . 183--191 Paul A. Karger Using registers to optimize cross-domain call performance . . . . . . . . . . . . 194--204 Emmanuel Arnould and H. T. Kung and François Bitz and Robert D. Sansom and Eric C. Cooper The design of Nectar: a network backplane for heterogeneous multicomputers . . . . . . . . . . . . . 205--216 S. A. Delgado-Rannauro and T. J. Reynolds A message driven OR-parallel machine . . 217--228 S. Owicki and A. Agarwal Evaluating the performance of software cache coherence . . . . . . . . . . . . 230--242 W. Weber and A. Gupta Analysis of cache invalidation patterns in multiprocessors . . . . . . . . . . . 243--256 S. J. Eggers and R. H. Katz The effect of sharing on the cache and bus performance of parallel programs . . 257--270 N. P. Jouppi and D. W. Wall Available instruction-level parallelism for superscalar and superpipelined machines . . . . . . . . . . . . . . . . 272--282 W. J. Dally Micro-optimization of floating-point operations . . . . . . . . . . . . . . . 283--289 M. D. Smith and M. Johnson and M. A. Horowitz Limits on multiple instruction issue . . 290--302
S. J. Eggers and R. H. Katz Evaluating the performance of four snooping cache coherency protocols . . . 2--15 D. R. Cheriton and H. A. Goosen and P. D. Boyle Multi-level shared caching techniques for scalability in VMP-M/C . . . . . . . 16--24 A. Goto and A. Matsumoto and E. Tick Design and performance of a coherent cache for parallel logic programming architectures . . . . . . . . . . . . . 25--33 V. G. Grafe and G. S. Davidson and J. E. Hoch and V. P. Holmes The Epsilon dataflow processor . . . . . 36--45 S. Sakai and Y. Yamaguchi and K. Hiraki and Y. Kodama and T. Yuba An architecture of a dataflow single chip processor . . . . . . . . . . . . . 46--53 P. Nitezki Exploiting data parallelism in signal processing on a dataflow machine . . . . 54--61 R. N. Ibbett and T. M. Hopkins and K. I. M. McKinnon Architectural mechanisms to support sparse vector processing . . . . . . . . 64--71 D. T. Harper and D. A. Linebarger A dynamic storage scheme for conflict-free vector access . . . . . . 72--77 K. Murakami and N. Irie and S. Tomita SIMP (Single Instruction stream/Multiple instruction Pipelining): a novel high-speed single-processor architecture 78--85 Y. Ben-Asher and D. Egozi and A. Schuster $2$-D SIMD algorithms in the perfect shuffle networks . . . . . . . . . . . . 88--95 M. Valero-Garcia and J. J. Navarro and J. M. Llaberia and M. Valero Systematic hardware adaptation of systolic algorithms . . . . . . . . . . 96--104 M.-S. Chen and K. G. Shin Task migration in hypercube multiprocessors . . . . . . . . . . . . 105--111 S. Przybylski and M. Horowitz and J. Hennessy Characteristics of performance-optimal multi-level cache hierarchies . . . . . 114--121 D. A. Wood and R. H. Katz Supporting reference and dirty bits in SPUR's virtual address cache . . . . . . 122--130 R. E. Kessler and R. Jooss and A. Lebeck and M. D. Hill Inexpensive implementations of set-associativity . . . . . . . . . . . 131--139 W. H. Wang and J.-L. Baer and H. M.
Levy Organization and performance of a two-level virtual-real cache hierarchy 140--148 C. R. Jesshope and P. R. Miller and J. T. Yantchev High performance communications in processor networks . . . . . . . . . . . 150--157 H. E. Mizrahi and J. L. Baer and E. D. Lazowska and J. Zahorjan Introducing memory into the switch elements of multiprocessor interconnection networks . . . . . . . . 158--166 S. L. Scott and G. S. Sohi Using feedback to control tree saturation in multistage interconnection networks . . . . . . . . . . . . . . . . 167--176 P. D. Ezhilchelvan and S. K. Shrivastava and A. Tully Constructing replicated systems using processors with point-to-point communication links . . . . . . . . . . 177--184 H. Benker and J. M. Beacco and M. Dorochevsky and Th. Jeffré and A. Pöhlmann and J. Noyé and B. Poterie and J. C. Syre and O. Thibault and G. Watzlawik KCM: a knowledge crunching machine . . . 186--194 A. Singhal and Y. N. Patt A high performance Prolog processor with multiple function units . . . . . . . . 195--202 M. Morioka and S. Yamaguchi and T. Bandoh Evaluation of memory system for integrated Prolog processor IPP . . . . 203--210 K.-F. Wong and M. H. Williams A type driven hardware engine for Prolog clause retrieval over a large knowledge base . . . . . . . . . . . . . . . . . . 211--222 W. W. Hwu and T. M. Conte and P. P. Chang Comparing software and hardware schemes for reducing the cost of branches . . . 224--233 M. K. Farrens and A. R. Pleszkun Improving performance of small on-chip instruction caches . . . . . . . . . . . 234--241 W. W. Hwu and P. P. Chang Achieving high instruction cache performance with an optimizing compiler 242--251 P. Steenkiste The impact of code density on instruction cache performance . . . . . 252--259 R. S. Nikhil Can dataflow subsume von Neumann computing? . . . . . . . . . . . . . . . 262--272 W.-D. Weber and A.
Gupta Exploring the benefits of multiple hardware contexts in a multiprocessor architecture: preliminary results . . . 273--280 N. P. Jouppi Architectural and organizational tradeoffs in the design of the MultiTitan CPU . . . . . . . . . . . . . 281--289 M. Sato and S. Ichikawa and E. Goto Run-time checking in Lisp by integrating memory addressing and range checking . . 290--297 A. Hopper and A. Jones and D. Lioupis Multiple vs. wide shared bus multiprocessors . . . . . . . . . . . . 300--306 M. Annaratone and R. Rühl Performance measurements on a commercial multiprocessor running parallel code . . 307--314 M. Annaratone and C. Pommerell and R. Rühl Interprocessor communication speed and performance in distributed-memory parallel processors . . . . . . . . . . 315--324 D. S. Ghosal and S. K. Tripathi and L. N. Bhuyan and H. Jiang Analysis of computation-communication issues in dynamic dataflow architectures 325--333 S. Kravitz and R. E. Bryant and R. Rutenbar Logic simulation on massively parallel architectures . . . . . . . . . . . . . 336--343 T. Fukazawa and T. Kimura and M. Tomizawa and K. Takeda and Y. Itoh R256: a research parallel processor for scientific computation . . . . . . . . . 344--351 M. L. Anido and D. J. Allerton and E. J. Zaluska A three-port/three-access register file for concurrent processing and I/O communication in a RISC-like graphics engine . . . . . . . . . . . . . . . . . 354--361 J. M. Mulder and R. J. Portier and A. Srivastava and R. in't Velt An architecture framework for application-specific and scalable architectures . . . . . . . . . . . . . 362--369 K. Kim and V. K. Prasanna-Kumar Perfect Latin squares and parallel array access . . . . . . . . . . . . . . . . . 372--379 S. Weiss An aperiodic storage scheme to reduce memory conflicts in vector processors 380--386 C.-L. Chen and C.-K. Liao Analysis of vector access performance on skewed interleaved memory . . . . . . . 387--394 A. Agarwal and M. 
Cherian Adaptive backoff synchronization techniques . . . . . . . . . . . . . . . 396--406 P. Stenström A cache consistency protocol for multiprocessors with multistage networks 407--415 H.-M. Su and P.-C. Yew On data synchronization for multiprocessors . . . . . . . . . . . . 416--423
A. M. van Tilborg Panel on future directions in parallel computer architecture . . . . . . . . . 3--53 N. J. Gunther and M. T. Noga ParcBench: a benchmark for shared-memory architectures . . . . . . . . . . . . . 54--61 A. Elkateeb and T. Le-Ngoc A priority strategy on RISC for real-time multitasking software applications . . . . . . . . . . . . . . 62--68 Y.-J. Oyang A multiprocessor configuration in accordance with the aspects of physical and systems design . . . . . . . . . . . 69--73 H. Seebauer A memory controller executing segment operations in time $ O(1) $ . . . . . . 74--81 R. J. Schwartz The design and development of a dynamic program behavior measurement tool for the Intel 8086/88 . . . . . . . . . . . 82--94 A. J. Martin and S. M. Burns and T. K. Lee and D. Borkovic and P. J. Hazewindus The first asynchronous microprocessor: the test results . . . . . . . . . . . . 95--110 F. Cornett The UT1000 microprogramming simulator: an educational tool . . . . . . . . . . 111--118 C. K. Yuen and W. F. Wong A bidirectional data driven Lisp engine for the direct execution of Lisp in parallel . . . . . . . . . . . . . . . . 119--130
M. Smotherman A sequencing-based taxonomy of I/O systems and review of historical machines . . . . . . . . . . . . . . . . 5--15 R. Cousins DMA considerations on RISC workstations 16--23 R. H. Katz A project on high performance I/O subsystems . . . . . . . . . . . . . . . 24--31 P. C. Dibble and M. L. Scott Beyond striping: the bridge multiprocessor file system . . . . . . . 32--39 A. L. N. Reddy and P. Banerjee A study of parallel disk organizations . . 40--47 J. M. Smith and G. Q. Maguire, Jr. Measured response times for page-sized fetches on a network . . . . . . . . . . 48--54 B. Wolman and T. M. Olson IOBENCH: a system independent IO benchmark . . . . . . . . . . . . . . . 55--70 T. M. Olson Disk array performance in a random IO environment . . . . . . . . . . . . . . 71--77 B. L. Wolman An analysis of server-based locking . . 78--82 E. H. Debaere Instruction-path coprocessing to solve some RISC problems . . . . . . . . . . . 83--94 H. Seebauer A memory controller executing segment operations in time $ O(1) $ . . . . . . 95--102 P. K. Chiu Representation of logic functions by if--then clauses . . . . . . . . . . . . 103--107 C. Baleanu and D. Tomescu Embedding computers in a cellular array 108--115 S. Lass On hardware enhanced 80386 software emulation, compiled emulation, a program distribution language, and pack computers . . . . . . . . . . . . . . . 116--118
Daniel Litaize and Omar Hammami and Mustapha Lalam and Abdelaziz Mzoughi and Pascal Sainrat Multiprocessors with a serial multiport memory and a pseudo crossbar of serial links used as a processor-memory switch 8--21 G. Fritsch and W. Henning and H. Hesenuer and R. Klar and C. U. Linster and C. W. Oehlrich and P. Schlenk and J. Volkert Distributed shared memory multiprocessor architecture MEMSY for high performance parallel computations . . . . . . . . . 22--35 A. Mendelson and D. K. Pradhan and A. D. Singh A single cached copy data coherence scheme for multiprocessor systems . . . 36--49 Dror G. Feitelson and Larry Rudolph Architecture for a multi-user general-purpose parallel system . . . . 50--56 D. Quammen and D. R. Miller and D. Tabak Register window architecture for multitasking applications . . . . . . . 57--66 Arnold Rosenberg Efficient emulations of interconnection networks . . . . . . . . . . . . . . . . 67--79 Isaac D. Scherson and Peter F. Corbett Description and performance of a class of orthogonal multiprocessor networks 80--90 Ilana David and Ran Ginosar and Michael Yoeli An efficient implementation of Boolean functions and finite state machine as self-timed circuit . . . . . . . . . . . 91--104 Apostolos Dollas and Robert F. Krick The case for the sustained performance computer architecture . . . . . . . . . 129--136 Eric E. Johnson Working set prefetching for cache memories . . . . . . . . . . . . . . . . 137--141 K. e H. Lee and C. H. Lam Message-passing controller for a shared-memory multiprocessor . . . . . . 142--149 Tsong-Chih Hsu and Ling-Yang Kung Logic and conflict-free vector addresses 150--153 Tsong-Chih Hsu and Ling-Yang Kung An address generation unit for array accessing . . . . . . . . . . . . . . . 154--160 Tsong-Chih Hsu and Ling-Yang Kung A hardware mechanism for priority queue 162--169
V. Dvorak Microsequencer architecture supporting arbitrary branching up to 2m targets . . 9--9 Jack J. Dongarra Performance of various computers using standard linear equations software . . . 17--17 Tsong-Chih Hsu and Ling-Yang Kung A comment on ``A Fetch-and-Op Implementation for Parallel Computers'' 32--32 Robert Cousins A novel approach to character interfaces 35--35 Robert Cousins A reentrant peripheral interface . . . . 43--43 Noel W. Anderson Amorphous computer system architecture: a preliminary look . . . . . . . . . . . 51--51 Yen-Jen Oyang and Bor-Ting Chang and Shu-May Lin A cost-effective approach to implement a long instruction word microprocessor . . 59--59 C. Fritsch and T. Sánchez and J. Anaya Primitive based architectures . . . . . 73--73 Harold Lorin A model for recentralization of computing: (distributed processing comes home) . . . . . . . . . . . . . . . . . 81--81 Dan Teodosiu Computing in three dimensions . . . . . 99--99 Gary Frazier Ariel: a scalable multiprocessor for the simulation of neural networks . . . . . 107--107 Robert P. Colwell Book review: \em High-Level Language Computer Architecture edited by Veljko Milutinovic (Computer Science Press, 1989) . . . . . . . . . . . . . . . . . 120--122 Behrooz Parhami Book review: \em Advanced Research in VLSI, edited by Charles L. Seitz (The MIT Press, Cambridge, MA, 1989, 373 pp.) 122--123
Wolfgang Matthes Hardware Resources: a generalizing view on computer architectures . . . . . . . 7--14 Lawrence Rauchwerger and Michael P. Farmwald A multiple floating point coprocessor architecture . . . . . . . . . . . . . . 15--24 Andy Glew and Wen-Mei Hwu Snoopy cache test-and-test-and-set without excessive bus contention . . . . 25--32 Lee Higbee Quick and easy cache performance analysis . . . . . . . . . . . . . . . . 33--44 Arvin Park and Jeffrey C. Becker and Richard J. Lipton IOStone: a synthetic file system benchmark . . . . . . . . . . . . . . . 45--52 Dionisios N. Pnevmatikatos and Mark D. Hill Cache performance of the integer SPEC benchmarks on a RISC . . . . . . . . . . 53--68 A. B. Ruighaver A modular network for dense optical interconnection of processing elements 69--75 Alessandro De Gloria VISA: a variable instruction set architecture . . . . . . . . . . . . . . 76--84 Fleur L. Williams and Gordon B. Steven Address and data register separation on the M68000 family . . . . . . . . . . . 85--89
Sarita V. Adve and Mark D. Hill Weak ordering---a new definition . . . . 2--14 Kourosh Gharachorloo and Daniel Lenoski and James Laudon and Phillip Gibbons and Anoop Gupta and John Hennessy Memory consistency and event ordering in scalable shared-memory multiprocessors 15--26 Joonwon Lee and Umakishore Ramachandran Synchronization with multiprocessor caches . . . . . . . . . . . . . . . . . 27--37 Po-Jen Chuang and Nian-Feng Tzeng Dynamic processor allocation in hypercube computers . . . . . . . . . . 40--49 Abdou Youssef and Bruce Arden A new approach to fast control of $ r_2 \times r_2 $ $3$-stage Benes networks of $ r \times r$ crossbar switches . . . . 50--59 William J. Dally Virtual-channel flow control . . . . . . 60--68 Shekhar Borkar and Robert Cohn and George Cox and Thomas Gross and H. T. Kung and Monica Lam and Margie Levine and Brian Moore and Wire Moore and Craig Peterson and Jim Susman and Jim Sutton and John Urbanski and Jon Webb Supporting systolic and memory communication in iWarp . . . . . . . . . 70--81 Gregory M. Papadopoulos and David E. Culler Monsoon: an explicit token-store architecture . . . . . . . . . . . . . . 82--91 Marco Annaratone and Marco Fillo and Kiyoshi Nakabayashi and Marc Viredaz The K2 parallel processor: architecture and hardware implementation . . . . . . 92--101 Anant Agarwal and Beng-Hong Lim and David Kranz and John Kubiatowicz APRIL: a processor architecture for multiprocessing . . . . . . . . . . . . 104--114 Roberto Bisiani and Mosur Ravishankar PLUS: a distributed shared-memory system 115--124 John K. Bennett and John B. Carter and Willy Zwaenepoel Adaptive software cache management for distributed shared memory architectures 125--134 David R. Ditzel and John L. Hennessy and Bernie Rudin and Alan Jay Smith and Stephen L. Squires and Zeke Zalcstein Big science versus little science---do you have to build it? (panel session) 136--136 Brian W. O'Krafka and A. 
Richard Newton An empirical evaluation of two memory-efficient directory methods . . . 138--147 Daniel Lenoski and James Laudon and Kourosh Gharachorloo and Anoop Gupta and John Hennessy The directory-based cache coherence protocol for the DASH multiprocessor . . 148--159 Steven Przybylski The performance impact of block sizes and fetch strategies . . . . . . . . . . 160--169 D. Alpert and A. Averbuch and O. Danieli Performance comparison of load/store and symmetric instruction set architectures 172--181 Jack W. Davidson and David B. Whalley Reducing the cost of branches by using registers . . . . . . . . . . . . . . . 182--191 Carl E. Love and Harry F. Jordan An investigation of static versus dynamic scheduling . . . . . . . . . . . 192--201 Dileep Bhandarkar and Richard Brunner VAX vector architecture . . . . . . . . 204--215 Robert W. Horst and Richard L. Harris and Robert L. Jardine Multiple instruction issue in the NonStop Cyclone processor . . . . . . . 216--226 Shreekant S. Thakkar and Mark Sweiger Performance of an OLTP application on symmetry multiprocessor system . . . . . 228--238 Ding-Kai Chen and Hong-Men Su and Pen-Chung Yew The impact of synchronization and granularity on parallel systems . . . . 239--248 Håkon O. Bugge and Ernst H. Kristiansen and Bjørn O. Bakka Trace-driven simulations for a two-level cache design in open bus systems . . . . 250--259 Jiun-Ming Hsu and Prithviraj Banerjee Performance measurement and trace driven simulation of parallel CAD and numeric applications on a hypercube multicomputer . . . . . . . . . . . . . 260--269 Anita Borg and R. E. Kessler and David W. Wall Generation and analysis of very long address traces . . . . . . . . . . . . . 270--279 Bruce K. Holmer and Barton Sano and Michael Carlton and Peter Van Roy and Ralph Haygood and William R. Bush and Alvin M. Despain and Joan M. Pendleton and Tep Dobry Fast Prolog with an extended general purpose architecture . . . . . . . . . .
282--291 Leon Alkalaj and Tomás Lang and Miloš Ercegovac Architectural support for the management of tightly-coupled fine-grain goals in flat concurrent Prolog . . . . . . . . . 292--301 Samuel Ho and Lawrence Snyder Balance in architectural design . . . . 302--310 A. L. Narasimha Reddy and Prithviraj Banerjee A study of I/O behavior of perfect benchmarks on a multiprocessor . . . . . 312--321 Peter M. Chen and David A. Patterson Maximizing performance in a striped disk array . . . . . . . . . . . . . . . . . 322--331 Kang G. Shin and Greg Dykema A distributed I/O architecture for HARTS 332--342 Michael D. Smith and Monica S. Lam and Mark A. Horowitz Boosting beyond static scheduling in a superscalar processor . . . . . . . . . 344--354 George Taylor and Peter Davies and Michael Farmwald The TLB slice---a low-cost high-speed address translation mechanism . . . . . 355--363 Norman P. Jouppi Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers . . . . . . . . . . . . . . . . 364--373 Edward S. Davidson and Gurindar S. Sohi and Joseph A. Fisher and Greg Grohoski and Yale Patt and J. E. Smith and David R. Stiles Better than one operation per clock (panel): vectors, VLIW, and superscalar 376--376
Robert Alverson and David Callahan and Daniel Cummings and Brian Koblenz and Allan Porterfield and Burton Smith The Tera computer system . . . . . . . . 1--6 K. Hwang and M. Dubois and D. K. Panda and S. Rao and S. Shang and A. Uresin and W. Mao and H. Nair and M. Lytwyn and F. Hsieh and J. Liu and S. Mehrotra and C. M. Cheng OMP: a RISC-based multiprocessor using orthogonal-access memories and multiple spanning buses . . . . . . . . . . . . . 7--22 Kechang Dai and Wolfgang K. Giloi A basic architecture supporting LGDG computation . . . . . . . . . . . . . . 23--33 Sang Lyul Min and Jean-Loup Baer and Hyoung-Joo Kim An efficient caching support for critical sections in large-scale shared-memory multiprocessors . . . . . 34--47 Umpei Nagashima and Fumio Nishimoto and Takashi Shibata and Hiroshi Itoh and Minoru Gotoh An improvement of I/O function for auxiliary storage: parallel I/O for a large scale supercomputing . . . . . . . 48--59 Nian-Feng Tzeng Analysis of a variant hypercube topology 60--70 P. J. van der Houwen and B. P. Sommeijer Parallel ODE solvers . . . . . . . . . . 71--81 M. J. Daydé and I. S. Duff Use of parallel level 3 BLAS in LU factorization on three vector multiprocessors the ALLIANT FX/80, the CRAY-2, and the IBM 3090 VF . . . . . . 82--95 E. N. Houstis and J. R. Rice and N. P. Chrisochoides and H. C. Karathanasis and P. N. Papachiou and M. K. Samartzis and E. A. Vavalis and Ko Yang Wang and S. Weerawarana //ELLPACK: a numerical simulation programming environment for parallel MIMD machines . . . . . . . . . . . . . 96--107 Christina C. Christara Schur complement preconditioned conjugate gradient methods for spline collocation equations . . . . . . . . . 108--120 Kuo-Liang Chung and Ferng-Ching Lin and Wen-Chin Chen Cost-optimal parallel B-spline interpolations . . . . . . . . . . . . . 121--131 K. Gallivan and A. Sameh and Z. 
Zlatev Solving general sparse linear systems using conjugate gradient-type methods 132--139 Toshitsugu Yuba and Toshio Shimada and Yoshinori Yamaguchi and Kei Hiraki and Shuichi Sakai Dataflow computer development in Japan 140--147 Vivek Sarkar and David Cann POSC---a partitioning and optimizing SISAL compiler . . . . . . . . . . . . . 148--164 François Bodin and François Charot Loop optimization for horizontal microcoded machines . . . . . . . . . . 164--176 Peiyi Tang and Pen-Chung Yew and Chuan-Qi Zhu Compiler techniques for data synchronization in nested parallel loops 177--186 David E. Hudak and Santosh G. Abraham Compiler techniques for data partitioning of sequentially iterated parallel loops . . . . . . . . . . . . . 187--200 David Klappholz and Kleanthis Psarris and Xiangyun Kong On the perfect accuracy of an approximate subscript analysis test . . 201--212 Allen D. Malony and Daniel A. Reed A hardware-based performance monitor for the Intel iPSC/2 hypercube . . . . . . . 213--226 R. T. Dimpsey and R. K. Iyer Performance degradation due to multiprogramming and system overheads in real workloads: case study on a shared memory multiprocessor . . . . . . . . . 227--238 Youcef Saad and Harry A. G. Wijshoff SPARK: a benchmark package for sparse computations . . . . . . . . . . . . . . 239--253 George Cybenko and Lyle Kipp and Lynn Pointer and David Kuck Supercomputer performance evaluation and the Perfect Benchmarks . . . . . . . . . 254--266 Ahmed K. Noor and Jeanne M. Peters Strategies for large-scale structural problems on high-performance computers 267--280 V. Zecca and A. Kamel Elastodynamics on clustered vector multiprocessors . . . . . . . . . . . . 281--290 Victor Eijkhout Implementation of $5$-point/$9$-point multi-level methods on hypercube architectures . . . . . . . . . . . . . 291--295 Philip C. 
Chen Supercomputer-based visualization systems used for analyzing output data of a numerical weather prediction model 296--309 Yoshizo Takahashi and Shigetaka Sasaki Parallel automated wire-routing with a number of competing processors . . . . . 310--317 Tony F. Chan Hierarchical algorithms and architectures for parallel scientific computing . . . . . . . . . . . . . . . 318--329 Kevin Smith and Bill Appelbe and Kurt Stirewalt Incremental dependence analysis for interactive parallelization . . . . . . 330--341 Roland Rühl and Marco Annaratone Parallelization of FORTRAN code on distributed-memory parallel processors 342--353 Edward H. Gornish and Elana D. Granston and Alexander V. Veidenbaum Compiler-directed data prefetching in multiprocessors with memory hierarchies 354--368 Guang R. Gao and Herbert H. J. Hum and Yue-Bong Wong Towards efficient fine-grain software pipelining . . . . . . . . . . . . . . . 369--379 Françoise André and Jean-Louis Pazat and Henry Thomas Pandore: a system to manage data distribution . . . . . . . . . . . . . . 380--388 Rod A. Fatoohi Vector performance analysis of the NEC SX-2 . . . . . . . . . . . . . . . . . . 389--400 François Bodin and Daniel Windheiser and William Jalby and Daya Atapattu and Mannho Lee and Dennis Gannon Performance evaluation and prediction for parallel algorithms on the BBN GP1000 . . . . . . . . . . . . . . . . . 401--413 Luigi Brochard and Alex Freau Designing algorithms on hierarchical memory multiprocessors . . . . . . . . . 414--427 Ingrid Y. Bucher and Donald A. Calahan Access conflicts in multiprocessor memories queueing models and simulation studies . . . . . . . . . . . . . . . . 428--438 Emilio Luque and Ana Ripoll and Porfidio Hernández and Tomás Margalef Impact of task duplication on static-scheduling performance in multiprocessor systems with variable execution-time tasks . . . . . . . . . . 
439--446 Apostolos Gerasoulis and Sesh Venugopal and Tao Yang Clustering task graphs for message passing architectures . . . . . . . . . 447--456 Edwin M. Paalvast and Arjan J. van Gemund and Henk J. Sips A method for parallel program generation with an application to the Booster language . . . . . . . . . . . . . . . . 457--469 M. A. Tsoukarellas and T. S. Papatheodorou A run time support system for multiprocessor machines . . . . . . . . 470--478 Anthony J. G. Hey Supercomputing with transputers---past, present and future . . . . . . . . . . . 479--489
Burton Smith The end of architecture . . . . . . . . 10--17 Mark D. Hill What is scalability? . . . . . . . . . . 18--21 P. A. Laplante A novel single instruction computer architecture . . . . . . . . . . . . . . 22--26 Ran Ginosar and Nick Michell On the potential of asynchronous pipelined processors . . . . . . . . . . 27--34 Yen-Jen Oyang and Chun-Hung Wen and Yu-Fen Chen and Shu-May Lin The effect of employing advanced branching mechanisms in superscalar processors . . . . . . . . . . . . . . . 35--52 Yannick Deville A low-cost usage-based replacement algorithm for cache memories . . . . . . 52--58 Bernard K. Gunther A high speed mechanism for short branches . . . . . . . . . . . . . . . . 59--61 Robert McLaughlin Design for fast DSP machine . . . . . . 62--66 Werner B. Joerg A subclass of Petri Nets as design abstraction for parallel architectures 67--77 Mark Thorson Usenet Nuggets . . . . . . . . . . . . . 80--89 Glen G. Langdon, Jr. Book review: Highly Parallel Computing by George Almasi and Allan Gottlieb (Benjamin/Cummings, 1989) . . . 90--90 Glen G. Langdon, Jr. Book review: Solving Problems on Concurrent Processors, Vol II: Software for Concurrent Processors by I. Angus, G. Fox, J. Kim, and D. Walker (Prentice-Hall, 1990) . . . . . . . . . 90--91 Marc Dikotter Book review: The Definition of Standard ML by R. Milner, M. Tofte, R. Harper . . . . . . . . . . . . . . . . . 91--91
F. T. Leighton Selected Papers from the Symposium on Parallel Algorithms and Architectures 5--5 John Y. Ngai and Charles L. Seitz A framework for adaptive routing in multicomputer networks . . . . . . . . . 6--14 Richard Beigel and Clyde P. Kruskal Processor networks and interconnection networks without long wires (extended abstract) . . . . . . . . . . . . . . . 15--24 Fred Annexstein Fault tolerance in hypercube-derivative networks (preliminary version) . . . . . 25--34 Richard M. Fujimoto The Virtual Time Machine . . . . . . . . 35--44 Gianfranco Bilardi and Scot W. Hornick and Majid Sarrafzadeh Optimal VLSI architectures for multidimensional DFT (preliminary version) . . . . . . . . . . . . . . . . 45--52 Clark D. Thomborson and Belle W.-Y. Wei Systolic implementations of a move-to-front text compressor . . . . . 53--60 Thomas F. Knight, Jr. Technologies for low latency interconnection switches . . . . . . . . 61--68 Martin C. Herbordt and Charles C. Weems and James C. Corbett Message-passing algorithms for a SIMD torus with coteries . . . . . . . . . . 69--78 S. Konstantinidou and L. Snyder The chaos router: a practical application of randomization in network routing . . . . . . . . . . . . . . . . 79--88 Jehoshua Bruck and Robert Cypher and Danny Soroker Running algorithms efficiently on faulty hypercubes (extended abstract) . . . . . 89--96 Naomi Nishimura Asynchronous shared memory parallel computation (preliminary version) . . . 97--105 M. Shand and P. Bertin and J. Vuillemin Hardware speedups in long integer multiplication . . . . . . . . . . . . . 106--113 Manu Thapar and Bruce Delagi Cache coherence for large scale shared memory multiprocessors . . . . . . . . . 114--119 Peter Grabienski FLIP-FLOP: a stack-oriented multiprocessing system . . . . . . . . . 120--127 Camille C. Price Task allocation in data flow multiprocessors: an annotated bibliography . . . . . . . . . . . . . . 
128--134 Rod Adams and Gordon Steven A parallel pipelined processor with conditional instruction execution . . . 135--142 Mark Thorson Usenet Nuggets . . . . . . . . . . . . . 146--150 Michael L. Hilton Book review: Systems Programming in Parallel Logic Languages by Ian Foster (Prentice Hall, 1990) . . . . . . . . . 151--151 Keith Anthony Book review: Technology Projection Modeling of Future Computer Systems by Al Cutaia (Prentice-Hall, 1990) . . . . 152--153 Paul B. Schneck Book review: Optimizing FORTRAN Programs by C. F. Schofield (Halsted Press, 1989) . . . . . . . . . . . . . . 153--154 Robert Bernecky Book review: Multiprocessors by Daniel Tabak (Prentice Hall, Englewood Cliffs, NJ) . . . . . . . . . . . . . . 154--156 Robert Bernecky Book review: Multiprocessor Performance by Erol Gelenbe (J. Wiley & Sons, Chichester, England) . . . . . . . 156--157 John Fulcher Book review: Neural Net Applications and Products by Richard K. Miller, Terri C. Walker, and Anne M. Ryan (SEAI Technical Publications, 1990) . . . . . 157--158
Andrew Wolfe and John P. Shen A variable instruction stream extension to the VLIW architecture . . . . . . . . 2--14 Manolis Katevenis and Nestoras Tzartzanis Reducing the branch penalty by rearranging instructions in a double-width memory . . . . . . . . . . 15--27 Roland L. Lee and Alex Y. Kwok and Fayé A. Briggs The floating point performance of a superscalar SPARC processor . . . . . . 28--37 David Callahan and Ken Kennedy and Allan Porterfield Software prefetching . . . . . . . . . . 40--52 Gurindar S. Sohi and Manoj Franklin High-bandwidth data memory systems for superscalar processors . . . . . . . . . 53--62 Monica D. Lam and Edward E. Rothberg and Michael E. Wolf The cache performance and optimizations of blocked algorithms . . . . . . . . . 63--74 Jeffrey C. Mogul and Anita Borg The effect of context switches on cache performance . . . . . . . . . . . . . . 75--84 David Keppel A portable interface for on-the-fly instruction space modification . . . . . 86--95 Andrew W. Appel and Kai Li Virtual memory primitives for user programs . . . . . . . . . . . . . . . . 96--107 Thomas E. Anderson and Henry M. Levy and Brian N. Bershad and Edward D. Lazowska The interaction of architecture and operating system design . . . . . . . . 108--120 David G. Bradlee and Susan J. Eggers and Robert R. Henry Integrating register allocation and instruction scheduling for RISCs . . . . 122--131 Manuel E. Benitez and Jack W. Davidson Code generation for streaming: an access/execute mechanism . . . . . . . . 132--141 Rajive Bagrodia and Sharad Mathur Efficient Implementation of high-level parallel programs . . . . . . . . . . . 142--151 William Mangione-Smith and Santosh G. Abraham and Edward S. Davidson Vector register design for polycyclic vector scheduling . . . . . . . . . . . 154--163 David E. Culler and Anurag Sah and Klaus E. 
Schauser and Thorsten von Eicken and John Wawrzynek Fine-grain parallelism with minimal hardware support: a compiler-controlled threaded abstract machine . . . . . . . 164--175 David W. Wall Limits of instruction-level parallelism 176--188 Edward K. Lee and Randy H. Katz Performance consequences of parity placement in disk arrays . . . . . . . . 190--199 Vincent Cate and Thomas Gross Combining the concepts of compression and caching for a two-level filesystem 200--211 William J. Bolosky and Michael L. Scott and Robert P. Fitzgerald and Robert J. Fowler and Alan L. Cox NUMA policies and their relation to memory architecture . . . . . . . . . . 212--221 David Chaiken and John Kubiatowicz and Anant Agarwal LimitLESS directories: a scalable cache coherence scheme . . . . . . . . . . . . 224--234 Sang L. Min and Jong-Deok Choi An efficient cache-based access anomaly detection scheme . . . . . . . . . . . . 235--244 Kourosh Gharachorloo and Anoop Gupta and John Hennessy Performance evaluation of memory consistency models for shared-memory multiprocessors . . . . . . . . . . . . 245--257 Eric Freudenthal and Allan Gottlieb Process coordination with fetch-and-increment . . . . . . . . . . 260--268 John M. Mellor-Crummey and Michael L. Scott Synchronization without contention . . . 269--278 Douglas Johnson The case for a read barrier . . . . . . 279--287 Robert F. Cmelik and Shing I. Kong and David R. Ditzel and Edmund J. Kelly An analysis of MIPS and SPARC instruction set utilization on the SPEC benchmarks . . . . . . . . . . . . . . . 290--302 C. Brian Hall and Kevin O'Brien Performance characteristics of architectural features of the IBM RISC System/6000 . . . . . . . . . . . . . . 303--309 Dileep Bhandarkar and Douglas W. Clark Performance from architecture: comparing a RISC and a CISC with similar hardware organization . . . . . . . . . . . . . . 310--319
R. F. DeMara and D. I. Moldovan The SNAP-1 parallel AI prototype . . . . 2--11 Wei Siong Tan and H. Russ and Cecil O. Alford GT-EP: a novel high-performance real-time architecture . . . . . . . . . 13--21 Tetsuya Higuchi and Tatsumi Furuya and Kenichi Handa and Naoto Takahashi and Hiroyasu Nishiyama and Akio Kokubu IXM2: a parallel associative processor 22--31 David R. Kaeli and Philip G. Emma Branch history table prediction of moving target branches due to subroutine returns . . . . . . . . . . . . . . . . 34--42 Alexander C. Klaiber and Henry M. Levy An architecture for software-controlled data prefetching . . . . . . . . . . . . 43--53 John W. C. Fu and Janak H. Patel Data prefetching in multiprocessor vector cache memories . . . . . . . . . 54--63 D. T. Harper III Reducing memory contention in shared memory multiprocessors . . . . . . . . . 66--73 B. Ramakrishna Rau Pseudo-randomly interleaved memory . . . 74--83 Kai Li and Karin Petersen Evaluation of memory system extensions 84--93 Patrick W. Dowd High performance interprocessor communication through optical wavelength division multiple access channels . . . 96--105 Anders Landin and Erik Hagersten and Seif Haridi Race-free interconnection networks and multiprocessor consistency . . . . . . . 106--115 Xiaola Lin and Lionel M. Ni Deadlock-free multicast wormhole routing in multicomputer networks . . . . . . . 116--125 Matthew Farrens and Arvin Park Dynamic base register caching: a technique for reducing address bus width 128--137 O. A. Olukotun and T. N. Mudge and R. B. Brown Implementing a cache for a high-performance GaAs microprocessor . . 138--147 Lizyamma Kurian and Paul T. Hulina and Lee D. Coraor and Dhamir N. Mannai Classification and performance evaluation of instruction buffering techniques . . . . . . . . . . . . . . . 
150--159 Masaitsu Nakajima and Hiraku Nakano and Yasuhiro Nakakura and Tadahiro Yoshida and Yoshiyuki Goi and Yuji Nakai and Reiji Segawa and Takeshi Kishida and Hiroshi Kadota OHMEGA: a VLSI superscalar processor architecture for numerical applications 160--168 Sriram Vajapeyam and Gurindar S. Sohi and Wei-Chung Hsu An empirical study of the CRAY Y-MP processor using the Perfect Club benchmarks . . . . . . . . . . . . . . . 170--179 Chriss Stephens and Bryce Cogswell and John Heinlein and Gregory Palmer and John P. Shen Instruction level profiling and evaluation of the IBM/6000 . . . . . . . 180--189 R. T. Dimpsey and R. K. Iyer Performance prediction and tuning on a multiprocessor . . . . . . . . . . . . . 190--199 C. W. Oehlrich and A. Quick Performance evaluation of a communication system for transputer-networks based on monitored event traces . . . . . . . . . . . . . . 202--211 S. Konstantinidou and L. Snyder Chaos router: architecture and performance . . . . . . . . . . . . . . 212--221 Shridhar B. Shukla and Dharma P. Agrawal Scheduling pipelined communication in distributed memory multiprocessors for real-time applications . . . . . . . . . 222--231 Sarita V. Adve and Mark D. Hill and Barton P. Miller and Robert H. B. Netzer Detecting data races on weak memory systems . . . . . . . . . . . . . . . . 234--243 Eric J. Koldinger and Susan J. Eggers and Henry M. Levy On the validity of trace-driven simulation for multiprocessors . . . . . 244--253 Anoop Gupta and John Hennessy and Kourosh Gharachorloo and Todd Mowry and Wolf-Dietrich Weber Comparative evaluation of latency reducing and tolerating techniques . . . 254--263 Pohua P. Chang and Scott A. Mahlke and William Y. Chen and Nancy J. Warter and Wen-mei W. Hwu IMPACT: an architectural framework for multiple-instruction-issue processors 266--275 Michael Butler and Tse-Yu Yeh and Yale Patt and Mitch Alsup and Hunter Scales and Michael Shebanow Single instruction stream parallelism is greater than two . . . . 
. . . . . . . . 276--286 Stephen Melvin and Yale Patt Exploiting fine-grained parallelism through a combination of hardware and software techniques . . . . . . . . . . 287--296 Sarita V. Adve and Vikram S. Adve and Mark D. Hill and Mary K. Vernon Comparison of hardware and software cache coherence schemes . . . . . . . . 298--308 Richard Simoni and Mark Horowitz Modeling the performance of limited pointers directories for cache coherence 309--319 Donna J. Quammen and D. Richard Miller Flexible register management for sequential programs . . . . . . . . . . 320--329 David G. Bradlee and Susan J. Eggers and Robert R. Henry The effect on RISC performance of register set size and structure versus code generation strategy . . . . . . . . 330--339 Gregory M. Papadopoulos and Kenneth R. Traub Multithreading: a revisionist view of dataflow architectures . . . . . . . . . 342--351 Tzi-cker Chiueh Multi-threaded vectorization . . . . . . 352--361 Matthew K. Farrens and Andrew R. Pleszkun Strategies for achieving improved processor throughput . . . . . . . . . . 362--369 Toyohiko Kagimasa and Kikuo Takahashi and Toshiaki Mori and Seiichi Yoshizumi Adaptive storage management for very large virtual/real storage systems . . . 372--379 Judith S. Hall and Paul T. Robinson Virtualizing the VAX architecture . . . 380--389 Janaki Akella and Daniel P. Siewiorek Modeling and measurement of the impact of Input/Output on system performance 390--399
Paul R. Wilson Pointer swizzling at page fault time: efficiently supporting huge address spaces on standard hardware . . . . . . 6--13 Morihiro Kuga and Kazuaki Murakami and Shinji Tomita DSNS (dynamically-hazard-resolved statically-code-scheduled, nonuniform superscalar): yet another superscalar processor architecture . . . . . . . . . 14--29 Carl Ponder Performance variation across benchmark suites . . . . . . . . . . . . . . . . . 30--36 Thomas M. Conte and Wen-mei W. Hwu A brief survey of benchmark usage in the architecture community . . . . . . . . . 37--44 Todd D. Morris and Edward F. Gehringer A cost-effective reliable multipath interconnection network . . . . . . . . 45--65 P. A. Laplante An improved conditional branching scheme for a single instruction computer architecture . . . . . . . . . . . . . . 66--68 Andrew J. DuBois and John Rasure Design and evaluation of a distributed asynchronous VLSI crossbar switch controller for a packet switched supercomputer network . . . . . . . . . 69--79 Stanley E. Lass The compiler controlled pack cache and messaging . . . . . . . . . . . . . . . 80--85 Theo Ungerer and Eberhard Zehendner A multi-level parallelism architecture 86--93 Wolfgang Matthes How many operation units are adequate? 94--108 Alberto R. Cunha and Carlos N. Ribeiro and José A. Marques The architecture of a memory management unit for object-oriented systems . . . . 109--116 Norman Matloff An argument against scalable cache coherency . . . . . . . . . . . . . . . 117--123 D. P. Rodohan and R. J. Glover An overview of the A architecture for optimisation problems in a logic programming environment . . . . . . . . 124--131 Stuart C. Wray Time-sequenced DMA for multimedia computers . . . . . . . . . . . . . . . 132--137 Ganesh Ramamoorthy and Alok N. Choudhary A bibliography for multiprocessor cache memories . . . . . . . . . . . . . . . . 138--153 Alan Jay Smith Second bibliography on Cache memories 154--182 Mark Thorson Usenet Nuggets . . . . . . . . . 
. . . . 185--191
David A. Patterson Towards guidelines for SIGARCH sponsored conferences . . . . . . . . . . . . . . 7--7 Yeong-Chang Maa and Dhiraj K. Pradhan and Dominique Thiébaut Two economical directory schemes for large-scale cache coherent multiprocessors . . . . . . . . . . . . 10--10 Mark Thorson Usenet Nuggets . . . . . . . . . . . . . 21--26 Vladimir G. Ivanovic Book review: Computation Structures by Stephen A. Ward and Robert H. Halstead, Jr. (MIT Press or McGraw-Hill, 1990) . . . . . . . . . . . . . . . . . 27--29 Moshe Krieger Book review: Multiprocessors by D. Tabak (Prentice-Hall, 1990) . . . . . . 27--29 John Fulcher Book review: The 68000 and 68020 Microprocessors: Hardware, Software and Interfacing Techniques by W. Triebel and A. Singh (Prentice Hall, 1991) . . . . . 29--30
Henry G. Baker Precise instruction scheduling without a precise machine model . . . . . . . . . 4--8 Robert McLaughlin Look-ahead branching hardware . . . . . 9--11 Thomas Beth and Volker Hatz A restricted crossbar implementation and its applications . . . . . . . . . . . . 12--16 Mark Thorson Usenet nuggets . . . . . . . . . . . . . 19--23 Robert Bernecky Book review: Past, Present, Parallel: A Survey of Available Parallel Computing Systems by Arthur Trew & Greg Wilson (Eds.), (Springer-Verlag 1991) 24--25
Jaswinder Pal Singh and Wolf-Dietrich Weber and Anoop Gupta SPLASH: Stanford parallel applications for shared-memory . . . . . . . . . . . 5--44 Eligiusz Wajda SPIRE: streaming processing with instructions release element . . . . . . 45--54 Yannick Deville and Jean Gobert A class of replacement policies for medium and high-associativity structures 55--64
Richard N. Zucker and Jean-Loup Baer A performance study of memory consistency models . . . . . . . . . . . 2--12 Pete Keleher and Alan L. Cox and Willy Zwaenepoel Lazy release consistency for software distributed shared memory . . . . . . . 13--21 Kourosh Gharachorloo and Anoop Gupta and John Hennessy Hiding memory latency using dynamic scheduling in shared-memory multiprocessors . . . . . . . . . . . . 22--33 Edil S. T. Fernandes and Fernando M. B. Barbosa Effects of building blocks on the performance of super-scalar architecture 36--45 Monica S. Lam and Robert P. Wilson Limits of control flow on parallelism 46--57 Manoj Franklin and Gurindar S. Sohi The expandable split window paradigm for exploiting fine-grain parallelism . . . 58--67 Daniel Litaize and Abdelaziz Mzoughi and Christine Rochange and Pascal Sainrat Towards a shared-memory massively parallel multiprocessor . . . . . . . . 70--79 Per Stenström and Truman Joe and Anoop Gupta Comparative performance evaluation of cache-coherent NUMA and COMA architectures . . . . . . . . . . . . . 80--91 Daniel Lenoski and James Laudon and Truman Joe and David Nakahira and Luis Stevens and Anoop Gupta and John Hennessy The DASH prototype: implementation and performance . . . . . . . . . . . . . . 92--103 Gideon Intrater and Ilan Spillinger Performance evaluation of a decoded instruction cache for variable instruction-length computers . . . . . . 106--113 J. Bradley Chen and Anita Borg and Norman P. Jouppi A simulation based study of TLB performance . . . . . . . . . . . . . . 114--123 Tse-Yu Yeh and Yale N. Patt Alternative implementations of two-level adaptive branch prediction . . . . . . . 124--134 Hiroaki Hirata and Kozo Kimura and Satoshi Nagamine and Yoshiyuki Mochizuki and Akio Nishimura and Yoshimori Nakase and Teiji Nishizawa An elementary processor architecture with simultaneous instruction issuing from multiple threads . . . . . . . . . 
136--145 Mitsuhisa Sato and Yuetsu Kodama and Shuichi Sakai and Yoshinori Yamaguchi and Yasuhito Koumura Thread-based programming for the EM-4 hybrid dataflow machine . . . . . . . . 146--155 R. S. Nikhil and G. M. Papadopoulos and Arvind T: a multithreaded massively parallel architecture . . . . . . . . . . . . . . 156--167 Czarek Dubnicki and Thomas J. LeBlanc Adjustable block size coherent caches 170--180 Kunle Olukotun and Trevor Mudge and Richard Brown Performance optimization of pipelined primary cache . . . . . . . . . . . . . 181--190 Scott McFarling Cache replacement with dynamic exclusion 191--200 Stephen W. Keckler and William J. Dally Processor coupling: integrating compile time and runtime scheduling for parallelism . . . . . . . . . . . . . . 202--213 Bob Boothe and Abhiram Ranade Improved multithreading techniques for hiding communication latency in multiprocessors . . . . . . . . . . . . 214--223 Alessandro De Gloria and Paolo Faraboschi Instruction-level parallelism in Prolog: analysis and architectural support . . . 224--233 Lizyamma Kurian and Paul T. Hulina and Lee D. Coraor Memory latency effects in decoupled architectures with a single data memory module . . . . . . . . . . . . . . . . . 236--245 André Seznec and Jacques Lenfant Interleaved parallel schemes: improving memory throughput on supercomputers . . 246--255 Thorsten von Eicken and David E. Culler and Seth Copen Goldstein and Klaus Erik Schauser Active messages: a mechanism for integrated communication and computation 256--266 Andrew A. Chien and Jae H. Kim Planar-adaptive routing: low-cost adaptive networks for multiprocessors 268--277 Christopher J. Glass and Lionel M. Ni The turn model for adaptive routing . . 278--287 Toshiyuki Shimizu and Takeshi Horie and Hiroaki Ishihata Low-latency message communication support for the AP1000 . . . . . . . . . 288--297 Barbara P. Aichinger Futurebus+ as an I/O bus: profile B . . 300--307 A. L. 
Narasimha Reddy A study of I/O system organizations . . 308--317 Jai Menon and Dick Mattson Comparison of sparing alternatives for disk arrays . . . . . . . . . . . . . . 318--329 Markus Siegle and Richard Hofmann Monitoring program behaviour on SUPRENUM 332--341 Todd M. Austin and Gurindar S. Sohi Dynamic dependency analysis of ordinary programs . . . . . . . . . . . . . . . . 342--351 Walid A. Najjar and W. Marcus Miller and A. P. Wim Böhm An analysis of loop latency in dataflow execution . . . . . . . . . . . . . . . 352--360 Qing Yang and Liping Wu Yang A novel cache design for vector processing . . . . . . . . . . . . . . . 362--371 Mateo Valero and Tomás Lang and José M. Llabería and Montse Peiron and Eduard Ayguadé and Juan J. Navarra Increasing the number of strides for conflict-free vector access . . . . . . 372--381 Wm. A. Wulf Evaluation of the WM architecture . . . 382--390 Kirk L. Johnson The impact of communication locality on large-scale multiprocessor performance 392--402 Steven L. Scott and James R. Goodman and Mary K. Vernon Performance of the SCI ring . . . . . . 403--414 Madhusudhan Talluri and Shing Kong and Mark D. Hill and David A. Patterson Tradeoffs in supporting two page sizes 415--424 Ahmed Louri and Jongwhoa Na Parallel electro-optical rule-based system for fast execution of expert systems (abstract) . . . . . . . . . . . 427--427 André Seznec and Karl Courtel OPAC (abstract): a floating-point coprocessor dedicated to compute-bound kernels . . . . . . . . . . . . . . . . 427--427 Der-Chung Cheng and Kanad Ghose The time-constrained barrier synchronizer and its applications in parallel systems (abstract) . . . . . . 428--428 Ahmed Louri and Hongki Sung A new compiler-directed cache coherence scheme for shared memory multiprocessors with fast and parallel explicit invalidation (abstract) . . . . . . . . 428--428 Gautam B. Singh Architecture of a graphics processor (abstract) . . . . . . . . . . . . . . . 
429--429 Ruben Yomtov Performance evaluation of disk subsystems . . . . . . . . . . . . . . . 429--429 Feipei Lai and Meng-chou Chang Enhancing boosting with semantic register in a superscalar processor (abstract) . . . . . . . . . . . . . . . 430--430 Ivan Sklenar Prefetch unit for vector operations on scalar computers (abstract) . . . . . . 430--430 Gary Newman Memory management support for tiled array organization (abstract) . . . . . 431--431 Augustus K. Uht and Darin B. Johnson Data path issues in a highly concurrent machine (abstract) . . . . . . . . . . . 431--431 Samuel A. Fineberg and Thomas L. Casavant and Brent H. Pease Seamless --- a latency-tolerant RISC-based multiprocessor architecture (abstract) . . . . . . . . . . . . . . . 432--432 M. A. Sayeed and M. Atiquzzaman Performance of multiple-bus multiprocessor under non-uniform memory reference model (abstract) . . . . . . . 432--432 M. Tahar Kechadi and J-L. Dekeyser and Ph. Marquet and Ph. Preux Performance improvement for vector pipeline multiprocessor systems using a disordered execution model(abstract) . . 433--433 Anujan Varma and Gunjan Sinha A class of prefetch schemes for on-chip data caches . . . . . . . . . . . . . . 433--433 Arthur Abnous and Nader Bagherzadeh Pipelining and bypassing in a VLIW processor (abstract) . . . . . . . . . . 434--434 Shiv Prakash and Alice C. Parker Synthesis of application-specific heterogeneous multiprocessor systems (abstract) . . . . . . . . . . . . . . . 434--434 Matthew Farrens and Arvin Park and Rob Fanfelle and Pius Ng and Gary Tyson A partitioned translation lookaside buffer approach to reducing address bandwidth (abstract) . . . . . . . . . . 435--435 James Laudon and Anoop Gupta and Mark Horowitz Architectural and implementation tradeoffs in the design of multiple-context processors (abstract) 435--435 Brian D. Alleyne and Isaac D. Scherson Expanded delta networks for very large parallel computers . . . . . . . . . . . 
436--436 Jaswinder Pal Singh Implications of hierarchical N-body methods for multiprocessor architecture 436--436 Wisam Michael Directory-based cache coherency protocol for a ring-connected multiprocessor-array . . . . . . . . . . 437--437 Wen-Hann Wang and Jim Quinlan and Konrad Lai Revisit the case for direct-mapped caches: a case for two-way set-associative level-two caches . . . . 437--437 David E. Culler and Michial Gunter and James C. Lee Analysis of multithreaded microprocessors under multiprogramming 438--438 C. M. Wittenbrink and A. K. Somani and C. H. Chen Cache write generate for high performance parallel processing . . . . 438--438 Walter H. Burkhardt and Stefan Rust Integrated computer architecture development system . . . . . . . . . . . 439--439
R. J. Chevance An evaluation methodology for microprocessor and system architecture 4--13 Michael Laird A comparison of three current superscalar designs . . . . . . . . . . 14--21 Jack J. Dongarra Performance of various computers using standard linear equations software . . . 22--44 William F. Keown, Jr. and Philip Koopman, Jr. and Aaron Collins Performance of the HARRIS RTX 2000 stack architecture versus the Sun 4 SPARC and the Sun 3 M68020 Architectures . . . . . 45--52 Mark Thorson Usenet Nuggets . . . . . . . . . . . . . 56--62 Siddhartha Chatterjee Book review: The Impact of Vector and Parallel Architectures on the Gaussian Elimination Algorithm by Yves Robert (Manchester University Press and Halsted Press, 1991) . . . . . . . . . . 63--64
Margarita Esponda and Raúl Rojas A graphical comparison of RISC processors . . . . . . . . . . . . . . . 2--8 Shogo Matsui Dynamic refresh method for dynamic RAMs 9--16 Arvin Park and Ron Maeder Codes to reduce switching transients across VLSI I/O pins . . . . . . . . . . 17--21 Gary Newman Memory management support for tiled array organization . . . . . . . . . . . 22--30 Ivan Sklenář Prefetch unit for vector operations on scalar computers . . . . . . . . . . . . 31--37 Nadeem Malik and Richard J. Eickemeyer and Stamatis Vassiliadis Instruction-level parallelism from execution interlock collapsing . . . . . 38--43 Stamatis Vassiliadis and Bart Blaner and Richard J. Eickemeyer On the attributes of the SCISM organization . . . . . . . . . . . . . . 44--53 Mark Thorson Usenet nuggets . . . . . . . . . . . . . 56--64 Ken Allen Book review: Computing with Parallel Architectures: T.Node, edited by D. Gassilloud and J. C. Grossetie (Kluwer Academic Publishers 1991) . . . . . . . 65--66
Gavin Michael and Andrew Chien Future multicomputers: beyond minimalist multiprocessors? . . . . . . . . . . . . 6--12 R. P. Kaushal and J. S. Bedi Comparison of hypercube, hypernet, and symmetric hypernet architectures . . . . 13--25 Mark Thorson Usenet Nuggets . . . . . . . . . . . . . 28--33 David Levy Book review: Neural Networks and Fuzzy Systems: A Dynamical Systems Approach to Machine Intelligence by Bart Kosko (Prentice Hall 1992) . . . . . . . 34--34
Atsushi Inoue and Kenji Takeda Performance evaluation for various configuration of superscalar processors 4--11 Augustus K. Uht Extraction of massive instruction level parallelism . . . . . . . . . . . . . . 12--14 Nasr Ullah and Matt Holle The MC88110 implementation of precise exceptions in a superscalar architecture 15--25 Yannick Deville A process-dependent partitioning strategy for cache memories . . . . . . 26--33 Mark Thorson Usenet Nuggets . . . . . . . . . . . . . 36--38 ACM SIGARCH Computer Architecture News Staff Book reviews . . . . . . . . . . . . . . 39--39
R. Cypher and A. Ho and S. Konstantinidou and P. Messina Architectural requirements of parallel scientific applications with explicit communication . . . . . . . . . . . . . 2--13 Edward Rothberg and Jaswinder Pal Singh and Anoop Gupta Working sets, cache sizes, and node granularity issues for large-scale multiprocessors . . . . . . . . . . . . 14--26 David Nagle and Richard Uhlig and Tim Stanley and Stuart Sechrest and Trevor Mudge and Richard Brown Design tradeoffs for software-managed TLBs . . . . . . . . . . . . . . . . . . 27--38 Jerry Huck and Jim Hays Architectural support for translation table management in large address space machines . . . . . . . . . . . . . . . . 39--50 Pei Cao and Swee Boon Lim and Shivakumar Venkataraman and John Wilkes The TickerTAIP parallel RAID architecture . . . . . . . . . . . . . . 52--63 Daniel Stodolsky and Garth Gibson and Mark Holland Parity logging overcoming the small write problem in redundant disk arrays 64--75 Jai Menon and Jim Cortney The architecture of a fault-tolerant cached RAID controller . . . . . . . . . 76--87 Michel Dubois and Jonas Skeppstedt and Livio Ricciulli and Krishnan Ramamurthy and Per Stenström The detection and elimination of useless misses in multiprocessors . . . . . . . 88--97 Alan L. Cox and Robert J. Fowler Adaptive cache coherency for detecting migratory shared data . . . . . . . . . 98--108 Per Stenström and Mats Brorsson and Lars Sandberg An adaptive cache coherence protocol optimized for migratory sharing . . . . 109--118 Carl A. Waldspurger and William E. Weihl Register relocation: flexible contexts for multithreading . . . . . . . . . . . 120--130 Yasuo Hidaka and Hanpei Koike and Hidehiko Tanaka Multiple threads in cyclic register windows . . . . . . . . . . . . . . . . 131--142 Sandhya Dwarkadas and Peter Keleher and Alan L. Cox and Willy Zwaenepoel Evaluation of release consistent software distributed shared memory on emerging network technology . . . . . . 144--155 David A. 
Wood and Satish Chandra and Babak Falsafi and Mark D. Hill and James R. Larus and Alvin R. Lebeck and James C. Lewis and Shubhendu S. Mukherjee and Subbarao Palacharla and Steven K. Reinhardt Mechanisms for cooperative shared memory 156--167 André Seznec A case for two-way skewed-associative caches . . . . . . . . . . . . . . . . . 169--178 Anant Agarwal and Stephen D. Pudar Column-associative caches: a technique for reducing the miss rate of direct-mapped caches . . . . . . . . . . 179--190 Norman P. Jouppi Cache write policies and performance . . 191--201 Eric L. Boyd and Edward S. Davidson Hierarchical performance modeling with MACS: a case study of the Convex C-240 203--210 D. Kuck and E. Davidson and D. Lawrie and A. Sameh and C. Q. Zhu and A. Veidenbaum and J. Konicek and P. Yew and K. Gallivan and W. Jalby and H. Wijshoff and R. Bramley and U. M. Yang and P. Emrath and D. Padua and R. Eigenmann and J. Hoeflinger and G. Jaxon and Z. Li and T. Murphy and J. Andrews The cedar system and an initial performance study . . . . . . . . . . . 213--223 Michael D. Noakes and Deborah A. Wallach and William J. Dally The J-machine multicomputer: an architectural evaluation . . . . . . . . 224--235 John Bunda and Don Fussell and W. C. Athas and Roy Jenevein 16-bit vs. 32-bit instructions for pipelined microprocessors . . . . . . . 237--246 Tokuzo Kiyohara and Scott Mahlke and William Chen and Roger Bringmann and Richard Hank and Sadun Anik and Wen-Mei Hwu Register connection: a new approach to adding registers into instruction set architectures . . . . . . . . . . . . . 247--256 Tse-Yu Yeh and Yale N. Patt A comparison of dynamic branch predictors that use two levels of branch history . . . . . . . . . . . . . . . . 257--266 Luis André Barroso and Michel Dubois The performance of cache-coherent ring-based multiprocessors . . . . . . . 268--277 Dean M. Tullsen and Susan J. Eggers Limitations of cache prefetching on a bus-based multiprocessor . . . . . . . . 
278--288 Maurice Herlihy and J. Eliot B. Moss Transactional memory: architectural support for lock-free data structures 289--300 Ellen Spertus and Seth Copen Goldstein and Klaus Erik Schauser and Thorsten von Eicken and David E. Culler and William J. Dally Evaluation of mechanisms for fine-grained parallel programs in the J-machine and the CM-5 . . . . . . . . . 302--313 Takeshi Horie and Kenichi Hayashi and Toshiyuki Shimizu and Hiroaki Ishihata Improving AP1000 parallel computer performance with message communication 314--325 W.-C. Hsu and J. E. Smith Performance of cached DRAM organizations in vector supercomputers . . . . . . . . 327--336 Q. S. Gao The Chinese remainder theorem and the prime memory system . . . . . . . . . . 337--340 André Seznec and Jacques Lenfant Odd memory systems may be quite interesting . . . . . . . . . . . . . . 341--350 Rajendra V. Boppana and Suresh Chalasani A comparison of adaptive wormhole routing algorithms . . . . . . . . . . . 351--360
Augustus K. Uht Extraction of massive instruction level parallelism . . . . . . . . . . . . . . 5--12 Gowri Ramanathan and Joel Oren Survey of commercial parallel machines 13--33 Benjamin J. Ewy and Joseph B. Evans Secondary cache performance in RISC architecture . . . . . . . . . . . . . . 34--37 Iraj Danesh Physical limitations of a computer . . . 40--45 Mark Thorson Usenet nuggets . . . . . . . . . . . . . 46--49 Gary Fostel Book Reviews: Principles of Computer Systems by Gerald M. Karam & John C. Bryant (Prentice Hall 1992) . . . . . . 50--51 Gary Fostel Book Review: Computer Architecture by Mario De Blasi (Addison-Wesley Publishing Company, 1990) . . . . . . . 51--53 John Fulcher Book Review: Practical Parallel Computing by Paul Messina and Almerico Murli, Editors (John Wiley and Sons, 1992) . . . . . . . . . . . . . . . . . 53--54
Mark D. Hill and James R. Larus and Alvin R. Lebeck and Madhusudhan Talluri and David A. Wood Wisconsin Architectural Research Tool Set . . . . . . . . . . . . . . . . . . 8--10 Craig Hyatt A high-performance object-oriented memory . . . . . . . . . . . . . . . . . 11--19 Gautam Dewan and V. S. S. Nair A case for uniform memory access multiprocessors . . . . . . . . . . . . 20--26 Mark Thorson Usenet Nuggets . . . . . . . . . . . . . 27--28 Glen Langdon Book Reviews . . . . . . . . . . . . . . 29--29
Ravi Jain and John Werth and J. C. Browne Introduction to the Special Issue on Input/Output in Parallel Computer Systems . . . . . . . . . . . . . . . . 5--6 Peter F. Corbett and Sandra Johnson Baylor and Dror G. Feitelson Overview of the Vesta parallel file system . . . . . . . . . . . . . . . . . 7--14 Z. Lin and S. Zhou Parallelizing I/O intensive applications for a workstation cluster: a case study 15--22 Samuel A. Fineberg Implementing the NHT-1 application I/O benchmark . . . . . . . . . . . . . . . 23--30 Juan Miguel del Rosario and Rajesh Bordawekar and Alok Choudhary Improved parallel I/O via a two-phase run-time access strategy . . . . . . . . 31--38 Shahram Ghandeharizadeh and Cyrus Shahabi and Luis Ramos An overview of techniques to support continuous retrieval of multimedia objects . . . . . . . . . . . . . . . . 39--46 Ravi Jain and Kiran Somalwar and John Werth and J. C. Browne Scheduling parallel I/O operations . . . 47--54 Qiang Li and Naphtali Rishe A transputer T9000 family based architecture for parallel database machines . . . . . . . . . . . . . . . . 55--62 Claus Aßmann A RISC processor architecture with a versatile stack system . . . . . . . . . 63--70 Dajin Wang A note on ``Diagnosabilities of hypercubes under the pessimistic one-step diagnosis strategy'' . . . . . 71--78 Mark Thorson Usenet Nuggets . . . . . . . . . . . . . 79--85 Bob Alverson Book Review: High-Speed Digital Design: A Handbook of Black Magic by Howard W. Johnson and Martin Graham (Prentice-Hall, 1993) . . . . . . . . . 85--86
Robert Iannucci and Anant Agarwal and Bill Dally and Anoop Gupta and Greg Papadopoulos and Burton Smith Architectural and implementation issues for multithreading (panel session I) . . 3--18 Burt Halstead and David Callahan and Jack Dennis and R. S. Nikhil and Vivek Sarkar Programming, compilation, and resource management issues for multithreading (panel session II) . . . . . . . . . . . 19--33 Henry G. Baker Linear logic and permutation stacks---the Forth shall be first . . . 34--43 Abraham Mendlson and Shlomit S. Pinter and Ruth Shtokhamer Compile time instruction cache optimizations . . . . . . . . . . . . . 44--51 David Barach and Jaspal Kohli and John Slice and Marc Spaulding and Rajeev Bharadhwaj and Don Hudson and Cliff Neighbors and Nirmal Saxena and Rolland Crunk HALSIM---a very fast SPARC V9 behavioral model . . . . . . . . . . . . . . . . . 52--58 Mark Thorson Usenet Nuggets . . . . . . . . . . . . . 59--60 Ewerton Longoni Madruga Book Review: SNMP, SNMPv2, and CMIP: The Practical Guide to Network Management Standards by William Stallings (Addison-Wesley Publishing Company Inc. 1993) . . . . . . . . . . . 60--61
B. Calder and D. Grunwald Fast and accurate instruction fetch and branch prediction . . . . . . . . . . . 2--11 A. R. Talcott and W. Yamamoto and M. J. Serrano and R. C. Wood and M. Nemirovsky The impact of unresolved branches on branch prediction scheme performance . . 12--21 S. Palacharla and R. E. Kessler Evaluating stream buffers as a secondary cache replacement . . . . . . . . . . . 24--33 N. P. Jouppi and S. J. E. Wilton Tradeoffs in two-level on-chip caching 34--45 A. Singhal and A. J. Goldberg Architectural support for performance tuning: a case study on the SPARCcenter 2000 . . . . . . . . . . . . . . . . . . 48--59 Z. Cvetanovic and D. Bhandarkar Characterization of Alpha AXP performance using TP and SPEC workloads 60--70 C. Natarajan and S. Sharma and R. K. Iyer Measurement-based characterization of global memory and network contention, operating system and parallelization overheads . . . . . . . . . . . . . . . 71--80 T. Joe and J. L. Hennessy Evaluating the memory overhead required for COMA architectures . . . . . . . . . 82--93 A. C. Klaiber and H. M. Levy A comparison of message passing and shared memory architectures for data parallel programs . . . . . . . . . . . 94--105 A. L. Cox and S. Dwarkadas and P. Keleher and H. Lu and R. Rajamony and W. Zwaenepoel Software versus hardware shared-memory implementation: a case study . . . . . . 106--117 D. N. Pnevmatikatos and G. S. Sohi Guarded execution and branch prediction in dynamic ILP processors . . . . . . . 120--129 C.-L Su and A. M. Despain Branch with masked squashing in superpipelined processors . . . . . . . 130--140 M. A. Blumrich and K. Li and R. Alpert and C. Dubnicki and E. W. Felten and J. Sandberg Virtual memory mapped network interface for the SHRIMP multicomputer . . . . . . 142--153 P. Steenkiste and M. Hemy and T. Mummert and B. Zill Architecture and evaluation of a high-speed networking subsystem for distributed-memory systems . . . . . . . 154--163 B. A. Nayfeh and K. 
Olukotun Exploring the design space for a shared-cache multiprocessor . . . . . . 166--175 R. Thekkath and S. J. Eggers Impact of sharing-based thread placement on multithreaded architectures . . . . . 176--186 F. Dahlgren and M. Dubois and P. Stenström Combined performance gains of simple cache protocol extensions . . . . . . . 187--197 A. S. Huang and G. Slavenburg and J. P. Shen Speculative disambiguation: a compilation technique for dynamic memory disambiguation . . . . . . . . . . . . . 200--210 K. I. Farkas and N. P. Jouppi Complexity/performance tradeoffs with non-blocking loads . . . . . . . . . . . 211--222 T.-F. Chen and J.-L. Baer A performance study of software and hardware data prefetching schemes . . . 223--232 A. L. Drapeau and K. W. Shirriff and J. H. Hartman and E. L. Miller and S. Seshan and R. H. Katz and K. Lutz and D. A. Patterson and E. K. Lee and P. M. Chen and G. A. Gibson RAID-II: a high-bandwidth network file server . . . . . . . . . . . . . . . . . 234--244 M. Blaum and J. Brady and J. Bruck and J. Menon EVENODD: an optimal scheme for tolerating double disk failures in RAID architectures . . . . . . . . . . . . . 245--254 S. W. Ng Crosshatch disk array for improved reliability and performance . . . . . . 255--264 A. DeHon and F. Chong and M. Becker and E. Egozy and H. Minsky and S. Peretz and T. F. Knight, Jr. METRO: a router architecture for high-performance, short-haul routing networks . . . . . . . . . . . . . . . . 266--277 J. D. Allen and P. T. Gaughan and D. E. Schimmel and S. Yalamanchili Ariadne---an adaptive router for fault-tolerant multicomputers . . . . . 278--288 J. H. Kim and Z. Liu and A. A. Chien Compressionless routing: a framework for adaptive and fault-tolerant routing . . 289--300 J. Kuskin and D. Ofelt and M. Heinrich and J. Heinlein and R. Simoni and K. Gharachorloo and J. Chapin and D. Nakahira and J. Baxter and M. Horowitz and A. Gupta and M. Rosenblum and J. Hennessy The Stanford FLASH multiprocessor . . . 
302--313 D. Chaiken and A. Agarwal Software-extended coherent shared memory: performance and cost . . . . . . 314--324 S. K. Reinhardt and J. R. Larus and D. A. Wood Tempest and Typhoon: user-level shared memory . . . . . . . . . . . . . . . . . 325--336 M. Farrens and G. Tyson and A. R. Pleszkun A study of single-chip processor/cache organizations for large numbers of transistors . . . . . . . . . . . . . . 338--347 C.-H. Chen and A. K. Somani A unified architectural tradeoff methodology . . . . . . . . . . . . . . 348--357 D. Nagle and R. Uhlig and T. Mudge and S. Sechrest Optimal allocation of on-chip memory for multiple-API operating systems . . . . . 358--369 R. W. Quong Expected I-cache miss rates via the gap model . . . . . . . . . . . . . . . . . 372--383 A. Seznec Decoupled sectored caches: conciliating low tag implementation cost . . . . . . 384--393
J. R. Gurd Supercomputing: big bang or steady state growth? . . . . . . . . . . . . . . . . 3--13 Kay P. Litchfield Instruction execution sequence confirmation . . . . . . . . . . . . . . 14--18 Phil Allen and Franc Brglez and Hal Carter and Robert Caverly and Jerry Dillion and Albert Lo and Ron Lomax and John Oldfield and Cesar Pina and T. J. Wilkinson Report of the 1993 Workshop on Rapid Prototyping of Microelectronic Systems for Universities . . . . . . . . . . . . 19--26 Mark Thorson Usenet Nuggets . . . . . . . . . . . . . 27--28 Ewerton Longoni Madruga Book Review: Internetworking with TCP/IP, vol. III: Client-Server programming and applications (BSD Sockets version) by Douglas E. Comer and David L. Stevens (Prentice-Hall, 1993) 29--30
Ravi Jain and John Werth and J. C. Browne Special Issue on Input/Output in Parallel Computer Systems: Introduction 3--4 Sandra Johnson Baylor and Caroline Benveniste and Yarsun Hsu Performance evaluation of a massively parallel I/O subsystem . . . . . . . . . 5--10 James B. Sinclair and Jay Tang and Peter J. Varman Instability in parallel I/O systems . . 11--16 Steven H. Vanderleest and Ravishankar K. Iyer Measurement of I/O bus contention and correlation among heterogeneous device types in a single-bus multiprocessor system . . . . . . . . . . . . . . . . . 17--22 Rajeev Thakur and Rajesh Bordawekar and Alok Choudhary Compilation of out-of-core data parallel programs for distributed memory machines 23--28 Abhaya Asthana and Mark Cravatts and Paul Krzyzanowski An experimental active memory based I/O subsystem . . . . . . . . . . . . . . . 29--34 Dannie Durand and Ravi Jain and David Tseytlin Distributed scheduling algorithms to improve the performance of parallel data transfers . . . . . . . . . . . . . . . 35--40 Haruo Yokota DR-nets: data-reconstruction networks for highly reliable parallel-disk systems . . . . . . . . . . . . . . . . 41--46 Martti J. Forsell Are multiport memories physically feasible? . . . . . . . . . . . . . . . 47--54 Ghulam Chaudhry and Xuechang Li A case for the multithreaded processor architecture . . . . . . . . . . . . . . 55--59 Yin Chan and Ashok Sudarsanam and Andrew Wolfe The effect of compiler-flag tuning on SPEC benchmark performance . . . . . . . 60--70 Jin-Ho Lee and Min-Young Lee and Seong-Uk Choi and Myong-Soon Park Reducing cache conflicts in data cache prefetching . . . . . . . . . . . . . . 71--77 Mark Thorson Usenet Nuggets . . . . . . . . . . . . . 78--81
Martti J. Forsell Are multiport memories physically feasible? . . . . . . . . . . . . . . . 3--10 Rok Sosič History cache: hardware support for reverse execution . . . . . . . . . . . 11--18 Mark D. Hill and James R. Larus and David A. Wood The Wisconsin Wind Tunnel project: an annotated bibliography . . . . . . . . . 19--26 Avijit Saha and Nadeem Malik Distributed directory tags . . . . . . . 27--29 Ishaq H. Unwala and Harvey G. Cragon A study of MIPS programs . . . . . . . . 30--40 Mark Thorson Internet Nuggets . . . . . . . . . . . . 41--46 Kenneth R. Ohnemus and Diana F. Mallin Benefits of implementing on-line methods and procedures . . . . . . . . . . . . . 49--55 Daniel K. Cunningham and Steven J. Reilly Leading the design team---the evolution of the technical writer from a support role to a design role . . . . . . . . . 56--60 Ann Rockley Multimedia: towards an electronic performance support system . . . . . . . 61--65 Katherine E. Drew Telecommunicators and telecommuters: making multiple-site documentation projects work . . . . . . . . . . . . . 66--75
Aimee Severson and Brent Nelson Throughput in a counterflow pipeline processor . . . . . . . . . . . . . . . 5--12 Tsong-Chih Hsu and Sheng-De Wang A simple architecture for constant time sorting machines . . . . . . . . . . . . 13--19 Wm. A. Wulf and Sally A. McKee Hitting the memory wall: implications of the obvious . . . . . . . . . . . . . . 20--24 Mark Thorson Internet Nuggets . . . . . . . . . . . . 25--28
Anant Agarwal and Ricardo Bianchini and David Chaiken and Kirk L. Johnson and David Kranz and John Kubiatowicz and Beng-Hong Lim and Kenneth Mackenzie and Donald Yeung The MIT Alewife machine: architecture and performance . . . . . . . . . . . . 2--13 Yuetsu Kodama and Hirohumi Sakane and Mitsuhisa Sato and Hayato Yamana and Shuichi Sakai and Yoshinori Yamaguchi The EM-X parallel computer: architecture and basic performance . . . . . . . . . 14--23 Steven Cameron Woo and Moriyoshi Ohara and Evan Torrie and Jaswinder Pal Singh and Anoop Gupta The SPLASH-2 programs: characterization and methodological considerations . . . 24--36 Håkan Grahn and Per Stenström Efficient strategies for software-only protocols in shared-memory multiprocessors . . . . . . . . . . . . 38--47 Alvin R. Lebeck and David A. Wood Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors . . . . . . . . . . . . 48--59 Fredrik Dahlgren Boosting the performance of hybrid snooping cache protocols . . . . . . . . 60--69 Andreas G. Nowatzyk and Michael C. Browne and Edmund J. Kelly and Michael Parkin S-connect: from networks of workstations to supercomputer performance . . . . . . 71--82 Anujan Varma and Quinn Jacobson Destage algorithms for disk arrays with non-volatile caches . . . . . . . . . . 83--95 Gordon Stoll and Bin Wei and Douglas Clark and Edward W. Felten and Kai Li and Patrick Hanrahan Evaluating multi-port frame buffer designs for a mesh-connected multicomputer . . . . . . . . . . . . . 96--105 Andreas G. Nowatzyk and Paul R. Prucnal Are crossbars really dead?: the case for optical multiprocessor interconnect systems . . . . . . . . . . . . . . . . 106--115 Stéphan Jourdan and Pascal Sainrat and Daniel Litaize Exploring configurations of functional units in an out-of-order superscalar processor . . . . . . . . . . . . . . . 
117--125 Hideki Ando and Chikako Nakanishi and Tetsuya Hara and Masao Nakaya Unconstrained speculative execution with predicated state buffering . . . . . . . 126--137 Scott A. Mahlke and Richard E. Hank and James E. McCormick and David I. August and Wen-Mei W. Hwu A comparison of full and partial predicated execution support for ILP processors . . . . . . . . . . . . . . . 138--150 M. Simone and A. Essen and A. Ike and A. Krishnamoorthy and T. Maruyama and N. Patkar and M. Ramaswami and M. Shebanow and V. Thirumalaiswamy and D. Tovey Implementation trade-offs in using a restricted data flow architecture in a high performance RISC microprocessor . . 151--162 Trung A. Diep and Christopher Nelson and John Paul Shen Performance evaluation of the PowerPC 620 microarchitecture . . . . . . . . . 163--174 Theodore H. Romer and Wayne H. Ohlrich and Anna R. Karlin and Brian N. Bershad Reducing TLB and memory overhead using online superpage promotion . . . . . . . 176--187 Zheng Zhang and Josep Torrellas Speeding up irregular applications in shared-memory multiprocessors: memory binding and group prefetching . . . . . 188--199 K. V. Anjan and Timothy Mark Pinkston An efficient, fully adaptive deadlock recovery scheme: DISHA . . . . . . . . . 201--210 Kang G. Shin and Stuart W. Daniel Analysis and implementation of hybrid switching . . . . . . . . . . . . . . . 211--219 Binh Vien Dao and Jose Duato and Sudhakar Yalamanchili Configurable flow control mechanisms for fault-tolerant routing . . . . . . . . . 220--229 Timothy Callahan and Seth Copen Goldstein NIFDY: a low overhead, high throughput network interface . . . . . . . . . . . 230--241 Montse Peiron and Mateo Valero and Eduard Ayguadé and Tomás Lang Vector multiprocessors with arbitrated memory access . . . . . . . . . . . . . 243--252 Krishna M. Kavi and A. R. Hurson and Phenil Patadia and Elizabeth Abraham and Ponnarasu Shanmugam Design of cache memories for multi-threaded dataflow architecture . . 
253--264 François Bodin and André Seznec Skewed associativity enhances performance predictability . . . . . . . 265--274 Cliff Young and Nicolas Gloy and Michael D. Smith A comparative analysis of schemes for correlated branch prediction . . . . . . 276--286 Brad Calder and Dirk Grunwald Next cache line and set prediction . . . 287--296 Vijay Karamcheti and Andrew A. Chien A comparison of architectural support for messaging in the TMC CM-5 and the Cray T3D . . . . . . . . . . . . . . . . 298--307 T. Stricker and T. Gross Optimizing memory system performance for communication in parallel computers . . 308--319 Remzi H. Arpaci and David E. Culler and Arvind Krishnamurthy and Steve G. Steinberg and Katherine Yelick Empirical evaluation of the CRAY-T3D: a compiler perspective . . . . . . . . . 320--331 Thomas M. Conte and Kishore N. Menezes and Patrick M. Mills and Burzin A. Patel Optimization of instruction fetch mechanisms for high issue rates . . . . 333--344 Richard Uhlig and David Nagle and Trevor Mudge and Stuart Sechrest and Joel Emer Instruction fetching: coping with code bloat . . . . . . . . . . . . . . . . . 345--356 Dennis Lee and Jean-Loup Baer and Brad Calder and Dirk Grunwald Instruction cache fetch policies for speculative execution . . . . . . . . . 357--367 Todd M. Austin and Dionisios N. Pnevmatikatos and Gurindar S. Sohi Streamlining data cache access with fast address calculation . . . . . . . . . . 369--380 Hong Wang and Tong Sun and Qing Yang CAT---caching address tags: a technique for reducing area cost of on-chip caches 381--390 Dean M. Tullsen and Susan J. Eggers and Henry M. Levy Simultaneous multithreading: maximizing on-chip parallelism . . . . . . . . . . 392--403 Richard C. Ho and C. Han Yang and Mark A. Horowitz and David L. Dill Architecture validation for processors 404--413 Gurindar S. Sohi and Scott E. Breach and T. N. Vijaykumar Multiscalar processors . . . . . . . . . 414--425
Carl J. Beckmann HTGL: a program modelling language . . . 3--10 Jean-Louis Lafitte On structured data handling in parallel processing . . . . . . . . . . . . . . . 11--18 B. Ulmann oμ-EP-1: a simple 32-bit architecture . . . . . . . . . . . . . . 19--24 Mark Thorson Internet Nuggets . . . . . . . . . . . . 25--27 Daniel Tabak Cache and Memory Hierarchy Design: A Performance-Directed Approach by Steven A. Przybylski . . . . . . . . . . . . . 28--28
Maurice V. Wilkes The memory wall and the CMOS end-point 4--6 Eric E. Johnson Graffiti on ``the memory wall'' . . . . 7--8 Tariq Afzal Performance modeling using the Motorola PowerPC timing simulator . . . . . . . . 9--18 Behrooz Parhami SIMD machines: do they have a significant future? . . . . . . . . . . 19--22 Ravi Jain and John Werth Airdisks and airRAID (expanded extract): modeling and scheduling periodic wireless data broadcast . . . . . . . . 23--28 Leonidas I. Kontothanassis and Michael L. Scott Efficient shared memory with minimal hardware support . . . . . . . . . . . . 29--35
Michael K. Gschwind and Thomas J. Pietsch Vector prefetching . . . . . . . . . . . 1--7 Ramesh K. Karne Object-oriented computer architectures for new generation of applications . . . 8--19 Humayun Khalid The unconventional replacement algorithms . . . . . . . . . . . . . . . 20--26 Humayun Khalid A trace-driven simulation methodology 27--33 Nikki Mirghafori and Margret Jacoby and David Patterson Truth in SPEC benchmarks . . . . . . . . 34--42 Mark Thorson Internet Nuggets . . . . . . . . . . . . 43--44
Trevor Mudge Report on the panel: ``How Can Computer Architecture Researchers Avoid Becoming the Society for Irreproducible Results?'' . . . . . . . . . . . . . . . 1--5 Oh-Young Kwon and Gi-Ho Park and Tack-Don Han A compiler optimization to reduce execution time of loop nest . . . . . . 6--11 Mark Thorson Internet Nuggets . . . . . . . . . . . . 12--16 Daniel Tabak Book Review: Alpha Implementations and Architecture by Dileep P. Bhandarkar 17--18
Marius Evers and Po-Yung Chang and Yale N. Patt Using hybrid branch predictors to improve branch prediction accuracy in the presence of context switches . . . . 3--11 Nicolas Gloy and Cliff Young and J. Bradley Chen and Michael D. Smith An analysis of dynamic branch prediction schemes on system workloads . . . . . . 12--21 Stuart Sechrest and Chih-Chieh Lee and Trevor Mudge Correlation and aliasing in dynamic branch predictors . . . . . . . . . . . 22--32 Steven K. Reinhardt and Robert W. Pfile and David A. Wood Decoupled hardware support for distributed shared memory . . . . . . . 34--43 Donald Yeung and John Kubiatowicz and Anant Agarwal MGS: a multigrain shared memory system 44--55 Christine Morin and Alain Gefflaut and Michel Banâtre and Anne-Marie Kermarrec COMA: an opportunity for building fault-tolerant scalable shared memory multiprocessors . . . . . . . . . . . . 56--65 Basem A. Nayfeh and Lance Hammond and Kunle Olukotun Evaluation of design alternatives for a multiprocessor microprocessor . . . . . 67--77 Doug Burger and James R. Goodman and Alain Kägi Memory bandwidth limitations of future microprocessors . . . . . . . . . . . . 78--89 Ashley Saulsbury and Fong Pong and Andreas Nowatzyk Missing the memory wall: the case for processor/memory integration . . . . . . 90--101 André Seznec Don't use the page number, but a pointer to it . . . . . . . . . . . . . . . . . 104--113 Toni Juan and Tomás Lang and Juan J. Navarro The difference-bit cache . . . . . . . . 114--120 Liviu Iftode and Jaswinder Pal Singh and Kai Li Understanding application performance on shared virtual memory systems . . . . . 122--133 Chris Holt and Jaswinder Pal Singh and John Hennessy Application and architectural bottlenecks in large scale distributed shared memory machines . . . . . . . . . 134--145 Kenneth M. Wilson and Kunle Olukotun and Mendel Rosenblum Increasing cache port efficiency for dynamic superscalar microprocessors . . 147--157 Todd M. Austin and Gurindar S. 
Sohi High-bandwidth address translation for multiple-issue processors . . . . . . . 158--167 Yiming Hu and Qing Yang DCD---disk caching disk: a new approach for boosting I/O performance . . . . . . 169--178 Olivier Maquelin and Guang R. Gao and Herbert H. J. Hum and Kevin B. Theobald and Xin-Min Tian Polling watchdog: combining polling and interrupts for efficient message handling . . . . . . . . . . . . . . . . 179--188 Dean M. Tullsen and Susan J. Eggers and Joel S. Emer and Henry M. Levy and Jack L. Lo and Rebecca L. Stamm Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor . . . . . . . . 191--202 Richard J. Eickemeyer and Ross E. Johnson and Steven R. Kunkel and Mark S. Squillante and Shiafun Liu Evaluation of multithreaded uniprocessors for commercial application environments . . . . . . . . . . . . . . 203--212 Tetsuya Hara and Hideki Ando and Chikako Nakanishi and Masao Nakaya Performance comparison of ILP machines with cycle time evaluation . . . . . . . 213--224 Jae H. Kim and Andrew A. Chien Rotating combined queueing (RCQ): bandwidth and latency guarantees in low-cost, high-performance networks . . 226--236 Jennifer Rexford and John Hall and Kang G. Shin A router architecture for real-time point-to-point networks . . . . . . . . 237--246 Shubhendu S. Mukherjee and Babak Falsafi and Mark D. Hill and David A. Wood Coherent network interfaces for fine-grain communication . . . . . . . . 247--258 Mark Horowitz and Margaret Martonosi and Todd C. Mowry and Michael D. Smith Informing memory operations: providing memory performance feedback in modern processors . . . . . . . . . . . . . . . 260--270 Chun Xia and Josep Torrellas Instruction prefetching of systems codes with layout optimized for reduced cache misses . . . . . . . . . . . . . . . . . 271--282 Lynn Choi and Pen-Chung Yew Compiler and hardware support for cache coherence in large-scale multiprocessors: design considerations and performance study . 
. . . . . . . . 283--294 Edward W. Felten and Richard D. Alpert and Angelos Bilas and Matthias A. Blumrich and Douglas W. Clark and Stefanos N. Damianakis and Cezary Dubnicki and Liviu Iftode and Kai Li Early experience with message-passing on the SHRIMP multicomputer . . . . . . . . 296--307 Tom Lovett and Russell Clapp STiNG: a CC-NUMA computer system for the commercial marketplace . . . . . . . . . 308--317
J. Carretero and F. Pérez and P. de Miguel and F. García and L. Alonso A massively parallel and distributed I/O subsystem . . . . . . . . . . . . . . . 1--8 W. B. Ligon III and Daniel C. Stanzione, Jr. Distributing and load-balancing for loops in scientific applications . . . . 9--17 Samson Belayneh and David R. Kaeli A discussion on non-blocking/lockup-free caches . . . . . . . . . . . . . . . . . 18--25 Mark Thorson Internet Nuggets . . . . . . . . . . . . 26--32
Gerard Páez-Monzón and Charles Páez-Monzón The RISC processor DMN-6: a unified data-control flow architecture . . . . . 3--10 J. A. Gómez Pulido and J. M. Sánchez Pérez and J. A. Moreno Zamora An educational tool for testing hierarchical multilevel caches . . . . . 11--15 Samson Belayneh and David R. Kaeli A discussion on non-blocking/lockup-free caches . . . . . . . . . . . . . . . . . 16--16 Mark Rosenbaum Architectural potholes . . . . . . . . . 17--18 John Mashey Architectural potholes . . . . . . . . . 18--18 Adrian Cockcroft I/O potholes . . . . . . . . . . . . . . 18--19 Zahir Ebrahim I/O potholes . . . . . . . . . . . . . . 19--20 Brad Carlile Interpreting benchmarks . . . . . . . . 20--21 David Chase Register windows . . . . . . . . . . . . 21--21 Paul W. DeMone Register windows and delay slots . . . . 21--22
Charlton D. Rose and J. Kelly Flanagan Constructing instruction traces from cache-filtered address traces (CITCAT) 1--8 Susan Flynn Hummel Efficient data sharing with conditional remote memory transfers . . . . . . . . 9--17 Larry Widigen and Elliot Sowadsky and Kevin McGrath Eliminating operand read latency . . . . 18--22 Philip Machanick The case for SRAM main memory . . . . . 23--30
Dileep Bhandarkar RISC versus CISC: a tale of two chips 1--12 I. Martín and F. Tirado A SIMD computer for multigrid methods 13--18 Reinhold Weicker On the use of SPEC benchmarks in computer architecture research . . . . . 19--22 Shubhendu S. Mukherjee What should graduate students know before joining a large computer architecture project? . . . . . . . . . 23--26 Humayun Khalid A new cache replacement scheme based on backpropagation neural networks . . . . 27--33 Mark Thorson Internet Nuggets . . . . . . . . . . . . 34--36
Sriram Vajapeyam and Tulika Mitra Improving superscalar instruction dispatch and issue by exploiting dynamic code sequences . . . . . . . . . . . . . 1--12 Ravi Nair and Martin E. Hopkins Exploiting instruction level parallelism in processors by caching scheduled groups . . . . . . . . . . . . . . . . . 13--25 Kemal Ebcioğlu and Erik R. Altman DAISY: dynamic compilation for 100% architectural compatibility . . . . . . 26--37 Timothy Mark Pinkston and Sugath Warnakulasuriya On deadlocks in interconnection networks 38--49 Craig B. Stunkel and Rajeev Sivaram and Dhabaleswar K. Panda Implementing multidestination worms in switch-based parallel systems: architectural alternatives and their impact . . . . . . . . . . . . . . . . . 50--61 Guillermo A. Alvarez and Walter A. Burkhard and Flaviu Cristian Tolerating multiple failures in RAID architectures with optimal storage and uniform declustering . . . . . . . . . . 62--72 Dan Teodosiu and Joel Baxter and Kinshuk Govil and John Chapin and Mendel Rosenblum and Mark Horowitz Hardware fault containment in scalable shared-memory multiprocessors . . . . . 73--84 Richard P. Martin and Amin M. Vahdat and David E. Culler and Thomas E. Anderson Effects of communication latency, overhead, and bandwidth in a cluster architecture . . . . . . . . . . . . . . 85--97 Wolf-Dietrich Weber and Stephen Gold and Pat Helland and Takeshi Shimizu and Thomas Wicki and Winfried Wilcke The Mercury Interconnect Architecture: a cost-effective infrastructure for high-performance servers . . . . . . . . 98--107 Ziyad S. Hakura and Anoop Gupta The design and analysis of a cache architecture for texture mapping . . . . 108--120 Kenneth M. Wilson and Kunle Olukotun Designing high bandwidth on-chip caches 121--132 Keith I. Farkas and Paul Chow and Norman P. Jouppi and Zvonko Vranesic Memory-system design considerations for dynamically-scheduled processors . . . . 133--143 Parthasarathy Ranganathan and Vijay S. Pai and Hazim Abdel-Shafi and Sarita V. 
Adve The interaction of software prefetching with ILP processors in shared-memory systems . . . . . . . . . . . . . . . . 144--156 Leonidas Kontothanassis and Galen Hunt and Robert Stets and Nikolaos Hardavellas and Michał Cierniak and Srinivasan Parthasarathy and Wagner Meira, Jr. and Sandhya Dwarkadas and Michael Scott VM-based shared memory on low-latency, remote-memory-access networks . . . . . 157--169 Alain Kägi and Doug Burger and James R. Goodman Efficient synchronization: let them eat QOLB . . . . . . . . . . . . . . . . . . 170--180 Andreas Moshovos and Scott E. Breach and T. N. Vijaykumar and Gurindar S. Sohi Dynamic speculation and synchronization of data dependences . . . . . . . . . . 181--193 Avinash Sodani and Gurindar S. Sohi Dynamic instruction reuse . . . . . . . 194--205 Subbarao Palacharla and Norman P. Jouppi and J. E. Smith Complexity-effective superscalar processors . . . . . . . . . . . . . . . 206--218 Maged M. Michael and Ashwini K. Nanda and Beng-Hong Lim and Michael L. Scott Coherence controller architectures for SMP-based CC-NUMA multiprocessors . . . 219--228 Babak Falsafi and David A. Wood Reactive NUMA: a design for unifying S-COMA and CC-NUMA . . . . . . . . . . . 229--240 James Laudon and Daniel Lenoski The SGI Origin: a ccNUMA highly scalable server . . . . . . . . . . . . . . . . . 241--251 Doug Joseph and Dirk Grunwald Prefetching using Markov predictors . . 252--263 Vatsa Santhanam and Edward H. Gornish and Wei-Chung Hsu Data prefetching on the HP PA-8000 . . . 264--273 Po-Yung Chang and Eric Hao and Yale N. Patt Target prediction for indirect jumps . . 274--283 Eric Sprangle and Robert S. Chappell and Mitch Alsup and Yale N. Patt The agree predictor: a mechanism for reducing negative branch history interference . . . . . . . . . . . . . . 284--291 Pierre Michaud and André Seznec and Richard Uhlig Trading conflict and capacity aliasing in conditional branch predictors . . . . 
292--303 Joel Emer and Nikolas Gloy A language for describing predictors and its application to automatic synthesis 304--314 Teresa L. Johnson and Wen-mei W. Hwu Run-time adaptive cache hierarchy management via reference analysis . . . 315--326 Richard Fromm and Stylianos Perissakis and Neal Cardwell and Christoforos Kozyrakis and Bruce McGaughy and David Patterson and Tom Anderson and Katherine Yelick The energy efficiency of IRAM architectures . . . . . . . . . . . . . 327--337 Doug Burger and Stefanos Kaxiras and James R. Goodman DataScalar architectures . . . . . . . . 338--349
Maurice Wilkes and Andrew Hopper The collapsed LAN: a solution to a bandwidth problem? . . . . . . . . . . . 1--5 Tommi Jokinen and Chia-Jiu Wang Cache design with path balancing table, skewing and indirect tags . . . . . . . 6--12 Doug Burger and Todd M. Austin The SimpleScalar tool set, version 2.0 13--25 Mark Thorson Internet Nuggets . . . . . . . . . . . . 26--27
Rodney Van Meter and Greg Finn and Steve Hotz and Dave Dyer Response to the collapsed LAN . . . . . 1--2 Weiwu Hu and Peisu Xia Out-of-order execution in sequentially consistent shared-memory systems . . . . 3--10 Humayun Khalid A novel trace sampling technique . . . . 11--16 Humayun Khalid Performance of the KORA-2 cache replacement scheme . . . . . . . . . . . 17--21 D. N. Jutla and P. Bodorik Improving applications performance: a memory model and cache architecture . . 22--29 B. Ulmann NICE: an elegant and powerful 32-bit architecture . . . . . . . . . . . . . . 30--35 Mark Thorson Internet Nuggets . . . . . . . . . . . . 36--41
Vijay S. Pai and Parthasarathy Ranganathan and Sarita V. Adve RSIM: Rice simulator for ILP multiprocessors . . . . . . . . . . . . 1--1 Weisong Shi and Weiwu Hu and Ming Zhu An innovative implementation for directory-based cache coherence in shared memory multiprocessors . . . . . 2--9 Mark Thorson Internet Nuggets . . . . . . . . . . . . 10--14
B. Ulmann Instruction looping, an extension to conditional execution . . . . . . . . . 3--4 Günter Haring and Christoph Lindemann and Martin Reiser International workshop performance evaluation --- origins and directions 5--6 Wes Munsil and Chia-Jiu Wang Reducing stack usage in Java bytecode execution . . . . . . . . . . . . . . . 7--11 Mark Thorson Internet nuggets . . . . . . . . . . . . 12--17
Mayan Moudgill Techniques for fast simulation of associative cache directories . . . . . 1--8 Byung-Kwon Chung and Jih-Kwon Peir LRU-based column-associative caches . . 9--17 Mark Thorson Internet Nuggets . . . . . . . . . . . . 18--22
Luiz André Barroso and Kourosh Gharachorloo and Edouard Bugnion Memory system characterization of commercial workloads . . . . . . . . . . 3--14 Kimberly Keeton and David A. Patterson and Yong Qiang He and Roger C. Raphael and Walter E. Baker Performance characterization of a Quad Pentium Pro SMP using OLTP workloads . . 15--26 Dennis C. Lee and Patrick J. Crowley and Jean-Loup Baer and Thomas E. Anderson and Brian N. Bershad Execution characteristics of desktop applications on Windows NT . . . . . . . 27--38 Jack L. Lo and Luiz André Barroso and Susan J. Eggers and Kourosh Gharachorloo and Henry M. Levy and Sujay S. Parekh An analysis of database workload performance on simultaneous multithreaded processors . . . . . . . . 39--50 Marius Evers and Sanjay J. Patel and Robert S. Chappell and Yale N. Patt An analysis of correlation and predictability: what makes two-level branch predictors work . . . . . . . . . 52--61 Eitan Federovsky and Meir Feder and Shlomo Weiss Branch prediction based on universal data compression algorithms . . . . . . 62--72 Yiannakis Sazeides and James E. Smith Modeling program predictability . . . . 73--84 Michael Cox and Narendra Bhandari and Michael Shantz Multi-level texture caching for 3D graphics hardware . . . . . . . . . . . 86--97 Hans Eberle and Erwin Oertli Switcherland: a QoS communication architecture for workstation clusters 98--108 Guillermo A. Alvarez and Walter A. Burkhard and Larry J. Stockmeyer and Flaviu Cristian Declustered disk array architectures with optimal and near-optimal parallelism . . . . . . . . . . . . . . 109--120 Dirk Grunwald and Artur Klauser and Srilatha Manne and Andrew Pleszkun Confidence estimation for speculation control . . . . . . . . . . . . . . . . 122--131 Srilatha Manne and Artur Klauser and Dirk Grunwald Pipeline gating: speculation control for energy reduction . . . . . . . . . . . . 132--141 George Z. Chrysos and Joel S. Emer Memory dependence prediction using store sets . . . . . . . . . . 
. . . . . . . . 142--153 Toni Juan and Sanji Sanjeevan and Juan J. Navarro Dynamic history-length fitting: a third level of adaptivity for branch prediction . . . . . . . . . . . . . . . 155--166 Karel Driesen and Urs Hölzle Accurate indirect branch prediction . . 167--178 Shubhendu S. Mukherjee and Mark D. Hill Using prediction to accelerate coherence protocols . . . . . . . . . . . . . . . 179--190 Mark Oskin and Frederic T. Chong and Timothy Sherwood Active pages: a computation model for intelligent memory . . . . . . . . . . . 192--203 Mark Swanson and Leigh Stoller and John Carter Increasing TLB reach using superpages backed by shadow memory . . . . . . . . 204--213 Xiaogang Qiu and Michel Dubois Options for dynamic address translation in COMAs . . . . . . . . . . . . . . . . 214--225 David I. August and Daniel A. Connors and Scott A. Mahlke and John W. Sias and Kevin M. Crozier and Ben-Chung Cheng and Patrick R. Eaton and Qudus B. Olaniran and Wen-mei W. Hwu Integrated predicated and speculative execution in the IMPACT EPIC architecture . . . . . . . . . . . . . . 227--237 Steven Wallace and Brad Calder and Dean M. Tullsen Threaded multiple path execution . . . . 238--249 Artur Klauser and Abhijit Paithankar and Dirk Grunwald Selective eager execution on the PolyPath architecture . . . . . . . . . 250--259 Sanjay Jeram Patel and Marius Evers and Yale N. Patt Improving trace cache effectiveness with branch promotion and trace packing . . . 262--271 Freddy Gabbay and Avi Mendelson The effect of instruction fetch bandwidth on value prediction . . . . . 272--281 David H. Albonesi Dynamic IPC/clock rate optimization . . 282--292 Yinong Zhang and George B. Adams III Performance modeling and code partitioning for the DS architecture . . 293--304 Stephen W. Keckler and William J. Dally and Daniel Maskit and Nicholas P. Carter and Andrew Chang and Whay S. Lee Exploiting fine-grain thread level parallelism on the MIT multi-ALU processor . . . . . . . . . . . . . . . 
306--317 Gheith A. Abandah and Edward S. Davidson Effects of architectural and technological advances on the HP/Convex Exemplar's memory and communication performance . . . . . . . . . . . . . . 318--329 Matthias A. Blumrich and Richard D. Alpert and Yuqun Chen and Douglas W. Clark and Stefanos N. Damianakis and Cezary Dubnicki and Edward W. Felten and Liviu Iftode and Kai Li and Margaret Martonosi and Robert A. Shillner Design choices in the SHRIMP system: an empirical study . . . . . . . . . . . . 330--341 Vijayaraghavan Soundararajan and Mark Heinrich and Ben Verghese and Kourosh Gharachorloo and Anoop Gupta and John Hennessy Flexible use of memory for replication/migration in cache-coherent DSM multiprocessors . . . . . . . . . . 342--355 Sanjeev Kumar and Christopher Wilkerson Exploiting spatial locality in data caches using spatial footprints . . . . 357--368 William L. Lynch and Gary Lauterbach and Joseph I. Chamdani Low load latency through sum-addressed memory (SAM) . . . . . . . . . . . . . . 369--379 Daniel J. Sorin and Vijay S. Pai and Sarita V. Adve and Mary K. Vernon and David A. Wood Analytic evaluation of shared-memory systems with ILP processors . . . . . . 380--391
Prasad N. Golla and Eric C. Lin A comparison of the effect of branch prediction on multithreaded and scalar architectures . . . . . . . . . . . . . 3--11 Mark Thorson Internet nuggets . . . . . . . . . . . . 12--16
Philip Machanick Streaming vs. latency in information mass-transit . . . . . . . . . . . . . . 4--6 Jean-Louis Lafitte A generalized mapping device to help memory latency . . . . . . . . . . . . . 7--13 Farooq Ashraf and Mostafa Abd-El-Barr and Khalid Al-Tawil Introduction to routing in multicomputer networks . . . . . . . . . . . . . . . . 14--21 Dick Wilmot Data threaded microarchitecture . . . . 22--32
C. K. Yuen Stack and RISC . . . . . . . . . . . . . 3--9 Sandra Johnson Baylor Unified scalable shared memory architectures . . . . . . . . . . . . . 10--21 Anthony DeWitt and Thomas Gross The potential of thread-level speculation based on value profiling . . 22--22 John Kalamatianos and David R. Kaeli Improving the accuracy of indirect branch prediction via branch classification . . . . . . . . . . . . . 23--26 Roy Dz-ching Ju and Jean-François Collard and Karim Oukbir Probabilistic memory disambiguation and its application to data speculation . . 27--30 Matthew A. Postiff and David A. Greene and Gary S. Tyson and Trevor N. Mudge The limits of instruction level parallelism in SPEC95 applications . . . 31--34 Byung-Sun Yang and Junpyo Lee and Jinpyo Park and Soo-Mook Moon and Kemal Ebcioğlu and Erik Altman Lightweight monitor for Java VM . . . . 35--38 Amit Rao and Santosh Pande Storage assignment using expression tree transformations to generate compact and efficient DSP code . . . . . . . . . . . 39--42 Krisztián Flautner and Gary S. Tyson and Trevor Mudge A high level simulator integrated with the Mirv compiler . . . . . . . . . . . 43--46 H. Cassé and L. Féraud and C. Rochange and P. Sainrat Using the abstract interpretation technique for static pointer analysis 47--50 Iris Bahar and Brad Calder and Dirk Grunwald A comparison of software code reordering and victim buffers . . . . . . . . . . . 51--54 Steve Carr and Philip Sweany Improving software pipelining with hardware support for self-spatial loads 55--58
Rajeev Barua and Walter Lee and Saman Amarasinghe and Anant Agarwal Maps: a compiler-managed memory system for raw machines . . . . . . . . . . . . 4--15 Sriram Vajapeyam and P. J. Joseph and Tulika Mitra Dynamic vectorization: a mechanism for exploiting far-flung ILP in ordinary programs . . . . . . . . . . . . . . . . 16--27 Seth Copen Goldstein and Herman Schmit and Matthew Moe and Mihai Budiu and Srihari Cadambi and R. Reed Taylor and Ronald Laufer PipeRench: a co/processor for streaming multimedia acceleration . . . . . . . . 28--39 Adi Yoaz and Mattan Erez and Ronny Ronen and Stephan Jourdan Speculation techniques for improving load related instruction scheduling . . 42--53 Michael Bekerman and Stephan Jourdan and Ronny Ronen and Gilad Kirshenboim and Lihu Rappoport and Adi Yoaz and Uri Weiser Correlated load-address predictors . . . 54--63 Brad Calder and Glenn Reinman and Dean M. Tullsen Selective value prediction . . . . . . . 64--74 Xiaogang Qiu and Michel Dubois Tolerating late memory traps in ILP processors . . . . . . . . . . . . . . . 76--87 Chi-Keung Luk and Todd C. Mowry Memory forwarding: enabling aggressive layout optimizations by guaranteeing the safety of data relocation . . . . . . . 88--99 Sangyeun Cho and Pen-Chung Yew and Gyungho Lee Decoupling local variable accesses in a wide-issue superscalar processor . . . . 100--110 Amir Roth and Gurindar S. Sohi Effective jump-pointer prefetching for linked data structures . . . . . . . . . 111--121 Parthasarathy Ranganathan and Sarita Adve and Norman P. Jouppi Performance of image and video processing with general-purpose processors and media ISA extensions . . 124--135 Matthew C. Merten and Andrew R. Trick and Christopher N. George and John C. Gyllenhaal and Wen-mei W. Hwu A hardware-driven profiling scheme for identifying program hot spots to support runtime optimization . . . . . . . . . . 
136--147 Xiaowei Shen and Arvind and Larry Rudolph Commit-reconcile & fences (CRF): a new memory model for architects and compiler writers . . . . . . . . . . . . . . . . 150--161 Chris Gniady and Babak Falsafi and T. N. Vijaykumar Is SC + ILP = RC? . . . . . . . . . . . 162--171 An-Chow Lai and Babak Falsafi Memory sharing predictor: the key to a speculative coherent DSM . . . . . . . . 172--183 Robert S. Chappell and Jared Stark and Sangwook P. Kim and Steven K. Reinhardt and Yale N. Patt Simultaneous subordinate microthreading (SSMT) . . . . . . . . . . . . . . . . . 186--195 Bryan Black and Bohuslav Rychlik and John Paul Shen The block-based trace cache . . . . . . 196--207 David I. August and John W. Sias and Jean-Michel Puiatti and Scott A. Mahlke and Daniel A. Connors and Kevin M. Crozier and Wen-mei W. Hwu The program decision logic approach to predicated execution . . . . . . . . . . 208--219 Vinodh Cuppu and Bruce Jacob and Brian Davis and Trevor Mudge A performance comparison of contemporary DRAM architectures . . . . . . . . . . . 222--233 Glenn Reinman and Todd Austin and Brad Calder A scalable front-end architecture for fast instruction delivery . . . . . . . 234--245 Seongwoo Kim and Arun K. Somani Area efficient architectures for information integrity in cache memories 246--255 Tarun Nakra and Rajiv Gupta and Mary Lou Soffa Value prediction in VLIW machines . . . 258--269 Dean M. Tullsen and John S. Seng Storageless value prediction using prior register values . . . . . . . . . . . . 270--279 Angelos Bilas and Cheng Liao and Jaswinder Pal Singh Using network interface support to avoid asynchronous protocol processing in shared virtual memory systems . . . . . 282--293 E. Ender Bilir and Ross M. Dickson and Ying Hu and Manoj Plakal and Daniel J. Sorin and Mark D. Hill and David A. 
Wood Multicast snooping: a new coherence method using a multicast address network 294--304 Dongming Jiang and Jaswinder Pal Singh Scaling application performance on a cache-coherent multiprocessor . . . . . 305--316
Anonymous In memoriam---SIGARCH founder: Caxton C. Foster . . . . . . . . . . . . . . . . . 1--3 Seung H. Hwang and Gwan S. Choi Selective-set-invalidation (SSI) for soft-error-resilient cache architecture 4--9 Peng Cheng and Hai Jin and Jiangling Zhang Design of high performance RAID in real-time system . . . . . . . . . . . . 10--17 C. K. Yuen Architectural support for the cache based vector computation . . . . . . . . 18--23 Benjamin Driker Disbursed control computer architecture 24--31 Humayun Khalid Performance evaluation of multimedia systems with MPEG-2 bitstreams . . . . . 32--37 Humayun Khalid A methodology for performance evaluation of systems with large emulation code . . 38--42 Humayun Khalid Tracing multimedia benchmarks with five degrees of validation . . . . . . . . . 43--48 Humayun Khalid Performance evaluation of two operating systems . . . . . . . . . . . . . . . . 49--52 Mark Thorson Internet Nuggets . . . . . . . . . . . . 53--60
Philip Machanick Correction to RAMpage ASPLOS paper . . 2--5 H. S. Shahhoseini and M. Naderi and S. Nemati Achieving the best performance on superscalar processors . . . . . . . . . 6--11 Mark Thorson Internet Nuggets . . . . . . . . . . . . 12--14
Marc Torrant and Muhammad Shaaban and Roy Czernikowski and Ken Hsu A simultaneous multithreading simulator 1--5 Mark Thorson Internet Nuggets . . . . . . . . . . . . 6--10
Min Dai and Christine Eisenbeis and Sid-Ahmed-Ali Touati Load-store optimization for software pipelining . . . . . . . . . . . . . . . 3--10 Philippe Clauss and Benoît Meister Automatic memory layout transformations to optimize spatial locality in parameterized loop nests . . . . . . . . 11--19 Barbara Kreaseck and Dean Tullsen and Brad Calder Limits of task-based parallelism in irregular applications . . . . . . . . . 20--20 Junpyo Lee and Byung-Sun Yang and Suhyun Kim and Kemal Ebcioğlu and Erik Altman and Seungil Lee and Yoo C. Chung and Heungbok Lee and Je Hyung Lee and Soo-Mook Moon Reducing virtual call overheads in a Java VM just-in-time compiler . . . . . 21--33 Chris Sadler and Sandeep K. S. Gupta and Rohit Bhatia Applying predication to efficiently handle runtime class testing . . . . . . 34--42 Nerina Bermudo and Xavier Vera and Antonio González and Josep Llosa Optimizing cache miss equations polyhedra . . . . . . . . . . . . . . . 43--52 A. Unger and E. Zehendner and Th. Ungerer A combined compiler and architecture technique to control multithreaded execution of branches and loop iterations . . . . . . . . . . . . . . . 53--61 Hakan Aydin and David Kaeli Using cache line coloring to perform aggressive procedure inlining . . . . . 62--71 Akhilesh Tyagi and Gyungho Lee A compiler optimization paradigm for dynamic energy management . . . . . . . 72--76 Mark Thorson Internet Nuggets . . . . . . . . . . . . 77--78
J. Gregory Steffan and Christopher B. Colohan and Antonia Zhai and Todd C. Mowry A scalable approach to thread-level speculation . . . . . . . . . . . . . . 1--12 Marcelo Cintra and José F. Martínez and Josep Torrellas Architectural support for scalable speculative parallelization in shared-memory multiprocessors . . . . . 13--24 Steven K. Reinhardt and Shubhendu S. Mukherjee Transient fault detection via simultaneous multithreading . . . . . . 25--36 Quinn Jacobson and James E. Smith Trace preconstruction . . . . . . . . . 37--46 Ryan Rakvic and Bryan Black and John Paul Shen Completion time multiple branch prediction for enhancing trace cache performance . . . . . . . . . . . . . . 47--58 Matthew C. Merten and Andrew R. Trick and Erik M. Nystrom and Ronald D. Barnes and Wen-mei W. Hwu A hardware mechanism for dynamic extraction and relayout of program hot spots . . . . . . . . . . . . . . . . . 59--70 Mark Oskin and Frederic T. Chong and Matthew Farrens HLS: combining statistical and symbolic simulation to guide microprocessor designs . . . . . . . . . . . . . . . . 71--82 David Brooks and Vivek Tiwari and Margaret Martonosi Wattch: a framework for architectural-level power analysis and optimizations . . . . . . . . . . . . . 83--94 N. Vijaykrishnan and M. Kandemir and M. J. Irwin and H. S. Kim and W. Ye Energy-driven integrated hardware-software optimizations using SimplePower . . . . . . . . . . . . . . 95--106 Erik G. Hallnor and Steven K. Reinhardt A fully associative software-managed cache design . . . . . . . . . . . . . . 107--116 Ashley Saulsbury and Fredrik Dahlgren and Per Stenström Recency-based TLB preloading . . . . . . 117--127 Scott Rixner and William J. Dally and Ujval J. Kapasi and Peter Mattson and John D. Owens Memory access scheduling . . . . . . . . 128--138 An-Chow Lai and Babak Falsafi Selective, accurate, and timely self-invalidation using last-touch prediction . . . . . . . . . . . . . . . 
139--148 Norman Margolus An embedded DRAM architecture for large-scale spatial-lattice computations 149--160 Ken Mai and Tim Paaske and Nuwan Jayasena and Ron Ho and William J. Dally and Mark Horowitz Smart Memories: a modular reconfigurable architecture . . . . . . . . . . . . . . 161--171 Craig B. Zilles and Gurindar S. Sohi Understanding the backward slices of performance degrading instructions . . . 172--181 Kevin M. Lepak and Mikko H. Lipasti On the value locality of store instructions . . . . . . . . . . . . . . 182--191 Zarka Cvetanovic and R. E. Kessler Performance analysis of the Alpha 21264-based Compaq ES40 system . . . . . 192--202 Paolo Faraboschi and Geoffrey Brown and Joseph A. Fisher and Giuseppe Desoli and Fred Homewood Lx: a technology platform for customizable VLIW embedded processing 203--213 Parthasarathy Ranganathan and Sarita Adve and Norman P. Jouppi Reconfigurable caches and their application to media processing . . . . 214--224 Zhi Alex Ye and Andreas Moshovos and Scott Hauck and Prithviraj Banerjee CHIMAERA: a high-performance architecture with a tightly-coupled reconfigurable functional unit . . . . . 225--235 Dana S. Henry and Bradley C. Kuszmaul and Gabriel H. Loh and Rahul Sami Circuits for wide-window superscalar processors . . . . . . . . . . . . . . . 236--247 Vikas Agarwal and M. S. Hrishikesh and Stephen W. Keckler and Doug Burger Clock rate versus IPC: the end of the road for conventional microarchitectures 248--259 J. E. Smith and Greg Faanes and Rabin Sugumar Vector instruction set support for conditional operations . . . . . . . . . 260--269 Yuan Chou and John Paul Shen Instruction path coprocessors . . . . . 270--281 Luiz André Barroso and Kourosh Gharachorloo and Robert McNamara and Andreas Nowatzyk and Shaz Qadeer and Barton Sano and Scott Smith and Robert Stets and Ben Verghese Piranha: a scalable architecture based on single-chip multiprocessing . . . . . 
282--293 Ramesh Radhakrishnan and Deependra Talla and Lizy Kurian John Allowing for ILP in an embedded Java processor . . . . . . . . . . . . . . . 294--305 Michael Bekerman and Adi Yoaz and Freddy Gabbay and Stephan Jourdan and Maxim Kalaev and Ronny Ronen Early load address resolution via register tracking . . . . . . . . . . . 306--315 José-Lorenzo Cruz and Antonio González and Mateo Valero and Nigel P. Topham Multiple-banked register file architectures . . . . . . . . . . . . . 316--325
Benjamín Sahelices Fernández and Diego R. Llanos Ferraris and Agustín de Dios Hernández Exploiting parallelism in a network of workstations using COMA-BC . . . . . . . 1--8 Mark Thorson Internet Nuggets . . . . . . . . . . . . 9--13
Jean-Louis Lafitte Regarding a device to help battering the RAM wall . . . . . . . . . . . . . . . . 4--10 S. Petit and J. A. Gil and J. Sahuquillo and A. Pont LIDE: a simulation environment for shared virtual memory systems . . . . . 11--18
Steven W. Schlosser and John Linwood Griffin and David F. Nagle and Gregory R. Ganger Designing computer systems with MEMS-based storage . . . . . . . . . . . 1--12 Kourosh Gharachorloo and Madhu Sharma and Simon Steely and Stephen Van Doren Architecture and design of AlphaServer GS320 . . . . . . . . . . . . . . . . . 13--24 Milo M. K. Martin and Daniel J. Sorin and Anastassia Ailamaki and Alaa R. Alameldeen and Ross M. Dickson and Carl J. Mauer and Kevin E. Moore and Manoj Plakal and Mark D. Hill and David A. Wood Timestamp snooping: an approach for extending SMPs . . . . . . . . . . . . . 25--36 Ashwini Nanda and Kwok-Ken Mak and Krishnan Sugarvanam and Ramendra K. Sahoo and Vijayaraghavan Soundararajan and T. Basil Smith MemorIES3: a programmable, real-time hardware emulation tool for multiprocessor server design . . . . . . 37--48 Jeff Gibson and Robert Kunz and David Ofelt and Mark Horowitz and John Hennessy and Mark Heinrich FLASH vs. (Simulated) FLASH: closing the simulation loop . . . . . . . . . . . . 49--58 Andy Chou and Benjamin Chelf and Dawson Engler and Mark Heinrich Using meta-level compilation to check FLASH protocol code . . . . . . . . . . 59--70 Raoul A. F. Bhoedjang and Kees Verstoep and Tim Rühl and Henri E. Bal and Rutger F. H. Hofman Evaluating design alternatives for reliable communication on high-speed networks . . . . . . . . . . . . . . . . 71--81 Peter Mattson and William J. Dally and Scott Rixner and Ujval J. Kapasi and John D. Owens Communication scheduling . . . . . . . . 82--92 Jason Hill and Robert Szewczyk and Alec Woo and Seth Hollar and David Culler and Kristofer Pister System architecture directions for networked sensors . . . . . . . . . . . 93--104 Alvin R. Lebeck and Xiaobo Fan and Heng Zeng and Carla Ellis Power aware page allocation . . . . . . 105--116 Emery D. Berger and Kathryn S. McKinley and Robert D. Blumofe and Paul R. Wilson Hoard: a scalable memory allocator for multithreaded applications . . . . . . . 
117--128 Krisztián Flautner and Rich Uhlig and Steve Reinhardt and Trevor Mudge Thread-level parallelism and interactive performance of desktop applications . . 129--138 Motohiro Kawahito and Hideaki Komatsu and Toshio Nakatani Effective null pointer check elimination utilizing hardware trap . . . . . . . . 139--149 Youtao Zhang and Jun Yang and Rajiv Gupta Frequent value locality and value-centric data cache design . . . . 150--159 M. Burrows and U. Erlingsson and S-T. A. Leung and M. T. Vandevoorde and C. A. Waldspurger and K. Walker and W. E. Weihl Efficient and flexible value sampling 160--167 David Lie and Chandramohan Thekkath and Mark Mitchell and Patrick Lincoln and Dan Boneh and John Mitchell and Mark Horowitz Architectural support for copy and tamper resistant software . . . . . . . 168--177 Jerome Burke and John McDonald and Todd Austin Architectural support for fast symmetric-key cryptography . . . . . . . 178--189 John Kubiatowicz and David Bindel and Yan Chen and Steven Czerwinski and Patrick Eaton and Dennis Geels and Ramakrishna Gummadi and Sean Rhea and Hakim Weatherspoon and Chris Wells and Ben Zhao OceanStore: an architecture for global-scale persistent storage . . . . 190--201 Evelyn Duesterwald and Vasanth Bala Software profiling for hot path prediction: less is more . . . . . . . . 202--211 Rumi Zahir and Jonathan Ross and Dale Morris and Drew Hess OS and compiler considerations in the design of the IA-64 architecture . . . . 212--221 Daniel A. Connors and Hillery C. Hunter and Ben-Chung Cheng and Wen-mei W. Hwu Hardware support for dynamic activation of compiler-directed computation reuse 222--233 Allan Snavely and Dean M. Tullsen Symbiotic job scheduling for a simultaneous multithreaded processor . . 234--244 Joshua A. Redstone and Susan J. Eggers and Henry M. Levy An analysis of operating system behavior on a simultaneous multithreaded architecture . . . . . . . . . . . . . . 
245--256 Karthik Sundaramoorthy and Zach Purser and Eric Rotenberg Slipstream processors: improving both performance and fault tolerance . . . . 257--268
Maurice V. Wilkes The memory gap and the future of high performance memories . . . . . . . . . . 2--7 Naraig Manjikian Multiprocessor enhancements of the SimpleScalar tool set . . . . . . . . . 8--15 Frank Wang A modified architecture for high-density MRAM . . . . . . . . . . . . . . . . . . 16--22 Erik R. Altman and David Kaeli WBT-2000: Workshop on Binary Translation 2000 . . . . . . . . . . . . . . . . . . 23--25 Amitabh Srivastava Emerging opportunities for binary tools 26--26 Harold W. Cain and Kevin M. Lepak and Mikko H. Lipasti A dynamic binary translation approach to architectural simulation . . . . . . . . 27--36 Rolf Hilgendorf and Wolfram Sauer Instruction translation for an experimental S/390 processor . . . . . . 37--42 Michiel Ronsse and Koen De Bosschere JiTI: a robust just in time instrumentation technique . . . . . . . 43--54 David Ung and Cristina Cifuentes Optimising hot paths in a dynamic binary translator . . . . . . . . . . . . . . . 55--65 Michael Gschwind and Erik Altman Optimization and precise exceptions in dynamic compilation . . . . . . . . . . 66--74 Mark Thorson Internet Nuggets . . . . . . . . . . . . 75--77
Craig Zilles and Gurindar Sohi Execution-based prediction using speculative slices . . . . . . . . . . . 2--13 Jamison D. Collins and Hong Wang and Dean M. Tullsen and Christopher Hughes and Yong-Fong Lee and Dan Lavery and John P. Shen Speculative precomputation: long-range prefetching of delinquent loads . . . . 14--25 Rajeev Balasubramonian and Sandhya Dwarkadas and David H. Albonesi Dynamically allocating processor resources between nearby and distant ILP 26--37 Chi-Keung Luk Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors 40--51 Murali Annavaram and Jignesh M. Patel and Edward S. Davidson Data prefetching by dependence graph precomputation . . . . . . . . . . . . . 52--61 Vinodh Cuppu and Bruce Jacob Concurrency, latency, or system overhead: which has the largest impact on uniprocessor DRAM-system performance? 62--71 Brian Fields and Shai Rubin and Rastislav Bodík Focusing processor policies via critical-path prediction . . . . . . . . 74--85 Timothy Sherwood and Brad Calder Automated design of finite state machine predictors for customized processors . . 86--97 Youfeng Wu and Dong-Yuan Chen and Jesse Fang Better exploration of region-level value locality with integrated computation reuse and value prediction . . . . . . . 98--108 Lisa Wu and Chris Weaver and Todd Austin CryptoManiac: a fast flexible architecture for secure communication 110--119 Ki Hwan Yum and Eun Jung Kim and Chita R. Das QoS provisioning in clusters: an investigation of Router and NIC design 120--129 Srikanth T. Srinivasan and Roy Dz-ching Ju and Alvin R. Lebeck and Chris Wilkerson Locality vs. criticality . . . . . . . . 132--143 An-Chow Lai and Cem Fide and Babak Falsafi Dead-block prediction & dead-block correlating prefetchers . . . . . . . . 144--154 Alex Ramirez and Luiz André Barroso and Kourosh Gharachorloo and Robert Cohn and Josep Larriba-Pey and P. 
Geoffrey Lowney and Mateo Valero Code layout optimizations for transaction processing workloads . . . . 155--164 Michael Thaddeus Niemier and Peter M. Kogge Exploring and exploiting wire-level pipelining in emerging technologies . . 166--177 Seth Copen Goldstein and Mihai Budiu NanoFabrics: spatial computing using molecular electronics . . . . . . . . . 178--191 David Lie and Andy Chou and Dawson Engler and David L. Dill A simple method for extracting models for protocol code . . . . . . . . . . . 192--203 Milos Prvulovic and María Jesús Garzarán and Lawrence Rauchwerger and Josep Torrellas Removing architectural bottlenecks to the scalability of speculative parallelization . . . . . . . . . . . . 204--215 R. Iris Bahar and Srilatha Manne Power and energy reduction via pipeline balancing . . . . . . . . . . . . . . . 218--229 Daniele Folegnani and Antonio González Energy-effective issue logic . . . . . . 230--239 Stefanos Kaxiras and Zhigang Hu and Margaret Martonosi Cache decay: exploiting generational behavior to reduce cache leakage power 240--251 Christopher J. Hughes and Praful Kaul and Sarita V. Adve and Rohit Jain and Chanik Park and Jayanth Srinivasan Variability in the execution of multimedia applications and implications for architecture . . . . . . . . . . . . 254--265 S. Subramanya Sastry and Rastislav Bodík and James E. Smith Rapid profiling via stratified sampling 278--289
Craig B. Zilles Benchmark health considered harmful . . 4--5 Niki C. Thornock and J. Kelly Flanagan A national trace collection and distribution resource . . . . . . . . . 6--10 Mark Thorson Internet Nuggets . . . . . . . . . . . . 11--15
Naraig Manjikian More enhancements of the SimpleScalar tool set . . . . . . . . . . . . . . . . 5--12 Jason F. Cantin and Mark D. Hill Cache performance for selected SPEC CPU2000 benchmarks . . . . . . . . . . . 13--18 Jinsuo Zhang The predictability of load address . . . 19--28 Mark Thorson Internet Nuggets . . . . . . . . . . . . 29--31
M. Watheq El-Kharashi and Fayez Elguibaly and Kin F. Li Adapting Tomasulo's algorithm for bytecode folding based Java processors 1--8 S. Bartolini and R. Giorgi and J. Protic and C. A. Prete and M. Valero Parallel architecture and compilation techniques: selection of workshop papers, Guest Editors' introduction . . 9--12 Andrea Acquaviva and Luca Benini and Bruno Riccó Energy characterization of embedded real-time operating systems . . . . . . 13--18 M. Angels Moncusi and Alex Arenas and Jesus Labarta Improving energy saving in hard real time systems via a modified dual priority scheduling . . . . . . . . . . 19--24 Frank Vahid and Rilesh Patel and Greg Stitt Propagating constants past software to hardware peripherals in fixed-application embedded systems . . . 25--30 Vishal Aslot and Rudolf Eigenmann Performance characteristics of the SPEC OMP2001 benchmarks . . . . . . . . . . . 31--40 J. Mark Bull and Darragh O'Neill A microbenchmark suite for OpenMP 2.0 41--48 D. S. Nikolopoulos and E. Artiaga and E. Ayguadé and J. Labarta Exploiting memory affinity in OpenMP through schedule reuse . . . . . . . . . 49--55 Michael Sung and Ronny Krashinsky and Krste Asanović Multithreading decoupled architectures for complexity-effective general purpose computing . . . . . . . . . . . . . . . 56--61 Deependra Talla and Lizy K. John MediaBreeze: a decoupled architecture for accelerating multimedia applications 62--67 Tatsuo Nakajima A middleware component supporting flexible user interaction for networked home appliances . . . . . . . . . . . . 68--75 David Touzet and Jean-Marc Menaud and Frédéric Weis and Paul Couderc and Michel Banâtre SIDE surfer: enriching casual meetings with spontaneous information gathering 76--83 Erik R. Altman and David R. Kaeli Workshop on Binary Translation 2001 . . 84--85 Mark Thorson Internet Nuggets . . . . . . . . . . . . 86--90
Rajagopalan Desikan and Doug Burger and Stephen W. Keckler and Llorenc Cruz and Fernando Latorre and Antonio González and Mateo Valero Errata on ``Measuring Experimental Error in Microprocessor Simulation'' . . . . . 2--4 Fu-Chi Chang and Chia-Jiu Wang Architectural tradeoff in implementing RSA processors . . . . . . . . . . . . . 5--11 Augustus K. Uht Disjoint Eager Execution: what it is / what it is not . . . . . . . . . . . . 12--14 Mark Thorson Internet Nuggets . . . . . . . . . . . . 15--21
A. Hartstein and Thomas R. Puzak The optimum pipeline depth for a microprocessor . . . . . . . . . . . . . 7--13 M. S. Hrishikesh and Doug Burger and Norman P. Jouppi and Stephen W. Keckler and Keith I. Farkas and Premkishore Shivakumar The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays . . 14--24 Eric Sprangle and Doug Carmean Increasing processor performance by implementing deeper pipelines . . . . . 25--34 Dan Ernst and Todd Austin Efficient dynamic scheduling through tag elimination . . . . . . . . . . . . . . 37--46 Brian Fields and Rastislav Bodík and Mark D. Hill Slack: maximizing performance under technological constraints . . . . . . . 47--58 Alvin R. Lebeck and Jinson Koppanalil and Tong Li and Jaidev Patwardhan and Eric Rotenberg A large, fast instruction window for tolerating cache misses . . . . . . . . 59--70 Ho-Seop Kim and James E. Smith An instruction set and microarchitecture for instruction level distributed processing . . . . . . . . . . . . . . . 71--81 T. N. Vijaykumar and Irith Pomeranz and Karl Cheng Transient-fault recovery using simultaneous multithreading . . . . . . 87--98 Shubhendu S. Mukherjee and Michael Kontz and Steven K. Reinhardt Detailed design and evaluation of redundant multithreading alternatives 99--110 Milos Prvulovic and Zheng Zhang and Josep Torrellas ReVive: cost-effective architectural support for rollback recovery in shared-memory multiprocessors . . . . . 111--122 Daniel J. Sorin and Milo M. K. Martin and Mark D. Hill and David A. Wood SafetyNet: improving the availability of shared memory multiprocessors with global checkpoint/recovery . . . . . . . 123--134 Seongmoo Heo and Kenneth Barr and Mark Hampton and Krste Asanović Dynamic fine-grain leakage reduction using leakage-biased bitlines . . . . . 137--147 Krisztián Flautner and Nam Sung Kim and Steve Martin and David Blaauw and Trevor Mudge Drowsy caches: simple techniques for reducing leakage power . . . . . . . . . 
148--157 Anoop Iyer and Diana Marculescu Power and performance evaluation of globally asynchronous locally synchronous processors . . . . . . . . . 158--168 Yan Solihin and Jaejin Lee and Josep Torrellas Using a user-level memory thread for correlation prefetching . . . . . . . . 171--182 Jarrod A. Lewis and Bryan Black and Mikko H. Lipasti Avoiding initialization misses to the heap . . . . . . . . . . . . . . . . . . 183--194 Gokul B. Kandiraju and Anand Sivasubramaniam Going the distance for TLB prefetching: an application-driven study . . . . . . 195--206 Zhigang Hu and Stefanos Kaxiras and Margaret Martonosi Timekeeping in the memory system: predicting and optimizing memory behavior . . . . . . . . . . . . . . . . 209--220 Ilhyun Kim and Mikko H. Lipasti Implementing optimizations at decode time . . . . . . . . . . . . . . . . . . 221--232 Ashutosh S. Dhodapkar and James E. Smith Managing multi-configuration hardware via dynamic working set analysis . . . . 233--244 Philip Buonadonna and David Culler Queue pair IP: a hybrid architecture for system area networks . . . . . . . . . . 247--256 Yuanyuan Zhou and Angelos Bilas and Suresh Jagannathan and Cezary Dubnicki and James F. Philbin and Kai Li Experiences with VI communication for database storage . . . . . . . . . . . . 257--268 Alex Pajuelo and Antonio González and Mateo Valero Speculative dynamic vectorization . . . 271--280 Roger Espasa and Federico Ardanaz and Joel Emer and Stephen Felix and Julio Gago and Roger Gramunt and Isaac Hernandez and Toni Juan and Geoff Lowney and Matthew Mattina and André Seznec Tarantula: a vector extension to the Alpha architecture . . . . . . . . . . . 281--292 André Seznec and Stephen Felix and Venkata Krishnan and Yiannakis Sazeides Design tradeoffs for the Alpha EV8 conditional branch predictor . . . . . . 295--306 Robert S. Chappell and Francis Tseng and Adi Yoaz and Yale N. Patt Difficult-path branch prediction using subordinate microthreads . . . . . . . . 
307--317 Steven E. Raasch and Nathan L. Binkert and Steven K. Reinhardt A scalable instruction queue design using dependence chains . . . . . . . . 318--329
Ken Steele and Jason Waterman and Eugene Weinstein The Oxygen H21 handheld . . . . . . . . 3--4 Diana Keen and Frederic T. Chong Hardware-software co-design of embedded sensor-actuator networks . . . . . . . . 5--6 Masaaki Kondo and Motonobu Fujita and Hiroshi Nakamura Software-controlled on-chip memory for high-performance and low-power computing 7--8 Ramendra K. Sahoo and Myung Bae and Jose Moreira Semi-hierarchical approach for reliability, availability, and serviceability of cellular systems . . . 9--10 Hans Eberle Monitoring and diagnosing computer systems by radio communication . . . . . 11--12 William Thies and Michal Karczmarek and Michael Gordon and David Maze and Jeremy Wong and Henry Hoffmann and Matthew Brown and Saman Amarasinghe A common machine language for grid-based architectures . . . . . . . . . . . . . 13--14 Frank Wang and Na Helian and Farhi Marir A novel associative memory architecture for quick matching . . . . . . . . . . . 15--16 Mike Parker A case for user-level interrupts . . . . 17--18 Martin Burtscher An improved index function for (D)FCM predictors . . . . . . . . . . . . . . . 19--24 Mark Thorson Internet Nuggets . . . . . . . . . . . . 25--26
I. Gòmez and L. Piñuel and M. Prieto and F. Tirado Analysis of simulation-adapted SPEC 2000 benchmarks . . . . . . . . . . . . . . . 4--10 Mark Thorson Internet Nuggets . . . . . . . . . . . . 11--16
Deborah Estrin Keynote address: Sensor network research: emerging challenges for architecture, systems, and languages . . 1--4 Ravi Rajwar and James R. Goodman Transactional lock-free execution of lock-based programs . . . . . . . . . . 5--17 José F. Martínez and Josep Torrellas Speculative synchronization: applying thread-level speculation to explicitly parallel applications . . . . . . . . . 18--29 Kevin M. Lepak and Mikko H. Lipasti Temporally silent stores . . . . . . . . 30--41 Timothy Sherwood and Erez Perelman and Greg Hamerly and Brad Calder Automatically characterizing large scale program behavior . . . . . . . . . . . . 45--57 Kazunori Ogata and Hideaki Komatsu and Toshio Nakatani Bytecode fetch optimization for a Java interpreter . . . . . . . . . . . . . . 58--67 Tao Li and Lizy Kurian John and Anand Sivasubramaniam and N. Vijaykrishnan and Juan Rubio Understanding and improving operating system effects in control flow prediction . . . . . . . . . . . . . . . 68--80 Philip Levis and David Culler Maté: a tiny virtual machine for sensor networks . . . . . . . . . . . . . . . . 85--95 Philo Juang and Hidekazu Oki and Yong Wang and Margaret Martonosi and Li Shiuan Peh and Daniel Rubenstein Energy-efficient computing for wildlife tracking: design tradeoffs and early experiences with ZebraNet . . . . . . . 96--107 Darko Kirovski and Milenko Drinić and Miodrag Potkonjak Enabling trusted software integrity . . 108--120 Heng Zeng and Carla S. Ellis and Alvin R. Lebeck and Amin Vahdat ECOSystem: managing energy as a first class operating system resource . . . . 123--132 Raksit Ashok and Saurabh Chheda and Csaba Andras Moritz Cool-Mem: combining statically speculative memory accessing with selective address translation for energy efficiency . . . . . . . . . . . . . . . 133--143 Ruchira Sasanka and Christopher J. Hughes and Sarita V. Adve Joint local and global hardware adaptations for energy . . . . . . . . . 
144--155 Dongkeun Kim and Donald Yeung Design and evaluation of compiler algorithms for pre-execution . . . . . . 159--170 Antonia Zhai and Christopher B. Colohan and J. Gregory Steffan and Todd C. Mowry Compiler optimization of scalar value communication between speculative threads . . . . . . . . . . . . . . . . 171--183 Jeffrey Oplinger and Monica S. Lam Enhancing software reliability with speculative threads . . . . . . . . . . 184--196 J. Adam Butts and Guri Sohi Dynamic dead-instruction detection and elimination . . . . . . . . . . . . . . 199--210 Changkyu Kim and Doug Burger and Stephen W. Keckler An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches 211--222 Shubhendu S. Mukherjee and Federico Silla and Peter Bannon and Joel Emer and Steve Lang and David Webb A comparative study of arbitration algorithms for the Alpha 21364 pipelined router . . . . . . . . . . . . . . . . . 223--234 Hyong-youb Kim and Vijay S. Pai and Scott Rixner Increasing Web server throughput with network interface data caching . . . . . 239--250 Eddie Kohler and Robert Morris and Benjie Chen Programming language optimizations for modular router configurations . . . . . 251--263 Muthian Sivathanu and Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau Evolving RPC for active storage . . . . 264--276 Robert Cooksey and Stephan Jourdan and Dirk Grunwald A stateless, content-directed data prefetching mechanism . . . . . . . . . 279--290 Michael I. Gordon and William Thies and Michal Karczmarek and Jasper Lin and Ali S. Meli and Andrew A. Lamb and Chris Leger and Jeremy Wong and Henry Hoffmann and David Maze and Saman Amarasinghe A stream compiler for communication-exposed architectures . . 291--303 Emmett Witchel and Josh Cates and Krste Asanović Mondrian memory protection . . . . . . . 304--316
Jack B. Dennis Fresh Breeze: a multiprocessor chip architecture guided by modular programming principles . . . . . . . . . 7--15 D. Morano and A. Khalafi and D. R. Kaeli and A. K. Uht Realizing high IPC through a scalable memory-latency tolerant multipath microarchitecture . . . . . . . . . . . 16--25 George Almási and Călin Caşcaval and José G. Castaños and Monty Denneau and Derek Lieber and José E. Moreira and Henry S. Warren, Jr. Dissecting Cyclops: a detailed analysis of a multithreaded architecture . . . . 26--38 Mohamed M. Zahran On cache memory hierarchy for Chip-Multiprocessor . . . . . . . . . . 39--48 Gary Gréwal and Tom Wilson and Andrew Morton An EGA approach to the compile-time assignment of data to multiple memories in digital-signal processors . . . . . . 49--59 Ulrich Ramacher and Nico Brüs and Ulrich Hachmann and Jens Harnisch and Wolfgang Raab and Axel Techmer 100 GOPS vision processor for automotive applications . . . . . . . . . . . . . . 60--68 Nikos P. Pitsianis and Gerald G. Pechanek Indirect VLIW memory allocation for the ManArray multiprocessor DSP . . . . . . 69--74 Naohiko Shimizu and Ken Takatori A transparent Linux super page kernel for Alpha, Sparc64 and IA32: reducing TLB misses of applications . . . . . . . 75--84 Alessio Bechini and Pierfrancesco Foglia and Cosimo Antonio Prete Fine-grain design space exploration for a cartographic SoC multiprocessor . . . 85--92 Mark Thorson Internet Nuggets . . . . . . . . . . . . 93--96
Kevin Skadron and Mircea R. Stan and Wei Huang and Sivakumar Velusamy and Karthik Sankaranarayanan and David Tarjan Temperature-aware microarchitecture . . 2--13 Grigorios Magklis and Michael L. Scott and Greg Semeraro and David H. Albonesi and Steven Dropsho Profile-based dynamic voltage and frequency scaling for a multiple clock domain microprocessor . . . . . . . . . 14--27 Ilhyun Kim and Mikko H. Lipasti Half-price architecture . . . . . . . . 28--38 Il Park and Babak Falsafi and T. N. Vijaykumar Implicitly-multithreaded processors . . 39--51 Daniel Citron MisSPECulation: partial and misleading use of SPEC CPU2000 in computer architecture conferences . . . . . . . . 52--61 Jessica H. Tseng and Krste Asanović Banked multiported register files for high-frequency superscalar microprocessors . . . . . . . . . . . . 62--71 Michael D. Powell and T. N. Vijaykumar Pipeline damping: a microarchitectural technique to reduce inductive noise in supply voltage . . . . . . . . . . . . . 72--83 Roland E. Wunderlich and Thomas F. Wenisch and Babak Falsafi and James C. Hoe SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling . . . . . . . . . . . . . . . . 84--97 Mohamed Gomaa and Chad Scarbrough and T. N. Vijaykumar and Irith Pomeranz Transient-fault recovery for chip multiprocessors . . . . . . . . . . . . 98--109 Milos Prvulovic and Josep Torrellas ReEnact: using thread-level speculation mechanisms to debug data races in multithreaded codes . . . . . . . . . . 110--121 Min Xu and Rastislav Bodik and Mark D. Hill A ``flight data recorder'' for enabling full-system multiprocessor deterministic replay . . . . . . . . . . . . . . . . . 122--135 Chuanjun Zhang and Frank Vahid and Walid Najjar A highly configurable cache architecture for embedded systems . . . . . . . . . . 136--146 Alper Buyuktosunoğlu and Tejas Karkhanis and David H. Albonesi and Pradip Bose Energy efficient co-adaptive instruction fetch and issue . . . . . . . . . . . . 
147--156 Michael C. Huang and Jose Renau and Josep Torrellas Positional adaptation of processors: application to energy reduction . . . . 157--168 Sudhanva Gurumurthi and Anand Sivasubramaniam and Mahmut Kandemir and Hubertus Franke DRPM: dynamic speed control for power management in server class disks . . . . 169--181 Milo M. K. Martin and Mark D. Hill and David A. Wood Token coherence: decoupling performance and correctness . . . . . . . . . . . . 182--193 Arjun Singh and William J. Dally and Amit K. Gupta and Brian Towles GOAL: a load-balanced adaptive routing algorithm for torus networks . . . . . . 194--205 Milo M. K. Martin and Pacia J. Harper and Daniel J. Sorin and Mark D. Hill and David A. Wood Using destination-set prediction to improve the latency/bandwidth tradeoff in shared-memory multiprocessors . . . . 206--217 Zarka Cvetanovic Performance analysis of the Alpha 21364-based HP GS1280 multiprocessor . . 218--229 Paramjit S. Oberoi and Gurindar S. Sohi Parallelism in the front-end . . . . . . 230--240 André Seznec and Antony Fraboulet Effective ahead pipelining of instruction block address generation . . 241--252 Dan Ernst and Andrew Hamel and Todd Austin Cyclone: a broadcast-free dynamic instruction scheduler with selective replay . . . . . . . . . . . . . . . . . 253--263 Ravi Bhargava and Lizy K. John Improving dynamic cluster assignment for clustered trace cache processors . . . . 264--274 Rajeev Balasubramonian and Sandhya Dwarkadas and David H. Albonesi Dynamically managing the communication-parallelism trade-off in future clustered processors . . . . . . 275--287 Timothy Sherwood and George Varghese and Brad Calder A pipelined memory architecture for high throughput network processors . . . . . 288--299 Jahangir Hasan and Satish Chandra and T. N. Vijaykumar Efficient use of memory bandwidth to improve network processor throughput . . 
300--313 Renju Thomas and Manoj Franklin and Chris Wilkerson and Jared Stark Improving branch prediction by dynamic dataflow-based identification of correlated branches from a large global history . . . . . . . . . . . . . . . . 314--323 Huiyang Zhou and Jill Flanagan and Thomas M. Conte Detecting global stride locality in value streams . . . . . . . . . . . . . 324--335 Timothy Sherwood and Suleyman Sair and Brad Calder Phase tracking and prediction . . . . . 336--349 Aravindh Anantaraman and Kiran Seth and Kaustubh Patil and Eric Rotenberg and Frank Mueller Virtual simple architecture (VISA): exceeding the complexity limit in safe real-time systems . . . . . . . . . . . 350--361 Marc L. Corliss and E. Christopher Lewis and Amir Roth DISE: a programmable macro engine for customizing applications . . . . . . . . 362--373 Mark Oskin and Frederic T. Chong and Isaac L. Chuang and John Kubiatowicz Building quantum wires: the long and the short of it . . . . . . . . . . . . . . 374--387 Zhenlin Wang and Doug Burger and Kathryn S. McKinley and Steven K. Reinhardt and Charles C. Weems Guided region prefetching: a cooperative hardware/software approach . . . . . . . 388--398 Christos Kozyrakis and David Patterson Overcoming the limitations of conventional vector processors . . . . . 399--409 Jinwoo Suh and Eun-Gyu Kim and Stephen P. Crago and Lakshmi Srinivasan and Matthew C. French A performance analysis of PIM, stream processing, and tiled processing on memory-intensive signal processing kernels . . . . . . . . . . . . . . . . 410--421 Karthikeyan Sankaralingam and Ramadass Nagarajan and Haiming Liu and Changkyu Kim and Jaehyuk Huh and Doug Burger and Stephen W. Keckler and Charles R. Moore Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture . . . . 422--433 Michael K. Chen and Kunle Olukotun The Jrpm system for dynamically parallelizing Java programs . . . . . . 434--446
Anthony S. Fong A computer architecture with access control and cache option tags on individual instruction operands . . . . 1--5 Edwin J. Tan and Wendi B. Heinzelman DSP architectures: past, present and futures . . . . . . . . . . . . . . . . 6--19 Lucian N. Vintan and Marius Sbera and Ioan Z. Mihu and Adrian Florea An alternative to branch prediction: pre-computed branches . . . . . . . . . 20--29 Mark Heinrich and Mainak Chaudhuri Ocean warning: avoid drowning . . . . . 30--32 Jean-Louis Lafitte Qualitatively matching computer architecture with Turing machine . . . . 33--41 Takenori Koushiro and Toshinori Sato and Itsujiro Arita A trace-level value predictor for Contrail processors . . . . . . . . . . 42--47 Mark Thorson Internet Nuggets . . . . . . . . . . . . 48--54
Mikkel Thorup Combinatorial power in multimedia processors . . . . . . . . . . . . . . . 5--11 Gary K. W. Hau and Anthony Fong and Mok Pak Lun Support of Java API for the jHISC system 12--17 Mok Pak Lun and Richard Li and Anthony Fong Method manipulation in an object-oriented processor . . . . . . . 18--25 Mark Thorson Internet Nuggets . . . . . . . . . . . . 26--32
Kristopher C. Breen and Duncan G. Elliott Aliasing and anti-aliasing in branch history table prediction . . . . . . . . 1--4 Ryan W. S. Yu and Gary K. W. Hau and Anthony S. Fong Test bench for software development of object-oriented processor . . . . . . . 5--9 Mok Pak Lun and Anthony Fong and Gary K. W. Hau Object-oriented processor requirements with instruction analysis of Java programs . . . . . . . . . . . . . . . . 10--15 Mark Thorson Internet Nuggets . . . . . . . . . . . . 16--21
Lizy Kurian John More on finding a single number to indicate overall performance of a benchmark suite . . . . . . . . . . . . 3--8 Mark Thorson Internet Nuggets . . . . . . . . . . . . 9--13
Michael Bedford Taylor and Walter Lee and Jason Miller and David Wentzlaff and Ian Bratt and Ben Greenwald and Henry Hoffmann and Paul Johnson and Jason Kim and James Psota and Arvind Saraf and Nathan Shnidman and Volker Strumpen and Matt Frank and Saman Amarasinghe and Anant Agarwal Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams . . . . . . . . . . . . . . 2--2 Anonymous General Co-Chair's Message . . . . . . . 9--9 Anonymous Program Chair's Message . . . . . . . . 10--10 Anonymous Committees . . . . . . . . . . . . . . . 11--11 Anonymous Reviewers . . . . . . . . . . . . . . . 13--13 Jung Ho Ahn and William J. Dally and Brucek Khailany and Ujval J. Kapasi and Abhishek Das Evaluating the Imagine Stream Architecture . . . . . . . . . . . . . . 14--14 John W. Sias and Sain-zee Ueng and Geoff A. Kent and Ian M. Steiner and Erik M. Nystrom and Wen-mei W. Hwu Field-testing IMPACT EPIC research results in Itanium 2 . . . . . . . . . . 26--26 T. N. Vijaykumar and Zeshan Chishti Wire Delay is Not a Problem for SMT (In the Near Future) . . . . . . . . . . . . 40--40 Ronny Krashinsky and Christopher Batten and Mark Hampton and Steve Gerding and Brian Pharris and Jared Casper and Krste Asanovic The Vector-Thread Architecture . . . . . 52--52 Rakesh Kumar and Dean M. Tullsen and Parthasarathy Ranganathan and Norman P. Jouppi and Keith I. Farkas Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance . . . . . . . . . . . . . . 64--64 Yuan Chou and Brian Fahs and Santosh Abraham Microarchitecture Optimizations for Exploiting Memory-Level Parallelism . . 76--76 Harold W. Cain and Mikko H. Lipasti Memory Ordering: a Value-Based Approach 90--90 Lance Hammond and Vicky Wong and Mike Chen and Brian D. Carlstrom and John D. Davis and Ben Hertzberg and Manohar K. Prabhu and Honggo Wijaya and Christos Kozyrakis and Kunle Olukotun Transactional Memory Coherence and Consistency . . . . . . . . . . . . . . 
102--102 Sudheendra Hangal and Durgam Vahia and Chaiyasit Manovit and Juin-Yeu Joseph Lu TSOtool: a Program for Verifying Memory Systems Using the Memory Consistency Model . . . . . . . . . . . . . . . . . 114--114 Mainak Chaudhuri and Mark Heinrich SMTp: An Architecture for Next-generation Scalable Multi-threading 124--124 Christopher J. Hughes and Sarita V. Adve A Formal Approach to Frequent Energy Adaptations for Multimedia Applications 138--138 John Oliver and Ravishankar Rao and Paul Sultana and Jedidiah Crandall and Erik Czernikowski and Leslie W. Jones IV and Diana Franklin and Venkatesh Akella and Frederic T. Chong Synchroscalar: a Multiple Clock Domain, Power-Aware, Tile-Based Embedded Processor . . . . . . . . . . . . . . . 150--150 Roni Rosner and Yoav Almog and Micha Moffie and Naftali Schwartz and Avi Mendelson Power Awareness through Selective Dynamically Optimized Traces . . . . . . 162--162 Lakshmi N. Bairavasundaram and Muthian Sivathanu and Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau X-RAY: a Non-Invasive Exclusive Caching Mechanism for RAIDs . . . . . . . . . . 176--176 Robert Mullins and Andrew West and Simon Moore Low-Latency Virtual-Channel Routers for On-Chip Networks . . . . . . . . . . . . 188--188 V. Puente and J. A. Gregorio and F. Vallejo and R. Beivide Immunet: a Cheap and Robust Fault-Tolerant Packet Routing Mechanism 198--198 Alaa R. Alameldeen and David A. Wood Adaptive Cache Compression for High-Performance Processors . . . . . . 212--212 Pin Zhou and Feng Qin and Wei Liu and Yuanyuan Zhou and Josep Torrellas iWatcher: Efficient Architectural Support for Software Debugging . . . . . 224--224 Sami Yehia and Olivier Temam From Sequences of Dependent Instructions to Functions: An Approach for Improving Performance without ILP or Speculation 238--238 Ayose Falcon and Jared Stark and Alex Ramirez and Konrad Lai and Mateo Valero Prophet/Critic Hybrid Branch Prediction 250--250 Christopher Weaver and Joel Emer and Shubhendu S. 
Mukherjee and Steven K. Reinhardt Techniques to Reduce the Soft Error Rate of a High-Performance Microprocessor . . 264--264 Jayanth Srinivasan and Sarita V. Adve and Pradip Bose and Jude A. Rivers The Case for Lifetime Reliability-Aware Microprocessors . . . . . . . . . . . . 276--276 Michael D. Powell and T. N. Vijaykumar Exploiting Resonant Behavior to Reduce Inductive Noise . . . . . . . . . . . . 288--288 J. Adam Butts and Gurindar S. Sohi Use-Based Register Caching with Decoupled Indexing . . . . . . . . . . . 302--302 Gonzalez Gonzalez and Adrian Cristal and Daniel Ortega and Alexander Veidenbaum and Mateo Valero A Content Aware Integer Register File Organization . . . . . . . . . . . . . . 314--314 Mikko H. Lipasti and Brian R. Mestan and Erika Gunadi Physical Register Inlining . . . . . . . 325--325 Tejas S. Karkhanis and James E. Smith A First-Order Superscalar Processor Model . . . . . . . . . . . . . . . . . 338--338 Lieven Eeckhout and Robert H. Bell Jr. and Bastiaan Stougie and Koen De Bosschere and Lizy K. John Control Flow Modeling in Statistical Simulation for Accurate and Efficient Processor Design Studies . . . . . . . . 350--350 Bharath Iyer and Sadagopan Srinivasan and Bruce Jacob Extended Split-Issue: Enabling Flexibility in the Hardware Implementation of NUAL VLIW DSPs . . . . 364--364 Angshuman Parashar and Sudhanva Gurumurthi and Anand Sivasubramaniam A Complexity-Effective Approach to ALU Bandwidth Enhancement for Instruction-Level Temporal Redundancy 376--376 Anonymous Author Index . . . . . . . . . . . . . . 387--387
Adrián Cristal and José F. Martínez and Josep Llosa and Mateo Valero A case for resource-conscious out-of-order processors: towards kilo-instruction in-flight processors 3--10 Partha Kundu and Murali Annavaram and Trung Diep and John Shen A case for shared instruction cache on chip multiprocessors running OLTP . . . 11--18 N. Venkateswaran and Waran Research Foundation and Aditya Krishnan and S. Niranjan Kumar and Arrvindh Shriraman and Srinivas Sridharan Memory in processor: a novel design paradigm for supercomputing architectures . . . . . . . . . . . . . 19--26 I. Branovic and R. Giorgi and E. Martinelli A workload characterization of elliptic curve cryptography methods in embedded environments . . . . . . . . . . . . . . 27--34 K. Brifault and H. P. Charles Data cache management on EPIC architecture: optimizing memory access for image processing . . . . . . . . . . 35--42 Naohiko Shimizu and Chiaki Kon Java object look aside buffer for embedded applications . . . . . . . . . 43--49 Akihito Sakanaka and Seiichirou Fujii and Toshinori Sato A leakage-energy-reduction technique for highly-associative caches in embedded systems . . . . . . . . . . . . . . . . 50--54 S. Moch and M. Bereković and H. J. Stolberg and L. Friebe and M. B. Kulaczewski and A. Dehnhardt and P. Pirsch HIBRID-SOC: a multi-core architecture for image and video applications . . . . 55--61 Mladen Berekovic and Sören Moch and Peter Pirsch A scalable, clustered SMT processor for digital signal processing . . . . . . . 62--69 S. Bartolini and C. A. Prete A proposal for input-sensitivity analysis of profile-driven optimizations on embedded applications . . . . . . . . 70--77 Mark Thorson Internet Nuggets . . . . . . . . . . . . 78--83
John R. Mashey War of the benchmark means: time for a truce . . . . . . . . . . . . . . . . . 1--14 Jean-Louis Lafitte 40 years later ... a new engine to handle an operating system infrastructure . . . . . . . . . . . . . 15--22 Mark Thorson Internet Nuggets . . . . . . . . . . . . 23--41
Lance Hammond and Brian D. Carlstrom and Vicky Wong and Ben Hertzberg and Mike Chen and Christos Kozyrakis and Kunle Olukotun Programming with transactional coherence and consistency (TCC) . . . . . . . . . 1--13 Mihai Budiu and Girish Venkataramani and Tiberiu Chelcea and Seth Copen Goldstein Spatial computation . . . . . . . . . . 14--26 Virantha Ekanayake and Clinton Kelly IV and Rajit Manohar An ultra low-power processor for sensor networks . . . . . . . . . . . . . . . . 27--36 Christopher R. Lumb and Richard Golding D-SPTF: decentralized request distribution in brick-based storage systems . . . . . . . . . . . . . . . . 37--47 Yasushi Saito and Svend Fròlund and Alistair Veitch and Arif Merchant and Susan Spence FAB: building distributed enterprise disk arrays from commodity components 48--58 Timothy E. Denehy and John Bent and Florentina I. Popovici and Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau Deconstructing storage arrays . . . . . 59--71 Xiaotong Zhuang and Tao Zhang and Santosh Pande HIDE: an infrastructure for efficiently protecting information leakage on the address bus . . . . . . . . . . . . . . 72--84 G. Edward Suh and Jae W. Lee and David Zhang and Srinivas Devadas Secure program execution via dynamic information flow tracking . . . . . . . 85--96 Jaehyuk Huh and Jichuan Chang and Doug Burger and Gurindar S. Sohi Coherence decoupling: making use of incoherence . . . . . . . . . . . . . . 97--106 Srikanth T. Srinivasan and Ravi Rajwar and Haitham Akkary and Amit Gandhi and Mike Upton Continual flow pipelines . . . . . . . . 107--119 Rajagopalan Desikan and Simha Sethumadhavan and Doug Burger and Stephen W. Keckler Scalable selective re-execution for EDGE architectures . . . . . . . . . . . . . 120--132 John Regehr and Alastair Reid HOIST: a system for automatically deriving static analyzers for embedded systems . . . . . . . . . . . . . . . . 133--143 Perry H. Wang and Jamison D. 
Collins and Hong Wang and Dongkeun Kim and Bill Greene and Kai-Ming Chan and Aamir B. Yunus and Terry Sych and Stephen F. Moore and John P. Shen Helper threads via virtual multithreading on an experimental Itanium-2 processor-based platform . . . 144--155 Matthias Hauswirth and Trishul M. Chilimbi Low-overhead memory leak detection using adaptive statistical profiling . . . . . 156--164 Xipeng Shen and Yutao Zhong and Chen Ding Locality phase prediction . . . . . . . 165--176 Pin Zhou and Vivek Pandey and Jagadeesan Sundaresan and Anand Raghuraman and Yuanyuan Zhou and Sanjeev Kumar Dynamic tracking of page miss ratio curve for memory management . . . . . . 177--188 Rodric M. Rabbah and Hariharan Sandanagobalane and Mongkol Ekpanyapong and Weng-Fai Wong Compiler orchestrated prefetching via speculation and predication . . . . . . 189--198 Chen-Yong Cher and Antony L. Hosking and T. N. Vijaykumar Software prefetching for mark-sweep garbage collection: hardware analysis and software redesign . . . . . . . . . 199--210 David E. Lowell and Yasushi Saito and Eileen J. Samberg Devirtualizable virtual machines enabling general, single-node, online maintenance . . . . . . . . . . . . . . 211--223 Jared C. Smolens and Brian T. Gold and Jangwoo Kim and Babak Falsafi and James C. Hoe and Andreas G. Nowatzyk Fingerprinting: bounding soft-error detection latency and bandwidth . . . . 224--234 Greg Bronevetsky and Daniel Marques and Keshav Pingali and Peter Szwed and Martin Schulz Application-level checkpointing for shared memory programs . . . . . . . . . 235--247 Qiang Wu and Philo Juang and Margaret Martonosi and Douglas W. Clark Formal online methods for voltage/frequency control in multiple clock domain microprocessors . . . . . . 248--259 Mohamed Gomaa and Michael D. Powell and T. N. Vijaykumar Heat-and-run: leveraging SMT and CMP to manage power density through the operating system . . . . . . . . . . . . 
260--270 Xiaodong Li and Zhenmin Li and Francis David and Pin Zhou and Yuanyuan Zhou and Sarita Adve and Sanjeev Kumar Performance directed energy management for main memory and disks . . . . . . . 271--283
David M. Chess Security in autonomic computing . . . . 2--5 Weidong Shi and Hsien-Hsin S. Lee and Chenghuai Lu and Mrinmoy Ghosh Towards the issues in architectural support for protection of software execution . . . . . . . . . . . . . . . 6--15 John P. McGregor and Ruby B. Lee Protecting cryptographic keys and computations via virtual secure coprocessing . . . . . . . . . . . . . . 16--26 Brian Rogers and Yan Solihin and Milos Prvulovic Memory predecryption: hiding the latency overhead of memory encryption . . . . . 27--33 David A. Holland and Ada T. Lim and Margo I. Seltzer An architecture a day keeps the hacker away . . . . . . . . . . . . . . . . . . 34--41 Stelios Sidiroglou and Michael E. Locasto and Angelos D. Keromytis Hardware support for self-healing software services . . . . . . . . . . . 42--47 Jedidiah R. Crandall and Frederic T. Chong A security assessment of the Minos architecture . . . . . . . . . . . . . . 48--57 Matthew Burnside and Angelos D. Keromytis The case for crypto protocol awareness inside the OS kernel . . . . . . . . . . 58--64 Marc L. Corliss and E. Christopher Lewis and Amir Roth Using DISE to protect return addresses from attack . . . . . . . . . . . . . . 65--72 Dong Ye and David Kaeli A reliable return address stack: microarchitectural features to defeat stack smashing . . . . . . . . . . . . . 73--80 Koji Inoue Energy-security tradeoff in a secure cache architecture against buffer overflow attacks . . . . . . . . . . . . 81--89 Derek Uluski and Micha Moffie and David Kaeli Characterizing antivirus workload execution . . . . . . . . . . . . . . . 90--98 Monther Aldwairi and Thomas Conte and Paul Franzon Configurable string matching hardware for speeding up intrusion detection . . 99--107 Milena Milenković and Aleksandar Milenković and Emil Jovanov Using instruction block signatures to counter code injection attacks . . . . . 
108--117 Youtao Zhang and Jun Yang and Yongjing Lin and Lan Gao Architectural support for protecting user privacy on trusted processors . . . 118--123 Masaaki Shirase and Yasushi Hibino An architecture for elliptic curve cryptography computation . . . . . . . . 124--133 Taeho Kgil and Laura Falk and Trevor Mudge ChipLock: support for secure microarchitectures . . . . . . . . . . . 134--143 Magnus Ekman and Fredrik Warg and Jim Nilsson An in-depth look at computer performance growth . . . . . . . . . . . . . . . . . 144--147 N. Venkateswaran and S. Balaji and V. Sridhar Fault tolerant bus architecture for deep submicron based processors . . . . . . . 148--155 Mark Thorson Internet Nuggets . . . . . . . . . . . . 156--160
Ruby B. Lee and Peter C. S. Kwan and John P. McGregor and Jeffrey Dwoskin and Zhenghong Wang Architecture for Protecting Critical Secrets in Microprocessors . . . . . . . 2--13 Anonymous General Chair's Message . . . . . . . . 9--9 Anonymous Program Chair's Message . . . . . . . . x--xv Weidong Shi and Hsien-Hsin S. Lee and Mrinmoy Ghosh and Chenghuai Lu and Alexandra Boldyreva High Efficiency Counter Mode Security Architecture via Prediction and Precomputation . . . . . . . . . . . . . 14--24 Anonymous Committees . . . . . . . . . . . . . . . 16--16 Anonymous Reviewers . . . . . . . . . . . . . . . xvii--xviii G. Edward Suh and Charles W. O'Donnell and Ishan Sachdev and Srinivas Devadas Design and Implementation of the AEGIS Single-Chip Secure Processor Using Physical Random Functions . . . . . . . 25--36 Sudhanva Gurumurthi and Anand Sivasubramaniam and Vivek K. Natarajan Disk Drive Roadmap from the Thermal Perspective: a Case for Dynamic Thermal Management . . . . . . . . . . . . . . . 38--49 Ram Huggahalli and Ravi Iyer and Scott Tetrick Direct Cache Access for High Bandwidth Network I/O . . . . . . . . . . . . . . 50--59 Haryadi S. Gunawi and Nitin Agrawal and Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau and Jiri Schindler Deconstructing Commodity Storage Clusters . . . . . . . . . . . . . . . . 60--71 Magnus Ekman and Per Stenström A Robust Main-Memory Compression Scheme 74--85 Brian Fahs and Todd Rafacz and Sanjay J. Patel and Steven S. Lumetta Continuous Optimization . . . . . . . . 86--97 Vlad Petric and Tingting Sha and Amir Roth RENO: a Rename-Based Instruction Optimizer . . . . . . . . . . . . . . . 98--109 Lin Tan and Timothy Sherwood A High Throughput String Matching Architecture for Intrusion Detection and Prevention . . . . . . . . . . . . . . . 112--122 Florin Baboescu and Dean M. 
Tullsen and Grigore Rosu and Sumeet Singh A Tree Based Router Search Engine Architecture with Single Port Memories 123--133 Shorin Kyo and Shin'ichiro Okazaki and Tamio Arai An Integrated Memory Array Processor Architecture for Embedded Image Recognition Systems . . . . . . . . . . 134--145 George A. Reis and Jonathan Chang and Neil Vachharajani and Ram Rangan and David I. August and Shubhendu S. Mukherjee Design and Evaluation of Hybrid Fault-Detection Systems . . . . . . . . 148--159 Ethan Schuchman and T. N. Vijaykumar Rescue: a Microarchitecture for Testability and Defect Tolerance . . . . 160--171 Mohamed A. Gomaa and T. N. Vijaykumar Opportunistic Transient-Fault Detection 172--183 Steven Balensiefer and Lucas Kregor-Stickles and Mark Oskin An Evaluation Framework and Instruction Set Architecture for Ion-Trap Based Quantum Micro-Architectures . . . . . . 186--196 Leyla Nazhandali and Bo Zhai and Javin Olson and Anna Reeves and Michael Minuth and Ryan Helfand and Sanjay Pant and Todd Austin and David Blaauw Energy Optimization of Subthreshold-Voltage Sensor Network Processors . . . . . . . . . . . . . . . 197--207 Mark Hempstead and Nikhil Tripathi and Patrick Mauro and Gu-Yeon Wei and David Brooks An Ultra Low Power System Architecture for Sensor Network Applications . . . . 208--219 Thomas F. Wenisch and Stephen Somogyi and Nikolaos Hardavellas and Jangwoo Kim and Anastassia Ailamaki and Babak Falsafi Temporal Streaming of Shared Memory . . 222--233 Andreas Moshovos RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence . . . . 234--245 Jason F. Cantin and Mikko H. Lipasti and James E. Smith Improving Multiprocessor Performance with Coarse-Grain Coherence Tracking . . 246--257 Stephen Hines and Joshua Green and Gary Tyson and David Whalley Improving Program Efficiency by Packing Instructions into Registers . . . . . . 
260--271 Nathan Clark and Jason Blome and Michael Chu and Scott Mahlke and Stuart Biles and Krisztian Flautner An Architecture Framework for Transparent Instruction Set Customization in Embedded Processors . . 272--283 Satish Narayanasamy and Gilles Pokam and Brad Calder BugNet: Continuously Recording Program Execution for Deterministic Replay Debugging . . . . . . . . . . . . . . . 284--295 Murali Annavaram and Ed Grochowski and John Shen Mitigating Amdahl's Law through EPI Throttling . . . . . . . . . . . . . . . 298--309 Emil Talpes and Diana Marculescu Increased Scalability and Power Efficiency by Using Multiple Speed Pipelines . . . . . . . . . . . . . . . 310--321 Vlad Petric and Amir Roth Energy-Effectiveness of Pre-Execution and Energy-Aware P-Thread Selection . . 322--333 Michael Zhang and Krste Asanovic Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors . . . . . . . . . . . . 336--345 Evan Speight and Hazim Shafi and Lixin Zhang and Ram Rajamony Adaptive Mechanisms and Policies for Managing Cache Hierarchies in Chip Multiprocessors . . . . . . . . . . . . 346--356 Zeshan Chishti and Michael D. Powell and T. N. Vijaykumar Optimizing Replication, Communication, and Capacity Allocation in CMPs . . . . 357--368 Onur Mutlu and Hyesoon Kim and Yale N. Patt Techniques for Efficient Processing in Runahead Execution Engines . . . . . . . 370--381 Daniel A. Jimenez Piecewise Linear Branch Prediction . . . 382--393 Andre Seznec Analysis of the O-GEometric History Length Branch Predictor . . . . . . . . 394--405 Rakesh Kumar and Victor Zyuban and Dean M. Tullsen Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling . . . . . . . . . 408--419 John Kim and William J. Dally and Brian Towles and Amit K. 
Gupta Microarchitecture of a High-Radix Router 420--431 Daeho Seo and Akif Ali and Won-Taek Lim and Nauman Rafique and Mithuna Thottethodi Near-Optimal Worst-Case Throughput Routing for Two-Dimensional Mesh Networks . . . . . . . . . . . . . . . . 432--443 Amit Gandhi and Haitham Akkary and Ravi Rajwar and Srikanth T. Srinivasan and Konrad Lai Scalable Load and Store Processing in Latency Tolerant Processors . . . . . . 446--457 Amir Roth Store Vulnerability Window (SVW): Re-Execution Filtering for Enhanced Load Optimization . . . . . . . . . . . . . . 458--468 E. F. Torres and P. Ibanez and V. Vinals and J. M. Llaberia Store Buffer Design in First-Level Multibanked Data Caches . . . . . . . . 469--480 Albert Meixner and Daniel J. Sorin Dynamic Verification of Sequential Consistency . . . . . . . . . . . . . . 482--493 Ravi Rajwar and Maurice Herlihy and Konrad Lai Virtualizing Transactional Memory . . . 494--505 Saisanthosh Balakrishnan and Ravi Rajwar and Mike Upton and Konrad Lai The Impact of Performance Asymmetry in Emerging Multicore Architectures . . . . 506--517 Jayanth Srinivasan and Sarita V. Adve and Pradip Bose and Jude A. Rivers Exploiting Structural Duplication for Lifetime Reliability Enhancement . . . . 520--531 Arijit Biswas and Paul Racunas and Razvan Cheveresan and Joel Emer and Shubhendu S. Mukherjee and Ram Rangan Computing Architectural Vulnerability Factors for Address-Based Structures . . 532--543 Moinuddin K. Qureshi and David Thompson and Yale N. Patt The V-Way Cache: Demand Based Associativity via Global Replacement . . 544--555 Anonymous Author Index . . . . . . . . . . . . . . 556--557
S. Bartolini and P. Foglia and C. A. Prete Guests editors' introduction . . . . . . 1--2 Hanene Ben Fradj and Asmaa el Ouardighi and Cécile Belleudy and Michel Auguin Energy aware memory architecture configuration . . . . . . . . . . . . . 3--9 Hyo-Joong Suh and Sung Woo Chung DRACO: optimized CC-NUMA system with novel dual-link interconnections to reduce the memory latency . . . . . . . 10--16 Sami Yehia and Jean-François Collard and Olivier Temam Load squared: adding logic close to memory to reduce the latency of indirect loads with high miss ratios . . . . . . 17--24 Hiroaki Kobayashi and Isao Kotera and Hiroyuki Takizawa Locality analysis to control dynamically way-adaptable caches . . . . . . . . . . 25--32 F. Arakawa and M. Ishikawa and Y. Kondo and T. Kamei and M. Ozawa and O. Nishii and T. Hattori SH-X: an embedded processor core for consumer appliances . . . . . . . . . . 33--40 Afrin Naz and Mehran Rezaei and Krishna Kavi and Philip Sweany Improving data cache performance with integrated use of split caches, victim cache and stream buffers . . . . . . . . 41--48 Alex Pajuelo and Antonio González and Mateo Valero Speculative execution for hiding memory latency . . . . . . . . . . . . . . . . 49--56 Javier Verdú and Jorge García and Mario Nemirovsky and Mateo Valero The impact of traffic aggregation on the memory performance of networking applications . . . . . . . . . . . . . . 57--62 Bramha Allu and Wei Zhang Exploiting the replication cache to improve performance for multiple-issue microprocessors . . . . . . . . . . . . 63--71 Mark Thorson Internet nuggets . . . . . . . . . . . . 72--74 Anonymous MEDEA 2004 workshop . . . . . . . . . . ??
Norman P. Jouppi and Rakesh Kumar and Dean Tullsen Introduction to the special issue on the 2005 Workshop on Design, Analysis, and Simulation of Chip Multiprocessors (dasCMP'05) . . . . . . . . . . . . . . 4--4 James Laudon Performance/Watt: the new server focus 5--13 John D. Davis and Cong Fu and James Laudon The RASE (Rapid, Accurate Simulation Environment) for chip multiprocessors 14--23 Lisa Hsu and Ravi Iyer and Srihari Makineni and Steve Reinhardt and Donald Newell Exploring the cache design space for large scale CMPs . . . . . . . . . . . . 24--33 John D. Davis and Stephen E. Richardson and Charis Charitsis and Kunle Olukotun A chip prototyping substrate: the flexible architecture for simulation and testing (FAST) . . . . . . . . . . . . . 34--43 Neil Vachharajani and Matthew Iyer and Chinmay Ashok and Manish Vachharajani and David I. August and Daniel Connors Chip multi-processor scalability for single-threaded applications . . . . . . 44--53 Julia Chen and Philo Juang and Kevin Ko and Gilberto Contreras and David Penry and Ram Rangan and Adam Stoler and Li-Shiuan Peh and Margaret Martonosi Hardware-modulated parallelism in chip multiprocessors . . . . . . . . . . . . 54--63 Jack Sampson and Rubén González and Jean-François Collard and Norman P. Jouppi and Mike Schlansker Fast synchronization for chip multiprocessors . . . . . . . . . . . . 64--69 Anahita Shayesteh and Glenn Reinman and Norman Jouppi and Suleyman Sair and Tim Sherwood Dynamically configurable shared CMP helper engines for improved performance 70--79 Theofanis Constantinou and Yiannakis Sazeides and Pierre Michaud and Damien Fetis and Andre Seznec Performance implications of single thread migration on a chip multi-core 80--91 Milo M. K. Martin and Daniel J. Sorin and Bradford M. Beckmann and Michael R. Marty and Min Xu and Alaa R. Alameldeen and Kevin E. Moore and Mark D. Hill and David A. 
Wood Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset 92--99 David Wang and Brinda Ganesh and Nuengwong Tuaycharoen and Kathleen Baynes and Aamer Jaleel and Bruce Jacob DRAMsim: a memory system simulator . . . 100--107 Barry Rountree and Robert Springer and David K. Lowenthal and Vincent W. Freeh Notes from HPPAC 2005 . . . . . . . . . 108--112 H. C. Wang and C. K. Yuen A general framework to build new CPUs by mapping abstract machine code to instruction level parallel execution hardware . . . . . . . . . . . . . . . . 113--120 Nana B. Sam and Martin Burtscher Improving memory system performance with energy-efficient value speculation . . . 121--127 Mark Thorson Internet Nuggets . . . . . . . . . . . . 128--133
David Kaeli and Robert Cohn WBIA'05: Introduction to the special issue . . . . . . . . . . . . . . . . . 1--2 Chunling Hu and John McCabe and Daniel A. Jiménez and Ulrich Kremer The Camino Compiler infrastructure . . . 3--8 Martin Schulz and Dong Ahn and Andrew Bernat and Bronis R. de Supinski and Steven Y. Ko and Gregory Lee and Barry Rountree Scalable dynamic binary instrumentation for Blue Gene/L . . . . . . . . . . . . 9--14 Edson Borin and Cheng Wang and Youfeng Wu and Guido Araujo Dynamic binary control-flow errors detection . . . . . . . . . . . . . . . 15--20 Micha Moffie and David Kaeli ASM: application security monitor . . . 21--26 Qin Zhao and Rodric Rabbah and Weng-Fai Wong Dynamic memory optimization using pool allocation and prefetching . . . . . . . 27--32 Xiaofeng Gao and Beth Simon and Allan Snavely ALITER: an asynchronous lightweight instrumentation tool for event recording 33--38 Collin McCurdy and Charles Fischer Using Pin as a memory reference generator for multiprocessor simulation 39--44 Heidi Pan and Krste Asanović and Robert Cohn and Chi-Keung Luk Controlling program execution through binary instrumentation . . . . . . . . . 45--50 Nikrouz Faroughi Profiling of parallel processing programs on shared memory multiprocessors using Simics . . . . . . 51--56 Naveen Kumar and Ramesh Peri Transparent debugging of dynamically instrumented programs . . . . . . . . . 57--62 Laune C. Harris and Barton P. Miller Practical analysis of stripped binary code . . . . . . . . . . . . . . . . . . 63--68 Vijay Janapa Reddi and Dan Connors and Robert S. Cohn Persistence in dynamic code transformation systems . . . . . . . . . 69--74 Ram Srinivasan and Olaf Lubeck MonteSim: a Monte Carlo performance model for in-order microarchitectures 75--80 Michael Laurenzano and Beth Simon and Allan Snavely and Meghan Gunn Low cost trace-driven memory simulation using SimPoint . . . . . . . . . . . . . 81--86 Mark Thorson Internet Nuggets . . . . . . . . . . . . 87--93
S. Bartolini and P. Foglia and R. Giorgi and C. A. Prete Memory performance: dealing with applications, systems and architecture 1--2 Scott Friedman and Praveen Krishnamurthy and Roger Chamberlain and Ron K. Cytron and Jason E. Fritts Dusty caches for reference counting garbage collection . . . . . . . . . . . 3--10 Subramanian Ramaswamy and Jaswanth Sreeram and Sudhakar Yalamanchili and Krishna V. Palem Data trace cache: an application specific cache architecture . . . . . . 11--18 Afrin Naz and Krishna Kavi and Mehran Rezaei and Wentong Li Making a case for split data caches for embedded applications . . . . . . . . . 19--26 B. Allu and W. Zhang and M. Kandala Exploiting the replication cache to improve cache read bandwidth cost effectively . . . . . . . . . . . . . . 27--32 Matteo Monchiero and Gianluca Palermo and Cristina Silvano and Oreste Villa An efficient synchronization technique for multiprocessor systems on-chip . . . 33--40 Farshad Khunjush and Nikitas J. Dimopoulos Hiding message delivery and reducing memory access latency by providing direct-to-cache transfer during receive operations in a message passing environment . . . . . . . . . . . . . . 41--48 Yao Yue and Chuang Lin and Zhangxi Tan NPCryptBench: a cryptographic benchmark suite for network processors . . . . . . 49--56 Abelardo López-Lagunas and Sek M. Chai Memory bandwidth optimization through stream descriptors . . . . . . . . . . . 57--64 Akihiro Chiyonobu and Toshinori Sato Energy-efficient instruction scheduling utilizing cache miss information . . . . 65--70 Alessandro Bardine and Alessio Bechini and Pierfrancesco Foglia and Cosimo Antonio Prete Analysis of embedded video coder systems: a system-level approach . . . . 71--76 Alex Gontmakher and Assaf Schuster and Avi Mendelson Inthreads: a low granularity parallelization model . . . . . . . . . 77--80 Mark Thorson Internet nuggets . . . . . . . . . . . . 81--86
Yale Patt Computer Architecture Research and Future Microprocessors: Where Do We Go from Here? . . . . . . . . . . . . . . . 2--2 Jongman Kim and Chrysostomos Nicopoulos and Dongkook Park A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip Networks . . . 4--15 Anonymous Message from the General Chair . . . . . 10--10 Anonymous Message from the Program Chair . . . . . 11--11 Anonymous Reviewers . . . . . . . . . . . . . . . 14--14 Steve Scott and Dennis Abts and John Kim and William J. Dally The BlackWidow High-Radix Clos Network 16--28 Anonymous SIGARCH Guidelines . . . . . . . . . . . 17--17 Arvind Arvind and Jan-Willem Maessen Memory Model = Instruction Reordering + Store Atomicity . . . . . . . . . . 29--40 Christoph von Praun and Harold W. Cain and Jong-Deok Choi and Kyung Dong Ryu Conditional Memory Ordering . . . . . . 41--52 Austen McDonald and JaeWoong Chung and Brian D. Carlstrom and Chi Cao Minh and Hassan Chafi and Christos Kozyrakis and Kunle Olukotun Architectural Semantics for Practical Transactional Memory . . . . . . . . . . 53--65 Parthasarathy Ranganathan and Phil Leech and David Irwin and Jeffrey Chase Ensemble-level Power Management for Dense Blade Servers . . . . . . . . . . 66--77 James Donald and Margaret Martonosi Techniques for Multicore Thermal Management: Classification and New Exploration . . . . . . . . . . . . . . 78--88 Yuan Lin and Hyunseok Lee and Mark Woh and Yoav Harel and Scott Mahlke and Trevor Mudge and Chaitali Chakrabarti and Krisztian Flautner SODA: a Low-power Architecture For Software Radio . . . . . . . . . . . . . 89--101 Weidong Shi and Hsien-Hsin S. Lee and Laura Falk and Mrinmoy Ghosh An Integrated Framework for Dependable and Revivable Architectures Using Multicore Processors . . . . . . . . . . 102--113 Richard A. Hankins and Gautham N. Chinya and Jamison D. Collins and Perry H. Wang and Ryan Rakvic and Hong Wang and John P. 
Shen Multiple Instruction Stream Processor 114--127 Philip Emma The End of Scaling? Revolutions in Technology and Microarchitecture as We Pass the 90 Nanometer Node . . . . . . . 128--128 Feihui Li and Chrysostomos Nicopoulos and Thomas Richardson and Yuan Xie and Vijaykrishnan Narayanan and Mahmut Kandemir Design and Management of 3D Chip Multiprocessors Using Network-in-Memory 130--141 Alok Garg and M. Wasiur Rashid and Michael Huang Slackened Memory Dependence Enforcement: Combining Opportunistic Forwarding with Decoupled Verification . . . . . . . . . 142--154 Chuanjun Zhang Balanced Cache: Reducing Conflict Misses of Direct-Mapped Caches . . . . . . . . 155--166 Moinuddin K. Qureshi and Daniel N. Lynch and Onur Mutlu and Yale N. Patt A Case for MLP-Aware Cache Replacement 167--178 Chenyu Yan and Daniel Englender and Milos Prvulovic and Brian Rogers and Yan Solihin Improving Cost, Performance, and Security of Memory Encryption and Authentication . . . . . . . . . . . . . 179--190 Benjamin C. Brodie and David E. Taylor and Ron K. Cytron A Scalable Architecture For High-Throughput Regular-Expression Pattern Matching . . . . . . . . . . . . 191--202 Jahangir Hasan and Srihari Cadambi and Venkatta Jakkula and Srimat Chakradhar Chisel: a Storage-efficient, Collision-free Hash-based Network Processing Architecture . . . . . . . . 203--215 Christopher B. Colohan and Anastassia Ailamaki and J. Gregory Steffan and Todd C. Mowry Tolerating Dependences Between Large Speculative Threads Via Sub-Threads . . 216--226 Luis Ceze and James Tuck and Josep Torrellas and Calin Cascaval Bulk Disambiguation of Speculative Threads in Multiprocessors . . . . . . . 227--238 Seungryul Choi and Donald Yeung Learning-Based SMT Processor Resource Distribution via Hill-Climbing . . . . . 239--251 Stephen Somogyi and Thomas F. Wenisch and Anastassia Ailamaki and Babak Falsafi and Andreas Moshovos Spatial Memory Streaming . . . . . . . . 252--263 Jichuan Chang and Gurindar S. 
Sohi Cooperative Caching for Chip Multiprocessors . . . . . . . . . . . . 264--276 Shiliang Hu and James E. Smith Reducing Startup Time in Co-Designed Virtual Machines . . . . . . . . . . . . 277--288 Qing Yang and Weijun Xiao and Jin Ren TRAP-Array: a Disk Array Architecture Providing Timely Recovery to Any Point-in-time . . . . . . . . . . . . . 289--301 Saisanthosh Balakrishnan and Gurindar S. Sohi Program Demultiplexing: Data-flow based Speculative Parallelization of Methods in Sequential Programs . . . . . . . . . 302--313 Steven Swanson and Andrew Putnam and Martha Mercaldi and Ken Michelson and Andrew Petersen and Andrew Schwerin and Mark Oskin and Susan J. Eggers Area-Performance Trade-offs in Tiled Dataflow Architectures . . . . . . . . . 314--326 Karin Strauss and Xiaowei Shen and Josep Torrellas Flexible Snooping: Adaptive Forwarding and Filtering of Snoops in Embedded-Ring Multiprocessors . . . . . . . . . . . . 327--338 Liqun Cheng and Naveen Muralimanohar and Karthik Ramani and Rajeev Balasubramonian and John B. Carter Interconnect-Aware Coherence Protocols for Chip Multiprocessors . . . . . . . . 339--351 Steve Herrod The Future of Virtualization Technology 352--352 Rodney Van Meter and Kae Nemoto and W. J. Munro and Kohei M. Itoh Distributed Arithmetic on a Quantum Multicomputer . . . . . . . . . . . . . 354--365 Nemanja Isailovic and Yatish Patel and Mark Whitney and John Kubiatowicz Interconnection Networks for Scalable Quantum Computers . . . . . . . . . . . 366--377 Darshan D. Thaker and Tzvetan S. Metodi and Andrew W. Cross and Isaac L. Chuang and Frederic T. Chong Quantum Memory Hierarchies: Efficient Designs to Match Available Parallelism in Quantum Computing . . . . . . . . . . 378--390 Anonymous Author Index . . . . . . . . . . . . . . 391--391
Martin Burtscher TCgen 2.0: a tool to automatically generate lossless trace compressors . . 1--8 Abhas Kumar and Nisheet Jain and Mainak Chaudhuri Long-latency branches: how much do they matter? . . . . . . . . . . . . . . . . 9--15 Mark Thorson Internet nuggets . . . . . . . . . . . . 16--21
John L. Henning SPEC CPU2006 benchmark descriptions . . 1--17 Daniel Citron and Adham Hurani and Alaa Gnadrey The harmonic or geometric mean: does it really matter? . . . . . . . . . . . . . 18--25 James Poe and Tao Li BASS: a benchmark suite for evaluating architectural security systems . . . . . 26--33 Mark Thorson Internet nuggets . . . . . . . . . . . . 34--37
Mendel Rosenblum Impact of virtualization on computer architecture and operating systems . . . 1--1 Keith Adams and Ole Agesen A comparison of software and hardware techniques for x86 virtualization . . . 2--13 Stephen T. Jones and Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau Geiger: monitoring the buffer cache in a virtual machine environment . . . . . . 14--24 Jedidiah R. Crandall and Gary Wassermann and Daniela A. S. de Oliveira and Zhendong Su and S. Felix Wu and Frederic T. Chong Temporal search: detecting hidden malware timebombs with virtual machines 25--36 Shan Lu and Joseph Tucek and Feng Qin and Yuanyuan Zhou AVIO: detecting atomicity violations via access interleaving invariants . . . . . 37--48 Min Xu and Mark D. Hill and Rastislav Bodik A regulated transitive reduction (RTR) for longer memory race recording . . . . 49--60 Michael D. Bond and Kathryn S. McKinley Bell: bit-encoding online memory leak detection . . . . . . . . . . . . . . . 61--72 Smitha Shyam and Kypros Constantinides and Sujay Phadke and Valeria Bertacco and Todd Austin Ultra low-cost defect protection for microprocessor pipelines . . . . . . . . 73--82 Vimal K. Reddy and Eric Rotenberg and Sailashri Parthasarathy Understanding prediction-based partial redundant threading for low-overhead, high-coverage fault tolerance . . . . . 83--94 Angshuman Parashar and Anand Sivasubramaniam and Sudhanva Gurumurthi SlicK: slice-based locality exploitation for efficient redundant multithreading 95--105 Taliver Heath and Ana Paula Centeno and Pradeep George and Luiz Ramos and Yogesh Jaluria Mercury and Freon: temperature emulation and management for server systems . . . 106--116 Taeho Kgil and Shaun D'Souza and Ali Saidi and Nathan Binkert and Ronald Dreslinski and Trevor Mudge and Steven Reinhardt and Krisztian Flautner PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor . . . . . 117--128 Katherine E. 
Coons and Xia Chen and Doug Burger and Kathryn S. McKinley and Sundeep K. Kushwaha A spatial path scheduling algorithm for EDGE architectures . . . . . . . . . . . 129--140 Martha Mercaldi and Steven Swanson and Andrew Petersen and Andrew Putnam and Andrew Schwerin and Mark Oskin and Susan J. Eggers Instruction scheduling for a tiled dataflow architecture . . . . . . . . . 141--150 Michael I. Gordon and William Thies and Saman Amarasinghe Exploiting coarse-grained task, data, and pipeline parallelism in stream programs . . . . . . . . . . . . . . . . 151--162 Mahim Mishra and Timothy J. Callahan and Tiberiu Chelcea and Girish Venkataramani and Seth C. Goldstein and Mihai Budiu Tartan: evaluating spatial computation for whole program execution . . . . . . 163--174 Stijn Eyerman and Lieven Eeckhout and Tejas Karkhanis and James E. Smith A performance counter architecture for computing accurate CPI components . . . 175--184 Benjamin C. Lee and David M. Brooks Accurate and efficient regression modeling for microarchitectural performance and power prediction . . . . 185--194 Engin Ïpek and Sally A. McKee and Rich Caruana and Bronis R. de Supinski and Martin Schulz Efficiently exploring architectural design spaces via predictive modeling 195--206 Mazen Kharbutli and Xiaowei Jiang and Yan Solihin and Guru Venkataramani and Milos Prvulovic Comprehensively and efficiently protecting the heap . . . . . . . . . . 207--218 Trishul M. Chilimbi and Vinod Ganapathy HeapMD: identifying heap-based bugs using anomaly detection . . . . . . . . 219--228 Satish Narayanasamy and Cristiano Pereira and Brad Calder Recording shared memory dependencies using strata . . . . . . . . . . . . . . 229--240 Jaidev P. Patwardhan and Vijeta Johri and Chris Dwyer and Alvin R. Lebeck A defect tolerant self-organizing nanoscale SIMD architecture . . . . . . 241--251 Ethan Schuchman and T. N. Vijaykumar A program transformation and architecture support for quantum uncomputation . . . . . . . . . . . . 
. 252--263 Shashidhar Mysore and Banit Agrawal and Navin Srivastava and Sheng-Chih Lin and Kaustav Banerjee and Tim Sherwood Introspective 3D chips . . . . . . . . 264--273 Jason F. Cantin and Mikko H. Lipasti and James E. Smith Stealth prefetching . . . . . . . . . . 274--282 Koushik Chakraborty and Philip M. Wells and Gurindar S. Sohi Computation spreading: employing hardware migration to specialize CMP cores on-the-fly . . . . . . . . . . . . 283--292 Jason E. Miller and Anant Agarwal Software-based instruction caching for embedded processors . . . . . . . . . . 293--302 Xin Li and Marian Boldt and Reinhard von Hanxleden Mapping Esterel onto a multi-threaded embedded processor . . . . . . . . . . . 303--314 Nathan L. Binkert and Ali G. Saidi and Steven K. Reinhardt Integrated network interfaces for high-bandwidth TCP/IP . . . . . . . . . 315--324 David Tarditi and Sidd Puri and Jose Oglesby Accelerator: using data parallelism to program GPUs for general-purpose uses 325--335 Peter Damron and Alexandra Fedorova and Yossi Lev Hybrid transactional memory . . . . . . 336--346 Weihaw Chuang and Satish Narayanasamy and Ganesh Venkatesh and Jack Sampson and Michael Van Biesbrouck and Gilles Pokam and Brad Calder and Osvaldo Colavin Unbounded page-based transactional memory . . . . . . . . . . . . . . . . . 347--358 Michelle J. Moravan and Jayaram Bobba and Kevin E. Moore and Luke Yen and Mark D. Hill and Ben Liblit and Michael M. Swift and David A. Wood Supporting nested transactional memory in logTM . . . . . . . . . . . . . . . . 359--370 JaeWoong Chung and Chi Cao Minh and Austen McDonald and Travis Skare and Hassan Chafi and Brian D. Carlstrom and Christos Kozyrakis and Kunle Olukotun Tradeoffs in transactional memory virtualization . . . . . . . . . . . . . 
371--381 Motohiro Kawahito and Hideaki Komatsu and Takao Moriyama and Hiroshi Inoue and Toshio Nakatani A new idiom recognition framework for exploiting hardware-assist instructions 382--393 Sorav Bansal and Alex Aiken Automatic generation of peephole superoptimizers . . . . . . . . . . . . 394--403 Armando Solar-Lezama and Liviu Tancau and Rastislav Bodik and Sanjit Seshia and Vijay Saraswat Combinatorial sketching for finite programs . . . . . . . . . . . . . . . . 404--415 Jeff Da Silva and J. Gregory Steffan A probabilistic pointer analysis for speculative optimizations . . . . . . . 416--425
Dean Tullsen and Rakesh Kumar and Norman P. Jouppi Introduction to the special issue on the 2006 Workshop on Design, Analysis, and Simulation of Chip Multiprocessors: (dasCMP'06) . . . . . . . . . . . . . . 2--2 Aqeel Mahesri and Nicholas J. Wang and Sanjay J. Patel Hardware support for software controlled multithreading . . . . . . . . . . . . . 3--12 Xudong Shi and Feiqi Su and Jih-kwon Peir and Ye Xia and Zhen Yang CMP cache performance projection: accessibility vs. capacity . . . . . . . 13--20 Fei Guo and Hari Kannan and Li Zhao and Ramesh Illikkal and Ravi Iyer and Don Newell and Yan Solihin and Christos Kozyrakis From chaos to QoS: case studies in CMP resource management . . . . . . . . . . 21--30 Masaaki Kondo and Hiroshi Sasaki and Hiroshi Nakamura Improving fairness, throughput and energy-efficiency on a chip multiprocessor through DVFS . . . . . . 31--38 M. M. Waliullah and Per Stenstrom Starvation-free commit arbitration policies for transactional memory systems . . . . . . . . . . . . . . . . 39--46 Cesare Ferri and Tali Moreshet and R. Iris Bahar and Luca Benini and Maurice Herlihy A hardware/software framework for supporting transactional memory in a MPSoC environment . . . . . . . . . . . 47--54 Sean Rul and Hans Vandierendonck and Koen De Bosschere Function level parallelism driven by data dependencies . . . . . . . . . . . 55--62 John L. Henning Guest editor's introduction . . . . . . 63--64 John L. Henning SPEC CPU suite growth: an historical perspective . . . . . . . . . . . . . . 65--68 Aashish Phansalkar and Ajay Joshi and Lizy K. John Subsetting the SPEC CPU2006 benchmark suite . . . . . . . . . . . . . . . . . 69--76 Michael Wong C++ benchmarks in SPEC CPU2006 . . . . . 77--83 John L. Henning SPEC CPU2006 memory footprint . . . . . 84--89 Darryl Gove CPU2006 working set size . . . . . . . . 90--96 Wendy Korn and Moon S. Chang SPEC CPU2006 sensitivity to memory page sizes . . . . . . . . . . . . . . . . . 97--101 Reinhold P. 
Weicker and John L. Henning Subroutine profiling results for the CPU2006 benchmarks . . . . . . . . . . . 102--111 Dong Ye and Joydeep Ray and David Kaeli Characterization of file I/O activity for SPEC CPU2006 . . . . . . . . . . . . 112--117 John L. Henning Performance counters and development of SPEC CPU2006 . . . . . . . . . . . . . . 118--121 Darryl Gove and Lawrence Spracklen Evaluating the correspondence between training and reference workloads in SPEC CPU2006 . . . . . . . . . . . . . . . . 122--129 Cloyce D. Spradling SPEC CPU2006 benchmark tools . . . . . . 130--134 Swaroop Sridhar and Jonathan S. Shapiro and Prashanth P. Bungale HDTrans: a low-overhead dynamic translator . . . . . . . . . . . . . . . 135--140 Jun Yan and Wei Zhang Hybrid multi-core architecture for boosting single-threaded performance . . 141--148 Mark Thorson Internet nuggets . . . . . . . . . . . . 149--154
David E. Shaw and Martin M. Deneroff and Ron O. Dror and Jeffrey S. Kuskin and Richard H. Larson and John K. Salmon and Cliff Young and Brannon Batson and Kevin J. Bowers and Jack C. Chao and Michael P. Eastwood and Joseph Gagliardo and J. P. Grossman and C. Richard Ho and Douglas J. Ierardi and István Kolossváry and John L. Klepeis and Timothy Layman and Christine McLeavey and Mark A. Moraes and Rolf Mueller and Edward C. Priest and Yibing Shan and Jochen Spengler and Michael Theobald and Brian Towles and Stanley C. Wang Anton, a special-purpose machine for molecular dynamics simulation . . . . . 1--12 Xiaobo Fan and Wolf-Dietrich Weber and Luiz Andre Barroso Power provisioning for a warehouse-sized computer . . . . . . . . . . . . . . . . 13--23 Colin Blundell and Joe Devietti and E. Christopher Lewis and Milo M. K. Martin Making the fast case common and the uncommon case simple in unbounded transactional memory . . . . . . . . . . 24--34 Weirong Zhu and Vugranam C. Sreedhar and Ziang Hu and Guang R. Gao Synchronization state buffer: supporting efficient fine-grain synchronization on many-core architectures . . . . . . . . 35--45 Michael R. Marty and Mark D. Hill Virtual hierarchies to support server consolidation . . . . . . . . . . . . . 46--56 Kyle J. Nesbit and James Laudon and James E. Smith Virtual private caches . . . . . . . . . 57--68 Chi Cao Minh and Martin Trautmann and JaeWoong Chung and Austen McDonald and Nathan Bronson and Jared Casper and Christos Kozyrakis and Kunle Olukotun An effective hybrid transactional memory system with strong isolation guarantees 69--80 Jayaram Bobba and Kevin E. Moore and Haris Volos and Luke Yen and Mark D. Hill and Michael M. Swift and David A. Wood Performance pathologies in hardware transactional memory . . . . . . . . . . 81--91 Hany E. Ramadan and Christopher J. Rossbach and Donald E. Porter and Owen S. Hofmann and Aditya Bhandari and Emmett Witchel MetaTM/TxLinux: transactional memory for an operating system . . . 
. . . . . . . 92--103 Arrvindh Shriraman and Michael F. Spear and Hemayet Hossain and Virendra J. Marathe and Sandhya Dwarkadas and Michael L. Scott An integrated hardware-software approach to flexible transactional memory . . . . 104--115 Pablo Abad and Valentin Puente and José Angel Gregorio and Pablo Prieto Rotary router: an efficient architecture for CMP interconnection networks . . . . 116--125 John Kim and William J. Dally and Dennis Abts Flattened butterfly: a cost-efficient topology for high-radix networks . . . . 126--137 Jongman Kim and Chrysostomos Nicopoulos and Dongkook Park and Reetuparna Das and Yuan Xie and Vijaykrishnan Narayanan and Mazin S. Yousif and Chita R. Das A novel dimensionally-decomposed router for on-chip communication in 3D architectures . . . . . . . . . . . . . 138--149 Amit Kumar and Li-Shiuan Peh and Partha Kundu and Niraj K. Jha Express virtual channels: towards the ideal interconnection fabric . . . . . . 150--161 Sanjeev Kumar and Christopher J. Hughes and Anthony Nguyen Carbon: architectural support for fine-grained parallelism on chip multiprocessors . . . . . . . . . . . . 162--173 Naveen Neelakantam and Ravi Rajwar and Suresh Srinivas and Uma Srinivasan and Craig Zilles Hardware atomicity for reliable software speculation . . . . . . . . . . . . . . 174--185 Engin Ipek and Meyrem Kirman and Nevin Kirman and Jose F. Martinez Core fusion: accommodating software diversity in chip multiprocessors . . . 186--197 Eric Chi and Stephen A. Lyon and Margaret Martonosi Tailoring quantum architectures to implementation style: a quantum computer for mobile and persistent qubits . . . . 198--209 Xuejun Yang and Xiaobo Yan and Zuocheng Xing and Yu Deng and Jiang Jiang and Ying Zhang A 64-bit stream processor architecture for scientific applications . . . . . . 210--219 Christopher J. Hughes and Radek Grzeszczuk and Eftychios Sifakis and Daehyun Kim and Sanjeev Kumar and Andrew P. 
Selle and Jatin Chhugani and Matthew Holliman and Yen-Kuang Chen Physical simulation for animation and visual effects: parallelization and characterization for chip multiprocessors . . . . . . . . . . . . 220--231 Thomas Y. Yeh and Petros Faloutsos and Sanjay J. Patel and Glenn Reinman ParallAX: an architecture for real-time physics . . . . . . . . . . . . . . . . 232--243 Martha Mercaldi Kim and Mojtaba Mehrara and Mark Oskin and Todd Austin Architectural implications of brick and mortar silicon manufacturing . . . . . . 244--253 Ahmed M. Amin and Mithuna Thottethodi and T. N. Vijaykumar and Steven Wereley and Stephen C. Jacobson Aquacore: a programmable architecture for microfluidics . . . . . . . . . . . 254--265 Thomas F. Wenisch and Anastasia Ailamaki and Babak Falsafi and Andreas Moshovos Mechanisms for store-wait-free multiprocessors . . . . . . . . . . . . 266--277 Luis Ceze and James Tuck and Pablo Montesinos and Josep Torrellas BulkSC: bulk enforcement of sequential consistency . . . . . . . . . . . . . . 278--289 Bruno Diniz and Dorgival Guedes and Wagner Meira, Jr. and Ricardo Bianchini Limiting the power consumption of main memory . . . . . . . . . . . . . . . . . 290--301 Francisco Javier Mesa-Martinez and Joseph Nayfach-Battilana and Jose Renau Power model validation through thermal measurements . . . . . . . . . . . . . . 302--311 Jiang Lin and Hongzhong Zheng and Zhichun Zhu and Howard David and Zhao Zhang Thermal modeling and management of DRAM memory systems . . . . . . . . . . . . . 312--322 Abhishek Tiwari and Smruti R. Sarangi and Josep Torrellas ReCycle: pipeline adaptation to tolerate process variation . . . . . . . . . . . 323--334 Peter G. Sassone and Jeff Rupley II and Edward Brekelbaum and Gabriel H. Loh and Bryan Black Matrix scheduler reloaded . . . . . . . 335--346 Simha Sethumadhavan and Franziska Roesner and Joel S. Emer and Doug Burger and Stephen W. Keckler Late-binding: enabling unordered load-store queues . . . . . . . . . . . 
347--357 Jacob Leverich and Hideho Arakida and Alex Solomatnikov and Amin Firoozshahian and Mark Horowitz and Christos Kozyrakis Comparing memory systems for chip multiprocessors . . . . . . . . . . . . 358--368 Naveen Muralimanohar and Rajeev Balasubramonian Interconnect design considerations for large NUCA caches . . . . . . . . . . . 369--380 Moinuddin K. Qureshi and Aamer Jaleel and Yale N. Patt and Simon C. Steely and Joel Emer Adaptive insertion policies for high performance caching . . . . . . . . . . 381--391 Paul A. Karger Performance and security lessons learned from virtualizing the Alpha processor 392--401 Tejas S. Karkhanis and James E. Smith Automated design of application specific superscalar processors: an analytical approach . . . . . . . . . . . . . . . . 402--411 Aashish Phansalkar and Ajay Joshi and Lizy K. John Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite . . . . . . . . . . . . . . . . . 412--423 Hyesoon Kim and José A. Joao and Onur Mutlu and Chang Joo Lee and Yale N. Patt and Robert Cohn VPC prediction: reducing the cost of indirect branches via hardware-based dynamic devirtualization . . . . . . . . 424--435 Andrew D. Hilton and Amir Roth Ginger: control independence using tag rewriting . . . . . . . . . . . . . . . 436--447 Ahmed S. Al-Zawawi and Vimal K. Reddy and Eric Rotenberg and Haitham H. Akkary Transparent control independence (TCI) 448--459 Nicholas J. Wang and Aqeel Mahesri and Sanjay J. Patel Examining ACE analysis reliability estimates using fault-injection . . . . 460--469 Nidhi Aggarwal and Parthasarathy Ranganathan and Norman P. Jouppi and James E. Smith Configurable isolation: building high availability systems with commodity multi-core processors . . . . . . . . . 470--481 Michael Dalton and Hari Kannan and Christos Kozyrakis Raksha: a flexible information flow architecture for software security . . . 482--493 Zhenghong Wang and Ruby B. 
Lee New cache designs for thwarting software cache-based side channel attacks . . . . 494--505 Niranjan Kumar Soundararajan and Angshuman Parashar and Anand Sivasubramaniam Mechanisms for bounding vulnerabilities of processor structures . . . . . . . . 506--515 Kristen R. Walcott and Greg Humphreys and Sudhanva Gurumurthi Dynamic prediction of architectural vulnerability from microarchitectural state . . . . . . . . . . . . . . . . . 516--527
Aneesh Aggarwal and Pradip Bose and Mohamed Zahran Introduction to the special issue on the 2006 Reconfigurable and Adaptive Architecture Workshop . . . . . . . . . 1--1 Nikolaos Bellas and Sek M. Chai and Malcolm Dwyer and Dan Linzmeier Mapping streaming architectures on reconfigurable platforms . . . . . . . . 2--8 Martin Labrecque and Peter Yiannacouras and J. Gregory Steffan Custom code generation for soft processors . . . . . . . . . . . . . . . 9--19 Tameesh Suri Improving instruction level parallelism through reconfigurable units in superscalar processors . . . . . . . . . 20--27 Hashem H. Najaf-abadi and Eric Rotenberg Architectural contesting: exposing and exploiting temperamental behavior . . . 28--35 Kuo-Kun Tseng and Ying-Dar Lin and Tsern-Huei Lee and Yuan-Cheng Lai Deterministic high-speed root-hashing automaton matching coprocessor for embedded network processor . . . . . . . 36--43 Fadi N. Sibai Performance analysis and workload characterization of the 3DMark05 benchmark on modern parallel computer platforms . . . . . . . . . . . . . . . 44--52 Mark Thorson Internet nuggets . . . . . . . . . . . . 53--55
S. Bartolini and P. Foglia and C. A. Prete MEmory performance: DEaling with applications, systems and architecture 4--5 K. Patrick Lorton and David S. Wise Analyzing block locality in Morton-order and Morton-hybrid matrices . . . . . . . 6--12 Kaveh Jokar Deris and Amirali Baniasadi Investigating cache energy and latency break-even points in high performance processors . . . . . . . . . . . . . . . 13--20 Jun Yan and Wei Zhang Evaluating instruction cache vulnerability to transient errors . . . 21--28 Tanausú Ramírez and Alex Pajuelo and Oliverio J. Santana and Mateo Valero Energy saving through a simple load control mechanism . . . . . . . . . . . 29--36 Luis M. Ramos and José Luis Briz and Pablo E. Ibáñez and Victor Viñals Data prefetching in a cache hierarchy with high bandwidth and capacity . . . . 37--44 Haakon Dybdahl and Per Stenström and Lasse Natvig An LRU-based replacement algorithm augmented with frequency of access in shared chip-multiprocessor caches . . . 45--52 A. Bardine and P. Foglia and G. Gabrielli and C. A. Prete and P. Stenström Improving power efficiency of D-NUCA caches . . . . . . . . . . . . . . . . . 53--58 Mark Thorson Internet nuggets . . . . . . . . . . . . 59--62
Kenji Kise and Toshinori Sato and Hironori Nakajo Special issue: ALPS'07 -- Advanced Low Power Systems: Introduction . . . . . . 1--2 Jun Yao and Shinobu Miwa and Hajime Shimada and Shinji Tomita Optimal pipeline depth with pipeline stage unification adoption . . . . . . . 3--9 Preetham Lakshmikanthan and Adrian Nuñez VCLEARIT: a VLSI CMOS circuit leakage reduction technique for nanoscale technologies . . . . . . . . . . . . . . 10--16 Kiyofumi Tanaka and Takahiro Kawahara Leakage energy reduction in cache memory by data compression . . . . . . . . . . 17--24 Hidetsugu Irie and Ken Sugimoto and Masahiro Goshima and Shuichi Sakai Preventing timing errors on register writes: mechanisms of detections and recoveries . . . . . . . . . . . . . . . 25--31 Mihaela Maliţa and Gheorghe Ştefan and Dominique Thiébaut Not multi-, but many-core: designing integral parallel architectures for embedded computation . . . . . . . . . . 32--38 Takefumi Miyoshi and Nobuhiko Sugino Fine-grain compensation method with consideration of trade-offs between computation and data transfer for power consumption . . . . . . . . . . . . . . 39--44 Bogdan F. Romanescu and Michael E. Bauer and Sule Ozev and Daniel J. Sorin VariaSim: simulating circuits and systems in the presence of process variability . . . . . . . . . . . . . . 45--48 N. Venkateswaran and Deepak Srinivasan and Madhavan Manivannan and T. P. Ramnath Sai Sagar and Shyamsundar Gopalakrishnan and VinothKrishnan Elangovan and Karthik Chandrasekar and Prem Kumar Ramesh and Viswanath Venkatesan and Arvindakshan Babu and Sudharshan Future generation supercomputers I: a paradigm for node architecture . . . . . 49--60 N. Venkateswaran and Deepak Srinivasan and Madhavan Manivannan and T. P. Ramnath Sai Sagar and Shyamsundar Gopalakrishnan and VinothKrishnan Elangovan and Arvind M. 
and Prem Kumar Ramesh and Karthik Ganesan and Viswanath Krishnamurthy and Sivaramakrishnan Future generation supercomputers II: a paradigm for cluster architecture . . . 61--70 Mark Thorson Internet nuggets . . . . . . . . . . . . 71--73
Erik Winfree Toward molecular programming with DNA 1--1 Xiaoxin Chen and Tal Garfinkel and E. Christopher Lewis and Pratap Subrahmanyam and Carl A. Waldspurger and Dan Boneh and Jeffrey Dwoskin and Dan R. K. Ports Overshadow: a virtualization-based approach to retrofitting protection in commodity operating systems . . . . . . 2--13 Jonathan M. McCune and Bryan Parno and Adrian Perrig and Michael K. Reiter and Arvind Seshadri How low can you go?: recommendations for hardware-supported minimal TCB code execution . . . . . . . . . . . . . . . 14--25 Ravi Bhargava and Benjamin Serebrin and Francesco Spadini and Srilatha Manne Accelerating two-dimensional page walks for virtualized systems . . . . . . . . 26--35 Benjamin C. Lee and David Brooks Efficiency trends and limits from comprehensive microarchitectural adaptivity . . . . . . . . . . . . . . . 36--47 Ramya Raghavendra and Parthasarathy Ranganathan and Vanish Talwar and Zhikui Wang and Xiaoyun Zhu No 'power' struggles: coordinated multi-level power management for the data center . . . . . . . . . . . . . . 48--59 Chinnakrishnan S. Ballapuram and Ahmad Sharif and Hsien-Hsin S. Lee Exploiting access semantics and program behavior to reduce snoop power in chip multiprocessors . . . . . . . . . . . . 60--69 Arindam Mallik and Jack Cosgrove and Robert P. Dick and Gokhan Memik and Peter Dinda PICSEL: measuring user-perceived performance to control dynamic frequency scaling . . . . . . . . . . . . . . . . 70--79 Jose A. Joao and Onur Mutlu and Hyesoon Kim and Rishi Agarwal and Yale N. Patt Improving the performance of object-oriented languages with dynamic predication of indirect jumps . . . . . 80--90 Michal Wegiel and Chandra Krintz The mapping collector: virtual memory support for generational, parallel, and concurrent compaction . . . . . . . . . 91--102 Joe Devietti and Colin Blundell and Milo M. K. Martin and Steve Zdancewic Hardbound: architectural support for spatial safety of the C programming language . . . . . 
. . . . . . . . . . . 103--114 Vitaliy B. Lvin and Gene Novark and Emery D. Berger and Benjamin G. Zorn Archipelago: trading address space for reliability and security . . . . . . . . 115--124 Bumyong Choi and Leo Porter and Dean M. Tullsen Accurate branch prediction for short threads . . . . . . . . . . . . . . . . 125--134 Shekhar Srikantaiah and Mahmut Kandemir and Mary Jane Irwin Adaptive set pinning: managing shared caches in chip multiprocessors . . . . . 135--144 James Tuck and Wonsun Ahn and Luis Ceze and Josep Torrellas SoftSig: software-exposed hardware signatures for code analysis and optimization . . . . . . . . . . . . . . 145--156 Ioana Burcea and Stephen Somogyi and Andreas Moshovos and Babak Falsafi Predictor virtualization . . . . . . . . 157--167 Vinod Ganapathy and Matthew J. Renzelmann and Arini Balakrishnan and Michael M. Swift and Somesh Jha The design and implementation of microdrivers . . . . . . . . . . . . . . 168--178 Yaron Weinsberg and Danny Dolev and Tal Anker and Muli Ben-Yehuda and Pete Wyckoff Tapping into the fountain of CPUs: on operating system support for programmable devices . . . . . . . . . . 179--188 Kai Shen and Ming Zhong and Sandhya Dwarkadas and Chuanpeng Li and Christopher Stewart and Xiao Zhang Hardware counter driven on-the-fly request signatures . . . . . . . . . . . 189--200 Luk Van Ertvelde and Lieven Eeckhout Dispersing proprietary applications as benchmarks through code mutation . . . . 201--210 Shashidhar Mysore and Bita Mazloom and Banit Agrawal and Timothy Sherwood Understanding and visualizing full systems with data flow tomography . . . 211--221 Guilherme Ottoni and David I. August Communication optimizations for global multi-threaded instruction scheduling 222--232 Milind Kulkarni and Keshav Pingali and Ganesh Ramanarayanan and Bruce Walter and Kavita Bala and L. Paul Chew Optimistic parallelism benefits from data partitioning . . . . . . . . . . . 233--243 Russ Cox and Tom Bergan and Austin T. 
Clements and Frans Kaashoek and Eddie Kohler Xoc, an extension-oriented compiler for systems programming . . . . . . . . . . 244--254 Philip M. Wells and Koushik Chakraborty and Gurindar S. Sohi Adapting to intermittent faults in multicore systems . . . . . . . . . . . 255--264 Man-Lap Li and Pradeep Ramachandran and Swarup Kumar Sahoo and Sarita V. Adve and Vikram S. Adve and Yuanyuan Zhou Understanding the propagation of hard errors to software and implications for resilient system design . . . . . . . . 265--276 M. Aater Suleman and Moinuddin K. Qureshi and Yale N. Patt Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on CMPs . . . . . . . . . . . . . . . . . . 277--286 Michael D. Linderman and Jamison D. Collins and Hong Wang and Teresa H. Meng Merge: a programming model for heterogeneous multi-core systems . . . . 287--296 Jayanth Gummaraju and Joel Coburn and Yoshio Turner and Mendel Rosenblum Streamware: programming general-purpose multicore processors using streams . . . 297--307 Edmund B. Nightingale and Daniel Peek and Peter M. Chen and Jason Flinn Parallelizing security checks on commodity hardware . . . . . . . . . . . 308--318 Miguel Castro and Manuel Costa and Jean-Philippe Martin Better bug reporting with better privacy 319--328 Shan Lu and Soyeon Park and Eunsoo Seo and Yuanyuan Zhou Learning from mistakes: a comprehensive study on real world concurrency bug characteristics . . . . . . . . . . . . 329--339
Anonymous Message from the General Chairs . . . . x--x Anonymous Message from the Program Chair . . . . . xi--xi Anonymous Reviewers . . . . . . . . . . . . . . . xv--xviii Francis Tseng and Yale N. Patt Achieving Out-of-Order Performance with Almost In-Order Complexity . . . . . . . 3--12 Mayank Agarwal and Nitin Navale and Kshitiz Malik and Matthew I. Frank Fetch-Criticality Reduction through Control Independence . . . . . . . . . . 13--24 Miquel Pericàs and Adrian Cristal and Francisco J. Cazorla and Ruben González and Alex Veidenbaum and Daniel A. Jiménez and Mateo Valero A Two-Level Load/Store Queue Based on Execution Locality . . . . . . . . . . . 25--36 Engin Ipek and Onur Mutlu and José F. Martínez and Rich Caruana Self-Optimizing Memory Controllers: a Reinforcement Learning Approach . . . . 39--50 Shyamkumar Thoziyoor and Jung Ho Ahn and Matteo Monchiero and Jay B. Brockman and Norman P. Jouppi A Comprehensive Memory Modeling Tool and Its Application to the Design and Analysis of Future Memory Hierarchies 51--62 Onur Mutlu and Thomas Moscibroda Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems . . . . . . . . . 63--74 John Kim and William J. Dally and Steve Scott and Dennis Abts Technology-Driven, Highly-Scalable Dragonfly Topology . . . . . . . . . . . 77--88 Jae W. Lee and Man Cheuk Ng and Krste Asanovic Globally-Synchronized Frames for Guaranteed Quality-of-Service in On-Chip Networks . . . . . . . . . . . . . . . . 89--100 Martha Mercaldi Kim and John D. Davis and Mark Oskin and Todd Austin Polymorphic On-Chip Networks . . . . . . 101--112 Lee Baugh and Naveen Neelakantam and Craig Zilles Using Hardware Memory Protection to Build a High-Performance, Strongly-Atomic Hybrid Transactional Memory . . . . . . . . . . . . . . . . . 115--126 Jayaram Bobba and Neelam Goyal and Mark D. Hill and Michael M. Swift and David A. 
Wood TokenTM: Efficient Execution of Large Transactions with Hardware Transactional Memory . . . . . . . . . . . . . . . . . 127--138 Arrvindh Shriraman and Sandhya Dwarkadas and Michael L. Scott Flexible Decoupled Transactional Memory Support . . . . . . . . . . . . . . . . 139--150 Dana Vantrease and Robert Schreiber and Matteo Monchiero and Moray McLaren and Norman P. Jouppi and Marco Fiorentino and Al Davis and Nathan Binkert and Raymond G. Beausoleil and Jung Ho Ahn Corona: System Implications of Emerging Nanophotonic Technology . . . . . . . . 153--164 Lucas Kreger-Stickles and Mark Oskin Microcoded Architectures for Ion-Tap Quantum Computers . . . . . . . . . . . 165--176 Nemanja Isailovic and Mark Whitney and Yatish Patel and John Kubiatowicz Running a Quantum Circuit at the Speed of Data . . . . . . . . . . . . . . . . 177--188 Xiaoyao Liang and Gu-Yeon Wei and David Brooks ReVIVaL: a Variation-Tolerant Architecture Using Voltage Interpolation and Variable Latency . . . . . . . . . . 191--202 Chris Wilkerson and Hongliang Gao and Alaa R. Alameldeen and Zeshan Chishti and Muhammad Khellah and Shih-Lien Lu Trading off Cache Capacity for Reliability to Enable Low Voltage Operation . . . . . . . . . . . . . . . 203--214 Franziska Roesner and Doug Burger and Stephen W. Keckler Counting Dependence Predictors . . . . . 215--226 Natalie Enright Jerger and Li-Shiuan Peh and Mikko Lipasti Virtual Circuit Tree Multicasting: a Case for On-Chip Hardware Multicast Support . . . . . . . . . . . . . . . . 229--240 Avinash Karanth Kodi and Ashwini Sarathy and Ahmed Louri iDEAL: Inter-router Dual-Function Energy and Area-Efficient Links for Network-on-Chip (NoC) Architectures . . 241--250 Dongkook Park and Soumya Eachempati and Reetuparna Das and Asit K. Mishra and Yuan Xie and N. Vijaykrishnan and Chita R. Das MIRA: a Multi-layered On-Chip Interconnect Router Architecture . . . . 251--261 Derek R. Hower and Mark D. 
Hill Rerun: Exploiting Episodes for Lightweight Memory Race Recording . . . 265--276 Brandon Lucia and Joseph Devietti and Karin Strauss and Luis Ceze Atom-Aid: Detecting and Surviving Atomicity Violations . . . . . . . . . . 277--288 Pablo Montesinos and Luis Ceze and Josep Torrellas DeLorean: Recording and Deterministically Replaying Shared-Memory Multiprocessor Execution Efficiently . . . . . . . . . . . . . . 289--300 Sriram Sankar and Sudhanva Gurumurthi and Mircea R. Stan Intra-disk Parallelism: An Idea Whose Time Has Come . . . . . . . . . . . . . 303--314 Kevin Lim and Parthasarathy Ranganathan and Jichuan Chang and Chandrakant Patel and Trevor Mudge and Steven Reinhardt Understanding and Designing New Server Architectures for Emerging Warehouse-Computing Environments . . . . 315--326 Taeho Kgil and David Roberts and Trevor Mudge Improving NAND Flash Based Disk Caches 327--338 Xiaodong Li and Sarita V. Adve and Pradip Bose and Jude A. Rivers Online Estimation of Architectural Vulnerability Factor for Soft Errors . . 341--352 Jeonghee Shin and Victor Zyuban and Pradip Bose and Timothy M. Pinkston A Proactive Wearout Recovery Approach for Exploiting Microarchitectural Redundancy to Extend Cache SRAM Lifetime 353--362 Radu Teodorescu and Josep Torrellas Variation-Aware Application Scheduling and Power Management for Chip Multiprocessors . . . . . . . . . . . . 363--374 Shimin Chen and Michael Kozuch and Theodoros Strigkos and Babak Falsafi and Phillip B. Gibbons and Todd C. Mowry and Vijaya Ramachandran and Olatunji Ruwase and Michael Ryan and Evangelos Vlachos Flexible Hardware Acceleration for Instruction-Grain Program Monitoring . . 377--388 Nathan Clark and Amir Hormati and Scott Mahlke VEAL: Virtualized Execution Accelerator for Loops . . . . . . . . . . . . . . . 389--400 Haibo Chen and Xi Wu and Liwei Yuan and Binyu Zang and Pen-chung Yew and Frederic T. 
Chong From Speculation to Security: Practical and Efficient Information Flow Tracking Using Speculative Hardware . . . . . . . 401--412 Carlos Boneti and Francisco J. Cazorla and Roberto Gioiosa and Alper Buyuktosunoglu and Chen-Yong Cher and Mateo Valero Software-Controlled Priority Characterization of POWER5 Processor . . 415--426 Alex Shye and Berkin Ozisikyilmaz and Arindam Mallik and Gokhan Memik and Peter A. Dinda and Robert P. Dick and Alok N. Choudhary Learning and Leveraging the Relationship between Architecture-Level Measurements and Individual User Satisfaction . . . . 427--438 Sanjeev Kumar and Daehyun Kim and Mikhail Smelyanskiy and Yen-Kuang Chen and Jatin Chhugani and Christopher J. Hughes and Changkyu Kim and Victor W. Lee and Anthony D. Nguyen Atomic Vector Operations on Chip Multiprocessors . . . . . . . . . . . . 441--452 Gabriel H. Loh 3D-Stacked Memory Architectures for Multi-core Processors . . . . . . . . . 453--464 Anonymous Author Index . . . . . . . . . . . . . . 465--466 Anonymous Publisher's Information . . . . . . . . 468--468 Anonymous Cover Art . . . . . . . . . . . . . . . C1--C1
Ramesh K. Karne and Alexander L. Wijesinha and George H. Ford, Jr. Opinion: stay on course with an evolution or choose a revolution in computing . . . . . . . . . . . . . . . 1--6 Mark Thorson Internet nuggets . . . . . . . . . . . . 7--11
Jerker Bengtsson and Bertil Svensson A domain-specific approach for software development on Manycore platforms . . . 2--10 Daniel Cederman and Philippas Tsigas On sorting and load balancing on GPUs 11--18 Phuong Hoai Ha and Philippas Tsigas and Otto J. Anshus Non-blocking programming on multi-core graphics processors: (extended abstract) 19--28 Shuvra S. Bhattacharyya and Gordon Brebner and Jörn W. Janneck and Johan Eker and Carl von Platen and Marco Mattavelli and Mickaël Raulet OpenDF: a dataflow toolset for reconfigurable hardware and multicore systems . . . . . . . . . . . . . . . . 29--35 Christoph W. Kessler and Jörg Keller Optimized on-chip pipelining of memory-intensive computations on the cell BE . . . . . . . . . . . . . . . . 36--45 Håkan Lundvall and Kristian Stavåker and Peter Fritzson and Christoph Kessler Automatic parallelization of simulation code for equation-based models with software pipelining and measurements on three platforms . . . . . . . . . . . . 46--55 Huan Fang and Mats Brorsson Scalable directory architecture for distributed shared memory chip multiprocessors . . . . . . . . . . . . 56--64 Bengt Jonsson State-space exploration for concurrent algorithms under weak memory orderings: (preliminary version) . . . . . . . . . 65--71 Parosh Aziz Abdulla and Frédéric Haziza and Mats Kindahl Model checking race-freeness . . . . . . 72--79 Hakan Sundell and Philippas Tsigas NOBLE: non-blocking programming support via lock-free shared abstract data types 80--87 Anders Gidenstam and Marina Papatriantafilou LFTHREADS: a lock-free thread library 88--92 Karl-Filip Faxén Wool --- a work stealing library . . . . 93--100 Mark Thorson Internet nuggets . . . . . . . . . . . . 101--111
Mark Gebhart and Bertrand A. Maher and Katherine E. Coons and Jeff Diamond and Paul Gratz and Mario Marino and Nitya Ranganathan and Behnam Robatmili and Aaron Smith and James Burrill and Stephen W. Keckler and Doug Burger and Kathryn S. McKinley An evaluation of the TRIPS computer system . . . . . . . . . . . . . . . . . 1--12 Constantin Pistol and Wutichai Chongchitmate and Christopher Dwyer and Alvin R. Lebeck Architectural implications of nanoscale integrated sensing and computing . . . . 13--24 Soyeon Park and Shan Lu and Yuanyuan Zhou CTrigger: exposing atomicity violation bugs from their hiding places . . . . . 25--36 Stelios Sidiroglou and Oren Laadan and Carlos Perez and Nicolas Viennot and Jason Nieh and Angelos D. Keromytis ASSURE: automatic software self-healing using rescue points . . . . . . . . . . 37--48 Andrew Lenharth and Vikram S. Adve and Samuel T. King Recovery domains: an organizing principle for recoverable operating systems . . . . . . . . . . . . . . . . 49--60 Martin Dimitrov and Huiyang Zhou Anomaly-based bug prediction, isolation, and validation: an automated approach for software debugging . . . . . . . . . 61--72 Pablo Montesinos and Matthew Hicks and Samuel T. King and Josep Torrellas Capo: a software-hardware interface for practical deterministic multiprocessor replay . . . . . . . . . . . . . . . . . 73--84 Joseph Devietti and Brandon Lucia and Luis Ceze and Mark Oskin DMP: deterministic shared memory multiprocessing . . . . . . . . . . . . 85--96 Marek Olszewski and Jason Ansel and Saman Amarasinghe Kendo: efficient deterministic multithreading in software . . . . . . . 97--108 Mohit Tiwari and Hassan M. G. Wassel and Bita Mazloom and Shashidhar Mysore and Frederic T. Chong and Timothy Sherwood Complete information flow tracking from the gates up . . . . . . . . . . . . . . 109--120 David K. Tam and Reza Azimi and Livio B. 
Soares and Michael Stumm RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations . . . . . . . . . . . . . 121--132 Stijn Eyerman and Lieven Eeckhout Per-thread cycle accounting in SMT processors . . . . . . . . . . . . . . . 133--144 Owen S. Hofmann and Christopher J. Rossbach and Emmett Witchel Maximum benefit from a minimal HTM . . . 145--156 Dave Dice and Yossi Lev and Mark Moir and Daniel Nussbaum Early experience with a commercial hardware transactional memory implementation . . . . . . . . . . . . . 157--168 Philip M. Wells and Koushik Chakraborty and Gurindar S. Sohi Mixed-mode multicore reliability . . . . 169--180 Sriram Rajamani and G. Ramalingam and Venkatesh Prasad Ranganath and Kapil Vaswani ISOLATOR: dynamically ensuring isolation in concurrent programs . . . . . . . . . 181--192 Joseph Tucek and Weiwei Xiong and Yuanyuan Zhou Efficient online validation with delta execution . . . . . . . . . . . . . . . 193--204 David Meisner and Brian T. Gold and Thomas F. Wenisch PowerNap: eliminating server idle power 205--216 Adrian M. Caulfield and Laura M. Grupp and Steven Swanson Gordon: using flash memory to build fast, power-efficient clusters for data-intensive applications . . . . . . 217--228 Aayush Gupta and Youngjae Kim and Bhuvan Urgaonkar DFTL: a flash translation layer employing demand-based selective caching of page-level address mappings . . . . . 229--240 Farhana Aleen and Nathan Clark Commutativity analysis for software parallelization: letting program transformations see the big picture . . 241--252 M. Aater Suleman and Onur Mutlu and Moinuddin K. Qureshi and Yale N. Patt Accelerating critical section execution with asymmetric multi-core architectures 253--264 Todd Mytkowicz and Amer Diwan and Matthias Hauswirth and Peter F. Sweeney Producing wrong data without doing anything obviously wrong! . . . . . . . 265--276 Michael D. Bond and Kathryn S. McKinley Leak pruning . . . . . . . . . . . . . . 
277--288 Michal Wegiel and Chandra Krintz Dynamic prediction of collection yield for managed runtimes . . . . . . . . . . 289--300 Aravind Menon and Simon Schubert and Willy Zwaenepoel TwinDrivers: semi-automatic derivation of fast and safe hypervisor network drivers from guest OS drivers . . . . . 301--312 Ioana Burcea and Andreas Moshovos Phantom-BTB: a virtualized branch target buffer design . . . . . . . . . . . . . 313--324 Karthik Ramani and Christiaan P. Gribble and Al Davis StreamRay: a stream filtering architecture for coherent ray tracing 325--336 Robert D. Cameron and Dan Lin Architectural support for SWAR text processing with parallel bit streams: the inductive doubling principle . . . . 337--348
Norman P. Jouppi and Rakesh Kumar and Dean Tullsen Introduction to the special issue on the 2008 Workshop on Design, Analysis, and Simulation of Chip Multiprocessors (dasCMP'08) . . . . . . . . . . . . . . 1--1 Hui Zeng and Matt Yourst and Kanad Ghose and Dmitry Ponomarev MPTLsim: a cycle-accurate, full-system simulator for x86-64 multicore architectures with coherent caches . . . 2--9 Matteo Monchiero and Jung Ho Ahn and Ayose Falcón and Daniel Ortega and Paolo Faraboschi How to simulate 1000 cores . . . . . . . 10--19 Jianwei Chen and Murali Annavaram and Michel Dubois SlackSim: a platform for parallel simulations of CMPs on CMPs . . . . . . 20--29 Madhura Purnaprajna and Mario Porrmann and Ulrich Rueckert Run-time reconfigurability in embedded multiprocessors . . . . . . . . . . . . 30--37 Chris Jesshope and Mike Lankamp and Li Zhang The implementation of an SVP many-core processor and the evaluation of its memory architecture . . . . . . . . . . 38--45 Karan Singh and Major Bhadauria and Sally A. McKee Real time power estimation and thread scheduling via performance counters . . 46--55 Omid Azizi and Aqeel Mahesri and Sanjay J. Patel and Mark Horowitz Area-efficiency in CMP core design: co-optimization of microarchitecture and physical design . . . . . . . . . . . . 56--65 Mark Thorson Internet nuggets . . . . . . . . . . . . 66--69
Katherine Yelick Ten ways to waste a parallel computer 1--1 Benjamin C. Lee and Engin Ipek and Onur Mutlu and Doug Burger Architecting phase change memory as a scalable DRAM alternative . . . . . . . 2--13 Ping Zhou and Bo Zhao and Jun Yang and Youtao Zhang A durable and energy efficient main memory using phase change memory technology . . . . . . . . . . . . . . . 14--23 Moinuddin K. Qureshi and Vijayalakshmi Srinivasan and Jude A. Rivers Scalable high performance main memory system using phase-change memory technology . . . . . . . . . . . . . . . 24--33 Xiaoxia Wu and Jian Li and Lixin Zhang and Evan Speight and Ram Rajamony and Yuan Xie Hybrid cache architecture with disparate memory technologies . . . . . . . . . . 34--45 Jinho Suh and Michel Dubois Dynamic MIPS rate stabilization in out-of-order processors . . . . . . . . 46--56 Marco Paolieri and Eduardo Quiñones and Francisco J. Cazorla and Guillem Bernat and Mateo Valero Hardware support for WCET analysis of hard real-time multicore systems . . . . 57--68 Stephen Somogyi and Thomas F. Wenisch and Anastasia Ailamaki and Babak Falsafi Spatio-temporal memory streaming . . . . 69--80 Pedro Diaz and Marcelo Cintra Stream chaining: exploiting multiple levels of correlation in data prefetching . . . . . . . . . . . . . . 81--92 Michael D. Powell and Arijit Biswas and Shantanu Gupta and Shubhendu S. Mukherjee Architectural core salvaging in a multi-core processor for hard-error tolerance . . . . . . . . . . . . . . . 93--104 Javier Carretero and Pedro Chaparro and Xavier Vera and Jaume Abella and Antonio González End-to-end register data-flow continuous self-test . . . . . . . . . . . . . . . 105--115 Doe Hyun Yoon and Mattan Erez Memory mapped ECC: low-cost error protection for last level caches . . . . 116--127 Mark Woh and Sangwon Seo and Scott Mahlke and Trevor Mudge and Chaitali Chakrabarti and Krisztian Flautner AnySP: anytime anywhere anyway signal processing . . . . . . . . . . . . . . . 128--139 John H. 
Kelm and Daniel R. Johnson and Matthew R. Johnson and Neal C. Crago and William Tuohy and Aqeel Mahesri and Steven S. Lumetta and Matthew I. Frank and Sanjay J. Patel Rigel: an architecture and scalable programming interface for a 1000-core accelerator . . . . . . . . . . . . . . 140--151 Sunpyo Hong and Hyesoon Kim An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness . . . 152--163 Susmit Biswas and Diana Franklin and Alan Savage and Ryan Dixon and Timothy Sherwood and Frederic T. Chong Multi-execution: multicore caching for data-similar executions . . . . . . . . 164--173 Yuejian Xie and Gabriel H. Loh PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches . . . . . . . . . . . . . . . . . 174--183 Nikos Hardavellas and Michael Ferdman and Babak Falsafi and Anastasia Ailamaki Reactive NUCA: near-optimal block placement and replication in distributed caches . . . . . . . . . . . . . . . . . 184--195 Thomas Moscibroda and Onur Mutlu A case for bufferless routing in on-chip networks . . . . . . . . . . . . . . . . 196--207 Michel A. Kinsy and Myong Hyon Cho and Tina Wen and Edward Suh and Marten van Dijk and Srinivas Devadas Application-aware deadlock-free oblivious routing . . . . . . . . . . . 208--219 Nan Jiang and John Kim and William J. Dally Indirect adaptive routing on large scale interconnection networks . . . . . . . . 220--231 James Hamilton Internet-scale service infrastructure efficiency . . . . . . . . . . . . . . . 232--232 Colin Blundell and Milo M. K. Martin and Thomas F. Wenisch InvisiFence: performance-transparent memory ordering in conventional multiprocessors . . . . . . . . . . . . 233--244 Andrew Hilton and Amir Roth Decoupled store completion/silent deterministic replay: enabling scalable data memory for CPR/CFP processors . . . 
245--254 Hongzhong Zheng and Jiang Lin and Zhao Zhang and Zhichun Zhu Decoupled DIMM: building high-bandwidth memory system using low-speed DRAM devices . . . . . . . . . . . . . . . . 255--266 Kevin Lim and Jichuan Chang and Trevor Mudge and Parthasarathy Ranganathan and Steven K. Reinhardt and Thomas F. Wenisch Disaggregated memory for expansion and sharing in blade servers . . . . . . . . 267--278 Cagdas Dirik and Bruce Jacob The performance of PC solid-state disks (SSDs) as a function of bandwidth, concurrency, device architecture, and system organization . . . . . . . . . . 279--289 Abhishek Bhattacharjee and Margaret Martonosi Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors . . . 290--301 Krishna K. Rangan and Gu-Yeon Wei and David Brooks Thread motion: fine-grained power management for multi-core systems . . . 302--313 Yefu Wang and Kai Ma and Xiaorui Wang Temperature-constrained power control for chip multiprocessors with online model estimation . . . . . . . . . . . . 314--324 Jie Yu and Satish Narayanasamy A case for an interleaving constrained shared-memory multi-processor . . . . . 325--336 Abdullah Muzahid and Dario Suárez and Shanxiang Qi and Josep Torrellas SigRace: signature-based data race detection . . . . . . . . . . . . . . . 337--348 Vijay Nagarajan and Rajiv Gupta ECMon: exposing cache events for monitoring . . . . . . . . . . . . . . . 349--360 Ali G. Saidi and Nathan L. Binkert and Steven K. Reinhardt and Trevor Mudge End-to-end performance forecasting: finding bottlenecks before they happen 361--370 Brian M. Rogers and Anil Krishna and Gordon B. Bell and Ken Vu and Xiaowei Jiang and Yan Solihin Scaling the bandwidth wall: challenges in and avenues for CMP scaling . . . . . 371--382 Mark G. Whitney and Nemanja Isailovic and Yatish Patel and John Kubiatowicz A fault tolerant, area efficient architecture for Shor's factoring algorithm . . . . . . . . . . . . . . . 
383--394 Andrew Putnam and Susan Eggers and Dave Bennett and Eric Dellinger and Jeff Mason and Henry Styles and Prasanna Sundararajan and Ralph Wittig Performance and power of cache-based reconfigurable computing . . . . . . . . 395--405 Amin Firoozshahian and Alex Solomatnikov and Ofer Shacham and Zain Asgar and Stephen Richardson and Christos Kozyrakis and Mark Horowitz A memory system design framework: creating smart memories . . . . . . . . 406--417 José A. Joao and Onur Mutlu and Yale N. Patt Flexible reference-counting-based hardware acceleration for garbage collection . . . . . . . . . . . . . . . 418--428 Yan Pan and Prabhat Kumar and John Kim and Gokhan Memik and Yu Zhang and Alok Choudhary Firefly: illuminating future network-on-chip with nanophotonics . . . 429--440 Mark J. Cianchetti and Joseph C. Kerekes and David H. Albonesi Phastlane: a rapid transit optical routing network . . . . . . . . . . . . 441--450 Dennis Abts and Natalie D. Enright Jerger and John Kim and Dan Gibson and Mikko H. Lipasti Achieving predictable performance through better memory controller placement in many-core CMPs . . . . . . 451--461 Yangchun Luo and Venkatesan Packirisamy and Wei-Chung Hsu and Antonia Zhai and Nikhil Mungre and Ankit Tarkas Dynamic performance tuning for speculative threads . . . . . . . . . . 462--473 Carlos Madriles and Pedro López and Josep M. Codina and Enric Gibert and Fernando Latorre and Alejandro Martinez and Raúl Martinez and Antonio Gonzalez Boosting single-thread performance in multi-core systems through fine-grain multi-threading . . . . . . . . . . . . 474--483 Shailender Chaudhry and Robert Cypher and Magnus Ekman and Martin Karlsson and Anders Landin and Sherman Yip and Håkan Zeffer and Marc Tremblay Simultaneous speculative threading: a novel pipeline architecture implemented in Sun's Rock processor . . . . . . . . 484--495
Alexander Thomasian Publications on storage and systems research . . . . . . . . . . . . . . . . 1--26 Enric Musoll Mesh-based many-core performance under process variations: a core yield perspective . . . . . . . . . . . . . . 27--34 Angel V. Nikolov Queuing theoretic model for a multiprocessor with private caches and shared memory . . . . . . . . . . . . . 35--44 Mark Thorson Internet nuggets . . . . . . . . . . . . 45--51
Enric Musoll Leakage-saving opportunities in mesh-based massive multi-core architectures . . . . . . . . . . . . . 1--7 Abdul Naeem and Xiaowen Chen and Zhonghai Lu and Axel Jantsch Scalability of relaxed consistency models in NoC based multicore architectures . . . . . . . . . . . . . 8--15 Sandeep Sharma and K. S. Kahlon and P. K. Bansal Reliability and path length analysis of irregular fault tolerant multistage interconnection network . . . . . . . . 16--23 Mark Thorson Internet nuggets . . . . . . . . . . . . 24--30
Eric A. Brewer Technology for developing regions: Moore's Law is not enough . . . . . . . 1--2 Engin Ipek and Jeremy Condit and Edmund B. Nightingale and Doug Burger and Thomas Moscibroda Dynamically replicated memory: building reliable systems from nanoscale resistive memories . . . . . . . . . . . 3--14 Nevin Kirman and José F. Martínez A power-efficient all-optical on-chip interconnect using wavelength-based oblivious routing . . . . . . . . . . . 15--28 Naveen Neelakantam and David R. Ditzel and Craig Zilles A real system evaluation of hardware atomicity for software speculation . . . 29--38 Tim Harris and Sasa Tomic and Adrián Cristal and Osman Unsal Dynamic filtering: multi-purpose architecture support for language runtime systems . . . . . . . . . . . . 39--52 Tom Bergan and Owen Anderson and Joseph Devietti and Luis Ceze and Dan Grossman CoreDet: a compiler and runtime system for deterministic multithreaded execution . . . . . . . . . . . . . . . 53--64 Arun Raman and Hanjun Kim and Thomas R. Mason and Thomas B. Jablin and David I. August Speculative parallelization using software multi-threaded transactions . . 65--76 Dongyoon Lee and Benjamin Wester and Kaushik Veeraraghavan and Satish Narayanasamy and Peter M. Chen and Jason Flinn Respec: efficient online multiprocessor replay via speculation and external determinism . . . . . . . . . . . . . . 77--90 Stijn Eyerman and Lieven Eeckhout Probabilistic job symbiosis modeling for SMT processor scheduling . . . . . . . . 91--102 Kai Shen Request behavior variations . . . . . . 103--116 F. Ryan Johnson and Radu Stoica and Anastasia Ailamaki and Todd C. Mowry Decoupling contention management from scheduling . . . . . . . . . . . . . . . 117--128 Sergey Zhuravlev and Sergey Blagodurov and Alexandra Fedorova Addressing shared resource contention in multicore processors via scheduling . . 
129--142 Ding Yuan and Haohui Mai and Weiwei Xiong and Lin Tan and Yuanyuan Zhou and Shankar Pasupathy SherLog: error diagnosis by connecting clues from run-time logs . . . . . . . . 143--154 Dasarath Weeratunge and Xiangyu Zhang and Suresh Jagannathan Analyzing multicore dumps to facilitate concurrency bug reproduction . . . . . . 155--166 Sebastian Burckhardt and Pravesh Kothari and Madanlal Musuvathi and Santosh Nagarakatte A randomized scheduler with probabilistic guarantees of finding bugs 167--178 Wei Zhang and Chong Sun and Shan Lu ConMem: detecting severe concurrency bugs through an effect-oriented approach 179--192 Francisco Javier Mesa-Martinez and Ehsan K. Ardestani and Jose Renau Characterizing processor thermal behavior . . . . . . . . . . . . . . . . 193--204 Ganesh Venkatesh and Jack Sampson and Nathan Goulding and Saturnino Garcia and Vladyslav Bryksin and Jose Lugo-Martinez and Steven Swanson and Michael Bedford Taylor Conservation cores: reducing the energy of mature computations . . . . . . . . . 205--218 Kshitij Sudan and Niladrish Chatterjee and David Nellans and Manu Awasthi and Rajeev Balasubramonian and Al Davis Micro-pages: increasing DRAM efficiency with locality-aware data placement . . . 219--230 Steven Pelley and David Meisner and Pooya Zandevakili and Thomas F. Wenisch and Jack Underwood Power routing: dynamic power provisioning in the data center . . . . 231--242 Faraz Ahmad and T. N. Vijaykumar Joint optimization of idle and cooling power in data centers while maintaining response time . . . . . . . . . . . . . 243--256 Michelle L. Goodstein and Evangelos Vlachos and Shimin Chen and Phillip B. Gibbons and Michael A. Kozuch and Todd C. Mowry Butterfly analysis: adapting dataflow analysis to dynamic parallel monitoring 257--270 Evangelos Vlachos and Michelle L. Goodstein and Michael A. Kozuch and Shimin Chen and Babak Falsafi and Phillip B. Gibbons and Todd C. 
Mowry ParaLog: enabling and accelerating online parallel monitoring of multithreaded applications . . . . . . . 271--284 Amir H. Hormati and Yoonseo Choi and Mark Woh and Manjunath Kudlur and Rodric Rabbah and Trevor Mudge and Scott Mahlke MacroSS: macro-SIMDization of streaming applications . . . . . . . . . . . . . . 285--296 Dong Hyuk Woo and Hsien-Hsin S. Lee COMPASS: a programmable data prefetcher using idle GPU shaders . . . . . . . . . 297--310 Daniel Sanchez and Richard M. Yoo and Christos Kozyrakis Flexible architectural support for fine-grain scheduling . . . . . . . . . 311--322 Bogdan F. Romanescu and Alvin R. Lebeck and Daniel J. Sorin Specifying and dynamically verifying address translation-aware memory consistency . . . . . . . . . . . . . . 323--334 Eiman Ebrahimi and Chang Joo Lee and Onur Mutlu and Yale N. Patt Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems . . . . . . . . . . . . . . . . 335--346 Isaac Gelado and Javier Cabezas and Nacho Navarro and John E. Stone and Sanjay Patel and Wen-mei W. Hwu An asymmetric distributed shared memory model for heterogeneous parallel systems 347--358 Abhishek Bhattacharjee and Margaret Martonosi Inter-core cooperative TLB for chip multiprocessors . . . . . . . . . . . . 359--370 Ruirui Huang and Daniel Y. Deng and G. Edward Suh Orthrus: efficient software integrity protection on multi-cores . . . . . . . 371--384 Shuguang Feng and Shantanu Gupta and Amin Ansari and Scott Mahlke Shoestring: probabilistic soft error reliability on the cheap . . . . . . . . 385--396 Doe Hyun Yoon and Mattan Erez Virtualized and flexible ECC for main memory . . . . . . . . . . . . . . . . . 397--408
Alexander Thomasian Storage research in industry and universities . . . . . . . . . . . . . . 1--48 Wolfgang Matthes Resources instead of cores? . . . . . . 49--63 Mark Thorson Internet nuggets . . . . . . . . . . . . 64--67
William J. Dally Moving the needle, computer architecture research in academe and industry . . . . 1--1 Yasuko Watanabe and John D. Davis and David A. Wood WiDGET: Wisconsin Decoupled Grid Execution Tiles . . . . . . . . . . . . 2--13 Dan Gibson and David A. Wood Forwardflow: a scalable core for power-constrained CMPs . . . . . . . . . 14--25 Omid Azizi and Aqeel Mahesri and Benjamin C. Lee and Sanjay J. Patel and Mark Horowitz Energy-performance tradeoffs in processor architecture and circuit design: a marginal cost analysis . . . . 26--36 Rehan Hameed and Wajahat Qadeer and Megan Wachs and Omid Azizi and Alex Solomatnikov and Benjamin C. Lee and Stephen Richardson and Christos Kozyrakis and Mark Horowitz Understanding sources of inefficiency in general-purpose chips . . . . . . . . . 37--47 Thomas W. Barr and Alan L. Cox and Scott Rixner Translation caching: skip, don't walk (the page table) . . . . . . . . . . . . 48--59 Aamer Jaleel and Kevin B. Theobald and Simon C. Steely, Jr. and Joel Emer High performance cache replacement using re-reference interval prediction (RRIP) 60--71 Jeffrey Stuecheli and Dimitris Kaseridis and David Daly and Hillery C. Hunter and Lizy K. John The virtual write queue: coordinating DRAM and last-level cache policies . . . 72--82 Chris Wilkerson and Alaa R. Alameldeen and Zeshan Chishti and Wei Wu and Dinesh Somasekhar and Shih-lien Lu Reducing cache power with low-cost, multi-bit error-correcting codes . . . . 83--93 Jing Xue and Alok Garg and Berkehan Ciftcioğlu and Jianyun Hu and Shang Wang and Ioannis Savidis and Manish Jain and Rebecca Berman and Peng Liu and Michael Huang and Hui Wu and Eby Friedman and Gary Wicks and Duncan Moore An intra-chip free-space optical interconnect . . . . . . . . . . . . . . 94--105 Reetuparna Das and Onur Mutlu and Thomas Moscibroda and Chita R. Das Aérgia: exploiting packet latency slack in on-chip networks . . . . . . . . . . 106--116 Pranay Koka and Michael O. 
McCracken and Herb Schwetman and Xuezhe Zheng and Ron Ho and Ashok V. Krishnamoorthy Silicon-photonic network architectures for scalable, power-efficient multi-chip systems . . . . . . . . . . . . . . . . 117--128 Scott Beamer and Chen Sun and Yong-Jin Kwon and Ajay Joshi and Christopher Batten and Vladimir Stojanović and Krste Asanović Re-architecting DRAM memory systems with monolithically integrated silicon photonics . . . . . . . . . . . . . . . 129--140 Stuart Schechter and Gabriel H. Loh and Karin Strauss and Doug Burger Use ECP, not ECC, for hard failures in resistive memories . . . . . . . . . . . 141--152 Moinuddin K. Qureshi and Michele M. Franceschini and Luis A. Lastras-Montaño and John P. Karidis Morphable memory system: a robust architecture for exploiting multi-level phase change memories . . . . . . . . . 153--162 Timothy Pritchett and Mithuna Thottethodi SieveStore: a highly-selective, ensemble-level disk cache for cost-performance . . . . . . . . . . . . 163--174 Aniruddha N. Udipi and Naveen Muralimanohar and Niladrish Chatterjee and Rajeev Balasubramonian and Al Davis and Norman P. Jouppi Rethinking DRAM design and organization for energy-constrained multi-cores . . . 175--186 Yunji Chen and Weiwu Hu and Tianshi Chen and Ruiyang Wu LReplay: a pending period based deterministic replay scheme . . . . . . 187--197 Gwendolyn Voskuilen and Faraz Ahmad and T. N. Vijaykumar Timetraveler: exploiting acyclic races for optimizing memory race recording . . 198--209 Brandon Lucia and Luis Ceze and Karin Strauss and Shaz Qadeer and Hans-J. Boehm Conflict exceptions: simplifying concurrent language semantics with precise hardware exceptions for data-races . . . . . . . . . . . . . . . 210--221 Brandon Lucia and Luis Ceze and Karin Strauss ColorSafe: architectural support for debugging and dynamically avoiding multi-variable atomicity violations . . 222--233 Mary Jane Irwin Shared caches in multicores: the good, the bad, and the ugly . . . . . . . . . 
234--234 Jiayuan Meng and David Tarjan and Kevin Skadron Dynamic warp subdivision for integrated branch and memory divergence tolerance 235--246 Srimat Chakradhar and Murugan Sankaradas and Venkata Jakkula and Srihari Cadambi A dynamically configurable coprocessor for convolutional neural networks . . . 247--257 Colin Blundell and Arun Raghavan and Milo M. K. Martin RETCON: transactional repair without replay . . . . . . . . . . . . . . . . . 258--269 Janghaeng Lee and Haicheng Wu and Madhumitha Ravichandran and Nathan Clark Thread Tailor: dynamically weaving threads together for efficient, adaptive parallel applications . . . . . . . . . 270--279 Sunpyo Hong and Hyesoon Kim An integrated GPU power and performance model . . . . . . . . . . . . . . . . . 280--289 Zhangxi Tan and Andrew Waterman and Henry Cook and Sarah Bird and Krste Asanović and David Patterson A case for FAME: FPGA architecture model execution . . . . . . . . . . . . . . . 290--301 Geoffrey Blake and Ronald G. Dreslinski and Trevor Mudge and Krisztián Flautner Evolution of thread-level parallelism in desktop applications . . . . . . . . . . 302--313 Vijay Janapa Reddi and Benjamin C. Lee and Trishul Chilimbi and Kushagra Vaid Web search using mobile cores: quantifying and mitigating the price of efficiency . . . . . . . . . . . . . . . 314--325 Vijayaraghavan Soundararajan and Jennifer M. Anderson The impact of management operations on the virtualized datacenter . . . . . . . 326--337 Dennis Abts and Michael R. Marty and Philip M. Wells and Peter Klausler and Hong Liu Energy proportional datacenter networks 338--347 Charles P. Thacker Improving the future by examining the past . . . . . . . . . . . . . . . . . . 348--348 Olivier Temam The rebirth of neural networks . . . . . 349--349 Eric Keller and Jakub Szefer and Jennifer Rexford and Ruby B. Lee NoHype: virtualized cloud infrastructure without the virtualization . . . . . . . 
350--361 Stijn Eyerman and Lieven Eeckhout Modeling critical sections in Amdahl's Law and its implications for multicore design . . . . . . . . . . . . . . . . . 362--370 Xiaochen Guo and Engin Ipek and Tolga Soyata Resistive computation: avoiding the power wall with low-leakage, STT-MRAM based computing . . . . . . . . . . . . 371--382 Nak Hee Seong and Dong Hyuk Woo and Hsien-Hsin S. Lee Security refresh: prevent malicious wear-out and increase durability for phase-change memory with dynamically randomized address mapping . . . . . . . 383--394 Ruirui Huang and G. Edward Suh IVEC: off-chip memory integrity protection for both security and reliability . . . . . . . . . . . . . . 395--406 Arrvindh Shriraman and Sandhya Dwarkadas Sentry: light-weight auxiliary memory access control . . . . . . . . . . . . . 407--418 Enric Herrero and José González and Ramon Canal Elastic cooperative caching: an autonomous dynamically adaptive memory hierarchy for chip multiprocessors . . . 419--428 John H. Kelm and Daniel R. Johnson and William Tuohy and Steven S. Lumetta and Sanjay J. Patel Cohesion: a hybrid memory model for accelerators . . . . . . . . . . . . . . 429--440 M. Aater Suleman and Onur Mutlu and José A. Joao and Khubaib and Yale N. Patt Data marshaling for multi-core architectures . . . . . . . . . . . . . 441--450 Victor W. Lee and Changkyu Kim and Jatin Chhugani and Michael Deisher and Daehyun Kim and Anthony D. Nguyen and Nadathur Satish and Mikhail Smelyanskiy and Srinivas Chennupaty and Per Hammarlund and Ronak Singhal and Pradeep Dubey Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU . . . . . . . . . . . . . . 451--460 Vilas Sridharan and David R. Kaeli Using hardware vulnerability factors to enhance AVF analysis . . . . . . . . . . 461--472 Amin Ansari and Shuguang Feng and Shantanu Gupta and Scott Mahlke Necromancer: enhancing system throughput by animating dead cores . . . . . . . . 
473--484 Guihai Yan and Xiaoyao Liang and Yinhe Han and Xiaowei Li Leveraging the core-level complementary effects of PVT variations to reduce timing emergencies in multi-core processors . . . . . . . . . . . . . . . 485--496 Marc de Kruijf and Shuou Nomura and Karthikeyan Sankaralingam Relax: an architectural framework for software recovery of hardware faults . . 497--508
Marco Nuño-Maganda and Cesar Torres-Huitzil A temporal coding hardware implementation for spiking neural networks . . . . . . . . . . . . . . . . 2--7 Hirokazu Morisita and Kenta Inakagata and Yasunori Osana and Naoyuki Fujita and Hideharu Amano Implementation and evaluation of an arithmetic pipeline on FLOPS-2D: multi-FPGA system . . . . . . . . . . . 8--13 Anson H. T. Tse and David B. Thomas and K. H. Tsoi and Wayne Luk Efficient reconfigurable design for pricing Asian options . . . . . . . . . 14--20 Tadayoshi Horita and Itsuo Takanami An FPGA-based fast classifier with high generalization property . . . . . . . . 21--26 Andrew Putnam and Aaron Smith and Doug Burger Dynamic vectorization in the E2 dynamic multicore architecture . . . . . . . . . 27--32 Jong Kyung Paek and Kiyoung Choi and Jongeun Lee Binary acceleration using coarse-grained reconfigurable architecture . . . . . . 33--39 Keisuke Dohi and Yuichiro Shibata and Tsuyoshi Hamada and Tomonari Masada and Kiyoshi Oguri and Duncan A. Buell Implementation of a programming environment with a multithread model for reconfigurable systems . . . . . . . . . 40--45 Mojtaba Sabeghi and Hamid Mushtaq and Koen Bertels Runtime multitasking support on polymorphic platforms . . . . . . . . . 46--52 Kuen Hung Tsoi and Anson H. T. Tse and Peter Pietzuch and Wayne Luk Programming framework for clusters with heterogeneous accelerators . . . . . . . 53--59 Claude Tadonki and Gilbert Grosdidier and Olivier Pene An efficient CELL library for lattice quantum chromodynamics . . . . . . . . . 60--65 Ryan Taylor and Xiaoming Li Software-based branch predication for AMD GPUs . . . . . . . . . . . . . . . . 66--72 Sebastian Banescu and Florent de Dinechin and Bogdan Pasca and Radu Tudoran Multipliers for floating-point double precision and beyond on FPGAs . . . . . 
73--79 Kentaro Sano and Luzhou Wang and Satoru Yamamoto Prototype implementation of array-processor extensible over multiple FPGAs for scalable stencil computation 80--86 Chi-Chiu Tsang and Hayden Kwok-Hay So Dynamic power reduction of FPGA-based reconfigurable computers using precomputation . . . . . . . . . . . . . 87--92 Mark Thorson Internet nuggets . . . . . . . . . . . . 93--96
Manideepa Mukherjee and Amitabha Sinha A novel architecture for conversion of binary to single digit double base numbers . . . . . . . . . . . . . . . . 1--6 Shobha T. and Syed Akram and G. Varaprasad Design and development of framework for diagnosing intermediate nodes . . . . . 7--11 Fuad Tabba Adding concurrency in Python using a commercial processor's hardware transactional memory support . . . . . . 12--19 Alexander Thomasian Why specialized disks for composite operations may be unnecessary . . . . . 20--27 Mark Thorson Internet nuggets . . . . . . . . . . . . 28--36
James R. Larus The cloud will change everything . . . . 1--2 Ding Yuan and Jing Zheng and Soyeon Park and Yuanyuan Zhou and Stefan Savage Improving software diagnosability via log enhancement . . . . . . . . . . . . 3--14 Kaushik Veeraraghavan and Dongyoon Lee and Benjamin Wester and Jessica Ouyang and Peter M. Chen and Jason Flinn and Satish Narayanasamy DoublePlay: parallelizing sequential logging and replay . . . . . . . . . . . 15--26 Jared Casper and Tayo Oguntebi and Sungpack Hong and Nathan G. Bronson and Christos Kozyrakis and Kunle Olukotun Hardware acceleration of transactional memory on commodity systems . . . . . . 27--38 Luke Dalessandro and François Carouge and Sean White and Yossi Lev and Mark Moir and Michael L. Scott and Michael F. Spear Hybrid NOrec: a case study in the effectiveness of best effort hardware transactional memory . . . . . . . . . . 39--52 Abhayendra Singh and Daniel Marino and Satish Narayanasamy and Todd Millstein and Madan Musuvathi Efficient processor support for DRFx, a memory model with exceptions . . . . . . 53--66 Joseph Devietti and Jacob Nelson and Tom Bergan and Luis Ceze and Dan Grossman RCDC: a relaxed consistency deterministic computer . . . . . . . . . 67--78 Jacob Burnim and George Necula and Koushik Sen Specifying and checking semantic atomicity for multithreaded programs . . 79--90 Haris Volos and Andres Jaan Tack and Michael M. Swift Mnemosyne: lightweight persistent memory 91--104 Joel Coburn and Adrian M. Caulfield and Ameen Akel and Laura M. Grupp and Rajesh K. Gupta and Ranjit Jhala and Steven Swanson NV-Heaps: making persistent objects fast and safe with next-generation, non-volatile memories . . . . . . . . . 105--118 Adrian Schüpbach and Andrew Baumann and Timothy Roscoe and Simon Peter A declarative language approach to device configuration . . . . . . . . . . 
119--132 Leonid Ryzhyk and John Keys and Balachandra Mirla and Arun Raghunath and Mona Vij and Gernot Heiser Improved device driver reliability through hardware verification reuse . . 133--144 Atif Hashmi and Andrew Nere and James Jamal Thomas and Mikko Lipasti A case for neuromorphic ISAs . . . . . . 145--158 Benjamin Ransford and Jacob Sorber and Kevin Fu Mementos: system support for long-running computation on RFID-scale devices . . . . . . . . . . . . . . . . 159--170 Emmanouil Koukoumidis and Dimitrios Lymberopoulos and Karin Strauss and Jie Liu and Doug Burger Pocket cloudlets . . . . . . . . . . . . 171--184 Navin Sharma and Sean Barker and David Irwin and Prashant Shenoy Blink: managing server clusters on intermittent power . . . . . . . . . . . 185--198 Henry Hoffmann and Stelios Sidiroglou and Michael Carbin and Sasa Misailovic and Anant Agarwal and Martin Rinard Dynamic knobs for responsive power-aware computing . . . . . . . . . . . . . . . 199--212 Song Liu and Karthik Pattabiraman and Thomas Moscibroda and Benjamin G. Zorn Flikker: saving DRAM refresh-power through critical data partitioning . . . 213--224 Qingyuan Deng and David Meisner and Luiz Ramos and Thomas F. Wenisch and Ricardo Bianchini MemScale: active low-power modes for main memory . . . . . . . . . . . . . . 225--238 Qi Gao and Wenbin Zhang and Zhezhe Chen and Mai Zheng and Feng Qin 2ndStrike: toward manifesting hidden concurrency typestate bugs . . . . . . . 239--250 Wei Zhang and Junghee Lim and Ramya Olichandran and Joel Scherpelz and Guoliang Jin and Shan Lu and Thomas Reps ConSeq: detecting concurrency bugs through sequential errors . . . . . . . 251--264 Vitaly Chipounov and Volodymyr Kuznetsov and George Candea S2E: a platform for in-vivo multi-path analysis of software systems . . . . . . 265--278 Owen S. Hofmann and Alan M. Dunn and Sangman Kim and Indrajit Roy and Emmett Witchel Ensuring operating system kernel integrity with OSck . . . . . . . . . . 279--290 Donald E. 
Porter and Silas Boyd-Wickizer and Jon Howell and Reuben Olinsky and Galen C. Hunt Rethinking the library OS from the top down . . . . . . . . . . . . . . . . . . 291--304 Nicolas Palix and Gaël Thomas and Suman Saha and Christophe Calvès and Julia Lawall and Gilles Muller Faults in Linux: ten years later . . . . 305--318 Hadi Esmaeilzadeh and Ting Cao and Yang Xi and Stephen M. Blackburn and Kathryn S. McKinley Looking back on the language and hardware revolutions: measured power, performance, and scaling . . . . . . . . 319--332 Donald Nguyen and Keshav Pingali Synthesizing concurrent schedulers for irregular algorithms . . . . . . . . . . 333--344 Giang Hoang and Robby Bruce Findler and Russ Joseph Exploring circuit timing-aware language and compilation . . . . . . . . . . . . 345--356 Sardar M. Farhad and Yousun Ko and Bernd Burgstaller and Bernhard Scholz Orchestration by approximation: mapping stream programs onto multicore architectures . . . . . . . . . . . . . 357--368 Eddy Z. Zhang and Yunlian Jiang and Ziyu Guo and Kai Tian and Xipeng Shen On-the-fly elimination of dynamic irregularities for GPU computing . . . . 369--380 Amir H. Hormati and Mehrzad Samadi and Mark Woh and Trevor Mudge and Scott Mahlke Sponge: portable stream programming on graphics engines . . . . . . . . . . . . 381--392 Md Kamruzzaman and Steven Swanson and Dean M. Tullsen Inter-core prefetching for multicore processors using migrating helper threads . . . . . . . . . . . . . . . . 393--404 Hiroshige Hayashizaki and Peng Wu and Hiroshi Inoue and Mauricio J. Serrano and Toshio Nakatani Improving the performance of trace-based systems by false loop filtering . . . . 405--418
Nathan Binkert and Bradford Beckmann and Gabriel Black and Steven K. Reinhardt and Ali Saidi and Arkaprava Basu and Joel Hestness and Derek R. Hower and Tushar Krishna and Somayeh Sardashti and Rathijit Sen and Korey Sewell and Muhammad Shoaib and Nilay Vaish and Mark D. Hill and David A. Wood The gem5 simulator . . . . . . . . . . . 1--7 Alexander Thomasian Survey and analysis of disk scheduling methods . . . . . . . . . . . . . . . . 8--25 Thimmarayaswamy K and Mary M. Dsouza and G. Varaprasad Low power techniques for an Android based phone . . . . . . . . . . . . . . 26--35 Mark Thorson Internet nuggets . . . . . . . . . . . . 36--52
Atif Hashmi and Hugues Berry and Olivier Temam and Mikko Lipasti Automatic abstraction and fault tolerance in cortical microarchitectures 1--10 Niket K. Choudhary and Salil V. Wadhavkar and Tanmay A. Shah and Hiran Mayukh and Jayneel Gandhi and Brandon H. Dwiel and Sandeep Navada and Hashem H. Najaf-abadi and Eric Rotenberg FabScalar: composing synthesizable RTL designs of arbitrary cores within a canonical superscalar template . . . . . 11--22 Erika Gunadi and Mikko H. Lipasti CRIB: consolidated rename, issue, and bypass . . . . . . . . . . . . . . . . . 23--32 Rishi Agarwal and Josep Torrellas FlexBulk: intelligently forming atomic blocks in blocked-execution multiprocessors to minimize squashes . . 33--44 Youngjin Kwon and Changdae Kim and Seungryoul Maeng and Jaehyuk Huh Virtualizing performance asymmetric multi-core systems . . . . . . . . . . . 45--56 Daniel Sanchez and Christos Kozyrakis Vantage: scalable and efficient fine-grain cache partitioning . . . . . 57--68 Asit K. Mishra and Xiangyu Dong and Guangyu Sun and Yuan Xie and N. Vijaykrishnan and Chita R. Das Architecting on-chip interconnects for stacked 3D STT-RAM caches in CMPs . . 69--80 Jayesh Gaur and Mainak Chaudhuri and Sreenivas Subramoney Bypass and insertion algorithms for exclusive last-level caches . . . . . . 81--92 Blas A. Cuesta and Alberto Ros and María E. Gómez and Antonio Robles and José F. Duato Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks . . 93--104 Jungju Oh and Milos Prvulovic and Alenka Zajic TLSync: support for multiple fast barriers using on-chip transmission lines . . . . . . . . . . . . . . . . . 105--116 Neal Clayton Crago and Sanjay Jeram Patel OUTRIDER: efficient memory latency tolerance with decoupled strands . . . . 
117--128 Yunsup Lee and Rimas Avizienis and Alex Bishara and Richard Xia and Derek Lockhart and Christopher Batten and Krste Asanović Exploring the tradeoffs between programmability and efficiency in data-parallel accelerators . . . . . . . 129--140 Eiman Ebrahimi and Chang Joo Lee and Onur Mutlu and Yale N. Patt Prefetch-aware shared resource management for multi-core systems . . . 141--152 Rishi Agarwal and Pranav Garg and Josep Torrellas Rebound: scalable checkpointing for coherent shared memory . . . . . . . . . 153--164 Joseph L. Greathouse and Zhiqiang Ma and Matthew I. Frank and Ramesh Peri and Todd Austin Demand-driven software race detection using hardware performance counters . . 165--176 Siddhartha Chhabra and Yan Solihin i-NVMM: a secure non-volatile main memory system with incremental encryption . . . . . . . . . . . . . . . 177--188 Mohit Tiwari and Jason K. Oberg and Xun Li and Jonathan Valamehr and Timothy Levin and Ben Hardekopf and Ryan Kastner and Frederic T. Chong and Timothy Sherwood Crafting a usable microkernel, processor, and I/O system with strict and provable information flow security 189--200 Shuou Nomura and Matthew D. Sinclair and Chen-Han Ho and Venkatraman Govindaraju and Marc de Kruijf and Karthikeyan Sankaralingam Sampling + DMR: practical and low-overhead permanent fault detection 201--212 Sangeetha Sudhakrishnan and Rigo Dicochea and Jose Renau Releasing efficient beta cores to market early . . . . . . . . . . . . . . . . . 213--222 Mehrtash Manoochehri and Murali Annavaram and Michel Dubois CPPC: correctable parity protected cache 223--234 Mark Gebhart and Daniel R. Johnson and David Tarjan and Stephen W. Keckler and William J. Dally and Erik Lindholm and Kevin Skadron Energy-efficient mechanisms for managing thread context in throughput processors 235--246 Wing-kei S. Yu and Ruirui Huang and Sarah Q. Xu and Sung-En Wang and Edwin Kan and G. 
Edward Suh SRAM--DRAM hybrid memory with applications to efficient register files in fine-grained multi-threading . . . . 247--258 Binzhang Fu and Yinhe Han and Jun Ma and Huawei Li and Xiaowei Li An abacus turn model for time/space-efficient reconfigurable routing . . . . . . . . . . . . . . . . 259--270 Aaron Carpenter and Jianyun Hu and Jie Xu and Michael Huang and Hui Wu A case for globally shared-medium on-chip interconnect . . . . . . . . . . 271--282 Lingjia Tang and Jason Mars and Neil Vachharajani and Robert Hundt and Mary Lou Soffa The impact of memory subsystem resource sharing on datacenter applications . . . 283--294 Doe Hyun Yoon and Min Kyu Jeong and Mattan Erez Adaptive granularity memory systems: a tradeoff between storage efficiency and throughput . . . . . . . . . . . . . . . 295--306 Thomas W. Barr and Alan L. Cox and Scott Rixner SpecTLB: a mechanism for speculative address translation . . . . . . . . . . 307--318 David Meisner and Christopher M. Sadler and Luiz André Barroso and Wolf-Dietrich Weber and Thomas F. Wenisch Power management of online data-intensive services . . . . . . . . 319--330 Susmit Biswas and Mohit Tiwari and Timothy Sherwood and Luke Theogarajan and Frederic T. Chong Fighting fire with fire: modeling the datacenter-scale effects of targeted superlattice thermal management . . . . 331--340 Sriram Govindan and Anand Sivasubramaniam and Bhuvan Urgaonkar Benefits and limitations of tapping into stored energy for datacenters . . . . . 341--352 John Demme and Simha Sethumadhavan Rapid identification of architectural bottlenecks via precise event counting 353--364 Hadi Esmaeilzadeh and Emily Blem and Renee St. Amant and Karthikeyan Sankaralingam and Doug Burger Dark silicon and the end of multicore scaling . . . . . . . . . . . . . . . . 365--376 Guangyu Sun and Christopher J. 
Hughes and Changkyu Kim and Jishen Zhao and Cong Xu and Yuan Xie and Yen-Kuang Chen Moguls: a model to explore the memory hierarchy for bandwidth improvements . . 377--388 Asit K. Mishra and N. Vijaykrishnan and Chita R. Das A case for heterogeneous on-chip interconnects for CMPs . . . . . . . . . 389--400 Boris Grot and Joel Hestness and Stephen W. Keckler and Onur Mutlu Kilo-NOC: a heterogeneous network-on-chip architecture for scalability and service guarantees . . . 401--412 Sheng Ma and Natalie Enright Jerger and Zhiying Wang DBAR: an efficient routing algorithm to support multiple concurrent applications in networks-on-chip . . . . . . . . . . 413--424 Aniruddha N. Udipi and Naveen Muralimanohar and Rajeev Balasubramonian and Al Davis and Norman P. Jouppi Combining memory and a controller with photonics through $3$D-stacking to enable scalable and energy-efficient systems . . . . . . . . . . . . . . . . 425--436 Nathan Binkert and Al Davis and Norman P. Jouppi and Moray McLaren and Naveen Muralimanohar and Robert Schreiber and Jung Ho Ahn The role of optics in future high radix switch design . . . . . . . . . . . . . 437--448 Kai Ma and Xue Li and Ming Chen and Xiaorui Wang Scalable power control for many-core architectures running multi-threaded applications . . . . . . . . . . . . . . 449--460 Alaa R. Alameldeen and Ilya Wagner and Zeshan Chishti and Wei Wu and Chris Wilkerson and Shih-Lien Lu Energy-efficient cache design using variable-strength error-correcting codes 461--472 Luiz Andre Barroso Warehouse-Scale Computing: Entering the Teenage Decade . . . . . . . . . . . . . ?? David A. Ferrucci IBM's Watson/DeepQA . . . . . . . . . . ?? Ravi Kannan Algorithms: Recent Highlights and Challenges . . . . . . . . . . . . . . . ??
Miriam Leeser and Devon Yablonski and Dana Brooks and Laurie Smith King The challenges of writing portable, correct and high performance libraries for GPUs . . . . . . . . . . . . . . . . 2--7 Kuen Hung Tsoi and Wayne Luk Power profiling and optimization for heterogeneous multi-core systems . . . . 8--13 Serban Georgescu and Peter Chow GPU accelerated CAE using open solvers and the cloud . . . . . . . . . . . . . 14--19 Junying Chen and Billy Y. S. Yiu and Brandon K. Hamilton and Alfred C. H. Yu and Hayden K.-H. So Design space exploration of adaptive beamforming acceleration for bedside and portable medical ultrasound imaging . . 20--25 Keisuke Dohi and Yuichiro Shibata and Kiyoshi Oguri and Takafumi Fujimoto GPU implementation and optimization of electromagnetic simulation using the FDTD method for antenna designing . . . 26--31 Tomoyuki Nagatsuka and Yoshito Sakaguchi and Takayuki Matsumura and Kenji Kise CoreSymphony: an efficient reconfigurable multi-core architecture 32--37 Shinya Takamaeda-Yamazaki and Ryosuke Sasakawa and Yoshito Sakaguchi and Kenji Kise An FPGA-based scalable simulation accelerator for tile architectures . . . 38--43 Kentaro Sano and Satoru Yamamoto and Yoshiaki Hatsuda Domain-specific programmable design of scalable streaming-array for power-efficient stencil computation . . 44--49 Takayuki Akamine and Kenta Inakagata and Yasunori Osana and Naoyuki Fujita and Hideharu Amano An implementation of out-of-order execution system for acceleration of computational fluid dynamics on FPGAs 50--55 Haisheng Liu and Smail Niar and Yassin El-Hillali and Atika Rivenq Embedded architecture with hardware accelerator for target recognition in driver assistance system . . . . . . . . 56--59 Oliver Pell and Oskar Mencer Surviving the end of frequency scaling with reconfigurable dataflow computing 60--65 Ana Balevic and Bart Kienhuis KPN2GPU: an approach for discovery and exploitation of fine-grain data parallelism in process networks . . . . 
66--71 Amila Akagić and Hideharu Amano High speed CRC with 64-bit generator polynomial on an FPGA . . . . . . . . . 72--77 Shufan Yang and T. M. McGinnity A biologically plausible real-time spiking neuron simulation environment based on a multiple-FPGA platform . . . 78--81 Hiroomi Sawada and Morihiro Kuga and Motoki Amagasaki and Masahiro Iida and Toshinori Sueyoshi Parallelization of the channel width search for FPGA routing . . . . . . . . 82--85 Shoji Tanabe and Takuya Nagashima and Yoshiki Yamaguchi A study of an FPGA based flexible SIMD processor . . . . . . . . . . . . . . . 86--89 Antoine Trouve and Kazuaki Murakami Augmenting DR-ASIP flexibility through multi-mode custom instructions . . . . . 90--93 Shinya Kubota and Minoru Watanabe A MEMS writer system embedded for a programmable optically reconfigurable gate array . . . . . . . . . . . . . . . 94--97 Jan Fousek and Jiří Filipovič and Matúš Madzin Automatic fusions of CUDA--GPU kernels for parallel map . . . . . . . . . . . . 98--99 Kohei Matsunobu and Keisuke Dohi and Yuichiro Shibata and Kiyoshi Oguri A discussion on calculating eigenvalues of real symmetric tridiagonal matrices on a GPU . . . . . . . . . . . . . . . . 100--101 Dominik Meyer and Bernd Klauer Multicore reconfiguration platform an alternative to RAMPSoC . . . . . . . . . 102--103 Robin Bonamy and Daniel Chillet and Olivier Sentieys and Sebastien Bilavarn Parallelism Level Impact on Energy Consumption in Reconfigurable Devices 104--105 Michael Opoku Agyeman and Ali Ahmadinia Power and area optimisation in heterogeneous $3$D networks-on-chip architectures . . . . . . . . . . . . . 106--107 Mark Thorson Internet nuggets . . . . . . . . . . . . 108--117
Malay Das and Amitabha Sinha and Nishant Kumar Giri High speed residue number system (RNS) based FIR filter using distributed arithmetic (DA) . . . . . . . . . . . . 1--4 Anindita Chakraborty and Amitabha Sinha Conversion of binary to single-term triple base numbers for DSP applications 5--11 Satrughna Singha and Aniruddha Ghosh and Amitabha Sinha A new architecture for FPGA based implementation of conversion of binary to double base number system (DBNS) using parallel search technique . . . . 12--18 Mark Thorson Internet nuggets . . . . . . . . . . . . 19--23
Dimitrios Lymberopoulos and Oriana Riva and Karin Strauss and Akshay Mittal and Alexandros Ntoulas PocketWeb: instant web browsing for mobile devices . . . . . . . . . . . . . 1--12 Felix Xiaozhu Lin and Zhen Wang and Robert LiKamWa and Lin Zhong Reflex: using low-power processors in smartphones without knowing them . . . . 13--24 Jichuan Chang and Justin Meza and Parthasarathy Ranganathan and Amip Shah and Rocky Shih and Cullen Bash Totally green: evaluating and designing servers for lifecycle environmental impact . . . . . . . . . . . . . . . . . 25--36 Michael Ferdman and Almutaz Adileh and Onur Kocberber and Stavros Volos and Mohammad Alisafaee and Djordje Jevdjic and Cansu Kaynak and Adrian Daniel Popescu and Anastasia Ailamaki and Babak Falsafi Clearing the clouds: a study of emerging scale-out workloads on modern hardware 37--48 Yang Chen and Shuangde Fang and Lieven Eeckhout and Olivier Temam and Chengyong Wu Iterative optimization for the data center . . . . . . . . . . . . . . . . . 49--60 Faraz Ahmad and Srimat T. Chakradhar and Anand Raghunathan and T. N. Vijaykumar Tarazu: optimizing MapReduce on heterogeneous clusters . . . . . . . . . 61--74 Sriram Govindan and Di Wang and Anand Sivasubramaniam and Bhuvan Urgaonkar Leveraging stored energy for handling power emergencies in aggressively provisioned datacenters . . . . . . . . 75--86 Asim Kadav and Michael M. Swift Understanding modern device drivers . . 87--98 Sankaralingam Panneerselvam and Michael M. Swift Chameleon: operating system support for dynamic processors . . . . . . . . . . . 99--110 Andy A. Hwang and Ioan A. Stefanovici and Bianca Schroeder Cosmic rays don't strike twice: understanding the nature of DRAM errors and the implications for system design 111--122 Siva Kumar Sastry Hari and Sarita V. Adve and Helia Naeimi and Pradeep Ramachandran Relyzer: exploiting application-level fault equivalence to analyze application resiliency to transient faults . . . . . 
123--134 Peter Feiner and Angela Demke Brown and Ashvin Goel Comprehensive kernel instrumentation via dynamic binary translation . . . . . . . 135--146 Rei Odaira and Toshio Nakatani Continuous object access profiling and optimizations to overcome the memory wall and bloat . . . . . . . . . . . . . 147--158 Joseph L. Greathouse and Hongyi Xin and Yixin Luo and Todd Austin A case for unlimited watchpoints . . . . 159--172 Marek Olszewski and Qin Zhao and David Koh and Jason Ansel and Saman Amarasinghe Aikido: accelerating shared data dynamic analyses . . . . . . . . . . . . . . . . 173--184 Baris Kasikci and Cristian Zamfir and George Candea Data races vs. data race bugs: telling the difference with Portend . . . . . . 185--198 Austin T. Clements and M. Frans Kaashoek and Nickolai Zeldovich Scalable address spaces using RCU balanced trees . . . . . . . . . . . . . 199--210 Haris Volos and Andres Jaan Tack and Michael M. Swift and Shan Lu Applying transactional memory to concurrency bugs . . . . . . . . . . . . 211--222 José A. Joao and M. Aater Suleman and Onur Mutlu and Yale N. Patt Bottleneck identification and scheduling in multithreaded applications . . . . . 223--234 Petar Radojković and Vladimir Cakarević and Miquel Moretó and Javier Verdú and Alex Pajuelo and Francisco J. Cazorla and Mario Nemirovsky and Mateo Valero Optimal task assignment in multithreaded processors: a statistical approach . . . 235--248 Aamer Jaleel and Hashem H. Najaf-abadi and Samantika Subramaniam and Simon C. Steely and Joel Emer CRUISE: cache replacement and utility-aware scheduling . . . . . . . . 249--260 Matthew DeVuyst and Ashish Venkat and Dean M. Tullsen Execution migration in a heterogeneous-ISA chip multiprocessor 261--272 Changhui Lin and Vijay Nagarajan and Rajiv Gupta and Bharghava Rajaram Efficient sequential consistency via conflict ordering . . . . . . . . . . . 273--286 David Cheriton and Amin Firoozshahian and Alex Solomatnikov and John P. 
Stevenson and Omid Azizi HICAMP: architectural support for efficient concurrency-safe shared structured data access . . . . . . . . . 287--300 Hadi Esmaeilzadeh and Adrian Sampson and Luis Ceze and Doug Burger Architecture support for disciplined approximate programming . . . . . . . . 301--312 David Meisner and Thomas F. Wenisch DreamWeaver: architectural support for deep sleep . . . . . . . . . . . . . . . 313--324 Myron King and Nirav Dave and Arvind Automatic generation of hardware/software interfaces . . . . . . 325--336 Lorenzo Martignoni and Stephen McCamant and Pongsin Poosankam and Dawn Song and Petros Maniatis Path-exploration lifting: hi-fi tests for lo-fi emulators . . . . . . . . . . 337--348 Sungpack Hong and Hassan Chafi and Eric Sedlar and Kunle Olukotun Green-Marl: a DSL for easy and efficient graph analysis . . . . . . . . . . . . . 349--362 Yongjun Park and Sangwon Seo and Hyunchul Park and Hyoun Kyu Cho and Scott Mahlke SIMD defragmenter: efficient ILP realization on data-parallel architectures . . . . . . . . . . . . . 363--374 Dilip Nijagal Simha and Maohua Lu and Tzi-cker Chiueh An update-aware storage system for low-locality update-intensive workloads 375--386 Adrian M. Caulfield and Todor I. Mollov and Louis Alex Eisner and Arup De and Joel Coburn and Steven Swanson Providing safe, user space access to fast, solid state disks . . . . . . . . 387--400 Dushyanth Narayanan and Orion Hodson Whole-system persistence . . . . . . . . 401--410 Abel Gordon and Nadav Amit and Nadav Har'El and Muli Ben-Yehuda and Alex Landau and Assaf Schuster and Dan Tsafrir ELI: bare-metal performance for I/O virtualization . . . . . . . . . . . . . 411--422 Nedeljko Vasić and Dejan Novaković and Svetozar Miucin and Dejan Kostić and Ricardo Bianchini DejaVu: accelerating resource allocation in virtualized environments . . . . . . 423--436 Jakub Szefer and Ruby B. Lee Architectural support for hypervisor-secure virtualization . . . . 
437--450 Min Lee and Karsten Schwan Region scheduling: efficiently using the cache architectures via page-level affinity . . . . . . . . . . . . . . . . 451--462
B. H. H. Juurlink and C. H. Meenderinck Amdahl's law for predicting the future of multicores considered harmful . . . . 1--9 Conrad Mueller Axiom based architecture . . . . . . . . 10--17 Alexander Thomasian Rebuild processing in RAID5 with emphasis on the supplementary parity augmentation method . . . . . . . . . . 18--27 Nishant Kumar Giri and Amitabha Sinha FPGA implementation of a novel architecture for performance enhancement of Radix-2 FFT . . . . . . . . . . . . . 28--32 Aniruddha Ghosh and Satrughna Singha and Amitabha Sinha A new architecture for FPGA implementation of a MAC unit for digital signal processors using mixed number system . . . . . . . . . . . . . . . . . 33--38 Aniruddha Ghosh and Satrughna Singha and Amitabha Sinha ``Floating point RNS'': a new concept for designing the MAC unit of digital signal processor . . . . . . . . . . . . 39--43 Mark Thorson Internet nuggets . . . . . . . . . . . . 44--49
Jamie Liu and Ben Jaiyen and Richard Veras and Onur Mutlu RAIDR: Retention-Aware Intelligent DRAM Refresh . . . . . . . . . . . . . . . . 1--12 Mahdi Nazm Bojnordi and Engin Ipek PARDIS: a programmable memory controller for the DDRx interfacing standards . . . 13--24 Doe Hyun Yoon and Jichuan Chang and Naveen Muralimanohar and Parthasarathy Ranganathan BOOM: enabling mobile memory based low-power server DIMMs . . . . . . . . . 25--36 Krishna T. Malladi and Benjamin C. Lee and Frank A. Nothaft and Christos Kozyrakis and Karthika Periyathambi and Mark Horowitz Towards energy-proportional datacenter memory with mobile DRAM . . . . . . . . 37--48 Nicolas Brunie and Sylvain Collange and Gregory Diamos Simultaneous branch and warp interweaving for sustained GPU performance . . . . . . . . . . . . . . 49--60 Minsoo Rhu and Mattan Erez CAPRI: prediction of compaction-adequacy for handling control-divergence in GPGPU architectures . . . . . . . . . . . . . 61--71 Jaikrishnan Menon and Marc De Kruijf and Karthikeyan Sankaralingam iGPU: exception support and speculative execution on GPUs . . . . . . . . . . . 72--83 José-María Arnau and Joan-Manuel Parcerisa and Polychronis Xekalakis Boosting mobile GPU performance with a decoupled access/execute fragment processor . . . . . . . . . . . . . . . 84--93 Mehmet Kayaalp and Meltem Ozsoy and Nael Abu-Ghazaleh and Dmitry Ponomarev Branch regulation: low-overhead protection from code reuse attacks . . . 94--105 John Demme and Robert Martin and Adam Waksman and Simha Sethumadhavan Side-channel vulnerability factor: a metric for measuring information leakage 106--117 Robert Martin and John Demme and Simha Sethumadhavan TimeWarp: rethinking timekeeping and performance monitoring mechanisms to mitigate side-channel attacks . . . . . 
118--129 Jonathan Valamehr and Melissa Chase and Seny Kamara and Andrew Putnam and Dan Shumow and Vinod Vaikuntanathan and Timothy Sherwood Inspection resistant memory: architectural support for security from physical examination . . . . . . . . . . 130--141 Yi Xu and Jun Yang and Rami Melhem Tolerating process variations in nanophotonic on-chip networks . . . . . 142--152 Pranay Koka and Michael O. McCracken and Herb Schwetman and Chia-Hsin Owen Chen and Xuezhe Zheng and Ron Ho and Kannan Raj and Ashok V. Krishnamoorthy A micro-architectural analysis of switched photonic multi-chip interconnects . . . . . . . . . . . . . 153--164 Aaron Carpenter and Jianyun Hu and Ovunc Kocabas and Michael Huang and Hui Wu Enhancing effective throughput for transmission line-based bus . . . . . . 165--176 Michihiro Koibuchi and Hiroki Matsutani and Hideharu Amano and D. Frank Hsu and Henri Casanova A case for random shortcut topologies for HPC interconnects . . . . . . . . . 177--188 Santosh Nagarakatte and Milo M. K. Martin and Steve Zdancewic Watchdog: hardware for safe and secure manual memory management and full memory safety . . . . . . . . . . . . . . . . . 189--200 Joseph Devietti and Benjamin P. Wood and Karin Strauss and Luis Ceze and Dan Grossman and Shaz Qadeer RADISH: always-on sound and complete Race Detection in Software and Hardware . . . . . . . . . . . 201--212 Kenzo Van Craeynest and Aamer Jaleel and Lieven Eeckhout and Paolo Narvaez and Joel Emer Scheduling heterogeneous multi-cores through Performance Impact Estimation (PIE) . . . . . . . . . . . . . . . . . 213--224 Ting Cao and Stephen M. Blackburn and Tiejun Gao and Kathryn S. McKinley The yin and yang of power and performance for asymmetric hardware and managed software . . . . . . . . . . . . 225--236 Evgeni Krimer and Patrick Chiang and Mattan Erez Lane decoupling for improving the timing-error resiliency of wide-SIMD architectures . . . . . . . . . . . . 
. 237--248 Timothy N. Miller and Renji Thomas and Xiang Pan and Radu Teodorescu VRSync: characterizing and eliminating synchronization-induced voltage emergencies in many-core processors . . 249--260 Ioannis Doudalis and Milos Prvulovic Euripus: a flexible unified hardware memory checkpointing accelerator for bidirectional-debugging and reliability 261--272 Arun Arvind Nair and Stijn Eyerman and Lieven Eeckhout and Lizy Kurian John A first-order mechanistic model for architectural vulnerability factor . . . 273--284 Aniruddha N. Udipi and Naveen Muralimanohar and Rajeev Balasubramonian and Al Davis and Norman P. Jouppi LOT-ECC: localized and tiered reliability mechanisms for commodity memory systems . . . . . . . . . . . . . 285--296 Arkaprava Basu and Mark D. Hill and Michael M. Swift Reducing memory reference energy with opportunistic virtual caching . . . . . 297--308 Zhe Wang and Samira M. Khan and Daniel A. Jiménez Improving writeback efficiency with decoupled last-write prediction . . . . 309--320 Jaewoong Sim and Jaekyu Lee and Moinuddin K. Qureshi and Hyesoon Kim FLEXclusion: balancing cache capacity and on-chip bandwidth via flexible exclusion . . . . . . . . . . . . . . . 321--332 Gaurang Upasani and Xavier Vera and Antonio González Setting an error detection infrastructure with low cost acoustic wave detectors . . . . . . . . . . . . . 333--343 Andrea Pellegrini and Joseph L. Greathouse and Valeria Bertacco Viper: virtual pipelines for enhanced reliability . . . . . . . . . . . . . . 344--355 Olivier Temam A defect-tolerant accelerator for emerging high-performance applications 356--367 Yoongu Kim and Vivek Seshadri and Donghyuk Lee and Jamie Liu and Onur Mutlu A case for exploiting subarray-level parallelism (SALP) in DRAM . . . . . . . 368--379 Moinuddin K. Qureshi and Michele M. Franceschini and Ashish Jagmohan and Luis A. Lastras PreSET: improving performance of phase change memories by exploiting asymmetry in write times . . . . . . . . . . . . . 
380--391 Elliott Cooper-Balis and Paul Rosenfeld and Bruce Jacob Buffer-on-board memory systems . . . . . 392--403 Myoungsoo Jung and Ellis H. Wilson III and Mahmut Kandemir Physically Addressed Queueing (PAQ): improving parallelism in solid state disks . . . . . . . . . . . . . . . . . 404--415 Rachata Ausavarungnirun and Kevin Kai-Wei Chang and Lavanya Subramanian and Gabriel H. Loh and Onur Mutlu Staged memory scheduling: achieving high performance and scalability in heterogeneous systems . . . . . . . . . 416--427 R. Manikantan and Kaushik Rajan and R. Govindarajan Probabilistic Shared Cache Management (PriSM) . . . . . . . . . . . . . . . . 428--439 Nadathur Satish and Changkyu Kim and Jatin Chhugani and Hideki Saito and Rakesh Krishnaiyer and Mikhail Smelyanskiy and Milind Girkar and Pradeep Dubey Can traditional programming bridge the Ninja performance gap for parallel computing applications? . . . . . . . . 440--451 Melanie Kambadur and Kui Tang and Martha A. Kim Harmony: collection and analysis of parallel block vectors . . . . . . . . . 452--463 David Wentzlaff and Christopher J. Jackson and Patrick Griffin and Anant Agarwal Configurable fine-grain protection for multicore processor virtualization . . . 464--475 Jeongseob Ahn and Seongwook Jin and Jaehyuk Huh Revisiting hardware-assisted page walks for virtualized systems . . . . . . . . 476--487 Vasileios Kontorinis and Liuyi Eric Zhang and Baris Aksanli and Jack Sampson and Houman Homayoun and Eddie Pettis and Dean M. Tullsen and Tajana Simunic Rosing Managing distributed UPS energy for effective power capping in data centers 488--499 Pejman Lotfi-Kamran and Boris Grot and Michael Ferdman and Stavros Volos and Onur Kocberber and Javier Picorel and Almutaz Adileh and Djordje Jevdjic and Sachin Idgunji and Emre Ozer and Babak Falsafi Scale-out processors . . . . . . . . . . 
500--511 Chao Li and Amer Qouneh and Tao Li iSwitch: coordinating and optimizing renewable energy powered server clusters 512--523 Abhayendra Singh and Satish Narayanasamy and Daniel Marino and Todd Millstein and Madanlal Musuvathi End-to-end sequential consistency . . . 524--535 Jason Mars and Naveen Kumar BlockChop: dynamic squash elimination for hybrid processor architecture . . . 536--547 Doe Hyun Yoon and Min Kyu Jeong and Michael Sullivan and Mattan Erez The dynamic granularity memory system 548--559
Marcos K. Aguilera and Dahlia Malkhi and Keith Marzullo and Alessandro Panconesi and Andrzej Pelc and Roger Wattenhofer Announcing the 2012 Edsger W. Dijkstra Prize in Distributed Computing . . . . . 1--2 Subhashis Maitra and Amitabha Sinha A new algorithm for computing triple-base number system . . . . . . . 3--9 Shiv Kumar and Seshadri Krishna Murthy and G. Varaprasad and S. Sivasathya Network load and traffic pattern on the capacity of wireless ad hoc networks . . 10--25 M. N. Isa and K. Benkrid and T. Clayton Efficient architecture and scheduling technique for pairwise sequence alignment . . . . . . . . . . . . . . . 26--31 A. K. Oudjida and N. Chaillet and M. L. Berrandjia and A. Liacha A new high radix-2 $r$ ($ r \geq 8$) multibit recoding algorithm for large operand size ($ N \geq 32$) multipliers 32--43 Mark Thorson Internet nuggets . . . . . . . . . . . . 44--48
Hideharu Amano and Wayne Luk FPGA-based Connect6 solver with hardware-accelerated move refinement . . 4--9 Thomas C. P. Chau and Wayne Luk and Peter Y. K. Cheung Roberts: reconfigurable platform for benchmarking real-time systems . . . . . 10--15 Kei Kinoshita and Daisuke Takano and Tomoyuki Okamura and Tetsuhiko Yao and Yoshiki Yamaguchi An augmented reality system with a coarse-grained reconfigurable device . . 16--21 Nicholas Ng and Nobuko Yoshida and Xin Yu Niu and Kuen Hung Tsoi Session types: towards safe and fast reconfigurable programming . . . . . . . 22--27 Rizwan Syed and Yajun Ha and Bharadwaj Veeravalli A low overhead abstract architecture for FPGA resource management . . . . . . . . 28--33 Kuen Hung Tsoi and Tobias Becker and Wayne Luk Modelling reconfigurable systems in event driven simulation . . . . . . . . 34--39 Zheng Zhi Shun and Tsutomu Maruyama FPGA acceleration of CDO pricing based on correlation expansions . . . . . . . 40--45 Hiroki Nakahara and Hiroyuki Nakanishi and Tsutomu Sasao On a wideband Fast Fourier Transform for a radio telescope . . . . . . . . . . . 46--51 Cheng Ling and Khaled Benkrid and Tsuyoshi Hamada High performance phylogenetic analysis on CUDA-compatible GPUs . . . . . . . . 52--57 Colin Yu Lin and Hayden Kwok-Hay So Energy-efficient dataflow computations on FPGAs using application-specific coarse-grain architecture synthesis . . 58--63 Jamshaid Sarwar Malik and Paolo Palazzari and Ahmed Hemani Effort, resources, and abstraction vs performance in high-level synthesis: finding new answers to an old question 64--69 Takeshi Kakimoto and Keisuke Dohi and Yuichiro Shibata and Kiyoshi Oguri Performance comparison of GPU programming frameworks with the striped Smith--Waterman algorithm . . . . . . . 70--75 Julien Tribino and Antoine Trouvé and Hadrien A. Clarke and Kazuaki J. Murakami PASTIS: a photonic arbitration with scalable token injection scheme . . . . 
76--81 Takahiro Watanabe and Minoru Watanabe $ 0.18 \mu $ m CMOS process high-sensitivity optically reconfigurable gate array VLSI . . . . . 82--86 Shogo Nakaya and Makoto Miyamura and Noboru Sakimura and Yuichi Nakamura and Tadahiko Sugibayashi A non-volatile reconfigurable offloader for wireless sensor nodes . . . . . . . 87--92 Mark Thorson Internet nuggets . . . . . . . . . . . . 93--112
Michael Bond GPUDet: a deterministic GPU architecture 1--12 Hyojin Sung and Rakesh Komuravelli and Sarita V. Adve DeNovoND: efficient hardware support for disciplined non-determinism . . . . . . 13--26 Benjamin Wester and David Devecsery and Peter M. Chen and Jason Flinn and Satish Narayanasamy Parallelizing data race detection . . . 27--38 Brandon Lucia and Luis Ceze Cooperative empirical failure avoidance for multithreaded programs . . . . . . . 39--50 Íñigo Goiri and William Katsak and Kien Le and Thu D. Nguyen and Ricardo Bianchini Parasol and GreenSwitch: managing datacenters powered by renewable energy 51--64 Kai Shen and Arrvindh Shriraman and Sandhya Dwarkadas and Xiao Zhang and Zhuan Chen Power containers: an OS facility for fine-grained power and energy management on multicore servers . . . . . . . . . . 65--76 Christina Delimitrou and Christos Kozyrakis Paragon: QoS-aware scheduling for heterogeneous datacenters . . . . . . . 77--88 Lingjia Tang and Jason Mars and Wei Wang and Tanima Dey and Mary Lou Soffa ReQoS: reactive static/dynamic compilation for QoS in warehouse scale computers . . . . . . . . . . . . . . . 89--100 Joy Arulraj and Po-Chun Chang and Guoliang Jin and Shan Lu Production-run software failure diagnosis via hardware performance counters . . . . . . . . . . . . . . . . 101--112 Wei Zhang and Marc de Kruijf and Ang Li and Shan Lu and Karthikeyan Sankaralingam ConAir: featherweight concurrency bug recovery via single-threaded idempotent execution . . . . . . . . . . . . . . . 113--126 Nicolas Viennot and Siddharth Nair and Jason Nieh Transparent mutable replay for multicore debugging and patch validation . . . . . 127--138 Swarup Kumar Sahoo and John Criswell and Chase Geigle and Vikram Adve Using likely invariants for automated software fault localization . . . . . . 139--152 Eric Paulos The rise of the expert amateur: DIY culture and the evolution of computer science . . . . . . . . . . . . . . . . 
153--154 Arun Raghavan and Laurel Emurian and Lei Shao and Marios Papaefthymiou and Kevin P. Pipe and Thomas F. Wenisch and Milo M. K. Martin Computational sprinting on a hardware/software testbed . . . . . . . 155--166 Wonsun Ahn and Yuelu Duan and Josep Torrellas DeAliaser: alias speculation using atomic region support . . . . . . . . . 167--180 Heekwon Park and Seungjae Baek and Jongmoo Choi and Donghee Lee and Sam H. Noh Regularities considered harmful: forcing randomness to memory accesses to reduce row buffer conflicts for multi-core, multi-bank systems . . . . . . . . . . . 181--192 Nima Honarmand and Nathan Dautenhahn and Josep Torrellas and Samuel T. King and Gilles Pokam and Cristiano Pereira Cyrus: unintrusive application-level record-replay for replay parallelism . . 193--206 Augusto Born de Oliveira and Sebastian Fischmeister and Amer Diwan and Matthias Hauswirth and Peter F. Sweeney Why you should care about quantile regression . . . . . . . . . . . . . . . 207--218 Charlie Curtsinger and Emery D. Berger STABILIZER: statistically sound performance evaluation . . . . . . . . . 219--228 Lokesh Gidra and Gaël Thomas and Julien Sopena and Marc Shapiro A study of the scalability of stop-the-world garbage collectors on multicores . . . . . . . . . . . . . . . 229--240 Daniel S. McFarlin and Charles Tucker and Craig Zilles Discerning the dominant out-of-order performance advantage: is it speculation or dynamism? . . . . . . . . . . . . . . 241--252 Stephen Checkoway and Hovav Shacham Iago attacks: why the system call API is a bad untrusted RPC interface . . . . . 253--264 Owen S. Hofmann and Sangman Kim and Alan M. Dunn and Michael Z. Lee and Emmett Witchel InkTag: secure applications on an untrusted operating system . . . . . . . 265--278 Cristiano Giuffrida and Anton Kuijsten and Andrew S. Tanenbaum Safe and automatic live update for operating systems . . . . . . . . . . . 
279--292 Haohui Mai and Edgar Pek and Hui Xue and Samuel Talmadge King and Parthasarathy Madhusudan Verifying security invariants in ExpressOS . . . . . . . . . . . . . . . 293--304 Eric Schkufza and Rahul Sharma and Alex Aiken Stochastic superoptimization . . . . . . 305--316 Eric Schulte and Jonathan DiLorenzo and Westley Weimer and Stephanie Forrest Automated repair of binary and assembly programs for cooperating embedded devices . . . . . . . . . . . . . . . . 317--328 Heming Cui and Gang Hu and Jingyue Wu and Junfeng Yang Verifying systems rules using rule-directed symbolic execution . . . . 329--342 Xiaoya Xiang and Chen Ding and Hao Luo and Bin Bao HOTL: a higher order theory of locality 343--356 Hui Kang and Jennifer L. Wong To hardware prefetch or not to prefetch?: a virtualized environment study and core binding approach . . . . 357--368 Hwanju Kim and Sangwook Kim and Jinkyu Jeong and Joonwon Lee and Seungryoul Maeng Demand-based coordinated scheduling for SMP VMs . . . . . . . . . . . . . . . . 369--380 Mohammad Dashti and Alexandra Fedorova and Justin Funston and Fabien Gaud and Renaud Lachaize and Baptiste Lepers and Vivien Quema and Mark Roth Traffic management: a holistic approach to memory placement on NUMA systems . . 381--394 Adwait Jog and Onur Kayiran and Nachiappan Chidambaram Nachiappan and Asit K. Mishra and Mahmut T. Kandemir and Onur Mutlu and Ravishankar Iyer and Chita R. Das OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance . . . . . . . . . . . 395--406 Sreepathi Pai and Matthew J. Thazhuthaveetil and R. Govindarajan Improving GPGPU concurrency with elastic kernels . . . . . . . . . . . . . . . . 407--418 Taewook Oh and Hanjun Kim and Nick P. Johnson and Jae W. Lee and David I. August Practical automatic loop specialization 419--430 Phitchaya Mangpo Phothilimthana and Jason Ansel and Jonathan Ragan-Kelley and Saman Amarasinghe Portable performance on heterogeneous architectures . . . . . . . . . . 
. . . 431--444 Aashish Mittal and Dushyant Bansal and Sorav Bansal and Varun Sethi Efficient virtualization on embedded Power Architecture® platforms . . . . 445--458 Mark D. Hill Research directions for 21st century computer systems: ASPLOS 2013 panel . . 459--460 Anil Madhavapeddy and Richard Mortier and Charalampos Rotsos and David Scott and Balraj Singh and Thomas Gazagnaire and Steven Smith and Steven Hand and Jon Crowcroft Unikernels: library operating systems for the cloud . . . . . . . . . . . . . 461--472 Asim Kadav and Matthew J. Renzelmann and Michael M. Swift Fine-grained fault tolerance using device checkpoints . . . . . . . . . . . 473--484 Mark Silberstein and Bryan Ford and Idit Keidar and Emmett Witchel GPUfs: integrating a file system with GPUs . . . . . . . . . . . . . . . . . . 485--498 Nicholas Hunt and Tom Bergan and Luis Ceze and Steven D. Gribble DDOS: taming nondeterminism in distributed systems . . . . . . . . . . 499--508 Cheng Wang and Youfeng Wu TSO_ATOMICITY: efficient hardware primitive for TSO-preserving region optimizations . . . . . . . . . . . . . 509--520 Syed Ali Raza Jafri and Gwendolyn Voskuilen and T. N. Vijaykumar Wait-n-GoTM: improving HTM performance by serializing cyclic dependencies . . . 521--534 Xuehai Qian and Josep Torrellas and Benjamin Sahelices and Depei Qian Volition: scalable and precise sequential consistency violation detection . . . . . . . . . . . . . . . 535--548 J. P. Grossman and Jeffrey S. Kuskin and Joseph A. Bank and Michael Theobald and Ron O. Dror and Douglas J. Ierardi and Richard H. Larson and U. Ben Schafer and Brian Towles and Cliff Young and David E. Shaw Hardware support for fine-grained event-driven computation in Anton 2 . . 549--560
Amitabha Sinha and Mitrava Sarkar and Soumojit Acharyya and Suranjan Chakraborty A novel reconfigurable architecture of a DSP processor for efficient mapping of DSP functions using field programmable DSP arrays . . . . . . . . . . . . . . . 1--8 Amrita Saha and Manideepa Mukherjee and Debanjana Datta and Sangita Saha and Amitabha Sinha Performance analysis of a FPGA based novel binary and DBNS multiplier . . . . 9--16 Michael Sartin-Tarm and Tony Nowatzki and Lorenzo De Carli and Karthikeyan Sankaralingam and Cristian Estan Constraint centric scheduling guide . . 17--21 Apala Guha and Yao Zhang and Raihan ur Rasool and Andrew A. Chien Systematic evaluation of workload clustering for extremely energy-efficient architectures . . . . . 22--29 Amrita Saha and Pijush Biswas and Amitabha Sinha An integrated development platform of a reconfigurable radio processor for software defined radio . . . . . . . . . 30--35 Santanu Pal and Amitabha Sinha and Pijush Biswas FPGA implementation of a novel DCT architecture reducing constant cosine terms . . . . . . . . . . . . . . . . . 36--40 Kuo-Kun Tseng and Fu-Fu Zeng and Huang-Nan Huang and Yiming Liu and Jeng-Shyang Pan and W. H. Ip and C. H. Wu A new non-exact Aho--Corasick framework for ECG classification . . . . . . . . . 41--46 Subhashis Maitra and Amitabha Sinha High performance MAC unit for DSP and cryptographic applications . . . . . . . 47--55 Mark Thorson Internet nuggets . . . . . . . . . . . . 56--71
Bilel Belhadj and Antoine Joubert and Zheng Li and Rodolphe Héliot and Olivier Temam Continuous real-world inputs can open up alternative accelerator designs . . . . 1--12 Paula Petrica and Adam M. Izraelevitz and David H. Albonesi and Christine A. Shoemaker Flicker: a dynamically adaptive architecture for power limited multicore systems . . . . . . . . . . . . . . . . 13--23 Wajahat Qadeer and Rehan Hameed and Ofer Shacham and Preethi Venkatesan and Christos Kozyrakis and Mark A. Horowitz Convolution engine: balancing efficiency & flexibility in specialized computing 24--35 Kevin Lim and David Meisner and Ali G. Saidi and Parthasarathy Ranganathan and Thomas F. Wenisch Thin servers with smart pipes: designing SoC accelerators for memcached . . . . . 36--47 Janani Mukundan and Hillery Hunter and Kyu-hyoun Kim and Jeffrey Stuecheli and José F. Martínez Understanding and mitigating refresh overheads in high-density DDR4 DRAM systems . . . . . . . . . . . . . . . . 48--59 Jamie Liu and Ben Jaiyen and Yoongu Kim and Chris Wilkerson and Onur Mutlu An experimental study of data retention behavior in modern DRAM devices: implications for retention time profiling mechanisms . . . . . . . . . . 60--71 Prashant J. Nair and Dae-Hyun Kim and Moinuddin K. Qureshi ArchShield: architectural framework for assisting DRAM scaling by tolerating high error rates . . . . . . . . . . . . 72--83 Saugata Ghose and Hyodong Lee and José F. Martínez Improving memory scheduling via processor-side load criticality information . . . . . . . . . . . . . . 84--95 Canturk Isci and Suzanne McIntosh and Jeffrey Kephart and Rajarshi Das and James Hanson and Scott Piper and Robert Wolford and Thomas Brey and Robert Kantner and Allen Ng and James Norris and Abdoulaye Traore and Michael Frissora Agile, efficient virtualization power management with low-latency server power states . . . . . . . . . . . . . . . . . 
96--107 Cheng-Chun Tu and Chao-tang Lee and Tzi-cker Chiueh Secure I/O device sharing among virtual machines on multiple hosts . . . . . . . 108--119 Xiaotao Chang and Hubertus Franke and Yi Ge and Tao Liu and Kun Wang and Jimi Xenidis and Fei Chen and Yu Zhang Improving virtualization in the presence of software managed translation lookaside buffers . . . . . . . . . . . 120--129 Ji Kim and Christopher Torng and Shreesha Srinath and Derek Lockhart and Christopher Batten Microarchitectural mechanisms to exploit value structure in SIMT architectures 130--141 Angshuman Parashar and Michael Pellauer and Michael Adler and Bushra Ahsan and Neal Crago and Daniel Lustig and Vladimir Pavlov and Antonia Zhai and Mohit Gambhir and Aamer Jaleel and Randy Allmon and Rachid Rayess and Stephen Maresh and Joel Emer Triggered instructions: a control paradigm for spatially-programmed architectures . . . . . . . . . . . . . 142--153 José A. Joao and M. Aater Suleman and Onur Mutlu and Yale N. Patt Utility-based acceleration of multithreaded applications on asymmetric CMPs . . . . . . . . . . . . . . . . . . 154--165 Daniel Kudrow and Kenneth Bier and Zhaoxia Deng and Diana Franklin and Yu Tomita and Kenneth R. Brown and Frederic T. Chong Quantum rotations: a case study in static and dynamic machine-code generation for quantum computers . . . . 166--176 Richard A. Muscat and Karin Strauss and Luis Ceze and Georg Seelig DNA-based molecular architecture with spatially localized components . . . . . 177--188 Qing Guo and Xiaochen Guo and Ravi Patel and Engin Ipek and Eby G. Friedman AC-DIMM: associative computing with STT-MRAM . . . . . . . . . . . . . . . . 189--200 Blake A. Hechtman and Daniel J. Sorin Exploring memory consistency for massively-threaded throughput-oriented processors . . . . . . . . . . . . . . . 201--212 Yuelu Duan and Abdullah Muzahid and Josep Torrellas WeeFence: toward making fences free in TSO . . . . . . . . . . . . . . . . . . 213--224 Harold W. 
Cain and Maged M. Michael and Brad Frey and Cathy May and Derek Williams and Hung Le Robust architectural support for transactional memory in the Power architecture . . . . . . . . . . . . . . 225--236 Arkaprava Basu and Jayneel Gandhi and Jichuan Chang and Mark D. Hill and Michael M. Swift Efficient virtual memory for big memory servers . . . . . . . . . . . . . . . . 237--248 Lisa Wu and Raymond J. Barker and Martha A. Kim and Kenneth A. Ross Navigating big data with high-throughput, energy-efficient data partitioning . . . . . . . . . . . . . . 249--260 Eric S. Chung and John D. Davis and Jaewon Lee LINQits: big data on little clients . . 261--272 Islam Atta and Pinar Tözün and Xin Tong and Anastasia Ailamaki and Andreas Moshovos STREX: boosting instruction cache reuse in OLTP workloads through stratified transaction execution . . . . . . . . . 273--284 Indrani Paul and Srilatha Manne and Manish Arora and W. Lloyd Bircher and Sudhakar Yalamanchili Cooperative boosting: needy versus greedy power management . . . . . . . . 285--296 Anys Bacha and Radu Teodorescu Dynamic reduction of voltage margins by leveraging on-chip ECC in Itanium II processors . . . . . . . . . . . . . . . 297--307 Henry Cook and Miquel Moreto and Sarah Bird and Khanh Dao and David A. Patterson and Krste Asanovic A hardware evaluation of cache partitioning to improve utilization and energy-efficiency while preserving responsiveness . . . . . . . . . . . . . 308--319 Reetuparna Das and Satish Narayanasamy and Sudhir K. Satpathy and Ronald G. Dreslinski Catnap: energy proportional multiple network-on-chip . . . . . . . . . . . . 320--331 Adwait Jog and Onur Kayiran and Asit K. Mishra and Mahmut T. Kandemir and Onur Mutlu and Ravishankar Iyer and Chita R. Das Orchestrated scheduling and prefetching for GPGPUs . . . . . . . . . . . . . . . 
332--343 Naifeng Jing and Yao Shen and Yao Lu and Shrikanth Ganapathy and Zhigang Mao and Minyi Guo and Ramon Canal and Xiaoyao Liang An energy-efficient and scalable eDRAM-based register file architecture for GPGPU . . . . . . . . . . . . . . . 344--355 Minsoo Rhu and Mattan Erez Maximizing SIMD resource utilization in GPGPUs with SIMD lane permutation . . . 356--367 Aniruddha S. Vaidya and Anahita Shayesteh and Dong Hyuk Woo and Roy Saharoy and Mani Azimi SIMD divergence optimization through intra-warp compaction . . . . . . . . . 368--379 Young Hoon Son and O. Seongil and Yuhwan Ro and Jae W. Lee and Jung Ho Ahn Reducing memory access latency with asymmetric DRAM bank organizations . . . 380--391 Ziyi Liu and JongHyuk Lee and Junyuan Zeng and Yuanfeng Wen and Zhiqiang Lin and Weidong Shi CPU transparent protection of OS kernel and hypervisor integrity with programmable DRAM . . . . . . . . . . . 392--403 Djordje Jevdjic and Stavros Volos and Babak Falsafi Die-stacked DRAM caches for servers: hit ratio, latency, or bandwidth? Have it all with footprint cache . . . . . . . . 404--415 Jaewoong Sim and Gabriel H. Loh and Vilas Sridharan and Mike O'Connor Resilient die-stacked DRAM caches . . . 416--427 Yu Du and Miao Zhou and Bruce R. Childers and Daniel Mossé and Rami Melhem Bit mapping for balanced PCM cell programming . . . . . . . . . . . . . . 428--439 Nak Hee Seong and Sungkap Yeo and Hsien-Hsin S. Lee Tri-level-cell phase change memory: toward an efficient and reliable memory system . . . . . . . . . . . . . . . . . 440--451 Rodolfo Azevedo and John D. Davis and Karin Strauss and Parikshit Gopalan and Mark Manasse and Sergey Yekhanin Zombie memory: extending memory lifetime by reviving dead blocks . . . . . . . . 452--463 Adrian M. Caulfield and Steven Swanson QuickSAN: a storage area network for fast, distributed, solid state disks . . 
464--474 Daniel Sanchez and Christos Kozyrakis ZSim: fast and accurate microarchitectural simulation of thousand-core systems . . . . . . . . . 475--486 Jingwen Leng and Tayler Hetherington and Ahmed ElTantawy and Syed Gilani and Nam Sung Kim and Tor M. Aamodt and Vijay Janapa Reddi GPUWattch: enabling energy optimizations in GPGPUs . . . . . . . . . . . . . . . 487--498 Meng-Ju Wu and Minshu Zhao and Donald Yeung Studying multicore processor scaling via reuse distance analysis . . . . . . . . 499--510 Kristof Du Bois and Stijn Eyerman and Jennifer B. Sartor and Lieven Eeckhout Criticality stacks: identifying critical threads in parallel programs using synchronization behavior . . . . . . . . 511--522 George Kurian and Omer Khan and Srinivas Devadas The locality-aware adaptive cache coherence protocol . . . . . . . . . . . 523--534 Stefanos Kaxiras and Alberto Ros A new perspective for efficient virtual-cache coherence . . . . . . . . 535--546 Hongzhou Zhao and Arrvindh Shriraman and Snehasish Kumar and Sandhya Dwarkadas Protozoa: adaptive granularity cache coherence . . . . . . . . . . . . . . . 547--558 John Demme and Matthew Maycock and Jared Schmitz and Adrian Tang and Adam Waksman and Simha Sethumadhavan and Salvatore Stolfo On the feasibility of online malware detection with performance counters . . 559--570 Ling Ren and Xiangyao Yu and Christopher W. Fletcher and Marten van Dijk and Srinivas Devadas Design space exploration and optimization of path oblivious RAM in secure processors . . . . . . . . . . . 571--582 Hassan M. G. Wassel and Ying Gao and Jason K. Oberg and Ted Huffmire and Ryan Kastner and Frederic T. Chong and Timothy Sherwood SurfNoC: a low latency and provably non-interfering approach to secure networks-on-chip . . . . . . . . . . . . 583--594 Di Wang and Chuangang Ren and Anand Sivasubramaniam Virtualizing power distribution in datacenters . . . . . . . . . . . . . . 
595--606 Hailong Yang and Alex Breslow and Jason Mars and Lingjia Tang Bubble-Flux: precise online QoS management for increased utilization in warehouse scale computers . . . . . . . 607--618 Jason Mars and Lingjia Tang Whare-map: heterogeneity in ``homogeneous'' warehouse-scale computers . . . . . . . . . . . . . . . 619--630 Nikos Foutris and Dimitris Gizopoulos and Xavier Vera and Antonio Gonzalez Deconfigurable microprocessor architectures for silicon debug acceleration . . . . . . . . . . . . . . 631--642 Gilles Pokam and Klaus Danne and Cristiano Pereira and Rolf Kassa and Tim Kranich and Shiliang Hu and Justin Gottschlich and Nima Honarmand and Nathan Dautenhahn and Samuel T. King and Josep Torrellas QuickRec: prototyping an Intel architecture extension for record and replay of multithreaded programs . . . . 643--654 Ruirui Huang and Erik Halberg and G. Edward Suh Non-race concurrency bug detection through order-sensitive critical sections . . . . . . . . . . . . . . . . 655--666
Subhashis Maitra and Amitabha Sinha High efficiency MAC unit used in digital signal processing and elliptic curve cryptography . . . . . . . . . . . . . . 1--7 Tomislav Janjusic and Krishna Kavi Gleipnir: a memory profiling and tracing tool . . . . . . . . . . . . . . . . . . 8--12 Mark Thorson Internet nuggets . . . . . . . . . . . . 13--22
Ivan Godard The Mill: split-stream encoding . . . . 1--5 Alexander Thomasian Disk arrays with multiple RAID levels 6--24 Subhashis Maitra and Amitabha Sinha Design and simulation of MAC unit using combinational circuit and adder . . . . 25--33 Thomas C. P. Chau and James S. Targett and Marlon Wijeyasinghe and Wayne Luk and Peter Y. K. Cheung and Benjamin Cope and Alison Eele and Jan Maciejowski Accelerating sequential Monte Carlo method for real-time air traffic management . . . . . . . . . . . . . . . 35--40 Atabak Mahram and Martin C. Herbordt NCBI BLASTP on the Convey HC1-EX . . . . 41--46 Kentaro Sano and Yoshiaki Kono and Hayato Suzuki and Ryotaro Chiba and Ryo Ito and Tomohiro Ueno and Kyo Koizumi and Satoru Yamamoto Efficient custom computing of fully-streamed lattice Boltzmann method on tightly-coupled FPGA cluster . . . . 47--52 Wim Vanderbauwhede and Anton Frolov and Sai Rahul Chalamalasetti and Martin Margala A hybrid CPU--FPGA system for high throughput (10Gb/s) streaming document classification . . . . . . . . . . . . . 53--58 Ce Guo and Wayne Luk and Ekaterina Vinkovskaya and Rama Cont Customisable pipelined engine for intensity evaluation in multivariate Hawkes point processes . . . . . . . . . 59--64 Heiner Giefers and Christian Plessl and Jens Förstner Accelerating finite difference time domain simulations with reconfigurable dataflow computers . . . . . . . . . . . 65--70 Yuki Ogawa and Masahiro Iida and Motoki Amagasaki and Morihiro Kuga and Toshinori Sueyoshi A reconfigurable Java accelerator with software compatibility for embedded systems . . . . . . . . . . . . . . . . 71--76 Takeshi Ohkawa and Daichi Uetake and Takashi Yokota and Kanemitsu Ootsu and Takanobu Baba Reconfigurable and hardwired ORB engine on FPGA by Java-to-HDL synthesizer for realtime application . . . . . . . . . . 77--82 Florent de Dinechin and Matei Istoan and Guillaume Sergent Fixed-point trigonometric functions on FPGAs . . . . . . . . . . . . . . . . . 
83--88 Jubee Tada Performance evaluation of 3-D stacked 32-bit parallel multipliers . . . . . 89--94 Yuichiroh Tanaka and Shimpei Sato and Kenji Kise The UltraSmall soft processor . . . . . 95--100 Liucheng Guo and David B. Thomas and Wayne Luk Customisable architectures for the set covering problem . . . . . . . . . . . . 101--106 Gary Plumbridge and Jack Whitham and Neil Audsley Blueshell: a platform for rapid prototyping of multiprocessor NoCs and accelerators . . . . . . . . . . . . . . 107--117 Chuan Hong and Khaled Benkrid and Nazrin Isa and Xabier Iturbe A run-time reconfigurable system for adaptive high performance efficient computing . . . . . . . . . . . . . . . 113--118 Mark Thorson Internet nuggets . . . . . . . . . . . . 119--127
Al Davis Inside Windows Azure: the challenges and opportunities of a cloud operating system . . . . . . . . . . . . . . . . . 1--2 Stanko Novakovic and Alexandros Daglis and Edouard Bugnion and Babak Falsafi and Boris Grot Scale-out NUMA . . . . . . . . . . . . . 3--18 Sandeep R. Agrawal and Valentin Pistol and Jun Pang and John Tran and David Tarjan and Alvin R. Lebeck Rhythm: harnessing data parallel hardware for server workloads . . . . . 19--34 Mehrzad Samadi and Davoud Anoushe Jamshidi and Janghaeng Lee and Scott Mahlke Paraprox: pattern-based approximation for data parallel applications . . . . . 35--50 James Bornholt and Todd Mytkowicz and Kathryn S. McKinley Uncertain<T>: a first-order type for uncertain data . . . . . . . . . . . . . 51--66 Nuno Santos and Himanshu Raj and Stefan Saroiu and Alec Wolman Using ARM TrustZone to build a trusted language runtime for mobile applications 67--80 John Criswell and Nathan Dautenhahn and Vikram Adve Virtual Ghost: protecting applications from hostile operating systems . . . . . 81--96 Xun Li and Vineeth Kashyap and Jason K. Oberg and Mohit Tiwari and Vasanth Ram Rajarathinam and Ryan Kastner and Timothy Sherwood and Ben Hardekopf and Frederic T. Chong Sapper: a language for hardware-level security policy enforcement . . . . . . 97--112 Radu Banabic and George Candea and Rachid Guerraoui Finding Trojan message vulnerabilities in distributed systems . . . . . . . . . 113--126 Christina Delimitrou and Christos Kozyrakis Quasar: resource-efficient and QoS-aware cluster management . . . . . . . . . . . 127--144 Seyed Majid Zahedi and Benjamin C. Lee REF: resource elasticity fairness with sharing incentives for multiprocessors 145--160 Thannirmalai Somu Muthukaruppan and Anuj Pathania and Tulika Mitra Price theory based power management for heterogeneous multi-cores . . . . . . . 
161--176 Di Wang and Sriram Govindan and Anand Sivasubramaniam and Aman Kansal and Jie Liu and Badriddine Khessib Underprovisioning backup power infrastructure for datacenters . . . . . 177--192 Xiao Yu and Shi Han and Dongmei Zhang and Tao Xie Comprehending performance from real-world execution traces: a device-driver case . . . . . . . . . . . 193--206 Joy Arulraj and Guoliang Jin and Shan Lu Leveraging the short-term memory of hardware to diagnose production-run software failures . . . . . . . . . . . 207--222 Nima Honarmand and Josep Torrellas RelaxReplay: record and replay for relaxed-consistency multiprocessors . . 223--238 Stefan Bucur and Johannes Kinder and George Candea Prototyping symbolic execution engines for interpreted languages . . . . . . . 239--254 Lisa Wu and Andrea Lottarini and Timothy K. Paine and Martha A. Kim and Kenneth A. Ross Q100: the architecture and design of a database processing unit . . . . . . . . 255--268 Tianshi Chen and Zidong Du and Ninghui Sun and Jia Wang and Chengyong Wu and Yunji Chen and Olivier Temam DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning . . . . . . 269--284 Felix Xiaozhu Lin and Zhen Wang and Lin Zhong K2: a mobile operating system for heterogeneous coherence domains . . . . 285--300 Konstantinos Menychtas and Kai Shen and Michael L. Scott Disengaged scheduling for fair, protected access to fast computational accelerators . . . . . . . . . . . . . . 301--316 Jeff Gehlhaar Neuromorphic processing: a new frontier in scaling computer architecture . . . . 317--318 Ardalan Amiri Sani and Kevin Boos and Shaopu Qin and Lin Zhong I/O paravirtualization at the device file boundary . . . . . . . . . . . . . 319--332 Christoffer Dall and Jason Nieh KVM/ARM: the design and implementation of the Linux ARM hypervisor . . . . . . . . . . . . . . . 333--348 Nadav Amit and Dan Tsafrir and Assaf Schuster VSwapper: a memory swapper for virtualized environments . . . . . . . . 
349--366 Jeremy Andrus and Alexander Van't Hof and Naser AlDuaij and Christoffer Dall and Nicolas Viennot and Jason Nieh Cider: native execution of iOS apps on Android . . . . . . . . . . . . . . . . 367--382 Heiner Litz and David Cheriton and Amin Firoozshahian and Omid Azizi and John P. Stevenson SI-TM: reducing transactional memory abort rates through snapshot isolation 383--398 Wenjia Ruan and Trilok Vyas and Yujie Liu and Michael Spear Transactionalizing legacy code: an experience report using GCC and Memcached . . . . . . . . . . . . . . . 399--412 Adam Morrison and Yehuda Afek Fence-free work stealing on bounded TSO processors . . . . . . . . . . . . . . . 413--426 Derek R. Hower and Blake A. Hechtman and Bradford M. Beckmann and Benedict R. Gaster and Mark D. Hill and Steven K. Reinhardt and David A. Wood Heterogeneous-race-free memory models 427--440 Myoungsoo Jung and Wonil Choi and John Shalf and Mahmut Taylan Kandemir Triple-A: a Non-SSD based autonomic all-flash array for high performance storage systems . . . . . . . . . . . . 441--454 Ren-Shuo Liu and De-Yu Shen and Chia-Lin Yang and Shun-Chih Yu and Cheng-Yuan Michael Wang NVM duet: unified working memory and persistent store architecture . . . . . 455--470 Jian Ouyang and Shiding Lin and Song Jiang and Zhenyu Hou and Yong Wang and Yuanzheng Wang SDF: software-defined flash for Web-scale Internet storage systems . . . 471--484 Anthony Gutierrez and Michael Cieslak and Bharan Giridhar and Ronald G. Dreslinski and Luis Ceze and Trevor Mudge Integrated 3D-stacked server designs for increasing physical density of key-value stores . . . . . . . . . . . . 485--498 Donald Nguyen and Andrew Lenharth and Keshav Pingali Deterministic Galois: on-demand, portable and parameterless . . . . . . . 499--512 Haris Ribic and Yu David Liu Energy-efficient work-stealing language runtimes . . . . . . . . . . . . . . . . 
513--528 Todd Mytkowicz and Madanlal Musuvathi and Wolfram Schulte Data-parallel finite-state machines . . 529--542 Zhijia Zhao and Bo Wu and Xipeng Shen Challenging the ``embarrassingly sequential'': parallelizing finite state machine-based computations through principled speculation . . . . . . . . . 543--558 Yanqi Zhou and David Wentzlaff The sharing architecture: sub-core configurability for IaaS clouds . . . . 559--574 Amos Waterland and Elaine Angelino and Ryan P. Adams and Jonathan Appavoo and Margo Seltzer ASC: automatically scalable computation 575--590 Stijn Eyerman and Lieven Eeckhout The benefit of SMT in the multi-core era: flexibility towards degrees of thread-level parallelism . . . . . . . . 591--606 Yufei Ding and Mingzhou Zhou and Zhijia Zhao and Sarah Eisenstat and Xipeng Shen Finding the limit: examining the potential and complexity of compilation scheduling for JIT-based runtime systems 607--622 Marc Lupon and Enric Gibert and Grigorios Magklis and Sridhar Samudrala and Raúl Martínez and Kyriakos Stavrou and David R. Ditzel Speculative hardware/software co-designed floating-point multiply-add fusion . . . . . . . . . . . . . . . . . 623--638 Eric Schulte and Jonathan Dorn and Stephen Harding and Stephanie Forrest and Westley Weimer Post-compiler software optimization for reducing energy . . . . . . . . . . . . 639--652 David A. Wood Resolved: specialized architectures, languages, and system software should supplant general-purpose alternatives within a decade . . . . . . . . . . . . 653--654 Olatunji Ruwase and Michael A. Kozuch and Phillip B. Gibbons and Todd C. Mowry Guardrail: a high fidelity approach to protecting hardware devices from buggy drivers . . . . . . . . . . . . . . . . 655--670 Benjamin P. Wood and Luis Ceze and Dan Grossman Low-level detection of language-level data races with LARD . . . . . . . . . . 
671--686 Jiaqi Zhang and Lakshminarayanan Renganarayana and Xiaolan Zhang and Niyu Ge and Vasanth Bala and Tianyin Xu and Yuanyuan Zhou EnCore: exploiting system environment and correlation information for misconfiguration detection . . . . . . . 687--700 Gwendolyn Voskuilen and T. N. Vijaykumar High-performance fractal coherence . . . 701--714 Woo-Cheol Kwon and Tushar Krishna and Li-Shiuan Peh Locality-oblivious cache organization leveraging single-cycle multi-hop NoCs 715--728 Harshad Kasture and Daniel Sanchez Ubik: efficient cache sharing with strict QoS for latency-critical workloads . . . . . . . . . . . . . . . 729--742 Bharath Pichai and Lisa Hsu and Abhishek Bhattacharjee Architectural support for address translation on GPUs: designing memory management units for CPU/GPUs with unified address spaces . . . . . . . . . 743--758
Subijit Mondal and Subhashis Maitra Data security-modified AES algorithm and its applications . . . . . . . . . . . . 1--8 Soumik Sen and Subhashis Maitra Three levels three dimensional compact coding . . . . . . . . . . . . . . . . . 9--14 Alexander Thomasian and Bingxing Liu and Yuhui Deng Balancing disk access times in RAID5 disk arrays in degraded mode by conditionally prioritizing fork/join requests . . . . . . . . . . . . . . . . 15--19 Jayneel Gandhi and Arkaprava Basu and Mark D. Hill and Michael M. Swift BadgerTrap: a tool to instrument x86-64 TLB misses . . . . . . . . . . . . . . . 20--23 Mark Thorson Internet nuggets . . . . . . . . . . . . 24--36
Brian Towles and J. P. Grossman and Brian Greskamp and David E. Shaw Unifying on-chip and inter-node switching within the Anton 2 network . . 1--12 Andrew Putnam and Adrian M. Caulfield and Eric S. Chung and Derek Chiou and Kypros Constantinides and John Demme and Hadi Esmaeilzadeh and Jeremy Fowers and Gopi Prashanth Gopal and Jan Gray and Michael Haselman and Scott Hauck and Stephen Heil and Amir Hormati and Joo-Young Kim and Sitaram Lanka and James Larus and Eric Peterson and Simon Pope and Aaron Smith and Jason Thong and Phillip Yi Xiao and Doug Burger A reconfigurable fabric for accelerating large-scale datacenter services . . . . 13--24 Bhavya K. Daya and Chia-Hsin Owen Chen and Suvinay Subramanian and Woo-Cheol Kwon and Sunghyun Park and Tushar Krishna and Jim Holt and Anantha P. Chandrakasan and Li-Shiuan Peh SCORPIO: a 36-core research chip demonstrating snoopy coherence on a scalable mesh NoC with in-network ordering . . . . . . . . . . . . . . . . 25--36 Gaurang Upasani and Xavier Vera and Antonio González Avoiding core's DUE & SDC via acoustic wave detectors and tailored error containment and recovery . . . . . . . . 37--48 Long Chen and Zhao Zhang MemGuard: a low cost and energy efficient design to support and enhance memory system reliability . . . . . . . 49--60 Siva Kumar Sastry Hari and Radha Venkatagiri and Sarita V. Adve and Helia Naeimi GangES: gang error simulation for hardware resiliency evaluation . . . . . 61--72 Jack Wadden and Alexander Lyashevsky and Sudhanva Gurumurthi and Vilas Sridharan and Kevin Skadron Real-world design and evaluation of compiler-managed GPU redundant multithreading . . . . . . . . . . . . . 73--84 Tianshi Chen and Qi Guo and Ke Tang and Olivier Temam and Zhiwei Xu and Zhi-Hua Zhou and Yunji Chen ArchRanker: a ranking approach to design space exploration . . . . . . . . . . . 
85--96 Yakun Sophia Shao and Brandon Reagen and Gu-Yeon Wei and David Brooks Aladdin: a Pre-RTL, power-performance accelerator simulator enabling large design space exploration of customized architectures . . . . . . . . . . . . . 97--108 Mario Badr and Natalie Enright Jerger SynFull: synthetic traffic models capturing cache coherent behaviour . . . 109--120 Ashish Venkat and Dean M. Tullsen Harnessing ISA diversity: design of a heterogeneous-ISA chip multiprocessor 121--132 Andreas Sembrant and Erik Hagersten and David Black-Schaffer The Direct-to-Data (D2D) cache: navigating the cache hierarchy with a single lookup . . . . . . . . . . . . . 133--144 Angelos Arelakis and Per Stenstrom SC2: a statistical compression cache scheme . . . . . . . . . . . . . . . . . 145--156 Vivek Seshadri and Abhishek Bhowmick and Onur Mutlu and Phillip B. Gibbons and Michael A. Kozuch and Todd C. Mowry The dirty-block index . . . . . . . . . 157--168 Lei Liu and Yong Li and Zehan Cui and Yungang Bao and Mingyu Chen and Chengyong Wu Going vertical in memory management: handling multiplicity by multi-policy 169--180 Marc S. Orr and Bradford M. Beckmann and Steven K. Reinhardt and David A. Wood Fine-grain task aggregation and coordination on GPUs . . . . . . . . . . 181--192 Ivan Tanasic and Isaac Gelado and Javier Cabezas and Alex Ramirez and Nacho Navarro and Mateo Valero Enabling preemptive multiprogramming on GPUs . . . . . . . . . . . . . . . . . . 193--204 Dani Voitsechov and Yoav Etsion Single-graph multiple flows: energy efficient design alternative for GPGPUs 205--216 Simone Campanoni and Kevin Brownell and Svilen Kanev and Timothy M. Jones and Gu-Yeon Wei and David Brooks HELIX--RC: an architecture-compiler co-design for automatic parallelization of irregular programs . . . . . . . . . 217--228 James E. Smith Efficient digital neurons for large scale cortical architectures . . . . . . 
229--240 Karthik Swaminathan and Huichu Liu and Jack Sampson and Vijaykrishnan Narayanan An examination of the architecture and system-level tradeoffs of employing steep slope devices in 3D CMPs . . . . 241--252 Rangharajan Venkatesan and Shankar Ganesh Ramasubramanian and Swagath Venkataramani and Kaushik Roy and Anand Raghunathan STAG: spintronic-tape architecture for GPGPU cache hierarchies . . . . . . . . 253--264 Steven Pelley and Peter M. Chen and Thomas F. Wenisch Memory persistency . . . . . . . . . . . 265--276 Morteza Hoseinzadeh and Mohammad Arjomand and Hamid Sarbazi-Azad Reducing access latency of MLC PCMs through line striping . . . . . . . . . 277--288 Myoungsoo Jung and Wonil Choi and Shekhar Srikantaiah and Joonhyuk Yoo and Mahmut T. Kandemir HIOS: a host interface I/O scheduler for solid state disks . . . . . . . . . . . 289--300 David Lo and Liqun Cheng and Rama Govindaraju and Luiz André Barroso and Christos Kozyrakis Towards energy proportionality for large-scale latency-critical workloads 301--312 Yanpei Liu and Stark C. Draper and Nam Sung Kim SleepScale: runtime joint speed scaling and sleep states management for power efficient data centers . . . . . . . . . 313--324 Ming Liu and Tao Li Optimizing virtual machine consolidation performance on NUMA server architecture for cloud workloads . . . . . . . . . . 325--336 Seongil O and Young Hoon Son and Nam Sung Kim and Jung Ho Ahn Row-buffer decoupling: a case for low-latency DRAM microarchitecture . . . 337--348 Tao Zhang and Ke Chen and Cong Xu and Guangyu Sun and Tao Wang and Yuan Xie Half-DRAM: a high-bandwidth and low-power DRAM architecture from the rethinking of fine-grained activation 349--360 Yoongu Kim and Ross Daly and Jeremie Kim and Chris Fallin and Ji Hye Lee and Donghyuk Lee and Chris Wilkerson and Konrad Lai and Onur Mutlu Flipping bits in memory without accessing them: an experimental study of DRAM disturbance errors . . . . . . . . 
361--372 Runjie Zhang and Ke Wang and Brett H. Meyer and Mircea R. Stan and Kevin Skadron Architecture implications of pads as a scarce resource . . . . . . . . . . . . 373--384 Shaoming Chen and Yue Hu and Ying Zhang and Lu Peng and Jesse Ardonne and Samuel Irving and Ashok Srivastava Increasing off-chip bandwidth in multi-core processors with switchable pins . . . . . . . . . . . . . . . . . . 385--396 Lei Jiang and Bo Zhao and Jun Yang and Youtao Zhang A low power and reliable charge pump design for phase change memories . . . . 397--408 Gwendolyn Voskuilen and T. N. Vijaykumar Fractal++: closing the performance gap between fractal and conventional coherence . . . . . . . . . . . . . . . 409--420 Xuehai Qian and Benjamin Sahelices and Josep Torrellas OmniOrder: directory-based conflict serialization of transactions . . . . . 421--432 Xuehai Qian and Benjamin Sahelices and Depei Qian Pacifier: record and replay for relaxed-consistency multiprocessors with distributed directory protocol . . . . . 433--444 Nima Honarmand and Josep Torrellas Replay debugging: leveraging record and replay for program debugging . . . . . . 445--456 Jonathan Woodruff and Robert N. M. Watson and David Chisnall and Simon W. Moore and Jonathan Anderson and Brooks Davis and Ben Laurie and Peter G. Neumann and Robert Norton and Michael Roe The CHERI capability model: revisiting RISC in an age of risk . . . . . . . . . 457--468 Lluís Vilanova and Muli Ben-Yehuda and Nacho Navarro and Yoav Etsion and Mateo Valero CODOMs: protecting software with code-centric memory domains . . . . . . 469--480 Arthur Perais and André Seznec EOLE: paving the way for an effective implementation of value prediction . . . 481--492 Kenneth Czechowski and Victor W. Lee and Ed Grochowski and Ronny Ronen and Ronak Singhal and Richard Vuduc and Pradeep Dubey Improving the energy efficiency of big cores . . . . . . . . . . . . . . . . . 493--504 Renée St. 
Amant and Amir Yazdanbakhsh and Jongse Park and Bradley Thwaites and Hadi Esmaeilzadeh and Arjang Hassibi and Luis Ceze and Doug Burger General-purpose code acceleration with limited-precision analog computation . . 505--516 Advait Madhavan and Timothy Sherwood and Dmitri Strukov Race logic: a hardware acceleration for dynamic programming algorithms . . . . . 517--528 Jose-Maria Arnau and Joan-Manuel Parcerisa and Polychronis Xekalakis Eliminating redundant fragment shader executions on a mobile GPU via hardware memoization . . . . . . . . . . . . . . 529--540 Yuhao Zhu and Vijay Janapa Reddi WebCore: architectural support for mobile Web browsing . . . . . . . . . . 541--552
Yuetsu Kodama and Toshihiro Hanawa and Taisuke Boku and Mitsuhisa Sato PEACH2: an FPGA-based PCIe network device for Tightly Coupled Accelerators 3--8 Shimpei Nomura and Takuji Mitsuishi and Jun Suzuki and Yuki Hayashi and Masaki Kan and Hideharu Amano Performance Analysis of the Multi-GPU System with ExpEther . . . . . . . . . . 9--14 Tsuyoshi Watanabe and Naohito Nakasato GPU Accelerated Hybrid Tree Algorithm for Collision Less $N$-body Simulations 15--20 Haruhisa Tsuyama and Tsutomu Maruyama GPU and FPGA Acceleration of Level Set Method . . . . . . . . . . . . . . . . . 21--25 Yu Tanabe and Tsutomu Maruyama Fast and Accurate Optical Flow Estimation using FPGA . . . . . . . . . 27--32 Cesar Torres-Huitzil and Marco Aurelio Nuño-Maganda Area-time Efficient Implementation of Local Adaptive Image Thresholding in Reconfigurable Hardware . . . . . . . . 33--38 Diana Göhringer Reconfigurable Multiprocessor Systems: Handling Hydras Heads --- A Survey . . . 39--44 Kentaro Sano and Ryotaro Chiba and Tomoya Ueno and Hayato Suzuki and Ryo Ito and Satoru Yamamoto FPGA-based Custom Computing Architecture for Large-Scale Fluid Simulation with Building Cube Method . . . . . . . . . . 45--50 Tao Wang and Guangyu Sun and Jiahua Chen and Jian Gong and Haoyang Wu and Xiaoguang Li and Songwu Lu and Jason Cong GRT: a Reconfigurable SDR Platform with High Performance and Usability . . . . . 51--56 Yuki Ando and Masataka Ogawa and Yuya Mizoguchi and Kouta Kumagai and Miaw Torng-Der and Shinya Honda A Case Study of FPGA Blokus Duo Solver by System-Level Design . . . . . . . . . 57--62 Mioara Joldes and Valentina Popescu and Warwick Tucker Searching for Sinks for the Hénon Map using a Multiple-precision GPU Arithmetic Library . . . . . . . . . . . 63--68 Rie Soejima and Koji Okina and Keisuke Dohi and Yuichiro Shibata and Kiyoshi Oguri A Memory Profiling Framework for Stencil Computation on an FPGA Accelerator with High Level Synthesis . . . . . . . . . . 
69--74 Shin Morishima and Hiroki Matsutani Performance Evaluations of Graph Database using CUDA and OpenMP Compatible Libraries . . . . . . . . . . 75--80 Takuji Mitsuishi and Shimpei Nomura and Jun Suzuki and Yuki Hayashi and Masaki Kan and Hideharu Amano Accelerating Breadth First Search on GPU--BOX . . . . . . . . . . . . . . . . 81--86 Jose Nunez-Yanez Energy efficient Reconfigurable Computing with Adaptive Voltage and Logic scaling . . . . . . . . . . . . . 87--92 Mark Thorson Internet Nuggets . . . . . . . . . . . . 93--101
Ozcan Ozturk Architectural Support for Cyber-Physical Systems . . . . . . . . . . . . . . . . 1--1 Yiying Zhang and Jian Yang and Amirsaman Memaripour and Steven Swanson Mojim: a Reliable and Highly-Available Non-Volatile Memory System . . . . . . . 3--18 Rujia Wang and Lei Jiang and Youtao Zhang and Jun Yang SD--PCM: Constructing Reliable Super Dense Phase Change Memory under Write Disturbance . . . . . . . . . . . . . . 19--31 Vinson Young and Prashant J. Nair and Moinuddin K. Qureshi DEUCE: Write-Efficient Encryption for Non-Volatile Memories . . . . . . . . . 33--44 Adam Morrison and Yehuda Afek Temporally Bounding TSO for Fence-Free Asymmetric Synchronization . . . . . . . 45--58 Alexander Matveev and Nir Shavit Reduced Hardware NOrec: a Safe and Scalable Hybrid Transactional Memory . . 59--71 Marc S. Orr and Shuai Che and Ayse Yilmazer and Bradford M. Beckmann and Mark D. Hill and David A. Wood Synchronization Using Remote-Scope Promotion . . . . . . . . . . . . . . . 73--86 Chang Liu and Austin Harris and Martin Maas and Michael Hicks and Mohit Tiwari and Elaine Shi GhostRider: a Hardware-Software System for Memory Trace Oblivious Computation 87--101 Christopher W. Fletcher and Ling Ren and Albert Kwon and Marten van Dijk and Srinivas Devadas Freecursive ORAM: [Nearly] Free Recursion and Integrity Verification for Position-based Oblivious RAM . . . . . . 103--116 David Chisnall and Colin Rothwell and Robert N. M. Watson and Jonathan Woodruff and Munraj Vadera and Simon W. Moore and Michael Roe and Brooks Davis and Peter G. Neumann Beyond the PDP-11: Architectural Support for a Memory-Safe C Abstract Machine . . 117--130 Jiuyue Ma and Xiufeng Sui and Ninghui Sun and Yupeng Li and Zihao Yu and Bowen Huang and Tianni Xu and Zhicheng Yao and Yun Chen and Haibin Wang and Lixin Zhang and Yungang Bao Supporting Differentiated Services in Computers via Programmable Architecture for Resourcing-on-Demand (PARD) . . . . 
131--143 Yushi Omote and Takahiro Shinagawa and Kazuhiko Kato Improving Agility and Elasticity in Bare-metal Clouds . . . . . . . . . . . 145--159 Md E. Haque and Yong hun Eom and Yuxiong He and Sameh Elnikety and Ricardo Bianchini and Kathryn S. McKinley Few-to-Many: Incremental Parallelism for Reducing Tail Latency in Interactive Services . . . . . . . . . . . . . . . . 161--175 Patrick Colp and Jiawen Zhang and James Gleeson and Sahil Suneja and Eyal de Lara and Himanshu Raj and Stefan Saroiu and Alec Wolman Protecting Data on Smartphones and Tablets from Memory Attacks . . . . . . 177--189 Nathan Dautenhahn and Theodoros Kasampalis and Will Dietz and John Criswell and Vikram Adve Nested Kernel: an Operating System Architecture for Intra-Kernel Privilege Separation . . . . . . . . . . . . . . . 191--206 Zhangxi Tan and Zhenghao Qian and Xi Chen and Krste Asanovic and David Patterson DIABLO: a Warehouse-Scale Computer Network Simulator using FPGAs . . . . . 207--221 Johann Hauswald and Michael A. Laurenzano and Yunqi Zhang and Cheng Li and Austin Rovinski and Arjun Khurana and Ronald G. Dreslinski and Trevor Mudge and Vinicius Petrucci and Lingjia Tang and Jason Mars Sirius: an Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers . . . . . . . . . . . . . . . 223--238 Chao Xu and Felix Xiaozhu Lin and Yuyang Wang and Lin Zhong Automated OS-level Device Runtime Power Management . . . . . . . . . . . . . . . 239--252 Íñigo Goiri and Thu D. Nguyen and Ricardo Bianchini CoolAir: Temperature- and Variation-Aware Management for Free-Cooled Datacenters . . . . . . . . 253--265 Nikita Mishra and Huazhe Zhang and John D. Lafferty and Henry Hoffmann A Probabilistic Graphical Model-based Approach for Minimizing Energy Under Performance Constraints . . . . . . . . 267--281 Jun Pang and Chris Dwyer and Alvin R. Lebeck More is Less, Less is More: Molecular-Scale Photonic NoC Power Topologies . . . . . . . . . . . . . . . 
283--296 Vilas Sridharan and Nathan DeBardeleben and Sean Blanchard and Kurt B. Ferreira and Jon Stearley and John Shalf and Sudhanva Gurumurthi Memory Errors in Modern Systems: The Good, The Bad, and The Ugly . . . . . . 297--310 Yavuz Yetim and Sharad Malik and Margaret Martonosi CommGuard: Mitigating Communication Errors in Error-Prone Parallel Execution 311--323 Dohyeong Kim and Yonghwi Kwon and William N. Sumner and Xiangyu Zhang and Dongyan Xu Dual Execution for On the Fly Fine Grained Execution Comparison . . . . . . 325--338 Petr Hosek and Cristian Cadar VARAN the Unbelievable: an Efficient $N$-version Execution Framework . . . . 339--353 Moshe Malka and Nadav Amit and Muli Ben-Yehuda and Dan Tsafrir rIOMMU: Efficient IOMMU for I/O Devices that Employ Ring Buffers . . . . . . . . 355--368 Daofu Liu and Tianshi Chen and Shaoli Liu and Jinhong Zhou and Shengyuan Zhou and Olivier Teman and Xiaobing Feng and Xuehai Zhou and Yunji Chen PuDianNao: a Polyvalent Machine Learning Accelerator . . . . . . . . . . . . . . 369--381 Inigo Goiri and Ricardo Bianchini and Santosh Nagarakatte and Thu D. Nguyen ApproxHadoop: Bringing Approximations to MapReduce Frameworks . . . . . . . . . . 383--397 Michael Ringenburg and Adrian Sampson and Isaac Ackerman and Luis Ceze and Dan Grossman Monitoring and Debugging the Quality of Results in Approximate Programs . . . . 399--411 Guruduth Banavar Watson and the Era of Cognitive Computing . . . . . . . . . . . . . . . 413--413 Gordon Stewart and Mahanth Gowda and Geoffrey Mainland and Bozidar Radunovic and Dimitrios Vytiniotis and Cristina Luengo Agullo Ziria: a DSL for Wireless Systems Programming . . . . . . . . . . . . . . 415--428 Ravi Teja Mullapudi and Vinay Vasista and Uday Bondhugula PolyMage: Automatic Optimization for Image Processing Pipelines . . . . . . . 429--443 Jeff Heckey and Shruti Patil and Ali JavadiAbhari and Adam Holmes and Daniel Kudrow and Kenneth R. Brown and Diana Franklin and Frederic T. 
Chong and Margaret Martonosi Compiler Management of Communication and Parallelism for Quantum Computation . . 445--456 Muhammad Amber Hassaan and Donald D. Nguyen and Keshav K. Pingali Kinetic Dependence Graphs . . . . . . . 457--471 Stelios Sidiroglou-Douskos and Eric Lahtinen and Nathan Rittenhouse and Paolo Piselli and Fan Long and Deokhwan Kim and Martin Rinard Targeted Automatic Integer Overflow Discovery Using Goal-Directed Conditional Branch Enforcement . . . . . 473--486 Udit Dhawan and Catalin Hritcu and Raphael Rubin and Nikos Vasilakis and Silviu Chiricescu and Jonathan M. Smith and Thomas F. Knight, Jr. and Benjamin C. Pierce and Andre DeHon Architectural Support for Software-Defined Metadata Processing . . 487--502 Danfeng Zhang and Yao Wang and G. Edward Suh and Andrew C. Myers A Hardware Design Language for Timing-Sensitive Information-Flow Security . . . . . . . . . . . . . . . . 503--516 Matthew Hicks and Cynthia Sturton and Samuel T. King and Jonathan M. Smith SPECS: a Lightweight Runtime Mechanism for Protecting Software from Security-Critical Processor Bugs . . . . 517--529 Yuelu Duan and Nima Honarmand and Josep Torrellas Asymmetric Memory Fences: Optimizing Both Performance and Implementability 531--543 Hyojin Sung and Sarita V. Adve DeNovoSync: Efficient Support for Arbitrary Synchronization without Writer-Initiated Invalidations . . . . . 545--559 Aritra Sengupta and Swarnendu Biswas and Minjia Zhang and Michael D. Bond and Milind Kulkarni Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability . . . . . . . . . . . . 561--575 Jade Alglave and Mark Batty and Alastair F. Donaldson and Ganesh Gopalakrishnan and Jeroen Ketema and Daniel Poetzl and Tyler Sorensen and John Wickerson GPU Concurrency: Weak Behaviours and Programming Assumptions . . . . . . . . 577--591 Jason Jong Kyu Park and Yongjun Park and Scott Mahlke Chimera: Collaborative Preemption for Multitasking on a Shared GPU . . . . . . 
593--606 Neha Agarwal and David Nellans and Mark Stephenson and Mike O'Connor and Stephen W. Keckler Page Placement Strategies for GPUs within Heterogeneous Memory Systems . . 607--618 Zhijia Zhao and Xipeng Shen On-the-Fly Principled Speculation for FSM Parallelization . . . . . . . . . . 619--630 Tudor David and Rachid Guerraoui and Vasileios Trigonakis Asynchronized Concurrency: The Secret to Scaling Concurrent Search Data Structures . . . . . . . . . . . . . . . 631--644 Pramod Bhatotia and Pedro Fonseca and Umut A. Acar and Björn B. Brandenburg and Rodrigo Rodrigues iThreads: a Threading Library for Parallel Incremental Computation . . . . 645--659 Lokesh Gidra and Gaël Thomas and Julien Sopena and Marc Shapiro and Nhan Nguyen NumaGiC: a Garbage Collector for Big Data on Big NUMA Machines . . . . . . . 661--673 Khanh Nguyen and Kai Wang and Yingyi Bu and Lu Fang and Jianfei Hu and Guoqing Xu FACADE: a Compiler and Runtime for (Almost) Object-Bounded Big Data Applications . . . . . . . . . . . . . . 675--690 Varun Agrawal and Abhiroop Dabral and Tapti Palit and Yongming Shen and Michael Ferdman Architectural Support for Dynamic Linking . . . . . . . . . . . . . . . . 691--702
Andrew A. Chien and Tung Thanh-Hoang and Dilip Vasudevan and Yuanwei Fang and Amirali Shambayati $ 10 \times 10 $: a Case Study in Highly-Programmable and Energy-Efficient Heterogeneous Federated Architecture . . 2--9 Mark Thorson Internet Nuggets . . . . . . . . . . . . 10--16
Martin Herbordt and Miriam Leeser Off-Loading LET Generation to PEACH2: a Switching Hub for High Performance GPU Clusters . . . . . . . . . . . . . . . . 3--8 Koji Okina and Rie Soejima and Kota Fukumoto and Yuichiro Shibata and Kiyoshi Oguri Power Performance Profiling of $3$-D Stencil Computation on an FPGA Accelerator for Efficient Pipeline Optimization . . . . . . . . . . . . . . 9--14 Ahmad Lashgar and Ebad Salehi and Amirali Baniasadi A Case Study in Reverse Engineering GPGPUs: Outstanding Memory Handling Resources . . . . . . . . . . . . . . . 15--21 Ami Hayashi and Yuta Tokusashi and Hiroki Matsutani A Line Rate Outlier Filtering FPGA NIC using 10GbE Interface . . . . . . . . . 22--27 Abhishek Kumar Jain and Xiangwei Li and Suhaib A. Fahmy and Douglas L. Maskell Adapting the DySER Architecture with DSP Blocks as an Overlay for the Xilinx Zynq 28--33 David de la Chevallerie and Jens Korinth and Andreas Koch ffLink: a Lightweight High-Performance Open-Source PCI Express Gen3 Interface for Reconfigurable Accelerators . . . . 34--39 Soukaina N. Hmid and Jose G. F. Coutinho and Wayne Luk A Transfer-Aware Runtime System for Heterogeneous Asynchronous Parallel Execution . . . . . . . . . . . . . . . 40--45 Ahmed Al-Wattar and Shawki Areibi and Gary Grewal Efficient Mapping and Allocation of Execution Units to Task Graphs using an Evolutionary Framework . . . . . . . . . 46--51 Amir Momeni and Hamed Tabkhi and Yash Ukidave and Gunar Schirner and David Kaeli Exploring the Efficiency of the OpenCL Pipe Semantic on an FPGA . . . . . . . . 52--57 Takuji Mitsuishi and Jun Suzuki and Yuki Hayashi and Masaki Kan and Hideharu Amano Breadth First Search on Cost-efficient Multi-GPU Systems . . . . . . . . . . . 58--63 Michael Mefenza and Nicolas Edwards and Christophe Bobda Interface Based Memory Synthesis of Image Processing Applications in FPGA 64--69 Da Tong and Viktor Prasanna High Throughput Sketch Based Online Heavy Hitter Detection on FPGA . . . . . 
70--75 Xinying Wang and Phillip H. Jones and Joseph Zambreno A Configurable Architecture for Sparse $ L U $ Decomposition on Matrices with Arbitrary Patterns . . . . . . . . . . . 76--81 Kentaro Sano and Fumiya Kono and Naohito Nakasato and Alexander Vazhenin and Stanislav Sedukhin Stream Computation of Shallow Water Equation Solver for FPGA-based $1$D Tsunami Simulation . . . . . . . . . . . 82--87 Liucheng Guo and Andreea Ingrid Funie and David B. Thomas and Haohuan Fu and Wayne Luk Parallel Genetic Algorithms on Multiple FPGAs . . . . . . . . . . . . . . . . . 86--93 Mark Thorson Internet Nuggets . . . . . . . . . . . . 94--100
Mark Thorson Internet Nuggets . . . . . . . . . . . . 7--11
Hadi Asgharimoghaddam and Nam Sung Kim SpinWise: a Practical Energy-Efficient Synchronization Technique for CMPs . . . 1--8 Lena E. Olson and Mark D. Hill Probabilistic Directed Writebacks for Exclusive Caches . . . . . . . . . . . . 9--18 Mark Thorson Internet Nuggets . . . . . . . . . . . . 19--22
Yuanyuan Zhou Programming Uncertain $<$T$>$hings . . . 1--2 Sergi Abadal and Albert Cabellos-Aparicio and Eduard Alarcon and Josep Torrellas WiSync: an Architecture for Fast Synchronization through On-Chip Wireless Communication . . . . . . . . . . . . . 3--17 Xiaodong Wang and José F. Martínez ReBudget: Trading Off Efficiency vs. Fairness in Market-Based Multicore Resource Allocation via Runtime Budget Reassignment . . . . . . . . . . . . . . 19--32 Haishan Zhu and Mattan Erez Dirigent: Enforcing QoS for Latency-Critical Tasks on Shared Multicore Systems . . . . . . . . . . . 33--47 Yossi Kuperman and Eyal Moscovici and Joel Nider and Razya Ladelsky and Abel Gordon and Dan Tsafrir Paravirtual Remote I/O . . . . . . . . . 49--65 Antoine Kaufmann and Simon Peter and Naveen Kr. Sharma and Thomas Anderson and Arvind Krishnamurthy High Performance Packet Processing with FlexNIC . . . . . . . . . . . . . . . . 67--81 James Bornholt and Antoine Kaufmann and Jialin Li and Arvind Krishnamurthy and Emina Torlak and Xi Wang Specifying and Checking File System Crash-Consistency Models . . . . . . . . 83--98 Aravinda Prasad and K. Gopinath Prudent Memory Reclamation in Procrastination-Based Synchronization 99--112 Anurag Mukkara and Nathan Beckmann and Daniel Sanchez Whirlpool: Improving Dynamic Cache Management with Static Data Classification . . . . . . . . . . . . . 113--127 Myeongjae Jeon and Yuxiong He and Hwanju Kim and Sameh Elnikety and Scott Rixner and Alan L. Cox TPC: Target-Driven Parallelism Combining Prediction and Correction to Reduce Tail Latency in Interactive Services . . . . 129--141 Fraser Brown and Andres Nötzli and Dawson Engler How to Build Static Checking Systems Using Orders of Magnitude Less Code . . 143--157 Tong Zhang and Dongyoon Lee and Changhee Jung TxRace: Efficient Data Race Detection Using Commodity Hardware Transactional Memory . . . . . . . . . . . . . . . . . 
159--173 Sidney Amani and Alex Hixon and Zilin Chen and Christine Rizkallah and Peter Chubb and Liam O'Connor and Joel Beeren and Yutaka Nagashima and Japheth Lim and Thomas Sewell and Joseph Tuong and Gabriele Keller and Toby Murray and Gerwin Klein and Gernot Heiser Cogent: Verifying High-Assurance File System Implementations . . . . . . . . . 175--188 Nils Asmussen and Marcus Völp and Benedikt Nöthen and Hermann Härtig and Gerhard Fettweis M3: a Hardware/Operating-System Co-Design to Tame Heterogeneous Manycores . . . . . . . . . . . . . . . 189--203 Daniyal Liaqat and Silviu Jingoi and Eyal de Lara and Ashvin Goel and Wilson To and Kevin Lee and Italo De Moraes Garcia and Manuel Saldana Sidewinder: an Energy Efficient and Developer Friendly Heterogeneous Architecture for Continuous Mobile Sensing . . . . . . . . . . . . . . . . 205--215 Jonathan Balkind and Michael McKeown and Yaosheng Fu and Tri Nguyen and Yanqi Zhou and Alexey Lavrov and Mohammad Shahrad and Adi Fuchs and Samuel Payne and Xiaohua Liang and Matthew Matl and David Wentzlaff OpenPiton: an Open Source Manycore Research Framework . . . . . . . . . . . 217--232 Daniel Lustig and Geet Sethi and Margaret Martonosi and Abhishek Bhattacharjee COATCheck: Verifying Memory Ordering at the Hardware-OS Interface . . . . . . . 233--247 Alex Markuze and Adam Morrison and Dan Tsafrir True IOMMU Protection from DMA Attacks: When Copy is Faster than Zero Copy . . . 249--262 Amro Awad and Pratyusa Manadhata and Stuart Haber and Yan Solihin and William Horne Silent Shredder: Zero-Cost Shredding for Secure Non-Volatile Main Memory Controllers . . . . . . . . . . . . . . 263--276 Youngjin Kwon and Alan M. Dunn and Michael Z. Lee and Owen S. Hofmann and Yuanzhong Xu and Emmett Witchel Sego: Pervasive Trusted Metadata for Efficiently Verified Untrusted System Services . . . . . . . . . . . . . . . . 277--290 Dan Tsafrir Synopsis of the ASPLOS '16 Wild and Crazy Ideas (WACI) Invited-Speakers Session . . . . . . . . . . 
. . . . . . 291--294 R. Stanley Williams Brain Inspired Computing . . . . . . . . 295--295 Phitchaya Mangpo Phothilimthana and Aditya Thakur and Rastislav Bodik and Dinakar Dhurjati Scaling up Superoptimization . . . . . . 297--310 Niranjan Hasabnis and R. Sekar Lifting Assembly to Intermediate Representation: a Novel Approach Leveraging Compilers . . . . . . . . . . 311--324 Saurav Muralidharan and Amit Roy and Mary Hall and Michael Garland and Piyush Rai Architecture-Adaptive Code Variant Tuning . . . . . . . . . . . . . . . . . 325--338 Xiaofeng Lin and Yu Chen and Xiaodong Li and Junjie Mao and Jiaquan He and Wei Xu and Yuanchun Shi Scalable Kernel TCP Design and Implementation for Short-Lived Connections . . . . . . . . . . . . . . 339--352 Izzat El Hajj and Alexander Merritt and Gerd Zellweger and Dejan Milojicic and Reto Achermann and Paolo Faraboschi and Wen-mei Hwu and Timothy Roscoe and Karsten Schwan SpaceJMP: Programming with Multiple Virtual Address Spaces . . . . . . . . . 353--368 Felix Xiaozhu Lin and Xu Liu memif: Towards Programming Heterogeneous Memory Asynchronously . . 369--383 Wook-Hee Kim and Jinwoong Kim and Woongki Baek and Beomseok Nam and Youjip Won NVWAL: Exploiting NVRAM in Write-Ahead Logging . . . . . . . . . . . . . . . . 385--398 Aasheesh Kolli and Steven Pelley and Ali Saidi and Peter M. Chen and Thomas F. Wenisch High-Performance Transactions for Persistent Memories . . . . . . . . . . 399--411 Qing Guo and Karin Strauss and Luis Ceze and Henrique S. Malvar High-Density Image Storage Using Approximate Memory Cells . . . . . . . . 413--426 Joseph Izraelevitz and Terence Kelly and Aasheesh Kolli Failure-Atomic Persistent Memory Updates via JUSTDO Logging . . . . . . . . . . . 427--442 Jaeung Han and Seungheun Jeon and Young-ri Choi and Jaehyuk Huh Interference Management for Distributed Parallel Applications in Consolidated Clusters . . . . . . . . . . . . . . . . 
443--456 Martin Maas and Krste Asanović and Tim Harris and John Kubiatowicz Taurus: a Holistic Language Runtime System for Coordinating Distributed Managed-Language Applications . . . . . 457--471 Christina Delimitrou and Christos Kozyrakis HCloud: Resource-Efficient Provisioning in Shared Cloud Systems . . . . . . . . 473--488 Xiao Yu and Pallavi Joshi and Jianwu Xu and Guoliang Jin and Hui Zhang and Guofei Jiang CloudSeer: Workflow Monitoring of Cloud Infrastructures via Interleaved Logs . . 489--502 Yonghwi Kwon and Dohyeong Kim and William Nick Sumner and Kyungtae Kim and Brendan Saltaformaggio and Xiangyu Zhang and Dongyan Xu LDX: Causality Inference by Lightweight Dual Execution . . . . . . . . . . . . . 503--515 Tanakorn Leesatapornwongsa and Jeffrey F. Lukman and Shan Lu and Haryadi S. Gunawi TaxDC: a Taxonomy of Non-Deterministic Concurrency Bugs in Datacenter Distributed Systems . . . . . . . . . . 517--530 Junjie Mao and Yu Chen and Qixue Xiao and Yuanchun Shi RID: Finding Reference Count Bugs with Inconsistent Path Pair Checking . . . . 531--544 Huazhe Zhang and Henry Hoffmann Maximizing Performance Under a Power Cap: a Comparison of Hardware, Software, and Hybrid Techniques . . . . . . . . . 545--559 Songchun Fan and Seyed Majid Zahedi and Benjamin C. Lee The Computational Sprinting Game . . . . 561--575 Alexei Colin and Graham Harvey and Brandon Lucia and Alanson P. Sample An Energy-interference-free Hardware-Software Debugger for Intermittent Energy-harvesting Systems 577--589 Emmett Witchel Programmer Productivity in a World of Mushy Interfaces: Challenges of the Post-ISA Reality . . . . . . . . . . . . 591--591 Kevin Angstadt and Westley Weimer and Kevin Skadron RAPID Programming of Pattern-Recognition Processors . . . . . . . . . . . . . . . 593--605 Xin Sui and Andrew Lenharth and Donald S. Fussell and Keshav Pingali Proactive Control of Approximate Programs . . . . . . . . . . . . . . . . 
607--621 Jongse Park and Emmanuel Amaro and Divya Mahajan and Bradley Thwaites and Hadi Esmaeilzadeh AxGames: Towards Crowdsourcing Quality Target Determination in Approximate Computing . . . . . . . . . . . . . . . 623--636 James Bornholt and Randolph Lopez and Douglas M. Carmean and Luis Ceze and Georg Seelig and Karin Strauss A DNA-Based Archival Storage System . . 637--649 Raghu Prabhakar and David Koeplinger and Kevin J. Brown and HyoukJoong Lee and Christopher De Sa and Christos Kozyrakis and Kunle Olukotun Generating Configurable Hardware from Parallel Patterns . . . . . . . . . . . 651--665 Li-Wen Chang and Hee-Seok Kim and Wen-mei W. Hwu DySel: Lightweight Dynamic Selection for Kernel-based Data-parallel Programming Model . . . . . . . . . . . . . . . . . 667--680 Quan Chen and Hailong Yang and Jason Mars and Lingjia Tang Baymax: QoS Awareness and Increased Utilization for Non-Preemptive Accelerators in Warehouse Scale Computers . . . . . . . . . . . . . . . 681--696 Tony Nowatzki and Karthikeyan Sankaralingam Analyzing Behavior Specialized Acceleration . . . . . . . . . . . . . . 697--711 Man-Ki Yoon and Negin Salajegheh and Yin Chen and Mihai Christodorescu PIFT: Predictive Information-Flow Tracking . . . . . . . . . . . . . . . . 713--725 Ashish Venkat and Sriskanda Shamasunder and Hovav Shacham and Dean M. Tullsen HIPStR: Heterogeneous-ISA Program State Relocation . . . . . . . . . . . . . . . 727--741 Zelalem Birhanu Aweke and Salessawi Ferede Yitbarek and Rui Qiao and Reetuparna Das and Matthew Hicks and Yossi Oren and Todd Austin ANVIL: Software-Based Protection Against Next-Generation Rowhammer Attacks . . . 743--755 Diego Didona and Nuno Diegues and Anne-Marie Kermarrec and Rachid Guerraoui and Ricardo Neves and Paolo Romano ProteusTM: Abstraction Meets Performance in Transactional Memory . . . . . . . . 
757--771 Noam Shalev and Eran Harpaz and Hagar Porat and Idit Keidar and Yaron Weinsberg CSR: Core Surprise Removal in Commodity Operating Systems . . . . . . . . . . . 773--787 Tanmay Gangwani and Adam Morrison and Josep Torrellas CASPAR: Breaking Serialization in Lock-Free Multicore Synchronization . . 789--804
Jorge Albericio and Patrick Judd and Tayler Hetherington and Tor Aamodt and Natalie Enright Jerger and Andreas Moshovos Cnvlutin: ineffectual-neuron-free deep neural network computing . . . . . . . . 1--13 Ali Shafiee and Anirban Nag and Naveen Muralimanohar and Rajeev Balasubramonian and John Paul Strachan and Miao Hu and R. Stanley Williams and Vivek Srikumar ISAAC: a convolutional neural network accelerator with in-situ analog arithmetic in crossbars . . . . . . . . 14--26 Ping Chi and Shuangchen Li and Cong Xu and Tao Zhang and Jishen Zhao and Yongpan Liu and Yu Wang and Yuan Xie PRIME: a novel processing-in-memory architecture for neural network computation in ReRAM-based main memory 27--39 Christopher Torng and Moyang Wang and Christopher Batten Asymmetry-aware work-stealing runtimes 40--52 Hung-Wei Tseng and Qianchen Zhao and Yuxiao Zhou and Mark Gahagan and Steven Swanson Morpheus: creating application objects efficiently for heterogeneous computing 53--65 Divya Mahajan and Amir Yazdanbakhsh and Jongse Park and Bradley Thwaites and Hadi Esmaeilzadeh Towards statistical guarantees in controlling quality tradeoffs for approximate acceleration . . . . . . . . 66--77 Akanksha Jain and Calvin Lin Back to the future: leveraging Belady's algorithm for improved cache replacement 78--89 Chang Hyun Park and Taekyung Heo and Jaehyuk Huh Efficient synonym filtering and scalable delayed translation for hybrid virtual caching . . . . . . . . . . . . . . . . 90--102 Hsiang-Yun Cheng and Jishen Zhao and Jack Sampson and Mary Jane Irwin and Aamer Jaleel and Yu Lu and Yuan Xie LAP: loop-block aware inclusion properties for energy-efficient asymmetric last level caches . . . . . . 
103--114 David Koeplinger and Christina Delimitrou and Raghu Prabhakar and Christos Kozyrakis and Yaqi Zhang and Kunle Olukotun Automatic generation of efficient accelerators for reconfigurable hardware 115--127 Donggyu Kim and Adam Izraelevitz and Christopher Celio and Hokeun Kim and Brian Zimmer and Yunsup Lee and Jonathan Bachrach and Krste Asanović Strober: fast and accurate sample-based energy simulation for arbitrary RTL . . 128--139 Michael A. Laurenzano and Yunqi Zhang and Jiang Chen and Lingjia Tang and Jason Mars PowerChop: identifying and managing non-critical units in hybrid processor architectures . . . . . . . . . . . . . 140--152 Boncheol Gu and Andre S. Yoon and Duck-Ho Bae and Insoon Jo and Jinyoung Lee and Jonghyun Yoon and Jeong-Uk Kang and Moonsang Kwon and Chanho Yoon and Sangyeun Cho and Jaeheon Jeong and Duckhyun Chang Biscuit: a framework for near-data processing of big data workloads . . . . 153--165 Muhammet Mustafa Ozdal and Serif Yesil and Taemin Kim and Andrey Ayupov and John Greth and Steven Burns and Ozcan Ozturk Energy efficient architecture for graph analytics accelerators . . . . . . . . . 166--177 Ikuo Magaki and Moein Khazraee and Luis Vega Gutierrez and Michael Bedford Taylor ASIC clouds: specializing the datacenter 178--190 Yunho Oh and Keunsoo Kim and Myung Kuk Yoon and Jong Hyun Park and Yongjun Park and Won Woo Ro and Murali Annavaram APRES: improving cache efficiency by exploiting load characteristics on GPUs 191--203 Kevin Hsieh and Eiman Ebrahimi and Gwangsun Kim and Niladrish Chatterjee and Mike O'Connor and Nandita Vijaykumar and Onur Mutlu and Stephen W. Keckler Transparent offloading and mapping (TOM): enabling programmer-transparent near-data processing in GPU systems . . 204--216 Chang Hyun Park and Taekyung Heo and Jaehyuk Huh Efficient synonym filtering and scalable delayed translation for hybrid virtual caching . . . . . . . . . . . . . . . . 
217--229 Qiumin Xu and Hyeran Jeon and Keunsoo Kim and Won Woo Ro and Murali Annavaram Warped-slicer: efficient intra-SM slicing through dynamic resource partitioning for GPU multiprogramming 230--242 Song Han and Xingyu Liu and Huizi Mao and Jing Pu and Ardavan Pedram and Mark A. Horowitz and William J. Dally EIE: efficient inference engine on compressed deep neural network . . . . . 243--254 Robert LiKamWa and Yunhui Hou and Julian Gao and Mia Polansky and Lin Zhong RedEye: analog ConvNet image sensor architecture for continuous mobile vision . . . . . . . . . . . . . . . . . 255--266 Brandon Reagen and Paul Whatmough and Robert Adolf and Saketh Rama and Hyunkwang Lee and Sae Kyu Lee and José Miguel Hernández-Lobato and Gu-Yeon Wei and David Brooks Minerva: enabling low-power, highly-accurate deep neural network accelerators . . . . . . . . . . . . . . 267--278 Yuan Yao and Zhonghai Lu Opportunistic competition overhead reduction for expediting critical section in NoC based CMPs . . . . . . . 279--290 Channoh Kim and Sungmin Kim and Hyeon Gyu Cho and Dooyoung Kim and Jaehyeok Kim and Young H. Oh and Hakbeom Jang and Jae W. Lee Short-circuit dispatch: accelerating virtual machine interpreters on embedded processors . . . . . . . . . . . . . . . 291--303 Christoffer Dall and Shih-Wei Li and Jin Tack Lim and Jason Nieh and Georgios Koloventzos ARM virtualization: performance and architectural implications . . . . . . . 304--316 Jayesh Gaur and Alaa R. Alameldeen and Sreenivas Subramoney Base-victim compression: an opportunistic cache compression architecture . . . . . . . . . . . . . . 317--328 Jungrae Kim and Michael Sullivan and Esha Choukse and Mattan Erez Bit-plane compression: transforming data for better compression in many-core architectures . . . . . . . . . . . . . 329--340 Prashant J. Nair and Vilas Sridharan and Moinuddin K. Qureshi XED: exposing on-die error detection information for strong memory reliability . . . . . . . . . . . . . . 
341--353 Mohammad Mejbah ul Alam and Abdullah Muzahid Production-run software failure diagnosis via adaptive communication tracking . . . . . . . . . . . 354--366 Yu-Hsin Chen and Joel Emer and Vivienne Sze Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks . . . . . 367--379 Duckhwan Kim and Jaeha Kung and Sek Chai and Sudhakar Yalamanchili and Saibal Mukhopadhyay Neurocube: a programmable digital neuromorphic architecture with high-density $3$D memory . . . . . . . . 380--392 Shaoli Liu and Zidong Du and Jinhua Tao and Dong Han and Tao Luo and Yuan Xie and Yunji Chen and Tianshi Chen Cambricon: an instruction set architecture for neural networks . . . . 393--405 Ziqiang Huang and Andrew D. Hilton and Benjamin C. Lee Decoupling loads for nano-instruction set computers . . . . . . . . . . . . . 406--417 Timothy Hayes and Oscar Palomar and Osman Unsal and Adrian Cristal and Mateo Valero Future vector microprocessor extensions for data aggregations . . . . . . . . . 418--430 Faissal M. Sleiman and Thomas F. Wenisch Efficiently scaling out-of-order cores for simultaneous multithreading . . . . 431--443 Milad Hashemi and Khubaib and Eiman Ebrahimi and Onur Mutlu and Yale N. Patt Accelerating dependent cache misses with an enhanced memory controller . . . . . 444--455 Yunqi Zhang and David Meisner and Jason Mars and Lingjia Tang Treadmill: attributing the source of tail latency through precise load testing and statistical inference . . . 456--468 Qiang Wu and Qingyuan Deng and Lakshmi Ganesh and Chang-Hong Hsu and Yun Jin and Sanjeev Kumar and Bin Li and Justin Meza and Yee Jiun Song Dynamo: Facebook's data center-wide power management system . . . . . . . . 469--480 Daniel Wong Peak efficiency aware scheduling for highly energy proportional servers . . . 
481--492 Chao Li and Zhenhua Wang and Xiaofeng Hou and Haopeng Chen and Xiaoyao Liang and Minyi Guo Power attack defense: securing battery-backed data centers . . . . . . 493--505 Mingyu Gao and Christina Delimitrou and Dimin Niu and Krishna T. Malladi and Hongzhong Zheng and Bob Brennan and Christos Kozyrakis DRAF: a low-power DRAM-based reconfigurable acceleration fabric . . . 506--518 Lunkai Zhang and Brian Neely and Diana Franklin and Dmitri Strukov and Yuan Xie and Frederic T. Chong Mellow Writes: extending lifetime in resistive memories through selective slow write backs . . . . . . . . . . . . 519--531 Yanqi Zhou and David Wentzlaff MITTS: memory inter-arrival time traffic shaping . . . . . . . . . . . . . . . . 532--544 Joshua San Miguel and Natalie Enright Jerger The anytime automaton . . . . . . . . . 545--557 Siyang Wang and Xiangyu Zhang and Yuxuan Li and Ramin Bashizade and Song Yang and Chris Dwyer and Alvin R. Lebeck Accelerating Markov random field inference using molecular optical Gibbs sampling units . . . . . . . . . . . . . 558--569 Yipeng Huang and Ning Guo and Mingoo Seok and Yannis Tsividis and Simha Sethumadhavan Evaluation of an analog accelerator for linear algebra . . . . . . . . . . . . . 570--582 Jin Wang and Norm Rubin and Albert Sidelnik and Sudhakar Yalamanchili LaPerm: locality aware scheduler for dynamic parallelism on GPUs . . . . . . 583--595 Sagi Shahar and Shai Bergman and Mark Silberstein ActivePointers: a case for software address translation on GPUs . . . . . . 
596--608 Myung Kuk Yoon and Keunsoo Kim and Sangpil Lee and Won Woo Ro and Murali Annavaram Virtual thread: maximizing thread-level parallelism beyond GPU scheduling limit 609--621 Jungrae Kim and Michael Sullivan and Sangkug Lym and Mattan Erez All-inclusive ECC: thorough end-to-end protection for reliable computer memory 622--633 Henry Duwe and Xun Jian and Daniel Petrisko and Rakesh Kumar Rescuing uncorrectable fault patterns in on-chip memories through error pattern transformation . . . . . . . . . . . . . 634--644 Dong Wan Kim and Mattan Erez RelaxFault memory repair . . . . . . . . 645--657 Raghavendra Pradyumna Pothukuchi and Amin Ansari and Petros Voulgaris and Josep Torrellas Using multiple input, multiple output formal control to maximize resource efficiency in architectures . . . . . . 658--670 Hari Cherupalli and Rakesh Kumar and John Sartori Exploiting dynamic timing slack for energy efficiency in ultra-low-power embedded systems . . . . . . . . . . . . 671--681 Yanqi Zhou and Henry Hoffmann and David Wentzlaff CASH: supporting IaaS customers with a sub-core configurable architecture . . . 682--694 Mohammad Arjomand and Mahmut T. Kandemir and Anand Sivasubramaniam and Chita R. Das Boosting access parallelism to PCM-based main memory . . . . . . . . . . . . . . 695--706 Jayneel Gandhi and Mark D. Hill and Michael M. Swift Agile paging: exceeding the best of nested and shadow paging . . . . . . . . 707--718 Hoseok Seol and Wongyu Shin and Jaemin Jang and Jungwhan Choi and Jinwoong Suh and Lee-Sup Kim Energy efficient data encoding in DRAM channels exploiting data value similarity . . . . . . . . . . . . . . . 719--730
Jiayi Sheng and Qingqing Xiong and Chen Yang and Martin C. Herbordt Collective Communication on FPGA Clusters with Static Scheduling . . . . 2--7 Susumu Mashimo and Thiem Van Chu and Kenji Kise Cost-Effective and High-Throughput Merge Network: Architecture for the Fastest FPGA Sorting Accelerator . . . . . . . . 8--13 Cuong Pham-Quoc and Biet Nguyen and Tran Ngoc Thinh FPGA-based Multicore Architecture for Integrating Multiple DDoS Defense Mechanisms . . . . . . . . . . . . . . . 14--19 Fatemeh Eslami and Steven J. E. Wilton An Improved Overlay and Mapping Algorithm Supporting Rapid Triggering for FPGA Debug . . . . . . . . . . . . . 20--25 Ryohei Kobayashi and Tomohiro Misono and Kenji Kise A High-speed Verilog HDL Simulation Method using a Lightweight Translator 26--31 Shohei Sassa and Kenji Kanazawa and Shaowei Cai and Moritoshi Yasunaga An FPGA Solver for Partial MaxSAT Problems Based on Stochastic Local Search . . . . . . . . . . . . . . . . . 32--37 Ernst Joachim Houtgast and Vlad-Mihai Sima and Koen Bertels and Zaid Al-Ars An Efficient GPU-Accelerated Implementation of Genomic Short Read Mapping with BWA-MEM . . . . . . . . . . 38--43 Hiroki Nakahara and Hiroyuki Nakanishi and Kazumasa Iwai and Tsutomu Sasao An FFT Circuit for a Spectrometer of a Radio Telescope using the Nested RNS including the Constant Division . . . . 44--49 Vinod Pangracious and Mulhim Al-Doori Novel Three-Dimensional Embedded FPGA Technology and Architecture . . . . . . . 50--55 Oliver Knodel and Paul R. Genssler and Rainer G. Spallek Migration of long-running Tasks between Reconfigurable Resources using Virtualization . . . . . . . . . . . . . 56--61 Jubee Tada and Maiki Hosokawa and Ryusuke Egawa and Hiroaki Kobayashi Effects of Stacking Granularity on $3$-D Stacked Floating-point Fused Multiply Add Units . . . . . . . . . . . . . . . 62--67 Jiang Su and Jianxiong Liu and David B. Thomas and Peter Y. K. 
Cheung Neural Network Based Reinforcement Learning Acceleration on FPGA Platforms 68--73 Erik H. D'Hollander High-Level Synthesis Optimization for Blocked Floating-Point Matrix Multiplication . . . . . . . . . . . . . 74--79 Chengzhe Li and Lai Yoong Yee and Hiroshi Maruyama and Yoshiki Yamaguchi FPGA-based Volleyball Player Tracker . . 80--86 Qian Zhao and Motoki Amagasaki and Masahiro Iida and Morihiro Kuga and Toshinori Sueyoshi A Study of Heterogeneous Computing Design Method based on Virtualization Technology . . . . . . . . . . . . . . . 86--91 Colin Yu Lin and Zhenghong Jiang and Cheng Fu and Hayden Kwok-Hay So and Haigang Yang FPGA High-level Synthesis versus Overlay: Comparisons on Computation Kernels . . . . . . . . . . . . . . . . 92--97
Xusheng Zhan and Yungang Bao and Christian Bienia and Kai Li PARSEC3.0: a Multicore Benchmark Suite with Network Stacks and SPLASH-2X . . . 1--16
Yunji Chen Big Data Analytics and Intelligence at Alibaba Cloud . . . . . . . . . . . . . 1--1 Hari Cherupalli and Henry Duwe and Weidong Ye and Rakesh Kumar and John Sartori Determining Application-specific Peak Power and Energy Requirements for Ultra-low Power Processors . . . . . . . 3--16 Quan Chen and Hailong Yang and Minyi Guo and Ram Srivatsa Kannan and Jason Mars and Lingjia Tang Prophet: Precise QoS Prediction on Non-Preemptive Accelerators to Improve Utilization in Warehouse-Scale Computers 17--32 Svilen Kanev and Sam Likun Xi and Gu-Yeon Wei and David Brooks Mallacc: Accelerating Memory Allocation 33--45 Shasha Wen and Milind Chabbi and Xu Liu REDSPY: Exploring Value Locality in Software . . . . . . . . . . . . . . . . 47--61 Abhishek Bhattacharjee Translation-Triggered Prefetching . . . 63--76 Channoh Kim and Jaehyeok Kim and Sungmin Kim and Dooyoung Kim and Namho Kim and Gitae Na and Young H. Oh and Hyeon Gyu Cho and Jae W. Lee Typed Architectures: Architectural Support for Lightweight Scripting . . . 77--90 Jihye Seo and Wook-Hee Kim and Woongki Baek and Beomseok Nam and Sam H. Noh Failure-Atomic Slotted Paging for Persistent Memory . . . . . . . . . . . 91--104 Donald Nguyen and Keshav Pingali What Scalable Programs Need from Transactional Memory . . . . . . . . . . 105--118 Caroline Trippel and Yatin A. Manerkar and Daniel Lustig and Michael Pellauer and Margaret Martonosi TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA . . . . . . . . . . . . . . . . 119--133 Sanketh Nalli and Swapnil Haria and Mark D. Hill and Michael M. Swift and Haris Volos and Kimberly Keeton An Analysis of Persistent Memory Use with WHISPER . . . . . . . . . . . . . . 135--148 Tong Zhang and Changhee Jung and Dongyoon Lee ProRace: Practical Data Race Detection for Production Use . . . . . . . . . . . 149--162 Lena E. Olson and Mark D. Hill and David A. 
Wood Crossing Guard: Mediating Host-Accelerator Coherence Interactions 163--176 Joseph McMahan and Michael Christensen and Lawton Nichols and Jared Roesch and Sung-Yee Guo and Ben Hardekopf and Timothy Sherwood An Architecture Supporting Formal and Compositional Binary Analysis . . . . . 177--191 Chun-Hung Hsiao and Satish Narayanasamy and Essam Muhammad Idris Khan and Cristiano L. Pereira and Gilles A. Pokam AsyncClock: Scalable Inference of Asynchronous Event Causality . . . . . . 193--205 Irina Calciu and Siddhartha Sen and Mahesh Balakrishnan and Marcos K. Aguilera Black-box Concurrent Data Structures for NUMA Architectures . . . . . . . . . . . 207--221 Keval Vora and Chen Tian and Rajiv Gupta and Ziang Hu CoRAL: Confined Recovery in Distributed Asynchronous Graph Processing . . . . . 223--236 Keval Vora and Rajiv Gupta and Guoqing Xu KickStarter: Fast and Accurate Computations on Streaming Graphs via Trimmed Approximations . . . . . . . . . 237--251 Bobby Powers and John Vilk and Emery D. Berger Browsix: Bridging the Gap Between Unix and the Browser . . . . . . . . . . . . 253--266 Samyam Rajbhandari and Yuxiong He and Olatunji Ruwase and Michael Carbin and Trishul Chilimbi Optimizing CNNs on Multicores for Scalability, Performance and Goodput . . 267--280 Kirshanthan Sundararajah and Laith Sakka and Milind Kulkarni Locality Transformations for Nested Recursive Iteration Spaces . . . . . . . 281--295 Ang Li and Shuaiwen Leon Song and Weifeng Liu and Xu Liu and Akash Kumar and Henk Corporaal Locality-Aware CTA Clustering for Modern GPUs . . . . . . . . . . . . . . . . . . 297--311 Berkeley Churchill and Rahul Sharma and JF Bastien and Alex Aiken Sound Loop Superoptimization for Google Native Client . . . . . . . . . . . . . 313--326 Ricardo Bianchini Improving Datacenter Efficiency . . . . 
327--327 Mengxing Liu and Mingxing Zhang and Kang Chen and Xuehai Qian and Yongwei Wu and Weimin Zheng and Jinglei Ren DudeTM: Building Durable Transactions with Decoupling for Persistent Memory 329--343 Ana Klimovic and Heiner Litz and Christos Kozyrakis ReFlex: Remote Flash $ \approx $ Local Flash . . . . . . . . . . . . . . . . . 345--359 Djordje Jevdjic and Karin Strauss and Luis Ceze and Henrique S. Malvar Approximate Storage of Compressed and Encrypted Videos . . . . . . . . . . . . 361--373 Nima Elyasi and Mohammad Arjomand and Anand Sivasubramaniam and Mahmut T. Kandemir and Chita R. Das and Myoungsoo Jung Exploiting Intra-Request Slack to Improve SSD Performance . . . . . . . . 375--388 Kai Wang and Aftab Hussain and Zhiqiang Zuo and Guoqing Xu and Ardalan Amiri Sani Graspan: a Single-machine Disk-based Graph System for Interprocedural Static Analyses of Large-scale Systems Code . . 389--404 Ao Ren and Zhe Li and Caiwen Ding and Qinru Qiu and Yanzhi Wang and Ji Li and Xuehai Qian and Bo Yuan SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing . . . . . . . . . . 405--418 Jerry Ajay and Chen Song and Aditya Singh Rathore and Chi Zhou and Wenyao Xu $3$DGates: an Instruction-Level Energy Analysis and Optimization of $3$D Printers . . . . . . . . . . . . . . . . 419--433 Guilherme Cox and Abhishek Bhattacharjee Efficient Address Translation for Architectures with Multiple Page Sizes 435--448 Ilya Lesokhin and Haggai Eran and Shachar Raindel and Guy Shapiro and Sagi Grimberg and Liran Liss and Muli Ben-Yehuda and Nadav Amit and Dan Tsafrir Page Fault Support for Network Controllers . . . . . . . . . . . . . . 449--466 Yang Hu and Mingcong Song and Tao Li Towards ``Full Containerization'' in Containerized Network Function Virtualization . . . . . . . . . . . . . 467--481 Bo Wu and Xu Liu and Xiaobo Zhou and Changjun Jiang FLEP: Enabling Flexible and Efficient Preemption on GPUs . . . . . . . . . . . 
483--496 Kaiwei Li and Jianfei Chen and Wenguang Chen and Jun Zhu SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs . . . . . . . . . . 497--509 Moein Khazraee and Lu Zhang and Luis Vega and Michael Bedford Taylor Moonwalk: NRE Optimization in ASIC Clouds . . . . . . . . . . . . . . . . . 511--526 Jason Jong Kyu Park and Yongjun Park and Scott Mahlke Dynamic Resource Management for Efficient Utilization of Multitasking GPUs . . . . . . . . . . . . . . . . . . 527--540 Rui Zhang and Natalie Stanley and Christopher Griggs and Andrew Chi and Cynthia Sturton Identifying Security Critical Properties for the Dynamic Verification of a Processor . . . . . . . . . . . . . . . 541--554 Andrew Ferraiuolo and Rui Xu and Danfeng Zhang and Andrew C. Myers and G. Edward Suh Verification of a Practical Hardware Security Architecture Through Static Information Flow Analysis . . . . . . . 555--568 David Chisnall and Brooks Davis and Khilan Gudka and David Brazdil and Alexandre Joannou and Jonathan Woodruff and A. Theodore Markettos and J. Edward Maste and Robert Norton and Stacey Son and Michael Roe and Simon W. Moore and Peter G. Neumann and Ben Laurie and Robert N. M. Watson CHERI JNI: Sinking the Java Security Model into the C . . . . . . . . . . . . 569--583 Xinyang Ge and Weidong Cui and Trent Jaeger GRIFFIN: Guarding Control Flows Using Intel Processor Trace . . . . . . . . . 585--598 Christina Delimitrou and Christos Kozyrakis Bolt: I Know What You Did Last Summer \ldots In The Cloud . . . . . . . . . . 599--613 Yiping Kang and Johann Hauswald and Cao Gao and Austin Rovinski and Trevor Mudge and Jason Mars and Lingjia Tang Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge . . . 615--629 Neha Agarwal and Thomas F. 
Wenisch Thermostat: Application-transparent Page Management for Two-tiered Main Memory 631--644 Antonio Barbalace and Robert Lyerly and Christopher Jelesnianski and Anthony Carno and Ho-Ren Chuang and Vincent Legout and Binoy Ravindran Breaking the Boundaries in Heterogeneous-ISA Datacenters . . . . . 645--659 Daniel Lustig and Andrew Wright and Alexandros Papakonstantinou and Olivier Giroux Automated Synthesis of Comprehensive Memory Model Litmus Test Suites . . . . 661--675 Haopeng Liu and Guangpu Li and Jeffrey F. Lukman and Jiaxin Li and Shan Lu and Haryadi S. Gunawi and Chen Tian DCatch: Automatically Detecting Distributed Concurrency Bugs in Cloud Systems . . . . . . . . . . . . . . . . 677--691 Ali José Mashtizadeh and Tal Garfinkel and David Terei and David Mazieres and Mendel Rosenblum Towards Practical Default-On Multi-Core Record/Replay . . . . . . . . . . . . . 693--708 Jian Huang and Michael Allen-Bond and Xuechen Zhang Pallas: Semantic-Aware Checking for Finding Deep Bugs in Fast Path . . . . . 709--722 Jagadish B. Kotra and Narges Shahidi and Zeshan A. Chishti and Mahmut T. Kandemir Hardware-Software Co-design to Mitigate DRAM Refresh Overheads: a Case for Refresh-Aware Process Scheduling . . . . 723--736 Jinchun Kim and Elvira Teran and Paul V. Gratz and Daniel A. Jiménez and Seth H. Pugsley and Chris Wilkerson Kill the Program Counter: Reconstructing Program Behavior in the Processor Cache Hierarchy . . . . . . . . . . . . . . . 737--749 Mingyu Gao and Jing Pu and Xuan Yang and Mark Horowitz and Christos Kozyrakis TETRIS: Scalable and Efficient Neural Network Acceleration with $3$D Memory 751--764 Wonjun Song and Gwangsun Kim and Hyungjoon Jung and Jongwook Chung and Jung Ho Ahn and Jae W. Lee and John Kim History-Based Arbitration for Fairness in Processor-Interconnect of NUMA Servers . . . . . . . . . . . . . . . . 765--777 Pulkit A. Misra and Jeffrey S. Chase and Johannes Gehrke and Alvin R. 
Lebeck Enabling Lightweight Transactions with Precision Time . . . . . . . . . . . . . 779--794 Ming Liu and Liang Luo and Jacob Nelson and Luis Ceze and Arvind Krishnamurthy and Kishore Atreya IncBricks: Toward In-Network Computation with an In-Network Cache . . . . . . . . 795--809 Ismail Akturk and Ulya R. Karpuzcu AMNESIAC: Amnesic Automatic Computer . . 811--824 Yuxin Bai and Victor W. Lee and Engin Ipek Voltage Regulator Efficiency Aware Power Management . . . . . . . . . . . . . . . 825--838
Norman P. Jouppi and Cliff Young and Nishant Patil and David Patterson and Gaurav Agrawal and Raminder Bajwa and Sarah Bates and Suresh Bhatia and Nan Boden and Al Borchers and Rick Boyle and Pierre-luc Cantin and Clifford Chao and Chris Clark and Jeremy Coriell and Mike Daley and Matt Dau and Jeffrey Dean and Ben Gelb and Tara Vazir Ghaemmaghami and Rajendra Gottipati and William Gulland and Robert Hagmann and C. Richard Ho and Doug Hogberg and John Hu and Robert Hundt and Dan Hurt and Julian Ibarz and Aaron Jaffey and Alek Jaworski and Alexander Kaplan and Harshit Khaitan and Daniel Killebrew and Andy Koch and Naveen Kumar and Steve Lacy and James Laudon and James Law and Diemthu Le and Chris Leary and Zhuyuan Liu and Kyle Lucke and Alan Lundin and Gordon MacKean and Adriana Maggiore and Maire Mahony and Kieran Miller and Rahul Nagarajan and Ravi Narayanaswami and Ray Ni and Kathy Nix and Thomas Norrie and Mark Omernick and Narayana Penukonda and Andy Phelps and Jonathan Ross and Matt Ross and Amir Salek and Emad Samadiani and Chris Severn and Gregory Sizikov and Matthew Snelham and Jed Souter and Dan Steinberg and Andy Swing and Mercedes Tan and Gregory Thorson and Bo Tian and Horia Toma and Erick Tuttle and Vijay Vasudevan and Richard Walter and Walter Wang and Eric Wilcox and Doe Hyun Yoon In-Datacenter Performance Analysis of a Tensor Processing Unit . . . . . . . . . 1--12 Swagath Venkataramani and Ashish Ranjan and Subarno Banerjee and Dipankar Das and Sasikanth Avancha and Ashok Jagannathan and Ajaya Durg and Dheemanth Nagaraj and Bharat Kaul and Pradeep Dubey and Anand Raghunathan ScaleDeep: a Scalable Compute Architecture for Learning and Evaluating Deep Networks . . . . . . . . . . . . . 13--26 Angshuman Parashar and Minsoo Rhu and Anurag Mukkara and Antonio Puglielli and Rangharajan Venkatesan and Brucek Khailany and Joel Emer and Stephen W. Keckler and William J. Dally SCNN: an Accelerator for Compressed-sparse Convolutional Neural Networks . . . . . 
. . . . . . . . . . . 27--40 Hari Cherupalli and Henry Duwe and Weidong Ye and Rakesh Kumar and John Sartori Bespoke Processors for Applications with Ultra-low Area and Power Constraints . . 41--54 Yajing Chen and Shengshuo Lu and Cheng Fu and David Blaauw and Ronald Dreslinski, Jr. and Trevor Mudge and Hun-Seok Kim A Programmable Galois Field Processor for the Internet of Things . . . . . . . 55--68 Aosen Wang and Lizhong Chen and Wenyao Xu XPro: a Cross-End Processing Architecture for Data Analytics in Wearables . . . . . . . . . . . . . . . 69--80 Ofir Weisse and Valeria Bertacco and Todd Austin Regaining Lost Cycles with HotCalls: a Fast Interface for SGX Secure Enclaves 81--93 Shaizeen Aga and Satish Narayanasamy InvisiMem: Smart Memory Defenses for Memory Bus Side Channel . . . . . . . . 94--106 Amro Awad and Yipeng Wang and Deborah Shands and Yan Solihin ObfusMem: a Low-Overhead Access Obfuscation for Trusted Memories . . . . 107--119 S. Karen Khatamifard and Longfei Wang and Weize Yu and Selçuk Köse and Ulya R. Karpuzcu ThermoGater: Thermally-Aware On-Chip Voltage Regulation . . . . . . . . . . . 120--132 Hailong Yang and Quan Chen and Moeiz Riaz and Zhongzhi Luan and Lingjia Tang and Jason Mars PowerChief: Intelligent Power Allocation for Multi-Stage Applications to Improve Responsiveness on Power Constrained CMP 133--146 Gokul Subramanian Ravi and Mikko H. Lipasti CHARSTAR: Clock Hierarchy Aware Resource Scaling in Tiled ARchitectures . . . . . 147--160 Matthew D. Sinclair and Johnathan Alsop and Sarita V. Adve Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics on Heterogeneous Systems . . . . . . . . . 161--174 Seunghee Shin and James Tuck and Yan Solihin Hiding the Long Latency of Persist Barriers Using Speculative Execution . . 175--186 Alberto Ros and Trevor E. Carlson and Mehdi Alipour and Stefanos Kaxiras Non-Speculative Load-Load Reordering in TSO . . . . . . . . . . . . . . . . . . 
187--200 Doowon Lee and Valeria Bertacco MTraceCheck: Validating Non-Deterministic Behavior of Memory Consistency Models in Post-Silicon Validation . . . . . . . . . . . . . . . 201--213 Ruohuang Zheng and Michael C. Huang Redundant Memory Array Architecture for Efficient Selective Protection . . . . . 214--227 Matthew Hicks Clank: Architectural Support for Intermittent Computation . . . . . . . . 228--240 Manolis Kaliorakis and Dimitris Gizopoulos and Ramon Canal and Antonio Gonzalez MeRLiN: Exploiting Dynamic Instruction Behavior for Fast and Accurate Microarchitecture Level Reliability Assessment . . . . . . . . . . . . . . . 241--254 Minesh Patel and Jeremie S. Kim and Onur Mutlu The Reach Profiler (REAPER): Enabling the Mitigation of DRAM Retention Failures via Profiling at Aggressive Conditions . . . . . . . . . . . . . . . 255--268 Zhenning Wang and Jun Yang and Rami Melhem and Bruce Childers and Youtao Zhang and Minyi Guo Quality of Service Support for Fine-Grained Sharing on GPUs . . . . . . 269--281 Sui Chen and Lu Peng and Samuel Irving Accelerating GPU Hardware Transactional Memory with Snapshot Isolation . . . . . 282--294 Kai Wang and Calvin Lin Decoupled Affine Computation for SIMT GPUs . . . . . . . . . . . . . . . . . . 295--306 Gunjae Koo and Yunho Oh and Won Woo Ro and Murali Annavaram Access Pattern-Aware Cache Management for Improving Data Utilization in GPU 307--319 Akhil Arunkumar and Evgeny Bolotin and Benjamin Cho and Ugljesa Milic and Eiman Ebrahimi and Oreste Villa and Aamer Jaleel and Carole-Jean Wu and David Nellans MCM-GPU: Multi-Chip-Module GPUs for Continued Performance Scalability . . . 320--332 Alireza Nazari and Nader Sehatbakhsh and Monjur Alam and Alenka Zajic and Milos Prvulovic EDDIE: EM-Based Detection of Deviations in Program Execution . . . . . . . . . . 
333--346 Mengjia Yan and Bhargava Gopireddy and Thomas Shull and Josep Torrellas Secure Hierarchy-Aware Cache Replacement Policy (SHARP): Defending Against Cache-Based Side Channel Attacks . . . . 347--360 Zhaoxia Deng and Ariel Feldman and Stuart A. Kurtz and Frederic T. Chong Lemonade from Lemons: Harnessing Device Wearout to Create Limited-Use Security Architectures . . . . . . . . . . . . . 361--374 Muhammad Shoaib Bin Altaf and David A. Wood LogCA: a High-Level Performance Model for Hardware Accelerators . . . . . . . 375--388 Raghu Prabhakar and Yaqi Zhang and David Koeplinger and Matt Feldman and Tian Zhao and Stefan Hadjis and Ardavan Pedram and Christos Kozyrakis and Kunle Olukotun Plasticine: a Reconfigurable Architecture For Parallel Patterns . . . 389--402 Jaeha Kung and Yun Long and Duckhwan Kim and Saibal Mukhopadhyay A Programmable Hardware Accelerator for Simulating Dynamical Systems . . . . . . 403--415 Tony Nowatzki and Vinay Gangadhar and Newsha Ardalani and Karthikeyan Sankaralingam Stream-Dataflow Acceleration . . . . . . 416--429 Zi Yan and Ján Veselý and Guilherme Cox and Abhishek Bhattacharjee Hardware Translation Coherence for Virtualized Systems . . . . . . . . . . 430--443 Chang Hyun Park and Taekyung Heo and Jungi Jeong and Jaehyuk Huh Hybrid TLB Coalescing: Improving TLB Translation Coverage under Diverse Fragmented Memory Allocations . . . . . 444--456 Hanna Alam and Tianhao Zhang and Mattan Erez and Yoav Etsion Do-It-Yourself Virtual Memory Translation . . . . . . . . . . . . . . 457--468 Jee Ho Ryoo and Nagendra Gulur and Shuang Song and Lizy K. John Rethinking TLB Designs in Virtualized Environments: a Very Large Part-of-Memory TLB . . . . . . . . . . . 469--480 Aasheesh Kolli and Vaibhav Gogte and Ali Saidi and Stephan Diestelhorst and Peter M. Chen and Satish Narayanasamy and Thomas F. Wenisch Language-level persistency . . . . . . . 481--493 Jiho Choi and Thomas Shull and Maria J. 
Garzaran and Josep Torrellas ShortCut: Architectural Support for Fast Object Access in Scripting Languages . . 494--506 Dibakar Gope and David J. Schlais and Mikko H. Lipasti Architectural Support for Server-Side PHP Processing . . . . . . . . . . . . . 507--520 Sudarsun Kannan and Ada Gavrilovska and Vishal Gupta and Karsten Schwan HeteroOS: OS Design for Heterogeneous Memory Management in Datacenter . . . . 521--534 Yongming Shen and Michael Ferdman and Peter Milder Maximizing CNN Accelerator Efficiency Through Resource Partitioning . . . . . 535--547 Jiecao Yu and Andrew Lukefahr and David Palframan and Ganesh Dasika and Reetuparna Das and Scott Mahlke Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism . . . . 548--560 Christopher De Sa and Matthew Feldman and Christopher Ré and Kunle Olukotun Understanding and Optimizing Asynchronous Low-Precision Stochastic Gradient Descent . . . . . . . . . . . . 561--574 Zhaoshi Li and Leibo Liu and Yangdong Deng and Shouyi Yin and Yao Wang and Shaojun Wei Aggressive Pipelining of Irregular Applications on Reconfigurable Hardware 575--586 Suvinay Subramanian and Mark C. Jeffrey and Maleen Abeydeera and Hyun Ryong Lee and Victor A. Ying and Joel Emer and Daniel Sanchez Fractal: an Execution Model for Fine-Grain Nested Speculative Parallelism . . . . . . . . . . . . . . 587--599 Arun Subramaniyan and Reetuparna Das Parallel Automata Processor . . . . . . 600--612 Rajat Kateja and Anirudh Badam and Sriram Govindan and Bikash Sharma and Greg Ganger Viyojit: Decoupling Battery and DRAM Capacities for Battery-Backed DRAM . . . 613--626 Vinson Young and Prashant J. Nair and Moinuddin K. Qureshi DICE: Compressing DRAM Caches for Bandwidth and Capacity . . . . . . . . . 627--638 Mario Drumond and Alexandros Daglis and Nooshin Mirzadeh and Dmitrii Ustiugov and Javier Picorel and Babak Falsafi and Boris Grot and Dionisios Pnevmatikatos The Mondrian Data Engine . . . . . . . . 
639--651 Po-An Tsai and Nathan Beckmann and Daniel Sanchez Jenga: Software-Defined Cache Hierarchies . . . . . . . . . . . . . . 652--665 Rahul Boyapati and Jiayi Huang and Pritam Majumder and Ki Hwan Yum and Eun Jung Kim APPROX-NoC: a Data Approximation Framework for Network-On-Chip Architectures . . . . . . . . . . . . . 666--677 Matthew Poremba and Itir Akgun and Jieming Yin and Onur Kayiran and Yuan Xie and Gabriel H. Loh There and Back Again: Optimizing the Interconnect in Networks of Memory Cubes 678--690 Binzhang Fu and John Kim Footprint: Regulating Routing Adaptiveness in Networks-on-Chip . . . . 691--702 Masoumeh Ebrahimi and Masoud Daneshtalab EbDa: a New Theory on Design and Verification of Deadlock-free Interconnection Networks . . . . . . . . 703--715