Table of contents for issues of ACM SIGARCH Computer Architecture News

Last update: Mon Nov 10 17:05:10 MST 2025

ACM SIGARCH Computer Architecture News
Volume 1, Number 2, April, 1972

               Caxton C. Foster   A review of dynamic memories with
                                  enhanced data access by Harold S. Stone.
                                  IEEETC Vol. C-21, #4, p 359--386, April
                                  1972 . . . . . . . . . . . . . . . . . . 3--7
                    M. Bataille   Something old: the Gamma 60 the computer
                                  that was ahead of its time . . . . . . . 10--15
               Caxton C. Foster   Something new: the Intel MCS-4 micro
                                  computer set . . . . . . . . . . . . . . 16--17
                   J. A. N. Lee   My next compiler . . . . . . . . . . . . 17--19
           Michael J. Flynn and   
              Mrs. Carol Rogers   Computer architecture at Johns Hopkins   21--33

ACM SIGARCH Computer Architecture News
Volume 1, Number 4, October, 1972

              R. F. Vaughan and   
                  R. A. Collins   On computer architecture, software
                                  portability & microprogramming  . . . . . 14--15
            James C. Brakefield   An optimal floating point format . . . . 16--17
                   J. E. Brewer   Recent doctoral dissertations of
                                  interest to SIGARCH  . . . . . . . . . . 18--20

ACM SIGARCH Computer Architecture News
Volume 2, Number 1, January, 1973

                 C. W. Bettcher   Thread standardization and relative cost 9--9
               Richard L. Sites   Floating point significance interrupt
                                  proposal . . . . . . . . . . . . . . . . 10--12
                  Caxton Foster   Computer architecture  . . . . . . . . . 13--18

ACM SIGARCH Computer Architecture News
Volume 2, Number 3, October, 1973

                 Louis S. Adler   A mini-computer configuration for CAI: a
                                  systems engineering view . . . . . . . . 10--19
            W. M. Gentleman and   
                 B. A. Wichmann   Timing on computers  . . . . . . . . . . 20--23
                    Karl Schank   Architectural assistance to software
                                  debugging aids . . . . . . . . . . . . . 37--38

ACM SIGARCH Computer Architecture News
Volume 2, Number 4, December, 1973

       Dileep P. Bhandarkar and   
               Samuel H. Fuller   Markov chain models for analyzing memory
                                  interference in multiprocessor computer
                                  systems  . . . . . . . . . . . . . . . . 1--6
             George A. Anderson   Interconnecting a distributed processor
                                  system for avionics  . . . . . . . . . . 11--16
             L. Rodney Goke and   
                 G. J. Lipovski   Banyan networks for partitioning
                                  multiprocessor systems . . . . . . . . . 21--28
            Harry F. Jordan and   
                Burton J. Smith   Structure of digital system description
                                  languages  . . . . . . . . . . . . . . . 31--34
                 John A. N. Lee   VDL---a definition system for all levels 41--48
           Charles H. Radoy and   
    George P. Copeland, Jr. and   
                 G. J. Lipovski   A methodology for parallel processing
                                  design tradeoffs . . . . . . . . . . . . 51--56
                 S. F. Reddaway   DAP---a distributed array processor  . . 61--65
                 Peter M. Kogge   Maximal rate pipelined solutions to
                                  recurrence problems  . . . . . . . . . . 71--76
             Tilak Agerwala and   
                     Mike Flynn   Comments on capabilities, limitations
                                  and "correctness" of Petri nets  . . . . 81--86
         Wayne E. Omohundro and   
                James H. Tracey   Flowware---a flow charting procedure to
                                  describe digital networks  . . . . . . . 91--97
          Mario R. Barbacci and   
            Daniel P. Siewiorek   Automated exploration of the design
                                  space for register transfer (RT) systems 101--106
                 T. A. Laliotis   Implementation aspects of the symbol
                                  hardware compiler  . . . . . . . . . . . 111--115
    George P. Copeland, Jr. and   
             G. J. Lipovski and   
               Stanley Y. W. Su   The architecture of CASSM: a cellular
                                  system for non-numeric processing  . . . 121--128
           John M. Hemphill and   
                 S. A. Szygenda   Deriving design guidelines for
                                  diagnosable computer systems . . . . . . 131--135
            Behrooz Parhami and   
             Algirdas Avizienis   Design of fault-tolerant associative
                                  processors . . . . . . . . . . . . . . . 141--145
             M. A. Fischler and   
                   O. Firschein   A fault tolerant multiprocessor
                                  architecture for real-time control
                                  applications . . . . . . . . . . . . . . 151--157
                 G. J. Lipovski   A varistructured fail-soft cellular
                                  computer . . . . . . . . . . . . . . . . 161--165
               Jean Vaucher and   
                  Christian Rey   A hardware laboratory for computer
                                  architecture research  . . . . . . . . . 171--175
                    P. J. Knoke   Simulation exercises for computer
                                  architecture education . . . . . . . . . 181--185
                    M. E. Sloan   Computer architecture courses in
                                  electrical engineering departments . . . 191--195
                 R. Hartenstein   Increasing hardware complexity---a
                                  challenge to computer architecture
                                  education  . . . . . . . . . . . . . . . 201--206
                George Rossmann   Review of the Workshop on Computer
                                  Architecture Education . . . . . . . . . 211--214
              Richard G. Cooper   Micromodules: Microprogrammable building
                                  blocks for hardware development  . . . . 221--226
               S. H. Fuller and   
            D. P. Siewiorek and   
                     R. J. Swan   Computer Modules: an architecture for
                                  large digital modules  . . . . . . . . . 231--237
                    Rodnay Zaks   A microprogrammed architecture for front
                                  end processing . . . . . . . . . . . . . 241--246
             Z. G. Vranesic and   
             V. C. Hamacher and   
                    Y. Y. Leung   Design of a fully variable-length
                                  structured minicomputer  . . . . . . . . 251--255
                 Orin E. Marvel   HAPPE: Honeywell Associative Parallel
                                  Processing Ensemble  . . . . . . . . . . 261--267
             Mario R. Schaffner   A computer architecture and its
                                  programming language . . . . . . . . . . 271--277

ACM SIGARCH Computer Architecture News
Volume 3, Number 1, March, 1974

                     John Shore   Conjecture corner  . . . . . . . . . . . 3--6
                 W. M. McKeeman   Computer design evaluation using
                                  programming language primitives  . . . . 7--18
          Reiner W. Hartenstein   Letter to membership from incoming
                                  chairman (CAN, Oct. 73)  . . . . . . . . 19--22

ACM SIGARCH Computer Architecture News
Volume 3, Number 2, June, 1974

              David Stryker and   
                    David Weiss   Secure system architecture . . . . . . . 37--38

ACM SIGARCH Computer Architecture News
Volume 3, Number 3, September, 1974

               Stephen Y. H. Su   Book review of Logic and Logic
                                  Design by B. Girling and H. G. Morning.
                                  International Textbook Company Limited
                                  1973 . . . . . . . . . . . . . . . . . . 2--3
                     John Shore   Conjecture corner  . . . . . . . . . . . 4--9

ACM SIGARCH Computer Architecture News
Volume 3, Number 4, December, 1974

               L. Nisnevich and   
                E. Strasbourger   Decentralized priority control in data
                                  communication  . . . . . . . . . . . . . 1--6
            Cecil C. Reames and   
                    Ming T. Liu   A loop network for simultaneous
                                  transmission of variable-length messages 7--12
                James F. Callan   The architecture of the Picture System   13--16
           John Staudhammer and   
         Jeffrey F. Eastman and   
               James N. England   A fast display-oriented processor  . . . 17--22
         Jeffrey F. Eastman and   
               John Staudhammer   Computer display of colored
                                  three-dimensional objects  . . . . . . . 23--27
                  Henry D. Kerr   A microprogrammed processor for
                                  interactive computer graphics  . . . . . 28--33
             C. V. W. Armstrong   Functional memory techniques applied to
                                  the microprogrammed control of an
                                  associative processor  . . . . . . . . . 34--40
              James F. Wade and   
                Paul D. Stigall   Instruction design to minimize program
                                  size . . . . . . . . . . . . . . . . . . 41--44
             James O. Bondi and   
                Paul D. Stigall   HMO, a hardware microcode optimizer  . . 45--51
                   A. M. Peskin   The computer aided design of processor
                                  architectures  . . . . . . . . . . . . . 51--55
                 W. H. Huen and   
                D. P. Siewiorek   Intermodule protocol for register
                                  transfer level modules: representation
                                  and analytic tools . . . . . . . . . . . 56--62
                Portia Isaacson   Picture systems, PS, and the design of a
                                  channel-to-channel computer interface    63--70
                Lennart Löfgren   Reference concepts in a tree structured
                                  address space  . . . . . . . . . . . . . 71--79
         Judith A. Anderson and   
                 G. J. Lipovski   A virtual memory for microprocessors . . 80--84
             R. E. Brundage and   
                   A. P. Batson   The performance enhancement of
                                  descriptor-based virtual memory systems
                                  through the use of associative registers 85--90
                 Orin E. Marvel   SPEAC: special purpose electronic area
                                  correlator . . . . . . . . . . . . . . . 91--94
           James M. Satterfield   Architectural advances of the space
                                  shuttle orbiter avionics computer system 95--98
              Uno R. Kodres and   
           William L. McCracken   Design study of an avionics navigation
                                  microcomputer  . . . . . . . . . . . . . 99--105
                 Gerald R. Kane   An iteratively structured information
                                  processor  . . . . . . . . . . . . . . . 106--112
           H. Richards, Jr. and   
                A. E. Oldehoeft   Hardware-software interactions in
                                  SYMBOL-2R's operating system . . . . . . 113--118
             Pierre Sylvain and   
                Maniel Vineberg   The design and evaluation of the array
                                  machine: a high-level language processor 119--125
             Jack B. Dennis and   
               David P. Misunas   A preliminary architecture for a basic
                                  data-flow processor  . . . . . . . . . . 126--132
                 K. J. Berkling   Reduction languages for reduction
                                  machines . . . . . . . . . . . . . . . . 133--140
             Willis K. King and   
               Fulvio Carbonaro   Output devices sharing by minicomputers  141--145
                  S. Rannem and   
             V. C. Hamacher and   
                 S. G. Zaky and   
                    P. Connolly   On relating small computer performance
                                  to design parameters . . . . . . . . . . 146--151
      Harold W. Lawson, Jr. and   
                Bengt Magnhagen   Advantages of structured hardware  . . . 152--158
                 Peter Kornerup   Concepts of the MATHILDA system  . . . . 159--164
               Caxton C. Foster   SOCRATES . . . . . . . . . . . . . . . . 165--169
             Donald F. Wann and   
                Robert A. Ellis   Conjoined computer systems: an
                                  architecture for laboratory data
                                  processing and instrument control  . . . 170--175
              E. Douglas Jensen   A distributed function computer for
                                  real-time control  . . . . . . . . . . . 176--182
                C. H. Radoy and   
                 G. J. Lipovski   Switched multiple instruction, multiple
                                  data stream processing . . . . . . . . . 183--187
              Robert J. Lechner   Sequentially encoded data structures
                                  that support bidirectional scanning  . . 188--194
                 Martin Freeman   An instruction class for an extensible
                                  interpreter  . . . . . . . . . . . . . . 195--200
                W. K. Giloi and   
                        H. Berg   STARLET: a computer concept based on
                                  ordered sets as primitive data types . . 201--206
              R. G. Cornell and   
                    H. C. Torng   A cellular general purpose computer  . . 207--213
         Barry C. Goldstein and   
            Thomas W. Scrutchin   A machine-oriented resource management
                                  architecture . . . . . . . . . . . . . . 214--219
                    M. E. Sloan   A design-oriented computer engineering
                                  program  . . . . . . . . . . . . . . . . 220--224
         Janis Beitch Baron and   
                   D. E. Atkins   An educational laboratory in
                                  contemporary digital design  . . . . . . 225--231

ACM SIGARCH Computer Architecture News
Volume 4, Number 1, March, 1975

                    W. R. Smith   AADC computer family architecture
                                  program  . . . . . . . . . . . . . . . . 4--8
                    Åmund Lunde   More data on the O/W ratios: a note on a
                                  paper by Flynn . . . . . . . . . . . . . 9--13
           G. Jack Lipovski and   
               Stanley Y. W. Su   On non-numeric architecture  . . . . . . 14--29

ACM SIGARCH Computer Architecture News
Volume 4, Number 2, June, 1975

                 Guy G. Boulaye   Structured design for structured
                                  computer architecture  . . . . . . . . . 8--17

ACM SIGARCH Computer Architecture News
Volume 4, Number 3, September, 1975

                   D. L. Parnas   Evaluation criteria for abstract
                                  machines with unknown applications . . . 2--9
               William R. Smith   AADC computer family architecture
                                  questions and answers  . . . . . . . . . 15--21
               Stephen Y. H. Su   An introduction to CHDL (computer
                                  hardware description languages)  . . . . 22--23
                    R. W. Doran   The International Computers Ltd. ICL2900
                                  computer architecture  . . . . . . . . . 24--47

ACM SIGARCH Computer Architecture News
Volume 4, Number 4, January, 1976

                Gordon Bell and   
            William D. Strecker   Computer structures: What have we
                                  learned from the PDP-11? . . . . . . . . 1--14
              Helmut Kerner and   
                 Werner Beyerle   A PMS level language for performance
                                  evaluation modelling (V-PMS) . . . . . . 15--19
                  M. Moalla and   
                 G. Saucier and   
                 J. Sifakis and   
                 M. Zachariades   A design tool for the multilevel
                                  description and simulation of systems of
                                  interconnected modules . . . . . . . . . 20--27
                 Jonathan Allen   A course in computer structures  . . . . 28--32
             George E. Rossmann   The IEEE Computer Society task force on
                                  computer architecture  . . . . . . . . . 33--33
       Lawrence C. Widdoes, Jr.   The Minerva multi-microprocessor . . . . 34--39
               R. G. Arnold and   
                     E. W. Page   A hierarchical, restructurable
                                  multi-microprocessor architecture  . . . 40--45
              Robert McGill and   
                 John Steinhoff   A multimicroprocessor approach to
                                  numerical analysis: An application to
                                  gaming problems  . . . . . . . . . . . . 46--51
             John E. Jensen and   
                 Jean-Loup Baer   A model of interference in a shared
                                  resource multiprocessor  . . . . . . . . 52--57
        Clement K. C. Leung and   
           David P. Misunas and   
             Andrij Neczwid and   
                 Jack B. Dennis   A computer simulation facility for
                                  packet communication architecture  . . . 58--63
                     S. L. Rege   Cost, performance and size tradeoffs for
                                  different levels in a memory hierarchy   64--67
             Paul E. Dworak and   
                Alice C. Parker   An input interface for a real-time
                                  digital sound generation system  . . . . 68--73
          Michael C. Mulder and   
              Patrick P. Fasang   A microprocessor oriented data
                                  acquisition and control system for power
                                  system control . . . . . . . . . . . . . 74--78
              H. M. Gladney and   
                  G. Hochweller   Multiprogramming for real-time
                                  applications . . . . . . . . . . . . . . 79--85
               Theodore H. Kehl   Basil architecture --- an HLL
                                  minicomputer . . . . . . . . . . . . . . 86--92
          Harold W. Lawson, Jr.   Function distribution in computer system
                                  architectures  . . . . . . . . . . . . . 93--97
               Chris A. Vissers   Interface, a dispersed architecture  . . 98--104
               A. Thomasian and   
                   A. Avizienis   A design study of a shared resource
                                  computing system . . . . . . . . . . . . 105--112
                 W. S. Ford and   
                 V. C. Hamacher   Hardware support for inter-process
                                  communication and processor sharing  . . 113--118
            Ulrich Trambacz and   
                     Georg Hyla   A taxonomy of display processors . . . . 119--120
                    W. E. Kluge   Traversing binary tree structures with
                                  shift register memories (recent results) 121.1--121.1
       Eduardo B. Fernandez and   
            Rita C. Summers and   
             Charles D. Coleman   Architectural support for system
                                  protection (recent results)  . . . . . . 121.2--121.2
             James W. Gault and   
                Alice C. Parker   The design of a user-programmable
                                  digital interface (recent results) . . . 121.3--121.3
             Serge Fournier and   
                    Ming T. Liu   System design of a grammar-programmable
                                  high-level language machine  . . . . . . 122.4--122.4
                 Ch. Kuznia and   
                   R. Kober and   
                        H. Kopp   SMS 101 --- a structured multi
                                  microprocessor system with deadlock-free
                                  operation scheme . . . . . . . . . . . . 122.5--122.5
              Philip S. Liu and   
              Frederic J. Mowle   Selection schemes for dynamically
                                  microcoding Fortran programs . . . . . . 122.6--122.6
               S. H. Fuller and   
            D. P. Siewiorek and   
                     R. J. Swan   The design of a multi-micro-computer
                                  system . . . . . . . . . . . . . . . . . 123--123
            Cecil C. Reames and   
                    Ming T. Liu   Design and simulation of the distributed
                                  loop computer network (DLCN) . . . . . . 124--129
                  Paolo Franchi   Distribution of functions and control in
                                  RPCNET . . . . . . . . . . . . . . . . . 130--135
                Larry D. Wittie   Efficient message routing in
                                  Mega-Micro-Computer networks . . . . . . 136--140
                 Terry A. Welch   An investigation of descriptor oriented
                                  architecture . . . . . . . . . . . . . . 141--146
                  E. A. Feustel   Tagged architecture and the semantics of
                                  programming languages: Extensible types  147--150
               A. P. Batson and   
             R. E. Brundage and   
                   J. P. Kearns   Design data for Algol-60 machines  . . . 151--154
            William D. Strecker   Cache memories for PDP-11 family
                                  computers  . . . . . . . . . . . . . . . 155--158
             Janak H. Patel and   
             Edward S. Davidson   Improving the throughput of a pipeline
                                  by insertion of delays . . . . . . . . . 159--164
             A. M. Abd-Alla and   
               Laird H. Moffett   On-line architecture tuning using
                                  microcapture . . . . . . . . . . . . . . 165--171
               Leonard D. Healy   A character-oriented context-addressed
                                  segment-sequential storage . . . . . . . 172--177
                 J. A. Bush and   
             G. J. Lipovski and   
                S. Y. W. Su and   
               J. K. Watson and   
                 S. J. Ackerman   Some implementations of segment
                                  sequential functions . . . . . . . . . . 178--185
          Manlio DeMartinis and   
           G. Jack Lipovski and   
           Stanley Y. W. Su and   
                   J. K. Watson   A Self Managing Secondary Memory system  186--194
               Samuel H. Fuller   Price/performance comparison of C.mmp
                                  and the PDP-10 . . . . . . . . . . . . . 195--202

ACM SIGARCH Computer Architecture News
Volume 5, Number 1, April, 1976

             Lars-Erik Thorelli   Representation of arrays in computers    6--9
                  Helmut Berndt   Evolutionary computer architecture: the
                                  Unidata 7.000 series . . . . . . . . . . 10--16
                 Jack B. Dennis   Computer architecture and the cost of
                                  software . . . . . . . . . . . . . . . . 17--21
               George Lindamood   On navel contemplation and the art of
                                  computer maintenance . . . . . . . . . . 22--23

ACM SIGARCH Computer Architecture News
Volume 5, Number 2, June, 1976

               S. H. Fuller and   
                   G. A. Mathew   Implementing microprogram storage with
                                  PLA's  . . . . . . . . . . . . . . . . . 6--11
                    D. R. Hicks   A generalized queue scheme for process
                                  synchronization and communication  . . . 12--14
                Glen G. Langdon   Book reviews: Review of Introduction
                                  to Computer Architecture by Harold S.
                                  Stone  . . . . . . . . . . . . . . . . . 17--19

ACM SIGARCH Computer Architecture News
Volume 5, Number 4, October, 1976

             Kenneth J. Thurber   ARPS: a new real-time computer . . . . . 6--16
              Alan B. Salisbury   MCF: a military computer family for
                                  computer-based systems . . . . . . . . . 17--20
                Frederic N. Ris   A unified decimal floating-point
                                  architecture for the support of
                                  high-level languages . . . . . . . . . . 21--31
               G. Jack Lipovski   A question of style  . . . . . . . . . . 32--38
                     G. Chroust   Data interfaces versus control
                                  interfaces: a half-baked conjecture  . . 39--40

ACM SIGARCH Computer Architecture News
Volume 5, Number 6, February, 1977

                Glen G. Langdon   Considerations on the "figure of
                                  merit" technique for storage hierarchy
                                  design . . . . . . . . . . . . . . . . . 25--28
               Edward F. Miller   Book Reviews: Review of High-Level
                                  Language Computer Architecture by Yaohan
                                  Chu. Academic Press, New York, 1975  . . 29--29

ACM SIGARCH Computer Architecture News
Volume 5, Number 7, March, 1977

                     Yaohan Chu   Architecture of a hardware data
                                  interpreter  . . . . . . . . . . . . . . 1--9
               Subrata Dasgupta   The design of some language constructs
                                  for horizontal microprogramming  . . . . 10--16
          E. Douglas Jensen and   
                Richard Y. Kain   The Honeywell Modular Microprogram
                                  Machine: M3  . . . . . . . . . . . . . . 17--28
        Richard R. Ramseyer and   
                Andries van Dam   A multi-microprocessor implementation of
                                  a general purpose pipelined CPU  . . . . 29--34
                 C. V. Ravi and   
                  Torben Moller   A hierarchical microcomputer system for
                                  hardware and software development  . . . 35--40
           J. Archer Harris and   
                 David R. Smith   Hierarchical multiprocessor
                                  organizations  . . . . . . . . . . . . . 41--48
                K. Murakami and   
               S. Nishikawa and   
                        M. Sato   Poly-Processor System analysis and
                                  design . . . . . . . . . . . . . . . . . 49--56
                     Guy Mazare   A few examples of how to use a
                                  symmetrical multi-micro-processor  . . . 57--62
                 Peter M. Kogge   The microprogramming of pipelined
                                  processors . . . . . . . . . . . . . . . 63--69
              Howard Jay Siegel   The universality of various types of
                                  SIMD machine interconnection networks    70--79
         Ramakrishna B. Rau and   
             George E. Rossmann   The effect of instruction fetch
                                  strategies upon the performance of
                                  pipelined instruction units  . . . . . . 80--89
                S. R. Ahuja and   
                     J. R. Jump   A modular memory scheme for array
                                  processing . . . . . . . . . . . . . . . 90--94
              Leonard S. Haynes   The architecture of an ALGOL 60 computer
                                  implemented with distributed processors  95--104
           Herbert Sullivan and   
                  T. R. Bashkow   A large scale, homogeneous, fully
                                  distributed parallel machine, I  . . . . 105--117
           Herbert Sullivan and   
        Theodore R. Bashkow and   
                David Klappholz   A Large Scale, Homogeneous, Fully
                                  Distributed Parallel Machine, II . . . . 118--124
               G. Jack Lipovski   On virtual memories and micronetworks    125--134
             Jon C. Strauss and   
             Kenneth J. Thurber   Considerations for new tactical computer
                                  systems  . . . . . . . . . . . . . . . . 135--140
         Kenneth J. Thurber and   
            Peter C. Patton and   
           Robert C. Deward and   
             Jon C. Strauss and   
           Thomas W. Petschauer   An advanced tactical computer concept    141--146
                   Gary J. Nutt   Microprocessor implementation of a
                                  parallel processor . . . . . . . . . . . 147--152
                Paul Dworak and   
            Alice C. Parker and   
                   Richard Blum   The design and implementation of a
                                  real-time sound generation system  . . . 153--158
               A. C. Parker and   
                    A. W. Nagle   Hardware/software tradeoffs in a
                                  variable word width, variable queue
                                  length buffer memory . . . . . . . . . . 159--164
           Bernard L. Peuto and   
             Leonard J. Shustek   An instruction timing model of CPU
                                  performance  . . . . . . . . . . . . . . 165--178
        Cornelis H. Hoogendoorn   Reduction of memory interference in
                                  multiprocessor systems . . . . . . . . . 179--183
          D. W. Hammerstrom and   
                 E. S. Davidson   Information content of CPU memory
                                  referencing behavior . . . . . . . . . . 184--192
                Ming T. Liu and   
                Cecil C. Reames   Message communication protocol and
                                  operating system design for the
                                  Distributed Loop Computer Network (DLCN) 193--200
                G. H. Poujoulat   Architecture of the CORAIL building
                                  block system . . . . . . . . . . . . . . 201--204
           H. L. Tredennick and   
                    T. A. Welch   High-speed buffering for variable length
                                  operands . . . . . . . . . . . . . . . . 205--210

ACM SIGARCH Computer Architecture News
Volume 5, Number 8, April, 1977

                      Rod Steel   Another general purpose computer
                                  architecture . . . . . . . . . . . . . . 5--11
            George E. Lindamood   What's in a name?  . . . . . . . . . . . 12--14
               Conrad Schneiker   The microprocessors of the future  . . . 15--16
          Edward F. Miller, Jr.   Book review: Review of Large-Scale
                                  Computer Architecture: Parallel and
                                  Associative Processors by Kenneth J.
                                  Thurber, Hayden Book Company, Rochelle
                                  Park, New Jersey 1976  . . . . . . . . . 17--17

ACM SIGARCH Computer Architecture News
Volume 6, Number 1, June, 1977

          William M. Conner and   
              Edward R. Dirling   Input/Output considerations in
                                  look-ahead processing  . . . . . . . . . 7--12
                Robert F. Rosin   The significance of microprogramming . . 14--19
              Mario J. Gonzalez   Book review: Review of
                                  Microprogramming Primer by Harry Katzan,
                                  Jr., McGraw-Hill 1977  . . . . . . . . . 29--30

ACM SIGARCH Computer Architecture News
Volume 6, Number 2, May, 1977

                Maniel Vineberg   Implementation of character string
                                  pattern matching on a multiprocessor . . 1--7
                 R. M. Bird and   
                   J. C. Tu and   
                   R. M. Worthy   Associative/parallel processors for
                                  searching very large textual data bases  8--9
                 G. J. Lipovski   On imaginary fields, token transfers and
                                  floating codes in intelligent secondary
                                  memories . . . . . . . . . . . . . . . . 17--22
                     S. G. Zaky   Microprocessors for non-numeric
                                  processing . . . . . . . . . . . . . . . 23--30
             David K. Hsiao and   
           Krishnamurthi Kannan   The architecture of a database computer
                                  --- a summary  . . . . . . . . . . . . . 31--33
            Robert S. Rosenthal   The data management machine, a
                                  classification . . . . . . . . . . . . . 35--39
                Ken J. McDonell   Trends in non-software support for
                                  input-output functions . . . . . . . . . 40--47
                R. Cerretti and   
                 D. Jasilli and   
                D. R. Matteucci   Ulisse: An Italian project for a
                                  multifunctional terminal system  . . . . 48--50
                   Olin H. Bray   Data management requirements: The
                                  similarity of memory management,
                                  database systems, and message processing 68--76
           Barry M. Landson and   
              Robert G. Sargent   A comparison of sequential and
                                  associative computing of priority queues 77--78

ACM SIGARCH Computer Architecture News
Volume 6, Number 3, August, 1977

              Glenford J. Myers   The case against stack-oriented
                                  instruction sets . . . . . . . . . . . . 7--10
            Andrew S. Tanenbaum   Ambiguous machine architecture and
                                  program efficiency . . . . . . . . . . . 11--13
                    D. R. Hicks   Microprogramming with a
                                  content-addressable read-only-memory . . 14--15
                    D. R. Hicks   Multitasking as a program structuring
                                  primitive  . . . . . . . . . . . . . . . 16--18

ACM SIGARCH Computer Architecture News
Volume 6, Number 4, October, 1977

                     G. Chroust   Book reviews: Review of Digital
                                  System Implementation by Gerrit A.
                                  Blaauw, Prentice Hall, Series in
                                  Automatic Computation 1976 . . . . . . . 27--28

ACM SIGARCH Computer Architecture News
Volume 6, Number 5, December, 1977

                R. A. Hagan and   
                  C. S. Wallace   A virtual memory system for the Hewlett
                                  Packard 2100A  . . . . . . . . . . . . . 5--13
                 Forest Baskett   More on microprocessors of the future    14--17
                     Yaohan Chu   Direct-execution computer architecture   18--23
        Peter U. Schulthess and   
            Eduard P. Mumprecht   Reply to the case against stack-oriented
                                  instruction sets . . . . . . . . . . . . 24--27

ACM SIGARCH Computer Architecture News
Volume 6, Number 6, February, 1978

           John B. Mountain and   
               Philip H. Enslow   Application of the military computer
                                  family architecture selection criteria
                                  to the PR1ME P400  . . . . . . . . . . . 3--17
               G. Jack Lipovski   Just a few more words on microprocessors
                                  of the future  . . . . . . . . . . . . . 18--21
                    J. L. Keedy   On the use of stacks in the evaluation
                                  of expressions . . . . . . . . . . . . . 22--28
            Andrew S. Tanenbaum   Review of Processor Architecture by
                                  S. H. Lavington, NCC Publications,
                                  Manchester 1976  . . . . . . . . . . . . 31--31
                A. E. Whiteside   Book reviews: Review of The
                                  Architecture of Concurrent Programs by
                                  Per Brinch Hansen, Prentice-Hall 1977    32--32

ACM SIGARCH Computer Architecture News
Volume 7, Number 1, August, 1978

       Dileep P. Bhandarkar and   
              J. Egil Juliussen   Semiconductor technology: trends and
                                  implications . . . . . . . . . . . . . . 4--14
                    A. J. Payne   A computer console design to help the
                                  operator . . . . . . . . . . . . . . . . 15--22
              Daniel R. McGlynn   Review of Content Addressable
                                  Parallel Processors by Caxton C. Foster.
                                  Van Nostrand Reinhold Co. 1976 . . . . . 23--23
              C. V. Ramamoorthy   Review of Structured Computer
                                  Organization by Andrew S. Tanenbaum,
                                  Prentice-Hall 1976 . . . . . . . . . . . 23--23
                    W. Buchholz   Review of Computer System
                                  Architecture by M. Morris Mano,
                                  Prentice-Hall 1976 . . . . . . . . . . . 24--24
                 Z. G. Vranesic   Book reviews: Review of Content
                                  Addressable Parallel Processors by
                                  Caxton C. Foster, Van Nostrand Reinhold
                                  Co. 1976 . . . . . . . . . . . . . . . . 24--24

ACM SIGARCH Computer Architecture News
Volume 7, Number 2, August, 1978

             R. R. Korfhage and   
               W. H. E. Day and   
                 L. L. Beck and   
                  W. F. Appelbe   Data physics: an unorthodox view of data
                                  and its implications in data processors  1--7
             George P. Copeland   String storage and searching for data
                                  base applications: implementation on the
                                  INDY backend kernel  . . . . . . . . . . 8--17
              Allen J. Otis and   
             George P. Copeland   Editing requirements for data base
                                  applications and their implementation on
                                  the INDY backend kernel  . . . . . . . . 18--29
               G. Jack Lipovski   Semantic paging on intelligent discs . . 30--34
                  Rhon Williams   A multiprocessing system for the direct
                                  execution of LISP  . . . . . . . . . . . 35--41
                 R. M. Bird and   
             J. B. Newsbaum and   
                 J. L. Trefftzs   Text file inversion: an evaluation . . . 42--50
               David C. Roberts   A specialized computer architecture for
                                  text retrieval . . . . . . . . . . . . . 51--59
               M. J. Stucki and   
                  J. R. Cox and   
                G. C. Roman and   
                    P. N. Turcu   Coordinating concurrent access in a
                                  distributed database architecture  . . . 60--64
               Mohamed G. Gouda   A hierarchical controller for concurrent
                                  accessing of distributed databases . . . 65--70
             Bezalel Gavish and   
                    Harvey Koch   An extensible architecture for data flow
                                  processing . . . . . . . . . . . . . . . 71--76
                  J. B. Harvill   Functional parallelism in an operand
                                  state saving computer  . . . . . . . . . 77--84
            J. S. Hutchison and   
                    W. G. Roman   Madman machine . . . . . . . . . . . . . 85--90
           Jayanta Banerjee and   
                 David K. Hsiao   The use of a database machine for
                                  supporting relational databases  . . . . 91--98
           Paul J. Sadowski and   
                 S. A. Schuster   Exploiting parallelism in a Relational
                                  Associative Processor  . . . . . . . . . 99--109
                      Hsu Chang   Bubbles for relational database  . . . . 110--116
                A. El Masri and   
                  J. Rohmer and   
                      D. Tusera   A machine for information retrieval  . . 117--120
             Dante R. Matteucci   A distributed structure for the
                                  automization of the Catalog of the
                                  National Cultural Heritage: experiences
                                  and proposals  . . . . . . . . . . . . . 121--133

ACM SIGARCH Computer Architecture News
Volume 7, Number 3, October, 1978

             Kenneth J. Thurber   Computer communication techniques  . . . 7--16
                Hal W. Jennings   A variation on the PDP 11  . . . . . . . 17--26

ACM SIGARCH Computer Architecture News
Volume 7, Number 4, December, 1978

              Per Brinch Hansen   Multiprocessor architectures for
                                  concurrent programs  . . . . . . . . . . 4--23
                    J. L. Keedy   On the evaluation of expressions using
                                  accumulators, stacks and store-to-store
                                  instructions . . . . . . . . . . . . . . 24--27
                Rahul Chattergy   In the current literature  . . . . . . . 30--30

ACM SIGARCH Computer Architecture News
Volume 7, Number 5, February, 1979

               Harvey G. Cragon   An evaluation of code space requirements
                                  and performance of various architectures 5--21
         Kenneth J. Thurber and   
              Harvey A. Freeman   A bibliography of local computer network
                                  architectures  . . . . . . . . . . . . . 22--27

ACM SIGARCH Computer Architecture News
Volume 7, Number 7, April, 1979

               Lyle A. Cox, Jr.   The nature of ``computer architecture''  8--12
Jan L. A. van de Snepscheut and   
             Gert A. Slavenburg   Introducing the notion of processes to
                                  hardware . . . . . . . . . . . . . . . . 13--23
                   D. E. Atkins   Review of Advances in Computer
                                  Architecture by Glenford J. Myers.
                                  Wiley-Interscience Division of John
                                  Wiley and Sons 1978  . . . . . . . . . . 25--26
                Kevin W. Bowyer   Book review of The Structure of
                                  Computers and Computations: Volume One
                                  by David J. Kuck. John Wiley & Sons 1978  27--30

ACM SIGARCH Computer Architecture News
Volume 7, Number 8, June, 1979

             Randall Gibson and   
                  Paul Anderson   Technical overview of the Renaissance
                                  Octobus system . . . . . . . . . . . . . 2--9
         Johan W. Stevenson and   
            Andrew S. Tanenbaum   Efficient encoding of machine
                                  instructions . . . . . . . . . . . . . . 10--17
                    J. L. Keedy   More on the use of stacks in the
                                  evaluation of expressions  . . . . . . . 18--22
                    G. E. Quick   Intelligent memory: ``a parallel
                                  processing concept'' . . . . . . . . . . 23--28

ACM SIGARCH Computer Architecture News
Volume 7, Number 9, August, 1979

               Ronald L. Rivest   The BLIZZARD computer architecture . . . 2--10
                    J. L. Keedy   A technique for passing reference
                                  parameters in an information-hiding
                                  architecture . . . . . . . . . . . . . . 11--15

ACM SIGARCH Computer Architecture News
Volume 7, Number 10, October, 1979

      Krishna M. Kavipurapu and   
              Dennis J. Frailey   Quantification of architectures using
                                  software science . . . . . . . . . . . . 2--6
                  Trevor Turton   A proposed high-speed computer design    7--21
Computer Architecture News staff   In the current literature  . . . . . . . 22--22

ACM SIGARCH Computer Architecture News
Volume 8, Number 2, April, 1980

                  Dana Richards   On a ``Counter--Example''  . . . . . . . 2--3
               Peter J. Denning   Why not innovations in computer
                                  architecture?  . . . . . . . . . . . . . 4--7
                  G. W. Gerrity   Hardware detection of undefined
                                  references . . . . . . . . . . . . . . . 8--11
           Peter J. Denning and   
                  T. Don Dennis   On minimizing contention at semaphores   12--19

ACM SIGARCH Computer Architecture News
Volume 8, Number 3, 1980

             Jack B. Dennis and   
         G. Andrew Boughton and   
            Clement K. C. Leung   Building blocks for data flow prototypes 1--8
             Edward S. Davidson   A multiple stream microprocessor
                                  prototype system: AMP-1  . . . . . . . . 9--16
                   F. Andre and   
              J. P. Banâtre and   
                   H. Leroy and   
                   G. Paget and   
                F. Ployette and   
                  J. P. Routeau   KENSUR: An architecture oriented towards
                                  programming languages translation  . . . 17--22
                 J. G. Kuhl and   
                    S. M. Reddy   Distributed fault-tolerance for large
                                  multiprocessor systems . . . . . . . . . 23--30
                 Miroslaw Malek   A comparison connection assignment for
                                  diagnosis of multiprocessor systems  . . 31--36
         K. E. Grosspietsch and   
                  J. Kaiser and   
                        E. Nett   A concept for test and reconfiguration
                                  of a fault-tolerant VLSI processor
                                  system . . . . . . . . . . . . . . . . . 37--43
         Jean-Paul Brassard and   
                     Jan Gecsei   Path building in cellular partitioning
                                  networks . . . . . . . . . . . . . . . . 44--50
         Robert J. McMillen and   
              Howard Jay Siegel   MIMD machine communication using the
                                  augmented data manipulator network . . . 51--60
               John P. Shen and   
                  John P. Hayes   Fault tolerance of a class of connecting
                                  networks . . . . . . . . . . . . . . . . 61--71
         E. G. Coffman, Jr. and   
                     Kimming So   On the comparison between single and
                                  multiple processor systems . . . . . . . 72--79
           V. Carl Hamacher and   
              Gerald S. Shedler   Performance of a collision-free local
                                  bus network having asynchronous
                                  distributed control  . . . . . . . . . . 80--87
                  W. M. Zuberek   Timed Petri nets and preliminary
                                  performance evaluation . . . . . . . . . 88--96
            David R. Ditzel and   
             David A. Patterson   Retrospective on high-level language
                                  computer architecture  . . . . . . . . . 97--104
            J. P. Sansonnet and   
                  M. Castan and   
                   C. Percebois   M3L: a list-directed architecture  . . . 105--112
                 Yasushi Hibino   A Practical Parallel Garbage Collection
                                  Algorithm and Its Implementation . . . . 113--120
        Philip C. Treleaven and   
               Geoffrey F. Mole   A multi-processor reduction machine for
                                  user-defined reduction languages . . . . 121--130
              Jeffrey M. Tobias   A single user multiprocessor
                                  incorporating processor manipulation
                                  facilities . . . . . . . . . . . . . . . 131--138
    Robert H. Halstead, Jr. and   
                Stephen A. Ward   The MuNet: a scalable decentralized
                                  architecture for parallel computation    139--145
          Butler W. Lampson and   
                Kenneth A. Pier   A processor for a high-performance
                                  personal computer  . . . . . . . . . . . 146--160
           D. B. G. Edwards and   
              A. E. Knowles and   
                    J. V. Woods   MU6-G: a new design to achieve mainframe
                                  performance from a mini-sized computer   161--167
             Kenneth E. Batcher   Architecture of a massively parallel
                                  processor  . . . . . . . . . . . . . . . 168--173
                    John Palmer   The Intel 8087 numeric data processor    174--181
                 Robert H. Kuhn   Efficient mapping of algorithms to
                                  single-stage interconnections  . . . . . 182--189
              David Nassimi and   
                   Sartaj Sahni   A self routing Benes network . . . . . . 190--195
          H. von Issendorff and   
                   W. Grünewald   An adaptable network for functional
                                  distributed systems  . . . . . . . . . . 196--201
            Mokhtar Boshra Riad   A combination of field and current
                                  access techniques for efficient and
                                  cost-effective bubble memories . . . . . 202--210
                  K. S. Trivedi   Designing linear storage hierarchies so
                                  as to maximize reliability subject to
                                  cost and performance constraints . . . . 211--217
            Sudhir R. Ahuja and   
             Charles S. Roberts   An associative/parallel processor for
                                  partial match retrieval using
                                  superimposed codes . . . . . . . . . . . 218--227
             M. D. Ruggiero and   
                     S. G. Zaky   A microprocessor-based virtual memory
                                  system . . . . . . . . . . . . . . . . . 228--235
              Anand Jagannathan   A technique for the architectural
                                  implementation of software subsystems    236--244
                Viktors Berstis   Security and protection of data in the
                                  IBM System/38  . . . . . . . . . . . . . 245--252
         Miguel García Hoffmann   Hardware implementation of communication
                                  protocols: a formal approach . . . . . . 253--263
                P. Guillier and   
                    D. Slosberg   An architecture with comprehensive
                                  facilities of inter-process
                                  synchronization and communication  . . . 264--270
         Robert M. Lougheed and   
             David L. McCubbrey   The cytocomputer: a practical pipelined
                                  image processor  . . . . . . . . . . . . 271--277
                C. Halatsis and   
                 A. van Dam and   
                 J. Joosten and   
                    M. Letheren   Architectural considerations for a
                                  microprogrammable emulating engine using
                                  bit-slices . . . . . . . . . . . . . . . 278--291
            Mary Jane Irwin and   
                     Don Heller   Online pipeline systems for recursive
                                  numeric computations . . . . . . . . . . 292--299
               M. J. Foster and   
                     H. T. Kung   Design of special-purpose VLSI chips:
                                  Example and opinions . . . . . . . . . . 300--307
               Anshul Kumar and   
                 P. C. P. Bhatt   A structured language for CAD of digital
                                  systems  . . . . . . . . . . . . . . . . 308--316
               Uwe Hercksen and   
                Rainer Klar and   
             Wolfgang Kleinöder   Hardware-measurements of storage access
                                  conflicts in the processor array EGPA(1) 317--324
               Mario Tokoro and   
            Kiichiro Tamaru and   
             Masaaki Mizuno and   
                     Masao Hori   A high level multi-lingual
                                  multiprocessor KMP/II  . . . . . . . . . 325--333

ACM SIGARCH Computer Architecture News
Volume 8, Number 4, June, 1980

                   Ken Aupperle   A real innovation in computer
                                  architecture . . . . . . . . . . . . . . 6--7
          John R. Galloway, Jr.   Architectural innovation round: round #3 8--10
                  John A. Sharp   Some thoughts on data flow architectures 11--21
                 Mary Payne and   
              Dileep Bhandarkar   VAX floating point: a solid foundation
                                  for numerical computation  . . . . . . . 22--33
                  Lloyd Dickman   Treasurer's report . . . . . . . . . . . 37--38
Computer Architecture News staff   Current literature: abstracts of
                                  articles of interest...  . . . . . . . . 48--48

ACM SIGARCH Computer Architecture News
Volume 8, Number 5, August, 1980

                  Julian Davies   Clock architecture and management  . . . 3--6
                 G. Chroust and   
               J. R. Mühlbacher   Rivalling multiprocessor organization: a
                                  hardware/speed trade-off . . . . . . . . 7--10
                David Stevenson   A report on the proposed IEEE Floating
                                  Point Standard (IEEE Task P754)  . . . . 11--12

ACM SIGARCH Computer Architecture News
Volume 8, Number 6, October, 1980

             Justin Rattner and   
                     George Cox   Object-based computer architecture . . . 4--11
                G. J. Myers and   
            B. R. S. Buckingham   A hardware implementation of
                                  capability-based addressing  . . . . . . 12--24
         David A. Patterson and   
                David R. Ditzel   The case for the reduced instruction set
                                  computer . . . . . . . . . . . . . . . . 25--33
           Douglas W. Clark and   
            William D. Strecker   Comments on ``The Case for the Reduced
                                  Instruction Set Computer,'' by Patterson
                                  and Ditzel . . . . . . . . . . . . . . . 34--38
            James C. Brakefield   Is 32 bits of address too much?  . . . . 39--40
            James C. Brakefield   The peripheral bus . . . . . . . . . . . 41--43
                   Trevor Mudge   Book reviews: Review of The
                                  Structure of Computers and Computation,
                                  Vol. I by David J. Kuck, John Wiley &
                                  Sons 1978  . . . . . . . . . . . . . . . 44--45
Computer Architecture News Staff   Current literature: abstracts of
                                  articles of interest...  . . . . . . . . 46--46

ACM SIGARCH Computer Architecture News
Volume 8, Number 7, October, 1980

                      Karl Reed   The way forward in computer architecture
                                  research . . . . . . . . . . . . . . . . 3--7
                   John Gilmore   Suggested enhancements to the Motorola
                                  MC68000  . . . . . . . . . . . . . . . . 8--14
                John F. Wakerly   Pascal extensions for describing
                                  computer instruction sets  . . . . . . . 15--23
                Krishna M. Kavi   Semantics of an algorithm  . . . . . . . 24--26
            Philip C. Treleaven   VLSI: machine architecture and very high
                                  level languages  . . . . . . . . . . . . 27--38

ACM SIGARCH Computer Architecture News
Volume 9, Number 1, February, 1981

                  Lloyd Dickman   SIGARCH business . . . . . . . . . . . . 7--8

ACM SIGARCH Computer Architecture News
Volume 9, Number 2, April, 1981

           Martin L. De Prycker   A new index mode for the VAX-11  . . . . 10--11
                David Stevenson   The Phoenix Project  . . . . . . . . . . 12--15
           E. M. J. C. Van Oost   Multi-processor system description and
                                  simulation using structured
                                  multi-programming languages  . . . . . . 16--32
                   John Wakerly   Book review: Review of 'The Computers
                                  that Saved Metropolis, by DC Comics and
                                  Radio Shack', July 1980  . . . . . . . . 33--34

ACM SIGARCH Computer Architecture News
Volume 9, Number 3, 1981

                     Arvind and   
                     V. Kathail   A Multiple Processor Data Flow Machine
                                  that Supports Generalized Procedures . . ??

ACM SIGARCH Computer Architecture News
Volume 9, Number 4, June, 1981

                  G. W. Gerrity   On processes and interrupts  . . . . . . 4--14
                 Dwight D. Hill   A hardware mechanism for supporting
                                  range checks . . . . . . . . . . . . . . 15--21
        Vladimir S. Cherniavsky   The computing memory another distributed
                                  computer architecture  . . . . . . . . . 22--24
              James E. Thornton   8th Annual Symposium on Computer
                                  Architecture: Heterogeneous Computer
                                  Architecture . . . . . . . . . . . . . . 25--33
Computer Architecture News Staff   Errata for two publications  . . . . . . 34--34

ACM SIGARCH Computer Architecture News
Volume 9, Number 5, August, 1981

              Donald C. Lindsay   Cache memory for microprocessors . . . . 6--13
                Krishna M. Kavi   Innovative architectures and commercial
                                  computers: a summary of the panel
                                  discussion at NCC 1981 . . . . . . . . . 14--16
             R. M. Jenevein and   
                 ?. DeGroot and   
               G. Jack Lipovski   Errata: ``A hardware support mechanism
                                  for scheduling resources in parallel
                                  machine environment'': (from Proceedings
                                  of the 8th Annual Symposium on Computer
                                  Architecture, p. 57) . . . . . . . . . . 17--17

ACM SIGARCH Computer Architecture News
Volume 9, Number 6, October, 1981

                     C. K. Yuen   Extending the power of short-wordlength
                                  processors by means of context-dependent
                                  machine instructions . . . . . . . . . . 9--15
             Allan Gottlieb and   
               Clyde P. Kruskal   Coordinating parallel processors: a
                                  partial unification  . . . . . . . . . . 16--24
                      Anonymous   Errata: Structured machine design: an
                                  ongoing experiment . . . . . . . . . . . 25--25

ACM SIGARCH Computer Architecture News
Volume 10, Number 1, January, 1982

               Charlie McDowell   Protection at the micromachine level . . 4--8
              Edward A. Feustel   Protected procedure call on the
                                  PRIME(TM) machines . . . . . . . . . . . 9--22
           Hossam El-Halabi and   
              Dharma P. Agrawal   Some remarks on direct execution
                                  computers  . . . . . . . . . . . . . . . 23--27
      Daniel T. Fitzpatrick and   
           John K. Foderaro and   
    Manolis G. H. Katevenis and   
          Howard A. Landman and   
         David A. Patterson and   
              James B. Peek and   
               Zvi Peshkess and   
            Carlo H. Séquin and   
        Robert W. Sherburne and   
             Korbin S. Van Dyke   A RISCy approach to VLSI . . . . . . . . 28--32

ACM SIGARCH Computer Architecture News
Volume 10, Number 2, March, 1982

                 Justin Rattner   Hardware/software cooperation in the
                                  iAPX-432 . . . . . . . . . . . . . . . . 1--1
              John Hennessy and   
              Norman Jouppi and   
             Forest Baskett and   
               Thomas Gross and   
                      John Gill   Hardware/software tradeoffs for
                                  increased performance  . . . . . . . . . 2--11
             James W. Rymarczyk   Coding guidelines for pipelined
                                  processors . . . . . . . . . . . . . . . 12--19
        Richard K. Johnsson and   
                   John D. Wick   An overview of the Mesa processor
                                  architecture . . . . . . . . . . . . . . 20--29
          Alan D. Berenbaum and   
          Michael W. Condry and   
                Priscilla M. Lu   The operating system and language
                                  support features of the BELLMAC(TM)-32
                                  microprocessor . . . . . . . . . . . . . 30--38
                   George Radin   The 801 minicomputer . . . . . . . . . . 39--47
            David R. Ditzel and   
                 H. R. McLellan   Register allocation for free: The C
                                  machine stack cache  . . . . . . . . . . 48--56
             Samuel P. Harbison   An architectural alternative to
                                  optimizing compilers . . . . . . . . . . 57--65
              Butler W. Lampson   Fast procedure calls . . . . . . . . . . 66--76
               Douglas W. Jones   Systematic protection mechanism design   77--80
                      Karl Reed   On a general property of memory mapping
                                  tables . . . . . . . . . . . . . . . . . 81--86
             Robert P. Cook and   
                    Nitin Donde   An experiment to improve operand
                                  addressing . . . . . . . . . . . . . . . 87--91
              Akira Fusaoka and   
              Masaharu Hirayama   Compiler chip: a hardware implementation
                                  of compiler  . . . . . . . . . . . . . . 92--95
                  B. R. Rau and   
              C. D. Glaeser and   
               E. M. Greenawalt   Architectural support for the efficient
                                  generation of code for horizontal
                                  architectures  . . . . . . . . . . . . . 96--99
               R. E. McLear and   
          D. M. Scheibelhut and   
                     E. Tammaru   Guidelines for creating a debuggable
                                  processor  . . . . . . . . . . . . . . . 100--106
                   M. V. Wilkes   Hardware support for memory protection:
                                  Capability implementations . . . . . . . 107--116
            Fred J. Pollack and   
              George W. Cox and   
         Dan W. Hammerstrom and   
              Kevin C. Kahn and   
              Konrad K. Lai and   
              Justin R. Rattner   Supporting Ada memory management in the
                                  iAPX-432 . . . . . . . . . . . . . . . . 117--131
            J. P. Sansonnet and   
                  M. Castan and   
               C. Percebois and   
                 D. Botella and   
                       J. Perez   Direct execution of Lisp on a
                                  list-directed architecture . . . . . . . 132--139
             Mark Scott Johnson   Some requirements for architectural
                                  support of software debugging  . . . . . 140--148
               C. A. Middelburg   The effect of the PDP-11 architecture on
                                  code generation for CHILL  . . . . . . . 149--157
           Richard E. Sweet and   
          James G. Sandman, Jr.   Empirical analysis of the Mesa
                                  instruction set  . . . . . . . . . . . . 158--166
                  Gene McDaniel   An analysis of a Mesa instruction set
                                  using dynamic instruction frequencies    167--176
               Cheryl A. Wiecek   A case study of VAX-11 instruction set
                                  usage for compiler execution . . . . . . 177--184
             Mamoru Maekawa and   
               Ken Sakamura and   
                Chiaki Ishikawa   Firmware structure and architectural
                                  support for monitors, vertical migration
                                  and user microprogramming  . . . . . . . 185--194
             N. Kamibayashi and   
                 H. Ogawana and   
                K. Nagayama and   
                        H. Aiso   Heart: an operating system nucleus
                                  machine implemented by firmware  . . . . 195--204
            Sudhir R. Ahuja and   
                 Abhaya Asthana   A multi-microprocessor architecture with
                                  hardware support for communication and
                                  scheduling . . . . . . . . . . . . . . . 205--209

ACM SIGARCH Computer Architecture News
Volume 10, Number 3, April, 1982

         David A. Patterson and   
              Richard S. Piepho   RISC assessment: a high-level language
                                  experiment . . . . . . . . . . . . . . . 3--8
           Douglas W. Clark and   
                  Henry M. Levy   Measurement and analysis of instruction
                                  use in the VAX-11/780  . . . . . . . . . 9--17
               Krishna Kavi and   
      Boumediene Belkhouche and   
             Evelyn Bullard and   
             Lois Delcambre and   
                Stephen Nemecek   HLL architectures: Pitfalls and
                                  predilections  . . . . . . . . . . . . . 18--23
             Allan Gottlieb and   
             Ralph Grishman and   
           Clyde P. Kruskal and   
         Kevin P. McAuliffe and   
              Larry Rudolph and   
                      Marc Snir   The NYU Ultracomputer---designing a
                                  MIMD, shared-memory parallel machine
                                  (extended abstract)  . . . . . . . . . . 27--42
              King-Hang Chu and   
                    King-Sun Fu   VLSI architectures for high speed
                                  recognition of context-free languages
                                  and finite-state languages . . . . . . . 43--49
           Mark A. Franklin and   
                 Donald F. Wann   Asynchronous and clocked control
                                  structures for VLSI based
                                  interconnection networks . . . . . . . . 50--59
         Robert J. McMillen and   
              Howard Jay Siegel   Performance and fault tolerance
                                  improvements in the Inverse Augmented
                                  Data Manipulator network . . . . . . . . 63--72
               D. S. Parker and   
              C. S. Raghavendra   The Gamma network: a multiprocessor
                                  interconnection network with redundant
                                  paths  . . . . . . . . . . . . . . . . . 73--80
             R. M. Jenevein and   
                   J. C. Browne   A control processor for a reconfigurable
                                  array computer . . . . . . . . . . . . . 81--89
            Laxmi N. Bhuyan and   
              Dharma P. Agrawal   A general class of processor
                                  interconnection strategies . . . . . . . 90--98
                F. J. Burkowski   Instruction set design issues relating
                                  to a static dataflow computer  . . . . . 101--111
                 James E. Smith   Decoupled access/execute computer
                                  architectures  . . . . . . . . . . . . . 112--119
           L. J. Caluwaerts and   
                J. Debacker and   
             J. A. Peperstraete   A data flow architecture with a paged
                                  memory system  . . . . . . . . . . . . . 120--127
         B. Ramakrishna Rau and   
     Christopher D. Glaeser and   
              Raymond L. Picard   Efficient code generation for horizontal
                                  architectures: Compiler techniques and
                                  architectural support  . . . . . . . . . 131--139
                 Gene C. Barton   Sentry: a novel hardware implementation
                                  of classic operating system mechanisms   140--147
              M. Abramovici and   
             Y. H. Levendel and   
                    P. R. Menon   A logic simulation machine . . . . . . . 148--157
           Subrata Dasgupta and   
                Marius Olafsson   Towards a family of languages for the
                                  design and implementation of machine
                                  architectures  . . . . . . . . . . . . . 158--167
              Yann-Hang Lee and   
                   Kang G. Shin   Rollback propagation detection and
                                  performance evaluation of FTMR2M---a
                                  fault-tolerant multiprocessor  . . . . . 171--180
                   Woei Lin and   
                   Chuan-lin Wu   Design of a $ 2 \times 2 $
                                  fault-tolerant switching element . . . . 181--189
             Donald Fussell and   
                   Peter Varman   Fault-tolerant wafer-scale architectures
                                  for VLSI . . . . . . . . . . . . . . . . 190--198
                 Sakti Pramanik   Database filters . . . . . . . . . . . . 201--210
               Mario Tokoro and   
               Takashi Takizuka   On the semantic structure of information
                                  --- a proposal of the abstract storage
                                  architecture . . . . . . . . . . . . . . 211--217
              Yasunori Dohi and   
               Akira Suzuki and   
                Noriyuki Matsui   Hardware sorter and its application to
                                  data base machine  . . . . . . . . . . . 218--225
        Philip C. Treleaven and   
             Richard P. Hopkins   A recursive computer architecture for
                                  VLSI . . . . . . . . . . . . . . . . . . 229--238
                  M. Castan and   
                 E. I. Organick   $ \mu $3L: an HLL-RISC processor for
                                  parallel execution of FP-language
                                  programs . . . . . . . . . . . . . . . . 239--247
                      F. Hommes   The heap/substitution concept --- an
                                  implementation of functional operations
                                  on data structures for a reduction
                                  machine  . . . . . . . . . . . . . . . . 248--256
          Paul F. Reynolds, Jr.   A shared resource algorithm for
                                  distributed simulation . . . . . . . . . 259--266
               Bijendra N. Jain   Duplication of packets and their
                                  detection in X.25 communication
                                  protocols  . . . . . . . . . . . . . . . 267--273
            Pauline Markenscoff   A multiple processor system for real
                                  time control tasks . . . . . . . . . . . 274--280
             Leslie Jill Miller   A heterogeneous multiprocessor design
                                  and the distributed scheduling of its
                                  task group workload  . . . . . . . . . . 283--290
            George H. Goble and   
               Michael H. Marsh   A dual processor VAX 11/780  . . . . . . 291--298
              Michel Dubois and   
                 Fayé A. Briggs   Effects of cache coherency in
                                  multiprocessors  . . . . . . . . . . . . 299--308
                T. N. Mudge and   
                 B. A. Makrucki   Probabilistic analysis of a crossbar
                                  switch . . . . . . . . . . . . . . . . . 311--320
          Steven P. Levitan and   
               Caxton C. Foster   Finding an extremum in a network . . . . 321--325
            U. V. Premkumar and   
                   J. C. Browne   Resource allocation in rectangular SW
                                  banyans  . . . . . . . . . . . . . . . . 326--333
                      Anonymous   List of authors  . . . . . . . . . . . . 335--335

ACM SIGARCH Computer Architecture News
Volume 10, Number 4, June, 1982

           Alastair J. W. Mayer   The architecture of the Burroughs B5000:
                                  20 years later and still ahead of the
                                  times? . . . . . . . . . . . . . . . . . 3--10
            James C. Brakefield   From the other side of the Atlantic: how
                                  to improve upon the MU5 design . . . . . 11--16
             Paul M. Hansen and   
             Mark A. Linton and   
             Robert N. Mayo and   
          Marguerite Murphy and   
             David A. Patterson   A performance evaluation of the Intel
                                  iAPX 432 . . . . . . . . . . . . . . . . 17--26
                  Miquel Huguet   The protection of the processor status
                                  word of the PDP-11/60  . . . . . . . . . 27--30
               James Brakefield   Just what is an op-code?: or a universal
                                  computer design  . . . . . . . . . . . . 31--34

ACM SIGARCH Computer Architecture News
Volume 10, Number 5, September, 1982

                J. D. Knott and   
                 T. W. Crockett   Fair dynamic arbitration for a
                                  multiprocessor communications bus  . . . 4--9
                 James R. Larus   A comparison of microcode, assembly
                                  code, and high-level languages on the
                                  VAX-11 and RISC I  . . . . . . . . . . . 10--15
             David A. Patterson   A performance evaluation of the Intel
                                  80286  . . . . . . . . . . . . . . . . . 16--18
                       Rod Egan   The effect of VLSI on computer
                                  architecture . . . . . . . . . . . . . . 19--22
                  Thomas Benzie   Book reviews: Review of
                                  Microcomputer Architecture and
                                  Programming by John F. Wakerly, John
                                  Wiley & Sons, Inc., 1981  . . . . . . . . 23--23

ACM SIGARCH Computer Architecture News
Volume 10, Number 6, December, 1982

              Henry M. Levy and   
               Douglas W. Clark   On the use of benchmarks for measuring
                                  system performance . . . . . . . . . . . 5--8
           Peter Schulthess and   
                 Fritz Vonaesch   OPA: a new architecture for Pascal-like
                                  languages  . . . . . . . . . . . . . . . 9--20
            James C. Brakefield   Talk on interpreters . . . . . . . . . . 21--28
                    D. W. Doran   Main frame computer trends . . . . . . . 29--44

ACM SIGARCH Computer Architecture News
Volume 11, Number 1, March, 1983

              Daniel Gajski and   
                 David Kuck and   
              Duncan Lawrie and   
                    Ahmed Sameh   CEDAR: a large scale multiprocessor  . . 7--11
              Elaine French and   
                    Hugh Glaser   TUKI: a data flow processor  . . . . . . 12--18
                  Nenad Marovac   A systematic approach to the design and
                                  implementation of a computer instruction
                                  set  . . . . . . . . . . . . . . . . . . 19--24
                  Harvey Cragon   Executable instruction set specification 25--43
          Robert P. Colwell and   
       Charles Y. Hitchcock and   
              E. Douglas Jensen   Peering through the RISC/CISC fog: an
                                  outline of research  . . . . . . . . . . 44--50
                 G. W. Gorsline   Review of Advances in Computer
                                  Architecture by Glenford J. Myers, John
                                  Wiley & Sons, Inc. 1982 . . . . . . . . . 55--55
                    M. W. Sachs   Book reviews: Review of
                                  Microcomputer Interfacing by G. Jack
                                  Lipovski, Lexington Books 1980 . . . . . 55--55

ACM SIGARCH Computer Architecture News
Volume 11, Number 2, June, 1983

             David Abramson and   
                 John Rosenberg   Hardware support for program debuggers
                                  in a paged virtual memory  . . . . . . . 8--19
              Dennis J. Frailey   Word length of a computer architecture:
                                  definitions and applications . . . . . . 20--26
                 Lee A. Hollaar   Book reviews: Review of Computer
                                  Design by Glen G. Langdon, Computeach
                                  Press  . . . . . . . . . . . . . . . . . 27--28

ACM SIGARCH Computer Architecture News
Volume 11, Number 3, June, 1983

              Maurice V. Wilkes   Size, power, and speed (keynote address) 2--4
                    W. K. Giloi   Towards a taxonomy of computer
                                  architecture based on the machine data
                                  type view  . . . . . . . . . . . . . . . 6--15
             Algirdas Avižienis   Framework for a taxonomy of
                                  fault-tolerance attributes in computer
                                  systems  . . . . . . . . . . . . . . . . 16--21
              Björn Pehrson and   
                 Joachim Parrow   Caddie: an interactive design environment 24--31
               Subrata Dasgupta   On the verification of computer
                                  architectures using an architecture
                                  description language . . . . . . . . . . 32--38
                Richard M. King   Research on synthesis of concurrent
                                  computing systems (extended abstract)    39--46
            Allan L. Fisher and   
                 H. T. Kung and   
            Louis M. Monier and   
                  Yasunori Dohi   Architecture of the PSC---a programmable
                                  systolic chip  . . . . . . . . . . . . . 48--53
            Allan L. Fisher and   
                     H. T. Kung   Synchronizing large VLSI processor
                                  arrays . . . . . . . . . . . . . . . . . 54--58
               Robert A. Wagner   The Boolean Vector Machine [BVM] . . . . 59--66
           M. A. Bonuccelli and   
                    E. Lodi and   
                  F. Luccio and   
               P. Maestrini and   
                       L. Pagli   A VLSI tree machine for relational data
                                  bases  . . . . . . . . . . . . . . . . . 67--73
           L. J. Caluwaerts and   
                J. Debacker and   
             J. A. Peperstraete   Implementing streams on a data flow
                                  computer system with paged memory  . . . 76--83
                Joseph E. Requa   The Piecewise Data Flow architecture:
                                  control flow and register management . . 84--89
               Mario Tokoro and   
          J. R. Jagannathan and   
                Hideki Sunahara   On the working set concept for data-flow
                                  machines . . . . . . . . . . . . . . . . 90--97
           R. W. Marczyński and   
                    J. Milewski   A data driven system based on a
                                  microprogrammed processor module . . . . 98--106
         David A. Patterson and   
              Phil Garrison and   
                  Mark Hill and   
           Dimitris Lioupis and   
               Chris Nyberg and   
                 Tim Sippel and   
                Korbin Van Dyke   Architecture of a VLSI instruction cache
                                  for a RISC . . . . . . . . . . . . . . . 108--116
             Phil C. C. Yeh and   
             Janak H. Patel and   
             Edward S. Davidson   Performance of shared cache for
                                  parallel-pipelined computer systems  . . 117--123
               James R. Goodman   Using cache memory to reduce
                                  processor-memory traffic . . . . . . . . 124--131
             James E. Smith and   
               James R. Goodman   A study of instruction cache
                                  organizations and replacement policies   132--137
               Joseph A. Fisher   Very Long Instruction Word architectures
                                  and the ELI-512  . . . . . . . . . . . . 140--150
              Shinji Tomita and   
          Kiyoshi Shibayama and   
          Toshiaki Kitamura and   
           Toshiyuki Nakata and   
               Hiroshi Hagiwara   A user-microprogrammable, local host
                                  computer with low-level parallelism  . . 151--157
            Richard H. Gumpertz   Combining tags with error codes  . . . . 160--165
             Young Gil Park and   
                   Jung Wan Cho   Fault diagnosis of bit-slice processor   166--172
                 M. A. Fiol and   
                  I. Alegre and   
                 J. L. A. Yebra   Line digraph iterations and the (d,k)
                                  problem for directed graphs  . . . . . . 174--177
                  Eli Opper and   
             Miroslaw Malek and   
               G. Jack Lipovski   Resource allocation in rectangular
                                  CC-banyans . . . . . . . . . . . . . . . 178--184
                František Soviš   Uniform theory of the shuffle-exchange
                                  type permutation networks  . . . . . . . 185--191
             Vason P. Srini and   
                Jorge F. Asenjo   Analysis of Cray-1S architecture . . . . 194--206
                Harry F. Jordan   Performance measurements on HEP --- a
                                  pipelined MIMD computer  . . . . . . . . 207--212
             Hideharu Amano and   
           Takaichi Yoshida and   
                     Hideo Aiso   (SM)$^2$-Sparse Matrix Solving Machine . 213--220
        R. Kalyana Krishnan and   
            A. K. Rajasekar and   
                    C. S. Moghe   An experimental system for Computer
                                  Science instruction  . . . . . . . . . . 222--227
                  Klaus Kronlöf   Execution control and memory management
                                  of a Data Flow Signal Processor  . . . . 230--235
             Masasuke Kishi and   
           Hiroshi Yasuhara and   
              Yasusuke Kawamura   DDDP---a Distributed Data Driven
                                  Processor  . . . . . . . . . . . . . . . 236--242
          Naohisa Takahashi and   
                 Makoto Amamiya   A data flow processor array system:
                                  Design and analysis  . . . . . . . . . . 243--250
                Kenneth A. Pier   A retrospective on the Dorado, a
                                  high-performance personal computer . . . 252--269
                Robert J. Dugan   System/370 extended architecture: a
                                  program view of the channel subsystem    270--276
          Richard L. Norton and   
               Jacob A. Abraham   Adaptive interpretation as a means of
                                  exploiting complex instruction sets  . . 277--282
                Manoj Kumar and   
             Daniel M. Dias and   
                     J. R. Jump   Switching strategies in a class of
                                  packet switching networks  . . . . . . . 284--300
                Benjamin W. Wah   A comparative study of distributed
                                  resource sharing on multiprocessors  . . 301--308
              W. Kent Fuchs and   
           Jacob A. Abraham and   
                Kuang-Hua Huang   Concurrent error detection in VLSI
                                  interconnection networks . . . . . . . . 309--315
                W. K. Giloi and   
                        P. Behr   Hierarchical function distribution --- a
                                  design principle for advanced
                                  multicomputer architectures  . . . . . . 318--325
                  Luigi Stringa   EMMA --- an industrial experience on
                                  large multiprocessing architectures  . . 326--333
             Lars Philipson and   
                 Bo Nilsson and   
               Bjorn Breidegard   A communication structure for a
                                  multiprocessor computer with distributed
                                  global memory  . . . . . . . . . . . . . 334--340
             Hiromu Hayashi and   
              Akira Hattori and   
                  Haruo Akimoto   ALPHA---a high-performance LISP machine
                                  equipped with a new stack structure and
                                  garbage collection system  . . . . . . . 342--348
             Shinji Umeyama and   
                Koichiro Tamura   A parallel execution model of logic
                                  programs . . . . . . . . . . . . . . . . 349--355
         Claudia Schmittgen and   
                   Werner Kluge   A system architecture for the concurrent
                                  evaluation of applicative program
                                  expressions  . . . . . . . . . . . . . . 356--362
        Yoshinori Yamaguchi and   
                 Kenji Toda and   
                Toshitsugu Yuba   A performance evaluation of a Lisp-based
                                  data-driven machine (EM-3) . . . . . . . 363--369
             Steven L. Tanimoto   A pyramidal approach to parallel
                                  processing . . . . . . . . . . . . . . . 372--378
                 Gérard Gaillat   The design of a parallel processor for
                                  image processing on-board satellites: an
                                  application oriented approach  . . . . . 379--386
          Hitoshi Nishimura and   
               Hiroshi Ohno and   
                Toru Kawata and   
             Isao Shirakawa and   
                   Koichi Omura   Links-1 --- a parallel pipelined
                                  multimicrocomputer system for image
                                  creation . . . . . . . . . . . . . . . . 387--394
                T. Ericsson and   
               P. E. Danielsson   LIPP --- a SIMD multiprocessor
                                  architecture for image processing  . . . 395--400
            Philip C. Treleaven   The new generation of computer
                                  architecture . . . . . . . . . . . . . . 402--409
                Shunichi Uchida   Inference machine: From sequential to
                                  parallel . . . . . . . . . . . . . . . . 410--416
                 Tohru Moto-oka   Overview to the Fifth Generation
                                  Computer System project  . . . . . . . . 417--422
             Kunio Murakami and   
               Takeo Kakuta and   
         Nobuyoshi Miyazaki and   
          Shigeki Shibayama and   
                   Haruo Yokota   A relational data base machine: First
                                  step to knowledge base machine . . . . . 423--425
                     Arvind and   
             Robert A. Iannucci   A critique of multiprocessing von
                                  Neumann style  . . . . . . . . . . . . . 426--436

ACM SIGARCH Computer Architecture News
Volume 11, Number 4, September, 1983

                 Dwight D. Hill   An analysis of C machine support for
                                  other block-structured languages . . . . 6--16
                  Nenad Marovac   On interprocess interaction in
                                  distributed architectures  . . . . . . . 17--22
            Robert J. Schalkoff   Towards an efficient, dedicated
                                  architecture for a Digital Geometric
                                  Image Transformer (DGIT) . . . . . . . . 23--29
              Arieh Plotkin and   
                   Daniel Tabak   A Tree Structured Architecture for
                                  semantic gap reduction . . . . . . . . . 30--44

ACM SIGARCH Computer Architecture News
Volume 11, Number 5, December, 1983

              Maurice V. Wilkes   Keeping jump instructions out of the
                                  pipeline of a RISC-like computer . . . . 5--7
                   Jeremy Jones   Puzzling with microcode  . . . . . . . . 8--12
                  Wayne Amsbury   A code-splitting algorithm . . . . . . . 13--21
               Jack J. Dongarra   Performance of various computers using
                                  standard linear equations software in a
                                  Fortran environment  . . . . . . . . . . 22--27
                  M. R. Bhujade   On the design of Always Compatible
                                  Instruction Set Architecture (ACISA) . . 28--30

ACM SIGARCH Computer Architecture News
Volume 12, Number 1, March, 1984

                    J. L. Heath   Re-evaluation of the RISC I  . . . . . . 3--10
             David A. Patterson   RISC watch . . . . . . . . . . . . . . . 11--19
                 Michael Beeler   Beyond the Baskett benchmark . . . . . . 20--31
              Edward A. Feustel   Process exchange on the PR1ME family of
                                  computers  . . . . . . . . . . . . . . . 32--43
                  P. M. Fenwick   Addressing operations for automatic data
                                  structure accessing  . . . . . . . . . . 44--57
                     C. K. Yuen   Some applications of the implicit
                                  register reference . . . . . . . . . . . 58--63
            Krishna M. Kavi and   
                K. Krishnamohan   Architecture quality . . . . . . . . . . 64--72

ACM SIGARCH Computer Architecture News
Volume 12, Number 2, June, 1984

          Dharma P. Agrawal and   
            Winser E. Alexander   B-HIVE: a heterogeneous, interconnected,
                                  versatile and expandable multicomputer
                                  system . . . . . . . . . . . . . . . . . 7--13

ACM SIGARCH Computer Architecture News
Volume 12, Number 3, June, 1984

                F. J. Burkowski   A vector and array multiprocessor
                                  extension of the sylvan architecture . . 4--11
          Alejandro Kapauan and   
           J. Timothy Field and   
           Dennis B. Gannon and   
                Lawrence Snyder   The Pringle parallel computer  . . . . . 12--20
             Mehrad Yasrebi and   
                 G. J. Lipovski   A state-of-the-art SIMD two-dimensional
                                  FFT array processor  . . . . . . . . . . 21--27
                   Y. W. Ma and   
                R. Krishnamurti   The architecture of Replica: a
                                  special-purpose computer system for
                                  active multi-sensory perception of
                                  $3$-dimensional objects  . . . . . . . . 30--37
           Samuel M. Goldwasser   A generalized object display processor
                                  architecture . . . . . . . . . . . . . . 38--47
           Katsura Kawakami and   
               Shigeo Shimazaki   A special purpose LSI processor using
                                  the DDA algorithm for image
                                  transformation . . . . . . . . . . . . . 48--54
            Benjamin W. Wah and   
                 Guo-Jie Li and   
                    Chee-Fen Yu   The status of MANIP --- a multicomputer
                                  architecture for solving combinatorial
                                  extremum-search problems . . . . . . . . 56--63
          R. Gonzalez-Rubio and   
                  J. Rohmer and   
                      D. Terral   The SCHUSS filter: a processor for
                                  non-numerical data processing  . . . . . 64--73
               Carl Ebeling and   
                   Andrew Palay   The design and implementation of a VLSI
                                  chess move generator . . . . . . . . . . 74--80
                 Manjai Lee and   
                   Chuan-lin Wu   Performance analysis of circuit
                                  switching, baseline interconnection
                                  networks . . . . . . . . . . . . . . . . 82--90
           Clyde P. Kruskal and   
                      Marc Snir   The importance of being square . . . . . 91--98
              Chi-Yuan Chin and   
                      Kai Hwang   Connection principles for multipath,
                                  packet switching networks  . . . . . . . 99--108
               Shlomo Weiss and   
                 James E. Smith   Instruction issue logic for pipelined
                                  supercomputers . . . . . . . . . . . . . 110--118
            Robert G. Wedig and   
                   Marc A. Rose   The reduction of branch instruction
                                  execution overhead using structured
                                  control flow . . . . . . . . . . . . . . 119--125
             Utpal Banerjee and   
               Daniel D. Gajski   Fast execution of loops with if
                                  statements . . . . . . . . . . . . . . . 126--132
              Daniel Gajski and   
                    Won Kim and   
                 Shinya Fushimi   A parallel pipelined relational query
                                  processor: an architectural overview . . 134--141
             Arun K. Somani and   
               Vinod K. Agarwal   An efficient VLSI dictionary machine . . 142--150
                Allan L. Fisher   Dictionary machines with a small number
                                  of processors  . . . . . . . . . . . . . 151--156
               Mark D. Hill and   
                 Alan Jay Smith   Experimental evaluation of on-chip
                                  microprocessor cache memories  . . . . . 158--166
           James R. Goodman and   
                Men-chow Chiang   The use of static column RAM as a memory
                                  hierarchy  . . . . . . . . . . . . . . . 167--173
                  I. J. Haikala   Cache hit ratios with geometric task
                                  switch intervals . . . . . . . . . . . . 175--175
            Yutaka Ishikawa and   
                   Mario Tokoro   The design of an object oriented
                                  architecture . . . . . . . . . . . . . . 178--187
                David Ungar and   
                 Ricki Blau and   
                Peter Foley and   
               Dain Samples and   
                David Patterson   Architecture of SOAR: Smalltalk on a
                                  RISC . . . . . . . . . . . . . . . . . . 188--197
                Pradip Bose and   
             Edward S. Davidson   Design of instruction set architectures
                                  for support of high-level languages  . . 198--206
                Patrice Quinton   Automatic synthesis of systolic arrays
                                  from uniform recurrent equations . . . . 208--214
           Chang nian Zhang and   
                David Y. Y. Yun   Multi-dimensional systolic networks for
                                  Discrete Fourier Transform . . . . . . . 215--222
            J. A. B. Fortes and   
                 D. I. Moldovan   Data broadcasting in linearly scheduled
                                  array processors . . . . . . . . . . . . 224--231
         I. V. Ramakrishnan and   
                   P. J. Varman   Modular matrix multiplication on a
                                  linear array . . . . . . . . . . . . . . 232--238
                   T. R. N. Rao   Joint encryption and error correction
                                  schemes  . . . . . . . . . . . . . . . . 240--241
                     Bella Bose   Unidirectional error
                                  correction/detection for VLSI memory . . 242--244
                     C. L. Chen   Error-correcting codes for semiconductor
                                  memories . . . . . . . . . . . . . . . . 245--247
       Khaled Abdel Ghaffar and   
             Robert J. McEliece   Soft error correction for increased
                                  densities in VLSI memories . . . . . . . 248--250
            Richard M. King and   
               Robert A. Wagner   Combining speed with alpha-particle
                                  induced memory error tolerance in a
                                  large Boolean vector machine . . . . . . 251--253
                Laxmi N. Bhuyan   On the performance of loosely coupled
                                  multiprocessors  . . . . . . . . . . . . 256--262
              Ravi Mehrotra and   
             Sarosh N. Talukdar   Scheduling of tasks for distributed
                                  processors . . . . . . . . . . . . . . . 263--270
            Krishna M. Kavi and   
           Edward W. Banios and   
               Bruce D. Shriver   Message repository definitional
                                  facility: an architectural model for
                                  interprocess communication . . . . . . . 271--278
        Prithviraj Banerjee and   
               Jacob A. Abraham   Fault-secure algorithms for
                                  multiple-processor systems . . . . . . . 279--287
                    Lubomir Bic   Execution of logic programs on a
                                  dataflow architecture  . . . . . . . . . 290--296
                 W. G. Rudd and   
            Duncan A. Buell and   
            Donald M. Chiarulli   A high performance factoring machine . . 297--300
               Joel S. Emer and   
               Douglas W. Clark   A characterization of processor
                                  performance in the VAX-11/780  . . . . . 301--310
              W. D. Moeller and   
                     G. Sandweg   The peripheral processor PP4, a highly
                                  regular VLSI processor . . . . . . . . . 312--318
                 Lars Philipson   VLSI based design principles for MIMD
                                  multiprocessor computers with
                                  distributed memory management  . . . . . 319--327
             M. R. Samatham and   
                  D. K. Pradhan   A multiprocessor network suitable for
                                  single-chip VLSI implementation  . . . . 328--339
              Larry Rudolph and   
                    Zary Segall   Dynamic decentralized cache schemes for
                                  MIMD parallel processors . . . . . . . . 340--347
         Mark S. Papamarcos and   
                 Janak H. Patel   A low-overhead coherence solution for
                                  multiprocessors with private cache
                                  memories . . . . . . . . . . . . . . . . 348--354
            James Archibald and   
                 Jean-Loup Baer   An economical solution to the cache
                                  coherence problem  . . . . . . . . . . . 355--362
               Ilkka J. Haikala   Cache hit ratios with geometric task
                                  switch intervals . . . . . . . . . . . . 364--371

ACM SIGARCH Computer Architecture News
Volume 12, Number 4, September, 1984

              Gilman D. Chesley   A wafer microcomputer  . . . . . . . . . 4--6
          Howard Jay Siegel and   
         Thomas Schwederski and   
      Nathaniel J. Davis IV and   
                 James T. Kuehn   PASM: a reconfigurable parallel system
                                  for image processing . . . . . . . . . . 7--19

ACM SIGARCH Computer Architecture News
Volume 12, Number 5, December, 1984

                   Javaid Aslam   Methodology for designing a computer
                                  architecture . . . . . . . . . . . . . . 4--11
             Peter C. J. Graham   Providing architectural support for
                                  expert systems . . . . . . . . . . . . . 12--18

ACM SIGARCH Computer Architecture News
Volume 13, Number 1, March, 1985

               Jack J. Dongarra   Performance of various computers using
                                  standard linear equations software in a
                                  Fortran environment  . . . . . . . . . . 3--11
                  T. M. Hor and   
                     C. K. Yuen   The design and programming of a powerful
                                  short wordlength processor using
                                  context-dependent machine instructions   12--26
                     E. N. Miya   Multiprocessor/distributed processing
                                  bibliography (in machine-readable form)  27--29

ACM SIGARCH Computer Architecture News
Volume 13, Number 2, June, 1985

                     Weiming Hu   Dataflow architecture for EEG patient
                                  monitor  . . . . . . . . . . . . . . . . 3--10
                     A. G. Tagg   Speculations on the evolution of an
                                  architecture . . . . . . . . . . . . . . 11--18
                  Brian Randell   Hardware/software tradeoffs: a general
                                  design principle?  . . . . . . . . . . . 19--21

ACM SIGARCH Computer Architecture News
Volume 13, Number 3, June, 1985

       V. K. Prasanna Kumar and   
              C. S. Raghavendra   Array processor with multiple
                                  broadcasting . . . . . . . . . . . . . . 2--10
                    G. Wolf and   
                     J. R. Jump   Matrix multiplication in an interleaved
                                  array processing architecture  . . . . . 11--17
              J. R. Goodman and   
              Jian-tu Hsieh and   
               Koujuch Liou and   
         Andrew R. Pleszkun and   
            P. B. Schechter and   
               Honesty C. Young   PIPE: a VLSI decoupled architecture  . . 20--27
            Peter Y. T. Hsu and   
           Joseph T. Rahmeh and   
         Edward S. Davidson and   
               Jacob A. Abraham   TIDBITS: speedup via time-delay
                                  bit-slicing in ALU design for VLSI
                                  technology . . . . . . . . . . . . . . . 29--35
             James E. Smith and   
             Andrew R. Pleszkun   Implementation of precise interrupts in
                                  pipelined processors . . . . . . . . . . 36--44
             Herb Schwetman and   
              Daniel Gajski and   
              Dennis Gannon and   
               Daniel Hills and   
             Jacob Schwartz and   
                   James Browne   Classification of parallel processor
                                  architectures (invited tutorial session) 45--45
            Makoto Hasegawa and   
               Yoshiharu Shigei   High-speed top-of-stack scheme for VLSI
                                  processor: a management algorithm and
                                  its analysis . . . . . . . . . . . . . . 48--54
   Charles Y. Hitchcock III and   
          H. M. Brinkley Sprunt   Analyzing multiple register sets . . . . 55--63
                 Alan Jay Smith   Cache evaluation and the impact of
                                  workload choice  . . . . . . . . . . . . 64--73
                  David A. Moon   Architecture of the Symbolics 3600 . . . 76--83
                 Ashwin Ram and   
                 Janak H. Patel   Parallel garbage collection without
                                  synchronization overhead . . . . . . . . 84--90
           Gurindar S. Sohi and   
         Edward S. Davidson and   
                 Janak H. Patel   An efficient LISP-execution architecture
                                  with a new representation for list
                                  structures . . . . . . . . . . . . . . . 91--98
             Hideharu Amano and   
               Taisuke Boku and   
             Tomohiro Kudoh and   
                     Hideo Aiso   $ (SM)^2 $-II: a new version of the sparse
                                  matrix solving machine . . . . . . . . . 100--107
                John Beetem and   
              Monty Denneau and   
                 Don Weingarten   The GF11 supercomputer . . . . . . . . . 108--115
       Bradley Warren Smith and   
              Howard Jay Siegel   Models for use in the design of
                                  macro-pipelined parallel processors  . . 116--123
                  Jan Edler and   
             Allan Gottlieb and   
           Clyde P. Kruskal and   
         Kevin P. McAuliffe and   
              Larry Rudolph and   
                  Marc Snir and   
         Patricia J. Teller and   
                   James Wilson   Issues related to MIMD shared-memory
                                  computers: the NYU Ultracomputer
                                  approach . . . . . . . . . . . . . . . . 126--135
               R. N. Ibbett and   
                P. C. Capon and   
                   N. P. Topham   MU6V: a parallel vector processing
                                  system . . . . . . . . . . . . . . . . . 136--144
           Stephen F. Lundstrom   A decentralized control, highly
                                  concurrent multiprocessor  . . . . . . . 145--151
           William J. Dally and   
                James T. Kajiya   An object oriented architecture  . . . . 154--161
        Edward F. Gehringer and   
                J. Leslie Keedy   Tagged architecture: how compelling are
                                  its advantages?  . . . . . . . . . . . . 162--170
                   S. Nanba and   
                    N. Ohno and   
                    H. Kubo and   
                 H. Morisue and   
                 T. Ohshima and   
                   H. Yamagishi   VM/4: ACOS-4 virtual machine
                                  architecture . . . . . . . . . . . . . . 171--178
                T. P. Dobry and   
              A. M. Despain and   
                     Y. N. Patt   Performance studies of a Prolog machine
                                  architecture . . . . . . . . . . . . . . 180--190
            Ryosei Nakazaki and   
           Akihiko Konagaya and   
           Shin'ichi Habata and   
              Hideo Shimazu and   
             Mamoru Umemura and   
          Masahiro Yamamoto and   
              Minoru Yokota and   
              Takashi Chikayama   Design of a high-speed Prolog machine
                                  (HPM)  . . . . . . . . . . . . . . . . . 191--197
                   Nam Sung Woo   A hardware unification unit: design and
                                  analysis . . . . . . . . . . . . . . . . 198--205
               Nicholas Matelan   The FLEX/32 multicomputer  . . . . . . . 209--213
                     J. Rattner   Commercial multiprocessors (title only)  214--214
                    Dick Naedel   Closely coupled asynchronous
                                  hierarchical and parallel processing in
                                  an open architecture . . . . . . . . . . 215--220
                     Jim Savage   Parallel processing as a language design
                                  problem  . . . . . . . . . . . . . . . . 221--224
               David P. Rodgers   Improvements in multiprocessor system
                                  design . . . . . . . . . . . . . . . . . 225--231
                  Peter B. Mark   The Sequoia computer: a fault-tolerant
                                  tightly-coupled multiprocessor
                                  architecture . . . . . . . . . . . . . . 232--232
              Elliot Nestle and   
               Armond Inselberg   The SYNAPSE N+1 System: architectural
                                  characteristics and performance data of
                                  a tightly-coupled multiprocessor system  233--239
            Robert W. Horst and   
             Timothy C. K. Chou   An architecture for high volume
                                  transaction processing . . . . . . . . . 240--245
               Harold Stone and   
               Eric Manning and   
              Harriet Rigas and   
               Philip Treleaven   The fifth generation computer systems
                                  projects (invited session) . . . . . . . 247--247
              Shigeo Kamiya and   
             Susumu Matsuda and   
             Kazuhide Iwata and   
          Shigeki Shibayama and   
              Hiroshi Sakai and   
                 Kunio Murakami   A hardware pipeline algorithm for
                                  relational database operation  . . . . . 250--257
                    Dik Lun Lee   A distributed multiple-response resolver
                                  for value-order retrieval  . . . . . . . 258--265
                   John Feo and   
               Roy Jenevein and   
                   J. C. Browne   Dynamic, distributed resource
                                  configuration on SW-banyans  . . . . . . 268--275
                 R. H. Katz and   
               S. J. Eggers and   
                 D. A. Wood and   
              C. L. Perkins and   
                  R. G. Sheldon   Implementing a cache consistency
                                  protocol . . . . . . . . . . . . . . . . 276--283
                 Zhiyuan Li and   
                Walid Abu-Sufah   A technique for reducing synchronization
                                  overhead in large scale multiprocessors  284--291
          Colin Whitby-Strevens   The transputer . . . . . . . . . . . . . 292--300
               A. R. Hurson and   
                     B. Shirazi   A systolic multiplier unit and its VLSI
                                  design . . . . . . . . . . . . . . . . . 302--309
                    Rami Melhem   A language for the simulation of
                                  systolic architectures . . . . . . . . . 310--314
         Henry Y. H. Chuang and   
                         Guo He   A versatile systolic array for matrix
                                  computations . . . . . . . . . . . . . . 315--322
                 Rex Vedder and   
                    Dennis Finn   The Hughes Data Flow Multiprocessor:
                                  architecture for efficient signal and
                                  data processing  . . . . . . . . . . . . 324--332
               Kenneth R. Traub   An abstract parallel graph reduction
                                  machine  . . . . . . . . . . . . . . . . 333--341
            Bruno R. Preiss and   
                 V. C. Hamacher   Data flow on a queue machine . . . . . . 342--351
                  J. L. Gaudiot   Methods for handling structures in
                                  data-flow systems  . . . . . . . . . . . 352--358
             M. R. Samatham and   
                  D. K. Pradhan   The de Bruijn multiprocessor network: a
                                  versatile sorting network  . . . . . . . 360--367
            Nian-Feng Tzeng and   
              Pen-Chung Yew and   
                    Chun-Qi Zhu   A fault-tolerant scheme for multistage
                                  interconnection networks . . . . . . . . 368--375
                V. P. Kumar and   
                    S. M. Reddy   Design and analysis of fault-tolerant
                                  multistage interconnection networks with
                                  low link complexity  . . . . . . . . . . 376--386
      Nathaniel J. Davis IV and   
              Howard Jay Siegel   The performance analysis of partitioned
                                  circuit switched multistage
                                  interconnection networks . . . . . . . . 387--394
          Dalibor Vrsalovic and   
        Edward F. Gehringer and   
             Zary Z. Segall and   
            Daniel P. Siewiorek   The influence of parallel decomposition
                                  strategies on the performance of
                                  multiprocessor systems . . . . . . . . . 396--405
            Walid Abu-Sufah and   
                   Alex Y. Kwok   Performance prediction tools for Cedar:
                                  a multiprocessor supercomputer . . . . . 406--413
     José M. Llabería Griñó and   
        Mateo Valero Cortés and   
      Enrique Herrada Lillo and   
           Jesús Labarta Mancho   Analysis and simulation of multiplexed
                                  single-bus networks with and without
                                  buffering  . . . . . . . . . . . . . . . 414--421
             J. Sanguinetti and   
                       B. Kumar   Performance of a message-based
                                  multiprocessor . . . . . . . . . . . . . 424--425

ACM SIGARCH Computer Architecture News
Volume 13, Number 4, September, 1985

                    J.-Fr. Hake   PDOC --- a database on parallel
                                  processing literature  . . . . . . . . . 2--7
                    Mark Rockey   The dataflow architecture: a suitable
                                  base for the implementation of expert
                                  systems  . . . . . . . . . . . . . . . . 8--14
               Harvey G. Cragon   An architecture design system  . . . . . 15--21
              Miquel Huguet and   
                     Tomás Lang   A reduced register file for RISC
                                  architectures  . . . . . . . . . . . . . 22--31

ACM SIGARCH Computer Architecture News
Volume 13, Number 5, December, 1985

        Cedell A. Alexander and   
        William M. Keshlear and   
                    Faye Briggs   Translation buffer performance in a UNIX
                                  environment  . . . . . . . . . . . . . . 2--14
                    Rosanna Lee   On ``hot spot'' contention . . . . . . . 15--20

ACM SIGARCH Computer Architecture News
Volume 14, Number 1, January, 1986

               Nam Sung Woo and   
                Richard O'Keefe   A comment on ``A hardware unification
                                  unit: design and analysis''  . . . . . . 2--3
                A. B. Ruighaver   Design aspects of the Delft Parallel
                                  Processor DPP84 and its programming
                                  system . . . . . . . . . . . . . . . . . 4--8
            Dan Hammerstrom and   
                David Maier and   
              Shreekant Thakkar   The Cognitive Architecture Project . . . 9--21
                 Alan Jay Smith   Bibliography and reading on CPU cache
                                  memories and related topics  . . . . . . 22--42

ACM SIGARCH Computer Architecture News
Volume 14, Number 2, June, 1986

                  H. Yokota and   
                        H. Itoh   A model and an architecture for a
                                  relational knowledge base  . . . . . . . 2--9
                 M. Amamiya and   
                 M. Takesue and   
                R. Hasegawa and   
                      H. Mikami   Implementation and evaluation of a
                                  list-processing-oriented data flow
                                  machine  . . . . . . . . . . . . . . . . 10--19
               K. Takahashi and   
                  H. Yamada and   
                   H. Nagai and   
                     K. Matsumi   A new string search hardware
                                  architecture for VLSI  . . . . . . . . . 20--27
                   A. Gupta and   
                   C. Forgy and   
                  A. Newell and   
                       R. Wedig   Parallel algorithms and architectures
                                  for rule-based systems . . . . . . . . . 28--37
        R. R. Halstead, Jr. and   
             T. L. Anderson and   
              R. B. Osborne and   
                 T. L. Sterling   Concert: design of a multiprocessor
                                  development system . . . . . . . . . . . 40--48
                     H. T. Kung   Memory requirements for balanced
                                  computer architectures . . . . . . . . . 49--54
                 Y. C. Hong and   
                T. H. Payne and   
              L. B. O. Ferguson   Graph allocation in static dataflow
                                  systems  . . . . . . . . . . . . . . . . 55--64
                 P. Agrawal and   
                     R. Agrawal   Software implementation of a recursive
                                  fault tolerance algorithm on a network
                                  of computers . . . . . . . . . . . . . . 65--72
                  T. Nojiri and   
                S. Kawasaki and   
                      K. Sakoda   Microprogrammable processor for
                                  object-oriented architecture . . . . . . 74--81
              S. S. Thakkar and   
                 W. E. Hostmann   An instruction fetch unit for a graph
                                  reduction machine  . . . . . . . . . . . 82--91
            E. F. Gehringer and   
                  R. P. Colwell   Fast object-oriented procedure calls:
                                  lessons from the Intel 432 . . . . . . . 92--101
                 D. M. Dias and   
                 B. R. Iyer and   
                       P. S. Yu   On coupling many small systems for
                                  transaction processing . . . . . . . . . 104--110
              M. I. Malkawi and   
                    J. H. Patel   Performance measurement of paging
                                  behavior in multiprogramming systems . . 111--118
                 A. Agarwal and   
                R. L. Sites and   
                    M. Horowitz   ATUM: a new technique for capturing
                                  address traces using microcode . . . . . 119--127
                     M. J. Wise   Experimenting with EPILOG: some results
                                  and preliminary conclusions  . . . . . . 119--127
               Y. Shobatake and   
                        H. Aiso   A unification processor based on a
                                  uniformly structured cellular hardware   128--139
                     N. Ito and   
                    M. Sato and   
                    E. Kuno and   
                    K. Rokusawa   The architecture and preliminary
                                  evaluation results of the experimental
                                  parallel inference machine PIM-D . . . . 149--156
                      A. Seznec   An efficient routing control for the
                                  SIGMA network $ \Sigma (4) $ . . . . . . 158--168
               J. D. Nicoud and   
                       K. Skala   REYSM, a high performance, low power
                                  multi-processor bus  . . . . . . . . . . 169--174
                  K. Y. Lee and   
                      W. Hegazy   The extra stage gamma network  . . . . . 175--182
                  M. Yuhara and   
                 A. Hattori and   
                    M. Niwa and   
               M. Kishimoto and   
                     H. Hayashi   Evaluation of the FACOM ALPHA Lisp
                                  machine  . . . . . . . . . . . . . . . . 184--190
             A. R. Pleszkun and   
          M. J. Thazhuthaveetil   An architecture for efficient Lisp list
                                  access . . . . . . . . . . . . . . . . . 191--198
                  T. Nakata and   
                       N. Koike   A functional level simulation engine of
                                  MAN-YO: a special purpose parallel
                                  machine for logic design automation  . . 202--208
                    E. H. Frank   Exploiting parallelism in a switch-level
                                  simulation machine . . . . . . . . . . . 209--215
         T. S. Anantharaman and   
                     R. Bisiani   A hardware accelerator for speech
                                  recognition algorithms . . . . . . . . . 216--223
                 T. Shimada and   
                  K. Hiraki and   
                 K. Nishida and   
                   S. Sekiguchi   Evaluation of a prototype data flow
                                  processor of the SIGMA-1 for scientific
                                  computations . . . . . . . . . . . . . . 226--234
                J. Sargeant and   
                  C. C. Kirkham   Stored data structures on the Manchester
                                  dataflow machine . . . . . . . . . . . . 235--242
                K. Kawakami and   
                     J. R. Gurd   A scalable dataflow structure store  . . 243--250
                M. Hasegawa and   
                      Y. Shigei   $ A T^2 = O(N \log^4 N), T = O(\log N) $
                                  Fast Fourier Transform in a light
                                  connected $3$-dimensional VLSI . . . . . 252--260
                K. Sapiecha and   
                     R. Jarocki   Modular architecture for high
                                  performance implementation of FFT
                                  algorithm  . . . . . . . . . . . . . . . 261--270
              J. J. Navarro and   
             J. M. Llaberia and   
                      M. Valero   Computing size-independent matrix
                                  problems on systolic array processors    271--278
                  S. Tomita and   
               K. Shibayama and   
                  T. Nakata and   
                   S. Yuasa and   
                    H. Hagiwara   A computer with low-level parallelism
                                  QA-2: its applications to $3$-D graphics
                                  and Prolog/Lisp machines . . . . . . . . 280--289
                    M. Hirayama   VLSI oriented asynchronous architecture  290--296
                     W. Hwu and   
                     Y. N. Patt   HPSm, a high performance restricted data
                                  flow architecture having minimal
                                  functionality  . . . . . . . . . . . . . 297--306
                   K. Onaga and   
                     T. Takechi   On design of rotary array communication
                                  and wavefront-driven algorithms for
                                  solving large-scale band-limited matrix
                                  equations  . . . . . . . . . . . . . . . 308--315
          L. M. Napolitano, Jr.   A computer architecture for dynamic
                                  finite element analysis  . . . . . . . . 316--323
           D. T. Harper III and   
                     J. R. Jump   Performance evaluation of vector
                                  accesses in parallel memories using a
                                  skewed storage scheme  . . . . . . . . . 324--328
                   T. Kondo and   
                T. Tsuchiya and   
                T. Kitamura and   
                Y. Sugiyama and   
                      T. Kimura   Pseudo MIMD array processor---AAP2 . . . 330--337
                   A. L. Fisher   Scan line array processors for image
                                  computation  . . . . . . . . . . . . . . 338--345
              M. Annaratone and   
                 E. Arnould and   
                   T. Gross and   
                 H. T. Kung and   
                      M. S. Lam   Warp architecture and implementation . . 346--356
                 D. A. Wood and   
               S. J. Eggers and   
                  G. Gibson and   
                 M. D. Hill and   
                J. M. Pendleton   An in-cache address translation
                                  mechanism  . . . . . . . . . . . . . . . 358--365
             D. R. Cheriton and   
           G. A. Slavenburg and   
                    P. D. Boyle   Software-controlled caches in the VMP
                                  multiprocessor . . . . . . . . . . . . . 366--374
              J. R. Goodman and   
                      W. C. Hsu   On the use of registers vs. cache to
                                  minimize memory traffic  . . . . . . . . 375--383
               P. Y. T. Hsu and   
                 E. S. Davidson   Highly concurrent scalar processing  . . 386--395
               S. McFarling and   
                    J. Hennessy   Reducing the cost of branches  . . . . . 396--403
               S. R. Kunkel and   
                    J. E. Smith   Optimal pipelining in supercomputers . . 404--411
                 P. Sweazey and   
                    A. J. Smith   A class of compatible cache consistency
                                  protocols and their support by the IEEE
                                  Futurebus  . . . . . . . . . . . . . . . 414--423
                   P. Bitar and   
                  A. M. Despain   Multiprocessor cache synchronization:
                                  issues, innovations, evolution . . . . . 424--433
                  M. Dubois and   
               C. Scheurich and   
                      F. Briggs   Memory access buffering in
                                  multiprocessors  . . . . . . . . . . . . 434--442
               G. S. Taylor and   
            P. N. Hilfinger and   
                J. R. Larus and   
            D. A. Patterson and   
                     B. G. Zorn   Evaluation of the SPUR Lisp architecture 444--452

ACM SIGARCH Computer Architecture News
Volume 14, Number 3, June, 1986

                   Nam Sung Woo   A reply to comments ``A Comment on `A
                                  Hardware Unification Unit: Design and
                                  Analysis' ''   . . . . . . . . . . . . . 2--4
               D. K. DuBose and   
              D. K. Fotakis and   
                       D. Tabak   A microcoded RISC  . . . . . . . . . . . 5--16
                 Tomás Lang and   
                  Miquel Huguet   Reduced register saving/restoring in
                                  single-window register files . . . . . . 17--26
             Larry O'Neal Rouse   The twisted double helix: a minimum
                                  distance architecture for 5th generation
                                  computing  . . . . . . . . . . . . . . . 27--33
               David M. Harland   A recursively microcodable tagged
                                  architecture . . . . . . . . . . . . . . 34--40
           Cedell Alexander and   
           William Keshlear and   
             Furrokh Cooper and   
                    Faye Briggs   Cache memory performance in a Unix
                                  environment  . . . . . . . . . . . . . . 41--61

ACM SIGARCH Computer Architecture News
Volume 14, Number 4, September, 1986

                   Roger Stokes   Traces for hardware verification . . . . 7--14
             Claudio Kirner and   
                Eduardo Marques   Design of a distributed system support
                                  based on a centralized parallel bus  . . 15--26
                Mary Jane Irwin   Secretary/Treasurer's Report . . . . . . 28--28

ACM SIGARCH Computer Architecture News
Volume 14, Number 5, December, 1986

           David M. Harland and   
                   Bruno Beloff   Microcoding an object-oriented
                                  instruction set  . . . . . . . . . . . . 3--12
              William Stallings   An annotated bibliography on reduced
                                  instruction set computers  . . . . . . . 13--19

ACM SIGARCH Computer Architecture News
Volume 15, Number 1, March, 1987

        Robert H. Halstead, Jr.   Overview of Concert MultiLisp: a
                                  multiprocessor symbolic computing system 5--14
                 Dave Patterson   A progress report on SPUR: February 1,
                                  1987 . . . . . . . . . . . . . . . . . . 15--21
                 A. Despain and   
                    Y. Patt and   
                   V. Srini and   
                   P. Bitar and   
                    W. Bush and   
                   C. Chien and   
                  W. Citrin and   
                   B. Fagin and   
                     W. Hwu and   
                  S. Melvin and   
                  R. McGeer and   
                 A. Singhal and   
                M. Shebanow and   
                     P. Van Roy   Aquarius . . . . . . . . . . . . . . . . 22--34
               Madhur Kohli and   
           Mark E. Giuliano and   
                    Jack Minker   An overview of the PRISM project . . . . 35--42
         M. V. Hermenegildo and   
                   R. A. Warren   Designing a high performance parallel
                                  logic programming system . . . . . . . . 43--52
              Jonathan W. Mills   Coming to grips with a RISC: a report of
                                  the progress of the LOW RISC design
                                  group  . . . . . . . . . . . . . . . . . 53--62
                    Brian Short   Use of instruction set simulators to
                                  evaluate the LOW RISC  . . . . . . . . . 63--67
               Kurt M. Gutzmann   Optimal dimension of hypercubes for
                                  sorting  . . . . . . . . . . . . . . . . 68--72
                 Gilman Chesley   Addressable WSI: a non-redundant
                                  approach . . . . . . . . . . . . . . . . 73--80
        Nripendra N. Biswas and   
                S. Srinivas and   
           Trishala Dharanendra   A centrally controlled shuffle network
                                  for reconfigurable and fault-tolerant
                                  architecture . . . . . . . . . . . . . . 81--87

ACM SIGARCH Computer Architecture News
Volume 15, Number 2, 1987

               D. R. Ditzel and   
                 H. R. McLellan   Branch folding in the CRISP
                                  microprocessor: reducing branch delay to
                                  zero . . . . . . . . . . . . . . . . . . 2--8
               J. A. DeRosa and   
                     H. M. Levy   An evaluation of branch architectures    10--16
                  W. W. Hwu and   
                     Y. N. Patt   Checkpoint repair for out-of-order
                                  execution machines . . . . . . . . . . . 18--26
                 G. S. Sohi and   
                   S. Vajapeyam   Instruction issue logic for
                                  high-performance, interruptible
                                  pipelined processors . . . . . . . . . . 27--34
                 J. Swensen and   
                        Y. Patt   Fast temporary storage for serial and
                                  parallel execution . . . . . . . . . . . 35--43
                    K. Wong and   
                 M. A. Franklin   Performance analysis and design of a
                                  logic simulation machine . . . . . . . . 46--55
                   K. Doshi and   
                      P. Varman   A modular systolic architecture for
                                  image convolutions . . . . . . . . . . . 56--63
                  S. Fujita and   
                  R. Aibara and   
               M. Yamashita and   
                          T. Ae   A template matching algorithm using
                                  optically-connected 3-D VLSI
                                  architecture . . . . . . . . . . . . . . 64--70
               B. Mendelson and   
                G. M. Silberman   Mapping data flow programs on a VLSI
                                  array of processors  . . . . . . . . . . 72--80
                  D. Ghosal and   
                   L. N. Bhuyan   Analytical modeling and architectural
                                  modifications of a dataflow computer . . 81--89
                     M. Takesue   A unified resource management and
                                  execution control mechanism for data
                                  flow machines  . . . . . . . . . . . . . 90--97
                     S. Abe and   
                  T. Bandoh and   
               S. Yamaguchi and   
                K. Kurosawa and   
                    K. Kiriyama   High performance integrated Prolog
                                  processor IPP  . . . . . . . . . . . . . 100--107
                B. S. Fagin and   
                  A. M. Despain   Performance studies of a parallel Prolog
                                  architecture . . . . . . . . . . . . . . 108--116
               P. L. Civera and   
               F. Maddaleno and   
            G. L. Piccinini and   
                     M. Zamboni   An experimental VLSI Prolog interpreter:
                                  preliminary measurements and results . . 117--126
                      O. Ridoux   Deterministic and stochastic modeling of
                                  parallel garbage collection: towards
                                  real-time criteria . . . . . . . . . . . 128--136
                     C. Sun and   
                         Y. Tsu   The sharing of environment in
                                  AND--OR-parallel execution of logic
                                  programs . . . . . . . . . . . . . . . . 137--144
                    A. Guha and   
              R. Ramnarayan and   
                    M. Derstine   Architectural issues in designing
                                  symbolic processors in optics  . . . . . 145--151
                   A. Varma and   
              C. S. Raghavendra   Rearrangeability of multistage
                                  shuffle/exchange networks  . . . . . . . 154--162
                 R. Beivide and   
                 E. Herrada and   
             J. L. Balcazar and   
                     J. Labarta   Optimized mesh-connected networks for
                                  SIMD and MIMD architectures  . . . . . . 163--170
           D. T. Harper III and   
                     J. R. Jump   Performance evaluation of reduced
                                  bandwidth multistage interconnection
                                  networks . . . . . . . . . . . . . . . . 171--175
            U. Ramachandran and   
                 M. Solomon and   
                      M. Vernon   Hardware support for interprocess
                                  communication  . . . . . . . . . . . . . 178--188
                W. J. Dally and   
                    L. Chao and   
                   A. Chien and   
                 S. Hassoun and   
                  W. Horwat and   
                  J. Kaplan and   
                    P. Song and   
                   B. Totty and   
                       S. Wills   Architecture of a message-driven
                                  processor  . . . . . . . . . . . . . . . 189--196
                       M. Kumar   Effect of storage allocation/reclamation
                                  methods on parallelism and storage
                                  requirements . . . . . . . . . . . . . . 197--205
                J. H. Chang and   
                    H. Chao and   
                          K. So   Cache design of a sub-micron CMOS
                                  System/370 . . . . . . . . . . . . . . . 208--213
                     M. Freeman   An architectural perspective on a memory
                                  access controller  . . . . . . . . . . . 214--223
                  K. Cheung and   
                    G. Sohi and   
                  K. Saluja and   
                     D. Pradhan   Organization and analysis of a
                                  gracefully-degrading interleaved memory
                                  system . . . . . . . . . . . . . . . . . 224--231
               C. Scheurich and   
                      M. Dubois   Correct memory operation of cache-based
                                  multiprocessors  . . . . . . . . . . . . 234--243
              A. W. Wilson, Jr.   Hierarchical cache/bus architecture for
                                  shared memory multiprocessors  . . . . . 244--252
                  R. L. Lee and   
                  P. C. Yew and   
                   D. H. Lawrie   Multiprocessor cache design
                                  considerations . . . . . . . . . . . . . 253--262
           R. J. Eickemeyer and   
                    J. H. Patel   Performance evaluation of multiple
                                  register sets  . . . . . . . . . . . . . 264--271
              T. J. Stanley and   
                    R. G. Wedig   A performance analysis of automatically
                                  managed top of stack buffers . . . . . . 272--281
                   B. Moore and   
                  A. Padegs and   
                   R. Smith and   
                    W. Buchholz   Concepts of the System/370 vector
                                  architecture . . . . . . . . . . . . . . 282--288
             A. R. Pleszkun and   
              J. R. Goodman and   
                  W. C. Hsu and   
               R. T. Joersz and   
                    G. Bier and   
                   P. Woest and   
                P. B. Schechter   WISQ: a restartable architecture using
                                  queues . . . . . . . . . . . . . . . . . 290--299
                    P. Chow and   
                    M. Horowitz   Architectural tradeoffs in the design of
                                  MIPS-X . . . . . . . . . . . . . . . . . 300--308
               D. R. Ditzel and   
             H. R. McLellan and   
                A. D. Berenbaum   The hardware architecture of the CRISP
                                  microprocessor . . . . . . . . . . . . . 309--319

ACM SIGARCH Computer Architecture News
Volume 15, Number 3, June, 1987

              Matthew Moore and   
               Charles McDowell   Bi-directional networks for large
                                  parallel processors  . . . . . . . . . . 3--4
                     Ian Kaplan   The LDF 100: a large grain dataflow
                                  parallel processor . . . . . . . . . . . 5--12
                   Stanley Lass   Wide channel computers . . . . . . . . . 13--16
                Reinder J. Bril   An implementation independent approach
                                  to cache memories  . . . . . . . . . . . 17--24
                Reinder J. Bril   On cacheability of lock-variables in
                                  tightly coupled multiprocessor systems   25--32

ACM SIGARCH Computer Architecture News
Volume 15, Number 4, September, 1987

                   J. K. Iliffe   A forward-looking method of Cache memory
                                  control  . . . . . . . . . . . . . . . . 4--10
      Amitava Bandyopadhyay and   
                  Yuan F. Zheng   Combining both microcode and hardwired
                                  control in RISC  . . . . . . . . . . . . 11--15
                    Martin Dowd   An example RISC vector machine
                                  architecture . . . . . . . . . . . . . . 16--22
           Sanjiv K. Bhatia and   
                 A. G. Starling   Multilayered Illiac network scheme . . . 23--31
                   Lothar Nowak   SAMP: a general purpose processor based
                                  on a self-timed VLIW structure . . . . . 32--39
          Peter J. Ashenden and   
            Chris J. Barter and   
                Chris D. Marlin   The Leopard workstation project  . . . . 40--51
               Y. P. Chiang and   
                M. L. Manwaring   Direct execution Lisp and cell memory    52--57
                    J. M. Terry   Flow-control machines: the structured
                                  execution architecture (SXA) . . . . . . 58--69

ACM SIGARCH Computer Architecture News
Volume 15, Number 5, October, 1987

                  Niklaus Wirth   Hardware architectures for programming
                                  languages and programming languages for
                                  hardware architectures . . . . . . . . . 2--8
                   Bob Beck and   
                 Bob Kasten and   
              Shreekant Thakkar   VLSI assist for a multiprocessor . . . . 10--20
            Roberto Bisiani and   
               Alessandro Forin   Architectural support for multilanguage
                                  parallel programming on heterogeneous
                                  systems  . . . . . . . . . . . . . . . . 21--30
             Richard Rashid and   
            Avadis Tevanian and   
              Michael Young and   
                David Golub and   
                   Robert Baron   Machine-independent virtual memory
                                  management for paged uniprocessor and
                                  multiprocessor architectures . . . . . . 31--39
              John R. Hayes and   
          Martin E. Fraeman and   
         Robert L. Williams and   
                 Thomas Zaremba   An architecture for the direct execution
                                  of the Forth programming language  . . . 42--49
           Peter Steenkiste and   
                  John Hennessy   Tags and type checking in LISP: hardware
                                  and software approaches  . . . . . . . . 50--59
           Jack W. Davidson and   
             Richard A. Vaughan   The effect of instruction set complexity
                                  on program size and memory performance   60--64
        Russell R. Atkinson and   
            Edward M. McCreight   The dragon processor . . . . . . . . . . 65--69
               James R. Goodman   Coherency for multiprocessor virtual
                                  address caches . . . . . . . . . . . . . 72--81
              T. A. Cargill and   
                 B. N. Locanthi   Cheap hardware support for software
                                  debugging and profiling  . . . . . . . . 82--83
             C. J. Georgiou and   
               S. L. Palmer and   
                P. L. Rosenfeld   An experimental coprocessor for
                                  implementing persistent objects on an
                                  IBM 4381 . . . . . . . . . . . . . . . . 84--87
      Daniel J. Magenheimer and   
                 Liz Peters and   
                Karl Pettis and   
                      Dan Zuras   Integer multiplication and division on
                                  the HP precision architecture  . . . . . 90--99
              David W. Wall and   
              Michael L. Powell   The Mahler experience: using an
                                  intermediate language as the machine
                                  description  . . . . . . . . . . . . . . 100--104
               Shlomo Weiss and   
                 James E. Smith   A study of scalar compilation techniques
                                  for pipelined supercomputers . . . . . . 105--109
            William R. Bush and   
            A. Dain Samples and   
                David Ungar and   
              Paul N. Hilfinger   Compiling Smalltalk-80 to a RISC . . . . 112--116
                    F. Chow and   
                 S. Correll and   
              M. Himelstein and   
                 E. Killian and   
                       L. Weber   How many addressing modes are enough?    117--121
                 Henry Massalin   Superoptimizer: a look at the smallest
                                  program  . . . . . . . . . . . . . . . . 122--126
                 Kazuo Taki and   
           Katsuto Nakajima and   
          Hiroshi Nakashima and   
                 Morihiro Ikeda   Performance and architectural evaluation
                                  of the PSI machine . . . . . . . . . . . 128--135
          Gaetano Borriello and   
        Andrew R. Cherenson and   
            Peter B. Danzig and   
              Michael N. Nelson   RISCs vs. CISCs for Prolog: a case study 136--145
            Richard B. Kieburtz   A RISC architecture for symbolic
                                  computation  . . . . . . . . . . . . . . 146--155
            David R. Ditzel and   
         Hubert R. McLellan and   
              Alan D. Berenbaum   Design tradeoffs to support the C
                                  programming language in the CRISP
                                  microprocessor . . . . . . . . . . . . . 158--163
         Charles P. Thacker and   
            Lawrence C. Stewart   Firefly: a multiprocessor workstation    164--172
               Douglas W. Clark   Pipelining and performance in the VAX
                                  8800 processor . . . . . . . . . . . . . 173--177
          Robert P. Colwell and   
              Robert P. Nix and   
          John J. O'Donnell and   
          David B. Papworth and   
                 Paul K. Rodman   A VLIW architecture for a trace
                                  scheduling compiler  . . . . . . . . . . 180--192
             Adam Levinthal and   
               Pat Hanrahan and   
              Mike Paquette and   
                     Jim Lawson   Parallel computers for graphics
                                  applications . . . . . . . . . . . . . . 193--198
                J. E. Smith and   
               G. E. Dermer and   
           B. D. Vanderwarn and   
              S. D. Klinger and   
                 C. M. Rozewski   The ZS-1 central processor . . . . . . . 199--204

ACM SIGARCH Computer Architecture News
Volume 15, Number 6, December, 1987

          E. E. E. Frietman and   
                A. B. Ruighaver   An electro-optic data communication
                                  system for the Delft parallel processor  2--8
              G. B. Shippen and   
                J. K. Archibald   A tagged token dataflow machine for
                                  computing small, iterative algorithms    9--18

ACM SIGARCH Computer Architecture News
Volume 16, Number 1, March, 1988

                      Clif Penn   Preface to the Special issue on Neural
                                  Networks . . . . . . . . . . . . . . . . 6--6
            Richard P. Lippmann   An introduction to computing with neural
                                  nets . . . . . . . . . . . . . . . . . . 7--25
          James A. Anderson and   
       Edward J. Wisniewski and   
               Susan R. Viscuso   Software for neural networks . . . . . . 26--36
                Simon Garth and   
                     Danny Pike   An integrated system for neural network
                                  simulations  . . . . . . . . . . . . . . 37--44
                  A. Jean Maren   Conference report: IEEE First
                                  International Conference on Neural
                                  Networks . . . . . . . . . . . . . . . . 45--46
               Jack J. Dongarra   Performance of various computers using
                                  standard linear equations software in a
                                  FORTRAN environment  . . . . . . . . . . 47--69
                    Wm. A. Wulf   The WM computer architecture . . . . . . 70--84
                   Daniel Tabak   Logarithmic indices for multiprocessor
                                  evaluation . . . . . . . . . . . . . . . 85--90
                    Martin Dowd   An example RISC vector machine
                                  architecture . . . . . . . . . . . . . . 91--99
                    Martin Dowd   RISC vector CPU's and crossbars in
                                  desktops . . . . . . . . . . . . . . . . 100--102
                   Stanley Lass   Multiple instructions/operands per
                                  access to cache memory . . . . . . . . . 103--103
                     Wanda Gass   Workshop report: synthesis of foo bars   104--108
               F. Joel Ferguson   Book Review: Logic Design Principles by
                                  Edward J. McCluskey, Prentice-Hall
                                  Publishers, Englewood Cliffs, New
                                  Jersey, 549 pp., $39.95  . . . . . . . . 109--109

ACM SIGARCH Computer Architecture News
Volume 16, Number 2, May, 1988

                   J. Ghosh and   
                       K. Hwang   Critical issues in mapping neural
                                  networks on message-passing
                                  multicomputers . . . . . . . . . . . . . 3--11
                Y. Takefuji and   
               R. Jannarone and   
                  Y. B. Cho and   
                        T. Chen   Multinomial conjunctoid statistical
                                  learning machines  . . . . . . . . . . . 12--17
                   A. Louri and   
                       K. Hwang   A bit-plane architecture for optical
                                  computing with two-dimensional symbolic
                                  substitution . . . . . . . . . . . . . . 18--27
                   S. Fiske and   
                    W. J. Dally   The reconfigurable arithmetic processor  30--36
             A. R. Pleszkun and   
                     G. S. Sohi   The performance potential of multiple
                                  functional unit processors . . . . . . . 37--44
                  W. W. Hwu and   
                    P. P. Chang   Exploiting parallel microprocessor
                                  microarchitectures with a compiler code
                                  generator  . . . . . . . . . . . . . . . 45--53
              G. D. McNiven and   
                 E. S. Davidson   Analysis of memory referencing behavior
                                  for design of local memories . . . . . . 56--63
           R. J. Eickemeyer and   
                    J. H. Patel   Performance evaluation of on-chip
                                  register and cache organizations . . . . 64--72
                 J.-L. Baer and   
                     W.-H. Wang   On the inclusion properties for
                                  multi-level cache hierarchies  . . . . . 73--80
                R. T. Short and   
                     H. M. Levy   A simulation study of two-level caches   81--88
                    E. Chow and   
                   H. Madan and   
                J. Peterson and   
                D. Grunwald and   
                        D. Reed   Hyperswitch network for the hypercube
                                  computer . . . . . . . . . . . . . . . . 90--99
               D. C. Winsor and   
                    T. N. Mudge   Analysis of bus hierarchies for
                                  multiprocessors  . . . . . . . . . . . . 100--107
                     S. Wei and   
                         G. Lee   Extra group network: a cost-effective
                                  fault-tolerant multistage
                                  interconnection network  . . . . . . . . 108--115
                   H. Jiang and   
                    K. C. Smith   A partial-multiple-bus computer
                                  structure with improved cost
                                  effectiveness  . . . . . . . . . . . . . 116--122
                  I. Watson and   
                   V. Woods and   
                  P. Watson and   
                  R. Banach and   
               M. Greenberg and   
                    J. Sargeant   Flagship: a parallel architecture for
                                  declarative programming  . . . . . . . . 124--130
                 R. A. Iannucci   Toward a dataflow/von Neumann hybrid
                                  architecture . . . . . . . . . . . . . . 131--140
               D. E. Culler and   
                         Arvind   Resource requirements of dataflow
                                  programs . . . . . . . . . . . . . . . . 141--150
                  B. Sprunt and   
                    D. Kirk and   
                         L. Sha   Priority-driven, preemptive I/O
                                  controllers for real-time systems  . . . 152--159
               S. B. Shukla and   
                  D. P. Agrawal   A kernel-independent, pipelined
                                  architecture for real-time 2-D
                                  convolution  . . . . . . . . . . . . . . 160--166
                     W. Liu and   
                  T.-F. Yeh and   
            W. E. Batchelor and   
                       R. Cavin   Exploiting bit level concurrency in
                                  real-time geometric feature extractions  167--174
                D. W. Clark and   
               P. J. Bannon and   
                   J. B. Keller   Measuring VAX 8800 performance with a
                                  histogram hardware monitor . . . . . . . 176--185
                R. L. Sites and   
                     A. Agarwal   Multiprocessor cache analysis using ATUM 186--195
                      S. Ng and   
                    D. Lang and   
                    R. Selinger   Trade-offs between devices and paths in
                                  achieving disk interleaving  . . . . . . 196--201
           K. Jainandunsing and   
               E. F. Deprettere   Design of a concurrent computer for
                                  solving systems of linear equations  . . 204--211
                   A. Wolfe and   
         M. Breternitz, Jr. and   
                C. Stephens and   
                 A. L. Ting and   
                 D. B. Kirk and   
       R. P. Bianchini, Jr. and   
                     J. P. Shen   The white dwarf: a high-performance
                                  application-specific processor . . . . . 212--222
              J. L. Gaudiot and   
                  C. M. Lin and   
                 M. Hosseiniyar   Solving partial differential equations
                                  in a data-driven multiprocessor
                                  environment  . . . . . . . . . . . . . . 223--230
                         D. Lee   Scrambled storage for parallel memory
                                  systems  . . . . . . . . . . . . . . . . 232--239
            V. Krishnaswamy and   
                   S. Ahuja and   
                N. Carriero and   
                   D. Gelernter   The architecture of a Linda coprocessor  240--249
                     H. T. Kung   Deadlock avoidance for systolic
                                  communication  . . . . . . . . . . . . . 252--260
                      K. So and   
                       V. Zecca   Cache performance of vector processors   261--268
               M. K. Vernon and   
                      U. Manber   Distributed round-robin and first-come
                                  first-serve protocols and their
                                  applications to multiprocessor bus
                                  arbitration  . . . . . . . . . . . . . . 269--279
                 A. Agarwal and   
                  R. Simoni and   
                J. Hennessy and   
                    M. Horowitz   An evaluation of directory schemes for
                                  cache coherence  . . . . . . . . . . . . 280--289
              S. Przybylski and   
                M. Horowitz and   
                    J. Hennessy   Performance tradeoffs in cache design    290--298
                  H. Cheong and   
               A. V. Veidenbaum   A cache coherence scheme with fast
                                  selective invalidation . . . . . . . . . 299--307
               M. K. Vernon and   
             E. D. Lazowska and   
                    J. Zahorjan   An accurate and efficient performance
                                  analysis technique for multiprocessor
                                  snooping cache-consistency protocols . . 308--315
                     D. Rau and   
            J. A. B. Fortes and   
                   H. J. Siegel   Destination tag routing techniques based
                                  on a state model for the IADM network    318--324
                  D. W. Kim and   
             G. J. Lipovski and   
                A. Hartmann and   
                    R. Jenevein   Regular CC-banyan networks . . . . . . . 325--332
             R. M. Jenevein and   
                     T. Mookken   Traffic analysis of rectangular
                                  SW-banyan networks . . . . . . . . . . . 333--342
                   Y. Tamir and   
                  G. L. Frazier   High-performance multi-queue buffers for
                                  VLSI communications switches . . . . . . 343--354
               B. R. Preiss and   
                 V. C. Hamacher   A cache-based message passing scheme for
                                  a shared-bus multiprocessor  . . . . . . 358--364
                    T. Boku and   
                  S. Nomura and   
                       H. Amano   IMPULSE: a high performance processing
                                  unit for multiprocessors for scientific
                                  calculation  . . . . . . . . . . . . . . 365--372
               S. J. Eggers and   
                     R. H. Katz   A characterization of sharing in
                                  parallel programs and its application to
                                  coherency protocol evaluation  . . . . . 373--382
             G. J. Lipovski and   
                     P. Vaughan   A fetch-and-op implementation for
                                  parallel computers . . . . . . . . . . . 384--392
                  A. Seznec and   
                       Y. Jégou   Synchronizing processors through memory
                                  requests in a tightly coupled
                                  multiprocessor . . . . . . . . . . . . . 393--400
             R. M. Fujimoto and   
                 J.-J. Tsai and   
              G. Gopalakrishnan   Design and performance of special
                                  purpose hardware for time warp . . . . . 401--409
             D. R. Cheriton and   
                   A. Gupta and   
                P. D. Boyle and   
                   H. A. Goosen   The VMP multiprocessor: initial
                                  experience, refinements, and performance
                                  evaluation . . . . . . . . . . . . . . . 410--421
              J. R. Goodman and   
                    P. J. Woest   The Wisconsin multicube: a new
                                  large-scale cache-coherent
                                  multiprocessor . . . . . . . . . . . . . 422--431
                        E. Tick   Data buffer performance for sequential
                                  Prolog architectures . . . . . . . . . . 434--442
        R. H. Halstead, Jr. and   
                      T. Fujita   MASA: a multithreaded processor
                                  architecture for parallel symbolic
                                  computing  . . . . . . . . . . . . . . . 443--451
               P. L. Butler and   
           J. D. Allen, Jr. and   
                  D. W. Bouldin   Parallel architecture for OPS5 . . . . . 452--457

ACM SIGARCH Computer Architecture News
Volume 16, Number 3, June, 1988

          David R. Cheriton and   
                  Pat Boyle and   
             Gert A. Slavenburg   Comments on ``Coherency for
                                  multiprocessor virtual addresses
                                  caches'' by James R. Goodman . . . . . . 3--6
               James R. Goodman   Reply to David R. Cheriton's, Pat
                                  Boyle's, and Gert A. Slavenburg's
                                  ``Comments on `Coherency for
                                  multiprocessor virtual addressed
                                  caches''' by James R. Goodman  . . . . . 7--7
                 Guy Rabbat and   
                Borko Furht and   
                     Ron Kibler   Three-dimensional computers and
                                  measuring their performance  . . . . . . 9--16
                  M. Castan and   
                A. Contessa and   
                  E. Cousin and   
                 C. Coustet and   
                    B. Lecussan   MaRs: a parallel graph reduction
                                  multiprocessor . . . . . . . . . . . . . 17--24
            Alessandro Contessa   An approach to fault tolerance and error
                                  recovery in a parallel graph reduction
                                  machine: MaRS---a case study . . . . . . 25--32
                 Chuck Crawford   Evolution of the Harris H-series
                                  computers and speculations on their
                                  future . . . . . . . . . . . . . . . . . 33--39
                 Philip L. Good   Structuring an instruction cache . . . . 40--43
                Eric E. Johnson   Completing an MIMD multiprocessor
                                  taxonomy . . . . . . . . . . . . . . . . 44--47
               Douglas W. Jones   The ultimate RISC  . . . . . . . . . . . 48--55
               Douglas W. Jones   A minimal CISC . . . . . . . . . . . . . 56--63
                   Stanley Lass   Shared cache multiprocessing with pack
                                  computers  . . . . . . . . . . . . . . . 64--70
               Norman P. Jouppi   Superscalar vs. superpipelined machines  71--80
             Lorne H. Schachter   Book review of High-Performance
                                  Computer Architecture by Harold S.
                                  Stone. Addison-Wesley 1987 . . . . . . . 81--84

ACM SIGARCH Computer Architecture News
Volume 16, Number 4, September, 1988

        Umakishore Ramachandran   Preface to the Special Issue on
                                  Architectural Support for Operating
                                  Systems  . . . . . . . . . . . . . . . . 11--11
                 A. Asthana and   
             H. V. Jagadish and   
            J. A. Chandross and   
                     D. Lin and   
                   S. C. Knauer   An intelligent memory system . . . . . . 12--20
         Monica Beltrametti and   
              Kenneth Bobey and   
                 John R. Zorbas   The control mechanism for the Myrias
                                  parallel computer system . . . . . . . . 21--30
             Raphael Finkel and   
                  Debra Hensgen   YACKOS on a shared-memory multiprocessor 31--36
              Marc F. Pucci and   
                   J. L. Alberi   Optimized communication in an extended
                                  remote procedure call model  . . . . . . 37--46
           Jordi Cortadella and   
                    Teodor Jové   Dynamic RAM for on-chip instruction
                                  caches . . . . . . . . . . . . . . . . . 45--50
                      M. Naderi   Modelling and performance evaluation of
                                  multiprocessors organization with shared
                                  memories . . . . . . . . . . . . . . . . 51--74
           Edward Gehringer and   
           Janne Abullarade and   
               Michael H. Gulyn   A survey of commercial parallel
                                  processors . . . . . . . . . . . . . . . 75--107
                 Mark Lease and   
                     Mac Lively   Comparing production system
                                  architectures  . . . . . . . . . . . . . 108--116
                  Ivor Page and   
                   Jeff Niehaus   The Flex architecture, a high speed
                                  graphics processor . . . . . . . . . . . 117--129
           Kazuaki Murakami and   
               Akira Fukuda and   
         Toshinori Sueyoshi and   
                  Shinji Tomita   An overview of the Kyushu University
                                  reconfigurable parallel processor  . . . 130--137
              Ora E. Percus and   
                   J. K. Percus   Some results concerning clock-regulated
                                  queues . . . . . . . . . . . . . . . . . 138--144
           Fleur Liane Williams   Should SCC set condition codes?  . . . . 145--149
               Gordon B. Steven   A novel effective address calculation
                                  mechanism for RISC microprocessors . . . 150--156
                Behrooz Parhami   From defects to failures: a view of
                                  dependable computing . . . . . . . . . . 157--168
             David A. Patterson   RISCY patents  . . . . . . . . . . . . . 169--191
                Helen C. Takacs   Book review: A VLSI Architecture for
                                  Concurrent Data Structures by William J.
                                  Dally (Kluwer 1988)  . . . . . . . . . . 192--193
              Robert P. Colwell   Book review: Computer Architecture
                                  and Organization, 2nd ed. by John P.
                                  Hayes (McGraw Hill, 1988)  . . . . . . . 193--195
            Charles E. McDowell   Book review: Supercomputer
                                  Architectures by Paul B. Schneck (Kluwer
                                  Academic Publishers) . . . . . . . . . . 195--196

ACM SIGARCH Computer Architecture News
Volume 16, Number 5, December, 1988

          Herbert H. J. Hum and   
                   Guang R. Gao   Summary of the workshop on frontiers in
                                  functional programming and dataflow
                                  architecture . . . . . . . . . . . . . . 12--19
           Andre M. van Tilborg   Instrumentation for distributed
                                  computing systems  . . . . . . . . . . . 20--25
               Glenn W. Griffin   The ultimate ultimate RISC . . . . . . . 26--32
               Douglas W. Jones   Risks of comparing RISCs . . . . . . . . 33--34
                      M. Naderi   Modelling and performance evaluation of
                                  multiprocessors, organizations with
                                  multi-memory units . . . . . . . . . . . 35--51
                Peter Kogge and   
              John Oldfield and   
                 Mark Brule and   
                Charles Stormon   VLSI and rule-based systems  . . . . . . 52--65
                Behrooz Parhami   Book review: Memory Storage Patterns
                                  in Parallel Processing by Mary A. Mace
                                  (Kluwer Academic Publishers, Boston,
                                  1987, 139 pp.) . . . . . . . . . . . . . 76--76

ACM SIGARCH Computer Architecture News
Volume 17, Number 1, March, 1989

            J. P. Moskowitz and   
                   C. Jousselin   An algebraic memory model  . . . . . . . 55--62
                     W. F. Wong   A stack addressing scheme based on
                                  windowing  . . . . . . . . . . . . . . . 63--69
                      Anonymous   Pipelining through Dynamic Control ROM   70--72
                Stanley E. Lass   Some innovations in computer
                                  architecture . . . . . . . . . . . . . . 73--77
                   Philip Bitar   Book reviews: Review of Parallel
                                  Execution of Logic Programs by John
                                  Conery. Kluwer Academic Publishers 1987  81--82

ACM SIGARCH Computer Architecture News
Volume 17, Number 2, April, 1989

                Robert Cohn and   
               Thomas Gross and   
                     Monica Lam   Architecture and compiler tradeoffs for
                                  a long instruction word processor  . . . 2--14
           Gurindar S. Sohi and   
               Sriram Vajapeyam   Tradeoffs in instruction format design
                                  for horizontal architectures . . . . . . 15--25
           James C. Dehnert and   
            Peter Y.-T. Hsu and   
                Joseph P. Bratt   Overlapped loop support in the Cydra 5   26--38
            F. J. Burkowski and   
              G. V. Cormack and   
                 G. D. P. Dueck   Architectural support for synchronous
                                  task communication . . . . . . . . . . . 40--53
                    Rajiv Gupta   The fuzzy barrier: a mechanism for high
                                  speed synchronization of processors  . . 54--63
           James R. Goodman and   
             Mary K. Vernon and   
                Philip J. Woest   Efficient synchronization primitives for
                                  large-scale cache-coherent
                                  multiprocessors  . . . . . . . . . . . . 64--75
       J. M. Mellor-Crummey and   
                  T. J. LeBlanc   A software instruction counter . . . . . 78--86
                    Z. Aral and   
                 I. Gertner and   
                    G. Schaffer   Efficient debugging primitives for
                                  multiprocessors  . . . . . . . . . . . . 87--95
                  M. E. Staknis   Sheaved memory: architectural support
                                  for state saving and restoration in
                                  paged systems  . . . . . . . . . . . . . 96--102
                 M. A. Holliday   Reference history, page size, and
                                  migration daemons in local/remote
                                  architectures  . . . . . . . . . . . . . 104--112
                D. L. Black and   
               R. F. Rashid and   
                D. B. Golub and   
                     C. R. Hill   Translation lookaside buffer
                                  consistency: a software approach . . . . 113--122
               G. A. Gibson and   
             L. Hellerstein and   
                 R. M. Karp and   
                D. A. Patterson   Failure correction techniques for large
                                  disk arrays  . . . . . . . . . . . . . . 123--132
               N. P. Jouppi and   
                 J. Bertoni and   
                     D. W. Wall   A unified vector/scalar floating-point
                                  architecture . . . . . . . . . . . . . . 134--143
                      H. Mulder   Data buffering: run-time versus
                                  compile-time support . . . . . . . . . . 144--151
                T. L. Adams and   
                R. E. Zimmerman   An analysis of 8086 instruction set
                                  usage in MS DOS programs . . . . . . . . 152--160
                        J. Roos   A real-time support processor for Ada
                                  tasking  . . . . . . . . . . . . . . . . 162--171
          Steven R. Vegdahl and   
                  Uwe F. Pleban   The runtime environment for Screme, a
                                  Scheme implementation on the 88000 . . . 172--182
                   S. McFarling   Program optimization for instruction
                                  caches . . . . . . . . . . . . . . . . . 183--191
                 Paul A. Karger   Using registers to optimize cross-domain
                                  call performance . . . . . . . . . . . . 194--204
           Emmanuel Arnould and   
                 H. T. Kung and   
              François Bitz and   
           Robert D. Sansom and   
                 Eric C. Cooper   The design of Nectar: a network
                                  backplane for heterogeneous
                                  multicomputers . . . . . . . . . . . . . 205--216
     S. A. Delgado-Rannauro and   
                 T. J. Reynolds   A message driven OR-parallel machine . . 217--228
                  S. Owicki and   
                     A. Agarwal   Evaluating the performance of software
                                  cache coherence  . . . . . . . . . . . . 230--242
                   W. Weber and   
                       A. Gupta   Analysis of cache invalidation patterns
                                  in multiprocessors . . . . . . . . . . . 243--256
               S. J. Eggers and   
                     R. H. Katz   The effect of sharing on the cache and
                                  bus performance of parallel programs . . 257--270
               N. P. Jouppi and   
                     D. W. Wall   Available instruction-level parallelism
                                  for superscalar and superpipelined
                                  machines . . . . . . . . . . . . . . . . 272--282
                    W. J. Dally   Micro-optimization of floating-point
                                  operations . . . . . . . . . . . . . . . 283--289
                M. D. Smith and   
                 M. Johnson and   
                 M. A. Horowitz   Limits on multiple instruction issue . . 290--302

ACM SIGARCH Computer Architecture News
Volume 17, Number 3, June, 1989

               S. J. Eggers and   
                     R. H. Katz   Evaluating the performance of four
                                  snooping cache coherency protocols . . . 2--15
             D. R. Cheriton and   
               H. A. Goosen and   
                    P. D. Boyle   Multi-level shared caching techniques
                                  for scalability in VMP-M/C . . . . . . . 16--24
                    A. Goto and   
               A. Matsumoto and   
                        E. Tick   Design and performance of a coherent
                                  cache for parallel logic programming
                                  architectures  . . . . . . . . . . . . . 25--33
                V. G. Grafe and   
             G. S. Davidson and   
                 J. E. Hoch and   
                   V. P. Holmes   The Epsilon dataflow processor . . . . . 36--45
                   S. Sakai and   
               Y. Yamaguchi and   
                  K. Hiraki and   
                  Y. Kodama and   
                        T. Yuba   An architecture of a dataflow single
                                  chip processor . . . . . . . . . . . . . 46--53
                     P. Nitezki   Exploiting data parallelism in signal
                                  processing on a dataflow machine . . . . 54--61
               R. N. Ibbett and   
              T. M. Hopkins and   
              K. I. M. McKinnon   Architectural mechanisms to support
                                  sparse vector processing . . . . . . . . 64--71
               D. T. Harper and   
               D. A. Linebarger   A dynamic storage scheme for
                                  conflict-free vector access  . . . . . . 72--77
                K. Murakami and   
                    N. Irie and   
                      S. Tomita   SIMP (Single Instruction stream/Multiple
                                  instruction Pipelining): a novel
                                  high-speed single-processor architecture 78--85
               Y. Ben-Asher and   
                   D. Egozi and   
                    A. Schuster   $2$-D SIMD algorithms in the perfect
                                  shuffle networks . . . . . . . . . . . . 88--95
           M. Valero-Garcia and   
              J. J. Navarro and   
             J. M. Llaberia and   
                      M. Valero   Systematic hardware adaptation of
                                  systolic algorithms  . . . . . . . . . . 96--104
                 M.-S. Chen and   
                     K. G. Shin   Task migration in hypercube
                                  multiprocessors  . . . . . . . . . . . . 105--111
              S. Przybylski and   
                M. Horowitz and   
                    J. Hennessy   Characteristics of performance-optimal
                                  multi-level cache hierarchies  . . . . . 114--121
                 D. A. Wood and   
                     R. H. Katz   Supporting reference and dirty bits in
                                  SPUR's virtual address cache . . . . . . 122--130
              R. E. Kessler and   
                   R. Jooss and   
                  A. Lebeck and   
                     M. D. Hill   Inexpensive implementations of
                                  set-associativity  . . . . . . . . . . . 131--139
                 W. H. Wang and   
                 J.-L. Baer and   
                     H. M. Levy   Organization and performance of a
                                  two-level virtual-real cache hierarchy   140--148
             C. R. Jesshope and   
               P. R. Miller and   
                 J. T. Yantchev   High performance communications in
                                  processor networks . . . . . . . . . . . 150--157
              H. E. Mizrahi and   
                 J. L. Baer and   
             E. D. Lazowska and   
                    J. Zahorjan   Introducing memory into the switch
                                  elements of multiprocessor
                                  interconnection networks . . . . . . . . 158--166
                S. L. Scott and   
                     G. S. Sohi   Using feedback to control tree
                                  saturation in multistage interconnection
                                  networks . . . . . . . . . . . . . . . . 167--176
         P. D. Ezhilchelvan and   
          S. K. Shrivastava and   
                       A. Tully   Constructing replicated systems using
                                  processors with point-to-point
                                  communication links  . . . . . . . . . . 177--184
                  H. Benker and   
               J. M. Beacco and   
             M. Dorochevsky and   
                 Th. Jeffré and   
                A. Pöhlmann and   
                    J. Noyé and   
                 B. Poterie and   
                 J. C. Syre and   
                O. Thibault and   
                   G. Watzlawik   KCM: a knowledge crunching machine . . . 186--194
                 A. Singhal and   
                     Y. N. Patt   A high performance Prolog processor with
                                  multiple function units  . . . . . . . . 195--202
                 M. Morioka and   
               S. Yamaguchi and   
                      T. Bandoh   Evaluation of memory system for
                                  integrated Prolog processor IPP  . . . . 203--210
                 K.-F. Wong and   
                 M. H. Williams   A type driven hardware engine for Prolog
                                  clause retrieval over a large knowledge
                                  base . . . . . . . . . . . . . . . . . . 211--222
                  W. W. Hwu and   
                T. M. Conte and   
                    P. P. Chang   Comparing software and hardware schemes
                                  for reducing the cost of branches  . . . 224--233
              M. K. Farrens and   
                 A. R. Pleszkun   Improving performance of small on-chip
                                  instruction caches . . . . . . . . . . . 234--241
                  W. W. Hwu and   
                    P. P. Chang   Achieving high instruction cache
                                  performance with an optimizing compiler  242--251
                  P. Steenkiste   The impact of code density on
                                  instruction cache performance  . . . . . 252--259
                   R. S. Nikhil   Can dataflow subsume von Neumann
                                  computing? . . . . . . . . . . . . . . . 262--272
                W.-D. Weber and   
                       A. Gupta   Exploring the benefits of multiple
                                  hardware contexts in a multiprocessor
                                  architecture: preliminary results  . . . 273--280
                   N. P. Jouppi   Architectural and organizational
                                  tradeoffs in the design of the
                                  MultiTitan CPU . . . . . . . . . . . . . 281--289
                    M. Sato and   
                S. Ichikawa and   
                        E. Goto   Run-time checking in Lisp by integrating
                                  memory addressing and range checking . . 290--297
                  A. Hopper and   
                   A. Jones and   
                     D. Lioupis   Multiple vs. wide shared bus
                                  multiprocessors  . . . . . . . . . . . . 300--306
              M. Annaratone and   
                        R. Rühl   Performance measurements on a commercial
                                  multiprocessor running parallel code . . 307--314
              M. Annaratone and   
               C. Pommerell and   
                        R. Rühl   Interprocessor communication speed and
                                  performance in distributed-memory
                                  parallel processors  . . . . . . . . . . 315--324
               D. S. Ghosal and   
             S. K. Tripathi and   
               L. N. Bhuyan and   
                       H. Jiang   Analysis of computation-communication
                                  issues in dynamic dataflow architectures 325--333
                 S. Kravitz and   
               R. E. Bryant and   
                    R. Rutenbar   Logic simulation on massively parallel
                                  architectures  . . . . . . . . . . . . . 336--343
                T. Fukazawa and   
                  T. Kimura and   
                M. Tomizawa and   
                  K. Takeda and   
                        Y. Itoh   R256: a research parallel processor for
                                  scientific computation . . . . . . . . . 344--351
                M. L. Anido and   
             D. J. Allerton and   
                  E. J. Zaluska   A three-port/three-access register file
                                  for concurrent processing and I/O
                                  communication in a RISC-like graphics
                                  engine . . . . . . . . . . . . . . . . . 354--361
               J. M. Mulder and   
              R. J. Portier and   
              A. Srivastava and   
                   R. in't Velt   An architecture framework for
                                  application-specific and scalable
                                  architectures  . . . . . . . . . . . . . 362--369
                     K. Kim and   
           V. K. Prasanna-Kumar   Perfect Latin squares and parallel array
                                  access . . . . . . . . . . . . . . . . . 372--379
                       S. Weiss   An aperiodic storage scheme to reduce
                                  memory conflicts in vector processors    380--386
                 C.-L. Chen and   
                     C.-K. Liao   Analysis of vector access performance on
                                  skewed interleaved memory  . . . . . . . 387--394
                 A. Agarwal and   
                     M. Cherian   Adaptive backoff synchronization
                                  techniques . . . . . . . . . . . . . . . 396--406
                   P. Stenström   A cache consistency protocol for
                                  multiprocessors with multistage networks 407--415
                   H.-M. Su and   
                      P.-C. Yew   On data synchronization for
                                  multiprocessors  . . . . . . . . . . . . 416--423

ACM SIGARCH Computer Architecture News
Volume 17, Number 4, June, 1989

              A. M. van Tilborg   Panel on future directions in parallel
                                  computer architecture  . . . . . . . . . 3--53
              N. J. Gunther and   
                     M. T. Noga   ParcBench: a benchmark for shared-memory
                                  architectures  . . . . . . . . . . . . . 54--61
                A. Elkateeb and   
                     T. Le-Ngoc   A priority strategy on RISC for
                                  real-time multitasking software
                                  applications . . . . . . . . . . . . . . 62--68
                    Y.-J. Oyang   A multiprocessor configuration in
                                  accordance with the aspects of physical
                                  and systems design . . . . . . . . . . . 69--73
                    H. Seebauer   A memory controller executing segment
                                  operations in time $ O(1) $  . . . . . . 74--81
                 R. J. Schwartz   The design and development of a dynamic
                                  program behavior measurement tool for
                                  the Intel 8086/88  . . . . . . . . . . . 82--94
               A. J. Martin and   
                S. M. Burns and   
                  T. K. Lee and   
                D. Borkovic and   
               P. J. Hazewindus   The first asynchronous microprocessor:
                                  the test results . . . . . . . . . . . . 95--110
                     F. Cornett   The UT1000 microprogramming simulator:
                                  an educational tool  . . . . . . . . . . 111--118
                 C. K. Yuen and   
                     W. F. Wong   A bidirectional data driven Lisp engine
                                  for the direct execution of Lisp in
                                  parallel . . . . . . . . . . . . . . . . 119--130

ACM SIGARCH Computer Architecture News
Volume 17, Number 5, September, 1989

                  M. Smotherman   A sequencing-based taxonomy of I/O
                                  systems and review of historical
                                  machines . . . . . . . . . . . . . . . . 5--15
                     R. Cousins   DMA considerations on RISC workstations  16--23
                     R. H. Katz   A project on high performance I/O
                                  subsystems . . . . . . . . . . . . . . . 24--31
               P. C. Dibble and   
                    M. L. Scott   Beyond striping: the bridge
                                  multiprocessor file system . . . . . . . 32--39
             A. L. N. Reddy and   
                    P. Banerjee   A study of parallel disk organizations   40--47
                J. M. Smith and   
             G. Q. Maguire, Jr.   Measured response times for page-sized
                                  fetches on a network . . . . . . . . . . 48--54
                  B. Wolman and   
                    T. M. Olson   IOBENCH: a system independent IO
                                  benchmark  . . . . . . . . . . . . . . . 55--70
                    T. M. Olson   Disk array performance in a random IO
                                  environment  . . . . . . . . . . . . . . 71--77
                   B. L. Wolman   An analysis of server-based locking  . . 78--82
                  E. H. Debaere   Instruction-path coprocessing to solve
                                  some RISC problems . . . . . . . . . . . 83--94
                    H. Seebauer   A memory controller executing segment
                                  operations in time $ O(1) $  . . . . . . 95--102
                     P. K. Chiu   Representation of logic functions by
                                  if--then clauses . . . . . . . . . . . . 103--107
                 C. Baleanu and   
                     D. Tomescu   Embedding computers in a cellular array  108--115
                        S. Lass   On hardware enhanced 80386 software
                                  emulation, compiled emulation, a program
                                  distribution language, and pack
                                  computers  . . . . . . . . . . . . . . . 116--118

ACM SIGARCH Computer Architecture News
Volume 17, Number 6, December, 1989

             Daniel Litaize and   
               Omar Hammami and   
             Mustapha Lalam and   
          Abdelaziz Mzoughi and   
                 Pascal Sainrat   Multiprocessors with a serial multiport
                                  memory and a pseudo crossbar of serial
                                  links used as a processor-memory switch  8--21
                 G. Fritsch and   
                 W. Henning and   
              H. Hessenauer and   
                    R. Klar and   
              C. U. Linster and   
             C. W. Oehlrich and   
                 P. Schlenk and   
                     J. Volkert   Distributed shared memory multiprocessor
                                  architecture MEMSY for high performance
                                  parallel computations  . . . . . . . . . 22--35
               A. Mendelson and   
              D. K. Pradhan and   
                    A. D. Singh   A single cached copy data coherence
                                  scheme for multiprocessor systems  . . . 36--49
          Dror G. Feitelson and   
                  Larry Rudolph   Architecture for a multi-user
                                  general-purpose parallel system  . . . . 50--56
                 D. Quammen and   
               D. R. Miller and   
                       D. Tabak   Register window architecture for
                                  multitasking applications  . . . . . . . 57--66
               Arnold Rosenberg   Efficient emulations of interconnection
                                  networks . . . . . . . . . . . . . . . . 67--79
          Isaac D. Scherson and   
               Peter F. Corbett   Description and performance of a class
                                  of orthogonal multiprocessor networks    80--90
                Ilana David and   
                Ran Ginosar and   
                  Michael Yoeli   An efficient implementation of Boolean
                                  functions and finite state machines as
                                  self-timed circuits  . . . . . . . . . . 91--104
           Apostolos Dollas and   
                Robert F. Krick   The case for the sustained performance
                                  computer architecture  . . . . . . . . . 129--136
                Eric E. Johnson   Working set prefetching for cache
                                  memories . . . . . . . . . . . . . . . . 137--141
                K. e H. Lee and   
                      C. H. Lam   Message-passing controller for a
                                  shared-memory multiprocessor . . . . . . 142--149
             Tsong-Chih Hsu and   
                 Ling-Yang Kung   Logic and conflict-free vector addresses 150--153
             Tsong-Chih Hsu and   
                 Ling-Yang Kung   An address generation unit for array
                                  accessing  . . . . . . . . . . . . . . . 154--160
             Tsong-Chih Hsu and   
                 Ling-Yang Kung   A hardware mechanism for priority queue  162--169

ACM SIGARCH Computer Architecture News
Volume 18, Number 1, March, 1990

                      V. Dvorak   Microsequencer architecture supporting
                                  arbitrary branching up to $2^m$ targets  9--9
               Jack J. Dongarra   Performance of various computers using
                                  standard linear equations software . . . 17--17
             Tsong-Chih Hsu and   
                 Ling-Yang Kung   A comment on ``A Fetch-and-Op
                                  Implementation for Parallel Computers''  32--32
                 Robert Cousins   A novel approach to character interfaces 35--35
                 Robert Cousins   A reentrant peripheral interface . . . . 43--43
               Noel W. Anderson   Amorphous computer system architecture:
                                  a preliminary look . . . . . . . . . . . 51--51
              Yen-Jen Oyang and   
             Bor-Ting Chang and   
                    Shu-May Lin   A cost-effective approach to implement a
                                  long instruction word microprocessor . . 59--59
                 C. Fritsch and   
                 T. Sánchez and   
                       J. Anaya   Primitive based architectures  . . . . . 73--73
                   Harold Lorin   A model for recentralization of
                                  computing: (distributed processing comes
                                  home)  . . . . . . . . . . . . . . . . . 81--81
                   Dan Teodosiu   Computing in three dimensions  . . . . . 99--99
                   Gary Frazier   Ariel: a scalable multiprocessor for the
                                  simulation of neural networks  . . . . . 107--107
              Robert P. Colwell   Book review: High-Level Language
                                  Computer Architecture edited by Veljko
                                  Milutinovic (Computer Science Press,
                                  1989)  . . . . . . . . . . . . . . . . . 120--122
                Behrooz Parhami   Book review: Advanced Research in
                                  VLSI, edited by Charles L. Seitz (The
                                  MIT Press, Cambridge, MA, 1989, 373 pp.) 122--123

ACM SIGARCH Computer Architecture News
Volume 18, Number 2, June, 1990

               Wolfgang Matthes   Hardware Resources: a generalizing view
                                  on computer architectures  . . . . . . . 7--14
       Lawrence Rauchwerger and   
            Michael P. Farmwald   A multiple floating point coprocessor
                                  architecture . . . . . . . . . . . . . . 15--24
                  Andy Glew and   
                    Wen-Mei Hwu   Snoopy cache test-and-test-and-set
                                  without excessive bus contention . . . . 25--32
                     Lee Higbee   Quick and easy cache performance
                                  analysis . . . . . . . . . . . . . . . . 33--44
                 Arvin Park and   
          Jeffrey C. Becker and   
              Richard J. Lipton   IOStone: a synthetic file system
                                  benchmark  . . . . . . . . . . . . . . . 45--52
 Dionisios N. Pnevmatikatos and   
                   Mark D. Hill   Cache performance of the integer SPEC
                                  benchmarks on a RISC . . . . . . . . . . 53--68
                A. B. Ruighaver   A modular network for dense optical
                                  interconnection of processing elements   69--75
           Alessandro De Gloria   VISA: a variable instruction set
                                  architecture . . . . . . . . . . . . . . 76--84
          Fleur L. Williams and   
               Gordon B. Steven   Address and data register separation on
                                  the M68000 family  . . . . . . . . . . . 85--89

ACM SIGARCH Computer Architecture News
Volume 18, Number 3a, June, 1990

             Sarita V. Adve and   
                   Mark D. Hill   Weak ordering---a new definition . . . . 2--14
       Kourosh Gharachorloo and   
             Daniel Lenoski and   
               James Laudon and   
            Phillip Gibbons and   
                Anoop Gupta and   
                  John Hennessy   Memory consistency and event ordering in
                                  scalable shared-memory multiprocessors   15--26
                Joonwon Lee and   
        Umakishore Ramachandran   Synchronization with multiprocessor
                                  caches . . . . . . . . . . . . . . . . . 27--37
              Po-Jen Chuang and   
                Nian-Feng Tzeng   Dynamic processor allocation in
                                  hypercube computers  . . . . . . . . . . 40--49
              Abdou Youssef and   
                    Bruce Arden   A new approach to fast control of $ r^2
                                  \times r^2 $ $3$-stage Benes networks of
                                  $ r \times r$ crossbar switches  . . . . 50--59
               William J. Dally   Virtual-channel flow control . . . . . . 60--68
             Shekhar Borkar and   
                Robert Cohn and   
                 George Cox and   
               Thomas Gross and   
                 H. T. Kung and   
                 Monica Lam and   
              Margie Levine and   
                Brian Moore and   
                 Wire Moore and   
             Craig Peterson and   
                 Jim Susman and   
                 Jim Sutton and   
              John Urbanski and   
                       Jon Webb   Supporting systolic and memory
                                  communication in iWarp . . . . . . . . . 70--81
    Gregory M. Papadopoulos and   
                David E. Culler   Monsoon: an explicit token-store
                                  architecture . . . . . . . . . . . . . . 82--91
           Marco Annaratone and   
                Marco Fillo and   
        Kiyoshi Nakabayashi and   
                   Marc Viredaz   The K2 parallel processor: architecture
                                  and hardware implementation  . . . . . . 92--101
              Anant Agarwal and   
              Beng-Hong Lim and   
                David Kranz and   
               John Kubiatowicz   APRIL: a processor architecture for
                                  multiprocessing  . . . . . . . . . . . . 104--114
            Roberto Bisiani and   
              Mosur Ravishankar   PLUS: a distributed shared-memory system 115--124
            John K. Bennett and   
             John B. Carter and   
               Willy Zwaenepoel   Adaptive software cache management for
                                  distributed shared memory architectures  125--134
            David R. Ditzel and   
           John L. Hennessy and   
               Bernie Rudin and   
             Alan Jay Smith and   
         Stephen L. Squires and   
                 Zeke Zalcstein   Big science versus little science---do
                                  you have to build it? (panel session)    136--136
          Brian W. O'Krafka and   
              A. Richard Newton   An empirical evaluation of two
                                  memory-efficient directory methods . . . 138--147
             Daniel Lenoski and   
               James Laudon and   
       Kourosh Gharachorloo and   
                Anoop Gupta and   
                  John Hennessy   The directory-based cache coherence
                                  protocol for the DASH multiprocessor . . 148--159
              Steven Przybylski   The performance impact of block sizes
                                  and fetch strategies . . . . . . . . . . 160--169
                  D. Alpert and   
                A. Averbuch and   
                     O. Danieli   Performance comparison of load/store and
                                  symmetric instruction set architectures  172--181
           Jack W. Davidson and   
               David B. Whalley   Reducing the cost of branches by using
                                  registers  . . . . . . . . . . . . . . . 182--191
               Carl E. Love and   
                Harry F. Jordan   An investigation of static versus
                                  dynamic scheduling . . . . . . . . . . . 192--201
          Dileep Bhandarkar and   
                Richard Brunner   VAX vector architecture  . . . . . . . . 204--215
            Robert W. Horst and   
          Richard L. Harris and   
              Robert L. Jardine   Multiple instruction issue in the
                                  NonStop Cyclone processor  . . . . . . . 216--226
       Shreekant S. Thakkar and   
                   Mark Sweiger   Performance of an OLTP application on
                                  Symmetry multiprocessor system . . . . . 228--238
              Ding-Kai Chen and   
                Hong-Men Su and   
                  Pen-Chung Yew   The impact of synchronization and
                                  granularity on parallel systems  . . . . 239--248
             Håkon O. Bugge and   
       Ernst H. Kristiansen and   
                 Bjørn O. Bakka   Trace-driven simulations for a two-level
                                  cache design in open bus systems . . . . 250--259
              Jiun-Ming Hsu and   
            Prithviraj Banerjee   Performance measurement and trace driven
                                  simulation of parallel CAD and numeric
                                  applications on a hypercube
                                  multicomputer  . . . . . . . . . . . . . 260--269
                 Anita Borg and   
              R. E. Kessler and   
                  David W. Wall   Generation and analysis of very long
                                  address traces . . . . . . . . . . . . . 270--279
            Bruce K. Holmer and   
                Barton Sano and   
            Michael Carlton and   
              Peter Van Roy and   
              Ralph Haygood and   
            William R. Bush and   
           Alvin M. Despain and   
          Joan M. Pendleton and   
                      Tep Dobry   Fast Prolog with an extended general
                                  purpose architecture . . . . . . . . . . 282--291
               Leon Alkalaj and   
                 Tomás Lang and   
                Miloš Ercegovac   Architectural support for the management
                                  of tightly-coupled fine-grain goals in
                                  flat concurrent Prolog . . . . . . . . . 292--301
                  Samuel Ho and   
                Lawrence Snyder   Balance in architectural design  . . . . 302--310
      A. L. Narasimha Reddy and   
            Prithviraj Banerjee   A study of I/O behavior of Perfect
                                  benchmarks on a multiprocessor . . . . . 312--321
              Peter M. Chen and   
             David A. Patterson   Maximizing performance in a striped disk
                                  array  . . . . . . . . . . . . . . . . . 322--331
               Kang G. Shin and   
                    Greg Dykema   A distributed I/O architecture for HARTS 332--342
           Michael D. Smith and   
              Monica S. Lam and   
               Mark A. Horowitz   Boosting beyond static scheduling in a
                                  superscalar processor  . . . . . . . . . 344--354
              George Taylor and   
               Peter Davies and   
               Michael Farmwald   The TLB slice---a low-cost high-speed
                                  address translation mechanism  . . . . . 355--363
               Norman P. Jouppi   Improving direct-mapped cache
                                  performance by the addition of a small
                                  fully-associative cache and prefetch
                                  buffers  . . . . . . . . . . . . . . . . 364--373
         Edward S. Davidson and   
           Gurindar S. Sohi and   
           Joseph A. Fisher and   
              Greg Grohoski and   
                  Yale Patt and   
                J. E. Smith and   
                David R. Stiles   Better than one operation per clock
                                  (panel): vectors, VLIW, and superscalar  376--376

ACM SIGARCH Computer Architecture News
Volume 18, Number 3b, September, 1990

            Robert Alverson and   
             David Callahan and   
            Daniel Cummings and   
              Brian Koblenz and   
          Allan Porterfield and   
                   Burton Smith   The Tera computer system . . . . . . . . 1--6
                   K. Hwang and   
                  M. Dubois and   
                D. K. Panda and   
                     S. Rao and   
                   S. Shang and   
                  A. Uresin and   
                     W. Mao and   
                    H. Nair and   
                  M. Lytwyn and   
                   F. Hsieh and   
                     J. Liu and   
                S. Mehrotra and   
                    C. M. Cheng   OMP: a RISC-based multiprocessor using
                                  orthogonal-access memories and multiple
                                  spanning buses . . . . . . . . . . . . . 7--22
                Kechang Dai and   
              Wolfgang K. Giloi   A basic architecture supporting LGDG
                                  computation  . . . . . . . . . . . . . . 23--33
              Sang Lyul Min and   
             Jean-Loup Baer and   
                 Hyoung-Joo Kim   An efficient caching support for
                                  critical sections in large-scale
                                  shared-memory multiprocessors  . . . . . 34--47
            Umpei Nagashima and   
            Fumio Nishimoto and   
            Takashi Shibata and   
               Hiroshi Itoh and   
                   Minoru Gotoh   An improvement of I/O function for
                                  auxiliary storage: parallel I/O for a
                                  large scale supercomputing . . . . . . . 48--59
                Nian-Feng Tzeng   Analysis of a variant hypercube topology 60--70
       P. J. van der Houwen and   
                B. P. Sommeijer   Parallel ODE solvers . . . . . . . . . . 71--81
                M. J. Daydé and   
                     I. S. Duff   Use of parallel level 3 BLAS in LU
                                  factorization on three vector
                                  multiprocessors: the ALLIANT FX/80, the
                                  CRAY-2, and the IBM 3090 VF  . . . . . . 82--95
              E. N. Houstis and   
                 J. R. Rice and   
        N. P. Chrisochoides and   
         H. C. Karathanasis and   
            P. N. Papachiou and   
            M. K. Samartzis and   
              E. A. Vavalis and   
               Ko Yang Wang and   
                 S. Weerawarana   //ELLPACK: a numerical simulation
                                  programming environment for parallel
                                  MIMD machines  . . . . . . . . . . . . . 96--107
         Christina C. Christara   Schur complement preconditioned
                                  conjugate gradient methods for spline
                                  collocation equations  . . . . . . . . . 108--120
            Kuo-Liang Chung and   
            Ferng-Ching Lin and   
                  Wen-Chin Chen   Cost-optimal parallel B-spline
                                  interpolations . . . . . . . . . . . . . 121--131
                K. Gallivan and   
                   A. Sameh and   
                      Z. Zlatev   Solving general sparse linear systems
                                  using conjugate gradient-type methods    132--139
            Toshitsugu Yuba and   
             Toshio Shimada and   
        Yoshinori Yamaguchi and   
                 Kei Hiraki and   
                  Shuichi Sakai   Dataflow computer development in Japan   140--147
               Vivek Sarkar and   
                     David Cann   POSC---a partitioning and optimizing
                                  SISAL compiler . . . . . . . . . . . . . 148--164
             François Bodin and   
                François Charot   Loop optimization for horizontal
                                  microcoded machines  . . . . . . . . . . 164--176
                 Peiyi Tang and   
              Pen-Chung Yew and   
                   Chuan-Qi Zhu   Compiler techniques for data
                                  synchronization in nested parallel loops 177--186
             David E. Hudak and   
             Santosh G. Abraham   Compiler techniques for data
                                  partitioning of sequentially iterated
                                  parallel loops . . . . . . . . . . . . . 187--200
            David Klappholz and   
          Kleanthis Psarris and   
                  Xiangyun Kong   On the perfect accuracy of an
                                  approximate subscript analysis test  . . 201--212
            Allen D. Malony and   
                 Daniel A. Reed   A hardware-based performance monitor for
                                  the Intel iPSC/2 hypercube . . . . . . . 213--226
              R. T. Dimpsey and   
                     R. K. Iyer   Performance degradation due to
                                  multiprogramming and system overheads in
                                  real workloads: case study on a shared
                                  memory multiprocessor  . . . . . . . . . 227--238
                Youcef Saad and   
           Harry A. G. Wijshoff   SPARK: a benchmark package for sparse
                                  computations . . . . . . . . . . . . . . 239--253
             George Cybenko and   
                  Lyle Kipp and   
               Lynn Pointer and   
                     David Kuck   Supercomputer performance evaluation and
                                  the Perfect Benchmarks . . . . . . . . . 254--266
              Ahmed K. Noor and   
               Jeanne M. Peters   Strategies for large-scale structural
                                  problems on high-performance computers   267--280
                   V. Zecca and   
                       A. Kamel   Elastodynamics on clustered vector
                                  multiprocessors  . . . . . . . . . . . . 281--290
                Victor Eijkhout   Implementation of $5$-point/$9$-point
                                  multi-level methods on hypercube
                                  architectures  . . . . . . . . . . . . . 291--295
                 Philip C. Chen   Supercomputer-based visualization
                                  systems used for analyzing output data
                                  of a numerical weather prediction model  296--309
          Yoshizo Takahashi and   
               Shigetaka Sasaki   Parallel automated wire-routing with a
                                  number of competing processors . . . . . 310--317
                   Tony F. Chan   Hierarchical algorithms and
                                  architectures for parallel scientific
                                  computing  . . . . . . . . . . . . . . . 318--329
                Kevin Smith and   
               Bill Appelbe and   
                 Kurt Stirewalt   Incremental dependence analysis for
                                  interactive parallelization  . . . . . . 330--341
                Roland Rühl and   
               Marco Annaratone   Parallelization of FORTRAN code on
                                  distributed-memory parallel processors   342--353
          Edward H. Gornish and   
          Elana D. Granston and   
        Alexander V. Veidenbaum   Compiler-directed data prefetching in
                                  multiprocessors with memory hierarchies  354--368
               Guang R. Gao and   
          Herbert H. J. Hum and   
                  Yue-Bong Wong   Towards efficient fine-grain software
                                  pipelining . . . . . . . . . . . . . . . 369--379
            Françoise André and   
           Jean-Louis Pazat and   
                   Henry Thomas   Pandore: a system to manage data
                                  distribution . . . . . . . . . . . . . . 380--388
                 Rod A. Fatoohi   Vector performance analysis of the NEC
                                  SX-2 . . . . . . . . . . . . . . . . . . 389--400
             François Bodin and   
          Daniel Windheiser and   
              William Jalby and   
              Daya Atapattu and   
                 Mannho Lee and   
                  Dennis Gannon   Performance evaluation and prediction
                                  for parallel algorithms on the BBN
                                  GP1000 . . . . . . . . . . . . . . . . . 401--413
             Luigi Brochard and   
                     Alex Freau   Designing algorithms on hierarchical
                                  memory multiprocessors . . . . . . . . . 414--427
           Ingrid Y. Bucher and   
              Donald A. Calahan   Access conflicts in multiprocessor
                                  memories: queueing models and simulation
                                  studies  . . . . . . . . . . . . . . . . 428--438
               Emilio Luque and   
                 Ana Ripoll and   
         Porfidio Hernández and   
                 Tomás Margalef   Impact of task duplication on
                                  static-scheduling performance in
                                  multiprocessor systems with variable
                                  execution-time tasks . . . . . . . . . . 439--446
       Apostolos Gerasoulis and   
             Sesh Venugopal and   
                       Tao Yang   Clustering task graphs for message
                                  passing architectures  . . . . . . . . . 447--456
          Edwin M. Paalvast and   
        Arjan J. van Gemund and   
                   Henk J. Sips   A method for parallel program generation
                                  with an application to the Booster
                                  language . . . . . . . . . . . . . . . . 457--469
         M. A. Tsoukarellas and   
            T. S. Papatheodorou   A run time support system for
                                  multiprocessor machines  . . . . . . . . 470--478
              Anthony J. G. Hey   Supercomputing with transputers---past,
                                  present and future . . . . . . . . . . . 479--489

ACM SIGARCH Computer Architecture News
Volume 18, Number 4, December, 1990

                   Burton Smith   The end of architecture  . . . . . . . . 10--17
                   Mark D. Hill   What is scalability? . . . . . . . . . . 18--21
                 P. A. Laplante   A novel single instruction computer
                                  architecture . . . . . . . . . . . . . . 22--26
                Ran Ginosar and   
                   Nick Michell   On the potential of asynchronous
                                  pipelined processors . . . . . . . . . . 27--34
              Yen-Jen Oyang and   
              Chun-Hung Wen and   
                Yu-Fen Chen and   
                    Shu-May Lin   The effect of employing advanced
                                  branching mechanisms in superscalar
                                  processors . . . . . . . . . . . . . . . 35--52
                Yannick Deville   A low-cost usage-based replacement
                                  algorithm for cache memories . . . . . . 52--58
             Bernard K. Gunther   A high speed mechanism for short
                                  branches . . . . . . . . . . . . . . . . 59--61
              Robert McLaughlin   Design for fast DSP machine  . . . . . . 62--66
                Werner B. Joerg   A subclass of Petri Nets as design
                                  abstraction for parallel architectures   67--77
                   Mark Thorson   Usenet Nuggets . . . . . . . . . . . . . 80--89
           Glen G. Langdon, Jr.   Book review: Highly Parallel
                                  Computing by George Almasi and Allan
                                  Gottlieb (Benjamin/Cummings, 1989) . . . 90--90
           Glen G. Langdon, Jr.   Book review: Solving Problems on
                                  Concurrent Processors, Vol II: Software
                                  for Concurrent Processors by I. Angus,
                                  G. Fox, J. Kim, and D. Walker
                                  (Prentice-Hall, 1990)  . . . . . . . . . 90--91
                  Marc Dikotter   Book review: The Definition of
                                  Standard ML by R. Milner, M. Tofte, R.
                                  Harper . . . . . . . . . . . . . . . . . 91--91

ACM SIGARCH Computer Architecture News
Volume 19, Number 1, March, 1991

                 F. T. Leighton   Selected Papers from the Symposium on
                                  Parallel Algorithms and Architectures    5--5
               John Y. Ngai and   
               Charles L. Seitz   A framework for adaptive routing in
                                  multicomputer networks . . . . . . . . . 6--14
             Richard Beigel and   
               Clyde P. Kruskal   Processor networks and interconnection
                                  networks without long wires (extended
                                  abstract)  . . . . . . . . . . . . . . . 15--24
                Fred Annexstein   Fault tolerance in hypercube-derivative
                                  networks (preliminary version) . . . . . 25--34
            Richard M. Fujimoto   The Virtual Time Machine . . . . . . . . 35--44
         Gianfranco Bilardi and   
            Scot W. Hornick and   
              Majid Sarrafzadeh   Optimal VLSI architectures for
                                  multidimensional DFT (preliminary
                                  version) . . . . . . . . . . . . . . . . 45--52
        Clark D. Thomborson and   
                Belle W.-Y. Wei   Systolic implementations of a
                                  move-to-front text compressor  . . . . . 53--60
          Thomas F. Knight, Jr.   Technologies for low latency
                                  interconnection switches . . . . . . . . 61--68
         Martin C. Herbordt and   
           Charles C. Weems and   
               James C. Corbett   Message-passing algorithms for a SIMD
                                  torus with coteries  . . . . . . . . . . 69--78
          S. Konstantinidou and   
                      L. Snyder   The chaos router: a practical
                                  application of randomization in network
                                  routing  . . . . . . . . . . . . . . . . 79--88
             Jehoshua Bruck and   
              Robert Cypher and   
                  Danny Soroker   Running algorithms efficiently on faulty
                                  hypercubes (extended abstract) . . . . . 89--96
                Naomi Nishimura   Asynchronous shared memory parallel
                                  computation (preliminary version)  . . . 97--105
                   M. Shand and   
                  P. Bertin and   
                   J. Vuillemin   Hardware speedups in long integer
                                  multiplication . . . . . . . . . . . . . 106--113
                Manu Thapar and   
                   Bruce Delagi   Cache coherence for large scale shared
                                  memory multiprocessors . . . . . . . . . 114--119
               Peter Grabienski   FLIP-FLOP: a stack-oriented
                                  multiprocessing system . . . . . . . . . 120--127
               Camille C. Price   Task allocation in data flow
                                  multiprocessors: an annotated
                                  bibliography . . . . . . . . . . . . . . 128--134
                  Rod Adams and   
                  Gordon Steven   A parallel pipelined processor with
                                  conditional instruction execution  . . . 135--142
                   Mark Thorson   Usenet Nuggets . . . . . . . . . . . . . 146--150
              Michael L. Hilton   Book review: Systems Programming in
                                  Parallel Logic Languages by Ian Foster
                                  (Prentice Hall, 1990)  . . . . . . . . . 151--151
                  Keith Anthony   Book review: Technology Projection
                                  Modeling of Future Computer Systems by
                                  Al Cutaia (Prentice-Hall, 1990)  . . . . 152--153
                Paul B. Schneck   Book review: Optimizing FORTRAN
                                  Programs by C. F. Schofield (Halsted
                                  Press, 1989) . . . . . . . . . . . . . . 153--154
                Robert Bernecky   Book review: Multiprocessors by
                                  Daniel Tabak (Prentice Hall, Englewood
                                  Cliffs, NJ)  . . . . . . . . . . . . . . 154--156
                Robert Bernecky   Book review: Multiprocessor
                                  Performance by Erol Gelenbe (J. Wiley &
                                  Sons, Chichester, England) . . . . . . . 156--157
                   John Fulcher   Book review: Neural Net Applications
                                  and Products by Richard K. Miller, Terri
                                  C. Walker, and Anne M. Ryan (SEAI
                                  Technical Publications, 1990)  . . . . . 157--158

ACM SIGARCH Computer Architecture News
Volume 19, Number 2, April, 1991

               Andrew Wolfe and   
                   John P. Shen   A variable instruction stream extension
                                  to the VLIW architecture . . . . . . . . 2--14
          Manolis Katevenis and   
            Nestoras Tzartzanis   Reducing the branch penalty by
                                  rearranging instructions in a
                                  double-width memory  . . . . . . . . . . 15--27
              Roland L. Lee and   
               Alex Y. Kwok and   
                 Fayé A. Briggs   The floating point performance of a
                                  superscalar SPARC processor  . . . . . . 28--37
             David Callahan and   
                Ken Kennedy and   
              Allan Porterfield   Software prefetching . . . . . . . . . . 40--52
           Gurindar S. Sohi and   
                 Manoj Franklin   High-bandwidth data memory systems for
                                  superscalar processors . . . . . . . . . 53--62
              Monica S. Lam and   
         Edward E. Rothberg and   
                Michael E. Wolf   The cache performance and optimizations
                                  of blocked algorithms  . . . . . . . . . 63--74
           Jeffrey C. Mogul and   
                     Anita Borg   The effect of context switches on cache
                                  performance  . . . . . . . . . . . . . . 75--84
                   David Keppel   A portable interface for on-the-fly
                                  instruction space modification . . . . . 86--95
            Andrew W. Appel and   
                         Kai Li   Virtual memory primitives for user
                                  programs . . . . . . . . . . . . . . . . 96--107
         Thomas E. Anderson and   
              Henry M. Levy and   
           Brian N. Bershad and   
             Edward D. Lazowska   The interaction of architecture and
                                  operating system design  . . . . . . . . 108--120
           David G. Bradlee and   
            Susan J. Eggers and   
                Robert R. Henry   Integrating register allocation and
                                  instruction scheduling for RISCs . . . . 122--131
          Manuel E. Benitez and   
               Jack W. Davidson   Code generation for streaming: an
                                  access/execute mechanism . . . . . . . . 132--141
            Rajive Bagrodia and   
                  Sharad Mathur   Efficient implementation of high-level
                                  parallel programs  . . . . . . . . . . . 142--151
     William Mangione-Smith and   
         Santosh G. Abraham and   
             Edward S. Davidson   Vector register design for polycyclic
                                  vector scheduling  . . . . . . . . . . . 154--163
            David E. Culler and   
                 Anurag Sah and   
          Klaus E. Schauser and   
        Thorsten von Eicken and   
                 John Wawrzynek   Fine-grain parallelism with minimal
                                  hardware support: a compiler-controlled
                                  threaded abstract machine  . . . . . . . 164--175
                  David W. Wall   Limits of instruction-level parallelism  176--188
              Edward K. Lee and   
                  Randy H. Katz   Performance consequences of parity
                                  placement in disk arrays . . . . . . . . 190--199
               Vincent Cate and   
                   Thomas Gross   Combining the concepts of compression
                                  and caching for a two-level filesystem   200--211
         William J. Bolosky and   
           Michael L. Scott and   
       Robert P. Fitzgerald and   
           Robert J. Fowler and   
                    Alan L. Cox   NUMA policies and their relation to
                                  memory architecture  . . . . . . . . . . 212--221
              David Chaiken and   
           John Kubiatowicz and   
                  Anant Agarwal   LimitLESS directories: a scalable cache
                                  coherence scheme . . . . . . . . . . . . 224--234
                Sang L. Min and   
                 Jong-Deok Choi   An efficient cache-based access anomaly
                                  detection scheme . . . . . . . . . . . . 235--244
       Kourosh Gharachorloo and   
                Anoop Gupta and   
                  John Hennessy   Performance evaluation of memory
                                  consistency models for shared-memory
                                  multiprocessors  . . . . . . . . . . . . 245--257
           Eric Freudenthal and   
                 Allan Gottlieb   Process coordination with
                                  fetch-and-increment  . . . . . . . . . . 260--268
     John M. Mellor-Crummey and   
               Michael L. Scott   Synchronization without contention . . . 269--278
                Douglas Johnson   The case for a read barrier  . . . . . . 279--287
           Robert F. Cmelik and   
              Shing I. Kong and   
            David R. Ditzel and   
                Edmund J. Kelly   An analysis of MIPS and SPARC
                                  instruction set utilization on the SPEC
                                  benchmarks . . . . . . . . . . . . . . . 290--302
              C. Brian Hall and   
                  Kevin O'Brien   Performance characteristics of
                                  architectural features of the IBM RISC
                                  System/6000  . . . . . . . . . . . . . . 303--309
          Dileep Bhandarkar and   
               Douglas W. Clark   Performance from architecture: comparing
                                  a RISC and a CISC with similar hardware
                                  organization . . . . . . . . . . . . . . 310--319

ACM SIGARCH Computer Architecture News
Volume 19, Number 3, May, 1991

               R. F. DeMara and   
                 D. I. Moldovan   The SNAP-1 parallel AI prototype . . . . 2--11
              Wei Siong Tan and   
                    H. Russ and   
                Cecil O. Alford   GT-EP: a novel high-performance
                                  real-time architecture . . . . . . . . . 13--21
            Tetsuya Higuchi and   
             Tatsumi Furuya and   
              Kenichi Handa and   
            Naoto Takahashi and   
         Hiroyasu Nishiyama and   
                    Akio Kokubu   IXM2: a parallel associative processor   22--31
             David R. Kaeli and   
                 Philip G. Emma   Branch history table prediction of
                                  moving target branches due to subroutine
                                  returns  . . . . . . . . . . . . . . . . 34--42
       Alexander C. Klaiber and   
                  Henry M. Levy   An architecture for software-controlled
                                  data prefetching . . . . . . . . . . . . 43--53
              John W. C. Fu and   
                 Janak H. Patel   Data prefetching in multiprocessor
                                  vector cache memories  . . . . . . . . . 54--63
               D. T. Harper III   Reducing memory contention in shared
                                  memory multiprocessors . . . . . . . . . 66--73
             B. Ramakrishna Rau   Pseudo-randomly interleaved memory . . . 74--83
                     Kai Li and   
                 Karin Petersen   Evaluation of memory system extensions   84--93
                Patrick W. Dowd   High performance interprocessor
                                  communication through optical wavelength
                                  division multiple access channels  . . . 96--105
              Anders Landin and   
             Erik Hagersten and   
                    Seif Haridi   Race-free interconnection networks and
                                  multiprocessor consistency . . . . . . . 106--115
                 Xiaola Lin and   
                   Lionel M. Ni   Deadlock-free multicast wormhole routing
                                  in multicomputer networks  . . . . . . . 116--125
            Matthew Farrens and   
                     Arvin Park   Dynamic base register caching: a
                                  technique for reducing address bus width 128--137
             O. A. Olukotun and   
                T. N. Mudge and   
                    R. B. Brown   Implementing a cache for a
                                  high-performance GaAs microprocessor . . 138--147
            Lizyamma Kurian and   
             Paul T. Hulina and   
              Lee D. Coraor and   
               Dhamir N. Mannai   Classification and performance
                                  evaluation of instruction buffering
                                  techniques . . . . . . . . . . . . . . . 150--159
          Masaitsu Nakajima and   
              Hiraku Nakano and   
          Yasuhiro Nakakura and   
           Tadahiro Yoshida and   
              Yoshiyuki Goi and   
                 Yuji Nakai and   
               Reiji Segawa and   
            Takeshi Kishida and   
                 Hiroshi Kadota   OHMEGA: a VLSI superscalar processor
                                  architecture for numerical applications  160--168
           Sriram Vajapeyam and   
           Gurindar S. Sohi and   
                  Wei-Chung Hsu   An empirical study of the CRAY Y-MP
                                  processor using the Perfect Club
                                  benchmarks . . . . . . . . . . . . . . . 170--179
            Chriss Stephens and   
             Bryce Cogswell and   
              John Heinlein and   
             Gregory Palmer and   
                   John P. Shen   Instruction level profiling and
                                  evaluation of the IBM/6000 . . . . . . . 180--189
              R. T. Dimpsey and   
                     R. K. Iyer   Performance prediction and tuning on a
                                  multiprocessor . . . . . . . . . . . . . 190--199
             C. W. Oehlrich and   
                       A. Quick   Performance evaluation of a
                                  communication system for
                                  transputer-networks based on monitored
                                  event traces . . . . . . . . . . . . . . 202--211
          S. Konstantinidou and   
                      L. Snyder   Chaos router: architecture and
                                  performance  . . . . . . . . . . . . . . 212--221
         Shridhar B. Shukla and   
              Dharma P. Agrawal   Scheduling pipelined communication in
                                  distributed memory multiprocessors for
                                  real-time applications . . . . . . . . . 222--231
             Sarita V. Adve and   
               Mark D. Hill and   
           Barton P. Miller and   
            Robert H. B. Netzer   Detecting data races on weak memory
                                  systems  . . . . . . . . . . . . . . . . 234--243
          Eric J. Koldinger and   
            Susan J. Eggers and   
                  Henry M. Levy   On the validity of trace-driven
                                  simulation for multiprocessors . . . . . 244--253
                Anoop Gupta and   
              John Hennessy and   
       Kourosh Gharachorloo and   
                 Todd Mowry and   
            Wolf-Dietrich Weber   Comparative evaluation of latency
                                  reducing and tolerating techniques . . . 254--263
             Pohua P. Chang and   
            Scott A. Mahlke and   
            William Y. Chen and   
            Nancy J. Warter and   
                 Wen-mei W. Hwu   IMPACT: an architectural framework for
                                  multiple-instruction-issue processors    266--275
             Michael Butler and   
                 Tse-Yu Yeh and   
                  Yale Patt and   
                Mitch Alsup and   
              Hunter Scales and   
               Michael Shebanow   Single instruction stream parallelism is
                                  greater than two . . . . . . . . . . . . 276--286
             Stephen Melvin and   
                      Yale Patt   Exploiting fine-grained parallelism
                                  through a combination of hardware and
                                  software techniques  . . . . . . . . . . 287--296
             Sarita V. Adve and   
             Vikram S. Adve and   
               Mark D. Hill and   
                 Mary K. Vernon   Comparison of hardware and software
                                  cache coherence schemes  . . . . . . . . 298--308
             Richard Simoni and   
                  Mark Horowitz   Modeling the performance of limited
                                  pointers directories for cache coherence 309--319
           Donna J. Quammen and   
              D. Richard Miller   Flexible register management for
                                  sequential programs  . . . . . . . . . . 320--329
           David G. Bradlee and   
            Susan J. Eggers and   
                Robert R. Henry   The effect on RISC performance of
                                  register set size and structure versus
                                  code generation strategy . . . . . . . . 330--339
    Gregory M. Papadopoulos and   
               Kenneth R. Traub   Multithreading: a revisionist view of
                                  dataflow architectures . . . . . . . . . 342--351
                Tzi-cker Chiueh   Multi-threaded vectorization . . . . . . 352--361
         Matthew K. Farrens and   
             Andrew R. Pleszkun   Strategies for achieving improved
                                  processor throughput . . . . . . . . . . 362--369
          Toyohiko Kagimasa and   
            Kikuo Takahashi and   
              Toshiaki Mori and   
              Seiichi Yoshizumi   Adaptive storage management for very
                                  large virtual/real storage systems . . . 372--379
             Judith S. Hall and   
               Paul T. Robinson   Virtualizing the VAX architecture  . . . 380--389
              Janaki Akella and   
            Daniel P. Siewiorek   Modeling and measurement of the impact
                                  of Input/Output on system performance    390--399

ACM SIGARCH Computer Architecture News
Volume 19, Number 4, June, 1991

                 Paul R. Wilson   Pointer swizzling at page fault time:
                                  efficiently supporting huge address
                                  spaces on standard hardware  . . . . . . 6--13
              Morihiro Kuga and   
           Kazuaki Murakami and   
                  Shinji Tomita   DSNS (dynamically-hazard-resolved
                                  statically-code-scheduled, nonuniform
                                  superscalar): yet another superscalar
                                  processor architecture . . . . . . . . . 14--29
                    Carl Ponder   Performance variation across benchmark
                                  suites . . . . . . . . . . . . . . . . . 30--36
            Thomas M. Conte and   
                 Wen-mei W. Hwu   A brief survey of benchmark usage in the
                                  architecture community . . . . . . . . . 37--44
             Todd D. Morris and   
            Edward F. Gehringer   A cost-effective reliable multipath
                                  interconnection network  . . . . . . . . 45--65
                 P. A. Laplante   An improved conditional branching scheme
                                  for a single instruction computer
                                  architecture . . . . . . . . . . . . . . 66--68
           Andrew J. DuBois and   
                    John Rasure   Design and evaluation of a distributed
                                  asynchronous VLSI crossbar switch
                                  controller for a packet switched
                                  supercomputer network  . . . . . . . . . 69--79
                Stanley E. Lass   The compiler controlled pack cache and
                                  messaging  . . . . . . . . . . . . . . . 80--85
               Theo Ungerer and   
             Eberhard Zehendner   A multi-level parallelism architecture   86--93
               Wolfgang Matthes   How many operation units are adequate?   94--108
           Alberto R. Cunha and   
          Carlos N. Ribeiro and   
                José A. Marques   The architecture of a memory management
                                  unit for object-oriented systems . . . . 109--116
                 Norman Matloff   An argument against scalable cache
                                  coherency  . . . . . . . . . . . . . . . 117--123
              D. P. Rodohan and   
                   R. J. Glover   An overview of the A architecture for
                                  optimisation problems in a logic
                                  programming environment  . . . . . . . . 124--131
                 Stuart C. Wray   Time-sequenced DMA for multimedia
                                  computers  . . . . . . . . . . . . . . . 132--137
         Ganesh Ramamoorthy and   
              Alok N. Choudhary   A bibliography for multiprocessor cache
                                  memories . . . . . . . . . . . . . . . . 138--153
                 Alan Jay Smith   Second bibliography on cache memories    154--182
                   Mark Thorson   Usenet Nuggets . . . . . . . . . . . . . 185--191

ACM SIGARCH Computer Architecture News
Volume 19, Number 5, September, 1991

             David A. Patterson   Towards guidelines for SIGARCH sponsored
                                  conferences  . . . . . . . . . . . . . . 7--7
            Yeong-Chang Maa and   
          Dhiraj K. Pradhan and   
             Dominique Thiébaut   Two economical directory schemes for
                                  large-scale cache coherent
                                  multiprocessors  . . . . . . . . . . . . 10--10
                   Mark Thorson   Usenet Nuggets . . . . . . . . . . . . . 21--26
           Vladimir G. Ivanovic   Book review: Computation Structures
                                  by Stephen A. Ward and Robert H.
                                  Halstead, Jr. (MIT Press or McGraw-Hill,
                                  1990)  . . . . . . . . . . . . . . . . . 27--29
                  Moshe Krieger   Book review: Multiprocessors by D.
                                  Tabak (Prentice-Hall, 1990)  . . . . . . 27--29
                   John Fulcher   Book review: The 68000 and 68020
                                  Microprocessors: Hardware, Software and
                                  Interfacing Techniques by W. Triebel and
                                  A. Singh (Prentice Hall, 1991) . . . . . 29--30

ACM SIGARCH Computer Architecture News
Volume 19, Number 6, December, 1991

                 Henry G. Baker   Precise instruction scheduling without a
                                  precise machine model  . . . . . . . . . 4--8
              Robert McLaughlin   Look-ahead branching hardware  . . . . . 9--11
                Thomas Beth and   
                    Volker Hatz   A restricted crossbar implementation and
                                  its applications . . . . . . . . . . . . 12--16
                   Mark Thorson   Usenet Nuggets . . . . . . . . . . . . . 19--23
                Robert Bernecky   Book review: Past, Present,
                                  Parallel: A Survey of Available Parallel
                                  Computing Systems by Arthur Trew & Greg
                                  Wilson (Eds.), (Springer-Verlag 1991)    24--25

ACM SIGARCH Computer Architecture News
Volume 20, Number 1, March, 1992

        Jaswinder Pal Singh and   
        Wolf-Dietrich Weber and   
                    Anoop Gupta   SPLASH: Stanford parallel applications
                                  for shared-memory  . . . . . . . . . . . 5--44
                 Eligiusz Wajda   SPIRE: streaming processing with
                                  instructions release element . . . . . . 45--54
            Yannick Deville and   
                    Jean Gobert   A class of replacement policies for
                                  medium and high-associativity structures 55--64

ACM SIGARCH Computer Architecture News
Volume 20, Number 2, May, 1992

          Richard N. Zucker and   
                 Jean-Loup Baer   A performance study of memory
                                  consistency models . . . . . . . . . . . 2--12
               Pete Keleher and   
                Alan L. Cox and   
               Willy Zwaenepoel   Lazy release consistency for software
                                  distributed shared memory  . . . . . . . 13--21
       Kourosh Gharachorloo and   
                Anoop Gupta and   
                  John Hennessy   Hiding memory latency using dynamic
                                  scheduling in shared-memory
                                  multiprocessors  . . . . . . . . . . . . 22--33
       Edil S. T. Fernandes and   
         Fernando M. B. Barbosa   Effects of building blocks on the
                                  performance of super-scalar architecture 36--45
              Monica S. Lam and   
               Robert P. Wilson   Limits of control flow on parallelism    46--57
             Manoj Franklin and   
               Gurindar S. Sohi   The expandable split window paradigm for
                                  exploiting fine-grain parallelism  . . . 58--67
             Daniel Litaize and   
          Abdelaziz Mzoughi and   
         Christine Rochange and   
                 Pascal Sainrat   Towards a shared-memory massively
                                  parallel multiprocessor  . . . . . . . . 70--79
              Per Stenström and   
                 Truman Joe and   
                    Anoop Gupta   Comparative performance evaluation of
                                  cache-coherent NUMA and COMA
                                  architectures  . . . . . . . . . . . . . 80--91
             Daniel Lenoski and   
               James Laudon and   
                 Truman Joe and   
             David Nakahira and   
               Luis Stevens and   
                Anoop Gupta and   
                  John Hennessy   The DASH prototype: implementation and
                                  performance  . . . . . . . . . . . . . . 92--103
            Gideon Intrater and   
                Ilan Spillinger   Performance evaluation of a decoded
                                  instruction cache for variable
                                  instruction-length computers . . . . . . 106--113
            J. Bradley Chen and   
                 Anita Borg and   
               Norman P. Jouppi   A simulation based study of TLB
                                  performance  . . . . . . . . . . . . . . 114--123
                 Tse-Yu Yeh and   
                   Yale N. Patt   Alternative implementations of two-level
                                  adaptive branch prediction . . . . . . . 124--134
             Hiroaki Hirata and   
                Kozo Kimura and   
           Satoshi Nagamine and   
        Yoshiyuki Mochizuki and   
             Akio Nishimura and   
           Yoshimori Nakase and   
                Teiji Nishizawa   An elementary processor architecture
                                  with simultaneous instruction issuing
                                  from multiple threads  . . . . . . . . . 136--145
             Mitsuhisa Sato and   
              Yuetsu Kodama and   
              Shuichi Sakai and   
        Yoshinori Yamaguchi and   
               Yasuhito Koumura   Thread-based programming for the EM-4
                                  hybrid dataflow machine  . . . . . . . . 146--155
               R. S. Nikhil and   
         G. M. Papadopoulos and   
                         Arvind   *T: a multithreaded massively parallel
                                  architecture . . . . . . . . . . . . . . 156--167
            Czarek Dubnicki and   
              Thomas J. LeBlanc   Adjustable block size coherent caches    170--180
             Kunle Olukotun and   
               Trevor Mudge and   
                  Richard Brown   Performance optimization of pipelined
                                  primary cache  . . . . . . . . . . . . . 181--190
                Scott McFarling   Cache replacement with dynamic exclusion 191--200
         Stephen W. Keckler and   
               William J. Dally   Processor coupling: integrating compile
                                  time and runtime scheduling for
                                  parallelism  . . . . . . . . . . . . . . 202--213
                 Bob Boothe and   
                 Abhiram Ranade   Improved multithreading techniques for
                                  hiding communication latency in
                                  multiprocessors  . . . . . . . . . . . . 214--223
       Alessandro De Gloria and   
               Paolo Faraboschi   Instruction-level parallelism in Prolog:
                                  analysis and architectural support . . . 224--233
            Lizyamma Kurian and   
             Paul T. Hulina and   
                  Lee D. Coraor   Memory latency effects in decoupled
                                  architectures with a single data memory
                                  module . . . . . . . . . . . . . . . . . 236--245
               André Seznec and   
                Jacques Lenfant   Interleaved parallel schemes: improving
                                  memory throughput on supercomputers  . . 246--255
        Thorsten von Eicken and   
            David E. Culler and   
       Seth Copen Goldstein and   
            Klaus Erik Schauser   Active messages: a mechanism for
                                  integrated communication and computation 256--266
            Andrew A. Chien and   
                     Jae H. Kim   Planar-adaptive routing: low-cost
                                  adaptive networks for multiprocessors    268--277
       Christopher J. Glass and   
                   Lionel M. Ni   The turn model for adaptive routing  . . 278--287
          Toshiyuki Shimizu and   
              Takeshi Horie and   
               Hiroaki Ishihata   Low-latency message communication
                                  support for the AP1000 . . . . . . . . . 288--297
           Barbara P. Aichinger   Futurebus+ as an I/O bus: profile B  . . 300--307
          A. L. Narasimha Reddy   A study of I/O system organizations  . . 308--317
                  Jai Menon and   
                   Dick Mattson   Comparison of sparing alternatives for
                                  disk arrays  . . . . . . . . . . . . . . 318--329
              Markus Siegle and   
                Richard Hofmann   Monitoring program behaviour on SUPRENUM 332--341
             Todd M. Austin and   
               Gurindar S. Sohi   Dynamic dependency analysis of ordinary
                                  programs . . . . . . . . . . . . . . . . 342--351
            Walid A. Najjar and   
           W. Marcus Miller and   
                 A. P. Wim Böhm   An analysis of loop latency in dataflow
                                  execution  . . . . . . . . . . . . . . . 352--360
                  Qing Yang and   
                 Liping Wu Yang   A novel cache design for vector
                                  processing . . . . . . . . . . . . . . . 362--371
               Mateo Valero and   
                 Tomás Lang and   
           José M. Llabería and   
              Montse Peiron and   
             Eduard Ayguadé and   
                Juan J. Navarro   Increasing the number of strides for
                                  conflict-free vector access  . . . . . . 372--381
                    Wm. A. Wulf   Evaluation of the WM architecture  . . . 382--390
                Kirk L. Johnson   The impact of communication locality on
                                  large-scale multiprocessor performance   392--402
            Steven L. Scott and   
           James R. Goodman and   
                 Mary K. Vernon   Performance of the SCI ring  . . . . . . 403--414
        Madhusudhan Talluri and   
                 Shing Kong and   
               Mark D. Hill and   
             David A. Patterson   Tradeoffs in supporting two page sizes   415--424
                Ahmed Louri and   
                    Jongwhoa Na   Parallel electro-optical rule-based
                                  system for fast execution of expert
                                  systems (abstract) . . . . . . . . . . . 427--427
               André Seznec and   
                   Karl Courtel   OPAC (abstract): a floating-point
                                  coprocessor dedicated to compute-bound
                                  kernels  . . . . . . . . . . . . . . . . 427--427
            Der-Chung Cheng and   
                    Kanad Ghose   The time-constrained barrier
                                  synchronizer and its applications in
                                  parallel systems (abstract)  . . . . . . 428--428
                Ahmed Louri and   
                    Hongki Sung   A new compiler-directed cache coherence
                                  scheme for shared memory multiprocessors
                                  with fast and parallel explicit
                                  invalidation (abstract)  . . . . . . . . 428--428
                Gautam B. Singh   Architecture of a graphics processor
                                  (abstract) . . . . . . . . . . . . . . . 429--429
                   Ruben Yomtov   Performance evaluation of disk
                                  subsystems . . . . . . . . . . . . . . . 429--429
                 Feipei Lai and   
                Meng-chou Chang   Enhancing boosting with semantic
                                  register in a superscalar processor
                                  (abstract) . . . . . . . . . . . . . . . 430--430
                   Ivan Sklenar   Prefetch unit for vector operations on
                                  scalar computers (abstract)  . . . . . . 430--430
                    Gary Newman   Memory management support for tiled
                                  array organization (abstract)  . . . . . 431--431
            Augustus K. Uht and   
               Darin B. Johnson   Data path issues in a highly concurrent
                                  machine (abstract) . . . . . . . . . . . 431--431
         Samuel A. Fineberg and   
         Thomas L. Casavant and   
                 Brent H. Pease   Seamless --- a latency-tolerant
                                  RISC-based multiprocessor architecture
                                  (abstract) . . . . . . . . . . . . . . . 432--432
               M. A. Sayeed and   
                 M. Atiquzzaman   Performance of multiple-bus
                                  multiprocessor under non-uniform memory
                                  reference model (abstract) . . . . . . . 432--432
           M. Tahar Kechadi and   
              J-L. Dekeyser and   
                Ph. Marquet and   
                      Ph. Preux   Performance improvement for vector
                                  pipeline multiprocessor systems using a
                                  disordered execution model (abstract)  . 433--433
               Anujan Varma and   
                   Gunjan Sinha   A class of prefetch schemes for on-chip
                                  data caches  . . . . . . . . . . . . . . 433--433
              Arthur Abnous and   
              Nader Bagherzadeh   Pipelining and bypassing in a VLIW
                                  processor (abstract) . . . . . . . . . . 434--434
               Shiv Prakash and   
                Alice C. Parker   Synthesis of application-specific
                                  heterogeneous multiprocessor systems
                                  (abstract) . . . . . . . . . . . . . . . 434--434
            Matthew Farrens and   
                 Arvin Park and   
               Rob Fanfelle and   
                    Pius Ng and   
                     Gary Tyson   A partitioned translation lookaside
                                  buffer approach to reducing address
                                  bandwidth (abstract) . . . . . . . . . . 435--435
               James Laudon and   
                Anoop Gupta and   
                  Mark Horowitz   Architectural and implementation
                                  tradeoffs in the design of
                                  multiple-context processors (abstract)   435--435
           Brian D. Alleyne and   
              Isaac D. Scherson   Expanded delta networks for very large
                                  parallel computers . . . . . . . . . . . 436--436
            Jaswinder Pal Singh   Implications of hierarchical N-body
                                  methods for multiprocessor architecture  436--436
                  Wisam Michael   Directory-based cache coherency protocol
                                  for a ring-connected
                                  multiprocessor-array . . . . . . . . . . 437--437
              Wen-Hann Wang and   
                Jim Quinlan and   
                     Konrad Lai   Revisit the case for direct-mapped
                                  caches: a case for two-way
                                  set-associative level-two caches . . . . 437--437
            David E. Culler and   
             Michial Gunter and   
                   James C. Lee   Analysis of multithreaded
                                  microprocessors under multiprogramming   438--438
          C. M. Wittenbrink and   
               A. K. Somani and   
                     C. H. Chen   Cache write generate for high
                                  performance parallel processing  . . . . 438--438
        Walter H. Burkhardt and   
                    Stefan Rust   Integrated computer architecture
                                  development system . . . . . . . . . . . 439--439

ACM SIGARCH Computer Architecture News
Volume 20, Number 3, June, 1992

                 R. J. Chevance   An evaluation methodology for
                                  microprocessor and system architecture   4--13
                  Michael Laird   A comparison of three current
                                  superscalar designs  . . . . . . . . . . 14--21
               Jack J. Dongarra   Performance of various computers using
                                  standard linear equations software . . . 22--44
      William F. Keown, Jr. and   
        Philip Koopman, Jr. and   
                  Aaron Collins   Performance of the HARRIS RTX 2000 stack
                                  architecture versus the Sun 4 SPARC and
                                  the Sun 3 M68020 Architectures . . . . . 45--52
                   Mark Thorson   Usenet Nuggets . . . . . . . . . . . . . 56--62
          Siddhartha Chatterjee   Book review: The Impact of Vector
                                  and Parallel Architectures on the
                                  Gaussian Elimination Algorithm by Yves
                                  Robert (Manchester University Press and
                                  Halsted Press, 1991) . . . . . . . . . . 63--64

ACM SIGARCH Computer Architecture News
Volume 20, Number 4, September, 1992

          Margarita Esponda and   
                     Raúl Rojas   A graphical comparison of RISC
                                  processors . . . . . . . . . . . . . . . 2--8
                   Shogo Matsui   Dynamic refresh method for dynamic RAMs  9--16
                 Arvin Park and   
                     Ron Maeder   Codes to reduce switching transients
                                  across VLSI I/O pins . . . . . . . . . . 17--21
                    Gary Newman   Memory management support for tiled
                                  array organization . . . . . . . . . . . 22--30
                   Ivan Sklenář   Prefetch unit for vector operations on
                                  scalar computers . . . . . . . . . . . . 31--37
               Nadeem Malik and   
      Richard J. Eickemeyer and   
           Stamatis Vassiliadis   Instruction-level parallelism from
                                  execution interlock collapsing . . . . . 38--43
       Stamatis Vassiliadis and   
                Bart Blaner and   
          Richard J. Eickemeyer   On the attributes of the SCISM
                                  organization . . . . . . . . . . . . . . 44--53
                   Mark Thorson   Usenet nuggets . . . . . . . . . . . . . 56--64
                      Ken Allen   Book review: Computing with Parallel
                                  Architectures: T.Node, edited by D.
                                  Gassilloud and J. C. Grossetie (Kluwer
                                  Academic Publishers 1991)  . . . . . . . 65--66

ACM SIGARCH Computer Architecture News
Volume 20, Number 5, December, 1992

              Gavin Michael and   
                   Andrew Chien   Future multicomputers: beyond minimalist
                                  multiprocessors? . . . . . . . . . . . . 6--12
              R. P. Kaushal and   
                     J. S. Bedi   Comparison of hypercube, hypernet, and
                                  symmetric hypernet architectures . . . . 13--25
                   Mark Thorson   Usenet Nuggets . . . . . . . . . . . . . 28--33
                     David Levy   Book review: Neural Networks and
                                  Fuzzy Systems: A Dynamical Systems
                                  Approach to Machine Intelligence by Bart
                                  Kosko (Prentice Hall 1992) . . . . . . . 34--34

ACM SIGARCH Computer Architecture News
Volume 21, Number 1, March, 1993

              Atsushi Inoue and   
                   Kenji Takeda   Performance evaluation for various
                                  configuration of superscalar processors  4--11
                Augustus K. Uht   Extraction of massive instruction level
                                  parallelism  . . . . . . . . . . . . . . 12--14
                 Nasr Ullah and   
                     Matt Holle   The MC88110 implementation of precise
                                  exceptions in a superscalar architecture 15--25
                Yannick Deville   A process-dependent partitioning
                                  strategy for cache memories  . . . . . . 26--33
                   Mark Thorson   Usenet Nuggets . . . . . . . . . . . . . 36--38
ACM SIGARCH Computer Architecture News Staff   Book reviews . . . . . . . . . . . . . . 39--39

ACM SIGARCH Computer Architecture News
Volume 21, Number 2, May, 1993

                  R. Cypher and   
                      A. Ho and   
          S. Konstantinidou and   
                     P. Messina   Architectural requirements of parallel
                                  scientific applications with explicit
                                  communication  . . . . . . . . . . . . . 2--13
            Edward Rothberg and   
        Jaswinder Pal Singh and   
                    Anoop Gupta   Working sets, cache sizes, and node
                                  granularity issues for large-scale
                                  multiprocessors  . . . . . . . . . . . . 14--26
                David Nagle and   
              Richard Uhlig and   
                Tim Stanley and   
            Stuart Sechrest and   
               Trevor Mudge and   
                  Richard Brown   Design tradeoffs for software-managed
                                  TLBs . . . . . . . . . . . . . . . . . . 27--38
                 Jerry Huck and   
                       Jim Hays   Architectural support for translation
                                  table management in large address space
                                  machines . . . . . . . . . . . . . . . . 39--50
                    Pei Cao and   
              Swee Boon Lim and   
    Shivakumar Venkataraman and   
                    John Wilkes   The TickerTAIP parallel RAID
                                  architecture . . . . . . . . . . . . . . 52--63
           Daniel Stodolsky and   
               Garth Gibson and   
                   Mark Holland   Parity logging: overcoming the small
                                  write problem in redundant disk arrays   64--75
                  Jai Menon and   
                    Jim Cortney   The architecture of a fault-tolerant
                                  cached RAID controller . . . . . . . . . 76--87
              Michel Dubois and   
           Jonas Skeppstedt and   
            Livio Ricciulli and   
        Krishnan Ramamurthy and   
                  Per Stenström   The detection and elimination of useless
                                  misses in multiprocessors  . . . . . . . 88--97
                Alan L. Cox and   
               Robert J. Fowler   Adaptive cache coherency for detecting
                                  migratory shared data  . . . . . . . . . 98--108
              Per Stenström and   
              Mats Brorsson and   
                  Lars Sandberg   An adaptive cache coherence protocol
                                  optimized for migratory sharing  . . . . 109--118
        Carl A. Waldspurger and   
               William E. Weihl   Register relocation: flexible contexts
                                  for multithreading . . . . . . . . . . . 120--130
               Yasuo Hidaka and   
               Hanpei Koike and   
                Hidehiko Tanaka   Multiple threads in cyclic register
                                  windows  . . . . . . . . . . . . . . . . 131--142
          Sandhya Dwarkadas and   
              Peter Keleher and   
                Alan L. Cox and   
               Willy Zwaenepoel   Evaluation of release consistent
                                  software distributed shared memory on
                                  emerging network technology  . . . . . . 144--155
              David A. Wood and   
             Satish Chandra and   
              Babak Falsafi and   
               Mark D. Hill and   
             James R. Larus and   
            Alvin R. Lebeck and   
             James C. Lewis and   
     Shubhendu S. Mukherjee and   
        Subbarao Palacharla and   
            Steven K. Reinhardt   Mechanisms for cooperative shared memory 156--167
                   André Seznec   A case for two-way skewed-associative
                                  caches . . . . . . . . . . . . . . . . . 169--178
              Anant Agarwal and   
               Stephen D. Pudar   Column-associative caches: a technique
                                  for reducing the miss rate of
                                  direct-mapped caches . . . . . . . . . . 179--190
               Norman P. Jouppi   Cache write policies and performance . . 191--201
               Eric L. Boyd and   
             Edward S. Davidson   Hierarchical performance modeling with
                                  MACS: a case study of the Convex C-240   203--210
                    D. Kuck and   
                E. Davidson and   
                  D. Lawrie and   
                   A. Sameh and   
                  C. Q. Zhu and   
              A. Veidenbaum and   
                 J. Konicek and   
                     P. Yew and   
                K. Gallivan and   
                   W. Jalby and   
                H. Wijshoff and   
                 R. Bramley and   
                 U. M. Yang and   
                  P. Emrath and   
                   D. Padua and   
               R. Eigenmann and   
              J. Hoeflinger and   
                   G. Jaxon and   
                      Z. Li and   
                  T. Murphy and   
                     J. Andrews   The cedar system and an initial
                                  performance study  . . . . . . . . . . . 213--223
          Michael D. Noakes and   
         Deborah A. Wallach and   
               William J. Dally   The J-machine multicomputer: an
                                  architectural evaluation . . . . . . . . 224--235
                 John Bunda and   
                Don Fussell and   
                W. C. Athas and   
                   Roy Jenevein   16-bit vs. 32-bit instructions for
                                  pipelined microprocessors  . . . . . . . 237--246
            Tokuzo Kiyohara and   
               Scott Mahlke and   
               William Chen and   
            Roger Bringmann and   
               Richard Hank and   
                 Sadun Anik and   
                    Wen-Mei Hwu   Register connection: a new approach to
                                  adding registers into instruction set
                                  architectures  . . . . . . . . . . . . . 247--256
                 Tse-Yu Yeh and   
                   Yale N. Patt   A comparison of dynamic branch
                                  predictors that use two levels of branch
                                  history  . . . . . . . . . . . . . . . . 257--266
         Luis André Barroso and   
                  Michel Dubois   The performance of cache-coherent
                                  ring-based multiprocessors . . . . . . . 268--277
            Dean M. Tullsen and   
                Susan J. Eggers   Limitations of cache prefetching on a
                                  bus-based multiprocessor . . . . . . . . 278--288
            Maurice Herlihy and   
               J. Eliot B. Moss   Transactional memory: architectural
                                  support for lock-free data structures    289--300
              Ellen Spertus and   
       Seth Copen Goldstein and   
        Klaus Erik Schauser and   
        Thorsten von Eicken and   
            David E. Culler and   
               William J. Dally   Evaluation of mechanisms for
                                  fine-grained parallel programs in the
                                  J-machine and the CM-5 . . . . . . . . . 302--313
              Takeshi Horie and   
            Kenichi Hayashi and   
          Toshiyuki Shimizu and   
               Hiroaki Ishihata   Improving AP1000 parallel computer
                                  performance with message communication   314--325
                  W.-C. Hsu and   
                    J. E. Smith   Performance of cached DRAM organizations
                                  in vector supercomputers . . . . . . . . 327--336
                      Q. S. Gao   The Chinese remainder theorem and the
                                  prime memory system  . . . . . . . . . . 337--340
               André Seznec and   
                Jacques Lenfant   Odd memory systems may be quite
                                  interesting  . . . . . . . . . . . . . . 341--350
        Rajendra V. Boppana and   
               Suresh Chalasani   A comparison of adaptive wormhole
                                  routing algorithms . . . . . . . . . . . 351--360

ACM SIGARCH Computer Architecture News
Volume 21, Number 3, June, 1993

                Augustus K. Uht   Extraction of massive instruction level
                                  parallelism  . . . . . . . . . . . . . . 5--12
           Gowri Ramanathan and   
                      Joel Oren   Survey of commercial parallel machines   13--33
            Benjamin J. Ewy and   
                Joseph B. Evans   Secondary cache performance in RISC
                                  architecture . . . . . . . . . . . . . . 34--37
                    Iraj Danesh   Physical limitations of a computer . . . 40--45
                   Mark Thorson   Usenet nuggets . . . . . . . . . . . . . 46--49
                    Gary Fostel   Book Reviews: Principles of Computer
                                  Systems by Gerald M. Karam & John C.
                                  Bryant (Prentice Hall 1992)  . . . . . . 50--51
                    Gary Fostel   Book Review: Computer Architecture
                                  by Mario De Blasi (Addison-Wesley
                                  Publishing Company, 1990)  . . . . . . . 51--53
                   John Fulcher   Book Review: Practical Parallel
                                  Computing by Paul Messina and Almerico
                                  Murli, Editors (John Wiley and Sons,
                                  1992)  . . . . . . . . . . . . . . . . . 53--54

ACM SIGARCH Computer Architecture News
Volume 21, Number 4, September, 1993

               Mark D. Hill and   
             James R. Larus and   
            Alvin R. Lebeck and   
        Madhusudhan Talluri and   
                  David A. Wood   Wisconsin Architectural Research Tool
                                  Set  . . . . . . . . . . . . . . . . . . 8--10
                    Craig Hyatt   A high-performance object-oriented
                                  memory . . . . . . . . . . . . . . . . . 11--19
               Gautam Dewan and   
                  V. S. S. Nair   A case for uniform memory access
                                  multiprocessors  . . . . . . . . . . . . 20--26
                   Mark Thorson   Usenet Nuggets . . . . . . . . . . . . . 27--28
                   Glen Langdon   Book Reviews . . . . . . . . . . . . . . 29--29

ACM SIGARCH Computer Architecture News
Volume 21, Number 5, December, 1993

                  Ravi Jain and   
                 John Werth and   
                   J. C. Browne   Introduction to the Special Issue on
                                  Input/Output in Parallel Computer
                                  Systems  . . . . . . . . . . . . . . . . 5--6
           Peter F. Corbett and   
      Sandra Johnson Baylor and   
              Dror G. Feitelson   Overview of the Vesta parallel file
                                  system . . . . . . . . . . . . . . . . . 7--14
                     Z. Lin and   
                        S. Zhou   Parallelizing I/O intensive applications
                                  for a workstation cluster: a case study  15--22
             Samuel A. Fineberg   Implementing the NHT-1 application I/O
                                  benchmark  . . . . . . . . . . . . . . . 23--30
    Juan Miguel del Rosario and   
          Rajesh Bordawekar and   
                 Alok Choudhary   Improved parallel I/O via a two-phase
                                  run-time access strategy . . . . . . . . 31--38
    Shahram Ghandeharizadeh and   
              Cyrus Shahabi and   
                     Luis Ramos   An overview of techniques to support
                                  continuous retrieval of multimedia
                                  objects  . . . . . . . . . . . . . . . . 39--46
                  Ravi Jain and   
             Kiran Somalwar and   
                 John Werth and   
                   J. C. Browne   Scheduling parallel I/O operations . . . 47--54
                   Qiang Li and   
                 Naphtali Rishe   A transputer T9000 family based
                                  architecture for parallel database
                                  machines . . . . . . . . . . . . . . . . 55--62
                   Claus Aßmann   A RISC processor architecture with a
                                  versatile stack system . . . . . . . . . 63--70
                     Dajin Wang   A note on ``Diagnosabilities of
                                  hypercubes under the pessimistic
                                  one-step diagnosis strategy''  . . . . . 71--78
                   Mark Thorson   Usenet Nuggets . . . . . . . . . . . . . 79--85
                   Bob Alverson   Book Review: High-Speed Digital
                                  Design: A Handbook of Black Magic by
                                  Howard W. Johnson and Martin Graham
                                  (Prentice-Hall, 1993)  . . . . . . . . . 85--86

ACM SIGARCH Computer Architecture News
Volume 22, Number 1, March, 1994

            Robert Iannucci and   
              Anant Agarwal and   
                 Bill Dally and   
                Anoop Gupta and   
          Greg Papadopoulos and   
                   Burton Smith   Architectural and implementation issues
                                  for multithreading (panel session I) . . 3--18
              Burt Halstead and   
             David Callahan and   
                Jack Dennis and   
               R. S. Nikhil and   
                   Vivek Sarkar   Programming, compilation, and resource
                                  management issues for multithreading
                                  (panel session II) . . . . . . . . . . . 19--33
                 Henry G. Baker   Linear logic and permutation
                                  stacks---the Forth shall be first  . . . 34--43
          Abraham Mendelson and   
          Shlomit S. Pinter and   
                Ruth Shtokhamer   Compile time instruction cache
                                  optimizations  . . . . . . . . . . . . . 44--51
               David Barach and   
               Jaspal Kohli and   
                 John Slice and   
             Marc Spaulding and   
          Rajeev Bharadhwaj and   
                 Don Hudson and   
            Cliff Neighbors and   
              Nirmal Saxena and   
                  Rolland Crunk   HALSIM---a very fast SPARC V9 behavioral
                                  model  . . . . . . . . . . . . . . . . . 52--58
                   Mark Thorson   Usenet Nuggets . . . . . . . . . . . . . 59--60
        Ewerton Longoni Madruga   Book Review: SNMP, SNMPv2, and CMIP:
                                  The Practical Guide to Network
                                  Management Standards by William
                                  Stallings (Addison-Wesley Publishing
                                  Company Inc. 1993) . . . . . . . . . . . 60--61

ACM SIGARCH Computer Architecture News
Volume 22, Number 2, April, 1994

                  B. Calder and   
                    D. Grunwald   Fast and accurate instruction fetch and
                                  branch prediction  . . . . . . . . . . . 2--11
              A. R. Talcott and   
                W. Yamamoto and   
              M. J. Serrano and   
                 R. C. Wood and   
                  M. Nemirovsky   The impact of unresolved branches on
                                  branch prediction scheme performance . . 12--21
              S. Palacharla and   
                  R. E. Kessler   Evaluating stream buffers as a secondary
                                  cache replacement  . . . . . . . . . . . 24--33
               N. P. Jouppi and   
                S. J. E. Wilton   Tradeoffs in two-level on-chip caching   34--45
                 A. Singhal and   
                 A. J. Goldberg   Architectural support for performance
                                  tuning: a case study on the SPARCcenter
                                  2000 . . . . . . . . . . . . . . . . . . 48--59
              Z. Cvetanovic and   
                  D. Bhandarkar   Characterization of Alpha AXP
                                  performance using TP and SPEC workloads  60--70
               C. Natarajan and   
                  S. Sharma and   
                     R. K. Iyer   Measurement-based characterization of
                                  global memory and network contention,
                                  operating system and parallelization
                                  overheads  . . . . . . . . . . . . . . . 71--80
                     T. Joe and   
                 J. L. Hennessy   Evaluating the memory overhead required
                                  for COMA architectures . . . . . . . . . 82--93
              A. C. Klaiber and   
                     H. M. Levy   A comparison of message passing and
                                  shared memory architectures for data
                                  parallel programs  . . . . . . . . . . . 94--105
                  A. L. Cox and   
               S. Dwarkadas and   
                 P. Keleher and   
                      H. Lu and   
                R. Rajamony and   
                  W. Zwaenepoel   Software versus hardware shared-memory
                                  implementation: a case study . . . . . . 106--117
        D. N. Pnevmatikatos and   
                     G. S. Sohi   Guarded execution and branch prediction
                                  in dynamic ILP processors  . . . . . . . 120--129
                   C.-L. Su and   
                  A. M. Despain   Branch with masked squashing in
                                  superpipelined processors  . . . . . . . 130--140
             M. A. Blumrich and   
                      K. Li and   
                  R. Alpert and   
                C. Dubnicki and   
               E. W. Felten and   
                    J. Sandberg   Virtual memory mapped network interface
                                  for the SHRIMP multicomputer . . . . . . 142--153
              P. Steenkiste and   
                    M. Hemy and   
                 T. Mummert and   
                        B. Zill   Architecture and evaluation of a
                                  high-speed networking subsystem for
                                  distributed-memory systems . . . . . . . 154--163
               B. A. Nayfeh and   
                    K. Olukotun   Exploring the design space for a
                                  shared-cache multiprocessor  . . . . . . 166--175
                R. Thekkath and   
                   S. J. Eggers   Impact of sharing-based thread placement
                                  on multithreaded architectures . . . . . 176--186
                F. Dahlgren and   
                  M. Dubois and   
                   P. Stenström   Combined performance gains of simple
                                  cache protocol extensions  . . . . . . . 187--197
                A. S. Huang and   
              G. Slavenburg and   
                     J. P. Shen   Speculative disambiguation: a
                                  compilation technique for dynamic memory
                                  disambiguation . . . . . . . . . . . . . 200--210
               K. I. Farkas and   
                   N. P. Jouppi   Complexity/performance tradeoffs with
                                  non-blocking loads . . . . . . . . . . . 211--222
                 T.-F. Chen and   
                     J.-L. Baer   A performance study of software and
                                  hardware data prefetching schemes  . . . 223--232
              A. L. Drapeau and   
             K. W. Shirriff and   
              J. H. Hartman and   
               E. L. Miller and   
                  S. Seshan and   
                 R. H. Katz and   
                    K. Lutz and   
            D. A. Patterson and   
                  E. K. Lee and   
                 P. M. Chen and   
                   G. A. Gibson   RAID-II: a high-bandwidth network file
                                  server . . . . . . . . . . . . . . . . . 234--244
                   M. Blaum and   
                   J. Brady and   
                   J. Bruck and   
                       J. Menon   EVENODD: an optimal scheme for
                                  tolerating double disk failures in RAID
                                  architectures  . . . . . . . . . . . . . 245--254
                       S. W. Ng   Crosshatch disk array for improved
                                  reliability and performance  . . . . . . 255--264
                   A. DeHon and   
                   F. Chong and   
                  M. Becker and   
                   E. Egozy and   
                  H. Minsky and   
                  S. Peretz and   
              T. F. Knight, Jr.   METRO: a router architecture for
                                  high-performance, short-haul routing
                                  networks . . . . . . . . . . . . . . . . 266--277
                J. D. Allen and   
              P. T. Gaughan and   
             D. E. Schimmel and   
                S. Yalamanchili   Ariadne---an adaptive router for
                                  fault-tolerant multicomputers  . . . . . 278--288
                  J. H. Kim and   
                     Z. Liu and   
                    A. A. Chien   Compressionless routing: a framework for
                                  adaptive and fault-tolerant routing  . . 289--300
                  J. Kuskin and   
                   D. Ofelt and   
                M. Heinrich and   
                J. Heinlein and   
                  R. Simoni and   
            K. Gharachorloo and   
                  J. Chapin and   
                D. Nakahira and   
                  J. Baxter and   
                M. Horowitz and   
                   A. Gupta and   
               M. Rosenblum and   
                    J. Hennessy   The Stanford FLASH multiprocessor  . . . 302--313
                 D. Chaiken and   
                     A. Agarwal   Software-extended coherent shared
                                  memory: performance and cost . . . . . . 314--324
            S. K. Reinhardt and   
                J. R. Larus and   
                     D. A. Wood   Tempest and Typhoon: user-level shared
                                  memory . . . . . . . . . . . . . . . . . 325--336
                 M. Farrens and   
                   G. Tyson and   
                 A. R. Pleszkun   A study of single-chip processor/cache
                                  organizations for large numbers of
                                  transistors  . . . . . . . . . . . . . . 338--347
                 C.-H. Chen and   
                   A. K. Somani   A unified architectural tradeoff
                                  methodology  . . . . . . . . . . . . . . 348--357
                   D. Nagle and   
                   R. Uhlig and   
                   T. Mudge and   
                    S. Sechrest   Optimal allocation of on-chip memory for
                                  multiple-API operating systems . . . . . 358--369
                    R. W. Quong   Expected I-cache miss rates via the gap
                                  model  . . . . . . . . . . . . . . . . . 372--383
                      A. Seznec   Decoupled sectored caches: conciliating
                                  low tag implementation cost  . . . . . . 384--393

ACM SIGARCH Computer Architecture News
Volume 22, Number 3, June, 1994

                     J. R. Gurd   Supercomputing: big bang or steady state
                                  growth?  . . . . . . . . . . . . . . . . 3--13
              Kay P. Litchfield   Instruction execution sequence
                                  confirmation . . . . . . . . . . . . . . 14--18
                 Phil Allen and   
               Franc Brglez and   
                 Hal Carter and   
             Robert Caverly and   
              Jerry Dillion and   
                  Albert Lo and   
                  Ron Lomax and   
              John Oldfield and   
                 Cesar Pina and   
                T. J. Wilkinson   Report of the 1993 Workshop on Rapid
                                  Prototyping of Microelectronic Systems
                                  for Universities . . . . . . . . . . . . 19--26
                   Mark Thorson   Usenet Nuggets . . . . . . . . . . . . . 27--28
        Ewerton Longoni Madruga   Book Review: Internetworking with
                                  TCP/IP, vol. III: Client-Server
                                  programming and applications (BSD
                                  Sockets version) by Douglas E. Comer and
                                  David L. Stevens (Prentice-Hall, 1993)   29--30

ACM SIGARCH Computer Architecture News
Volume 22, Number 4, September, 1994

                  Ravi Jain and   
                 John Werth and   
                   J. C. Browne   Special Issue on Input/Output in
                                  Parallel Computer Systems: Introduction  3--4
      Sandra Johnson Baylor and   
        Caroline Benveniste and   
                     Yarsun Hsu   Performance evaluation of a massively
                                  parallel I/O subsystem . . . . . . . . . 5--10
          James B. Sinclair and   
                   Jay Tang and   
                Peter J. Varman   Instability in parallel I/O systems  . . 11--16
      Steven H. Vanderleest and   
            Ravishankar K. Iyer   Measurement of I/O bus contention and
                                  correlation among heterogeneous device
                                  types in a single-bus multiprocessor
                                  system . . . . . . . . . . . . . . . . . 17--22
              Rajeev Thakur and   
          Rajesh Bordawekar and   
                 Alok Choudhary   Compilation of out-of-core data parallel
                                  programs for distributed memory machines 23--28
             Abhaya Asthana and   
              Mark Cravatts and   
              Paul Krzyzanowski   An experimental active memory based I/O
                                  subsystem  . . . . . . . . . . . . . . . 29--34
              Dannie Durand and   
                  Ravi Jain and   
                 David Tseytlin   Distributed scheduling algorithms to
                                  improve the performance of parallel data
                                  transfers  . . . . . . . . . . . . . . . 35--40
                   Haruo Yokota   DR-nets: data-reconstruction networks
                                  for highly reliable parallel-disk
                                  systems  . . . . . . . . . . . . . . . . 41--46
              Martti J. Forsell   Are multiport memories physically
                                  feasible?  . . . . . . . . . . . . . . . 47--54
            Ghulam Chaudhry and   
                    Xuechang Li   A case for the multithreaded processor
                                  architecture . . . . . . . . . . . . . . 55--59
                   Yin Chan and   
           Ashok Sudarsanam and   
                   Andrew Wolfe   The effect of compiler-flag tuning on
                                  SPEC benchmark performance . . . . . . . 60--70
                 Jin-Ho Lee and   
              Min-Young Lee and   
              Seong-Uk Choi and   
                Myong-Soon Park   Reducing cache conflicts in data cache
                                  prefetching  . . . . . . . . . . . . . . 71--77
                   Mark Thorson   Usenet Nuggets . . . . . . . . . . . . . 78--81

ACM SIGARCH Computer Architecture News
Volume 22, Number 5, December, 1994

              Martti J. Forsell   Are multiport memories physically
                                  feasible?  . . . . . . . . . . . . . . . 3--10
                      Rok Sosič   History cache: hardware support for
                                  reverse execution  . . . . . . . . . . . 11--18
               Mark D. Hill and   
             James R. Larus and   
                  David A. Wood   The Wisconsin Wind Tunnel project: an
                                  annotated bibliography . . . . . . . . . 19--26
                Avijit Saha and   
                   Nadeem Malik   Distributed directory tags . . . . . . . 27--29
            Ishaq H. Unwala and   
               Harvey G. Cragon   A study of MIPS programs . . . . . . . . 30--40
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 41--46
         Kenneth R. Ohnemus and   
                Diana F. Mallin   Benefits of implementing on-line methods
                                  and procedures . . . . . . . . . . . . . 49--55
       Daniel K. Cunningham and   
               Steven J. Reilly   Leading the design team---the evolution
                                  of the technical writer from a support
                                  role to a design role  . . . . . . . . . 56--60
                    Ann Rockley   Multimedia: towards an electronic
                                  performance support system . . . . . . . 61--65
              Katherine E. Drew   Telecommunicators and telecommuters:
                                  making multiple-site documentation
                                  projects work  . . . . . . . . . . . . . 66--75

ACM SIGARCH Computer Architecture News
Volume 23, Number 1, March, 1995

             Aimee Severson and   
                   Brent Nelson   Throughput in a counterflow pipeline
                                  processor  . . . . . . . . . . . . . . . 5--12
             Tsong-Chih Hsu and   
                  Sheng-De Wang   A simple architecture for constant time
                                  sorting machines . . . . . . . . . . . . 13--19
                Wm. A. Wulf and   
                 Sally A. McKee   Hitting the memory wall: implications of
                                  the obvious  . . . . . . . . . . . . . . 20--24
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 25--28

ACM SIGARCH Computer Architecture News
Volume 23, Number 2, May, 1995

              Anant Agarwal and   
          Ricardo Bianchini and   
              David Chaiken and   
            Kirk L. Johnson and   
                David Kranz and   
           John Kubiatowicz and   
              Beng-Hong Lim and   
          Kenneth Mackenzie and   
                   Donald Yeung   The MIT Alewife machine: architecture
                                  and performance  . . . . . . . . . . . . 2--13
              Yuetsu Kodama and   
            Hirohumi Sakane and   
             Mitsuhisa Sato and   
              Hayato Yamana and   
              Shuichi Sakai and   
            Yoshinori Yamaguchi   The EM-X parallel computer: architecture
                                  and basic performance  . . . . . . . . . 14--23
         Steven Cameron Woo and   
            Moriyoshi Ohara and   
                Evan Torrie and   
        Jaswinder Pal Singh and   
                    Anoop Gupta   The SPLASH-2 programs: characterization
                                  and methodological considerations  . . . 24--36
                Håkan Grahn and   
                  Per Stenström   Efficient strategies for software-only
                                  protocols in shared-memory
                                  multiprocessors  . . . . . . . . . . . . 38--47
            Alvin R. Lebeck and   
                  David A. Wood   Dynamic self-invalidation: reducing
                                  coherence overhead in shared-memory
                                  multiprocessors  . . . . . . . . . . . . 48--59
               Fredrik Dahlgren   Boosting the performance of hybrid
                                  snooping cache protocols . . . . . . . . 60--69
        Andreas G. Nowatzyk and   
          Michael C. Browne and   
            Edmund J. Kelly and   
                 Michael Parkin   S-connect: from networks of workstations
                                  to supercomputer performance . . . . . . 71--82
               Anujan Varma and   
                 Quinn Jacobson   Destage algorithms for disk arrays with
                                  non-volatile caches  . . . . . . . . . . 83--95
               Gordon Stoll and   
                    Bin Wei and   
              Douglas Clark and   
           Edward W. Felten and   
                     Kai Li and   
               Patrick Hanrahan   Evaluating multi-port frame buffer
                                  designs for a mesh-connected
                                  multicomputer  . . . . . . . . . . . . . 96--105
        Andreas G. Nowatzyk and   
                Paul R. Prucnal   Are crossbars really dead?: the case for
                                  optical multiprocessor interconnect
                                  systems  . . . . . . . . . . . . . . . . 106--115
            Stéphan Jourdan and   
             Pascal Sainrat and   
                 Daniel Litaize   Exploring configurations of functional
                                  units in an out-of-order superscalar
                                  processor  . . . . . . . . . . . . . . . 117--125
                Hideki Ando and   
          Chikako Nakanishi and   
               Tetsuya Hara and   
                   Masao Nakaya   Unconstrained speculative execution with
                                  predicated state buffering . . . . . . . 126--137
            Scott A. Mahlke and   
            Richard E. Hank and   
         James E. McCormick and   
            David I. August and   
                 Wen-Mei W. Hwu   A comparison of full and partial
                                  predicated execution support for ILP
                                  processors . . . . . . . . . . . . . . . 138--150
                  M. Simone and   
                   A. Essen and   
                     A. Ike and   
          A. Krishnamoorthy and   
                T. Maruyama and   
                  N. Patkar and   
               M. Ramaswami and   
                M. Shebanow and   
         V. Thirumalaiswamy and   
                       D. Tovey   Implementation trade-offs in using a
                                  restricted data flow architecture in a
                                  high performance RISC microprocessor . . 151--162
              Trung A. Diep and   
         Christopher Nelson and   
                 John Paul Shen   Performance evaluation of the PowerPC
                                  620 microarchitecture  . . . . . . . . . 163--174
          Theodore H. Romer and   
           Wayne H. Ohlrich and   
             Anna R. Karlin and   
               Brian N. Bershad   Reducing TLB and memory overhead using
                                  online superpage promotion . . . . . . . 176--187
                Zheng Zhang and   
                Josep Torrellas   Speeding up irregular applications in
                                  shared-memory multiprocessors: memory
                                  binding and group prefetching  . . . . . 188--199
                K. V. Anjan and   
          Timothy Mark Pinkston   An efficient, fully adaptive deadlock
                                  recovery scheme: DISHA . . . . . . . . . 201--210
               Kang G. Shin and   
               Stuart W. Daniel   Analysis and implementation of hybrid
                                  switching  . . . . . . . . . . . . . . . 211--219
              Binh Vien Dao and   
                 Jose Duato and   
          Sudhakar Yalamanchili   Configurable flow control mechanisms for
                                  fault-tolerant routing . . . . . . . . . 220--229
           Timothy Callahan and   
           Seth Copen Goldstein   NIFDY: a low overhead, high throughput
                                  network interface  . . . . . . . . . . . 230--241
              Montse Peiron and   
               Mateo Valero and   
             Eduard Ayguadé and   
                     Tomás Lang   Vector multiprocessors with arbitrated
                                  memory access  . . . . . . . . . . . . . 243--252
            Krishna M. Kavi and   
               A. R. Hurson and   
             Phenil Patadia and   
          Elizabeth Abraham and   
            Ponnarasu Shanmugam   Design of cache memories for
                                  multi-threaded dataflow architecture . . 253--264
             François Bodin and   
                   André Seznec   Skewed associativity enhances
                                  performance predictability . . . . . . . 265--274
                Cliff Young and   
               Nicolas Gloy and   
               Michael D. Smith   A comparative analysis of schemes for
                                  correlated branch prediction . . . . . . 276--286
                Brad Calder and   
                  Dirk Grunwald   Next cache line and set prediction . . . 287--296
           Vijay Karamcheti and   
                Andrew A. Chien   A comparison of architectural support
                                  for messaging in the TMC CM-5 and the
                                  Cray T3D . . . . . . . . . . . . . . . . 298--307
                T. Stricker and   
                       T. Gross   Optimizing memory system performance for
                                  communication in parallel computers  . . 308--319
            Remzi H. Arpaci and   
            David E. Culler and   
       Arvind Krishnamurthy and   
         Steve G. Steinberg and   
               Katherine Yelick   Empirical evaluation of the CRAY-T3D:
                                  a compiler perspective . . . . . . . . . 320--331
            Thomas M. Conte and   
         Kishore N. Menezes and   
           Patrick M. Mills and   
                Burzin A. Patel   Optimization of instruction fetch
                                  mechanisms for high issue rates  . . . . 333--344
              Richard Uhlig and   
                David Nagle and   
               Trevor Mudge and   
            Stuart Sechrest and   
                      Joel Emer   Instruction fetching: coping with code
                                  bloat  . . . . . . . . . . . . . . . . . 345--356
                 Dennis Lee and   
             Jean-Loup Baer and   
                Brad Calder and   
                  Dirk Grunwald   Instruction cache fetch policies for
                                  speculative execution  . . . . . . . . . 357--367
             Todd M. Austin and   
 Dionisios N. Pnevmatikatos and   
               Gurindar S. Sohi   Streamlining data cache access with fast
                                  address calculation  . . . . . . . . . . 369--380
                  Hong Wang and   
                   Tong Sun and   
                      Qing Yang   CAT---caching address tags: a technique
                                  for reducing area cost of on-chip caches 381--390
            Dean M. Tullsen and   
            Susan J. Eggers and   
                  Henry M. Levy   Simultaneous multithreading: maximizing
                                  on-chip parallelism  . . . . . . . . . . 392--403
              Richard C. Ho and   
                C. Han Yang and   
           Mark A. Horowitz and   
                  David L. Dill   Architecture validation for processors   404--413
           Gurindar S. Sohi and   
            Scott E. Breach and   
               T. N. Vijaykumar   Multiscalar processors . . . . . . . . . 414--425

ACM SIGARCH Computer Architecture News
Volume 23, Number 3, June, 1995

               Carl J. Beckmann   HTGL: a program modelling language . . . 3--10
             Jean-Louis Lafitte   On structured data handling in parallel
                                  processing . . . . . . . . . . . . . . . 11--18
                      B. Ulmann   oμ-EP-1: a simple 32-bit
                                  architecture . . . . . . . . . . . . . . 19--24
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 25--27
                   Daniel Tabak   Cache and Memory Hierarchy Design: A
                                  Performance-Directed Approach by Steven
                                  A. Przybylski  . . . . . . . . . . . . . 28--28

ACM SIGARCH Computer Architecture News
Volume 23, Number 4, September, 1995

              Maurice V. Wilkes   The memory wall and the CMOS end-point   4--6
                Eric E. Johnson   Graffiti on "the memory wall"  . . . . . 7--8
                    Tariq Afzal   Performance modeling using the Motorola
                                  PowerPC timing simulator . . . . . . . . 9--18
                Behrooz Parhami   SIMD machines: do they have a
                                  significant future?  . . . . . . . . . . 19--22
                  Ravi Jain and   
                     John Werth   Airdisks and airRAID (expanded extract):
                                  modeling and scheduling periodic
                                  wireless data broadcast  . . . . . . . . 23--28
 Leonidas I. Kontothanassis and   
               Michael L. Scott   Efficient shared memory with minimal
                                  hardware support . . . . . . . . . . . . 29--35

ACM SIGARCH Computer Architecture News
Volume 23, Number 5, December, 1995

        Michael K. Gschwind and   
              Thomas J. Pietsch   Vector prefetching . . . . . . . . . . . 1--7
                Ramesh K. Karne   Object-oriented computer architectures
                                  for new generation of applications . . . 8--19
                 Humayun Khalid   The unconventional replacement
                                  algorithms . . . . . . . . . . . . . . . 20--26
                 Humayun Khalid   A trace-driven simulation methodology    27--33
           Nikki Mirghafori and   
             Margret Jacoby and   
                David Patterson   Truth in SPEC benchmarks . . . . . . . . 34--42
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 43--44

ACM SIGARCH Computer Architecture News
Volume 24, Number 1, March, 1996

                   Trevor Mudge   Report on the panel: "How Can Computer
                                  Architecture Researchers Avoid Becoming
                                  the Society for Irreproducible
                                  Results?"  . . . . . . . . . . . . . . . 1--5
              Oh-Young Kwon and   
                 Gi-Ho Park and   
                   Tack-Don Han   A compiler optimization to reduce
                                  execution time of loop nest  . . . . . . 6--11
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 12--16
                   Daniel Tabak   Book Review: Alpha Implementations
                                  and Architecture by Dileep P. Bhandarkar 17--18

ACM SIGARCH Computer Architecture News
Volume 24, Number 2, May, 1996

               Marius Evers and   
              Po-Yung Chang and   
                   Yale N. Patt   Using hybrid branch predictors to
                                  improve branch prediction accuracy in
                                  the presence of context switches . . . . 3--11
               Nicolas Gloy and   
                Cliff Young and   
            J. Bradley Chen and   
               Michael D. Smith   An analysis of dynamic branch prediction
                                  schemes on system workloads  . . . . . . 12--21
            Stuart Sechrest and   
             Chih-Chieh Lee and   
                   Trevor Mudge   Correlation and aliasing in dynamic
                                  branch predictors  . . . . . . . . . . . 22--32
        Steven K. Reinhardt and   
            Robert W. Pfile and   
                  David A. Wood   Decoupled hardware support for
                                  distributed shared memory  . . . . . . . 34--43
               Donald Yeung and   
           John Kubiatowicz and   
                  Anant Agarwal   MGS: a multigrain shared memory system   44--55
            Christine Morin and   
             Alain Gefflaut and   
             Michel Banâtre and   
           Anne-Marie Kermarrec   COMA: an opportunity for building
                                  fault-tolerant scalable shared memory
                                  multiprocessors  . . . . . . . . . . . . 56--65
            Basem A. Nayfeh and   
              Lance Hammond and   
                 Kunle Olukotun   Evaluation of design alternatives for a
                                  multiprocessor microprocessor  . . . . . 67--77
                Doug Burger and   
           James R. Goodman and   
                     Alain Kägi   Memory bandwidth limitations of future
                                  microprocessors  . . . . . . . . . . . . 78--89
           Ashley Saulsbury and   
                  Fong Pong and   
               Andreas Nowatzyk   Missing the memory wall: the case for
                                  processor/memory integration . . . . . . 90--101
                   André Seznec   Don't use the page number, but a pointer
                                  to it  . . . . . . . . . . . . . . . . . 104--113
                  Toni Juan and   
                 Tomás Lang and   
                Juan J. Navarro   The difference-bit cache . . . . . . . . 114--120
               Liviu Iftode and   
        Jaswinder Pal Singh and   
                         Kai Li   Understanding application performance on
                                  shared virtual memory systems  . . . . . 122--133
                 Chris Holt and   
        Jaswinder Pal Singh and   
                  John Hennessy   Application and architectural
                                  bottlenecks in large scale distributed
                                  shared memory machines . . . . . . . . . 134--145
          Kenneth M. Wilson and   
             Kunle Olukotun and   
               Mendel Rosenblum   Increasing cache port efficiency for
                                  dynamic superscalar microprocessors  . . 147--157
             Todd M. Austin and   
               Gurindar S. Sohi   High-bandwidth address translation for
                                  multiple-issue processors  . . . . . . . 158--167
                  Yiming Hu and   
                      Qing Yang   DCD---disk caching disk: a new approach
                                  for boosting I/O performance . . . . . . 169--178
           Olivier Maquelin and   
               Guang R. Gao and   
          Herbert H. J. Hum and   
          Kevin B. Theobald and   
                   Xin-Min Tian   Polling watchdog: combining polling and
                                  interrupts for efficient message
                                  handling . . . . . . . . . . . . . . . . 179--188
            Dean M. Tullsen and   
            Susan J. Eggers and   
               Joel S. Emer and   
              Henry M. Levy and   
                 Jack L. Lo and   
               Rebecca L. Stamm   Exploiting choice: instruction fetch and
                                  issue on an implementable simultaneous
                                  multithreading processor . . . . . . . . 191--202
      Richard J. Eickemeyer and   
            Ross E. Johnson and   
           Steven R. Kunkel and   
         Mark S. Squillante and   
                    Shiafun Liu   Evaluation of multithreaded
                                  uniprocessors for commercial application
                                  environments . . . . . . . . . . . . . . 203--212
               Tetsuya Hara and   
                Hideki Ando and   
          Chikako Nakanishi and   
                   Masao Nakaya   Performance comparison of ILP machines
                                  with cycle time evaluation . . . . . . . 213--224
                 Jae H. Kim and   
                Andrew A. Chien   Rotating combined queueing (RCQ):
                                  bandwidth and latency guarantees in
                                  low-cost, high-performance networks  . . 226--236
           Jennifer Rexford and   
                  John Hall and   
                   Kang G. Shin   A router architecture for real-time
                                  point-to-point networks  . . . . . . . . 237--246
     Shubhendu S. Mukherjee and   
              Babak Falsafi and   
               Mark D. Hill and   
                  David A. Wood   Coherent network interfaces for
                                  fine-grain communication . . . . . . . . 247--258
              Mark Horowitz and   
         Margaret Martonosi and   
              Todd C. Mowry and   
               Michael D. Smith   Informing memory operations: providing
                                  memory performance feedback in modern
                                  processors . . . . . . . . . . . . . . . 260--270
                   Chun Xia and   
                Josep Torrellas   Instruction prefetching of systems codes
                                  with layout optimized for reduced cache
                                  misses . . . . . . . . . . . . . . . . . 271--282
                  Lynn Choi and   
                  Pen-Chung Yew   Compiler and hardware support for cache
                                  coherence in large-scale
                                  multiprocessors: design considerations
                                  and performance study  . . . . . . . . . 283--294
           Edward W. Felten and   
          Richard D. Alpert and   
              Angelos Bilas and   
       Matthias A. Blumrich and   
           Douglas W. Clark and   
     Stefanos N. Damianakis and   
            Cezary Dubnicki and   
               Liviu Iftode and   
                         Kai Li   Early experience with message-passing on
                                  the SHRIMP multicomputer . . . . . . . . 296--307
                 Tom Lovett and   
                  Russell Clapp   STiNG: a CC-NUMA computer system for the
                                  commercial marketplace . . . . . . . . . 308--317

ACM SIGARCH Computer Architecture News
Volume 24, Number 3, June, 1996

               J. Carretero and   
                   F. Pérez and   
               P. de Miguel and   
                  F. García and   
                      L. Alonso   A massively parallel and distributed I/O
                                  subsystem  . . . . . . . . . . . . . . . 1--8
            W. B. Ligon III and   
       Daniel C. Stanzione, Jr.   Distributing and load-balancing for
                                  loops in scientific applications . . . . 9--17
            Samson Belayneh and   
                 David R. Kaeli   A discussion on non-blocking/lockup-free
                                  caches . . . . . . . . . . . . . . . . . 18--25
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 26--32

ACM SIGARCH Computer Architecture News
Volume 24, Number 4, September, 1996

         Gerard Páez-Monzón and   
            Charles Páez-Monzón   The RISC processor DMN-6: a unified
                                  data-control flow architecture . . . . . 3--10
         J. A. Gómez Pulido and   
        J. M. Sánchez Pérez and   
            J. A. Moreno Zamora   An educational tool for testing
                                  hierarchical multilevel caches . . . . . 11--15
            Samson Belayneh and   
                 David R. Kaeli   A discussion on non-blocking/lockup-free
                                  caches . . . . . . . . . . . . . . . . . 16--16
                 Mark Rosenbaum   Architectural potholes . . . . . . . . . 17--18
                    John Mashey   Architectural potholes . . . . . . . . . 18--18
               Adrian Cockcroft   I/O potholes . . . . . . . . . . . . . . 18--19
                  Zahir Ebrahim   I/O potholes . . . . . . . . . . . . . . 19--20
                   Brad Carlile   Interpreting benchmarks  . . . . . . . . 20--21
                    David Chase   Register windows . . . . . . . . . . . . 21--21
                 Paul W. DeMone   Register windows and delay slots . . . . 21--22

ACM SIGARCH Computer Architecture News
Volume 24, Number 5, December, 1996

           Charlton D. Rose and   
              J. Kelly Flanagan   Constructing instruction traces from
                                  cache-filtered address traces (CITCAT)   1--8
             Susan Flynn Hummel   Efficient data sharing with conditional
                                  remote memory transfers  . . . . . . . . 9--17
              Larry Widigen and   
            Elliot Sowadsky and   
                  Kevin McGrath   Eliminating operand read latency . . . . 18--22
               Philip Machanick   The case for SRAM main memory  . . . . . 23--30

ACM SIGARCH Computer Architecture News
Volume 25, Number 1, March, 1997

              Dileep Bhandarkar   RISC versus CISC: a tale of two chips    1--12
                  I. Martín and   
                      F. Tirado   A SIMD computer for multigrid methods    13--18
               Reinhold Weicker   On the use of SPEC benchmarks in
                                  computer architecture research . . . . . 19--22
         Shubhendu S. Mukherjee   What should graduate students know
                                  before joining a large computer
                                  architecture project?  . . . . . . . . . 23--26
                 Humayun Khalid   A new cache replacement scheme based on
                                  backpropagation neural networks  . . . . 27--33
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 34--36

ACM SIGARCH Computer Architecture News
Volume 25, Number 2, May, 1997

           Sriram Vajapeyam and   
                   Tulika Mitra   Improving superscalar instruction
                                  dispatch and issue by exploiting dynamic
                                  code sequences . . . . . . . . . . . . . 1--12
                  Ravi Nair and   
              Martin E. Hopkins   Exploiting instruction level parallelism
                                  in processors by caching scheduled
                                  groups . . . . . . . . . . . . . . . . . 13--25
             Kemal Ebcioğlu and   
                 Erik R. Altman   DAISY: dynamic compilation for 100%
                                  architectural compatibility  . . . . . . 26--37
      Timothy Mark Pinkston and   
         Sugath Warnakulasuriya   On deadlocks in interconnection networks 38--49
           Craig B. Stunkel and   
             Rajeev Sivaram and   
           Dhabaleswar K. Panda   Implementing multidestination worms in
                                  switch-based parallel systems:
                                  architectural alternatives and their
                                  impact . . . . . . . . . . . . . . . . . 50--61
       Guillermo A. Alvarez and   
         Walter A. Burkhard and   
                Flaviu Cristian   Tolerating multiple failures in RAID
                                  architectures with optimal storage and
                                  uniform declustering . . . . . . . . . . 62--72
               Dan Teodosiu and   
                Joel Baxter and   
              Kinshuk Govil and   
                John Chapin and   
           Mendel Rosenblum and   
                  Mark Horowitz   Hardware fault containment in scalable
                                  shared-memory multiprocessors  . . . . . 73--84
          Richard P. Martin and   
             Amin M. Vahdat and   
            David E. Culler and   
             Thomas E. Anderson   Effects of communication latency,
                                  overhead, and bandwidth in a cluster
                                  architecture . . . . . . . . . . . . . . 85--97
        Wolf-Dietrich Weber and   
               Stephen Gold and   
                Pat Helland and   
            Takeshi Shimizu and   
               Thomas Wicki and   
                Winfried Wilcke   The Mercury Interconnect Architecture: a
                                  cost-effective infrastructure for
                                  high-performance servers . . . . . . . . 98--107
            Ziyad S. Hakura and   
                    Anoop Gupta   The design and analysis of a cache
                                  architecture for texture mapping . . . . 108--120
          Kenneth M. Wilson and   
                 Kunle Olukotun   Designing high bandwidth on-chip caches  121--132
            Keith I. Farkas and   
                  Paul Chow and   
           Norman P. Jouppi and   
                Zvonko Vranesic   Memory-system design considerations for
                                  dynamically-scheduled processors . . . . 133--143
  Parthasarathy Ranganathan and   
               Vijay S. Pai and   
          Hazim Abdel-Shafi and   
                 Sarita V. Adve   The interaction of software prefetching
                                  with ILP processors in shared-memory
                                  systems  . . . . . . . . . . . . . . . . 144--156
    Leonidas Kontothanassis and   
                 Galen Hunt and   
               Robert Stets and   
       Nikolaos Hardavellas and   
            Michał Cierniak and   
   Srinivasan Parthasarathy and   
          Wagner Meira, Jr. and   
          Sandhya Dwarkadas and   
                  Michael Scott   VM-based shared memory on low-latency,
                                  remote-memory-access networks  . . . . . 157--169
                 Alain Kägi and   
                Doug Burger and   
               James R. Goodman   Efficient synchronization: let them eat
                                  QOLB . . . . . . . . . . . . . . . . . . 170--180
           Andreas Moshovos and   
            Scott E. Breach and   
           T. N. Vijaykumar and   
               Gurindar S. Sohi   Dynamic speculation and synchronization
                                  of data dependences  . . . . . . . . . . 181--193
             Avinash Sodani and   
               Gurindar S. Sohi   Dynamic instruction reuse  . . . . . . . 194--205
        Subbarao Palacharla and   
           Norman P. Jouppi and   
                    J. E. Smith   Complexity-effective superscalar
                                  processors . . . . . . . . . . . . . . . 206--218
           Maged M. Michael and   
           Ashwini K. Nanda and   
              Beng-Hong Lim and   
               Michael L. Scott   Coherence controller architectures for
                                  SMP-based CC-NUMA multiprocessors  . . . 219--228
              Babak Falsafi and   
                  David A. Wood   Reactive NUMA: a design for unifying
                                  S-COMA and CC-NUMA . . . . . . . . . . . 229--240
               James Laudon and   
                 Daniel Lenoski   The SGI Origin: a ccNUMA highly scalable
                                  server . . . . . . . . . . . . . . . . . 241--251
                Doug Joseph and   
                  Dirk Grunwald   Prefetching using Markov predictors  . . 252--263
            Vatsa Santhanam and   
          Edward H. Gornish and   
                  Wei-Chung Hsu   Data prefetching on the HP PA-8000 . . . 264--273
              Po-Yung Chang and   
                   Eric Hao and   
                   Yale N. Patt   Target prediction for indirect jumps . . 274--283
              Eric Sprangle and   
         Robert S. Chappell and   
                Mitch Alsup and   
                   Yale N. Patt   The agree predictor: a mechanism for
                                  reducing negative branch history
                                  interference . . . . . . . . . . . . . . 284--291
             Pierre Michaud and   
               André Seznec and   
                  Richard Uhlig   Trading conflict and capacity aliasing
                                  in conditional branch predictors . . . . 292--303
                  Joel Emer and   
                   Nikolas Gloy   A language for describing predictors and
                                  its application to automatic synthesis   304--314
          Teresa L. Johnson and   
                 Wen-mei W. Hwu   Run-time adaptive cache hierarchy
                                  management via reference analysis  . . . 315--326
              Richard Fromm and   
       Stylianos Perissakis and   
              Neal Cardwell and   
     Christoforos Kozyrakis and   
             Bruce McGaughy and   
            David Patterson and   
               Tom Anderson and   
               Katherine Yelick   The energy efficiency of IRAM
                                  architectures  . . . . . . . . . . . . . 327--337
                Doug Burger and   
           Stefanos Kaxiras and   
               James R. Goodman   DataScalar architectures . . . . . . . . 338--349

ACM SIGARCH Computer Architecture News
Volume 25, Number 3, June, 1997

             Maurice Wilkes and   
                  Andrew Hopper   The collapsed LAN: a solution to a
                                  bandwidth problem? . . . . . . . . . . . 1--5
              Tommi Jokinen and   
                  Chia-Jiu Wang   Cache design with path balancing table,
                                  skewing and indirect tags  . . . . . . . 6--12
                Doug Burger and   
                 Todd M. Austin   The SimpleScalar tool set, version 2.0   13--25
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 26--27

ACM SIGARCH Computer Architecture News
Volume 25, Number 4, September, 1997

           Rodney Van Meter and   
                  Greg Finn and   
                 Steve Hotz and   
                      Dave Dyer   Response to the collapsed LAN  . . . . . 1--2
                   Weiwu Hu and   
                      Peisu Xia   Out-of-order execution in sequentially
                                  consistent shared-memory systems . . . . 3--10
                 Humayun Khalid   A novel trace sampling technique . . . . 11--16
                 Humayun Khalid   Performance of the KORA-2 cache
                                  replacement scheme . . . . . . . . . . . 17--21
                D. N. Jutla and   
                     P. Bodorik   Improving applications performance: a
                                  memory model and cache architecture  . . 22--29
                      B. Ulmann   NICE: an elegant and powerful 32-bit
                                  architecture . . . . . . . . . . . . . . 30--35
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 36--41

ACM SIGARCH Computer Architecture News
Volume 25, Number 5, December, 1997

               Vijay S. Pai and   
  Parthasarathy Ranganathan and   
                 Sarita V. Adve   RSIM: Rice simulator for ILP
                                  multiprocessors  . . . . . . . . . . . . 1--1
                Weisong Shi and   
                   Weiwu Hu and   
                       Ming Zhu   An innovative implementation for
                                  directory-based cache coherence in
                                  shared memory multiprocessors  . . . . . 2--9
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 10--14

ACM SIGARCH Computer Architecture News
Volume 26, Number 1, March, 1998

                      B. Ulmann   Instruction looping, an extension to
                                  conditional execution  . . . . . . . . . 3--4
              Günter Haring and   
        Christoph Lindemann and   
                  Martin Reiser   International workshop performance
                                  evaluation --- origins and directions    5--6
                 Wes Munsil and   
                  Chia-Jiu Wang   Reducing stack usage in Java bytecode
                                  execution  . . . . . . . . . . . . . . . 7--11
                   Mark Thorson   Internet nuggets . . . . . . . . . . . . 12--17

ACM SIGARCH Computer Architecture News
Volume 26, Number 2, May, 1998

                 Mayan Moudgill   Techniques for fast simulation of
                                  associative cache directories  . . . . . 1--8
           Byung-Kwon Chung and   
                  Jih-Kwon Peir   LRU-based column-associative caches  . . 9--17
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 18--22

ACM SIGARCH Computer Architecture News
Volume 26, Number 3, June, 1998

         Luiz André Barroso and   
       Kourosh Gharachorloo and   
                Edouard Bugnion   Memory system characterization of
                                  commercial workloads . . . . . . . . . . 3--14
            Kimberly Keeton and   
         David A. Patterson and   
              Yong Qiang He and   
           Roger C. Raphael and   
                Walter E. Baker   Performance characterization of a Quad
                                  Pentium Pro SMP using OLTP workloads . . 15--26
              Dennis C. Lee and   
         Patrick J. Crowley and   
             Jean-Loup Baer and   
         Thomas E. Anderson and   
               Brian N. Bershad   Execution characteristics of desktop
                                  applications on Windows NT . . . . . . . 27--38
                 Jack L. Lo and   
         Luiz André Barroso and   
            Susan J. Eggers and   
       Kourosh Gharachorloo and   
              Henry M. Levy and   
                Sujay S. Parekh   An analysis of database workload
                                  performance on simultaneous
                                  multithreaded processors . . . . . . . . 39--50
               Marius Evers and   
            Sanjay J. Patel and   
         Robert S. Chappell and   
                   Yale N. Patt   An analysis of correlation and
                                  predictability: what makes two-level
                                  branch predictors work . . . . . . . . . 52--61
           Eitan Federovsky and   
                 Meir Feder and   
                   Shlomo Weiss   Branch prediction based on universal
                                  data compression algorithms  . . . . . . 62--72
         Yiannakis Sazeides and   
                 James E. Smith   Modeling program predictability  . . . . 73--84
                Michael Cox and   
          Narendra Bhandari and   
                 Michael Shantz   Multi-level texture caching for 3D
                                  graphics hardware  . . . . . . . . . . . 86--97
                Hans Eberle and   
                   Erwin Oertli   Switcherland: a QoS communication
                                  architecture for workstation clusters    98--108
       Guillermo A. Alvarez and   
         Walter A. Burkhard and   
        Larry J. Stockmeyer and   
                Flaviu Cristian   Declustered disk array architectures
                                  with optimal and near-optimal
                                  parallelism  . . . . . . . . . . . . . . 109--120
              Dirk Grunwald and   
              Artur Klauser and   
             Srilatha Manne and   
                Andrew Pleszkun   Confidence estimation for speculation
                                  control  . . . . . . . . . . . . . . . . 122--131
             Srilatha Manne and   
              Artur Klauser and   
                  Dirk Grunwald   Pipeline gating: speculation control for
                                  energy reduction . . . . . . . . . . . . 132--141
          George Z. Chrysos and   
                   Joel S. Emer   Memory dependence prediction using store
                                  sets . . . . . . . . . . . . . . . . . . 142--153
                  Toni Juan and   
            Sanji Sanjeevan and   
                Juan J. Navarro   Dynamic history-length fitting: a third
                                  level of adaptivity for branch
                                  prediction . . . . . . . . . . . . . . . 155--166
              Karel Driesen and   
                     Urs Hölzle   Accurate indirect branch prediction  . . 167--178
     Shubhendu S. Mukherjee and   
                   Mark D. Hill   Using prediction to accelerate coherence
                                  protocols  . . . . . . . . . . . . . . . 179--190
                 Mark Oskin and   
          Frederic T. Chong and   
               Timothy Sherwood   Active pages: a computation model for
                                  intelligent memory . . . . . . . . . . . 192--203
               Mark Swanson and   
              Leigh Stoller and   
                    John Carter   Increasing TLB reach using superpages
                                  backed by shadow memory  . . . . . . . . 204--213
               Xiaogang Qiu and   
                  Michel Dubois   Options for dynamic address translation
                                  in COMAs . . . . . . . . . . . . . . . . 214--225
            David I. August and   
          Daniel A. Connors and   
            Scott A. Mahlke and   
               John W. Sias and   
           Kevin M. Crozier and   
            Ben-Chung Cheng and   
           Patrick R. Eaton and   
          Qudus B. Olaniran and   
                 Wen-mei W. Hwu   Integrated predicated and speculative
                                  execution in the IMPACT EPIC
                                  architecture . . . . . . . . . . . . . . 227--237
             Steven Wallace and   
                Brad Calder and   
                Dean M. Tullsen   Threaded multiple path execution . . . . 238--249
              Artur Klauser and   
         Abhijit Paithankar and   
                  Dirk Grunwald   Selective eager execution on the
                                  PolyPath architecture  . . . . . . . . . 250--259
         Sanjay Jeram Patel and   
               Marius Evers and   
                   Yale N. Patt   Improving trace cache effectiveness with
                                  branch promotion and trace packing . . . 262--271
              Freddy Gabbay and   
                  Avi Mendelson   The effect of instruction fetch
                                  bandwidth on value prediction  . . . . . 272--281
              David H. Albonesi   Dynamic IPC/clock rate optimization  . . 282--292
               Yinong Zhang and   
            George B. Adams III   Performance modeling and code
                                  partitioning for the DS architecture . . 293--304
         Stephen W. Keckler and   
           William J. Dally and   
              Daniel Maskit and   
         Nicholas P. Carter and   
               Andrew Chang and   
                    Whay S. Lee   Exploiting fine-grain thread level
                                  parallelism on the MIT multi-ALU
                                  processor  . . . . . . . . . . . . . . . 306--317
          Gheith A. Abandah and   
             Edward S. Davidson   Effects of architectural and
                                  technological advances on the HP/Convex
                                  Exemplar's memory and communication
                                  performance  . . . . . . . . . . . . . . 318--329
       Matthias A. Blumrich and   
          Richard D. Alpert and   
                 Yuqun Chen and   
           Douglas W. Clark and   
     Stefanos N. Damianakis and   
            Cezary Dubnicki and   
           Edward W. Felten and   
               Liviu Iftode and   
                     Kai Li and   
         Margaret Martonosi and   
             Robert A. Shillner   Design choices in the SHRIMP system: an
                                  empirical study  . . . . . . . . . . . . 330--341
Vijayaraghavan Soundararajan and   
              Mark Heinrich and   
               Ben Verghese and   
       Kourosh Gharachorloo and   
                Anoop Gupta and   
                  John Hennessy   Flexible use of memory for
                                  replication/migration in cache-coherent
                                  DSM multiprocessors  . . . . . . . . . . 342--355
              Sanjeev Kumar and   
          Christopher Wilkerson   Exploiting spatial locality in data
                                  caches using spatial footprints  . . . . 357--368
           William L. Lynch and   
            Gary Lauterbach and   
             Joseph I. Chamdani   Low load latency through sum-addressed
                                  memory (SAM) . . . . . . . . . . . . . . 369--379
            Daniel J. Sorin and   
               Vijay S. Pai and   
             Sarita V. Adve and   
             Mary K. Vernon and   
                  David A. Wood   Analytic evaluation of shared-memory
                                  systems with ILP processors  . . . . . . 380--391

ACM SIGARCH Computer Architecture News
Volume 26, Number 4, September, 1998

            Prasad N. Golla and   
                    Eric C. Lin   A comparison of the effect of branch
                                  prediction on multithreaded and scalar
                                  architectures  . . . . . . . . . . . . . 3--11
                   Mark Thorson   Internet nuggets . . . . . . . . . . . . 12--16

ACM SIGARCH Computer Architecture News
Volume 26, Number 5, December, 1998

               Philip Machanick   Streaming vs. latency in information
                                  mass-transit . . . . . . . . . . . . . . 4--6
             Jean-Louis Lafitte   A generalized mapping device to help
                                  memory latency . . . . . . . . . . . . . 7--13
              Farooq Ashraf and   
        Mostafa Abd-El-Barr and   
                Khalid Al-Tawil   Introduction to routing in multicomputer
                                  networks . . . . . . . . . . . . . . . . 14--21
                    Dick Wilmot   Data threaded microarchitecture  . . . . 22--32

ACM SIGARCH Computer Architecture News
Volume 27, Number 1, March, 1999

                     C. K. Yuen   Stack and RISC . . . . . . . . . . . . . 3--9
          Sandra Johnson Baylor   Unified scalable shared memory
                                  architectures  . . . . . . . . . . . . . 10--21
             Anthony DeWitt and   
                   Thomas Gross   The potential of thread-level
                                  speculation based on value profiling . . 22--22
          John Kalamatianos and   
                 David R. Kaeli   Improving the accuracy of indirect
                                  branch prediction via branch
                                  classification . . . . . . . . . . . . . 23--26
            Roy Dz-ching Ju and   
      Jean-François Collard and   
                   Karim Oukbir   Probabilistic memory disambiguation and
                                  its application to data speculation  . . 27--30
         Matthew A. Postiff and   
            David A. Greene and   
              Gary S. Tyson and   
                Trevor N. Mudge   The limits of instruction level
                                  parallelism in SPEC95 applications . . . 31--34
             Byung-Sun Yang and   
                 Junpyo Lee and   
                Jinpyo Park and   
              Soo-Mook Moon and   
             Kemal Ebcioğlu and   
                    Erik Altman   Lightweight monitor for Java VM  . . . . 35--38
                   Amit Rao and   
                  Santosh Pande   Storage assignment using expression tree
                                  transformations to generate compact and
                                  efficient DSP code . . . . . . . . . . . 39--42
         Krisztián Flautner and   
              Gary S. Tyson and   
                   Trevor Mudge   A high level simulator integrated with
                                  the Mirv compiler  . . . . . . . . . . . 43--46
                   H. Cassé and   
                  L. Féraud and   
                C. Rochange and   
                     P. Sainrat   Using the abstract interpretation
                                  technique for static pointer analysis    47--50
                 Iris Bahar and   
                Brad Calder and   
                  Dirk Grunwald   A comparison of software code reordering
                                  and victim buffers . . . . . . . . . . . 51--54
                 Steve Carr and   
                  Philip Sweany   Improving software pipelining with
                                  hardware support for self-spatial loads  55--58

ACM SIGARCH Computer Architecture News
Volume 27, Number 2, May, 1999

               Rajeev Barua and   
                 Walter Lee and   
          Saman Amarasinghe and   
                  Anant Agarwal   Maps: a compiler-managed memory system
                                  for raw machines . . . . . . . . . . . . 4--15
           Sriram Vajapeyam and   
               P. J. Joseph and   
                   Tulika Mitra   Dynamic vectorization: a mechanism for
                                  exploiting far-flung ILP in ordinary
                                  programs . . . . . . . . . . . . . . . . 16--27
       Seth Copen Goldstein and   
              Herman Schmit and   
                Matthew Moe and   
                Mihai Budiu and   
            Srihari Cadambi and   
             R. Reed Taylor and   
                  Ronald Laufer   PipeRench: a coprocessor for streaming
                                  multimedia acceleration  . . . . . . . . 28--39
                   Adi Yoaz and   
                Mattan Erez and   
                Ronny Ronen and   
                Stephan Jourdan   Speculation techniques for improving
                                  load related instruction scheduling  . . 42--53
           Michael Bekerman and   
            Stephan Jourdan and   
                Ronny Ronen and   
          Gilad Kirshenboim and   
             Lihu Rappoport and   
                   Adi Yoaz and   
                     Uri Weiser   Correlated load-address predictors . . . 54--63
                Brad Calder and   
              Glenn Reinman and   
                Dean M. Tullsen   Selective value prediction . . . . . . . 64--74
               Xiaogang Qiu and   
                  Michel Dubois   Tolerating late memory traps in ILP
                                  processors . . . . . . . . . . . . . . . 76--87
              Chi-Keung Luk and   
                  Todd C. Mowry   Memory forwarding: enabling aggressive
                                  layout optimizations by guaranteeing the
                                  safety of data relocation  . . . . . . . 88--99
               Sangyeun Cho and   
              Pen-Chung Yew and   
                    Gyungho Lee   Decoupling local variable accesses in a
                                  wide-issue superscalar processor . . . . 100--110
                  Amir Roth and   
               Gurindar S. Sohi   Effective jump-pointer prefetching for
                                  linked data structures . . . . . . . . . 111--121
  Parthasarathy Ranganathan and   
                Sarita Adve and   
               Norman P. Jouppi   Performance of image and video
                                  processing with general-purpose
                                  processors and media ISA extensions  . . 124--135
          Matthew C. Merten and   
            Andrew R. Trick and   
      Christopher N. George and   
         John C. Gyllenhaal and   
                 Wen-mei W. Hwu   A hardware-driven profiling scheme for
                                  identifying program hot spots to support
                                  runtime optimization . . . . . . . . . . 136--147
               Xiaowei Shen and   
                     Arvind and   
                  Larry Rudolph   Commit-reconcile & fences (CRF): a new
                                  memory model for architects and compiler
                                  writers  . . . . . . . . . . . . . . . . 150--161
               Chris Gniady and   
              Babak Falsafi and   
               T. N. Vijaykumar   Is SC + ILP = RC?  . . . . . . . . . . . 162--171
                An-Chow Lai and   
                  Babak Falsafi   Memory sharing predictor: the key to a
                                  speculative coherent DSM . . . . . . . . 172--183
         Robert S. Chappell and   
                Jared Stark and   
            Sangwook P. Kim and   
        Steven K. Reinhardt and   
                   Yale N. Patt   Simultaneous subordinate microthreading
                                  (SSMT) . . . . . . . . . . . . . . . . . 186--195
                Bryan Black and   
           Bohuslav Rychlik and   
                 John Paul Shen   The block-based trace cache  . . . . . . 196--207
            David I. August and   
               John W. Sias and   
        Jean-Michel Puiatti and   
            Scott A. Mahlke and   
          Daniel A. Connors and   
           Kevin M. Crozier and   
                 Wen-mei W. Hwu   The program decision logic approach to
                                  predicated execution . . . . . . . . . . 208--219
               Vinodh Cuppu and   
                Bruce Jacob and   
                Brian Davis and   
                   Trevor Mudge   A performance comparison of contemporary
                                  DRAM architectures . . . . . . . . . . . 222--233
              Glenn Reinman and   
                Todd Austin and   
                    Brad Calder   A scalable front-end architecture for
                                  fast instruction delivery  . . . . . . . 234--245
               Seongwoo Kim and   
                 Arun K. Somani   Area efficient architectures for
                                  information integrity in cache memories  246--255
                Tarun Nakra and   
                Rajiv Gupta and   
                 Mary Lou Soffa   Value prediction in VLIW machines  . . . 258--269
            Dean M. Tullsen and   
                   John S. Seng   Storageless value prediction using prior
                                  register values  . . . . . . . . . . . . 270--279
              Angelos Bilas and   
                 Cheng Liao and   
            Jaswinder Pal Singh   Using network interface support to avoid
                                  asynchronous protocol processing in
                                  shared virtual memory systems  . . . . . 282--293
             E. Ender Bilir and   
            Ross M. Dickson and   
                    Ying Hu and   
               Manoj Plakal and   
            Daniel J. Sorin and   
               Mark D. Hill and   
                  David A. Wood   Multicast snooping: a new coherence
                                  method using a multicast address network 294--304
             Dongming Jiang and   
            Jaswinder Pal Singh   Scaling application performance on a
                                  cache-coherent multiprocessor  . . . . . 305--316

ACM SIGARCH Computer Architecture News
Volume 27, Number 3, June, 1999

                      Anonymous   In memoriam---SIGARCH founder: Caxton C.
                                  Foster . . . . . . . . . . . . . . . . . 1--3
             Seung H. Hwang and   
                   Gwan S. Choi   Selective-set-invalidation (SSI) for
                                  soft-error-resilient cache architecture  4--9
                 Peng Cheng and   
                    Hai Jin and   
                Jiangling Zhang   Design of high performance RAID in
                                  real-time system . . . . . . . . . . . . 10--17
                     C. K. Yuen   Architectural support for the cache
                                  based vector computation . . . . . . . . 18--23
                Benjamin Driker   Disbursed control computer architecture  24--31
                 Humayun Khalid   Performance evaluation of multimedia
                                  systems with MPEG-2 bitstreams . . . . . 32--37
                 Humayun Khalid   A methodology for performance evaluation
                                  of systems with large emulation code . . 38--42
                 Humayun Khalid   Tracing multimedia benchmarks with five
                                  degrees of validation  . . . . . . . . . 43--48
                 Humayun Khalid   Performance evaluation of two operating
                                  systems  . . . . . . . . . . . . . . . . 49--52
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 53--60

ACM SIGARCH Computer Architecture News
Volume 27, Number 4, September, 1999

               Philip Machanick   Correction to RAMpage ASPLOS paper . . . 2--5
          H. S. Shahhoseini and   
                  M. Naderi and   
                      S. Nemati   Achieving the best performance on
                                  superscalar processors . . . . . . . . . 6--11
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 12--14

ACM SIGARCH Computer Architecture News
Volume 27, Number 5, December, 1999

               Marc Torrant and   
           Muhammad Shaaban and   
           Roy Czernikowski and   
                        Ken Hsu   A simultaneous multithreading simulator  1--5
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 6--10

ACM SIGARCH Computer Architecture News
Volume 28, Number 1, March, 2000

                    Min Dai and   
        Christine Eisenbeis and   
           Sid-Ahmed-Ali Touati   Load-store optimization for software
                                  pipelining . . . . . . . . . . . . . . . 3--10
            Philippe Clauss and   
                 Benoît Meister   Automatic memory layout transformations
                                  to optimize spatial locality in
                                  parameterized loop nests . . . . . . . . 11--19
           Barbara Kreaseck and   
               Dean Tullsen and   
                    Brad Calder   Limits of task-based parallelism in
                                  irregular applications . . . . . . . . . 20--20
                 Junpyo Lee and   
             Byung-Sun Yang and   
                 Suhyun Kim and   
             Kemal Ebcioğlu and   
                Erik Altman and   
                Seungil Lee and   
               Yoo C. Chung and   
               Heungbok Lee and   
               Je Hyung Lee and   
                  Soo-Mook Moon   Reducing virtual call overheads in a
                                  Java VM just-in-time compiler  . . . . . 21--33
               Chris Sadler and   
        Sandeep K. S. Gupta and   
                   Rohit Bhatia   Applying predication to efficiently
                                  handle runtime class testing . . . . . . 34--42
             Nerina Bermudo and   
                Xavier Vera and   
           Antonio González and   
                    Josep Llosa   Optimizing cache miss equations
                                  polyhedra  . . . . . . . . . . . . . . . 43--52
                   A. Unger and   
               E. Zehendner and   
                    Th. Ungerer   A combined compiler and architecture
                                  technique to control multithreaded
                                  execution of branches and loop
                                  iterations . . . . . . . . . . . . . . . 53--61
                Hakan Aydin and   
                    David Kaeli   Using cache line coloring to perform
                                  aggressive procedure inlining  . . . . . 62--71
             Akhilesh Tyagi and   
                    Gyungho Lee   A compiler optimization paradigm for
                                  dynamic energy management  . . . . . . . 72--76
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 77--78

ACM SIGARCH Computer Architecture News
Volume 28, Number 2, May, 2000

        J. Greggory Steffan and   
     Christopher B. Colohan and   
               Antonia Zhai and   
                  Todd C. Mowry   A scalable approach to thread-level
                                  speculation  . . . . . . . . . . . . . . 1--12
             Marcelo Cintra and   
           José F. Martínez and   
                Josep Torrellas   Architectural support for scalable
                                  speculative parallelization in
                                  shared-memory multiprocessors  . . . . . 13--24
        Steven K. Reinhardt and   
         Shubhendu S. Mukherjee   Transient fault detection via
                                  simultaneous multithreading  . . . . . . 25--36
             Quinn Jacobson and   
                 James E. Smith   Trace preconstruction  . . . . . . . . . 37--46
                Ryan Rakvic and   
                Bryan Black and   
                 John Paul Shen   Completion time multiple branch
                                  prediction for enhancing trace cache
                                  performance  . . . . . . . . . . . . . . 47--58
          Matthew C. Merten and   
            Andrew R. Trick and   
            Erik M. Nystrom and   
           Ronald D. Barnes and   
                 Wen-mei W. Hwu   A hardware mechanism for dynamic
                                  extraction and relayout of program hot
                                  spots  . . . . . . . . . . . . . . . . . 59--70
                 Mark Oskin and   
          Frederic T. Chong and   
                Matthew Farrens   HLS: combining statistical and symbolic
                                  simulation to guide microprocessor
                                  designs  . . . . . . . . . . . . . . . . 71--82
               David Brooks and   
               Vivek Tiwari and   
             Margaret Martonosi   Wattch: a framework for
                                  architectural-level power analysis and
                                  optimizations  . . . . . . . . . . . . . 83--94
           N. Vijaykrishnan and   
                M. Kandemir and   
                M. J. Irwin and   
                  H. S. Kim and   
                          W. Ye   Energy-driven integrated
                                  hardware-software optimizations using
                                  SimplePower  . . . . . . . . . . . . . . 95--106
            Erik G. Hallnor and   
            Steven K. Reinhardt   A fully associative software-managed
                                  cache design . . . . . . . . . . . . . . 107--116
           Ashley Saulsbury and   
           Fredrik Dahlgren and   
                  Per Stenström   Recency-based TLB preloading . . . . . . 117--127
               Scott Rixner and   
           William J. Dally and   
            Ujval J. Kapasi and   
              Peter Mattson and   
                  John D. Owens   Memory access scheduling . . . . . . . . 128--138
                An-Chow Lai and   
                  Babak Falsafi   Selective, accurate, and timely
                                  self-invalidation using last-touch
                                  prediction . . . . . . . . . . . . . . . 139--148
                Norman Margolus   An embedded DRAM architecture for
                                  large-scale spatial-lattice computations 149--160
                    Ken Mai and   
                 Tim Paaske and   
             Nuwan Jayasena and   
                     Ron Ho and   
           William J. Dally and   
                  Mark Horowitz   Smart Memories: a modular reconfigurable
                                  architecture . . . . . . . . . . . . . . 161--171
            Craig B. Zilles and   
               Gurindar S. Sohi   Understanding the backward slices of
                                  performance degrading instructions . . . 172--181
             Kevin M. Lepak and   
               Mikko H. Lipasti   On the value locality of store
                                  instructions . . . . . . . . . . . . . . 182--191
           Zarka Cvetanovic and   
                  R. E. Kessler   Performance analysis of the Alpha
                                  21264-based Compaq ES40 system . . . . . 192--202
           Paolo Faraboschi and   
             Geoffrey Brown and   
           Joseph A. Fisher and   
            Giuseppe Desoli and   
                  Fred Homewood   Lx: a technology platform for
                                  customizable VLIW embedded processing    203--213
  Parthasarathy Ranganathan and   
                Sarita Adve and   
               Norman P. Jouppi   Reconfigurable caches and their
                                  application to media processing  . . . . 214--224
                Zhi Alex Ye and   
           Andreas Moshovos and   
                Scott Hauck and   
            Prithviraj Banerjee   CHIMAERA: a high-performance
                                  architecture with a tightly-coupled
                                  reconfigurable functional unit . . . . . 225--235
              Dana S. Henry and   
        Bradley C. Kuszmaul and   
             Gabriel H. Loh and   
                     Rahul Sami   Circuits for wide-window superscalar
                                  processors . . . . . . . . . . . . . . . 236--247
              Vikas Agarwal and   
           M. S. Hrishikesh and   
         Stephen W. Keckler and   
                    Doug Burger   Clock rate versus IPC: the end of the
                                  road for conventional microarchitectures 248--259
                J. E. Smith and   
                Greg Faanes and   
                  Rabin Sugumar   Vector instruction set support for
                                  conditional operations . . . . . . . . . 260--269
                  Yuan Chou and   
                 John Paul Shen   Instruction path coprocessors  . . . . . 270--281
         Luiz André Barroso and   
       Kourosh Gharachorloo and   
            Robert McNamara and   
           Andreas Nowatzyk and   
                Shaz Qadeer and   
                Barton Sano and   
                Scott Smith and   
               Robert Stets and   
                   Ben Verghese   Piranha: a scalable architecture based
                                  on single-chip multiprocessing . . . . . 282--293
       Ramesh Radhakrishnan and   
            Deependra Talla and   
               Lizy Kurian John   Allowing for ILP in an embedded Java
                                  processor  . . . . . . . . . . . . . . . 294--305
           Michael Bekerman and   
                   Adi Yoaz and   
              Freddy Gabbay and   
            Stephan Jourdan and   
               Maxim Kalaev and   
                    Ronny Ronen   Early load address resolution via
                                  register tracking  . . . . . . . . . . . 306--315
          José-Lorenzo Cruz and   
           Antonio González and   
               Mateo Valero and   
                Nigel P. Topham   Multiple-banked register file
                                  architectures  . . . . . . . . . . . . . 316--325

ACM SIGARCH Computer Architecture News
Volume 28, Number 3, June, 2000

Benjamín Sahelices Fernández and   
   Diego R. Llanos Ferraris and   
      Agustín de Dios Hernández   Exploiting parallelism in a network of
                                  workstations using COMA-BC . . . . . . . 1--8
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 9--13

ACM SIGARCH Computer Architecture News
Volume 28, Number 4, September, 2000

             Jean-Louis Lafitte   Regarding a device to help battering the
                                  RAM wall . . . . . . . . . . . . . . . . 4--10
                   S. Petit and   
                  J. A. Gil and   
              J. Sahuquillo and   
                        A. Pont   LIDE: a simulation environment for
                                  shared virtual memory systems  . . . . . 11--18

ACM SIGARCH Computer Architecture News
Volume 28, Number 5, December, 2000

        Steven W. Schlosser and   
       John Linwood Griffin and   
             David F. Nagle and   
              Gregory R. Ganger   Designing computer systems with
                                  MEMS-based storage . . . . . . . . . . . 1--12
       Kourosh Gharachorloo and   
               Madhu Sharma and   
               Simon Steely and   
              Stephen Van Doren   Architecture and design of AlphaServer
                                  GS320  . . . . . . . . . . . . . . . . . 13--24
          Milo M. K. Martin and   
            Daniel J. Sorin and   
        Anastassia Ailamaki and   
         Alaa R. Alameldeen and   
            Ross M. Dickson and   
              Carl J. Mauer and   
             Kevin E. Moore and   
               Manoj Plakal and   
               Mark D. Hill and   
                  David A. Wood   Timestamp snooping: an approach for
                                  extending SMPs . . . . . . . . . . . . . 25--36
              Ashwini Nanda and   
               Kwok-Ken Mak and   
        Krishnan Sugarvanam and   
          Ramendra K. Sahoo and   
Vijayaraghavan Soundararajan and   
                 T. Basil Smith   MemorIES: a programmable, real-time
                                  hardware emulation tool for
                                  multiprocessor server design . . . . . . 37--48
                Jeff Gibson and   
                Robert Kunz and   
                David Ofelt and   
              Mark Horowitz and   
              John Hennessy and   
                  Mark Heinrich   FLASH vs. (Simulated) FLASH: closing the
                                  simulation loop  . . . . . . . . . . . . 49--58
                  Andy Chou and   
             Benjamin Chelf and   
              Dawson Engler and   
                  Mark Heinrich   Using meta-level compilation to check
                                  FLASH protocol code  . . . . . . . . . . 59--70
      Raoul A. F. Bhoedjang and   
              Kees Verstoep and   
                   Tim Rühl and   
               Henri E. Bal and   
            Rutger F. H. Hofman   Evaluating design alternatives for
                                  reliable communication on high-speed
                                  networks . . . . . . . . . . . . . . . . 71--81
              Peter Mattson and   
           William J. Dally and   
               Scott Rixner and   
            Ujval J. Kapasi and   
                  John D. Owens   Communication scheduling . . . . . . . . 82--92
                 Jason Hill and   
            Robert Szewczyk and   
                   Alec Woo and   
                Seth Hollar and   
               David Culler and   
               Kristofer Pister   System architecture directions for
                                  networked sensors  . . . . . . . . . . . 93--104
            Alvin R. Lebeck and   
                 Xiaobo Fan and   
                  Heng Zeng and   
                    Carla Ellis   Power aware page allocation  . . . . . . 105--116
            Emery D. Berger and   
        Kathryn S. McKinley and   
          Robert D. Blumofe and   
                 Paul R. Wilson   Hoard: a scalable memory allocator for
                                  multithreaded applications . . . . . . . 117--128
         Krisztián Flautner and   
                 Rich Uhlig and   
            Steve Reinhardt and   
                   Trevor Mudge   Thread-level parallelism and interactive
                                  performance of desktop applications  . . 129--138
          Motohiro Kawahito and   
            Hideaki Komatsu and   
                Toshio Nakatani   Effective null pointer check elimination
                                  utilizing hardware trap  . . . . . . . . 139--149
               Youtao Zhang and   
                   Jun Yang and   
                    Rajiv Gupta   Frequent value locality and
                                  value-centric data cache design  . . . . 150--159
                 M. Burrows and   
               U. Erlingson and   
              S-T. A. Leung and   
          M. T. Vandevoorde and   
          C. A. Waldspurger and   
                  K. Walker and   
                    W. E. Weihl   Efficient and flexible value sampling    160--167
                  David Lie and   
      Chandramohan Thekkath and   
              Mark Mitchell and   
            Patrick Lincoln and   
                  Dan Boneh and   
              John Mitchell and   
                  Mark Horowitz   Architectural support for copy and
                                  tamper resistant software  . . . . . . . 168--177
               Jerome Burke and   
              John McDonald and   
                    Todd Austin   Architectural support for fast
                                  symmetric-key cryptography . . . . . . . 178--189
           John Kubiatowicz and   
               David Bindel and   
                   Yan Chen and   
          Steven Czerwinski and   
              Patrick Eaton and   
               Dennis Geels and   
        Ramakrishna Gummadi and   
                  Sean Rhea and   
         Hakim Weatherspoon and   
                Chris Wells and   
                       Ben Zhao   OceanStore: an architecture for
                                  global-scale persistent storage  . . . . 190--201
         Evelyn Duesterwald and   
                   Vasanth Bala   Software profiling for hot path
                                  prediction: less is more . . . . . . . . 202--211
                 Rumi Zahir and   
              Jonathan Ross and   
                Dale Morris and   
                      Drew Hess   OS and compiler considerations in the
                                  design of the IA-64 architecture . . . . 212--221
          Daniel A. Connors and   
          Hillery C. Hunter and   
            Ben-Chung Cheng and   
                 Wen-mei W. Hwu   Hardware support for dynamic activation
                                  of compiler-directed computation reuse   222--233
              Allan Snavely and   
                Dean M. Tullsen   Symbiotic job scheduling for a
                                  simultaneous multithreaded processor . . 234--244
         Joshua A. Redstone and   
            Susan J. Eggers and   
                  Henry M. Levy   An analysis of operating system behavior
                                  on a simultaneous multithreaded
                                  architecture . . . . . . . . . . . . . . 245--256
     Karthik Sundaramoorthy and   
                Zach Purser and   
                 Eric Rotenberg   Slipstream processors: improving both
                                  performance and fault tolerance  . . . . 257--268

ACM SIGARCH Computer Architecture News
Volume 29, Number 1, March, 2001

              Maurice V. Wilkes   The memory gap and the future of high
                                  performance memories . . . . . . . . . . 2--7
               Naraig Manjikian   Multiprocessor enhancements of the
                                  SimpleScalar tool set  . . . . . . . . . 8--15
                     Frank Wang   A modified architecture for high-density
                                  MRAM . . . . . . . . . . . . . . . . . . 16--22
             Erik R. Altman and   
                    David Kaeli   WBT-2000: Workshop on Binary Translation
                                  2000 . . . . . . . . . . . . . . . . . . 23--25
             Amitabh Srivastava   Emerging opportunities for binary tools  26--26
             Harold W. Cain and   
             Kevin M. Lepak and   
               Mikko H. Lipasti   A dynamic binary translation approach to
                                  architectural simulation . . . . . . . . 27--36
            Rolf Hilgendorf and   
                  Wolfram Sauer   Instruction translation for an
                                  experimental S/390 processor . . . . . . 37--42
             Michiel Ronsse and   
              Koen De Bosschere   JiTI: a robust just in time
                                  instrumentation technique  . . . . . . . 43--54
                  David Ung and   
             Cristina Cifuentes   Optimising hot paths in a dynamic binary
                                  translator . . . . . . . . . . . . . . . 55--65
           Michael Gschwind and   
                    Erik Altman   Optimization and precise exceptions in
                                  dynamic compilation  . . . . . . . . . . 66--74
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 75--77

ACM SIGARCH Computer Architecture News
Volume 29, Number 2, May, 2001

               Craig Zilles and   
                  Gurindar Sohi   Execution-based prediction using
                                  speculative slices . . . . . . . . . . . 2--13
         Jamison D. Collins and   
                  Hong Wang and   
            Dean M. Tullsen and   
         Christopher Hughes and   
              Yong-Fong Lee and   
                 Dan Lavery and   
                   John P. Shen   Speculative precomputation: long-range
                                  prefetching of delinquent loads  . . . . 14--25
     Rajeev Balasubramonian and   
          Sandhya Dwarkadas and   
              David H. Albonesi   Dynamically allocating processor
                                  resources between nearby and distant ILP 26--37
                  Chi-Keung Luk   Tolerating memory latency through
                                  software-controlled pre-execution in
                                  simultaneous multithreading processors   40--51
           Murali Annavaram and   
           Jignesh M. Patel and   
             Edward S. Davidson   Data prefetching by dependence graph
                                  precomputation . . . . . . . . . . . . . 52--61
               Vinodh Cuppu and   
                    Bruce Jacob   Concurrency, latency, or system
                                  overhead: which has the largest impact
                                  on uniprocessor DRAM-system performance? 62--71
               Brian Fields and   
                 Shai Rubin and   
                Rastislav Bodík   Focusing processor policies via
                                  critical-path prediction . . . . . . . . 74--85
           Timothy Sherwood and   
                    Brad Calder   Automated design of finite state machine
                                  predictors for customized processors . . 86--97
                 Youfeng Wu and   
             Dong-Yuan Chen and   
                     Jesse Fang   Better exploration of region-level value
                                  locality with integrated computation
                                  reuse and value prediction . . . . . . . 98--108
                    Lisa Wu and   
               Chris Weaver and   
                    Todd Austin   CryptoManiac: a fast flexible
                                  architecture for secure communication    110--119
                Ki Hwan Yum and   
               Eun Jung Kim and   
                   Chita R. Das   QoS provisioning in clusters: an
                                  investigation of Router and NIC design   120--129
     Srikanth T. Srinivasan and   
            Roy Dz-ching Ju and   
            Alvin R. Lebeck and   
                Chris Wilkerson   Locality vs. criticality . . . . . . . . 132--143
                An-Chow Lai and   
                   Cem Fide and   
                  Babak Falsafi   Dead-block prediction & dead-block
                                  correlating prefetchers  . . . . . . . . 144--154
               Alex Ramirez and   
         Luiz André Barroso and   
       Kourosh Gharachorloo and   
                Robert Cohn and   
          Josep Larriba-Pey and   
         P. Geoffrey Lowney and   
                   Mateo Valero   Code layout optimizations for
                                  transaction processing workloads . . . . 155--164
   Michael Thaddeus Niemier and   
                 Peter M. Kogge   Exploring and exploiting wire-level
                                  pipelining in emerging technologies  . . 166--177
       Seth Copen Goldstein and   
                    Mihai Budiu   NanoFabrics: spatial computing using
                                  molecular electronics  . . . . . . . . . 178--191
                  David Lie and   
                  Andy Chou and   
              Dawson Engler and   
                  David L. Dill   A simple method for extracting models
                                  for protocol code  . . . . . . . . . . . 192--203
            Milos Prvulovic and   
       María Jesús Garzarán and   
       Lawrence Rauchwerger and   
                Josep Torrellas   Removing architectural bottlenecks to
                                  the scalability of speculative
                                  parallelization  . . . . . . . . . . . . 204--215
              R. Iris Bahar and   
                 Srilatha Manne   Power and energy reduction via pipeline
                                  balancing  . . . . . . . . . . . . . . . 218--229
          Daniele Folegnani and   
               Antonio González   Energy-effective issue logic . . . . . . 230--239
           Stefanos Kaxiras and   
                 Zhigang Hu and   
             Margaret Martonosi   Cache decay: exploiting generational
                                  behavior to reduce cache leakage power   240--251
      Christopher J. Hughes and   
                Praful Kaul and   
             Sarita V. Adve and   
                 Rohit Jain and   
                Chanik Park and   
             Jayanth Srinivasan   Variability in the execution of
                                  multimedia applications and implications
                                  for architecture . . . . . . . . . . . . 254--265
       S. Subramanya Sastry and   
            Rastislav Bodík and   
                 James E. Smith   Rapid profiling via stratified sampling  278--289

ACM SIGARCH Computer Architecture News
Volume 29, Number 3, June, 2001

                Craig B. Zilles   Benchmark health considered harmful  . . 4--5
           Niki C. Thornock and   
              J. Kelly Flanagan   A national trace collection and
                                  distribution resource  . . . . . . . . . 6--10
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 11--15

ACM SIGARCH Computer Architecture News
Volume 29, Number 4, September, 2001

               Naraig Manjikian   More enhancements of the SimpleScalar
                                  tool set . . . . . . . . . . . . . . . . 5--12
            Jason F. Cantin and   
                   Mark D. Hill   Cache performance for selected SPEC
                                  CPU2000 benchmarks . . . . . . . . . . . 13--18
                   Jinsuo Zhang   The predictability of load address . . . 19--28
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 29--31

ACM SIGARCH Computer Architecture News
Volume 29, Number 5, December, 2001

      M. Watheq El-Kharashi and   
            Fayez Elguibaly and   
                      Kin F. Li   Adapting Tomasulo's algorithm for
                                  bytecode folding based Java processors   1--8
               S. Bartolini and   
                  R. Giorgi and   
                  J. Protic and   
                C. A. Prete and   
                      M. Valero   Parallel architecture and compilation
                                  techniques: selection of workshop
                                  papers, Guest Editors' introduction  . . 9--12
           Andrea Acquaviva and   
                Luca Benini and   
                    Bruno Riccó   Energy characterization of embedded
                                  real-time operating systems  . . . . . . 13--18
          M. Angels Moncusi and   
                Alex Arenas and   
                  Jesus Labarta   Improving energy saving in hard real
                                  time systems via a modified dual
                                  priority scheduling  . . . . . . . . . . 19--24
                Frank Vahid and   
               Rilesh Patel and   
                     Greg Stitt   Propagating constants past software to
                                  hardware peripherals in
                                  fixed-application embedded systems . . . 25--30
               Vishal Aslot and   
               Rudolf Eigenmann   Performance characteristics of the SPEC
                                  OMP2001 benchmarks . . . . . . . . . . . 31--40
               J. Mark Bull and   
                Darragh O'Neill   A microbenchmark suite for OpenMP 2.0    41--48
         D. S. Nikolopoulos and   
                 E. Artiaga and   
                 E. Ayguadé and   
                     J. Labarta   Exploiting memory affinity in OpenMP
                                  through schedule reuse . . . . . . . . . 49--55
               Michael Sung and   
           Ronny Krashinsky and   
                 Krste Asanović   Multithreading decoupled architectures
                                  for complexity-effective general purpose
                                  computing  . . . . . . . . . . . . . . . 56--61
            Deependra Talla and   
                   Lizy K. John   MediaBreeze: a decoupled architecture
                                  for accelerating multimedia applications 62--67
                Tatsuo Nakajima   A middleware component supporting
                                  flexible user interaction for networked
                                  home appliances  . . . . . . . . . . . . 68--75
               David Touzet and   
           Jean-Marc Menaud and   
              Frédéric Weis and   
               Paul Couderc and   
                 Michel Banâtre   SIDE surfer: enriching casual meetings
                                  with spontaneous information gathering   76--83
             Erik R. Altman and   
                 David R. Kaeli   Workshop on Binary Translation 2001  . . 84--85
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 86--90

ACM SIGARCH Computer Architecture News
Volume 30, Number 1, March, 2002

        Rajagopalan Desikan and   
                Doug Burger and   
         Stephen W. Keckler and   
               Llorenc Cruz and   
           Fernando Latorre and   
           Antonio González and   
                   Mateo Valero   Errata on ``Measuring Experimental Error
                                  in Microprocessor Simulation'' . . . . . 2--4
               Fu-Chi Chang and   
                  Chia-Jiu Wang   Architectural tradeoff in implementing
                                  RSA processors . . . . . . . . . . . . . 5--11
                Augustus K. Uht   Disjoint Eager Execution: what it is
                                  / what it is not . . . . . . . . . . . . 12--14
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 15--21

ACM SIGARCH Computer Architecture News
Volume 30, Number 2, May, 2002

               A. Hartstein and   
                Thomas R. Puzak   The optimum pipeline depth for a
                                  microprocessor . . . . . . . . . . . . . 7--13
           M. S. Hrishikesh and   
                Doug Burger and   
           Norman P. Jouppi and   
         Stephen W. Keckler and   
            Keith I. Farkas and   
         Premkishore Shivakumar   The optimal logic depth per pipeline
                                  stage is 6 to 8 FO4 inverter delays  . . 14--24
              Eric Sprangle and   
                   Doug Carmean   Increasing processor performance by
                                  implementing deeper pipelines  . . . . . 25--34
                  Dan Ernst and   
                    Todd Austin   Efficient dynamic scheduling through tag
                                  elimination  . . . . . . . . . . . . . . 37--46
               Brian Fields and   
            Rastislav Bodík and   
                   Mark D. Hill   Slack: maximizing performance under
                                  technological constraints  . . . . . . . 47--58
            Alvin R. Lebeck and   
          Jinson Koppanalil and   
                    Tong Li and   
          Jaidev Patwardhan and   
                 Eric Rotenberg   A large, fast instruction window for
                                  tolerating cache misses  . . . . . . . . 59--70
                Ho-Seop Kim and   
                 James E. Smith   An instruction set and microarchitecture
                                  for instruction level distributed
                                  processing . . . . . . . . . . . . . . . 71--81
           T. N. Vijaykumar and   
             Irith Pomeranz and   
                     Karl Cheng   Transient-fault recovery using
                                  simultaneous multithreading  . . . . . . 87--98
     Shubhendu S. Mukherjee and   
              Michael Kontz and   
            Steven K. Reinhardt   Detailed design and evaluation of
                                  redundant multithreading alternatives    99--110
            Milos Prvulovic and   
                Zheng Zhang and   
                Josep Torrellas   ReVive: cost-effective architectural
                                  support for rollback recovery in
                                  shared-memory multiprocessors  . . . . . 111--122
            Daniel J. Sorin and   
          Milo M. K. Martin and   
               Mark D. Hill and   
                  David A. Wood   SafetyNet: improving the availability of
                                  shared memory multiprocessors with
                                  global checkpoint/recovery . . . . . . . 123--134
               Seongmoo Heo and   
               Kenneth Barr and   
               Mark Hampton and   
                 Krste Asanović   Dynamic fine-grain leakage reduction
                                  using leakage-biased bitlines  . . . . . 137--147
         Krisztián Flautner and   
               Nam Sung Kim and   
               Steve Martin and   
               David Blaauw and   
                   Trevor Mudge   Drowsy caches: simple techniques for
                                  reducing leakage power . . . . . . . . . 148--157
                 Anoop Iyer and   
               Diana Marculescu   Power and performance evaluation of
                                  globally asynchronous locally
                                  synchronous processors . . . . . . . . . 158--168
                Yan Solihin and   
                 Jaejin Lee and   
                Josep Torrellas   Using a user-level memory thread for
                                  correlation prefetching  . . . . . . . . 171--182
            Jarrod A. Lewis and   
                Bryan Black and   
               Mikko H. Lipasti   Avoiding initialization misses to the
                                  heap . . . . . . . . . . . . . . . . . . 183--194
         Gokul B. Kandiraju and   
          Anand Sivasubramaniam   Going the distance for TLB prefetching:
                                  an application-driven study  . . . . . . 195--206
                 Zhigang Hu and   
           Stefanos Kaxiras and   
             Margaret Martonosi   Timekeeping in the memory system:
                                  predicting and optimizing memory
                                  behavior . . . . . . . . . . . . . . . . 209--220
                 Ilhyun Kim and   
               Mikko H. Lipasti   Implementing optimizations at decode
                                  time . . . . . . . . . . . . . . . . . . 221--232
      Ashutosh S. Dhodapkar and   
                 James E. Smith   Managing multi-configuration hardware
                                  via dynamic working set analysis . . . . 233--244
          Philip Buonadonna and   
                   David Culler   Queue pair IP: a hybrid architecture for
                                  system area networks . . . . . . . . . . 247--256
              Yuanyuan Zhou and   
              Angelos Bilas and   
         Suresh Jagannathan and   
            Cezary Dubnicki and   
           James F. Philbin and   
                         Kai Li   Experiences with VI communication for
                                  database storage . . . . . . . . . . . . 257--268
               Alex Pajuelo and   
           Antonio González and   
                   Mateo Valero   Speculative dynamic vectorization  . . . 271--280
               Roger Espasa and   
           Federico Ardanaz and   
                  Joel Emer and   
              Stephen Felix and   
                 Julio Gago and   
              Roger Gramunt and   
            Isaac Hernandez and   
                  Toni Juan and   
               Geoff Lowney and   
            Matthew Mattina and   
                   André Seznec   Tarantula: a vector extension to the
                                  Alpha architecture . . . . . . . . . . . 281--292
               André Seznec and   
              Stephen Felix and   
           Venkata Krishnan and   
             Yiannakis Sazeides   Design tradeoffs for the Alpha EV8
                                  conditional branch predictor . . . . . . 295--306
         Robert S. Chappell and   
              Francis Tseng and   
                   Adi Yoaz and   
                   Yale N. Patt   Difficult-path branch prediction using
                                  subordinate microthreads . . . . . . . . 307--317
           Steven E. Raasch and   
          Nathan L. Binkert and   
            Steven K. Reinhardt   A scalable instruction queue design
                                  using dependence chains  . . . . . . . . 318--329

ACM SIGARCH Computer Architecture News
Volume 30, Number 3, June, 2002

                 Ken Steele and   
             Jason Waterman and   
               Eugene Weinstein   The Oxygen H21 handheld  . . . . . . . . 3--4
                 Diana Keen and   
              Frederic T. Chong   Hardware-software co-design of embedded
                                  sensor-actuator networks . . . . . . . . 5--6
              Masaaki Kondo and   
            Motonobu Fujita and   
               Hiroshi Nakamura   Software-controlled on-chip memory for
                                  high-performance and low-power computing 7--8
          Ramendra K. Sahoo and   
                  Myung Bae and   
                   Jose Moreira   Semi-hierarchical approach for
                                  reliability, availability, and
                                  serviceability of cellular systems . . . 9--10
                    Hans Eberle   Monitoring and diagnosing computer
                                  systems by radio communication . . . . . 11--12
              William Thies and   
          Michal Karczmarek and   
             Michael Gordon and   
                 David Maze and   
                Jeremy Wong and   
             Henry Hoffmann and   
              Matthew Brown and   
              Saman Amarasinghe   A common machine language for grid-based
                                  architectures  . . . . . . . . . . . . . 13--14
                 Frank Wang and   
                  Na Helian and   
                    Farhi Marir   A novel associative memory architecture
                                  for quick matching . . . . . . . . . . . 15--16
                    Mike Parker   A case for user-level interrupts . . . . 17--18
               Martin Burtscher   An improved index function for (D)FCM
                                  predictors . . . . . . . . . . . . . . . 19--24
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 25--26

ACM SIGARCH Computer Architecture News
Volume 30, Number 4, September, 2002

                   I. Gòmez and   
                  L. Piñuel and   
                  M. Prieto and   
                      F. Tirado   Analysis of simulation-adapted SPEC 2000
                                  benchmarks . . . . . . . . . . . . . . . 4--10
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 11--16

ACM SIGARCH Computer Architecture News
Volume 30, Number 5, December, 2002

                 Deborah Estrin   Keynote address: Sensor network
                                  research: emerging challenges for
                                  architecture, systems, and languages . . 1--4
                Ravi Rajwar and   
               James R. Goodman   Transactional lock-free execution of
                                  lock-based programs  . . . . . . . . . . 5--17
           José F. Martínez and   
                Josep Torrellas   Speculative synchronization: applying
                                  thread-level speculation to explicitly
                                  parallel applications  . . . . . . . . . 18--29
             Kevin M. Lepak and   
               Mikko H. Lipasti   Temporally silent stores . . . . . . . . 30--41
           Timothy Sherwood and   
              Erez Perelman and   
               Greg Hamerly and   
                    Brad Calder   Automatically characterizing large scale
                                  program behavior . . . . . . . . . . . . 45--57
             Kazunori Ogata and   
            Hideaki Komatsu and   
                Toshio Nakatani   Bytecode fetch optimization for a Java
                                  interpreter  . . . . . . . . . . . . . . 58--67
                     Tao Li and   
           Lizy Kurian John and   
      Anand Sivasubramaniam and   
           N. Vijaykrishnan and   
                     Juan Rubio   Understanding and improving operating
                                  system effects in control flow
                                  prediction . . . . . . . . . . . . . . . 68--80
               Philip Levis and   
                   David Culler   Maté: a tiny virtual machine for sensor
                                  networks . . . . . . . . . . . . . . . . 85--95
                Philo Juang and   
               Hidekazu Oki and   
                  Yong Wang and   
         Margaret Martonosi and   
              Li Shiuan Peh and   
              Daniel Rubenstein   Energy-efficient computing for wildlife
                                  tracking: design tradeoffs and early
                                  experiences with ZebraNet  . . . . . . . 96--107
             Darko Kirovski and   
             Milenko Drinić and   
              Miodrag Potkonjak   Enabling trusted software integrity  . . 108--120
                  Heng Zeng and   
             Carla S. Ellis and   
            Alvin R. Lebeck and   
                    Amin Vahdat   ECOSystem: managing energy as a first
                                  class operating system resource  . . . . 123--132
               Raksit Ashok and   
             Saurabh Chheda and   
            Csaba Andras Moritz   Cool-Mem: combining statically
                                  speculative memory accessing with
                                  selective address translation for energy
                                  efficiency . . . . . . . . . . . . . . . 133--143
            Ruchira Sasanka and   
      Christopher J. Hughes and   
                 Sarita V. Adve   Joint local and global hardware
                                  adaptations for energy . . . . . . . . . 144--155
               Dongkeun Kim and   
                   Donald Yeung   Design and evaluation of compiler
                                  algorithms for pre-execution . . . . . . 159--170
               Antonia Zhai and   
     Christopher B. Colohan and   
         J. Gregory Steffan and   
                  Todd C. Mowry   Compiler optimization of scalar value
                                  communication between speculative
                                  threads  . . . . . . . . . . . . . . . . 171--183
           Jeffrey Oplinger and   
                  Monica S. Lam   Enhancing software reliability with
                                  speculative threads  . . . . . . . . . . 184--196
              J. Adam Butts and   
                      Guri Sohi   Dynamic dead-instruction detection and
                                  elimination  . . . . . . . . . . . . . . 199--210
               Changkyu Kim and   
                Doug Burger and   
             Stephen W. Keckler   An adaptive, non-uniform cache structure
                                  for wire-delay dominated on-chip caches  211--222
     Shubhendu S. Mukherjee and   
             Federico Silla and   
               Peter Bannon and   
                  Joel Emer and   
                 Steve Lang and   
                     David Webb   A comparative study of arbitration
                                  algorithms for the Alpha 21364 pipelined
                                  router . . . . . . . . . . . . . . . . . 223--234
             Hyong-youb Kim and   
               Vijay S. Pai and   
                   Scott Rixner   Increasing Web server throughput with
                                  network interface data caching . . . . . 239--250
               Eddie Kohler and   
              Robert Morris and   
                    Benjie Chen   Programming language optimizations for
                                  modular router configurations  . . . . . 251--263
          Muthian Sivathanu and   
   Andrea C. Arpaci-Dusseau and   
        Remzi H. Arpaci-Dusseau   Evolving RPC for active storage  . . . . 264--276
             Robert Cooksey and   
            Stephan Jourdan and   
                  Dirk Grunwald   A stateless, content-directed data
                                  prefetching mechanism  . . . . . . . . . 279--290
          Michael I. Gordon and   
              William Thies and   
          Michal Karczmarek and   
                 Jasper Lin and   
                Ali S. Meli and   
             Andrew A. Lamb and   
                Chris Leger and   
                Jeremy Wong and   
             Henry Hoffmann and   
                 David Maze and   
              Saman Amarasinghe   A stream compiler for
                                  communication-exposed architectures  . . 291--303
             Emmett Witchel and   
                 Josh Cates and   
                 Krste Asanović   Mondrian memory protection . . . . . . . 304--316

ACM SIGARCH Computer Architecture News
Volume 31, Number 1, March, 2003

                 Jack B. Dennis   Fresh Breeze: a multiprocessor chip
                                  architecture guided by modular
                                  programming principles . . . . . . . . . 7--15
                  D. Morano and   
                 A. Khalafi and   
                D. R. Kaeli and   
                      A. K. Uht   Realizing high IPC through a scalable
                                  memory-latency tolerant multipath
                                  microarchitecture  . . . . . . . . . . . 16--25
              George Almási and   
             Călin Caşcaval and   
           José G. Castaños and   
              Monty Denneau and   
               Derek Lieber and   
            José E. Moreira and   
           Henry S. Warren, Jr.   Dissecting Cyclops: a detailed analysis
                                  of a multithreaded architecture  . . . . 26--38
              Mohamed M. Zahran   On cache memory hierarchy for
                                  Chip-Multiprocessor  . . . . . . . . . . 39--48
                Gary Gréwal and   
                 Tom Wilson and   
                  Andrew Morton   An EGA approach to the compile-time
                                  assignment of data to multiple memories
                                  in digital-signal processors . . . . . . 49--59
            Ulrich Ramacher and   
                  Nico Brüs and   
            Ulrich Hachmann and   
              Jens Harnisch and   
              Wolfgang Raab and   
                   Axel Techmer   100 GOPS vision processor for automotive
                                  applications . . . . . . . . . . . . . . 60--68
         Nikos P. Pitsianis and   
             Gerald G. Pechanek   Indirect VLIW memory allocation for the
                                  ManArray multiprocessor DSP  . . . . . . 69--74
            Naohiko Shimizu and   
                   Ken Takatori   A transparent Linux super page kernel
                                  for Alpha, Sparc64 and IA32: reducing
                                  TLB misses of applications . . . . . . . 75--84
            Alessio Bechini and   
       Pierfrancesco Foglia and   
           Cosimo Antonio Prete   Fine-grain design space exploration for
                                  a cartographic SoC multiprocessor  . . . 85--92
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 93--96

ACM SIGARCH Computer Architecture News
Volume 31, Number 2, May, 2003

              Kevin Skadron and   
             Mircea R. Stan and   
                  Wei Huang and   
         Sivakumar Velusamy and   
   Karthik Sankaranarayanan and   
                   David Tarjan   Temperature-aware microarchitecture  . . 2--13
          Grigorios Magklis and   
           Michael L. Scott and   
              Greg Semeraro and   
          David H. Albonesi and   
                 Steven Dropsho   Profile-based dynamic voltage and
                                  frequency scaling for a multiple clock
                                  domain microprocessor  . . . . . . . . . 14--27
                 Ilhyun Kim and   
               Mikko H. Lipasti   Half-price architecture  . . . . . . . . 28--38
                    Il Park and   
              Babak Falsafi and   
               T. N. Vijaykumar   Implicitly-multithreaded processors  . . 39--51
                  Daniel Citron   MisSPECulation: partial and misleading
                                  use of SPEC CPU2000 in computer
                                  architecture conferences . . . . . . . . 52--61
           Jessica H. Tseng and   
                 Krste Asanović   Banked multiported register files for
                                  high-frequency superscalar
                                  microprocessors  . . . . . . . . . . . . 62--71
          Michael D. Powell and   
               T. N. Vijaykumar   Pipeline damping: a microarchitectural
                                  technique to reduce inductive noise in
                                  supply voltage . . . . . . . . . . . . . 72--83
       Roland E. Wunderlich and   
          Thomas F. Wenisch and   
              Babak Falsafi and   
                   James C. Hoe   SMARTS: accelerating microarchitecture
                                  simulation via rigorous statistical
                                  sampling . . . . . . . . . . . . . . . . 84--97
              Mohamed Gomaa and   
            Chad Scarbrough and   
           T. N. Vijaykumar and   
                 Irith Pomeranz   Transient-fault recovery for chip
                                  multiprocessors  . . . . . . . . . . . . 98--109
            Milos Prvulovic and   
                Josep Torrellas   ReEnact: using thread-level speculation
                                  mechanisms to debug data races in
                                  multithreaded codes  . . . . . . . . . . 110--121
                     Min Xu and   
            Rastislav Bodik and   
                   Mark D. Hill   A "flight data recorder" for enabling
                                  full-system multiprocessor deterministic
                                  replay . . . . . . . . . . . . . . . . . 122--135
             Chuanjun Zhang and   
                Frank Vahid and   
                   Walid Najjar   A highly configurable cache architecture
                                  for embedded systems . . . . . . . . . . 136--146
       Alper Buyuktosunoğlu and   
            Tejas Karkhanis and   
          David H. Albonesi and   
                    Pradip Bose   Energy efficient co-adaptive instruction
                                  fetch and issue  . . . . . . . . . . . . 147--156
           Michael C. Huang and   
                 Jose Renau and   
                Josep Torrellas   Positional adaptation of processors:
                                  application to energy reduction  . . . . 157--168
        Sudhanva Gurumurthi and   
      Anand Sivasubramaniam and   
            Mahmut Kandemir and   
                Hubertus Franke   DRPM: dynamic speed control for power
                                  management in server class disks . . . . 169--181
          Milo M. K. Martin and   
               Mark D. Hill and   
                  David A. Wood   Token coherence: decoupling performance
                                  and correctness  . . . . . . . . . . . . 182--193
                Arjun Singh and   
           William J. Dally and   
              Amit K. Gupta and   
                   Brian Towles   GOAL: a load-balanced adaptive routing
                                  algorithm for torus networks . . . . . . 194--205
          Milo M. K. Martin and   
            Pacia J. Harper and   
            Daniel J. Sorin and   
               Mark D. Hill and   
                  David A. Wood   Using destination-set prediction to
                                  improve the latency/bandwidth tradeoff
                                  in shared-memory multiprocessors . . . . 206--217
               Zarka Cvetanovic   Performance analysis of the Alpha
                                  21364-based HP GS1280 multiprocessor . . 218--229
         Paramjit S. Oberoi and   
               Gurindar S. Sohi   Parallelism in the front-end . . . . . . 230--240
               André Seznec and   
               Antony Fraboulet   Effective ahead pipelining of
                                  instruction block address generation . . 241--252
                  Dan Ernst and   
               Andrew Hamel and   
                    Todd Austin   Cyclone: a broadcast-free dynamic
                                  instruction scheduler with selective
                                  replay . . . . . . . . . . . . . . . . . 253--263
              Ravi Bhargava and   
                   Lizy K. John   Improving dynamic cluster assignment for
                                  clustered trace cache processors . . . . 264--274
     Rajeev Balasubramonian and   
          Sandhya Dwarkadas and   
              David H. Albonesi   Dynamically managing the
                                  communication-parallelism trade-off in
                                  future clustered processors  . . . . . . 275--287
           Timothy Sherwood and   
            George Varghese and   
                    Brad Calder   A pipelined memory architecture for high
                                  throughput network processors  . . . . . 288--299
             Jahangir Hasan and   
             Satish Chandra and   
               T. N. Vijaykumar   Efficient use of memory bandwidth to
                                  improve network processor throughput . . 300--313
               Renju Thomas and   
             Manoj Franklin and   
            Chris Wilkerson and   
                    Jared Stark   Improving branch prediction by dynamic
                                  dataflow-based identification of
                                  correlated branches from a large global
                                  history  . . . . . . . . . . . . . . . . 314--323
               Huiyang Zhou and   
              Jill Flanagan and   
                Thomas M. Conte   Detecting global stride locality in
                                  value streams  . . . . . . . . . . . . . 324--335
           Timothy Sherwood and   
              Suleyman Sair and   
                    Brad Calder   Phase tracking and prediction  . . . . . 336--349
       Aravindh Anantaraman and   
                 Kiran Seth and   
             Kaustubh Patil and   
             Eric Rotenberg and   
                  Frank Mueller   Virtual simple architecture (VISA):
                                  exceeding the complexity limit in safe
                                  real-time systems  . . . . . . . . . . . 350--361
            Marc L. Corliss and   
       E. Christopher Lewis and   
                      Amir Roth   DISE: a programmable macro engine for
                                  customizing applications . . . . . . . . 362--373
                 Mark Oskin and   
          Frederic T. Chong and   
            Isaac L. Chuang and   
               John Kubiatowicz   Building quantum wires: the long and the
                                  short of it  . . . . . . . . . . . . . . 374--387
               Zhenlin Wang and   
                Doug Burger and   
        Kathryn S. McKinley and   
        Steven K. Reinhardt and   
               Charles C. Weems   Guided region prefetching: a cooperative
                                  hardware/software approach . . . . . . . 388--398
         Christos Kozyrakis and   
                David Patterson   Overcoming the limitations of
                                  conventional vector processors . . . . . 399--409
                 Jinwoo Suh and   
                Eun-Gyu Kim and   
           Stephen P. Crago and   
         Lakshmi Srinivasan and   
              Matthew C. French   A performance analysis of PIM, stream
                                  processing, and tiled processing on
                                  memory-intensive signal processing
                                  kernels  . . . . . . . . . . . . . . . . 410--421
  Karthikeyan Sankaralingam and   
         Ramadass Nagarajan and   
                Haiming Liu and   
               Changkyu Kim and   
                Jaehyuk Huh and   
                Doug Burger and   
         Stephen W. Keckler and   
               Charles R. Moore   Exploiting ILP, TLP, and DLP with the
                                  polymorphous TRIPS architecture  . . . . 422--433
            Michael K. Chen and   
                 Kunle Olukotun   The Jrpm system for dynamically
                                  parallelizing Java programs  . . . . . . 434--446

ACM SIGARCH Computer Architecture News
Volume 31, Number 3, June, 2003

                Anthony S. Fong   A computer architecture with access
                                  control and cache option tags on
                                  individual instruction operands  . . . . 1--5
               Edwin J. Tan and   
            Wendi B. Heinzelman   DSP architectures: past, present and
                                  futures  . . . . . . . . . . . . . . . . 6--19
           Lucian N. Vintan and   
               Marius Sbera and   
               Ioan Z. Mihu and   
                  Adrian Florea   An alternative to branch prediction:
                                  pre-computed branches  . . . . . . . . . 20--29
              Mark Heinrich and   
               Mainak Chaudhuri   Ocean warning: avoid drowning  . . . . . 30--32
             Jean-Louis Lafitte   Qualitatively matching computer
                                  architecture with Turing machine . . . . 33--41
          Takenori Koushiro and   
             Toshinori Sato and   
                 Itsujiro Arita   A trace-level value predictor for
                                  Contrail processors  . . . . . . . . . . 42--47
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 48--54

ACM SIGARCH Computer Architecture News
Volume 31, Number 4, September, 2003

                  Mikkel Thorup   Combinatorial power in multimedia
                                  processors . . . . . . . . . . . . . . . 5--11
             Gary K. W. Hau and   
               Anthony Fong and   
                    Mok Pak Lun   Support of Java API for the jHISC system 12--17
                Mok Pak Lun and   
                 Richard Li and   
                   Anthony Fong   Method manipulation in an
                                  object-oriented processor  . . . . . . . 18--25
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 26--32

ACM SIGARCH Computer Architecture News
Volume 31, Number 5, December, 2003

        Kristopher C. Breen and   
              Duncan G. Elliott   Aliasing and anti-aliasing in branch
                                  history table prediction . . . . . . . . 1--4
              Ryan W. S. Yu and   
             Gary K. W. Hau and   
                Anthony S. Fong   Test bench for software development of
                                  object-oriented processor  . . . . . . . 5--9
                Mok Pak Lun and   
               Anthony Fong and   
                 Gary K. W. Hau   Object-oriented processor requirements
                                  with instruction analysis of Java
                                  programs . . . . . . . . . . . . . . . . 10--15
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 16--21

ACM SIGARCH Computer Architecture News
Volume 32, Number 1, March, 2004

               Lizy Kurian John   More on finding a single number to
                                  indicate overall performance of a
                                  benchmark suite  . . . . . . . . . . . . 3--8
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 9--13

ACM SIGARCH Computer Architecture News
Volume 32, Number 2, March, 2004

     Michael Bedford Taylor and   
                 Walter Lee and   
               Jason Miller and   
            David Wentzlaff and   
                  Ian Bratt and   
              Ben Greenwald and   
             Henry Hoffmann and   
               Paul Johnson and   
                  Jason Kim and   
                James Psota and   
               Arvind Saraf and   
            Nathan Shnidman and   
            Volker Strumpen and   
                 Matt Frank and   
          Saman Amarasinghe and   
                  Anant Agarwal   Evaluation of the Raw Microprocessor: An
                                  Exposed-Wire-Delay Architecture for ILP
                                  and Streams  . . . . . . . . . . . . . . 2--2
                      Anonymous   General Co-Chair's Message . . . . . . . 9--9
                      Anonymous   Program Chair's Message  . . . . . . . . 10--10
                      Anonymous   Committees . . . . . . . . . . . . . . . 11--11
                      Anonymous   Reviewers  . . . . . . . . . . . . . . . 13--13
                Jung Ho Ahn and   
           William J. Dally and   
            Brucek Khailany and   
            Ujval J. Kapasi and   
                   Abhishek Das   Evaluating the Imagine Stream
                                  Architecture . . . . . . . . . . . . . . 14--14
               John W. Sias and   
              Sain-zee Ueng and   
              Geoff A. Kent and   
             Ian M. Steiner and   
            Erik M. Nystrom and   
                 Wen-mei W. Hwu   Field-testing IMPACT EPIC research
                                  results in Itanium 2 . . . . . . . . . . 26--26
           T. N. Vijaykumar and   
                 Zeshan Chishti   Wire Delay is Not a Problem for SMT (In
                                  the Near Future) . . . . . . . . . . . . 40--40
           Ronny Krashinsky and   
         Christopher Batten and   
               Mark Hampton and   
              Steve Gerding and   
              Brian Pharris and   
               Jared Casper and   
                 Krste Asanovic   The Vector-Thread Architecture . . . . . 52--52
               Rakesh Kumar and   
            Dean M. Tullsen and   
  Parthasarathy Ranganathan and   
           Norman P. Jouppi and   
                Keith I. Farkas   Single-ISA Heterogeneous Multi-Core
                                  Architectures for Multithreaded Workload
                                  Performance  . . . . . . . . . . . . . . 64--64
                  Yuan Chou and   
                 Brian Fahs and   
                Santosh Abraham   Microarchitecture Optimizations for
                                  Exploiting Memory-Level Parallelism  . . 76--76
             Harold W. Cain and   
               Mikko H. Lipasti   Memory Ordering: a Value-Based Approach  90--90
              Lance Hammond and   
                 Vicky Wong and   
                  Mike Chen and   
         Brian D. Carlstrom and   
              John D. Davis and   
              Ben Hertzberg and   
          Manohar K. Prabhu and   
              Honggo Wijaya and   
         Christos Kozyrakis and   
                 Kunle Olukotun   Transactional Memory Coherence and
                                  Consistency  . . . . . . . . . . . . . . 102--102
          Sudheendra Hangal and   
               Durgam Vahia and   
          Chaiyasit Manovit and   
             Juin-Yeu Joseph Lu   TSOtool: a Program for Verifying Memory
                                  Systems Using the Memory Consistency
                                  Model  . . . . . . . . . . . . . . . . . 114--114
           Mainak Chaudhuri and   
                  Mark Heinrich   SMTp: An Architecture for
                                  Next-generation Scalable Multi-threading 124--124
      Christopher J. Hughes and   
                 Sarita V. Adve   A Formal Approach to Frequent Energy
                                  Adaptations for Multimedia Applications  138--138
                John Oliver and   
            Ravishankar Rao and   
               Paul Sultana and   
          Jedidiah Crandall and   
          Erik Czernikowski and   
         Leslie W. Jones IV and   
             Diana Franklin and   
           Venkatesh Akella and   
              Frederic T. Chong   Synchroscalar: a Multiple Clock Domain,
                                  Power-Aware, Tile-Based Embedded
                                  Processor  . . . . . . . . . . . . . . . 150--150
                Roni Rosner and   
                 Yoav Almog and   
               Micha Moffie and   
           Naftali Schwartz and   
                  Avi Mendelson   Power Awareness through Selective
                                  Dynamically Optimized Traces . . . . . . 162--162
 Lakshmi N. Bairavasundaram and   
          Muthian Sivathanu and   
   Andrea C. Arpaci-Dusseau and   
        Remzi H. Arpaci-Dusseau   X-RAY: a Non-Invasive Exclusive Caching
                                  Mechanism for RAIDs  . . . . . . . . . . 176--176
             Robert Mullins and   
                Andrew West and   
                    Simon Moore   Low-Latency Virtual-Channel Routers for
                                  On-Chip Networks . . . . . . . . . . . . 188--188
                  V. Puente and   
             J. A. Gregorio and   
                 F. Vallejo and   
                     R. Beivide   Immunet: a Cheap and Robust
                                  Fault-Tolerant Packet Routing Mechanism  198--198
         Alaa R. Alameldeen and   
                  David A. Wood   Adaptive Cache Compression for
                                  High-Performance Processors  . . . . . . 212--212
                   Pin Zhou and   
                   Feng Qin and   
                    Wei Liu and   
              Yuanyuan Zhou and   
                Josep Torrellas   iWatcher: Efficient Architectural
                                  Support for Software Debugging . . . . . 224--224
                 Sami Yehia and   
                  Olivier Temam   From Sequences of Dependent Instructions
                                  to Functions: An Approach for Improving
                                  Performance without ILP or Speculation   238--238
               Ayose Falcon and   
                Jared Stark and   
               Alex Ramirez and   
                 Konrad Lai and   
                   Mateo Valero   Prophet/Critic Hybrid Branch Prediction  250--250
         Christopher Weaver and   
                  Joel Emer and   
     Shubhendu S. Mukherjee and   
            Steven K. Reinhardt   Techniques to Reduce the Soft Error Rate
                                  of a High-Performance Microprocessor . . 264--264
         Jayanth Srinivasan and   
             Sarita V. Adve and   
                Pradip Bose and   
                 Jude A. Rivers   The Case for Lifetime Reliability-Aware
                                  Microprocessors  . . . . . . . . . . . . 276--276
          Michael D. Powell and   
               T. N. Vijaykumar   Exploiting Resonant Behavior to Reduce
                                  Inductive Noise  . . . . . . . . . . . . 288--288
              J. Adam Butts and   
               Gurindar S. Sohi   Use-Based Register Caching with
                                  Decoupled Indexing . . . . . . . . . . . 302--302
             Rubén González and   
             Adrian Cristal and   
              Daniel Ortega and   
       Alexander Veidenbaum and   
                   Mateo Valero   A Content Aware Integer Register File
                                  Organization . . . . . . . . . . . . . . 314--314
           Mikko H. Lipasti and   
            Brian R. Mestan and   
                   Erika Gunadi   Physical Register Inlining . . . . . . . 325--325
         Tejas S. Karkhanis and   
                 James E. Smith   A First-Order Superscalar Processor
                                  Model  . . . . . . . . . . . . . . . . . 338--338
            Lieven Eeckhout and   
         Robert H. Bell Jr. and   
           Bastiaan Stougie and   
          Koen De Bosschere and   
                   Lizy K. John   Control Flow Modeling in Statistical
                                  Simulation for Accurate and Efficient
                                  Processor Design Studies . . . . . . . . 350--350
               Bharath Iyer and   
       Sadagopan Srinivasan and   
                    Bruce Jacob   Extended Split-Issue: Enabling
                                  Flexibility in the Hardware
                                  Implementation of NUAL VLIW DSPs . . . . 364--364
         Angshuman Parashar and   
        Sudhanva Gurumurthi and   
          Anand Sivasubramaniam   A Complexity-Effective Approach to ALU
                                  Bandwidth Enhancement for
                                  Instruction-Level Temporal Redundancy    376--376
                      Anonymous   Author Index . . . . . . . . . . . . . . 387--387

ACM SIGARCH Computer Architecture News
Volume 32, Number 3, June, 2004

             Adrián Cristal and   
           José F. Martínez and   
                Josep Llosa and   
                   Mateo Valero   A case for resource-conscious
                                  out-of-order processors: towards
                                  kilo-instruction in-flight processors    3--10
               Partha Kundu and   
           Murali Annavaram and   
                 Trung Diep and   
                      John Shen   A case for shared instruction cache on
                                  chip multiprocessors running OLTP  . . . 11--18
           N. Venkateswaran and   
  Waran Research Foundation and   
            Aditya Krishnan and   
          S. Niranjan Kumar and   
         Arrvindh Shriraman and   
             Srinivas Sridharan   Memory in processor: a novel design
                                  paradigm for supercomputing
                                  architectures  . . . . . . . . . . . . . 19--26
                I. Branovic and   
                  R. Giorgi and   
                  E. Martinelli   A workload characterization of elliptic
                                  curve cryptography methods in embedded
                                  environments . . . . . . . . . . . . . . 27--34
                K. Brifault and   
                  H. P. Charles   Data cache management on EPIC
                                  architecture: optimizing memory access
                                  for image processing . . . . . . . . . . 35--42
            Naohiko Shimizu and   
                     Chiaki Kon   Java object look aside buffer for
                                  embedded applications  . . . . . . . . . 43--49
           Akihito Sakanaka and   
           Seiichirou Fujii and   
                 Toshinori Sato   A leakage-energy-reduction technique for
                                  highly-associative caches in embedded
                                  systems  . . . . . . . . . . . . . . . . 50--54
                    S. Moch and   
               M. Bereković and   
             H. J. Stolberg and   
                  L. Friebe and   
          M. B. Kulaczewski and   
               A. Dehnhardt and   
                      P. Pirsch   HIBRID-SOC: a multi-core architecture
                                  for image and video applications . . . . 55--61
           Mladen Berekovic and   
                 Sören Moch and   
                   Peter Pirsch   A scalable, clustered SMT processor for
                                  digital signal processing  . . . . . . . 62--69
               S. Bartolini and   
                    C. A. Prete   A proposal for input-sensitivity
                                  analysis of profile-driven optimizations
                                  on embedded applications . . . . . . . . 70--77
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 78--83

ACM SIGARCH Computer Architecture News
Volume 32, Number 4, September, 2004

                 John R. Mashey   War of the benchmark means: time for a
                                  truce  . . . . . . . . . . . . . . . . . 1--14
             Jean-Louis Lafitte   40 years later ... a new engine to
                                  handle an operating system
                                  infrastructure . . . . . . . . . . . . . 15--22
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 23--41

ACM SIGARCH Computer Architecture News
Volume 32, Number 5, December, 2004

              Lance Hammond and   
         Brian D. Carlstrom and   
                 Vicky Wong and   
              Ben Hertzberg and   
                  Mike Chen and   
         Christos Kozyrakis and   
                 Kunle Olukotun   Programming with transactional coherence
                                  and consistency (TCC)  . . . . . . . . . 1--13
                Mihai Budiu and   
       Girish Venkataramani and   
            Tiberiu Chelcea and   
           Seth Copen Goldstein   Spatial computation  . . . . . . . . . . 14--26
         Virantha Ekanayake and   
           Clinton Kelly IV and   
                  Rajit Manohar   An ultra low-power processor for sensor
                                  networks . . . . . . . . . . . . . . . . 27--36
        Christopher R. Lumb and   
                Richard Golding   D-SPTF: decentralized request
                                  distribution in brick-based storage
                                  systems  . . . . . . . . . . . . . . . . 37--47
              Yasushi Saito and   
              Svend Frølund and   
            Alistair Veitch and   
              Arif Merchant and   
                   Susan Spence   FAB: building distributed enterprise
                                  disk arrays from commodity components    48--58
          Timothy E. Denehy and   
                  John Bent and   
     Florentina I. Popovici and   
   Andrea C. Arpaci-Dusseau and   
        Remzi H. Arpaci-Dusseau   Deconstructing storage arrays  . . . . . 59--71
            Xiaotong Zhuang and   
                  Tao Zhang and   
                  Santosh Pande   HIDE: an infrastructure for efficiently
                                  protecting information leakage on the
                                  address bus  . . . . . . . . . . . . . . 72--84
              G. Edward Suh and   
                 Jae W. Lee and   
                David Zhang and   
               Srinivas Devadas   Secure program execution via dynamic
                                  information flow tracking  . . . . . . . 85--96
                Jaehyuk Huh and   
              Jichuan Chang and   
                Doug Burger and   
               Gurindar S. Sohi   Coherence decoupling: making use of
                                  incoherence  . . . . . . . . . . . . . . 97--106
     Srikanth T. Srinivasan and   
                Ravi Rajwar and   
             Haitham Akkary and   
                Amit Gandhi and   
                     Mike Upton   Continual flow pipelines . . . . . . . . 107--119
        Rajagopalan Desikan and   
        Simha Sethumadhavan and   
                Doug Burger and   
             Stephen W. Keckler   Scalable selective re-execution for EDGE
                                  architectures  . . . . . . . . . . . . . 120--132
                John Regehr and   
                  Alastair Reid   HOIST: a system for automatically
                                  deriving static analyzers for embedded
                                  systems  . . . . . . . . . . . . . . . . 133--143
              Perry H. Wang and   
         Jamison D. Collins and   
                  Hong Wang and   
               Dongkeun Kim and   
                Bill Greene and   
              Kai-Ming Chan and   
             Aamir B. Yunus and   
                 Terry Sych and   
           Stephen F. Moore and   
                   John P. Shen   Helper threads via virtual
                                  multithreading on an experimental
                                  Itanium-2 processor-based platform . . . 144--155
         Matthias Hauswirth and   
            Trishul M. Chilimbi   Low-overhead memory leak detection using
                                  adaptive statistical profiling . . . . . 156--164
                Xipeng Shen and   
                Yutao Zhong and   
                      Chen Ding   Locality phase prediction  . . . . . . . 165--176
                   Pin Zhou and   
               Vivek Pandey and   
      Jagadeesan Sundaresan and   
           Anand Raghuraman and   
              Yuanyuan Zhou and   
                  Sanjeev Kumar   Dynamic tracking of page miss ratio
                                  curve for memory management  . . . . . . 177--188
           Rodric M. Rabbah and   
  Hariharan Sandanagobalane and   
        Mongkol Ekpanyapong and   
                  Weng-Fai Wong   Compiler orchestrated prefetching via
                                  speculation and predication  . . . . . . 189--198
             Chen-Yong Cher and   
          Antony L. Hosking and   
               T. N. Vijaykumar   Software prefetching for mark-sweep
                                  garbage collection: hardware analysis
                                  and software redesign  . . . . . . . . . 199--210
            David E. Lowell and   
              Yasushi Saito and   
              Eileen J. Samberg   Devirtualizable virtual machines
                                  enabling general, single-node, online
                                  maintenance  . . . . . . . . . . . . . . 211--223
           Jared C. Smolens and   
              Brian T. Gold and   
                Jangwoo Kim and   
              Babak Falsafi and   
               James C. Hoe and   
            Andreas G. Nowatzyk   Fingerprinting: bounding soft-error
                                  detection latency and bandwidth  . . . . 224--234
           Greg Bronevetsky and   
             Daniel Marques and   
             Keshav Pingali and   
                Peter Szwed and   
                  Martin Schulz   Application-level checkpointing for
                                  shared memory programs . . . . . . . . . 235--247
                   Qiang Wu and   
                Philo Juang and   
         Margaret Martonosi and   
               Douglas W. Clark   Formal online methods for
                                  voltage/frequency control in multiple
                                  clock domain microprocessors . . . . . . 248--259
              Mohamed Gomaa and   
          Michael D. Powell and   
               T. N. Vijaykumar   Heat-and-run: leveraging SMT and CMP to
                                  manage power density through the
                                  operating system . . . . . . . . . . . . 260--270
                Xiaodong Li and   
                 Zhenmin Li and   
              Francis David and   
                   Pin Zhou and   
              Yuanyuan Zhou and   
                Sarita Adve and   
                  Sanjeev Kumar   Performance directed energy management
                                  for main memory and disks  . . . . . . . 271--283

ACM SIGARCH Computer Architecture News
Volume 33, Number 1, March, 2005

                 David M. Chess   Security in autonomic computing  . . . . 2--5
                Weidong Shi and   
          Hsien-Hsin S. Lee and   
               Chenghuai Lu and   
                  Mrinmoy Ghosh   Towards the issues in architectural
                                  support for protection of software
                                  execution  . . . . . . . . . . . . . . . 6--15
           John P. McGregor and   
                    Ruby B. Lee   Protecting cryptographic keys and
                                  computations via virtual secure
                                  coprocessing . . . . . . . . . . . . . . 16--26
               Brian Rogers and   
                Yan Solihin and   
                Milos Prvulovic   Memory predecryption: hiding the latency
                                  overhead of memory encryption  . . . . . 27--33
           David A. Holland and   
                 Ada T. Lim and   
               Margo I. Seltzer   An architecture a day keeps the hacker
                                  away . . . . . . . . . . . . . . . . . . 34--41
         Stelios Sidiroglou and   
         Michael E. Locasto and   
           Angelos D. Keromytis   Hardware support for self-healing
                                  software services  . . . . . . . . . . . 42--47
       Jedidiah R. Crandall and   
              Frederic T. Chong   A security assessment of the Minos
                                  architecture . . . . . . . . . . . . . . 48--57
           Matthew Burnside and   
           Angelos D. Keromytis   The case for crypto protocol awareness
                                  inside the OS kernel . . . . . . . . . . 58--64
            Marc L. Corliss and   
       E. Christopher Lewis and   
                      Amir Roth   Using DISE to protect return addresses
                                  from attack  . . . . . . . . . . . . . . 65--72
                    Dong Ye and   
                    David Kaeli   A reliable return address stack:
                                  microarchitectural features to defeat
                                  stack smashing . . . . . . . . . . . . . 73--80
                     Koji Inoue   Energy-security tradeoff in a secure
                                  cache architecture against buffer
                                  overflow attacks . . . . . . . . . . . . 81--89
               Derek Uluski and   
               Micha Moffie and   
                    David Kaeli   Characterizing antivirus workload
                                  execution  . . . . . . . . . . . . . . . 90--98
           Monther Aldwairi and   
               Thomas Conte and   
                   Paul Franzon   Configurable string matching hardware
                                  for speeding up intrusion detection  . . 99--107
          Milena Milenković and   
      Aleksandar Milenković and   
                   Emil Jovanov   Using instruction block signatures to
                                  counter code injection attacks . . . . . 108--117
               Youtao Zhang and   
                   Jun Yang and   
               Yongjing Lin and   
                        Lan Gao   Architectural support for protecting
                                  user privacy on trusted processors . . . 118--123
            Masaaki Shirase and   
                 Yasushi Hibino   An architecture for elliptic curve
                                  cryptography computation . . . . . . . . 124--133
                 Taeho Kgil and   
                 Laura Falk and   
                   Trevor Mudge   ChipLock: support for secure
                                  microarchitectures . . . . . . . . . . . 134--143
               Magnus Ekman and   
               Fredrik Warg and   
                    Jim Nilsson   An in-depth look at computer performance
                                  growth . . . . . . . . . . . . . . . . . 144--147
           N. Venkateswaran and   
                  S. Balaji and   
                     V. Sridhar   Fault tolerant bus architecture for deep
                                  submicron based processors . . . . . . . 148--155
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 156--160

ACM SIGARCH Computer Architecture News
Volume 33, Number 2, May, 2005

                Ruby B. Lee and   
           Peter C. S. Kwan and   
           John P. McGregor and   
            Jeffrey Dwoskin and   
                 Zhenghong Wang   Architecture for Protecting Critical
                                  Secrets in Microprocessors . . . . . . . 2--13
                      Anonymous   General Chair's Message  . . . . . . . . 9--9
                      Anonymous   Program Chair's Message  . . . . . . . . x--xv
                Weidong Shi and   
          Hsien-Hsin S. Lee and   
              Mrinmoy Ghosh and   
               Chenghuai Lu and   
            Alexandra Boldyreva   High Efficiency Counter Mode Security
                                  Architecture via Prediction and
                                  Precomputation . . . . . . . . . . . . . 14--24
                      Anonymous   Committees . . . . . . . . . . . . . . . 16--16
                      Anonymous   Reviewers  . . . . . . . . . . . . . . . xvii--xviii
              G. Edward Suh and   
       Charles W. O'Donnell and   
              Ishan Sachdev and   
               Srinivas Devadas   Design and Implementation of the AEGIS
                                  Single-Chip Secure Processor Using
                                  Physical Random Functions  . . . . . . . 25--36
        Sudhanva Gurumurthi and   
      Anand Sivasubramaniam and   
             Vivek K. Natarajan   Disk Drive Roadmap from the Thermal
                                  Perspective: a Case for Dynamic Thermal
                                  Management . . . . . . . . . . . . . . . 38--49
             Ram Huggahalli and   
                  Ravi Iyer and   
                  Scott Tetrick   Direct Cache Access for High Bandwidth
                                  Network I/O  . . . . . . . . . . . . . . 50--59
          Haryadi S. Gunawi and   
              Nitin Agrawal and   
   Andrea C. Arpaci-Dusseau and   
    Remzi H. Arpaci-Dusseau and   
                 Jiri Schindler   Deconstructing Commodity Storage
                                  Clusters . . . . . . . . . . . . . . . . 60--71
               Magnus Ekman and   
                  Per Stenström   A Robust Main-Memory Compression Scheme  74--85
                 Brian Fahs and   
                Todd Rafacz and   
            Sanjay J. Patel and   
              Steven S. Lumetta   Continuous Optimization  . . . . . . . . 86--97
                Vlad Petric and   
               Tingting Sha and   
                      Amir Roth   RENO: a Rename-Based Instruction
                                  Optimizer  . . . . . . . . . . . . . . . 98--109
                    Lin Tan and   
               Timothy Sherwood   A High Throughput String Matching
                                  Architecture for Intrusion Detection and
                                  Prevention . . . . . . . . . . . . . . . 112--122
            Florin Baboescu and   
            Dean M. Tullsen and   
               Grigore Rosu and   
                   Sumeet Singh   A Tree Based Router Search Engine
                                  Architecture with Single Port Memories   123--133
                 Shorin Kyo and   
        Shin'ichiro Okazaki and   
                     Tamio Arai   An Integrated Memory Array Processor
                                  Architecture for Embedded Image
                                  Recognition Systems  . . . . . . . . . . 134--145
             George A. Reis and   
             Jonathan Chang and   
          Neil Vachharajani and   
                 Ram Rangan and   
            David I. August and   
         Shubhendu S. Mukherjee   Design and Evaluation of Hybrid
                                  Fault-Detection Systems  . . . . . . . . 148--159
            Ethan Schuchman and   
               T. N. Vijaykumar   Rescue: a Microarchitecture for
                                  Testability and Defect Tolerance . . . . 160--171
           Mohamed A. Gomaa and   
               T. N. Vijaykumar   Opportunistic Transient-Fault Detection  172--183
         Steven Balensiefer and   
      Lucas Kregor-Stickles and   
                     Mark Oskin   An Evaluation Framework and Instruction
                                  Set Architecture for Ion-Trap Based
                                  Quantum Micro-Architectures  . . . . . . 186--196
           Leyla Nazhandali and   
                    Bo Zhai and   
                Javin Olson and   
                Anna Reeves and   
             Michael Minuth and   
               Ryan Helfand and   
                Sanjay Pant and   
                Todd Austin and   
                   David Blaauw   Energy Optimization of
                                  Subthreshold-Voltage Sensor Network
                                  Processors . . . . . . . . . . . . . . . 197--207
             Mark Hempstead and   
            Nikhil Tripathi and   
              Patrick Mauro and   
                Gu-Yeon Wei and   
                   David Brooks   An Ultra Low Power System Architecture
                                  for Sensor Network Applications  . . . . 208--219
          Thomas F. Wenisch and   
            Stephen Somogyi and   
       Nikolaos Hardavellas and   
                Jangwoo Kim and   
        Anastassia Ailamaki and   
                  Babak Falsafi   Temporal Streaming of Shared Memory  . . 222--233
               Andreas Moshovos   RegionScout: Exploiting Coarse Grain
                                  Sharing in Snoop-Based Coherence . . . . 234--245
            Jason F. Cantin and   
           Mikko H. Lipasti and   
                 James E. Smith   Improving Multiprocessor Performance
                                  with Coarse-Grain Coherence Tracking . . 246--257
              Stephen Hines and   
               Joshua Green and   
                 Gary Tyson and   
                  David Whalley   Improving Program Efficiency by Packing
                                  Instructions into Registers  . . . . . . 260--271
               Nathan Clark and   
                Jason Blome and   
                Michael Chu and   
               Scott Mahlke and   
               Stuart Biles and   
             Krisztian Flautner   An Architecture Framework for
                                  Transparent Instruction Set
                                  Customization in Embedded Processors . . 272--283
        Satish Narayanasamy and   
               Gilles Pokam and   
                    Brad Calder   BugNet: Continuously Recording Program
                                  Execution for Deterministic Replay
                                  Debugging  . . . . . . . . . . . . . . . 284--295
           Murali Annavaram and   
              Ed Grochowski and   
                      John Shen   Mitigating Amdahl's Law through EPI
                                  Throttling . . . . . . . . . . . . . . . 298--309
                Emil Talpes and   
               Diana Marculescu   Increased Scalability and Power
                                  Efficiency by Using Multiple Speed
                                  Pipelines  . . . . . . . . . . . . . . . 310--321
                Vlad Petric and   
                      Amir Roth   Energy-Effectiveness of Pre-Execution
                                  and Energy-Aware P-Thread Selection  . . 322--333
              Michael Zhang and   
                 Krste Asanovic   Victim Replication: Maximizing Capacity
                                  while Hiding Wire Delay in Tiled Chip
                                  Multiprocessors  . . . . . . . . . . . . 336--345
               Evan Speight and   
                Hazim Shafi and   
                Lixin Zhang and   
                   Ram Rajamony   Adaptive Mechanisms and Policies for
                                  Managing Cache Hierarchies in Chip
                                  Multiprocessors  . . . . . . . . . . . . 346--356
             Zeshan Chishti and   
          Michael D. Powell and   
               T. N. Vijaykumar   Optimizing Replication, Communication,
                                  and Capacity Allocation in CMPs  . . . . 357--368
                 Onur Mutlu and   
                Hyesoon Kim and   
                   Yale N. Patt   Techniques for Efficient Processing in
                                  Runahead Execution Engines . . . . . . . 370--381
              Daniel A. Jimenez   Piecewise Linear Branch Prediction . . . 382--393
                   Andre Seznec   Analysis of the O-GEometric History
                                  Length Branch Predictor  . . . . . . . . 394--405
               Rakesh Kumar and   
              Victor Zyuban and   
                Dean M. Tullsen   Interconnections in Multi-Core
                                  Architectures: Understanding Mechanisms,
                                  Overheads and Scaling  . . . . . . . . . 408--419
                   John Kim and   
           William J. Dally and   
               Brian Towles and   
                  Amit K. Gupta   Microarchitecture of a High-Radix Router 420--431
                  Daeho Seo and   
                   Akif Ali and   
               Won-Taek Lim and   
             Nauman Rafique and   
            Mithuna Thottethodi   Near-Optimal Worst-Case Throughput
                                  Routing for Two-Dimensional Mesh
                                  Networks . . . . . . . . . . . . . . . . 432--443
                Amit Gandhi and   
             Haitham Akkary and   
                Ravi Rajwar and   
     Srikanth T. Srinivasan and   
                     Konrad Lai   Scalable Load and Store Processing in
                                  Latency Tolerant Processors  . . . . . . 446--457
                      Amir Roth   Store Vulnerability Window (SVW):
                                  Re-Execution Filtering for Enhanced Load
                                  Optimization . . . . . . . . . . . . . . 458--468
               E. F. Torres and   
                  P. Ibanez and   
                  V. Vinals and   
                 J. M. Llaberia   Store Buffer Design in First-Level
                                  Multibanked Data Caches  . . . . . . . . 469--480
             Albert Meixner and   
                Daniel J. Sorin   Dynamic Verification of Sequential
                                  Consistency  . . . . . . . . . . . . . . 482--493
                Ravi Rajwar and   
            Maurice Herlihy and   
                     Konrad Lai   Virtualizing Transactional Memory  . . . 494--505
   Saisanthosh Balakrishnan and   
                Ravi Rajwar and   
                 Mike Upton and   
                     Konrad Lai   The Impact of Performance Asymmetry in
                                  Emerging Multicore Architectures . . . . 506--517
         Jayanth Srinivasan and   
             Sarita V. Adve and   
                Pradip Bose and   
                 Jude A. Rivers   Exploiting Structural Duplication for
                                  Lifetime Reliability Enhancement . . . . 520--531
              Arijit Biswas and   
               Paul Racunas and   
          Razvan Cheveresan and   
                  Joel Emer and   
     Shubhendu S. Mukherjee and   
                     Ram Rangan   Computing Architectural Vulnerability
                                  Factors for Address-Based Structures . . 532--543
       Moinuddin K. Qureshi and   
             David Thompson and   
                   Yale N. Patt   The V-Way Cache: Demand Based
                                  Associativity via Global Replacement . . 544--555
                      Anonymous   Author Index . . . . . . . . . . . . . . 556--557

ACM SIGARCH Computer Architecture News
Volume 33, Number 3, June, 2005

               S. Bartolini and   
                  P. Foglia and   
                    C. A. Prete   Guest editors' introduction  . . . . . . 1--2
           Hanene Ben Fradj and   
         Asmaa el Ouardighi and   
            Cécile Belleudy and   
                  Michel Auguin   Energy aware memory architecture
                                  configuration  . . . . . . . . . . . . . 3--9
              Hyo-Joong Suh and   
                 Sung Woo Chung   DRACO: optimized CC-NUMA system with
                                  novel dual-link interconnections to
                                  reduce the memory latency  . . . . . . . 10--16
                 Sami Yehia and   
      Jean-François Collard and   
                  Olivier Temam   Load squared: adding logic close to
                                  memory to reduce the latency of indirect
                                  loads with high miss ratios  . . . . . . 17--24
          Hiroaki Kobayashi and   
                Isao Kotera and   
              Hiroyuki Takizawa   Locality analysis to control dynamically
                                  way-adaptable caches . . . . . . . . . . 25--32
                 F. Arakawa and   
                M. Ishikawa and   
                   Y. Kondo and   
                   T. Kamei and   
                   M. Ozawa and   
                  O. Nishii and   
                     T. Hattori   SH-X: an embedded processor core for
                                  consumer appliances  . . . . . . . . . . 33--40
                  Afrin Naz and   
              Mehran Rezaei and   
               Krishna Kavi and   
                  Philip Sweany   Improving data cache performance with
                                  integrated use of split caches, victim
                                  cache and stream buffers . . . . . . . . 41--48
               Alex Pajuelo and   
           Antonio González and   
                   Mateo Valero   Speculative execution for hiding memory
                                  latency  . . . . . . . . . . . . . . . . 49--56
               Javier Verdú and   
               Jorge García and   
           Mario Nemirovsky and   
                   Mateo Valero   The impact of traffic aggregation on the
                                  memory performance of networking
                                  applications . . . . . . . . . . . . . . 57--62
                Bramha Allu and   
                      Wei Zhang   Exploiting the replication cache to
                                  improve performance for multiple-issue
                                  microprocessors  . . . . . . . . . . . . 63--71
                   Mark Thorson   Internet nuggets . . . . . . . . . . . . 72--74
                      Anonymous   MEDEA 2004 workshop  . . . . . . . . . . ??

ACM SIGARCH Computer Architecture News
Volume 33, Number 4, November, 2005

           Norman P. Jouppi and   
               Rakesh Kumar and   
                   Dean Tullsen   Introduction to the special issue on the
                                  2005 Workshop on Design, Analysis, and
                                  Simulation of Chip Multiprocessors
                                  (dasCMP'05)  . . . . . . . . . . . . . . 4--4
                   James Laudon   Performance/Watt: the new server focus   5--13
              John D. Davis and   
                    Cong Fu and   
                   James Laudon   The RASE (Rapid, Accurate Simulation
                                  Environment) for chip multiprocessors    14--23
                   Lisa Hsu and   
                  Ravi Iyer and   
           Srihari Makineni and   
            Steve Reinhardt and   
                  Donald Newell   Exploring the cache design space for
                                  large scale CMPs . . . . . . . . . . . . 24--33
              John D. Davis and   
      Stephen E. Richardson and   
           Charis Charitsis and   
                 Kunle Olukotun   A chip prototyping substrate: the
                                  flexible architecture for simulation and
                                  testing (FAST) . . . . . . . . . . . . . 34--43
          Neil Vachharajani and   
               Matthew Iyer and   
              Chinmay Ashok and   
        Manish Vachharajani and   
            David I. August and   
                 Daniel Connors   Chip multi-processor scalability for
                                  single-threaded applications . . . . . . 44--53
                 Julia Chen and   
                Philo Juang and   
                   Kevin Ko and   
         Gilberto Contreras and   
                David Penry and   
                 Ram Rangan and   
                Adam Stoler and   
              Li-Shiuan Peh and   
             Margaret Martonosi   Hardware-modulated parallelism in chip
                                  multiprocessors  . . . . . . . . . . . . 54--63
               Jack Sampson and   
             Rubén González and   
      Jean-François Collard and   
           Norman P. Jouppi and   
                Mike Schlansker   Fast synchronization for chip
                                  multiprocessors  . . . . . . . . . . . . 64--69
          Anahita Shayesteh and   
              Glenn Reinman and   
              Norman Jouppi and   
              Suleyman Sair and   
                   Tim Sherwood   Dynamically configurable shared CMP
                                  helper engines for improved performance  70--79
     Theofanis Constantinou and   
         Yiannakis Sazeides and   
             Pierre Michaud and   
               Damien Fetis and   
                   Andre Seznec   Performance implications of single
                                  thread migration on a chip multi-core    80--91
          Milo M. K. Martin and   
            Daniel J. Sorin and   
       Bradford M. Beckmann and   
           Michael R. Marty and   
                     Min Xu and   
         Alaa R. Alameldeen and   
             Kevin E. Moore and   
               Mark D. Hill and   
                  David A. Wood   Multifacet's general execution-driven
                                  multiprocessor simulator (GEMS) toolset  92--99
                 David Wang and   
              Brinda Ganesh and   
      Nuengwong Tuaycharoen and   
            Kathleen Baynes and   
               Aamer Jaleel and   
                    Bruce Jacob   DRAMsim: a memory system simulator . . . 100--107
             Barry Rountree and   
            Robert Springer and   
         David K. Lowenthal and   
               Vincent W. Freeh   Notes from HPPAC 2005  . . . . . . . . . 108--112
                 H. C. Wang and   
                     C. K. Yuen   A general framework to build new CPUs by
                                  mapping abstract machine code to
                                  instruction level parallel execution
                                  hardware . . . . . . . . . . . . . . . . 113--120
                Nana B. Sam and   
               Martin Burtscher   Improving memory system performance with
                                  energy-efficient value speculation . . . 121--127
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 128--133

ACM SIGARCH Computer Architecture News
Volume 33, Number 5, December, 2005

                David Kaeli and   
                    Robert Cohn   WBIA'05: Introduction to the special
                                  issue  . . . . . . . . . . . . . . . . . 1--2
                Chunling Hu and   
                John McCabe and   
          Daniel A. Jiménez and   
                  Ulrich Kremer   The Camino Compiler infrastructure . . . 3--8
              Martin Schulz and   
                   Dong Ahn and   
              Andrew Bernat and   
      Bronis R. de Supinski and   
               Steven Y. Ko and   
                Gregory Lee and   
                 Barry Rountree   Scalable dynamic binary instrumentation
                                  for Blue Gene/L  . . . . . . . . . . . . 9--14
                Edson Borin and   
                 Cheng Wang and   
                 Youfeng Wu and   
                   Guido Araujo   Dynamic binary control-flow errors
                                  detection  . . . . . . . . . . . . . . . 15--20
               Micha Moffie and   
                    David Kaeli   ASM: application security monitor  . . . 21--26
                   Qin Zhao and   
              Rodric Rabbah and   
                  Weng-Fai Wong   Dynamic memory optimization using pool
                                  allocation and prefetching . . . . . . . 27--32
               Xiaofeng Gao and   
                 Beth Simon and   
                  Allan Snavely   ALITER: an asynchronous lightweight
                                  instrumentation tool for event recording 33--38
             Collin McCurdy and   
                Charles Fischer   Using Pin as a memory reference
                                  generator for multiprocessor simulation  39--44
                  Heidi Pan and   
             Krste Asanović and   
                Robert Cohn and   
                  Chi-Keung Luk   Controlling program execution through
                                  binary instrumentation . . . . . . . . . 45--50
               Nikrouz Faroughi   Profiling of parallel processing
                                  programs on shared memory
                                  multiprocessors using Simics . . . . . . 51--56
               Naveen Kumar and   
                    Ramesh Peri   Transparent debugging of dynamically
                                  instrumented programs  . . . . . . . . . 57--62
            Laune C. Harris and   
               Barton P. Miller   Practical analysis of stripped binary
                                  code . . . . . . . . . . . . . . . . . . 63--68
         Vijay Janapa Reddi and   
                Dan Connors and   
                 Robert S. Cohn   Persistence in dynamic code
                                  transformation systems . . . . . . . . . 69--74
             Ram Srinivasan and   
                    Olaf Lubeck   MonteSim: a Monte Carlo performance
                                  model for in-order microarchitectures    75--80
         Michael Laurenzano and   
                 Beth Simon and   
              Allan Snavely and   
                    Meghan Gunn   Low cost trace-driven memory simulation
                                  using SimPoint . . . . . . . . . . . . . 81--86
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 87--93

ACM SIGARCH Computer Architecture News
Volume 34, Number 1, 2006

               S. Bartolini and   
                  P. Foglia and   
                  R. Giorgi and   
                    C. A. Prete   Memory performance: dealing with
                                  applications, systems and architecture   1--2
             Scott Friedman and   
      Praveen Krishnamurthy and   
          Roger Chamberlain and   
              Ron K. Cytron and   
                Jason E. Fritts   Dusty caches for reference counting
                                  garbage collection . . . . . . . . . . . 3--10
      Subramanian Ramaswamy and   
           Jaswanth Sreeram and   
      Sudhakar Yalamanchili and   
               Krishna V. Palem   Data trace cache: an application
                                  specific cache architecture  . . . . . . 11--18
                  Afrin Naz and   
               Krishna Kavi and   
              Mehran Rezaei and   
                     Wentong Li   Making a case for split data caches for
                                  embedded applications  . . . . . . . . . 19--26
                    B. Allu and   
                   W. Zhang and   
                     M. Kandala   Exploiting the replication cache to
                                  improve cache read bandwidth cost
                                  effectively  . . . . . . . . . . . . . . 27--32
           Matteo Monchiero and   
           Gianluca Palermo and   
           Cristina Silvano and   
                   Oreste Villa   An efficient synchronization technique
                                  for multiprocessor systems on-chip . . . 33--40
           Farshad Khunjush and   
          Nikitas J. Dimopoulos   Hiding message delivery and reducing
                                  memory access latency by providing
                                  direct-to-cache transfer during receive
                                  operations in a message passing
                                  environment  . . . . . . . . . . . . . . 41--48
                    Yao Yue and   
                 Chuang Lin and   
                    Zhangxi Tan   NPCryptBench: a cryptographic benchmark
                                  suite for network processors . . . . . . 49--56
     Abelardo López-Lagunas and   
                    Sek M. Chai   Memory bandwidth optimization through
                                  stream descriptors . . . . . . . . . . . 57--64
          Akihiro Chiyonobu and   
                 Toshinori Sato   Energy-efficient instruction scheduling
                                  utilizing cache miss information . . . . 65--70
         Alessandro Bardine and   
            Alessio Bechini and   
       Pierfrancesco Foglia and   
           Cosimo Antonio Prete   Analysis of embedded video coder
                                  systems: a system-level approach . . . . 71--76
            Alex Gontmakher and   
             Assaf Schuster and   
                  Avi Mendelson   Inthreads: a low granularity
                                  parallelization model  . . . . . . . . . 77--80
                   Mark Thorson   Internet nuggets . . . . . . . . . . . . 81--86

ACM SIGARCH Computer Architecture News
Volume 34, Number 2, 2006

                      Yale Patt   Computer Architecture Research and
                                  Future Microprocessors: Where Do We Go
                                  from Here? . . . . . . . . . . . . . . . 2--2
                Jongman Kim and   
    Chrysostomos Nicopoulos and   
                  Dongkook Park   A Gracefully Degrading and
                                  Energy-Efficient Modular Router
                                  Architecture for On-Chip Networks  . . . 4--15
                      Anonymous   Message from the General Chair . . . . . 10--10
                      Anonymous   Message from the Program Chair . . . . . 11--11
                      Anonymous   Reviewers  . . . . . . . . . . . . . . . 14--14
                Steve Scott and   
                Dennis Abts and   
                   John Kim and   
               William J. Dally   The BlackWidow High-Radix Clos Network   16--28
                      Anonymous   SIGARCH Guidelines . . . . . . . . . . . 17--17
              Arvind Arvind and   
             Jan-Willem Maessen   Memory Model $=$ Instruction Reordering
                                  $+$ Store Atomicity  . . . . . . . . . . 29--40
        Christoph von Praun and   
             Harold W. Cain and   
             Jong-Deok Choi and   
                 Kyung Dong Ryu   Conditional Memory Ordering  . . . . . . 41--52
            Austen McDonald and   
             JaeWoong Chung and   
         Brian D. Carlstrom and   
               Chi Cao Minh and   
               Hassan Chafi and   
         Christos Kozyrakis and   
                 Kunle Olukotun   Architectural Semantics for Practical
                                  Transactional Memory . . . . . . . . . . 53--65
  Parthasarathy Ranganathan and   
                 Phil Leech and   
                David Irwin and   
                  Jeffrey Chase   Ensemble-level Power Management for
                                  Dense Blade Servers  . . . . . . . . . . 66--77
               James Donald and   
             Margaret Martonosi   Techniques for Multicore Thermal
                                  Management: Classification and New
                                  Exploration  . . . . . . . . . . . . . . 78--88
                   Yuan Lin and   
               Hyunseok Lee and   
                   Mark Woh and   
                 Yoav Harel and   
               Scott Mahlke and   
               Trevor Mudge and   
       Chaitali Chakrabarti and   
             Krisztian Flautner   SODA: a Low-power Architecture For
                                  Software Radio . . . . . . . . . . . . . 89--101
                Weidong Shi and   
          Hsien-Hsin S. Lee and   
                 Laura Falk and   
                  Mrinmoy Ghosh   An Integrated Framework for Dependable
                                  and Revivable Architectures Using
                                  Multicore Processors . . . . . . . . . . 102--113
         Richard A. Hankins and   
          Gautham N. Chinya and   
         Jamison D. Collins and   
              Perry H. Wang and   
                Ryan Rakvic and   
                  Hong Wang and   
                   John P. Shen   Multiple Instruction Stream Processor    114--127
                    Philip Emma   The End of Scaling? Revolutions in
                                  Technology and Microarchitecture as We
                                  Pass the 90 Nanometer Node . . . . . . . 128--128
                  Feihui Li and   
    Chrysostomos Nicopoulos and   
          Thomas Richardson and   
                   Yuan Xie and   
    Vijaykrishnan Narayanan and   
                Mahmut Kandemir   Design and Management of $3$D Chip
                                  Multiprocessors Using Network-in-Memory  130--141
                  Alok Garg and   
           M. Wasiur Rashid and   
                  Michael Huang   Slackened Memory Dependence Enforcement:
                                  Combining Opportunistic Forwarding with
                                  Decoupled Verification . . . . . . . . . 142--154
                 Chuanjun Zhang   Balanced Cache: Reducing Conflict Misses
                                  of Direct-Mapped Caches  . . . . . . . . 155--166
       Moinuddin K. Qureshi and   
            Daniel N. Lynch and   
                 Onur Mutlu and   
                   Yale N. Patt   A Case for MLP-Aware Cache Replacement   167--178
                 Chenyu Yan and   
           Daniel Englender and   
            Milos Prvulovic and   
               Brian Rogers and   
                    Yan Solihin   Improving Cost, Performance, and
                                  Security of Memory Encryption and
                                  Authentication . . . . . . . . . . . . . 179--190
         Benjamin C. Brodie and   
            David E. Taylor and   
                  Ron K. Cytron   A Scalable Architecture For
                                  High-Throughput Regular-Expression
                                  Pattern Matching . . . . . . . . . . . . 191--202
             Jahangir Hasan and   
            Srihari Cadambi and   
           Venkatta Jakkula and   
              Srimat Chakradhar   Chisel: a Storage-efficient,
                                  Collision-free Hash-based Network
                                  Processing Architecture  . . . . . . . . 203--215
     Christopher B. Colohan and   
        Anastassia Ailamaki and   
         J. Gregory Steffan and   
                  Todd C. Mowry   Tolerating Dependences Between Large
                                  Speculative Threads Via Sub-Threads  . . 216--226
                  Luis Ceze and   
                 James Tuck and   
            Josep Torrellas and   
                 Calin Cascaval   Bulk Disambiguation of Speculative
                                  Threads in Multiprocessors . . . . . . . 227--238
             Seungryul Choi and   
                   Donald Yeung   Learning-Based SMT Processor Resource
                                  Distribution via Hill-Climbing . . . . . 239--251
            Stephen Somogyi and   
          Thomas F. Wenisch and   
        Anastassia Ailamaki and   
              Babak Falsafi and   
               Andreas Moshovos   Spatial Memory Streaming . . . . . . . . 252--263
              Jichuan Chang and   
               Gurindar S. Sohi   Cooperative Caching for Chip
                                  Multiprocessors  . . . . . . . . . . . . 264--276
                Shiliang Hu and   
                 James E. Smith   Reducing Startup Time in Co-Designed
                                  Virtual Machines . . . . . . . . . . . . 277--288
                  Qing Yang and   
                Weijun Xiao and   
                        Jin Ren   TRAP-Array: a Disk Array Architecture
                                  Providing Timely Recovery to Any
                                  Point-in-time  . . . . . . . . . . . . . 289--301
   Saisanthosh Balakrishnan and   
               Gurindar S. Sohi   Program Demultiplexing: Data-flow based
                                  Speculative Parallelization of Methods
                                  in Sequential Programs . . . . . . . . . 302--313
             Steven Swanson and   
              Andrew Putnam and   
            Martha Mercaldi and   
              Ken Michelson and   
            Andrew Petersen and   
            Andrew Schwerin and   
                 Mark Oskin and   
                Susan J. Eggers   Area-Performance Trade-offs in Tiled
                                  Dataflow Architectures . . . . . . . . . 314--326
              Karin Strauss and   
               Xiaowei Shen and   
                Josep Torrellas   Flexible Snooping: Adaptive Forwarding
                                  and Filtering of Snoops in Embedded-Ring
                                  Multiprocessors  . . . . . . . . . . . . 327--338
                Liqun Cheng and   
       Naveen Muralimanohar and   
             Karthik Ramani and   
     Rajeev Balasubramonian and   
                 John B. Carter   Interconnect-Aware Coherence Protocols
                                  for Chip Multiprocessors . . . . . . . . 339--351
                   Steve Herrod   The Future of Virtualization Technology  352--352
           Rodney Van Meter and   
                 Kae Nemoto and   
                W. J. Munro and   
                  Kohei M. Itoh   Distributed Arithmetic on a Quantum
                                  Multicomputer  . . . . . . . . . . . . . 354--365
          Nemanja Isailovic and   
               Yatish Patel and   
               Mark Whitney and   
               John Kubiatowicz   Interconnection Networks for Scalable
                                  Quantum Computers  . . . . . . . . . . . 366--377
          Darshan D. Thaker and   
          Tzvetan S. Metodi and   
            Andrew W. Cross and   
            Isaac L. Chuang and   
              Frederic T. Chong   Quantum Memory Hierarchies: Efficient
                                  Designs to Match Available Parallelism
                                  in Quantum Computing . . . . . . . . . . 378--390
                      Anonymous   Author Index . . . . . . . . . . . . . . 391--391

ACM SIGARCH Computer Architecture News
Volume 34, Number 3, June, 2006

               Martin Burtscher   TCgen 2.0: a tool to automatically
                                  generate lossless trace compressors  . . 1--8
                Abhas Kumar and   
               Nisheet Jain and   
               Mainak Chaudhuri   Long-latency branches: how much do they
                                  matter?  . . . . . . . . . . . . . . . . 9--15
                   Mark Thorson   Internet nuggets . . . . . . . . . . . . 16--21

ACM SIGARCH Computer Architecture News
Volume 34, Number 4, September, 2006

                John L. Henning   SPEC CPU2006 benchmark descriptions  . . 1--17
              Daniel Citron and   
               Adham Hurani and   
                   Alaa Gnadrey   The harmonic or geometric mean: does it
                                  really matter? . . . . . . . . . . . . . 18--25
                  James Poe and   
                         Tao Li   BASS: a benchmark suite for evaluating
                                  architectural security systems . . . . . 26--33
                   Mark Thorson   Internet nuggets . . . . . . . . . . . . 34--37

ACM SIGARCH Computer Architecture News
Volume 34, Number 5, December, 2006

               Mendel Rosenblum   Impact of virtualization on computer
                                  architecture and operating systems . . . 1--1
                Keith Adams and   
                     Ole Agesen   A comparison of software and hardware
                                  techniques for x86 virtualization  . . . 2--13
           Stephen T. Jones and   
   Andrea C. Arpaci-Dusseau and   
        Remzi H. Arpaci-Dusseau   Geiger: monitoring the buffer cache in a
                                  virtual machine environment  . . . . . . 14--24
       Jedidiah R. Crandall and   
            Gary Wassermann and   
  Daniela A. S. de Oliveira and   
                Zhendong Su and   
                S. Felix Wu and   
              Frederic T. Chong   Temporal search: detecting hidden
                                  malware timebombs with virtual machines  25--36
                    Shan Lu and   
               Joseph Tucek and   
                   Feng Qin and   
                  Yuanyuan Zhou   AVIO: detecting atomicity violations via
                                  access interleaving invariants . . . . . 37--48
                     Min Xu and   
               Mark D. Hill and   
                Rastislav Bodik   A regulated transitive reduction (RTR)
                                  for longer memory race recording . . . . 49--60
            Michael D. Bond and   
            Kathryn S. McKinley   Bell: bit-encoding online memory leak
                                  detection  . . . . . . . . . . . . . . . 61--72
               Smitha Shyam and   
      Kypros Constantinides and   
               Sujay Phadke and   
           Valeria Bertacco and   
                    Todd Austin   Ultra low-cost defect protection for
                                  microprocessor pipelines . . . . . . . . 73--82
             Vimal K. Reddy and   
             Eric Rotenberg and   
        Sailashri Parthasarathy   Understanding prediction-based partial
                                  redundant threading for low-overhead,
                                  high-coverage fault tolerance  . . . . . 83--94
         Angshuman Parashar and   
      Anand Sivasubramaniam and   
            Sudhanva Gurumurthi   SlicK: slice-based locality exploitation
                                  for efficient redundant multithreading   95--105
              Taliver Heath and   
          Ana Paula Centeno and   
             Pradeep George and   
                 Luiz Ramos and   
                 Yogesh Jaluria   Mercury and Freon: temperature emulation
                                  and management for server systems  . . . 106--116
                 Taeho Kgil and   
              Shaun D'Souza and   
                  Ali Saidi and   
             Nathan Binkert and   
          Ronald Dreslinski and   
               Trevor Mudge and   
           Steven Reinhardt and   
             Krisztian Flautner   PicoServer: using $3$D stacking
                                  technology to enable a compact energy
                                  efficient chip multiprocessor  . . . . . 117--128
         Katherine E. Coons and   
                   Xia Chen and   
                Doug Burger and   
        Kathryn S. McKinley and   
            Sundeep K. Kushwaha   A spatial path scheduling algorithm for
                                  EDGE architectures . . . . . . . . . . . 129--140
            Martha Mercaldi and   
             Steven Swanson and   
            Andrew Petersen and   
              Andrew Putnam and   
            Andrew Schwerin and   
                 Mark Oskin and   
                Susan J. Eggers   Instruction scheduling for a tiled
                                  dataflow architecture  . . . . . . . . . 141--150
          Michael I. Gordon and   
              William Thies and   
              Saman Amarasinghe   Exploiting coarse-grained task, data,
                                  and pipeline parallelism in stream
                                  programs . . . . . . . . . . . . . . . . 151--162
               Mahim Mishra and   
        Timothy J. Callahan and   
            Tiberiu Chelcea and   
       Girish Venkataramani and   
          Seth C. Goldstein and   
                    Mihai Budiu   Tartan: evaluating spatial computation
                                  for whole program execution  . . . . . . 163--174
              Stijn Eyerman and   
            Lieven Eeckhout and   
            Tejas Karkhanis and   
                 James E. Smith   A performance counter architecture for
                                  computing accurate CPI components  . . . 175--184
            Benjamin C. Lee and   
                David M. Brooks   Accurate and efficient regression
                                  modeling for microarchitectural
                                  performance and power prediction . . . . 185--194
                 Engin İpek and   
             Sally A. McKee and   
               Rich Caruana and   
      Bronis R. de Supinski and   
                  Martin Schulz   Efficiently exploring architectural
                                  design spaces via predictive modeling    195--206
            Mazen Kharbutli and   
              Xiaowei Jiang and   
                Yan Solihin and   
         Guru Venkataramani and   
                Milos Prvulovic   Comprehensively and efficiently
                                  protecting the heap  . . . . . . . . . . 207--218
        Trishul M. Chilimbi and   
                Vinod Ganapathy   HeapMD: identifying heap-based bugs
                                  using anomaly detection  . . . . . . . . 219--228
        Satish Narayanasamy and   
          Cristiano Pereira and   
                    Brad Calder   Recording shared memory dependencies
                                  using strata . . . . . . . . . . . . . . 229--240
       Jaidev P. Patwardhan and   
               Vijeta Johri and   
                Chris Dwyer and   
                Alvin R. Lebeck   A defect tolerant self-organizing
                                  nanoscale SIMD architecture  . . . . . . 241--251
            Ethan Schuchman and   
               T. N. Vijaykumar   A program transformation and
                                  architecture support for quantum
                                  uncomputation  . . . . . . . . . . . . . 252--263
          Shashidhar Mysore and   
              Banit Agrawal and   
           Navin Srivastava and   
             Sheng-Chih Lin and   
           Kaustav Banerjee and   
                   Tim Sherwood   Introspective $3$D chips . . . . . . . . 264--273
            Jason F. Cantin and   
           Mikko H. Lipasti and   
                 James E. Smith   Stealth prefetching  . . . . . . . . . . 274--282
        Koushik Chakraborty and   
            Philip M. Wells and   
               Gurindar S. Sohi   Computation spreading: employing
                                  hardware migration to specialize CMP
                                  cores on-the-fly . . . . . . . . . . . . 283--292
            Jason E. Miller and   
                  Anant Agarwal   Software-based instruction caching for
                                  embedded processors  . . . . . . . . . . 293--302
                     Xin Li and   
               Marian Boldt and   
         Reinhard von Hanxleden   Mapping Esterel onto a multi-threaded
                                  embedded processor . . . . . . . . . . . 303--314
          Nathan L. Binkert and   
               Ali G. Saidi and   
            Steven K. Reinhardt   Integrated network interfaces for
                                  high-bandwidth TCP/IP  . . . . . . . . . 315--324
              David Tarditi and   
                  Sidd Puri and   
                   Jose Oglesby   Accelerator: using data parallelism to
                                  program GPUs for general-purpose uses    325--335
               Peter Damron and   
         Alexandra Fedorova and   
                      Yossi Lev   Hybrid transactional memory  . . . . . . 336--346
              Weihaw Chuang and   
        Satish Narayanasamy and   
           Ganesh Venkatesh and   
               Jack Sampson and   
     Michael Van Biesbrouck and   
               Gilles Pokam and   
                Brad Calder and   
                Osvaldo Colavin   Unbounded page-based transactional
                                  memory . . . . . . . . . . . . . . . . . 347--358
        Michelle J. Moravan and   
              Jayaram Bobba and   
             Kevin E. Moore and   
                   Luke Yen and   
               Mark D. Hill and   
                 Ben Liblit and   
           Michael M. Swift and   
                  David A. Wood   Supporting nested transactional memory
                                  in LogTM . . . . . . . . . . . . . . . . 359--370
             JaeWoong Chung and   
               Chi Cao Minh and   
            Austen McDonald and   
               Travis Skare and   
               Hassan Chafi and   
         Brian D. Carlstrom and   
         Christos Kozyrakis and   
                 Kunle Olukotun   Tradeoffs in transactional memory
                                  virtualization . . . . . . . . . . . . . 371--381
          Motohiro Kawahito and   
            Hideaki Komatsu and   
             Takao Moriyama and   
              Hiroshi Inoue and   
                Toshio Nakatani   A new idiom recognition framework for
                                  exploiting hardware-assist instructions  382--393
               Sorav Bansal and   
                     Alex Aiken   Automatic generation of peephole
                                  superoptimizers  . . . . . . . . . . . . 394--403
       Armando Solar-Lezama and   
               Liviu Tancau and   
            Rastislav Bodik and   
              Sanjit Seshia and   
                 Vijay Saraswat   Combinatorial sketching for finite
                                  programs . . . . . . . . . . . . . . . . 404--415
              Jeff Da Silva and   
             J. Gregory Steffan   A probabilistic pointer analysis for
                                  speculative optimizations  . . . . . . . 416--425

ACM SIGARCH Computer Architecture News
Volume 35, Number 1, March, 2007

               Dean Tullsen and   
               Rakesh Kumar and   
               Norman P. Jouppi   Introduction to the special issue on the
                                  2006 Workshop on Design, Analysis, and
                                  Simulation of Chip Multiprocessors:
                                  (dasCMP'06)  . . . . . . . . . . . . . . 2--2
              Aqeel Mahesri and   
           Nicholas J. Wang and   
                Sanjay J. Patel   Hardware support for software controlled
                                  multithreading . . . . . . . . . . . . . 3--12
                 Xudong Shi and   
                   Feiqi Su and   
              Jih-kwon Peir and   
                     Ye Xia and   
                      Zhen Yang   CMP cache performance projection:
                                  accessibility vs. capacity . . . . . . . 13--20
                    Fei Guo and   
                Hari Kannan and   
                    Li Zhao and   
            Ramesh Illikkal and   
                  Ravi Iyer and   
                 Don Newell and   
                Yan Solihin and   
             Christos Kozyrakis   From chaos to QoS: case studies in CMP
                                  resource management  . . . . . . . . . . 21--30
              Masaaki Kondo and   
             Hiroshi Sasaki and   
               Hiroshi Nakamura   Improving fairness, throughput and
                                  energy-efficiency on a chip
                                  multiprocessor through DVFS  . . . . . . 31--38
            M. M. Waliullah and   
                  Per Stenstrom   Starvation-free commit arbitration
                                  policies for transactional memory
                                  systems  . . . . . . . . . . . . . . . . 39--46
               Cesare Ferri and   
              Tali Moreshet and   
              R. Iris Bahar and   
                Luca Benini and   
                Maurice Herlihy   A hardware/software framework for
                                  supporting transactional memory in a
                                  MPSoC environment  . . . . . . . . . . . 47--54
                   Sean Rul and   
        Hans Vandierendonck and   
              Koen De Bosschere   Function level parallelism driven by
                                  data dependencies  . . . . . . . . . . . 55--62
                John L. Henning   Guest editor's introduction  . . . . . . 63--64
                John L. Henning   SPEC CPU suite growth: an historical
                                  perspective  . . . . . . . . . . . . . . 65--68
         Aashish Phansalkar and   
                 Ajay Joshi and   
                   Lizy K. John   Subsetting the SPEC CPU2006 benchmark
                                  suite  . . . . . . . . . . . . . . . . . 69--76
                   Michael Wong   C++ benchmarks in SPEC CPU2006 . . . . . 77--83
                John L. Henning   SPEC CPU2006 memory footprint  . . . . . 84--89
                    Darryl Gove   CPU2006 working set size . . . . . . . . 90--96
                 Wendy Korn and   
                  Moon S. Chang   SPEC CPU2006 sensitivity to memory page
                                  sizes  . . . . . . . . . . . . . . . . . 97--101
        Reinhold P. Weicker and   
                John L. Henning   Subroutine profiling results for the
                                  CPU2006 benchmarks . . . . . . . . . . . 102--111
                    Dong Ye and   
                Joydeep Ray and   
                    David Kaeli   Characterization of file I/O activity
                                  for SPEC CPU2006 . . . . . . . . . . . . 112--117
                John L. Henning   Performance counters and development of
                                  SPEC CPU2006 . . . . . . . . . . . . . . 118--121
                Darryl Gove and   
             Lawrence Spracklen   Evaluating the correspondence between
                                  training and reference workloads in SPEC
                                  CPU2006  . . . . . . . . . . . . . . . . 122--129
            Cloyce D. Spradling   SPEC CPU2006 benchmark tools . . . . . . 130--134
            Swaroop Sridhar and   
        Jonathan S. Shapiro and   
           Prashanth P. Bungale   HDTrans: a low-overhead dynamic
                                  translator . . . . . . . . . . . . . . . 135--140
                    Jun Yan and   
                      Wei Zhang   Hybrid multi-core architecture for
                                  boosting single-threaded performance . . 141--148
                   Mark Thorson   Internet nuggets . . . . . . . . . . . . 149--154

ACM SIGARCH Computer Architecture News
Volume 35, Number 2, May, 2007

              David E. Shaw and   
         Martin M. Deneroff and   
                Ron O. Dror and   
          Jeffrey S. Kuskin and   
          Richard H. Larson and   
             John K. Salmon and   
                Cliff Young and   
             Brannon Batson and   
            Kevin J. Bowers and   
               Jack C. Chao and   
        Michael P. Eastwood and   
           Joseph Gagliardo and   
             J. P. Grossman and   
              C. Richard Ho and   
         Douglas J. Ierardi and   
          István Kolossváry and   
            John L. Klepeis and   
             Timothy Layman and   
         Christine McLeavey and   
             Mark A. Moraes and   
               Rolf Mueller and   
           Edward C. Priest and   
                Yibing Shan and   
            Jochen Spengler and   
           Michael Theobald and   
               Brian Towles and   
                Stanley C. Wang   Anton, a special-purpose machine for
                                  molecular dynamics simulation  . . . . . 1--12
                 Xiaobo Fan and   
        Wolf-Dietrich Weber and   
             Luiz Andre Barroso   Power provisioning for a warehouse-sized
                                  computer . . . . . . . . . . . . . . . . 13--23
             Colin Blundell and   
               Joe Devietti and   
       E. Christopher Lewis and   
              Milo M. K. Martin   Making the fast case common and the
                                  uncommon case simple in unbounded
                                  transactional memory . . . . . . . . . . 24--34
                Weirong Zhu and   
       Vugranam C. Sreedhar and   
                   Ziang Hu and   
                   Guang R. Gao   Synchronization state buffer: supporting
                                  efficient fine-grain synchronization on
                                  many-core architectures  . . . . . . . . 35--45
           Michael R. Marty and   
                   Mark D. Hill   Virtual hierarchies to support server
                                  consolidation  . . . . . . . . . . . . . 46--56
             Kyle J. Nesbit and   
               James Laudon and   
                 James E. Smith   Virtual private caches . . . . . . . . . 57--68
               Chi Cao Minh and   
           Martin Trautmann and   
             JaeWoong Chung and   
            Austen McDonald and   
             Nathan Bronson and   
               Jared Casper and   
         Christos Kozyrakis and   
                 Kunle Olukotun   An effective hybrid transactional memory
                                  system with strong isolation guarantees  69--80
              Jayaram Bobba and   
             Kevin E. Moore and   
                Haris Volos and   
                   Luke Yen and   
               Mark D. Hill and   
           Michael M. Swift and   
                  David A. Wood   Performance pathologies in hardware
                                  transactional memory . . . . . . . . . . 81--91
            Hany E. Ramadan and   
    Christopher J. Rossbach and   
           Donald E. Porter and   
            Owen S. Hofmann and   
            Aditya Bhandari and   
                 Emmett Witchel   MetaTM/TxLinux: transactional memory for
                                  an operating system  . . . . . . . . . . 92--103
         Arrvindh Shriraman and   
           Michael F. Spear and   
            Hemayet Hossain and   
        Virendra J. Marathe and   
          Sandhya Dwarkadas and   
               Michael L. Scott   An integrated hardware-software approach
                                  to flexible transactional memory . . . . 104--115
                 Pablo Abad and   
            Valentin Puente and   
        José Angel Gregorio and   
                   Pablo Prieto   Rotary router: an efficient architecture
                                  for CMP interconnection networks . . . . 116--125
                   John Kim and   
           William J. Dally and   
                    Dennis Abts   Flattened butterfly: a cost-efficient
                                  topology for high-radix networks . . . . 126--137
                Jongman Kim and   
    Chrysostomos Nicopoulos and   
              Dongkook Park and   
             Reetuparna Das and   
                   Yuan Xie and   
    Vijaykrishnan Narayanan and   
            Mazin S. Yousif and   
                   Chita R. Das   A novel dimensionally-decomposed router
                                  for on-chip communication in 3D
                                  architectures  . . . . . . . . . . . . . 138--149
                 Amit Kumar and   
              Li-Shiuan Peh and   
               Partha Kundu and   
                   Niraj K. Jha   Express virtual channels: towards the
                                  ideal interconnection fabric . . . . . . 150--161
              Sanjeev Kumar and   
      Christopher J. Hughes and   
                 Anthony Nguyen   Carbon: architectural support for
                                  fine-grained parallelism on chip
                                  multiprocessors  . . . . . . . . . . . . 162--173
         Naveen Neelakantam and   
                Ravi Rajwar and   
            Suresh Srinivas and   
             Uma Srinivasan and   
                   Craig Zilles   Hardware atomicity for reliable software
                                  speculation  . . . . . . . . . . . . . . 174--185
                 Engin Ipek and   
              Meyrem Kirman and   
               Nevin Kirman and   
               Jose F. Martinez   Core fusion: accommodating software
                                  diversity in chip multiprocessors  . . . 186--197
                   Eric Chi and   
            Stephen A. Lyon and   
             Margaret Martonosi   Tailoring quantum architectures to
                                  implementation style: a quantum computer
                                  for mobile and persistent qubits . . . . 198--209
                Xuejun Yang and   
                 Xiaobo Yan and   
              Zuocheng Xing and   
                    Yu Deng and   
                Jiang Jiang and   
                     Ying Zhang   A 64-bit stream processor architecture
                                  for scientific applications  . . . . . . 210--219
      Christopher J. Hughes and   
           Radek Grzeszczuk and   
          Eftychios Sifakis and   
                Daehyun Kim and   
              Sanjeev Kumar and   
            Andrew P. Selle and   
             Jatin Chhugani and   
           Matthew Holliman and   
                 Yen-Kuang Chen   Physical simulation for animation and
                                  visual effects: parallelization and
                                  characterization for chip
                                  multiprocessors  . . . . . . . . . . . . 220--231
              Thomas Y. Yeh and   
           Petros Faloutsos and   
            Sanjay J. Patel and   
                  Glenn Reinman   ParallAX: an architecture for real-time
                                  physics  . . . . . . . . . . . . . . . . 232--243
        Martha Mercaldi Kim and   
            Mojtaba Mehrara and   
                 Mark Oskin and   
                    Todd Austin   Architectural implications of brick and
                                  mortar silicon manufacturing . . . . . . 244--253
              Ahmed M. Amin and   
        Mithuna Thottethodi and   
           T. N. Vijaykumar and   
             Steven Wereley and   
            Stephen C. Jacobson   Aquacore: a programmable architecture
                                  for microfluidics  . . . . . . . . . . . 254--265
          Thomas F. Wenisch and   
         Anastasia Ailamaki and   
              Babak Falsafi and   
               Andreas Moshovos   Mechanisms for store-wait-free
                                  multiprocessors  . . . . . . . . . . . . 266--277
                  Luis Ceze and   
                 James Tuck and   
           Pablo Montesinos and   
                Josep Torrellas   BulkSC: bulk enforcement of sequential
                                  consistency  . . . . . . . . . . . . . . 278--289
                Bruno Diniz and   
            Dorgival Guedes and   
          Wagner Meira, Jr. and   
              Ricardo Bianchini   Limiting the power consumption of main
                                  memory . . . . . . . . . . . . . . . . . 290--301
Francisco Javier Mesa-Martinez and   
   Joseph Nayfach-Battilana and   
                     Jose Renau   Power model validation through thermal
                                  measurements . . . . . . . . . . . . . . 302--311
                  Jiang Lin and   
            Hongzhong Zheng and   
                Zhichun Zhu and   
               Howard David and   
                     Zhao Zhang   Thermal modeling and management of DRAM
                                  memory systems . . . . . . . . . . . . . 312--322
            Abhishek Tiwari and   
          Smruti R. Sarangi and   
                Josep Torrellas   ReCycle: pipeline adaptation to tolerate
                                  process variation  . . . . . . . . . . . 323--334
           Peter G. Sassone and   
             Jeff Rupley II and   
          Edward Brekelbaum and   
             Gabriel H. Loh and   
                    Bryan Black   Matrix scheduler reloaded  . . . . . . . 335--346
        Simha Sethumadhavan and   
          Franziska Roesner and   
               Joel S. Emer and   
                Doug Burger and   
             Stephen W. Keckler   Late-binding: enabling unordered
                                  load-store queues  . . . . . . . . . . . 347--357
             Jacob Leverich and   
             Hideho Arakida and   
          Alex Solomatnikov and   
         Amin Firoozshahian and   
              Mark Horowitz and   
             Christos Kozyrakis   Comparing memory systems for chip
                                  multiprocessors  . . . . . . . . . . . . 358--368
       Naveen Muralimanohar and   
         Rajeev Balasubramonian   Interconnect design considerations for
                                  large NUCA caches  . . . . . . . . . . . 369--380
       Moinuddin K. Qureshi and   
               Aamer Jaleel and   
               Yale N. Patt and   
            Simon C. Steely and   
                      Joel Emer   Adaptive insertion policies for high
                                  performance caching  . . . . . . . . . . 381--391
                 Paul A. Karger   Performance and security lessons learned
                                  from virtualizing the Alpha processor    392--401
         Tejas S. Karkhanis and   
                 James E. Smith   Automated design of application specific
                                  superscalar processors: an analytical
                                  approach . . . . . . . . . . . . . . . . 402--411
         Aashish Phansalkar and   
                 Ajay Joshi and   
                   Lizy K. John   Analysis of redundancy and application
                                  balance in the SPEC CPU2006 benchmark
                                  suite  . . . . . . . . . . . . . . . . . 412--423
                Hyesoon Kim and   
               José A. Joao and   
                 Onur Mutlu and   
              Chang Joo Lee and   
               Yale N. Patt and   
                    Robert Cohn   VPC prediction: reducing the cost of
                                  indirect branches via hardware-based
                                  dynamic devirtualization . . . . . . . . 424--435
           Andrew D. Hilton and   
                      Amir Roth   Ginger: control independence using tag
                                  rewriting  . . . . . . . . . . . . . . . 436--447
         Ahmed S. Al-Zawawi and   
             Vimal K. Reddy and   
             Eric Rotenberg and   
              Haitham H. Akkary   Transparent control independence (TCI)   448--459
           Nicholas J. Wang and   
              Aqeel Mahesri and   
                Sanjay J. Patel   Examining ACE analysis reliability
                                  estimates using fault-injection  . . . . 460--469
             Nidhi Aggarwal and   
  Parthasarathy Ranganathan and   
           Norman P. Jouppi and   
                 James E. Smith   Configurable isolation: building high
                                  availability systems with commodity
                                  multi-core processors  . . . . . . . . . 470--481
             Michael Dalton and   
                Hari Kannan and   
             Christos Kozyrakis   Raksha: a flexible information flow
                                  architecture for software security . . . 482--493
             Zhenghong Wang and   
                    Ruby B. Lee   New cache designs for thwarting software
                                  cache-based side channel attacks . . . . 494--505
Niranjan Kumar Soundararajan and   
         Angshuman Parashar and   
          Anand Sivasubramaniam   Mechanisms for bounding vulnerabilities
                                  of processor structures  . . . . . . . . 506--515
         Kristen R. Walcott and   
             Greg Humphreys and   
            Sudhanva Gurumurthi   Dynamic prediction of architectural
                                  vulnerability from microarchitectural
                                  state  . . . . . . . . . . . . . . . . . 516--527

ACM SIGARCH Computer Architecture News
Volume 35, Number 3, June, 2007

            Aneesh Aggarwal and   
                Pradip Bose and   
                 Mohamed Zahran   Introduction to the special issue on the
                                  2006 Reconfigurable and Adaptive
                                  Architecture Workshop  . . . . . . . . . 1--1
            Nikolaos Bellas and   
                Sek M. Chai and   
              Malcolm Dwyer and   
                  Dan Linzmeier   Mapping streaming architectures on
                                  reconfigurable platforms . . . . . . . . 2--8
           Martin Labrecque and   
         Peter Yiannacouras and   
             J. Gregory Steffan   Custom code generation for soft
                                  processors . . . . . . . . . . . . . . . 9--19
                   Tameesh Suri   Improving instruction level parallelism
                                  through reconfigurable units in
                                  superscalar processors . . . . . . . . . 20--27
      Hashem H. Najaf-abadi and   
                 Eric Rotenberg   Architectural contesting: exposing and
                                  exploiting temperamental behavior  . . . 28--35
              Kuo-Kun Tseng and   
               Ying-Dar Lin and   
             Tsern-Huei Lee and   
                 Yuan-Cheng Lai   Deterministic high-speed root-hashing
                                  automaton matching coprocessor for
                                  embedded network processor . . . . . . . 36--43
                  Fadi N. Sibai   Performance analysis and workload
                                  characterization of the 3DMark05
                                  benchmark on modern parallel computer
                                  platforms  . . . . . . . . . . . . . . . 44--52
                   Mark Thorson   Internet nuggets . . . . . . . . . . . . 53--55

ACM SIGARCH Computer Architecture News
Volume 35, Number 4, September, 2007

               S. Bartolini and   
                  P. Foglia and   
                    C. A. Prete   MEmory performance: DEaling with
                                  applications, systems and architecture   4--5
          K. Patrick Lorton and   
                  David S. Wise   Analyzing block locality in Morton-order
                                  and Morton-hybrid matrices . . . . . . . 6--12
          Kaveh Jokar Deris and   
              Amirali Baniasadi   Investigating cache energy and latency
                                  break-even points in high performance
                                  processors . . . . . . . . . . . . . . . 13--20
                    Jun Yan and   
                      Wei Zhang   Evaluating instruction cache
                                  vulnerability to transient errors  . . . 21--28
            Tanausú Ramírez and   
               Alex Pajuelo and   
        Oliverio J. Santana and   
                   Mateo Valero   Energy saving through a simple load
                                  control mechanism  . . . . . . . . . . . 29--36
              Luis M. Ramos and   
             José Luis Briz and   
            Pablo E. Ibáñez and   
                  Victor Viñals   Data prefetching in a cache hierarchy
                                  with high bandwidth and capacity . . . . 37--44
             Haakon Dybdahl and   
              Per Stenström and   
                   Lasse Natvig   An LRU-based replacement algorithm
                                  augmented with frequency of access in
                                  shared chip-multiprocessor caches  . . . 45--52
                 A. Bardine and   
                  P. Foglia and   
               G. Gabrielli and   
                C. A. Prete and   
                   P. Stenström   Improving power efficiency of D-NUCA
                                  caches . . . . . . . . . . . . . . . . . 53--58
                   Mark Thorson   Internet nuggets . . . . . . . . . . . . 59--62

ACM SIGARCH Computer Architecture News
Volume 35, Number 5, December, 2007

                 Kenji Kise and   
             Toshinori Sato and   
                Hironori Nakajo   Special issue: ALPS'07 -- Advanced Low
                                  Power Systems: Introduction  . . . . . . 1--2
                    Jun Yao and   
               Shinobu Miwa and   
             Hajime Shimada and   
                  Shinji Tomita   Optimal pipeline depth with pipeline
                                  stage unification adoption . . . . . . . 3--9
    Preetham Lakshmikanthan and   
                   Adrian Nuñez   VCLEARIT: a VLSI CMOS circuit leakage
                                  reduction technique for nanoscale
                                  technologies . . . . . . . . . . . . . . 10--16
            Kiyofumi Tanaka and   
              Takahiro Kawahara   Leakage energy reduction in cache memory
                                  by data compression  . . . . . . . . . . 17--24
             Hidetsugu Irie and   
               Ken Sugimoto and   
           Masahiro Goshima and   
                  Shuichi Sakai   Preventing timing errors on register
                                  writes: mechanisms of detections and
                                  recoveries . . . . . . . . . . . . . . . 25--31
             Mihaela Malița and   
            Gheorghe Ștefan and   
             Dominique Thiébaut   Not multi-, but many-core: designing
                                  integral parallel architectures for
                                  embedded computation . . . . . . . . . . 32--38
           Takefumi Miyoshi and   
                Nobuhiko Sugino   Fine-grain compensation method with
                                  consideration of trade-offs between
                                  computation and data transfer for power
                                  consumption  . . . . . . . . . . . . . . 39--44
        Bogdan F. Romanescu and   
           Michael E. Bauer and   
                  Sule Ozev and   
                Daniel J. Sorin   VariaSim: simulating circuits and
                                  systems in the presence of process
                                  variability  . . . . . . . . . . . . . . 45--48
           N. Venkateswaran and   
          Deepak Srinivasan and   
        Madhavan Manivannan and   
    T. P. Ramnath Sai Sagar and   
 Shyamsundar Gopalakrishnan and   
   VinothKrishnan Elangovan and   
       Karthik Chandrasekar and   
          Prem Kumar Ramesh and   
       Viswanath Venkatesan and   
          Arvindakshan Babu and   
                     Sudharshan   Future generation supercomputers I: a
                                  paradigm for node architecture . . . . . 49--60
           N. Venkateswaran and   
          Deepak Srinivasan and   
        Madhavan Manivannan and   
    T. P. Ramnath Sai Sagar and   
 Shyamsundar Gopalakrishnan and   
   VinothKrishnan Elangovan and   
                  Arvind M. and   
          Prem Kumar Ramesh and   
            Karthik Ganesan and   
    Viswanath Krishnamurthy and   
               Sivaramakrishnan   Future generation supercomputers II: a
                                  paradigm for cluster architecture  . . . 61--70
                   Mark Thorson   Internet nuggets . . . . . . . . . . . . 71--73

ACM SIGARCH Computer Architecture News
Volume 36, Number 1, March, 2008

                   Erik Winfree   Toward molecular programming with DNA    1--1
               Xiaoxin Chen and   
              Tal Garfinkel and   
       E. Christopher Lewis and   
        Pratap Subrahmanyam and   
        Carl A. Waldspurger and   
                  Dan Boneh and   
            Jeffrey Dwoskin and   
                Dan R. K. Ports   Overshadow: a virtualization-based
                                  approach to retrofitting protection in
                                  commodity operating systems  . . . . . . 2--13
         Jonathan M. McCune and   
                Bryan Parno and   
              Adrian Perrig and   
          Michael K. Reiter and   
                Arvind Seshadri   How low can you go?: recommendations for
                                  hardware-supported minimal TCB code
                                  execution  . . . . . . . . . . . . . . . 14--25
              Ravi Bhargava and   
          Benjamin Serebrin and   
          Francesco Spadini and   
                 Srilatha Manne   Accelerating two-dimensional page walks
                                  for virtualized systems  . . . . . . . . 26--35
            Benjamin C. Lee and   
                   David Brooks   Efficiency trends and limits from
                                  comprehensive microarchitectural
                                  adaptivity . . . . . . . . . . . . . . . 36--47
          Ramya Raghavendra and   
  Parthasarathy Ranganathan and   
              Vanish Talwar and   
                Zhikui Wang and   
                    Xiaoyun Zhu   No 'power' struggles: coordinated
                                  multi-level power management for the
                                  data center  . . . . . . . . . . . . . . 48--59
Chinnakrishnan S. Ballapuram and   
               Ahmad Sharif and   
              Hsien-Hsin S. Lee   Exploiting access semantics and program
                                  behavior to reduce snoop power in chip
                                  multiprocessors  . . . . . . . . . . . . 60--69
             Arindam Mallik and   
              Jack Cosgrove and   
             Robert P. Dick and   
               Gokhan Memik and   
                    Peter Dinda   PICSEL: measuring user-perceived
                                  performance to control dynamic frequency
                                  scaling  . . . . . . . . . . . . . . . . 70--79
               Jose A. Joao and   
                 Onur Mutlu and   
                Hyesoon Kim and   
              Rishi Agarwal and   
                   Yale N. Patt   Improving the performance of
                                  object-oriented languages with dynamic
                                  predication of indirect jumps  . . . . . 80--90
              Michal Wegiel and   
                 Chandra Krintz   The mapping collector: virtual memory
                                  support for generational, parallel, and
                                  concurrent compaction  . . . . . . . . . 91--102
               Joe Devietti and   
             Colin Blundell and   
          Milo M. K. Martin and   
                Steve Zdancewic   Hardbound: architectural support for
                                  spatial safety of the C programming
                                  language . . . . . . . . . . . . . . . . 103--114
            Vitaliy B. Lvin and   
                Gene Novark and   
            Emery D. Berger and   
               Benjamin G. Zorn   Archipelago: trading address space for
                                  reliability and security . . . . . . . . 115--124
               Bumyong Choi and   
                 Leo Porter and   
                Dean M. Tullsen   Accurate branch prediction for short
                                  threads  . . . . . . . . . . . . . . . . 125--134
        Shekhar Srikantaiah and   
            Mahmut Kandemir and   
                Mary Jane Irwin   Adaptive set pinning: managing shared
                                  caches in chip multiprocessors . . . . . 135--144
                 James Tuck and   
                 Wonsun Ahn and   
                  Luis Ceze and   
                Josep Torrellas   SoftSig: software-exposed hardware
                                  signatures for code analysis and
                                  optimization . . . . . . . . . . . . . . 145--156
               Ioana Burcea and   
            Stephen Somogyi and   
           Andreas Moshovos and   
                  Babak Falsafi   Predictor virtualization . . . . . . . . 157--167
            Vinod Ganapathy and   
      Matthew J. Renzelmann and   
         Arini Balakrishnan and   
           Michael M. Swift and   
                     Somesh Jha   The design and implementation of
                                  microdrivers . . . . . . . . . . . . . . 168--178
            Yaron Weinsberg and   
                Danny Dolev and   
                  Tal Anker and   
            Muli Ben-Yehuda and   
                   Pete Wyckoff   Tapping into the fountain of CPUs: on
                                  operating system support for
                                  programmable devices . . . . . . . . . . 179--188
                   Kai Shen and   
                 Ming Zhong and   
          Sandhya Dwarkadas and   
               Chuanpeng Li and   
        Christopher Stewart and   
                     Xiao Zhang   Hardware counter driven on-the-fly
                                  request signatures . . . . . . . . . . . 189--200
           Luk Van Ertvelde and   
                Lieven Eeckhout   Dispersing proprietary applications as
                                  benchmarks through code mutation . . . . 201--210
          Shashidhar Mysore and   
               Bita Mazloom and   
              Banit Agrawal and   
               Timothy Sherwood   Understanding and visualizing full
                                  systems with data flow tomography  . . . 211--221
           Guilherme Ottoni and   
                David I. August   Communication optimizations for global
                                  multi-threaded instruction scheduling    222--232
            Milind Kulkarni and   
             Keshav Pingali and   
       Ganesh Ramanarayanan and   
               Bruce Walter and   
                Kavita Bala and   
                   L. Paul Chew   Optimistic parallelism benefits from
                                  data partitioning  . . . . . . . . . . . 233--243
                   Russ Cox and   
                 Tom Bergan and   
         Austin T. Clements and   
             Frans Kaashoek and   
                   Eddie Kohler   Xoc, an extension-oriented compiler for
                                  systems programming  . . . . . . . . . . 244--254
            Philip M. Wells and   
        Koushik Chakraborty and   
               Gurindar S. Sohi   Adapting to intermittent faults in
                                  multicore systems  . . . . . . . . . . . 255--264
                 Man-Lap Li and   
       Pradeep Ramachandran and   
         Swarup Kumar Sahoo and   
             Sarita V. Adve and   
             Vikram S. Adve and   
                  Yuanyuan Zhou   Understanding the propagation of hard
                                  errors to software and implications for
                                  resilient system design  . . . . . . . . 265--276
           M. Aater Suleman and   
       Moinuddin K. Qureshi and   
                   Yale N. Patt   Feedback-driven threading:
                                  power-efficient and high-performance
                                  execution of multi-threaded workloads on
                                  CMPs . . . . . . . . . . . . . . . . . . 277--286
       Michael D. Linderman and   
         Jamison D. Collins and   
                  Hong Wang and   
                 Teresa H. Meng   Merge: a programming model for
                                  heterogeneous multi-core systems . . . . 287--296
          Jayanth Gummaraju and   
                Joel Coburn and   
              Yoshio Turner and   
               Mendel Rosenblum   Streamware: programming general-purpose
                                  multicore processors using streams . . . 297--307
      Edmund B. Nightingale and   
                Daniel Peek and   
              Peter M. Chen and   
                    Jason Flinn   Parallelizing security checks on
                                  commodity hardware . . . . . . . . . . . 308--318
              Miguel Castro and   
               Manuel Costa and   
           Jean-Philippe Martin   Better bug reporting with better privacy 319--328
                    Shan Lu and   
                Soyeon Park and   
                 Eunsoo Seo and   
                  Yuanyuan Zhou   Learning from mistakes: a comprehensive
                                  study on real world concurrency bug
                                  characteristics  . . . . . . . . . . . . 329--339

ACM SIGARCH Computer Architecture News
Volume 36, Number 3, June, 2008

                      Anonymous   Message from the General Chairs  . . . . x--x
                      Anonymous   Message from the Program Chair . . . . . xi--xi
                      Anonymous   Reviewers  . . . . . . . . . . . . . . . xv--xviii
              Francis Tseng and   
                   Yale N. Patt   Achieving Out-of-Order Performance with
                                  Almost In-Order Complexity . . . . . . . 3--12
             Mayank Agarwal and   
               Nitin Navale and   
              Kshitiz Malik and   
               Matthew I. Frank   Fetch-Criticality Reduction through
                                  Control Independence . . . . . . . . . . 13--24
             Miquel Pericàs and   
             Adrian Cristal and   
       Francisco J. Cazorla and   
             Ruben González and   
            Alex Veidenbaum and   
          Daniel A. Jiménez and   
                   Mateo Valero   A Two-Level Load/Store Queue Based on
                                  Execution Locality . . . . . . . . . . . 25--36
                 Engin Ipek and   
                 Onur Mutlu and   
           José F. Martínez and   
                   Rich Caruana   Self-Optimizing Memory Controllers: a
                                  Reinforcement Learning Approach  . . . . 39--50
       Shyamkumar Thoziyoor and   
                Jung Ho Ahn and   
           Matteo Monchiero and   
            Jay B. Brockman and   
               Norman P. Jouppi   A Comprehensive Memory Modeling Tool and
                                  Its Application to the Design and
                                  Analysis of Future Memory Hierarchies    51--62
                 Onur Mutlu and   
              Thomas Moscibroda   Parallelism-Aware Batch Scheduling:
                                  Enhancing both Performance and Fairness
                                  of Shared DRAM Systems . . . . . . . . . 63--74
                   John Kim and   
           William J. Dally and   
                Steve Scott and   
                    Dennis Abts   Technology-Driven, Highly-Scalable
                                  Dragonfly Topology . . . . . . . . . . . 77--88
                 Jae W. Lee and   
               Man Cheuk Ng and   
                 Krste Asanovic   Globally-Synchronized Frames for
                                  Guaranteed Quality-of-Service in On-Chip
                                  Networks . . . . . . . . . . . . . . . . 89--100
        Martha Mercaldi Kim and   
              John D. Davis and   
                 Mark Oskin and   
                    Todd Austin   Polymorphic On-Chip Networks . . . . . . 101--112
                  Lee Baugh and   
         Naveen Neelakantam and   
                   Craig Zilles   Using Hardware Memory Protection to
                                  Build a High-Performance,
                                  Strongly-Atomic Hybrid Transactional
                                  Memory . . . . . . . . . . . . . . . . . 115--126
              Jayaram Bobba and   
               Neelam Goyal and   
               Mark D. Hill and   
           Michael M. Swift and   
                  David A. Wood   TokenTM: Efficient Execution of Large
                                  Transactions with Hardware Transactional
                                  Memory . . . . . . . . . . . . . . . . . 127--138
         Arrvindh Shriraman and   
          Sandhya Dwarkadas and   
               Michael L. Scott   Flexible Decoupled Transactional Memory
                                  Support  . . . . . . . . . . . . . . . . 139--150
             Dana Vantrease and   
           Robert Schreiber and   
           Matteo Monchiero and   
              Moray McLaren and   
           Norman P. Jouppi and   
           Marco Fiorentino and   
                   Al Davis and   
             Nathan Binkert and   
      Raymond G. Beausoleil and   
                    Jung Ho Ahn   Corona: System Implications of Emerging
                                  Nanophotonic Technology  . . . . . . . . 153--164
      Lucas Kreger-Stickles and   
                     Mark Oskin   Microcoded Architectures for Ion-Trap
                                  Quantum Computers  . . . . . . . . . . . 165--176
          Nemanja Isailovic and   
               Mark Whitney and   
               Yatish Patel and   
               John Kubiatowicz   Running a Quantum Circuit at the Speed
                                  of Data  . . . . . . . . . . . . . . . . 177--188
              Xiaoyao Liang and   
                Gu-Yeon Wei and   
                   David Brooks   ReVIVaL: a Variation-Tolerant
                                  Architecture Using Voltage Interpolation
                                  and Variable Latency . . . . . . . . . . 191--202
            Chris Wilkerson and   
              Hongliang Gao and   
         Alaa R. Alameldeen and   
             Zeshan Chishti and   
           Muhammad Khellah and   
                   Shih-Lien Lu   Trading off Cache Capacity for
                                  Reliability to Enable Low Voltage
                                  Operation  . . . . . . . . . . . . . . . 203--214
          Franziska Roesner and   
                Doug Burger and   
             Stephen W. Keckler   Counting Dependence Predictors . . . . . 215--226
     Natalie Enright Jerger and   
              Li-Shiuan Peh and   
                  Mikko Lipasti   Virtual Circuit Tree Multicasting: a
                                  Case for On-Chip Hardware Multicast
                                  Support  . . . . . . . . . . . . . . . . 229--240
       Avinash Karanth Kodi and   
            Ashwini Sarathy and   
                    Ahmed Louri   iDEAL: Inter-router Dual-Function Energy
                                  and Area-Efficient Links for
                                  Network-on-Chip (NoC) Architectures  . . 241--250
              Dongkook Park and   
          Soumya Eachempati and   
             Reetuparna Das and   
             Asit K. Mishra and   
                   Yuan Xie and   
           N. Vijaykrishnan and   
                   Chita R. Das   MIRA: a Multi-layered On-Chip
                                  Interconnect Router Architecture . . . . 251--261
             Derek R. Hower and   
                   Mark D. Hill   Rerun: Exploiting Episodes for
                                  Lightweight Memory Race Recording  . . . 265--276
              Brandon Lucia and   
            Joseph Devietti and   
              Karin Strauss and   
                      Luis Ceze   Atom-Aid: Detecting and Surviving
                                  Atomicity Violations . . . . . . . . . . 277--288
           Pablo Montesinos and   
                  Luis Ceze and   
                Josep Torrellas   DeLorean: Recording and
                                  Deterministically Replaying
                                  Shared-Memory Multiprocessor Execution
                                  Efficiently  . . . . . . . . . . . . . . 289--300
              Sriram Sankar and   
        Sudhanva Gurumurthi and   
                 Mircea R. Stan   Intra-disk Parallelism: An Idea Whose
                                  Time Has Come  . . . . . . . . . . . . . 303--314
                  Kevin Lim and   
  Parthasarathy Ranganathan and   
              Jichuan Chang and   
          Chandrakant Patel and   
               Trevor Mudge and   
               Steven Reinhardt   Understanding and Designing New Server
                                  Architectures for Emerging
                                  Warehouse-Computing Environments . . . . 315--326
                 Taeho Kgil and   
              David Roberts and   
                   Trevor Mudge   Improving NAND Flash Based Disk Caches   327--338
                Xiaodong Li and   
             Sarita V. Adve and   
                Pradip Bose and   
                 Jude A. Rivers   Online Estimation of Architectural
                                  Vulnerability Factor for Soft Errors . . 341--352
              Jeonghee Shin and   
              Victor Zyuban and   
                Pradip Bose and   
            Timothy M. Pinkston   A Proactive Wearout Recovery Approach
                                  for Exploiting Microarchitectural
                                  Redundancy to Extend Cache SRAM Lifetime 353--362
            Radu Teodorescu and   
                Josep Torrellas   Variation-Aware Application Scheduling
                                  and Power Management for Chip
                                  Multiprocessors  . . . . . . . . . . . . 363--374
                Shimin Chen and   
             Michael Kozuch and   
         Theodoros Strigkos and   
              Babak Falsafi and   
         Phillip B. Gibbons and   
              Todd C. Mowry and   
        Vijaya Ramachandran and   
            Olatunji Ruwase and   
               Michael Ryan and   
              Evangelos Vlachos   Flexible Hardware Acceleration for
                                  Instruction-Grain Program Monitoring . . 377--388
               Nathan Clark and   
               Amir Hormati and   
                   Scott Mahlke   VEAL: Virtualized Execution Accelerator
                                  for Loops  . . . . . . . . . . . . . . . 389--400
                 Haibo Chen and   
                      Xi Wu and   
                 Liwei Yuan and   
                 Binyu Zang and   
              Pen-chung Yew and   
              Frederic T. Chong   From Speculation to Security: Practical
                                  and Efficient Information Flow Tracking
                                  Using Speculative Hardware . . . . . . . 401--412
              Carlos Boneti and   
       Francisco J. Cazorla and   
            Roberto Gioiosa and   
       Alper Buyuktosunoglu and   
             Chen-Yong Cher and   
                   Mateo Valero   Software-Controlled Priority
                                  Characterization of POWER5 Processor . . 415--426
                  Alex Shye and   
        Berkin Ozisikyilmaz and   
             Arindam Mallik and   
               Gokhan Memik and   
             Peter A. Dinda and   
             Robert P. Dick and   
              Alok N. Choudhary   Learning and Leveraging the Relationship
                                  between Architecture-Level Measurements
                                  and Individual User Satisfaction . . . . 427--438
              Sanjeev Kumar and   
                Daehyun Kim and   
        Mikhail Smelyanskiy and   
             Yen-Kuang Chen and   
             Jatin Chhugani and   
      Christopher J. Hughes and   
               Changkyu Kim and   
              Victor W. Lee and   
              Anthony D. Nguyen   Atomic Vector Operations on Chip
                                  Multiprocessors  . . . . . . . . . . . . 441--452
                 Gabriel H. Loh   3D-Stacked Memory Architectures for
                                  Multi-core Processors  . . . . . . . . . 453--464
                      Anonymous   Author Index . . . . . . . . . . . . . . 465--466
                      Anonymous   Publisher's Information  . . . . . . . . 468--468
                      Anonymous   Cover Art  . . . . . . . . . . . . . . . C1--C1

ACM SIGARCH Computer Architecture News
Volume 36, Number 4, September, 2008

            Ramesh K. Karne and   
     Alexander L. Wijesinha and   
            George H. Ford, Jr.   Opinion: stay on course with an
                                  evolution or choose a revolution in
                                  computing  . . . . . . . . . . . . . . . 1--6
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 7--11

ACM SIGARCH Computer Architecture News
Volume 36, Number 5, December, 2008

           Jerker Bengtsson and   
                Bertil Svensson   A domain-specific approach for software
                                  development on Manycore platforms  . . . 2--10
            Daniel Cederman and   
               Philippas Tsigas   On sorting and load balancing on GPUs    11--18
             Phuong Hoai Ha and   
           Philippas Tsigas and   
                 Otto J. Anshus   Non-blocking programming on multi-core
                                  graphics processors: (extended abstract) 19--28
    Shuvra S. Bhattacharyya and   
             Gordon Brebner and   
            Jörn W. Janneck and   
                 Johan Eker and   
            Carl von Platen and   
           Marco Mattavelli and   
                 Mickaël Raulet   OpenDF: a dataflow toolset for
                                  reconfigurable hardware and multicore
                                  systems  . . . . . . . . . . . . . . . . 29--35
       Christoph W. Kessler and   
                    Jörg Keller   Optimized on-chip pipelining of
                                  memory-intensive computations on the
                                  Cell BE  . . . . . . . . . . . . . . . . 36--45
             Håkan Lundvall and   
          Kristian Stavåker and   
             Peter Fritzson and   
              Christoph Kessler   Automatic parallelization of simulation
                                  code for equation-based models with
                                  software pipelining and measurements on
                                  three platforms  . . . . . . . . . . . . 46--55
                  Huan Fang and   
                  Mats Brorsson   Scalable directory architecture for
                                  distributed shared memory chip
                                  multiprocessors  . . . . . . . . . . . . 56--64
                  Bengt Jonsson   State-space exploration for concurrent
                                  algorithms under weak memory orderings:
                                  (preliminary version)  . . . . . . . . . 65--71
        Parosh Aziz Abdulla and   
            Frédéric Haziza and   
                   Mats Kindahl   Model checking race-freeness . . . . . . 72--79
              Hakan Sundell and   
               Philippas Tsigas   NOBLE: non-blocking programming support
                                  via lock-free shared abstract data types 80--87
           Anders Gidenstam and   
        Marina Papatriantafilou   LFTHREADS: a lock-free thread library    88--92
               Karl-Filip Faxén   Wool --- a work stealing library . . . . 93--100
                   Mark Thorson   Internet nuggets . . . . . . . . . . . . 101--111

ACM SIGARCH Computer Architecture News
Volume 37, Number 1, March, 2009

               Mark Gebhart and   
          Bertrand A. Maher and   
         Katherine E. Coons and   
               Jeff Diamond and   
                 Paul Gratz and   
               Mario Marino and   
          Nitya Ranganathan and   
           Behnam Robatmili and   
                Aaron Smith and   
              James Burrill and   
         Stephen W. Keckler and   
                Doug Burger and   
            Kathryn S. McKinley   An evaluation of the TRIPS computer
                                  system . . . . . . . . . . . . . . . . . 1--12
          Constantin Pistol and   
     Wutichai Chongchitmate and   
          Christopher Dwyer and   
                Alvin R. Lebeck   Architectural implications of nanoscale
                                  integrated sensing and computing . . . . 13--24
                Soyeon Park and   
                    Shan Lu and   
                  Yuanyuan Zhou   CTrigger: exposing atomicity violation
                                  bugs from their hiding places  . . . . . 25--36
         Stelios Sidiroglou and   
                Oren Laadan and   
               Carlos Perez and   
            Nicolas Viennot and   
                 Jason Nieh and   
           Angelos D. Keromytis   ASSURE: automatic software self-healing
                                  using rescue points  . . . . . . . . . . 37--48
            Andrew Lenharth and   
             Vikram S. Adve and   
                 Samuel T. King   Recovery domains: an organizing
                                  principle for recoverable operating
                                  systems  . . . . . . . . . . . . . . . . 49--60
            Martin Dimitrov and   
                   Huiyang Zhou   Anomaly-based bug prediction, isolation,
                                  and validation: an automated approach
                                  for software debugging . . . . . . . . . 61--72
           Pablo Montesinos and   
              Matthew Hicks and   
             Samuel T. King and   
                Josep Torrellas   Capo: a software-hardware interface for
                                  practical deterministic multiprocessor
                                  replay . . . . . . . . . . . . . . . . . 73--84
            Joseph Devietti and   
              Brandon Lucia and   
                  Luis Ceze and   
                     Mark Oskin   DMP: deterministic shared memory
                                  multiprocessing  . . . . . . . . . . . . 85--96
            Marek Olszewski and   
                Jason Ansel and   
              Saman Amarasinghe   Kendo: efficient deterministic
                                  multithreading in software . . . . . . . 97--108
               Mohit Tiwari and   
        Hassan M. G. Wassel and   
               Bita Mazloom and   
          Shashidhar Mysore and   
          Frederic T. Chong and   
               Timothy Sherwood   Complete information flow tracking from
                                  the gates up . . . . . . . . . . . . . . 109--120
               David K. Tam and   
                 Reza Azimi and   
            Livio B. Soares and   
                  Michael Stumm   RapidMRC: approximating L2 miss rate
                                  curves on commodity systems for online
                                  optimizations  . . . . . . . . . . . . . 121--132
              Stijn Eyerman and   
                Lieven Eeckhout   Per-thread cycle accounting in SMT
                                  processors . . . . . . . . . . . . . . . 133--144
            Owen S. Hofmann and   
    Christopher J. Rossbach and   
                 Emmett Witchel   Maximum benefit from a minimal HTM . . . 145--156
                  Dave Dice and   
                  Yossi Lev and   
                  Mark Moir and   
                Daniel Nussbaum   Early experience with a commercial
                                  hardware transactional memory
                                  implementation . . . . . . . . . . . . . 157--168
            Philip M. Wells and   
        Koushik Chakraborty and   
               Gurindar S. Sohi   Mixed-mode multicore reliability . . . . 169--180
            Sriram Rajamani and   
              G. Ramalingam and   
 Venkatesh Prasad Ranganath and   
                  Kapil Vaswani   ISOLATOR: dynamically ensuring isolation
                                  in concurrent programs . . . . . . . . . 181--192
               Joseph Tucek and   
               Weiwei Xiong and   
                  Yuanyuan Zhou   Efficient online validation with delta
                                  execution  . . . . . . . . . . . . . . . 193--204
              David Meisner and   
              Brian T. Gold and   
              Thomas F. Wenisch   PowerNap: eliminating server idle power  205--216
        Adrian M. Caulfield and   
             Laura M. Grupp and   
                 Steven Swanson   Gordon: using flash memory to build
                                  fast, power-efficient clusters for
                                  data-intensive applications  . . . . . . 217--228
               Aayush Gupta and   
               Youngjae Kim and   
               Bhuvan Urgaonkar   DFTL: a flash translation layer
                                  employing demand-based selective caching
                                  of page-level address mappings . . . . . 229--240
              Farhana Aleen and   
                   Nathan Clark   Commutativity analysis for software
                                  parallelization: letting program
                                  transformations see the big picture  . . 241--252
           M. Aater Suleman and   
                 Onur Mutlu and   
       Moinuddin K. Qureshi and   
                   Yale N. Patt   Accelerating critical section execution
                                  with asymmetric multi-core architectures 253--264
             Todd Mytkowicz and   
                 Amer Diwan and   
         Matthias Hauswirth and   
               Peter F. Sweeney   Producing wrong data without doing
                                  anything obviously wrong!  . . . . . . . 265--276
            Michael D. Bond and   
            Kathryn S. McKinley   Leak pruning . . . . . . . . . . . . . . 277--288
              Michal Wegiel and   
                 Chandra Krintz   Dynamic prediction of collection yield
                                  for managed runtimes . . . . . . . . . . 289--300
              Aravind Menon and   
             Simon Schubert and   
               Willy Zwaenepoel   TwinDrivers: semi-automatic derivation
                                  of fast and safe hypervisor network
                                  drivers from guest OS drivers  . . . . . 301--312
               Ioana Burcea and   
               Andreas Moshovos   Phantom-BTB: a virtualized branch target
                                  buffer design  . . . . . . . . . . . . . 313--324
             Karthik Ramani and   
      Christiaan P. Gribble and   
                       Al Davis   StreamRay: a stream filtering
                                  architecture for coherent ray tracing    325--336
          Robert D. Cameron and   
                        Dan Lin   Architectural support for SWAR text
                                  processing with parallel bit streams:
                                  the inductive doubling principle . . . . 337--348

ACM SIGARCH Computer Architecture News
Volume 37, Number 2, May, 2009

           Norman P. Jouppi and   
               Rakesh Kumar and   
                   Dean Tullsen   Introduction to the special issue on the
                                  2008 Workshop on Design, Analysis, and
                                  Simulation of Chip Multiprocessors
                                  (dasCMP'08)  . . . . . . . . . . . . . . 1--1
                   Hui Zeng and   
                Matt Yourst and   
                Kanad Ghose and   
               Dmitry Ponomarev   MPTLsim: a cycle-accurate, full-system
                                  simulator for x86-64 multicore
                                  architectures with coherent caches . . . 2--9
           Matteo Monchiero and   
                Jung Ho Ahn and   
               Ayose Falcón and   
              Daniel Ortega and   
               Paolo Faraboschi   How to simulate 1000 cores . . . . . . . 10--19
               Jianwei Chen and   
           Murali Annavaram and   
                  Michel Dubois   SlackSim: a platform for parallel
                                  simulations of CMPs on CMPs  . . . . . . 20--29
        Madhura Purnaprajna and   
             Mario Porrmann and   
                Ulrich Rueckert   Run-time reconfigurability in embedded
                                  multiprocessors  . . . . . . . . . . . . 30--37
             Chris Jesshope and   
               Mike Lankamp and   
                       Li Zhang   The implementation of an SVP many-core
                                  processor and the evaluation of its
                                  memory architecture  . . . . . . . . . . 38--45
                Karan Singh and   
            Major Bhadauria and   
                 Sally A. McKee   Real time power estimation and thread
                                  scheduling via performance counters  . . 46--55
                 Omid Azizi and   
              Aqeel Mahesri and   
            Sanjay J. Patel and   
                  Mark Horowitz   Area-efficiency in CMP core design:
                                  co-optimization of microarchitecture and
                                  physical design  . . . . . . . . . . . . 56--65
                   Mark Thorson   Internet nuggets . . . . . . . . . . . . 66--69

ACM SIGARCH Computer Architecture News
Volume 37, Number 3, June, 2009

               Katherine Yelick   Ten ways to waste a parallel computer    1--1
            Benjamin C. Lee and   
                 Engin Ipek and   
                 Onur Mutlu and   
                    Doug Burger   Architecting phase change memory as a
                                  scalable DRAM alternative  . . . . . . . 2--13
                  Ping Zhou and   
                    Bo Zhao and   
                   Jun Yang and   
                   Youtao Zhang   A durable and energy efficient main
                                  memory using phase change memory
                                  technology . . . . . . . . . . . . . . . 14--23
       Moinuddin K. Qureshi and   
   Vijayalakshmi Srinivasan and   
                 Jude A. Rivers   Scalable high performance main memory
                                  system using phase-change memory
                                  technology . . . . . . . . . . . . . . . 24--33
                 Xiaoxia Wu and   
                    Jian Li and   
                Lixin Zhang and   
               Evan Speight and   
               Ram Rajamony and   
                       Yuan Xie   Hybrid cache architecture with disparate
                                  memory technologies  . . . . . . . . . . 34--45
                  Jinho Suh and   
                  Michel Dubois   Dynamic MIPS rate stabilization in
                                  out-of-order processors  . . . . . . . . 46--56
             Marco Paolieri and   
           Eduardo Quiñones and   
       Francisco J. Cazorla and   
             Guillem Bernat and   
                   Mateo Valero   Hardware support for WCET analysis of
                                  hard real-time multicore systems . . . . 57--68
            Stephen Somogyi and   
          Thomas F. Wenisch and   
         Anastasia Ailamaki and   
                  Babak Falsafi   Spatio-temporal memory streaming . . . . 69--80
                 Pedro Diaz and   
                 Marcelo Cintra   Stream chaining: exploiting multiple
                                  levels of correlation in data
                                  prefetching  . . . . . . . . . . . . . . 81--92
          Michael D. Powell and   
              Arijit Biswas and   
             Shantanu Gupta and   
         Shubhendu S. Mukherjee   Architectural core salvaging in a
                                  multi-core processor for hard-error
                                  tolerance  . . . . . . . . . . . . . . . 93--104
           Javier Carretero and   
             Pedro Chaparro and   
                Xavier Vera and   
               Jaume Abella and   
               Antonio González   End-to-end register data-flow continuous
                                  self-test  . . . . . . . . . . . . . . . 105--115
              Doe Hyun Yoon and   
                    Mattan Erez   Memory mapped ECC: low-cost error
                                  protection for last level caches . . . . 116--127
                   Mark Woh and   
                Sangwon Seo and   
               Scott Mahlke and   
               Trevor Mudge and   
       Chaitali Chakrabarti and   
             Krisztian Flautner   AnySP: anytime anywhere anyway signal
                                  processing . . . . . . . . . . . . . . . 128--139
               John H. Kelm and   
          Daniel R. Johnson and   
         Matthew R. Johnson and   
              Neal C. Crago and   
              William Tuohy and   
              Aqeel Mahesri and   
          Steven S. Lumetta and   
           Matthew I. Frank and   
                Sanjay J. Patel   Rigel: an architecture and scalable
                                  programming interface for a 1000-core
                                  accelerator  . . . . . . . . . . . . . . 140--151
                Sunpyo Hong and   
                    Hyesoon Kim   An analytical model for a GPU
                                  architecture with memory-level and
                                  thread-level parallelism awareness . . . 152--163
              Susmit Biswas and   
             Diana Franklin and   
                Alan Savage and   
                 Ryan Dixon and   
           Timothy Sherwood and   
              Frederic T. Chong   Multi-execution: multicore caching for
                                  data-similar executions  . . . . . . . . 164--173
                Yuejian Xie and   
                 Gabriel H. Loh   PIPP: promotion/insertion
                                  pseudo-partitioning of multi-core shared
                                  caches . . . . . . . . . . . . . . . . . 174--183
          Nikos Hardavellas and   
            Michael Ferdman and   
              Babak Falsafi and   
             Anastasia Ailamaki   Reactive NUCA: near-optimal block
                                  placement and replication in distributed
                                  caches . . . . . . . . . . . . . . . . . 184--195
          Thomas Moscibroda and   
                     Onur Mutlu   A case for bufferless routing in on-chip
                                  networks . . . . . . . . . . . . . . . . 196--207
            Michel A. Kinsy and   
             Myong Hyon Cho and   
                   Tina Wen and   
                 Edward Suh and   
            Marten van Dijk and   
               Srinivas Devadas   Application-aware deadlock-free
                                  oblivious routing  . . . . . . . . . . . 208--219
                  Nan Jiang and   
                   John Kim and   
               William J. Dally   Indirect adaptive routing on large scale
                                  interconnection networks . . . . . . . . 220--231
                 James Hamilton   Internet-scale service infrastructure
                                  efficiency . . . . . . . . . . . . . . . 232--232
             Colin Blundell and   
          Milo M. K. Martin and   
              Thomas F. Wenisch   InvisiFence: performance-transparent
                                  memory ordering in conventional
                                  multiprocessors  . . . . . . . . . . . . 233--244
              Andrew Hilton and   
                      Amir Roth   Decoupled store completion/silent
                                  deterministic replay: enabling scalable
                                  data memory for CPR/CFP processors . . . 245--254
            Hongzhong Zheng and   
                  Jiang Lin and   
                 Zhao Zhang and   
                    Zhichun Zhu   Decoupled DIMM: building high-bandwidth
                                  memory system using low-speed DRAM
                                  devices  . . . . . . . . . . . . . . . . 255--266
                  Kevin Lim and   
              Jichuan Chang and   
               Trevor Mudge and   
  Parthasarathy Ranganathan and   
        Steven K. Reinhardt and   
              Thomas F. Wenisch   Disaggregated memory for expansion and
                                  sharing in blade servers . . . . . . . . 267--278
               Cagdas Dirik and   
                    Bruce Jacob   The performance of PC solid-state disks
                                  (SSDs) as a function of bandwidth,
                                  concurrency, device architecture, and
                                  system organization  . . . . . . . . . . 279--289
     Abhishek Bhattacharjee and   
             Margaret Martonosi   Thread criticality predictors for
                                  dynamic performance, power, and resource
                                  management in chip multiprocessors . . . 290--301
          Krishna K. Rangan and   
                Gu-Yeon Wei and   
                   David Brooks   Thread motion: fine-grained power
                                  management for multi-core systems  . . . 302--313
                  Yefu Wang and   
                     Kai Ma and   
                   Xiaorui Wang   Temperature-constrained power control
                                  for chip multiprocessors with online
                                  model estimation . . . . . . . . . . . . 314--324
                     Jie Yu and   
            Satish Narayanasamy   A case for an interleaving constrained
                                  shared-memory multi-processor  . . . . . 325--336
           Abdullah Muzahid and   
               Dario Suárez and   
               Shanxiang Qi and   
                Josep Torrellas   SigRace: signature-based data race
                                  detection  . . . . . . . . . . . . . . . 337--348
            Vijay Nagarajan and   
                    Rajiv Gupta   ECMon: exposing cache events for
                                  monitoring . . . . . . . . . . . . . . . 349--360
               Ali G. Saidi and   
          Nathan L. Binkert and   
        Steven K. Reinhardt and   
                   Trevor Mudge   End-to-end performance forecasting:
                                  finding bottlenecks before they happen   361--370
            Brian M. Rogers and   
               Anil Krishna and   
             Gordon B. Bell and   
                     Ken Vu and   
              Xiaowei Jiang and   
                    Yan Solihin   Scaling the bandwidth wall: challenges
                                  in and avenues for CMP scaling . . . . . 371--382
            Mark G. Whitney and   
          Nemanja Isailovic and   
               Yatish Patel and   
               John Kubiatowicz   A fault tolerant, area efficient
                                  architecture for Shor's factoring
                                  algorithm  . . . . . . . . . . . . . . . 383--394
              Andrew Putnam and   
               Susan Eggers and   
               Dave Bennett and   
             Eric Dellinger and   
                 Jeff Mason and   
               Henry Styles and   
      Prasanna Sundararajan and   
                   Ralph Wittig   Performance and power of cache-based
                                  reconfigurable computing . . . . . . . . 395--405
         Amin Firoozshahian and   
          Alex Solomatnikov and   
               Ofer Shacham and   
                 Zain Asgar and   
         Stephen Richardson and   
         Christos Kozyrakis and   
                  Mark Horowitz   A memory system design framework:
                                  creating smart memories  . . . . . . . . 406--417
               José A. Joao and   
                 Onur Mutlu and   
                   Yale N. Patt   Flexible reference-counting-based
                                  hardware acceleration for garbage
                                  collection . . . . . . . . . . . . . . . 418--428
                    Yan Pan and   
              Prabhat Kumar and   
                   John Kim and   
               Gokhan Memik and   
                   Yu Zhang and   
                 Alok Choudhary   Firefly: illuminating future
                                  network-on-chip with nanophotonics . . . 429--440
         Mark J. Cianchetti and   
          Joseph C. Kerekes and   
              David H. Albonesi   Phastlane: a rapid transit optical
                                  routing network  . . . . . . . . . . . . 441--450
                Dennis Abts and   
  Natalie D. Enright Jerger and   
                   John Kim and   
                 Dan Gibson and   
               Mikko H. Lipasti   Achieving predictable performance
                                  through better memory controller
                                  placement in many-core CMPs  . . . . . . 451--461
               Yangchun Luo and   
     Venkatesan Packirisamy and   
              Wei-Chung Hsu and   
               Antonia Zhai and   
              Nikhil Mungre and   
                   Ankit Tarkas   Dynamic performance tuning for
                                  speculative threads  . . . . . . . . . . 462--473
            Carlos Madriles and   
                Pedro López and   
            Josep M. Codina and   
               Enric Gibert and   
           Fernando Latorre and   
         Alejandro Martinez and   
              Raúl Martinez and   
               Antonio Gonzalez   Boosting single-thread performance in
                                  multi-core systems through fine-grain
                                  multi-threading  . . . . . . . . . . . . 474--483
        Shailender Chaudhry and   
              Robert Cypher and   
               Magnus Ekman and   
            Martin Karlsson and   
              Anders Landin and   
                Sherman Yip and   
               Håkan Zeffer and   
                  Marc Tremblay   Simultaneous speculative threading: a
                                  novel pipeline architecture implemented
                                  in Sun's Rock processor  . . . . . . . . 484--495

ACM SIGARCH Computer Architecture News
Volume 37, Number 4, September, 2009

            Alexander Thomasian   Publications on storage and systems
                                  research . . . . . . . . . . . . . . . . 1--26
                   Enric Musoll   Mesh-based many-core performance under
                                  process variations: a core yield
                                  perspective  . . . . . . . . . . . . . . 27--34
               Angel V. Nikolov   Queuing theoretic model for a
                                  multiprocessor with private caches and
                                  shared memory  . . . . . . . . . . . . . 35--44
                   Mark Thorson   Internet nuggets . . . . . . . . . . . . 45--51

ACM SIGARCH Computer Architecture News
Volume 37, Number 5, December, 2009

                   Enric Musoll   Leakage-saving opportunities in
                                  mesh-based massive multi-core
                                  architectures  . . . . . . . . . . . . . 1--7
                Abdul Naeem and   
               Xiaowen Chen and   
                Zhonghai Lu and   
                   Axel Jantsch   Scalability of relaxed consistency
                                  models in NoC based multicore
                                  architectures  . . . . . . . . . . . . . 8--15
             Sandeep Sharma and   
               K. S. Kahlon and   
                   P. K. Bansal   Reliability and path length analysis of
                                  irregular fault tolerant multistage
                                  interconnection network  . . . . . . . . 16--23
                   Mark Thorson   Internet nuggets . . . . . . . . . . . . 24--30

ACM SIGARCH Computer Architecture News
Volume 38, Number 1, March, 2010

                 Eric A. Brewer   Technology for developing regions:
                                  Moore's Law is not enough  . . . . . . . 1--2
                 Engin Ipek and   
              Jeremy Condit and   
      Edmund B. Nightingale and   
                Doug Burger and   
              Thomas Moscibroda   Dynamically replicated memory: building
                                  reliable systems from nanoscale
                                  resistive memories . . . . . . . . . . . 3--14
               Nevin Kirman and   
               José F. Martínez   A power-efficient all-optical on-chip
                                  interconnect using wavelength-based
                                  oblivious routing  . . . . . . . . . . . 15--28
         Naveen Neelakantam and   
            David R. Ditzel and   
                   Craig Zilles   A real system evaluation of hardware
                                  atomicity for software speculation . . . 29--38
                 Tim Harris and   
                 Sasa Tomic and   
             Adrián Cristal and   
                    Osman Unsal   Dynamic filtering: multi-purpose
                                  architecture support for language
                                  runtime systems  . . . . . . . . . . . . 39--52
                 Tom Bergan and   
              Owen Anderson and   
            Joseph Devietti and   
                  Luis Ceze and   
                   Dan Grossman   CoreDet: a compiler and runtime system
                                  for deterministic multithreaded
                                  execution  . . . . . . . . . . . . . . . 53--64
                 Arun Raman and   
                 Hanjun Kim and   
            Thomas R. Mason and   
           Thomas B. Jablin and   
                David I. August   Speculative parallelization using
                                  software multi-threaded transactions . . 65--76
               Dongyoon Lee and   
            Benjamin Wester and   
      Kaushik Veeraraghavan and   
        Satish Narayanasamy and   
              Peter M. Chen and   
                    Jason Flinn   Respec: efficient online multiprocessor
                                  replay via speculation and external
                                  determinism  . . . . . . . . . . . . . . 77--90
              Stijn Eyerman and   
                Lieven Eeckhout   Probabilistic job symbiosis modeling for
                                  SMT processor scheduling . . . . . . . . 91--102
                       Kai Shen   Request behavior variations  . . . . . . 103--116
            F. Ryan Johnson and   
                Radu Stoica and   
         Anastasia Ailamaki and   
                  Todd C. Mowry   Decoupling contention management from
                                  scheduling . . . . . . . . . . . . . . . 117--128
           Sergey Zhuravlev and   
          Sergey Blagodurov and   
             Alexandra Fedorova   Addressing shared resource contention in
                                  multicore processors via scheduling  . . 129--142
                  Ding Yuan and   
                 Haohui Mai and   
               Weiwei Xiong and   
                    Lin Tan and   
              Yuanyuan Zhou and   
              Shankar Pasupathy   SherLog: error diagnosis by connecting
                                  clues from run-time logs . . . . . . . . 143--154
        Dasarath Weeratunge and   
              Xiangyu Zhang and   
             Suresh Jagannathan   Analyzing multicore dumps to facilitate
                                  concurrency bug reproduction . . . . . . 155--166
       Sebastian Burckhardt and   
            Pravesh Kothari and   
         Madanlal Musuvathi and   
            Santosh Nagarakatte   A randomized scheduler with
                                  probabilistic guarantees of finding bugs 167--178
                  Wei Zhang and   
                  Chong Sun and   
                        Shan Lu   ConMem: detecting severe concurrency
                                  bugs through an effect-oriented approach 179--192
Francisco Javier Mesa-Martinez and   
         Ehsan K. Ardestani and   
                     Jose Renau   Characterizing processor thermal
                                  behavior . . . . . . . . . . . . . . . . 193--204
           Ganesh Venkatesh and   
               Jack Sampson and   
            Nathan Goulding and   
           Saturnino Garcia and   
          Vladyslav Bryksin and   
         Jose Lugo-Martinez and   
             Steven Swanson and   
         Michael Bedford Taylor   Conservation cores: reducing the energy
                                  of mature computations . . . . . . . . . 205--218
              Kshitij Sudan and   
       Niladrish Chatterjee and   
              David Nellans and   
               Manu Awasthi and   
     Rajeev Balasubramonian and   
                       Al Davis   Micro-pages: increasing DRAM efficiency
                                  with locality-aware data placement . . . 219--230
              Steven Pelley and   
              David Meisner and   
          Pooya Zandevakili and   
          Thomas F. Wenisch and   
                 Jack Underwood   Power routing: dynamic power
                                  provisioning in the data center  . . . . 231--242
                Faraz Ahmad and   
               T. N. Vijaykumar   Joint optimization of idle and cooling
                                  power in data centers while maintaining
                                  response time  . . . . . . . . . . . . . 243--256
      Michelle L. Goodstein and   
          Evangelos Vlachos and   
                Shimin Chen and   
         Phillip B. Gibbons and   
          Michael A. Kozuch and   
                  Todd C. Mowry   Butterfly analysis: adapting dataflow
                                  analysis to dynamic parallel monitoring  257--270
          Evangelos Vlachos and   
      Michelle L. Goodstein and   
          Michael A. Kozuch and   
                Shimin Chen and   
              Babak Falsafi and   
         Phillip B. Gibbons and   
                  Todd C. Mowry   ParaLog: enabling and accelerating
                                  online parallel monitoring of
                                  multithreaded applications . . . . . . . 271--284
            Amir H. Hormati and   
               Yoonseo Choi and   
                   Mark Woh and   
           Manjunath Kudlur and   
              Rodric Rabbah and   
               Trevor Mudge and   
                   Scott Mahlke   MacroSS: macro-SIMDization of streaming
                                  applications . . . . . . . . . . . . . . 285--296
              Dong Hyuk Woo and   
              Hsien-Hsin S. Lee   COMPASS: a programmable data prefetcher
                                  using idle GPU shaders . . . . . . . . . 297--310
             Daniel Sanchez and   
             Richard M. Yoo and   
             Christos Kozyrakis   Flexible architectural support for
                                  fine-grain scheduling  . . . . . . . . . 311--322
        Bogdan F. Romanescu and   
            Alvin R. Lebeck and   
                Daniel J. Sorin   Specifying and dynamically verifying
                                  address translation-aware memory
                                  consistency  . . . . . . . . . . . . . . 323--334
             Eiman Ebrahimi and   
              Chang Joo Lee and   
                 Onur Mutlu and   
                   Yale N. Patt   Fairness via source throttling: a
                                  configurable and high-performance
                                  fairness substrate for multi-core memory
                                  systems  . . . . . . . . . . . . . . . . 335--346
               Isaac Gelado and   
             Javier Cabezas and   
              Nacho Navarro and   
              John E. Stone and   
               Sanjay Patel and   
                 Wen-mei W. Hwu   An asymmetric distributed shared memory
                                  model for heterogeneous parallel systems 347--358
     Abhishek Bhattacharjee and   
             Margaret Martonosi   Inter-core cooperative TLB for chip
                                  multiprocessors  . . . . . . . . . . . . 359--370
               Ruirui Huang and   
             Daniel Y. Deng and   
                  G. Edward Suh   Orthrus: efficient software integrity
                                  protection on multi-cores  . . . . . . . 371--384
              Shuguang Feng and   
             Shantanu Gupta and   
                Amin Ansari and   
                   Scott Mahlke   Shoestring: probabilistic soft error
                                  reliability on the cheap . . . . . . . . 385--396
              Doe Hyun Yoon and   
                    Mattan Erez   Virtualized and flexible ECC for main
                                  memory . . . . . . . . . . . . . . . . . 397--408

ACM SIGARCH Computer Architecture News
Volume 38, Number 2, May, 2010

            Alexander Thomasian   Storage research in industry and
                                  universities . . . . . . . . . . . . . . 1--48
               Wolfgang Matthes   Resources instead of cores?  . . . . . . 49--63
                   Mark Thorson   Internet nuggets . . . . . . . . . . . . 64--67

ACM SIGARCH Computer Architecture News
Volume 38, Number 3, June, 2010

               William J. Dally   Moving the needle, computer architecture
                                  research in academe and industry . . . . 1--1
            Yasuko Watanabe and   
              John D. Davis and   
                  David A. Wood   WiDGET: Wisconsin Decoupled Grid
                                  Execution Tiles  . . . . . . . . . . . . 2--13
                 Dan Gibson and   
                  David A. Wood   Forwardflow: a scalable core for
                                  power-constrained CMPs . . . . . . . . . 14--25
                 Omid Azizi and   
              Aqeel Mahesri and   
            Benjamin C. Lee and   
            Sanjay J. Patel and   
                  Mark Horowitz   Energy-performance tradeoffs in
                                  processor architecture and circuit
                                  design: a marginal cost analysis . . . . 26--36
               Rehan Hameed and   
             Wajahat Qadeer and   
                Megan Wachs and   
                 Omid Azizi and   
          Alex Solomatnikov and   
            Benjamin C. Lee and   
         Stephen Richardson and   
         Christos Kozyrakis and   
                  Mark Horowitz   Understanding sources of inefficiency in
                                  general-purpose chips  . . . . . . . . . 37--47
             Thomas W. Barr and   
                Alan L. Cox and   
                   Scott Rixner   Translation caching: skip, don't walk
                                  (the page table) . . . . . . . . . . . . 48--59
               Aamer Jaleel and   
          Kevin B. Theobald and   
       Simon C. Steely, Jr. and   
                      Joel Emer   High performance cache replacement using
                                  re-reference interval prediction (RRIP)  60--71
          Jeffrey Stuecheli and   
         Dimitris Kaseridis and   
                 David Daly and   
          Hillery C. Hunter and   
                   Lizy K. John   The virtual write queue: coordinating
                                  DRAM and last-level cache policies . . . 72--82
            Chris Wilkerson and   
         Alaa R. Alameldeen and   
             Zeshan Chishti and   
                     Wei Wu and   
          Dinesh Somasekhar and   
                   Shih-lien Lu   Reducing cache power with low-cost,
                                  multi-bit error-correcting codes . . . . 83--93
                   Jing Xue and   
                  Alok Garg and   
        Berkehan Ciftcioğlu and   
                 Jianyun Hu and   
                 Shang Wang and   
            Ioannis Savidis and   
                Manish Jain and   
             Rebecca Berman and   
                   Peng Liu and   
              Michael Huang and   
                     Hui Wu and   
               Eby Friedman and   
                 Gary Wicks and   
                   Duncan Moore   An intra-chip free-space optical
                                  interconnect . . . . . . . . . . . . . . 94--105
             Reetuparna Das and   
                 Onur Mutlu and   
          Thomas Moscibroda and   
                   Chita R. Das   Aérgia: exploiting packet latency slack
                                  in on-chip networks  . . . . . . . . . . 106--116
                Pranay Koka and   
       Michael O. McCracken and   
             Herb Schwetman and   
               Xuezhe Zheng and   
                     Ron Ho and   
        Ashok V. Krishnamoorthy   Silicon-photonic network architectures
                                  for scalable, power-efficient multi-chip
                                  systems  . . . . . . . . . . . . . . . . 117--128
               Scott Beamer and   
                   Chen Sun and   
              Yong-Jin Kwon and   
                 Ajay Joshi and   
         Christopher Batten and   
        Vladimir Stojanović and   
                 Krste Asanović   Re-architecting DRAM memory systems with
                                  monolithically integrated silicon
                                  photonics  . . . . . . . . . . . . . . . 129--140
           Stuart Schechter and   
             Gabriel H. Loh and   
              Karin Strauss and   
                    Doug Burger   Use ECP, not ECC, for hard failures in
                                  resistive memories . . . . . . . . . . . 141--152
       Moinuddin K. Qureshi and   
    Michele M. Franceschini and   
    Luis A. Lastras-Montaño and   
                John P. Karidis   Morphable memory system: a robust
                                  architecture for exploiting multi-level
                                  phase change memories  . . . . . . . . . 153--162
          Timothy Pritchett and   
            Mithuna Thottethodi   SieveStore: a highly-selective,
                                  ensemble-level disk cache for
                                  cost-performance . . . . . . . . . . . . 163--174
         Aniruddha N. Udipi and   
       Naveen Muralimanohar and   
       Niladrish Chatterjee and   
     Rajeev Balasubramonian and   
                   Al Davis and   
               Norman P. Jouppi   Rethinking DRAM design and organization
                                  for energy-constrained multi-cores . . . 175--186
                 Yunji Chen and   
                   Weiwu Hu and   
               Tianshi Chen and   
                     Ruiyang Wu   LReplay: a pending period based
                                  deterministic replay scheme  . . . . . . 187--197
        Gwendolyn Voskuilen and   
                Faraz Ahmad and   
               T. N. Vijaykumar   Timetraveler: exploiting acyclic races
                                  for optimizing memory race recording . . 198--209
              Brandon Lucia and   
                  Luis Ceze and   
              Karin Strauss and   
                Shaz Qadeer and   
                  Hans-J. Boehm   Conflict exceptions: simplifying
                                  concurrent language semantics with
                                  precise hardware exceptions for
                                  data-races . . . . . . . . . . . . . . . 210--221
              Brandon Lucia and   
                  Luis Ceze and   
                  Karin Strauss   ColorSafe: architectural support for
                                  debugging and dynamically avoiding
                                  multi-variable atomicity violations  . . 222--233
                Mary Jane Irwin   Shared caches in multicores: the good,
                                  the bad, and the ugly  . . . . . . . . . 234--234
               Jiayuan Meng and   
               David Tarjan and   
                  Kevin Skadron   Dynamic warp subdivision for integrated
                                  branch and memory divergence tolerance   235--246
          Srimat Chakradhar and   
         Murugan Sankaradas and   
            Venkata Jakkula and   
                Srihari Cadambi   A dynamically configurable coprocessor
                                  for convolutional neural networks  . . . 247--257
             Colin Blundell and   
              Arun Raghavan and   
              Milo M. K. Martin   RETCON: transactional repair without
                                  replay . . . . . . . . . . . . . . . . . 258--269
              Janghaeng Lee and   
                Haicheng Wu and   
    Madhumitha Ravichandran and   
                   Nathan Clark   Thread Tailor: dynamically weaving
                                  threads together for efficient, adaptive
                                  parallel applications  . . . . . . . . . 270--279
                Sunpyo Hong and   
                    Hyesoon Kim   An integrated GPU power and performance
                                  model  . . . . . . . . . . . . . . . . . 280--289
                Zhangxi Tan and   
            Andrew Waterman and   
                 Henry Cook and   
                 Sarah Bird and   
             Krste Asanović and   
                David Patterson   A case for FAME: FPGA architecture model
                                  execution  . . . . . . . . . . . . . . . 290--301
             Geoffrey Blake and   
       Ronald G. Dreslinski and   
               Trevor Mudge and   
             Krisztián Flautner   Evolution of thread-level parallelism in
                                  desktop applications . . . . . . . . . . 302--313
         Vijay Janapa Reddi and   
            Benjamin C. Lee and   
           Trishul Chilimbi and   
                  Kushagra Vaid   Web search using mobile cores:
                                  quantifying and mitigating the price of
                                  efficiency . . . . . . . . . . . . . . . 314--325
Vijayaraghavan Soundararajan and   
           Jennifer M. Anderson   The impact of management operations on
                                  the virtualized datacenter . . . . . . . 326--337
                Dennis Abts and   
           Michael R. Marty and   
            Philip M. Wells and   
             Peter Klausler and   
                       Hong Liu   Energy proportional datacenter networks  338--347
             Charles P. Thacker   Improving the future by examining the
                                  past . . . . . . . . . . . . . . . . . . 348--348
                  Olivier Temam   The rebirth of neural networks . . . . . 349--349
                Eric Keller and   
               Jakub Szefer and   
           Jennifer Rexford and   
                    Ruby B. Lee   NoHype: virtualized cloud infrastructure
                                  without the virtualization . . . . . . . 350--361
              Stijn Eyerman and   
                Lieven Eeckhout   Modeling critical sections in Amdahl's
                                  Law and its implications for multicore
                                  design . . . . . . . . . . . . . . . . . 362--370
               Xiaochen Guo and   
                 Engin Ipek and   
                   Tolga Soyata   Resistive computation: avoiding the
                                  power wall with low-leakage, STT-MRAM
                                  based computing  . . . . . . . . . . . . 371--382
              Nak Hee Seong and   
              Dong Hyuk Woo and   
              Hsien-Hsin S. Lee   Security refresh: prevent malicious
                                  wear-out and increase durability for
                                  phase-change memory with dynamically
                                  randomized address mapping . . . . . . . 383--394
               Ruirui Huang and   
                  G. Edward Suh   IVEC: off-chip memory integrity
                                  protection for both security and
                                  reliability  . . . . . . . . . . . . . . 395--406
         Arrvindh Shriraman and   
              Sandhya Dwarkadas   Sentry: light-weight auxiliary memory
                                  access control . . . . . . . . . . . . . 407--418
              Enric Herrero and   
              José González and   
                    Ramon Canal   Elastic cooperative caching: an
                                  autonomous dynamically adaptive memory
                                  hierarchy for chip multiprocessors . . . 419--428
               John H. Kelm and   
          Daniel R. Johnson and   
              William Tuohy and   
          Steven S. Lumetta and   
                Sanjay J. Patel   Cohesion: a hybrid memory model for
                                  accelerators . . . . . . . . . . . . . . 429--440
           M. Aater Suleman and   
                 Onur Mutlu and   
               José A. Joao and   
                    Khubaib and   
                   Yale N. Patt   Data marshaling for multi-core
                                  architectures  . . . . . . . . . . . . . 441--450
              Victor W. Lee and   
               Changkyu Kim and   
             Jatin Chhugani and   
            Michael Deisher and   
                Daehyun Kim and   
          Anthony D. Nguyen and   
            Nadathur Satish and   
        Mikhail Smelyanskiy and   
        Srinivas Chennupaty and   
             Per Hammarlund and   
              Ronak Singhal and   
                  Pradeep Dubey   Debunking the 100X GPU vs. CPU myth: an
                                  evaluation of throughput computing on
                                  CPU and GPU  . . . . . . . . . . . . . . 451--460
            Vilas Sridharan and   
                 David R. Kaeli   Using hardware vulnerability factors to
                                  enhance AVF analysis . . . . . . . . . . 461--472
                Amin Ansari and   
              Shuguang Feng and   
             Shantanu Gupta and   
                   Scott Mahlke   Necromancer: enhancing system throughput
                                  by animating dead cores  . . . . . . . . 473--484
                 Guihai Yan and   
              Xiaoyao Liang and   
                  Yinhe Han and   
                     Xiaowei Li   Leveraging the core-level complementary
                                  effects of PVT variations to reduce
                                  timing emergencies in multi-core
                                  processors . . . . . . . . . . . . . . . 485--496
             Marc de Kruijf and   
               Shuou Nomura and   
      Karthikeyan Sankaralingam   Relax: an architectural framework for
                                  software recovery of hardware faults . . 497--508

ACM SIGARCH Computer Architecture News
Volume 38, Number 4, September, 2010

         Marco Nuño-Maganda and   
           Cesar Torres-Huitzil   A temporal coding hardware
                                  implementation for spiking neural
                                  networks . . . . . . . . . . . . . . . . 2--7
          Hirokazu Morisita and   
            Kenta Inakagata and   
             Yasunori Osana and   
             Naoyuki Fujita and   
                 Hideharu Amano   Implementation and evaluation of an
                                  arithmetic pipeline on FLOPS-2D:
                                  multi-FPGA system  . . . . . . . . . . . 8--13
            Anson H. T. Tse and   
            David B. Thomas and   
                 K. H. Tsoi and   
                      Wayne Luk   Efficient reconfigurable design for
                                  pricing Asian options  . . . . . . . . . 14--20
           Tadayoshi Horita and   
                 Itsuo Takanami   An FPGA-based fast classifier with high
                                  generalization property  . . . . . . . . 21--26
              Andrew Putnam and   
                Aaron Smith and   
                    Doug Burger   Dynamic vectorization in the E2 dynamic
                                  multicore architecture . . . . . . . . . 27--32
            Jong Kyung Paek and   
               Kiyoung Choi and   
                    Jongeun Lee   Binary acceleration using coarse-grained
                                  reconfigurable architecture  . . . . . . 33--39
               Keisuke Dohi and   
           Yuichiro Shibata and   
            Tsuyoshi Hamada and   
            Tomonari Masada and   
              Kiyoshi Oguri and   
                Duncan A. Buell   Implementation of a programming
                                  environment with a multithread model for
                                  reconfigurable systems . . . . . . . . . 40--45
            Mojtaba Sabeghi and   
              Hamid Mushtaq and   
                   Koen Bertels   Runtime multitasking support on
                                  polymorphic platforms  . . . . . . . . . 46--52
             Kuen Hung Tsoi and   
            Anson H. T. Tse and   
             Peter Pietzuch and   
                      Wayne Luk   Programming framework for clusters with
                                  heterogeneous accelerators . . . . . . . 53--59
             Claude Tadonki and   
         Gilbert Grosdidier and   
                   Olivier Pene   An efficient CELL library for lattice
                                  quantum chromodynamics . . . . . . . . . 60--65
                Ryan Taylor and   
                    Xiaoming Li   Software-based branch predication for
                                  AMD GPUs . . . . . . . . . . . . . . . . 66--72
          Sebastian Banescu and   
        Florent de Dinechin and   
               Bogdan Pasca and   
                   Radu Tudoran   Multipliers for floating-point double
                                  precision and beyond on FPGAs  . . . . . 73--79
               Kentaro Sano and   
                Luzhou Wang and   
                Satoru Yamamoto   Prototype implementation of
                                  array-processor extensible over multiple
                                  FPGAs for scalable stencil computation   80--86
             Chi-Chiu Tsang and   
             Hayden Kwok-Hay So   Dynamic power reduction of FPGA-based
                                  reconfigurable computers using
                                  precomputation . . . . . . . . . . . . . 87--92
                   Mark Thorson   Internet nuggets . . . . . . . . . . . . 93--96

ACM SIGARCH Computer Architecture News
Volume 38, Number 5, December, 2010

        Manideepa Mukherjee and   
                 Amitabha Sinha   A novel architecture for conversion of
                                  binary to single digit double base
                                  numbers  . . . . . . . . . . . . . . . . 1--6
                  Shobha T. and   
                 Syed Akram and   
                  G. Varaprasad   Design and development of framework for
                                  diagnosing intermediate nodes  . . . . . 7--11
                     Fuad Tabba   Adding concurrency in Python using a
                                  commercial processor's hardware
                                  transactional memory support . . . . . . 12--19
            Alexander Thomasian   Why specialized disks for composite
                                  operations may be unnecessary  . . . . . 20--27
                   Mark Thorson   Internet nuggets . . . . . . . . . . . . 28--36

ACM SIGARCH Computer Architecture News
Volume 39, Number 1, March, 2011

                 James R. Larus   The cloud will change everything . . . . 1--2
                  Ding Yuan and   
                 Jing Zheng and   
                Soyeon Park and   
              Yuanyuan Zhou and   
                  Stefan Savage   Improving software diagnosability via
                                  log enhancement  . . . . . . . . . . . . 3--14
      Kaushik Veeraraghavan and   
               Dongyoon Lee and   
            Benjamin Wester and   
             Jessica Ouyang and   
              Peter M. Chen and   
                Jason Flinn and   
            Satish Narayanasamy   DoublePlay: parallelizing sequential
                                  logging and replay . . . . . . . . . . . 15--26
               Jared Casper and   
              Tayo Oguntebi and   
              Sungpack Hong and   
          Nathan G. Bronson and   
         Christos Kozyrakis and   
                 Kunle Olukotun   Hardware acceleration of transactional
                                  memory on commodity systems  . . . . . . 27--38
           Luke Dalessandro and   
           François Carouge and   
                 Sean White and   
                  Yossi Lev and   
                  Mark Moir and   
           Michael L. Scott and   
               Michael F. Spear   Hybrid NOrec: a case study in the
                                  effectiveness of best effort hardware
                                  transactional memory . . . . . . . . . . 39--52
           Abhayendra Singh and   
              Daniel Marino and   
        Satish Narayanasamy and   
             Todd Millstein and   
                Madan Musuvathi   Efficient processor support for DRFx, a
                                  memory model with exceptions . . . . . . 53--66
            Joseph Devietti and   
               Jacob Nelson and   
                 Tom Bergan and   
                  Luis Ceze and   
                   Dan Grossman   RCDC: a relaxed consistency
                                  deterministic computer . . . . . . . . . 67--78
               Jacob Burnim and   
              George Necula and   
                    Koushik Sen   Specifying and checking semantic
                                  atomicity for multithreaded programs . . 79--90
                Haris Volos and   
           Andres Jaan Tack and   
               Michael M. Swift   Mnemosyne: lightweight persistent memory 91--104
                Joel Coburn and   
        Adrian M. Caulfield and   
                 Ameen Akel and   
             Laura M. Grupp and   
            Rajesh K. Gupta and   
               Ranjit Jhala and   
                 Steven Swanson   NV-Heaps: making persistent objects fast
                                  and safe with next-generation,
                                  non-volatile memories  . . . . . . . . . 105--118
           Adrian Schüpbach and   
             Andrew Baumann and   
             Timothy Roscoe and   
                    Simon Peter   A declarative language approach to
                                  device configuration . . . . . . . . . . 119--132
              Leonid Ryzhyk and   
                  John Keys and   
          Balachandra Mirla and   
             Arun Raghunath and   
                   Mona Vij and   
                  Gernot Heiser   Improved device driver reliability
                                  through hardware verification reuse  . . 133--144
                Atif Hashmi and   
                Andrew Nere and   
         James Jamal Thomas and   
                  Mikko Lipasti   A case for neuromorphic ISAs . . . . . . 145--158
          Benjamin Ransford and   
               Jacob Sorber and   
                       Kevin Fu   Mementos: system support for
                                  long-running computation on RFID-scale
                                  devices  . . . . . . . . . . . . . . . . 159--170
      Emmanouil Koukoumidis and   
    Dimitrios Lymberopoulos and   
              Karin Strauss and   
                    Jie Liu and   
                    Doug Burger   Pocket cloudlets . . . . . . . . . . . . 171--184
               Navin Sharma and   
                Sean Barker and   
                David Irwin and   
                Prashant Shenoy   Blink: managing server clusters on
                                  intermittent power . . . . . . . . . . . 185--198
             Henry Hoffmann and   
         Stelios Sidiroglou and   
             Michael Carbin and   
            Sasa Misailovic and   
              Anant Agarwal and   
                  Martin Rinard   Dynamic knobs for responsive power-aware
                                  computing  . . . . . . . . . . . . . . . 199--212
                   Song Liu and   
       Karthik Pattabiraman and   
          Thomas Moscibroda and   
               Benjamin G. Zorn   Flikker: saving DRAM refresh-power
                                  through critical data partitioning . . . 213--224
              Qingyuan Deng and   
              David Meisner and   
                 Luiz Ramos and   
          Thomas F. Wenisch and   
              Ricardo Bianchini   MemScale: active low-power modes for
                                  main memory  . . . . . . . . . . . . . . 225--238
                     Qi Gao and   
               Wenbin Zhang and   
                Zhezhe Chen and   
                  Mai Zheng and   
                       Feng Qin   2ndStrike: toward manifesting hidden
                                  concurrency typestate bugs . . . . . . . 239--250
                  Wei Zhang and   
                Junghee Lim and   
          Ramya Olichandran and   
             Joel Scherpelz and   
               Guoliang Jin and   
                    Shan Lu and   
                    Thomas Reps   ConSeq: detecting concurrency bugs
                                  through sequential errors  . . . . . . . 251--264
           Vitaly Chipounov and   
        Volodymyr Kuznetsov and   
                  George Candea   S2E: a platform for in-vivo multi-path
                                  analysis of software systems . . . . . . 265--278
            Owen S. Hofmann and   
               Alan M. Dunn and   
                Sangman Kim and   
               Indrajit Roy and   
                 Emmett Witchel   Ensuring operating system kernel
                                  integrity with OSck  . . . . . . . . . . 279--290
           Donald E. Porter and   
        Silas Boyd-Wickizer and   
                 Jon Howell and   
             Reuben Olinsky and   
                  Galen C. Hunt   Rethinking the library OS from the top
                                  down . . . . . . . . . . . . . . . . . . 291--304
              Nicolas Palix and   
                Gaël Thomas and   
                 Suman Saha and   
          Christophe Calvès and   
               Julia Lawall and   
                  Gilles Muller   Faults in Linux: ten years later . . . . 305--318
          Hadi Esmaeilzadeh and   
                   Ting Cao and   
                    Yang Xi and   
       Stephen M. Blackburn and   
            Kathryn S. McKinley   Looking back on the language and
                                  hardware revolutions: measured power,
                                  performance, and scaling . . . . . . . . 319--332
              Donald Nguyen and   
                 Keshav Pingali   Synthesizing concurrent schedulers for
                                  irregular algorithms . . . . . . . . . . 333--344
                Giang Hoang and   
        Robby Bruce Findler and   
                    Russ Joseph   Exploring circuit timing-aware language
                                  and compilation  . . . . . . . . . . . . 345--356
           Sardar M. Farhad and   
                  Yousun Ko and   
          Bernd Burgstaller and   
                Bernhard Scholz   Orchestration by approximation: mapping
                                  stream programs onto multicore
                                  architectures  . . . . . . . . . . . . . 357--368
              Eddy Z. Zhang and   
              Yunlian Jiang and   
                   Ziyu Guo and   
                   Kai Tian and   
                    Xipeng Shen   On-the-fly elimination of dynamic
                                  irregularities for GPU computing . . . . 369--380
            Amir H. Hormati and   
             Mehrzad Samadi and   
                   Mark Woh and   
               Trevor Mudge and   
                   Scott Mahlke   Sponge: portable stream programming on
                                  graphics engines . . . . . . . . . . . . 381--392
             Md Kamruzzaman and   
             Steven Swanson and   
                Dean M. Tullsen   Inter-core prefetching for multicore
                                  processors using migrating helper
                                  threads  . . . . . . . . . . . . . . . . 393--404
      Hiroshige Hayashizaki and   
                    Peng Wu and   
              Hiroshi Inoue and   
        Mauricio J. Serrano and   
                Toshio Nakatani   Improving the performance of trace-based
                                  systems by false loop filtering  . . . . 405--418

ACM SIGARCH Computer Architecture News
Volume 39, Number 2, May, 2011

             Nathan Binkert and   
          Bradford Beckmann and   
              Gabriel Black and   
        Steven K. Reinhardt and   
                  Ali Saidi and   
             Arkaprava Basu and   
              Joel Hestness and   
             Derek R. Hower and   
             Tushar Krishna and   
          Somayeh Sardashti and   
               Rathijit Sen and   
               Korey Sewell and   
            Muhammad Shoaib and   
                Nilay Vaish and   
               Mark D. Hill and   
                  David A. Wood   The gem5 simulator . . . . . . . . . . . 1--7
            Alexander Thomasian   Survey and analysis of disk scheduling
                                  methods  . . . . . . . . . . . . . . . . 8--25
          Thimmarayaswamy K and   
             Mary M. Dsouza and   
                  G. Varaprasad   Low power techniques for an Android
                                  based phone  . . . . . . . . . . . . . . 26--35
                   Mark Thorson   Internet nuggets . . . . . . . . . . . . 36--52

ACM SIGARCH Computer Architecture News
Volume 39, Number 3, June, 2011

                Atif Hashmi and   
               Hugues Berry and   
              Olivier Temam and   
                  Mikko Lipasti   Automatic abstraction and fault
                                  tolerance in cortical microarchitectures 1--10
         Niket K. Choudhary and   
         Salil V. Wadhavkar and   
             Tanmay A. Shah and   
               Hiran Mayukh and   
             Jayneel Gandhi and   
           Brandon H. Dwiel and   
             Sandeep Navada and   
      Hashem H. Najaf-abadi and   
                 Eric Rotenberg   FabScalar: composing synthesizable RTL
                                  designs of arbitrary cores within a
                                  canonical superscalar template . . . . . 11--22
               Erika Gunadi and   
               Mikko H. Lipasti   CRIB: consolidated rename, issue, and
                                  bypass . . . . . . . . . . . . . . . . . 23--32
              Rishi Agarwal and   
                Josep Torrellas   FlexBulk: intelligently forming atomic
                                  blocks in blocked-execution
                                  multiprocessors to minimize squashes . . 33--44
              Youngjin Kwon and   
               Changdae Kim and   
           Seungryoul Maeng and   
                    Jaehyuk Huh   Virtualizing performance asymmetric
                                  multi-core systems . . . . . . . . . . . 45--56
             Daniel Sanchez and   
             Christos Kozyrakis   Vantage: scalable and efficient
                                  fine-grain cache partitioning  . . . . . 57--68
             Asit K. Mishra and   
               Xiangyu Dong and   
                Guangyu Sun and   
                   Yuan Xie and   
           N. Vijaykrishnan and   
                   Chita R. Das   Architecting on-chip interconnects for
                                  stacked 3D STT-RAM caches in CMPs  . . . 69--80
                Jayesh Gaur and   
           Mainak Chaudhuri and   
           Sreenivas Subramoney   Bypass and insertion algorithms for
                                  exclusive last-level caches  . . . . . . 81--92
             Blas A. Cuesta and   
                Alberto Ros and   
             María E. Gómez and   
             Antonio Robles and   
                  José F. Duato   Increasing the effectiveness of
                                  directory caches by deactivating
                                  coherence for private memory blocks  . . 93--104
                  Jungju Oh and   
            Milos Prvulovic and   
                   Alenka Zajic   TLSync: support for multiple fast
                                  barriers using on-chip transmission
                                  lines  . . . . . . . . . . . . . . . . . 105--116
         Neal Clayton Crago and   
             Sanjay Jeram Patel   OUTRIDER: efficient memory latency
                                  tolerance with decoupled strands . . . . 117--128
                 Yunsup Lee and   
            Rimas Avizienis and   
               Alex Bishara and   
                Richard Xia and   
             Derek Lockhart and   
         Christopher Batten and   
                 Krste Asanović   Exploring the tradeoffs between
                                  programmability and efficiency in
                                  data-parallel accelerators . . . . . . . 129--140
             Eiman Ebrahimi and   
              Chang Joo Lee and   
                 Onur Mutlu and   
                   Yale N. Patt   Prefetch-aware shared resource
                                  management for multi-core systems  . . . 141--152
              Rishi Agarwal and   
                Pranav Garg and   
                Josep Torrellas   Rebound: scalable checkpointing for
                                  coherent shared memory . . . . . . . . . 153--164
       Joseph L. Greathouse and   
                Zhiqiang Ma and   
           Matthew I. Frank and   
                Ramesh Peri and   
                    Todd Austin   Demand-driven software race detection
                                  using hardware performance counters  . . 165--176
         Siddhartha Chhabra and   
                    Yan Solihin   i-NVMM: a secure non-volatile main
                                  memory system with incremental
                                  encryption . . . . . . . . . . . . . . . 177--188
               Mohit Tiwari and   
             Jason K. Oberg and   
                     Xun Li and   
          Jonathan Valamehr and   
              Timothy Levin and   
              Ben Hardekopf and   
               Ryan Kastner and   
          Frederic T. Chong and   
               Timothy Sherwood   Crafting a usable microkernel,
                                  processor, and I/O system with strict
                                  and provable information flow security   189--200
               Shuou Nomura and   
        Matthew D. Sinclair and   
                Chen-Han Ho and   
    Venkatraman Govindaraju and   
             Marc de Kruijf and   
      Karthikeyan Sankaralingam   Sampling + DMR: practical and
                                  low-overhead permanent fault detection   201--212
    Sangeetha Sudhakrishnan and   
              Rigo Dicochea and   
                     Jose Renau   Releasing efficient beta cores to market
                                  early  . . . . . . . . . . . . . . . . . 213--222
       Mehrtash Manoochehri and   
           Murali Annavaram and   
                  Michel Dubois   CPPC: correctable parity protected cache 223--234
               Mark Gebhart and   
          Daniel R. Johnson and   
               David Tarjan and   
         Stephen W. Keckler and   
           William J. Dally and   
              Erik Lindholm and   
                  Kevin Skadron   Energy-efficient mechanisms for managing
                                  thread context in throughput processors  235--246
             Wing-kei S. Yu and   
               Ruirui Huang and   
                Sarah Q. Xu and   
               Sung-En Wang and   
                  Edwin Kan and   
                  G. Edward Suh   SRAM--DRAM hybrid memory with
                                  applications to efficient register files
                                  in fine-grained multi-threading  . . . . 247--258
                Binzhang Fu and   
                  Yinhe Han and   
                     Jun Ma and   
                  Huawei Li and   
                     Xiaowei Li   An abacus turn model for
                                  time/space-efficient reconfigurable
                                  routing  . . . . . . . . . . . . . . . . 259--270
            Aaron Carpenter and   
                 Jianyun Hu and   
                     Jie Xu and   
              Michael Huang and   
                         Hui Wu   A case for globally shared-medium
                                  on-chip interconnect . . . . . . . . . . 271--282
               Lingjia Tang and   
                 Jason Mars and   
          Neil Vachharajani and   
               Robert Hundt and   
                 Mary Lou Soffa   The impact of memory subsystem resource
                                  sharing on datacenter applications . . . 283--294
              Doe Hyun Yoon and   
              Min Kyu Jeong and   
                    Mattan Erez   Adaptive granularity memory systems: a
                                  tradeoff between storage efficiency and
                                  throughput . . . . . . . . . . . . . . . 295--306
             Thomas W. Barr and   
                Alan L. Cox and   
                   Scott Rixner   SpecTLB: a mechanism for speculative
                                  address translation  . . . . . . . . . . 307--318
              David Meisner and   
      Christopher M. Sadler and   
         Luiz André Barroso and   
        Wolf-Dietrich Weber and   
              Thomas F. Wenisch   Power management of online
                                  data-intensive services  . . . . . . . . 319--330
              Susmit Biswas and   
               Mohit Tiwari and   
           Timothy Sherwood and   
           Luke Theogarajan and   
              Frederic T. Chong   Fighting fire with fire: modeling the
                                  datacenter-scale effects of targeted
                                  superlattice thermal management  . . . . 331--340
            Sriram Govindan and   
      Anand Sivasubramaniam and   
               Bhuvan Urgaonkar   Benefits and limitations of tapping into
                                  stored energy for datacenters  . . . . . 341--352
                 John Demme and   
            Simha Sethumadhavan   Rapid identification of architectural
                                  bottlenecks via precise event counting   353--364
          Hadi Esmaeilzadeh and   
                 Emily Blem and   
            Renee St. Amant and   
  Karthikeyan Sankaralingam and   
                    Doug Burger   Dark silicon and the end of multicore
                                  scaling  . . . . . . . . . . . . . . . . 365--376
                Guangyu Sun and   
      Christopher J. Hughes and   
               Changkyu Kim and   
                Jishen Zhao and   
                    Cong Xu and   
                   Yuan Xie and   
                 Yen-Kuang Chen   Moguls: a model to explore the memory
                                  hierarchy for bandwidth improvements . . 377--388
             Asit K. Mishra and   
           N. Vijaykrishnan and   
                   Chita R. Das   A case for heterogeneous on-chip
                                  interconnects for CMPs . . . . . . . . . 389--400
                 Boris Grot and   
              Joel Hestness and   
         Stephen W. Keckler and   
                     Onur Mutlu   Kilo-NOC: a heterogeneous
                                  network-on-chip architecture for
                                  scalability and service guarantees . . . 401--412
                   Sheng Ma and   
     Natalie Enright Jerger and   
                   Zhiying Wang   DBAR: an efficient routing algorithm to
                                  support multiple concurrent applications
                                  in networks-on-chip  . . . . . . . . . . 413--424
         Aniruddha N. Udipi and   
       Naveen Muralimanohar and   
     Rajeev Balasubramonian and   
                   Al Davis and   
               Norman P. Jouppi   Combining memory and a controller with
                                  photonics through 3D-stacking to
                                  enable scalable and energy-efficient
                                  systems  . . . . . . . . . . . . . . . . 425--436
             Nathan Binkert and   
                   Al Davis and   
           Norman P. Jouppi and   
              Moray McLaren and   
       Naveen Muralimanohar and   
           Robert Schreiber and   
                    Jung Ho Ahn   The role of optics in future high radix
                                  switch design  . . . . . . . . . . . . . 437--448
                     Kai Ma and   
                     Xue Li and   
                  Ming Chen and   
                   Xiaorui Wang   Scalable power control for many-core
                                  architectures running multi-threaded
                                  applications . . . . . . . . . . . . . . 449--460
         Alaa R. Alameldeen and   
                Ilya Wagner and   
             Zeshan Chishti and   
                     Wei Wu and   
            Chris Wilkerson and   
                   Shih-Lien Lu   Energy-efficient cache design using
                                  variable-strength error-correcting codes 461--472
             Luiz Andre Barroso   Warehouse-Scale Computing: Entering the
                                  Teenage Decade . . . . . . . . . . . . . ??
              David A. Ferrucci   IBM's Watson/DeepQA  . . . . . . . . . . ??
                    Ravi Kannan   Algorithms: Recent Highlights and
                                  Challenges . . . . . . . . . . . . . . . ??

ACM SIGARCH Computer Architecture News
Volume 39, Number 4, September, 2011

              Miriam Leeser and   
            Devon Yablonski and   
                Dana Brooks and   
              Laurie Smith King   The challenges of writing portable,
                                  correct and high performance libraries
                                  for GPUs . . . . . . . . . . . . . . . . 2--7
             Kuen Hung Tsoi and   
                      Wayne Luk   Power profiling and optimization for
                                  heterogeneous multi-core systems . . . . 8--13
           Serban Georgescu and   
                     Peter Chow   GPU accelerated CAE using open solvers
                                  and the cloud  . . . . . . . . . . . . . 14--19
               Junying Chen and   
            Billy Y. S. Yiu and   
        Brandon K. Hamilton and   
            Alfred C. H. Yu and   
                Hayden K.-H. So   Design space exploration of adaptive
                                  beamforming acceleration for bedside and
                                  portable medical ultrasound imaging  . . 20--25
               Keisuke Dohi and   
           Yuichiro Shibata and   
              Kiyoshi Oguri and   
              Takafumi Fujimoto   GPU implementation and optimization of
                                  electromagnetic simulation using the
                                  FDTD method for antenna designing  . . . 26--31
         Tomoyuki Nagatsuka and   
          Yoshito Sakaguchi and   
         Takayuki Matsumura and   
                     Kenji Kise   CoreSymphony: an efficient
                                  reconfigurable multi-core architecture   32--37
  Shinya Takamaeda-Yamazaki and   
           Ryosuke Sasakawa and   
          Yoshito Sakaguchi and   
                     Kenji Kise   An FPGA-based scalable simulation
                                  accelerator for tile architectures . . . 38--43
               Kentaro Sano and   
            Satoru Yamamoto and   
               Yoshiaki Hatsuda   Domain-specific programmable design of
                                  scalable streaming-array for
                                  power-efficient stencil computation  . . 44--49
           Takayuki Akamine and   
            Kenta Inakagata and   
             Yasunori Osana and   
             Naoyuki Fujita and   
                 Hideharu Amano   An implementation of out-of-order
                                  execution system for acceleration of
                                  computational fluid dynamics on FPGAs    50--55
               Haisheng Liu and   
                 Smail Niar and   
          Yassin El-Hillali and   
                   Atika Rivenq   Embedded architecture with hardware
                                  accelerator for target recognition in
                                  driver assistance system . . . . . . . . 56--59
                Oliver Pell and   
                   Oskar Mencer   Surviving the end of frequency scaling
                                  with reconfigurable dataflow computing   60--65
                Ana Balevic and   
                  Bart Kienhuis   KPN2GPU: an approach for discovery and
                                  exploitation of fine-grain data
                                  parallelism in process networks  . . . . 66--71
               Amila Akagić and   
                 Hideharu Amano   High speed CRC with 64-bit generator
                                  polynomial on an FPGA  . . . . . . . . . 72--77
                Shufan Yang and   
                T. M. McGinnity   A biologically plausible real-time
                                  spiking neuron simulation environment
                                  based on a multiple-FPGA platform  . . . 78--81
             Hiroomi Sawada and   
              Morihiro Kuga and   
           Motoki Amagasaki and   
              Masahiro Iida and   
             Toshinori Sueyoshi   Parallelization of the channel width
                                  search for FPGA routing  . . . . . . . . 82--85
               Shoji Tanabe and   
           Takuya Nagashima and   
              Yoshiki Yamaguchi   A study of an FPGA based flexible SIMD
                                  processor  . . . . . . . . . . . . . . . 86--89
             Antoine Trouve and   
               Kazuaki Murakami   Augmenting DR-ASIP flexibility through
                                  multi-mode custom instructions . . . . . 90--93
              Shinya Kubota and   
                Minoru Watanabe   A MEMS writer system embedded for a
                                  programmable optically reconfigurable
                                  gate array . . . . . . . . . . . . . . . 94--97
                 Jan Fousek and   
             Jiři Filipovič and   
                   Matuš Madzin   Automatic fusions of CUDA--GPU kernels
                                  for parallel map . . . . . . . . . . . . 98--99
            Kohei Matsunobu and   
               Keisuke Dohi and   
           Yuichiro Shibata and   
                  Kiyoshi Oguri   A discussion on calculating eigenvalues
                                  of real symmetric tridiagonal matrices
                                  on a GPU . . . . . . . . . . . . . . . . 100--101
              Dominik Meyer and   
                   Bernd Klauer   Multicore reconfiguration platform an
                                  alternative to RAMPSoC . . . . . . . . . 102--103
               Robin Bonamy and   
             Daniel Chillet and   
           Olivier Sentieys and   
             Sebastien Bilavarn   Parallelism Level Impact on Energy
                                  Consumption in Reconfigurable Devices    104--105
      Michael Opoku Agyeman and   
                  Ali Ahmadinia   Power and area optimisation in
                                  heterogeneous 3D networks-on-chip
                                  architectures  . . . . . . . . . . . . . 106--107
                   Mark Thorson   Internet nuggets . . . . . . . . . . . . 108--117

ACM SIGARCH Computer Architecture News
Volume 39, Number 5, December, 2011

                  Malay Das and   
             Amitabha Sinha and   
             Nishant Kumar Giri   High speed residue number system (RNS)
                                  based FIR filter using distributed
                                  arithmetic (DA)  . . . . . . . . . . . . 1--4
       Anindita Chakraborty and   
                 Amitabha Sinha   Conversion of binary to single-term
                                  triple base numbers for DSP applications 5--11
           Satrughna Singha and   
            Aniruddha Ghosh and   
                 Amitabha Sinha   A new architecture for FPGA based
                                  implementation of conversion of binary
                                  to double base number system (DBNS)
                                  using parallel search technique  . . . . 12--18
                   Mark Thorson   Internet nuggets . . . . . . . . . . . . 19--23

ACM SIGARCH Computer Architecture News
Volume 40, Number 1, March, 2012

    Dimitrios Lymberopoulos and   
                Oriana Riva and   
              Karin Strauss and   
              Akshay Mittal and   
             Alexandros Ntoulas   PocketWeb: instant web browsing for
                                  mobile devices . . . . . . . . . . . . . 1--12
          Felix Xiaozhu Lin and   
                  Zhen Wang and   
             Robert LiKamWa and   
                      Lin Zhong   Reflex: using low-power processors in
                                  smartphones without knowing them . . . . 13--24
              Jichuan Chang and   
                Justin Meza and   
  Parthasarathy Ranganathan and   
                  Amip Shah and   
                 Rocky Shih and   
                    Cullen Bash   Totally green: evaluating and designing
                                  servers for lifecycle environmental
                                  impact . . . . . . . . . . . . . . . . . 25--36
            Michael Ferdman and   
             Almutaz Adileh and   
             Onur Kocberber and   
              Stavros Volos and   
         Mohammad Alisafaee and   
            Djordje Jevdjic and   
               Cansu Kaynak and   
      Adrian Daniel Popescu and   
         Anastasia Ailamaki and   
                  Babak Falsafi   Clearing the clouds: a study of emerging
                                  scale-out workloads on modern hardware   37--48
                  Yang Chen and   
              Shuangde Fang and   
            Lieven Eeckhout and   
              Olivier Temam and   
                   Chengyong Wu   Iterative optimization for the data
                                  center . . . . . . . . . . . . . . . . . 49--60
                Faraz Ahmad and   
       Srimat T. Chakradhar and   
          Anand Raghunathan and   
               T. N. Vijaykumar   Tarazu: optimizing MapReduce on
                                  heterogeneous clusters . . . . . . . . . 61--74
            Sriram Govindan and   
                    Di Wang and   
      Anand Sivasubramaniam and   
               Bhuvan Urgaonkar   Leveraging stored energy for handling
                                  power emergencies in aggressively
                                  provisioned datacenters  . . . . . . . . 75--86
                 Asim Kadav and   
               Michael M. Swift   Understanding modern device drivers  . . 87--98
Sankaralingam Panneerselvam and   
               Michael M. Swift   Chameleon: operating system support for
                                  dynamic processors . . . . . . . . . . . 99--110
              Andy A. Hwang and   
        Ioan A. Stefanovici and   
               Bianca Schroeder   Cosmic rays don't strike twice:
                                  understanding the nature of DRAM errors
                                  and the implications for system design   111--122
     Siva Kumar Sastry Hari and   
             Sarita V. Adve and   
               Helia Naeimi and   
           Pradeep Ramachandran   Relyzer: exploiting application-level
                                  fault equivalence to analyze application
                                  resiliency to transient faults . . . . . 123--134
               Peter Feiner and   
         Angela Demke Brown and   
                    Ashvin Goel   Comprehensive kernel instrumentation via
                                  dynamic binary translation . . . . . . . 135--146
                 Rei Odaira and   
                Toshio Nakatani   Continuous object access profiling and
                                  optimizations to overcome the memory
                                  wall and bloat . . . . . . . . . . . . . 147--158
       Joseph L. Greathouse and   
                 Hongyi Xin and   
                  Yixin Luo and   
                    Todd Austin   A case for unlimited watchpoints . . . . 159--172
            Marek Olszewski and   
                   Qin Zhao and   
                  David Koh and   
                Jason Ansel and   
              Saman Amarasinghe   Aikido: accelerating shared data dynamic
                                  analyses . . . . . . . . . . . . . . . . 173--184
              Baris Kasikci and   
            Cristian Zamfir and   
                  George Candea   Data races vs. data race bugs: telling
                                  the difference with Portend  . . . . . . 185--198
         Austin T. Clements and   
          M. Frans Kaashoek and   
             Nickolai Zeldovich   Scalable address spaces using RCU
                                  balanced trees . . . . . . . . . . . . . 199--210
                Haris Volos and   
           Andres Jaan Tack and   
           Michael M. Swift and   
                        Shan Lu   Applying transactional memory to
                                  concurrency bugs . . . . . . . . . . . . 211--222
               José A. Joao and   
           M. Aater Suleman and   
                 Onur Mutlu and   
                   Yale N. Patt   Bottleneck identification and scheduling
                                  in multithreaded applications  . . . . . 223--234
           Petar Radojković and   
         Vladimir Cakarević and   
              Miquel Moretó and   
               Javier Verdú and   
               Alex Pajuelo and   
       Francisco J. Cazorla and   
           Mario Nemirovsky and   
                   Mateo Valero   Optimal task assignment in multithreaded
                                  processors: a statistical approach . . . 235--248
               Aamer Jaleel and   
      Hashem H. Najaf-abadi and   
      Samantika Subramaniam and   
            Simon C. Steely and   
                      Joel Emer   CRUISE: cache replacement and
                                  utility-aware scheduling . . . . . . . . 249--260
            Matthew DeVuyst and   
              Ashish Venkat and   
                Dean M. Tullsen   Execution migration in a
                                  heterogeneous-ISA chip multiprocessor    261--272
               Changhui Lin and   
            Vijay Nagarajan and   
                Rajiv Gupta and   
              Bharghava Rajaram   Efficient sequential consistency via
                                  conflict ordering  . . . . . . . . . . . 273--286
             David Cheriton and   
         Amin Firoozshahian and   
          Alex Solomatnikov and   
          John P. Stevenson and   
                     Omid Azizi   HICAMP: architectural support for
                                  efficient concurrency-safe shared
                                  structured data access . . . . . . . . . 287--300
          Hadi Esmaeilzadeh and   
             Adrian Sampson and   
                  Luis Ceze and   
                    Doug Burger   Architecture support for disciplined
                                  approximate programming  . . . . . . . . 301--312
              David Meisner and   
              Thomas F. Wenisch   DreamWeaver: architectural support for
                                  deep sleep . . . . . . . . . . . . . . . 313--324
                 Myron King and   
                 Nirav Dave and   
                         Arvind   Automatic generation of
                                  hardware/software interfaces . . . . . . 325--336
         Lorenzo Martignoni and   
           Stephen McCamant and   
          Pongsin Poosankam and   
                  Dawn Song and   
                Petros Maniatis   Path-exploration lifting: hi-fi tests
                                  for lo-fi emulators  . . . . . . . . . . 337--348
              Sungpack Hong and   
               Hassan Chafi and   
                Eric Sedlar and   
                 Kunle Olukotun   Green-Marl: a DSL for easy and efficient
                                  graph analysis . . . . . . . . . . . . . 349--362
               Yongjun Park and   
                Sangwon Seo and   
              Hyunchul Park and   
              Hyoun Kyu Cho and   
                   Scott Mahlke   SIMD defragmenter: efficient ILP
                                  realization on data-parallel
                                  architectures  . . . . . . . . . . . . . 363--374
        Dilip Nijagal Simha and   
                  Maohua Lu and   
                Tzi-cker Chiueh   An update-aware storage system for
                                  low-locality update-intensive workloads  375--386
        Adrian M. Caulfield and   
            Todor I. Mollov and   
          Louis Alex Eisner and   
                    Arup De and   
                Joel Coburn and   
                 Steven Swanson   Providing safe, user space access to
                                  fast, solid state disks  . . . . . . . . 387--400
        Dushyanth Narayanan and   
                   Orion Hodson   Whole-system persistence . . . . . . . . 401--410
                Abel Gordon and   
                 Nadav Amit and   
               Nadav Har'El and   
            Muli Ben-Yehuda and   
                Alex Landau and   
             Assaf Schuster and   
                    Dan Tsafrir   ELI: bare-metal performance for I/O
                                  virtualization . . . . . . . . . . . . . 411--422
             Nedeljko Vasić and   
            Dejan Novaković and   
            Svetozar Miucin and   
               Dejan Kostić and   
              Ricardo Bianchini   DejaVu: accelerating resource allocation
                                  in virtualized environments  . . . . . . 423--436
               Jakub Szefer and   
                    Ruby B. Lee   Architectural support for
                                  hypervisor-secure virtualization . . . . 437--450
                    Min Lee and   
                 Karsten Schwan   Region scheduling: efficiently using the
                                  cache architectures via page-level
                                  affinity . . . . . . . . . . . . . . . . 451--462

ACM SIGARCH Computer Architecture News
Volume 40, Number 2, May, 2012

          B. H. H. Juurlink and   
              C. H. Meenderinck   Amdahl's law for predicting the future
                                  of multicores considered harmful . . . . 1--9
                 Conrad Mueller   Axiom based architecture . . . . . . . . 10--17
            Alexander Thomasian   Rebuild processing in RAID5 with
                                  emphasis on the supplementary parity
                                  augmentation method  . . . . . . . . . . 18--27
         Nishant Kumar Giri and   
                 Amitabha Sinha   FPGA implementation of a novel
                                  architecture for performance enhancement
                                  of Radix-2 FFT . . . . . . . . . . . . . 28--32
            Aniruddha Ghosh and   
           Satrughna Singha and   
                 Amitabha Sinha   A new architecture for FPGA
                                  implementation of a MAC unit for digital
                                  signal processors using mixed number
                                  system . . . . . . . . . . . . . . . . . 33--38
            Aniruddha Ghosh and   
           Satrughna Singha and   
                 Amitabha Sinha   "Floating point RNS": a new concept
                                  for designing the MAC unit of digital
                                  signal processor . . . . . . . . . . . . 39--43
                   Mark Thorson   Internet nuggets . . . . . . . . . . . . 44--49

ACM SIGARCH Computer Architecture News
Volume 40, Number 3, June, 2012

                  Jamie Liu and   
                 Ben Jaiyen and   
              Richard Veras and   
                     Onur Mutlu   RAIDR: Retention-Aware Intelligent DRAM
                                  Refresh  . . . . . . . . . . . . . . . . 1--12
        Mahdi Nazm Bojnordi and   
                     Engin Ipek   PARDIS: a programmable memory controller
                                  for the DDRx interfacing standards . . . 13--24
              Doe Hyun Yoon and   
              Jichuan Chang and   
       Naveen Muralimanohar and   
      Parthasarathy Ranganathan   BOOM: enabling mobile memory based
                                  low-power server DIMMs . . . . . . . . . 25--36
         Krishna T. Malladi and   
            Benjamin C. Lee and   
           Frank A. Nothaft and   
         Christos Kozyrakis and   
      Karthika Periyathambi and   
                  Mark Horowitz   Towards energy-proportional datacenter
                                  memory with mobile DRAM  . . . . . . . . 37--48
             Nicolas Brunie and   
           Sylvain Collange and   
                 Gregory Diamos   Simultaneous branch and warp
                                  interweaving for sustained GPU
                                  performance  . . . . . . . . . . . . . . 49--60
                 Minsoo Rhu and   
                    Mattan Erez   CAPRI: prediction of compaction-adequacy
                                  for handling control-divergence in GPGPU
                                  architectures  . . . . . . . . . . . . . 61--71
          Jaikrishnan Menon and   
             Marc De Kruijf and   
      Karthikeyan Sankaralingam   iGPU: exception support and speculative
                                  execution on GPUs  . . . . . . . . . . . 72--83
           José-María Arnau and   
      Joan-Manuel Parcerisa and   
          Polychronis Xekalakis   Boosting mobile GPU performance with a
                                  decoupled access/execute fragment
                                  processor  . . . . . . . . . . . . . . . 84--93
             Mehmet Kayaalp and   
               Meltem Ozsoy and   
          Nael Abu-Ghazaleh and   
               Dmitry Ponomarev   Branch regulation: low-overhead
                                  protection from code reuse attacks . . . 94--105
                 John Demme and   
              Robert Martin and   
               Adam Waksman and   
            Simha Sethumadhavan   Side-channel vulnerability factor: a
                                  metric for measuring information leakage 106--117
              Robert Martin and   
                 John Demme and   
            Simha Sethumadhavan   TimeWarp: rethinking timekeeping and
                                  performance monitoring mechanisms to
                                  mitigate side-channel attacks  . . . . . 118--129
          Jonathan Valamehr and   
              Melissa Chase and   
                Seny Kamara and   
              Andrew Putnam and   
                 Dan Shumow and   
       Vinod Vaikuntanathan and   
               Timothy Sherwood   Inspection resistant memory:
                                  architectural support for security from
                                  physical examination . . . . . . . . . . 130--141
                      Yi Xu and   
                   Jun Yang and   
                    Rami Melhem   Tolerating process variations in
                                  nanophotonic on-chip networks  . . . . . 142--152
                Pranay Koka and   
       Michael O. McCracken and   
             Herb Schwetman and   
        Chia-Hsin Owen Chen and   
               Xuezhe Zheng and   
                     Ron Ho and   
                 Kannan Raj and   
        Ashok V. Krishnamoorthy   A micro-architectural analysis of
                                  switched photonic multi-chip
                                  interconnects  . . . . . . . . . . . . . 153--164
            Aaron Carpenter and   
                 Jianyun Hu and   
              Ovunc Kocabas and   
              Michael Huang and   
                         Hui Wu   Enhancing effective throughput for
                                  transmission line-based bus  . . . . . . 165--176
         Michihiro Koibuchi and   
           Hiroki Matsutani and   
             Hideharu Amano and   
               D. Frank Hsu and   
                 Henri Casanova   A case for random shortcut topologies
                                  for HPC interconnects  . . . . . . . . . 177--188
        Santosh Nagarakatte and   
          Milo M. K. Martin and   
                Steve Zdancewic   Watchdog: hardware for safe and secure
                                  manual memory management and full memory
                                  safety . . . . . . . . . . . . . . . . . 189--200
            Joseph Devietti and   
           Benjamin P. Wood and   
              Karin Strauss and   
                  Luis Ceze and   
               Dan Grossman and   
                    Shaz Qadeer   RADISH: always-on sound and complete
                                  Race Detection in Software and Hardware  201--212
        Kenzo Van Craeynest and   
               Aamer Jaleel and   
            Lieven Eeckhout and   
              Paolo Narvaez and   
                      Joel Emer   Scheduling heterogeneous multi-cores
                                  through Performance Impact Estimation
                                  (PIE)  . . . . . . . . . . . . . . . . . 213--224
                   Ting Cao and   
       Stephen M. Blackburn and   
                 Tiejun Gao and   
            Kathryn S. McKinley   The yin and yang of power and
                                  performance for asymmetric hardware and
                                  managed software . . . . . . . . . . . . 225--236
              Evgeni Krimer and   
             Patrick Chiang and   
                    Mattan Erez   Lane decoupling for improving the
                                  timing-error resiliency of wide-SIMD
                                  architectures  . . . . . . . . . . . . . 237--248
          Timothy N. Miller and   
               Renji Thomas and   
                  Xiang Pan and   
                Radu Teodorescu   VRSync: characterizing and eliminating
                                  synchronization-induced voltage
                                  emergencies in many-core processors  . . 249--260
           Ioannis Doudalis and   
                Milos Prvulovic   Euripus: a flexible unified hardware
                                  memory checkpointing accelerator for
                                  bidirectional-debugging and reliability  261--272
           Arun Arvind Nair and   
              Stijn Eyerman and   
            Lieven Eeckhout and   
               Lizy Kurian John   A first-order mechanistic model for
                                  architectural vulnerability factor . . . 273--284
         Aniruddha N. Udipi and   
       Naveen Muralimanohar and   
     Rajeev Balasubramonian and   
                   Al Davis and   
               Norman P. Jouppi   LOT-ECC: localized and tiered
                                  reliability mechanisms for commodity
                                  memory systems . . . . . . . . . . . . . 285--296
             Arkaprava Basu and   
               Mark D. Hill and   
               Michael M. Swift   Reducing memory reference energy with
                                  opportunistic virtual caching  . . . . . 297--308
                   Zhe Wang and   
             Samira M. Khan and   
              Daniel A. Jiménez   Improving writeback efficiency with
                                  decoupled last-write prediction  . . . . 309--320
               Jaewoong Sim and   
                 Jaekyu Lee and   
       Moinuddin K. Qureshi and   
                    Hyesoon Kim   FLEXclusion: balancing cache capacity
                                  and on-chip bandwidth via flexible
                                  exclusion  . . . . . . . . . . . . . . . 321--332
            Gaurang Upasani and   
                Xavier Vera and   
               Antonio González   Setting an error detection
                                  infrastructure with low cost acoustic
                                  wave detectors . . . . . . . . . . . . . 333--343
          Andrea Pellegrini and   
       Joseph L. Greathouse and   
               Valeria Bertacco   Viper: virtual pipelines for enhanced
                                  reliability  . . . . . . . . . . . . . . 344--355
                  Olivier Temam   A defect-tolerant accelerator for
                                  emerging high-performance applications   356--367
                 Yoongu Kim and   
             Vivek Seshadri and   
               Donghyuk Lee and   
                  Jamie Liu and   
                     Onur Mutlu   A case for exploiting subarray-level
                                  parallelism (SALP) in DRAM . . . . . . . 368--379
       Moinuddin K. Qureshi and   
    Michele M. Franceschini and   
            Ashish Jagmohan and   
                Luis A. Lastras   PreSET: improving performance of phase
                                  change memories by exploiting asymmetry
                                  in write times . . . . . . . . . . . . . 380--391
       Elliott Cooper-Balis and   
             Paul Rosenfeld and   
                    Bruce Jacob   Buffer-on-board memory systems . . . . . 392--403
             Myoungsoo Jung and   
        Ellis H. Wilson III and   
                Mahmut Kandemir   Physically Addressed Queueing (PAQ):
                                  improving parallelism in solid state
                                  disks  . . . . . . . . . . . . . . . . . 404--415
    Rachata Ausavarungnirun and   
        Kevin Kai-Wei Chang and   
        Lavanya Subramanian and   
             Gabriel H. Loh and   
                     Onur Mutlu   Staged memory scheduling: achieving high
                                  performance and scalability in
                                  heterogeneous systems  . . . . . . . . . 416--427
              R. Manikantan and   
              Kaushik Rajan and   
                R. Govindarajan   Probabilistic Shared Cache Management
                                  (PriSM)  . . . . . . . . . . . . . . . . 428--439
            Nadathur Satish and   
               Changkyu Kim and   
             Jatin Chhugani and   
               Hideki Saito and   
         Rakesh Krishnaiyer and   
        Mikhail Smelyanskiy and   
              Milind Girkar and   
                  Pradeep Dubey   Can traditional programming bridge the
                                  Ninja performance gap for parallel
                                  computing applications?  . . . . . . . . 440--451
           Melanie Kambadur and   
                   Kui Tang and   
                  Martha A. Kim   Harmony: collection and analysis of
                                  parallel block vectors . . . . . . . . . 452--463
            David Wentzlaff and   
     Christopher J. Jackson and   
            Patrick Griffin and   
                  Anant Agarwal   Configurable fine-grain protection for
                                  multicore processor virtualization . . . 464--475
              Jeongseob Ahn and   
              Seongwook Jin and   
                    Jaehyuk Huh   Revisiting hardware-assisted page walks
                                  for virtualized systems  . . . . . . . . 476--487
       Vasileios Kontorinis and   
           Liuyi Eric Zhang and   
              Baris Aksanli and   
               Jack Sampson and   
            Houman Homayoun and   
               Eddie Pettis and   
            Dean M. Tullsen and   
          Tajana Simunic Rosing   Managing distributed UPS energy for
                                  effective power capping in data centers  488--499
        Pejman Lotfi-Kamran and   
                 Boris Grot and   
            Michael Ferdman and   
              Stavros Volos and   
             Onur Kocberber and   
             Javier Picorel and   
             Almutaz Adileh and   
            Djordje Jevdjic and   
             Sachin Idgunji and   
                  Emre Ozer and   
                  Babak Falsafi   Scale-out processors . . . . . . . . . . 500--511
                    Chao Li and   
                Amer Qouneh and   
                         Tao Li   iSwitch: coordinating and optimizing
                                  renewable energy powered server clusters 512--523
           Abhayendra Singh and   
        Satish Narayanasamy and   
              Daniel Marino and   
             Todd Millstein and   
             Madanlal Musuvathi   End-to-end sequential consistency  . . . 524--535
                 Jason Mars and   
                   Naveen Kumar   BlockChop: dynamic squash elimination
                                  for hybrid processor architecture  . . . 536--547
              Doe Hyun Yoon and   
              Min Kyu Jeong and   
           Michael Sullivan and   
                    Mattan Erez   The dynamic granularity memory system    548--559

ACM SIGARCH Computer Architecture News
Volume 40, Number 4, September, 2012

         Marcos K. Aguilera and   
              Dahlia Malkhi and   
             Keith Marzullo and   
       Alessandro Panconesi and   
               Andrzej Pelc and   
              Roger Wattenhofer   Announcing the 2012 Edsger W. Dijkstra
                                  Prize in Distributed Computing . . . . . 1--2
           Subhashis Maitra and   
                 Amitabha Sinha   A new algorithm for computing
                                  triple-base number system  . . . . . . . 3--9
                 Shiv Kumar and   
    Seshadri Krishna Murthy and   
              G. Varaprasad and   
                  S. Sivasathya   Network load and traffic pattern on the
                                  capacity of wireless ad hoc networks . . 10--25
                  M. N. Isa and   
                 K. Benkrid and   
                     T. Clayton   Efficient architecture and scheduling
                                  technique for pairwise sequence
                                  alignment  . . . . . . . . . . . . . . . 26--31
              A. K. Oudjida and   
                N. Chaillet and   
           M. L. Berrandjia and   
                      A. Liacha   A new high radix-$2^r$ ($r \geq 8$)
                                  multibit recoding algorithm for large
                                  operand size ($N \geq 32$) multipliers   32--43
                   Mark Thorson   Internet nuggets . . . . . . . . . . . . 44--48

ACM SIGARCH Computer Architecture News
Volume 40, Number 5, December, 2012

             Hideharu Amano and   
                      Wayne Luk   FPGA-based Connect6 solver with
                                  hardware-accelerated move refinement . . 4--9
          Thomas C. P. Chau and   
                  Wayne Luk and   
             Peter Y. K. Cheung   Roberts: reconfigurable platform for
                                  benchmarking real-time systems . . . . . 10--15
              Kei Kinoshita and   
             Daisuke Takano and   
           Tomoyuki Okamura and   
              Tetsuhiko Yao and   
              Yoshiki Yamaguchi   An augmented reality system with a
                                  coarse-grained reconfigurable device . . 16--21
                Nicholas Ng and   
             Nobuko Yoshida and   
                 Xin Yu Niu and   
                 Kuen Hung Tsoi   Session types: towards safe and fast
                                  reconfigurable programming . . . . . . . 22--27
                Rizwan Syed and   
                   Yajun Ha and   
           Bharadwaj Veeravalli   A low overhead abstract architecture for
                                  FPGA resource management . . . . . . . . 28--33
             Kuen Hung Tsoi and   
              Tobias Becker and   
                      Wayne Luk   Modelling reconfigurable systems in
                                  event driven simulation  . . . . . . . . 34--39
             Zheng Zhi Shun and   
               Tsutomu Maruyama   FPGA acceleration of CDO pricing based
                                  on correlation expansions  . . . . . . . 40--45
            Hiroki Nakahara and   
         Hiroyuki Nakanishi and   
                  Tsutomu Sasao   On a wideband Fast Fourier Transform for
                                  a radio telescope  . . . . . . . . . . . 46--51
                 Cheng Ling and   
             Khaled Benkrid and   
                Tsuyoshi Hamada   High performance phylogenetic analysis
                                  on CUDA-compatible GPUs  . . . . . . . . 52--57
               Colin Yu Lin and   
             Hayden Kwok-Hay So   Energy-efficient dataflow computations
                                  on FPGAs using application-specific
                                  coarse-grain architecture synthesis  . . 58--63
      Jamshaid Sarwar Malik and   
            Paolo Palazzari and   
                   Ahmed Hemani   Effort, resources, and abstraction vs
                                  performance in high-level synthesis:
                                  finding new answers to an old question   64--69
           Takeshi Kakimoto and   
               Keisuke Dohi and   
           Yuichiro Shibata and   
                  Kiyoshi Oguri   Performance comparison of GPU
                                  programming frameworks with the striped
                                  Smith--Waterman algorithm  . . . . . . . 70--75
             Julien Tribino and   
             Antoine Trouvé and   
          Hadrien A. Clarke and   
            Kazuaki J. Murakami   PASTIS: a photonic arbitration with
                                  scalable token injection scheme  . . . . 76--81
          Takahiro Watanabe and   
                Minoru Watanabe   $ 0.18 \mu $ m CMOS process
                                  high-sensitivity optically
                                  reconfigurable gate array VLSI . . . . . 82--86
               Shogo Nakaya and   
            Makoto Miyamura and   
            Noboru Sakimura and   
            Yuichi Nakamura and   
           Tadahiko Sugibayashi   A non-volatile reconfigurable offloader
                                  for wireless sensor nodes  . . . . . . . 87--92
                   Mark Thorson   Internet nuggets . . . . . . . . . . . . 93--112

ACM SIGARCH Computer Architecture News
Volume 41, Number 1, March, 2013

                   Michael Bond   GPUDet: a deterministic GPU architecture 1--12
                Hyojin Sung and   
         Rakesh Komuravelli and   
                 Sarita V. Adve   DeNovoND: efficient hardware support for
                                  disciplined non-determinism  . . . . . . 13--26
            Benjamin Wester and   
            David Devecsery and   
              Peter M. Chen and   
                Jason Flinn and   
            Satish Narayanasamy   Parallelizing data race detection  . . . 27--38
              Brandon Lucia and   
                      Luis Ceze   Cooperative empirical failure avoidance
                                  for multithreaded programs . . . . . . . 39--50
                Íñigo Goiri and   
             William Katsak and   
                    Kien Le and   
              Thu D. Nguyen and   
              Ricardo Bianchini   Parasol and GreenSwitch: managing
                                  datacenters powered by renewable energy  51--64
                   Kai Shen and   
         Arrvindh Shriraman and   
          Sandhya Dwarkadas and   
                 Xiao Zhang and   
                     Zhuan Chen   Power containers: an OS facility for
                                  fine-grained power and energy management
                                  on multicore servers . . . . . . . . . . 65--76
       Christina Delimitrou and   
             Christos Kozyrakis   Paragon: QoS-aware scheduling for
                                  heterogeneous datacenters  . . . . . . . 77--88
               Lingjia Tang and   
                 Jason Mars and   
                   Wei Wang and   
                 Tanima Dey and   
                 Mary Lou Soffa   ReQoS: reactive static/dynamic
                                  compilation for QoS in warehouse scale
                                  computers  . . . . . . . . . . . . . . . 89--100
                Joy Arulraj and   
              Po-Chun Chang and   
               Guoliang Jin and   
                        Shan Lu   Production-run software failure
                                  diagnosis via hardware performance
                                  counters . . . . . . . . . . . . . . . . 101--112
                  Wei Zhang and   
             Marc de Kruijf and   
                     Ang Li and   
                    Shan Lu and   
      Karthikeyan Sankaralingam   ConAir: featherweight concurrency bug
                                  recovery via single-threaded idempotent
                                  execution  . . . . . . . . . . . . . . . 113--126
            Nicolas Viennot and   
             Siddharth Nair and   
                     Jason Nieh   Transparent mutable replay for multicore
                                  debugging and patch validation . . . . . 127--138
         Swarup Kumar Sahoo and   
              John Criswell and   
               Chase Geigle and   
                    Vikram Adve   Using likely invariants for automated
                                  software fault localization  . . . . . . 139--152
                    Eric Paulos   The rise of the expert amateur: DIY
                                  culture and the evolution of computer
                                  science  . . . . . . . . . . . . . . . . 153--154
              Arun Raghavan and   
             Laurel Emurian and   
                   Lei Shao and   
       Marios Papaefthymiou and   
              Kevin P. Pipe and   
          Thomas F. Wenisch and   
              Milo M. K. Martin   Computational sprinting on a
                                  hardware/software testbed  . . . . . . . 155--166
                 Wonsun Ahn and   
                 Yuelu Duan and   
                Josep Torrellas   DeAliaser: alias speculation using
                                  atomic region support  . . . . . . . . . 167--180
               Heekwon Park and   
              Seungjae Baek and   
               Jongmoo Choi and   
                Donghee Lee and   
                     Sam H. Noh   Regularities considered harmful: forcing
                                  randomness to memory accesses to reduce
                                  row buffer conflicts for multi-core,
                                  multi-bank systems . . . . . . . . . . . 181--192
             Nima Honarmand and   
          Nathan Dautenhahn and   
            Josep Torrellas and   
             Samuel T. King and   
               Gilles Pokam and   
              Cristiano Pereira   Cyrus: unintrusive application-level
                                  record-replay for replay parallelism . . 193--206
   Augusto Born de Oliveira and   
     Sebastian Fischmeister and   
                 Amer Diwan and   
         Matthias Hauswirth and   
               Peter F. Sweeney   Why you should care about quantile
                                  regression . . . . . . . . . . . . . . . 207--218
         Charlie Curtsinger and   
                Emery D. Berger   STABILIZER: statistically sound
                                  performance evaluation . . . . . . . . . 219--228
               Lokesh Gidra and   
                Gaël Thomas and   
              Julien Sopena and   
                   Marc Shapiro   A study of the scalability of
                                  stop-the-world garbage collectors on
                                  multicores . . . . . . . . . . . . . . . 229--240
         Daniel S. McFarlin and   
             Charles Tucker and   
                   Craig Zilles   Discerning the dominant out-of-order
                                  performance advantage: is it speculation
                                  or dynamism? . . . . . . . . . . . . . . 241--252
          Stephen Checkoway and   
                  Hovav Shacham   Iago attacks: why the system call API is
                                  a bad untrusted RPC interface  . . . . . 253--264
            Owen S. Hofmann and   
                Sangman Kim and   
               Alan M. Dunn and   
             Michael Z. Lee and   
                 Emmett Witchel   InkTag: secure applications on an
                                  untrusted operating system . . . . . . . 265--278
        Cristiano Giuffrida and   
             Anton Kuijsten and   
            Andrew S. Tanenbaum   Safe and automatic live update for
                                  operating systems  . . . . . . . . . . . 279--292
                 Haohui Mai and   
                  Edgar Pek and   
                    Hui Xue and   
       Samuel Talmadge King and   
       Parthasarathy Madhusudan   Verifying security invariants in
                                  ExpressOS  . . . . . . . . . . . . . . . 293--304
              Eric Schkufza and   
               Rahul Sharma and   
                     Alex Aiken   Stochastic superoptimization . . . . . . 305--316
               Eric Schulte and   
         Jonathan DiLorenzo and   
             Westley Weimer and   
              Stephanie Forrest   Automated repair of binary and assembly
                                  programs for cooperating embedded
                                  devices  . . . . . . . . . . . . . . . . 317--328
                 Heming Cui and   
                    Gang Hu and   
                 Jingyue Wu and   
                   Junfeng Yang   Verifying systems rules using
                                  rule-directed symbolic execution . . . . 329--342
               Xiaoya Xiang and   
                  Chen Ding and   
                    Hao Luo and   
                        Bin Bao   HOTL: a higher order theory of locality  343--356
                   Hui Kang and   
               Jennifer L. Wong   To hardware prefetch or not to
                                  prefetch?: a virtualized environment
                                  study and core binding approach  . . . . 357--368
                 Hwanju Kim and   
               Sangwook Kim and   
               Jinkyu Jeong and   
                Joonwon Lee and   
               Seungryoul Maeng   Demand-based coordinated scheduling for
                                  SMP VMs  . . . . . . . . . . . . . . . . 369--380
            Mohammad Dashti and   
         Alexandra Fedorova and   
             Justin Funston and   
                Fabien Gaud and   
            Renaud Lachaize and   
            Baptiste Lepers and   
               Vivien Quema and   
                      Mark Roth   Traffic management: a holistic approach
                                  to memory placement on NUMA systems  . . 381--394
                 Adwait Jog and   
               Onur Kayiran and   
Nachiappan Chidambaram Nachiappan and   
             Asit K. Mishra and   
         Mahmut T. Kandemir and   
                 Onur Mutlu and   
           Ravishankar Iyer and   
                   Chita R. Das   OWL: cooperative thread array aware
                                  scheduling techniques for improving
                                  GPGPU performance  . . . . . . . . . . . 395--406
              Sreepathi Pai and   
 Matthew J. Thazhuthaveetil and   
                R. Govindarajan   Improving GPGPU concurrency with elastic
                                  kernels  . . . . . . . . . . . . . . . . 407--418
                 Taewook Oh and   
                 Hanjun Kim and   
            Nick P. Johnson and   
                 Jae W. Lee and   
                David I. August   Practical automatic loop specialization  419--430
Phitchaya Mangpo Phothilimthana and   
                Jason Ansel and   
      Jonathan Ragan-Kelley and   
              Saman Amarasinghe   Portable performance on heterogeneous
                                  architectures  . . . . . . . . . . . . . 431--444
             Aashish Mittal and   
            Dushyant Bansal and   
               Sorav Bansal and   
                    Varun Sethi   Efficient virtualization on embedded
                                  Power Architecture® platforms  . . . . . 445--458
                   Mark D. Hill   Research directions for 21st century
                                  computer systems: ASPLOS 2013 panel  . . 459--460
          Anil Madhavapeddy and   
            Richard Mortier and   
         Charalampos Rotsos and   
                David Scott and   
               Balraj Singh and   
          Thomas Gazagnaire and   
               Steven Smith and   
                Steven Hand and   
                  Jon Crowcroft   Unikernels: library operating systems
                                  for the cloud  . . . . . . . . . . . . . 461--472
                 Asim Kadav and   
      Matthew J. Renzelmann and   
               Michael M. Swift   Fine-grained fault tolerance using
                                  device checkpoints . . . . . . . . . . . 473--484
           Mark Silberstein and   
                 Bryan Ford and   
                Idit Keidar and   
                 Emmett Witchel   GPUfs: integrating a file system with
                                  GPUs . . . . . . . . . . . . . . . . . . 485--498
              Nicholas Hunt and   
                 Tom Bergan and   
                  Luis Ceze and   
              Steven D. Gribble   DDOS: taming nondeterminism in
                                  distributed systems  . . . . . . . . . . 499--508
                 Cheng Wang and   
                     Youfeng Wu   TSO_ATOMICITY: efficient hardware
                                  primitive for TSO-preserving region
                                  optimizations  . . . . . . . . . . . . . 509--520
        Syed Ali Raza Jafri and   
        Gwendolyn Voskuilen and   
               T. N. Vijaykumar   Wait-n-GoTM: improving HTM performance
                                  by serializing cyclic dependencies . . . 521--534
                Xuehai Qian and   
            Josep Torrellas and   
         Benjamin Sahelices and   
                     Depei Qian   Volition: scalable and precise
                                  sequential consistency violation
                                  detection  . . . . . . . . . . . . . . . 535--548
             J. P. Grossman and   
          Jeffrey S. Kuskin and   
             Joseph A. Bank and   
           Michael Theobald and   
                Ron O. Dror and   
         Douglas J. Ierardi and   
          Richard H. Larson and   
             U. Ben Schafer and   
               Brian Towles and   
                Cliff Young and   
                  David E. Shaw   Hardware support for fine-grained
                                  event-driven computation in Anton 2  . . 549--560

ACM SIGARCH Computer Architecture News
Volume 41, Number 2, May, 2013

             Amitabha Sinha and   
             Mitrava Sarkar and   
          Soumojit Acharyya and   
           Suranjan Chakraborty   A novel reconfigurable architecture of a
                                  DSP processor for efficient mapping of
                                  DSP functions using field programmable
                                  DSP arrays . . . . . . . . . . . . . . . 1--8
                Amrita Saha and   
        Manideepa Mukherjee and   
            Debanjana Datta and   
               Sangita Saha and   
                 Amitabha Sinha   Performance analysis of a FPGA based
                                  novel binary and DBNS multiplier . . . . 9--16
        Michael Sartin-Tarm and   
              Tony Nowatzki and   
           Lorenzo De Carli and   
  Karthikeyan Sankaralingam and   
                 Cristian Estan   Constraint centric scheduling guide  . . 17--21
                 Apala Guha and   
                  Yao Zhang and   
           Raihan ur Rasool and   
                Andrew A. Chien   Systematic evaluation of workload
                                  clustering for extremely
                                  energy-efficient architectures . . . . . 22--29
                Amrita Saha and   
              Pijush Biswas and   
                 Amitabha Sinha   An integrated development platform of a
                                  reconfigurable radio processor for
                                  software defined radio . . . . . . . . . 30--35
                Santanu Pal and   
             Amitabha Sinha and   
                  Pijush Biswas   FPGA implementation of a novel DCT
                                  architecture reducing constant cosine
                                  terms  . . . . . . . . . . . . . . . . . 36--40
              Kuo-Kun Tseng and   
                 Fu-Fu Zeng and   
            Huang-Nan Huang and   
                 Yiming Liu and   
            Jeng-Shyang Pan and   
                   W. H. Ip and   
                       C. H. Wu   A new non-exact Aho--Corasick framework
                                  for ECG classification . . . . . . . . . 41--46
           Subhashis Maitra and   
                 Amitabha Sinha   High performance MAC unit for DSP and
                                  cryptographic applications . . . . . . . 47--55
                   Mark Thorson   Internet nuggets . . . . . . . . . . . . 56--71

ACM SIGARCH Computer Architecture News
Volume 41, Number 3, June, 2013

              Bilel Belhadj and   
            Antoine Joubert and   
                   Zheng Li and   
            Rodolphe Héliot and   
                  Olivier Temam   Continuous real-world inputs can open up
                                  alternative accelerator designs  . . . . 1--12
              Paula Petrica and   
        Adam M. Izraelevitz and   
          David H. Albonesi and   
         Christine A. Shoemaker   Flicker: a dynamically adaptive
                                  architecture for power limited multicore
                                  systems  . . . . . . . . . . . . . . . . 13--23
             Wajahat Qadeer and   
               Rehan Hameed and   
               Ofer Shacham and   
         Preethi Venkatesan and   
         Christos Kozyrakis and   
               Mark A. Horowitz   Convolution engine: balancing efficiency
                                  & flexibility in specialized computing    24--35
                  Kevin Lim and   
              David Meisner and   
               Ali G. Saidi and   
  Parthasarathy Ranganathan and   
              Thomas F. Wenisch   Thin servers with smart pipes: designing
                                  SoC accelerators for memcached . . . . . 36--47
            Janani Mukundan and   
             Hillery Hunter and   
              Kyu-hyoun Kim and   
          Jeffrey Stuecheli and   
               José F. Martínez   Understanding and mitigating refresh
                                  overheads in high-density DDR4 DRAM
                                  systems  . . . . . . . . . . . . . . . . 48--59
                  Jamie Liu and   
                 Ben Jaiyen and   
                 Yoongu Kim and   
            Chris Wilkerson and   
                     Onur Mutlu   An experimental study of data retention
                                  behavior in modern DRAM devices:
                                  implications for retention time
                                  profiling mechanisms . . . . . . . . . . 60--71
           Prashant J. Nair and   
               Dae-Hyun Kim and   
           Moinuddin K. Qureshi   ArchShield: architectural framework for
                                  assisting DRAM scaling by tolerating
                                  high error rates . . . . . . . . . . . . 72--83
              Saugata Ghose and   
                Hyodong Lee and   
               José F. Martínez   Improving memory scheduling via
                                  processor-side load criticality
                                  information  . . . . . . . . . . . . . . 84--95
               Canturk Isci and   
           Suzanne McIntosh and   
            Jeffrey Kephart and   
               Rajarshi Das and   
               James Hanson and   
                Scott Piper and   
             Robert Wolford and   
                Thomas Brey and   
             Robert Kantner and   
                   Allen Ng and   
               James Norris and   
           Abdoulaye Traore and   
               Michael Frissora   Agile, efficient virtualization power
                                  management with low-latency server power
                                  states . . . . . . . . . . . . . . . . . 96--107
              Cheng-Chun Tu and   
              Chao-tang Lee and   
                Tzi-cker Chiueh   Secure I/O device sharing among virtual
                                  machines on multiple hosts . . . . . . . 108--119
              Xiaotao Chang and   
            Hubertus Franke and   
                      Yi Ge and   
                    Tao Liu and   
                   Kun Wang and   
               Jimi Xenidis and   
                   Fei Chen and   
                       Yu Zhang   Improving virtualization in the presence
                                  of software managed translation
                                  lookaside buffers  . . . . . . . . . . . 120--129
                     Ji Kim and   
          Christopher Torng and   
           Shreesha Srinath and   
             Derek Lockhart and   
             Christopher Batten   Microarchitectural mechanisms to exploit
                                  value structure in SIMT architectures    130--141
         Angshuman Parashar and   
           Michael Pellauer and   
              Michael Adler and   
               Bushra Ahsan and   
                 Neal Crago and   
              Daniel Lustig and   
            Vladimir Pavlov and   
               Antonia Zhai and   
              Mohit Gambhir and   
               Aamer Jaleel and   
               Randy Allmon and   
              Rachid Rayess and   
             Stephen Maresh and   
                      Joel Emer   Triggered instructions: a control
                                  paradigm for spatially-programmed
                                  architectures  . . . . . . . . . . . . . 142--153
               José A. Joao and   
           M. Aater Suleman and   
                 Onur Mutlu and   
                   Yale N. Patt   Utility-based acceleration of
                                  multithreaded applications on asymmetric
                                  CMPs . . . . . . . . . . . . . . . . . . 154--165
              Daniel Kudrow and   
               Kenneth Bier and   
               Zhaoxia Deng and   
             Diana Franklin and   
                  Yu Tomita and   
           Kenneth R. Brown and   
              Frederic T. Chong   Quantum rotations: a case study in
                                  static and dynamic machine-code
                                  generation for quantum computers . . . . 166--176
          Richard A. Muscat and   
              Karin Strauss and   
                  Luis Ceze and   
                   Georg Seelig   DNA-based molecular architecture with
                                  spatially localized components . . . . . 177--188
                   Qing Guo and   
               Xiaochen Guo and   
                 Ravi Patel and   
                 Engin Ipek and   
                Eby G. Friedman   AC-DIMM: associative computing with
                                  STT-MRAM . . . . . . . . . . . . . . . . 189--200
          Blake A. Hechtman and   
                Daniel J. Sorin   Exploring memory consistency for
                                  massively-threaded throughput-oriented
                                  processors . . . . . . . . . . . . . . . 201--212
                 Yuelu Duan and   
           Abdullah Muzahid and   
                Josep Torrellas   WeeFence: toward making fences free in
                                  TSO  . . . . . . . . . . . . . . . . . . 213--224
             Harold W. Cain and   
           Maged M. Michael and   
                  Brad Frey and   
                  Cathy May and   
             Derek Williams and   
                        Hung Le   Robust architectural support for
                                  transactional memory in the Power
                                  architecture . . . . . . . . . . . . . . 225--236
             Arkaprava Basu and   
             Jayneel Gandhi and   
              Jichuan Chang and   
               Mark D. Hill and   
               Michael M. Swift   Efficient virtual memory for big memory
                                  servers  . . . . . . . . . . . . . . . . 237--248
                    Lisa Wu and   
          Raymond J. Barker and   
              Martha A. Kim and   
                Kenneth A. Ross   Navigating big data with
                                  high-throughput, energy-efficient data
                                  partitioning . . . . . . . . . . . . . . 249--260
              Eric S. Chung and   
              John D. Davis and   
                     Jaewon Lee   LINQits: big data on little clients  . . 261--272
                 Islam Atta and   
                Pinar Tözün and   
                   Xin Tong and   
         Anastasia Ailamaki and   
               Andreas Moshovos   STREX: boosting instruction cache reuse
                                  in OLTP workloads through stratified
                                  transaction execution  . . . . . . . . . 273--284
               Indrani Paul and   
             Srilatha Manne and   
               Manish Arora and   
           W. Lloyd Bircher and   
          Sudhakar Yalamanchili   Cooperative boosting: needy versus
                                  greedy power management  . . . . . . . . 285--296
                 Anys Bacha and   
                Radu Teodorescu   Dynamic reduction of voltage margins by
                                  leveraging on-chip ECC in Itanium II
                                  processors . . . . . . . . . . . . . . . 297--307
                 Henry Cook and   
              Miquel Moreto and   
                 Sarah Bird and   
                  Khanh Dao and   
         David A. Patterson and   
                 Krste Asanovic   A hardware evaluation of cache
                                  partitioning to improve utilization and
                                  energy-efficiency while preserving
                                  responsiveness . . . . . . . . . . . . . 308--319
             Reetuparna Das and   
        Satish Narayanasamy and   
         Sudhir K. Satpathy and   
           Ronald G. Dreslinski   Catnap: energy proportional multiple
                                  network-on-chip  . . . . . . . . . . . . 320--331
                 Adwait Jog and   
               Onur Kayiran and   
             Asit K. Mishra and   
         Mahmut T. Kandemir and   
                 Onur Mutlu and   
           Ravishankar Iyer and   
                   Chita R. Das   Orchestrated scheduling and prefetching
                                  for GPGPUs . . . . . . . . . . . . . . . 332--343
               Naifeng Jing and   
                   Yao Shen and   
                     Yao Lu and   
        Shrikanth Ganapathy and   
                Zhigang Mao and   
                  Minyi Guo and   
                Ramon Canal and   
                  Xiaoyao Liang   An energy-efficient and scalable
                                  eDRAM-based register file architecture
                                  for GPGPU  . . . . . . . . . . . . . . . 344--355
                 Minsoo Rhu and   
                    Mattan Erez   Maximizing SIMD resource utilization in
                                  GPGPUs with SIMD lane permutation  . . . 356--367
        Aniruddha S. Vaidya and   
          Anahita Shayesteh and   
              Dong Hyuk Woo and   
                Roy Saharoy and   
                     Mani Azimi   SIMD divergence optimization through
                                  intra-warp compaction  . . . . . . . . . 368--379
             Young Hoon Son and   
                 O. Seongil and   
                  Yuhwan Ro and   
                 Jae W. Lee and   
                    Jung Ho Ahn   Reducing memory access latency with
                                  asymmetric DRAM bank organizations . . . 380--391
                   Ziyi Liu and   
               JongHyuk Lee and   
               Junyuan Zeng and   
               Yuanfeng Wen and   
               Zhiqiang Lin and   
                    Weidong Shi   CPU transparent protection of OS kernel
                                  and hypervisor integrity with
                                  programmable DRAM  . . . . . . . . . . . 392--403
            Djordje Jevdjic and   
              Stavros Volos and   
                  Babak Falsafi   Die-stacked DRAM caches for servers: hit
                                  ratio, latency, or bandwidth? Have it
                                  all with footprint cache . . . . . . . . 404--415
               Jaewoong Sim and   
             Gabriel H. Loh and   
            Vilas Sridharan and   
                  Mike O'Connor   Resilient die-stacked DRAM caches  . . . 416--427
                      Yu Du and   
                  Miao Zhou and   
          Bruce R. Childers and   
               Daniel Mossé and   
                    Rami Melhem   Bit mapping for balanced PCM cell
                                  programming  . . . . . . . . . . . . . . 428--439
              Nak Hee Seong and   
                Sungkap Yeo and   
              Hsien-Hsin S. Lee   Tri-level-cell phase change memory:
                                  toward an efficient and reliable memory
                                  system . . . . . . . . . . . . . . . . . 440--451
            Rodolfo Azevedo and   
              John D. Davis and   
              Karin Strauss and   
          Parikshit Gopalan and   
               Mark Manasse and   
                Sergey Yekhanin   Zombie memory: extending memory lifetime
                                  by reviving dead blocks  . . . . . . . . 452--463
        Adrian M. Caulfield and   
                 Steven Swanson   QuickSAN: a storage area network for
                                  fast, distributed, solid state disks . . 464--474
             Daniel Sanchez and   
             Christos Kozyrakis   ZSim: fast and accurate
                                  microarchitectural simulation of
                                  thousand-core systems  . . . . . . . . . 475--486
               Jingwen Leng and   
        Tayler Hetherington and   
            Ahmed ElTantawy and   
                Syed Gilani and   
               Nam Sung Kim and   
              Tor M. Aamodt and   
             Vijay Janapa Reddi   GPUWattch: enabling energy optimizations
                                  in GPGPUs  . . . . . . . . . . . . . . . 487--498
                 Meng-Ju Wu and   
                Minshu Zhao and   
                   Donald Yeung   Studying multicore processor scaling via
                                  reuse distance analysis  . . . . . . . . 499--510
            Kristof Du Bois and   
              Stijn Eyerman and   
         Jennifer B. Sartor and   
                Lieven Eeckhout   Criticality stacks: identifying critical
                                  threads in parallel programs using
                                  synchronization behavior . . . . . . . . 511--522
              George Kurian and   
                  Omer Khan and   
               Srinivas Devadas   The locality-aware adaptive cache
                                  coherence protocol . . . . . . . . . . . 523--534
           Stefanos Kaxiras and   
                    Alberto Ros   A new perspective for efficient
                                  virtual-cache coherence  . . . . . . . . 535--546
              Hongzhou Zhao and   
         Arrvindh Shriraman and   
            Snehasish Kumar and   
              Sandhya Dwarkadas   Protozoa: adaptive granularity cache
                                  coherence  . . . . . . . . . . . . . . . 547--558
                 John Demme and   
            Matthew Maycock and   
              Jared Schmitz and   
                Adrian Tang and   
               Adam Waksman and   
        Simha Sethumadhavan and   
               Salvatore Stolfo   On the feasibility of online malware
                                  detection with performance counters  . . 559--570
                   Ling Ren and   
                Xiangyao Yu and   
    Christopher W. Fletcher and   
            Marten van Dijk and   
               Srinivas Devadas   Design space exploration and
                                  optimization of path oblivious RAM in
                                  secure processors  . . . . . . . . . . . 571--582
        Hassan M. G. Wassel and   
                   Ying Gao and   
             Jason K. Oberg and   
               Ted Huffmire and   
               Ryan Kastner and   
          Frederic T. Chong and   
               Timothy Sherwood   SurfNoC: a low latency and provably
                                  non-interfering approach to secure
                                  networks-on-chip . . . . . . . . . . . . 583--594
                    Di Wang and   
              Chuangang Ren and   
          Anand Sivasubramaniam   Virtualizing power distribution in
                                  datacenters  . . . . . . . . . . . . . . 595--606
               Hailong Yang and   
               Alex Breslow and   
                 Jason Mars and   
                   Lingjia Tang   Bubble-Flux: precise online QoS
                                  management for increased utilization in
                                  warehouse scale computers  . . . . . . . 607--618
                 Jason Mars and   
                   Lingjia Tang   Whare-map: heterogeneity in
                                  ``homogeneous'' warehouse-scale
                                  computers  . . . . . . . . . . . . . . . 619--630
              Nikos Foutris and   
        Dimitris Gizopoulos and   
                Xavier Vera and   
               Antonio Gonzalez   Deconfigurable microprocessor
                                  architectures for silicon debug
                                  acceleration . . . . . . . . . . . . . . 631--642
               Gilles Pokam and   
                Klaus Danne and   
          Cristiano Pereira and   
                 Rolf Kassa and   
                Tim Kranich and   
                Shiliang Hu and   
         Justin Gottschlich and   
             Nima Honarmand and   
          Nathan Dautenhahn and   
             Samuel T. King and   
                Josep Torrellas   QuickRec: prototyping an Intel
                                  architecture extension for record and
                                  replay of multithreaded programs . . . . 643--654
               Ruirui Huang and   
               Erik Halberg and   
                  G. Edward Suh   Non-race concurrency bug detection
                                  through order-sensitive critical
                                  sections . . . . . . . . . . . . . . . . 655--666

ACM SIGARCH Computer Architecture News
Volume 41, Number 4, September, 2013

           Subhashis Maitra and   
                 Amitabha Sinha   High efficiency MAC unit used in digital
                                  signal processing and elliptic curve
                                  cryptography . . . . . . . . . . . . . . 1--7
          Tomislav Janjusic and   
                   Krishna Kavi   Gleipnir: a memory profiling and tracing
                                  tool . . . . . . . . . . . . . . . . . . 8--12
                   Mark Thorson   Internet nuggets . . . . . . . . . . . . 13--22

ACM SIGARCH Computer Architecture News
Volume 41, Number 5, December, 2013

                    Ivan Godard   The Mill: split-stream encoding  . . . . 1--5
            Alexander Thomasian   Disk arrays with multiple RAID levels    6--24
           Subhashis Maitra and   
                 Amitabha Sinha   Design and simulation of MAC unit using
                                  combinational circuit and adder  . . . . 25--33
          Thomas C. P. Chau and   
           James S. Targett and   
        Marlon Wijeyasinghe and   
                  Wayne Luk and   
         Peter Y. K. Cheung and   
              Benjamin Cope and   
                Alison Eele and   
                Jan Maciejowski   Accelerating sequential Monte Carlo
                                  method for real-time air traffic
                                  management . . . . . . . . . . . . . . . 35--40
              Atabak Mahram and   
             Martin C. Herbordt   NCBI BLASTP on the Convey HC1-EX . . . . 41--46
               Kentaro Sano and   
              Yoshiaki Kono and   
              Hayato Suzuki and   
              Ryotaro Chiba and   
                    Ryo Ito and   
              Tomohiro Ueno and   
                Kyo Koizumi and   
                Satoru Yamamoto   Efficient custom computing of
                                  fully-streamed lattice Boltzmann method
                                  on tightly-coupled FPGA cluster  . . . . 47--52
         Wim Vanderbauwhede and   
               Anton Frolov and   
   Sai Rahul Chalamalasetti and   
                 Martin Margala   A hybrid CPU--FPGA system for high
                                  throughput (10Gb/s) streaming document
                                  classification . . . . . . . . . . . . . 53--58
                     Ce Guo and   
                  Wayne Luk and   
      Ekaterina Vinkovskaya and   
                      Rama Cont   Customisable pipelined engine for
                                  intensity evaluation in multivariate
                                  Hawkes point processes . . . . . . . . . 59--64
             Heiner Giefers and   
           Christian Plessl and   
                  Jens Förstner   Accelerating finite difference time
                                  domain simulations with reconfigurable
                                  dataflow computers . . . . . . . . . . . 65--70
                 Yuki Ogawa and   
              Masahiro Iida and   
           Motoki Amagasaki and   
              Morihiro Kuga and   
             Toshinori Sueyoshi   A reconfigurable Java accelerator with
                                  software compatibility for embedded
                                  systems  . . . . . . . . . . . . . . . . 71--76
             Takeshi Ohkawa and   
              Daichi Uetake and   
             Takashi Yokota and   
            Kanemitsu Ootsu and   
                  Takanobu Baba   Reconfigurable and hardwired ORB engine
                                  on FPGA by Java-to-HDL synthesizer for
                                  realtime application . . . . . . . . . . 77--82
        Florent de Dinechin and   
               Matei Istoan and   
              Guillaume Sergent   Fixed-point trigonometric functions on
                                  FPGAs  . . . . . . . . . . . . . . . . . 83--88
                     Jubee Tada   Performance evaluation of $3$-D stacked
                                  $32$-bit parallel multipliers  . . . . . 89--94
           Yuichiroh Tanaka and   
               Shimpei Sato and   
                     Kenji Kise   The UltraSmall soft processor  . . . . . 95--100
               Liucheng Guo and   
            David B. Thomas and   
                      Wayne Luk   Customisable architectures for the set
                                  covering problem . . . . . . . . . . . . 101--106
            Gary Plumbridge and   
               Jack Whitham and   
                   Neil Audsley   Blueshell: a platform for rapid
                                  prototyping of multiprocessor NoCs and
                                  accelerators . . . . . . . . . . . . . . 107--112
                 Chuan Hong and   
             Khaled Benkrid and   
                 Nazrin Isa and   
                  Xabier Iturbe   A run-time reconfigurable system for
                                  adaptive high performance efficient
                                  computing  . . . . . . . . . . . . . . . 113--118
                   Mark Thorson   Internet nuggets . . . . . . . . . . . . 119--127

ACM SIGARCH Computer Architecture News
Volume 42, Number 1, March, 2014

                       Al Davis   Inside Windows Azure: the challenges and
                                  opportunities of a cloud operating
                                  system . . . . . . . . . . . . . . . . . 1--2
           Stanko Novakovic and   
          Alexandros Daglis and   
            Edouard Bugnion and   
              Babak Falsafi and   
                     Boris Grot   Scale-out NUMA . . . . . . . . . . . . . 3--18
         Sandeep R. Agrawal and   
            Valentin Pistol and   
                   Jun Pang and   
                  John Tran and   
               David Tarjan and   
                Alvin R. Lebeck   Rhythm: harnessing data parallel
                                  hardware for server workloads  . . . . . 19--34
             Mehrzad Samadi and   
    Davoud Anoushe Jamshidi and   
              Janghaeng Lee and   
                   Scott Mahlke   Paraprox: pattern-based approximation
                                  for data parallel applications . . . . . 35--50
             James Bornholt and   
             Todd Mytkowicz and   
            Kathryn S. McKinley   Uncertain$<$T$>$: a first-order type for
                                  uncertain data . . . . . . . . . . . . . 51--66
                Nuno Santos and   
               Himanshu Raj and   
              Stefan Saroiu and   
                    Alec Wolman   Using ARM TrustZone to build a trusted
                                  language runtime for mobile applications 67--80
              John Criswell and   
          Nathan Dautenhahn and   
                    Vikram Adve   Virtual Ghost: protecting applications
                                  from hostile operating systems . . . . . 81--96
                     Xun Li and   
            Vineeth Kashyap and   
             Jason K. Oberg and   
               Mohit Tiwari and   
   Vasanth Ram Rajarathinam and   
               Ryan Kastner and   
           Timothy Sherwood and   
              Ben Hardekopf and   
              Frederic T. Chong   Sapper: a language for hardware-level
                                  security policy enforcement  . . . . . . 97--112
               Radu Banabic and   
              George Candea and   
               Rachid Guerraoui   Finding Trojan message vulnerabilities
                                  in distributed systems . . . . . . . . . 113--126
       Christina Delimitrou and   
             Christos Kozyrakis   Quasar: resource-efficient and QoS-aware
                                  cluster management . . . . . . . . . . . 127--144
         Seyed Majid Zahedi and   
                Benjamin C. Lee   REF: resource elasticity fairness with
                                  sharing incentives for multiprocessors   145--160
Thannirmalai Somu Muthukaruppan and   
              Anuj Pathania and   
                   Tulika Mitra   Price theory based power management for
                                  heterogeneous multi-cores  . . . . . . . 161--176
                    Di Wang and   
            Sriram Govindan and   
      Anand Sivasubramaniam and   
                Aman Kansal and   
                    Jie Liu and   
             Badriddine Khessib   Underprovisioning backup power
                                  infrastructure for datacenters . . . . . 177--192
                    Xiao Yu and   
                    Shi Han and   
              Dongmei Zhang and   
                        Tao Xie   Comprehending performance from
                                  real-world execution traces: a
                                  device-driver case . . . . . . . . . . . 193--206
                Joy Arulraj and   
               Guoliang Jin and   
                        Shan Lu   Leveraging the short-term memory of
                                  hardware to diagnose production-run
                                  software failures  . . . . . . . . . . . 207--222
             Nima Honarmand and   
                Josep Torrellas   RelaxReplay: record and replay for
                                  relaxed-consistency multiprocessors  . . 223--238
               Stefan Bucur and   
            Johannes Kinder and   
                  George Candea   Prototyping symbolic execution engines
                                  for interpreted languages  . . . . . . . 239--254
                    Lisa Wu and   
           Andrea Lottarini and   
           Timothy K. Paine and   
              Martha A. Kim and   
                Kenneth A. Ross   Q100: the architecture and design of a
                                  database processing unit . . . . . . . . 255--268
               Tianshi Chen and   
                  Zidong Du and   
                Ninghui Sun and   
                   Jia Wang and   
               Chengyong Wu and   
                 Yunji Chen and   
                  Olivier Temam   DianNao: a small-footprint
                                  high-throughput accelerator for
                                  ubiquitous machine-learning  . . . . . . 269--284
          Felix Xiaozhu Lin and   
                  Zhen Wang and   
                      Lin Zhong   K2: a mobile operating system for
                                  heterogeneous coherence domains  . . . . 285--300
     Konstantinos Menychtas and   
                   Kai Shen and   
               Michael L. Scott   Disengaged scheduling for fair,
                                  protected access to fast computational
                                  accelerators . . . . . . . . . . . . . . 301--316
                  Jeff Gehlhaar   Neuromorphic processing: a new frontier
                                  in scaling computer architecture . . . . 317--318
         Ardalan Amiri Sani and   
                 Kevin Boos and   
                 Shaopu Qin and   
                      Lin Zhong   I/O paravirtualization at the device
                                  file boundary  . . . . . . . . . . . . . 319--332
           Christoffer Dall and   
                     Jason Nieh   KVM/ARM: the design and
                                  implementation of the Linux ARM
                                  hypervisor . . . . . . . . . . . . . . . 333--348
                 Nadav Amit and   
                Dan Tsafrir and   
                 Assaf Schuster   VSwapper: a memory swapper for
                                  virtualized environments . . . . . . . . 349--366
              Jeremy Andrus and   
        Alexander Van't Hof and   
              Naser AlDuaij and   
           Christoffer Dall and   
            Nicolas Viennot and   
                     Jason Nieh   Cider: native execution of iOS apps on
                                  Android  . . . . . . . . . . . . . . . . 367--382
                Heiner Litz and   
             David Cheriton and   
         Amin Firoozshahian and   
                 Omid Azizi and   
              John P. Stevenson   SI-TM: reducing transactional memory
                                  abort rates through snapshot isolation   383--398
                Wenjia Ruan and   
                Trilok Vyas and   
                  Yujie Liu and   
                  Michael Spear   Transactionalizing legacy code: an
                                  experience report using GCC and
                                  Memcached  . . . . . . . . . . . . . . . 399--412
              Adam Morrison and   
                    Yehuda Afek   Fence-free work stealing on bounded TSO
                                  processors . . . . . . . . . . . . . . . 413--426
             Derek R. Hower and   
          Blake A. Hechtman and   
       Bradford M. Beckmann and   
         Benedict R. Gaster and   
               Mark D. Hill and   
        Steven K. Reinhardt and   
                  David A. Wood   Heterogeneous-race-free memory models    427--440
             Myoungsoo Jung and   
                 Wonil Choi and   
                 John Shalf and   
         Mahmut Taylan Kandemir   Triple-A: a Non-SSD based autonomic
                                  all-flash array for high performance
                                  storage systems  . . . . . . . . . . . . 441--454
               Ren-Shuo Liu and   
                 De-Yu Shen and   
              Chia-Lin Yang and   
               Shun-Chih Yu and   
        Cheng-Yuan Michael Wang   NVM duet: unified working memory and
                                  persistent store architecture  . . . . . 455--470
                Jian Ouyang and   
                Shiding Lin and   
                 Song Jiang and   
                 Zhenyu Hou and   
                  Yong Wang and   
                 Yuanzheng Wang   SDF: software-defined flash for
                                  Web-scale Internet storage systems . . . 471--484
          Anthony Gutierrez and   
            Michael Cieslak and   
            Bharan Giridhar and   
       Ronald G. Dreslinski and   
                  Luis Ceze and   
                   Trevor Mudge   Integrated $3$D-stacked server designs
                                  for increasing physical density of
                                  key-value stores . . . . . . . . . . . . 485--498
              Donald Nguyen and   
            Andrew Lenharth and   
                 Keshav Pingali   Deterministic Galois: on-demand,
                                  portable and parameterless . . . . . . . 499--512
                Haris Ribic and   
                   Yu David Liu   Energy-efficient work-stealing language
                                  runtimes . . . . . . . . . . . . . . . . 513--528
             Todd Mytkowicz and   
         Madanlal Musuvathi and   
                Wolfram Schulte   Data-parallel finite-state machines  . . 529--542
                Zhijia Zhao and   
                      Bo Wu and   
                    Xipeng Shen   Challenging the ``embarrassingly
                                  sequential'': parallelizing finite state
                                  machine-based computations through
                                  principled speculation . . . . . . . . . 543--558
                 Yanqi Zhou and   
                David Wentzlaff   The sharing architecture: sub-core
                                  configurability for IaaS clouds  . . . . 559--574
             Amos Waterland and   
            Elaine Angelino and   
              Ryan P. Adams and   
           Jonathan Appavoo and   
                  Margo Seltzer   ASC: automatically scalable computation  575--590
              Stijn Eyerman and   
                Lieven Eeckhout   The benefit of SMT in the multi-core
                                  era: flexibility towards degrees of
                                  thread-level parallelism . . . . . . . . 591--606
                 Yufei Ding and   
              Mingzhou Zhou and   
                Zhijia Zhao and   
            Sarah Eisenstat and   
                    Xipeng Shen   Finding the limit: examining the
                                  potential and complexity of compilation
                                  scheduling for JIT-based runtime systems 607--622
                 Marc Lupon and   
               Enric Gibert and   
          Grigorios Magklis and   
          Sridhar Samudrala and   
              Raúl Martínez and   
           Kyriakos Stavrou and   
                David R. Ditzel   Speculative hardware/software
                                  co-designed floating-point multiply-add
                                  fusion . . . . . . . . . . . . . . . . . 623--638
               Eric Schulte and   
              Jonathan Dorn and   
            Stephen Harding and   
          Stephanie Forrest and   
                 Westley Weimer   Post-compiler software optimization for
                                  reducing energy  . . . . . . . . . . . . 639--652
                  David A. Wood   Resolved: specialized architectures,
                                  languages, and system software should
                                  supplant general-purpose alternatives
                                  within a decade  . . . . . . . . . . . . 653--654
            Olatunji Ruwase and   
          Michael A. Kozuch and   
         Phillip B. Gibbons and   
                  Todd C. Mowry   Guardrail: a high fidelity approach to
                                  protecting hardware devices from buggy
                                  drivers  . . . . . . . . . . . . . . . . 655--670
           Benjamin P. Wood and   
                  Luis Ceze and   
                   Dan Grossman   Low-level detection of language-level
                                  data races with LARD . . . . . . . . . . 671--686
                Jiaqi Zhang and   
Lakshminarayanan Renganarayana and   
              Xiaolan Zhang and   
                    Niyu Ge and   
               Vasanth Bala and   
                 Tianyin Xu and   
                  Yuanyuan Zhou   EnCore: exploiting system environment
                                  and correlation information for
                                  misconfiguration detection . . . . . . . 687--700
        Gwendolyn Voskuilen and   
               T. N. Vijaykumar   High-performance fractal coherence . . . 701--714
             Woo-Cheol Kwon and   
             Tushar Krishna and   
                  Li-Shiuan Peh   Locality-oblivious cache organization
                                  leveraging single-cycle multi-hop NoCs   715--728
            Harshad Kasture and   
                 Daniel Sanchez   Ubik: efficient cache sharing with
                                  strict QoS for latency-critical
                                  workloads  . . . . . . . . . . . . . . . 729--742
             Bharath Pichai and   
                   Lisa Hsu and   
         Abhishek Bhattacharjee   Architectural support for address
                                  translation on GPUs: designing memory
                                  management units for CPU/GPUs with
                                  unified address spaces . . . . . . . . . 743--758

ACM SIGARCH Computer Architecture News
Volume 42, Number 2, May, 2014

             Subijit Mondal and   
               Subhashis Maitra   Data security-modified AES algorithm and
                                  its applications . . . . . . . . . . . . 1--8
                 Soumik Sen and   
               Subhashis Maitra   Three levels three dimensional compact
                                  coding . . . . . . . . . . . . . . . . . 9--14
        Alexander Thomasian and   
               Bingxing Liu and   
                     Yuhui Deng   Balancing disk access times in RAID5
                                  disk arrays in degraded mode by
                                  conditionally prioritizing fork/join
                                  requests . . . . . . . . . . . . . . . . 15--19
             Jayneel Gandhi and   
             Arkaprava Basu and   
               Mark D. Hill and   
               Michael M. Swift   BadgerTrap: a tool to instrument x86-64
                                  TLB misses . . . . . . . . . . . . . . . 20--23
                   Mark Thorson   Internet nuggets . . . . . . . . . . . . 24--36

ACM SIGARCH Computer Architecture News
Volume 42, Number 3, June, 2014

               Brian Towles and   
             J. P. Grossman and   
             Brian Greskamp and   
                  David E. Shaw   Unifying on-chip and inter-node
                                  switching within the Anton 2 network . . 1--12
              Andrew Putnam and   
        Adrian M. Caulfield and   
              Eric S. Chung and   
                Derek Chiou and   
      Kypros Constantinides and   
                 John Demme and   
          Hadi Esmaeilzadeh and   
              Jeremy Fowers and   
       Gopi Prashanth Gopal and   
                   Jan Gray and   
           Michael Haselman and   
                Scott Hauck and   
               Stephen Heil and   
               Amir Hormati and   
              Joo-Young Kim and   
              Sitaram Lanka and   
                James Larus and   
              Eric Peterson and   
                 Simon Pope and   
                Aaron Smith and   
                Jason Thong and   
            Phillip Yi Xiao and   
                    Doug Burger   A reconfigurable fabric for accelerating
                                  large-scale datacenter services  . . . . 13--24
             Bhavya K. Daya and   
        Chia-Hsin Owen Chen and   
        Suvinay Subramanian and   
             Woo-Cheol Kwon and   
              Sunghyun Park and   
             Tushar Krishna and   
                   Jim Holt and   
    Anantha P. Chandrakasan and   
                  Li-Shiuan Peh   SCORPIO: a $36$-core research chip
                                  demonstrating snoopy coherence on a
                                  scalable mesh NoC with in-network
                                  ordering . . . . . . . . . . . . . . . . 25--36
            Gaurang Upasani and   
                Xavier Vera and   
               Antonio González   Avoiding core's DUE & SDC via acoustic
                                  wave detectors and tailored error
                                  containment and recovery . . . . . . . . 37--48
                  Long Chen and   
                     Zhao Zhang   MemGuard: a low cost and energy
                                  efficient design to support and enhance
                                  memory system reliability  . . . . . . . 49--60
     Siva Kumar Sastry Hari and   
          Radha Venkatagiri and   
             Sarita V. Adve and   
                   Helia Naeimi   GangES: gang error simulation for
                                  hardware resiliency evaluation . . . . . 61--72
                Jack Wadden and   
       Alexander Lyashevsky and   
        Sudhanva Gurumurthi and   
            Vilas Sridharan and   
                  Kevin Skadron   Real-world design and evaluation of
                                  compiler-managed GPU redundant
                                  multithreading . . . . . . . . . . . . . 73--84
               Tianshi Chen and   
                     Qi Guo and   
                    Ke Tang and   
              Olivier Temam and   
                  Zhiwei Xu and   
               Zhi-Hua Zhou and   
                     Yunji Chen   ArchRanker: a ranking approach to design
                                  space exploration  . . . . . . . . . . . 85--96
          Yakun Sophia Shao and   
             Brandon Reagen and   
                Gu-Yeon Wei and   
                   David Brooks   Aladdin: a Pre-RTL, power-performance
                                  accelerator simulator enabling large
                                  design space exploration of customized
                                  architectures  . . . . . . . . . . . . . 97--108
                 Mario Badr and   
         Natalie Enright Jerger   SynFull: synthetic traffic models
                                  capturing cache coherent behaviour . . . 109--120
              Ashish Venkat and   
                Dean M. Tullsen   Harnessing ISA diversity: design of a
                                  heterogeneous-ISA chip multiprocessor    121--132
           Andreas Sembrant and   
             Erik Hagersten and   
           David Black-Schaffer   The Direct-to-Data (D2D) cache:
                                  navigating the cache hierarchy with a
                                  single lookup  . . . . . . . . . . . . . 133--144
           Angelos Arelakis and   
                  Per Stenstrom   SC2: a statistical compression cache
                                  scheme . . . . . . . . . . . . . . . . . 145--156
             Vivek Seshadri and   
          Abhishek Bhowmick and   
                 Onur Mutlu and   
         Phillip B. Gibbons and   
          Michael A. Kozuch and   
                  Todd C. Mowry   The dirty-block index  . . . . . . . . . 157--168
                    Lei Liu and   
                    Yong Li and   
                  Zehan Cui and   
                Yungang Bao and   
                Mingyu Chen and   
                   Chengyong Wu   Going vertical in memory management:
                                  handling multiplicity by multi-policy    169--180
                Marc S. Orr and   
       Bradford M. Beckmann and   
        Steven K. Reinhardt and   
                  David A. Wood   Fine-grain task aggregation and
                                  coordination on GPUs . . . . . . . . . . 181--192
               Ivan Tanasic and   
               Isaac Gelado and   
             Javier Cabezas and   
               Alex Ramirez and   
              Nacho Navarro and   
                   Mateo Valero   Enabling preemptive multiprogramming on
                                  GPUs . . . . . . . . . . . . . . . . . . 193--204
            Dani Voitsechov and   
                    Yoav Etsion   Single-graph multiple flows: energy
                                  efficient design alternative for GPGPUs  205--216
           Simone Campanoni and   
             Kevin Brownell and   
               Svilen Kanev and   
           Timothy M. Jones and   
                Gu-Yeon Wei and   
                   David Brooks   HELIX--RC: an architecture-compiler
                                  co-design for automatic parallelization
                                  of irregular programs  . . . . . . . . . 217--228
                 James E. Smith   Efficient digital neurons for large
                                  scale cortical architectures . . . . . . 229--240
        Karthik Swaminathan and   
                 Huichu Liu and   
               Jack Sampson and   
        Vijaykrishnan Narayanan   An examination of the architecture and
                                  system-level tradeoffs of employing
                                  steep slope devices in $3$D CMPs . . . . 241--252
     Rangharajan Venkatesan and   
Shankar Ganesh Ramasubramanian and   
      Swagath Venkataramani and   
                Kaushik Roy and   
              Anand Raghunathan   STAG: spintronic-tape architecture for
                                  GPGPU cache hierarchies  . . . . . . . . 253--264
              Steven Pelley and   
              Peter M. Chen and   
              Thomas F. Wenisch   Memory persistency . . . . . . . . . . . 265--276
        Morteza Hoseinzadeh and   
          Mohammad Arjomand and   
             Hamid Sarbazi-Azad   Reducing access latency of MLC PCMs
                                  through line striping  . . . . . . . . . 277--288
             Myoungsoo Jung and   
                 Wonil Choi and   
        Shekhar Srikantaiah and   
               Joonhyuk Yoo and   
             Mahmut T. Kandemir   HIOS: a host interface I/O scheduler for
                                  solid state disks  . . . . . . . . . . . 289--300
                   David Lo and   
                Liqun Cheng and   
           Rama Govindaraju and   
         Luiz André Barroso and   
             Christos Kozyrakis   Towards energy proportionality for
                                  large-scale latency-critical workloads   301--312
                 Yanpei Liu and   
            Stark C. Draper and   
                   Nam Sung Kim   SleepScale: runtime joint speed scaling
                                  and sleep states management for power
                                  efficient data centers . . . . . . . . . 313--324
                   Ming Liu and   
                         Tao Li   Optimizing virtual machine consolidation
                                  performance on NUMA server architecture
                                  for cloud workloads  . . . . . . . . . . 325--336
                  Seongil O and   
             Young Hoon Son and   
               Nam Sung Kim and   
                    Jung Ho Ahn   Row-buffer decoupling: a case for
                                  low-latency DRAM microarchitecture . . . 337--348
                  Tao Zhang and   
                    Ke Chen and   
                    Cong Xu and   
                Guangyu Sun and   
                   Tao Wang and   
                       Yuan Xie   Half-DRAM: a high-bandwidth and
                                  low-power DRAM architecture from the
                                  rethinking of fine-grained activation    349--360
                 Yoongu Kim and   
                  Ross Daly and   
                Jeremie Kim and   
               Chris Fallin and   
                 Ji Hye Lee and   
               Donghyuk Lee and   
            Chris Wilkerson and   
                 Konrad Lai and   
                     Onur Mutlu   Flipping bits in memory without
                                  accessing them: an experimental study of
                                  DRAM disturbance errors  . . . . . . . . 361--372
               Runjie Zhang and   
                    Ke Wang and   
             Brett H. Meyer and   
             Mircea R. Stan and   
                  Kevin Skadron   Architecture implications of pads as a
                                  scarce resource  . . . . . . . . . . . . 373--384
              Shaoming Chen and   
                     Yue Hu and   
                 Ying Zhang and   
                    Lu Peng and   
              Jesse Ardonne and   
              Samuel Irving and   
               Ashok Srivastava   Increasing off-chip bandwidth in
                                  multi-core processors with switchable
                                  pins . . . . . . . . . . . . . . . . . . 385--396
                  Lei Jiang and   
                    Bo Zhao and   
                   Jun Yang and   
                   Youtao Zhang   A low power and reliable charge pump
                                  design for phase change memories . . . . 397--408
        Gwendolyn Voskuilen and   
               T. N. Vijaykumar   Fractal++: closing the performance gap
                                  between fractal and conventional
                                  coherence  . . . . . . . . . . . . . . . 409--420
                Xuehai Qian and   
         Benjamin Sahelices and   
                Josep Torrellas   OmniOrder: directory-based conflict
                                  serialization of transactions  . . . . . 421--432
                Xuehai Qian and   
         Benjamin Sahelices and   
                     Depei Qian   Pacifier: record and replay for
                                  relaxed-consistency multiprocessors with
                                  distributed directory protocol . . . . . 433--444
             Nima Honarmand and   
                Josep Torrellas   Replay debugging: leveraging record and
                                  replay for program debugging . . . . . . 445--456
          Jonathan Woodruff and   
        Robert N. M. Watson and   
             David Chisnall and   
             Simon W. Moore and   
          Jonathan Anderson and   
               Brooks Davis and   
                 Ben Laurie and   
           Peter G. Neumann and   
              Robert Norton and   
                    Michael Roe   The CHERI capability model: revisiting
                                  RISC in an age of risk . . . . . . . . . 457--468
             Lluís Vilanova and   
            Muli Ben-Yehuda and   
              Nacho Navarro and   
                Yoav Etsion and   
                   Mateo Valero   CODOMs: protecting software with
                                  code-centric memory domains  . . . . . . 469--480
              Arthur Perais and   
                   André Seznec   EOLE: paving the way for an effective
                                  implementation of value prediction . . . 481--492
         Kenneth Czechowski and   
              Victor W. Lee and   
              Ed Grochowski and   
                Ronny Ronen and   
              Ronak Singhal and   
              Richard Vuduc and   
                  Pradeep Dubey   Improving the energy efficiency of big
                                  cores  . . . . . . . . . . . . . . . . . 493--504
            Renée St. Amant and   
          Amir Yazdanbakhsh and   
                Jongse Park and   
           Bradley Thwaites and   
          Hadi Esmaeilzadeh and   
             Arjang Hassibi and   
                  Luis Ceze and   
                    Doug Burger   General-purpose code acceleration with
                                  limited-precision analog computation . . 505--516
            Advait Madhavan and   
           Timothy Sherwood and   
                 Dmitri Strukov   Race logic: a hardware acceleration for
                                  dynamic programming algorithms . . . . . 517--528
           Jose-Maria Arnau and   
      Joan-Manuel Parcerisa and   
          Polychronis Xekalakis   Eliminating redundant fragment shader
                                  executions on a mobile GPU via hardware
                                  memoization  . . . . . . . . . . . . . . 529--540
                  Yuhao Zhu and   
             Vijay Janapa Reddi   WebCore: architectural support for
                                  mobile Web browsing  . . . . . . . . . . 541--552

ACM SIGARCH Computer Architecture News
Volume 42, Number 4, September, 2014

              Yuetsu Kodama and   
           Toshihiro Hanawa and   
               Taisuke Boku and   
                 Mitsuhisa Sato   PEACH2: an FPGA-based PCIe network
                                  device for Tightly Coupled Accelerators  3--8
             Shimpei Nomura and   
           Takuji Mitsuishi and   
                 Jun Suzuki and   
               Yuki Hayashi and   
                 Masaki Kan and   
                 Hideharu Amano   Performance Analysis of the Multi-GPU
                                  System with ExpEther . . . . . . . . . . 9--14
          Tsuyoshi Watanabe and   
               Naohito Nakasato   GPU Accelerated Hybrid Tree Algorithm
                                  for Collision Less $N$-body Simulations  15--20
           Haruhisa Tsuyama and   
               Tsutomu Maruyama   GPU and FPGA Acceleration of Level Set
                                  Method . . . . . . . . . . . . . . . . . 21--25
                  Yu Tanabe and   
               Tsutomu Maruyama   Fast and Accurate Optical Flow
                                  Estimation using FPGA  . . . . . . . . . 27--32
       Cesar Torres-Huitzil and   
     Marco Aurelio Nuño-Maganda   Area-time Efficient Implementation of
                                  Local Adaptive Image Thresholding in
                                  Reconfigurable Hardware  . . . . . . . . 33--38
                Diana Göhringer   Reconfigurable Multiprocessor Systems:
                                  Handling Hydras Heads --- A Survey . . . 39--44
               Kentaro Sano and   
              Ryotaro Chiba and   
                Tomoya Ueno and   
              Hayato Suzuki and   
                    Ryo Ito and   
                Satoru Yamamoto   FPGA-based Custom Computing Architecture
                                  for Large-Scale Fluid Simulation with
                                  Building Cube Method . . . . . . . . . . 45--50
                   Tao Wang and   
                Guangyu Sun and   
                Jiahua Chen and   
                  Jian Gong and   
                 Haoyang Wu and   
               Xiaoguang Li and   
                  Songwu Lu and   
                     Jason Cong   GRT: a Reconfigurable SDR Platform with
                                  High Performance and Usability . . . . . 51--56
                  Yuki Ando and   
             Masataka Ogawa and   
             Yuya Mizoguchi and   
              Kouta Kumagai and   
             Miaw Torng-Der and   
                   Shinya Honda   A Case Study of FPGA Blokus Duo Solver
                                  by System-Level Design . . . . . . . . . 57--62
              Mioara Joldes and   
          Valentina Popescu and   
                 Warwick Tucker   Searching for Sinks for the Hénon Map
                                  using a Multiple-precision GPU
                                  Arithmetic Library . . . . . . . . . . . 63--68
                Rie Soejima and   
                 Koji Okina and   
               Keisuke Dohi and   
           Yuichiro Shibata and   
                  Kiyoshi Oguri   A Memory Profiling Framework for Stencil
                                  Computation on an FPGA Accelerator with
                                  High Level Synthesis . . . . . . . . . . 69--74
             Shin Morishima and   
               Hiroki Matsutani   Performance Evaluations of Graph
                                  Database using CUDA and OpenMP
                                  Compatible Libraries . . . . . . . . . . 75--80
           Takuji Mitsuishi and   
             Shimpei Nomura and   
                 Jun Suzuki and   
               Yuki Hayashi and   
                 Masaki Kan and   
                 Hideharu Amano   Accelerating Breadth First Search on
                                  GPU--BOX . . . . . . . . . . . . . . . . 81--86
               Jose Nunez-Yanez   Energy efficient Reconfigurable
                                  Computing with Adaptive Voltage and
                                  Logic scaling  . . . . . . . . . . . . . 87--92
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 93--101

ACM SIGARCH Computer Architecture News
Volume 43, Number 1, March, 2015

                   Ozcan Ozturk   Architectural Support for Cyber-Physical
                                  Systems  . . . . . . . . . . . . . . . . 1--1
               Yiying Zhang and   
                  Jian Yang and   
       Amirsaman Memaripour and   
                 Steven Swanson   Mojim: a Reliable and Highly-Available
                                  Non-Volatile Memory System . . . . . . . 3--18
                 Rujia Wang and   
                  Lei Jiang and   
               Youtao Zhang and   
                       Jun Yang   SD--PCM: Constructing Reliable Super
                                  Dense Phase Change Memory under Write
                                  Disturbance  . . . . . . . . . . . . . . 19--31
               Vinson Young and   
           Prashant J. Nair and   
           Moinuddin K. Qureshi   DEUCE: Write-Efficient Encryption for
                                  Non-Volatile Memories  . . . . . . . . . 33--44
              Adam Morrison and   
                    Yehuda Afek   Temporally Bounding TSO for Fence-Free
                                  Asymmetric Synchronization . . . . . . . 45--58
          Alexander Matveev and   
                     Nir Shavit   Reduced Hardware NOrec: a Safe and
                                  Scalable Hybrid Transactional Memory . . 59--71
                Marc S. Orr and   
                  Shuai Che and   
              Ayse Yilmazer and   
       Bradford M. Beckmann and   
               Mark D. Hill and   
                  David A. Wood   Synchronization Using Remote-Scope
                                  Promotion  . . . . . . . . . . . . . . . 73--86
                  Chang Liu and   
              Austin Harris and   
                Martin Maas and   
              Michael Hicks and   
               Mohit Tiwari and   
                     Elaine Shi   GhostRider: a Hardware-Software System
                                  for Memory Trace Oblivious Computation   87--101
    Christopher W. Fletcher and   
                   Ling Ren and   
                Albert Kwon and   
            Marten van Dijk and   
               Srinivas Devadas   Freecursive ORAM: [Nearly] Free
                                  Recursion and Integrity Verification for
                                  Position-based Oblivious RAM . . . . . . 103--116
             David Chisnall and   
             Colin Rothwell and   
        Robert N. M. Watson and   
          Jonathan Woodruff and   
              Munraj Vadera and   
             Simon W. Moore and   
                Michael Roe and   
               Brooks Davis and   
               Peter G. Neumann   Beyond the PDP-11: Architectural Support
                                  for a Memory-Safe C Abstract Machine . . 117--130
                  Jiuyue Ma and   
                Xiufeng Sui and   
                Ninghui Sun and   
                  Yupeng Li and   
                   Zihao Yu and   
                Bowen Huang and   
                  Tianni Xu and   
               Zhicheng Yao and   
                   Yun Chen and   
                Haibin Wang and   
                Lixin Zhang and   
                    Yungang Bao   Supporting Differentiated Services in
                                  Computers via Programmable Architecture
                                  for Resourcing-on-Demand (PARD)  . . . . 131--143
                Yushi Omote and   
         Takahiro Shinagawa and   
                  Kazuhiko Kato   Improving Agility and Elasticity in
                                  Bare-metal Clouds  . . . . . . . . . . . 145--159
                Md E. Haque and   
               Yong hun Eom and   
                 Yuxiong He and   
             Sameh Elnikety and   
          Ricardo Bianchini and   
            Kathryn S. McKinley   Few-to-Many: Incremental Parallelism for
                                  Reducing Tail Latency in Interactive
                                  Services . . . . . . . . . . . . . . . . 161--175
               Patrick Colp and   
               Jiawen Zhang and   
              James Gleeson and   
               Sahil Suneja and   
               Eyal de Lara and   
               Himanshu Raj and   
              Stefan Saroiu and   
                    Alec Wolman   Protecting Data on Smartphones and
                                  Tablets from Memory Attacks  . . . . . . 177--189
          Nathan Dautenhahn and   
       Theodoros Kasampalis and   
                 Will Dietz and   
              John Criswell and   
                    Vikram Adve   Nested Kernel: an Operating System
                                  Architecture for Intra-Kernel Privilege
                                  Separation . . . . . . . . . . . . . . . 191--206
                Zhangxi Tan and   
              Zhenghao Qian and   
                    Xi Chen and   
             Krste Asanovic and   
                David Patterson   DIABLO: a Warehouse-Scale Computer
                                  Network Simulator using FPGAs  . . . . . 207--221
            Johann Hauswald and   
      Michael A. Laurenzano and   
                Yunqi Zhang and   
                   Cheng Li and   
            Austin Rovinski and   
              Arjun Khurana and   
       Ronald G. Dreslinski and   
               Trevor Mudge and   
          Vinicius Petrucci and   
               Lingjia Tang and   
                     Jason Mars   Sirius: an Open End-to-End Voice and
                                  Vision Personal Assistant and Its
                                  Implications for Future Warehouse Scale
                                  Computers  . . . . . . . . . . . . . . . 223--238
                    Chao Xu and   
          Felix Xiaozhu Lin and   
                Yuyang Wang and   
                      Lin Zhong   Automated OS-level Device Runtime Power
                                  Management . . . . . . . . . . . . . . . 239--252
                Íñigo Goiri and   
              Thu D. Nguyen and   
              Ricardo Bianchini   CoolAir: Temperature- and
                                  Variation-Aware Management for
                                  Free-Cooled Datacenters  . . . . . . . . 253--265
              Nikita Mishra and   
               Huazhe Zhang and   
           John D. Lafferty and   
                 Henry Hoffmann   A Probabilistic Graphical Model-based
                                  Approach for Minimizing Energy Under
                                  Performance Constraints  . . . . . . . . 267--281
                   Jun Pang and   
                Chris Dwyer and   
                Alvin R. Lebeck   More is Less, Less is More:
                                  Molecular-Scale Photonic NoC Power
                                  Topologies . . . . . . . . . . . . . . . 283--296
            Vilas Sridharan and   
        Nathan DeBardeleben and   
             Sean Blanchard and   
           Kurt B. Ferreira and   
               Jon Stearley and   
                 John Shalf and   
            Sudhanva Gurumurthi   Memory Errors in Modern Systems: The
                                  Good, The Bad, and The Ugly  . . . . . . 297--310
                Yavuz Yetim and   
               Sharad Malik and   
             Margaret Martonosi   CommGuard: Mitigating Communication
                                  Errors in Error-Prone Parallel Execution 311--323
               Dohyeong Kim and   
               Yonghwi Kwon and   
          William N. Sumner and   
              Xiangyu Zhang and   
                     Dongyan Xu   Dual Execution for On the Fly Fine
                                  Grained Execution Comparison . . . . . . 325--338
                 Petr Hosek and   
                 Cristian Cadar   VARAN the Unbelievable: an Efficient
                                  $N$-version Execution Framework  . . . . 339--353
                Moshe Malka and   
                 Nadav Amit and   
            Muli Ben-Yehuda and   
                    Dan Tsafrir   rIOMMU: Efficient IOMMU for I/O Devices
                                  that Employ Ring Buffers . . . . . . . . 355--368
                  Daofu Liu and   
               Tianshi Chen and   
                 Shaoli Liu and   
               Jinhong Zhou and   
             Shengyuan Zhou and   
              Olivier Temam and   
              Xiaobing Feng and   
                Xuehai Zhou and   
                     Yunji Chen   PuDianNao: a Polyvalent Machine Learning
                                  Accelerator  . . . . . . . . . . . . . . 369--381
                Íñigo Goiri and   
          Ricardo Bianchini and   
        Santosh Nagarakatte and   
                  Thu D. Nguyen   ApproxHadoop: Bringing Approximations to
                                  MapReduce Frameworks . . . . . . . . . . 383--397
         Michael Ringenburg and   
             Adrian Sampson and   
             Isaac Ackerman and   
                  Luis Ceze and   
                   Dan Grossman   Monitoring and Debugging the Quality of
                                  Results in Approximate Programs  . . . . 399--411
               Guruduth Banavar   Watson and the Era of Cognitive
                                  Computing  . . . . . . . . . . . . . . . 413--413
             Gordon Stewart and   
              Mahanth Gowda and   
          Geoffrey Mainland and   
          Bozidar Radunovic and   
       Dimitrios Vytiniotis and   
         Cristina Luengo Agullo   Ziria: a DSL for Wireless Systems
                                  Programming  . . . . . . . . . . . . . . 415--428
        Ravi Teja Mullapudi and   
              Vinay Vasista and   
                Uday Bondhugula   PolyMage: Automatic Optimization for
                                  Image Processing Pipelines . . . . . . . 429--443
                Jeff Heckey and   
               Shruti Patil and   
           Ali JavadiAbhari and   
                Adam Holmes and   
              Daniel Kudrow and   
           Kenneth R. Brown and   
             Diana Franklin and   
          Frederic T. Chong and   
             Margaret Martonosi   Compiler Management of Communication and
                                  Parallelism for Quantum Computation  . . 445--456
     Muhammad Amber Hassaan and   
           Donald D. Nguyen and   
              Keshav K. Pingali   Kinetic Dependence Graphs  . . . . . . . 457--471
 Stelios Sidiroglou-Douskos and   
              Eric Lahtinen and   
         Nathan Rittenhouse and   
              Paolo Piselli and   
                   Fan Long and   
               Deokhwan Kim and   
                  Martin Rinard   Targeted Automatic Integer Overflow
                                  Discovery Using Goal-Directed
                                  Conditional Branch Enforcement . . . . . 473--486
                Udit Dhawan and   
             Catalin Hritcu and   
              Raphael Rubin and   
            Nikos Vasilakis and   
          Silviu Chiricescu and   
          Jonathan M. Smith and   
      Thomas F. Knight, Jr. and   
         Benjamin C. Pierce and   
                    Andre DeHon   Architectural Support for
                                  Software-Defined Metadata Processing . . 487--502
              Danfeng Zhang and   
                   Yao Wang and   
              G. Edward Suh and   
                Andrew C. Myers   A Hardware Design Language for
                                  Timing-Sensitive Information-Flow
                                  Security . . . . . . . . . . . . . . . . 503--516
              Matthew Hicks and   
            Cynthia Sturton and   
             Samuel T. King and   
              Jonathan M. Smith   SPECS: a Lightweight Runtime Mechanism
                                  for Protecting Software from
                                  Security-Critical Processor Bugs . . . . 517--529
                 Yuelu Duan and   
             Nima Honarmand and   
                Josep Torrellas   Asymmetric Memory Fences: Optimizing
                                  Both Performance and Implementability    531--543
                Hyojin Sung and   
                 Sarita V. Adve   DeNovoSync: Efficient Support for
                                  Arbitrary Synchronization without
                                  Writer-Initiated Invalidations . . . . . 545--559
            Aritra Sengupta and   
           Swarnendu Biswas and   
               Minjia Zhang and   
            Michael D. Bond and   
                Milind Kulkarni   Hybrid Static-Dynamic Analysis for
                                  Statically Bounded Region
                                  Serializability  . . . . . . . . . . . . 561--575
               Jade Alglave and   
                 Mark Batty and   
      Alastair F. Donaldson and   
      Ganesh Gopalakrishnan and   
              Jeroen Ketema and   
              Daniel Poetzl and   
             Tyler Sorensen and   
                 John Wickerson   GPU Concurrency: Weak Behaviours and
                                  Programming Assumptions  . . . . . . . . 577--591
        Jason Jong Kyu Park and   
               Yongjun Park and   
                   Scott Mahlke   Chimera: Collaborative Preemption for
                                  Multitasking on a Shared GPU . . . . . . 593--606
               Neha Agarwal and   
              David Nellans and   
            Mark Stephenson and   
              Mike O'Connor and   
             Stephen W. Keckler   Page Placement Strategies for GPUs
                                  within Heterogeneous Memory Systems  . . 607--618
                Zhijia Zhao and   
                    Xipeng Shen   On-the-Fly Principled Speculation for
                                  FSM Parallelization  . . . . . . . . . . 619--630
                Tudor David and   
           Rachid Guerraoui and   
           Vasileios Trigonakis   Asynchronized Concurrency: The Secret to
                                  Scaling Concurrent Search Data
                                  Structures . . . . . . . . . . . . . . . 631--644
            Pramod Bhatotia and   
              Pedro Fonseca and   
               Umut A. Acar and   
       Björn B. Brandenburg and   
              Rodrigo Rodrigues   iThreads: a Threading Library for
                                  Parallel Incremental Computation . . . . 645--659
               Lokesh Gidra and   
                Gaël Thomas and   
              Julien Sopena and   
               Marc Shapiro and   
                    Nhan Nguyen   NumaGiC: a Garbage Collector for Big
                                  Data on Big NUMA Machines  . . . . . . . 661--673
               Khanh Nguyen and   
                   Kai Wang and   
                  Yingyi Bu and   
                    Lu Fang and   
                 Jianfei Hu and   
                     Guoqing Xu   FACADE: a Compiler and Runtime for
                                  (Almost) Object-Bounded Big Data
                                  Applications . . . . . . . . . . . . . . 675--690
              Varun Agrawal and   
            Abhiroop Dabral and   
                Tapti Palit and   
              Yongming Shen and   
                Michael Ferdman   Architectural Support for Dynamic
                                  Linking  . . . . . . . . . . . . . . . . 691--702

ACM SIGARCH Computer Architecture News
Volume 43, Number 3, May, 2015

            Andrew A. Chien and   
           Tung Thanh-Hoang and   
            Dilip Vasudevan and   
               Yuanwei Fang and   
             Amirali Shambayati   $ 10 \times 10 $: a Case Study in
                                  Highly-Programmable and Energy-Efficient
                                  Heterogeneous Federated Architecture . . 2--9
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 10--16

ACM SIGARCH Computer Architecture News
Volume 43, Number 4, September, 2015

            Martin Herbordt and   
                  Miriam Leeser   Off-Loading LET Generation to PEACH2: a
                                  Switching Hub for High Performance GPU
                                  Clusters . . . . . . . . . . . . . . . . 3--8
                 Koji Okina and   
                Rie Soejima and   
              Kota Fukumoto and   
           Yuichiro Shibata and   
                  Kiyoshi Oguri   Power Performance Profiling of $3$-D
                                  Stencil Computation on an FPGA
                                  Accelerator for Efficient Pipeline
                                  Optimization . . . . . . . . . . . . . . 9--14
              Ahmad Lashgar and   
                Ebad Salehi and   
              Amirali Baniasadi   A Case Study in Reverse Engineering
                                  GPGPUs: Outstanding Memory Handling
                                  Resources  . . . . . . . . . . . . . . . 15--21
                Ami Hayashi and   
             Yuta Tokusashi and   
               Hiroki Matsutani   A Line Rate Outlier Filtering FPGA NIC
                                  using 10GbE Interface  . . . . . . . . . 22--27
        Abhishek Kumar Jain and   
                Xiangwei Li and   
            Suhaib A. Fahmy and   
             Douglas L. Maskell   Adapting the DySER Architecture with DSP
                                  Blocks as an Overlay for the Xilinx Zynq 28--33
    David de la Chevallerie and   
               Jens Korinth and   
                   Andreas Koch   ffLink: a Lightweight High-Performance
                                  Open-Source PCI Express Gen3 Interface
                                  for Reconfigurable Accelerators  . . . . 34--39
           Soukaina N. Hmid and   
        Jose G. F. Coutinho and   
                      Wayne Luk   A Transfer-Aware Runtime System for
                                  Heterogeneous Asynchronous Parallel
                                  Execution  . . . . . . . . . . . . . . . 40--45
            Ahmed Al-Wattar and   
              Shawki Areibi and   
                    Gary Grewal   Efficient Mapping and Allocation of
                                  Execution Units to Task Graphs using an
                                  Evolutionary Framework . . . . . . . . . 46--51
                Amir Momeni and   
               Hamed Tabkhi and   
               Yash Ukidave and   
             Gunar Schirner and   
                    David Kaeli   Exploring the Efficiency of the OpenCL
                                  Pipe Semantic on an FPGA . . . . . . . . 52--57
           Takuji Mitsuishi and   
                 Jun Suzuki and   
               Yuki Hayashi and   
                 Masaki Kan and   
                 Hideharu Amano   Breadth First Search on Cost-efficient
                                  Multi-GPU Systems  . . . . . . . . . . . 58--63
            Michael Mefenza and   
            Nicolas Edwards and   
               Christophe Bobda   Interface Based Memory Synthesis of
                                  Image Processing Applications in FPGA    64--69
                    Da Tong and   
                Viktor Prasanna   High Throughput Sketch Based Online
                                  Heavy Hitter Detection on FPGA . . . . . 70--75
               Xinying Wang and   
           Phillip H. Jones and   
                Joseph Zambreno   A Configurable Architecture for Sparse
                                  $LU$ Decomposition on Matrices with
                                  Arbitrary Patterns . . . . . . . . . . . 76--81
               Kentaro Sano and   
                Fumiya Kono and   
           Naohito Nakasato and   
         Alexander Vazhenin and   
             Stanislav Sedukhin   Stream Computation of Shallow Water
                                  Equation Solver for FPGA-based $1$D
                                  Tsunami Simulation . . . . . . . . . . . 82--87
               Liucheng Guo and   
       Andreea Ingrid Funie and   
            David B. Thomas and   
                 Haohuan Fu and   
                      Wayne Luk   Parallel Genetic Algorithms on Multiple
                                  FPGAs  . . . . . . . . . . . . . . . . . 88--93
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 94--100

ACM SIGARCH Computer Architecture News
Volume 43, Number 5, December, 2015

                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 7--11

ACM SIGARCH Computer Architecture News
Volume 44, Number 1, May, 2016

      Hadi Asgharimoghaddam and   
                   Nam Sung Kim   SpinWise: a Practical Energy-Efficient
                                  Synchronization Technique for CMPs . . . 1--8
              Lena E. Olson and   
                   Mark D. Hill   Probabilistic Directed Writebacks for
                                  Exclusive Caches . . . . . . . . . . . . 9--18
                   Mark Thorson   Internet Nuggets . . . . . . . . . . . . 19--22

ACM SIGARCH Computer Architecture News
Volume 44, Number 2, May, 2016

                  Yuanyuan Zhou   Programming Uncertain $<$T$>$hings . . . . 1--2
               Sergi Abadal and   
   Albert Cabellos-Aparicio and   
             Eduard Alarcon and   
                Josep Torrellas   WiSync: an Architecture for Fast
                                  Synchronization through On-Chip Wireless
                                  Communication  . . . . . . . . . . . . . 3--17
              Xiaodong Wang and   
               José F. Martínez   ReBudget: Trading Off Efficiency vs.
                                  Fairness in Market-Based Multicore
                                  Resource Allocation via Runtime Budget
                                  Reassignment . . . . . . . . . . . . . . 19--32
                Haishan Zhu and   
                    Mattan Erez   Dirigent: Enforcing QoS for
                                  Latency-Critical Tasks on Shared
                                  Multicore Systems  . . . . . . . . . . . 33--47
             Yossi Kuperman and   
             Eyal Moscovici and   
                 Joel Nider and   
             Razya Ladelsky and   
                Abel Gordon and   
                    Dan Tsafrir   Paravirtual Remote I/O . . . . . . . . . 49--65
           Antoine Kaufmann and   
                Simon Peter and   
          Naveen Kr. Sharma and   
            Thomas Anderson and   
           Arvind Krishnamurthy   High Performance Packet Processing with
                                  FlexNIC  . . . . . . . . . . . . . . . . 67--81
             James Bornholt and   
           Antoine Kaufmann and   
                  Jialin Li and   
       Arvind Krishnamurthy and   
               Emina Torlak and   
                        Xi Wang   Specifying and Checking File System
                                  Crash-Consistency Models . . . . . . . . 83--98
            Aravinda Prasad and   
                    K. Gopinath   Prudent Memory Reclamation in
                                  Procrastination-Based Synchronization    99--112
             Anurag Mukkara and   
            Nathan Beckmann and   
                 Daniel Sanchez   Whirlpool: Improving Dynamic Cache
                                  Management with Static Data
                                  Classification . . . . . . . . . . . . . 113--127
             Myeongjae Jeon and   
                 Yuxiong He and   
                 Hwanju Kim and   
             Sameh Elnikety and   
               Scott Rixner and   
                    Alan L. Cox   TPC: Target-Driven Parallelism Combining
                                  Prediction and Correction to Reduce Tail
                                  Latency in Interactive Services  . . . . 129--141
               Fraser Brown and   
              Andres Nötzli and   
                  Dawson Engler   How to Build Static Checking Systems
                                  Using Orders of Magnitude Less Code  . . 143--157
                 Tong Zhang and   
               Dongyoon Lee and   
                  Changhee Jung   TxRace: Efficient Data Race Detection
                                  Using Commodity Hardware Transactional
                                  Memory . . . . . . . . . . . . . . . . . 159--173
               Sidney Amani and   
                 Alex Hixon and   
                 Zilin Chen and   
        Christine Rizkallah and   
                Peter Chubb and   
              Liam O'Connor and   
                Joel Beeren and   
           Yutaka Nagashima and   
                Japheth Lim and   
              Thomas Sewell and   
               Joseph Tuong and   
            Gabriele Keller and   
                Toby Murray and   
               Gerwin Klein and   
                  Gernot Heiser   Cogent: Verifying High-Assurance File
                                  System Implementations . . . . . . . . . 175--188
              Nils Asmussen and   
                Marcus Völp and   
            Benedikt Nöthen and   
             Hermann Härtig and   
               Gerhard Fettweis   M3: a Hardware/Operating-System
                                  Co-Design to Tame Heterogeneous
                                  Manycores  . . . . . . . . . . . . . . . 189--203
             Daniyal Liaqat and   
              Silviu Jingoi and   
               Eyal de Lara and   
                Ashvin Goel and   
                  Wilson To and   
                  Kevin Lee and   
     Italo De Moraes Garcia and   
                 Manuel Saldana   Sidewinder: an Energy Efficient and
                                  Developer Friendly Heterogeneous
                                  Architecture for Continuous Mobile
                                  Sensing  . . . . . . . . . . . . . . . . 205--215
           Jonathan Balkind and   
            Michael McKeown and   
                Yaosheng Fu and   
                 Tri Nguyen and   
                 Yanqi Zhou and   
              Alexey Lavrov and   
           Mohammad Shahrad and   
                  Adi Fuchs and   
               Samuel Payne and   
              Xiaohua Liang and   
               Matthew Matl and   
                David Wentzlaff   OpenPiton: an Open Source Manycore
                                  Research Framework . . . . . . . . . . . 217--232
              Daniel Lustig and   
                 Geet Sethi and   
         Margaret Martonosi and   
         Abhishek Bhattacharjee   COATCheck: Verifying Memory Ordering at
                                  the Hardware-OS Interface  . . . . . . . 233--247
               Alex Markuze and   
              Adam Morrison and   
                    Dan Tsafrir   True IOMMU Protection from DMA Attacks:
                                  When Copy is Faster than Zero Copy . . . 249--262
                  Amro Awad and   
         Pratyusa Manadhata and   
               Stuart Haber and   
                Yan Solihin and   
                  William Horne   Silent Shredder: Zero-Cost Shredding for
                                  Secure Non-Volatile Main Memory
                                  Controllers  . . . . . . . . . . . . . . 263--276
              Youngjin Kwon and   
               Alan M. Dunn and   
             Michael Z. Lee and   
            Owen S. Hofmann and   
               Yuanzhong Xu and   
                 Emmett Witchel   Sego: Pervasive Trusted Metadata for
                                  Efficiently Verified Untrusted System
                                  Services . . . . . . . . . . . . . . . . 277--290
                    Dan Tsafrir   Synopsis of the ASPLOS '16 Wild and
                                  Crazy Ideas (WACI) Invited-Speakers
                                  Session  . . . . . . . . . . . . . . . . 291--294
            R. Stanley Williams   Brain Inspired Computing . . . . . . . . 295--295
Phitchaya Mangpo Phothilimthana and   
              Aditya Thakur and   
            Rastislav Bodik and   
               Dinakar Dhurjati   Scaling up Superoptimization . . . . . . 297--310
          Niranjan Hasabnis and   
                       R. Sekar   Lifting Assembly to Intermediate
                                  Representation: a Novel Approach
                                  Leveraging Compilers . . . . . . . . . . 311--324
        Saurav Muralidharan and   
                   Amit Roy and   
                  Mary Hall and   
            Michael Garland and   
                     Piyush Rai   Architecture-Adaptive Code Variant
                                  Tuning . . . . . . . . . . . . . . . . . 325--338
               Xiaofeng Lin and   
                    Yu Chen and   
                Xiaodong Li and   
                 Junjie Mao and   
                 Jiaquan He and   
                     Wei Xu and   
                   Yuanchun Shi   Scalable Kernel TCP Design and
                                  Implementation for Short-Lived
                                  Connections  . . . . . . . . . . . . . . 339--352
              Izzat El Hajj and   
          Alexander Merritt and   
             Gerd Zellweger and   
            Dejan Milojicic and   
             Reto Achermann and   
           Paolo Faraboschi and   
                Wen-mei Hwu and   
             Timothy Roscoe and   
                 Karsten Schwan   SpaceJMP: Programming with Multiple
                                  Virtual Address Spaces . . . . . . . . . 353--368
          Felix Xiaozhu Lin and   
                         Xu Liu   memif: Towards Programming
                                  Heterogeneous Memory Asynchronously  . . 369--383
               Wook-Hee Kim and   
               Jinwoong Kim and   
               Woongki Baek and   
               Beomseok Nam and   
                     Youjip Won   NVWAL: Exploiting NVRAM in Write-Ahead
                                  Logging  . . . . . . . . . . . . . . . . 385--398
             Aasheesh Kolli and   
              Steven Pelley and   
                  Ali Saidi and   
              Peter M. Chen and   
              Thomas F. Wenisch   High-Performance Transactions for
                                  Persistent Memories  . . . . . . . . . . 399--411
                   Qing Guo and   
              Karin Strauss and   
                  Luis Ceze and   
             Henrique S. Malvar   High-Density Image Storage Using
                                  Approximate Memory Cells . . . . . . . . 413--426
         Joseph Izraelevitz and   
              Terence Kelly and   
                 Aasheesh Kolli   Failure-Atomic Persistent Memory Updates
                                  via JUSTDO Logging . . . . . . . . . . . 427--442
                 Jaeung Han and   
             Seungheun Jeon and   
              Young-ri Choi and   
                    Jaehyuk Huh   Interference Management for Distributed
                                  Parallel Applications in Consolidated
                                  Clusters . . . . . . . . . . . . . . . . 443--456
                Martin Maas and   
             Krste Asanović and   
                 Tim Harris and   
               John Kubiatowicz   Taurus: a Holistic Language Runtime
                                  System for Coordinating Distributed
                                  Managed-Language Applications  . . . . . 457--471
       Christina Delimitrou and   
             Christos Kozyrakis   HCloud: Resource-Efficient Provisioning
                                  in Shared Cloud Systems  . . . . . . . . 473--488
                    Xiao Yu and   
              Pallavi Joshi and   
                  Jianwu Xu and   
               Guoliang Jin and   
                  Hui Zhang and   
                   Guofei Jiang   CloudSeer: Workflow Monitoring of Cloud
                                  Infrastructures via Interleaved Logs . . 489--502
               Yonghwi Kwon and   
               Dohyeong Kim and   
        William Nick Sumner and   
               Kyungtae Kim and   
     Brendan Saltaformaggio and   
              Xiangyu Zhang and   
                     Dongyan Xu   LDX: Causality Inference by Lightweight
                                  Dual Execution . . . . . . . . . . . . . 503--515
 Tanakorn Leesatapornwongsa and   
          Jeffrey F. Lukman and   
                    Shan Lu and   
              Haryadi S. Gunawi   TaxDC: a Taxonomy of Non-Deterministic
                                  Concurrency Bugs in Datacenter
                                  Distributed Systems  . . . . . . . . . . 517--530
                 Junjie Mao and   
                    Yu Chen and   
                 Qixue Xiao and   
                   Yuanchun Shi   RID: Finding Reference Count Bugs with
                                  Inconsistent Path Pair Checking  . . . . 531--544
               Huazhe Zhang and   
                 Henry Hoffmann   Maximizing Performance Under a Power
                                  Cap: a Comparison of Hardware, Software,
                                  and Hybrid Techniques  . . . . . . . . . 545--559
               Songchun Fan and   
         Seyed Majid Zahedi and   
                Benjamin C. Lee   The Computational Sprinting Game . . . . 561--575
               Alexei Colin and   
              Graham Harvey and   
              Brandon Lucia and   
              Alanson P. Sample   An Energy-interference-free
                                  Hardware-Software Debugger for
                                  Intermittent Energy-harvesting Systems   577--589
                 Emmett Witchel   Programmer Productivity in a World of
                                  Mushy Interfaces: Challenges of the
                                  Post-ISA Reality . . . . . . . . . . . . 591--591
             Kevin Angstadt and   
             Westley Weimer and   
                  Kevin Skadron   RAPID Programming of Pattern-Recognition
                                  Processors . . . . . . . . . . . . . . . 593--605
                    Xin Sui and   
            Andrew Lenharth and   
          Donald S. Fussell and   
                 Keshav Pingali   Proactive Control of Approximate
                                  Programs . . . . . . . . . . . . . . . . 607--621
                Jongse Park and   
             Emmanuel Amaro and   
              Divya Mahajan and   
           Bradley Thwaites and   
              Hadi Esmaeilzadeh   AxGames: Towards Crowdsourcing Quality
                                  Target Determination in Approximate
                                  Computing  . . . . . . . . . . . . . . . 623--636
             James Bornholt and   
             Randolph Lopez and   
         Douglas M. Carmean and   
                  Luis Ceze and   
               Georg Seelig and   
                  Karin Strauss   A DNA-Based Archival Storage System  . . 637--649
            Raghu Prabhakar and   
           David Koeplinger and   
             Kevin J. Brown and   
             HyoukJoong Lee and   
          Christopher De Sa and   
         Christos Kozyrakis and   
                 Kunle Olukotun   Generating Configurable Hardware from
                                  Parallel Patterns  . . . . . . . . . . . 651--665
               Li-Wen Chang and   
               Hee-Seok Kim and   
                 Wen-mei W. Hwu   DySel: Lightweight Dynamic Selection for
                                  Kernel-based Data-parallel Programming
                                  Model  . . . . . . . . . . . . . . . . . 667--680
                  Quan Chen and   
               Hailong Yang and   
                 Jason Mars and   
                   Lingjia Tang   Baymax: QoS Awareness and Increased
                                  Utilization for Non-Preemptive
                                  Accelerators in Warehouse Scale
                                  Computers  . . . . . . . . . . . . . . . 681--696
              Tony Nowatzki and   
      Karthikeyan Sankaralingam   Analyzing Behavior Specialized
                                  Acceleration . . . . . . . . . . . . . . 697--711
                Man-Ki Yoon and   
           Negin Salajegheh and   
                   Yin Chen and   
           Mihai Christodorescu   PIFT: Predictive Information-Flow
                                  Tracking . . . . . . . . . . . . . . . . 713--725
              Ashish Venkat and   
      Sriskanda Shamasunder and   
              Hovav Shacham and   
                Dean M. Tullsen   HIPStR: Heterogeneous-ISA Program State
                                  Relocation . . . . . . . . . . . . . . . 727--741
      Zelalem Birhanu Aweke and   
  Salessawi Ferede Yitbarek and   
                   Rui Qiao and   
             Reetuparna Das and   
              Matthew Hicks and   
                 Yossi Oren and   
                    Todd Austin   ANVIL: Software-Based Protection Against
                                  Next-Generation Rowhammer Attacks  . . . 743--755
               Diego Didona and   
               Nuno Diegues and   
       Anne-Marie Kermarrec and   
           Rachid Guerraoui and   
              Ricardo Neves and   
                   Paolo Romano   ProteusTM: Abstraction Meets Performance
                                  in Transactional Memory  . . . . . . . . 757--771
                Noam Shalev and   
                Eran Harpaz and   
                Hagar Porat and   
                Idit Keidar and   
                Yaron Weinsberg   CSR: Core Surprise Removal in Commodity
                                  Operating Systems  . . . . . . . . . . . 773--787
            Tanmay Gangwani and   
              Adam Morrison and   
                Josep Torrellas   CASPAR: Breaking Serialization in
                                  Lock-Free Multicore Synchronization  . . 789--804

ACM SIGARCH Computer Architecture News
Volume 44, Number 3, June, 2016

            Jorge Albericio and   
               Patrick Judd and   
        Tayler Hetherington and   
                 Tor Aamodt and   
     Natalie Enright Jerger and   
               Andreas Moshovos   Cnvlutin: ineffectual-neuron-free deep
                                  neural network computing . . . . . . . . 1--13
                Ali Shafiee and   
                Anirban Nag and   
       Naveen Muralimanohar and   
     Rajeev Balasubramonian and   
         John Paul Strachan and   
                    Miao Hu and   
        R. Stanley Williams and   
                 Vivek Srikumar   ISAAC: a convolutional neural network
                                  accelerator with in-situ analog
                                  arithmetic in crossbars  . . . . . . . . 14--26
                   Ping Chi and   
              Shuangchen Li and   
                    Cong Xu and   
                  Tao Zhang and   
                Jishen Zhao and   
                Yongpan Liu and   
                    Yu Wang and   
                       Yuan Xie   PRIME: a novel processing-in-memory
                                  architecture for neural network
                                  computation in ReRAM-based main memory   27--39
          Christopher Torng and   
                Moyang Wang and   
             Christopher Batten   Asymmetry-aware work-stealing runtimes   40--52
             Hung-Wei Tseng and   
              Qianchen Zhao and   
                Yuxiao Zhou and   
               Mark Gahagan and   
                 Steven Swanson   Morpheus: creating application objects
                                  efficiently for heterogeneous computing  53--65
              Divya Mahajan and   
          Amir Yazdanbakhsh and   
                Jongse Park and   
           Bradley Thwaites and   
              Hadi Esmaeilzadeh   Towards statistical guarantees in
                                  controlling quality tradeoffs for
                                  approximate acceleration . . . . . . . . 66--77
              Akanksha Jain and   
                     Calvin Lin   Back to the future: leveraging Belady's
                                  algorithm for improved cache replacement 78--89
            Chang Hyun Park and   
               Taekyung Heo and   
                    Jaehyuk Huh   Efficient synonym filtering and scalable
                                  delayed translation for hybrid virtual
                                  caching  . . . . . . . . . . . . . . . . 90--102
           Hsiang-Yun Cheng and   
                Jishen Zhao and   
               Jack Sampson and   
            Mary Jane Irwin and   
               Aamer Jaleel and   
                      Yu Lu and   
                       Yuan Xie   LAP: loop-block aware inclusion
                                  properties for energy-efficient
                                  asymmetric last level caches . . . . . . 103--114
           David Koeplinger and   
       Christina Delimitrou and   
            Raghu Prabhakar and   
         Christos Kozyrakis and   
                 Yaqi Zhang and   
                 Kunle Olukotun   Automatic generation of efficient
                                  accelerators for reconfigurable hardware 115--127
                Donggyu Kim and   
           Adam Izraelevitz and   
          Christopher Celio and   
                 Hokeun Kim and   
               Brian Zimmer and   
                 Yunsup Lee and   
          Jonathan Bachrach and   
                 Krste Asanović   Strober: fast and accurate sample-based
                                  energy simulation for arbitrary RTL  . . 128--139
      Michael A. Laurenzano and   
                Yunqi Zhang and   
                 Jiang Chen and   
               Lingjia Tang and   
                     Jason Mars   PowerChop: identifying and managing
                                  non-critical units in hybrid processor
                                  architectures  . . . . . . . . . . . . . 140--152
                Boncheol Gu and   
              Andre S. Yoon and   
                Duck-Ho Bae and   
                  Insoon Jo and   
               Jinyoung Lee and   
              Jonghyun Yoon and   
              Jeong-Uk Kang and   
              Moonsang Kwon and   
                Chanho Yoon and   
               Sangyeun Cho and   
              Jaeheon Jeong and   
                 Duckhyun Chang   Biscuit: a framework for near-data
                                  processing of big data workloads . . . . 153--165
     Muhammet Mustafa Ozdal and   
                Serif Yesil and   
                 Taemin Kim and   
              Andrey Ayupov and   
                 John Greth and   
               Steven Burns and   
                   Ozcan Ozturk   Energy efficient architecture for graph
                                  analytics accelerators . . . . . . . . . 166--177
                Ikuo Magaki and   
             Moein Khazraee and   
        Luis Vega Gutierrez and   
         Michael Bedford Taylor   ASIC clouds: specializing the datacenter 178--190
                   Yunho Oh and   
                Keunsoo Kim and   
             Myung Kuk Yoon and   
             Jong Hyun Park and   
               Yongjun Park and   
                 Won Woo Ro and   
               Murali Annavaram   APRES: improving cache efficiency by
                                  exploiting load characteristics on GPUs  191--203
                Kevin Hsieh and   
             Eiman Ebrahimi and   
               Gwangsun Kim and   
       Niladrish Chatterjee and   
              Mike O'Connor and   
         Nandita Vijaykumar and   
                 Onur Mutlu and   
             Stephen W. Keckler   Transparent offloading and mapping
                                  (TOM): enabling programmer-transparent
                                  near-data processing in GPU systems  . . 204--216
            Chang Hyun Park and   
               Taekyung Heo and   
                    Jaehyuk Huh   Efficient synonym filtering and scalable
                                  delayed translation for hybrid virtual
                                  caching  . . . . . . . . . . . . . . . . 217--229
                  Qiumin Xu and   
                Hyeran Jeon and   
                Keunsoo Kim and   
                 Won Woo Ro and   
               Murali Annavaram   Warped-slicer: efficient intra-SM
                                  slicing through dynamic resource
                                  partitioning for GPU multiprogramming    230--242
                   Song Han and   
                 Xingyu Liu and   
                  Huizi Mao and   
                    Jing Pu and   
             Ardavan Pedram and   
           Mark A. Horowitz and   
               William J. Dally   EIE: efficient inference engine on
                                  compressed deep neural network . . . . . 243--254
             Robert LiKamWa and   
                 Yunhui Hou and   
                 Julian Gao and   
               Mia Polansky and   
                      Lin Zhong   RedEye: analog ConvNet image sensor
                                  architecture for continuous mobile
                                  vision . . . . . . . . . . . . . . . . . 255--266
             Brandon Reagen and   
             Paul Whatmough and   
               Robert Adolf and   
                Saketh Rama and   
              Hyunkwang Lee and   
                Sae Kyu Lee and   
José Miguel Hernández-Lobato and   
                Gu-Yeon Wei and   
                   David Brooks   Minerva: enabling low-power,
                                  highly-accurate deep neural network
                                  accelerators . . . . . . . . . . . . . . 267--278
                   Yuan Yao and   
                    Zhonghai Lu   Opportunistic competition overhead
                                  reduction for expediting critical
                                  section in NoC based CMPs  . . . . . . . 279--290
                Channoh Kim and   
                Sungmin Kim and   
              Hyeon Gyu Cho and   
               Dooyoung Kim and   
               Jaehyeok Kim and   
                Young H. Oh and   
               Hakbeom Jang and   
                     Jae W. Lee   Short-circuit dispatch: accelerating
                                  virtual machine interpreters on embedded
                                  processors . . . . . . . . . . . . . . . 291--303
           Christoffer Dall and   
                Shih-Wei Li and   
               Jin Tack Lim and   
                 Jason Nieh and   
           Georgios Koloventzos   ARM virtualization: performance and
                                  architectural implications . . . . . . . 304--316
                Jayesh Gaur and   
         Alaa R. Alameldeen and   
           Sreenivas Subramoney   Base-victim compression: an
                                  opportunistic cache compression
                                  architecture . . . . . . . . . . . . . . 317--328
                Jungrae Kim and   
           Michael Sullivan and   
               Esha Choukse and   
                    Mattan Erez   Bit-plane compression: transforming data
                                  for better compression in many-core
                                  architectures  . . . . . . . . . . . . . 329--340
           Prashant J. Nair and   
            Vilas Sridharan and   
           Moinuddin K. Qureshi   XED: exposing on-die error detection
                                  information for strong memory
                                  reliability  . . . . . . . . . . . . . . 341--353
    Mohammad Mejbah ul Alam and   
               Abdullah Muzahid   Production-run software failure
                                  diagnosis via adaptive communication
                                  tracking . . . . . . . . . . . . . . . . 354--366
               Yu-Hsin Chen and   
                  Joel Emer and   
                   Vivienne Sze   Eyeriss: a spatial architecture for
                                  energy-efficient dataflow for
                                  convolutional neural networks  . . . . . 367--379
               Duckhwan Kim and   
                 Jaeha Kung and   
                   Sek Chai and   
      Sudhakar Yalamanchili and   
            Saibal Mukhopadhyay   Neurocube: a programmable digital
                                  neuromorphic architecture with
                                  high-density $3$D memory . . . . . . . . 380--392
                 Shaoli Liu and   
                  Zidong Du and   
                 Jinhua Tao and   
                   Dong Han and   
                    Tao Luo and   
                   Yuan Xie and   
                 Yunji Chen and   
                   Tianshi Chen   Cambricon: an instruction set
                                  architecture for neural networks . . . . 393--405
              Ziqiang Huang and   
           Andrew D. Hilton and   
                Benjamin C. Lee   Decoupling loads for nano-instruction
                                  set computers  . . . . . . . . . . . . . 406--417
              Timothy Hayes and   
              Oscar Palomar and   
                Osman Unsal and   
             Adrian Cristal and   
                   Mateo Valero   Future vector microprocessor extensions
                                  for data aggregations  . . . . . . . . . 418--430
         Faissal M. Sleiman and   
              Thomas F. Wenisch   Efficiently scaling out-of-order cores
                                  for simultaneous multithreading  . . . . 431--443
              Milad Hashemi and   
                    Khubaib and   
             Eiman Ebrahimi and   
                 Onur Mutlu and   
                   Yale N. Patt   Accelerating dependent cache misses with
                                  an enhanced memory controller  . . . . . 444--455
                Yunqi Zhang and   
              David Meisner and   
                 Jason Mars and   
                   Lingjia Tang   Treadmill: attributing the source of
                                  tail latency through precise load
                                  testing and statistical inference  . . . 456--468
                   Qiang Wu and   
              Qingyuan Deng and   
             Lakshmi Ganesh and   
             Chang-Hong Hsu and   
                    Yun Jin and   
              Sanjeev Kumar and   
                     Bin Li and   
                Justin Meza and   
                  Yee Jiun Song   Dynamo: Facebook's data center-wide
                                  power management system  . . . . . . . . 469--480
                    Daniel Wong   Peak efficiency aware scheduling for
                                  highly energy proportional servers . . . 481--492
                    Chao Li and   
               Zhenhua Wang and   
               Xiaofeng Hou and   
               Haopeng Chen and   
              Xiaoyao Liang and   
                      Minyi Guo   Power attack defense: securing
                                  battery-backed data centers  . . . . . . 493--505
                 Mingyu Gao and   
       Christina Delimitrou and   
                  Dimin Niu and   
         Krishna T. Malladi and   
            Hongzhong Zheng and   
                Bob Brennan and   
             Christos Kozyrakis   DRAF: a low-power DRAM-based
                                  reconfigurable acceleration fabric . . . 506--518
               Lunkai Zhang and   
                Brian Neely and   
             Diana Franklin and   
             Dmitri Strukov and   
                   Yuan Xie and   
              Frederic T. Chong   Mellow Writes: extending lifetime in
                                  resistive memories through selective
                                  slow write backs . . . . . . . . . . . . 519--531
                 Yanqi Zhou and   
                David Wentzlaff   MITTS: memory inter-arrival time traffic
                                  shaping  . . . . . . . . . . . . . . . . 532--544
          Joshua San Miguel and   
         Natalie Enright Jerger   The anytime automaton  . . . . . . . . . 545--557
                Siyang Wang and   
              Xiangyu Zhang and   
                  Yuxuan Li and   
            Ramin Bashizade and   
                  Song Yang and   
                Chris Dwyer and   
                Alvin R. Lebeck   Accelerating Markov random field
                                  inference using molecular optical Gibbs
                                  sampling units . . . . . . . . . . . . . 558--569
               Yipeng Huang and   
                   Ning Guo and   
                Mingoo Seok and   
            Yannis Tsividis and   
            Simha Sethumadhavan   Evaluation of an analog accelerator for
                                  linear algebra . . . . . . . . . . . . . 570--582
                   Jin Wang and   
                 Norm Rubin and   
            Albert Sidelnik and   
          Sudhakar Yalamanchili   LaPerm: locality aware scheduler for
                                  dynamic parallelism on GPUs  . . . . . . 583--595
                Sagi Shahar and   
               Shai Bergman and   
               Mark Silberstein   ActivePointers: a case for software
                                  address translation on GPUs  . . . . . . 596--608
             Myung Kuk Yoon and   
                Keunsoo Kim and   
                Sangpil Lee and   
                 Won Woo Ro and   
               Murali Annavaram   Virtual thread: maximizing thread-level
                                  parallelism beyond GPU scheduling limit  609--621
                Jungrae Kim and   
           Michael Sullivan and   
                Sangkug Lym and   
                    Mattan Erez   All-inclusive ECC: thorough end-to-end
                                  protection for reliable computer memory  622--633
                 Henry Duwe and   
                   Xun Jian and   
            Daniel Petrisko and   
                   Rakesh Kumar   Rescuing uncorrectable fault patterns in
                                  on-chip memories through error pattern
                                  transformation . . . . . . . . . . . . . 634--644
               Dong Wan Kim and   
                    Mattan Erez   RelaxFault memory repair . . . . . . . . 645--657
Raghavendra Pradyumna Pothukuchi and   
                Amin Ansari and   
           Petros Voulgaris and   
                Josep Torrellas   Using multiple input, multiple output
                                  formal control to maximize resource
                                  efficiency in architectures  . . . . . . 658--670
            Hari Cherupalli and   
               Rakesh Kumar and   
                   John Sartori   Exploiting dynamic timing slack for
                                  energy efficiency in ultra-low-power
                                  embedded systems . . . . . . . . . . . . 671--681
                 Yanqi Zhou and   
             Henry Hoffmann and   
                David Wentzlaff   CASH: supporting IaaS customers with a
                                  sub-core configurable architecture . . . 682--694
          Mohammad Arjomand and   
         Mahmut T. Kandemir and   
      Anand Sivasubramaniam and   
                   Chita R. Das   Boosting access parallelism to PCM-based
                                  main memory  . . . . . . . . . . . . . . 695--706
             Jayneel Gandhi and   
               Mark D. Hill and   
               Michael M. Swift   Agile paging: exceeding the best of
                                  nested and shadow paging . . . . . . . . 707--718
                Hoseok Seol and   
                Wongyu Shin and   
                Jaemin Jang and   
              Jungwhan Choi and   
               Jinwoong Suh and   
                    Lee-Sup Kim   Energy efficient data encoding in DRAM
                                  channels exploiting data value
                                  similarity . . . . . . . . . . . . . . . 719--730

ACM SIGARCH Computer Architecture News
Volume 44, Number 4, September, 2016

                Jiayi Sheng and   
             Qingqing Xiong and   
                  Chen Yang and   
             Martin C. Herbordt   Collective Communication on FPGA
                                  Clusters with Static Scheduling  . . . . 2--7
             Susumu Mashimo and   
              Thiem Van Chu and   
                     Kenji Kise   Cost-Effective and High-Throughput Merge
                                  Network: Architecture for the Fastest
                                  FPGA Sorting Accelerator . . . . . . . . 8--13
            Cuong Pham-Quoc and   
                Biet Nguyen and   
                Tran Ngoc Thinh   FPGA-based Multicore Architecture for
                                  Integrating Multiple DDoS Defense
                                  Mechanisms . . . . . . . . . . . . . . . 14--19
             Fatemeh Eslami and   
            Steven J. E. Wilton   An Improved Overlay and Mapping
                                  Algorithm Supporting Rapid Triggering
                                  for FPGA Debug . . . . . . . . . . . . . 20--25
           Ryohei Kobayashi and   
            Tomohiro Misono and   
                     Kenji Kise   A High-speed Verilog HDL Simulation
                                  Method using a Lightweight Translator    26--31
               Shohei Sassa and   
             Kenji Kanazawa and   
                Shaowei Cai and   
             Moritoshi Yasunaga   An FPGA Solver for Partial MaxSAT
                                  Problems Based on Stochastic Local
                                  Search . . . . . . . . . . . . . . . . . 32--37
     Ernst Joachim Houtgast and   
            Vlad-Mihai Sima and   
                Koen Bertels and   
                    Zaid Al-Ars   An Efficient GPU-Accelerated
                                  Implementation of Genomic Short Read
                                  Mapping with BWA-MEM . . . . . . . . . . 38--43
            Hiroki Nakahara and   
         Hiroyuki Nakanishi and   
              Kazumasa Iwai and   
                  Tsutomu Sasao   An FFT Circuit for a Spectrometer of a
                                  Radio Telescope using the Nested RNS
                                  including the Constant Division  . . . . 44--49
          Vinod Pangracious and   
                Mulhim Al-Doori   Novel Three-Dimensional Embedded FPGA
                                  Technology and Architecture  . . . . . . 50--55
              Oliver Knodel and   
           Paul R. Genssler and   
              Rainer G. Spallek   Migration of long-running Tasks between
                                  Reconfigurable Resources using
                                  Virtualization . . . . . . . . . . . . . 56--61
                 Jubee Tada and   
             Maiki Hosokawa and   
              Ryusuke Egawa and   
              Hiroaki Kobayashi   Effects of Stacking Granularity on $3$-D
                                  Stacked Floating-point Fused Multiply
                                  Add Units  . . . . . . . . . . . . . . . 62--67
                   Jiang Su and   
              Jianxiong Liu and   
            David B. Thomas and   
             Peter Y. K. Cheung   Neural Network Based Reinforcement
                                  Learning Acceleration on FPGA Platforms  68--73
            Erik H. D'Hollander   High-Level Synthesis Optimization for
                                  Blocked Floating-Point Matrix
                                  Multiplication . . . . . . . . . . . . . 74--79
                Chengzhe Li and   
              Lai Yoong Yee and   
           Hiroshi Maruyama and   
              Yoshiki Yamaguchi   FPGA-based Volleyball Player Tracker . . 80--86
                  Qian Zhao and   
           Motoki Amagasaki and   
              Masahiro Iida and   
              Morihiro Kuga and   
             Toshinori Sueyoshi   A Study of Heterogeneous Computing
                                  Design Method based on Virtualization
                                  Technology . . . . . . . . . . . . . . . 86--91
               Colin Yu Lin and   
            Zhenghong Jiang and   
                   Cheng Fu and   
         Hayden Kwok-Hay So and   
                   Haigang Yang   FPGA High-level Synthesis versus
                                  Overlay: Comparisons on Computation
                                  Kernels  . . . . . . . . . . . . . . . . 92--97

ACM SIGARCH Computer Architecture News
Volume 44, Number 5, December, 2016

               Xusheng Zhan and   
                Yungang Bao and   
           Christian Bienia and   
                         Kai Li   PARSEC3.0: a Multicore Benchmark Suite
                                  with Network Stacks and SPLASH-2X  . . . 1--16

ACM SIGARCH Computer Architecture News
Volume 45, Number 1, March, 2017

                     Yunji Chen   Big Data Analytics and Intelligence at
                                  Alibaba Cloud  . . . . . . . . . . . . . 1--1
            Hari Cherupalli and   
                 Henry Duwe and   
                 Weidong Ye and   
               Rakesh Kumar and   
                   John Sartori   Determining Application-specific Peak
                                  Power and Energy Requirements for
                                  Ultra-low Power Processors . . . . . . . 3--16
                  Quan Chen and   
               Hailong Yang and   
                  Minyi Guo and   
        Ram Srivatsa Kannan and   
                 Jason Mars and   
                   Lingjia Tang   Prophet: Precise QoS Prediction on
                                  Non-Preemptive Accelerators to Improve
                                  Utilization in Warehouse-Scale Computers 17--32
               Svilen Kanev and   
               Sam Likun Xi and   
                Gu-Yeon Wei and   
                   David Brooks   Mallacc: Accelerating Memory Allocation  33--45
                 Shasha Wen and   
              Milind Chabbi and   
                         Xu Liu   REDSPY: Exploring Value Locality in
                                  Software . . . . . . . . . . . . . . . . 47--61
         Abhishek Bhattacharjee   Translation-Triggered Prefetching  . . . 63--76
                Channoh Kim and   
               Jaehyeok Kim and   
                Sungmin Kim and   
               Dooyoung Kim and   
                  Namho Kim and   
                   Gitae Na and   
                Young H. Oh and   
              Hyeon Gyu Cho and   
                     Jae W. Lee   Typed Architectures: Architectural
                                  Support for Lightweight Scripting  . . . 77--90
                  Jihye Seo and   
               Wook-Hee Kim and   
               Woongki Baek and   
               Beomseok Nam and   
                     Sam H. Noh   Failure-Atomic Slotted Paging for
                                  Persistent Memory  . . . . . . . . . . . 91--104
              Donald Nguyen and   
                 Keshav Pingali   What Scalable Programs Need from
                                  Transactional Memory . . . . . . . . . . 105--118
           Caroline Trippel and   
          Yatin A. Manerkar and   
              Daniel Lustig and   
           Michael Pellauer and   
             Margaret Martonosi   TriCheck: Memory Model Verification at
                                  the Trisection of Software, Hardware,
                                  and ISA  . . . . . . . . . . . . . . . . 119--133
              Sanketh Nalli and   
              Swapnil Haria and   
               Mark D. Hill and   
           Michael M. Swift and   
                Haris Volos and   
                Kimberly Keeton   An Analysis of Persistent Memory Use
                                  with WHISPER . . . . . . . . . . . . . . 135--148
                 Tong Zhang and   
              Changhee Jung and   
                   Dongyoon Lee   ProRace: Practical Data Race Detection
                                  for Production Use . . . . . . . . . . . 149--162
              Lena E. Olson and   
               Mark D. Hill and   
                  David A. Wood   Crossing Guard: Mediating
                                  Host-Accelerator Coherence Interactions  163--176
             Joseph McMahan and   
        Michael Christensen and   
             Lawton Nichols and   
               Jared Roesch and   
               Sung-Yee Guo and   
              Ben Hardekopf and   
               Timothy Sherwood   An Architecture Supporting Formal and
                                  Compositional Binary Analysis  . . . . . 177--191
            Chun-Hung Hsiao and   
        Satish Narayanasamy and   
  Essam Muhammad Idris Khan and   
       Cristiano L. Pereira and   
                Gilles A. Pokam   AsyncClock: Scalable Inference of
                                  Asynchronous Event Causality . . . . . . 193--205
               Irina Calciu and   
             Siddhartha Sen and   
        Mahesh Balakrishnan and   
             Marcos K. Aguilera   Black-box Concurrent Data Structures for
                                  NUMA Architectures . . . . . . . . . . . 207--221
                 Keval Vora and   
                  Chen Tian and   
                Rajiv Gupta and   
                       Ziang Hu   CoRAL: Confined Recovery in Distributed
                                  Asynchronous Graph Processing  . . . . . 223--236
                 Keval Vora and   
                Rajiv Gupta and   
                     Guoqing Xu   KickStarter: Fast and Accurate
                                  Computations on Streaming Graphs via
                                  Trimmed Approximations . . . . . . . . . 237--251
               Bobby Powers and   
                  John Vilk and   
                Emery D. Berger   Browsix: Bridging the Gap Between Unix
                                  and the Browser  . . . . . . . . . . . . 253--266
         Samyam Rajbhandari and   
                 Yuxiong He and   
            Olatunji Ruwase and   
             Michael Carbin and   
               Trishul Chilimbi   Optimizing CNNs on Multicores for
                                  Scalability, Performance and Goodput . . 267--280
   Kirshanthan Sundararajah and   
                Laith Sakka and   
                Milind Kulkarni   Locality Transformations for Nested
                                  Recursive Iteration Spaces . . . . . . . 281--295
                     Ang Li and   
         Shuaiwen Leon Song and   
                Weifeng Liu and   
                     Xu Liu and   
                Akash Kumar and   
                 Henk Corporaal   Locality-Aware CTA Clustering for Modern
                                  GPUs . . . . . . . . . . . . . . . . . . 297--311
         Berkeley Churchill and   
               Rahul Sharma and   
                 JF Bastien and   
                     Alex Aiken   Sound Loop Superoptimization for Google
                                  Native Client  . . . . . . . . . . . . . 313--326
              Ricardo Bianchini   Improving Datacenter Efficiency  . . . . 327--327
               Mengxing Liu and   
             Mingxing Zhang and   
                  Kang Chen and   
                Xuehai Qian and   
                 Yongwei Wu and   
               Weimin Zheng and   
                    Jinglei Ren   DudeTM: Building Durable Transactions
                                  with Decoupling for Persistent Memory    329--343
               Ana Klimovic and   
                Heiner Litz and   
             Christos Kozyrakis   ReFlex: Remote Flash $ \approx $ Local
                                  Flash  . . . . . . . . . . . . . . . . . 345--359
            Djordje Jevdjic and   
              Karin Strauss and   
                  Luis Ceze and   
             Henrique S. Malvar   Approximate Storage of Compressed and
                                  Encrypted Videos . . . . . . . . . . . . 361--373
                Nima Elyasi and   
          Mohammad Arjomand and   
      Anand Sivasubramaniam and   
         Mahmut T. Kandemir and   
               Chita R. Das and   
                 Myoungsoo Jung   Exploiting Intra-Request Slack to
                                  Improve SSD Performance  . . . . . . . . 375--388
                   Kai Wang and   
              Aftab Hussain and   
               Zhiqiang Zuo and   
                 Guoqing Xu and   
             Ardalan Amiri Sani   Graspan: a Single-machine Disk-based
                                  Graph System for Interprocedural Static
                                  Analyses of Large-scale Systems Code . . 389--404
                     Ao Ren and   
                     Zhe Li and   
                Caiwen Ding and   
                  Qinru Qiu and   
                Yanzhi Wang and   
                      Ji Li and   
                Xuehai Qian and   
                        Bo Yuan   SC-DCNN: Highly-Scalable Deep
                                  Convolutional Neural Network using
                                  Stochastic Computing . . . . . . . . . . 405--418
                 Jerry Ajay and   
                  Chen Song and   
       Aditya Singh Rathore and   
                   Chi Zhou and   
                      Wenyao Xu   $3$DGates: an Instruction-Level Energy
                                  Analysis and Optimization of $3$D
                                  Printers . . . . . . . . . . . . . . . . 419--433
              Guilherme Cox and   
         Abhishek Bhattacharjee   Efficient Address Translation for
                                  Architectures with Multiple Page Sizes   435--448
              Ilya Lesokhin and   
                Haggai Eran and   
            Shachar Raindel and   
                Guy Shapiro and   
              Sagi Grimberg and   
                 Liran Liss and   
            Muli Ben-Yehuda and   
                 Nadav Amit and   
                    Dan Tsafrir   Page Fault Support for Network
                                  Controllers  . . . . . . . . . . . . . . 449--466
                    Yang Hu and   
              Mingcong Song and   
                         Tao Li   Towards ``Full Containerization'' in
                                  Containerized Network Function
                                  Virtualization . . . . . . . . . . . . . 467--481
                      Bo Wu and   
                     Xu Liu and   
                Xiaobo Zhou and   
                 Changjun Jiang   FLEP: Enabling Flexible and Efficient
                                  Preemption on GPUs . . . . . . . . . . . 483--496
                  Kaiwei Li and   
               Jianfei Chen and   
              Wenguang Chen and   
                        Jun Zhu   SaberLDA: Sparsity-Aware Learning of
                                  Topic Models on GPUs . . . . . . . . . . 497--509
             Moein Khazraee and   
                   Lu Zhang and   
                  Luis Vega and   
         Michael Bedford Taylor   Moonwalk: NRE Optimization in ASIC
                                  Clouds . . . . . . . . . . . . . . . . . 511--526
        Jason Jong Kyu Park and   
               Yongjun Park and   
                   Scott Mahlke   Dynamic Resource Management for
                                  Efficient Utilization of Multitasking
                                  GPUs . . . . . . . . . . . . . . . . . . 527--540
                  Rui Zhang and   
            Natalie Stanley and   
         Christopher Griggs and   
                 Andrew Chi and   
                Cynthia Sturton   Identifying Security Critical Properties
                                  for the Dynamic Verification of a
                                  Processor  . . . . . . . . . . . . . . . 541--554
          Andrew Ferraiuolo and   
                     Rui Xu and   
              Danfeng Zhang and   
            Andrew C. Myers and   
                  G. Edward Suh   Verification of a Practical Hardware
                                  Security Architecture Through Static
                                  Information Flow Analysis  . . . . . . . 555--568
             David Chisnall and   
               Brooks Davis and   
               Khilan Gudka and   
              David Brazdil and   
          Alexandre Joannou and   
          Jonathan Woodruff and   
      A. Theodore Markettos and   
            J. Edward Maste and   
              Robert Norton and   
                 Stacey Son and   
                Michael Roe and   
             Simon W. Moore and   
           Peter G. Neumann and   
                 Ben Laurie and   
            Robert N. M. Watson   CHERI JNI: Sinking the Java Security
                                  Model into the C . . . . . . . . . . . . 569--583
                 Xinyang Ge and   
                Weidong Cui and   
                   Trent Jaeger   GRIFFIN: Guarding Control Flows Using
                                  Intel Processor Trace  . . . . . . . . . 585--598
       Christina Delimitrou and   
             Christos Kozyrakis   Bolt: I Know What You Did Last Summer
                                  \ldots In The Cloud  . . . . . . . . . . 599--613
                Yiping Kang and   
            Johann Hauswald and   
                    Cao Gao and   
            Austin Rovinski and   
               Trevor Mudge and   
                 Jason Mars and   
                   Lingjia Tang   Neurosurgeon: Collaborative Intelligence
                                  Between the Cloud and Mobile Edge  . . . 615--629
               Neha Agarwal and   
              Thomas F. Wenisch   Thermostat: Application-transparent Page
                                  Management for Two-tiered Main Memory    631--644
          Antonio Barbalace and   
              Robert Lyerly and   
   Christopher Jelesnianski and   
              Anthony Carno and   
              Ho-Ren Chuang and   
             Vincent Legout and   
                Binoy Ravindran   Breaking the Boundaries in
                                  Heterogeneous-ISA Datacenters  . . . . . 645--659
              Daniel Lustig and   
              Andrew Wright and   
Alexandros Papakonstantinou and   
                 Olivier Giroux   Automated Synthesis of Comprehensive
                                  Memory Model Litmus Test Suites  . . . . 661--675
                Haopeng Liu and   
                 Guangpu Li and   
          Jeffrey F. Lukman and   
                  Jiaxin Li and   
                    Shan Lu and   
          Haryadi S. Gunawi and   
                      Chen Tian   DCatch: Automatically Detecting
                                  Distributed Concurrency Bugs in Cloud
                                  Systems  . . . . . . . . . . . . . . . . 677--691
       Ali José Mashtizadeh and   
              Tal Garfinkel and   
                David Terei and   
             David Mazieres and   
               Mendel Rosenblum   Towards Practical Default-On Multi-Core
                                  Record/Replay  . . . . . . . . . . . . . 693--708
                 Jian Huang and   
         Michael Allen-Bond and   
                  Xuechen Zhang   Pallas: Semantic-Aware Checking for
                                  Finding Deep Bugs in Fast Path . . . . . 709--722
          Jagadish B. Kotra and   
             Narges Shahidi and   
          Zeshan A. Chishti and   
             Mahmut T. Kandemir   Hardware-Software Co-design to Mitigate
                                  DRAM Refresh Overheads: a Case for
                                  Refresh-Aware Process Scheduling . . . . 723--736
                Jinchun Kim and   
               Elvira Teran and   
              Paul V. Gratz and   
          Daniel A. Jiménez and   
            Seth H. Pugsley and   
                Chris Wilkerson   Kill the Program Counter: Reconstructing
                                  Program Behavior in the Processor Cache
                                  Hierarchy  . . . . . . . . . . . . . . . 737--749
                 Mingyu Gao and   
                    Jing Pu and   
                  Xuan Yang and   
              Mark Horowitz and   
             Christos Kozyrakis   TETRIS: Scalable and Efficient Neural
                                  Network Acceleration with $3$D Memory    751--764
                Wonjun Song and   
               Gwangsun Kim and   
             Hyungjoon Jung and   
             Jongwook Chung and   
                Jung Ho Ahn and   
                 Jae W. Lee and   
                       John Kim   History-Based Arbitration for Fairness
                                  in Processor-Interconnect of NUMA
                                  Servers  . . . . . . . . . . . . . . . . 765--777
            Pulkit A. Misra and   
           Jeffrey S. Chase and   
            Johannes Gehrke and   
                Alvin R. Lebeck   Enabling Lightweight Transactions with
                                  Precision Time . . . . . . . . . . . . . 779--794
                   Ming Liu and   
                  Liang Luo and   
               Jacob Nelson and   
                  Luis Ceze and   
       Arvind Krishnamurthy and   
                 Kishore Atreya   IncBricks: Toward In-Network Computation
                                  with an In-Network Cache . . . . . . . . 795--809
              Ismail Akturk and   
               Ulya R. Karpuzcu   AMNESIAC: Amnesic Automatic Computer . . 811--824
                  Yuxin Bai and   
              Victor W. Lee and   
                     Engin Ipek   Voltage Regulator Efficiency Aware Power
                                  Management . . . . . . . . . . . . . . . 825--838

ACM SIGARCH Computer Architecture News
Volume 45, Number 2, May, 2017

           Norman P. Jouppi and   
                Cliff Young and   
              Nishant Patil and   
            David Patterson and   
             Gaurav Agrawal and   
             Raminder Bajwa and   
                Sarah Bates and   
              Suresh Bhatia and   
                  Nan Boden and   
                Al Borchers and   
                 Rick Boyle and   
          Pierre-luc Cantin and   
              Clifford Chao and   
                Chris Clark and   
             Jeremy Coriell and   
                 Mike Daley and   
                   Matt Dau and   
               Jeffrey Dean and   
                   Ben Gelb and   
    Tara Vazir Ghaemmaghami and   
         Rajendra Gottipati and   
            William Gulland and   
             Robert Hagmann and   
              C. Richard Ho and   
               Doug Hogberg and   
                    John Hu and   
               Robert Hundt and   
                   Dan Hurt and   
               Julian Ibarz and   
               Aaron Jaffey and   
              Alek Jaworski and   
           Alexander Kaplan and   
            Harshit Khaitan and   
           Daniel Killebrew and   
                  Andy Koch and   
               Naveen Kumar and   
                 Steve Lacy and   
               James Laudon and   
                  James Law and   
                 Diemthu Le and   
                Chris Leary and   
                Zhuyuan Liu and   
                 Kyle Lucke and   
                Alan Lundin and   
             Gordon MacKean and   
           Adriana Maggiore and   
               Maire Mahony and   
              Kieran Miller and   
            Rahul Nagarajan and   
         Ravi Narayanaswami and   
                     Ray Ni and   
                  Kathy Nix and   
              Thomas Norrie and   
              Mark Omernick and   
         Narayana Penukonda and   
                Andy Phelps and   
              Jonathan Ross and   
                  Matt Ross and   
                 Amir Salek and   
             Emad Samadiani and   
               Chris Severn and   
            Gregory Sizikov and   
            Matthew Snelham and   
                 Jed Souter and   
              Dan Steinberg and   
                 Andy Swing and   
               Mercedes Tan and   
            Gregory Thorson and   
                    Bo Tian and   
                 Horia Toma and   
               Erick Tuttle and   
            Vijay Vasudevan and   
             Richard Walter and   
                Walter Wang and   
                Eric Wilcox and   
                  Doe Hyun Yoon   In-Datacenter Performance Analysis of a
                                  Tensor Processing Unit . . . . . . . . . 1--12
      Swagath Venkataramani and   
              Ashish Ranjan and   
           Subarno Banerjee and   
               Dipankar Das and   
          Sasikanth Avancha and   
          Ashok Jagannathan and   
                 Ajaya Durg and   
          Dheemanth Nagaraj and   
                Bharat Kaul and   
              Pradeep Dubey and   
              Anand Raghunathan   ScaleDeep: a Scalable Compute
                                  Architecture for Learning and Evaluating
                                  Deep Networks  . . . . . . . . . . . . . 13--26
         Angshuman Parashar and   
                 Minsoo Rhu and   
             Anurag Mukkara and   
          Antonio Puglielli and   
     Rangharajan Venkatesan and   
            Brucek Khailany and   
                  Joel Emer and   
         Stephen W. Keckler and   
               William J. Dally   SCNN: an Accelerator for
                                  Compressed-sparse Convolutional Neural
                                  Networks . . . . . . . . . . . . . . . . 27--40
            Hari Cherupalli and   
                 Henry Duwe and   
                 Weidong Ye and   
               Rakesh Kumar and   
                   John Sartori   Bespoke Processors for Applications with
                                  Ultra-low Area and Power Constraints . . 41--54
                Yajing Chen and   
               Shengshuo Lu and   
                   Cheng Fu and   
               David Blaauw and   
     Ronald Dreslinski, Jr. and   
               Trevor Mudge and   
                   Hun-Seok Kim   A Programmable Galois Field Processor
                                  for the Internet of Things . . . . . . . 55--68
                 Aosen Wang and   
               Lizhong Chen and   
                      Wenyao Xu   XPro: a Cross-End Processing
                                  Architecture for Data Analytics in
                                  Wearables  . . . . . . . . . . . . . . . 69--80
                Ofir Weisse and   
           Valeria Bertacco and   
                    Todd Austin   Regaining Lost Cycles with HotCalls: a
                                  Fast Interface for SGX Secure Enclaves   81--93
               Shaizeen Aga and   
            Satish Narayanasamy   InvisiMem: Smart Memory Defenses for
                                  Memory Bus Side Channel  . . . . . . . . 94--106
                  Amro Awad and   
                Yipeng Wang and   
             Deborah Shands and   
                    Yan Solihin   ObfusMem: a Low-Overhead Access
                                  Obfuscation for Trusted Memories . . . . 107--119
       S. Karen Khatamifard and   
               Longfei Wang and   
                   Weize Yu and   
                Selçuk Köse and   
               Ulya R. Karpuzcu   ThermoGater: Thermally-Aware On-Chip
                                  Voltage Regulation . . . . . . . . . . . 120--132
               Hailong Yang and   
                  Quan Chen and   
                 Moeiz Riaz and   
              Zhongzhi Luan and   
               Lingjia Tang and   
                     Jason Mars   PowerChief: Intelligent Power Allocation
                                  for Multi-Stage Applications to Improve
                                  Responsiveness on Power Constrained CMP  133--146
     Gokul Subramanian Ravi and   
               Mikko H. Lipasti   CHARSTAR: Clock Hierarchy Aware Resource
                                  Scaling in Tiled ARchitectures . . . . . 147--160
        Matthew D. Sinclair and   
            Johnathan Alsop and   
                 Sarita V. Adve   Chasing Away RAts: Semantics and
                                  Evaluation for Relaxed Atomics on
                                  Heterogeneous Systems  . . . . . . . . . 161--174
              Seunghee Shin and   
                 James Tuck and   
                    Yan Solihin   Hiding the Long Latency of Persist
                                  Barriers Using Speculative Execution . . 175--186
                Alberto Ros and   
          Trevor E. Carlson and   
              Mehdi Alipour and   
               Stefanos Kaxiras   Non-Speculative Load-Load Reordering in
                                  TSO  . . . . . . . . . . . . . . . . . . 187--200
                 Doowon Lee and   
               Valeria Bertacco   MTraceCheck: Validating
                                  Non-Deterministic Behavior of Memory
                                  Consistency Models in Post-Silicon
                                  Validation . . . . . . . . . . . . . . . 201--213
             Ruohuang Zheng and   
               Michael C. Huang   Redundant Memory Array Architecture for
                                  Efficient Selective Protection . . . . . 214--227
                  Matthew Hicks   Clank: Architectural Support for
                                  Intermittent Computation . . . . . . . . 228--240
         Manolis Kaliorakis and   
        Dimitris Gizopoulos and   
                Ramon Canal and   
               Antonio Gonzalez   MeRLiN: Exploiting Dynamic Instruction
                                  Behavior for Fast and Accurate
                                  Microarchitecture Level Reliability
                                  Assessment . . . . . . . . . . . . . . . 241--254
               Minesh Patel and   
             Jeremie S. Kim and   
                     Onur Mutlu   The Reach Profiler (REAPER): Enabling
                                  the Mitigation of DRAM Retention
                                  Failures via Profiling at Aggressive
                                  Conditions . . . . . . . . . . . . . . . 255--268
              Zhenning Wang and   
                   Jun Yang and   
                Rami Melhem and   
             Bruce Childers and   
               Youtao Zhang and   
                      Minyi Guo   Quality of Service Support for
                                  Fine-Grained Sharing on GPUs . . . . . . 269--281
                   Sui Chen and   
                    Lu Peng and   
                  Samuel Irving   Accelerating GPU Hardware Transactional
                                  Memory with Snapshot Isolation . . . . . 282--294
                   Kai Wang and   
                     Calvin Lin   Decoupled Affine Computation for SIMT
                                  GPUs . . . . . . . . . . . . . . . . . . 295--306
                 Gunjae Koo and   
                   Yunho Oh and   
                 Won Woo Ro and   
               Murali Annavaram   Access Pattern-Aware Cache Management
                                  for Improving Data Utilization in GPU    307--319
            Akhil Arunkumar and   
             Evgeny Bolotin and   
               Benjamin Cho and   
              Ugljesa Milic and   
             Eiman Ebrahimi and   
               Oreste Villa and   
               Aamer Jaleel and   
             Carole-Jean Wu and   
                  David Nellans   MCM-GPU: Multi-Chip-Module GPUs for
                                  Continued Performance Scalability  . . . 320--332
             Alireza Nazari and   
          Nader Sehatbakhsh and   
                Monjur Alam and   
               Alenka Zajic and   
                Milos Prvulovic   EDDIE: EM-Based Detection of Deviations
                                  in Program Execution . . . . . . . . . . 333--346
                Mengjia Yan and   
         Bhargava Gopireddy and   
               Thomas Shull and   
                Josep Torrellas   Secure Hierarchy-Aware Cache Replacement
                                  Policy (SHARP): Defending Against
                                  Cache-Based Side Channel Attacks . . . . 347--360
               Zhaoxia Deng and   
              Ariel Feldman and   
            Stuart A. Kurtz and   
              Frederic T. Chong   Lemonade from Lemons: Harnessing Device
                                  Wearout to Create Limited-Use Security
                                  Architectures  . . . . . . . . . . . . . 361--374
  Muhammad Shoaib Bin Altaf and   
                  David A. Wood   LogCA: a High-Level Performance Model
                                  for Hardware Accelerators  . . . . . . . 375--388
            Raghu Prabhakar and   
                 Yaqi Zhang and   
           David Koeplinger and   
               Matt Feldman and   
                  Tian Zhao and   
              Stefan Hadjis and   
             Ardavan Pedram and   
         Christos Kozyrakis and   
                 Kunle Olukotun   Plasticine: a Reconfigurable
                                  Architecture For Parallel Patterns . . . 389--402
                 Jaeha Kung and   
                   Yun Long and   
               Duckhwan Kim and   
            Saibal Mukhopadhyay   A Programmable Hardware Accelerator for
                                  Simulating Dynamical Systems . . . . . . 403--415
              Tony Nowatzki and   
            Vinay Gangadhar and   
            Newsha Ardalani and   
      Karthikeyan Sankaralingam   Stream-Dataflow Acceleration . . . . . . 416--429
                     Zi Yan and   
                 Ján Veselý and   
              Guilherme Cox and   
         Abhishek Bhattacharjee   Hardware Translation Coherence for
                                  Virtualized Systems  . . . . . . . . . . 430--443
            Chang Hyun Park and   
               Taekyung Heo and   
                Jungi Jeong and   
                    Jaehyuk Huh   Hybrid TLB Coalescing: Improving TLB
                                  Translation Coverage under Diverse
                                  Fragmented Memory Allocations  . . . . . 444--456
                 Hanna Alam and   
              Tianhao Zhang and   
                Mattan Erez and   
                    Yoav Etsion   Do-It-Yourself Virtual Memory
                                  Translation  . . . . . . . . . . . . . . 457--468
                Jee Ho Ryoo and   
             Nagendra Gulur and   
                Shuang Song and   
                   Lizy K. John   Rethinking TLB Designs in Virtualized
                                  Environments: a Very Large
                                  Part-of-Memory TLB . . . . . . . . . . . 469--480
             Aasheesh Kolli and   
              Vaibhav Gogte and   
                  Ali Saidi and   
       Stephan Diestelhorst and   
              Peter M. Chen and   
        Satish Narayanasamy and   
              Thomas F. Wenisch   Language-level persistency . . . . . . . 481--493
                  Jiho Choi and   
               Thomas Shull and   
          Maria J. Garzaran and   
                Josep Torrellas   ShortCut: Architectural Support for Fast
                                  Object Access in Scripting Languages . . 494--506
               Dibakar Gope and   
           David J. Schlais and   
               Mikko H. Lipasti   Architectural Support for Server-Side
                                  PHP Processing . . . . . . . . . . . . . 507--520
            Sudarsun Kannan and   
            Ada Gavrilovska and   
               Vishal Gupta and   
                 Karsten Schwan   HeteroOS: OS Design for Heterogeneous
                                  Memory Management in Datacenter  . . . . 521--534
              Yongming Shen and   
            Michael Ferdman and   
                   Peter Milder   Maximizing CNN Accelerator Efficiency
                                  Through Resource Partitioning  . . . . . 535--547
                  Jiecao Yu and   
            Andrew Lukefahr and   
            David Palframan and   
              Ganesh Dasika and   
             Reetuparna Das and   
                   Scott Mahlke   Scalpel: Customizing DNN Pruning to the
                                  Underlying Hardware Parallelism  . . . . 548--560
          Christopher De Sa and   
            Matthew Feldman and   
             Christopher Ré and   
                 Kunle Olukotun   Understanding and Optimizing
                                  Asynchronous Low-Precision Stochastic
                                  Gradient Descent . . . . . . . . . . . . 561--574
                 Zhaoshi Li and   
                  Leibo Liu and   
              Yangdong Deng and   
                 Shouyi Yin and   
                   Yao Wang and   
                    Shaojun Wei   Aggressive Pipelining of Irregular
                                  Applications on Reconfigurable Hardware  575--586
        Suvinay Subramanian and   
            Mark C. Jeffrey and   
           Maleen Abeydeera and   
             Hyun Ryong Lee and   
             Victor A. Ying and   
                  Joel Emer and   
                 Daniel Sanchez   Fractal: an Execution Model for
                                  Fine-Grain Nested Speculative
                                  Parallelism  . . . . . . . . . . . . . . 587--599
          Arun Subramaniyan and   
                 Reetuparna Das   Parallel Automata Processor  . . . . . . 600--612
               Rajat Kateja and   
              Anirudh Badam and   
            Sriram Govindan and   
              Bikash Sharma and   
                    Greg Ganger   Viyojit: Decoupling Battery and DRAM
                                  Capacities for Battery-Backed DRAM . . . 613--626
               Vinson Young and   
           Prashant J. Nair and   
           Moinuddin K. Qureshi   DICE: Compressing DRAM Caches for
                                  Bandwidth and Capacity . . . . . . . . . 627--638
              Mario Drumond and   
          Alexandros Daglis and   
           Nooshin Mirzadeh and   
           Dmitrii Ustiugov and   
             Javier Picorel and   
              Babak Falsafi and   
                 Boris Grot and   
        Dionisios Pnevmatikatos   The Mondrian Data Engine . . . . . . . . 639--651
                 Po-An Tsai and   
            Nathan Beckmann and   
                 Daniel Sanchez   Jenga: Software-Defined Cache
                                  Hierarchies  . . . . . . . . . . . . . . 652--665
             Rahul Boyapati and   
                Jiayi Huang and   
            Pritam Majumder and   
                Ki Hwan Yum and   
                   Eun Jung Kim   APPROX-NoC: a Data Approximation
                                  Framework for Network-On-Chip
                                  Architectures  . . . . . . . . . . . . . 666--677
            Matthew Poremba and   
                 Itir Akgun and   
                Jieming Yin and   
               Onur Kayiran and   
                   Yuan Xie and   
                 Gabriel H. Loh   There and Back Again: Optimizing the
                                  Interconnect in Networks of Memory Cubes 678--690
                Binzhang Fu and   
                       John Kim   Footprint: Regulating Routing
                                  Adaptiveness in Networks-on-Chip . . . . 691--702
          Masoumeh Ebrahimi and   
             Masoud Daneshtalab   EbDa: a New Theory on Design and
                                  Verification of Deadlock-free
                                  Interconnection Networks . . . . . . . . 703--715