Last update:
Tue Oct 1 07:12:31 MDT 2024
Caxton C. Foster A review of dynamic memories with
enhanced data access by Harold S. Stone.
IEEETC Vol. C-21, #4, p 359--386, April
1972 . . . . . . . . . . . . . . . . . . 3--7
M. Bataille Something old: the Gamma 60 the computer
that was ahead of its time . . . . . . . 10--15
Caxton C. Foster Something new: the Intel MCS-4 micro
computer set . . . . . . . . . . . . . . 16--17
J. A. N. Lee My next compiler . . . . . . . . . . . . 17--19
Michael J. Flynn and
Mrs. Carol Rogers Computer architecture at Johns Hopkins 21--33
R. F. Vaughan and
R. A. Collins On computer architecture, software
portability & microprogramming . . . . . 14--15
James C. Brakefield An optimal floating point format . . . . 16--17
J. E. Brewer Recent doctoral dissertations of
interest to SIGARCH . . . . . . . . . . 18--20
C. W. Bettcher Thread standardization and relative cost 9--9
Richard L. Sites Floating point significance interrupt
proposal . . . . . . . . . . . . . . . . 10--12
Caxton Foster Computer architecture . . . . . . . . . 13--18
Louis S. Adler A mini-computer configuration for CAI: a
systems engineering view . . . . . . . . 10--19
W. M. Gentleman and
B. A. Wichmann Timing on computers . . . . . . . . . . 20--23
Karl Schank Architectural assistance to software
debugging aids . . . . . . . . . . . . . 37--38
Dileep P. Bhandarkar and
Samuel H. Fuller Markov chain models for analyzing memory
interference in multiprocessor computer
systems . . . . . . . . . . . . . . . . 1--6
George A. Anderson Interconnecting a distributed processor
system for avionics . . . . . . . . . . 11--16
L. Rodney Goke and
G. J. Lipovski Banyan networks for partitioning
multiprocessor systems . . . . . . . . . 21--28
Harry F. Jordan and
Burton J. Smith Structure of digital system description
languages . . . . . . . . . . . . . . . 31--34
John A. N. Lee VDL---a definition system for all levels 41--48
Charles H. Radoy and
George P. Copeland, Jr. and
G. J. Lipovski A methodology for parallel processing
design tradeoffs . . . . . . . . . . . . 51--56
S. F. Reddaway DAP---a distributed array processor . . 61--65
Peter M. Kogge Maximal rate pipelined solutions to
recurrence problems . . . . . . . . . . 71--76
Tilak Agerwala and
Mike Flynn Comments on capabilities, limitations
and ``correctness'' of Petri nets . . . 81--86
Wayne E. Omohundro and
James H. Tracey Flowware---a flow charting procedure to
describe digital networks . . . . . . . 91--97
Mario R. Barbacci and
Daniel P. Siewiorek Automated exploration of the design
space for register transfer (RT) systems 101--106
T. A. Laliotis Implementation aspects of the symbol
hardware compiler . . . . . . . . . . . 111--115
George P. Copeland, Jr. and
G. J. Lipovski and
Stanley Y. W. Su The architecture of CASSM: a cellular
system for non-numeric processing . . . 121--128
John M. Hemphill and
S. A. Szygenda Deriving design guidelines for
diagnosable computer systems . . . . . . 131--135
Behrooz Parhami and
Algirdas Avizienis Design of fault-tolerant associative
processors . . . . . . . . . . . . . . . 141--145
M. A. Fischler and
O. Firschein A fault tolerant multiprocessor
architecture for real-time control
applications . . . . . . . . . . . . . . 151--157
G. J. Lipovski A varistructured fail-soft cellular
computer . . . . . . . . . . . . . . . . 161--165
Jean Vaucher and
Christian Rey A hardware laboratory for computer
architecture research . . . . . . . . . 171--175
P. J. Knoke Simulation exercises for computer
architecture education . . . . . . . . . 181--185
M. E. Sloan Computer architecture courses in
electrical engineering departments . . . 191--195
R. Hartenstein Increasing hardware complexity---a
challenge to computer architecture
education . . . . . . . . . . . . . . . 201--206
George Rossmann Review of the Workshop on Computer
Architecture Education . . . . . . . . . 211--214
Richard G. Cooper Micromodules: Microprogrammable building
blocks for hardware development . . . . 221--226
S. H. Fuller and
D. P. Siewiorek and
R. J. Swan Computer Modules: an architecture for
large digital modules . . . . . . . . . 231--237
Rodnay Zaks A microprogrammed architecture for front
end processing . . . . . . . . . . . . . 241--246
Z. G. Vranesic and
V. C. Hamacher and
Y. Y. Leung Design of a fully variable-length
structured minicomputer . . . . . . . . 251--255
Orin E. Marvel HAPPE Honeywell Associative Parallel
Processing Ensemble . . . . . . . . . . 261--267
Mario R. Schaffner A computer architecture and its
programming language . . . . . . . . . . 271--277
John Shore Conjecture corner . . . . . . . . . . . 3--6
W. M. McKeeman Computer design evaluation using
programming language primitives . . . . 7--18
Reiner W. Hartenstein Letter to membership from incoming
chairman (CAN, Oct. 73) . . . . . . . . 19--22
David Stryker and
David Weiss Secure system architecture . . . . . . . 37--38
Stephen Y. H. Su Book review of Logic and Logic
Design by B. Girling and H. G. Morning.
International Textbook Company Limited
1973 . . . . . . . . . . . . . . . . . . 2--3
John Shore Conjecture corner . . . . . . . . . . . 4--9
L. Nisnevich and
E. Strasbourger Decentralized priority control in data
communication . . . . . . . . . . . . . 1--6
Cecil C. Reames and
Ming T. Liu A loop network for simultaneous
transmission of variable-length messages 7--12
James F. Callan The architecture of the Picture System 13--16
John Staudhammer and
Jeffrey F. Eastman and
James N. England A fast display-oriented processor . . . 17--22
Jeffrey F. Eastman and
John Staudhammer Computer display of colored
three-dimensional objects . . . . . . . 23--27
Henry D. Kerr A microprogrammed processor for
interactive computer graphics . . . . . 28--33
C. V. W. Armstrong Functional memory techniques applied to
the microprogrammed control of an
associative processor . . . . . . . . . 34--40
James F. Wade and
Paul D. Stigall Instruction design to minimize program
size . . . . . . . . . . . . . . . . . . 41--44
James O. Bondi and
Paul D. Stigall HMO, a hardware microcode optimizer . . 45--51
A. M. Peskin The computer aided design of processor
architectures . . . . . . . . . . . . . 51--55
W. H. Huen and
D. P. Siewiorek Intermodule protocol for register
transfer level modules: representation
and analytic tools . . . . . . . . . . . 56--62
Portia Isaacson Picture systems, PS, and the design of a
channel-to-channel computer interface 63--70
Lennart Löfgren Reference concepts in a tree structured
address space . . . . . . . . . . . . . 71--79
Judith A. Anderson and
G. J. Lipovski A virtual memory for microprocessors . . 80--84
R. E. Brundage and
A. P. Batson The performance enhancement of
descriptor-based virtual memory systems
through the use of associative registers 85--90
Orin E. Marvel SPEAC: special purpose electronic area
correlator . . . . . . . . . . . . . . . 91--94
James M. Satterfield Architectural advances of the space
shuttle orbiter avionics computer system 95--98
Uno R. Kodres and
William L. McCracken Design study of an avionics navigation
microcomputer . . . . . . . . . . . . . 99--105
Gerald R. Kane An iteratively structured information
processor . . . . . . . . . . . . . . . 106--112
H. Richards, Jr. and
A. E. Oldehoeft Hardware-software interactions in
SYMBOL-2R's operating system . . . . . . 113--118
Pierre Sylvain and
Maniel Vineberg The design and evaluation of the array
machine: a high-level language processor 119--125
Jack B. Dennis and
David P. Misunas A preliminary architecture for a basic
data-flow processor . . . . . . . . . . 126--132
K. J. Berkling Reduction languages for reduction
machines . . . . . . . . . . . . . . . . 133--140
Willis K. King and
Fulvio Carbonaro Output devices sharing by minicomputers 141--145
S. Rannem and
V. C. Hamacher and
S. G. Zaky and
P. Connolly On relating small computer performance
to design parameters . . . . . . . . . . 146--151
Harold W. Lawson, Jr. and
Bengt Magnhagen Advantages of structured hardware . . . 152--158
Peter Kornerup Concepts of the MATHILDA system . . . . 159--164
Caxton C. Foster SOCRATES . . . . . . . . . . . . . . . . 165--169
Donald F. Wann and
Robert A. Ellis Conjoined computer systems: an
architecture for laboratory data
processing and instrument control . . . 170--175
E. Douglas Jensen A distributed function computer for
real-time control . . . . . . . . . . . 176--182
C. H. Radoy and
G. J. Lipovski Switched multiple instruction, multiple
data stream processing . . . . . . . . . 183--187
Robert J. Lechner Sequentially encoded data structures
that support bidirectional scanning . . 188--194
Martin Freeman An instruction class for an extensible
interpreter . . . . . . . . . . . . . . 195--200
W. K. Giloi and
H. Berg STARLET: a computer concept based on
ordered sets as primitive data types . . 201--206
R. G. Cornell and
H. C. Torng A cellular general purpose computer . . 207--213
Barry C. Goldstein and
Thomas W. Scrutchin A machine-oriented resource management
architecture . . . . . . . . . . . . . . 214--219
M. E. Sloan A design-oriented computer engineering
program . . . . . . . . . . . . . . . . 220--224
Janis Beitch Baron and
D. E. Atkins An educational laboratory in
contemporary digital design . . . . . . 225--231
W. R. Smith AADC computer family architecture
program . . . . . . . . . . . . . . . . 4--8
Åmund Lunde More data on the O/W ratios: a note on a
paper by Flynn . . . . . . . . . . . . . 9--13
G. Jack Lipovski and
Stanley Y. W. Su On non-numeric architecture . . . . . . 14--29
Guy G. Boulaye Structured design for structured
computer architecture . . . . . . . . . 8--17
D. L. Parnas Evaluation criteria for abstract
machines with unknown applications . . . 2--9
William R. Smith AADC computer family architecture
questions and answers . . . . . . . . . 15--21
Stephen Y. H. Su An introduction to CHDL (computer
hardware description languages) . . . . 22--23
R. W. Doran The International Computers Ltd. ICL2900
computer architecture . . . . . . . . . 24--47
Gordon Bell and
William D. Strecker Computer structures: What have we
learned from the PDP-11? . . . . . . . . 1--14
Helmut Kerner and
Werner Beyerle A PMS level language for performance
evaluation modelling (V-PMS) . . . . . . 15--19
M. Moalla and
G. Saucier and
J. Sifakis and
M. Zachariades A design tool for the multilevel
description and simulation of systems of
interconnected modules . . . . . . . . . 20--27
Jonathan Allen A course in computer structures . . . . 28--32
George E. Rossmann The IEEE Computer Society task force on
computer architecture . . . . . . . . . 33--33
Lawrence C. Widdoes, Jr. The Minerva multi-microprocessor . . . . 34--39
R. G. Arnold and
E. W. Page A hierarchical, restructurable
multi-microprocessor architecture . . . 40--45
Robert McGill and
John Steinhoff A multimicroprocessor approach to
numerical analysis: An application to
gaming problems . . . . . . . . . . . . 46--51
John E. Jensen and
Jean-Loup Baer A model of interference in a shared
resource multiprocessor . . . . . . . . 52--57
Clement K. C. Leung and
David P. Misunas and
Andrij Neczwid and
Jack B. Dennis A computer simulation facility for
packet communication architecture . . . 58--63
S. L. Rege Cost, performance and size tradeoffs for
different levels in a memory hierarchy 64--67
Paul E. Dworak and
Alice C. Parker An input interface for a real-time
digital sound generation system . . . . 68--73
Michael C. Mulder and
Patrick P. Fasang A microprocessor oriented data
acquisition and control system for power
system control . . . . . . . . . . . . . 74--78
H. M. Gladney and
G. Hochweller Multiprogramming for real-time
applications . . . . . . . . . . . . . . 79--85
Theodore H. Kehl Basil architecture --- an HLL
minicomputer . . . . . . . . . . . . . . 86--92
Harold W. Lawson, Jr. Function distribution in computer system
architectures . . . . . . . . . . . . . 93--97
Chris A. Vissers Interface, a dispersed architecture . . 98--104
A. Thomasian and
A. Avizienis A design study of a shared resource
computing system . . . . . . . . . . . . 105--112
W. S. Ford and
V. C. Hamacher Hardware support for inter-process
communication and processor sharing . . 113--118
Ulrich Trambacz and
Georg Hyla A taxonomy of display processors . . . . 119--120
W. E. Kluge Traversing binary tree structures with
shift register memories (recent results) 121.1--121.1
Eduardo B. Fernandez and
Rita C. Summers and
Charles D. Coleman Architectural support for system
protection (recent results) . . . . . . 121.2--121.2
James W. Gault and
Alice C. Parker The design of a user-programmable
digital interface (recent results) . . . 121.3--121.3
Serge Fournier and
Ming T. Liu System design of a grammar-programmable
high-level language machine . . . . . . 122.4--122.4
Ch. Kuznia and
R. Kober and
H. Kopp SMS 101 --- a structured multi
microprocessor system with deadlock-free
operation scheme . . . . . . . . . . . . 122.5--122.5
Philip S. Liu and
Frederic J. Mowle Selection schemes for dynamically
microcoding Fortran programs . . . . . . 122.6--122.6
S. H. Fuller and
D. P. Siewiorek and
R. J. Swan The design of a multi-micro-computer
system . . . . . . . . . . . . . . . . . 123--123
Cecil C. Reames and
Ming T. Liu Design and simulation of the distributed
loop computer network (DLCN) . . . . . . 124--129
Paolo Franchi Distribution of functions and control in
RPCNET . . . . . . . . . . . . . . . . . 130--135
Larry D. Wittie Efficient message routing in
Mega-Micro-Computer networks . . . . . . 136--140
Terry A. Welch An investigation of descriptor oriented
architecture . . . . . . . . . . . . . . 141--146
E. A. Feustel Tagged architecture and the semantics of
programming languages: Extensible types 147--150
A. P. Batson and
R. E. Brundage and
J. P. Kearns Design data for Algol-60 machines . . . 151--154
William D. Strecker Cache memories for PDP-11 family
computers . . . . . . . . . . . . . . . 155--158
Janak H. Patel and
Edward S. Davidson Improving the throughput of a pipeline
by insertion of delays . . . . . . . . . 159--164
A. M. Abd-Alla and
Laird H. Moffett On-line architecture tuning using
microcapture . . . . . . . . . . . . . . 165--171
Leonard D. Healy A character-oriented context-addressed
segment-sequential storage . . . . . . . 172--177
J. A. Bush and
G. J. Lipovski and
S. Y. W. Su and
J. K. Watson and
S. J. Ackerman Some implementations of segment
sequential functions . . . . . . . . . . 178--185
Manlio DeMartinis and
G. Jack Lipovski and
Stanley Y. W. Su and
J. K. Watson A Self Managing Secondary Memory system 186--194
Samuel H. Fuller Price/performance comparison of C.mmp
and the PDP-10 . . . . . . . . . . . . . 195--202
Lars-Erik Thorelli Representation of arrays in computers 6--9
Helmut Berndt Evolutionary computer architecture: the
Unidata 7.000 series . . . . . . . . . . 10--16
Jack B. Dennis Computer architecture and the cost of
software . . . . . . . . . . . . . . . . 17--21
George Lindamood On navel contemplation and the art of
computer maintenance . . . . . . . . . . 22--23
S. H. Fuller and
G. A. Mathew Implementing microprogram storage with
PLA's . . . . . . . . . . . . . . . . . 6--11
D. R. Hicks A generalized queue scheme for process
synchronization and communication . . . 12--14
Glen G. Langdon Book reviews: Review of Introduction
to Computer Architecture by Harold S.
Stone . . . . . . . . . . . . . . . . . 17--19
Kenneth J. Thurber ARPS: a new real-time computer . . . . . 6--16
Alan B. Salisbury MCF: a military computer family for
computer-based systems . . . . . . . . . 17--20
Frederic N. Ris A unified decimal floating-point
architecture for the support of
high-level languages . . . . . . . . . . 21--31
G. Jack Lipovski A question of style . . . . . . . . . . 32--38
G. Chroust Data interfaces versus control
interfaces: a half-baked conjecture . . 39--40
Glen G. Langdon Considerations on the ``figure of
merit'' technique for storage hierarchy
design . . . . . . . . . . . . . . . . . 25--28
Edward F. Miller Book Reviews: Review of High-Level
Language Computer Architecture by Yaohan
Chu. Academic Press, New York, 1975 . . 29--29
Yaohan Chu Architecture of a hardware data
interpreter . . . . . . . . . . . . . . 1--9
Subrata Dasgupta The design of some language constructs
for horizontal microprogramming . . . . 10--16
E. Douglas Jensen and
Richard Y. Kain The Honeywell Modular Microprogram
Machine: M3 . . . . . . . . . . . . . . 17--28
Richard R. Ramseyer and
Andries van Dam A multi-microprocessor implementation of
a general purpose pipelined CPU . . . . 29--34
C. V. Ravi and
Torben Moller A hierarchical microcomputer system for
hardware and software development . . . 35--40
J. Archer Harris and
David R. Smith Hierarchical multiprocessor
organizations . . . . . . . . . . . . . 41--48
K. Murakami and
S. Nishikawa and
M. Sato Poly-Processor System analysis and
design . . . . . . . . . . . . . . . . . 49--56
Guy Mazare A few examples of how to use a
symmetrical multi-micro-processor . . . 57--62
Peter M. Kogge The microprogramming of pipelined
processors . . . . . . . . . . . . . . . 63--69
Howard Jay Siegel The universality of various types of
SIMD machine interconnection networks 70--79
Ramakrishna B. Rau and
George E. Rossmann The effect of instruction fetch
strategies upon the performance of
pipelined instruction units . . . . . . 80--89
S. R. Ahuja and
J. R. Jump A modular memory scheme for array
processing . . . . . . . . . . . . . . . 90--94
Leonard S. Haynes The architecture of an ALGOL 60 computer
implemented with distributed processors 95--104
Herbert Sullivan and
T. R. Bashkow A large scale, homogeneous, fully
distributed parallel machine, I . . . . 105--117
Herbert Sullivan and
Theodore R. Bashkow and
David Klappholz A Large Scale, Homogeneous, Fully
Distributed Parallel Machine, II . . . . 118--124
G. Jack Lipovski On virtual memories and micronetworks 125--134
Jon C. Strauss and
Kenneth J. Thurber Considerations for new tactical computer
systems . . . . . . . . . . . . . . . . 135--140
Kenneth J. Thurber and
Peter C. Patton and
Robert C. Deward and
Jon C. Strauss and
Thomas W. Petschauer An advanced tactical computer concept 141--146
Gary J. Nutt Microprocessor implementation of a
parallel processor . . . . . . . . . . . 147--152
Paul Dworak and
Alice C. Parker and
Richard Blum The design and implementation of a
real-time sound generation system . . . 153--158
A. C. Parker and
A. W. Nagle Hardware/software tradeoffs in a
variable word width, variable queue
length buffer memory . . . . . . . . . . 159--164
Bernard L. Peuto and
Leonard J. Shustek An instruction timing model of CPU
performance . . . . . . . . . . . . . . 165--178
Cornelis H. Hoogendoorn Reduction of memory interference in
multiprocessor systems . . . . . . . . . 179--183
D. W. Hammerstrom and
E. S. Davidson Information content of CPU memory
referencing behavior . . . . . . . . . . 184--192
Ming T. Liu and
Cecil C. Reames Message communication protocol and
operating system design for the
Distributed Loop Computer Network (DLCN) 193--200
G. H. Poujoulat Architecture of the CORAIL building
block system . . . . . . . . . . . . . . 201--204
H. L. Tredennick and
T. A. Welch High-speed buffering for variable length
operands . . . . . . . . . . . . . . . . 205--210
Rod Steel Another general purpose computer
architecture . . . . . . . . . . . . . . 5--11
George E. Lindamood What's in a name? . . . . . . . . . . . 12--14
Conrad Schneiker The microprocessors of the future . . . 15--16
Edward F. Miller, Jr. Book review: Review of Large-Scale
Computer Architecture: Parallel and
Associative Processors by Kenneth J.
Thurber, Hayden Book Company, Rochelle
Park, New Jersey 1976 . . . . . . . . . 17--17
William M. Conner and
Edward R. Dirling Input/Output considerations in
look-ahead processing . . . . . . . . . 7--12
Robert F. Rosin The significance of microprogramming . . 14--19
Mario J. Gonzalez Book review: Review of
Microprogramming Primer by Harry Katzan,
Jr., McGraw-Hill 1977 . . . . . . . . . 29--30
Maniel Vineberg Implementation of character string
pattern matching on a multiprocessor . . 1--7
R. M. Bird and
J. C. Tu and
R. M. Worthy Associative/parallel processors for
searching very large textual data bases 8--9
G. J. Lipovski On imaginary fields, token transfers and
floating codes in intelligent secondary
memories . . . . . . . . . . . . . . . . 17--22
S. G. Zaky Microprocessors for non-numeric
processing . . . . . . . . . . . . . . . 23--30
David K. Hsiao and
Krishnamurthi Kannan The architecture of a database computer
--- a summary . . . . . . . . . . . . . 31--33
Robert S. Rosenthal The data management machine, a
classification . . . . . . . . . . . . . 35--39
Ken J. McDonell Trends in non-software support for
input-output functions . . . . . . . . . 40--47
R. Cerretti and
D. Jasilli and
D. R. Matteucci Ulisse: An Italian project for a
multifunctional terminal system . . . . 48--50
Olin H. Bray Data management requirements: The
similarity of memory management,
database systems, and message processing 68--76
Barry M. Landson and
Robert G. Sargent A comparison of sequential and associative
computing of priority queues . . . . . . 77--78
Glenford J. Myers The case against stack-oriented
instruction sets . . . . . . . . . . . . 7--10
Andrew S. Tanenbaum Ambiguous machine architecture and
program efficiency . . . . . . . . . . . 11--13
D. R. Hicks Microprogramming with a
content-addressable read-only-memory . . 14--15
D. R. Hicks Multitasking as a program structuring
primitive . . . . . . . . . . . . . . . 16--18
G. Chroust Book reviews: Review of Digital
System Implementation by Gerrit A.
Blaauw, Prentice Hall, Series in
Automatic Computation 1976 . . . . . . . 27--28
R. A. Hagan and
C. S. Wallace A virtual memory system for the Hewlett
Packard 2100A . . . . . . . . . . . . . 5--13
Forest Baskett More on microprocessors of the future 14--17
Yaohan Chu Direct-execution computer architecture 18--23
Peter U. Schulthess and
Eduard P. Mumprecht Reply to the case against stack-oriented
instruction sets . . . . . . . . . . . . 24--27
John B. Mountain and
Philip H. Enslow Application of the military computer
family architecture selection criteria
to the PR1ME P400 . . . . . . . . . . . 3--17
G. Jack Lipovski Just a few more words on microprocessors
of the future . . . . . . . . . . . . . 18--21
J. L. Keedy On the use of stacks in the evaluation
of expressions . . . . . . . . . . . . . 22--28
Andrew S. Tanenbaum Review of Processor Architecture by
S. H. Lavington, NCC Publications,
Manchester 1976 . . . . . . . . . . . . 31--31
A. E. Whiteside Book reviews: Review of The
Architecture of Concurrent Programs by
Per Brinch Hansen, Prentice-Hall 1977 32--32
Dileep P. Bhandarkar and
J. Egil Juliussen Semiconductor technology: trends and
implications . . . . . . . . . . . . . . 4--14
A. J. Payne A computer console design to help the
operator . . . . . . . . . . . . . . . . 15--22
Daniel R. McGlynn Review of Content Addressable
Parallel Processors by Caxton C. Foster.
Van Nostrand Reinhold Co. 1976 . . . . . 23--23
C. V. Ramamoorthy Review of Structured Computer
Organization by Andrew S. Tanenbaum,
Prentice-Hall 1976 . . . . . . . . . . . 23--23
W. Buchholz Review of Computer System
Architecture by M. Morris Mano,
Prentice-Hall 1976 . . . . . . . . . . . 24--24
Z. G. Vranesic Book reviews: Review of Content
Addressable Parallel Processors by
Caxton C. Foster, Van Nostrand Reinhold
Co. 1976 . . . . . . . . . . . . . . . . 24--24
R. R. Korfhage and
W. H. E. Day and
L. L. Beck and
W. F. Appelbe Data physics: an unorthodox view of data
and its implications in data processors 1--7
George P. Copeland String storage and searching for data
base applications: implementation on the
INDY backend kernel . . . . . . . . . . 8--17
Allen J. Otis and
George P. Copeland Editing requirements for data base
applications and their implementation on
the INDY backend kernel . . . . . . . . 18--29
G. Jack Lipovski Semantic paging on intelligent discs . . 30--34
Rhon Williams A multiprocessing system for the direct
execution of LISP . . . . . . . . . . . 35--41
R. M. Bird and
J. B. Newsbaum and
J. L. Trefftzs Text file inversion: an evaluation . . . 42--50
David C. Roberts A specialized computer architecture for
text retrieval . . . . . . . . . . . . . 51--59
M. J. Stucki and
J. R. Cox and
G. C. Roman and
P. N. Turcu Coordinating concurrent access in a
distributed database architecture . . . 60--64
Mohamed G. Gouda A hierarchical controller for concurrent
accessing of distributed databases . . . 65--70
Bezalel Gavish and
Harvey Koch An extensible architecture for data flow
processing . . . . . . . . . . . . . . . 71--76
J. B. Harvill Functional parallelism in an operand
state saving computer . . . . . . . . . 77--84
J. S. Hutchison and
W. G. Roman Madman machine . . . . . . . . . . . . . 85--90
Jayanta Banerjee and
David K. Hsiao The use of a database machine for
supporting relational databases . . . . 91--98
Paul J. Sadowski and
S. A. Schuster Exploiting parallelism in a Relational
Associative Processor . . . . . . . . . 99--109
Hsu Chang Bubbles for relational database . . . . 110--116
A. El Masri and
J. Rohmer and
D. Tusera A machine for information retrieval . . 117--120
Dante R. Matteucci A distributed structure for the
automization of the Catalog of the
National Cultural Heritage: experiences
and proposals . . . . . . . . . . . . . 121--133
Kenneth J. Thurber Computer communication techniques . . . 7--16
Hal W. Jennings A variation on the PDP 11 . . . . . . . 17--26
Per Brinch Hansen Multiprocessor architectures for
concurrent programs . . . . . . . . . . 4--23
J. L. Keedy On the evaluation of expressions using
accumulators, stacks and store-to-store
instructions . . . . . . . . . . . . . . 24--27
Rahul Chattergy In the current literature . . . . . . . 30--30
Harvey G. Cragon An evaluation of code space requirements
and performance of various architectures 5--21
Kenneth J. Thurber and
Harvey A. Freeman A bibliography of local computer network
architectures . . . . . . . . . . . . . 22--27
Lyle A. Cox, Jr. The nature of ``computer architecture'' 8--12
Jan L. A. van de Snepscheut and
Gert A. Slavenburg Introducing the notion of processes to
hardware . . . . . . . . . . . . . . . . 13--23
D. E. Atkins Review of Advances in Computer
Architecture by Glenford J. Myers.
Wiley-Interscience Division of John
Wiley and Sons 1978 . . . . . . . . . . 25--26
Kevin W. Bowyer Book review of The Structure of
Computers and Computations: Volume One
by David J. Kuck. John Wiley & Sons 1978 27--30
Randall Gibson and
Paul Anderson Technical overview of the Renaissance
Octobus system . . . . . . . . . . . . . 2--9
Johan W. Stevenson and
Andrew S. Tanenbaum Efficient encoding of machine
instructions . . . . . . . . . . . . . . 10--17
J. L. Keedy More on the use of stacks in the
evaluation of expressions . . . . . . . 18--22
G. E. Quick Intelligent memory: ``a parallel
processing concept'' . . . . . . . . . . 23--28
Ronald L. Rivest The BLIZZARD computer architecture . . . 2--10
J. L. Keedy A technique for passing reference
parameters in an information-hiding
architecture . . . . . . . . . . . . . . 11--15
Krishna M. Kavipurapu and
Dennis J. Frailey Quantification of architectures using
software science . . . . . . . . . . . . 2--6
Trevor Turton A proposed high-speed computer design 7--21
Computer Architecture News staff In the current literature . . . . . . . 22--22
Dana Richards On a ``Counter--Example'' . . . . . . . 2--3
Peter J. Denning Why not innovations in computer
architecture? . . . . . . . . . . . . . 4--7
G. W. Gerrity Hardware detection of undefined
references . . . . . . . . . . . . . . . 8--11
Peter J. Denning and
T. Don Dennis On minimizing contention at semaphores 12--19
Jack B. Dennis and
G. Andrew Boughton and
Clement K. C. Leung Building blocks for data flow prototypes 1--8
Edward S. Davidson A multiple stream microprocessor
prototype system: AMP-1 . . . . . . . . 9--16
F. Andre and
J. P. Banâtre and
H. Leroy and
G. Paget and
F. Ployette and
J. P. Routeau KENSUR: An architecture oriented towards
programming languages translation . . . 17--22
J. G. Kuhl and
S. M. Reddy Distributed fault-tolerance for large
multiprocessor systems . . . . . . . . . 23--30
Miroslaw Malek A comparison connection assignment for
diagnosis of multiprocessor systems . . 31--36
K. E. Grosspietsch and
J. Kaiser and
E. Nett A concept for test and reconfiguration
of a fault-tolerant VLSI processor
system . . . . . . . . . . . . . . . . . 37--43
Jean-Paul Brassard and
Jan Gecsei Path building in cellular partitioning
networks . . . . . . . . . . . . . . . . 44--50
Robert J. McMillen and
Howard Jay Siegel MIMD machine communication using the
augmented data manipulator network . . . 51--60
John P. Shen and
John P. Hayes Fault tolerance of a class of connecting
networks . . . . . . . . . . . . . . . . 61--71
E. G. Coffman, Jr. and
Kimming So On the comparison between single and
multiple processor systems . . . . . . . 72--79
V. Carl Hamacher and
Gerald S. Shedler Performance of a collision-free local
bus network having asynchronous
distributed control . . . . . . . . . . 80--87
W. M. Zuberek Timed Petri nets and preliminary
performance evaluation . . . . . . . . . 88--96
David R. Ditzel and
David A. Patterson Retrospective on high-level language
computer architecture . . . . . . . . . 97--104
J. P. Sansonnet and
M. Castan and
C. Percebois M3L: a list-directed architecture . . . 105--112
Yasushi Hibino A Practical Parallel Garbage Collection
Algorithm and Its Implementation . . . . 113--120
Philip C. Treleaven and
Geoffrey F. Mole A multi-processor reduction machine for
user-defined reduction languages . . . . 121--130
Jeffrey M. Tobias A single user multiprocessor
incorporating processor manipulation
facilities . . . . . . . . . . . . . . . 131--138
Robert H. Halstead, Jr. and
Stephen A. Ward The MuNet: a scalable decentralized
architecture for parallel computation 139--145
Butler W. Lampson and
Kenneth A. Pier A processor for a high-performance
personal computer . . . . . . . . . . . 146--160
D. B. G. Edwards and
A. E. Knowles and
J. V. Woods MU6-G: a new design to achieve mainframe
performance from a mini-sized computer 161--167
Kenneth E. Batcher Architecture of a massively parallel
processor . . . . . . . . . . . . . . . 168--173
John Palmer The Intel 8087 numeric data processor 174--181
Robert H. Kuhn Efficient mapping of algorithms to
single-stage interconnections . . . . . 182--189
David Nassimi and
Sartaj Sahni A self routing Benes network . . . . . . 190--195
H. von Issendorff and
W. Grünewald An adaptable network for functional
distributed systems . . . . . . . . . . 196--201
Mokhtar Boshra Riad A combination of field and current
access techniques for efficient and
cost-effective bubble memories . . . . . 202--210
K. S. Trivedi Designing linear storage hierarchies so
as to maximize reliability subject to
cost and performance constraints . . . . 211--217
Sudhir R. Ahuja and
Charles S. Roberts An associative/parallel processor for
partial match retrieval using
superimposed codes . . . . . . . . . . . 218--227
M. D. Ruggiero and
S. G. Zaky A microprocessor-based virtual memory
system . . . . . . . . . . . . . . . . . 228--235
Anand Jagannathan A technique for the architectural
implementation of software subsystems 236--244
Viktors Berstis Security and protection of data in the
IBM System/38 . . . . . . . . . . . . . 245--252
Miguel García Hoffmann Hardware implementation of communication
protocols: a formal approach . . . . . . 253--263
P. Guillier and
D. Slosberg An architecture with comprehensive
facilities of inter-process
synchronization and communication . . . 264--270
Robert M. Lougheed and
David L. McCubbrey The cytocomputer: a practical pipelined
image processor . . . . . . . . . . . . 271--277
C. Halatsis and
A. van Dam and
J. Joosten and
M. Letheren Architectural considerations for a
microprogrammable emulating engine using
bit-slices . . . . . . . . . . . . . . . 278--291
Mary Jane Irwin and
Don Heller Online pipeline systems for recursive
numeric computations . . . . . . . . . . 292--299
M. J. Foster and
H. T. Kung Design of special-purpose VLSI chips:
Example and opinions . . . . . . . . . . 300--307
Anshul Kumar and
P. C. P. Bhatt A structured language for CAD of digital
systems . . . . . . . . . . . . . . . . 308--316
Uwe Hercksen and
Rainer Klar and
Wolfgang Kleinöder Hardware-measurements of storage access
conflicts in the processor array EGPA(1) 317--324
Mario Tokoro and
Kiichiro Tamaru and
Masaaki Mizuno and
Masao Hori A high level multi-lingual
multiprocessor KMP/II . . . . . . . . . 325--333
Ken Aupperle A real innovation in computer
architecture . . . . . . . . . . . . . . 6--7
John R. Galloway, Jr. Architectural innovation round: round #3 8--10
John A. Sharp Some thoughts on data flow architectures 11--21
Mary Payne and
Dileep Bhandarkar VAX floating point: a solid foundation
for numerical computation . . . . . . . 22--33
Lloyd Dickman Treasurer's report . . . . . . . . . . . 37--38
Computer Architecture News staff Current literature: abstracts of
articles of interest... . . . . . . . . 48--48
Julian Davies Clock architecture and management . . . 3--6
G. Chroust and
J. R. Mühlbacher Rivalling multiprocessor organization: a
hardware/speed trade-off . . . . . . . . 7--10
David Stevenson A report on the proposed IEEE Floating
Point Standard (IEEE Task p754) . . . . 11--12
Justin Rattner and
George Cox Object-based computer architecture . . . 4--11
G. J. Myers and
B. R. S. Buckingham A hardware implementation of
capability-based addressing . . . . . . 12--24
David A. Patterson and
David R. Ditzel The case for the reduced instruction set
computer . . . . . . . . . . . . . . . . 25--33
Douglas W. Clark and
William D. Strecker Comments on ``The Case for the Reduced
Instruction Set Computer,'' by Patterson
and Ditzel . . . . . . . . . . . . . . . 34--38
James C. Brakefield Is 32 bits of address too much? . . . . 39--40
James C. Brakefield The peripheral bus . . . . . . . . . . . 41--43
Trevor Mudge Book reviews: Review of The
Structure of Computers and Computation,
Vol. I by David J. Kuck, John Wiley &
Sons 1978 . . . . . . . . . . . . . . . 44--45
Computer Architecture News Staff Current literature: abstracts of
articles of interest... . . . . . . . . 46--46
Karl Reed The way forward in computer architecture
research . . . . . . . . . . . . . . . . 3--7
John Gilmore Suggested enhancements to the Motorola
MC68000 . . . . . . . . . . . . . . . . 8--14
John F. Wakerly Pascal extensions for describing
computer instruction sets . . . . . . . 15--23
Krishna M. Kavi Semantics of an algorithm . . . . . . . 24--26
Philip C. Treleaven VLSI: machine architecture and very high
level languages . . . . . . . . . . . . 27--38
Lloyd Dickman SIGARCH business . . . . . . . . . . . . 7--8
Martin L. De Prycker A new index mode for the VAX-11 . . . . 10--11
David Stevenson The Phoenix Project . . . . . . . . . . 12--15
E. M. J. C. Van Oost Multi-processor system description and
simulation using structured
multi-programming languages . . . . . . 16--32
John Wakerly Book review: Review of 'The Computers
that Saved Metropolis, by DC Comics and
Radio Shack', July 1980 . . . . . . . . 33--34
Arvind and
V. Kathail A Multiple Processor Data Flow Machine
that Supports Generalized Procedures . . ??
G. W. Gerrity On processes and interrupts . . . . . . 4--14
Dwight D. Hill A hardware mechanism for supporting
range checks . . . . . . . . . . . . . . 15--21
Vladimir S. Cherniavsky The computing memory another distributed
computer architecture . . . . . . . . . 22--24
James E. Thornton 8th Annual Symposium on Computer
Architecture: Heterogeneous Computer
Architecture . . . . . . . . . . . . . . 25--33
Computer Architecture News Staff Errata for two publications . . . . . . 34--34
Donald C. Lindsay Cache memory for microprocessors . . . . 6--13
Krishna M. Kavi Innovative architectures and commercial
computers: a summary of the panel
discussion at NCC 1981 . . . . . . . . . 14--16
R. M. Jenevein and
?. DeGroot and
G. Jack Lipovski Errata: ``A hardware support mechanism
for scheduling resources in parallel
machine environment'': (from Proceedings
of the 8th Annual Symposium on Computer
Architecture, p. 57) . . . . . . . . . . 17--17
C. K. Yuen Extending the power of short-wordlength
processors by means of context-dependent
machine instructions . . . . . . . . . . 9--15
Allan Gottlieb and
Clyde P. Kruskal Coordinating parallel processors: a
partial unification . . . . . . . . . . 16--24
Anonymous Errata: Structured machine design: an
ongoing experiment . . . . . . . . . . . 25--25
Charlie McDowell Protection at the micromachine level . . 4--8
Edward A. Feustel Protected procedure call on the
PRIME(TM) machines . . . . . . . . . . . 9--22
Hossam El-Halabi and
Dharma P. Agrawal Some remarks on direct execution
computers . . . . . . . . . . . . . . . 23--27
Daniel T. Fitzpatrick and
John K. Foderaro and
Manolis G. H. Katevenis and
Howard A. Landman and
David A. Patterson and
James B. Peek and
Zvi Peshkess and
Carlo H. Séquin and
Robert W. Sherburne and
Korbin S. Van Dyke A RISCy approach to VLSI . . . . . . . . 28--32
Justin Rattner Hardware/software cooperation in the
iAPX-432 . . . . . . . . . . . . . . . . 1--1
John Hennessy and
Norman Jouppi and
Forest Baskett and
Thomas Gross and
John Gill Hardware/software tradeoffs for
increased performance . . . . . . . . . 2--11
James W. Rymarczyk Coding guidelines for pipelined
processors . . . . . . . . . . . . . . . 12--19
Richard K. Johnsson and
John D. Wick An overview of the Mesa processor
architecture . . . . . . . . . . . . . . 20--29
Alan D. Berenbaum and
Michael W. Condry and
Priscilla M. Lu The operating system and language
support features of the BELLMAC(TM)-32
microprocessor . . . . . . . . . . . . . 30--38
George Radin The 801 minicomputer . . . . . . . . . . 39--47
David R. Ditzel and
H. R. McLellan Register allocation for free: The C
machine stack cache . . . . . . . . . . 48--56
Samuel P. Harbison An architectural alternative to
optimizing compilers . . . . . . . . . . 57--65
Butler W. Lampson Fast procedure calls . . . . . . . . . . 66--76
Douglas W. Jones Systematic protection mechanism design 77--80
Karl Reed On a general property of memory mapping
tables . . . . . . . . . . . . . . . . . 81--86
Robert P. Cook and
Nitin Donde An experiment to improve operand
addressing . . . . . . . . . . . . . . . 87--91
Akira Fusaoka and
Masaharu Hirayama Compiler chip: a hardware implementation
of compiler . . . . . . . . . . . . . . 92--95
B. R. Rau and
C. D. Glaeser and
E. M. Greenawalt Architectural support for the efficient
generation of code for horizontal
architectures . . . . . . . . . . . . . 96--99
R. E. McLear and
D. M. Scheibelhut and
E. Tammaru Guidelines for creating a debuggable
processor . . . . . . . . . . . . . . . 100--106
M. V. Wilkes Hardware support for memory protection:
Capability implementations . . . . . . . 107--116
Fred J. Pollack and
George W. Cox and
Dan W. Hammerstrom and
Kevin C. Kahn and
Konrad K. Lai and
Justin R. Rattner Supporting Ada memory management in the
iAPX-432 . . . . . . . . . . . . . . . . 117--131
J. P. Sansonnet and
M. Castan and
C. Percebois and
D. Botella and
J. Perez Direct execution of Lisp on a
list-directed architecture . . . . . . . 132--139
Mark Scott Johnson Some requirements for architectural
support of software debugging . . . . . 140--148
C. A. Middelburg The effect of the PDP-11 architecture on
code generation for CHILL . . . . . . . 149--157
Richard E. Sweet and
James G. Sandman, Jr. Empirical analysis of the Mesa
instruction set . . . . . . . . . . . . 158--166
Gene McDaniel An analysis of a Mesa instruction set
using dynamic instruction frequencies 167--176
Cheryl A. Wiecek A case study of VAX-11 instruction set
usage for compiler execution . . . . . . 177--184
Mamoru Maekawa and
Ken Sakamura and
Chiaki Ishikawa Firmware structure and architectural
support for monitors, vertical migration
and user microprogramming . . . . . . . 185--194
N. Kamibayashi and
H. Ogawana and
K. Nagayama and
H. Aiso Heart: an operating system nucleus
machine implemented by firmware . . . . 195--204
Sudhir R. Ahuja and
Abhaya Asthana A multi-microprocessor architecture with
hardware support for communication and
scheduling . . . . . . . . . . . . . . . 205--209
David A. Patterson and
Richard S. Piepho RISC assessment: a high-level language
experiment . . . . . . . . . . . . . . . 3--8
Douglas W. Clark and
Henry M. Levy Measurement and analysis of instruction
use in the VAX-11/780 . . . . . . . . . 9--17
Krishna Kavi and
Boumediene Belkhouche and
Evelyn Bullard and
Lois Delcambre and
Stephen Nemecek HLL architectures: Pitfalls and
predilections . . . . . . . . . . . . . 18--23
Allan Gottlieb and
Ralph Grishman and
Clyde P. Kruskal and
Kevin P. McAuliffe and
Larry Rudolph and
Marc Snir The NYU Ultracomputer---designing a
MIMD, shared-memory parallel machine
(extended abstract) . . . . . . . . . . 27--42
King-Hang Chu and
King-Sun Fu VLSI architectures for high speed
recognition of context-free languages
and finite-state languages . . . . . . . 43--49
Mark A. Franklin and
Donald F. Wann Asynchronous and clocked control
structures for VLSI based
interconnection networks . . . . . . . . 50--59
Robert J. McMillen and
Howard Jay Siegel Performance and fault tolerance
improvements in the Inverse Augmented
Data Manipulator network . . . . . . . . 63--72
D. S. Parker and
C. S. Raghavendra The Gamma network: a multiprocessor
interconnection network with redundant
paths . . . . . . . . . . . . . . . . . 73--80
R. M. Jenevein and
J. C. Browne A control processor for a reconfigurable
array computer . . . . . . . . . . . . . 81--89
Laxmi N. Bhuyan and
Dharma P. Agrawal A general class of processor
interconnection strategies . . . . . . . 90--98
F. J. Burkowski Instruction set design issues relating
to a static dataflow computer . . . . . 101--111
James E. Smith Decoupled access/execute computer
architectures . . . . . . . . . . . . . 112--119
L. J. Caluwaerts and
J. Debacker and
J. A. Peperstraete A data flow architecture with a paged
memory system . . . . . . . . . . . . . 120--127
B. Ramakrishna Rau and
Christopher D. Glaeser and
Raymond L. Picard Efficient code generation for horizontal
architectures: Compiler techniques and
architectural support . . . . . . . . . 131--139
Gene C. Barton Sentry: a novel hardware implementation
of classic operating system mechanisms 140--147
M. Abramovici and
Y. H. Levendel and
P. R. Menon A logic simulation machine . . . . . . . 148--157
Subrata Dasgupta and
Marius Olafsson Towards a family of languages for the
design and implementation of machine
architectures . . . . . . . . . . . . . 158--167
Yann-Hang Lee and
Kang G. Shin Rollback propagation detection and
performance evaluation of FTMR2M---a
fault-tolerant multiprocessor . . . . . 171--180
Woei Lin and
Chuan-lin Wu Design of a 2 × 2
fault-tolerant switching element . . . . 181--189
Donald Fussell and
Peter Varman Fault-tolerant wafer-scale architectures
for VLSI . . . . . . . . . . . . . . . . 190--198
Sakti Pramanik Database filters . . . . . . . . . . . . 201--210
Mario Tokoro and
Takashi Takizuka On the semantic structure of information
--- a proposal of the abstract storage
architecture . . . . . . . . . . . . . . 211--217
Yasunori Dohi and
Akira Suzuki and
Noriyuki Matsui Hardware sorter and its application to
data base machine . . . . . . . . . . . 218--225
Philip C. Treleaven and
Richard P. Hopkins A recursive computer architecture for
VLSI . . . . . . . . . . . . . . . . . . 229--238
M. Castan and
E. I. Organick μ3L: an HLL-RISC processor for
parallel execution of FP-language
programs . . . . . . . . . . . . . . . . 239--247
F. Hommes The heap/substitution concept --- an
implementation of functional operations
on data structures for a reduction
machine . . . . . . . . . . . . . . . . 248--256
Paul F. Reynolds, Jr. A shared resource algorithm for
distributed simulation . . . . . . . . . 259--266
Bijendra N. Jain Duplication of packets and their
detection in X.25 communication
protocols . . . . . . . . . . . . . . . 267--273
Pauline Markenscoff A multiple processor system for real
time control tasks . . . . . . . . . . . 274--280
Leslie Jill Miller A heterogeneous multiprocessor design
and the distributed scheduling of its
task group workload . . . . . . . . . . 283--290
George H. Goble and
Michael H. Marsh A dual processor VAX 11/780 . . . . . . 291--298
Michel Dubois and
Fayé A. Briggs Effects of cache coherency in
multiprocessors . . . . . . . . . . . . 299--308
T. N. Mudge and
B. A. Makrucki Probabilistic analysis of a crossbar
switch . . . . . . . . . . . . . . . . . 311--320
Steven P. Levitan and
Caxton C. Foster Finding an extremum in a network . . . . 321--325
U. V. Premkumar and
J. C. Browne Resource allocation in rectangular SW
banyans . . . . . . . . . . . . . . . . 326--333
Anonymous List of authors . . . . . . . . . . . . 335--335
Alastair J. W. Mayer The architecture of the Burroughs B5000:
20 years later and still ahead of the
times? . . . . . . . . . . . . . . . . . 3--10
James C. Brakefield From the other side of the Atlantic: how
to improve upon the MU5 design . . . . . 11--16
Paul M. Hansen and
Mark A. Linton and
Robert N. Mayo and
Marguerite Murphy and
David A. Patterson A performance evaluation of the Intel
iAPX 432 . . . . . . . . . . . . . . . . 17--26
Miquel Huguet The protection of the processor status
word of the PDP-11/60 . . . . . . . . . 27--30
James Brakefield Just what is an op-code?: or a universal
computer design . . . . . . . . . . . . 31--34
J. D. Knott and
T. W. Crockett Fair dynamic arbitration for a
multiprocessor communications bus . . . 4--9
James R. Larus A comparison of microcode, assembly
code, and high-level languages on the
VAX-11 and RISC I . . . . . . . . . . . 10--15
David A. Patterson A performance evaluation of the Intel
80286 . . . . . . . . . . . . . . . . . 16--18
Rod Egan The effect of VLSI on computer
architecture . . . . . . . . . . . . . . 19--22
Thomas Benzie Book reviews: Review of
Microcomputer Architecture and
Programming by John F. Wakerly, John
Wiley & Sons, Inc., 1981 . . . . . . . . 23--23
Henry M. Levy and
Douglas W. Clark On the use of benchmarks for measuring
system performance . . . . . . . . . . . 5--8
Peter Schulthess and
Fritz Vonaesch OPA: a new architecture for Pascal-like
languages . . . . . . . . . . . . . . . 9--20
James C. Brakefield Talk on interpreters . . . . . . . . . . 21--28
D. W. Doran Main frame computer trends . . . . . . . 29--44
Daniel Gajski and
David Kuck and
Duncan Lawrie and
Ahmed Sameh CEDAR: a large scale multiprocessor . . 7--11
Elaine French and
Hugh Glaser TUKI: a data flow processor . . . . . . 12--18
Nenad Marovac A systematic approach to the design and
implementation of a computer instruction
set . . . . . . . . . . . . . . . . . . 19--24
Harvey Cragon Executable instruction set specification 25--43
Robert P. Colwell and
Charles Y. Hitchcock and
E. Douglas Jensen Peering through the RISC/CISC fog: an
outline of research . . . . . . . . . . 44--50
G. W. Gorsline Review of Advances in Computer
Architecture by Glenford J. Myers, John
Wiley & Sons, Inc. 1982 . . . . . . . . . 55--55
M. W. Sachs Book reviews: Review of
Microcomputer Interfacing by G. Jack
Lipovski, Lexington Books 1980 . . . . . 55--55
David Abramson and
John Rosenberg Hardware support for program debuggers
in a paged virtual memory . . . . . . . 8--19
Dennis J. Frailey Word length of a computer architecture
definitions and applications . . . . . . 20--26
Lee A. Hollaar Book reviews: Review of Computer
Design by Glen G. Langdon, Computeach
Press . . . . . . . . . . . . . . . . . 27--28
Maurice V. Wilkes Size, power, and speed (keynote address) 2--4
W. K. Giloi Towards a taxonomy of computer
architecture based on the machine data
type view . . . . . . . . . . . . . . . 6--15
Algirdas Avižienis Framework for a taxonomy of
fault-tolerance attributes in computer
systems . . . . . . . . . . . . . . . . 16--21
Björn Pehrson and
Joachim Parrow Caddie an interactive design environment 24--31
Subrata Dasgupta On the verification of computer
architectures using an architecture
description language . . . . . . . . . . 32--38
Richard M. King Research on synthesis of concurrent
computing systems (extended abstract) 39--46
Allan L. Fisher and
H. T. Kung and
Louis M. Monier and
Yasunori Dohi Architecture of the PSC---a programmable
systolic chip . . . . . . . . . . . . . 48--53
Allan L. Fisher and
H. T. Kung Synchronizing large VLSI processor
arrays . . . . . . . . . . . . . . . . . 54--58
Robert A. Wagner The Boolean Vector Machine [BVM] . . . . 59--66
M. A. Bonuccelli and
E. Lodi and
F. Luccio and
P. Maestrini and
L. Pagli A VLSI tree machine for relational data
bases . . . . . . . . . . . . . . . . . 67--73
L. J. Caluwaerts and
J. Debacker and
J. A. Peperstraete Implementing streams on a data flow
computer system with paged memory . . . 76--83
Joseph E. Requa The Piecewise Data Flow architecture
control flow and register management . . 84--89
Mario Tokoro and
J. R. Jagannathan and
Hideki Sunahara On the working set concept for data-flow
machines . . . . . . . . . . . . . . . . 90--97
R. W. Marczyński and
J. Milewski A data driven system based on a
microprogrammed processor module . . . . 98--106
David A. Patterson and
Phil Garrison and
Mark Hill and
Dimitris Lioupis and
Chris Nyberg and
Tim Sippel and
Korbin Van Dyke Architecture of a VLSI instruction cache
for a RISC . . . . . . . . . . . . . . . 108--116
Phil C. C. Yeh and
Janak H. Patel and
Edward S. Davidson Performance of shared cache for
parallel-pipelined computer systems . . 117--123
James R. Goodman Using cache memory to reduce
processor-memory traffic . . . . . . . . 124--131
James E. Smith and
James R. Goodman A study of instruction cache
organizations and replacement policies 132--137
Joseph A. Fisher Very Long Instruction Word architectures
and the ELI-512 . . . . . . . . . . . . 140--150
Shinji Tomita and
Kiyoshi Shibayama and
Toshiaki Kitamura and
Toshiyuki Nakata and
Hiroshi Hagiwara A user-microprogrammable, local host
computer with low-level parallelism . . 151--157
Richard H. Gumpertz Combining tags with error codes . . . . 160--165
Young Gil Park and
Jung Wan Cho Fault diagnosis of bit-slice processor 166--172
M. A. Fiol and
I. Alegre and
J. L. A. Yebra Line digraph iterations and the (d,k)
problem for directed graphs . . . . . . 174--177
Eli Opper and
Miroslaw Malek and
G. Jack Lipovski Resource allocation in rectangular
CC-banyans . . . . . . . . . . . . . . . 178--184
František Soviš Uniform theory of the shuffle-exchange
type permutation networks . . . . . . . 185--191
Vason P. Srini and
Jorge F. Asenjo Analysis of Cray-1S architecture . . . . 194--206
Harry F. Jordan Performance measurements on HEP --- a
pipelined MIMD computer . . . . . . . . 207--212
Hideharu Amano and
Takaichi Yoshida and
Hideo Aiso (SM)2-Sparse Matrix Solving Machine . . 213--220
R. Kalyana Krishnan and
A. K. Rajasekar and
C. S. Moghe An experimental system for Computer
Science instruction . . . . . . . . . . 222--227
Klaus Kronlöf Execution control and memory management
of a Data Flow Signal Processor . . . . 230--235
Masasuke Kishi and
Hiroshi Yasuhara and
Yasusuke Kawamura DDDP---a Distributed Data Driven
Processor . . . . . . . . . . . . . . . 236--242
Naohisa Takahashi and
Makoto Amamiya A data flow processor array system:
Design and analysis . . . . . . . . . . 243--250
Kenneth A. Pier A retrospective on the Dorado, a
high-performance personal computer . . . 252--269
Robert J. Dugan System/370 extended architecture: a
program view of the channel subsystem 270--276
Richard L. Norton and
Jacob A. Abraham Adaptive interpretation as a means of
exploiting complex instruction sets . . 277--282
Manoj Kumar and
Daniel M. Dias and
J. R. Jump Switching strategies in a class of
packet switching networks . . . . . . . 284--300
Benjamin W. Wah A comparative study of distributed
resource sharing on multiprocessors . . 301--308
W. Kent Fuchs and
Jacob A. Abraham and
Kuang-Hua Huang Concurrent error detection in VLSI
interconnection networks . . . . . . . . 309--315
W. K. Giloi and
P. Behr Hierarchical function distribution --- a
design principle for advanced
multicomputer architectures . . . . . . 318--325
Luigi Stringa EMMA---an industrial experience on large
multiprocessing architectures . . . . . 326--333
Lars Philipson and
Bo Nilsson and
Bjorn Breidegard A communication structure for a
multiprocessor computer with distributed
global memory . . . . . . . . . . . . . 334--340
Hiromu Hayashi and
Akira Hattori and
Haruo Akimoto ALPHA---a high-performance LISP machine
equipped with a new stack structure and
garbage collection system . . . . . . . 342--348
Shinji Umeyama and
Koichiro Tamura A parallel execution model of logic
programs . . . . . . . . . . . . . . . . 349--355
Claudia Schmittgen and
Werner Kluge A system architecture for the concurrent
evaluation of applicative program
expressions . . . . . . . . . . . . . . 356--362
Yoshinori Yamaguchi and
Kenji Toda and
Toshitsugu Yuba A performance evaluation of a Lisp-based
data-driven machine (EM-3) . . . . . . . 363--369
Steven L. Tanimoto A pyramidal approach to parallel
processing . . . . . . . . . . . . . . . 372--378
Gérard Gaillat The design of a parallel processor for
image processing on-board satellites: an
application oriented approach . . . . . 379--386
Hitoshi Nishimura and
Hiroshi Ohno and
Toru Kawata and
Isao Shirakawa and
Koichi Omura Links-1 --- a parallel pipelined
multimicrocomputer system for image
creation . . . . . . . . . . . . . . . . 387--394
T. Ericsson and
P. E. Danielsson LIPP --- a SIMD multiprocessor
architecture for image processing . . . 395--400
Philip C. Treleaven The new generation of computer
architecture . . . . . . . . . . . . . . 402--409
Shunichi Uchida Inference machine: From sequential to
parallel . . . . . . . . . . . . . . . . 410--416
Tohru Moto-oka Overview to the Fifth Generation
Computer System project . . . . . . . . 417--422
Kunio Murakami and
Takeo Kakuta and
Nobuyoshi Miyazaki and
Shigeki Shibayama and
Haruo Yokota A relational data base machine: First
step to knowledge base machine . . . . . 423--425
Arvind and
Robert A. Iannucci A critique of multiprocessing von
Neumann style . . . . . . . . . . . . . 426--436
Dwight D. Hill An analysis of C machine support for
other block-structured languages . . . . 6--16
Nenad Marovac On interprocess interaction in
distributed architectures . . . . . . . 17--22
Robert J. Schalkoff Towards an efficient, dedicated
architecture for a Digital Geometric
Image Transformer (DGIT) . . . . . . . . 23--29
Arieh Plotkin and
Daniel Tabak A Tree Structured Architecture for
semantic gap reduction . . . . . . . . . 30--44
Maurice V. Wilkes Keeping jump instructions out of the
pipeline of a RISC-like computer . . . . 5--7
Jeremy Jones Puzzling with microcode . . . . . . . . 8--12
Wayne Amsbury A code-splitting algorithm . . . . . . . 13--21
Jack J. Dongarra Performance of various computers using
standard linear equations software in a
Fortran environment . . . . . . . . . . 22--27
M. R. Bhujade On the design of Always Compatible
Instruction Set Architecture (ACISA) . . 28--30
J. L. Heath Re-evaluation of the RISC I . . . . . . 3--10
David A. Patterson RISC watch . . . . . . . . . . . . . . . 11--19
Michael Beeler Beyond the Baskett benchmark . . . . . . 20--31
Edward A. Feustel Process exchange on the PR1ME family of
computers . . . . . . . . . . . . . . . 32--43
P. M. Fenwick Addressing operations for automatic data
structure accessing . . . . . . . . . . 44--57
C. K. Yuen Some applications of the implicit
register reference . . . . . . . . . . . 58--63
Krishna M. Kavi and
K. Krishnamohan Architecture quality . . . . . . . . . . 64--72
Dharma P. Agrawal and
Winser E. Alexander B-HIVE: a heterogeneous, interconnected,
versatile and expandable multicomputer
system . . . . . . . . . . . . . . . . . 7--13
F. J. Burkowski A vector and array multiprocessor
extension of the sylvan architecture . . 4--11
Alejandro Kapauan and
J. Timothy Field and
Dennis B. Gannon and
Lawrence Snyder The Pringle parallel computer . . . . . 12--20
Mehrad Yasrebi and
G. J. Lipovski A state-of-the-art SIMD two-dimensional
FFT array processor . . . . . . . . . . 21--27
Y. W. Ma and
R. Krishnamurti The architecture of Replica: a
special-purpose computer system for
active multi-sensory perception of
3-dimensional objects . . . . . . . . . 30--37
Samuel M. Goldwasser A generalized object display processor
architecture . . . . . . . . . . . . . . 38--47
Katsura Kawakami and
Shigeo Shimazaki A special purpose LSI processor using
the DDA algorithm for image
transformation . . . . . . . . . . . . . 48--54
Benjamin W. Wah and
Guo-Jie Li and
Chee-Fen Yu The status of MANIP --- a multicomputer
architecture for solving combinatorial
extremum-search problems . . . . . . . . 56--63
R. Gonzalez-Rubio and
J. Rohmer and
D. Terral The SCHUSS filter: a processor for
non-numerical data processing . . . . . 64--73
Carl Ebeling and
Andrew Palay The design and implementation of a VLSI
chess move generator . . . . . . . . . . 74--80
Manjai Lee and
Chuan-lin Wu Performance analysis of circuit
switching, baseline interconnection
networks . . . . . . . . . . . . . . . . 82--90
Clyde P. Kruskal and
Marc Snir The importance of being square . . . . . 91--98
Chi-Yuan Chin and
Kai Hwang Connection principles for multipath,
packet switching networks . . . . . . . 99--108
Shlomo Weiss and
James E. Smith Instruction issue logic for pipelined
supercomputers . . . . . . . . . . . . . 110--118
Robert G. Wedig and
Marc A. Rose The reduction of branch instruction
execution overhead using structured
control flow . . . . . . . . . . . . . . 119--125
Utpal Banerjee and
Daniel D. Gajski Fast execution of loops with if
statements . . . . . . . . . . . . . . . 126--132
Daniel Gajski and
Won Kim and
Shinya Fushimi A parallel pipelined relational query
processor: an architectural overview . . 134--141
Arun K. Somani and
Vinod K. Agarwal An efficient VLSI dictionary machine . . 142--150
Allan L. Fisher Dictionary machines with a small number
of processors . . . . . . . . . . . . . 151--156
Mark D. Hill and
Alan Jay Smith Experimental evaluation of on-chip
microprocessor cache memories . . . . . 158--166
James R. Goodman and
Men-chow Chiang The use of static column RAM as a memory
hierarchy . . . . . . . . . . . . . . . 167--173
I. J. Haikala Cache hit ratios with geometric task
switch intervals . . . . . . . . . . . . 175--175
Yutaka Ishikawa and
Mario Tokoro The design of an object oriented
architecture . . . . . . . . . . . . . . 178--187
David Ungar and
Ricki Blau and
Peter Foley and
Dain Samples and
David Patterson Architecture of SOAR: Smalltalk on a
RISC . . . . . . . . . . . . . . . . . . 188--197
Pradip Bose and
Edward S. Davidson Design of instruction set architectures
for support of high-level languages . . 198--206
Patrice Quinton Automatic synthesis of systolic arrays
from uniform recurrent equations . . . . 208--214
Chang nian Zhang and
David Y. Y. Yun Multi-dimensional systolic networks for
Discrete Fourier Transform . . . . . . . 215--222
J. A. B. Fortes and
D. I. Moldovan Data broadcasting in linearly scheduled
array processors . . . . . . . . . . . . 224--231
I. V. Ramakrishnan and
P. J. Varman Modular matrix multiplication on a
linear array . . . . . . . . . . . . . . 232--238
T. R. N. Rao Joint encryption and error correction
schemes . . . . . . . . . . . . . . . . 240--241
Bella Bose Unidirectional error
correction/detection for VLSI memory . . 242--244
C. L. Chen Error-correcting codes for semiconductor
memories . . . . . . . . . . . . . . . . 245--247
Khaled Abdel Ghaffar and
Robert J. McEliece Soft error correction for increased
densities in VLSI memories . . . . . . . 248--250
Richard M. King and
Robert A. Wagner Combining speed with alpha-particle
induced memory error tolerance in a
large Boolean vector machine . . . . . . 251--253
Laxmi N. Bhuyan On the performance of loosely coupled
multiprocessors . . . . . . . . . . . . 256--262
Ravi Mehrotra and
Sarosh N. Talukdar Scheduling of tasks for distributed
processors . . . . . . . . . . . . . . . 263--270
Krishna M. Kavi and
Edward W. Banios and
Bruce D. Shriver Message repository definitional
facility: an architectural model for
interprocess communication . . . . . . . 271--278
Prithviraj Banerjee and
Jacob A. Abraham Fault-secure algorithms for
multiple-processor systems . . . . . . . 279--287
Lubomir Bic Execution of logic programs on a
dataflow architecture . . . . . . . . . 290--296
W. G. Rudd and
Duncan A. Buell and
Donald M. Chiarulli A high performance factoring machine . . 297--300
Joel S. Emer and
Douglas W. Clark A characterization of processor
performance in the VAX-11/780 . . . . . 301--310
W. D. Moeller and
G. Sandweg The peripheral processor PP4, a highly
regular VLSI processor . . . . . . . . . 312--318
Lars Philipson VLSI based design principles for MIMD
multiprocessor computers with
distributed memory management . . . . . 319--327
M. R. Samatham and
D. K. Pradhan A multiprocessor network suitable for
single-chip VLSI implementation . . . . 328--339
Larry Rudolph and
Zary Segall Dynamic decentralized cache schemes for
MIMD parallel processors . . . . . . . . 340--347
Mark S. Papamarcos and
Janak H. Patel A low-overhead coherence solution for
multiprocessors with private cache
memories . . . . . . . . . . . . . . . . 348--354
James Archibald and
Jean Loup Baer An economical solution to the cache
coherence problem . . . . . . . . . . . 355--362
Ilkka J. Haikala Cache hit ratios with geometric task
switch intervals . . . . . . . . . . . . 364--371
Gilman D. Chesley A wafer microcomputer . . . . . . . . . 4--6
Howard Jay Siegel and
Thomas Schwederski and
Nathaniel J. Davis IV and
James T. Kuehn PASM: a reconfigurable parallel system
for image processing . . . . . . . . . . 7--19
Javaid Aslam Methodology for designing a computer
architecture . . . . . . . . . . . . . . 4--11
Peter C. J. Graham Providing architectural support for
expert systems . . . . . . . . . . . . . 12--18
Jack J. Dongarra Performance of various computers using
standard linear equations software in a
Fortran environment . . . . . . . . . . 3--11
T. M. Hor and
C. K. Yuen The design and programming of a powerful
short wordlength processor using
context-dependent machine instructions 12--26
E. N. Miya Multiprocessor/distributed processing
bibliography (in machine-readable form) 27--29
Weiming Hu Dataflow architecture for EEG patient
monitor . . . . . . . . . . . . . . . . 3--10
A. G. Tagg Speculations on the evolution of an
architecture . . . . . . . . . . . . . . 11--18
Brian Randell Hardware/software tradeoffs: a general
design principle? . . . . . . . . . . . 19--21
V. K. Prasanna Kumar and
C. S. Raghavendra Array processor with multiple
broadcasting . . . . . . . . . . . . . . 2--10
G. Wolf and
J. R. Jump Matrix multiplication in an interleaved
array processing architecture . . . . . 11--17
J. R. Goodman and
Jian-tu Hsieh and
Koujuch Liou and
Andrew R. Pleszkun and
P. B. Schechter and
Honesty C. Young PIPE: a VLSI decoupled architecture . . 20--27
Peter Y. T. Hsu and
Joseph T. Rahmeh and
Edward S. Davidson and
Jacob A. Abraham TIDBITS: speedup via time-delay
bit-slicing in ALU design for VLSI
technology . . . . . . . . . . . . . . . 29--35
James E. Smith and
Andrew R. Pleszkun Implementation of precise interrupts in
pipelined processors . . . . . . . . . . 36--44
Herb Schwetman and
Daniel Gajski and
Dennis Gannon and
Daniel Hills and
Jacob Schwartz and
James Browne Classification of parallel processor
architectures (invited tutorial session) 45--45
Makoto Hasegawa and
Yoshiharu Shigei High-speed top-of-stack scheme for VLSI
processor: a management algorithm and
its analysis . . . . . . . . . . . . . . 48--54
Charles Y. Hitchcock III and
H. M. Brinkley Sprunt Analyzing multiple register sets . . . . 55--63
Alan Jay Smith Cache evaluation and the impact of
workload choice . . . . . . . . . . . . 64--73
David A. Moon Architecture of the Symbolics 3600 . . . 76--83
Ashwin Ram and
Janak H. Patel Parallel garbage collection without
synchronization overhead . . . . . . . . 84--90
Gurindar S. Sohi and
Edward S. Davidson and
Janak H. Patel An efficient LISP-execution architecture
with a new representation for list
structures . . . . . . . . . . . . . . . 91--98
Hideharu Amano and
Taisuke Boku and
Tomohiro Kudoh and
Hideo Aiso (SM)2-II: a new version of the sparse
matrix solving machine . . . . . . . . . 100--107
John Beetem and
Monty Denneau and
Don Weingarten The GF11 supercomputer . . . . . . . . . 108--115
Bradley Warren Smith and
Howard Jay Siegel Models for use in the design of
macro-pipelined parallel processors . . 116--123
Jan Edler and
Allan Gottlieb and
Clyde P. Kruskal and
Kevin P. McAuliffe and
Larry Rudolph and
Marc Snir and
Patricia J. Teller and
James Wilson Issues related to MIMD shared-memory
computers: the NYU Ultracomputer
approach . . . . . . . . . . . . . . . . 126--135
R. N. Ibbett and
P. C. Capon and
N. P. Topham MU6V: a parallel vector processing
system . . . . . . . . . . . . . . . . . 136--144
Stephen F. Lundstrom A decentralized control, highly
concurrent multiprocessor . . . . . . . 145--151
William J. Dally and
James T. Kajiya An object oriented architecture . . . . 154--161
Edward F. Gehringer and
J. Leslie Keedy Tagged architecture: how compelling are
its advantages? . . . . . . . . . . . . 162--170
S. Nanba and
N. Ohno and
H. Kubo and
H. Morisue and
T. Ohshima and
H. Yamagishi VM/4: ACOS-4 virtual machine
architecture . . . . . . . . . . . . . . 171--178
T. P. Dobry and
A. M. Despain and
Y. N. Patt Performance studies of a Prolog machine
architecture . . . . . . . . . . . . . . 180--190
Ryosei Nakazaki and
Akihiko Konagaya and
Shin'ichi Habata and
Hideo Shimazu and
Mamoru Umemura and
Masahiro Yamamoto and
Minoru Yokota and
Takashi Chikayama Design of a high-speed Prolog machine
(HPM) . . . . . . . . . . . . . . . . . 191--197
Nam Sung Woo A hardware unification unit: design and
analysis . . . . . . . . . . . . . . . . 198--205
Nicholas Matelan The FLEX/32 multicomputer . . . . . . . 209--213
J. Rattner Commercial multiprocessors (title only) 214--214
Dick Naedel Closely coupled asynchronous
hierarchical and parallel processing in
an open architecture . . . . . . . . . . 215--220
Jim Savage Parallel processing as a language design
problem . . . . . . . . . . . . . . . . 221--224
David P. Rodgers Improvements in multiprocessor system
design . . . . . . . . . . . . . . . . . 225--231
Peter B. Mark The Sequoia computer: a fault-tolerant
tightly-coupled multiprocessor
architecture . . . . . . . . . . . . . . 232--232
Elliot Nestle and
Armond Inselberg The SYNAPSE N+1 System: architectural
characteristics and performance data of
a tightly-coupled multiprocessor system 233--239
Robert W. Horst and
Timothy C. K. Chou An architecture for high volume
transaction processing . . . . . . . . . 240--245
Harold Stone and
Eric Manning and
Harriet Rigas and
Philip Treleaven The fifth generation computer systems
projects (invited session) . . . . . . . 247--247
Shigeo Kamiya and
Susumu Matsuda and
Kazuhide Iwata and
Shigeki Shibayama and
Hiroshi Sakai and
Kunio Murakami A hardware pipeline algorithm for
relational database operation . . . . . 250--257
Dik Lun Lee A distributed multiple-response resolver
for value-order retrieval . . . . . . . 258--265
John Feo and
Roy Jenevein and
J. C. Browne Dynamic, distributed resource
configuration on SW-banyans . . . . . . 268--275
R. H. Katz and
S. J. Eggers and
D. A. Wood and
C. L. Perkins and
R. G. Sheldon Implementing a cache consistency
protocol . . . . . . . . . . . . . . . . 276--283
Zhiyuan Li and
Walid Abu-Sufah A technique for reducing synchronization
overhead in large scale multiprocessors 284--291
Colin Whitby-Strevens The transputer . . . . . . . . . . . . . 292--300
A. R. Hurson and
B. Shirazi A systolic multiplier unit and its VLSI
design . . . . . . . . . . . . . . . . . 302--309
Rami Melhem A language for the simulation of
systolic architectures . . . . . . . . . 310--314
Henry Y. H. Chuang and
Guo He A versatile systolic array for matrix
computations . . . . . . . . . . . . . . 315--322
Rex Vedder and
Dennis Finn The Hughes Data Flow Multiprocessor:
architecture for efficient signal and
data processing . . . . . . . . . . . . 324--332
Kenneth R. Traub An abstract parallel graph reduction
machine . . . . . . . . . . . . . . . . 333--341
Bruno R. Preiss and
V. C. Hamacher Data flow on a queue machine . . . . . . 342--351
J. L. Gaudiot Methods for handling structures in
data-flow systems . . . . . . . . . . . 352--358
M. R. Samatham and
D. K. Pradhan The de Bruijn multiprocessor network: a
versatile sorting network . . . . . . . 360--367
Nian-Feng Tzeng and
Pen-Chung Yew and
Chun-Qi Zhu A fault-tolerant scheme for multistage
interconnection networks . . . . . . . . 368--375
V. P. Kumar and
S. M. Reddy Design and analysis of fault-tolerant
multistage interconnection networks with
low link complexity . . . . . . . . . . 376--386
Nathaniel J. Davis IV and
Howard Jay Siegel The performance analysis of partitioned
circuit switched multistage
interconnection networks . . . . . . . . 387--394
Dalibor Vrsalovic and
Edward F. Gehringer and
Zary Z. Segall and
Daniel P. Siewiorek The influence of parallel decomposition
strategies on the performance of
multiprocessor systems . . . . . . . . . 396--405
Walid Abu-Sufah and
Alex Y. Kwok Performance prediction tools for Cedar:
a multiprocessor supercomputer . . . . . 406--413
José M. Llabería Griñó and
Mateo Valero Cortés and
Enrique Herrada Lillo and
Jesús Labarta Mancho Analysis and simulation of multiplexed
single-bus networks with and without
buffering . . . . . . . . . . . . . . . 414--421
J. Sanguinetti and
B. Kumar Performance of a message-based
multiprocessor . . . . . . . . . . . . . 424--425
J.-Fr. Hake PDOC --- a database on parallel
processing literature . . . . . . . . . 2--7
Mark Rockey The dataflow architecture: a suitable
base for the implementation of expert
systems . . . . . . . . . . . . . . . . 8--14
Harvey G. Cragon An architecture design system . . . . . 15--21
Miquel Huguet and
Tomás Lang A reduced register file for RISC
architectures . . . . . . . . . . . . . 22--31
Cedell A. Alexander and
William M. Keshlear and
Faye Briggs Translation buffer performance in a UNIX
environment . . . . . . . . . . . . . . 2--14
Rosanna Lee On ``hot spot'' contention . . . . . . . 15--20
Nam Sung Woo and
Richard O'Keefe A comment on ``A hardware unification
unit: design and analysis'' . . . . . . 2--3
A. B. Ruighaver Design aspects of the Delft Parallel
Processor DPP84 and its programming
system . . . . . . . . . . . . . . . . . 4--8
Dan Hammerstrom and
David Maier and
Shreekant Thakkar The Cognitive Architecture Project . . . 9--21
Alan Jay Smith Bibliography and reading on CPU cache
memories and related topics . . . . . . 22--42
H. Yokota and
H. Itoh A model and an architecture for a
relational knowledge base . . . . . . . 2--9
M. Amamiya and
M. Takesue and
R. Hasegawa and
H. Mikami Implementation and evaluation of a
list-processing-oriented data flow
machine . . . . . . . . . . . . . . . . 10--19
K. Takahashi and
H. Yamada and
H. Nagai and
K. Matsumi A new string search hardware
architecture for VLSI . . . . . . . . . 20--27
A. Gupta and
C. Forgy and
A. Newell and
R. Wedig Parallel algorithms and architectures
for rule-based systems . . . . . . . . . 28--37
R. R. Halstead, Jr. and
T. L. Anderson and
R. B. Osborne and
T. L. Sterling Concert: design of a multiprocessor
development system . . . . . . . . . . . 40--48
H. T. Kung Memory requirements for balanced
computer architectures . . . . . . . . . 49--54
Y. C. Hong and
T. H. Payne and
L. B. O. Ferguson Graph allocation in static dataflow
systems . . . . . . . . . . . . . . . . 55--64
P. Agrawal and
R. Agrawal Software implementation of a recursive
fault tolerance algorithm on a network
of computers . . . . . . . . . . . . . . 65--72
T. Nojiri and
S. Kawasaki and
K. Sakoda Microprogrammable processor for
object-oriented architecture . . . . . . 74--81
S. S. Thakkar and
W. E. Hostmann An instruction fetch unit for a graph
reduction machine . . . . . . . . . . . 82--91
E. F. Gehringer and
R. P. Colwell Fast object-oriented procedure calls:
lessons from the Intel 432 . . . . . . . 92--101
D. M. Dias and
B. R. Iyer and
P. S. Yu On coupling many small systems for
transaction processing . . . . . . . . . 104--110
M. I. Malkawi and
J. H. Patel Performance measurement of paging
behavior in multiprogramming systems . . 111--118
A. Agarwal and
R. L. Sites and
M. Horowitz ATUM: a new technique for capturing
address traces using microcode . . . . . 119--127
M. J. Wise Experimenting with EPILOG: some results
and preliminary conclusions . . . . . . 119--127
Y. Shobatake and
H. Aiso A unification processor based on a
uniformly structured cellular hardware 128--139
N. Ito and
M. Sato and
E. Kuno and
K. Rokusawa The architecture and preliminary
evaluation results of the experimental
parallel inference machine PIM-D . . . . 149--156
A. Seznec An efficient routing control for the
SIGMA network $ \Sigma (4) $ . . . . . . 158--168
J. D. Nicoud and
K. Skala REYSM, a high performance, low power
multi-processor bus . . . . . . . . . . 169--174
K. Y. Lee and
W. Hegazy The extra stage gamma network . . . . . 175--182
M. Yuhara and
A. Hattori and
M. Niwa and
M. Kishimoto and
H. Hayashi Evaluation of the FACOM ALPHA Lisp
machine . . . . . . . . . . . . . . . . 184--190
A. R. Pleszkun and
M. J. Thazhuthaveetil An architecture for efficient Lisp list
access . . . . . . . . . . . . . . . . . 191--198
T. Nakata and
N. Koike A functional level simulation engine of
MAN-YO: a special purpose parallel
machine for logic design automation . . 202--208
E. H. Frank Exploiting parallelism in a switch-level
simulation machine . . . . . . . . . . . 209--215
T. S. Anantharaman and
R. Bisiani A hardware accelerator for speech
recognition algorithms . . . . . . . . . 216--223
T. Shimada and
K. Hiraki and
K. Nishida and
S. Sekiguchi Evaluation of a prototype data flow
processor of the SIGMA-1 for scientific
computations . . . . . . . . . . . . . . 226--234
J. Sargeant and
C. C. Kirkham Stored data structures on the Manchester
dataflow machine . . . . . . . . . . . . 235--242
K. Kawakami and
J. R. Gurd A scalable dataflow structure store . . 243--250
M. Hasegawa and
Y. Shigei $ A T^2 = O(N \log^4 N), T = O(\log N) $
Fast Fourier Transform in a light
connected $3$-dimensional VLSI . . . . . 252--260
K. Sapiecha and
R. Jarocki Modular architecture for high
performance implementation of FFT
algorithm . . . . . . . . . . . . . . . 261--270
J. J. Navarro and
J. M. Llaberia and
M. Valero Computing size-independent matrix
problems on systolic array processors 271--278
S. Tomita and
K. Shibayama and
T. Nakata and
S. Yuasa and
H. Hagiwara A computer with low-level parallelism
QA-2: its applications to $3$-D graphics
and Prolog/Lisp machines . . . . . . . . 280--289
M. Hirayama VLSI oriented asynchronous architecture 290--296
W. Hwu and
Y. N. Patt HPSm, a high performance restricted data
flow architecture having minimal
functionality . . . . . . . . . . . . . 297--306
K. Onaga and
T. Takechi On design of rotary array communication
and wavefront-driven algorithms for
solving large-scale band-limited matrix
equations . . . . . . . . . . . . . . . 308--315
L. M. Napolitano, Jr. A computer architecture for dynamic
finite element analysis . . . . . . . . 316--323
D. T. Harper III and
J. R. Jump Performance evaluation of vector
accesses in parallel memories using a
skewed storage scheme . . . . . . . . . 324--328
T. Kondo and
T. Tsuchiya and
T. Kitamura and
Y. Sugiyama and
T. Kimura Pseudo MIMD array processor---AAP2 . . . 330--337
A. L. Fisher Scan line array processors for image
computation . . . . . . . . . . . . . . 338--345
M. Annaratone and
E. Arnould and
T. Gross and
H. T. Kung and
M. S. Lam Warp architecture and implementation . . 346--356
D. A. Wood and
S. J. Eggers and
G. Gibson and
M. D. Hill and
J. M. Pendleton An in-cache address translation
mechanism . . . . . . . . . . . . . . . 358--365
D. R. Cheriton and
G. A. Slavenburg and
P. D. Boyle Software-controlled caches in the VMP
multiprocessor . . . . . . . . . . . . . 366--374
J. R. Goodman and
W. C. Hsu On the use of registers vs. cache to
minimize memory traffic . . . . . . . . 375--383
P. Y. T. Hsu and
E. S. Davidson Highly concurrent scalar processing . . 386--395
S. McFarling and
J. Hennessy Reducing the cost of branches . . . . . 396--403
S. R. Kunkel and
J. E. Smith Optimal pipelining in supercomputers . . 404--411
P. Sweazey and
A. J. Smith A class of compatible cache consistency
protocols and their support by the IEEE
Futurebus . . . . . . . . . . . . . . . 414--423
P. Bitar and
A. M. Despain Multiprocessor cache synchronization:
issues, innovations, evolution . . . . . 424--433
M. Dubois and
C. Scheurich and
F. Briggs Memory access buffering in
multiprocessors . . . . . . . . . . . . 434--442
G. S. Taylor and
P. N. Hilfinger and
J. R. Larus and
D. A. Patterson and
B. G. Zorn Evaluation of the SPUR Lisp architecture 444--452
Nam Sung Woo A reply to comments ``A Comment on 'A
Hardware Unification Unit: Design and
Analysis''\,' . . . . . . . . . . . . . 2--4
D. K. DuBose and
D. K. Fotakis and
D. Tabak A microcoded RISC . . . . . . . . . . . 5--16
Tomás Lang and
Miquel Huguet Reduced register saving/restoring in
single-window register files . . . . . . 17--26
Larry O'Neal Rouse The twisted double helix: a minimum
distance architecture for 5th generation
computing . . . . . . . . . . . . . . . 27--33
David M. Harland A recursively microcodable tagged
architecture . . . . . . . . . . . . . . 34--40
Cedell Alexander and
William Keshlear and
Furrokh Cooper and
Faye Briggs Cache memory performance in a Unix
environment . . . . . . . . . . . . . . 41--61
Roger Stokes Traces for hardware verification . . . . 7--14
Claudio Kirner and
Eduardo Marques Design of a distributed system support
based on a centralized parallel bus . . 15--26
Mary Jane Irwin Secretary/Treasurer's Report . . . . . . 28--28
David M. Harland and
Bruno Beloff Microcoding an object-oriented
instruction set . . . . . . . . . . . . 3--12
William Stallings An annotated bibliography on reduced
instruction set computers . . . . . . . 13--19
Robert H. Halstead, Jr. Overview of Concert MultiLisp: a
multiprocessor symbolic computing system 5--14
Dave Patterson A progress report on SPUR: February 1,
1987 . . . . . . . . . . . . . . . . . . 15--21
A. Despain and
Y. Patt and
V. Srini and
P. Bitar and
W. Bush and
C. Chien and
W. Citrin and
B. Fagin and
W. Hwu and
S. Melvin and
R. McGeer and
A. Singhal and
M. Shebanow and
P. Van Roy Aquarius . . . . . . . . . . . . . . . . 22--34
Madhur Kohli and
Mark E. Giuliano and
Jack Minker An overview of the PRISM project . . . . 35--42
M. V. Hermenegildo and
R. A. Warren Designing a high performance parallel
logic programming system . . . . . . . . 43--52
Jonathan W. Mills Coming to grips with a RISC: a report of
the progress of the LOW RISC design
group . . . . . . . . . . . . . . . . . 53--62
Brian Short Use of instruction set simulators to
evaluate the LOW RISC . . . . . . . . . 63--67
Kurt M. Gutzmann Optimal dimension of hypercubes for
sorting . . . . . . . . . . . . . . . . 68--72
Gilman Chesley Addressable WSI: a non-redundant
approach . . . . . . . . . . . . . . . . 73--80
Nripendra N. Biswas and
S. Srinivas and
Trishala Dharanendra A centrally controlled shuffle network
for reconfigurable and fault-tolerant
architecture . . . . . . . . . . . . . . 81--87
D. R. Ditzel and
H. R. McLellan Branch folding in the CRISP
microprocessor: reducing branch delay to
zero . . . . . . . . . . . . . . . . . . 2--8
J. A. DeRosa and
H. M. Levy An evaluation of branch architectures 10--16
W. W. Hwu and
Y. N. Patt Checkpoint repair for out-of-order
execution machines . . . . . . . . . . . 18--26
G. S. Sohi and
S. Vajapeyam Instruction issue logic for
high-performance, interruptible
pipelined processors . . . . . . . . . . 27--34
J. Swensen and
Y. Patt Fast temporary storage for serial and
parallel execution . . . . . . . . . . . 35--43
K. Wong and
M. A. Franklin Performance analysis and design of a
logic simulation machine . . . . . . . . 46--55
K. Doshi and
P. Varman A modular systolic architecture for
image convolutions . . . . . . . . . . . 56--63
S. Fujita and
R. Aibara and
M. Yamashita and
T. Ae A template matching algorithm using
optically-connected $3$-D VLSI
architecture . . . . . . . . . . . . . . 64--70
B. Mendelson and
G. M. Silberman Mapping data flow programs on a VLSI
array of processors . . . . . . . . . . 72--80
D. Ghosal and
L. N. Bhuyan Analytical modeling and architectural
modifications of a dataflow computer . . 81--89
M. Takesue A unified resource management and
execution control mechanism for data
flow machines . . . . . . . . . . . . . 90--97
S. Abe and
T. Bandoh and
S. Yamaguchi and
K. Kurosawa and
K. Kiriyama High performance integrated Prolog
processor IPP . . . . . . . . . . . . . 100--107
B. S. Fagin and
A. M. Despain Performance studies of a parallel Prolog
architecture . . . . . . . . . . . . . . 108--116
P. L. Civera and
F. Maddaleno and
G. L. Piccinini and
M. Zamboni An experimental VLSI Prolog interpreter:
preliminary measurements and results . . 117--126
O. Ridoux Deterministic and stochastic modeling of
parallel garbage collection: towards
real-time criteria . . . . . . . . . . . 128--136
C. Sun and
Y. Tsu The sharing of environment in
AND--OR-parallel execution of logic
programs . . . . . . . . . . . . . . . . 137--144
A. Guha and
R. Ramnarayan and
M. Derstine Architectural issues in designing
symbolic processors in optics . . . . . 145--151
A. Varma and
C. S. Raghavendra Rearrangeability of multistage
shuffle/exchange networks . . . . . . . 154--162
R. Beivide and
E. Herrada and
J. L. Balcazar and
J. Labarta Optimized mesh-connected networks for
SIMD and MIMD architectures . . . . . . 163--170
D. T. Harper III and
J. R. Jump Performance evaluation of reduced
bandwidth multistage interconnection
networks . . . . . . . . . . . . . . . . 171--175
U. Ramachandran and
M. Solomon and
M. Vernon Hardware support for interprocess
communication . . . . . . . . . . . . . 178--188
W. J. Dally and
L. Chao and
A. Chien and
S. Hassoun and
W. Horwat and
J. Kaplan and
P. Song and
B. Totty and
S. Wills Architecture of a message-driven
processor . . . . . . . . . . . . . . . 189--196
M. Kumar Effect of storage allocation/reclamation
methods on parallelism and storage
requirements . . . . . . . . . . . . . . 197--205
J. H. Chang and
H. Chao and
K. So Cache design of a sub-micron CMOS
System/370 . . . . . . . . . . . . . . . 208--213
M. Freeman An architectural perspective on a memory
access controller . . . . . . . . . . . 214--223
K. Cheung and
G. Sohi and
K. Saluja and
D. Pradhan Organization and analysis of a
gracefully-degrading interleaved memory
system . . . . . . . . . . . . . . . . . 224--231
C. Scheurich and
M. Dubois Correct memory operation of cache-based
multiprocessors . . . . . . . . . . . . 234--243
A. W. Wilson, Jr. Hierarchical cache/bus architecture for
shared memory multiprocessors . . . . . 244--252
R. L. Lee and
P. C. Yew and
D. H. Lawrie Multiprocessor cache design
considerations . . . . . . . . . . . . . 253--262
R. J. Eickemeyer and
J. H. Patel Performance evaluation of multiple
register sets . . . . . . . . . . . . . 264--271
T. J. Stanley and
R. G. Wedig A performance analysis of automatically
managed top of stack buffers . . . . . . 272--281
B. Moore and
A. Padegs and
R. Smith and
W. Buchholz Concepts of the System/370 vector
architecture . . . . . . . . . . . . . . 282--288
A. R. Pleszkun and
J. R. Goodman and
W. C. Hsu and
R. T. Joersz and
G. Bier and
P. Woest and
P. B. Schechter WISQ: a restartable architecture using
queues . . . . . . . . . . . . . . . . . 290--299
P. Chow and
M. Horowitz Architectural tradeoffs in the design of
MIPS-X . . . . . . . . . . . . . . . . . 300--308
D. R. Ditzel and
H. R. McLellan and
A. D. Berenbaum The hardware architecture of the CRISP
microprocessor . . . . . . . . . . . . . 309--319
Matthew Moore and
Charles McDowell Bi-directional networks for large
parallel processors . . . . . . . . . . 3--4
Ian Kaplan The LDF 100: a large grain dataflow
parallel processor . . . . . . . . . . . 5--12
Stanley Lass Wide channel computers . . . . . . . . . 13--16
Reinder J. Bril An implementation independent approach
to cache memories . . . . . . . . . . . 17--24
Reinder J. Bril On cacheability of lock-variables in
tightly coupled multiprocessor systems 25--32
J. K. Iliffe A forward-looking method of Cache memory
control . . . . . . . . . . . . . . . . 4--10
Amitava Bandyopadhyay and
Yuan F. Zheng Combining both microcode and hardwired
control in RISC . . . . . . . . . . . . 11--15
Martin Dowd An example RISC vector machine
architecture . . . . . . . . . . . . . . 16--22
Sanjiv K. Bhatia and
A. G. Starling Multilayered Illiac network scheme . . . 23--31
Lothar Nowak SAMP: a general purpose processor based
on a self-timed VLIW structure . . . . . 32--39
Peter J. Ashenden and
Chris J. Barter and
Chris D. Marlin The Leopard workstation project . . . . 40--51
Y. P. Chiang and
M. L. Manwaring Direct execution Lisp and cell memory 52--57
J. M. Terry Flow-control machines: the structured
execution architecture (SXA) . . . . . . 58--69
Niklaus Wirth Hardware architectures for programming
languages and programming languages for
hardware architectures . . . . . . . . . 2--8
Bob Beck and
Bob Kasten and
Shreekant Thakkar VLSI assist for a multiprocessor . . . . 10--20
Roberto Bisiani and
Alessandro Forin Architectural support for multilanguage
parallel programming on heterogeneous
systems . . . . . . . . . . . . . . . . 21--30
Richard Rashid and
Avadis Tevanian and
Michael Young and
David Golub and
Robert Baron Machine-independent virtual memory
management for paged uniprocessor and
multiprocessor architectures . . . . . . 31--39
John R. Hayes and
Martin E. Fraeman and
Robert L. Williams and
Thomas Zaremba An architecture for the direct execution
of the Forth programming language . . . 42--49
Peter Steenkiste and
John Hennessy Tags and type checking in LISP: hardware
and software approaches . . . . . . . . 50--59
Jack W. Davidson and
Richard A. Vaughan The effect of instruction set complexity
on program size and memory performance 60--64
Russell R. Atkinson and
Edward M. McCreight The dragon processor . . . . . . . . . . 65--69
James R. Goodman Coherency for multiprocessor virtual
address caches . . . . . . . . . . . . . 72--81
T. A. Cargill and
B. N. Locanthi Cheap hardware support for software
debugging and profiling . . . . . . . . 82--83
C. J. Georgiou and
S. L. Palmer and
P. L. Rosenfeld An experimental coprocessor for
implementing persistent objects on an
IBM 4381 . . . . . . . . . . . . . . . . 84--87
Daniel J. Magenheimer and
Liz Peters and
Karl Pettis and
Dan Zuras Integer multiplication and division on
the HP precision architecture . . . . . 90--99
David W. Wall and
Michael L. Powell The Mahler experience: using an
intermediate language as the machine
description . . . . . . . . . . . . . . 100--104
Shlomo Weiss and
James E. Smith A study of scalar compilation techniques
for pipelined supercomputers . . . . . . 105--109
William R. Bush and
A. Dain Samples and
David Ungar and
Paul N. Hilfinger Compiling Smalltalk-80 to a RISC . . . . 112--116
F. Chow and
S. Correll and
M. Himelstein and
E. Killian and
L. Weber How many addressing modes are enough? 117--121
Henry Massalin Superoptimizer: a look at the smallest
program . . . . . . . . . . . . . . . . 122--126
Kazuo Taki and
Katsuto Nakajima and
Hiroshi Nakashima and
Morihiro Ikeda Performance and architectural evaluation
of the PSI machine . . . . . . . . . . . 128--135
Gaetano Borriello and
Andrew R. Cherenson and
Peter B. Danzig and
Michael N. Nelson RISCs vs. CISCs for Prolog: a case study 136--145
Richard B. Kieburtz A RISC architecture for symbolic
computation . . . . . . . . . . . . . . 146--155
David R. Ditzel and
Hubert R. McLellan and
Alan D. Berenbaum Design tradeoffs to support the C
programming language in the CRISP
microprocessor . . . . . . . . . . . . . 158--163
Charles P. Thacker and
Lawrence C. Stewart Firefly: a multiprocessor workstation 164--172
Douglas W. Clark Pipelining and performance in the VAX
8800 processor . . . . . . . . . . . . . 173--177
Robert P. Colwell and
Robert P. Nix and
John J. O'Donnell and
David B. Papworth and
Paul K. Rodman A VLIW architecture for a trace
scheduling compiler . . . . . . . . . . 180--192
Adam Levinthal and
Pat Hanrahan and
Mike Paquette and
Jim Lawson Parallel computers for graphics
applications . . . . . . . . . . . . . . 193--198
J. E. Smith and
G. E. Dermer and
B. D. Vanderwarn and
S. D. Klinger and
C. M. Rozewski The ZS-1 central processor . . . . . . . 199--204
E. E. E. Frietman and
A. B. Ruighaver An electro-optic data communication
system for the Delft parallel processor 2--8
G. B. Shippen and
J. K. Archibald A tagged token dataflow machine for
computing small, iterative algorithms 9--18
Clif Penn Preface to the Special issue on Neural
Networks . . . . . . . . . . . . . . . . 6--6
Richard P. Lippmann An introduction to computing with neural
nets . . . . . . . . . . . . . . . . . . 7--25
James A. Anderson and
Edward J. Wisniewski and
Susan R. Viscuso Software for neural networks . . . . . . 26--36
Simon Garth and
Danny Pike An integrated system for neural network
simulations . . . . . . . . . . . . . . 37--44
A. Jean Maren Conference report: IEEE First
International Conference on Neural
Networks . . . . . . . . . . . . . . . . 45--46
Jack J. Dongarra Performance of various computers using
standard linear equations software in a
FORTRAN environment . . . . . . . . . . 47--69
Wm. A. Wulf The WM computer architecture . . . . . . 70--84
Daniel Tabak Logarithmic indices for multiprocessor
evaluation . . . . . . . . . . . . . . . 85--90
Martin Dowd An example RISC vector machine
architecture . . . . . . . . . . . . . . 91--99
Martin Dowd RISC vector CPU's and crossbars in
desktops . . . . . . . . . . . . . . . . 100--102
Stanley Lass Multiple instructions/operands per
access to cache memory . . . . . . . . . 103--103
Wanda Gass Workshop report: synthesis of foo bars 104--108
F. Joel Ferguson Book Review: {\em Logic Design Principles
by Edward J. McCluskey, Prentice-Hall
Publishers, Englewood Cliffs, New
Jersey, 549 pp., \$39.95} . . . . . . . 109--109
J. Ghosh and
K. Hwang Critical issues in mapping neural
networks on message-passing
multicomputers . . . . . . . . . . . . . 3--11
Y. Takefuji and
R. Jannarone and
Y. B. Cho and
T. Chen Multinomial conjunctoid statistical
learning machines . . . . . . . . . . . 12--17
A. Louri and
K. Hwang A bit-plane architecture for optical
computing with two-dimensional symbolic
substitution . . . . . . . . . . . . . . 18--27
S. Fiske and
W. J. Dally The reconfigurable arithmetic processor 30--36
A. R. Pleszkun and
G. S. Sohi The performance potential of multiple
functional unit processors . . . . . . . 37--44
W. W. Hwu and
P. P. Chang Exploiting parallel microprocessor
microarchitectures with a compiler code
generator . . . . . . . . . . . . . . . 45--53
G. D. McNiven and
E. S. Davidson Analysis of memory referencing behavior
for design of local memories . . . . . . 56--63
R. J. Eickemeyer and
J. H. Patel Performance evaluation of on-chip
register and cache organizations . . . . 64--72
J.-L. Baer and
W.-H. Wang On the inclusion properties for
multi-level cache hierarchies . . . . . 73--80
R. T. Short and
H. M. Levy A simulation study of two-level caches 81--88
E. Chow and
H. Madan and
J. Peterson and
D. Grunwald and
D. Reed Hyperswitch network for the hypercube
computer . . . . . . . . . . . . . . . . 90--99
D. C. Winsor and
T. N. Mudge Analysis of bus hierarchies for
multiprocessors . . . . . . . . . . . . 100--107
S. Wei and
G. Lee Extra group network: a cost-effective
fault-tolerant multistage
interconnection network . . . . . . . . 108--115
H. Jiang and
K. C. Smith A partial-multiple-bus computer
structure with improved cost
effectiveness . . . . . . . . . . . . . 116--122
I. Watson and
V. Woods and
P. Watson and
R. Banach and
M. Greenberg and
J. Sargeant Flagship: a parallel architecture for
declarative programming . . . . . . . . 124--130
R. A. Iannucci Toward a dataflow/von Neumann hybrid
architecture . . . . . . . . . . . . . . 131--140
D. E. Culler and
Arvind Resource requirements of dataflow
programs . . . . . . . . . . . . . . . . 141--150
B. Sprunt and
D. Kirk and
L. Sha Priority-driven, preemptive I/O
controllers for real-time systems . . . 152--159
S. B. Shukla and
D. P. Agrawal A kernel-independent, pipelined
architecture for real-time $2$-D
convolution . . . . . . . . . . . . . . 160--166
W. Liu and
T.-F. Yeh and
W. E. Batchelor and
R. Cavin Exploiting bit level concurrency in
real-time geometric feature extractions 167--174
D. W. Clark and
P. J. Bannon and
J. B. Keller Measuring VAX 8800 performance with a
histogram hardware monitor . . . . . . . 176--185
R. L. Sites and
A. Agarwal Multiprocessor cache analysis using ATUM 186--195
S. Ng and
D. Lang and
R. Selinger Trade-offs between devices and paths in
achieving disk interleaving . . . . . . 196--201
K. Jainandunsing and
E. F. Deprettere Design of a concurrent computer for
solving systems of linear equations . . 204--211
A. Wolfe and
M. Breternitz, Jr. and
C. Stephens and
A. L. Ting and
D. B. Kirk and
R. P. Bianchini, Jr. and
J. P. Shen The white dwarf: a high-performance
application-specific processor . . . . . 212--222
J. L. Gaudiot and
C. M. Lin and
M. Hosseiniyar Solving partial differential equations
in a data-driven multiprocessor
environment . . . . . . . . . . . . . . 223--230
D. Lee Scrambled storage for parallel memory
systems . . . . . . . . . . . . . . . . 232--239
V. Krishnaswamy and
S. Ahuja and
N. Carriero and
D. Gelernter The architecture of a Linda coprocessor 240--249
H. T. Kung Deadlock avoidance for systolic
communication . . . . . . . . . . . . . 252--260
K. So and
V. Zecca Cache performance of vector processors 261--268
M. K. Vernon and
U. Manber Distributed round-robin and first-come
first-serve protocols and their
applications to multiprocessor bus
arbitration . . . . . . . . . . . . . . 269--279
A. Agarwal and
R. Simoni and
J. Hennessy and
M. Horowitz An evaluation of directory schemes for
cache coherence . . . . . . . . . . . . 280--289
S. Przybylski and
M. Horowitz and
J. Hennessy Performance tradeoffs in cache design 290--298
H. Cheong and
A. V. Veidenbaum A cache coherence scheme with fast
selective invalidation . . . . . . . . . 299--307
M. K. Vernon and
E. D. Lazowska and
J. Zahorjan An accurate and efficient performance
analysis technique for multiprocessor
snooping cache-consistency protocols . . 308--315
D. Rau and
J. A. B. Fortes and
H. J. Siegel Destination tag routing techniques based
on a state model for the LADM network 318--324
D. W. Kim and
G. J. Lipovski and
A. Hartmann and
R. Jenevein Regular CC-banyan networks . . . . . . . 325--332
R. M. Jenevein and
T. Mookken Traffic analysis of rectangular
SW-banyan networks . . . . . . . . . . . 333--342
Y. Tamir and
G. L. Frazier High-performance multi-queue buffers for
VLSI communications switches . . . . . . 343--354
B. R. Preiss and
V. C. Hamacher A cache-based message passing scheme for
a shared-bus multiprocessor . . . . . . 358--364
T. Boku and
S. Nomura and
H. Amano IMPULSE: a high performance processing
unit for multiprocessors for scientific
calculation . . . . . . . . . . . . . . 365--372
S. J. Eggers and
R. H. Katz A characterization of sharing in
parallel programs and its application to
coherency protocol evaluation . . . . . 373--382
G. J. Lipovski and
P. Vaughan A fetch-and-op implementation for
parallel computers . . . . . . . . . . . 384--392
A. Seznec and
Y. Jégou Synchronizing processors through memory
requests in a tightly coupled
multiprocessor . . . . . . . . . . . . . 393--400
R. M. Fujimoto and
J.-J. Tsai and
G. Gopalakrishnan Design and performance of special
purpose hardware for time warp . . . . . 401--409
D. R. Cheriton and
A. Gupta and
P. D. Boyle and
H. A. Goosen The VMP multiprocessor: initial
experience, refinements, and performance
evaluation . . . . . . . . . . . . . . . 410--421
J. R. Goodman and
P. J. Woest The Wisconsin multicube: a new
large-scale cache-coherent
multiprocessor . . . . . . . . . . . . . 422--431
E. Tick Data buffer performance for sequential
Prolog architectures . . . . . . . . . . 434--442
R. H. Halstead, Jr. and
T. Fujita MASA: a multithreaded processor
architecture for parallel symbolic
computing . . . . . . . . . . . . . . . 443--451
P. L. Butler and
J. D. Allen, Jr. and
D. W. Bouldin Parallel architecture for OPS5 . . . . . 452--457
David R. Cheriton and
Pat Boyle and
Gert A. Slavenburg Comments on ``Coherency for
multiprocessor virtual address
caches'' by James R. Goodman . . . . . . 3--6
James R. Goodman Reply to David R. Cheriton's, Pat
Boyle's, and Gert A. Slavenburg's
``Comments on 'Coherency for
multiprocessor virtual addressed
caches''' by James R. Goodman . . . . . 7--7
Guy Rabbat and
Borko Furht and
Ron Kibler Three-dimensional computers and
measuring their performance . . . . . . 9--16
M. Castan and
A. Contessa and
E. Cousin and
C. Coustet and
B. Lecussan MaRS: a parallel graph reduction
multiprocessor . . . . . . . . . . . . . 17--24
Alessandro Contessa An approach to fault tolerance and error
recovery in a parallel graph reduction
machine: MaRS---a case study . . . . . . 25--32
Chuck Crawford Evolution of the Harris H-series
computers and speculations on their
future . . . . . . . . . . . . . . . . . 33--39
Philip L. Good Structuring an instruction cache . . . . 40--43
Eric E. Johnson Completing an MIMD multiprocessor
taxonomy . . . . . . . . . . . . . . . . 44--47
Douglas W. Jones The ultimate RISC . . . . . . . . . . . 48--55
Douglas W. Jones A minimal CISC . . . . . . . . . . . . . 56--63
Stanley Lass Shared cache multiprocessing with pack
computers . . . . . . . . . . . . . . . 64--70
Norman P. Jouppi Superscalar vs. superpipelined machines 71--80
Lorne H. Schachter Book review of \em High-Performance
Computer Architecture by Harold S.
Stone. Addison-Wesley 1987 . . . . . . . 81--84
Umakishore Ramachandran Preface to the Special Issue on
Architectural Support for Operating
Systems . . . . . . . . . . . . . . . . 11--11
A. Asthana and
H. V. Jagadish and
J. A. Chandross and
D. Lin and
S. C. Knauer An intelligent memory system . . . . . . 12--20
Monica Beltrametti and
Kenneth Bobey and
John R. Zorbas The control mechanism for the Myrias
parallel computer system . . . . . . . . 21--30
Raphael Finkel and
Debra Hensgen YACKOS on a shared-memory multiprocessor 31--36
Marc F. Pucci and
J. L. Alberi Optimized communication in an extended
remote procedure call model . . . . . . 37--46
Jordi Cortadella and
Teodor Jové Dynamic RAM for on-chip instruction
caches . . . . . . . . . . . . . . . . . 45--50
M. Naderi Modelling and performance evaluation of
multiprocessors organization with shared
memories . . . . . . . . . . . . . . . . 51--74
Edward Gehringer and
Janne Abullarade and
Michael H. Gulyn A survey of commercial parallel
processors . . . . . . . . . . . . . . . 75--107
Mark Lease and
Mac Lively Comparing production system
architectures . . . . . . . . . . . . . 108--116
Ivor Page and
Jeff Niehaus The Flex architecture, a high speed
graphics processor . . . . . . . . . . . 117--129
Kazuaki Murakami and
Akira Fukuda and
Toshinori Sueyoshi and
Shinji Tomita An overview of the Kyushu University
reconfigurable parallel processor . . . 130--137
Ora E. Percus and
J. K. Percus Some results concerning clock-regulated
queues . . . . . . . . . . . . . . . . . 138--144
Fleur Liane Williams Should SCC set condition codes? . . . . 145--149
Gordon B. Steven A novel effective address calculation
mechanism for RISC microprocessors . . . 150--156
Behrooz Parhami From defects to failures: a view of
dependable computing . . . . . . . . . . 157--168
David A. Patterson RISCY patents . . . . . . . . . . . . . 169--191
Helen C. Takacs Book review: \em A VLSI Architecture for
Concurrent Data Structures by William J.
Dally (Kluwer 1988) . . . . . . . . . . 192--193
Robert P. Colwell Book review: \em Computer Architecture
and Organization, 2nd ed. by John P.
Hayes (McGraw Hill, 1988) . . . . . . . 193--195
Charles E. McDowell Book review: \em Supercomputer
Architectures by Paul B. Schneck (Kluwer
Academic Publishers) . . . . . . . . . . 195--196
Herbert H. J. Hum and
Guang R. Gao Summary of the workshop on frontiers in
functional programming and dataflow
architecture . . . . . . . . . . . . . . 12--19
Andre M. van Tilborg Instrumentation for distributed
computing systems . . . . . . . . . . . 20--25
Glenn W. Griffin The ultimate ultimate RISC . . . . . . . 26--32
Douglas W. Jones Risks of comparing RISCs . . . . . . . . 33--34
M. Naderi Modelling and performance evaluation of
multiprocessors, organizations with
multi-memory units . . . . . . . . . . . 35--51
Peter Kogge and
John Oldfield and
Mark Brule and
Charles Stormon VLSI and rule-based systems . . . . . . 52--65
Behrooz Parhami Book review: \em Memory Storage Patterns
in Parallel Processing by Mary A. Mace
(Kluwer Academic Publishers, Boston,
1987, 139 pp.) . . . . . . . . . . . . . 76--76
J. P. Moskowitz and
C. Jousselin An algebraic memory model . . . . . . . 55--62
W. F. Wong A stack addressing scheme based on
windowing . . . . . . . . . . . . . . . 63--69
Anonymous Pipelining through Dynamic Control ROM 70--72
Stanley E. Lass Some innovations in computer
architecture . . . . . . . . . . . . . . 73--77
Philip Bitar Book reviews: Review of \em Parallel
Execution of Logic Programs by John
Conery. Kluwer Academic Publishers 1987 81--82
Robert Cohn and
Thomas Gross and
Monica Lam Architecture and compiler tradeoffs for
a long instruction word processor . . . 2--14
Gurindar S. Sohi and
Sriram Vajapeyam Tradeoffs in instruction format design
for horizontal architectures . . . . . . 15--25
James C. Dehnert and
Peter Y.-T. Hsu and
Joseph P. Bratt Overlapped loop support in the Cydra 5 26--38
F. J. Burkowski and
G. V. Cormack and
G. D. P. Dueck Architectural support for synchronous
task communication . . . . . . . . . . . 40--53
Rajiv Gupta The fuzzy barrier: a mechanism for high
speed synchronization of processors . . 54--63
James R. Goodman and
Mary K. Vernon and
Philip J. Woest Efficient synchronization primitives for
large-scale cache-coherent
multiprocessors . . . . . . . . . . . . 64--75
J. M. Mellor-Crummey and
T. J. LeBlanc A software instruction counter . . . . . 78--86
Z. Aral and
I. Gertner and
G. Schaffer Efficient debugging primitives for
multiprocessors . . . . . . . . . . . . 87--95
M. E. Staknis Sheaved memory: architectural support
for state saving and restoration in
paged systems . . . . . . . . . . . . . 96--102
M. A. Holliday Reference history, page size, and
migration daemons in local/remote
architectures . . . . . . . . . . . . . 104--112
D. L. Black and
R. F. Rashid and
D. B. Golub and
C. R. Hill Translation lookaside buffer
consistency: a software approach . . . . 113--122
G. A. Gibson and
L. Hellerstein and
R. M. Karp and
D. A. Patterson Failure correction techniques for large
disk arrays . . . . . . . . . . . . . . 123--132
N. P. Jouppi and
J. Bertoni and
D. W. Wall A unified vector/scalar floating-point
architecture . . . . . . . . . . . . . . 134--143
H. Mulder Data buffering: run-time versus
compile-time support . . . . . . . . . . 144--151
T. L. Adams and
R. E. Zimmerman An analysis of 8086 instruction set
usage in MS DOS programs . . . . . . . . 152--160
J. Roos A real-time support processor for Ada
tasking . . . . . . . . . . . . . . . . 162--171
Steven R. Vegdahl and
Uwe F. Pleban The runtime environment for Scheme, a
Scheme implementation on the 88000 . . . 172--182
S. McFarling Program optimization for instruction
caches . . . . . . . . . . . . . . . . . 183--191
Paul A. Karger Using registers to optimize cross-domain
call performance . . . . . . . . . . . . 194--204
Emmanuel Arnould and
H. T. Kung and
François Bitz and
Robert D. Sansom and
Eric C. Cooper The design of nectar: a network
backplane for heterogeneous
multicomputers . . . . . . . . . . . . . 205--216
S. A. Delgado-Rannauro and
T. J. Reynolds A message driven OR-parallel machine . . 217--228
S. Owicki and
A. Agarwal Evaluating the performance of software
cache coherence . . . . . . . . . . . . 230--242
W. Weber and
A. Gupta Analysis of cache invalidation patterns
in multiprocessors . . . . . . . . . . . 243--256
S. J. Eggers and
R. H. Katz The effect of sharing on the cache and
bus performance of parallel programs . . 257--270
N. P. Jouppi and
D. W. Wall Available instruction-level parallelism
for superscalar and superpipelined
machines . . . . . . . . . . . . . . . . 272--282
W. J. Dally Micro-optimization of floating-point
operations . . . . . . . . . . . . . . . 283--289
M. D. Smith and
M. Johnson and
M. A. Horowitz Limits on multiple instruction issue . . 290--302
S. J. Eggers and
R. H. Katz Evaluating the performance of four
snooping cache coherency protocols . . . 2--15
D. R. Cheriton and
H. A. Goosen and
P. D. Boyle Multi-level shared caching techniques
for scalability in VMP-M/C . . . . . . . 16--24
A. Goto and
A. Matsumoto and
E. Tick Design and performance of a coherent
cache for parallel logic programming
architectures . . . . . . . . . . . . . 25--33
V. G. Grafe and
G. S. Davidson and
J. E. Hoch and
V. P. Holmes The Epsilon dataflow processor . . . . . 36--45
S. Sakai and
Y. Yamaguchi and
K. Hiraki and
Y. Kodama and
T. Yuba An architecture of a dataflow single
chip processor . . . . . . . . . . . . . 46--53
P. Nitezki Exploiting data parallelism in signal
processing on a dataflow machine . . . . 54--61
R. N. Ibbett and
T. M. Hopkins and
K. I. M. McKinnon Architectural mechanisms to support
sparse vector processing . . . . . . . . 64--71
D. T. Harper and
D. A. Linebarger A dynamic storage scheme for
conflict-free vector access . . . . . . 72--77
K. Murakami and
N. Irie and
S. Tomita SIMP (Single Instruction stream/Multiple
instruction Pipelining): a novel
high-speed single-processor architecture 78--85
Y. Ben-Asher and
D. Egozi and
A. Schuster $2$-D SIMD algorithms in the perfect
shuffle networks . . . . . . . . . . . . 88--95
M. Valero-Garcia and
J. J. Navarro and
J. M. Llaberia and
M. Valero Systematic hardware adaptation of
systolic algorithms . . . . . . . . . . 96--104
M.-S. Chen and
K. G. Shin Task migration in hypercube
multiprocessors . . . . . . . . . . . . 105--111
S. Przybylski and
M. Horowitz and
J. Hennessy Characteristics of performance-optimal
multi-level cache hierarchies . . . . . 114--121
D. A. Wood and
R. H. Katz Supporting reference and dirty bits in
SPUR's virtual address cache . . . . . . 122--130
R. E. Kessler and
R. Jooss and
A. Lebeck and
M. D. Hill Inexpensive implementations of
set-associativity . . . . . . . . . . . 131--139
W. H. Wang and
J.-L. Baer and
H. M. Levy Organization and performance of a
two-level virtual-real cache hierarchy 140--148
C. R. Jesshope and
P. R. Miller and
J. T. Yantchev High performance communications in
processor networks . . . . . . . . . . . 150--157
H. E. Mizrahi and
J. L. Baer and
E. D. Lazowska and
J. Zahorjan Introducing memory into the switch
elements of multiprocessor
interconnection networks . . . . . . . . 158--166
S. L. Scott and
G. S. Sohi Using feedback to control tree
saturation in multistage interconnection
networks . . . . . . . . . . . . . . . . 167--176
P. D. Ezhilchelvan and
S. K. Shrivastava and
A. Tully Constructing replicated systems using
processors with point-to-point
communication links . . . . . . . . . . 177--184
H. Benker and
J. M. Beacco and
M. Dorochevsky and
Th. Jeffré and
A. Pöhlmann and
J. Noyé and
B. Poterie and
J. C. Syre and
O. Thibault and
G. Watzlawik KCM: a knowledge crunching machine . . . 186--194
A. Singhal and
Y. N. Patt A high performance Prolog processor with
multiple function units . . . . . . . . 195--202
M. Morioka and
S. Yamaguchi and
T. Bandoh Evaluation of memory system for
integrated Prolog processor IPP . . . . 203--210
K.-F. Wong and
M. H. Williams A type driven hardware engine for Prolog
clause retrieval over a large knowledge
base . . . . . . . . . . . . . . . . . . 211--222
W. W. Hwu and
T. M. Conte and
P. P. Chang Comparing software and hardware schemes
for reducing the cost of branches . . . 224--233
M. K. Farrens and
A. R. Pleszkun Improving performance of small on-chip
instruction caches . . . . . . . . . . . 234--241
W. W. Hwu and
P. P. Chang Achieving high instruction cache
performance with an optimizing compiler 242--251
P. Steenkiste The impact of code density on
instruction cache performance . . . . . 252--259
R. S. Nikhil Can dataflow subsume von Neumann
computing? . . . . . . . . . . . . . . . 262--272
W.-D. Weber and
A. Gupta Exploring the benefits of multiple
hardware contexts in a multiprocessor
architecture: preliminary results . . . 273--280
N. P. Jouppi Architectural and organizational
tradeoffs in the design of the
MultiTitan CPU . . . . . . . . . . . . . 281--289
M. Sato and
S. Ichikawa and
E. Goto Run-time checking in Lisp by integrating
memory addressing and range checking . . 290--297
A. Hopper and
A. Jones and
D. Lioupis Multiple vs. wide shared bus
multiprocessors . . . . . . . . . . . . 300--306
M. Annaratone and
R. Rühl Performance measurements on a commercial
multiprocessor running parallel code . . 307--314
M. Annaratone and
C. Pommerell and
R. Rühl Interprocessor communication speed and
performance in distributed-memory
parallel processors . . . . . . . . . . 315--324
D. S. Ghosal and
S. K. Tripathi and
L. N. Bhuyan and
H. Jiang Analysis of computation-communication
issues in dynamic dataflow architectures 325--333
S. Kravitz and
R. E. Bryant and
R. Rutenbar Logic simulation on massively parallel
architectures . . . . . . . . . . . . . 336--343
T. Fukazawa and
T. Kimura and
M. Tomizawa and
K. Takeda and
Y. Itoh R256: a research parallel processor for
scientific computation . . . . . . . . . 344--351
M. L. Anido and
D. J. Allerton and
E. J. Zaluska A three-port/three-access register file
for concurrent processing and I/O
communication in a RISC-like graphics
engine . . . . . . . . . . . . . . . . . 354--361
J. M. Mulder and
R. J. Portier and
A. Srivastava and
R. in't Velt An architecture framework for
application-specific and scalable
architectures . . . . . . . . . . . . . 362--369
K. Kim and
V. K. Prasanna-Kumar Perfect Latin squares and parallel array
access . . . . . . . . . . . . . . . . . 372--379
S. Weiss An aperiodic storage scheme to reduce
memory conflicts in vector processors 380--386
C.-L. Chen and
C.-K. Liao Analysis of vector access performance on
skewed interleaved memory . . . . . . . 387--394
A. Agarwal and
M. Cherian Adaptive backoff synchronization
techniques . . . . . . . . . . . . . . . 396--406
P. Stenström A cache consistency protocol for
multiprocessors with multistage networks 407--415
H.-M. Su and
P.-C. Yew On data synchronization for
multiprocessors . . . . . . . . . . . . 416--423
A. M. van Tilborg Panel on future directions in parallel
computer architecture . . . . . . . . . 3--53
N. J. Gunther and
M. T. Noga ParcBench: a benchmark for shared-memory
architectures . . . . . . . . . . . . . 54--61
A. Elkateeb and
T. Le-Ngoc A priority strategy on RISC for
real-time multitasking software
applications . . . . . . . . . . . . . . 62--68
Y.-J. Oyang A multiprocessor configuration in
accordance with the aspects of physical
and systems design . . . . . . . . . . . 69--73
H. Seebauer A memory controller executing segment
operations in time $ O(1) $ . . . . . . 74--81
R. J. Schwartz The design and development of a dynamic
program behavior measurement tool for
the Intel 8086/88 . . . . . . . . . . . 82--94
A. J. Martin and
S. M. Burns and
T. K. Lee and
D. Borkovic and
P. J. Hazewindus The first asynchronous microprocessor:
the test results . . . . . . . . . . . . 95--110
F. Cornett The UT1000 microprogramming simulator:
an educational tool . . . . . . . . . . 111--118
C. K. Yuen and
W. F. Wong A bidirectional data driven Lisp engine
for the direct execution of Lisp in
parallel . . . . . . . . . . . . . . . . 119--130
M. Smotherman A sequencing-based taxonomy of I/O
systems and review of historical
machines . . . . . . . . . . . . . . . . 5--15
R. Cousins DMA considerations on RISC workstations 16--23
R. H. Katz A project on high performance I/O
subsystems . . . . . . . . . . . . . . . 24--31
P. C. Dibble and
M. L. Scott Beyond striping: the bridge
multiprocessor file system . . . . . . . 32--39
A. L. N. Reddy and
P. Banerjee A study of parallel disk organizations 40--47
J. M. Smith and
G. Q. Maguire, Jr. Measured response times for page-sized
fetches on a network . . . . . . . . . . 48--54
B. Wolman and
T. M. Olson IOBENCH: a system independent IO
benchmark . . . . . . . . . . . . . . . 55--70
T. M. Olson Disk array performance in a random IO
environment . . . . . . . . . . . . . . 71--77
B. L. Wolman An analysis of server-based locking . . 78--82
E. H. Debaere Instruction-path coprocessing to solve
some RISC problems . . . . . . . . . . . 83--94
H. Seebauer A memory controller executing segment
operations in time $ O(1) $ . . . . . . 95--102
P. K. Chiu Representation of logic functions by
if--then clauses . . . . . . . . . . . . 103--107
C. Baleanu and
D. Tomescu Embedding computers in a cellular array 108--115
S. Lass On hardware enhanced 80386 software
emulation, compiled emulation, a program
distribution language, and pack
computers . . . . . . . . . . . . . . . 116--118
Daniel Litaize and
Omar Hammami and
Mustapha Lalam and
Abdelaziz Mzoughi and
Pascal Sainrat Multiprocessors with a serial multiport
memory and a pseudo crossbar of serial
links used as a processor-memory switch 8--21
G. Fritsch and
W. Henning and
H. Hessenauer and
R. Klar and
C. U. Linster and
C. W. Oehlrich and
P. Schlenk and
J. Vokert Distributed shared memory multiprocessor
architecture MEMSY for high performance
parallel computations . . . . . . . . . 22--35
A. Mendelson and
D. K. Pradhan and
A. D. Singh A single cached copy data coherence
scheme for multiprocessor systems . . . 36--49
Dror G. Feitelson and
Larry Rudolph Architecture for a multi-user
general-purpose parallel system . . . . 50--56
D. Quammen and
D. R. Miller and
D. Tabak Register window architecture for
multitasking applications . . . . . . . 57--66
Arnold Rosenberg Efficient emulations of interconnection
networks . . . . . . . . . . . . . . . . 67--79
Isaac D. Scherson and
Peter F. Corbett Description and performance of a class
of orthogonal multiprocessor networks 80--90
Ilana David and
Ran Ginosar and
Michael Yoeli An efficient implementation of Boolean
functions and finite state machines as
self-timed circuits . . . . . . . . . . 91--104
Apostolos Dollas and
Robert F. Krick The case for the sustained performance
computer architecture . . . . . . . . . 129--136
Eric E. Johnson Working set prefetching for cache
memories . . . . . . . . . . . . . . . . 137--141
K. H. Lee and
C. H. Lam Message-passing controller for a
shared-memory multiprocessor . . . . . . 142--149
Tsong-Chih Hsu and
Ling-Yang Kung Logic and conflict-free vector addresses 150--153
Tsong-Chih Hsu and
Ling-Yang Kung An address generation unit for array
accessing . . . . . . . . . . . . . . . 154--160
Tsong-Chih Hsu and
Ling-Yang Kung A hardware mechanism for priority queue 162--169
V. Dvorak Microsequencer architecture supporting
arbitrary branching up to $2^m$ targets 9--9
Jack J. Dongarra Performance of various computers using
standard linear equations software . . . 17--17
Tsong-Chih Hsu and
Ling-Yang Kung A comment on ``A Fetch-and-Op
Implementation for Parallel Computers'' 32--32
Robert Cousins A novel approach to character interfaces 35--35
Robert Cousins A reentrant peripheral interface . . . . 43--43
Noel W. Anderson Amorphous computer system architecture:
a preliminary look . . . . . . . . . . . 51--51
Yen-Jen Oyang and
Bor-Ting Chang and
Shu-May Lin A cost-effective approach to implement a
long instruction word microprocessor . . 59--59
C. Fritsch and
T. Sánchez and
J. Anaya Primitive based architectures . . . . . 73--73
Harold Lorin A model for recentralization of
computing: (distributed processing comes
home) . . . . . . . . . . . . . . . . . 81--81
Dan Teodosiu Computing in three dimensions . . . . . 99--99
Gary Frazier Ariel: a scalable multiprocessor for the
simulation of neural networks . . . . . 107--107
Robert P. Colwell Book review: \em High-Level Language
Computer Architecture edited by Veljko
Milutinovic (Computer Science Press,
1989) . . . . . . . . . . . . . . . . . 120--122
Behrooz Parhami Book review: \em Advanced Research in
VLSI, edited by Charles L. Seitz (The
MIT Press, Cambridge, MA, 1989, 373 pp.) 122--123
Wolfgang Matthes Hardware Resources: a generalizing view
on computer architectures . . . . . . . 7--14
Lawrence Rauchwerger and
Michael P. Farmwald A multiple floating point coprocessor
architecture . . . . . . . . . . . . . . 15--24
Andy Glew and
Wen-Mei Hwu Snoopy cache test-and-test-and-set
without excessive bus contention . . . . 25--32
Lee Higbee Quick and easy cache performance
analysis . . . . . . . . . . . . . . . . 33--44
Arvin Park and
Jeffrey C. Becker and
Richard J. Lipton IOStone: a synthetic file system
benchmark . . . . . . . . . . . . . . . 45--52
Dionisios N. Pnevmatikatos and
Mark D. Hill Cache performance of the integer SPEC
benchmarks on a RISC . . . . . . . . . . 53--68
A. B. Ruighaver A modular network for dense optical
interconnection of processing elements 69--75
Alessandro De Gloria VISA: a variable instruction set
architecture . . . . . . . . . . . . . . 76--84
Fleur L. Williams and
Gordon B. Steven Address and data register separation on
the M68000 family . . . . . . . . . . . 85--89
Sarita V. Adve and
Mark D. Hill Weak ordering---a new definition . . . . 2--14
Kourosh Gharachorloo and
Daniel Lenoski and
James Laudon and
Phillip Gibbons and
Anoop Gupta and
John Hennessy Memory consistency and event ordering in
scalable shared-memory multiprocessors 15--26
Joonwon Lee and
Umakishore Ramachandran Synchronization with multiprocessor
caches . . . . . . . . . . . . . . . . . 27--37
Po-Jen Chuang and
Nian-Feng Tzeng Dynamic processor allocation in
hypercube computers . . . . . . . . . . 40--49
Abdou Youssef and
Bruce Arden A new approach to fast control of $ r^2
\times r^2 $ $3$-stage Benes networks of
$ r \times r$ crossbar switches . . . . 50--59
William J. Dally Virtual-channel flow control . . . . . . 60--68
Shekhar Borkar and
Robert Cohn and
George Cox and
Thomas Gross and
H. T. Kung and
Monica Lam and
Margie Levine and
Brian Moore and
Wire Moore and
Craig Peterson and
Jim Susman and
Jim Sutton and
John Urbanski and
Jon Webb Supporting systolic and memory
communication in iWarp . . . . . . . . . 70--81
Gregory M. Papadopoulos and
David E. Culler Monsoon: an explicit token-store
architecture . . . . . . . . . . . . . . 82--91
Marco Annaratone and
Marco Fillo and
Kiyoshi Nakabayashi and
Marc Viredaz The K2 parallel processor: architecture
and hardware implementation . . . . . . 92--101
Anant Agarwal and
Beng-Hong Lim and
David Kranz and
John Kubiatowicz APRIL: a processor architecture for
multiprocessing . . . . . . . . . . . . 104--114
Roberto Bisiani and
Mosur Ravishankar PLUS: a distributed shared-memory system 115--124
John K. Bennett and
John B. Carter and
Willy Zwaenepoel Adaptive software cache management for
distributed shared memory architectures 125--134
David R. Ditzel and
John L. Hennessy and
Bernie Rudin and
Alan Jay Smith and
Stephen L. Squires and
Zeke Zalcstein Big science versus little science---do
you have to build it? (panel session) 136--136
Brian W. O'Krafka and
A. Richard Newton An empirical evaluation of two
memory-efficient directory methods . . . 138--147
Daniel Lenoski and
James Laudon and
Kourosh Gharachorloo and
Anoop Gupta and
John Hennessy The directory-based cache coherence
protocol for the DASH multiprocessor . . 148--159
Steven Przybylski The performance impact of block sizes
and fetch strategies . . . . . . . . . . 160--169
D. Alpert and
A. Averbuch and
O. Danieli Performance comparison of load/store and
symmetric instruction set architectures 172--181
Jack W. Davidson and
David B. Whalley Reducing the cost of branches by using
registers . . . . . . . . . . . . . . . 182--191
Carl E. Love and
Harry F. Jordan An investigation of static versus
dynamic scheduling . . . . . . . . . . . 192--201
Dileep Bhandarkar and
Richard Brunner VAX vector architecture . . . . . . . . 204--215
Robert W. Horst and
Richard L. Harris and
Robert L. Jardine Multiple instruction issue in the
NonStop Cyclone processor . . . . . . . 216--226
Shreekant S. Thakkar and
Mark Sweiger Performance of an OLTP application on
symmetry multiprocessor system . . . . . 228--238
Ding-Kai Chen and
Hong-Men Su and
Pen-Chung Yew The impact of synchronization and
granularity on parallel systems . . . . 239--248
Håkon O. Bugge and
Ernst H. Kristiansen and
Bjørn O. Bakka Trace-driven simulations for a two-level
cache design in open bus systems . . . . 250--259
Jiun-Ming Hsu and
Prithviraj Banerjee Performance measurement and trace driven
simulation of parallel CAD and numeric
applications on a hypercube
multicomputer . . . . . . . . . . . . . 260--269
Anita Borg and
R. E. Kessler and
David W. Wall Generation and analysis of very long
address traces . . . . . . . . . . . . . 270--279
Bruce K. Holmer and
Barton Sano and
Michael Carlton and
Peter Van Roy and
Ralph Haygood and
William R. Bush and
Alvin M. Despain and
Joan M. Pendleton and
Tep Dobry Fast Prolog with an extended general
purpose architecture . . . . . . . . . . 282--291
Leon Alkalaj and
Tomás Lang and
Miloš Ercegovac Architectural support for the management
of tightly-coupled fine-grain goals in
flat concurrent Prolog . . . . . . . . . 292--301
Samuel Ho and
Lawrence Snyder Balance in architectural design . . . . 302--310
A. L. Narasimha Reddy and
Prithviraj Banerjee A study of I/O behavior of perfect
benchmarks on a multiprocessor . . . . . 312--321
Peter M. Chen and
David A. Patterson Maximizing performance in a striped disk
array . . . . . . . . . . . . . . . . . 322--331
Kang G. Shin and
Greg Dykema A distributed I/O architecture for HARTS 332--342
Michael D. Smith and
Monica S. Lam and
Mark A. Horowitz Boosting beyond static scheduling in a
superscalar processor . . . . . . . . . 344--354
George Taylor and
Peter Davies and
Michael Farmwald The TLB slice---a low-cost high-speed
address translation mechanism . . . . . 355--363
Norman P. Jouppi Improving direct-mapped cache
performance by the addition of a small
fully-associative cache and prefetch
buffers . . . . . . . . . . . . . . . . 364--373
Edward S. Davidson and
Gurindar S. Sohi and
Joseph A. Fisher and
Greg Grohoski and
Yale Patt and
J. E. Smith and
David R. Stiles Better than one operation per clock
(panel): vectors, VLIW, and superscalar 376--376
Robert Alverson and
David Callahan and
Daniel Cummings and
Brian Koblenz and
Allan Porterfield and
Burton Smith The Tera computer system . . . . . . . . 1--6
K. Hwang and
M. Dubois and
D. K. Panda and
S. Rao and
S. Shang and
A. Uresin and
W. Mao and
H. Nair and
M. Lytwyn and
F. Hsieh and
J. Liu and
S. Mehrotra and
C. M. Cheng OMP: a RISC-based multiprocessor using
orthogonal-access memories and multiple
spanning buses . . . . . . . . . . . . . 7--22
Kechang Dai and
Wolfgang K. Giloi A basic architecture supporting LGDG
computation . . . . . . . . . . . . . . 23--33
Sang Lyul Min and
Jean-Loup Baer and
Hyoung-Joo Kim An efficient caching support for
critical sections in large-scale
shared-memory multiprocessors . . . . . 34--47
Umpei Nagashima and
Fumio Nishimoto and
Takashi Shibata and
Hiroshi Itoh and
Minoru Gotoh An improvement of I/O function for
auxiliary storage: parallel I/O for a
large scale supercomputing . . . . . . . 48--59
Nian-Feng Tzeng Analysis of a variant hypercube topology 60--70
P. J. van der Houwen and
B. P. Sommeijer Parallel ODE solvers . . . . . . . . . . 71--81
M. J. Daydé and
I. S. Duff Use of parallel level 3 BLAS in LU
factorization on three vector
multiprocessors the ALLIANT FX/80, the
CRAY-2, and the IBM 3090 VF . . . . . . 82--95
E. N. Houstis and
J. R. Rice and
N. P. Chrisochoides and
H. C. Karathanasis and
P. N. Papachiou and
M. K. Samartzis and
E. A. Vavalis and
Ko Yang Wang and
S. Weerawarana //ELLPACK: a numerical simulation
programming environment for parallel
MIMD machines . . . . . . . . . . . . . 96--107
Christina C. Christara Schur complement preconditioned
conjugate gradient methods for spline
collocation equations . . . . . . . . . 108--120
Kuo-Liang Chung and
Ferng-Ching Lin and
Wen-Chin Chen Cost-optimal parallel B-spline
interpolations . . . . . . . . . . . . . 121--131
K. Gallivan and
A. Sameh and
Z. Zlatev Solving general sparse linear systems
using conjugate gradient-type methods 132--139
Toshitsugu Yuba and
Toshio Shimada and
Yoshinori Yamaguchi and
Kei Hiraki and
Shuichi Sakai Dataflow computer development in Japan 140--147
Vivek Sarkar and
David Cann POSC---a partitioning and optimizing
SISAL compiler . . . . . . . . . . . . . 148--164
François Bodin and
François Charot Loop optimization for horizontal
microcoded machines . . . . . . . . . . 164--176
Peiyi Tang and
Pen-Chung Yew and
Chuan-Qi Zhu Compiler techniques for data
synchronization in nested parallel loops 177--186
David E. Hudak and
Santosh G. Abraham Compiler techniques for data
partitioning of sequentially iterated
parallel loops . . . . . . . . . . . . . 187--200
David Klappholz and
Kleanthis Psarris and
Xiangyun Kong On the perfect accuracy of an
approximate subscript analysis test . . 201--212
Allen D. Malony and
Daniel A. Reed A hardware-based performance monitor for
the Intel iPSC/2 hypercube . . . . . . . 213--226
R. T. Dimpsey and
R. K. Iyer Performance degradation due to
multiprogramming and system overheads in
real workloads: case study on a shared
memory multiprocessor . . . . . . . . . 227--238
Youcef Saad and
Harry A. G. Wijshoff SPARK: a benchmark package for sparse
computations . . . . . . . . . . . . . . 239--253
George Cybenko and
Lyle Kipp and
Lynn Pointer and
David Kuck Supercomputer performance evaluation and
the Perfect Benchmarks . . . . . . . . . 254--266
Ahmed K. Noor and
Jeanne M. Peters Strategies for large-scale structural
problems on high-performance computers 267--280
V. Zecca and
A. Kamel Elastodynamics on clustered vector
multiprocessors . . . . . . . . . . . . 281--290
Victor Eijkhout Implementation of $5$-point/$9$-point
multi-level methods on hypercube
architectures . . . . . . . . . . . . . 291--295
Philip C. Chen Supercomputer-based visualization
systems used for analyzing output data
of a numerical weather prediction model 296--309
Yoshizo Takahashi and
Shigetaka Sasaki Parallel automated wire-routing with a
number of competing processors . . . . . 310--317
Tony F. Chan Hierarchical algorithms and
architectures for parallel scientific
computing . . . . . . . . . . . . . . . 318--329
Kevin Smith and
Bill Appelbe and
Kurt Stirewalt Incremental dependence analysis for
interactive parallelization . . . . . . 330--341
Roland Rühl and
Marco Annaratone Parallelization of FORTRAN code on
distributed-memory parallel processors 342--353
Edward H. Gornish and
Elana D. Granston and
Alexander V. Veidenbaum Compiler-directed data prefetching in
multiprocessors with memory hierarchies 354--368
Guang R. Gao and
Herbert H. J. Hum and
Yue-Bong Wong Towards efficient fine-grain software
pipelining . . . . . . . . . . . . . . . 369--379
Françoise André and
Jean-Louis Pazat and
Henry Thomas Pandore: a system to manage data
distribution . . . . . . . . . . . . . . 380--388
Rod A. Fatoohi Vector performance analysis of the NEC
SX-2 . . . . . . . . . . . . . . . . . . 389--400
François Bodin and
Daniel Windheiser and
William Jalby and
Daya Atapattu and
Mannho Lee and
Dennis Gannon Performance evaluation and prediction
for parallel algorithms on the BBN
GP1000 . . . . . . . . . . . . . . . . . 401--413
Luigi Brochard and
Alex Freau Designing algorithms on hierarchical
memory multiprocessors . . . . . . . . . 414--427
Ingrid Y. Bucher and
Donald A. Calahan Access conflicts in multiprocessor
memories: queueing models and simulation
studies . . . . . . . . . . . . . . . . 428--438
Emilio Luque and
Ana Ripoll and
Porfidio Hernández and
Tomás Margalef Impact of task duplication on
static-scheduling performance in
multiprocessor systems with variable
execution-time tasks . . . . . . . . . . 439--446
Apostolos Gerasoulis and
Sesh Venugopal and
Tao Yang Clustering task graphs for message
passing architectures . . . . . . . . . 447--456
Edwin M. Paalvast and
Arjan J. van Gemund and
Henk J. Sips A method for parallel program generation
with an application to the Booster
language . . . . . . . . . . . . . . . . 457--469
M. A. Tsoukarellas and
T. S. Papatheodorou A run time support system for
multiprocessor machines . . . . . . . . 470--478
Anthony J. G. Hey Supercomputing with transputers---past,
present and future . . . . . . . . . . . 479--489
Burton Smith The end of architecture . . . . . . . . 10--17
Mark D. Hill What is scalability? . . . . . . . . . . 18--21
P. A. Laplante A novel single instruction computer
architecture . . . . . . . . . . . . . . 22--26
Ran Ginosar and
Nick Michell On the potential of asynchronous
pipelined processors . . . . . . . . . . 27--34
Yen-Jen Oyang and
Chun-Hung Wen and
Yu-Fen Chen and
Shu-May Lin The effect of employing advanced
branching mechanisms in superscalar
processors . . . . . . . . . . . . . . . 35--52
Yannick Deville A low-cost usage-based replacement
algorithm for cache memories . . . . . . 52--58
Bernard K. Gunther A high speed mechanism for short
branches . . . . . . . . . . . . . . . . 59--61
Robert McLaughlin Design for fast DSP machine . . . . . . 62--66
Werner B. Joerg A subclass of Petri Nets as design
abstraction for parallel architectures 67--77
Mark Thorson Usenet Nuggets . . . . . . . . . . . . . 80--89
Glen G. Langdon, Jr. Book review: Highly Parallel
Computing by George Almasi and Allan
Gottlieb (Benjamin/Cummings, 1989) . . . 90--90
Glen G. Langdon, Jr. Book review: Solving Problems on
Concurrent Processors, Vol II: Software
for Concurrent Processors by I. Angus,
G. Fox, J. Kim, and D. Walker
(Prentice-Hall, 1990) . . . . . . . . . 90--91
Marc Dikotter Book review: The Definition of
Standard ML by R. Milner, M. Tofte, R.
Harper . . . . . . . . . . . . . . . . . 91--91
F. T. Leighton Selected Papers from the Symposium on
Parallel Algorithms and Architectures 5--5
John Y. Ngai and
Charles L. Seitz A framework for adaptive routing in
multicomputer networks . . . . . . . . . 6--14
Richard Beigel and
Clyde P. Kruskal Processor networks and interconnection
networks without long wires (extended
abstract) . . . . . . . . . . . . . . . 15--24
Fred Annexstein Fault tolerance in hypercube-derivative
networks (preliminary version) . . . . . 25--34
Richard M. Fujimoto The Virtual Time Machine . . . . . . . . 35--44
Gianfranco Bilardi and
Scot W. Hornick and
Majid Sarrafzadeh Optimal VLSI architectures for
multidimensional DFT (preliminary
version) . . . . . . . . . . . . . . . . 45--52
Clark D. Thomborson and
Belle W.-Y. Wei Systolic implementations of a
move-to-front text compressor . . . . . 53--60
Thomas F. Knight, Jr. Technologies for low latency
interconnection switches . . . . . . . . 61--68
Martin C. Herbordt and
Charles C. Weems and
James C. Corbett Message-passing algorithms for a SIMD
torus with coteries . . . . . . . . . . 69--78
S. Konstantinidou and
L. Snyder The chaos router: a practical
application of randomization in network
routing . . . . . . . . . . . . . . . . 79--88
Jehoshua Bruck and
Robert Cypher and
Danny Soroker Running algorithms efficiently on faulty
hypercubes (extended abstract) . . . . . 89--96
Naomi Nishimura Asynchronous shared memory parallel
computation (preliminary version) . . . 97--105
M. Shand and
P. Bertin and
J. Vuillemin Hardware speedups in long integer
multiplication . . . . . . . . . . . . . 106--113
Manu Thapar and
Bruce Delagi Cache coherence for large scale shared
memory multiprocessors . . . . . . . . . 114--119
Peter Grabienski FLIP-FLOP: a stack-oriented
multiprocessing system . . . . . . . . . 120--127
Camille C. Price Task allocation in data flow
multiprocessors: an annotated
bibliography . . . . . . . . . . . . . . 128--134
Rod Adams and
Gordon Steven A parallel pipelined processor with
conditional instruction execution . . . 135--142
Mark Thorson Usenet Nuggets . . . . . . . . . . . . . 146--150
Michael L. Hilton Book review: Systems Programming in
Parallel Logic Languages by Ian Foster
(Prentice Hall, 1990) . . . . . . . . . 151--151
Keith Anthony Book review: Technology Projection
Modeling of Future Computer Systems by
Al Cutaia (Prentice-Hall, 1990) . . . . 152--153
Paul B. Schneck Book review: Optimizing FORTRAN
Programs by C. F. Schofield (Halsted
Press, 1989) . . . . . . . . . . . . . . 153--154
Robert Bernecky Book review: Multiprocessors by
Daniel Tabak (Prentice Hall, Englewood
Cliffs, NJ) . . . . . . . . . . . . . . 154--156
Robert Bernecky Book review: Multiprocessor
Performance by Erol Gelenbe (J. Wiley &
Sons, Chichester, England) . . . . . . . 156--157
John Fulcher Book review: Neural Net Applications
and Products by Richard K. Miller, Terri
C. Walker, and Anne M. Ryan (SEAI
Technical Publications, 1990) . . . . . 157--158
Andrew Wolfe and
John P. Shen A variable instruction stream extension
to the VLIW architecture . . . . . . . . 2--14
Manolis Katevenis and
Nestoras Tzartzanis Reducing the branch penalty by
rearranging instructions in a
double-width memory . . . . . . . . . . 15--27
Roland L. Lee and
Alex Y. Kwok and
Fayé A. Briggs The floating point performance of a
superscalar SPARC processor . . . . . . 28--37
David Callahan and
Ken Kennedy and
Allan Porterfield Software prefetching . . . . . . . . . . 40--52
Gurindar S. Sohi and
Manoj Franklin High-bandwidth data memory systems for
superscalar processors . . . . . . . . . 53--62
Monica D. Lam and
Edward E. Rothberg and
Michael E. Wolf The cache performance and optimizations
of blocked algorithms . . . . . . . . . 63--74
Jeffrey C. Mogul and
Anita Borg The effect of context switches on cache
performance . . . . . . . . . . . . . . 75--84
David Keppel A portable interface for on-the-fly
instruction space modification . . . . . 86--95
Andrew W. Appel and
Kai Li Virtual memory primitives for user
programs . . . . . . . . . . . . . . . . 96--107
Thomas E. Anderson and
Henry M. Levy and
Brian N. Bershad and
Edward D. Lazowska The interaction of architecture and
operating system design . . . . . . . . 108--120
David G. Bradlee and
Susan J. Eggers and
Robert R. Henry Integrating register allocation and
instruction scheduling for RISCs . . . . 122--131
Manuel E. Benitez and
Jack W. Davidson Code generation for streaming: an
access/execute mechanism . . . . . . . . 132--141
Rajive Bagrodia and
Sharad Mathur Efficient implementation of high-level
parallel programs . . . . . . . . . . . 142--151
William Mangione-Smith and
Santosh G. Abraham and
Edward S. Davidson Vector register design for polycyclic
vector scheduling . . . . . . . . . . . 154--163
David E. Culler and
Anurag Sah and
Klaus E. Schauser and
Thorsten von Eicken and
John Wawrzynek Fine-grain parallelism with minimal
hardware support: a compiler-controlled
threaded abstract machine . . . . . . . 164--175
David W. Wall Limits of instruction-level parallelism 176--188
Edward K. Lee and
Randy H. Katz Performance consequences of parity
placement in disk arrays . . . . . . . . 190--199
Vincent Cate and
Thomas Gross Combining the concepts of compression
and caching for a two-level filesystem 200--211
William J. Bolosky and
Michael L. Scott and
Robert P. Fitzgerald and
Robert J. Fowler and
Alan L. Cox NUMA policies and their relation to
memory architecture . . . . . . . . . . 212--221
David Chaiken and
John Kubiatowicz and
Anant Agarwal LimitLESS directories: a scalable cache
coherence scheme . . . . . . . . . . . . 224--234
Sang L. Min and
Jong-Deok Choi An efficient cache-based access anomaly
detection scheme . . . . . . . . . . . . 235--244
Kourosh Gharachorloo and
Anoop Gupta and
John Hennessy Performance evaluation of memory
consistency models for shared-memory
multiprocessors . . . . . . . . . . . . 245--257
Eric Freudenthal and
Allan Gottlieb Process coordination with
fetch-and-increment . . . . . . . . . . 260--268
John M. Mellor-Crummey and
Michael L. Scott Synchronization without contention . . . 269--278
Douglas Johnson The case for a read barrier . . . . . . 279--287
Robert F. Cmelik and
Shing I. Kong and
David R. Ditzel and
Edmund J. Kelly An analysis of MIPS and SPARC
instruction set utilization on the SPEC
benchmarks . . . . . . . . . . . . . . . 290--302
C. Brian Hall and
Kevin O'Brien Performance characteristics of
architectural features of the IBM RISC
System/6000 . . . . . . . . . . . . . . 303--309
Dileep Bhandarkar and
Douglas W. Clark Performance from architecture: comparing
a RISC and a CISC with similar hardware
organization . . . . . . . . . . . . . . 310--319
R. F. DeMara and
D. I. Moldovan The SNAP-1 parallel AI prototype . . . . 2--11
Wei Siong Tan and
H. Russ and
Cecil O. Alford GT-EP: a novel high-performance
real-time architecture . . . . . . . . . 13--21
Tetsuya Higuchi and
Tatsumi Furuya and
Kenichi Handa and
Naoto Takahashi and
Hiroyasu Nishiyama and
Akio Kokubu IXM2: a parallel associative processor 22--31
David R. Kaeli and
Philip G. Emma Branch history table prediction of
moving target branches due to subroutine
returns . . . . . . . . . . . . . . . . 34--42
Alexander C. Klaiber and
Henry M. Levy An architecture for software-controlled
data prefetching . . . . . . . . . . . . 43--53
John W. C. Fu and
Janak H. Patel Data prefetching in multiprocessor
vector cache memories . . . . . . . . . 54--63
D. T. Harper III Reducing memory contention in shared
memory multiprocessors . . . . . . . . . 66--73
B. Ramakrishna Rau Pseudo-randomly interleaved memory . . . 74--83
Kai Li and
Karin Petersen Evaluation of memory system extensions 84--93
Patrick W. Dowd High performance interprocessor
communication through optical wavelength
division multiple access channels . . . 96--105
Anders Landin and
Erik Hagersten and
Seif Haridi Race-free interconnection networks and
multiprocessor consistency . . . . . . . 106--115
Xiaola Lin and
Lionel M. Ni Deadlock-free multicast wormhole routing
in multicomputer networks . . . . . . . 116--125
Matthew Farrens and
Arvin Park Dynamic base register caching: a
technique for reducing address bus width 128--137
O. A. Olukotun and
T. N. Mudge and
R. B. Brown Implementing a cache for a
high-performance GaAs microprocessor . . 138--147
Lizyamma Kurian and
Paul T. Hulina and
Lee D. Coraor and
Dhamir N. Mannai Classification and performance
evaluation of instruction buffering
techniques . . . . . . . . . . . . . . . 150--159
Masaitsu Nakajima and
Hiraku Nakano and
Yasuhiro Nakakura and
Tadahiro Yoshida and
Yoshiyuki Goi and
Yuji Nakai and
Reiji Segawa and
Takeshi Kishida and
Hiroshi Kadota OHMEGA: a VLSI superscalar processor
architecture for numerical applications 160--168
Sriram Vajapeyam and
Gurindar S. Sohi and
Wei-Chung Hsu An empirical study of the CRAY Y-MP
processor using the Perfect Club
benchmarks . . . . . . . . . . . . . . . 170--179
Chriss Stephens and
Bryce Cogswell and
John Heinlein and
Gregory Palmer and
John P. Shen Instruction level profiling and
evaluation of the IBM RS/6000 . . . . . 180--189
R. T. Dimpsey and
R. K. Iyer Performance prediction and tuning on a
multiprocessor . . . . . . . . . . . . . 190--199
C. W. Oehlrich and
A. Quick Performance evaluation of a
communication system for
transputer-networks based on monitored
event traces . . . . . . . . . . . . . . 202--211
S. Konstantinidou and
L. Snyder Chaos router: architecture and
performance . . . . . . . . . . . . . . 212--221
Shridhar B. Shukla and
Dharma P. Agrawal Scheduling pipelined communication in
distributed memory multiprocessors for
real-time applications . . . . . . . . . 222--231
Sarita V. Adve and
Mark D. Hill and
Barton P. Miller and
Robert H. B. Netzer Detecting data races on weak memory
systems . . . . . . . . . . . . . . . . 234--243
Eric J. Koldinger and
Susan J. Eggers and
Henry M. Levy On the validity of trace-driven
simulation for multiprocessors . . . . . 244--253
Anoop Gupta and
John Hennessy and
Kourosh Gharachorloo and
Todd Mowry and
Wolf-Dietrich Weber Comparative evaluation of latency
reducing and tolerating techniques . . . 254--263
Pohua P. Chang and
Scott A. Mahlke and
William Y. Chen and
Nancy J. Warter and
Wen-mei W. Hwu IMPACT: an architectural framework for
multiple-instruction-issue processors 266--275
Michael Butler and
Tse-Yu Yeh and
Yale Patt and
Mitch Alsup and
Hunter Scales and
Michael Shebanow Single instruction stream parallelism is
greater than two . . . . . . . . . . . . 276--286
Stephen Melvin and
Yale Patt Exploiting fine-grained parallelism
through a combination of hardware and
software techniques . . . . . . . . . . 287--296
Sarita V. Adve and
Vikram S. Adve and
Mark D. Hill and
Mary K. Vernon Comparison of hardware and software
cache coherence schemes . . . . . . . . 298--308
Richard Simoni and
Mark Horowitz Modeling the performance of limited
pointers directories for cache coherence 309--319
Donna J. Quammen and
D. Richard Miller Flexible register management for
sequential programs . . . . . . . . . . 320--329
David G. Bradlee and
Susan J. Eggers and
Robert R. Henry The effect on RISC performance of
register set size and structure versus
code generation strategy . . . . . . . . 330--339
Gregory M. Papadopoulos and
Kenneth R. Traub Multithreading: a revisionist view of
dataflow architectures . . . . . . . . . 342--351
Tzi-cker Chiueh Multi-threaded vectorization . . . . . . 352--361
Matthew K. Farrens and
Andrew R. Pleszkun Strategies for achieving improved
processor throughput . . . . . . . . . . 362--369
Toyohiko Kagimasa and
Kikuo Takahashi and
Toshiaki Mori and
Seiichi Yoshizumi Adaptive storage management for very
large virtual/real storage systems . . . 372--379
Judith S. Hall and
Paul T. Robinson Virtualizing the VAX architecture . . . 380--389
Janaki Akella and
Daniel P. Siewiorek Modeling and measurement of the impact
of Input/Output on system performance 390--399
Paul R. Wilson Pointer swizzling at page fault time:
efficiently supporting huge address
spaces on standard hardware . . . . . . 6--13
Morihiro Kuga and
Kazuaki Murakami and
Shinji Tomita DSNS (dynamically-hazard-resolved,
statically-code-scheduled, nonuniform
superscalar): yet another superscalar
processor architecture . . . . . . . . . 14--29
Carl Ponder Performance variation across benchmark
suites . . . . . . . . . . . . . . . . . 30--36
Thomas M. Conte and
Wen-mei W. Hwu A brief survey of benchmark usage in the
architecture community . . . . . . . . . 37--44
Todd D. Morris and
Edward F. Gehringer A cost-effective reliable multipath
interconnection network . . . . . . . . 45--65
P. A. Laplante An improved conditional branching scheme
for a single instruction computer
architecture . . . . . . . . . . . . . . 66--68
Andrew J. DuBois and
John Rasure Design and evaluation of a distributed
asynchronous VLSI crossbar switch
controller for a packet switched
supercomputer network . . . . . . . . . 69--79
Stanley E. Lass The compiler controlled pack cache and
messaging . . . . . . . . . . . . . . . 80--85
Theo Ungerer and
Eberhard Zehendner A multi-level parallelism architecture 86--93
Wolfgang Matthes How many operation units are adequate? 94--108
Alberto R. Cunha and
Carlos N. Ribeiro and
José A. Marques The architecture of a memory management
unit for object-oriented systems . . . . 109--116
Norman Matloff An argument against scalable cache
coherency . . . . . . . . . . . . . . . 117--123
D. P. Rodohan and
R. J. Glover An overview of the A* architecture for
optimisation problems in a logic
programming environment . . . . . . . . 124--131
Stuart C. Wray Time-sequenced DMA for multimedia
computers . . . . . . . . . . . . . . . 132--137
Ganesh Ramamoorthy and
Alok N. Choudhary A bibliography for multiprocessor cache
memories . . . . . . . . . . . . . . . . 138--153
Alan Jay Smith Second bibliography on cache memories 154--182
Mark Thorson Usenet Nuggets . . . . . . . . . . . . . 185--191
David A. Patterson Towards guidelines for SIGARCH sponsored
conferences . . . . . . . . . . . . . . 7--7
Yeong-Chang Maa and
Dhiraj K. Pradhan and
Dominique Thiébaut Two economical directory schemes for
large-scale cache coherent
multiprocessors . . . . . . . . . . . . 10--10
Mark Thorson Usenet Nuggets . . . . . . . . . . . . . 21--26
Vladimir G. Ivanovic Book review: Computation Structures
by Stephen A. Ward and Robert H.
Halstead, Jr. (MIT Press or McGraw-Hill,
1990) . . . . . . . . . . . . . . . . . 27--29
Moshe Krieger Book review: Multiprocessors by D.
Tabak (Prentice-Hall, 1990) . . . . . . 27--29
John Fulcher Book review: The 68000 and 68020
Microprocessors: Hardware, Software and
Interfacing Techniques by W. Triebel and
A. Singh (Prentice Hall, 1991) . . . . . 29--30
Henry G. Baker Precise instruction scheduling without a
precise machine model . . . . . . . . . 4--8
Robert McLaughlin Look-ahead branching hardware . . . . . 9--11
Thomas Beth and
Volker Hatz A restricted crossbar implementation and
its applications . . . . . . . . . . . . 12--16
Mark Thorson Usenet Nuggets . . . . . . . . . . . . . 19--23
Robert Bernecky Book review: Past, Present,
Parallel: A Survey of Available Parallel
Computing Systems by Arthur Trew & Greg
Wilson (Eds.), (Springer-Verlag 1991) 24--25
Jaswinder Pal Singh and
Wolf-Dietrich Weber and
Anoop Gupta SPLASH: Stanford parallel applications
for shared-memory . . . . . . . . . . . 5--44
Eligiusz Wajda SPIRE: streaming processing with
instructions release element . . . . . . 45--54
Yannick Deville and
Jean Gobert A class of replacement policies for
medium and high-associativity structures 55--64
Richard N. Zucker and
Jean-Loup Baer A performance study of memory
consistency models . . . . . . . . . . . 2--12
Pete Keleher and
Alan L. Cox and
Willy Zwaenepoel Lazy release consistency for software
distributed shared memory . . . . . . . 13--21
Kourosh Gharachorloo and
Anoop Gupta and
John Hennessy Hiding memory latency using dynamic
scheduling in shared-memory
multiprocessors . . . . . . . . . . . . 22--33
Edil S. T. Fernandes and
Fernando M. B. Barbosa Effects of building blocks on the
performance of super-scalar architecture 36--45
Monica S. Lam and
Robert P. Wilson Limits of control flow on parallelism 46--57
Manoj Franklin and
Gurindar S. Sohi The expandable split window paradigm for
exploiting fine-grain parallelism . . . 58--67
Daniel Litaize and
Abdelaziz Mzoughi and
Christine Rochange and
Pascal Sainrat Towards a shared-memory massively
parallel multiprocessor . . . . . . . . 70--79
Per Stenström and
Truman Joe and
Anoop Gupta Comparative performance evaluation of
cache-coherent NUMA and COMA
architectures . . . . . . . . . . . . . 80--91
Daniel Lenoski and
James Laudon and
Truman Joe and
David Nakahira and
Luis Stevens and
Anoop Gupta and
John Hennessy The DASH prototype: implementation and
performance . . . . . . . . . . . . . . 92--103
Gideon Intrater and
Ilan Spillinger Performance evaluation of a decoded
instruction cache for variable
instruction-length computers . . . . . . 106--113
J. Bradley Chen and
Anita Borg and
Norman P. Jouppi A simulation based study of TLB
performance . . . . . . . . . . . . . . 114--123
Tse-Yu Yeh and
Yale N. Patt Alternative implementations of two-level
adaptive branch prediction . . . . . . . 124--134
Hiroaki Hirata and
Kozo Kimura and
Satoshi Nagamine and
Yoshiyuki Mochizuki and
Akio Nishimura and
Yoshimori Nakase and
Teiji Nishizawa An elementary processor architecture
with simultaneous instruction issuing
from multiple threads . . . . . . . . . 136--145
Mitsuhisa Sato and
Yuetsu Kodama and
Shuichi Sakai and
Yoshinori Yamaguchi and
Yasuhito Koumura Thread-based programming for the EM-4
hybrid dataflow machine . . . . . . . . 146--155
R. S. Nikhil and
G. M. Papadopoulos and
Arvind *T: a multithreaded massively parallel
architecture . . . . . . . . . . . . . . 156--167
Czarek Dubnicki and
Thomas J. LeBlanc Adjustable block size coherent caches 170--180
Kunle Olukotun and
Trevor Mudge and
Richard Brown Performance optimization of pipelined
primary cache . . . . . . . . . . . . . 181--190
Scott McFarling Cache replacement with dynamic exclusion 191--200
Stephen W. Keckler and
William J. Dally Processor coupling: integrating compile
time and runtime scheduling for
parallelism . . . . . . . . . . . . . . 202--213
Bob Boothe and
Abhiram Ranade Improved multithreading techniques for
hiding communication latency in
multiprocessors . . . . . . . . . . . . 214--223
Alessandro De Gloria and
Paolo Faraboschi Instruction-level parallelism in Prolog:
analysis and architectural support . . . 224--233
Lizyamma Kurian and
Paul T. Hulina and
Lee D. Coraor Memory latency effects in decoupled
architectures with a single data memory
module . . . . . . . . . . . . . . . . . 236--245
André Seznec and
Jacques Lenfant Interleaved parallel schemes: improving
memory throughput on supercomputers . . 246--255
Thorsten von Eicken and
David E. Culler and
Seth Copen Goldstein and
Klaus Erik Schauser Active messages: a mechanism for
integrated communication and computation 256--266
Andrew A. Chien and
Jae H. Kim Planar-adaptive routing: low-cost
adaptive networks for multiprocessors 268--277
Christopher J. Glass and
Lionel M. Ni The turn model for adaptive routing . . 278--287
Toshiyuki Shimizu and
Takeshi Horie and
Hiroaki Ishihata Low-latency message communication
support for the AP1000 . . . . . . . . . 288--297
Barbara P. Aichinger Futurebus+ as an I/O bus: profile B . . 300--307
A. L. Narasimha Reddy A study of I/O system organizations . . 308--317
Jai Menon and
Dick Mattson Comparison of sparing alternatives for
disk arrays . . . . . . . . . . . . . . 318--329
Markus Siegle and
Richard Hofmann Monitoring program behaviour on SUPRENUM 332--341
Todd M. Austin and
Gurindar S. Sohi Dynamic dependency analysis of ordinary
programs . . . . . . . . . . . . . . . . 342--351
Walid A. Najjar and
W. Marcus Miller and
A. P. Wim Böhm An analysis of loop latency in dataflow
execution . . . . . . . . . . . . . . . 352--360
Qing Yang and
Liping Wu Yang A novel cache design for vector
processing . . . . . . . . . . . . . . . 362--371
Mateo Valero and
Tomás Lang and
José M. Llabería and
Montse Peiron and
Eduard Ayguadé and
Juan J. Navarro Increasing the number of strides for
conflict-free vector access . . . . . . 372--381
Wm. A. Wulf Evaluation of the WM architecture . . . 382--390
Kirk L. Johnson The impact of communication locality on
large-scale multiprocessor performance 392--402
Steven L. Scott and
James R. Goodman and
Mary K. Vernon Performance of the SCI ring . . . . . . 403--414
Madhusudhan Talluri and
Shing Kong and
Mark D. Hill and
David A. Patterson Tradeoffs in supporting two page sizes 415--424
Ahmed Louri and
Jongwhoa Na Parallel electro-optical rule-based
system for fast execution of expert
systems (abstract) . . . . . . . . . . . 427--427
André Seznec and
Karl Courtel OPAC (abstract): a floating-point
coprocessor dedicated to compute-bound
kernels . . . . . . . . . . . . . . . . 427--427
Der-Chung Cheng and
Kanad Ghose The time-constrained barrier
synchronizer and its applications in
parallel systems (abstract) . . . . . . 428--428
Ahmed Louri and
Hongki Sung A new compiler-directed cache coherence
scheme for shared memory multiprocessors
with fast and parallel explicit
invalidation (abstract) . . . . . . . . 428--428
Gautam B. Singh Architecture of a graphics processor
(abstract) . . . . . . . . . . . . . . . 429--429
Ruben Yomtov Performance evaluation of disk
subsystems . . . . . . . . . . . . . . . 429--429
Feipei Lai and
Meng-chou Chang Enhancing boosting with semantic
register in a superscalar processor
(abstract) . . . . . . . . . . . . . . . 430--430
Ivan Sklenar Prefetch unit for vector operations on
scalar computers (abstract) . . . . . . 430--430
Gary Newman Memory management support for tiled
array organization (abstract) . . . . . 431--431
Augustus K. Uht and
Darin B. Johnson Data path issues in a highly concurrent
machine (abstract) . . . . . . . . . . . 431--431
Samuel A. Fineberg and
Thomas L. Casavant and
Brent H. Pease Seamless --- a latency-tolerant
RISC-based multiprocessor architecture
(abstract) . . . . . . . . . . . . . . . 432--432
M. A. Sayeed and
M. Atiquzzaman Performance of multiple-bus
multiprocessor under non-uniform memory
reference model (abstract) . . . . . . . 432--432
M. Tahar Kechadi and
J-L. Dekeyser and
Ph. Marquet and
Ph. Preux Performance improvement for vector
pipeline multiprocessor systems using a
disordered execution model (abstract) . 433--433
Anujan Varma and
Gunjan Sinha A class of prefetch schemes for on-chip
data caches . . . . . . . . . . . . . . 433--433
Arthur Abnous and
Nader Bagherzadeh Pipelining and bypassing in a VLIW
processor (abstract) . . . . . . . . . . 434--434
Shiv Prakash and
Alice C. Parker Synthesis of application-specific
heterogeneous multiprocessor systems
(abstract) . . . . . . . . . . . . . . . 434--434
Matthew Farrens and
Arvin Park and
Rob Fanfelle and
Pius Ng and
Gary Tyson A partitioned translation lookaside
buffer approach to reducing address
bandwidth (abstract) . . . . . . . . . . 435--435
James Laudon and
Anoop Gupta and
Mark Horowitz Architectural and implementation
tradeoffs in the design of
multiple-context processors (abstract) 435--435
Brian D. Alleyne and
Isaac D. Scherson Expanded delta networks for very large
parallel computers . . . . . . . . . . . 436--436
Jaswinder Pal Singh Implications of hierarchical N-body
methods for multiprocessor architecture 436--436
Wisam Michael Directory-based cache coherency protocol
for a ring-connected
multiprocessor-array . . . . . . . . . . 437--437
Wen-Hann Wang and
Jim Quinlan and
Konrad Lai Revisit the case for direct-mapped
caches: a case for two-way
set-associative level-two caches . . . . 437--437
David E. Culler and
Michial Gunter and
James C. Lee Analysis of multithreaded
microprocessors under multiprogramming 438--438
C. M. Wittenbrink and
A. K. Somani and
C. H. Chen Cache write generate for high
performance parallel processing . . . . 438--438
Walter H. Burkhardt and
Stefan Rust Integrated computer architecture
development system . . . . . . . . . . . 439--439
R. J. Chevance An evaluation methodology for
microprocessor and system architecture 4--13
Michael Laird A comparison of three current
superscalar designs . . . . . . . . . . 14--21
Jack J. Dongarra Performance of various computers using
standard linear equations software . . . 22--44
William F. Keown, Jr. and
Philip Koopman, Jr. and
Aaron Collins Performance of the HARRIS RTX 2000 stack
architecture versus the Sun 4 SPARC and
the Sun 3 M68020 Architectures . . . . . 45--52
Mark Thorson Usenet Nuggets . . . . . . . . . . . . . 56--62
Siddhartha Chatterjee Book review: The Impact of Vector
and Parallel Architectures on the
Gaussian Elimination Algorithm by Yves
Robert (Manchester University Press and
Halsted Press, 1991) . . . . . . . . . . 63--64
Margarita Esponda and
Raúl Rojas A graphical comparison of RISC
processors . . . . . . . . . . . . . . . 2--8
Shogo Matsui Dynamic refresh method for dynamic RAMs 9--16
Arvin Park and
Ron Maeder Codes to reduce switching transients
across VLSI I/O pins . . . . . . . . . . 17--21
Gary Newman Memory management support for tiled
array organization . . . . . . . . . . . 22--30
Ivan Sklenář Prefetch unit for vector operations on
scalar computers . . . . . . . . . . . . 31--37
Nadeem Malik and
Richard J. Eickemeyer and
Stamatis Vassiliadis Instruction-level parallelism from
execution interlock collapsing . . . . . 38--43
Stamatis Vassiliadis and
Bart Blaner and
Richard J. Eickemeyer On the attributes of the SCISM
organization . . . . . . . . . . . . . . 44--53
Mark Thorson Usenet Nuggets . . . . . . . . . . . . . 56--64
Ken Allen Book review: Computing with Parallel
Architectures: T.Node, edited by D.
Gassilloud and J. C. Grossetie (Kluwer
Academic Publishers 1991) . . . . . . . 65--66
Gavin Michael and
Andrew Chien Future multicomputers: beyond minimalist
multiprocessors? . . . . . . . . . . . . 6--12
R. P. Kaushal and
J. S. Bedi Comparison of hypercube, hypernet, and
symmetric hypernet architectures . . . . 13--25
Mark Thorson Usenet Nuggets . . . . . . . . . . . . . 28--33
David Levy Book review: Neural Networks and
Fuzzy Systems: A Dynamical Systems
Approach to Machine Intelligence by Bart
Kosko (Prentice Hall 1992) . . . . . . . 34--34
Atsushi Inoue and
Kenji Takeda Performance evaluation for various
configuration of superscalar processors 4--11
Augustus K. Uht Extraction of massive instruction level
parallelism . . . . . . . . . . . . . . 12--14
Nasr Ullah and
Matt Holle The MC88110 implementation of precise
exceptions in a superscalar architecture 15--25
Yannick Deville A process-dependent partitioning
strategy for cache memories . . . . . . 26--33
Mark Thorson Usenet Nuggets . . . . . . . . . . . . . 36--38
ACM SIGARCH Computer Architecture News Staff Book reviews . . . . . . . . . . . . . . 39--39
R. Cypher and
A. Ho and
S. Konstantinidou and
P. Messina Architectural requirements of parallel
scientific applications with explicit
communication . . . . . . . . . . . . . 2--13
Edward Rothberg and
Jaswinder Pal Singh and
Anoop Gupta Working sets, cache sizes, and node
granularity issues for large-scale
multiprocessors . . . . . . . . . . . . 14--26
David Nagle and
Richard Uhlig and
Tim Stanley and
Stuart Sechrest and
Trevor Mudge and
Richard Brown Design tradeoffs for software-managed
TLBs . . . . . . . . . . . . . . . . . . 27--38
Jerry Huck and
Jim Hays Architectural support for translation
table management in large address space
machines . . . . . . . . . . . . . . . . 39--50
Pei Cao and
Swee Boon Lim and
Shivakumar Venkataraman and
John Wilkes The TickerTAIP parallel RAID
architecture . . . . . . . . . . . . . . 52--63
Daniel Stodolsky and
Garth Gibson and
Mark Holland Parity logging: overcoming the small
write problem in redundant disk arrays 64--75
Jai Menon and
Jim Cortney The architecture of a fault-tolerant
cached RAID controller . . . . . . . . . 76--87
Michel Dubois and
Jonas Skeppstedt and
Livio Ricciulli and
Krishnan Ramamurthy and
Per Stenström The detection and elimination of useless
misses in multiprocessors . . . . . . . 88--97
Alan L. Cox and
Robert J. Fowler Adaptive cache coherency for detecting
migratory shared data . . . . . . . . . 98--108
Per Stenström and
Mats Brorsson and
Lars Sandberg An adaptive cache coherence protocol
optimized for migratory sharing . . . . 109--118
Carl A. Waldspurger and
William E. Weihl Register relocation: flexible contexts
for multithreading . . . . . . . . . . . 120--130
Yasuo Hidaka and
Hanpei Koike and
Hidehiko Tanaka Multiple threads in cyclic register
windows . . . . . . . . . . . . . . . . 131--142
Sandhya Dwarkadas and
Peter Keleher and
Alan L. Cox and
Willy Zwaenepoel Evaluation of release consistent
software distributed shared memory on
emerging network technology . . . . . . 144--155
David A. Wood and
Satish Chandra and
Babak Falsafi and
Mark D. Hill and
James R. Larus and
Alvin R. Lebeck and
James C. Lewis and
Shubhendu S. Mukherjee and
Subbarao Palacharla and
Steven K. Reinhardt Mechanisms for cooperative shared memory 156--167
André Seznec A case for two-way skewed-associative
caches . . . . . . . . . . . . . . . . . 169--178
Anant Agarwal and
Stephen D. Pudar Column-associative caches: a technique
for reducing the miss rate of
direct-mapped caches . . . . . . . . . . 179--190
Norman P. Jouppi Cache write policies and performance . . 191--201
Eric L. Boyd and
Edward S. Davidson Hierarchical performance modeling with
MACS: a case study of the Convex C-240 203--210
D. Kuck and
E. Davidson and
D. Lawrie and
A. Sameh and
C. Q. Zhu and
A. Veidenbaum and
J. Konicek and
P. Yew and
K. Gallivan and
W. Jalby and
H. Wijshoff and
R. Bramley and
U. M. Yang and
P. Emrath and
D. Padua and
R. Eigenmann and
J. Hoeflinger and
G. Jaxon and
Z. Li and
T. Murphy and
J. Andrews The cedar system and an initial
performance study . . . . . . . . . . . 213--223
Michael D. Noakes and
Deborah A. Wallach and
William J. Dally The J-machine multicomputer: an
architectural evaluation . . . . . . . . 224--235
John Bunda and
Don Fussell and
W. C. Athas and
Roy Jenevein 16-bit vs. 32-bit instructions for
pipelined microprocessors . . . . . . . 237--246
Tokuzo Kiyohara and
Scott Mahlke and
William Chen and
Roger Bringmann and
Richard Hank and
Sadun Anik and
Wen-Mei Hwu Register connection: a new approach to
adding registers into instruction set
architectures . . . . . . . . . . . . . 247--256
Tse-Yu Yeh and
Yale N. Patt A comparison of dynamic branch
predictors that use two levels of branch
history . . . . . . . . . . . . . . . . 257--266
Luis André Barroso and
Michel Dubois The performance of cache-coherent
ring-based multiprocessors . . . . . . . 268--277
Dean M. Tullsen and
Susan J. Eggers Limitations of cache prefetching on a
bus-based multiprocessor . . . . . . . . 278--288
Maurice Herlihy and
J. Eliot B. Moss Transactional memory: architectural
support for lock-free data structures 289--300
Ellen Spertus and
Seth Copen Goldstein and
Klaus Erik Schauser and
Thorsten von Eicken and
David E. Culler and
William J. Dally Evaluation of mechanisms for
fine-grained parallel programs in the
J-machine and the CM-5 . . . . . . . . . 302--313
Takeshi Horie and
Kenichi Hayashi and
Toshiyuki Shimizu and
Hiroaki Ishihata Improving AP1000 parallel computer
performance with message communication 314--325
W.-C. Hsu and
J. E. Smith Performance of cached DRAM organizations
in vector supercomputers . . . . . . . . 327--336
Q. S. Gao The Chinese remainder theorem and the
prime memory system . . . . . . . . . . 337--340
André Seznec and
Jacques Lenfant Odd memory systems may be quite
interesting . . . . . . . . . . . . . . 341--350
Rajendra V. Boppana and
Suresh Chalasani A comparison of adaptive wormhole
routing algorithms . . . . . . . . . . . 351--360
Augustus K. Uht Extraction of massive instruction level
parallelism . . . . . . . . . . . . . . 5--12
Gowri Ramanathan and
Joel Oren Survey of commercial parallel machines 13--33
Benjamin J. Ewy and
Joseph B. Evans Secondary cache performance in RISC
architecture . . . . . . . . . . . . . . 34--37
Iraj Danesh Physical limitations of a computer . . . 40--45
Mark Thorson Usenet nuggets . . . . . . . . . . . . . 46--49
Gary Fostel Book Reviews: Principles of Computer
Systems by Gerald M. Karam & John C.
Bryant (Prentice Hall 1992) . . . . . . 50--51
Gary Fostel Book Review: Computer Architecture
by Mario De Blasi (Addison-Wesley
Publishing Company, 1990) . . . . . . . 51--53
John Fulcher Book Review: Practical Parallel
Computing by Paul Messina and Almerico
Murli, Editors (John Wiley and Sons,
1992) . . . . . . . . . . . . . . . . . 53--54
Mark D. Hill and
James R. Larus and
Alvin R. Lebeck and
Madhusudhan Talluri and
David A. Wood Wisconsin Architectural Research Tool
Set . . . . . . . . . . . . . . . . . . 8--10
Craig Hyatt A high-performance object-oriented
memory . . . . . . . . . . . . . . . . . 11--19
Gautam Dewan and
V. S. S. Nair A case for uniform memory access
multiprocessors . . . . . . . . . . . . 20--26
Mark Thorson Usenet Nuggets . . . . . . . . . . . . . 27--28
Glen Langdon Book Reviews . . . . . . . . . . . . . . 29--29
Ravi Jain and
John Werth and
J. C. Browne Introduction to the Special Issue on
Input/Output in Parallel Computer
Systems . . . . . . . . . . . . . . . . 5--6
Peter F. Corbett and
Sandra Johnson Baylor and
Dror G. Feitelson Overview of the Vesta parallel file
system . . . . . . . . . . . . . . . . . 7--14
Z. Lin and
S. Zhou Parallelizing I/O intensive applications
for a workstation cluster: a case study 15--22
Samuel A. Fineberg Implementing the NHT-1 application I/O
benchmark . . . . . . . . . . . . . . . 23--30
Juan Miguel del Rosario and
Rajesh Bordawekar and
Alok Choudhary Improved parallel I/O via a two-phase
run-time access strategy . . . . . . . . 31--38
Shahram Ghandeharizadeh and
Cyrus Shahabi and
Luis Ramos An overview of techniques to support
continuous retrieval of multimedia
objects . . . . . . . . . . . . . . . . 39--46
Ravi Jain and
Kiran Somalwar and
John Werth and
J. C. Browne Scheduling parallel I/O operations . . . 47--54
Qiang Li and
Naphtali Rishe A transputer T9000 family based
architecture for parallel database
machines . . . . . . . . . . . . . . . . 55--62
Claus Aßmann A RISC processor architecture with a
versatile stack system . . . . . . . . . 63--70
Dajin Wang A note on ``Diagnosabilities of
hypercubes under the pessimistic
one-step diagnosis strategy'' . . . . . 71--78
Mark Thorson Usenet Nuggets . . . . . . . . . . . . . 79--85
Bob Alverson Book Review: High-Speed Digital
Design: A Handbook of Black Magic by
Howard W. Johnson and Martin Graham
(Prentice-Hall, 1993) . . . . . . . . . 85--86
Robert Iannucci and
Anant Agarwal and
Bill Dally and
Anoop Gupta and
Greg Papadopoulos and
Burton Smith Architectural and implementation issues
for multithreading (panel session I) . . 3--18
Burt Halstead and
David Callahan and
Jack Dennis and
R. S. Nikhil and
Vivek Sarkar Programming, compilation, and resource
management issues for multithreading
(panel session II) . . . . . . . . . . . 19--33
Henry G. Baker Linear logic and permutation
stacks---the Forth shall be first . . . 34--43
Abraham Mendelson and
Shlomit S. Pinter and
Ruth Shtokhamer Compile time instruction cache
optimizations . . . . . . . . . . . . . 44--51
David Barach and
Jaspal Kohli and
John Slice and
Marc Spaulding and
Rajeev Bharadhwaj and
Don Hudson and
Cliff Neighbors and
Nirmal Saxena and
Rolland Crunk HALSIM---a very fast SPARC V9 behavioral
model . . . . . . . . . . . . . . . . . 52--58
Mark Thorson Usenet Nuggets . . . . . . . . . . . . . 59--60
Ewerton Longoni Madruga Book Review: SNMP, SNMPv2, and CMIP:
The Practical Guide to Network
Management Standards by William
Stallings (Addison-Wesley Publishing
Company Inc. 1993) . . . . . . . . . . . 60--61
B. Calder and
D. Grunwald Fast and accurate instruction fetch and
branch prediction . . . . . . . . . . . 2--11
A. R. Talcott and
W. Yamamoto and
M. J. Serrano and
R. C. Wood and
M. Nemirovsky The impact of unresolved branches on
branch prediction scheme performance . . 12--21
S. Palacharla and
R. E. Kessler Evaluating stream buffers as a secondary
cache replacement . . . . . . . . . . . 24--33
N. P. Jouppi and
S. J. E. Wilton Tradeoffs in two-level on-chip caching 34--45
A. Singhal and
A. J. Goldberg Architectural support for performance
tuning: a case study on the SPARCcenter
2000 . . . . . . . . . . . . . . . . . . 48--59
Z. Cvetanovic and
D. Bhandarkar Characterization of Alpha AXP
performance using TP and SPEC workloads 60--70
C. Natarajan and
S. Sharma and
R. K. Iyer Measurement-based characterization of
global memory and network contention,
operating system and parallelization
overheads . . . . . . . . . . . . . . . 71--80
T. Joe and
J. L. Hennessy Evaluating the memory overhead required
for COMA architectures . . . . . . . . . 82--93
A. C. Klaiber and
H. M. Levy A comparison of message passing and
shared memory architectures for data
parallel programs . . . . . . . . . . . 94--105
A. L. Cox and
S. Dwarkadas and
P. Keleher and
H. Lu and
R. Rajamony and
W. Zwaenepoel Software versus hardware shared-memory
implementation: a case study . . . . . . 106--117
D. N. Pnevmatikatos and
G. S. Sohi Guarded execution and branch prediction
in dynamic ILP processors . . . . . . . 120--129
C.-L. Su and
A. M. Despain Branch with masked squashing in
superpipelined processors . . . . . . . 130--140
M. A. Blumrich and
K. Li and
R. Alpert and
C. Dubnicki and
E. W. Felten and
J. Sandberg Virtual memory mapped network interface
for the SHRIMP multicomputer . . . . . . 142--153
P. Steenkiste and
M. Hemy and
T. Mummert and
B. Zill Architecture and evaluation of a
high-speed networking subsystem for
distributed-memory systems . . . . . . . 154--163
B. A. Nayfeh and
K. Olukotun Exploring the design space for a
shared-cache multiprocessor . . . . . . 166--175
R. Thekkath and
S. J. Eggers Impact of sharing-based thread placement
on multithreaded architectures . . . . . 176--186
F. Dahlgren and
M. Dubois and
P. Stenström Combined performance gains of simple
cache protocol extensions . . . . . . . 187--197
A. S. Huang and
G. Slavenburg and
J. P. Shen Speculative disambiguation: a
compilation technique for dynamic memory
disambiguation . . . . . . . . . . . . . 200--210
K. I. Farkas and
N. P. Jouppi Complexity/performance tradeoffs with
non-blocking loads . . . . . . . . . . . 211--222
T.-F. Chen and
J.-L. Baer A performance study of software and
hardware data prefetching schemes . . . 223--232
A. L. Drapeau and
K. W. Shirriff and
J. H. Hartman and
E. L. Miller and
S. Seshan and
R. H. Katz and
K. Lutz and
D. A. Patterson and
E. K. Lee and
P. M. Chen and
G. A. Gibson RAID-II: a high-bandwidth network file
server . . . . . . . . . . . . . . . . . 234--244
M. Blaum and
J. Brady and
J. Bruck and
J. Menon EVENODD: an optimal scheme for
tolerating double disk failures in RAID
architectures . . . . . . . . . . . . . 245--254
S. W. Ng Crosshatch disk array for improved
reliability and performance . . . . . . 255--264
A. DeHon and
F. Chong and
M. Becker and
E. Egozy and
H. Minsky and
S. Peretz and
T. F. Knight, Jr. METRO: a router architecture for
high-performance, short-haul routing
networks . . . . . . . . . . . . . . . . 266--277
J. D. Allen and
P. T. Gaughan and
D. E. Schimmel and
S. Yalamanchili Ariadne---an adaptive router for
fault-tolerant multicomputers . . . . . 278--288
J. H. Kim and
Z. Liu and
A. A. Chien Compressionless routing: a framework for
adaptive and fault-tolerant routing . . 289--300
J. Kuskin and
D. Ofelt and
M. Heinrich and
J. Heinlein and
R. Simoni and
K. Gharachorloo and
J. Chapin and
D. Nakahira and
J. Baxter and
M. Horowitz and
A. Gupta and
M. Rosenblum and
J. Hennessy The Stanford FLASH multiprocessor . . . 302--313
D. Chaiken and
A. Agarwal Software-extended coherent shared
memory: performance and cost . . . . . . 314--324
S. K. Reinhardt and
J. R. Larus and
D. A. Wood Tempest and Typhoon: user-level shared
memory . . . . . . . . . . . . . . . . . 325--336
M. Farrens and
G. Tyson and
A. R. Pleszkun A study of single-chip processor/cache
organizations for large numbers of
transistors . . . . . . . . . . . . . . 338--347
C.-H. Chen and
A. K. Somani A unified architectural tradeoff
methodology . . . . . . . . . . . . . . 348--357
D. Nagle and
R. Uhlig and
T. Mudge and
S. Sechrest Optimal allocation of on-chip memory for
multiple-API operating systems . . . . . 358--369
R. W. Quong Expected I-cache miss rates via the gap
model . . . . . . . . . . . . . . . . . 372--383
A. Seznec Decoupled sectored caches: conciliating
low tag implementation cost . . . . . . 384--393
J. R. Gurd Supercomputing: big bang or steady state
growth? . . . . . . . . . . . . . . . . 3--13
Kay P. Litchfield Instruction execution sequence
confirmation . . . . . . . . . . . . . . 14--18
Phil Allen and
Franc Brglez and
Hal Carter and
Robert Caverly and
Jerry Dillion and
Albert Lo and
Ron Lomax and
John Oldfield and
Cesar Pina and
T. J. Wilkinson Report of the 1993 Workshop on Rapid
Prototyping of Microelectronic Systems
for Universities . . . . . . . . . . . . 19--26
Mark Thorson Usenet Nuggets . . . . . . . . . . . . . 27--28
Ewerton Longoni Madruga Book Review: Internetworking with
TCP/IP, vol. III: Client-Server
programming and applications (BSD
Sockets version) by Douglas E. Comer and
David L. Stevens (Prentice-Hall, 1993) 29--30
Ravi Jain and
John Werth and
J. C. Browne Special Issue on Input/Output in
Parallel Computer Systems: Introduction 3--4
Sandra Johnson Baylor and
Caroline Benveniste and
Yarsun Hsu Performance evaluation of a massively
parallel I/O subsystem . . . . . . . . . 5--10
James B. Sinclair and
Jay Tang and
Peter J. Varman Instability in parallel I/O systems . . 11--16
Steven H. Vanderleest and
Ravishankar K. Iyer Measurement of I/O bus contention and
correlation among heterogeneous device
types in a single-bus multiprocessor
system . . . . . . . . . . . . . . . . . 17--22
Rajeev Thakur and
Rajesh Bordawekar and
Alok Choudhary Compilation of out-of-core data parallel
programs for distributed memory machines 23--28
Abhaya Asthana and
Mark Cravatts and
Paul Krzyzanowski An experimental active memory based I/O
subsystem . . . . . . . . . . . . . . . 29--34
Dannie Durand and
Ravi Jain and
David Tseytlin Distributed scheduling algorithms to
improve the performance of parallel data
transfers . . . . . . . . . . . . . . . 35--40
Haruo Yokota DR-nets: data-reconstruction networks
for highly reliable parallel-disk
systems . . . . . . . . . . . . . . . . 41--46
Martti J. Forsell Are multiport memories physically
feasible? . . . . . . . . . . . . . . . 47--54
Ghulam Chaudhry and
Xuechang Li A case for the multithreaded processor
architecture . . . . . . . . . . . . . . 55--59
Yin Chan and
Ashok Sudarsanam and
Andrew Wolfe The effect of compiler-flag tuning on
SPEC benchmark performance . . . . . . . 60--70
Jin-Ho Lee and
Min-Young Lee and
Seong-Uk Choi and
Myong-Soon Park Reducing cache conflicts in data cache
prefetching . . . . . . . . . . . . . . 71--77
Mark Thorson Usenet Nuggets . . . . . . . . . . . . . 78--81
Martti J. Forsell Are multiport memories physically
feasible? . . . . . . . . . . . . . . . 3--10
Rok Sosič History cache: hardware support for
reverse execution . . . . . . . . . . . 11--18
Mark D. Hill and
James R. Larus and
David A. Wood The Wisconsin Wind Tunnel project: an
annotated bibliography . . . . . . . . . 19--26
Avijit Saha and
Nadeem Malik Distributed directory tags . . . . . . . 27--29
Ishaq H. Unwala and
Harvey G. Cragon A study of MIPS programs . . . . . . . . 30--40
Mark Thorson Internet Nuggets . . . . . . . . . . . . 41--46
Kenneth R. Ohnemus and
Diana F. Mallin Benefits of implementing on-line methods
and procedures . . . . . . . . . . . . . 49--55
Daniel K. Cunningham and
Steven J. Reilly Leading the design team---the evolution
of the technical writer from a support
role to a design role . . . . . . . . . 56--60
Ann Rockley Multimedia: towards an electronic
performance support system . . . . . . . 61--65
Katherine E. Drew Telecommunicators and telecommuters:
making multiple-site documentation
projects work . . . . . . . . . . . . . 66--75
Aimee Severson and
Brent Nelson Throughput in a counterflow pipeline
processor . . . . . . . . . . . . . . . 5--12
Tsong-Chih Hsu and
Sheng-De Wang A simple architecture for constant time
sorting machines . . . . . . . . . . . . 13--19
Wm. A. Wulf and
Sally A. McKee Hitting the memory wall: implications of
the obvious . . . . . . . . . . . . . . 20--24
Mark Thorson Internet Nuggets . . . . . . . . . . . . 25--28
Anant Agarwal and
Ricardo Bianchini and
David Chaiken and
Kirk L. Johnson and
David Kranz and
John Kubiatowicz and
Beng-Hong Lim and
Kenneth Mackenzie and
Donald Yeung The MIT Alewife machine: architecture
and performance . . . . . . . . . . . . 2--13
Yuetsu Kodama and
Hirohumi Sakane and
Mitsuhisa Sato and
Hayato Yamana and
Shuichi Sakai and
Yoshinori Yamaguchi The EM-X parallel computer: architecture
and basic performance . . . . . . . . . 14--23
Steven Cameron Woo and
Moriyoshi Ohara and
Evan Torrie and
Jaswinder Pal Singh and
Anoop Gupta The SPLASH-2 programs: characterization
and methodological considerations . . . 24--36
Håkan Grahn and
Per Stenström Efficient strategies for software-only
protocols in shared-memory
multiprocessors . . . . . . . . . . . . 38--47
Alvin R. Lebeck and
David A. Wood Dynamic self-invalidation: reducing
coherence overhead in shared-memory
multiprocessors . . . . . . . . . . . . 48--59
Fredrik Dahlgren Boosting the performance of hybrid
snooping cache protocols . . . . . . . . 60--69
Andreas G. Nowatzyk and
Michael C. Browne and
Edmund J. Kelly and
Michael Parkin S-connect: from networks of workstations
to supercomputer performance . . . . . . 71--82
Anujan Varma and
Quinn Jacobson Destage algorithms for disk arrays with
non-volatile caches . . . . . . . . . . 83--95
Gordon Stoll and
Bin Wei and
Douglas Clark and
Edward W. Felten and
Kai Li and
Patrick Hanrahan Evaluating multi-port frame buffer
designs for a mesh-connected
multicomputer . . . . . . . . . . . . . 96--105
Andreas G. Nowatzyk and
Paul R. Prucnal Are crossbars really dead?: the case for
optical multiprocessor interconnect
systems . . . . . . . . . . . . . . . . 106--115
Stéphan Jourdan and
Pascal Sainrat and
Daniel Litaize Exploring configurations of functional
units in an out-of-order superscalar
processor . . . . . . . . . . . . . . . 117--125
Hideki Ando and
Chikako Nakanishi and
Tetsuya Hara and
Masao Nakaya Unconstrained speculative execution with
predicated state buffering . . . . . . . 126--137
Scott A. Mahlke and
Richard E. Hank and
James E. McCormick and
David I. August and
Wen-Mei W. Hwu A comparison of full and partial
predicated execution support for ILP
processors . . . . . . . . . . . . . . . 138--150
M. Simone and
A. Essen and
A. Ike and
A. Krishnamoorthy and
T. Maruyama and
N. Patkar and
M. Ramaswami and
M. Shebanow and
V. Thirumalaiswamy and
D. Tovey Implementation trade-offs in using a
restricted data flow architecture in a
high performance RISC microprocessor . . 151--162
Trung A. Diep and
Christopher Nelson and
John Paul Shen Performance evaluation of the PowerPC
620 microarchitecture . . . . . . . . . 163--174
Theodore H. Romer and
Wayne H. Ohlrich and
Anna R. Karlin and
Brian N. Bershad Reducing TLB and memory overhead using
online superpage promotion . . . . . . . 176--187
Zheng Zhang and
Josep Torrellas Speeding up irregular applications in
shared-memory multiprocessors: memory
binding and group prefetching . . . . . 188--199
K. V. Anjan and
Timothy Mark Pinkston An efficient, fully adaptive deadlock
recovery scheme: DISHA . . . . . . . . . 201--210
Kang G. Shin and
Stuart W. Daniel Analysis and implementation of hybrid
switching . . . . . . . . . . . . . . . 211--219
Binh Vien Dao and
Jose Duato and
Sudhakar Yalamanchili Configurable flow control mechanisms for
fault-tolerant routing . . . . . . . . . 220--229
Timothy Callahan and
Seth Copen Goldstein NIFDY: a low overhead, high throughput
network interface . . . . . . . . . . . 230--241
Montse Peiron and
Mateo Valero and
Eduard Ayguadé and
Tomás Lang Vector multiprocessors with arbitrated
memory access . . . . . . . . . . . . . 243--252
Krishna M. Kavi and
A. R. Hurson and
Phenil Patadia and
Elizabeth Abraham and
Ponnarasu Shanmugam Design of cache memories for
multi-threaded dataflow architecture . . 253--264
François Bodin and
André Seznec Skewed associativity enhances
performance predictability . . . . . . . 265--274
Cliff Young and
Nicolas Gloy and
Michael D. Smith A comparative analysis of schemes for
correlated branch prediction . . . . . . 276--286
Brad Calder and
Dirk Grunwald Next cache line and set prediction . . . 287--296
Vijay Karamcheti and
Andrew A. Chien A comparison of architectural support
for messaging in the TMC CM-5 and the
Cray T3D . . . . . . . . . . . . . . . . 298--307
T. Stricker and
T. Gross Optimizing memory system performance for
communication in parallel computers . . 308--319
Remzi H. Arpaci and
David E. Culler and
Arvind Krishnamurthy and
Steve G. Steinberg and
Katherine Yelick Empirical evaluation of the CRAY-T3D:
a compiler perspective . . . . . . . . . 320--331
Thomas M. Conte and
Kishore N. Menezes and
Patrick M. Mills and
Burzin A. Patel Optimization of instruction fetch
mechanisms for high issue rates . . . . 333--344
Richard Uhlig and
David Nagle and
Trevor Mudge and
Stuart Sechrest and
Joel Emer Instruction fetching: coping with code
bloat . . . . . . . . . . . . . . . . . 345--356
Dennis Lee and
Jean-Loup Baer and
Brad Calder and
Dirk Grunwald Instruction cache fetch policies for
speculative execution . . . . . . . . . 357--367
Todd M. Austin and
Dionisios N. Pnevmatikatos and
Gurindar S. Sohi Streamlining data cache access with fast
address calculation . . . . . . . . . . 369--380
Hong Wang and
Tong Sun and
Qing Yang CAT---caching address tags: a technique
for reducing area cost of on-chip caches 381--390
Dean M. Tullsen and
Susan J. Eggers and
Henry M. Levy Simultaneous multithreading: maximizing
on-chip parallelism . . . . . . . . . . 392--403
Richard C. Ho and
C. Han Yang and
Mark A. Horowitz and
David L. Dill Architecture validation for processors 404--413
Gurindar S. Sohi and
Scott E. Breach and
T. N. Vijaykumar Multiscalar processors . . . . . . . . . 414--425
Carl J. Beckmann HTGL: a program modelling language . . . 3--10
Jean-Louis Lafitte On structured data handling in parallel
processing . . . . . . . . . . . . . . . 11--18
B. Ulmann µ-EP-1: a simple 32-bit
architecture . . . . . . . . . . . . . . 19--24
Mark Thorson Internet Nuggets . . . . . . . . . . . . 25--27
Daniel Tabak Cache and Memory Hierarchy Design: A
Performance-Directed Approach by Steven
A. Przybylski . . . . . . . . . . . . . 28--28
Maurice V. Wilkes The memory wall and the CMOS end-point 4--6
Eric E. Johnson Graffiti on ``the memory wall'' . . . . 7--8
Tariq Afzal Performance modeling using the Motorola
PowerPC timing simulator . . . . . . . . 9--18
Behrooz Parhami SIMD machines: do they have a
significant future? . . . . . . . . . . 19--22
Ravi Jain and
John Werth Airdisks and airRAID (expanded extract):
modeling and scheduling periodic
wireless data broadcast . . . . . . . . 23--28
Leonidas I. Kontothanassis and
Michael L. Scott Efficient shared memory with minimal
hardware support . . . . . . . . . . . . 29--35
Michael K. Gschwind and
Thomas J. Pietsch Vector prefetching . . . . . . . . . . . 1--7
Ramesh K. Karne Object-oriented computer architectures
for new generation of applications . . . 8--19
Humayun Khalid The unconventional replacement
algorithms . . . . . . . . . . . . . . . 20--26
Humayun Khalid A trace-driven simulation methodology 27--33
Nikki Mirghafori and
Margret Jacoby and
David Patterson Truth in SPEC benchmarks . . . . . . . . 34--42
Mark Thorson Internet Nuggets . . . . . . . . . . . . 43--44
Trevor Mudge Report on the panel: ``How Can Computer
Architecture Researchers Avoid Becoming
the Society for Irreproducible
Results?'' . . . . . . . . . . . . . . . 1--5
Oh-Young Kwon and
Gi-Ho Park and
Tack-Don Han A compiler optimization to reduce
execution time of loop nest . . . . . . 6--11
Mark Thorson Internet Nuggets . . . . . . . . . . . . 12--16
Daniel Tabak Book Review: Alpha Implementations
and Architecture by Dileep P. Bhandarkar 17--18
Marius Evers and
Po-Yung Chang and
Yale N. Patt Using hybrid branch predictors to
improve branch prediction accuracy in
the presence of context switches . . . . 3--11
Nicolas Gloy and
Cliff Young and
J. Bradley Chen and
Michael D. Smith An analysis of dynamic branch prediction
schemes on system workloads . . . . . . 12--21
Stuart Sechrest and
Chih-Chieh Lee and
Trevor Mudge Correlation and aliasing in dynamic
branch predictors . . . . . . . . . . . 22--32
Steven K. Reinhardt and
Robert W. Pfile and
David A. Wood Decoupled hardware support for
distributed shared memory . . . . . . . 34--43
Donald Yeung and
John Kubiatowicz and
Anant Agarwal MGS: a multigrain shared memory system 44--55
Christine Morin and
Alain Gefflaut and
Michel Banâtre and
Anne-Marie Kermarrec COMA: an opportunity for building
fault-tolerant scalable shared memory
multiprocessors . . . . . . . . . . . . 56--65
Basem A. Nayfeh and
Lance Hammond and
Kunle Olukotun Evaluation of design alternatives for a
multiprocessor microprocessor . . . . . 67--77
Doug Burger and
James R. Goodman and
Alain Kägi Memory bandwidth limitations of future
microprocessors . . . . . . . . . . . . 78--89
Ashley Saulsbury and
Fong Pong and
Andreas Nowatzyk Missing the memory wall: the case for
processor/memory integration . . . . . . 90--101
André Seznec Don't use the page number, but a pointer
to it . . . . . . . . . . . . . . . . . 104--113
Toni Juan and
Tomás Lang and
Juan J. Navarro The difference-bit cache . . . . . . . . 114--120
Liviu Iftode and
Jaswinder Pal Singh and
Kai Li Understanding application performance on
shared virtual memory systems . . . . . 122--133
Chris Holt and
Jaswinder Pal Singh and
John Hennessy Application and architectural
bottlenecks in large scale distributed
shared memory machines . . . . . . . . . 134--145
Kenneth M. Wilson and
Kunle Olukotun and
Mendel Rosenblum Increasing cache port efficiency for
dynamic superscalar microprocessors . . 147--157
Todd M. Austin and
Gurindar S. Sohi High-bandwidth address translation for
multiple-issue processors . . . . . . . 158--167
Yiming Hu and
Qing Yang DCD---disk caching disk: a new approach
for boosting I/O performance . . . . . . 169--178
Olivier Maquelin and
Guang R. Gao and
Herbert H. J. Hum and
Kevin B. Theobald and
Xin-Min Tian Polling watchdog: combining polling and
interrupts for efficient message
handling . . . . . . . . . . . . . . . . 179--188
Dean M. Tullsen and
Susan J. Eggers and
Joel S. Emer and
Henry M. Levy and
Jack L. Lo and
Rebecca L. Stamm Exploiting choice: instruction fetch and
issue on an implementable simultaneous
multithreading processor . . . . . . . . 191--202
Richard J. Eickemeyer and
Ross E. Johnson and
Steven R. Kunkel and
Mark S. Squillante and
Shiafun Liu Evaluation of multithreaded
uniprocessors for commercial application
environments . . . . . . . . . . . . . . 203--212
Tetsuya Hara and
Hideki Ando and
Chikako Nakanishi and
Masao Nakaya Performance comparison of ILP machines
with cycle time evaluation . . . . . . . 213--224
Jae H. Kim and
Andrew A. Chien Rotating combined queueing (RCQ):
bandwidth and latency guarantees in
low-cost, high-performance networks . . 226--236
Jennifer Rexford and
John Hall and
Kang G. Shin A router architecture for real-time
point-to-point networks . . . . . . . . 237--246
Shubhendu S. Mukherjee and
Babak Falsafi and
Mark D. Hill and
David A. Wood Coherent network interfaces for
fine-grain communication . . . . . . . . 247--258
Mark Horowitz and
Margaret Martonosi and
Todd C. Mowry and
Michael D. Smith Informing memory operations: providing
memory performance feedback in modern
processors . . . . . . . . . . . . . . . 260--270
Chun Xia and
Josep Torrellas Instruction prefetching of systems codes
with layout optimized for reduced cache
misses . . . . . . . . . . . . . . . . . 271--282
Lynn Choi and
Pen-Chung Yew Compiler and hardware support for cache
coherence in large-scale
multiprocessors: design considerations
and performance study . . . . . . . . . 283--294
Edward W. Felten and
Richard D. Alpert and
Angelos Bilas and
Matthias A. Blumrich and
Douglas W. Clark and
Stefanos N. Damianakis and
Cezary Dubnicki and
Liviu Iftode and
Kai Li Early experience with message-passing on
the SHRIMP multicomputer . . . . . . . . 296--307
Tom Lovett and
Russell Clapp STiNG: a CC-NUMA computer system for the
commercial marketplace . . . . . . . . . 308--317
J. Carretero and
F. Pérez and
P. de Miguel and
F. García and
L. Alonso A massively parallel and distributed I/O
subsystem . . . . . . . . . . . . . . . 1--8
W. B. Ligon III and
Daniel C. Stanzione, Jr. Distributing and load-balancing for
loops in scientific applications . . . . 9--17
Samson Belayneh and
David R. Kaeli A discussion on non-blocking/lockup-free
caches . . . . . . . . . . . . . . . . . 18--25
Mark Thorson Internet Nuggets . . . . . . . . . . . . 26--32
Gerard Páez-Monzón and
Charles Páez-Monzón The RISC processor DMN-6: a unified
data-control flow architecture . . . . . 3--10
J. A. Gómez Pulido and
J. M. Sánchez Pérez and
J. A. Moreno Zamora An educational tool for testing
hierarchical multilevel caches . . . . . 11--15
Samson Belayneh and
David R. Kaeli A discussion on non-blocking/lockup-free
caches . . . . . . . . . . . . . . . . . 16--16
Mark Rosenbaum Architectural potholes . . . . . . . . . 17--18
John Mashey Architectural potholes . . . . . . . . . 18--18
Adrian Cockcroft I/O potholes . . . . . . . . . . . . . . 18--19
Zahir Ebrahim I/O potholes . . . . . . . . . . . . . . 19--20
Brad Carlile Interpreting benchmarks . . . . . . . . 20--21
David Chase Register windows . . . . . . . . . . . . 21--21
Paul W. DeMone Register windows and delay slots . . . . 21--22
Charlton D. Rose and
J. Kelly Flanagan Constructing instruction traces from
cache-filtered address traces (CITCAT) 1--8
Susan Flynn Hummel Efficient data sharing with conditional
remote memory transfers . . . . . . . . 9--17
Larry Widigen and
Elliot Sowadsky and
Kevin McGrath Eliminating operand read latency . . . . 18--22
Philip Machanick The case for SRAM main memory . . . . . 23--30
Dileep Bhandarkar RISC versus CISC: a tale of two chips 1--12
I. Martín and
F. Tirado A SIMD computer for multigrid methods 13--18
Reinhold Weicker On the use of SPEC benchmarks in
computer architecture research . . . . . 19--22
Shubhendu S. Mukherjee What should graduate students know
before joining a large computer
architecture project? . . . . . . . . . 23--26
Humayun Khalid A new cache replacement scheme based on
backpropagation neural networks . . . . 27--33
Mark Thorson Internet Nuggets . . . . . . . . . . . . 34--36
Sriram Vajapeyam and
Tulika Mitra Improving superscalar instruction
dispatch and issue by exploiting dynamic
code sequences . . . . . . . . . . . . . 1--12
Ravi Nair and
Martin E. Hopkins Exploiting instruction level parallelism
in processors by caching scheduled
groups . . . . . . . . . . . . . . . . . 13--25
Kemal Ebcioğlu and
Erik R. Altman DAISY: dynamic compilation for 100%
architectural compatibility . . . . . . 26--37
Timothy Mark Pinkston and
Sugath Warnakulasuriya On deadlocks in interconnection networks 38--49
Craig B. Stunkel and
Rajeev Sivaram and
Dhabaleswar K. Panda Implementing multidestination worms in
switch-based parallel systems:
architectural alternatives and their
impact . . . . . . . . . . . . . . . . . 50--61
Guillermo A. Alvarez and
Walter A. Burkhard and
Flaviu Cristian Tolerating multiple failures in RAID
architectures with optimal storage and
uniform declustering . . . . . . . . . . 62--72
Dan Teodosiu and
Joel Baxter and
Kinshuk Govil and
John Chapin and
Mendel Rosenblum and
Mark Horowitz Hardware fault containment in scalable
shared-memory multiprocessors . . . . . 73--84
Richard P. Martin and
Amin M. Vahdat and
David E. Culler and
Thomas E. Anderson Effects of communication latency,
overhead, and bandwidth in a cluster
architecture . . . . . . . . . . . . . . 85--97
Wolf-Dietrich Weber and
Stephen Gold and
Pat Helland and
Takeshi Shimizu and
Thomas Wicki and
Winfried Wilcke The Mercury Interconnect Architecture: a
cost-effective infrastructure for
high-performance servers . . . . . . . . 98--107
Ziyad S. Hakura and
Anoop Gupta The design and analysis of a cache
architecture for texture mapping . . . . 108--120
Kenneth M. Wilson and
Kunle Olukotun Designing high bandwidth on-chip caches 121--132
Keith I. Farkas and
Paul Chow and
Norman P. Jouppi and
Zvonko Vranesic Memory-system design considerations for
dynamically-scheduled processors . . . . 133--143
Parthasarathy Ranganathan and
Vijay S. Pai and
Hazim Abdel-Shafi and
Sarita V. Adve The interaction of software prefetching
with ILP processors in shared-memory
systems . . . . . . . . . . . . . . . . 144--156
Leonidas Kontothanassis and
Galen Hunt and
Robert Stets and
Nikolaos Hardavellas and
Michał Cierniak and
Srinivasan Parthasarathy and
Wagner Meira, Jr. and
Sandhya Dwarkadas and
Michael Scott VM-based shared memory on low-latency,
remote-memory-access networks . . . . . 157--169
Alain Kägi and
Doug Burger and
James R. Goodman Efficient synchronization: let them eat
QOLB . . . . . . . . . . . . . . . . . . 170--180
Andreas Moshovos and
Scott E. Breach and
T. N. Vijaykumar and
Gurindar S. Sohi Dynamic speculation and synchronization
of data dependences . . . . . . . . . . 181--193
Avinash Sodani and
Gurindar S. Sohi Dynamic instruction reuse . . . . . . . 194--205
Subbarao Palacharla and
Norman P. Jouppi and
J. E. Smith Complexity-effective superscalar
processors . . . . . . . . . . . . . . . 206--218
Maged M. Michael and
Ashwini K. Nanda and
Beng-Hong Lim and
Michael L. Scott Coherence controller architectures for
SMP-based CC-NUMA multiprocessors . . . 219--228
Babak Falsafi and
David A. Wood Reactive NUMA: a design for unifying
S-COMA and CC-NUMA . . . . . . . . . . . 229--240
James Laudon and
Daniel Lenoski The SGI Origin: a ccNUMA highly scalable
server . . . . . . . . . . . . . . . . . 241--251
Doug Joseph and
Dirk Grunwald Prefetching using Markov predictors . . 252--263
Vatsa Santhanam and
Edward H. Gornish and
Wei-Chung Hsu Data prefetching on the HP PA-8000 . . . 264--273
Po-Yung Chang and
Eric Hao and
Yale N. Patt Target prediction for indirect jumps . . 274--283
Eric Sprangle and
Robert S. Chappell and
Mitch Alsup and
Yale N. Patt The agree predictor: a mechanism for
reducing negative branch history
interference . . . . . . . . . . . . . . 284--291
Pierre Michaud and
André Seznec and
Richard Uhlig Trading conflict and capacity aliasing
in conditional branch predictors . . . . 292--303
Joel Emer and
Nikolas Gloy A language for describing predictors and
its application to automatic synthesis 304--314
Teresa L. Johnson and
Wen-mei W. Hwu Run-time adaptive cache hierarchy
management via reference analysis . . . 315--326
Richard Fromm and
Stylianos Perissakis and
Neal Cardwell and
Christoforos Kozyrakis and
Bruce McGaughy and
David Patterson and
Tom Anderson and
Katherine Yelick The energy efficiency of IRAM
architectures . . . . . . . . . . . . . 327--337
Doug Burger and
Stefanos Kaxiras and
James R. Goodman DataScalar architectures . . . . . . . . 338--349
Maurice Wilkes and
Andrew Hopper The collapsed LAN: a solution to a
bandwidth problem? . . . . . . . . . . . 1--5
Tommi Jokinen and
Chia-Jiu Wang Cache design with path balancing table,
skewing and indirect tags . . . . . . . 6--12
Doug Burger and
Todd M. Austin The SimpleScalar tool set, version 2.0 13--25
Mark Thorson Internet Nuggets . . . . . . . . . . . . 26--27
Rodney Van Meter and
Greg Finn and
Steve Hotz and
Dave Dyer Response to the collapsed LAN . . . . . 1--12
Weiwu Hu and
Peisu Xia Out-of-order execution in sequentially
consistent shared-memory systems . . . . 3--10
Humayun Khalid A novel trace sampling technique . . . . 11--16
Humayun Khalid Performance of the KORA-2 cache
replacement scheme . . . . . . . . . . . 17--21
D. N. Jutla and
P. Bodorik Improving applications performance: a
memory model and cache architecture . . 22--29
B. Ulmann NICE: an elegant and powerful 32-bit
architecture . . . . . . . . . . . . . . 30--35
Mark Thorson Internet Nuggets . . . . . . . . . . . . 36--41
Vijay S. Pai and
Parthasarathy Ranganathan and
Sarita V. Adve RSIM: Rice simulator for ILP
multiprocessors . . . . . . . . . . . . 1--1
Weisong Shi and
Weiwu Hu and
Ming Zhu An innovative implementation for
directory-based cache coherence in
shared memory multiprocessors . . . . . 2--9
Mark Thorson Internet Nuggets . . . . . . . . . . . . 10--14
B. Ulmann Instruction looping, an extension to
conditional execution . . . . . . . . . 3--4
Günter Haring and
Christoph Lindemann and
Martin Reiser International workshop performance
evaluation --- origins and directions 5--6
Wes Munsil and
Chia-Jiu Wang Reducing stack usage in Java bytecode
execution . . . . . . . . . . . . . . . 7--11
Mark Thorson Internet nuggets . . . . . . . . . . . . 12--17
Mayan Moudgill Techniques for fast simulation of
associative cache directories . . . . . 1--8
Byung-Kwon Chung and
Jih-Kwon Peir LRU-based column-associative caches . . 9--17
Mark Thorson Internet Nuggets . . . . . . . . . . . . 18--22
Luiz André Barroso and
Kourosh Gharachorloo and
Edouard Bugnion Memory system characterization of
commercial workloads . . . . . . . . . . 3--14
Kimberly Keeton and
David A. Patterson and
Yong Qiang He and
Roger C. Raphael and
Walter E. Baker Performance characterization of a Quad
Pentium Pro SMP using OLTP workloads . . 15--26
Dennis C. Lee and
Patrick J. Crowley and
Jean-Loup Baer and
Thomas E. Anderson and
Brian N. Bershad Execution characteristics of desktop
applications on Windows NT . . . . . . . 27--38
Jack L. Lo and
Luiz André Barroso and
Susan J. Eggers and
Kourosh Gharachorloo and
Henry M. Levy and
Sujay S. Parekh An analysis of database workload
performance on simultaneous
multithreaded processors . . . . . . . . 39--50
Marius Evers and
Sanjay J. Patel and
Robert S. Chappell and
Yale N. Patt An analysis of correlation and
predictability: what makes two-level
branch predictors work . . . . . . . . . 52--61
Eitan Federovsky and
Meir Feder and
Shlomo Weiss Branch prediction based on universal
data compression algorithms . . . . . . 62--72
Yiannakis Sazeides and
James E. Smith Modeling program predictability . . . . 73--84
Michael Cox and
Narendra Bhandari and
Michael Shantz Multi-level texture caching for 3D
graphics hardware . . . . . . . . . . . 86--97
Hans Eberle and
Erwin Oertli Switcherland: a QoS communication
architecture for workstation clusters 98--108
Guillermo A. Alvarez and
Walter A. Burkhard and
Larry J. Stockmeyer and
Flaviu Cristian Declustered disk array architectures
with optimal and near-optimal
parallelism . . . . . . . . . . . . . . 109--120
Dirk Grunwald and
Artur Klauser and
Srilatha Manne and
Andrew Pleszkun Confidence estimation for speculation
control . . . . . . . . . . . . . . . . 122--131
Srilatha Manne and
Artur Klauser and
Dirk Grunwald Pipeline gating: speculation control for
energy reduction . . . . . . . . . . . . 132--141
George Z. Chrysos and
Joel S. Emer Memory dependence prediction using store
sets . . . . . . . . . . . . . . . . . . 142--153
Toni Juan and
Sanji Sanjeevan and
Juan J. Navarro Dynamic history-length fitting: a third
level of adaptivity for branch
prediction . . . . . . . . . . . . . . . 155--166
Karel Driesen and
Urs Hölzle Accurate indirect branch prediction . . 167--178
Shubhendu S. Mukherjee and
Mark D. Hill Using prediction to accelerate coherence
protocols . . . . . . . . . . . . . . . 179--190
Mark Oskin and
Frederic T. Chong and
Timothy Sherwood Active pages: a computation model for
intelligent memory . . . . . . . . . . . 192--203
Mark Swanson and
Leigh Stoller and
John Carter Increasing TLB reach using superpages
backed by shadow memory . . . . . . . . 204--213
Xiaogang Qiu and
Michel Dubois Options for dynamic address translation
in COMAs . . . . . . . . . . . . . . . . 214--225
David I. August and
Daniel A. Connors and
Scott A. Mahlke and
John W. Sias and
Kevin M. Crozier and
Ben-Chung Cheng and
Patrick R. Eaton and
Qudus B. Olaniran and
Wen-mei W. Hwu Integrated predicated and speculative
execution in the IMPACT EPIC
architecture . . . . . . . . . . . . . . 227--237
Steven Wallace and
Brad Calder and
Dean M. Tullsen Threaded multiple path execution . . . . 238--249
Artur Klauser and
Abhijit Paithankar and
Dirk Grunwald Selective eager execution on the
PolyPath architecture . . . . . . . . . 250--259
Sanjay Jeram Patel and
Marius Evers and
Yale N. Patt Improving trace cache effectiveness with
branch promotion and trace packing . . . 262--271
Freddy Gabbay and
Avi Mendelson The effect of instruction fetch
bandwidth on value prediction . . . . . 272--281
David H. Albonesi Dynamic IPC/clock rate optimization . . 282--292
Yinong Zhang and
George B. Adams III Performance modeling and code
partitioning for the DS architecture . . 293--304
Stephen W. Keckler and
William J. Dally and
Daniel Maskit and
Nicholas P. Carter and
Andrew Chang and
Whay S. Lee Exploiting fine-grain thread level
parallelism on the MIT multi-ALU
processor . . . . . . . . . . . . . . . 306--317
Gheith A. Abandah and
Edward S. Davidson Effects of architectural and
technological advances on the HP/Convex
Exemplar's memory and communication
performance . . . . . . . . . . . . . . 318--329
Matthias A. Blumrich and
Richard D. Alpert and
Yuqun Chen and
Douglas W. Clark and
Stefanos N. Damianakis and
Cezary Dubnicki and
Edward W. Felten and
Liviu Iftode and
Kai Li and
Margaret Martonosi and
Robert A. Shillner Design choices in the SHRIMP system: an
empirical study . . . . . . . . . . . . 330--341
Vijayaraghavan Soundararajan and
Mark Heinrich and
Ben Verghese and
Kourosh Gharachorloo and
Anoop Gupta and
John Hennessy Flexible use of memory for
replication/migration in cache-coherent
DSM multiprocessors . . . . . . . . . . 342--355
Sanjeev Kumar and
Christopher Wilkerson Exploiting spatial locality in data
caches using spatial footprints . . . . 357--368
William L. Lynch and
Gary Lauterbach and
Joseph I. Chamdani Low load latency through sum-addressed
memory (SAM) . . . . . . . . . . . . . . 369--379
Daniel J. Sorin and
Vijay S. Pai and
Sarita V. Adve and
Mary K. Vernon and
David A. Wood Analytic evaluation of shared-memory
systems with ILP processors . . . . . . 380--391
Prasad N. Golla and
Eric C. Lin A comparison of the effect of branch
prediction on multithreaded and scalar
architectures . . . . . . . . . . . . . 3--11
Mark Thorson Internet nuggets . . . . . . . . . . . . 12--16
Philip Machanick Streaming vs. latency in information
mass-transit . . . . . . . . . . . . . . 4--6
Jean-Louis Lafitte A generalized mapping device to help
memory latency . . . . . . . . . . . . . 7--13
Farooq Ashraf and
Mostafa Abd-El-Barr and
Khalid Al-Tawil Introduction to routing in multicomputer
networks . . . . . . . . . . . . . . . . 14--21
Dick Wilmot Data threaded microarchitecture . . . . 22--32
C. K. Yuen Stack and RISC . . . . . . . . . . . . . 3--9
Sandra Johnson Baylor Unified scalable shared memory
architectures . . . . . . . . . . . . . 10--21
Anthony DeWitt and
Thomas Gross The potential of thread-level
speculation based on value profiling . . 22--22
John Kalamatianos and
David R. Kaeli Improving the accuracy of indirect
branch prediction via branch
classification . . . . . . . . . . . . . 23--26
Roy Dz-ching Ju and
Jean-François Collard and
Karim Oukbir Probabilistic memory disambiguation and
its application to data speculation . . 27--30
Matthew A. Postiff and
David A. Greene and
Gary S. Tyson and
Trevor N. Mudge The limits of instruction level
parallelism in SPEC95 applications . . . 31--34
Byung-Sun Yang and
Junpyo Lee and
Jinpyo Park and
Soo-Mook Moon and
Kemal Ebcioğlu and
Erik Altman Lightweight monitor for Java VM . . . . 35--38
Amit Rao and
Santosh Pande Storage assignment using expression tree
transformations to generate compact and
efficient DSP code . . . . . . . . . . . 39--42
Krisztián Flautner and
Gary S. Tyson and
Trevor Mudge A high level simulator integrated with
the Mirv compiler . . . . . . . . . . . 43--46
H. Cassé and
L. Féraud and
C. Rochange and
P. Sainrat Using the abstract interpretation
technique for static pointer analysis 47--50
Iris Bahar and
Brad Calder and
Dirk Grunwald A comparison of software code reordering
and victim buffers . . . . . . . . . . . 51--54
Steve Carr and
Philip Sweany Improving software pipelining with
hardware support for self-spatial loads 55--58
Rajeev Barua and
Walter Lee and
Saman Amarasinghe and
Anant Agarwal Maps: a compiler-managed memory system
for raw machines . . . . . . . . . . . . 4--15
Sriram Vajapeyam and
P. J. Joseph and
Tulika Mitra Dynamic vectorization: a mechanism for
exploiting far-flung ILP in ordinary
programs . . . . . . . . . . . . . . . . 16--27
Seth Copen Goldstein and
Herman Schmit and
Matthew Moe and
Mihai Budiu and
Srihari Cadambi and
R. Reed Taylor and
Ronald Laufer PipeRench: a coprocessor for streaming
multimedia acceleration . . . . . . . . 28--39
Adi Yoaz and
Mattan Erez and
Ronny Ronen and
Stephan Jourdan Speculation techniques for improving
load related instruction scheduling . . 42--53
Michael Bekerman and
Stephan Jourdan and
Ronny Ronen and
Gilad Kirshenboim and
Lihu Rappoport and
Adi Yoaz and
Uri Weiser Correlated load-address predictors . . . 54--63
Brad Calder and
Glenn Reinman and
Dean M. Tullsen Selective value prediction . . . . . . . 64--74
Xiaogang Qiu and
Michel Dubois Tolerating late memory traps in ILP
processors . . . . . . . . . . . . . . . 76--87
Chi-Keung Luk and
Todd C. Mowry Memory forwarding: enabling aggressive
layout optimizations by guaranteeing the
safety of data relocation . . . . . . . 88--99
Sangyeun Cho and
Pen-Chung Yew and
Gyungho Lee Decoupling local variable accesses in a
wide-issue superscalar processor . . . . 100--110
Amir Roth and
Gurindar S. Sohi Effective jump-pointer prefetching for
linked data structures . . . . . . . . . 111--121
Parthasarathy Ranganathan and
Sarita Adve and
Norman P. Jouppi Performance of image and video
processing with general-purpose
processors and media ISA extensions . . 124--135
Matthew C. Merten and
Andrew R. Trick and
Christopher N. George and
John C. Gyllenhaal and
Wen-mei W. Hwu A hardware-driven profiling scheme for
identifying program hot spots to support
runtime optimization . . . . . . . . . . 136--147
Xiaowei Shen and
Arvind and
Larry Rudolph Commit-reconcile & fences (CRF): a new
memory model for architects and compiler
writers . . . . . . . . . . . . . . . . 150--161
Chris Gniady and
Babak Falsafi and
T. N. Vijaykumar Is SC + ILP = RC? . . . . . . . . . . . 162--171
An-Chow Lai and
Babak Falsafi Memory sharing predictor: the key to a
speculative coherent DSM . . . . . . . . 172--183
Robert S. Chappell and
Jared Stark and
Sangwook P. Kim and
Steven K. Reinhardt and
Yale N. Patt Simultaneous subordinate microthreading
(SSMT) . . . . . . . . . . . . . . . . . 186--195
Bryan Black and
Bohuslav Rychlik and
John Paul Shen The block-based trace cache . . . . . . 196--207
David I. August and
John W. Sias and
Jean-Michel Puiatti and
Scott A. Mahlke and
Daniel A. Connors and
Kevin M. Crozier and
Wen-mei W. Hwu The program decision logic approach to
predicated execution . . . . . . . . . . 208--219
Vinodh Cuppu and
Bruce Jacob and
Brian Davis and
Trevor Mudge A performance comparison of contemporary
DRAM architectures . . . . . . . . . . . 222--233
Glenn Reinman and
Todd Austin and
Brad Calder A scalable front-end architecture for
fast instruction delivery . . . . . . . 234--245
Seongwoo Kim and
Arun K. Somani Area efficient architectures for
information integrity in cache memories 246--255
Tarun Nakra and
Rajiv Gupta and
Mary Lou Soffa Value prediction in VLIW machines . . . 258--269
Dean M. Tullsen and
John S. Seng Storageless value prediction using prior
register values . . . . . . . . . . . . 270--279
Angelos Bilas and
Cheng Liao and
Jaswinder Pal Singh Using network interface support to avoid
asynchronous protocol processing in
shared virtual memory systems . . . . . 282--293
E. Ender Bilir and
Ross M. Dickson and
Ying Hu and
Manoj Plakal and
Daniel J. Sorin and
Mark D. Hill and
David A. Wood Multicast snooping: a new coherence
method using a multicast address network 294--304
Dongming Jiang and
Jaswinder Pal Singh Scaling application performance on a
cache-coherent multiprocessor . . . . . 305--316
Anonymous In memoriam---SIGARCH founder: Caxton C.
Foster . . . . . . . . . . . . . . . . . 1--3
Seung H. Hwang and
Gwan S. Choi Selective-set-invalidation (SSI) for
soft-error-resilient cache architecture 4--9
Peng Cheng and
Hai Jin and
Jiangling Zhang Design of high performance RAID in
real-time system . . . . . . . . . . . . 10--17
C. K. Yuen Architectural support for the cache
based vector computation . . . . . . . . 18--23
Benjamin Driker Disbursed control computer architecture 24--31
Humayun Khalid Performance evaluation of multimedia
systems with MPEG-2 bitstreams . . . . . 32--37
Humayun Khalid A methodology for performance evaluation
of systems with large emulation code . . 38--42
Humayun Khalid Tracing multimedia benchmarks with five
degrees of validation . . . . . . . . . 43--48
Humayun Khalid Performance evaluation of two operating
systems . . . . . . . . . . . . . . . . 49--52
Mark Thorson Internet Nuggets . . . . . . . . . . . . 53--60
Philip Machanick Correction to RAMpage ASPLOS paper . . 2--5
H. S. Shahhoseini and
M. Naderi and
S. Nemati Achieving the best performance on
superscalar processors . . . . . . . . . 6--11
Mark Thorson Internet Nuggets . . . . . . . . . . . . 12--14
Marc Torrant and
Muhammad Shaaban and
Roy Czernikowski and
Ken Hsu A simultaneous multithreading simulator 1--5
Mark Thorson Internet Nuggets . . . . . . . . . . . . 6--10
Min Dai and
Christine Eisenbeis and
Sid-Ahmed-Ali Touati Load-store optimization for software
pipelining . . . . . . . . . . . . . . . 3--10
Philippe Clauss and
Benoît Meister Automatic memory layout transformations
to optimize spatial locality in
parameterized loop nests . . . . . . . . 11--19
Barbara Kreaseck and
Dean Tullsen and
Brad Calder Limits of task-based parallelism in
irregular applications . . . . . . . . . 20--20
Junpyo Lee and
Byung-Sun Yang and
Suhyun Kim and
Kemal Ebcioğlu and
Erik Altman and
Seungil Lee and
Yoo C. Chung and
Heungbok Lee and
Je Hyung Lee and
Soo-Mook Moon Reducing virtual call overheads in a
Java VM just-in-time compiler . . . . . 21--33
Chris Sadler and
Sandeep K. S. Gupta and
Rohit Bhatia Applying predication to efficiently
handle runtime class testing . . . . . . 34--42
Nerina Bermudo and
Xavier Vera and
Antonio González and
Josep Llosa Optimizing cache miss equations
polyhedra . . . . . . . . . . . . . . . 43--52
A. Unger and
E. Zehendner and
Th. Ungerer A combined compiler and architecture
technique to control multithreaded
execution of branches and loop
iterations . . . . . . . . . . . . . . . 53--61
Hakan Aydin and
David Kaeli Using cache line coloring to perform
aggressive procedure inlining . . . . . 62--71
Akhilesh Tyagi and
Gyungho Lee A compiler optimization paradigm for
dynamic energy management . . . . . . . 72--76
Mark Thorson Internet Nuggets . . . . . . . . . . . . 77--78
J. Greggory Steffan and
Christopher B. Colohan and
Antonia Zhai and
Todd C. Mowry A scalable approach to thread-level
speculation . . . . . . . . . . . . . . 1--12
Marcelo Cintra and
José F. Martínez and
Josep Torrellas Architectural support for scalable
speculative parallelization in
shared-memory multiprocessors . . . . . 13--24
Steven K. Reinhardt and
Shubhendu S. Mukherjee Transient fault detection via
simultaneous multithreading . . . . . . 25--36
Quinn Jacobson and
James E. Smith Trace preconstruction . . . . . . . . . 37--46
Ryan Rakvic and
Bryan Black and
John Paul Shen Completion time multiple branch
prediction for enhancing trace cache
performance . . . . . . . . . . . . . . 47--58
Matthew C. Merten and
Andrew R. Trick and
Erik M. Nystrom and
Ronald D. Barnes and
Wen-mei W. Hwu A hardware mechanism for dynamic
extraction and relayout of program hot
spots . . . . . . . . . . . . . . . . . 59--70
Mark Oskin and
Frederic T. Chong and
Matthew Farrens HLS: combining statistical and symbolic
simulation to guide microprocessor
designs . . . . . . . . . . . . . . . . 71--82
David Brooks and
Vivek Tiwari and
Margaret Martonosi Wattch: a framework for
architectural-level power analysis and
optimizations . . . . . . . . . . . . . 83--94
N. Vijaykrishnan and
M. Kandemir and
M. J. Irwin and
H. S. Kim and
W. Ye Energy-driven integrated
hardware-software optimizations using
SimplePower . . . . . . . . . . . . . . 95--106
Erik G. Hallnor and
Steven K. Reinhardt A fully associative software-managed
cache design . . . . . . . . . . . . . . 107--116
Ashley Saulsbury and
Fredrik Dahlgren and
Per Stenström Recency-based TLB preloading . . . . . . 117--127
Scott Rixner and
William J. Dally and
Ujval J. Kapasi and
Peter Mattson and
John D. Owens Memory access scheduling . . . . . . . . 128--138
An-Chow Lai and
Babak Falsafi Selective, accurate, and timely
self-invalidation using last-touch
prediction . . . . . . . . . . . . . . . 139--148
Norman Margolus An embedded DRAM architecture for
large-scale spatial-lattice computations 149--160
Ken Mai and
Tim Paaske and
Nuwan Jayasena and
Ron Ho and
William J. Dally and
Mark Horowitz Smart Memories: a modular reconfigurable
architecture . . . . . . . . . . . . . . 161--171
Craig B. Zilles and
Gurindar S. Sohi Understanding the backward slices of
performance degrading instructions . . . 172--181
Kevin M. Lepak and
Mikko H. Lipasti On the value locality of store
instructions . . . . . . . . . . . . . . 182--191
Zarka Cvetanovic and
R. E. Kessler Performance analysis of the Alpha
21264-based Compaq ES40 system . . . . . 192--202
Paolo Faraboschi and
Geoffrey Brown and
Joseph A. Fisher and
Giuseppe Desoli and
Fred Homewood Lx: a technology platform for
customizable VLIW embedded processing 203--213
Parthasarathy Ranganathan and
Sarita Adve and
Norman P. Jouppi Reconfigurable caches and their
application to media processing . . . . 214--224
Zhi Alex Ye and
Andreas Moshovos and
Scott Hauck and
Prithviraj Banerjee CHIMAERA: a high-performance
architecture with a tightly-coupled
reconfigurable functional unit . . . . . 225--235
Dana S. Henry and
Bradley C. Kuszmaul and
Gabriel H. Loh and
Rahul Sami Circuits for wide-window superscalar
processors . . . . . . . . . . . . . . . 236--247
Vikas Agarwal and
M. S. Hrishikesh and
Stephen W. Keckler and
Doug Burger Clock rate versus IPC: the end of the
road for conventional microarchitectures 248--259
J. E. Smith and
Greg Faanes and
Rabin Sugumar Vector instruction set support for
conditional operations . . . . . . . . . 260--269
Yuan Chou and
John Paul Shen Instruction path coprocessors . . . . . 270--281
Luiz André Barroso and
Kourosh Gharachorloo and
Robert McNamara and
Andreas Nowatzyk and
Shaz Qadeer and
Barton Sano and
Scott Smith and
Robert Stets and
Ben Verghese Piranha: a scalable architecture based
on single-chip multiprocessing . . . . . 282--293
Ramesh Radhakrishnan and
Deependra Talla and
Lizy Kurian John Allowing for ILP in an embedded Java
processor . . . . . . . . . . . . . . . 294--305
Michael Bekerman and
Adi Yoaz and
Freddy Gabbay and
Stephan Jourdan and
Maxim Kalaev and
Ronny Ronen Early load address resolution via
register tracking . . . . . . . . . . . 306--315
José-Lorenzo Cruz and
Antonio González and
Mateo Valero and
Nigel P. Topham Multiple-banked register file
architectures . . . . . . . . . . . . . 316--325
Benjamín Sahelices Fernández and
Diego R. Llanos Ferraris and
Agustín de Dios Hernández Exploiting parallelism in a network of
workstations using COMA-BC . . . . . . . 1--8
Mark Thorson Internet Nuggets . . . . . . . . . . . . 9--13
Jean-Louis Lafitte Regarding a device to help battering the
RAM wall . . . . . . . . . . . . . . . . 4--10
S. Petit and
J. A. Gil and
J. Sahuquillo and
A. Pont LIDE: a simulation environment for
shared virtual memory systems . . . . . 11--18
Steven W. Schlosser and
John Linwood Griffin and
David F. Nagle and
Gregory R. Ganger Designing computer systems with
MEMS-based storage . . . . . . . . . . . 1--12
Kourosh Gharachorloo and
Madhu Sharma and
Simon Steely and
Stephen Van Doren Architecture and design of AlphaServer
GS320 . . . . . . . . . . . . . . . . . 13--24
Milo M. K. Martin and
Daniel J. Sorin and
Anastassia Ailamaki and
Alaa R. Alameldeen and
Ross M. Dickson and
Carl J. Mauer and
Kevin E. Moore and
Manoj Plakal and
Mark D. Hill and
David A. Wood Timestamp snooping: an approach for
extending SMPs . . . . . . . . . . . . . 25--36
Ashwini Nanda and
Kwok-Ken Mak and
Krishnan Sugavanam and
Ramendra K. Sahoo and
Vijayaraghavan Soundararajan and
T. Basil Smith MemorIES: a programmable, real-time
hardware emulation tool for
multiprocessor server design . . . . . . 37--48
Jeff Gibson and
Robert Kunz and
David Ofelt and
Mark Horowitz and
John Hennessy and
Mark Heinrich FLASH vs. (Simulated) FLASH: closing the
simulation loop . . . . . . . . . . . . 49--58
Andy Chou and
Benjamin Chelf and
Dawson Engler and
Mark Heinrich Using meta-level compilation to check
FLASH protocol code . . . . . . . . . . 59--70
Raoul A. F. Bhoedjang and
Kees Verstoep and
Tim Rühl and
Henri E. Bal and
Rutger F. H. Hofman Evaluating design alternatives for
reliable communication on high-speed
networks . . . . . . . . . . . . . . . . 71--81
Peter Mattson and
William J. Dally and
Scott Rixner and
Ujval J. Kapasi and
John D. Owens Communication scheduling . . . . . . . . 82--92
Jason Hill and
Robert Szewczyk and
Alec Woo and
Seth Hollar and
David Culler and
Kristofer Pister System architecture directions for
networked sensors . . . . . . . . . . . 93--104
Alvin R. Lebeck and
Xiaobo Fan and
Heng Zeng and
Carla Ellis Power aware page allocation . . . . . . 105--116
Emery D. Berger and
Kathryn S. McKinley and
Robert D. Blumofe and
Paul R. Wilson Hoard: a scalable memory allocator for
multithreaded applications . . . . . . . 117--128
Krisztián Flautner and
Rich Uhlig and
Steve Reinhardt and
Trevor Mudge Thread-level parallelism and interactive
performance of desktop applications . . 129--138
Motohiro Kawahito and
Hideaki Komatsu and
Toshio Nakatani Effective null pointer check elimination
utilizing hardware trap . . . . . . . . 139--149
Youtao Zhang and
Jun Yang and
Rajiv Gupta Frequent value locality and
value-centric data cache design . . . . 150--159
M. Burrows and
U. Erlingsson and
S-T. A. Leung and
M. T. Vandevoorde and
C. A. Waldspurger and
K. Walker and
W. E. Weihl Efficient and flexible value sampling 160--167
David Lie and
Chandramohan Thekkath and
Mark Mitchell and
Patrick Lincoln and
Dan Boneh and
John Mitchell and
Mark Horowitz Architectural support for copy and
tamper resistant software . . . . . . . 168--177
Jerome Burke and
John McDonald and
Todd Austin Architectural support for fast
symmetric-key cryptography . . . . . . . 178--189
John Kubiatowicz and
David Bindel and
Yan Chen and
Steven Czerwinski and
Patrick Eaton and
Dennis Geels and
Ramakrishna Gummadi and
Sean Rhea and
Hakim Weatherspoon and
Chris Wells and
Ben Zhao OceanStore: an architecture for
global-scale persistent storage . . . . 190--201
Evelyn Duesterwald and
Vasanth Bala Software profiling for hot path
prediction: less is more . . . . . . . . 202--211
Rumi Zahir and
Jonathan Ross and
Dale Morris and
Drew Hess OS and compiler considerations in the
design of the IA-64 architecture . . . . 212--221
Daniel A. Connors and
Hillery C. Hunter and
Ben-Chung Cheng and
Wen-mei W. Hwu Hardware support for dynamic activation
of compiler-directed computation reuse 222--233
Allan Snavely and
Dean M. Tullsen Symbiotic job scheduling for a
simultaneous multithreaded processor . . 234--244
Joshua A. Redstone and
Susan J. Eggers and
Henry M. Levy An analysis of operating system behavior
on a simultaneous multithreaded
architecture . . . . . . . . . . . . . . 245--256
Karthik Sundaramoorthy and
Zach Purser and
Eric Rotenberg Slipstream processors: improving both
performance and fault tolerance . . . . 257--268
Maurice V. Wilkes The memory gap and the future of high
performance memories . . . . . . . . . . 2--7
Naraig Manjikian Multiprocessor enhancements of the
SimpleScalar tool set . . . . . . . . . 8--15
Frank Wang A modified architecture for high-density
MRAM . . . . . . . . . . . . . . . . . . 16--22
Erik R. Altman and
David Kaeli WBT-2000: Workshop on Binary Translation
2000 . . . . . . . . . . . . . . . . . . 23--25
Amitabh Srivastava Emerging opportunities for binary tools 26--26
Harold W. Cain and
Kevin M. Lepak and
Mikko H. Lipasti A dynamic binary translation approach to
architectural simulation . . . . . . . . 27--36
Rolf Hilgendorf and
Wolfram Sauer Instruction translation for an
experimental S/390 processor . . . . . . 37--42
Michiel Ronsse and
Koen De Bosschere JiTI: a robust just in time
instrumentation technique . . . . . . . 43--54
David Ung and
Cristina Cifuentes Optimising hot paths in a dynamic binary
translator . . . . . . . . . . . . . . . 55--65
Michael Gschwind and
Erik Altman Optimization and precise exceptions in
dynamic compilation . . . . . . . . . . 66--74
Mark Thorson Internet Nuggets . . . . . . . . . . . . 75--77
Craig Zilles and
Gurindar Sohi Execution-based prediction using
speculative slices . . . . . . . . . . . 2--13
Jamison D. Collins and
Hong Wang and
Dean M. Tullsen and
Christopher Hughes and
Yong-Fong Lee and
Dan Lavery and
John P. Shen Speculative precomputation: long-range
prefetching of delinquent loads . . . . 14--25
Rajeev Balasubramonian and
Sandhya Dwarkadas and
David H. Albonesi Dynamically allocating processor
resources between nearby and distant ILP 26--37
Chi-Keung Luk Tolerating memory latency through
software-controlled pre-execution in
simultaneous multithreading processors 40--51
Murali Annavaram and
Jignesh M. Patel and
Edward S. Davidson Data prefetching by dependence graph
precomputation . . . . . . . . . . . . . 52--61
Vinodh Cuppu and
Bruce Jacob Concurrency, latency, or system
overhead: which has the largest impact
on uniprocessor DRAM-system performance? 62--71
Brian Fields and
Shai Rubin and
Rastislav Bodík Focusing processor policies via
critical-path prediction . . . . . . . . 74--85
Timothy Sherwood and
Brad Calder Automated design of finite state machine
predictors for customized processors . . 86--97
Youfeng Wu and
Dong-Yuan Chen and
Jesse Fang Better exploration of region-level value
locality with integrated computation
reuse and value prediction . . . . . . . 98--108
Lisa Wu and
Chris Weaver and
Todd Austin CryptoManiac: a fast flexible
architecture for secure communication 110--119
Ki Hwan Yum and
Eun Jung Kim and
Chita R. Das QoS provisioning in clusters: an
investigation of Router and NIC design 120--129
Srikanth T. Srinivasan and
Roy Dz-ching Ju and
Alvin R. Lebeck and
Chris Wilkerson Locality vs. criticality . . . . . . . . 132--143
An-Chow Lai and
Cem Fide and
Babak Falsafi Dead-block prediction & dead-block
correlating prefetchers . . . . . . . . 144--154
Alex Ramirez and
Luiz André Barroso and
Kourosh Gharachorloo and
Robert Cohn and
Josep Larriba-Pey and
P. Geoffrey Lowney and
Mateo Valero Code layout optimizations for
transaction processing workloads . . . . 155--164
Michael Thaddeus Niemier and
Peter M. Kogge Exploring and exploiting wire-level
pipelining in emerging technologies . . 166--177
Seth Copen Goldstein and
Mihai Budiu NanoFabrics: spatial computing using
molecular electronics . . . . . . . . . 178--191
David Lie and
Andy Chou and
Dawson Engler and
David L. Dill A simple method for extracting models
for protocol code . . . . . . . . . . . 192--203
Milos Prvulovic and
María Jesús Garzarán and
Lawrence Rauchwerger and
Josep Torrellas Removing architectural bottlenecks to
the scalability of speculative
parallelization . . . . . . . . . . . . 204--215
R. Iris Bahar and
Srilatha Manne Power and energy reduction via pipeline
balancing . . . . . . . . . . . . . . . 218--229
Daniele Folegnani and
Antonio González Energy-effective issue logic . . . . . . 230--239
Stefanos Kaxiras and
Zhigang Hu and
Margaret Martonosi Cache decay: exploiting generational
behavior to reduce cache leakage power 240--251
Christopher J. Hughes and
Praful Kaul and
Sarita V. Adve and
Rohit Jain and
Chanik Park and
Jayanth Srinivasan Variability in the execution of
multimedia applications and implications
for architecture . . . . . . . . . . . . 254--265
S. Subramanya Sastry and
Rastislav Bodík and
James E. Smith Rapid profiling via stratified sampling 278--289
Craig B. Zilles Benchmark health considered harmful . . 4--5
Niki C. Thornock and
J. Kelly Flanagan A national trace collection and
distribution resource . . . . . . . . . 6--10
Mark Thorson Internet Nuggets . . . . . . . . . . . . 11--15
Naraig Manjikian More enhancements of the SimpleScalar
tool set . . . . . . . . . . . . . . . . 5--12
Jason F. Cantin and
Mark D. Hill Cache performance for selected SPEC
CPU2000 benchmarks . . . . . . . . . . . 13--18
Jinsuo Zhang The predictability of load address . . . 19--28
Mark Thorson Internet Nuggets . . . . . . . . . . . . 29--31
M. Watheq El-Kharashi and
Fayez Elguibaly and
Kin F. Li Adapting Tomasulo's algorithm for
bytecode folding based Java processors 1--8
S. Bartolini and
R. Giorgi and
J. Protic and
C. A. Prete and
M. Valero Parallel architecture and compilation
techniques: selection of workshop
papers, Guest Editors' introduction . . 9--12
Andrea Acquaviva and
Luca Benini and
Bruno Riccó Energy characterization of embedded
real-time operating systems . . . . . . 13--18
M. Angels Moncusi and
Alex Arenas and
Jesus Labarta Improving energy saving in hard real
time systems via a modified dual
priority scheduling . . . . . . . . . . 19--24
Frank Vahid and
Rilesh Patel and
Greg Stitt Propagating constants past software to
hardware peripherals in
fixed-application embedded systems . . . 25--30
Vishal Aslot and
Rudolf Eigenmann Performance characteristics of the SPEC
OMP2001 benchmarks . . . . . . . . . . . 31--40
J. Mark Bull and
Darragh O'Neill A microbenchmark suite for OpenMP 2.0 41--48
D. S. Nikolopoulos and
E. Artiaga and
E. Ayguadé and
J. Labarta Exploiting memory affinity in OpenMP
through schedule reuse . . . . . . . . . 49--55
Michael Sung and
Ronny Krashinsky and
Krste Asanović Multithreading decoupled architectures
for complexity-effective general purpose
computing . . . . . . . . . . . . . . . 56--61
Deependra Talla and
Lizy K. John MediaBreeze: a decoupled architecture
for accelerating multimedia applications 62--67
Tatsuo Nakajima A middleware component supporting
flexible user interaction for networked
home appliances . . . . . . . . . . . . 68--75
David Touzet and
Jean-Marc Menaud and
Frédéric Weis and
Paul Couderc and
Michel Banâtre SIDE surfer: enriching casual meetings
with spontaneous information gathering 76--83
Erik R. Altman and
David R. Kaeli Workshop on Binary Translation 2001 . . 84--85
Mark Thorson Internet Nuggets . . . . . . . . . . . . 86--90
Rajagopalan Desikan and
Doug Burger and
Stephen W. Keckler and
Llorenc Cruz and
Fernando Latorre and
Antonio González and
Mateo Valero Errata on ``Measuring Experimental Error
in Microprocessor Simulation'' . . . . . 2--4
Fu-Chi Chang and
Chia-Jiu Wang Architectural tradeoff in implementing
RSA processors . . . . . . . . . . . . . 5--11
Augustus K. Uht Disjoint Eager Execution: what it is
/what it is not . . . . . . . . . . . . 12--14
Mark Thorson Internet Nuggets . . . . . . . . . . . . 15--21
A. Hartstein and
Thomas R. Puzak The optimum pipeline depth for a
microprocessor . . . . . . . . . . . . . 7--13
M. S. Hrishikesh and
Doug Burger and
Norman P. Jouppi and
Stephen W. Keckler and
Keith I. Farkas and
Premkishore Shivakumar The optimal logic depth per pipeline
stage is 6 to 8 FO4 inverter delays . . 14--24
Eric Sprangle and
Doug Carmean Increasing processor performance by
implementing deeper pipelines . . . . . 25--34
Dan Ernst and
Todd Austin Efficient dynamic scheduling through tag
elimination . . . . . . . . . . . . . . 37--46
Brian Fields and
Rastislav Bodík and
Mark D. Hill Slack: maximizing performance under
technological constraints . . . . . . . 47--58
Alvin R. Lebeck and
Jinson Koppanalil and
Tong Li and
Jaidev Patwardhan and
Eric Rotenberg A large, fast instruction window for
tolerating cache misses . . . . . . . . 59--70
Ho-Seop Kim and
James E. Smith An instruction set and microarchitecture
for instruction level distributed
processing . . . . . . . . . . . . . . . 71--81
T. N. Vijaykumar and
Irith Pomeranz and
Karl Cheng Transient-fault recovery using
simultaneous multithreading . . . . . . 87--98
Shubhendu S. Mukherjee and
Michael Kontz and
Steven K. Reinhardt Detailed design and evaluation of
redundant multithreading alternatives 99--110
Milos Prvulovic and
Zheng Zhang and
Josep Torrellas ReVive: cost-effective architectural
support for rollback recovery in
shared-memory multiprocessors . . . . . 111--122
Daniel J. Sorin and
Milo M. K. Martin and
Mark D. Hill and
David A. Wood SafetyNet: improving the availability of
shared memory multiprocessors with
global checkpoint/recovery . . . . . . . 123--134
Seongmoo Heo and
Kenneth Barr and
Mark Hampton and
Krste Asanović Dynamic fine-grain leakage reduction
using leakage-biased bitlines . . . . . 137--147
Krisztián Flautner and
Nam Sung Kim and
Steve Martin and
David Blaauw and
Trevor Mudge Drowsy caches: simple techniques for
reducing leakage power . . . . . . . . . 148--157
Anoop Iyer and
Diana Marculescu Power and performance evaluation of
globally asynchronous locally
synchronous processors . . . . . . . . . 158--168
Yan Solihin and
Jaejin Lee and
Josep Torrellas Using a user-level memory thread for
correlation prefetching . . . . . . . . 171--182
Jarrod A. Lewis and
Bryan Black and
Mikko H. Lipasti Avoiding initialization misses to the
heap . . . . . . . . . . . . . . . . . . 183--194
Gokul B. Kandiraju and
Anand Sivasubramaniam Going the distance for TLB prefetching:
an application-driven study . . . . . . 195--206
Zhigang Hu and
Stefanos Kaxiras and
Margaret Martonosi Timekeeping in the memory system:
predicting and optimizing memory
behavior . . . . . . . . . . . . . . . . 209--220
Ilhyun Kim and
Mikko H. Lipasti Implementing optimizations at decode
time . . . . . . . . . . . . . . . . . . 221--232
Ashutosh S. Dhodapkar and
James E. Smith Managing multi-configuration hardware
via dynamic working set analysis . . . . 233--244
Philip Buonadonna and
David Culler Queue pair IP: a hybrid architecture for
system area networks . . . . . . . . . . 247--256
Yuanyuan Zhou and
Angelos Bilas and
Suresh Jagannathan and
Cezary Dubnicki and
James F. Philbin and
Kai Li Experiences with VI communication for
database storage . . . . . . . . . . . . 257--268
Alex Pajuelo and
Antonio González and
Mateo Valero Speculative dynamic vectorization . . . 271--280
Roger Espasa and
Federico Ardanaz and
Joel Emer and
Stephen Felix and
Julio Gago and
Roger Gramunt and
Isaac Hernandez and
Toni Juan and
Geoff Lowney and
Matthew Mattina and
André Seznec Tarantula: a vector extension to the
Alpha architecture . . . . . . . . . . . 281--292
André Seznec and
Stephen Felix and
Venkata Krishnan and
Yiannakis Sazeides Design tradeoffs for the Alpha EV8
conditional branch predictor . . . . . . 295--306
Robert S. Chappell and
Francis Tseng and
Adi Yoaz and
Yale N. Patt Difficult-path branch prediction using
subordinate microthreads . . . . . . . . 307--317
Steven E. Raasch and
Nathan L. Binkert and
Steven K. Reinhardt A scalable instruction queue design
using dependence chains . . . . . . . . 318--329
Ken Steele and
Jason Waterman and
Eugene Weinstein The Oxygen H21 handheld . . . . . . . . 3--4
Diana Keen and
Frederic T. Chong Hardware-software co-design of embedded
sensor-actuator networks . . . . . . . . 5--6
Masaaki Kondo and
Motonobu Fujita and
Hiroshi Nakamura Software-controlled on-chip memory for
high-performance and low-power computing 7--8
Ramendra K. Sahoo and
Myung Bae and
Jose Moreira Semi-hierarchical approach for
reliability, availability, and
serviceability of cellular systems . . . 9--10
Hans Eberle Monitoring and diagnosing computer
systems by radio communication . . . . . 11--12
William Thies and
Michal Karczmarek and
Michael Gordon and
David Maze and
Jeremy Wong and
Henry Hoffmann and
Matthew Brown and
Saman Amarasinghe A common machine language for grid-based
architectures . . . . . . . . . . . . . 13--14
Frank Wang and
Na Helian and
Farhi Marir A novel associative memory architecture
for quick matching . . . . . . . . . . . 15--16
Mike Parker A case for user-level interrupts . . . . 17--18
Martin Burtscher An improved index function for (D)FCM
predictors . . . . . . . . . . . . . . . 19--24
Mark Thorson Internet Nuggets . . . . . . . . . . . . 25--26
I. Gómez and
L. Piñuel and
M. Prieto and
F. Tirado Analysis of simulation-adapted SPEC 2000
benchmarks . . . . . . . . . . . . . . . 4--10
Mark Thorson Internet Nuggets . . . . . . . . . . . . 11--16
Deborah Estrin Keynote address: Sensor network
research: emerging challenges for
architecture, systems, and languages . . 1--4
Ravi Rajwar and
James R. Goodman Transactional lock-free execution of
lock-based programs . . . . . . . . . . 5--17
José F. Martínez and
Josep Torrellas Speculative synchronization: applying
thread-level speculation to explicitly
parallel applications . . . . . . . . . 18--29
Kevin M. Lepak and
Mikko H. Lipasti Temporally silent stores . . . . . . . . 30--41
Timothy Sherwood and
Erez Perelman and
Greg Hamerly and
Brad Calder Automatically characterizing large scale
program behavior . . . . . . . . . . . . 45--57
Kazunori Ogata and
Hideaki Komatsu and
Toshio Nakatani Bytecode fetch optimization for a Java
interpreter . . . . . . . . . . . . . . 58--67
Tao Li and
Lizy Kurian John and
Anand Sivasubramaniam and
N. Vijaykrishnan and
Juan Rubio Understanding and improving operating
system effects in control flow
prediction . . . . . . . . . . . . . . . 68--80
Philip Levis and
David Culler Maté: a tiny virtual machine for sensor
networks . . . . . . . . . . . . . . . . 85--95
Philo Juang and
Hidekazu Oki and
Yong Wang and
Margaret Martonosi and
Li Shiuan Peh and
Daniel Rubenstein Energy-efficient computing for wildlife
tracking: design tradeoffs and early
experiences with ZebraNet . . . . . . . 96--107
Darko Kirovski and
Milenko Drinić and
Miodrag Potkonjak Enabling trusted software integrity . . 108--120
Heng Zeng and
Carla S. Ellis and
Alvin R. Lebeck and
Amin Vahdat ECOSystem: managing energy as a first
class operating system resource . . . . 123--132
Raksit Ashok and
Saurabh Chheda and
Csaba Andras Moritz Cool-Mem: combining statically
speculative memory accessing with
selective address translation for energy
efficiency . . . . . . . . . . . . . . . 133--143
Ruchira Sasanka and
Christopher J. Hughes and
Sarita V. Adve Joint local and global hardware
adaptations for energy . . . . . . . . . 144--155
Dongkeun Kim and
Donald Yeung Design and evaluation of compiler
algorithms for pre-execution . . . . . . 159--170
Antonia Zhai and
Christopher B. Colohan and
J. Gregory Steffan and
Todd C. Mowry Compiler optimization of scalar value
communication between speculative
threads . . . . . . . . . . . . . . . . 171--183
Jeffrey Oplinger and
Monica S. Lam Enhancing software reliability with
speculative threads . . . . . . . . . . 184--196
J. Adam Butts and
Guri Sohi Dynamic dead-instruction detection and
elimination . . . . . . . . . . . . . . 199--210
Changkyu Kim and
Doug Burger and
Stephen W. Keckler An adaptive, non-uniform cache structure
for wire-delay dominated on-chip caches 211--222
Shubhendu S. Mukherjee and
Federico Silla and
Peter Bannon and
Joel Emer and
Steve Lang and
David Webb A comparative study of arbitration
algorithms for the Alpha 21364 pipelined
router . . . . . . . . . . . . . . . . . 223--234
Hyong-youb Kim and
Vijay S. Pai and
Scott Rixner Increasing Web server throughput with
network interface data caching . . . . . 239--250
Eddie Kohler and
Robert Morris and
Benjie Chen Programming language optimizations for
modular router configurations . . . . . 251--263
Muthian Sivathanu and
Andrea C. Arpaci-Dusseau and
Remzi H. Arpaci-Dusseau Evolving RPC for active storage . . . . 264--276
Robert Cooksey and
Stephan Jourdan and
Dirk Grunwald A stateless, content-directed data
prefetching mechanism . . . . . . . . . 279--290
Michael I. Gordon and
William Thies and
Michal Karczmarek and
Jasper Lin and
Ali S. Meli and
Andrew A. Lamb and
Chris Leger and
Jeremy Wong and
Henry Hoffmann and
David Maze and
Saman Amarasinghe A stream compiler for
communication-exposed architectures . . 291--303
Emmett Witchel and
Josh Cates and
Krste Asanović Mondrian memory protection . . . . . . . 304--316
Jack B. Dennis Fresh Breeze: a multiprocessor chip
architecture guided by modular
programming principles . . . . . . . . . 7--15
D. Morano and
A. Khalafi and
D. R. Kaeli and
A. K. Uht Realizing high IPC through a scalable
memory-latency tolerant multipath
microarchitecture . . . . . . . . . . . 16--25
George Almási and
Călin Caşcaval and
José G. Castaños and
Monty Denneau and
Derek Lieber and
José E. Moreira and
Henry S. Warren, Jr. Dissecting Cyclops: a detailed analysis
of a multithreaded architecture . . . . 26--38
Mohamed M. Zahran On cache memory hierarchy for
Chip-Multiprocessor . . . . . . . . . . 39--48
Gary Gréwal and
Tom Wilson and
Andrew Morton An EGA approach to the compile-time
assignment of data to multiple memories
in digital-signal processors . . . . . . 49--59
Ulrich Ramacher and
Nico Brüs and
Ulrich Hachmann and
Jens Harnisch and
Wolfgang Raab and
Axel Techmer 100 GOPS vision processor for automotive
applications . . . . . . . . . . . . . . 60--68
Nikos P. Pitsianis and
Gerald G. Pechanek Indirect VLIW memory allocation for the
ManArray multiprocessor DSP . . . . . . 69--74
Naohiko Shimizu and
Ken Takatori A transparent Linux super page kernel
for Alpha, Sparc64 and IA32: reducing
TLB misses of applications . . . . . . . 75--84
Alessio Bechini and
Pierfrancesco Foglia and
Cosimo Antonio Prete Fine-grain design space exploration for
a cartographic SoC multiprocessor . . . 85--92
Mark Thorson Internet Nuggets . . . . . . . . . . . . 93--96
Kevin Skadron and
Mircea R. Stan and
Wei Huang and
Sivakumar Velusamy and
Karthik Sankaranarayanan and
David Tarjan Temperature-aware microarchitecture . . 2--13
Grigorios Magklis and
Michael L. Scott and
Greg Semeraro and
David H. Albonesi and
Steven Dropsho Profile-based dynamic voltage and
frequency scaling for a multiple clock
domain microprocessor . . . . . . . . . 14--27
Ilhyun Kim and
Mikko H. Lipasti Half-price architecture . . . . . . . . 28--38
Il Park and
Babak Falsafi and
T. N. Vijaykumar Implicitly-multithreaded processors . . 39--51
Daniel Citron MisSPECulation: partial and misleading
use of SPEC CPU2000 in computer
architecture conferences . . . . . . . . 52--61
Jessica H. Tseng and
Krste Asanović Banked multiported register files for
high-frequency superscalar
microprocessors . . . . . . . . . . . . 62--71
Michael D. Powell and
T. N. Vijaykumar Pipeline damping: a microarchitectural
technique to reduce inductive noise in
supply voltage . . . . . . . . . . . . . 72--83
Roland E. Wunderlich and
Thomas F. Wenisch and
Babak Falsafi and
James C. Hoe SMARTS: accelerating microarchitecture
simulation via rigorous statistical
sampling . . . . . . . . . . . . . . . . 84--97
Mohamed Gomaa and
Chad Scarbrough and
T. N. Vijaykumar and
Irith Pomeranz Transient-fault recovery for chip
multiprocessors . . . . . . . . . . . . 98--109
Milos Prvulovic and
Josep Torrellas ReEnact: using thread-level speculation
mechanisms to debug data races in
multithreaded codes . . . . . . . . . . 110--121
Min Xu and
Rastislav Bodík and
Mark D. Hill A ``flight data recorder'' for enabling
full-system multiprocessor deterministic
replay . . . . . . . . . . . . . . . . . 122--135
Chuanjun Zhang and
Frank Vahid and
Walid Najjar A highly configurable cache architecture
for embedded systems . . . . . . . . . . 136--146
Alper Buyuktosunoğlu and
Tejas Karkhanis and
David H. Albonesi and
Pradip Bose Energy efficient co-adaptive instruction
fetch and issue . . . . . . . . . . . . 147--156
Michael C. Huang and
Jose Renau and
Josep Torrellas Positional adaptation of processors:
application to energy reduction . . . . 157--168
Sudhanva Gurumurthi and
Anand Sivasubramaniam and
Mahmut Kandemir and
Hubertus Franke DRPM: dynamic speed control for power
management in server class disks . . . . 169--181
Milo M. K. Martin and
Mark D. Hill and
David A. Wood Token coherence: decoupling performance
and correctness . . . . . . . . . . . . 182--193
Arjun Singh and
William J. Dally and
Amit K. Gupta and
Brian Towles GOAL: a load-balanced adaptive routing
algorithm for torus networks . . . . . . 194--205
Milo M. K. Martin and
Pacia J. Harper and
Daniel J. Sorin and
Mark D. Hill and
David A. Wood Using destination-set prediction to
improve the latency/bandwidth tradeoff
in shared-memory multiprocessors . . . . 206--217
Zarka Cvetanovic Performance analysis of the Alpha
21364-based HP GS1280 multiprocessor . . 218--229
Paramjit S. Oberoi and
Gurindar S. Sohi Parallelism in the front-end . . . . . . 230--240
André Seznec and
Antony Fraboulet Effective ahead pipelining of
instruction block address generation . . 241--252
Dan Ernst and
Andrew Hamel and
Todd Austin Cyclone: a broadcast-free dynamic
instruction scheduler with selective
replay . . . . . . . . . . . . . . . . . 253--263
Ravi Bhargava and
Lizy K. John Improving dynamic cluster assignment for
clustered trace cache processors . . . . 264--274
Rajeev Balasubramonian and
Sandhya Dwarkadas and
David H. Albonesi Dynamically managing the
communication-parallelism trade-off in
future clustered processors . . . . . . 275--287
Timothy Sherwood and
George Varghese and
Brad Calder A pipelined memory architecture for high
throughput network processors . . . . . 288--299
Jahangir Hasan and
Satish Chandra and
T. N. Vijaykumar Efficient use of memory bandwidth to
improve network processor throughput . . 300--313
Renju Thomas and
Manoj Franklin and
Chris Wilkerson and
Jared Stark Improving branch prediction by dynamic
dataflow-based identification of
correlated branches from a large global
history . . . . . . . . . . . . . . . . 314--323
Huiyang Zhou and
Jill Flanagan and
Thomas M. Conte Detecting global stride locality in
value streams . . . . . . . . . . . . . 324--335
Timothy Sherwood and
Suleyman Sair and
Brad Calder Phase tracking and prediction . . . . . 336--349
Aravindh Anantaraman and
Kiran Seth and
Kaustubh Patil and
Eric Rotenberg and
Frank Mueller Virtual simple architecture (VISA):
exceeding the complexity limit in safe
real-time systems . . . . . . . . . . . 350--361
Marc L. Corliss and
E. Christopher Lewis and
Amir Roth DISE: a programmable macro engine for
customizing applications . . . . . . . . 362--373
Mark Oskin and
Frederic T. Chong and
Isaac L. Chuang and
John Kubiatowicz Building quantum wires: the long and the
short of it . . . . . . . . . . . . . . 374--387
Zhenlin Wang and
Doug Burger and
Kathryn S. McKinley and
Steven K. Reinhardt and
Charles C. Weems Guided region prefetching: a cooperative
hardware/software approach . . . . . . . 388--398
Christos Kozyrakis and
David Patterson Overcoming the limitations of
conventional vector processors . . . . . 399--409
Jinwoo Suh and
Eun-Gyu Kim and
Stephen P. Crago and
Lakshmi Srinivasan and
Matthew C. French A performance analysis of PIM, stream
processing, and tiled processing on
memory-intensive signal processing
kernels . . . . . . . . . . . . . . . . 410--421
Karthikeyan Sankaralingam and
Ramadass Nagarajan and
Haiming Liu and
Changkyu Kim and
Jaehyuk Huh and
Doug Burger and
Stephen W. Keckler and
Charles R. Moore Exploiting ILP, TLP, and DLP with the
polymorphous TRIPS architecture . . . . 422--433
Michael K. Chen and
Kunle Olukotun The Jrpm system for dynamically
parallelizing Java programs . . . . . . 434--446
Anthony S. Fong A computer architecture with access
control and cache option tags on
individual instruction operands . . . . 1--5
Edwin J. Tan and
Wendi B. Heinzelman DSP architectures: past, present and
futures . . . . . . . . . . . . . . . . 6--19
Lucian N. Vintan and
Marius Sbera and
Ioan Z. Mihu and
Adrian Florea An alternative to branch prediction:
pre-computed branches . . . . . . . . . 20--29
Mark Heinrich and
Mainak Chaudhuri Ocean warning: avoid drowning . . . . . 30--32
Jean-Louis Lafitte Qualitatively matching computer
architecture with Turing machine . . . . 33--41
Takenori Koushiro and
Toshinori Sato and
Itsujiro Arita A trace-level value predictor for
Contrail processors . . . . . . . . . . 42--47
Mark Thorson Internet Nuggets . . . . . . . . . . . . 48--54
Mikkel Thorup Combinatorial power in multimedia
processors . . . . . . . . . . . . . . . 5--11
Gary K. W. Hau and
Anthony Fong and
Mok Pak Lun Support of Java API for the jHISC system 12--17
Mok Pak Lun and
Richard Li and
Anthony Fong Method manipulation in an
object-oriented processor . . . . . . . 18--25
Mark Thorson Internet Nuggets . . . . . . . . . . . . 26--32
Kristopher C. Breen and
Duncan G. Elliott Aliasing and anti-aliasing in branch
history table prediction . . . . . . . . 1--4
Ryan W. S. Yu and
Gary K. W. Hau and
Anthony S. Fong Test bench for software development of
object-oriented processor . . . . . . . 5--9
Mok Pak Lun and
Anthony Fong and
Gary K. W. Hau Object-oriented processor requirements
with instruction analysis of Java
programs . . . . . . . . . . . . . . . . 10--15
Mark Thorson Internet Nuggets . . . . . . . . . . . . 16--21
Lizy Kurian John More on finding a single number to
indicate overall performance of a
benchmark suite . . . . . . . . . . . . 3--8
Mark Thorson Internet Nuggets . . . . . . . . . . . . 9--13
Michael Bedford Taylor and
Walter Lee and
Jason Miller and
David Wentzlaff and
Ian Bratt and
Ben Greenwald and
Henry Hoffmann and
Paul Johnson and
Jason Kim and
James Psota and
Arvind Saraf and
Nathan Shnidman and
Volker Strumpen and
Matt Frank and
Saman Amarasinghe and
Anant Agarwal Evaluation of the Raw Microprocessor: An
Exposed-Wire-Delay Architecture for ILP
and Streams . . . . . . . . . . . . . . 2--2
Anonymous General Co-Chair's Message . . . . . . . 9--9
Anonymous Program Chair's Message . . . . . . . . 10--10
Anonymous Committees . . . . . . . . . . . . . . . 11--11
Anonymous Reviewers . . . . . . . . . . . . . . . 13--13
Jung Ho Ahn and
William J. Dally and
Brucek Khailany and
Ujval J. Kapasi and
Abhishek Das Evaluating the Imagine Stream
Architecture . . . . . . . . . . . . . . 14--14
John W. Sias and
Sain-zee Ueng and
Geoff A. Kent and
Ian M. Steiner and
Erik M. Nystrom and
Wen-mei W. Hwu Field-testing IMPACT EPIC research
results in Itanium 2 . . . . . . . . . . 26--26
T. N. Vijaykumar and
Zeshan Chishti Wire Delay is Not a Problem for SMT (In
the Near Future) . . . . . . . . . . . . 40--40
Ronny Krashinsky and
Christopher Batten and
Mark Hampton and
Steve Gerding and
Brian Pharris and
Jared Casper and
Krste Asanović The Vector-Thread Architecture . . . . . 52--52
Rakesh Kumar and
Dean M. Tullsen and
Parthasarathy Ranganathan and
Norman P. Jouppi and
Keith I. Farkas Single-ISA Heterogeneous Multi-Core
Architectures for Multithreaded Workload
Performance . . . . . . . . . . . . . . 64--64
Yuan Chou and
Brian Fahs and
Santosh Abraham Microarchitecture Optimizations for
Exploiting Memory-Level Parallelism . . 76--76
Harold W. Cain and
Mikko H. Lipasti Memory Ordering: a Value-Based Approach 90--90
Lance Hammond and
Vicky Wong and
Mike Chen and
Brian D. Carlstrom and
John D. Davis and
Ben Hertzberg and
Manohar K. Prabhu and
Honggo Wijaya and
Christos Kozyrakis and
Kunle Olukotun Transactional Memory Coherence and
Consistency . . . . . . . . . . . . . . 102--102
Sudheendra Hangal and
Durgam Vahia and
Chaiyasit Manovit and
Juin-Yeu Joseph Lu TSOtool: a Program for Verifying Memory
Systems Using the Memory Consistency
Model . . . . . . . . . . . . . . . . . 114--114
Mainak Chaudhuri and
Mark Heinrich SMTp: An Architecture for
Next-generation Scalable Multi-threading 124--124
Christopher J. Hughes and
Sarita V. Adve A Formal Approach to Frequent Energy
Adaptations for Multimedia Applications 138--138
John Oliver and
Ravishankar Rao and
Paul Sultana and
Jedidiah Crandall and
Erik Czernikowski and
Leslie W. Jones IV and
Diana Franklin and
Venkatesh Akella and
Frederic T. Chong Synchroscalar: a Multiple Clock Domain,
Power-Aware, Tile-Based Embedded
Processor . . . . . . . . . . . . . . . 150--150
Roni Rosner and
Yoav Almog and
Micha Moffie and
Naftali Schwartz and
Avi Mendelson Power Awareness through Selective
Dynamically Optimized Traces . . . . . . 162--162
Lakshmi N. Bairavasundaram and
Muthian Sivathanu and
Andrea C. Arpaci-Dusseau and
Remzi H. Arpaci-Dusseau X-RAY: a Non-Invasive Exclusive Caching
Mechanism for RAIDs . . . . . . . . . . 176--176
Robert Mullins and
Andrew West and
Simon Moore Low-Latency Virtual-Channel Routers for
On-Chip Networks . . . . . . . . . . . . 188--188
V. Puente and
J. A. Gregorio and
F. Vallejo and
R. Beivide Immunet: a Cheap and Robust
Fault-Tolerant Packet Routing Mechanism 198--198
Alaa R. Alameldeen and
David A. Wood Adaptive Cache Compression for
High-Performance Processors . . . . . . 212--212
Pin Zhou and
Feng Qin and
Wei Liu and
Yuanyuan Zhou and
Josep Torrellas iWatcher: Efficient Architectural
Support for Software Debugging . . . . . 224--224
Sami Yehia and
Olivier Temam From Sequences of Dependent Instructions
to Functions: An Approach for Improving
Performance without ILP or Speculation 238--238
Ayose Falcon and
Jared Stark and
Alex Ramirez and
Konrad Lai and
Mateo Valero Prophet/Critic Hybrid Branch Prediction 250--250
Christopher Weaver and
Joel Emer and
Shubhendu S. Mukherjee and
Steven K. Reinhardt Techniques to Reduce the Soft Error Rate
of a High-Performance Microprocessor . . 264--264
Jayanth Srinivasan and
Sarita V. Adve and
Pradip Bose and
Jude A. Rivers The Case for Lifetime Reliability-Aware
Microprocessors . . . . . . . . . . . . 276--276
Michael D. Powell and
T. N. Vijaykumar Exploiting Resonant Behavior to Reduce
Inductive Noise . . . . . . . . . . . . 288--288
J. Adam Butts and
Gurindar S. Sohi Use-Based Register Caching with
Decoupled Indexing . . . . . . . . . . . 302--302
Ruben Gonzalez and
Adrian Cristal and
Daniel Ortega and
Alexander Veidenbaum and
Mateo Valero A Content Aware Integer Register File
Organization . . . . . . . . . . . . . . 314--314
Mikko H. Lipasti and
Brian R. Mestan and
Erika Gunadi Physical Register Inlining . . . . . . . 325--325
Tejas S. Karkhanis and
James E. Smith A First-Order Superscalar Processor
Model . . . . . . . . . . . . . . . . . 338--338
Lieven Eeckhout and
Robert H. Bell Jr. and
Bastiaan Stougie and
Koen De Bosschere and
Lizy K. John Control Flow Modeling in Statistical
Simulation for Accurate and Efficient
Processor Design Studies . . . . . . . . 350--350
Bharath Iyer and
Sadagopan Srinivasan and
Bruce Jacob Extended Split-Issue: Enabling
Flexibility in the Hardware
Implementation of NUAL VLIW DSPs . . . . 364--364
Angshuman Parashar and
Sudhanva Gurumurthi and
Anand Sivasubramaniam A Complexity-Effective Approach to ALU
Bandwidth Enhancement for
Instruction-Level Temporal Redundancy 376--376
Anonymous Author Index . . . . . . . . . . . . . . 387--387
Adrián Cristal and
José F. Martínez and
Josep Llosa and
Mateo Valero A case for resource-conscious
out-of-order processors: towards
kilo-instruction in-flight processors 3--10
Partha Kundu and
Murali Annavaram and
Trung Diep and
John Shen A case for shared instruction cache on
chip multiprocessors running OLTP . . . 11--18
N. Venkateswaran and
Waran Research Foundation and
Aditya Krishnan and
S. Niranjan Kumar and
Arrvindh Shriraman and
Srinivas Sridharan Memory in processor: a novel design
paradigm for supercomputing
architectures . . . . . . . . . . . . . 19--26
I. Branovic and
R. Giorgi and
E. Martinelli A workload characterization of elliptic
curve cryptography methods in embedded
environments . . . . . . . . . . . . . . 27--34
K. Brifault and
H. P. Charles Data cache management on EPIC
architecture: optimizing memory access
for image processing . . . . . . . . . . 35--42
Naohiko Shimizu and
Chiaki Kon Java object look aside buffer for
embedded applications . . . . . . . . . 43--49
Akihito Sakanaka and
Seiichirou Fujii and
Toshinori Sato A leakage-energy-reduction technique for
highly-associative caches in embedded
systems . . . . . . . . . . . . . . . . 50--54
S. Moch and
M. Bereković and
H. J. Stolberg and
L. Friebe and
M. B. Kulaczewski and
A. Dehnhardt and
P. Pirsch HIBRID-SOC: a multi-core architecture
for image and video applications . . . . 55--61
Mladen Berekovic and
Sören Moch and
Peter Pirsch A scalable, clustered SMT processor for
digital signal processing . . . . . . . 62--69
S. Bartolini and
C. A. Prete A proposal for input-sensitivity
analysis of profile-driven optimizations
on embedded applications . . . . . . . . 70--77
Mark Thorson Internet Nuggets . . . . . . . . . . . . 78--83
John R. Mashey War of the benchmark means: time for a
truce . . . . . . . . . . . . . . . . . 1--14
Jean-Louis Lafitte 40 years later ... a new engine to
handle an operating system
infrastructure . . . . . . . . . . . . . 15--22
Mark Thorson Internet Nuggets . . . . . . . . . . . . 23--41
Lance Hammond and
Brian D. Carlstrom and
Vicky Wong and
Ben Hertzberg and
Mike Chen and
Christos Kozyrakis and
Kunle Olukotun Programming with transactional coherence
and consistency (TCC) . . . . . . . . . 1--13
Mihai Budiu and
Girish Venkataramani and
Tiberiu Chelcea and
Seth Copen Goldstein Spatial computation . . . . . . . . . . 14--26
Virantha Ekanayake and
Clinton Kelly IV and
Rajit Manohar An ultra low-power processor for sensor
networks . . . . . . . . . . . . . . . . 27--36
Christopher R. Lumb and
Richard Golding D-SPTF: decentralized request
distribution in brick-based storage
systems . . . . . . . . . . . . . . . . 37--47
Yasushi Saito and
Svend Frølund and
Alistair Veitch and
Arif Merchant and
Susan Spence FAB: building distributed enterprise
disk arrays from commodity components 48--58
Timothy E. Denehy and
John Bent and
Florentina I. Popovici and
Andrea C. Arpaci-Dusseau and
Remzi H. Arpaci-Dusseau Deconstructing storage arrays . . . . . 59--71
Xiaotong Zhuang and
Tao Zhang and
Santosh Pande HIDE: an infrastructure for efficiently
protecting information leakage on the
address bus . . . . . . . . . . . . . . 72--84
G. Edward Suh and
Jae W. Lee and
David Zhang and
Srinivas Devadas Secure program execution via dynamic
information flow tracking . . . . . . . 85--96
Jaehyuk Huh and
Jichuan Chang and
Doug Burger and
Gurindar S. Sohi Coherence decoupling: making use of
incoherence . . . . . . . . . . . . . . 97--106
Srikanth T. Srinivasan and
Ravi Rajwar and
Haitham Akkary and
Amit Gandhi and
Mike Upton Continual flow pipelines . . . . . . . . 107--119
Rajagopalan Desikan and
Simha Sethumadhavan and
Doug Burger and
Stephen W. Keckler Scalable selective re-execution for EDGE
architectures . . . . . . . . . . . . . 120--132
John Regehr and
Alastair Reid HOIST: a system for automatically
deriving static analyzers for embedded
systems . . . . . . . . . . . . . . . . 133--143
Perry H. Wang and
Jamison D. Collins and
Hong Wang and
Dongkeun Kim and
Bill Greene and
Kai-Ming Chan and
Aamir B. Yunus and
Terry Sych and
Stephen F. Moore and
John P. Shen Helper threads via virtual
multithreading on an experimental
Itanium-2 processor-based platform . . . 144--155
Matthias Hauswirth and
Trishul M. Chilimbi Low-overhead memory leak detection using
adaptive statistical profiling . . . . . 156--164
Xipeng Shen and
Yutao Zhong and
Chen Ding Locality phase prediction . . . . . . . 165--176
Pin Zhou and
Vivek Pandey and
Jagadeesan Sundaresan and
Anand Raghuraman and
Yuanyuan Zhou and
Sanjeev Kumar Dynamic tracking of page miss ratio
curve for memory management . . . . . . 177--188
Rodric M. Rabbah and
Hariharan Sandanagobalane and
Mongkol Ekpanyapong and
Weng-Fai Wong Compiler orchestrated prefetching via
speculation and predication . . . . . . 189--198
Chen-Yong Cher and
Antony L. Hosking and
T. N. Vijaykumar Software prefetching for mark-sweep
garbage collection: hardware analysis
and software redesign . . . . . . . . . 199--210
David E. Lowell and
Yasushi Saito and
Eileen J. Samberg Devirtualizable virtual machines
enabling general, single-node, online
maintenance . . . . . . . . . . . . . . 211--223
Jared C. Smolens and
Brian T. Gold and
Jangwoo Kim and
Babak Falsafi and
James C. Hoe and
Andreas G. Nowatzyk Fingerprinting: bounding soft-error
detection latency and bandwidth . . . . 224--234
Greg Bronevetsky and
Daniel Marques and
Keshav Pingali and
Peter Szwed and
Martin Schulz Application-level checkpointing for
shared memory programs . . . . . . . . . 235--247
Qiang Wu and
Philo Juang and
Margaret Martonosi and
Douglas W. Clark Formal online methods for
voltage/frequency control in multiple
clock domain microprocessors . . . . . . 248--259
Mohamed Gomaa and
Michael D. Powell and
T. N. Vijaykumar Heat-and-run: leveraging SMT and CMP to
manage power density through the
operating system . . . . . . . . . . . . 260--270
Xiaodong Li and
Zhenmin Li and
Francis David and
Pin Zhou and
Yuanyuan Zhou and
Sarita Adve and
Sanjeev Kumar Performance directed energy management
for main memory and disks . . . . . . . 271--283
David M. Chess Security in autonomic computing . . . . 2--5
Weidong Shi and
Hsien-Hsin S. Lee and
Chenghuai Lu and
Mrinmoy Ghosh Towards the issues in architectural
support for protection of software
execution . . . . . . . . . . . . . . . 6--15
John P. McGregor and
Ruby B. Lee Protecting cryptographic keys and
computations via virtual secure
coprocessing . . . . . . . . . . . . . . 16--26
Brian Rogers and
Yan Solihin and
Milos Prvulovic Memory predecryption: hiding the latency
overhead of memory encryption . . . . . 27--33
David A. Holland and
Ada T. Lim and
Margo I. Seltzer An architecture a day keeps the hacker
away . . . . . . . . . . . . . . . . . . 34--41
Stelios Sidiroglou and
Michael E. Locasto and
Angelos D. Keromytis Hardware support for self-healing
software services . . . . . . . . . . . 42--47
Jedidiah R. Crandall and
Frederic T. Chong A security assessment of the Minos
architecture . . . . . . . . . . . . . . 48--57
Matthew Burnside and
Angelos D. Keromytis The case for crypto protocol awareness
inside the OS kernel . . . . . . . . . . 58--64
Marc L. Corliss and
E. Christopher Lewis and
Amir Roth Using DISE to protect return addresses
from attack . . . . . . . . . . . . . . 65--72
Dong Ye and
David Kaeli A reliable return address stack:
microarchitectural features to defeat
stack smashing . . . . . . . . . . . . . 73--80
Koji Inoue Energy-security tradeoff in a secure
cache architecture against buffer
overflow attacks . . . . . . . . . . . . 81--89
Derek Uluski and
Micha Moffie and
David Kaeli Characterizing antivirus workload
execution . . . . . . . . . . . . . . . 90--98
Monther Aldwairi and
Thomas Conte and
Paul Franzon Configurable string matching hardware
for speeding up intrusion detection . . 99--107
Milena Milenković and
Aleksandar Milenković and
Emil Jovanov Using instruction block signatures to
counter code injection attacks . . . . . 108--117
Youtao Zhang and
Jun Yang and
Yongjing Lin and
Lan Gao Architectural support for protecting
user privacy on trusted processors . . . 118--123
Masaaki Shirase and
Yasushi Hibino An architecture for elliptic curve
cryptography computation . . . . . . . . 124--133
Taeho Kgil and
Laura Falk and
Trevor Mudge ChipLock: support for secure
microarchitectures . . . . . . . . . . . 134--143
Magnus Ekman and
Fredrik Warg and
Jim Nilsson An in-depth look at computer performance
growth . . . . . . . . . . . . . . . . . 144--147
N. Venkateswaran and
S. Balaji and
V. Sridhar Fault tolerant bus architecture for deep
submicron based processors . . . . . . . 148--155
Mark Thorson Internet Nuggets . . . . . . . . . . . . 156--160
Ruby B. Lee and
Peter C. S. Kwan and
John P. McGregor and
Jeffrey Dwoskin and
Zhenghong Wang Architecture for Protecting Critical
Secrets in Microprocessors . . . . . . . 2--13
Anonymous General Chair's Message . . . . . . . . 9--9
Anonymous Program Chair's Message . . . . . . . . x--xv
Weidong Shi and
Hsien-Hsin S. Lee and
Mrinmoy Ghosh and
Chenghuai Lu and
Alexandra Boldyreva High Efficiency Counter Mode Security
Architecture via Prediction and
Precomputation . . . . . . . . . . . . . 14--24
Anonymous Committees . . . . . . . . . . . . . . . 16--16
Anonymous Reviewers . . . . . . . . . . . . . . . xvii--xviii
G. Edward Suh and
Charles W. O'Donnell and
Ishan Sachdev and
Srinivas Devadas Design and Implementation of the AEGIS
Single-Chip Secure Processor Using
Physical Random Functions . . . . . . . 25--36
Sudhanva Gurumurthi and
Anand Sivasubramaniam and
Vivek K. Natarajan Disk Drive Roadmap from the Thermal
Perspective: a Case for Dynamic Thermal
Management . . . . . . . . . . . . . . . 38--49
Ram Huggahalli and
Ravi Iyer and
Scott Tetrick Direct Cache Access for High Bandwidth
Network I/O . . . . . . . . . . . . . . 50--59
Haryadi S. Gunawi and
Nitin Agrawal and
Andrea C. Arpaci-Dusseau and
Remzi H. Arpaci-Dusseau and
Jiri Schindler Deconstructing Commodity Storage
Clusters . . . . . . . . . . . . . . . . 60--71
Magnus Ekman and
Per Stenström A Robust Main-Memory Compression Scheme 74--85
Brian Fahs and
Todd Rafacz and
Sanjay J. Patel and
Steven S. Lumetta Continuous Optimization . . . . . . . . 86--97
Vlad Petric and
Tingting Sha and
Amir Roth RENO: a Rename-Based Instruction
Optimizer . . . . . . . . . . . . . . . 98--109
Lin Tan and
Timothy Sherwood A High Throughput String Matching
Architecture for Intrusion Detection and
Prevention . . . . . . . . . . . . . . . 112--122
Florin Baboescu and
Dean M. Tullsen and
Grigore Rosu and
Sumeet Singh A Tree Based Router Search Engine
Architecture with Single Port Memories 123--133
Shorin Kyo and
Shin'ichiro Okazaki and
Tamio Arai An Integrated Memory Array Processor
Architecture for Embedded Image
Recognition Systems . . . . . . . . . . 134--145
George A. Reis and
Jonathan Chang and
Neil Vachharajani and
Ram Rangan and
David I. August and
Shubhendu S. Mukherjee Design and Evaluation of Hybrid
Fault-Detection Systems . . . . . . . . 148--159
Ethan Schuchman and
T. N. Vijaykumar Rescue: a Microarchitecture for
Testability and Defect Tolerance . . . . 160--171
Mohamed A. Gomaa and
T. N. Vijaykumar Opportunistic Transient-Fault Detection 172--183
Steven Balensiefer and
Lucas Kregor-Stickles and
Mark Oskin An Evaluation Framework and Instruction
Set Architecture for Ion-Trap Based
Quantum Micro-Architectures . . . . . . 186--196
Leyla Nazhandali and
Bo Zhai and
Javin Olson and
Anna Reeves and
Michael Minuth and
Ryan Helfand and
Sanjay Pant and
Todd Austin and
David Blaauw Energy Optimization of
Subthreshold-Voltage Sensor Network
Processors . . . . . . . . . . . . . . . 197--207
Mark Hempstead and
Nikhil Tripathi and
Patrick Mauro and
Gu-Yeon Wei and
David Brooks An Ultra Low Power System Architecture
for Sensor Network Applications . . . . 208--219
Thomas F. Wenisch and
Stephen Somogyi and
Nikolaos Hardavellas and
Jangwoo Kim and
Anastassia Ailamaki and
Babak Falsafi Temporal Streaming of Shared Memory . . 222--233
Andreas Moshovos RegionScout: Exploiting Coarse Grain
Sharing in Snoop-Based Coherence . . . . 234--245
Jason F. Cantin and
Mikko H. Lipasti and
James E. Smith Improving Multiprocessor Performance
with Coarse-Grain Coherence Tracking . . 246--257
Stephen Hines and
Joshua Green and
Gary Tyson and
David Whalley Improving Program Efficiency by Packing
Instructions into Registers . . . . . . 260--271
Nathan Clark and
Jason Blome and
Michael Chu and
Scott Mahlke and
Stuart Biles and
Krisztian Flautner An Architecture Framework for
Transparent Instruction Set
Customization in Embedded Processors . . 272--283
Satish Narayanasamy and
Gilles Pokam and
Brad Calder BugNet: Continuously Recording Program
Execution for Deterministic Replay
Debugging . . . . . . . . . . . . . . . 284--295
Murali Annavaram and
Ed Grochowski and
John Shen Mitigating Amdahl's Law through EPI
Throttling . . . . . . . . . . . . . . . 298--309
Emil Talpes and
Diana Marculescu Increased Scalability and Power
Efficiency by Using Multiple Speed
Pipelines . . . . . . . . . . . . . . . 310--321
Vlad Petric and
Amir Roth Energy-Effectiveness of Pre-Execution
and Energy-Aware P-Thread Selection . . 322--333
Michael Zhang and
Krste Asanović Victim Replication: Maximizing Capacity
while Hiding Wire Delay in Tiled Chip
Multiprocessors . . . . . . . . . . . . 336--345
Evan Speight and
Hazim Shafi and
Lixin Zhang and
Ram Rajamony Adaptive Mechanisms and Policies for
Managing Cache Hierarchies in Chip
Multiprocessors . . . . . . . . . . . . 346--356
Zeshan Chishti and
Michael D. Powell and
T. N. Vijaykumar Optimizing Replication, Communication,
and Capacity Allocation in CMPs . . . . 357--368
Onur Mutlu and
Hyesoon Kim and
Yale N. Patt Techniques for Efficient Processing in
Runahead Execution Engines . . . . . . . 370--381
Daniel A. Jiménez Piecewise Linear Branch Prediction . . . 382--393
André Seznec Analysis of the O-GEometric History
Length Branch Predictor . . . . . . . . 394--405
Rakesh Kumar and
Victor Zyuban and
Dean M. Tullsen Interconnections in Multi-Core
Architectures: Understanding Mechanisms,
Overheads and Scaling . . . . . . . . . 408--419
John Kim and
William J. Dally and
Brian Towles and
Amit K. Gupta Microarchitecture of a High-Radix Router 420--431
Daeho Seo and
Akif Ali and
Won-Taek Lim and
Nauman Rafique and
Mithuna Thottethodi Near-Optimal Worst-Case Throughput
Routing for Two-Dimensional Mesh
Networks . . . . . . . . . . . . . . . . 432--443
Amit Gandhi and
Haitham Akkary and
Ravi Rajwar and
Srikanth T. Srinivasan and
Konrad Lai Scalable Load and Store Processing in
Latency Tolerant Processors . . . . . . 446--457
Amir Roth Store Vulnerability Window (SVW):
Re-Execution Filtering for Enhanced Load
Optimization . . . . . . . . . . . . . . 458--468
E. F. Torres and
P. Ibanez and
V. Vinals and
J. M. Llaberia Store Buffer Design in First-Level
Multibanked Data Caches . . . . . . . . 469--480
Albert Meixner and
Daniel J. Sorin Dynamic Verification of Sequential
Consistency . . . . . . . . . . . . . . 482--493
Ravi Rajwar and
Maurice Herlihy and
Konrad Lai Virtualizing Transactional Memory . . . 494--505
Saisanthosh Balakrishnan and
Ravi Rajwar and
Mike Upton and
Konrad Lai The Impact of Performance Asymmetry in
Emerging Multicore Architectures . . . . 506--517
Jayanth Srinivasan and
Sarita V. Adve and
Pradip Bose and
Jude A. Rivers Exploiting Structural Duplication for
Lifetime Reliability Enhancement . . . . 520--531
Arijit Biswas and
Paul Racunas and
Razvan Cheveresan and
Joel Emer and
Shubhendu S. Mukherjee and
Ram Rangan Computing Architectural Vulnerability
Factors for Address-Based Structures . . 532--543
Moinuddin K. Qureshi and
David Thompson and
Yale N. Patt The V-Way Cache: Demand Based
Associativity via Global Replacement . . 544--555
Anonymous Author Index . . . . . . . . . . . . . . 556--557
S. Bartolini and
P. Foglia and
C. A. Prete Guests editors' introduction . . . . . . 1--2
Hanene Ben Fradj and
Asmaa el Ouardighi and
Cécile Belleudy and
Michel Auguin Energy aware memory architecture
configuration . . . . . . . . . . . . . 3--9
Hyo-Joong Suh and
Sung Woo Chung DRACO: optimized CC-NUMA system with
novel dual-link interconnections to
reduce the memory latency . . . . . . . 10--16
Sami Yehia and
Jean-François Collard and
Olivier Temam Load squared: adding logic close to
memory to reduce the latency of indirect
loads with high miss ratios . . . . . . 17--24
Hiroaki Kobayashi and
Isao Kotera and
Hiroyuki Takizawa Locality analysis to control dynamically
way-adaptable caches . . . . . . . . . . 25--32
F. Arakawa and
M. Ishikawa and
Y. Kondo and
T. Kamei and
M. Ozawa and
O. Nishii and
T. Hattori SH-X: an embedded processor core for
consumer appliances . . . . . . . . . . 33--40
Afrin Naz and
Mehran Rezaei and
Krishna Kavi and
Philip Sweany Improving data cache performance with
integrated use of split caches, victim
cache and stream buffers . . . . . . . . 41--48
Alex Pajuelo and
Antonio González and
Mateo Valero Speculative execution for hiding memory
latency . . . . . . . . . . . . . . . . 49--56
Javier Verdú and
Jorge García and
Mario Nemirovsky and
Mateo Valero The impact of traffic aggregation on the
memory performance of networking
applications . . . . . . . . . . . . . . 57--62
Bramha Allu and
Wei Zhang Exploiting the replication cache to
improve performance for multiple-issue
microprocessors . . . . . . . . . . . . 63--71
Mark Thorson Internet nuggets . . . . . . . . . . . . 72--74
Anonymous MEDEA 2004 workshop . . . . . . . . . . ??
Norman P. Jouppi and
Rakesh Kumar and
Dean Tullsen Introduction to the special issue on the
2005 Workshop on Design, Analysis, and
Simulation of Chip Multiprocessors
(dasCMP'05) . . . . . . . . . . . . . . 4--4
James Laudon Performance/Watt: the new server focus 5--13
John D. Davis and
Cong Fu and
James Laudon The RASE (Rapid, Accurate Simulation
Environment) for chip multiprocessors 14--23
Lisa Hsu and
Ravi Iyer and
Srihari Makineni and
Steve Reinhardt and
Donald Newell Exploring the cache design space for
large scale CMPs . . . . . . . . . . . . 24--33
John D. Davis and
Stephen E. Richardson and
Charis Charitsis and
Kunle Olukotun A chip prototyping substrate: the
flexible architecture for simulation and
testing (FAST) . . . . . . . . . . . . . 34--43
Neil Vachharajani and
Matthew Iyer and
Chinmay Ashok and
Manish Vachharajani and
David I. August and
Daniel Connors Chip multi-processor scalability for
single-threaded applications . . . . . . 44--53
Julia Chen and
Philo Juang and
Kevin Ko and
Gilberto Contreras and
David Penry and
Ram Rangan and
Adam Stoler and
Li-Shiuan Peh and
Margaret Martonosi Hardware-modulated parallelism in chip
multiprocessors . . . . . . . . . . . . 54--63
Jack Sampson and
Rubén González and
Jean-François Collard and
Norman P. Jouppi and
Mike Schlansker Fast synchronization for chip
multiprocessors . . . . . . . . . . . . 64--69
Anahita Shayesteh and
Glenn Reinman and
Norman Jouppi and
Suleyman Sair and
Tim Sherwood Dynamically configurable shared CMP
helper engines for improved performance 70--79
Theofanis Constantinou and
Yiannakis Sazeides and
Pierre Michaud and
Damien Fetis and
André Seznec Performance implications of single
thread migration on a chip multi-core 80--91
Milo M. K. Martin and
Daniel J. Sorin and
Bradford M. Beckmann and
Michael R. Marty and
Min Xu and
Alaa R. Alameldeen and
Kevin E. Moore and
Mark D. Hill and
David A. Wood Multifacet's general execution-driven
multiprocessor simulator (GEMS) toolset 92--99
David Wang and
Brinda Ganesh and
Nuengwong Tuaycharoen and
Kathleen Baynes and
Aamer Jaleel and
Bruce Jacob DRAMsim: a memory system simulator . . . 100--107
Barry Rountree and
Robert Springer and
David K. Lowenthal and
Vincent W. Freeh Notes from HPPAC 2005 . . . . . . . . . 108--112
H. C. Wang and
C. K. Yuen A general framework to build new CPUs by
mapping abstract machine code to
instruction level parallel execution
hardware . . . . . . . . . . . . . . . . 113--120
Nana B. Sam and
Martin Burtscher Improving memory system performance with
energy-efficient value speculation . . . 121--127
Mark Thorson Internet Nuggets . . . . . . . . . . . . 128--133
David Kaeli and
Robert Cohn WBIA'05: Introduction to the special
issue . . . . . . . . . . . . . . . . . 1--2
Chunling Hu and
John McCabe and
Daniel A. Jiménez and
Ulrich Kremer The Camino Compiler infrastructure . . . 3--8
Martin Schulz and
Dong Ahn and
Andrew Bernat and
Bronis R. de Supinski and
Steven Y. Ko and
Gregory Lee and
Barry Rountree Scalable dynamic binary instrumentation
for Blue Gene/L . . . . . . . . . . . . 9--14
Edson Borin and
Cheng Wang and
Youfeng Wu and
Guido Araujo Dynamic binary control-flow errors
detection . . . . . . . . . . . . . . . 15--20
Micha Moffie and
David Kaeli ASM: application security monitor . . . 21--26
Qin Zhao and
Rodric Rabbah and
Weng-Fai Wong Dynamic memory optimization using pool
allocation and prefetching . . . . . . . 27--32
Xiaofeng Gao and
Beth Simon and
Allan Snavely ALITER: an asynchronous lightweight
instrumentation tool for event recording 33--38
Collin McCurdy and
Charles Fischer Using Pin as a memory reference
generator for multiprocessor simulation 39--44
Heidi Pan and
Krste Asanović and
Robert Cohn and
Chi-Keung Luk Controlling program execution through
binary instrumentation . . . . . . . . . 45--50
Nikrouz Faroughi Profiling of parallel processing
programs on shared memory
multiprocessors using Simics . . . . . . 51--56
Naveen Kumar and
Ramesh Peri Transparent debugging of dynamically
instrumented programs . . . . . . . . . 57--62
Laune C. Harris and
Barton P. Miller Practical analysis of stripped binary
code . . . . . . . . . . . . . . . . . . 63--68
Vijay Janapa Reddi and
Dan Connors and
Robert S. Cohn Persistence in dynamic code
transformation systems . . . . . . . . . 69--74
Ram Srinivasan and
Olaf Lubeck MonteSim: a Monte Carlo performance
model for in-order microarchitectures 75--80
Michael Laurenzano and
Beth Simon and
Allan Snavely and
Meghan Gunn Low cost trace-driven memory simulation
using SimPoint . . . . . . . . . . . . . 81--86
Mark Thorson Internet Nuggets . . . . . . . . . . . . 87--93
S. Bartolini and
P. Foglia and
R. Giorgi and
C. A. Prete Memory performance: dealing with
applications, systems and architecture 1--2
Scott Friedman and
Praveen Krishnamurthy and
Roger Chamberlain and
Ron K. Cytron and
Jason E. Fritts Dusty caches for reference counting
garbage collection . . . . . . . . . . . 3--10
Subramanian Ramaswamy and
Jaswanth Sreeram and
Sudhakar Yalamanchili and
Krishna V. Palem Data trace cache: an application
specific cache architecture . . . . . . 11--18
Afrin Naz and
Krishna Kavi and
Mehran Rezaei and
Wentong Li Making a case for split data caches for
embedded applications . . . . . . . . . 19--26
B. Allu and
W. Zhang and
M. Kandala Exploiting the replication cache to
improve cache read bandwidth cost
effectively . . . . . . . . . . . . . . 27--32
Matteo Monchiero and
Gianluca Palermo and
Cristina Silvano and
Oreste Villa An efficient synchronization technique
for multiprocessor systems on-chip . . . 33--40
Farshad Khunjush and
Nikitas J. Dimopoulos Hiding message delivery and reducing
memory access latency by providing
direct-to-cache transfer during receive
operations in a message passing
environment . . . . . . . . . . . . . . 41--48
Yao Yue and
Chuang Lin and
Zhangxi Tan NPCryptBench: a cryptographic benchmark
suite for network processors . . . . . . 49--56
Abelardo López-Lagunas and
Sek M. Chai Memory bandwidth optimization through
stream descriptors . . . . . . . . . . . 57--64
Akihiro Chiyonobu and
Toshinori Sato Energy-efficient instruction scheduling
utilizing cache miss information . . . . 65--70
Alessandro Bardine and
Alessio Bechini and
Pierfrancesco Foglia and
Cosimo Antonio Prete Analysis of embedded video coder
systems: a system-level approach . . . . 71--76
Alex Gontmakher and
Assaf Schuster and
Avi Mendelson Inthreads: a low granularity
parallelization model . . . . . . . . . 77--80
Mark Thorson Internet nuggets . . . . . . . . . . . . 81--86
Yale Patt Computer Architecture Research and
Future Microprocessors: Where Do We Go
from Here? . . . . . . . . . . . . . . . 2--2
Jongman Kim and
Chrysostomos Nicopoulos and
Dongkook Park A Gracefully Degrading and
Energy-Efficient Modular Router
Architecture for On-Chip Networks . . . 4--15
Anonymous Message from the General Chair . . . . . 10--10
Anonymous Message from the Program Chair . . . . . 11--11
Anonymous Reviewers . . . . . . . . . . . . . . . 14--14
Steve Scott and
Dennis Abts and
John Kim and
William J. Dally The BlackWidow High-Radix Clos Network 16--28
Anonymous SIGARCH Guidelines . . . . . . . . . . . 17--17
Arvind and
Jan-Willem Maessen Memory Model = Instruction Reordering
+ Store Atomicity . . . . . . . . . . 29--40
Christoph von Praun and
Harold W. Cain and
Jong-Deok Choi and
Kyung Dong Ryu Conditional Memory Ordering . . . . . . 41--52
Austen McDonald and
JaeWoong Chung and
Brian D. Carlstrom and
Chi Cao Minh and
Hassan Chafi and
Christos Kozyrakis and
Kunle Olukotun Architectural Semantics for Practical
Transactional Memory . . . . . . . . . . 53--65
Parthasarathy Ranganathan and
Phil Leech and
David Irwin and
Jeffrey Chase Ensemble-level Power Management for
Dense Blade Servers . . . . . . . . . . 66--77
James Donald and
Margaret Martonosi Techniques for Multicore Thermal
Management: Classification and New
Exploration . . . . . . . . . . . . . . 78--88
Yuan Lin and
Hyunseok Lee and
Mark Woh and
Yoav Harel and
Scott Mahlke and
Trevor Mudge and
Chaitali Chakrabarti and
Krisztian Flautner SODA: a Low-power Architecture For
Software Radio . . . . . . . . . . . . . 89--101
Weidong Shi and
Hsien-Hsin S. Lee and
Laura Falk and
Mrinmoy Ghosh An Integrated Framework for Dependable
and Revivable Architectures Using
Multicore Processors . . . . . . . . . . 102--113
Richard A. Hankins and
Gautham N. Chinya and
Jamison D. Collins and
Perry H. Wang and
Ryan Rakvic and
Hong Wang and
John P. Shen Multiple Instruction Stream Processor 114--127
Philip Emma The End of Scaling? Revolutions in
Technology and Microarchitecture as We
Pass the 90 Nanometer Node . . . . . . . 128--128
Feihui Li and
Chrysostomos Nicopoulos and
Thomas Richardson and
Yuan Xie and
Vijaykrishnan Narayanan and
Mahmut Kandemir Design and Management of 3D Chip
Multiprocessors Using Network-in-Memory 130--141
Alok Garg and
M. Wasiur Rashid and
Michael Huang Slackened Memory Dependence Enforcement:
Combining Opportunistic Forwarding with
Decoupled Verification . . . . . . . . . 142--154
Chuanjun Zhang Balanced Cache: Reducing Conflict Misses
of Direct-Mapped Caches . . . . . . . . 155--166
Moinuddin K. Qureshi and
Daniel N. Lynch and
Onur Mutlu and
Yale N. Patt A Case for MLP-Aware Cache Replacement 167--178
Chenyu Yan and
Daniel Englender and
Milos Prvulovic and
Brian Rogers and
Yan Solihin Improving Cost, Performance, and
Security of Memory Encryption and
Authentication . . . . . . . . . . . . . 179--190
Benjamin C. Brodie and
David E. Taylor and
Ron K. Cytron A Scalable Architecture For
High-Throughput Regular-Expression
Pattern Matching . . . . . . . . . . . . 191--202
Jahangir Hasan and
Srihari Cadambi and
Venkatta Jakkula and
Srimat Chakradhar Chisel: a Storage-efficient,
Collision-free Hash-based Network
Processing Architecture . . . . . . . . 203--215
Christopher B. Colohan and
Anastassia Ailamaki and
J. Gregory Steffan and
Todd C. Mowry Tolerating Dependences Between Large
Speculative Threads Via Sub-Threads . . 216--226
Luis Ceze and
James Tuck and
Josep Torrellas and
Calin Cascaval Bulk Disambiguation of Speculative
Threads in Multiprocessors . . . . . . . 227--238
Seungryul Choi and
Donald Yeung Learning-Based SMT Processor Resource
Distribution via Hill-Climbing . . . . . 239--251
Stephen Somogyi and
Thomas F. Wenisch and
Anastassia Ailamaki and
Babak Falsafi and
Andreas Moshovos Spatial Memory Streaming . . . . . . . . 252--263
Jichuan Chang and
Gurindar S. Sohi Cooperative Caching for Chip
Multiprocessors . . . . . . . . . . . . 264--276
Shiliang Hu and
James E. Smith Reducing Startup Time in Co-Designed
Virtual Machines . . . . . . . . . . . . 277--288
Qing Yang and
Weijun Xiao and
Jin Ren TRAP-Array: a Disk Array Architecture
Providing Timely Recovery to Any
Point-in-time . . . . . . . . . . . . . 289--301
Saisanthosh Balakrishnan and
Gurindar S. Sohi Program Demultiplexing: Data-flow based
Speculative Parallelization of Methods
in Sequential Programs . . . . . . . . . 302--313
Steven Swanson and
Andrew Putnam and
Martha Mercaldi and
Ken Michelson and
Andrew Petersen and
Andrew Schwerin and
Mark Oskin and
Susan J. Eggers Area-Performance Trade-offs in Tiled
Dataflow Architectures . . . . . . . . . 314--326
Karin Strauss and
Xiaowei Shen and
Josep Torrellas Flexible Snooping: Adaptive Forwarding
and Filtering of Snoops in Embedded-Ring
Multiprocessors . . . . . . . . . . . . 327--338
Liqun Cheng and
Naveen Muralimanohar and
Karthik Ramani and
Rajeev Balasubramonian and
John B. Carter Interconnect-Aware Coherence Protocols
for Chip Multiprocessors . . . . . . . . 339--351
Steve Herrod The Future of Virtualization Technology 352--352
Rodney Van Meter and
Kae Nemoto and
W. J. Munro and
Kohei M. Itoh Distributed Arithmetic on a Quantum
Multicomputer . . . . . . . . . . . . . 354--365
Nemanja Isailovic and
Yatish Patel and
Mark Whitney and
John Kubiatowicz Interconnection Networks for Scalable
Quantum Computers . . . . . . . . . . . 366--377
Darshan D. Thaker and
Tzvetan S. Metodi and
Andrew W. Cross and
Isaac L. Chuang and
Frederic T. Chong Quantum Memory Hierarchies: Efficient
Designs to Match Available Parallelism
in Quantum Computing . . . . . . . . . . 378--390
Anonymous Author Index . . . . . . . . . . . . . . 391--391
Martin Burtscher TCgen 2.0: a tool to automatically
generate lossless trace compressors . . 1--8
Abhas Kumar and
Nisheet Jain and
Mainak Chaudhuri Long-latency branches: how much do they
matter? . . . . . . . . . . . . . . . . 9--15
Mark Thorson Internet nuggets . . . . . . . . . . . . 16--21
John L. Henning SPEC CPU2006 benchmark descriptions . . 1--17
Daniel Citron and
Adham Hurani and
Alaa Gnadrey The harmonic or geometric mean: does it
really matter? . . . . . . . . . . . . . 18--25
James Poe and
Tao Li BASS: a benchmark suite for evaluating
architectural security systems . . . . . 26--33
Mark Thorson Internet nuggets . . . . . . . . . . . . 34--37
Mendel Rosenblum Impact of virtualization on computer
architecture and operating systems . . . 1--1
Keith Adams and
Ole Agesen A comparison of software and hardware
techniques for x86 virtualization . . . 2--13
Stephen T. Jones and
Andrea C. Arpaci-Dusseau and
Remzi H. Arpaci-Dusseau Geiger: monitoring the buffer cache in a
virtual machine environment . . . . . . 14--24
Jedidiah R. Crandall and
Gary Wassermann and
Daniela A. S. de Oliveira and
Zhendong Su and
S. Felix Wu and
Frederic T. Chong Temporal search: detecting hidden
malware timebombs with virtual machines 25--36
Shan Lu and
Joseph Tucek and
Feng Qin and
Yuanyuan Zhou AVIO: detecting atomicity violations via
access interleaving invariants . . . . . 37--48
Min Xu and
Mark D. Hill and
Rastislav Bodik A regulated transitive reduction (RTR)
for longer memory race recording . . . . 49--60
Michael D. Bond and
Kathryn S. McKinley Bell: bit-encoding online memory leak
detection . . . . . . . . . . . . . . . 61--72
Smitha Shyam and
Kypros Constantinides and
Sujay Phadke and
Valeria Bertacco and
Todd Austin Ultra low-cost defect protection for
microprocessor pipelines . . . . . . . . 73--82
Vimal K. Reddy and
Eric Rotenberg and
Sailashri Parthasarathy Understanding prediction-based partial
redundant threading for low-overhead,
high-coverage fault tolerance . . . . . 83--94
Angshuman Parashar and
Anand Sivasubramaniam and
Sudhanva Gurumurthi SlicK: slice-based locality exploitation
for efficient redundant multithreading 95--105
Taliver Heath and
Ana Paula Centeno and
Pradeep George and
Luiz Ramos and
Yogesh Jaluria Mercury and Freon: temperature emulation
and management for server systems . . . 106--116
Taeho Kgil and
Shaun D'Souza and
Ali Saidi and
Nathan Binkert and
Ronald Dreslinski and
Trevor Mudge and
Steven Reinhardt and
Krisztian Flautner PicoServer: using 3D stacking
technology to enable a compact energy
efficient chip multiprocessor . . . . . 117--128
Katherine E. Coons and
Xia Chen and
Doug Burger and
Kathryn S. McKinley and
Sundeep K. Kushwaha A spatial path scheduling algorithm for
EDGE architectures . . . . . . . . . . . 129--140
Martha Mercaldi and
Steven Swanson and
Andrew Petersen and
Andrew Putnam and
Andrew Schwerin and
Mark Oskin and
Susan J. Eggers Instruction scheduling for a tiled
dataflow architecture . . . . . . . . . 141--150
Michael I. Gordon and
William Thies and
Saman Amarasinghe Exploiting coarse-grained task, data,
and pipeline parallelism in stream
programs . . . . . . . . . . . . . . . . 151--162
Mahim Mishra and
Timothy J. Callahan and
Tiberiu Chelcea and
Girish Venkataramani and
Seth C. Goldstein and
Mihai Budiu Tartan: evaluating spatial computation
for whole program execution . . . . . . 163--174
Stijn Eyerman and
Lieven Eeckhout and
Tejas Karkhanis and
James E. Smith A performance counter architecture for
computing accurate CPI components . . . 175--184
Benjamin C. Lee and
David M. Brooks Accurate and efficient regression
modeling for microarchitectural
performance and power prediction . . . . 185--194
Engin İpek and
Sally A. McKee and
Rich Caruana and
Bronis R. de Supinski and
Martin Schulz Efficiently exploring architectural
design spaces via predictive modeling 195--206
Mazen Kharbutli and
Xiaowei Jiang and
Yan Solihin and
Guru Venkataramani and
Milos Prvulovic Comprehensively and efficiently
protecting the heap . . . . . . . . . . 207--218
Trishul M. Chilimbi and
Vinod Ganapathy HeapMD: identifying heap-based bugs
using anomaly detection . . . . . . . . 219--228
Satish Narayanasamy and
Cristiano Pereira and
Brad Calder Recording shared memory dependencies
using strata . . . . . . . . . . . . . . 229--240
Jaidev P. Patwardhan and
Vijeta Johri and
Chris Dwyer and
Alvin R. Lebeck A defect tolerant self-organizing
nanoscale SIMD architecture . . . . . . 241--251
Ethan Schuchman and
T. N. Vijaykumar A program transformation and
architecture support for quantum
uncomputation . . . . . . . . . . . . . 252--263
Shashidhar Mysore and
Banit Agrawal and
Navin Srivastava and
Sheng-Chih Lin and
Kaustav Banerjee and
Tim Sherwood Introspective 3D chips . . . . . . . . 264--273
Jason F. Cantin and
Mikko H. Lipasti and
James E. Smith Stealth prefetching . . . . . . . . . . 274--282
Koushik Chakraborty and
Philip M. Wells and
Gurindar S. Sohi Computation spreading: employing
hardware migration to specialize CMP
cores on-the-fly . . . . . . . . . . . . 283--292
Jason E. Miller and
Anant Agarwal Software-based instruction caching for
embedded processors . . . . . . . . . . 293--302
Xin Li and
Marian Boldt and
Reinhard von Hanxleden Mapping Esterel onto a multi-threaded
embedded processor . . . . . . . . . . . 303--314
Nathan L. Binkert and
Ali G. Saidi and
Steven K. Reinhardt Integrated network interfaces for
high-bandwidth TCP/IP . . . . . . . . . 315--324
David Tarditi and
Sidd Puri and
Jose Oglesby Accelerator: using data parallelism to
program GPUs for general-purpose uses 325--335
Peter Damron and
Alexandra Fedorova and
Yossi Lev Hybrid transactional memory . . . . . . 336--346
Weihaw Chuang and
Satish Narayanasamy and
Ganesh Venkatesh and
Jack Sampson and
Michael Van Biesbrouck and
Gilles Pokam and
Brad Calder and
Osvaldo Colavin Unbounded page-based transactional
memory . . . . . . . . . . . . . . . . . 347--358
Michelle J. Moravan and
Jayaram Bobba and
Kevin E. Moore and
Luke Yen and
Mark D. Hill and
Ben Liblit and
Michael M. Swift and
David A. Wood Supporting nested transactional memory
in logTM . . . . . . . . . . . . . . . . 359--370
JaeWoong Chung and
Chi Cao Minh and
Austen McDonald and
Travis Skare and
Hassan Chafi and
Brian D. Carlstrom and
Christos Kozyrakis and
Kunle Olukotun Tradeoffs in transactional memory
virtualization . . . . . . . . . . . . . 371--381
Motohiro Kawahito and
Hideaki Komatsu and
Takao Moriyama and
Hiroshi Inoue and
Toshio Nakatani A new idiom recognition framework for
exploiting hardware-assist instructions 382--393
Sorav Bansal and
Alex Aiken Automatic generation of peephole
superoptimizers . . . . . . . . . . . . 394--403
Armando Solar-Lezama and
Liviu Tancau and
Rastislav Bodik and
Sanjit Seshia and
Vijay Saraswat Combinatorial sketching for finite
programs . . . . . . . . . . . . . . . . 404--415
Jeff Da Silva and
J. Gregory Steffan A probabilistic pointer analysis for
speculative optimizations . . . . . . . 416--425
Dean Tullsen and
Rakesh Kumar and
Norman P. Jouppi Introduction to the special issue on the
2006 Workshop on Design, Analysis, and
Simulation of Chip Multiprocessors:
(dasCMP'06) . . . . . . . . . . . . . . 2--2
Aqeel Mahesri and
Nicholas J. Wang and
Sanjay J. Patel Hardware support for software controlled
multithreading . . . . . . . . . . . . . 3--12
Xudong Shi and
Feiqi Su and
Jih-kwon Peir and
Ye Xia and
Zhen Yang CMP cache performance projection:
accessibility vs. capacity . . . . . . . 13--20
Fei Guo and
Hari Kannan and
Li Zhao and
Ramesh Illikkal and
Ravi Iyer and
Don Newell and
Yan Solihin and
Christos Kozyrakis From chaos to QoS: case studies in CMP
resource management . . . . . . . . . . 21--30
Masaaki Kondo and
Hiroshi Sasaki and
Hiroshi Nakamura Improving fairness, throughput and
energy-efficiency on a chip
multiprocessor through DVFS . . . . . . 31--38
M. M. Waliullah and
Per Stenström Starvation-free commit arbitration
policies for transactional memory
systems . . . . . . . . . . . . . . . . 39--46
Cesare Ferri and
Tali Moreshet and
R. Iris Bahar and
Luca Benini and
Maurice Herlihy A hardware/software framework for
supporting transactional memory in a
MPSoC environment . . . . . . . . . . . 47--54
Sean Rul and
Hans Vandierendonck and
Koen De Bosschere Function level parallelism driven by
data dependencies . . . . . . . . . . . 55--62
John L. Henning Guest editor's introduction . . . . . . 63--64
John L. Henning SPEC CPU suite growth: an historical
perspective . . . . . . . . . . . . . . 65--68
Aashish Phansalkar and
Ajay Joshi and
Lizy K. John Subsetting the SPEC CPU2006 benchmark
suite . . . . . . . . . . . . . . . . . 69--76
Michael Wong C++ benchmarks in SPEC CPU2006 . . . . . 77--83
John L. Henning SPEC CPU2006 memory footprint . . . . . 84--89
Darryl Gove CPU2006 working set size . . . . . . . . 90--96
Wendy Korn and
Moon S. Chang SPEC CPU2006 sensitivity to memory page
sizes . . . . . . . . . . . . . . . . . 97--101
Reinhold P. Weicker and
John L. Henning Subroutine profiling results for the
CPU2006 benchmarks . . . . . . . . . . . 102--111
Dong Ye and
Joydeep Ray and
David Kaeli Characterization of file I/O activity
for SPEC CPU2006 . . . . . . . . . . . . 112--117
John L. Henning Performance counters and development of
SPEC CPU2006 . . . . . . . . . . . . . . 118--121
Darryl Gove and
Lawrence Spracklen Evaluating the correspondence between
training and reference workloads in SPEC
CPU2006 . . . . . . . . . . . . . . . . 122--129
Cloyce D. Spradling SPEC CPU2006 benchmark tools . . . . . . 130--134
Swaroop Sridhar and
Jonathan S. Shapiro and
Prashanth P. Bungale HDTrans: a low-overhead dynamic
translator . . . . . . . . . . . . . . . 135--140
Jun Yan and
Wei Zhang Hybrid multi-core architecture for
boosting single-threaded performance . . 141--148
Mark Thorson Internet nuggets . . . . . . . . . . . . 149--154
David E. Shaw and
Martin M. Deneroff and
Ron O. Dror and
Jeffrey S. Kuskin and
Richard H. Larson and
John K. Salmon and
Cliff Young and
Brannon Batson and
Kevin J. Bowers and
Jack C. Chao and
Michael P. Eastwood and
Joseph Gagliardo and
J. P. Grossman and
C. Richard Ho and
Douglas J. Ierardi and
István Kolossváry and
John L. Klepeis and
Timothy Layman and
Christine McLeavey and
Mark A. Moraes and
Rolf Mueller and
Edward C. Priest and
Yibing Shan and
Jochen Spengler and
Michael Theobald and
Brian Towles and
Stanley C. Wang Anton, a special-purpose machine for
molecular dynamics simulation . . . . . 1--12
Xiaobo Fan and
Wolf-Dietrich Weber and
Luiz Andre Barroso Power provisioning for a warehouse-sized
computer . . . . . . . . . . . . . . . . 13--23
Colin Blundell and
Joe Devietti and
E. Christopher Lewis and
Milo M. K. Martin Making the fast case common and the
uncommon case simple in unbounded
transactional memory . . . . . . . . . . 24--34
Weirong Zhu and
Vugranam C. Sreedhar and
Ziang Hu and
Guang R. Gao Synchronization state buffer: supporting
efficient fine-grain synchronization on
many-core architectures . . . . . . . . 35--45
Michael R. Marty and
Mark D. Hill Virtual hierarchies to support server
consolidation . . . . . . . . . . . . . 46--56
Kyle J. Nesbit and
James Laudon and
James E. Smith Virtual private caches . . . . . . . . . 57--68
Chi Cao Minh and
Martin Trautmann and
JaeWoong Chung and
Austen McDonald and
Nathan Bronson and
Jared Casper and
Christos Kozyrakis and
Kunle Olukotun An effective hybrid transactional memory
system with strong isolation guarantees 69--80
Jayaram Bobba and
Kevin E. Moore and
Haris Volos and
Luke Yen and
Mark D. Hill and
Michael M. Swift and
David A. Wood Performance pathologies in hardware
transactional memory . . . . . . . . . . 81--91
Hany E. Ramadan and
Christopher J. Rossbach and
Donald E. Porter and
Owen S. Hofmann and
Aditya Bhandari and
Emmett Witchel MetaTM/TxLinux: transactional memory for
an operating system . . . . . . . . . . 92--103
Arrvindh Shriraman and
Michael F. Spear and
Hemayet Hossain and
Virendra J. Marathe and
Sandhya Dwarkadas and
Michael L. Scott An integrated hardware-software approach
to flexible transactional memory . . . . 104--115
Pablo Abad and
Valentin Puente and
José Angel Gregorio and
Pablo Prieto Rotary router: an efficient architecture
for CMP interconnection networks . . . . 116--125
John Kim and
William J. Dally and
Dennis Abts Flattened butterfly: a cost-efficient
topology for high-radix networks . . . . 126--137
Jongman Kim and
Chrysostomos Nicopoulos and
Dongkook Park and
Reetuparna Das and
Yuan Xie and
Vijaykrishnan Narayanan and
Mazin S. Yousif and
Chita R. Das A novel dimensionally-decomposed router
for on-chip communication in 3D
architectures . . . . . . . . . . . . . 138--149
Amit Kumar and
Li-Shiuan Peh and
Partha Kundu and
Niraj K. Jha Express virtual channels: towards the
ideal interconnection fabric . . . . . . 150--161
Sanjeev Kumar and
Christopher J. Hughes and
Anthony Nguyen Carbon: architectural support for
fine-grained parallelism on chip
multiprocessors . . . . . . . . . . . . 162--173
Naveen Neelakantam and
Ravi Rajwar and
Suresh Srinivas and
Uma Srinivasan and
Craig Zilles Hardware atomicity for reliable software
speculation . . . . . . . . . . . . . . 174--185
Engin Ipek and
Meyrem Kirman and
Nevin Kirman and
Jose F. Martinez Core fusion: accommodating software
diversity in chip multiprocessors . . . 186--197
Eric Chi and
Stephen A. Lyon and
Margaret Martonosi Tailoring quantum architectures to
implementation style: a quantum computer
for mobile and persistent qubits . . . . 198--209
Xuejun Yang and
Xiaobo Yan and
Zuocheng Xing and
Yu Deng and
Jiang Jiang and
Ying Zhang A 64-bit stream processor architecture
for scientific applications . . . . . . 210--219
Christopher J. Hughes and
Radek Grzeszczuk and
Eftychios Sifakis and
Daehyun Kim and
Sanjeev Kumar and
Andrew P. Selle and
Jatin Chhugani and
Matthew Holliman and
Yen-Kuang Chen Physical simulation for animation and
visual effects: parallelization and
characterization for chip
multiprocessors . . . . . . . . . . . . 220--231
Thomas Y. Yeh and
Petros Faloutsos and
Sanjay J. Patel and
Glenn Reinman ParallAX: an architecture for real-time
physics . . . . . . . . . . . . . . . . 232--243
Martha Mercaldi Kim and
Mojtaba Mehrara and
Mark Oskin and
Todd Austin Architectural implications of brick and
mortar silicon manufacturing . . . . . . 244--253
Ahmed M. Amin and
Mithuna Thottethodi and
T. N. Vijaykumar and
Steven Wereley and
Stephen C. Jacobson Aquacore: a programmable architecture
for microfluidics . . . . . . . . . . . 254--265
Thomas F. Wenisch and
Anastasia Ailamaki and
Babak Falsafi and
Andreas Moshovos Mechanisms for store-wait-free
multiprocessors . . . . . . . . . . . . 266--277
Luis Ceze and
James Tuck and
Pablo Montesinos and
Josep Torrellas BulkSC: bulk enforcement of sequential
consistency . . . . . . . . . . . . . . 278--289
Bruno Diniz and
Dorgival Guedes and
Wagner Meira, Jr. and
Ricardo Bianchini Limiting the power consumption of main
memory . . . . . . . . . . . . . . . . . 290--301
Francisco Javier Mesa-Martinez and
Joseph Nayfach-Battilana and
Jose Renau Power model validation through thermal
measurements . . . . . . . . . . . . . . 302--311
Jiang Lin and
Hongzhong Zheng and
Zhichun Zhu and
Howard David and
Zhao Zhang Thermal modeling and management of DRAM
memory systems . . . . . . . . . . . . . 312--322
Abhishek Tiwari and
Smruti R. Sarangi and
Josep Torrellas ReCycle: pipeline adaptation to tolerate
process variation . . . . . . . . . . . 323--334
Peter G. Sassone and
Jeff Rupley II and
Edward Brekelbaum and
Gabriel H. Loh and
Bryan Black Matrix scheduler reloaded . . . . . . . 335--346
Simha Sethumadhavan and
Franziska Roesner and
Joel S. Emer and
Doug Burger and
Stephen W. Keckler Late-binding: enabling unordered
load-store queues . . . . . . . . . . . 347--357
Jacob Leverich and
Hideho Arakida and
Alex Solomatnikov and
Amin Firoozshahian and
Mark Horowitz and
Christos Kozyrakis Comparing memory systems for chip
multiprocessors . . . . . . . . . . . . 358--368
Naveen Muralimanohar and
Rajeev Balasubramonian Interconnect design considerations for
large NUCA caches . . . . . . . . . . . 369--380
Moinuddin K. Qureshi and
Aamer Jaleel and
Yale N. Patt and
Simon C. Steely and
Joel Emer Adaptive insertion policies for high
performance caching . . . . . . . . . . 381--391
Paul A. Karger Performance and security lessons learned
from virtualizing the Alpha processor 392--401
Tejas S. Karkhanis and
James E. Smith Automated design of application specific
superscalar processors: an analytical
approach . . . . . . . . . . . . . . . . 402--411
Aashish Phansalkar and
Ajay Joshi and
Lizy K. John Analysis of redundancy and application
balance in the SPEC CPU2006 benchmark
suite . . . . . . . . . . . . . . . . . 412--423
Hyesoon Kim and
José A. Joao and
Onur Mutlu and
Chang Joo Lee and
Yale N. Patt and
Robert Cohn VPC prediction: reducing the cost of
indirect branches via hardware-based
dynamic devirtualization . . . . . . . . 424--435
Andrew D. Hilton and
Amir Roth Ginger: control independence using tag
rewriting . . . . . . . . . . . . . . . 436--447
Ahmed S. Al-Zawawi and
Vimal K. Reddy and
Eric Rotenberg and
Haitham H. Akkary Transparent control independence (TCI) 448--459
Nicholas J. Wang and
Aqeel Mahesri and
Sanjay J. Patel Examining ACE analysis reliability
estimates using fault-injection . . . . 460--469
Nidhi Aggarwal and
Parthasarathy Ranganathan and
Norman P. Jouppi and
James E. Smith Configurable isolation: building high
availability systems with commodity
multi-core processors . . . . . . . . . 470--481
Michael Dalton and
Hari Kannan and
Christos Kozyrakis Raksha: a flexible information flow
architecture for software security . . . 482--493
Zhenghong Wang and
Ruby B. Lee New cache designs for thwarting software
cache-based side channel attacks . . . . 494--505
Niranjan Kumar Soundararajan and
Angshuman Parashar and
Anand Sivasubramaniam Mechanisms for bounding vulnerabilities
of processor structures . . . . . . . . 506--515
Kristen R. Walcott and
Greg Humphreys and
Sudhanva Gurumurthi Dynamic prediction of architectural
vulnerability from microarchitectural
state . . . . . . . . . . . . . . . . . 516--527
Aneesh Aggarwal and
Pradip Bose and
Mohamed Zahran Introduction to the special issue on the
2006 Reconfigurable and Adaptive
Architecture Workshop . . . . . . . . . 1--1
Nikolaos Bellas and
Sek M. Chai and
Malcolm Dwyer and
Dan Linzmeier Mapping streaming architectures on
reconfigurable platforms . . . . . . . . 2--8
Martin Labrecque and
Peter Yiannacouras and
J. Gregory Steffan Custom code generation for soft
processors . . . . . . . . . . . . . . . 9--19
Tameesh Suri Improving instruction level parallelism
through reconfigurable units in
superscalar processors . . . . . . . . . 20--27
Hashem H. Najaf-abadi and
Eric Rotenberg Architectural contesting: exposing and
exploiting temperamental behavior . . . 28--35
Kuo-Kun Tseng and
Ying-Dar Lin and
Tsern-Huei Lee and
Yuan-Cheng Lai Deterministic high-speed root-hashing
automaton matching coprocessor for
embedded network processor . . . . . . . 36--43
Fadi N. Sibai Performance analysis and workload
characterization of the 3DMark05
benchmark on modern parallel computer
platforms . . . . . . . . . . . . . . . 44--52
Mark Thorson Internet nuggets . . . . . . . . . . . . 53--55
S. Bartolini and
P. Foglia and
C. A. Prete MEmory performance: DEaling with
applications, systems and architecture 4--5
K. Patrick Lorton and
David S. Wise Analyzing block locality in Morton-order
and Morton-hybrid matrices . . . . . . . 6--12
Kaveh Jokar Deris and
Amirali Baniasadi Investigating cache energy and latency
break-even points in high performance
processors . . . . . . . . . . . . . . . 13--20
Jun Yan and
Wei Zhang Evaluating instruction cache
vulnerability to transient errors . . . 21--28
Tanausú Ramírez and
Alex Pajuelo and
Oliverio J. Santana and
Mateo Valero Energy saving through a simple load
control mechanism . . . . . . . . . . . 29--36
Luis M. Ramos and
José Luis Briz and
Pablo E. Ibáñez and
Victor Viñals Data prefetching in a cache hierarchy
with high bandwidth and capacity . . . . 37--44
Haakon Dybdahl and
Per Stenström and
Lasse Natvig An LRU-based replacement algorithm
augmented with frequency of access in
shared chip-multiprocessor caches . . . 45--52
A. Bardine and
P. Foglia and
G. Gabrielli and
C. A. Prete and
P. Stenström Improving power efficiency of D-NUCA
caches . . . . . . . . . . . . . . . . . 53--58
Mark Thorson Internet nuggets . . . . . . . . . . . . 59--62
Kenji Kise and
Toshinori Sato and
Hironori Nakajo Special issue: ALPS'07 -- Advanced Low
Power Systems: Introduction . . . . . . 1--2
Jun Yao and
Shinobu Miwa and
Hajime Shimada and
Shinji Tomita Optimal pipeline depth with pipeline
stage unification adoption . . . . . . . 3--9
Preetham Lakshmikanthan and
Adrian Nuñez VCLEARIT: a VLSI CMOS circuit leakage
reduction technique for nanoscale
technologies . . . . . . . . . . . . . . 10--16
Kiyofumi Tanaka and
Takahiro Kawahara Leakage energy reduction in cache memory
by data compression . . . . . . . . . . 17--24
Hidetsugu Irie and
Ken Sugimoto and
Masahiro Goshima and
Shuichi Sakai Preventing timing errors on register
writes: mechanisms of detections and
recoveries . . . . . . . . . . . . . . . 25--31
Mihaela Malița and
Gheorghe Ștefan and
Dominique Thiébaut Not multi-, but many-core: designing
integral parallel architectures for
embedded computation . . . . . . . . . . 32--38
Takefumi Miyoshi and
Nobuhiko Sugino Fine-grain compensation method with
consideration of trade-offs between
computation and data transfer for power
consumption . . . . . . . . . . . . . . 39--44
Bogdan F. Romanescu and
Michael E. Bauer and
Sule Ozev and
Daniel J. Sorin VariaSim: simulating circuits and
systems in the presence of process
variability . . . . . . . . . . . . . . 45--48
N. Venkateswaran and
Deepak Srinivasan and
Madhavan Manivannan and
T. P. Ramnath Sai Sagar and
Shyamsundar Gopalakrishnan and
VinothKrishnan Elangovan and
Karthik Chandrasekar and
Prem Kumar Ramesh and
Viswanath Venkatesan and
Arvindakshan Babu and
Sudharshan Future generation supercomputers I: a
paradigm for node architecture . . . . . 49--60
N. Venkateswaran and
Deepak Srinivasan and
Madhavan Manivannan and
T. P. Ramnath Sai Sagar and
Shyamsundar Gopalakrishnan and
VinothKrishnan Elangovan and
Arvind M. and
Prem Kumar Ramesh and
Karthik Ganesan and
Viswanath Krishnamurthy and
Sivaramakrishnan Future generation supercomputers II: a
paradigm for cluster architecture . . . 61--70
Mark Thorson Internet nuggets . . . . . . . . . . . . 71--73
Erik Winfree Toward molecular programming with DNA 1--1
Xiaoxin Chen and
Tal Garfinkel and
E. Christopher Lewis and
Pratap Subrahmanyam and
Carl A. Waldspurger and
Dan Boneh and
Jeffrey Dwoskin and
Dan R. K. Ports Overshadow: a virtualization-based
approach to retrofitting protection in
commodity operating systems . . . . . . 2--13
Jonathan M. McCune and
Bryan Parno and
Adrian Perrig and
Michael K. Reiter and
Arvind Seshadri How low can you go?: recommendations for
hardware-supported minimal TCB code
execution . . . . . . . . . . . . . . . 14--25
Ravi Bhargava and
Benjamin Serebrin and
Francesco Spadini and
Srilatha Manne Accelerating two-dimensional page walks
for virtualized systems . . . . . . . . 26--35
Benjamin C. Lee and
David Brooks Efficiency trends and limits from
comprehensive microarchitectural
adaptivity . . . . . . . . . . . . . . . 36--47
Ramya Raghavendra and
Parthasarathy Ranganathan and
Vanish Talwar and
Zhikui Wang and
Xiaoyun Zhu No 'power' struggles: coordinated
multi-level power management for the
data center . . . . . . . . . . . . . . 48--59
Chinnakrishnan S. Ballapuram and
Ahmad Sharif and
Hsien-Hsin S. Lee Exploiting access semantics and program
behavior to reduce snoop power in chip
multiprocessors . . . . . . . . . . . . 60--69
Arindam Mallik and
Jack Cosgrove and
Robert P. Dick and
Gokhan Memik and
Peter Dinda PICSEL: measuring user-perceived
performance to control dynamic frequency
scaling . . . . . . . . . . . . . . . . 70--79
Jose A. Joao and
Onur Mutlu and
Hyesoon Kim and
Rishi Agarwal and
Yale N. Patt Improving the performance of
object-oriented languages with dynamic
predication of indirect jumps . . . . . 80--90
Michal Wegiel and
Chandra Krintz The mapping collector: virtual memory
support for generational, parallel, and
concurrent compaction . . . . . . . . . 91--102
Joe Devietti and
Colin Blundell and
Milo M. K. Martin and
Steve Zdancewic Hardbound: architectural support for
spatial safety of the C programming
language . . . . . . . . . . . . . . . . 103--114
Vitaliy B. Lvin and
Gene Novark and
Emery D. Berger and
Benjamin G. Zorn Archipelago: trading address space for
reliability and security . . . . . . . . 115--124
Bumyong Choi and
Leo Porter and
Dean M. Tullsen Accurate branch prediction for short
threads . . . . . . . . . . . . . . . . 125--134
Shekhar Srikantaiah and
Mahmut Kandemir and
Mary Jane Irwin Adaptive set pinning: managing shared
caches in chip multiprocessors . . . . . 135--144
James Tuck and
Wonsun Ahn and
Luis Ceze and
Josep Torrellas SoftSig: software-exposed hardware
signatures for code analysis and
optimization . . . . . . . . . . . . . . 145--156
Ioana Burcea and
Stephen Somogyi and
Andreas Moshovos and
Babak Falsafi Predictor virtualization . . . . . . . . 157--167
Vinod Ganapathy and
Matthew J. Renzelmann and
Arini Balakrishnan and
Michael M. Swift and
Somesh Jha The design and implementation of
microdrivers . . . . . . . . . . . . . . 168--178
Yaron Weinsberg and
Danny Dolev and
Tal Anker and
Muli Ben-Yehuda and
Pete Wyckoff Tapping into the fountain of CPUs: on
operating system support for
programmable devices . . . . . . . . . . 179--188
Kai Shen and
Ming Zhong and
Sandhya Dwarkadas and
Chuanpeng Li and
Christopher Stewart and
Xiao Zhang Hardware counter driven on-the-fly
request signatures . . . . . . . . . . . 189--200
Luk Van Ertvelde and
Lieven Eeckhout Dispersing proprietary applications as
benchmarks through code mutation . . . . 201--210
Shashidhar Mysore and
Bita Mazloom and
Banit Agrawal and
Timothy Sherwood Understanding and visualizing full
systems with data flow tomography . . . 211--221
Guilherme Ottoni and
David I. August Communication optimizations for global
multi-threaded instruction scheduling 222--232
Milind Kulkarni and
Keshav Pingali and
Ganesh Ramanarayanan and
Bruce Walter and
Kavita Bala and
L. Paul Chew Optimistic parallelism benefits from
data partitioning . . . . . . . . . . . 233--243
Russ Cox and
Tom Bergan and
Austin T. Clements and
Frans Kaashoek and
Eddie Kohler Xoc, an extension-oriented compiler for
systems programming . . . . . . . . . . 244--254
Philip M. Wells and
Koushik Chakraborty and
Gurindar S. Sohi Adapting to intermittent faults in
multicore systems . . . . . . . . . . . 255--264
Man-Lap Li and
Pradeep Ramachandran and
Swarup Kumar Sahoo and
Sarita V. Adve and
Vikram S. Adve and
Yuanyuan Zhou Understanding the propagation of hard
errors to software and implications for
resilient system design . . . . . . . . 265--276
M. Aater Suleman and
Moinuddin K. Qureshi and
Yale N. Patt Feedback-driven threading:
power-efficient and high-performance
execution of multi-threaded workloads on
CMPs . . . . . . . . . . . . . . . . . . 277--286
Michael D. Linderman and
Jamison D. Collins and
Hong Wang and
Teresa H. Meng Merge: a programming model for
heterogeneous multi-core systems . . . . 287--296
Jayanth Gummaraju and
Joel Coburn and
Yoshio Turner and
Mendel Rosenblum Streamware: programming general-purpose
multicore processors using streams . . . 297--307
Edmund B. Nightingale and
Daniel Peek and
Peter M. Chen and
Jason Flinn Parallelizing security checks on
commodity hardware . . . . . . . . . . . 308--318
Miguel Castro and
Manuel Costa and
Jean-Philippe Martin Better bug reporting with better privacy 319--328
Shan Lu and
Soyeon Park and
Eunsoo Seo and
Yuanyuan Zhou Learning from mistakes: a comprehensive
study on real world concurrency bug
characteristics . . . . . . . . . . . . 329--339
Anonymous Message from the General Chairs . . . . x--x
Anonymous Message from the Program Chair . . . . . xi--xi
Anonymous Reviewers . . . . . . . . . . . . . . . xv--xviii
Francis Tseng and
Yale N. Patt Achieving Out-of-Order Performance with
Almost In-Order Complexity . . . . . . . 3--12
Mayank Agarwal and
Nitin Navale and
Kshitiz Malik and
Matthew I. Frank Fetch-Criticality Reduction through
Control Independence . . . . . . . . . . 13--24
Miquel Pericàs and
Adrian Cristal and
Francisco J. Cazorla and
Ruben González and
Alex Veidenbaum and
Daniel A. Jiménez and
Mateo Valero A Two-Level Load/Store Queue Based on
Execution Locality . . . . . . . . . . . 25--36
Engin Ipek and
Onur Mutlu and
José F. Martínez and
Rich Caruana Self-Optimizing Memory Controllers: a
Reinforcement Learning Approach . . . . 39--50
Shyamkumar Thoziyoor and
Jung Ho Ahn and
Matteo Monchiero and
Jay B. Brockman and
Norman P. Jouppi A Comprehensive Memory Modeling Tool and
Its Application to the Design and
Analysis of Future Memory Hierarchies 51--62
Onur Mutlu and
Thomas Moscibroda Parallelism-Aware Batch Scheduling:
Enhancing both Performance and Fairness
of Shared DRAM Systems . . . . . . . . . 63--74
John Kim and
William J. Dally and
Steve Scott and
Dennis Abts Technology-Driven, Highly-Scalable
Dragonfly Topology . . . . . . . . . . . 77--88
Jae W. Lee and
Man Cheuk Ng and
Krste Asanovic Globally-Synchronized Frames for
Guaranteed Quality-of-Service in On-Chip
Networks . . . . . . . . . . . . . . . . 89--100
Martha Mercaldi Kim and
John D. Davis and
Mark Oskin and
Todd Austin Polymorphic On-Chip Networks . . . . . . 101--112
Lee Baugh and
Naveen Neelakantam and
Craig Zilles Using Hardware Memory Protection to
Build a High-Performance,
Strongly-Atomic Hybrid Transactional
Memory . . . . . . . . . . . . . . . . . 115--126
Jayaram Bobba and
Neelam Goyal and
Mark D. Hill and
Michael M. Swift and
David A. Wood TokenTM: Efficient Execution of Large
Transactions with Hardware Transactional
Memory . . . . . . . . . . . . . . . . . 127--138
Arrvindh Shriraman and
Sandhya Dwarkadas and
Michael L. Scott Flexible Decoupled Transactional Memory
Support . . . . . . . . . . . . . . . . 139--150
Dana Vantrease and
Robert Schreiber and
Matteo Monchiero and
Moray McLaren and
Norman P. Jouppi and
Marco Fiorentino and
Al Davis and
Nathan Binkert and
Raymond G. Beausoleil and
Jung Ho Ahn Corona: System Implications of Emerging
Nanophotonic Technology . . . . . . . . 153--164
Lucas Kreger-Stickles and
Mark Oskin Microcoded Architectures for Ion-Trap
Quantum Computers . . . . . . . . . . . 165--176
Nemanja Isailovic and
Mark Whitney and
Yatish Patel and
John Kubiatowicz Running a Quantum Circuit at the Speed
of Data . . . . . . . . . . . . . . . . 177--188
Xiaoyao Liang and
Gu-Yeon Wei and
David Brooks ReVIVaL: a Variation-Tolerant
Architecture Using Voltage Interpolation
and Variable Latency . . . . . . . . . . 191--202
Chris Wilkerson and
Hongliang Gao and
Alaa R. Alameldeen and
Zeshan Chishti and
Muhammad Khellah and
Shih-Lien Lu Trading off Cache Capacity for
Reliability to Enable Low Voltage
Operation . . . . . . . . . . . . . . . 203--214
Franziska Roesner and
Doug Burger and
Stephen W. Keckler Counting Dependence Predictors . . . . . 215--226
Natalie Enright Jerger and
Li-Shiuan Peh and
Mikko Lipasti Virtual Circuit Tree Multicasting: a
Case for On-Chip Hardware Multicast
Support . . . . . . . . . . . . . . . . 229--240
Avinash Karanth Kodi and
Ashwini Sarathy and
Ahmed Louri iDEAL: Inter-router Dual-Function Energy
and Area-Efficient Links for
Network-on-Chip (NoC) Architectures . . 241--250
Dongkook Park and
Soumya Eachempati and
Reetuparna Das and
Asit K. Mishra and
Yuan Xie and
N. Vijaykrishnan and
Chita R. Das MIRA: a Multi-layered On-Chip
Interconnect Router Architecture . . . . 251--261
Derek R. Hower and
Mark D. Hill Rerun: Exploiting Episodes for
Lightweight Memory Race Recording . . . 265--276
Brandon Lucia and
Joseph Devietti and
Karin Strauss and
Luis Ceze Atom-Aid: Detecting and Surviving
Atomicity Violations . . . . . . . . . . 277--288
Pablo Montesinos and
Luis Ceze and
Josep Torrellas DeLorean: Recording and
Deterministically Replaying
Shared-Memory Multiprocessor Execution
Efficiently . . . . . . . . . . . . . . 289--300
Sriram Sankar and
Sudhanva Gurumurthi and
Mircea R. Stan Intra-disk Parallelism: An Idea Whose
Time Has Come . . . . . . . . . . . . . 303--314
Kevin Lim and
Parthasarathy Ranganathan and
Jichuan Chang and
Chandrakant Patel and
Trevor Mudge and
Steven Reinhardt Understanding and Designing New Server
Architectures for Emerging
Warehouse-Computing Environments . . . . 315--326
Taeho Kgil and
David Roberts and
Trevor Mudge Improving NAND Flash Based Disk Caches 327--338
Xiaodong Li and
Sarita V. Adve and
Pradip Bose and
Jude A. Rivers Online Estimation of Architectural
Vulnerability Factor for Soft Errors . . 341--352
Jeonghee Shin and
Victor Zyuban and
Pradip Bose and
Timothy M. Pinkston A Proactive Wearout Recovery Approach
for Exploiting Microarchitectural
Redundancy to Extend Cache SRAM Lifetime 353--362
Radu Teodorescu and
Josep Torrellas Variation-Aware Application Scheduling
and Power Management for Chip
Multiprocessors . . . . . . . . . . . . 363--374
Shimin Chen and
Michael Kozuch and
Theodoros Strigkos and
Babak Falsafi and
Phillip B. Gibbons and
Todd C. Mowry and
Vijaya Ramachandran and
Olatunji Ruwase and
Michael Ryan and
Evangelos Vlachos Flexible Hardware Acceleration for
Instruction-Grain Program Monitoring . . 377--388
Nathan Clark and
Amir Hormati and
Scott Mahlke VEAL: Virtualized Execution Accelerator
for Loops . . . . . . . . . . . . . . . 389--400
Haibo Chen and
Xi Wu and
Liwei Yuan and
Binyu Zang and
Pen-chung Yew and
Frederic T. Chong From Speculation to Security: Practical
and Efficient Information Flow Tracking
Using Speculative Hardware . . . . . . . 401--412
Carlos Boneti and
Francisco J. Cazorla and
Roberto Gioiosa and
Alper Buyuktosunoglu and
Chen-Yong Cher and
Mateo Valero Software-Controlled Priority
Characterization of POWER5 Processor . . 415--426
Alex Shye and
Berkin Ozisikyilmaz and
Arindam Mallik and
Gokhan Memik and
Peter A. Dinda and
Robert P. Dick and
Alok N. Choudhary Learning and Leveraging the Relationship
between Architecture-Level Measurements
and Individual User Satisfaction . . . . 427--438
Sanjeev Kumar and
Daehyun Kim and
Mikhail Smelyanskiy and
Yen-Kuang Chen and
Jatin Chhugani and
Christopher J. Hughes and
Changkyu Kim and
Victor W. Lee and
Anthony D. Nguyen Atomic Vector Operations on Chip
Multiprocessors . . . . . . . . . . . . 441--452
Gabriel H. Loh 3D-Stacked Memory Architectures for
Multi-core Processors . . . . . . . . . 453--464
Anonymous Author Index . . . . . . . . . . . . . . 465--466
Anonymous Publisher's Information . . . . . . . . 468--468
Anonymous Cover Art . . . . . . . . . . . . . . . C1--C1
Ramesh K. Karne and
Alexander L. Wijesinha and
George H. Ford, Jr. Opinion: stay on course with an
evolution or choose a revolution in
computing . . . . . . . . . . . . . . . 1--6
Mark Thorson Internet nuggets . . . . . . . . . . . . 7--11
Jerker Bengtsson and
Bertil Svensson A domain-specific approach for software
development on Manycore platforms . . . 2--10
Daniel Cederman and
Philippas Tsigas On sorting and load balancing on GPUs 11--18
Phuong Hoai Ha and
Philippas Tsigas and
Otto J. Anshus Non-blocking programming on multi-core
graphics processors: (extended abstract) 19--28
Shuvra S. Bhattacharyya and
Gordon Brebner and
Jörn W. Janneck and
Johan Eker and
Carl von Platen and
Marco Mattavelli and
Mickaël Raulet OpenDF: a dataflow toolset for
reconfigurable hardware and multicore
systems . . . . . . . . . . . . . . . . 29--35
Christoph W. Kessler and
Jörg Keller Optimized on-chip pipelining of
memory-intensive computations on the
Cell BE . . . . . . . . . . . . . . . . 36--45
Håkan Lundvall and
Kristian Stavåker and
Peter Fritzson and
Christoph Kessler Automatic parallelization of simulation
code for equation-based models with
software pipelining and measurements on
three platforms . . . . . . . . . . . . 46--55
Huan Fang and
Mats Brorsson Scalable directory architecture for
distributed shared memory chip
multiprocessors . . . . . . . . . . . . 56--64
Bengt Jonsson State-space exploration for concurrent
algorithms under weak memory orderings:
(preliminary version) . . . . . . . . . 65--71
Parosh Aziz Abdulla and
Frédéric Haziza and
Mats Kindahl Model checking race-freeness . . . . . . 72--79
Hakan Sundell and
Philippas Tsigas NOBLE: non-blocking programming support
via lock-free shared abstract data types 80--87
Anders Gidenstam and
Marina Papatriantafilou LFTHREADS: a lock-free thread library 88--92
Karl-Filip Faxén Wool --- a work stealing library . . . . 93--100
Mark Thorson Internet nuggets . . . . . . . . . . . . 101--111
Mark Gebhart and
Bertrand A. Maher and
Katherine E. Coons and
Jeff Diamond and
Paul Gratz and
Mario Marino and
Nitya Ranganathan and
Behnam Robatmili and
Aaron Smith and
James Burrill and
Stephen W. Keckler and
Doug Burger and
Kathryn S. McKinley An evaluation of the TRIPS computer
system . . . . . . . . . . . . . . . . . 1--12
Constantin Pistol and
Wutichai Chongchitmate and
Christopher Dwyer and
Alvin R. Lebeck Architectural implications of nanoscale
integrated sensing and computing . . . . 13--24
Soyeon Park and
Shan Lu and
Yuanyuan Zhou CTrigger: exposing atomicity violation
bugs from their hiding places . . . . . 25--36
Stelios Sidiroglou and
Oren Laadan and
Carlos Perez and
Nicolas Viennot and
Jason Nieh and
Angelos D. Keromytis ASSURE: automatic software self-healing
using rescue points . . . . . . . . . . 37--48
Andrew Lenharth and
Vikram S. Adve and
Samuel T. King Recovery domains: an organizing
principle for recoverable operating
systems . . . . . . . . . . . . . . . . 49--60
Martin Dimitrov and
Huiyang Zhou Anomaly-based bug prediction, isolation,
and validation: an automated approach
for software debugging . . . . . . . . . 61--72
Pablo Montesinos and
Matthew Hicks and
Samuel T. King and
Josep Torrellas Capo: a software-hardware interface for
practical deterministic multiprocessor
replay . . . . . . . . . . . . . . . . . 73--84
Joseph Devietti and
Brandon Lucia and
Luis Ceze and
Mark Oskin DMP: deterministic shared memory
multiprocessing . . . . . . . . . . . . 85--96
Marek Olszewski and
Jason Ansel and
Saman Amarasinghe Kendo: efficient deterministic
multithreading in software . . . . . . . 97--108
Mohit Tiwari and
Hassan M. G. Wassel and
Bita Mazloom and
Shashidhar Mysore and
Frederic T. Chong and
Timothy Sherwood Complete information flow tracking from
the gates up . . . . . . . . . . . . . . 109--120
David K. Tam and
Reza Azimi and
Livio B. Soares and
Michael Stumm RapidMRC: approximating L2 miss rate
curves on commodity systems for online
optimizations . . . . . . . . . . . . . 121--132
Stijn Eyerman and
Lieven Eeckhout Per-thread cycle accounting in SMT
processors . . . . . . . . . . . . . . . 133--144
Owen S. Hofmann and
Christopher J. Rossbach and
Emmett Witchel Maximum benefit from a minimal HTM . . . 145--156
Dave Dice and
Yossi Lev and
Mark Moir and
Daniel Nussbaum Early experience with a commercial
hardware transactional memory
implementation . . . . . . . . . . . . . 157--168
Philip M. Wells and
Koushik Chakraborty and
Gurindar S. Sohi Mixed-mode multicore reliability . . . . 169--180
Sriram Rajamani and
G. Ramalingam and
Venkatesh Prasad Ranganath and
Kapil Vaswani ISOLATOR: dynamically ensuring isolation
in concurrent programs . . . . . . . . . 181--192
Joseph Tucek and
Weiwei Xiong and
Yuanyuan Zhou Efficient online validation with delta
execution . . . . . . . . . . . . . . . 193--204
David Meisner and
Brian T. Gold and
Thomas F. Wenisch PowerNap: eliminating server idle power 205--216
Adrian M. Caulfield and
Laura M. Grupp and
Steven Swanson Gordon: using flash memory to build
fast, power-efficient clusters for
data-intensive applications . . . . . . 217--228
Aayush Gupta and
Youngjae Kim and
Bhuvan Urgaonkar DFTL: a flash translation layer
employing demand-based selective caching
of page-level address mappings . . . . . 229--240
Farhana Aleen and
Nathan Clark Commutativity analysis for software
parallelization: letting program
transformations see the big picture . . 241--252
M. Aater Suleman and
Onur Mutlu and
Moinuddin K. Qureshi and
Yale N. Patt Accelerating critical section execution
with asymmetric multi-core architectures 253--264
Todd Mytkowicz and
Amer Diwan and
Matthias Hauswirth and
Peter F. Sweeney Producing wrong data without doing
anything obviously wrong! . . . . . . . 265--276
Michael D. Bond and
Kathryn S. McKinley Leak pruning . . . . . . . . . . . . . . 277--288
Michal Wegiel and
Chandra Krintz Dynamic prediction of collection yield
for managed runtimes . . . . . . . . . . 289--300
Aravind Menon and
Simon Schubert and
Willy Zwaenepoel TwinDrivers: semi-automatic derivation
of fast and safe hypervisor network
drivers from guest OS drivers . . . . . 301--312
Ioana Burcea and
Andreas Moshovos Phantom-BTB: a virtualized branch target
buffer design . . . . . . . . . . . . . 313--324
Karthik Ramani and
Christiaan P. Gribble and
Al Davis StreamRay: a stream filtering
architecture for coherent ray tracing 325--336
Robert D. Cameron and
Dan Lin Architectural support for SWAR text
processing with parallel bit streams:
the inductive doubling principle . . . . 337--348
Norman P. Jouppi and
Rakesh Kumar and
Dean Tullsen Introduction to the special issue on the
2008 Workshop on Design, Analysis, and
Simulation of Chip Multiprocessors
(dasCMP'08) . . . . . . . . . . . . . . 1--1
Hui Zeng and
Matt Yourst and
Kanad Ghose and
Dmitry Ponomarev MPTLsim: a cycle-accurate, full-system
simulator for x86-64 multicore
architectures with coherent caches . . . 2--9
Matteo Monchiero and
Jung Ho Ahn and
Ayose Falcón and
Daniel Ortega and
Paolo Faraboschi How to simulate 1000 cores . . . . . . . 10--19
Jianwei Chen and
Murali Annavaram and
Michel Dubois SlackSim: a platform for parallel
simulations of CMPs on CMPs . . . . . . 20--29
Madhura Purnaprajna and
Mario Porrmann and
Ulrich Rueckert Run-time reconfigurability in embedded
multiprocessors . . . . . . . . . . . . 30--37
Chris Jesshope and
Mike Lankamp and
Li Zhang The implementation of an SVP many-core
processor and the evaluation of its
memory architecture . . . . . . . . . . 38--45
Karan Singh and
Major Bhadauria and
Sally A. McKee Real time power estimation and thread
scheduling via performance counters . . 46--55
Omid Azizi and
Aqeel Mahesri and
Sanjay J. Patel and
Mark Horowitz Area-efficiency in CMP core design:
co-optimization of microarchitecture and
physical design . . . . . . . . . . . . 56--65
Mark Thorson Internet nuggets . . . . . . . . . . . . 66--69
Katherine Yelick Ten ways to waste a parallel computer 1--1
Benjamin C. Lee and
Engin Ipek and
Onur Mutlu and
Doug Burger Architecting phase change memory as a
scalable DRAM alternative . . . . . . . 2--13
Ping Zhou and
Bo Zhao and
Jun Yang and
Youtao Zhang A durable and energy efficient main
memory using phase change memory
technology . . . . . . . . . . . . . . . 14--23
Moinuddin K. Qureshi and
Vijayalakshmi Srinivasan and
Jude A. Rivers Scalable high performance main memory
system using phase-change memory
technology . . . . . . . . . . . . . . . 24--33
Xiaoxia Wu and
Jian Li and
Lixin Zhang and
Evan Speight and
Ram Rajamony and
Yuan Xie Hybrid cache architecture with disparate
memory technologies . . . . . . . . . . 34--45
Jinho Suh and
Michel Dubois Dynamic MIPS rate stabilization in
out-of-order processors . . . . . . . . 46--56
Marco Paolieri and
Eduardo Quiñones and
Francisco J. Cazorla and
Guillem Bernat and
Mateo Valero Hardware support for WCET analysis of
hard real-time multicore systems . . . . 57--68
Stephen Somogyi and
Thomas F. Wenisch and
Anastasia Ailamaki and
Babak Falsafi Spatio-temporal memory streaming . . . . 69--80
Pedro Diaz and
Marcelo Cintra Stream chaining: exploiting multiple
levels of correlation in data
prefetching . . . . . . . . . . . . . . 81--92
Michael D. Powell and
Arijit Biswas and
Shantanu Gupta and
Shubhendu S. Mukherjee Architectural core salvaging in a
multi-core processor for hard-error
tolerance . . . . . . . . . . . . . . . 93--104
Javier Carretero and
Pedro Chaparro and
Xavier Vera and
Jaume Abella and
Antonio González End-to-end register data-flow continuous
self-test . . . . . . . . . . . . . . . 105--115
Doe Hyun Yoon and
Mattan Erez Memory mapped ECC: low-cost error
protection for last level caches . . . . 116--127
Mark Woh and
Sangwon Seo and
Scott Mahlke and
Trevor Mudge and
Chaitali Chakrabarti and
Krisztian Flautner AnySP: anytime anywhere anyway signal
processing . . . . . . . . . . . . . . . 128--139
John H. Kelm and
Daniel R. Johnson and
Matthew R. Johnson and
Neal C. Crago and
William Tuohy and
Aqeel Mahesri and
Steven S. Lumetta and
Matthew I. Frank and
Sanjay J. Patel Rigel: an architecture and scalable
programming interface for a 1000-core
accelerator . . . . . . . . . . . . . . 140--151
Sunpyo Hong and
Hyesoon Kim An analytical model for a GPU
architecture with memory-level and
thread-level parallelism awareness . . . 152--163
Susmit Biswas and
Diana Franklin and
Alan Savage and
Ryan Dixon and
Timothy Sherwood and
Frederic T. Chong Multi-execution: multicore caching for
data-similar executions . . . . . . . . 164--173
Yuejian Xie and
Gabriel H. Loh PIPP: promotion/insertion
pseudo-partitioning of multi-core shared
caches . . . . . . . . . . . . . . . . . 174--183
Nikos Hardavellas and
Michael Ferdman and
Babak Falsafi and
Anastasia Ailamaki Reactive NUCA: near-optimal block
placement and replication in distributed
caches . . . . . . . . . . . . . . . . . 184--195
Thomas Moscibroda and
Onur Mutlu A case for bufferless routing in on-chip
networks . . . . . . . . . . . . . . . . 196--207
Michel A. Kinsy and
Myong Hyon Cho and
Tina Wen and
Edward Suh and
Marten van Dijk and
Srinivas Devadas Application-aware deadlock-free
oblivious routing . . . . . . . . . . . 208--219
Nan Jiang and
John Kim and
William J. Dally Indirect adaptive routing on large scale
interconnection networks . . . . . . . . 220--231
James Hamilton Internet-scale service infrastructure
efficiency . . . . . . . . . . . . . . . 232--232
Colin Blundell and
Milo M. K. Martin and
Thomas F. Wenisch InvisiFence: performance-transparent
memory ordering in conventional
multiprocessors . . . . . . . . . . . . 233--244
Andrew Hilton and
Amir Roth Decoupled store completion/silent
deterministic replay: enabling scalable
data memory for CPR/CFP processors . . . 245--254
Hongzhong Zheng and
Jiang Lin and
Zhao Zhang and
Zhichun Zhu Decoupled DIMM: building high-bandwidth
memory system using low-speed DRAM
devices . . . . . . . . . . . . . . . . 255--266
Kevin Lim and
Jichuan Chang and
Trevor Mudge and
Parthasarathy Ranganathan and
Steven K. Reinhardt and
Thomas F. Wenisch Disaggregated memory for expansion and
sharing in blade servers . . . . . . . . 267--278
Cagdas Dirik and
Bruce Jacob The performance of PC solid-state disks
(SSDs) as a function of bandwidth,
concurrency, device architecture, and
system organization . . . . . . . . . . 279--289
Abhishek Bhattacharjee and
Margaret Martonosi Thread criticality predictors for
dynamic performance, power, and resource
management in chip multiprocessors . . . 290--301
Krishna K. Rangan and
Gu-Yeon Wei and
David Brooks Thread motion: fine-grained power
management for multi-core systems . . . 302--313
Yefu Wang and
Kai Ma and
Xiaorui Wang Temperature-constrained power control
for chip multiprocessors with online
model estimation . . . . . . . . . . . . 314--324
Jie Yu and
Satish Narayanasamy A case for an interleaving constrained
shared-memory multi-processor . . . . . 325--336
Abdullah Muzahid and
Dario Suárez and
Shanxiang Qi and
Josep Torrellas SigRace: signature-based data race
detection . . . . . . . . . . . . . . . 337--348
Vijay Nagarajan and
Rajiv Gupta ECMon: exposing cache events for
monitoring . . . . . . . . . . . . . . . 349--360
Ali G. Saidi and
Nathan L. Binkert and
Steven K. Reinhardt and
Trevor Mudge End-to-end performance forecasting:
finding bottlenecks before they happen 361--370
Brian M. Rogers and
Anil Krishna and
Gordon B. Bell and
Ken Vu and
Xiaowei Jiang and
Yan Solihin Scaling the bandwidth wall: challenges
in and avenues for CMP scaling . . . . . 371--382
Mark G. Whitney and
Nemanja Isailovic and
Yatish Patel and
John Kubiatowicz A fault tolerant, area efficient
architecture for Shor's factoring
algorithm . . . . . . . . . . . . . . . 383--394
Andrew Putnam and
Susan Eggers and
Dave Bennett and
Eric Dellinger and
Jeff Mason and
Henry Styles and
Prasanna Sundararajan and
Ralph Wittig Performance and power of cache-based
reconfigurable computing . . . . . . . . 395--405
Amin Firoozshahian and
Alex Solomatnikov and
Ofer Shacham and
Zain Asgar and
Stephen Richardson and
Christos Kozyrakis and
Mark Horowitz A memory system design framework:
creating smart memories . . . . . . . . 406--417
José A. Joao and
Onur Mutlu and
Yale N. Patt Flexible reference-counting-based
hardware acceleration for garbage
collection . . . . . . . . . . . . . . . 418--428
Yan Pan and
Prabhat Kumar and
John Kim and
Gokhan Memik and
Yu Zhang and
Alok Choudhary Firefly: illuminating future
network-on-chip with nanophotonics . . . 429--440
Mark J. Cianchetti and
Joseph C. Kerekes and
David H. Albonesi Phastlane: a rapid transit optical
routing network . . . . . . . . . . . . 441--450
Dennis Abts and
Natalie D. Enright Jerger and
John Kim and
Dan Gibson and
Mikko H. Lipasti Achieving predictable performance
through better memory controller
placement in many-core CMPs . . . . . . 451--461
Yangchun Luo and
Venkatesan Packirisamy and
Wei-Chung Hsu and
Antonia Zhai and
Nikhil Mungre and
Ankit Tarkas Dynamic performance tuning for
speculative threads . . . . . . . . . . 462--473
Carlos Madriles and
Pedro López and
Josep M. Codina and
Enric Gibert and
Fernando Latorre and
Alejandro Martinez and
Raúl Martinez and
Antonio Gonzalez Boosting single-thread performance in
multi-core systems through fine-grain
multi-threading . . . . . . . . . . . . 474--483
Shailender Chaudhry and
Robert Cypher and
Magnus Ekman and
Martin Karlsson and
Anders Landin and
Sherman Yip and
Håkan Zeffer and
Marc Tremblay Simultaneous speculative threading: a
novel pipeline architecture implemented
in Sun's Rock processor . . . . . . . . 484--495
Alexander Thomasian Publications on storage and systems
research . . . . . . . . . . . . . . . . 1--26
Enric Musoll Mesh-based many-core performance under
process variations: a core yield
perspective . . . . . . . . . . . . . . 27--34
Angel V. Nikolov Queuing theoretic model for a
multiprocessor with private caches and
shared memory . . . . . . . . . . . . . 35--44
Mark Thorson Internet nuggets . . . . . . . . . . . . 45--51
Enric Musoll Leakage-saving opportunities in
mesh-based massive multi-core
architectures . . . . . . . . . . . . . 1--7
Abdul Naeem and
Xiaowen Chen and
Zhonghai Lu and
Axel Jantsch Scalability of relaxed consistency
models in NoC based multicore
architectures . . . . . . . . . . . . . 8--15
Sandeep Sharma and
K. S. Kahlon and
P. K. Bansal Reliability and path length analysis of
irregular fault tolerant multistage
interconnection network . . . . . . . . 16--23
Mark Thorson Internet nuggets . . . . . . . . . . . . 24--30
Eric A. Brewer Technology for developing regions:
Moore's Law is not enough . . . . . . . 1--2
Engin Ipek and
Jeremy Condit and
Edmund B. Nightingale and
Doug Burger and
Thomas Moscibroda Dynamically replicated memory: building
reliable systems from nanoscale
resistive memories . . . . . . . . . . . 3--14
Nevin Kirman and
José F. Martínez A power-efficient all-optical on-chip
interconnect using wavelength-based
oblivious routing . . . . . . . . . . . 15--28
Naveen Neelakantam and
David R. Ditzel and
Craig Zilles A real system evaluation of hardware
atomicity for software speculation . . . 29--38
Tim Harris and
Sasa Tomic and
Adrián Cristal and
Osman Unsal Dynamic filtering: multi-purpose
architecture support for language
runtime systems . . . . . . . . . . . . 39--52
Tom Bergan and
Owen Anderson and
Joseph Devietti and
Luis Ceze and
Dan Grossman CoreDet: a compiler and runtime system
for deterministic multithreaded
execution . . . . . . . . . . . . . . . 53--64
Arun Raman and
Hanjun Kim and
Thomas R. Mason and
Thomas B. Jablin and
David I. August Speculative parallelization using
software multi-threaded transactions . . 65--76
Dongyoon Lee and
Benjamin Wester and
Kaushik Veeraraghavan and
Satish Narayanasamy and
Peter M. Chen and
Jason Flinn Respec: efficient online multiprocessor
replay via speculation and external
determinism . . . . . . . . . . . . . . 77--90
Stijn Eyerman and
Lieven Eeckhout Probabilistic job symbiosis modeling for
SMT processor scheduling . . . . . . . . 91--102
Kai Shen Request behavior variations . . . . . . 103--116
F. Ryan Johnson and
Radu Stoica and
Anastasia Ailamaki and
Todd C. Mowry Decoupling contention management from
scheduling . . . . . . . . . . . . . . . 117--128
Sergey Zhuravlev and
Sergey Blagodurov and
Alexandra Fedorova Addressing shared resource contention in
multicore processors via scheduling . . 129--142
Ding Yuan and
Haohui Mai and
Weiwei Xiong and
Lin Tan and
Yuanyuan Zhou and
Shankar Pasupathy SherLog: error diagnosis by connecting
clues from run-time logs . . . . . . . . 143--154
Dasarath Weeratunge and
Xiangyu Zhang and
Suresh Jagannathan Analyzing multicore dumps to facilitate
concurrency bug reproduction . . . . . . 155--166
Sebastian Burckhardt and
Pravesh Kothari and
Madanlal Musuvathi and
Santosh Nagarakatte A randomized scheduler with
probabilistic guarantees of finding bugs 167--178
Wei Zhang and
Chong Sun and
Shan Lu ConMem: detecting severe concurrency
bugs through an effect-oriented approach 179--192
Francisco Javier Mesa-Martinez and
Ehsan K. Ardestani and
Jose Renau Characterizing processor thermal
behavior . . . . . . . . . . . . . . . . 193--204
Ganesh Venkatesh and
Jack Sampson and
Nathan Goulding and
Saturnino Garcia and
Vladyslav Bryksin and
Jose Lugo-Martinez and
Steven Swanson and
Michael Bedford Taylor Conservation cores: reducing the energy
of mature computations . . . . . . . . . 205--218
Kshitij Sudan and
Niladrish Chatterjee and
David Nellans and
Manu Awasthi and
Rajeev Balasubramonian and
Al Davis Micro-pages: increasing DRAM efficiency
with locality-aware data placement . . . 219--230
Steven Pelley and
David Meisner and
Pooya Zandevakili and
Thomas F. Wenisch and
Jack Underwood Power routing: dynamic power
provisioning in the data center . . . . 231--242
Faraz Ahmad and
T. N. Vijaykumar Joint optimization of idle and cooling
power in data centers while maintaining
response time . . . . . . . . . . . . . 243--256
Michelle L. Goodstein and
Evangelos Vlachos and
Shimin Chen and
Phillip B. Gibbons and
Michael A. Kozuch and
Todd C. Mowry Butterfly analysis: adapting dataflow
analysis to dynamic parallel monitoring 257--270
Evangelos Vlachos and
Michelle L. Goodstein and
Michael A. Kozuch and
Shimin Chen and
Babak Falsafi and
Phillip B. Gibbons and
Todd C. Mowry ParaLog: enabling and accelerating
online parallel monitoring of
multithreaded applications . . . . . . . 271--284
Amir H. Hormati and
Yoonseo Choi and
Mark Woh and
Manjunath Kudlur and
Rodric Rabbah and
Trevor Mudge and
Scott Mahlke MacroSS: macro-SIMDization of streaming
applications . . . . . . . . . . . . . . 285--296
Dong Hyuk Woo and
Hsien-Hsin S. Lee COMPASS: a programmable data prefetcher
using idle GPU shaders . . . . . . . . . 297--310
Daniel Sanchez and
Richard M. Yoo and
Christos Kozyrakis Flexible architectural support for
fine-grain scheduling . . . . . . . . . 311--322
Bogdan F. Romanescu and
Alvin R. Lebeck and
Daniel J. Sorin Specifying and dynamically verifying
address translation-aware memory
consistency . . . . . . . . . . . . . . 323--334
Eiman Ebrahimi and
Chang Joo Lee and
Onur Mutlu and
Yale N. Patt Fairness via source throttling: a
configurable and high-performance
fairness substrate for multi-core memory
systems . . . . . . . . . . . . . . . . 335--346
Isaac Gelado and
Javier Cabezas and
Nacho Navarro and
John E. Stone and
Sanjay Patel and
Wen-mei W. Hwu An asymmetric distributed shared memory
model for heterogeneous parallel systems 347--358
Abhishek Bhattacharjee and
Margaret Martonosi Inter-core cooperative TLB for chip
multiprocessors . . . . . . . . . . . . 359--370
Ruirui Huang and
Daniel Y. Deng and
G. Edward Suh Orthrus: efficient software integrity
protection on multi-cores . . . . . . . 371--384
Shuguang Feng and
Shantanu Gupta and
Amin Ansari and
Scott Mahlke Shoestring: probabilistic soft error
reliability on the cheap . . . . . . . . 385--396
Doe Hyun Yoon and
Mattan Erez Virtualized and flexible ECC for main
memory . . . . . . . . . . . . . . . . . 397--408
Alexander Thomasian Storage research in industry and
universities . . . . . . . . . . . . . . 1--48
Wolfgang Matthes Resources instead of cores? . . . . . . 49--63
Mark Thorson Internet nuggets . . . . . . . . . . . . 64--67
William J. Dally Moving the needle, computer architecture
research in academe and industry . . . . 1--1
Yasuko Watanabe and
John D. Davis and
David A. Wood WiDGET: Wisconsin Decoupled Grid
Execution Tiles . . . . . . . . . . . . 2--13
Dan Gibson and
David A. Wood Forwardflow: a scalable core for
power-constrained CMPs . . . . . . . . . 14--25
Omid Azizi and
Aqeel Mahesri and
Benjamin C. Lee and
Sanjay J. Patel and
Mark Horowitz Energy-performance tradeoffs in
processor architecture and circuit
design: a marginal cost analysis . . . . 26--36
Rehan Hameed and
Wajahat Qadeer and
Megan Wachs and
Omid Azizi and
Alex Solomatnikov and
Benjamin C. Lee and
Stephen Richardson and
Christos Kozyrakis and
Mark Horowitz Understanding sources of inefficiency in
general-purpose chips . . . . . . . . . 37--47
Thomas W. Barr and
Alan L. Cox and
Scott Rixner Translation caching: skip, don't walk
(the page table) . . . . . . . . . . . . 48--59
Aamer Jaleel and
Kevin B. Theobald and
Simon C. Steely, Jr. and
Joel Emer High performance cache replacement using
re-reference interval prediction (RRIP) 60--71
Jeffrey Stuecheli and
Dimitris Kaseridis and
David Daly and
Hillery C. Hunter and
Lizy K. John The virtual write queue: coordinating
DRAM and last-level cache policies . . . 72--82
Chris Wilkerson and
Alaa R. Alameldeen and
Zeshan Chishti and
Wei Wu and
Dinesh Somasekhar and
Shih-lien Lu Reducing cache power with low-cost,
multi-bit error-correcting codes . . . . 83--93
Jing Xue and
Alok Garg and
Berkehan Ciftcioğlu and
Jianyun Hu and
Shang Wang and
Ioannis Savidis and
Manish Jain and
Rebecca Berman and
Peng Liu and
Michael Huang and
Hui Wu and
Eby Friedman and
Gary Wicks and
Duncan Moore An intra-chip free-space optical
interconnect . . . . . . . . . . . . . . 94--105
Reetuparna Das and
Onur Mutlu and
Thomas Moscibroda and
Chita R. Das Aérgia: exploiting packet latency slack
in on-chip networks . . . . . . . . . . 106--116
Pranay Koka and
Michael O. McCracken and
Herb Schwetman and
Xuezhe Zheng and
Ron Ho and
Ashok V. Krishnamoorthy Silicon-photonic network architectures
for scalable, power-efficient multi-chip
systems . . . . . . . . . . . . . . . . 117--128
Scott Beamer and
Chen Sun and
Yong-Jin Kwon and
Ajay Joshi and
Christopher Batten and
Vladimir Stojanović and
Krste Asanović Re-architecting DRAM memory systems with
monolithically integrated silicon
photonics . . . . . . . . . . . . . . . 129--140
Stuart Schechter and
Gabriel H. Loh and
Karin Strauss and
Doug Burger Use ECP, not ECC, for hard failures in
resistive memories . . . . . . . . . . . 141--152
Moinuddin K. Qureshi and
Michele M. Franceschini and
Luis A. Lastras-Montaño and
John P. Karidis Morphable memory system: a robust
architecture for exploiting multi-level
phase change memories . . . . . . . . . 153--162
Timothy Pritchett and
Mithuna Thottethodi SieveStore: a highly-selective,
ensemble-level disk cache for
cost-performance . . . . . . . . . . . . 163--174
Aniruddha N. Udipi and
Naveen Muralimanohar and
Niladrish Chatterjee and
Rajeev Balasubramonian and
Al Davis and
Norman P. Jouppi Rethinking DRAM design and organization
for energy-constrained multi-cores . . . 175--186
Yunji Chen and
Weiwu Hu and
Tianshi Chen and
Ruiyang Wu LReplay: a pending period based
deterministic replay scheme . . . . . . 187--197
Gwendolyn Voskuilen and
Faraz Ahmad and
T. N. Vijaykumar Timetraveler: exploiting acyclic races
for optimizing memory race recording . . 198--209
Brandon Lucia and
Luis Ceze and
Karin Strauss and
Shaz Qadeer and
Hans-J. Boehm Conflict exceptions: simplifying
concurrent language semantics with
precise hardware exceptions for
data-races . . . . . . . . . . . . . . . 210--221
Brandon Lucia and
Luis Ceze and
Karin Strauss ColorSafe: architectural support for
debugging and dynamically avoiding
multi-variable atomicity violations . . 222--233
Mary Jane Irwin Shared caches in multicores: the good,
the bad, and the ugly . . . . . . . . . 234--234
Jiayuan Meng and
David Tarjan and
Kevin Skadron Dynamic warp subdivision for integrated
branch and memory divergence tolerance 235--246
Srimat Chakradhar and
Murugan Sankaradas and
Venkata Jakkula and
Srihari Cadambi A dynamically configurable coprocessor
for convolutional neural networks . . . 247--257
Colin Blundell and
Arun Raghavan and
Milo M. K. Martin RETCON: transactional repair without
replay . . . . . . . . . . . . . . . . . 258--269
Janghaeng Lee and
Haicheng Wu and
Madhumitha Ravichandran and
Nathan Clark Thread Tailor: dynamically weaving
threads together for efficient, adaptive
parallel applications . . . . . . . . . 270--279
Sunpyo Hong and
Hyesoon Kim An integrated GPU power and performance
model . . . . . . . . . . . . . . . . . 280--289
Zhangxi Tan and
Andrew Waterman and
Henry Cook and
Sarah Bird and
Krste Asanović and
David Patterson A case for FAME: FPGA architecture model
execution . . . . . . . . . . . . . . . 290--301
Geoffrey Blake and
Ronald G. Dreslinski and
Trevor Mudge and
Krisztián Flautner Evolution of thread-level parallelism in
desktop applications . . . . . . . . . . 302--313
Vijay Janapa Reddi and
Benjamin C. Lee and
Trishul Chilimbi and
Kushagra Vaid Web search using mobile cores:
quantifying and mitigating the price of
efficiency . . . . . . . . . . . . . . . 314--325
Vijayaraghavan Soundararajan and
Jennifer M. Anderson The impact of management operations on
the virtualized datacenter . . . . . . . 326--337
Dennis Abts and
Michael R. Marty and
Philip M. Wells and
Peter Klausler and
Hong Liu Energy proportional datacenter networks 338--347
Charles P. Thacker Improving the future by examining the
past . . . . . . . . . . . . . . . . . . 348--348
Olivier Temam The rebirth of neural networks . . . . . 349--349
Eric Keller and
Jakub Szefer and
Jennifer Rexford and
Ruby B. Lee NoHype: virtualized cloud infrastructure
without the virtualization . . . . . . . 350--361
Stijn Eyerman and
Lieven Eeckhout Modeling critical sections in Amdahl's
Law and its implications for multicore
design . . . . . . . . . . . . . . . . . 362--370
Xiaochen Guo and
Engin Ipek and
Tolga Soyata Resistive computation: avoiding the
power wall with low-leakage, STT-MRAM
based computing . . . . . . . . . . . . 371--382
Nak Hee Seong and
Dong Hyuk Woo and
Hsien-Hsin S. Lee Security refresh: prevent malicious
wear-out and increase durability for
phase-change memory with dynamically
randomized address mapping . . . . . . . 383--394
Ruirui Huang and
G. Edward Suh IVEC: off-chip memory integrity
protection for both security and
reliability . . . . . . . . . . . . . . 395--406
Arrvindh Shriraman and
Sandhya Dwarkadas Sentry: light-weight auxiliary memory
access control . . . . . . . . . . . . . 407--418
Enric Herrero and
José González and
Ramon Canal Elastic cooperative caching: an
autonomous dynamically adaptive memory
hierarchy for chip multiprocessors . . . 419--428
John H. Kelm and
Daniel R. Johnson and
William Tuohy and
Steven S. Lumetta and
Sanjay J. Patel Cohesion: a hybrid memory model for
accelerators . . . . . . . . . . . . . . 429--440
M. Aater Suleman and
Onur Mutlu and
José A. Joao and
Khubaib and
Yale N. Patt Data marshaling for multi-core
architectures . . . . . . . . . . . . . 441--450
Victor W. Lee and
Changkyu Kim and
Jatin Chhugani and
Michael Deisher and
Daehyun Kim and
Anthony D. Nguyen and
Nadathur Satish and
Mikhail Smelyanskiy and
Srinivas Chennupaty and
Per Hammarlund and
Ronak Singhal and
Pradeep Dubey Debunking the 100X GPU vs. CPU myth: an
evaluation of throughput computing on
CPU and GPU . . . . . . . . . . . . . . 451--460
Vilas Sridharan and
David R. Kaeli Using hardware vulnerability factors to
enhance AVF analysis . . . . . . . . . . 461--472
Amin Ansari and
Shuguang Feng and
Shantanu Gupta and
Scott Mahlke Necromancer: enhancing system throughput
by animating dead cores . . . . . . . . 473--484
Guihai Yan and
Xiaoyao Liang and
Yinhe Han and
Xiaowei Li Leveraging the core-level complementary
effects of PVT variations to reduce
timing emergencies in multi-core
processors . . . . . . . . . . . . . . . 485--496
Marc de Kruijf and
Shuou Nomura and
Karthikeyan Sankaralingam Relax: an architectural framework for
software recovery of hardware faults . . 497--508
Marco Nuño-Maganda and
Cesar Torres-Huitzil A temporal coding hardware
implementation for spiking neural
networks . . . . . . . . . . . . . . . . 2--7
Hirokazu Morisita and
Kenta Inakagata and
Yasunori Osana and
Naoyuki Fujita and
Hideharu Amano Implementation and evaluation of an
arithmetic pipeline on FLOPS-2D:
multi-FPGA system . . . . . . . . . . . 8--13
Anson H. T. Tse and
David B. Thomas and
K. H. Tsoi and
Wayne Luk Efficient reconfigurable design for
pricing Asian options . . . . . . . . . 14--20
Tadayoshi Horita and
Itsuo Takanami An FPGA-based fast classifier with high
generalization property . . . . . . . . 21--26
Andrew Putnam and
Aaron Smith and
Doug Burger Dynamic vectorization in the E2 dynamic
multicore architecture . . . . . . . . . 27--32
Jong Kyung Paek and
Kiyoung Choi and
Jongeun Lee Binary acceleration using coarse-grained
reconfigurable architecture . . . . . . 33--39
Keisuke Dohi and
Yuichiro Shibata and
Tsuyoshi Hamada and
Tomonari Masada and
Kiyoshi Oguri and
Duncan A. Buell Implementation of a programming
environment with a multithread model for
reconfigurable systems . . . . . . . . . 40--45
Mojtaba Sabeghi and
Hamid Mushtaq and
Koen Bertels Runtime multitasking support on
polymorphic platforms . . . . . . . . . 46--52
Kuen Hung Tsoi and
Anson H. T. Tse and
Peter Pietzuch and
Wayne Luk Programming framework for clusters with
heterogeneous accelerators . . . . . . . 53--59
Claude Tadonki and
Gilbert Grosdidier and
Olivier Pene An efficient CELL library for lattice
quantum chromodynamics . . . . . . . . . 60--65
Ryan Taylor and
Xiaoming Li Software-based branch predication for
AMD GPUs . . . . . . . . . . . . . . . . 66--72
Sebastian Banescu and
Florent de Dinechin and
Bogdan Pasca and
Radu Tudoran Multipliers for floating-point double
precision and beyond on FPGAs . . . . . 73--79
Kentaro Sano and
Luzhou Wang and
Satoru Yamamoto Prototype implementation of
array-processor extensible over multiple
FPGAs for scalable stencil computation 80--86
Chi-Chiu Tsang and
Hayden Kwok-Hay So Dynamic power reduction of FPGA-based
reconfigurable computers using
precomputation . . . . . . . . . . . . . 87--92
Mark Thorson Internet nuggets . . . . . . . . . . . . 93--96
Manideepa Mukherjee and
Amitabha Sinha A novel architecture for conversion of
binary to single digit double base
numbers . . . . . . . . . . . . . . . . 1--6
Shobha T. and
Syed Akram and
G. Varaprasad Design and development of framework for
diagnosing intermediate nodes . . . . . 7--11
Fuad Tabba Adding concurrency in Python using a
commercial processor's hardware
transactional memory support . . . . . . 12--19
Alexander Thomasian Why specialized disks for composite
operations may be unnecessary . . . . . 20--27
Mark Thorson Internet nuggets . . . . . . . . . . . . 28--36
James R. Larus The cloud will change everything . . . . 1--2
Ding Yuan and
Jing Zheng and
Soyeon Park and
Yuanyuan Zhou and
Stefan Savage Improving software diagnosability via
log enhancement . . . . . . . . . . . . 3--14
Kaushik Veeraraghavan and
Dongyoon Lee and
Benjamin Wester and
Jessica Ouyang and
Peter M. Chen and
Jason Flinn and
Satish Narayanasamy DoublePlay: parallelizing sequential
logging and replay . . . . . . . . . . . 15--26
Jared Casper and
Tayo Oguntebi and
Sungpack Hong and
Nathan G. Bronson and
Christos Kozyrakis and
Kunle Olukotun Hardware acceleration of transactional
memory on commodity systems . . . . . . 27--38
Luke Dalessandro and
François Carouge and
Sean White and
Yossi Lev and
Mark Moir and
Michael L. Scott and
Michael F. Spear Hybrid NOrec: a case study in the
effectiveness of best effort hardware
transactional memory . . . . . . . . . . 39--52
Abhayendra Singh and
Daniel Marino and
Satish Narayanasamy and
Todd Millstein and
Madan Musuvathi Efficient processor support for DRFx, a
memory model with exceptions . . . . . . 53--66
Joseph Devietti and
Jacob Nelson and
Tom Bergan and
Luis Ceze and
Dan Grossman RCDC: a relaxed consistency
deterministic computer . . . . . . . . . 67--78
Jacob Burnim and
George Necula and
Koushik Sen Specifying and checking semantic
atomicity for multithreaded programs . . 79--90
Haris Volos and
Andres Jaan Tack and
Michael M. Swift Mnemosyne: lightweight persistent memory 91--104
Joel Coburn and
Adrian M. Caulfield and
Ameen Akel and
Laura M. Grupp and
Rajesh K. Gupta and
Ranjit Jhala and
Steven Swanson NV-Heaps: making persistent objects fast
and safe with next-generation,
non-volatile memories . . . . . . . . . 105--118
Adrian Schüpbach and
Andrew Baumann and
Timothy Roscoe and
Simon Peter A declarative language approach to
device configuration . . . . . . . . . . 119--132
Leonid Ryzhyk and
John Keys and
Balachandra Mirla and
Arun Raghunath and
Mona Vij and
Gernot Heiser Improved device driver reliability
through hardware verification reuse . . 133--144
Atif Hashmi and
Andrew Nere and
James Jamal Thomas and
Mikko Lipasti A case for neuromorphic ISAs . . . . . . 145--158
Benjamin Ransford and
Jacob Sorber and
Kevin Fu Mementos: system support for
long-running computation on RFID-scale
devices . . . . . . . . . . . . . . . . 159--170
Emmanouil Koukoumidis and
Dimitrios Lymberopoulos and
Karin Strauss and
Jie Liu and
Doug Burger Pocket cloudlets . . . . . . . . . . . . 171--184
Navin Sharma and
Sean Barker and
David Irwin and
Prashant Shenoy Blink: managing server clusters on
intermittent power . . . . . . . . . . . 185--198
Henry Hoffmann and
Stelios Sidiroglou and
Michael Carbin and
Sasa Misailovic and
Anant Agarwal and
Martin Rinard Dynamic knobs for responsive power-aware
computing . . . . . . . . . . . . . . . 199--212
Song Liu and
Karthik Pattabiraman and
Thomas Moscibroda and
Benjamin G. Zorn Flikker: saving DRAM refresh-power
through critical data partitioning . . . 213--224
Qingyuan Deng and
David Meisner and
Luiz Ramos and
Thomas F. Wenisch and
Ricardo Bianchini MemScale: active low-power modes for
main memory . . . . . . . . . . . . . . 225--238
Qi Gao and
Wenbin Zhang and
Zhezhe Chen and
Mai Zheng and
Feng Qin 2ndStrike: toward manifesting hidden
concurrency typestate bugs . . . . . . . 239--250
Wei Zhang and
Junghee Lim and
Ramya Olichandran and
Joel Scherpelz and
Guoliang Jin and
Shan Lu and
Thomas Reps ConSeq: detecting concurrency bugs
through sequential errors . . . . . . . 251--264
Vitaly Chipounov and
Volodymyr Kuznetsov and
George Candea S2E: a platform for in-vivo multi-path
analysis of software systems . . . . . . 265--278
Owen S. Hofmann and
Alan M. Dunn and
Sangman Kim and
Indrajit Roy and
Emmett Witchel Ensuring operating system kernel
integrity with OSck . . . . . . . . . . 279--290
Donald E. Porter and
Silas Boyd-Wickizer and
Jon Howell and
Reuben Olinsky and
Galen C. Hunt Rethinking the library OS from the top
down . . . . . . . . . . . . . . . . . . 291--304
Nicolas Palix and
Gaël Thomas and
Suman Saha and
Christophe Calvès and
Julia Lawall and
Gilles Muller Faults in Linux: ten years later . . . . 305--318
Hadi Esmaeilzadeh and
Ting Cao and
Yang Xi and
Stephen M. Blackburn and
Kathryn S. McKinley Looking back on the language and
hardware revolutions: measured power,
performance, and scaling . . . . . . . . 319--332
Donald Nguyen and
Keshav Pingali Synthesizing concurrent schedulers for
irregular algorithms . . . . . . . . . . 333--344
Giang Hoang and
Robby Bruce Findler and
Russ Joseph Exploring circuit timing-aware language
and compilation . . . . . . . . . . . . 345--356
Sardar M. Farhad and
Yousun Ko and
Bernd Burgstaller and
Bernhard Scholz Orchestration by approximation: mapping
stream programs onto multicore
architectures . . . . . . . . . . . . . 357--368
Eddy Z. Zhang and
Yunlian Jiang and
Ziyu Guo and
Kai Tian and
Xipeng Shen On-the-fly elimination of dynamic
irregularities for GPU computing . . . . 369--380
Amir H. Hormati and
Mehrzad Samadi and
Mark Woh and
Trevor Mudge and
Scott Mahlke Sponge: portable stream programming on
graphics engines . . . . . . . . . . . . 381--392
Md Kamruzzaman and
Steven Swanson and
Dean M. Tullsen Inter-core prefetching for multicore
processors using migrating helper
threads . . . . . . . . . . . . . . . . 393--404
Hiroshige Hayashizaki and
Peng Wu and
Hiroshi Inoue and
Mauricio J. Serrano and
Toshio Nakatani Improving the performance of trace-based
systems by false loop filtering . . . . 405--418
Nathan Binkert and
Bradford Beckmann and
Gabriel Black and
Steven K. Reinhardt and
Ali Saidi and
Arkaprava Basu and
Joel Hestness and
Derek R. Hower and
Tushar Krishna and
Somayeh Sardashti and
Rathijit Sen and
Korey Sewell and
Muhammad Shoaib and
Nilay Vaish and
Mark D. Hill and
David A. Wood The gem5 simulator . . . . . . . . . . . 1--7
Alexander Thomasian Survey and analysis of disk scheduling
methods . . . . . . . . . . . . . . . . 8--25
Thimmarayaswamy K and
Mary M. Dsouza and
G. Varaprasad Low power techniques for an Android
based phone . . . . . . . . . . . . . . 26--35
Mark Thorson Internet nuggets . . . . . . . . . . . . 36--52
Atif Hashmi and
Hugues Berry and
Olivier Temam and
Mikko Lipasti Automatic abstraction and fault
tolerance in cortical microarchitectures 1--10
Niket K. Choudhary and
Salil V. Wadhavkar and
Tanmay A. Shah and
Hiran Mayukh and
Jayneel Gandhi and
Brandon H. Dwiel and
Sandeep Navada and
Hashem H. Najaf-abadi and
Eric Rotenberg FabScalar: composing synthesizable RTL
designs of arbitrary cores within a
canonical superscalar template . . . . . 11--22
Erika Gunadi and
Mikko H. Lipasti CRIB: consolidated rename, issue, and
bypass . . . . . . . . . . . . . . . . . 23--32
Rishi Agarwal and
Josep Torrellas FlexBulk: intelligently forming atomic
blocks in blocked-execution
multiprocessors to minimize squashes . . 33--44
Youngjin Kwon and
Changdae Kim and
Seungryoul Maeng and
Jaehyuk Huh Virtualizing performance asymmetric
multi-core systems . . . . . . . . . . . 45--56
Daniel Sanchez and
Christos Kozyrakis Vantage: scalable and efficient
fine-grain cache partitioning . . . . . 57--68
Asit K. Mishra and
Xiangyu Dong and
Guangyu Sun and
Yuan Xie and
N. Vijaykrishnan and
Chita R. Das Architecting on-chip interconnects for
stacked $3$D STT-RAM caches in CMPs . . 69--80
Jayesh Gaur and
Mainak Chaudhuri and
Sreenivas Subramoney Bypass and insertion algorithms for
exclusive last-level caches . . . . . . 81--92
Blas A. Cuesta and
Alberto Ros and
María E. Gómez and
Antonio Robles and
José F. Duato Increasing the effectiveness of
directory caches by deactivating
coherence for private memory blocks . . 93--104
Jungju Oh and
Milos Prvulovic and
Alenka Zajic TLSync: support for multiple fast
barriers using on-chip transmission
lines . . . . . . . . . . . . . . . . . 105--116
Neal Clayton Crago and
Sanjay Jeram Patel OUTRIDER: efficient memory latency
tolerance with decoupled strands . . . . 117--128
Yunsup Lee and
Rimas Avizienis and
Alex Bishara and
Richard Xia and
Derek Lockhart and
Christopher Batten and
Krste Asanović Exploring the tradeoffs between
programmability and efficiency in
data-parallel accelerators . . . . . . . 129--140
Eiman Ebrahimi and
Chang Joo Lee and
Onur Mutlu and
Yale N. Patt Prefetch-aware shared resource
management for multi-core systems . . . 141--152
Rishi Agarwal and
Pranav Garg and
Josep Torrellas Rebound: scalable checkpointing for
coherent shared memory . . . . . . . . . 153--164
Joseph L. Greathouse and
Zhiqiang Ma and
Matthew I. Frank and
Ramesh Peri and
Todd Austin Demand-driven software race detection
using hardware performance counters . . 165--176
Siddhartha Chhabra and
Yan Solihin i-NVMM: a secure non-volatile main
memory system with incremental
encryption . . . . . . . . . . . . . . . 177--188
Mohit Tiwari and
Jason K. Oberg and
Xun Li and
Jonathan Valamehr and
Timothy Levin and
Ben Hardekopf and
Ryan Kastner and
Frederic T. Chong and
Timothy Sherwood Crafting a usable microkernel,
processor, and I/O system with strict
and provable information flow security 189--200
Shuou Nomura and
Matthew D. Sinclair and
Chen-Han Ho and
Venkatraman Govindaraju and
Marc de Kruijf and
Karthikeyan Sankaralingam Sampling $+$ DMR: practical and
low-overhead permanent fault detection 201--212
Sangeetha Sudhakrishnan and
Rigo Dicochea and
Jose Renau Releasing efficient beta cores to market
early . . . . . . . . . . . . . . . . . 213--222
Mehrtash Manoochehri and
Murali Annavaram and
Michel Dubois CPPC: correctable parity protected cache 223--234
Mark Gebhart and
Daniel R. Johnson and
David Tarjan and
Stephen W. Keckler and
William J. Dally and
Erik Lindholm and
Kevin Skadron Energy-efficient mechanisms for managing
thread context in throughput processors 235--246
Wing-kei S. Yu and
Ruirui Huang and
Sarah Q. Xu and
Sung-En Wang and
Edwin Kan and
G. Edward Suh SRAM--DRAM hybrid memory with
applications to efficient register files
in fine-grained multi-threading . . . . 247--258
Binzhang Fu and
Yinhe Han and
Jun Ma and
Huawei Li and
Xiaowei Li An abacus turn model for
time/space-efficient reconfigurable
routing . . . . . . . . . . . . . . . . 259--270
Aaron Carpenter and
Jianyun Hu and
Jie Xu and
Michael Huang and
Hui Wu A case for globally shared-medium
on-chip interconnect . . . . . . . . . . 271--282
Lingjia Tang and
Jason Mars and
Neil Vachharajani and
Robert Hundt and
Mary Lou Soffa The impact of memory subsystem resource
sharing on datacenter applications . . . 283--294
Doe Hyun Yoon and
Min Kyu Jeong and
Mattan Erez Adaptive granularity memory systems: a
tradeoff between storage efficiency and
throughput . . . . . . . . . . . . . . . 295--306
Thomas W. Barr and
Alan L. Cox and
Scott Rixner SpecTLB: a mechanism for speculative
address translation . . . . . . . . . . 307--318
David Meisner and
Christopher M. Sadler and
Luiz André Barroso and
Wolf-Dietrich Weber and
Thomas F. Wenisch Power management of online
data-intensive services . . . . . . . . 319--330
Susmit Biswas and
Mohit Tiwari and
Timothy Sherwood and
Luke Theogarajan and
Frederic T. Chong Fighting fire with fire: modeling the
datacenter-scale effects of targeted
superlattice thermal management . . . . 331--340
Sriram Govindan and
Anand Sivasubramaniam and
Bhuvan Urgaonkar Benefits and limitations of tapping into
stored energy for datacenters . . . . . 341--352
John Demme and
Simha Sethumadhavan Rapid identification of architectural
bottlenecks via precise event counting 353--364
Hadi Esmaeilzadeh and
Emily Blem and
Renee St. Amant and
Karthikeyan Sankaralingam and
Doug Burger Dark silicon and the end of multicore
scaling . . . . . . . . . . . . . . . . 365--376
Guangyu Sun and
Christopher J. Hughes and
Changkyu Kim and
Jishen Zhao and
Cong Xu and
Yuan Xie and
Yen-Kuang Chen Moguls: a model to explore the memory
hierarchy for bandwidth improvements . . 377--388
Asit K. Mishra and
N. Vijaykrishnan and
Chita R. Das A case for heterogeneous on-chip
interconnects for CMPs . . . . . . . . . 389--400
Boris Grot and
Joel Hestness and
Stephen W. Keckler and
Onur Mutlu Kilo-NOC: a heterogeneous
network-on-chip architecture for
scalability and service guarantees . . . 401--412
Sheng Ma and
Natalie Enright Jerger and
Zhiying Wang DBAR: an efficient routing algorithm to
support multiple concurrent applications
in networks-on-chip . . . . . . . . . . 413--424
Aniruddha N. Udipi and
Naveen Muralimanohar and
Rajeev Balasubramonian and
Al Davis and
Norman P. Jouppi Combining memory and a controller with
photonics through $3$D-stacking to
enable scalable and energy-efficient
systems . . . . . . . . . . . . . . . . 425--436
Nathan Binkert and
Al Davis and
Norman P. Jouppi and
Moray McLaren and
Naveen Muralimanohar and
Robert Schreiber and
Jung Ho Ahn The role of optics in future high radix
switch design . . . . . . . . . . . . . 437--448
Kai Ma and
Xue Li and
Ming Chen and
Xiaorui Wang Scalable power control for many-core
architectures running multi-threaded
applications . . . . . . . . . . . . . . 449--460
Alaa R. Alameldeen and
Ilya Wagner and
Zeshan Chishti and
Wei Wu and
Chris Wilkerson and
Shih-Lien Lu Energy-efficient cache design using
variable-strength error-correcting codes 461--472
Luiz André Barroso Warehouse-Scale Computing: Entering the
Teenage Decade . . . . . . . . . . . . . ??
David A. Ferrucci IBM's Watson/DeepQA . . . . . . . . . . ??
Ravi Kannan Algorithms: Recent Highlights and
Challenges . . . . . . . . . . . . . . . ??
Miriam Leeser and
Devon Yablonski and
Dana Brooks and
Laurie Smith King The challenges of writing portable,
correct and high performance libraries
for GPUs . . . . . . . . . . . . . . . . 2--7
Kuen Hung Tsoi and
Wayne Luk Power profiling and optimization for
heterogeneous multi-core systems . . . . 8--13
Serban Georgescu and
Peter Chow GPU accelerated CAE using open solvers
and the cloud . . . . . . . . . . . . . 14--19
Junying Chen and
Billy Y. S. Yiu and
Brandon K. Hamilton and
Alfred C. H. Yu and
Hayden K.-H. So Design space exploration of adaptive
beamforming acceleration for bedside and
portable medical ultrasound imaging . . 20--25
Keisuke Dohi and
Yuichiro Shibata and
Kiyoshi Oguri and
Takafumi Fujimoto GPU implementation and optimization of
electromagnetic simulation using the
FDTD method for antenna designing . . . 26--31
Tomoyuki Nagatsuka and
Yoshito Sakaguchi and
Takayuki Matsumura and
Kenji Kise CoreSymphony: an efficient
reconfigurable multi-core architecture 32--37
Shinya Takamaeda-Yamazaki and
Ryosuke Sasakawa and
Yoshito Sakaguchi and
Kenji Kise An FPGA-based scalable simulation
accelerator for tile architectures . . . 38--43
Kentaro Sano and
Satoru Yamamoto and
Yoshiaki Hatsuda Domain-specific programmable design of
scalable streaming-array for
power-efficient stencil computation . . 44--49
Takayuki Akamine and
Kenta Inakagata and
Yasunori Osana and
Naoyuki Fujita and
Hideharu Amano An implementation of out-of-order
execution system for acceleration of
computational fluid dynamics on FPGAs 50--55
Haisheng Liu and
Smail Niar and
Yassin El-Hillali and
Atika Rivenq Embedded architecture with hardware
accelerator for target recognition in
driver assistance system . . . . . . . . 56--59
Oliver Pell and
Oskar Mencer Surviving the end of frequency scaling
with reconfigurable dataflow computing 60--65
Ana Balevic and
Bart Kienhuis KPN2GPU: an approach for discovery and
exploitation of fine-grain data
parallelism in process networks . . . . 66--71
Amila Akagić and
Hideharu Amano High speed CRC with 64-bit generator
polynomial on an FPGA . . . . . . . . . 72--77
Shufan Yang and
T. M. McGinnity A biologically plausible real-time
spiking neuron simulation environment
based on a multiple-FPGA platform . . . 78--81
Hiroomi Sawada and
Morihiro Kuga and
Motoki Amagasaki and
Masahiro Iida and
Toshinori Sueyoshi Parallelization of the channel width
search for FPGA routing . . . . . . . . 82--85
Shoji Tanabe and
Takuya Nagashima and
Yoshiki Yamaguchi A study of an FPGA based flexible SIMD
processor . . . . . . . . . . . . . . . 86--89
Antoine Trouvé and
Kazuaki Murakami Augmenting DR-ASIP flexibility through
multi-mode custom instructions . . . . . 90--93
Shinya Kubota and
Minoru Watanabe A MEMS writer system embedded for a
programmable optically reconfigurable
gate array . . . . . . . . . . . . . . . 94--97
Jan Fousek and
Jiři Filipovič and
Matuš Madzin Automatic fusions of CUDA--GPU kernels
for parallel map . . . . . . . . . . . . 98--99
Kohei Matsunobu and
Keisuke Dohi and
Yuichiro Shibata and
Kiyoshi Oguri A discussion on calculating eigenvalues
of real symmetric tridiagonal matrices
on a GPU . . . . . . . . . . . . . . . . 100--101
Dominik Meyer and
Bernd Klauer Multicore reconfiguration platform an
alternative to RAMPSoC . . . . . . . . . 102--103
Robin Bonamy and
Daniel Chillet and
Olivier Sentieys and
Sebastien Bilavarn Parallelism Level Impact on Energy
Consumption in Reconfigurable Devices 104--105
Michael Opoku Agyeman and
Ali Ahmadinia Power and area optimisation in
heterogeneous $3$D networks-on-chip
architectures . . . . . . . . . . . . . 106--107
Mark Thorson Internet nuggets . . . . . . . . . . . . 108--117
Malay Das and
Amitabha Sinha and
Nishant Kumar Giri High speed residue number system (RNS)
based FIR filter using distributed
arithmetic (DA) . . . . . . . . . . . . 1--4
Anindita Chakraborty and
Amitabha Sinha Conversion of binary to single-term
triple base numbers for DSP applications 5--11
Satrughna Singha and
Aniruddha Ghosh and
Amitabha Sinha A new architecture for FPGA based
implementation of conversion of binary
to double base number system (DBNS)
using parallel search technique . . . . 12--18
Mark Thorson Internet nuggets . . . . . . . . . . . . 19--23
Dimitrios Lymberopoulos and
Oriana Riva and
Karin Strauss and
Akshay Mittal and
Alexandros Ntoulas PocketWeb: instant web browsing for
mobile devices . . . . . . . . . . . . . 1--12
Felix Xiaozhu Lin and
Zhen Wang and
Robert LiKamWa and
Lin Zhong Reflex: using low-power processors in
smartphones without knowing them . . . . 13--24
Jichuan Chang and
Justin Meza and
Parthasarathy Ranganathan and
Amip Shah and
Rocky Shih and
Cullen Bash Totally green: evaluating and designing
servers for lifecycle environmental
impact . . . . . . . . . . . . . . . . . 25--36
Michael Ferdman and
Almutaz Adileh and
Onur Kocberber and
Stavros Volos and
Mohammad Alisafaee and
Djordje Jevdjic and
Cansu Kaynak and
Adrian Daniel Popescu and
Anastasia Ailamaki and
Babak Falsafi Clearing the clouds: a study of emerging
scale-out workloads on modern hardware 37--48
Yang Chen and
Shuangde Fang and
Lieven Eeckhout and
Olivier Temam and
Chengyong Wu Iterative optimization for the data
center . . . . . . . . . . . . . . . . . 49--60
Faraz Ahmad and
Srimat T. Chakradhar and
Anand Raghunathan and
T. N. Vijaykumar Tarazu: optimizing MapReduce on
heterogeneous clusters . . . . . . . . . 61--74
Sriram Govindan and
Di Wang and
Anand Sivasubramaniam and
Bhuvan Urgaonkar Leveraging stored energy for handling
power emergencies in aggressively
provisioned datacenters . . . . . . . . 75--86
Asim Kadav and
Michael M. Swift Understanding modern device drivers . . 87--98
Sankaralingam Panneerselvam and
Michael M. Swift Chameleon: operating system support for
dynamic processors . . . . . . . . . . . 99--110
Andy A. Hwang and
Ioan A. Stefanovici and
Bianca Schroeder Cosmic rays don't strike twice:
understanding the nature of DRAM errors
and the implications for system design 111--122
Siva Kumar Sastry Hari and
Sarita V. Adve and
Helia Naeimi and
Pradeep Ramachandran Relyzer: exploiting application-level
fault equivalence to analyze application
resiliency to transient faults . . . . . 123--134
Peter Feiner and
Angela Demke Brown and
Ashvin Goel Comprehensive kernel instrumentation via
dynamic binary translation . . . . . . . 135--146
Rei Odaira and
Toshio Nakatani Continuous object access profiling and
optimizations to overcome the memory
wall and bloat . . . . . . . . . . . . . 147--158
Joseph L. Greathouse and
Hongyi Xin and
Yixin Luo and
Todd Austin A case for unlimited watchpoints . . . . 159--172
Marek Olszewski and
Qin Zhao and
David Koh and
Jason Ansel and
Saman Amarasinghe Aikido: accelerating shared data dynamic
analyses . . . . . . . . . . . . . . . . 173--184
Baris Kasikci and
Cristian Zamfir and
George Candea Data races vs. data race bugs: telling
the difference with Portend . . . . . . 185--198
Austin T. Clements and
M. Frans Kaashoek and
Nickolai Zeldovich Scalable address spaces using RCU
balanced trees . . . . . . . . . . . . . 199--210
Haris Volos and
Andres Jaan Tack and
Michael M. Swift and
Shan Lu Applying transactional memory to
concurrency bugs . . . . . . . . . . . . 211--222
José A. Joao and
M. Aater Suleman and
Onur Mutlu and
Yale N. Patt Bottleneck identification and scheduling
in multithreaded applications . . . . . 223--234
Petar Radojković and
Vladimir Cakarević and
Miquel Moretó and
Javier Verdú and
Alex Pajuelo and
Francisco J. Cazorla and
Mario Nemirovsky and
Mateo Valero Optimal task assignment in multithreaded
processors: a statistical approach . . . 235--248
Aamer Jaleel and
Hashem H. Najaf-abadi and
Samantika Subramaniam and
Simon C. Steely and
Joel Emer CRUISE: cache replacement and
utility-aware scheduling . . . . . . . . 249--260
Matthew DeVuyst and
Ashish Venkat and
Dean M. Tullsen Execution migration in a
heterogeneous-ISA chip multiprocessor 261--272
Changhui Lin and
Vijay Nagarajan and
Rajiv Gupta and
Bharghava Rajaram Efficient sequential consistency via
conflict ordering . . . . . . . . . . . 273--286
David Cheriton and
Amin Firoozshahian and
Alex Solomatnikov and
John P. Stevenson and
Omid Azizi HICAMP: architectural support for
efficient concurrency-safe shared
structured data access . . . . . . . . . 287--300
Hadi Esmaeilzadeh and
Adrian Sampson and
Luis Ceze and
Doug Burger Architecture support for disciplined
approximate programming . . . . . . . . 301--312
David Meisner and
Thomas F. Wenisch DreamWeaver: architectural support for
deep sleep . . . . . . . . . . . . . . . 313--324
Myron King and
Nirav Dave and
Arvind Automatic generation of
hardware/software interfaces . . . . . . 325--336
Lorenzo Martignoni and
Stephen McCamant and
Pongsin Poosankam and
Dawn Song and
Petros Maniatis Path-exploration lifting: hi-fi tests
for lo-fi emulators . . . . . . . . . . 337--348
Sungpack Hong and
Hassan Chafi and
Eric Sedlar and
Kunle Olukotun Green-Marl: a DSL for easy and efficient
graph analysis . . . . . . . . . . . . . 349--362
Yongjun Park and
Sangwon Seo and
Hyunchul Park and
Hyoun Kyu Cho and
Scott Mahlke SIMD defragmenter: efficient ILP
realization on data-parallel
architectures . . . . . . . . . . . . . 363--374
Dilip Nijagal Simha and
Maohua Lu and
Tzi-cker Chiueh An update-aware storage system for
low-locality update-intensive workloads 375--386
Adrian M. Caulfield and
Todor I. Mollov and
Louis Alex Eisner and
Arup De and
Joel Coburn and
Steven Swanson Providing safe, user space access to
fast, solid state disks . . . . . . . . 387--400
Dushyanth Narayanan and
Orion Hodson Whole-system persistence . . . . . . . . 401--410
Abel Gordon and
Nadav Amit and
Nadav Har'El and
Muli Ben-Yehuda and
Alex Landau and
Assaf Schuster and
Dan Tsafrir ELI: bare-metal performance for I/O
virtualization . . . . . . . . . . . . . 411--422
Nedeljko Vasić and
Dejan Novaković and
Svetozar Miucin and
Dejan Kostić and
Ricardo Bianchini DejaVu: accelerating resource allocation
in virtualized environments . . . . . . 423--436
Jakub Szefer and
Ruby B. Lee Architectural support for
hypervisor-secure virtualization . . . . 437--450
Min Lee and
Karsten Schwan Region scheduling: efficiently using the
cache architectures via page-level
affinity . . . . . . . . . . . . . . . . 451--462
B. H. H. Juurlink and
C. H. Meenderinck Amdahl's law for predicting the future
of multicores considered harmful . . . . 1--9
Conrad Mueller Axiom based architecture . . . . . . . . 10--17
Alexander Thomasian Rebuild processing in RAID5 with
emphasis on the supplementary parity
augmentation method . . . . . . . . . . 18--27
Nishant Kumar Giri and
Amitabha Sinha FPGA implementation of a novel
architecture for performance enhancement
of Radix-2 FFT . . . . . . . . . . . . . 28--32
Aniruddha Ghosh and
Satrughna Singha and
Amitabha Sinha A new architecture for FPGA
implementation of a MAC unit for digital
signal processors using mixed number
system . . . . . . . . . . . . . . . . . 33--38
Aniruddha Ghosh and
Satrughna Singha and
Amitabha Sinha "Floating point RNS": a new concept
for designing the MAC unit of digital
signal processor . . . . . . . . . . . . 39--43
Mark Thorson Internet nuggets . . . . . . . . . . . . 44--49
Jamie Liu and
Ben Jaiyen and
Richard Veras and
Onur Mutlu RAIDR: Retention-Aware Intelligent DRAM
Refresh . . . . . . . . . . . . . . . . 1--12
Mahdi Nazm Bojnordi and
Engin Ipek PARDIS: a programmable memory controller
for the DDRx interfacing standards . . . 13--24
Doe Hyun Yoon and
Jichuan Chang and
Naveen Muralimanohar and
Parthasarathy Ranganathan BOOM: enabling mobile memory based
low-power server DIMMs . . . . . . . . . 25--36
Krishna T. Malladi and
Benjamin C. Lee and
Frank A. Nothaft and
Christos Kozyrakis and
Karthika Periyathambi and
Mark Horowitz Towards energy-proportional datacenter
memory with mobile DRAM . . . . . . . . 37--48
Nicolas Brunie and
Sylvain Collange and
Gregory Diamos Simultaneous branch and warp
interweaving for sustained GPU
performance . . . . . . . . . . . . . . 49--60
Minsoo Rhu and
Mattan Erez CAPRI: prediction of compaction-adequacy
for handling control-divergence in GPGPU
architectures . . . . . . . . . . . . . 61--71
Jaikrishnan Menon and
Marc De Kruijf and
Karthikeyan Sankaralingam iGPU: exception support and speculative
execution on GPUs . . . . . . . . . . . 72--83
José-María Arnau and
Joan-Manuel Parcerisa and
Polychronis Xekalakis Boosting mobile GPU performance with a
decoupled access/execute fragment
processor . . . . . . . . . . . . . . . 84--93
Mehmet Kayaalp and
Meltem Ozsoy and
Nael Abu-Ghazaleh and
Dmitry Ponomarev Branch regulation: low-overhead
protection from code reuse attacks . . . 94--105
John Demme and
Robert Martin and
Adam Waksman and
Simha Sethumadhavan Side-channel vulnerability factor: a
metric for measuring information leakage 106--117
Robert Martin and
John Demme and
Simha Sethumadhavan TimeWarp: rethinking timekeeping and
performance monitoring mechanisms to
mitigate side-channel attacks . . . . . 118--129
Jonathan Valamehr and
Melissa Chase and
Seny Kamara and
Andrew Putnam and
Dan Shumow and
Vinod Vaikuntanathan and
Timothy Sherwood Inspection resistant memory:
architectural support for security from
physical examination . . . . . . . . . . 130--141
Yi Xu and
Jun Yang and
Rami Melhem Tolerating process variations in
nanophotonic on-chip networks . . . . . 142--152
Pranay Koka and
Michael O. McCracken and
Herb Schwetman and
Chia-Hsin Owen Chen and
Xuezhe Zheng and
Ron Ho and
Kannan Raj and
Ashok V. Krishnamoorthy A micro-architectural analysis of
switched photonic multi-chip
interconnects . . . . . . . . . . . . . 153--164
Aaron Carpenter and
Jianyun Hu and
Ovunc Kocabas and
Michael Huang and
Hui Wu Enhancing effective throughput for
transmission line-based bus . . . . . . 165--176
Michihiro Koibuchi and
Hiroki Matsutani and
Hideharu Amano and
D. Frank Hsu and
Henri Casanova A case for random shortcut topologies
for HPC interconnects . . . . . . . . . 177--188
Santosh Nagarakatte and
Milo M. K. Martin and
Steve Zdancewic Watchdog: hardware for safe and secure
manual memory management and full memory
safety . . . . . . . . . . . . . . . . . 189--200
Joseph Devietti and
Benjamin P. Wood and
Karin Strauss and
Luis Ceze and
Dan Grossman and
Shaz Qadeer RADISH: always-on sound and complete
Race Detection in Software and
Hardware . . . . . . . . . . . . . . . . 201--212
Kenzo Van Craeynest and
Aamer Jaleel and
Lieven Eeckhout and
Paolo Narvaez and
Joel Emer Scheduling heterogeneous multi-cores
through Performance Impact Estimation
(PIE) . . . . . . . . . . . . . . . . . 213--224
Ting Cao and
Stephen M. Blackburn and
Tiejun Gao and
Kathryn S. McKinley The yin and yang of power and
performance for asymmetric hardware and
managed software . . . . . . . . . . . . 225--236
Evgeni Krimer and
Patrick Chiang and
Mattan Erez Lane decoupling for improving the
timing-error resiliency of wide-SIMD
architectures . . . . . . . . . . . . . 237--248
Timothy N. Miller and
Renji Thomas and
Xiang Pan and
Radu Teodorescu VRSync: characterizing and eliminating
synchronization-induced voltage
emergencies in many-core processors . . 249--260
Ioannis Doudalis and
Milos Prvulovic Euripus: a flexible unified hardware
memory checkpointing accelerator for
bidirectional-debugging and reliability 261--272
Arun Arvind Nair and
Stijn Eyerman and
Lieven Eeckhout and
Lizy Kurian John A first-order mechanistic model for
architectural vulnerability factor . . . 273--284
Aniruddha N. Udipi and
Naveen Muralimanohar and
Rajeev Balasubramonian and
Al Davis and
Norman P. Jouppi LOT-ECC: localized and tiered
reliability mechanisms for commodity
memory systems . . . . . . . . . . . . . 285--296
Arkaprava Basu and
Mark D. Hill and
Michael M. Swift Reducing memory reference energy with
opportunistic virtual caching . . . . . 297--308
Zhe Wang and
Samira M. Khan and
Daniel A. Jiménez Improving writeback efficiency with
decoupled last-write prediction . . . . 309--320
Jaewoong Sim and
Jaekyu Lee and
Moinuddin K. Qureshi and
Hyesoon Kim FLEXclusion: balancing cache capacity
and on-chip bandwidth via flexible
exclusion . . . . . . . . . . . . . . . 321--332
Gaurang Upasani and
Xavier Vera and
Antonio González Setting an error detection
infrastructure with low cost acoustic
wave detectors . . . . . . . . . . . . . 333--343
Andrea Pellegrini and
Joseph L. Greathouse and
Valeria Bertacco Viper: virtual pipelines for enhanced
reliability . . . . . . . . . . . . . . 344--355
Olivier Temam A defect-tolerant accelerator for
emerging high-performance applications 356--367
Yoongu Kim and
Vivek Seshadri and
Donghyuk Lee and
Jamie Liu and
Onur Mutlu A case for exploiting subarray-level
parallelism (SALP) in DRAM . . . . . . . 368--379
Moinuddin K. Qureshi and
Michele M. Franceschini and
Ashish Jagmohan and
Luis A. Lastras PreSET: improving performance of phase
change memories by exploiting asymmetry
in write times . . . . . . . . . . . . . 380--391
Elliott Cooper-Balis and
Paul Rosenfeld and
Bruce Jacob Buffer-on-board memory systems . . . . . 392--403
Myoungsoo Jung and
Ellis H. Wilson III and
Mahmut Kandemir Physically Addressed Queueing (PAQ):
improving parallelism in solid state
disks . . . . . . . . . . . . . . . . . 404--415
Rachata Ausavarungnirun and
Kevin Kai-Wei Chang and
Lavanya Subramanian and
Gabriel H. Loh and
Onur Mutlu Staged memory scheduling: achieving high
performance and scalability in
heterogeneous systems . . . . . . . . . 416--427
R. Manikantan and
Kaushik Rajan and
R. Govindarajan Probabilistic Shared Cache Management
(PriSM) . . . . . . . . . . . . . . . . 428--439
Nadathur Satish and
Changkyu Kim and
Jatin Chhugani and
Hideki Saito and
Rakesh Krishnaiyer and
Mikhail Smelyanskiy and
Milind Girkar and
Pradeep Dubey Can traditional programming bridge the
Ninja performance gap for parallel
computing applications? . . . . . . . . 440--451
Melanie Kambadur and
Kui Tang and
Martha A. Kim Harmony: collection and analysis of
parallel block vectors . . . . . . . . . 452--463
David Wentzlaff and
Christopher J. Jackson and
Patrick Griffin and
Anant Agarwal Configurable fine-grain protection for
multicore processor virtualization . . . 464--475
Jeongseob Ahn and
Seongwook Jin and
Jaehyuk Huh Revisiting hardware-assisted page walks
for virtualized systems . . . . . . . . 476--487
Vasileios Kontorinis and
Liuyi Eric Zhang and
Baris Aksanli and
Jack Sampson and
Houman Homayoun and
Eddie Pettis and
Dean M. Tullsen and
Tajana Simunic Rosing Managing distributed UPS energy for
effective power capping in data centers 488--499
Pejman Lotfi-Kamran and
Boris Grot and
Michael Ferdman and
Stavros Volos and
Onur Kocberber and
Javier Picorel and
Almutaz Adileh and
Djordje Jevdjic and
Sachin Idgunji and
Emre Ozer and
Babak Falsafi Scale-out processors . . . . . . . . . . 500--511
Chao Li and
Amer Qouneh and
Tao Li iSwitch: coordinating and optimizing
renewable energy powered server clusters 512--523
Abhayendra Singh and
Satish Narayanasamy and
Daniel Marino and
Todd Millstein and
Madanlal Musuvathi End-to-end sequential consistency . . . 524--535
Jason Mars and
Naveen Kumar BlockChop: dynamic squash elimination
for hybrid processor architecture . . . 536--547
Doe Hyun Yoon and
Min Kyu Jeong and
Michael Sullivan and
Mattan Erez The dynamic granularity memory system 548--559
Marcos K. Aguilera and
Dahlia Malkhi and
Keith Marzullo and
Alessandro Panconesi and
Andrzej Pelc and
Roger Wattenhofer Announcing the 2012 Edsger W. Dijkstra
Prize in Distributed Computing . . . . . 1--2
Subhashis Maitra and
Amitabha Sinha A new algorithm for computing
triple-base number system . . . . . . . 3--9
Shiv Kumar and
Seshadri Krishna Murthy and
G. Varaprasad and
S. Sivasathya Network load and traffic pattern on the
capacity of wireless ad hoc networks . . 10--25
M. N. Isa and
K. Benkrid and
T. Clayton Efficient architecture and scheduling
technique for pairwise sequence
alignment . . . . . . . . . . . . . . . 26--31
A. K. Oudjida and
N. Chaillet and
M. L. Berrandjia and
A. Liacha A new high radix-2 $r$ ($ r \geq 8$)
multibit recoding algorithm for large
operand size ($ N \geq 32$) multipliers 32--43
Mark Thorson Internet nuggets . . . . . . . . . . . . 44--48
Hideharu Amano and
Wayne Luk FPGA-based Connect6 solver with
hardware-accelerated move refinement . . 4--9
Thomas C. P. Chau and
Wayne Luk and
Peter Y. K. Cheung Roberts: reconfigurable platform for
benchmarking real-time systems . . . . . 10--15
Kei Kinoshita and
Daisuke Takano and
Tomoyuki Okamura and
Tetsuhiko Yao and
Yoshiki Yamaguchi An augmented reality system with a
coarse-grained reconfigurable device . . 16--21
Nicholas Ng and
Nobuko Yoshida and
Xin Yu Niu and
Kuen Hung Tsoi Session types: towards safe and fast
reconfigurable programming . . . . . . . 22--27
Rizwan Syed and
Yajun Ha and
Bharadwaj Veeravalli A low overhead abstract architecture for
FPGA resource management . . . . . . . . 28--33
Kuen Hung Tsoi and
Tobias Becker and
Wayne Luk Modelling reconfigurable systems in
event driven simulation . . . . . . . . 34--39
Zheng Zhi Shun and
Tsutomu Maruyama FPGA acceleration of CDO pricing based
on correlation expansions . . . . . . . 40--45
Hiroki Nakahara and
Hiroyuki Nakanishi and
Tsutomu Sasao On a wideband Fast Fourier Transform for
a radio telescope . . . . . . . . . . . 46--51
Cheng Ling and
Khaled Benkrid and
Tsuyoshi Hamada High performance phylogenetic analysis
on CUDA-compatible GPUs . . . . . . . . 52--57
Colin Yu Lin and
Hayden Kwok-Hay So Energy-efficient dataflow computations
on FPGAs using application-specific
coarse-grain architecture synthesis . . 58--63
Jamshaid Sarwar Malik and
Paolo Palazzari and
Ahmed Hemani Effort, resources, and abstraction vs
performance in high-level synthesis:
finding new answers to an old question 64--69
Takeshi Kakimoto and
Keisuke Dohi and
Yuichiro Shibata and
Kiyoshi Oguri Performance comparison of GPU
programming frameworks with the striped
Smith--Waterman algorithm . . . . . . . 70--75
Julien Tribino and
Antoine Trouvé and
Hadrien A. Clarke and
Kazuaki J. Murakami PASTIS: a photonic arbitration with
scalable token injection scheme . . . . 76--81
Takahiro Watanabe and
Minoru Watanabe $ 0.18 \mu $m CMOS process
high-sensitivity optically
reconfigurable gate array VLSI . . . . . 82--86
Shogo Nakaya and
Makoto Miyamura and
Noboru Sakimura and
Yuichi Nakamura and
Tadahiko Sugibayashi A non-volatile reconfigurable offloader
for wireless sensor nodes . . . . . . . 87--92
Mark Thorson Internet nuggets . . . . . . . . . . . . 93--112
Michael Bond GPUDet: a deterministic GPU architecture 1--12
Hyojin Sung and
Rakesh Komuravelli and
Sarita V. Adve DeNovoND: efficient hardware support for
disciplined non-determinism . . . . . . 13--26
Benjamin Wester and
David Devecsery and
Peter M. Chen and
Jason Flinn and
Satish Narayanasamy Parallelizing data race detection . . . 27--38
Brandon Lucia and
Luis Ceze Cooperative empirical failure avoidance
for multithreaded programs . . . . . . . 39--50
Íñigo Goiri and
William Katsak and
Kien Le and
Thu D. Nguyen and
Ricardo Bianchini Parasol and GreenSwitch: managing
datacenters powered by renewable energy 51--64
Kai Shen and
Arrvindh Shriraman and
Sandhya Dwarkadas and
Xiao Zhang and
Zhuan Chen Power containers: an OS facility for
fine-grained power and energy management
on multicore servers . . . . . . . . . . 65--76
Christina Delimitrou and
Christos Kozyrakis Paragon: QoS-aware scheduling for
heterogeneous datacenters . . . . . . . 77--88
Lingjia Tang and
Jason Mars and
Wei Wang and
Tanima Dey and
Mary Lou Soffa ReQoS: reactive static/dynamic
compilation for QoS in warehouse scale
computers . . . . . . . . . . . . . . . 89--100
Joy Arulraj and
Po-Chun Chang and
Guoliang Jin and
Shan Lu Production-run software failure
diagnosis via hardware performance
counters . . . . . . . . . . . . . . . . 101--112
Wei Zhang and
Marc de Kruijf and
Ang Li and
Shan Lu and
Karthikeyan Sankaralingam ConAir: featherweight concurrency bug
recovery via single-threaded idempotent
execution . . . . . . . . . . . . . . . 113--126
Nicolas Viennot and
Siddharth Nair and
Jason Nieh Transparent mutable replay for multicore
debugging and patch validation . . . . . 127--138
Swarup Kumar Sahoo and
John Criswell and
Chase Geigle and
Vikram Adve Using likely invariants for automated
software fault localization . . . . . . 139--152
Eric Paulos The rise of the expert amateur: DIY
culture and the evolution of computer
science . . . . . . . . . . . . . . . . 153--154
Arun Raghavan and
Laurel Emurian and
Lei Shao and
Marios Papaefthymiou and
Kevin P. Pipe and
Thomas F. Wenisch and
Milo M. K. Martin Computational sprinting on a
hardware/software testbed . . . . . . . 155--166
Wonsun Ahn and
Yuelu Duan and
Josep Torrellas DeAliaser: alias speculation using
atomic region support . . . . . . . . . 167--180
Heekwon Park and
Seungjae Baek and
Jongmoo Choi and
Donghee Lee and
Sam H. Noh Regularities considered harmful: forcing
randomness to memory accesses to reduce
row buffer conflicts for multi-core,
multi-bank systems . . . . . . . . . . . 181--192
Nima Honarmand and
Nathan Dautenhahn and
Josep Torrellas and
Samuel T. King and
Gilles Pokam and
Cristiano Pereira Cyrus: unintrusive application-level
record-replay for replay parallelism . . 193--206
Augusto Born de Oliveira and
Sebastian Fischmeister and
Amer Diwan and
Matthias Hauswirth and
Peter F. Sweeney Why you should care about quantile
regression . . . . . . . . . . . . . . . 207--218
Charlie Curtsinger and
Emery D. Berger STABILIZER: statistically sound
performance evaluation . . . . . . . . . 219--228
Lokesh Gidra and
Gaël Thomas and
Julien Sopena and
Marc Shapiro A study of the scalability of
stop-the-world garbage collectors on
multicores . . . . . . . . . . . . . . . 229--240
Daniel S. McFarlin and
Charles Tucker and
Craig Zilles Discerning the dominant out-of-order
performance advantage: is it speculation
or dynamism? . . . . . . . . . . . . . . 241--252
Stephen Checkoway and
Hovav Shacham Iago attacks: why the system call API is
a bad untrusted RPC interface . . . . . 253--264
Owen S. Hofmann and
Sangman Kim and
Alan M. Dunn and
Michael Z. Lee and
Emmett Witchel InkTag: secure applications on an
untrusted operating system . . . . . . . 265--278
Cristiano Giuffrida and
Anton Kuijsten and
Andrew S. Tanenbaum Safe and automatic live update for
operating systems . . . . . . . . . . . 279--292
Haohui Mai and
Edgar Pek and
Hui Xue and
Samuel Talmadge King and
Parthasarathy Madhusudan Verifying security invariants in
ExpressOS . . . . . . . . . . . . . . . 293--304
Eric Schkufza and
Rahul Sharma and
Alex Aiken Stochastic superoptimization . . . . . . 305--316
Eric Schulte and
Jonathan DiLorenzo and
Westley Weimer and
Stephanie Forrest Automated repair of binary and assembly
programs for cooperating embedded
devices . . . . . . . . . . . . . . . . 317--328
Heming Cui and
Gang Hu and
Jingyue Wu and
Junfeng Yang Verifying systems rules using
rule-directed symbolic execution . . . . 329--342
Xiaoya Xiang and
Chen Ding and
Hao Luo and
Bin Bao HOTL: a higher order theory of locality 343--356
Hui Kang and
Jennifer L. Wong To hardware prefetch or not to
prefetch?: a virtualized environment
study and core binding approach . . . . 357--368
Hwanju Kim and
Sangwook Kim and
Jinkyu Jeong and
Joonwon Lee and
Seungryoul Maeng Demand-based coordinated scheduling for
SMP VMs . . . . . . . . . . . . . . . . 369--380
Mohammad Dashti and
Alexandra Fedorova and
Justin Funston and
Fabien Gaud and
Renaud Lachaize and
Baptiste Lepers and
Vivien Quema and
Mark Roth Traffic management: a holistic approach
to memory placement on NUMA systems . . 381--394
Adwait Jog and
Onur Kayiran and
Nachiappan Chidambaram Nachiappan and
Asit K. Mishra and
Mahmut T. Kandemir and
Onur Mutlu and
Ravishankar Iyer and
Chita R. Das OWL: cooperative thread array aware
scheduling techniques for improving
GPGPU performance . . . . . . . . . . . 395--406
Sreepathi Pai and
Matthew J. Thazhuthaveetil and
R. Govindarajan Improving GPGPU concurrency with elastic
kernels . . . . . . . . . . . . . . . . 407--418
Taewook Oh and
Hanjun Kim and
Nick P. Johnson and
Jae W. Lee and
David I. August Practical automatic loop specialization 419--430
Phitchaya Mangpo Phothilimthana and
Jason Ansel and
Jonathan Ragan-Kelley and
Saman Amarasinghe Portable performance on heterogeneous
architectures . . . . . . . . . . . . . 431--444
Aashish Mittal and
Dushyant Bansal and
Sorav Bansal and
Varun Sethi Efficient virtualization on embedded
Power Architecture® platforms . . . . . 445--458
Mark D. Hill Research directions for 21st century
computer systems: ASPLOS 2013 panel . . 459--460
Anil Madhavapeddy and
Richard Mortier and
Charalampos Rotsos and
David Scott and
Balraj Singh and
Thomas Gazagnaire and
Steven Smith and
Steven Hand and
Jon Crowcroft Unikernels: library operating systems
for the cloud . . . . . . . . . . . . . 461--472
Asim Kadav and
Matthew J. Renzelmann and
Michael M. Swift Fine-grained fault tolerance using
device checkpoints . . . . . . . . . . . 473--484
Mark Silberstein and
Bryan Ford and
Idit Keidar and
Emmett Witchel GPUfs: integrating a file system with
GPUs . . . . . . . . . . . . . . . . . . 485--498
Nicholas Hunt and
Tom Bergan and
Luis Ceze and
Steven D. Gribble DDOS: taming nondeterminism in
distributed systems . . . . . . . . . . 499--508
Cheng Wang and
Youfeng Wu TSO_ATOMICITY: efficient hardware
primitive for TSO-preserving region
optimizations . . . . . . . . . . . . . 509--520
Syed Ali Raza Jafri and
Gwendolyn Voskuilen and
T. N. Vijaykumar Wait-n-GoTM: improving HTM performance
by serializing cyclic dependencies . . . 521--534
Xuehai Qian and
Josep Torrellas and
Benjamin Sahelices and
Depei Qian Volition: scalable and precise
sequential consistency violation
detection . . . . . . . . . . . . . . . 535--548
J. P. Grossman and
Jeffrey S. Kuskin and
Joseph A. Bank and
Michael Theobald and
Ron O. Dror and
Douglas J. Ierardi and
Richard H. Larson and
U. Ben Schafer and
Brian Towles and
Cliff Young and
David E. Shaw Hardware support for fine-grained
event-driven computation in Anton 2 . . 549--560
Amitabha Sinha and
Mitrava Sarkar and
Soumojit Acharyya and
Suranjan Chakraborty A novel reconfigurable architecture of a
DSP processor for efficient mapping of
DSP functions using field programmable
DSP arrays . . . . . . . . . . . . . . . 1--8
Amrita Saha and
Manideepa Mukherjee and
Debanjana Datta and
Sangita Saha and
Amitabha Sinha Performance analysis of a FPGA based
novel binary and DBNS multiplier . . . . 9--16
Michael Sartin-Tarm and
Tony Nowatzki and
Lorenzo De Carli and
Karthikeyan Sankaralingam and
Cristian Estan Constraint centric scheduling guide . . 17--21
Apala Guha and
Yao Zhang and
Raihan ur Rasool and
Andrew A. Chien Systematic evaluation of workload
clustering for extremely
energy-efficient architectures . . . . . 22--29
Amrita Saha and
Pijush Biswas and
Amitabha Sinha An integrated development platform of a
reconfigurable radio processor for
software defined radio . . . . . . . . . 30--35
Santanu Pal and
Amitabha Sinha and
Pijush Biswas FPGA implementation of a novel DCT
architecture reducing constant cosine
terms . . . . . . . . . . . . . . . . . 36--40
Kuo-Kun Tseng and
Fu-Fu Zeng and
Huang-Nan Huang and
Yiming Liu and
Jeng-Shyang Pan and
W. H. Ip and
C. H. Wu A new non-exact Aho--Corasick framework
for ECG classification . . . . . . . . . 41--46
Subhashis Maitra and
Amitabha Sinha High performance MAC unit for DSP and
cryptographic applications . . . . . . . 47--55
Mark Thorson Internet nuggets . . . . . . . . . . . . 56--71
Bilel Belhadj and
Antoine Joubert and
Zheng Li and
Rodolphe Héliot and
Olivier Temam Continuous real-world inputs can open up
alternative accelerator designs . . . . 1--12
Paula Petrica and
Adam M. Izraelevitz and
David H. Albonesi and
Christine A. Shoemaker Flicker: a dynamically adaptive
architecture for power limited multicore
systems . . . . . . . . . . . . . . . . 13--23
Wajahat Qadeer and
Rehan Hameed and
Ofer Shacham and
Preethi Venkatesan and
Christos Kozyrakis and
Mark A. Horowitz Convolution engine: balancing efficiency
& flexibility in specialized computing 24--35
Kevin Lim and
David Meisner and
Ali G. Saidi and
Parthasarathy Ranganathan and
Thomas F. Wenisch Thin servers with smart pipes: designing
SoC accelerators for memcached . . . . . 36--47
Janani Mukundan and
Hillery Hunter and
Kyu-hyoun Kim and
Jeffrey Stuecheli and
José F. Martínez Understanding and mitigating refresh
overheads in high-density DDR4 DRAM
systems . . . . . . . . . . . . . . . . 48--59
Jamie Liu and
Ben Jaiyen and
Yoongu Kim and
Chris Wilkerson and
Onur Mutlu An experimental study of data retention
behavior in modern DRAM devices:
implications for retention time
profiling mechanisms . . . . . . . . . . 60--71
Prashant J. Nair and
Dae-Hyun Kim and
Moinuddin K. Qureshi ArchShield: architectural framework for
assisting DRAM scaling by tolerating
high error rates . . . . . . . . . . . . 72--83
Saugata Ghose and
Hyodong Lee and
José F. Martínez Improving memory scheduling via
processor-side load criticality
information . . . . . . . . . . . . . . 84--95
Canturk Isci and
Suzanne McIntosh and
Jeffrey Kephart and
Rajarshi Das and
James Hanson and
Scott Piper and
Robert Wolford and
Thomas Brey and
Robert Kantner and
Allen Ng and
James Norris and
Abdoulaye Traore and
Michael Frissora Agile, efficient virtualization power
management with low-latency server power
states . . . . . . . . . . . . . . . . . 96--107
Cheng-Chun Tu and
Chao-tang Lee and
Tzi-cker Chiueh Secure I/O device sharing among virtual
machines on multiple hosts . . . . . . . 108--119
Xiaotao Chang and
Hubertus Franke and
Yi Ge and
Tao Liu and
Kun Wang and
Jimi Xenidis and
Fei Chen and
Yu Zhang Improving virtualization in the presence
of software managed translation
lookaside buffers . . . . . . . . . . . 120--129
Ji Kim and
Christopher Torng and
Shreesha Srinath and
Derek Lockhart and
Christopher Batten Microarchitectural mechanisms to exploit
value structure in SIMT architectures 130--141
Angshuman Parashar and
Michael Pellauer and
Michael Adler and
Bushra Ahsan and
Neal Crago and
Daniel Lustig and
Vladimir Pavlov and
Antonia Zhai and
Mohit Gambhir and
Aamer Jaleel and
Randy Allmon and
Rachid Rayess and
Stephen Maresh and
Joel Emer Triggered instructions: a control
paradigm for spatially-programmed
architectures . . . . . . . . . . . . . 142--153
José A. Joao and
M. Aater Suleman and
Onur Mutlu and
Yale N. Patt Utility-based acceleration of
multithreaded applications on asymmetric
CMPs . . . . . . . . . . . . . . . . . . 154--165
Daniel Kudrow and
Kenneth Bier and
Zhaoxia Deng and
Diana Franklin and
Yu Tomita and
Kenneth R. Brown and
Frederic T. Chong Quantum rotations: a case study in
static and dynamic machine-code
generation for quantum computers . . . . 166--176
Richard A. Muscat and
Karin Strauss and
Luis Ceze and
Georg Seelig DNA-based molecular architecture with
spatially localized components . . . . . 177--188
Qing Guo and
Xiaochen Guo and
Ravi Patel and
Engin Ipek and
Eby G. Friedman AC-DIMM: associative computing with
STT-MRAM . . . . . . . . . . . . . . . . 189--200
Blake A. Hechtman and
Daniel J. Sorin Exploring memory consistency for
massively-threaded throughput-oriented
processors . . . . . . . . . . . . . . . 201--212
Yuelu Duan and
Abdullah Muzahid and
Josep Torrellas WeeFence: toward making fences free in
TSO . . . . . . . . . . . . . . . . . . 213--224
Harold W. Cain and
Maged M. Michael and
Brad Frey and
Cathy May and
Derek Williams and
Hung Le Robust architectural support for
transactional memory in the Power
architecture . . . . . . . . . . . . . . 225--236
Arkaprava Basu and
Jayneel Gandhi and
Jichuan Chang and
Mark D. Hill and
Michael M. Swift Efficient virtual memory for big memory
servers . . . . . . . . . . . . . . . . 237--248
Lisa Wu and
Raymond J. Barker and
Martha A. Kim and
Kenneth A. Ross Navigating big data with
high-throughput, energy-efficient data
partitioning . . . . . . . . . . . . . . 249--260
Eric S. Chung and
John D. Davis and
Jaewon Lee LINQits: big data on little clients . . 261--272
Islam Atta and
Pinar Tözün and
Xin Tong and
Anastasia Ailamaki and
Andreas Moshovos STREX: boosting instruction cache reuse
in OLTP workloads through stratified
transaction execution . . . . . . . . . 273--284
Indrani Paul and
Srilatha Manne and
Manish Arora and
W. Lloyd Bircher and
Sudhakar Yalamanchili Cooperative boosting: needy versus
greedy power management . . . . . . . . 285--296
Anys Bacha and
Radu Teodorescu Dynamic reduction of voltage margins by
leveraging on-chip ECC in Itanium II
processors . . . . . . . . . . . . . . . 297--307
Henry Cook and
Miquel Moreto and
Sarah Bird and
Khanh Dao and
David A. Patterson and
Krste Asanovic A hardware evaluation of cache
partitioning to improve utilization and
energy-efficiency while preserving
responsiveness . . . . . . . . . . . . . 308--319
Reetuparna Das and
Satish Narayanasamy and
Sudhir K. Satpathy and
Ronald G. Dreslinski Catnap: energy proportional multiple
network-on-chip . . . . . . . . . . . . 320--331
Adwait Jog and
Onur Kayiran and
Asit K. Mishra and
Mahmut T. Kandemir and
Onur Mutlu and
Ravishankar Iyer and
Chita R. Das Orchestrated scheduling and prefetching
for GPGPUs . . . . . . . . . . . . . . . 332--343
Naifeng Jing and
Yao Shen and
Yao Lu and
Shrikanth Ganapathy and
Zhigang Mao and
Minyi Guo and
Ramon Canal and
Xiaoyao Liang An energy-efficient and scalable
eDRAM-based register file architecture
for GPGPU . . . . . . . . . . . . . . . 344--355
Minsoo Rhu and
Mattan Erez Maximizing SIMD resource utilization in
GPGPUs with SIMD lane permutation . . . 356--367
Aniruddha S. Vaidya and
Anahita Shayesteh and
Dong Hyuk Woo and
Roy Saharoy and
Mani Azimi SIMD divergence optimization through
intra-warp compaction . . . . . . . . . 368--379
Young Hoon Son and
O. Seongil and
Yuhwan Ro and
Jae W. Lee and
Jung Ho Ahn Reducing memory access latency with
asymmetric DRAM bank organizations . . . 380--391
Ziyi Liu and
JongHyuk Lee and
Junyuan Zeng and
Yuanfeng Wen and
Zhiqiang Lin and
Weidong Shi CPU transparent protection of OS kernel
and hypervisor integrity with
programmable DRAM . . . . . . . . . . . 392--403
Djordje Jevdjic and
Stavros Volos and
Babak Falsafi Die-stacked DRAM caches for servers: hit
ratio, latency, or bandwidth? Have it
all with footprint cache . . . . . . . . 404--415
Jaewoong Sim and
Gabriel H. Loh and
Vilas Sridharan and
Mike O'Connor Resilient die-stacked DRAM caches . . . 416--427
Yu Du and
Miao Zhou and
Bruce R. Childers and
Daniel Mossé and
Rami Melhem Bit mapping for balanced PCM cell
programming . . . . . . . . . . . . . . 428--439
Nak Hee Seong and
Sungkap Yeo and
Hsien-Hsin S. Lee Tri-level-cell phase change memory:
toward an efficient and reliable memory
system . . . . . . . . . . . . . . . . . 440--451
Rodolfo Azevedo and
John D. Davis and
Karin Strauss and
Parikshit Gopalan and
Mark Manasse and
Sergey Yekhanin Zombie memory: extending memory lifetime
by reviving dead blocks . . . . . . . . 452--463
Adrian M. Caulfield and
Steven Swanson QuickSAN: a storage area network for
fast, distributed, solid state disks . . 464--474
Daniel Sanchez and
Christos Kozyrakis ZSim: fast and accurate
microarchitectural simulation of
thousand-core systems . . . . . . . . . 475--486
Jingwen Leng and
Tayler Hetherington and
Ahmed ElTantawy and
Syed Gilani and
Nam Sung Kim and
Tor M. Aamodt and
Vijay Janapa Reddi GPUWattch: enabling energy optimizations
in GPGPUs . . . . . . . . . . . . . . . 487--498
Meng-Ju Wu and
Minshu Zhao and
Donald Yeung Studying multicore processor scaling via
reuse distance analysis . . . . . . . . 499--510
Kristof Du Bois and
Stijn Eyerman and
Jennifer B. Sartor and
Lieven Eeckhout Criticality stacks: identifying critical
threads in parallel programs using
synchronization behavior . . . . . . . . 511--522
George Kurian and
Omer Khan and
Srinivas Devadas The locality-aware adaptive cache
coherence protocol . . . . . . . . . . . 523--534
Stefanos Kaxiras and
Alberto Ros A new perspective for efficient
virtual-cache coherence . . . . . . . . 535--546
Hongzhou Zhao and
Arrvindh Shriraman and
Snehasish Kumar and
Sandhya Dwarkadas Protozoa: adaptive granularity cache
coherence . . . . . . . . . . . . . . . 547--558
John Demme and
Matthew Maycock and
Jared Schmitz and
Adrian Tang and
Adam Waksman and
Simha Sethumadhavan and
Salvatore Stolfo On the feasibility of online malware
detection with performance counters . . 559--570
Ling Ren and
Xiangyao Yu and
Christopher W. Fletcher and
Marten van Dijk and
Srinivas Devadas Design space exploration and
optimization of path oblivious RAM in
secure processors . . . . . . . . . . . 571--582
Hassan M. G. Wassel and
Ying Gao and
Jason K. Oberg and
Ted Huffmire and
Ryan Kastner and
Frederic T. Chong and
Timothy Sherwood SurfNoC: a low latency and provably
non-interfering approach to secure
networks-on-chip . . . . . . . . . . . . 583--594
Di Wang and
Chuangang Ren and
Anand Sivasubramaniam Virtualizing power distribution in
datacenters . . . . . . . . . . . . . . 595--606
Hailong Yang and
Alex Breslow and
Jason Mars and
Lingjia Tang Bubble-Flux: precise online QoS
management for increased utilization in
warehouse scale computers . . . . . . . 607--618
Jason Mars and
Lingjia Tang Whare-map: heterogeneity in
``homogeneous'' warehouse-scale
computers . . . . . . . . . . . . . . . 619--630
Nikos Foutris and
Dimitris Gizopoulos and
Xavier Vera and
Antonio Gonzalez Deconfigurable microprocessor
architectures for silicon debug
acceleration . . . . . . . . . . . . . . 631--642
Gilles Pokam and
Klaus Danne and
Cristiano Pereira and
Rolf Kassa and
Tim Kranich and
Shiliang Hu and
Justin Gottschlich and
Nima Honarmand and
Nathan Dautenhahn and
Samuel T. King and
Josep Torrellas QuickRec: prototyping an Intel
architecture extension for record and
replay of multithreaded programs . . . . 643--654
Ruirui Huang and
Erik Halberg and
G. Edward Suh Non-race concurrency bug detection
through order-sensitive critical
sections . . . . . . . . . . . . . . . . 655--666
Subhashis Maitra and
Amitabha Sinha High efficiency MAC unit used in digital
signal processing and elliptic curve
cryptography . . . . . . . . . . . . . . 1--7
Tomislav Janjusic and
Krishna Kavi Gleipnir: a memory profiling and tracing
tool . . . . . . . . . . . . . . . . . . 8--12
Mark Thorson Internet nuggets . . . . . . . . . . . . 13--22
Ivan Godard The Mill: split-stream encoding . . . . 1--5
Alexander Thomasian Disk arrays with multiple RAID levels 6--24
Subhashis Maitra and
Amitabha Sinha Design and simulation of MAC unit using
combinational circuit and adder . . . . 25--33
Thomas C. P. Chau and
James S. Targett and
Marlon Wijeyasinghe and
Wayne Luk and
Peter Y. K. Cheung and
Benjamin Cope and
Alison Eele and
Jan Maciejowski Accelerating sequential Monte Carlo
method for real-time air traffic
management . . . . . . . . . . . . . . . 35--40
Atabak Mahram and
Martin C. Herbordt NCBI BLASTP on the Convey HC1-EX . . . . 41--46
Kentaro Sano and
Yoshiaki Kono and
Hayato Suzuki and
Ryotaro Chiba and
Ryo Ito and
Tomohiro Ueno and
Kyo Koizumi and
Satoru Yamamoto Efficient custom computing of
fully-streamed lattice Boltzmann method
on tightly-coupled FPGA cluster . . . . 47--52
Wim Vanderbauwhede and
Anton Frolov and
Sai Rahul Chalamalasetti and
Martin Margala A hybrid CPU--FPGA system for high
throughput (10Gb/s) streaming document
classification . . . . . . . . . . . . . 53--58
Ce Guo and
Wayne Luk and
Ekaterina Vinkovskaya and
Rama Cont Customisable pipelined engine for
intensity evaluation in multivariate
Hawkes point processes . . . . . . . . . 59--64
Heiner Giefers and
Christian Plessl and
Jens Förstner Accelerating finite difference time
domain simulations with reconfigurable
dataflow computers . . . . . . . . . . . 65--70
Yuki Ogawa and
Masahiro Iida and
Motoki Amagasaki and
Morihiro Kuga and
Toshinori Sueyoshi A reconfigurable Java accelerator with
software compatibility for embedded
systems . . . . . . . . . . . . . . . . 71--76
Takeshi Ohkawa and
Daichi Uetake and
Takashi Yokota and
Kanemitsu Ootsu and
Takanobu Baba Reconfigurable and hardwired ORB engine
on FPGA by Java-to-HDL synthesizer for
realtime application . . . . . . . . . . 77--82
Florent de Dinechin and
Matei Istoan and
Guillaume Sergent Fixed-point trigonometric functions on
FPGAs . . . . . . . . . . . . . . . . . 83--88
Jubee Tada Performance evaluation of $3$-D stacked
$32$-bit parallel multipliers . . . . . 89--94
Yuichiroh Tanaka and
Shimpei Sato and
Kenji Kise The UltraSmall soft processor . . . . . 95--100
Liucheng Guo and
David B. Thomas and
Wayne Luk Customisable architectures for the set
covering problem . . . . . . . . . . . . 101--106
Gary Plumbridge and
Jack Whitham and
Neil Audsley Blueshell: a platform for rapid
prototyping of multiprocessor NoCs and
accelerators . . . . . . . . . . . . . . 107--112
Chuan Hong and
Khaled Benkrid and
Nazrin Isa and
Xabier Iturbe A run-time reconfigurable system for
adaptive high performance efficient
computing . . . . . . . . . . . . . . . 113--118
Mark Thorson Internet nuggets . . . . . . . . . . . . 119--127
Al Davis Inside Windows Azure: the challenges and
opportunities of a cloud operating
system . . . . . . . . . . . . . . . . . 1--2
Stanko Novakovic and
Alexandros Daglis and
Edouard Bugnion and
Babak Falsafi and
Boris Grot Scale-out NUMA . . . . . . . . . . . . . 3--18
Sandeep R. Agrawal and
Valentin Pistol and
Jun Pang and
John Tran and
David Tarjan and
Alvin R. Lebeck Rhythm: harnessing data parallel
hardware for server workloads . . . . . 19--34
Mehrzad Samadi and
Davoud Anoushe Jamshidi and
Janghaeng Lee and
Scott Mahlke Paraprox: pattern-based approximation
for data parallel applications . . . . . 35--50
James Bornholt and
Todd Mytkowicz and
Kathryn S. McKinley Uncertain$<$T$>$: a first-order type for
uncertain data . . . . . . . . . . . . . 51--66
Nuno Santos and
Himanshu Raj and
Stefan Saroiu and
Alec Wolman Using ARM TrustZone to build a trusted
language runtime for mobile applications 67--80
John Criswell and
Nathan Dautenhahn and
Vikram Adve Virtual Ghost: protecting applications
from hostile operating systems . . . . . 81--96
Xun Li and
Vineeth Kashyap and
Jason K. Oberg and
Mohit Tiwari and
Vasanth Ram Rajarathinam and
Ryan Kastner and
Timothy Sherwood and
Ben Hardekopf and
Frederic T. Chong Sapper: a language for hardware-level
security policy enforcement . . . . . . 97--112
Radu Banabic and
George Candea and
Rachid Guerraoui Finding Trojan message vulnerabilities
in distributed systems . . . . . . . . . 113--126
Christina Delimitrou and
Christos Kozyrakis Quasar: resource-efficient and QoS-aware
cluster management . . . . . . . . . . . 127--144
Seyed Majid Zahedi and
Benjamin C. Lee REF: resource elasticity fairness with
sharing incentives for multiprocessors 145--160
Thannirmalai Somu Muthukaruppan and
Anuj Pathania and
Tulika Mitra Price theory based power management for
heterogeneous multi-cores . . . . . . . 161--176
Di Wang and
Sriram Govindan and
Anand Sivasubramaniam and
Aman Kansal and
Jie Liu and
Badriddine Khessib Underprovisioning backup power
infrastructure for datacenters . . . . . 177--192
Xiao Yu and
Shi Han and
Dongmei Zhang and
Tao Xie Comprehending performance from
real-world execution traces: a
device-driver case . . . . . . . . . . . 193--206
Joy Arulraj and
Guoliang Jin and
Shan Lu Leveraging the short-term memory of
hardware to diagnose production-run
software failures . . . . . . . . . . . 207--222
Nima Honarmand and
Josep Torrellas RelaxReplay: record and replay for
relaxed-consistency multiprocessors . . 223--238
Stefan Bucur and
Johannes Kinder and
George Candea Prototyping symbolic execution engines
for interpreted languages . . . . . . . 239--254
Lisa Wu and
Andrea Lottarini and
Timothy K. Paine and
Martha A. Kim and
Kenneth A. Ross Q100: the architecture and design of a
database processing unit . . . . . . . . 255--268
Tianshi Chen and
Zidong Du and
Ninghui Sun and
Jia Wang and
Chengyong Wu and
Yunji Chen and
Olivier Temam DianNao: a small-footprint
high-throughput accelerator for
ubiquitous machine-learning . . . . . . 269--284
Felix Xiaozhu Lin and
Zhen Wang and
Lin Zhong K2: a mobile operating system for
heterogeneous coherence domains . . . . 285--300
Konstantinos Menychtas and
Kai Shen and
Michael L. Scott Disengaged scheduling for fair,
protected access to fast computational
accelerators . . . . . . . . . . . . . . 301--316
Jeff Gehlhaar Neuromorphic processing: a new frontier
in scaling computer architecture . . . . 317--318
Ardalan Amiri Sani and
Kevin Boos and
Shaopu Qin and
Lin Zhong I/O paravirtualization at the device
file boundary . . . . . . . . . . . . . 319--332
Christoffer Dall and
Jason Nieh KVM/ARM: the design and
implementation of the Linux ARM
hypervisor . . . . . . . . . . . . . . . 333--348
Nadav Amit and
Dan Tsafrir and
Assaf Schuster VSwapper: a memory swapper for
virtualized environments . . . . . . . . 349--366
Jeremy Andrus and
Alexander Van't Hof and
Naser AlDuaij and
Christoffer Dall and
Nicolas Viennot and
Jason Nieh Cider: native execution of iOS apps on
Android . . . . . . . . . . . . . . . . 367--382
Heiner Litz and
David Cheriton and
Amin Firoozshahian and
Omid Azizi and
John P. Stevenson SI-TM: reducing transactional memory
abort rates through snapshot isolation 383--398
Wenjia Ruan and
Trilok Vyas and
Yujie Liu and
Michael Spear Transactionalizing legacy code: an
experience report using GCC and
Memcached . . . . . . . . . . . . . . . 399--412
Adam Morrison and
Yehuda Afek Fence-free work stealing on bounded TSO
processors . . . . . . . . . . . . . . . 413--426
Derek R. Hower and
Blake A. Hechtman and
Bradford M. Beckmann and
Benedict R. Gaster and
Mark D. Hill and
Steven K. Reinhardt and
David A. Wood Heterogeneous-race-free memory models 427--440
Myoungsoo Jung and
Wonil Choi and
John Shalf and
Mahmut Taylan Kandemir Triple-A: a Non-SSD based autonomic
all-flash array for high performance
storage systems . . . . . . . . . . . . 441--454
Ren-Shuo Liu and
De-Yu Shen and
Chia-Lin Yang and
Shun-Chih Yu and
Cheng-Yuan Michael Wang NVM duet: unified working memory and
persistent store architecture . . . . . 455--470
Jian Ouyang and
Shiding Lin and
Song Jiang and
Zhenyu Hou and
Yong Wang and
Yuanzheng Wang SDF: software-defined flash for
Web-scale Internet storage systems . . . 471--484
Anthony Gutierrez and
Michael Cieslak and
Bharan Giridhar and
Ronald G. Dreslinski and
Luis Ceze and
Trevor Mudge Integrated $3$D-stacked server designs
for increasing physical density of
key-value stores . . . . . . . . . . . . 485--498
Donald Nguyen and
Andrew Lenharth and
Keshav Pingali Deterministic Galois: on-demand,
portable and parameterless . . . . . . . 499--512
Haris Ribic and
Yu David Liu Energy-efficient work-stealing language
runtimes . . . . . . . . . . . . . . . . 513--528
Todd Mytkowicz and
Madanlal Musuvathi and
Wolfram Schulte Data-parallel finite-state machines . . 529--542
Zhijia Zhao and
Bo Wu and
Xipeng Shen Challenging the ``embarrassingly
sequential'': parallelizing finite state
machine-based computations through
principled speculation . . . . . . . . . 543--558
Yanqi Zhou and
David Wentzlaff The sharing architecture: sub-core
configurability for IaaS clouds . . . . 559--574
Amos Waterland and
Elaine Angelino and
Ryan P. Adams and
Jonathan Appavoo and
Margo Seltzer ASC: automatically scalable computation 575--590
Stijn Eyerman and
Lieven Eeckhout The benefit of SMT in the multi-core
era: flexibility towards degrees of
thread-level parallelism . . . . . . . . 591--606
Yufei Ding and
Mingzhou Zhou and
Zhijia Zhao and
Sarah Eisenstat and
Xipeng Shen Finding the limit: examining the
potential and complexity of compilation
scheduling for JIT-based runtime systems 607--622
Marc Lupon and
Enric Gibert and
Grigorios Magklis and
Sridhar Samudrala and
Raúl Martínez and
Kyriakos Stavrou and
David R. Ditzel Speculative hardware/software
co-designed floating-point multiply-add
fusion . . . . . . . . . . . . . . . . . 623--638
Eric Schulte and
Jonathan Dorn and
Stephen Harding and
Stephanie Forrest and
Westley Weimer Post-compiler software optimization for
reducing energy . . . . . . . . . . . . 639--652
David A. Wood Resolved: specialized architectures,
languages, and system software should
supplant general-purpose alternatives
within a decade . . . . . . . . . . . . 653--654
Olatunji Ruwase and
Michael A. Kozuch and
Phillip B. Gibbons and
Todd C. Mowry Guardrail: a high fidelity approach to
protecting hardware devices from buggy
drivers . . . . . . . . . . . . . . . . 655--670
Benjamin P. Wood and
Luis Ceze and
Dan Grossman Low-level detection of language-level
data races with LARD . . . . . . . . . . 671--686
Jiaqi Zhang and
Lakshminarayanan Renganarayana and
Xiaolan Zhang and
Niyu Ge and
Vasanth Bala and
Tianyin Xu and
Yuanyuan Zhou EnCore: exploiting system environment
and correlation information for
misconfiguration detection . . . . . . . 687--700
Gwendolyn Voskuilen and
T. N. Vijaykumar High-performance fractal coherence . . . 701--714
Woo-Cheol Kwon and
Tushar Krishna and
Li-Shiuan Peh Locality-oblivious cache organization
leveraging single-cycle multi-hop NoCs 715--728
Harshad Kasture and
Daniel Sanchez Ubik: efficient cache sharing with
strict QoS for latency-critical
workloads . . . . . . . . . . . . . . . 729--742
Bharath Pichai and
Lisa Hsu and
Abhishek Bhattacharjee Architectural support for address
translation on GPUs: designing memory
management units for CPU/GPUs with
unified address spaces . . . . . . . . . 743--758
Subijit Mondal and
Subhashis Maitra Data security-modified AES algorithm and
its applications . . . . . . . . . . . . 1--8
Soumik Sen and
Subhashis Maitra Three levels three dimensional compact
coding . . . . . . . . . . . . . . . . . 9--14
Alexander Thomasian and
Bingxing Liu and
Yuhui Deng Balancing disk access times in RAID5
disk arrays in degraded mode by
conditionally prioritizing fork/join
requests . . . . . . . . . . . . . . . . 15--19
Jayneel Gandhi and
Arkaprava Basu and
Mark D. Hill and
Michael M. Swift BadgerTrap: a tool to instrument x86-64
TLB misses . . . . . . . . . . . . . . . 20--23
Mark Thorson Internet nuggets . . . . . . . . . . . . 24--36
Brian Towles and
J. P. Grossman and
Brian Greskamp and
David E. Shaw Unifying on-chip and inter-node
switching within the Anton 2 network . . 1--12
Andrew Putnam and
Adrian M. Caulfield and
Eric S. Chung and
Derek Chiou and
Kypros Constantinides and
John Demme and
Hadi Esmaeilzadeh and
Jeremy Fowers and
Gopi Prashanth Gopal and
Jan Gray and
Michael Haselman and
Scott Hauck and
Stephen Heil and
Amir Hormati and
Joo-Young Kim and
Sitaram Lanka and
James Larus and
Eric Peterson and
Simon Pope and
Aaron Smith and
Jason Thong and
Phillip Yi Xiao and
Doug Burger A reconfigurable fabric for accelerating
large-scale datacenter services . . . . 13--24
Bhavya K. Daya and
Chia-Hsin Owen Chen and
Suvinay Subramanian and
Woo-Cheol Kwon and
Sunghyun Park and
Tushar Krishna and
Jim Holt and
Anantha P. Chandrakasan and
Li-Shiuan Peh SCORPIO: a $36$-core research chip
demonstrating snoopy coherence on a
scalable mesh NoC with in-network
ordering . . . . . . . . . . . . . . . . 25--36
Gaurang Upasani and
Xavier Vera and
Antonio González Avoiding core's DUE & SDC via acoustic
wave detectors and tailored error
containment and recovery . . . . . . . . 37--48
Long Chen and
Zhao Zhang MemGuard: a low cost and energy
efficient design to support and enhance
memory system reliability . . . . . . . 49--60
Siva Kumar Sastry Hari and
Radha Venkatagiri and
Sarita V. Adve and
Helia Naeimi GangES: gang error simulation for
hardware resiliency evaluation . . . . . 61--72
Jack Wadden and
Alexander Lyashevsky and
Sudhanva Gurumurthi and
Vilas Sridharan and
Kevin Skadron Real-world design and evaluation of
compiler-managed GPU redundant
multithreading . . . . . . . . . . . . . 73--84
Tianshi Chen and
Qi Guo and
Ke Tang and
Olivier Temam and
Zhiwei Xu and
Zhi-Hua Zhou and
Yunji Chen ArchRanker: a ranking approach to design
space exploration . . . . . . . . . . . 85--96
Yakun Sophia Shao and
Brandon Reagen and
Gu-Yeon Wei and
David Brooks Aladdin: a Pre-RTL, power-performance
accelerator simulator enabling large
design space exploration of customized
architectures . . . . . . . . . . . . . 97--108
Mario Badr and
Natalie Enright Jerger SynFull: synthetic traffic models
capturing cache coherent behaviour . . . 109--120
Ashish Venkat and
Dean M. Tullsen Harnessing ISA diversity: design of a
heterogeneous-ISA chip multiprocessor 121--132
Andreas Sembrant and
Erik Hagersten and
David Black-Schaffer The Direct-to-Data (D2D) cache:
navigating the cache hierarchy with a
single lookup . . . . . . . . . . . . . 133--144
Angelos Arelakis and
Per Stenstrom SC2: a statistical compression cache
scheme . . . . . . . . . . . . . . . . . 145--156
Vivek Seshadri and
Abhishek Bhowmick and
Onur Mutlu and
Phillip B. Gibbons and
Michael A. Kozuch and
Todd C. Mowry The dirty-block index . . . . . . . . . 157--168
Lei Liu and
Yong Li and
Zehan Cui and
Yungang Bao and
Mingyu Chen and
Chengyong Wu Going vertical in memory management:
handling multiplicity by multi-policy 169--180
Marc S. Orr and
Bradford M. Beckmann and
Steven K. Reinhardt and
David A. Wood Fine-grain task aggregation and
coordination on GPUs . . . . . . . . . . 181--192
Ivan Tanasic and
Isaac Gelado and
Javier Cabezas and
Alex Ramirez and
Nacho Navarro and
Mateo Valero Enabling preemptive multiprogramming on
GPUs . . . . . . . . . . . . . . . . . . 193--204
Dani Voitsechov and
Yoav Etsion Single-graph multiple flows: energy
efficient design alternative for GPGPUs 205--216
Simone Campanoni and
Kevin Brownell and
Svilen Kanev and
Timothy M. Jones and
Gu-Yeon Wei and
David Brooks HELIX--RC: an architecture-compiler
co-design for automatic parallelization
of irregular programs . . . . . . . . . 217--228
James E. Smith Efficient digital neurons for large
scale cortical architectures . . . . . . 229--240
Karthik Swaminathan and
Huichu Liu and
Jack Sampson and
Vijaykrishnan Narayanan An examination of the architecture and
system-level tradeoffs of employing
steep slope devices in $3$D CMPs . . . . 241--252
Rangharajan Venkatesan and
Shankar Ganesh Ramasubramanian and
Swagath Venkataramani and
Kaushik Roy and
Anand Raghunathan STAG: spintronic-tape architecture for
GPGPU cache hierarchies . . . . . . . . 253--264
Steven Pelley and
Peter M. Chen and
Thomas F. Wenisch Memory persistency . . . . . . . . . . . 265--276
Morteza Hoseinzadeh and
Mohammad Arjomand and
Hamid Sarbazi-Azad Reducing access latency of MLC PCMs
through line striping . . . . . . . . . 277--288
Myoungsoo Jung and
Wonil Choi and
Shekhar Srikantaiah and
Joonhyuk Yoo and
Mahmut T. Kandemir HIOS: a host interface I/O scheduler for
solid state disks . . . . . . . . . . . 289--300
David Lo and
Liqun Cheng and
Rama Govindaraju and
Luiz André Barroso and
Christos Kozyrakis Towards energy proportionality for
large-scale latency-critical workloads 301--312
Yanpei Liu and
Stark C. Draper and
Nam Sung Kim SleepScale: runtime joint speed scaling
and sleep states management for power
efficient data centers . . . . . . . . . 313--324
Ming Liu and
Tao Li Optimizing virtual machine consolidation
performance on NUMA server architecture
for cloud workloads . . . . . . . . . . 325--336
Seongil O and
Young Hoon Son and
Nam Sung Kim and
Jung Ho Ahn Row-buffer decoupling: a case for
low-latency DRAM microarchitecture . . . 337--348
Tao Zhang and
Ke Chen and
Cong Xu and
Guangyu Sun and
Tao Wang and
Yuan Xie Half-DRAM: a high-bandwidth and
low-power DRAM architecture from the
rethinking of fine-grained activation 349--360
Yoongu Kim and
Ross Daly and
Jeremie Kim and
Chris Fallin and
Ji Hye Lee and
Donghyuk Lee and
Chris Wilkerson and
Konrad Lai and
Onur Mutlu Flipping bits in memory without
accessing them: an experimental study of
DRAM disturbance errors . . . . . . . . 361--372
Runjie Zhang and
Ke Wang and
Brett H. Meyer and
Mircea R. Stan and
Kevin Skadron Architecture implications of pads as a
scarce resource . . . . . . . . . . . . 373--384
Shaoming Chen and
Yue Hu and
Ying Zhang and
Lu Peng and
Jesse Ardonne and
Samuel Irving and
Ashok Srivastava Increasing off-chip bandwidth in
multi-core processors with switchable
pins . . . . . . . . . . . . . . . . . . 385--396
Lei Jiang and
Bo Zhao and
Jun Yang and
Youtao Zhang A low power and reliable charge pump
design for phase change memories . . . . 397--408
Gwendolyn Voskuilen and
T. N. Vijaykumar Fractal++: closing the performance gap
between fractal and conventional
coherence . . . . . . . . . . . . . . . 409--420
Xuehai Qian and
Benjamin Sahelices and
Josep Torrellas OmniOrder: directory-based conflict
serialization of transactions . . . . . 421--432
Xuehai Qian and
Benjamin Sahelices and
Depei Qian Pacifier: record and replay for
relaxed-consistency multiprocessors with
distributed directory protocol . . . . . 433--444
Nima Honarmand and
Josep Torrellas Replay debugging: leveraging record and
replay for program debugging . . . . . . 445--456
Jonathan Woodruff and
Robert N. M. Watson and
David Chisnall and
Simon W. Moore and
Jonathan Anderson and
Brooks Davis and
Ben Laurie and
Peter G. Neumann and
Robert Norton and
Michael Roe The CHERI capability model: revisiting
RISC in an age of risk . . . . . . . . . 457--468
Lluís Vilanova and
Muli Ben-Yehuda and
Nacho Navarro and
Yoav Etsion and
Mateo Valero CODOMs: protecting software with
code-centric memory domains . . . . . . 469--480
Arthur Perais and
André Seznec EOLE: paving the way for an effective
implementation of value prediction . . . 481--492
Kenneth Czechowski and
Victor W. Lee and
Ed Grochowski and
Ronny Ronen and
Ronak Singhal and
Richard Vuduc and
Pradeep Dubey Improving the energy efficiency of big
cores . . . . . . . . . . . . . . . . . 493--504
Renée St. Amant and
Amir Yazdanbakhsh and
Jongse Park and
Bradley Thwaites and
Hadi Esmaeilzadeh and
Arjang Hassibi and
Luis Ceze and
Doug Burger General-purpose code acceleration with
limited-precision analog computation . . 505--516
Advait Madhavan and
Timothy Sherwood and
Dmitri Strukov Race logic: a hardware acceleration for
dynamic programming algorithms . . . . . 517--528
Jose-Maria Arnau and
Joan-Manuel Parcerisa and
Polychronis Xekalakis Eliminating redundant fragment shader
executions on a mobile GPU via hardware
memoization . . . . . . . . . . . . . . 529--540
Yuhao Zhu and
Vijay Janapa Reddi WebCore: architectural support for
mobile Web browsing . . . . . . . . . . 541--552
Yuetsu Kodama and
Toshihiro Hanawa and
Taisuke Boku and
Mitsuhisa Sato PEACH2: an FPGA-based PCIe network
device for Tightly Coupled Accelerators 3--8
Shimpei Nomura and
Takuji Mitsuishi and
Jun Suzuki and
Yuki Hayashi and
Masaki Kan and
Hideharu Amano Performance Analysis of the Multi-GPU
System with ExpEther . . . . . . . . . . 9--14
Tsuyoshi Watanabe and
Naohito Nakasato GPU Accelerated Hybrid Tree Algorithm
for Collision Less $N$-body Simulations 15--20
Haruhisa Tsuyama and
Tsutomu Maruyama GPU and FPGA Acceleration of Level Set
Method . . . . . . . . . . . . . . . . . 21--25
Yu Tanabe and
Tsutomu Maruyama Fast and Accurate Optical Flow
Estimation using FPGA . . . . . . . . . 27--32
Cesar Torres-Huitzil and
Marco Aurelio Nuño-Maganda Area-time Efficient Implementation of
Local Adaptive Image Thresholding in
Reconfigurable Hardware . . . . . . . . 33--38
Diana Göhringer Reconfigurable Multiprocessor Systems:
Handling Hydras Heads --- A Survey . . . 39--44
Kentaro Sano and
Ryotaro Chiba and
Tomoya Ueno and
Hayato Suzuki and
Ryo Ito and
Satoru Yamamoto FPGA-based Custom Computing Architecture
for Large-Scale Fluid Simulation with
Building Cube Method . . . . . . . . . . 45--50
Tao Wang and
Guangyu Sun and
Jiahua Chen and
Jian Gong and
Haoyang Wu and
Xiaoguang Li and
Songwu Lu and
Jason Cong GRT: a Reconfigurable SDR Platform with
High Performance and Usability . . . . . 51--56
Yuki Ando and
Masataka Ogawa and
Yuya Mizoguchi and
Kouta Kumagai and
Miaw Torng-Der and
Shinya Honda A Case Study of FPGA Blokus Duo Solver
by System-Level Design . . . . . . . . . 57--62
Mioara Joldes and
Valentina Popescu and
Warwick Tucker Searching for Sinks for the Hénon Map
using a Multiple-precision GPU
Arithmetic Library . . . . . . . . . . . 63--68
Rie Soejima and
Koji Okina and
Keisuke Dohi and
Yuichiro Shibata and
Kiyoshi Oguri A Memory Profiling Framework for Stencil
Computation on an FPGA Accelerator with
High Level Synthesis . . . . . . . . . . 69--74
Shin Morishima and
Hiroki Matsutani Performance Evaluations of Graph
Database using CUDA and OpenMP
Compatible Libraries . . . . . . . . . . 75--80
Takuji Mitsuishi and
Shimpei Nomura and
Jun Suzuki and
Yuki Hayashi and
Masaki Kan and
Hideharu Amano Accelerating Breadth First Search on
GPU--BOX . . . . . . . . . . . . . . . . 81--86
Jose Nunez-Yanez Energy efficient Reconfigurable
Computing with Adaptive Voltage and
Logic scaling . . . . . . . . . . . . . 87--92
Mark Thorson Internet Nuggets . . . . . . . . . . . . 93--101
Ozcan Ozturk Architectural Support for Cyber-Physical
Systems . . . . . . . . . . . . . . . . 1--1
Yiying Zhang and
Jian Yang and
Amirsaman Memaripour and
Steven Swanson Mojim: a Reliable and Highly-Available
Non-Volatile Memory System . . . . . . . 3--18
Rujia Wang and
Lei Jiang and
Youtao Zhang and
Jun Yang SD--PCM: Constructing Reliable Super
Dense Phase Change Memory under Write
Disturbance . . . . . . . . . . . . . . 19--31
Vinson Young and
Prashant J. Nair and
Moinuddin K. Qureshi DEUCE: Write-Efficient Encryption for
Non-Volatile Memories . . . . . . . . . 33--44
Adam Morrison and
Yehuda Afek Temporally Bounding TSO for Fence-Free
Asymmetric Synchronization . . . . . . . 45--58
Alexander Matveev and
Nir Shavit Reduced Hardware NOrec: a Safe and
Scalable Hybrid Transactional Memory . . 59--71
Marc S. Orr and
Shuai Che and
Ayse Yilmazer and
Bradford M. Beckmann and
Mark D. Hill and
David A. Wood Synchronization Using Remote-Scope
Promotion . . . . . . . . . . . . . . . 73--86
Chang Liu and
Austin Harris and
Martin Maas and
Michael Hicks and
Mohit Tiwari and
Elaine Shi GhostRider: a Hardware-Software System
for Memory Trace Oblivious Computation 87--101
Christopher W. Fletcher and
Ling Ren and
Albert Kwon and
Marten van Dijk and
Srinivas Devadas Freecursive ORAM: [Nearly] Free
Recursion and Integrity Verification for
Position-based Oblivious RAM . . . . . . 103--116
David Chisnall and
Colin Rothwell and
Robert N. M. Watson and
Jonathan Woodruff and
Munraj Vadera and
Simon W. Moore and
Michael Roe and
Brooks Davis and
Peter G. Neumann Beyond the PDP-11: Architectural Support
for a Memory-Safe C Abstract Machine . . 117--130
Jiuyue Ma and
Xiufeng Sui and
Ninghui Sun and
Yupeng Li and
Zihao Yu and
Bowen Huang and
Tianni Xu and
Zhicheng Yao and
Yun Chen and
Haibin Wang and
Lixin Zhang and
Yungang Bao Supporting Differentiated Services in
Computers via Programmable Architecture
for Resourcing-on-Demand (PARD) . . . . 131--143
Yushi Omote and
Takahiro Shinagawa and
Kazuhiko Kato Improving Agility and Elasticity in
Bare-metal Clouds . . . . . . . . . . . 145--159
Md E. Haque and
Yong hun Eom and
Yuxiong He and
Sameh Elnikety and
Ricardo Bianchini and
Kathryn S. McKinley Few-to-Many: Incremental Parallelism for
Reducing Tail Latency in Interactive
Services . . . . . . . . . . . . . . . . 161--175
Patrick Colp and
Jiawen Zhang and
James Gleeson and
Sahil Suneja and
Eyal de Lara and
Himanshu Raj and
Stefan Saroiu and
Alec Wolman Protecting Data on Smartphones and
Tablets from Memory Attacks . . . . . . 177--189
Nathan Dautenhahn and
Theodoros Kasampalis and
Will Dietz and
John Criswell and
Vikram Adve Nested Kernel: an Operating System
Architecture for Intra-Kernel Privilege
Separation . . . . . . . . . . . . . . . 191--206
Zhangxi Tan and
Zhenghao Qian and
Xi Chen and
Krste Asanovic and
David Patterson DIABLO: a Warehouse-Scale Computer
Network Simulator using FPGAs . . . . . 207--221
Johann Hauswald and
Michael A. Laurenzano and
Yunqi Zhang and
Cheng Li and
Austin Rovinski and
Arjun Khurana and
Ronald G. Dreslinski and
Trevor Mudge and
Vinicius Petrucci and
Lingjia Tang and
Jason Mars Sirius: an Open End-to-End Voice and
Vision Personal Assistant and Its
Implications for Future Warehouse Scale
Computers . . . . . . . . . . . . . . . 223--238
Chao Xu and
Felix Xiaozhu Lin and
Yuyang Wang and
Lin Zhong Automated OS-level Device Runtime Power
Management . . . . . . . . . . . . . . . 239--252
Íñigo Goiri and
Thu D. Nguyen and
Ricardo Bianchini CoolAir: Temperature- and
Variation-Aware Management for
Free-Cooled Datacenters . . . . . . . . 253--265
Nikita Mishra and
Huazhe Zhang and
John D. Lafferty and
Henry Hoffmann A Probabilistic Graphical Model-based
Approach for Minimizing Energy Under
Performance Constraints . . . . . . . . 267--281
Jun Pang and
Chris Dwyer and
Alvin R. Lebeck More is Less, Less is More:
Molecular-Scale Photonic NoC Power
Topologies . . . . . . . . . . . . . . . 283--296
Vilas Sridharan and
Nathan DeBardeleben and
Sean Blanchard and
Kurt B. Ferreira and
Jon Stearley and
John Shalf and
Sudhanva Gurumurthi Memory Errors in Modern Systems: The
Good, The Bad, and The Ugly . . . . . . 297--310
Yavuz Yetim and
Sharad Malik and
Margaret Martonosi CommGuard: Mitigating Communication
Errors in Error-Prone Parallel Execution 311--323
Dohyeong Kim and
Yonghwi Kwon and
William N. Sumner and
Xiangyu Zhang and
Dongyan Xu Dual Execution for On the Fly Fine
Grained Execution Comparison . . . . . . 325--338
Petr Hosek and
Cristian Cadar VARAN the Unbelievable: an Efficient
$N$-version Execution Framework . . . . 339--353
Moshe Malka and
Nadav Amit and
Muli Ben-Yehuda and
Dan Tsafrir rIOMMU: Efficient IOMMU for I/O Devices
that Employ Ring Buffers . . . . . . . . 355--368
Daofu Liu and
Tianshi Chen and
Shaoli Liu and
Jinhong Zhou and
Shengyuan Zhou and
Olivier Temam and
Xiaobing Feng and
Xuehai Zhou and
Yunji Chen PuDianNao: a Polyvalent Machine Learning
Accelerator . . . . . . . . . . . . . . 369--381
Íñigo Goiri and
Ricardo Bianchini and
Santosh Nagarakatte and
Thu D. Nguyen ApproxHadoop: Bringing Approximations to
MapReduce Frameworks . . . . . . . . . . 383--397
Michael Ringenburg and
Adrian Sampson and
Isaac Ackerman and
Luis Ceze and
Dan Grossman Monitoring and Debugging the Quality of
Results in Approximate Programs . . . . 399--411
Guruduth Banavar Watson and the Era of Cognitive
Computing . . . . . . . . . . . . . . . 413--413
Gordon Stewart and
Mahanth Gowda and
Geoffrey Mainland and
Bozidar Radunovic and
Dimitrios Vytiniotis and
Cristina Luengo Agullo Ziria: a DSL for Wireless Systems
Programming . . . . . . . . . . . . . . 415--428
Ravi Teja Mullapudi and
Vinay Vasista and
Uday Bondhugula PolyMage: Automatic Optimization for
Image Processing Pipelines . . . . . . . 429--443
Jeff Heckey and
Shruti Patil and
Ali JavadiAbhari and
Adam Holmes and
Daniel Kudrow and
Kenneth R. Brown and
Diana Franklin and
Frederic T. Chong and
Margaret Martonosi Compiler Management of Communication and
Parallelism for Quantum Computation . . 445--456
Muhammad Amber Hassaan and
Donald D. Nguyen and
Keshav K. Pingali Kinetic Dependence Graphs . . . . . . . 457--471
Stelios Sidiroglou-Douskos and
Eric Lahtinen and
Nathan Rittenhouse and
Paolo Piselli and
Fan Long and
Deokhwan Kim and
Martin Rinard Targeted Automatic Integer Overflow
Discovery Using Goal-Directed
Conditional Branch Enforcement . . . . . 473--486
Udit Dhawan and
Catalin Hritcu and
Raphael Rubin and
Nikos Vasilakis and
Silviu Chiricescu and
Jonathan M. Smith and
Thomas F. Knight, Jr. and
Benjamin C. Pierce and
Andre DeHon Architectural Support for
Software-Defined Metadata Processing . . 487--502
Danfeng Zhang and
Yao Wang and
G. Edward Suh and
Andrew C. Myers A Hardware Design Language for
Timing-Sensitive Information-Flow
Security . . . . . . . . . . . . . . . . 503--516
Matthew Hicks and
Cynthia Sturton and
Samuel T. King and
Jonathan M. Smith SPECS: a Lightweight Runtime Mechanism
for Protecting Software from
Security-Critical Processor Bugs . . . . 517--529
Yuelu Duan and
Nima Honarmand and
Josep Torrellas Asymmetric Memory Fences: Optimizing
Both Performance and Implementability 531--543
Hyojin Sung and
Sarita V. Adve DeNovoSync: Efficient Support for
Arbitrary Synchronization without
Writer-Initiated Invalidations . . . . . 545--559
Aritra Sengupta and
Swarnendu Biswas and
Minjia Zhang and
Michael D. Bond and
Milind Kulkarni Hybrid Static-Dynamic Analysis for
Statically Bounded Region
Serializability . . . . . . . . . . . . 561--575
Jade Alglave and
Mark Batty and
Alastair F. Donaldson and
Ganesh Gopalakrishnan and
Jeroen Ketema and
Daniel Poetzl and
Tyler Sorensen and
John Wickerson GPU Concurrency: Weak Behaviours and
Programming Assumptions . . . . . . . . 577--591
Jason Jong Kyu Park and
Yongjun Park and
Scott Mahlke Chimera: Collaborative Preemption for
Multitasking on a Shared GPU . . . . . . 593--606
Neha Agarwal and
David Nellans and
Mark Stephenson and
Mike O'Connor and
Stephen W. Keckler Page Placement Strategies for GPUs
within Heterogeneous Memory Systems . . 607--618
Zhijia Zhao and
Xipeng Shen On-the-Fly Principled Speculation for
FSM Parallelization . . . . . . . . . . 619--630
Tudor David and
Rachid Guerraoui and
Vasileios Trigonakis Asynchronized Concurrency: The Secret to
Scaling Concurrent Search Data
Structures . . . . . . . . . . . . . . . 631--644
Pramod Bhatotia and
Pedro Fonseca and
Umut A. Acar and
Björn B. Brandenburg and
Rodrigo Rodrigues iThreads: a Threading Library for
Parallel Incremental Computation . . . . 645--659
Lokesh Gidra and
Gaël Thomas and
Julien Sopena and
Marc Shapiro and
Nhan Nguyen NumaGiC: a Garbage Collector for Big
Data on Big NUMA Machines . . . . . . . 661--673
Khanh Nguyen and
Kai Wang and
Yingyi Bu and
Lu Fang and
Jianfei Hu and
Guoqing Xu FACADE: a Compiler and Runtime for
(Almost) Object-Bounded Big Data
Applications . . . . . . . . . . . . . . 675--690
Varun Agrawal and
Abhiroop Dabral and
Tapti Palit and
Yongming Shen and
Michael Ferdman Architectural Support for Dynamic
Linking . . . . . . . . . . . . . . . . 691--702
Andrew A. Chien and
Tung Thanh-Hoang and
Dilip Vasudevan and
Yuanwei Fang and
Amirali Shambayati $10 \times 10$: a Case Study in
Highly-Programmable and Energy-Efficient
Heterogeneous Federated Architecture . . 2--9
Mark Thorson Internet Nuggets . . . . . . . . . . . . 10--16
Martin Herbordt and
Miriam Leeser Off-Loading LET Generation to PEACH2: a
Switching Hub for High Performance GPU
Clusters . . . . . . . . . . . . . . . . 3--8
Koji Okina and
Rie Soejima and
Kota Fukumoto and
Yuichiro Shibata and
Kiyoshi Oguri Power Performance Profiling of $3$-D
Stencil Computation on an FPGA
Accelerator for Efficient Pipeline
Optimization . . . . . . . . . . . . . . 9--14
Ahmad Lashgar and
Ebad Salehi and
Amirali Baniasadi A Case Study in Reverse Engineering
GPGPUs: Outstanding Memory Handling
Resources . . . . . . . . . . . . . . . 15--21
Ami Hayashi and
Yuta Tokusashi and
Hiroki Matsutani A Line Rate Outlier Filtering FPGA NIC
using 10GbE Interface . . . . . . . . . 22--27
Abhishek Kumar Jain and
Xiangwei Li and
Suhaib A. Fahmy and
Douglas L. Maskell Adapting the DySER Architecture with DSP
Blocks as an Overlay for the Xilinx Zynq 28--33
David de la Chevallerie and
Jens Korinth and
Andreas Koch ffLink: a Lightweight High-Performance
Open-Source PCI Express Gen3 Interface
for Reconfigurable Accelerators . . . . 34--39
Soukaina N. Hmid and
Jose G. F. Coutinho and
Wayne Luk A Transfer-Aware Runtime System for
Heterogeneous Asynchronous Parallel
Execution . . . . . . . . . . . . . . . 40--45
Ahmed Al-Wattar and
Shawki Areibi and
Gary Grewal Efficient Mapping and Allocation of
Execution Units to Task Graphs using an
Evolutionary Framework . . . . . . . . . 46--51
Amir Momeni and
Hamed Tabkhi and
Yash Ukidave and
Gunar Schirner and
David Kaeli Exploring the Efficiency of the OpenCL
Pipe Semantic on an FPGA . . . . . . . . 52--57
Takuji Mitsuishi and
Jun Suzuki and
Yuki Hayashi and
Masaki Kan and
Hideharu Amano Breadth First Search on Cost-efficient
Multi-GPU Systems . . . . . . . . . . . 58--63
Michael Mefenza and
Nicolas Edwards and
Christophe Bobda Interface Based Memory Synthesis of
Image Processing Applications in FPGA 64--69
Da Tong and
Viktor Prasanna High Throughput Sketch Based Online
Heavy Hitter Detection on FPGA . . . . . 70--75
Xinying Wang and
Phillip H. Jones and
Joseph Zambreno A Configurable Architecture for Sparse
$LU$ Decomposition on Matrices with
Arbitrary Patterns . . . . . . . . . . . 76--81
Kentaro Sano and
Fumiya Kono and
Naohito Nakasato and
Alexander Vazhenin and
Stanislav Sedukhin Stream Computation of Shallow Water
Equation Solver for FPGA-based $1$D
Tsunami Simulation . . . . . . . . . . . 82--87
Liucheng Guo and
Andreea Ingrid Funie and
David B. Thomas and
Haohuan Fu and
Wayne Luk Parallel Genetic Algorithms on Multiple
FPGAs . . . . . . . . . . . . . . . . . 86--93
Mark Thorson Internet Nuggets . . . . . . . . . . . . 94--100
Mark Thorson Internet Nuggets . . . . . . . . . . . . 7--11
Hadi Asgharimoghaddam and
Nam Sung Kim SpinWise: a Practical Energy-Efficient
Synchronization Technique for CMPs . . . 1--8
Lena E. Olson and
Mark D. Hill Probabilistic Directed Writebacks for
Exclusive Caches . . . . . . . . . . . . 9--18
Mark Thorson Internet Nuggets . . . . . . . . . . . . 19--22
Yuanyuan Zhou Programming Uncertain $<$T$>$hings . . . 1--2
Sergi Abadal and
Albert Cabellos-Aparicio and
Eduard Alarcon and
Josep Torrellas WiSync: an Architecture for Fast
Synchronization through On-Chip Wireless
Communication . . . . . . . . . . . . . 3--17
Xiaodong Wang and
José F. Martínez ReBudget: Trading Off Efficiency vs.
Fairness in Market-Based Multicore
Resource Allocation via Runtime Budget
Reassignment . . . . . . . . . . . . . . 19--32
Haishan Zhu and
Mattan Erez Dirigent: Enforcing QoS for
Latency-Critical Tasks on Shared
Multicore Systems . . . . . . . . . . . 33--47
Yossi Kuperman and
Eyal Moscovici and
Joel Nider and
Razya Ladelsky and
Abel Gordon and
Dan Tsafrir Paravirtual Remote I/O . . . . . . . . . 49--65
Antoine Kaufmann and
Simon Peter and
Naveen Kr. Sharma and
Thomas Anderson and
Arvind Krishnamurthy High Performance Packet Processing with
FlexNIC . . . . . . . . . . . . . . . . 67--81
James Bornholt and
Antoine Kaufmann and
Jialin Li and
Arvind Krishnamurthy and
Emina Torlak and
Xi Wang Specifying and Checking File System
Crash-Consistency Models . . . . . . . . 83--98
Aravinda Prasad and
K. Gopinath Prudent Memory Reclamation in
Procrastination-Based Synchronization 99--112
Anurag Mukkara and
Nathan Beckmann and
Daniel Sanchez Whirlpool: Improving Dynamic Cache
Management with Static Data
Classification . . . . . . . . . . . . . 113--127
Myeongjae Jeon and
Yuxiong He and
Hwanju Kim and
Sameh Elnikety and
Scott Rixner and
Alan L. Cox TPC: Target-Driven Parallelism Combining
Prediction and Correction to Reduce Tail
Latency in Interactive Services . . . . 129--141
Fraser Brown and
Andres Nötzli and
Dawson Engler How to Build Static Checking Systems
Using Orders of Magnitude Less Code . . 143--157
Tong Zhang and
Dongyoon Lee and
Changhee Jung TxRace: Efficient Data Race Detection
Using Commodity Hardware Transactional
Memory . . . . . . . . . . . . . . . . . 159--173
Sidney Amani and
Alex Hixon and
Zilin Chen and
Christine Rizkallah and
Peter Chubb and
Liam O'Connor and
Joel Beeren and
Yutaka Nagashima and
Japheth Lim and
Thomas Sewell and
Joseph Tuong and
Gabriele Keller and
Toby Murray and
Gerwin Klein and
Gernot Heiser Cogent: Verifying High-Assurance File
System Implementations . . . . . . . . . 175--188
Nils Asmussen and
Marcus Völp and
Benedikt Nöthen and
Hermann Härtig and
Gerhard Fettweis M3: a Hardware/Operating-System
Co-Design to Tame Heterogeneous
Manycores . . . . . . . . . . . . . . . 189--203
Daniyal Liaqat and
Silviu Jingoi and
Eyal de Lara and
Ashvin Goel and
Wilson To and
Kevin Lee and
Italo De Moraes Garcia and
Manuel Saldana Sidewinder: an Energy Efficient and
Developer Friendly Heterogeneous
Architecture for Continuous Mobile
Sensing . . . . . . . . . . . . . . . . 205--215
Jonathan Balkind and
Michael McKeown and
Yaosheng Fu and
Tri Nguyen and
Yanqi Zhou and
Alexey Lavrov and
Mohammad Shahrad and
Adi Fuchs and
Samuel Payne and
Xiaohua Liang and
Matthew Matl and
David Wentzlaff OpenPiton: an Open Source Manycore
Research Framework . . . . . . . . . . . 217--232
Daniel Lustig and
Geet Sethi and
Margaret Martonosi and
Abhishek Bhattacharjee COATCheck: Verifying Memory Ordering at
the Hardware-OS Interface . . . . . . . 233--247
Alex Markuze and
Adam Morrison and
Dan Tsafrir True IOMMU Protection from DMA Attacks:
When Copy is Faster than Zero Copy . . . 249--262
Amro Awad and
Pratyusa Manadhata and
Stuart Haber and
Yan Solihin and
William Horne Silent Shredder: Zero-Cost Shredding for
Secure Non-Volatile Main Memory
Controllers . . . . . . . . . . . . . . 263--276
Youngjin Kwon and
Alan M. Dunn and
Michael Z. Lee and
Owen S. Hofmann and
Yuanzhong Xu and
Emmett Witchel Sego: Pervasive Trusted Metadata for
Efficiently Verified Untrusted System
Services . . . . . . . . . . . . . . . . 277--290
Dan Tsafrir Synopsis of the ASPLOS '16 Wild and
Crazy Ideas (WACI) Invited-Speakers
Session . . . . . . . . . . . . . . . . 291--294
R. Stanley Williams Brain Inspired Computing . . . . . . . . 295--295
Phitchaya Mangpo Phothilimthana and
Aditya Thakur and
Rastislav Bodik and
Dinakar Dhurjati Scaling up Superoptimization . . . . . . 297--310
Niranjan Hasabnis and
R. Sekar Lifting Assembly to Intermediate
Representation: a Novel Approach
Leveraging Compilers . . . . . . . . . . 311--324
Saurav Muralidharan and
Amit Roy and
Mary Hall and
Michael Garland and
Piyush Rai Architecture-Adaptive Code Variant
Tuning . . . . . . . . . . . . . . . . . 325--338
Xiaofeng Lin and
Yu Chen and
Xiaodong Li and
Junjie Mao and
Jiaquan He and
Wei Xu and
Yuanchun Shi Scalable Kernel TCP Design and
Implementation for Short-Lived
Connections . . . . . . . . . . . . . . 339--352
Izzat El Hajj and
Alexander Merritt and
Gerd Zellweger and
Dejan Milojicic and
Reto Achermann and
Paolo Faraboschi and
Wen-mei Hwu and
Timothy Roscoe and
Karsten Schwan SpaceJMP: Programming with Multiple
Virtual Address Spaces . . . . . . . . . 353--368
Felix Xiaozhu Lin and
Xu Liu memif: Towards Programming
Heterogeneous Memory Asynchronously . . 369--383
Wook-Hee Kim and
Jinwoong Kim and
Woongki Baek and
Beomseok Nam and
Youjip Won NVWAL: Exploiting NVRAM in Write-Ahead
Logging . . . . . . . . . . . . . . . . 385--398
Aasheesh Kolli and
Steven Pelley and
Ali Saidi and
Peter M. Chen and
Thomas F. Wenisch High-Performance Transactions for
Persistent Memories . . . . . . . . . . 399--411
Qing Guo and
Karin Strauss and
Luis Ceze and
Henrique S. Malvar High-Density Image Storage Using
Approximate Memory Cells . . . . . . . . 413--426
Joseph Izraelevitz and
Terence Kelly and
Aasheesh Kolli Failure-Atomic Persistent Memory Updates
via JUSTDO Logging . . . . . . . . . . . 427--442
Jaeung Han and
Seungheun Jeon and
Young-ri Choi and
Jaehyuk Huh Interference Management for Distributed
Parallel Applications in Consolidated
Clusters . . . . . . . . . . . . . . . . 443--456
Martin Maas and
Krste Asanović and
Tim Harris and
John Kubiatowicz Taurus: a Holistic Language Runtime
System for Coordinating Distributed
Managed-Language Applications . . . . . 457--471
Christina Delimitrou and
Christos Kozyrakis HCloud: Resource-Efficient Provisioning
in Shared Cloud Systems . . . . . . . . 473--488
Xiao Yu and
Pallavi Joshi and
Jianwu Xu and
Guoliang Jin and
Hui Zhang and
Guofei Jiang CloudSeer: Workflow Monitoring of Cloud
Infrastructures via Interleaved Logs . . 489--502
Yonghwi Kwon and
Dohyeong Kim and
William Nick Sumner and
Kyungtae Kim and
Brendan Saltaformaggio and
Xiangyu Zhang and
Dongyan Xu LDX: Causality Inference by Lightweight
Dual Execution . . . . . . . . . . . . . 503--515
Tanakorn Leesatapornwongsa and
Jeffrey F. Lukman and
Shan Lu and
Haryadi S. Gunawi TaxDC: a Taxonomy of Non-Deterministic
Concurrency Bugs in Datacenter
Distributed Systems . . . . . . . . . . 517--530
Junjie Mao and
Yu Chen and
Qixue Xiao and
Yuanchun Shi RID: Finding Reference Count Bugs with
Inconsistent Path Pair Checking . . . . 531--544
Huazhe Zhang and
Henry Hoffmann Maximizing Performance Under a Power
Cap: a Comparison of Hardware, Software,
and Hybrid Techniques . . . . . . . . . 545--559
Songchun Fan and
Seyed Majid Zahedi and
Benjamin C. Lee The Computational Sprinting Game . . . . 561--575
Alexei Colin and
Graham Harvey and
Brandon Lucia and
Alanson P. Sample An Energy-interference-free
Hardware-Software Debugger for
Intermittent Energy-harvesting Systems 577--589
Emmett Witchel Programmer Productivity in a World of
Mushy Interfaces: Challenges of the
Post-ISA Reality . . . . . . . . . . . . 591--591
Kevin Angstadt and
Westley Weimer and
Kevin Skadron RAPID Programming of Pattern-Recognition
Processors . . . . . . . . . . . . . . . 593--605
Xin Sui and
Andrew Lenharth and
Donald S. Fussell and
Keshav Pingali Proactive Control of Approximate
Programs . . . . . . . . . . . . . . . . 607--621
Jongse Park and
Emmanuel Amaro and
Divya Mahajan and
Bradley Thwaites and
Hadi Esmaeilzadeh AxGames: Towards Crowdsourcing Quality
Target Determination in Approximate
Computing . . . . . . . . . . . . . . . 623--636
James Bornholt and
Randolph Lopez and
Douglas M. Carmean and
Luis Ceze and
Georg Seelig and
Karin Strauss A DNA-Based Archival Storage System . . 637--649
Raghu Prabhakar and
David Koeplinger and
Kevin J. Brown and
HyoukJoong Lee and
Christopher De Sa and
Christos Kozyrakis and
Kunle Olukotun Generating Configurable Hardware from
Parallel Patterns . . . . . . . . . . . 651--665
Li-Wen Chang and
Hee-Seok Kim and
Wen-mei W. Hwu DySel: Lightweight Dynamic Selection for
Kernel-based Data-parallel Programming
Model . . . . . . . . . . . . . . . . . 667--680
Quan Chen and
Hailong Yang and
Jason Mars and
Lingjia Tang Baymax: QoS Awareness and Increased
Utilization for Non-Preemptive
Accelerators in Warehouse Scale
Computers . . . . . . . . . . . . . . . 681--696
Tony Nowatzki and
Karthikeyan Sankaralingam Analyzing Behavior Specialized
Acceleration . . . . . . . . . . . . . . 697--711
Man-Ki Yoon and
Negin Salajegheh and
Yin Chen and
Mihai Christodorescu PIFT: Predictive Information-Flow
Tracking . . . . . . . . . . . . . . . . 713--725
Ashish Venkat and
Sriskanda Shamasunder and
Hovav Shacham and
Dean M. Tullsen HIPStR: Heterogeneous-ISA Program State
Relocation . . . . . . . . . . . . . . . 727--741
Zelalem Birhanu Aweke and
Salessawi Ferede Yitbarek and
Rui Qiao and
Reetuparna Das and
Matthew Hicks and
Yossi Oren and
Todd Austin ANVIL: Software-Based Protection Against
Next-Generation Rowhammer Attacks . . . 743--755
Diego Didona and
Nuno Diegues and
Anne-Marie Kermarrec and
Rachid Guerraoui and
Ricardo Neves and
Paolo Romano ProteusTM: Abstraction Meets Performance
in Transactional Memory . . . . . . . . 757--771
Noam Shalev and
Eran Harpaz and
Hagar Porat and
Idit Keidar and
Yaron Weinsberg CSR: Core Surprise Removal in Commodity
Operating Systems . . . . . . . . . . . 773--787
Tanmay Gangwani and
Adam Morrison and
Josep Torrellas CASPAR: Breaking Serialization in
Lock-Free Multicore Synchronization . . 789--804
Jorge Albericio and
Patrick Judd and
Tayler Hetherington and
Tor Aamodt and
Natalie Enright Jerger and
Andreas Moshovos Cnvlutin: ineffectual-neuron-free deep
neural network computing . . . . . . . . 1--13
Ali Shafiee and
Anirban Nag and
Naveen Muralimanohar and
Rajeev Balasubramonian and
John Paul Strachan and
Miao Hu and
R. Stanley Williams and
Vivek Srikumar ISAAC: a convolutional neural network
accelerator with in-situ analog
arithmetic in crossbars . . . . . . . . 14--26
Ping Chi and
Shuangchen Li and
Cong Xu and
Tao Zhang and
Jishen Zhao and
Yongpan Liu and
Yu Wang and
Yuan Xie PRIME: a novel processing-in-memory
architecture for neural network
computation in ReRAM-based main memory 27--39
Christopher Torng and
Moyang Wang and
Christopher Batten Asymmetry-aware work-stealing runtimes 40--52
Hung-Wei Tseng and
Qianchen Zhao and
Yuxiao Zhou and
Mark Gahagan and
Steven Swanson Morpheus: creating application objects
efficiently for heterogeneous computing 53--65
Divya Mahajan and
Amir Yazdanbakhsh and
Jongse Park and
Bradley Thwaites and
Hadi Esmaeilzadeh Towards statistical guarantees in
controlling quality tradeoffs for
approximate acceleration . . . . . . . . 66--77
Akanksha Jain and
Calvin Lin Back to the future: leveraging Belady's
algorithm for improved cache replacement 78--89
Chang Hyun Park and
Taekyung Heo and
Jaehyuk Huh Efficient synonym filtering and scalable
delayed translation for hybrid virtual
caching . . . . . . . . . . . . . . . . 90--102
Hsiang-Yun Cheng and
Jishen Zhao and
Jack Sampson and
Mary Jane Irwin and
Aamer Jaleel and
Yu Lu and
Yuan Xie LAP: loop-block aware inclusion
properties for energy-efficient
asymmetric last level caches . . . . . . 103--114
David Koeplinger and
Christina Delimitrou and
Raghu Prabhakar and
Christos Kozyrakis and
Yaqi Zhang and
Kunle Olukotun Automatic generation of efficient
accelerators for reconfigurable hardware 115--127
Donggyu Kim and
Adam Izraelevitz and
Christopher Celio and
Hokeun Kim and
Brian Zimmer and
Yunsup Lee and
Jonathan Bachrach and
Krste Asanović Strober: fast and accurate sample-based
energy simulation for arbitrary RTL . . 128--139
Michael A. Laurenzano and
Yunqi Zhang and
Jiang Chen and
Lingjia Tang and
Jason Mars PowerChop: identifying and managing
non-critical units in hybrid processor
architectures . . . . . . . . . . . . . 140--152
Boncheol Gu and
Andre S. Yoon and
Duck-Ho Bae and
Insoon Jo and
Jinyoung Lee and
Jonghyun Yoon and
Jeong-Uk Kang and
Moonsang Kwon and
Chanho Yoon and
Sangyeun Cho and
Jaeheon Jeong and
Duckhyun Chang Biscuit: a framework for near-data
processing of big data workloads . . . . 153--165
Muhammet Mustafa Ozdal and
Serif Yesil and
Taemin Kim and
Andrey Ayupov and
John Greth and
Steven Burns and
Ozcan Ozturk Energy efficient architecture for graph
analytics accelerators . . . . . . . . . 166--177
Ikuo Magaki and
Moein Khazraee and
Luis Vega Gutierrez and
Michael Bedford Taylor ASIC clouds: specializing the datacenter 178--190
Yunho Oh and
Keunsoo Kim and
Myung Kuk Yoon and
Jong Hyun Park and
Yongjun Park and
Won Woo Ro and
Murali Annavaram APRES: improving cache efficiency by
exploiting load characteristics on GPUs 191--203
Kevin Hsieh and
Eiman Ebrahimi and
Gwangsun Kim and
Niladrish Chatterjee and
Mike O'Connor and
Nandita Vijaykumar and
Onur Mutlu and
Stephen W. Keckler Transparent offloading and mapping
(TOM): enabling programmer-transparent
near-data processing in GPU systems . . 204--216
Chang Hyun Park and
Taekyung Heo and
Jaehyuk Huh Efficient synonym filtering and scalable
delayed translation for hybrid virtual
caching . . . . . . . . . . . . . . . . 217--229
Qiumin Xu and
Hyeran Jeon and
Keunsoo Kim and
Won Woo Ro and
Murali Annavaram Warped-slicer: efficient intra-SM
slicing through dynamic resource
partitioning for GPU multiprogramming 230--242
Song Han and
Xingyu Liu and
Huizi Mao and
Jing Pu and
Ardavan Pedram and
Mark A. Horowitz and
William J. Dally EIE: efficient inference engine on
compressed deep neural network . . . . . 243--254
Robert LiKamWa and
Yunhui Hou and
Julian Gao and
Mia Polansky and
Lin Zhong RedEye: analog ConvNet image sensor
architecture for continuous mobile
vision . . . . . . . . . . . . . . . . . 255--266
Brandon Reagen and
Paul Whatmough and
Robert Adolf and
Saketh Rama and
Hyunkwang Lee and
Sae Kyu Lee and
José Miguel Hernández-Lobato and
Gu-Yeon Wei and
David Brooks Minerva: enabling low-power,
highly-accurate deep neural network
accelerators . . . . . . . . . . . . . . 267--278
Yuan Yao and
Zhonghai Lu Opportunistic competition overhead
reduction for expediting critical
section in NoC based CMPs . . . . . . . 279--290
Channoh Kim and
Sungmin Kim and
Hyeon Gyu Cho and
Dooyoung Kim and
Jaehyeok Kim and
Young H. Oh and
Hakbeom Jang and
Jae W. Lee Short-circuit dispatch: accelerating
virtual machine interpreters on embedded
processors . . . . . . . . . . . . . . . 291--303
Christoffer Dall and
Shih-Wei Li and
Jin Tack Lim and
Jason Nieh and
Georgios Koloventzos ARM virtualization: performance and
architectural implications . . . . . . . 304--316
Jayesh Gaur and
Alaa R. Alameldeen and
Sreenivas Subramoney Base-victim compression: an
opportunistic cache compression
architecture . . . . . . . . . . . . . . 317--328
Jungrae Kim and
Michael Sullivan and
Esha Choukse and
Mattan Erez Bit-plane compression: transforming data
for better compression in many-core
architectures . . . . . . . . . . . . . 329--340
Prashant J. Nair and
Vilas Sridharan and
Moinuddin K. Qureshi XED: exposing on-die error detection
information for strong memory
reliability . . . . . . . . . . . . . . 341--353
Mohammad Mejbah ul Alam and
Abdullah Muzahid Production-run software failure
diagnosis via adaptive communication
tracking . . . . . . . . . . . . . . . . 354--366
Yu-Hsin Chen and
Joel Emer and
Vivienne Sze Eyeriss: a spatial architecture for
energy-efficient dataflow for
convolutional neural networks . . . . . 367--379
Duckhwan Kim and
Jaeha Kung and
Sek Chai and
Sudhakar Yalamanchili and
Saibal Mukhopadhyay Neurocube: a programmable digital
neuromorphic architecture with
high-density 3D memory . . . . . . . . . 380--392
Shaoli Liu and
Zidong Du and
Jinhua Tao and
Dong Han and
Tao Luo and
Yuan Xie and
Yunji Chen and
Tianshi Chen Cambricon: an instruction set
architecture for neural networks . . . . 393--405
Ziqiang Huang and
Andrew D. Hilton and
Benjamin C. Lee Decoupling loads for nano-instruction
set computers . . . . . . . . . . . . . 406--417
Timothy Hayes and
Oscar Palomar and
Osman Unsal and
Adrian Cristal and
Mateo Valero Future vector microprocessor extensions
for data aggregations . . . . . . . . . 418--430
Faissal M. Sleiman and
Thomas F. Wenisch Efficiently scaling out-of-order cores
for simultaneous multithreading . . . . 431--443
Milad Hashemi and
Khubaib and
Eiman Ebrahimi and
Onur Mutlu and
Yale N. Patt Accelerating dependent cache misses with
an enhanced memory controller . . . . . 444--455
Yunqi Zhang and
David Meisner and
Jason Mars and
Lingjia Tang Treadmill: attributing the source of
tail latency through precise load
testing and statistical inference . . . 456--468
Qiang Wu and
Qingyuan Deng and
Lakshmi Ganesh and
Chang-Hong Hsu and
Yun Jin and
Sanjeev Kumar and
Bin Li and
Justin Meza and
Yee Jiun Song Dynamo: Facebook's data center-wide
power management system . . . . . . . . 469--480
Daniel Wong Peak efficiency aware scheduling for
highly energy proportional servers . . . 481--492
Chao Li and
Zhenhua Wang and
Xiaofeng Hou and
Haopeng Chen and
Xiaoyao Liang and
Minyi Guo Power attack defense: securing
battery-backed data centers . . . . . . 493--505
Mingyu Gao and
Christina Delimitrou and
Dimin Niu and
Krishna T. Malladi and
Hongzhong Zheng and
Bob Brennan and
Christos Kozyrakis DRAF: a low-power DRAM-based
reconfigurable acceleration fabric . . . 506--518
Lunkai Zhang and
Brian Neely and
Diana Franklin and
Dmitri Strukov and
Yuan Xie and
Frederic T. Chong Mellow Writes: extending lifetime in
resistive memories through selective
slow write backs . . . . . . . . . . . . 519--531
Yanqi Zhou and
David Wentzlaff MITTS: memory inter-arrival time traffic
shaping . . . . . . . . . . . . . . . . 532--544
Joshua San Miguel and
Natalie Enright Jerger The anytime automaton . . . . . . . . . 545--557
Siyang Wang and
Xiangyu Zhang and
Yuxuan Li and
Ramin Bashizade and
Song Yang and
Chris Dwyer and
Alvin R. Lebeck Accelerating Markov random field
inference using molecular optical Gibbs
sampling units . . . . . . . . . . . . . 558--569
Yipeng Huang and
Ning Guo and
Mingoo Seok and
Yannis Tsividis and
Simha Sethumadhavan Evaluation of an analog accelerator for
linear algebra . . . . . . . . . . . . . 570--582
Jin Wang and
Norm Rubin and
Albert Sidelnik and
Sudhakar Yalamanchili LaPerm: locality aware scheduler for
dynamic parallelism on GPUs . . . . . . 583--595
Sagi Shahar and
Shai Bergman and
Mark Silberstein ActivePointers: a case for software
address translation on GPUs . . . . . . 596--608
Myung Kuk Yoon and
Keunsoo Kim and
Sangpil Lee and
Won Woo Ro and
Murali Annavaram Virtual thread: maximizing thread-level
parallelism beyond GPU scheduling limit 609--621
Jungrae Kim and
Michael Sullivan and
Sangkug Lym and
Mattan Erez All-inclusive ECC: thorough end-to-end
protection for reliable computer memory 622--633
Henry Duwe and
Xun Jian and
Daniel Petrisko and
Rakesh Kumar Rescuing uncorrectable fault patterns in
on-chip memories through error pattern
transformation . . . . . . . . . . . . . 634--644
Dong Wan Kim and
Mattan Erez RelaxFault memory repair . . . . . . . . 645--657
Raghavendra Pradyumna Pothukuchi and
Amin Ansari and
Petros Voulgaris and
Josep Torrellas Using multiple input, multiple output
formal control to maximize resource
efficiency in architectures . . . . . . 658--670
Hari Cherupalli and
Rakesh Kumar and
John Sartori Exploiting dynamic timing slack for
energy efficiency in ultra-low-power
embedded systems . . . . . . . . . . . . 671--681
Yanqi Zhou and
Henry Hoffmann and
David Wentzlaff CASH: supporting IaaS customers with a
sub-core configurable architecture . . . 682--694
Mohammad Arjomand and
Mahmut T. Kandemir and
Anand Sivasubramaniam and
Chita R. Das Boosting access parallelism to PCM-based
main memory . . . . . . . . . . . . . . 695--706
Jayneel Gandhi and
Mark D. Hill and
Michael M. Swift Agile paging: exceeding the best of
nested and shadow paging . . . . . . . . 707--718
Hoseok Seol and
Wongyu Shin and
Jaemin Jang and
Jungwhan Choi and
Jinwoong Suh and
Lee-Sup Kim Energy efficient data encoding in DRAM
channels exploiting data value
similarity . . . . . . . . . . . . . . . 719--730
Jiayi Sheng and
Qingqing Xiong and
Chen Yang and
Martin C. Herbordt Collective Communication on FPGA
Clusters with Static Scheduling . . . . 2--7
Susumu Mashimo and
Thiem Van Chu and
Kenji Kise Cost-Effective and High-Throughput Merge
Network: Architecture for the Fastest
FPGA Sorting Accelerator . . . . . . . . 8--13
Cuong Pham-Quoc and
Biet Nguyen and
Tran Ngoc Thinh FPGA-based Multicore Architecture for
Integrating Multiple DDoS Defense
Mechanisms . . . . . . . . . . . . . . . 14--19
Fatemeh Eslami and
Steven J. E. Wilton An Improved Overlay and Mapping
Algorithm Supporting Rapid Triggering
for FPGA Debug . . . . . . . . . . . . . 20--25
Ryohei Kobayashi and
Tomohiro Misono and
Kenji Kise A High-speed Verilog HDL Simulation
Method using a Lightweight Translator 26--31
Shohei Sassa and
Kenji Kanazawa and
Shaowei Cai and
Moritoshi Yasunaga An FPGA Solver for Partial MaxSAT
Problems Based on Stochastic Local
Search . . . . . . . . . . . . . . . . . 32--37
Ernst Joachim Houtgast and
Vlad-Mihai Sima and
Koen Bertels and
Zaid Al-Ars An Efficient GPU-Accelerated
Implementation of Genomic Short Read
Mapping with BWA-MEM . . . . . . . . . . 38--43
Hiroki Nakahara and
Hiroyuki Nakanishi and
Kazumasa Iwai and
Tsutomu Sasao An FFT Circuit for a Spectrometer of a
Radio Telescope using the Nested RNS
including the Constant Division . . . . 44--49
Vinod Pangracious and
Mulhim Al-Doori Novel Three-Dimensional Embedded FPGA
Technology and Architecture . . . . . . 50--55
Oliver Knodel and
Paul R. Genssler and
Rainer G. Spallek Migration of long-running Tasks between
Reconfigurable Resources using
Virtualization . . . . . . . . . . . . . 56--61
Jubee Tada and
Maiki Hosokawa and
Ryusuke Egawa and
Hiroaki Kobayashi Effects of Stacking Granularity on 3-D
Stacked Floating-point Fused Multiply
Add Units . . . . . . . . . . . . . . . 62--67
Jiang Su and
Jianxiong Liu and
David B. Thomas and
Peter Y. K. Cheung Neural Network Based Reinforcement
Learning Acceleration on FPGA Platforms 68--73
Erik H. D'Hollander High-Level Synthesis Optimization for
Blocked Floating-Point Matrix
Multiplication . . . . . . . . . . . . . 74--79
Chengzhe Li and
Lai Yoong Yee and
Hiroshi Maruyama and
Yoshiki Yamaguchi FPGA-based Volleyball Player Tracker . . 80--86
Qian Zhao and
Motoki Amagasaki and
Masahiro Iida and
Morihiro Kuga and
Toshinori Sueyoshi A Study of Heterogeneous Computing
Design Method based on Virtualization
Technology . . . . . . . . . . . . . . . 86--91
Colin Yu Lin and
Zhenghong Jiang and
Cheng Fu and
Hayden Kwok-Hay So and
Haigang Yang FPGA High-level Synthesis versus
Overlay: Comparisons on Computation
Kernels . . . . . . . . . . . . . . . . 92--97
Xusheng Zhan and
Yungang Bao and
Christian Bienia and
Kai Li PARSEC3.0: a Multicore Benchmark Suite
with Network Stacks and SPLASH-2X . . . 1--16
Yunji Chen Big Data Analytics and Intelligence at
Alibaba Cloud . . . . . . . . . . . . . 1--1
Hari Cherupalli and
Henry Duwe and
Weidong Ye and
Rakesh Kumar and
John Sartori Determining Application-specific Peak
Power and Energy Requirements for
Ultra-low Power Processors . . . . . . . 3--16
Quan Chen and
Hailong Yang and
Minyi Guo and
Ram Srivatsa Kannan and
Jason Mars and
Lingjia Tang Prophet: Precise QoS Prediction on
Non-Preemptive Accelerators to Improve
Utilization in Warehouse-Scale Computers 17--32
Svilen Kanev and
Sam Likun Xi and
Gu-Yeon Wei and
David Brooks Mallacc: Accelerating Memory Allocation 33--45
Shasha Wen and
Milind Chabbi and
Xu Liu REDSPY: Exploring Value Locality in
Software . . . . . . . . . . . . . . . . 47--61
Abhishek Bhattacharjee Translation-Triggered Prefetching . . . 63--76
Channoh Kim and
Jaehyeok Kim and
Sungmin Kim and
Dooyoung Kim and
Namho Kim and
Gitae Na and
Young H. Oh and
Hyeon Gyu Cho and
Jae W. Lee Typed Architectures: Architectural
Support for Lightweight Scripting . . . 77--90
Jihye Seo and
Wook-Hee Kim and
Woongki Baek and
Beomseok Nam and
Sam H. Noh Failure-Atomic Slotted Paging for
Persistent Memory . . . . . . . . . . . 91--104
Donald Nguyen and
Keshav Pingali What Scalable Programs Need from
Transactional Memory . . . . . . . . . . 105--118
Caroline Trippel and
Yatin A. Manerkar and
Daniel Lustig and
Michael Pellauer and
Margaret Martonosi TriCheck: Memory Model Verification at
the Trisection of Software, Hardware,
and ISA . . . . . . . . . . . . . . . . 119--133
Sanketh Nalli and
Swapnil Haria and
Mark D. Hill and
Michael M. Swift and
Haris Volos and
Kimberly Keeton An Analysis of Persistent Memory Use
with WHISPER . . . . . . . . . . . . . . 135--148
Tong Zhang and
Changhee Jung and
Dongyoon Lee ProRace: Practical Data Race Detection
for Production Use . . . . . . . . . . . 149--162
Lena E. Olson and
Mark D. Hill and
David A. Wood Crossing Guard: Mediating
Host-Accelerator Coherence Interactions 163--176
Joseph McMahan and
Michael Christensen and
Lawton Nichols and
Jared Roesch and
Sung-Yee Guo and
Ben Hardekopf and
Timothy Sherwood An Architecture Supporting Formal and
Compositional Binary Analysis . . . . . 177--191
Chun-Hung Hsiao and
Satish Narayanasamy and
Essam Muhammad Idris Khan and
Cristiano L. Pereira and
Gilles A. Pokam AsyncClock: Scalable Inference of
Asynchronous Event Causality . . . . . . 193--205
Irina Calciu and
Siddhartha Sen and
Mahesh Balakrishnan and
Marcos K. Aguilera Black-box Concurrent Data Structures for
NUMA Architectures . . . . . . . . . . . 207--221
Keval Vora and
Chen Tian and
Rajiv Gupta and
Ziang Hu CoRAL: Confined Recovery in Distributed
Asynchronous Graph Processing . . . . . 223--236
Keval Vora and
Rajiv Gupta and
Guoqing Xu KickStarter: Fast and Accurate
Computations on Streaming Graphs via
Trimmed Approximations . . . . . . . . . 237--251
Bobby Powers and
John Vilk and
Emery D. Berger Browsix: Bridging the Gap Between Unix
and the Browser . . . . . . . . . . . . 253--266
Samyam Rajbhandari and
Yuxiong He and
Olatunji Ruwase and
Michael Carbin and
Trishul Chilimbi Optimizing CNNs on Multicores for
Scalability, Performance and Goodput . . 267--280
Kirshanthan Sundararajah and
Laith Sakka and
Milind Kulkarni Locality Transformations for Nested
Recursive Iteration Spaces . . . . . . . 281--295
Ang Li and
Shuaiwen Leon Song and
Weifeng Liu and
Xu Liu and
Akash Kumar and
Henk Corporaal Locality-Aware CTA Clustering for Modern
GPUs . . . . . . . . . . . . . . . . . . 297--311
Berkeley Churchill and
Rahul Sharma and
JF Bastien and
Alex Aiken Sound Loop Superoptimization for Google
Native Client . . . . . . . . . . . . . 313--326
Ricardo Bianchini Improving Datacenter Efficiency . . . . 327--327
Mengxing Liu and
Mingxing Zhang and
Kang Chen and
Xuehai Qian and
Yongwei Wu and
Weimin Zheng and
Jinglei Ren DudeTM: Building Durable Transactions
with Decoupling for Persistent Memory 329--343
Ana Klimovic and
Heiner Litz and
Christos Kozyrakis ReFlex: Remote Flash ≈ Local
Flash . . . . . . . . . . . . . . . . . 345--359
Djordje Jevdjic and
Karin Strauss and
Luis Ceze and
Henrique S. Malvar Approximate Storage of Compressed and
Encrypted Videos . . . . . . . . . . . . 361--373
Nima Elyasi and
Mohammad Arjomand and
Anand Sivasubramaniam and
Mahmut T. Kandemir and
Chita R. Das and
Myoungsoo Jung Exploiting Intra-Request Slack to
Improve SSD Performance . . . . . . . . 375--388
Kai Wang and
Aftab Hussain and
Zhiqiang Zuo and
Guoqing Xu and
Ardalan Amiri Sani Graspan: a Single-machine Disk-based
Graph System for Interprocedural Static
Analyses of Large-scale Systems Code . . 389--404
Ao Ren and
Zhe Li and
Caiwen Ding and
Qinru Qiu and
Yanzhi Wang and
Ji Li and
Xuehai Qian and
Bo Yuan SC-DCNN: Highly-Scalable Deep
Convolutional Neural Network using
Stochastic Computing . . . . . . . . . . 405--418
Jerry Ajay and
Chen Song and
Aditya Singh Rathore and
Chi Zhou and
Wenyao Xu 3DGates: an Instruction-Level Energy
Analysis and Optimization of 3D
Printers . . . . . . . . . . . . . . . . 419--433
Guilherme Cox and
Abhishek Bhattacharjee Efficient Address Translation for
Architectures with Multiple Page Sizes 435--448
Ilya Lesokhin and
Haggai Eran and
Shachar Raindel and
Guy Shapiro and
Sagi Grimberg and
Liran Liss and
Muli Ben-Yehuda and
Nadav Amit and
Dan Tsafrir Page Fault Support for Network
Controllers . . . . . . . . . . . . . . 449--466
Yang Hu and
Mingcong Song and
Tao Li Towards ``Full Containerization'' in
Containerized Network Function
Virtualization . . . . . . . . . . . . . 467--481
Bo Wu and
Xu Liu and
Xiaobo Zhou and
Changjun Jiang FLEP: Enabling Flexible and Efficient
Preemption on GPUs . . . . . . . . . . . 483--496
Kaiwei Li and
Jianfei Chen and
Wenguang Chen and
Jun Zhu SaberLDA: Sparsity-Aware Learning of
Topic Models on GPUs . . . . . . . . . . 497--509
Moein Khazraee and
Lu Zhang and
Luis Vega and
Michael Bedford Taylor Moonwalk: NRE Optimization in ASIC
Clouds . . . . . . . . . . . . . . . . . 511--526
Jason Jong Kyu Park and
Yongjun Park and
Scott Mahlke Dynamic Resource Management for
Efficient Utilization of Multitasking
GPUs . . . . . . . . . . . . . . . . . . 527--540
Rui Zhang and
Natalie Stanley and
Christopher Griggs and
Andrew Chi and
Cynthia Sturton Identifying Security Critical Properties
for the Dynamic Verification of a
Processor . . . . . . . . . . . . . . . 541--554
Andrew Ferraiuolo and
Rui Xu and
Danfeng Zhang and
Andrew C. Myers and
G. Edward Suh Verification of a Practical Hardware
Security Architecture Through Static
Information Flow Analysis . . . . . . . 555--568
David Chisnall and
Brooks Davis and
Khilan Gudka and
David Brazdil and
Alexandre Joannou and
Jonathan Woodruff and
A. Theodore Markettos and
J. Edward Maste and
Robert Norton and
Stacey Son and
Michael Roe and
Simon W. Moore and
Peter G. Neumann and
Ben Laurie and
Robert N. M. Watson CHERI JNI: Sinking the Java Security
Model into the C . . . . . . . . . . . . 569--583
Xinyang Ge and
Weidong Cui and
Trent Jaeger GRIFFIN: Guarding Control Flows Using
Intel Processor Trace . . . . . . . . . 585--598
Christina Delimitrou and
Christos Kozyrakis Bolt: I Know What You Did Last Summer
... In The Cloud . . . . . . . . . . . . 599--613
Yiping Kang and
Johann Hauswald and
Cao Gao and
Austin Rovinski and
Trevor Mudge and
Jason Mars and
Lingjia Tang Neurosurgeon: Collaborative Intelligence
Between the Cloud and Mobile Edge . . . 615--629
Neha Agarwal and
Thomas F. Wenisch Thermostat: Application-transparent Page
Management for Two-tiered Main Memory 631--644
Antonio Barbalace and
Robert Lyerly and
Christopher Jelesnianski and
Anthony Carno and
Ho-Ren Chuang and
Vincent Legout and
Binoy Ravindran Breaking the Boundaries in
Heterogeneous-ISA Datacenters . . . . . 645--659
Daniel Lustig and
Andrew Wright and
Alexandros Papakonstantinou and
Olivier Giroux Automated Synthesis of Comprehensive
Memory Model Litmus Test Suites . . . . 661--675
Haopeng Liu and
Guangpu Li and
Jeffrey F. Lukman and
Jiaxin Li and
Shan Lu and
Haryadi S. Gunawi and
Chen Tian DCatch: Automatically Detecting
Distributed Concurrency Bugs in Cloud
Systems . . . . . . . . . . . . . . . . 677--691
Ali José Mashtizadeh and
Tal Garfinkel and
David Terei and
David Mazières and
Mendel Rosenblum Towards Practical Default-On Multi-Core
Record/Replay . . . . . . . . . . . . . 693--708
Jian Huang and
Michael Allen-Bond and
Xuechen Zhang Pallas: Semantic-Aware Checking for
Finding Deep Bugs in Fast Path . . . . . 709--722
Jagadish B. Kotra and
Narges Shahidi and
Zeshan A. Chishti and
Mahmut T. Kandemir Hardware-Software Co-design to Mitigate
DRAM Refresh Overheads: a Case for
Refresh-Aware Process Scheduling . . . . 723--736
Jinchun Kim and
Elvira Teran and
Paul V. Gratz and
Daniel A. Jiménez and
Seth H. Pugsley and
Chris Wilkerson Kill the Program Counter: Reconstructing
Program Behavior in the Processor Cache
Hierarchy . . . . . . . . . . . . . . . 737--749
Mingyu Gao and
Jing Pu and
Xuan Yang and
Mark Horowitz and
Christos Kozyrakis TETRIS: Scalable and Efficient Neural
Network Acceleration with 3D Memory . . 751--764
Wonjun Song and
Gwangsun Kim and
Hyungjoon Jung and
Jongwook Chung and
Jung Ho Ahn and
Jae W. Lee and
John Kim History-Based Arbitration for Fairness
in Processor-Interconnect of NUMA
Servers . . . . . . . . . . . . . . . . 765--777
Pulkit A. Misra and
Jeffrey S. Chase and
Johannes Gehrke and
Alvin R. Lebeck Enabling Lightweight Transactions with
Precision Time . . . . . . . . . . . . . 779--794
Ming Liu and
Liang Luo and
Jacob Nelson and
Luis Ceze and
Arvind Krishnamurthy and
Kishore Atreya IncBricks: Toward In-Network Computation
with an In-Network Cache . . . . . . . . 795--809
Ismail Akturk and
Ulya R. Karpuzcu AMNESIAC: Amnesic Automatic Computer . . 811--824
Yuxin Bai and
Victor W. Lee and
Engin Ipek Voltage Regulator Efficiency Aware Power
Management . . . . . . . . . . . . . . . 825--838
Norman P. Jouppi and
Cliff Young and
Nishant Patil and
David Patterson and
Gaurav Agrawal and
Raminder Bajwa and
Sarah Bates and
Suresh Bhatia and
Nan Boden and
Al Borchers and
Rick Boyle and
Pierre-luc Cantin and
Clifford Chao and
Chris Clark and
Jeremy Coriell and
Mike Daley and
Matt Dau and
Jeffrey Dean and
Ben Gelb and
Tara Vazir Ghaemmaghami and
Rajendra Gottipati and
William Gulland and
Robert Hagmann and
C. Richard Ho and
Doug Hogberg and
John Hu and
Robert Hundt and
Dan Hurt and
Julian Ibarz and
Aaron Jaffey and
Alek Jaworski and
Alexander Kaplan and
Harshit Khaitan and
Daniel Killebrew and
Andy Koch and
Naveen Kumar and
Steve Lacy and
James Laudon and
James Law and
Diemthu Le and
Chris Leary and
Zhuyuan Liu and
Kyle Lucke and
Alan Lundin and
Gordon MacKean and
Adriana Maggiore and
Maire Mahony and
Kieran Miller and
Rahul Nagarajan and
Ravi Narayanaswami and
Ray Ni and
Kathy Nix and
Thomas Norrie and
Mark Omernick and
Narayana Penukonda and
Andy Phelps and
Jonathan Ross and
Matt Ross and
Amir Salek and
Emad Samadiani and
Chris Severn and
Gregory Sizikov and
Matthew Snelham and
Jed Souter and
Dan Steinberg and
Andy Swing and
Mercedes Tan and
Gregory Thorson and
Bo Tian and
Horia Toma and
Erick Tuttle and
Vijay Vasudevan and
Richard Walter and
Walter Wang and
Eric Wilcox and
Doe Hyun Yoon In-Datacenter Performance Analysis of a
Tensor Processing Unit . . . . . . . . . 1--12
Swagath Venkataramani and
Ashish Ranjan and
Subarno Banerjee and
Dipankar Das and
Sasikanth Avancha and
Ashok Jagannathan and
Ajaya Durg and
Dheemanth Nagaraj and
Bharat Kaul and
Pradeep Dubey and
Anand Raghunathan ScaleDeep: a Scalable Compute
Architecture for Learning and Evaluating
Deep Networks . . . . . . . . . . . . . 13--26
Angshuman Parashar and
Minsoo Rhu and
Anurag Mukkara and
Antonio Puglielli and
Rangharajan Venkatesan and
Brucek Khailany and
Joel Emer and
Stephen W. Keckler and
William J. Dally SCNN: an Accelerator for
Compressed-sparse Convolutional Neural
Networks . . . . . . . . . . . . . . . . 27--40
Hari Cherupalli and
Henry Duwe and
Weidong Ye and
Rakesh Kumar and
John Sartori Bespoke Processors for Applications with
Ultra-low Area and Power Constraints . . 41--54
Yajing Chen and
Shengshuo Lu and
Cheng Fu and
David Blaauw and
Ronald Dreslinski, Jr. and
Trevor Mudge and
Hun-Seok Kim A Programmable Galois Field Processor
for the Internet of Things . . . . . . . 55--68
Aosen Wang and
Lizhong Chen and
Wenyao Xu XPro: a Cross-End Processing
Architecture for Data Analytics in
Wearables . . . . . . . . . . . . . . . 69--80
Ofir Weisse and
Valeria Bertacco and
Todd Austin Regaining Lost Cycles with HotCalls: a
Fast Interface for SGX Secure Enclaves 81--93
Shaizeen Aga and
Satish Narayanasamy InvisiMem: Smart Memory Defenses for
Memory Bus Side Channel . . . . . . . . 94--106
Amro Awad and
Yipeng Wang and
Deborah Shands and
Yan Solihin ObfusMem: a Low-Overhead Access
Obfuscation for Trusted Memories . . . . 107--119
S. Karen Khatamifard and
Longfei Wang and
Weize Yu and
Selçuk Köse and
Ulya R. Karpuzcu ThermoGater: Thermally-Aware On-Chip
Voltage Regulation . . . . . . . . . . . 120--132
Hailong Yang and
Quan Chen and
Moeiz Riaz and
Zhongzhi Luan and
Lingjia Tang and
Jason Mars PowerChief: Intelligent Power Allocation
for Multi-Stage Applications to Improve
Responsiveness on Power Constrained CMP 133--146
Gokul Subramanian Ravi and
Mikko H. Lipasti CHARSTAR: Clock Hierarchy Aware Resource
Scaling in Tiled ARchitectures . . . . . 147--160
Matthew D. Sinclair and
Johnathan Alsop and
Sarita V. Adve Chasing Away RAts: Semantics and
Evaluation for Relaxed Atomics on
Heterogeneous Systems . . . . . . . . . 161--174
Seunghee Shin and
James Tuck and
Yan Solihin Hiding the Long Latency of Persist
Barriers Using Speculative Execution . . 175--186
Alberto Ros and
Trevor E. Carlson and
Mehdi Alipour and
Stefanos Kaxiras Non-Speculative Load-Load Reordering in
TSO . . . . . . . . . . . . . . . . . . 187--200
Doowon Lee and
Valeria Bertacco MTraceCheck: Validating
Non-Deterministic Behavior of Memory
Consistency Models in Post-Silicon
Validation . . . . . . . . . . . . . . . 201--213
Ruohuang Zheng and
Michael C. Huang Redundant Memory Array Architecture for
Efficient Selective Protection . . . . . 214--227
Matthew Hicks Clank: Architectural Support for
Intermittent Computation . . . . . . . . 228--240
Manolis Kaliorakis and
Dimitris Gizopoulos and
Ramon Canal and
Antonio Gonzalez MeRLiN: Exploiting Dynamic Instruction
Behavior for Fast and Accurate
Microarchitecture Level Reliability
Assessment . . . . . . . . . . . . . . . 241--254
Minesh Patel and
Jeremie S. Kim and
Onur Mutlu The Reach Profiler (REAPER): Enabling
the Mitigation of DRAM Retention
Failures via Profiling at Aggressive
Conditions . . . . . . . . . . . . . . . 255--268
Zhenning Wang and
Jun Yang and
Rami Melhem and
Bruce Childers and
Youtao Zhang and
Minyi Guo Quality of Service Support for
Fine-Grained Sharing on GPUs . . . . . . 269--281
Sui Chen and
Lu Peng and
Samuel Irving Accelerating GPU Hardware Transactional
Memory with Snapshot Isolation . . . . . 282--294
Kai Wang and
Calvin Lin Decoupled Affine Computation for SIMT
GPUs . . . . . . . . . . . . . . . . . . 295--306
Gunjae Koo and
Yunho Oh and
Won Woo Ro and
Murali Annavaram Access Pattern-Aware Cache Management
for Improving Data Utilization in GPU 307--319
Akhil Arunkumar and
Evgeny Bolotin and
Benjamin Cho and
Ugljesa Milic and
Eiman Ebrahimi and
Oreste Villa and
Aamer Jaleel and
Carole-Jean Wu and
David Nellans MCM-GPU: Multi-Chip-Module GPUs for
Continued Performance Scalability . . . 320--332
Alireza Nazari and
Nader Sehatbakhsh and
Monjur Alam and
Alenka Zajic and
Milos Prvulovic EDDIE: EM-Based Detection of Deviations
in Program Execution . . . . . . . . . . 333--346
Mengjia Yan and
Bhargava Gopireddy and
Thomas Shull and
Josep Torrellas Secure Hierarchy-Aware Cache Replacement
Policy (SHARP): Defending Against
Cache-Based Side Channel Attacks . . . . 347--360
Zhaoxia Deng and
Ariel Feldman and
Stuart A. Kurtz and
Frederic T. Chong Lemonade from Lemons: Harnessing Device
Wearout to Create Limited-Use Security
Architectures . . . . . . . . . . . . . 361--374
Muhammad Shoaib Bin Altaf and
David A. Wood LogCA: a High-Level Performance Model
for Hardware Accelerators . . . . . . . 375--388
Raghu Prabhakar and
Yaqi Zhang and
David Koeplinger and
Matt Feldman and
Tian Zhao and
Stefan Hadjis and
Ardavan Pedram and
Christos Kozyrakis and
Kunle Olukotun Plasticine: a Reconfigurable
Architecture For Parallel Patterns . . . 389--402
Jaeha Kung and
Yun Long and
Duckhwan Kim and
Saibal Mukhopadhyay A Programmable Hardware Accelerator for
Simulating Dynamical Systems . . . . . . 403--415
Tony Nowatzki and
Vinay Gangadhar and
Newsha Ardalani and
Karthikeyan Sankaralingam Stream-Dataflow Acceleration . . . . . . 416--429
Zi Yan and
Ján Veselý and
Guilherme Cox and
Abhishek Bhattacharjee Hardware Translation Coherence for
Virtualized Systems . . . . . . . . . . 430--443
Chang Hyun Park and
Taekyung Heo and
Jungi Jeong and
Jaehyuk Huh Hybrid TLB Coalescing: Improving TLB
Translation Coverage under Diverse
Fragmented Memory Allocations . . . . . 444--456
Hanna Alam and
Tianhao Zhang and
Mattan Erez and
Yoav Etsion Do-It-Yourself Virtual Memory
Translation . . . . . . . . . . . . . . 457--468
Jee Ho Ryoo and
Nagendra Gulur and
Shuang Song and
Lizy K. John Rethinking TLB Designs in Virtualized
Environments: a Very Large
Part-of-Memory TLB . . . . . . . . . . . 469--480
Aasheesh Kolli and
Vaibhav Gogte and
Ali Saidi and
Stephan Diestelhorst and
Peter M. Chen and
Satish Narayanasamy and
Thomas F. Wenisch Language-level persistency . . . . . . . 481--493
Jiho Choi and
Thomas Shull and
Maria J. Garzaran and
Josep Torrellas ShortCut: Architectural Support for Fast
Object Access in Scripting Languages . . 494--506
Dibakar Gope and
David J. Schlais and
Mikko H. Lipasti Architectural Support for Server-Side
PHP Processing . . . . . . . . . . . . . 507--520
Sudarsun Kannan and
Ada Gavrilovska and
Vishal Gupta and
Karsten Schwan HeteroOS: OS Design for Heterogeneous
Memory Management in Datacenter . . . . 521--534
Yongming Shen and
Michael Ferdman and
Peter Milder Maximizing CNN Accelerator Efficiency
Through Resource Partitioning . . . . . 535--547
Jiecao Yu and
Andrew Lukefahr and
David Palframan and
Ganesh Dasika and
Reetuparna Das and
Scott Mahlke Scalpel: Customizing DNN Pruning to the
Underlying Hardware Parallelism . . . . 548--560
Christopher De Sa and
Matthew Feldman and
Christopher Ré and
Kunle Olukotun Understanding and Optimizing
Asynchronous Low-Precision Stochastic
Gradient Descent . . . . . . . . . . . . 561--574
Zhaoshi Li and
Leibo Liu and
Yangdong Deng and
Shouyi Yin and
Yao Wang and
Shaojun Wei Aggressive Pipelining of Irregular
Applications on Reconfigurable Hardware 575--586
Suvinay Subramanian and
Mark C. Jeffrey and
Maleen Abeydeera and
Hyun Ryong Lee and
Victor A. Ying and
Joel Emer and
Daniel Sanchez Fractal: an Execution Model for
Fine-Grain Nested Speculative
Parallelism . . . . . . . . . . . . . . 587--599
Arun Subramaniyan and
Reetuparna Das Parallel Automata Processor . . . . . . 600--612
Rajat Kateja and
Anirudh Badam and
Sriram Govindan and
Bikash Sharma and
Greg Ganger Viyojit: Decoupling Battery and DRAM
Capacities for Battery-Backed DRAM . . . 613--626
Vinson Young and
Prashant J. Nair and
Moinuddin K. Qureshi DICE: Compressing DRAM Caches for
Bandwidth and Capacity . . . . . . . . . 627--638
Mario Drumond and
Alexandros Daglis and
Nooshin Mirzadeh and
Dmitrii Ustiugov and
Javier Picorel and
Babak Falsafi and
Boris Grot and
Dionisios Pnevmatikatos The Mondrian Data Engine . . . . . . . . 639--651
Po-An Tsai and
Nathan Beckmann and
Daniel Sanchez Jenga: Software-Defined Cache
Hierarchies . . . . . . . . . . . . . . 652--665
Rahul Boyapati and
Jiayi Huang and
Pritam Majumder and
Ki Hwan Yum and
Eun Jung Kim APPROX-NoC: a Data Approximation
Framework for Network-On-Chip
Architectures . . . . . . . . . . . . . 666--677
Matthew Poremba and
Itir Akgun and
Jieming Yin and
Onur Kayiran and
Yuan Xie and
Gabriel H. Loh There and Back Again: Optimizing the
Interconnect in Networks of Memory Cubes 678--690
Binzhang Fu and
John Kim Footprint: Regulating Routing
Adaptiveness in Networks-on-Chip . . . . 691--702
Masoumeh Ebrahimi and
Masoud Daneshtalab EbDa: a New Theory on Design and
Verification of Deadlock-free
Interconnection Networks . . . . . . . . 703--715