Last update:
Tue Apr 15 07:32:36 MDT 2025
Anonymous Important announcement . . . . . . . . . 1--1 Anonymous Editorial: a journal transformed . . . . 3--4 Edward J. Krall and Patrick F. McGehearty A case study of parallel execution of a rule-based expert system . . . . . . . . 5--32 Vaughan R. Pratt Modeling concurrency with partial orders 33--71 S. Kasif Control and data driven execution of logic programs: a comparison . . . . . . 73--99 Parallax How are parallel systems invented? . . . 101--102
Paul R. Hudak The Denotational Semantics of a Para-Functional Programming Language . . 103--125 Guang R. Gao Maximum pipelining linear recurrence on static data flow computers . . . . . . . 127--149 Donald M. Chiarulli and Duncan A. Buell Parallel microprogramming tools for a horizontally reconfigurable architecture 151--162 D. Nau and P. Purdom and Chun-Hung Tzeng Experiments on alternatives to minimax 163--183 Parallax When is pull better than push? (parallel programming) . . . . . . . . . . . . . . 185--188
Khayri A. M. Ali OR-parallel execution of PROLOG on a multi-sequential machine . . . . . . . . 189--214 Bharat Jayaraman and Robert M. Keller Primitives for resource management in a demand-driven reduction model . . . . . 215--244 S. Taylor and S. Safra and E. Shapiro A parallel implementation of Flat Concurrent Prolog . . . . . . . . . . . 245--275 Parallax The bards on parallel programming . . . 277--277
Michael Wolfe Loop skewing: The wavefront method revisited . . . . . . . . . . . . . . . 279--293 Eugene D. Brooks, II The butterfly barrier (multiprocessing) 295--307 Alan George and Michael T. Heath and Joseph Liu and Esmond Ng Solution of sparse positive definite systems on a shared-memory multiprocessor . . . . . . . . . . . . . 309--325 S. P. Rana and D. K. Banerji An optimal distributed solution to the dining philosophers problem . . . . . . 327--335 Anonymous Hotspotting . . . . . . . . . . . . . . 337--337
Khayri A. M. Ali and Seif Haridi Global garbage collection for distributed heap storage systems . . . . 339--387 Hossam El-Gindy An optimal speed-up parallel algorithm for triangulating simplicial point sets in space . . . . . . . . . . . . . . . . 389--398 Ed Merks An Optimal Parallel Algorithm for Triangulating a Set of Points in the Plane . . . . . . . . . . . . . . . . . 399--411 B. Grošelj and C. Tropper Pseudosimulation: an algorithm for distributed simulation with limited memory . . . . . . . . . . . . . . . . . 413--456 Anonymous The church of the least fixed point . . 457--457
Robert H. Halstead, Jr. An Assessment of Multilisp: Lessons from Experience . . . . . . . . . . . . 459--501 Eliezer Dekel and Shietung Peng and S. Sitharama Iyengar Optimal parallel algorithms for constructing and maintaining a balanced m-way search tree . . . . . . . . . . 503--528 Virgilio A. F. Almeida and Lawrence W. Dowdy Performance analysis of a scheme for concurrency/synchronization using queueing network models . . . . . . . . 529--550 Venkatramana G. Ajjanagadde and L. M. Patnaik Systolic Architecture for B-Spline Surfaces . . . . . . . . . . . . . . . . 551--565 Gary Lindstrom Sans pareil: Referees . . . . . . . . . 567--568
Shlomit S. Pinter and Yaron Wolfstahl On mapping processes to processors in distributed systems . . . . . . . . . . 1--15 Kristine Stougaard Thomsen Inheritance on processes, exemplified on distributed termination detection . . . 17--52 E. P. DeBenedictis A Multiprocessor Using Protocol-Based Programming Primitives . . . . . . . . . 53--84 Anonymous Amdahl's law . . . . . . . . . . . . . . 85--85
Ian Foster and Stephen Taylor Flat Parlog: a basis for comparison . . 87--125 Henk Meijer and Selim G. Akl Optimal computation of prefix sums on a binary tree of processors . . . . . . . 127--136 Michael Wolfe and Utpal Banerjee Data dependence and its application to parallel processing . . . . . . . . . . 137--178 Anonymous Isomorphic Computers Inc.: With Isomorphic Computers, more is more™ 179--182
Adolfo Guzman and Edward J. Krall and Patrick F. McGehearty and Nader Bagherzadeh Performance of symbolic applications on a parallel architecture . . . . . . . . 183--214 Richard M. Fujimoto and Hwa-chung Feng A shared memory algorithm and proof for the generalized alternative construct in CSP . . . . . . . . . . . . . . . . . . 215--241 R. L. Wainwright Deriving parallel computations from functional specifications: a seismic example on a hypercube . . . . . . . . . 243--260 Anonymous Systolic processing . . . . . . . . . . 261--261
Nissim Francez and Shmuel Katz Fairness and the axioms of control predicates . . . . . . . . . . . . . . . 263--278 Frances E. Hunt Experiments with applicative updating: practical results . . . . . . . . . . . 279--303 E. Bradley and R. H. Halstead, Jr. Simulating logic circuits: a multiprocessor application . . . . . . . 305--338 Anonymous Connectionism . . . . . . . . . . . . . 339--339
Ashok Samal and Tom Henderson Parallel Consistent Labeling Algorithms 341--364 Charles Koelbel and Piyush Mehrotra and John Van Rosendale Semi-automatic process partitioning for parallel computation . . . . . . . . . . 365--382 Michael G. Main Trace, failure and testing equivalences for communicating processes . . . . . . 383--400 A. Davison Blackboard systems in Polka . . . . . . 401--424 Anonymous Fixpoints in Daily Life . . . . . . . . 425--425
John R. Gilbert and Earl Zmijewski A parallel graph partitioning algorithm for a message-passing multiprocessor . . 427--449 Pierpaolo Degano and Sergio Marchetti Partial ordering models for concurrency can be defined operationally . . . . . . 451--478 V. Nageshwara Rao and Vipin Kumar Parallel depth first search. Part I. Implementation . . . . . . . . . . . . . 479--499 Vipin Kumar and V. Nageshwara Rao Parallel depth first search. Part II. Analysis . . . . . . . . . . . . . . . . 501--519 Gary Lindstrom Sans pareil: Referees . . . . . . . . . 521--522
Debra Hensgen and Raphael Finkel and Udi Manber Two algorithms for barrier synchronization . . . . . . . . . . . . 1--17 Patrick Valduriez and Setrag Khoshafian Parallel evaluation of the transitive closure of a database relation . . . . . 19--42 Stephen L. Stepoway and Michael Christiansen Parallel Rendering of Fractal Surfaces 43--58 P. A. Tinker Performance of an OR-parallel logic programming system . . . . . . . . . . . 59--92 Gary Lindstrom Sage commentary . . . . . . . . . . . . 93--93
Anoop Gupta and Milind Tambe and Dirk Kalp and Charles Forgy and Allen Newell Parallel implementation of OPS5 on the Encore multiprocessor: results and analysis . . . . . . . . . . . . . . . . 95--124 John S. Conery Binding environments for parallel logic programs in non-shared memory multiprocessors . . . . . . . . . . . . 125--152 Rance Cleaveland and Prakash Panangaden Type theory and concurrency . . . . . . 153--206
Z. Somogyi and K. Ramamohanarao and J. Vaghani A backtracking algorithm for the stream AND-parallel execution of logic programs 207--257 Elizabeth W. Edmiston and Nolan G. Core and Joel H. Saltz and Roger M. Smith Parallel processing of biological sequence comparison algorithms . . . . . 259--275 V. K. Janakiram and E. F. Gehringer and D. P. Agrawal and R. Mehrotra A randomized parallel branch-and-bound algorithm . . . . . . . . . . . . . . . 277--301
Carla Schlatter Ellis and Thomas J. Olson Algorithms for parallel memory allocation . . . . . . . . . . . . . . . 303--345 Mark T. Vandevoorde and Eric S. Roberts WorkCrews: an abstraction for controlling parallelism . . . . . . . . 347--366
James S. Miller Implementing a Scheme-Based Parallel Processing System . . . . . . . . . . . 367--402 G. Cybenko and T. G. Allen and J. E. Polito Practical Parallel Union-Find Algorithms for Transitive Closure and Clustering 403--423 Benjamin Goldberg Multiprocessor execution of functional programs . . . . . . . . . . . . . . . . 425--473
Lionel M. Ni and Chung-Ta King On partitioning and mapping for hypercube computing . . . . . . . . . . 475--495 Jim Crammond A Garbage Collection Algorithm for Shared Memory Parallel Processors . . . 497--522 Michael J. Swain Comments on A. Samal and T. Henderson: "Parallel consistent labeling algorithms" [Internat. J. Parallel Programming 16 (1987), no. 5, 341--364] . . . . . . . . . . . . . . . 523--528 Gary Lindstrom Sans pareil: referees . . . . . . . . . 529--530
Anne Neirynck and Prakash Panangaden and Alan J. Demers Effect analysis in higher-order languages . . . . . . . . . . . . . . . 1--36 Ran Ginosar and David Egozi Topological comparison of perfect shuffle and hypercube . . . . . . . . . 37--68 David M. Nicol and Joel H. Saltz and James C. Townsend Delay Point Schedules for Irregular Parallel Computations . . . . . . . . . 69--90
Kee-Hyun Park and Lawrence W. Dowdy Dynamic partitioning of multiprocessor systems . . . . . . . . . . . . . . . . 91--120 Alessandro Giacalone and Prateek Mishra and Sanjiva Prasad FACILE: a Symmetric Integration of Concurrent and Functional Programming 121--160
Rajiv Gupta and Charles R. Hill A Scalable Implementation of Barrier Synchronization Using An Adaptive Combining Tree . . . . . . . . . . . . . 161--180 Ian Foster A Multicomputer Garbage Collector for a Single Assignment Language . . . . . . . 181--203 Yi Xin Zhang Parallel algorithms for minimal spanning trees of directed graphs . . . . . . . . 205--221 Xiaoqiu Huang A space-efficient parallel sequence comparison algorithm for a message-passing multiprocessor . . . . 223--239
David Hemmendinger Initializing memory shared by several processors . . . . . . . . . . . . . . . 241--253 Gadi Taubenfeld and Shmuel Katz and Shlomo Moran Initial failures in distributed computations . . . . . . . . . . . . . . 255--276 Jason Gait Speedup and optimality in pipeline programs . . . . . . . . . . . . . . . . 277--290 G. A. Geist and E. Ng Task scheduling for parallel sparse Cholesky factorization . . . . . . . . . 291--314
Jeannette M. Wing Verifying atomic data types . . . . . . 315--357 Selim G. Akl and Frank Dehne Pipelined search on coarse grained networks . . . . . . . . . . . . . . . . 359--364 Juanito Camilleri An Operational Semantics for occam . . . 365--400 (or 149--167??) Arvind K. Bansal and Leon S. Sterling Transforming generate-and-test programs to execute under committed-choice AND-parallelism . . . . . . . . . . . . 401--446
Ambuj K. Singh and Ross Overbeek Derivation of Efficient Parallel Programs: an Example From Genetic Sequence Analysis . . . . . . . . . . . 447--484 Frederick Springsteel and Ivan Stojmenović Parallel general prefix computations with geometric, algebraic, and other applications . . . . . . . . . . . . . . 485--503 Woei-Kae Chen and Matthias F. M. Stallmann and Edward F. Gehringer Hypercube embedding heuristics: an evaluation . . . . . . . . . . . . . . . 505--549 Gary Lindstrom Sans pareil: Referees . . . . . . . . . 551--552
John H. Reif and Scott A. Smolka Data flow analysis of distributed communicating processes . . . . . . . . 1--30 Russell M. Clapp and Trevor N. Mudge and Donald C. Winsor Cache Coherence Requirements for Interprocess Rendezvous . . . . . . . . 31--51 Rajiv Gupta and Michael Epstein High Speed Synchronization of Processors Using Fuzzy Barriers . . . . . . . . . . 53--73
Duane A. Bailey and Janice E. Cuny and Craig P. Loomis ParaGraph: Graph editor support for parallel programming environments . . . 75--110 Raymond Greenlaw and Lawrence Snyder Achieving speedups for APL on an SIMD distributed memory machine . . . . . . . 111--127 Khayri A. M. Ali and Roland Karlsson The Muse Approach to OR-Parallel Prolog 129--162 (or 129--160??)
Manuel E. Bermudez and Richard Newman-Wolfe and George Logothetis Parallel Construction of SLR(1) and LALR(1) Parsers . . . . . . . . . . . . 163--184 Soumitra Sengupta and Arthur J. Bernstein Concurrency Control Optimizations in a Prolog Database . . . . . . . . . . . . 185--211 Frank Dehne and Quoc T. Pham and Ivan Stojmenović Optimal Visibility Algorithms for Binary Images on the Hypercube . . . . . . . . 213--224 Boris D. Lubachevsky Synchronization Barrier and Related Tools for Shared Memory Parallel Programming . . . . . . . . . . . . . . 225--250
L. V. Kalé and Vikram A. Saletore Parallel State-Space Search for a First Solution with Consistent Linear Speedups 251--293 Oscar H. Ibarra and Michael A. Palis An Efficient All-Parses Systolic Algorithm for General Context-free Parsing . . . . . . . . . . . . . . . . 295--331 Laurent Langlois Systolic Parsing of Context-free Languages . . . . . . . . . . . . . . . 333--355
Carole M. McNamee and Ronald A. Olsson Transformations for optimizing interprocess communication and synchronization mechanisms . . . . . . . 357--387 Rok Sosic and Richard F. Riesenfeld Parallel Algorithms for Line Generation 389--404 Douglas M. Blough and Nader Bagherzadeh Near-Optimal Message Routing and Broadcasting in Faulty Hypercubes . . . 405--423
E. Tick Execution Characteristics of Layered Streams . . . . . . . . . . . . . . . . 425--443 Khayri A. M. Ali and Roland Karlsson Full Prolog and Scheduling OR-Parallelism in Muse . . . . . . . . . 445--475 Michael D. Rice Semantics for Data Parallel Computation 477--509 Gary Lindstrom Sans pareil: Referees . . . . . . . . . 511--512
Manfred Broy and Thomas Streicher Specification and Design of Shared Resource Arbitration . . . . . . . . . . 1--22 Paul Feautrier Dataflow Analysis of Array and Scalar References . . . . . . . . . . . . . . . 23--53 (or 23--52??) Mike Livesey A Network Model of Barrier Synchronization Algorithms . . . . . . . 55--74
R. Mall and L. M. Patnaik Formal Timing Analysis of Distributed Systems . . . . . . . . . . . . . . . . 75--94 V. Singh and V. Kumar and G. Agha and C. Tomlinson Efficient Algorithms for Parallel Sorting on Mesh Multicomputers . . . . . 95--131 D. B. Skillicorn Models for Practical Parallel Computation . . . . . . . . . . . . . . 133--158
Kai Li and Jeffrey F. Naughton and James S. Plank An Efficient Checkpointing Method for Multicomputers with Wormhole Routing . . 159--180 Carole M. McNamee and Ronald A. Olsson An Attribute Grammar Approach to Compiler Optimization of IntraModule Interprocess Communication . . . . . . . 181--202 Gurdip Singh and Arthur J. Bernstein On the Relative Execution Times of Distributed Protocols . . . . . . . . . 203--235 Virginia M. Lo and Sanjay Rajopadhye and Samik Gupta and David Keldsen and Moataz A. Mohamed and Bill Nitzberg and Jan Arne Tell and Xiaoxiong Zhong OREGAMI: Tools for mapping parallel computations to parallel architectures 237--270
P. Adamson and E. Tick Greedy Partitioned Algorithms for the Shortest-Path Problem . . . . . . . . . 271--298 Matthew Huntbach Parallel Branch-and-Bound Search in Parlog . . . . . . . . . . . . . . . . . 299--314 Zheng Lin A Distributed Fair Polling Scheme Applied to OR-Parallel Logic Programming 315--339
Mohammad Ashraf Iqbal Approximate Algorithms for Partitioning Problems . . . . . . . . . . . . . . . . 341--361 Calvin Lin and Lawrence Snyder A Portable Implementation of SIMPLE . . 363--401 Amitabha Das and Louise E. Moser and P. M. Melliar-Smith A Parallel Sorting Algorithm for a Novel Model of Computation . . . . . . . . . . 403--419
Andrzej Ciepielewski Scheduling in OR-parallel Prolog systems: survey and open problems . . . 421--451 Steven Y. Susswein and Thomas C. Henderson and Joseph L. Zachary and Chuck Hansen and Paul Hinker and Gary C. Marsden Parallel Path Consistency . . . . . . . 453--473 Frank Dehne and Russ Miller and Andrew Rau-Chaplin Optical Clustering on a Mesh-Connected Computer . . . . . . . . . . . . . . . . 475--486 Gary Lindstrom Sans pareil: Referees . . . . . . . . . 487--488
Michael A. Palis and David S. L. Wei Parallel Parsing of Tree Adjoining Grammars on the Connection Machine . . . 1--38 Stephen A. Schwab Extended parallelism in the Gröbner basis algorithm . . . . . . . . . . . . . . . 39--66 Balkrishna Ramkumar and Laxmikant V. Kalé A Join Algorithm for Combining AND Parallel Solutions in AND/OR Parallel Systems . . . . . . . . . . . . . . . . 67--107
Dilip Sarkar and Ivan Stojmenović Parallel Algorithms for Separation of Two Sets of Points and Recognition of Digital Convex Polygons . . . . . . . . 109--121 Xining Li and John Cleary and Brian Unger Virtual Time and Virtual Space . . . . . 123--150 Michael A. Palis and Sunil M. Shende An NC Algorithm for Recognizing Tree Adjoining Languages . . . . . . . . . . 151--167
Rajiv Gupta and Sunah Lee Exploiting Parallelism on a Fine-Grained MIMD Architecture Based Upon Channel Queues . . . . . . . . . . . . . . . . . 169--192 Ling-Yu Chuang and Vernon Rego and Aditya Mathur An application of program unification to priority queue vectorization . . . . . . 193--224
R. Govindarajan and S. Yu and V. S. Lakshmanan Attempting Guards in Parallel: a Data Flow Approach to Execute Generalized Guarded Commands . . . . . . . . . . . . 225--268 Ouri Wolfson and Weining Zhang and Harish Butani and Akira Kawaguchi and Mok Kui Parallel Processing of Graph Reachability in Databases . . . . . . . 269--302 Alan P. Sprague A Parallel Algorithm to Construct a Dominance Graph on Non-overlapping Rectangles . . . . . . . . . . . . . . . 303--312
Paul Feautrier Some efficient solutions to the affine scheduling problem. I. One-dimensional time . . . . . . . . . . . . . . . . . . 313--347 W. Loots and T. H. C. Smith A parallel algorithm for the 0-1 knapsack problem . . . . . . . . . . . . 349--362 Bradley K. Seevers and Michael J. Quinn and Philip J. Hatcher A Parallel Programming Environment Supporting Multiple Data-Parallel Modules . . . . . . . . . . . . . . . . 363--386
Anonymous Important announcement to subscribers 387--387 Paul Feautrier Some Efficient Solutions to the Affine Scheduling Problem. Part II. Multidimensional Time . . . . . . . . . 389--420 Qi Ning and Guang R. Gao Optimal Loop Storage Allocation for Argument-Fetching Dataflow Machines . . 421--448 Khayri A. M. Ali and Roland Karlsson Scheduling Speculative Work in MUSE and Performance Results . . . . . . . . . . 449--476 Gary Lindstrom Referees and Valedictory . . . . . . . . 477--479
Gordon Bell Scalable, Parallel Computers: Alternatives, Issues, and Challenges . . 3--46 Jack B. Dennis Machines and Models for Parallel Computing . . . . . . . . . . . . . . . 47--77 Ken Kennedy Compiler technology for machine-independent parallel programming 79--98 David J. Kuck What Do Users of Parallel Computer Systems Really Need? . . . . . . . . . . 99--127
Nicholas Carriero and David Gelernter Case studies in asynchronous data parallelism . . . . . . . . . . . . . . 129--149 William Y. Chen and Scott A. Mahlke and Nancy J. Warter and Sadun Anik and Wen-Mei W. Hwu Profile-assisted instruction scheduling 151--181 Wei Li and Keshav Pingali A singular loop transformation framework based on non-singular matrices . . . . . 183--205
Wen-Mei Hwu and Alex Nicolau From the Guest Editors . . . . . . . . . 207 Walid A. Najjar and Lucas Roh and A. P. Wim Böhm An Evaluation of Medium-Grain Dataflow Code . . . . . . . . . . . . . . . . . . 209--242 Gary Tyson and Matthew Farrens Code Scheduling for Multiple Instruction Stream Architectures . . . . . . . . . . 243--272 M. Rajagopalan and V. H. Allan Specification of Software Pipelining Using Petri Nets . . . . . . . . . . . . 273--301 Mark R. Gilder and Mukkai S. Krishnamoorthy Automatic Source-Code Parallelization Using HICOR Objects . . . . . . . . . . 303--350 Jian Wang and Christine Eisenbeis and Martin Jourdan and Bogong Su Decomposed software pipelining: a new perspective and a new approach . . . . . 351--373
Yosi Ben-Asher and Eitan Farchi Using True Concurrency to Model Execution of Parallel Programs . . . . . 375--407 Feipei Lai and Yung-kuang Chao and Chia-Jung Hsieh The Complementary Relationship of Interprocedural Register Allocation and Inlining . . . . . . . . . . . . . . . . 409--434 M. K. Stojčev and E. I. Milovanović and I. Ž. Milovanović An Optimal Scheduling Procedure for Matrix Inversion on Linear Array at a Processor Level . . . . . . . . . . . . 435--448 Michael L. Scott and John M. Mellor-Crummey Fast, Contention-Free Combining Tree Barriers for Shared-Memory Multiprocessors . . . . . . . . . . . . 449--481
Utpal Banerjee Editor's Introduction . . . . . . . . . 483 Larry Carter and Jeanne Ferrante and Vasanth Bala XDP: a Compiler Intermediate Language Extension for the Representation and Optimization of Data Movement . . . . . 485--518 Milind Girkar and Constantine D. Polychronopoulos The Hierarchical Task Graph as a Universal Intermediate Representation 519--551 Keith A. Faigin and Stephen A. Weatherford and Jay P. Hoeflinger and David A. Padua and Paul M. Petersen The Polaris Internal Representation . . 553--586
Jie Liu and Vikram A. Saletore and Ted G. Lewis Safe Self-Scheduling: a Parallel Loop Scheduling Scheme for Shared-Memory Multiprocessors . . . . . . . . . . . . 589--616 Theodore Johnson Parallel-Access Memory Management Using Fast-Fits . . . . . . . . . . . . . . . 617--648
Shlomit S. Pinter Introduction . . . . . . . . . . . . . . 3 Nicholas Carriero and David Gelernter and Marc Jourdenais and David Kaminsky Piranha Scheduling: Strategies and Their Implementation . . . . . . . . . . . . . 5--33 Steven Novack and Alexandru Nicolau A Hierarchical Approach to Instruction-level Parallelization . . . 35--62 Dror E. Maydan and John L. Hennessy and Monica S. Lam Effectiveness of Data Dependence Analysis . . . . . . . . . . . . . . . . 63--81 David Bernstein and Mauricio Breternitz, Jr. and Ahmed M. Gheith and Bilha Mendelson Solutions and Debugging for Data Consistency in Multiprocessors with Noncoherent Caches . . . . . . . . . . . 83--103
David Abramson and A. McKay Evaluating the Performance of a SISAL Implementation of the Abingdon Cross Image Processing Benchmark . . . . . . . 105--134 Dror G. Feitelson and Larry Rudolph Coscheduling Based on Runtime Identification of Activity Working Sets 135--160 Wei-Ming Lin and Bo Yang Probabilistic Performance Analysis for Parallel Search Techniques . . . . . . . 161--189 Jean-François Collard Automatic Parallelization of while-Loops Using Speculative Execution . . . . . . 191--219
Stephen Melvin and Yale Patt Enhancing Instruction Scheduling with a Block-Structured ISA . . . . . . . . . . 221--243 Heng-Yi Chao and Mary P. Harper Minimizing Redundant Dependencies and Interprocessor Synchronizations . . . . 245--262
Elana D. Granston and Thierry Montaut and François Bodin Loop Transformations to Prevent False Sharing . . . . . . . . . . . . . . . . 263--301 Wayne Kelly and William Pugh Using Affine Closure to Find Legal Reordering Transformations . . . . . . . 303--325 Eric Stoltz and Michael Wolfe Detecting Value-Based Scalar Dependence 327--358 Yi-Qing Yang and Corinne Ancourt and François Irigoin Minimal Data Dependence Abstractions for Loop Transformations: Extended Version 359--388
Yosi Ben-Asher and Gudula Rünger and Assaf Schuster and Reinhard Wilhelm 2DT-FP: a Parallel Functional Programming Language on Two-Dimensional Data . . . . . . . . . . . . . . . . . . 389--422 Elana D. Granston and Alexander V. Veidenbaum Combining Flow and Dependence Analyses to Expose Redundant Array Accesses . . . 423--470 Martin Griebl and Christian Lengauer A Communication Scheme for the Distributed Execution of Loop Nests with while Loops . . . . . . . . . . . . . . 471--496
Mario Mango Furnari Guest Editor's Introduction . . . . . . 497 Andrea Capitanio and Alexandru Nicolau and Nikil Dutt A Hypergraph-Based Model for Port Allocation on Multiple-Register-File VLIW Architectures . . . . . . . . . . . 499--513 Eduard Ayguade and Jesus Labarta and Jordi Garcia and Merce Girones and Mateo Valero Analyzing Reference Patterns in Automatic Data Distribution Tools . . . 515--535 Lawrence Rauchwerger and Nancy M. Amato and David A. Padua A Scalable Method for Run-Time Loop Parallelization . . . . . . . . . . . . 537--576
Matthew Farrens and Wen-mei Hwu Guest Editors' Introduction . . . . . . 1 B. Ramakrishna Rau Iterative Modulo Scheduling . . . . . . 3--64 Michael Schlansker and Vinod Kathail and Sadun Anik Parallelization of Control Recurrences for ILP Processors . . . . . . . . . . . 65--102
Alexandre E. Eichenberger and Edward S. Davidson and Santosh G. Abraham Minimizing Register Requirements of a Modulo Schedule via Optimum Stage Scheduling . . . . . . . . . . . . . . . 103--132 Po-Yung Chang and Eric Hao and Tse-Yu Yeh and Yale Patt Branch Classification: a New Mechanism for Improving Branch Predictor Performance . . . . . . . . . . . . . . 133--158 Gary Tyson and Matthew Farrens Evaluating the Effects of Predicated Execution on Branch Prediction . . . . . 159--186 Thomas M. Conte and Burzin A. Patel and Kishore N. Menezes and J. Stan Cox Hardware-Based Profiling: an Effective Technique for Profile-Driven Optimization . . . . . . . . . . . . . . 187--206
Jean-Luc Gaudiot Guest Editor's Introduction . . . . . . 207 Po-Yung Chang and Eric Hao and Yale N. Patt and Pohua P. Chang Using Predicated Execution to Improve the Performance of a Dynamically Scheduled Machine with Speculative Execution . . . . . . . . . . . . . . . 209--234 David H. Albonesi and Israel Koren A Mean Value Analysis Multiprocessor Model Incorporating Super-scalar Processors and Latency Tolerating Techniques . . . . . . . . . . . . . . . 235--263 M. Cosnard and M. Loi A Simple Algorithm for the Generation of Efficient Loop Structures . . . . . . . 265--289
Dean Engelhardt and Andrew Wendelborn A Partitioning-Independent Paradigm for Nested Data Parallelism . . . . . . . . 291--317 Herbert H. J. Hum and Olivier Maquelin and Kevin B. Theobald and Xinmin Tian and Guang R. Gao and Laurie J. Hendren A Study of the EARTH-MANNA Multithreaded System . . . . . . . . . . . . . . . . . 319--348 Evan Torrie and Margaret Martonosi and Mary W. Hall and Chau-Wen Tseng Memory Referencing Behavior in Compiler-Parallelized Applications . . . 349--376 Thomas Sterling and Daniel Savarese and Phillip Merkey and Kevin Olson An Empirical Evaluation of the Convex SPP-1000 Hierarchical Shared Memory System . . . . . . . . . . . . . . . . . 377--396
Lesley R. Matheson and Robert E. Tarjan Parallelism in Multigrid Methods: How Much Is Too Much? . . . . . . . . . . . 397--432 Kish Shen and Manuel V. Hermenegildo High-Level Characteristics of OR- and Independent AND-Parallelism in Prolog 433--478
Rastislav Bodik and Rajiv Gupta Array Data Flow Analysis for Load-Store Optimizations in Fine-Grain Architectures . . . . . . . . . . . . . 481--512 Beatrice Creusillet and François Irigoin Interprocedural Array Region Analyses 513--546 Rakesh Ghiya and Laurie J. Hendren Connection Analysis: a Practical Interprocedural Heap Analysis for C . . 547--578 Wayne Kelly and William Pugh and Evan Rosser and Tatiana Shpeisman Transitive Closure of Infinite Graphs and its Applications . . . . . . . . . . 579--598 Thomas J. Sheffler and Robert Schreiber and William Pugh and John R. Gilbert and Siddhartha Chatterjee Efficient Distribution Analysis via Graph Contraction . . . . . . . . . . . 599--620
Frank Dehne and Siang W. Song Randomized Parallel List Ranking for Distributed Memory Multi-processors . . 1--16 Christoph W. Kessler and Helmut Seidl The Fork95 Parallel Programming Language: Design, Implementation, Application . . . . . . . . . . . . . . 17--50
Kemal Ebcioğlu and Wen-mei Hwu Guest Editors' Introduction . . . . . . 51 Vasanth Bala and Norman Rubin Efficient Instruction Scheduling Using Finite State Automata . . . . . . . . . 53--82 Thomas M. Conte and Sumedh W. Sathaye Optimization of VLIW Compatibility Systems Employing Dynamic Rescheduling 83--112 Richard E. Hank and Wen-mei W. Hwu and B. Ramakrishna Rau Region-Based Compilation: Introduction, Motivation, and Initial Experience . . . 113--146
Michael Schlansker and Vinod Kathail Techniques for Critical Path Reduction of Scalar Programs . . . . . . . . . . . 147--181 Marco Fillo and Stephen W. Keckler and William J. Dally and Nicholas P. Carter and Andrew Chang and Yevgeny Gurevich and Whay S. Lee The M-Machine Multicomputer . . . . . . 183--212 Gary Tyson and Matthew Farrens and John Matthews and Andrew R. Pleszkun Managing Data Caches Using Selective Cache Line Replacement . . . . . . . . . 213--242
Walid A. Najjar and Gabriel M. Silberman Foreword to the Special Issues . . . . . 243 Chris J. Newburn and John Paul Shen Post-Pass Partitioning of Signal Processing Programs . . . . . . . . . . 245--280 Stephen Jenks and Jean-Luc Gaudiot Exploiting Locality and Tolerating Remote Memory Access Latency Using Thread Migration . . . . . . . . . . . . 281--304 Laurie J. Hendren and Xinan Tang and Yingchun Zhu and Shereen Ghobrial and Guang R. Gao and Xun Xue and Haiying Cai and Pierre Ouellet Compiling C for the EARTH Multithreaded Architecture . . . . . . . . . . . . . . 305--338
Po-Yung Chang and Marius Evers and Yale N. Patt Improving Branch Prediction Accuracy by Reducing Pattern History Table Interference . . . . . . . . . . . . . . 339--362 Stephan Jourdan and Jared Stark and Tse-Hao Hsing and Yale N. Patt Recovery Requirements of Branch Prediction Storage Structures in the Presence of Mispredicted-Path Execution 363--383 Lorenz Huelsbergen Dynamic Resolution: a Runtime Technique for the Parallelization of Modifications to Directed Acyclic Graphs . . . . . . . 385--417 Daeyeon Park and Rafael H. Saavedra and Sungdo Moon Adaptive Granularity: Transparent Integration of Fine- and Coarse-Grain Communication . . . . . . . . . . . . . 419--446
Alain Darte and Frédéric Vivien Optimal Fine and Medium Grain Parallelism Detection in Polyhedral Reduced Dependence Graphs . . . . . . . 447--496 Catherine Mongenet Affine Dependence Classification for Communications Minimization . . . . . . 497--524 Vincent Loechner and Doran K. Wilde Parameterized Polyhedra and Their Vertices . . . . . . . . . . . . . . . . 525--549
Editorial Introduction Editor's Announcement . . . . . . . . . 1--2 David Sehr Guest Editor's Introduction . . . . . . 3--4 Val Donaldson and Jeanne Ferrante Analyzing Asynchronous Pipeline Schedules . . . . . . . . . . . . . . . 5--42 Tito Autrey and Michael Wolfe Initial Results for Glacial Variable Analysis . . . . . . . . . . . . . . . . 43--64 Ajita John and James C. Browne Compilation of constraint programs with noncyclic and cyclic dependencies to procedural parallel programs . . . . . . 65--119
Josep Llosa and Eduard Ayguadé and Mateo Valero Quantitative evaluation of register pressure on software pipelined loops . . 121--142 Ricardo Bianchini and Enrique V. Carrera and Leonidas Kontothanassis Evaluating the effect of coherence protocols on the performance of parallel programming constructs . . . . . . . . . 143--181 John E. So and Thomas J. Downar and Raghunandan Janardhan and Howard Jay Siegel Mapping conjugate gradient algorithms for neutron diffusion applications onto SIMD, MIMD, and mixed-mode machines . . 183--207
Thomas Grün and Thomas Rauber and Jochen Röhrig Support for Efficient Programming on the SB-PRAM . . . . . . . . . . . . . . . . 209--240 Cindy Norris and Lori L. Pollock Experiences with cooperating register allocation and instruction scheduling 241--283 Pierre-Yves Calland and Alain Darte and Yves Robert and Frédéric Vivien On the Removal of Anti- and Output-Dependences . . . . . . . . . . . 285--312 Erik R. Altman and Guang R. Gao Optimal Modulo Scheduling Through Enumeration . . . . . . . . . . . . . . 313--344
Steve Beaty and Wen-mei Hwu Foreword to the Special Issue . . . . . 345--347 Santosh G. Abraham and Vinod Kathail and Brian L. Deitrich Meld Scheduling: a Technique for Relaxing Scheduling Constraints . . . . 349--381 Ashwini K. Nanda and James O. Bondi and Simonjit Dutta The Misprediction Recovery Cache . . . . 383--415 John C. Gyllenhaal and Wen-mei W. Hwu and B. Ramakrishna Rau Optimization of Machine Descriptions for Efficient Use . . . . . . . . . . . . . 417--447 Eric Hao and Po-Yung Chang and Marius Evers and Yale N. Patt Increasing the Instruction Fetch Rate via Block-Structured Instruction Set Architectures . . . . . . . . . . . . . 449--478 Michael E. Wolf and Dror E. Maydan and Ding-Kai Chen Combining Loop Transformations Considering Caches and Scheduling . . . 479--503 Mikko H. Lipasti and John Paul Shen Exploiting Value Locality to Exceed the Dataflow Limit . . . . . . . . . . . . . 505--538
Zhiyuan Li and Pen-Chung Yew Introduction . . . . . . . . . . . . . . 539--540 Insung Park and Michael Voss and Brian Armstrong and Rudolf Eigenmann Parallel Programming and Performance Evaluation with the URSA Tool Family . . 541--561 Jaejin Lee and Samuel P. Midkiff and David A. Padua A Constant Propagation Algorithm for Explicitly Parallel Programs . . . . . . 563--589 Hwansoo Han and Chau-Wen Tseng and Pete Keleher Eliminating Barrier Synchronization for Compiler-Parallelized Codes on Software DSMs . . . . . . . . . . . . . . . . . . 591--612 John Mellor-Crummey and Vikram Adve Simplifying Control Flow in Compiler-Generated Parallel Code . . . . 613--638
Zhiyuan Li and Pen-Chung Yew Introduction . . . . . . . . . . . . . . 639--640 Nicholas Mitchell and Karin Högstedt and Larry Carter and Jeanne Ferrante Quantifying the Multi-Level Nature of Tiling Interactions . . . . . . . . . . 641--670 Jingling Xue and Chua-Huang Huang Reuse-Driven Tiling for Improving Data Locality . . . . . . . . . . . . . . . . 671--696
Jenn-Yuan Tsai and Zhenzhen Jiang and Pen-Chung Yew Compiler Techniques for the Superthreaded Architectures . . . . . . 1--19 Thomas Kistler and Michael Franz A Tree-Based Alternative to Java Byte-Codes . . . . . . . . . . . . . . . 21--33 Edward H. Gornish and Alexander Veidenbaum An Integrated Hardware/Software Data Prefetching Scheme for Shared-Memory Multiprocessors . . . . . . . . . . . . 35--70
Kazuki Joe Guest Editor's Introduction . . . . . . 71--72 Bret A. Marsolf and Kyle A. Gallivan and Harry A. G. Wijshoff The Utilization of Matrix Structure to Generate Optimized Code from MATLAB Programs . . . . . . . . . . . . . . . . 73--96 Atsushi Kubota and Shogo Tatsumi and Toshihiko Tanaka and Masahiro Goshima and Shin-ichiro Mori and Hiroshi Nakashima and Shinji Tomita A Technique to Eliminate Redundant Inter-Processor Communication on Parallelizing Compiler TINPAR . . . . . 97--109 Mariko Sasakura and Kazuki Joe and Yoshitoshi Kunieda and Keijiro Araki NaraView: an Interactive 3D Visualization System for Parallelization of Programs . . . . . . . . . . . . . . 111--129
Michael F. P. O'Boyle and Peter M. W. Knijnenburg Nonsingular Data Transformations: Definition, Validity, and Applications 131--159 Avi Mendelson and Michael Bekerman Design Alternatives of Multithreaded Architecture . . . . . . . . . . . . . . 161--193 Min Tan and Janet M. Siegel and Howard Jay Siegel Parallel Implementations of Block-Based Motion Vector Estimation for Video Compression on Four Parallel Processing Systems . . . . . . . . . . . . . . . . 195--225
Shlomit S. Pinter Introduction . . . . . . . . . . . . . . 227--228 Yiannakis Sazeides and James E. Smith Limits of Data Value Predictability . . 229--256 Steven Phillips and Anne Rogers Parallel Speech Recognition . . . . . . 257--288 Ragini Narasimhan and Daniel J. Rosenkrantz and S. S. Ravi Using Data Flow Information to Obtain Efficient Check Sets for Algorithm-Based Fault Tolerance . . . . . . . . . . . . 289--323
Thomas Conte and Wen-Mei Hwu and Mark Smotherman Editors' Introduction . . . . . . . . . 325--326 Keith I. Farkas and Paul Chow and Norman P. Jouppi and Zvonko Vranesic The Multicluster Architecture: Reducing Processor Cycle Time Through Partitioning . . . . . . . . . . . . . . 327--356 Gary S. Tyson and Todd M. Austin Memory Renaming: Fast, Early and Accurate Processing of Memory Communication . . . . . . . . . . . . . 357--380 David I. August and Wen-mei W. Hwu and Scott A. Mahlke The Partial Reverse If-Conversion Framework for Balancing Control Flow and Predication . . . . . . . . . . . . . . 381--423
Thomas Conte and Wen-mei Hwu and Mark Smotherman Editors' Introduction . . . . . . . . . 425--426 Andreas Moshovos and Gurindar S. Sohi Speculative Memory Cloaking and Bypassing . . . . . . . . . . . . . . . 427--456 Darko Kirovski and Johnson Kin and William H. Mangione-Smith Procedure Based Program Compression . . 457--475 Jack L. Lo and Susan J. Eggers and Henry M. Levy and Sujay S. Parekh and Dean M. Tullsen Tuning Compiler Optimizations for Simultaneous Multithreading . . . . . . 477--503
R. Govindarajan and N. S. S. Narasimha Rao and E. R. Altman and Guang R. Gao Enhanced Co-Scheduling: a Software Pipelining Method Using Modulo-Scheduled Pipeline Theory . . . . . . . . . . . . 1--46 Vincent Loechner and Catherine Mongenet Communication Optimization for Affine Recurrence Equations Using Broadcast and Locality . . . . . . . . . . . . . . . . 47--102 Marc Daumas and Paraskevas Evripidou Parallel Implementations of the Selection Problem: a Case Study . . . . 103--131
Anonymous Guest Editor's Introduction . . . . . . 133--134 Kazuaki Ishizaki and Hideaki Komatsu and Toshio Nakatani A Loop Transformation Algorithm for Communication Overlapping . . . . . . . 135--154 Naoshi Uchihira and Hideji Kawata and Fumitaka Tamura Scenario-Based Hypersequential Programming . . . . . . . . . . . . . . 155--157 Hironori Nakajo and Akihiro Ichikawa and Yukio Kaneda A Distributed Shared-Memory System on a Workstation Cluster Using Fast Serial Links . . . . . . . . . . . . . . . . . 179--194 Hideki Saito and Nicholas J. Stavrakos and Constantine D. Polychronopoulos and others The Design of the PROMIS Compiler-Towards Multi-Level Parallelization . . . . . . . . . . . . 195--212
Denis Barthou and Albert Cohen and Jean-François Collard Maximal Static Expansion . . . . . . . . 213--243 David K. Lowenthal Accurately Selecting Block Size at Runtime in Pipelined Parallel Programs 245--274 Ramiro Varela Arias and Camino Rodríguez Vela and Jorge Puente Peinador and Cesar Alonso Gonzalez Parallel Logic Programming for Problem Solving . . . . . . . . . . . . . . . . 275--319
Anonymous Introduction . . . . . . . . . . . . . . 321--323 Erven Rohou and François Bodin and Christine Eisenbeis and Andre Seznec Handling Global Constraints in Compiler Strategy . . . . . . . . . . . . . . . . 325--345 Andreas Krall and Sylvain Lelait Compilation Techniques for Multimedia Processors . . . . . . . . . . . . . . . 347--361 N. Sreraman and R. Govindarajan A Vectorizing Compiler for Multimedia Extensions . . . . . . . . . . . . . . . 363--400 Henk Corporaal and Johan Janssen and Marnix Arnold Computation in the Context of Transport Triggered Architectures . . . . . . . . 401--427
Anonymous Introduction . . . . . . . . . . . . . . 429--430 Wolfram Amme and Peter Braun and François Thomasset and Eberhard Zehendner Data Dependence Analysis of Assembly Code . . . . . . . . . . . . . . . . . . 431--467 Fabien Quillere and Sanjay Rajopadhye and Doran Wilde Generation of Efficient Nested Loops from Polyhedra . . . . . . . . . . . . . 469--498 Alain Darte and Guillaume Huard Loop Shifting for Loop Compaction . . . 499--534
Paraskevas Evripidou Introduction . . . . . . . . . . . . . . 535--536 Manish Gupta and Sayak Mukhopadhyay and Navin Sinha Automatic Parallelization of Recursive Procedures . . . . . . . . . . . . . . . 537--562 Lori Carter and Beth Simon and Brad Calder and Larry Carter and Jeanne Ferrante Path Analysis and Renaming for Predicated Instruction Scheduling . . . 563--588 Peng Wu and David Padua Containers on the Parallelization of General-Purpose Java Programs . . . . . 589--605 Martin Griebl and Paul Feautrier and Christian Lengauer Index Set Splitting . . . . . . . . . . 607--631
Anonymous Introduction . . . . . . . . . . . . . . 1--2 Venkata Krishnan and Josep Torrellas The Need for Fast Communication in Hardware-Based Speculative Chip Multiprocessors . . . . . . . . . . . . 3--33 Pierre Michaud and André Seznec and Stéphan Jourdan An Exploration of Instruction Fetch Requirement in Out-of-Order Superscalar Processors . . . . . . . . . . . . . . . 35--58 Ramon Canal and Joan-Manuel Parcerisa and Antonio González Dynamic Code Partitioning for Clustered Architectures . . . . . . . . . . . . . 59--79 Artur Klauser and Srilatha Manne and Dirk Grunwald Selective Branch Inversion: Confidence Estimation for Branch Predictors . . . . 81--110
Matthew Arnold and Michael Hsiao and Ulrich Kremer and Barbara G. Ryder Exploring the Interaction between Java's Implicitly Thrown Exceptions and Instruction Scheduling . . . . . . . . . 111--137 Dhruva R. Chakrabarti and Prithviraj Banerjee Static Single Assignment Form for Message-Passing Programs . . . . . . . . 139--184 Jay P. Hoeflinger and Yunheung Paek and Kwang Yi Unified Interprocedural Parallelism Detection . . . . . . . . . . . . . . . 185--215
John Mellor-Crummey and David Whalley and Ken Kennedy Improving Memory Hierarchy Performance for Irregular Applications Using Data and Computation Reorderings . . . . . . 217--247 Dimitrios S. Nikolopoulos and Theodore S. Papatheodorou The Architectural and Operating System Implications on the Performance of Synchronization on ccNUMA Multiprocessors . . . . . . . . . . . . 249--282 Hongzhang Shan and Jaswinder Pal Singh A Comparison of MPI, SHMEM and Cache-Coherent Shared Address Space Programming Models on a Tightly-Coupled Multiprocessors . . . . . . . . . . . . 283--318 Induprakas Kodukula and Keshav Pingali Data-Centric Transformations for Locality Enhancement . . . . . . . . . . 319--364
Mayez Al-Mouhamed and Hussam Abu-Haimed Evaluation of Neural and Genetic Algorithms for Synthesizing Parallel Storage Schemes . . . . . . . . . . . . 365--399 Raju Pandey and James C. Browne Support for Implementation of Evolutionary Concurrent Systems . . . . 401--431 Isabelle Attali and Denis Caromel and Yung-Syau Chen and Jean-Luc Gaudiot and Andrew L. Wendelborn Enhancing Functional and Irregular Parallelism: Stateful Functions and their Semantics . . . . . . . . . . . . 433--460
Alex Veidenbaum Guest Editor's Introduction . . . . . . 461--462 Ken Kennedy Fast Greedy Weighted Fusion . . . . . . 463--491 Nawaaz Ahmed and Nikolay Mateev and Keshav Pingali Synthesizing Transformations for Locality Enhancement of Imperfectly-Nested Loop Nests . . . . . 493--544 Vivek Sarkar Optimized Unrolling of Nested Loops . . 545--581
Yosi Ben-Asher and Dimitry Podvolny Y-Invalidate: a New Protocol for Implementing Weak Consistency in DSM Systems . . . . . . . . . . . . . . . . 583--606 Inbum Jung and Jongwoong Hyun and Joonwon Lee and Joongsoo Ma Two-Phase Barrier: a Synchronization Primitive for Improving the Processor Utilization . . . . . . . . . . . . . . 607--627
Tracy D. Braun and Renard Ulrey and Anthony A. Maciejewski and Howard Jay Siegel Parallel Approaches for Singular Value Decomposition as Applied to Robotic Manipulator Jacobians . . . . . . . . . 1--35 Francisco Corbera and Rafael Asenjo and Emilio Zapata New Shape Analysis and Interprocedural Techniques for Automatic Parallelization of C Codes . . . . . . . . . . . . . . . 37--63
Aart J. C. Bik and Milind Girkar and Paul M. Grey and Xinmin Tian Automatic Intra-Register Vectorization for the Intel® Architecture . . . . . 65--98 Jose M. Mantas Ruiz and Julio Ortega Lopera and Jose A. Carrillo de la Plata Component-Based Derivation of a Parallel Stiff ODE Solver Implemented in a Cluster of Computers . . . . . . . . . . 99--148
Dragan Milicev and Zoran Jovanovic Control Flow Regeneration for Software Pipelined Loops with Conditions . . . . 149--179 David Wonnacott Achieving Scalable Locality with Time Skewing . . . . . . . . . . . . . . . . 181--221
Alex Veidenbaum Guest Editor's Introduction . . . . . . 223--224 Dimitrios S. Nikolopoulos and Eduard Ayguadé and Constantine D. Polychronopoulos Runtime vs. Manual Data Distribution for Architecture-Agnostic Shared-Memory Programming Models . . . . . . . . . . . 225--255 Pramod G. Joisha and Samuel P. Midkiff and Mauricio J. Serrano and Manish Gupta Efficiently Adapting Java Binaries in Limited Memory Contexts . . . . . . . . 257--289 Arun Chauhan and Ken Kennedy Reducing and Vectorizing Procedures for Telescoping Languages . . . . . . . . . 291--315 George S. Almasi and Călin Caşcaval and José G. Castaños and Monty Denneau and Wilm Donath and Maria Eleftheriou and Mark Giampapa and Howard Ho and Derek Lieber and José E. Moreira and Dennis Newns and Marc Snir and Henry S. Warren, Jr. Demonstrating the Scalability of a Molecular Dynamics Application on a Petaflops Computer . . . . . . . . . . . 317--351
Krishna M. Kavi and Alireza Moshtaghi and Deng-jyi Chen Modeling Multithreaded Applications Using Petri Nets . . . . . . . . . . . . 353--371 Alex Ramirez and Josep Ll. Larriba-Pey and Carlos Navarro and Mateo Valero and Josep Torrellas Software Trace Cache for Commercial Applications . . . . . . . . . . . . . . 373--395
Ivan D. Baev and Waleed M. Meleis and Santosh G. Abraham Backtracking-Based Instruction Scheduling to Fill Branch Delay Slots 397--418 Paola Favati and Grazia Lotti and Ornella Menchi and Francesco Romani Railway Computation for Infinite Linear Systems . . . . . . . . . . . . . . . . 419--439
Kazuki Joe Guest Editor's Introduction . . . . . . 1--2 Siegfried Benkner and Viera Sipkova Exploiting Distributed-Memory and Shared-Memory Parallelism on Clusters of SMPs with Data Parallel Programs . . . . 3--19 Minsoo Jeon and Dongseung Kim Parallel Merge Sort with Load Balancing 21--33 J. Davison de St.Germain and Alan Morris and Steven G. Parker and Allen D. Malony and Sameer Shende Performance Analysis Integration in the Uintah Software Development Cycle . . . 35--53 Takeshi Iwashita and Masaaki Shimasaki Block Red-Black Ordering: a New Ordering Strategy for Parallelization of ICCG Method . . . . . . . . . . . . . . . . . 55--75
Alfredo Cristobal-Salas and Andrei Tchernykh and Jean-Luc Gaudiot and Wen-Yen Lin Non-Strict Execution in Parallel and Distributed Computing . . . . . . . . . 77--105 Patricio Bulić and Veselko Guštin An Extended ANSI C for Processors with a Multimedia Extension . . . . . . . . . . 107--136 Zhijian Lu and John Lach and Mircea R. Stan and Kevin Skadron Alloyed Branch History: Combining Global and Local Branch History for Robust Performance . . . . . . . . . . . . . . 137--177
Anonymous Erratum . . . . . . . . . . . . . . . . 179--179 Eduard Ayguadé Guest Editor's Introduction . . . . . . 181--183 Daisuke Takahashi and Mitsuhisa Sato and Taisuke Boku Performance Evaluation of the Hitachi SR8000 Using SPEC OMP2001 Benchmarks . . 185--196 Hideki Saito and Greg Gaertner and Wesley Jones and Rudolf Eigenmann and Hidetoshi Iwashita and Ron Lieberman and Matthijs van Waveren and Brian Whitney Large System Performance of SPEC OMP Benchmark Suites . . . . . . . . . . . . 197--209 Hirofumi Nakano and Kazuhisa Ishizaka and Motoki Obata and Keiji Kimura and Hironori Kasahara Static Coarse Grain Task Scheduling with Cache Optimization Using OpenMP . . . . 211--223 Seung-Jai Min and Ayon Basumallik and Rudolf Eigenmann Optimizing OpenMP Programs on Software Distributed Shared Memory Systems . . . 225--249
Silvius Rus and Lawrence Rauchwerger and Jay Hoeflinger Hybrid Analysis: Static & Dynamic Memory Reference Analysis . . . . . . . . . . . 251--283 Richard L. Graham and Sung-Eun Choi and David J. Daniel and Nehal N. Desai and Ronald G. Minnich and Craig E. Rasmussen and L. Dean Risinger and Mitchel W. Sukalski A Network-Failure-Tolerant Message-Passing System for Terascale Clusters . . . . . . . . . . . . . . . . 285--303 Venkata K. Pingali and Sally A. McKee and Wilson C. Hsieh and John B. Carter Restructuring Computations for Temporal Data Cache Locality . . . . . . . . . . 305--338
Han-Saem Yun and Jihong Kim and Soo-Mook Moon Time Optimal Software Pipelining of Loops with Control Flows . . . . . . . . 339--391 Keqin Li On the Performance of Randomized Embedding of Reproduction Trees in Static Networks . . . . . . . . . . . . 393--406
Alex Orailoglu Guest Editor's Introduction . . . . . . 407--409 Kubilay Atasu and Laura Pozzi and Paolo Ienne Automatic Application-Specific Instruction-Set Extensions Under Microarchitectural Constraints . . . . . 411--428 Nathan Clark and Hongtao Zhong and Wilkin Tang and Scott Mahlke Automatic Design of Application Specific Instruction Set Extensions Through Dataflow Graph Exploration . . . . . . . 429--449 José L. Ayala and Alexander Veidenbaum and Marisa López-Vallejo Power-Aware Compilation for Register File Energy Reduction . . . . . . . . . 451--467 G. Surendra and S. Banerjee and S. K. Nandy On the Effectiveness of Flow Aggregation in Improving Instruction Reuse in Network Processing Applications . . . . 469--487 C. Kachris and N. Bourbakis and A. Dollas A Reconfigurable Logic-Based Processor for the SCAN Image and Video Encryption Algorithm . . . . . . . . . . . . . . . 489--506
Lei Pan and MingKin Lai and Koji Noguchi and Javid J. Huseynov and Lubomir F. Bic and Michael B. Dillencourt Distributed Parallel Computing Using Navigational Programming . . . . . . . . 1--37 Jongwook Woo and Jean-Luc Gaudiot and Andrew L. Wendelborn Alias Analysis in Java with Reference-Set Representation for High-Performance Computing . . . . . . . 39--76
N. P. Manoj and K. V. Manjunath and R. Govindarajan CAS-DSM: a Compiler Assisted Software Distributed Shared Memory . . . . . . . 77--122 Mayez Al-Mouhamed Array Organization in Parallel Memories 123--163
Utpal Banerjee Guest Editor's Introduction . . . . . . 165--166 Jiuxing Liu and Jiesheng Wu and Dhabaleswar K. Panda High Performance RDMA-Based MPI Implementation over InfiniBand . . . . . 167--198 Daniel Ortega and Mateo Valero and Eduard Ayguadé Dynamic Memory Instruction Bypassing . . 199--224 Ravi Rajwar and Alain Kägi and James R. Goodman Inferential Queueing and Speculative Push . . . . . . . . . . . . . . . . . . 225--258
Utpal Banerjee Guest Editor's Introduction . . . . . . 259--261 Julita Corbalan and Xavier Martorell and Jesus Labarta Page Migration with Dynamic Space-Sharing Scheduling Policies: The Case of the SGI O2000 . . . . . . . . . 263--288 Steven Carroll and Constantine Polychronopoulos A Framework for Incremental Extensible Compiler Construction . . . . . . . . . 289--316 Konstantinos Kyriakopoulos and Kleanthis Psarris Data Dependence Analysis Techniques for Increased Accuracy and Extracted Parallelism . . . . . . . . . . . . . . 317--359
Stavros Souravlas and Manos Roumeliotis A Pipeline Technique for Dynamic Data Transfer on a Multiprocessor Grid . . . 361--388 Hideya Iwasaki and Zhenjiang Hu A New Parallel Skeleton for General Accumulative Computations . . . . . . . 389--414 H. Sarojadevi and S. K. Nandy and S. Balakrishnan On the Correctness of Program Execution When Cache Coherence Is Maintained Locally at Data-Sharing Boundaries in Distributed Shared Memory Multiprocessors . . . . . . . . . . . . 415--446
Javier Zalamea and Josep Llosa and Eduard Ayguadé and Mateo Valero Software and Hardware Techniques to Optimize Register File Utilization in VLIW Architectures . . . . . . . . . . . 447--474 Virgil Palanciuc and Dragos Badea A Spill Code Minimization Technique-Application in the Metrowerks StarCore C Compiler . . . . . . . . . . 475--499 Vijay Menon and Keshav Pingali Look Left, Look Right, Look Left Again: an Application of Fractal Symbolic Analysis to Linear Algebra Code Restructuring . . . . . . . . . . . . . 501--523
Yonghong Song and Cheng Wang and Zhiyuan Li A Polynomial-Time Algorithm for Memory Space Reduction . . . . . . . . . . . . 1--33 Eric Hung-Yu Tseng and Jean-Luc Gaudiot Automatic Array Partitioning Based on the Smith Normal Form . . . . . . . . . 35--56 Mo Zeyao Concatenation Algorithms for Parallel Numerical Simulation of Radiation Hydrodynamics coupled with Neutron Transport . . . . . . . . . . . . . . . 57--71
Frederica Darema The Next Generation Software Program . . 73--79 David I. August and Sharad Malik and Li-Shiuan Peh and Vijay Pai and Manish Vachharajani and Paul Willmann Achieving Structural and Composable Modeling of Complex Systems . . . . . . 81--101 Naveen Kumar and Bruce R. Childers and Daniel Williams and Jack W. Davidson and Mary Lou Soffa Compile-Time Planning for Overhead Reduction in Software Dynamic Translators . . . . . . . . . . . . . . 103--114 Shobana Padmanabhan and Phillip Jones and David V. Schuehler and Scott J. Friedman and Praveen Krishnamurthy and Huakai Zhang and Roger Chamberlain and Ron K. Cytron and Jason Fritts and John W. Lockwood Extracting and Improving Microarchitecture Performance on Reconfigurable Architectures . . . . . . 115--136 Victor Eijkhout and Erika Fuentes and Thomas Eidson and Jack Dongarra The Component Structure of a Self-Adapting Numerical Software System 137--143 Douglas Gregor and Jaakko Järvi and Mayuresh Kulkarni and Andrew Lumsdaine and David Musser and Sibylle Schupp Generic Programming and High-Performance Libraries . . . . . . . . . . . . . . . 145--164 Yoon-Ju Lee and Pedro C. Diniz and Mary W. Hall and Robert Lucas Empirical Optimization for a Sparse Linear Solver: a Case Study . . . . . . 165--181 Gengbin Zheng and Terry Wilmarth and Praveen Jagadishprasad and Laxmikant V. Kalé Simulation-Based Performance Prediction for Large Parallel Machines . . . . . . 183--207 F. Berman and H. Casanova and A. Chien and K. Cooper and H. Dail and A. Dasgupta and W. Deng and J. Dongarra and L. Johnsson and K. Kennedy and C. Koelbel and B. Liu and X. Liu and A. Mandal and G. Marin and M. Mazina and J. Mellor-Crummey and C. Mendes and A. Olugbile and M. Patel and D. Reed and Z. Shi and O. Sievert and H. Xia and A. YarKhan New Grid Scheduling and Rescheduling Methods in the GrADS Project . . . . . . 209--229 J. Eliot B. Moss and Trek Palmer and Timothy Richards and Edward K. Walters and Charles C. Weems CISL: a Class-Based Machine Description Language for Co-Generation of Compilers and Simulators . . . . . . . . . . . . . 231--246
Ravi Iyer and Jack Perdue and Lawrence Rauchwerger and Nancy M. Amato and Laxmi Bhuyan An Experimental Evaluation of the HP V-Class and SGI Origin 2000 Multiprocessors using Microbenchmarks and Scientific Applications . . . . . . 307--350 Chao Lin and Jang-Ping Sheu Efficient Broadcast in Heterogeneous Networks of Workstations Using Two Sub-Networks . . . . . . . . . . . . . . 351--391 Sid-Ahmed-Ali Touati Register Saturation in Instruction Level Parallelism . . . . . . . . . . . . . . 393--449
Jean-Luc Gaudiot and Siang Wun Song Message from the Guest Editors . . . . . 451--452 Rodolfo Azevedo and Sandro Rigo and Marcus Bartholomeu and Guido Araujo and Cristiano Araujo and Edna Barros The ArchC Architecture Description Language and Tools . . . . . . . . . . . 453--484 Debora R. Roberti and Roberto P. Souto and Haroldo F. Campos Velho and Gervasio A. Degrazia and Domenico Anfossi Parallel Implementation of a Lagrangian Stochastic Model for Pollutant Dispersion . . . . . . . . . . . . . . . 485--498 Edson Toshimi Midorikawa and Helio Marci Oliveira and Jean Marcos Laine PEMPIs: a New Methodology for Modeling and Prediction of MPI Programs Performance . . . . . . . . . . . . . . 499--527 Onur Mutlu and Hyesoon Kim and David N. Armstrong and Yale N. Patt Using the First-Level Caches as Filters to Reduce the Pollution Caused by Speculative Memory References . . . . . 529--559 Yue Luo and Lizy K. John and Lieven Eeckhout SMA: a Self-Monitored Adaptive Cache Warm-Up Scheme for Microprocessor Simulation . . . . . . . . . . . . . . . 561--581
Franco Fummi and Ian G. Harris Editorial . . . . . . . . . . . . . . . 583--584 Mirko Loghi and Tiziana Margaria and Graziano Pravadelli and Bernhard Steffen Dynamic and Formal Verification of Embedded Systems: a Comparative Survey 585--611 Jean-Pierre Talpin and Paul Le Guernic and Sandeep Kumar Shukla and Rajesh Gupta A Compositional Behavioral Modeling Framework for Embedded System Design and Conformance Checking . . . . . . . . . . 613--643 Alfred Koelbl and Carl Pixley Constructing Efficient Formal Models from High-Level Descriptions Using Symbolic Simulation . . . . . . . . . . 645--666 Francesco Bruschi and Fabrizio Ferrandi and Donatella Sciuto A Framework for the Functional Verification of SystemC Models . . . . . 667--695 Iñigo Ugarte and Pablo Sanchez Verification of Embedded Systems Based on Interval Analysis . . . . . . . . . . 697--720
Ian G. Harris and Franco Fummi Guest Editor's Introduction . . . . . . 1--2 Xi Chen and Harry Hsieh and Felice Balarin Verification Approach of Metropolis Design Framework for Embedded Systems 3--27 Samar Abdi and Daniel Gajski Verification of System Level Model Transformations . . . . . . . . . . . . 29--59 David Currie and Xiushan Feng and Masahiro Fujita and Alan J. Hu and Mark Kwan and Sreeranga Rajan Embedded Software Verification Using Symbolic Execution and Uninterpreted Functions . . . . . . . . . . . . . . . 61--91 Ernesto Sánchez and Matteo Sonza Reorda and Giovanni Squillero Efficient Techniques for Automatic Verification-Oriented Test Set Optimization . . . . . . . . . . . . . . 93--109
Bilha Mendelson and Shlomit S. Pinter and Ayal Zaks Introduction . . . . . . . . . . . . . . 111--112 Michael Factor and Assaf Schuster and Konstantin Shagin A Platform-Independent Distributed Runtime for Standard Multithreaded Java 113--142 Gregory Chockler and Dahlia Malkhi Light-Weight Leases for Storage-Centric Coordination . . . . . . . . . . . . . . 143--170 Alexander Gendler and Avi Mendelson and Yitzhak Birk A PAB-Based Multi-Prefetcher Mechanism 171--188
Chris Jesshope and Alex Shafarenko Special issue on Micro-grids --- Guest Editor Introduction . . . . . . . . . . 189--192 Carmen Martínez and Enrique Vallejo and Ramón Beivide and Cruz Izu and Miquel Moretó Dense Gaussian Networks: Suitable Topologies for On-Chip Multiprocessors 193--211 Pedro Trancoso and Paraskevas Evripidou and Kyriakos Stavrou and Costas Kyriacou A Case for Chip Multiprocessors Based on the Data-Driven Multithreading Model . . 213--235 Asadollah Shahbahrami and Ben Juurlink and Demid Borodin and Stamatis Vassiliadis Avoiding Conversion and Rearrangement Overhead in SIMD Architectures . . . . . 237--260 Sylvain Girbal and Nicolas Vasilache and Cédric Bastoul and Albert Cohen and David Parello and Marc Sigler and Olivier Temam Semi-Automatic Composition of Loop Transformations for Deep Parallelism and Memory Hierarchies . . . . . . . . . . . 261--317
Chris Jesshope and Alex Shafarenko Guest Editor's Introduction: Part 2 . . . 319--322 Gajinder Panesar and Daniel Towner and Andrew Duller and Alan Gray and Will Robbins Deterministic Parallel Processing . . . 323--341 Ian Bell and Nabil Hasasneh and Chris Jesshope Supporting Microthread Scheduling and Synchronisation in CMPs . . . . . . . . 343--381 Clemens Grelck and Sven-Bodo Scholz SAC --- a Functional Array Language for Efficient Multi-threaded Execution . . . 383--427
Paraskevas Evripidou and George Samaras Metacomputing with Mobile Agents . . . . 429--458 Paul Feautrier Scalable and Structured Scheduling . . . 459--487
A. Aiello and M. Mango Furnari and A. Massarotti and S. Brandi and V. Caputo and V. Barone An Experimental Ontology Server for an Information Grid Environment . . . . . . 489--508 Ales Holobar and Milan Ojstersek and Damjan Zazula Distributed Jacobi Joint Diagonalization on Clusters of Personal Computers . . . 509--530
Rajani Pai and R. Govindarajan FEADS: a Framework for Exploring the Application Design Space on Network Processors . . . . . . . . . . . . . . . 1--31 Ender Özcan and Esin Onbasioglu Memetic Algorithms for Parallel Code Optimization . . . . . . . . . . . . . . 33--61 Chunhui Zhang and Fadi Kurdahi Reducing Off-Chip Memory Access via Stream-Conscious Tiling on Multimedia Applications . . . . . . . . . . . . . . 63--98
Tony Givargis Special Issue On Embedded Processors --- Guest Editor Introduction . . . . . . . 99--100 JoAnn M. Paul and Brett H. Meyer Amdahl's Law Revisited for Single Chip Systems . . . . . . . . . . . . . . . . 101--123 Sorin Manolache and Petru Eles and Zebo Peng Fault-aware Communication Mapping for NoCs with Guaranteed Latency . . . . . . 125--156 Peter Petrov and Alex Orailoglu Dynamic Tag Reduction for Low-Power Caches in Embedded Systems with Virtual Memory . . . . . . . . . . . . . . . . . 157--177
Sally A. McKee Guest Editor's Introduction . . . . . . 179--180 José E. Moreira and Valentina Salapura and George Almasi and Charles Archer and Ralph Bellofatto and Peter Bergner and Randy Bickford and Mathias Blumrich and José R. Brunheroto and Arthur A. Bright and Michael Brutman and José G. Castaños and Dong Chen and Paul Coteus and Paul Crumley and Sam Ellis and Thomas Engelsiepen and Alan Gara and Mark Giampapa and Tom Gooding and Shawn Hall and Ruud A. Haring and Roger Haskin and Philip Heidelberger and Dirk Hoenicke and Todd Inglett and Gerrard V. Kopcsay and Derek Lieber and David Limpert and Pat McCarthy and Mark Megerian and Mike Mundy and Martin Ohmacht and Jeff Parker and Rick A. Rand and Don Reed and Ramendra Sahoo and Alda Sanomiya and Richard Shok and Brian Smith and Gordon G. Stewart and Todd Takken and Pavlos Vranas and Brian Wallenfelt and Michael Blocksome and Joe Ratterman The Blue Gene/L Supercomputer: a Hardware and Software Story . . . . . . 181--206 Gregory L. Lee and Martin Schulz and Dong H. Ahn and Andrew Bernat and Bronis R. de Supinski and Steven Y. Ko and Barry Rountree Dynamic Binary Instrumentation and Data Aggregation on Large Scale Systems . . . 207--232 Michael Gschwind The Cell Broadband Engine: Exploiting Multiple Levels of Parallelism in a Chip Multiprocessor . . . . . . . . . . . . . 233--262 Samuel Williams and John Shalf and Leonid Oliker and Shoaib Kamil and Parry Husbands and Katherine Yelick Scientific Computing Kernels on the Cell Processor . . . . . . . . . . . . . . . 263--298 James Laudon and Lawrence Spracklen The Coming Wave of Multithreaded Chip Multiprocessors . . . . . . . . . . . . 299--330
Eduard Ayguadé and Matthias S. Mueller Special Issue on OpenMP --- Guest Editors' Introduction . . . . . . . . . 331--333 Greg Bronevetsky and Bronis R. de Supinski Complete Formal Specification of the OpenMP Memory Model . . . . . . . . . . 335--392 Alejandro Duran and Roger Ferrer and Juan José Costa and Marc Gonzàlez and Xavier Martorell and Eduard Ayguadé and Jesús Labarta A Proposal for Error Handling in OpenMP 393--416 Alan Morris and Allen D. Malony and Sameer S. Shende Supporting Nested OpenMP Parallelism in the TAU Performance System . . . . . . . 417--436
Eduard Ayguadé and Matthias S. Mueller Introduction . . . . . . . . . . . . . . 437--439 Russell Brown and Ilya Sharapov High-Scalability Parallelization of a Molecular Modeling Application: Performance and Productivity Comparison Between OpenMP and MPI Implementations 441--458 Dieter an Mey and Samuel Sarholz and Christian Terboven Nested Parallelization with OpenMP . . . 459--476 Markus Nordén and Henrik Löf and Jarmo Rantakokko and Sverker Holmgren Dynamic Data Migration for Structured AMR Solvers . . . . . . . . . . . . . . 477--491 Tien-Hsiung Weng and Ruey-Kuen Perng and Barbara Chapman OpenMP Implementation of SPICE3 Circuit Simulator . . . . . . . . . . . . . . . 493--505
Anup Gangwar and M. Balakrishnan and Preeti Ranjan Panda and Anshul Kumar Evaluation of Bus Based Interconnect Mechanisms in Clustered VLIW Architectures . . . . . . . . . . . . . 507--527 Issam W. Damaj Parallel Algorithms Development for Programmable Devices with Application from Cryptography . . . . . . . . . . . 529--572 Laurent Baduel and Françoise Baude and Denis Caromel Asynchronous Typed Object Groups for Grid Programming . . . . . . . . . . . . 573--614 Kento Emoto and Zhenjiang Hu and Kazuhiko Kakehi and Masato Takeichi A Compositional Framework for Developing Parallel Programs on Two-Dimensional Arrays . . . . . . . . . . . . . . . . . 615--658
Preeti Ranjan Panda Guest Editor Introduction: Special Issue on Multiprocessor-based Embedded Systems 1--2 Martino Ruggiero and Alessio Guerri and Davide Bertozzi and Michela Milano and Luca Benini A Fast and Accurate Technique for Mapping Parallel Applications on Stream-Oriented MPSoC Platforms with Communication Awareness . . . . . . . . 3--36 Traian Pop and Paul Pop and Petru Eles and Zebo Peng Analysis and Optimisation of Hierarchically Scheduled Multiprocessor Embedded Systems . . . . . . . . . . . . 37--67 Lobna Kriaa and Aimen Bouchhima and Marius Gligor and Anne-Marie Fouillart and Frédéric Pétrot and Ahmed-Amine Jerraya Parallel Programming of Multi-processor SoC: a HW--SW Interface Perspective . . 68--92 Ilya Issenin and Nikil Dutt Using FORAY Models to Enable MPSoC Memory Optimizations . . . . . . . . . . 93--113 Mohammad Abdullah Al Faruque and Jörg Henkel QoS-supported On-chip Communication for Multi-processors . . . . . . . . . . . . 114--139 Seng Lin Shee and Andrea Erdos and Sri Parameswaran Architectural Exploration of Heterogeneous Multiprocessor Systems for JPEG . . . . . . . . . . . . . . . . . . 140--162
Alberto F. De Souza and Rajkumar Buyya Introduction to the Special Issue on the 18th International Symposium on Computer Architecture and High Performance Computing . . . . . . . . . . . . . . . 163--165 Fredrik Warg and Per Stenstrom Dual-thread Speculation: a Simple Approach to Uncover Thread-level Parallelism on a Simultaneous Multithreaded Processor . . . . . . . . 166--183 Peter A. Rounce and Alberto F. De Souza Dynamic Instruction Scheduling in a Trace-based Multi-threaded Architecture 184--205 Wessam M. Hassanein and Layali K. Rashid and Moustafa A. Hammad Analyzing the Effects of Hyperthreading on the Performance of Data Management Systems . . . . . . . . . . . . . . . . 206--225 Renata Braga Araújo and Guilherme Henrique Trielli Ferreira and Gustavo Henrique Orair and Wagner Meira and Renato Antônio Celso Ferreira and Dorgival Olavo Guedes Neto and Mohammed Javeed Zaki The ParTriCluster Algorithm for Gene Expression Analysis . . . . . . . . . . 226--249 George Teodoro and Tulio Tavares and Renato Ferreira and Tahsin Kurc and Wagner Meira and Dorgival Guedes and Tony Pan and Joel Saltz A Run-time System for Efficient Execution of Scientific Workflows on Distributed Environments . . . . . . . . 250--266 Gabriel H. Loh and Daniel A. Jiménez Modulo Path History for the Reduction of Pipeline Overheads in Path-based Neural Branch Predictors . . . . . . . . . . . 267--286
Guang R. Gao and Mitsuhisa Sato and Eduard Ayguadé Guest Editors' Introduction: Special Issue on OpenMP . . . . . . . . . . . . 287--288 Kevin O'Brien and Kathryn O'Brien and Zehra Sura and Tong Chen and Tao Zhang Supporting OpenMP on Cell . . . . . . . 289--311 Haoqiang Jin and Barbara Chapman and Lei Huang and Dieter an Mey and Thomas Reichstein Performance Evaluation of a Multi-Zone Application in Different OpenMP Approaches . . . . . . . . . . . . . . . 312--325 Milos Milovanović and Roger Ferrer and Vladimir Gajinov and Osman S. Unsal and Adrian Cristal and Eduard Ayguadé and Mateo Valero Nebelung: Execution Environment for Transactional OpenMP . . . . . . . . . . 326--346 Jie Tao and Marcel Kunze and Fabian Nowak and Rainer Buchty and Wolfgang Karl Performance Advantage of Reconfigurable Cache Design on Multicore Processor Systems . . . . . . . . . . . . . . . . 347--360
Dongsoo Kang and Chen Liu and Jean-Luc Gaudiot The Impact of Speculative Execution on SMT Processors . . . . . . . . . . . . . 361--385 K. Subramani and Kiran Yellajyosula On the Design and Implementation of a Shared Memory Dispatcher for Partially Clairvoyant Schedulers . . . . . . . . . 386--411 Mariana Luderitz Kolberg and Luiz Gustavo Fernandes and Dalcidio Moraes Claudio Dense Linear System: a Parallel Self-verified Solver . . . . . . . . . . 412--425 Ahmad Faraj and Pitch Patarasuk and Xin Yuan Bandwidth Efficient All-to-All Broadcast on Switched Clusters . . . . . . . . . . 426--453
Tony Givargis Guest Editor Introduction: Special Issue on Embedded Processors . . . . . . . . . 455--456 Praveen Kalla and X. Sharon Hu and Jörg Henkel A Flexible Framework for Communication Evaluation in SoC Design . . . . . . . . 457--477 Roman Lysecky Scalability and Parallel Execution of Warp Processing: Dynamic Hardware/Software Partitioning . . . . . 478--492 Zhi Guo and Betul Buyukkurt and John Cortes and Abhishek Mitra and Walid Najjar A Compiler Intermediate Representation for Reconfigurable Fabrics . . . . . . . 493--520
Hsiao-Hsi Wang and Kuan-Ching Li and Ssu-Hsuan Lu and Chun-Chieh Yang and Jean-Luc Gaudiot Design and Implementation of an Agent Home Scheme Strategy for Prefetch-Based DSM Systems . . . . . . . . . . . . . . 521--542 Ahmad Faraj and Pitch Patarasuk and Xin Yuan A Study of Process Arrival Patterns for MPI Collective Operations . . . . . . . 543--570 Aart J. C. Bik and David L. Kreitzer and Xinmin Tian A Case Study on Compiler Optimizations for the Intel® Core™ 2 Duo Processor . . . . . . . . . . . . . . . 571--591 H. L. A. van der Spek and S. Groot and E. M. Bakker and H. A. G. Wijshoff A Compile/Run-time Environment for the Automatic Transformation of Linked List Data Structures . . . . . . . . . . . . 592--623
Nicholas Carriero Guest Editor Introduction: Special Issue on High Performance Computing for High Productivity Environments . . . . . . . 1--2 Gaurav Sharma and Jos Martin MATLAB®: a Language for Parallel Computing . . . . . . . . . . . . . . . 3--36 Masatoshi Seki dRuby and Rinda: Implementation and Application of Distributed Ruby and its Parallel Coordination Mechanism . . . . 37--57 L. Anthony Drummond and Vicente Galiano and Violeta Migallón and Jose Penadés PyACTS: a Python Based Interface to ACTS Tools and Parallel Scientific Applications . . . . . . . . . . . . . . 58--77 Luke Tierney and A. J. Rossini and Na Li Snow: a Parallel Computing Framework for the R System . . . . . . . . . . . . . . 78--90 David E. Hudak and Neil Ludban and Ashok Krishnamurthy and Vijay Gadepally and Siddharth Samsi and others A Computational Science IDE for HPC Systems: Design and Applications . . . . 91--105 Robert D. Bjornson and Nicholas J. Carriero and Martin H. Schultz and Patrick M. Shields and Stephen B. Weston NetWorkSpace: a Coordination System for High-Productivity Environments . . . . . 106--125
Jun Cao and Ayush Goyal and Krista A. Novstrup and Samuel P. Midkiff and James M. Caruthers An Optimizing Compiler for Parallel Chemistry Simulations . . . . . . . . . 127--152 J. Miguel-Alonso and J. Navaridas and F. J. Ridruejo Interconnection Network Simulation Using Traces of MPI Applications . . . . . . . 153--174 Joahyoung Lee and Inbum Jung Recovery Strategies for Streaming Media Service in a Cluster-Based VOD Server with a Fault Node . . . . . . . . . . . 175--194 Athanasios I. Margaris Log File Formats for Parallel Applications: a Review . . . . . . . . . 195--222 Mohammad J. Rashti and Ahmad Afsahi A Speculative and Adaptive MPI Rendezvous Protocol Over RDMA-enabled Interconnects . . . . . . . . . . . . . 223--246
Rudolf Eigenmann and Eduard Ayguadé Guest Editors' Introduction . . . . . . 247--249 Greg Bronevetsky and John Gyllenhaal and Bronis R. de Supinski CLOMP: Accurately Characterizing OpenMP Application Overheads . . . . . . . . . 250--265 Karl Fürlinger and Shirley Moore Capturing and Analyzing the Execution Control Flow of OpenMP Applications . . 266--276 Tobias Hilbrich and Matthias S. Müller and Bettina Krammer MPI Correctness Checking for OpenMP/MPI Applications . . . . . . . . . . . . . . 277--291 Alejandro Duran and Roger Ferrer and Eduard Ayguadé and Rosa M. Badia and Jesus Labarta A Proposal to Extend the OpenMP Tasking Model with Dependent Tasks . . . . . . . 292--305 Morten S. Rasmussen and Matthias B. Stuart and Sven Karlsson Parallelism and Scalability in an Image Processing Application . . . . . . . . . 306--323 Pascal Vander-Swalmen and Gilles Dequen and Michaël Krajecki A Collaborative Approach for Multi-Threaded SAT Solving . . . . . . . 324--342
Prabhat Mishra Guest Editor Introduction: Special Issue on Nano/Bio-Inspired Applications and Architectures . . . . . . . . . . . . . 343--344 Jayram Moorkanikara Nageswaran and Andrew Felch and Ashok Chandrasekhar and Nikil Dutt and Richard Granger and others Brain Derived Vision Algorithm on High Performance Architectures . . . . . . . 345--369 Yang Zhao and Krishnendu Chakrabarty On-Line Testing of Lab-on-Chip Using Reconfigurable Digital-Microfluidic Compactors . . . . . . . . . . . . . . . 370--388 Scott Chilstedt and Chen Dong and Deming Chen Design and Evaluation of a Carbon Nanotube-Based Programmable Architecture 389--416 Michael DeBole and Ramakrishnan Krishnan and Varsha Balakrishnan and Wenping Wang and Hong Luo and others New-Age: a Negative Bias Temperature Instability-Estimation Framework for Microarchitectural Components . . . . . 417--431
Stéphane Genaud and Emmanuel Jeannot and Choopan Rattanapoka Fault-Management in P2P-MPI . . . . . . 433--461 Mohammad Reza Bonyadi and Mohsen Ebrahimi Moghaddam A Bipartite Genetic Algorithm for Multi-processor Task Scheduling . . . . 462--487 Guochun Shi and Volodymyr Kindratenko and Steven Gottlieb The Bottom-Up Implementation of One MILC Lattice QCD Application on the Cell Blade . . . . . . . . . . . . . . . . . 488--507 Chen Tian and Min Feng and Vijay Nagarajan and Rajiv Gupta Speculative Parallelization of Sequential Loops on Multicores . . . . . 508--535
Nadia Nedjah and Luiza de Macedo Mourelle High-Performance Hardware of the Sliding-Window Method for Parallel Computation of Modular Exponentiations 537--555 Steen Larsen and Parthasarathy Sarangam and Ram Huggahalli and Siddharth Kulkarni Architectural Breakdown of End-to-End Latency in a TCP/IP Network . . . . . . 556--571 Carolina Ribeiro Xavier and Rafael Sachetto Oliveira and Vinicius da Fonseca Vieira and Rodrigo Weber dos Santos and Wagner Meira Multi-Level Parallelism for the Cardiac Bidomain Equations . . . . . . . . . . . 572--592 Claudio Schepke and Nicolas Maillard and Philippe O. A. Navaux Parallel Lattice Boltzmann Method with Blocked Partitioning . . . . . . . . . . 593--611
Sven-Bodo Scholz and Alex Shafarenko Guest Editors' Editorial: Special Issue on the Second International Workshop on Microgrids . . . . . . . . . . . . . . . 1--3 Benedict R. Gaster and Tim Bainbridge and David Lacey and David Gardner Compilation Techniques for High Level Parallel Code . . . . . . . . . . . . . 4--18 Jan Haase and Andreas Hofmann and Klaus Waldschmidt A Self Distributing Virtual Machine for Adaptive Multicore Environments . . . . 19--37 Clemens Grelck and Sven-Bodo Scholz and Alex Shafarenko Asynchronous Stream Processing with S-Net . . . . . . . . . . . . . . . . . 38--67 Philip K. F. Hölzenspies and Timon D. ter Braak and Jan Kuper and Gerard J. M. Smit and Johann M. Hurink Run-time Spatial Mapping of Streaming Applications to Heterogeneous Multi-Processor Systems . . . . . . . . 68--83
Xiaobin Li and Jean-Luc Gaudiot Tolerating Radiation-Induced Transient Faults in Modern Processors . . . . . . 85--116 Chao Dong and Huijie Zhao and Wei Wang Parallel Nonnegative Matrix Factorization Algorithm on the Distributed Memory Platform . . . . . . 117--137 Nan Zhang Computing Optimised Parallel Speeded-Up Robust Features (P-SURF) on Multi-Core Processors . . . . . . . . . . . . . . . 138--158 Alexandros V. Gerbessiotis Parallel Option Price Valuations with the Explicit Finite Difference Method 159--182
Preeti Ranjan Panda and Rajendran Panda Guest Editorial: Special Issue on VLSI Design and Embedded Systems . . . . . . 183--184 Alexander Czutro and Ilia Polian and Matthew Lewis and Piet Engelke and Sudhakar M. Reddy and others Thread-Parallel Integrated Test Pattern Generator Utilizing Satisfiability Analysis . . . . . . . . . . . . . . . . 185--202 Tameesh Suri and Aneesh Aggarwal Improving Adaptability and Per-Core Performance of Many-Core Processors Through Reconfiguration . . . . . . . . 203--224 Unmesh D. Bordoloi and Samarjit Chakraborty GPU-based Acceleration of System-level Design Tasks . . . . . . . . . . . . . . 225--253 Reiley Jeyapaul and Aviral Shrivastava Code Transformations for TLB Power Reduction . . . . . . . . . . . . . . . 254--276 Sourav Roy H-NMRU: an Efficient Cache Replacement Policy with Low Area . . . . . . . . . . 277--287 Spyros Apostolakos and Apostolos Meliones and George Lykakis and Emmanuel Touloupis and Vassilis Vlagoulis Design, Implementation and Validation of an Open Source IP-PBX/VoIP Gateway Multi-Core SoC . . . . . . . . . . . . . 288--302 T. Kempf and S. Wallentowitz and G. Ascheid and R. Leupers and H. Meyr Analytical and Simulation-based Design Space Exploration of Software Defined Radios . . . . . . . . . . . . . . . . . 303--321 Vinay B. Y. Kumar and Siddharth Joshi and Sachin B. Patkar and H. Narayanan FPGA Based High Performance Double-Precision Matrix Multiplication 322--338
Matthias S. Müller and Eduard Ayguadé Guest Editors' Introduction . . . . . . 339--340 Stephen L. Olivier and Jan F. Prins Comparison of OpenMP 3.0 and Other Task Parallel Frameworks on Unbalanced Task Graphs . . . . . . . . . . . . . . . . . 341--360 Chunhua Liao and Daniel J. Quinlan and Jeremiah J. Willcock and Thomas Panas Semantic-Aware Automatic Parallelization of Modern Applications Using High-Level Abstractions . . . . . . . . . . . . . . 361--378 Paul Kapinos and Dieter an Mey Productivity and Performance Portability of the OpenMP 3.0 Tasking Concept When Applied to an Engineering Code Written in Fortran 95 . . . . . . . . . . . . . 379--395 J. Mark Bull and James Enright and Xu Guo and Chris Maynard and Fiona Reid Performance Evaluation of Mixed-Mode OpenMP/MPI Implementations . . . . . . . 396--417 François Broquedis and Nathalie Furmento and Brice Goglin and Pierre-André Wacrenier and Raymond Namyst ForestGOMP: an Efficient OpenMP Environment for NUMA Architectures . . . 418--439 Eduard Ayguadé and Rosa M. Badia and Pieter Bellens and Daniel Cabrera and Alejandro Duran and Roger Ferrer and Marc González and Francisco Igual and Daniel Jiménez-González and Jesús Labarta and Luis Martinell and Xavier Martorell and Rafael Mayo and Josep M. Pérez and Judit Planas and Enrique S. Quintana-Ortí Extending OpenMP to Survive the Heterogeneous Multi-Core Era . . . . . . 440--459
Valentina Salapura and José E. Moreira and Sally A. McKee Guest Editors' Introduction . . . . . . . 1--2 Daniele Paolo Scarpazza Top-Performance Tokenization and Small-Ruleset Regular Expression Matching: a Quantitative Performance Analysis and Optimization Study on the Cell/B.E. Processor . . . . . . . . . . 3--32 Arrvindh Shriraman and Sandhya Dwarkadas Analyzing Conflicts in Hardware-Supported Memory Transactions 33--61 Mehmet Belgin and Godmar Back and Calvin J. Ribbens A Library for Pattern-based Sparse Matrix Vector Multiply . . . . . . . . . 62--87 Rob V. van Nieuwpoort and John W. Romein Correlating Radio Astronomy Signals with Many-Core Hardware . . . . . . . . . . . 88--114 Jiayuan Meng and Kevin Skadron A Performance Study for Iterative Stencil Loops on GPUs with Ghost Zone Optimizations . . . . . . . . . . . . . 115--142
Ghada F. El Kabbany and Nayer M. Wanas and Nadia H. Hegazi and Samir I. Shaheen A Dynamic Load Balancing Framework for Real-time Applications in Message Passing Systems . . . . . . . . . . . . 143--182 K. A. Hawick and A. Leist and D. P. Playne Regular Lattice and Small-World Spin Model Simulations Using CUDA and GPUs 183--201 Simon Uzezi Ewedafe and Rio Hirowati Shariffudin Parallel Implementation of 2-D Telegraphic Equation on MPI/PVM Cluster 202--231 Nasser Giacaman and Oliver Sinnen Parallel Iterator for Parallelizing Object-Oriented Applications . . . . . . 232--269
Christian Fensch and Marcelo Cintra An Evaluation of an OS-Based Coherence Scheme for Tiled CMPs . . . . . . . . . 271--295 Grigori Fursin and Yuriy Kashnikov and Abdul Wahid Memon and Zbigniew Chamski and Olivier Temam and others Milepost GCC: Machine Learning Enabled Self-tuning Compiler . . . . . . . . . . 296--327 Arnaud Grasset and Philippe Millet and Philippe Bonnot and Sami Yehia and Wolfram Putzke-Roeming and others The MORPHEUS Heterogeneous Dynamically Reconfigurable Platform . . . . . . . . 328--356 R. Tornero and J. M. Orduña and A. Mejia and J. Flich and J. Duato A Communication-Driven Routing Technique for Application-Specific NoCs . . . . . 357--374 Enrique Vallejo and Sutirtha Sanyal and Tim Harris and Fernando Vallejo and Ramón Beivide and others Hybrid Transactional Memory with Pessimistic Concurrency Control . . . . 375--396 Harm Munk and Eduard Ayguadé and Cédric Bastoul and Paul Carpenter and Zbigniew Chamski and others ACOTES Project: Advanced Compiler Technologies for Embedded Streaming . . 397--450
Shaoshan Liu and Ligang Wang and Xiao-Feng Li and Jean-Luc Gaudiot Space-and-Time Efficient Parallel Garbage Collector for Data-Intensive Applications . . . . . . . . . . . . . . 451--472 Ying Qian and Ahmad Afsahi Process Arrival Pattern Aware Alltoall and Allgather on InfiniBand Clusters . . 473--493 L. Benini and R. Grottesi and S. Morigi and M. Ruggiero Parallel Rendering and Animation of Subdivision Surfaces on the Cell BE Processor . . . . . . . . . . . . . . . 494--521 Kush K. Kella and Aasia Khanum APCFS: Autonomous and Parallel Compressed File System . . . . . . . . . 522--532
Shaoshan Liu and Christine Eisenbeis and Jean-Luc Gaudiot Value Prediction and Speculative Execution on GPU . . . . . . . . . . . . 533--552 Ralf Hoffmann and Thomas Rauber Adaptive Task Pools: Efficiently Balancing Large Number of Tasks on Shared-address Spaces . . . . . . . . . 553--581 Can Ozturan and Dan Grigoras Guest Editorial: Parallel and Distributed Computing . . . . . . . . . 582--583 Anne Benoit and Hinde Lilia Bouziane and Yves Robert Optimizing the Reliability of Streaming Applications Under Throughput Constraints . . . . . . . . . . . . . . 584--614 George C. Caragea and Alexandros Tzannes and Fuat Keceli and Rajeev Barua and Uzi Vishkin Resource-Aware Compiler Prefetching for Fine-Grained Many-Cores . . . . . . . . 615--638 Alper Sen and Baris Aksanli and Murat Bozkurt Speeding Up Cycle Based Logic Simulation Using Graphics Processing Units . . . . 639--661
Yu-Min Lu and Peng-Sheng Chen Probabilistic Alias Analysis of Executable Code . . . . . . . . . . . . 663--693 Håkan Sundell Wait-Free Multi-Word Compare-and-Swap Using Greedy Helping and Grabbing . . . 694--716 Masroor Hussain and Muhammad Abid and Mushtaq Ahmad and Ashfaq Khokhar and Arif Masud A Parallel Implementation of ALE Moving Mesh Technique for FSI Problems using OpenMP . . . . . . . . . . . . . . . . . 717--745 Kayhan M. Imre and Cesur Baransel and Harun Artuner Efficient and Scalable Routing Algorithms for Collective Communication Operations on 2D All-Port Torus Networks . . . . . . . . . . . . . . . . 746--782 Brian Demsky Using Discrete Event Simulation to Analyze Contention Managers . . . . . . 783--808 Seçkin Sanci and Veysi Isler A Parallel Algorithm for UAV Flight Route Planning on GPU . . . . . . . . . 809--837
Valentina Salapura and Michael Gschwind and Jens Knoop Guest Editorial: Parallel Systems and Compilers . . . . . . . . . . . . . . . 1--3 I-Jui Sung and Nasser Anssari and John A. Stratton and Wen-Mei W. Hwu Data Layout Transformation Exploiting Memory-Level Parallelism in Structured Grid Many-Core Applications . . . . . . 4--24 Ferad Zyulkyarov and Srdjan Stipic and Tim Harris and Osman S. Unsal and Adrián Cristal and Ibrahim Hur and Mateo Valero Profiling and Optimizing Transactional Memory Applications . . . . . . . . . . 25--56 M. Awasthi and D. Nellans and K. Sudan and R. Balasubramonian and A. Davis Managing Data Placement in Memory Systems with Multiple Memory Controllers 57--83 Changhui Lin and Vijay Nagarajan and Rajiv Gupta Efficient Sequential Consistency Using Conditional Fences . . . . . . . . . . . 84--117 Yun Zhang and Jae W. Lee and Nick P. Johnson and David I. August DAFT: Decoupled Acyclic Fault Tolerance 118--140
Yan Huang and Jie Tang and Zhi-min Gu and Min Cai and Jianxun Zhang and Ninghan Zheng The Performance Optimization of Threaded Prefetching for Linked Data Structures 141--163 Jean-Claude Charr and Raphaël Couturier and David Laiymani Adaptation and Evaluation of the Multisplitting-Newton and Waveform Relaxation Methods Over Distributed Volatile Environments . . . . . . . . . 164--183 Mwaffaq Otoom and JoAnn M. Paul Workload Mode Identification for Chip Heterogeneous Multiprocessors . . . . . 184--224 Mohsen Ebrahimi Moghaddam and Mohammad Reza Bonyadi An Immune-based Genetic Algorithm with Reduced Search Space Coding for Multiprocessor Task Scheduling Problem 225--257
Wagner Meira and Ricardo Bianchini Special Issue on Computer Architecture and High-Performance Computing . . . . . 259--261 Ricardo Menotti and João M. P. Cardoso and Marcio M. Fernandes and Eduardo Marques LALP: a Language to Program Custom FPGA-Based Acceleration Engines . . . . 262--289 Jairo Panetta and Thiago Teixeira and Paulo R. P. de Souza Filho and Carlos A. da Cunha Filho and David Sotelo and Fernando M. Roxo da Motta and Silvio Sinedino Pinheiro and Andre L. Romanelli Rosa and Luiz R. Monnerat and Leandro T. Carneiro and Carlos H. B. de Albrecht Accelerating Time and Depth Seismic Migration by CPU and GPU Cooperation . . 290--312 Pedro Leite and João Marcelo Teixeira and Thiago Farias and Bernardo Reis and Veronica Teichrieb and Judith Kelner Nearest Neighbor Searches on the GPU: a Massively Parallel Approach for Dynamic Point Clouds . . . . . . . . . . . . . . 313--330 Artur Santos and João Marcelo Teixeira and Thiago Farias and Veronica Teichrieb and Judith Kelner Understanding the Efficiency of kD-tree Ray-Traversal Techniques over a GPGPU Architecture . . . . . . . . . . . . . . 331--352 Girish Venkatasubramanian and Renato J. Figueiredo and Ramesh Illikkal and Donald Newell TMT: a TLB Tag Management Framework for Virtualized Platforms . . . . . . . . . 353--380
Ákos Dudás and Sándor Juhász and Tamás Schrádi Software Controlled Adaptive Pre-Execution for Data Prefetching . . . 381--396 Giuliano Laccetti and Marco Lapegna and Valeria Mele and Diego Romano and Almerico Murli A Double Adaptive Algorithm for Multidimensional Integration on Multicore Based HPC Systems . . . . . . 397--409 Rohit Jalan and Arun Kejariwal Trin--Trin: Who's Calling? A Pin-Based Dynamic Call Graph Extraction Framework 410--442 John M. Neuberger and Nándor Sieben and James W. Swift An MPI Implementation of a Self-Submitting Parallel Job Queue . . . 443--464
Yan Huang and Zhi-Min Gu and Jie Tang and Min Cai and Jianxun Zhang and others Estimating Effective Prefetch Distance in Threaded Prefetching for Linked Data Structures . . . . . . . . . . . . . . . 465--487 Fadi Abboud and Yosi Ben-Asher and Yousef Shajrawi and Esti Stein Combining Height Reduction and Scheduling for VLIW Machines Enhanced with Three-Argument Arithmetic Operations . . . . . . . . . . . . . . . 488--513 Wai-Mee Ching and Da Zheng Automatic Parallelization of Array-oriented Programs for a Multi-core Machine . . . . . . . . . . . . . . . . 514--531 Joppe W. Bos Low-Latency Elliptic Curve Scalar Multiplication . . . . . . . . . . . . . 532--550
Hubertus Franke and Paul H. J. Kelly and Pedro Trancoso Guest Editorial: Computing Frontiers . . 551--552 Alexander D. Rast and Javier Navaridas and Xin Jin and Francesco Galluppi and Luis A. Plana and others Managing Burstiness and Scalability in Event-Driven Models on the SpiNNaker Neuromimetic System . . . . . . . . . . 553--582 Stamatis Kavadias and Manolis Katevenis and Michail Zampetakis and Dimitrios S. Nikolopoulos Cache-Integrated Network Interfaces: Flexible On-Chip Communication and Synchronization for Large-Scale CMPs . . 583--604 Yong Cao and Debprakash Patnaik and Sean Ponce and Jeremy Archuleta and Patrick Butler and others Parallel Mining of Neuronal Spike Streams on Graphics Processing Units . . 605--632 Vinod Tipparaju and Edoardo Apra and Weikuan Yu and Xinyu Que and Jeffrey S. Vetter Runtime Techniques to Enable a Highly-Scalable Global Address Space Model for Petascale Computing . . . . . 633--655
Mounira Bachir and Sid-Ahmed-Ali Touati and Frederic Brault and David Gregg and Albert Cohen Minimal Unroll Factor for Code Generation of Software Pipelining . . . 1--58 Shixun Zhang and Shinichi Yamagiwa and Masahiko Okumura and Seiji Yunoki Kernel Polynomial Method on GPU . . . . 59--88 Daniel Nicácio and Alexandro Baldassin and Guido Araújo Transaction Scheduling Using Dynamic Conflict Avoidance . . . . . . . . . . . 89--110 Khaled Hamidouche and Fernando Machado Mendonca and Joel Falcou and Alba Cristina Magalhaes Alves de Melo and Daniel Etiemble Parallel Smith--Waterman Comparison on Multicore and Manycore Computing Platforms with BSP++ . . . . . . . . . . 111--136 Junchang Wang and Kai Zhang and Xinan Tang and Bei Hua B-Queue: Efficient and Practical Queuing for Fast Core-to-Core Communication . . 137--159
John McAllister and Luigi Carro and Skevos Evripidou Guest Editorial: Special Issue on 2011 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS XI) . . . 161--162 David A. Penry and Kurtis D. Cahill ADL-Based Specification of Implementation Styles for Functional Simulators . . . . . . . . . . . . . . . 163--211 Oscar Almer and Igor Böhm and Tobias Edler von Koch and Björn Franke and Stephen Kyle and Volker Seeker and Christopher Thompson and Nigel Topham A Parallel Dynamic Binary Translator for Efficient Multi-Core Simulation . . . . 212--235 Tiago Dias and Sebastián López and Nuno Roma and Leonel Sousa Scalable Unified Transform Architecture for Advanced Video Coding Embedded Systems . . . . . . . . . . . . . . . . 236--260 Kenneth C. Rovers and Jan Kuper UniTi: Unified Composition and Time for Multi-domain Model-based Design . . . . 261--304 Karthik T. Sundararajan and Timothy M. Jones and Nigel P. Topham The Smart Cache: an Energy-Efficient Cache Architecture Through Dynamic Adaptation . . . . . . . . . . . . . . . 305--330 Stefan Langemeyer and Peter Pirsch and Holger Blume Using SDRAM Memories for High-Performance Accesses to Two-Dimensional Matrices Without Transpose . . . . . . . . . . . . . . . 331--354
Calin Cascaval and Pedro Trancoso and Viktor Prasanna Guest Editorial: Computing Frontiers . . 355--356 Alexander Heinecke and Dirk Pflüger Emerging Architectures Enable to Boost Massively Parallel Data Mining Using Adaptive Sparse Grids . . . . . . . . . 357--399 Chunyang Gou and Georgi N. Gaydadjiev Addressing GPU On-Chip Shared Memory Bank Conflicts Using Elastic Pipeline 400--429 Gianfranco Bilardi and Kattamuri Ekanadham and Pratap Pattnaik Efficient Stack Distance Computation for a Class of Priority Replacement Policies 430--468 Nawab Ali and Sriram Krishnamoorthy and Mahantesh Halappanavar and Jeff Daily Multi-Fault Tolerance for Cartesian Data Distributions . . . . . . . . . . . . . 469--493
Emanuel Vianna and Giovanni Comarela and Tatiana Pontes and Jussara Almeida and Virgílio Almeida and Kevin Wilkinson and Harumi Kuno and Umeshwar Dayal Analytical Performance Models for MapReduce Workloads . . . . . . . . . . 495--525 Yunho Oh and Doohwan Oh and Won W. Ro GPU-Friendly Parallel Genome Matching with Tiled Access and Reduced State Transition Table . . . . . . . . . . . . 526--551 Claudio Schepke and Nicolas Maillard and Joerg Schneider and Hans-Ulrich Heiss Online Mesh Refinement for Parallel Atmospheric Models . . . . . . . . . . . 552--569 Christopher Oßner and Klemens Böhm Graphs for Mining-Based Defect Localization in Multithreaded Programs 570--593
Bugra Gedik Auto-tuning Similarity Search Algorithms on Multi-core Architectures . . . . . . 595--620 Nasser Giacaman and Oliver Sinnen Parallel Task for Parallelising Object-Oriented Desktop Applications . . 621--681 Zheng Gu and Matthew Small and Xin Yuan and Aniruddha Marathe and David K. Lowenthal Protocol Customization for Improving MPI Performance on RDMA-Enabled Clusters . . 682--703 Eunjung Park and John Cavazos and Louis-Noël Pouchet and Cédric Bastoul and Albert Cohen and P. Sadayappan Predictive Modeling in a Polyhedral Optimization Space . . . . . . . . . . . 704--750
Rudi Eigenmann and Sam Midkiff Compiler Infrastructure . . . . . . . . 751--752 Hansang Bae and Dheya Mustafa and Jae-Woo Lee and Aurangzeb and Hao Lin and Chirag Dave and Rudolf Eigenmann and Samuel P. Midkiff The Cetus Source-to-Source Compiler Infrastructure: Overview and Evaluation 753--767 Yi Yang and Huiyang Zhou The Implementation of a High Performance GPGPU Compiler . . . . . . . . . . . . . 768--781 Gabriel Rodríguez and María J. Martín and Patricia González and Juan Touriño and Ramón Doallo Compiler-Assisted Checkpointing of Parallel Codes: The Cetus and LLVM Experience . . . . . . . . . . . . . . . 782--805 Amin Shafiee Sarvestani and Erik Hansson and Christoph Kessler Extensible Recognition of Algorithmic Patterns in DSP Programs for Automatic Parallelization . . . . . . . . . . . . 806--824 Barbara Chapman and Deepak Eachempati and Oscar Hernandez Experiences Developing the OpenUH Compiler and Runtime Infrastructure . . 825--854 Xipeng Shen and Yixun Liu and Eddy Z. Zhang and Poornima Bhamidipati An Infrastructure for Tackling Input-Sensitivity of GPU Program Optimizations . . . . . . . . . . . . . 855--869
Alba Melo and Jean-Luc Gaudiot and Luiz DeRose and Kunle Olukotun and Albert Zomaya Guest Editorial . . . . . . . . . . . . 1--3 Ana Avilés-González and Juan Piernas and Pilar González-Férez Scalable Metadata Management Through OSD+ Devices . . . . . . . . . . . . . . 4--29 Enqiang Sun and David Kaeli Aggressive Value Prediction on a GPU . . 30--48 Mouad Bahi and Christine Eisenbeis Impact of Reverse Computing on Information Locality in Register Allocation for High Performance Computing . . . . . . . . . . . . . . . 49--76 Joerg Schneider and Barry Linnert List-based Data Structures for Efficient Management of Advance Reservations . . . 77--93 Claudia Rosas and Anna Sikora and Josep Jorba and Andreu Moreno and Eduardo César Improving Performance on Data-Intensive Applications Using a Load Balancing Methodology Based on Divisible Load Theory . . . . . . . . . . . . . . . . . 94--118 Sasa Tomić and Adrián Cristal and Osman Unsal and Mateo Valero Using Dynamic Runtime Testing for Rapid Development of Architectural Simulators 119--139 Edson Borin and Guido Araujo and Mauricio Breternitz, Jr. and Youfeng Wu Microcode Compression Using Structured-Constrained Clustering . . . 140--164 Sarala Arunagiri and Yipkei Kwok and Patricia J. Teller and Ricardo A. Portillo and Seetharami R. Seelam FAIRIO: a Throughput-oriented Algorithm for Differentiated I/O Performance . . . 165--197 M. M. Waliullah and Per Stenstrom Removal of Conflicts in Hardware Transactional Memory Systems . . . . . . 198--218 Nam Ma and Yinglong Xia and Viktor K. Prasanna Data Parallel Implementation of Belief Propagation in Factor Graphs on Multi-core Platforms . . . . . . . . . . 219--237
Eduarda Monteiro and Bruno Vizzotto and Cláudio Diniz and Marilena Maule and Bruno Zatt and Sergio Bampi Parallelization of Full Search Motion Estimation Algorithm for Parallel and Distributed Platforms . . . . . . . . . 239--264 Gabriel P. Silva and Juliana Correa and Cristiana Bentes and Sergio Guedes and Mariela Gabioux The Experience in Designing and Evaluating the High Performance Cluster Netuno . . . . . . . . . . . . . . . . . 265--286 Mitja Bezensek and Borut Robic A Survey of Parallel and Distributed Algorithms for the Steiner Tree Problem 287--319 Johann Steinbrecher and Cesar J. Philippidis and Weijia Shang A Case Study of Implementing Supernode Transformations . . . . . . . . . . . . 320--342 John K. Holmen and David L. Foster Accelerating Single Iteration Performance of CUDA-Based 3D Reaction--Diffusion Simulations . . . . 343--363 John K. Holmen and David L. Foster Erratum to: Accelerating Single Iteration Performance of CUDA-Based 3D Reaction--Diffusion Simulations . . 364--364 Luís Fabrício Wanderley Góes and Christiane Pousa Ribeiro and Márcio Castro and Jean-François Méhaut and Murray Cole and Marcelo Cintra Automatic Skeleton-Driven Memory Affinity for Transactional Worklist Applications . . . . . . . . . . . . . . 365--382 Anonymous Editor's Note . . . . . . . . . . . . . 383--383 Changmin Lee and Won Woo Ro and Jean-Luc Gaudiot Boosting CUDA Applications with CPU--GPU Hybrid Computing . . . . . . . . . . . . 384--404
Jesus Carretero and Laurence T. Yang Parallel and Distributed Processing with Applications: Preface . . . . . . . . . 405--407 Jesús Cámara and Javier Cuenca and Domingo Giménez and Luis Pedro García and Antonio M. Vidal Empirical Installation of Linear Algebra Shared-Memory Subroutines for Auto-Tuning . . . . . . . . . . . . . . 408--434 Abdullah Kayi and Olivier Serres and Tarek El-Ghazawi Bandwidth Adaptive Cache Coherence Optimizations for Chip Multiprocessors 435--455 Yousun Ko and Minyoung Jung and Yo-Sub Han and Bernd Burgstaller A Speculative Parallel DFA Membership Test for Multicore, SIMD and Cloud Computing Environments . . . . . . . . . 456--489 Thomas Baumann and Michael Resch Parallel Parameter Identification in Industrial Biotechnology . . . . . . . . 490--504 Cheng Hua Li and Laurence T. Yang and Man Lin Parallel Training of an Improved Neural Network for Text Categorization . . . . 505--523
Gaetan Hains and Youry Khmelevsky Guest Editorial for High-level Parallel Programming and Applications . . . . . . 525--528 Alexandra Jimborean and Philippe Clauss and Jean-François Dollinger and Vincent Loechner and Juan Manuel Martinez Caamaño Dynamic and Speculative Polyhedral Parallelization Using Compiler-Generated Skeletons . . . . . . . . . . . . . . . 529--545 Kento Emoto and Kiminori Matsuzaki An Automatic Fusion Mechanism for Variable-Length List Skeletons in SkeTo 546--563 Christopher Brown and Marco Danelutto and Kevin Hammond and Peter Kilpatrick and Archibald Elliott Cost-Directed Refactoring for Parallel Erlang Programs . . . . . . . . . . . . 564--582 Mathias Bourgoin and Emmanuel Chailloux and Jean-Luc Lamotte Efficient Abstractions for GPGPU Programming . . . . . . . . . . . . . . 583--600 Michel Steuwer and Malte Friese and Sebastian Albers and Sergei Gorlatch Introducing and Implementing the Allpairs Skeleton for Programming Multi-GPU Systems . . . . . . . . . . . 601--618 A. N. Yzelman and R. H. Bisseling and D. Roose and K. Meerbergen MulticoreBSP for C: a High-Performance Library for Shared-Memory Parallel Programming . . . . . . . . . . . . . . 619--642 Nuno Gaspar and Ludovic Henrio and Eric Madelaine Bringing Coq into the World of GCM Distributed Applications . . . . . . . . 643--662 Stefano Chessa and Susanna Pelagatti and Nicoletta Triolo Engineering Energy Efficient Visual Sensor Network Applications Using Skeletons . . . . . . . . . . . . . . . 663--680
Pavel Krömer and Jan Platos and Václav Snásel Nature-Inspired Meta-Heuristics on Modern GPUs: State of the Art and Brief Survey of Selected Algorithms . . . . . 681--709 Ciprian Dobre and Fatos Xhafa Parallel Programming Paradigms and Frameworks in Big Data Era . . . . . . . 710--738 Fahimeh Ramezani and Jie Lu and Farookh Khadeer Hussain Task-Based System Load Balancing in Cloud Computing Using Particle Swarm Optimization . . . . . . . . . . . . . . 739--754 Ugo Fiore and Francesco Palmieri and Aniello Castiglione and Alfredo De Santis A Cluster-Based Data-Centric Model for Network-Aware Task Scheduling in Distributed Systems . . . . . . . . . . 755--775 Ibtehal Nafea and Muhammad Younas and Robert Holton and Irfan Awan A Priority-Based Admission Control Scheme for Commercial Web Servers . . . 776--797 Tomoya Enokido and Ailixier Aikebaier and Makoto Takizawa Energy-Efficient Redundant Execution of Processes in a Fault-Tolerant Cluster of Servers . . . . . . . . . . . . . . . . 798--819 Zia ur Rehman and Omar Khadeer Hussain and Farookh Khadeer Hussain Parallel Cloud Service Selection and Ranking Based on QoS History . . . . . . 820--852 Fei Song and Daochao Huang and Huachun Zhou and Hongke Zhang and Ilsun You An Optimization-Based Scheme for Efficient Virtual Machine Placement . . 853--872
Alex Nicolau Acknowledgment to Reviewers . . . . . . 873--874 Shin-Kai Chen and Cheng-Yu Hung and Ching-Chih Chen and Chih-Wei Liu Parallelizing Complex Streaming Applications on Distributed Scratchpad Memory Multicore Architecture . . . . . 875--899 Young-Joo Kim and Sejun Song and Yong-Kee Jun VORD: a Versatile On-the-fly Race Detection Tool in OpenMP Programs . . . 900--930 S. Sankaraiah and Lam Hai Shuan and C. Eswaran and Junaidi Abdullah Performance Optimization of Video Coding Process on Multi-Core Platform Using GOP Level Parallelism . . . . . . . . . . . 931--947 Carlos H. González and Basilio B. Fraguela An Algorithm Template for Domain-Based Parallel Irregular Algorithms . . . . . 948--967 Steffen Ernsting and Herbert Kuchen A Scalable Farm Skeleton for Hybrid Parallel and Distributed Programming . . 968--987 Bert Gijsbers and Clemens Grelck An Efficient Scalable Runtime System for Macro Data Flow Processing Using S-Net 988--1011 M. Aldinucci and S. Campa and M. Danelutto and P. Kilpatrick and M. Torquati Design patterns percolating to parallel programming framework implementation . . 1012--1031 Michal Czapiński and Chris Thompson and Stuart Barnes Reducing Communication Overhead in Multi-GPU Hybrid Solver for $2$D Laplace's Equation . . . . . . . . . . . 1032--1047
John McAllister and David Guevorkian and Hartwig Jeschke and Mihai Sima Guest Editorial: Special Issue on Embedded Computer Systems: Architectures, Modeling and Simulation 1--2 Teemu Nyländen and Jani Boutellier and Karri Nikunen and Jari Hannuksela and Olli Silvén Low-Power Reconfigurable Miniature Sensor Nodes for Condition Monitoring 3--23 Amine Anane and El Mostapha Aboulhamid A Transaction-Based Environment for System Modeling and Parallel Simulation 24--58 Georgios Keramidas and Chrysovalantis Datsios Revisiting Cache Resizing . . . . . . . 59--85 Daniel Baudisch and Klaus Schneider Evaluation of Speculation in Out-of-Order Execution of Synchronous Dataflow Networks . . . . . . . . . . . 86--129 Ricardo A. Velásquez and Pierre Michaud and André Seznec BADCO: Behavioral Application-Dependent Superscalar Core Models . . . . . . . . 130--157
Markus Metzger and Xinmin Tian and Walfred Tedeschi User-Guided Dynamic Data Race Detection 159--179 Jianxun Zhang and Zhimin Gu and Yan Huang and Ninghan Zheng and Xiaohan Hu Helper Thread Prefetching Control Framework on Chip Multi-processor . . . 180--202 I. Z. Reguly and M. B. Giles Finite Element Algorithms and Data Structures on Graphical Processing Units 203--239 Matthew Williamson and K. Subramani A Parallel Implementation for the Negative Cost Girth Problem . . . . . . 240--259 Zhendong Wu and Kai Lu and Xiaoping Wang and Xu Zhou Collaborative Technique for Concurrency Bug Detection . . . . . . . . . . . . . 260--285 Kshitij Mehta and Edgar Gabriel Multi-Threaded Parallel I/O for OpenMP Applications . . . . . . . . . . . . . . 286--309
Ching-Hsien Hsu and Xiaoming Li and Xuanhua Shi Network and Parallel Computing . . . . . 311--315 Quanqing Xu and Liang Zhao and Mingzhong Xiao and Anna Liu and Yafei Dai YuruBackup: a Space-Efficient and Highly Scalable Incremental Backup System in the Cloud . . . . . . . . . . . . . . . 316--338 Hui Huang and Ligang He and Xueguang Chen and Minghui Yu and Zhiwu Wang Automatic Composition of Heterogeneous Models Based on Semantic Web Services 339--358 Xiaowen Feng and Hai Jin and Ran Zheng and Lei Zhu and Weiqi Dai Accelerating Smith--Waterman Alignment of Species-Based Protein Sequences on GPU . . . . . . . . . . . . . . . . . . 359--380 Edwin Sha and Li Wang and Qingfeng Zhuge and Jun Zhang and Jing Liu Power Efficiency for Hardware/Software Partitioning with Time and Area Constraints on MPSoC . . . . . . . . . . 381--402 Hai Jin and Hanfeng Qin and Song Wu and Xuerong Guo CCAP: a Cache Contention-Aware Virtual Machine Placement Approach for HPC Cloud 403--420 Bernhard Egger and Erik Gustafsson and Changyeon Jo and Jeongseok Son Efficiently Restoring Virtual Machines 421--439 Feng Liang and Yunzhen Liu and Hai Liu and Shilong Ma and Bettina Schnor A Parallel Job Execution Time Estimation Approach Based on User Submission Patterns within Computational Grids . . 440--454 Xianming Zhong and Chengcheng Xiang and Miao Yu and Zhengwei Qi and Haibing Guan A Virtualization Based Monitoring System for Mini-intrusive Live Forensics . . . 455--471 Zhao Li and Yao Shen and Bin Yao and Minyi Guo OFScheduler: a Dynamic Network Optimizer for MapReduce in Heterogeneous Cluster 472--488 Kenn Slagter and Ching-Hsien Hsu and Yeh-Ching Chung An Adaptive and Memory Efficient Sampling Mechanism for Partitioning in MapReduce . . . . . . . . . . . . . . . 489--507 Songbin Liu and Xiaomeng Huang and Haohuan Fu and Guangwen Yang and Zhenya Song Data Reduction Analysis for Climate Data Sets . . . . . . . . . . . . . . . . . . 508--527 Hai Jin and Honglei Jiang and Shadi Ibrahim and Xiaofei Liao Inaccuracy in Private BitTorrent Measurements . . . . . . . . . . . . . . 528--547
Dheya Mustafa and Rudolf Eigenmann PETRA: Performance Evaluation Tool for Modern Parallelizing Compilers . . . . . 549--571 Steven Feldman and Pierre LaBorde and Damian Dechev A Wait-Free Multi-Word Compare-and-Swap Operation . . . . . . . . . . . . . . . 572--596 Tae-Hyuk Ahn and Adrian Sandu and Layne T. Watson and Clifford A. Shaffer and Yang Cao and William T. Baumann A Framework to Analyze the Performance of Load Balancing Schemes for Ensembles of Stochastic Simulations . . . . . . . 597--630 Ryma Mahfoudhi and Zaher Mahjoub and Wahid Nasri Parallel Communication-Avoiding Algorithm for Triangular Matrix Inversion on Homogeneous and Heterogeneous Platforms . . . . . . . . 631--655 Ali Jannesari Detection of High-Level Synchronization Anomalies in Parallel Programs . . . . . 656--678
Daniel Langr and Pavel Tvrdík and Ivan Simecek and Tomás Dytrych Downsampling Algorithms for Large Sparse Matrices . . . . . . . . . . . . . . . . 679--702 Alejandro Hidalgo-Paniagua and Miguel A. Vega-Rodríguez and Nieves Pavón and Joaquín Ferruz A Comparative Study of Parallel RANSAC Implementations in $3$D Space . . . . . 703--720 Deli Zhang and Brendan Lynch and Damian Dechev Queue-Based and Adaptive Lock Algorithms for Scalable Resource Allocation on Shared--Memory Multiprocessors . . . . . 721--751 Pekka Jääskeläinen and Carlos Sánchez de La Lama and Erik Schnetter and Kalle Raiskila and Jarmo Takala and Heikki Berg pocl: a Performance-Portable OpenCL Implementation . . . . . . . . . . . . . 752--785 María Botón-Fernández and Manuel Rodríguez-Pascual and Miguel A. Vega-Rodríguez and Francisco Prieto-Castrillo and Rafael Mayo-García A Comparative Analysis of Adaptive Solutions for Grid Environments . . . . 786--811 Jakub Nalepa and Miroslaw Blocho Co-operation in the Parallel Memetic Algorithm . . . . . . . . . . . . . . . 812--839 Slobodan Jelić and Sören Laue and Domagoj Matijević and Patrick Wijerama A Fast Parallel Implementation of a PTAS for Fractional Packing and Covering Linear Programs . . . . . . . . . . . . 840--875 Jose L. Jodra and Ibai Gurrutxaga and Javier Muguerza Efficient $3$D Transpositions in Graphics Processing Units . . . . . . . 876--891 Christopher Brown High-Level Heterogeneous and Hierarchical Parallel Systems (HLPGPU 2014) . . . . . . . . . . . . . . . . . 892--893 Ashkan Tousimojarad and Wim Vanderbauwhede Steal Locally, Share Globally . . . . . 894--917 Hector Ortega-Arranz and Yuri Torres and Arturo Gonzalez-Escribano and Diego R. Llanos Comprehensive Evaluation of a New GPU-based Approach to the Shortest Path Problem . . . . . . . . . . . . . . . . 918--938 Hector Ortega-Arranz and Yuri Torres and Arturo Gonzalez-Escribano and Diego R. Llanos TuCCompi: a Multi-layer Model for Distributed Heterogeneous Computing with Tuning Capabilities . . . . . . . . . . 939--960
Guido Araujo and Jean-Luc Gaudiot Guest Editorial: SBAC--PAD 2013 . . . . 961--964 Yun R. Qu and Shijie Zhou and Viktor K. Prasanna A Decomposition-Based Approach for Scalable Many-Field Packet Classification on Multi-core Processors 965--987 Karlo G. Lenzi and Felipe A. P. Figueiredo Fully Optimized Code Block Segmentation Algorithm for LTE--Advanced . . . . . . 988--1003 Martin Schreiber and Christoph Riesinger Invasive Compute Balancing for Applications with Shared and Hybrid Parallelization . . . . . . . . . . . . 1004--1027 Zifan Liu and Nahid Emad and Soufian Ben Amor PageRank Computation Using a Multiple Implicitly Restarted Arnoldi Method for Modeling Epidemic Spread . . . . . . . . 1028--1053 Guohong Li and Olivier Temam and Zhenyu Liu Cluster Cache Monitor: Leveraging the Proximity Data in CMP . . . . . . . . . 1054--1077 J. Lobeiras and M. Amor and R. Doallo BPLG: a Tuned Butterfly Processing Library for GPU Architectures . . . . . 1078--1102 Paul-Antoine Arras and Didier Fuin List Scheduling in Embedded Systems Under Memory Constraints . . . . . . . . 1103--1128 Bharat Sukhwani and Mathew Thoennes and Hong Min A Hardware/Software Approach for Database Query Acceleration with FPGAs 1129--1159 Gregorio Bernabé and Javier Cuenca An Autotuning Engine for the $3$D Fast Wavelet Transform on Clusters with Hybrid CPU + GPU Platforms . . . . . . . 1160--1191 Gong Su and Stephen Heisig The Scalability of Disjoint Data Structures on a New Hardware Transactional Memory System . . . . . . 1192--1217 George Michelogiannakis and Xiaoye S. Li Extending Summation Precision for Network Reduction Operations . . . . . . 1218--1243
Ching-Hsien Hsu and Valentina Salapura Network and Parallel Computing . . . . . 1--4 Chengcheng Yang and Peiquan Jin and Lihua Yue Efficient Buffer Management for Tree Indexes on Solid State Drives . . . . . 5--25 Ralph Duncan and Peder Jungck and Kenneth Ross Using Packet Processing Object Modules Interchangeably as Stand-Alone Programs or ``Multi-app'' Components . . . . . . 26--45 Mei-Ling Chiang and Bo-Wen Yu and Chi-Shian Shia Operating System Enhancement for Supporting Massively Multiplayer Online Games in a Server Cluster . . . . . . . 46--67 Xiaofei Liao and Rentong Guo and Danping Yu A Phase Behavior Aware Dynamic Cache Partitioning Scheme for CMPs . . . . . . 68--86 Byungjoo Kim and Jung Eun Lee and Young J. Kim GPU Accelerated Finding of Channels and Tunnels for a Protein Molecule . . . . . 87--108 Yulong Yu and Xubin He and He Guo and Yuxin Wang A Credit-Based Load-Balance-Aware CTA Scheduling Optimization Scheme in GPGPU 109--129 Xi Li and Anthony Ventresque and John Murphy SOC: Satisfaction-Oriented Virtual Machine Consolidation in Enterprise Data Centers . . . . . . . . . . . . . . . . 130--150 Yihua Ding and James Z. Wang and Pradip K. Srimani A Linear Time Self-stabilizing Algorithm for Minimal Weakly Connected Dominating Sets . . . . . . . . . . . . . . . . . . 151--162 Jian Cao and Qiang Li and Yuede Ji and Yukun He Detection of Forwarding-Based Malicious URLs in Online Social Networks . . . . . 163--180 Lizhi Peng and Bo Yang and Yuehui Chen Effectiveness of Statistical Features for Early Stage Internet Traffic Identification . . . . . . . . . . . . . 181--197 Zhaoxin Fan and Shuoying Chen and Li Zha A Text Clustering Approach of Chinese News Based on Neural Network Language Model . . . . . . . . . . . . . . . . . 198--206
Anonymous Editor's Note: Special Section on Data-Flow for Multicore . . . . . . . . 207--207 Sebastian Weis and Arne Garbade and Bernhard Fechner and Avi Mendelson and Roberto Giorgi and Theo Ungerer Architectural Support for Fault Tolerance in a Teradevice Dataflow System . . . . . . . . . . . . . . . . . 208--232 Dragos Sbîrlea and Jun Shirako and Ryan Newton and Vivek Sarkar SCnC: Efficient Unification of Streaming with Dynamic Task Parallelism . . . . . 233--256 Andreas Diavastos and Pedro Trancoso and Mikel Luján and Ian Watson Integrating Transactions into the Data-Driven Multi-threading Model Using the TFlux Platform . . . . . . . . . . . 257--277 Daniel Orozco and Elkin Garcia and Robert Pavel and Jaime Arteaga and Guang Gao The Design and Implementation of TIDeFlow: A Dataflow-Inspired Execution Model for Parallel Loops and Task Pipelining . . . . . . . . . . . . . . . 278--307 Anonymous Editor's Note: Special Section on Concurrent Systems: Status and Perspectives . . . . . . . . . . . . . . 308--308 Nakul Jindal and Victor Lotrich and Erik Deumens and Beverly A. Sanders Exploiting GPUs with the Super Instruction Architecture . . . . . . . . 309--324 W. Morven Gentleman Concurrency Paradigms: Competitive, Coordinated, and Collaborative: Which Control Mechanisms are Appropriate? . . 325--336 Emre Kültürsay and Kemal Ebcioglu and Gürhan Küçük and Mahmut T. Kandemir Memory Partitioning in the Limit . . . . 337--380
Anonymous Editor's Note: High-Level Parallel Programming and Applications (HLPP) . . 381--382 Clemens Grelck Guest Editorial for High-Level Parallel Programming and Applications . . . . . . 383--385 Miguel Areias and Ricardo Rocha A Lock-Free Hash Trie Design for Concurrent Tabled Logic Programs . . . . 386--406 Alvaro Estebanez and Diego R. Llanos and Arturo Gonzalez-Escribano New Data Structures to Handle Speculative Parallelization at Runtime 407--426 Ye Wang and Zhiyuan Li GridFOR: a Domain Specific Language for Parallel Grid-Based Applications . . . . 427--448 Antoine Tran Tan and Joel Falcou and Daniel Etiemble and Hartmut Kaiser Automatic Task-Based Code Generation for High Performance Domain Specific Embedded Language . . . . . . . . . . . 449--465 Kiminori Matsuzaki and Reina Miyazaki Parallel Tree Accumulations on MapReduce 466--485 Tarek Menouer and Mohamed Rezgui and Bertrand Le Cun and Jean-Charles Régin Mixing Static and Dynamic Partitioning to Parallelize a Constraint Programming Solver . . . . . . . . . . . . . . . . . 486--505 Usman Dastgeer and Christoph Kessler Smart Containers and Skeleton Programming for GPU-Based Systems . . . 506--530 Marco Aldinucci and Sonia Campa and Marco Danelutto and Peter Kilpatrick and Massimo Torquati Pool Evolution: a Parallel Pattern for Evolutionary and Symbolic Computing . . 531--551 Tristan Aubrey-Jones and Bernd Fischer Synthesizing MPI Implementations from Functional Data-Parallel Programs . . . 552--573 Jean Fortin and Frédéric Gava BSP-Why: a Tool for Deductive Verification of BSP Algorithms with Subgroup Synchronisation . . . . . . . . 574--597 Konrad Siek and Pawel T. Wojciechowski Atomic RMI: a Distributed Transactional Memory Framework . . . . . . . . . . . . 598--619 José M. Andión and Manuel Arenaz and François Bodin and Gabriel Rodríguez and Juan Touriño Locality-Aware Automatic Parallelization for GPGPU with OpenHMPP Directives . . . 620--643 Ali Jannesari and Felix Wolf Automatic Generation of Unit Tests for Correlated Variables in Parallel Programs . . . . . . . . . . . . . . . . 644--662 Carlos Alberto Martínez-Angeles and Haicheng Wu and Inês Dutra and Vítor Santos Costa and Jorge Buenabad-Chávez Relational Learning with GPUs: Accelerating Rule Coverage . . . . . . . 663--685 Shigeyuki Sato and Kiminori Matsuzaki A Generic Implementation of Tree Skeletons . . . . . . . . . . . . . . . 686--707
Juan Chabkinian and Thomas J. E. Schwarz SJ Fast LH$*$ . . . . . . . . . . . . . . . 709--734 Marco Lattuada and Christian Pilato and Fabrizio Ferrandi Performance Estimation of Task Graphs Based on Path Profiling . . . . . . . . 735--771 Srimanth Gadde and William Acosta and Jordan Ringenberg and Robert Green and Vijay Devabhaktuni Achieving Optimal Inter-Node Communication in Graph Partitioning Using Random Selection and Breadth-First Search . . . . . . . . . . . . . . . . . 772--800 Ayaz ul Hassan Khan and Mayez Al-Mouhamed and Allam Fatayer and Nazeeruddin Mohammad Optimizing the Matrix Multiplication Using Strassen and Winograd Algorithms with Limited Recursions on Many-Core . . 801--830 Ayaz ul Hassan Khan and Mayez Al-Mouhamed and Allam Fatayer and Nazeeruddin Mohammad Erratum to: Optimizing the Matrix Multiplication Using Strassen and Winograd Algorithms with Limited Recursions on Many--Core . . . . . . . . 831--831 Ren Li and Haibo Hu and Heng Li and Yunsong Wu and Jianxi Yang MapReduce Parallel Programming Model: A State-of-the-Art Survey . . . . . . . . 832--866 Etem Deniz and Alper Sen Using Machine Learning Techniques to Detect Parallel Patterns of Multi-threaded Applications . . . . . . 867--900 Giuliano Laccetti and Marco Lapegna and Valeria Mele A Loosely Coordinated Model for Heap-Based Priority Queues in Multicore Environments . . . . . . . . . . . . . . 901--921
Anonymous Editor's Note: Special Issue on Computing Frontiers . . . . . . . . . . 923--923 Andreea Anghel and Laura Mihaela Vasilescu and Giovanni Mariani and Rik Jongerius and Gero Dittmann An Instrumentation Approach for Hardware-Agnostic Software Characterization . . . . . . . . . . . . 924--948 Musfiq Rahman and Bruce R. Childers Asteroid: Scalable Online Memory Diagnostics for Multi-core, Multi-socket Servers . . . . . . . . . . . . . . . . 949--974 Giovanni Mariani and Andreea Anghel and Rik Jongerius and Gero Dittmann Scaling Properties of Parallel Applications to Exascale . . . . . . . . 975--1002 Leandro Fiorin and Erik Vermij and Jan van Lunteren and Rik Jongerius and Christoph Hagleitner Exploring the Design Space of an Energy-Efficient Accelerator for the SKA1-Low Central Signal Processor . . . 1003--1027 Archimedes Pavlidis and Dimitris Gizopoulos Hierarchical Synthesis of Quantum and Reversible Architectures . . . . . . . . 1028--1053 Rui Han and Jianfeng Zhan and Jose Luis Vazquez-Poletti SARP: Synopsis--Based Approximate Request Processing for Low Latency and Small Correctness Loss in Cloud Online Services . . . . . . . . . . . . . . . . 1054--1077 Vassilis Vassiliadis and Charalampos Chalios and Konstantinos Parasyris and Christos D. Antonopoulos and Spyros Lalis and Nikolaos Bellas and Hans Vandierendonck and Dimitrios S. Nikolopoulos Exploiting Significance of Computations for Energy-Constrained Approximate Computing . . . . . . . . . . . . . . . 1078--1098
Chao Wang and Nadia Nedjah and Luiza M. Mourelle and Aili Wang Preface to the Special Issue on Sequential Code Parallelization . . . . 1099--1101 Nadia Nedjah and Luiza de Macedo Mourelle and Chao Wang A Parallel Yet Pipelined Architecture for Efficient Implementation of the Advanced Encryption Standard Algorithm on Reconfigurable Hardware . . . . . . . 1102--1117 Huang Wang and Xianglan Chen and Huaping Chen A Cross-ISA Kernelized High-Performance Parallel Emulator . . . . . . . . . . . 1118--1141 Ansar Javed and Bibrak Qamar and Mohsan Jameel and Aamir Shafi and Bryan Carpenter Towards Scalable Java HPC with Hybrid and Native Communication Devices in MPJ Express . . . . . . . . . . . . . . . . 1142--1172 Nadia Nedjah and Rogério de M. Calazan and Luiza de Macedo Mourelle and Chao Wang Parallel Implementations of the Cooperative Particle Swarm Optimization on Many-core and Multi-core Architectures . . . . . . . . . . . . . 1173--1199 Alessandro Pellegrini and Sebastiano Peluso and Francesco Quaglia and Roberto Vitali Transparent Speculative Parallelization of Discrete Event Simulation Applications Using Global Variables . . 1200--1247 Xiaomeng Huang and Yufang Ni and Dexun Chen and Songbin Liu and Haohuan Fu and Guangwen Yang Czip: a Fast Lossless Compression Algorithm for Climate Data . . . . . . . 1248--1267 Rachid Habel and Frédérique Silber-Chaussumier and François Irigoin and Elisabeth Brunet and François Trahay Combining Data and Computation Distribution Directives for Hybrid Parallel Programming: a Transformation System . . . . . . . . . . . . . . . . . 1268--1295 Martin Frieb and Ralf Jahr and Haluk Ozaktas and Andreas Hugl and Hans Regler and Theo Ungerer A Parallelization Approach for Hard Real-Time Systems and Its Application on Two Industrial Programs . . . . . . . . 1296--1336 Alcides Fonseca and Bruno Cabral and João Rafael and Ivo Correia Automatic Parallelization: Executing Sequential Programs on a Task-Based Parallel Runtime . . . . . . . . . . . . 1337--1358 Abubakar Siddique and Mohammad Ansari and Mikel Luján Purge--Rehab: Eager Software Transactional Memory with High Performance Under Contention . . . . . . 1359--1383
Vijayalakshmi Srinivasan and Yunquan Zhang Special Issue on Network and Parallel Computing . . . . . . . . . . . . . . . 1--3 Jinbao Zhang and Xiaofei Liao and Hai Jin and Dong Liu and Li Lin and Kao Zhao An Optimal Page-Level Power Management Strategy in PCM--DRAM Hybrid Memory . . 4--16 Vesna Smiljković and Osman Ünsal and Adrián Cristal and Mateo Valero Determinism at Standard-Library Level in TM-Based Applications . . . . . . . . . 17--29 Chencheng Ye and Jacob Brock and Chen Ding and Hai Jin Rochester Elastic Cache Utility (RECU): Unequal Cache Sharing is Good Economics 30--44 Song Wu and Yongchang Li and Xinhou Wang and Hai Jin and Hanhua Chen Vshadow: Promoting Physical Servers into Virtualization World . . . . . . . . . . 45--66 Yaojie Lu and Sotirios G. Ziavras Instruction Fusion for Multiscalar and Many-Core Processors . . . . . . . . . . 67--78 Jing Li and Lei Liu and Yuan Wu and Xiaobing Feng and Chengyong Wu Two-Level Task Scheduling for Irregular Applications on GPU Platform . . . . . . 79--93 Preeti Malakar and Venkatram Vishwanath Hierarchical Read--Write Optimizations for Scientific Applications with Multi-variable Structured Datasets . . . 94--108 Maksudul Alam and Maleq Khan Parallel Algorithms for Generating Random Networks with Given Degree Sequences . . . . . . . . . . . . . . . 109--127 Yu Zhang and Huifang Cao DMR: a Deterministic MapReduce for Multicore Systems . . . . . . . . . . . 128--141 Sheng Wang and Weizhong Qiang and Hai Jin and Jinfeng Yuan CovertInspector: Identification of Shared Memory Covert Timing Channel in Multi-tenanted Cloud . . . . . . . . . . 142--156 Jiansheng Yao and Chunguang Ma and Peng Wu and Gang Du and Qi Yuan An Opportunistic Network Coding Routing for Opportunistic Networks . . . . . . . 157--171 Yong Su and Zhan Wang and Zhiguo Fan and Zheng Cao and Xiaoli Liu and En Shao and Xuejun An and Ninghui Sun HyperFatTree: a Large-Scale Tree-Based Network with Low-Radix Switches . . . . 172--184 Xingjing Lu and Long Chen and Zhiyuan Li Performance Evaluation and Enhancement of Process-Based Parallel Loop Execution 185--198
Marco Danelutto and Susanna Pelagatti and Massimo Torquati Guest Editorial: High-Level Parallel Programming and Applications . . . . . . 199--202 Mehdi Goli and Horacio González-Vélez Autonomic Coordination of Skeleton-Based Applications Over CPU/GPU Multi-Core Architectures . . . . . . . . . . . . . 203--224 Alvaro Estebanez and Diego R. Llanos and Arturo Gonzalez-Escribano Using the Xeon Phi Platform to Run Speculatively-Parallelized Codes . . . . 225--241 Mathias Bourgoin and Emmanuel Chailloux and Jean-Luc Lamotte High Level Data Structures for GPGPU Programming in a Statically Typed Language . . . . . . . . . . . . . . . . 242--261 Rafael Sotomayor and Luis Miguel Sanchez and Javier Garcia Blas and Javier Fernandez and J. Daniel Garcia Automatic CPU/GPU Generation of Multi-versioned OpenCL Kernels for C++ Scientific Applications . . . . . . . . 262--282 Steffen Ernsting and Herbert Kuchen Data Parallel Algorithmic Skeletons with Accelerator Support . . . . . . . . . . 283--299 Frédéric Loulergue and Wadoud Bousdira and Julien Tesson Calculating Parallel Programs in Coq Using List Homomorphisms . . . . . . . . 300--319 Le-Duc Tung and Zhenjiang Hu Towards Systematic Parallelization of Graph Transformations Over Pregel . . . 320--339 V. Allombert and F. Gava and J. Tesson Multi-ML: Programming Multi-BSP Algorithms in ML . . . . . . . . . . . . 340--361 Kiminori Matsuzaki Functional Models of Hadoop MapReduce with Application to Scan . . . . . . . . 362--381 Tiziano De Matteis and Gabriele Mencagli Parallel Patterns for Window-Based Stateful Operators on Data Streams: an Algorithmic Skeleton Approach . . . . . 382--401 J. Darlington and A. J. Field and L. Hakim Tackling Complexity in High Performance Computing Applications . . . . . . . . . 402--420
Pierre Laborde and Steven Feldman and Damian Dechev A Wait-Free Hash Map . . . . . . . . . . 421--448 Nuno Fachada and Vitor V. Lopes and Rui C. Martins and Agostinho C. Rosa Parallelization Strategies for Spatial Agent-Based Models . . . . . . . . . . . 449--481 Milos Cvetanović and Zaharije Radivojević and Veljko Milutinović Restart Optimization for Transactional Memory with Lazy Conflict Detection . . 482--507 Jiaquan Gao and Zejie Li and Ronghua Liang and Guixia He Adaptive Optimization $l_1$-Minimization Solvers on GPU . . . . 508--529 Victor Garcia and Alejandro Rico and Carlos Villavieja and Paul Carpenter and Nacho Navarro and Alex Ramirez Adaptive Runtime-Assisted Block Prefetching on Chip-Multiprocessors . . 530--550 Ayaz H. Khan and Mayez Al-Mouhamed and Muhammed Al-Mulhem and Adel F. Ahmed RT-CUDA: a Software Tool for CUDA Code Restructuring . . . . . . . . . . . . . 551--594 Yiming Han and Anthony T. Chronopoulos Scalable Loop Self-scheduling Schemes for Large-Scale Clusters and Cloud Systems . . . . . . . . . . . . . . . . 595--611 Asim YarKhan and Jakub Kurzak and Piotr Luszczek and Jack Dongarra Porting the PLASMA Numerical Library to the OpenMP Standard . . . . . . . . . . 612--633 Krupa Sivakumaran and Arul Siromoney Priority Based Yield of Shared Cache to Provide Cache QoS in Multicore Systems 634--656 Shuai Che and Bradford M. Beckmann and Steven K. Reinhardt Programming GPGPU Graph Applications with Linear Algebra Building Blocks . . 657--679 Xiao-qing Wang and Xian-long Jin and Da-zhi Kou and Jia-hui Chen A Parallel Approach for the Generation of Unstructured Meshes with Billions of Elements on Distributed-Memory Supercomputers . . . . . . . . . . . . . 680--710 Mohammed Sourouri and Scott B. Baden and Xing Cai Panda: a Compiler Framework for Concurrent CPU $+$ GPU Execution of $3$D Stencil Computations on GPU-accelerated Supercomputers . . . . . . . . . . . . . 711--729
Maozhen Li and Zhuo Tang Guest Editorial: The Parallel Storage, Processing and Analysis for Big Data . . 731--733 Qicong Wang and Jinhao Zhao and Dingxi Gong and Yehu Shen and Maozhen Li and Yunqi Lei Parallelizing Convolutional Neural Networks for Action Event Recognition in Surveillance Videos . . . . . . . . . . 734--759 Yang Liu and Lixiong Xu and Maozhen Li The Parallelization of Back Propagation Neural Network in MapReduce and Spark 760--779 Kien Tuong Phan and Tomas Henrique Maul and Tuong Thuy Vu An Empirical Study on Improving the Speed and Generalization of Neural Networks Using a Parallel Circuit Approach . . . . . . . . . . . . . . . . 780--796 Hsiang-Huang Wu and Chien-Min Wang Generalization of Large-Scale Data Processing in One MapReduce Job for Coarse-Grained Parallelism . . . . . . . 797--826 Yan Wang and Kenli Li and Keqin Li Partition Scheduling on Heterogeneous Multicore Processors for Multi-dimensional Loops Applications . . 827--852 Zhuoer Gu and Ligang He and Cheng Chang and Jianhua Sun and Hao Chen and Chenlin Huang Developing an Efficient Pattern Discovery Method for CPU Utilizations of Computers . . . . . . . . . . . . . . . 853--878 Wei Liu and Lu Wang and Yuyue Du and Maozhen Li Deadlock Property Analysis of Concurrent Programs Based on Petri Net Structure 879--898 Aijia Ouyang and Xuyu Peng and Jing Liu and Ahmed Sallam Hardware/Software Partitioning for Heterogeneous MPSoC Considering Communication Overhead . . . . . . . . . 899--922 Yang Ou and Nong Xiao and Fang Liu and Zhiguang Chen and Wei Chen and Lizhou Wu Gemini: a Novel Hardware and Software Implementation of High-performance PCIe SSD . . . . . . . . . . . . . . . . . . 923--945 Mingzhu Deng and Wei Chen and Nong Xiao and Songping Yu and Yupeng Hu GLE-Dedup: a Globally-Locally Even Deduplication by Request-Aware Placement for Better Read Performance . . . . . . 946--964 Jiayi Du and Renfa Li and Zheng Xiao and Zhao Tong and Li Zhang Optimization of Data Allocation on CMP Embedded System with Data Migration . . 965--981 Yuyue Du and Lu Wang and Man Qi Constructing Service Clusters Based on Service Space . . . . . . . . . . . . . 982--1000 Yanan Sun and Yuyue Du and Maozhen Li A Repair of Workflow Models Based on Mirroring Matrices . . . . . . . . . . . 1001--1020
Giuliano Laccetti and Ian Foster and Marco Lapegna and Paul Messina and Raffaele Montella and Almerico Murli Guest Editorial for Hybrid Parallelism in New HPC Systems . . . . . . . . . . . 1021--1025 Ami Marowka Energy-Aware Modeling of Scaled Heterogeneous Systems . . . . . . . . . 1026--1045 Moritz Kreutzer and Jonas Thies and Melven Röhrig-Zöllner and Andreas Pieper and Faisal Shahzad and Martin Galgon and Achim Basermann and Holger Fehske and Georg Hager and Gerhard Wellein GHOST: Building Blocks for High Performance Sparse Linear Algebra on Heterogeneous Systems . . . . . . . . . 1046--1072 Beata Bylina and Joanna Potiopa Explicit Fourth-Order Runge--Kutta Method on Intel Xeon Phi Coprocessor . . 1073--1090 Pawel Czarnul Benchmarking Performance of a Hybrid Intel Xeon/Xeon Phi System for Parallel Computation of Similarity Measures Between Large Vectors . . . . . . . . . 1091--1107 Andrzej Glowacz and Marcin Pietroń Implementation of Digital Watermarking Algorithms in Parallel Hardware Accelerators . . . . . . . . . . . . . . 1108--1127 Jieun Choi and Theodora Adufu and Yoonhee Kim Data-Locality Aware Scientific Workflow Scheduling Methods in HPC Cloud Environments . . . . . . . . . . . . . . 1128--1141 Raffaele Montella and Giulio Giunta and Giuliano Laccetti and Marco Lapegna and Carlo Palmieri and Carmine Ferraro and Valentina Pelliccia and Cheol-Ho Hong and Ivor Spence and Dimitrios S. Nikolopoulos On the Virtualization of CUDA Based GPU Remoting on ARM and x86 Machines in the GVirtuS Framework . . . . . . . . . . . 1142--1163 G. B. Barone and V. Boccia and D. Bottalico and R. Campagna and L. Carracciuolo and G. Laccetti and M. Lapegna An Approach to Forecast Queue Time in Adaptive Scheduling: How to Mediate System Efficiency and Users Satisfaction 1164--1193 P. Natesan and R. R. Rajalaxmi and G. Gowrison and P. Balasubramanie Hadoop Based Parallel Binary Bat Algorithm for Network Intrusion Detection . . . . . . . . . . . . . . . 1194--1213 Rossella Arcucci and Luisa D'Amore and Luisa Carracciuolo and Giuseppe Scotti and Giuliano Laccetti A Decomposition of the Tikhonov Regularization Functional Oriented to Exploit Hybrid Multilevel Parallelism 1214--1235 Johannes Langguth and Qiang Lan and Namit Gaur and Xing Cai Accelerating Detailed Tissue-Scale $3$D Cardiac Simulations Using Heterogeneous CPU--Xeon Phi Computing . . . . . . . . 1236--1258
Zhiyuan Shao and Jian He and Huiming Lv and Hai Jin FOG: a Fast Out-of-Core Graph Processing Framework . . . . . . . . . . . . . . . 1259--1272 Hai Jin and Aaqif Afzaal Abbasi and Song Wu Pathfinder: Application-Aware Distributed Path Computation in Clouds 1273--1284 Yuanzhen Geng and Xuanhua Shi and Cheng Pei and Hai Jin and Wenbin Jiang LCS: an Efficient Data Eviction Strategy for Spark . . . . . . . . . . . . . . . 1285--1297 Chonghua Wang and Zhiyu Hao and Lei Cui and Xiangyu Zhang and Xiaochun Yun Introspection-Based Memory Pruning for Live VM Migration . . . . . . . . . . . 1298--1309 Fengfeng Pan and Yinliang Yue and Jin Xiong dCompaction: Delayed Compaction for the LSM-Tree . . . . . . . . . . . . . . . . 1310--1325 Sudakshina Dutta and Dipankar Sarkar and Arvind Rawat Synchronization Validation for Cross-Thread Dependences in Parallel Programs . . . . . . . . . . . . . . . . 1326--1365 Xing Fan and Mostafa Mehrabi and Oliver Sinnen and Nasser Giacaman Supporting Enhanced Exception Handling with OpenMP in Object--Oriented Languages . . . . . . . . . . . . . . . 1366--1389 Youcef Barigou and Edgar Gabriel Maximizing Communication--Computation Overlap Through Automatic Parallelization and Run-time Tuning of Non-blocking Collective Operations . . . 1390--1416 Guillermo Payá-Vayá and Andreas Gerstlauer Guest Editorial: Special Issue on the 2015 International Conference on Embedded Computer Systems --- Architectures, Modeling and Simulation (SAMOS XV) . . . . . . . . . . . . . . . 1417--1419 Pei Liu and Ahmed Hemani and Kolin Paul and Christian Weis and Matthias Jung and Norbert Wehn 3D-Stacked Many-Core Architecture for Biological Sequence Analysis Problems 1420--1460 Yosi Ben Asher and Irina Lipov and Vladislav Tartakovsky and Dror Tiv Generating ASIPs with Reduced Number of Connections to the Register-File . . . . 1461--1487 Xinnian Zheng and Lizy K. John and Andreas Gerstlauer LACross: Learning-Based Analytical Cross-Platform Performance and Power Prediction . . . . . . . . . . . . . . . 1488--1514 Biao Wang and Diego F. de Souza and Mauricio Alvarez-Mesa and Chi Ching Chi and Ben Juurlink and Aleksandar Ilic and Nuno Roma and Leonel Sousa GPU Parallelization of HEVC In-Loop Filters . . . . . . . . . . . . . . . . 1515--1535 Nabil Hallou and Erven Rohou and Philippe Clauss Runtime Vectorization Transformations of Binary Code . . . . . . . . . . . . . . 1536--1565 Christian Weis and Abdul Mutaal and Omar Naji and Matthias Jung and Andreas Hansson and Norbert Wehn DRAMSpec: a High-Level DRAM Timing, Power and Area Exploration Tool . . . . 1566--1591 Miguel Angel Aguilar and Juan Fernando Eusse and Projjol Ray and Rainer Leupers and Gerd Ascheid and Weihua Sheng and Prashant Sharma Towards Parallelism Extraction for Heterogeneous Multicore Android Devices 1592--1624 Nuno Fachada and Vitor V. Lopes and Rui C. Martins and Agostinho C. Rosa Erratum to: Parallelization Strategies for Spatial Agent-Based Models . . . . . 1625--1626
Sergei Gorlatch and Herbert Kuchen Guest Editorial: High-Level Parallel Programming with Algorithmic Skeletons 1--3 Jan Stypka and Wojciech Turek and Aleksander Byrski and Marek Kisiel-Dorohinicki and Adam D. Barwell and Christopher Brown and Kevin Hammond and Vladimir Janjic The Missing Link! A New Skeleton for Evolutionary Multi-agent Systems in Erlang . . . . . . . . . . . . . . . . . 4--22 Michael Haidl and Sergei Gorlatch High-Level Programming for Many-Cores Using C++14 and the STL . . . . . . . . 23--41 Fabian Wrede and Steffen Ernsting Simultaneous CPU--GPU Execution of Data Parallel Algorithmic Skeletons . . . . . 42--61 August Ernstsson and Lu Li and Christoph Kessler SkePU 2: Flexible and Type-Safe Skeleton Programming for Heterogeneous Parallel Systems . . . . . . . . . . . . . . . . 62--80 Antonio Brogi and Marco Danelutto and Daniele De Sensi and Ahmad Ibrahim and Jacopo Soldani and Massimo Torquati Analysing Multiple QoS Attributes in Parallel Design Patterns-Based Applications . . . . . . . . . . . . . . 81--100 Ari Rasch and Sergei Gorlatch Multi-dimensional Homomorphisms and Their Implementation in OpenCL . . . . . 101--119 Mehdi Goli and Horacio González-Vélez Formalised Composition and Interaction for Heterogeneous Structured Parallelism 120--151 Venkatesh Kannan and G. W. Hamilton Functional Program Transformation for Parallelisation Using Skeletons . . . . 152--172
Jixiang Yang and Qingbi He Scheduling Parallel Computations by Work Stealing: a Survey . . . . . . . . . . . 173--197 Samer Arandi and George Matheou and Costas Kyriacou and Paraskevas Evripidou Data-Driven Thread Execution on Heterogeneous Processors . . . . . . . . 198--224 Saurabh Hukerikar and Keita Teranishi and Pedro C. Diniz and Robert F. Lucas RedThreads: an Interface for Application-Level Fault Detection/Correction Through Adaptive Redundant Multithreading . . . . . . . . 225--251 Jorge Silva and Ana Aguiar and Fernando Silva Parallel Asynchronous Strategies for the Execution of Feature Selection Algorithms . . . . . . . . . . . . . . . 252--283 Jawad Haj-Yihia and Yosi Ben-Asher Software Static Energy Modeling for Modern Processors . . . . . . . . . . . 284--312 Sai Charan Koduru and Keval Vora and Rajiv Gupta Software Speculation on Caching DSMs . . 313--332 Antonino Tumeo and Hubertus Franke and Gianluca Palermo and John Feo Guest Editorial: Special Issue on Computing Frontiers . . . . . . . . . . 333--335 Naila Farooqui and Indrajit Roy and Yuan Chen and Vanish Talwar and Rajkishore Barik and Brian Lewis and Tatiana Shpeisman and Karsten Schwan Accelerating Data Analytics on Integrated GPU Platforms via Runtime Specialization . . . . . . . . . . . . . 336--375 Ke Wang and Elaheh Sadredini and Kevin Skadron Hierarchical Pattern Mining with the Automata Processor . . . . . . . . . . . 376--411 William Horn and Manoj Kumar and Joefon Jann and José Moreira and Pratap Pattnaik and Mauricio Serrano and Gabriel Tanase and Hao Yu Graph Programming Interface (GPI): a Linear Algebra Programming Model for Large Scale Graph Computations . . . . . 412--440 David Jaeger and Hendrik Graupner and Chris Pelchen and Feng Cheng and Christoph Meinel Fast Automated Processing and Evaluation of Identity Leaks . . . . . . . . . . . 441--470 Farhana Aleen and Vyacheslav P. Zakharin and Rakesh Krishnaiyer and Garima Gupta and David Kreitzer and Chang-Sun Lin, Jr. Automated Compiler Optimization of Multiple Vector Loads/Stores . . . . . . 471--503
Salvatore Cuomo and Marco Aldinucci and Massimo Torquati Guest Editorial for Programming Models and Algorithms for Data Analysis in HPC Systems . . . . . . . . . . . . . . . . 505--507 Awais Ahmad and Anand Paul and Sadia Din and M. Mazhar Rathore and Gyu Sang Choi and Gwanggil Jeon Multilevel Data Processing Using Parallel Algorithms for Analyzing Big Data in High-Performance Computing . . . 508--527 Pasquale De Michele and Francesco Maiorano and Livia Marcellino and Francesco Piccialli A GPU Implementation of OLPCA Method in Hybrid Environment . . . . . . . . . . . 528--542 Puneet Jai Kaur and Sakshi Kaushal and Arun Kumar Sangaiah and Francesco Piccialli A Framework for Assessing Reusability Using Package Cohesion Measure in Aspect Oriented Systems . . . . . . . . . . . . 543--564 Gang Mei and Salvatore Cuomo and Hong Tian and Nengxiong Xu and Linjun Peng MeshCleaner: a Generic and Straightforward Algorithm for Cleaning Finite Element Meshes . . . . . . . . . 565--583 Bastien Plazolles and Didier El Baz and Martin Spel and Vincent Rivola and Pascal Gegout SIMD Monte-Carlo Numerical Simulations Accelerated on GPU and Xeon Phi . . . . 584--606 Emilia Popa and Mauro Iacono and Florin Pop Adapting MCP and HLFET Algorithms to Multiple Simultaneous Scheduling . . . . 607--629 M. Mazhar Rathore and Hojae Son and Awais Ahmad and Anand Paul and Gwanggil Jeon Real-Time Big Data Stream Processing Using GPU with Spark Over Hadoop Ecosystem . . . . . . . . . . . . . . . 630--646
Anonymous Editor's Note: Special Issue on Network and Parallel Computing for New Architectures and Applications . . . . . 647--647 Yuntao Lu and Chao Wang and Lei Gong and Xuehai Zhou SparseNN: a Performance-Efficient Accelerator for Large-Scale Sparse Neural Networks . . . . . . . . . . . . 648--659 Sijiang Fan and Jiawei Fei and Li Shen Accelerating Deep Learning with a Parallel Mechanism Using CPU + MIC . . . 660--673 Chengfan Jia and Junnan Liu and Xu Jin and Han Lin and Hong An and Wenting Han and Zheng Wu and Mengxian Chi Improving the Performance of Distributed TensorFlow with RDMA . . . . . . . . . . 674--685 Xiangyu Ju and Quan Chen and Zhenning Wang and Minyi Guo and Guang R. Gao DCF: a Dataflow-Based Collaborative Filtering Training Algorithm . . . . . . 686--698 Zhiwen Chen and Xin He and Jianhua Sun and Hao Chen Have Your Cake and Eat it (Too): a Concurrent Hash Table with Hardware Transactions . . . . . . . . . . . . . . 699--709 Donghyun Gouk and Jie Zhang and Myoungsoo Jung Enabling Realistic Logical Device Interface and Driver for NVM Express Enabled Full System Simulations . . . . 710--721 Wenjie Liu and Sheng Ma and Libo Huang and Zhiying Wang The Design of NoC-Side Memory Access Scheduling for Energy-Efficient GPGPUs 722--735 Yang Shi and Yanmin Zhu and Linpeng Huang Partial-PreSET: Enhancing Lifetime of PCM-Based Main Memory with Fine-Grained SET Operations . . . . . . . . . . . . . 736--748 Jian Gao and Hongmei Wei and Kang Yu and Peng Qing A Scalable Runtime Fault Localization Framework for High-Performance Computing Systems . . . . . . . . . . . . . . . . 749--761 Han Lin and Zhichao Su and Xiandong Meng and Xu Jin and Zhong Wang and Wenting Han and Hong An and Mengxian Chi and Zheng Wu Combining Hadoop with MPI to Solve Metagenomics Problems that are both Data- and Compute-intensive . . . . . . 762--775 Fan Sun and Chao Wang and Lei Gong and Yiwei Zhang and Chongchong Xu and Yuntao Lu and Xi Li and Xuehai Zhou UniCNN: a Pipelined Accelerator Towards Uniformed Computing for CNNs . . . . . . 776--787 Weiqi Dai and Yukun Du and Hai Jin and Weizhong Qiang and Deqing Zou and Shouhuai Xu and Zhongze Liu RollSec: Automatically Secure Software States Against General Rollback . . . . 788--805
Francesco Piccialli and Salvatore Cuomo and Gwanggil Jeon Parallel Approaches for Data Mining in the Internet of Things Realm . . . . . . 807--811 Santosh Kumar and Sanjay Kumar Singh and Ali Imam Abidi and Deepanwita Datta and Arun Kumar Sangaiah Group Sparse Representation Approach for Recognition of Cattle on Muzzle Point Images . . . . . . . . . . . . . . . . . 812--837 Xiaomin Yang and Wei Wu and Binyu Yan and Huiqian Wang and Kai Zhou and Kai Liu Infrared Image Super-Resolution with Parallel Random Forest . . . . . . . . . 838--858 Jun-fang Song Vehicle Detection Using Spatial Relationship GMM for Complex Urban Surveillance in Daytime and Nighttime 859--872 Jun-fang Song and Wei-xing Wang and Feng Chen Target Detection Based on 3D Multi-Component Model and Inverse Projection Transformation . . . . . . . 873--885 Muhammad Farhan and Sohail Jabbar and Muhammad Aslam and Awais Ahmad and Muhammad Munwar Iqbal and Murad Khan and Martinez-Enriquez Ana Maria A Real-Time Data Mining Approach for Interaction Analytics Assessment: IoT Based Student Interaction Framework . . 886--903 Vanitha Mohanraj and R. Sakthivel and Anand Paul and Seungmin Rho High Performance GCM Architecture for the Security of High Speed Network . . . 904--922 Salvatore Cuomo and Pasquale De Michele and Emanuel Di Nardo and Livia Marcellino Parallel Implementation of a Machine Learning Algorithm on GPU . . . . . . . 923--942 Wei Lu and Xiaomin Yang and Xu Gou and Lihua Jian and Wei Wu and Gwanggil Jeon Parallel Heat Kernel Volume Based Local Binary Pattern on Multi-Orientation Planes for Face Representation . . . . . 943--962 Zengyu Ding and Gang Mei and Salvatore Cuomo and Nengxiong Xu and Hong Tian Performance Evaluation of GPU-Accelerated Spatial Interpolation Using Radial Basis Functions for Building Explicit Surfaces . . . . . . . 963--991 Atif Khan and Naomie Salim and Haleem Farman and Murad Khan and Bilal Jan and Awais Ahmad and Imran Ahmed and Anand Paul Abstractive Text Summarization based on Improved Semantic Graph Approach . . . . 992--1016
Dhirendra Pratap Singh and Ishan Joshi and Jaytrilok Choudhary Survey of GPU Based Sorting Algorithms 1017--1034 Rafael Palomar and Juan Gómez-Luna and Faouzi A. Cheikh and Joaquín Olivares-Bueno and Ole J. Elle High-Performance Computation of Bézier Surfaces on Parallel and Heterogeneous Platforms . . . . . . . . . . . . . . . 1035--1062 Marcin Gorawski and Michal Lorek Efficient Processing of Large Data Structures on GPUs: Enumeration Scheme Based Optimisation . . . . . . . . . . . 1063--1093 Mina Hosseini Rad and Ahmad Patooghy and Mahdi Fazeli An Efficient Programming Skeleton for Clusters of Multi-Core Processors . . . 1094--1109 Lucia G. Menezo and Valentin Puente and Pablo Abad and Jose-Angel Gregorio Mosaic: a Scalable Coherence Protocol 1110--1138 David Wehr and Rafael Radkowski Parallel kd-Tree Construction on the GPU with an Adaptive Split and Sort Strategy . . . . . . . . . . . . . . . . 1139--1156 Mengda He and Viktor Vafeiadis and Shengchao Qin and João F. Ferreira GPS+: Reasoning About Fences and Relaxed Atomics . . . . . . . . . . . . 1157--1183 Anonymous Editor's Note: Special Issue on Embedded Computer Systems: Architectures, Modeling and Simulation . . . . . . . . 1184--1184 Catalin Bogdan Ciobanu and Georgi Gaydadjiev and Christian Pilato and Donatella Sciuto The Case for Polymorphic Registers in Dataflow Computing . . . . . . . . . . . 1185--1219 Christos Kyrkou and Theocharis Theocharides and Christos-Savvas Bouganis and Marios Polycarpou Boosting the Hardware-Efficiency of Cascade Support Vector Machines for Embedded Classification Applications . . 1220--1246 Christopher Thompson and Miles Gould and Nigel Topham High Speed Cycle-Approximate Simulation of Embedded Cache-Incoherent and Coherent Chip-Multiprocessors . . . . . 1247--1282 Timo Viitanen and Janne Helkala and Heikki Kultala and Pekka Jääskeläinen and Jarmo Takala and Tommi Zetterman and Heikki Berg Variable Length Instruction Compression on Transport Triggered Architectures . . 1283--1303 Dimitra Papagiannopoulou and Andrea Marongiu and Tali Moreshet and Luca Benini and Maurice Herlihy and R. Iris Bahar Hardware Transactional Memory Exploration in Coherence-Free Many-Core Architectures . . . . . . . . . . . . . 1304--1328
Christopher Brown Guest Editorial Special Issue: High-Level Programming for Heterogeneous Parallel Systems . . . . . . . . . . . . 1--2 Javier Fresno and Daniel Barba and Arturo Gonzalez-Escribano and Diego R. Llanos HitFlow: a Dataflow Programming Model for Hybrid Distributed- and Shared-Memory Systems . . . . . . . . . 3--23 Georgios C. Chasparis and Michael Rossbory Efficient Dynamic Pinning of Parallelized Applications by Distributed Reinforcement Learning . . . . . . . . . 24--38 Matthew B. Ashcraft and Alexander Lemon and David A. Penry and Quinn Snell Compiler Optimization of Accelerator Data Transfers . . . . . . . . . . . . . 39--58 Moria Abadi and Sharon Keidar-Barner and Dmitry Pidan and Tatyana Veksler Verifying Parallel Code After Refactoring Using Equivalence Checking 59--73 Marco Danelutto and Tiziano De Matteis and Daniele De Sensi and Gabriele Mencagli and Massimo Torquati and Marco Aldinucci and Peter Kilpatrick The RePhrase Extended Pattern Set for Data Intensive Parallel Computing . . . 74--93 Ana Moreton-Fernandez and Arturo Gonzalez-Escribano and Diego R. Llanos Multi-device Controllers: a Library to Simplify Parallel Heterogeneous Programming . . . . . . . . . . . . . . 94--113 Wim Vanderbauwhede and Syed Waqar Nabi and Cristian Urlea Type-Driven Automated Program Transformations and Cost Modelling for Optimising Streaming Programs on FPGAs 114--136 Hamidreza Mohebbi Parallel SIMD CPU and GPU Implementations of Berlekamp--Massey Algorithm and Its Error Correction Application . . . . . . . . . . . . . . 137--160
J. Daniel García and Arturo Gonzalez-Escribano Guest Editorial: High-Level Parallel Programming and the Road to High Performance . . . . . . . . . . . . . . 161--163 Clemens Grelck and Heinrich Wiesinger Persistent Asynchronous Adaptive Specialization for Generic Array Programming . . . . . . . . . . . . . . 164--183 Arvid Jakobsson Automatic Cost Analysis for Imperative BSP Programs . . . . . . . . . . . . . . 184--212 Angeles Navarro and Francisco Corbera and Andres Rodriguez and Antonio Vilches and Rafael Asenjo Heterogeneous parallel_for Template for CPU--GPU Chips . . . . . . . . . . . . . 213--233 Fabian Wrede and Breno Menezes and Herbert Kuchen Fish School Search with Algorithmic Skeletons . . . . . . . . . . . . . . . 234--252 Dalvan Griebler and Renato B. Hoffmann and Marco Danelutto and Luiz G. Fernandes High-Level and Productive Stream Parallelism for Dedup, Ferret, and Bzip2 253--271 Javier López-Fandiño and Dora B. Heras and Francisco Argüello and Mauro Dalla Mura GPU Framework for Change Detection in Multitemporal Hyperspectral Images . . . 272--292 Miguel A. Vega-Rodríguez and José M. Granado-Criado Parallel Programming in Bioinformatics: Some Interesting Approaches . . . . . . 293--295 Enzo Rucci and Carlos Garcia Sanchez and Guillermo Botella Juan and Armando De Giusti and Marcelo Naiouf and Manuel Prieto-Matias SWIMM 2.0: Enhanced Smith--Waterman on Intel's Multicore and Manycore Architectures Based on AVX-512 Vector Extensions . . . . . . . . . . . . . . . 296--316 Ferran Badosa and Antonio Espinosa and Cesar Acevedo and Gonzalo Vera and Ana Ripoll A History-Based Resource Manager for Genome Analysis Workflows Applications on Clusters with Heterogeneous Nodes . . 317--342
Feng Zhang and Jidong Zhai and Marc Snir and Hai Jin and Hironori Kasahara and Mateo Valero Guest Editorial: Special Issue on Network and Parallel Computing for Emerging Architectures and Applications 343--344 Dong Han and Shengyuan Zhou and Tian Zhi and Yibo Wang and Shaoli Liu Float-Fix: an Efficient and Hardware-Friendly Data Type for Deep Neural Network . . . . . . . . . . . . . 345--359 Yong Yu and Tian Zhi and Xuda Zhou and Shaoli Liu and Yunji Chen and Shuyao Cheng BSHIFT: a Low Cost Deep Neural Networks Accelerator . . . . . . . . . . . . . . 360--372 Lianke Qin and Yifan Gong and Tianqi Tang and Yutian Wang and Jiangming Jin Training Deep Nets with Progressive Batch Normalization on Multi-GPUs . . . 373--387 Huihui Zou and Shanjiang Tang and Ce Yu and Hao Fu and Yusen Li and Wenjie Tang ASW: Accelerating Smith--Waterman Algorithm on Coupled CPU--GPU Architecture . . . . . . . . . . . . . . 388--402 Junhong Liu and Xin He and Weifeng Liu and Guangming Tan Register-Aware Optimizations for Parallel Sparse Matrix--Matrix Multiplication . . . . . . . . . . . . . 403--417 Donglin Chen and Jianbin Fang and Shizhao Chen and Chuanfu Xu and Zheng Wang Optimizing Sparse Matrix--Vector Multiplications on an ARMv8-based Many-Core Architecture . . . . . . . . . 418--432 Kang Jin and Cunlu Li and Dezun Dong and Binzhang Fu HARE: History-Aware Adaptive Routing Algorithm for Endpoint Congestion in Networks-on-Chip . . . . . . . . . . . . 433--450 Cheng Pan and Lan Zhou and Yingwei Luo and Xiaolin Wang and Zhenlin Wang Lightweight and Accurate Memory Allocation in Key--Value Cache . . . . . 451--466 Mingfan Li and Ke Wen and Han Lin and Xu Jin and Zheng Wu and Hong An and Mengxian Chi Improving the Performance of Distributed MXNet with RDMA . . . . . . . . . . . . 467--480 Heyang Xu and Yang Liu and Wei Wei and Ying Xue Migration Cost and Energy-Aware Virtual Machine Consolidation Under Cloud Environments Considering Remaining Runtime . . . . . . . . . . . . . . . . 481--501 Bo Wang and Jie Tang and Rui Zhang and Wei Ding and Deyu Qi A Dependency-Aware Storage Schema Selection Mechanism for In-Memory Big Data Computing Frameworks . . . . . . . 502--519 Peng Zhao and Lei Liu and Wei Cao and Xiao Dong and Jiansong Li and Xiaobing Feng ElasticActor: an Actor System with Automatic Granularity Adjustment . . . . 520--534
Nahid Farhady Ghalaty Editorial: Special Issue on Side-Channel and Fault Analysis of High-Performance Computing Platforms . . . . . . . . . . 535--537 Ahmad Moghimi and Jan Wichelmann and Thomas Eisenbarth and Berk Sunar MemJam: a False Dependency Attack Against Constant-Time Crypto Implementations . . . . . . . . . . . . 538--570 Hongyu Fang and Sai Santosh Dayapule and Fan Yao and Miloš Doroslovački and Guru Venkataramani PrODACT: Prefetch-Obfuscator to Defend Against Cache Timing Channels . . . . . 571--594 Fan Yao and Miloš Doroslovački and Guru Venkataramani Covert Timing Channels Exploiting Cache Coherence Hardware: Characterization and Defense . . . . . . . . . . . . . . . . 595--620 Alejandro Cabrera Aldaya and Billy Bob Brumley and Alejandro J. Cabrera Sarmiento and Santiago Sánchez-Solano Memory Tampering Attack on Binary GCD Based Inversion Algorithms . . . . . . . 621--640 Qiang-Sheng Hua and Xuanhua Shi and Yinglong Xia and Howie Huang Guest Editorial: Special Issue on Algorithms and Systems on Big Graph Processing . . . . . . . . . . . . . . . 641--643 Huanzhou Zhu and Ligang He and Songling Fu and Rui Li and Xie Han and Zhangjie Fu and Yongjian Hu and Chang-Tsun Li WolfPath: Accelerating Iterative Traversing-Based Graph Processing Algorithms on GPU . . . . . . . . . . . 644--667 Zhiyuan Shao and Zhenjie Mei and Xiaofeng Ding and Hai Jin BlockGraphChi: Enabling Block Update in Out-of-Core Graph Processing . . . . . . 668--685 Deng Li and Zhujun Chen and Jiaqi Liu Analysis for Behavioral Economics in Social Networks: An Altruism-Based Dynamic Cooperation Model . . . . . . . 686--708 Wei Liu and Lu Wang and Xin Feng and Man Qi and Chun Yan and Maozhen Li Soundness Analytics of Composed Logical Workflow Nets . . . . . . . . . . . . . 709--724 Jianliang Gao and Jianxin Wang and Jianbiao He and Fengxia Yan Against Signed Graph Deanonymization Attacks on Social Networks . . . . . . . 725--739 Haipeng Yao and Qiyi Wang and Luyao Wang and Peiying Zhang and Maozhen Li and Yunjie Liu An Intrusion Detection Framework Based on Hybrid Multi-Level Data Mining . . . 740--758 Xingwang Wang and Xiaohui Wei and Shang Gao and Yuanyuan Liu and Zongpeng Li A Novel Auction-Based Query Pricing Schema . . . . . . . . . . . . . . . . . 759--780
David Niedzielski and Kleanthis Psarris An Analytical Evaluation of Data Dependence Analysis Techniques . . . . . 781--804 Misun Yu and Joon-Sang Lee and Doo-Hwan Bae AdaptiveLock: Efficient Hybrid Data Race Detection Based on Real-World Locking Patterns . . . . . . . . . . . . . . . . 805--837 Andrea Crivellini and Matteo Franciolini OpenMP Parallelization Strategies for a Discontinuous Galerkin Solver . . . . . 838--873 Andreas Simbürger and Sven Apel PolyJIT: Polyhedral Optimization Just in Time . . . . . . . . . . . . . . . . . . 874--906 Mohammad Amin Irandoost and Amir Masoud Rahmani MapReduce Data Skewness Handling: a Systematic Literature Review . . . . . . 907--950 Fabien Reumont-Locke and Naser Ezzati-Jivan Efficient Methods for Trace Analysis Parallelization . . . . . . . . . . . . 951--972 Pierre Zins and Michel Dagenais Tracing and Profiling Machine Learning Dataflow Applications on GPU . . . . . . 973--1013 Ismail Akturk and Ozcan Ozturk Adaptive Thread Scheduling in Chip Multiprocessors . . . . . . . . . . . . 1014--1044 Anonymous Editor's Note: Special Issue on High-Level Languages and Frameworks for High-Performance Computing . . . . . . . 1045--1045 Hélène Coullon and Julien Bigot Extensibility and Composability of a Multi-Stencil Domain Specific Framework 1046--1085 Brad Peterson and Alan Humphrey and Dan Sunderland Automatic Halo Management for the Uintah GPU--Heterogeneous Asynchronous Many-Task Runtime . . . . . . . . . . . 1086--1116 José L. Quiroz-Fabián and Graciela Román-Alonso VPPE: a Novel Visual Parallel Programming Environment . . . . . . . . 1117--1151
Re'em Harel and Idan Mosseri and Harel Levin and Lee-or Alon and Matan Rusanovsky and Gal Oren Source-to-Source Parallelization Compilers for Scientific Shared-Memory Multi-core and Accelerated Multiprocessing: Analysis, Pitfalls, Enhancement and Potential . . . . . . . 1--31 Zhen Yu and Yu Zuo and Yong Zhao Convoider: a Concurrency Bug Avoider Based on Transparent Software Transactional Memory . . . . . . . . . . 32--60 Wensi Yang and Qingfeng Yao and Kejiang Ye and Cheng-Zhong Xu Empirical Mode Decomposition and Temporal Convolutional Networks for Remaining Useful Life Estimation . . . . 61--79 Donglin Chen and Jianbin Fang and Chuanfu Xu and Shizhao Chen and Zheng Wang Characterizing Scalability of Sparse Matrix--Vector Multiplications on Phytium FT-2000+ . . . . . . . . . . . . 80--97 Shuo Chen and Zhan Shi and Dan Feng and Shang Liu and Fang Wang and Lei Yang and Ruili Yu CSMqGraph: Coarse-Grained and Multi-external-storage Multi-queue I/O Management for Graph Computing . . . . . 98--118 Ziyue Jiang and Yifan Gong and Jidong Zhai and Yu-Ping Wang and Wei Liu and Hao Wu and Jiangming Jin Message Passing Optimization in Robot Operating System . . . . . . . . . . . . 119--136 Zelin Liu and Jian Cao and Yudong Tan and Quanwu Xiao and Mukesh Prasad Planning Above the API Clouds Before Flying Above the Clouds: a Real-Time Personalized Air Travel Planning Approach . . . . . . . . . . . . . . . . 137--156
Gwanggil Jeon and Awais Ahmad and Salvatore Cuomo and Burak Kantarci Guest Editorial: Special Issue on Emerging Technology for Software Define Network Enabled Internet of Things . . . 157--161 Farhan Ullah and Junfeng Wang and Muhammad Farhan and Sohail Jabbar and Muhammad Kashif Naseer and Muhammad Asif LSA Based Smart Assessment Methodology for SDN Infrastructure in IoT Environment . . . . . . . . . . . . . . 162--177 Murad Khan and Javed Iqbal and Muhammad Talha and Muhammad Arshad and Muhammad Diyan and Kijun Han Big Data Processing using Internet of Software Defined Things in Smart Cities 178--191 S. Ramesh and C. Yaashuwanth QoS and QoE Enhanced Resource Allocation for Wireless Video Sensor Networks Using Hybrid Optimization Algorithm . . . . . 192--212 Mudassar Ahmad and Usman Ahmad and Md Asri Ngadi and Muhammad Asif Habib and Shehzad Khalid and Rehan Ashraf Loss Based Congestion Control Module for Health Centers Deployed by Using Advanced IoT Based SDN Communication Networks . . . . . . . . . . . . . . . . 213--243 Fakhri Alam Khan and Awais Ahmad and Muhammad Imran Energy Optimization of PR--LEACH Routing Scheme Using Distance Awareness in Internet of Things Networks . . . . . . 244--263 Tao Han and Miaowang Zeng and Lijuan Zhang and Arun Kumar Sangaiah A Channel-Aware Duty Cycle Optimization for Node-to-Node Communications in the Internet of Medical Things . . . . . . . 264--279 Salah A. Alabady and Fadi Al-Turjman and Sadia Din A Novel Security Model for Cooperative Virtual Networks in the IoT Era . . . . 280--295 E. Anna Devi and J. Martin Leo Manickam Identifying Partitions in Wireless Sensor Network . . . . . . . . . . . . . 296--309 Hsiu-Sen Chiang and Arun Kumar Sangaiah and Mu-Yen Chen and Jia-Yu Liu A Novel Artificial Bee Colony Optimization Algorithm with SVM for Bio-inspired Software-Defined Networking 310--328 M. BalaAnand and N. Karthikeyan and S. Karthik Designing a Framework for Communal Software: Based on the Assessment Using Relation Modelling . . . . . . . . . . . 329--343 Idrees Ahmed and Abid Khan and Adeel Anjum and Mansoor Ahmed and Muhammad Asif Habib A Secure Provenance Scheme for Detecting Consecutive Colluding Users in Distributed Networks . . . . . . . . . . 344--366 Ghulam Shabbir and Adeel Akram and Muhammad Munwar Iqbal and Sohail Jabbar and Mai Alfawair and Junaid Chaudhry Network Performance Enhancement of Multi-sink Enabled Low Power Lossy Networks in SDN Based Internet of Things 367--398
A. N. Gnana Jeevan and M. A. Maluk Mohamed DyTO: Dynamic Task Offloading Strategy for Mobile Cloud Computing Using Surrogate Object Model . . . . . . . . . 399--415 A. K. Gnanasekar and V. Nagarajan Efficient MAI Cancellation Scheme in MC-DS-CDMA Using SIC . . . . . . . . . . 416--430 R. Saravana Ram and A. Gopi Saminathan and S. Arun Prakash An Area Efficient and Low Power Consumption of Run Time Digital System Based on Dynamic Partial Reconfiguration 431--446 Sathees Lingam Paulswamy and Hariharan Kaluvan Quadrant Based Neighbor to Sink and Neighbor to Source Routing Protocol and Alternate Node Deployment Strategies for WSN . . . . . . . . . . . . . . . . . . 447--469 M. A. Manazir Ahsan and Ihsan Ali and Mohd Yamani Idna Bin Idris and Muhammad Imran and Muhammad Shoaib Countering Statistical Attacks in Cloud-Based Searchable Encryption . . . 470--495 E. Laxmi Lydia and P. Krishna Kumar and K. Shankar and S. K. Lakshmanaprabu and R. M. Vidhyavathi and Andino Maseleno Charismatic Document Clustering Through Novel K-Means Non-negative Matrix Factorization (KNMF) Algorithm Using Key Phrase Extraction . . . . . . . . . . . 496--514 R. Ramya Devi and V. Vijaya Chamundeeswari Triple DES: Privacy Preserving in Big Data Healthcare . . . . . . . . . . . . 515--533 Zengyu Ding and Gang Mei and Salvatore Cuomo and Yixuan Li and Nengxiong Xu Comparison of Estimating Missing Values in IoT Time Series Data Using Different Interpolation Algorithms . . . . . . . . 534--548 P. Durgadevi and S. Srinivasan Resource Allocation in Cloud Computing Using SFLA and Cuckoo Search Hybridization . . . . . . . . . . . . . 549--565 Bowei Shan and Yong Fang GPU Accelerated Parallel Algorithm of Sliding-Window Belief Propagation for LDPC Codes . . . . . . . . . . . . . . . 566--579 M. A. Manazir Ahsan and Ihsan Ali and Mohd Yamani Idna Bin Idris and Muhammad Imran and Muhammad Shoaib Correction to: Countering Statistical Attacks in Cloud-Based Searchable Encryption . . . . . . . . . . . . . . . 580--580
Christoph Kessler Guest Editor's Note: High-Level Parallel Programming 2019 . . . . . . . . . . . . 581--582 Christopher Brown and Vladimir Janjic and J. McCall Programming Heterogeneous Parallel Machines Using Refactoring and Monte-Carlo Tree Search . . . . . . . . 583--602 Christopher Brown and Vladimir Janjic and Kenneth MacKenzie Refactoring GrPPI: Generic Refactoring for Generic Parallelism in C++ . . . . . 603--625 F. Gava and Y. Marquer Axiomatization and Imperative Characterization of Multi-BSP Algorithms: A Q&A on a Partial Solution 626--651 Clemens Grelck and Cédric Blom Resource-Aware Data Parallel Array Processing . . . . . . . . . . . . . . . 652--674 M. Köster and J. Groß and A. Krüger Massively Parallel Rule-Based Interpreter Execution on GPUs Using Thread Compaction . . . . . . . . . . . 675--691 Luca Rinaldi and Massimo Torquati and Marco Danelutto Improving the Performance of Actors on Multi-cores with Parallel Patterns . . . 692--712 Fabian Wrede and Herbert Kuchen Towards High-Performance Code Generation for Multi-GPU Clusters Based on a Domain-Specific Language for Algorithmic Skeletons . . . . . . . . . . . . . . . 713--728 Anonymous Editor's Note . . . . . . . . . . . . . 729--729 Kang Jin and Dezun Dong and Binzhang Fu DancerFly: An Order-Aware Network-on-Chip Router On-the-Fly Mitigating Multi-path Packet Reordering 730--749 Junmin Xiao and Guizhao Zhang and Guangming Tan Fast Data-Obtaining Algorithm for Data Assimilation with Large Data Set . . . . 750--770
Ayman A. Ataher Mahmud and Satakshi and W. Jeberson Aircraft Landing Scheduling Using Embedded Flower Pollination Algorithm 771--785 P. Gowtham and V. P. Arunachalam and S. Karthik An Efficient Monitoring of Real Time Traffic Clearance for an Emergency Service Vehicle Using IOT . . . . . . . 786--812 S. Chidambaram and A. Sumathi Optimal Feature Selection for the Classification of Hyperspectral Imagery Using Adaptive Spectral--Spatial Clustering . . . . . . . . . . . . . . . 813--832 M. S. Arunkumar and P. Suresh and C. Gunavathi High Utility Infrequent Itemset Mining Using a Customized Ant Colony Algorithm 833--849 Puneet Jai Kaur and Sakshi Kaushal A Fuzzy Approach for Estimating Quality of Aspect Oriented Systems . . . . . . . 850--869 Iftikhar Ahmad and Rafidah Md Noor and Muhammad Shoaib A Cooperative Heterogeneous Vehicular Clustering Mechanism for Road Traffic Management . . . . . . . . . . . . . . . 870--889 Han Zhang and Yurong Qian and Chenwei Tian A ViBe Based Moving Targets Edge Detection Algorithm and Its Parallel Implementation . . . . . . . . . . . . . 890--908 Seokhoon Ryu and Young-Sup Lee and Seonghyun Kim Active Control of Engine Sound Quality in a Passenger Car Using a Virtual Error Microphone . . . . . . . . . . . . . . . 909--927 Wei Wang and Huansheng Song and Hua Cui Landslide Multi-attitude Data Measurement of Bedding Rock Slope Model 928--939
Zeyu He and Qiuli Huang and Chuliang Weng Handling Data Skew for Aggregation in Spark SQL Using Task Stealing . . . . . 941--956 Kim Grüttner and Philipp A. Hartmann and Wolfgang Rosenstiel A Timed-Value Stream Based ESL Timing and Power Estimation and Simulation Framework for Heterogeneous MPSoCs . . . 957--1007 Yuanzhe Li and Loren Schwiebert Memory-Optimized Wavefront Parallelism on GPUs . . . . . . . . . . . . . . . . 1008--1031 Jihyun Park and Byoungju Choi and Seungyeun Jang Dynamic Analysis Method for Concurrency Bugs in Multi-process/Multi-thread Environments . . . . . . . . . . . . . . 1032--1060
Tim Süß and Lars Nagel and Thomas Soddemann Pure Functions in C: A Small Keyword for Automatic Parallelization . . . . . . . 1--24 Bo Wang and Jie Tang and Deyu Qi A Task-Aware Fine-Grained Storage Selection Mechanism for In-Memory Big Data Computing Frameworks . . . . . . . 25--50 Evan Coleman and Erik J. Jensen and Masha Sosonkina Fault Recovery Methods for Asynchronous Linear Solvers . . . . . . . . . . . . . 51--80 Jean-Charles Papin and Christophe Denoual and Raymond Namyst SPAWN: An Iterative, Potentials-Based, Dynamic Scheduling and Partitioning Tool 81--103 Raphael Beamonte and Naser Ezzati-Jivan and Michel R. Dagenais Automated Generation of Model-Based Constraints for Common Multi-core and Real-Time Applications Using Execution Tracing . . . . . . . . . . . . . . . . 104--134
Anonymous Editor's Note: Special Issue on High-level Programming for Heterogeneous Parallel Systems (2019) . . . . . . . . 135--135 Adam Seewald and Ulrik Pagh Schultz and Henrik Skov Midtiby Coarse-Grained Computation-Oriented Energy Modeling for Heterogeneous Parallel Embedded Systems . . . . . . . 136--157 V. Pothos and E. Vassalos and N. Fragoulis Deep Learning Inference with Dynamic Graphs on Heterogeneous Platforms . . . 158--176 Marco Danelutto and Gabriele Mencagli and Peter Kilpatrick Algorithmic Skeletons and Parallel Design Patterns in Mainstream Parallel Programming . . . . . . . . . . . . . . 177--198 Anonymous Editor's Note: Special Issue on International Embedded Systems Symposium (2019) . . . . . . . . . . . . . . . . . 199--199 Zhongqi Cheng and Tim Schmidt and Rainer Dömer Scaled Static Analysis and IP Reuse for Out-of-Order Parallel SystemC Simulation 200--215 Tomoaki Kawada and Shinya Honda and Hiroaki Takada TZmCFI: RTOS-Aware Control-Flow Integrity Using TrustZone for Armv8-M 216--236 Paulo C. Santos and João P. C. de Lima and Luigi Carro Enabling Near-Data Accelerators Adoption by Through Investigation of Datapath Solutions . . . . . . . . . . . . . . . 237--252 Menbere Kina Tekleyohannes and Vladimir Rybalkin and Andreas Dengel iDocChip: A Configurable Hardware Architecture for Historical Document Image Processing . . . . . . . . . . . . 253--284
Amartya Mukherjee and Prateeti Mukherjee and Nilanjan Dey iGridEdgeDrone: Hybrid Mobility Aware Intelligent Load Forecasting by Edge Enabled Internet of Drone Things for Smart Grid Networks . . . . . . . . . . 285--325 Furat Al-Obaidy and Arghavan Asad and Farah A. Mohammadi A Power-Aware Hybrid Cache for Chip-Multi Processors Based on Neural Network Prediction Technique . . . . . . 326--346 Maria Fazio and Alina Buzachis and Massimo Villari A Map-Reduce Approach for the Dijkstra Algorithm in SDN Over Osmotic Computing Systems . . . . . . . . . . . . . . . . 347--375 Guillaume Iooss and Christophe Alias and Sanjay Rajopadhye Monoparametric Tiling of Polyhedral Programs . . . . . . . . . . . . . . . . 376--409 Isil Öz and Sanem Arslan Predicting the Soft Error Vulnerability of Parallel Applications Using Machine Learning . . . . . . . . . . . . . . . . 410--439 Iraklis M. Spiliotis and Charalampos Sitaridis and Michael P. Bekakos Parallel Computation of Discrete Orthogonal Moment on Block Represented Images Using OpenMP . . . . . . . . . . 440--462 Biao Xing and DanDan Wang and Cuihua He Accelerating DES and AES Algorithms for a Heterogeneous Many-core Processor . . 463--486
Jörg Mische and Martin Frieb and Theo Ungerer PIMP My Many-Core: Pipeline-Integrated Message Passing . . . . . . . . . . . . 487--505 Sven Rheindt and Sebastian Maier and Andreas Herkersdorf DySHARQ: Dynamic Software-Defined Hardware-Managed Queues for Tile-Based Architectures . . . . . . . . . . . . . 506--540 Sven Gesper and Moritz Weißbrich and Guillermo Payá-Vayá Evaluation of Different Processor Architecture Organizations for On-Site Electronics in Harsh Environments . . . 541--569 Akshay Srivatsa and Mostafa Mansour and Andreas Herkersdorf DynaCo: Dynamic Coherence Management for Tiled Manycore Architectures . . . . 570--599 Rafael Stahl and Alexander Hoffman and Ulf Schlichtmann DeeperThings: Fully Distributed CNN Inference on Resource-Constrained Edge Devices . . . . . . . . . . . . . . . . 600--624
Guangming Tan and Guang R. Gao Guest Editorial: Special issue on Network and Parallel Computing for Emerging Architectures and Applications 625--627 Jiansong Li and Wei Cao and Xiaobing Feng Compiler-assisted Operator Template Library for DNN Accelerators . . . . . . 628--645 Tianba Chen and Wei Li and Yunchun Li M-DRL: Deep Reinforcement Learning Based Coflow Traffic Scheduler with MLFQ Threshold Adaption . . . . . . . . . . . 646--657 Zhanyuan Di and En Shao and Guangming Tan High-performance Migration Tool for Live Container in a Workflow . . . . . . . . 658--670 Ziyu Zhang and Zitan Liu and Hong An RDMA-Based Apache Storm for High-Performance Stream Data Processing 671--684 Yang Bai and Dinghuang Hu and Xiangke Liao CCRP: Converging Credit-Based and Reactive Protocols in Datacenters . . . 685--699 Hui Dong and Jianxi Fan and Jingya Zhou Fault-Tolerant and Unicast Performances of the Data Center Network HSDC . . . . 700--714 Mengshan Yu and Guisheng Fan and Liang Chen Location-based and Time-aware Service Recommendation in Mobile Edge Computing 715--731 Haonan Ji and Shibo Lu and Brian Vinter Segmented Merge: A New Primitive for Parallel Sparse Matrix Computations . . 732--744 Xiao Hu and Zhonghai Lu A Configurable Hardware Architecture for Runtime Application of Network Calculus 745--760
Troels Henriksen Bounds Checking on GPU . . . . . . . . . 761--775 Breno A. de Melo Menezes and Nina Herrmann and Fernando Buarque de Lima Neto High-Level Parallel Ant Colony Optimization with Algorithmic Skeletons 776--801 Frédéric Dabrowski On Single-Valuedness in Textually Aligned SPMD Programs . . . . . . . . . 802--819 Millán A. Martínez and Basilio B. Fraguela and José C. Cabaleiro A Parallel Skeleton for Divide-and-conquer Unbalanced and Deep Problems . . . . . . . . . . . . . . . . 820--845 August Ernstsson and Johan Ahlqvist and Christoph Kessler SkePU 3: Portable High-Level Programming of Heterogeneous Systems and HPC Clusters . . . . . . . . . . . . . . 846--866 Pascal Jungblut and Karl Fürlinger Portable Node-Level Parallelism for the PGAS Model . . . . . . . . . . . . . . . 867--885 Vladimir Janjic and Christopher Brown and Adam D. Barwell Restoration of Legacy Parallelism: Transforming Pthreads into Farm and Pipeline Patterns . . . . . . . . . . . 886--910 Anshu S. Anand and Karthik Sayani and R. K. Shyamasundar Fortress Abstractions in X10 Framework 911--933
Neeraj Gupta and Mahdi Khosravy and Rubén González Crespo Lightweight Artificial Intelligence Technology for Health Diagnosis of Agriculture Vehicles: Parallel Evolving Artificial Neural Networks by Genetic Algorithm . . . . . . . . . . . . . . . 1--26 Fei Yin and Feng Shi A Comparative Survey of Big Data Computing and HPC: From a Parallel Programming Model to a Cluster Architecture . . . . . . . . . . . . . . 27--64 Jichi Guo and Qing Yi and Kleanthis Psarris Enhancing the Effectiveness of Inlining in Automatic Parallelization . . . . . . 65--88 Talha Naqash and Sajjad Hussain Shah and Muhammad Najam Ul Islam Statistical Analysis Based Intrusion Detection System for Ultra-High-Speed Software Defined Network . . . . . . . . 89--114 Tongsheng Geng and Marcos Amaris and Jean-Luc Gaudiot A Profile-Based AI-Assisted Dynamic Scheduling Approach for Heterogeneous Architectures . . . . . . . . . . . . . 115--151 Rajesh Pandian Muniasamy and Rupesh Nasre and N. S. Narayanaswamy Accelerating Computation of Steiner Trees on GPUs . . . . . . . . . . . . . 152--185
Marc Reichenbach and Matthias Jung and Alex Orailoglu Guest Editorial: Special Issue on 2020 IEEE International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS 2020) . . . . . . . . . . . . . . 187--188 Sohan Lal and Bogaraju Sharatchandra Varma and Ben Juurlink A Quantitative Study of Locality in GPU Caches for Memory-Divergent Workloads 189--216 Lukas Steiner and Matthias Jung and Norbert Wehn DRAMSys4.0: an Open-Source Simulation Framework for In-depth DRAM Analyses . . 217--242 Mark Sagi and Nguyen Anh Vu Doan and Andreas Herkersdorf Fine-Grained Power Modeling of Multicore Processors Using FFNNs . . . . . . . . . 243--266 Minyu Cui and Angeliki Kritikakou and Emmanuel Casseau Energy-Efficient Partial-Duplication Task Mapping Under Multiple DVFS Schemes 267--294 Niko Zurstraßen and Lukas Jünger and Rainer Leupers AMAIX In-Depth: a Generic Analytical Model for Deep Learning Accelerators . . 295--318
August Ernstsson and Nicolas Vandenbergen and Christoph Kessler A Deterministic Portable Parallel Pseudo-Random Number Generator for Pattern-Based Programming of Heterogeneous Parallel Systems . . . . . 319--340 Peter Thoman and Florian Tischler and Thomas Fahringer The Celerity High-level API: C++20 for Accelerator Clusters . . . . . . . . . . 341--359 Sébastien Rivault and Mostafa Bamha and Sophie Robert A Scalable Similarity Join Algorithm Based on MapReduce and LSH . . . . . . . 360--380 Hemalatha Eedi and Sahith Karra and Rahul Utkoor An Improved/Optimized Practical Non-Blocking PageRank Algorithm for Massive Graphs* . . . . . . . . . . . . 381--404 Vasilios Kelefouras and Karim Djemame and Nikolaos Voros A Methodology for Efficient Tile Size Selection for Affine Loop Kernels . . . 405--432
Nina Herrmann and Breno A. de Melo Menezes and Herbert Kuchen Stencil Calculations with Algorithmic Skeletons for Heterogeneous Computing Environments . . . . . . . . . . . . . . 433--453 Júnior Löff and Renato B. Hoffmann and Ricardo Pieper and Dalvan Griebler and Luiz G. Fernandes DSParLib: a C++ Template Library for Distributed Stream Parallelism . . . . . 454--485 Breno Augusto de Melo Menezes and Herbert Kuchen and Fernando Buarque de Lima Neto Parallelization of Swarm Intelligence Algorithms: Literature Review . . . . . 486--514 Jash Khatri and Arihant Samar and Bikash Behera and Rupesh Nasre Scaling the Maximum Flow Computation on GPUs . . . . . . . . . . . . . . . . . . 515--561 S. Ramesh and C. Yaashuwanth Retraction Note: QoS and QoE Enhanced Resource Allocation for Wireless Video Sensor Networks Using Hybrid Optimization Algorithm . . . . . . . . . 562--562
Nicolò Tonci and Massimo Torquati and Gabriele Mencagli and Marco Danelutto Distributed-Memory FastFlow Building Blocks . . . . . . . . . . . . . . . . . 1--21 Rui S. Silva and João L. Sobral Efficient High-Level Programming in Plain Java . . . . . . . . . . . . . . . 22--42 Stephen Timcheck and Jeremy Buhler Interruptible Nodes: Reducing Queueing Costs in Irregular Streaming Dataflow Applications on Wide-SIMD Architectures 43--60 August Ernstsson and Dalvan Griebler and Christoph Kessler Assessing Application Efficiency and Performance Portability in Single-Source Programming for Heterogeneous Parallel Systems . . . . . . . . . . . . . . . . 61--82 Ruairidh MacGregor and Blair Archibald and Phil Trinder Generic Exact Combinatorial Search at HPC Scale . . . . . . . . . . . . . . . 83--106 M. BalaAnand and N. Karthikeyan and S. Karthik Retraction Note: Designing a Framework for Communal Software: Based on the Assessment Using Relation Modelling . . 107--107
Haoran Wang and Thibaut Tachon and Chong Li and Sophie Robert and Sébastien Limet SMSG: Profiling-Free Parallelism Modeling for Distributed Training of DNN 109--127 Grace Nansamba and Amani Altarawneh and Anthony Skjellum A Fault-Model-Relevant Classification of Consensus Mechanisms for MPI and HPC . . 128--149 Fabian Knorr and Peter Thoman and Thomas Fahringer Declarative Data Flow in a Graph-Based Distributed Memory Runtime System . . . 150--171 Nina Herrmann and Herbert Kuchen Distributed Calculations with Algorithmic Skeletons for Heterogeneous Computing Environments . . . . . . . . . 172--185 Loïc Sylvestre and Emmanuel Chailloux and Jocelyn Sérot Accelerating OCaml Programs on FPGA . . 186--207
Matthew Norman and Isaac Lyngaas and Abhishek Bagusetty and Mark Berrill Portable C++ Code that can Look and Feel Like Fortran Code with Yet Another Kernel Launcher (YAKL) . . . . . . . . . 209--230 Daniel Presser and Frank Siqueira Partitioning-Aware Performance Modeling of Distributed Graph Processing Tasks 231--255 Vsevolod Bohaienko Calculation of Distributed-Order Fractional Derivative on Tensor Cores-Enabled GPU . . . . . . . . . . . 256--270 Virginia Niculescu and Frédéric Loulergue Guest Editor's Note: High-Level Parallel Programming 2021 . . . . . . . 271--273
Polychronis Velentzas and Michael Vassilakopoulos and Antonio Corral and Christos Antonopoulos GPU-Based Algorithms for Processing the k Nearest-Neighbor Query on Spatial Data Using Partitioning and Concurrent Kernel Execution . . . . . . . . . . . . 275--308 Yacine Hakimi and Riyadh Baghdadi and Yacine Challal A Hybrid Machine Learning Model for Code Optimization . . . . . . . . . . . . . . 309--331
Alex Orailoglu and Marc Reichenbach and Matthias Jung Special Issue on SAMOS 2022 . . . . . . 1--2 Viktor Razilov and Robert Wittig and Emil Matúš and Gerhard Fettweis Access Interval Prediction by Partial Matching for Tightly Coupled Memory Systems . . . . . . . . . . . . . . . . 3--19 Milad Kokhazadeh and Georgios Keramidas and Vasilios Kelefouras and Iakovos Stamoulis A Practical Approach for Employing Tensor Train Decomposition in Edge Devices . . . . . . . . . . . . . . . . 20--39 Christian Heidorn and Muhammad Sabih and Nicolai Meyerhöfer and Christian Schinabeck and Jürgen Teich and Frank Hannig Hardware-Aware Evolutionary Explainable Filter Pruning for Convolutional Neural Networks . . . . . . . . . . . . . . . . 40--58 Luise Müller and Philipp Wanko and Christian Haubelt and Torsten Schaub Investigating Methods for ASPmT-Based Design Space Exploration in Evolutionary Product Design . . . . . . . . . . . . . 59--92 Alessandro Ottaviano and Robert Balas and Giovanni Bambini and Antonio Del Vecchio and Maicol Ciani and Davide Rossi and Luca Benini and Andrea Bartolini ControlPULP: a RISC-V On-Chip Parallel Power Controller for Many-Core HPC Processors with FPGA-Based Hardware-In-The-Loop Power and Thermal Emulation . . . . . . . . . . . . . . . 93--123
Yingpeng Wen and Zhilin Qiu and Dongyu Zhang and Dan Huang and Nong Xiao and Liang Lin Accelerating Massively Distributed Deep Learning Through Efficient Pseudo-Synchronous Update Method . . . . 125--146 Alif Ahmed and Farzana Ahmed Siddique and Kevin Skadron GraphTango: a Hybrid Representation Format for Efficient Streaming Graph Updates and Analysis . . . . . . . . . . 147--170 Fabian Knorr and Philip Salzmann and Peter Thoman and Thomas Fahringer Automatic Discovery of Collective Communication Patterns in Parallelized Task Graphs . . . . . . . . . . . . . . 171--186 Pedro Moreno and Miguel Areias and Ricardo Rocha and Vítor Santos Costa Yet Another Lock-Free Atom Table Design for Scalable Symbol Management in Prolog 187--206 Nicolò Tonci and Sébastien Rivault and Mostafa Bamha and Sophie Robert and Sébastien Limet and Massimo Torquati LSH SimilarityJoin Pattern in FastFlow . . . . . . . . . . . . . . 207--230
Bing Wei and Qiang Huang and Hui Chen and Chenhao Zhang and Limin Xiao Erasure-Coded Hybrid Writes Based on Data Delta . . . . . . . . . . . . . . . 231--252 Björn Birath and August Ernstsson and John Tinnerholm and Christoph Kessler High-Level Programming of FPGA-Accelerated Systems with Parallel Patterns . . . . . . . . . . . . . . . . 253--273 Nina Herrmann and Justus Dieckmann and Herbert Kuchen Optimizing Three-Dimensional Stencil-Operations on Heterogeneous Computing Environments . . . . . . . . . 274--297 Achilleas Tzenetopoulos and Dimosthenis Masouros and Sotirios Xydis and Dimitrios Soudris Orchestration Extensions for Interference- and Heterogeneity-Aware Placement for Data-Analytics . . . . . . 298--323
Bhanu Dwivedi and Bachu Dushmanta Kumar Patro RMOWOA: a Revamped Multi-Objective Whale Optimization Algorithm for Maximizing the Lifetime of a Network in Wireless Sensor Networks . . . . . . . . . . . . 325--366 Mustafa Sanli Design and Performance Evaluation of a Novel High-Speed Hardware Architecture for Keccak Crypto Coprocessor . . . . . 367--379 Songwen Pei and Wei Qin and Jianan Li and Junhao Tan and Jie Tang and Jean-Luc Gaudiot Intelligent Page Migration on Heterogeneous Memory by Using Transformer . . . . . . . . . . . . . . 380--399 Kevin Jude Concessao and Unnikrishnan Cheramangalath and Ricky Dev and Rupesh Nasre Meerkat: a Framework for Dynamic Graph Algorithms on GPUs . . . . . . . . . . . 400--453
Assia Brighen and Asma Chouikh and Hamida Ikhlef and Hachem Slimani and Abdelmounaam Rezgui and Hamamache Kheddouci Giraph-Based Distributed Algorithms for Coloring Large-Scale Graphs . . . . . . ?? Re'em Harel and Tal Kadosh and Niranjan Hasabnis and Timothy Mattson and Yuval Pinter and Gal Oren PragFormer: Data-Driven Parallel Source Code Classification with Transformers ?? Jianwu Long and Luping Liu K*-Means: an Efficient Clustering Algorithm with Adaptive Decision Boundaries . . . . . . . . . . . . . . . ?? Naw Safrin Sattar and Khaled Z. Ibrahim and Aydin Buluc and Shaikh Arifuzzaman DyG-DPCD: a Distributed Parallel Community Detection Algorithm for Large-Scale Dynamic Graphs . . . . . . . ?? Stefan Branković and Lazar Smiljković and Predrag Obradović and Miloš Radonjić and Marko Mišić Fast Parallel CPU--GPU Approximate Spectral Clustering for Transcriptomics Data . . . . . . . . . . . . . . . . . . ?? Anonymous Journal navigation . . . . . . . . . . . ??
M. Mohamed Asan Basiri High Throughput Instruction-Data Level Parallelism Based Arithmetic Hardware Accelerator . . . . . . . . . . . . . . ?? Valentin Beauvais and Nicolò Tonci and Sophie Robert and Sébastien Limet Parallelizing RNA-Seq Analysis with BioSkel: a FastFlow Based Prototype . . . . . . . . . . . . . . . ?? Yaseen Zaidi and Simon Winberg Automatic Heterogeneous Runtime Using Signal Processing Domain-Specific and Parallel Patterns . . . . . . . . . . . ?? Parinaz Barakhshan and Rudolf Eigenmann Advancing Interactive Parallelization: iCetus . . . . . . . . . . . . . . . ?? Marco Edoardo Santimaria and Alberto Riccardo Martinelli and Iacopo Colonnelli and Barbara Cantalupo and Massimo Torquati and Marco Aldinucci CAPIO-CL: The CAPIO Coordination Language . . . . . . . . . . . . . . . . ?? Christopher Brown and Adam D. Barwell pi-par: a Dependently-Typed Parallel Language with Algorithmic Skeletons . . ?? Simone Frassinelli and Gabriele Mencagli Larger-Than-Memory Stateful Stream Processing with WindFlow . . . . . . . . ?? Paolo Palazzari and Marco Faltelli and Francesco Iannone FIPLib: an Image Processing Library for FPGAs Using High-Level Synthesis . . . . ?? Ricardo Leonarczyk and Gabriele Mencagli and Dalvan Griebler Self-Adaptive Micro-Batching for Low-Latency GPU-Accelerated Stream Processing . . . . . . . . . . . . . . . ?? Michail Boulasikis and Flavius Gruian and Robert-Zoltán Szász Using Machine Learning Hardware to Solve Linear Partial Differential Equations with Finite Difference Methods . . . . . ?? William Ruys and Hochan Lee and Bozhi You and Shreya Talati and Jaeyoung Park and James Almgren-Bell and Yineng Yan and Milinda Fernando and Mattan Erez and Milos Gligoric and Martin Burtscher and Christopher J. Rossbach and Keshav Pingali and George Biros Performance Characterization of Python Runtimes for Multi-device Task Parallel Programming . . . . . . . . . . . . . . ?? Anonymous Journal navigation . . . . . . . . . . . ??