Last update:
Fri Feb 7 17:36:14 MST 2025
Joanne L. Martin An Invitation To Participate . . . . . . 3--4 Erich Bloch Supercomputing and the Growth of Computational Science in the National Science Foundation . . . . . . . . . . . 5--8 Richard A. Friesner and Jean-Philippe Brunet and Robert E. Wyatt and Claude Leforestier and Steven Binkley Computational Approach to Large Quantum Dynamical Problems . . . . . . . . . . . 9--23 John M. Dawson and Viktor K. Decyk and Brendan McNamara Particle Modeling of Plasmas On Supercomputers . . . . . . . . . . . . . 24--43 George N. Reeke, Jr. and Gerald M. Edelman and Dan Sulzbach Selective Neural Networks and Their Implications for Recognition Automata 44--69 Rami Melhem and Dennis Gannon Toward Efficient Implementation of Preconditioned Conjugate Gradient Methods on Vector Supercomputers . . . . 70--98 Francis Sullivan and Jack Dongarra Algorithm Design for Large-Scale Computations . . . . . . . . . . . . . . 99--105 Anonymous High-Speed Computing and Artificial Intelligence Connection . . . . . . . . 106--110 Anonymous An Agenda for Improved Evaluation of Supercomputer Performance . . . . . . . 110--111 Jack Dongarra Book Reviews: The Connection Machine . . . . . . . . . . . . . . . . 112--112 Joanne L. Martin Book Reviews: High-Speed Computing: Scientific Applications and Algorithm Design . . . . . . . . . . . . 113--113 Anonymous Software for High Performance Computers 114--115 Anonymous Advanced Computing Research Facility Offers Opportunities for Experimentation in Multiprocessing . . . . . . . . . . . 115--116 Anonymous Dispelling the ``No Software'' Myth . . . 116--116
Robert B. Wilhelmson A Walk Into the Future . . . . . . . . . 3--5 Merry Maisel Science At the San Diego Supercomputer Center . . . . . . . . . . . . . . . . . 6--10 William R. Martin and Forrest B. Brown Status of Vectorized Monte Carlo for Particle Transport Analysis . . . . . . 11--32 David C. Torney and Tony T. Warnock and Peter Kollman Computer Simulation of Diffusion-Limited Chemical Reactions in Three Dimensions 33--43 Bolesław K. Szymanski and Dieter Mueller-Wichards Parallel Programming With Recurrent Equations . . . . . . . . . . . . . . . 44--74 S. T. Kao and E. L. Leiss and Olin Johnson An Experimental Implementation of Migration Algorithms on the Intel Hypercube . . . . . . . . . . . . . . . 75--99 B. Buslee Book Review: Supercomputers: Value and Trends, Bill Buzbee, Computer Research and Applications Group, Computing and Communications Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545 . . . . . . . . 100--103
Joanne L. Martin The Missing Pieces . . . . . . . . . . . 3--4 Tricia Nunns Supercomputing in Western Canada . . . . 5--11 Petter E. Bjørstad and Jon Braekhus and John Aldag Implementation and Performance of the Large-Scale Finite Element Code Sesam On a Wide Range of Scientific Computers . . 12--25 R. E. Benner and G. R. Montry and G. G. Weigand and Iain Duff Concurrent Multifrontal Methods: Shared Memory, Cache, and Frontwidth Issues . . 26--44 Misako Ishiguro and Hiroo Harada and Mitsuhiro Makino and Joanne L. Martin Performance Analysis of Vectorized Nuclear Codes on a FACOM VP-100 at the Japan Atomic Energy Research Institute 45--56 William R. Martin and Tzu-Chiang Wan and Tarek S. Abdel-Rahman and Trevor N. Mudge and Kenichi Miura Monte Carlo Photon Transport On Shared Memory and Distributed Memory Parallel Processors . . . . . . . . . . . . . . . 57--74 Michelle Y. Kim and Anil Nigam and George Paul and Robert J. Flynn and Garry H. Rodrigue Disk Interleaving and Very Large Fast Fourier Transforms . . . . . . . . . . . 75--96 Anonymous Networking Needs and Trends in Data Communications . . . . . . . . . . . . . 97--100
Brendan McNamara The Mass Market for Supercomputing . . . 3--4 Dennis Meredith Science and Technology At Cornell's Theory Center . . . . . . . . . . . . . 5--9 C. Cleveland Ashcraft and Roger G. Grimes and John G. Lewis and Barry W. Peyton and Horst D. Simon and Petter E. Bjørstad Progress in Sparse Matrix Methods for Large Linear Systems on Vector Supercomputers . . . . . . . . . . . . . 10--30 Christian E. Petersen and Christopher A. Sims Computer Simulation of Large Scale Econometric Models: Project LINK . . . . 31--53 Albert Ando and Paul Beaumont and Matthew Ando and Christopher A. Sims Efficiency of the CYBER 205 for Stochastic Simulations of a Simultaneous, Nonlinear, Dynamic Econometric Model . . . . . . . . . . . 54--81 Diana Choi and Creon Levit and Steven E. Follin Implementation of a Distributed Interactive Graphics System . . . . . . 82--95 Stuart E. Rogers and Pieter G. Buning and Fergus J. Merritt and Steven E. Follin Distributed Interactive Graphics Applications in Computational Fluid Dynamics . . . . . . . . . . . . . . . . 96--105 David Salzman Visualization in Scientific Computing: Summary of an NSF-Sponsored Panel Report On Graphics, Image Processing, and Workstations . . . . . . . . . . . . . . 106--108 Gregory J. McRae Book Reviews: The Characteristics of Parallel Algorithms 109--110 Joanne L. Martin Book Reviews: Supercomputers and Their Use . . . . . . . . . . . . . 110--111
Dennis Gannon Programming Environments for Supercomputing . . . . . . . . . . . . . 3--4 Ralph Z. Roskies and Penny D. Sackett Science At the Pittsburgh Supercomputing Center . . . . . . . . . . . . . . . . . 5--11 Kyle Gallivan and William Jalby and Ulrike Meier and Ahmed H. Sameh Impact of Hierarchical Memory Systems on Linear Algebra Algorithm Design . . . . 12--48 O. Terki-Hassaine and E. L. Leiss Multitasking $3$-D Forward Modeling Using High Order Finite Difference Methods on the CRAY X-MP/416 . . . . . . 49--65 M. Lescrenier and Ph. L. Toint Large Scale Unconstrained Optimization on the FPS 164 and CRAY X-MP Vector Processors . . . . . . . . . . . . . . . 66--81 David H. Bailey A High-Performance FFT Algorithm for Vector Supercomputers . . . . . . . . . 82--87 Gene M. Amdahl Limits of Expectation . . . . . . . . . 88--94 John W. D. Connolly Book Review: The Supercomputer Era . . . . . . . . . . . . . . . . . . 95--96
Robert A. Brown Supercomputers in Chemistry and Chemical Engineering . . . . . . . . . . . . . . 3--4 Jan Almlöf and Donald G. Truhlar and H. T. Davis and Klavs F. Jensen and Matthew Tirrell and Terry Lybrand Supercomputer Chemistry At the University of Minnesota . . . . . . . . 5--15 Gregory J. McRae and Jana B. Milford and Barbara J. Slompak Changing Roles for Supercomputing in Chemical Engineering . . . . . . . . . . 16--40 Donna A. Bassolino and Fumio Hirata and Douglas B. Kitchen and Dorothea Kominos and Arthur Pardi and Ronald M. Levy Determination of Protein Structures in Solution Using NMR Data and IMPACT . . . 41--61 David A. Dixon and Frederic A. Van-Catledge A Molecular Model for the Helicity of Polytetrafluoro Ethylene (Teflon®) 62--81 Joanne L. Martin Book Review: Supercomputer Research in Chemistry and Chemical Engineering (ACS Symposium Series, Vol. 353) . . . . . . . . . . . . . . . . . . 82--83
Joanne L. Martin A Retrospective . . . . . . . . . . . . 3--5 William H. Allen Centers of Supercomputing --- Science at the National Center for Supercomputing 6--9 Arvind and David E. Culler and Gino K. Maa Assessing the Benefits of Fine-Grain Parallelism in Dataflow Programs . . . . 10--36 Michael W. Berry and Ahmed Sameh Multiprocessor Schemes for Solving Block Tridiagonal Linear Systems . . . . . . . 37--57 Kazutami Tago and Hiroki Kumahora and Noriyuki Sadaoka and Kinya Kobayashi Vectorized calculations and use of fast semiconductor memories in the DV-X $ \alpha $ method . . . . . . . . . . . . 58--72 Heihachiro Hara and Yoichi Kodera and Kazuhiko Kanehiro Flow Simulations By Parallel Computer MiPax . . . . . . . . . . . . . . . . . 73--80
Patrick Gaffney IBM Bergen Scientific Centre and the International Conference On Vector and Parallel Computing . . . . . . . . . . . 3--4 Gérard Meurant Domain Decomposition Methods for Partial Differential Equations on Parallel Computers . . . . . . . . . . . . . . . 5--12 James J. Little and Tomaso Poggio and Edward B. Gamble, Jr. Seeing in parallel: the Vision Machine 13--28 Linda G. Shapiro Programming Parallel Vision Algorithms: a Dataflow Language Approach . . . . . . 29--44 Richard E. Ewing Large-Scale Computing in Reservoir Simulation . . . . . . . . . . . . . . . 45--53 J. A. Hertz Statistical Mechanics of Neural Computation . . . . . . . . . . . . . . 54--62 Wolfgang Gentzsch Comparison of Supercomputers and Mini-Supercomputers for Computational Fluid Dynamics Calculations . . . . . . 63--71 Tony F. Chan Domain Decomposition Algorithms and Computational Fluid Dynamics . . . . . . 72--83 C. David Callahan and Keith D. Cooper and Robert T. Hood and Ken Kennedy and Linda Torczon ParaScope: a Parallel Programming Environment . . . . . . . . . . . . . . 84--99 Jan Kok Parallel Programming With Ada . . . . . 100--108 Hermann Mierendorff and Karl Solchenbach and Ulrich Trottenberg On the SUPRENUM System . . . . . . . . . 109--117 Albert M. Erisman Supercomputing as a tool for product development . . . . . . . . . . . . . . 118--121
Joanne L. Martin Supercomputers, Networks, and Privacy 3--4 Christopher Eoyang and Raul H. Mendez Supercomputing in Japan: Institute for Supercomputing Research . . . . . . . . 5--9 Linda Kaufman and Norm Schryer Solving two-dimensional partial differential equations on vector and scalar machines . . . . . . . . . . . . 10--33 Anna Nagurney and Dae-Shik S. Kim Parallel and Serial Variational Inequality Decomposition Algorithms for Multicommodity Market Equilibrium Problems . . . . . . . . . . . . . . . . 34--58 Gary R. Montry Massively Parallel Mathematical Sieves 59--74 Swarn P. Kumar Solving tridiagonal linear systems on the Butterfly parallel computer . . . . 75--81 K. J. M. Moriarty Parallel Processing of Large-Scale Applications on Powerful Multiple Processors . . . . . . . . . . . . . . . 82--87
John E. Aldag The Impact of Supercomputers: Global, Pervasive, Positive . . . . . . . . . . 3--5 Iain S. Duff CERFACS: a European Center for High-Performance Computation . . . . . . 6--9 J. A. Sethian and James B. Salem Animation of Interactive Fluid Flow Visualization Tools on a Data Parallel Machine . . . . . . . . . . . . . . . . 10--39 Michel J. Daydé and Iain S. Duff Level 3 BLAS in $ L U $ Factorization on the CRAY-2, ETA-10P, and IBM 3090-200/VF 40--70 Daniel A. Menascé and Virgilio A. F. Almeida Analytic Models of Supercomputer Performance in Multiprogramming Environments . . . . . . . . . . . . . . 71--91 David A. Mandell and Harold E. Trease Parallel Processing a Three-Dimensional Free-Lagrange Code: a Case History . . . 92--99 Frederic A. Van-Catledge Toward a General Model for Evaluating the Relative Performance of Computer Systems . . . . . . . . . . . . . . . . 100--108
Joanne L. Martin Supercomputing: Beyond the Daily Planet 3--4 M. Berry and D. Chen and P. Koss and D. Kuck and S. Lo and Y. Pang and L. Pointer and R. Roloff and A. Sameh and E. Clementi and S. Chin and D. Schneider and G. Fox and P. Messina and D. Walker and C. Hsiung and J. Schwarzmeier and K. Lue and S. Orszag and F. Seidl and O. Johnson and R. Goodrum and J. Martin The PERFECT Club Benchmarks: Effective Performance Evaluation of Supercomputers 5--40 Patrick R. Amestoy and Iain S. Duff Vectorization of a Multiprocessor Multifrontal Code . . . . . . . . . . . 41--59 Armel de La Bourdonnaye The Element By Element Method as a Preconditioner for Linear Systems Coming From Finite Element Models . . . . . . . 60--68 Brendan McNamara Supercomputer Throughput Benchmarks for the Cray-1s and Cyber 205 With Estimates for Class VII Supercomputers . . . . . . 69--85 David H. Bailey and Horst D. Simon and John T. Barton and Martin J. Fouts Floating Point Arithmetic in Future Supercomputers . . . . . . . . . . . . . 86--90 P. Y.-T. Hsu and B. R. Rau and K. J. M. Moriarty Applications Development on the Very Long Instruction Word CYDRA-5 . . . . . 91--98
Bill Buzbee Report From Trondheim . . . . . . . . . 3--5 Jack Dongarra Advanced Computing Research Facility, Mathematics and Computer Science Division, Argonne National Laboratory 6--8 Dennis C. Jespersen and Creon Levit A Computational Fluid Dynamics Algorithm on a Massively Parallel Computer . . . . 9--27 Verena Meiser Umar and Charlotte Froese Fischer Multitasking the Davidson Algorithm for the Large, Sparse Eigenvalue Problem . . 28--53 K. J. M. Moriarty Optimizing the SU(3) Lattice Gauge Theory Algorithm on the NEC SX-2 Supercomputer . . . . . . . . . . . . . 54--63 Hwa A. Lim and Gregory Riccardi and Charles M. Bauer and Sanjay Sharma A Vector Algorithm for Lattice Gas Hydrodynamics . . . . . . . . . . . . . 64--67 R. C. Brower and K. J. M. Moriarty and P. Tamayo A Fast Algorithm To Simulate the Microcanonical Dynamics of the Ising Model . . . . . . . . . . . . . . . . . 68--72 Anna Nagurney Book Review: Parallel and Distributed Computation: Numerical Methods . . . . . . . . . . . . . . . . 73--74
Sidney Fernbach A U.S. high-performance computing program . . . . . . . . . . . . . . . . 3--5 Alison Brown and Ashley Burns and Kevin Wohlever Centers of supercomputing --- research at the Ohio Supercomputer Center . . . . 6--9 Kevin J. M. Moriarty and Claudio Rebbi Supercomputer Methods for the Solution of Fundamental Problems of Particle Physics . . . . . . . . . . . . . . . . 10--30 Ernst L. Leiss and Raj H. Thapar Three-Dimensional Dip Moveout on the SX-2: An XMU Implementation . . . . . . 31--48 Anna Nagurney and Dae-Shik Kim and Alan G. Robinson Serial and Parallel Equilibration of Large-Scale Constrained Matrix Problems with Application to the Social and Economic Sciences . . . . . . . . . . . 49--71 Abdulmannan Saati and Sedat Biringen and Charbel Farhat Solving Navier--Stokes Equations on a Massively Parallel Processor: Beyond the 1 GFLOP Performance . . . . . . . . . . 72--80 Mary-Anne Mahaffy The Direction of Numerically Intensive Computing in Higher Education . . . . . 81--87 Horst D. Simon Are Highly Parallel Systems Ready for Prime Time? . . . . . . . . . . . . . . 88--94 Edward D. Lazowska and Kenneth C. Sevcik Workshop on Scientific Computing Performance Analysis . . . . . . . . . . 95--97
Steve Follin About This Issue . . . . . . . . . . . . 3--4 Warren M. Washington and Thomas W. Bettge and Gerald A. Meehl and Jeffery B. Yost Computer Simulation of the Global Climatic Effects of Increased Greenhouse Gases . . . . . . . . . . . . . . . . . 5--19 Robert B. Wilhelmson and Brian F. Jewett and Crystal Shaw and Louis J. Wicker and Matthew Arrott and Colleen B. Bushell and Mark Bajuk and Jeffrey Thingvold and Jeffery B. Yost A Study of the Evolution of a Numerically Modeled Severe Storm . . . . 20--36 Mark A. Johnson and James J. O'Brien Modeling the Pacific Ocean . . . . . . . 37--47 W. Reid Thompson Global Four-Band Spectral Classification of Jupiter's Clouds: Color/Albedo Units and Trends . . . . . . . . . . . . . . . 48--65 Susumu Shirayama and Kunio Kuwahara Flow Visualization in Computational Fluid Dynamics . . . . . . . . . . . . . 66--80 Nateri K. Madavan and Paul Kelaita and Sharad Gavali Supercomputer Applications in Gas Turbine Flowfield Simulation . . . . . . 81--95 Stuart E. Rogers and Dochan Kwak and Cetin Kiris and I-Dee Chang Numerical Simulation of Flow Through Biofluid Devices . . . . . . . . . . . . 96--106 Ichiro Hagiwara and Masaaki Tsuda and Yoshihiro Sato and Yuichi Kitagawa Simulation of Automobile Side Member Collapse for Crash Energy Management . . 107--114 Akio Koide Visual Simulation of a Chemical Reaction 115--123 Fumiko Yonezawa and Shoichi Sakamoto and Shuichi Nosé Glass, Transition . . . . . . . . . . . 124--133 David A. Dixon and William B. Farnham and Patrick J. Capobianco Quantum Chemical Molecular Models for Fluorinated Polymers: Visualization of Structures and Vibrational Motions . . . 134--149 Robert B. Haber Scientific Visualization: What's Beyond the Vision? . . . . . . . . . . . . . . 150--153 James D. Foley Scientific Data Visualization Software: Trends and Directions . . . . . . . . . 154--157
Tom Kitchens The U.S. Department of Energy's ``Grand Challenge'' Program . . . . . . . . . . 3--5 Arthur A. Mirin The National Energy Research Supercomputer Center . . . . . . . . . . 6--10 Brian E. Hingerty and Suse Broyde Atomic Resolution Structures of DNA and DNA Modified by Carcinogens . . . . . . 11--21 Rajiv K. Kalia and Priya Vashishta and Lin H. Yang and Fred W. Dech and John Rowlan Quantum Molecular Dynamics: a New Algorithm for Linear and Nonlinear Electron Transport in Disordered Materials . . . . . . . . . . . . . . . 22--33 D. V. Anderson and W. A. Cooper and R. Gruber and S. Merazzi and U. Schwenn Methods for the Efficient Calculation of the Magnetohydrodynamic (MHD) Stability Properties of Magnetically Confined Fusion Plasmas . . . . . . . . . . . . . 34--47 K. M. Bitar and R. Edwards and U. Heller and A. D. Kennedy and W. Liu and T. A. DeGrand and S. A. Gottlieb and A. Krasnitz and J. B. Kogut and R. L. Renken and M. C. Ogilvie and P. Rossi and D. K. Sinclair and K. C. Wang and R. L. Sugar and M. Teper and D. Toussaint The High Energy Monte Carlo Grand Challenge: Simulating Quarks and Gluons 48--60 Claude Bernard and Rajan Gupta and Gregory Kilcup and Stephen R. Sharpe and Amarjit Soni Lattice Calculation of Electroweak Amplitude . . . . . . . . . . . . . . . 61--71 Keh-Fei Liu Hadron Structure and Interaction from lattice Quantum Chromodynamics Calculations . . . . . . . . . . . . . . 72--80 Andrew Pohorille and Wilson S. Ross and Ignacio Tinoco, Jr. DNA Dynamics in Aqueous Solution: Opening the Double Helix . . . . . . . . 81--96 B. A. Carreras and N. Dominguez and J. B. Drake and J.-N. Leboeuf and L. A. Charlton and J. A. Holmes and D. K. Lee and V. E. Lynch and L. Garcia Plasma Turbulence Calculations On Supercomputers . . . . . . . . . . . . . 97--110 Y.-Y. Ye and C.-T. Chan and K.-M. Ho and B. N. Harmon Total Energy Calculations for Structural Phase Transformations . . . . . . . . . 111--121 James W. Davenport and Guo-Xin Qian and Gayanath W. Fernando and Michael Weinert First Principles Molecular Dynamics Studies of Liquid and Solid Sodium . . . 122--130
Larry Lee and Sunny Christensen The North Carolina Supercomputing Center: a Study of Economic Development Impact . . . . . . . . . . . . . . . . . 3--8 Sangback Ma and Anthony T. Chronopoulos Implementation of Iterative Methods for Large Sparse Nonsymmetric Linear Systems on a Parallel Vector Machine . . . . . . 9--24 Marco Zaider and David E. Orr and John L. Fry Calculational Aspects of the Assessment of Dielectric Response Function and Energy Loss in Materials: Applications to Ice and Polyacetylene . . . . . . . . 25--39 Jack C. M. Wang and John M. Gary and Hari K. Iyer A Technique to Evaluate Benchmarks: a Case Study Using the Livermore Loops . . 40--55 B. McNamara and K. J. M. Moriarty Computer-Aided Software Development Tools for the Supercomputer Environment 56--70 Robert B. Haber and David A. McNabb and Robert A. Ellis Eliminating Distance in Scientific Computing: An Experiment in Televisualization . . . . . . . . . . . 71--89 Gary Demos Issues in Applying Massively Parallel Computing Power . . . . . . . . . . . . 90--105
Charlotte Froese Fischer Concurrent Vector Algorithms for Spline Solutions of the Helium Pair Equation 5--20 X. W. Wang and Steven G. Louie and Marvin L. Cohen Predicting High-Pressure and Excited-State Properties of Real Materials . . . . . . . . . . . . . . . 21--33 L. G. Ferreira and S.-H. Wei and Alex Zunger Stability, Electronic Structure, and Phase Diagrams of Novel Inter-Semiconductor Compounds . . . . . 34--56 Gregory J. Tawa and Jules W. Moskowitz and Paula A. Whitlock and Kevin E. Schmidt Accurate First Principles Calculation of Many-Body Interactions . . . . . . . . . 57--71 Mutsumi Aoyagi and Ron Shepard and Albert F. Wagner An Ab Initio Theoretical Study of the $ \hbox {CH} + \hbox {H}_2 \rightleftharpoons \hbox {CH}_3^* \rightleftharpoons \hbox {CH}_2 + \hbox {H} $ Reactions . . . . . . . . . . . . 72--89 Hans M. Amman and David A. Kendrick Parallel Processing for Large-Scale Nonlinear Control Experiments in Economics . . . . . . . . . . . . . . . 90--95
Chris Barrett and Frank Bobrowicz and Ralph G. Brickner and Bradley A. Clark and Rajan Gupta and Ann H. Hayes and Harold Trease and Andrew B. White, Jr. Centers of supercomputing --- supercomputing at Los Alamos National Laboratory . . . . . . . . . . . . . . . 3--9 Michael T. Heath and George A. Geist and John B. Drake Early Experience with the Intel iPSC/860 at Oak Ridge National Laboratory . . . . 10--26 James M. Hutchinson and Stavros A. Zenios Financial Simulations on a Massively Parallel Connection Machine . . . . . . 28--46 Maurice Yarrow and Unmeel B. Mehta Multiprocessing on Supercomputers for Computational Aerodynamics . . . . . . . 47--73 Hong-Qiang Ding Simulating Lattice QCD on a Caltech/JPL Hypercube . . . . . . . . . . . . . . . 74--81 Hsieh-Lung Hsu and Hojjat Adeli A Microtasking Algorithm for Optimization of Structures . . . . . . . 82--91 Michael P. Persons and Lawrence L. Halcomb Decoupled asynchronous I/O for data processing applications on supercomputers . . . . . . . . . . . . . 92--95 Nora H. Sabelli Perspectives: Role of High-Performance Computing in Science Education . . . . . 95--98 Anonymous Meetings . . . . . . . . . . . . . . . . 102--103 Anonymous The International Journal of Supercomputer Applications- Information for Contributors . . . . . . . . . . . . 104--105
Joanne L. Martin In Memoriam --- Sidney Fernbach (1917-1991) . . . . . . . . . . . . . . 3--3 Dennis W. Duke Computational Science at the Supercomputer Computations Research Institute . . . . . . . . . . . . . . . 4--12 John M. Dawson and Richard D. Sydora and Viktor K. Decyk and Paulette C. Liewer and Robert D. Ferraro Physics Modeling of Tokamak Transport, a Grand Challenge for Controlled Fusion 13--35 R. T. Scalettar and D. J. Scalapino and R. L. Sugar and S. R. White Quantum Monte Carlo Simulations of a $ \hbox {CuO}_2 $ Model . . . . . . . . . 36--45 Misako Ishiguro Queuing Model Analysis of the Fujitsu VP2000 with Dual Scalar Architecture . . 46--62 D. H. Bailey and E. Barszcz and J. T. Barton and D. S. Browning and R. L. Carter and L. Dagum and R. A. Fatoohi and P. O. Frederickson and T. A. Lasinski and R. S. Schreiber and H. D. Simon and V. Venkatakrishnan and S. K. Weeratunga The NAS Parallel Benchmarks . . . . . . 63--73 Anthony T. Chronopoulos and C. R. Swaminathan and V. R. Voller The Stefan Problem Solved via Conjugate Gradient-like Iterative Methods on a Parallel Vector Machine . . . . . . . . 74--91 M. J. Daydé and I. S. Duff Use of Level 3 BLAS in $ L U $ Factorization in a Multiprocessing Environment on Three Vector Multiprocessors: The Alliant FX/80, the CRAY-2, and the IBM 3090 VF . . . . . . 92--110 Anonymous Meetings . . . . . . . . . . . . . . . . 113--114 Anonymous The International Journal of Supercomputer Applications --- Information for Contributors . . . . . . 115--116
Thomas A. Weber The National Science Foundation Supercomputer Centers Program . . . . . 3--3 Lawrence E. Brandt Centers of supercomputing --- a history and prospectus for the NSF supercomputer centers . . . . . . . . . . . . . . . . 4--9 Paulette Clancy Computer Simulation of Crystal Growth and Dissolution in Metals and Semiconductors . . . . . . . . . . . . . 10--33 M. D. Smooke and V. Giovangigli Numerical Modeling of Axisymmetric Laminar Diffusion Flames by a Parallel Boundary Value Method . . . . . . . . . 34--49 Steven A. Gottlieb and A. Krasnitz and U. M. Heller and A. D. Kennedy and W. Liu and J. B. Kogut and R. L. Renken and D. K. Sinclair and K. C. Wang and R. L. Sugar and D. Toussaint Hadron Thermodynamics on the Connection Machine . . . . . . . . . . . . . . . . 50--60 Claude Bernard and Michael C. Ogilvie and Thomas A. DeGrand and Carleton E. DeTar and Steven A. Gottlieb and A. Krasnitz and R. L. Sugar and D. Toussaint Studying Quarks And Gluons on MIMD Parallel Computers . . . . . . . . . . . 61--70 Lars Hernquist The Fueling of Active Galaxies . . . . . 71--83 Herbert W. Hamber Simulations of Discrete Quantized Gravity . . . . . . . . . . . . . . . . 84--97 Charles L. Brooks III and William S. Young and Douglas J. Tobias Molecular Simulations On Supercomputers 98--112 Anonymous Meetings . . . . . . . . . . . . . . . . 113--114
Joanne L. Martin Editorial . . . . . . . . . . . . . . . 3--3 Bahram Nassersharif Centers of Supercomputing --- Science and Engineering at the Texas A&M University Supercomputer Center . . . . 4--12 Michael W. Berry Large-Scale Sparse Singular Value Computations . . . . . . . . . . . . . . 13--49 Lawrence Sirovich and Richard Everson Management and Analysis of Large Scientific Datasets . . . . . . . . . . 50--68 Krister Dackland and Erik Elmroth and Bo Kågström and Charles Van Loan Parallel block matrix factorizations on the shared-memory multiprocessor IBM 3090 VF/600J . . . . . . . . . . . . . . 69--97 S. K. Kim and A. T. Chronopoulos An Efficient Parallel Algorithm for Extreme Eigenvalues of Sparse Nonsymmetric Matrices . . . . . . . . . 98--111 Cherri M. Pancake What Should We Expect from Parallel Language Standards? . . . . . . . . . . 112--117 Anonymous Announcements . . . . . . . . . . . . . 118--119 Anonymous Meetings . . . . . . . . . . . . . . . . 120--121 Anonymous The International Journal of Supercomputer Applications-Information for Contributors . . . . . . . . . . . . 122--123
Matthew Witten Editorial: the Frankenstein Project: Building a Man in the Machine and the Arrival of the Computational Physician 127--137 Robert Jones Protein Sequence and Structure Comparison on Massively Parallel Computers . . . . . . . . . . . . . . . 138--146 Dean F. Sittig and Mark A. Shifman and Prakash Nadkarni and Perry L. Miller Parallel Computation for Medicine and Biology: Applications of Linda at Yale University . . . . . . . . . . . . . . . 147--163 Richard T. Hart and Z. Maria Oden and Susannah W. Parrish and David B. Burr Computational Methods for Bone Mechanics Studies . . . . . . . . . . . . . . . . 164--174 David Strip and Michael Karasick Solid Modeling On a Massively Parallel Processor . . . . . . . . . . . . . . . 175--192 Jianping Zhu and Yung Ming Chen History Matching for Multiphase Reservoir Models on Shared Memory Supercomputers . . . . . . . . . . . . . 193--206 Anonymous Announcements . . . . . . . . . . . . . 207--207 Anonymous Meetings . . . . . . . . . . . . . . . . 208--209
Donald M. Austin Centers of supercomputing --- the University of Minnesota Army High Performance Computing Research Center 215--223 Hans-Georg Reusch Experiences with the Parallelization and Vectorization of Simulation Codes for Heavy-Ion Reactions . . . . . . . . . . 224--240 Jean-Philippe Brunet and S. Lennart Johnsson All-to-All Broadcast and Applications on the Connection Machine . . . . . . . . . 241--256 Matthew Witten and Robert E. Wyatt Increasing Our Understanding of Biological Models Through Visual and Sonic Representations: a Cortical Case Study . . . . . . . . . . . . . . . . . 257--280 R. C. Brower and C. Rebbi and P. Tamayo and K. J. M. Moriarty and S. Sanielevici Benchmarking High-Performance Computing Systems by Means of Local-Creutz Simulations of the $ d = 2 $ Ising Model 281--287 Scientific Supercomputing Subcommittee, Technical Committee on Supercomputing Applications, IEEE Computer Society NSF Supercomputer Center Study: February 1992 . . . . . . . . . . . . . . . . . . 288--303 Anonymous Largest-Known Prime Number Uncovered . . 304--304 Anonymous Meetings . . . . . . . . . . . . . . . . 305--307
Jack Dongarra Editorial . . . . . . . . . . . . . . . 313--313 Kevin Timson and Ann Redelfs Centers of supercomputing --- Center for Research on Parallel Computation . . . . 314--321 S. Lennart Johnsson and Luis F. Ortiz Local Basic Linear Algebra Subroutines (LBLAS) for distributed memory architectures and languages with array syntax . . . . . . . . . . . . . . . . . 322--350 Roberto Ansaloni and Stefano Evangelisti and Giuseppe Paruolo and Elda Rossi Efficient Parallel Implementation of a Full Configuration Interaction Algorithm for Circular Polyenes on a CRAY Y-MP . . 351--360 K. J. M. Moriarty and S. Sanielevici and D. W. Kuba Parallel Processing and the Sustained Production Performance of the CRAY Y-MP: Benchmarks Using Optimized Microtasked Lattice SU(3) Code . . . . . . . . . . . 361--370 S. Y. Moon and C. S. Yoon and T. J. Chung Multitasking for Local Parallelism in Applications to Chemically Reacting Supersonic Flows on CRAY Y-MP . . . . . 371--382 Skef Wholey and Clifford Lasser and Gyan Bhanot Correspondence: FLO67: a Case Study in Scalable Programming . . . . . . . . . . 383--388 Anonymous Announcements . . . . . . . . . . . . . 389--389 Anonymous Meetings . . . . . . . . . . . . . . . . 390--391 Anonymous The International Journal of Supercomputer Applications- . . . . . . 392--406 S. K. Kim and A. T. Chronopoulos An Efficient Parallel Algorithm for Extreme Eigenvalues of Sparse Nonsymmetric Matrices . . . . . . . . . 407--420 Anonymous Perspectives . . . . . . . . . . . . . . 421--426 Anonymous Announcements . . . . . . . . . . . . . 427--428 Anonymous Meetings . . . . . . . . . . . . . . . . 429--430 Anonymous The International Journal of Supercomputer Applications- . . . . . . 431--432
J. B. Drake and G. A. Geist and H. R. Hicks and K. L. Kliewer and G. M. Stocks and L. E. Toran and P. H. Worley The Center for Computational Sciences at Oak Ridge National Laboratory . . . . . 3--14 Shiwei Zhang and M. H. Kalos Exact Monte Carlo Calculations for Fermions on a Parallel Machine . . . . . 15--24 C.-S. Chang and G. De Titta and H. Hauptman and R. Miller and P. Thuman and C. Weeks Using Parallel Computers to Solve the Phase Problem of X-Ray Crystallography 25--49 Fongray Frank Young and Chwan-Hwa ``John'' Wu A Fully Vectorized Code for Nonequilibrium RF Glow Discharge Fluid Modeling and Its Parallel Processing on a CRAY X-MP . . . . . . . . . . . . . . 50--63 Patrick R. Amestoy and Iain S. Duff Memory Management Issues in Sparse Multifrontal Methods on Multiprocessors 64--82
Juli Raw and Donald C. Aston and Karinne W. Gordon and Kyle Wheeler The 0th Heterogeneous Computing Challenge: Fun and (Sometimes Too Much) Excitement . . . . . . . . . . . . . . . 91--96 Gary A. Mastin and Steven J. Plimpton and Dennis C. Ghiglia A Massively Parallel Digital Processor for Spotlight Synthetic Aperture Radar 97--112 Alan Edelman Large Dense Numerical Linear Algebra in 1993: The Parallel Computing Influence 113--128 Mark T. Jones and Paul E. Plassmann Computation of Equilibrium Vortex Structures for Type-II Superconductors 129--143 Charis Gantes and Jerome J. Connor and Robert D. Logcher Simulation of the Deployment Process of Multiunit Deployable Structures on a CRAY-2 . . . . . . . . . . . . . . . . . 144--154 H. Adeli and S. L. Hung A Concurrent Adaptive Conjugate Gradient Learning Algorithm on MIMD Shared-Memory Machines . . . . . . . . . . . . . . . . 155--166 Lincoln Gray and Scott Klasky and Robert Byers Visualizing Complex Patterns in the Spread of Head and Neck Cancers . . . . 167--178 Anonymous Message-Passing Interface . . . . . . . 179--179 Anonymous Meetings . . . . . . . . . . . . . . . . 180--181 Anonymous The International Journal of Supercomputer Applications- . . . . . . 182--183
Anna Nagurney Introduction To the Special Issue . . . 187--188 Mahmoud A. El-Gamal and Richard D. McKelvey and Thomas R. Palfrey Computational Issues in the Statistical Design and Analysis of Experimental Games . . . . . . . . . . . . . . . . . 189--200 Hans M. Amman and David A. Kendrick Forward Looking Behavior and Learning in Stochastic Control . . . . . . . . . . . 201--211 Ayse Imrohoroglu and Selahattin Imrohoroglu and Douglas H. Joines A Numerical Algorithm for Solving Models with Incomplete Markets . . . . . . . . 212--230 Vassilis Argyrou Hajivassiliou Simulating normal rectangle probabilities and their derivatives: Effects of vectorization . . . . . . . . 231--253 Manfred Gilli and Giorgio Pauletto Econometric Model Simulation On Parallel Computers . . . . . . . . . . . . . . . 254--264 Agapi L. Somwaru and Kenneth Hanson Globally convex agricultural production system: parameter estimation . . . . . . 265--271
K. Lowther and J. C. Salem and J. A. Sethian Interactive, animated visualization environment for three-dimensional fluid flow . . . . . . . . . . . . . . . . . . 277--291 Yan Huo and Robert Schreiber Efficient, Massively Parallel Eigenvalue Computation . . . . . . . . . . . . . . 292--303 George Delic Performance Attributes for Code and Workload Analysis on CRAY X-MP and Y-MP Systems . . . . . . . . . . . . . . . . 304--336 C.-H. Lai Domain decomposition methods for semiconductor device problems on a Cray S-MP . . . . . . . . . . . . . . . . . . 337--348 Soren S. Nielsen and Stavros A. Zenios Massively Parallel Proximal Algorithms for Solving Linear Stochastic Network Programs . . . . . . . . . . . . . . . . 349--364 Anonymous Meetings . . . . . . . . . . . . . . . . 365--366
Joanne L. Martin Editorial . . . . . . . . . . . . . . . 3--4 Frederick H. Hausheer Introduction to the Theme Issue . . . . 5--5 Terry R. Stouch and Howard E. Alper and Donna Bassolino-Klimas Supercomputing Studies of Biomembranes 6--23 Raul E. Cachau and Rick Gussio and John A. Beutler and Gwendolyn N. Chmurny and Bruce D. Hilton and Gary M. Muschik and John W. Erickson Solution Structure of Taxol Determined Using a Novel Feedback-Scaling Procedure for NOE-Restrained Molecular Dynamics 24--34 Salvatore Profeta, Jr. and Rayomand J. Unwalla and Daniel J. Russell Relative energies and structural features of small amines and their ammonium analogs: Results from 6-31G* optimizations and an MM2 ammonium force field . . . . . . . . . . . . . . . . . 35--46 John E. Mertz and B. Montgomery Pettitt Molecular Dynamics At a Constant pH . . 47--53 Ai Chen and Cynthia S. Hirtzel Massively Parallel Monte Carlo Simulations on CM2 for Gas Adsorption in Zeolite Molecular Sieves . . . . . . . . 54--63 Manish Deshpande and Jinzhang Feng and Charles L. Merkle and Ashish Deshpande Application of a Distributed Network in Computational Fluid Dynamic Simulations 64--67 Anonymous The International Journal of Supercomputer Applications and High Performance Computing . . . . . . . . . 68--69
Ken Kennedy and Kevin Timson Centers of supercomputing --- making parallel computing truly usable: research, education, and knowledge transfer at the Center for Research on Parallel Computation . . . . . . . . . . 73--79 Mani Chandy and Ian Foster and Ken Kennedy and Charles Koelbel and Chau-Wen Tseng Integrated Support for Task and Data Parallelism . . . . . . . . . . . . . . 80--98 Jaeyoung Choi and Jack J. Dongarra and Roldan Pozo and Danny C. Sorensen and David W. Walker CRPC Research into Linear Algebra Software for High Performance Computers 99--118 Ulrich Kremer and Marcelo Ramé Compositional Oil Reservoir Simulation in Fortran D: a Feasibility Study On Intel iPSC/860 . . . . . . . . . . 119--128 John K. Salmon and Michael S. Warren and Gregoire S. Winckelmans Fast Parallel Tree Codes for Gravitational and Fluid Dynamical $N$-Body Problems . . . . . . . . . . . 129--142 B. Averick and C. Bischof and B. Bixby and A. Carle and J. Dennis and M. El-Alem and A. El-Bakry and A. Griewank and G. Johnson and R. Lewis and J. Moré and R. Tapia and V. Torczon and K. Williamson Numerical Optimization at the Center for Research on Parallel Computation . . . . 143--153 Anonymous Supercomputer applications and High Performance Computing- . . . . . . . . . 154--155
Anonymous MPI: a Message-Passing Interface Standard . . . . . . . . . . . . . . . . 159--416
Luca F. Pavarino and Marcelo Ramé Numerical Experiments With an Overlapping Additive Schwarz Solver for $3$-D Parallel Reservoir Simulation . . 3--17 Marie-Odile Bristeau and Jocelyne Erhel and Philippe Féat and Roland Glowinski and Jacques Périaux Solving the Helmholtz Equation At High-Wave Numbers On a Parallel Computer With a Shared Virtual Memory . . . . . . 18--28 Bruce A. Shapiro and Jih-Hsiang Chen and Tim Busse and Joseph Navetta and Wojciech Kasprzak and Jacob V. Maizel, Jr. Optimization and Performance Analysis of a Massively Parallel Dynamic Programming Algorithm for RNA Secondary Structure Prediction . . . . . . . . . . . . . . . 29--39 Yu-Chung Chang and Tony F. Chan Performance Modeling for High-Order Finite Difference Methods on the Connection Machine CM-2 . . . . . . . . 40--57 W. F. Wong and Yoshio Oyanagi and Eiichi Goto Evaluation of the Hitachi S-3800 Supercomputer Using Six Benchmarks . . . 58--70 U. Kremer and M. Ramé Erratum: Compositional Oil Reservoir Simulation in Fortran D: a Feasibility Study on Intel iPSC/860 . . . . . . . . 71--71 Anonymous The International Journal of Supercomputer Applications and High Performance Computing- . . . . . . . . . 72--73
Louis H. Turcotte Introduction . . . . . . . . . . . . . . 77--78 Anthony Skjellum and Ewing Lusk and William Gropp Early applications in the Message-Passing Interface (MPI) . . . . 79--94 Steven A. Moyer and Vaidy S. Sunderam Parallel I/O as a Parallel Application 95--107 Adam Beguelin and Jack Dongarra and Al Geist and Robert Manchek and Vaidy Sunderam Recent Enhancements to PVM . . . . . . . 108--127 P. Dragovitsch and X. Zhao and L. C. Dennis and G. A. Riccardi PVMGeant --- a Parallel Simulation Code for the CLAS Detector at CEBAF . . . . . 128--137 Timothy G. Mattson Programming Environments for Parallel and Distributed Computing: a Comparison of P4, PVM, Linda, and TCGMSG . . . . . 138--161
Alexandre Ern and Craig C. Douglas and Mitchell D. Smooke Detailed Chemistry Modeling of Laminar Diffusion Flames on Parallel Computers 167--186 Thomas A. Cortese and S. Balachandar High Performance Spectral Simulation of Turbulent Flows in Massively Parallel Machines with Distributed Memory . . . . 187--204 Vincent Bouchitté and Pierre Boulet and Alain Darte and Yves Robert Evaluating Array Expressions on Massively Parallel Machines with Communication/Computation Overlap . . . 205--219 Henry Ker-Chang Chang and Chung-Yu Liou Parallel Implementation of Linear Quadtree Codes Using the nCube 2 Supercomputer System . . . . . . . . . . 220--231 Anonymous The International Journal of Supercomputer Applications and High Performance Computing . . . . . . . . . 232--233
Ember Uziel and Michael W. Berry Parallel Models of Animal Migration in Northern Yellowstone National Park . . . 237--255 Santhosh Kumaran and Robert N. Miller A Comparison of Parallelization Techniques for a Finite Element Quasigeostrophic Model of Regional Ocean Circulation . . . . . . . . . . . . . . 256--279 Chris H. Walshaw and Mark Cross and Martin G. Everett A Localized Algorithm for Optimizing Unstructured Mesh Partitions . . . . . . 280--295 Sridhar Chirravuri and Suchendra M. Bhandarkar and David Whitmire A Massively Parallel Algorithm for $K_2$ Entropy Computation: Case Studies of Model Systems and In Vivo Data . . . 296--311 Ann-Marie Mårtensson-Pendrill Perspectives: Turnaround times at a supercomputing center . . . . . . . . . 312--314
Yu Hu and S. Lennart Johnsson A Data-Parallel Implementation of Hierarchical $N$-Body Methods . . . . 3--40 Andreas Stathopoulos and Anders B. Ynnerman and Charlotte Froese Fischer A PVM Implementation of the MCHF Atomic Structure Package . . . . . . . . . . . 41--61 Thomas Rauber and Gudula Rünger Parallel Implementations of Iterated Runge--Kutta Methods . . . . . . . . . . 62--90 George Delic and Richard I. Haller Factor Analysis of Applications Performance Data for the Cray Y-MP . . . 91--113
Thomas A. DeFanti and Ian Foster and Michael E. Papka and Rick Stevens and Tim Kuhfuss Overview of the I-WAY: Wide-Area Visual Supercomputing . . . . . . . . . . . . . 123--131 Michael L. Norman and Peter Beckman and Greg Bryan and John Dubinski and Dennis Gannon and Lars Hernquist and Kate Keahey and Jeremiah P. Ostriker and John Shalf and Joel Welling and Shelby Yang Galaxies Collide on the I-WAY: An Example of Heterogeneous Wide-Area Collaborative Supercomputing . . . . . . 132--144 Valerie E. Taylor and Milana Huang and Thomas Canfield and Rick Stevens and Daniel Reed and Stephen Lamm Performance Modeling of Interactive, Immersive Virtual Environments for Finite Element Simulations . . . . . . . 145--156 George A. Geist II and James A. Kohl and Donald M. C. Nicholson and Philip M. Papadopoulos and Bart D. Semeraro and William A. Shelton and G. Malcolm Stocks and Yang Wang Early Experiences with Distributed Supercomputing on I-WAY: First Principles Materials Science and Parallel Acoustic Wave Propagation . . . 157--169 Stephen J. Young and Gary Guo You Fan and David Hessler and Stephan Lamont and T. Todd Elvins and Martin Hadida-Hassan and Gary Alan Hanyzewski and James W. Durkin and Philip Hubbard and Gordon Kindlmann and Eric Wong and Donald Greenberg and Sidney Karin and Mark H. Ellisman Implementing a Collaboratory for Microscopic Digital Anatomy . . . . . . 170--181 Gary D. Kerbel and Tim Pierce and J. L. Milovich and Dan E. Shumaker and Alan Verlo and Ronald E. Waltz and Gregory W. Hammett and Mike A. Beer and Bill Dorland Interactive Scientific Exploration of Gyrofluid Tokamak Turbulence . . . . . . 182--198 Glen H. Wheless and Cathy M. Lascara and Arnoldo Valle-Levinson and Donald P. Brutzman and William Sherman and William L. Hibbard and Brian E. Paul The Chesapeake Bay Virtual Environment (CBVE): Initial Results from the Prototypical System . . . . . . . . . . 199--210 William L. Hibbard and John Anderson and Ian Foster and Brian E. Paul and Robert Jacob and Chad Schafer and Mary K. Tyree Exploring Coupled Atmosphere-Ocean Models Using Vis5D . . . . . . . . . . . 211--222 Darin Diachin and Lori Freitag and Daniel Heath and Jim Herzog and William Michels and Paul Plassmann Collaborative Virtual Environments Used in the Design of Pollution Control Systems . . . . . . . . . . . . . . . . 223--235 Richard M. Crutcher and M. Pauline Baker and George Baxter and John Pixton and Raymond Plante and Harold Ravlin and Douglas Roberts and Randall Sharpe Radio Synthesis Imaging: a Grand Challenge HPCC Project . . . . . . . . . 236--245 Anonymous Information for Contributors . . . . . . 246--247
Mark T. Nelson and William F. Humphrey and Attila Gursoy and Andrew Dalke and Laxmikant V. Kalé and Robert D. Skeel and Klaus Schulten NAMD: a Parallel Object-Oriented Molecular Dynamics Program . . . . . . . 251--268 Susan Burgee and Anthony A. Giunta and Vladimir Balabanov and Bernard Grossman and William H. Mason and Robert Narducci and Raphael T. Haftka and Layne T. Watson A Coarse-Grained Parallel Variable-Complexity Multidisciplinary Optimization Paradigm . . . . . . . . . 269--299 David Kramer and S. Lennart Johnsson and Yu Hu Local Basic Linear Algebra Subroutines (LBLAS) for the CM-5/SE . . . . . . . . 300--335 Andrew Ilin and L. Ridgway Scott Correspondence: Loop Splitting for High Performance Computers . . . . . . . . . 336--340 Anonymous The International Journal of Supercomputer Applications and High Performance Computing: Information for Contributors . . . . . . . . . . . . . . 341--342 Anonymous Index: Volume 10 . . . . . . . . . . . . 343--345
Jan Clinckemaillie and Birgit Elsner and Guy Lonsdale and Serge Meliciani and Stefanos Vlachoutsis and Frank de Bruyne and Michael Holzner Performance Issues of the Parallel PAM-CRASH Code . . . . . . . . . . . . . 3--11 Susan E. Dorward and Lesley R. Matheson and Robert E. Tarjan Toward Efficient Unstructured Multigrid Preprocessing . . . . . . . . . . . . . 12--33 Jeffrey M. Constantin and Michael W. Berry and Bradley T. Vander Zanden Parallelization of the Hoshen--Kopelman Algorithm Using a Finite State Machine 34--48 Michael T. Heath and Padma Raghavan Performance of a Fully Parallel Sparse Solver . . . . . . . . . . . . . . . . . 49--64 Paul Fischer and David Gottlieb On the Optimal Number of Subdomains for Hyperbolic Problems on Parallel Computers . . . . . . . . . . . . . . . 65--76
Jack Dongarra and Bernard Tourancheau Preface To the Special Issue . . . . . . 83--83 Cherri M. Pancake Can Users Play an Effective Role in Parallel Tools Research? . . . . . . . . 84--94 Jean-Luc Dekeyser and Christian Lefebvre HPF-Builder: a Visual Environment to Transform Fortran 90 Codes to HPF . . . 95--102 William Gropp and Ewing Lusk Sowing MPICH: a Case Study in the Dissemination of a Portable Environment for Parallel Scientific Computing . . . 103--114 Ian Foster and Carl Kesselman Globus: a Metacomputing Infrastructure Toolkit . . . . . . . . . . . . . . . . 115--128 Andrew S. Grimshaw and Anh Nguyen-Tuong and Mark J. Lewis and M. Hyett Campus-Wide Computing: Early Results Using Legion at the University of Virginia . . . . . . . . . . . . . . . . 129--143 Oleg Y. Nickolayev and Philip C. Roth and Daniel A. Reed Real-Time Statistical Clustering for Event Trace Reduction . . . . . . . . . 144--159 Thomas Ludwig and Roland Wismüller and Michael Oberhuber and Arndt Bode An Open Interface for the On-Line Monitoring of Parallel and Distributed Programs . . . . . . . . . . . . . . . . 160--174 Anonymous The International Journal of Supercomputer Applications and High Performance Computing: Information for Contributors . . . . . . . . . . . . . . 175--176
Janice E. Cuny and Robert A. Dunn and Steven T. Hackstadt and Christopher W. Harrop and Harold H. Hersey and Allen D. Malony and Douglas R. Toomey Building Domain-Specific Environments for Computational Science: a Case Study in Seismic Tomography . . . . . . . . . 179--196 Françoise Tisseur Parallel Implementation of the Yau and Lu Method for Eigenvalue Computation . . 197--204 Pierre Manneback Solving Irregular Sparse Linear Systems on a Multicomputer Using the CGNR Method 205--211 Henri Casanova and Jack Dongarra NetSolve: a Network-Enabled Server for Solving Computational Science Problems 212--223 G. A. Geist II and James Arthur Kohl and Philip M. Papadopoulos CUMULVS: Providing Fault Tolerance, Visualization, and Steering of Parallel Applications . . . . . . . . . . . . . . 224--235 Karsten M. Decker and Brian J. N. Wylie Software Tools for Scalable Multilevel Application Engineering . . . . . . . . 236--250 Roldan Pozo Template Numerical Toolkit for Linear Algebra: High Performance Programming with C++ and the Standard Template Library . . . . . . . . . . . . . . . . 251--263 Anonymous The International Journal of Supercomputer Applications and High Performance Computing: Information for Contributors . . . . . . . . . . . . . . 264--265
Adrian Colbrook and Iain Duff and Tony Hey and Klaus Stüben and Clemens-August Thole Editorial . . . . . . . . . . . . . . . 275--276 Thierry Coupez and Stéphane Marie From a Direct Solver to a Parallel Iterative Solver in 3-D Forming Simulation . . . . . . . . . . . . . . . 277--285 Michel Géradin and Danielle Coulon and Jean-Pierre Delsemme Parallelization of the SAMCEF Finite Element Software through Domain Decomposition and FETI Algorithm . . . . 286--298 Dag Fritzson and Peter Fritzson and Patrik Nordling and Tommy Persson Rolling Bearing Simulation on MIMD Computers . . . . . . . . . . . . . . . 299--313 C. Addison and E. Appiani and R. Cook and M. Corvi and P. G. N. Howard and B. Stephens Parallel SAR Image Enhancement . . . . . 314--327 Markus Ast and T. Jerez and Jesus Labarta and Hartmut Manz and Andres Pérez and Uwe Schulz and Jaume Solé Runtime Parallelization of the Finite Element Code PERMAS . . . . . . . . . . 328--335 Anders Ytterström A Tool for Partitioning Structured Multiblock Meshes for Parallel Computational Mechanics . . . . . . . . 336--343 Mike C. Dracopoulos and Craig Glasgow and A. Kevin Parrott and Rick Janssen and Piergiorgio Alotto and John Simkin Bulk Synchronous Parallelization of Industrial Electromagnetic Software . . 344--358 Anonymous Index to Volume 11 . . . . . . . . . . . 359--361 Anonymous The International Journal of Supercomputer Applications and High Performance Computing . . . . . . . . . 362--363
MPI Forum Special Issue: MPI2: a Message-Passing Interface Standard . . . . . . . . . . . 1--299
David Mackay and G. Mahinthakumar and Ed D'Azevedo A Study of I/O in a Parallel Finite Element Groundwater Transport Code . . . 307--319 P. Lockey and R. Proctor and I. D. James Characterization of I/O Requirements in a Massively Parallel Shelf Sea Model . . 320--332 Ron A. Oldfield and David E. Womble and Curtis C. Ober Efficient Parallel I/O in Seismic Processing . . . . . . . . . . . . . . . 333--344 Jarek Nieplocha and Ian Foster and Rick A. Kendall ChemIO: High Performance Parallel I/O for Computational Chemistry Applications 345--363 Huseyin Simitci and Daniel A. Reed A Comparison of Logical and Physical Parallel I/O Patterns . . . . . . . . . 364--380 Anonymous The International Journal of High Performance Computing Applications: Information for Contributors . . . . . . 381--382
Rajeev Thakur and Ewing Lusk and William Gropp I/O in Parallel Applications: The Weakest Link . . . . . . . . . . . . . . 389--395 G. Davis and L. Lau and R. Young and F. Duncalfe and L. Brebber Parallel Run Length Encoding Compression: Reducing I/O in Dynamic Environmental Simulations . . . . . . . 396--410 Meenakshi A. Kandaswamy and Mahmut T. Kandemir and Alok N. Choudhary and David E. Bernholdt An Experimental Study to Analyze and Optimize Hartree--Fock Application's I/O with PASSION . . . . . . . . . . . . . . 411--439 Anonymous Meetings . . . . . . . . . . . . . . . . 440--445 Anonymous Index to International Journal of High Performance Computing Applications . . . 446--447
Colin J. Aro and Garry H. Rodrigue and Douglas A. Rotman A High Performance Chemical Kinetics Algorithm for $3$-D Atmospheric Models 3--15 R. Alan McCoy and Yuefan Deng Parallel Particle Simulations of Thin-Film Deposition . . . . . . . . . . 16--32 Alex R. Carrillo and John E. West and David A. Horner and John F. Peters Interactive Large-Scale Soil Modeling Using Distributed High Performance Computing Environments . . . . . . . . . 33--48 Ranieri Baraglia and Renato Ferrini and Domenico Laforenza and Antonio Laganà On the Optimization of a Pipeline Model to Integrate a Reduced-Dimensionality Schrödinger Equation for Distributed Memory Architectures . . . . . . . . . . 49--62 G. Wang and Danesh K. Tafti Performance Enhancement on Microprocessors with Hierarchical Memory Systems for Solving Large Sparse Linear Systems . . . . . . . . . . . . . . . . 63--79 S. F. Ashby and W. J. Bosl and R. D. Falgout and S. G. Smith and A. F. B. Tompson and T. J. Williams A Numerical Simulation of Groundwater Flow and Contaminant Transport on the Cray T3D and C90 Supercomputers . . . . 80--93
Sandra Baldini and Luc Giraud and Javier G. Izaguirre and Jose M. Jimenez and Luis M. Matey High Performance Computing in Multibody System Design . . . . . . . . . . . . . 99--106 Stephen T. Barnard and Luis M. Bernardo and Horst D. Simon An MPI Implementation of the SPAI Preconditioner on the T3E . . . . . . . 107--123 Eleanor Chu Impact of Physical/Logical Network Topology on Parallel Matrix Computation 124--145 Dror G. Feitelson On the Interpretation of Top500 Data . . 146--153 Aiichiro Nakano A Rigid-Body-Based Multiple Time Scale Molecular Dynamics Simulation of Nanophase Materials . . . . . . . . . . 154--162 Kevin R. Wadleigh High Performance FFT Algorithms for Cache-Coherent Multiprocessors . . . . . 163--171
Jack J. Dongarra and Bernard Tourancheau Special Issue Introduction: Clusters and Computational Grids for Scientific Computing . . . . . . . . . . . . . . . 179--179 Frederica Darema New Software Technologies for the Development and Runtime Support of Complex Applications . . . . . . . . . . 180--190 Thomas Sterling and Daniel F. Savarese From Toys to Teraflops: Bridging the Beowulf Gap . . . . . . . . . . . . . . 191--200 A. Chien and M. Lauria and R. Pennington and M. Showerman and G. Iannello and M. Buchanan and K. Connelly and L. Giannini and G. Koenig and S. Krishnamurthy and Q. Liu and S. Pakin and G. Sampemane Design and Evaluation of an HPVM-Based Windows NT Supercomputer . . . . . . . . 201--219 Jim Basney and Miron Livny Improving Goodput by Coscheduling CPU and Network Capacity . . . . . . . . . . 220--230 Henri Casanova and MyungHo Kim and James S. Plank and Jack J. Dongarra Adaptive Scheduling for Task Farming with Grid Middleware . . . . . . . . . . 231--240 Paul A. Gray and Vaidy S. Sunderam Metacomputing with the IceT System . . . 241--252 Alan Su and Francine Berman and Richard Wolski and Michelle Mills Strout Using AppLeS to Schedule Simple SARA on the Computational Grid . . . . . . . . . 253--262 Ariel Tamches and Barton P. Miller Using Dynamic Kernel Instrumentation for Kernel and Application Tuning . . . . . 263--276 Omer Zaki and Ewing Lusk and William Gropp and Deborah Swider Toward Scalable Performance Visualization with Jumpshot . . . . . . 277--288
Jeffrey L. Tilson and Mike Minkoff and Albert F. Wagner and Ron Shepard and Paul Sutton and Robert J. Harrison and Ricky A. Kendall and Adrian T. Wong High-Performance Computational Chemistry: Hartree--Fock Electronic Structure Calculations on Massively Parallel Processors . . . . . . . . . . 291--302 A. K. Dhingra and M. Zhang and R. Ratnam and D. Suri A Coarse-Grained Parallel Homotopy for Mechanism Design . . . . . . . . . . . . 303--319 Toshiya Kimura and Hiroshi Takemiya Distributed Parallel Computing for Fluid-Structure Coupled Simulations on a Heterogeneous Parallel Computer Cluster 320--333 C. Walshaw and M. Cross and R. Diekmann and F. Schlimbach Multilevel Mesh Partitioning for Optimizing Domain Shape . . . . . . . . 334--353 George D. Byrne and Alan C. Hindmarsh Correspondence: PVODE, an ODE Solver for Parallel Computers . . . . . . . . . . . 354--365 Anonymous Index to International Journal of High Performance Computing Applications, Volume 13 . . . . . . . . . . . . . . . 366--368
Hany H. Ammar and Zhouhui Miao Parallel Algorithms for the Training Process of a Neural Network-Based System 3--25 M. Scot Breitenfeld and Philippe H. Geubelle Parallel Implementation of a Spectral Scheme for Simulations of 3-D Dynamic Fracture Events . . . . . . . . . . . . 26--38 Sangback Ma Comparisons of the ILU(0), Point-SSOR, and SPAI Preconditioners on the CRAY-T3E for Nonsymmetric Sparse Linear Systems Arising from PDEs on Structured Grids 39--48 Steve W. Bova and Clay P. Breshears and Christine E. Cuicchi and Zeki Demirbilek and Henry A. Gabb Dual-Level Parallel Analysis of Harbor Wave Response Using MPI and OpenMP . . . 49--64 Weian Deng and S. Sitharama Iyengar and Nathan E. Brener A Fast Parallel Thinning Algorithm for the Binary Image Skeletonization . . . . 65--81
Tony Chan and Victor Eijkhout Design of a Library of Parallel Preconditioners . . . . . . . . . . . . 91--101 William Gropp and David Keyes and Lois Curfman McInnes and M. D. Tidriri Globalized Newton--Krylov--Schwarz Algorithms and Software for Parallel Implicit CFD . . . . . . . . . . . . . . 102--136 Kevin McManus and Mark Cross and Chris Walshaw and Steve Johnson and Peter Leggett A Scalable Strategy for the Parallelization of Multiphysics Unstructured Mesh-Iterative Codes on Distributed-Memory Systems . . . . . . . 137--174
Frederica Darema and Jack Dongarra and Subhash Saini Preface . . . . . . . . . . . . . . . . 179--179 Frederica Darema Performance Engineering Technology for the Design, Management, and Control of Computing Systems . . . . . . . . . . . 180--188 S. Browne and J. Dongarra and N. Garner and G. Ho and P. Mucci A Portable Programming Interface for Performance Evaluation on Modern Processors . . . . . . . . . . . . . . . 189--204 Tony Hey and David Lancaster The Development of Parkbench and Performance Prediction . . . . . . . . . 205--215 Tahsin Kurc and Mustafa Uysal and Hyeonsang Eom and Jeff Hollingsworth and Joel Saltz and Alan Sussman Efficient Performance Prediction for Large-Scale Data-Intensive Applications 216--227 G. R. Nudd and D. J. Kerbyson and E. Papaefstathiou and S. C. Perry and J. S. Harper and D. V. Wilcox PACE --- a Toolset for the Performance Prediction of Parallel and Distributed Systems . . . . . . . . . . . . . . . . 228--251 Lewis Mackenzie and Mohamed Ould-Khaoua Comparative Modeling of Network Topologies and Routing Strategies in Multicomputers . . . . . . . . . . . . . 252--267 Kento Aida and Atsuko Takefusa and Hidemoto Nakada and Satoshi Matsuoka and Satoshi Sekiguchi and Umpei Nagashima Performance Evaluation Model for Scheduling in Global Computing Systems 268--279
J. C. Browne and E. Berger and A. Dube Compositional Development of Performance Models in POEMS . . . . . . . . . . . . 283--291 Daniel A. Menascé Web Performance Modeling Issues . . . . 292--303 Vikram Adve and Rizos Sakellariou Application Representations for Multiparadigm Performance Modeling of Large-Scale Parallel Scientific Codes 304--316 Bryan Buck and Jeffrey K. Hollingsworth An API for Runtime Code Patching . . . . 317--329 Adolfy Hoisie and Olaf Lubeck and Harvey Wasserman Performance and Scalability Analysis of Teraflop-Scale Parallel Architectures Using Multidimensional Wavefront Applications . . . . . . . . . . . . . . 330--346 Katarzyna Keahey and Peter Beckman and James Ahrens Ligature: Component Architecture for High Performance Applications . . . . . 347--356 Jeffrey S. Vetter and Daniel A. Reed Real-Time Performance Monitoring, Adaptive Control, and Interactive Steering of Computational Grids . . . . 357--366 Neil J. Gunther The Dynamics of Performance Collapse in Large-Scale Networks and Computers . . . 367--372 Anonymous Index to International Journal of High Performance Computing Applications, Volume 14 . . . . . . . . 373--375
Rajive Bagrodia and Ewa Deelman and Thomas Phan Parallel Simulation of Large-Scale Parallel Applications . . . . . . . . . 3--12 Jason Abate and Peng Wang and Kamy Sepehrnoori Parallel Compositional Reservoir Simulation on Clusters of PCs . . . . . 13--21 David Kerlick and Eric Dillon and David Levine Performance Testing of a Parallel Multiblock CFD Solver . . . . . . . . . 22--35 Luc Giraud and Ronan Guivarch and Joël Stein Parallel Distributed FFT-Based Solvers for $3$-D Poisson Problems in Meso-Scale Atmospheric Simulations . . . . . . . . 36--46 Jen-Chih Lin and Huan-Chao Keh Reconfiguration of Complete Binary Trees in Full IEH Graphs and Faulty Hypercubes 47--55 Edmond Chow Parallel Implementation and Practical Use of Sparse Approximate Inverse Preconditioners with a Priori Sparsity Patterns . . . . . . . . . . . . . . . . 56--74 Mihai Horoi and Richard J. Enbody Using Amdahl's Law as a Metric to Drive Code Parallelization: Two Case Studies 75--80
Mark Baker Preface . . . . . . . . . . . . . . . . 91--91 Thomas Sterling An Introduction to PC Clusters for High Performance Computing . . . . . . . . . 92--101 Amy Apon and Mark Baker Network Technologies . . . . . . . . . . 102--114 Steve Chapin and Joachim Worringen Operating Systems . . . . . . . . . . . 115--123 Rajkumar Buyya and Toni Cortes and Hai Jin Single System Image . . . . . . . . . . 124--135 Mark Baker and Amy Apon Middleware . . . . . . . . . . . . . . . 136--142 Anthony Skjellum and Rossen Dimitrov and Srihari Venkata Angaluri and David Lifka and George Coulouris and Putchong Uthayopas and Stephen L. Scott and Rasit Eskicioglu Systems Administration . . . . . . . . . 143--161 Erich Schikuta and Helmut Wanek Parallel I/O . . . . . . . . . . . . . . 162--168 Ira Pramanick High Availability . . . . . . . . . . . 169--174 Jack Dongarra and Shirley Moore and Anne Trefethen Numerical Libraries and Tools for Scalable Parallel Cluster Computing . . 175--180 David A. Bader and Robert Pennington Applications . . . . . . . . . . . . . . 181--185 Daniel S. Katz and Jeremy Kepner Embedded/Real-Time Systems . . . . 186--190 Anonymous Appendixes: Appendix A: Linux, Windows NT, AIX, Solaris; Appendix B: Compilers and Preprocessors, MPI Implementations, Development Environments, Debuggers, Performance Analyzers . . . . . . . . . 191--194
Jack Dongarra and Bernard Tourancheau Preface . . . . . . . . . . . . . . . . 199--199 Ian Foster and Carl Kesselman and Steven Tuecke The Anatomy of the Grid: Enabling Scalable Virtual Organization . . . . . 200--222 William E. Johnston Using Computing and Data Grids for Large-Scale Science and Engineering . . 223--242 Henri Casanova and Thomas M. Bartol, Jr. and Joel Stiles and Francine Berman Distributing MCell Simulations on the Grid . . . . . . . . . . . . . . . . . . 243--257 Rich Wolski and James S. Plank and John Brevik and Todd Bryan Analyzing Market-Based Resource Allocation Strategies for the Computational Grid . . . . . . . . . . . 258--281 Thomas Sterling and Daniel S. Katz and Larry Bergman High Performance Computing Systems for Autonomous Spaceborne Missions . . . . . 282--296 Jean-Yves Berthou and Eric Fayolle Comparing OpenMP, HPF, and MPI Programming: a Study Case . . . . . . . 297--309 Olivier Beaumont and Arnaud Legrand and Fabrice Rastello and Yves Robert Static $ L U $ Decomposition on Heterogeneous Platforms . . . . . . . . 310--323
Francine Berman and Andrew Chien and Keith Cooper and Jack Dongarra and Ian Foster and Dennis Gannon and Lennart Johnsson and Ken Kennedy and Carl Kesselman and John Mellor-Crummey and Dan Reed and Linda Torczon and Rich Wolski The GrADS Project: Software Support for High-Level Grid Application Development 327--344 Gabrielle Allen and David Angulo and Ian Foster and Gerd Lanfermann and Chuang Liu and Thomas Radke and Ed Seidel and John Shalf The Cactus Worm: Experiments with Dynamic Resource Discovery and Allocation in a Grid Environment . . . . 345--358 Antoine Petitet and Susan Blackford and Jack Dongarra and Brett Ellis and Graham Fagg and Kenneth Roche and Sathish Vadhiyar Numerical Libraries and the Grid . . . . 359--374 Matei Ripeanu and Adriana Iamnitchi and Ian Foster Performance Predictions for a Numerical Relativity Package in Grid Environments 375--387 Boris Chernyavsky and Doyle Knight Investigation of Large Eddy Simulation Code Scaling Performance and Network Type Influence on a Linux PC Cluster . . 388--393 Anonymous Index to International Journal of High Performance Computing Applications, Volume 15 . . . . . . . . 394--396
Jack Dongarra Preface: Basic Linear Algebra Subprograms Technical (Blast) Forum Standard I . . . . . . . . . . . . . . . 1--111 Anonymous Acknowledgments . . . . . . . . . . . . 2--3 Anonymous Suggestions for Reading . . . . . . . . 4--4 Anonymous Introduction . . . . . . . . . . . . . . 5--18 Anonymous Dense and Banded Blas . . . . . . . . . 19--86 Anonymous Annex A Appendix . . . . . . . . . . . . 87--93 Anonymous Annex B Legacy Blas . . . . . . . . . . 94--107 Anonymous Annex C . . . . . . . . . . . . . . . . 108--108 Anonymous References . . . . . . . . . . . . . . . 109--109 Anonymous Index . . . . . . . . . . . . . . . . . 110--111
Jack Dongarra Preface: Basic Linear Algebra Subprograms Technical (Blast) Forum Standard II . . . . . . . . . . . . . . 115--115 Anonymous Acknowledgments . . . . . . . . . . . . 116--117 Anonymous Suggestions for Reading . . . . . . . . 118--118 Anonymous 3 Sparse Blas . . . . . . . . . . . . . 119--141 Anonymous 4 Extended and Mixed Precision Blas . . 142--174 Anonymous Annex A . . . . . . . . . . . . . . . . 175--181 Anonymous Annex B . . . . . . . . . . . . . . . . 182--195 Anonymous Annex C . . . . . . . . . . . . . . . . 196--196 Anonymous References . . . . . . . . . . . . . . . 197--197 Anonymous Index . . . . . . . . . . . . . . . . . 198--199
S. S. Iyengar and Sri Kumar Preface . . . . . . . . . . . . . . . . 203--205 R. R. Brooks and C. Griffin and D. S. Friedlander Self-Organized Distributed Sensor Network Entity Tracking . . . . . . . . 207--219 R. R. Brooks and C. Griffin Traffic Model Evaluation of Ad Hoc Target Tracking Algorithms . . . . . . . 221--234 D. S. Friedlander and S. Phoha Semantic Information Fusion for Coordinated Signal Processing in Mobile Sensor Networks . . . . . . . . . . . . 235--241 Mark T. Jones and Shashank Mehrotra and Jae H. Park Tasking Distributed Sensor Networks . . 243--257 J. C. Chen and K. Yao and T. L. Tung and C. W. Reed and D. Chen Source Localization and Tracking of a Wideband Source Using a Randomly Distributed Beamforming Sensor Array . . 259--272 Ivo H. Pineda-Torres and Ibrahim Gokcen and Bill P. Buckles Image Feature Set For Correspondence Mappings . . . . . . . . . . . . . . . . 273--283 Nageswara S. V. Rao Netlets For End-To-End Delay Minimization in Distributed Computing Over The Internet Using Two-Paths . . . 285--292 Maurice Chu and Horst Haussecker and Feng Zhao Scalable Information-Driven Sensor Querying and Routing for Ad Hoc Heterogeneous Sensor Networks . . . . . 293--313 Edoardo S. Biagioni and K. W. Bridges The Application of Remote Sensor Technology to Assist The Recovery of Rare And Endangered Species . . . . . . 315--324 Hairong Qi and Xiaoling Wang and S. Sitharama Iyengar and Krishnendu Chakrabarty High Performance Sensor Integration in Distributed Sensor Networks Using Mobile Agents . . . . . . . . . . . . . . . . . 325--335 John W. Fisher III and Martin J. Wainwright and Erik B. Sudderth and Alan S. Willsky Statistical and Information-Theoretic Methods for Self-Organization and Fusion of Multimodal, Networked Sensors . . . . 337--353
Xian-He Sun and Thomas Fahringer and Mario Pantano Scala: a Performance System for Scalable Computing . . . . . . . . . . . . . . . 357--370 G. Mahinthakumar and F. Saied A Hybrid MPI-OpenMP Implementation of an Implicit Finite-Element Code on Parallel Architectures . . . . . . . . . . . . . 371--393 Dimitri J. Mavriplis Parallel Performance Investigations of an Unstructured Mesh Navier--Stokes Solver . . . . . . . . . . . . . . . . . 395--407 Chao Yang and Padma Raghavan and Lloyd Arrowood and Donald W. Noid and Bobby G. Sumpter and Robert E. Tuzun Large-Scale Normal Coordinate Analysis on Distributed Memory Parallel Systems 409--424 L. Giraud Combining Shared and Distributed Memory Programming Models on Clusters of Symmetric Multiprocessors: Some Basic Promising Experiments . . . . . . . . . 425--430 Anonymous Index . . . . . . . . . . . . . . . . . 431--432
Dieter Kranzlmüller and Peter Kacsuk and Jack Dongarra and Jens Volkert Recent Advances in Parallel Virtual Machine and Message Passing Interface (Select papers from the EuroPVMMPI 2002 Conference) . . . . . . . . . . . . . . 3--5 Ron Brightwell and Rolf Riesen and Arthur B. Maccabe Design, Implementation, and Performance of MPI on Portals 3.0 . . . . . . . . . 7--19 Félix Garcia-Carballeira and Alejandro Calderon and Jesus Carretero and Javier Fernandez and Jose M. Perez The Design of the Expand Parallel File System . . . . . . . . . . . . . . . . . 21--37 Francesc Giné and Francesc Solsona and Porfidio Hernández and Emilio Luque Dealing with Memory Constraints in a Non-Dedicated Linux Cluster . . . . . . 39--48 Rolf Rabenseifner and Gerhard Wellein Communication and Optimization Aspects of Parallel Programming Models on Hybrid Architectures . . . . . . . . . . . . . 49--62 Sébastien Laflamme and Julien Dompierre and François Guibault and Robert Roy Applying Parmetis to Structured Remeshing for Industrial CFD Applications . . . . . . . . . . . . . . 63--76 Paweł Czarnul Programming, Tuning and Automatic Parallelization of Irregular Divide-and-Conquer Applications in DAMPVM/DAC . . . . . . . . . . . . . . . 77--93
Bernd Hamann and E. Wes Bethel and Horst Simon and Juan Meza NERSC `Visualization Greenbook': Future Visualization Needs of the DOE Computational Science Community Hosted at NERSC . . . . . . . . . . . . . . . . 97--123 Jack Dongarra and Victor Eijkhout Self-Adapting Numerical Software for Next Generation Applications . . . . . . 125--131 Claudio Luis de Amorim Guest Editor's Preface . . . . . . . . . 133--134 Johann Großschädl Architectural Support for Long Integer Modulo Arithmetic on RISC-Based Smart Cards . . . . . . . . . . . . . . . . . 135--146 Leonardo Bidese de Pinho and Edison Ishikawa and Claudio Luis de Amorim GloVE: a Distributed Environment for Scalable Video-on-demand Systems . . . . 147--161 D. P. Ruchkys and S. W. Song A Parallel Solution to Infer Genetic Network Architectures in Gene Expression Analysis . . . . . . . . . . . . . . . . 163--172 Cristina Boeres and Vinod E. F. Rebello Towards Optimal Static Task Scheduling for Realistic Machine Models: Theory and Practice . . . . . . . . . . . . . . . . 173--189 Adenauer Corrêa Yamin and Jorge Victória Barbosa and Iara Augustin and Luciano Cavalheiro da Silva and Rodrigo Real and Cláudio Geyer and Gerson Cavalheiro Towards Merging Context-Aware, Mobile and Grid Computing . . . . . . . . . . . 191--203
David W. Walker Preface: Grid Computing: Infrastructure and Applications . . . . . . . . . . . . 207--208 Gabriel Mateescu Quality of Service on the Grid via Metascheduling with Resource Co-scheduling and Co-reservation . . . . 209--218 Saleem N. Bhatti and Søren-Aksel Sørensen and Peter Clark and Jon Crowcroft Network QoS for Grid Systems . . . . . . 219--236 Nader Mohamed and Jameela Al-Jaroodi and Hong Jiang and David Swanson Scalable Bulk Data Transfer in Wide Area Networks . . . . . . . . . . . . . . . . 237--248 Sudharshan Vazhkudai and Jennifer M. Schopf Using Regression Techniques to Predict Large Data Transfers . . . . . . . . . . 249--268 Catherine Houstis and Spyros Lalis and Marios Pitikakis and George V. Vasilakis and Kyriakos Kritikos and Antonis Smardas A Grid Service-based Infrastructure for Accessing Scientific Collections: The Case of the ARION System . . . . . . . . 269--280 Andrew Woolf and Keith Haines and Chunlei Liu A Web Service Model for Climate Data Access on the Grid . . . . . . . . . . . 281--295 Salman AlSairafi and Filippia-Sofia Emmanouil and Moustafa Ghanem and Nikolaos Giannadakis and Yike Guo and Dimitrios Kalaitzopoulos and Michelle Osmond and Anthony Rowe and Jameel Syed and Patrick Wendel The Design of Discovery Net: Towards Open Grid Services for Knowledge Discovery . . . . . . . . . . . . . . . 297--315 Yan Huang JISGA: a Jini-Based Service-Oriented Grid Architecture . . . . . . . . . . . 317--327 Lican Huang and Zhaohui Wu and Yunhe Pan Virtual and Dynamic Hierarchical Architecture for E-Science Grid . . . . 329--347
Craig A. Lee Best Applications Papers from the Third International Workshop on Grid Computing 351--351 Jim Smith and Paul Watson and Anastasios Gounaris and Norman W. Paton and Alvaro A. A. Fernandes and Rizos Sakellariou Distributed Query Processing on the Grid 353--367 Yaohang Li and Michael Mascagni Analysis of Large-scale Grid-based Monte Carlo Applications . . . . . . . . . . . 369--382 Marcio Faerman and Adam Birnbaum and Francine Berman and Henri Casanova Resource Allocation Strategies for Guided Parameter Space Searches . . . . 383--402 William H. Bell and David G. Cameron and A. Paul Millar and Luigi Capozza and Kurt Stockinger and Floriano Zini Optorsim: a Grid Simulator for Studying Dynamic Data Replication Strategies . . 403--416 Christian Pérez and Thierry Priol and André Ribes A Parallel CORBA Component Model for Numerical Code Coupling . . . . . . . . 417--429 Gregor von Laszewski and Branko Ruscic and Kaizar Amin and Patrick Wagstrom and Sriram Krishnan and Sandeep Nijsure A Framework for Building Scientific Knowledge Grids Applied to Thermochemical Tables . . . . . . . . . 431--447 Gabrielle Allen and Tom Goodale and Thomas Radke and Michael Russell and Ed Seidel and Kelly Davis and Konstantinos N. Dolkas and Nikolaos D. Doulamis and Thilo Kielmann and André Merzky and Jarek Nabrzyski and Juliusz Pukacki and John Shalf and Ian Taylor Enabling Applications on the Grid: a GridLab Overview . . . . . . . . . . . . 449--466
Henri Casanova and Francine Berman and Thomas Bartol and Erhan Gokcay and Terry Sejnowski and Adam Birnbaum and Jack Dongarra and Michelle Miller and Mark Ellisman and Marcio Faerman and Graziano Obertelli and Rich Wolski and Stuart Pomerantz and Joel Stiles The Virtual Instrument: Support for Grid-Enabled Mcell Simulations . . . . . 3--17 Katherine Yelick Special Issue on Automatic Performance Tuning . . . . . . . . . . . . . . . . . 19--19 Markus Püschel and José M. F. Moura and Bryan Singer and Jianxin Xiong and Jeremy Johnson and David Padua and Manuela Veloso and Robert W. Johnson Spiral: a Generator for Platform-Adapted Libraries of Signal Processing Algorithms . . . . . . . . . . . . . . . 21--45 Dragan Mirković and Lennart Johnsson Automatic Performance Tuning for Fast Fourier Transforms . . . . . . . . . . . 47--64 Richard Vuduc and James W. Demmel and Jeff A. Bilmes Statistical Models for Empirical Search-Based Performance Tuning . . . . 65--94 Michelle Mills Strout and Larry Carter and Jeanne Ferrante and Barbara Kreaseck Sparse Tiling for Stationary Iterative Methods . . . . . . . . . . . . . . . . 95--113 Sriram Sellappa and Siddhartha Chatterjee Cache-Efficient Multigrid Algorithms . . 115--133 Eun-Jin Im and Katherine Yelick and Richard Vuduc Sparsity: Optimization Framework for Sparse Matrix Kernels . . . . . . . . . 135--158 Sathish S. Vadhiyar and Graham E. Fagg and Jack J. Dongarra Towards an Accurate Model for Collective Communications . . . . . . . . . . . . . 159--167
Weicheng Huang and Danesh K. Tafti A Parallel Adaptive Mesh Refinement Algorithm for Solving Nonlinear Dynamical Systems . . . . . . . . . . . 171--181 Y. Deng and J. Glimm and J. W. Davenport and X. Cai and E. Santos Performance Models on QCDOC for Molecular Dynamics with Coulomb Potentials . . . . . . . . . . . . . . . 183--195 Darren J. Kerbyson and Adolfy Hoisie and Scott Pakin and Fabrizio Petrini and Harvey J. Wasserman A Performance Evaluation of an Alpha EV7 Processing Node . . . . . . . . . . . . 199--209 W. Jalby and C. Lemuet and X. Le Pasteur WBTK: a New Set of Microbenchmarks to Explore Memory System Performance for Scientific Computing . . . . . . . . . . 211--224 John Mellor-Crummey and John Garvin Optimizing Sparse Matrix--Vector Product Computations Using Unroll and Jam . . . 225--236 Qing Yi and Ken Kennedy Improving Memory Hierarchy Performance through Combined Loop Interchange and Multi-Level Fusion . . . . . . . . . . . 237--253 Martin Swany and Rich Wolski Building Performance Topologies for Computational Grids . . . . . . . . . . 255--265 Celso L. Mendes and Daniel A. Reed Monitoring Large Systems Via Statistical Sampling . . . . . . . . . . . . . . . . 267--277
Tony Hey and Anne E. Trefethen UK e-Science Programme: Next Generation Grid Applications . . . . . . . . . . . 285--291 Pascale Vicat-Blanc Primet and Robert Harakaly and Franck Bonnassieux Grid Network Monitoring in the European Datagrid Project . . . . . . . . . . . . 293--304 Roland Wismüller and Marian Bubak and Włodzimierz Funika and Bartosz Baliś A Performance Analysis Tool for Interactive Applications on the Grid . . 305--316 Philip M. Papadopoulos and Caroline A. Papadopoulos and Mason J. Katz and William J. Link and Greg Bruno Configuring Large High-Performance Clusters at Lightspeed: a Case Study . . 317--326 Yiannis Cotronis Composition of Message Passing Interface Applications over MPICH-G2 . . . . . . . 327--339 Otto Sievert and Henri Casanova A Simple MPI Process Swapping Architecture for Iterative Applications 341--352 Graham E. Fagg and Jack J. Dongarra Building and Using a Fault-Tolerant MPI Implementation . . . . . . . . . . . . . 353--361 William Gropp and Ewing Lusk Fault Tolerance in Message Passing Interface Programs . . . . . . . . . . . 363--372 E. Caron and F. Desprez and M. Quinson and F. Suter Performance Evaluation of Linear Algebra Routines . . . . . . . . . . . . . . . . 373--390
Jeremy Kepner HPC Productivity: An Overarching View 393--397 D. E. Post and R. P. Kendall Software Project Management and Quality Engineering Practices for Complex, Coupled Multiphysics, Massively Parallel Computational Simulations: Lessons Learned From ASCI . . . . . . . . . . . 399--416 Marc Snir and David A. Bader A Framework for Measuring Supercomputer Productivity . . . . . . . . . . . . . . 417--432 Thomas Sterling Productivity Metrics and Models for High Performance Computing . . . . . . . . . 433--440 Ken Kennedy and Charles Koelbel and Robert Schreiber Defining and Measuring the Productivity of Programming Languages . . . . . . . . 441--448 Robert W. Numrich Performance Metrics Based on Computational Action . . . . . . . . . . 449--458 Stuart Faulk and John Gustafson and Philip Johnson and Adam Porter and Walter Tichy and Lawrence Votta Measuring High Performance Computing Productivity . . . . . . . . . . . . . . 459--473 J. Gustafson Purpose-Based Benchmarks . . . . . . . . 475--487 David J. Kuck Productivity in High Performance Computing . . . . . . . . . . . . . . . 489--504 Jeremy Kepner High Performance Computing Productivity Model Synthesis . . . . . . . . . . . . 505--516
Kevin McManus and Alison Williams and Mark Cross and Nick Croft and Chris Walshaw Assessing the Scalability of Multiphysics Tools for Modeling Solidification and Melting Processes on Parallel Clusters . . . . . . . . . . . 1--27 Paul M. Eder and James E. Giuliani and Somnath Ghosh Multilevel Parallel Programming for Three-Dimensional Voronoi Cell Finite Element Modeling of Heterogeneous Materials . . . . . . . . . . . . . . . 29--45 Salvatore Orlando and Domenico Laforenza Preface: Selected Papers from the EUROPVM/MPI 2003 Conference, Venice, Italy, 29 September--2 October 2003 . . 47--47 Rajeev Thakur and Rolf Rabenseifner and William Gropp Optimization of Collective Communication Operations in MPICH . . . . . . . . . . 49--66 Edgar Gabriel and Graham E. Fagg and Jack J. Dongarra Evaluating Dynamic Communicators and One-Sided Operations for Current MPI Libraries . . . . . . . . . . . . . . . 67--79 Albert Chan and Frank Dehne and Ryan Taylor CGMGRAPH/CGMLIB: Implementing and Testing CGM Graph Algorithms on PC Clusters and Shared Memory Machines . . 81--97
Dieter Kranzlmüller and Peter Kacsuk and Jack Dongarra Recent Advances in Parallel Virtual Machine and Message Passing Interface 99--101 Ron Brightwell and Rolf Riesen and Keith D. Underwood Analyzing the Impact of Overlap, Offload, and Independent Progress for Message Passing Interface Applications 103--117 Rajeev Thakur and William Gropp and Brian Toonen Optimizing the Synchronization Operations in Message Passing Interface One-Sided Communication . . . . . . . . 119--128 Gopalakrishnan Santhanaraman and Jiesheng Wu and Wei Huang and Dhabaleswar K. Panda Designing Zero-Copy Message Passing Interface Derived Datatype Communication Over Infiniband: Alternative Approaches and Performance Evaluation . . . . . . . 129--142 Dawid Kurzyniec and Vaidy Sunderam Failure Resilient Heterogeneous Parallel Computing Across Multidomain Clusters 143--155 Franco Frattolillo Running Large-Scale Applications on Cluster Grids . . . . . . . . . . . . . 157--172
Aristides Patrinos Preface . . . . . . . . . . . . . . . . 175--175 John B. Drake and Philip W. Jones and George R. Carr, Jr. Overview of the Software Design of the Community Climate System Model . . . . . 177--186 Patrick H. Worley and John B. Drake Performance Portability in the Physical Parameterizations of the Community Atmospheric Model . . . . . . . . . . . 187--201 Arthur A. Mirin and William B. Sawyer A Scalable Implementation of a Finite-Volume Dynamical Core in the Community Atmosphere Model . . . . . . . 203--212 William M. Putman and Shian-Jiann Lin and Bo-Wen Shen Cross-Platform Performance of a Portable Communication Module and the NASA Finite Volume General Circulation Model . . . . 213--223 John Dennis and Aimé Fournier and William F. Spotz and Amik St-Cyr and Mark A. Taylor and Stephen J. Thomas and Henry Tufo High-Resolution Mesh Convergence Properties and Parallel Efficiency of a Spectral Element Atmospheric Dynamical Core . . . . . . . . . . . . . . . . . . 225--235 Steven Ghan and Timothy Shippert Load Balancing and Scalability of a Subgrid Orography Scheme in a Global Climate Model . . . . . . . . . . . . . 237--245 Forrest M. Hoffman and Mariana Vertenstein and Hideyuki Kitabata and James B. White III Vectorizing the Community Land Model . . 247--260 Darren J. Kerbyson and Philip W. Jones A Performance Model of the Parallel Ocean Program . . . . . . . . . . . . . 261--276 Jay Larson and Robert Jacob and Everest Ong The Model Coupling Toolkit: a New Fortran90 Toolkit for Building Multiphysics Parallel Coupled Models . . 277--292 Robert Jacob and Jay Larson and Everest Ong $ M \times N $ Communication and Parallel Interpolation in Community Climate System Model Version 3 Using the Model Coupling Toolkit . . . . . . . . . 293--307 Anthony P. Craig and Robert Jacob and Brian Kauffman and Tom Bettge and Jay Larson and Everest Ong and Chris Ding and Yun He CPL6: The New Extensible, High Performance Parallel Coupler for the Community Climate System Model . . . . . 309--327 Yun He and Chris H. Q. Ding Coupling Multicomponent Models with MPH on Distributed Memory Computer Architectures . . . . . . . . . . . . . 329--340 Nancy Collins and Gerhard Theurich and Cecelia DeLuca and Max Suarez and Atanas Trayanov and V. Balaji and Peggy Li and Weiyu Yang and Chris Hill and Arlindo da Silva Design and Implementation of Components in the Earth System Modeling Framework 341--350
Marc Baboulin and Luc Giraud and Serge Gratton A Parallel Distributed Solver for Large Dense Symmetric Systems: Applications to Geodesy and Electromagnetism Problems 353--363 Chun-Ho Liu and Chat-Ming Woo and Dennis Y. C. Leung Performance Analysis of a Linux PC Cluster Using a Direct Numerical Simulation of Fluid Turbulence Code . . 365--374 Ping Wang and Y. Tony Song and Yi Chao and Hongchun Zhang Parallel Computation of the Regional Ocean Modeling System . . . . . . . . . 375--385 Robert Fowler Preface . . . . . . . . . . . . . . . . 387--388 Kostadin Damevski and Steven G. Parker $ M \times N $ Data Redistribution Through Parallel Remote Method Invocation . . . . . . . . . . . . . . . 389--398 Felipe Bertrand and Yongquan Yuan and Kenneth Chiu and Randall Bramley An Approach to Parallel $ M \times N $ Communication . . . . . . . . . . . . . 399--407 Johan Steensland and Jaideep Ray A Partitioner-Centric Model for Structured Adaptive Mesh Refinement Partitioning Trade-Off Optimization: Part I . . . . . . . . . . . . . . . . . 409--422 Keith D. Cooper and Todd Waterman Investigating Adaptive Compilation Using the MIPSPro Compiler . . . . . . . . . . 423--431 Guohua Jin and John Mellor-Crummey Improving Performance by Reducing the Memory Footprint of Scientific Applications . . . . . . . . . . . . . . 433--451 Weikuan Yu and Sayantan Sur and Dhabaleswar K. Panda and Rob T. Aulwes and Rich L. Graham High Performance Broadcast Support in LA-MPI Over Quadrics . . . . . . . . . . 453--463 Graham E. Fagg and Edgar Gabriel and Zizhong Chen and Thara Angskun and George Bosilca and Jelena Pjesivac-Grbovic and Jack J. Dongarra Process Fault Tolerance: Semantics, Design and Applications for High Performance Computing . . . . . . . . . 465--477 Sriram Sankaran and Jeffrey M. Squyres and Brian Barrett and Vishal Sahay and Andrew Lumsdaine and Jason Duell and Paul Hargrove and Eric Roman The LAM/MPI Checkpoint/Restart Framework: System-Initiated Checkpointing . . . . . . . . . . . . . 479--493
Larry Carter and Henri Casanova and Jeanne Ferrante and Frédéric Desprez and Yves Robert Preface . . . . . . . . . . . . . . . . 3--4 O. Beaumont and L. Marchal and Y. Robert Complexity Results for Collective Communications on Heterogeneous Platforms . . . . . . . . . . . . . . . 5--17 M. Drozdowski and M. Lawenda and F. Guinand Scheduling Multiple Divisible Loads . . 19--30 Hélène Renard and Yves Robert and Frédéric Vivien Data Redistribution Algorithms for Heterogeneous Processor Rings . . . . . 31--43 Barbara Kreaseck and Larry Carter and Henri Casanova and Jeanne Ferrante and Sagnik Nandy Interference-Aware Scheduling . . . . . 45--59 Yves Caniou and Emmanuel Jeannot Multicriteria Scheduling Heuristics for GridRPC Systems . . . . . . . . . . . . 61--76 Aurélien Bouteiller and Hinde-Lilia Bouziane and Thomas Herault and Pierre Lemarinier and Franck Cappello Hybrid Preemptive Scheduling of Message Passing Interface Applications on Grids 77--90 Darin England and Jon Weissman A Resource Leasing Policy for on-Demand Computing . . . . . . . . . . . . . . . 91--101 Gosia Wrzesińska and Rob V. van Nieuwpoort and Jason Maassen and Thilo Kielmann and Henri E. Bal Fault-Tolerant Scheduling of Fine-Grained Tasks in Grid Environments 103--114 Arjav J. Chakravarti and Gerald Baumgartner and Mario Lauria Self-Organizing Scheduling on the Organic Grid . . . . . . . . . . . . . . 115--130 Asim YarKhan and Keith Seymour and Kiran Sagi and Zhiao Shi and Jack Dongarra Recent Developments in GridSolve . . . . 131--141 Holly Dail and Frédéric Desprez Experiences with Hierarchical Request Flow Management for Network-Enabled Server Environments . . . . . . . . . . 143--157
Osni Marques and Tony Drummond Preface . . . . . . . . . . . . . . . . 161--162 David E. Bernholdt and Benjamin A. Allan and Robert Armstrong and Felipe Bertrand and Kenneth Chiu and Tamara L. Dahlgren and Kostadin Damevski and Wael R. Elwasif and Thomas G. W. Epperly and Madhusudhan Govindaraju and Daniel S. Katz and James A. Kohl and Manoj Krishnan and Gary Kumfert and J. Walter Larson and Sophia Lefantzi and Michael J. Lewis and Allen D. Malony and Lois C. McInnes and Jarek Nieplocha and Boyana Norris and Steven G. Parker and Jaideep Ray and Sameer Shende and Theresa L. Windus and Shujia Zhou A Component Architecture for High-Performance Scientific Computing 163--202 Jarek Nieplocha and Bruce Palmer and Vinod Tipparaju and Manojkumar Krishnan and Harold Trease and Edoardo Aprà Advances, Applications and Performance of the Global Arrays Shared Memory Programming Toolkit . . . . . . . . . . 203--231 J. Nieplocha and V. Tipparaju and M. Krishnan and D. K. Panda High Performance Remote Memory Access Communication: The ARMCI Approach . . . 233--253 James A. Kohl and Torsten Wilde and David E. Bernholdt Cumulvs: Interacting with High-Performance Scientific Simulations, for Visualization, Steering and Fault Tolerance . . . . . . . . . . . . . . . 255--285 Sameer S. Shende and Allen D. Malony The Tau Parallel Performance System . . 287--311
Jack Dongarra and Bernard Tourancheau Special Issue on Tools in the ACTS Collection 2004 . . . . . . . . . . . . 317--317 A. Bouteiller and T. Herault and G. Krawezik and P. Lemarinier and F. Cappello MPICH-V Project: a Multiprotocol Automatic Fault-Tolerant MPI . . . . . . 319--333 E. Caron and F. Desprez Diet: a Scalable Toolbox to Build Network Enabled Servers on the Grid . . 335--352 B. R. Buck and J. K. Hollingsworth A New Hardware Monitor Design to Measure Data Structure-Specific Cache Eviction Information . . . . . . . . . . . . . . 353--363 L. Marchal and Y. Yang and H. Casanova and Y. Robert Steady-State Scheduling of Multiple Divisible Load Applications on Wide-Area Distributed Computing Platforms . . . . 365--381 X. Liu and A. A. Chien Realistic Large-Scale Online Network Simulation . . . . . . . . . . . . . . . 383--399 E. Lusk and N. Desai and R. Bradshaw and A. Lusk and R. Butler An Interoperability Approach to System Software, Tools, and Libraries for Clusters . . . . . . . . . . . . . . . . 401--407 J. P. Morrison and B. Coghlan and A. Shearer and S. Foley and D. Power and R. Perrott WEBCOM-G: a Candidate Middleware for Grid-Ireland . . . . . . . . . . . . . . 409--422 X. Zhang and B. Rutt and Ü. Çatalyürek and T. Kurç and P. Stoffa and M. Sen and J. Saltz Supporting Scalable and Distributed Data Subsetting and Aggregation in Large-Scale Seismic Data Analysis . . . 423--438
Larry Carter and Henri Casanova and Frédéric Desprez and Jeanne Ferrante and Yves Robert Preface . . . . . . . . . . . . . . . . 441--442 Emmanuel Jeannot and Frédéric Wagner Scheduling Messages For Data Redistribution: An Experimental Study 443--454 Rahul Trivedi and Abhishek Chandra and Jon Weissman Heterogeneity-Aware Workload Distribution in Donation-Based Grids . . 455--466 Kaoutar El Maghraoui and Travis J. Desell and Boleslaw K. Szymanski and Carlos A. Varela The Internet Operating System: Middleware for Adaptive Distributed Computing . . . . . . . . . . . . . . . 467--480 Raphaël Bolze and Franck Cappello and Eddy Caron and Michel Daydé and Frédéric Desprez and Emmanuel Jeannot and Yvon Jégou and Stephane Lanteri and Julien Leduc and Noredine Melab and Guillaume Mornet and Raymond Namyst and Pascale Primet and Benjamin Quetier and Olivier Richard and El-Ghazali Talbi and Iréa Touche Grid'5000: a Large Scale and Highly Reconfigurable Experimental Grid Testbed 481--494 Cynthia Bailey Lee and Allan Snavely On the User--Scheduler Dialogue: Studies of User-Provided Runtime Estimates and Utility Functions . . . . . . . . . . . 495--506 Lionel Eyraud A Pragmatic Analysis of Scheduling Environments on New Computing Platforms 507--516 Pushpinder Kaur Chouhan and Holly Dail and Eddy Caron and Frédéric Vivien Automatic Middleware Deployment Planning on Clusters . . . . . . . . . . . . . . 517--530 Brinkley Sprunt Managing the Complexity of Performance Monitoring Hardware: The Brink and Abyss Approach . . . . . . . . . . . . . . . . 533--540 Mohamed Dahmani and Robert Roy Scalability Modeling For Deterministic Particle Transport Solvers . . . . . . . 541--556 C. Shyam Sunder and G. Baskar and V. Babu and David Strenski A Detailed Performance Analysis of the Interpolation Supplemented Lattice Boltzmann Method on the Cray T3E and Cray X1 . . . . . . . . . . . . . . . . 557--570 Dali Wang and Michael W. Berry and Louis J. Gross On Parallelization of a Spatially-Explicit Structured Ecological Model for Integrated Ecosystem Simulation . . . . . . . . . . . . . . . 571--581
Osman Yaşar and Hasan Dağ Preface . . . . . . . . . . . . . . . . 3--4 İ İlkay Boduroğlu and Zeynep Erenay A Pattern Recognition Model for Predicting a Financial Crisis in Turkey: Turkish Economic Stability Index . . . . 5--20 Omer Ozan Sonmez and Attila Gursoy A Novel Economic-Based Scheduling Heuristic for Computational Grids . . . 21--29 O. Yaşar and M. Koçaş Computational Modeling of Hermetic Reciprocating Compressors . . . . . . . 30--41 Siraj-ul-Islam and Ikram A. Tirmizi and Fazal Haq Quartic Non-Polynomial Splines Approach to the Solution of a System of Second-Order Boundary-Value Problems . . 42--49 Ziya Arnavut Lossless and Near-Lossless Compression of ECG Signals with Block-Sorting Techniques . . . . . . . . . . . . . . . 50--58 Burak Alakent and Mehmet C. Camurdan and Pemra Doruker Mimicking Protein Dynamics by the Integration of Elastic Network Model with Time Series Analysis . . . . . . . 59--65 Berk Onat and Sondan Durukanoğlu and Hasan Dağ A Parallel Implementation: Real Space Green's Function Technique . . . . . . . 66--74 Alexey Lastovetsky and Ravi Reddy Data Partitioning with a Functional Performance Model of Heterogeneous Processors . . . . . . . . . . . . . . . 76--90 Gyu Sang Choi and Saurabh Agarwal and Jin-Ha Kim and Chita R. Das and Andy B. Yoo Performance Comparison of Coscheduling Algorithms for Non-Dedicated Clusters Through a Generic Framework . . . . . . 91--105 Dimitri J. Mavriplis and Michael J. Aftosmis and Marsha Berger High Resolution Aerospace Applications Using the NASA Columbia Supercomputer 106--126
Beniamino Di Martino and Dieter Kranzlmüller and Jack Dongarra Preface . . . . . . . . . . . . . . . . 129--131 Robert Latham and Robert Ross and Rajeev Thakur Implementing MPI-IO Atomic Mode and Shared File Pointers Using MPI One-Sided Communication . . . . . . . . . . . . . 132--143 Wei-keng Liao and Kenin Coloma and Alok Choudhary and Lee Ward Cooperative Client-Side File Caching for MPI Applications . . . . . . . . . . . . 144--154 Christopher Falzone and Anthony Chan and Ewing Lusk and William Gropp A Portable Method for Finding User Errors in the Usage of MPI Collective Operations . . . . . . . . . . . . . . . 155--165 Narayan Desai and Ewing Lusk and Rick Bradshaw A Composition Environment for MPI Programs . . . . . . . . . . . . . . . . 166--173 Allen D. Malony and Sameer Shende and Alan Morris and Felix Wolf Compensation of Measurement Overhead in Parallel Performance Profiling . . . . . 174--194 Belgacem Ben Youssef and Gang Cheng and Kyriacos Zygourakis and Pauline Markenscoff Parallel Implementation of a Cellular Automaton Modeling the Growth of Three-Dimensional Tissues . . . . . . . 196--209 Katarzyna Rycerz and Alfredo Tirado-Ramos and Alessia Gualandris and Simon F. Portegies Zwart and Marian Bubak and Peter M. A. Sloot Interactive $N$-Body Simulations on the Grid: HLA Versus MPI . . . . . . . . . . 210--221 Stylianos Bounanos and Martin Fleury and Sebastien Nicolas and Anthony Vickers Load-Balanced Drift-Diffusion Model Simulation: Cluster Software Performance Evaluation . . . . . . . . . . . . . . . 222--245
Jeremy Kepner and Hans Zima Preface . . . . . . . . . . . . . . . . 249--250 Zoran Budimlić and Mackale Joyner and Ken Kennedy Improving Compilation of Java Scientific Applications . . . . . . . . . . . . . . 251--265 K. Yelick and P. Hilfinger and S. Graham and D. Bonachea and J. Su and A. Kamil and K. Datta and P. Colella and T. Wen Parallel Languages and Compilers: Perspective From the Titanium Experience 266--290 B. L. Chamberlain and D. Callahan and H. P. Zima Parallel Programmability and the Chapel Language . . . . . . . . . . . . . . . . 291--312 R. E. Diaconescu and H. P. Zima An Approach To Data Distributions in Chapel . . . . . . . . . . . . . . . . . 313--335 N. Travinin Bliss and J. Kepner pMATLAB Parallel MATLAB Library . . . . 336--359 Piotr Luszczek and Jack Dongarra High Performance Development for High End Computing With Python Language Wrapper (PLW) . . . . . . . . . . . . . 360--369 G. Tan and L. Xu and Z. Dai and S. Feng and N. Sun A Study of Architectural Optimization Methods in Bioinformatics Applications 371--384
David K. Kahaner Preface . . . . . . . . . . . . . . . . 387--387 H. S. Bhatt and H. J. Kotecha and B. K. Singh and K. Bandyopadhyay and V. H. Patel and A. Dasgupta Connecting Grids Using Communication Satellites . . . . . . . . . . . . . . . 388--404 Chee Shin Yeo and Rajkumar Buyya Pricing for Utility-Driven Resource Management and Allocation in Clusters 405--418 H. S. Bhatt and R. M. Patel and H. J. Kotecha and V. H. Patel and A. Dasgupta GANESH: Grid Application Management and Enhanced Scheduling . . . . . . . . . . 419--428 S. S. Thakur and S. Nandi and R. Bhattacharjee and D. Goswami An Asynchronous Wakeup Power-Saving Protocol for Multi-Hop Ad Hoc Networks 429--442 Adam K. L. Wong and Andrzej M. Goscinski The Performance of a Parallel TSP Program and Byte Sequential Benchmarks Executing on a Shared Cluster . . . . . 443--455 Alfredo Buttari and Jack Dongarra and Julie Langou and Julien Langou and Piotr Luszczek and Jakub Kurzak Mixed Precision Iterative Refinement Techniques for the Solution of Dense Linear Systems . . . . . . . . . . . . . 457--466 Alfredo Buttari and Victor Eijkhout and Julien Langou and Salvatore Filippone Performance Optimization and Modeling of Blocked Sparse Kernels . . . . . . . . . 467--484 Charles S. Zender and Harry Mangalam Scaling Properties of Common Statistical Operators for Gridded Datasets . . . . . 485--498
Rupak Biswas and Leonid Oliker Preface . . . . . . . . . . . . . . . . 3--4 Leonid Oliker and Andrew Canning and Jonathan Carter and John Shalf and Stéphane Ethier Scientific Application Performance on Leading Scalar and Vector Supercomputing Platforms . . . . . . . 5--20 Hongzhang Shan and Erich Strohmaier and Ji Qiang Performance Analysis of Leading HPC Architectures With Beambeam3D . . . . . 21--32 Bronis R. de Supinski and Martin Schulz and Vasily V. Bulatov and William Cabot and Bor Chan and Andrew W. Cook and Erik W. Draeger and James N. Glosli and Jeffrey A. Greenough and Keith Henderson and Alison Kubota and Steve Louis and Brian J. Miller and Mehul V. Patel and Thomas E. Spelce and Frederick H. Streitz and Peter L. Williams and Robert K. Yates and Andy Yoo and George Almasi and Gyan Bhanot and Alan Gara and John A. Gunnels and Manish Gupta and Jose Moreira and James Sexton and Bob Walkup and Charles Archer and Francois Gygi and Timothy C. Germann and Kai Kadau and Peter S. Lomdahl and Charles Rendleman and Michael L. Welcome and William McLendon and Bruce Hendrickson and Franz Franchetti and Stefan Kral and Jürgen Lorenz and Christoph W. Überhuber and Edmond Chow and Ümit Çatalyürek BlueGene/L Applications: Parallelism on a Massive Scale . . . . . . . . . . . . 33--51 Sadaf R. Alam and Richard F. Barrett and Mark R. Fahey and Jeffery A. Kuehn and O. E. Bronson Messer and Richard T. Mills and Philip C. Roth and Jeffrey S. Vetter and Patrick H. Worley An Evaluation of the Oak Ridge National Laboratory Cray XT3 . . . . . . . . . . 52--80 German Rodriguez and Rosa M. Badia and Jesus Labarta An Evaluation of Marenostrum Performance 81--96 Robert Hood and Rupak Biswas and Johnny Chang and M. Jahed Djomehri and Haoqiang Jin Benchmarking the Columbia Supercluster 97--112 Aiichiro Nakano and Rajiv K. Kalia and Ken-ichi Nomura and Ashish Sharma and Priya Vashishta and Fuyuki Shimojo and Adri C. T. van Duin and William A. Goddard and Rupak Biswas and Deepak Srivastava and Lin H. Yang De Novo Ultrascale Atomistic Simulations on High-End Parallel Supercomputers . . 113--128
S. R. Tiyyagura and P. Adamidis and R. Rabenseifner and P. Lammers and S. Borowski and F. Lippold and F. Svensson and O. Marxen and S. Haberhauer and A. P. Seitsonen and J. Furthmüller and K. Benkert and M. Galle and T. Bönisch and U. Küster and M. M. Resch Teraflops Sustained Performance With Real World Applications . . . . . . . . 131--148 Michael Wehner and Leonid Oliker and John Shalf Towards Ultra-High Resolution Models of Climate and Weather . . . . . . . . . . 149--165 Dylan G. Allegretti and Garrett T. Kenyon and William C. Priedhorsky Cellular Automata for Distributed Sensor Networks . . . . . . . . . . . . . . . . 167--176 Geoffrey W. Cowles Parallelization of the FVCOM Coastal Ocean Model . . . . . . . . . . . . . . 177--193 Nguyen Hai Chau and Atsushi Kawai and Toshikazu Ebisuzaki Acceleration of Fast Multipole Method Using Special-Purpose Computer GRAPE . . 194--205 Yu-Heng Tseng and Chris Ding Efficient Parallel I/O in Community Atmosphere Model (CAM) . . . . . . . . . 206--218 Jakub Kurzak and Dragan Mirković and B. Montgomery Pettitt and S. Lennart Johnsson Automatic Generation of FFT for Translations of Multipole Expansions in Spherical Harmonics . . . . . . . . . . 219--230
Jinjun Chen and Hai Jin and Mengchu Zhou Preface . . . . . . . . . . . . . . . . 235--237 Ru-Yue Ma and Yong-Wei Wu and Xiang-Xu Meng and Shi-Jun Liu and Li Pan Grid-Enabled Workflow Management System Based on BPEL . . . . . . . . . . . . . 238--249 Pilar Herrero and José Luis Bosque and Manuel Salvadores and María S. Pérez WE-AMBLE: a Workflow Engine To Manage Awareness in Collaborative Grid Environments . . . . . . . . . . . . . . 250--267 Andrew Harrison and Ian Taylor and Ian Wang and Matthew Shields WS-RF Workflow in Triana . . . . . . . . 268--283 Wanchun Dou and Jinjun Chen and Jianxun Liu and S. C. Cheung and Guihai Chen and Shaokun Fan A Workflow Engine-Driven SOA-Based Cooperative Computing Paradigm in Grid Environments . . . . . . . . . . . . . . 284--300 Cecilia Gomes and Omer F. Rana and Jose Cunha Extending Grid-Based Workflow Tools With Patterns/Operators . . . . . . . . . . . 301--318 Jinjun Chen and Yun Yang Activity Completion Duration Based Checkpoint Selection for Dynamic Verification of Temporal Constraints in Grid Workflow Systems . . . . . . . . . 319--329 Dang Minh Quan and D. Frank Hsu Mapping Heavy Communication Grid-Based Workflows Onto Grid Resources Within an SLA Context Using Metaheuristics . . . . 330--346 Tristan Glatard and Johan Montagnat and Diane Lingrand and Xavier Pennec Flexible and Efficient Workflow Deployment of Data-Intensive Applications on Grids With MOTEUR . . . 347--360
Antonio Plaza and Chein-I Chang Preface . . . . . . . . . . . . . . . . 363--365 Antonio Plaza and Chein-I Chang Clusters Versus FPGA for Parallel Processing of Hyperspectral Imagery . . 366--385 David Valencia and Alexey Lastovetsky and Maureen O'Flynn and Antonio Plaza and Javier Plaza Parallel Processing of Remotely Sensed Hyperspectral Images on Heterogeneous Networks of Workstations Using HeteroMPI 386--407 Mingkai Hsueh and Chein-I Chang Field Programmable Gate Arrays (FPGA) for Pixel Purity Index Using Blocks of Skewers for Endmember Extraction in Hyperspectral Imagery . . . . . . . . . 408--423 Javier Setoain and Manuel Prieto and Christian Tenllado and Francisco Tirado GPU for Parallel On-Board Hyperspectral Image Processing . . . . . . . . . . . . 424--437 Qian Du and James E. Fowler Low-Complexity Principal Component Analysis for Hyperspectral Image Compression . . . . . . . . . . . . . . 438--448 Uwe Fladrich and Jörg Stiller and Wolfgang E. Nagel Improved Performance for Nodal Spectral Element Operators . . . . . . . . . . . 450--459 Selim Gurun and Rich Wolski and Chandra Krintz and Dan Nurmi On the Efficacy of Computation Offloading Decision-Making Strategies 460--479
Jack J. Dongarra and Julien Langou The Problem With the LINPACK Benchmark 1.0 Matrix Generator . . . . . . . . . . 5--13 Jian He and Alex Verstak and L. T. Watson and M. Sosonkina Performance Modeling and Analysis of a Massively Parallel Direct---Part 1 . . . 14--28 Jian He and Alex Verstak and M. Sosonkina and L. T. Watson Performance Modeling and Analysis of a Massively Parallel Direct---Part 2 . . . 29--41 James Mc Donald and Aaron Golden and S. Gerard Jennings OpenDDA: a Novel High-Performance Computational Framework for the Discrete Dipole Approximation . . . . . . . . . . 42--61 Jin Woo Park and Si Hyong Park and Seung Jo Kim Optimization With High-Cost Objective Function Evaluations in a Computing Grid and an Application To Simulation-Based Design . . . . . . . . . . . . . . . . . 62--83 Sundari M. Sivagama and Sathish S. Vadhiyar and Ravi S. Nanjundiah Dynamic Component Extension: a Strategy for Performance Improvement in Multicomponent Applications . . . . . . 84--98 Marta Beltrán and Antonio Guzmán How to Balance the Load on Heterogeneous Clusters . . . . . . . . . . . . . . . . 99--118
Alexey Lastovetsky and Vladimir Rychkov Accurate and Efficient Estimation of Parameters of Heterogeneous Communication Performance Models . . . . 123--139 Jacques M. Bahi and Jean-Claude Charr and Raphaël Couturier and David Laiymani A Parallel Algorithm To Solve Large Stiff ODE Systems on Grid Systems . . . 140--151 Werner Mach and Erich Schikuta Parallel Algorithms for the Execution of Relational Database Operations Revisited on Grids . . . . . . . . . . . . . . . . 152--170 Anne Benoit and Harald Kosch and Veronika Rehn-Sonigo and Yves Robert Multi-Criteria Scheduling of Pipeline Workflows (and Application To the JPEG Encoder) . . . . . . . . . . . . . . . . 171--187
Jack Dongarra and Bernard Tourancheau Editorial . . . . . . . . . . . . . . . 195--195 Martin J. Chorley and David W. Walker and Martyn F. Guest Hybrid Message-Passing and Shared-Memory Programming in a Molecular Dynamics Application on Multicore Clusters . . . 196--211 Franck Cappello Fault Tolerance in Petascale/Exascale Systems: Current Knowledge, Challenges and Research Opportunities . . . . . . . 212--226 Mark L. James and Andrew A. Shapiro and Paul L. Springer and Hans P. Zima Adaptive Fault Tolerance for Scalable Cluster Computing in Space . . . . . . . 227--241 James S. Plank The Raid-6 Liber8Tion Code . . . . . . . 242--251 Tahsin Kurc and Shannon Hastings and Vijay Kumar and Stephen Langella and Ashish Sharma and Tony Pan and Scott Oster and David Ervin and Justin Permar and Sivaramakrishnan Narayanan and Yolanda Gil and Ewa Deelman and Mary Hall and Joel Saltz HPC and Grid Computing for Integrative Biomedical Research . . . . . . . . . . 252--264 Shuaiwen Song and Rong Ge and Xizhou Feng and Kirk W. Cameron Energy Profiling and Analysis of the HPC Challenge Benchmarks . . . . . . . . . . 265--276 Piotr Luszczek Parallel Programming in MATLAB . . . . . 277--283 Judit Planas and Rosa M. Badia and Eduard Ayguadé and Jesus Labarta Hierarchical Task-Based Programming With StarSs . . . . . . . . . . . . . . . . . 284--299
Jack Dongarra and Pete Beckman and Patrick Aerts and Frank Cappello and Thomas Lippert and Satoshi Matsuoka and Paul Messina and Terry Moore and Rick Stevens and Anne Trefethen and Mateo Valero The International Exascale Software Project: a Call To Cooperative Action By the Global High-Performance Community 309--322 Bernd Mohr Summary of the IESP White Papers . . . . 323--327 Barbara Chapman and Jesús Labarta and Vivek Sarkar and Mitsuhisa Sato Programmability Issues . . . . . . . . . 328--331 Thomas Sterling Models of Computation --- Enabling Exascale . . . . . . . . . . . . . . . . 332--334 Thomas Sterling The Biggest Need: a New Model of Computation . . . . . . . . . . . . . . 335--336 Ewing Lusk Slouching Towards Exascale . . . . . . . 337--339 Jesús Labarta and Eduard Ayguadé and Mateo Valero BSC Vision Towards Exascale . . . . . . 340--343 Laxmikant Kale Programming Models at Exascale: Adaptive Runtime Systems, Incomplete Simple Languages, and Interoperability . . . . 344--346 Arthur Maccabe and Hugo Falter and William Kramer Resource Management . . . . . . . . . . 347--349 Mark Seager and Brent Gorda The Case for a Hierarchical System Model for Linux Clusters . . . . . . . . . . . 350--354 Bernd Mohr and Matthias S. Müller and Wolfgang E. Nagel Performance at Exascale . . . . . . . . 355--356 David Skinner and Alok Choudary On the Importance of End-to-End Application Performance Monitoring and Workload Analysis at the Exascale . . . 357--360 Jean-Yves Berthou and Jean-François Hamelin and Etienne de Rocquigny XXL Simulation for XXIst Century Power Systems Operation . . . . . . . . . . . 361--365 David Keyes Partial Differential Equation-Based Applications and Solvers at Extreme Scale . . . . . . . . . . . . . . . . . 366--368 Peter Michielse Application Analysis and Porting in the PRACE Project . . . . . . . . . . . . . 369--373 Franck Cappello and Al Geist and Bill Gropp and Laxmikant Kale and Bill Kramer and Marc Snir Toward Exascale Resilience . . . . . . . 374--388 William Kramer and David Skinner An Exascale Approach to Software and Hardware Design . . . . . . . . . . . . 389--391 William Kramer and David Skinner Consistent Application Performance at the Exascale . . . . . . . . . . . . . . 392--394 Mark Seager and Brent Gorda A Collaboration and Commercialization Model for Exascale Software Research . . 395--397 Giovanni Aloisio and Sandro Fiore Towards Exascale Distributed Data Management . . . . . . . . . . . . . . . 398--400 Al Geist and Sudip Dosanjh IESP Exascale Challenge: Co-Design of Architectures and Algorithms . . . . . . 401--402 David Barkai The Application Perspective: Seeking Productivity and Performance . . . . . . 403--408 Robert F. Lucas Musings on the Path Forward to Exascale 409--410 Laxmikant Kale Early Application Development/Tuning and Application Characterization/Segmentation . . . . . 411--412 William Gropp and Marc Snir On the Need for a Consortium of Capability Centers . . . . . . . . . . . 413--420 Abani Patra and Rob Pennington and Ed Seidel Exascale Software: Some Questions to Drive the Development . . . . . . . . . 421--422 Anne Trefethen and Nick Higham and Iain Duff and Peter Coveney Developing a High-Performance Computing/Numerical Analysis Roadmap . . 423--426 Al Geist and Robert Lucas Major Computer Science Challenges at Exascale . . . . . . . . . . . . . . . . 427--436 Michael A. Heroux Software Challenges for Extreme Scale Computing: Going From Petascale to Exascale Systems . . . . . . . . . . . . 437--439
Alexey Lastovetsky and Tahar Kechadi Recent Advances in Parallel Virtual Machine and Message Passing Interface 3--4 Pavan Balaji and Anthony Chan and William Gropp and Rajeev Thakur and Ewing Lusk The Importance of Non-Data-Communication Overheads in MPI . . . . . . . . . . . . 5--15 Sameer Kumar and Ahmad Faraj and Amith R. Mamidala and Brian Smith and Gabor Dozsa and Bob Cernohous and John Gunnels and Douglas Miller and Joseph Ratterman and Philip Heidelberger Architecture of the Component Collective Messaging Interface . . . . . . . . . . 16--33 Alexey Lastovetsky and Vladimir Rychkov and Maureen O'Flynn Accurate Heterogeneous Communication Models and a Software Tool for Their Efficient Estimation . . . . . . . . . . 34--48 Pavan Balaji and Darius Buntinas and David Goodell and William Gropp and Rajeev Thakur Fine-Grained Multithreading Support for Hybrid Threaded MPI Programming . . . . 49--57 Jesper Larsson Träff and Andreas Ripke and Christian Siebert and Pavan Balaji and Rajeev Thakur and William Gropp A Pipelined Algorithm for Large, Irregular All-Gather Problems . . . . . 58--68 Ron Brightwell Exploiting Direct Access Shared Memory for MPI on Multi-Core Processors . . . . 69--77 Javier Garcia Blas and Florin Isaila and Jesus Carretero and David Singh and Felix Garcia-Carballeira Implementation and Evaluation of File Write-Back and Prefetching for MPI-IO Over GPFS . . . . . . . . . . . . . . . 78--92 Stephen F. Siegel and Andrew R. Siegel Madre: the Memory-Aware Data Redistribution Engine . . . . . . . . . 93--104
Hong Li and Linda Petzold Efficient Parallelization of the Stochastic Simulation Algorithm for Chemically Reacting Systems on the Graphics Processing Unit . . . . . . . . 107--116 Julianne Chung and Philip Sternberg and Chao Yang High-Performance Three-Dimensional Image Reconstruction for Molecular Structure Determination . . . . . . . . . . . . . 117--135 J. C. Pichel and D. B. Heras and J. C. Cabaleiro and A. J. García-Loureiro and F. F. Rivera Increasing the Locality of Iterative Methods and Its Application to the Simulation of Semiconductor Devices . . 136--153 Do Van Tuan and Ui-Pil Chong Audio Watermarking Based on Advanced Wigner Distribution and Important Frequency Peaks . . . . . . . . . . . . 154--163 Florin Isaila and Francisco Javier Garcia Blas and Jesús Carretero and Wei-keng Liao and Alok Choudhary A Scalable Message Passing Interface Implementation of an Ad-Hoc Parallel I/O system . . . . . . . . . . . . . . . . . 164--184 Derek Groen and Stefan Harfst and Simon Portegies Zwart The Living Application: a Self-Organizing System for Complex Grid Tasks . . . . . . . . . . . . . . . . . 185--193 Mehmet Belgin and Godmar Back and Calvin J. Ribbens Operation Stacking for Ensemble Computations With Variable Convergence 194--212 Patrick Downes and Oisín Curran and John Cunniffe and Andy Shearer Distributed Radiotherapy Simulation with the Webcom Workflow System . . . . . . . 213--227 Bruce Palmer and Vidhya Gurumoorthi and Alexandre Tartakovsky and Tim Scheibe A Component-Based Framework for Smoothed Particle Hydrodynamics Simulations of Reactive Fluid Flow in Porous Media . . 228--239
Jose Ignacio Garzon and Eduardo Huedo and Ruben Santiago Montero and Ignacio Martin Llorente and Pablo Chacon End-To-End Cache System for Grid Computing: Design and Efficiency Analysis of a High-Throughput Bioinformatic Docking Application . . . 243--264 M. E. Tryby and B. Y. Mirghani and G. K. Mahinthakumar and S. R. Ranjithan A Solution Framework for Environmental Characterization Problems . . . . . . . 265--283 Ewa Deelman Grids and Clouds: Making Workflow Applications Work in Heterogeneous Distributed Environments . . . . . . . . 284--298 Thomas Hauser and Raymond LeBeau Optimization of a Computational Fluid Dynamics Code for the Memory Hierarchy: a Case Study . . . . . . . . . . . . . . 299--318 Toshiyuki Imamura and Takuma Kano and Susumu Yamada and Masahiko Okumura and Masahiko Machida High-Performance Quantum Simulation for Coupled Josephson Junctions on the Earth Simulator: a Challenge To the Schrödinger Equation on $ 256^4 $ Grids . . . . . . 319--334 Marc Casas and Rosa M. Badia and Jesús Labarta Automatic Phase Detection and Structure Extraction of MPI Applications . . . . . 335--360
Ninghui Sun and David Kahaner and Debbie Chen High-performance Computing in China: Research and Applications . . . . . . . 363--409 Abhinav Bhatelé and Lukasz Wesolowski and Eric Bohm and Edgar Solomonik and Laxmikant V. Kalé Understanding Application Performance via Micro-benchmarks on Three Large Supercomputers: Intrepid, Ranger and Jaguar . . . . . . . . . . . . . . . . . 411--427 Nicolas Gourdain and Marc Montagnac and Fabien Wlassow and Michel Gazaix High-performance Computing to Simulate Large-scale Industrial Flows in Multistage Compressors . . . . . . . . . 429--443 Ke Liu and Hai Jin and Jinjun Chen and Xiao Liu and Dong Yuan and Yun Yang A Compromised-Time-Cost Scheduling Algorithm in SwinDeW-C for Instance-Intensive Cost-Constrained Workflows on a Cloud Computing Platform 445--456 Paula Cecilia Fritzsche and Jose-Jesus Fernandez and Dolores Rexachs and Inmaculada Garcia and Emilio Luque Analytical Performance Prediction for Iterative Reconstruction Techniques in Electron Tomography of Biological Structures . . . . . . . . . . . . . . . 457--468 Nor Asilah Wati Abdul Hamid and Paul Coddington Comparison of MPI Benchmark Programs on Shared Memory and Distributed Memory Machines (Point-to-Point Communication) 469--483 Hung-Hsun Su and Max Billingsley and Alan D. George Parallel Performance Wizard: a Performance System for the Analysis of Partitioned Global-Address-Space Applications . . . . . . . . . . . . . . 485--510 Rajib Nath and Stanimire Tomov and Jack Dongarra An Improved Magma GEMM for Fermi Graphics Processing Units . . . . . . . 511--515
Jack Dongarra and Pete Beckman and Terry Moore and Patrick Aerts and Giovanni Aloisio and Jean-Claude Andre and David Barkai and Jean-Yves Berthou and Taisuke Boku and Bertrand Braunschweig and Franck Cappello and Barbara Chapman and Xuebin Chi and Alok Choudhary and Sudip Dosanjh and Thom Dunning and Sandro Fiore and Al Geist and Bill Gropp and Robert Harrison and Mark Hereld and Michael Heroux and Adolfy Hoisie and Koh Hotta and Zhong Jin and Yutaka Ishikawa and Fred Johnson and Sanjay Kale and Richard Kenway and David Keyes and Bill Kramer and Jesus Labarta and Alain Lichnewsky and Thomas Lippert and Bob Lucas and Barney Maccabe and Satoshi Matsuoka and Paul Messina and Peter Michielse and Bernd Mohr and Matthias S. Mueller and Wolfgang E. Nagel and Hiroshi Nakashima and Michael E. Papka and Dan Reed and Mitsuhisa Sato and Ed Seidel and John Shalf and David Skinner and Marc Snir and Thomas Sterling and Rick Stevens and Fred Streitz and Bob Sugar and Shinji Sumimoto and William Tang and John Taylor and Rajeev Thakur and Anne Trefethen and Mateo Valero and Aad van der Steen and Jeffrey Vetter and Peg Williams and Robert Wisniewski and Kathy Yelick The International Exascale Software Project roadmap . . . . . . . . . . . . 3--60 Kamran Karimi and Neil Dickson and Firas Hamze High-performance Physics Simulations Using Multi-core CPUs and GPGPUs in a Volunteer Computing Context . . . . . . 61--69 Silvio Migliori and Giovanni Bracco and Lorella Fatone and Maria Cristina Recchioni and Francesco Zirilli A Parallel Code for Time-Dependent Acoustic Scattering Involving Passive or Smart Obstacles . . . . . . . . . . . . 70--92 Rosa Filgueira and David E. Singh and Jesús Carretero and Alejandro Calderón and Félix García Adaptive-CoMPI: Enhancing MPI-Based Applications' Performance and Scalability by using Adaptive Compression . . . . . . . . . . . . . . 93--114 D. Guo and W. Gropp Optimizing Sparse Data Structures for Matrix-vector Multiply . . . . . . . . . 115--131
Pavan Balaji and Abhinav Vishnu Special Issue on Programming Models and Systems Software Support for High-End Computing Applications . . . . . . . . . 135--136 Pieter Bellens and Josep M. Perez and Rosa M. Badia and Jesus Labarta Making the Best of Temporal Locality: Just-in-Time Renaming and Lazy Write-Back on the Cell/B.E. . . . . . . 137--147 Kazutomo Yoshii and Kamil Iskra and Harish Naik and Pete Beckman and P. Chris Broekema Performance and Scalability Evaluation of `Big Memory' on Blue Gene Linux . . . 148--160 Patrick Widener and Matthew Wolf and Hasan Abbasi and Scott McManus and Mary Payne and Matthew Barrick and Jack Pulikottil and Patrick Bridges and Karsten Schwan Exploiting Latent I/O Asynchrony in Petascale Science Applications . . . . . 161--179 Rinku Gupta and Harish Naik and Pete Beckman Understanding Checkpointing Overheads on Massive-Scale Systems: Analysis of the IBM Blue Gene/P System . . . . . . . . . 180--192 S. Murtaza and A. G. Hoekstra and P. M. A. Sloot Cellular Automata Simulations on a FPGA cluster . . . . . . . . . . . . . . . . 193--204 Juan Gómez-Luna and José María González-Linares and José Ignacio Benavides and Emilio L. Zapata and Nicolás Guil Load Balancing versus Occupancy Maximization on Graphics Processing Units: the Generalized Hough Transform as a Case Study . . . . . . . . . . . . 205--222 Abigail Hunter and Faisal Saied and Chinh Le and Marisol Koslowski Large-Scale $3$D Phase Field Dislocation Dynamics Simulations on High-Performance Architectures . . . . . . . . . . . . . 223--235 Matthias Korch and Thomas Rauber Parallel Low-Storage Runge--Kutta Solvers for ODE Systems with Limited Access Distance . . . . . . . . . . . . 236--255
Jack Dongarra and Bernard Tourancheau Selected papers of the Workshop on Clusters, Clouds and Grids for Scientific Computing (CCGSC) . . . . . . 259--260 Anne Benoit and Paul Renaud-Goud and Yves Robert Models and complexity results for performance and energy optimization of concurrent streaming applications . . . 261--273 Scott Callaghan and Philip Maechling and Patrick Small and Kevin Milner and Gideon Juve and Thomas H. Jordan and Ewa Deelman and Gaurang Mehta and Karan Vahi and Dan Gunter and Keith Beattie and Christopher Brooks Metrics for heterogeneous scientific workflows: a case study of an earthquake science application . . . . . . . . . . 274--285 Ananta Tiwari and Jeffrey K. Hollingsworth and Chun Chen and Mary Hall and Chunhua Liao and Daniel J. Quinlan and Jacqueline Chame Auto-tuning full applications: a case study . . . . . . . . . . . . . . . . . 286--294 Christian Obrecht and Frédéric Kuznik and Bernard Tourancheau and Jean-Jacques Roux The Thelma Project: Multi-GPU implementation of the lattice Boltzmann method . . . . . . . . . . . . . . . . . 295--303 Deb Agarwal and You-Wei Cheah and Dan Fay and Jonathan Fay and Dean Guo and Tony Hey and Marty Humphrey and Keith Jackson and Jie Li and Christophe Poulain and Youngryel Ryu and Catharine van Ingen Data-intensive science: the Terapixel and Modisazure projects . . . . . . . . 304--316 Philip M. Papadopoulos Extending clusters to Amazon EC2 using the Rocks toolkit . . . . . . . . . . . 317--327 Manu Shantharam and Anirban Chatterjee and Padma Raghavan Exploiting dense substructures for fast sparse matrix vector multiplication . . 328--341 Charles Lively and Xingfu Wu and Valerie Taylor and Shirley Moore and Hung-Ching Chang and Kirk Cameron Energy and performance characteristics of different parallel implementations of scientific applications on multicore systems . . . . . . . . . . . . . . . . 342--350
Balaji Pavan and Vishnu Abhinav Special Issue on Programming Models, Software and Tools for High-End Computing . . . . . . . . . . . . . . . 353--354 Yong Chen and Huaiyu Zhu and Philip C. Roth and Hui Jin and Xian-He Sun Global-aware and multi-order context-based prefetching for high-performance processors . . . . . . 355--370 Gengbin Zheng and Abhinav Bhatelé and Esteban Meneses and Laxmikant V. Kalé Periodic hierarchical load balancing for large supercomputers . . . . . . . . . . 371--385 Barry Smith and Hong Zhang Sparse triangular solves for $ I L U $ revisited: data layout crucial to better performance . . . . . . . . . . . . . . 386--391 Alexander E. MacDonald and Jacques Middlecoff and Tom Henderson and Jin-Luen Lee A general method for modeling on irregular grids . . . . . . . . . . . . 392--403 Francisco D. Igual and Rafael Mayo and Timothy Hartley and Ümit V. Çatalyürek and Antonio Ruiz and Manuel Ujaldon Color and texture analysis using emerging parallel architectures . . . . 404--427 Heike Jagode and Andreas Knüpfer and Jack Dongarra and Matthias Jurenz and Matthias S. Müller and Wolfgang E. Nagel Trace-based performance analysis for the petascale simulation code Flash . . . . 428--439 Tp Collignon and Mb van Gijzen Fast iterative solution of large sparse linear systems on geographically separated clusters . . . . . . . . . . . 440--450 David L. Hart Measuring TeraGrid: workload characterization for a high-performance computing federation . . . . . . . . . . 451--465 Shanti Bhushan and Pablo Carrica and Jianming Yang and Frederick Stern Scalability studies and large grid computations for surface combatant using CFDShip-Iowa . . . . . . . . . . . . . . 466--487 M. Chau and R. Couturier and J. Bahi and P. Spiteri Parallel solution of the obstacle problem in Grid environments . . . . . . 488--495 Aydin Buluç and John R. Gilbert The Combinatorial BLAS: design, implementation, and applications . . . . 496--509
Anjuli Bamzai Preface . . . . . . . . . . . . . . . . 3--4 John M. Dennis and Mariana Vertenstein and Patrick H. Worley and Arthur A. Mirin and Anthony P. Craig and Robert Jacob and Sheri Mickelson Computational performance of ultra-high-resolution capability in the Community Earth System Model . . . . . . 5--16 Arthur A. Mirin and Patrick H. Worley Improving the performance scalability of the Community Atmosphere Model . . . . . 17--30 Anthony P. Craig and Mariana Vertenstein and Robert Jacob A new flexible coupler for Earth system modeling developed for CCSM4 and CESM1 31--42 John M. Dennis and Jim Edwards and Ray Loy and Robert Jacob and Arthur A. Mirin and Anthony P. Craig and Mariana Vertenstein An application-level parallel I/O library for Earth system models . . . . 43--53 Katherine J. Evans and Andrew G. Salinger and Patrick H. Worley and Stephen F. Price and William H. Lipscomb and Jeffrey A. Nichols and James B. White III and Mauro Perego and Mariana Vertenstein and James Edwards and Jean-François Lemieux A modern solver interface to manage solution algorithms in the Community Earth System Model . . . . . . . . . . . 54--62 Peter H. Lauritzen and Arthur A. Mirin and John Truesdale and Kevin Raeder and Jeffrey L. Anderson and Julio Bacmeister and Richard B. Neale Implementation of new diffusion/filtering operators in the CAM--FV dynamical core . . . . . . . . . 63--73 John M. Dennis and Jim Edwards and Katherine J. Evans and Oksana Guba and Peter H. Lauritzen and Arthur A. Mirin and Amik St-Cyr and Mark A. Taylor and Patrick H. Worley CAM--SE: a scalable spectral element dynamical core for the Community Atmosphere Model . . . . . . . . . . . . 74--89
Torsten Hoefler and Kamil Iskra Operating systems and runtime environments on supercomputers . . . . . 93--94 Jan Stoess and Udo Steinberg and Volkmar Uhlig and Jens Kehne and Jonathan Appavoo and Amos Waterland A lightweight virtual machine monitor for Blue Gene/P . . . . . . . . . . . . 95--109 Stephen L. Olivier and Allan K. Porterfield and Kyle B. Wheeler and Michael Spiegel and Jan F. Prins OpenMP task scheduling strategies for multicore NUMA systems . . . . . . . . . 110--124 Patrick G. Bridges and Dorian Arnold and Kevin T. Pedretti and Madhav Suresh and Feng Lu and Peter Dinda and Russ Joseph and Jack Lange Virtual-machine-based emulation of future generation high-performance computing systems . . . . . . . . . . . 125--135 Terry Jones Linux kernel co-scheduling and bulk synchronous parallelism . . . . . . . . 136--145 Pavan Balaji and Jiayuan Meng Applications for the Heterogeneous Computing Era . . . . . . . . . . . . . 146--147 Yan Li and Jeffrey R. Diamond and Xu Wang and Haibo Lin and Yudong Yang and Zhenxing Han Large-scale Fast Fourier Transform on a heterogeneous multi-core system . . . . 148--158 Sean Whalen and Sophie Engle and Sean Peisert and Matt Bishop Network-theoretic classification of parallel computation patterns . . . . . 159--169 Haicheng Wu and Gregory Diamos and Jin Wang and Si Li and Sudhakar Yalamanchili Characterization and transformation of unstructured control flow in bulk synchronous GPU applications . . . . . . 170--185
Beniamino Di Martino and Eduard Mehofer and Dan Quinlan and Markus Schordan Graphical processing units and scientific applications . . . . . . . . 189--191 Lancelot Perrotte and Guillaume Saupin Fast GPU perspective grid construction and triangle tracing for exhaustive ray tracing of highly coherent rays . . . . 192--202 Aria Shahingohar and Roy Eagleson A framework for GPU accelerated deformable object modeling . . . . . . . 203--214 Andreas Monitzer Combining lattice Boltzmann and discrete element methods on a graphics processor 215--226 Marc-André Hermanns and Markus Geimer and Bernd Mohr and Felix Wolf Scalable detection of MPI-2 remote memory access inefficiency patterns . . 227--236 Francisco D. Igual and Rafael Mayo and Timothy D. R. Hartley and Ümit V. Çatalyürek and Antonio Ruiz and Manuel Ujaldon Retracted: Color and texture analysis on emerging parallel architectures . . . . 237--259 Thomas Gw Epperly and Gary Kumfert and Tamara Dahlgren and Dietmar Ebner and Jim Leek and Adrian Prantl and Scott Kohn High-performance language interoperability for scientific computing through Babel . . . . . . . . 260--274 Maciej Malawski and Tomasz Gubala and Marian Bubak Component-based approach for programming and running scientific applications on grids and clouds . . . . . . . . . . . . 275--295 Florian Ries and Tommaso De Marco and Roberto Guerrieri Tuning solution of large non-Hermitian linear systems on multiple graphics processing unit accelerated workstations 296--309 Keiichiro Fukazawa and Takayuki Umeda Performance measurement of magnetohydrodynamic code for space plasma on typical scalar-type supercomputer systems with a large number of cores . . . . . . . . . . . . 310--318 Chirag Dekate and Matthew Anderson and Maciej Brodowicz and Hartmut Kaiser and Bryce Adelstein-Lelbach and Thomas Sterling Improving the scalability of parallel $N$-body applications with an event-driven constraint-based execution model . . . . . . . . . . . . . . . . . 319--332
Horst Simon and Jack Dongarra and Hemant Shukla Introduction to the Special Issue . . . 335--336 Rio Yokota and Lorena A. Barba A tuned and scalable fast multipole method as a preeminent algorithm for exascale systems . . . . . . . . . . . . 337--346 Richard L. Martin and Prabhat and David D. Donofrio and James A. Sethian and Maciej Haranczyk Accelerating analysis of void space in porous materials on multicore and GPU platforms . . . . . . . . . . . . . . . 347--357 Melvyn Wright Adaptive Real-Time Imaging Synthesis Telescopes . . . . . . . . . . . . . . . 358--366 Hsi-Yu Schive and Ui-Han Zhang and Tzihong Chiueh Directionally unsplit hydrodynamic schemes with hybrid MPI/OpenMP/GPU parallelization in AMR . . . . . . . . . 367--377 Michael Commer and Filipe R. N. C. Maia and Gregory A. Newman Iterative Krylov solution methods for geophysical electromagnetic simulations on throughput-oriented processing units 378--385 Bálint Joó and Mike A. Clark Lattice QCD on GPU clusters, using the QUDA library and the Chroma software system . . . . . . . . . . . . . . . . . 386--398 E. Wes Bethel and Mark Howison Multi-core and many-core shared-memory parallel raycasting volume rendering optimization and tuning . . . . . . . . 399--412 Mahantesh Halappanavar and John Feo and Oreste Villa and Antonino Tumeo and Alex Pothen Approximate weighted matching on emerging manycore and multithreaded architectures . . . . . . . . . . . . . 413--430
David E. Keyes and Lois C. McInnes and Carol Woodward and William Gropp and Eric Myra and Michael Pernice and John Bell and Jed Brown and Alain Clo and Jeffrey Connors and Emil Constantinescu and Don Estep and Kate Evans and Charbel Farhat and Ammar Hakim and Glenn Hammond and Glen Hansen and Judith Hill and Tobin Isaac and Xiangmin Jiao and Kirk Jordan and Dinesh Kaushik and Efthimios Kaxiras and Alice Koniges and Kihwan Lee and Aaron Lott and Qiming Lu and John Magerlein and Reed Maxwell and Michael McCourt and Miriam Mehl and Roger Pawlowski and Amanda P. Randles and Daniel Reynolds and Beatrice Rivière and Ulrich Rüde and Tim Scheibe and John Shadid and Brendan Sheehan and Mark Shephard and Andrew Siegel and Barry Smith and Xianzhu Tang and Cian Wilson and Barbara Wohlmuth Multiphysics simulations: Challenges and opportunities . . . . . . . . . . . . . 4--83
Pavan Balaji and Satoshi Matsuoka Guest Editors' Introduction: Special Issue on Applications for the Heterogeneous Computing Era . . . . . . 87--88 Mitesh R. Meswani and Laura Carrington and Didem Unat and Allan Snavely and Scott Baden and Stephen Poole Modeling and predicting performance of high performance computing applications on hardware accelerators . . . . . . . . 89--108 Huming Zhu and Yu Cao and Zhiqiang Zhou and Maoguo Gong and Licheng Jiao Parallel unsupervised Synthetic Aperture Radar image change detection on a graphics processing unit . . . . . . . . 109--122 Torsten Hoefler and Kamil Iskra Operating systems and runtime environments on supercomputers . . . . . 123--123 Brian Kocoloski and John Lange Improving compute node performance using virtualization . . . . . . . . . . . . . 124--135 Hakan Akkan and Michael Lang and Lorie Liebrock Understanding and isolating the noise in the Linux kernel . . . . . . . . . . . . 136--146 Abhishek Kulkarni and Latchesar Ionkov and Michael Lang and Andrew Lumsdaine Optimizing process creation and execution on multi-core architectures 147--161 Jan Treibig and Georg Hager and Hannes G. Hofmann and Joachim Hornegger and Gerhard Wellein Pushing the limits for medical image reconstruction on recent standard multicore processors . . . . . . . . . . 162--177 M. A. Clark and P. C. La Plante and L. J. Greenhill Accelerating radio astronomy cross-correlation with graphics processing units . . . . . . . . . . . . 178--192 Tareq Malas and Aron J. Ahmadia and Jed Brown and John A. Gunnels and David E. Keyes Optimizing the performance of streaming numerical kernels on the IBM Blue Gene/P PowerPC 450 processor . . . . . . . . . 193--209 K. G. Felker and A. R. Siegel and S. F. Siegel Optimizing Memory Constrained Environments in Monte Carlo Nuclear Reactor Simulations . . . . . . . . . . 210--216 Kiran Narayanan and Angel Mora and Nicholas Allsopp and Tamer El Sayed A hybrid, massively parallel implementation of a genetic algorithm for optimization of the impact performance of a metal/polymer composite plate . . . . . . . . . . . . . . . . . 217--227
Jack Dongarra and Bernard Tourancheau Introduction for August Special Issue CCDSC . . . . . . . . . . . . . . . . . 231--231 Mohammed EM Diouri and Ghislain L. Tsafack Chetsa and Olivier Glück and Laurent Lefèvre and Jean-Marc Pierson and Patricia Stolf and Georges Da Costa Energy efficiency in high-performance computing with and without knowledge of applications and services . . . . . . . 232--243 Wesley Bland and Aurelien Bouteiller and Thomas Herault and George Bosilca and Jack Dongarra Post-failure recovery of MPI communication capability: Design and rationale . . . . . . . . . . . . . . . 244--254 Kyle Spafford and Jeffrey S. Vetter and Thomas Benson and Mike Parker Modeling synthetic aperture radar computation with Aspen . . . . . . . . . 255--262 Joel H. Saltz and George Teodoro and Tony Pan and Lee A. D. Cooper and Jun Kong and Scott Klasky and Tahsin M. Kurc Feature-based analysis of large-scale spatio-temporal sensor data on hybrid architectures . . . . . . . . . . . . . 263--272 Ana Gainaru and Franck Cappello and Marc Snir and William Kramer Failure prediction for HPC systems and applications: Current situation and open issues . . . . . . . . . . . . . . . . . 273--282 Emmanuel Jeannot Symbolic mapping and allocation for the Cholesky factorization on NUMA machines: Results and optimizations . . . . . . . 283--290 Vicente Peruffo Minotto and Claudio Rosito Jung and Luiz Gonzaga da Silveira, Jr. and Bowon Lee GPU-based approaches for real-time sound source localization using the SRP--PHAT algorithm . . . . . . . . . . . . . . . 291--306 Chaofeng Hou and Ji Xu and Peng Wang and Wenlai Huang and Xiaowei Wang and Wei Ge and Xianfeng He and Li Guo and Jinghai Li Petascale molecular dynamics simulation of crystalline silicon on Tianhe-1A . . 307--317 José R. Sanjurjo and Margarita Amor and Montserrat Bóo and Ramón Doallo Parallel Monte Carlo radiosity using scene partitioning . . . . . . . . . . . 318--334 I. Carpenter and R. K. Archibald and K. J.
Evans and J. Larkin and P. Micikevicius and M. Norman and J. Rosinski and J. Schwarzmeier and M. A. Taylor Progress towards accelerating HOMME on hybrid multi-core systems . . . . . . . 335--347 Ekaterini Solomou and Spiros Kostopoulos and Konstantinos Sidiropoulos and Emmanouil Athanasiadis and Eleftherios Lavdas and Dimitris Glotsos and George Sakellaropoulos and Petros Zampakis and John Stonham and Dionisis Cavouras Designing a pattern recognition system on GPU for discriminating between patients with micro-ischaemic and multiple sclerosis lesions, using MRI images . . . . . . . . . . . . . . . . . 348--359 Anshu Dubey and Alan C. Calder and Christopher Daley and Robert T. Fisher and C. Graziani and George C. Jordan and Donald Q. Lamb and Lynn B. Reid and Dean M. Townsley and Klaus Weide Pragmatic optimizations for better scientific utilization of large supercomputers . . . . . . . . . . . . . 360--373
Leonid Oliker and Richard Vuduc Introduction for Special Issue on Autotuning . . . . . . . . . . . . . . . 377--378 Protonu Basu and Mary Hall and Malik Khan and Suchit Maindola and Saurav Muralidharan and Shreyas Ramalingam and Axel Rivera and Manu Shantharam and Anand Venkat Towards making autotuning mainstream . . 379--393 Ray S. Chen and Jeffrey K. Hollingsworth Towards fully automatic auto-tuning: Leveraging language features of Chapel 394--402 Nicholas Chaimov and Scott Biersdorff and Allen D. Malony Tools for machine-learning-based empirical autotuning and specialization 403--411 Sanket Tavarageri and J. Ramanujam and P. Sadayappan Adaptive parallel tiled code generation and accelerated auto-tuning . . . . . . 412--425 Diego Fabregat-Traver and Paolo Bientinesi Application-tailored linear algebra algorithms: a search-based approach . . 426--439 Bryan Marker and Don Batory and Robert van de Geijn A case study in mechanically deriving dense linear algebra code . . . . . . . 440--453 Khaled Z. Ibrahim and Kamesh Madduri and Samuel Williams and Bei Wang and Stephane Ethier and Leonid Oliker Analysis and optimization of gyrokinetic toroidal simulations on homogeneous and heterogeneous platforms . . . . . . . . 454--473 Thanadech Thanakornworakij and Raja Nassar and Chokchai Box Leangsuksun and Mihaela Paun Reliability model of a system of $k$ nodes with simultaneous failures for high-performance computing applications 474--482 Raúl Valín and Carlos Sampedro and Natalia Seoane and Manuel Aldegunde and Antonio Garcia-Loureiro and Andres Godoy and Francisco Gámiz Optimisation and parallelisation of a $2$D MOSFET multi-subband ensemble Monte Carlo simulator . . . . . . . . . . . . 483--492 Jacobo Lobeiras and Moisés Viñas and Margarita Amor and Basilio B. Fraguela and Manuel Arenaz and J. A. García and M. J. Castro Parallelization of shallow water simulations on current multi-threaded systems . . . . . . . . . . . . . . . . 493--512
Dahai Guo and William Gropp Applications of the streamed storage format for sparse matrix operations . . 3--12 Miguel O. Bernabeu and James Southern and Nicholas Wilson and Peter Strazdins and Jonathan Cooper and Joe Pitt-Francis Chaste: a case study of parallelisation of an open source finite-element solver with applications to computational cardiac electrophysiology simulation . . 13--32 Guillermo Vigueras and Juan M. Orduña and Miguel Lozano and José M. Cecilia and José M. García Accelerating collision detection for large-scale crowd simulation on multi-core and many-core architectures 33--49 Liang Zheng and Huai Zhang and Taras Gerya and Matthew Knepley and David A. Yuen and Yaolin Shi Implementation of a multigrid solver on a GPU for Stokes equations with strongly variable viscosity based on Matlab and CUDA . . . . . . . . . . . . . . . . . . 50--60 Alexander Vondrous and Michael Selzer and Johannes Hötzer and Britta Nestler Parallel computing for phase-field models . . . . . . . . . . . . . . . . . 61--72 Yasuhiro Idomura and Motoki Nakata and Susumu Yamada and Masahiko Machida and Toshiyuki Imamura and Tomohiko Watanabe and Masanori Nunami and Hikaru Inoue and Shigenobu Tsutsumi and Ikuo Miyoshi and Naoyuki Shida Communication-overlap techniques for improved strong scaling of gyrokinetic Eulerian code beyond 100k cores on the K-computer . . . . . . . . . . . . . . . 73--86 Andrew R. Siegel and Kord Smith and Paul K. Romano and Benoit Forget and Kyle G. Felker Multi-core performance studies of a Monte Carlo neutron transport code . . . 87--96 Iain Bethune and J. Mark Bull and Nicholas J. Dingle and Nicholas J. Higham Performance analysis of asynchronous Jacobi's method implemented in MPI, SHMEM and OpenMP . . . . . . . . . . . . 97--111 Diego Darriba and Guillermo L. Taboada and Ramón Doallo and David Posada High-performance computing selection of models of DNA substitution for multicore clusters . . . . . . . . . . . . . . . . 112--125
Marc Snir and Robert W. Wisniewski and Jacob A. Abraham and Sarita V. Adve and Saurabh Bagchi and Pavan Balaji and Jim Belak and Pradip Bose and Franck Cappello and Bill Carlson and Andrew A. Chien and Paul Coteus and Nathan A. DeBardeleben and Pedro C. Diniz and Christian Engelmann and Mattan Erez and Saverio Fazzari and Al Geist and Rinku Gupta and Fred Johnson and Sriram Krishnamoorthy and Sven Leyffer and Dean Liberty and Subhasish Mitra and Todd Munson and Rob Schreiber and Jon Stearley and Eric Van Hensbergen Addressing failures in exascale computing . . . . . . . . . . . . . . . 129--173 Shijin Yuan and Shicheng Wen and Hongyu Li and Xinfeng Zhang and Qin Liu An optimization framework for adjoint-based climate simulations: a case study of the Zebiak--Cane model . . 174--182 Wangdong Yang and Kenli Li and Yan Liu and Lin Shi and Lanjun Wan Optimization of quasi-diagonal matrix-vector multiplication on GPU . . 183--195 Azzam Haidar and Stanimire Tomov and Jack Dongarra and Raffaele Solcà and Thomas Schulthess A novel hybrid CPU-GPU generalized eigensolver for electronic structure calculations based on fine-grained memory aware tasks . . . . . . . . . . . 196--209 Marin Bougeret and Henri Casanova and Yves Robert and Frédéric Vivien and Dounia Zaidouni Using group replication for resilience on exascale systems . . . . . . . . . . 210--224 Anshu Dubey and Katie Antypas and Alan C. Calder and Chris Daley and Bruce Fryxell and J. Brad Gallagher and Donald Q. Lamb and Dongwook Lee and Kevin Olson and Lynn B. Reid and Paul Rich and Paul M. Ricker and Katherine M. Riley and Robert Rosner and Andrew Siegel and Noel T. Taylor and Klaus Weide and Francis X. Timmes and Natasha Vladimirova and John ZuHone Evolution of FLASH, a multi-physics scientific simulation code for high-performance computing . . . . . . . 225--237 Shuai Che and Kevin Skadron BenchFriend: Correlating the performance of GPU benchmarks . . . . . . . . . . . 238--250
Jiayuan Meng and Toshio Endo Special Issue on Applications for the Heterogeneous Computing Era . . . . . . 253--254 Tao Gao and Yutong Lu and Baida Zhang and Guang Suo Using the Intel Many Integrated Core to accelerate graph traversal . . . . . . . 255--266 Dip Sankar Banerjee and Parikshit Sakurikar and Kishore Kothapalli Comparison sorting on hybrid multicore architectures for fixed and variable length keys . . . . . . . . . . . . . . 267--284 Andra Hugo and Abdou Guermouche and Pierre-André Wacrenier and Raymond Namyst Composing multiple StarPU applications over heterogeneous machines: a supervised approach . . . . . . . . . . 285--300 Yang You and Haohuan Fu and Shuaiwen Leon Song and Maryam Mehri Dehnavi and Lin Gan and Xiaomeng Huang and Guangwen Yang Evaluating multi-core and many-core architectures through accelerating the three-dimensional Lax--Wendroff correction stencil . . . . . . . . . . . 301--318 Yash Ukidave and Amir Kavyan Ziabari and Perhaad Mistry and Gunar Schirner and David Kaeli Analyzing power efficiency of optimization techniques and algorithm design methods for applications on heterogeneous platforms . . . . . . . . 319--334 Yukihiro Hasegawa and Jun-Ichi Iwata and Miwako Tsuji and Daisuke Takahashi and Atsushi Oshiyama and Kazuo Minami and Taisuke Boku and Hikaru Inoue and Yoshito Kitazawa and Ikuo Miyoshi and Mitsuo Yokokawa Performance evaluation of ultra-large-scale first-principles electronic structure calculation code on the K computer . . . . . . . . . . . . . 335--355 Sandra Wienke and Marcel Spekowius and Alesja Dammer and Dieter an Mey and Christian Hopmann and Matthias S. Müller Towards an accurate simulation of the crystallisation process in injection moulded plastic components by hybrid parallelisation . . . . . . . . . . . . 356--367 M. Luisa Córdoba and Antonio García Dopico and M. 
Isabel García and Francisco Rosales and Jesús Arnaiz and Rodolfo Bermejo and Pedro Galán del Sastre Efficient parallelization of a regional ocean model for the western Mediterranean Sea . . . . . . . . . . . 368--383
Javier Garcia Blas and Jesus Carretero Recent advances in the Message Passing Interface . . . . . . . . . . . . . . . 387--389 James Dinan and Ryan E. Grant and Pavan Balaji and David Goodell and Douglas Miller and Marc Snir and Rajeev Thakur Enabling communication concurrency through flexible MPI endpoints . . . . . 390--405 Christi Symeonidou and Polyvios Pratikakis and Dimitrios S. Nikolopoulos and Angelos Bilas Distributed region-based memory allocation and synchronization . . . . . 406--414 Brian W. Barrett and Ron Brightwell and Ryan Grant and Simon D. Hammond and K. Scott Hemmert An evaluation of MPI message rate on hybrid-core processors . . . . . . . . . 415--424 Emmanuelle Saillard and Patrick Carribault and Denis Barthou PARCOACH: Combining static and dynamic validation of MPI collective communications . . . . . . . . . . . . . 425--434 Judicael A. Zounmevo and Dries Kimpe and Robert Ross and Ahmad Afsahi Extreme-scale computing services over MPI: Experiences, observations and features proposal for next-generation message passing interface . . . . . . . 435--449 Sameer Kumar and Amith Mamidala and Philip Heidelberger and Dong Chen and Daniel Faraj Optimization of MPI collective operations on the IBM Blue Gene/Q supercomputer . . . . . . . . . . . . . 450--464
Kamil Iskra and Torsten Hoefler Operating systems and runtime environments on supercomputers . . . . . 3--4 Scott Levy and Kurt B. Ferreira and Patrick G. Bridges and Aidan P. Thompson and Christian Trott A study of the viability of exploiting memory content similarity to improve resilience to memory errors . . . . . . 5--20 Yin Lu and Yong Chen and Yu Zhuang and Jialin Liu and Rajeev Thakur Collective input/output under memory constraints . . . . . . . . . . . . . . 21--36 Erik Vermij and Leandro Fiorin and Rik Jongerius and Christoph Hagleitner and Koen Bertels Challenges in exascale radio astronomy: Can the SKA ride the technology wave? 37--50 Jun Chai and Johan Hake and Nan Wu and Mei Wen and Xing Cai and Glenn T. Lines and Jing Yang and Huayou Su and Chunyuan Zhang and Xiangke Liao Towards simulation of subcellular calcium dynamics at nanometre resolution 51--63 Manish Bajpai and Phalguni Gupta and Prabhat Munshi Fast multi-processor multi-GPU based algorithm of tomographic inversion for $3$D image reconstruction . . . . . . . 64--72 Henri Casanova and Fanny Dufossé and Yves Robert and Frédéric Vivien Mapping applications on volatile resource . . . . . . . . . . . . . . . . 73--91 Mark Gates and Michael T. Heath and John Lambros High-performance hybrid CPU and GPU parallel algorithm for digital volume correlation . . . . . . . . . . . . . . 92--106 Lynn Wood and Jeff Daily and Michael Henry and Bruce Palmer and Karen Schuchardt and Donald Dazlich and Ross Heikes and David Randall A global climate model agent for high spatial and temporal resolution data . . 107--116
Simon McIntosh-Smith and James Price and Richard B. Sessions and Amaurys A. Ibarra High performance in silico virtual drug screening on many-core processors . . . 119--134 Hari K. Raghavan and Sathish S. Vadhiyar Adaptive executions of hyperbolic block-structured AMR applications on GPU systems . . . . . . . . . . . . . . . . 135--153 Anthony P. Craig and Sheri A. Mickelson and Elizabeth C. Hunke and David A. Bailey Improved parallel performance of the CICE model in CESM1 . . . . . . . . . . 154--165 Hee Won Lee and Mihail L. Sichitiu and David Thuente High-performance emulation of heterogeneous systems using adaptive time dilation . . . . . . . . . . . . . 166--183 Fv Grigoriev and Av Sulimov and Igor Kochikov and Oa Kondakova and Vb Sulimov and Av Tikhonravov High-performance atomistic modeling of optical thin films deposited by energetic processes . . . . . . . . . . 184--192 Azzam Haidar and Tingxing Dong and Piotr Luszczek and Stanimire Tomov and Jack Dongarra Batched matrix computations on hardware accelerators based on GPUs . . . . . . . 193--208 Didem Unat and Cy Chan and Weiqun Zhang and Samuel Williams and John Bachan and John Bell and John Shalf ExaSAT: an exascale co-design tool for performance modeling . . . . . . . . . . 209--232 Tomas Ekeberg and Stefan Engblom and Jing Liu Machine learning for ultrafast X-ray diffraction patterns on large-scale GPU clusters . . . . . . . . . . . . . . . . 233--243
Frédéric Magoulès and Mark Parsons and Lorna Smith Innovative Algorithms for Extreme Scale Computing . . . . . . . . . . . . . . . 247--248 Vincent Reverdy and Jean-Michel Alimi and Vincent Bouillot and Yann Rasera and Pier-Stefano Corasaniti and Irène Balmès and Stéphane Requena and Xavier Delaruelle and Jean-Noel Richet DEUS full observable universe simulations: Numerical challenge and outlooks . . . . . . . . . . . . . . . . 249--260 George Mozdzynski and Mats Hamrud and Nils Wedi A Partitioned Global Address Space implementation of the European Centre for Medium Range Weather Forecasts Integrated Forecasting System . . . . . 261--273 Alan Gray and Alistair Hart and Oliver Henrich and Kevin Stratford Scaling soft matter physics to thousands of graphics processing units in parallel 274--283 Frédéric Magoulès and Abal-Kassim Cheik Ahamed Alinea: an Advanced Linear Algebra Library for Massively Parallel Computations on Graphics Processing Units . . . . . . . . . . . . . . . . . 284--310 Stefano Markidis and Jing Gong and Michael Schliephake and Erwin Laure and Alistair Hart and David Henty and Katherine Heisey and Paul Fischer OpenACC acceleration of the Nek5000 spectral element code . . . . . . . . . 311--319 Michael T. Heath A tale of two laws . . . . . . . . . . . 320--330 Unai Lopez-Novoa and Jon Sáenz and Alexander Mendiburu and Jose Miguel-Alonso An efficient implementation of kernel density estimation for multi-core and many-core architectures . . . . . . . . 331--347 Nan Dun and Hajime Fujita and John R. Tramm and Andrew A. Chien and Andrew R. Siegel Data decomposition in Monte Carlo neutron transport simulations using global view arrays . . . . . . . . . . . 348--365 Hartwig Anzt and Stanimire Tomov and Piotr Luszczek and William Sawyer and Jack Dongarra Acceleration of GPU-based Krylov solvers via data transfer reduction . . . . . . 366--383
Dewan Ibtesham and Kurt B. Ferreira and Dorian Arnold A checkpoint compression study for high-performance computing systems . . . 387--402 Austin R. Benson and Sven Schmit and Robert Schreiber Silent error detection in numerical time-stepping schemes . . . . . . . . . 403--421 Erlin Yao and Jiutian Zhang and Mingyu Chen and Guangming Tan and Ninghui Sun Detection of soft errors in $ L U $ decomposition with partial pivoting using algorithm-based fault tolerance 422--436 Steven McDonagh and Cigdem Beyan and Phoenix X. Huang and Robert B. Fisher Applying semi-synchronised task farming to large-scale computer vision problems 437--460 Marco Aldinucci and Guilherme Peretti Pezzi and Maurizio Drocco and Concetto Spampinato and Massimo Torquati Parallel visual data restoration on multi-GPGPUs using stencil-reduce pattern . . . . . . . . . . . . . . . . 461--472 Massimo Minervini and Cristian Rusu and Mario Damiano and Valter Tucci and Angelo Bifone and Alessandro Gozzi and Sotirios A. Tsaftaris Large-scale analysis of neuroimaging data on commercial clouds with content-aware resource allocation strategies . . . . . . . . . . . . . . . 473--488 Tim Besard and Bjorn De Sutter and Andrés Frías-Velázquez and Wilfried Philips Case study of multiple trace transform implementations . . . . . . . . . . . . 489--505 Jorge González-Domínguez and Jan Christian Kässens and Lars Wienbrandt and Bertil Schmidt Large-scale genome-wide association studies on a GPU cluster using a CUDA-accelerated PGAS programming model 506--510
Jack Dongarra and Michael A. Heroux and Piotr Luszczek High-performance conjugate-gradient benchmark: a new metric for ranking high-performance computing systems . . . 3--10 Jongsoo Park and Mikhail Smelyanskiy and Karthikeyan Vaidyanathan and Alexander Heinecke and Dhiraj D. Kalamkar and Md Mosotofa Ali Patwary and Vadim Pirogov and Pradeep Dubey and Xing Liu and Carlos Rosales and Cyril Mazauric and Christopher Daley Optimizations in a high-performance conjugate gradient benchmark for IA-based multi- and many-core processors 11--27 Everett Phillips and Massimiliano Fatica Performance analysis of the high-performance conjugate gradient benchmark on GPUs . . . . . . . . . . . 28--38 Yiqun Liu and Chao Yang and Fangfang Liu and Xianyi Zhang and Yutong Lu and Yunfei Du and Canqun Yang and Min Xie and Xiangke Liao 623 Tflop/s HPCG run on Tianhe-2: Leveraging millions of hybrid cores . . 39--54 Kiyoshi Kumahata and Kazuo Minami and Naoya Maruyama High-performance conjugate gradient performance improvement on the K computer . . . . . . . . . . . . . . . . 55--70 Toshitaka Baba and Kazuto Ando and Daisuke Matsuoka and Mamoru Hyodo and Takane Hori and Narumi Takahashi and Ryoko Obayashi and Yoshiyuki Imato and Dai Kitamura and Hitoshi Uehara and Toshihiro Kato and Ryotaro Saka Large-scale, high-speed tsunami prediction for the Great Nankai Trough Earthquake on the K computer . . . . . . 71--84 Edmond Chow and Xing Liu and Sanchit Misra and Marat Dukhan and Mikhail Smelyanskiy and Jeff R. Hammond and Yunfei Du and Xiang-Ke Liao and Pradeep Dubey Scaling up Hartree--Fock calculations on Tianhe-2 . . . . . . . . . . . . . . . . 85--102 Dahai Guo and William Gropp and Luke N. Olson A hybrid format for better performance of sparse matrix-vector multiplication on a GPU . . . . . . . . . . . . . . . . 103--120 Patrick M. Widener and Scott Levy and Kurt B. Ferreira and Torsten Hoefler On noise and the performance benefit of nonblocking collectives . . . . . . . . 121--133
Jiri Jaros and Alistair P. Rendell and Bradley E. Treeby Full-wave nonlinear ultrasound simulation on distributed clusters with applications in high-intensity focused ultrasound . . . . . . . . . . . . . . . 137--155 Xinqiang Miao and Xianlong Jin and Junhong Ding Improving the parallel efficiency of large-scale structural dynamic analysis using a hierarchical approach . . . . . 156--168 Bozhong Liu and Weidong Qiu and Lin Jiang and Zheng Gong Software pipelining for graphic processing unit acceleration: Partition, scheduling and granularity . . . . . . . 169--185 Rone Kwei Lim and J. William Pro and Matthew R. Begley and Marcel Utz and Linda R. Petzold High-performance simulation of fracture in idealized brick and mortar composites using adaptive Monte Carlo minimization on the GPU . . . . . . . . . . . . . . . 186--199 Hoang-Vu Dang and Bertil Schmidt and Andreas Hildebrandt and Tuan Tu Tran and Anna Katharina Hildebrandt CUDA-enabled hierarchical ward clustering of protein structures based on the nearest neighbour chain algorithm 200--211 Tanzima Islam and Kathryn Mohror and Martin Schulz Exploring the MPI tool information interface: features and capabilities . . 212--222 Bruce Palmer and William Perkins and Yousu Chen and Shuangshuang Jin and David Callahan and Kevin Glass and Ruisheng Diao and Mark Rice and Stephen Elbert and Mallikarjuna Vallem and Zhenyu Huang GridPACK™: a framework for developing power grid simulations on high-performance computing platforms . . 223--240 Teng Wang and Kevin Vasko and Zhuo Liu and Hui Chen and Weikuan Yu Enhance parallel input/output with cross-bundle aggregation . . . . . . . . 241--256
Yi Liu and Xiongzi Ge and David Hung-Chang Du and Xiaoxia Huang Par-BF: a parallel partitioned Bloom filter for dynamic data sets . . . . . . 259--275 Adnan Ozsoy An efficient parallelization of longest prefix match and application on data compression . . . . . . . . . . . . . . 276--289 Daniele Pianu and Roberto Nerino and Claudia Ferraris and Antonio Chimienti A novel approach to train random forests on GPU for computer vision applications using local features . . . . . . . . . . 290--304 Ignacio Laguna and David F. Richards and Todd Gamblin and Martin Schulz and Bronis R. de Supinski and Kathryn Mohror and Howard Pritchard Evaluating and extending user-level fault tolerance in MPI applications . . 305--319 Matthew Otten and Jing Gong and Azamat Mametjanov and Aaron Vose and John Levesque and Paul Fischer and Misun Min An MPI/OpenACC implementation of a high-order electromagnetics solver with GPUDirect communication . . . . . . . . 320--334 Md. Mohsin Ali and Peter E. Strazdins and Brendan Harding and Markus Hegland Complex scientific applications made fault-tolerant with the sparse grid combination technique . . . . . . . . . 335--359 William Boyd and Andrew Siegel and Shuo He and Benoit Forget and Kord Smith Parallel performance results for the OpenMOC neutron transport code on multicore platforms . . . . . . . . . . 360--375
Zsolt Horváth and Rui Ap Perdigão and Jürgen Waser and Daniel Cornel and Artem Konev and Günter Blöschl Kepler shuffle for real-world flood simulations on GPUs . . . . . . . . . . 379--395 Shuibing He and Yan Liu and Yang Wang and Xian-He Sun and Chuanhe Huang Enhancing hybrid parallel file system through performance and space-aware data layout . . . . . . . . . . . . . . . . . 396--410 Seiji Tsuboi and Kazuto Ando and Takayuki Miyoshi and Daniel Peter and Dimitri Komatitsch and Jeroen Tromp A 1.8 trillion degrees-of-freedom, 1.24 petaflops global seismic wave simulation on the K computer . . . . . . . . . . . 411--422 Huda Ibeid and Rio Yokota and David Keyes A performance model for the communication in fast multipole methods on high-performance computing platforms 423--437 Pavol Bauer and Stefan Engblom and Stefan Widgren Fast event-based epidemiological simulations on national scales . . . . . 438--453 Kazuto Ando and Mamoru Hyodo and Toshitaka Baba and Takane Hori and Toshihiro Kato and Masaru Watanabe and Shin-ichi Ichikawa and Hisakuni Kitahara and Hitoshi Uehara and Hikaru Inoue Parallel-algorithm extension for tsunami and earthquake-cycle simulators for massively parallel execution on the K computer . . . . . . . . . . . . . . . . 454--468 Alejandro Calderón and Alberto García and Félix García-Carballeira and Jesús Carretero and Javier Fernández Improving performance using computational compression through memoization: a case study using a railway power consumption simulator . . 469--485 Jonathan Y. Kemal and Roger L. Davis and John D. Owens Multidisciplinary simulation acceleration using multiple shared memory graphical processing units . . . 486--508
Jack Dongarra and Bernard Tourancheau Guest Editor's Note: Special Issue on Clusters, Clouds and Data for Scientific Computing . . . . . . . . . . . . . . . 3--3 Ewa Deelman and Christopher Carothers and Anirban Mandal and Brian Tierney and Jeffrey S. Vetter and Ilya Baldin and Claris Castillo and Gideon Juve and Dariusz Król and Vickie Lynch and Ben Mayer and Jeremy Meredith and Thomas Proffen and Paul Ruth and Rafael Ferreira da Silva PANORAMA: an approach to performance modeling and diagnosis of extreme-scale workflows . . . . . . . . . . . . . . . 4--18 Michela Taufer and Arnold L. Rosenberg Scheduling DAG-based workflows on single cloud instances: High-performance and cost effectiveness with a static scheduler . . . . . . . . . . . . . . . 19--31 George Teodoro and Tahsin Kurc and Guilherme Andrade and Jun Kong and Renato Ferreira and Joel Saltz Application performance analysis and efficient execution on systems with multi-core CPUs, GPUs and MICs: a case study with microscopy image analysis . . 32--51 Anne Benoit and Saurabh K. Raina and Yves Robert Efficient checkpoint/verification patterns . . . . . . . . . . . . . . . . 52--65 Enric Tejedor and Yolanda Becerra and Guillem Alomar and Anna Queralt and Rosa M. Badia and Jordi Torres and Toni Cortes and Jesús Labarta PyCOMPSs: Parallel computational workflows in Python . . . . . . . . . . 66--82 Marc Buffat and Anne Cadiou and Lionel Le Penven and Christophe Pera In situ analysis and visualization of massively parallel computations . . . . 83--90 Shad Kirmani and Jeonghyung Park and Padma Raghavan An embedded sectioning scheme for multiprocessor topology-aware mapping of irregular applications . . . . . . . . . 91--103 Al Geist and Daniel A. Reed A survey of high-performance computing scaling challenges . . . . . . . . . . . 104--113
William Spataro and Giuseppe A. Trunfio and Georgios Ch. Sirakoulis High performance computing in modelling and simulation . . . . . . . . . . . . . 117--118 Ana Flávia P. Camargos and Viviane C. Silva and Jean-M. Guichon and Gérard Meunier GPU-accelerated iterative solution of complex-entry systems issued from $3$D edge-FEA of electromagnetics in the frequency domain . . . . . . . . . . . . 119--133 Themistoklis Giitsidis and Nikolaos I. Dourvas and Georgios Ch Sirakoulis Parallel implementation of aircraft disembarking and emergency evacuation based on cellular automata . . . . . . . 134--151 Irfan Uddin One-IPC high-level simulation of microthreaded many-core architectures 152--162 Davide Spataro and Donato D'Ambrosio and Giuseppe Filippone and Rocco Rongo and William Spataro and Davide Marocco The new SCIARA-fv3 numerical model and acceleration by GPGPU strategies . . . . 163--176
Anonymous Preface . . . . . . . . . . . . . . . . 179--180 Anonymous Notice . . . . . . . . . . . . . . . . . 181--181 Ivan Merelli and Paolo Cozzi and Elisabetta Ronchieri and Daniele Cesini and Daniele D'Agostino Porting bioinformatics applications from grid to cloud: a macromolecular surface analysis application case study . . . . 182--195 Fabio Tordini and Maurizio Drocco and Claudia Misale and Luciano Milanesi and Pietro Liò and Ivan Merelli and Massimo Torquati and Marco Aldinucci NuChart-II: The road to a fast and scalable tool for Hi-C data analysis . . 196--211 Matthias Diener and Eduardo Hm Cruz and Philippe Oa Navaux Modeling memory access behavior for data mapping . . . . . . . . . . . . . . . . 212--228 Jan G. Cornelis and Jan Lemeire and Tim Bruylants and Peter Schelkens Heterogeneous acceleration of volumetric JPEG 2000 using OpenCL . . . . . . . . . 229--245 Fredrik Robertsén and Jan Westerholm and Keijo Mattila Designing a graphics processing unit accelerated petaflop capable lattice Boltzmann solver: Read aligned data layouts and asynchronous communication 246--255
Rune Havnung Bakken and Lars Moland Eliassen Real-time three-dimensional skeletonisation using general-purpose computing on graphics processing units applied to computer vision-based human pose estimation . . . . . . . . . . . . 259--273 Lena Oden and Holger Fröning InfiniBand Verbs on GPU: a case study of controlling an InfiniBand network device from the GPU . . . . . . . . . . . . . . 274--284 Anish Varghese and Bob Edwards and Gaurav Mitra and Alistair P. Rendell Programming the Adapteva Epiphany 64-core network-on-chip coprocessor . . 285--302 Miaoqing Huang and Chenggang Lai and Xuan Shi and Zhijun Hao and Haihang You Study of parallel programming models on computer clusters with Intel MIC coprocessors . . . . . . . . . . . . . . 303--315 Rosa Filguiera and Amrey Krause and Malcolm Atkinson and Iraklis Klampanos and Alexander Moreno dispel4py: a Python framework for data-intensive scientific computing . . 316--334 Anthony Kougkas and Hassan Eslami and Xian-He Sun and Rajeev Thakur and William Gropp Rethinking key--value store for parallel I/O optimization . . . . . . . . . . . . 335--356
Pavan Balaji and Zhiyi Huang Special issue on programming models and applications for multicores and manycores . . . . . . . . . . . . . . . 359--360 Yao Wu and Long Zheng and Brian Heilig and Guang R. Gao HAMR: a dataflow-based real-time in-memory cluster computing engine . . . 361--374 Hartwig Anzt and Stanimire Tomov and Jack Dongarra On the performance and energy efficiency of sparse linear algebra on GPUs . . . . 375--390 Jonathan C. Beard and Peng Li and Roger D. Chamberlain RaftLib: a C++ template library for high performance stream parallel processing 391--404 Nicholas Chaimov and Khaled Z. Ibrahim and Samuel Williams and Costin Iancu Reaching bandwidth saturation using transparent injection parallelization 405--421 Ahmad Qawasmeh and Maxime R. Hugues and Henri Calandra and Barbara M. Chapman Performance portability in reverse time migration and seismic modelling via OpenACC . . . . . . . . . . . . . . . . 422--440 Peng Li and Jonathan C. Beard and Jeremy D. Buhler Deadlock-free buffer configuration for stream computing . . . . . . . . . . . . 441--450 Akhil Langer and Ehsan Totoni and Udatta Palekar and Laxmikant V. Kalé Energy-optimal configuration selection for manycore chips with variation . . . 451--466
Gordon Bell and David H. Bailey and Jack Dongarra and Alan H. Karp and Kevin Walsh A look back on 30 years of the Gordon Bell Prize . . . . . . . . . . . . . . . 469--484 Felix Schmitt and Robert Dietrich and Guido Juckeland Scalable critical-path analysis and optimization guidance for hybrid MPI--CUDA applications . . . . . . . . . 485--498 Sebastião Miranda and Jonas Feldt and Frederico Pratas and Ricardo A. Mata and Nuno Roma and Pedro Tomás Efficient parallelization of perturbative Monte Carlo QM/MM simulations in heterogeneous platforms 499--516 Chao Jin and Bronis R. de Supinski and David Abramson and Heidi Poxon and Luiz DeRose and Minh Ngoc Dinh and Mark Endrei and Elizabeth R. Jessup A survey on software methods to improve the energy efficiency of parallel computing . . . . . . . . . . . . . . . 517--549 Timothy Dykes and Claudio Gheller and Marzia Rivi and Mel Krokos Splotch . . . . . . . . . . . . . . . . 550--563 A. Chien and P. Balaji and N. Dun and A. Fang and H. Fujita and K. Iskra and Z. Rubenstein and Z. Zheng and J. Hammond and I. Laguna and D. Richards and A. Dubey and B. van Straalen and M. Hoemmen and M. Heroux and K. Teranishi and A. Siegel Exploring versioned distributed arrays for resilience in scientific applications . . . . . . . . . . . . . . 564--590
Jack Dongarra and Bernard Tourancheau Guest editors' note . . . . . . . . . . 3--3 Ewing Lusk and Ralph Butler and Steven C. Pieper Evolution of a minimal parallel programming model . . . . . . . . . . . 4--13 Yiannis Georgiou and Emmanuel Jeannot and Guillaume Mercier and Adèle Villiermet Topology-aware job mapping . . . . . . . 14--27 Brice Videau and Kevin Pouget and Luigi Genovese and Thierry Deutsch and Dimitri Komatitsch and Frédéric Desprez and Jean-François Méhaut BOAST . . . . . . . . . . . . . . . . . 28--44 Javier Conejero and Sandra Corella and Rosa M. Badia and Jesus Labarta Task-based programming in COMPSs to converge from HPC to big data . . . . . 45--60 Supun Kamburugamuve and Pulasthi Wickramasinghe and Saliya Ekanayake and Geoffrey C. Fox Anatomy of machine learning algorithm implementations in MPI, Spark, and Flink 61--73 Charalampos Chalios and Giorgis Georgakoudis and Konstantinos Tovletoglou and George Karakonstantis and Hans Vandierendonck and Dimitrios S. Nikolopoulos DARE . . . . . . . . . . . . . . . . . . 74--88 Anne Benoit and Loïc Pottier and Yves Robert Resilient co-scheduling of malleable applications . . . . . . . . . . . . . . 89--103 Moustafa AbdelBaky and Javier Diaz-Montes and Manish Parashar Software-defined environments for science and engineering . . . . . . . . 104--122 Guillaume Aupy and Anne Benoit and Sicheng Dai and Loïc Pottier and Padma Raghavan and Yves Robert and Manu Shantharam Co-scheduling Amdahl applications on cache-partitioned systems . . . . . . . 123--138 George Bosilca and Aurelien Bouteiller and Amina Guermouche and Thomas Herault and Yves Robert and Pierre Sens and Jack Dongarra A failure detector for HPC platforms . . 139--158 Ewa Deelman and Tom Peterka and Ilkay Altintas and Christopher D. Carothers and Kerstin Kleese van Dam and Kenneth Moreland and Manish Parashar and Lavanya Ramakrishnan and Michela Taufer and Jeffrey Vetter The future of scientific workflows . . . 
159--175 Anne Benoit and Laurent Lefèvre and Anne-Cécile Orgerie and Issam Raïs Reducing the energy consumption of large-scale computing systems through combined shutdown policies with multiple constraints . . . . . . . . . . . . . . 176--188 David W. Walker Morton ordering of $2$D arrays for efficient access to hierarchical memory 189--203
Fande Kong and Xiao-Chuan Cai Scalability study of an implicit solver for coupled fluid-structure interaction problems on unstructured meshes in $3$D 207--219 Hartwig Anzt and Moritz Kreutzer and Eduardo Ponce and Gregory D. Peterson and Gerhard Wellein and Jack Dongarra Optimization and performance evaluation of the IDR iterative Krylov solver on GPUs . . . . . . . . . . . . . . . . . . 220--230 Michael O. Lam and Jeffrey K. Hollingsworth Fine-grained floating-point precision analysis . . . . . . . . . . . . . . . . 231--245 Lídia Kuan and Frederico Pratas and Leonel Sousa and Pedro Tomás MrBayes sMC$^3$: Accelerating Bayesian inference of phylogenetic trees . . . . 246--265 Martin Kronbichler and Ababacar Diagne and Hanna Holmgren A fast massively parallel two-phase flow solver for microfluidic chip simulation 266--287 Alan Gray and Kevin Stratford A lightweight approach to performance portability with targetDP . . . . . . . 288--301 Carmen Cotelo and María Aránzazu Amo Baladrón and Roland Aznar and Pablo Lorente and Pablo Rey and Aurelio Rodríguez On the successful coexistence of oceanographic operational services with other computational workloads . . . . . 302--313
Miguel A. Vega-Rodríguez and Álvaro Rubio-Largo Parallelism in computational biology . . 317--320 Nathan T. Weeks and Glenn R. Luecke and Brandon M. Groth and Marina Kraeva and Li Ma and Luke M. Kramer and James E. Koltes and James M. Reecy High-performance epistasis detection in quantitative trait GWAS . . . . . . . . 321--336 Enzo Rucci and Carlos Garcia and Guillermo Botella and Armando E. De Giusti and Marcelo Naiouf and Manuel Prieto-Matias OSWALD: OpenCL Smith--Waterman on Altera's FPGA for Large Protein Databases . . . . . . . . . . . . . . . 337--350 F. Auricchio and M. Ferretti and A. Lefieux and M. Musci and A. Reali and S. Trimarchi and A. Veneziani Parallelizing a finite element solver in computational hemodynamics . . . . . . . 351--362 Suejb Memeti and Sabri Pllana A machine learning approach for accelerating DNA sequence analysis . . . 363--379 Francesco Asnicar and Luca Masera and Emanuela Coller and Caterina Gallo and Nadir Sella and Thomas Tolio and Paolo Morettin and Luca Erculiani and Francesca Galante and Stanislau Semeniuta and Giulia Malacarne and Kristof Engelen and Andrea Argentini and Valter Cavecchia and Claudio Moser and Enrico Blanzieri NES$^2$RA: Network expansion by stratified variable subsetting and ranking aggregation . . . . . . . . . . 380--392 Héctor Martínez and Sergio Barrachina and Maribel Castillo and Joaquín Tárraga and Ignacio Medina and Joaquín Dopazo and Enrique S. Quintana-Ortí A framework for genomic sequencing on clusters of multicore and manycore processors . . . . . . . . . . . . . . . 393--406 Sebastian Daberdaku and Carlo Ferrari Computing voxelised representations of macromolecular surfaces . . . . . . . . 407--432
M. Asch and T. Moore and R. Badia and M. Beck and P. Beckman and T. Bidot and F. Bodin and F. Cappello and A. Choudhary and B. de Supinski and E. Deelman and J. Dongarra and A. Dubey and G. Fox and H. Fu and S. Girona and W. Gropp and M. Heroux and Y. Ishikawa and K. Keahey and D. Keyes and W. Kramer and J-F Lavignon and Y. Lu and S. Matsuoka and B. Mohr and D. Reed and S. Requena and J. Saltz and T. Schulthess and R. Stevens and M. Swany and A. Szalay and W. Tang and G. Varoquaux and J.-P. Vilotte and R. Wisniewski and Z. Xu and I. Zacharov Big data and extreme-scale computing . . 435--479 Roman Wyrzykowski and Ewa Deelman Guest Editor's note . . . . . . . . . . 480--481 Adrian Klusek and Paweł Topa and Jarosław Was and Robert Lubaś An implementation of the Social Distances Model using multi-GPU systems 482--495 Krzysztof Jurczuk and Marek Kretowski and Johanne Bezy-Wendling GPU-based computational modeling of magnetic resonance imaging of vascular structures . . . . . . . . . . . . . . . 496--511 Mindaugas Radziunas Modeling and simulations of broad-area edge-emitting semiconductor devices . . 512--522 Lukasz Szustak and Kamil Halbiniak and Lukasz Kuczynski and Joanna Wrobel and Adam Kulawik Porting and optimization of solidification application for CPU-MIC hybrid platforms . . . . . . . . . . . . 523--539 Heike Jagode and Anthony Danalis and Jack Dongarra Accelerating NWChem Coupled Cluster through dataflow-based execution . . . . 540--551 Tom Scogland and David Beckingsale Introduction . . . . . . . . . . . . . . 553--554 Tom Deakin and Simon McIntosh-Smith and Matt Martineau and Wayne Gaudin An improved parallelism scheme for deterministic discrete ordinates transport . . . . . . . . . . . . . . . 555--569 Robert F. Bird and Patrick Gillies and Michael R. Bareford and Andy Herdman and Stephen Jarvis Performance Optimisation of Inertial Confinement Fusion Codes using Mini-applications . . . . . . . . . . . 
570--581 Oe Bronson Messer and Ed D'Azevedo and Judy Hill and Wayne Joubert and Mark Berrill and Christopher Zimmer MiniApps derived from production HPC applications using multiple programming models . . . . . . . . . . . . . . . . . 582--593
Wesley Bland and Mattan Erez Special Issue on FTS . . . . . . . . . . 597--597 David E. Bernholdt and Wael R. Elwasif and Christos Kartsaklis and Seyong Lee and Tiffany M. Mintz Programmer-guided reliability for extreme-scale applications . . . . . . . 598--612 Faisal Shahzad and Moritz Kreutzer and Thomas Zeiser and Rui Machado and Andreas Pieper and Georg Hager and Gerhard Wellein Building and utilizing fault tolerance support tools for the GASPI applications 613--626 Simon McIntosh-Smith and Rob Hunt and James Price and Alex Warwick Vesztrocy Application-based fault tolerance techniques for sparse matrix solvers . . 627--640 Omer Subasi and Tatiana Martsinkevich and Ferad Zyulkyarov and Osman Unsal and Jesus Labarta and Franck Cappello Unified fault-tolerance framework for hybrid task-parallel message-passing applications . . . . . . . . . . . . . . 641--657 F. Rizzi and K. Morris and K. Sargsyan and P. Mycek and C. Safta and O. Le Maître and O. Knio and B. Debusschere Partial differential equations preconditioner resilient to soft and hard faults . . . . . . . . . . . . . . 658--673 J. Ignacio Hidalgo and Francisco Fernández de Vega Special issue on ``Evolutionary Algorithms on Parallel Architectures and Distributed Infrastructures'' . . . . . 674--675 Rafael Nogueras and Carlos Cotta Analyzing self-$ \star $ island-based memetic algorithms in heterogeneous unstable environments . . . . . . . . . 676--692 Diego Teijeiro and Xoán C. Pardo and Patricia González and Julio R. Banga and Ramón Doallo Towards cloud-based parallel metaheuristics . . . . . . . . . . . . . 693--705 Francisco Chávez and Francisco Fernández de Vega and Daniel Lanza and César Benavides and Juan Villegas and Leonardo Trujillo and Gustavo Olague and Graciela Román Deploying massive runs of evolutionary algorithms with ECJ and Hadoop: Reducing interest points required for face recognition . . . . . . . . . . . . . . 
706--720 Luis Acedo and Clara Burgos and José-Ignacio Hidalgo and Victor Sánchez-Alonso and Rafael-Jacinto Villanueva and Javier Villanueva-Oller Calibrating a large network model describing the transmission dynamics of the human papillomavirus using a particle swarm optimization algorithm in a distributed computing environment . . 721--728 Amogh Katti and Giuseppe Di Fatta and Thomas Naughton and Christian Engelmann Epidemic failure detection and consensus for extreme parallelism . . . . . . . . 729--743 Jeremy Iverson and George Karypis A virtual memory manager optimized for node-level cooperative multi-tasking in memory constrained systems . . . . . . . 744--759 Tarun Prabhu and William Gropp DAME: Runtime-compilation for data movement . . . . . . . . . . . . . . . . 760--774
Pavan Balaji and Kai-Cheung Leung Introduction . . . . . . . . . . . . . . 777--778 David del Rio Astorga and Manuel F. Dolz and Luis Miguel Sánchez and J. Daniel García and Marco Danelutto and Massimo Torquati Finding parallel patterns through static analysis in C++ applications . . . . . . 779--788 Baldomero Imbernón and José M. Cecilia and Horacio Pérez-Sánchez and Domingo Giménez METADOCK: a parallel metaheuristic schema for virtual screening methods . . 789--803 Javier García-Blas and Christopher Brown High-level programming for heterogeneous and hierarchical parallel systems . . . 804--806 Marco Danelutto and Peter Kilpatrick and Gabriele Mencagli and Massimo Torquati State access patterns in stream parallel computations . . . . . . . . . . . . . . 807--818 Issam Said and Pierre Fortin and Jean-Luc Lamotte and Henri Calandra Leveraging the accelerated processing units for seismic imaging: a performance and power efficiency comparison against CPUs and GPUs . . . . . . . . . . . . . 819--837 Ana Moreton-Fernandez and Hector Ortega-Arranz and Arturo Gonzalez-Escribano Controllers: an abstraction to ease the use of hardware accelerators . . . . . . 838--853 David del Rio Astorga and Manuel F. Dolz and Luis Miguel Sánchez and Javier Fernández and J. Daniel García An adaptive offline implementation selector for heterogeneous parallel platforms . . . . . . . . . . . . . . . 854--863 Italo Epicoco and Silvia Mocavero and Andrew R. Porter and Stephen M. Pickles and Mike Ashworth and Giovanni Aloisio Hybridisation strategies and data structures for the NEMO ocean model . . 864--881 Esthela Gallardo and Jérôme Vienne and Leonardo Fialho and Patricia Teller and James Browne Employing MPI_T in MPI Advisor to optimize application performance . . . . 882--896 Mirco Altenbernd and Dominik Göddeke Soft fault detection and correction for multigrid . . . . . . . . . . . . . . . 897--912 Martin Schreiber and Pedro S. 
Peixoto and Terry Haut and Beth Wingate Beyond spatial scalability limitations with a massively parallel method for linear oscillatory problems . . . . . . 913--933
Hugues Digonnet and Thierry Coupez and Patrice Laure and Luisa Silva Massively parallel anisotropic mesh adaptation . . . . . . . . . . . . . . . 3--24 J. Loffeld and Jaf Hittinger On the arithmetic intensity of high-order finite-volume discretizations for hyperbolic systems of conservation laws . . . . . . . . . . . . . . . . . . 25--52 Franz Pichler and Gundolf Haase Finite element method completely implemented for graphic processor units using parallel algorithm libraries . . . 53--66 Muhammad Nufail Farooqi and Daulet Izbassarov and Metin Muradoglu and Didem Unat Communication analysis and optimization of $3$D front tracking method for multiphase flow simulations . . . . . . 67--80 Daniel S. Abdi and Lucas C. Wilcox and Timothy C. Warburton and Francis X. Giraldo A GPU-accelerated continuous and discontinuous Galerkin non-hydrostatic atmospheric model . . . . . . . . . . . 81--109 Masahiro Nakao and Hitoshi Murai and Hidetoshi Iwashita and Taisuke Boku and Mitsuhisa Sato Implementation and evaluation of the HPC challenge benchmark in the XcalableMP PGAS language . . . . . . . . . . . . . 110--123 E. Calore and A. Gabbana and Sf Schifano and R. Tripiccione Optimization of lattice Boltzmann simulations on heterogeneous computers 124--139 Jan Hückelheim and Paul Hovland and Michelle Mills Strout and Jens-Dominik Müller Reverse-mode algorithmic differentiation of an OpenMP-parallel compressible flow solver . . . . . . . . . . . . . . . . . 140--154 Tadashi Yamazaki and Jun Igarashi and Junichiro Makino and Toshikazu Ebisuzaki Real-time simulation of a cat-scale artificial cerebellum on PEZY-SC processors . . . . . . . . . . . . . . . 155--168 Bei Wang and Stephane Ethier and William Tang and Khaled Z. Ibrahim and Kamesh Madduri and Samuel Williams and Leonid Oliker Modern gyrokinetic particle-in-cell simulation of fusion plasmas on top supercomputers . . . . . . . . . . . . . 
169--188 Linda Stals Algorithm-based fault recovery of adaptively refined parallel multilevel grids . . . . . . . . . . . . . . . . . 189--211 Vladimir Mironov and Alexander Moskovsky and Michael D'Mello and Yuri Alexeev An efficient MPI/OpenMP parallelization of the Hartree--Fock--Roothaan method for the first generation of Intel® Xeon Phi™ processor architecture . . . 212--224
Carlos Teijeiro and Thomas Hammerschmidt and Ralf Drautz and Godehard Sutmann Optimized parallel simulations of analytic bond-order potentials on hybrid shared/distributed memory with MPI and OpenMP . . . . . . . . . . . . . . . . . 227--241 Daniel S. Abdi and Francis X. Giraldo and Emil M. Constantinescu and Lester E. Carr and Lucas C. Wilcox and Timothy C. Warburton Acceleration of the IMplicit--EXplicit nonhydrostatic unified model of the atmosphere on manycore processors . . . 242--267 Katherine J. Evans and Richard K. Archibald and David J. Gardner and Matthew R. Norman and Mark A. Taylor and Carol S. Woodward and Patrick H. Worley Performance analysis of fully explicit and fully implicit solvers within a spectral element shallow-water atmosphere model . . . . . . . . . . . . 268--284 Dingwen Tao and Sheng Di and Hanqi Guo and Zizhong Chen and Franck Cappello Z-checker: a framework for assessing lossy compression of scientific data . . 285--303 Hasan Metin Aktulga and Chris Knight and Paul Coffman and Kurt A. O'Hearn and Tzu-Ray Shan and Wei Jiang Optimizing the performance of reactive molecular dynamics simulations for many-core architectures . . . . . . . . 304--321 Anshu Dubey and Petros Tzeferacos and Don Q. Lamb The dividends of investing in computational software design: a case study . . . . . . . . . . . . . . . . . 322--331 Irina Demeshko and Jerry Watkins and Irina K. Tezaur and Oksana Guba and William F. Spotz and Andrew G. Salinger and Roger P. Pawlowski and Michael A. Heroux Toward performance portability of the Albany finite element analysis code using the Kokkos library . . . . . . . . 332--352 Elmar Peise and Paolo Bientinesi The ELAPS framework: Experimental Linear Algebra Performance Studies . . . . . . 353--365 Marc Casas and Wilfried N. Gansterer and Elias Wimmer Resilient gossip-inspired all-reduce algorithms for high-performance computing: Potential, limitations, and open questions . . . . . . . . . . . . . 
366--383 Pietro Cicotti and Manu Shantharam and Laura Carrington Reducing communication in parallel graph search algorithms with software caches 384--396 Jon Calhoun and Franck Cappello and Luke N. Olson and Marc Snir and William D. Gropp Exploring the feasibility of lossy compression for PDE simulations . . . . 397--410 Andreas Müller and Michal A. Kopera and Simone Marras and Lucas C. Wilcox and Tobin Isaac and Francis X. Giraldo Strong scaling for numerical weather prediction at petascale with the atmospheric model NUMA . . . . . . . . . 411--426
Gabriele Mencagli and Felipe Mg França and Cristiana Barbosa Bentes and Leandro Augusto Justen Marzulo and Mauricio Lima Pilla Special issue on parallel applications for in-situ computing on the next-generation computing platforms . . 429--430 João Vicente Ferreira Lima and Issam Raïs and Laurent Lefèvre and Thierry Gautier Performance and energy analysis of OpenMP runtime systems with dense linear algebra algorithms . . . . . . . . . . . 431--443 Jucele França de Alencar Vasconcellos and Edson Norberto Cáceres and Henrique Mongelli and Siang Wun Song and Frank Dehne and Jayme Luiz Szwarcfiter New BSP/CGM algorithms for spanning trees . . . . . . . . . . . . . . . . . 444--461 Anderson Avila and Renata Hax Sander Reiser and Maurício Lima Pilla and Adenauer Correa Yamin Improving in situ GPU simulation of quantum computing in the D-GM environment . . . . . . . . . . . . . . 462--472 Matheus S. Serpa and Eduardo Hm Cruz and Matthias Diener and Arthur M. Krause and Philippe Oa Navaux and Jairo Panetta and Albert Farrés and Claudia Rosas and Mauricio Hanzich Optimization strategies for geophysics models on manycore systems . . . . . . . 473--486 Roman Wyrzykowski and Ewa Deelman Guest Editor's note: Special issue on challenges and solutions for porting applications to emerging high performance computing systems . . . . . 487--488 Adrian Kłusek and Marcin Loś and Maciej Paszyński and Witold Dzwinel Efficient model of tumor dynamics simulated in multi-GPU environment . . . 489--506 Vladimir Stegailov and Ekaterina Dlinnova and Timur Ismagilov and Mikhail Khalilov and Nikolay Kondratyuk and Dmitry Makagon and Alexander Semenov and Alexei Simonov and Grigory Smirnov and Alexey Timofeev Angara interconnect makes GPU-based Desmos supercomputer an efficient tool for molecular dynamics calculations . . 507--521 Daniel Langr and Tomás Dytrych and Kristina D. Launey and Jerry P. 
Draayer Accelerating many-nucleon basis generation for high performance computing enabled ab initio nuclear structure studies . . . . . . . . . . . 522--533 Lukasz Szustak and Pawel Bratek Performance portable parallel programming of heterogeneous stencils across shared-memory platforms with modern Intel processors . . . . . . . . 534--553 Christian Simmendinger and Roman Iakymchuk and Luis Cebamanos and Dana Akhmetova and Valeria Bartsch and Tiberiu Rotaru and Mirko Rahn and Erwin Laure and Stefano Markidis Interoperability strategies for GASPI and MPI in large-scale scientific applications . . . . . . . . . . . . . . 554--568
Nils Kohl and Johannes Hötzer and Florian Schornbaum and Martin Bauer and Christian Godenschwager and Harald Köstler and Britta Nestler and Ulrich Rüde A scalable and extensible checkpointing scheme for massively parallel simulations . . . . . . . . . . . . . . 571--589 Vaibhav Sundriyal and Kristopher Keipert and Masha Sosonkina and Mark S. Gordon Effect of frequency scaling granularity on energy-saving strategies . . . . . . 590--601 Karl-Robert Wichmann and Martin Kronbichler and Rainald Löhner and Wolfgang A. Wall Practical applicability of optimizations and performance models to complex stencil-based loop kernels in CFD . . . 602--618 Samuel Elliott and Raghu Raj Prasanna Kumar and Natasha Flyer and Tuan Ta and Richard Loft Implementation of a scalable, performance portable shallow water equation solver using radial basis function-generated finite difference methods . . . . . . . . . . . . . . . . 619--631 Hector Emilio Barrios Molano and Kamy Sepehrnoori Development of a framework for parallel reservoir simulation . . . . . . . . . . 632--650 Domenico Rea and Giansimone Perrino and Diego di Bernardo and Livia Marcellino and Diego Romano A GPU algorithm for tracking yeast cells in phase-contrast microscopy images . . 651--659 Lubomir Riha and Michal Merta and Radim Vavrik and Tomas Brzobohaty and Alexandros Markopoulos and Ondrej Meca and Ondrej Vysocky and Tomas Kozubek and Vit Vondrak A massively parallel and memory-efficient FEM toolbox with a hybrid total FETI solver with accelerator support . . . . . . . . . . 660--677 Niclas Jansson and Rahul Bale and Keiji Onishi and Makoto Tsubokura CUBE: a scalable framework for large-scale industrial simulations . . . 678--698 Thomas Heller and Bryce Adelstein Lelbach and Kevin A. Huck and John Biddiscombe and Patricia Grubel and Alice E. Koniges and Matthias Kretz and Dominic Marcello and David Pfander and Adrian Serio and Juhan Frank and Geoffrey C. 
Clayton and Dirk Pflüger and David Eder and Hartmut Kaiser Harnessing billions of tasks for a scalable portable hydrodynamic simulation of the merger of two stars 699--715 Andrea Borghesi and Andrea Bartolini and Michela Milano and Luca Benini Pricing schemes for energy-efficient HPC systems: Design and exploration . . . . 716--734 Kasia Świrydowicz and Noel Chalmers and Ali Karakus and Tim Warburton Acceleration of tensor-product operations for high-order finite element methods . . . . . . . . . . . . . . . . 735--757
Michael Mascagni CRE2017 Special Issue Introduction IJHPCA . . . . . . . . . . . . . . . . . 761--762 Line Pouchard and Sterling Baldwin and Todd Elsethagen and Shantenu Jha and Bibi Raju and Eric Stephan and Li Tang and Kerstin Kleese Van Dam Computational reproducibility of scientific workflows at extreme scales 763--776 Kento Sato and Ignacio Laguna and Gregory L. Lee and Martin Schulz and Christopher M. Chambreau and Simone Atzeni and Michael Bentley and Ganesh Gopalakrishnan and Zvonimir Rakamaric and Geof Sawaya and Joachim Protze and Dong H. Ahn Pruners: Providing reproducibility for uncovering non-deterministic errors in runs on supercomputers . . . . . . . . . 777--783 Salil Mahajan and Katherine J. Evans and Joseph H. Kennedy and Min Xu and Mathew R. Norman and Marcia L. Branstetter Ongoing solution reproducibility of earth system models as they progress toward exascale computing . . . . . . . 784--790 Roman Iakymchuk and Stef Graillat and David Defour and Enrique S. Quintana-Ortí Hierarchical approach for deriving a reproducible unblocked $ L U $ factorization . . . . . . . . . . . . . 791--803 Sergio Iserte and Héctor Martínez and Sergio Barrachina and Maribel Castillo and Rafael Mayo and Antonio J. Peña Dynamic reconfiguration of noniterative scientific applications: a case study with HPG aligner . . . . . . . . . . . . 804--816 Markus Huber and Ulrich Rüde and Barbara Wohlmuth Adaptive control in roll-forward recovery for extreme scale multigrid . . 817--837 Nikola Tchipev and Steffen Seckler and Matthias Heinen and Jadran Vrabec and Fabio Gratl and Martin Horsch and Martin Bernreuther and Colin W. Glass and Christoph Niethammer and Nicolay Hammer and Bernd Krischok and Michael Resch and Dieter Kranzlmüller and Hans Hasse and Hans-Joachim Bungartz and Philipp Neumann TweTriS: Twenty trillion-atom simulation 838--854 Stefan Lemvig Glimberg and Allan Peter Engsig-Karup and Luke N. 
Olson A massively scalable distributed multigrid framework for nonlinear marine hydrodynamics . . . . . . . . . . . . . 855--868 Masahiro Nakao and Tetsuya Odajima and Hitoshi Murai and Akihiro Tabuchi and Norihisa Fujita and Toshihiro Hanawa and Taisuke Boku and Mitsuhisa Sato Evaluation of XcalableACC with tightly coupled accelerators/InfiniBand hybrid communication on accelerated cluster . . 869--884 Milos Ivanović and Ana Kaplarević-Malisić and Boban Stojanović and Marina Svicević and Srboljub M. Mijailovich Machine learned domain decomposition scheme applied to parallel multi-scale muscle simulation . . . . . . . . . . . 885--896 Andrew C. Kirby and Michael J. Brazell and Zhi Yang and Rajib Roy and Behzad R. Ahrabi and Michael K. Stoellinger and Jay Sitaraman and Dimitri J. Mavriplis Wind farm simulations using an overset hp-adaptive approach with blade-resolved turbine models . . . . . . . . . . . . . 897--923 Katharina Kormann and Klaus Reuter and Markus Rampp A massively parallel semi-Lagrangian solver for the six-dimensional Vlasov--Poisson equation . . . . . . . . 924--947 David Strelák and Carlos Óscar S. Sorzano and José María Carazo and Jirí Filipovic A GPU acceleration of $3$-D Fourier reconstruction in cryo-EM . . . . . . . 948--959 Pierre Fortin and Maxime Touche Dual tree traversal on integrated GPUs for astrophysical $N$-body simulations 960--972 Dominic E. Charrier and Benjamin Hazelwood and Ekaterina Tutlyaeva and Michael Bader and Michael Dumbser and Andrey Kudryavtsev and Alexander Moskovsky and Tobias Weinzierl Studies on the energy and deep memory behaviour of a cache-oblivious, task-based hyperbolic PDE solver . . . . 973--986 Jorge Ejarque and Marc Domínguez and Rosa M. Badia A hierarchic task-based programming model for distributed heterogeneous computing . . . . . . . . . . . . . . . 987--997 Ibrahim Al-Kharusi and David W. 
Walker Locality properties of $3$D data orderings with application to parallel molecular dynamics simulations . . . . . 998--1018 Mohammad Y. Al-Shorman and Majd M. Al-Kofahi Ultrasonic pulse propagation simulation using OpenCL for environment mapping and discovery . . . . . . . . . . . . . . . 1019--1029 John M. Dennis and Brian Dobbins and Christopher Kerr and Youngsung Kim Optimizing the HOMME dynamical core for multicore platforms . . . . . . . . . . 1030--1045 Ichitaro Yamazaki and Akihiro Ida and Rio Yokota and Jack Dongarra Distributed-memory lattice $H$-matrix factorization . . . . . . . . . . . . . 1046--1063
Jack Dongarra and Bernard Tourancheau Guest editors' note: Special issue on clusters, clouds, and data for scientific computing . . . . . . . . . . 1067--1068 Hartwig Anzt and Goran Flegar and Thomas Grützmacher and Enrique S. Quintana-Ortí Toward a modular precision ecosystem for high-performance computing . . . . . . . 1069--1078 Mark Endrei and Chao Jin and Minh Ngoc Dinh and David Abramson and Heidi Poxon and Luiz Derose and Bronis R. de Supinski Statistical and machine learning models for optimizing energy in parallel applications . . . . . . . . . . . . . . 1079--1097 Jungwon Kim and Jeffrey S. Vetter Implementing efficient data compression and encryption in a persistent key--value store for HPC . . . . . . . . 1098--1112 Heike Jagode and Anthony Danalis and Hartwig Anzt and Jack Dongarra PAPI software-defined events for in-depth performance analysis . . . . . 1113--1127 Ewa Deelman and Anirban Mandal and Ming Jiang and Rizos Sakellariou The role of machine learning in scientific workflows . . . . . . . . . . 1128--1139 Ana Gainaru and Hongyang Sun and Guillaume Aupy and Yuankai Huo and Bennett A. Landman and Padma Raghavan On-the-fly scheduling versus reservation-based scheduling for unpredictable workflows . . . . . . . . 1140--1158 Daniel Balouek-Thomert and Eduard Gibert Renart and Ali Reza Zamani and Anthony Simonet and Manish Parashar Towards a computing continuum: Enabling edge-to-cloud integration for data-driven workflows . . . . . . . . . 1159--1174 Dylan Chapp and Danny Rorabaugh and Kento Sato and Dong H. Ahn and Michela Taufer A three-phase workflow for general and expressive representations of nondeterminism in HPC applications . . . 1175--1184 Guillaume Aupy and Brice Goglin and Valentin Honoré and Bruno Raffin Modeling high-throughput applications for in situ analytics . . . . . . . . . 
1185--1200 Franck Cappello and Sheng Di and Sihuan Li and Xin Liang and Ali Murat Gok and Dingwen Tao and Chun Hong Yoon and Xin-Chuan Wu and Yuri Alexeev and Frederic T. Chong Use cases of lossy compression for floating-point data in scientific data sets . . . . . . . . . . . . . . . . . . 1201--1220 Guillaume Aupy and Anne Benoit and Brice Goglin and Loïc Pottier and Yves Robert Co-scheduling HPC workloads on cache-partitioned CMP platforms . . . . 1221--1239 Alexandre Denis and Julien Jaeger and Emmanuel Jeannot and Marc Pérache and Hugo Taboada Study on progress threads placement and dedicated cores for overlapping MPI nonblocking collectives on manycore processor . . . . . . . . . . . . . . . 1240--1254 Li Han and Valentin Le Fèvre and Louis-Claude Canon and Yves Robert and Frédéric Vivien A generic approach to scheduling and checkpointing workflows . . . . . . . . 1255--1274 Anand Venkat and Tharindu Rusira and Raj Barik and Mary Hall and Leonard Truong SWIRL: High-performance many-core CPU code generation for deep neural networks 1275--1289 Thiago Sfx Teixeira and William Gropp and David Padua Managing code transformations for better performance portability . . . . . . . . 1290--1306 Anonymous Corrigendum to A failure detector for HPC platforms . . . . . . . . . . . . . NP1--NP1
José M. Cecilia Guest Editors' note: Special issue on novel high-performance computing algorithms and platforms in bioinformatics . . . . . . . . . . . . . 3--4 Javier Prades and Baldomero Imbernón and Carlos Reaño and Jorge Peña-García and Jose Pedro Cerón-Carrasco and Federico Silla and Horacio Pérez-Sánchez Maximizing resource usage in multifold molecular dynamics with rCUDA . . . . . 5--19 Christian Ponte-Fernández and Jorge González-Domínguez and María J. Martín Fast search of third-order epistatic interactions on CPU and GPU clusters . . 20--29 Baldomero Imbernón and Antonio Llanes and José-Matías Cutillas-Lozano and Domingo Giménez HYPERDOCK: Improving virtual screening through parallel hyperheuristics . . . . 30--41 Marta Garcia-Gasulla and Filippo Mantovani and Marc Josep-Fabrego and Beatriz Eguzkitza and Guillaume Houzeaux Runtime mechanisms to survive new HPC architectures: a use case in human respiratory simulations . . . . . . . . 42--56 César González and Mariano Pérez and Juan M. Orduña and Javier Chaves and Ana-Bárbara García HPG-HMapper: a DNA hydroxymethylation analysis tool . . . . . . . . . . . . . 57--65 Akrem Benatia and Weixing Ji and Yizhuo Wang and Feng Shi Sparse matrix partitioning for optimizing SpMV on CPU--GPU heterogeneous platforms . . . . . . . . 66--80 José M. Mantas and Francesco Vecil Hybrid OpenMP--CUDA parallel implementation of a deterministic solver for ultrashort DG-MOSFETs . . . . . . . 81--102 Paweł Russek and Ernest Jamro and Agnieszka Dabrowska-Boruch and Kazimierz Wiatr A study of the loops control for reconfigurable computing with OpenCL in the LABS local search problem . . . . . 103--114 Daobi Chen and Liang Yuan and Yunquan Zhang and Jingfu Yan and David Kahaner HPC software capability landscape in China . . . . . . . . . . . . . . . . . 115--153
Jue Wang and XinFu He Special issue on advanced simulation in engineering . . . . . . . . . . . . . . 157--158 Xinming Qin and Honghui Shang and Lei Xu and Wei Hu and Jinlong Yang and Shigang Li and Yunquan Zhang The static parallel distribution algorithms for hybrid density-functional calculations in HONPAS package . . . . . 159--168 Xiaodong Hu and Zhonghua Lu and Jian Zhang and Xiazhen Liu and Wu Yuan and Shan Liang and Haikuo Zhang A parallel algorithm for chimera grid with implicit hole cutting method . . . 169--177 Xianmeng Wang and Zhifeng Zhou and Changjun Hu and Wen Yang and Minfu Zhao and Zhaoshun Wang and Peng Shi Accelerating and tuning small matrix multiplications on Sunway TaihuLight: a case study of spectral element CFD Code Nek5000 . . . . . . . . . . . . . . . . 178--186 Gary Lawson and Masha Sosonkina and Tal Ezer and Yuzhong Shen Applying EMD/HHT analysis to power traces of applications executed on systems with Intel Xeon Phi . . . . . . 187--198 Mario Hernández and Juan M. Cebrián and José M. Cecilia and José M. García Offloading strategies for Stencil kernels on the KNC Xeon Phi architecture: Accuracy versus performance . . . . . . . . . . . . . . 199--207 Atsushi Hori and Kazumi Yoshinaga and Thomas Herault and Aurélien Bouteiller and George Bosilca and Yutaka Ishikawa Overhead of using spare nodes . . . . . 208--226 Jean Luca Bez and André Ramos Carneiro and Pablo José Pavan and Valéria Soldera Girelli and Francieli Zanon Boito and Bruno Alves Fagundes and Carla Osthoff and Pedro Leite da Silva Dias and Jean-François Méhaut and Philippe Oa Navaux I/O performance of the Santos Dumont supercomputer . . . . . . . . . . . . . 227--245 Louis-Claude Canon and Aurélie Kong Win Chang and Yves Robert and Frédéric Vivien Scheduling independent stochastic tasks under deadline and budget constraints 246--264
Roberto Porc\`u and Edie Miglio and Nicola Parolini and Mattia Penati and Noemi Vergopolan HPC simulations of brownout: a noninteracting particles dynamic model 267--281 Byron E. Moutafis and George A. Gravvanis and Christos K. Filelis-Papadopoulos Hybrid multi-projection method using sparse approximate inverses on GPU clusters . . . . . . . . . . . . . . . . 282--305 Tieqiang Mo and Renfa Li Iteratively solving sparse linear system based on PaRSEC task scheduling . . . . 306--315 David Zwick and S. Balachandar A scalable Euler--Lagrange approach for multiphase flow simulation on spectral elements . . . . . . . . . . . . . . . . 316--339 Sébastien Cayrols and Iain S. Duff and Florent Lopez Parallelization of the solve phase in a task-based Cholesky solver using a sequential task flow model . . . . . . . 340--356 Jcs Kadupitiya and Geoffrey C. Fox and Vikram Jadhao Machine learning for parameter auto-tuning in molecular dynamics simulations: Efficient dynamics of ions near polarizable nanoparticles . . . . . 357--374
Kadir Akbudak and Hatem Ltaief and Vincent Etienne and Rached Abdelkhalak and Thierry Tonellot and David Keyes Asynchronous computations for solving the acoustic wave propagation equation 377--393 Yang Liu and Wissam Sid-Lakhdar and Elizaveta Rebrova and Pieter Ghysels and Xiaoye Sherry Li A parallel hierarchical blocked adaptive cross approximation algorithm . . . . . 394--408 Tom Peterka and Deborah Bard and Janine C. Bennett and E. Wes Bethel and Ron A. Oldfield and Line Pouchard and Christine Sweeney and Matthew Wolf Priority research directions for in situ data management: Enabling scientific discovery from diverse data sources . . 409--427 Francesco Cremonesi and Georg Hager and Gerhard Wellein and Felix Schürmann Analytic performance modeling and analysis of detailed neuron simulations 428--449 Kevin Verma and Christopher Mccabe and Chong Peng and Robert Wille A PCISPH implementation using distributed multi-GPU acceleration for simulating industrial engineering applications . . . . . . . . . . . . . . 450--464 A. Grannan and K. Sood and B. Norris and A. Dubey Understanding the landscape of scientific software used on high-performance computing platforms . . 465--477 Anonymous Thanks to Reviewers . . . . . . . . . . 478--478 Anonymous Corrigendum . . . . . . . . . . . . . . NP1--NP1 Anonymous Corrigendum . . . . . . . . . . . . . . NP2--NP2
Walid Keyrouz and Michael Mascagni CRE2019 Special Issue Introduction IJHPCA . . . . . . . . . . . . . . . . . 481--482 David H. Bailey Reproducibility and variable precision computing . . . . . . . . . . . . . . . 483--490 Gregory Kiar and Pablo de Oliveira Castro and Pierre Rioux and Eric Petit and Shawn T. Brown and Alan C. Evans and Tristan Glatard Comparing perturbation models for evaluating stability of neuroimaging pipelines . . . . . . . . . . . . . . . 491--501 Roman Iakymchuk and Maria Barreda Vayá and Stef Graillat and José I. Aliaga and Enrique S. Quintana-Ortí Reproducibility of parallel preconditioned conjugate gradient in hybrid programming environments . . . . 502--518 Brett Neuman and Andy Dubois and Laura Monroe and Robert W. Robey Fast, good, and repeatable: Summations, vectorization, and reproducibility . . . 519--531 Srdan Nikoli\'c and Nenad Stevanovi\'c and Milos Ivanovi\'c Optimizing parallel particle tracking in Brownian motion using machine learning 532--546 Amanda Bienz and William D. Gropp and Luke N. Olson Reducing communication in algebraic multigrid with multi-step node aware communication . . . . . . . . . . . . . 547--561 Paul Fischer and Misun Min and Thilina Rathnayake and Som Dutta and Tzanio Kolev and Veselin Dobrev and Jean-Sylvain Camier and Martin Kronbichler and Tim Warburton and Kasia \'Swirydowicz and Jed Brown Scalability of high-performance PDE solvers . . . . . . . . . . . . . . . . 562--586
James D. Stevens and Andreas Klöckner A mechanism for balancing accuracy and scope in cross-machine black-box GPU performance modeling . . . . . . . . . . 589--614 Masaki Iwasawa and Daisuke Namekata and Ryo Sakamoto and Takashi Nakamura and Yasuyuki Kimura and Keigo Nitadori and Long Wang and Miyuki Tsubouchi and Jun Makino and Zhao Liu and Haohuan Fu and Guangwen Yang Implementation and performance of Barnes--Hut $n$-body algorithm on extreme-scale heterogeneous many-core architectures . . . . . . . . . . . . . 615--628 Tianjiao Sun and Lawrence Mitchell and Kaushik Kulkarni and Andreas Klöckner and David A. Ham and Paul H. J. Kelly A study of vectorization for matrix-free finite element methods . . . . . . . . . 629--644 Mohammed Al Farhan and Ahmad Abdelfattah and Stanimire Tomov and Mark Gates and Dalal Sukkari and Azzam Haidar and Robert Rosenberg and Jack Dongarra MAGMA templates for scalable linear algebra on emerging architectures . . . 645--658 Cristian Ramon-Cortes and Ramon Amela and Jorge Ejarque and Philippe Clauss and Rosa M. Badia \pkgAutoParallel: Automatic parallelisation and distributed execution of affine loop nests in Python 659--675 Hank Childs and Sean D. Ahern and James Ahrens and Andrew C. Bauer and Janine Bennett and E. Wes Bethel and Peer-Timo Bremer and Eric Brugger and Joseph Cottam and Matthieu Dorier and Soumya Dutta and Jean M. Favre and Thomas Fogal and Steffen Frey and Christoph Garth and Berk Geveci and William F. Godoy and Charles D. Hansen and Cyrus Harrison and Bernd Hentschel and Joseph Insley and Chris R. Johnson and Scott Klasky and Aaron Knoll and James Kress and Matthew Larsen and Jay Lofstead and Kwan-Liu Ma and Preeti Malakar and Jeremy Meredith and Kenneth Moreland and Paul Navrátil and Patrick O'Leary and Manish Parashar and Valerio Pascucci and John Patchett and Tom Peterka and Steve Petruzza and Norbert Podhorszki and David Pugmire and Michel Rasquin and Silvio Rizzi and David H. 
Rogers and Sudhanshu Sane and Franz Sauer and Robert Sisneros and Han-Wei Shen and Will Usher and Rhonda Vickery and Venkatram Vishwanath and Ingo Wald and Ruonan Wang and Gunther H. Weber and Brad Whitlock and Matthew Wolf and Hongfeng Yu and Sean B. Ziegeler A terminology for in situ visualization and analysis systems . . . . . . . . . . 676--691
Roman Wyrzykowski and Ewa Deelman Guest editor's note: Special issue on application performance optimization in the era of extreme heterogeneity . . . . 3--4 Dominik Ernst and Georg Hager and Jonas Thies and Gerhard Wellein Performance engineering for real and complex tall & skinny matrix multiplication kernels on GPUs . . . . . 5--19 Krzysztof Jurczuk and Marcin Czajkowski and Marek Kretowski Fitness evaluation reuse for accelerating GPU-based evolutionary induction of decision trees . . . . . . 20--32 Krzysztof Rojek and Kamil Halbiniak and Lukasz Kuczynski CFD code adaptation to the FPGA architecture . . . . . . . . . . . . . . 33--46 Hannah Morgan and Patrick Sanan and Matthew Knepley and Richard Tran Mills Understanding performance variability in standard and pipelined parallel Krylov solvers . . . . . . . . . . . . . . . . 47--59 Andreas Pieper and Georg Hager and Holger Fehske A domain-specific language and matrix-free stencil code for investigating electronic properties of Dirac and topological materials . . . . 60--77 Yutong Ye and Hongyin Zhu and Chaoying Zhang and Binghai Wen Efficient graphic processing unit implementation of the chemical-potential multiphase lattice Boltzmann method . . 78--96 Bartosz Kohnke and Carsten Kutzner and Andreas Beckmann and Gert Lube and Ivo Kabadshow and Holger Dachsel and Helmut Grubmüller A CUDA fast multipole method with highly efficient M2L far field evaluation . . . 97--117
Wenpeng Ma and Xiao-Chuan Cai Point-block incomplete $ L U $ preconditioning with asynchronous iterations on GPU for multiphysics problems . . . . . . . . . . . . . . . . 121--135 Ross Adelman Highly parallel boundary element method for solving extremely large, wide-area power-line models . . . . . . . . . . . 136--153 Weijian Zheng and Dali Wang and Fengguang Song Designing a parallel Feel-the-Way clustering algorithm on HPC systems . . 154--169 Vedran Novakovi\'c and Sanja Singer Implicit Hari--Zimmermann algorithm for the generalized SVD on the GPUs . . . . 170--205
Peter Benner and Enrique Quintana-Ortí and Jens Saak Introduction to the Special Issue related to the Power-Aware Computing Workshop 2019-PACO 2019 . . . . . . . . 209--210 Jonas Dünnebacke and Stefan Turek and Christoph Lohmann and Andriy Sokolov and Peter Zajac Increased space-parallelism via time-simultaneous Newton-multigrid methods for nonstationary nonlinear PDE problems . . . . . . . . . . . . . . . . 211--225 Pratik Nayak and Terry Cojean and Hartwig Anzt Evaluating asynchronous Schwarz solvers on GPUs . . . . . . . . . . . . . . . . 226--236 Axel Klawonn and Martin Lanser and Oliver Rheinbach and Gerhard Wellein and Markus Wittmann Energy efficiency of nonlinear domain decomposition methods . . . . . . . . . 237--253 Ernesto Dufrechou and Pablo Ezzatti and Enrique S. Quintana-Ortí Selecting optimal SpMV realizations for GPUs via machine learning . . . . . . . 254--267 Maria Barreda and Manuel F. Dolz and M. Asunción Castaño Convolutional neural nets for estimating the run time and energy consumption of the sparse matrix--vector product . . . 268--281
Tommaso Benacchio and Luca Bonaventura and Mirco Altenbernd and Chris D. Cantwell and Peter D. Düben and Mike Gillard and Luc Giraud and Dominik Göddeke and Erwan Raffin and Keita Teranishi and Nils Wedi Resilience and fault tolerance in high-performance computing for numerical weather and climate prediction . . . . . 285--311 Nikolay Kondratyuk and Vsevolod Nikolskiy and Daniil Pavlov and Vladimir Stegailov GPU-accelerated molecular dynamics: State-of-art software performance and porting from Nvidia CUDA to AMD HIP . . 312--324 Hatem Elshazly and Francesc Lordan and Jorge Ejarque and Rosa M. Badia Accelerated execution via eager-release of dependencies in task-based workflows 325--343 Ahmad Abdelfattah and Hartwig Anzt and Erik G. Boman and Erin Carson and Terry Cojean and Jack Dongarra and Alyson Fox and Mark Gates and Nicholas J. Higham and Xiaoye S. Li and Jennifer Loe and Piotr Luszczek and Srikara Pranesh and Siva Rajamanickam and Tobias Ribizel and Barry F. Smith and Kasia \'Swirydowicz and Stephen Thomas and Stanimire Tomov and Yaohung M. Tsai and Ulrike Meier Yang A survey of numerical linear algebra methods utilizing mixed-precision arithmetic . . . . . . . . . . . . . . . 344--369 Karl-Robert Wichmann and Martin Kronbichler and Rainald Löhner and Wolfgang A. Wall A runtime based comparison of highly tuned lattice Boltzmann and finite difference solvers . . . . . . . . . . . 370--390 Shu-Mei Tseng and Bogdan Nicolae and Franck Cappello and Aparna Chandramowlishwaran Demystifying asynchronous I/O Interference in HPC applications . . . . 391--412 Markus Holzer and Martin Bauer and Harald Köstler and Ulrich Rüde Highly efficient lattice Boltzmann multiphase simulations of immiscible fluids at high-density ratios on CPUs and GPUs through code generation . . . . 413--427
Bronis R. de Supinski Special Issue Introduction: The Gordon Bell Special Prize for HPC-Based COVID-19 Research Finalists . . . . . . 431--431 Lorenzo Casalino and Abigail C. Dommer and Zied Gaieb and Emilia P. Barros and Terra Sztain and Surl-Hee Ahn and Anda Trifan and Alexander Brace and Anthony T. Bogetti and Austin Clyde and Heng Ma and Hyungro Lee and Matteo Turilli and Syma Khalid and Lillian T. Chong and Carlos Simmerling and David J. Hardy and Julio Dc Maia and James C. Phillips and Thorsten Kurth and Abraham C. Stern and Lei Huang and John D. Mccalpin and Mahidhar Tatineni and Tom Gibbs and John E. Stone and Shantenu Jha and Arvind Ramanathan and Rommie E. Amaro AI-driven multiscale simulations illuminate mechanisms of SARS-CoV-2 spike dynamics . . . . . . . . . . . . . 432--451 Jens Glaser and Josh V. Vermaas and David M. Rogers and Jeff Larkin and Scott Legrand and Swen Boehm and Matthew B. Baker and Aaron Scheinberg and Andreas F. Tillack and Mathialakan Thavappiragasam and Ada Sedova and Oscar Hernandez High-throughput virtual laboratory for drug discovery using massive datasets 452--468 Sam Ade Jacobs and Tim Moon and Kevin Mcloughlin and Derek Jones and David Hysom and Dong H. Ahn and John Gyllenhaal and Pythagoras Watson and Felice C. Lightstone and Jonathan E. Allen and Ian Karlin and Brian Van Essen Enabling rapid COVID-19 small molecule drug design through scalable deep learning of generative models . . . . . 469--482 Jonathan Ozik and Justin M. Wozniak and Nicholson Collier and Charles M. Macal and Mickaël Binois A population data-driven workflow for COVID-19 modeling and learning . . . . . 483--499
Timothy C. Germann Co-design in the Exascale Computing Project . . . . . . . . . . . . . . . . 503--507 Weiqun Zhang and Andrew Myers and Kevin Gott and Ann Almgren and John Bell AMReX: Block-structured adaptive mesh refinement for multiphysics applications 508--526 Tzanio Kolev and Paul Fischer and Misun Min and Jack Dongarra and Jed Brown and Veselin Dobrev and Tim Warburton and Stanimire Tomov and Mark S. Shephard and Ahmad Abdelfattah and Valeria Barra and Natalie Beams and Jean-Sylvain Camier and Noel Chalmers and Yohann Dudouit and Ali Karakus and Ian Karlin and Stefan Kerkemeier and Yu-Hsiang Lan and David Medina and Elia Merzari and Aleksandr Obabko and Will Pazner and Thilina Rathnayake and Cameron W. Smith and Lukas Spies and Kasia \'Swirydowicz and Jeremy Thompson and Ananias Tomboulides and Vladimir Tomov Efficient exascale discretizations: High-order finite element methods . . . 527--552 Seher Acer and Ariful Azad and Erik G. Boman and Aydin Buluç and Karen D. Devine and Sm Ferdous and Nitin Gawande and Sayan Ghosh and Mahantesh Halappanavar and Ananth Kalyanaraman and Arif Khan and Marco Minutoli and Alex Pothen and Sivasankaran Rajamanickam and Oguz Selvitopi and Nathan R. Tallent and Antonino Tumeo EXAGRAPH: Graph and combinatorial methods for enabling exascale applications . . . . . . . . . . . . . . 553--571 Susan M. Mniszewski and James Belak and Jean-Luc Fattebert and Christian Fa Negre and Stuart R. Slattery and Adetokunbo A. Adedoyin and Robert F. Bird and Choongseok Chang and Guangye Chen and Stéphane Ethier and Shane Fogerty and Salman Habib and Christoph Junghans and Damien Lebrun-Grandié and Jamaludin Mohd-Yusof and Stan G. Moore and Daniel Osei-Kuffuor and Steven J. Plimpton and Adrian Pope and Samuel Temple Reeve and Lee Ricketson and Aaron Scheinberg and Amil Y. Sharma and Michael E. Wall Enabling particle applications for exascale computing platforms . . . . . . 572--597 Francis J. Alexander and James Ang and Jenna A. 
Bilbrey and Jan Balewski and Tiernan Casey and Ryan Chard and Jong Choi and Sutanay Choudhury and Bert Debusschere and Anthony M. Degennaro and Nikoli Dryden and J. Austin Ellis and Ian Foster and Cristina Garcia Cardona and Sayan Ghosh and Peter Harrington and Yunzhi Huang and Shantenu Jha and Travis Johnston and Ai Kagawa and Ramakrishnan Kannan and Neeraj Kumar and Zhengchun Liu and Naoya Maruyama and Satoshi Matsuoka and Erin Mccarthy and Jamaludin Mohd-Yusof and Peter Nugent and Yosuke Oyama and Thomas Proffen and David Pugmire and Sivasankaran Rajamanickam and Vinay Ramakrishniah and Malachi Schram and Sudip K. Seal and Ganesh Sivaraman and Christine Sweeney and Li Tan and Rajeev Thakur and Brian Van Essen and Logan Ward and Paul Welch and Michael Wolf and Sotiris S. Xantheas and Kevin G. Yager and Shinjae Yoo and Byung-Jun Yoon Co-design Center for Exascale Machine Learning Technologies (ExaLearn) . . . . 598--616 Ian Foster and Mark Ainsworth and Julie Bessac and Franck Cappello and Jong Choi and Sheng Di and Zichao Di and Ali M. Gok and Hanqi Guo and Kevin A. Huck and Christopher Kelly and Scott Klasky and Kerstin Kleese van Dam and Xin Liang and Kshitij Mehta and Manish Parashar and Tom Peterka and Line Pouchard and Tong Shu and Ozan Tugluk and Hubertus van Dam and Lipeng Wan and Matthew Wolf and Justin M. Wozniak and Wei Xu and Igor Yakushin and Shinjae Yoo and Todd Munson Online data analysis and reduction: an important Co-design motif for extreme-scale computers . . . . . . . . 617--635
Thomas M. Evans and Julia C. White Multiphysics coupling in the Exascale Computing Project . . . . . . . . . . . 3--4 Thomas M. Evans and Andrew Siegel and Erik W. Draeger and Jack Deslippe and Marianne M. Francois and Timothy C. Germann and William E. Hart and Daniel F. Martin A survey of software implementations used by application codes in the Exascale Computing Project . . . . . . . 5--12 John A. Turner and James Belak and Nathan Barton and Matthew Bement and Neil Carlson and Robert Carson and Stephen Dewitt and Jean-Luc Fattebert and Neil Hodge and Zechariah Jibben and Wayne King and Lyle Levine and Christopher Newman and Alex Plotkowski and Balasubramaniam Radhakrishnan and Samuel Temple Reeve and Matthew Rolchigo and Adrian Sabau and Stuart Slattery and Benjamin Stump ExaAM: Metal additive manufacturing simulation at the fidelity of the microstructure . . . . . . . . . . . . . 13--39 Jordan Musser and Ann S. Almgren and William D. Fullmer and Oscar Antepara and John B. Bell and Johannes Blaschke and Kevin Gott and Andrew Myers and Roberto Porcu and Deepak Rangarajan and Michele Rosso and Weiqun Zhang and Madhava Syamlal MFIX-Exa: a path toward exascale CFD-DEM simulations . . . . . . . . . . . . . . 40--58 J. Austin Harris and Ran Chu and Sean M. Couch and Anshu Dubey and Eirik Endeve and Antigoni Georgiadou and Rajeev Jain and Daniel Kasen and M. P. Laiu and Oe B. Messer and Jared O'Neal and Michael A. Sandoval and Klaus Weide Exascale models of stellar explosions: Quintessential multi-physics simulation 59--77 David Mccallen and Houjun Tang and Suiwen Wu and Eric Eckert and Junfei Huang and N. Anders Petersson Coupling of regional geophysics and local soil-structure models in the EQSIM fault-to-structure earthquake simulation framework . . . . . . . . . . . . . . . 78--92 Matthew R. Norman and David A. Bader and Christopher Eldred and Walter M. Hannah and Benjamin R. Hillman and Christopher R. Jones and Jungmin M. 
Lee and Lr Leung and Isaac Lyngaas and Kyle G. Pressel and Sarat Sreepathi and Mark A. Taylor and Xingqiu Yuan Unprecedented cloud resolution in a GPU-enabled full-physics atmospheric climate simulation on OLCF's Summit supercomputer . . . . . . . . . . . . . 93--105 Eric Suchyta and Scott Klasky and Norbert Podhorszki and Matthew Wolf and Abolaji Adesoji and Cs Chang and Jong Choi and Philip E. Davis and Julien Dominski and Stéphane Ethier and Ian Foster and Kai Germaschewski and Berk Geveci and Chris Harris and Kevin A. Huck and Qing Liu and Jeremy Logan and Kshitij Mehta and Gabriele Merlo and Shirley V. Moore and Todd Munson and Manish Parashar and David Pugmire and Mark S. Shephard and Cameron W. Smith and Pradeep Subedi and Lipeng Wan and Ruonan Wang and Shuangxi Zhang The Exascale Framework for High Fidelity coupled Simulations (EFFIS): Enabling whole device modeling in fusion science 106--128
John A. Taylor and Pablo Larraondo and Bronis R. de Supinski Data-driven global weather predictions at high resolutions . . . . . . . . . . 130--140 Pascal R. Bähr and Bruno Lang and Peer Ueberholz and Marton Ady and Roberto Kersevan Development of a hardware-accelerated simulation kernel for ultra-high vacuum with Nvidia RTX GPUs . . . . . . . . . . 141--152 Giovanni Isotton and Carlo Janna and Massimo Bernaschi A GPU-accelerated adaptive FSAI preconditioner for massively parallel simulations . . . . . . . . . . . . . . 153--166 Zhi Yao and Revathi Jambunathan and Yadong Zeng and Andrew Nonaka A massively parallel time-domain coupled electrodynamics-micromagnetics solver 167--181 Yuta Hirokawa and Atsushi Yamada and Shunsuke Yamada and Masashi Noda and Mitsuharu Uemoto and Taisuke Boku and Kazuhiro Yabana Large-scale ab initio simulation of light-matter interaction at the atomic scale in Fugaku . . . . . . . . . . . . 182--197 Mojtaba Barzegari and Liesbet Geris Highly scalable numerical simulation of coupled reaction--diffusion systems with moving interfaces . . . . . . . . . . . 198--213 Isaac Lyngaas and Matthew Norman and Youngsung Kim SAM++: Porting the E3SM-MMF cloud resolving model using a C++ portability library . . . . . . . . . . . . . . . . 214--230 Leigh Lapworth Parallel encryption of input and output data for HPC applications . . . . . . . 231--250 Emmanuel Agullo and Mirco Altenbernd and Hartwig Anzt and Leonardo Bautista-Gomez and Tommaso Benacchio and Luca Bonaventura and Hans-Joachim Bungartz and Sanjay Chatterjee and Florina M. Ciorba and Nathan Debardeleben and Daniel Drzisga and Sebastian Eibl and Christian Engelmann and Wilfried N. Gansterer and Luc Giraud and Dominik Göddeke and Marco Heisig and Fabienne Jézéquel and Nils Kohl and Xiaoye Sherry Li and Romain Lion and Miriam Mehl and Paul Mycek and Michael Obersteiner and Enrique S. 
Quintana-Ortí and Francesco Rizzi and Ulrich Rüde and Martin Schulz and Fred Fung and Robert Speck and Linda Stals and Keita Teranishi and Samuel Thibault and Dominik Thönnes and Andreas Wagner and Barbara Wohlmuth Resiliency in numerical algorithm design for extreme scale simulations . . . . . 251--285
Stephen Herbein and Tapasya Patki and Dong H. Ahn and Sebastian Mobo and Clark Hathaway and Silvina Caíno-Lores and James Corbett and David Domyancic and Thomas RW Scogland and Bronis R. de Supinski and Michela Taufer An analytical performance model of generalized hierarchical scheduling . . 289--306 Roel Van Beeumen and Khaled Z. Ibrahim and Gregory D. Kahanamoku-Meyer and Norman Y. Yao and Chao Yang Enhancing scalability of a matrix-free eigensolver for studying many-body localization . . . . . . . . . . . . . . 307--319 Michiel Van Gendt and Tim Besard and Stefaan Vandenberghe and Bjorn De Sutter Productively accelerating positron emission tomography image reconstruction on graphics processing units with Julia 320--336 Jatin Gharat and Bipin Kumar and Leena Ragha and Amit Barve and Shaik Mohammad Jeelani and John Clyne Development of NCL equivalent serial and parallel Python routines for meteorological data analysis . . . . . . 337--355 Adrian P. Dieguez and Margarita Amor and Ramón Doallo and Akira Nukada and Satoshi Matsuoka Efficient high-precision integer multiplication on the GPU . . . . . . . 356--369 Michael R. Wyatt II and Stephen Herbein and Todd Gamblin and Michela Taufer AI4IO: a suite of AI-based tools for IO-aware scheduling . . . . . . . . . . 370--387 Heather Pacella and Alec Dunton and Alireza Doostan and Gianluca Iaccarino Task-parallel in situ temporal compression of large-scale computational fluid dynamics data . . . . . . . . . . 388--418 Pablo Antonio Martínez and Biagio Peccerillo and Sandro Bartolini and José M. García and Gregorio Bernabé Performance portability in a real world application: PHAST applied to Caffe . . 419--439
Andrew Kassen and Varun Shankar and Aaron L. Fogelson A fine-grained parallelization of the immersed boundary method . . . . . . . . 443--458 Dustin Ruda and Stefan Turek and Dirk Ribbrock and Peter Zajac Very fast finite element Poisson solvers on lower precision accelerator hardware: a proof of concept study for Nvidia Tesla V100 . . . . . . . . . . . . . . . 459--474 Hiroyuki Ootomo and Rio Yokota Recovering single precision accuracy from Tensor Cores while surpassing the FP32 theoretical peak performance . . . 475--491 Arturo Vargas and Thomas M. Stitt and Kenneth Weiss and Vladimir Z. Tomov and Jean-Sylvain Camier and Tzanio Kolev and Robert N. Rieben Matrix-free approaches for GPU acceleration of a high-order finite element hydrodynamics application using MFEM, Umpire, and RAJA . . . . . . . . . 492--509 R Lily Hu and Damien Pierce and Yusef Shafi and Anudhyan Boral and Vladimir Anisimov and Sella Nevo and Yi-fan Chen Accelerating physics simulations with tensor processing units: an inundation modeling example . . . . . . . . . . . . 510--523 Marcin Rogowski and Lisandro Dalcin and Matteo Parsani and David E. Keyes Performance analysis of relaxation Runge--Kutta methods . . . . . . . . . . 524--542 Sebastian Friedemann and Bruno Raffin An elastic framework for ensemble-based large-scale data assimilation . . . . . 543--563 Anonymous Corrigendum to `Unprecedented cloud resolution in a GPU-enabled full-physics atmospheric climate simulation on OLCF's Summit supercomputer' . . . . . . . . . 564
Anonymous Special issue introduction . . . . . . . 567 Kazuto Ando and Rahul Bale and ChungGang Li and Satoshi Matsuoka and Keiji Onishi and Makoto Tsubokura Digital transformation of droplet/aerosol infection risk assessment realized on ``Fugaku'' for the fight against COVID-19 . . . . . . . 568--586 Andrew E. Blanchard and John Gounley and Debsindhu Bhowmik and Mayanka Chandra Shekar and Isaac Lyngaas and Shang Gao and Junqi Yin and Aristeidis Tsaris and Feiyi Wang and Jens Glaser Language models for the prediction of SARS-CoV-2 inhibitors . . . . . . . . . 587--602 Anda Trifan and Defne Gorgun and Michael Salim and Zongyi Li and Alexander Brace and Maxim Zvyagin and Heng Ma and Austin Clyde and David Clark and David J. Hardy and Tom Burnley and Lei Huang and John McCalpin and Murali Emani and Hyenseung Yoo and Junqi Yin and Aristeidis Tsaris and Vishal Subbiah and Tanveer Raza and Jessica Liu and Noah Trebesch and Geoffrey Wells and Venkatesh Mysore and Thomas Gibbs and James Phillips and S. Chakra Chennubhotla and Ian Foster and Rick Stevens and Anima Anandkumar and Venkatram Vishwanath and John E. Stone and Emad Tajkhorshid and Sarah A. Harris and Arvind Ramanathan Intelligent resolution: Integrating Cryo-EM with AI-driven multi-resolution simulations to observe the severe acute respiratory syndrome coronavirus-2 replication-transcription machinery in action . . . . . . . . . . . . . . . . . 603--623
Mark Parsons Special issue: Introduction . . . . . . 3 Parantapa Bhattacharya and Jiangzhuo Chen and Stefan Hoops and Dustin Machi and Bryan Lewis and Srinivasan Venkatramanan and Mandy L. Wilson and Brian Klahn and Aniruddha Adiga and Benjamin Hurt and Joseph Outten and Abhijin Adiga and Andrew Warren and Young Yun Baek and Przemyslaw Porebski and Achla Marathe and Dawen Xie and Samarth Swarup and Anil Vullikanti and Henning Mortveit and Stephen Eubank and Christopher L. Barrett and Madhav Marathe Data-driven scalable pipeline using national agent-based models for real-time pandemic response and decision support . . . . . . . . . . . . . . . . 4--27 Abigail Dommer and Lorenzo Casalino and Fiona Kearns and Mia Rosenfeld and Nicholas Wauer and Surl-Hee Ahn and John Russo and Sofia Oliveira and Clare Morris and Anthony Bogetti and Anda Trifan and Alexander Brace and Terra Sztain and Austin Clyde and Heng Ma and Chakra Chennubhotla and Hyungro Lee and Matteo Turilli and Syma Khalid and Teresa Tamayo-Mendoza and Matthew Welborn and Anders Christensen and Daniel GA Smith and Zhuoran Qiao and Sai K. Sirumalla and Michael O'Connor and Frederick Manby and Anima Anandkumar and David Hardy and James Phillips and Abraham Stern and Josh Romero and David Clark and Mitchell Dorrell and Tom Maiden and Lei Huang and John McCalpin and Christopher Woods and Alan Gray and Matt Williams and Bryan Barker and Harinda Rajapaksha and Richard Pitts and Tom Gibbs and John Stone and Daniel M. Zuckerman and Adrian J. Mulholland and Thomas Miller and Shantenu Jha and Arvind Ramanathan and Lillian Chong and Rommie E. Amaro #COVIDisAirborne: AI-enabled multiscale computational microscopy of delta SARS-CoV-2 in a respiratory aerosol . . 
28--44 Zhe Li and Chengkun Wu and Yishui Li and Runduo Liu and Kai Lu and Ruibo Wang and Jie Liu and Chunye Gong and Canqun Yang and Xin Wang and Chang-Guo Zhan and Hai-Bin Luo Free energy perturbation-based large-scale virtual screening for effective drug discovery against COVID-19 . . . . . . . . . . . . . . . . 45--57
Martin Kronbichler and Dmytro Sashko and Peter Munch Enhancing data locality of the conjugate gradient method for high-order matrix-free finite-element implementations . . . . . . . . . . . . 61--81 José I. Aliaga and Hartwig Anzt and Thomas Grützmacher and Enrique S. Quintana-Ortí and Andrés E. Tomás Compressed basis GMRES on high-performance graphics processing units . . . . . . . . . . . . . . . . . 82--100 John M. Dennis and Allison H. Baker and Brian Dobbins and Michael M. Bell and Jian Sun and Youngsung Kim and Ting-Yu Cha Enabling efficient execution of a variational data assimilation application . . . . . . . . . . . . . . 101--114 Marc T. Henry de Frahan and Jon S. Rood and Marc S. Day and Hariswaran Sitaraman and Shashank Yellapantula and Bruce A. Perry and Ray W. Grout and Ann Almgren and Weiqun Zhang and John B. Bell and Jacqueline H. Chen PeleC: an adaptive mesh refinement solver for compressible reacting flows 115--131 Long Qu and Rached Abdelkhalak and Hatem Ltaief and Issam Said and David Keyes Exploiting temporal data reuse and asynchrony in the reverse time migration 132--150 Jakub Sístek and Tomás Oberhuber Acceleration of a parallel BDDC solver by using graphics processing units on subdomains . . . . . . . . . . . . . . . 151--164 Florent Lopez and Theo Mary Mixed precision $ L U $ factorization on GPU tensor cores: reducing data movement and memory footprint . . . . . . . . . . 165--179 Lukas Einkemmer and Alexander Moriggl Semi-Lagrangian 4d, 5d, and 6d kinetic plasma simulation on large-scale GPU-equipped supercomputers . . . . . . 180--196 Jacques Middlecoff and Yonggang G. Yu and Mark W. Govett Performance comparison of the A-grid and C-grid shallow-water models on icosahedral grids . . . . . . . . . . . 197--208
Jack Dongarra and Bernard Tourancheau Guest editors note: Special issue on clusters, clouds, and data for scientific computing . . . . . . . . . . 211--212 Emmanuel Jeannot and Guillaume Pallez and Nicolas Vidal IO-aware Job-Scheduling: Exploiting the Impacts of Workload Characterizations to select the Mapping Strategy . . . . . . 213--228 Piotr Luszczek and Wissam M. Sid-Lakhdar and Jack Dongarra Combining multitask and transfer learning with deep Gaussian processes for autotuning-based performance engineering . . . . . . . . . . . . . . 229--244 Satoshi Matsuoka and Jens Domke and Mohamed Wahib and Aleksandr Drozd and Torsten Hoefler Myths and legends in high-performance computing . . . . . . . . . . . . . . . 245--259 Naweiluo Zhou and Giorgio Scorzelli and Jakob Luettgau and Rahul R. Kancharla and Joshua J. Kane and Robert Wheeler and Brendan P. Croom and Pania Newell and Valerio Pascucci and Michela Taufer Orchestration of materials science workflows for heterogeneous resources at large scale . . . . . . . . . . . . . . 260--271 Jorge Ejarque and Rosa M. Badia Automatizing the creation of specialized high-performance computing containers 272--287 Sadaf R. Alam and Miguel Gila and Mark Klein and Maxime Martinasso and Thomas C. Schulthess Versatile software-defined HPC and cloud clusters on Alps supercomputer for diverse workflows . . . . . . . . . . . 288--305 Sanjukta Bhowmick and Patrick Bell and Michela Taufer A Survey of Graph Comparison Methods with Applications to Nondeterminism in High-Performance Computing . . . . . . . 306--327 Vladimir Ostapenco and Laurent Lef\`evre and Anne-Cécile Orgerie and Benjamin Fichel Modeling, evaluating, and orchestrating heterogeneous environmental leverages for large-scale data center management 328--350 Jeffrey S. Vetter and Prasanna Date and Farah Fahim and Shruti R. Kulkarni and Petro Maksymovych and A. Alec Talin and Marc Gonzalez Tallada and Pruek Vanna-iampikul and Aaron R. 
Young and David Brooks and Yu Cao and Gu-Yeon Wei and Sung Kyu Lim and Frank Liu and Matthew Marinella and Bobby Sumpter and Narasinga Rao Miniskar Abisko: Deep codesign of an architecture for spiking neural networks using novel neuromorphic materials . . . . . . . . . 351--379 Andrés E. Tomás and Enrique S. Quintana-Ortí and Hartwig Anzt Fast truncated SVD of sparse and dense matrices on graphics processors . . . . 380--393 Hongwei Jin and Krishnan Raghavan and George Papadimitriou and Cong Wang and Anirban Mandal and Mariam Kiran and Ewa Deelman and Prasanna Balaprakash Graph neural networks for detecting anomalies in scientific workflows . . . 394--411 Robert Underwood and Julie Bessac and David Krasowska and Jon C. Calhoun and Sheng Di and Franck Cappello Black-box statistical prediction of lossy compression ratios for scientific data . . . . . . . . . . . . . . . . . . 412--433 Olga Pearce and Stephanie Brink Finding the forest in the trees: Enabling performance optimization on heterogeneous architectures through data science analysis of ensemble performance data . . . . . . . . . . . . . . . . . . 434--441 Sabra Ossen and Jeremy Musser and Luke Dalessandro and Martin Swany INDIANA --- In-Network Distributed Infrastructure for Advanced Network Applications . . . . . . . . . . . . . . 442--461
Philipp Grete and Joshua C. Dolence and Jonah M. Miller and Joshua Brown and Ben Ryan and Andrew Gaspar and Forrest Glines and Sriram Swaminarayan and Jonas Lippuner and Clell J. Solomon and Galen Shipman and Christoph Junghans and Daniel Holladay and James M. Stone and Luke F. Roberts Parthenon --- a performance portable block-structured adaptive mesh refinement framework . . . . . . . . . . 465--486 Martin Karp and Daniele Massaro and Niclas Jansson and Alistair Hart and Jacob Wahlgren and Philipp Schlatter and Stefano Markidis Large-Scale direct numerical simulations of turbulence using GPUs and modern Fortran . . . . . . . . . . . . . . . . 487--502 Sergio Iserte and Alejandro González-Barberá and Paloma Barreda and Krzysztof Rojek A study on the performance of distributed training of data-driven CFD simulations . . . . . . . . . . . . . . 503--515 He Bai and Changjun Hu and Yuhan Zhu and Dandan Chen and Genshen Chu and Shuai Ren Accelerating cluster dynamics simulation of fission gas behavior in nuclear fuel on deep computing unit-based heterogeneous architecture supercomputer 516--529 Robert Schade and Tobias Kenter and Hossam Elgabarty and Michael Lass and Thomas D. Kühne and Christian Plessl Breaking the exascale barrier for the electronic structure problem in ab-initio molecular dynamics . . . . . . 530--538 Megan Hickman Fulp and Dakota Fulp and Changfeng Zou and Cooper Sanders and Ayan Biswas and Melissa C. Smith and Jon C. Calhoun Accelerated dynamic data reduction using spatial and temporal properties . . . . 539--559 Noel Chalmers and Abhishek Mishra and Damon McDougall and Tim Warburton HipBone: a performance-portable graphics processing unit-accelerated C++ version of the NekBone benchmark . . . . . . . . 560--577 Will Pazner and Tzanio Kolev and Jean-Sylvain Camier End-to-end GPU acceleration of low-order-refined preconditioning for high-order finite element discretizations . . . . . . . . . . . . 
578--599 Jerry Watkins and Max Carlson and Kyle Shan and Irina Tezaur and Mauro Perego and Luca Bertagna and Carolyn Kao and Matthew J. Hoffman and Stephen F. Price Performance portable ice-sheet modeling with MALI . . . . . . . . . . . . . . . 600--625 Marc Gonzalez Tallada and Enric Morancho Heterogeneous programming using OpenMP and CUDA/HIP for hybrid CPU-GPU scientific applications . . . . . . . . 626--646
Edmond Chow Editorial . . . . . . . . . . . . . . . 649 Jie Chen and Zhiwei Nie and Yu Wang and Kai Wang and Fan Xu and Zhiheng Hu and Bing Zheng and Zhennan Wang and Guoli Song and Jingyi Zhang and Jie Fu and Xiansong Huang and Zhongqi Wang and Zhixiang Ren and Qiankun Wang and Daixi Li and Dongqing Wei and Bin Zhou and Chao Yang and Yonghong Tian Running ahead of evolution-AI-based simulation for predicting future high-risk SARS-CoV-2 variants . . . . . 650--665 Darren J. Hsu and Hao Lu and Aditya Kashi and Michael Matheson and John Gounley and Feiyi Wang and Wayne Joubert and Jens Glaser TwoFold: Highly accurate structure and affinity prediction for protein-ligand complexes from sequences . . . . . . . . 666--682 Maxim Zvyagin and Alexander Brace and Kyle Hippe and Yuntian Deng and Bin Zhang and Cindy Orozco Bohorquez and Austin Clyde and Bharat Kale and Danilo Perez-Rivera and Heng Ma and Carla M. Mann and Michael Irvin and Defne G. Ozgulbas and Natalia Vassilieva and James Gregory Pauloski and Logan Ward and Valerie Hayot-Sasson and Murali Emani and Sam Foreman and Zhen Xie and Diangen Lin and Maulik Shukla and Weili Nie and Josh Romero and Christian Dallago and Arash Vahdat and Chaowei Xiao and Thomas Gibbs and Ian Foster and James J. Davis and Michael E. Papka and Thomas Brettin and Rick Stevens and Anima Anandkumar and Venkatram Vishwanath and Arvind Ramanathan GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics 683--705
Roman Wyrzykowski and Ewa Deelman Guest Editor's note: Special issue on challenges and solutions for porting applications to next-generation high performance computing systems . . . . . 3--4 Daniel Langr and Tomás Dytrych Parallel multithreaded deduplication of data sequences in nuclear structure calculations . . . . . . . . . . . . . . 5--16 Roman Iakymchuk and Stef Graillat and José I. Aliaga General framework for re-assuring numerical reliability in parallel Krylov solvers: a case of bi-conjugate gradient stabilized methods . . . . . . . . . . . 17--33 Daniil Pavlov and Vladislav Galigerov and Daniil Kolotinskii and Vsevolod Nikolskiy and Vladimir Stegailov GPU-based molecular dynamics of fluid flows: Reaching for turbulence . . . . . 34--49
Jesus Carretero and Estela Suarez and Martin Schulz Malleability techniques applications in high-performance computing . . . . . . . 53--54 Rafael Rodríguez-Sánchez and Adrián Castelló and Sandra Catalán and Francisco D. Igual and Enrique S. Quintana-Ortí Experiences with nested parallelism in task-parallel applications using malleable BLAS on multicore processors 55--68 Iker Martín-Álvarez and José I. Aliaga and Maribel Castillo and Sergio Iserte and Rafael Mayo Dynamic spawning of MPI processes applied to malleability . . . . . . . . 69--93 Joel Criado and Victor Lopez and Joan Vinyals-Ylla-Catala and Guillem Ramirez-Miranda and Xavier Teruel and Marta Garcia-Gasulla Role-shifting threads: Increasing OpenMP malleability to address load imbalance at MPI and OpenMP . . . . . . . . . . . 94--107 Alberto Cascajo and David E. Singh and Jesús Carretero Detecting interference between applications and improving the scheduling using malleable application clones . . . . . . . . . . . . . . . . . 108--133
Natsuki Hosono and Mikito Furuichi Efficient implementation of low-order-precision smoothed particle hydrodynamics . . . . . . . . . . . . . 137--153 Anders Melander and Emil Stròm and Finnur Pind and Allan P. Engsig-Karup and Cheol-Ho Jeong and Tim Warburton and Noel Chalmers and Jan S. Hesthaven Massively parallel nodal discontinuous Galerkin finite element method simulator for room acoustics . . . . . . . . . . . 154--174 Hyun-Gyu Kang and Raymond S. Tuminaro and Andrey Prokopenko and Seth R. Johnson and Andrew G. Salinger and Katherine J. Evans An implicit barotropic mode solver for MPAS-ocean using a modern Fortran solver interface . . . . . . . . . . . . . . . 175--191 Peter Munch and Martin Kronbichler Cache-optimized and low-overhead implementations of additive Schwarz methods for high-order FEM multigrid computations . . . . . . . . . . . . . . 192--209 Kadir Akbudak Hypergraph-based locality-enhancing methods for graph operations in Big Data applications . . . . . . . . . . . . . . 210--224 Yuxi Hong and Hatem Ltaief and Matteo Ravasi and David Keyes High performance computing seismic redatuming by inversion with algebraic compression and multiple precisions . . 225--244 Bingxin Wei and Yizhuo Wang and Fangli Chang and Jianhua Gao and Weixing Ji Predicting optimal sparse general matrix--matrix multiplication algorithm on GPUs . . . . . . . . . . . . . . . . 245--259
John J. Loffeld and Andy Nonaka and Daniel R. Reynolds and David J. Gardner and Carol S. Woodward Performance of explicit and IMEX MRI multirate methods on complex reactive flow problems within modern parallel adaptive structured grid frameworks . . 263--281 Daniel S. Abdi and Isidora Jankov Accelerating atmospheric physics parameterizations using graphics processing units . . . . . . . . . . . . 282--296 Hiroyuki Ootomo and Katsuhisa Ozaki and Rio Yokota DGEMM on integer matrix multiplication unit . . . . . . . . . . . . . . . . . . 297--313 Qianxiang Ma and Rio Yokota An inherently parallel $ {\cal H}^2$-ULV factorization for solving dense linear systems on GPUs . . . . . . . . . . . . 314--336 Misun Min and Michael Brazell and Ananias Tomboulides and Matthew Churchfield and Paul Fischer and Michael Sprague Towards exascale for wind energy simulations . . . . . . . . . . . . . . 337--355 Mrinalgouda Patil and Ravi Lumba and Buvana Jayaraman and Anubhav Datta An integrated three-dimensional aeromechanical analysis for the prediction of stresses on modern coaxial rotors . . . . . . . . . . . . . . . . . 356--376 Anonymous Retraction Notice: Azzam Haidar and Tingxing Dong and Piotr Luszczek and Stanimire Tomov and Jack Dongarra, Batched matrix computations on hardware accelerators based on GPUs, Int. J. High Perform. Comput. Appl. 29(2) 193--208 (2015) . . . . . . . . . 377
Michael A. Heroux and Lois Curfman McInnes and James Ahrens and Todd Gamblin and Timothy C. Germann and Xiaoye Sherry Li and Kathryn Mohror and Todd Munson and Sameer Shende and Rajeev Thakur and Jeffrey Vetter and James Willenbring ECP libraries and tools: an overview . . 381--408 Pedro Valero-Lara and Seyong Lee and Marc Gonzalez-Tallada and Joel Denny and Keita Teranishi and Jeffrey S. Vetter Enhancing Kokkos with OpenACC . . . . . 409--426 Joel E. Denny and Seyong Lee and Pedro Valero-Lara and Marc Gonzalez-Tallada and Keita Teranishi and Jeffrey S. Vetter Clacc: OpenACC for C/C++ in Clang . . . 427--446 Julian Andrej and Nabil Atallah and Jan-Phillip Bäcker and Jean-Sylvain Camier and Dylan Copeland and Veselin Dobrev and Yohann Dudouit and Tobias Duswald and Brendan Keith and Dohyun Kim and Tzanio Kolev and Boyan Lazarov and Ketan Mittal and Will Pazner and Socratis Petrides and Syun'ichi Shiraiwa and Mark Stowell and Vladimir Tomov High-performance finite elements with MFEM . . . . . . . . . . . . . . . . . . 447--467 Ahmad Abdelfattah and Natalie Beams and Robert Carson and Pieter Ghysels and Tzanio Kolev and Thomas Stitt and Arturo Vargas and Stanimire Tomov and Jack Dongarra MAGMA: Enabling exascale performance with accelerated BLAS and LAPACK for diverse GPU architectures . . . . . . . 468--490 David E. Bernholdt and George Bosilca and Aurelien Bouteiller and Ron Brightwell and Jan Ciesko and Matthew GF Dosanjh and Giorgis Georgakoudis and Ignacio Laguna and Scott Levy and Thomas Naughton and Stephen L. Olivier and Howard P. Pritchard and Whit Schonbein and Joseph Schuchart and Amir Shehata Taking the MPI standard and the open MPI library to exascale . . . . . . . . . . 491--507 Kenneth Moreland and Tushar M. 
Athawale and Vicente Bolea and Mark Bolstad and Eric Brugger and Hank Childs and Axel Huebl and Li-Ta Lo and Berk Geveci and Nicole Marsaglia and Sujin Philip and David Pugmire and Silvio Rizzi and Zhe Wang and Abhishek Yenpure Visualization at exascale: Making it all work with VTK-m . . . . . . . . . . . . 508--526 Hui Zhou and Ken Raffenetti and Yanfei Guo and Thomas Gillis and Robert Latham and Rajeev Thakur Designing and prototyping extensions to the Message Passing Interface in MPICH 527--545
Mahesh Lakshminarasimhan and Oscar Antepara and Tuowen Zhao and Benjamin Sepanski and Protonu Basu and Hans Johansen and Mary Hall and Samuel Williams Bricks: a high-performance portability layer for computations on block-structured grids . . . . . . . . . 549--567 Terry Cojean and Pratik Nayak and Tobias Ribizel and Natalie Beams and Yu-Hsiang Mike Tsai and Marcel Koch and Fritz Göbel and Thomas Grützmacher and Hartwig Anzt Ginkgo --- a math library designed to accelerate Exascale Computing Project science applications . . . . . . . . . . 568--584 Wajih Boukaram and Yuxi Hong and Yang Liu and Tianyi Shi and Xiaoye S. Li Batched sparse direct solver design and evaluation in SuperLU_DIST . . . . . . . 585--598 Andrew Myers and Weiqun Zhang and Ann Almgren and Thierry Antoun and John Bell and Axel Huebl and Alexander Sinn AMReX and pyAMReX: Looking beyond the exascale computing project . . . . . . . 599--611 Laksono Adhianto and Jonathon Anderson and Robert Matthew Barnett and Dragana Grbic and Vladimir Indic and Mark Krentel and Yumeng Liu and Srđan Milaković and Wileam Phan and John Mellor-Crummey Refining HPCToolkit for application performance analysis at exascale . . . . 612--632 Hengrui Luo and Younghyun Cho and James W. Demmel and Igor Kozachenko and Xiaoye S. Li and Yang Liu Non-smooth Bayesian optimization in tuning scientific applications . . . . . 633--657 Weijian Zheng and Jack Kordas and Tyler J. Skluzacek and Raj Kettimuthu and Ian Foster Globus service enhancements for exascale applications and facilities . . . . . . 658--670 Piotr Luszczek and Anthony Castaldo and Yaohung M. Tsai and Daniel Mishler and Jack Dongarra Numerical eigen-spectrum slicing, accurate orthogonal eigen-basis, and mixed-precision eigenvalue refinement using OpenMP data-dependent tasks and accelerator offload . . . . . . . . . . 671--691
Mark Gates and Ahmad Abdelfattah and Kadir Akbudak and Mohammed Al Farhan and Rabab Alomairy and Daniel Bielich and Treece Burgess and Sébastien Cayrols and Neil Lindquist and Dalal Sukkari and Asim YarKhan Evolution of the SLATE linear algebra library . . . . . . . . . . . . . . . . 3--17 Lisa Claus and Pieter Ghysels and Wajih Halim Boukaram and Xiaoye Sherry Li A graphics processing unit accelerated sparse direct solver and preconditioner with block low rank compression . . . . 18--31 James Ahrens and Marco Arienti and Utkarsh Ayachit and Janine Bennett and Roba Binyahib and Ayan Biswas and Peer-Timo Bremer and Eric Brugger and Roxana Bujack and Hamish Carr and Jieyang Chen and Hank Childs and Soumya Dutta and Abdelilah Essiari and Berk Geveci and Cyrus Harrison and Subhashis Hazarika and Megan Hickman Fulp and Petar Hristov and Xuan Huang and Joseph Insley and Yuya Kawakami and Chloe Keilers and James Kress and Matthew Larsen and Dan Lipsa and Meghanto Majumder and Nicole Marsaglia and Victor A. Mateevitsi and Valerio Pascucci and John Patchett and Saumil Patel and Steve Petruzza and David Pugmire and Silvio Rizzi and David H. Rogers and Oliver Rübel and Jorge Salinas and Sudhanshu Sane and Sergei Shudler and Alexandra Stewart and Karen Tsai and Terece L. Turton and Will Usher and Zhe Wang and Gunther H. Weber and Corey Wetterer-Nelson and Jonathan Woodring and Abhishek Yenpure The ECP ALPINE project: In situ and post hoc visualization infrastructure and analysis capabilities for exascale . . . 32--51 Logan Ward and J. Gregory Pauloski and Valerie Hayot-Sasson and Yadu Babuji and Alexander Brace and Ryan Chard and Kyle Chard and Rajeev Thakur and Ian Foster Employing artificial intelligence to steer exascale workflows with Colmena 52--64 M Scot Breitenfeld and Houjun Tang and Huihuo Zheng and Jordan Henderson and Suren Byna HDF5 in the exascale era: Delivering efficient and scalable parallel I/O for exascale applications . . . . . . . . . 
65--78 Xingfu Wu and John R. Tramm and Jeffrey Larson and John-Luke Navarro and Prasanna Balaprakash and Brice Videau and Michael Kruse and Paul Hovland and Valerie Taylor and Mary Hall Integrating ytopt and libEnsemble to autotune OpenMC . . . . . . . . . . . . 79--103 Peter Lindstrom and Jeffrey Hittinger and James Diffenderfer and Alyson Fox and Daniel Osei-Kuffuor and Jeffrey Banks ZFP: a compressed array representation for numerical computations . . . . . . . 104--122 Cody J. Balos and Marcus Day and Lucas Esclapez and Anne M. Felden and David J. Gardner and Malik Hassanaly and Daniel R. Reynolds and Jon S. Rood and Jean M. Sexton and Nicholas T. Wimer and Carol S. Woodward SUNDIALS time integrators for exascale applications with many independent systems of ordinary differential equations . . . . . . . . . . . . . . . 123--146 Aurelien Bouteiller and Thomas Herault and Qinglei Cao and Joseph Schuchart and George Bosilca PaRSEC: Scalability, flexibility, and hybrid architecture support for task-based applications in ECP . . . . . 147--166 Andrey Prokopenko and Daniel Arndt and Damien Lebrun-Grandié and Bruno Turcksin and Nicholas Frontiere and J. D. Emberson and Michael Buehlmann Advances in ArborX to support exascale applications . . . . . . . . . . . . . . 167--176 Stephen Hudson and Jeffrey Larson and John-Luke Navarro and Stefan M. Wild Portable, heterogeneous ensemble workflows at scale using libEnsemble . . 177--192 Roxana Bujack and Maya Gokhale and Latchesar Ionkov and Keita Iwabuchi and Michael Jantz and Terry Jones and Sumathi Lakshmiranganatha and Michael K. Lang and Jason Lee and M. Ben Olson and Scott Pakin and Roger Pearce and Jonathan Pietarila Graham and Li Tang and Terece L. Turton and Sean Williams The ECP SICM project: Managing complex memory hierarchies for exascale applications . . . . . . . . . . . . . . 193--207