
PERI - auto-tuning memory-intensive kernels for multicore

S Williams1,2, K Datta2, J Carter1, L Oliker1,2, J Shalf1, K Yelick1,2 and D Bailey1



[1]
Asanović K et al 2006 The landscape of parallel computing research: A view from Berkeley EECS (University of California, Berkeley) Report UCB/EECS-2006-183

[2]
Gschwind M 2006 Chip multiprocessing and the cell broadband engine CF'06: Proc. 3rd Conf. on Computing Frontiers (Ischia, Italy) p 1-8

[3]
Vuduc R, Demmel J and Yelick K 2005 OSKI: A library of automatically tuned sparse matrix kernels Proc. SciDAC 2005, J. of Physics: Conf. Series (Institute of Physics Publishing)

[4]
Williams S, Oliker L, Vuduc R, Shalf J, Yelick K and Demmel J 2007 Optimization of sparse matrix-vector multiplication on emerging multicore platforms Proc. SC2007: High performance computing, networking, and storage conference

[5]
Datta K, Murphy M, Volkov V, Williams S, Carter J, Oliker L, Patterson D, Shalf J and Yelick K 2008 Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures Preprint

[6]
Macnab A, Vahala G, Vahala L and Pavlo P 2002 Lattice Boltzmann model for dissipative MHD Proc. 29th EPS Conf. on Controlled Fusion and Plasma Physics vol 26B (Montreux, Switzerland)

[7]
Biskamp D 2003 Magnetohydrodynamic Turbulence (Cambridge University Press)

[8]
Williams S, Carter J, Oliker L, Shalf J and Yelick K 2008 Lattice Boltzmann simulation optimization on leading multicore platforms International Conf. Parallel and Distributed Computing Systems (IPDPS) (Miami, Florida)

[9]
Williams S 2008 Autotuning performance on multicore computers PhD Thesis University of California, Berkeley

[10]
Williams S and Patterson D 2008 The roofline model: An insightful multicore performance model Preprint

[11]
Williams S, Patterson D, Oliker L, Shalf J and Yelick K 2008 The roofline model: A pedagogical tool for auto-tuning kernels on multicore architectures Hot Chips 20: Stanford University, Stanford, California, August 24-26, 2008 (IEEE Micro)

[12]
Whaley R C, Petitet A and Dongarra J 2001 Automated empirical optimizations of software and the ATLAS project Parallel Computing 27 (1) 3-25

[13]
Bilmes J, Asanović K, Chin C W and Demmel J 1997 Optimizing matrix multiply using PHiPAC: A portable, high-performance, ANSI C coding methodology Proc. Int. Conf. Supercomputing (Vienna, Austria)

[14]
Frigo M and Johnson S G 1998 FFTW: An adaptive software architecture for the FFT Proc. 1998 IEEE Intl. Conf. Acoustics Speech and Signal Processing vol 3 (IEEE) p 1381-1384

[15]
Moura J M F, Johnson J, Johnson R W, Padua D, Prasanna V K, Püschel M and Veloso M 2000 SPIRAL: Automatic implementation of signal processing algorithms High Performance Embedded Computing (HPEC)

[16]
Hill M D and Smith A J 1989 Evaluating associativity in CPU caches IEEE Trans. Comput. 38 (12) 1612-1630

[17]
Sylvester D and Keutzer K 2001 Microarchitectures for systems on a chip in small process geometries Proc. IEEE 467-489

  1. PERI - auto-tuning memory-intensive kernels for multicore

    S Williams et al 2008 J. Phys.: Conf. Ser. 125 012038
