| Title | Authors | Venue | Cited by | Year |
| --- | --- | --- | --- | --- |
| Near-global climate simulation at 1 km resolution: establishing a performance baseline on 4888 GPUs with COSMO 5.0 | O Fuhrer, T Chadha, T Hoefler, G Kwasniewski, X Lapillonne, D Leutwyler, ... | Geoscientific Model Development 11 (4), 1665-1681, 2018 | 50 | 2018 |
| Using compiler techniques to improve automatic performance modeling | A Bhattacharyya, G Kwasniewski, T Hoefler | 2015 International Conference on Parallel Architecture and Compilation (PACT …, 2015 | 24 | 2015 |
| A PCIe congestion-aware performance model for densely populated accelerator servers | M Martinasso, G Kwasniewski, SR Alam, TC Schulthess, T Hoefler | SC'16: Proceedings of the International Conference for High Performance …, 2016 | 18 | 2016 |
| Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication | G Kwasniewski, M Kabić, M Besta, J VandeVondele, R Solcà, T Hoefler | Proceedings of the International Conference for High Performance Computing …, 2019 | 14 | 2019 |
| Extreme scale plasma turbulence simulations on top supercomputers worldwide | W Tang, B Wang, S Ethier, G Kwasniewski, T Hoefler, KZ Ibrahim, ... | SC'16: Proceedings of the International Conference for High Performance …, 2016 | 9 | 2016 |
| Automatic complexity analysis of explicitly parallel programs | T Hoefler, G Kwasniewski | Proceedings of the 26th ACM symposium on Parallelism in algorithms and …, 2014 | 8 | 2014 |
| Near-global climate simulation at 1 km resolution: establishing a performance baseline on 4888 GPUs with COSMO 5.0 | O Fuhrer, T Chadha, T Hoefler, G Kwasniewski, X Lapillonne, D Leutwyler, ... | Geosci. Model Dev., 11, 1665–1681 | 7 | 2018 |
| Automatic Performance Modeling of HPC Applications | F Wolf, C Bischof, A Calotoiu, T Hoefler, C Iwainsky, G Kwasniewski, ... | Software for Exascale Computing-SPPEXA 2013-2015, 445-465, 2016 | 4 | 2016 |
| Flexible Communication Avoiding Matrix Multiplication on FPGA with High-Level Synthesis | J de Fine Licht, G Kwasniewski, T Hoefler | The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays …, 2020 | 2 | 2020 |
| On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal LU Factorization | G Kwasniewski, T Ben-Nun, AN Ziogas, T Schneider, M Besta, T Hoefler | arXiv preprint arXiv:2010.05975, 2020 | | 2020 |
| Flexible Communication Avoiding Matrix Multiplication on FPGA with High-Level Synthesis | JF Licht, G Kwasniewski, T Hoefler | arXiv preprint arXiv:1912.06526, 2019 | | 2019 |
| A scalable weakly-synchronous algorithm for solving partial differential equations | K Aditya, T Gysi, G Kwasniewski, T Hoefler, DA Donzis, JH Chen | arXiv preprint arXiv:1911.05769, 2019 | | 2019 |
| Scaling a Convection-Resolving RCM to Near-Global Scales | O Fuhrer, D Leutwyler, T Chadha, G Kwasniewski, T Hoefler, X Lapillonne, ... | 2017 AGU Fall Meeting, 2017 | | 2017 |
| Scaling a Convection-Resolving RCM to Near-Global Scales | D Leutwyler, O Fuhrer, T Chadha, G Kwasniewski, T Hoefler, X Lapillonne, ... | AGUFM 2017, A24F-05, 2017 | | 2017 |