Slim fly: A cost effective low-diameter network topology M Besta, T Hoefler SC'14: Proceedings of the International Conference for High Performance …, 2014 | 177 | 2014 |

Enabling highly-scalable remote memory access programming with MPI-3 one sided R Gerstenberger, M Besta, T Hoefler Scientific Programming 22, 2014 | 116 | 2014 |

To push or to pull: On reducing communication and synchronization in graph computations M Besta, M Podstawski, L Groner, E Solomonik, T Hoefler Proceedings of the 26th International Symposium on High-Performance Parallel …, 2017 | 49 | 2017 |

Evaluating the cost of atomic operations on modern architectures H Schweizer, M Besta, T Hoefler 2015 International Conference on Parallel Architecture and Compilation (PACT …, 2015 | 45 | 2015 |

Programming abstractions for data locality A Tate, A Kamil, A Dubey, A Größlinger, B Chamberlain, B Goglin, ... | 34 | 2014 |

Scaling betweenness centrality using communication-efficient sparse matrix multiplication E Solomonik, M Besta, F Vella, T Hoefler Proceedings of the International Conference for High Performance Computing …, 2017 | 25 | 2017 |

Slimsell: A vectorizable graph representation for breadth-first search M Besta, F Marending, E Solomonik, T Hoefler 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2017 | 24 | 2017 |

Survey and taxonomy of lossless graph compression and space-efficient graph representations M Besta, T Hoefler arXiv preprint arXiv:1806.01799, 2018 | 20 | 2018 |

Fault tolerance for remote memory access programming models M Besta, T Hoefler Proceedings of the 23rd international symposium on High-performance parallel …, 2014 | 19 | 2014 |

High-performance distributed RMA locks P Schmid, M Besta, T Hoefler Proceedings of the 25th ACM International Symposium on High-Performance …, 2016 | 18 | 2016 |

Accelerating irregular computations with hardware transactional memory and active messages M Besta, T Hoefler Proceedings of the 24th International Symposium on High-Performance Parallel …, 2015 | 16 | 2015 |

A modular benchmarking infrastructure for high-performance and reproducible deep learning T Ben-Nun, M Besta, S Huber, AN Ziogas, D Peter, T Hoefler 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2019 | 15 | 2019 |

Slim NoC: A low-diameter on-chip network topology for high energy efficiency and scalability M Besta, SM Hassan, S Yalamanchili, R Ausavarungnirun, O Mutlu, ... ACM SIGPLAN Notices 53 (2), 43-55, 2018 | 15 | 2018 |

Transformations of high-level synthesis codes for high-performance computing J de Fine Licht, S Meierhans, T Hoefler CoRR, vol. abs/1805.08288, 2018 | 13 | 2018 |

Active access: A mechanism for high-performance distributed data-centric computations M Besta, T Hoefler Proceedings of the 29th ACM on International Conference on Supercomputing …, 2015 | 11 | 2015 |

Communication-avoiding parallel minimum cuts and connected components L Gianinazzi, P Kalvoda, A De Palma, M Besta, T Hoefler ACM SIGPLAN Notices 53 (1), 219-232, 2018 | 10 | 2018 |

Substream-centric maximum matchings on FPGA M Besta, M Fischer, T Ben-Nun, J de Fine Licht, T Hoefler Proceedings of the 2019 ACM/SIGDA International Symposium on Field …, 2019 | 7 | 2019 |

Log(Graph): A near-optimal high-performance graph representation M Besta, D Stanojevic, T Zivic, J Singh, M Hoerold, T Hoefler Proceedings of the 27th International Conference on Parallel Architectures …, 2018 | 7 | 2018 |

Graph processing on FPGAs: Taxonomy, survey, challenges M Besta, D Stanojevic, JDF Licht, T Ben-Nun, T Hoefler arXiv preprint arXiv:1903.06697, 2019 | 6 | 2019 |

Network-accelerated non-contiguous memory transfers S Di Girolamo, K Taranov, A Kurth, M Schaffner, T Schneider, J Beránek, ... Proceedings of the International Conference for High Performance Computing …, 2019 | 5 | 2019 |