Title: Doing more with less: Growth, improvements, and management of NMSU’s computing capabilities
Deployed in 2015, Discovery is New Mexico State University's commonly available High-Performance Computing (HPC) cluster. The deployment of Discovery was initiated by Information and Communication Technologies (ICT) employees from the Systems Administration group who wanted to help researchers run their computations on a more powerful system than the ones sitting in their offices. Over the years, the cluster has been expanded six times and, as of March 2021, has 52 compute nodes, 1,480 CPU cores, 17 terabytes of RAM, 30 GPUs, and 1.8 petabytes of usable storage. Discovery's hardware is acquired through a combination of university funds, condo-model funds, and grant funds, making Discovery a heterogeneous system containing several CPU generations. This paper discusses our growth and administration experiences on this heterogeneous system, as well as our outreach and contributions to the HPC community.
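One common way to administer a cluster spanning several CPU generations is to tag nodes with scheduler features and let jobs constrain themselves to one generation. The sketch below shows that idea in Python against a Slurm-style scheduler; the partition name, feature tags, and helper function are hypothetical illustrations, not Discovery's actual configuration.

```python
# Hypothetical sketch: routing a job to one CPU generation on a
# heterogeneous cluster by emitting a Slurm-style batch script.
# Feature tags ("intel-haswell", "amd-epyc") and the partition name
# are illustrative, not Discovery's real setup.
import subprocess
import tempfile

def submit_constrained(command: str, cpu_generation: str, cores: int = 16) -> None:
    """Write and submit a batch script pinned to nodes with one feature tag."""
    script = f"""#!/bin/bash
#SBATCH --partition=normal
#SBATCH --constraint={cpu_generation}
#SBATCH --ntasks={cores}
srun {command}
"""
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(script)
        path = f.name
    subprocess.run(["sbatch", path], check=True)

# Example: pin a run to a single generation so timings stay comparable.
# submit_constrained("./my_simulation", "intel-haswell")
```

Pinning by feature keeps benchmark and scaling results consistent, at the cost of a smaller eligible node pool per job.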
Award ID(s):
1757207
PAR ID:
10294204
Author(s) / Creator(s):
Date Published:
Journal Name:
Practice and Experience in Advanced Research Computing
Page Range / eLocation ID:
1 to 4
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Modern High Performance Computing (HPC) systems are built with innovative system architectures and novel programming models to further push the speed limit of computing. The increased complexity poses challenges for performance portability and performance evaluation. The Standard Performance Evaluation Corporation (SPEC) has a long history of producing industry-standard benchmarks for modern computer systems. SPEC's newly released SPEChpc 2021 benchmark suites, developed by the High Performance Group, are a bold attempt to provide a fair and objective benchmarking tool designed for state-of-the-art HPC systems. With support for multiple host and accelerator programming models, the suites are portable across both homogeneous and heterogeneous architectures. Different workloads are developed to fit system sizes ranging from a few compute nodes to a few hundred compute nodes. In this work, we present our first experiences in benchmarking with the new SPEChpc 2021 suites and evaluate their portability and basic performance characteristics on various popular and emerging HPC architectures, including x86 CPUs, NVIDIA GPUs, and AMD GPUs. This study provides first-hand experience of executing the SPEChpc 2021 suites at scale on production HPC systems, discusses real-world use cases, and serves as an initial guideline for using the benchmark suites.
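For readers planning to try the suites, the sketch below shows one plausible way to sweep the workload sizes from Python using SPEC's runhpc harness. The config file name, rank counts, and exact flags are assumptions for illustration; the suite's shipped documentation is the authority on real options.

```python
# Minimal sketch (assumed invocation, not a verified recipe) of sweeping
# the SPEChpc 2021 workload sizes with SPEC's runhpc harness.
import subprocess

WORKLOADS = ["tiny", "small", "medium", "large"]          # smallest to largest system sizes
RANKS = {"tiny": 64, "small": 256, "medium": 1024, "large": 4096}  # illustrative counts

for suite in WORKLOADS:
    # Each call launches one suite at its assumed MPI rank count.
    subprocess.run(
        ["runhpc", "--config=mysystem.cfg", f"--ranks={RANKS[suite]}",
         "--tune=base", suite],
        check=True,
    )
```

Sweeping the sizes in one pass makes it easy to see where a given system stops scaling before committing to a full reportable run.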
  2. In a CPU-GPU heterogeneous computing system, the different types of processors suffer from load-balancing problems during computation; moreover, matching multiple tasks to the appropriate processor cores is another urgent problem. In this paper, we propose a task scheduling strategy for high-performance CPU-GPU heterogeneous computing platforms to solve these problems. For the single-task model, we propose a load-aware task scheduling strategy for CPU-GPU heterogeneous computing platforms. This strategy probes the computing power of the CPU and GPU on the specified tasks and allocates computing tasks to the CPU and GPU according to the perceived ratio. The tasks are stored in a bidirectional queue to reduce the additional overhead brought by scheduling. For the multi-task model, we propose a task scheduling strategy based on a genetic algorithm for CPU-GPU heterogeneous computing platforms. This strategy aims to improve the overall operating efficiency of the system and accurately binds the execution relationship between different types of tasks and heterogeneous processing cores. Our experimental results show that these scheduling strategies improve the efficiency of parallel computing as well as overall system performance.
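The load-aware, bidirectional-queue idea can be sketched in a few lines. The code below is a minimal illustration of the concept (probe both processors, then split the queue proportionally from opposite ends), not the authors' implementation; the probe and task runners are placeholders.

```python
# Sketch of a load-aware CPU-GPU split: measure each processor's
# throughput on a small sample, then divide the task deque in
# proportion to the measured rates. CPU pops from the left, GPU from
# the right, so neither side contends for the same item.
from collections import deque
import time

def probe(run_sample) -> float:
    """Return tasks/second for a small representative batch."""
    start = time.perf_counter()
    n = run_sample()                     # executes the batch, returns task count
    return n / (time.perf_counter() - start)

def split_and_run(tasks, run_cpu, run_gpu, cpu_rate: float, gpu_rate: float):
    q = deque(tasks)
    cpu_share = cpu_rate / (cpu_rate + gpu_rate)   # the "perception ratio"
    n_cpu = round(len(q) * cpu_share)
    for _ in range(n_cpu):               # CPU's share, drawn from the front
        run_cpu(q.popleft())
    while q:                             # remainder goes to the GPU, from the back
        run_gpu(q.pop())
```

Drawing from opposite ends of one deque avoids a shared index or lock, which is the low-overhead property the abstract attributes to the bidirectional queue.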
  3. With the advent of modern-day high-throughput technologies, the bottleneck in biological discovery has shifted from the cost of doing experiments to that of analyzing results. clubber is our automated cluster-load-balancing system developed for optimizing these "big data" analyses. Its plug-and-play framework encourages re-use of existing solutions for bioinformatics problems. clubber's goals are to reduce computation times and to facilitate the use of cluster computing. The first goal is achieved by automating the balance of parallel submissions across available high-performance computing (HPC) resources; notably, the latter can be added on demand, including cloud-based resources and/or heterogeneous environments. The second goal, making HPCs user-friendly, is facilitated by an interactive web interface and a RESTful API, allowing for job monitoring and result retrieval. We used clubber to speed up our pipeline for annotating the molecular functionality of metagenomes. Here, we analyzed the Deepwater Horizon oil-spill study data to quantitatively show that the beach sands have not yet entirely recovered. Further, our analysis of the CAMI-challenge data revealed that microbiome taxonomic shifts do not necessarily correlate with functional shifts. These examples (21 metagenomes processed in 172 min) clearly illustrate the importance of clubber in the everyday computational biology environment.
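A minimal sketch of the balancing idea follows: submit each job to whichever registered resource currently has the shortest queue, via a REST interface of the kind the abstract describes. The endpoint paths, payloads, and cluster URLs are hypothetical stand-ins; clubber's actual API is not detailed here.

```python
# Illustrative sketch (not clubber's real API): pick the least-loaded
# registered cluster and submit a job to it over REST.
import requests

CLUSTERS = {
    "local-hpc": "https://hpc.example.edu/clubber",     # on-premises resource
    "cloud-pool": "https://cloud.example.com/clubber",  # added on demand
}

def queued_jobs(base_url: str) -> int:
    """Assumed endpoint reporting the current queue length."""
    return requests.get(f"{base_url}/api/queue/length", timeout=10).json()["length"]

def submit_balanced(job_payload: dict) -> str:
    """Send the job to whichever cluster is least loaded right now."""
    target = min(CLUSTERS.values(), key=queued_jobs)
    r = requests.post(f"{target}/api/jobs", json=job_payload, timeout=10)
    return r.json()["job_id"]
```

Because the set of clusters is just a dictionary, cloud resources can be registered or dropped at runtime, mirroring the on-demand property the abstract highlights.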
  4. High-performance computing (HPC) is undergoing significant changes. Emerging HPC applications comprise both compute- and data-intensive applications. To meet the intense I/O demand from emerging data-intensive applications, burst buffers are deployed in production systems. Existing HPC schedulers are mainly CPU-centric. The extreme heterogeneity of hardware devices, combined with workload changes, forces schedulers to consider multiple resources (e.g., burst buffers) beyond CPUs in decision making. In this study, we present a multi-resource scheduling scheme named BBSched that schedules user jobs based not only on their CPU requirements but also on other schedulable resources such as burst buffers. BBSched formulates scheduling as a multi-objective optimization (MOO) problem and rapidly solves it using a multi-objective genetic algorithm. The multiple solutions generated by BBSched enable system managers to explore potential tradeoffs among various resources and therefore obtain better utilization of all of them. Trace-driven simulations with real system workloads demonstrate that BBSched improves scheduling performance by up to 41% compared to existing methods, indicating that explicitly optimizing multiple resources beyond CPUs is essential for HPC scheduling.
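The multi-objective step can be made concrete with a Pareto-dominance filter: keep only the candidate schedules that no other candidate beats on every resource. This is a toy illustration of the general MOO idea, not BBSched's actual formulation; the objective values below are invented.

```python
# Toy multi-objective selection: keep the Pareto-optimal schedules,
# scored here on (CPU utilization, burst-buffer utilization), both
# higher-is-better. A real scheduler would score many more objectives.

def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """candidates: list of (schedule_name, objective_tuple) pairs."""
    return [
        (name, obj) for name, obj in candidates
        if not any(dominates(other, obj) for _, other in candidates if other != obj)
    ]

# Invented example: "C" is dominated by both "A" and "B", so only the
# genuine tradeoff pair {"A", "B"} survives for managers to choose from.
front = pareto_front([("A", (0.9, 0.4)), ("B", (0.7, 0.8)), ("C", (0.6, 0.3))])
```

Returning the whole front, rather than one "best" schedule, is what lets system managers weigh CPU utilization against burst-buffer utilization themselves.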
  5. High-spatial-resolution satellite imagery enables transformational opportunities to observe, map, and document the micro-topographic transitions occurring in Arctic polygonal tundra at multiple spatial and temporal frequencies. Knowledge discovery through artificial intelligence, big imagery, and high-performance computing (HPC) resources is just starting to be realized in Arctic permafrost science. We have developed a novel high-performance image-analysis framework, Mapping Application for Arctic Permafrost Land Environment (MAPLE), that enables the integration of operational-scale GeoAI capabilities into Arctic permafrost modeling. Interoperability across heterogeneous HPC systems and optimal usage of computational resources are key design goals of MAPLE. We systematically compared the performance of four different MAPLE workflow designs on two HPC systems. Our experimental results on resource utilization, total time to completion, and overhead of the candidate designs suggest that the design of an optimal workflow depends largely on the HPC system architecture and the underlying service-unit accounting model.
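The dependence on the service-unit accounting model can be illustrated with a toy cost comparison; the charging formulas below are generic assumptions for illustration, not the studied systems' actual policies.

```python
# Toy illustration: the same workflow design costs different amounts of
# service units (SUs) under two assumed accounting models.

def su_node_hours(nodes: int, hours: float, rate: float = 1.0) -> float:
    """Whole-node charging: every allocated node is billed in full."""
    return nodes * hours * rate

def su_core_hours(cores: int, hours: float, rate: float = 1.0) -> float:
    """Core-based charging: only the requested cores are billed."""
    return cores * hours * rate

# Invented design: 4 nodes allocated but only 16 cores busy for 2.5 h.
# A design that idles most cores looks cheap under core-hour accounting
# yet expensive under node-hour accounting, so the optimal workflow
# layout shifts with the accounting model.
print(su_node_hours(nodes=4, hours=2.5), su_core_hours(cores=16, hours=2.5))
```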