Grace Hopper Overview
The Grace Hopper node (sgh001) on the Sol supercomputer is a unique piece of equipment: it pairs an ARM-based (aarch64) Grace CPU, rather than an x86_64 processor, with a Hopper GPU in a single NVIDIA GH200 superchip. This combination opens up new capabilities, such as workloads that draw on the node's large memory together with its modern GPU.
For more information on what makes the Grace Hopper node unique, see NVIDIA's whitepaper.
ARM processors do not run x86_64 software, so a special set of software tools is made available to leverage this hardware. Software must be compiled for aarch64 to run properly on this node.
An alternative to compiling software for ARM yourself is to use Apptainer containers built for ARM. On Sol, the following containers are known to work on this node. This list is not exhaustive and may change over time, but this directory will always house ARM-ready container images.
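Before compiling or launching anything, it can help to confirm which architecture the current shell is running on. The following is a minimal sketch using `uname -m`; the helper name `classify_arch` is hypothetical and is not part of Sol's tooling.

```shell
#!/bin/sh
# classify_arch: hypothetical helper that maps a machine string to a label.
classify_arch() {
    case "$1" in
        aarch64|arm64) echo "arm" ;;
        x86_64|amd64)  echo "x86_64" ;;
        *)             echo "unknown" ;;
    esac
}

# uname -m prints the machine hardware name of the current host.
arch="$(classify_arch "$(uname -m)")"
if [ "$arch" = "arm" ]; then
    echo "ARM host: aarch64 builds will run here"
else
    echo "Not an ARM host ($arch): aarch64 binaries will not run here"
fi
```

Running this on a login node and again inside an allocation on the Grace Hopper node should make the difference between the two environments visible.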
[software@sgh001:~]$ ls -1 /packages/aarch64/simg/
autodock_2020.06.sif*
chroma_2021.04.sif*
gromacs_2023.2.sif*
julia_v2.4.1.sif*
lammps_patch_15Jun2023.sif*
nvhpc_24.5-devel-cuda_multi-ubuntu22.04.sif*
pytorch_24.05-py3.sif*
quantum_espresso_qe-7.1.sif*
relion_3.1.3.sif*
tensorflow_24.05-tf2-py3-igpu.sif*
Requesting the Grace Hopper Node from the Job Scheduler
Use the following command to request an interactive allocation with one GPU on this node; the containers can then be run from within that allocation:
$ salloc -p arm -G 1
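The same request can also be made as a batch job. The sketch below writes a small Slurm batch script that reuses the `-p arm` and `-G 1` options from the `salloc` command; the file name `gh_job.sh` and the 10-minute time limit are assumptions chosen for illustration, not site defaults.

```shell
#!/bin/sh
# Write a minimal Slurm batch script; the file name gh_job.sh is hypothetical.
cat > gh_job.sh <<'EOF'
#!/bin/bash
# Same partition and GPU count as the interactive salloc request.
#SBATCH -p arm
#SBATCH -G 1
# Assumed 10-minute time limit; adjust to your workload.
#SBATCH -t 0-00:10

# Should print aarch64 when the job runs on the Grace Hopper node.
uname -m
EOF
echo "wrote gh_job.sh"
```

The generated script would then be submitted with `sbatch gh_job.sh`.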
Running a Container on the Grace Hopper Node
$ apptainer run /packages/aarch64/simg/pytorch_24.05-py3.sif # CPU only, OR
$ apptainer run --nv /packages/aarch64/simg/pytorch_24.05-py3.sif # when a GPU is requested
=============
== PyTorch ==
=============
NVIDIA Release 24.05 (build 91431256)
PyTorch Version 2.4.0a0+07cecf4
Apptainer> python
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
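The interactive check above can also be run non-interactively, which is convenient inside a batch job. This is a sketch only: the function name `check_gpu` and its return code of 2 are hypothetical, and the image path is taken from the directory listing earlier on this page.

```shell
#!/bin/sh
# Image path taken from the listing of /packages/aarch64/simg/.
IMAGE=/packages/aarch64/simg/pytorch_24.05-py3.sif

# check_gpu: hypothetical helper. Returns 2 when apptainer or the image
# is unavailable; otherwise runs the same torch check as the session above.
check_gpu() {
    image="$1"
    if ! command -v apptainer >/dev/null 2>&1 || [ ! -f "$image" ]; then
        echo "skipped: apptainer or $image not available here" >&2
        return 2
    fi
    # --nv exposes the host NVIDIA driver and GPU inside the container.
    apptainer exec --nv "$image" python -c 'import torch; print(torch.cuda.is_available())'
}

check_gpu "$IMAGE" || echo "GPU check did not run or failed"
```

Using `apptainer exec` instead of `apptainer run` executes a single command in the container rather than starting its default entry point.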