ChaNGa

Building

This machine is really nuts: 192 cores across 2 sockets per node.
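
Before picking pemap/commap values it’s worth double-checking the core-to-socket numbering (assuming the usual tools are installed on the compute nodes):

lscpu | grep -iE 'socket|core|numa'
numactl --hardware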

Load this module set: openmpi/4.1.5rc2/hpcx

Build with: ./buildold ChaNGa verbs-linux-x86_64 smp -j8 --with-production
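
For future reference, the whole build goes roughly like this (a sketch, assuming the usual side-by-side charm/ and changa/ checkouts so ChaNGa’s configure can find the Charm++ build):

module load openmpi/4.1.5rc2/hpcx
cd charm
./buildold ChaNGa verbs-linux-x86_64 smp -j8 --with-production
cd ../changa
./configure
make -j8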

./charmrun.smp +p 190 ++mpiexec ./ChaNGa.smp ++ppn 95 +setcpuaffinity +commap 0,96 +pemap 1-95,97-191 agora.param

Works on one node.
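
My reading of the flags in that launch line (assuming cores 0-95 are socket 0 and 96-191 are socket 1, which is what the commap/pemap split implies):

  • +p 190: 190 worker PEs in total, i.e. 192 cores minus the 2 communication threads
  • ++ppn 95: 95 worker threads per SMP process, so 2 processes on the node
  • ++mpiexec: have charmrun launch the binary through mpiexec from the loaded OpenMPI
  • +setcpuaffinity: pin every thread to a specific core
  • +commap 0,96: put each process’s communication thread on the first core of its socket
  • +pemap 1-95,97-191: spread the worker threads over the remaining cores of each socket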

Running 3 steps with agora low:

  • 192 cores, SMP (2 processes, full commap): 167 seconds!
  • 192 cores, SMP (2 processes, no commap): > 600 seconds!
  • 96 cores, SMP (1 process): 320 seconds
  • 192 cores without cpuaffinity/commap: doesn’t run!
  • 192 cores without SMP (192 processes): absurdly slow

Fastest choice for agora:

./charmrun.smp +p 95 ++mpiexec ./ChaNGa.smp ++ppn 95 +setcpuaffinity +commap 0 +pemap 1-95 agora.param

  • Weirdly this requires 192 MPI tasks???
    • It seems to still be just as fast if I use 48 SMP tasks as well??
    • 24 is slower by ~50%.
  • I can’t get it to be fast without 192 MPI tasks!
    • It’s slower if I spread the pemap around, but only by ~40%
    • It’s WAY slower without commap or pemap (10x slower)

What if I just try compiling with the multicore options? Hoo boy that’s fast and easy…
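
For the record, the multicore route is roughly this (a sketch; the multicore build is shared-memory only, so it’s single-node and runs without charmrun):

cd charm
./buildold ChaNGa multicore-linux-x86_64 -j8 --with-production
cd ../changa
./configure
make -j8
./ChaNGa +p 192 agora.param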

pkdgrav3

Compiling

  • Load these modules:
    • python/3.12.9/gcc.8.5.0
    • fftw/3.3.10/gcc-8.5.0/openmpi-4.1.6
    • boost/1.84.0/gcc.8.5.0
    • gsl/2.7.1/gcc-8.5.0
    • openmpi/4.1.6/gcc.8.5.0/mt
    • hdf5/1.14.6/gcc-8.5.0/openmpi-4.1.6
  • FFTW needs to be explicitly pointed at: cmake -DFFTW_ROOT=/opt/ohpc/pub/apps/uofm/fftw/3.3.10-gcc-mpi/ -S . -B build
  • Needs to be launched with mpirun; srun doesn’t work, for reasons (see the sketch after this list)
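
Putting it together, the configure/build/run cycle looks roughly like this (a sketch; run.par is a placeholder parameter file and I’m assuming the binary lands at build/pkdgrav3):

module load python/3.12.9/gcc.8.5.0 fftw/3.3.10/gcc-8.5.0/openmpi-4.1.6 boost/1.84.0/gcc.8.5.0 gsl/2.7.1/gcc-8.5.0 openmpi/4.1.6/gcc.8.5.0/mt hdf5/1.14.6/gcc-8.5.0/openmpi-4.1.6
cmake -DFFTW_ROOT=/opt/ohpc/pub/apps/uofm/fftw/3.3.10-gcc-mpi/ -S . -B build
cmake --build build -j 8
mpirun -np 8 ./build/pkdgrav3 run.par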

Performance

  • Using 4 or 8 MPI tasks per node gives good performance (example below).
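
For example, 2 nodes at 8 tasks per node would look something like this (a sketch; --map-by ppr:N:node is OpenMPI’s way of fixing the ranks-per-node count, and run.par is again a placeholder):

mpirun -np 16 --map-by ppr:8:node ./build/pkdgrav3 run.par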