Use VTune APS (Application Performance Snapshot) for a quick performance evaluation, or VTune Profiler for a detailed hotspot, memory, or threading analysis.
First load the environment module:
module add vtune/XXXX
Intro:
www.intel.com/content/www/us/en/docs/vtune-profiler/get-started-guide/2023/linux-os.html
Manuals:
Intel_APS.pdf
VTune_features_for_HPC.pdf
vtune -help
Run VTune via command line interface
Run your application with the APS or VTune wrapper as shown below. Command line reference:
www.intel.com/content/www/us/en/docs/vtune-profiler/user-guide/2023-0/command-line-interface.html
mpirun -np 4 aps -collect hotspots advanced-hotspots ./path-to_your/app.exe args_of_your_app
# after completion, explore the results:
aps-report aps_result_*
mpirun -np 4 vtune -collect hotspots -result-dir vtune_hotspot ./path-to_your/app.exe args_of_your_app
# after completion, explore the results:
vtune -report summary -r vtune_*
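The collect-then-report pattern above can be kept in a small script. The following is only a sketch: DRY_RUN and run() are hypothetical helpers introduced here (they are not part of VTune), and by default the script just prints the command lines so it can be checked before it is used inside a job.

```shell
#!/bin/bash
# Sketch: hotspot collection with 4 MPI ranks, followed by a summary report.
# DRY_RUN and run() are hypothetical helpers, not part of VTune.
DRY_RUN="${DRY_RUN:-1}"        # 1 = print the commands, 0 = execute them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"                  # show the command line only
  else
    "$@"                       # run the command with its arguments
  fi
}

RESULT_DIR="vtune_hotspot"     # result directory name used in the text
APP="./path-to_your/app.exe"   # placeholder path from the text

# Step 1: collect hotspots
run mpirun -np 4 vtune -collect hotspots -result-dir "$RESULT_DIR" "$APP" args_of_your_app
# Step 2: summary report (glob as in the text)
run vtune -report summary -r "$RESULT_DIR"*
```

Setting DRY_RUN=0 executes the two steps instead of printing them; this assumes the vtune module is loaded and the script runs where mpirun is available.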
Run VTune-GUI (not recommended)
Log in with X11 forwarding (ssh -X) and then start:
vtune-gui
Run VTune-GUI remotely on your local browser (recommended)
Log in to the supercomputer with local port forwarding and start your VTune server on an exclusive compute node (1h job):
ssh -L 127.0.0.1:55055:127.0.0.1:55055 blogin.hlrn.de
salloc -p standard96:test -t 01:00:00
ssh -L 127.0.0.1:55055:127.0.0.1:55055 $SLURM_NODELIST
module add intel/19.0.5 impi/2019.9 vtune/2022
vtune-backend --web-port=55055 --enable-server-profiling &
Open 127.0.0.1:55055 in your browser (allow the security exception; on first use, set an initial password).
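The port number appears in both ssh calls, in the vtune-backend call, and in the browser address, and all of them have to agree. If 55055 happens to be taken, one way to keep them consistent is to derive the commands from a single variable. A minimal sketch; forward_spec() is a hypothetical helper, and the script only prints the commands:

```shell
#!/bin/bash
# Sketch: build the port-dependent commands from a single port number.
# forward_spec() is a hypothetical helper, not part of ssh or VTune.
forward_spec() {
  echo "127.0.0.1:$1:127.0.0.1:$1"   # local address:port : remote address:port
}

PORT=55055                            # port used in the text

echo "ssh -L $(forward_spec "$PORT") blogin.hlrn.de"
echo "ssh -L $(forward_spec "$PORT") \$SLURM_NODELIST"
echo "vtune-backend --web-port=${PORT} --enable-server-profiling &"
echo "browser: 127.0.0.1:${PORT}"
```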
In 1st "Welcome" VTune tab (run MPI parallel Performance Snapshot):
Click: Configure Analysis
-> Set application: /path-to-your-application/program.exe
-> Check: Use app. dir. as work dir.
-> In case of MPI parallelism, expand "Advanced": keep defaults but paste the following wrapper script and check "Trace MPI":
#!/bin/bash
echo "Target process PID: ${VTUNE_TARGET_PID}"
# Run VTune collector (here with 4 MPI ranks)
mpirun -np 4 "$@"
Under HOW run: Performance Snapshot.
(After completion/result finalization a 2nd result tab opens automatically.)
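The wrapper script from the "Advanced" step passes whatever arguments it receives on to mpirun via "$@". To see that forwarding without starting VTune, the same logic can be tried with a stand-in launcher. This is a sketch only: LAUNCHER and wrapper() are hypothetical names introduced for the illustration, and the stand-in merely prints the mpirun command line.

```shell
#!/bin/bash
# Sketch: the wrapper logic from the text, with a stand-in launcher.
# LAUNCHER and wrapper() are hypothetical names for this illustration.
LAUNCHER="echo mpirun"          # set to "mpirun" for a real run

wrapper() {
  # VTUNE_TARGET_PID is only set when the script is started by VTune
  echo "Target process PID: ${VTUNE_TARGET_PID:-unset}"
  # "$@" forwards every argument unchanged to the launcher
  $LAUNCHER -np 4 "$@"
}

wrapper ./program.exe input.dat
```

With the stand-in launcher, the last line of output is the command that would be started: mpirun -np 4 ./program.exe input.dat.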
In 2nd "r0..." VTune tab (explore Performance Snapshot results):
-> Here you find several analysis results e.g. the HPC Perf. Characterization.
-> Under Performance Snapshot - depending on the snapshot outcome - VTune suggests (see % in figure below) more detailed follow-up analysis types:
--> For example select/run a Hotspot analysis:
In 3rd "r0..." VTune tab (Hotspot analysis):
-> Expand sub-tab Top-down Tree
--> In Function Stack expand "_start" function and expand further down to "main" function (first with entry in source file column)
--> In source file column double click on "filename.c" of "main" function
-> In new sub-tab "filename.c" scroll down to line with maximal CPU Time: Total to find hotspot in main function
To quit the profiling session press "Exit" in the VTune "Menu" (the "three horizontal bars" symbol in the upper left). Then close the browser page. Exit your compute node via CTRL+D and kill your interactive job:
squeue -l -u $USER
scancel your-job-id
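If several jobs are listed, the job ID can be picked by job name instead of being read off by eye. A minimal sketch, assuming squeue is called with -h (no header) and -o "%i %j" (job ID and job name); job_ids_named() is a hypothetical helper, and the sample lines and the job name "interactive" are made up for illustration:

```shell
#!/bin/bash
# Sketch: pick job IDs by job name from "JOBID NAME" lines on stdin.
# job_ids_named() is a hypothetical helper, not a Slurm command.
job_ids_named() {
  awk -v name="$1" '$2 == name { print $1 }'
}

# Made-up sample lines standing in for: squeue -h -u "$USER" -o "%i %j"
printf '1000001 interactive\n1000002 batch_run\n' | job_ids_named interactive
```

On the system itself the printf line would be replaced by the squeue call from the comment; check the printed IDs before passing one to scancel.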