How to Use the Portland Group pgprof Profiler
On the Katana Cluster, you can use the Portland Group’s pgprof multi-process and multi-threaded profiler to profile MPI and OpenMP FORTRAN or C codes. For more details, please consult the PGI Tools Guide (PDF).
Please note that only two licenses of pgprof are available on the system. If both are in use, you will have to wait for one to free up. When you are done using pgprof, please exit from it to avoid blocking others from accessing it.
Profiling MPI codes with pgprof
Profiling an MPI code requires four steps:
- Step 1. Select MPI_IMPLEMENTATION to be mpich.
katana:~ % setenv MPI_IMPLEMENTATION mpich
katana:~ % printenv MPI_IMPLEMENTATION
mpich
The latter, optional, command confirms that mpich is the active MPI_IMPLEMENTATION.
- Step 2. Compile code with -Mprof=time,mpi,func
katana:~ % mpif77 -o example example.f -Mprof=mpi,func
- Step 3. Run code to generate timing data.
katana:~ % mpirun -np 4 example
- Running the code interactively limits the number of tasks to 4.
- Four output files will be generated for this run: pgprof.out, pgprof.out1, pgprof.out2, and pgprof.out3.
- You will need to run the job in batch if more than 4 slots are needed or if the job requires more than a few minutes to run. Don’t forget to reset MPI_IMPLEMENTATION to mpich in your batch script:
. . .
export MPI_IMPLEMENTATION=mpich
. . .
- Step 4. Run the profiler to display performance data through the GUI.
You will need an X-server running because pgprof will spawn a GUI window as shown below.
katana:~ % pgprof -exe example1
In the above GUI window, the elapsed time and slots used (4 processes) are displayed immediately below the task bar at the top. The left sub-window shows the starting locations for “main” and the functions “integral” and “fct” within the single source file example1_2.f. The function “integral” is called by main exactly once per process and calls “fct” 500 times. Since 4 slots are used, the work load for “fct” is 25%, which is also shown graphically as a red horizontal bar. All this information is displayed in the top right sub-window. The bottom window shows the work load of “example1” (shaded in blue). Clicking “fct” makes the bottom window show the work load of “fct” instead.
- If pgprof -o example1 were used, the line near the top of the GUI window that reports the wall clock time would print “example1_2” instead of “a.out” as the executable name.
pgprof may also be invoked in text mode:
katana:F77/tmp % pgprof -exe example1 -text
pgprof 7.1-1
Copyright 1989-2000, The Portland Group, Inc. All Rights Reserved.
Copyright 2000-2007, STMicroelectronics, Inc. All Rights Reserved.
Datafile : pgprof.out
Processes : 4
Threads : 4
pgprof> print
Profile output - Tue Nov 20 09:06:50 EST 2007
Program : a.out
Datafile : pgprof.out
Process : 0
Total Time for Process : 0.100336 secs
Sort by max time
Select all
                  Routine      Source       Line
  Calls  Time(%)  Name         File         No.
      1       91  example1_2   example1.f      1
    500        5  fct          example1.f     88
      1        4  integral     example1.f     73
pgprof>
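Step 3 notes that a 4-process run leaves one data file per process (pgprof.out, pgprof.out1, pgprof.out2, pgprof.out3). Before launching the profiler, it can be useful to confirm that all of them were written. The following is a minimal sketch, not part of pgprof: the function names expected_profile_files and check_profile_files are hypothetical, and it assumes the naming pattern shown for 4 processes carries over to other process counts.

```shell
#!/bin/sh
# Hypothetical helper: list the data files expected from an N-process run.
# Assumption: the pattern seen for 4 processes (pgprof.out, pgprof.out1, ...)
# extends to any process count.
expected_profile_files() {
    nprocs=$1
    files="pgprof.out"          # process 0 writes the unnumbered file
    rank=1
    while [ "$rank" -lt "$nprocs" ]; do
        files="$files pgprof.out$rank"
        rank=$((rank + 1))
    done
    echo "$files"
}

# Hypothetical helper: report each expected file that is missing from the
# current directory; returns non-zero if any file is absent.
check_profile_files() {
    missing=0
    for f in $(expected_profile_files "$1"); do
        if [ ! -f "$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return $missing
}
```

For the run above, check_profile_files 4 prints nothing and returns success when all four files are present in the current directory.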
Profiling OpenMP codes with pgprof
To profile an OpenMP code with pgprof, follow the steps below:
- Step 1. Compile the program using the appropriate PGI compiler with -Mprof=func to turn on profiling at the function level.
katana:~ % pgf77 -o example example.f -Mprof=time,func -O3
- Step 2. Run the compiled code (either interactively or in batch) to generate a single profile data file, pgprof.out.
- Step 3. Run the pgprof profiler to examine the profiling data.
By default, a GUI window will be launched which reports the profiling data extracted from the file pgprof.out. You will need X-server software for this to work. At BU, if you access Katana from a Windows-based PC, you can download X-Win32 for free.
- GUI Method
katana:~ % pgprof -exe example1
In the above GUI window, the elapsed time is displayed immediately below the task bar at the top. At the bottom right corner, there is a drop-down menu; for OpenMP applications, select “Threads” as in the picture. The top left sub-window lists “main” and the functions “integral” and “fct,” all belonging to the same source file, example1.f. The bottom window shows the activity of every thread for the active function, which is highlighted in blue in the top left window. In this case, “main” runs only on the master thread, so only thread 0 shows activity. The function “integral” is called by main exactly once per thread (there are 4 threads). In turn, “integral” calls “fct” 500 times per thread, for a total of 2000 calls (see the top right window).
Upon clicking “fct”, the bottom window shows the corresponding activities:
- Text Method
katana:OpenMP/F77omp % pgprof -text -exe example1
pgprof 7.1-1
Copyright 1989-2000, The Portland Group, Inc. All Rights Reserved
Copyright 2000-2007, STMicroelectronics, Inc. All Rights Reserved
Datafile : pgprof.out
Processes : 1
Threads : 4
pgprof> print
Profile output - Sun Nov 18 09:47:21 EST 2007
Program : a.out
Datafile : pgprof.out
Process : 0
Total Time for Process : 0.044703 secs
Sort by max time
Select all
                  Routine      Source       Line
  Calls  Time(%)  Name         File         No.
      1       51  example1     example1.f      1
  2,000       29  fct          example1.f     59
      4       20  integral     example1.f     44
pgprof> quit
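If the text-mode listing has been saved to a file, the routine rows can be pulled out for scripting, for instance to compare call counts between runs. The sketch below is an illustration only: routine_calls is a hypothetical name, and the filter assumes the five-column row layout (Calls, Time(%), Name, File, Line No.) of the sample listing, which may differ in other pgprof versions.

```shell
#!/bin/sh
# Hypothetical helper: read a saved text-mode listing on stdin and print
# "name calls time%" for each routine row.
# Assumption: routine rows have exactly five whitespace-separated fields,
# with the call count first, as in the sample listing.
routine_calls() {
    awk '
        NF == 5 {
            calls = $1
            gsub(/,/, "", calls)        # "2,000" -> "2000"
            if (calls ~ /^[0-9]+$/)     # skips the "Calls Time(%) ..." header
                print $3, calls, $2
        }'
}
```

With the listing saved as, say, profile.txt (a hypothetical file name), routine_calls < profile.txt prints one line per routine. Dividing the 2000 calls reported for “fct” by the 4 threads gives the 500 calls per thread mentioned above.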