Below is a list of facts that may be important when you compile, link, and run programs on the Katana Cluster:
- The Katana Cluster consists of the Katana login node and a set of compute nodes on which batch jobs are run. The login node also serves as a compute node.
- To make full use of all of the memory available on a node, you must be running a 64-bit executable; a 32-bit executable can accommodate only 2 GB of memory allocation. If you are using pre-built commercial or other application packages, they may not be built as 64-bit applications. Gaussian is one such example.
- Sixty-four-bit addressing is the system default on the Katana Cluster. To build 32-bit executables, use the “-m32” compiler option with the GNU compilers and “-tp k8-32” with the PGI compilers (example below). (More details …)
- Executables previously generated on other Linux machines may work on the Katana Cluster. However, we recommend recompiling all programs on Katana.
- Two sets of compilers, PGI (default) and GNU, are available. You can switch between them through the environment variable MPI_COMPILER (sketch below). (More details …)
- If your code mixes C with Fortran, it will most likely require additional language-related support libraries. For the Portland Group compilers, add either -pgf77libs or -pgf90libs, depending on the Fortran dialect (example below). (See pgcc)
- MPI is supported through both the “openmpi” (default) and “mpich” implementations. You can switch between them through the environment variable MPI_IMPLEMENTATION (sketch below). (More details …)
- MPI C and C++ programs need to include mpi.h where necessary, while MPI Fortran 77/90/95 programs need mpif.h. No additional header files or compiler switches are needed for C++ programs (example below).
- MPI-2 functionality, such as MPI_Put and MPI_Get, is supported only in the “openmpi” implementation.
- The mixed MPI-OpenMP parallel paradigm is also supported. In this mode, jobs must be submitted through the “multi-threaded MPI” parallel environment (sketch below).
- A timing comparison of an MPI code for a 2D Laplace solver on the SCV computer systems (Katana Cluster (original a## and b## nodes), IBM Bluegene, and the now-retired IBM pSeries) is provided to demonstrate their relative performance.
- A user can submit as many jobs as desired. However, no more than 64 processors belonging to a single user may be in the run state simultaneously.
- The maximum run time limit is generally 24 hours (-l h_rt=24:00:00). However, we now allow a limited number of jobs per user to run for up to 72 hours: a user can request up to 4 processors (for example, as 4 single-processor jobs or one 4-processor job) with a 72-hour run time limit, and we currently have 12 slots for this purpose among all users. If you do not specify a higher limit, the default is 2 hours (example below).
- Parallel processing with OpenMP is limited to 8 processors. When submitting a batch job, use “-pe omp N”, with N a number between 1 and 8, to ensure that all requested processors are on a single node (sketch below).
- The distributed-memory parallel mathematical library ScaLAPACK is available.
- Usually, there is no need to know the specific node names assigned to a batch job. However, if a job needs the node names, they are available at runtime through the environment variable $PE_HOSTFILE (sketch below). (More details … )
- Local scratch disk space is available on each node (the amount varies by node). It is NOT backed up, and files are kept for at most 10 days. (More details … )
- Debugging and profiling tools are available.
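
For the 32-bit and 64-bit builds mentioned above, a minimal sketch of the compiler options (prog.c is a hypothetical source file; 64-bit needs no extra flag because it is the default):

```
# 64-bit is the system default; no special option is needed.
gcc -o prog prog.c                # GNU compilers
pgcc -o prog prog.c               # PGI compilers

# Building a 32-bit executable instead (prog.c is a stand-in name):
gcc -m32 -o prog32 prog.c         # GNU compilers
pgcc -tp k8-32 -o prog32 prog.c   # PGI compilers
```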
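For switching compiler sets and MPI implementations, a sketch of the environment-variable approach. The variable names MPI_COMPILER and MPI_IMPLEMENTATION come from this page, but the values shown are assumed spellings, so consult the linked details pages:

```
# "gnu" and "mpich" below are assumed values; check the "More details"
# pages for the exact spellings accepted on Katana.
export MPI_COMPILER=gnu           # switch from the default PGI compilers
export MPI_IMPLEMENTATION=mpich   # switch from the default openmpi
```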
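For mixed C and Fortran code with the Portland Group compilers, a sketch of the extra link flag; main.c and fsub.f90 are hypothetical files:

```
# Compile a Fortran 90 routine and a C main program separately, then link.
# -pgf90libs pulls in the PGI Fortran 90 support libraries; use -pgf77libs
# instead when the Fortran source is FORTRAN 77.
pgf90 -c fsub.f90
pgcc  -c main.c
pgcc  -o mixed main.o fsub.o -pgf90libs
```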
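As an illustration of the MPI header requirement, a small MPI C program written and built from the shell; mpicc is the standard compile wrapper in both implementations:

```
cat > hello_mpi.c <<'EOF'
#include <mpi.h>      /* required header for MPI C (and C++) programs */
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("Hello from rank %d\n", rank);
    MPI_Finalize();
    return 0;
}
EOF
mpicc -o hello_mpi hello_mpi.c   # a Fortran program would include mpif.h instead
```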
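For the mixed MPI-OpenMP mode, a batch-script sketch. This page does not give the exact name of the “multi-threaded MPI” parallel environment, so the PE name below is a placeholder, and hybrid_solver is a hypothetical executable:

```
#!/bin/bash
#$ -pe multithreaded_mpi 8    # placeholder PE name; substitute the real one
#$ -l h_rt=24:00:00

export OMP_NUM_THREADS=4      # OpenMP threads per MPI task (example value)
mpirun -np 2 ./hybrid_solver  # hybrid_solver is a hypothetical program
```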
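For the run time limits, the h_rt resource is requested at submission time; myjob.sh is a hypothetical job script:

```
qsub myjob.sh                    # no limit given: the 2-hour default applies
qsub -l h_rt=24:00:00 myjob.sh   # standard jobs: up to 24 hours
qsub -l h_rt=72:00:00 myjob.sh   # long jobs: up to 72 hours, limited slots
```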
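For OpenMP jobs, a minimal batch-script sketch using the “-pe omp N” option; omp_prog is a hypothetical executable, and $NSLOTS is the standard Grid Engine variable holding the number of slots granted:

```
#!/bin/bash
#$ -pe omp 8                     # 1 <= N <= 8; all slots land on one node
#$ -l h_rt=24:00:00

export OMP_NUM_THREADS=$NSLOTS   # match the thread count to the granted slots
./omp_prog                       # omp_prog is a stand-in name
```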
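For reading the assigned node names, a sketch that can go inside a batch script; it assumes the usual Grid Engine host-file layout, in which the first field of each line is the host name (see the “More details” page for the exact format):

```
# Show the complete host file for this job.
cat $PE_HOSTFILE

# Keep only the node names (first column of each line).
awk '{print $1}' $PE_HOSTFILE
```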