SCF User Information
Table of contents
- Overview
- General conditions of use
- Passwords and access
- News and announcements
- Getting information and help
- Allocations and accounting
- Hardware configuration
- File systems
- Usage policies and batch
- Software
Overview
The Boston University Scientific Computing Facilities (SCF) consist of a collection of high-performance computers, high-speed networks, and advanced visualization facilities. These facilities are managed by the Scientific Computing & Visualization (SCV) group of Information Services & Technology (IS&T) in collaboration with the Center for Computational Science (CCS).
The SCF currently includes an IBM BladeCenter Katana Cluster, an IBM Blue Gene system, an IBM pSeries 655, and our virtual reality/scientific visualization facilities. Your SCF login name and password automatically give you access to most of these facilities. SCV staff are available for consultation on the use of all of them.
General conditions of use
Your use of these machines is governed by Boston University’s Conditions of Use and Policy on Computing Ethics. By using your account on the SCF and other University computers, you are agreeing to the terms and conditions set forth therein, as well as the usage policies described below.
Passwords and access
When you first get an account, you will receive instructions on setting up your BU login (for external users only – for BU users, your login will be the same as on other BU systems) and/or your SCF Password (for all users). This SCF password is a standard Unix password, distinct from your BU Kerberos password. This password can be used to access all SCF systems that you have access to and to access our web materials. You may optionally also use your BU Kerberos password to log in to the Katana Cluster once you have an account, but it will not allow access to our other systems or web materials.
If you ever wish to change your SCF (non-Kerberos) password, you should log in to either katana.bu.edu or twister.bu.edu and run the command passwd. If you ever forget your SCF password, you should send email to scfacct@bu.edu.
News and announcements
We will periodically make important announcements regarding usage policies, software and hardware upgrades, downtime, etc. It is important that you read these messages on a regular basis, and we provide several ways for you to do so.
All messages are posted to the system message board and to the BU mailing list scfug-l. The system message board can be viewed using the program msgs. By default this command will be included in your .login startup file. If you modify this file, we suggest that you continue to include this command.
The users group mailing list, scfug-l, is used for general discussions regarding the Scientific Computing Facilities, as well as for distributing important announcements. To have the scfug-l postings sent to you by e-mail, send a one-line message to Majordomo, specifying your preferred e-mail address, as shown below:
subscribe scfug-l your_email_address
Messages sent to the scfug-l mailing list are also posted to the CoCo bulletin board on our website.
We will also regularly send individual messages to you regarding your account status and usage. These will be sent via e-mail to your SCF account. If you do not regularly read e-mail on this system, you should have your mail forwarded to a machine where you do. This may be done by creating a .forward file in your home directory containing the e-mail address to which your mail should be redirected.
Getting information and help
Visit our home page or click the links below to get started:
- Overview of SCV Computational Facilities
- SCV Online Help
- Katana Cluster Information
- IBM Blue Gene Information
- IBM p655 Information
- SCF Technical Summary
- SCF Frequently Asked Questions list
- SCV Software Packages
- MPI Documentation
- Message Passing Tutorial
If you are experiencing system problems, please send e-mail to “help” on the system on which you are experiencing the problem. If that is not possible, please contact us.
For more information or help in using or porting applications to our systems, please see our Scientific Programming Consulting page or contact our Scientific Programmer Kadin Tseng (kadin@bu.edu).
If you have questions regarding your computer account or resource allocations, please contact us.
Allocations and accounting
We account for all usage (batch and interactive) by all of our users on our large systems. It is the responsibility of the Principal Investigator to monitor his/her project’s usage and to request an appropriate allocation of processor time, expressed in “service units” (SUs). Information on accounts and allocations, as well as forms to request resources may be found on the Accounts & Project Maintenance pages.
Allocations
Although there are no monetary charges for use of our facilities, all projects are given a specific annual usage allocation, in terms of “service units” (SUs). This allocation must be renewed (and adjusted if appropriate) annually. The table below shows the SU charge on each of our systems for 1 hour of CPU usage. This charge is based on clock speed and other performance-related factors:
| Cluster Name | Host Names | Processor Type & Speed | SU charge for each CPU hour |
|---|---|---|---|
| IBM Blue Gene | levi.bu.edu¹, lee.bu.edu¹ | IBM Blue Gene (700 MHz PowerPC 440) | 0.25 |
| IBM pSeries | twister.bu.edu, scrabble.bu.edu, marbles.bu.edu, crayon.bu.edu, litebrite.bu.edu, hotwheels.bu.edu | IBM p655 (1.1 GHz Power4) | 0.85 |
| IBM pSeries | jacks.bu.edu, playdoh.bu.edu, slinky.bu.edu | IBM p655 (1.7 GHz Power4) | 1.31 |
| Katana Cluster | katana.bu.edu, katana-a02..a14, katana-b01..b08 | 2.6 GHz AMD Opteron 2218HE blades | 1.0 |
| Katana Cluster | katana-b09..b14 | 3.0 GHz Intel Xeon E5450 blades | 1.5 |
| Katana Cluster | katana-c01..c14 | 2.4 GHz AMD Opteron 2216HE blades | 0.9 |
| Katana Cluster | katana-d01..d08 | 2.93 GHz Intel Xeon X5570 blades (24GB Memory) | 1.9 |
| Katana Cluster | katana-d09¹, geo¹ | 2.93 GHz Intel Xeon X5570 blades (24GB Memory) | 1.9/0.0¹ |
| Katana Cluster | katana-d11..d12 | 2.2 GHz AMD Opteron 275 blades (4GB Memory) | 0.0 |
| Katana Cluster | katana-d13..d14 | 2.93 GHz Intel Xeon X5570 blades (96GB Memory) | 2.4 |
| Katana Cluster | katana-e01..e03¹ | 2.93 GHz Intel Xeon X5670 blades (48GB Memory) | 1.9/0.0¹ |
| Katana Cluster | katana-e04..e13¹ | 2.93 GHz Intel Xeon X5670 blades (96GB Memory) | 1.9/0.0¹ |
| Katana Cluster | katana-f01..f14, katana-g01..g14 | 2.4 GHz AMD Opteron 2216HE blades | 0.9 |
| Katana Cluster | katana-h01..h02 | 2.93 GHz Intel Xeon X5670 blades (48GB Memory) | 1.9 |
¹ These machines have limited access; not all SCF users can fully utilize these systems. For those users with special access to these systems, the SU charge is 0.0 for these systems only.
Note that by “CPU” we mean a single “core.” Also note that time is charged differently on the IBM Blue Gene than on the other systems. On the IBM Blue Gene, you are charged for the wall clock time during which you hold each processor (so using 32 processors for 1 hour each will cost 32 * 1 * 0.25 = 8 SUs). On all the other systems, you are charged for the CPU time actually used, so a disk-intensive job will be charged less than a CPU-intensive job over the same wall clock period.
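The two charging rules can be sketched in a few lines of Python. This is an illustration only, not the accounting system's actual code: the function names are hypothetical, the rates are copied from the table above, and the 50% CPU utilization in the second example is an invented figure.

```python
# SU rates per CPU-hour, copied from the table above.
BLUE_GENE_RATE = 0.25
P655_1_1_GHZ_RATE = 0.85


def blue_gene_charge(processors, wall_hours, rate=BLUE_GENE_RATE):
    """Blue Gene rule: every processor is charged for the full wall clock time."""
    return processors * wall_hours * rate


def cpu_time_charge(cpu_hours, rate):
    """Rule on the other systems: only CPU time actually consumed is charged."""
    return cpu_hours * rate


# The example from the text: 32 processors held for 1 hour each.
print(blue_gene_charge(32, 1))  # 8.0

# Hypothetical p655 job: 8 cores for 1 wall clock hour, with each core
# busy only half of the time because the job is waiting on disk.
print(cpu_time_charge(8 * 0.5, P655_1_1_GHZ_RATE))  # 3.4
```

The same hypothetical job charged under the wall clock rule would cost 8 * 1 * 0.85 = 6.8 SUs, which is why disk-intensive work is cheaper on the systems that charge by CPU time.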
Since we like to have a rough idea of the anticipated load on each machine, requests are made for the number of CPU-hours on specific machines, and these are converted to SUs. However, your SU allocation may be spent on any of the machines to which you have access. Your CPU-time usage will be reported to you in SUs.
As an example, let’s say you request 4000 CPU-hours on the Blue Gene. You will be awarded 1000 SUs (4000*0.25). You could use your 1000 SUs to run 4000 CPU-hours on the Blue Gene (1000/0.25), 667 CPU-hours on the 3.0 GHz Katana blades (1000/1.5), 1176 CPU-hours on the 1.1 GHz pSeries p655 nodes (1000/0.85), etc. Note that even though the Katana Cluster contains blades with several different charge rates, SUs for Katana requests are awarded at the rate of 1.0 SU per requested CPU-hour.
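The arithmetic in this example can be reproduced with a short script. It is a sketch for checking your own requests; the dictionary keys and function names are made up for illustration, and the rates come from the table above.

```python
# SU charge per CPU-hour for the three systems used in the example.
RATES = {
    "Blue Gene": 0.25,
    "Katana 3.0 GHz Xeon blades": 1.5,
    "pSeries p655 1.1 GHz": 0.85,
}


def awarded_sus(requested_cpu_hours, rate):
    """Convert a CPU-hour request on one machine into an SU award."""
    return requested_cpu_hours * rate


def cpu_hours_for(sus, rate):
    """Convert an SU balance into CPU-hours on a machine with the given rate."""
    return sus / rate


sus = awarded_sus(4000, RATES["Blue Gene"])
print(sus)  # 1000.0

for name, rate in RATES.items():
    print(f"{name}: {cpu_hours_for(sus, rate):.0f} CPU-hours")
# Blue Gene: 4000 CPU-hours
# Katana 3.0 GHz Xeon blades: 667 CPU-hours
# pSeries p655 1.1 GHz: 1176 CPU-hours
```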
Projects which exceed their allocation will be prohibited from running additional jobs. It is the responsibility of the project’s principal investigator to monitor the project’s usage and to request an appropriate allocation of time/SUs. Information on accounts and allocations may also be found on the Accounts & Project Maintenance pages.
Reporting
Each month we send principal investigators a summary of usage and remaining allocations for their projects; this report also gives /project file systems allocation and usage information for those projects with allocations on any of the /project file systems. Individual researchers are sent a summary of their own usage for all the projects with which they are associated. Individuals may also review the details of their recent usage, /project file systems disk usage, and monthly summary information using the password-protected web pages which may be found under the Accounts & Project Maintenance pages.
In addition to e-mailed reports and individual usage web pages, we have developed a utility called “acctool” to help you keep track of your CPU usage. Type “acctool -help” on any of the machines to get more information.
Project accounting
All usage accounting is based on projects. For most researchers, who belong to only one project, the fact that the accounting is project-based will be inconsequential. However, researchers who are associated with multiple projects must pay special attention to ensure that their usage is attributed to the correct project. The procedures for doing this are described below.
Each account on the system has been assigned a default project. This is the project which will be charged if you take no further action. Projects are implemented as UNIX groups on all systems. The command “groups” shows all of your projects; the first one listed is your default project. To change your default project, go to our Resource Monitoring page and click the link which begins “Individuals can see” (you will need your BU login ID and SCF [non-Kerberos] password). Then complete and submit the appropriate web form; your default project will be changed the next time the system configuration files are updated, generally overnight.
To change your current project immediately but temporarily, use the “newgrp” command on any of our systems. This command starts a new UNIX shell associated with the project you specify, and all interactive commands issued from this shell are accounted to that project. Batch jobs will generally be accounted to your default project unless you use a batch-system-specific method to override this behavior; see the documentation for the individual batch systems for details.
Hardware configuration
In December 2007 we made available the Katana Cluster, which has since been significantly expanded a number of times. The Katana Cluster is made up of machines in a number of different configurations; please consult the cluster web page for details. The Katana Cluster runs the BULinux 5.0 operating system. To log in, use SSH to connect to katana.bu.edu. Those with access to it may also log in to geo.bu.edu.
The Boston University Blue Gene is a single-rack system containing 1024 compute nodes. Each compute node contains a dual-core 32-bit 700 MHz PowerPC 440 processor with 512MB of main memory. Our Blue Gene has a peak performance of 5.7 Teraflops. The login machines are levi.bu.edu and lee.bu.edu. New SCF users are not automatically given access to the Blue Gene. If you wish to use it, you should go here and follow the instructions.
The IBM pSeries 655 is a 72-processor system composed of nine nodes, named Twister, Scrabble, Marbles, Crayon, Litebrite, Hotwheels, Jacks, Playdoh, and Slinky. Users with accounts on the IBM pSeries systems must use SSH to log in to twister.bu.edu.
Please see our Technical Summary page for more information on the configurations of all the machines in the SCF.
File systems
You have one home directory on the IBM p655 systems and a separate shared one for the Katana Cluster and the IBM Blue Gene. All home directories are backed up nightly.
If you accidentally remove a file, you can request that it be restored. Please specify exactly what files you deleted, what machine and file system those files were on, and at what time you deleted them.
The /scratch file systems are available for people who need a large amount of storage for a short period of time. Files are automatically purged after 10 days. If there is a critical shortage of scratch space, it may be necessary to purge files which are less than 10 days old. Files which have been “touched” but not modified will be treated as old and removed immediately. The scratch partitions are NOT BACKED UP.
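If you want to see which of your scratch files are at risk before a purge, a small script can list them. The sketch below assumes that file age is judged by modification time, which this page does not state, and the function name is hypothetical. It only reports files and deletes nothing.

```python
import os
import time

PURGE_AGE_DAYS = 10  # purge age stated in the text above


def files_older_than(root, days=PURGE_AGE_DAYS, now=None):
    """Return sorted paths under root last modified more than `days` days ago."""
    now = time.time() if now is None else now
    cutoff = now - days * 24 * 60 * 60
    old = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_mtime < cutoff:
                    old.append(path)
            except OSError:
                continue  # file disappeared or cannot be read; skip it
    return sorted(old)


if __name__ == "__main__":
    # Hypothetical location: replace with your own directory under /scratch.
    for path in files_older_than("/scratch"):
        print(path)
```

Because scratch space is not backed up and may be purged early when space is short, treat this list as a lower bound and copy anything you need to keep elsewhere.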
If you are creating files in /scratch using the tar utility, please see our Frequently Asked Questions page for additional information.
Each machine has its own /scratch partition, but these partitions can be accessed from all the machines in the same cluster by using the full path. On the pSeries machines, those paths are of the form: /hostname/scratch; for example /frisbee/scratch.
Each machine also has its own /tmp and /var/tmp file systems. These file systems are used to store temporary files created by system programs such as compilers and editors. Users should never store files in these directories. They are not backed up. Old files are removed nightly and whenever the system administrators feel that it is necessary.
Principal investigators may also request more permanent disk space on the /project and/or /projectnb file systems for their projects. This disk space is similar to /scratch, but is allocated specifically to individual projects and is not automatically purged. The difference between the two file systems is that /project is backed up nightly while /projectnb is not backed up at all. More information can be found on our Project Disk Space page.
We also encourage users with large amounts of data that does not need to be online at all times to use our mass storage facility to archive their data. This system has a very large amount of space available.
Usage policies and batch
Certain machines in each cluster have been designated for a particular set of functions. The following machines are available for interactive work: katana.bu.edu (and geo.bu.edu for a limited set of users) on the Katana Cluster, levi.bu.edu and lee.bu.edu on the IBM Blue Gene, and twister.bu.edu on the IBM pSeries systems. General interactive login sessions are allowed only on these machines.
Katana Cluster Batch System:
The batch system on the Katana Cluster is the Sun Grid Engine.
Jobs on the Katana Cluster are limited to a maximum of 64 processors and generally have a wall clock time limit of 24 hours. However, we now allow a limited number of jobs per user to run for up to 72 hours. A user can request up to 4 processors (for example, as four single-processor jobs or one 4-processor job) with a run time limit of 72 hours; we currently have 12 slots for this purpose, shared among all users. If you do not specify a limit, the default run time limit is 2 hours.
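As a quick sanity check before submitting, the limits above can be encoded in a small helper. This is a hypothetical sketch, not part of the batch system: it checks a single request against the stated limits and cannot know how many of your other jobs are running or whether any of the 12 shared long-job slots are free.

```python
MAX_PROCESSORS = 64          # largest job allowed
DEFAULT_LIMIT_HOURS = 2      # applied when no limit is requested
STANDARD_LIMIT_HOURS = 24    # usual wall clock limit
EXTENDED_LIMIT_HOURS = 72    # limit for the small pool of long jobs
EXTENDED_MAX_PROCESSORS = 4  # per-user processor cap for long jobs


def katana_request_ok(processors, wall_hours=DEFAULT_LIMIT_HOURS):
    """Return True if a single job request fits the limits described above."""
    if not 1 <= processors <= MAX_PROCESSORS:
        return False
    if wall_hours <= 0:
        return False
    if wall_hours <= STANDARD_LIMIT_HOURS:
        return True
    # Beyond 24 hours only small jobs qualify, up to 72 hours.
    return (processors <= EXTENDED_MAX_PROCESSORS
            and wall_hours <= EXTENDED_LIMIT_HOURS)


print(katana_request_ok(64, 24))  # True
print(katana_request_ok(4, 72))   # True
print(katana_request_ok(8, 48))   # False
```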
Blue Gene Batch System:
The batch system used on the Blue Gene is IBM’s LoadLeveler. The current limitation is that all jobs must use a partition of exactly 32, 128, 512 or 1024 (the entire machine) nodes and no job may run for more than 5 hours of wall clock time. 1024-node jobs are only allowed to run in off hours.
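Because only four partition sizes are allowed, a job that needs some other number of nodes has to round up to the next permitted size. The sketch below illustrates that rule and the 5-hour limit; the function names are hypothetical and it does not model the off-hours restriction on 1024-node jobs.

```python
ALLOWED_PARTITIONS = (32, 128, 512, 1024)  # node counts listed above
MAX_WALL_HOURS = 5                         # wall clock limit listed above


def smallest_partition(nodes_needed):
    """Return the smallest allowed partition that holds nodes_needed nodes."""
    if nodes_needed < 1:
        raise ValueError("at least one node is required")
    for size in ALLOWED_PARTITIONS:
        if nodes_needed <= size:
            return size
    raise ValueError("the machine has only 1024 nodes")


def wall_time_ok(hours):
    """Return True if the requested wall clock time is within the limit."""
    return 0 < hours <= MAX_WALL_HOURS


print(smallest_partition(100))  # 128
print(wall_time_ok(6))          # False
```

Since Blue Gene time is charged by wall clock time, rounding up to a larger partition than a job can use increases its SU cost accordingly.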
IBM pSeries Batch System:
The batch system on the IBM pSeries machines is the Load Sharing Facility (LSF) software. There are a number of different batch queues for various types of jobs. These include “short” and “long” queues for jobs of different running times and separate queues for 1-, 4-, and 8-processor jobs. In general, the short queues will run with a higher priority than the long queues. It is very important that you submit your job to the appropriate batch queue and you should always specify a queue name when submitting a job, e.g., bsub -q p4-short progname.
Please look at our Technical Summary page for more information about the queue structure.
All long running jobs must be submitted through the batch system. A system process monitors the processor consumption of all running jobs and will automatically terminate any job which is not running under the batch system and uses more than 10 minutes of processor time.
LSF provides an X Windows interface to the batch system, started with the command “xlsbatch”. You may also use traditional UNIX-style commands. To submit a job, use the command “bsub”. To see the jobs that are queued, use “bjobs”. To see the queue parameters, use the command “bqueues”. You may remove or kill a job with the command “bkill”. For more information on the batch system, see LSF Basics.
Software
Information regarding the software available on the systems can be found on our Software Packages page.
Note that for some software packages, you may need to add specific directories to your execution path or correctly set particular environment variables in order to use them. This is usually done by modifying your .login or .cshrc file. Please refer to the documentation on the web page referenced above for the specific details.
