ENG-Grid Instructions

ENG-Grid is being deprecated

Contact enghelp@bu.edu for help migrating to other resources.

Quick Start

If all you want is an interactive shell on the grid to run single instances of interactive programs such as Matlab, run ssh -X eng-grid.bu.edu and, once logged in, run qlogin.

SSH Clients

On macOS and Linux systems you can use the included terminal and ssh. On Windows you can install:

X-Windows

If you intend to use graphics, you should be running an X-Window system. On Microsoft Windows consider:

On the Mac there is XQuartz. It’s old and slow, but it still works. Please tell us if you find alternatives.

Whichever X11 server you use, secure it, and use xauth and X11 forwarding. Don’t set the DISPLAY environment variable on the remote system. For MobaXterm, turn off X11 remote access. X11 forwarding will still work, without any popup warning, because the forwarded connections are tunneled through the ssh connection.
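One way to confirm that forwarding is active after logging in with ssh -X is to look at the DISPLAY value that the ssh server set for you. The sketch below assumes a default OpenSSH server configuration, where a forwarded display typically looks like localhost:10.0. The function name describe_display is hypothetical and not part of any grid tooling.

```shell
# describe_display VALUE
# Classifies a DISPLAY value (assumption: default OpenSSH settings, where
# a forwarded display is reported as localhost:N.M).
describe_display() {
    case "$1" in
        "")          echo "unset" ;;      # no X11 display available
        localhost:*) echo "forwarded" ;;  # tunneled through ssh
        *)           echo "other" ;;      # set by something other than ssh
    esac
}
```

On the grid, running describe_display "${DISPLAY:-}" right after login should report forwarded. A result of other suggests that DISPLAY was set by hand on the remote system, which the advice above says to avoid.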

Full instructions

Follow the instructions below for a quick rundown of how to run both interactive and batch jobs, and then see Grid Overview for a list of all of the pages detailing how to use various features and software on the Grid. You can also join our Google Group mailing list to stay updated on grid-related news.

  1. Log in to the grid from your Secure SHell (ssh) client:
    ssh -X eng-grid.bu.edu

    or if your local account name differs from your BU login, add your login:

    ssh -X myBUlogin@eng-grid.bu.edu

    You only need the -X option if you’re running a program that needs X-Windows (X11), e.g. Matlab or COMSOL.

    For security reasons, the Grid denies off-campus connections, so if you’re off-campus, you should connect via the VPN. Your session will be automatically terminated after 24 hours.

  2. Run interactive GUI applications: If you want an interactive shell to run a program with a GUI such as Matlab, just ssh -X eng-grid.bu.edu and run:
    $ qlogin

    and that will give you a login shell on one of the nodes in the grid. Then just start your program.

    Note that from off-campus, you should connect via the Virtual Private Network (VPN). Your session will be automatically terminated after 24 hours. If you are trying to run long jobs, see below.

  3. Batch Jobs: Note that the following instructions refer to bme.q, but depending on your department, you might use ece.q or other queues that may be available or restricted depending on your affiliation and project. Making sure you’re cd’d into your nokrb directory, run qsub with the "-cwd" (current working directory) option, like so:
    $ qsub -q bme.q -cwd -b y "your command line"
     or
    $ qsub -q bme.q -cwd yourscript.sh

    The "-b y" option is easy, but if you submit a long command string it will be clipped. We recommend using an .sh script instead. Take a look at the example script:

    /mnt/nokrb/sge/etc/whereami.sh

    To check the status of your jobs, run “qstat -f”. With an X Window server running, you can use “qmon &”. As a very quick example, try running:

    $ qsub -q bme.q -cwd -b y "whoami; pwd; hostname; date"

    You can see a graphical depiction of the Grid’s activity here.

    To abort jobs that are already running, you can use the qmon GUI or qdel. Run qdel ####, where #### is your job number as reported by qstat. To kill all of your running jobs, run qdel -u username.

    Read the man pages for qsub, qstat, and qdel for more info.

  4. Job Logs: After your job is done, you should see log files in your working directory: scriptname.e#### is your standard error and scriptname.o#### is your standard output, where #### is your job number. If any of your jobs die and you can’t figure out the problem from the “job control” section of qmon, run qstat -f -j jobnumber for more information.

    For more example submission jobs, take a look at /mnt/nokrb/sge/examples/jobs. These will give you a good idea of how to code your own jobs. Start with these examples rather than writing your own from scratch: there are some very specific shell-passing variables that need to be in there or else nothing will work!

    As mentioned above, there are more queues than just bme.q, which is a public queue open to everyone, not just BME members. Take a look in qmon or run qstat -g c. Certain queues are open only to certain sets of users. If you have a lab or departmental queue and find you cannot use it, ask enghelp@bu.edu to be added to the permissions for it.
  5. Memory allocation: By default your job will receive up to 2 GB of RAM. If you expect you could need more than this, you can use qsub -l s_vmem=4G to, for example, request 4 GB. (And if you know you’ll actually need less than 2, requesting a lower amount will help keep RAM free for jobs that do need it.) See the Memory Requirements page for full details.
  6. See Grid Software for a list of major software tools installed on the grid.
  7. Note, if your individual jobs are doing significant disk operations, you will get a considerable speedup by keeping that data in the /tmp directory. Please remember that any data you save there is on the local disk of that particular node and not shared with any other nodes at all, so you will need to move any data you need to keep to nokrb at the end of each individual run.
  8. We have a number of ready-to-run examples for different kinds of batch jobs hosted here: https://github.com/eng-it/grid-tests.
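The batch, memory, and /tmp notes above can be combined into a single job script. The following is a minimal sketch, not one of the official examples: the #$ lines embed the same qsub options used above (-q bme.q, -cwd, -l s_vmem=4G) so they do not have to be typed on the command line, and the function stages work in a node-local directory under /tmp before copying results back. The function name run_in_scratch and the job.XXXXXX directory pattern are hypothetical.

```shell
#!/bin/bash
# Hypothetical job script sketch; adjust the queue and memory to your needs.
#$ -q bme.q
#$ -cwd
#$ -l s_vmem=4G

# run_in_scratch RESULT_DIR COMMAND [ARGS...]
# Runs COMMAND inside a fresh directory under /tmp (local to the node),
# copies everything it produced into RESULT_DIR, then removes the scratch
# directory. Returns the exit status of COMMAND.
run_in_scratch() {
    local result_dir=$1
    shift
    local scratch status=0
    scratch=$(mktemp -d /tmp/job.XXXXXX) || return 1
    ( cd "$scratch" && "$@" ) || status=$?
    # Copy results back to shared storage even if the command failed,
    # so partial output is available for debugging.
    mkdir -p "$result_dir" && cp -R "$scratch"/. "$result_dir"/
    rm -rf "$scratch"
    return "$status"
}
```

A job script would end with a call such as run_in_scratch "$PWD/results" /full/path/to/your_program, where the program path is a placeholder. Because the command runs inside the scratch directory, the program and any input files should be given as absolute paths.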

Low versus High-Priority Jobs

Certain queues on the Grid are set up for subordination/pre-emption. We recommend that you send low-priority or long-running jobs to lowpriority.q:

$ qsub -q lowpriority.q

This will cause higher-priority jobs to be able to suspend your jobs while they run, and then unsuspend your jobs when they are done. The advantage of using a subordinate queue such as lowpriority.q is that you get far more resources available to you at all times, while not stomping on jobs that need to complete quickly, or the highest-priority jobs submitted by the labs who “bought-in” by paying for the machines in the grid.

Note that subordinate queues are not for qlogin use — if you did send a qlogin to a subordinate queue, your qlogin could get suspended at any moment and your shell would appear to freeze.

Administrators can configure this behavior on the command line with “qconf -mq queuename”, but not in the GUI. For example, the configuration for hyness.q (accessible only to the lab that paid for the me.q machines) down to me.q (normal priority on these machines) down to lowpriority.q (which runs at subordinate or preempted priority on nearly all machines in the grid) is set as follows:

# qconf -sq hyness.q|grep subordinate
subordinate_list      slots=32(me.q:0:sr)

# qconf -sq me.q|grep subordinate
subordinate_list      slots=32(lowpriority.q:0:sr)

This mimics the configuration shown in section 2c of the subordinate_list heading of the man page at http://gridscheduler.sourceforge.net/htmlman/htmlman5/queue_conf.html . The bme.q is set similarly, but with only one level of preemption and 7 slots per node. Settings for queuewise subordination (section 1 on that man page) are set on the ece.q to cause all lowpriority.q jobs on a given machine to get suspended if the machine fills up with ece.q jobs. Slotwise preemption is not necessary here since only a minimal number of slots are allocated to ece.machines.

Managing Files

If you would like to share your files with Windows, go to Start–>Run in Windows and type

\\ad\eng\users\y\o\yourloginname

and it will prompt you for a username and password. Use:

login: ad\yourloginname
pass: your normal Kerberos password

This directory is the same as your home directory on Linux.

(You could also SCP or SFTP to eng-grid or eng-grid2 to access the same filesystem. www.winscp.net is a good program to use.)

See MountingENGNAS for more detailed information and instructions for specific operating systems.

Tutorials

  • For a very quick way of getting yourself up to speed on UNIX/Linux, you should go through the exercises in Unix and Linux: Visual QuickStart Guide, which is available free online when you’re logged in from BU (or from outside BU using the VPN).
  • Once you’ve done that, or if you’re already very comfortable with UNIX/Linux but don’t have much shell experience, you should look at Learning the bash Shell.

You can make a shell script “submitter.sh” which you run locally. That script loops through and runs “qsub yourscript.sh” as many times as you want, passing different parameters to yourscript.sh each time. yourscript.sh then passes those parameters as variables to your compiled Matlab binary, so the parameters all get set automatically.
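A submitter.sh along those lines might look like the sketch below. It assumes one parameter per job and the bme.q queue used earlier. The function name submit_sweep and the QSUB override variable are hypothetical conveniences, added so the loop can be dry-run without submitting anything.

```shell
#!/bin/bash
# submitter.sh: hypothetical sketch of a parameter sweep.
# Set QSUB=echo to print the qsub arguments instead of submitting.
QSUB=${QSUB:-qsub}

# submit_sweep SCRIPT PARAM [PARAM...]
# Submits SCRIPT once per PARAM. qsub passes anything after the script
# name through to the script as its command-line arguments.
submit_sweep() {
    local script=$1
    shift
    local param
    for param in "$@"; do
        "$QSUB" -q bme.q -cwd "$script" "$param"
    done
}
```

For example, submit_sweep yourscript.sh 0.1 0.2 0.5 submits three jobs. Inside yourscript.sh the value arrives as "$1", which the script can then hand to the compiled Matlab binary.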