
Information for Current Faculty and Students

This page contains information to help faculty and students get started with High Performance Computing (HPC). To begin, faculty members should create a ticket at the University of Memphis helpdesk to request an account (Self Service->Research and HPC Software->HPC Account). You can also reach out to us at hpcadmins@memphis.edu if you have any questions.

Getting Started

The HPC system uses your normal University of Memphis credentials to authenticate, so you log in with your UofM username (UUID) and password, the same credentials you use for University email, OneDrive access, etc. However, an account still needs to be created for you on the HPC system before you can log in. To do that, please make a ticket to request an account (Self Service->Research and HPC Software->HPC Account) and include your University username (UUID), full name, and department, and, if you are a student, your advisor's name. To receive an HPC account, you must be an enrolled student, a faculty or staff member, or conducting research affiliated with a department within the University of Memphis. In the latter case, we must have verification from the department that will sponsor your time on our resources, and we can discuss this on a case-by-case basis.

Logging In

Once you have acquired an HPC account, you may open a secure shell (SSH) connection to the login node by opening a Terminal window (Linux/macOS) and running the following command:

ssh [username]@bigblue.memphis.edu

Windows users will most likely need to install a terminal emulation application such as PuTTY or MobaXterm. These SSH clients offer a graphical user interface where you can enter the necessary login information, such as your username, password, and host name. An SFTP (Secure File Transfer Protocol) client such as WinSCP or Tunnelier will also be necessary for uploading files to, and downloading them from, your share on the cluster.
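On Linux and macOS, the scp command (or rsync) can transfer files from the command line instead. A minimal sketch, where results.csv and myJob are only placeholder names:

# Copy a local file to your home directory on the cluster
scp results.csv [username]@bigblue.memphis.edu:/home/[username]/

# Copy a folder from the cluster back to the current local directory
scp -r [username]@bigblue.memphis.edu:/home/[username]/myJob .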

If you wish to access the HPC system from outside the University's LAN, you will need to use the University of Memphis VPN.

Submitting Jobs

ALL jobs must be submitted to the SLURM job scheduler, either as a batch job (through a submission script) or interactively. A minimal example batch script is sketched below.
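As a minimal sketch of a batch script, where the job name, resource requests, and the program being run (myProgram) are only placeholders, and site-specific options such as partitions are covered in the Batch Scripts templates below:

#!/bin/bash
#SBATCH --job-name=myFirstJob      # a name that identifies the job in squeue/sacct
#SBATCH --nodes=1                  # number of nodes
#SBATCH --ntasks=1                 # number of tasks (processes)
#SBATCH --cpus-per-task=4          # CPU cores per task
#SBATCH --mem-per-cpu=2G           # memory per CPU core
#SBATCH --time=01:00:00            # walltime limit (HH:MM:SS)

./myProgram input.dat > output.log

Save it as, for example, myFirstJob.sh, submit it with sbatch myFirstJob.sh, and check its state with squeue -u $USER.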

Batch Scripts (Templates for batch jobs with explanations on how to modify)

Interactive Submission (Without GUI and with GUI information)

Scheduler Control Commands (A selection of optional commands available for customization)

Python (How to use python on the cluster)

Matlab (How to use matlab on the cluster)

Best Practices

Jobs

  • The login nodes are meant for editing scripts and submitting jobs; if a command takes longer than a few minutes to run, it should probably be run as an interactive job with the SLURM srun or salloc commands (see the sketch after this list).
  • The SLURM sbatch command submits batch job scripts.
  • The SLURM salloc command allocates resources for a job. Run the scancel command to relinquish the allocation.
  • The SLURM srun command runs an interactive job. Exiting depends on the command that is run interactively. If you run bash interactively, then the exit command will stop the job.
  • Make sure you define, with --cpus-per-task or --ntasks, the appropriate number of CPU-cores for your workload.
  • Make sure you define, with --nodes, the appropriate number of nodes for your workload.
  • Make sure you define, with --mem-per-cpu or --mem, the appropriate amount of memory for your workload.
  • Use job names, with --job-name, that help you identify what jobs you are running.
  • Request the least amount of time, with --time, that you think your job will take. E-mail us with the jobId if you need more time.
  • The SLURM sacct command can provide resource usage and allocation details for completed jobs.
  • The SLURM squeue and sstat commands can provide resource allocation and usage details, respectively, for pending and running jobs.
  • The SLURM scontrol command can be used to update some pending and running jobs' resources.
  • The SLURM scancel command can be used to cancel pending and running jobs.
  • Run a few small jobs that get progressively larger to determine appropriate resource sizes, such as memory or time.
  • Over 300 other users run jobs on the cluster; if your job doesn't start immediately, it will remain in the pending state until resources are available.
  • Don't modify or delete your jobs' files while your jobs are running!
  • Try not to include the % character in your jobs. It is usually a control character in SLURM; for example, %a, %j, and %A are all sbatch filename patterns.
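As a minimal sketch of an interactive session, with placeholder resource requests that you should adjust to match your workload:

# Request an interactive shell on one node with 4 cores and 8 GB of memory for one hour
srun --nodes=1 --ntasks=1 --cpus-per-task=4 --mem=8G --time=01:00:00 --pty bash

# ... run your commands, then type exit to end the job

# Afterwards, review what the completed job actually used
sacct -j [jobId] --format=JobID,JobName,Elapsed,MaxRSS,State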

Storage

  • We have 690 terabytes of storage shared among 300+ users, so please be considerate and use it sparingly if you can.
  • If you have a lot of past job files you would like to keep, consider compressing them, for example tar -czf [jobFolder].tar.gz [jobFolder], or ask us about archive storage.
  • Cluster storage is for cluster jobs, not personal backup.
  • Storage in your home and project directories, /home/$USER/ and /project/$USER/, is backed up weekly, but your scratch, /scratch/$USER/, is never backed up.
  • Previous cluster home directories are available on the cluster, at /oldhome/$USER/, and in archive.
  • Consider using a self-contained folder for each job that includes the job batch script, input data, and output data files (see the sketch after this list).
  • Files deleted with the rm command are gone unless they have existed for at least a day, in which case they might still be in the daily GPFS snapshots.
  • File and directory names containing a space usually need an escape sequence, Hello\ HPC, or quotes, "Hello HPC".
  • High speed storage is available in /scratch/$USER/ and the quota is initially 10 TB.
  • Long term storage is available in /project/$USER/ and the quota is initially 1 TB.
  • User storage is available in /home/$USER/ and the quota is initially 50 GB.
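As a minimal sketch of a self-contained job folder, where myJob, myJob.sh, and input.dat are only placeholder names:

# Keep the batch script, input data, and output data for one job together
mkdir /project/$USER/myJob
cp myJob.sh input.dat /project/$USER/myJob/
cd /project/$USER/myJob
sbatch myJob.sh

# Once the job is finished and no longer needed day to day, compress the whole folder
cd /project/$USER
tar -czf myJob.tar.gz myJob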

Network

  • We have a very fast connection inbound and outbound from the cluster. Feel free to upload and download as much as you need.
  • Our very fast connection is only intended for cluster data. We will not host your server or database on the internet.
  • Our internal network is extremely fast, especially between nodes, but storage is the slowest component.
  • Our extremely fast internal network can still fill the storage in less than a day.

Software

  • Our operating system is Rocky Linux 8.8, and we cannot directly run programs built only for Windows, macOS, Ubuntu, etc.
  • Singularity is the container system we currently use. If you need a Docker container, it is often trivial to run it with Singularity (see the sketch after this list).
  • You can install software in your home directory. Generally, installs use the make toolchain:
    1. ./configure --prefix=$HOME
    2. make
    3. make install
  • If you need software that requires root/sudo, make a ticket under Self Service->Research and HPC Software->HPC Account.
  • In general, we will not install software that gives users root/sudo access or interferes with the scheduling software.
  • Learn what you need about Linux commands, but generally, you only need to know the cd, ls, rm, mkdir, rmdir, scp/rsync, ssh, and vi/nano/emacs commands.
  • Linux programs are generally case-sensitive, i.e. a and A are different, unlike Windows.
  • Be careful when using the $ character in scripts and on the command line because it usually indicates a variable. Use the escape sequence, \$threeFifty, or single quotes, '$threeFifty', instead.
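As a minimal sketch of running a Docker image through Singularity, where the publicly available python:3.11 image and myScript.py are only examples:

# Pull a Docker image from Docker Hub and convert it to a Singularity image file
singularity pull docker://python:3.11

# Run a command inside the resulting container
singularity exec python_3.11.sif python3 myScript.py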

Visualization

  • Although we have GPU nodes on the cluster for general purpose computing (CUDA or OpenCL), they cannot accelerate displayed applications through OpenGL.
  • The mesa module uses CPUs to display OpenGL content. While the CPUs are quite powerful, they are not quite as fast as some dedicated GPUs.
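As a rough sketch of CPU-based OpenGL through the mesa module, assuming the module is simply named mesa and that X11 forwarding is available from your SSH client (the supported GUI workflow is described on the Interactive Submission page above):

# Log in with X11 forwarding enabled
ssh -Y [username]@bigblue.memphis.edu

# Load the mesa module and start an interactive job that forwards X11
module load mesa
srun --x11 --pty bash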

Useful Links

Resources

Intel Compiler Directives for Fortran

Intel Preprocessor Options for C++

Links

Job Scheduler Control Commands and Environmental Variables

CUDA Documentation