
Information for Current Faculty and Students

This page contains information to assist researchers with getting started with High Performance Computing (HPC). To request an account, create a ticket at the University of Memphis helpdesk (Self Service -> Research and HPC Software -> HPC Account). You can also reach out to us at hpcadmins@memphis.edu if you have any questions.

Getting Started

The HPC system uses your normal University of Memphis credentials for authentication--the same username and password you use for university email, OneDrive, etc. However, before you can log in, an account needs to be created for you on the HPC cluster itself. To do that, please create a ticket to request an account (see above). HPC accounts are available to enrolled students, faculty, and staff of the university. External researchers and collaborators working with the University can ask their University of Memphis collaborator to sponsor an account.

Logging In

Once you have an HPC account, Linux and Mac users can open a terminal window and start a secure shell (SSH) connection to the login node by running the following command:

ssh [username]@bigblue.memphis.edu

Windows users can use the Windows Subsystem for Linux (WSL), or install a terminal emulation application such as PuTTY or MobaXterm--SSH clients which offer a graphical user interface where you can input the necessary login information, such as your username, password, and host name. An SFTP (Secure File Transfer Protocol) client such as WinSCP or Tunnelier will also be necessary for uploading/downloading files to and from the cluster.
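On Linux and Mac, the standard scp or rsync commands can be used instead of a graphical SFTP client. A minimal sketch; the file and folder names here are examples only:

# upload a file to your home directory on the cluster
scp mydata.csv [username]@bigblue.memphis.edu:/home/[username]/

# download a file from the cluster to the current local directory
scp [username]@bigblue.memphis.edu:/home/[username]/results.txt .

# copy a whole directory; rsync can resume interrupted transfers
rsync -avz myproject/ [username]@bigblue.memphis.edu:/home/[username]/myproject/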

To access the HPC from outside the University, first connect to the University's VPN and then connect to the cluster.

We have recently added the Open OnDemand web-based interface, which can be accessed from campus/VPN IP addresses at https://bigblueweb.memphis.edu. This interface provides much of the functionality of the cluster from your web browser.

Submitting Jobs

ALL jobs must be submitted to the SLURM job scheduler, either as a batch job (through a submission script) or interactively (a minimal example batch script is sketched below).

Batch Scripts (Templates for batch jobs with explanations on how to modify)

Interactive Submission (Without GUI and with GUI information)

Scheduler Control Commands (A selection of optional commands available for customization)

Python (How to use python on the cluster)

Matlab (How to use matlab on the cluster)

Note that with the web interface, BigBlueWeb, you can start an interactive GUI job from your browser.
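As a starting point, a minimal batch script might look like the following. The job name, resource amounts, module name, and file names are placeholders rather than site recommendations; see the Batch Scripts templates above for complete examples.

#!/bin/bash
#SBATCH --job-name=myAnalysis        # name shown by squeue
#SBATCH --nodes=1                    # run on one node
#SBATCH --ntasks=1                   # a single task
#SBATCH --cpus-per-task=4            # CPU cores for that task
#SBATCH --mem-per-cpu=2G             # memory per core
#SBATCH --time=02:00:00              # wall-clock limit (hh:mm:ss)
#SBATCH --output=%x-%j.out           # %x is the job name, %j is the job ID

module load python                   # module name is an example; check module avail
python myscript.py

Save it as, for example, myjob.sbatch, submit it with sbatch myjob.sbatch, and check its status with squeue -u $USER.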

Best Practices

Jobs

  • The login nodes are meant for editing scripts and submitting jobs; if a command takes longer than a few minutes to run, it is probably more appropriate to run it as an interactive job with the SLURM srun or salloc commands (see the sketch after this list).
  • The SLURM sbatch command submits batch job scripts.
  • The SLURM salloc command allocates resources for a job. Run the scancel command to relinquish the allocation.
  • The SLURM srun command runs an interactive job. Exiting depends on the command that is run interactively. If you run bash interactively, then the exit command will stop the job.
  • Make sure you define, with --cpus-per-task or --ntasks, the appropriate number of CPU-cores for your workload.
  • Make sure you define, with --nodes, the appropriate number of nodes for your workload.
  • Make sure you define, with --mem-per-cpu or --mem, the appropriate amount of memory for your workload.
  • Use job names, with --job-name, that help you identify what jobs you are running.
  • Use the least amount of time, with --time, you think your job will take. E-mail us with the jobId if you need more time.
  • The SLURM sacct command can provide resource usage and allocation details for completed jobs.
  • The SLURM squeue and sstat commands can provide resource allocation and usage details, respectively, for pending and running jobs.
  • The SLURM scontrol command can be used to update some pending and running jobs' resources.
  • The SLURM scancel command can be used to cancel pending and running jobs.
  • Run a few small jobs that get progressively larger to determine appropriate resource sizes, such as memory or time.
  • Over 300 other users run jobs on the cluster; if your job doesn't start immediately, it will remain in the pending state until resources are available.
  • Don't modify or delete your jobs' files while your jobs are running!
  • Try not to include the % character in your jobs. It is usually a control character in SLURM; for example, %a, %j, and %A are all used in sbatch file name patterns.
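A minimal sketch of requesting a small interactive session and checking on jobs; the resource amounts below are placeholders, not recommendations, and <jobId> stands for an actual job ID:

# request an interactive shell with 2 cores and 4 GB of memory for one hour
srun --cpus-per-task=2 --mem=4G --time=01:00:00 --pty bash
# ...work interactively, then type exit to end the job

# list your pending and running jobs
squeue -u $USER

# after a job completes, review what it actually used
sacct -j <jobId> --format=JobID,JobName,Elapsed,MaxRSS,State

# cancel a pending or running job
scancel <jobId>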

Storage

  • We have 690 terabytes of storage shared among 300+ users; please be considerate and use it sparingly if you can.
  • If you have a lot of past job files you would like to keep, consider compressing them, tar -czf [jobFolder].tar.gz [jobFolder], or ask us about archive storage (see the sketch after this list).
  • Cluster storage is for cluster jobs, not personal backup.
  • Storage in your home and project directories, /home/$USER/ and /project/$USER/, is backed up weekly, but your scratch, /scratch/$USER/, is never backed up.  Also, note that backups of data are in the same data center as the original storage--there is no off-site backup facility.
  • Files deleted using the rm command are gone unless they have existed for at least a day, in which case they might be in the daily GPFS snapshots.
  • Previous cluster home directories are available on the cluster at /oldhome/$USER/ until May 2025.
  • Consider using a self-contained folder for each job that includes the job batch script, input data, and output data files.
  • Spaces in file and directory names usually need an escape character,  Hello\ HPC, or quotes, "Hello HPC".
  • High speed storage is available in /scratch/$USER/ and the quota is initially 10 TB.
  • Long term storage is available in /project/$USER/ and the quota is initially 1 TB.
  • User storage is available in /home/$USER/ and the quota is initially 50 GB.
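A few commands that can help keep storage usage in check; the folder name oldJob is an example:

# see how much space a directory is using
du -sh /home/$USER/

# compress a finished job folder, then remove the original after verifying the archive
tar -czf oldJob.tar.gz oldJob/
rm -r oldJob/

# extract it again later
tar -xzf oldJob.tar.gz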

Network

  • We have a very fast connection inbound and outbound from the cluster. Feel free to upload and download as much as you need.
  • Our very fast connection is only intended for cluster data. We cannot host your server or database on the internet.
  • Our internal network is extremely fast, especially between nodes--storage is the slowest component.
  • Our extremely fast internal network can still fill the storage in less than a day.

Software

  • Our operating system is Rocky Linux 8.8, and we cannot directly run software intended for other operating systems (Windows, macOS, Ubuntu, etc.).
  • Singularity is the container system we currently use. If you need a Docker container, it can often be used with Singularity with little effort (see the sketch after this list).
  • You can install software in your home directory. Generally, installs use the make toolchain:
    1. ./configure --prefix=$HOME    (configure the build to install under your home directory)
    2. make                          (compile the software)
    3. make install                  (copy the compiled files into the --prefix location)
  • If you need software that requires root/sudo, make a ticket under Self Service->Research and HPC Software->HPC Account.
  • In general, we will not install software that gives users root/sudo access or interferes with the scheduling software.
  • Learn what you need about Linux commands, but generally, you only need to know cd, ls, rm, mkdir, rmdir, scp/rsync, ssh, and vi/nano/emacs commands.
  • Linux programs are generally case-sensitive, i.e., a and A are different, unlike on Windows.
  • Be careful when using the $ character in scripts and on the command line because it usually indicates a variable. Use the escape sequence, \$threeFifty, or single quotes, '$threeFifty', instead.
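As an example of the Singularity point above, a minimal sketch of running a Docker Hub image; the python:3.11 image is only an example:

# pull a Docker Hub image; Singularity converts it to a .sif image file
singularity pull docker://python:3.11

# run a command inside the resulting container
singularity exec python_3.11.sif python3 --version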

Visualization

  • Although we have GPU nodes on the cluster for general purpose computing (CUDA or OpenCL), they do not have the ability to speed up displayed applications through OpenGL.
  • The mesa module uses CPUs to display OpenGL content. While the CPUs are quite powerful, they are not quite as fast as some dedicated GPUs (see the sketch after this list).
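A minimal sketch, assuming the module is simply named mesa and that glxinfo is available in your interactive GUI session:

module load mesa                  # module name assumed; check module avail mesa
glxinfo -B | grep -i renderer     # typically reports a software (llvmpipe) renderer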

Useful Links

Resources

Intel Compiler Directives for Fortran

Intel Preprocessor Options for C++

Links

Job Scheduler Control Commands and Environmental Variables

CUDA Documentation