
Using Python on the cluster

This is a short guide to using Python on the cluster through the scheduler. We don't recommend running resource-intensive scripts on the login nodes: there are only 2 of them, and other users need them to submit jobs and move data. Anything resource-intensive that you run on a login node may exit prematurely.


Setup

Before writing a script, decide which Python version and features you need. The available versions can be listed with the command:

module avail 

Versions other than the system version can be loaded with the command module load version, for example:

module load python/3.8.7 

There are many versions of python on the cluster with different environments:

Version: Description
system python 2.7.5: The default environment. This version of Python is not recommended because many libraries, such as numpy and keras, are not installed and cannot easily be installed or updated.
python/2.7.15: For Python 2 applications. This environment has many commonly used libraries, but because Python 2 is no longer supported, getting a fix for a bug in the language or a package may be difficult.
python/3.7.0 and python/3.8.7: These versions have many commonly used libraries for which support and updates are possible. As new versions of Python are released, we may add additional versions. Use the pip3 and python3 commands with these versions.
miniconda/2.7.15 and miniconda/3.7: These versions support many user-friendly environment customizations. Although they are based on Python 2.7.15 and 3.7, respectively, they can be used to create a user environment with almost any Python version.
tensorflow/2.0.0 and tensorflow/2.4.1: These are python/3.7.0 and python/3.8.7, respectively, with additional CUDA library modules. Use the pip3 and python3 commands with these versions.
intel/2019.5: This version of the Intel compiler ships its own Python, version 3.6.9. It can be overridden by loading the python/3.7.0 or python/3.8.7 module after loading intel/2019.5. In most cases, we won't add or update a Python package in the 3.6.9 version.

Installing packages

To install packages as a user with most Python versions, use the pip or pip3 command with the additional option --user (replace package with the package you would like to install):

pip install --user package

Once installed, the package should be located under /home/$USER/.local/, a hidden folder.

Virtual Environment

Creating a virtual environment is preferable to installing packages with the --user option. First select a Python version with module load version, as described above. The advantage is that you don't need to worry about conflicting package versions from the default package directory, and you can install any version of a package you prefer. To create a virtual environment with most of the Python versions, use the virtualenv command ("~/myPython" can be replaced with any directory you prefer):

mkdir ~/myPython
virtualenv ~/myPython

Once the environment is created, you can activate it with the source command (from the terminal or a script; note that you don't have to run the module load command):

source ~/myPython/bin/activate

Once the environment is activated, you can install packages simply using the pip install command, without --user (with package replaced by any package you wish to install):

pip install package

For batch scripts, you would use the source command in place of the normal module load command. To deactivate the environment, just use the command:

deactivate
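
A quick way to confirm whether a virtual environment is active, in a terminal or at the top of a batch script, is to look at the VIRTUAL_ENV variable, which the activate script sets and deactivate unsets. The following is a minimal sketch; the function name show_active_env is hypothetical and the path in the hint is the example directory used above.

```shell
# Report the active virtual environment, or print a hint if none is active.
# VIRTUAL_ENV is set by "source .../bin/activate" and unset by "deactivate".
show_active_env() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "virtualenv active: ${VIRTUAL_ENV}"
    else
        echo "no virtualenv active; run: source ~/myPython/bin/activate"
    fi
}

show_active_env
```

Calling it before pip install helps avoid installing a package into the wrong place by accident.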

Conda Environment

Similar to the python virtual environment, the conda environment can be enabled by loading the miniconda module, for example module load miniconda/3.7, and then running the following command to create the environment:

conda create --name myConda

This will, by default, create the environment in a hidden directory at ~/.conda/envs/myConda. To activate the environment, just use the command:

conda activate myConda

Once the environment is activated, you can install packages simply using the conda install command (with package replaced by any package you wish to install):

conda install package

To deactivate, just use the command:

conda deactivate
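
Because a conda environment can carry its own interpreter, the create step can also request a specific Python version. The sketch below assumes the miniconda module is already loaded; the function name make_conda_env, the environment name myConda39, and the version 3.9 are illustrative choices, not site defaults.

```shell
# Create a conda environment with a chosen Python version.
# Assumes "module load miniconda/3.7" has already been run.
make_conda_env() {
    env_name="$1"      # e.g. myConda39 (hypothetical name)
    py_version="$2"    # e.g. 3.9 (any version conda can resolve)
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found; load the miniconda module first" >&2
        return 1
    fi
    # "python=<version>" asks conda to install that interpreter
    # into the new environment.
    conda create --yes --name "$env_name" "python=$py_version"
}

# Example call (commented out so pasting this block creates nothing):
# make_conda_env myConda39 3.9
```

After creation, the environment is activated with conda activate in the same way as myConda.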

Execution

Python on the HPC cluster is typically run through batch submission scripts, but interactive jobs can also be run with the srun command or through an IPython/Jupyter notebook server process.

Batch Script

The batch script is just a normal submission script with environment loading (module load, sourcing the virtual environment, or conda activate) and python commands (python and python3). Use this template for an example.
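
As a rough illustration of such a batch script (a sketch only, not the site's official template), the commands below write a small script to a file. The file names python-job.sh and myScript.py, the job name, and the resource values are assumptions chosen to match the interactive example in this guide.

```shell
# Write a minimal batch script to python-job.sh (hypothetical file name).
cat > python-job.sh <<'EOF'
#!/bin/bash
# Name shown by squeue.
#SBATCH --job-name=python-test
# Partition, 1 CPU core, 1 GB of memory, 10-minute time limit.
#SBATCH --partition=computeq
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --time=00:10:00
# %j expands to the job ID.
#SBATCH --output=python-%j.out

# Load the environment: module load, sourcing a virtual environment,
# or conda activate, as described under Setup.
module load python/3.8.7

# Run the script (myScript.py stands in for your own file).
python3 myScript.py
EOF

# Submit it from a login node with:
# sbatch python-job.sh
```

To use a virtual environment instead, the module load line would be replaced with the source command for that environment.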

Interactive

To run python interactively, run the srun command for a normal bash interactive environment (for 1 CPU-core, 1 GB of memory, and 10 minutes in the computeq partition):

srun -t 00:10:00 --pty bash

Then just load the environment you want with module load, sourcing the virtual environment, or conda activate, for example:

module load python/3.8.7
python3 myScript.py

Remember to run the exit command when you are finished running your interactive environment.

Optionally, you can replace the bash command with the python or python3 command in srun to get an interactive python terminal (just remember to load your environment before running srun), for example:

module load python/3.8.7
srun -t 00:10:00 --pty python3
#>>>

Note that during any interactive srun session, you may see messages such as "srun: job ###### queued and waiting for resources". This indicates that the system is unable to allocate resources for your job immediately. You can either reduce the resources your job requests (such as CPU cores, memory, or time limit) or wait. The system will eventually run your job, but batch submission is usually the better choice.
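
The resources mentioned above can be written out explicitly on the srun command line, which makes them easy to adjust. The helper below only builds and prints the command rather than running it; the function name interactive_cmd is hypothetical, and the flags are the standard long forms of the Slurm options (--time is the long form of -t).

```shell
# Build (but do not run) an srun command for an interactive bash session.
# $1 = CPU cores, $2 = memory, $3 = time limit (HH:MM:SS)
interactive_cmd() {
    echo "srun --partition=computeq --cpus-per-task=$1 --mem=$2 --time=$3 --pty bash"
}

# The request described in this section: 1 core, 1 GB, 10 minutes.
interactive_cmd 1 1G 00:10:00
```

Smaller requests are generally easier for the scheduler to place, so trimming these three values is the first thing to try when a session stays queued.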

Jupyter Notebook

Using Jupyter Notebook on the cluster requires submitting a batch job using a script similar to this example script, and a terminal with X11 forwarding (such as MobaXTerm, "ssh -Y uuid@hpclogin.memphis.edu", or x2go). Once you are connected, submit the job using the sbatch command, for example:

sbatch jupyter-test.sh
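
For orientation, a script like jupyter-test.sh could look roughly like the sketch below. This is not the site's example script: the time limit, the port, and the assumption that the python/3.8.7 module provides the jupyter command are all guesses made for illustration.

```shell
# Write a hypothetical jupyter-test.sh; adjust the values to your needs.
cat > jupyter-test.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=jupyter-test
#SBATCH --partition=computeq
#SBATCH --time=01:00:00
#SBATCH --output=jupyter-%j.out

# Assumption: this module (or your own environment) provides jupyter.
module load python/3.8.7

echo "*** Starting Jupyter on: $(hostname -s)"
# --no-browser: do not try to open a browser on the compute node.
# --ip: listen on the node's hostname so the URL can be opened
#       from a browser started on the login node.
jupyter notebook --no-browser --ip="$(hostname -s)" --port=8888
EOF
```

If jupyter is installed in a virtual environment or conda environment instead, the module load line would be replaced by the matching source or conda activate command.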

Check that the job was submitted with the squeue command, for example:

squeue -u $USER
# JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
#2147524  computeq jupyter- jspngler  R       1:45      1 c006

Then look at the job's output file using the cat command (the file is named slurm-jobid.out by default; in this example it is jupyter-jobid.out), for example:

cat jupyter-2147524.out 
#*** Starting Jupyter on:  c006

#[I 13:15:41.606 NotebookApp] Serving notebooks from local directory: /home/jspngler/python
#[I 13:15:41.606 NotebookApp] Jupyter Notebook 6.4.3 is running at:
#[I 13:15:41.606 NotebookApp] http://c006:8888/?token=ASDFASDFASDF
#[I 13:15:41.606 NotebookApp]  or http://127.0.0.1:8888/?token=ASDFASDFASDF
#[I 13:15:41.606 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
#[C 13:15:41.611 NotebookApp]
#   
#    To access the notebook, open this file in a browser:
#        file:///home/jspngler/.local/share/jupyter/runtime/nbserver-100836-open.html
#    Or copy and paste one of these URLs:
#        http://c006:8888/?token=ASDFASDFASDF
#     or http://127.0.0.1:8888/?token=ASDFASDFASDF

Then open the suggested URL with either the firefox or chromium-browser commands, for example:

chromium-browser http://c006:8888/?token=ASDFASDFASDF
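
Instead of copying the URL by hand, it can be pulled out of the output file with grep. This is a small sketch; the function name jupyter_url is hypothetical, and it assumes the log contains URLs in the form shown above.

```shell
# Print the first notebook URL found in a job output file,
# skipping the 127.0.0.1 variant, which points at the compute node itself.
jupyter_url() {
    grep -o 'http://[^ ]*token=[^ ]*' "$1" | grep -v '127\.0\.0\.1' | head -n 1
}

# Example (file name taken from the job above):
# chromium-browser "$(jupyter_url jupyter-2147524.out)"
```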

If you choose to exit the kernel or quit the notebook in the jupyter-notebook page, your job will quit. If you don't, you can repeat the firefox or chromium-browser command to reopen your notebook, assuming your job hasn't reached its time limit or memory limit (check using the squeue -u $USER command and the slurm-jobid.out file).

Input and Output

With interactive jobs and jupyter-notebook jobs, the output is straightforward (what you see is what you get, plus any files your script or program creates). For batch submission, you have the option of redirecting the standard output and standard error to separate files with #SBATCH directives (use "%j", which expands to the job ID, so that each job writes to its own files instead of overwriting those of an earlier job):

#SBATCH --output="standardOutput-%j.out"
#SBATCH --error="standardError-%j.out"

If your script writes files to Windows-style paths, such as "c:\path\to\file" or "path\to\file", change them to Linux-style paths that use "/" instead of "\", and change "c:\" absolute paths to paths in your own directories, like "${HOME}/path/to/file". Additionally, you will need to upload to the cluster any files your script opens as input.
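
The path change described above can be illustrated with a small shell function. It is only a sketch of the mapping, with a hypothetical name win_to_linux_path; in practice the paths are edited inside your own script, and rooting "c:\" paths at ${HOME} is just one possible choice of directory.

```shell
# Convert a Windows-style path to a Linux-style one.
win_to_linux_path() {
    # Turn every backslash into a forward slash.
    converted="$(printf '%s' "$1" | tr '\\' '/')"
    # For an absolute path such as c:/path/to/file,
    # drop the drive letter and root the path at $HOME.
    case "$converted" in
        [A-Za-z]:/*) converted="${HOME}/${converted#?:/}" ;;
    esac
    printf '%s\n' "$converted"
}

win_to_linux_path 'c:\path\to\file'   # prints ${HOME}/path/to/file
win_to_linux_path 'path\to\file'      # prints path/to/file
```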