Python
When you first log in to the cluster, the system python is available but this is almost always not what you want. They are installed in /usr/bin
paths and we use them for administration purposes.
We have also installed different python versions each compiled by different gcc compilers. Currently they are:
Python/2.7.15-fosscuda-2018b
Python/2.7.18-GCCcore-9.3.0
Python/2.7.18-GCCcore-10.2.0
Python/2.7.18-GCCcore-10.3.0-bare
Python/3.8.6-GCCcore-10.2.0
Python/3.9.5-GCCcore-10.3.0-bare
Python/3.9.5-GCCcore-10.3.0
Python/3.11.3-GCCcore-12.3.0
Anaconda
We have installed the Anaconda Python distribution as replacement to the system Python. Anaconda provides many additional packages which are ideal for scientific computing.
To use Anaconda, We first remove all the loaded modules
Then load the Anaconda3
You can use tab completion to see what versions are available. To see all the pre-installed Anaconda packages and their versions use the conda list
command:
[testuser1@sci ~]$ conda list
# packages in environment at /home/modules/software/Anaconda3/2024.02-1:
#
# Name Version Build Channel
_anaconda_depends 2024.02 py311_mkl_1
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
abseil-cpp 20211102.0 hd4dd3e8_0
aiobotocore 2.7.0 py311h06a4308_0
aiohttp 3.9.3 py311h5eee18b_0
aioitertools 0.7.1 pyhd3eb1b0_0
aiosignal 1.2.0 pyhd3eb1b0_0
alabaster 0.7.12 pyhd3eb1b0_0
altair 5.0.1 py311h06a4308_0
anaconda-anon-usage 0.4.3 py311hfc0e8ea_100
anaconda-catalogs 0.2.0 py311h06a4308_0
anaconda-client 1.12.3 py311h06a4308_0
anaconda-cloud-auth 0.1.4 py311h06a4308_0
anaconda-navigator 2.5.2 py311h06a4308_0
anaconda-project 0.11.1 py311h06a4308_0
anyio 4.2.0 py311h06a4308_0
aom 3.6.0 h6a678d5_0
appdirs 1.4.4 pyhd3eb1b0_0
archspec 0.2.1 pyhd3eb1b0_0
argon2-cffi 21.3.0 pyhd3eb1b0_0
argon2-cffi-bindings 21.2.0 py311h5eee18b_0
arrow 1.2.3 py311h06a4308_1
arrow-cpp 14.0.2 h374c478_1
astroid 2.14.2 py311h06a4308_0
astropy 5.3.4 py311hf4808d0_0
...
If we run python:
[testuser1@sci ~]$ python
Python 3.11.7 (main, Dec 15 2023, 18:12:31) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
The Anaconda Python distribution is system software. This means that you can use any of its packages but you cannot make any modifications to them (such as an upgrade) and you cannot install new ones in their location. You can, however, install whatever packages you want in your home directory in custom environments. The two most popular package managers for installing Python packages are conda and pip.
Conda enables you to easily install complex packages and software. Creating multiple environments enables you to have installations of the same software in different versions or incompatible software collections at once. You can easily share a list of the installed packages with collaborators or colleagues, so they can setup the same environment in a matter of minutes.
Unlike pip, conda serves as both a package and environment manager. It is not limited to a single programming language, supporting packages for Python, R, Fortran, and more. Conda primarily uses the main channel of Anaconda Cloud for installations, but it can also access other channels like bioconda, intel, r, and conda-forge. It always installs pre-compiled binary files, which often offer better performance by utilizing Intel MKL. Below is an example of creating an environment and installing packages in it:
ml purge
ml Anaconda3/2024.02-1
source $EBROOTANACONDA3/etc/profile.d/conda.sh
conda create --name myenv <package-1> <package-2> ... <package-N>
conda activate myenv
Home disk quota
Installing python packages can simply make you run out of your home disk quota. So we strongly recommend to move your .conda
folder to the work directory.
As an example let us install the yt package:
After a few seconds, it tell us it want to get and install many packages (~ 76 MB) including python-3.11.9
. As the default python version of the loaded Anaconda module is 3.11.7
and the python minor version is not important for installing yt
we decide to explicitly determine python version to reduce downloading packages:
Now the download size is decreased to ~ 43 MB. To activate and test:
Install the environment in a specific path
If we want to install the conda environment in another directory than our
home, we can add --prefix PATH
. This enables multiple users of a project to
share the conda environment by installing it into their project folder instead
of the users home
Daily usage
To load this environment we have to use the following commands either on the command line (by assigning an interactive job) or in our job script:
Then we can use all software as usual.
To deactivate the current environment:
conda deactivate
If we need to install additional software or packages, we can search for it with:
and install it with:
To see installed packages:
To list available environments:
To remove "myproject" environment:
If the python package we are looking for is not available in conda
we can usually use pip
from within a conda environment to install additional python packages:
To update a single package with conda:
or to update all packages:
You can find the original documentation at
Share your environment
To export a list of all packages/programs installed with conda in a certain environment (in this case "myproject"):
To setup a new environment (let's call it "newproj") from an exported package list:
For more details, please see Managing packages and Managing environments.
Using python in interactive jobs
When we want to use python interactively for a time longer than just a short test, we should submit an interactive job. Before running our code, we should setup the environment as explained above.
Using jupyter Lab
To use jupyter notebook submit an interactive job and setup your environment, then in a node that is assigned to us (e.g. compute-0-0), we run
This will start jupyter and print some information (including an address where the jupyter is running at)
Then on the frontend node we run
and in our computer (one that we used to connect to the frontend) run
Finally we go to the address where the jupyter is running in our web browser, e.g.
Using python for long time jobs
For longer usage it will be better to submit a batch script. This could be like
#!/bin/bash
#SBATCH -J jupyter
#SBATCH --partition long
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --output="stdout_jup.txt"
#SBATCH --error="stderr_jup.txt"
#SBATCH --mem-per-cpu=1000
#SBATCH --time=7-0:0:0
ulimit -s unlimited
cd $SLURM_SUBMIT_DIR
ml purge
ml Anaconda3
source /home/modules/software/Anaconda3/2020.02/etc/profile.d/conda.sh
conda activate base
python ./your_python_script.py >& out
and repeat the ssh tunneling as explained above.