Compute Python with Conda

This document explains how to run your Python code on the compute servers with the libraries and packages it needs.

Introduction

Python projects rely on libraries such as numpy, matplotlib or photontorch, but not every project needs the same ones. For example, one user of the sunset servers may need numpy and scipy, while another needs numpy and matplotlib. Even your own projects may each need a different set of libraries.

To keep these workspaces separate, we will create a "virtual environment": a personal space where you can install libraries without interfering with your other projects.
Inside this space you still have access to all your files; it works like a project folder where everything you install is stored.

The tool used to create these virtual environments is conda. Once an environment is created, you can enter (activate) or leave (deactivate) it at any time to use the libraries installed inside it.
In practice, you keep working on your scripts as usual and, when you need to run them with the installed packages, you activate the environment, run the file, and deactivate the environment when you are finished.

The first step, then, is to load conda, check that it is available and initialize it. Once conda is initialized, you can use it to create environments and install the packages your scripts need.

Load the conda module

First, load the conda module. The command "module avail" lists the available modules, including the conda versions. To load version 24.3, run:

>> module load conda/24.3

Activate conda

This preliminary step initializes conda for your user account so that you can use it in all your future projects. Once it is done, you do not need to configure conda again.

Check conda is installed

Conda is installed on the servers by default, but it is always a good idea to check it:

>> conda info

This command only displays the conda version and some information about its paths and directories; it does not change anything.
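If you want a script to perform the same check, for instance before running other commands, here is a minimal sketch using only the Python standard library. It assumes that, after "module load conda/24.3", the conda executable is on your PATH; it looks for it with shutil.which and runs "conda --version". The function name conda_version is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def conda_version() -> Optional[str]:
    """Return the output of `conda --version`, or None if conda is not on PATH.

    conda_version is a hypothetical helper name used for illustration.
    """
    conda = shutil.which("conda")  # assumes `module load` put conda on PATH
    if conda is None:
        return None  # conda not found: load the module first
    result = subprocess.run(
        [conda, "--version"], capture_output=True, text=True, check=False
    )
    # Fall back to stderr if nothing was printed on stdout.
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    version = conda_version()
    print(version if version else "conda not found; try: module load conda/24.3")
```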

Initialize conda

Even though conda is installed, you need to initialize it: the program is stored in the shared server folders, and it must be set up in your own user space before you can use it (for the bash shell, conda init adds a setup block to your ~/.bashrc file). This step is only needed once, the first time.

>> conda init

As the output of the command indicates, you now need to close the terminal and open a new one for the changes to take effect.

From now on, every new terminal starts inside the base environment, which is indicated in parentheses before the prompt:

(base) >>

Known issue: if (base) does not appear, run this command to reload your shell configuration, without closing and reopening the terminal:

>> source ~/.bashrc

You could already install libraries here, but it is preferable to install them in a personal environment of your own, as explained in the next section.
To stop (base) from being activated automatically and to leave it, run the following commands:

(base) >> conda config --set auto_activate_base false
(base) >> conda deactivate
>>

Now you are outside (base) and can create as many virtual environments as you need, each with its own configuration.

Create a new virtual environment

Creating the environment

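To create a virtual environment with conda, give it a name and, optionally, the packages it should contain, such as a specific Python version. Note that an environment created with only a name (the first example) is empty and does not include its own Python; add python to install the latest available version, or python=3.10 to pin a specific one. You can also request other packages, and their versions, at creation time.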

>> conda create -n projectname                # empty environment (no packages)
>> conda create -n projectname python=3.10    # specific Python version
>> conda create -n projectname python         # latest available Python
>> conda create -n projectname scipy=0.15.0   # specific version of another package

Accessing the environment

Once the environment is created, activate it to start working inside it, just as if it were a new project folder. Inside it you can install any Python library you need.

>> conda activate projectname

The prompt is now preceded by the environment name in parentheses, which means the environment has been activated correctly:

(projectname) >> 

Always be inside the environment when installing packages or running scripts; otherwise the packages installed in it will not be found and you will not be able to use them.
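To confirm from inside a script which environment it is running in, you can read the variables that "conda activate" sets (CONDA_DEFAULT_ENV and CONDA_PREFIX) and compare them with the interpreter that is actually running. A minimal sketch; the function name environment_info is hypothetical. If uses_env_python is False while an environment is active, the script is probably running with a Python from outside the environment (for example, because the environment was created empty).

```python
import os
import sys


def environment_info() -> dict:
    """Collect details about the conda environment this interpreter runs in.

    environment_info is a hypothetical helper name used for illustration.
    """
    prefix = os.environ.get("CONDA_PREFIX")  # set by `conda activate`
    return {
        # Name of the active environment; None when none is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        "conda_prefix": prefix,
        # The Python interpreter actually executing this script.
        "python": sys.executable,
        "python_version": sys.version.split()[0],
        # True if the running interpreter lives inside the active environment.
        "uses_env_python": prefix is not None
        and os.path.realpath(sys.prefix) == os.path.realpath(prefix),
    }


if __name__ == "__main__":
    for key, value in environment_info().items():
        print(f"{key:16} {value}")
```

Saved as, for example, check_env.py, it would be run with "(projectname) >> python check_env.py".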

Activating and deactivating the environment

Remember: you can enter, switch between and leave environments at any time; leaving an environment does not delete it.

As already seen, you enter an environment with "conda activate projectname".
You can activate your environments at any time after logging in, because they are stored in your own home directory (under ~/.conda/envs), which does not depend on the server you use.

To leave the environment (without deleting it), either to work in a different one or because you no longer need it, run "conda deactivate". This command only has an effect while an environment is active (indicated by its name in parentheses).

The following example lists your virtual environments and shows how to enter and leave them:

>> conda env list
# conda environments:
#
env1       /home/usuaris/sitsc/user/.conda/envs/env1
env2       /home/usuaris/sitsc/user/.conda/envs/env2
base         * /opt/conda

>> conda activate env1
(env1) >>
(env1) >> conda deactivate # exit from the first environment
>> # You are outside of the env again, as it no longer shows it in parenthesis

>> # Now it is time to access another one
>> conda activate env2
(env2) >>
(env2) >> conda deactivate
>>

Installing packages inside the environment

Once you are inside the environment (indicated by the parentheses), you can install any library you need. This example installs photontorch together with some other useful libraries, but any other library can be installed in the same way.

Conda environments are useful for any type of project, not only photontorch. If you want to run other kinds of Python code, you can create a separate environment with its own configuration and libraries.

(projectname) >> conda install pytorch numpy scipy
(projectname) >> pip install photontorch
(projectname) >> conda install tqdm
(projectname) >> conda install matplotlib
(projectname) >> conda install pandoc

To see the list of installed packages:

(projectname) >> conda list
(projectname) >> pip list
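If you prefer to check from Python that everything your project needs is installed, here is a minimal sketch using importlib.metadata from the standard library (Python 3.8 or later). It assumes each package exposes the usual Python package metadata, which is normally the case for packages installed with conda or pip. Note that it looks up distribution names, which can differ from conda package names: conda's pytorch package, for instance, is installed as torch. The function name installed_versions is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing.

    installed_versions is a hypothetical helper name used for illustration.
    """
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Adjust this list to the packages your project needs.
    required = ["numpy", "scipy", "torch", "photontorch", "tqdm", "matplotlib"]
    for name, ver in installed_versions(required).items():
        print(f"{name:12} {ver or 'NOT INSTALLED'}")
```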

Create and Execute Python files in the computing servers

Now that the environment is set up with all the libraries your code needs, you can create your scripts and run them on the compute servers. The procedure is:

Create your Python script.
Activate the environment (its name appears in parentheses).
Run your scripts on the Calcula servers from inside the environment, e.g. (env1).
Leave the environment once you are done.

It is very important to submit the scripts to the compute servers while you are inside the environment: Slurm normally passes your current environment variables to the job, so the scripts run with the Python and libraries installed in the environment (env1 in this example).

Creating and editing our Python script

There are several ways to write your code.

The easiest way in Linux is to create an empty file with the ".py" extension: open the file manager, go to the folder where you want to save your files, right-click and create an empty file. Then add your code and save it with a name such as "mycode.py".

Alternatively, you can use the vi text editor in the terminal, although it is harder to use.

Please visit the guide on how to use Jupyter Notebook if you want to use this powerful tool.
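Whichever editor you use, a common layout for a script such as mycode.py is to put the work in a main() function and call it under an if __name__ == "__main__": guard, so the file can be run directly and also imported from other scripts. A minimal sketch; the greeting is only a placeholder for your own code.

```python
"""mycode.py: a hypothetical minimal script layout."""
import sys
from typing import List


def main(argv: List[str]) -> int:
    # Placeholder computation: greet every name given on the command line.
    names = argv[1:] or ["World"]
    for name in names:
        print(f"Hello {name}")
    return 0  # exit status 0 means success


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Running "(projectname) >> python mycode.py Alice" would print "Hello Alice".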

Show and save plots in Python using Matplotlib

To plot graphs in Python you can use the matplotlib library. Here is a quick guide to using it inside the environment. To install it:
To install it:

(projectname) >> conda install matplotlib

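Once the graph is defined, call plt.savefig() before plt.show(): savefig() writes the plot to an image file (a PNG in this example) and show() displays it on screen. The order matters, because once the window opened by show() is closed the figure is discarded, and a later savefig() call would save an empty image. Since the file is written to the folder the script runs from, the plot is kept even when the script runs on the compute servers.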

For example:

import matplotlib 
import matplotlib.pyplot as plt
plt.plot([0, 1, 2, 3, 4], [0, 3, 5, 9, 11])
plt.xlabel('Months')
plt.ylabel('Books Read')
plt.savefig('books_read.png') 
plt.show()

Saving the displayed graph again with the window's "save" button is optional, because plt.savefig() has already saved it.
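When a script runs as a job on the compute servers (with srun or sbatch, described below) there is normally no screen attached, so plt.show() cannot open a window. A minimal sketch, assuming you only need the image file: select matplotlib's non-interactive "Agg" backend, save the figure and close it. The function name save_plot is hypothetical.

```python
import os

import matplotlib

# Non-interactive backend: renders to files only, no display needed.
# Select it before importing pyplot.
matplotlib.use("Agg")

import matplotlib.pyplot as plt


def save_plot(filename: str = "books_read.png") -> str:
    """Draw the example plot, save it to `filename` and return its full path.

    save_plot is a hypothetical helper name used for illustration.
    """
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2, 3, 4], [0, 3, 5, 9, 11])
    ax.set_xlabel("Months")
    ax.set_ylabel("Books Read")
    fig.savefig(filename)
    plt.close(fig)  # free memory; useful when saving many figures in a loop
    return os.path.abspath(filename)


if __name__ == "__main__":
    print("Saved", save_plot())
```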

Execute Python scripts

Now that your code is ready and you are inside the environment, you can run it on the compute servers.

To understand the commands used to run jobs on the compute servers, you may want to review the Batch processing guide (HPC) before continuing.

Otherwise, here are some examples of how to run code on the sunset Calcula servers (any other partition works the same way).

Execute a Python script using srun

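With srun, the job runs in real time: the command waits until the job has finished, and the output appears directly in the terminal.

By default, Slurm sends the job to the first compute server that has the resources you requested, but you can also choose a specific node with the -w option (for example, -w sunsetc01). In the srun command below, -p selects the partition, -A the account, -c1 requests one CPU and --mem=2000 requests 2000 MB of memory: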

(proyecto) >> cat hello_world.py
print("Hello World")

(proyecto) >> srun -p sunset -A sunset -c1 --mem=2000 python hello_world.py
=====================================================================
 SLURM_JOB_ID        = 286028
 SLURM_NODE          = sunsetc01
 SLURM_JOB_PARTITION = sunset
=====================================================================
Hello World
=====================================================================
JobID 286028: Could not read info.
=====================================================================

(proyecto) >>
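To check that a job really runs on the node you expect and with your environment's interpreter, a script can print some of the variables that Slurm defines inside every job (SLURM_JOB_ID, SLURM_JOB_PARTITION, SLURMD_NODENAME) together with sys.executable. A minimal sketch; the script name job_info.py and the function name job_context are hypothetical. Outside a Slurm job the Slurm values are simply None and the node falls back to the local host name.

```python
import os
import socket
import sys


def job_context() -> dict:
    """Describe where this script runs: Slurm job details and the Python used.

    job_context is a hypothetical helper name used for illustration.
    """
    return {
        # Slurm sets these variables inside jobs; they are None outside Slurm.
        "job_id": os.environ.get("SLURM_JOB_ID"),
        "partition": os.environ.get("SLURM_JOB_PARTITION"),
        "node": os.environ.get("SLURMD_NODENAME", socket.gethostname()),
        # Should point inside your environment, e.g. ~/.conda/envs/proyecto
        "python": sys.executable,
    }


if __name__ == "__main__":
    for key, value in job_context().items():
        print(f"{key:10} {value}")
```

It would be run like the previous example: "(proyecto) >> srun -p sunset -A sunset -c1 --mem=2000 python job_info.py".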

Execute a Python script using sbatch

Another way is the sbatch command, which submits a batch script to be executed later, as soon as the requested resources are available. sbatch returns immediately with the job ID.

The main difference from srun is that the output is written to a file (by default slurm-<jobid>.out, in the folder from which you submitted the job) instead of the terminal.

This is useful when you want to keep the results or check them at any moment, instead of only seeing them in the terminal as in the previous example. You also do not need to keep the terminal open while the job runs.

Since sbatch expects a batch script rather than a Python file, create a new ".sh" file with the Slurm options (#SBATCH lines) and the command that runs your Python file:

(proyecto) >> cat executable.sh          # contents of the new "executable.sh" file
#!/bin/bash
#SBATCH -p sunset
#SBATCH -A sunset
#SBATCH -c1
#SBATCH --mem=2000
python hello_world.py

(proyecto) >> sbatch executable.sh       # submit the job to the Calcula servers
Submitted batch job 286046

(proyecto) >> cat slurm-286046.out       # read the output once the job has finished
Hello World

(proyecto) >>
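Because sbatch sends the output to a file, Python buffers what your script prints, so progress messages may only appear in slurm-<jobid>.out when the buffer fills or the job ends. To follow a long job while it runs, print with flush=True, or start the script with "python -u" in executable.sh. A minimal sketch; the script name progress.py and the function name run are hypothetical, and time.sleep only stands in for real work.

```python
import time


def run(steps: int = 5, delay: float = 1.0) -> int:
    """Simulate a long computation that reports its progress.

    run is a hypothetical helper name used for illustration.
    """
    total = 0
    for step in range(1, steps + 1):
        time.sleep(delay)  # stand-in for real work
        total += step
        # flush=True writes the line to slurm-<jobid>.out immediately
        # instead of waiting for Python's output buffer to fill.
        print(f"step {step}/{steps} done, total={total}", flush=True)
    return total


if __name__ == "__main__":
    run()
```

While the job is running, "cat slurm-<jobid>.out" then shows the steps completed so far.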

Annex: Additional guides

Deleting a virtual environment

There are two ways (deactivate the environment first if it is active):
A. Deleting the folder

 >> rm -rf ~/.conda/envs/projectname/

B. Using conda functions

 >> conda env remove -n projectname

See list of virtual environments

 >> conda env list 
# conda environments: 
# 
proyecto /home/usuaris/sitsc/usr/.conda/envs/proyecto 
proyecto2 /home/usuaris/sitsc/usr/.conda/envs/proyecto2 
base * /opt/conda

Create a new environment with the same packages as projectname

With the projectname environment (the template) activated, export its configuration with:

(projectname) usr@sunsetc01:~$ conda env export > environment.yml

This file can be used to create a copy of the environment for another user (usr2) or on another machine. From the folder that contains environment.yml, run:

usr2@sunsetc01:~$ conda env create -f environment.yml

To create a second environment with the same characteristics for the same user on the same computer, edit the file first:

Change the name to a new one. For example: name: projectname2
Change the prefix to a new one. For example: prefix: ~/.conda/envs/projectname2

Finally, create and activate the new environment:

usr@sunsetc01:~$ conda env create -f environment.yml
usr@sunsetc01:~$ conda activate projectname2 
(projectname2) usr@sunsetc01:~$
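Instead of editing the two lines by hand, you could script the change. A minimal sketch using only the Python standard library, assuming the layout that "conda env export" produces (top-level name: and prefix: lines, with the prefix ending in the environment name). It writes a modified copy and leaves the original file untouched; the function name rename_environment and the output file name are hypothetical.

```python
import re
from pathlib import Path


def rename_environment(src: str, dst: str, new_name: str) -> str:
    """Copy an exported environment.yml to dst with a new name and prefix.

    rename_environment is a hypothetical helper name used for illustration.
    """
    text = Path(src).read_text()
    # Replace the top-level "name:" line (dependency lines are indented).
    text = re.sub(r"^name:.*$", lambda m: f"name: {new_name}", text, flags=re.MULTILINE)

    def new_prefix(match):
        # Keep the parent folder (e.g. .../.conda/envs) and swap the last part.
        old = match.group(1).strip().rstrip("/")
        parent = old.rsplit("/", 1)[0]
        return f"prefix: {parent}/{new_name}"

    text = re.sub(r"^prefix:(.*)$", new_prefix, text, flags=re.MULTILINE)
    Path(dst).write_text(text)
    return text
```

For example, rename_environment("environment.yml", "environment2.yml", "projectname2") would write environment2.yml, which you would then use with "conda env create -f environment2.yml".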

Contact	usd.utgcnticupc.edu
Address	UPC Campus Nord, C/Jordi Girona 1-3, Buildings D3-D4-D5, 08034 Barcelona (SPAIN)
Telephone	+34 934017486