How to: HPC2N

Starting off with HPC2N

The HPC2N is a high-performance computing resource hosted at the Umeå University. It is part of the National Academic Infrastructure for Supercomputing in Sweden (NAISS). As all NAISS infrastructure, it is open to all researchers within Sweden free of charge. The cluster guide will give you a good overview of what the system is.

HPC2N Training & Drop-in

HPC2N offers regular training to improve usage of their systems. For a regularly updated list, please see here.

Applying for an account at SUPR NAISS

In order use any NAISS resource, you need to apply for an account at SUPR NAISS.

You will also use this page to log on when appying for projects.

Applying for projects at HPC2N

Here is the link to the official documentation on how to get access to HPC2N. If you don’t have an account at HPC2N, you need to be added to an existing project, or apply for a project then request an account.

The following is a small rundown of this information based on our experience.

More often than not, small compute and small storage projects are enough for the bioinformatics projects we see at SLU. You must apply for a storage project alongside a compute project if you are planning to do any analyses.

The administrators at NAISS approve projects at least once a week, but often more frequently.

When you have logged in to SUPR, there will be a list of options in the panel on the left, select ‘Resources’, scroll down to ‘Kebnekaise’, and click on ‘Go to HPC2N Local Compute 2025’ to get to the application. From there you will ‘Create a New Proposal for HPC2N Compute 2025’.

From here you can apply for computing and storage projects.

Computing Projects

You will be redirected to apply for a compute project based on the above instructions. From there, you can also apply for a small storage project if the linked storage project is not large enough for your project.

Storage Projects

When applying for a separate storage project, you will have to request the size of the project folder as well as the number of files you will be storing.

As a general rule of thumb:

  • For folder size: multiply the size of the data you are uploading by 5-7, depending on the analysis you will be doing. If the size of your files decreases with each step of your pipeline (and you remove large intermediate files during your workflow) as with variant calling for example, a factor of 5 is usually more than enough. If you are performing multiple genome alignments from de novo assemblies, for example, your file sizes don’t decrease significantly between each step of the pipeline, then you should multiply by a higher factor of 7 (or higher, depending on what you will be doing).

  • For number of files: multiply the number of files you are starting out with by 1000. If you are using Nextflow workflows, this number may have to be a bit higher (but should not exceed 5000) as the number of intermediate files produced here is higher due to parallelisation.

In short, consider the analyses you will be doing, what kind of data you are producing, and how big the files are. With these two very general rules, it should be easier to decide on these parameters when applying for projects!

When your storage and compute projects have been approved, you will be able to upload data from your computer, or download from data delivery systems. If you are downloading from SciLifeLab, please see their guide.

Your approved projects will be located under /proj/nobackup where xyz are the project identifiers provided to you by NAISS, or based on the folder name you selected when applying for the project. Upload your project data into the storage project folder. DO NOT upload your data to your user home directory as that directory is only intended for things you will use for every project and space is limited.

Loading modules

HPC2N uses modules. This ensures that the path is not flooded, and allows for multiple versions of software to be installed simultaneously.

  • To see all available modules type module avail
  • If you want to see the modules that match your search term type module spider tool_name. This is not case specific, so you could search “fastqc” to see all modules that match the string “fastqc” (without the inverted commas!)
  • To load a module type module load tool_name from the module spider results. This is case specific, so be sure to

Slurm

Dardel uses a job manager called Slurm, and they have put a how-to onto their homepage. The following is simply a short overview of the most commonly used functions by SLUBI, and not an exhaustive list:

To use Slurm, the project ID from your compute project is used for the -A flag. Your storage project ID is where you store data. Here, the / and spaces in the name are replaced by - as you can see below.

More usefull flags and options:

  • Time flags are in the format of HH:MM:SS
  • To run an interactive session use interactive -A naiss202x-xy-xyz -t 2:00:00 to launch an interactive session of 2 hours. Time can be changed according to your needs
  • If you need to run a batch script, set it up so that the information in the HPC2N guide is in the header with your project number. You submit your script with the sbatch script.sh command
  • To view the jobs active for your user squeue -u your_username
  • To cancel a job scancel jobnumber. You can get the jobnumber from the squeue command above.

Containers

HPC2N readily supports the use of containers through the use of apptainer. Please make use of this way to make your research more reproducible and transferable

Nextflow

HPC2N supports the use of Nextflow.