Introduction to Scientific Computing

TJ Langford, Wright Lab & YCRC

September 28, 2021

Scientific Computing

Scientific Computing

  • Scientific Computing is a rapidly growing multidisciplinary field that uses advanced computing capabilities to understand and solve complex problems

  • Ranges from laptop-based data acquisition to highly parallel super-computers

  • How we define it: nearly everything we do here

Scientific Computing at Wright Lab

Types of Computing Support

  • Personal devices (laptops/desktops)
  • Workstations (Data Acquisition or GPU tower)
  • Servers (compute environments or file servers)
  • Software and data management

Educational GitHub

Wright Lab Servers

  • WLab houses and maintains multiple computing environments for researchers:
    • wright, rubin, mgm, strickland, etc.
    • Each has 20+ CPUs and between 50 and 100 TB of storage
  • Linux environment (CentOS 7) designed for compatibility with YCRC’s high-performance computing
  • Used as file servers and computational environments for the whole lab

Container building platform

  • Modern tools like Docker and Singularity enable software to be “containerized”
    • Enables code to be run on different systems without recompiling
  • YCRC Containers Tutorial
  • WLab hosts a dedicated server for building containers
  • Email the WLab computing team for access and help

Key-based SSH logins

  • To maintain security, logins require ssh-keys

    ssh-keygen -t rsa -b 4096 -C "your_email@example.com"
  • This produces a key-pair: two files id_rsa and id_rsa.pub

  • The private key (id_rsa) is kept secret on your computer, and the public key (id_rsa.pub) is placed on the server
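
  • To add your public key to a machine you can already log in to with a password, ssh-copy-id is a common approach (a minimal sketch; the hostname is a placeholder). For WLab servers, you instead send the public key along with your account request (see “Requesting an account” below):

    ssh-copy-id -i ~/.ssh/id_rsa.pub <netID>@<server>
    # equivalently, append the contents of id_rsa.pub to ~/.ssh/authorized_keys on the server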

Example public key

ssh-rsa  AAAAB3NzaC1yc2EAAAADAQABAAABAQCmffiWouCCBhfyL7tG5rZUIS
QbPbjm3T8HDm4okH54pDwb4r0zTRQ8BviOvGnqW7BoloBRcCk4FjIuG3L3bmidIpT
izq3GLIV/u3S6fjMnTt8iMr5/I4lKLTTwyarOqblZncIYfvEhlMG/
iF/Bcu+IL0oZ0QvRbdeD6IrRtQCwgSAqzVC0tBTRh1GuOayYQfuq5
64B8Zm2mWqLfhgvcu4y0Z2Ifs47xZKRjGoz8Cipc95EJtJXtUI+IGJ
GQcYMIz1REdyEADagqXYMCzoC1Hjt4DxCmnM1aws7S4T8dT4NQHMta/LntwXAspv
9oJqHH0+ovTti+obZ7+xEUFQiwwt langford@yellowtail

Example private key

-----BEGIN OPENSSH PRIVATE KEY-----
b3BlbnNzaC1rZXktdjEAAAAABG5vbmUAAAAEbm9uZQAAAAAAAAABAAACFwAAAAdzc2gtcn
NhAAAAAwEAAQAAAgEAu7+TPw3JMC/+QKcV2JBQDqLJtne4x6pXk1KAeowAdN3hiT75dNE/
guEkNgBqwSl+CIhZGnOKNB2lSO1CHKKM1CHiSE7+3L1lpN/2wV2S2AzBhD6XneoapUKhHL
qwGDBTXPw51ywa5S6aqpr7thIJnZZbVM8jvo770hB6aFpEBmVo3fi1Emx06gJvqcf3PrBA
ZOLaIK4EECjg2+qZ8HEB5jfs166qGADaj/yXfse0dFlWcjZ9Y11uGtUIEY36pxhvXeBPSH
umBKKkVYtJ7k9eso6de29HQpPcX/ceZ/oHQklOxvqRjHH0u4HF2IzCdNTIqixBGw9VziOT
u/3jLpPV630iVcfxme8M+1WmX389IBznXs0IhUTXyRjtSwL1l/u0g/CzHy8mFJ/DQd2czd
Vu8qkJ4aDV5MDKayDCUvZlKQaeEIjaQ7O+OzcyLEDioGIBU0ULHYSItt4+VWYHYKEP3R03
GEX+KJI+dH3f12gYP54Tn5HHsp1NQEMXlPbH6xD0enI5WP1gKgAmF0YbPJ9cGnmCKAoydn
T+TVmCLnP0Wf3NIyGwdT5e1FrBj+ejFrLZeHqtRphLcwtk7wa4/EaVA0p6btWv5klmMX4A
fL8jKxuXV170I9bdIRK75BBE/QGzUNWWIJbWjY0CCzaEl20X5q73sRg5UVI775wqh4qliT
cAAAdIFh8T6xYfE+sAAAAHc3NoLXJzYQAAAgEAu7+TPw3JMC/+QKcV2JBQDqLJtne4x6pX
k1KAeowAdN3hiT75dNE/guEkNgBqwSl+CIhZGnOKNB2lSO1CHKKM1CHiSE7+3L1lpN/2wV
2S2AzBhD6XneoapUKhHLqwGDBTXPw51ywa5S6aqpr7thIJnZZbVM8jvo770hB6aFpEBmVo
3fi1Emx06gJvqcf3PrBAZOLaIK4EECjg2+qZ8HEB5jfs166qGADaj/yXfse0dFlWcjZ9Y1
1uGtUIEY36pxhvXeBPSHumBKKkVYtJ7k9eso6de29HQpPcX/ceZ/oHQklOxvqRjHH0u4HF
2IzCdNTIqixBGw9VziOTu/3jLpPV630iVcfxme8M+1WmX389IBznXs0IhUTXyRjtSwL1l/
u0g/CzHy8mFJ/DQd2czdVu8qkJ4aDV5MDKayDCUvZlKQaeEIjaQ7O+OzcyLEDioGIBU0UL
HYSItt4+VWYHYKEP3R03GEX+KJI+dH3f12gYP54Tn5HHsp1NQEMXlPbH6xD0enI5WP1gKg
AmF0YbPJ9cGnmCKAoydnT+TVmCLnP0Wf3NIyGwdT5e1FrBj+ejFrLZeHqtRphLcwtk7wa4
/EaVA0p6btWv5klmMX4AfL8jKxuXV170I9bdIRK75BBE/QGzUNWWIJbWjY0CCzaEl20X5q
73sRg5UVI775wqh4qliTcAAAADAQABAAACAHCgIaKHkJL5l1oNYUuCdqPw/3QYKZ6NDu/v
Y+cfqP5yQ+NjBZ4QEDtg96n1YhTx4QsZT+pQOS2+QvKWcTxgPn7avLWHvdeJPjpDp/CjQ3
2bWVMNgUJXtxg/+goT66L3CmsTW1c6u/+Tj3CcfDbiZyZDlhIwGE8t0t5WyDdlPr1fhCL0
GRsuOIxQXc/JhwXyEQ70Dsnf2cwf2ZPTDflwsST7k3Zm7t5rVLFfYznIbvYEyZjGGz3KRE
yQUsDFHQGz5Mq5zpW6pwLM1iwC3Jy6vCH873emb2b+8AoZYgIZuNUIyNsQYNORYLBCzv5y
ogwXwTBkT8H91wlLyknRqcC9KmuMfsO8tZtMCkE2p0qQMRjBPJpAwz7RE00KbSaqO7ddct
hlc9n2F23VhYtKvsUhgGDAVMR02TUoOMeJ1hdyNav1wvmtSV2wHZrUfRRYyz/ty4dDLLOQ
/x5eCnCJt/gGIKteCEvOAuGS6NDeYFCZh7ehtLYAytJ+5GV0CTz/j0nyyDhyFFMtGjy5p+
eoZGKN24Xz6xu+iUdd+6MGXm/gBvtVlpjGFjNR/wIfF6dTIlrO1O5fBPrU3jACCmik7d+0
0MBhaTj9va3rSqSnxdeZ9SAG9h19X0T/eiGcGyusq0zZvWatV6t/o0pLaIOjrnP//XmWza
wp1X1NVth8/XnMLBTRAAABAF9cxGFdTvsEH9FcrLKmrlFy+bmOYivfu3oUjtLgGErxEZkK
Xqc0iOA++CJT6B0SEPEXi7zgcvpH6oFVlJKeJe8lb37fxSorRZGoA3GVFJ14qyOSQst7o5
73uso6TYtqM5zff2nmbcKASQx+RUYKtJCb6oxJCCXD5jOPiYf5R0Ngw2auhfNcQFPvHFCM
UIiwJT+ZEjxDAmkYmbQpzMQm7ERB39+0r/7lHVz3IrW+f1O4UDWYndc2vNKoubEF+Yqs+H
ueQ8rGVK3pE8q0aBbIpyIIK/XY5Kjkt87ZGYep8B3YFl+7vjRUOWDt9gFZbJs2GaTeUENA
D5aLT+qQEhvLT9EAAAEBAOKckzHfsHvwx8tob77pIcFCquVmELVZwaVRsB1ZCfrd7iFXMR
U6L7yQF/8GDjiKD5jwrnz9B6vilGoiGKOiWI3gHxQnajLebOEKJ/qtcS5HYjxQ/fvtuDq3
n2gqMRsmwJWKRIzpGubCo0je5MLKxHsBEVhznTbFunAHiQoGlJcXUMUZLLtt0SbCI8/+KR
wCvFEhxgk8ZPN+M8duBH2K8/H+8/tXimbL9i5xmshVKoXUKG/UPf2wOc/XaQ996d9v0Nle
8+0RzzAkQcesNnu2aeoGBkNmKSe8yGYLUzocn8q5epy6AKJfeoNitGsQkSMwoc561shwbP
A7UUG7ydU+xIUAAAEBANQYwM4xHaZIJaAKLH7cp4rKxfjret/v0Dips8aJDF/Q7x9II26n
xPleoVSvJ7GXZXlei0IuQfbZH7fg6A2Fhn1b8jWh/bYySizLdE1Lim/VJ0joBpHfpRh/J1
Hf1L/s0446YP+PZb+GKMFZQLjB6kuIpzgPT9nlvpPRGRYTkggGlQNFJqlKcCljhpWQj6Zr
pAtug+6QyVJ+Gcq/Mq9nH6oKINiAG8UAEWFYSiwzGEc5tfajyDesoUz9BldaTDjbeiJiPx
K3uO766ElEtzQDcqzfLjVZUsuTNIvW+RpAG9tXjNSn6/hhSWfQptJrjGZ3jretZcTXQzNQ
D+eodJW0EYsAAAATbGFuZ2ZvcmRAeWVsbG93dGFpbA==
-----END OPENSSH PRIVATE KEY-----

Requesting an account

  • Email the WLab computing team for help with access to Wright Lab systems
  • We will ask for:
    • Your netID, your public key, and which group/PI you are working with
  • We will then generate an account on one of the servers and help you login

Installing software

  • The WLab computing team also assists with installing software
  • Includes centrally maintained systems or laptops/workstations
  • Reach out to the WLab team and we will work out how best to help

Contacts and General Help

WLab Computing Contact:

Slack #computing channel

Thomas Langford: WLC254, YCRC225

Data Storage and Movement

Data Storage and Movement

  • Experiments at WLab are generating huge volumes of data
  • Need to be able to efficiently move and manage these data between institutions
  • We provide a few tools that can help with this:
    1. Globus
    2. Google TeamDrive
    3. Storage@Yale Archive

Globus

  • Globus is a cross-institutional platform for data transfer
  • Utilizes GridFTP underneath, but manages all authentication via institutional CAS
  • Web and command-line interfaces for scheduling and monitoring transfers
  • Schedule TB+ size data transfers which run in the background
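
  • For scripted transfers, the Globus command-line client can start and monitor transfers (a sketch only; the endpoint IDs and paths below are placeholders):

    globus login                      # one-time authentication in the browser
    globus transfer --recursive --label "WLab data" \
        <SRC_ENDPOINT_ID>:/data/run042 <DST_ENDPOINT_ID>:/archive/run042
    globus task list                  # check the status of submitted transfers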

Google TeamDrive

  • Through EliApps, all Yale researchers are provided unlimited Google Drive storage
  • Additionally, Google TeamDrive allows for experiments or projects to have shared storage
    • Can include non-Yale collaborators
    • Manage access individually (Read-only, Read/Write, Admin)
  • Globus Connector for Google Drive (link)
  • Currently not enabled by default; email to request access

Storage@Yale Archive

  • Storage@Yale has an archive tier for experiments that have concluded or want off-site backups of data
  • There is a cost per TB per year, but this ensures that your data are safe
  • Warning: Not high-speed, not intended for data you need to access often

Yale Center for Research Computing

Yale Center for Research Computing

  • Mission: Support and enable research across campus
    • Manage and maintain HPC Systems
    • Consult with researchers for individual issues
    • Workshop series focused on a range of topics
  • Independent center of ~15 staff members
  • Located at 160 Saint Ronan St (just behind WLab)

What is a cluster?

What is a cluster?

  • Many rack-mounted computers, called nodes
  • Login node controls access and provides users a place to launch jobs
    • Never run jobs on the login node; it hurts others’ ability to log in
  • Large, distributed file system for parallel access

What is a cluster?

West Campus Data Center (photo)

Grace cluster

  • Most WLab people will work on the Grace cluster
  • 27,000+ CPUs across 900+ nodes
  • 4+ PB high-performance parallel storage
  • 90 Data-center quality GPUs
  • Ideal for “high-throughput” parallel processing
  • Globus endpoint to enable large transfers of data

Requesting an account

  • Accounts are free to all Yale researchers
  • Request an account at http://research.computing.yale.edu/account-request
    • Include your PI’s name
    • An email will be sent with details of setting up ssh-keys
  • Access from a terminal: ssh <netID>@grace.hpc.yale.edu (see the optional ssh config sketch after this list)
  • NB: must be on Yale’s network or connected via VPN
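
  • Optionally, an entry in ~/.ssh/config shortens the login command (a sketch; “grace” is just a chosen alias and the key path assumes the default name from ssh-keygen):

    Host grace
        HostName grace.hpc.yale.edu
        User <netID>
        IdentityFile ~/.ssh/id_rsa

    With this in place, ssh grace is all that is needed.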

SLURM and submitting “jobs”

  • Unlike a workstation, the cluster uses a scheduler (SLURM) to manage how programs are executed
  • SLURM does the hard work of figuring out how to efficiently pack jobs into nodes
  • Jobs are submitted from the command line using interactive or batch interfaces
  • The cluster is separated into partitions for different types of jobs

Partitions

The cluster is divided into separate groups of nodes based on the types of jobs we want to run on them (see the sinfo example after this list)

  • day: limited to 24 hours of wall-time, users limited to 1,000 CPUs
  • week: longer run time, but fewer cores per user
  • mpi: specific group of identical nodes for highly-parallel computation (like galaxy simulations)
  • scavenge: special partition for high-throughput independent jobs, 10,000 CPUs per user
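
To see which nodes belong to each partition and their current state, sinfo can be run from a login node (standard SLURM tooling, shown for illustration):

    sinfo -p day,week,mpi,scavenge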

Available software

  • The YCRC team installs software that can be broadly used by researchers.
    • Compilers, general libraries, languages, etc.
    • Will also install specific applications if they are widely used or sufficiently complicated to install by a user
  • These can be viewed by running module avail on the cluster:

Available software

[tl397@grace2 ~]$ module avail

---------------------- /gpfs/loomis/apps/avx/modules/base ----------------------
   miniconda/4.5.12    miniconda/4.7.10 (D)

---------------------- /gpfs/loomis/apps/avx/modules/bio -----------------------
   AFNI/2019.0.24
   BamTools/2.5.1-foss-2018a
   Bowtie2/2.3.4.1-foss-2018a
   Chimera/1.12-linux_x86_64
   FSL/5.0.10-centos7_64
   FSL/6.0.0-centos7_64                   (D)
   FastQC/0.11.7-Java-1.8.0_92
   FreeSurfer/5.3.0-HCP
   GROMACS/5.1.4-foss-2016b-hybrid
   GROMACS/2016.5-intel-2018a
   GROMACS/2019.3-foss-2018a              (D)
   IGV/2.5.0-Java-1.8.0_92
   MRtrix3/3.0_RC3-foss-2018a
   MrBayes/3.2.6-foss-2016b
   PLINK/1.90-beta6.9
   Rosetta/3.10
   SAMtools/1.7-foss-2018a
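
Loading a module adds that software to your environment for the current session; module list shows what is currently loaded (illustrative, using an entry from the listing above):

    module load GROMACS/2019.3-foss-2018a
    module list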

Installing your own software

  • We encourage users to try installing specific applications themselves
  • Your project directory is perfect for this (a worked example follows this list):
    • ./configure --prefix=$HOME/project/
    • cmake -DCMAKE_INSTALL_PREFIX=$HOME/project/
  • Don’t hesitate to ask for help or guidance!
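
A worked example (sketch only; the package name is hypothetical and the PATH line assumes the install creates a bin/ directory):

    tar -xzf some-package-1.0.tar.gz && cd some-package-1.0
    ./configure --prefix=$HOME/project
    make -j4 && make install
    # make the installed binaries visible in future shells
    echo 'export PATH=$HOME/project/bin:$PATH' >> ~/.bashrc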

Python environments

  • Python users should make use of the miniconda module

    module load miniconda
    conda create --name my_env python=3.7 numpy matplotlib
    source activate my_env
  • Additional packages can be installed into an existing environment:

    (my_env)[tl397@grace1]:~$ conda install scipy
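
  • Environments can also be listed and exported for reproducibility (standard conda commands, shown for illustration):

    conda env list                        # show all environments
    conda env export > environment.yml    # record the packages in the active environment
    conda env create -f environment.yml   # recreate it elsewhere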

Running jobs

Interactive jobs

  • Best for prototyping code where immediate feedback is needed

  • srun command:

    srun --pty --x11 -c 4 --mem-per-cpu=5G -p interactive bash
  • This requests 4 CPUs, 20GB of RAM, sets up X11 forwarding, and requests that the job run on the interactive partition

  • Good for long code compilations: request 4 CPUs and run make -j4

Example interactive job:

tl397@grace2:~$ srun --pty --x11 -p interactive -t 1:00:00 bash
[tl397@c01n01 ~]$ module load miniconda
[tl397@c01n01 ~]$ source activate my_env
(my_env) [tl397@c01n01 ~]$ python my_script.py

Job is “submitted” from the login node (grace2) and then “run” on a compute node (c01n01)

Open OnDemand

  • Web portal to the clusters called Open OnDemand
  • Launch interactive GUI-based jobs (Remote Desktop, Jupyter Notebook, etc.)
  • Visit ood-grace.hpc.yale.edu and log in with your Yale credentials
  • Great way to quickly get started with the clusters

Batch jobs

  • Many jobs don’t require user intervention/input
  • Can be submitted to run whenever SLURM can fit them in
  • Start with a script:
    1. Specifies resources
    2. Runs commands
    3. Collects outputs

Example batch script:

#!/bin/bash
#SBATCH --job-name=Example
#SBATCH -t 1:00:00 # 1 hour
#SBATCH --mem-per-cpu=10g
#SBATCH --cpus-per-task=5
#SBATCH --partition=day

module load miniconda
source activate my_env
python myscript.py
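
Assuming the script above is saved as example.sh (the filename and job ID are illustrative), it is submitted with sbatch and SLURM replies with the assigned job ID:

[tl397@grace2 ~]$ sbatch example.sh
Submitted batch job 29438864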

Managing batch jobs: squeue

squeue: lists jobs that are currently running or waiting to run

tl397@grace2:~$ squeue -u tl397
JOBID     PARTITION       NAME       USER    ST     TIME   TIME_LIMIT  NODELIST(REASON)     NODES   CPUS    MIN_MEMORY
29438864  interactive     Example    tl397   R      0:06      1:00:00  c01n01               1        5      10G

Managing batch jobs: sacct

sacct: details of past jobs, useful for debugging why a job failed to complete

[tl397@c01n01 ~]$ sacct -j 29432699
               JobID    JobName      User  Partition        NodeList    Elapsed      State ExitCode     MaxRSS                        AllocTRES
-------------------- ---------- --------- ---------- --------------- ---------- ---------- -------- ---------- --------------------------------
29432699                   bash     tl397 interacti+          c01n01   00:21:31  COMPLETED      0:0              billing=4,cpu=4,mem=20G,node=1
29432699.extern          extern                               c01n01   00:21:31  COMPLETED      0:0       940K   billing=4,cpu=4,mem=20G,node=1
29432699.0                 bash                               c01n01   00:21:31  COMPLETED      0:0     30524K             cpu=4,mem=20G,node=1

Summary

Summary

  • YCRC and WLab provide many resources for scientific computing
    • We’ve only scratched the surface with this tutorial
  • Going forward, we will be providing more topical tutorials
    • Data Analysis, Containers, Intro to GPUs, and more
  • YCRC hosts tutorials and bootcamps every few weeks

How to get help:

Website and documentation:

research.computing.yale.edu

docs.ycrc.yale.edu

Help Email:

YCRC Users Group listserv:

YCRC Users Group

  • YCRC hosts a monthly meeting for users from across campus
  • First Wednesday of each month at 4pm, at YCRC (160 Saint Ronan St)
  • Today’s Topic: Introduction to Neural Networks and Machine Learning by Chase Shimmin
  • Food and refreshments are provided!