Introduction to Python for Data Analysis and Automation in Biology

Technical University of Denmark

August 21-23, 2023

09:00-16:00

Instructors: Kai Blin, Alberto Delgado, Marjan Mansourvar

Helpers: Angel Luu Phanthanourak, Judit Szenei

General Information

Software Carpentry aims to help researchers get their work done in less time and with less pain by teaching them basic research computing skills. This hands-on workshop will cover basic concepts and tools, including program design, version control, data management, and task automation. Participants will be encouraged to help one another and to apply what they have learned to their own research problems.

For more information on what we teach and why, please see our paper "Best Practices for Scientific Computing".

Who: The course is aimed at graduate students and other researchers. You don't need to have any previous knowledge of the tools that will be presented at the workshop.

Where: DTU Lyngby Campus, Building 202, Room 8003. Get directions with OpenStreetMap or Google Maps.

When: August 21-23, 2023. Add to your Google Calendar.

Requirements: Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on. They should have a few specific software packages installed (listed below).

Accessibility: We are committed to making this workshop accessible to everybody. The workshop organizers have checked that:

Materials will be provided in advance of the workshop, and large-print handouts are available if you notify the organizers in advance. If we can help make learning easier for you (e.g. sign-language interpreters, lactation facilities), please get in touch (using the contact details below) and we will attempt to provide them.

Contact: Please email kblin@biosustain.dtu.dk for more information.

Roles: To learn more about the roles at the workshop (who will be doing what), refer to our Workshop FAQ.


Code of Conduct

Everyone who participates in Carpentries activities is required to conform to the Code of Conduct. This document also outlines how to report an incident if needed.


Schedule

Day 1

09:00 Automating Tasks with the Unix Shell
10:20 Morning break
10:35 Automating Tasks with the Unix Shell (Continued)
12:30 Lunch break
13:30 Version Control with Git
14:30 Afternoon break
14:45 Version Control with Git (Continued)
15:45 Wrap-up
16:00 END

Day 2

09:00 Introduction to Python
10:30 Morning break
10:45 Introduction to Python (Continued)
12:00 Lunch break
13:00 Introduction to Python (Continued)
14:30 Afternoon break
14:45 Introduction to Python (Continued)
15:45 Wrap-up
16:00 END

Day 3

09:00 Introduction to Pandas
10:30 Morning break
10:45 Visualizations with Altair
12:00 Lunch break
13:00 Introduction to Machine Learning with Scikit Learn
14:30 Afternoon break
14:45 Introduction to Machine Learning with Scikit Learn (Continued)
15:30 Wrap-up and Outlook
16:00 END

Setup

To participate in a Software Carpentry workshop, you will need access to software as described below. In addition, you will need an up-to-date web browser.

We maintain a list of common installation issues on the Configuration Problems and Solutions wiki page. It is written as a reference for instructors, but you may find it useful as well.

The Bash Shell

Bash is a commonly used shell that gives you the power to do tasks more quickly.

macOS

The default shell in some versions of macOS is Bash, and Bash is available in all versions, so there is no need to install anything. You access Bash from the Terminal (found in /Applications/Utilities). See the Git installation video tutorial for an example of how to open the Terminal. You may want to keep Terminal in your dock for this workshop.

To see if your default shell is Bash, type echo $SHELL in Terminal and press the Return key. If the message printed does not end with '/bash', then your default is something else and you can run Bash by typing bash.

If you want to change your default shell, see this Apple Support article and follow the instructions on "How to change your default shell".

Linux

The default shell is usually Bash and there is usually no need to install anything.

To see if your default shell is Bash, type echo $SHELL in a terminal and press the Enter key. If the message printed does not end with '/bash', then your default is something else and you can run Bash by typing bash.
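Once Python is installed (see below), the same check can be sketched in a few lines of Python. This is only an illustration of the rule described above: it reads the SHELL environment variable, which is what echo $SHELL prints, and tests whether it ends with '/bash'. The function name is_bash is made up for this example.

```python
import os


def is_bash(shell_path: str) -> bool:
    """Return True if the given shell path ends with '/bash'."""
    return shell_path.endswith("/bash")


if __name__ == "__main__":
    # SHELL is the variable that `echo $SHELL` prints; it may be unset
    # on some systems, in which case we fall back to an empty string.
    shell = os.environ.get("SHELL", "")
    if is_bash(shell):
        print(f"Default shell is Bash: {shell}")
    else:
        print(f"Default shell is {shell or 'unknown'}; type 'bash' to start Bash.")
```

A path such as /bin/bash or /usr/local/bin/bash counts as Bash, while /bin/zsh does not.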

Text Editor

When you're writing code, it's nice to have a text editor that is optimized for writing code, with features like automatic color-coding of keywords. The default text editor on macOS and Linux is usually set to Vim, which is not famous for being intuitive. If you accidentally find yourself stuck in it, press the Esc key, followed by :+Q+! (colon, lower-case 'q', exclamation mark), then press Return to return to the shell.

In this course, we'll be using nano, which comes pre-installed, and Visual Studio Code, which you will need to install yourself.

Python

Python is a popular language for research computing, and great for general-purpose programming as well. Installing all of its research packages individually can be a bit difficult, so we recommend mambaforge.

Regardless of how you choose to install it, please make sure you install Python version 3.x (e.g., 3.9 is fine).
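After installation, one way to confirm that you are running a Python 3 interpreter is to ask Python itself. The snippet below is a minimal sketch; the helper name is_python3 is made up for this example.

```python
import sys


def is_python3(version_info=sys.version_info) -> bool:
    """Return True if the major version number is 3."""
    return version_info[0] == 3


if __name__ == "__main__":
    # sys.version_info holds (major, minor, micro, ...) for the running interpreter.
    major, minor = sys.version_info[0], sys.version_info[1]
    print(f"Running Python {major}.{minor}")
    if not is_python3():
        print("This workshop needs Python 3.x; please check your installation.")
```

Typing python3 --version in the terminal gives the same information without running a script.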

Windows (WSL2)

  1. In the WSL2 terminal, type
    cd ~
    and press Enter (or Return depending on your keyboard).
  2. Type
    wget "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
    and press Enter.
  3. Type
    bash Mambaforge-$(uname)-$(uname -m).sh
    and then press Enter. Follow the text-only prompts. To move through the text, press Spacebar. Type yes and press Enter to approve the license. Press Enter (or Return) to approve the default location for the files. Type yes and press Enter (or Return) to prepend mamba to your PATH (this makes the mamba distribution the default Python).
  4. Close the terminal window and re-start the Terminal app.
  5. Type
    conda config --set auto_activate_base false
    and press Enter to stop conda from auto-activating.
  6. Close the terminal window and re-start the Terminal app one more time.
  7. Type the following commands, pressing Enter after each line:
    conda config --add channels defaults
    conda config --add channels bioconda
    conda config --add channels conda-forge
    conda config --set channel_priority strict
    mamba activate base
    mamba update mamba
    mamba update --all
  8. Lastly, to install Jupyter Lab and all dependencies, type the following commands, pressing Enter after each line:
    mamba create -n jupyter python=3
    mamba activate jupyter
    mamba install numpy pandas scikit-learn matplotlib jupyter jupyterlab altair pip git
  9. To check your install, type
    explorer.exe .
    in your WSL terminal and press Enter to open an Explorer window in your WSL home directory.
  10. Download the check_install.py script and copy it into your WSL home directory.
  11. In the WSL terminal window, run
    python3 check_install.py
    If it prints
    All dependencies installed!
    you're good to go.
Also see our YouTube tutorials for installing miniconda and installing jupyter.
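The installer file name in step 2 is not typed out in full: the shell builds it from the output of uname (the operating system name) and uname -m (the processor architecture). The sketch below shows the same substitution in Python, assuming that platform.system() and platform.machine() report the same values as the two uname calls, which is usually the case on Linux and macOS. The function names are made up for this example, and the sketch only builds the text of the URL; it does not check that a matching file exists.

```python
import platform

# Base URL taken from the wget command in step 2.
BASE_URL = "https://github.com/conda-forge/miniforge/releases/latest/download"


def installer_name(system: str, machine: str) -> str:
    """Build the file name the way the shell expands $(uname) and $(uname -m)."""
    return f"Mambaforge-{system}-{machine}.sh"


def installer_url(system: str, machine: str) -> str:
    """Combine the base URL and the installer file name."""
    return f"{BASE_URL}/{installer_name(system, machine)}"


if __name__ == "__main__":
    # Print the URL that the wget command would request on this machine.
    print(installer_url(platform.system(), platform.machine()))
```

For example, a system reporting Linux and x86_64 would give the file name Mambaforge-Linux-x86_64.sh.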
macOS

  1. In your Terminal app, type
    cd ~
    and press Enter (or Return depending on your keyboard).
  2. Type
    wget "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
    and press Enter.
  3. Type
    bash Mambaforge-$(uname)-$(uname -m).sh
    and then press Enter. Follow the text-only prompts. To move through the text, press Spacebar. Type yes and press Enter to approve the license. Press Enter (or Return) to approve the default location for the files. Type yes and press Enter (or Return) to prepend mamba to your PATH (this makes the mamba distribution the default Python).
  4. Close the terminal window and re-start the Terminal app.
  5. Type
    conda config --set auto_activate_base false
    and press Enter to stop conda from auto-activating.
  6. Close the terminal window and re-start the Terminal app one more time.
  7. Type the following commands, pressing Enter after each line:
    conda config --add channels defaults
    conda config --add channels bioconda
    conda config --add channels conda-forge
    conda config --set channel_priority strict
    mamba activate base
    mamba update mamba
    mamba update --all
  8. Lastly, to install Jupyter Lab and all dependencies, type the following commands, pressing Enter after each line:
    mamba create -n jupyter python=3
    mamba activate jupyter
    mamba install numpy pandas scikit-learn matplotlib jupyter jupyterlab altair pip git
  9. To check your install, download the check_install.py script.
  10. In the Terminal window, run
    python3 ~/Downloads/check_install.py
    If it prints
    All dependencies installed!
    you're good to go.
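Step 7 adds three channels one after another. Because conda config --add places each new channel at the top of the channel list, the channel added last (conda-forge) ends up with the highest priority. The sketch below is a simplified model of that behaviour, not conda's own code: the function add_channel is made up for this example, and it assumes the channel list starts out empty.

```python
def add_channel(channels, name):
    """Mimic `conda config --add channels NAME`: put NAME at the top of the list.

    If NAME is already present, it is moved to the top rather than duplicated.
    """
    return [name] + [c for c in channels if c != name]


# Assumption: no channels are configured before step 7.
channels = []
for name in ["defaults", "bioconda", "conda-forge"]:
    channels = add_channel(channels, name)

# Highest priority first.
print(channels)  # ['conda-forge', 'bioconda', 'defaults']
```

With channel_priority set to strict, packages are taken from the highest-priority channel that provides them, so the order of the three commands matters.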
Linux

  1. In a terminal window, type
    cd ~
    and press Enter (or Return depending on your keyboard).
  2. Type
    wget "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
    and press Enter.
  3. Type
    bash Mambaforge-$(uname)-$(uname -m).sh
    and then press Enter. Follow the text-only prompts. To move through the text, press Spacebar. Type yes and press Enter to approve the license. Press Enter (or Return) to approve the default location for the files. Type yes and press Enter (or Return) to prepend mamba to your PATH (this makes the mamba distribution the default Python).
  4. Close the terminal window and re-start the terminal.
  5. Type
    conda config --set auto_activate_base false
    and press Enter to stop conda from auto-activating.
  6. Close the terminal window and re-start the terminal one more time.
  7. Type the following commands, pressing Enter after each line:
    conda config --add channels defaults
    conda config --add channels bioconda
    conda config --add channels conda-forge
    conda config --set channel_priority strict
    mamba activate base
    mamba update mamba
    mamba update --all
  8. Lastly, to install Jupyter Lab and all dependencies, type the following commands, pressing Enter after each line:
    mamba create -n jupyter python=3
    mamba activate jupyter
    mamba install numpy pandas scikit-learn matplotlib jupyter jupyterlab altair pip git
  9. To check your install, download the check_install.py script.
  10. In the terminal window, run
    python3 ~/Downloads/check_install.py
    If it prints
    All dependencies installed!
    you're good to go.
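If you are curious what a check like this might involve, the sketch below shows one possible approach. It is not the workshop's check_install.py, whose contents are not shown here; it is a minimal illustration that assumes the check amounts to testing whether each package from step 8 can be found by Python. The module list and the function name find_missing are assumptions made for this example.

```python
import importlib.util

# Import names (not always the same as package names: scikit-learn is
# imported as "sklearn"). This list is an assumption based on step 8.
REQUIRED_MODULES = ["numpy", "pandas", "sklearn", "matplotlib", "altair", "jupyterlab"]


def find_missing(modules):
    """Return the names from `modules` that Python cannot locate."""
    missing = []
    for name in modules:
        try:
            # find_spec returns None when a top-level module is not installed.
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All dependencies installed!")
```

Run a script like this with the jupyter environment active, since that is where step 8 installs the packages.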