Introduction to Python for Data Science

An introductory lesson centered around data science tasks in Python.

This lesson contains three episodes that focus on:

  1. Manipulating tabular data with Pandas
  2. Visualizing data with Pandas and Altair
  3. Common tasks in machine learning
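
To give a flavor of the first episode, here is a minimal sketch of working with tabular data in Pandas. The table contents and the column names (sample, group, yield_mg) are made up for illustration and are not part of the lesson material; the lesson itself works with its own files.

```python
import pandas as pd

# Hypothetical example data; not taken from the lesson material.
df = pd.DataFrame(
    {
        "sample": ["a", "b", "c", "d"],
        "group": ["control", "control", "treated", "treated"],
        "yield_mg": [1.0, 3.0, 4.0, 6.0],
    }
)

# Overview of the table: dimensions and column names.
n_rows, n_cols = df.shape
columns = list(df.columns)

# Subset: keep only the rows of one group, using a boolean mask.
treated = df[df["group"] == "treated"]

# Simple statistics: overall mean and per-group means of one column.
overall_mean = df["yield_mg"].mean()
group_means = df.groupby("group")["yield_mg"].mean()

print(df.head())
print(overall_mean)
print(group_means)
```

The same pattern (inspect, subset, summarize) carries over to data frames read from files, for example with pd.read_csv.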

Prerequisites

We assume that you have participated in the previous lessons on days 1 and 2 of the workshop "Getting Started with Python for Data Science and Automation in Biotechnology". That means we expect you to be familiar with file paths on the shell and to have a basic understanding of Python.

Schedule

       Setup                                     Download files required for the lesson
00:00  1. Reading and Working with Tabular Data  What is a Pandas data frame?
                                                 How do I get an overview of my tabular data?
                                                 How do I read tabular data in different formats?
                                                 How do I access subsets of a data frame?
                                                 How do I calculate simple statistics like the mean?
01:30  2. Morning Break                          Break
01:45  3. Visualize Your Data                    What tools exist to plot data in Python?
                                                 How do I make a basic plot?
                                                 What visuals are available?
                                                 How can I best visualize groups of data?
03:00  4. Lunch Break                            Break
04:00  5. Machine Learning                       What is machine learning?
                                                 What kind of tasks can I apply machine learning to?
                                                 How do I perform machine learning with scikit-learn?
                                                 How do I interpret my results?
05:30  6. Afternoon Break                        Break
05:45  7. Machine Learning (continued)           Break
06:30  8. Wrap-up and Outlook                    Break
07:00  Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.