Introduction to Python for Data Science: Glossary

Key Points

Reading and Working with Tabular Data
  • Use pandas.read_* and pandas.DataFrame.to_* to import / export data.

  • The method info summarizes the data frame object: its index, column names, data types, and non-null counts.

  • The method describe summarizes value distributions in columns.

  • You can use labels or index locations to select both subsets and elements from your data frame.

  • Selecting with Boolean conditions lets you filter rows by their values.

  • Selections can return views on your original data rather than copies; call copy() before modifying a selection.

  • Use mean, max, min, and others to calculate simple statistics.

  • Use split-apply-combine to calculate statistics within groups in a data frame.
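
The points above can be sketched in a few lines of pandas. This is a minimal illustration, not the lesson's own code: the small table and its column names (species, island, mass_g) are hypothetical and inlined so the example runs without any external file.

```python
import io

import pandas as pd

# Hypothetical data, inlined so the example is self-contained.
csv_text = """species,island,mass_g
adelie,torgersen,3750
adelie,biscoe,3800
gentoo,biscoe,5000
gentoo,biscoe,5400
chinstrap,dream,3500
"""

# Import: pandas.read_csv is one member of the pandas.read_* family.
df = pd.read_csv(io.StringIO(csv_text))

df.info()                # prints column names, dtypes, and non-null counts
summary = df.describe()  # count, mean, std, and quartiles of numeric columns

# Select a single element by label (loc) or by index location (iloc).
first_mass = df.loc[0, "mass_g"]
same_mass = df.iloc[0, 2]

# Select rows with a condition; copy before modifying the selection
# so the original data frame is left untouched.
heavy = df[df["mass_g"] > 4000].copy()
heavy["mass_kg"] = heavy["mass_g"] / 1000

# Simple statistics on one column.
overall_mean = df["mass_g"].mean()

# Split-apply-combine: split by species, apply mean, combine the results.
group_mean = df.groupby("species")["mass_g"].mean()

# Export: DataFrame.to_csv returns a string when no path is given.
csv_out = df.to_csv(index=False)
```

Here groupby performs the split, mean is the applied function, and the resulting Series indexed by species is the combined output.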

Visualize Your Data
  • Pandas provides quick ways to create simple visualizations.

  • A layered grammar of graphics implementation provides a structured approach to plotting.

  • A good implementation can make expressing complex visualizations straightforward.
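
As a rough illustration of the first point, the sketch below uses the plotting interface built into pandas, which draws with matplotlib. The measurements and column names are hypothetical, and the non-interactive Agg backend is chosen only so the script runs without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; no window is opened

import pandas as pd

# Hypothetical measurements, made up for this example.
df = pd.DataFrame(
    {
        "length_mm": [181, 186, 195, 210, 217, 222],
        "mass_g": [3750, 3800, 3250, 4500, 5000, 5400],
    }
)

# One call produces a scatter plot and returns the matplotlib Axes.
ax = df.plot.scatter(x="length_mm", y="mass_g", title="Mass against length")
ax.set_xlabel("Length (mm)")
ax.set_ylabel("Mass (g)")

# Render the figure to an in-memory PNG instead of a file on disk.
buffer = io.BytesIO()
ax.get_figure().savefig(buffer, format="png")
png_bytes = buffer.getvalue()
```

Because the call returns an ordinary matplotlib Axes, labels and titles can be adjusted afterwards with the usual matplotlib methods.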

Machine Learning
  • Preparing data in the right format is often the hardest task.

  • Machine learning provides methods for tasks such as dimensionality reduction, clustering, classification, and anomaly detection.

  • Having good visualizations is crucial for interpretation.
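
Two of the tasks named above, dimensionality reduction and clustering, can be sketched with scikit-learn. The choice of library, the synthetic data, and the parameter values are assumptions made for illustration; the scaling step stands in for the data preparation that the first point describes.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for this sketch): two well-separated
# groups of 50 samples, each sample with 4 features.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.5, size=(50, 4))
group_b = rng.normal(loc=5.0, scale=0.5, size=(50, 4))
X = np.vstack([group_a, group_b])

# Preparation: put every feature on a comparable scale.
X_scaled = StandardScaler().fit_transform(X)

# Dimensionality reduction: project the 4 features onto 2 components.
X_2d = PCA(n_components=2).fit_transform(X_scaled)

# Clustering: assign each sample to one of two clusters.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_2d)
```

The two-column array X_2d is what one would typically plot, colored by labels, to interpret the result.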

Glossary

FIXME