Machine learning educational resources (wip)
# Quick utility to embed the videos below
from IPython.display import YouTubeVideo

def embed_video(index, playlist='PLeFIaIQF2TkB04NMOWoj3vyBa58LdoRLe'):
    # Parts are numbered from 1 in the text; the playlist index is zero-based
    return YouTubeVideo('', index=index - 1, list=playlist, width=600, height=350)
Part 1: Loading and Visualizing Data
In this video, I introduce the dataset, and use the Jupyter notebook to download and visualize it.
embed_video(1)
Relevant Resources:
- Fremont Bridge Bike Counter: the website where you can explore the data
- A Whirlwind Tour of Python: a book introducing the Python programming language, aimed at scientists and engineers
- Python Data Science Handbook: a book introducing Python's data science tools, including an introduction to the IPython, Pandas, and Matplotlib tools used here
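As a rough illustration of the loading step described in Part 1, here is a minimal sketch using pandas. The inline sample data, the column names (`Date`, `West`, `East`), and the helper name `load_counts` are assumptions made for this example; they are not taken from the video or from the real counter export.

```python
import io
import pandas as pd

# Hypothetical sample shaped like an hourly bike-counter export.
# The column names are assumptions, not the real dataset's schema.
SAMPLE_CSV = """Date,West,East
2017-01-02 07:00:00,50,120
2017-01-02 08:00:00,80,210
2017-01-02 17:00:00,190,70
2017-01-03 08:00:00,75,200
"""

def load_counts(source):
    """Read hourly counts, index them by timestamp, and add a Total column."""
    data = pd.read_csv(source, index_col="Date", parse_dates=True)
    data["Total"] = data["West"] + data["East"]
    return data

counts = load_counts(io.StringIO(SAMPLE_CSV))

# Aggregate the hourly rows into one row per calendar day
daily = counts.resample("D").sum()
```

In a notebook with Matplotlib installed, calling `daily.plot()` would draw one line per column; `load_counts` should also accept a file path or URL in place of the in-memory buffer.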
Part 2: Further Data Exploration
In this video, I do some slightly more sophisticated visualization with the data, using matplotlib and pandas.
embed_video(2)
Relevant Resources:
- Pivot Tables Section from the Python Data Science Handbook
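One kind of exploration that fits Part 2 is a pivot table with the time of day as rows and the date as columns, so that each column is one day's hourly profile. The sketch below uses synthetic data and an assumed `Total` column; it illustrates the technique and does not reproduce the notebook from the video.

```python
import numpy as np
import pandas as pd

# One week of synthetic hourly timestamps (hypothetical data, illustration only)
hours = pd.Timestamp("2017-01-02") + pd.to_timedelta(np.arange(7 * 24), unit="h")

# Deterministic fake counts: they depend on the hour and on the weekday
values = np.asarray(hours.hour) * 10 + np.asarray(hours.dayofweek)
data = pd.DataFrame({"Total": values}, index=hours)

# Rows: time of day, columns: calendar date, cells: mean count
pivoted = data.pivot_table("Total", index=data.index.time, columns=data.index.date)
```

With Matplotlib available, `pivoted.plot(legend=False, alpha=0.1)` would overlay every day's profile in a single figure, which makes distinct daily patterns easy to spot.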
Part 3: Version Control with Git & GitHub
In this video, I set up a repository on GitHub and commit the notebook into version control.
embed_video(3)
Relevant Resources:
- Version Control With Git: excellent novice-level tutorial from Software Carpentry
- GitHub Guides: set of tutorials on using GitHub
- The Whys and Hows of Licensing Scientific Code: my 2014 blog post on AstroBetter
Part 4: Working with Data and GitHub
In this video, I refactor the data download script so that it only downloads the data when needed.
embed_video(4)
Relevant Resources:
- How To Package Your Python Code: broad tutorial on Python packaging.
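The "download only when needed" refactoring mentioned in Part 4 can be sketched with the standard library alone. The function name `get_data`, its parameters, and the default filename are hypothetical choices for this example, not necessarily those used in the video.

```python
import os
from urllib.request import urlretrieve

def get_data(url, filename="data.csv", force_download=False):
    """Download `url` to `filename` unless a local copy already exists.

    Hypothetical helper: pass force_download=True to refresh the local copy.
    """
    if force_download or not os.path.exists(filename):
        urlretrieve(url, filename)
    return filename
```

Calling `get_data(url)` a second time returns immediately because the file is already on disk, which keeps repeated notebook runs fast and avoids repeated requests to the data server.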
Relevant Resources:
- Pytest Documentation
- Getting Started with Pytest: a nice tutorial by Jacob Kaplan-Moss
Relevant Resources:
- Python strftime reference
- Pandas Datetime Section from the Python Data Science Handbook
Part 9: Further Data Exploration: PCA and GMM
In this video, I apply unsupervised learning techniques to the data to explore what we can learn from it.
embed_video(10)
Relevant Resources:
- Principal Component Analysis In-Depth from the Python Data Science Handbook
- Gaussian Mixture Models In-Depth from the Python Data Science Handbook
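To make the PCA and GMM step in Part 9 concrete, here is a minimal sketch with scikit-learn on synthetic data. Each row stands for one day and each column for one hour; the two "day types" and all of the numbers are invented for illustration and are not the Fremont Bridge data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
hours = np.arange(24)

# Two synthetic daily profiles (assumed shapes, for illustration only):
# a two-peak "commute" day and a single-peak "leisure" day
commute = (np.exp(-0.5 * ((hours - 8) / 1.5) ** 2)
           + np.exp(-0.5 * ((hours - 17) / 1.5) ** 2))
leisure = np.exp(-0.5 * ((hours - 14) / 3.0) ** 2)

# 50 commute-like days followed by 20 leisure-like days, with small noise
X = np.vstack([
    100 * commute + rng.normal(0, 2, size=(50, 24)),
    100 * leisure + rng.normal(0, 2, size=(20, 24)),
])

# Reduce each 24-dimensional day to 2 principal components
X2 = PCA(n_components=2).fit_transform(X)

# Fit a two-component Gaussian mixture and assign each day to a cluster
gmm = GaussianMixture(n_components=2, random_state=0).fit(X2)
labels = gmm.predict(X2)
```

On real data the clusters are rarely this clean; comparing `labels` against the day of the week is one way to check whether the clusters have an interpretable meaning.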
Part 10: Cleaning Up the Notebook
In this video, I clean up the unsupervised learning analysis to make it more reproducible and presentable.
embed_video(11)
Relevant Resources:
- Learning Seattle's Work Habits from Bicycle Counts: a blog post using Fremont Bridge data