Daily Schedule Part 2 (Actual — Kept Retrospectively)
See also Daily Schedule - Part 1
Part 2: Data Science Foundations (using Joel Grus, Data Science from Scratch, 2nd Edition)
Part 2 uses Grus and lasts for the remaining four weeks of Term 6
Week 4 — Yet Another Review of Python — Some Vector and Matrix Algebra — Statistics and Probability
- June 4 — Chapters 1-3: Mostly redundant but excellent review of Python and Matplotlib — Review the three chapters, but completely stop using Jupyter or JupyterLab, and instead get everything working in PyCharm Professional Edition (free for students) — When Grus says that you should not be tampering with your base Python environment, he is completely correct, so learn how to make a venv that you could call grus or dsfs and then switch to it (see the venv sketch after this list)
- June 7 — Chapters 4-6: Linear Algebra, Statistics, and Probability (if you took last fall’s Bayesian Statistics class, much of the math in Chapters 5 and 6 will be review) — A sketch of the book’s list-based vector style also follows this list
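To make the venv advice concrete, here is a minimal sketch using Python’s standard-library venv module; the environment name dsfs is just a suggestion, and running python -m venv dsfs in a terminal accomplishes the same thing.

```python
# Minimal sketch: create a virtual environment named "dsfs" (the name is
# just a suggestion) with its own pip. Activate it afterwards with
# `source dsfs/bin/activate` (macOS/Linux) or `dsfs\Scripts\activate` (Windows).
import venv

venv.create("dsfs", with_pip=True)
```

In PyCharm, point the project interpreter at the new environment so the IDE and your terminal agree on which Python they are using.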
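As a taste of the Chapter 4 material, here is a sketch of the book’s approach of treating vectors as plain lists of floats; the function names follow the book’s spirit but are not guaranteed to match its code exactly.

```python
from typing import List

Vector = List[float]  # Chapter 4 represents vectors as plain lists of floats

def add(v: Vector, w: Vector) -> Vector:
    """Add corresponding elements."""
    assert len(v) == len(w), "vectors must be the same length"
    return [v_i + w_i for v_i, w_i in zip(v, w)]

def dot(v: Vector, w: Vector) -> float:
    """Sum of componentwise products: v_1*w_1 + ... + v_n*w_n."""
    assert len(v) == len(w), "vectors must be the same length"
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

assert add([1, 2], [3, 4]) == [4, 6]
assert dot([1, 2, 3], [4, 5, 6]) == 32
```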
Week 5 — Optimization (aka Minimization and Maximization) — Working with Data
- June 11 — Chapters 7 and 8: Hypothesis and Inference and Gradient Descent — Make a local repo from the magic hexijin.github.io GitHub repo, put an index.md file in it, and then push to origin main — The only remaining step to having your own home page is to enable GitHub Pages in this repo — For more advanced reading, Grus recommends this Overview of Gradient Descent by Sebastian Ruder — A minimal gradient-descent sketch follows this list
- June 15 — Chapters 9 and 10: Getting Data and Working with Data (including rescaling data sets by subtracting the mean and dividing by the standard deviation, and a load of utilities for principal component analysis that Grus introduces somewhat too rapidly at the end of Chapter 10) — A rescaling sketch also follows this list
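Here is a minimal gradient-descent sketch in the spirit of Chapter 8, minimizing the sum of squares; the starting point, step size, and iteration count are illustrative choices rather than the book’s.

```python
from typing import List

Vector = List[float]

def gradient_step(v: Vector, gradient: Vector, step_size: float) -> Vector:
    """Move step_size along the gradient from v."""
    return [v_i + step_size * g_i for v_i, g_i in zip(v, gradient)]

v = [3.0, -2.0]                        # arbitrary starting point
for _ in range(1000):
    grad = [2 * v_i for v_i in v]      # gradient of f(x, y) = x**2 + y**2
    v = gradient_step(v, grad, -0.01)  # negative step size means descend

assert all(abs(v_i) < 1e-6 for v_i in v)  # we end up at the minimum, (0, 0)
```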
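And a rescaling sketch for Chapter 10; Grus’s own rescale works column by column on a whole data matrix, so this single-column version is just the core idea.

```python
from statistics import mean, stdev
from typing import List

def rescale(column: List[float]) -> List[float]:
    """Return the column rescaled to mean 0 and standard deviation 1."""
    mu, sigma = mean(column), stdev(column)
    return [(x - mu) / sigma for x in column]

heights = [63.0, 67.0, 70.0]            # made-up data in inches
rescaled = rescale(heights)
assert abs(mean(rescaled)) < 1e-9       # mean is now 0
assert abs(stdev(rescaled) - 1) < 1e-9  # standard deviation is now 1
```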
Week 6 — Machine Learning — Linear Regression
- June 19 — Chapters 11 and 13: Machine Learning and Naive Bayes (you may need to pick up some material from Chapter 12 on k-Nearest Neighbors, which we are otherwise skipping) — A toy Naive Bayes sketch follows this list
- June 21 — Chapters 14 and 15: Simple Linear Regression and Multiple Regression — In Chapter 15, Grus squeezes in a digression on The Bootstrap, a computational approach not just to estimating parameters but to estimating the uncertainties in those parameters — A bootstrap sketch also follows this list
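For the Naive Bayes meeting, here is a toy sketch of the Chapter 13 spam-filter idea: combine smoothed per-word probabilities under the naive independence assumption, working in log space to avoid underflow. The word counts and smoothing constant k are invented for illustration.

```python
import math
from typing import Dict, Set

# Pretend training data: how many spam/ham messages contained each word.
spam_counts: Dict[str, int] = {"viagra": 8, "meeting": 1}
ham_counts: Dict[str, int] = {"viagra": 1, "meeting": 9}
n_spam, n_ham, k = 10, 10, 0.5  # message counts and smoothing constant

def spam_score(words: Set[str]) -> float:
    """P(spam | words), assuming words are independent given the class."""
    log_spam = log_ham = 0.0
    for word in words:
        log_spam += math.log((spam_counts.get(word, 0) + k) / (n_spam + 2 * k))
        log_ham += math.log((ham_counts.get(word, 0) + k) / (n_ham + 2 * k))
    p_spam, p_ham = math.exp(log_spam), math.exp(log_ham)
    return p_spam / (p_spam + p_ham)

assert spam_score({"viagra"}) > 0.5 > spam_score({"meeting"})
```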
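And a sketch of The Bootstrap digression in Chapter 15: resample the data with replacement many times, recompute the statistic each time, and read the uncertainty off the spread of those estimates. The simulated data here is invented for illustration.

```python
import random
from statistics import mean, stdev
from typing import Callable, List

def bootstrap_statistic(data: List[float],
                        stats_fn: Callable[[List[float]], float],
                        num_samples: int) -> List[float]:
    """Evaluate stats_fn on num_samples resamples of data (with replacement)."""
    return [stats_fn(random.choices(data, k=len(data)))
            for _ in range(num_samples)]

random.seed(0)
data = [random.gauss(100.0, 15.0) for _ in range(200)]
means = bootstrap_statistic(data, mean, 1000)
print(f"mean = {mean(data):.1f} +/- {stdev(means):.1f}")  # spread ~ 15/sqrt(200)
```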
In the interest of getting to Neural Networks and Deep Learning in our final week, we are skipping Chapter 12 (on k-Nearest Neighbors), Chapter 16 (on Logistic Regression), and Chapter 17 (on Decision Trees)
Week 7 — Neural Networks — Deep Learning
- June 23 — Chapter 18: Neural Networks — A feed-forward sketch follows this week’s list
- June 25 (no meeting, but do the live coding session) — Get a feel for how a real pro codes, including type hinting, systematic adherence to style choices, and code testing, by building the code in PyCharm as Grus builds a deep learning library in VS Code — Pause the live coding session whenever you need to catch up with him, and fix the style errors that PyCharm’s linter catches but Grus’s runs of mypy miss — The live coding session is effectively a blindingly fast introduction to the same material as Chapters 18 and 19
- June 26 (final meeting) — Chapter 19: Deep Learning — Only up to and including the section titled “Softmaxes and Cross-Entropy” — A softmax and cross-entropy sketch follows this list
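For the Chapter 18 meeting and the live coding session, here is a feed-forward sketch written with the kind of type hints Grus insists on; the XOR network is the classic hand-built example, and the weights are the standard ones rather than necessarily the book’s exact numbers.

```python
import math
from typing import List

Vector = List[float]

def sigmoid(t: float) -> float:
    return 1 / (1 + math.exp(-t))

def neuron_output(weights: Vector, inputs: Vector) -> float:
    """Weighted sum of the inputs (the last weight is the bias), then sigmoid."""
    total = sum(w * x for w, x in zip(weights, inputs + [1.0]))
    return sigmoid(total)

# A two-layer network computing XOR: the hidden layer is an AND neuron
# and an OR neuron; the output neuron fires for "OR but not AND".
hidden_layer = [[20.0, 20.0, -30.0],   # AND neuron
                [20.0, 20.0, -10.0]]   # OR neuron
output_neuron = [-60.0, 60.0, -30.0]   # OR but not AND

def xor(x: float, y: float) -> float:
    hidden = [neuron_output(w, [x, y]) for w in hidden_layer]
    return neuron_output(output_neuron, hidden)

for x, y in [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]:
    print(x, y, round(xor(x, y)))  # prints the XOR truth table: 0, 1, 1, 0
```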
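Finally, a sketch of the ideas in the “Softmaxes and Cross-Entropy” section where the final meeting stops: softmax turns raw scores into probabilities, and cross-entropy measures how far those probabilities are from a one-hot target. The scores and target below are illustrative.

```python
import math
from typing import List

def softmax(scores: List[float]) -> List[float]:
    """Exponentiate and normalize, subtracting the max for numerical safety."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs: List[float], target: List[float]) -> float:
    """-sum of target * log(predicted probability) over the classes."""
    return -sum(t * math.log(p) for p, t in zip(probs, target) if t > 0)

probs = softmax([2.0, 1.0, 0.1])
assert abs(sum(probs) - 1.0) < 1e-12          # probabilities sum to 1
loss = cross_entropy(probs, [1.0, 0.0, 0.0])  # true class is the first one
print([round(p, 3) for p in probs], round(loss, 3))  # [0.659, 0.242, 0.099] 0.417
```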
See also Looking Beyond