See also Daily Schedule - Part 2
(1) Since Python’s classes are something we only made much use of toward the end, and they are often considered a fundamental part of an introduction to Python, you could work through a quick introduction to classes (aka “object-oriented programming”) such as Section 1.16 of Python Distilled by David Beazley. The three things generally considered important about object-oriented programming are encapsulation (controlled access to the object’s data), inheritance (behavior is inherited from superclasses in the class hierarchy and extended by subclasses), and polymorphism (behavior can be overridden in subclasses); a short code sketch illustrating all three appears after this list. Chapters 4 and 7 of Beazley are a much more complete introduction to object-oriented programming.
(2) Since I am only showing you how to use Git, not how the Git database and tools actually function, consider supplementing your understanding with Pragmatic Guide to Git by Travis Swicegood; a small sketch of Git’s content-addressed object store also appears below.
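As a companion to item (1), here is a minimal sketch, with made-up class names, of how encapsulation, inheritance, and polymorphism look in a few lines of Python:

    class Animal:
        def __init__(self, name: str):
            self._name = name          # leading underscore: data meant to be reached via the property below (encapsulation)

        @property
        def name(self) -> str:        # controlled, read-only access to the object's data
            return self._name

        def speak(self) -> str:
            return f"{self.name} makes a sound"

    class Dog(Animal):                 # inheritance: Dog reuses __init__ and name from Animal
        def speak(self) -> str:        # polymorphism: the subclass overrides the inherited behavior
            return f"{self.name} says woof"

    for animal in [Animal("generic"), Dog("Rex")]:
        print(animal.speak())          # each call dispatches to that object's own speak()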
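And as a taste of what item (2) means by how the Git database functions: Git stores each file as a “blob” in a content-addressed object store, named by the SHA-1 hash of a short header plus the file’s contents. Here is a minimal sketch (mine, not Swicegood’s) that reproduces the ID that git hash-object assigns:

    import hashlib

    def git_blob_id(content: bytes) -> str:
        """Compute the object ID Git assigns to `content` stored as a blob."""
        # Git hashes a header ("blob <size in bytes>\0") followed by the raw contents.
        store = b"blob " + str(len(content)).encode() + b"\0" + content
        return hashlib.sha1(store).hexdigest()

    # Should match: echo "hello" | git hash-object --stdin
    print(git_blob_id(b"hello\n"))     # ce013625030ba8dba906f756967f9e9ca394464a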
Although the chapter we finished with did not attempt to cover the 2017 “Attention is All You Need” paper, Grus has almost perfectly set you up for a junior-level course in natural-language processing and LLMs, the kind of course that almost every computer science department now offers or soon will, though such courses are currently typically graduate-level offerings. Before or instead of taking such a course, you can review or go beyond where we have gotten as of Chapter 19 of Grus in the following six (or limitless other) ways:
(1) A concise and mathematically sophisticated review of all that we have done (and then some) is “A high-bias, low-variance introduction to Machine Learning for physicists.”
(2) Starting at the 15:00 mark of this late-2023 video, “Doing Data Science in the Time of ChatGPT,” Grus considers how LLMs have changed and will continue to change the workflow of a data scientist. This is a casual survey that may only serve to cement what you have already discovered you can do with a current-generation LLM like Grok 3 or ChatGPT 4.5.
(3) Recapitulate what we have done and then look inside the mathematics and implementation of LLMs, without actually doing any more implementation, by watching the seven Deep Learning videos by 3Blue1Brown (Grant Sanderson). Sanderson’s visualizations are a joy to watch even when he is presenting something you already understand, but perhaps the first three in the series are not worth your time given how much we have learned from Grus. The fourth in the series (“Backpropagation Calculus”) will help cement the slick multi-variable calculus Grus is doing in Chapter 19. There is no substitute, though, for actually writing up the application of the multi-variable chain rule yourself; a small sketch of that exercise appears after this list. The final three in the series (“Transformers Explained Visually,” “Attention in Transformers,” and “How Might LLMs Store Facts?”) will definitely be new, and will give you at least a vague idea of how neural nets and deep learning are applied to create LLMs like Grok, Gemini, and ChatGPT.
(4) How this is going to affect industry after industry is anybody’s guess, but a recent and informed guess from venture capitalist Marc Andreessen can be heard in this late-2024 Lex Fridman interview of Marc Andreessen. (The link deliberately jumps you to a point over three hours into the interview.)
(5) Following up on a recommendation for further reading given at the end of Chapter 19 of Grus, consider the preliminary version of Deep Learning with Python, Third Edition by Chollet & Watson (the final version of the third edition is expected to appear in September 2025, but the early-access release already covers training LLMs).
(6) The Spring 2025 Stanford CS 336 lectures, “Language Modeling from Scratch,” are available on YouTube. Although this is a graduate-level course, if your calculus, statistics, and data science are solid, and your Python is at the level of Grus, the material is accessible to a junior (though you will have to work quite a bit harder than a graduate student who is fluent in Python would have to).
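Regarding the suggestion in item (3) that you write up the multi-variable chain rule yourself, here is a minimal sketch (not Grus’s code; the function names are made up) of the chain rule applied to a single sigmoid neuron with a squared-error loss, checked against a numerical derivative:

    import math

    def sigmoid(z: float) -> float:
        return 1 / (1 + math.exp(-z))

    def loss(w: float, b: float, x: float, y: float) -> float:
        """Squared error of a single sigmoid neuron on one training example."""
        return (sigmoid(w * x + b) - y) ** 2

    def dloss_dw(w: float, b: float, x: float, y: float) -> float:
        """Chain rule: dL/dw = (dL/da) * (da/dz) * (dz/dw)."""
        z = w * x + b
        a = sigmoid(z)
        dL_da = 2 * (a - y)          # derivative of (a - y)**2 with respect to a
        da_dz = a * (1 - a)          # derivative of the sigmoid with respect to z
        dz_dw = x                    # derivative of w*x + b with respect to w
        return dL_da * da_dz * dz_dw

    # Sanity check against a centered numerical derivative.
    w, b, x, y, h = 0.5, -0.1, 2.0, 1.0, 1e-6
    numeric = (loss(w + h, b, x, y) - loss(w - h, b, x, y)) / (2 * h)
    print(dloss_dw(w, b, x, y), numeric)   # the two values should agree closely

If the analytic and numerical values match, your write-up of the chain rule is almost certainly right; the same check scales up to every weight in the networks of Chapter 19.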