Preface
As a compiler writer for domain-specific cloud languages, I became interested in compiler implementations for domain-specific tensor languages (such as PyTorch 2) after the "software 3.0" unlock (programming with natural language) from large language models like ChatGPT. However, I became frustrated with the non-constructiveness and disjointedness of my learning experience in the discipline of machine learning systems — the book you are currently reading is my personal answer[^1] to these frustrations. It
- is inspired by the introductory computer science canon created by Schemers. SICP and its dual, HTDP, took you from counting to compilers in an unbroken, logical sequence which, although it has an informal, flânerie-like feel, provides a strong foundational edifice for computation. The more recent DCIC, spawned from its phylogenetic cousin PAPL, was created to cater to the shift toward data science and the table data structure. This book (aspirationally titled SITP) follows suit: it is an experimental CS2 for "software 2.0" that places a heavier focus on the statistical inference, numerical linear algebra, and low-level and parallel systems programming required for deep learning, taking the reader from counting to compilers for calculus.
- concerns itself with the low-level programming of deep learning systems. The capstone project `teenygrad` therefore involves programming against language, platform, and architecture specifics, with a `Python` core for user productivity and CPU/GPU-accelerated kernels implemented in `Rust` and `CUDA Rust` for their amenability to native acceleration. However, you are more than welcome to follow along with your own choice of host language implementation — for instance, feel free to swap out `Python` for `JavaScript`[^2], `Rust` for `C++`, etc.[^3]
With that said, the book has three parts:
- in part 1 you implement a multidimensional `Tensor` and accelerated `BLAS` kernels
- in part 2 you implement `.backward()` and accelerated `cuBLAS` kernels for the age of research
- in part 3 you implement a fusion compiler with an `OpNode` graph IR for the age of scaling (a rough sketch of the end result follows this list)
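To make the destination concrete, here is a minimal sketch of the kind of user-facing code the finished `teenygrad` should support by the end of the book. The constructor, the `requires_grad` flag, and the method names are illustrative assumptions about the API you will design, not a fixed specification:

```python
# Hypothetical end-state usage of teenygrad; names, signatures, and the
# requires_grad flag are illustrative assumptions, not a fixed API.
from teenygrad import Tensor

# Part 1: a multidimensional Tensor backed by accelerated BLAS kernels.
x = Tensor([[1.0, 2.0], [3.0, 4.0]])
w = Tensor([[0.5], [-0.5]], requires_grad=True)

# A small computation: a matrix multiply followed by a reduction.
y = (x @ w).sum()

# Part 2: reverse-mode autodiff; .backward() fills in w.grad by walking
# the recorded computation graph (with cuBLAS-backed kernels on GPU).
y.backward()
print(w.grad)

# Part 3: a fusion compiler would trace these same ops into an OpNode
# graph IR and fuse them into fewer, larger kernels before execution.
```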
If you empathize with some of my frustrations, you may benefit from the course too[^4]. Good luck on your journey. Are you ready to begin?
[^1]: As of the time of writing in 2026, I am writing this book with myself as the target audience. For a good critique of constructionist learning, refer to (Ames 2018).

[^2]: Or combine the productivity and performance with `Mojo`, for instance. The world is your oyster.

[^4]: And if not, I hope this book serves as a good counterexample to whatever you have in mind — for instance, Kevin Murphy's excellent encyclopedic treatment of the subject.