Build a Large Language Model (From Scratch) versus Natural Language Processing with Transformers.

Both show up on every "best" list. They're not competitors. They're a sequence. Here's which one to read first, and when.

Reviewed by Ashish Sheth · Updated April 2026
Option A
Build a Large Language Model (From Scratch)
Sebastian Raschka · 2024
Option B
Natural Language Processing with Transformers
Lewis Tunstall, Leandro von Werra, Thomas Wolf · 2022
Field · Option A · Option B
Author · Sebastian Raschka · Lewis Tunstall, Leandro von Werra, Thomas Wolf
Pages · 368 · 408
Published · 2024 · 2022
Publisher · Manning Publications · O'Reilly Media
Level · intermediate · intermediate
Amazon Rating · 4.5/5 (445) · 4.6/5 (257)
Goodreads Rating · 4.6/5 (313) · 4.39/5 (212)
Build a Large Language Model (From Scratch)
Strengths
+ Clear, step-by-step pedagogy that breaks down complex concepts into manageable pieces
+ Hands-on coding throughout: you build a working model on your laptop
+ Excellent diagrams and visual explanations alongside code
+ Companion GitHub repo has 91,000+ stars with bonus materials
Caveats
- Limited mathematical depth on why certain architectural choices exist
- Focuses only on GPT-style architecture, no coverage of alternatives
- Requires solid Python and basic ML knowledge to follow along
Natural Language Processing with Transformers
Strengths
+ Written by the team that built the Transformers library — definitive authority
+ Code-first with runnable Hugging Face examples throughout
+ Covers the full lifecycle: pretrain, fine-tune, evaluate, deploy
+ Excellent for engineers who want to actually ship NLP features
Caveats
- Predates the LLM/GPT-4 era, so the emphasis is on smaller fine-tuned models
- Some library APIs have evolved since publication
- Less coverage of generative LLMs than the title implies for 2026 readers
The verdict
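Choose based on what you need. Build a Large Language Model (From Scratch) walks you through coding a GPT-style model yourself, so it is the pick for understanding how these models work. Natural Language Processing with Transformers covers the Hugging Face workflow from pretraining through deployment, so it is the pick for engineers who want to ship NLP features.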
Frequently asked
Which is better, Build a Large Language Model (From Scratch) or Natural Language Processing with Transformers?
Neither is better across the board. Build a Large Language Model (From Scratch) suits readers who want to understand GPT-style models by building one. Natural Language Processing with Transformers suits engineers who want to fine-tune, evaluate, and deploy models with Hugging Face.
Do I need a GPU to follow along?
Not for Build a Large Language Model (From Scratch). The model you build is small enough to train on a regular laptop CPU. That's intentional. The goal is understanding, not training a production model.
Is NLP with Transformers still relevant given how fast LLMs evolved?
The transformer architecture chapters and Hugging Face workflows are still the standard. What's outdated: emphasis on BERT-era fine-tuning over GPT-style prompting. Pair with a current LLM book.