Build a Large Language Model (From Scratch) versus Natural Language Processing with Transformers.

Both show up on every "best" list. They're not competitors. They're a sequence. Here's which one to read first, and when.

Reviewed by Ashish Sheth · Updated April 2026

Option A

Build a Large Language Model (From Scratch)

Sebastian Raschka · 2024

Option B

Natural Language Processing with Transformers

Lewis Tunstall, Leandro von Werra, Thomas Wolf · 2022

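| | Build a Large Language Model (From Scratch) | Natural Language Processing with Transformers |
|---|---|---|
| Author | Sebastian Raschka | Lewis Tunstall, Leandro von Werra, Thomas Wolf |
| Pages | 368 | 408 |
| Published | 2024 | 2022 |
| Publisher | Manning Publications | O'Reilly Media |
| Amazon Rating | 4.5/5 (445) | 4.6/5 (257) |
| Goodreads Rating | 4.6/5 (313) | 4.39/5 (212) |

Level: intermediate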

Build a Large Language Model (From Scratch)

Strengths

+ Clear, step-by-step pedagogy that breaks down complex concepts into manageable pieces

+ Hands-on coding throughout: you build a working model on your laptop

+ Excellent diagrams and visual explanations alongside code

+ Companion GitHub repo has 91,000+ stars with bonus materials

Caveats

− Limited mathematical depth on why certain architectural choices exist

− Focuses only on GPT-style architecture, no coverage of alternatives

− Requires solid Python and basic ML knowledge to follow along

Natural Language Processing with Transformers

Strengths

+ Written by the team that built the Transformers library, so the library guidance comes first-hand

+ Code-first with runnable Hugging Face examples throughout

+ Covers the full lifecycle: pretrain, fine-tune, evaluate, deploy

+ Excellent for engineers who want to actually ship NLP features

Caveats

− Predates the LLM/GPT-4 era, so the emphasis is on smaller fine-tuned models

− Some library APIs have evolved since publication

− Less coverage of generative LLMs than the title implies for 2026 readers

The verdict

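Choose based on what you need to learn. Build a Large Language Model (From Scratch) teaches how a GPT-style model works by having you implement one yourself. Natural Language Processing with Transformers teaches how to apply transformer models with the Hugging Face library across the full lifecycle: pretraining, fine-tuning, evaluation, and deployment.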

Frequently asked

Which is better, Build a Large Language Model (From Scratch) or Natural Language Processing with Transformers?

Do I need a GPU to follow along with Build a Large Language Model (From Scratch)?

No. The model you build is small enough to train on a regular laptop CPU. That's intentional. The goal is understanding, not training a production model.
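To make the "small enough for a laptop" point concrete, here is a rough parameter-count sketch for a GPT-style decoder. The configuration below (50,257-token vocabulary, 1,024-token context, 768-dimensional embeddings, 12 layers, output head tied to the token embedding) is an assumption chosen to match a GPT-2-small-sized model; this page does not state the exact size of the model the book builds, and `count_gpt_params` is a hypothetical helper written for this illustration.

```python
def count_gpt_params(vocab_size, context_length, emb_dim, n_layers, qkv_bias=True):
    """Count trainable parameters in a GPT-2-style decoder.

    Assumes the output head shares weights with the token embedding,
    so it adds no parameters of its own.
    """
    token_emb = vocab_size * emb_dim
    pos_emb = context_length * emb_dim

    # Attention: Q, K, V projections plus the output projection (with bias).
    bias = emb_dim if qkv_bias else 0
    attn = 3 * (emb_dim * emb_dim + bias) + (emb_dim * emb_dim + emb_dim)

    # Feed-forward: expand to 4x the embedding width, then project back.
    hidden = 4 * emb_dim
    ffn = (emb_dim * hidden + hidden) + (hidden * emb_dim + emb_dim)

    # Two layer norms per block, each with a scale and a shift vector.
    norms = 2 * (2 * emb_dim)

    per_block = attn + ffn + norms
    final_norm = 2 * emb_dim
    return token_emb + pos_emb + n_layers * per_block + final_norm


# Assumed GPT-2-small-sized configuration (not taken from this page).
params = count_gpt_params(vocab_size=50257, context_length=1024,
                          emb_dim=768, n_layers=12)
weights_mib = params * 4 / 2**20  # 4 bytes per float32 weight
print(f"{params:,} parameters, about {weights_mib:.0f} MiB of float32 weights")
```

Under these assumptions the weights alone take well under 1 GiB. Training needs a few times that for gradients and optimizer state, plus activations, which is still within reach of ordinary laptop RAM, though CPU training will be slow.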

Is NLP with Transformers still relevant given how fast LLMs evolved?

The transformer architecture chapters and Hugging Face workflows still hold up. What's dated is the emphasis on BERT-era fine-tuning over GPT-style prompting, so pair it with a current LLM book.