nano-gpt-z

A controlled study of catastrophic forgetting in small language models. A 17M-parameter GPT is trained from scratch on 1B tokens of standard English, then fine-tuned on Gen Z slang across eight data regimes, with WikiText-103 perplexity and style shift measured at every step. Key finding: perplexity rises 47x after just 1k tokens of fine-tuning, while style acquisition plateaus at 18–20% regardless of data volume.
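The forgetting curve is just the base model's WikiText-103 perplexity re-measured after each fine-tuning checkpoint. Below is a minimal sketch of such a probe, assuming a PyTorch GPT-style model whose forward pass returns next-token logits and a simple tokenizer with an `encode` method; the names, block size, and interface are illustrative rather than the repo's actual API.

```python
# Minimal perplexity probe sketch (assumed interface, not the repo's actual API):
# `model(input_ids)` returns next-token logits of shape (batch, seq, vocab),
# `tokenizer.encode(text)` returns a list of token ids.
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def wikitext_perplexity(model, tokenizer, text: str,
                        block_size: int = 256, device: str = "cpu") -> float:
    """Average next-token cross-entropy over non-overlapping blocks, exponentiated."""
    model.eval()
    ids = torch.tensor(tokenizer.encode(text), dtype=torch.long, device=device)
    total_nll, total_tokens = 0.0, 0
    for start in range(0, ids.numel() - 1, block_size):
        chunk = ids[start : start + block_size + 1]
        if chunk.numel() < 2:
            break
        inputs = chunk[:-1].unsqueeze(0)   # (1, T)
        targets = chunk[1:].unsqueeze(0)   # (1, T), shifted by one
        logits = model(inputs)             # (1, T, vocab)
        nll = F.cross_entropy(logits.view(-1, logits.size(-1)),
                              targets.view(-1), reduction="sum")
        total_nll += nll.item()
        total_tokens += targets.numel()
    return math.exp(total_nll / total_tokens)
```

Running this on the same held-out WikiText-103 slice after every fine-tuning regime yields the perplexity trajectory reported above.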

View on GitHub → · Read the Paper →

may be slow on first load · may talk gibberish occasionally