Expanding My Vocabulary to a Million Words

Published 16 Jul. 2025.

I often come across new words, look them up in a dictionary, and promptly forget them. I’ve been using Anki to learn Mandarin with my HSK stroke order deck, and I wanted an easy way to use the same approach for English. The existing decks I found were too small (they didn’t contain the words I wanted to learn) and lacked detail (I find etymology very handy for understanding what a word means), so I decided to make my own.

Wiktionary is a collaborative dictionary with incredibly detailed entries for over 1.2 million English words. The data is freely available from kaikki.org as raw JSONL under the CC BY-SA 4.0 and GFDL licenses. I’ve written anki-wiktionary-english-dictionary to transform this data into Anki flashcards. Each card includes definitions, IPA pronunciation, etymology, audio pronunciation, word forms, and hyphenation (for syllable breaks). I’ve taken the top 500K Wiktionary words, ranked by frequency in the Google Books Ngram Viewer dataset. You can download the deck from AnkiWeb if you don’t want to build it yourself. The code should also be useful for doing the same in other languages, or for building cross-language decks from Wiktionary’s translation data.
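To give a feel for the transformation, here is a minimal Python sketch of turning JSONL entries into card fields. It is not the code from anki-wiktionary-english-dictionary. The function names (entry_to_card, cards_from_jsonl) are made up for illustration, the field names (word, pos, senses, glosses, sounds, ipa, etymology_text, forms) are my assumption about the kaikki.org schema and should be checked against a real dump, and the sample entry is hand-written rather than copied from Wiktionary.

```python
import json


def entry_to_card(entry: dict) -> dict:
    """Flatten one dictionary entry into the text fields of a flashcard.

    The keys read here are assumed from the kaikki.org JSONL format;
    missing keys fall back to empty values instead of raising.
    """
    definitions = [
        gloss
        for sense in entry.get("senses", [])
        for gloss in sense.get("glosses", [])
    ]
    ipa = [s["ipa"] for s in entry.get("sounds", []) if "ipa" in s]
    forms = [f["form"] for f in entry.get("forms", []) if "form" in f]
    return {
        "word": entry.get("word", ""),
        "pos": entry.get("pos", ""),
        "definitions": definitions,
        "ipa": ipa,
        "etymology": entry.get("etymology_text", ""),
        "forms": forms,
    }


def cards_from_jsonl(lines, wanted):
    """Yield a card for every JSONL line whose word is in `wanted`.

    `lines` is any iterable of strings (for example an open file) and
    `wanted` is a set of words, such as the most frequent N from a
    separate frequency list.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        if entry.get("word") in wanted:
            yield entry_to_card(entry)


# Hand-written sample entry, for illustration only.
sample_lines = [
    json.dumps({
        "word": "anathema",
        "pos": "noun",
        "senses": [{"glosses": ["Something that is intensely disliked."]}],
        "sounds": [{"ipa": "/əˈnæθəmə/"}, {"audio": "example.ogg"}],
        "etymology_text": "Illustrative etymology text.",
        "forms": [{"form": "anathemas", "tags": ["plural"]}],
    }),
    json.dumps({"word": "skipped", "pos": "noun", "senses": []}),
]

for card in cards_from_jsonl(sample_lines, {"anathema"}):
    print(card["word"], card["ipa"], card["definitions"])
```

Streaming the file one line at a time keeps memory use flat, which matters when the dump has over a million entries. The resulting dicts would still need to be mapped onto an Anki note type and packaged into a deck, which this sketch leaves out.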

An example Anki card for Anathema

After discovering that homoiconicity only made it into the top 800K, I imported another 500K words, which pushed me over the free sync server’s 500MB limit (and gave me the clickbait title), so I deployed my own sync server.

Now if you’ll excuse me, I have a few words to learn…