Where Do Our Frequency Numbers Come From?
The real-world data sources behind Waffuru's frequency rankings
February 26, 2026 · Waffuru Team
Waffuru's weighted frequency system is only as good as its data. Every card in our database has frequency rankings across four categories (spoken, written, web, and anime), each sourced from real-world corpora.
Spoken Frequency
Our spoken frequency data comes from a large subtitle dataset. Subtitles capture how Japanese is actually spoken in conversations, interviews, and daily life, not how textbooks say it should be.
Written Frequency
Written rankings are derived from recent novels. This gives a broad view of formal and semi-formal written Japanese without skewing toward archaic usage.
Web Frequency
Web frequency data comes from large-scale crawls of Japanese websites, including blogs, forums, social media, and news sites. This captures informal, modern usage that doesn't always show up in traditional written corpora: slang, internet expressions, and everyday vocabulary.
Anime Frequency
Our anime frequency rankings come from subtitle corpora covering thousands of anime series. If most of your Japanese exposure comes from entertainment, this category matters a lot, since media vocabulary often differs from what textbooks teach.
How Rankings Work
Each card gets a rank within each category, where rank 1 is the most frequent. A word like 的 (まと/てき) might rank in the top 50 for written frequency but much lower for anime. When a card doesn't appear in a particular corpus, it gets a default rank with a very large number, so it naturally falls to the end of that category's ordering.
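To make the ranking idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not Waffuru's actual pipeline: the function names, the tiny token lists, and the `DEFAULT_RANK` value are all hypothetical, and the real default rank is not given here.

```python
from collections import Counter

# Hypothetical sentinel for words missing from a corpus.
# The real default value used by Waffuru is not stated in this post.
DEFAULT_RANK = 1_000_000


def build_ranks(tokens):
    """Map each word to its rank, where rank 1 is the most frequent word."""
    counts = Counter(tokens)
    # Sort by descending count; break ties by the word itself so the
    # result is deterministic.
    ordered = sorted(counts, key=lambda word: (-counts[word], word))
    return {word: rank for rank, word in enumerate(ordered, start=1)}


def rank_of(word, ranks):
    """Look up a word's rank, falling back to the large default rank."""
    return ranks.get(word, DEFAULT_RANK)


# Toy corpora, invented purely for illustration.
written = build_ranks(["的", "的", "的", "猫", "猫", "走る"])
anime = build_ranks(["猫", "猫", "走る"])

print(rank_of("的", written))  # most frequent word in the toy written corpus
print(rank_of("的", anime))    # absent from the toy anime corpus
```

In this toy data, 的 is rank 1 in the written corpus but falls back to `DEFAULT_RANK` for anime, which pushes it to the end of that category's ordering.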
These four frequencies are the raw input to Waffuru's weighted scoring system. Your personal weights control how they're combined, so the same data produces a different study order for every user.
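As a rough illustration of how personal weights could change the study order, the sketch below combines the four ranks with a weighted average and sorts by the result. This formula is an assumption made for the example; the post does not specify how Waffuru actually combines the ranks. The rank numbers, weight profiles, and function names are likewise invented.

```python
CATEGORIES = ("spoken", "written", "web", "anime")


def weighted_score(ranks, weights):
    """Weighted average of per-category ranks (assumed formula).

    Lower scores mean the card should be studied earlier.
    """
    total = sum(weights[c] for c in CATEGORIES)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(weights[c] * ranks[c] for c in CATEGORIES) / total


def study_order(cards, weights):
    """Return card keys sorted from lowest to highest weighted score."""
    return sorted(cards, key=lambda word: weighted_score(cards[word], weights))


# Made-up ranks for two cards, for illustration only.
cards = {
    "的": {"spoken": 120, "written": 40, "web": 90, "anime": 900},
    "やばい": {"spoken": 300, "written": 2500, "web": 150, "anime": 80},
}

# Two hypothetical weight profiles.
novel_reader = {"spoken": 0.1, "written": 0.7, "web": 0.1, "anime": 0.1}
anime_fan = {"spoken": 0.2, "written": 0.0, "web": 0.1, "anime": 0.7}

print(study_order(cards, novel_reader))
print(study_order(cards, anime_fan))
```

With these invented numbers, the novel-focused profile puts 的 first, while the anime-focused profile puts やばい first, even though the underlying rank data is identical.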