Where Do Our Frequency Numbers Come From?

The real-world data sources behind Waffuru's frequency rankings

February 26, 2026 · Waffuru Team

Waffuru's weighted frequency system is only as good as its data. Every card in our database has frequency rankings across four categories (spoken, written, web, and anime), each sourced from real-world corpora.

Spoken Frequency

Our spoken frequency data comes from a large subtitle dataset. Subtitles capture how Japanese is actually spoken in conversations, interviews, and daily life, not how textbooks say it should be.

Written Frequency

Written rankings are derived from recent novels. This gives a broad view of formal and semi-formal written Japanese without skewing toward archaic usage.

Web Frequency

Web frequency data comes from large-scale crawls of Japanese websites, including blogs, forums, social media, and news sites. This captures informal, modern usage that doesn't always show up in traditional written corpora: slang, internet expressions, and everyday vocabulary.

Anime Frequency

Our anime frequency rankings come from subtitle corpora covering thousands of anime series. If most of your Japanese exposure comes from entertainment, this category matters a lot, since media vocabulary often differs from what textbooks teach.

How Rankings Work

Each card gets a rank within each category, where rank 1 is the most frequent. A word like 的 (まと/てき) might rank in the top 50 for written frequency but much lower for anime. When a card doesn't appear in a particular corpus, it gets a numerically large default rank, so it naturally falls to the end of that category's ordering.
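
To make the ranking idea concrete, here is a minimal sketch in Python. It is an illustration only, not Waffuru's actual code: the function names, the sample counts, and the `DEFAULT_RANK` value are all made up for the example.

```python
# Hypothetical sentinel: any number larger than every real rank would work.
DEFAULT_RANK = 999_999


def rank_by_frequency(counts):
    """Turn raw corpus counts into ranks, where rank 1 is the most frequent."""
    # Sort by count (descending); break ties by the word itself so the
    # result is deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {word: rank for rank, (word, _) in enumerate(ordered, start=1)}


def rank_for(word, ranks, default=DEFAULT_RANK):
    """Look up a word's rank, falling back to the default when it is missing."""
    return ranks.get(word, default)


# Made-up counts for a tiny "written" corpus.
written_counts = {"的": 120, "猫": 45, "走る": 30}
written_ranks = rank_by_frequency(written_counts)

words = ["やばい", "猫", "的"]
# Sorting by rank pushes the word missing from this corpus to the end.
ordered_words = sorted(words, key=lambda w: rank_for(w, written_ranks))
```

Because the default is larger than any real rank, a missing word never outranks a word that actually appears in the corpus.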

These four frequencies are the raw input to Waffuru's weighted scoring system. Your personal weights control how they're combined, so the same data produces a different study order for every user.
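
This post doesn't spell out how the weights are combined, so the sketch below assumes the simplest option: a weighted average of the four ranks, with lower scores studied first. The function names, card names, ranks, and weights are all hypothetical.

```python
CATEGORIES = ("spoken", "written", "web", "anime")


def weighted_score(ranks, weights):
    """Weighted average of a card's four ranks (assumed formula; lower is better)."""
    total_weight = sum(weights[c] for c in CATEGORIES)
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(weights[c] * ranks[c] for c in CATEGORIES) / total_weight


def study_order(cards, weights):
    """Return card names sorted from lowest (best) weighted score to highest."""
    return sorted(cards, key=lambda name: weighted_score(cards[name], weights))


# Made-up ranks for two cards.
cards = {
    "card_a": {"spoken": 10, "written": 500, "web": 40, "anime": 5},
    "card_b": {"spoken": 300, "written": 20, "web": 200, "anime": 900},
}

# Two hypothetical users with different priorities.
anime_fan = {"spoken": 1, "written": 0, "web": 0, "anime": 3}
novel_reader = {"spoken": 0, "written": 3, "web": 1, "anime": 0}

anime_fan_order = study_order(cards, anime_fan)
novel_reader_order = study_order(cards, novel_reader)
```

With these numbers the anime-focused weights put `card_a` first, while the reading-focused weights put `card_b` first, even though both users draw on the same underlying ranks.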