1. IPA Phonetic Transcription
Every word typed into a lyric line is converted to its IPA representation in real time, entirely in the browser — no network requests.
Pipeline
Built-in dictionary
Each language ships with a hand-curated dictionary of irregular or hard-to-convert words: common pronouns, contractions, verb forms, and prepositions. These are looked up first for exact results.
Rule-based converters
For any word not in the dictionary, a deterministic set of substitution rules converts spelling to IPA. Quality depends on how regular the language's orthography is:
| Language | Regularity | Notes |
|---|---|---|
| 🇪🇸 Spanish | Very high | Nearly one-to-one letter↔sound mapping |
| 🇮🇹 Italian | High | A few digraphs: sci, gli, gn, ch/gh |
| 🏛️ Latin | High | Classical pronunciation; digraphs ae/oe/au/ph/th |
| 🇫🇷 French | Medium | Silent letters, liaison, nasal vowels |
| 🇬🇧 English | Low | Highly irregular; rule results are approximate |
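The lookup-then-rules pipeline can be sketched as follows. This is a minimal illustration: the dictionary entries and Spanish-style substitution rules below are tiny stand-ins, not the app's real data files.

```python
# Sketch of the dictionary-first, rules-second IPA pipeline.
# EXCEPTIONS and RULES are illustrative stand-ins, not real data.
EXCEPTIONS = {"y": "i", "hay": "ai"}                 # hand-curated irregulars
RULES = [("ch", "tʃ"), ("ll", "ʎ"), ("qu", "k"),     # multi-letter digraphs
         ("j", "x"), ("ñ", "ɲ")]                     # single-letter substitutions

def to_ipa(word: str) -> str:
    word = word.lower()
    if word in EXCEPTIONS:          # exact dictionary lookup wins
        return EXCEPTIONS[word]
    out, i = [], 0
    while i < len(word):
        for src, dst in RULES:      # digraphs listed first so they match first
            if word.startswith(src, i):
                out.append(dst)
                i += len(src)
                break
        else:                       # no rule applies: letter maps to itself
            out.append(word[i])
            i += 1
    return "".join(out)

print(to_ipa("llama"))   # → ʎama
```

Listing digraph rules before single-letter ones is what makes the converter deterministic: the first matching rule at each position always wins.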
2. Word Embedding Data Pipeline
The suggestion engine needs two things per vocabulary word: its IPA (for rhyme scoring)
and its embedding vector (for meaning scoring). These are precomputed offline by
scripts/build_word_data.py and saved as data/words_{lang}.js.
FastText word vectors
FastText vectors, released by Meta AI, are trained on Wikipedia with a skip-gram model. Each word is placed in a 300-dimensional space so that words
appearing in similar contexts end up pointing in similar directions.
The .vec files are plain text sorted by frequency, which lets us stream just the top N words
without downloading the full 2–4 GB file.
PCA dimensionality reduction
Principal Component Analysis reduces each 300-dimensional vector to 50 dimensions while maximising preserved variance. The PCA is fit on the full filtered vocabulary, so the 50 components capture the dominant semantic axes of the language.
Benefits: 6× smaller file · faster browser computation · slight denoising.
Trade-off: minor loss of semantic precision (negligible for top-N retrieval).
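A minimal numpy sketch of this reduction via SVD (the real script may use a library PCA implementation; as described above, the fit uses the full filtered vocabulary matrix):

```python
import numpy as np

def pca_reduce(X: np.ndarray, k: int = 50) -> np.ndarray:
    """Project an N x D embedding matrix onto its top-k principal components."""
    Xc = X - X.mean(axis=0)                   # centre each dimension
    # Rows of Vt are the principal axes, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                      # N x k reduced vectors
```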
Int8 quantization
After unit-normalising each vector (‖v‖ = 1), values are scaled by 127, rounded, and stored as signed bytes. Cosine similarity in int8 is nearly identical to float32 because the vectors were unit-normalised before scaling — quantisation error per dimension is ≤ 0.004, negligible over 50 dimensions.
Storage per word: $50$ bytes (vs $200$ bytes for float32) — total for $250\text{k}$ words: $12.5\,\text{MB}$ int8 vectors $+$ words/IPA JSON $+$ base64 overhead $\approx 23\,\text{MB}$ per file.
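The quantise-and-compare round trip can be sketched with numpy (helper names are illustrative). Because both vectors are near-unit before scaling, the int8 dot product divided by the stored magnitudes tracks the float32 cosine closely:

```python
import numpy as np

def quantize(v: np.ndarray) -> np.ndarray:
    """Unit-normalise a float vector, scale by 127, round, store as int8."""
    v = v / np.linalg.norm(v)
    return np.clip(np.round(v * 127), -127, 127).astype(np.int8)

def cosine_int8(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity on quantised vectors.

    Widen to int32 before the dot product so 50 products of +/-127
    values cannot overflow.
    """
    a32, b32 = a.astype(np.int32), b.astype(np.int32)
    return float(a32 @ b32) / (np.linalg.norm(a32) * np.linalg.norm(b32))
```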
espeak-ng IPA
espeak-ng is an open-source speech synthesis engine used as a pronunciation oracle. phonemizer wraps it into a Python API. Words are processed in batches of 2 000 to amortise subprocess startup cost, with 4 parallel workers.
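The batching pattern can be sketched as follows. Here phonemize_batch stands in for the real phonemizer/espeak-ng call, which is not reproduced; the point is splitting the vocabulary into large batches and fanning them out to parallel workers while preserving input order.

```python
from concurrent.futures import ThreadPoolExecutor

def batched(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def ipa_for_words(words, phonemize_batch, batch_size=2000, workers=4):
    """Run phonemize_batch (a stand-in for the espeak-ng wrapper) over
    batches of words in parallel; pool.map preserves input order."""
    batches = batched(words, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(phonemize_batch, batches)
    return [ipa for batch in results for ipa in batch]
```

Large batches amortise the subprocess startup cost over many words, which is why 2 000 words per call beats one call per word.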
3. Rhyme Score 3
The rhyme score counts how many phonemes two words share from the end — the classic phonetic definition of a rhyme. It is displayed as an orange number on each chip.
Step 1 — IPA cleaning
Before comparison, both the target and candidate IPA strings are normalised to make rhyme detection accent-tolerant.
This prevents near-homophones from being ranked as non-rhymes (e.g. "bees" / "peace").
Step 2 — Suffix overlap count
Scan both IPA strings character by character from the right. The score is how many characters match before the first mismatch.
Step 3 — Normalisation
Raw suffix counts are divided by the per-target maximum, so the best-rhyming candidate always scores 1.0 internally. The displayed orange number, however, is the raw count before normalisation.
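Steps 2 and 3 together can be sketched as follows (function names are illustrative):

```python
def suffix_overlap(a: str, b: str) -> int:
    """Count matching IPA characters from the right until the first mismatch."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def rhyme_scores(target_ipa, candidate_ipas):
    """Raw suffix counts, plus counts normalised so the best candidate is 1.0."""
    raw = [suffix_overlap(target_ipa, c) for c in candidate_ipas]
    best = max(raw) or 1                      # avoid division by zero
    return raw, [r / best for r in raw]
```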
4. Meaning Score 85%
The meaning score measures how semantically close a candidate word is to the target, using cosine similarity between their word embedding vectors. It is displayed as a purple percentage on each chip.
Cosine similarity
| Symbol | Meaning |
|---|---|
| $\mathbf{a}$ | 50d embedding of the target word |
| $\mathbf{b}$ | 50d embedding of the candidate word |
| $\mathbf{\theta}$ | angle between $\mathbf{a}$ and $\mathbf{b}$ |
$$\cos\theta = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}$$
| Value | Interpretation |
|---|---|
| $+1$ | Same direction — semantically very close |
| $\phantom{+}0$ | Orthogonal — unrelated words |
| $-1$ | Opposite — contrasting contexts |
Geometric intuition
Word embeddings place semantically similar words at small angles from each other in vector space.
Normalisation
$$\text{sem}_i = \frac{\max\bigl(0,\ \cos(\mathbf{t}, \mathbf{b}_i)\bigr)}{\max_j \max\bigl(0,\ \cos(\mathbf{t}, \mathbf{b}_j)\bigr)}$$

where $\mathbf{t}$ is the target word's vector and $\mathbf{b}_i$ the candidate's. Negative similarities are clamped to 0, then each value is divided by the per-target maximum, so the closest candidate scores 1.0. The displayed purple % is $\text{round}(\text{sem}_i \times 100)$.
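The clamp-and-normalise step can be sketched in Python (assuming, as an illustration, that the displayed percentage uses the normalised score):

```python
def meaning_scores(cosines):
    """Clamp negative cosine similarities to 0, then divide by the maximum
    so the closest candidate scores 1.0; the chip shows round(score * 100)."""
    clamped = [max(0.0, c) for c in cosines]
    best = max(clamped) or 1.0                # avoid division by zero
    norm = [c / best for c in clamped]
    return norm, [round(s * 100) for s in norm]
```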
Why cosine and not Euclidean distance?
Cosine similarity measures the angle between vectors, ignoring magnitude. Since all vectors are unit-normalised before int8 quantization, their magnitudes are all ≈ 127. Using cosine ensures the score reflects semantic direction, not accidental magnitude differences from rounding.
5. Final Ranking
Score combination
$$\text{score}_i = w \times \text{sem}_i + (1 - w) \times \text{rhyme}_i$$

where $w \in [0, 1]$ is the Meaning slider position, and $\text{sem}_i$ and $\text{rhyme}_i$ are the independently normalised meaning and rhyme scores of candidate $i$.
Slider positions
| Slider | w | Formula | Effect |
|---|---|---|---|
| 0% — full Rhyme | 0.0 | 1.0 × rhyme | Best-rhyming words first |
| 30% (default) | 0.3 | 0.3 × sem + 0.7 × rhyme | Rhyme-leaning blend |
| 50% | 0.5 | 0.5 × sem + 0.5 × rhyme | Equal weight |
| 100% — full Meaning | 1.0 | 1.0 × sem | Semantically closest words first |
Why independent normalisation matters
Both scores are normalised independently — each divided by its own maximum. Without this, whichever score has larger raw values would dominate at any slider position. With it, w = 0.5 means genuinely equal influence, regardless of the raw scale of each score.
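The weighted blend from the slider table can be sketched as follows (assuming candidates carry already-normalised sem and rhyme fields in [0, 1]):

```python
def combined_rank(candidates, w):
    """Rank candidates by w*sem + (1-w)*rhyme, highest first.

    Both scores are already normalised to [0, 1] independently, so a
    given w weights them by genuine influence rather than raw scale.
    """
    return sorted(candidates,
                  key=lambda c: w * c["sem"] + (1 - w) * c["rhyme"],
                  reverse=True)
```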
6. Complexity
$N = 250\,000$ words, $d = 50$ (embedding dim), $D = 300$ (original FastText dim).
| Step | When | Complexity | Typical execution time |
|---|---|---|---|
| Stream vectors | One-time, offline | $\mathcal{O}(N \cdot D)$ | ~10 min · download ~270 MB from FastText |
| PCA | One-time, offline | $\mathcal{O}(N \cdot D^2)$ | ~10 min · SVD on $250\text{k} \times 300$ matrix |
| espeak-ng IPA | One-time, offline | $\mathcal{O}(N)$ | ~10 min · 125 batches × 4 parallel workers |
| Data file load | First ✦ click per language | $\mathcal{O}(N)$ | 1–3 s · parse 23 MB JS · atob on 16 MB base64 string · fill Int8Array of 12.5 M bytes · insert 250k entries into Map |
| Score all candidates | Target word changes | $\mathcal{O}(N \cdot d)$ | ~300 ms · 250k × 50 multiply-adds (cosine) + 250k IPA suffix matches |
| Re-rank | Every slider move | $\mathcal{O}(N \log N)$ | ~30 ms · recompute weighted score + sort 250k candidates |
Scoring results are cached per target word — moving the slider only triggers a re-sort, not a re-score. The dominant one-time cost is the data file load; all subsequent interactions reuse the same in-memory Int8Array.
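The caching behaviour can be sketched as a small ranker class (names are illustrative; score_all stands in for the $\mathcal{O}(N \cdot d)$ scoring pass):

```python
class SuggestionRanker:
    """Caches per-target scores so slider moves only re-sort, O(N log N),
    instead of re-scoring, O(N*d)."""

    def __init__(self, score_all):
        self._score_all = score_all   # expensive: target -> [(word, sem, rhyme)]
        self._cache = {}

    def rank(self, target, w):
        if target not in self._cache:           # score once per target word
            self._cache[target] = self._score_all(target)
        return sorted(self._cache[target],
                      key=lambda t: w * t[1] + (1 - w) * t[2],
                      reverse=True)
```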