Features & Technical Details
1. Smart Search & Transliteration
Guejmi's search engine is designed to handle the complexity of the Tunisian dialect, which is written in both Latin (Arabizi) and Arabic scripts.
How it Works
The search engine uses a tiered approach to find the best match:
- Tier 1: Exact Match: Checks if the query exactly matches a stored spelling.
- Tier 2: Skeleton Match ("The Magic Tier"): Checks if the "consonant skeleton" of the query matches a stored skeleton.
- Tier 3: Fuzzy Skeleton Match: Compares skeletons by trigram similarity and accepts candidates scoring above 0.3, as a safety net.
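The three tiers can be sketched in plain Python as follows. This is a minimal in-memory illustration, not the project's actual query code: the `Spelling` record, the `search` function, and the hand-written `similarity` helper are all hypothetical names, and the trigram measure is a simple set-overlap ratio in the spirit of PostgreSQL's pg_trgm, which the real backend may or may not use. Skeletons are assumed to be precomputed.

```python
from dataclasses import dataclass

SIMILARITY_THRESHOLD = 0.3  # the threshold quoted for Tier 3


@dataclass(frozen=True)
class Spelling:
    """Hypothetical stored spelling with its precomputed skeleton."""
    text: str
    skeleton: str
    entry_id: int


def trigrams(word: str) -> set[str]:
    # Pad with two leading spaces and one trailing space, then slide a 3-char window.
    padded = f"  {word.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a: str, b: str) -> float:
    # Shared trigrams divided by all distinct trigrams (assumed measure).
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def search(query: str, query_skeleton: str, spellings: list[Spelling]) -> tuple[str, list[int]]:
    """Return the tier that matched and the matching entry ids."""
    wanted = query.strip().lower()

    # Tier 1: the query equals a stored spelling.
    exact = [s.entry_id for s in spellings if s.text.lower() == wanted]
    if exact:
        return "exact", list(dict.fromkeys(exact))

    # Tier 2: the query skeleton equals a stored skeleton.
    same = [s.entry_id for s in spellings if s.skeleton == query_skeleton]
    if same:
        return "skeleton", list(dict.fromkeys(same))

    # Tier 3: skeletons are merely similar; best scores first.
    scored = sorted(
        ((similarity(query_skeleton, s.skeleton), s.entry_id) for s in spellings),
        reverse=True,
    )
    fuzzy = [entry_id for score, entry_id in scored if score > SIMILARITY_THRESHOLD]
    if fuzzy:
        return "fuzzy", list(dict.fromkeys(fuzzy))
    return "none", []
```

In this sketch a later tier only runs when the earlier ones find nothing, which is one reasonable reading of "tiered"; the real engine may instead combine and rank results from several tiers.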
The "Consonant Skeleton" Algorithm
To normalize the chaotic Tunisian dialect (Derja), we generate a simplified "skeleton" for every word
using the get_derja_skeleton function:
- Transliteration: Arabic characters are converted to Latin (Arabizi) using a standard mapping (e.g., ع->3, ح->7).
- Lowercase: All text is converted to lowercase.
- Digraph Standardization: Common combinations are unified:
  - ou->w (e.g., "kouijina" -> "kwijina")
  - ch, sh, sch->x (e.g., "mchit" -> "mxit")
  - ph->f
  - gue->g, gui->gi
- Arabizi Normalization: Aggressive unification of similar-sounding consonants:
  - 9, k, c->q (unifies "krahba", "qrahba", "crahba")
  - 7->h
  - 5->kh
  - 3->a (treated as a vowel-like placeholder)
- Vowel Stripping: Removes a, e, i, o, u, y to focus on the consonant structure.
- Deduplication: Collapses repeated characters (e.g., mm->m).
Example:
- Input: "Kouijina" -> Skeleton: "qwjn"
- Input: "Coujina" -> Skeleton: "qwjn"
- Result: Match!
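The steps above can be sketched as a single function. This is an illustrative reconstruction from the description, not the project's actual get_derja_skeleton source. Only the ع->3 and ح->7 pairs of the Arabic mapping come from this document; the other Arabic pairs are assumptions added so the example runs. The consonant rules are applied in one pass, so the "k" produced by 5->kh is not rewritten to "q"; that ordering is also an assumption.

```python
import re

# Arabic -> Arabizi. Only the first two pairs are documented; the rest are assumed.
ARABIC_TO_ARABIZI = str.maketrans({
    "ع": "3",
    "ح": "7",
    "ق": "9",  # assumed
    "خ": "5",  # assumed
    "ك": "k",  # assumed
    "ر": "r",  # assumed
    "ه": "h",  # assumed
    "ب": "b",  # assumed
    "ة": "a",  # assumed
})

# Longer patterns first so "sch" is handled before "ch" and "sh".
DIGRAPHS = [
    ("sch", "x"), ("ch", "x"), ("sh", "x"),
    ("ou", "w"), ("ph", "f"),
    ("gue", "g"), ("gui", "gi"),
]

# Single-pass character rules for similar-sounding consonants.
CONSONANT_MAP = str.maketrans({
    "9": "q", "k": "q", "c": "q",
    "7": "h", "5": "kh", "3": "a",
})

VOWELS = re.compile(r"[aeiouy]")
REPEATS = re.compile(r"(.)\1+")


def get_derja_skeleton(word: str) -> str:
    """Reduce a Derja word to its consonant skeleton (sketch)."""
    text = word.strip().translate(ARABIC_TO_ARABIZI).lower()
    for old, new in DIGRAPHS:
        text = text.replace(old, new)
    text = text.translate(CONSONANT_MAP)
    text = VOWELS.sub("", text)       # drop a, e, i, o, u, y
    return REPEATS.sub(r"\1", text)   # collapse runs such as "mm" -> "m"


print(get_derja_skeleton("Kouijina"))  # qwjn
print(get_derja_skeleton("Coujina"))   # qwjn
print(get_derja_skeleton("krahba"))    # qrhb
```

Because 3 is first mapped to "a" and vowels are stripped afterwards, it never appears in the final skeleton.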
2. Smart Merge System
To prevent the dictionary from becoming fragmented with multiple spellings of the same word (e.g., "Krahba", "Karhba", "Karahba"), we implemented a "Smart Merge" system.
Workflow
- Submission Interception: When a user submits a new word, the backend checks for existing similar entries (Trigram Similarity > 0.3).
- Visual Verification: If a match is found, the frontend displays a modal:
"We found 'ExistingWord'. Is your word just another way to spell this?"
- User Choice:
  - Yes (Merge): The new spelling is added as a Spelling variant to the existing entry. The definition is added to the existing entry's definitions list.
  - No (Force Create): A completely new entry is created.
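The workflow can be illustrated with a small in-memory sketch. Everything here is hypothetical: the `Entry` class, `find_merge_candidate`, and `submit` are stand-ins for the real models and views, and the `similarity` helper is a simple set-overlap trigram ratio assumed for the example rather than the backend's actual measure.

```python
from dataclasses import dataclass, field
from typing import Optional

SIMILARITY_THRESHOLD = 0.3  # the threshold quoted in the workflow


@dataclass
class Entry:
    """Hypothetical dictionary entry holding spellings and definitions."""
    spellings: list[str]
    definitions: list[str] = field(default_factory=list)


def trigrams(word: str) -> set[str]:
    padded = f"  {word.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a: str, b: str) -> float:
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def find_merge_candidate(new_spelling: str, entries: list[Entry]) -> Optional[Entry]:
    """Submission interception: return the most similar entry above the threshold."""
    best, best_score = None, SIMILARITY_THRESHOLD
    for entry in entries:
        for spelling in entry.spellings:
            score = similarity(new_spelling, spelling)
            if score > best_score:
                best, best_score = entry, score
    return best


def submit(new_spelling: str, definition: str, entries: list[Entry],
           merge_into: Optional[Entry] = None) -> Entry:
    """Apply the user's choice: merge into an entry, or force-create a new one."""
    if merge_into is not None:
        if new_spelling not in merge_into.spellings:
            merge_into.spellings.append(new_spelling)
        merge_into.definitions.append(definition)
        return merge_into
    entry = Entry([new_spelling], [definition])
    entries.append(entry)
    return entry
```

A caller would first run `find_merge_candidate`, show the modal if it returns an entry, and then call `submit` with `merge_into` set to that entry for "Yes" or left as `None` for "No".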
3. Dynamic Word Display
The way a word is displayed changes based on how the user found it.
Logic
- Search Context: If a user searches for "wakri" (Latin), the title on the word page will be "Wakri".
- Arabic Priority: If there is no search context, the system prioritizes the Arabic spelling (e.g., "وكري") if available.
- Canonical Fallback: If neither applies, the "primary spelling" (usually the first one created) is shown.
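The priority order can be sketched as a small function. The name `display_title`, the assumption that spellings arrive ordered by creation time, and the rule that a search query is only used when it exactly matches a stored spelling are all illustrative choices, not confirmed behavior.

```python
import re
from typing import Optional

# Basic Arabic Unicode block; enough to tell Arabic script from Latin here.
ARABIC_CHARS = re.compile(r"[\u0600-\u06FF]")


def _capitalize(text: str) -> str:
    # Uppercase only the first character; Arabic text is left unchanged.
    return text[:1].upper() + text[1:]


def display_title(spellings: list[str], search_query: Optional[str] = None) -> str:
    """Pick the title for a word page (sketch).

    `spellings` is assumed to be ordered by creation, so the first
    item plays the role of the primary spelling.
    """
    if not spellings:
        raise ValueError("an entry needs at least one spelling")

    # 1. Search context: show the spelling the user searched for.
    if search_query:
        wanted = search_query.strip().lower()
        for spelling in spellings:
            if spelling.lower() == wanted:
                return _capitalize(spelling)

    # 2. Arabic priority: first spelling written in Arabic script.
    for spelling in spellings:
        if ARABIC_CHARS.search(spelling):
            return spelling

    # 3. Canonical fallback: the primary spelling.
    return _capitalize(spellings[0])


print(display_title(["wakri", "وكري"], search_query="wakri"))  # Wakri
print(display_title(["wakri", "وكري"]))                        # وكري
```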
4. Word of the Day
- Endpoint: /api/entries/random/
- Logic: Fetches a random, high-quality entry to display on the homepage.
- Caching: The response can be cached for 24 hours so that the same word is shown all day (implementation detail).
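One way to keep the same word for 24 hours is a small time-to-live cache around the random pick. The sketch below is an assumption about how this could be done, not a description of the deployed caching layer; `CachedPick` and its parameters are hypothetical names, and the clock is injectable so the behavior can be tested without waiting.

```python
import random
import time
from typing import Callable, Generic, TypeVar

T = TypeVar("T")
_MISSING = object()  # sentinel meaning "nothing cached yet"


class CachedPick(Generic[T]):
    """Remember the result of `choose` until `ttl_seconds` have elapsed."""

    def __init__(self, choose: Callable[[], T],
                 ttl_seconds: float = 24 * 60 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._choose = choose
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: object = _MISSING
        self._expires_at = 0.0

    def get(self) -> T:
        now = self._clock()
        if self._value is _MISSING or now >= self._expires_at:
            # Cache is empty or stale: pick again and restart the timer.
            self._value = self._choose()
            self._expires_at = now + self._ttl
        return self._value  # type: ignore[return-value]


# Usage with a placeholder pool of entries.
entries = ["karhba", "kouijina", "wakri"]
word_of_the_day = CachedPick(lambda: random.choice(entries))
first = word_of_the_day.get()
assert word_of_the_day.get() == first  # unchanged within the 24-hour window
```

An in-process cache like this resets on restart and is not shared between workers, so a shared cache would be the more likely choice in production.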
5. Trending Words
- Endpoint: /api/entries/trending/
- Logic: Returns words with the most recent activity (votes, comments, or views).
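A possible reading of "most recent activity" is to count activity events per entry inside a recent time window and rank by that count. The sketch below follows that reading; the `trending` function, the seven-day window, and the flat event list are assumptions made for illustration, and the real endpoint may weight votes, comments, and views differently.

```python
from collections import Counter
from datetime import datetime, timedelta


def trending(events: list[tuple[int, datetime]], now: datetime,
             window: timedelta = timedelta(days=7), limit: int = 10) -> list[int]:
    """Rank entry ids by number of activity events inside the window.

    Each event is an (entry_id, timestamp) pair standing for one vote,
    comment, or view; all three are counted equally in this sketch.
    """
    cutoff = now - window
    counts = Counter(entry_id for entry_id, stamp in events if stamp >= cutoff)
    return [entry_id for entry_id, _ in counts.most_common(limit)]


now = datetime(2024, 1, 10)
events = [
    (1, datetime(2024, 1, 9)),
    (2, datetime(2024, 1, 9)),
    (2, datetime(2024, 1, 8)),
    (3, datetime(2023, 12, 1)),  # outside the window, ignored
]
print(trending(events, now))  # [2, 1]
```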