Features & Technical Details

1. Smart Search & Transliteration

Guejmi's search engine is designed to handle the complexity of the Tunisian dialect, which is written in both Latin (Arabizi) and Arabic scripts.

How it Works

The search engine uses a tiered approach to find the best match:

  1. Tier 1: Exact Match: Checks if the query exactly matches a stored spelling.
  2. Tier 2: Skeleton Match ("The Magic Tier"): Checks if the "consonant skeleton" of the query matches a stored skeleton.
  3. Tier 3: Fuzzy Skeleton Match: As a safety net, falls back to trigram similarity between skeletons and accepts candidates scoring above 0.3.
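The three tiers can be sketched in plain Python as follows. This is only an illustration: the Spelling record, the tiered_search function, and the in-memory list are hypothetical stand-ins for whatever storage Guejmi actually uses, and the trigram similarity is a simple set-overlap approximation that may differ from the one the real engine computes.

```python
from dataclasses import dataclass


@dataclass
class Spelling:
    text: str      # a stored spelling, e.g. "krahba"
    skeleton: str  # its precomputed consonant skeleton, e.g. "qrhb"


def trigrams(s: str) -> set[str]:
    # Assumption: pad with two leading spaces and one trailing space
    # before cutting the string into 3-character windows.
    padded = f"  {s.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    # Shared trigrams divided by all distinct trigrams (between 0.0 and 1.0).
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def tiered_search(query, query_skeleton, spellings, threshold=0.3):
    # Tier 1: the query is exactly a stored spelling.
    exact = [s for s in spellings if s.text == query]
    if exact:
        return "exact", exact
    # Tier 2: the query's skeleton equals a stored skeleton.
    same = [s for s in spellings if s.skeleton == query_skeleton]
    if same:
        return "skeleton", same
    # Tier 3: skeletons that are merely similar, best score first.
    scored = [(trigram_similarity(query_skeleton, s.skeleton), s) for s in spellings]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    fuzzy = [s for score, s in scored if score > threshold]
    return ("fuzzy", fuzzy) if fuzzy else ("none", [])
```

With stored spellings "krahba" (skeleton "qrhb") and "mchit" (skeleton "mxt"), the query "qrahba" misses Tier 1 but hits Tier 2, while a query whose skeleton is "qrhm" only reaches "krahba" through Tier 3.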

The "Consonant Skeleton" Algorithm

To normalize the chaotic Tunisian dialect (Derja), we generate a simplified "skeleton" for every word using the get_derja_skeleton function:

  1. Transliteration: Arabic characters are converted to Latin (Arabizi) using a standard mapping (e.g., ع -> 3, ح -> 7).
  2. Lowercase: All text is converted to lowercase.
  3. Digraph Standardization: Common combinations are unified:
    • ou -> w (e.g., "kouijina" -> "kwijina")
    • ch, sh, sch -> x (e.g., "mchit" -> "mxit")
    • ph -> f
    • gue -> g, gui -> gi
  4. Arabizi Normalization: Aggressive unification of similar-sounding consonants:
    • 9, k, c -> q (unifies "krahba", "qrahba", "crahba")
    • 7 -> h
    • 5 -> kh
    • 3 -> a (treated as a vowel-like placeholder, so it is dropped by the vowel stripping in step 5)
  5. Vowel Stripping: Removes a, e, i, o, u, y to focus on the consonant structure.
  6. Deduplication: Collapses repeated characters (e.g., mm -> m).

Example:
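As a rough illustration of the six steps, the sketch below reimplements get_derja_skeleton under a few assumptions. Only the ع -> 3 and ح -> 7 mappings come from the description above; the rest of the Arabic table is a partial, hypothetical stand-in for the "standard mapping". The rules are applied in the order they are listed, which the description does not explicitly guarantee.

```python
import re

# Partial Arabic -> Arabizi table. Only the first two entries are taken from
# the description; the others are assumptions added for illustration.
ARABIC_TO_ARABIZI = str.maketrans({
    "ع": "3", "ح": "7",
    "خ": "5", "ق": "9", "ش": "ch",
    "ب": "b", "ت": "t", "ر": "r", "ك": "k", "م": "m",
})

# Step 3: digraph standardization ("sch" is tried before "ch").
DIGRAPHS = [("ou", "w"), ("sch", "x"), ("ch", "x"), ("sh", "x"),
            ("ph", "f"), ("gue", "g"), ("gui", "gi")]

# Step 4: Arabizi normalization. Applied in the listed order, so the "k" in
# the "kh" produced from "5" is not remapped to "q".
ARABIZI = [("9", "q"), ("k", "q"), ("c", "q"),
           ("7", "h"), ("5", "kh"), ("3", "a")]


def get_derja_skeleton(text: str) -> str:
    s = text.translate(ARABIC_TO_ARABIZI)  # 1. transliteration
    s = s.lower()                          # 2. lowercase
    for old, new in DIGRAPHS:              # 3. digraphs
        s = s.replace(old, new)
    for old, new in ARABIZI:               # 4. Arabizi normalization
        s = s.replace(old, new)
    s = re.sub(r"[aeiouy]", "", s)         # 5. vowel stripping
    return re.sub(r"(.)\1+", r"\1", s)     # 6. deduplication
```

Under these assumptions, "Krahba", "Karhba", "Karahba", "qrahba" and "crahba" all reduce to the skeleton "qrhb", "kouijina" reduces to "qwjn", and "mchit" reduces to "mxt".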

2. Smart Merge System

To prevent the dictionary from becoming fragmented with multiple spellings of the same word (e.g., "Krahba", "Karhba", "Karahba"), we implemented a "Smart Merge" system.

Workflow

  1. Submission Interception: When a user submits a new word, the backend checks for existing similar entries (Trigram Similarity > 0.3).
  2. Visual Verification: If a match is found, the frontend displays a modal:
    "We found 'ExistingWord'. Is your word just another way to spell this?"
  3. User Choice:
    • Yes (Merge): The new spelling is added as a Spelling variant to the existing entry. The definition is added to the existing entry's definitions list.
    • No (Force Create): A completely new entry is created.
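The workflow above could look roughly like this on the backend. The Entry class and the find_merge_candidate and save_submission functions are hypothetical names, the similarity is computed on raw spellings (the description does not say whether spellings or skeletons are compared), and the set-overlap trigram score is an approximation of whatever the real backend uses.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    spellings: list[str]
    definitions: list[str] = field(default_factory=list)


def trigram_similarity(a: str, b: str) -> float:
    # Shared 3-character windows over all distinct windows, after padding.
    def grams(s: str) -> set[str]:
        padded = f"  {s.lower()} "
        return {padded[i:i + 3] for i in range(len(padded) - 2)}
    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga | gb)


def find_merge_candidate(word: str, entries: list[Entry],
                         threshold: float = 0.3) -> Optional[Entry]:
    # Step 1: look for the most similar existing entry above the threshold.
    best, best_score = None, threshold
    for entry in entries:
        score = max((trigram_similarity(word, s) for s in entry.spellings),
                    default=0.0)
        if score > best_score:
            best, best_score = entry, score
    return best


def save_submission(word: str, definition: str, entries: list[Entry],
                    merge_into: Optional[Entry] = None) -> Entry:
    # Step 3: merge_into is set when the user answered "Yes" in the modal.
    if merge_into is not None:
        if word not in merge_into.spellings:
            merge_into.spellings.append(word)
        merge_into.definitions.append(definition)
        return merge_into
    # "No" (or no candidate found): create a completely new entry.
    entry = Entry([word], [definition])
    entries.append(entry)
    return entry
```

In this sketch the frontend would call find_merge_candidate first, show the modal only when it returns an entry, and then pass that entry back as merge_into if the user confirms.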

3. Dynamic Word Display

The way a word is displayed changes based on how the user found it.

Logic

4. Word of the Day

5. Trending Words