A clear-eyed walk through what actually happens when you submit a name to KYC Right Source: how it is parsed, classified by cultural origin, fuzzy-matched against the watchlist, scored against context, and finally settled by a single composite verdict. Compliance officers are welcome; engineers are encouraged.
Older screening engines run a name-only match first, hand the survivors to a rule sheet, then apply DOB and country filters. Each step throws away context the next step would have used.
Our engine evaluates the full set of signals — name tokens, script, country, date of birth, identifiers — in a single computation. The composite verdict is produced once, in real time, with every input weighted by how trustworthy it is on the day. Threshold logic is configurable per business line, so the same engine can be tuned aggressively for high-risk corridors and conservatively for retail onboarding.
A reader who hears Dmitri Sokolov, Li Mei, and Marc Lefèvre instantly files each one under a cultural origin. Our matcher does the same — and uses that prediction to pick its strategy.
A lightweight classifier — trained on a multi-million-name corpus across scripts and cultures — assigns a heritage tag to every input. That tag drives downstream choices: in European naming conventions, given-name and family-name order is interchangeable; in East-Asian naming order, it is not. Particles, suffixes, double surnames and middle initials each follow their own grammar instead of being stripped as noise.
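As a rough illustration of how a heritage tag could steer the comparison, the sketch below ignores token order for order-free conventions and enforces it otherwise. The tag names (`EUROPEAN`, `EAST_ASIAN`) and the function `tokens_agree` are assumptions made for this example, not the engine's actual labels or API.

```python
# Hypothetical heritage tags; the labels used by the real classifier are not given in the text.
ORDER_FREE_HERITAGES = {"EUROPEAN"}


def tokens_agree(input_name: str, listed_name: str, heritage: str) -> bool:
    """Compare two canonicalised names token by token.

    For order-free heritages the token order is ignored; for all other
    heritages the tokens must match position by position.
    """
    a, b = input_name.split(), listed_name.split()
    if heritage in ORDER_FREE_HERITAGES:
        return sorted(a) == sorted(b)
    return a == b


print(tokens_agree("MARC LEFEVRE", "LEFEVRE MARC", "EUROPEAN"))  # order ignored
print(tokens_agree("LI MEI", "MEI LI", "EAST_ASIAN"))            # order enforced
```

A production matcher would combine this with fuzzy token comparison; the sketch only isolates the ordering decision.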
Names cannot be compared character-for-character — diacritics, punctuation and casing carry no semantic meaning. Each input is reduced to a canonical shape before scoring.
François Müller-Bélanger →
FRANCOIS MULLER BELANGER
Apostrophe variants (’ ' `) collapse to one form. Hyphens, dots and commas are flattened to whitespace.
Northern Star Logistics Ltd. (P.L.C.) →
NORTHERN STAR LOGISTICS LTD (PLC)
Everything is upper-cased. Multiple internal spaces become one. Leading and trailing whitespace is trimmed.
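The canonicalisation rules above can be approximated in a few lines of standard-library Python. This is a minimal sketch that applies the stated rules literally; the function name `canonicalise` is an assumption, and any special handling of dotted abbreviations such as P.L.C. is not described in the text and is left out.

```python
import re
import unicodedata

# Apostrophe variants named in the text, folded to a plain ASCII apostrophe.
APOSTROPHE_VARIANTS = "\u2019`"


def canonicalise(name: str) -> str:
    """Reduce a name to an upper-case, accent-free, single-spaced form."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Collapse apostrophe variants to one form.
    for variant in APOSTROPHE_VARIANTS:
        stripped = stripped.replace(variant, "'")
    # Flatten hyphens, dots and commas to whitespace.
    flattened = re.sub(r"[-.,]", " ", stripped)
    # Upper-case, collapse internal whitespace, trim both ends.
    return " ".join(flattened.upper().split())


print(canonicalise("François Müller-Bélanger"))  # FRANCOIS MULLER BELANGER
```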
After canonicalisation, the engine breaks a name into tokens and assigns each one a semantic role. The role determines how much that token contributes to the final score and which fuzzy strategies apply to it.
Tokens in personal names are tagged as first / middle / last / abbreviation / particle / title, adjusted by the cultural-origin classifier in step 02. For entity names, the engine recognises legal forms (LTD, OJSC, SPRL, JSC…) and noise words (OF, THE, INTERNATIONAL) in every supported language so they can be down-weighted automatically.
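To make the role assignment concrete for entity names, here is a small sketch that tags each token of a canonicalised name. The word lists contain only the examples quoted above, and the role labels and the function `tag_entity_tokens` are illustrative assumptions.

```python
from __future__ import annotations

# Only the examples quoted in the text; real lists would be far longer and multilingual.
LEGAL_FORMS = {"LTD", "OJSC", "SPRL", "JSC"}
NOISE_WORDS = {"OF", "THE", "INTERNATIONAL"}


def tag_entity_tokens(canonical_name: str) -> list[tuple[str, str]]:
    """Split a canonicalised entity name and attach a role to each token."""
    tagged = []
    for token in canonical_name.split():
        if token in LEGAL_FORMS:
            role = "legal_form"
        elif token in NOISE_WORDS:
            role = "noise"
        else:
            role = "name"  # the distinctive part of the entity name
        tagged.append((token, role))
    return tagged


print(tag_entity_tokens("THE NORTHERN STAR LOGISTICS LTD"))
```

Downstream scoring can then give `legal_form` and `noise` tokens a much smaller weight than `name` tokens.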
A screening engine that only catches perfectly-spelled inputs misses most of what matters. Our matcher tolerates the everyday mistakes operations teams produce — and the legitimate variation that arises from honest transcription.
Each tolerance is implemented as a deliberate, weighted rule — not a blanket Levenshtein threshold — so the matcher knows the difference between *Mater* / *Matter* (one likely typo) and *Mater* / *Mateo* (different name).
| Pattern | Example A | | Example B | Caught? |
|---|---|---|---|---|
| Letter inversion | SARAH | ↔ | SAHRA | ✓ |
| Doubled letter | CARRIER | ↔ | CARIER | ✓ |
| Missing letter | RECEIVABLE | ↔ | RECEVABLE | ✓ |
| Phonetic equivalent | CATHERINE | ↔ | KATHERINE | ✓ |
| Cyrillic orthography | SOKOLOFF | ↔ | SOKOLOV | ✓ |
| Word boundary shift | REDLINE LOGISTICS | ↔ | RED LINE LOGISTICS | ✓ |
| Common alias | JONATHAN | ↔ | JON | ✓ |
| Suffix variation | JR. / JUNIOR | ↔ | II / 2ND | ✓ |
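
The difference between a rule-based tolerance and a blanket edit-distance threshold can be sketched as follows. *Mater* / *Matter* and *Mater* / *Mateo* are both one edit apart, so a plain Levenshtein cut-off cannot separate them, whereas named rules can. The three rules and their definitions here are assumptions that cover only the first three rows of the table; the engine's actual rules and weights are not published in this text.

```python
from __future__ import annotations

import re


def _collapse_doubles(s: str) -> str:
    # Reduce any run of a repeated character to a single character.
    return re.sub(r"(.)\1+", r"\1", s)


def _is_local_inversion(a: str, b: str, window: int = 3) -> bool:
    # Same length, and all differences sit in a short window that holds the same letters.
    if len(a) != len(b) or a == b:
        return False
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    lo, hi = diff[0], diff[-1] + 1
    return hi - lo <= window and sorted(a[lo:hi]) == sorted(b[lo:hi])


def _is_missing_letter(a: str, b: str) -> bool:
    # Deleting exactly one character from the longer string yields the shorter one.
    short, long_ = sorted((a, b), key=len)
    if len(long_) - len(short) != 1:
        return False
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def classify_variation(a: str, b: str) -> str | None:
    """Return the name of the first tolerance rule that explains the difference."""
    if a == b:
        return "exact"
    if _collapse_doubles(a) == _collapse_doubles(b):
        return "doubled_letter"
    if _is_local_inversion(a, b):
        return "letter_inversion"
    if _is_missing_letter(a, b):
        return "missing_letter"
    return None  # no rule applies: treat as a different name


print(classify_variation("MATER", "MATTER"))  # doubled_letter
print(classify_variation("MATER", "MATEO"))   # None
```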
Operational systems — core banking, payments, ERPs — frequently store names in their original alphabet. Sanctions lists from OFAC, the EU and the UN publish in Latin. Our engine bridges the two.
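One simple way to bridge scripts is to romanise the source-script name before matching. The sketch below uses a simplified, hand-written Russian-to-Latin table; both the table and the function `transliterate_cyrillic` are assumptions for illustration, and the engine's real transliteration scheme is not described in this text.

```python
# Simplified romanisation for the 32 letters U+0410..U+042F (А..Я), in code-point order.
_LATIN = [
    "A", "B", "V", "G", "D", "E", "ZH", "Z", "I", "Y", "K", "L", "M", "N", "O", "P",
    "R", "S", "T", "U", "F", "KH", "TS", "CH", "SH", "SHCH", "", "Y", "", "E", "YU", "YA",
]
CYRILLIC_TO_LATIN = {chr(0x0410 + i): latin for i, latin in enumerate(_LATIN)}
CYRILLIC_TO_LATIN["\u0401"] = "E"  # Ё sits outside the contiguous range


def transliterate_cyrillic(name: str) -> str:
    """Upper-case the name and replace Cyrillic letters; other characters pass through."""
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in name.upper())


print(transliterate_cyrillic("Соколов"))  # SOKOLOV
```

Competing romanisations of the same name, such as SOKOLOFF versus SOKOLOV, would still need the fuzzy rules described earlier.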
Long, multi-token names do not have to match end-to-end. A partial input — say only the family-name portion in the source script — still raises the right hit when surrounding tokens are missing, because the algorithm knows which tokens carry the weight.
Each token in the input is compared to its candidate counterpart in the watchlist. An exact match scores 100; a fuzzy match earns 75–99; missing or unexpected tokens incur penalties.
Surname tokens carry the heaviest weight. Particles, titles and noise words barely move the needle. The engine then aggregates token-level scores into a single number per candidate, sorts by descending score, and returns everything above a configurable threshold.
Threshold control is per-tenant, per-business-line — high-risk corridors can demand 90+; low-friction onboarding can tolerate 75+ knowing the case manager will catch borderline calls.
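The token-level scoring and thresholding described above might be aggregated along these lines. This is a sketch under stated assumptions: the role weights, the weighted-mean formula, and the treatment of a missing token as a zero score are all hypothetical; only the 100 / 75–99 scale and the 90+ and 75+ thresholds come from the text.

```python
from __future__ import annotations

# Hypothetical weights: surnames dominate, particles/titles/noise barely count.
ROLE_WEIGHTS = {"last": 3.0, "first": 2.0, "middle": 1.0,
                "particle": 0.2, "title": 0.2, "noise": 0.2}

# Thresholds quoted in the text, keyed by hypothetical business-line names.
THRESHOLDS = {"high_risk_corridor": 90.0, "retail_onboarding": 75.0}


def candidate_score(token_scores: list[tuple[str, float | None]]) -> float:
    """Weighted mean of per-token scores; None marks a missing token (scored 0)."""
    total_weight = sum(ROLE_WEIGHTS[role] for role, _ in token_scores)
    weighted = sum(ROLE_WEIGHTS[role] * (score or 0.0) for role, score in token_scores)
    return weighted / total_weight if total_weight else 0.0


def rank(candidates: dict[str, list[tuple[str, float | None]]],
         threshold: float) -> list[tuple[str, float]]:
    """Score every candidate, keep those at or above the threshold, sort descending."""
    scored = [(name, candidate_score(tokens)) for name, tokens in candidates.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


candidates = {
    "SOKOLOFF D": [("last", 85), ("first", 80)],
    "SOKOLOV DMITRI": [("last", 100), ("first", 90)],
}
print(rank(candidates, THRESHOLDS["high_risk_corridor"]))
print(rank(candidates, THRESHOLDS["retail_onboarding"]))
```

With these sample inputs the stricter threshold returns only the first-ranked candidate, while the looser one returns both in descending order.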
Names are necessary but never sufficient. The matcher applies dedicated logic to dates, geographies, identifiers and other attributes, each with its own quality model.
For dates, year, month and day are compared independently. Tolerance for transposed digits and unknown components is configurable, with modes *exact match*, *year only*, *±1 year* and *unknown*.
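A component-wise date comparison could look like the sketch below. The `PartialDate` type, the outcome labels and the decision order are assumptions for illustration; transposed-digit tolerance is not modelled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PartialDate:
    """A date of birth whose components may each be unknown (None)."""
    year: Optional[int] = None
    month: Optional[int] = None
    day: Optional[int] = None


def compare_dob(a: PartialDate, b: PartialDate) -> str:
    """Classify two dates as exact, year_only, within_one_year, unknown or conflict."""
    if a.year is None or b.year is None:
        return "unknown"
    if a.year == b.year:
        month_known = a.month is not None and b.month is not None
        day_known = a.day is not None and b.day is not None
        if month_known and a.month != b.month:
            return "conflict"
        if day_known and a.day != b.day:
            return "conflict"
        if month_known and day_known:
            return "exact"
        return "year_only"  # year agrees, a finer component is unknown
    if abs(a.year - b.year) == 1:
        return "within_one_year"
    return "conflict"


print(compare_dob(PartialDate(1980), PartialDate(1980, 5, 17)))  # year_only
```

Whether `year_only` or `within_one_year` counts as corroboration would then be a policy setting rather than part of the comparison itself.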
For geographies, exact ISO codes, alpha-2 / alpha-3 mapping and historical names are all handled. We do not fuzzy-match places: only verified geographic matches are allowed.
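The lookup-only approach to geography can be illustrated with a tiny table. The entries below are a four-country sample of real ISO 3166-1 codes plus two historical names; the function names and the table layout are assumptions, and a real deployment would load the full standard.

```python
from typing import Optional

# Small sample of ISO 3166-1 codes; a real table would cover every country.
ALPHA3_TO_ALPHA2 = {"DEU": "DE", "FRA": "FR", "MMR": "MM", "COD": "CD"}
HISTORICAL_NAMES = {"BURMA": "MM", "ZAIRE": "CD"}


def to_alpha2(value: str) -> Optional[str]:
    """Resolve a code or historical name to alpha-2, or None if it is not recognised."""
    v = value.strip().upper()
    if v in ALPHA3_TO_ALPHA2.values():
        return v  # already a known alpha-2 code
    if v in ALPHA3_TO_ALPHA2:
        return ALPHA3_TO_ALPHA2[v]
    return HISTORICAL_NAMES.get(v)  # no fuzzy fallback on purpose


def countries_match(a: str, b: str) -> Optional[bool]:
    """True or False when both sides resolve; None when either side is unverified."""
    ca, cb = to_alpha2(a), to_alpha2(b)
    if ca is None or cb is None:
        return None
    return ca == cb


print(countries_match("Burma", "MMR"))   # True
print(countries_match("Germnay", "DE"))  # None: a misspelling is not guessed at
```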
Identifiers (passport, national ID, tax ID, LEI, SWIFT / BIC) are compared exactly, with format-aware checks (length, checksum) so that a typo does not fire as a hit.
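Format-aware identifier checks might be sketched as below for two of the identifier types. The LEI check follows the ISO 7064 MOD 97-10 scheme used by ISO 17442; the BIC pattern is a simplified structural check (length and character classes only). The function names and the three-valued result of `compare_identifiers` are assumptions for this example.

```python
import re
from typing import Callable, Optional

# 4-char party prefix, 2-letter country, 2-char location, optional 3-char branch.
_BIC_RE = re.compile(r"[A-Z0-9]{4}[A-Z]{2}[A-Z0-9]{2}([A-Z0-9]{3})?")
# 18 alphanumeric characters followed by 2 check digits.
_LEI_RE = re.compile(r"[A-Z0-9]{18}[0-9]{2}")


def is_valid_bic(value: str) -> bool:
    return _BIC_RE.fullmatch(value) is not None


def is_valid_lei(value: str) -> bool:
    if _LEI_RE.fullmatch(value) is None:
        return False
    # Map A..Z to 10..35, keep digits, then apply the MOD 97-10 test.
    numeric = "".join(str(int(ch, 36)) for ch in value)
    return int(numeric) % 97 == 1


def compare_identifiers(a: str, b: str,
                        validator: Callable[[str], bool]) -> Optional[bool]:
    """Exact comparison, but only when both values pass the format check."""
    if not (validator(a) and validator(b)):
        return None  # malformed input is treated as unknown, not as a hit or a conflict
    return a == b


print(compare_identifiers("DEUTDEFF", "DEUTDEFF", is_valid_bic))  # True
print(compare_identifiers("DEUTDEF", "DEUTDEFF", is_valid_bic))   # None
```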
Other attributes, such as gender, are compared with reliability weighting: gender mismatches are signal, not noise, but absence is treated as unknown, never as conflicting.
After the name-side and attribute-side computations finish, a composite scoring layer produces the final number — boosting the score when reliable attributes corroborate the name match, and penalising it when they don’t.
Each attribute carries a configurable trust weight reflecting your data quality. If your country field is always populated and reliable, a country match might add up to +25. If your DOB capture is patchy, a missing DOB carries no penalty. If DOB conflicts, the layer can subtract up to −20.
The result: weak name matches with strong corroboration get raised to alerts; strong name matches with conflicting attributes get suppressed. Both moves come with full reasons in the audit log.
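The boost-and-penalise behaviour of the composite layer can be sketched as a simple additive adjustment. The +25 country boost and the 20-point DOB penalty come from the text; every other number, the linear scaling by trust, the clamping to 0–100 and all names (`AttributeRule`, `composite_score`, `RULES`) are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeRule:
    max_boost: float    # added on a corroborating match
    max_penalty: float  # subtracted on a conflict
    trust: float        # 0.0 to 1.0, reflecting data quality for this field


# Country boost and DOB penalty follow the text; the other two values are hypothetical.
RULES = {
    "country": AttributeRule(max_boost=25, max_penalty=10, trust=1.0),
    "dob": AttributeRule(max_boost=15, max_penalty=20, trust=1.0),
}


def composite_score(name_score: float, outcomes: dict[str, str],
                    rules: dict[str, AttributeRule]) -> tuple[float, list[str]]:
    """Adjust a name score by attribute outcomes and return it with audit reasons."""
    score = name_score
    reasons = [f"name score {name_score:.0f}"]
    for attribute, outcome in outcomes.items():
        rule = rules[attribute]
        if outcome == "match":
            delta = rule.max_boost * rule.trust
        elif outcome == "conflict":
            delta = -rule.max_penalty * rule.trust
        else:
            delta = 0.0  # unknown or missing: neither boost nor penalty
        score += delta
        reasons.append(f"{attribute} {outcome}: {delta:+.0f}")
    return max(0.0, min(100.0, score)), reasons


print(composite_score(72, {"country": "match", "dob": "unknown"}, RULES))
print(composite_score(92, {"dob": "conflict"}, RULES))
```

In this sketch a weak name score of 72 rises to 97 on a trusted country match, while a strong 92 drops to 72 on a DOB conflict, and each adjustment is recorded as a reason string.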
A 30-minute walkthrough on a slice of your real data is the fastest way to see the engine in action against your incumbent. Sandbox key dispatched in 24 hours.