KYC Right Source

01 · Decision in one pass

Every signal, evaluated together — not in stages.

Older screening engines run a name-only match first, hand the survivors to a rule sheet, then apply DOB and country filters. Each step throws away context the next step would have used.

Our engine evaluates the full set of signals (name tokens, script, country, date of birth, identifiers) in a single computation. The composite verdict is produced once, in real time, with every input weighted by how reliable that field is in your data. Threshold logic is configurable per business line, so the same engine can be tuned aggressively for high-risk corridors and conservatively for retail onboarding.

Name · A. Petrov

DOB · 1972-04-18

Country · RU

ID · PRS•••3471

Unified Match Engine

all signals · single decision

Composite score · 94 % → ESCALATE L2

02 · Origin-aware fuzzy match

A name follows different rules depending on where it comes from.

A reader who hears Dmitri Sokolov, Li Mei, and Marc Lefèvre instantly files each one under a cultural origin. Our matcher does the same — and uses that prediction to pick its strategy.

A lightweight classifier, trained on a multi-million-name corpus across scripts and cultures, assigns a heritage tag to every input. That tag drives downstream choices: under European naming conventions, given-name and family-name order is treated as interchangeable; under East-Asian conventions, it is not. Particles, suffixes, double surnames and middle initials each follow their own grammar instead of being stripped as noise.

Name	Heritage tag
Dmitri Sokolov	Russian / Slavic
Li Mei	Chinese · Han
Marc Lefèvre	French · Latin-Eu
Khalid bin Mansour	Arabic · MENA
Aiyana Whitehorse	North-American
Hiroshi Tanaka	Japanese
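As a rough illustration of how a heritage tag could steer the matching strategy, the sketch below generates the token orders worth comparing. The function name `order_variants` and the set `ORDER_FLEXIBLE_TAGS` are hypothetical, and treating every tag outside that set as order-sensitive is an assumption made for the example, not a description of the engine.

```python
# Tags whose given-name / family-name order is treated as interchangeable.
# Assumption: only the two European tags from the examples are listed here.
ORDER_FLEXIBLE_TAGS = {"Russian / Slavic", "French · Latin-Eu"}


def order_variants(tokens, heritage_tag):
    """Return the token orders that should be compared for a two-token name."""
    given = tuple(tokens)
    if heritage_tag in ORDER_FLEXIBLE_TAGS and len(given) == 2:
        # Order is interchangeable: try the name as written and reversed.
        return [given, given[::-1]]
    # Order carries meaning (or is unknown): compare only as written.
    return [given]


print(order_variants(["MARC", "LEFEVRE"], "French · Latin-Eu"))
print(order_variants(["LI", "MEI"], "Chinese · Han"))
```

Keeping the order-sensitive case as the default errs on the side of fewer speculative comparisons when the classifier returns a tag the table does not know.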

03 · Canonical form

Standardise before comparing.

Names cannot be compared character-for-character — diacritics, punctuation and casing carry no semantic meaning. Each input is reduced to a canonical shape before scoring.

Diacritic stripping

François Müller-Bélanger →
FRANCOIS MULLER BELANGER

Punctuation normalised

Apostrophe variants (’ ' `) collapse to one form. Hyphens, dots and commas are flattened to whitespace.

Legal form recognition

Northern Star Logistics Ltd. (P.L.C.) →
NORTHERN STAR LOGISTICS LTD (PLC)

Case & whitespace

Everything is upper-cased. Multiple internal spaces become one. Leading and trailing whitespace is trimmed.
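The four canonicalisation steps can be sketched with the Python standard library. This is a minimal illustration, not the engine's implementation: the function name `canonicalise` is hypothetical, and collapsing any dotted abbreviation (such as P.L.C.) is a simplifying assumption that stands in for a real legal-form lexicon.

```python
import re
import unicodedata

# Apostrophe variants collapsed to a plain ASCII apostrophe.
_APOSTROPHE_VARIANTS = "\u2018\u2019`"
# Dotted abbreviations such as "P.L.C." (two or more letter-dot pairs).
_DOTTED_ABBREVIATION = re.compile(r"\b(?:[A-Za-z]\.){2,}")


def canonicalise(name: str) -> str:
    # 1. Diacritic stripping: decompose, then drop combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # 2. Apostrophe variants collapse to one form.
    for variant in _APOSTROPHE_VARIANTS:
        text = text.replace(variant, "'")
    # 3. Keep dotted abbreviations together (P.L.C. becomes PLC).
    text = _DOTTED_ABBREVIATION.sub(lambda m: m.group(0).replace(".", ""), text)
    # 4. Hyphens, dots and commas are flattened to whitespace.
    text = re.sub(r"[-.,]", " ", text)
    # 5. Upper-case, collapse internal whitespace, trim the ends.
    return " ".join(text.upper().split())


print(canonicalise("François Müller-Bélanger"))
print(canonicalise("Northern Star Logistics Ltd. (P.L.C.)"))
```

Note that step 3 would also merge dotted personal initials, which is why a production rule would more likely check tokens against a list of known legal forms.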

04 · Token classification

Each token gets a role — and the role drives the weight.

After canonicalisation, the engine breaks a name into tokens and assigns each one a semantic role. The role determines how much that token contributes to the final score and which fuzzy strategies apply to it.

« VOLKOV , D K »

Token	Role
VOLKOV	SURNAME
D	ABBREV FIRSTNAME?
K	ABBREV MIDDLENAME?

« URAL HOLDINGS PLC »

Token	Role
URAL	CORE-NAME
HOLDINGS	CORE-NAME
PLC	LEGAL-FORM

Personal names are tagged as first / middle / last / abbreviation / particle / title, adjusted by the cultural-origin classifier in step 02. For entity names, the engine recognises legal forms (LTD, OJSC, SPRL, JSC…) and noise words (OF, THE, INTERNATIONAL) in every supported language so they can be down-weighted automatically.
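A toy version of role assignment is sketched below for the two example names. The lexicons are limited to the terms mentioned in the text, the `NOISE` role label and both function names are hypothetical, and the personal-name function only understands the "SURNAME, GIVEN MIDDLE" layout; the culture-dependent adjustments are out of scope here.

```python
LEGAL_FORMS = {"LTD", "PLC", "OJSC", "SPRL", "JSC"}
NOISE_WORDS = {"OF", "THE", "INTERNATIONAL"}


def classify_entity(name: str) -> list[tuple[str, str]]:
    """Tag each token of a canonicalised entity name with a role."""
    roles = []
    for token in name.split():
        if token in LEGAL_FORMS:
            role = "LEGAL-FORM"
        elif token in NOISE_WORDS:
            role = "NOISE"
        else:
            role = "CORE-NAME"
        roles.append((token, role))
    return roles


def classify_person(name: str) -> list[tuple[str, str]]:
    """Tag a 'SURNAME, GIVEN MIDDLE' name; other layouts are not handled."""
    surname_part, _, rest = name.partition(",")
    roles = [(token, "SURNAME") for token in surname_part.split()]
    for position, token in enumerate(rest.split()):
        base = "FIRSTNAME" if position == 0 else "MIDDLENAME"
        # A single letter is read as an abbreviation; its role is a guess,
        # hence the trailing question mark.
        role = f"ABBREV {base}?" if len(token) == 1 else base
        roles.append((token, role))
    return roles


print(classify_person("VOLKOV, D K"))
print(classify_entity("URAL HOLDINGS PLC"))
```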

05 · Misspellings & transcription noise

Built to forgive the kinds of error humans actually make.

A screening engine that only catches perfectly-spelled inputs misses most of what matters. Our matcher tolerates the everyday mistakes operations teams produce — and the legitimate variation that arises from honest transcription.

Each tolerance is implemented as a deliberate, weighted rule — not a blanket Levenshtein threshold — so the matcher knows the difference between *Mater* / *Matter* (one likely typo) and *Mater* / *Mateo* (different name).

Pattern	Example A		Example B	Caught?
Letter inversion	SARAH	↔	SAHRA	✓
Doubled letter	CARRIER	↔	CARIER	✓
Missing letter	RECEIVABLE	↔	RECEVABLE	✓
Phonetic equivalent	CATHERINE	↔	KATHERINE	✓
Cyrillic orthography	SOKOLOFF	↔	SOKOLOV	✓
Word boundary shift	REDLINE LOGISTICS	↔	RED LINE LOGISTICS	✓
Common alias	JONATHAN	↔	JON	✓
Suffix variation	JR. / JUNIOR	↔	II / 2ND	✓
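To illustrate the difference between named rules and a blanket edit-distance threshold, the sketch below implements just two of the patterns from the table, doubled letter and missing letter. The function names are hypothetical and the rule set is deliberately incomplete; anything the two rules do not explain is returned as `None` rather than scored.

```python
from itertools import groupby


def collapse_runs(word: str) -> str:
    """Reduce every run of a repeated letter to one letter (RR becomes R)."""
    return "".join(letter for letter, _ in groupby(word))


def is_single_deletion(longer: str, shorter: str) -> bool:
    """True if removing exactly one letter from `longer` gives `shorter`."""
    if len(longer) != len(shorter) + 1:
        return False
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def classify_variation(a: str, b: str):
    a, b = a.upper(), b.upper()
    if a == b:
        return "exact"
    if collapse_runs(a) == collapse_runs(b):
        return "doubled letter"
    longer, shorter = (a, b) if len(a) >= len(b) else (b, a)
    if is_single_deletion(longer, shorter):
        return "missing letter"
    return None  # not explained by the rules in this sketch


print(classify_variation("Mater", "Matter"))
print(classify_variation("Mater", "Mateo"))
```

Both pairs are one edit apart, yet only the first is explained by a known typo pattern, which is the distinction a single Levenshtein cut-off cannot make.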

06 · Cross-script transliteration

Match Latin watchlists against names in the local script.

Operational systems (core banking, payments, ERPs) frequently store names in their original alphabet. Sanctions lists from OFAC, the EU and the UN are published in Latin script. Our engine bridges the two.

أحمد بن يوسف الزهراني

AHMED BIN YOUSEF AL-ZAHRANI

Arabic script → Latin · partial input still triggers a hit

Дмитрий Соколов

DMITRI SOKOLOV

Cyrillic → Latin · soft / hard signs handled

王李明

WANG LI-MING

Chinese hanzi → Latin · pinyin with tone-tolerant fuzzing

Long, multi-token names do not have to match end-to-end. A partial input — say only the family-name portion in the source script — still raises the right hit when surrounding tokens are missing, because the algorithm knows which tokens carry the weight.
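For the Cyrillic case, a table-driven romanisation might look like the sketch below. The mapping is a deliberately simplified, illustrative table rather than any published standard, the names `CYRILLIC_TO_LATIN` and `romanise` are hypothetical, and the word-final rule is an assumption chosen so that the output matches the example above.

```python
# Simplified Russian-to-Latin table (illustrative, not a formal standard).
CYRILLIC_TO_LATIN = {
    "а": "A", "б": "B", "в": "V", "г": "G", "д": "D", "е": "E", "ё": "E",
    "ж": "ZH", "з": "Z", "и": "I", "й": "Y", "к": "K", "л": "L", "м": "M",
    "н": "N", "о": "O", "п": "P", "р": "R", "с": "S", "т": "T", "у": "U",
    "ф": "F", "х": "KH", "ц": "TS", "ч": "CH", "ш": "SH", "щ": "SHCH",
    "ъ": "",  # hard sign: dropped
    "ы": "Y",
    "ь": "",  # soft sign: dropped
    "э": "E", "ю": "YU", "я": "YA",
}


def romanise_word(word: str) -> str:
    lower = word.lower()
    # Assumed rule: a word-final "ий" is written as a single I.
    if lower.endswith("ий"):
        lower = lower[:-1]
    # Characters outside the table pass through upper-cased.
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch.upper()) for ch in lower)


def romanise(name: str) -> str:
    return " ".join(romanise_word(word) for word in name.split())


print(romanise("Дмитрий Соколов"))
```

Because several Latin spellings of the same name are legitimate, the output of a step like this would normally feed the fuzzy comparison rather than an exact lookup.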

07 · How a hit becomes a score

Token-level scores compose a name-level verdict.

Each token in the input is compared to its candidate counterpart in the watchlist. An exact match scores 100; a fuzzy match earns 75–99; missing or unexpected tokens incur penalties.

Surname tokens carry the heaviest weight. Particles, titles and noise words barely move the needle. The engine then aggregates token-level scores into a single number per candidate, sorts by descending score, and returns everything above a configurable threshold.

Threshold control is per-tenant, per-business-line — high-risk corridors can demand 90+; low-friction onboarding can tolerate 75+ knowing the case manager will catch borderline calls.

Role	Weight	Score
SURNAME	× 1.0	96
FIRSTNAME	× 0.7	88
MIDDLENAME	× 0.4	72
PARTICLE	× 0.1	100
UNEXPECTED	penalty	−8
Aggregate		89
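The aggregation formula is not spelled out here. One possible reading, offered purely as an assumption, is a weight-normalised mean of the token scores. With the role weights and scores shown above it comes to roughly 89.3, in line with the aggregate of 89. How the −8 penalty for unexpected tokens is applied is not stated, so the sketch leaves it out. The function names `weighted_name_score` and `hits_above` are hypothetical.

```python
ROLE_WEIGHTS = {
    "SURNAME": 1.0,
    "FIRSTNAME": 0.7,
    "MIDDLENAME": 0.4,
    "PARTICLE": 0.1,
}


def weighted_name_score(token_scores):
    """Weight-normalised mean of (role, score) pairs. Assumed formula."""
    total_weight = sum(ROLE_WEIGHTS[role] for role, _ in token_scores)
    weighted_sum = sum(ROLE_WEIGHTS[role] * score for role, score in token_scores)
    return weighted_sum / total_weight


def hits_above(candidates, threshold):
    """Keep candidates at or above the threshold, highest score first."""
    kept = [(name, score) for name, score in candidates.items() if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


example = [("SURNAME", 96), ("FIRSTNAME", 88), ("MIDDLENAME", 72), ("PARTICLE", 100)]
print(round(weighted_name_score(example), 1))

# A stricter business line simply passes a higher threshold.
print(hits_above({"cand-1": 92.0, "cand-2": 74.0, "cand-3": 81.5}, threshold=75))
```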

08 · Attribute matching

Beyond names — every other signal weighs in.

Names are necessary but never sufficient. The matcher applies dedicated logic to dates, geographies, identifiers and other attributes, each with its own quality model.

Dates of birth

Year, month and day compared independently. Configurable tolerance for transposed digits and unknown components — *exact match*, *year only*, *±1 year*, *unknown*.

Country & city

Exact ISO codes, alpha-2 / alpha-3 mapping, and historical names handled. We do not fuzzy-match places — only verified geographic matches are allowed.

Identifiers

Passport, national ID, tax ID, LEI, SWIFT / BIC. Compared exactly with format-aware checks (length, checksum) so a typo doesn't fire as a hit.

Gender & flags

Compared with reliability weighting — gender mismatches are signal, not noise, but absence is treated as unknown, never as conflicting.
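As one example of attribute-specific logic, the sketch below compares dates of birth component by component and treats absent components as unknown rather than conflicting. The `PartialDate` type and `compare_dob` function are hypothetical, the `conflict` outcome is an added label, and tolerance for transposed digits is left out to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PartialDate:
    """A date of birth in which any component may be unknown (None)."""
    year: Optional[int] = None
    month: Optional[int] = None
    day: Optional[int] = None


def compare_dob(a: PartialDate, b: PartialDate) -> str:
    if a.year is None or b.year is None:
        return "unknown"
    # Month and day only count when both sides supply them.
    month_known = a.month is not None and b.month is not None
    day_known = a.day is not None and b.day is not None
    month_ok = not month_known or a.month == b.month
    day_ok = not day_known or a.day == b.day
    if not (month_ok and day_ok):
        return "conflict"
    if a.year == b.year:
        # Same year; "exact match" requires month and day on both sides.
        return "exact match" if (month_known and day_known) else "year only"
    if abs(a.year - b.year) == 1:
        return "±1 year"
    return "conflict"


print(compare_dob(PartialDate(1972, 4, 18), PartialDate(1972, 4, 18)))
print(compare_dob(PartialDate(1972), PartialDate(1972, 4, 18)))
```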

09 · Composite verdict

One score that knows what it’s talking about.

After the name-side and attribute-side computations finish, a composite scoring layer produces the final number — boosting the score when reliable attributes corroborate the name match, and penalising it when they don’t.

Each attribute carries a configurable trust weight reflecting your data quality. If your country field is always populated and reliable, a country match might add up to +25. If your DOB capture is patchy, a missing DOB carries no penalty. If DOB conflicts, the layer can subtract up to −20.

The result: weak name matches with strong corroboration get raised to alerts; strong name matches with conflicting attributes get suppressed. Both moves come with full reasons in the audit log.

Example A — borderline name, strong context

Name match · Mugabe, R G ↔ Mugabe, Robert Gabriel

good

89

Country match · ZW = ZW

match

+5

DOB match · 1924-02-21 = 1924-02-21

match

+3

Composite (+ trust boost) · 97
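The arithmetic of Example A (89 + 5 + 3 = 97) can be reproduced with a small additive sketch. The `AttributeResult` type and `composite_score` function are hypothetical, the signed adjustments are taken directly from the example rather than derived from trust weights, and clamping the result to the 0 to 100 range is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeResult:
    name: str        # e.g. "country", "dob"
    outcome: str     # "match", "conflict" or "missing"
    adjustment: int  # signed points applied for a match or conflict


def composite_score(name_score, attributes):
    """Return (score, reasons); reasons mimic an audit-log trail."""
    total = name_score
    reasons = [f"name match: {name_score}"]
    for attr in attributes:
        if attr.outcome == "missing":
            # Absence is treated as unknown: no boost and no penalty.
            reasons.append(f"{attr.name}: missing, no adjustment")
            continue
        total += attr.adjustment
        reasons.append(f"{attr.name}: {attr.outcome} ({attr.adjustment:+d})")
    # Assumption: keep the composite inside the 0-100 scale.
    return max(0, min(100, total)), reasons


score, reasons = composite_score(
    89,
    [AttributeResult("country", "match", 5), AttributeResult("dob", "match", 3)],
)
print(score)
print(reasons)
```

Returning the reasons alongside the number is what lets a reviewer see why a borderline name was raised or a strong one suppressed.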

Example B — perfect name, conflicting context

Name match · John Smith ↔ John Smith

exact

100

Country match · CA ≠ NZ

conflict

−10

DOB match · 1981-03-04 ≠ 1956-11-22