The purpose of this article is to explain how to use the Double Metaphone phonetic algorithm in Excel for reliable name and text matching, so that you can build robust fuzzy lookup, deduplication, and data cleansing workflows directly inside your workbooks.
1. Why phonetic matching is essential in Excel data projects
Exact text matching in Excel is fragile because real-world data is full of typos, alternative spellings, and transliteration differences.
Common problems include records such as “Smith” vs “Smyth”, “Schmidt” vs “Smith”, “Jon” vs “John”, and names that vary slightly across systems.
If you rely only on functions such as VLOOKUP, XLOOKUP, or MATCH with exact comparison, many logically identical records will not join.
Phonetic algorithms solve this problem by converting each string into a code that reflects how it sounds rather than how it is written.
Rows that share the same phonetic code are likely to represent the same person or entity even when the spelling differs.
Double Metaphone is one of the most widely used phonetic algorithms for names because it:
- Produces a primary and a secondary code to handle alternative pronunciations.
- Handles a broad range of European and non-English name patterns.
- Is more discriminating than older algorithms like Soundex.
Note: Phonetic matching does not prove that two records belong to the same person. It is a strong candidate-generation step that should be combined with additional checks such as date of birth, address, or ID numbers.
2. What Double Metaphone codes look like
Double Metaphone transforms a word into up to two short codes.
The first is called the primary code and represents the most likely pronunciation.
The second is the secondary code, representing a plausible alternative pronunciation.
For example, typical Double Metaphone encodings behave like this.
| Original text | Primary code | Secondary code | Comment |
|---|---|---|---|
| Smith | SM0 | XMT | “th” encodes as 0 (theta) in the primary code and T in the secondary; the initial “Sm” also gets a Germanic X variant. |
| Schmidt | XMT | SMT | Shares XMT with “Smith”, enabling a phonetic match. |
| Smyth | SM0 | XMT | Different spelling, same codes as “Smith”. |
| García | KRS | KRX | Encoded after stripping the accent; “cia” gives S in the primary code and X in the secondary. |
| Garcia | KRS | KRX | Same codes as the accent-stripped “García”. |
The strength of Double Metaphone is that different spellings like “Smith”, “Smyth”, and “Schmidt” often share at least one code in common.
In Excel, you can exploit this by computing phonetic keys in helper columns and matching based on those keys instead of raw text.
3. Architecture: how to bring Double Metaphone into Excel
Excel does not ship with Double Metaphone as a built-in function.
To use it, you need to choose one of the following approaches.
3.1 VBA user-defined function (most common approach)
You implement Double Metaphone in VBA as a user-defined function (UDF) and call it in cells like an ordinary worksheet function.
- Pros: Runs inside the workbook, travels with the file, and works in desktop versions of Excel that support VBA macros (Excel for the web does not run VBA).
- Cons: Requires macro-enabled workbooks, careful maintenance of VBA code, slightly slower than native formulas for large datasets.
3.2 Power Query custom function
You can also implement Double Metaphone in Power Query (M language) and precompute codes during data import.
- Pros: Pushes the heavy computation into the ETL step, resulting datasets are values only, which are faster in formulas.
- Cons: Requires familiarity with Power Query, less interactive for ad-hoc cell-level tests.
3.3 Precomputing with external tools
In some environments, you already have Double Metaphone available in a database or data-processing platform.
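For example, PostgreSQL's fuzzystrmatch extension provides dmetaphone and dmetaphone_alt functions, and SQL Server (through a custom function), Python libraries, or data quality tools may expose a Double Metaphone function as well.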
In that case, you can precompute phonetic keys in those systems and import them into Excel as columns.
Note: For large tables with hundreds of thousands of rows, precomputing Double Metaphone keys outside Excel or via Power Query and loading them as plain values often yields much better recalculation performance than running VBA UDFs over the entire sheet.
4. Implementing a simplified Double Metaphone UDF in Excel VBA
This section presents a practical VBA implementation that captures the core behavior of Double Metaphone for many English names.
It is a simplified implementation focused on business data cleansing rather than a byte-perfect reference port of the original algorithm, but it is sufficient for most everyday deduplication and fuzzy matching tasks in Excel.
4.1 Adding the VBA module
- Save your workbook as a macro-enabled workbook (*.xlsm).
- Press Alt + F11 to open the Visual Basic Editor.
- In the Project pane, right-click your workbook, choose Insert → Module.
- Paste the following code into the new module.
```vba
Option Explicit

' Public worksheet functions.
Public Function DMetaphonePrimary(ByVal textIn As String) As String
    Dim p As String, s As String
    Call DMetaphoneEncode(textIn, p, s)
    DMetaphonePrimary = p
End Function

Public Function DMetaphoneSecondary(ByVal textIn As String) As String
    Dim p As String, s As String
    Call DMetaphoneEncode(textIn, p, s)
    DMetaphoneSecondary = s
End Function

Public Function DMetaphoneBoth(ByVal textIn As String) As String
    Dim p As String, s As String
    Call DMetaphoneEncode(textIn, p, s)
    If s <> "" And s <> p Then
        DMetaphoneBoth = p & "|" & s
    Else
        DMetaphoneBoth = p
    End If
End Function

' Core encoder (simplified Double Metaphone).
Private Sub DMetaphoneEncode(ByVal textIn As String, _
                             ByRef primary As String, _
                             ByRef secondary As String)
    Dim s As String
    Dim i As Long, n As Long
    Dim ch As String, nextCh As String
    Dim next2 As String, next3 As String

    s = NormalizeInput(textIn)
    primary = ""
    secondary = ""
    n = Len(s)
    If n = 0 Then Exit Sub

    ' Handle initial vowels: the vowel itself starts the key.
    If IsVowel(Mid$(s, 1, 1)) Then
        primary = Mid$(s, 1, 1)
        secondary = primary
        i = 2
    Else
        i = 1
    End If

    Do While i <= n And Len(primary) < 4
        ch = Mid$(s, i, 1)
        nextCh = Mid$(s, i + 1, 1)   ' "" at the end of the string
        next2 = Mid$(s, i, 2)        ' shorter than 2 or 3 characters near the end,
        next3 = Mid$(s, i, 3)        ' so the pattern tests below simply fail there

        Select Case ch
            Case "B"
                Call AppendCode(primary, secondary, "P")
                i = i + StepPast(s, i)

            Case "C"
                If next2 = "CH" Then
                    Call AppendCode(primary, secondary, "X")
                    i = i + 2
                ElseIf next3 = "CIA" Or next3 = "CIO" Or next3 = "CIE" Then
                    Call AppendCode(primary, secondary, "X")
                    i = i + 3
                ElseIf nextCh = "Z" Then
                    Call AppendCode(primary, secondary, "S")
                    i = i + 2
                ElseIf nextCh = "I" Or nextCh = "E" Or nextCh = "Y" Then
                    Call AppendCode(primary, secondary, "S")
                    i = i + 2
                Else
                    Call AppendCode(primary, secondary, "K")
                    i = i + StepPast(s, i)
                End If

            Case "D"
                If next3 = "DGE" Or next3 = "DGI" Or next3 = "DGY" Then
                    Call AppendCode(primary, secondary, "J")
                    i = i + 3
                Else
                    Call AppendCode(primary, secondary, "T")
                    i = i + StepPast(s, i)
                End If

            Case "F"
                Call AppendCode(primary, secondary, "F")
                i = i + StepPast(s, i)

            Case "G"
                If nextCh = "H" Then
                    ' GH is silent after a consonant, otherwise treated as F.
                    ' CharAt avoids the run-time error that Mid$(s, 0, 1) raises.
                    If i > 1 And Not IsVowel(CharAt(s, i - 1)) Then
                        i = i + 2
                    Else
                        Call AppendCode(primary, secondary, "F")
                        i = i + 2
                    End If
                ElseIf nextCh = "N" Then
                    Call AppendCode(primary, secondary, "N")
                    i = i + 2
                ElseIf nextCh = "I" Or nextCh = "E" Or nextCh = "Y" Then
                    Call AppendCode(primary, secondary, "J")
                    i = i + 2
                Else
                    Call AppendCode(primary, secondary, "K")
                    i = i + StepPast(s, i)
                End If

            Case "H"
                ' Keep H only at the start or after a vowel, and only before a vowel.
                If (i = 1 Or IsVowel(CharAt(s, i - 1))) And IsVowel(nextCh) Then
                    Call AppendCode(primary, secondary, "H")
                End If
                i = i + 1

            Case "J"
                Call AppendCode(primary, secondary, "J")
                i = i + StepPast(s, i)

            Case "K"
                Call AppendCode(primary, secondary, "K")
                i = i + StepPast(s, i)

            Case "L", "M", "N", "R"
                ' These consonants map to themselves.
                Call AppendCode(primary, secondary, ch)
                i = i + StepPast(s, i)

            Case "P"
                If nextCh = "H" Then
                    Call AppendCode(primary, secondary, "F")
                    i = i + 2
                Else
                    Call AppendCode(primary, secondary, "P")
                    i = i + StepPast(s, i)
                End If

            Case "Q"
                Call AppendCode(primary, secondary, "K")
                i = i + StepPast(s, i)

            Case "S"
                If next3 = "SCH" Then
                    Call AppendCode(primary, secondary, "SK")
                    i = i + 3
                ElseIf next3 = "SIO" Or next3 = "SIA" Then
                    Call AppendCode(primary, secondary, "X")
                    i = i + 3
                ElseIf next2 = "SH" Then
                    Call AppendCode(primary, secondary, "X")
                    i = i + 2
                Else
                    Call AppendCode(primary, secondary, "S")
                    i = i + StepPast(s, i)
                End If

            Case "T"
                If next3 = "TIA" Or next3 = "TIO" Then
                    Call AppendCode(primary, secondary, "X")
                    i = i + 3
                ElseIf next2 = "TH" Then
                    Call AppendCode(primary, secondary, "0")
                    i = i + 2
                Else
                    Call AppendCode(primary, secondary, "T")
                    i = i + StepPast(s, i)
                End If

            Case "V"
                Call AppendCode(primary, secondary, "F")
                i = i + StepPast(s, i)

            Case "W", "Y"
                If IsVowel(nextCh) Then
                    Call AppendCode(primary, secondary, ch)
                End If
                i = i + 1

            Case "X"
                Call AppendCode(primary, secondary, "KS")
                i = i + 1

            Case "Z"
                Call AppendCode(primary, secondary, "S")
                i = i + StepPast(s, i)

            Case Else
                ' Vowels after the first position are not encoded.
                i = i + 1
        End Select
    Loop

    primary = Left$(primary, 4)
    secondary = Left$(secondary, 4)
End Sub

Private Function NormalizeInput(ByVal textIn As String) As String
    Dim s As String, i As Long, ch As String
    Dim out As String

    s = UCase$(Trim$(textIn))
    out = ""
    ' Keep only the letters A to Z.
    For i = 1 To Len(s)
        ch = Mid$(s, i, 1)
        If ch >= "A" And ch <= "Z" Then
            out = out & ch
        End If
    Next i

    ' Handle a few special initial combinations.
    If Left$(out, 2) = "KN" Or Left$(out, 2) = "GN" Or Left$(out, 2) = "PN" Then
        out = Mid$(out, 2)
    ElseIf Left$(out, 2) = "WR" Then
        out = Mid$(out, 2)
    ElseIf Left$(out, 1) = "X" Then
        out = "S" & Mid$(out, 2)
    End If

    NormalizeInput = out
End Function

Private Function IsVowel(ByVal ch As String) As Boolean
    IsVowel = (ch = "A" Or ch = "E" Or ch = "I" Or ch = "O" Or ch = "U" Or ch = "Y")
End Function

' Returns the character at pos, or "" when pos is outside the string.
Private Function CharAt(ByVal s As String, ByVal pos As Long) As String
    If pos >= 1 And pos <= Len(s) Then CharAt = Mid$(s, pos, 1)
End Function

' Number of positions to advance: 2 when the current letter is doubled, otherwise 1.
Private Function StepPast(ByVal s As String, ByVal i As Long) As Long
    If Mid$(s, i + 1, 1) = Mid$(s, i, 1) Then StepPast = 2 Else StepPast = 1
End Function

Private Sub AppendCode(ByRef primary As String, ByRef secondary As String, ByVal code As String)
    If Len(primary) < 4 Then primary = primary & code
    If Len(secondary) < 4 Then secondary = secondary & code
End Sub
```

Note: This VBA implementation focuses on the most frequent English name patterns and is intentionally capped at four characters for each key. Every rule appends the same code to both keys, so DMetaphoneSecondary returns the same value as DMetaphonePrimary until you add rules that emit alternative codes, and the results can differ from the reference codes of the full algorithm. If your data quality project requires a fully faithful Double Metaphone implementation, you can extend the rule set or replace this module with a more exhaustive version following the same function signatures.
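The encoder keeps only the letters A to Z, so an accented letter such as the í in “García” is discarded rather than folded to its base letter. One way to handle this is a small pre-processing helper. The sketch below is an assumption, not part of the original module: the name StripAccents is hypothetical, and the mapping covers only common Latin-1 letters.

```vba
' Hypothetical helper: folds common Latin-1 accented letters to plain letters.
' Characters outside the mapped ranges are passed through unchanged.
Public Function StripAccents(ByVal textIn As String) As String
    Dim i As Long, code As Long
    Dim ch As String, result As String

    For i = 1 To Len(textIn)
        ch = Mid$(textIn, i, 1)
        code = AscW(ch)
        Select Case code
            Case 192 To 197: ch = "A"
            Case 199: ch = "C"
            Case 200 To 203: ch = "E"
            Case 204 To 207: ch = "I"
            Case 209: ch = "N"
            Case 210 To 214, 216: ch = "O"
            Case 217 To 220: ch = "U"
            Case 221: ch = "Y"
            Case 224 To 229: ch = "a"
            Case 231: ch = "c"
            Case 232 To 235: ch = "e"
            Case 236 To 239: ch = "i"
            Case 241: ch = "n"
            Case 242 To 246, 248: ch = "o"
            Case 249 To 252: ch = "u"
            Case 253, 255: ch = "y"
        End Select
        result = result & ch
    Next i

    StripAccents = result
End Function
```

In a worksheet it could be chained with the encoder, for example `=DMetaphonePrimary(StripAccents($A2))`.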
4.2 Using the functions in worksheets
Assume a table of customer names in column A starting in A2.
You can create helper columns for the primary and secondary Double Metaphone codes.
- In B2, enter `=DMetaphonePrimary($A2)`.
- In C2, enter `=DMetaphoneSecondary($A2)`.

Fill these formulas down the column.
You now have phonetic keys that you can use for joins and fuzzy lookups instead of the raw names.
5. Building phonetic matching formulas in Excel
5.1 One-to-one lookup with Double Metaphone keys
Consider two lists.
- A master list of customers in an Excel Table named tblMaster with columns [FullName], [DM_Primary], and [DM_Secondary].
- A new import of names in a Table named tblImport with column [FullName].
First, compute phonetic keys for both tables using the UDFs shown above.
To perform a lookup from import to master based on the primary code, you can use XLOOKUP.
```excel
=XLOOKUP(
    DMetaphonePrimary([@FullName]),
    tblMaster[DM_Primary],
    tblMaster[FullName],
    "",
    0
)
```

This formula finds the first master name that shares the same primary Double Metaphone key as the imported name.
It is a simple and scalable way to align records that are likely to be the same person.
5.2 Matching using both primary and secondary codes
To improve coverage, consider both primary and secondary codes.
In the imported table, add two helper columns.
- [DM_Primary] with `=DMetaphonePrimary([@FullName])`.
- [DM_Secondary] with `=DMetaphoneSecondary([@FullName])`.
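You can then use FILTER (Excel 2021 or Microsoft 365) to list all potential matches from the master table. A formula inside an Excel Table cannot spill into neighboring cells, so wrap the result in TEXTJOIN to show each import row's candidates in a single cell.

```excel
=TEXTJOIN(", ", TRUE,
    FILTER(
        tblMaster[FullName],
        (tblMaster[DM_Primary]=[@DM_Primary]) +
        (tblMaster[DM_Secondary]=[@DM_Primary]) +
        (tblMaster[DM_Primary]=[@DM_Secondary]) +
        (tblMaster[DM_Secondary]=[@DM_Secondary]),
        ""
    )
)
```

This formula returns all master names whose primary or secondary codes overlap with the import row’s primary or secondary codes, or an empty string when nothing matches. Blank keys compare as equal, so exclude rows with blank names before matching.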
| Comparison type | Condition | Interpretation |
|---|---|---|
| Strong match | Primary = Primary | Highest confidence, usually same pronunciation. |
| Normal match | Primary = Secondary or Secondary = Primary | Common for different spellings of the same name. |
| Weak match | Secondary = Secondary | Possible match, should be reviewed with more context. |
You can capture the match strength in another column by assigning a score to each pattern and using an IF or IFS formula.
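As an alternative to a nested IF or IFS formula, the scoring in the table above can be sketched as a small UDF. The function name DMMatchScore and the 3/2/1/0 scale are assumptions chosen for illustration. Blank keys never count as a match.

```vba
' Hypothetical helper: scores how strongly two pairs of phonetic keys overlap.
' 3 = primary/primary, 2 = primary/secondary cross match,
' 1 = secondary/secondary, 0 = no overlap.
Public Function DMMatchScore(ByVal p1 As String, ByVal s1 As String, _
                             ByVal p2 As String, ByVal s2 As String) As Long
    If Len(p1) > 0 And p1 = p2 Then
        DMMatchScore = 3
    ElseIf (Len(p1) > 0 And p1 = s2) Or (Len(s1) > 0 And s1 = p2) Then
        DMMatchScore = 2
    ElseIf Len(s1) > 0 And s1 = s2 Then
        DMMatchScore = 1
    Else
        DMMatchScore = 0
    End If
End Function
```

In a worksheet it could be called as `=DMMatchScore(B2, C2, F2, G2)`, where B2:C2 hold one record's keys and F2:G2 hold the candidate's keys.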
5.3 Combining phonetic keys with other criteria
Phonetic matching becomes safer when combined with additional signals.
Typical combinations include:
- Phonetic key plus same postal code.
- Phonetic key plus same date of birth.
- Phonetic key plus similar email address or phone number.
For example, you can build a composite Boolean condition inside FILTER.
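```excel
=TEXTJOIN(", ", TRUE,
    FILTER(
        tblMaster[FullName],
        (
            (tblMaster[DM_Primary]=[@DM_Primary]) +
            (tblMaster[DM_Secondary]=[@DM_Primary])
        ) * (tblMaster[PostCode]=[@PostCode]),
        ""
    )
)
```

This formula only returns candidates that share a phonetic code and the same postal code, which reduces false positives. It assumes both tables have a [PostCode] column. TEXTJOIN collapses the matches into one cell because a formula inside an Excel Table cannot spill, and the final FILTER argument returns an empty string when there is no match.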
6. Using Double Metaphone in data quality workflows
Double Metaphone works best as part of a structured data quality pipeline.
6.1 Recommended workflow for deduplicating names
- Normalize text. Trim spaces, convert to a consistent case, strip accents, and standardize common prefixes or suffixes.
- Compute phonetic keys. Use DMetaphonePrimary and DMetaphoneSecondary in helper columns.
- Generate candidate pairs. Use formulas or Power Query to group records by phonetic key combinations.
- Score pairs. Add additional checks such as same city, similar email, or matching date of birth.
- Review and confirm. Present high-scoring candidate pairs to users via filtered views or dashboards to approve merges.
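The candidate-pair step of the workflow above could be sketched as a macro along the following lines. The layout is an assumption: names in column A and precomputed primary keys in column B of the active sheet, starting in row 2. The macro name ListCandidatePairs is hypothetical, and the pairwise loop is only practical for a few thousand rows.

```vba
' Hypothetical macro: lists pairs of rows that share the same primary key.
' Assumes column A = name, column B = primary key, data from row 2 downward.
Public Sub ListCandidatePairs()
    Dim ws As Worksheet, outWs As Worksheet
    Dim data As Variant
    Dim lastRow As Long, r As Long, q As Long, outRow As Long
    Dim keyR As String

    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    If lastRow < 3 Then Exit Sub              ' fewer than two records

    ' Read everything once: data(row, 1) = name, data(row, 2) = key.
    data = ws.Range("A1:B" & lastRow).Value

    Set outWs = ws.Parent.Worksheets.Add(After:=ws)
    outWs.Range("A1:C1").Value = Array("Name 1", "Name 2", "Shared key")
    outRow = 2

    For r = 2 To lastRow - 1
        If Not IsError(data(r, 2)) Then
            keyR = CStr(data(r, 2))
            If Len(keyR) > 0 Then             ' blank keys never form a pair
                For q = r + 1 To lastRow
                    If Not IsError(data(q, 2)) Then
                        If CStr(data(q, 2)) = keyR Then
                            outWs.Cells(outRow, 1).Value = data(r, 1)
                            outWs.Cells(outRow, 2).Value = data(q, 1)
                            outWs.Cells(outRow, 3).Value = keyR
                            outRow = outRow + 1
                        End If
                    End If
                Next q
            End If
        End If
    Next r
End Sub
```

The resulting sheet can then be extended with the scoring columns from the next step, such as matching city or date of birth.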
6.2 Leveraging Power Query for precomputation
For large datasets, it is often efficient to precompute Double Metaphone codes in Power Query.
A common pattern is:
- Load the raw data into Power Query.
- Apply a custom function that implements Double Metaphone in M, or merge in keys that were computed by an external implementation.
- Load the resulting table into Excel with phonetic key columns as standard values.
This separates the heavier phonetic calculations from your interactive formulas and keeps recalculation times under control.
7. Performance and governance best practices
7.1 Performance tips for Double Metaphone in Excel
- Prefer working with Excel Tables and structured references so that formulas automatically expand to new rows without manual copying.
- Limit the number of times you call the UDF on the same text. Compute the key once in a helper column and reuse that column in all lookups.
- Avoid volatile functions (such as OFFSET or INDIRECT) in cells that feed your phonetic calculations. Volatile functions recalculate on every change, and every UDF call that depends on them is recalculated as well.
- For very large lists, consider splitting the workload into batches across separate sheets or files, or push the computation into Power Query or an external system.
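One way to apply the "compute once, reuse" advice above is a macro that writes the keys as static values instead of live UDF formulas. The sketch below rests on assumptions: names sit in column A of the active sheet from row 2, columns B and C are free, and the DMetaphonePrimary and DMetaphoneSecondary functions from Section 4 are in the same VBA project. The macro name FillPhoneticKeys is hypothetical.

```vba
' Hypothetical macro: writes phonetic keys for column A into columns B and C
' as plain values, so nothing recalculates afterwards.
Public Sub FillPhoneticKeys()
    Dim ws As Worksheet
    Dim src As Variant
    Dim keys() As Variant
    Dim lastRow As Long, r As Long
    Dim nameText As String

    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    If lastRow < 2 Then Exit Sub

    ' Reading from row 1 guarantees at least two cells, so src is always a 2-D array.
    src = ws.Range("A1:A" & lastRow).Value
    ReDim keys(1 To lastRow - 1, 1 To 2)

    For r = 2 To lastRow
        If Not IsError(src(r, 1)) Then
            nameText = CStr(src(r, 1))
            keys(r - 1, 1) = DMetaphonePrimary(nameText)
            keys(r - 1, 2) = DMetaphoneSecondary(nameText)
        End If
    Next r

    ' Format as text first so that a key such as "0" is not converted to a number.
    With ws.Range("B2").Resize(lastRow - 1, 2)
        .NumberFormat = "@"
        .Value = keys
    End With
End Sub
```

Reading and writing the range in one step each keeps the macro fast; rerun it after the names change, because the keys no longer update automatically.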
7.2 Data governance considerations
- Document your phonetic matching logic, including how you treat primary and secondary codes and what thresholds you use for accepting matches.
- Store both the original text and the phonetic codes. Never overwrite the raw names with phonetic keys.
- Log manual decisions in a separate table (for example, “these two records are confirmed duplicates”) and feed them back into future rules.
- Ensure your matching process complies with privacy and data protection regulations, especially when handling personal names and contact information.
Note: A phonetic key by itself should never be used as the sole identifier for a person. It is a probabilistic feature intended to support matching, not a replacement for proper IDs or legally reliable identifiers.
FAQ
Is Double Metaphone always better than Soundex for Excel name matching?
For most modern name datasets, Double Metaphone provides more accurate and flexible phonetic encoding than traditional Soundex.
It accounts for additional language influences and produces primary and secondary codes, which increases the chance of capturing valid matches for names with multiple pronunciations.
However, Soundex is simpler and already supported natively in some databases, so if you are constrained to that environment, you may still find Soundex useful as a first pass.
Can I use Double Metaphone in Excel without VBA?
Yes, but the encoder then has to live somewhere other than a VBA module.
A practical option is to implement Double Metaphone as a custom Power Query (M) function, as described in Section 3.2, and precompute keys as part of the data load.
Another option is to compute phonetic keys in a database, Python script, or data quality tool and then import the results into Excel as regular columns.
If you want everything to live in the workbook and support cell-level formulas, a VBA UDF is usually the most straightforward approach.
How should I handle multi-word names with Double Metaphone in Excel?
For multi-word names, the safest pattern is to compute phonetic keys separately for each logical component, such as given name and surname.
For example, split “John Michael Smith” into first, middle, and last name columns, compute DMetaphonePrimary for each, and build matching rules that compare each part separately.
This reduces false matches that arise when two people share a surname but differ on given name or when middle names are inconsistently recorded.
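When the name parts are not yet split into columns, a per-word encoding can be sketched as a UDF. This is an illustration under assumptions: the name DMetaphoneWords is hypothetical, words are separated by spaces or hyphens, and DMetaphonePrimary from Section 4 is available in the same VBA project.

```vba
' Hypothetical helper: encodes each word of a name separately and joins
' the resulting keys with single spaces.
Public Function DMetaphoneWords(ByVal fullName As String) As String
    Dim parts() As String
    Dim i As Long
    Dim wordKey As String, result As String

    ' Treat hyphens like spaces, then collapse repeated blanks.
    parts = Split(Application.WorksheetFunction.Trim(Replace(fullName, "-", " ")), " ")

    For i = LBound(parts) To UBound(parts)
        wordKey = DMetaphonePrimary(parts(i))
        If Len(wordKey) > 0 Then
            If Len(result) > 0 Then result = result & " "
            result = result & wordKey
        End If
    Next i

    DMetaphoneWords = result
End Function
```

The joined key keeps word order, so it is best suited to names recorded in a consistent order. For inconsistent ordering, separate columns per name part remain the safer choice.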
Does Double Metaphone work for non-English names?
Double Metaphone was designed to handle a variety of European and international name patterns better than earlier algorithms, and it often performs reasonably well for many non-English names written in Latin script.
However, its rules are still largely centered on English-like spelling and pronunciation, so performance can vary across languages and transliteration systems.
For datasets dominated by a specific non-English language, you may need to tune the rules or combine Double Metaphone with language-specific normalization to achieve the best results.
How do I prevent false positives from phonetic matching in Excel?
Always combine phonetic keys with at least one other attribute, such as postal code, date of birth, or another identifying field.
Introduce a scoring system that considers multiple factors and only treat high-scoring pairs as probable duplicates.
For sensitive applications, include a manual review step and clearly track which pairs were confirmed or rejected, so that your rules can be refined over time.