This article explains how to design and implement robust record linkage in Excel using Power Query, including exact and fuzzy matching, data cleaning, and master data consolidation so that you can reliably deduplicate and connect records across multiple tables.
1. What is record linkage and why use Excel Power Query
Record linkage is the process of identifying records in different tables that refer to the same real-world entity, such as a customer, vendor, product, or location.
In typical Excel workbooks, this problem appears as deduplicating customer lists from different systems, mapping multiple address files to a single master, or aligning transaction-level data with a reference table when keys are incomplete or inconsistent.
Power Query in Excel is well suited for record linkage because it provides:
- Repeatable query steps that can be refreshed when new data arrives.
- Merge operations that behave like joins in databases.
- Fuzzy matching capabilities to handle typos, spacing differences, and minor text variations.
- Powerful text and column transformation functions to normalize messy data.
Instead of relying on complex nested formulas or manual checking, you can use Power Query to build a structured record linkage pipeline that is refreshable and auditable.
2. Planning a record linkage strategy before opening Excel
Before you start building queries, you should define how the matching logic will work from a business perspective.
2.1 Define your linkage keys and business rules
Start by listing which fields you will consider for matching.
- Strong identifiers: Tax ID, national ID, system customer ID, email address, phone number.
- Descriptive attributes: Name, company name, address, city, postal code, date of birth.
Typical record linkage strategies in Excel Power Query use a combination of these keys:
- Exact match on strong identifier (for example, tax ID or email) whenever available.
- Exact or fuzzy match on name plus a blocking field such as city or postal code.
- Fallback logic that uses additional information such as phone or date of birth when primary matches are not conclusive.
2.2 Deterministic vs fuzzy linkage in Power Query
In advanced data quality tools, you may see probabilistic or scoring-based matching. Power Query does not implement full-blown probabilistic record linkage, but you can still approximate similar logic by combining:
- Deterministic exact joins for high-confidence matches.
- Fuzzy merges for near matches, constrained by additional blocking fields.
- Post-processing rules to classify matches into tiers such as “exact”, “high-confidence fuzzy”, and “requires manual review”.
3. Preparing your data for record linkage in Power Query
Data cleaning has more impact on record linkage quality than any single fuzzy matching parameter. Normalization should aim to make values comparable and reduce meaningless differences.
3.1 Common normalization steps
Typical text normalization actions in Power Query include:
- Trimming leading and trailing spaces.
- Replacing multiple spaces with a single space.
- Converting to upper case or lower case.
- Removing punctuation such as commas, periods, or hyphens where they are not significant.
- Standardizing abbreviations (for example, “Co.” to “Company”, “Ltd” to “Limited”).
| Goal | Power Query step | Typical M expression |
|---|---|---|
| Trim and uppercase company names | Add Custom Column | `Text.Upper(Text.Trim([CompanyName]))` |
| Collapse internal runs of spaces | Add Custom Column | `Text.Combine(List.Select(Text.Split([CompanyName], " "), each _ <> ""), " ")` |
| Strip punctuation from names | Add Custom Column | `Text.Remove([Name], {".", ",", ";", ":"})` |
| Create combined key (Name + City) | Merge Columns | `Table.CombineColumns(#"Previous Step", {"NameNorm","CityNorm"}, Combiner.CombineTextByDelimiter(" \| ", QuoteStyle.None), "Key_Name_City")` |
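To make every query normalize text the same way, the steps in the table can be wrapped in one custom function. This is a minimal sketch that uses a hypothetical function name `fnNormalizeText` and made-up sample rows. Splitting on spaces and dropping the empty pieces trims the value and collapses runs of spaces in a single pass.

```powerquery
let
    // Hypothetical reusable normalizer: uppercase, strip selected punctuation,
    // then split on spaces and drop empty pieces (this trims the value and
    // collapses any run of spaces into one)
    fnNormalizeText = (value as nullable text) as nullable text =>
        if value = null then null
        else
            let
                Upper = Text.Upper(value),
                NoPunct = Text.Remove(Upper, {".", ",", ";", ":"}),
                Words = List.Select(Text.Split(NoPunct, " "), each _ <> "")
            in
                Text.Combine(Words, " "),

    // Hypothetical sample data showing the function applied to a column
    Sample = #table(
        type table [CompanyName = text],
        {{"  Acme,  Co. "}, {"Beta   Ltd"}}
    ),
    Normalized = Table.AddColumn(Sample, "NameNorm", each fnNormalizeText([CompanyName]), type text)
in
    Normalized
```

Placing the function in its own query and calling it from both source queries ensures that table A and table B can never drift apart in how they are cleaned.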
3.2 Blocking keys to reduce false positives
Blocking means restricting fuzzy matching to subsets of data that share a relatively reliable attribute. For example:
- Only compare records with the same postal code.
- Only match person names where year of birth is the same.
- Only match company names where country is identical.
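In Power Query, you can implement blocking by including these fields in the merge key alongside the field you fuzzy-match on. For example, you configure a fuzzy merge on a combined key such as “NormalizedName | PostalCode”. This blocking is soft: similarity is computed on the whole combined string, so a very similar name can still pull in a record with a different postal code. To enforce a block strictly, filter the candidates afterwards so that only pairs with identical blocking values remain.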
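One way to make a block strict is to fuzzy-match on the name alone and then discard any candidate whose postal code differs. The sketch below assumes small hypothetical tables with `NameNorm` and `PostalCode` columns. Because every name pair is still compared, this approach reduces false positives but does not reduce run time.

```powerquery
let
    // Hypothetical sample tables; in a workbook these would be the normalized queries
    A = #table(type table [CustomerID = text, NameNorm = text, PostalCode = nullable text],
        {{"A1", "ACME COMPANY", "10115"}, {"A2", "ACME COMPANY", "80331"}}),
    B = #table(type table [CustomerID = text, NameNorm = text, PostalCode = nullable text],
        {{"B1", "ACME COMPNAY", "10115"}}),
    // Fuzzy-match on the name only. NumberOfMatches is left unset so that a
    // cross-postcode best match cannot crowd out a same-postcode candidate.
    Fuzzy = Table.FuzzyNestedJoin(A, {"NameNorm"}, B, {"NameNorm"}, "Candidates",
        JoinKind.LeftOuter, [IgnoreCase = true, Threshold = 0.8]),
    // Strict block: b is a candidate row from B, [PostalCode] is the current A row's value.
    // Blank postal codes are rejected because null = null is true in M.
    Blocked = Table.AddColumn(Fuzzy, "Match",
        each Table.SelectRows([Candidates], (b) => [PostalCode] <> null and b[PostalCode] = [PostalCode]),
        type table),
    Result = Table.RemoveColumns(Blocked, {"Candidates"})
in
    Result
```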
4. Loading tables into Power Query
For Excel source tables, the standard pattern is:
- Convert source ranges into Tables (Ctrl+T) and give them meaningful names, such as `Customers_A` and `Customers_B`.
- Select a cell inside the first table, then go to Data > From Table/Range to load it into Power Query.
- Repeat for the second table.
- Apply the normalization steps described earlier, creating columns such as `NameNorm`, `CityNorm`, and combined keys.
A simple M script fragment for normalization might look like this:
```powerquery
let
    Source = Excel.CurrentWorkbook(){[Name="Customers_A"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source, {{"CustomerName", type text}, {"City", type text}}),
    // Normalized copies of the fields used for matching
    #"Added NameNorm" = Table.AddColumn(#"Changed Type", "NameNorm", each Text.Upper(Text.Trim([CustomerName])), type text),
    #"Added CityNorm" = Table.AddColumn(#"Added NameNorm", "CityNorm", each Text.Upper(Text.Trim([City])), type text),
    // Composite key; the & operator returns null if either part is null
    #"Added Key_Name_City" = Table.AddColumn(#"Added CityNorm", "Key_Name_City", each [NameNorm] & " | " & [CityNorm], type text)
in
    #"Added Key_Name_City"
```

5. Exact match record linkage with Merge Queries
Exact merge is the starting point and should be used whenever strong identifiers exist.
5.1 Performing an exact merge in Power Query
- In the Power Query editor, select the main query (for example, `Customers_A`).
- Go to Home > Merge Queries > Merge Queries as New.
- In the Merge dialog, choose the second query (for example, `Customers_B`).
- Click the key column in each table, such as `Email` or `CustomerID`.
- Select the desired join kind:
  - Inner for only matched pairs (intersection).
  - Left Outer when using table A as master and pulling matches from B.
  - Full Outer to inspect all matches and non-matches.
- Confirm to create the merged query and expand the nested table.
The equivalent M code for a left outer merge on email could be:
```powerquery
let
    CustomersA = #"Customers_A_Normalized",
    CustomersB = #"Customers_B_Normalized",
    #"Merged Queries" = Table.NestedJoin(
        CustomersA, {"EmailNorm"},
        CustomersB, {"EmailNorm"},
        "Match", JoinKind.LeftOuter
    ),
    #"Expanded Match" = Table.ExpandTableColumn(
        #"Merged Queries", "Match",
        {"CustomerID", "Phone", "Address"},
        {"B_CustomerID", "B_Phone", "B_Address"}
    )
in
    #"Expanded Match"
```

5.2 Handling multiple matches and survivorship
If the merge key is not unique, a single record from table A may have multiple matches in table B. Common strategies include:
- Keeping only the first match based on a priority rule (for example, most recent update date).
- Aggregating matched records (for example, concatenating multiple phone numbers).
- Flagging multi-matches for manual review.
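The aggregation and flagging strategies can both be read from the nested table that Table.NestedJoin creates, before anything is expanded. This is a sketch on hypothetical sample data. The column names `EmailNorm` and `Phone` follow this article's examples, and `MatchCount`, `B_AllPhones`, and `NeedsReview` are illustrative names.

```powerquery
let
    // Hypothetical sample data: A1 has two B records with the same email
    A = #table(type table [CustomerID = text, EmailNorm = text],
        {{"A1", "ANA@EXAMPLE.COM"}, {"A2", "BO@EXAMPLE.COM"}}),
    B = #table(type table [CustomerID = text, EmailNorm = text, Phone = nullable text],
        {{"B1", "ANA@EXAMPLE.COM", "555-0100"}, {"B2", "ANA@EXAMPLE.COM", "555-0101"}}),
    Merged = Table.NestedJoin(A, {"EmailNorm"}, B, {"EmailNorm"}, "Match", JoinKind.LeftOuter),
    // Number of B candidates per A record (0 when nothing matched)
    Counted = Table.AddColumn(Merged, "MatchCount", each Table.RowCount([Match]), Int64.Type),
    // Aggregate: all distinct, non-null B phone numbers in one text value
    WithPhones = Table.AddColumn(Counted, "B_AllPhones",
        each Text.Combine(List.Distinct(List.RemoveNulls([Match][Phone])), "; "), type text),
    // Flag: more than one candidate needs a human decision
    Flagged = Table.AddColumn(WithPhones, "NeedsReview", each [MatchCount] > 1, type logical),
    Result = Table.RemoveColumns(Flagged, {"Match"})
in
    Result
```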
To choose a single survivor per key, you can group and keep the top row by a sort order:
```powerquery
let
    // Assumes #"Expanded Match" includes a LastUpdated column to rank candidates
    #"Grouped Rows" = Table.Group(
        #"Expanded Match",
        {"CustomerID"},
        // Sorting inside each group keeps the result independent of whether
        // the grouping step preserves an earlier sort order
        {{"AllRows", each Table.FirstN(Table.Sort(_, {{"LastUpdated", Order.Descending}}), 1), type table}}
    ),
    // CustomerID is already the grouping column, so it is not expanded again
    #"Expanded First" = Table.ExpandTableColumn(
        #"Grouped Rows", "AllRows",
        {"NameNorm", "B_Address", "LastUpdated"},
        {"NameNorm", "B_Address", "LastUpdated"}
    )
in
    #"Expanded First"
```

6. Fuzzy record linkage with Power Query fuzzy merge
Fuzzy matching in Power Query allows you to merge tables when the join key values are similar but not identical. This is critical for record linkage scenarios with typos, different casing, or minor naming variations.
6.1 Configuring fuzzy merge in the user interface
In the Merge dialog, you can enable fuzzy matching as follows:
- After selecting both tables and the join columns, check Use fuzzy matching to perform the merge.
- Expand Fuzzy matching options to configure:
  - Similarity threshold (0 to 1). Higher means stricter matching.
  - Ignore case to treat upper and lower case as identical.
  - Match by combining text parts to tolerate words that are split or joined differently (for example, “Micro soft” and “Microsoft”).
  - Maximum number of matches per row to limit duplicates.
  - Transformation table for mapping synonyms and abbreviations.
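In the M code that the Merge dialog generates, these settings appear as fields of the options record passed to Table.FuzzyNestedJoin. The mapping below is a sketch. The values are examples, not recommendations.

```powerquery
let
    // UI setting -> joinOptions field of Table.FuzzyNestedJoin
    FuzzyOptions = [
        Threshold = 0.8,        // Similarity threshold (0.00 to 1.00)
        IgnoreCase = true,      // Ignore case
        IgnoreSpace = true,     // Match by combining text parts
        NumberOfMatches = 1     // Maximum number of matches
        // TransformationTable = a table with From and To columns
    ]
in
    FuzzyOptions
```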
6.2 Fuzzy merge using combined keys
To reduce false positives, it is common to perform fuzzy matching on a composite key, for example “normalized name + city”. Suppose both tables contain a Key_Name_City column built from normalized values.
An M script using Table.FuzzyNestedJoin, which returns matches as a nested table column in the same way as Table.NestedJoin, might look like this:

```powerquery
let
    CustomersA = #"Customers_A_Normalized",
    CustomersB = #"Customers_B_Normalized",
    #"Fuzzy Merged" = Table.FuzzyNestedJoin(
        CustomersA, {"Key_Name_City"},
        CustomersB, {"Key_Name_City"},
        "Match", JoinKind.LeftOuter,
        [IgnoreCase = true, Threshold = 0.88, NumberOfMatches = 1]
    ),
    #"Expanded Match" = Table.ExpandTableColumn(
        #"Fuzzy Merged", "Match",
        {"CustomerID", "CustomerName", "City", "Phone"},
        {"B_CustomerID", "B_CustomerName", "B_City", "B_Phone"}
    )
in
    #"Expanded Match"
```

Key points:
- A threshold of 0.88, stricter than the default of 0.80, limits matches to highly similar values.
- `NumberOfMatches = 1` avoids multiple fuzzy matches per record, which simplifies survivorship rules but may ignore legitimate duplicates. Adjust it to your use case.
- Composite keys embed blocking logic directly into the join column, as a soft constraint.
6.3 Using a transformation table for synonyms and abbreviations
A transformation table lets you map multiple variant spellings to a common standard form before fuzzy matching. Typical examples include:
- “CO”, “CO.”, “COMPANY” → “COMPANY”.
- “ST”, “STREET” → “STREET”.
- Local language abbreviations for “Corporation”, “Limited”, and so on.
Design your transformation table with at least two columns, named From and To:
| From | To |
|---|---|
| CO | COMPANY |
| CO. | COMPANY |
| INC | INCORPORATED |
| LTD | LIMITED |
In the Merge dialog, choose this table as the transformation table for the fuzzy merge. Power Query will apply the mappings before computing similarity, effectively standardizing names.
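The same mapping can be supplied in M through the TransformationTable option. This is a minimal sketch with hypothetical sample tables. Whether a mapping also applies to a word inside a longer value such as “ACME LTD” should be confirmed on your own data before you rely on it.

```powerquery
let
    // Mapping table; the columns must be named From and To
    Synonyms = #table(type table [From = text, To = text],
        {{"CO", "COMPANY"}, {"CO.", "COMPANY"}, {"INC", "INCORPORATED"}, {"LTD", "LIMITED"}}),
    // Hypothetical sample tables
    A = #table(type table [CustomerID = text, NameNorm = text], {{"A1", "ACME LTD"}}),
    B = #table(type table [CustomerID = text, NameNorm = text], {{"B1", "ACME LIMITED"}}),
    Merged = Table.FuzzyNestedJoin(A, {"NameNorm"}, B, {"NameNorm"}, "Match",
        JoinKind.LeftOuter,
        [IgnoreCase = true, Threshold = 0.85, TransformationTable = Synonyms])
in
    Merged
```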
7. Building a consolidated master table
After performing exact and fuzzy merges, the goal is often to build a single “golden” master table that aggregates attributes from multiple sources.
7.1 Layered matching approach
A robust pattern in Power Query is to build separate queries for each matching tier and then append results:
- Tier 1 – Exact ID match: Merge on strong identifiers such as tax ID or internal customer ID.
- Tier 2 – Exact email or phone: For records not matched in tier 1, merge on normalized email or phone.
- Tier 3 – Fuzzy name + city: For remaining records, apply a fuzzy merge on a combined key.
- Tier 4 – Unmatched: Retain records that still have no counterpart for manual investigation.
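As an alternative to separate tier queries, the tiers can be chained inside one query so that each tier only sees the records left over from the previous one. The sketch below assumes hypothetical sample tables, a unique `CustomerID` on the A side, and hypothetical helper names `ExactTier` and `Remaining`. It removes blank keys before each exact join so that empty values cannot pair with each other.

```powerquery
let
    // Hypothetical sample tables
    A = #table(type table [CustomerID = text, TaxID = nullable text, EmailNorm = nullable text],
        {{"A1", "DE123", "ANA@EXAMPLE.COM"}, {"A2", null, "BO@EXAMPLE.COM"}, {"A3", null, null}}),
    B = #table(type table [B_ID = text, TaxID = nullable text, EmailNorm = nullable text],
        {{"B1", "DE123", null}, {"B2", null, "BO@EXAMPLE.COM"}}),

    // One exact tier: drop blank keys, inner-join to B, label the tier
    ExactTier = (input as table, keyCol as text, tierName as text) as table =>
        let
            WithKey = Table.SelectRows(input,
                each Record.Field(_, keyCol) <> null and Record.Field(_, keyCol) <> ""),
            Joined = Table.NestedJoin(WithKey, {keyCol}, B, {keyCol}, "Match", JoinKind.Inner)
        in
            Table.AddColumn(Joined, "MatchTier", each tierName, type text),

    // Rows of input whose CustomerID was not matched in the given tier
    Remaining = (input as table, matched as table) as table =>
        Table.RemoveColumns(
            Table.NestedJoin(input, {"CustomerID"}, matched, {"CustomerID"}, "Seen", JoinKind.LeftAnti),
            {"Seen"}),

    Tier1 = ExactTier(A, "TaxID", "ID"),
    Rest1 = Remaining(A, Tier1),
    Tier2 = ExactTier(Rest1, "EmailNorm", "Email"),
    Rest2 = Remaining(Rest1, Tier2),
    // A fuzzy tier 3 would take Rest2 as its input in the same way
    Unmatched = Table.AddColumn(Rest2, "MatchTier", each "Unmatched", type text),
    Result = Table.Combine({Tier1, Tier2, Unmatched})
in
    Result
```

Anti-joining on the A-side record ID means that a record is removed from later tiers as soon as any tier claims it.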
Each tier can add a column such as MatchTier containing values like “ID”, “Email”, “FuzzyNameCity”, or “Unmatched”. Finally, append the tier queries:
```powerquery
let
    Tier1 = #"Matches_ID",
    Tier2 = #"Matches_Email",
    Tier3 = #"Matches_FuzzyNameCity",
    Unmatched = #"Unmatched_Records",
    #"All_Combined" = Table.Combine({Tier1, Tier2, Tier3, Unmatched})
in
    #"All_Combined"
```

7.2 Survivorship and attribute precedence
When attributes exist in both tables, you need rules for which source to trust. Examples:
- Prefer the CRM system for contact details, but the billing system for tax information.
- Choose the most recent non-null value based on an `UpdatedAt` timestamp.
- Fall back to secondary fields when primary fields are blank.
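The “most recent non-null value” rule can be expressed with Table.Group and Table.Max. This is a sketch that assumes a hypothetical stacked table in which each source record already carries its `MasterID`, a `Phone`, and an `UpdatedAt` timestamp.

```powerquery
let
    // Hypothetical stacked source rows already linked to a MasterID
    Stacked = #table(
        type table [MasterID = number, Source = text, Phone = nullable text, UpdatedAt = datetime],
        {
            {1, "CRM",     "555-0100", #datetime(2024, 1, 10, 0, 0, 0)},
            {1, "Billing", null,       #datetime(2024, 3, 5, 0, 0, 0)},
            {2, "CRM",     "555-0199", #datetime(2023, 7, 1, 0, 0, 0)}
        }),
    // Per MasterID: among rows with a usable phone, take the most recently updated one
    Survivors = Table.Group(Stacked, {"MasterID"}, {
        {"SurvivingPhone", each
            let
                WithPhone = Table.SelectRows(_, each [Phone] <> null and [Phone] <> "")
            in
                if Table.IsEmpty(WithPhone) then null
                else Table.Max(WithPhone, "UpdatedAt")[Phone],
         type nullable text}
    })
in
    Survivors
```

For MasterID 1, the newer billing row has no phone, so the older CRM phone survives.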
You can implement survivorship rules with custom columns. For example, to prefer the phone number from table B and fall back to table A:
```powerquery
let
    #"Added SurvivingPhone" = Table.AddColumn(
        #"All_Combined", "SurvivingPhone",
        each if [B_Phone] <> null and [B_Phone] <> "" then [B_Phone] else [A_Phone],
        type text
    )
in
    #"Added SurvivingPhone"
```

8. Incremental record linkage with an Excel master file
In operational scenarios, you may receive new data periodically and need to link it to an existing master list stored in Excel.
8.1 Pattern for incremental linkage
- Load the existing master table into Power Query.
- Load the new batch of records as a separate query.
- Normalize both to the same standard.
- Perform exact and fuzzy merges between the new batch and the master.
- Assign either an existing master ID (when matched) or generate a new master ID.
- Append newly created master records to the master table and output the updated master back to Excel.
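Step 5 can be sketched as a left outer merge against the master: a match reuses the existing ID, and a null marks a record that needs a new one. The tables, the `RecordID` and `ExistingMasterID` columns, and matching on `EmailNorm` alone are assumptions for illustration. In practice, the exact and fuzzy tiers described earlier would feed this step.

```powerquery
let
    // Hypothetical existing master and new batch, normalized the same way
    Master = #table(type table [MasterID = number, EmailNorm = text],
        {{1, "ANA@EXAMPLE.COM"}, {2, "BO@EXAMPLE.COM"}}),
    NewBatch = #table(type table [RecordID = text, EmailNorm = text],
        {{"N1", "BO@EXAMPLE.COM"}, {"N2", "CY@EXAMPLE.COM"}}),
    Linked = Table.NestedJoin(NewBatch, {"EmailNorm"}, Master, {"EmailNorm"}, "M", JoinKind.LeftOuter),
    // Reuse the first matching MasterID; null means a new ID must be generated
    WithID = Table.AddColumn(Linked, "ExistingMasterID",
        each if Table.IsEmpty([M]) then null else [M]{0}[MasterID], type nullable number),
    Result = Table.RemoveColumns(WithID, {"M"})
in
    Result
```

Filtering `Result` to rows where `ExistingMasterID` is null gives the records that go on to ID generation.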
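An ID generation pattern in Power Query can start an index column just above the current maximum master ID:

```powerquery
let
    // Highest existing ID; the second argument makes List.Max return 0 for an empty master
    #"MaxID" = List.Max(#"Master"[MasterID], 0),
    // New records that did not match the master; [[Columns...]] stands for the
    // columns you keep, which must not already include a MasterID column
    #"NewRecordsOnly" = #"Linked_New"[[Columns...]],
    // Number the new records MaxID + 1, MaxID + 2, ...
    #"Added Index" = Table.AddIndexColumn(#"NewRecordsOnly", "MasterID", #"MaxID" + 1, 1, Int64.Type),
    #"UpdatedMaster" = Table.Combine({#"Master", #"Added Index"})
in
    #"UpdatedMaster"
```

9. Performance and governance best practices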
Record linkage queries can be heavy, particularly when fuzzy matching large tables. Some practical guidelines include:
- Reduce column set before merging. Keep only necessary columns for the match and survivorship logic.
- Use blocking aggressively (such as city or postal code) to limit the search space for fuzzy matching.
- Disable loading for intermediate queries that only feed other queries by loading them as connections only (Close & Load To > Only Create Connection).
- Sample data first by keeping only a few thousand rows until you are confident in the logic.
- Log match statistics by adding queries that count the number of records per `MatchTier` or per similarity band.
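Sampling and match statistics take only a few steps. This is a sketch that assumes a hypothetical combined table with a `MatchTier` column. The 5,000-row cut-off is arbitrary.

```powerquery
let
    // Hypothetical combined output with a MatchTier column
    Combined = #table(type table [CustomerID = text, MatchTier = text],
        {{"A1", "ID"}, {"A2", "Email"}, {"A3", "Email"}, {"A4", "Unmatched"}}),
    // While developing, work on a sample only
    Sample = Table.FirstN(Combined, 5000),
    Total = Table.RowCount(Sample),
    // Records per tier, plus each tier's share of the total
    TierCounts = Table.Group(Sample, {"MatchTier"},
        {{"Records", each Table.RowCount(_), Int64.Type}}),
    WithShare = Table.AddColumn(TierCounts, "Share", each [Records] / Total, type number)
in
    WithShare
```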
FAQ
How is record linkage in Excel Power Query different from using VLOOKUP or XLOOKUP?
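VLOOKUP and XLOOKUP perform row-by-row lookups, usually on a single key column, with exact or approximate matching. Approximate matching in these functions returns the next smaller value (or, in XLOOKUP, optionally the next larger one); it does not tolerate typos the way fuzzy matching does. They are formula-driven and can be difficult to audit when many nested conditions are involved.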
Power Query, on the other hand, uses merge operations that behave like database joins. It allows you to define repeatable steps for data cleaning, exact joins, fuzzy merges, and survivorship in a single query. This design is easier to refresh, document, and troubleshoot for complex record linkage scenarios.
When should I use fuzzy matching instead of exact matching in Power Query?
Use exact matching first whenever reliable identifiers are present, such as IDs or well-maintained emails. Fuzzy matching should be reserved for fields prone to inconsistencies or typos, such as free-form names and addresses.
A common pattern is to process records through multiple tiers: exact ID matches, exact email or phone matches, and finally fuzzy matches on normalized names with blocking fields like city or postal code. This minimizes the risk of incorrect combinations while still capturing legitimate near matches.
How can I evaluate the quality of my record linkage?
Start by defining metrics such as the percentage of records matched, the distribution of similarity scores, and the number of records in each match tier. Then, manually inspect random samples from each group, especially borderline fuzzy matches close to your similarity threshold.
You can also export subsets of matched and unmatched data to separate worksheets and ask subject matter experts to review them. Their feedback can guide adjustments to your thresholds, transformation tables, and survivorship rules.
How do I avoid incorrect fuzzy matches on common names?
Always combine fuzzy-matched fields with at least one additional blocking field. For person names, that might be date of birth or postal code. For companies, use city, country, or tax ID if available.
In Power Query, this usually means building a composite key such as “NameNorm | PostalCode” and applying fuzzy matching to that column. You can also increase the similarity threshold and limit the maximum number of matches per record.
Can Power Query emulate probabilistic record linkage methods?
Power Query does not implement full probabilistic algorithms out of the box. However, you can approximate some aspects by combining multiple match indicators, such as exact email match flags, fuzzy similarity results, and agreement on blocking fields.
For example, you can compute a composite score column using custom formulas and then classify matches into categories based on that score. While less sophisticated than dedicated record linkage software, this approach is often sufficient for small to medium-sized Excel-based projects.
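A composite score of this kind might look like the sketch below. The candidate-pair table, the `NameSimilarity` column (assumed to hold a 0 to 1 score), the `Agree` helper, the weights, and the class cut-offs are all hypothetical. They would need calibrating against manually reviewed pairs.

```powerquery
let
    // Hypothetical candidate pairs with A-side and B-side attributes
    Pairs = #table(
        type table [PairID = text, A_Email = nullable text, B_Email = nullable text,
                    A_PostalCode = nullable text, B_PostalCode = nullable text,
                    NameSimilarity = number],
        {
            {"P1", "ANA@EXAMPLE.COM", "ANA@EXAMPLE.COM", "10115", "10115", 0.95},
            {"P2", null, null, "10115", "10115", 0.90},
            {"P3", null, null, "10115", "80331", 0.70}
        }),
    // Agreement only counts when both sides are present (null = null is true in M)
    Agree = (a, b) as logical => a <> null and b <> null and a = b,
    // Illustrative weights, not calibrated values
    Scored = Table.AddColumn(Pairs, "MatchScore", each
        (if Agree([A_Email], [B_Email]) then 50 else 0)
        + (if Agree([A_PostalCode], [B_PostalCode]) then 20 else 0)
        + 30 * [NameSimilarity],
        type number),
    Classified = Table.AddColumn(Scored, "MatchClass", each
        if [MatchScore] >= 70 then "Accept"
        else if [MatchScore] >= 40 then "Review"
        else "Reject",
        type text)
in
    Classified
```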