This article explains how to design, create, and maintain surrogate keys in Power Query so that your Excel and Power BI data models remain stable, reliable, and easy to extend even as source systems and business rules change.
1. Understanding surrogate keys in Power Query
When building data models in Power Query for Excel or Power BI, a surrogate key is a technical key that uniquely identifies each row in a table, independent of any business logic or real-world meaning.
1.1 Business keys vs. surrogate keys
A business key (or natural key) is a value that comes from the business domain, such as CustomerID from a CRM, ProductCode from ERP, or EmailAddress from a web application.
A surrogate key is an artificial key that you generate in Power Query, typically as an integer or a hash. It:
- Has no business meaning.
- Is stable within your model, even if source systems change.
- Is optimized for joins and relationships.
- Is often used as a primary key in dimension tables and as a foreign key in fact tables.
1.2 Why surrogate keys matter in Power Query data models
Using surrogate keys in Power Query is strongly recommended when you build star schemas or complex data models:
- They simplify joins between fact and dimension tables.
- They support slowly changing dimensions, where business attributes may change over time.
- They allow you to join multiple heterogeneous sources that do not share the same business keys.
- They improve performance compared with composite text keys in large models.
2. Typical scenarios where you need surrogate keys in Power Query
In Power Query, surrogate keys are useful in several recurring modeling patterns:
- Dimension tables such as Customers, Products, Calendar, or Locations.
- Fact tables that require stable foreign keys to dimensions.
- Record linkage across multiple sources with slightly different identifiers.
- Historical tracking when one business entity has multiple versions over time.
| Scenario | Problem with business keys | Benefit of surrogate keys in Power Query |
|---|---|---|
| Customer dimension | Customer codes may be reassigned or merged. | Stable integer key per customer record, independent of code. |
| Product master data | Product IDs change when migrated to new ERP. | Surrogate key allows joining facts across old and new systems. |
| Slowly changing dimensions | Address, status, or segment changes over time. | Each historical version gets its own surrogate key. |
| Record linkage between systems | Different ID formats for the same entity. | Surrogate key unifies matched records across sources. |
3. Core methods to create surrogate keys in Power Query
Power Query offers several practical patterns for generating surrogate keys. The three most common methods are:
- Index-based surrogate keys.
- Hash-based surrogate keys.
- Composite business keys as a fallback.
3.1 Creating index-based surrogate keys
The most straightforward pattern is to create a numeric index column in Power Query.
- Load your table into Power Query.
- Sort rows by a stable order (for example, by original ID or business key).
- Use Add Column → Index Column to generate a new integer key.
Example M code for a simple index-based surrogate key:
```powerquery
let
    Source = Excel.CurrentWorkbook(){[Name="tblCustomers"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source, {
        {"CustomerCode", type text},
        {"CustomerName", type text},
        {"Country", type text}
    }),
    #"Sorted Rows" = Table.Sort(#"Changed Type", {{"CustomerCode", Order.Ascending}}),
    #"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "CustomerKey", 1, 1, Int64.Type)
in
    #"Added Index"
```

In this example:
- `CustomerKey` is the surrogate key that you will use as the primary key in the dimension.
- `CustomerCode` remains as a business attribute, not as a join key.
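Power Query M only runs inside Excel or Power BI, so the following sketch mirrors the sort-then-index logic of the M example in Python. It is an illustration, not Power Query code: `add_index_key` is a hypothetical helper and the customer rows are made-up sample data.

```python
def add_index_key(rows, sort_field, key_field, start=1):
    """Sort rows by sort_field, then number them start, start+1, ...

    Mirrors Table.Sort followed by Table.AddIndexColumn(..., start, 1).
    """
    ordered = sorted(rows, key=lambda row: row[sort_field])
    return [{**row, key_field: start + i} for i, row in enumerate(ordered)]


# Hypothetical sample rows, deliberately out of order
customers = [
    {"CustomerCode": "C003", "CustomerName": "Gamma", "Country": "DE"},
    {"CustomerCode": "C001", "CustomerName": "Alpha", "Country": "FR"},
    {"CustomerCode": "C002", "CustomerName": "Beta", "Country": "IT"},
]

dim_customer = add_index_key(customers, "CustomerCode", "CustomerKey")
for row in dim_customer:
    print(row["CustomerKey"], row["CustomerCode"], row["CustomerName"])
# 1 C001 Alpha
# 2 C002 Beta
# 3 C003 Gamma
```

The key follows `CustomerCode` order, not the order in which the rows arrived.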
3.2 Creating hash-based surrogate keys
Hash-based keys are useful when you need deterministic IDs across multiple queries or files, especially in record linkage scenarios.
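You can create such a key by concatenating the relevant columns with a delimiter and deriving a single value from the result. Note that `Text.ToBinary`, `Binary.FromText`, and `Binary.ToText` only convert or encode data; they do not hash it. The example below Base64-encodes the concatenated text, which is deterministic and reversible but not a true hash. A genuine hash requires custom M logic or must be computed upstream, for example in the source database.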
```powerquery
let
    Source = Excel.CurrentWorkbook(){[Name="tblProducts"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source, {
        {"ProductCode", type text},
        {"Category", type text},
        {"Brand", type text}
    }),
    // Concatenate the identifying attributes with a delimiter
    #"Added Key Base" = Table.AddColumn(#"Changed Type", "KeyBase",
        each Text.Combine({[ProductCode], [Category], [Brand]}, "|"), type text),
    // Base64-encode the text bytes: deterministic and reversible, not a true hash
    #"Added Encoded Key" = Table.AddColumn(#"Added Key Base", "ProductKey",
        each Binary.ToText(Text.ToBinary([KeyBase]), BinaryEncoding.Base64), type text),
    #"Removed Columns" = Table.RemoveColumns(#"Added Encoded Key", {"KeyBase"})
in
    #"Removed Columns"
```

Here:
- `ProductKey` is a text surrogate key derived from the relevant attributes.
- It is deterministic: the same combination of inputs will always generate the same key.
- It works well when merging tables coming from different sources with the same logical attributes.
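To make the difference between encoding and hashing concrete, here is a small Python sketch (not M) that builds the same `|`-delimited key base and derives two keys from it. One is a Base64 encoding, which is what the M example above computes. The other is a SHA-256 hash, used here only as one example of a true hash. The helper names and the sample product row are hypothetical, and UTF-8 is assumed as the text encoding.

```python
import base64
import hashlib


def key_base(row, fields, sep="|"):
    # Concatenate the identifying attributes, like Text.Combine({...}, "|")
    return sep.join(row[f] for f in fields)


def base64_key(row, fields):
    # Encoding only: deterministic, but reversible and as long as the input
    return base64.b64encode(key_base(row, fields).encode("utf-8")).decode("ascii")


def sha256_key(row, fields):
    # A true hash: fixed length (64 hex characters) and not reversible
    return hashlib.sha256(key_base(row, fields).encode("utf-8")).hexdigest()


fields = ["ProductCode", "Category", "Brand"]
product = {"ProductCode": "P-100", "Category": "Bikes", "Brand": "Contoso"}  # hypothetical row

encoded = base64_key(product, fields)
hashed = sha256_key(product, fields)
print(encoded)
print(hashed)
print(base64.b64decode(encoded).decode("utf-8"))  # P-100|Bikes|Contoso
```

Both keys are deterministic, but only the hash has a fixed length regardless of how long the input attributes are.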
3.3 Composite keys as a controlled fallback
Sometimes you cannot introduce a clean surrogate key because the model must remain compatible with an existing design. In such cases, you may create a composite key column in Power Query, while still planning a long-term move to a true surrogate key.
```powerquery
let
    Source = Excel.CurrentWorkbook(){[Name="tblSales"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source, {
        {"StoreID", Int64.Type},
        {"DateKey", type date},
        {"ProductID", Int64.Type}
    }),
    #"Added Composite Key" = Table.AddColumn(#"Changed Type", "SalesBusinessKey",
        each Text.Combine({
            Text.From([StoreID]),
            Date.ToText([DateKey], "yyyyMMdd"),
            Text.From([ProductID])
        }, "-"), type text)
in
    #"Added Composite Key"
```

This composite key is still based on business fields, but by centralizing the logic in one column, you make it easier to map it to a surrogate integer key later if needed.
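The following Python sketch reproduces the same composite-key layout (`StoreID-yyyyMMdd-ProductID`) and adds one possible later migration step that maps each distinct composite key to an integer surrogate key. Both functions are hypothetical helpers for illustration. Sorting the distinct keys before numbering them is an assumption made so the mapping is deterministic.

```python
from datetime import date


def sales_business_key(store_id, sale_date, product_id):
    # Same layout as the M example: StoreID-yyyyMMdd-ProductID
    return "-".join([str(store_id), sale_date.strftime("%Y%m%d"), str(product_id)])


def map_to_integer_keys(business_keys):
    # Possible later migration: one integer key per distinct composite key.
    # Sorting first makes the assignment deterministic (lexical text order).
    distinct = sorted(set(business_keys))
    return {bk: i for i, bk in enumerate(distinct, start=1)}


# Hypothetical sales lines: (StoreID, DateKey, ProductID)
sales = [
    (3, date(2024, 3, 1), 7),
    (12, date(2024, 3, 5), 7),
    (3, date(2024, 3, 1), 7),  # same store, date, and product as the first line
]
keys = [sales_business_key(*line) for line in sales]
print(keys)                       # ['3-20240301-7', '12-20240305-7', '3-20240301-7']
print(map_to_integer_keys(keys))  # {'12-20240305-7': 1, '3-20240301-7': 2}
```

Because the composite key is text, `"12-..."` sorts before `"3-..."`. If numeric order matters, sort on the underlying columns instead of the concatenated string.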
4. Using surrogate keys in relationships and merges
Surrogate keys only become useful when you actually use them as the join fields between tables, both in Power Query and in the data model.
4.1 Building dimension and fact tables with surrogate keys
A typical pattern is:
- Create the dimension table in Power Query and generate a surrogate key (for example, `CustomerKey`).
- In the fact table query, merge with the dimension table using business keys (for example, `CustomerCode`).
- Expand only the surrogate key from the dimension and keep it as `CustomerKey` in the fact table.
- Load both tables to the model and create a relationship between `DimCustomer[CustomerKey]` and `FactSales[CustomerKey]`.
Example M pattern for merging a fact table to retrieve the surrogate key:
```powerquery
let
    FactSource = Excel.CurrentWorkbook(){[Name="tblSales"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(FactSource, {
        {"CustomerCode", type text},
        {"ProductCode", type text},
        {"SalesAmount", type number}
    }),
    DimCustomer = #"DimCustomer With Surrogate Key", // reference dimension query
    #"Merged Queries" = Table.NestedJoin(
        #"Changed Type", {"CustomerCode"},
        DimCustomer, {"CustomerCode"},
        "DimCustomer", JoinKind.LeftOuter
    ),
    #"Expanded DimCustomer" = Table.ExpandTableColumn(
        #"Merged Queries", "DimCustomer", {"CustomerKey"}, {"CustomerKey"}
    )
in
    #"Expanded DimCustomer"
```

With this pattern:
- The fact table no longer depends directly on `CustomerCode` for relationships.
- Changes to `CustomerCode` or alternative mapping logic only require updates in the dimension, not in every merge.
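At its core, the merge-and-expand pattern is a lookup from business key to surrogate key. The Python sketch below (a hypothetical helper, not Power Query code) shows that logic, including the null key that a left outer join leaves for unmatched facts. It assumes `CustomerCode` is unique in the dimension. With duplicates, the M merge would instead duplicate the fact row, while this dictionary keeps only the last match.

```python
def lookup_surrogate_keys(fact_rows, dim_rows, business_key, surrogate_key):
    """Left outer 'merge' that keeps only the dimension's surrogate key.

    Mirrors Table.NestedJoin(..., JoinKind.LeftOuter) followed by
    Table.ExpandTableColumn(..., {surrogate_key}). Unmatched facts get None.
    Assumes business_key is unique in the dimension.
    """
    key_map = {d[business_key]: d[surrogate_key] for d in dim_rows}
    return [{**f, surrogate_key: key_map.get(f[business_key])} for f in fact_rows]


# Hypothetical dimension and fact rows
dim_customer = [
    {"CustomerKey": 1, "CustomerCode": "C001"},
    {"CustomerKey": 2, "CustomerCode": "C002"},
]
fact_sales = [
    {"CustomerCode": "C002", "SalesAmount": 120.0},
    {"CustomerCode": "C999", "SalesAmount": 35.5},  # no matching customer
]

for row in lookup_surrogate_keys(fact_sales, dim_customer, "CustomerCode", "CustomerKey"):
    print(row["CustomerCode"], row["CustomerKey"])
# C002 2
# C999 None
```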
4.2 Managing changes in business keys
When business keys change (for example, a customer is merged with another account and gets a new code), a surrogate-key-based design allows you to retain historical facts under the old mapping while still aligning new facts to the updated mapping.
- You can keep multiple rows for the same real-world entity in the dimension with different time ranges.
- Each row gets its own surrogate key, but joins from facts will still resolve correctly based on the business key and the date logic you implement.
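A common way to implement that business-key-plus-date logic is to give each dimension row a validity range, as in a type 2 slowly changing dimension. The Python sketch below assumes hypothetical `ValidFrom`/`ValidTo` columns and a hypothetical `resolve_key` helper. For each fact, it picks the version whose range contains the fact date.

```python
from datetime import date

# Hypothetical type 2 dimension: one row per version of customer C001
dim_customer = [
    {"CustomerKey": 1, "CustomerCode": "C001", "Segment": "SMB",
     "ValidFrom": date(2022, 1, 1), "ValidTo": date(2023, 6, 30)},
    {"CustomerKey": 2, "CustomerCode": "C001", "Segment": "Enterprise",
     "ValidFrom": date(2023, 7, 1), "ValidTo": date(9999, 12, 31)},
]


def resolve_key(dim_rows, business_key, fact_date):
    """Return the surrogate key of the version valid on fact_date, else None."""
    for row in dim_rows:
        if (row["CustomerCode"] == business_key
                and row["ValidFrom"] <= fact_date <= row["ValidTo"]):
            return row["CustomerKey"]
    return None


print(resolve_key(dim_customer, "C001", date(2023, 1, 15)))  # 1 (SMB version)
print(resolve_key(dim_customer, "C001", date(2024, 2, 1)))   # 2 (Enterprise version)
print(resolve_key(dim_customer, "C002", date(2024, 2, 1)))   # None (unknown customer)
```

Older facts keep pointing at the version that was valid when they occurred, while newer facts resolve to the current version.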
5. Refresh, reproducibility, and stability considerations
Creating surrogate keys in Power Query is not only about generating unique values. You must ensure that keys remain stable across refreshes, deployments, and data source changes.
5.1 Avoiding non-deterministic ordering
Index-based keys rely on row order. If the sort order changes between refreshes, the same real-world entity could receive a different surrogate key, breaking relationships and historical analyses.
- Always apply an explicit `Table.Sort` step before `Table.AddIndexColumn`.
- Sort on fields that are guaranteed to be unique and stable, such as an original system ID plus an effective start date.
- Avoid relying on source order, especially from CSV files or loosely defined external feeds.
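The effect of an explicit sort can be checked with a small Python simulation, in which the same three rows arrive in a different order on a second "refresh". Keys derived from an explicit sort on a unique column stay the same. Keys derived from source order change. The `SourceID` column and both helper functions are hypothetical.

```python
# Hypothetical source rows; SourceID is unique and stable
rows = [
    {"SourceID": 30, "Name": "Gamma"},
    {"SourceID": 10, "Name": "Alpha"},
    {"SourceID": 20, "Name": "Beta"},
]


def keys_from_source_order(rows):
    # No explicit sort: the key depends on the order the source delivers rows
    return {r["SourceID"]: i for i, r in enumerate(rows, start=1)}


def keys_from_explicit_sort(rows):
    # Explicit sort on a unique, stable column before numbering
    ordered = sorted(rows, key=lambda r: r["SourceID"])
    return {r["SourceID"]: i for i, r in enumerate(ordered, start=1)}


# Second "refresh": same rows, different delivery order
refreshed = list(reversed(rows))

print(keys_from_explicit_sort(rows) == keys_from_explicit_sort(refreshed))  # True
print(keys_from_source_order(rows) == keys_from_source_order(refreshed))    # False
```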
5.2 Handling incremental loads
When using incremental refresh or custom incremental loading patterns:
- Build surrogate keys in a dimension query that processes the full history, or manage key assignment in a central table (for example, through a master dimension query).
- For fact tables loaded incrementally, ensure that lookups to the dimension are still correct for newly arriving rows.
- If you use index-based keys, they should be generated on a stable snapshot of the dimension, not on the incremental fact data.
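One way to manage key assignment centrally is to keep the previously issued business-key-to-surrogate-key map and only append new keys after the current maximum, instead of re-indexing everything. The Python sketch below illustrates this under the assumption that such a map is persisted between loads (how and where it is stored is not covered here). `assign_keys` is a hypothetical helper. The comparison at the end shows the risk of a plain sorted index in this situation: a new code that sorts between existing codes shifts every later key.

```python
def assign_keys(existing_map, business_keys):
    """Keep previously issued surrogate keys; number new business keys after the max.

    existing_map: business key -> surrogate key from the previous load
    (assumed to be persisted somewhere between refreshes).
    """
    key_map = dict(existing_map)
    next_key = max(key_map.values(), default=0) + 1
    # Sort the new business keys so that assignment order is deterministic
    for bk in sorted(set(business_keys) - set(key_map)):
        key_map[bk] = next_key
        next_key += 1
    return key_map


previous = {"C001": 1, "C003": 2}  # keys issued by an earlier load
current = ["C001", "C002", "C003", "C004"]

print(assign_keys(previous, current))
# {'C001': 1, 'C003': 2, 'C002': 3, 'C004': 4}

# For comparison, re-indexing the sorted codes from scratch moves C003 from 2 to 3
print({bk: i for i, bk in enumerate(sorted(current), start=1)})
# {'C001': 1, 'C002': 2, 'C003': 3, 'C004': 4}
```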
5.3 Deployment across environments
In Power BI, the same Power Query logic may run against different environments (development, test, production). To keep surrogate keys consistent:
- Use deterministic key generation (for example, sorted index or hash of business attributes).
- Avoid environment-specific values (such as database-specific surrogate keys) when defining your own keys in Power Query.
- Document how keys are generated so that future changes do not accidentally reset or redefine them.
6. Comparing surrogate key strategies in Power Query
The table below compares common surrogate key strategies and how they behave in typical Power Query projects.
| Strategy | Key type | Pros | Cons | Recommended usage |
|---|---|---|---|---|
| Sorted Index | Int64 index | Simple, fast, ideal for relationships. | Depends on stable sort order. | Dimensions with reliable source IDs. |
| Hash of attributes | Text hash | Deterministic across systems, good for linkage. | Longer keys, possible hash collisions in theory. | Record linkage and multi-source matching. |
| Composite business key | Concatenated text | No extra integer column needed. | Still tied to business attributes and changes. | Temporary solution or compatibility mode. |
7. Practical step-by-step example
The following end-to-end example shows how to implement surrogate keys in Power Query for a small Customer dimension and Sales fact table in Excel or Power BI.
7.1 Step 1 – Create the Customer dimension with a surrogate key
```powerquery
let
    Source = Excel.CurrentWorkbook(){[Name="tblCustomers"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source, {
        {"CustomerCode", type text},
        {"CustomerName", type text},
        {"Country", type text}
    }),
    #"Removed Duplicates" = Table.Distinct(#"Changed Type", {"CustomerCode"}),
    #"Sorted Rows" = Table.Sort(#"Removed Duplicates", {{"CustomerCode", Order.Ascending}}),
    #"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "CustomerKey", 1, 1, Int64.Type)
in
    #"Added Index"
```

Load this query as `DimCustomer` to the model.
7.2 Step 2 – Build the Sales fact table and pull the surrogate key
```powerquery
let
    Source = Excel.CurrentWorkbook(){[Name="tblSales"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source, {
        {"CustomerCode", type text},
        {"Date", type date},
        {"SalesAmount", type number}
    }),
    DimCustomer = #"DimCustomer", // reference the dimension query
    #"Merged Queries" = Table.NestedJoin(
        #"Changed Type", {"CustomerCode"},
        DimCustomer, {"CustomerCode"},
        "DimCustomer", JoinKind.LeftOuter
    ),
    #"Expanded DimCustomer" = Table.ExpandTableColumn(
        #"Merged Queries", "DimCustomer", {"CustomerKey"}, {"CustomerKey"}
    )
in
    #"Expanded DimCustomer"
```

Load this query as `FactSales` to the model and create a relationship between `DimCustomer[CustomerKey]` and `FactSales[CustomerKey]`.
7.3 Step 3 – Validate uniqueness and referential integrity
In the Customer dimension:
- Check that `CustomerKey` has no duplicates.
- Verify that `CustomerCode` is unique, or handle duplicates appropriately if they exist.

In the Sales fact table:
- Check for any rows where `CustomerKey` is null, which indicates unmatched customers.
- Decide how to handle unmatched rows (for example, assign them to an “Unknown” customer or fix the source data).
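These checks can be expressed as three small functions. The Python sketch below is illustrative only. The helper names, the sample rows, and the `-1` key for an "Unknown" customer are assumptions, not a convention the article prescribes.

```python
from collections import Counter

UNKNOWN_KEY = -1  # hypothetical surrogate key of an "Unknown" customer row


def duplicate_values(rows, field):
    """Values of `field` that occur more than once (should be empty for a key)."""
    counts = Counter(r[field] for r in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def unmatched_rows(fact_rows, key_field):
    """Fact rows whose surrogate key is missing after the merge."""
    return [r for r in fact_rows if r[key_field] is None]


def assign_unknown(fact_rows, key_field, unknown_key=UNKNOWN_KEY):
    """Redirect unmatched fact rows to the 'Unknown' dimension row."""
    return [
        {**r, key_field: unknown_key if r[key_field] is None else r[key_field]}
        for r in fact_rows
    ]


# Hypothetical data
dim_customer = [
    {"CustomerKey": 1, "CustomerCode": "C001"},
    {"CustomerKey": 2, "CustomerCode": "C002"},
]
fact_sales = [
    {"CustomerCode": "C001", "CustomerKey": 1, "SalesAmount": 80.0},
    {"CustomerCode": "C999", "CustomerKey": None, "SalesAmount": 15.0},
]

print(duplicate_values(dim_customer, "CustomerKey"))   # []
print(duplicate_values(dim_customer, "CustomerCode"))  # []
print(len(unmatched_rows(fact_sales, "CustomerKey")))  # 1
print([r["CustomerKey"] for r in assign_unknown(fact_sales, "CustomerKey")])  # [1, -1]
```

If you choose the "Unknown" option, the dimension also needs a row carrying that key; otherwise the redirected facts are still unmatched in the model.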
FAQ
Should I always use surrogate keys instead of business keys in Power Query?
You do not always need surrogate keys for very small or simple models, but they are recommended for most Power Query projects that feed analytical models in Excel or Power BI. When you expect multiple tables, historical data, slowly changing attributes, or integration of several sources, surrogate keys significantly improve stability and maintainability.
Is an index column in Power Query safe as a primary key?
An index column is safe as a primary key only if the row ordering is deterministic. You must sort the table explicitly on stable attributes that uniquely identify each row before adding the index; otherwise rows with equal sort values may be numbered differently between refreshes. If these conditions hold, an index-based surrogate key is a robust and high-performing solution.
When should I prefer hash-based surrogate keys?
Hash-based surrogate keys are useful when you need deterministic keys for the same entities coming from different sources or environments. They are especially helpful in record linkage, where no single source has a clean primary key. Because the hash is derived from a set of attributes, the key will be consistent as long as those attributes and the hashing logic remain the same.
Can surrogate keys be text, or must they be integers?
Surrogate keys can be either integers or text. Integers are preferred for performance and memory efficiency, especially in large Power BI models. However, text-based surrogate keys, such as hashes or composite keys, are acceptable when you need deterministic cross-system keys or when migrating from an existing design. The key property is that the value is stable, unique, and independent of business meaning.
How do surrogate keys relate to relationships in the Power BI model?
In the data model, surrogate keys become the fields that define relationships between tables. Dimension tables expose the surrogate key as their primary key, and fact tables store the corresponding surrogate key as a foreign key. Once these relationships are defined, analytical measures and visuals can safely rely on them, even if the underlying business codes change over time.