Mastering Surrogate Keys in Power Query for Reliable Power BI and Excel Data Models

This article explains how to design, create, and maintain surrogate keys in Power Query so that your Excel and Power BI data models remain stable, reliable, and easy to extend even as source systems and business rules change.

1. Understanding surrogate keys in Power Query

When building data models in Power Query for Excel or Power BI, a surrogate key is a technical key that uniquely identifies each row in a table, independent of any business logic or real-world meaning.

1.1 Business keys vs. surrogate keys

A business key (or natural key) is a value that comes from the business domain, such as CustomerID from a CRM, ProductCode from ERP, or EmailAddress from a web application.

A surrogate key is an artificial key that you generate in Power Query, typically as an integer or a hash-like text value, with these characteristics:

  • Has no business meaning.
  • Is stable within your model, even if source systems change.
  • Is optimized for joins and relationships.
  • Is often used as a primary key in dimension tables and as a foreign key in fact tables.

1.2 Why surrogate keys matter in Power Query data models

Using surrogate keys in Power Query is essential when you build star schemas or complex data models.

  • They simplify joins between fact and dimension tables.
  • They support slowly changing dimensions, where business attributes may change over time.
  • They allow you to join multiple heterogeneous sources that do not share the same business keys.
  • As integers, they generally perform better than composite text keys in large models.

Note: Relying only on business keys can break relationships when codes are recycled, corrected, or merged across systems, while surrogate keys protect your model from these changes.

2. Typical scenarios where you need surrogate keys in Power Query

In Power Query, surrogate keys are useful in several recurring modeling patterns.

  • Dimension tables such as Customers, Products, Calendar, or Locations.
  • Fact tables that require stable foreign keys to dimensions.
  • Record linkage across multiple sources with slightly different identifiers.
  • Historical tracking when one business entity has multiple versions over time.

Scenario | Problem with business keys | Benefit of surrogate keys in Power Query
--- | --- | ---
Customer dimension | Customer codes may be reassigned or merged. | Stable integer key per customer record, independent of code.
Product master data | Product IDs change when migrated to new ERP. | Surrogate key allows joining facts across old and new systems.
Slowly changing dimensions | Address, status, or segment changes over time. | Each historical version gets its own surrogate key.
Record linkage between systems | Different ID formats for the same entity. | Surrogate key unifies matched records across sources.

3. Core methods to create surrogate keys in Power Query

Power Query offers several practical patterns for generating surrogate keys. The three covered here are:

  • Index-based surrogate keys.
  • Hash-based surrogate keys.
  • Composite business keys as a fallback.

3.1 Creating index-based surrogate keys

The most straightforward pattern is to create a numeric index column in Power Query.

  1. Load your table into Power Query.
  2. Sort rows by a stable order (for example, by original ID or business key).
  3. Use Add Column → Index Column to generate a new integer key.

Example M code for a simple index-based surrogate key:

let
    Source = Excel.CurrentWorkbook(){[Name="tblCustomers"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source, {
        {"CustomerCode", type text},
        {"CustomerName", type text},
        {"Country", type text}
    }),
    #"Sorted Rows" = Table.Sort(#"Changed Type", {{"CustomerCode", Order.Ascending}}),
    #"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "CustomerKey", 1, 1, Int64.Type)
in
    #"Added Index"

In this example:

  • CustomerKey is the surrogate key that you will use as the primary key in the dimension.
  • CustomerCode remains as a business attribute, not as a join key.

Note: Always sort the table by a stable column or set of columns before adding the index, so that the surrogate key assignment is predictable and reproducible under the same source data.

3.2 Creating hash-based surrogate keys

Hash-based keys are useful when you need deterministic IDs across multiple queries or files, especially in record linkage scenarios.

The documented standard M library does not appear to include a hash function, so a common stand-in is to concatenate the relevant columns and encode the result with Text.ToBinary and Binary.ToText. The example below uses Base64. The result is deterministic, but it is a reversible encoding rather than a true hash, so the key grows with the length of the input.

let
    Source = Excel.CurrentWorkbook(){[Name="tblProducts"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source, {
        {"ProductCode", type text},
        {"Category", type text},
        {"Brand", type text}
    }),
    // Concatenate the identifying attributes into one delimited string
    #"Added Key Base" = Table.AddColumn(#"Changed Type", "KeyBase",
        each Text.Combine({[ProductCode], [Category], [Brand]}, "|"), type text),
    // Encode the string's bytes as Base64 text (deterministic, not a cryptographic hash)
    #"Added Hash" = Table.AddColumn(#"Added Key Base", "ProductKey",
        each Binary.ToText(Text.ToBinary([KeyBase]), BinaryEncoding.Base64), type text),
    #"Removed Columns" = Table.RemoveColumns(#"Added Hash", {"KeyBase"})
in
    #"Removed Columns"

Here:

  • ProductKey is a text surrogate key derived from relevant attributes.
  • It is deterministic: the same combination of inputs will always generate the same key.
  • It works well when merging tables coming from different sources with the same logical attributes.
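
Determinism across sources depends on the inputs being normalized the same way everywhere. The sketch below is one possible approach, not a definitive implementation: it trims and upper-cases each part and turns nulls into empty strings before building the key base, because Text.Combine skips null items and would otherwise shift the positions. The inline sample table and the Normalize helper are assumptions made for illustration, and the key is still Base64 text rather than a true hash.

```powerquery
let
    // Hypothetical inline sample standing in for tblProducts
    Source = #table(
        type table [ProductCode = nullable text, Category = nullable text, Brand = nullable text],
        {
            {"P-001", "Bikes", "Contoso"},
            {" p-001 ", "bikes", "CONTOSO"},
            {"P-002", null, "Fabrikam"}
        }
    ),
    // Hypothetical helper: null becomes an empty string, then trim and upper-case
    Normalize = (value as nullable text) as text =>
        Text.Upper(Text.Trim(if value = null then "" else value)),
    // Build the key base from normalized parts; empty strings keep every position
    AddedKeyBase = Table.AddColumn(
        Source,
        "KeyBase",
        each Text.Combine(
            {Normalize([ProductCode]), Normalize([Category]), Normalize([Brand])},
            "|"
        ),
        type text
    ),
    // Base64-encode the UTF-8 bytes of the key base (an encoding, not a true hash)
    AddedKey = Table.AddColumn(
        AddedKeyBase,
        "ProductKey",
        each Binary.ToText(Text.ToBinary([KeyBase], TextEncoding.Utf8), BinaryEncoding.Base64),
        type text
    )
in
    AddedKey
```

With this normalization, the first two sample rows share the same key base and therefore the same ProductKey, while the third keeps an empty middle position instead of collapsing it.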

3.3 Composite keys as a controlled fallback

Sometimes you cannot introduce a clean surrogate key because the model must remain compatible with an existing design. In such cases, you may create a composite key column in Power Query, while still planning a long-term move to a true surrogate key.

let
    Source = Excel.CurrentWorkbook(){[Name="tblSales"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source, {
        {"StoreID", Int64.Type},
        {"DateKey", type date},
        {"ProductID", Int64.Type}
    }),
    #"Added Composite Key" = Table.AddColumn(#"Changed Type", "SalesBusinessKey",
        each Text.Combine({
            Text.From([StoreID]),
            Date.ToText([DateKey], "yyyyMMdd"),
            Text.From([ProductID])
        }, "-"), type text)
in
    #"Added Composite Key"

This composite key is still based on business fields, but by centralizing the logic in one column, you make it easier later to map it to a surrogate integer key if needed.
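
A later move from the composite text key to an integer key could look roughly like the sketch below. It assumes a query that already contains SalesBusinessKey; the inline sample rows and the SalesKey column name are hypothetical.

```powerquery
let
    // Hypothetical inline sample standing in for the query that adds SalesBusinessKey
    Sales = #table(
        type table [SalesBusinessKey = text, SalesAmount = number],
        {
            {"12-20240105-700", 100.0},
            {"10-20240105-701", 250.0},
            {"12-20240105-700", 75.0}
        }
    ),
    // One row per distinct composite key
    DistinctKeys = Table.Distinct(Table.SelectColumns(Sales, {"SalesBusinessKey"})),
    // Sort and buffer so the index is assigned in a fixed order
    SortedKeys = Table.Buffer(
        Table.Sort(DistinctKeys, {{"SalesBusinessKey", Order.Ascending}})
    ),
    // Assign the integer surrogate key (SalesKey is a hypothetical column name)
    KeyMap = Table.AddIndexColumn(SortedKeys, "SalesKey", 1, 1, Int64.Type),
    // Bring the integer key back onto the sales rows
    Merged = Table.NestedJoin(
        Sales, {"SalesBusinessKey"},
        KeyMap, {"SalesBusinessKey"},
        "KeyMap", JoinKind.LeftOuter
    ),
    Result = Table.ExpandTableColumn(Merged, "KeyMap", {"SalesKey"}, {"SalesKey"})
in
    Result
```

Because the mapping lives in one place (KeyMap), the composite column can be dropped from downstream queries once every consumer joins on the integer key.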

4. Using surrogate keys in relationships and merges

Surrogate keys only become useful when you actually use them as the join fields between tables, both in Power Query and in the data model.

4.1 Building dimension and fact tables with surrogate keys

A typical pattern is:

  1. Create the dimension table in Power Query and generate a surrogate key (for example, CustomerKey).
  2. In the fact table query, merge with the dimension table using business keys (for example, CustomerCode).
  3. Expand only the surrogate key from the dimension and keep it as CustomerKey in the fact table.
  4. Load both tables to the model and create a relationship between DimCustomer[CustomerKey] and FactSales[CustomerKey].

Example M pattern for merging a fact table to retrieve the surrogate key:

let
    FactSource = Excel.CurrentWorkbook(){[Name="tblSales"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(FactSource, {
        {"CustomerCode", type text},
        {"ProductCode", type text},
        {"SalesAmount", type number}
    }),
    // reference dimension query
    DimCustomer = #"DimCustomer With Surrogate Key",
    #"Merged Queries" = Table.NestedJoin(
        #"Changed Type", {"CustomerCode"},
        DimCustomer, {"CustomerCode"},
        "DimCustomer", JoinKind.LeftOuter
    ),
    #"Expanded DimCustomer" = Table.ExpandTableColumn(
        #"Merged Queries", "DimCustomer", {"CustomerKey"}, {"CustomerKey"}
    )
in
    #"Expanded DimCustomer"

With this pattern:

  • The fact table no longer depends directly on CustomerCode for relationships.
  • Changes to CustomerCode or alternative mapping logic only require updates in the dimension, not in every merge.

4.2 Managing changes in business keys

When business keys change (for example, a customer is merged with another account and gets a new code), a surrogate-key-based design allows you to retain historical facts under the old mapping while still aligning new facts to the updated mapping.

  • You can keep multiple rows for the same real-world entity in the dimension with different time ranges.
  • Each row gets its own surrogate key, and joins from facts resolve to the right version only if you match on both the business key and the date range that you define for each version.

Note: If you expect frequent changes in business keys, design your dimension query with clear rules for how to map historical transactions, for example, whether to keep them under the old key or migrate them to the new entity.
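
The business-key-plus-date lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions: the ValidFrom and ValidTo columns, the exclusive upper bound, and the inline sample tables are all hypothetical and not part of the original queries.

```powerquery
let
    // Hypothetical dimension with one row per version of a customer
    DimCustomer = #table(
        type table [CustomerKey = Int64.Type, CustomerCode = text, Segment = text,
                    ValidFrom = date, ValidTo = date],
        {
            {1, "C001", "Retail", #date(2023, 1, 1), #date(2024, 1, 1)},
            {2, "C001", "Wholesale", #date(2024, 1, 1), #date(9999, 12, 31)}
        }
    ),
    // Hypothetical fact rows that carry only the business key and a date
    Sales = #table(
        type table [CustomerCode = text, Date = date, SalesAmount = number],
        {
            {"C001", #date(2023, 6, 15), 100.0},
            {"C001", #date(2024, 3, 1), 200.0}
        }
    ),
    // Join on the business key; each fact row receives all versions of its customer
    Merged = Table.NestedJoin(
        Sales, {"CustomerCode"},
        DimCustomer, {"CustomerCode"},
        "Versions", JoinKind.LeftOuter
    ),
    // Pick the version whose validity range contains the fact date (ValidTo is exclusive)
    AddedKey = Table.AddColumn(
        Merged,
        "CustomerKey",
        (row) =>
            let
                Matches = Table.SelectRows(
                    row[Versions],
                    (v) => v[ValidFrom] <= row[Date] and row[Date] < v[ValidTo]
                )
            in
                if Table.IsEmpty(Matches) then null else Matches{0}[CustomerKey],
        Int64.Type
    ),
    Result = Table.RemoveColumns(AddedKey, {"Versions"})
in
    Result
```

In this sample the 2023 sale resolves to CustomerKey 1 and the 2024 sale to CustomerKey 2. A row-by-row filter like this is easy to read but can be slow on large fact tables, so treat it as a starting point.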

5. Refresh, reproducibility, and stability considerations

Creating surrogate keys in Power Query is not only about generating unique values. You must ensure that keys remain stable across refreshes, deployments, and data source changes.

5.1 Avoiding non-deterministic ordering

Index-based keys rely on row order. If the sort order changes between refreshes, or if new rows sort ahead of existing ones, the same real-world entity can receive a different surrogate key, which breaks any relationship, snapshot, or historical analysis that stored the earlier key value.

  • Always apply an explicit Table.Sort step before Table.AddIndexColumn.
  • Use fields that are guaranteed to be unique and stable, such as an original system ID plus an effective start date.
  • Avoid relying on source-order, especially from CSV files or loosely defined external feeds.
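
The points above can be combined in a small guard, sketched here with a hypothetical inline table. It checks that the sort column is unique, sorts explicitly, and wraps the sort in Table.Buffer so that the index step sees exactly that row order. Buffering is a commonly used safeguard rather than a requirement, and it prevents query folding for the steps that follow.

```powerquery
let
    // Hypothetical inline sample standing in for the customer source
    Source = #table(
        type table [CustomerCode = text, CustomerName = text],
        {
            {"C003", "Northwind"},
            {"C001", "Contoso"},
            {"C002", "Fabrikam"}
        }
    ),
    // Guard: stop the refresh if the sort column is not unique
    Checked =
        if Table.IsDistinct(Source, {"CustomerCode"}) then
            Source
        else
            error Error.Record(
                "DuplicateKey",
                "CustomerCode is not unique, so index order would be ambiguous"
            ),
    // Sort explicitly, then buffer so later steps see exactly this row order
    Sorted = Table.Buffer(Table.Sort(Checked, {{"CustomerCode", Order.Ascending}})),
    AddedIndex = Table.AddIndexColumn(Sorted, "CustomerKey", 1, 1, Int64.Type)
in
    AddedIndex
```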

5.2 Handling incremental loads

When using incremental refresh or custom incremental loading patterns:

  • Build surrogate keys in a dimension query that processes the full history, or manage key assignment in a central table (for example, through a master dimension query).
  • For fact tables loaded incrementally, ensure that lookups to the dimension are still correct for newly arriving rows.
  • If you use index-based keys, they should be generated on a stable snapshot of the dimension, not on the incremental fact data.
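
One way to manage key assignment centrally is to keep a key map and give new keys only to business keys that do not have one yet. The sketch below assumes such a map exists and is persisted outside this query (Power Query only reads sources and loads results); the ExistingKeys and Incoming tables are hypothetical inline samples.

```powerquery
let
    // Hypothetical persisted key map (for example, a table kept in the workbook or a database)
    ExistingKeys = #table(
        type table [CustomerCode = text, CustomerKey = Int64.Type],
        {{"C001", 1}, {"C002", 2}}
    ),
    // Hypothetical business keys seen in the current load
    Incoming = #table(type table [CustomerCode = text], {{"C002"}, {"C000"}, {"C003"}}),
    // Keep only codes that have no key yet
    NewCodes = Table.NestedJoin(
        Incoming, {"CustomerCode"},
        ExistingKeys, {"CustomerCode"},
        "Existing", JoinKind.LeftAnti
    ),
    NewCodesOnly = Table.Distinct(Table.SelectColumns(NewCodes, {"CustomerCode"})),
    SortedNew = Table.Buffer(Table.Sort(NewCodesOnly, {{"CustomerCode", Order.Ascending}})),
    // Continue numbering after the highest existing key (0 when the map is empty)
    MaxKey = List.Max(ExistingKeys[CustomerKey], 0),
    NewKeys = Table.AddIndexColumn(SortedNew, "CustomerKey", MaxKey + 1, 1, Int64.Type),
    // Existing rows keep their keys; new rows are appended
    UpdatedKeyMap = Table.Combine({ExistingKeys, NewKeys})
in
    UpdatedKeyMap
```

In this sample C000 receives key 3 and C003 receives key 4, so the keys already assigned to C001 and C002 do not shift even though C000 sorts ahead of them.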

5.3 Deployment across environments

In Power BI, the same Power Query logic may run against different environments (development, test, production). To keep surrogate keys consistent:

  • Use deterministic key generation (for example, sorted index or hash of business attributes).
  • Avoid environment-specific values (such as database-specific surrogate keys) when defining your own keys in Power Query.
  • Document how keys are generated so that future changes do not accidentally reset or redefine them.

6. Comparing surrogate key strategies in Power Query

The table below compares common surrogate key strategies and how they behave in typical Power Query projects.

Strategy | Key type | Pros | Cons | Recommended usage
--- | --- | --- | --- | ---
Sorted Index | Int64 index | Simple, fast, ideal for relationships. | Depends on stable sort order. | Dimensions with reliable source IDs.
Hash of attributes | Text hash | Deterministic across systems, good for linkage. | Longer keys, possible hash collisions in theory. | Record linkage and multi-source matching.
Composite business key | Concatenated text | No extra integer column needed. | Still tied to business attributes and changes. | Temporary solution or compatibility mode.

7. Practical step-by-step example

The following end-to-end example shows how to implement surrogate keys in Power Query for a small Customer dimension and Sales fact table in Excel or Power BI.

7.1 Step 1 – Create the Customer dimension with a surrogate key

let
    Source = Excel.CurrentWorkbook(){[Name="tblCustomers"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source, {
        {"CustomerCode", type text},
        {"CustomerName", type text},
        {"Country", type text}
    }),
    #"Removed Duplicates" = Table.Distinct(#"Changed Type", {"CustomerCode"}),
    #"Sorted Rows" = Table.Sort(#"Removed Duplicates", {{"CustomerCode", Order.Ascending}}),
    #"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "CustomerKey", 1, 1, Int64.Type)
in
    #"Added Index"

Load this query as DimCustomer to the model.

7.2 Step 2 – Build the Sales fact table and pull the surrogate key

let
    Source = Excel.CurrentWorkbook(){[Name="tblSales"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source, {
        {"CustomerCode", type text},
        {"Date", type date},
        {"SalesAmount", type number}
    }),
    // reference the dimension query
    DimCustomer = #"DimCustomer",
    #"Merged Queries" = Table.NestedJoin(
        #"Changed Type", {"CustomerCode"},
        DimCustomer, {"CustomerCode"},
        "DimCustomer", JoinKind.LeftOuter
    ),
    #"Expanded DimCustomer" = Table.ExpandTableColumn(
        #"Merged Queries", "DimCustomer", {"CustomerKey"}, {"CustomerKey"}
    )
in
    #"Expanded DimCustomer"

Load this query as FactSales to the model and create a relationship between DimCustomer[CustomerKey] and FactSales[CustomerKey].

7.3 Step 3 – Validate uniqueness and referential integrity

In the Customer dimension:

  • Check that CustomerKey has no duplicates.
  • Verify that CustomerCode is unique or is handled appropriately if duplicates exist.
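
These two dimension checks could be run as a small diagnostic query, sketched below. The inline DimCustomer table is a hypothetical sample with one deliberately duplicated code; in practice you would reference your own dimension query instead.

```powerquery
let
    // Hypothetical inline sample standing in for DimCustomer
    DimCustomer = #table(
        type table [CustomerKey = Int64.Type, CustomerCode = text],
        {{1, "C001"}, {2, "C002"}, {3, "C002"}}
    ),
    // Count rows per CustomerCode and keep only codes that occur more than once
    Grouped = Table.Group(
        DimCustomer,
        {"CustomerCode"},
        {{"RowCount", each Table.RowCount(_), Int64.Type}}
    ),
    DuplicateCodes = Table.SelectRows(Grouped, each [RowCount] > 1),
    // True when every CustomerKey value is unique
    KeyIsUnique = Table.IsDistinct(DimCustomer, {"CustomerKey"}),
    // Return both findings as a record so they can be inspected side by side
    Result = [KeyIsUnique = KeyIsUnique, DuplicateCodes = DuplicateCodes]
in
    Result
```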

In the Sales fact table:

  • Check for any rows where CustomerKey is null, which indicates unmatched customers.
  • Decide how to handle unmatched rows (for example, assign them to an “Unknown” customer or fix the source data).
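
If you choose the "Unknown" customer route, one possible shape is sketched below. The reserved key -1, the "N/A" code, and the inline sample tables are arbitrary assumptions for illustration; any value that cannot collide with a generated key would serve.

```powerquery
let
    // Hypothetical fact rows after the merge; null marks an unmatched customer
    FactSales = #table(
        type table [CustomerCode = text, CustomerKey = nullable number, SalesAmount = number],
        {{"C001", 1, 100.0}, {"C999", null, 40.0}}
    ),
    // Hypothetical dimension without an Unknown member yet
    DimCustomer = #table(
        type table [CustomerKey = Int64.Type, CustomerCode = text, CustomerName = text],
        {{1, "C001", "Contoso"}}
    ),
    // List unmatched rows for review before deciding how to treat them
    Unmatched = Table.SelectRows(FactSales, each [CustomerKey] = null),
    // Route unmatched rows to a reserved key (-1 is an arbitrary choice)
    FactWithUnknown = Table.ReplaceValue(
        FactSales, null, -1, Replacer.ReplaceValue, {"CustomerKey"}
    ),
    // Add the matching Unknown row to the dimension, reusing its column types
    UnknownRow = #table(Value.Type(DimCustomer), {{-1, "N/A", "Unknown"}}),
    DimWithUnknown = Table.Combine({UnknownRow, DimCustomer}),
    Result = [Unmatched = Unmatched, FactSales = FactWithUnknown, DimCustomer = DimWithUnknown]
in
    Result
```

In a real model the fact and dimension steps would live in their own queries; they are combined here only so the sketch is self-contained.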

FAQ

Should I always use surrogate keys instead of business keys in Power Query?

You do not always need surrogate keys for very small or simple models, but they are recommended for most Power Query projects that feed analytical models in Excel or Power BI. When you expect multiple tables, historical data, slowly changing attributes, or integration of several sources, surrogate keys significantly improve stability and maintainability.

Is an index column in Power Query safe as a primary key?

An index column is safe as a primary key only if the row ordering is deterministic. You must sort the table explicitly using stable attributes before adding the index, and the source data must not randomly reorder records between refreshes. If these conditions hold, an index-based surrogate key is a robust and high-performing solution.

When should I prefer hash-based surrogate keys?

Hash-based surrogate keys are useful when you need deterministic keys for the same entities coming from different sources or environments. They are especially helpful in record linkage, where no single source has a clean primary key. Because the hash is derived from a set of attributes, the key will be consistent as long as those attributes and the hashing logic remain the same.

Can surrogate keys be text, or must they be integers?

Surrogate keys can be either integers or text. Integers are preferred for performance and memory efficiency, especially in large Power BI models. However, text-based surrogate keys, such as hashes or composite keys, are acceptable when you need deterministic cross-system keys or when migrating from an existing design. The key property is that the value is stable, unique, and independent of business meaning.

How do surrogate keys relate to relationships in the Power BI model?

In the data model, surrogate keys become the fields that define relationships between tables. Dimension tables expose the surrogate key as their primary key, and fact tables store the corresponding surrogate key as a foreign key. Once these relationships are defined, analytical measures and visuals can safely rely on them, even if the underlying business codes change over time.