- Get link
- X
- Other Apps
- Get link
- X
- Other Apps
This article explains expert-level best practices for designing, implementing, and maintaining staging queries in Power Query for Excel and Power BI so that data models become faster, more reliable, and easier to manage in real business environments.
1. What staging queries are and why they matter
In Power Query, a staging query is a query whose main purpose is to clean, standardize, and structure raw data before it is reused by other queries such as fact tables, dimension tables, or reports.
Instead of loading raw data directly into the data model and repeating the same transformation steps multiple times, you create a dedicated staging layer and reference these staging queries from downstream queries.
1.1 Typical roles of staging queries
Staging queries usually perform the following tasks.
- Connect to raw data sources (files, databases, APIs).
- Apply initial cleanup (remove obviously bad rows, fix column types, trim text, standardize dates).
- Normalize column names and data formats to a corporate standard.
- Split one large data source into reusable logical sets (e.g., header vs. detail, orders vs. lines).
- Provide a stable interface for downstream fact and dimension queries.
1.2 Benefits of a dedicated staging layer
Using staging queries systematically delivers several benefits.
- Performance: transformations are executed once at the staging level and reused, reducing redundant work during refresh.
- Maintainability: business logic is centralized in a small number of staging queries instead of being scattered across many report queries.
- Reusability: multiple fact and dimension queries can reference the same staging query instead of duplicating steps.
- Lower risk of errors: corrections to source data handling are made in one place and automatically flow to all dependent queries.
- Cleaner data model: only curated outputs (facts and dimensions) are loaded, keeping the model tidy and easier to understand.
2. Structuring staging queries in Power Query
Staging queries work best when they are intentionally structured inside the Power Query environment rather than added randomly.
2.1 Folder and naming conventions
A consistent naming convention is the foundation of staging query best practices.
- Use a prefix for staging queries such as
stg_orstage_. - Use a different prefix for final tables such as
f_(facts) andd_(dimensions). - Group staging queries into “folders” (query groups) and name the group clearly: for example,
01 Staging,02 Dimensions,03 Facts,99 Utility. - Keep names short but descriptive:
stg_SalesRaw,stg_CustomersMaster,d_Customer,f_Sales.
| Layer | Typical prefix | Example name | Loaded to data model |
|---|---|---|---|
| Staging | stg_ | stg_SalesRaw | No (load disabled) |
| Dimension | d_ | d_Customer | Yes |
| Fact | f_ | f_Sales | Yes |
| Utility / Parameters | u_ / p_ | p_DateRange | Usually no |
2.2 Disabling load for staging queries
Staging queries are normally not loaded into Excel worksheets or into the data model.
- Right-click the staging query in the Queries pane.
- Clear “Enable load” (Excel) or “Load to report” (Power BI Desktop).
- Confirm that only facts and dimensions appear in the final model.
Note : Forgetting to disable load for many staging queries can dramatically increase model size and refresh times because every intermediate table is stored redundantly.
2.3 Referencing staging queries correctly
Downstream queries should reference staging queries instead of duplicating them.
- Right-click a staging query such as
stg_SalesRaw. - Choose “Reference” (not “Duplicate”).
- Rename the new query to
f_Salesor another appropriate final name. - Add additional transformations only needed for that specific fact or dimension.
Using “Reference” ensures that all dependent queries share the same upstream steps.
3. Designing robust staging transformations
Good staging queries handle messy real-world data while still preserving performance and query folding where possible.
3.1 Keep early steps foldable
When the data source supports query folding (e.g., SQL Server, some OData feeds), try to keep heavy transformations early in the query within the foldable region.
- Filter rows as early as possible.
- Select only required columns instead of importing everything.
- Avoid row-by-row custom functions before major filters and joins.
- Prefer built-in transformations that fold to the source over complex custom logic.
Note : If a non-foldable step is inserted near the top of a staging query, all subsequent steps may run in memory in Power Query instead of being pushed to the database, which can slow refresh dramatically on large tables.
3.2 Typical staging transformation sequence
A simple but effective staging sequence might look like the following.
- Connect to source (database, file, API).
- Filter to relevant rows (e.g., last N years).
- Select necessary columns only.
- Set data types for each column.
- Standardize text (trim, clean, case rules).
- Split or merge columns for keys and codes.
- Handle nulls and default values.
- Expose the result as the staging query output.
3.3 Example: staging query in M code
The following M code shows a simplified staging query for a sales table.
let Source = Sql.Database("SQLPROD01", "SalesDW"), SalesTable = Source{[Schema="dbo", Item="Sales"]}[Data], FilteredRows = Table.SelectRows( SalesTable, each [OrderDate] >= #date(2020, 1, 1) ), SelectedColumns = Table.SelectColumns( FilteredRows, {"OrderID", "OrderDate", "CustomerID", "ProductID", "Quantity", "NetAmount"} ), TypedColumns = Table.TransformColumnTypes( SelectedColumns, { {"OrderID", Int64.Type}, {"OrderDate", type date}, {"CustomerID", Int64.Type}, {"ProductID", Int64.Type}, {"Quantity", Int64.Type}, {"NetAmount", type number} } ), CleanText = Table.TransformColumns( TypedColumns, {{"CustomerID", Text.Trim, type text}}, MissingField.UseCurrent ) in CleanText This query remains foldable to SQL Server in most environments, so filters and projections are executed on the database side.
4. Separating staging, business logic, and presentation
A common mistake is mixing raw data handling with business logic and presentation details inside the same query. A clear separation into layers is a core staging query best practice.
4.1 Layered architecture in Power Query
A practical layered pattern is:
- Staging layer: Connects to sources and performs technical clean-up.
- Business layer: Adds calculations, surrogate keys, and business rules.
- Presentation layer: Prepares outputs specifically for visuals, exports, or report pages.
| Layer | Query example | Main responsibility |
|---|---|---|
| Staging | stg_SalesRaw | Raw cleanup and standardization of source data. |
| Business / Model | f_Sales | Calculated columns, relationships, and business rules. |
| Presentation | v_SalesDashboard | Aggregations and shapes optimized for visuals. |
4.2 Using references to implement layers
Each layer is built by referencing the previous layer:
stg_SalesRawconnects to the source and cleans data.f_Salesreferencesstg_SalesRawand adds business columns.v_SalesDashboardreferencesf_Salesand aggregates data for visuals.
This structure gives you flexibility to change upstream logic without breaking every single downstream query.
5. Handling multiple sources and incremental refresh
Staging queries become even more important when dealing with many data sources or with large tables requiring incremental refresh.
5.1 Combining multiple source systems
When multiple systems feed into the same model (for example ERP, CRM, and an online store), follow this pattern.
- Create a dedicated staging query per source system:
stg_Sales_ERP,stg_Sales_CRM,stg_Sales_Web. - Ensure each staging query standardizes column names, data types, and key formats.
- Create a consolidating query
stg_Sales_Allthat appends the source-specific staging queries. - Reference
stg_Sales_Allfrom the final fact queryf_Sales.
Note : It is usually better to standardize each source separately and then append them in a single consolidating staging query rather than appending raw, inconsistent tables and trying to fix all issues at once.
5.2 Staging queries for incremental refresh scenarios
In Power BI, incremental refresh policies are set at the table level in the model, but the logic that determines which rows belong to which period is usually implemented in Power Query.
- Use staging queries to filter the source table by date or by partition keys.
- Parameterize date filters based on RangeStart and RangeEnd parameters.
- Keep the filtering logic as close to the source as possible so it folds.
- Reference the staging query in the final fact table where incremental refresh is configured.
let Source = Sql.Database("SQLPROD01", "SalesDW"), SalesTable = Source{[Schema="dbo", Item="Sales"]}[Data], FilteredByDate = Table.SelectRows( SalesTable, each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd ) in FilteredByDate Here, the staging query stg_Sales_Incremental handles the date windowing and is reused by the final fact table.
6. Documentation, parameters, and governance
Well-governed staging queries make large Power Query solutions easier to share and maintain across teams.
6.1 Describing queries with annotations
Use descriptions and comments to document the purpose of each staging query.
- Add a query description in the query properties explaining the source and any critical business decisions.
- Consider a short header comment in the M code using “dummy” steps that do not break folding.
let // stg_CustomersMaster // Purpose: Standardize customer data from CRM and ERP Source = ... in Source 6.2 Using parameters with staging queries
Parameters enable greater flexibility and governance for staging queries.
- Create parameters for server names, database names, and folder paths so they can be changed for dev/test/prod.
- Use parameters for date filters or row limits when developing and testing.
- Reference parameters in staging queries instead of hard-coding environment-specific values.
let Source = Sql.Database(p_ServerName, p_DatabaseName), SalesTable = Source{[Schema="dbo", Item="Sales"]}[Data], LimitedRows = if p_IsDev then Table.FirstN(SalesTable, 10000) else SalesTable in LimitedRows 6.3 Privacy levels and staging
When combining data from multiple sources with different privacy levels, Power Query may insert privacy firewalls that affect performance. A structured staging design helps you see where data is combined and decide how to configure privacy levels appropriately.
Note : Staging queries themselves do not bypass privacy rules; they simply make combinations explicit so that you can evaluate whether the chosen privacy configuration is safe and performant for your scenario.
7. Practical checklist for staging query best practices
The following checklist can be used as a quick review when designing or refactoring staging queries in a real project.
| Area | Checklist item | Yes/No |
|---|---|---|
| Naming & structure | Staging queries use a clear prefix and are grouped in a dedicated folder. | |
| Load settings | All staging queries have load disabled; only facts and dimensions are loaded. | |
| Query folding | Filters and column selection occur early and remain foldable to the source. | |
| Reuse | Downstream queries use “Reference” instead of “Duplicate”. | |
| Multiple sources | Each source has its own staging query, and consolidation happens after standardization. | |
| Incremental refresh | Staging queries implement date or partition filters based on parameters. | |
| Documentation | Each critical staging query has a description and, if needed, comments in M code. | |
| Parameters | Server, database, and folder paths are parameterized rather than hard-coded. |
FAQ
Should every query in Power Query be a staging query?
No. Staging queries are usually created only where they add clear value, such as for large, frequently reused tables or complex source systems. Very small or one-off queries can sometimes be modeled directly without a separate staging layer, but having a consistent pattern for key tables keeps the overall solution easier to manage.
Is it necessary to disable load for all staging queries?
Yes, in most models you should disable load for staging queries because they are intermediate objects. Keeping their load enabled typically inflates file size and refresh duration without adding business value. The exceptions are diagnostic scenarios where you intentionally want to inspect staging outputs temporarily during development.
How many staging layers are too many?
There is no fixed limit, but beyond three or four layers the complexity can become hard to navigate. A common structure is one staging layer for raw cleanup, one business layer for model tables, and optionally a presentation layer for report-specific shapes. If you need more layers, document them clearly and keep naming consistent.
Do staging queries always improve performance?
Staging queries improve performance mainly by reducing duplication of heavy transformations and by keeping foldable operations close to the source. However, they do not guarantee faster refresh if they are poorly designed, non-foldable, or if too many unnecessary staging queries are created. Performance still depends on the data source, network, and model design.
Can I reuse the same staging queries in both Excel and Power BI?
Yes. If you keep source paths and server names parameterized, the same staging logic can often be reused in multiple workbooks or Power BI reports. Exporting and importing queries or using shared templates and dataflows are common strategies to share staging logic across solutions.
추천·관련글
- Fix Inconsistent NMR Integrals: Expert qNMR Troubleshooting Guide
- Reduce High UV-Vis Background Absorbance: Proven Fixes and Best Practices
- How to Fix GC Peak Fronting: Causes, Diagnostics, and Proven Solutions
- Suppress Solvent Peak Interference in NMR: Proven Solvent Suppression Techniques and Settings
- Fix NMR Shimming Failure: Expert Troubleshooting Guide for Sharp, Stable Spectra
- GHS Label Reading: Fix Common Mistakes and Improve Chemical Safety Compliance
data transformation layer
excel power query
power bi query design
power query staging
staging queries best practices
- Get link
- X
- Other Apps