Boost Power Query Performance: Import Large CSV Files with Table.Buffer in Excel and Power BI

This article explains how to import very large CSV files with Power Query and dramatically improve refresh speed and stability by using Table.Buffer and related buffering techniques in Excel and Power BI.

1. Why large CSV imports become slow in Power Query

When a CSV file is small, Power Query usually feels instant and smooth.

As soon as file size grows into hundreds of megabytes or millions of rows, users often face symptoms such as:

  • Extremely slow refresh times.
  • Queries that appear to hang on merge or group operations.
  • High memory usage and occasional out-of-memory errors.
  • Repeated re-reading of the same CSV file during a single refresh.

The core reason is how the Power Query engine evaluates steps: it uses lazy evaluation and streams data from the source. This is efficient in many scenarios, but when a later step repeatedly enumerates the same table (for example, through merges, groupings, custom functions, or repeated references), Power Query may read the same CSV data multiple times.

Table.Buffer and List.Buffer allow you to trade memory for speed. They force Power Query to read a table or list once, store it in memory, and then let downstream steps reuse the in-memory buffer instead of streaming from the CSV source again.

2. How Power Query evaluates steps and why buffering matters

2.1 Lazy evaluation and streaming

Power Query follows a lazy evaluation model.

Key characteristics include:

  • Each step is not fully computed immediately but only when needed by a later step.
  • If a later step references an earlier step multiple times, the earlier step may be recomputed multiple times.
  • For file-based sources like CSV, recomputation usually means re-reading the file from disk and re-parsing it.

This design makes simple transformations efficient but can become a bottleneck when the same source is referenced many times, such as in:

  • Star-schema style models where a large fact table is merged with several dimension tables multiple times.
  • Custom functions that repeatedly consume the same large table.
  • Complex conditional logic that branches and recombines tables (a sketch of this pattern follows the list).
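
As a concrete illustration of the last pattern, here is a minimal sketch that branches one CSV source into two filtered tables and recombines them; the file path and the Amount column are placeholders. Without buffering, each branch can enumerate the source independently, which for a CSV means reading and parsing the file more than once in a single refresh.

let
    Source = Csv.Document(
        File.Contents("C:\Data\LargeFile.csv"), // placeholder path
        [Delimiter = ",", Encoding = 65001, QuoteStyle = QuoteStyle.Csv]
    ),
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    Typed = Table.TransformColumnTypes(Promoted, {{"Amount", type number}}),
    // Both branches enumerate Typed; without Table.Buffer, each branch can
    // trigger its own pass over the CSV file.
    HighValue = Table.SelectRows(Typed, each [Amount] > 1000),
    LowValue = Table.SelectRows(Typed, each [Amount] <= 1000),
    Combined = Table.Combine({HighValue, LowValue})
in
    Combined

Wrapping Typed in Table.Buffer and pointing both branches at the buffered step is the remedy discussed in the rest of this article.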

2.2 Query folding vs. non-folding operations

For database sources, Power Query can often push transformations back to the source using query folding, meaning the work is done by the database engine.

However, for CSV files, there is no underlying query engine to fold to.

Almost all transformations are executed inside the Power Query engine itself, so repeated enumerations are especially expensive.

Buffering becomes a critical tool to avoid redundant work in these scenarios.

3. What Table.Buffer and List.Buffer actually do

3.1 Table.Buffer

Table.Buffer takes a table expression and returns an in-memory buffered copy of that table.

BufferedTable = Table.Buffer(SourceTable)

Important characteristics.

  • The source table is fully read and materialized in memory at the moment of buffering.
  • Future steps that reference the buffered table reuse the in-memory data instead of re-reading the CSV file.
  • Table.Buffer is most useful immediately after heavy or expensive steps that would otherwise be recomputed (a short sketch follows this list).
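
As a minimal sketch of that last point, SomeTable stands in for any earlier step, and the sort represents an expensive operation whose result you want to compute only once:

let
    // An expensive operation (here a sort) whose result is reused downstream.
    Sorted = Table.Sort(SomeTable, {{"Date", Order.Ascending}}),
    // Buffer immediately after it so later steps reuse the in-memory result.
    Buffered = Table.Buffer(Sorted)
in
    Buffered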

3.2 List.Buffer

List.Buffer works the same way but for list objects.

It is useful when working with list-based operations such as:

  • List.Contains, List.Select, List.Transform used repeatedly inside custom columns.
  • Lookup-list scenarios where a list is used as a filter or membership test many times.

BufferedList = List.Buffer(SourceList)

3.3 Trade-offs of buffering

Buffering is not free.

Key trade-offs include:

  • Increased memory usage because the entire table or list is loaded into RAM.
  • Potentially slower first-time evaluation of the buffered step because it must materialize all rows at once.
  • Large buffers may cause memory pressure or refresh failures on low-memory machines.

Note: Buffer only where it solves a real performance problem, and only on tables or lists that are reused multiple times. Buffering every step usually makes performance worse, not better.

4. When buffering helps vs. when it hurts

4.1 Good candidates for Table.Buffer

Use Table.Buffer in the following patterns.

  • A large CSV fact table that is merged with multiple dimension tables.
  • A table that is fed into a custom function many times (see the sketch after this list).
  • A table used as the base for multiple branching transformations, where each branch recombines later.
  • A lookup table referenced repeatedly through Table.Join or Table.NestedJoin.
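
For the custom-function case, here is a minimal sketch; the FxRates worksheet table, the Currency column, and the SalesStaging query are hypothetical names used only to illustrate the pattern. The lookup table is buffered once, so the function does not rebuild it for every row it is applied to.

let
    // Hypothetical exchange-rate lookup loaded from a worksheet table named "FxRates".
    // Buffering it once means GetRate reuses the in-memory copy for every row.
    Rates = Table.Buffer(Excel.CurrentWorkbook(){[Name = "FxRates"]}[Content]),
    GetRate = (currency as text) as number =>
        List.First(Table.SelectRows(Rates, each [Currency] = currency)[Rate], 1),
    // SalesStaging is an assumed staging query; Currency is an assumed column.
    WithRate = Table.AddColumn(SalesStaging, "Rate", each GetRate([Currency]), type number)
in
    WithRate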

4.2 Cases where buffering is harmful or unnecessary

Do not buffer in the following cases.

  • Small tables where performance is already acceptable.
  • Single-use tables that are referenced only once later in the query.
  • Early steps right at the CSV source if you later filter down to a tiny subset of rows.
  • Scenarios where the user machine has very limited RAM and the CSV is extremely large.

Note: The earlier you buffer in the query, the more data you load into memory. Whenever possible, filter and remove unnecessary columns before buffering to reduce the size of the in-memory table.

5. Step-by-step: importing a large CSV with Table.Buffer

5.1 Base query structure

Assume you have a large CSV file with millions of rows.

A typical basic query created by the UI looks like this.

let
    Source = Csv.Document(
        File.Contents("C:\Data\LargeFile.csv"),
        [
            Delimiter = ",",
            Columns = 15,
            Encoding = 65001,
            QuoteStyle = QuoteStyle.Csv
        ]
    ),
    PromoteHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    ChangedTypes = Table.TransformColumnTypes(
        PromoteHeaders,
        {
            {"Date", type date},
            {"CustomerID", Int64.Type},
            {"Product", type text},
            {"Amount", type number}
        }
    )
in
    ChangedTypes

If this table is later merged with several other queries, Power Query may repeatedly walk through the CSV content, which slows down refresh dramatically.

5.2 Introducing a buffered staging query

A robust pattern is to separate your large CSV into a staging query and then reference it from other queries.

  1. Create a staging query that does only the necessary cleansing.
  2. Apply Table.Buffer at the staging output.
  3. Reference the buffered staging table from fact and dimension queries.

Example staging query with buffer.

let
    Source = Csv.Document(
        File.Contents("C:\Data\LargeFile.csv"),
        [
            Delimiter = ",",
            Columns = 15,
            Encoding = 65001,
            QuoteStyle = QuoteStyle.Csv
        ]
    ),
    PromoteHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    RemovedOtherColumns = Table.SelectColumns(
        PromoteHeaders,
        {"Date", "CustomerID", "Product", "Amount"}
    ),
    // [Date] is still text at this point, so convert it for the comparison.
    FilteredRows = Table.SelectRows(
        RemovedOtherColumns,
        each Date.FromText([Date]) >= #date(2020, 1, 1)
    ),
    ChangedTypes = Table.TransformColumnTypes(
        FilteredRows,
        {
            {"Date", type date},
            {"CustomerID", Int64.Type},
            {"Product", type text},
            {"Amount", type number}
        }
    ),
    BufferedFact = Table.Buffer(ChangedTypes)
in
    BufferedFact

Now other queries should reference this staging query (named BufferedFact in the examples that follow) instead of the raw CSV.

let
    FactSales = BufferedFact,
    Grouped = Table.Group(
        FactSales,
        {"CustomerID"},
        {{"Total Amount", each List.Sum([Amount]), type number}}
    )
in
    Grouped

Because FactSales is already buffered, the grouping step reads the in-memory table only once.

5.3 Buffering lookup tables

If you have a relatively small lookup table used in multiple joins, buffering that table can also improve merge performance.

let
    LookupSource = Excel.CurrentWorkbook(){[Name = "DimCustomer"]}[Content],
    ChangedTypes = Table.TransformColumnTypes(
        LookupSource,
        {
            {"CustomerID", Int64.Type},
            {"CustomerName", type text},
            {"Region", type text}
        }
    ),
    BufferedCustomers = Table.Buffer(ChangedTypes)
in
    BufferedCustomers

The fact query then merges with BufferedCustomers instead of recomputing the lookup steps each time.

6. Advanced patterns for large CSV imports

6.1 Using buffering with Table.Join and Table.NestedJoin

Joins are common performance hotspots when large CSV tables are involved.

Recommended pattern.

  1. Buffer the larger side of the join after necessary filtering and type conversions.
  2. Optionally buffer the smaller side as well if it is referenced in many queries.
  3. Perform the join against the buffered tables.

let
    FactSales = BufferedFact,
    DimCustomer = BufferedCustomers,
    Merged = Table.NestedJoin(
        FactSales, {"CustomerID"},
        DimCustomer, {"CustomerID"},
        "Customer",
        JoinKind.LeftOuter
    ),
    Expanded = Table.ExpandTableColumn(
        Merged,
        "Customer",
        {"CustomerName", "Region"},
        {"CustomerName", "Region"}
    )
in
    Expanded

6.2 Buffering for List.Contains and list-based checks

If you build a list from a table and repeatedly check membership with List.Contains, you should buffer the list once.

let
    BlacklistTable = Excel.CurrentWorkbook(){[Name = "Blacklist"]}[Content],
    BlacklistList = List.Buffer(BlacklistTable[CustomerID]),
    AddFlag = Table.AddColumn(
        BufferedFact,
        "IsBlacklisted",
        each List.Contains(BlacklistList, [CustomerID]),
        type logical
    )
in
    AddFlag

Without List.Buffer, List.Contains could force Power Query to reconstruct the list for each row, which is extremely slow on large datasets.

6.3 Combining buffering with parameterized file paths

For reusable solutions, store the file path and other settings in parameters or configuration tables.

Buffering still applies; you simply insert Table.Buffer at the right point in the query chain.
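
As a minimal sketch, assume a text parameter query named FilePathParameter that returns the path as a single value:

// Parameter query, e.g. a query named "FilePathParameter" whose whole body is:
"C:\Data\LargeFile.csv"

The buffered staging query then reads the parameter instead of a hard-coded path:

let
    Source = Csv.Document(
        File.Contents(FilePathParameter),
        [Delimiter = ",", Encoding = 65001, QuoteStyle = QuoteStyle.Csv]
    ),
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    KeepColumns = Table.SelectColumns(Promoted, {"Date", "CustomerID", "Product", "Amount"}),
    Typed = Table.TransformColumnTypes(
        KeepColumns,
        {{"Date", type date}, {"CustomerID", Int64.Type}, {"Product", type text}, {"Amount", type number}}
    ),
    BufferedFact = Table.Buffer(Typed)
in
    BufferedFact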

7. Measuring performance and verifying the impact of buffering

7.1 Using Query Diagnostics (Power Query editor)

To validate whether Table.Buffer actually improves performance, use Query Diagnostics.

  1. Open Power Query Editor.
  2. On the Tools or Diagnostics menu (depending on the version), choose Start Diagnostics.
  3. Refresh the target query.
  4. Stop Diagnostics and review the generated diagnostic queries.

Compare results between buffered and non-buffered versions of the same query, focusing on:

  • Total duration.
  • Number of data source reads (file reads).
  • Memory consumption if visible through external monitoring tools.

7.2 Simple before/after timing

If Diagnostics is not available or you need a quick check, measure refresh duration.

  1. Record refresh time before introducing Table.Buffer.
  2. Apply buffering to the suspected bottleneck tables.
  3. Refresh again and compare times.

Note: Always change one thing at a time when tuning performance. If you introduce multiple changes at once, it becomes difficult to know which change actually helped or hurt.

8. Typical pitfalls and troubleshooting large CSV imports

8.1 Buffering too early in the pipeline

One of the most common mistakes is buffering immediately after reading the CSV, before applying any filters or column pruning.

This forces unnecessary rows and columns into memory, increasing both memory pressure and refresh time.

Instead, follow this order.

  1. Read from CSV.
  2. Remove unwanted columns.
  3. Filter out irrelevant rows.
  4. Apply basic type conversions.
  5. Only then apply Table.Buffer.

8.2 Buffering intermediate steps that are not reused

Buffering a step that is used only once adds overhead with no benefit.

Use buffering primarily on:

  • Final outputs of staging queries referenced by multiple dependent queries.
  • Lookup or dimension tables used repeatedly across merges and calculations.

8.3 Out-of-memory or crash scenarios

If your CSV is extremely large and the machine has limited RAM, buffering may cause failures.

Mitigation strategies include:

  • Filtering the data to only required date ranges or subsets before buffering.
  • Splitting the CSV into multiple smaller files and loading only relevant subsets.
  • Upgrading to a machine with more RAM or moving the data into a relational database and using query folding instead.

8.4 Incorrect assumption that buffering always speeds things up

Table.Buffer is a targeted performance tool, not a universal accelerator.

Always verify with timing or diagnostics that buffering produces a net benefit for your specific scenario.

9. Practical checklist for large CSV imports with Power Query

The following table summarizes best practices for handling large CSV files using buffering.

Scenario | Recommended buffering strategy | Notes
Single large CSV used in multiple queries | Create a staging query, clean and filter, then apply Table.Buffer at the end. | Reference the staging query from all dependent queries.
Large fact table joined to several lookup tables | Buffer the fact table and key lookup tables after type conversions and filters. | Perform joins against buffered tables to avoid repeated scans.
Membership checks using List.Contains | Build a list once and wrap it in List.Buffer. | Use the buffered list in custom columns or filters.
Complex branching query logic | Buffer the branch output that feeds multiple downstream branches. | Prevents recomputation of expensive branches.
Limited memory environment | Buffer only after aggressive filtering and column reduction. | Monitor memory usage and avoid buffering extremely large tables.

10. Implementation template for reusable buffered CSV imports

Below is a template that you can adapt for your own large CSV imports with Power Query.

let
    // Parameters
    FilePath = "C:\Data\LargeFile.csv",
    StartDate = #date(2020, 1, 1),

    // 1. Read CSV
    Source = Csv.Document(
        File.Contents(FilePath),
        [
            Delimiter = ",",
            Columns = 15,
            Encoding = 65001,
            QuoteStyle = QuoteStyle.Csv
        ]
    ),

    // 2. Promote headers
    PromoteHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),

    // 3. Keep only required columns
    KeepColumns = Table.SelectColumns(
        PromoteHeaders,
        {"Date", "CustomerID", "Product", "Amount"}
    ),

    // 4. Filter rows early ([Date] is still text here, so convert it for the comparison)
    FilteredRows = Table.SelectRows(
        KeepColumns,
        each Date.FromText([Date]) >= StartDate
    ),

    // 5. Set data types
    Typed = Table.TransformColumnTypes(
        FilteredRows,
        {
            {"Date", type date},
            {"CustomerID", Int64.Type},
            {"Product", type text},
            {"Amount", type number}
        }
    ),

    // 6. Buffer the cleaned table
    BufferedFact = Table.Buffer(Typed)
in
    BufferedFact

From this point on, all dependent queries should reference BufferedFact as their source. This pattern ensures that the CSV file is read and parsed once per refresh, and all downstream operations work against the in-memory buffer.

FAQ

Should I always use Table.Buffer when importing large CSV files?

No. Table.Buffer is most effective when the same table is reused multiple times in merges, custom functions, or branching logic.

If the table is used only once in a simple pipeline, buffering often adds overhead without improving performance.

Where is the best place in the query to apply Table.Buffer?

Apply Table.Buffer after you have removed unnecessary columns, filtered out irrelevant rows, and set key data types.

This minimizes the size of the buffered table and reduces memory usage while still avoiding repeated reads of the CSV file.

Can buffering fix every performance problem with large CSV imports?

No. Buffering addresses repeated enumeration of the same data.

If your performance issues come from inefficient transformations, poorly designed joins, or insufficient hardware resources, you will need additional optimization steps such as better filtering, schema redesign, or hardware upgrades.

Is there a risk of running out of memory when using Table.Buffer?

Yes. Because buffering loads the entire table into memory, very large CSV files on low-RAM machines can cause out-of-memory errors or slow paging.

Mitigate this risk by filtering aggressively before buffering, reducing the number of columns, and testing on realistic datasets.

Does buffering behave differently in Excel and Power BI?

The underlying Power Query engine is essentially the same, so the buffering behavior is similar.

However, overall performance will still depend on available memory, data model design, and the refresh pipeline in each environment.
