This article explains how to use the Open XML SDK in C# to perform reliable, high-performance batch editing of Excel spreadsheets without automating Excel, focusing on real-world patterns, performance tuning, and maintainable code structure.
1. Why batch edit Excel files with Open XML SDK.
Many teams still rely on Excel workbooks as de facto databases for reporting, pricing, forecasting, and operational logs.
When the number of workbooks grows into the hundreds or thousands, manual editing or macro-based updates becomes fragile and slow.
The Open XML SDK provides a robust, server-friendly way to manipulate Excel documents at the file format level.
1.1 Key advantages over Excel Interop.
- Runs without installing Excel, suitable for servers and containers.
- No desktop session required, so batch jobs can run unattended in services or scheduled tasks.
- Fine-grained control over the workbook structure, including worksheets, styles, shared strings, and tables.
- Good performance on large batches when combined with streaming and careful memory management.
1.2 Common batch editing scenarios.
- Refreshing values in report templates for multiple customers or business units.
- Propagating a new business rule across thousands of price lists or quotation workbooks.
- Fixing a miscalculated column or adding an audit flag to every workbook in a shared folder.
- Injecting data from a database or API into existing spreadsheet templates.
2. Spreadsheet structure fundamentals in Open XML SDK.
To design a batch editing tool, it is important to understand how Excel workbooks are represented in the Open XML file format and how the SDK exposes them.
2.1 Core parts and classes.
An .xlsx file is a ZIP package that contains parts connected by relationships.
The main classes you will work with for Excel spreadsheets are:
| Concept. | Open XML SDK type. | Description. |
|---|---|---|
| Spreadsheet package. | SpreadsheetDocument. | Represents the entire Excel workbook package. |
| Workbook. | WorkbookPart. | Holds the global workbook structure, sheet list, and defined names. |
| Worksheet. | WorksheetPart. | Represents a single sheet and its content, including cells and tables. |
| Sheet data. | SheetData. | Contains rows and cells inside a worksheet. |
| Cell. | Cell. | Stores the cell reference, value, and data type. |
| Shared strings. | SharedStringTablePart. | Optional table used to store unique text values referenced by cells. |
2.2 Cell values and shared strings.
When you batch edit spreadsheets, it is critical to handle cell values correctly.
Excel uses several representations.
- Numeric values are stored directly in CellValue.Text with DataType null or CellValues.Number.
- Text values are often stored as indexes into the shared string table, with DataType set to CellValues.SharedString.
- Boolean, date, and error types have their own conventions and sometimes depend on styles.
For large-scale updates with repeated strings, using the shared string table keeps file size smaller and consistent with typical Excel output.
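The sketch below illustrates how these representations can be resolved when reading a cell. The class and method names (CellTextReader, GetCellText) are hypothetical, and the code deliberately ignores date and boolean conventions, so treat it as a starting point rather than a complete reader.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

// Hypothetical helper: returns the text a cell refers to.
public static class CellTextReader
{
    public static string GetCellText(WorkbookPart workbookPart, Cell cell)
    {
        // The raw text is either the value itself or a shared string index.
        string raw = cell.CellValue?.Text ?? string.Empty;

        bool isSharedString = cell.DataType != null
            && cell.DataType.Value == CellValues.SharedString;

        if (isSharedString && int.TryParse(raw, out int index))
        {
            SharedStringTable sst =
                workbookPart.SharedStringTablePart?.SharedStringTable;
            SharedStringItem item =
                sst?.Elements<SharedStringItem>().ElementAtOrDefault(index);

            // InnerText also covers items stored as rich text runs.
            return item?.InnerText ?? string.Empty;
        }

        // Numbers (and other inline values) are returned as stored.
        return raw;
    }
}
```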
3. Setting up an Open XML SDK batch editing project.
To get started in C#, create a console application or worker service dedicated to batch processing.
3.1 Installing the DocumentFormat.OpenXml package.
Install the official NuGet package.
dotnet add package DocumentFormat.OpenXml
Then import the namespaces you need.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;
3.2 Safely opening workbooks for editing.
The typical pattern for editing a workbook is to open the SpreadsheetDocument in editable mode inside a using block so that it is disposed correctly.
public static void EditWorkbook(string path)
{
    using (SpreadsheetDocument doc = SpreadsheetDocument.Open(path, true))
    {
        WorkbookPart workbookPart = doc.WorkbookPart;
        // Perform edits here.
        workbookPart.Workbook.Save();
    } // Dispose persists changes and closes the package.
}
Note : Open XML SDK manipulates the file directly, so always work on a copy or in a controlled folder when testing new batch logic.
4. A reusable pattern for updating a single cell.
Before implementing batch logic, it is useful to build a robust primitive function that updates a single cell by sheet name and cell reference.
4.1 Locating a worksheet by name.
private static WorksheetPart GetWorksheetPart(WorkbookPart workbookPart, string sheetName)
{
    Sheet sheet = workbookPart.Workbook
        .Descendants<Sheet>()
        .FirstOrDefault(s => s.Name == sheetName);
    if (sheet == null)
        throw new ArgumentException($"Sheet '{sheetName}' not found.");
    return (WorksheetPart)workbookPart.GetPartById(sheet.Id);
}
4.2 Inserting or retrieving a cell.
This helper ensures that the row and cell exist, which is important when batch editing ranges that may be partially empty.
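private static Cell GetOrCreateCell(WorksheetPart worksheetPart, string cellReference)
{
    SheetData sheetData = worksheetPart.Worksheet.GetFirstChild<SheetData>();

    // Extract row index from a reference like "C10".
    string rowIndexText = new string(cellReference.Where(char.IsDigit).ToArray());
    uint rowIndex = uint.Parse(rowIndexText);

    Row row = sheetData.Elements<Row>()
        .FirstOrDefault(r => r.RowIndex == rowIndex);
    if (row == null)
    {
        row = new Row { RowIndex = rowIndex };
        // Keep rows ordered by index; out-of-order rows can trigger repair prompts.
        Row nextRow = sheetData.Elements<Row>()
            .FirstOrDefault(r => r.RowIndex != null && r.RowIndex.Value > rowIndex);
        if (nextRow != null)
            sheetData.InsertBefore(row, nextRow);
        else
            sheetData.Append(row);
    }

    Cell cell = row.Elements<Cell>()
        .FirstOrDefault(c => c.CellReference == cellReference);
    if (cell == null)
    {
        cell = new Cell { CellReference = cellReference };
        // Keep cells ordered by column for compatibility.
        // Compare column letters by length first, then alphabetically,
        // so that "AA" sorts after "Z" (plain string comparison gets this wrong).
        string column = new string(cellReference.Where(char.IsLetter).ToArray());
        Cell refCell = null;
        foreach (Cell existingCell in row.Elements<Cell>())
        {
            string existingColumn = new string(
                (existingCell.CellReference?.Value ?? string.Empty)
                    .Where(char.IsLetter).ToArray());
            if (existingColumn.Length > column.Length ||
                (existingColumn.Length == column.Length &&
                 string.Compare(existingColumn, column, StringComparison.OrdinalIgnoreCase) > 0))
            {
                refCell = existingCell;
                break;
            }
        }
        if (refCell != null)
            row.InsertBefore(cell, refCell);
        else
            row.Append(cell);
    }
    return cell;
}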
4.3 Updating a numeric or shared string cell.
The function below updates a cell with either a double or a string while respecting the shared string table.
public static void UpdateCellValue(
    string filePath,
    string sheetName,
    string cellReference,
    object value)
{
    using (SpreadsheetDocument doc = SpreadsheetDocument.Open(filePath, true))
    {
        WorkbookPart wbPart = doc.WorkbookPart;
        WorksheetPart wsPart = GetWorksheetPart(wbPart, sheetName);
        Cell cell = GetOrCreateCell(wsPart, cellReference);
        if (value is double d)
        {
            cell.DataType = null; // Numeric.
            cell.CellValue = new CellValue(d.ToString(System.Globalization.CultureInfo.InvariantCulture));
        }
        else
        {
            string text = Convert.ToString(value) ?? string.Empty;
            int sharedIndex = InsertSharedStringItem(text, wbPart);
            cell.DataType = CellValues.SharedString;
            cell.CellValue = new CellValue(sharedIndex.ToString());
        }
        wsPart.Worksheet.Save();
        wbPart.Workbook.Save();
    }
}
private static int InsertSharedStringItem(string text, WorkbookPart workbookPart)
{
    SharedStringTablePart sstPart = workbookPart.SharedStringTablePart;
    if (sstPart == null)
        sstPart = workbookPart.AddNewPart<SharedStringTablePart>();
    SharedStringTable sst = sstPart.SharedStringTable ??= new SharedStringTable();
    // Look for an existing text.
    int i = 0;
    foreach (SharedStringItem item in sst.Elements<SharedStringItem>())
    {
        if (item.Text != null && item.Text.Text == text)
            return i;
        i++;
    }
    // Not found: append it; its index equals the number of items scanned.
    sst.AppendChild(new SharedStringItem(new Text(text)));
    sst.Save();
    return i;
}
5. Designing batch editing for many cells and sheets.
With robust primitives in place, the next step is to design an efficient pattern for updating many cells in a single workbook.
5.1 Batch updates using a dictionary of cell references.
A simple but powerful approach is to group updates for a given sheet into a dictionary keyed by cell reference.
public static void BatchUpdateSheet(
    string filePath,
    string sheetName,
    IDictionary<string, object> updates)
{
    using (SpreadsheetDocument doc = SpreadsheetDocument.Open(filePath, true))
    {
        WorkbookPart wbPart = doc.WorkbookPart;
        WorksheetPart wsPart = GetWorksheetPart(wbPart, sheetName);
        // Optional cache for shared strings to reduce lookups.
        SharedStringTablePart sstPart = wbPart.SharedStringTablePart
            ?? wbPart.AddNewPart<SharedStringTablePart>();
        SharedStringTable sst = sstPart.SharedStringTable ??= new SharedStringTable();
        foreach (KeyValuePair<string, object> kvp in updates)
        {
            string cellRef = kvp.Key;
            object value = kvp.Value;
            Cell cell = GetOrCreateCell(wsPart, cellRef);
            if (value is double d)
            {
                cell.DataType = null;
                cell.CellValue = new CellValue(
                    d.ToString(System.Globalization.CultureInfo.InvariantCulture));
            }
            else
            {
                string text = Convert.ToString(value) ?? string.Empty;
                int index = GetOrAddSharedString(text, sst);
                cell.DataType = CellValues.SharedString;
                cell.CellValue = new CellValue(index.ToString());
            }
        }
        sst.Save();
        wsPart.Worksheet.Save();
        wbPart.Workbook.Save();
    }
}
private static int GetOrAddSharedString(string text, SharedStringTable sst)
{
    int i = 0;
    foreach (SharedStringItem item in sst.Elements<SharedStringItem>())
    {
        if (item.Text != null && item.Text.Text == text)
            return i;
        i++;
    }
    sst.AppendChild(new SharedStringItem(new Text(text)));
    return i;
}
In a real-world scenario, you can generate the updates dictionary from a database result set, configuration file, or another spreadsheet.
5.2 Minimizing repeated lookups.
For very large batches, repeatedly scanning the shared string table becomes expensive.
You can improve performance by caching the mapping between string text and index in a Dictionary<string, int> built once per workbook.
Note : When performance profiling shows the shared string search as a hotspot, introduce a dictionary cache that is populated once and reused for all cell updates in the same workbook.
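A minimal sketch of such a cache follows. The SharedStringCache class and its GetOrAdd method are hypothetical names, and the sketch assumes that nothing else modifies the shared string table while the cache is in use.

```csharp
using System;
using System.Collections.Generic;
using DocumentFormat.OpenXml.Spreadsheet;

// Hypothetical cache: maps plain-text shared strings to their table index.
public sealed class SharedStringCache
{
    private readonly SharedStringTable _sst;
    private readonly Dictionary<string, int> _indexByText =
        new Dictionary<string, int>(StringComparer.Ordinal);
    private int _count;

    public SharedStringCache(SharedStringTable sst)
    {
        _sst = sst;
        // Single pass over the existing table, done once per workbook.
        foreach (SharedStringItem item in sst.Elements<SharedStringItem>())
        {
            // Rich-text items have no direct Text child; they are not cached
            // but still occupy an index, so the counter always advances.
            if (item.Text != null && !_indexByText.ContainsKey(item.Text.Text))
                _indexByText[item.Text.Text] = _count;
            _count++;
        }
    }

    public int GetOrAdd(string text)
    {
        if (_indexByText.TryGetValue(text, out int index))
            return index;

        // Unknown text: append it and remember its position.
        _sst.AppendChild(new SharedStringItem(new Text(text)));
        index = _count++;
        _indexByText[text] = index;
        return index;
    }
}
```

In BatchUpdateSheet, one instance could be created right after the table is obtained and used in place of GetOrAddSharedString inside the loop, which turns each lookup from a linear scan into a dictionary access.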
6. Batch editing across many workbooks.
The next dimension of scale is applying the same transformation across dozens or thousands of files in a folder structure.
6.1 Iterating files in a directory tree.
public static void BatchProcessFolder(
    string rootFolder,
    string sheetName,
    IDictionary<string, object> updatesTemplate)
{
    foreach (string file in Directory.EnumerateFiles(
        rootFolder, "*.xlsx", SearchOption.AllDirectories))
    {
        Console.WriteLine($"Processing {file}.");
        IDictionary<string, object> updates = BuildUpdatesForFile(file, updatesTemplate);
        BatchUpdateSheet(file, sheetName, updates);
    }
}

// Example placeholder to customize per workbook.
private static IDictionary<string, object> BuildUpdatesForFile(
    string filePath,
    IDictionary<string, object> template)
{
    // In practice, you might adjust values based on file name, metadata, or external data.
    return new Dictionary<string, object>(template);
}
6.2 Parallelization considerations.
It is possible to use Parallel.ForEach to process files concurrently, but a few points must be respected.
- Each task must open and close its own SpreadsheetDocument.
- Processing multiple files on the same physical disk can become I/O bound, so the optimal degree of parallelism is workload dependent.
- Avoid sharing mutable state between threads unless it is properly synchronized.
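One way to respect these points is sketched below. ParallelBatch and Run are hypothetical names, and the processFile delegate is assumed to open and dispose its own SpreadsheetDocument (for example by calling BatchUpdateSheet). Failures are collected in a thread-safe bag so that one bad file does not stop the run.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelBatch
{
    // Runs processFile for every path and returns the files that failed.
    public static IReadOnlyCollection<(string File, Exception Error)> Run(
        IEnumerable<string> files,
        Action<string> processFile,
        int maxDegreeOfParallelism)
    {
        // ConcurrentBag is safe to add to from several threads at once.
        var failures = new ConcurrentBag<(string File, Exception Error)>();
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = maxDegreeOfParallelism
        };

        Parallel.ForEach(files, options, file =>
        {
            try
            {
                processFile(file);
            }
            catch (Exception ex)
            {
                // Record the failure and continue with the other files.
                failures.Add((file, ex));
            }
        });

        return failures.ToArray();
    }
}
```

A small degree of parallelism is a reasonable first setting to measure against, since the best value depends on disk and CPU characteristics.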
7. DOM versus streaming (SAX) for large spreadsheets.
Open XML SDK offers a DOM-style object model and lower-level streaming APIs such as OpenXmlReader and OpenXmlWriter.
7.1 DOM approach.
- Loads the targeted parts into memory as strongly typed objects.
- Easier to write and maintain for moderate file sizes.
- Suitable for most enterprise batch jobs where sheets are not extremely large.
7.2 Streaming approach.
- Processes rows sequentially using readers and writers.
- Reduces memory usage for very large worksheets.
- More complex to implement because you reconstruct the sheet while streaming through it.
| Scenario. | Recommended approach. | Comment. |
|---|---|---|
| Hundreds of workbooks, each with up to a few tens of thousands of cells edited. | DOM. | Simpler implementation, adequate performance. |
| Single workbook with millions of rows in one sheet. | Streaming (reader and writer). | Avoids loading all rows into memory. |
| Report templates with moderate data size but many styles and formulas. | DOM with targeted updates. | Preserves rich formatting with minimal code. |
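For the streaming case, the read side can be sketched with OpenXmlReader as follows. StreamingRowCounter is a hypothetical name, and the example only counts rows; a full streaming edit would pair the reader with an OpenXmlWriter that writes the modified sheet to a new part.

```csharp
using System;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

public static class StreamingRowCounter
{
    // Walks the worksheet XML element by element without building the DOM.
    public static int CountRows(WorksheetPart worksheetPart)
    {
        int rows = 0;
        using (OpenXmlReader reader = OpenXmlReader.Create(worksheetPart))
        {
            while (reader.Read())
            {
                // Count each row once, at its start element.
                if (reader.IsStartElement && reader.ElementType == typeof(Row))
                    rows++;
            }
        }
        return rows;
    }
}
```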
8. Editing formulas, formatting, and tables in batch.
In many batch operations, you update not only raw values but also formulas and formatting that drive downstream calculations.
8.1 Updating formulas safely.
- Formulas are stored as text in the Cell element, typically in the CellFormula child.
- When adjusting a formula, keep references in A1 style consistent and verify that sheet names are quoted when needed.
- The SDK does not compute formula results. Excel normally recalculates formulas when it opens the workbook, which is why the cached value can be cleared after changing a formula.
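If cached values elsewhere in the workbook may be stale after a batch edit, one option is to ask Excel to recalculate everything on load. The sketch below sets the fullCalcOnLoad flag through the CalculationProperties element; the RecalculationHelper class name is hypothetical, and whether this is needed depends on how the workbook is consumed.

```csharp
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

public static class RecalculationHelper
{
    // Marks the workbook so that Excel performs a full calculation when it opens the file.
    public static void RequestFullCalculationOnLoad(WorkbookPart workbookPart)
    {
        Workbook workbook = workbookPart.Workbook;

        CalculationProperties calc = workbook.CalculationProperties;
        if (calc == null)
        {
            calc = new CalculationProperties();
            // The typed property places the element at its schema position.
            workbook.CalculationProperties = calc;
        }

        calc.FullCalculationOnLoad = true;
    }
}
```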
private static void UpdateFormulaCell(
    WorksheetPart wsPart,
    string cellReference,
    string formula)
{
    Cell cell = GetOrCreateCell(wsPart, cellReference);
    cell.CellFormula = new CellFormula(formula);
    cell.CellValue = null; // Let Excel recalculate.
}
8.2 Updating number formats and styles.
Styles are stored in the WorkbookStylesPart, which contains a Stylesheet.
For batch editing, a pragmatic strategy is to reuse existing style indexes whenever possible.
- Inspect the CellFormat items already in the stylesheet.
- If an appropriate format exists, reuse its index rather than adding many new formats.
- When creating new styles in batch, keep their number small and consistent to avoid style bloat.
Note : Excessive style creation during batch operations can lead to workbooks that Excel warns about due to too many cell formats, so reuse existing styles whenever possible.
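The reuse strategy could look roughly like the sketch below. StyleReuse and GetOrAddCellFormat are hypothetical names. The sketch assumes the stylesheet already contains at least one CellFormat along with default font, fill, and border entries at index 0, and it only matches formats that use those defaults.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Spreadsheet;

public static class StyleReuse
{
    // Returns the style index to assign to Cell.StyleIndex for a number format.
    public static uint GetOrAddCellFormat(Stylesheet stylesheet, uint numberFormatId)
    {
        CellFormats formats = stylesheet.CellFormats;
        if (formats == null || !formats.Elements<CellFormat>().Any())
            throw new InvalidOperationException("Stylesheet has no cell formats to extend.");

        uint index = 0U;
        foreach (CellFormat format in formats.Elements<CellFormat>())
        {
            bool sameNumberFormat = (format.NumberFormatId?.Value ?? 0U) == numberFormatId;
            bool defaultLook = (format.FontId?.Value ?? 0U) == 0U
                && (format.FillId?.Value ?? 0U) == 0U
                && (format.BorderId?.Value ?? 0U) == 0U;

            // Reuse an existing entry instead of appending a duplicate.
            if (sameNumberFormat && defaultLook)
                return index;
            index++;
        }

        formats.Append(new CellFormat
        {
            NumberFormatId = numberFormatId,
            FontId = 0U,
            FillId = 0U,
            BorderId = 0U,
            ApplyNumberFormat = true
        });
        formats.Count = index + 1U;
        return index;
    }
}
```

Built-in number format identifiers such as 4 (#,##0.00) can be passed directly; custom formats would additionally need an entry in the stylesheet's NumberingFormats collection.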
8.3 Preserving tables and structured references.
Excel tables are represented by table parts associated with a worksheet.
When editing data inside a table range, it is usually sufficient to update cells via SheetData without touching the table definition, as long as you stay within the existing boundaries.
If a batch operation expands the logical size of a table, you also need to adjust its reference range.
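Adjusting the range might be sketched as follows. TableRangeHelper and ResizeTable are hypothetical names, and the sketch assumes a table without a totals row, where the auto filter covers the same range as the table itself.

```csharp
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

public static class TableRangeHelper
{
    // Updates the reference of the named table, for example to "A1:D250".
    // Returns false when the worksheet has no table with that name.
    public static bool ResizeTable(
        WorksheetPart worksheetPart,
        string tableName,
        string newReference)
    {
        foreach (TableDefinitionPart tablePart in worksheetPart.TableDefinitionParts)
        {
            Table table = tablePart.Table;
            if (table?.Name?.Value != tableName)
                continue;

            table.Reference = newReference;

            // Assumption: no totals row, so the filter range matches the table range.
            if (table.AutoFilter != null)
                table.AutoFilter.Reference = newReference;

            return true;
        }
        return false;
    }
}
```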
9. Robustness and validation strategies.
Batch editing at scale increases the impact of subtle mistakes, so defensive programming and validation are essential.
9.1 Defensive coding practices.
- Validate that required sheets exist before applying updates and log missing sheets clearly.
- Guard against invalid cell references and handle numeric parsing errors explicitly.
- Wrap each file processing step in a try/catch block and write detailed log entries, including stack traces, file names, and sheet names.
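As an illustration of guarding against invalid cell references, the sketch below parses an A1-style reference without throwing. CellReferenceParser is a hypothetical name. The limits are Excel's worksheet bounds (column XFD, row 1048576), and lowercase letters, absolute references ($A$1), and ranges are rejected by design.

```csharp
using System;

public static class CellReferenceParser
{
    private const int MaxColumn = 16384;   // Column XFD.
    private const int MaxRow = 1048576;

    // Parses references such as "C10". The out values are meaningful only on success.
    public static bool TryParse(string reference, out int column, out int row)
    {
        column = 0;
        row = 0;
        if (string.IsNullOrEmpty(reference))
            return false;

        // Column letters: base-26 with A = 1.
        int i = 0;
        while (i < reference.Length && reference[i] >= 'A' && reference[i] <= 'Z')
        {
            column = column * 26 + (reference[i] - 'A' + 1);
            if (column > MaxColumn)
                return false;
            i++;
        }

        // Require at least one letter followed by at least one digit.
        if (i == 0 || i == reference.Length)
            return false;

        // Row numbers start at 1 and have no leading zero.
        if (reference[i] == '0')
            return false;

        for (; i < reference.Length; i++)
        {
            char ch = reference[i];
            if (ch < '0' || ch > '9')
                return false;
            row = row * 10 + (ch - '0');
            if (row > MaxRow)
                return false;
        }
        return true;
    }
}
```

Running such a check over the keys of an updates dictionary before opening any workbook lets a batch fail fast with a clear message instead of a parsing exception deep inside the cell helper.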
9.2 Validating output workbooks.
- As part of a test suite, open a sample of processed files in Excel and check for repair messages.
- Optionally add a lightweight automated check that opens the package with SpreadsheetDocument.Open in read-only mode after edits to detect structural problems.
- For critical workflows, build regression tests that compare key cell values before and after the batch job using small sample files.
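The read-only check can go one step further by running the SDK's schema validator, which is an addition to the approach described above rather than part of it. In this sketch, WorkbookValidation and GetValidationErrors are hypothetical names; OpenXmlValidator comes from the DocumentFormat.OpenXml.Validation namespace. A clean result does not guarantee that Excel opens the file without repairs, so it complements manual spot checks.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;

public static class WorkbookValidation
{
    // Opens the workbook read-only and returns one message per schema violation.
    public static IList<string> GetValidationErrors(string path)
    {
        using (SpreadsheetDocument doc = SpreadsheetDocument.Open(path, false))
        {
            var validator = new OpenXmlValidator();
            return validator.Validate(doc)
                .Select(e => $"{e.Part?.Uri}: {e.Description}")
                .ToList();
        }
    }
}
```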
9.3 Versioning and rollback.
A robust batch editing pipeline should include backup and rollback procedures.
- Save processed files to a separate output folder instead of overwriting the originals.
- Include timestamps or version identifiers in file names to make rollback straightforward.
- Record which batch configuration and code version were used for each run.
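The first two points can be sketched with plain file operations. OutputCopy and CreateTimestampedCopy are hypothetical names, and the file name pattern is only one possible convention. The batch job would then edit the returned copy and leave the original untouched.

```csharp
using System;
using System.IO;

public static class OutputCopy
{
    // Copies the source workbook into outputFolder with a timestamp suffix
    // and returns the path of the copy.
    public static string CreateTimestampedCopy(
        string sourcePath,
        string outputFolder,
        DateTime timestampUtc)
    {
        Directory.CreateDirectory(outputFolder);

        string name = Path.GetFileNameWithoutExtension(sourcePath);
        string extension = Path.GetExtension(sourcePath);
        string targetPath = Path.Combine(
            outputFolder,
            $"{name}_{timestampUtc:yyyyMMdd_HHmmss}{extension}");

        // overwrite: false makes an accidental rerun fail instead of replacing output.
        File.Copy(sourcePath, targetPath, overwrite: false);
        return targetPath;
    }
}
```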
10. Putting it all together as a reusable batch framework.
Once you confirm the basic approach on a small pilot set, you can generalize the batch editing logic into a framework that your team can reuse across projects.
10.1 Suggested architecture.
- Configuration layer. Describes which folders to scan, which sheets to target, and which cells or ranges to update.
- Mapping layer. Converts business data models into cell updates dictionaries per workbook and per sheet.
- Open XML layer. Implements the low-level operations such as locating sheets, managing shared strings, and saving workbooks.
- Orchestration layer. Handles iteration over files, logging, error reporting, and optional parallelization.
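As one possible shape for the configuration layer, the sketch below loads a JSON description of folders, sheets, and cell updates with System.Text.Json. All type and property names (BatchConfig, SheetUpdateConfig, RootFolder, Sheets, SheetName, Cells) are hypothetical, and values are kept as strings for simplicity; the mapping layer would convert them into typed cell updates.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Hypothetical configuration model for one batch run.
public sealed class SheetUpdateConfig
{
    public string SheetName { get; set; } = "";
    // Cell reference to raw value, for example "B2" to "1.25".
    public Dictionary<string, string> Cells { get; set; } = new Dictionary<string, string>();
}

public sealed class BatchConfig
{
    public string RootFolder { get; set; } = "";
    public List<SheetUpdateConfig> Sheets { get; set; } = new List<SheetUpdateConfig>();
}

public static class BatchConfigLoader
{
    public static BatchConfig Parse(string json)
    {
        var options = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };
        // Deserialize returns null for the JSON literal "null".
        return JsonSerializer.Deserialize<BatchConfig>(json, options)
            ?? throw new InvalidDataException("Configuration is empty.");
    }
}
```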
10.2 Practical checklist before production.
| Item. | Question. | Status. |
|---|---|---|
| Backups. | Are original workbooks preserved in a safe location? | Planned or completed. |
| Logging. | Does the process log success and failure per file with enough detail? | Planned or completed. |
| Performance tests. | Has the batch been tested at realistic scale with measured timings? | Planned or completed. |
| Validation. | Are representative outputs validated by business users or automated checks? | Planned or completed. |
| Rollback. | Is there a documented procedure to revert to previous versions of workbooks? | Planned or completed. |
FAQ
Can Open XML SDK batch edit legacy .xls files?
No.
The Open XML SDK works with Office Open XML formats such as .xlsx, .xlsm, .xltx, and .xltm.
Legacy .xls files must be converted to a supported format before you can batch edit them with the SDK.
Is Open XML SDK faster than Excel Interop for batch editing?
For server-side and large-scale scenarios, Open XML SDK is typically faster and more robust.
It does not incur the overhead of starting Excel instances and can operate directly on files.
However, actual performance depends on the complexity of your formatting and the efficiency of your code.
Do I need to recalculate formulas after batch editing values?
The SDK does not evaluate formulas.
In most cases, Excel recalculates formulas automatically when the workbook is opened by a user or a compatible application.
If you rely on stored values without opening the workbook, you would need a separate calculation engine.
How can I avoid corrupting workbooks when making complex changes?
Always manipulate parts using the SDK rather than manually editing XML strings.
Validate that sheets exist before editing, keep cells ordered by reference, and reuse styles and shared strings instead of generating inconsistent structures.
Include automated tests that open each processed workbook with the SDK itself to catch structural issues early.
Is it safe to run Open XML SDK batch jobs in parallel?
It is safe as long as each thread works on different files and creates its own SpreadsheetDocument instances.
Ensure your logging and configuration handling are thread safe and tune the degree of parallelism to balance CPU and I/O usage.