- Get link
- X
- Other Apps
- Get link
- X
- Other Apps
This article explains how to implement anti join and semi join in Excel using both Power Query and formulas so that you can perform professional grade data comparison, data cleansing, and list management without needing a separate database system.
1. Conceptual overview of anti join and semi join
Before implementing anti join and semi join in Excel it is important to understand what these operations mean at a relational level.
1.1 What is an anti join.
An anti join returns rows from one table that have no matching row in another table based on a join key.
- Input. Two tables, typically called a left table and a right table.
- Condition. Rows are compared using one or more key columns, for example CustomerID, ProductCode, or a composite key.
- Result. The output contains only rows from the left table for which no matching key value exists in the right table.
In business terms an anti join answers questions such as “Which customers in this list are not yet in the master table.” or “Which SKUs in file A do not exist in file B.”.
1.2 What is a semi join.
A semi join returns rows from one table where a matching row exists in another table but it keeps only the columns from the first table.
- Input. The same as an anti join, a left table and a right table.
- Condition. Matching is based on one or more key columns.
- Result. The output contains rows from the left table where at least one matching key exists in the right table. Columns from the right table are not included.
In practical terms a semi join answers questions such as “Which orders belong to customers that appear in this VIP list.” or “Which employees are in both the HR system and the security access list.”.
1.3 How anti join and semi join differ from common joins.
Excel users are usually familiar with inner joins and left joins through lookup formulas and Power Query merges. Anti and semi joins behave differently.
- Inner join. Returns matching rows from both tables with columns from both sides.
- Left outer join. Returns all rows from the left table and matching rows from the right table when they exist, typically used in VLOOKUP or XLOOKUP patterns.
- Left anti join. Returns only non matching rows from the left table with no columns from the right table.
- Left semi join. Returns only rows from the left table that have at least one match in the right table, again without columns from the right table.
This distinction is important because anti join and semi join are usually used for validation, filtering, and exception handling rather than for adding extra columns.
2. Typical use cases for anti join and semi join in Excel
Anti join and semi join provide robust solutions for common data preparation scenarios in Excel.
| Scenario | Recommended join | Description |
|---|---|---|
| Find items in list A that are missing from list B. | Left anti join. | Detect customers not yet on the master list, products not registered in ERP, or IDs missing from a reference table. |
| Keep only rows in list A that are approved or whitelisted in list B. | Left semi join. | Filter to customers in a whitelist, employees with valid security clearance, or invoices approved for payment. |
| Clean up obsolete entries. | Left anti join. | Identify codes, accounts, or vendors that no longer exist in the master system for potential removal. |
| Targeted marketing or reporting. | Left semi join. | Extract customers that match a segment list, region list, or campaign target list. |
| Reconciling two systems. | Both anti and semi joins. | Use semi joins to extract overlaps and anti joins to identify discrepancies between systems. |
Note : When using anti join or semi join for reconciliation always confirm that the join keys are unique or that the join logic is carefully designed for one to many relationships to avoid misleading results.
3. Anti join and semi join with Excel Power Query
Power Query, called Get and Transform Data in newer Excel versions, provides built in support for anti and semi joins via the Merge Queries dialog.
3.1 Preparing tables for Power Query.
To get reliable results every source range should be converted into an Excel Table.
- Select a data range.
- Press Ctrl+T to convert it into a table and give it a meaningful name, such as tblLeft or tblCustomers.
- Repeat for the right hand table, for example tblRight or tblMaster.
Using Excel Tables ensures that Power Query sees dynamic ranges and that column names remain stable even when new rows are added.
3.2 Loading tables into Power Query.
- Select any cell in tblLeft.
- On the Data tab choose From Table or Range to create a query.
- Repeat for tblRight.
- In the Queries pane you will now see two queries, for example Customers and MasterList.
3.3 Performing a left anti join in Power Query.
Assume you want all rows from Customers that do not exist in MasterList based on the CustomerID field. The steps are as follows.
- In the Power Query Editor select the Customers query.
- On the Home tab select Merge Queries, then choose either Merge Queries or Merge Queries as New depending on whether you want a new query.
- In the merge dialog set:
- Top table. Customers.
- Bottom table. MasterList.
- Join columns. select CustomerID in both tables, and if using multiple keys press Ctrl while clicking additional key columns.
- Join kind. choose Left Anti.
- Confirm with OK.
The result of a left anti join shows only customers that have no matching CustomerID in the MasterList table. No additional columns are added because the right table is only used as a filter.
Note : A left anti join is extremely useful before importing or creating new master data, because it reveals which records are new and need to be created rather than updated.
3.4 Performing a left semi join in Power Query.
For a semi join you want customers that do exist in the master table. The procedure is nearly identical but the join kind changes.
- Again choose Merge Queries with Customers as the top table and MasterList as the bottom table.
- Select the same key column or set of columns.
- In Join kind choose Left Semi.
- Confirm with OK.
The result of a left semi join contains only customers that have a matching record in MasterList. Columns from MasterList are not added, because semi join retains only the left table columns. If you need extra columns you would use a regular left outer join instead.
3.5 Right anti and right semi joins.
Power Query also provides Right Anti and Right Semi joins. They are symmetrical versions.
- Right Anti. Returns rows from the right table that are not matched in the left table.
- Right Semi. Returns rows from the right table that have at least one match in the left table.
These are useful when your natural starting point is the right table, for example when you are checking which entries in the master table are no longer present in a new transaction file.
3.6 Example of anti and semi joins in Power Query.
Consider the following simplified example of two tables.
| Customers (left) | MasterList (right) | ||
|---|---|---|---|
| CustomerID | Name | CustomerID | Status |
| C001 | Alpha Co. | C001 | Active |
| C002 | Beta Ltd. | C003 | Active |
| C003 | Gamma Inc. | C004 | Inactive |
| C004 | Delta AG | C005 | Active |
Results.
- Left Anti on CustomerID. Returns customers whose ID does not exist in MasterList. In this example the result is C002 because C001, C003, and C004 all have matches.
- Left Semi on CustomerID. Returns customers whose ID exists in MasterList. In this example C001, C003, and C004 are returned.
4. Implementing anti join and semi join with Excel formulas
If Power Query is not available or you prefer a worksheet based solution you can implement anti and semi joins using lookup formulas. The exact approach depends on whether your Excel version supports dynamic arrays.
4.1 Core idea for formula based anti and semi joins.
The essential logic is always to test whether a key value is found in another range. On that basis you either keep or exclude each row.
- Anti join. Keep rows from the left table where the key is not found in the right table.
- Semi join. Keep rows from the left table where the key is found in the right table.
Lookup functions such as MATCH, XLOOKUP, and COUNTIF, combined with IF, FILTER, and logical expressions, implement these rules efficiently.
4.2 Anti join with legacy formulas using MATCH and ISNA.
This pattern works in Excel versions that do not support dynamic arrays such as Excel 2016 or earlier.
- Assume table A with key values in column A and table B with key values in column E.
- In cell C2 of table A enter the following formula.
=IF(ISNA(MATCH(A2,$E$2:$E$1000,0)),"Not in B","In B") Explanation.
- The MATCH function searches for A2 in the key column of table B.
- If MATCH cannot find a match it returns an error, which ISNA converts to TRUE.
- The IF function labels the row as “Not in B” or “In B” accordingly.
To create a pure anti join result filter table A on the label “Not in B” or use an advanced filter to copy those rows to a separate sheet.
4.3 Semi join with legacy formulas using MATCH and ISNA.
For a semi join you simply reverse the logic to keep rows that do have a match.
=IF(ISNA(MATCH(A2,$E$2:$E$1000,0)),"Exclude","Include") After filling the formula down filter table A on “Include” and you obtain a semi join equivalent, because only rows with a match in table B remain visible.
Note : With legacy formulas always lock the lookup range with absolute references such as $E$2:$E$1000 so that the range does not shift when copying formulas down.
4.4 Anti join and semi join with dynamic array formulas.
In Excel for Microsoft 365 and Excel 2021 or later you can spill entire filtered tables using the FILTER function. This allows a compact and robust definition of anti join and semi join.
4.4.1 Anti join with FILTER and MATCH.
Assume two Excel Tables named tblA and tblB, both containing a column Key.
=FILTER(tblA, ISNA(MATCH(tblA[Key], tblB[Key], 0))) This formula returns all rows from tblA whose Key value is not found in tblB, which is a left anti join.
4.4.2 Semi join with FILTER and MATCH.
To return only matching rows from tblA use the opposite condition.
=FILTER(tblA, ISNUMBER(MATCH(tblA[Key], tblB[Key], 0))) This formula returns only rows from tblA that have at least one match in tblB, implementing a left semi join.
4.4.3 Using XLOOKUP with FILTER.
When keys are not unique or you prefer XLOOKUP the logic is similar. For an anti join.
=FILTER(tblA, ISERROR(XLOOKUP(tblA[Key], tblB[Key], tblB[Key]))) For a semi join.
=FILTER(tblA, NOT(ISERROR(XLOOKUP(tblA[Key], tblB[Key], tblB[Key])))) In each case XLOOKUP attempts to find the key in tblB. The error state signals missing keys, which drives the filter condition.
4.5 Handling multiple key columns with formulas.
Real world joins often rely on more than one column, for example a combination of CustomerID and Country or ProductCode and Plant. You can construct a composite key for formula based joins.
4.5.1 Concatenating key columns.
Add computed columns in both tables using a delimiter that does not appear in the raw data.
=[@CustomerID] & "|" & [@Country] After adding this composite key in both tables you can use the same MATCH or XLOOKUP patterns as before but referencing the concatenated column.
4.5.2 Using TEXTJOIN and LET for improved maintainability.
In dynamic array enabled Excel you can use LET and TEXTJOIN to keep formulas compact.
=LET( leftKey, tblA[CustomerID] & "|" & tblA[Country], rightKey, tblB[CustomerID] & "|" & tblB[Country], FILTER(tblA, ISNA(MATCH(leftKey, rightKey, 0))) ) This formula produces an anti join based on multiple key columns while keeping the logic clear and readable.
5. Choosing between Power Query and formulas
Both approaches can implement anti join and semi join in Excel but they have different strengths.
| Aspect | Power Query | Formulas |
|---|---|---|
| Data volume. | Handles large datasets more efficiently, often hundreds of thousands of rows. | Can become slow with very large ranges, especially with volatile formulas or repeated lookups. |
| Refresh behavior. | Press Refresh All to rerun the entire transformation pipeline in a controlled way. | Recalculates automatically on changes, which is convenient but can be expensive. |
| Transparency. | Steps are visible in the Applied Steps pane and can be documented. | Logic is encoded in formulas, which may be harder to audit if not well structured. |
| Complex joins and transformations. | Better suited for multi step transformations, multiple joins, and column reshaping. | Suitable for simpler filters, especially when you need interactive results in a worksheet. |
| User skill level. | Requires familiarity with the Power Query interface. | Uses standard worksheet formulas but can still be complex for multi key scenarios. |
Note : For recurring monthly jobs with stable data sources Power Query is usually preferable, while formula based anti and semi joins are often more convenient for ad hoc analysis and quick checks.
6. Best practices for reliable anti join and semi join in Excel
To ensure consistent and accurate results follow these practices regardless of whether you use Power Query or formulas.
6.1 Normalize key columns.
- Trim leading and trailing spaces.
- Apply consistent case, for example all upper case IDs.
- Standardize formats for codes and numbers, such as zero padded IDs or consistent country codes.
In Power Query this can be done with Transform commands such as Trim, Clean, and Format. In formulas you can use TRIM, CLEAN, UPPER, and TEXT to achieve the same effect.
6.2 Remove unintended duplicates.
Duplicates in key columns can cause confusing results, such as repeated rows or mismatched counts.
- Use Remove Duplicates on key columns after careful review, when appropriate.
- Alternatively keep duplicates but explicitly group them and count occurrences to ensure they are expected.
6.3 Validate join results.
After performing an anti join or semi join always verify the result using summary checks.
- Count the number of rows before and after the join to ensure totals make sense.
- For anti joins, randomly sample a few keys to confirm that they truly do not exist in the other table.
- For semi joins, verify that every key in the result appears in the reference table.
6.4 Document join logic.
Whether using Power Query or formulas document which fields are used as keys and why. Add comments, descriptive query names, and if necessary a separate sheet explaining the join rules. This documentation is essential for audits, collaboration, and future maintenance.
FAQ
What is the difference between anti join and semi join in Excel.
An anti join returns rows from a left table that do not have a matching key in a right table, which is ideal for finding missing or new items.
A semi join returns rows from a left table that do have at least one matching key in a right table, which is used to filter a list down to entries that exist in a reference or whitelist.
Should I use Power Query or formulas for anti join and semi join.
Use Power Query when you work with large datasets, need repeatable refreshable steps, or perform multiple joins and transformations in sequence.
Use formulas when you need quick ad hoc filters, interactive what if analysis, or when Power Query is not available in your environment.
Can I perform anti join across different Excel workbooks.
Yes, both Power Query and formulas can reference external workbooks. In Power Query you can connect to another workbook as a data source. With formulas you can point MATCH, XLOOKUP, or FILTER to ranges in another workbook. Ensure that external file paths remain stable and that linked workbooks are accessible when refreshing.
How do I handle multiple key columns when doing anti join or semi join in Excel.
In Power Query select multiple key columns while holding the Ctrl key when defining the merge. In formulas create a composite key by concatenating the key columns with a safe delimiter and use that composite column in MATCH or XLOOKUP. This approach ensures that the join reflects the combined uniqueness of all key fields.
What is the easiest way to start with anti join in Excel without Power Query.
The simplest approach is to use MATCH and ISNA in a helper column. Label rows as “Not in B” when the key from table A is not found in table B and then filter the table on that label. This provides a straightforward anti join effect using only traditional worksheet functions.
추천·관련글
- Prevent UV-Vis Absorbance Saturation: Expert Strategies for Accurate Spectrophotometry
- Fix NMR Shimming Failure: Expert Troubleshooting Guide for Sharp, Stable Spectra
- Fix FTIR Baseline Slope: Proven Methods for Accurate Spectra
- Fix Poor XRD Alignment: Expert Calibration Guide for Accurate Powder Diffraction
- GC Peak Tailing Troubleshooting: Proven Fixes for Sharp, Symmetric Peaks
- Fix Distorted EIS Arcs: Expert Troubleshooting for Accurate Nyquist and Bode Plots
- Get link
- X
- Other Apps