flowchart TD
PQ["Power Query<br>(Engine and<br>M language)"]
PQ --> PB["Power BI Desktop<br>and Power BI Service"]
PQ --> EX["Excel<br>(Get and Transform)"]
PQ --> DF["Power BI Dataflows<br>(reusable cloud<br>transformation)"]
PQ --> PA["Power Automate<br>(workflow integration)"]
PQ --> AS["Azure Analysis<br>Services"]
PQ --> Fa["Microsoft Fabric<br>Dataflows Gen2"]
style PQ fill:#e3f2fd,stroke:#1976D2
style PB fill:#fce4ec,stroke:#AD1457
style EX fill:#fff3e0,stroke:#EF6C00
style DF fill:#fff8e1,stroke:#F9A825
style PA fill:#e8f5e9,stroke:#388E3C
style AS fill:#ede7f6,stroke:#4527A0
style Fa fill:#f3e5f5,stroke:#6A1B9A
35 Power Query for Data Transformation
35.1 Why Power Query Matters
Power Query is the unsung hero of the Microsoft analytics stack — the place where messy source data becomes the clean tables every dashboard quietly depends on.
Power Query is Microsoft’s data-transformation engine, embedded in Power BI Desktop, Excel, Dataflows, Power Automate, and Analysis Services. It is graphical first — every transformation is a click in the ribbon — and code-backed under the covers, with each click translating into a step in the M language that the analyst can read, edit, and version.
The standard practitioner reference is Master Your Data with Power Query in Excel and Power BI by Ken Puls & Miguel Escobar (2022), the most thorough working guide to Power Query and M. Chris Webb’s earlier Power Query for Power BI and Excel by Chris Webb (2014) remains the foundational text.
For a visualisation-focused book, Power Query is what stands between a raw export and the dashboard the audience sees. Most of the data work invisible to dashboard consumers happens here.
35.2 Power Query Across the Microsoft Stack
The same Power Query engine and M language run across:
- Power BI Desktop and Power BI Service — the most common analyst entry point.
- Excel — under Data → Get & Transform Data (Office 2016 and later).
- Power BI Dataflows — cloud-resident reusable transformations consumed by multiple workbooks.
- Power Automate — workflow steps that include data preparation.
- Azure Analysis Services — enterprise tabular models.
- Microsoft Fabric Dataflows Gen2 — the modern Microsoft data-platform successor.
A Power Query pipeline written in Excel can be copied and pasted into Power BI Desktop with no changes; the engine is identical.
35.3 The Power Query Editor Interface
To open the editor: in Power BI Desktop, Home → Transform Data. In Excel, Data → Get Data → Launch Power Query Editor, or right-click any existing query.
The editor is divided into:
- Queries pane (left) — the list of queries in the workbook.
- Ribbon (top) — Home, Transform, Add Column, View, Tools, Help — every transformation organised by category.
- Formula bar (above the data) — shows the M code for the currently selected step.
- Data preview (centre) — first 1,000 rows of the current step’s result.
- Query Settings pane (right) — Properties (name, description) and the Applied Steps list.
- Status bar (bottom) — column profiling toggles (Column Quality, Column Distribution, Column Profile, covered in Chapter 6).
The Applied Steps list is the heart of the editor: each click adds a step; the list reads like a recipe; the analyst can click any step to see the data at that point.
35.4 The M Language in One Page
M (also called the Power Query Formula Language) is a functional, case-sensitive language Power Query uses to express transformations. The analyst rarely writes M from scratch; the editor generates it as the user clicks through the ribbon.
A typical M query looks like:
let
Source = Csv.Document(File.Contents("C:\Data\sales.csv"), [Delimiter=",", Encoding=65001]),
Promoted = Table.PromoteHeaders(Source),
Typed = Table.TransformColumnTypes(Promoted, {
{"order_date", type date},
{"amount", type number}
}),
Filtered = Table.SelectRows(Typed, each [amount] > 0),
Result = Table.Sort(Filtered, {{"order_date", Order.Descending}})
in
Result
Each comma-separated assignment in the let block corresponds to one Applied Step. The in clause names the final output. Editing M directly is occasionally needed for advanced patterns; for routine work, the ribbon is sufficient.
The Advanced Editor (View → Advanced Editor) shows the full M code for a query and is the right place to copy-paste a query between workbooks.
35.5 Common Transformations
| Tab | Common Operations | What It Does |
|---|---|---|
| Home | Use First Row as Headers, Remove Rows, Remove Columns, Choose Columns, Append Queries, Merge Queries | Connection-and-shape operations |
| Transform | Group By, Transpose, Reverse Rows, Fill Down/Up, Pivot Column, Unpivot Columns, Replace Values, Trim, Clean | Reshape and clean operations |
| Add Column | Custom Column, Conditional Column, Index Column, Column from Examples, Duplicate Column | Create new fields |
| View | Column Profile, Column Quality, Column Distribution, Advanced Editor | Diagnostic and code inspection |
A few worked examples:
- Promote headers: Click Use First Row as Headers — converts the first row into column names.
- Change type: Click the column-type icon in each header → Date / Whole Number / Decimal / Text.
- Filter rows: Click the dropdown arrow on a column → Number Filters → Greater Than → 0.
- Group By: Transform → Group By — choose grouping columns and aggregation (Sum, Count, Average, Min, Max).
- Pivot: Transform → Pivot Column — turns a long table into a wide one.
- Unpivot: Transform → Unpivot Columns — turns a wide table into a long one (the tidy data operation from Chapter 22).
- Merge Queries: Home → Merge Queries — joins two tables on a key (with optional fuzzy matching).
- Append Queries: Home → Append Queries — stacks two tables vertically (the union pattern).
- Add Custom Column: Add Column → Custom Column — write an M expression for the new column.
- Conditional Column: Add Column → Conditional Column — graphical IF logic without writing M.
35.6 Advanced Features
Beyond the ribbon basics, Power Query supports patterns that appear in nearly every mature workbook:
- Custom Functions: Wrap a query in a function with parameters, then invoke it on a list. Useful for processing many similar files (one function call per file in a folder).
- Parameters: Workbook-level constants — file path, region filter, as-of date — referenced from queries. Set the parameter once and the query updates everywhere.
- Folder Loop Pattern: Get Data → Folder; Power Query lists every file matching the pattern; combined with a custom function, processes them all into one consolidated table.
-
Error Handling:
try ... otherwise ...in M, or Transform → Replace Errors in the ribbon — convert transient errors to a default value. - Query Groups: Right-click in the Queries pane → New Group; organise queries into logical folders (Sources, Cleansed, Output) for maintainability.
- Reference vs Duplicate: Right-click a query → Reference creates a downstream query that depends on the original; Duplicate creates an independent copy. References preserve a single transformation lineage.
- Disable Load: For staging queries that should not become tables, right-click → Enable Load → uncheck; the staging query is computed but not loaded into the model.
35.7 Query Folding and Performance
Query Folding is the most important performance concept in Power Query. When the source supports it (relational databases, Snowflake, BigQuery, Power BI Dataflows), Power Query translates the steps into native query operations (SQL, MDX) that run on the source — rather than pulling all rows down and processing them locally.
A folded query is fast and uses minimal memory. A non-folded query is slow and downloads everything.
Rules of thumb to preserve query folding:
- Apply transformations supported by the source — filter, sort, group by, type changes, simple column derivations all fold.
- Avoid mid-pipeline operations that break folding — adding an Index Column, applying certain custom-column expressions, or unpivoting in some configurations.
- Push folding-breakers to the end — if a non-folding step is unavoidable, place it as late as possible so the earlier steps still fold.
- Right-click any step → View Native Query — if visible, the step is folding; if greyed out, it is not.
For very large data sources, query folding is the difference between an analytical workbook that runs in seconds and one that times out.
35.8 Best Practices for Power Query
- One query per source: Each external source has its own query; downstream queries reference it.
-
Rename steps to describe their purpose (
Removed columns→Kept analytical columns). - Use Query Groups to organise large pipelines.
- Promote staging queries to Dataflows when the same transformation is reused across workbooks.
- Parametrise file paths and dates so the workbook can be moved between environments.
- Disable load on staging queries; load only what the model needs.
-
Document with comments: Add
// commentlines in the Advanced Editor to explain non-obvious steps. - Test refresh: Before publishing, click Refresh All to confirm the pipeline runs end to end without manual intervention.
-
Watch for regional locale issues: Date and number formats are locale-sensitive; use
Change Type with Localefor consistent parsing.
35.9 Common Pitfalls
- Folding Lost Early: An early Index Column or unpivot breaks folding; the entire downstream pipeline runs locally on millions of rows.
-
Hard-Coded File Paths: A query referencing
C:\Users\Anita\sales.csvworks only on Anita’s laptop. - Step Sprawl: Twenty Applied Steps for what should be five; pipelines become unmaintainable.
- Mixed Locale Dates: Loading a CSV without specifying locale; dates parse inconsistently.
- Promoting Headers Twice: Importing a file that already has clean headers and accidentally promoting again, losing data.
- Append Without Schema Match: Appending two queries with mismatched columns; new columns appear with nulls or are silently dropped.
-
No Error Handling: A single bad row breaks the entire query; Replace Errors or
try/otherwisewould have isolated the problem. - Disabled Refresh on Critical Source: An anonymous credential expires; the pipeline silently shows last week’s data.
- Loading Staging Queries: Every staging query loaded into the model; tables proliferate and the model becomes confusing.
35.10 Illustrative Cases
A Folder-Loop Pipeline for Monthly Files
A finance team receives twelve monthly CSV files per year, with one for each branch. A Power Query custom function reads any one file; a folder query lists all files matching the pattern; the function is invoked on each row of the folder list. The result is a single consolidated table that picks up new files automatically. No code edits when the new month arrives.
A Reusable Cleaning Pipeline as a Dataflow
A retail firm builds a customer-master cleaning pipeline and promotes it to a Power BI Dataflow. Twenty downstream workbooks now consume the cleansed dataflow rather than each duplicating the cleaning logic. When the cleaning rule changes, it changes once.
A Query-Folding Save
A bank’s transaction-reporting workbook initially takes 12 minutes to refresh. Inspection reveals an early Index Column that breaks folding; the entire 80-million-row table is being processed locally. Removing the Index Column (and using a different M expression for the same purpose) restores folding. Refresh time drops to 90 seconds.
35.11 Hands-On Exercise: Building a Reusable Power Query Pipeline
Aim: Build a Power Query pipeline that reads a folder of monthly sales files for Yuvijen Stores, applies a consistent cleaning routine via a custom function, parametrises the folder path, and produces a refreshable consolidated table.
Deliverable: A Power BI file (or Excel workbook) with three queries — a Parameter, a Cleaning Function, and a Folder Loop — and a one-page reference showing the Applied Steps.
35.11.1 Step 1 — The Sample Data
Place twelve monthly CSV files in a folder, each named sales_2025-04.csv, sales_2025-05.csv, etc., with the same schema. The files do not need to be perfectly identical; the cleaning function will normalise them.
35.11.2 Step 2 — Create a Folder-Path Parameter
- Home → Manage Parameters → New Parameter.
- Name:
FolderPath. Type: Text. Suggested Values: Any value. Current Value: the folder path on your machine. - OK.
The parameter is now referenced from any query that needs the folder path. To move the workbook, only this parameter changes.
35.11.3 Step 3 — Connect to the Folder
- Home → Get Data → Folder.
- In the path box, change the literal path to the parameter: click the Source step’s path field and replace with
=FolderPath. - Power Query lists every file in the folder.
- Filter to
.csvfiles only by clicking the dropdown on the Extension column.
35.11.4 Step 4 — Build a Cleaning Function from One File
- Click the Combine Files button (the double-down-arrow icon on the Content column header).
- Power BI auto-generates a sample query against the first file.
- Edit the auto-generated Transform Sample File query: apply the cleaning steps you want for every file — Promote Headers, Change Types with Locale, Trim, Clean, Filter rows where amount > 0, Add Conditional Column for order size band.
- Each click is recorded in the Sample File query’s Applied Steps.
- Power BI also generates a Transform File function that wraps the sample steps.
35.11.5 Step 5 — Apply the Function to All Files
- Return to the Folder query (now called something like
sales_files). Power BI has already added an Invoke Custom Function step that runs the cleaning function on each file’s content. - Expand the resulting table column to flatten back to a single table.
- The result is a consolidated, cleaned table containing all files’ data.
35.11.6 Step 6 — Validate the Pipeline
- Inspect the Applied Steps in the consolidated query; rename any default-named steps to be descriptive.
- Right-click each step → View Native Query — for source-supported operations, the SQL or M-translated query is visible (not applicable for CSV but matters when the source is a database).
- Click Refresh Preview to confirm the pipeline runs end to end.
35.11.7 Step 7 — Promote the Pipeline to a Dataflow (Optional)
If the same cleaning logic should serve multiple workbooks:
- In the Power BI Service, + New → Dataflow.
- Add a query and paste the M code from this workbook (View → Advanced Editor → copy).
- Schedule a refresh on the dataflow.
- In any workbook, Get Data → Dataflows and select the dataflow as a source.
The cleaning rule now lives in one cloud-managed location consumed by every workbook.
35.11.8 Step 8 — Connect to the Visualisation Layer
The Power Query pipeline is invisible to the dashboard consumer but determines what every visual shows. Specifically:
- A KPI card of Total Sales is honest only if the cleaning has filtered out negative or zero amounts.
- A trend chart of Sales by Month is correct only if the file’s date column has been parsed with the right locale.
- A choropleth of Sales by Region matches the audience’s mental map only if region values have been standardised (Trim, Capitalise) in the cleaning step.
Every Power Query step has a downstream visualisation consequence. The hands-on closes the loop: the dashboard’s correctness rests on the Applied Steps that produced its data.
Power BI file (yuvijen-power-query-pipeline.pbix), the sample folder of monthly CSVs, and screen recordings of the parameter, function, folder loop, and dataflow promotion will be embedded here.
Summary
| Concept | Description |
|---|---|
| Foundations | |
| Why Power Query Matters | Power Query is where messy source data becomes the clean tables every dashboard quietly depends on |
| Power Query Across the Stack | |
| Power BI Desktop | Most common Power Query entry point for analysts |
| Excel Get and Transform | Power Query in Excel under Data Get and Transform |
| Power BI Dataflows | Cloud-resident reusable transformations consumed by multiple workbooks |
| Power Automate | Workflow steps that include data preparation |
| Azure Analysis Services | Enterprise tabular models built on Power Query |
| Microsoft Fabric | Microsoft data-platform successor with Dataflows Gen2 |
| The Editor Interface | |
| Power Query Editor | Editor opened from Home Transform Data in Power BI |
| Queries Pane | List of queries in the workbook on the left |
| Ribbon | Top ribbon with Home, Transform, Add Column, View, Tools, Help |
| Formula Bar | Shows the M code for the currently selected step |
| Data Preview | First thousand rows of the current step's result |
| Query Settings Pane | Properties and Applied Steps list on the right |
| Status Bar | Column profiling toggles on the bottom |
| Applied Steps List | Recipe of clicks; analyst can return to any step to inspect data |
| The M Language | |
| M Language | Functional case-sensitive language Power Query uses internally |
| Let In Block | Each comma-separated assignment in let corresponds to one Applied Step |
| Comma-Separated Steps | Final clause names the output of the query |
| Advanced Editor | View Advanced Editor shows the full M code; useful for copy-paste between workbooks |
| Common Transformations | |
| Use First Row as Headers | Converts the first row into column names |
| Change Type | Sets type to Date, Whole Number, Decimal, Text |
| Filter Rows | Click dropdown on a column for Number, Text, Date filters |
| Group By | Choose grouping columns and aggregations Sum, Count, Average |
| Pivot Column | Turns a long table into a wide one |
| Unpivot Columns | Turns a wide table into a long one; the tidy-data operation |
| Merge Queries | Joins two tables on a key with optional fuzzy matching |
| Append Queries | Stacks two tables vertically; the union pattern |
| Add Custom Column | Add Column with M expression for the new column |
| Conditional Column | Graphical IF logic without writing M |
| Advanced Features | |
| Custom Functions | Wrap a query in a function with parameters; invoke on a list |
| Parameters | Workbook-level constants referenced from queries |
| Folder Loop Pattern | Get Data Folder plus a custom function processes all files into one table |
| Error Handling | try otherwise in M, or Transform Replace Errors in the ribbon |
| Query Groups | Right-click in Queries pane New Group; organise into folders |
| Reference Versus Duplicate | Reference creates a downstream query depending on the original; Duplicate creates an independent copy |
| Disable Load | Right-click query Enable Load uncheck; computed but not loaded into the model |
| Query Folding | |
| Query Folding | Power Query translates steps into native source queries when possible |
| Native Query View | Right-click step View Native Query; visible means folding, greyed out means not |
| Folding Preservation Rules | Apply source-supported transformations; push folding breakers to the end |
| Best Practices | |
| One Query per Source | Each external source has its own query; downstream queries reference it |
| Rename Steps Descriptively | Rename Applied Steps to describe their purpose |
| Use Query Groups | Organise large pipelines into logical query groups |
| Promote to Dataflows | Reuse cleaning rules across workbooks via a Dataflow |
| Parametrise Paths and Dates | Workbook-level parameters for file paths and dates so the file can be moved between environments |
| Disable Load on Staging | Disable load on staging queries; load only what the model needs |
| Document with Comments | Add comment lines in the Advanced Editor to explain non-obvious steps |
| Test Refresh Before Publishing | Click Refresh All before publishing to confirm the pipeline runs end to end |
| Change Type with Locale | Use Change Type with Locale for consistent date and number parsing |
| Common Pitfalls | |
| Folding Lost Early | Pitfall of an early Index Column or unpivot breaking folding and processing millions locally |
| Hard-Coded File Paths | Pitfall of a query referencing a path that works only on the original developer's machine |
| Step Sprawl | Pitfall of twenty steps for what should be five; unmaintainable pipelines |
| Mixed Locale Dates | Pitfall of loading without specifying locale; dates parse inconsistently |
| Promoting Headers Twice | Pitfall of accidentally promoting headers on a file that already has clean ones |
| Append Without Schema Match | Pitfall of appending two queries with mismatched columns; nulls or silently dropped columns |
| No Error Handling | Pitfall of one bad row breaking the entire query without try otherwise or Replace Errors |
| Expired Credentials | Pitfall of an anonymous credential expiring and the pipeline silently showing stale data |
| Loading Staging Queries | Pitfall of every staging query loaded into the model; tables proliferate and the model is confusing |