35  Power Query for Data Transformation

35.1 Why Power Query Matters

Power Query is the unsung hero of the Microsoft analytics stack — the place where messy source data becomes the clean tables every dashboard quietly depends on.

Power Query is Microsoft’s data-transformation engine, embedded in Power BI Desktop, Excel, Dataflows, Power Automate, and Analysis Services. It is graphical first — every transformation is a click in the ribbon — and code-backed under the covers, with each click translating into a step in the M language that the analyst can read, edit, and version.

The standard practitioner reference is Master Your Data with Power Query in Excel and Power BI by Ken Puls & Miguel Escobar (2022), the most thorough working guide to Power Query and M. Chris Webb’s earlier Power Query for Power BI and Excel by Chris Webb (2014) remains the foundational text.

For a visualisation-focused book, Power Query is what stands between a raw export and the dashboard the audience sees. Most of the data work invisible to dashboard consumers happens here.

35.2 Power Query Across the Microsoft Stack

flowchart TD
    PQ["Power Query<br>(Engine and<br>M language)"]
    PQ --> PB["Power BI Desktop<br>and Power BI Service"]
    PQ --> EX["Excel<br>(Get and Transform)"]
    PQ --> DF["Power BI Dataflows<br>(reusable cloud<br>transformation)"]
    PQ --> PA["Power Automate<br>(workflow integration)"]
    PQ --> AS["Azure Analysis<br>Services"]
    PQ --> Fa["Microsoft Fabric<br>Dataflows Gen2"]
    style PQ fill:#e3f2fd,stroke:#1976D2
    style PB fill:#fce4ec,stroke:#AD1457
    style EX fill:#fff3e0,stroke:#EF6C00
    style DF fill:#fff8e1,stroke:#F9A825
    style PA fill:#e8f5e9,stroke:#388E3C
    style AS fill:#ede7f6,stroke:#4527A0
    style Fa fill:#f3e5f5,stroke:#6A1B9A

The same Power Query engine and M language run across:

  • Power BI Desktop and Power BI Service — the most common analyst entry point.
  • Excel — under Data → Get & Transform Data (Office 2016 and later).
  • Power BI Dataflows — cloud-resident reusable transformations consumed by multiple workbooks.
  • Power Automate — workflow steps that include data preparation.
  • Azure Analysis Services — enterprise tabular models.
  • Microsoft Fabric Dataflows Gen2 — the modern Microsoft data-platform successor.

A Power Query pipeline written in Excel can be copied and pasted into Power BI Desktop with no changes; the engine is identical.

35.3 The Power Query Editor Interface

To open the editor: in Power BI Desktop, Home → Transform Data. In Excel, Data → Get Data → Launch Power Query Editor, or right-click any existing query.

The editor is divided into:

  • Queries pane (left) — the list of queries in the workbook.
  • Ribbon (top) — Home, Transform, Add Column, View, Tools, Help — every transformation organised by category.
  • Formula bar (above the data) — shows the M code for the currently selected step.
  • Data preview (centre) — first 1,000 rows of the current step’s result.
  • Query Settings pane (right) — Properties (name, description) and the Applied Steps list.
  • Status bar (bottom) — column profiling toggles (Column Quality, Column Distribution, Column Profile, covered in Chapter 6).

The Applied Steps list is the heart of the editor: each click adds a step; the list reads like a recipe; the analyst can click any step to see the data at that point.

35.4 The M Language in One Page

M (also called the Power Query Formula Language) is a functional, case-sensitive language Power Query uses to express transformations. The analyst rarely writes M from scratch; the editor generates it as the user clicks through the ribbon.

A typical M query looks like:

let
    Source = Csv.Document(File.Contents("C:\Data\sales.csv"), [Delimiter=",", Encoding=65001]),
    Promoted = Table.PromoteHeaders(Source),
    Typed = Table.TransformColumnTypes(Promoted, {
                {"order_date", type date},
                {"amount", type number}
            }),
    Filtered = Table.SelectRows(Typed, each [amount] > 0),
    Result = Table.Sort(Filtered, {{"order_date", Order.Descending}})
in
    Result

Each comma-separated assignment in the let block corresponds to one Applied Step. The in clause names the final output. Editing M directly is occasionally needed for advanced patterns; for routine work, the ribbon is sufficient.

The Advanced Editor (View → Advanced Editor) shows the full M code for a query and is the right place to copy-paste a query between workbooks.

35.5 Common Transformations

TipThe Power Query Ribbon at a Glance
Tab Common Operations What It Does
Home Use First Row as Headers, Remove Rows, Remove Columns, Choose Columns, Append Queries, Merge Queries Connection-and-shape operations
Transform Group By, Transpose, Reverse Rows, Fill Down/Up, Pivot Column, Unpivot Columns, Replace Values, Trim, Clean Reshape and clean operations
Add Column Custom Column, Conditional Column, Index Column, Column from Examples, Duplicate Column Create new fields
View Column Profile, Column Quality, Column Distribution, Advanced Editor Diagnostic and code inspection

A few worked examples:

  • Promote headers: Click Use First Row as Headers — converts the first row into column names.
  • Change type: Click the column-type icon in each header → Date / Whole Number / Decimal / Text.
  • Filter rows: Click the dropdown arrow on a column → Number Filters → Greater Than → 0.
  • Group By: Transform → Group By — choose grouping columns and aggregation (Sum, Count, Average, Min, Max).
  • Pivot: Transform → Pivot Column — turns a long table into a wide one.
  • Unpivot: Transform → Unpivot Columns — turns a wide table into a long one (the tidy data operation from Chapter 22).
  • Merge Queries: Home → Merge Queries — joins two tables on a key (with optional fuzzy matching).
  • Append Queries: Home → Append Queries — stacks two tables vertically (the union pattern).
  • Add Custom Column: Add Column → Custom Column — write an M expression for the new column.
  • Conditional Column: Add Column → Conditional Column — graphical IF logic without writing M.

35.6 Advanced Features

Beyond the ribbon basics, Power Query supports patterns that appear in nearly every mature workbook:

  • Custom Functions: Wrap a query in a function with parameters, then invoke it on a list. Useful for processing many similar files (one function call per file in a folder).
  • Parameters: Workbook-level constants — file path, region filter, as-of date — referenced from queries. Set the parameter once and the query updates everywhere.
  • Folder Loop Pattern: Get Data → Folder; Power Query lists every file matching the pattern; combined with a custom function, processes them all into one consolidated table.
  • Error Handling: try ... otherwise ... in M, or Transform → Replace Errors in the ribbon — convert transient errors to a default value.
  • Query Groups: Right-click in the Queries pane → New Group; organise queries into logical folders (Sources, Cleansed, Output) for maintainability.
  • Reference vs Duplicate: Right-click a query → Reference creates a downstream query that depends on the original; Duplicate creates an independent copy. References preserve a single transformation lineage.
  • Disable Load: For staging queries that should not become tables, right-click → Enable Load → uncheck; the staging query is computed but not loaded into the model.

35.7 Query Folding and Performance

Query Folding is the most important performance concept in Power Query. When the source supports it (relational databases, Snowflake, BigQuery, Power BI Dataflows), Power Query translates the steps into native query operations (SQL, MDX) that run on the source — rather than pulling all rows down and processing them locally.

A folded query is fast and uses minimal memory. A non-folded query is slow and downloads everything.

Rules of thumb to preserve query folding:

  • Apply transformations supported by the source — filter, sort, group by, type changes, simple column derivations all fold.
  • Avoid mid-pipeline operations that break folding — adding an Index Column, applying certain custom-column expressions, or unpivoting in some configurations.
  • Push folding-breakers to the end — if a non-folding step is unavoidable, place it as late as possible so the earlier steps still fold.
  • Right-click any step → View Native Query — if visible, the step is folding; if greyed out, it is not.

For very large data sources, query folding is the difference between an analytical workbook that runs in seconds and one that times out.

35.8 Best Practices for Power Query

  • One query per source: Each external source has its own query; downstream queries reference it.
  • Rename steps to describe their purpose (Removed columnsKept analytical columns).
  • Use Query Groups to organise large pipelines.
  • Promote staging queries to Dataflows when the same transformation is reused across workbooks.
  • Parametrise file paths and dates so the workbook can be moved between environments.
  • Disable load on staging queries; load only what the model needs.
  • Document with comments: Add // comment lines in the Advanced Editor to explain non-obvious steps.
  • Test refresh: Before publishing, click Refresh All to confirm the pipeline runs end to end without manual intervention.
  • Watch for regional locale issues: Date and number formats are locale-sensitive; use Change Type with Locale for consistent parsing.

35.9 Common Pitfalls

  • Folding Lost Early: An early Index Column or unpivot breaks folding; the entire downstream pipeline runs locally on millions of rows.
  • Hard-Coded File Paths: A query referencing C:\Users\Anita\sales.csv works only on Anita’s laptop.
  • Step Sprawl: Twenty Applied Steps for what should be five; pipelines become unmaintainable.
  • Mixed Locale Dates: Loading a CSV without specifying locale; dates parse inconsistently.
  • Promoting Headers Twice: Importing a file that already has clean headers and accidentally promoting again, losing data.
  • Append Without Schema Match: Appending two queries with mismatched columns; new columns appear with nulls or are silently dropped.
  • No Error Handling: A single bad row breaks the entire query; Replace Errors or try/otherwise would have isolated the problem.
  • Disabled Refresh on Critical Source: An anonymous credential expires; the pipeline silently shows last week’s data.
  • Loading Staging Queries: Every staging query loaded into the model; tables proliferate and the model becomes confusing.

35.10 Illustrative Cases

A Folder-Loop Pipeline for Monthly Files

A finance team receives twelve monthly CSV files per year, with one for each branch. A Power Query custom function reads any one file; a folder query lists all files matching the pattern; the function is invoked on each row of the folder list. The result is a single consolidated table that picks up new files automatically. No code edits when the new month arrives.

A Reusable Cleaning Pipeline as a Dataflow

A retail firm builds a customer-master cleaning pipeline and promotes it to a Power BI Dataflow. Twenty downstream workbooks now consume the cleansed dataflow rather than each duplicating the cleaning logic. When the cleaning rule changes, it changes once.

A Query-Folding Save

A bank’s transaction-reporting workbook initially takes 12 minutes to refresh. Inspection reveals an early Index Column that breaks folding; the entire 80-million-row table is being processed locally. Removing the Index Column (and using a different M expression for the same purpose) restores folding. Refresh time drops to 90 seconds.


35.11 Hands-On Exercise: Building a Reusable Power Query Pipeline

Aim: Build a Power Query pipeline that reads a folder of monthly sales files for Yuvijen Stores, applies a consistent cleaning routine via a custom function, parametrises the folder path, and produces a refreshable consolidated table.

Deliverable: A Power BI file (or Excel workbook) with three queries — a Parameter, a Cleaning Function, and a Folder Loop — and a one-page reference showing the Applied Steps.

35.11.1 Step 1 — The Sample Data

Place twelve monthly CSV files in a folder, each named sales_2025-04.csv, sales_2025-05.csv, etc., with the same schema. The files do not need to be perfectly identical; the cleaning function will normalise them.

35.11.2 Step 2 — Create a Folder-Path Parameter

  1. Home → Manage Parameters → New Parameter.
  2. Name: FolderPath. Type: Text. Suggested Values: Any value. Current Value: the folder path on your machine.
  3. OK.

The parameter is now referenced from any query that needs the folder path. To move the workbook, only this parameter changes.

35.11.3 Step 3 — Connect to the Folder

  1. Home → Get Data → Folder.
  2. In the path box, change the literal path to the parameter: click the Source step’s path field and replace with =FolderPath.
  3. Power Query lists every file in the folder.
  4. Filter to .csv files only by clicking the dropdown on the Extension column.

35.11.4 Step 4 — Build a Cleaning Function from One File

  1. Click the Combine Files button (the double-down-arrow icon on the Content column header).
  2. Power BI auto-generates a sample query against the first file.
  3. Edit the auto-generated Transform Sample File query: apply the cleaning steps you want for every file — Promote Headers, Change Types with Locale, Trim, Clean, Filter rows where amount > 0, Add Conditional Column for order size band.
  4. Each click is recorded in the Sample File query’s Applied Steps.
  5. Power BI also generates a Transform File function that wraps the sample steps.

35.11.5 Step 5 — Apply the Function to All Files

  1. Return to the Folder query (now called something like sales_files). Power BI has already added an Invoke Custom Function step that runs the cleaning function on each file’s content.
  2. Expand the resulting table column to flatten back to a single table.
  3. The result is a consolidated, cleaned table containing all files’ data.

35.11.6 Step 6 — Validate the Pipeline

  • Inspect the Applied Steps in the consolidated query; rename any default-named steps to be descriptive.
  • Right-click each step → View Native Query — for source-supported operations, the SQL or M-translated query is visible (not applicable for CSV but matters when the source is a database).
  • Click Refresh Preview to confirm the pipeline runs end to end.

35.11.7 Step 7 — Promote the Pipeline to a Dataflow (Optional)

If the same cleaning logic should serve multiple workbooks:

  1. In the Power BI Service, + New → Dataflow.
  2. Add a query and paste the M code from this workbook (View → Advanced Editor → copy).
  3. Schedule a refresh on the dataflow.
  4. In any workbook, Get Data → Dataflows and select the dataflow as a source.

The cleaning rule now lives in one cloud-managed location consumed by every workbook.

35.11.8 Step 8 — Connect to the Visualisation Layer

The Power Query pipeline is invisible to the dashboard consumer but determines what every visual shows. Specifically:

  • A KPI card of Total Sales is honest only if the cleaning has filtered out negative or zero amounts.
  • A trend chart of Sales by Month is correct only if the file’s date column has been parsed with the right locale.
  • A choropleth of Sales by Region matches the audience’s mental map only if region values have been standardised (Trim, Capitalise) in the cleaning step.

Every Power Query step has a downstream visualisation consequence. The hands-on closes the loop: the dashboard’s correctness rests on the Applied Steps that produced its data.

TipFiles and Screen Recordings

Power BI file (yuvijen-power-query-pipeline.pbix), the sample folder of monthly CSVs, and screen recordings of the parameter, function, folder loop, and dataflow promotion will be embedded here.


Summary

Concept Description
Foundations
Why Power Query Matters Power Query is where messy source data becomes the clean tables every dashboard quietly depends on
Power Query Across the Stack
Power BI Desktop Most common Power Query entry point for analysts
Excel Get and Transform Power Query in Excel under Data Get and Transform
Power BI Dataflows Cloud-resident reusable transformations consumed by multiple workbooks
Power Automate Workflow steps that include data preparation
Azure Analysis Services Enterprise tabular models built on Power Query
Microsoft Fabric Microsoft data-platform successor with Dataflows Gen2
The Editor Interface
Power Query Editor Editor opened from Home Transform Data in Power BI
Queries Pane List of queries in the workbook on the left
Ribbon Top ribbon with Home, Transform, Add Column, View, Tools, Help
Formula Bar Shows the M code for the currently selected step
Data Preview First thousand rows of the current step's result
Query Settings Pane Properties and Applied Steps list on the right
Status Bar Column profiling toggles on the bottom
Applied Steps List Recipe of clicks; analyst can return to any step to inspect data
The M Language
M Language Functional case-sensitive language Power Query uses internally
Let In Block Each comma-separated assignment in let corresponds to one Applied Step
Comma-Separated Steps Final clause names the output of the query
Advanced Editor View Advanced Editor shows the full M code; useful for copy-paste between workbooks
Common Transformations
Use First Row as Headers Converts the first row into column names
Change Type Sets type to Date, Whole Number, Decimal, Text
Filter Rows Click dropdown on a column for Number, Text, Date filters
Group By Choose grouping columns and aggregations Sum, Count, Average
Pivot Column Turns a long table into a wide one
Unpivot Columns Turns a wide table into a long one; the tidy-data operation
Merge Queries Joins two tables on a key with optional fuzzy matching
Append Queries Stacks two tables vertically; the union pattern
Add Custom Column Add Column with M expression for the new column
Conditional Column Graphical IF logic without writing M
Advanced Features
Custom Functions Wrap a query in a function with parameters; invoke on a list
Parameters Workbook-level constants referenced from queries
Folder Loop Pattern Get Data Folder plus a custom function processes all files into one table
Error Handling try otherwise in M, or Transform Replace Errors in the ribbon
Query Groups Right-click in Queries pane New Group; organise into folders
Reference Versus Duplicate Reference creates a downstream query depending on the original; Duplicate creates an independent copy
Disable Load Right-click query Enable Load uncheck; computed but not loaded into the model
Query Folding
Query Folding Power Query translates steps into native source queries when possible
Native Query View Right-click step View Native Query; visible means folding, greyed out means not
Folding Preservation Rules Apply source-supported transformations; push folding breakers to the end
Best Practices
One Query per Source Each external source has its own query; downstream queries reference it
Rename Steps Descriptively Rename Applied Steps to describe their purpose
Use Query Groups Organise large pipelines into logical query groups
Promote to Dataflows Reuse cleaning rules across workbooks via a Dataflow
Parametrise Paths and Dates Workbook-level parameters for file paths and dates so the file can be moved between environments
Disable Load on Staging Disable load on staging queries; load only what the model needs
Document with Comments Add comment lines in the Advanced Editor to explain non-obvious steps
Test Refresh Before Publishing Click Refresh All before publishing to confirm the pipeline runs end to end
Change Type with Locale Use Change Type with Locale for consistent date and number parsing
Common Pitfalls
Folding Lost Early Pitfall of an early Index Column or unpivot breaking folding and processing millions locally
Hard-Coded File Paths Pitfall of a query referencing a path that works only on the original developer's machine
Step Sprawl Pitfall of twenty steps for what should be five; unmaintainable pipelines
Mixed Locale Dates Pitfall of loading without specifying locale; dates parse inconsistently
Promoting Headers Twice Pitfall of accidentally promoting headers on a file that already has clean ones
Append Without Schema Match Pitfall of appending two queries with mismatched columns; nulls or silently dropped columns
No Error Handling Pitfall of one bad row breaking the entire query without try otherwise or Replace Errors
Expired Credentials Pitfall of an anonymous credential expiring and the pipeline silently showing stale data
Loading Staging Queries Pitfall of every staging query loaded into the model; tables proliferate and the model is confusing