16  Cleaning, Combining, and Reshaping Data

NoteWhat This Chapter Covers

Building on the Tableau Prep foundations established in Chapter 15, this chapter covers the three most time-intensive data preparation tasks: cleaning messy data (handling nulls, fixing data types, standardising values, splitting fields), combining data from multiple sources (joins and unions in Prep), and reshaping data between wide and long formats (pivots). Each section uses realistic data quality issues drawn from the kinds of datasets analysts encounter in practice.

flowchart TD
    A[Messy Source Data] --> B[Clean\nFix quality issues]
    B --> C[Combine\nJoin or Union]
    C --> D[Reshape\nPivot wide to long]
    D --> E[Clean Output\nAnalysis-ready dataset]
    B --> B1[Null handling\nData type fixes\nValue standardisation\nField splitting]
    C --> C1[Join: key-based merge\nUnion: stacked rows]
    D --> D1[Wide to long pivot\nLong to wide pivot]
    style A fill:#fce4ec,stroke:#C62828
    style E fill:#e8f5e9,stroke:#388E3C
    style B fill:#fff9c4,stroke:#F9A825


16.1 Cleaning Data in Tableau Prep

NoteHandling Null Values

Null values are the most common data quality issue in real-world datasets. A null represents the absence of a value, it is fundamentally different from zero (which is a value) or an empty string (which is a string with no characters).

Null handling strategies in Tableau Prep:

  1. Filter out nulls: In the Clean step, click the null indicator bar at the bottom of a field’s profile and select Exclude Nulls. Use this when null rows are meaningless for the analysis.

  2. Replace nulls with a default value: Click the null bar and select Replace with. Choose the field average, the most common value, or a custom constant (e.g., replace null Region with “Unknown”).

  3. Create a null flag: In a Clean step, add a calculated field:

Code
# Flag records with missing customer data
IF ISNULL([Customer Name]) THEN "Missing" ELSE "Present" END
  1. Propagate values forward (fill down): When a field has values only on the first row of a group (e.g., an order header repeated across detail rows), use the Fill Down feature in the Clean step to propagate the non-null value to all subsequent rows in the group.

[Insert screenshot of the Prep Profile pane showing the null indicator bar at the bottom of a field, with the context menu showing “Exclude Nulls” and “Replace with” options]

NoteFixing Data Types

Incorrect data types prevent Tableau from performing calculations correctly. Common type errors:

Wrong Type Correct Type Symptom in Tableau Desktop
Dates stored as strings Date Cannot use date functions; no time-series charts
Numbers stored as strings Number Cannot SUM; appears as a dimension
Boolean stored as string Boolean (“True”/“False”) Cannot use logical operators efficiently
Zip codes stored as integers String Sums postal codes; geographic encoding fails

Fixing types in Prep: Click the data type icon above any field in the Profile pane. Prep shows a type mismatch warning (red indicator) when it detects values that cannot be parsed as the current type, hover over the warning to see which values are problematic.

NoteSplitting Fields

A field containing multiple pieces of information in a single value (e.g., “New York, NY” or “P001-Electronics-Blue”) should be split into separate fields for proper analysis.

Automatic split: 1. Click the field header in the Profile pane. 2. Click the Split Values icon (two arrows pointing apart). 3. Prep detects the delimiter automatically and creates new fields for each part. 4. Rename the resulting fields appropriately.

Custom split with a calculated field:

Code
# Extract the product category from a code like "P001-Electronics-Blue"
SPLIT([Product Code], '-', 2)

SPLIT takes three arguments: the field, the delimiter, and the token number (which part to extract, counting from 1).

NoteCalculated Fields in Tableau Prep

Tableau Prep supports calculated fields within Clean steps. While Prep’s calculation language is a subset of Tableau Desktop’s (LOD expressions and table calculations are not available), it covers most row-level transformation needs:

Code
# Standardise region names
IF CONTAINS(UPPER([Region]), "NORTH") THEN "North"
ELSEIF CONTAINS(UPPER([Region]), "SOUTH") THEN "South"
ELSEIF CONTAINS(UPPER([Region]), "EAST") THEN "East"
ELSEIF CONTAINS(UPPER([Region]), "WEST") THEN "West"
ELSE "Other"
END
Code
# Parse a date stored as text (format: DD/MM/YYYY)
DATE(MID([Date String], 7, 4) + "-" +
     MID([Date String], 4, 2) + "-" +
     LEFT([Date String], 2))

Add a calculated field via: Clean step > click the “+” next to any field header > Create Calculated Field.


16.2 Combining Data: Joins in Tableau Prep

NoteJoins in Tableau Prep vs. Tableau Desktop

Tableau Prep’s Join step is more powerful than Tableau Desktop’s Data Source Page join configuration in two key ways:

  1. Join clauses with calculations, You can join on a calculated expression, not just on raw field equality (e.g., join on the first three characters of a product code).
  2. Visual join result profiling, Prep shows a Venn diagram of matched vs. unmatched rows, making it immediately clear how many rows are lost or duplicated by the join.

Adding a Join step: 1. Bring two Input (or Clean) branches into the flow. 2. Drag one branch onto the other. Prep asks whether you want to create a Join or Union. 3. Select Join. 4. In the Join configuration panel, set the join type (Inner, Left, Right, Full Outer) and the join clause (the key fields from each table). 5. The Venn diagram shows the count of: left-only rows, matched rows, and right-only rows. 6. Review the Profile pane output to verify the joined data is correct.

[Insert screenshot of the Tableau Prep Join step configuration showing a Left join Venn diagram, the join clause configuration, and the matched/unmatched row counts]

TipDiagnosing Join Problems with the Venn Diagram

After configuring a join in Prep, always check the Venn diagram proportions: - Too many left-only rows: Suggests your join key has data quality issues on the right table (misspellings, extra spaces, type mismatch). - Unexpected right-only rows: Your right table has keys that do not exist in the left table, possible data integrity issue. - Result row count much higher than input: You have a fan trap (one-to-many join), verify this is intentional.

The Venn diagram makes these issues visible in seconds, a diagnostic that would require careful SQL count queries without Prep.


16.3 Combining Data: Unions in Tableau Prep

NoteWhen to Use a Union

A union vertically stacks rows from two or more tables that share the same structure (same fields, same data types). Use a union when: - You have the same data split across multiple files (e.g., monthly sales files: Jan.csv, Feb.csv, Mar.csv). - You have historical data in one table and current data in another. - You want to combine survey responses from multiple regions stored in separate tables.

Adding a Union step: 1. Drag a second branch (Input step) onto an existing step. Prep asks: Join or Union? 2. Select Union. 3. In the Union configuration panel, Prep displays each table as a column and each field as a row. Fields that match by name and type are automatically aligned; mismatched fields are shown in red. 4. For mismatched fields: drag field names onto each other to manually align them, or click Merge Mismatched Fields for Prep to attempt automatic alignment. 5. An [Table Name] field is automatically added to identify which source table each row came from.

[Insert screenshot of the Union step configuration showing two tables aligned side by side, with matching fields connected and a mismatched field shown in red]

NoteWildcard Union: Combining Multiple Files Automatically

When you have many files with the same structure (e.g., 12 monthly sales files), use a wildcard union to combine them automatically without adding each file as a separate Input step.

  1. On the Input step configuration panel, tick Multiple Files.
  2. Select Wildcard (automatic).
  3. Set the wildcard pattern (e.g., Sales_*.csv to match all files beginning with “Sales_”).
  4. Prep automatically includes all matching files as a union. New files added in future will be included automatically when the flow next runs.

[Insert screenshot of the Input step settings showing the Wildcard Union configuration with the file pattern “Sales_*.csv” and the matched file list]


16.4 Reshaping Data: Pivots

NoteWide vs. Long Format

Most analysis in Tableau (and in most analytical tools) requires data in long format: one row per observation, one column per variable. However, data frequently arrives in wide format from spreadsheets and legacy systems, where each time period or category is a separate column.

Example wide format (crosstab layout from Excel):

Region Jan Feb Mar Apr
East 12000 14000 11000 15000
West 9000 10500 8000 11000

Same data in long format (what Tableau needs):

Region Month Sales
East Jan 12000
East Feb 14000

The wide format has 2 rows; the long format has 8 rows. You cannot build a time-series line chart in Tableau from the wide format because “Month” does not exist as a single dimension.

NoteHow To: Pivoting Wide to Long in Tableau Prep
  1. Add a Pivot step after your Clean step.
  2. In the Pivot configuration, select Columns to Rows (wide to long).
  3. Drag all the column headers you want to pivot (e.g., Jan, Feb, Mar, Apr) to the Pivoted Fields area.
  4. Prep creates two new fields: Pivot1 Names (containing the old column headers: “Jan”, “Feb”, etc.) and Pivot1 Values (containing the cell values).
  5. Rename Pivot1 Names to “Month” and Pivot1 Values to “Sales”.
  6. Optionally add a Clean step after the Pivot to convert the “Month” string to a proper date.

[Insert screenshot of the Tableau Prep Pivot step configuration showing the wide column headers being dragged to the Pivoted Fields area, and the resulting long-format data preview]

NoteHow To: Pivoting Long to Wide (Rows to Columns)

The reverse pivot, converting long format to wide, is occasionally needed when preparing data for a tool that requires wide format, or when creating summary tables.

  1. Add a Pivot step and select Rows to Columns.
  2. Choose the field whose distinct values will become column headers (e.g., Month → Jan, Feb, Mar columns).
  3. Choose the field that provides the values for those columns (e.g., Sales).
  4. Choose the field(s) that define the row identity (e.g., Region).

[Insert screenshot of the Rows to Columns Pivot configuration with Month as the column header field and Sales as the value field]


16.5 Summary

NoteKey Concepts at a Glance
Operation Tool in Prep Key Configuration
Handle nulls Clean step Profile null bar > Exclude / Replace with
Fix data types Clean step Data type icon above field
Split fields Clean step Split Values icon or SPLIT() calculated field
Fix misspellings Clean step Group and Replace
Join tables Join step Select join type + join clause
Union tables Union step Align matching fields; wildcard for multiple files
Pivot wide to long Pivot step Columns to Rows; rename output fields
Pivot long to wide Pivot step Rows to Columns; specify column header and value fields
TipApplying This in Practice

The most powerful aspect of Tableau Prep is its reproducibility: once you build a flow, every future data refresh is automated. The first time you clean a dataset manually in Prep takes 30–60 minutes. Every subsequent refresh runs in seconds. Build Prep flows for all repeating data preparation tasks, monthly reports, quarterly aggregations, annual data loads, and you will reclaim dozens of hours per year that were previously spent on manual data wrangling.