flowchart TD
A[Messy Source Data] --> B[Clean\nFix quality issues]
B --> C[Combine\nJoin or Union]
C --> D[Reshape\nPivot wide to long]
D --> E[Clean Output\nAnalysis-ready dataset]
B --> B1[Null handling\nData type fixes\nValue standardisation\nField splitting]
C --> C1[Join: key-based merge\nUnion: stacked rows]
D --> D1[Wide to long pivot\nLong to wide pivot]
style A fill:#fce4ec,stroke:#C62828
style E fill:#e8f5e9,stroke:#388E3C
style B fill:#fff9c4,stroke:#F9A825
16 Cleaning, Combining, and Reshaping Data
Building on the Tableau Prep foundations established in Chapter 15, this chapter covers the three most time-intensive data preparation tasks: cleaning messy data (handling nulls, fixing data types, standardising values, splitting fields), combining data from multiple sources (joins and unions in Prep), and reshaping data between wide and long formats (pivots). Each section uses realistic data quality issues drawn from the kinds of datasets analysts encounter in practice.
16.1 Cleaning Data in Tableau Prep
Null values are the most common data quality issue in real-world datasets. A null represents the absence of a value, it is fundamentally different from zero (which is a value) or an empty string (which is a string with no characters).
Null handling strategies in Tableau Prep:
Filter out nulls: In the Clean step, click the null indicator bar at the bottom of a field’s profile and select Exclude Nulls. Use this when null rows are meaningless for the analysis.
Replace nulls with a default value: Click the null bar and select Replace with. Choose the field average, the most common value, or a custom constant (e.g., replace null Region with “Unknown”).
Create a null flag: In a Clean step, add a calculated field:
Code
# Flag records with missing customer data
IF ISNULL([Customer Name]) THEN "Missing" ELSE "Present" END- Propagate values forward (fill down): When a field has values only on the first row of a group (e.g., an order header repeated across detail rows), use the Fill Down feature in the Clean step to propagate the non-null value to all subsequent rows in the group.
[Insert screenshot of the Prep Profile pane showing the null indicator bar at the bottom of a field, with the context menu showing “Exclude Nulls” and “Replace with” options]
Incorrect data types prevent Tableau from performing calculations correctly. Common type errors:
| Wrong Type | Correct Type | Symptom in Tableau Desktop |
|---|---|---|
| Dates stored as strings | Date | Cannot use date functions; no time-series charts |
| Numbers stored as strings | Number | Cannot SUM; appears as a dimension |
| Boolean stored as string | Boolean (“True”/“False”) | Cannot use logical operators efficiently |
| Zip codes stored as integers | String | Sums postal codes; geographic encoding fails |
Fixing types in Prep: Click the data type icon above any field in the Profile pane. Prep shows a type mismatch warning (red indicator) when it detects values that cannot be parsed as the current type, hover over the warning to see which values are problematic.
A field containing multiple pieces of information in a single value (e.g., “New York, NY” or “P001-Electronics-Blue”) should be split into separate fields for proper analysis.
Automatic split: 1. Click the field header in the Profile pane. 2. Click the Split Values icon (two arrows pointing apart). 3. Prep detects the delimiter automatically and creates new fields for each part. 4. Rename the resulting fields appropriately.
Custom split with a calculated field:
Code
# Extract the product category from a code like "P001-Electronics-Blue"
SPLIT([Product Code], '-', 2)SPLIT takes three arguments: the field, the delimiter, and the token number (which part to extract, counting from 1).
Tableau Prep supports calculated fields within Clean steps. While Prep’s calculation language is a subset of Tableau Desktop’s (LOD expressions and table calculations are not available), it covers most row-level transformation needs:
Code
# Standardise region names
IF CONTAINS(UPPER([Region]), "NORTH") THEN "North"
ELSEIF CONTAINS(UPPER([Region]), "SOUTH") THEN "South"
ELSEIF CONTAINS(UPPER([Region]), "EAST") THEN "East"
ELSEIF CONTAINS(UPPER([Region]), "WEST") THEN "West"
ELSE "Other"
ENDCode
# Parse a date stored as text (format: DD/MM/YYYY)
DATE(MID([Date String], 7, 4) + "-" +
MID([Date String], 4, 2) + "-" +
LEFT([Date String], 2))Add a calculated field via: Clean step > click the “+” next to any field header > Create Calculated Field.
16.2 Combining Data: Joins in Tableau Prep
Tableau Prep’s Join step is more powerful than Tableau Desktop’s Data Source Page join configuration in two key ways:
- Join clauses with calculations, You can join on a calculated expression, not just on raw field equality (e.g., join on the first three characters of a product code).
- Visual join result profiling, Prep shows a Venn diagram of matched vs. unmatched rows, making it immediately clear how many rows are lost or duplicated by the join.
Adding a Join step: 1. Bring two Input (or Clean) branches into the flow. 2. Drag one branch onto the other. Prep asks whether you want to create a Join or Union. 3. Select Join. 4. In the Join configuration panel, set the join type (Inner, Left, Right, Full Outer) and the join clause (the key fields from each table). 5. The Venn diagram shows the count of: left-only rows, matched rows, and right-only rows. 6. Review the Profile pane output to verify the joined data is correct.
[Insert screenshot of the Tableau Prep Join step configuration showing a Left join Venn diagram, the join clause configuration, and the matched/unmatched row counts]
After configuring a join in Prep, always check the Venn diagram proportions: - Too many left-only rows: Suggests your join key has data quality issues on the right table (misspellings, extra spaces, type mismatch). - Unexpected right-only rows: Your right table has keys that do not exist in the left table, possible data integrity issue. - Result row count much higher than input: You have a fan trap (one-to-many join), verify this is intentional.
The Venn diagram makes these issues visible in seconds, a diagnostic that would require careful SQL count queries without Prep.
16.3 Combining Data: Unions in Tableau Prep
A union vertically stacks rows from two or more tables that share the same structure (same fields, same data types). Use a union when: - You have the same data split across multiple files (e.g., monthly sales files: Jan.csv, Feb.csv, Mar.csv). - You have historical data in one table and current data in another. - You want to combine survey responses from multiple regions stored in separate tables.
Adding a Union step: 1. Drag a second branch (Input step) onto an existing step. Prep asks: Join or Union? 2. Select Union. 3. In the Union configuration panel, Prep displays each table as a column and each field as a row. Fields that match by name and type are automatically aligned; mismatched fields are shown in red. 4. For mismatched fields: drag field names onto each other to manually align them, or click Merge Mismatched Fields for Prep to attempt automatic alignment. 5. An [Table Name] field is automatically added to identify which source table each row came from.
[Insert screenshot of the Union step configuration showing two tables aligned side by side, with matching fields connected and a mismatched field shown in red]
When you have many files with the same structure (e.g., 12 monthly sales files), use a wildcard union to combine them automatically without adding each file as a separate Input step.
- On the Input step configuration panel, tick Multiple Files.
- Select Wildcard (automatic).
- Set the wildcard pattern (e.g.,
Sales_*.csvto match all files beginning with “Sales_”). - Prep automatically includes all matching files as a union. New files added in future will be included automatically when the flow next runs.
[Insert screenshot of the Input step settings showing the Wildcard Union configuration with the file pattern “Sales_*.csv” and the matched file list]
16.4 Reshaping Data: Pivots
Most analysis in Tableau (and in most analytical tools) requires data in long format: one row per observation, one column per variable. However, data frequently arrives in wide format from spreadsheets and legacy systems, where each time period or category is a separate column.
Example wide format (crosstab layout from Excel):
| Region | Jan | Feb | Mar | Apr |
|---|---|---|---|---|
| East | 12000 | 14000 | 11000 | 15000 |
| West | 9000 | 10500 | 8000 | 11000 |
Same data in long format (what Tableau needs):
| Region | Month | Sales |
|---|---|---|
| East | Jan | 12000 |
| East | Feb | 14000 |
| … | … | … |
The wide format has 2 rows; the long format has 8 rows. You cannot build a time-series line chart in Tableau from the wide format because “Month” does not exist as a single dimension.
- Add a Pivot step after your Clean step.
- In the Pivot configuration, select Columns to Rows (wide to long).
- Drag all the column headers you want to pivot (e.g., Jan, Feb, Mar, Apr) to the Pivoted Fields area.
- Prep creates two new fields:
Pivot1 Names(containing the old column headers: “Jan”, “Feb”, etc.) andPivot1 Values(containing the cell values). - Rename
Pivot1 Namesto “Month” andPivot1 Valuesto “Sales”. - Optionally add a Clean step after the Pivot to convert the “Month” string to a proper date.
[Insert screenshot of the Tableau Prep Pivot step configuration showing the wide column headers being dragged to the Pivoted Fields area, and the resulting long-format data preview]
The reverse pivot, converting long format to wide, is occasionally needed when preparing data for a tool that requires wide format, or when creating summary tables.
- Add a Pivot step and select Rows to Columns.
- Choose the field whose distinct values will become column headers (e.g.,
Month→ Jan, Feb, Mar columns). - Choose the field that provides the values for those columns (e.g.,
Sales). - Choose the field(s) that define the row identity (e.g.,
Region).
[Insert screenshot of the Rows to Columns Pivot configuration with Month as the column header field and Sales as the value field]
16.5 Summary
| Operation | Tool in Prep | Key Configuration |
|---|---|---|
| Handle nulls | Clean step | Profile null bar > Exclude / Replace with |
| Fix data types | Clean step | Data type icon above field |
| Split fields | Clean step | Split Values icon or SPLIT() calculated field |
| Fix misspellings | Clean step | Group and Replace |
| Join tables | Join step | Select join type + join clause |
| Union tables | Union step | Align matching fields; wildcard for multiple files |
| Pivot wide to long | Pivot step | Columns to Rows; rename output fields |
| Pivot long to wide | Pivot step | Rows to Columns; specify column header and value fields |
The most powerful aspect of Tableau Prep is its reproducibility: once you build a flow, every future data refresh is automated. The first time you clean a dataset manually in Prep takes 30–60 minutes. Every subsequent refresh runs in seconds. Build Prep flows for all repeating data preparation tasks, monthly reports, quarterly aggregations, annual data loads, and you will reclaim dozens of hours per year that were previously spent on manual data wrangling.