```mermaid
flowchart TD
T["Tableau Desktop"]
T --> S["Start Page<br>Connect to data and<br>open recent workbooks"]
T --> DS["Data Source Page<br>Configure connections,<br>joins, extracts, filters"]
T --> WS["Worksheet<br>Build a single visualisation<br>using the shelves and Show Me"]
T --> DB["Dashboard<br>Compose multiple worksheets<br>into one interactive view"]
T --> ST["Story<br>Sequence of dashboards<br>or worksheets with narrative"]
style T fill:#e3f2fd,stroke:#1976D2
style S fill:#fce4ec,stroke:#AD1457
style DS fill:#fff3e0,stroke:#EF6C00
style WS fill:#fff8e1,stroke:#F9A825
style DB fill:#e8f5e9,stroke:#388E3C
style ST fill:#ede7f6,stroke:#4527A0
```
30 Tableau Proficiency and Data Connections
30.1 Why Tableau Matters
Tableau turned interactive visualisation from a specialist craft into a mainstream business skill.
Tableau is one of the two dominant business-intelligence platforms in the world (Power BI being the other), and it has shaped how a generation of analysts thinks about visualisation. Its drag-and-drop grammar of graphics, its Show Me recommender, and its rich library of interactive features lowered the barrier to genuinely good charts and made dashboards a routine output of the analytics function rather than a bespoke project.
The standard practitioner references on Tableau are Daniel G. Murray's Tableau Your Data! (2013) and Ryan Sleeper's Practical Tableau (2018), both of which assume the reader will spend many hours inside the tool and wants to spend them well.
For this book — a visualisation-focused analytics text — Tableau and the chapters that follow on Power BI and Excel are where the principles of Modules 2 and 3 become concrete tools the reader can use. This chapter covers the foundation: what Tableau is, how data connects to it, and which choices the analyst makes before any chart is built.
30.2 The Tableau Product Family
| Product | Role | Typical User |
|---|---|---|
| Tableau Desktop | Authoring environment for workbooks and dashboards | Analyst on a workstation |
| Tableau Prep Builder | Visual data-prep tool for cleaning and shaping data before analysis | Data analyst, light data engineer |
| Tableau Server | On-premise deployment of dashboards to a corporate audience | Enterprise BI platform |
| Tableau Cloud (formerly Tableau Online) | Tableau-hosted SaaS deployment | Smaller organisations or those preferring a managed service |
| Tableau Public | Free version that publishes to the public Tableau Public site | Personal portfolio, public-data work |
| Tableau Reader | Free desktop app that opens packaged workbooks (.twbx) without editing | Read-only consumers |
| Tableau Mobile | iOS / Android app for consuming dashboards | Field staff, executives |
| Tableau Pulse | Metric monitoring with AI-driven insights | Subscribers to KPI feeds |
For an analyst, the day-to-day pair is Tableau Desktop for authoring and Tableau Cloud / Server for sharing the result. The other products fit specific needs.
30.3 Touring the Tableau Desktop Interface
A Tableau Desktop workbook moves between five spaces:
- Start Page: Connect to a new data source, open a recent workbook, or browse sample data.
- Data Source Page: Configure how the workbook talks to its data — connections, joins, relationships, extracts, source filters.
- Worksheet: The single-chart space. The analyst drags fields onto the Rows, Columns, Pages, Filters, Marks, and Color/Size/Detail shelves to build a visualisation.
- Dashboard: Composes multiple worksheets into one interactive view. Adds objects (text, image, container, vertical/horizontal layout) and dashboard Actions (filter, highlight, URL, parameter).
- Story: A sequenced narrative built from worksheets and dashboards.
Inside a worksheet, the analyst’s primary controls are:
- Data pane (left) — fields from the data source, split into Dimensions and Measures.
- Analytics pane (left) — drag-and-drop trend lines, forecasts, reference lines, box plots, clustering.
- Shelves (top and centre) — Columns, Rows, Pages, Filters, Marks (Color, Size, Label, Detail, Tooltip).
- Show Me (top-right) — recommender that highlights the chart types compatible with the fields currently selected.
30.4 Data Connection Types
```mermaid
flowchart TD
C["Tableau<br>Connectors"]
C --> F["Files<br>Excel, CSV, JSON,<br>PDF, Spatial,<br>Statistical Files"]
C --> S["Servers (RDBMS)<br>SQL Server, Oracle,<br>MySQL, PostgreSQL,<br>SAP HANA, Teradata"]
C --> Cl["Cloud Platforms<br>Snowflake, BigQuery,<br>Redshift, Databricks,<br>Salesforce, Google Sheets"]
C --> W["Web Data Connector<br>Custom JavaScript-based<br>connectors for APIs"]
C --> P["Tableau Server<br>and Cloud<br>Published data sources"]
style C fill:#e3f2fd,stroke:#1976D2
style F fill:#e8f5e9,stroke:#388E3C
style S fill:#fff8e1,stroke:#F9A825
style Cl fill:#fff3e0,stroke:#EF6C00
style W fill:#fce4ec,stroke:#AD1457
style P fill:#ede7f6,stroke:#4527A0
```
Tableau ships with native connectors across five families:
- Files: Excel (.xlsx, .xls), text (.csv, .tsv), JSON, PDF tables, statistical files (.sas7bdat, .sav, .rda), spatial files (.shp, .kml, .geojson).
- Servers (Relational Databases): SQL Server, Oracle, MySQL, PostgreSQL, SAP HANA, Teradata, IBM Db2, Vertica, Sybase.
- Cloud Platforms: Snowflake, Google BigQuery, Amazon Redshift, Databricks, Azure Synapse, Salesforce, Google Sheets, Box, OneDrive, Dropbox.
- Web Data Connector (WDC): A JavaScript-based interface for connecting to APIs and other web sources that lack native connectors.
- Tableau Server / Cloud Published Data Sources: Reuse a curated, governed data source already published by another Tableau author.
The connector families differ in performance, security, and freshness behaviour. Files are simplest but rarely live; cloud and server connectors support live and refreshed-extract patterns.
30.5 Live versus Extract Connections
Every Tableau connection is either Live or Extract. The choice is one of the most consequential decisions in any workbook and shapes performance, freshness, and infrastructure cost.
| Property | Live | Extract |
|---|---|---|
| Data location | Stays at the source | Cached as a .hyper file |
| Freshness | Always current at query time | Stale until refreshed |
| Query speed | Limited by source database performance | Very fast (Tableau’s columnar Hyper engine) |
| Source load | Every dashboard interaction queries source | Source queried only on extract refresh |
| Aggregations supported | Whatever the source supports | Full Tableau analytical engine |
| Best for | Real-time operational dashboards on fast warehouses | Most analytical and presentation use cases |
| Refresh management | None needed | Schedule daily, hourly, or on demand |
The default in modern practice is to use Extract unless the dashboard genuinely needs sub-minute freshness or the source data set is too large to extract. Cloud warehouses (Snowflake, BigQuery, Databricks) have made Live connections far more practical than they once were, but the Hyper extract engine still outperforms most live connections for interactive dashboards.
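The live-versus-extract decision can be reduced to a rule of thumb. The sketch below is a hedged heuristic, not Tableau logic: the function name and the one-hour default refresh interval are illustrative assumptions.

```python
from datetime import timedelta

# Hedged rule of thumb, not Tableau behaviour: prefer Extract unless the
# dashboard's freshness requirement beats the fastest feasible extract refresh.
def choose_connection(freshness_needed: timedelta,
                      fastest_refresh: timedelta = timedelta(hours=1)) -> str:
    return "live" if freshness_needed < fastest_refresh else "extract"

live_choice = choose_connection(timedelta(minutes=1))   # sub-minute freshness
batch_choice = choose_connection(timedelta(days=1))     # daily reporting
print(live_choice, batch_choice)  # live extract
```

The default argument encodes the chapter's advice: only a freshness requirement tighter than any achievable refresh schedule justifies the cost of a live connection.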
30.6 Joins, Unions, and Relationships
Tableau provides three ways to combine multiple tables: Joins (the classical row-level merge), Unions (stacking tables vertically), and Relationships (logical links introduced in Tableau 2020.2 that defer the join until query time).
30.6.1 Joins (Physical)
Tableau’s Joins behave like SQL joins:
- Inner Join: Returns rows where the join key matches in both tables.
- Left Join: All rows from the left table, with matched rows from the right.
- Right Join: All rows from the right table, with matched rows from the left.
- Full Outer Join: All rows from both tables, with nulls where no match exists.
- Cross Join (cartesian, used sparingly): Every row from one table paired with every row from the other.
To configure: drag a second table next to the first on the Data Source page, click the Venn-diagram icon, and choose the join type and key. The result is a single physical, denormalised table at the level of the join.
A common pitfall: an inner join silently drops rows that fail the match, sometimes hiding millions of unmatched records. Always inspect the row count before and after the join.
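Tableau is a point-and-click tool, so the pitfall cannot be shown in Tableau code, but the SQL it generates behaves like any inner join. The sqlite3 sketch below, with illustrative table and column names, reproduces the silent drop and the before/after row-count check:

```python
import sqlite3

# In-memory stand-in for the workbook's data source; tables are illustrative.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO sales VALUES (1, 101, 50.0), (2, 102, 75.0), (3, 999, 20.0);
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    INSERT INTO customers VALUES (101, 'Asha'), (102, 'Bilal');
""")

before = con.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
after = con.execute("""
    SELECT COUNT(*) FROM sales s
    INNER JOIN customers c ON s.customer_id = c.customer_id
""").fetchone()[0]

# Order 3's customer_id (999) has no match in customers, so the inner
# join drops it without any warning.
print(before, after)  # 3 2
```

If the two counts differ, unmatched keys exist; decide deliberately whether a left join (keeping unmatched rows with nulls) is the right choice.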
30.6.2 Unions
A Union stacks tables vertically — useful for combining monthly files, regional extracts, or partitioned tables that share the same schema. Tableau supports both manual unions (drag tables into the Union dialog) and wildcard unions (specify a file pattern such as Sales_2026_*.csv; new files matching the pattern are picked up automatically on refresh).
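What a wildcard union does can be sketched in a few lines: stack every file matching a pattern into one table, on the assumption that all files share the same header. The file names below mirror the Sales_2026_*.csv example; everything else is illustrative.

```python
import csv, glob, os, tempfile

# Create two monthly files with an identical schema in a temp folder.
tmp = tempfile.mkdtemp()
for month, rows in [("01", [["East", 10]]), ("02", [["West", 20], ["East", 30]])]:
    with open(os.path.join(tmp, f"Sales_2026_{month}.csv"), "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["region", "amount"])  # shared header across all files
        w.writerows(rows)

# Wildcard union analogue: any new file matching the pattern is picked
# up automatically the next time this runs (i.e., on refresh).
union = []
for path in sorted(glob.glob(os.path.join(tmp, "Sales_2026_*.csv"))):
    with open(path, newline="") as f:
        union.extend(csv.DictReader(f))

print(len(union))  # 3 rows stacked from two monthly files
```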
30.6.3 Relationships (Logical)
Introduced in Tableau 2020.2, Relationships are logical links between tables. Unlike physical joins, relationships do not flatten the tables; they preserve each table’s grain and apply the appropriate join only at query time, based on which fields the analyst uses in the visual.
Advantages:
- Correct aggregation at multiple levels of detail: A customer and order relationship returns the right customer count and the right order count without duplicating either.
- Query-time join optimisation: Tableau picks the join type and direction based on the visualisation context.
- Less risk of fan-out and chasm-trap errors: The classic many-to-many problem in physical joins.
Most modern Tableau workbooks should use Relationships by default and fall back to Joins only when a true denormalised flat table is needed.
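The fan-out problem that relationships avoid is easy to demonstrate. In the hedged sqlite3 sketch below (illustrative tables), a physical join duplicates each customer row once per order, so a customer-level measure sums to the wrong total; a relationship effectively aggregates each table at its own grain first:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (customer_id INTEGER, segment TEXT, credit_limit REAL);
    INSERT INTO customers VALUES (1, 'Retail', 1000), (2, 'Retail', 2000);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (10, 1, 50), (11, 1, 60), (12, 2, 70);
""")

# Physical join: customer 1 appears twice (once per order), so its
# credit_limit is counted twice -- the classic fan-out error.
fanned = con.execute("""
    SELECT SUM(c.credit_limit) FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
""").fetchone()[0]

# What a relationship effectively does: aggregate at the table's own
# grain, so each customer is counted exactly once.
correct = con.execute("SELECT SUM(credit_limit) FROM customers").fetchone()[0]

print(fanned, correct)  # 4000.0 (fanned out) vs 3000.0 (correct)
```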
30.7 Data Blending
Data Blending is a worksheet-level technique for combining two completely separate data sources at query time. It is suited to scenarios where joins and relationships are infeasible — for example, combining a Tableau Server published data source with a local Excel file.
How it works:
- The first data source is the primary; subsequent ones are secondaries.
- Tableau matches rows by linked fields — fields with matching names in both sources.
- Aggregation happens in each source separately; only aggregated results are blended.
Limitations:
- Only aggregated values from secondary sources are accessible.
- Performance depends on each source individually.
- Some calculation patterns (LOD, table calculations on secondary fields) do not work inside blends.
Use Blending only when Joins or Relationships cannot solve the problem. In modern Tableau, Relationships have replaced most of Blending’s traditional use cases.
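The aggregate-then-match behaviour of a blend can be sketched as follows. This is a conceptual model, not Tableau internals: the state-level index in the secondary source and all names are illustrative assumptions.

```python
import sqlite3

# "Primary" source: transaction-level sales in one connection.
primary = sqlite3.connect(":memory:")
primary.executescript("""
    CREATE TABLE sales (state TEXT, amount REAL);
    INSERT INTO sales VALUES ('KA', 100), ('KA', 50), ('MH', 200);
""")

# "Secondary" source: a completely separate, pre-aggregated dataset.
# Only these aggregated values can cross into the blend.
secondary = {"KA": 1.2, "MH": 0.9, "TN": 1.1}

# Blend: aggregate the primary in its own source, then match on the
# linked field (state). TN never appears -- the primary drives the rows.
blended = {
    state: (total, secondary.get(state))
    for state, total in primary.execute(
        "SELECT state, SUM(amount) FROM sales GROUP BY state")
}
print(blended)  # {'KA': (150.0, 1.2), 'MH': (200.0, 0.9)}
```

Note the two defining properties visible here: each source aggregates independently, and the primary source determines which rows exist in the result.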
30.8 Custom SQL, Initial SQL, and Data Source Filters
Three advanced features on the Data Source page:
Custom SQL: Write a SQL query directly against the source as the basis for the connection. Useful for complex pre-aggregation or for sources where the wizard’s options are limited. Use sparingly — Custom SQL forces Tableau to wrap the query in subqueries for every visual, which can hurt performance.
Initial SQL: Code that runs once when the connection opens, before any user query. Useful for setting session variables (locale, time zone, optimiser hints) on the source.
Data Source Filters: Filters applied at the data source level rather than per worksheet. They restrict the entire workbook’s view of the data — useful for security (only this region’s data is exposed) and for performance (the user never sees pre-2025 data).
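The subquery-wrapping cost of Custom SQL, and the run-once nature of Initial SQL, can both be sketched with sqlite3. The wrapping pattern shown is a conceptual model of what Tableau emits, not its exact generated SQL, and the PRAGMA stands in for the session-variable statements a real warehouse would use:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Initial SQL analogue: a statement run once when the connection opens,
# before any user query (here a session setting; real warehouses would
# set locale, time zone, or optimiser hints instead).
con.executescript("PRAGMA case_sensitive_like = ON;")
con.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('East', 10), ('West', 20), ('East', 5);
""")

custom_sql = "SELECT region, amount FROM sales WHERE amount > 4"

# Custom SQL analogue: Tableau treats the query as an opaque inner block
# and wraps it in a subquery for every visual, which the database must
# re-plan on each interaction.
per_visual = f"SELECT region, SUM(amount) FROM ({custom_sql}) t GROUP BY region"
rows = con.execute(per_visual).fetchall()
print(rows)
```

Because every visual repeats this wrapping, the database sees the inner query over and over; connecting to the table directly lets Tableau push filters and aggregations into a single optimised query instead.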
30.9 Connecting to Tableau Server and Tableau Cloud
Tableau publishes data sources, workbooks, and dashboards to Tableau Server (on-premise) or Tableau Cloud (SaaS). Once published, the data sources can be reused across many workbooks, ensuring consistent definitions and centralised refresh schedules.
The pattern in mature Tableau deployments:
- A certified data source is built, validated, and published by a steward.
- Workbooks across the firm connect to the published data source rather than to raw tables.
- Refresh schedules and security are managed centrally on Server / Cloud.
- Dashboards built on the data source are themselves published, and consumers see them through the web or the Tableau Mobile app.
This is the operational form of the conformed-dimension principle from Chapter 24: a single trusted source serves every dashboard.
30.10 Best Practices for Data Connections
- Choose Extract by default for analytical workbooks; switch to Live only when freshness genuinely demands it.
- Filter at the data source level to keep the workbook lean and the queries fast.
- Prefer Relationships over Joins unless a flat physical table is genuinely needed.
- Reuse published data sources rather than re-building them in every workbook.
- Document the connection — a short text annotation on the Data Source page describing the source, refresh schedule, and any filters helps the next analyst.
- Avoid Custom SQL for routine work; use it only when the wizard cannot express the query.
- Schedule extract refreshes at off-peak hours; coordinate with the source-system team if needed.
- Monitor extract size — beyond a few hundred million rows, partitioning or live connection may be more efficient than a single extract.
30.11 Common Pitfalls
- Inner-Join Silent Drop: An inner join with mismatched keys silently loses rows; always count before and after.
- Cartesian Cross-Join: Joining on the wrong key (or no key) produces a many-to-many cartesian result and an enormous flat table.
- Live Connection on Slow Source: Building an interactive dashboard on a live connection to a slow data warehouse; every click triggers a multi-second query.
- Extract Without Refresh: Forgetting to schedule the extract refresh; the dashboard silently shows last week’s data.
- Custom SQL Everywhere: Wrapping every workbook in Custom SQL because that is what the analyst is comfortable with; performance degrades and maintenance becomes painful.
- Multiple Disconnected Extracts: A single workbook with three separate extracts that should have been one published data source.
- Joining Pre-Aggregated and Detail Tables: Joining a daily-aggregated sales table with a detail-level customer table produces fan-out errors; relationships handle this correctly.
- Ignoring the Data Source Page: Skipping straight to the worksheet without setting up filters, types, and aliases on the Data Source page; problems compound downstream.
- Forgetting Time Zones and Locales: A live connection that returns dates in UTC but the dashboard renders in IST without conversion.
30.12 Illustrative Cases
A Mid-Size Bank’s Single Published Data Source
A retail bank publishes a Customer 360 data source on Tableau Server, certified by the data office, refreshed nightly. Every customer-related dashboard across marketing, risk, and operations connects to it. Definitions agree across functions and refresh load is centralised on one schedule rather than many.
A Logistics Firm’s Live Connection
A logistics control tower needs sub-minute freshness on vehicle location and route status. The team builds a live Tableau connection on a Snowflake warehouse with materialised views for the heavy aggregations. Every dashboard interaction queries Snowflake; the design works because the cloud warehouse handles the load and the dashboard’s question genuinely requires near-real-time data.
A Failed Custom SQL Migration
A team migrates a Power BI report to Tableau by translating each Power Query M step into a Custom SQL block. Each visual now wraps the Custom SQL in a subquery, queries are slow, and small changes require editing SQL. The redesign moves to a published, certified data source with Tableau Relationships and recovers the performance.
30.13 Hands-On Exercise: Connecting to Multiple Data Sources in Tableau
Aim: Practise the four most common Tableau connection patterns — file, server (database), cloud, and published data source — by combining four sources into one workbook for Yuvijen Stores Pvt Ltd and shaping them into an analysis-ready dataset.
Scenario: Build a single Tableau workbook that integrates:
- A CSV file of sales transactions
- A SQLite database of the customer master
- A Google Sheet of marketing campaigns (or, if Google Sheets is not available, an Excel file)
- A Tableau Server / Cloud published data source of macroeconomic indicators (or a fall-back local Excel file representing the same)
Deliverable: A .twb workbook with a clean Data Source page, demonstrating Joins, a Relationship, an Extract, and a Data Source Filter.
30.13.1 Step 1 — Connect to the CSV File
- Open Tableau Desktop. The Start Page shows the Connect pane on the left.
- Under To a File, click Text File and select sales.csv.
- The Data Source Page opens with the file as a single table.
- Drag the table to the canvas if it is not already there.
- In the data grid below the canvas, verify that each column has the correct data type indicator (Abc for text, # for numeric, calendar for date). Fix any wrong inference by clicking the indicator.
30.13.2 Step 2 — Add a Database Connection
- On the Data Source Page, in the left-hand Connections pane, click the Add link.
- Choose More → Other Databases (JDBC) or pick the specific database type (here, SQLite).
- For SQLite: paste the file path. For SQL Server / MySQL / PostgreSQL: provide host, port, database, credentials.
- Drag the customers table to the canvas, next to sales.
- Tableau opens the Relationship dialog by default. Connect sales[customer_id] to customers[customer_id].
The two tables are now in a relationship, not a physical join. The Data Source canvas shows a noodle line between them.
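Readers following along without the exercise files can generate minimal stand-ins for sales.csv and the SQLite customer master with the sketch below. The join-key columns come from the exercise; every other column and value is an illustrative assumption.

```python
import csv, os, sqlite3, tempfile

folder = tempfile.mkdtemp()  # use a real project folder in practice

# Minimal sales.csv; columns beyond customer_id/campaign_id are assumptions.
csv_path = os.path.join(folder, "sales.csv")
with open(csv_path, "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["order_id", "customer_id", "campaign_id", "date", "amount"])
    w.writerows([[1, 101, "C1", "2026-01-15", 2500.0],
                 [2, 102, "C2", "2026-02-03", 1200.0]])

# Minimal SQLite customer master with the fields the exercise filters on.
db_path = os.path.join(folder, "customers.db")
con = sqlite3.connect(db_path)
con.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name TEXT, state TEXT, is_active INTEGER);
    INSERT INTO customers VALUES (101, 'Asha', 'KA', 1), (102, 'Bilal', 'MH', 1);
""")
con.commit()

n = con.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(csv_path, db_path, n)
```

Point Tableau's Text File connector at the generated sales.csv and the SQLite connector at customers.db to reproduce Steps 1 and 2.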
30.13.3 Step 3 — Add a Google Sheet (or Excel)
- Click Add again on the Connections pane.
- Choose Google Sheets, sign in, and select the campaigns sheet. (Alternative: choose Microsoft Excel and pick a local campaigns.xlsx.)
- Drag the campaigns table to the canvas.
- Relate sales[campaign_id] to campaigns[campaign_id].
The Data Source page now has three connected tables.
30.13.4 Step 4 — Add a Published Data Source
- Click Add → Tableau Server (or Tableau Cloud).
- Sign in to the Server / Cloud instance. The Search for Data dialog appears.
- Choose the Macro Indicators published data source.
- Drag it to the canvas. Tableau treats it like any other source for relationship purposes.
- Relate the new source's state column to customers[state].
If a published data source is unavailable for the exercise, substitute a local Excel file with macro indicators.
30.13.5 Step 5 — Convert to an Extract
- In the upper-right of the Data Source page, switch the connection type from Live to Extract.
- Click Edit to configure the extract — choose All rows or a sample, and add filters and aggregation as needed for performance.
- Save the extract as a .hyper file.
- Tableau materialises the data into the Hyper engine for fast subsequent queries.
For Server-published sources, the extract is configured on Server / Cloud; the workbook simply references the published refresh schedule.
30.13.6 Step 6 — Apply a Data Source Filter
- Top-right of the Data Source page, Filters → Add.
- Add a filter on sales[date] to restrict the workbook to the last 24 months.
- Add another on customers[is_active] to show only active customers.
- The filters apply globally: every worksheet in the workbook sees only the filtered data.
Data Source Filters are the right place for security restrictions and for performance restrictions that should apply to every visual.
30.13.7 Step 7 — Build a Quick Verification Worksheet
To prove the connections and relationships work:
- Click Sheet 1 at the bottom-left.
- Drag customers[state] to Rows and sales[amount] to Columns (it auto-aggregates as SUM).
- Drag campaigns[campaign_name] to Color. The bar chart shows revenue by state, broken down by campaign.
- Add the macro indicator (e.g., gdp_per_capita_state) to Tooltip to verify the published data source is accessible.
If any visual returns blank or unexpected zeros, the relationship configuration is the most likely culprit; revisit the Data Source page and verify the join keys.
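A fast way to check key health before debugging inside Tableau is to count orphan keys directly against the source. The sqlite3 sketch below uses illustrative tables; the same left-join/IS NULL pattern works on any relational source:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales (customer_id INTEGER, amount REAL);
    INSERT INTO sales VALUES (101, 10), (999, 20);
    CREATE TABLE customers (customer_id INTEGER, state TEXT);
    INSERT INTO customers VALUES (101, 'KA'), (102, 'MH');
""")

# Orphan keys on either side of a relationship explain blank marks and
# unexpected zeros: here one sales row has no matching customer.
orphans = con.execute("""
    SELECT COUNT(*) FROM sales s
    LEFT JOIN customers c ON s.customer_id = c.customer_id
    WHERE c.customer_id IS NULL
""").fetchone()[0]
print(orphans)  # 1
```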
30.13.8 Step 8 — Connect to the Visualisation Layer
The hands-on shows that the Data Source page is part of the visualisation work, not a separate engineering step. Decisions made there — extract versus live, Relationships versus Joins, source filters, types and aliases — directly determine what charts Tableau can render quickly and correctly.
A workbook that produces beautiful charts on top of a poorly configured data source is a workbook that breaks under any change in scope. Time spent on the Data Source page repays itself many times over in the worksheet and dashboard layers.
Tableau workbook (yuvijen-data-connections.twbx), the source CSV, SQLite database, Google Sheet (or Excel substitute), the published data source reference, and screen recordings of each connection step will be embedded here.
Summary
| Concept | Description |
|---|---|
| Foundations | |
| Why Tableau Matters | Tableau turned interactive visualisation from a specialist craft into a mainstream business skill |
| Tableau Product Family | |
| Tableau Desktop | Authoring environment for workbooks and dashboards |
| Tableau Prep Builder | Visual data-prep tool for cleaning and shaping data before analysis |
| Tableau Server | On-premise deployment of dashboards to a corporate audience |
| Tableau Cloud | Tableau-hosted SaaS deployment formerly called Tableau Online |
| Tableau Public | Free version that publishes to the public Tableau Public site |
| Tableau Reader | Free desktop app that opens packaged workbooks without editing |
| Tableau Mobile | iOS and Android app for consuming dashboards on the go |
| Tableau Pulse | Metric monitoring with AI-driven insights |
| The Desktop Interface | |
| Start Page | Connect to data and open recent workbooks |
| Data Source Page | Configure connections, joins, relationships, extracts, and filters |
| Worksheet | Single-chart space using shelves and Show Me |
| Dashboard | Composes multiple worksheets into one interactive view |
| Story | Sequenced narrative built from worksheets and dashboards |
| Data Pane | Fields from the data source split into Dimensions and Measures |
| Analytics Pane | Drag-and-drop trend lines, forecasts, reference lines, clustering |
| Shelves | Columns, Rows, Pages, Filters, Marks (Color, Size, Label, Detail, Tooltip) |
| Show Me | Chart-type recommender that highlights compatible chart types |
| Connection Types | |
| File Connectors | Excel, CSV, JSON, PDF, spatial, statistical files |
| Server Connectors | SQL Server, Oracle, MySQL, PostgreSQL, SAP HANA, Teradata |
| Cloud Connectors | Snowflake, BigQuery, Redshift, Databricks, Salesforce, Google Sheets |
| Web Data Connector | JavaScript-based interface for connecting to APIs without native connectors |
| Published Data Source | Reuse a curated, governed data source already published |
| Live versus Extract | |
| Live Connection | Data stays at source; query at every dashboard interaction |
| Extract Connection | Cached as a Hyper file; very fast queries; refreshed on schedule |
| Hyper Engine | Tableau's columnar in-memory query engine |
| Joins, Unions, and Relationships | |
| Inner Join | Rows where the join key matches in both tables |
| Left Join | All rows from the left table with matched rows from the right |
| Right Join | All rows from the right table with matched rows from the left |
| Full Outer Join | All rows from both tables with nulls where no match exists |
| Cross Join | Cartesian product; every row from one paired with every row from the other |
| Union | Stacks tables vertically; combines monthly files or partitioned tables |
| Wildcard Union | Specify a file pattern; new files matching the pattern are picked up automatically |
| Relationship | Logical link between tables that defers join until query time |
| Data Blending | |
| Data Blending | Worksheet-level technique for combining two completely separate data sources at query time |
| Custom SQL and Filters | |
| Custom SQL | Write a SQL query directly against the source as the basis for the connection |
| Initial SQL | Code that runs once when the connection opens, before any user query |
| Data Source Filter | Filter applied at data source level rather than per worksheet |
| Server and Cloud | |
| Certified Data Source | Validated and curated published data source serving multiple workbooks |
| Best Practices | |
| Extract by Default | Choose Extract by default for analytical workbooks; switch to Live only when freshness demands it |
| Filter at Source | Filter at the data source level to keep the workbook lean and queries fast |
| Prefer Relationships | Prefer Relationships over Joins unless a flat physical table is genuinely needed |
| Reuse Published Sources | Reuse published data sources rather than rebuilding them in every workbook |
| Document the Connection | Short text annotation on the Data Source page describing source and refresh |
| Avoid Custom SQL Routinely | Use Custom SQL only when the wizard cannot express the query |
| Schedule Refreshes Off-Peak | Schedule extract refreshes at off-peak hours and coordinate with source teams |
| Monitor Extract Size | Beyond a few hundred million rows, partition or use Live connection |
| Common Pitfalls | |
| Inner-Join Silent Drop | Pitfall of an inner join with mismatched keys silently losing rows |
| Cartesian Cross-Join | Pitfall of joining on the wrong key and producing an enormous flat table |
| Live on Slow Source | Pitfall of building an interactive dashboard on a live connection to a slow warehouse |
| Extract Without Refresh | Pitfall of forgetting to schedule the extract refresh; dashboard shows stale data |
| Custom SQL Everywhere | Pitfall of wrapping every workbook in Custom SQL and degrading performance |
| Multiple Disconnected Extracts | Pitfall of a single workbook with multiple extracts that should be one published source |
| Pre-Aggregated Joined to Detail | Pitfall of joining a daily-aggregated table with a detail table; relationships handle this correctly |
| Ignoring the Data Source Page | Pitfall of skipping straight to the worksheet without configuring the Data Source page |
| Time Zone and Locale Bugs | Pitfall of a live connection returning UTC dates that the dashboard renders without conversion |