30  Tableau Proficiency and Data Connections

30.1 Why Tableau Matters

Tableau turned interactive visualisation from a specialist craft into a mainstream business skill.

Tableau is one of the two dominant business-intelligence platforms in the world (Power BI being the other), and it has shaped how a generation of analysts thinks about visualisation. Its drag-and-drop grammar of graphics, its Show Me recommender, and its rich library of interactive features lowered the barrier to genuinely good charts and made dashboards a routine output of the analytics function rather than a bespoke project.

The standard practitioner references on Tableau are Tableau Your Data! (Daniel G. Murray, 2013) and Practical Tableau (Ryan Sleeper, 2018), both of which assume the reader is going to spend many hours inside the tool and wants to spend them well.

For this book — a visualisation-focused analytics text — Tableau and the chapters that follow on Power BI and Excel are where the principles of Modules 2 and 3 become concrete tools the reader can use. This chapter covers the foundation: what Tableau is, how data connects to it, and which choices the analyst makes before any chart is built.

30.2 The Tableau Product Family

Tip: Tableau Products at a Glance

| Product | Role | Typical User |
| --- | --- | --- |
| Tableau Desktop | Authoring environment for workbooks and dashboards | Analyst on a workstation |
| Tableau Prep Builder | Visual data-prep tool for cleaning and shaping data before analysis | Data analyst, light data engineer |
| Tableau Server | On-premise deployment of dashboards to a corporate audience | Enterprise BI platform |
| Tableau Cloud (formerly Tableau Online) | Tableau-hosted SaaS deployment | Smaller organisations or those preferring a managed service |
| Tableau Public | Free version that publishes to the public Tableau Public site | Personal portfolio, public-data work |
| Tableau Reader | Free desktop app that opens packaged workbooks (.twbx) without editing | Read-only consumers |
| Tableau Mobile | iOS / Android app for consuming dashboards | Field staff, executives |
| Tableau Pulse | Metric monitoring with AI-driven insights | Subscribers to KPI feeds |

For an analyst, the day-to-day pair is Tableau Desktop for authoring and Tableau Cloud / Server for sharing the result. The other products fit specific needs.

30.3 Touring the Tableau Desktop Interface

```mermaid
flowchart TD
    T["Tableau Desktop"]
    T --> S["Start Page<br>Connect to data and<br>open recent workbooks"]
    T --> DS["Data Source Page<br>Configure connections,<br>joins, extracts, filters"]
    T --> WS["Worksheet<br>Build a single visualisation<br>using the shelves and Show Me"]
    T --> DB["Dashboard<br>Compose multiple worksheets<br>into one interactive view"]
    T --> ST["Story<br>Sequence of dashboards<br>or worksheets with narrative"]
    style T fill:#e3f2fd,stroke:#1976D2
    style S fill:#fce4ec,stroke:#AD1457
    style DS fill:#fff3e0,stroke:#EF6C00
    style WS fill:#fff8e1,stroke:#F9A825
    style DB fill:#e8f5e9,stroke:#388E3C
    style ST fill:#ede7f6,stroke:#4527A0
```

A Tableau Desktop workbook moves between five spaces:

  • Start Page: Connect to a new data source, open a recent workbook, or browse sample data.
  • Data Source Page: Configure how the workbook talks to its data — connections, joins, relationships, extracts, source filters.
  • Worksheet: The single-chart space. The analyst drags fields onto the Rows, Columns, Pages, and Filters shelves and the Marks card (Color, Size, Label, Detail, Tooltip) to build a visualisation.
  • Dashboard: Composes multiple worksheets into one interactive view. Adds objects (text, image, container, vertical/horizontal layout) and dashboard Actions (filter, highlight, URL, parameter).
  • Story: A sequenced narrative built from worksheets and dashboards.

Inside a worksheet, the analyst’s primary controls are:

  • Data pane (left) — fields from the data source, split into Dimensions and Measures.
  • Analytics pane (left) — drag-and-drop trend lines, forecasts, reference lines, box plots, clustering.
  • Shelves and cards (top and centre) — Columns, Rows, Pages, Filters, and the Marks card (Color, Size, Label, Detail, Tooltip).
  • Show Me (top-right) — recommender that highlights the chart types compatible with the currently selected fields.

30.4 Data Connection Types

```mermaid
flowchart TD
    C["Tableau<br>Connectors"]
    C --> F["Files<br>Excel, CSV, JSON,<br>PDF, Spatial,<br>Statistical Files"]
    C --> S["Servers (RDBMS)<br>SQL Server, Oracle,<br>MySQL, PostgreSQL,<br>SAP HANA, Teradata"]
    C --> Cl["Cloud Platforms<br>Snowflake, BigQuery,<br>Redshift, Databricks,<br>Salesforce, Google Sheets"]
    C --> W["Web Data Connector<br>Custom JavaScript-based<br>connectors for APIs"]
    C --> P["Tableau Server<br>and Cloud<br>Published data sources"]
    style C fill:#e3f2fd,stroke:#1976D2
    style F fill:#e8f5e9,stroke:#388E3C
    style S fill:#fff8e1,stroke:#F9A825
    style Cl fill:#fff3e0,stroke:#EF6C00
    style W fill:#fce4ec,stroke:#AD1457
    style P fill:#ede7f6,stroke:#4527A0
```

Tableau ships with native connectors across five families:

  • Files: Excel (.xlsx, .xls), text (.csv, .tsv), JSON, PDF tables, statistical files (.sas7bdat, .sav, .rda), spatial files (.shp, .kml, .geojson).
  • Servers (Relational Databases): SQL Server, Oracle, MySQL, PostgreSQL, SAP HANA, Teradata, IBM Db2, Vertica, Sybase.
  • Cloud Platforms: Snowflake, Google BigQuery, Amazon Redshift, Databricks, Azure Synapse, Salesforce, Google Sheets, Box, OneDrive, Dropbox.
  • Web Data Connector (WDC): A JavaScript-based interface for connecting to APIs and other web sources that lack native connectors.
  • Tableau Server / Cloud Published Data Sources: Reuse a curated, governed data source already published by another Tableau author.

The connector families differ in performance, security, and freshness behaviour. Files are simplest but rarely live; cloud and server connectors support live and refreshed-extract patterns.

30.5 Live versus Extract Connections

Every Tableau connection is either Live or Extract. The choice is one of the most consequential decisions in any workbook and shapes performance, freshness, and infrastructure cost.

Tip: Live versus Extract

| Property | Live | Extract |
| --- | --- | --- |
| Data location | Stays at the source | Cached as a .hyper file |
| Freshness | Always current at query time | Stale until refreshed |
| Query speed | Limited by source database performance | Very fast (Tableau's columnar Hyper engine) |
| Source load | Every dashboard interaction queries source | Source queried only on extract refresh |
| Aggregations supported | Whatever the source supports | Full Tableau analytical engine |
| Best for | Real-time operational dashboards on fast warehouses | Most analytical and presentation use cases |
| Refresh management | None needed | Schedule daily, hourly, or on demand |

The default in modern practice is to use Extract unless the dashboard genuinely needs sub-minute freshness or the source data set is too large to extract. Cloud warehouses (Snowflake, BigQuery, Databricks) have made Live connections far more practical than they once were, but the Hyper extract engine still outperforms most live connections for interactive dashboards.
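The extract pattern can be sketched in miniature. The snippet below is a hypothetical illustration, not Tableau's Hyper engine: a "source" database is read once in bulk (the refresh), the rows are materialised into a local cache, and all subsequent queries hit the cache instead of the source.

```python
import sqlite3

# Hypothetical miniature of the extract pattern: a remote "source"
# database and a local cache standing in for a .hyper extract.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (region TEXT, amount REAL)")
source.executemany("INSERT INTO sales VALUES (?, ?)",
                   [("North", 120.0), ("South", 80.0), ("North", 50.0)])

# "Extract refresh": one bulk read from the source...
rows = source.execute("SELECT region, amount FROM sales").fetchall()

# ...materialised locally. Dashboard interactions now query the
# cache, placing no further load on the source until the next refresh.
extract = sqlite3.connect(":memory:")
extract.execute("CREATE TABLE sales (region TEXT, amount REAL)")
extract.executemany("INSERT INTO sales VALUES (?, ?)", rows)

total_by_region = dict(extract.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"))
print(total_by_region)  # {'North': 170.0, 'South': 80.0}
```

The trade-off in the table above falls out directly: the cache answers instantly but shows whatever the last refresh captured.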

30.6 Joins, Unions, and Relationships

Tableau provides three ways to combine multiple tables: Joins (the classical row-level merge), Unions (stacking tables vertically), and Relationships (logical links introduced in Tableau 2020.2 that defer the join until query time).

30.6.1 Joins (Physical)

Tableau’s Joins behave like SQL joins:

  • Inner Join: Returns rows where the join key matches in both tables.
  • Left Join: All rows from the left table, with matched rows from the right.
  • Right Join: All rows from the right table, with matched rows from the left.
  • Full Outer Join: All rows from both tables, with nulls where no match exists.
  • Cross Join (cartesian, used sparingly): Every row from one table paired with every row from the other.

To configure: drag a second table next to the first on the Data Source page, click the Venn-diagram icon, and choose the join type and key. The result is a single physical, denormalised table at the level of the join.

A common pitfall: an inner join silently drops rows that fail the match, sometimes hiding millions of unmatched records. Always inspect the row count before and after the join.
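The before-and-after row-count check can be sketched as follows; the tables and keys are hypothetical, chosen so one sale has no matching customer.

```python
import sqlite3

# Hypothetical sketch of the inner-join row-count check: count rows
# before and after the join to surface silently dropped records.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (sale_id INTEGER, customer_id INTEGER)")
con.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [(1, 10), (2, 11), (3, 99)])   # key 99 has no customer row
con.executemany("INSERT INTO customers VALUES (?, ?)",
                [(10, "Asha"), (11, "Ravi")])

before = con.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
after = con.execute("""
    SELECT COUNT(*) FROM sales s
    INNER JOIN customers c ON s.customer_id = c.customer_id
""").fetchone()[0]

print(before, after)  # 3 2 — one unmatched sale silently dropped
```

If the two counts differ and the join was not meant to filter, the keys need investigation before any chart is built.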

30.6.2 Unions

A Union stacks tables vertically — useful for combining monthly files, regional extracts, or partitioned tables that share the same schema. Tableau supports both manual unions (drag tables into the Union dialog) and wildcard unions (specify a file pattern such as Sales_2026_*.csv; new files matching the pattern are picked up automatically on refresh).
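The wildcard union's behaviour can be approximated outside Tableau: glob the pattern, stack every matching file, and record which file each row came from (Tableau exposes this as an automatic Path field). The file names and columns below are hypothetical.

```python
import csv, glob, os, tempfile

# Hypothetical sketch of a wildcard union: stack every file matching
# Sales_2026_*.csv, assuming all files share the same schema.
tmp = tempfile.mkdtemp()
for month, rows in [("01", [["A", "100"]]),
                    ("02", [["B", "200"], ["C", "50"]])]:
    with open(os.path.join(tmp, f"Sales_2026_{month}.csv"), "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["product", "amount"])   # shared header row
        w.writerows(rows)

union = []
for path in sorted(glob.glob(os.path.join(tmp, "Sales_2026_*.csv"))):
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            row["source_file"] = os.path.basename(path)  # provenance column
            union.append(row)

print(len(union))  # 3 rows stacked from 2 files
```

A new `Sales_2026_03.csv` dropped into the folder would be picked up by the same glob on the next run, which is exactly the refresh behaviour of Tableau's wildcard union.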

30.6.3 Relationships (Logical)

Introduced in Tableau 2020.2, Relationships are logical links between tables. Unlike physical joins, relationships do not flatten the tables; they preserve each table’s grain and apply the appropriate join only at query time, based on which fields the analyst uses in the visual.

Advantages:

  • Correct aggregation at multiple levels of detail: A customer and order relationship returns the right customer count and the right order count without duplicating either.
  • Query-time join optimisation: Tableau picks the join type and direction based on the visualisation context.
  • Less risk of fan-out and chasm-trap errors: the classic one-to-many and many-to-many duplication problems of physical joins.

Most modern Tableau workbooks should use Relationships by default and fall back to Joins only when a true denormalised flat table is needed.
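The multi-grain aggregation point can be made concrete. In this hypothetical sketch, one customer has three orders: a flat join repeats the customer row per order (fan-out), while querying each table at its own grain — which is what a relationship does at query time — keeps both counts right.

```python
import sqlite3

# Hypothetical sketch of join fan-out versus relationship-style
# query-time aggregation: one customer, three orders.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (customer_id INTEGER, segment TEXT)")
con.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
con.execute("INSERT INTO customers VALUES (1, 'Retail')")
con.executemany("INSERT INTO orders VALUES (?, 1)", [(101,), (102,), (103,)])

# Flat join: the customer row is duplicated once per order.
joined = con.execute("""
    SELECT COUNT(DISTINCT c.customer_id), COUNT(*)
    FROM customers c JOIN orders o ON c.customer_id = o.customer_id
""").fetchone()
print(joined)  # (1, 3) — a naive COUNT(*) on the flat table inflates customers

# Each table queried at its own grain: both counts are correct.
customers = con.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
orders = con.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(customers, orders)  # 1 3
```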

30.7 Data Blending

Data Blending is a worksheet-level technique for combining two completely separate data sources at query time. It is suited to scenarios where joins and relationships are infeasible — for example, combining a Tableau Server published data source with a local Excel file.

How it works:

  • The first data source is the primary; subsequent ones are secondaries.
  • Tableau matches rows by linked fields — fields with matching names in both sources.
  • Aggregation happens in each source separately; only aggregated results are blended.
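The aggregate-then-combine mechanics can be sketched as follows. The sources, fields, and linked `state` column are hypothetical; the point is that each source is rolled up independently before the results meet.

```python
# Hypothetical sketch of blend mechanics: each source is aggregated
# separately, and only the aggregates meet on the linked field.
primary = [  # e.g. a published sales source (row-level)
    {"state": "MH", "amount": 120}, {"state": "MH", "amount": 30},
    {"state": "KA", "amount": 80},
]
secondary = [  # e.g. a local Excel file of targets
    {"state": "MH", "target": 160}, {"state": "KA", "target": 90},
]

# Aggregate the primary source within itself...
sales_by_state: dict[str, int] = {}
for row in primary:
    sales_by_state[row["state"]] = sales_by_state.get(row["state"], 0) + row["amount"]

# ...aggregate the secondary within itself...
target_by_state = {row["state"]: row["target"] for row in secondary}

# ...then combine the aggregates on the linked field.
blended = {s: (sales_by_state[s], target_by_state.get(s))
           for s in sales_by_state}
print(blended)  # {'MH': (150, 160), 'KA': (80, 90)}
```

This is also why only aggregated values from the secondary source are available: the row-level secondary data never reaches the primary side.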

Limitations:

  • Only aggregated values from secondary sources are accessible.
  • Performance depends on each source individually.
  • Some calculation patterns (LOD, table calculations on secondary fields) do not work inside blends.

Use Blending only when Joins or Relationships cannot solve the problem. In modern Tableau, Relationships have replaced most of Blending’s traditional use cases.

30.8 Custom SQL, Initial SQL, and Data Source Filters

Three advanced features on the Data Source page:

  • Custom SQL: Write a SQL query directly against the source as the basis for the connection. Useful for complex pre-aggregation or for sources where the wizard’s options are limited. Use sparingly — Custom SQL forces Tableau to wrap the query in subqueries for every visual, which can hurt performance.

  • Initial SQL: Code that runs once when the connection opens, before any user query. Useful for setting session variables (locale, time zone, optimiser hints) on the source.

  • Data Source Filters: Filters applied at the data source level rather than per worksheet. They restrict the entire workbook’s view of the data — useful for security (only this region’s data is exposed) and for performance (the user never sees pre-2025 data).
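The Custom SQL performance caveat is easiest to see in query shape. In this hypothetical sketch, the Custom SQL text is treated as an opaque table and wrapped in a subquery for the visual's aggregation, which is essentially what Tableau generates for each chart built on a Custom SQL connection.

```python
import sqlite3

# Hypothetical illustration of Custom SQL wrapping: the analyst's query
# becomes an opaque inner table for every visual-level query.
custom_sql = "SELECT region, amount FROM sales WHERE amount > 0"

# Roughly what the database receives for a simple bar chart:
visual_query = f"SELECT region, SUM(amount) FROM ({custom_sql}) t GROUP BY region"

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("North", 100.0), ("South", -5.0), ("South", 40.0)])
result = dict(con.execute(visual_query))
print(result)  # {'North': 100.0, 'South': 40.0}
```

Because the optimiser must work through the subquery for every interaction, complex Custom SQL tends to be slower than connecting to the underlying tables and letting Tableau generate the query.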

30.9 Connecting to Tableau Server and Tableau Cloud

Tableau publishes data sources, workbooks, and dashboards to Tableau Server (on-premise) or Tableau Cloud (SaaS). Once published, the data sources can be reused across many workbooks, ensuring consistent definitions and centralised refresh schedules.

The pattern in mature Tableau deployments:

  1. A certified data source is built, validated, and published by a steward.
  2. Workbooks across the firm connect to the published data source rather than to raw tables.
  3. Refresh schedules and security are managed centrally on Server / Cloud.
  4. Dashboards built on the data source are themselves published, and consumers see them through the web or the Tableau Mobile app.

This is the operational form of the conformed-dimension principle from Chapter 24: a single trusted source serves every dashboard.

30.10 Best Practices for Data Connections

  • Choose Extract by default for analytical workbooks; switch to Live only when freshness genuinely demands it.
  • Filter at the data source level to keep the workbook lean and the queries fast.
  • Prefer Relationships over Joins unless a flat physical table is genuinely needed.
  • Reuse published data sources rather than re-building them in every workbook.
  • Document the connection — a short text annotation on the Data Source page describing the source, refresh schedule, and any filters helps the next analyst.
  • Avoid Custom SQL for routine work; use it only when the wizard cannot express the query.
  • Schedule extract refreshes at off-peak hours; coordinate with the source-system team if needed.
  • Monitor extract size — beyond a few hundred million rows, partitioning or live connection may be more efficient than a single extract.

30.11 Common Pitfalls

  • Inner-Join Silent Drop: An inner join with mismatched keys silently loses rows; always count before and after.
  • Cartesian Cross-Join: Joining on the wrong key (or no key) produces a many-to-many cartesian result and an enormous flat table.
  • Live Connection on Slow Source: Building an interactive dashboard on a live connection to a slow data warehouse; every click triggers a multi-second query.
  • Extract Without Refresh: Forgetting to schedule the extract refresh; the dashboard silently shows last week’s data.
  • Custom SQL Everywhere: Wrapping every workbook in Custom SQL because that is what the analyst is comfortable with; performance degrades and maintenance becomes painful.
  • Multiple Disconnected Extracts: A single workbook with three separate extracts that should have been one published data source.
  • Joining Pre-Aggregated and Detail Tables: Joining a daily-aggregated sales table with a detail-level customer table produces fan-out errors; relationships handle this correctly.
  • Ignoring the Data Source Page: Skipping straight to the worksheet without setting up filters, types, and aliases on the Data Source page; problems compound downstream.
  • Forgetting Time Zones and Locales: A live connection that returns dates in UTC but the dashboard renders in IST without conversion.
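The time-zone pitfall is worth seeing in numbers. In this hypothetical sketch, a timestamp late in the UTC evening already belongs to the next calendar day in IST, so an unconverted dashboard would bucket it into the wrong date.

```python
from datetime import datetime, timezone, timedelta

# Hypothetical sketch of the UTC-vs-IST date pitfall.
IST = timezone(timedelta(hours=5, minutes=30))

source_ts = datetime(2026, 3, 31, 20, 0, tzinfo=timezone.utc)  # 20:00 UTC, 31 Mar
local_ts = source_ts.astimezone(IST)                           # 01:30 IST, 1 Apr

print(source_ts.date(), local_ts.date())  # 2026-03-31 2026-04-01
```

Every row near the day boundary shifts like this, so daily totals computed on unconverted UTC dates will disagree with the business's local-time view.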

30.12 Illustrative Cases

A Mid-Size Bank’s Single Published Data Source

A retail bank publishes a Customer 360 data source on Tableau Server, certified by the data office, refreshed nightly. Every customer-related dashboard across marketing, risk, and operations connects to it. Definitions agree across functions and refresh load is centralised on one schedule rather than many.

A Logistics Firm’s Live Connection

A logistics control tower needs sub-minute freshness on vehicle location and route status. The team builds a live Tableau connection on a Snowflake warehouse with materialised views for the heavy aggregations. Every dashboard interaction queries Snowflake; the design works because the cloud warehouse handles the load and the dashboard’s question genuinely requires near-real-time data.

A Failed Custom SQL Migration

A team migrates a Power BI report to Tableau by translating each Power Query M step into a Custom SQL block. Each visual now wraps the Custom SQL in a subquery, queries are slow, and small changes require editing SQL. The redesign moves to a published, certified data source with Tableau Relationships and recovers the performance.


30.13 Hands-On Exercise: Connecting to Multiple Data Sources in Tableau

Aim: Practice the four most common Tableau connection patterns — file, server (database), cloud, and published data source — by combining four sources into one workbook for Yuvijen Stores Pvt Ltd and shaping them into an analysis-ready dataset.

Scenario: Build a single Tableau workbook that integrates:

  • A CSV file of sales transactions
  • A SQLite database of the customer master
  • A Google Sheet of marketing campaigns (or, if Google Sheets is not available, an Excel file)
  • A Tableau Server / Cloud published data source of macroeconomic indicators (or a fall-back local Excel file representing the same)

Deliverable: A .twb workbook with a clean Data Source page, demonstrating Joins, a Relationship, an Extract, and a Data Source Filter.

30.13.1 Step 1 — Connect to the CSV File

  1. Open Tableau Desktop. The Start Page shows the Connect pane on the left.
  2. Under To a File, click Text File and select sales.csv.
  3. The Data Source Page opens with the file as a single table.
  4. Drag the table to the canvas if it is not already there.
  5. In the data grid below the canvas, verify that each column has the correct data type indicator (Abc for text, # for numeric, calendar for date). Fix any wrong inference by clicking the indicator.

30.13.2 Step 2 — Add a Database Connection

  1. On the Data Source Page, in the left-hand Connections pane, click the Add link.
  2. Choose More → Other Databases (JDBC) or pick the specific database type (here, SQLite).
  3. For SQLite: paste the file path. For SQL Server / MySQL / PostgreSQL: provide host, port, database, credentials.
  4. Drag the customers table to the canvas, next to sales.
  5. Tableau opens the Relationship dialog by default. Connect sales[customer_id] to customers[customer_id].

The two tables are now in a relationship, not a physical join. The Data Source canvas shows a noodle line between them.

30.13.3 Step 3 — Add a Google Sheet (or Excel)

  1. Click Add again on the Connections pane.
  2. Choose Google Sheets, sign in, select the campaigns sheet. (Alternative: choose Microsoft Excel and pick a local campaigns.xlsx.)
  3. Drag the campaigns table to the canvas.
  4. Relate sales[campaign_id] to campaigns[campaign_id].

The Data Source page now has three connected tables.

30.13.4 Step 4 — Add a Published Data Source

  1. Click Add → Tableau Server (or Tableau Cloud).
  2. Sign in to the Server / Cloud instance. The Search for Data dialog appears.
  3. Choose the Macro Indicators published data source.
  4. Drag it to the canvas. Tableau treats it like any other source for relationship purposes.
  5. Relate the new source’s state column to customers[state].

If a published data source is unavailable for the exercise, substitute a local Excel file with macro indicators.

30.13.5 Step 5 — Convert to an Extract

  1. In the upper-right of the Data Source page, switch the connection type from Live to Extract.
  2. Click Edit to configure the extract — choose All rows or a sample, and add filters and aggregation as needed for performance.
  3. Save the extract as a .hyper file.
  4. Tableau materialises the data into the Hyper engine for fast subsequent queries.

For Server-published sources, the extract is configured on Server / Cloud; the workbook simply references the published refresh schedule.

30.13.6 Step 6 — Apply a Data Source Filter

  1. In the top-right of the Data Source page, click Filters → Add.
  2. Add a filter on sales[date] to restrict the workbook to the last 24 months.
  3. Add another on customers[is_active] to show only active customers.
  4. The filters apply globally — every worksheet in the workbook sees only the filtered data.

Data Source Filters are the right place for security restrictions and for performance restrictions that should apply to every visual.

30.13.7 Step 7 — Build a Quick Verification Worksheet

To prove the connections and relationships work:

  1. Click Sheet 1 at the bottom-left.
  2. Drag customers[state] to Rows, sales[amount] to Columns (it auto-aggregates as SUM).
  3. Drag campaigns[campaign_name] to Color. The bar chart shows revenue by state, broken down by campaign.
  4. Add the macro indicator (e.g., gdp_per_capita_state) to Tooltip to verify the published data source is accessible.

If any visual returns blank or unexpected zeros, the relationship configuration is the most likely culprit; revisit the Data Source page and verify the join keys.
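The verification worksheet amounts to one grouped query across the related tables. The sketch below mirrors it with hypothetical rows: revenue by state, broken down by campaign, through the sales–customers and sales–campaigns keys.

```python
import sqlite3

# Hypothetical miniature of the verification worksheet as a query.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (customer_id INTEGER, state TEXT)")
con.execute("CREATE TABLE campaigns (campaign_id INTEGER, campaign_name TEXT)")
con.execute("CREATE TABLE sales (customer_id INTEGER, campaign_id INTEGER, amount REAL)")
con.execute("INSERT INTO customers VALUES (1, 'MH'), (2, 'KA')")
con.execute("INSERT INTO campaigns VALUES (10, 'Diwali'), (11, 'Monsoon')")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                [(1, 10, 100.0), (1, 11, 40.0), (2, 10, 75.0)])

rows = con.execute("""
    SELECT cu.state, ca.campaign_name, SUM(s.amount)
    FROM sales s
    JOIN customers cu ON s.customer_id = cu.customer_id
    JOIN campaigns ca ON s.campaign_id = ca.campaign_id
    GROUP BY cu.state, ca.campaign_name
    ORDER BY cu.state, ca.campaign_name
""").fetchall()
for row in rows:
    print(row)
# ('KA', 'Diwali', 75.0)
# ('MH', 'Diwali', 100.0)
# ('MH', 'Monsoon', 40.0)
```

If a key is misconfigured, the symptom is the same as in the worksheet: missing groups or zeroed sums, which points straight back to the relationship setup.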

30.13.8 Step 8 — Connect to the Visualisation Layer

The hands-on shows that the Data Source page is part of the visualisation work, not a separate engineering step. Decisions made there — extract versus live, Relationships versus Joins, source filters, types and aliases — directly determine what charts Tableau can render quickly and correctly.

A workbook that produces beautiful charts on top of a poorly configured data source is a workbook that breaks under any change in scope. Time spent on the Data Source page repays itself many times over in the worksheet and dashboard layers.

Tip: Files and Screen Recordings

Tableau workbook (yuvijen-data-connections.twbx), the source CSV, SQLite database, Google Sheet (or Excel substitute), the published data source reference, and screen recordings of each connection step will be embedded here.


Summary

| Concept | Description |
| --- | --- |
| **Foundations** | |
| Why Tableau Matters | Tableau turned interactive visualisation from a specialist craft into a mainstream business skill |
| **Tableau Product Family** | |
| Tableau Desktop | Authoring environment for workbooks and dashboards |
| Tableau Prep Builder | Visual data-prep tool for cleaning and shaping data before analysis |
| Tableau Server | On-premise deployment of dashboards to a corporate audience |
| Tableau Cloud | Tableau-hosted SaaS deployment formerly called Tableau Online |
| Tableau Public | Free version that publishes to the public Tableau Public site |
| Tableau Reader | Free desktop app that opens packaged workbooks without editing |
| Tableau Mobile | iOS and Android app for consuming dashboards on the go |
| Tableau Pulse | Metric monitoring with AI-driven insights |
| **The Desktop Interface** | |
| Start Page | Connect to data and open recent workbooks |
| Data Source Page | Configure connections, joins, relationships, extracts, and filters |
| Worksheet | Single-chart space using shelves and Show Me |
| Dashboard | Composes multiple worksheets into one interactive view |
| Story | Sequenced narrative built from worksheets and dashboards |
| Data Pane | Fields from the data source split into Dimensions and Measures |
| Analytics Pane | Drag-and-drop trend lines, forecasts, reference lines, clustering |
| Shelves | Columns, Rows, Pages, Filters, Marks (Color, Size, Label, Detail, Tooltip) |
| Show Me | Chart-type recommender that highlights compatible chart types |
| **Connection Types** | |
| File Connectors | Excel, CSV, JSON, PDF, spatial, statistical files |
| Server Connectors | SQL Server, Oracle, MySQL, PostgreSQL, SAP HANA, Teradata |
| Cloud Connectors | Snowflake, BigQuery, Redshift, Databricks, Salesforce, Google Sheets |
| Web Data Connector | JavaScript-based interface for connecting to APIs without native connectors |
| Published Data Source | Reuse a curated, governed data source already published |
| **Live versus Extract** | |
| Live Connection | Data stays at source; query at every dashboard interaction |
| Extract Connection | Cached as a Hyper file; very fast queries; refreshed on schedule |
| Hyper Engine | Tableau's columnar in-memory query engine |
| **Joins, Unions, and Relationships** | |
| Inner Join | Rows where the join key matches in both tables |
| Left Join | All rows from the left table with matched rows from the right |
| Right Join | All rows from the right table with matched rows from the left |
| Full Outer Join | All rows from both tables with nulls where no match exists |
| Cross Join | Cartesian product; every row from one paired with every row from the other |
| Union | Stacks tables vertically; combines monthly files or partitioned tables |
| Wildcard Union | Specify a file pattern; new files matching the pattern are picked up automatically |
| Relationship | Logical link between tables that defers join until query time |
| **Data Blending** | |
| Data Blending | Worksheet-level technique for combining two completely separate data sources at query time |
| **Custom SQL and Filters** | |
| Custom SQL | Write a SQL query directly against the source as the basis for the connection |
| Initial SQL | Code that runs once when the connection opens, before any user query |
| Data Source Filter | Filter applied at data source level rather than per worksheet |
| **Server and Cloud** | |
| Certified Data Source | Validated and curated published data source serving multiple workbooks |
| **Best Practices** | |
| Extract by Default | Choose Extract by default for analytical workbooks; switch to Live only when freshness demands it |
| Filter at Source | Filter at the data source level to keep the workbook lean and queries fast |
| Prefer Relationships | Prefer Relationships over Joins unless a flat physical table is genuinely needed |
| Reuse Published Sources | Reuse published data sources rather than rebuilding them in every workbook |
| Document the Connection | Short text annotation on the Data Source page describing source and refresh |
| Avoid Custom SQL Routinely | Use Custom SQL only when the wizard cannot express the query |
| Schedule Refreshes Off-Peak | Schedule extract refreshes at off-peak hours and coordinate with source teams |
| Monitor Extract Size | Beyond a few hundred million rows, partition or use Live connection |
| **Common Pitfalls** | |
| Inner-Join Silent Drop | Pitfall of an inner join with mismatched keys silently losing rows |
| Cartesian Cross-Join | Pitfall of joining on the wrong key and producing an enormous flat table |
| Live on Slow Source | Pitfall of building an interactive dashboard on a live connection to a slow warehouse |
| Extract Without Refresh | Pitfall of forgetting to schedule the extract refresh; dashboard shows stale data |
| Custom SQL Everywhere | Pitfall of wrapping every workbook in Custom SQL and degrading performance |
| Multiple Disconnected Extracts | Pitfall of a single workbook with multiple extracts that should be one published source |
| Pre-Aggregated Joined to Detail | Pitfall of joining a daily-aggregated table with a detail table; relationships handle this correctly |
| Ignoring the Data Source Page | Pitfall of skipping straight to the worksheet without configuring the Data Source page |
| Time Zone and Locale Bugs | Pitfall of a live connection returning UTC dates that the dashboard renders without conversion |