17  Sampling, Previewing, and Sharing Data

Note: What This Chapter Covers

Working with large datasets introduces two practical challenges: performance during development (previewing large flows is slow) and performance for end users (dashboards on large extracts can be sluggish). This chapter covers Tableau Prep’s sampling options for faster development iteration; data preview and validation techniques to ensure your flow output is correct before publishing; and the full range of options for sharing data: publishing to Tableau Server, exporting to files, and connecting published data sources to Tableau Desktop. It also covers extract optimisation techniques for Tableau Desktop that directly improve dashboard performance.

```mermaid
flowchart LR
    A[Large Source Data] --> B[Sampling\nFast development preview]
    B --> C[Validate\nPreview and check output]
    C --> D{Share How?}
    D --> E[Publish to Server\nor Cloud]
    D --> F[Export to File\n.hyper or .csv]
    D --> G[Save as .twbx\npackaged workbook]
    E --> H[Certified Data Source\nfor the team]
    style A fill:#e3f2fd,stroke:#1976D2
    style D fill:#fff9c4,stroke:#F9A825
    style H fill:#e8f5e9,stroke:#388E3C
```


17.1 Sampling in Tableau Prep

Note: Why Sample During Development?

When your source data has millions of rows, running the full flow for each development iteration is impractical: it wastes time and imposes unnecessary load on source databases. Tableau Prep’s sampling options allow you to work with a representative subset during development and run the full flow only when ready to publish.

Sampling options in the Input step:

| Sampling Method | How It Works | Best For |
| --- | --- | --- |
| Quick sample (1,000 rows) | Returns the first 1,000 rows | Fastest option; good for schema exploration |
| Sample (fixed N rows) | Returns exactly N rows from the source | Controlled development dataset size |
| Random sample (%) | Returns a random N% of all rows | Representative distribution of values |
| Full data | Returns all rows | Final validation and production runs |

To configure sampling: click the Input step > Settings tab > Sampling dropdown. During development, use a 5–10% random sample. Switch to Full data before running the final output.
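
The random-sample option has a direct analogue outside Prep. As a minimal sketch (the table and column names here are hypothetical, and the 1,000,000-row source is simulated in memory), Prep's "random sample (%)" behaves like drawing a fixed fraction of rows, which pandas expresses directly:

```python
import pandas as pd

# Hypothetical example: a 1,000,000-row source table.
df = pd.DataFrame({"order_id": range(1_000_000)})

# Prep's "random sample (%)" is equivalent to sampling a fraction of rows;
# a fixed seed makes the development subset reproducible between runs.
dev_sample = df.sample(frac=0.10, random_state=42)

print(len(dev_sample))  # 100000
```

The fixed seed is the design point worth noting: a reproducible development subset means that a bug you see today is still visible in tomorrow's run.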

Tip: The Development Workflow with Sampling

A production Tableau Prep workflow should be developed in three phases:

  1. Schema phase (1,000-row quick sample): Verify field names, data types, and overall structure. Build and test all transformations.
  2. Validation phase (10% random sample): Check value distributions, null counts, and join ratios with a representative subset.
  3. Production run (full data): Run once on the full dataset to produce the output. Schedule the flow for future automated runs.

This three-phase approach can reduce development time by 80% on million-row datasets while ensuring the final output is correct.


17.2 Previewing and Validating Flow Output

Note: Validation Checks Before Publishing

Before publishing a Prep flow output to Tableau Server, perform these five validation checks:

1. Row count check: Compare the output row count against your expectation. If you are unioning 12 monthly files of ~1,000 rows each, the output should be ~12,000 rows. Significant deviations indicate a join or union problem.

2. Null check: Click each field in the output step’s Profile pane. Any field with unexpected nulls (shown as grey bars at the bottom) requires investigation. Compare null percentages before and after each transformation step.

3. Distribution check: For key measures (Sales, Revenue, Profit), verify the distribution shape in the Profile pane matches your expectation. A histogram that looks very different from the previous month’s output signals a data loading issue.

4. Distinct value check: For key dimensions (Region, Category, Customer Segment), click the field in the profile and verify the number of distinct values is correct. An unexpected extra value (e.g., 5 regions instead of 4) often indicates a data quality issue introduced in the source.

5. Date range check: For date fields, hover over the Profile histogram to see the min and max dates. Verify the date range matches the expected period.
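
The same checks can be scripted against an exported flow output. The sketch below is a hypothetical illustration (the column names, the 12,000-row union, and the four regions are all assumptions matching the examples above), showing checks 1, 2, 4, and 5 as assertions a pipeline could run automatically:

```python
import pandas as pd

# Hypothetical flow output: 12 monthly files of ~1,000 rows unioned together.
output = pd.DataFrame({
    "Order Date": pd.date_range("2024-01-01", periods=12_000, freq="h"),
    "Region": ["East", "West", "North", "South"] * 3_000,
    "Sales": [float(i % 500) for i in range(12_000)],
})

# 1. Row count check: a union of 12 x ~1,000-row files should be ~12,000 rows.
row_count_ok = len(output) == 12_000

# 2. Null check: percentage of nulls per field; anything unexpected needs tracing.
null_pct = output.isna().mean()

# 4. Distinct value check: 5 regions instead of 4 would signal a source problem.
region_count = output["Region"].nunique()

# 5. Date range check: min and max must cover the expected period.
date_min, date_max = output["Order Date"].min(), output["Order Date"].max()

print(row_count_ok, region_count, (null_pct == 0).all())
```

Checks that live in code rather than in a checklist get run every time, which is the point of doing validation before every publish.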

Note: Using the Data Grid for Row-Level Validation

The Data Grid (bottom panel of Prep Builder) shows actual row-level data. Use it to:

  - Spot-check specific records that appear unusual in the profile.
  - Verify that calculated fields produce the expected values for specific rows.
  - Identify rows where a join produced unexpected nulls.

Filtering in the data grid: Click any bar in the Profile pane to filter the data grid to only rows matching that value. For example, click the “null” bar in the Region field to see all rows with a null region, then trace which input file or join step produced them.
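
The click-to-filter gesture has a one-line scripted equivalent. This is a minimal sketch with made-up rows and a hypothetical `source_file` column recording which input each row came from:

```python
import pandas as pd

# Hypothetical join output where some rows failed to match a region lookup.
rows = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "Region": ["East", None, "West", None],
    "source_file": ["jan.csv", "feb.csv", "jan.csv", "feb.csv"],
})

# Equivalent of clicking the "null" bar in the Region profile:
# keep only the rows with a null Region, then inspect where they came from.
null_region = rows[rows["Region"].isna()]
print(null_region["source_file"].unique())  # trace the offending input file(s)
```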


17.3 Sharing Data from Tableau Prep

Note: Publishing to Tableau Server or Cloud

Publishing the output of a Tableau Prep flow to Tableau Server or Cloud creates a published data source: a reusable, versioned, centrally governed dataset that any Tableau Desktop user in the organisation can connect to.

Steps to publish:

  1. In the Output step configuration, select Publish as a Data Source.
  2. Enter a name for the published data source (use business-friendly naming: “FY2024 Annual Sales Summary”, not “output_v3_final_FINAL”).
  3. Select the Tableau Server/Cloud project (folder) where it will be published.
  4. Set the Schedule for automatic refresh (daily, weekly, or on a custom schedule).
  5. Set the Permissions: who can connect to this data source.
  6. Click Run Flow to execute and publish.

After publishing, the data source appears in Tableau Server under the specified project. Any Desktop user with access can connect to it directly from the Start Page.

[Insert screenshot of the Tableau Prep Output step configured to publish to a Tableau Server project, with name, project, and schedule settings visible]

Note: Exporting to Files

When sharing data with non-Tableau users or feeding downstream tools, export to standard file formats:

| Format | Use Case |
| --- | --- |
| .hyper (Tableau extract) | Use directly in Tableau Desktop; fast in-memory querying |
| .csv (comma-separated) | Share with Python, R, Excel, or any data tool |
| .xlsx (Excel) | Share with business users who need to review data |
| Database write-back | Write results to a database table (SQL Server, Snowflake, etc.) for downstream ETL |

To export: set the Output step type to Save to File, choose the format, and specify the file path.
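
The .csv route is worth seeing end to end. As a hedged sketch with hypothetical field names and a temporary file path, this writes a flow-style output to CSV and reads it back exactly as a downstream Python consumer would:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical flow output exported as .csv for downstream (non-Tableau) tools.
out = pd.DataFrame({"Region": ["East", "West"], "Sales": [1200.5, 980.0]})

path = Path(tempfile.mkdtemp()) / "annual_sales_summary.csv"
out.to_csv(path, index=False)

# A downstream consumer (Python, R, Excel) reads the same file back.
roundtrip = pd.read_csv(path)
print(roundtrip.equals(out))  # True
```

CSV is the lowest-common-denominator format: it loses type metadata that .hyper preserves, so numeric and date columns are re-inferred on read, which is why a round-trip check like this is worth running once per pipeline.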


17.4 Sharing Data from Tableau Desktop

Note: Publishing Workbooks to Tableau Server

Tableau Desktop workbooks are published to Tableau Server or Cloud via Server > Publish Workbook (or Ctrl+Shift+S).

Publication options:

| Option | Description |
| --- | --- |
| Include extract | Bundles the data extract with the workbook (larger file, faster load) |
| Live connection | Keeps the database connection live (always current data) |
| Show sheets as tabs | Makes each sheet accessible as a tab on the Server web view |
| Include external files | Bundles any images or external resources referenced in the workbook |
| Set permissions | Configure who can view, download, or edit the published workbook |
| Schedule refresh | Set the extract refresh schedule at publish time |

After publishing, the workbook is accessible at the Tableau Server URL to all users with view permissions, via web browser or Tableau Mobile, with no Tableau Desktop licence required.

Note: Packaging Workbooks for Offline Sharing

When the recipient does not have access to your Tableau Server, share a Packaged Workbook (.twbx). A .twbx file bundles the workbook and all its data extracts into a single file that can be opened with Tableau Desktop or Tableau Reader (a free viewer).

Creating a .twbx:

  1. In Tableau Desktop, go to File > Export Packaged Workbook.
  2. Choose a save location and click Save.
  3. The resulting .twbx file includes the workbook structure and all extract data.
  4. Share the file via email, file share, or cloud storage.

Tableau Reader, a free application, allows non-Tableau-Desktop users to open, view, and interact with .twbx files. It does not allow editing.
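
A .twbx is, under the hood, a ZIP archive containing the .twb workbook definition plus its bundled data. The sketch below builds a minimal stand-in (the file names and contents are invented for illustration) and lists what was packaged, which is a quick way to check what data you are about to email to someone:

```python
import tempfile
import zipfile
from pathlib import Path

# A .twbx packaged workbook is a ZIP archive containing the .twb workbook
# definition plus its extracts. Build a minimal stand-in to demonstrate.
tmp = Path(tempfile.mkdtemp())
twbx = tmp / "sales_dashboard.twbx"
with zipfile.ZipFile(twbx, "w") as z:
    z.writestr("sales_dashboard.twb", "<workbook/>")  # workbook XML (dummy)
    z.writestr("Data/Extracts/sales.hyper", b"")      # bundled extract (dummy)

# Inspect what was packaged without opening Tableau.
with zipfile.ZipFile(twbx) as z:
    print(z.namelist())
```

Because the extracts travel inside the archive, remember that sharing a .twbx shares the underlying data too, not just the visualisation.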


17.5 Extract Optimisation for Performance

Note: Making Extracts Faster: Six Optimisation Techniques

When a Tableau dashboard is slow to load or respond to filter changes, extract optimisation is usually the first line of defence. These six techniques, applied in Tableau Desktop’s extract settings, can dramatically improve performance:

1. Filter the extract: Include only the date range and dimension values relevant to the workbook. A 3-year extract for a dashboard showing only the last 12 months is unnecessarily large.

2. Aggregate to the view level: If your dashboard shows monthly totals, aggregate the extract to monthly granularity. Storing daily rows when you only need monthly summaries increases file size and query time.

3. Hide unused fields: Any field hidden in the workbook is excluded from the extract. Remove all fields not used in any worksheet to reduce extract size.

4. Extract optimisation (materialise calculations): In the Extract dialog, click Optimise Extract to pre-calculate any calculated fields and store the results in the extract. This trades extract creation time for faster query time.

5. Use integers for high-cardinality string IDs: If you join on a Customer ID that is currently a string, converting it to an integer reduces memory usage and improves join performance.

6. Split large extracts with context filters: For very large extracts used across multiple dashboards, consider creating separate, smaller, purpose-built extracts for each dashboard rather than one large all-purpose extract.
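
To see why technique 2 pays off, here is a minimal sketch of the row reduction involved, using invented daily data (the column names and two-year window are assumptions, and pandas stands in for the extract engine):

```python
import pandas as pd

# Hypothetical daily sales data: two years of daily rows.
daily = pd.DataFrame({
    "date": pd.date_range("2023-01-01", "2024-12-31", freq="D"),
})
daily["sales"] = 100.0

# Technique 2: if the dashboard only shows monthly totals, pre-aggregate
# to monthly granularity; 731 daily rows collapse to 24 monthly rows.
monthly = (
    daily.set_index("date")
         .resample("MS")["sales"]   # "MS" = month-start buckets
         .sum()
         .reset_index()
)

print(len(daily), "->", len(monthly))  # 731 -> 24
```

A roughly 30x row reduction like this shrinks both the extract file and every query the dashboard issues, at the cost of losing the ability to drill to the daily level.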

Note: How to Configure Extract Filters in Tableau Desktop
  1. On the Data Source Page, select Extract and click Edit.
  2. In the Extract Data dialog, click Add under Filters.
  3. Select the field to filter on (e.g., Order Date).
  4. Set the filter condition (e.g., Relative Date: Last 2 years).
  5. Click Add again for a second filter (e.g., Region = "East", "West").
  6. Under Aggregation, tick Aggregate data for visible dimensions if you want to pre-aggregate.
  7. Click OK and then trigger the extract refresh.

[Insert screenshot of the Extract Data dialog showing a date filter, an aggregation setting, and the field list with several fields checked for hiding]


17.6 Subscriptions and Data-Driven Alerts

Note: Keeping Stakeholders Informed Automatically

Once workbooks are published to Tableau Server or Cloud, you can set up automated delivery mechanisms so stakeholders receive insights without visiting the server manually.

Subscriptions: Send a scheduled email containing a PNG screenshot of a specified view. Recipients receive the dashboard image directly in their email at a configured time (e.g., every Monday at 8 AM). To set one up, open the published view on Server, click Subscribe (the envelope icon), and set the schedule and recipients.

Data-driven alerts: Trigger an email notification when a specific value crosses a threshold (e.g., “send me an alert when daily sales fall below $10,000”). To set one up, hover over a mark in the published view, click Alert (the bell icon), and define the condition and threshold.

| Feature | Trigger | Output |
| --- | --- | --- |
| Subscription | Schedule (time-based) | Screenshot image in email |
| Data-driven alert | Condition (value-based) | Text notification when threshold crossed |
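
The distinction between the two triggers is easy to pin down in code. As a hypothetical sketch (the dates, sales figures, and $10,000 threshold are invented), a data-driven alert is simply a per-refresh condition check, whereas a subscription fires unconditionally on a clock:

```python
# Hypothetical sketch of the alert condition "daily sales fall below $10,000".
THRESHOLD = 10_000
daily_sales = {"2024-06-01": 12_400, "2024-06-02": 9_150, "2024-06-03": 11_020}

# Tableau evaluates the condition on each data refresh and emails only when it
# is true; the equivalent check is a threshold comparison per day.
alerts = [day for day, sales in daily_sales.items() if sales < THRESHOLD]
print(alerts)  # ['2024-06-02']
```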

17.7 Summary

Note: Key Concepts at a Glance

| Topic | Key Feature | Access Point |
| --- | --- | --- |
| Sampling in Prep | Quick sample / Random % / Full data | Input step > Settings > Sampling |
| Validation | Row count, null check, distribution check | Output step Profile pane |
| Publish to Server | Certified reusable data source | Output step > Publish as Data Source |
| Export to file | .hyper, .csv, .xlsx | Output step > Save to File |
| Publish workbook | Share with Server/Cloud users | Desktop: Server > Publish Workbook |
| .twbx packaging | Offline sharing | Desktop: File > Export Packaged Workbook |
| Extract filters | Reduce extract size | Data Source Page > Extract > Edit > Filters |
| Subscriptions | Scheduled email of view screenshot | Published view > Subscribe |
| Data-driven alerts | Email when value crosses threshold | Published view > Alert |
Tip: Applying This in Practice

The most overlooked aspect of the Tableau workflow is the last mile: sharing results in a way that actually reaches the decision-makers who need them. A beautifully built dashboard that sits unread on a server has zero business value. Set up subscriptions for your most important dashboards so that key stakeholders receive the insights automatically every Monday morning, before they even think to check the server. This habit transforms Tableau from a tool you use to a service your organisation relies on.