flowchart LR
A[Scatter Plot <br> Base Form] --> B[Bubble Chart <br> + Size encoding]
A --> C[Connected Scatter <br> + Time dimension]
A --> D[Scatter Plot Matrix <br> All variable pairs]
A --> E[Highlight Table <br> Categorical grid]
B --> F[Three or four <br> variables at once]
C --> G[Temporal change <br> in two measures]
D --> H[Multivariate <br> EDA]
E --> I[Category vs. <br> measure patterns]
style A fill:#e3f2fd,stroke:#1976D2
style B fill:#f3e5f5,stroke:#7B1FA2
style C fill:#fff9c4,stroke:#F9A825
style D fill:#fce4ec,stroke:#C62828
style E fill:#e8f5e9,stroke:#388E3C
3 Extensions of Scatter Plots
The scatter plot is one of the most versatile charts in data visualization because it can reveal relationships, clusters, outliers, and distributions simultaneously. This chapter begins with the anatomy of the basic scatter plot and how to read it, then introduces four extensions that build on the scatter plot foundation: the bubble chart, the connected scatter plot, the scatter plot matrix (SPLOM), and the highlight table. Each extension adds one or more new encoding channels to answer questions that the basic scatter plot cannot answer alone. You will build all five chart types in Tableau step by step using the Sample Superstore dataset.
3.1 The Scatter Plot: Foundations
A scatter plot encodes two continuous quantitative variables, one on the X-axis and one on the Y-axis, and places each observation as a mark at the intersection of its two values. Every mark represents one record or one aggregated group.
The scatter plot answers one fundamental question: is there a relationship between these two variables?
Key components of a scatter plot:
- X-axis: The independent or explanatory variable, the one you expect might influence the other.
- Y-axis: The dependent or response variable, the one you expect might be influenced.
- Each mark: One observation or one aggregated data point.
- Trend line: A fitted line summarising the direction and strength of the overall relationship. Tableau supports linear, exponential, logarithmic, and polynomial fits.
- R-squared value: Displayed alongside the trend line in Tableau. It indicates what proportion of the variation in Y is explained by the fitted model. A value of 0 means the fitted line explains none of the variation; a value of 1 means it explains all of it.
To build a basic scatter plot in Tableau:
- Open a new worksheet and connect to the Sample Superstore dataset.
- Drag Sales to the Columns shelf. Tableau places it as a continuous measure on the X-axis.
- Drag Profit to the Rows shelf. Tableau places it on the Y-axis.
- The default view shows one aggregate mark. To disaggregate, drag Sub-Category to the Detail shelf on the Marks card. Each sub-category now appears as a separate mark.
- To add a trend line, open the Analytics pane and drag Trend Line > Linear onto the view.
- Hover over the trend line to see the R-squared value and p-value in the tooltip.
[Insert screenshot of a Tableau scatter plot with Sub-Category on Detail, showing Sales vs. Profit with a linear trend line and its R-squared value visible]
A positive correlation means marks trend from bottom-left to top-right, so higher values of X are associated with higher values of Y. A negative correlation means marks trend from top-left to bottom-right. No correlation means marks are scattered randomly with no discernible direction. A non-linear relationship means marks follow a curve rather than a straight line, and a linear trend line will fit poorly. Check the R-squared value: a value below 0.3 suggests the linear model explains very little of the variation.
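To make the trend line and R-squared value concrete outside Tableau, here is a minimal Python sketch of an ordinary least-squares linear fit. It is an illustration only, not Tableau's implementation, and the sales and profit numbers are made up. The second dataset follows a perfect curve, yet the linear fit explains none of its variation, which is why a low R-squared on a linear trend line does not by itself mean the two variables are unrelated.

```python
from statistics import mean

def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept, plus R-squared."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variation (not explained by the line) vs. total variation in Y.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared

# Hypothetical values, chosen only to make the arithmetic easy to follow.
sales = [1.0, 2.0, 3.0, 4.0, 5.0]
profit_linear = [2.0, 4.0, 6.0, 8.0, 10.0]  # exactly 2 * sales
profit_curved = [4.0, 1.0, 0.0, 1.0, 4.0]   # exactly (sales - 3) ** 2

slope, intercept, r2 = linear_fit(sales, profit_linear)
slope_c, intercept_c, r2_curved = linear_fit(sales, profit_curved)

print(f"linear data: slope={slope:.2f}, R-squared={r2:.2f}")
print(f"curved data: slope={slope_c:.2f}, R-squared={r2_curved:.2f}")
```

For the curved data, a polynomial fit would be the more appropriate choice.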
A scatter plot showing a strong correlation between two variables does not prove that one causes the other. Both variables may be driven by a third unobserved factor. Always investigate the mechanism behind a correlation before drawing causal conclusions from a chart.
3.2 Identifying Clusters and Outliers
A cluster is a group of marks that are closer to each other than to the rest of the data, suggesting the data naturally segments into groups by customer type, product category, or geographic region. An outlier is a mark that is far from the main body of data, indicating an unusual observation that warrants investigation.
In Tableau, clusters can be identified visually or formally. For visual identification, drag a categorical dimension to the Colour shelf and look for separation between groups. For formal identification, use the Analytics pane Cluster feature, which automatically partitions data into groups using k-means clustering.
- Build a scatter plot with Sales on X and Profit on Y, disaggregated by Sub-Category on the Detail shelf.
- Open the Analytics pane.
- Drag Cluster onto the view. Tableau opens a dialog to set the number of clusters and the measures used for clustering.
- Set the number of clusters to 3 and click OK.
- Tableau colours each mark by its assigned cluster and adds a legend.
- Right-click the Clusters pill in the Marks card and select Describe Clusters to see a statistical summary of each cluster, including the centroid values and the between-group and within-group sums of squares.
[Insert screenshot of a Tableau scatter plot with three k-means clusters colour-coded, and the Describe Clusters dialog open showing centroid statistics]
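The general idea behind k-means can be sketched in a few lines of Python. This is a minimal illustration of the standard assign-then-update loop (Lloyd's algorithm), not a reproduction of Tableau's Cluster feature. The naive initialisation (the first k points) and the sample points are assumptions made for the example.

```python
from math import dist

def k_means(points, k, iterations=100):
    """Lloyd's algorithm: alternate between assigning points and moving centroids."""
    centroids = list(points[:k])  # assumption: naive start from the first k points
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the result has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignments, centroids

# Hypothetical (sales, profit) pairs forming three visibly separate groups.
points = [
    (1.0, 1.0), (10.0, 10.0), (20.0, 1.0),
    (1.5, 1.2), (10.5, 9.5), (19.0, 1.5),
    (0.8, 1.4), (9.5, 10.5), (21.0, 0.8),
]
assignments, centroids = k_means(points, k=3)
print(assignments)
print(centroids)
```

When the measures are on very different scales, they are usually standardised before clustering so that one measure does not dominate the distance calculation.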
3.3 Extensions of the Scatter Plot
The basic scatter plot encodes two continuous variables using position alone. The four extensions introduced in this chapter each add one or more encoding channels to answer questions that position alone cannot address:
- Bubble chart: Adds size as a third encoding channel, allowing a third quantitative variable to be displayed simultaneously alongside the two positional variables.
- Connected scatter plot: Adds time as a path encoding, showing how the relationship between two measures has changed chronologically.
- Scatter plot matrix (SPLOM): Repeats the scatter plot across a grid for every pairwise combination of variables, enabling multivariate exploration in a single view.
- Highlight table: Replaces continuous positional axes with categorical axes and uses colour intensity to encode a measure across a two-dimensional grid.
Each extension is appropriate for a specific type of question. Choosing the wrong extension adds visual complexity without adding analytical value, so the sections below specify clearly when each one applies.
3.4 Bubble Chart
A bubble chart extends the scatter plot by encoding a third quantitative variable as the size of each mark. A fourth variable can optionally be encoded as colour. This makes bubble charts one of the highest-information-density chart types available for displaying multivariate relationships.
When to use a bubble chart:
Use a bubble chart when you want to show the relationship between two measures while also communicating the magnitude of a third measure. A common business example is plotting Profit on the Y-axis against Sales on the X-axis with Quantity Sold encoded as bubble size and Region encoded as colour. This single chart communicates four variables simultaneously.
Limitation to keep in mind:
Human perception of area is less accurate than perception of position. Bubble size should communicate rough magnitude, not precise values. If precise comparison of the third variable is important, display it as a separate bar chart alongside the bubble chart rather than relying on size alone. Too many bubbles, typically more than 40, also make the chart difficult to read. Use Sub-Category or a similarly compact dimension rather than individual customers or products.
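One reason size is harder to read than position is that a value encoded as area changes the bubble's radius only by its square root. The short Python sketch below shows the arithmetic. The function name and the 30-unit maximum radius are hypothetical choices for illustration, not Tableau settings.

```python
from math import pi, sqrt

def bubble_radius(value, max_value, max_radius=30.0):
    """Radius that makes the bubble's AREA proportional to value."""
    return max_radius * sqrt(value / max_value)

# Hypothetical quantities: one bubble represents four times the other.
r_full = bubble_radius(400, 400)
r_quarter = bubble_radius(100, 400)

# The radius only halves, although the value (and the area) falls to a quarter.
area_ratio = (pi * r_quarter ** 2) / (pi * r_full ** 2)
print(r_full, r_quarter, round(area_ratio, 2))
```

A fourfold difference in the underlying value therefore appears as only a twofold difference in bubble width, which is why viewers tend to underestimate large differences.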
- Start with the scatter plot from the previous section (Sales on X, Profit on Y, Sub-Category on Detail).
- Drag Quantity to the Size shelf on the Marks card. Each mark now scales in area proportional to quantity sold.
- Drag Region to the Colour shelf. Each region is now a distinct colour. Because Region is a dimension, this also splits every sub-category into one mark per region.
- Click the Size shelf to open the size slider and adjust the range so that small and large bubbles are visually distinguishable without overlapping excessively.
- Drag Sub-Category to the Label shelf. Tableau hides labels that would overlap, so to force a label onto one of the largest bubbles, right-click the mark and select Mark Label > Always Show.
- Give the chart a descriptive title: “Sales, Profit, and Quantity by Sub-Category and Region.”
[Insert screenshot of the completed bubble chart with four variables encoded: X-axis is Sales, Y-axis is Profit, mark size is Quantity, and colour is Region, with Sub-Category labels on the largest marks]
Adding reference lines at the average values of X and Y turns a bubble chart into a quadrant analysis, one of the most practical tools in business analytics. The four quadrants identify: high sales with high profit (strong performers), high sales with low or negative profit (volume without margin), low sales with high profit (niche opportunities), and low sales with low profit (candidates for review). To add the reference lines, right-click each axis, select Add Reference Line, and set the line's value to Average.
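The quadrant logic is simple enough to express directly. The Python sketch below labels each record by comparing it with the two averages, mirroring what the reference lines show visually. The item names and figures are hypothetical, and treating a value equal to the average as "high" is an arbitrary choice made for the example.

```python
from statistics import mean

def quadrant_labels(records):
    """Label each (name, sales, profit) record relative to the two averages."""
    avg_sales = mean([sales for _, sales, _ in records])
    avg_profit = mean([profit for _, _, profit in records])
    labels = {}
    for name, sales, profit in records:
        high_sales = sales >= avg_sales      # right of the vertical reference line
        high_profit = profit >= avg_profit   # above the horizontal reference line
        if high_sales and high_profit:
            labels[name] = "strong performer"
        elif high_sales:
            labels[name] = "volume without margin"
        elif high_profit:
            labels[name] = "niche opportunity"
        else:
            labels[name] = "candidate for review"
    return labels

# Hypothetical figures, not taken from Sample Superstore.
records = [
    ("Item A", 100.0, 20.0),
    ("Item B", 120.0, -5.0),
    ("Item C", 20.0, 15.0),
    ("Item D", 10.0, -2.0),
]
labels = quadrant_labels(records)
for name, label in labels.items():
    print(name, "->", label)
```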
3.5 Connected Scatter Plot
A connected scatter plot joins the marks of a scatter plot with a line in chronological order. This adds a temporal dimension to the two-variable relationship, allowing you to see not only whether a relationship exists between two measures but also how that relationship has evolved over time.
When to use a connected scatter plot:
Use a connected scatter plot when you have two measures recorded at regular time intervals and want to show both the relationship between them and the trajectory of change across time. A typical example is plotting monthly advertising spend on the X-axis against monthly revenue on the Y-axis, connected month by month, to show whether increases in spend are consistently followed by revenue growth or whether the relationship changes at different points in the year.
How to read it:
Follow the line from the earliest point (usually labelled) to the latest. The overall direction of the path shows the temporal trend. The overall shape of the path shows the correlation pattern. A path that moves consistently toward the upper right signals that both measures are increasing together over time.
- Place a measure on Columns (for example, Discount) and a measure on Rows (for example, Profit).
- Change the mark type to Line in the Marks card dropdown. The Path shelf is available only for the Line and Polygon mark types, so this step must come first.
- Drag a date field (for example, Order Date at Month/Year granularity) to the Path shelf on the Marks card.
- Tableau draws a line connecting the time periods in chronological order.
- Drag the date field to the Label shelf as well. Then click Label and, under Marks to Label, choose Line Ends so that only the first and last points are labelled. The viewer can then identify the start and end of the path without a label cluttering every mark.
[Insert screenshot of a connected scatter plot showing monthly Discount vs. Profit over a two-year period, with the earliest and latest months labelled at the start and end of the path]
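What distinguishes a connected scatter plot from an ordinary one is the ordering of the points. The following Python sketch, built on made-up monthly discount and profit values, sorts the records chronologically, joins consecutive points into line segments, and keeps labels only for the first and last points. The function name and data layout are assumptions made for illustration.

```python
from datetime import date

def connected_path(records):
    """Order (month, x, y) records by date and build the path through them."""
    ordered = sorted(records, key=lambda r: r[0])
    points = [(x, y) for _, x, y in ordered]
    # Each segment joins one time period to the next.
    segments = list(zip(points, points[1:]))
    # Label only the two ends of the path to avoid clutter.
    end_labels = {
        0: ordered[0][0].isoformat()[:7],
        len(ordered) - 1: ordered[-1][0].isoformat()[:7],
    }
    return points, segments, end_labels

# Hypothetical (month, average discount, profit) rows, deliberately out of order.
records = [
    (date(2023, 3, 1), 0.20, 900.0),
    (date(2023, 1, 1), 0.10, 1200.0),
    (date(2023, 2, 1), 0.15, 1000.0),
]
points, segments, end_labels = connected_path(records)
print(points)
print(end_labels)
```

If the records were joined in their stored order instead of date order, the path would zigzag and the temporal reading would be lost.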
3.6 Scatter Plot Matrix
A scatter plot matrix, commonly abbreviated as SPLOM, displays every pairwise combination of variables as a small scatter plot arranged in a grid. It is the most efficient tool for multivariate exploratory data analysis because it allows you to scan all two-variable relationships in a dataset simultaneously and identify which pairs are worth investigating further.
For a dataset with four measures such as Sales, Profit, Discount, and Quantity, the SPLOM produces a 4x4 grid of 16 cells. The diagonal cells are typically blank or used for distribution charts. The off-diagonal cells each show one scatter plot for one pair of variables.
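The cell counts follow directly from the number of measures, as this short Python sketch shows. Because each off-diagonal plot has a mirror image on the other side of the diagonal, the number of distinct variable pairs is half the number of off-diagonal cells.

```python
from itertools import combinations

measures = ["Sales", "Profit", "Discount", "Quantity"]
n = len(measures)

total_cells = n * n                        # full grid
diagonal_cells = n                         # each measure paired with itself
off_diagonal_cells = total_cells - n       # scatter plots, counting mirror images
unique_pairs = list(combinations(measures, 2))  # distinct pairs, ignoring order

print(total_cells, diagonal_cells, off_diagonal_cells, len(unique_pairs))
for pair in unique_pairs:
    print(pair)
```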
When to use a scatter plot matrix:
Use a SPLOM during exploratory analysis when you have three or more quantitative variables and want to identify which pairs show the strongest correlations before building detailed charts or statistical models. The SPLOM is a screening tool, not a presentation tool. It is appropriate for the analyst’s own EDA workflow and for sharing with technically sophisticated audiences, not for executive dashboards.
- Hold Ctrl and select four measures in the Data pane: Sales, Profit, Discount, and Quantity.
- In the Show Me panel, select Scatter Plot. Tableau automatically constructs a scatter plot matrix.
- Each cell in the matrix shows the relationship between the row measure and the column measure.
- To add trend lines to all cells simultaneously, open the Analytics pane and drag Trend Line onto any single cell. Tableau applies it to all cells in the matrix.
- Drag a categorical dimension such as Category to the Colour shelf to highlight groups across all scatter plots at once.
[Insert screenshot of a Tableau SPLOM with four measures, trend lines in all cells, and Category colour-coded consistently across the grid]
A SPLOM with six variables already produces 36 cells, and each additional variable enlarges the grid further, so most cells will be too small to read on a standard screen. For larger variable sets, first use a correlation heatmap (a highlight table of correlation coefficients) to identify the most interesting variable pairs. Then build individual scatter plots for only those pairs.
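The screening step can be sketched in Python by computing the Pearson correlation coefficient for every pair of measures and ranking the pairs by absolute strength. The column values below are made up for illustration, and the helper names are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def rank_pairs(columns):
    """Return (pair, r) tuples sorted from strongest to weakest |r|."""
    scores = {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical values, not taken from Sample Superstore.
columns = {
    "Sales": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Profit": [2.0, 4.0, 6.0, 8.0, 10.0],
    "Discount": [5.0, 3.0, 4.0, 1.0, 2.0],
}
ranked = rank_pairs(columns)
for pair, r in ranked:
    print(pair, round(r, 2))
```

Ranking by absolute value matters because a strong negative correlation is as worth investigating as a strong positive one.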
3.7 Highlight Table
A highlight table, also called a heatmap, encodes a quantitative measure using colour intensity across a grid formed by two categorical dimensions. It is related to the scatter plot in that it reveals patterns across two dimensions simultaneously, but instead of continuous positional axes, it uses categorical row and column axes. Each cell in the grid shows the value of the measure for one combination of the two categories.
When to use a highlight table:
Use a highlight table when you have two categorical dimensions and one quantitative measure and want to identify which combinations stand out at a glance. A common business example is showing Profit by Sub-Category (rows) and Region (columns). The colour pattern immediately reveals which sub-category and region combinations are most and least profitable without requiring the viewer to read individual numbers.
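Behind the grid, each cell is simply the measure aggregated for one combination of the two categories. The Python sketch below sums a value for every row and column pair and then finds the weakest cell, which is the kind of exception the colour pattern is meant to surface. The records are hypothetical.

```python
from collections import defaultdict

def build_grid(records):
    """Sum the measure for every (row category, column category) pair."""
    grid = defaultdict(float)
    for row_category, column_category, value in records:
        grid[(row_category, column_category)] += value
    return dict(grid)

# Hypothetical (sub-category, region, profit) rows.
records = [
    ("Chairs", "East", 120.0),
    ("Chairs", "East", 30.0),
    ("Chairs", "West", 80.0),
    ("Tables", "East", -45.0),
    ("Tables", "West", 10.0),
]
grid = build_grid(records)
lowest_cell = min(grid, key=grid.get)

print(grid)
print("lowest:", lowest_cell)
```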
Choosing the right colour palette:
Use a sequential palette (light to dark in one hue) when all values are positive and you want to show intensity, such as number of orders. Use a diverging palette (two contrasting hues separated by a neutral midpoint) when values range from negative to positive and the midpoint is meaningful, such as profit, where zero is the break-even point. Avoid pure red-green combinations, which are inaccessible to viewers with colour vision deficiency. Use orange-blue or blue-grey diverging palettes instead.
- Drag Region to the Columns shelf.
- Drag Category and Sub-Category to the Rows shelf (nested, with Category as the outer dimension).
- Drag Profit to the Colour shelf. Tableau colours the marks in each cell by profit.
- Change the mark type to Square in the Marks card dropdown so that each cell is filled with colour.
- Click the Colour shelf and select Edit Colours to open the dialog. Select the Orange-Blue Diverging palette.
- Click Advanced and set the centre of the colour range to 0 so that negative profit is orange and positive profit is blue.
- Drag Profit also to the Label shelf to display the value inside each cell for viewers who need precise numbers.
[Insert screenshot of a Tableau highlight table showing Profit by Sub-Category and Region, with an orange-blue diverging palette centred at zero and profit values labelled inside each cell]
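Centring a diverging palette at zero means the neutral colour maps to exactly zero and the two hues grow in intensity symmetrically on either side. The Python sketch below shows one way to express that mapping. The RGB values are illustrative orange, grey, and blue tones chosen for the example, not Tableau's actual palette, and the function name is hypothetical.

```python
def diverging_colour(value, max_abs,
                     negative=(230, 97, 1),     # assumed orange end
                     neutral=(247, 247, 247),   # assumed light-grey midpoint
                     positive=(33, 102, 172)):  # assumed blue end
    """Map a value to a hex colour on a diverging scale centred at zero."""
    # Scale to [-1, 1] using the same range on both sides so zero stays central.
    t = max(-1.0, min(1.0, value / max_abs))
    end = positive if t >= 0 else negative
    weight = abs(t)
    # Blend from the neutral midpoint towards the chosen end colour.
    rgb = tuple(round(n + (e - n) * weight) for n, e in zip(neutral, end))
    return "#{:02x}{:02x}{:02x}".format(*rgb)

# Hypothetical profit values with a colour range of -100 to +100.
for profit in (-250, -100, -50, 0, 50, 100):
    print(profit, diverging_colour(profit, max_abs=100))
```

Using the same magnitude on both sides of zero keeps equal-sized losses and gains equally intense, so the colours can be compared fairly.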
3.8 Summary
| Chart Type | Variables Encoded | Best Use Case |
|---|---|---|
| Basic scatter plot | Two continuous measures | Reveal or rule out a relationship between two variables |
| Bubble chart | Two measures plus size plus colour | Display three or four variables simultaneously |
| Connected scatter plot | Two measures plus time as path | Show how a relationship evolves over time |
| Scatter plot matrix | All pairs of three or more measures | Multivariate EDA to screen for interesting pairs |
| Highlight table | Two categorical dimensions plus one measure | Identify pattern and exception across a grid |
Always start with a basic scatter plot before adding encodings. Each additional encoding channel (size, colour, or path) increases information density but also increases the risk of visual confusion. Add one encoding at a time and ask after each addition whether it makes the chart clearer or more confusing. Keep only the encodings that help the viewer answer the primary analytical question faster.