5  Descriptive Plots and Box Plot

Note: What This Chapter Covers

Descriptive plots are visualizations that summarise the statistical properties of a dataset: its centre, spread, skewness, and the presence of outliers. In this chapter, you will learn how to read and build the main descriptive visualizations in Tableau: strip plots, dot plots, box-and-whisker plots (box plots), and violin plots. You will also learn the descriptive statistics that underpin these charts, so that you can interpret them precisely and explain them clearly to a business audience.

flowchart LR
    A[Descriptive <br> Statistics] --> B[Centre <br> Mean, Median, Mode]
    A --> C[Spread <br> Range, IQR, Std Dev]
    A --> D[Shape <br> Skewness, Kurtosis]
    A --> E[Outliers]
    B --> F[Box plot shows median]
    C --> G[Box plot shows IQR and whiskers]
    D --> H[Violin plot shows full distribution shape]
    E --> I[Box plot marks individual outlier points]
    style A fill:#e3f2fd,stroke:#1976D2
    style F fill:#e8f5e9,stroke:#388E3C
    style G fill:#e8f5e9,stroke:#388E3C
    style H fill:#e8f5e9,stroke:#388E3C
    style I fill:#e8f5e9,stroke:#388E3C


5.1 Descriptive Statistics: The Foundation

Note: The Five-Number Summary

Every box plot is built from the five-number summary, five statistical values that together describe the distribution of a dataset:

  1. Minimum: the smallest observed value. (In a box plot, the lower whisker stops at the smallest value that is not an outlier.)
  2. First quartile (Q1): the value below which 25% of observations fall.
  3. Median (Q2): the middle value; half of the observations fall below it and half above.
  4. Third quartile (Q3): the value below which 75% of observations fall.
  5. Maximum: the largest observed value. (In a box plot, the upper whisker stops at the largest value that is not an outlier.)

The interquartile range (IQR) is Q3 minus Q1 and represents the spread of the middle 50% of the data. It is the primary measure of spread in a box plot, where it sets the length of the box.
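The five-number summary and IQR are easy to compute outside Tableau, which is a useful way to check what a box plot should show. The following is a minimal Python sketch that uses only the standard library. `five_number_summary` is a hypothetical helper and `order_values` is made-up data. The sketch assumes the `method="inclusive"` quartile convention of `statistics.quantiles`, which interpolates linearly between sorted values. Tools differ in how they compute quartiles, so Tableau may report slightly different Q1 and Q3 values on small datasets.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # method="inclusive" interpolates linearly between sorted data points
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Hypothetical order values (illustrative data, not from a real dataset)
order_values = [3, 7, 8, 12, 15, 18, 21, 26, 40]

minimum, q1, median, q3, maximum = five_number_summary(order_values)
iqr = q3 - q1
print(f"Min={minimum}, Q1={q1}, Median={median}, Q3={q3}, Max={maximum}")
# Min=3, Q1=8.0, Median=15.0, Q3=21.0, Max=40
print(f"IQR={iqr}")
# IQR=13.0
```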

| Statistic | Formula | What It Tells You |
|---|---|---|
| Mean | Sum / Count | Average value; sensitive to outliers |
| Median | Middle value (Q2) | Central tendency; robust to outliers |
| IQR | Q3 - Q1 | Spread of the middle 50% |
| Standard deviation | sqrt(variance) | Typical (root-mean-square) distance of values from the mean |
| Range | Max - Min | Total spread; very sensitive to outliers |
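The sketch below shows the sensitivity column of the table in action. It computes each statistic for a small set of hypothetical order values and then repeats the calculation after adding one very large order. It uses Python's standard `statistics` module. Two choices are assumptions of the sketch: it uses the sample standard deviation (`statistics.stdev`), which corresponds to Tableau's STDEV rather than STDEVP, and it computes quartiles with the `method="inclusive"` convention.

```python
import statistics

def describe(values):
    """Compute mean, median, IQR, standard deviation and range for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,
        "std_dev": statistics.stdev(values),  # sample standard deviation
        "range": max(values) - min(values),
    }

# Hypothetical order values, then the same orders plus one very large order
orders = [100, 120, 130, 150, 160]
orders_with_outlier = orders + [5000]

before = describe(orders)
after = describe(orders_with_outlier)
for name in before:
    print(f"{name:>8}: {before[name]:10.2f} -> {after[name]:10.2f}")
```

After the extra order is added, the mean jumps from 132 to about 943 and the range from 60 to 4,900. The median moves only from 130 to 140, and the IQR moves only from 30 to 35.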
Tip: When to Use Median vs. Mean

Use the median as your measure of central tendency whenever the data is skewed or contains outliers. Typical examples are income distributions, house prices, and sales figures in datasets with a few very large orders. The median is “resistant” to extreme values in a way that the mean is not. Use the mean when the data is approximately symmetric and free of outliers, as with height or temperature measurements.


5.2 Strip Plots and Dot Plots

Note: The Strip Plot: Showing Every Data Point

A strip plot (also called a one-dimensional scatter plot or jitter plot) displays every individual observation as a mark along a single axis. Unlike aggregated charts, the strip plot preserves the full distribution of raw data, making it ideal for small to medium datasets (up to ~500 points) where you want to see every individual value.

When to use a strip plot:
- You want to show the full distribution without aggregation.
- You have a small enough dataset that individual points are meaningful (not just noise).
- You want to compare the distribution of a measure across several categories.

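Jitter is a small random offset added to each mark's position to prevent overplotting (marks landing exactly on top of one another). In Tableau, you add jitter with a calculated field that returns a small random number. You place this field on the axis perpendicular to the measure, so the points spread out without any change to their measured values.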

Note: How to Create a Strip Plot in Tableau
  1. Drag a continuous measure (e.g., Profit) to the Columns shelf.
  2. Drag a categorical dimension (e.g., Category) to the Rows shelf.
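  3. Change the mark type to Circle in the Marks card. Tableau aggregates measures by default (one SUM(Profit) per category). To give each record or order its own mark, either clear Analysis > Aggregate Measures or drag a record-level field such as Order ID to Detail.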
  4. To add jitter: create a calculated field named Jitter:
Code
// Tableau calculated field: Jitter
// RANDOM() returns a number between 0 and 1, so the offset falls between -0.2 and 0.2
(RANDOM() - 0.5) * 0.4
  5. Drag Jitter to the Rows shelf alongside Category. Right-click the Jitter pill and select Dimension to prevent aggregation.
  6. Right-click the Jitter axis, select Edit Axis, and set a Fixed range from -0.5 to 0.5. Then right-click the axis again and clear Show Header to hide it.
  7. Reduce mark opacity to 50–70% (Color > Opacity on the Marks card) to reveal overlapping points.
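The jitter calculation is easy to reason about outside Tableau. This Python sketch reproduces the `(RANDOM() - 0.5) * 0.4` logic and places each record on a numeric row position for its category. The `row_positions` mapping, the records, and the fixed seed are illustrative assumptions. In Tableau, each category gets its own row pane instead. The two identical Office Supplies values end up at different heights, which is exactly what jitter is for.

```python
import random

def jitter_offset(rng):
    """Python analogue of the Tableau field (RANDOM() - 0.5) * 0.4."""
    # rng.random() is in [0, 1), so the offset is in [-0.2, 0.2)
    return (rng.random() - 0.5) * 0.4

def strip_plot_points(records, row_positions, seed=42):
    """Return one (x, y) point per record: x is the measure, y is the jittered row."""
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    points = []
    for category, profit in records:
        y = row_positions[category] + jitter_offset(rng)
        points.append((profit, y))
    return points

# Hypothetical records: (Category, Profit)
records = [
    ("Furniture", 12.5), ("Furniture", -30.0),
    ("Office Supplies", 8.0), ("Office Supplies", 8.0),
    ("Technology", 150.0),
]
row_positions = {"Furniture": 0, "Office Supplies": 1, "Technology": 2}

for point in strip_plot_points(records, row_positions):
    print(point)
```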

[Insert screenshot of a jittered strip plot comparing Profit distributions across three Categories, with overlapping circles at reduced opacity]

Note: Dot Plots: Aggregated Alternatives to Bar Charts

A dot plot uses a single dot to mark the value of a measure for each category, plotted along a common axis. It conveys the same information as a bar chart with less ink, which keeps the chart readable when there are many categories. Dot plots work especially well when the values are similar in magnitude. A bar's length must start from zero, but a dot does not need a zero baseline, so the axis can zoom in on the range where the values actually differ.

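In Tableau, create a dot plot by building a bar chart and then changing the mark type to Circle; each bar is replaced by a single dot at the bar's value. Because dots do not need a zero baseline, you can then clear Include zero in Edit Axis to zoom in on the range of values.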
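A dot plot shows one aggregated value per category, so the computation behind it is a group-and-sum. This hypothetical Python sketch mimics aggregating SUM(Profit) by Sub-Category. The sub-category names and profit values are made up. Sorting from largest to smallest is a design choice that makes the dots easier to compare.

```python
from collections import defaultdict

def dot_plot_values(records):
    """Aggregate to one value per category (like SUM(Profit)), sorted largest first."""
    totals = defaultdict(float)
    for category, profit in records:
        totals[category] += profit
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical (Sub-Category, Profit) records
records = [
    ("Binders", 120.0), ("Chairs", 95.0), ("Binders", 40.0),
    ("Tables", -60.0), ("Chairs", 10.0),
]

for category, total in dot_plot_values(records):
    print(f"{category:<8} {total:8.1f}")
```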


5.3 The Box Plot

Note: Anatomy of a Box-and-Whisker Plot

The box plot (also called a box-and-whisker plot) is one of the most information-dense single-variable charts in statistics. It displays five summary statistics and outliers simultaneously in a compact visual form.

Box plot anatomy:
- Box: spans from Q1 to Q3 (the IQR) and represents the middle 50% of the data.
- Centre line: the median (Q2), drawn inside the box.
- Whiskers: lines extending from the box to the smallest and largest values within 1.5 × IQR of the box edges.
- Outlier points: individual marks for observations beyond the whisker boundaries (i.e., values more than 1.5 × IQR below Q1 or above Q3).
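The whisker and outlier rule above can be written directly. This is a minimal Python sketch of the 1.5 × IQR rule (often called Tukey's fences) using made-up profit values. `box_plot_stats` is a hypothetical helper. The quartiles assume the `method="inclusive"` convention of `statistics.quantiles`, which may differ slightly from Tableau's.

```python
import statistics

def box_plot_stats(values):
    """Quartiles, whisker ends and outliers using the 1.5 x IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr  # anything below this is an outlier
    upper_fence = q3 + 1.5 * iqr  # anything above this is an outlier
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        # Whiskers stop at the most extreme values that are not outliers
        "lower_whisker": min(inside), "upper_whisker": max(inside),
        "outliers": sorted(v for v in values if v < lower_fence or v > upper_fence),
    }

# Hypothetical profit values, including one large loss and one large gain
profits = [-50, 5, 12, 18, 20, 25, 31, 40, 300]
stats = box_plot_stats(profits)
print(stats["lower_whisker"], stats["upper_whisker"], stats["outliers"])
# 5 40 [-50, 300]
```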

[Insert diagram showing a labelled box plot with Q1, Q2, Q3, IQR, lower whisker, upper whisker, and outlier points clearly marked]

Note: How to Create a Box Plot in Tableau
  1. Drag Profit to the Columns shelf.
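  2. Drag Sub-Category to the Rows shelf. Tableau builds each box from the marks in the view, so each observation needs its own mark. Either clear Analysis > Aggregate Measures or drag a record-level field such as Order ID to Detail.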
  3. In the Show Me panel, click Box-and-Whisker Plot. If the Show Me panel is not open, click Show Me at the right end of the toolbar.
  4. Alternatively, open the Analytics pane and drag Box Plot onto the view.
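  5. To show individual data points alongside the boxes: hold Ctrl and drag the Profit pill to copy it onto the Columns shelf a second time, then right-click the copy and select Dual Axis. Keep the box plot on the first axis and set the second axis's mark type to Circle (jittered as in Section 5.2). Finally, right-click the secondary axis and select Synchronize Axis.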
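  6. To colour the points by a dimension, drag Region to Color on the Marks card. This colours the marks rather than the boxes; the box fill is set in the box plot's Edit dialog.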

[Insert screenshot of a Tableau box plot showing Profit distributions across Sub-Categories, with outlier points visible and marks coloured by Region]

Note: Reading a Box Plot: What the Shape Tells You
| Box Plot Shape | Statistical Interpretation | Business Interpretation (Example) |
|---|---|---|
| Long box (large IQR) | High variability in the middle 50% | Inconsistent profit margins; investigate pricing |
| Short box (small IQR) | Low variability; consistent values | Predictable performance |
| Median near Q1 (the lower edge of the box) | Right-skewed distribution | Most orders are low-profit; a few very profitable orders pull the mean above the median |
| Median near Q3 (the upper edge of the box) | Left-skewed distribution | Most orders are high-profit; a few loss-making orders pull the mean below the median |
| Many outlier points | Heavy-tailed distribution | Extreme values are common; investigate them individually |
| Long lower whisker | Left tail extends far | Some orders have strongly negative profit |
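The median's position inside the box can be summarised as a single number, the quartile (Bowley) skewness coefficient (Q3 + Q1 - 2 × median) / (Q3 - Q1). It ranges from -1 to 1. Positive values mean the median sits nearer Q1 (right skew), and negative values mean it sits nearer Q3 (left skew). This coefficient is not part of the table above. It is shown here as a Python sketch with made-up data, assuming the `method="inclusive"` quartile convention.

```python
import statistics

def quartile_skewness(values):
    """Bowley (quartile) skewness: (Q3 + Q1 - 2 * median) / (Q3 - Q1)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q3 + q1 - 2 * median) / (q3 - q1)

right_skewed = [1, 2, 3, 4, 5, 8, 13, 21, 40]       # median sits near Q1
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]             # median in the middle of the box
left_skewed = [20, 39, 47, 52, 55, 56, 57, 58, 59]  # median sits near Q3

for name, data in [("right-skewed", right_skewed),
                   ("symmetric", symmetric),
                   ("left-skewed", left_skewed)]:
    print(f"{name}: {quartile_skewness(data):+.2f}")
```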
Warning: The Box Plot Hides Multi-Modal Distributions

A critical limitation of the box plot is that it cannot reveal whether a distribution is bimodal (two peaks) or multi-modal (multiple peaks). A dataset with two distinct clusters can produce a box plot identical to that of a dataset with a single peak. When you suspect multi-modality, supplement the box plot with a histogram or violin plot to verify the shape of the distribution.
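The warning can be checked with a small constructed example. The Python sketch below uses two made-up datasets of 17 values. One has a single peak around 50, and the other has two clusters near 40 and 60. The sketch assumes the 1.5 × IQR whisker rule and the `method="inclusive"` quartile convention. Under those assumptions, everything a box plot draws is identical for the two datasets, yet only one value of the bimodal data lies between 45 and 55.

```python
import statistics

def box_summary(values):
    """Everything a box plot draws: whisker ends, quartiles and outliers (1.5 x IQR rule)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low <= v <= high]
    outliers = sorted(v for v in values if v < low or v > high)
    return (min(inside), q1, median, q3, max(inside), outliers)

def count_between(values, lo, hi):
    """Number of values in the closed interval [lo, hi]."""
    return sum(lo <= v <= hi for v in values)

# Two made-up datasets of 17 values each
single_peak = [20, 30, 35, 38, 40, 45, 48, 49, 50, 51, 52, 55, 60, 62, 65, 70, 80]
bimodal = [20, 38, 39, 40, 40, 40, 41, 42, 50, 58, 59, 60, 60, 60, 61, 62, 80]

print(box_summary(single_peak))  # (20, 40.0, 50.0, 60.0, 80, [])
print(box_summary(bimodal))      # (20, 40.0, 50.0, 60.0, 80, [])
print(count_between(single_peak, 45, 55), count_between(bimodal, 45, 55))  # 7 1
```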


5.4 Violin Plots: Distribution Shape Revealed

Note: What a Violin Plot Adds to the Box Plot

A violin plot combines a box plot with a mirrored kernel density estimate (KDE), a smoothed curve showing the probability density of the data at each value. The width of the violin at any point is proportional to the number of observations near that value.
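A kernel density estimate places a small bell-shaped curve (the kernel) on every observation and adds the curves together. The height of the result at any value therefore reflects how many observations lie nearby. The Python sketch below implements a Gaussian KDE from scratch and counts the peaks in the resulting curve. The Gaussian kernel, the bandwidth of 4.0, and the two made-up datasets are assumptions for illustration. Plotting libraries usually choose the bandwidth automatically, and a larger bandwidth gives a smoother curve that can hide a second peak.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function f(x) estimating the density of `data` with a Gaussian kernel."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one Gaussian bump centred on every observation
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density

def count_peaks(values):
    """Count strict local maxima in a sequence of density values."""
    return sum(values[i - 1] < values[i] > values[i + 1] for i in range(1, len(values) - 1))

# Hypothetical data: one set with two clusters, one with a single cluster
two_clusters = [38, 39, 40, 40, 41, 42, 58, 59, 60, 60, 61, 62]
one_cluster = [44, 46, 48, 49, 50, 50, 51, 52, 54, 56]
grid = list(range(30, 71))  # evaluate the curve at 30, 31, ..., 70

for name, data in [("two clusters", two_clusters), ("one cluster", one_cluster)]:
    f = gaussian_kde(data, bandwidth=4.0)
    print(name, "-> peaks:", count_peaks([f(x) for x in grid]))
```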

What violin plots show that box plots cannot:
- Multi-modal distributions (multiple peaks appear as bulges in the violin).
- The concentration of data within the IQR (a fat violin mid-box means many observations are concentrated there).
- The smoothness or roughness of the distribution.

Tableau limitation: Tableau does not have a native violin plot option. Violin plots can be approximated in Tableau using a combination of a density mark type and a mirrored dual-axis layout, or by using Tableau Extensions from the Exchange. For full violin plots, consider building in R (using ggplot2 with geom_violin) and embedding the output as an image or Extension.

Note: How to Approximate a Violin Plot in Tableau Using Density Marks
  1. Drag Profit to the Columns shelf (continuous).
  2. Drag Category to the Rows shelf.
  3. Change the mark type to Density in the Marks card dropdown.
  4. Tableau renders a density heatmap along the axis showing where values are concentrated.
  5. To create a more traditional violin shape: duplicate the axis (dual axis) and mirror the density mark by adjusting the axis range.
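Geometrically, a violin is a density curve drawn on both sides of a centre line. The hypothetical Python helper below turns a kernel density estimate into the outline of one violin. It scales the density so that the widest point has a chosen half-width. It then appends the mirror image, traced in reverse, so that the points form a closed polygon. The Gaussian KDE is defined inside the block so that it runs on its own. The data, the bandwidth, and the `max_half_width` of 0.4 are illustrative assumptions.

```python
import math

def gaussian_kde(data, bandwidth):
    """Gaussian kernel density estimate; returns a function of x."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))
    return lambda x: norm * sum(
        math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
    )

def violin_outline(data, grid, bandwidth, max_half_width=0.4):
    """Return polygon vertices (value, offset): the density above 0, mirrored below."""
    density = gaussian_kde(data, bandwidth)
    heights = [density(x) for x in grid]
    scale = max_half_width / max(heights)  # widest point gets max_half_width
    upper = [(x, h * scale) for x, h in zip(grid, heights)]
    # Trace the mirror image back along the grid to close the shape
    lower = [(x, -w) for x, w in reversed(upper)]
    return upper + lower

two_clusters = [38, 39, 40, 40, 41, 42, 58, 59, 60, 60, 61, 62]
grid = list(range(30, 71))
outline = violin_outline(two_clusters, grid, bandwidth=4.0)
print(len(outline), "vertices; widest half-width:", max(w for _, w in outline))
```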

[Insert screenshot of a Tableau density mark view approximating violin plot shapes for Profit across three Categories]


5.5 Comparing Descriptive Plot Types

Note: Which Descriptive Plot Should You Use?
| Chart Type | Shows Individual Points | Shows Distribution Shape | Shows Five-Number Summary | Best Dataset Size |
|---|---|---|---|---|
| Strip plot | Yes | Implied | No | Small (< 500 rows) |
| Dot plot | No (aggregated) | No | No | Any |
| Box plot | Outliers only | Partially | Yes | Any (summary form) |
| Violin plot | No | Yes (KDE) | Partially | Medium to large |
| Histogram | No | Yes (binned) | No | Medium to large |

Combined approach: Often the most informative descriptive view combines a box plot (for the five-number summary) with a jittered strip plot overlay (for individual points). This hybrid shows both the summary statistics and the raw distribution in a single chart, and Tableau makes it straightforward with the dual-axis technique described in Section 5.3.


5.6 Summary

Note: Key Concepts at a Glance
| Concept | Definition | Relevance in Tableau |
|---|---|---|
| Five-number summary | Min, Q1, Median, Q3, Max | Automatically computed in box plot |
| IQR | Q3 - Q1; spread of middle 50% | Length of the box in a box plot |
| Whiskers | Extend to the most extreme values within 1.5 × IQR below Q1 and above Q3 | Box plot arm length |
| Outlier | Value beyond whisker boundary | Plotted as individual circle marks |
| Strip plot | All individual data points on one axis | Dual axis overlay on box plot |
| Violin plot | KDE + box plot combined | Use density marks as approximation |
Tip: Applying This in Practice

In business reporting, box plots are underused compared with bar charts, even though they convey far more information about a distribution. The next time a stakeholder asks “what does our profit look like by product category?”, present both a bar chart (for the mean or total) and a box plot (for the distribution). The box plot will often reveal something the bar chart hides: an outlier customer, a skewed distribution, or an inconsistent sub-category that deserves investigation.