20 Factors: Creation, Levels, and Reordering
A factor is R’s data type for categorical variables: values drawn from a fixed, known set of categories such as "Low" / "Medium" / "High" or "Pass" / "Fail". Internally, a factor stores integer codes plus a lookup table of labels called levels, which makes it compact and lets it carry an explicit category order. This chapter covers creating factors, inspecting and renaming levels, ordered factors, and changing the level order so summaries and plots come out the way you want.
20.1 Why a separate type for categories?
A column of grades stored as text is just letters with no inherent order; "A", "B", "C" sort alphabetically, and any match between that and the order you actually mean is an accident. A factor lets you state explicitly: “these are the only valid categories, and this is their order.” That information then drives:
- which categories summary() and table() count (including those with zero observations),
- the order categories appear in plots and group-by output,
- modelling code that needs categorical predictors with a known reference level.
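A minimal sketch of the idea, using a made-up vector of grades (the names grades_chr and grades are illustrative only):

```r
# Hypothetical data: five grades stored as plain text
grades_chr <- c("B", "A", "C", "A", "B")

# Declare the full set of valid categories, including "D",
# which does not occur in the data
grades <- factor(grades_chr, levels = c("A", "B", "C", "D"))

grades          # prints the values, then a Levels: line
table(grades)   # "D" is still counted, with a count of zero
```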
Notice the printout: the values appear without quotes and a Levels: line shows the category list.
20.2 Creating factors
factor() accepts a vector and infers levels by sorting the unique values alphabetically.
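A short sketch of the default behaviour, assuming a made-up vector of clothing sizes:

```r
# Hypothetical data
sizes <- c("Medium", "Small", "Large", "Small")

f <- factor(sizes)

# Levels are the sorted unique values, not the order of appearance
levels(f)
# [1] "Large"  "Medium" "Small"
```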
That alphabetical default is rarely what you want. Specify levels = to take control.
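As an illustration with the same kind of made-up size data, stating the levels explicitly puts them in a meaningful order:

```r
# Hypothetical data
sizes <- c("Medium", "Small", "Large", "Small")

f <- factor(sizes, levels = c("Small", "Medium", "Large"))

levels(f)
# [1] "Small"  "Medium" "Large"

table(f)   # counts are reported in level order: Small, Medium, Large
```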
Pass labels = to rename categories at the same time:
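For instance, a sketch assuming the raw data uses hypothetical one-letter codes:

```r
# Hypothetical short codes as they might arrive in a raw file
codes <- c("M", "S", "L", "S")

f <- factor(codes,
            levels = c("S", "M", "L"),                # values found in the data
            labels = c("Small", "Medium", "Large"))   # names to display, same order

levels(f)         # "Small" "Medium" "Large"
as.character(f)   # "Medium" "Small" "Large" "Small"
```

The labels are matched to levels by position, so the two vectors must be in the same order.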
Values not in levels become NA
If a value in your data isn’t listed in levels, R silently turns it into NA. Use unique() first to be sure your level list covers everything.
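A small sketch of this behaviour, using a made-up vector in which "XL" is deliberately left out of the level list:

```r
# Hypothetical data; "XL" is not one of the declared levels
sizes <- c("Small", "XL", "Medium", "Large")

unique(sizes)   # check what is actually in the data first

f <- factor(sizes, levels = c("Small", "Medium", "Large"))

f               # the second element prints as <NA>
is.na(f)        # FALSE  TRUE FALSE FALSE
sum(is.na(f))   # how many values fell outside the level list
```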
The XL becomes <NA>, a useful safety net when you want to flag unknown categories, but a trap if you didn’t intend it.
20.3 Inspecting factors
as.integer() exposes the factor’s secret: each value is really an integer pointing into the levels vector. That is why factors are so cheap and so fast for grouping operations.
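The inspection functions can be sketched on a small made-up factor:

```r
# Hypothetical data
f <- factor(c("Low", "High", "Medium", "Low"),
            levels = c("Low", "Medium", "High"))

levels(f)       # "Low" "Medium" "High"
nlevels(f)      # 3
as.integer(f)   # 1 3 2 1  (positions in the levels vector)
table(f)        # counts per level, in level order
str(f)          # compact view of the levels and the integer codes
```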
20.4 Adding, renaming, and dropping levels
Renaming all levels at once:
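A minimal sketch, assuming a factor with hypothetical abbreviated labels:

```r
# Hypothetical data with terse labels
f <- factor(c("lo", "hi", "mid", "lo"), levels = c("lo", "mid", "hi"))

# Replacement is positional: first new name replaces the first level, and so on
levels(f) <- c("Low", "Medium", "High")

as.character(f)   # "Low" "High" "Medium" "Low"
```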
Adding a level that may not appear in the data yet, useful for plots that should always reserve space for an empty category:
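One way to do this, sketched on made-up data, is to append to the existing levels vector:

```r
# Hypothetical data: no "High" values observed yet
f <- factor(c("Low", "Medium", "Low"), levels = c("Low", "Medium"))

# Append a new level; the existing values are unchanged
levels(f) <- c(levels(f), "High")

levels(f)   # "Low" "Medium" "High"
table(f)    # "High" appears with a count of zero
```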
Dropping unused levels after a filter:
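A short sketch with a made-up factor shows that subsetting alone keeps the old levels:

```r
# Hypothetical data
f <- factor(c("Low", "Medium", "High", "Low"),
            levels = c("Low", "Medium", "High"))

kept <- f[f != "High"]   # filter out one category
levels(kept)             # still "Low" "Medium" "High"

kept <- droplevels(kept)
levels(kept)             # "Low" "Medium"
```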
droplevels() is the cleanup function to know.
20.5 Ordered factors
Some categorical variables have a natural order: Low < Medium < High, or grades D < C < B < A. Passing ordered = TRUE to factor() records that order so comparisons work.
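A minimal sketch of those comparisons, using a hypothetical severity vector named sev:

```r
# Hypothetical data
sev <- factor(c("Low", "High", "Medium", "Low"),
              levels = c("Low", "Medium", "High"),
              ordered = TRUE)

sev < "High"      # TRUE FALSE TRUE TRUE
sev[1] < sev[2]   # TRUE, because Low < High
max(sev)          # High
```

On an unordered factor, the same < comparison returns NA with a warning and max() is an error.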
Use ordered factors for measurement scales and severity grades, but stick with regular factors for categories that have no inherent order (region, department, colour). Ordered factors change how some modelling functions treat the variable, so don’t reach for them by default.
20.6 Reordering levels
The level order, not the alphabetical order of the labels, drives every summary and plot. Three common ways to change it.
By hand with factor(..., levels = ...), which re-states the level vector explicitly:
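As a sketch on made-up data (f and f2 are illustrative names):

```r
# Hypothetical data; default levels are alphabetical
f <- factor(c("Medium", "Small", "Large", "Small"))
levels(f)    # "Large" "Medium" "Small"

# Re-state the levels in the order you want
f2 <- factor(f, levels = c("Small", "Medium", "Large"))
levels(f2)   # "Small" "Medium" "Large"

# The values themselves are untouched; only the level order changed
identical(as.character(f), as.character(f2))   # TRUE
```

This differs from levels(f) <- ..., which renames levels by position and would silently relabel the data if used for reordering.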
Move one level to the front with relevel(), handy for setting a regression reference category:
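For example, assuming a hypothetical department factor:

```r
# Hypothetical data; default levels are "HR" "IT" "Sales"
dept <- factor(c("Sales", "HR", "IT", "Sales"))

# Make "Sales" the first (reference) level; the others keep their order
dept2 <- relevel(dept, ref = "Sales")

levels(dept2)   # "Sales" "HR" "IT"
```

relevel() is meant for unordered factors; for an ordered factor, re-state the levels with factor() instead.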
By another variable’s value with forcats::fct_reorder(), the easiest way to make a bar chart sort by height. forcats is part of the tidyverse and runs in webr.
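A minimal sketch with made-up sales figures. The forcats call is wrapped in a requireNamespace() check in case the package is not installed, and base R's reorder() is shown as an alternative that gives the same level order here:

```r
# Hypothetical data: one sales total per region
region <- factor(c("North", "South", "East", "West"))
sales  <- c(120, 340, 90, 210)

if (requireNamespace("forcats", quietly = TRUE)) {
  # Levels sorted by sales, smallest first (summary function defaults to median)
  by_sales <- forcats::fct_reorder(region, sales)
  print(levels(by_sales))   # "East" "North" "West" "South"
}

# Base R alternative: reorder() sorts levels by the mean of the second argument
by_sales_base <- reorder(region, sales)
levels(by_sales_base)       # "East" "North" "West" "South"
```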
Two more forcats helpers worth memorising:
- fct_infreq(x): order levels by how often each appears (most common first).
- fct_relevel(x, "Foo", "Bar"): push named levels to the front in the order given.
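Both helpers can be sketched on a made-up factor x with three placeholder categories; the check for forcats is only there so the snippet does not fail where the package is missing:

```r
# Hypothetical data; default levels are "Bar" "Baz" "Foo"
x <- factor(c("Bar", "Baz", "Foo", "Baz", "Foo", "Baz"))

if (requireNamespace("forcats", quietly = TRUE)) {
  # Most frequent level first: Baz (3), Foo (2), Bar (1)
  by_freq <- forcats::fct_infreq(x)
  print(levels(by_freq))   # "Baz" "Foo" "Bar"

  # Named levels move to the front; the rest keep their existing order
  moved <- forcats::fct_relevel(x, "Foo", "Bar")
  print(levels(moved))     # "Foo" "Bar" "Baz"
}
```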
20.7 Worked example: student performance
A small dataset of student grades. We want a frequency table with categories in pedagogical order (F < D < C < B < A), not alphabetical, plus the top performer.
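One possible version of this example is sketched below. The student names, scores, and grade boundaries are all invented for illustration:

```r
# Hypothetical data: six students and their scores out of 100
students <- data.frame(
  name  = c("Ana", "Ben", "Cara", "Dev", "Eli", "Fay"),
  score = c(91, 78, 65, 88, 52, 73)
)

# Bin scores into letter grades (assumed cut-offs: 60, 70, 80, 90)
students$grade <- cut(
  students$score,
  breaks = c(0, 60, 70, 80, 90, 100),
  labels = c("F", "D", "C", "B", "A"),
  right = FALSE,           # bins are [0,60), [60,70), ...
  include.lowest = TRUE,   # closes the top bin so a score of 100 counts as "A"
  ordered_result = TRUE    # return an ordered factor: F < D < C < B < A
)

# Frequency table in pedagogical order, not alphabetical
table(students$grade)

# Top performer(s): rows whose grade equals the highest grade observed
top <- max(students$grade)
students[students$grade == top, ]
```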
Two things to notice. First, cut() is the workhorse for turning a numeric variable into a factor with custom bins. Second, because grade is an ordered factor, max() and == give meaningful answers, exactly what factor() was designed for.
20.8 Summary
| Concept | Description |
|---|---|
| Create | |
| factor() | Create a factor from a vector; specify levels to control category order |
| ordered = TRUE | Record an intrinsic order so comparisons and min/max work |
| cut() | Bin a numeric vector into a factor with custom break points and labels |
| Inspect | |
| levels() | Return the category vector behind a factor |
| nlevels() | Count how many categories a factor has |
| table() | Frequency table including levels with zero observations |
| Rename and Drop | |
| levels(f) <- ... | Rename all levels in one assignment |
| droplevels() | Remove levels that no longer appear in the data after a filter |
| Reorder | |
| factor(f, levels = ...) | Re-state the level order explicitly to change grouping and plot order |
| relevel() | Move one level to the front, useful as a regression reference category |
| fct_reorder() | forcats helper that orders levels by another variable's value |
| fct_infreq() | forcats helper that orders levels by frequency, most common first |