24 Iterating Loops Over Data Structures
The previous chapter introduced the loop forms. This one shows how to walk over R’s main containers (vectors, lists, matrices and data frames) using both hand-written for loops and the apply family (apply, sapply, lapply, vapply, mapply). You’ll see why the apply functions are usually preferred, when a for loop still wins, and how to choose the right helper for the shape of the data you’re processing.
24.1 A common shape: iterate, transform, collect
Almost every loop in data work has the same shape:
- step through a container,
- compute something for each element,
- collect the results into a new structure.
R offers two ways to express this shape:
- Explicit: a for loop that writes into a pre-allocated output.
- Implicit: an apply function that bundles iteration and collection into one call.
We’ll see both, but in idiomatic R the apply family is the default.
24.2 Iterating over vectors
A for loop works element by element, visiting one position at a time.
A vectorised expression does the same thing in a single call, with no explicit loop.
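Here is a minimal sketch of both versions, using a hypothetical vector x. The loop pre-allocates its output with numeric(length(x)) and fills it by position. It uses seq_along(x) rather than 1:length(x), so the loop also behaves correctly on an empty vector.

```r
x <- c(2, 4, 6, 8)  # hypothetical input vector

# Explicit loop: pre-allocate the output, then fill one slot per element
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# Vectorised: the ^ operator already works element-wise
squares_vec <- x^2

identical(squares_loop, squares_vec)  # TRUE
```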
When the operation is per-element and already vectorised, don’t loop. Use a loop only when each step depends on the previous one or has a side effect.
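The next sketch shows a case where a loop is justified. Each term of a Fibonacci-style recurrence, chosen purely for illustration, depends on the two terms before it, so the steps cannot run independently of one another.

```r
n <- 10                # hypothetical length; the loop below assumes n >= 3
fib <- numeric(n)      # pre-allocated output
fib[1:2] <- 1          # seed values
for (i in 3:n) {
  fib[i] <- fib[i - 1] + fib[i - 2]  # each step uses results from earlier steps
}
fib
```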
24.3 Iterating over lists
Lists hold heterogeneous elements, so a per-element transform is a genuine iteration job, which a for loop can handle one element at a time.
With sapply(), the same computation becomes a single call.
sapply() walks the list, applies mean() to each element, and simplifies the result to a named numeric vector. Names are preserved automatically.
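A minimal sketch of both forms follows. It assumes a hypothetical named list of score vectors with different lengths. The loop has to copy the names across by hand, while sapply() keeps them on its own.

```r
# Hypothetical named list: vectors of different lengths
scores <- list(a = c(70, 80, 90), b = c(60, 65), c = c(88, 92, 95, 99))

# Loop form: pre-allocate a numeric vector and carry the names over manually
means_loop <- numeric(length(scores))
names(means_loop) <- names(scores)
for (nm in names(scores)) {
  means_loop[nm] <- mean(scores[[nm]])
}

# sapply() form: iterate, apply mean(), simplify to a named numeric vector
means_sapply <- sapply(scores, mean)
```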
24.4 The apply family at a glance
| Function | Input | Output | Use when |
|---|---|---|---|
| lapply(x, f) | vector / list | list (always) | safe default, even when outputs vary in shape |
| sapply(x, f) | vector / list | vector / matrix if all results have the same shape; otherwise list | quick interactive use |
| vapply(x, f, FUN.VALUE) | vector / list | vector / matrix matching FUN.VALUE | production code, type-safe |
| apply(m, MARGIN, f) | matrix / data frame | vector / matrix | collapse rows or columns |
| mapply(f, …) | several vectors | vector / matrix | iterate over parallel arguments |
| Map(f, …) | several lists | list | like mapply but always a list |
Three of these (lapply, sapply and vapply) do the same job and differ only in what they return. Choose between them by how predictable the output needs to be.
24.5 lapply: always returns a list
lapply() is the safe default. It returns a list of the same length as the input, regardless of what the function returns.
Getting a list back means no surprises: lapply() never tries to merge the outputs into a simpler structure.
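In the sketch below, a function whose output length varies with its input (seq_len() here) still comes back as a plain list, with one element per input. The input 1:3 is hypothetical.

```r
# Element i holds 1..i, so the result lengths are 1, 2 and 3
res <- lapply(1:3, seq_len)
str(res)
```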
24.6 sapply: simplify when possible
sapply() calls lapply() and then tries to simplify:
- if every result is a single value → returns a vector
- if every result is the same length > 1 → returns a matrix
- otherwise → falls back to a list (just like lapply)
Convenient interactively, risky in scripts: if one element happens to return a different shape, your code’s output type silently changes.
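The sketch below shows the three outcomes on hypothetical inputs. It also includes an edge case that often bites in scripts: a zero-length input returns an empty list, not an empty numeric vector.

```r
one_each <- sapply(1:3, function(i) i^2)         # length-1 results: numeric vector
two_each <- sapply(1:3, function(i) c(i, i^2))   # length-2 results: 2 x 3 matrix
ragged   <- sapply(1:3, seq_len)                 # lengths 1, 2, 3: falls back to a list
empty    <- sapply(integer(0), function(i) i^2)  # no elements: an empty list
```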
24.7 vapply: type-safe simplify
vapply() is sapply() plus a contract: you state up front what one result should look like, and R errors if any iteration disagrees. Use it when the type matters.
The FUN.VALUE = numeric(1) template says “every result must be a length-1 numeric.” If any iteration returned, say, a character string or a vector of length 2, vapply() would stop with a clear error instead of silently shifting type.
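Here is a minimal sketch of the contract, assuming a hypothetical list of integer vectors. mean() always returns a single double, so the first call succeeds. range() returns two values per element, so vapply() stops. tryCatch() is used here only to capture the error message for display.

```r
groups <- list(a = 1:4, b = 5:10)  # hypothetical input

# Every result is one double, which matches the template
avg <- vapply(groups, mean, FUN.VALUE = numeric(1))

# range() returns two numbers per element, breaking the length-1 contract
msg <- tryCatch(
  vapply(groups, range, FUN.VALUE = numeric(1)),
  error = function(e) conditionMessage(e)
)
msg
```

The template also allows promotion along logical < integer < double < complex. A function returning an integer would therefore be accepted and converted to double, while a character result, or a result of the wrong length, is rejected.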
24.8 apply: for matrices and data frames
apply() walks along one dimension of a matrix and collapses the other. The MARGIN argument is 1 for rows, 2 for columns.
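Here is a small sketch on a hypothetical 2 x 3 matrix. By default, matrix() fills it column by column.

```r
m <- matrix(1:6, nrow = 2)  # hypothetical matrix; columns are (1,2), (3,4), (5,6)

apply(m, 1, sum)  # MARGIN = 1: one result per row    -> 9 12
apply(m, 2, sum)  # MARGIN = 2: one result per column -> 3 7 11
```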
For the common cases (sums and means of rows or columns), the dedicated helpers rowSums(), colSums(), rowMeans() and colMeans() are faster and clearer. Reach for apply() when the function isn’t one of those.
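The sketch below uses the same hypothetical matrix, redefined so the block runs on its own. The dedicated helpers give the same sums and means as apply(). apply() remains the tool for a reduction with no base-R helper, such as the spread between each row’s maximum and minimum.

```r
m <- matrix(1:6, nrow = 2)  # hypothetical matrix

row_totals <- rowSums(m)   # same values as apply(m, 1, sum)
col_avgs   <- colMeans(m)  # same values as apply(m, 2, mean)

# Base R has no dedicated "max minus min per row" helper, so use apply()
row_spread <- apply(m, 1, function(r) max(r) - min(r))
```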
apply() also works on data frames, but it first converts them to a matrix with as.matrix(), so all columns should share a compatible type. If any column is character (or a factor), every value becomes character. For data frames, prefer column-wise iteration with lapply()/sapply().
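The coercion is easy to see with a hypothetical two-column frame. Once a character column is present, apply() hands every row to the function as a character vector. sapply() over the columns, in contrast, keeps each column’s own type.

```r
# Hypothetical frame with one character and one numeric column
df <- data.frame(id = c("a", "b"), score = c(90, 75), stringsAsFactors = FALSE)

# apply() converts df with as.matrix() first, so each row arrives as character
row_types <- apply(df, 1, function(row) class(row))

# Column-wise iteration keeps the original column types
col_types <- sapply(df, class)
```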
24.9 Iterating over a data frame’s columns
Because a data frame is a list of columns, lapply() and sapply() walk over its columns by default.
To iterate over the numeric columns only, select them first.
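Here is a sketch with a hypothetical data frame. sapply() returns one value per column, named by column, and lapply() returns its results as a list.

```r
# Hypothetical data frame
df <- data.frame(
  name   = c("Ana", "Ben", "Cy"),
  height = c(150, 160, 170),
  weight = c(50, 60, 70),
  stringsAsFactors = FALSE
)

col_classes <- sapply(df, class)                             # named character vector
n_unique    <- lapply(df, function(col) length(unique(col))) # list, one entry per column
```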
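One common idiom has three steps: build a logical vector with sapply(df, is.numeric), use it to keep only the numeric columns, then iterate over what remains. The frame below is hypothetical.

```r
# Hypothetical data frame with one non-numeric column
df <- data.frame(
  name   = c("Ana", "Ben", "Cy"),
  height = c(150, 160, 170),
  weight = c(50, 60, 70),
  stringsAsFactors = FALSE
)

is_num    <- sapply(df, is.numeric)  # c(name = FALSE, height = TRUE, weight = TRUE)
num_df    <- df[is_num]              # list-style indexing keeps a data frame
col_means <- sapply(num_df, mean)
```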
24.10 Iterating over rows of a data frame
Row-wise iteration is unusual in base R, since most analyses are column-wise. When you do need it, there are two patterns.
By index, with a for loop that visits each row in turn.
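Here is a minimal sketch, assuming a hypothetical frame of names and heights. The loop visits each row index, reads that row’s values, and writes one result into a pre-allocated character vector.

```r
# Hypothetical data frame
df <- data.frame(name = c("Ana", "Ben", "Cy"), height = c(150, 160, 170),
                 stringsAsFactors = FALSE)

labels <- character(nrow(df))   # one slot per row
for (i in seq_len(nrow(df))) {
  row <- df[i, ]                # a one-row data frame
  labels[i] <- paste0(row$name, " (", row$height, " cm)")
}
labels
```

This particular label could be built with one vectorised paste0() call. The loop form earns its place when each row needs logic that does not vectorise.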
With apply(df, 1, …), keeping in mind the matrix-coercion gotcha: if any column is character or a factor, every column is turned into character.
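When every column is numeric, the coercion is harmless and apply() returns one value per row. The sketch assumes a hypothetical all-numeric frame of heights (cm) and weights (kg).

```r
# Hypothetical all-numeric data frame
df <- data.frame(height = c(150, 160, 170), weight = c(50, 60, 70))

# Each row arrives as a named numeric vector, so row[["weight"]] works
bmi <- apply(df, 1, function(row) row[["weight"]] / (row[["height"]] / 100)^2)
round(bmi, 1)
```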
For serious row-wise work in modern R, use dplyr::rowwise(), or split the frame with split() and then lapply() over the pieces.
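The sketch below shows the base-R split() route on a hypothetical frame; dplyr is not used. split(df, seq_len(nrow(df))) gives a list of one-row data frames, which lapply() then processes one at a time.

```r
# Hypothetical data frame
df <- data.frame(name = c("Ana", "Ben"), height = c(150, 160), weight = c(50, 60),
                 stringsAsFactors = FALSE)

rows <- split(df, seq_len(nrow(df)))   # list of one-row data frames, named "1", "2"
summaries <- lapply(rows, function(r) {
  list(name = r$name, ratio = r$weight / r$height)
})
summaries[["2"]]$ratio
```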
24.11 mapply and Map: parallel iteration
mapply() is sapply() with several inputs walked in parallel: element 1 of every argument, then element 2, and so on.
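Here is a minimal sketch that uses the names prices and qty from the text below; their values are hypothetical. mapply() passes prices[1] and qty[1] to the function, then prices[2] and qty[2], and so on.

```r
prices <- c(2.5, 4, 10)  # hypothetical unit prices
qty    <- c(4, 3, 1)     # hypothetical quantities

line_totals <- mapply(function(p, q) p * q, prices, qty)
line_totals  # 10 12 10
```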
For a simple product like this, the vectorised prices * qty is shorter; mapply() shines when the per-element function does something genuinely non-vectorisable.
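Here is a sketch of a genuinely non-vectorisable case. seq() accepts only a single from and a single to, so summing the integers in several ranges needs one call per pair. The range bounds below are hypothetical.

```r
from <- c(1, 5, 20)   # hypothetical range starts
to   <- c(10, 6, 22)  # hypothetical range ends

# seq(a, b) is not vectorised over its endpoints, so mapply() supplies one pair at a time
range_sums <- mapply(function(a, b) sum(seq(a, b)), from, to)
```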
Map() is mapply() without simplification: it always returns a list, the way lapply() does for a single input.
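Here is a short sketch on hypothetical inputs. Map() keeps every result as a list element, even when the results could have been simplified, and it handles results of different lengths without complaint.

```r
pairs <- Map(function(a, b) a + b, 1:3, 4:6)
str(pairs)  # a list of three integers: 5, 7, 9

# Results of different lengths are fine, since nothing is simplified;
# a character first argument also supplies the names
reps <- Map(rep, c("x", "y"), 2:1)
```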
24.12 Anonymous functions
You don’t need to name a function to pass it to an apply call; you can write it inline in either of two equivalent forms.
The \(x) … form, available since R 4.1.0, is just sugar for function(x) … and reads cleanly inside an apply call.
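The two forms are shown side by side below on a hypothetical input. The backslash form needs R 4.1.0 or later.

```r
long_form  <- sapply(1:5, function(x) x^2 + 1)
short_form <- sapply(1:5, \(x) x^2 + 1)   # same function, shorter syntax

identical(long_form, short_form)  # TRUE
```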
24.13 Worked example: exam summary by subject
Five students, three subjects each. Compute per-subject mean and standard deviation, classify each subject as “tight” or “wide” based on the standard deviation, and label every individual score as Pass/Fail.
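Here is a minimal sketch of the task. The scores, the pass mark of 60 and the standard-deviation cutoff of 10 that separates “tight” from “wide” are all hypothetical choices made for illustration. The structure follows the three patterns listed next.

```r
# Hypothetical scores: 5 students (rows) x 3 subjects (columns)
scores <- data.frame(
  maths   = c(72, 75, 80, 68, 74),
  physics = c(55, 48, 91, 70, 62),
  english = c(80, 82, 79, 85, 81)
)

pass_mark <- 60  # assumed pass threshold
sd_cutoff <- 10  # assumed boundary between "tight" and "wide"

# 1. sapply(): collapse each column to a single number
subject_mean <- sapply(scores, mean)
subject_sd   <- sapply(scores, sd)

# 2. ifelse(): derive a label from the resulting vector (names are kept)
spread <- ifelse(subject_sd < sd_cutoff, "tight", "wide")

# 3. lapply(): per-column transform returning a vector of the same length
grades <- as.data.frame(
  lapply(scores, function(col) ifelse(col >= pass_mark, "Pass", "Fail")),
  stringsAsFactors = FALSE
)

print(data.frame(mean = subject_mean, sd = subject_sd, spread = spread))
print(grades)
```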
Three different iteration patterns in one example:
- sapply() to collapse each column to a single number,
- ifelse() over the resulting vector to derive a label,
- lapply() to apply a per-column transform that returns a vector of the same length.
That small toolbox covers much of everyday descriptive-analytics work.
24.14 Summary
| Concept | Description |
|---|---|
| Explicit Iteration | |
| vectorised arithmetic | Apply an operation to a whole vector at once; the first tool to try |
| for loop on container | Hand-written loop over a vector or list, used when steps depend on earlier ones or have side effects |
| Apply Family for Lists | |
| lapply() | Walks a list and returns a list of the same length; the safe default |
| sapply() | Like lapply but simplifies to a vector or matrix when shapes agree |
| vapply() | Type-safe simplify: requires a FUN.VALUE template and errors on mismatch |
| Matrix and Frame Iteration | |
| apply() | Collapse a matrix along rows (MARGIN=1) or columns (MARGIN=2) |
| rowSums/colMeans family | Fast dedicated helpers for the most common row/column reductions |
| mapply() and Map() | Walk several inputs in parallel; mapply simplifies, Map always returns a list |
| Function Shorthand and Pattern | |
| anonymous function \(x) | Inline function shorthand, readable inside apply-family calls in R 4.1+ |
| iterate, transform, collect | The common loop shape: step through, compute, collect into a new structure |