24  Iterating Loops Over Data Structures

Note: What this chapter covers

The previous chapter introduced the loop forms. This one shows how to walk over R’s main containers (vectors, lists, matrices, and data frames) using both hand-written for loops and the apply family (apply, sapply, lapply, vapply, mapply). You’ll see why the apply functions are usually preferred, when a for loop still wins, and how to choose the right helper for the shape of the data you’re processing.

24.1 A common shape: iterate, transform, collect

Almost every loop in data work has the same shape:

  1. step through a container,
  2. compute something for each element,
  3. collect the results into a new structure.

R offers two ways to express this shape:

  • Explicit: a for loop into a pre-allocated output.
  • Implicit: an apply function that bundles iteration + collection into one call.

We’ll see both, but in idiomatic R the apply family is the default.

24.2 Iterating over vectors

A for loop works element-by-element:
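As a minimal sketch of the loop form, assume a hypothetical vector of Celsius temperatures (the data and the names celsius and fahrenheit are illustrative, not from the original text). The output is pre-allocated and filled one element at a time:

```r
# Hypothetical data: temperatures in degrees Celsius
celsius <- c(12.5, 18.0, 21.3, 9.8)

# Pre-allocate the output, then fill it element by element
fahrenheit <- numeric(length(celsius))
for (i in seq_along(celsius)) {
  fahrenheit[i] <- celsius[i] * 9 / 5 + 32
}
fahrenheit
```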

The vectorised one-liner does the same thing in one line:
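A sketch of the vectorised form, again assuming a hypothetical vector of Celsius temperatures (illustrative data, not from the original text). Arithmetic operators in R already work on whole vectors, so no loop is needed:

```r
# Hypothetical data: temperatures in degrees Celsius
celsius <- c(12.5, 18.0, 21.3, 9.8)

# The arithmetic is applied to every element at once
fahrenheit <- celsius * 9 / 5 + 32
fahrenheit
```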

When the operation is per-element and already vectorised, don’t loop. Use a loop only when each step depends on the previous one or has a side effect.

24.3 Iterating over lists

Lists hold heterogeneous elements, so a per-element transform is a real iteration job. Here’s the loop form:
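A minimal sketch of the loop form, assuming a hypothetical named list of score vectors of different lengths (the list scores and its contents are made up for illustration):

```r
# Hypothetical data: a named list whose elements differ in length
scores <- list(
  maths   = c(70, 85, 90),
  physics = c(60, 75),
  art     = c(88, 92, 79, 95)
)

# Pre-allocate a named numeric vector, then fill it by name
means <- numeric(length(scores))
names(means) <- names(scores)
for (nm in names(scores)) {
  means[nm] <- mean(scores[[nm]])
}
means
```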

The same with sapply() is a single line:
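A sketch of the sapply() form, assuming the same kind of hypothetical named list of score vectors (illustrative data only):

```r
# Hypothetical data: a named list whose elements differ in length
scores <- list(
  maths   = c(70, 85, 90),
  physics = c(60, 75),
  art     = c(88, 92, 79, 95)
)

# One call does the iteration and the collection
means <- sapply(scores, mean)
means
```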

sapply() walks the list, applies mean() to each element, and simplifies the result to a named numeric vector. Names are preserved automatically.

24.4 The apply family at a glance

| Function | Input | Output | Use when |
|---|---|---|---|
| lapply(x, f) | vector / list | list (always) | safe default, even when outputs vary in shape |
| sapply(x, f) | vector / list | vector / matrix if all results have the same shape; otherwise list | quick interactive use |
| vapply(x, f, FUN.VALUE) | vector / list | vector / matrix matching FUN.VALUE | production code, type-safe |
| apply(m, MARGIN, f) | matrix / data frame | vector / matrix | collapse rows or columns |
| mapply(f, …) | several vectors | vector / matrix | iterate over parallel arguments |
| Map(f, …) | several lists | list | like mapply but always a list |

Three of these (lapply, sapply, and vapply) do the same job and differ only in what they return. Pick by how predictable the output is.

24.5 lapply: always returns a list

lapply() is the safe default. It returns a list of the same length as the input, regardless of what the function returns.
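To illustrate, here is a small sketch on a hypothetical list of scores (the data and the 80-point cut-off are assumptions made for the example). The function returns a different number of values for each element, and lapply() still hands back one list entry per input:

```r
# Hypothetical data: a named list of score vectors
scores <- list(
  maths   = c(70, 85, 90),
  physics = c(60, 75),
  art     = c(88, 92, 79, 95)
)

# Keep only scores above 80; each result may have a different length
high <- lapply(scores, function(s) s[s > 80])
str(high)
```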

A list back means no surprises: lapply() never tries to be clever about merging the outputs.

24.6 sapply: simplify when possible

sapply() calls lapply() and then tries to simplify:

  • if every result is a single value → returns a vector
  • if every result is the same length > 1 → returns a matrix
  • otherwise → falls back to a list (just like lapply)

Convenient interactively, risky in scripts: if one element happens to return a different shape, your code’s output type silently changes.
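The shape shift can be sketched with two hypothetical inputs that differ in a single value (the helper keep_positive and both lists are invented for illustration):

```r
# Hypothetical helper: drop non-positive values
keep_positive <- function(v) v[v > 0]

clean <- list(a = c(1, 2, 3), b = c(4, 5, 6))
dirty <- list(a = c(1, 2, 3), b = c(-4, 5, 6))

# Every result has length 3, so sapply() simplifies to a 3 x 2 matrix
out_clean <- sapply(clean, keep_positive)

# One result now has length 2, so sapply() falls back to a list
out_dirty <- sapply(dirty, keep_positive)

class(out_clean)
class(out_dirty)
```

The call is identical in both cases; only the data changed, which is why this is hard to spot in a script.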

24.7 vapply: type-safe simplify

vapply() is sapply() plus a contract: you state up front what one result should look like, and R errors if any iteration disagrees. Use it when the type matters.
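A minimal sketch, assuming a hypothetical named list of score vectors. The first call satisfies the template; the second uses range(), which returns two values per element, and is wrapped in tryCatch() only so the example can show that an error is raised:

```r
# Hypothetical data: a named list of score vectors
scores <- list(
  maths   = c(70, 85, 90),
  physics = c(60, 75),
  art     = c(88, 92, 79, 95)
)

# Each call to mean() returns one numeric value, matching the template
means <- vapply(scores, mean, FUN.VALUE = numeric(1))
means

# range() returns two values per element, so the template is violated
failed <- inherits(
  tryCatch(vapply(scores, range, FUN.VALUE = numeric(1)),
           error = function(e) e),
  "error"
)
failed
```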

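The FUN.VALUE = numeric(1) template says “every result must be a length-1 numeric.” If any iteration returned, say, a character string or a vector of length 2, vapply() would stop with a clear error instead of silently shifting type. (An integer result is not an error: vapply() promotes it to double to match the template.)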

Tip: Production rule of thumb
  • Quick exploration → sapply().
  • Code that anyone else (or future-you) will run → vapply().
  • Outputs vary in shape → lapply().

24.8 apply: for matrices and data frames

apply() walks along one dimension of a matrix and collapses the other. The MARGIN argument is 1 for rows, 2 for columns.
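A short sketch on a hypothetical 2 x 3 matrix (values and dimension names are invented for illustration), using one function per margin that has no dedicated helper:

```r
# Hypothetical data: 2 rows, 3 columns, filled column by column
m <- matrix(c(3, 9, 4, 7, 8, 2), nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

# MARGIN = 1: one result per row
row_max <- apply(m, 1, max)

# MARGIN = 2: one result per column
col_spread <- apply(m, 2, function(col) max(col) - min(col))

row_max
col_spread
```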

For the common cases, sums and means of rows or columns, the dedicated helpers rowSums(), colSums(), rowMeans(), colMeans() are faster and clearer. Reach for apply() when the function isn’t one of those.

apply() also works on data frames, but it coerces them to a matrix first. All columns must therefore be of compatible type; otherwise everything becomes character. For data frames, prefer column-wise iteration with lapply()/sapply().
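The coercion is easy to see on a small hypothetical data frame with one character column and one numeric column (the data are invented for illustration):

```r
# Hypothetical data: mixed column types
df <- data.frame(name = c("Ana", "Ben"), score = c(90, 75),
                 stringsAsFactors = FALSE)

# apply() converts df to a matrix, so each row arrives as character
row_classes <- apply(df, 1, function(row) class(row))

# sapply() walks the columns and leaves their types alone
col_classes <- sapply(df, class)

row_classes
col_classes
```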

24.9 Iterating over a data frame’s columns

Because a data frame is a list of columns, lapply() and sapply() walk over its columns by default:
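A minimal sketch, assuming a hypothetical all-numeric data frame (column names and values are illustrative):

```r
# Hypothetical data: two numeric columns
df <- data.frame(height = c(1.60, 1.75, 1.82),
                 weight = c(55, 72, 80))

# sapply() visits each column in turn and returns a named vector
col_means <- sapply(df, mean)
col_means
```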

To filter to numeric columns first:
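One way to sketch this, assuming a hypothetical data frame with a character column alongside two numeric ones (the data are invented): build a logical mask with vapply(), then iterate over the selected columns only.

```r
# Hypothetical data: one character column, two numeric columns
df <- data.frame(name   = c("Ana", "Ben", "Cy"),
                 height = c(1.60, 1.75, 1.82),
                 weight = c(55, 72, 80),
                 stringsAsFactors = FALSE)

# TRUE for numeric columns, FALSE otherwise
is_num <- vapply(df, is.numeric, logical(1))

# Summarise only the numeric columns
col_means <- sapply(df[is_num], mean)
col_means
```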

24.10 Iterating over rows of a data frame

Row-wise iteration in base R is unusual, because most analyses are column-wise. When you need it, there are two patterns:

By index with a for loop:
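A sketch of the index-based loop on a hypothetical two-row data frame; the pass mark of 60 is an assumption made for the example. Indexing each column by row number keeps every column's original type:

```r
# Hypothetical data: one row per student
df <- data.frame(name = c("Ana", "Ben"), score = c(90, 55),
                 stringsAsFactors = FALSE)

# Pre-allocate, then build one label per row
labels <- character(nrow(df))
for (i in seq_len(nrow(df))) {
  status <- if (df$score[i] >= 60) "Pass" else "Fail"  # assumed pass mark
  labels[i] <- paste0(df$name[i], ": ", status)
}
labels
```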

With apply(df, 1, …), but remember the matrix-coercion gotcha: if any column is character or a factor, every value in the row is turned into character.

For serious row-wise work in modern R, use dplyr::rowwise() or split the frame with split() then lapply().
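The split() then lapply() pattern can be sketched in base R as follows, assuming a hypothetical data frame with score and out_of columns (names and values are illustrative). Each piece is a one-row data frame, so column types survive:

```r
# Hypothetical data: raw score and maximum possible score per student
df <- data.frame(name   = c("Ana", "Ben"),
                 score  = c(45, 30),
                 out_of = c(50, 40),
                 stringsAsFactors = FALSE)

# One list element per row, each a one-row data frame
rows <- split(df, seq_len(nrow(df)))

# Work on each row with its columns still typed
pct <- lapply(rows, function(r) r$score / r$out_of * 100)
unlist(pct)
```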

24.11 mapply and Map: parallel iteration

mapply() is sapply() with multiple inputs walked in parallel: index 1 of every argument, then index 2, and so on.
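A minimal sketch with two hypothetical parallel vectors, prices and qty (the values are invented for illustration). The function receives one element from each vector per call:

```r
# Hypothetical data: unit prices and quantities, aligned by position
prices <- c(2.5, 4.0, 1.2)
qty    <- c(4, 1, 10)

# Called as f(prices[1], qty[1]), then f(prices[2], qty[2]), ...
totals <- mapply(function(p, q) p * q, prices, qty)
totals
```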

For the same operation, the vectorised prices * qty is shorter. mapply() shines when the per-element function does something genuinely non-vectorisable.

Map() is mapply() without simplification: it always returns a list, the way lapply() does for one input.
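A small sketch of Map() with two hypothetical parallel inputs (the vectors letters_in and times_in are invented). Each result has a different length, which is the situation where a list is the natural return type:

```r
# Hypothetical data: a value and a repeat count, aligned by position
letters_in <- c("a", "b")
times_in   <- c(2, 3)

# Called as rep("a", 2), then rep("b", 3); results are kept as a list
repeated <- Map(function(x, n) rep(x, n), letters_in, times_in)
str(repeated)
```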

24.12 Anonymous functions

You don’t need to name a function to pass it to an apply call. Two equivalent shorthands:
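A sketch of the two spellings on a hypothetical list (data invented for illustration). The backslash form assumes R 4.1 or later:

```r
# Hypothetical data
x <- list(a = 1:3, b = 4:6)

# Classic anonymous function
long_form <- sapply(x, function(v) sum(v^2))

# Backslash shorthand (R >= 4.1)
short_form <- sapply(x, \(v) sum(v^2))

long_form
```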

The \(x) … form is just sugar for function(x) … and reads cleanly inside an apply call.

24.13 Worked example: exam summary by subject

Five students, three subjects each. Compute per-subject mean and standard deviation, classify each subject as “tight” or “wide” based on the standard deviation, and label every individual score as Pass/Fail.
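One possible sketch of this exercise follows. The scores are made up, and both cut-offs (a standard deviation below 10 counts as “tight”, a score of 60 or more counts as a Pass) are assumptions chosen for illustration, not values from the original text:

```r
# Hypothetical data: five students (rows), three subjects (columns)
exams <- data.frame(
  maths   = c(72, 65, 88, 91, 54),
  physics = c(58, 62, 60, 65, 61),
  art     = c(95, 40, 78, 83, 66)
)

# Collapse each column to a single number
means <- sapply(exams, mean)
sds   <- sapply(exams, sd)

# Derive a label from the vector of standard deviations
# (assumed cut-off: sd below 10 is "tight")
spread <- ifelse(sds < 10, "tight", "wide")

# Per-column transform that returns a vector of the same length
# (assumed pass mark: 60)
pass_fail <- lapply(exams, function(s) ifelse(s >= 60, "Pass", "Fail"))

means
spread
pass_fail$maths
```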

Three different iteration patterns in one example:

  • sapply() to collapse each column to a single number,
  • ifelse() over the resulting vector to derive a label,
  • lapply() to apply a per-column transform that returns a vector of the same length.

That toolbox handles the vast majority of descriptive-analytics work.

24.14 Summary

Summary of concepts introduced in this chapter

| Group | Concept | Description |
|---|---|---|
| Explicit Iteration | vectorised arithmetic | Apply an operation to a whole vector at once; the first tool to try |
| Explicit Iteration | for loop on container | Hand-written loop over a vector or list, used when steps depend on prior ones |
| Apply Family for Lists | lapply() | Walks a list and returns a list of the same length; safe default |
| Apply Family for Lists | sapply() | Like lapply but simplifies to a vector or matrix when shapes agree |
| Apply Family for Lists | vapply() | Type-safe simplify; requires a FUN.VALUE template and errors on mismatch |
| Matrix and Frame Iteration | apply() | Collapse a matrix along rows (MARGIN=1) or columns (MARGIN=2) |
| Matrix and Frame Iteration | rowSums/colMeans family | Fast dedicated helpers for the most common row/column reductions |
| Matrix and Frame Iteration | mapply() and Map() | Walk several inputs in parallel; mapply simplifies, Map always returns a list |
| Function Shorthand and Pattern | anonymous function \(x) | Inline function shorthand, readable inside apply-family calls in R 4.1+ |
| Function Shorthand and Pattern | iterate, transform, collect | The universal loop shape: step through, compute, collect into a new structure |