25  Functions: Purpose, Types, and Creation

Note: What this chapter covers

A function wraps a block of logic behind a name and a clear interface: what goes in (arguments), what comes out (return value). Functions let you name an idea, reuse it, test it, and hide the details. This chapter covers why functions matter, how to write them, the difference between positional, named, and default arguments, variable-length arguments with ..., and the rules R uses to find variables inside a function (scoping). You’ll learn to turn a rough snippet of analysis code into a clean, reusable function.

25.1 Why functions?

Take any script longer than a page and you’ll find the same three or four lines of logic appearing in slightly different form over and over. A function is how you name that logic once and use the name afterwards.

Functions buy you four things:

  1. Names. pct(82, 100) tells the reader what’s happening; round(82/100*100, 1) makes them decode it.
  2. Reuse. One edit updates every caller.
  3. Isolation. Variables inside a function don’t leak out, so you can experiment without polluting the workspace.
  4. Testing. A function with clear inputs and outputs is something you can hand examples to and verify.
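As a minimal sketch of the first point, pct() might be defined as follows. Only the name pct comes from the text; the argument names part, whole, and digits are assumptions made for illustration.

```r
# Hypothetical helper: `part` as a percentage of `whole`, rounded to `digits` places.
pct <- function(part, whole, digits = 1) {
  round(part / whole * 100, digits)
}

pct(82, 100)   # 82
pct(1, 3)      # 33.3
```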

25.2 The anatomy of a function

Every R function has the same skeleton:

name <- function(arg1, arg2, ...) {
  body                     # one or more expressions
  return_value             # the last expression is the result
}

A live example:
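One possible illustration, using a hypothetical rectangle-area helper (the name area_rect and its arguments are assumptions, not part of the original text):

```r
# Hypothetical example: area of a rectangle.
area_rect <- function(width, height) {
  area <- width * height   # body: compute an intermediate value
  area                     # last expression, so this is what the call returns
}

area_rect(3, 4)   # 12
```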

Three things to notice:

  1. function(…) { … } is itself an expression. We assign it to a name with <-.
  2. The braces {} wrap the body. You can omit them for a one-line body, but including them is never wrong.
  3. The last expression is the return value; no return() keyword is required.

25.3 return(): explicit vs implicit

Both forms below work identically:
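A minimal sketch of the two forms, using hypothetical function names chosen only to tell them apart:

```r
# Implicit return: the value of the last expression is returned.
square_implicit <- function(x) {
  x^2
}

# Explicit return: return() hands back the same value.
square_explicit <- function(x) {
  return(x^2)
}

square_implicit(4)   # 16
square_explicit(4)   # 16
```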

R convention is to rely on the implicit return for the normal exit, and use return() only for early returns, jumping out before the end when a short-circuit condition fires.
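An early return might look like the sketch below. The helper safe_mean and its choice to return NA for empty input are assumptions made for illustration.

```r
# Hypothetical helper: mean of x, with an early exit for empty input.
safe_mean <- function(x) {
  if (length(x) == 0) {
    return(NA_real_)      # early return: nothing to average
  }
  sum(x) / length(x)      # normal exit: implicit return
}

safe_mean(numeric(0))   # NA
safe_mean(c(2, 4, 6))   # 4
```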

25.4 Arguments: positional, named, default

Arguments can be passed by position or by name. Named passing is more explicit and safer once a function has more than two or three arguments.
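The sketch below calls one function three ways. The function describe and its arguments are hypothetical, invented to show the matching rules.

```r
# Hypothetical function with three arguments.
describe <- function(name, age, city) {
  paste0(name, " (", age, ") lives in ", city)
}

describe("Ana", 31, "Lima")                       # by position
describe(city = "Lima", name = "Ana", age = 31)   # by name, in any order
describe("Ana", city = "Lima", age = 31)          # mixed: named first, the rest by position
```

All three calls return "Ana (31) lives in Lima". The named form stays correct even if someone later reorders the arguments in the definition.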

Default values let you omit arguments that usually take the same value:
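A minimal sketch, assuming a hypothetical wrapper round_to whose digits argument defaults to 2:

```r
# Hypothetical helper: round to `digits` places, 2 unless told otherwise.
round_to <- function(x, digits = 2) {
  round(x, digits)
}

round_to(3.14159)               # 3.14   (default used)
round_to(3.14159, digits = 4)   # 3.1416 (default overridden)
```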

Defaults can reference earlier arguments, handy for computed defaults:
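A sketch of a computed default, using a hypothetical helper center_on. This works because R evaluates a default only when the argument is first used, inside the function, where x already has a value.

```r
# Hypothetical helper: subtract `center` from x; by default, center is mean(x).
center_on <- function(x, center = mean(x)) {
  x - center
}

center_on(c(1, 2, 3))               # -1  0  1
center_on(c(1, 2, 3), center = 1)   #  0  1  2
```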

Tip: Argument-order discipline

A healthy convention: put the data first, then required parameters, then optional parameters with defaults. This matches R’s own functions (mean(x, na.rm = FALSE)) and plays well with the pipe |>.

25.5 Variable arguments: ...

... lets a function accept an unknown number of extra arguments. It collects them so that they can be passed through to another function unchanged.
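As a sketch of the pass-through pattern, the hypothetical wrapper below forwards whatever extra arguments it receives to mean(). The wrapper's name and its digits argument are assumptions; na.rm is a real argument of mean().

```r
# Hypothetical wrapper: round the mean, forwarding extra arguments to mean().
mean_rounded <- function(x, digits = 1, ...) {
  round(mean(x, ...), digits)
}

mean_rounded(c(1, 2, NA))                 # NA: na.rm was not supplied
mean_rounded(c(1, 2, NA), na.rm = TRUE)   # 1.5: na.rm travelled through ...
```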

Inside the function, ... can be inspected by wrapping it in list(...):
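Two small sketches of that inspection, with hypothetical function names:

```r
# Count how many extra arguments were supplied.
count_args <- function(...) {
  args <- list(...)     # capture the extra arguments as an ordinary list
  length(args)
}

# Report the names the caller used for them.
arg_names <- function(...) {
  names(list(...))
}

count_args(1, "a", TRUE)   # 3
arg_names(a = 1, b = 2)    # "a" "b"
```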

... is how most plotting and summary functions let you pass through graphical or statistical options you didn’t anticipate.

25.6 Returning multiple values: return a list

R functions return exactly one object. To return several things, bundle them in a named list.
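A minimal sketch, assuming a hypothetical summary helper that needs to hand back three results at once:

```r
# Hypothetical helper: several summary values bundled in one named list.
summarise_vec <- function(x) {
  list(
    n     = length(x),
    mean  = mean(x),
    range = range(x)
  )
}

s <- summarise_vec(c(2, 4, 9))
s$n           # 3
s[["mean"]]   # 5
s$range       # 2 9
```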

The caller pulls values out with $ or [[. This idiom is everywhere: many model-fitting functions in R, lm() among them, return a list-based object of this shape.

25.7 Scoping: where does a name come from?

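Inside a function, R looks for a name in a fixed order: first among the function’s local variables, then in the environment where the function was defined (its enclosing environment), then in that environment’s parents in turn. For a function defined at the top level, the enclosing environment is the global workspace, followed by the attached packages.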
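The lookup order can be sketched with a nested function. The names outer_fn and inner_fn are hypothetical; the same variable name x is deliberately defined at two levels.

```r
x <- "global"

outer_fn <- function() {
  x <- "enclosing"
  inner_fn <- function() {
    x    # no local x here, so R checks outer_fn's environment before the global one
  }
  inner_fn()
}

outer_fn()   # "enclosing"
x            # "global": the top-level x is untouched
```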

Variables created inside a function are local; they disappear when the function returns:
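A sketch of both behaviours in one function: a local variable that vanishes after the call, and a value read from the workspace. The function name scale_up and the variable scaled_value are assumptions made for illustration.

```r
multiplier <- 2                      # lives in the global workspace

scale_up <- function(x) {
  scaled_value <- x * multiplier     # scaled_value is local; multiplier is found outside
  scaled_value
}

scale_up(10)             # 20
exists("scaled_value")   # FALSE (assuming the workspace has no such variable)
```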

Warning: Avoid reaching out for inputs

The example above works but is fragile. If multiplier changes, the function’s behaviour changes silently. Prefer passing values in as arguments: functions should read their inputs from arguments, not from the surrounding workspace.
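A sketch of the safer pattern, with the value passed in explicitly. The function name scale_by and the default of 2 are assumptions.

```r
# The multiplier is now part of the interface, so each call states what it depends on.
scale_by <- function(x, multiplier = 2) {
  x * multiplier
}

scale_by(10)                   # 20
scale_by(10, multiplier = 3)   # 30
```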

25.8 Functions are first-class

A function is an ordinary R object. You can store it in a variable, pass it to another function, return it from a function, or put it in a list.

This is the property that makes the apply family (Chapter 24) possible: sapply(x, mean) passes the function mean as a value.
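Each of those uses can be sketched in a few lines. The names summaries, apply_twice, make_adder, and add_five are hypothetical.

```r
# Store functions in a list and call one by name.
summaries <- list(avg = mean, total = sum)
summaries$total(c(1, 2, 3))    # 6

# Pass a function as an argument.
apply_twice <- function(f, x) f(f(x))
apply_twice(sqrt, 16)          # 2

# Return a function from a function.
make_adder <- function(n) {
  function(x) x + n
}
add_five <- make_adder(5)
add_five(10)                   # 15
```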

25.9 Types of functions you’ll meet

Four categories worth naming, even though there’s nothing syntactically different between them:

  1. Built-in functions ship with R: mean(), sum(), paste(), lm().
  2. Package functions become available after library(), or directly through the pkg::fun() form: dplyr::mutate(), stringr::str_detect().
  3. User-defined functions are the kind we’re writing in this chapter.
  4. Anonymous functions are defined on the spot without a name: \(x) x^2 used inside an apply call (the \(x) shorthand requires R 4.1.0 or later).

The rules are identical for all four. The distinction is social, not technical.
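As a short sketch of that point, the same squaring logic is written below as a user-defined function and as two anonymous functions; sapply() treats all three alike. The name square is an assumption.

```r
# User-defined and named.
square <- function(x) x^2
sapply(1:3, square)              # 1 4 9

# Anonymous, long form.
sapply(1:3, function(x) x^2)     # 1 4 9

# Anonymous, backslash shorthand (R 4.1.0 or later).
sapply(1:3, \(x) x^2)            # 1 4 9
```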

25.10 Pure vs side-effect functions

A pure function returns a value and does nothing else: no printing, no plotting, no writing to files. A side-effect function changes the outside world.

Rule of thumb: pure functions are easier to test and combine. Reserve printing, messaging, and file I/O for functions whose job is precisely that.
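A sketch of the split, using hypothetical names: the calculation stays pure, and a separate helper does the printing.

```r
# Pure: the result depends only on the input, and nothing else changes.
to_fahrenheit <- function(celsius) {
  celsius * 9 / 5 + 32
}

# Side effect: prints to the console, then returns its input invisibly.
report_temp <- function(celsius) {
  cat("Temperature:", to_fahrenheit(celsius), "F\n")
  invisible(celsius)
}

to_fahrenheit(100)   # 212
report_temp(100)     # prints: Temperature: 212 F
```

Keeping the arithmetic in to_fahrenheit() means it can be tested with plain comparisons, without capturing console output.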

25.11 Worked example: a reusable grading function

Package the grading logic from Chapter 21 as a proper function: explicit arguments, default rules, a clean return value, and support for a vector input via ifelse.
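One way such a function might look is sketched below. The function name grade, the letter scheme, and the cutoff values are assumptions made for illustration; the actual rules from Chapter 21 may differ.

```r
# Hypothetical grading function. `cutoffs` holds the minimum score for each
# letter; anything below the D cutoff is an "F".
grade <- function(score, cutoffs = c(A = 90, B = 80, C = 70, D = 60)) {
  # ifelse() is vectorised, so this works for one score or many.
  ifelse(score >= cutoffs[["A"]], "A",
  ifelse(score >= cutoffs[["B"]], "B",
  ifelse(score >= cutoffs[["C"]], "C",
  ifelse(score >= cutoffs[["D"]], "D", "F"))))
}

grade(85)                                                # "B"
grade(c(95, 72, 40))                                     # "A" "C" "F"
grade(85, cutoffs = c(A = 85, B = 75, C = 65, D = 55))   # "A"
```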

Look at what the function gained:

  • A default cutoff vector matches the common case; callers override for a custom scheme.
  • The function accepts scalar or vector input: one implementation, two use cases.
  • The behaviour is documented through the parameter names, not a paragraph of comments.

25.12 Summary

Summary of concepts introduced in this chapter

Concept                     Description

Definition and Return
  function() { ... }        The constructor for a function; bind it to a name with <-
  implicit return           The last expression in the body is what the function returns
  return()                  Use for early exits when a guard condition fires

Arguments
  positional vs named args  Pass by position for short signatures, by name for clarity and safety
  default values            Let callers omit common values; defaults may reference earlier args
  ... (dots)                Collect an unknown number of extra arguments to pass through to another call

Return Shapes and Scope
  return a list             Bundle multiple outputs in a named list; the caller extracts with $ or [[
  lexical scoping           R looks up names locally, then in enclosing, parent, and global scopes
  local variables           Variables created inside a function disappear when the call returns

Design Principles
  first-class functions     Functions are ordinary objects, storable in variables, lists, and arguments
  pure vs side-effect       Prefer pure functions for testability; reserve printing and I/O for helpers whose job it is