3 Fundamentals of R and R Studio
3.1 Introduction to R programming
What is R ?
- R (R Core Team, 2024) is a powerful language and environment for statistical computing and graphics.
- R is an open-source programming language, widely used among statisticians, data analysts, and researchers for data manipulation, calculation, and graphical display.
- R is not just a programming language, but also an environment for interactive statistical analysis.
- It was developed by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently maintained by the R Core Team.
- It is a GNU project and is freely available under the GNU General Public License.
- Packages: The R community is known for its active contributions in terms of packages. There are thousands of packages available in the Comprehensive R Archive Network (CRAN), covering various functions and applications.
- Platform Independent: R is available for various platforms such as Windows, macOS, and Unix-like systems.
3.1.1 Installation and Setup
Install R
Download and install R from the Comprehensive R Archive Network (CRAN), choosing the installer for your operating system (Windows, macOS, or Linux).
Install RStudio
RStudio is a recommended integrated development environment (IDE) for R. Download and install RStudio from Posit, choosing the installer for your operating system (Windows, macOS, or Linux).
3.2 Basics of R Studio interface
3.2.1 Overview of RStudio Panels
- RStudio is a widely-used Integrated Development Environment (IDE) for R programming.
- RStudio’s design enhances the efficiency and user-friendliness of coding, testing, and data analysis in R.
- Its panels and features provide a comprehensive environment that caters to the needs of both novice and experienced R programmers.
- It features a user-friendly interface and is divided into several panels, each designed for specific tasks. Here’s a detailed overview of these panels.
Source Panel (Top-Left by Default)
Function
This panel is where you write and edit your R scripts and R Markdown documents.
Features
- Syntax highlighting for R code.
- Code completion and hinting.
- Ability to run code directly from the script.
Console Panel (Bottom-Left by Default)
Function
This is where R code is executed interactively.
Features
- Direct execution of R commands.
- Displays results of script execution.
- Keeps a history of your commands.
Environment/History Panel (Top-Right by Default)
Environment Tab
- Shows the current working dataset and variables in memory.
- Allows for inspection and management of data structures and variables.
History Tab
- Records all commands run in the Console.
- Enables re-running and insertion of previous commands into scripts.
Output/ Files/ Plots/ Packages/ Help/ Viewer Panel (Bottom-Right by Default)
Files Tab
- Manages project files and directories.
- Sets the working directory.
Plots Tab
- Displays graphs and charts.
- Allows for the export of plots.
Packages Tab
- Lists and manages R packages.
- Provides access to package documentation.
Help Tab
- Offers R documentation and help files.
- Useful for learning about R functions and packages.
Viewer Tab
- Displays local web content such as HTML files from R Markdown or Shiny apps.
Additional Features
- Toolbar: Quick access to common tasks like saving, loading, and running scripts.
- Customization: Ability to rearrange the layout of tabs and panes.
- Version Control: Integrated support for Git and SVN.
3.3 Fundamentals of R programming
3.3.1 R Syntax
R is a powerful programming language used extensively for statistical computing and graphics. It provides a wide array of techniques for data analysis, including linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, and more. Its syntax allows users to easily manipulate data, perform calculations, and create graphical displays. Here’s a breakdown of some fundamental aspects of R syntax and an example to illustrate how it works.
Basic Syntax Components
- Variables: In R, you can create variables without declaring their data type. You simply assign values directly with the assignment operator <- or =.
- Comments: Comments start with the # symbol. Everything to the right of the # in a line is ignored by the interpreter.
- Vectors: One of the basic data types in R is the vector, which you create using the c() function. Vectors are sequences of elements of the same type.
- Functions: Functions are defined using the function keyword. They can take inputs (arguments), perform actions, and return a result.
- Conditional Statements: R supports the usual if-else conditional constructs.
- Loops: For iterating over sequences, R provides for, while, and repeat loops.
- Packages: R’s functionality is extended through packages, which are collections of functions, data, and compiled code. You can install packages using the install.packages() function and load them with library().
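The components listed above can be combined in a short script. The following is a minimal sketch; the names threshold, scores, and describe_score, and the cut-off value of 60, are illustrative assumptions rather than part of the original text.

```r
# Variables: assign with <- (no type declaration needed)
threshold <- 60

# Vectors: c() combines values of the same type
scores <- c(45, 72, 88)

# Functions: defined with the function keyword
describe_score <- function(score) {
  # Conditional statement: if-else
  if (score >= threshold) {
    "pass"
  } else {
    "fail"
  }
}

# Loops: a for loop visits each element of the vector in turn
results <- character(0)
for (s in scores) {
  results <- c(results, describe_score(s))
}

print(results)
```

The last expression in a function body is its return value, which is why describe_score needs no explicit return() call.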
3.3.2 R Script
- Rscript is a tool for executing R scripts directly from the command line, making it easier to integrate R into automated processes or workflows.
- It’s part of the R software environment, which is widely used for statistical computing and graphics. Rscript enables you to run R code saved in script files (typically with the .R extension) without opening an interactive R session.
- This is particularly useful for batch processing, automated analyses, or running scripts on servers where a graphical user interface is not available.
Creating an R Script in RStudio
Creating and using R scripts in RStudio is a fundamental skill for anyone working with data in R. RStudio, being a powerful IDE for R, streamlines the process of writing, running, and managing R scripts. Here’s a concise guide based on insights from various sources:
- Start a New Script: To begin, navigate to File -> New File -> R Script. This opens a new script tab in the top-left pane where you can write your code.
- Writing Code: You can type your R code directly into this script pane. Common tasks include importing data, data manipulation, statistical analysis, and plotting. For instance, to create and print a variable, simply type something like result <- 3 followed by print(result) to see the output in the Console pane.
- Running Code: To execute your code, you can click the Run button at the top of the script pane, or use keyboard shortcuts (e.g., Ctrl+Enter on Windows or Cmd+Enter on Mac). The output will appear in the Console pane at the bottom.
Basic R Scripts Examples
Below are a few examples of basic R scripts that demonstrate common tasks in R.
Example 1: Hello World
A simple script that prints “Hello, World!” to the console.
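A script of this kind might look like the following sketch; the variable name greeting is an illustrative choice.

```r
# Store the message in a variable, then print it to the console
greeting <- "Hello, World!"
print(greeting)
```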
Example 2: Basic Arithmetic
This script performs basic arithmetic operations and prints the results.
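One possible version of such a script is sketched below; the input values 10 and 3 and the variable names are assumptions made for illustration.

```r
# Two example numbers (illustrative values)
a <- 10
b <- 3

# Basic arithmetic operations
sum_ab  <- a + b   # addition
diff_ab <- a - b   # subtraction
prod_ab <- a * b   # multiplication
quot_ab <- a / b   # division

# Print each result to the console
print(sum_ab)
print(diff_ab)
print(prod_ab)
print(quot_ab)
```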
3.3.3 Data Types in R
Data types refer to the kind of data that can be stored and manipulated within a program. In R, the basic data types include:
- Numeric: Represents real numbers (e.g., 2, 15.5).
- Integer: Represents whole numbers (e.g., 2L, where L denotes an integer).
- Character: Represents strings (e.g., “hello”, “1234”). Character values must be enclosed in quotation marks.
- Logical: Represents Boolean values (TRUE or FALSE).
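The four basic types can be inspected with the class() function. This is a small sketch; the variable names are illustrative.

```r
num_value <- 15.5      # numeric
int_value <- 2L        # integer (the L suffix marks an integer literal)
chr_value <- "hello"   # character
lgl_value <- TRUE      # logical

# class() reports the type of each value
types <- c(class(num_value), class(int_value),
           class(chr_value), class(lgl_value))
print(types)
```

Note that a number written without the L suffix, such as 2, is stored as numeric, not integer.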
3.3.4 Basic Operators
Assignment Operator
- The assignment operator in R is used to assign values to variables or objects in the R programming language.
- The leftwards assignment operator <-: This is the most commonly used assignment operator in R. It assigns the value on its right to the object on its left. For example, x <- 3 assigns the value 3 to the variable x.
- Alternative Assignment Operator (=): Apart from <-, R also supports the use of the = operator for assignments, similar to many other programming languages.
- However, the use of <- is preferred in R for historical and readability reasons. For example, x = 3 is valid but x <- 3 is more idiomatic to R.
Use <- or = for assigning values, e.g., x <- 10 or x = 10.
Commenting Code for Clarity
Use # for comments, e.g., # This is a comment.
- Comments are not executed and are used to provide relevant information about the code. Whatever is typed after the # symbol on a line is treated as a comment.
Arithmetic operators
- In R, arithmetic operators are used to perform common mathematical operations on numbers, vectors, matrices, and arrays. Here’s an overview of the primary arithmetic operators available in R:
+,-,*,/,^
Division (/) operator - Divides the first number or vector by the second, element-wise.
Exponent (^) operator - Raises the first number or vector to the power of the second, element-wise; for example, x ^ 2 squares x.
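The sketch below applies each arithmetic operator to two small vectors to show the element-wise behaviour; the vectors x and y are illustrative.

```r
x <- c(2, 4, 6)
y <- c(1, 2, 3)

added      <- x + y   # 3 6 9
subtracted <- x - y   # 1 2 3
multiplied <- x * y   # 2 8 18
divided    <- x / y   # 2 2 2
squared    <- x ^ 2   # 4 16 36
cubed_two  <- 2 ^ 3   # 8

print(divided)
print(squared)
```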
3.3.5 Statements
Logical Operations
Includes ==, !=, >, <, >=, <=.
Equality: == checks if two values are equal.
Inequality: != checks if two values are not equal.
Greater than: > checks if the value on the left is greater than the value on the right.
Less than: < checks if the value on the left is less than the value on the right.
Greater than or equal to: >= checks if the value on the left is greater than or equal to the value on the right.
Less than or equal to: <= checks if the value on the left is less than or equal to the value on the right.
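Each of these operators returns a logical value (TRUE or FALSE). A brief sketch, with a, b, and values as illustrative names:

```r
a <- 5
b <- 8

# Each comparison yields a single logical value
comparisons <- c(
  equal         = a == b,   # FALSE
  not_equal     = a != b,   # TRUE
  greater       = a > b,    # FALSE
  less          = a < b,    # TRUE
  greater_equal = a >= 5,   # TRUE
  less_equal    = b <= 5    # FALSE
)
print(comparisons)

# Comparisons also work element-wise on vectors
values <- c(3, 5, 9)
above_four <- values > 4    # FALSE TRUE TRUE
print(above_four)
```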
3.3.6 Data Structures
Vectors
- Vectors are fundamental data structures that hold elements of the same type.
- They are one-dimensional arrays that can store numeric, character, or logical data.
- Assigning data to vectors in R is a basic operation, essential for data manipulation and analysis.
- The c() function combines values into a vector. It’s the most common method for creating vectors.
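A short sketch of creating vectors with c(); the vector names and values are illustrative. The last example shows what happens when types are mixed: R converts every element to a common type.

```r
numbers <- c(10, 20, 30)          # numeric vector
fruits  <- c("apple", "banana")   # character vector
flags   <- c(TRUE, FALSE, TRUE)   # logical vector

# Elements are accessed by position, starting at 1
second_number <- numbers[2]       # 20
print(second_number)

# Mixing types forces coercion to a single type (here: character)
mixed <- c(1, "a", TRUE)
print(class(mixed))               # "character"
```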
Matrix
- A two-dimensional, rectangular collection of elements of the same type.
- All elements must be of the same data type.
- Created using the matrix() function. The nrow argument sets the number of rows, and byrow controls whether the values are filled in row by row (TRUE) or column by column (FALSE, the default).
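The effect of byrow is easiest to see by building the same values both ways. A minimal sketch, with illustrative object names:

```r
# Fill a 2 x 3 matrix row by row: first row is 1 2 3
m_by_row <- matrix(1:6, nrow = 2, byrow = TRUE)
print(m_by_row)

# Fill the same values column by column: first column is 1 2
m_by_col <- matrix(1:6, nrow = 2, byrow = FALSE)
print(m_by_col)

# Elements are indexed as [row, column]
m_by_row[2, 1]   # 4
m_by_col[2, 1]   # 2
```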
Array
- Similar to matrices but can have more than two dimensions.
- Elements within an array must all be of the same data type.
- Created using the array() function. Dimensions are set using the dim argument.
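As a sketch, the following builds a three-dimensional array; the name arr and the chosen dimensions are illustrative.

```r
# 24 values arranged as 2 rows x 3 columns x 4 layers
arr <- array(1:24, dim = c(2, 3, 4))

print(dim(arr))   # 2 3 4

# Index as [row, column, layer]
arr[1, 2, 3]      # 15
```

Values are filled column by column within each layer, and layers are filled one after another.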
3.3.7 Functions
sum() Function
The sum() function calculates the total sum of all the elements in a numeric vector.
length() Function
The length() function returns the number of elements in a vector (or other objects).
sqrt() Function
The sqrt() function calculates the square root of each element in a numeric vector.
mean() Function
The mean() function calculates the arithmetic mean (average) of the elements in a numeric vector.
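The four functions described so far can be tried on a single numeric vector. This is a small sketch; the vector values is an illustrative example.

```r
values <- c(4, 9, 16, 25)

total   <- sum(values)      # 54
n       <- length(values)   # 4
roots   <- sqrt(values)     # 2 3 4 5
average <- mean(values)     # 13.5

print(total)
print(n)
print(roots)
print(average)
```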
summary() function
The summary() function in R provides a concise statistical summary of objects like vectors, matrices, data frames, and results of model fitting.
data.frame() function
- The data.frame() function is used to create data frames, which are table-like structures consisting of rows and columns.
- Data frames are one of the most important data structures in R, especially for statistical modeling and data analysis.
head() function
The head() function in R is used to display the first few rows of a dataset, making it a useful tool for quickly inspecting large data frames or matrices.
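The sketch below creates a small data frame and inspects it with head() and summary(). The data frame crops and all of its values are made up purely for illustration.

```r
# A made-up data frame with one character and one numeric column
crops <- data.frame(
  crop  = c("wheat", "rice", "maize", "barley", "oats", "sorghum", "millet"),
  yield = c(3.2, 4.1, 5.0, 2.8, 2.5, 3.0, 1.9)
)

# head() shows the first six rows by default
first_rows <- head(crops)
print(first_rows)

# The second argument changes how many rows are shown
top_three <- head(crops, 3)
print(top_three)

# summary() gives the minimum, quartiles, mean, and maximum of a numeric column
yield_summary <- summary(crops$yield)
print(yield_summary)
```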
View() function
- The View() function is used to invoke a spreadsheet-like data viewer on a data frame, matrix, or other objects that can be coerced into a data frame.
- This function is particularly useful during interactive sessions to inspect data visually.
3.3.8 Loops
Use for, while.
for loop
The for loop in R is used to iterate over a sequence (like a vector or a list) and execute a block of code for each element in the sequence.
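A minimal sketch of a for loop that squares the numbers 1 to 5; the names i and squares are illustrative.

```r
squares <- numeric(0)   # start with an empty numeric vector

for (i in 1:5) {
  # Append the square of the current element
  squares <- c(squares, i^2)
}

print(squares)   # 1 4 9 16 25
```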
while loop
The while loop executes a block of code as long as the specified condition is TRUE.
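As a sketch, this while loop adds up the numbers 1 to 5; count and total are illustrative names. The counter must change inside the loop, otherwise the condition never becomes FALSE and the loop runs forever.

```r
count <- 1
total <- 0

while (count <= 5) {
  total <- total + count   # accumulate the running sum
  count <- count + 1       # move towards the stopping condition
}

print(total)   # 15
```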
3.4 R Markdown
3.4.1 Introduction to R Markdown
- R Markdown is a powerful tool for integrating data analysis with documentation, allowing you to create dynamic reports and presentations.
- It combines the core syntax of Markdown (a simple markup language for formatting text) with embedded R code chunks.
- R Markdown documents are fully reproducible and support a wide range of output formats like HTML, PDF, and Word documents.
Key Features of R Markdown
- Reproducible Research: Allows you to integrate your R code with your report, ensuring that your analysis can be easily reproduced.
- Multiple Output Formats: You can convert a single R Markdown file into a variety of formats, including HTML, PDF, and Word.
- Dynamic Content: Each time the document is knit, the R code is re-run, so the results in the output always reflect the current code and data.
- Integration with RStudio: R Markdown is tightly integrated with RStudio, making it easy to write, preview, and compile your document.
3.4.2 Creating an R Markdown File
In RStudio, you can create a new R Markdown file via the menu: File -> New File -> R Markdown....
This opens a dialog where you can choose the output format and other options.
Enter a title, author, and date, select HTML as the output format, and click OK.
The new R Markdown file opens in a new tab in the Source panel.
Default R Markdown Template in RStudio
When you create a new R Markdown file in RStudio, a default template is generated with the following components:
YAML Metadata (Document Header)
At the top of the document, you will see a YAML metadata block, enclosed within triple dashes ---. This section specifies document settings such as the title, author, date, and output format.
---
title: "Untitled"
author: "vijay"
date: "2026-05-05"
output: html_document
---
Setup Code Chunk
Immediately following the YAML header, a setup chunk is included:
{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
- Purpose: This chunk sets options for how R code should be displayed and executed in the document.
- knitr::opts_chunk$set(echo = TRUE): Ensures that R code is displayed along with its output.
- include=FALSE: Hides this setup chunk from appearing in the final document.
Default Heading: “R Markdown”
# R Markdown
This is a section heading that introduces the user to R Markdown. It is placed there to provide a structured template for writing content.
Insert the following code inside the r setup chunk.
options(repos = c(CRAN = "https://cran.rstudio.com/"))
knitr::opts_chunk$set(message = FALSE)
Including this code at the beginning of an R Markdown file ensures that R uses a specific CRAN repository (https://cran.rstudio.com/) for package installation and suppresses messages in code chunks, making the document cleaner and more readable.
The setup chunk should then contain these two lines together with the original knitr::opts_chunk$set(echo = TRUE) call.
- Save the file with a file name
Compiling the Document: knit
After saving the file, compile the document into your desired output format by clicking the Knit button in RStudio. This will execute all R code within the document and render it to the specified output format.
3.4.3 Markdown Syntax
R Markdown supports standard Markdown syntax. Here are some basic examples:
- Headers: Use # for headers, e.g., # Header 1, ## Header 2.
- Bold and Italic: Use **bold** for bold text and *italic* for italic text.
- Lists: Use - or * for unordered lists and numbers for ordered lists.
- Links: Use [link text](URL) to create hyperlinks.
- Images: Use ![alt text](image path) to insert images.
Embedding R Code
You can embed R code within your document by using the following syntax:
The three backticks (```) mark the beginning of a code chunk, while {r} specifies that the chunk contains R code. To properly close the code chunk, another set of three backticks (```) must be included at the end.
The code inside this chunk is executed, and its results are included in the document below the chunk.
Compiling the Document: knit
To compile the document into your desired output format, click the Knit button in RStudio. This will execute all R code within the document and render it to the specified output format.
Dynamic Report Generation
- Knitr allows for the automatic generation of reports. Code chunks embedded within an R Markdown document are executed during the compilation of the document, ensuring that the results (including figures and tables) are directly integrated into the final output.
Multiple Output Formats
- With Knitr and R Markdown, you can create a wide range of output formats including HTML, PDF, and Word documents. This flexibility makes it easy to produce reports tailored to different audiences and purposes.
Reproducibility
- By combining explanations, source code, and results, documents created with Knitr are not just reports, but also reproducible records of your analysis. This is crucial in scientific research and data analysis where reproducibility is a key concern.
Ease of Use
- Knitr uses simple syntax to embed R code in Markdown documents. Code chunks are clearly marked and can be configured with various options to control their behavior and appearance in the output document.
3.4.4 Example: Data Analysis
Here’s a simple example of an R Markdown document that performs a basic data analysis:
Summary Statistics
Let’s calculate summary statistics for the pressure dataset:
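The code chunk for this step would contain something like the following sketch. pressure is a dataset that ships with R (in the datasets package), with the columns temperature and pressure; the name pressure_summary is illustrative.

```r
# Summary statistics (min, quartiles, mean, max) for each column
pressure_summary <- summary(pressure)
print(pressure_summary)
```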
Including Plots
You can also embed plots. For example, here’s a plot of pressure vs temperature:
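A plotting chunk for this step might look like the sketch below, using base R graphics. The axis labels and title are assumptions based on the documentation of the built-in pressure dataset.

```r
# Scatter plot of pressure against temperature
plot(pressure$temperature, pressure$pressure,
     xlab = "Temperature (deg C)",
     ylab = "Pressure (mm Hg)",
     main = "Vapor pressure of mercury")
```

When the document is knit, the resulting figure is placed directly below the chunk.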
3.4.5 Chunk options
| Option | Run code | Show code | Output | Plots | Messages | Warnings |
|---|---|---|---|---|---|---|
| eval = FALSE | | ✓ | | | | |
| include = FALSE | ✓ | | | | | |
| echo = FALSE | ✓ | | ✓ | ✓ | ✓ | ✓ |
| results = "hide" | ✓ | ✓ | | ✓ | ✓ | ✓ |
| fig.show = "hide" | ✓ | ✓ | ✓ | | ✓ | ✓ |
| message = FALSE | ✓ | ✓ | ✓ | ✓ | | ✓ |
| warning = FALSE | ✓ | ✓ | ✓ | ✓ | ✓ | |
The most important set of options controls if your code block is executed and what results are inserted in the finished report:
eval = FALSE prevents code from being evaluated. (And obviously if the code is not run, no results will be generated). This is useful for displaying example code, or for disabling a large block of code without commenting each line.
include = FALSE runs the code, but doesn’t show the code or results in the final document. Use this for setup code that you don’t want cluttering your report.
echo = FALSE prevents code, but not the results, from appearing in the finished file. Use this when writing reports aimed at people who don’t want to see the underlying R code.
message = FALSE or warning = FALSE prevents messages or warnings from appearing in the finished file.
error = TRUE causes the render to continue even if code returns an error.
3.4.6 PDF Reports: Install tinytex package
- To create PDF documents from R Markdown, you will need to have a LaTeX distribution installed. Although there are several traditional options, I recommend that R Markdown users install TinyTeX.
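Installing TinyTeX takes two steps: installing the tinytex R package, then using it to install the TinyTeX LaTeX distribution. The sketch below wraps both steps in a helper so that nothing is downloaded until you call it; the function name setup_pdf_support is hypothetical and chosen only for this example.

```r
# Hypothetical helper: run once, in an interactive session with internet access
setup_pdf_support <- function() {
  install.packages("tinytex")   # the R package
  tinytex::install_tinytex()    # the TinyTeX LaTeX distribution itself
}

# To perform the installation, call:
# setup_pdf_support()
```

After the installation finishes, choosing PDF as the output format and clicking Knit should produce a PDF document.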
To learn more about R Markdown, Check this ebook
3.4.7 Sample RMarkdown file
3.5 Directories and Projects in R
3.5.1 Introduction
A directory in R is a folder in the computer where files are stored and accessed. R allows users to interact with directories to read data files, save outputs, and manage projects efficiently. Understanding how to check and change the working directory is crucial when dealing with file operations in R.
What is a Working Directory?
The working directory is the default location where R reads and writes files. When working with data files such as CSV, Excel, or text files, R looks for these files in the current working directory unless a full file path is specified.
3.5.2 Setting a New Working Directory:
The setwd() function allows users to change the working directory. This is useful when dealing with files located in different folders.
Syntax:
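The general form is setwd("path/to/folder"). The sketch below uses R's temporary directory as the target so that it runs on any machine; in practice you would pass the path to your own project folder. The names old_dir, target_dir, and in_target are illustrative.

```r
# Remember the current directory so it can be restored afterwards
old_dir <- getwd()

# tempdir() stands in here for a real project folder
target_dir <- tempdir()
setwd(target_dir)

# Confirm that the working directory has changed
in_target <- normalizePath(getwd()) == normalizePath(target_dir)
print(in_target)

# Switch back to the original directory
setwd(old_dir)
```

On Windows, write paths with forward slashes (/) or doubled backslashes (\\), because a single backslash is an escape character in R strings.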
Checking the Current Working Directory:
The getwd() function in R is used to check the current working directory. This helps users confirm where R is looking for files and where outputs will be saved.
Syntax:
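getwd() takes no arguments and returns the path as a character string. A minimal sketch, with current_dir as an illustrative name:

```r
# Store and display the current working directory
current_dir <- getwd()
print(current_dir)
```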
Listing Files in the Current Directory
To check the available files in the current working directory, use list.files():
Code
# List all files in the current working directory
list.files()
This is helpful when verifying whether the required files exist before attempting to read them.
Best Practices for Using Directories in R
3.6 Overview of Key R Packages
R provides a vast ecosystem of packages designed for data analysis, visualization, and statistical modeling. In this section, we will explore three essential packages:
- stats: Statistical analysis and hypothesis testing
- plotly: Interactive visualizations
- tidyverse: A collection of packages for data manipulation and visualization
Each of these packages serves a crucial role in handling data efficiently and performing complex analyses in R.
3.6.1 stats: Statistical Analysis and Hypothesis Testing in R
The stats package comes pre-installed with R and provides essential statistical functions for data analysis.
Key Features:
- Performs descriptive statistics (mean, median, standard deviation).
- Supports hypothesis testing (t-tests, ANOVA, chi-square).
- Includes regression and time-series analysis.
Basic Usage:
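The following sketch touches each of the features listed above using mtcars, a dataset that ships with R. The choice of dataset and variables is an assumption made for illustration; stats is loaded automatically, so no library() call is needed.

```r
# Descriptive statistics for fuel efficiency (miles per gallon)
mpg_mean   <- mean(mtcars$mpg)
mpg_median <- median(mtcars$mpg)
mpg_sd     <- sd(mtcars$mpg)
print(c(mean = mpg_mean, median = mpg_median, sd = mpg_sd))

# Hypothesis test: does mpg differ between automatic (am = 0)
# and manual (am = 1) cars? Welch two-sample t-test.
test_result <- t.test(mpg ~ am, data = mtcars)
print(test_result)

# Simple linear regression of mpg on car weight
model <- lm(mpg ~ wt, data = mtcars)
print(coef(model))
```

The p-value of the test is available as test_result$p.value, and summary(model) gives a fuller regression report.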
3.6.2 plotly: Creating Interactive Visualizations in R
The plotly package enables the creation of interactive and dynamic charts in R, making it useful for exploring data visually.
Key Features:
- Supports interactive plots such as scatter plots, bar charts, and 3D plots.
- Allows zooming, panning, and tooltips.
- Integrates seamlessly with ggplot2.
Basic Usage:
Code
# Install and load the package
install.packages("plotly")
library(plotly)
Example Use Case:
An agribusiness analyst visualizes seasonal trends in crop yields using plotly, making it easier to identify patterns and variations.
3.6.3 Tidyverse: A Unified Collection of R Packages for Data Science
The tidyverse is a collection of R packages designed for data science. It provides a structured approach to importing, manipulating, visualizing, and modeling data.
Installing and Loading Tidyverse
Code
# Install tidyverse
install.packages("tidyverse")
# Load all core tidyverse packages
library(tidyverse)
This loads several useful packages such as ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, and forcats.
3.6.4 Key Functionalities of Tidyverse
1. Data Manipulation with dplyr
2. Data Tidying with tidyr
3. Data Visualization with ggplot2
4. String Manipulation with stringr
5. Handling Factors with forcats
Summary
| Concept | Description |
|---|---|
| Introduction to R programming | |
| Installation & Setup | Download and install R from CRAN and RStudio from Posit |
| Basics of R Studio interface | |
| Overview of RStudio Panels | RStudio is a widely-used Integrated Development Environment (IDE) for R programming |
| Fundamentals of R programming | |
| R Syntax | R is a powerful programming language used extensively for statistical computing and graphics |
| R Script | Rscript is a tool for executing R scripts directly from the command line, making it easier to integrate R into automated processes or workflows |
| Data Types in R | Data types refer to the kind of data that can be stored and manipulated within a program |
| Basic Operators | Assignment (<-, =), comments (#), and arithmetic operators (+, -, *, /, ^) |
| Statements | Logical operations: ==, !=, >, <, >=, <= |
| Data Structures | Vectors, matrices, and arrays |
| Functions | Includes built-in functions like sum(), length(), sqrt(), mean(), summary(), View() |
| Loops | Use for, while |
| R Markdown | |
| R Markdown | R Markdown is a powerful tool for integrating data analysis with documentation, allowing you to create dynamic reports and presentations |
| R Markdown File | In RStudio, you can create a new R Markdown file via the menu: File -> New File -> R Markdown |
| Markdown Syntax | R Markdown supports standard Markdown syntax |
| Ex: Analysis | Here's a simple example of an R Markdown document that performs a basic data analysis |
| Chunk options | The most important set of options controls if your code block is executed and what results are inserted in the finished report |
| PDF Reports | To create PDF documents from R Markdown, you will need to have a LaTeX distribution installed |
| Sample | Sample RMarkdown file |
| Directories and Projects in R | |
| Introduction | A directory in R is a folder in the computer where files are stored and accessed |
| Working Directory | The setwd() function allows users to change the working directory |
| Overview of Key R Packages | |
| stats in R | The stats package comes pre-installed with R and provides essential statistical functions for data analysis |
| plotly | The plotly package enables the creation of interactive and dynamic charts in R, making it useful for exploring data visually |
| Tidyverse | The tidyverse is a collection of R packages designed for data science |
| Tidyverse: Key Functionalities | dplyr, tidyr, ggplot2, stringr, and forcats |




