
R's data structures are powerful tools for organizing and manipulating information. Subsetting and indexing are key techniques for extracting specific parts of these structures. They let you pull out exactly what you need, whether it's a single value or a whole chunk of data.

These skills are crucial for data analysis and manipulation in R. By mastering subsetting and indexing, you'll be able to efficiently work with complex datasets, filter information, and perform targeted operations on your data structures.

Subsetting in R

Subsetting Basics

  • Subsetting extracts specific elements or subsets of data from a larger data structure based on certain conditions or criteria
  • Vectors can be subsetted using square brackets [] with logical, integer, or character vectors as indices
    • vec[c(TRUE, FALSE)] keeps the elements whose index value is TRUE; a logical index shorter than vec is recycled, so this keeps every other element
    • vec[c(1, 3)] selects the first and third elements
  • Matrices can be subsetted using square brackets [] with row and column indices separated by a comma
    • mat[1:2, 2:3] selects a submatrix consisting of the first two rows and the second and third columns
  • Lists can be subsetted using single square brackets [] to select elements by position (the result is still a list) or double square brackets [[]] to extract an individual element
    • list[1:2] selects the first two elements of the list
    • list[[1]] extracts the first element of the list
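
The bullets above can be tried out with a few small objects. This is a minimal sketch; the objects vec, mat, and lst and their values are made up for illustration (lst is used instead of list to avoid masking R's built-in list() function).

```r
# Hypothetical example objects; names and values are illustrative only.
vec <- c(10, 20, 30, 40)
mat <- matrix(1:12, nrow = 3)            # 3 rows, 4 columns, filled column by column
lst <- list(a = 1:3, b = "text", c = TRUE)

vec[c(TRUE, FALSE)]   # logical index is recycled: keeps elements 1 and 3
vec[c(1, 3)]          # the same elements, selected by position
mat[1:2, 2:3]         # rows 1-2 of columns 2-3
lst[1:2]              # still a list, holding the first two elements
lst[[1]]              # the first element itself (an integer vector)
```

Note the difference in the last two lines: single brackets keep the list wrapper, while double brackets return the stored object.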

Subsetting Data Frames

  • Data frames can be subsetted using square brackets [] with row and column indices separated by a comma, similar to matrices
    • df[1:3, c("col1", "col2")] selects the first three rows and the columns "col1" and "col2"
  • Column names can also be used for subsetting data frames
    • df[, "col1"] selects the column named "col1"
    • df$col1 is an alternative syntax for selecting a single column
  • Logical subsetting can be applied to data frames to filter rows based on conditions
    • df[df$col1 > 10, ] subsets rows where the value in "col1" is greater than 10
    • df[df$col1 > 10 & df$col2 == "A", ] subsets rows that satisfy both conditions
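
As a rough illustration of the data frame forms above, the sketch here builds a small data frame whose column names match the bullets; the values are invented for the example.

```r
# Hypothetical data frame; the values are illustrative only.
df <- data.frame(
  col1 = c(5, 12, 18, 7, 25),
  col2 = c("A", "B", "A", "A", "B"),
  col3 = c(1.5, 2.5, 3.5, 4.5, 5.5)
)

df[1:3, c("col1", "col2")]            # first three rows, two named columns
df[, "col1"]                          # one column, returned as a plain vector
df$col1                               # the same column via $
df[df$col1 > 10, ]                    # rows where col1 exceeds 10
df[df$col1 > 10 & df$col2 == "A", ]   # rows where both conditions hold
```

With these values, the last expression keeps only the row where col1 is 18, since it is the only row with col1 above 10 and col2 equal to "A".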

Subsetting Techniques

Logical Subsetting

  • Logical vectors can be used for subsetting by providing a vector of TRUE/FALSE values, where TRUE indicates that the corresponding element should be included in the subset
  • Logical subsetting is often combined with comparison operators and functions to create a logical vector based on certain conditions
    • vec[vec > 5] subsets elements of vec that are greater than 5
    • mat[mat[,1] > 0, ] subsets rows of mat where the first column is greater than 0
  • Logical subsetting can be used with any data structure that supports subsetting, including vectors, matrices, lists, and data frames
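
It can help to look at the intermediate logical vector before using it as an index. The sketch below does that for a vector, a matrix, and a list; all object names and values are assumptions chosen for the example.

```r
# Hypothetical objects; values are illustrative only.
vec <- c(2, 8, 4, 9, 6)
keep <- vec > 5          # FALSE TRUE FALSE TRUE TRUE
vec[keep]                # 8 9 6

mat <- matrix(c(-1, 2, 3, 10, 20, 30), nrow = 3)   # columns: (-1, 2, 3) and (10, 20, 30)
mat[mat[, 1] > 0, ]      # keeps rows 2 and 3

lst <- list(a = 1, b = "x", c = 3)
is_num <- sapply(lst, is.numeric)   # one TRUE/FALSE per list element
lst[is_num]                         # list containing elements a and c
```

Storing the condition in a named variable such as keep is optional, but it makes the filter easier to inspect and reuse.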

Integer Subsetting

  • Integer vectors can be used for subsetting by providing a vector of positive or negative integers
    • Positive integers specify the positions of elements to include
    • Negative integers specify the positions to exclude
  • Integer subsetting allows for selecting specific positions or ranges of elements
    • vec[c(1, 3, 5)] selects the 1st, 3rd, and 5th elements of vec
    • mat[1:3, 2:4] selects a submatrix consisting of the first three rows and the second to fourth columns
  • Integer subsetting can be used with vectors, matrices, lists, and data frames
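
The following sketch illustrates positive and negative integer indices on made-up data. It also shows that R raises an error when positive and negative indices are mixed in one index vector; the tryCatch wrapper is only there so the example runs to completion.

```r
# Hypothetical objects; values are illustrative only.
vec <- c(10, 20, 30, 40, 50)
vec[c(1, 3, 5)]       # 10 30 50
vec[-c(1, 2)]         # drops the first two elements: 30 40 50
vec[-length(vec)]     # everything except the last element

mat <- matrix(1:20, nrow = 4)   # 4 rows, 5 columns
sub <- mat[1:3, 2:4]            # 3 x 3 submatrix
dim(sub)

# Mixing positive and negative indices is an error in R.
mixed <- tryCatch(vec[c(-1, 2)], error = function(e) "error: cannot mix signs")
mixed
```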

Character Subsetting

  • Character vectors can be used for subsetting named data structures like lists or data frames, where the character vector specifies the names of elements to include
  • Character subsetting is useful for extracting elements by their names
    • list[c("a", "b")] selects the elements named "a" and "b" from list
    • df[, c("col1", "col3")] selects the columns named "col1" and "col3" from the data frame df
  • Character subsetting can be combined with integer or logical subsetting to select specific elements based on both names and conditions
    • df[df$col1 > 10, c("col2", "col3")] selects columns "col2" and "col3" from rows where "col1" is greater than 10
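
A short sketch of name-based selection, assuming a named list lst and a data frame df with invented values:

```r
# Hypothetical named list and data frame; values are illustrative only.
lst <- list(a = 1, b = "two", c = 3)
lst[c("a", "b")]     # list with elements a and b
lst[["b"]]           # the single element stored under "b"

df <- data.frame(
  col1 = c(5, 15, 20),
  col2 = c("x", "y", "z"),
  col3 = c(TRUE, FALSE, TRUE)
)
df[, c("col1", "col3")]                       # two columns by name
sub <- df[df$col1 > 10, c("col2", "col3")]    # logical rows, named columns
sub
```

Selecting by name tends to be more robust than selecting by position, because the code keeps working if the column order changes.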

Subsetting vs Indexing

Differences between Subsetting and Indexing

  • Subsetting focuses on extracting a subset of elements from a data structure based on certain conditions or criteria, while indexing is used to access individual elements by their position or identifier
  • Subsetting typically returns a new data structure containing the selected elements, while indexing returns the value of a specific element
  • Subsetting can be performed using logical, integer, or character vectors, while indexing primarily uses integer or character vectors
  • Subsetting allows for selecting multiple elements or subsets of data, while indexing is used to access a single element at a specific position
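
One way to see this distinction in code, assuming it maps roughly onto R's single-bracket and double-bracket operators, is to compare what each returns for the same name. The objects here are hypothetical.

```r
# Hypothetical objects; values are illustrative only.
lst <- list(a = 1:3, b = "text")
sub <- lst["a"]      # single brackets: a list of length 1
elt <- lst[["a"]]    # double brackets: the integer vector itself
class(sub)           # "list"
class(elt)           # "integer"

vec <- c(x = 1, y = 2)
vec["x"]             # keeps the name
vec[["x"]]           # returns the bare value, without the name
```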

Use Cases for Subsetting and Indexing

  • Subsetting is commonly used for:
    • Filtering data based on conditions
    • Extracting specific subsets of data for analysis or further processing
    • Creating new data structures containing only the relevant elements
  • Indexing is commonly used for:
    • Accessing individual elements of a data structure
    • Retrieving specific values or observations
    • Modifying or updating specific elements within a data structure

Combining Subsetting and Indexing

Chained Subsetting

  • Chained subsetting involves applying multiple subsetting operations sequentially to drill down into nested data structures
    • list[[1]][2] extracts the first element of list and then selects the second element of that result
    • df[df$col1 > 10, ][1:5, c("col2", "col3")] subsets rows of df where "col1" is greater than 10, then selects the first five of those rows and the columns "col2" and "col3"
  • Chained subsetting allows for complex data extraction and manipulation by combining different subsetting techniques
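
The chained forms above can be sketched as follows, using invented data. The example also shows one caveat: asking for row positions that do not exist in a filtered data frame yields rows filled with NA, so head() is shown as one possible alternative when the number of matches is unknown.

```r
# Hypothetical objects; values are illustrative only.
lst <- list(c(10, 20, 30), "other")
lst[[1]][2]                       # 20

df <- data.frame(
  col1 = c(12, 3, 25, 18, 40, 11, 9, 30),
  col2 = letters[1:8],
  col3 = 1:8
)
res <- df[df$col1 > 10, ][1:5, c("col2", "col3")]   # first five matching rows
res

# Only one row has col1 > 35, so position 2 does not exist in the filtered result.
few <- df[df$col1 > 35, ]
few[1:2, ]       # second row is all NA
head(few, 2)     # returns just the one available row
```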

Modifying Subsetted Data

  • Subsetting can be used to create new data structures or modify existing ones by assigning values to subsetted elements
    • vec[vec > 5] <- 0 replaces elements of vec greater than 5 with 0
    • mat[1:2, 1:2] <- matrix(1:4, nrow = 2) assigns a new 2 x 2 matrix to the top-left block of mat
  • Modifying subsetted data allows for targeted updates and transformations within specific portions of a data structure
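
A minimal sketch of assignment through a subset, with made-up starting values so the effect is easy to check. The data frame line is an extra illustration of the same pattern, not something taken from the bullets above.

```r
# Hypothetical objects; values are illustrative only.
vec <- c(3, 8, 5, 12)
vec[vec > 5] <- 0                         # vec is now 3 0 5 0

mat <- matrix(0, nrow = 3, ncol = 3)
mat[1:2, 1:2] <- matrix(1:4, nrow = 2)    # overwrite the top-left 2 x 2 block
mat

df <- data.frame(col1 = c(5, 15, 20))
df$col1[df$col1 > 10] <- 10               # cap col1 at 10
df
```

Only the selected positions change; the third row and third column of mat stay at 0.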

Combining Subsetting with Indexing

  • Indexing can be used within subsetted data structures to access specific elements
    • mat[mat[,1] > 0, ][2, 3] subsets rows of mat where the first column is greater than 0, then selects the element at the second row and third column of the resulting submatrix
  • Combining subsetting with indexing enables precise data selection and extraction based on multiple criteria and positions
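
The expression above can be sketched on a small invented matrix. One caveat worth knowing: when a filter matches exactly one row, R drops the result to a plain vector by default, and two-index access then fails. Passing drop = FALSE keeps the matrix shape.

```r
# Hypothetical matrix; values are illustrative only.
mat <- matrix(c(1, -2, 3, 4,
                10, 20, 30, 40,
                100, 200, 300, 400), nrow = 4)   # 4 rows, 3 columns

pos <- mat[mat[, 1] > 0, ]     # keeps rows 1, 3, and 4
pos[2, 3]                      # 300
mat[mat[, 1] > 0, ][2, 3]      # same value, written as one chain

# Only row 4 has a first-column value above 3.
one <- mat[mat[, 1] > 3, , drop = FALSE]   # still a 1 x 3 matrix
one[1, 3]                                  # 400
```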
© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

