
6.3 Logical indexing and filtering

August 9, 2024

Logical indexing and filtering are core tools for manipulating data in R. They let you slice and dice your datasets, pulling out exactly the elements, rows, or columns that meet a condition. You express that condition as a logical (TRUE/FALSE) vector, and R keeps the positions where it is TRUE.

These skills are crucial for data analysis and cleaning. By mastering logical operators and filtering methods, you'll be able to efficiently subset large datasets, handle missing values, and prepare your data for further analysis or visualization.

Logical Vectors and Operators

Understanding Logical Vectors and Boolean Operations

  • Logical vectors contain TRUE, FALSE, or NA (missing) values
  • Boolean operators manipulate logical vectors
    • NOT (!) reverses logical values
    • AND (&) returns TRUE if both operands are TRUE
    • OR (|) returns TRUE if at least one operand is TRUE
  • Comparison operators create logical vectors
    • Equal to (==)
    • Not equal to (!=)
    • Greater than (>)
    • Less than (<)
    • Greater than or equal to (>=)
    • Less than or equal to (<=)
  • Vectorized operations apply element-wise to vectors
    • c(1, 2, 3) > 2
      results in
      c(FALSE, FALSE, TRUE)
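Here is a minimal sketch of the operators above. It uses a small made-up vector, and the variable names are only for illustration. The last lines show that NA propagates through a comparison:

```r
# Comparison operators return one logical value per input element
x <- c(1, 2, 3)
gt2 <- x > 2        # FALSE FALSE TRUE
eq2 <- x == 2       # FALSE TRUE FALSE
ne2 <- x != 2       # TRUE FALSE TRUE

# Boolean operators also work element-wise on logical vectors
not_gt2 <- !gt2          # TRUE TRUE FALSE
both    <- gt2 & ne2     # FALSE FALSE TRUE
either  <- gt2 | eq2     # FALSE TRUE TRUE

# A missing value stays missing after a comparison
y <- c(1, NA, 3)
y_gt2 <- y > 2           # FALSE NA TRUE
```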

Advanced Logical Operations

  • Combine multiple conditions using AND (&) and OR (|) operators
    • (x > 0) & (x < 10)
      checks if x is strictly between 0 and 10
    • (y == "A") | (y == "B")
      checks if y is either "A" or "B"
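  • The scalar operators && and || use short-circuit evaluation; the vectorized & and | do not
    • && stops evaluating if the first condition is FALSE
    • || stops evaluating if the first condition is TRUE
    • && and || expect single TRUE/FALSE values (as in if() conditions), while & and | always evaluate both sides element-wise
  • Use parentheses to control order of operations
    • (a > b) & (c < d) | (e == f)
      & binds more tightly than |, so this is grouped as ((a > b) & (c < d)) | (e == f)
    • (a > b) & ((c < d) | (e == f))
      the extra parentheses make the | part evaluate first, which can change the result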
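The sketch below uses made-up vectors. It contrasts the vectorized & with the short-circuiting && and shows how grouping can change a result. The logical scalars cond1, cond2, and cond3 are hypothetical stand-ins for the comparisons (a > b), (c < d), and (e == f):

```r
# Combining conditions element-wise
x <- c(-5, 3, 12)
in_range <- (x > 0) & (x < 10)          # FALSE TRUE FALSE

y <- c("A", "C", "B")
is_a_or_b <- (y == "A") | (y == "B")    # TRUE FALSE TRUE

# && short-circuits: the right side is never evaluated, so stop() is not called
short <- FALSE && stop("never evaluated")   # FALSE

# & evaluates both sides, so here stop() does run; tryCatch() captures the error
vec_err <- tryCatch(FALSE & stop("evaluated"), error = function(e) "error")

# Precedence: & binds tighter than |
cond1 <- FALSE   # stands in for (a > b)
cond2 <- TRUE    # stands in for (c < d)
cond3 <- TRUE    # stands in for (e == f)
default_grouping  <- cond1 & cond2 | cond3     # (FALSE & TRUE) | TRUE  -> TRUE
explicit_grouping <- cond1 & (cond2 | cond3)   # FALSE & (TRUE | TRUE) -> FALSE
```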

Subsetting and Filtering

Basic Subsetting Techniques

  • Subset operator [] extracts elements from vectors, matrices, or data frames
    • x[3]
      selects the third element of x
    • df[2, 3]
      selects the element in the second row and third column of df
  • which() function returns the indices of TRUE values in a logical vector
    • which(x > 5)
      returns positions where x is greater than 5
  • subset() function selects rows of a data frame based on logical conditions
    • subset(df, age > 18)
      selects rows where age is greater than 18
  • Conditional indexing combines logical vectors with the subset operator
    • x[x > 0]
      selects all positive values in vector x
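The following is a small sketch of these subsetting tools applied to a toy vector and data frame. The column names name, age, and score are assumptions made for the example:

```r
x <- c(4, -2, 7, 0, 9)
third     <- x[3]           # 7
pos_idx   <- which(x > 5)   # 3 5 (positions, not values)
positives <- x[x > 0]       # 4 7 9 (logical vector used as the index)

df <- data.frame(
  name  = c("Ana", "Ben", "Cal"),
  age   = c(17, 25, 34),
  score = c(88, 92, 75),
  stringsAsFactors = FALSE
)
cell   <- df[2, 3]                # 92: second row, third column
adults <- subset(df, age > 18)    # rows for Ben and Cal
```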

Advanced Filtering Techniques

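  • filter() function from the dplyr package provides intuitive data frame filtering
    • filter(df, age > 18, gender == "F")
      selects females over 18; comma-separated conditions are combined with AND, and rows where a condition evaluates to NA are dropped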
  • Combine multiple conditions for complex filtering
    • df[df$age > 18 & df$income > 50000, ]
      selects rows meeting both conditions
  • Use %in% operator for membership tests
    • df[df$category %in% c("A", "B", "C"), ]
      selects rows with specified categories
  • Apply functions within subsetting for dynamic filtering
    • df[grepl("^A", df$name), ]
      selects rows where name starts with "A"
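Here is a base-R sketch of the filters above, applied to a made-up data frame. All column names and values are hypothetical. dplyr's filter() combines comma-separated conditions with AND, so the last line should select the same rows here as filter(df, age > 18, gender == "F"). Base R is used so the example runs without installing packages:

```r
df <- data.frame(
  name     = c("Alice", "Bob", "Amir", "Cara"),
  age      = c(34, 17, 45, 29),
  gender   = c("F", "M", "M", "F"),
  income   = c(62000, 0, 48000, 75000),
  category = c("A", "D", "B", "C"),
  stringsAsFactors = FALSE
)

# Both conditions must hold; the trailing comma keeps all columns
rich_adults <- df[df$age > 18 & df$income > 50000, ]   # Alice, Cara

# Membership test with %in%
abc <- df[df$category %in% c("A", "B", "C"), ]         # Alice, Amir, Cara

# Pattern-based filtering: names starting with "A"
a_names <- df[grepl("^A", df$name), ]                  # Alice, Amir

# Base-R counterpart of filter(df, age > 18, gender == "F")
adult_women <- df[df$age > 18 & df$gender == "F", ]    # Alice, Cara
```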

Handling Missing Values

Identifying and Working with Missing Data

  • is.na() function checks for missing values (NA)
    • Returns TRUE for NA values, FALSE otherwise
    • is.na(x)
      creates a logical vector indicating NA positions in x
  • Missing value handling strategies
    • Remove rows with missing values using na.omit(), or by indexing with complete.cases() as in df[complete.cases(df), ]
      • na.omit(df)
        removes rows with any NA values
    • Impute missing values with mean, median, or other methods
      • df$x[is.na(df$x)] <- mean(df$x, na.rm = TRUE)
        replaces NA with mean
  • Subset to exclude or include missing values
    • df[!is.na(df$x), ]
      selects rows where x is not NA
    • df[is.na(df$y), ]
      selects rows where y is NA
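The following sketch applies these strategies to a toy data frame with hypothetical columns x and y. It also shows a common pitfall. When the condition itself contains NA, [ ] returns an all-NA row for that position. Wrapping the condition in which() keeps only the TRUE positions:

```r
df <- data.frame(
  x = c(1, NA, 3, 4),
  y = c("a", "b", NA, "d"),
  stringsAsFactors = FALSE
)

na_x <- is.na(df$x)                        # FALSE TRUE FALSE FALSE

# Drop rows with any NA: both approaches keep rows 1 and 4
kept_omit <- na.omit(df)
kept_cc   <- df[complete.cases(df), ]

# Select rows by NA status
x_present <- df[!is.na(df$x), ]            # rows 1, 3, 4
y_missing <- df[is.na(df$y), ]             # row 3

# Mean imputation on a copy, so df keeps its original NA
imputed <- df
imputed$x[is.na(imputed$x)] <- mean(imputed$x, na.rm = TRUE)   # NA becomes 8/3

# Pitfall: df$x > 2 is FALSE NA TRUE TRUE, so [ ] adds an all-NA row
with_na_row <- df[df$x > 2, ]              # 3 rows: an NA row, then rows 3 and 4
via_which   <- df[which(df$x > 2), ]       # rows 3 and 4 only
```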

Advanced Missing Value Operations

  • Combine is.na() with logical operators for complex conditions
    • df[is.na(df$x) | is.na(df$y), ]
      selects rows where either x or y is NA
  • Use colSums() or rowSums() with is.na() to count missing values
    • colSums(is.na(df))
      counts NA values in each column
  • Apply na.rm = TRUE in functions to ignore missing values
    • mean(x, na.rm = TRUE)
      calculates mean excluding NA values
  • Visualize missing data patterns using packages like VIM or naniar
    • Create heatmaps or bar plots to identify missing data trends
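Here is a minimal sketch of combining, counting, and ignoring missing values. It uses a made-up data frame with hypothetical columns x and y:

```r
df <- data.frame(
  x = c(1, NA, 3, NA),
  y = c(NA, 2, 3, 4)
)

# Rows where either x or y is missing
either_na <- df[is.na(df$x) | is.na(df$y), ]   # rows 1, 2, 4

# is.na() on a data frame returns a logical matrix; TRUE sums as 1
na_per_col <- colSums(is.na(df))   # x = 2, y = 1
na_per_row <- rowSums(is.na(df))   # 1 1 0 1

# mean() returns NA unless missing values are dropped
mean_with_na <- mean(df$x)                 # NA
mean_no_na   <- mean(df$x, na.rm = TRUE)   # 2
```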
© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.