
Data frames are the backbone of data manipulation in R. They organize information into rows and columns, making it easy to work with structured data. This section covers essential techniques for selecting, modifying, and reshaping data frames.

We'll explore how to extract specific columns and rows, add or remove variables, and transform existing data. We'll also dive into merging datasets and aggregating information, equipping you with powerful tools for data analysis in R.

Selecting and Manipulating Data

Data Frame Structure and Column Selection

  • Data frames are two-dimensional structures in R that organize data into rows and columns
  • Columns represent variables; rows contain individual observations or cases
  • Access a single column with the $ operator (dataframe$column_name), which returns it as a vector
  • Select multiple columns with square bracket notation (dataframe[, c("column1", "column2")])
  • Use the select() function from the dplyr package to choose columns by name or position
    • select(dataframe, column1, column2) extracts the specified columns
    • select(dataframe, starts_with("prefix")) selects columns whose names start with a given prefix
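A minimal sketch of these column-selection options, assuming the dplyr package is installed. The data frame and its column names are made up for illustration:

```r
library(dplyr)

# Hypothetical example data
df <- data.frame(
  id       = 1:4,
  name     = c("Ana", "Ben", "Cy", "Dee"),
  score_q1 = c(80, 92, 75, 88),
  score_q2 = c(85, 90, 70, 95)
)

# $ returns one column as a plain vector
names_vec <- df$name

# Brackets with a vector of names return a data frame
two_cols <- df[, c("id", "name")]

# A single name in brackets drops to a vector unless drop = FALSE
one_col <- df[, "name", drop = FALSE]

# dplyr::select() with bare column names
sel_named <- select(df, id, name)

# dplyr::select() with a helper that matches a name prefix
sel_prefix <- select(df, starts_with("score_"))
```

The drop = FALSE line matters when later code expects a data frame: without it, selecting one column by bracket returns a vector, just like $.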

Row Selection and Subsetting Techniques

  • Extract rows using logical conditions within square brackets (dataframe[dataframe$column > value, ])
  • Employ numeric indices to select specific rows (dataframe[1:5, ] selects the first five rows)
  • Subset data frames by combining row and column selection (dataframe[1:10, c("column1", "column2")])
  • Use the filter() function from dplyr to select rows based on one or more conditions; unlike bracket indexing, filter() drops rows where a condition evaluates to NA
    • filter(dataframe, column1 > value1 & column2 == value2) selects rows meeting both criteria
  • Combine select() and filter() to subset rows and columns in one readable pipeline
    • dataframe %>% filter(condition) %>% select(column1, column2) chains operations using the pipe operator %>% (re-exported by dplyr from the magrittr package)
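The sketch below contrasts bracket subsetting with filter() on a made-up data frame that contains one missing score, which shows why NA handling matters. It assumes dplyr is installed:

```r
library(dplyr)

# Hypothetical data; score[2] is missing
df <- data.frame(
  id    = 1:5,
  group = c("a", "b", "a", "b", "a"),
  score = c(80, NA, 75, 92, 88)
)

# Bare logical indexing keeps the NA: one returned row is all NA
naive <- df[df$score > 79, ]

# which() keeps only positions where the condition is TRUE
high_base <- df[which(df$score > 79), ]

# Numeric indices: the first three rows
first_three <- df[1:3, ]

# Rows and columns selected together
subset_rc <- df[1:3, c("id", "score")]

# filter() with two conditions; rows where the test is NA are dropped
high_a <- filter(df, score > 79 & group == "a")

# filter() then select(), chained with %>%
chained <- df %>%
  filter(score > 79) %>%
  select(id, score)
```

Here naive has four rows (one of them a row of NAs), while high_base and chained both contain ids 1, 4, and 5.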

Modifying Data Frames

Adding and Removing Columns

  • Create new columns using the $ operator (dataframe$new_column <- values)
  • Add multiple columns simultaneously with the cbind() function
  • Remove columns by assigning NULL to them (dataframe$column_to_remove <- NULL)
  • Utilize the subset() function to exclude specific columns (subset(dataframe, select = -column_to_remove))
  • Use the mutate() function from dplyr to add or modify multiple columns in one operation
    • mutate(dataframe, new_column1 = calculation1, new_column2 = calculation2)
  • Use transmute() to create new columns while dropping all others (since dplyr 1.1.0, transmute() is superseded by mutate(..., .keep = "none"))
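A short sketch of these column operations on a hypothetical price/quantity table. It assumes dplyr 1.0.0 or later, which is needed for the .keep argument of mutate():

```r
library(dplyr)

# Hypothetical data
df <- data.frame(price = c(10, 20, 30), qty = c(1, 2, 3))

# $ assignment adds one column
df$total <- df$price * df$qty

# cbind() attaches several columns at once (lengths must match the row count)
df <- cbind(df, discount = c(0, 0.1, 0.2), in_stock = c(TRUE, FALSE, TRUE))

# Assigning NULL deletes a column
df$in_stock <- NULL

# subset() with a negated name in select = drops that column
no_discount <- subset(df, select = -discount)

# mutate() adds one column and overwrites another in a single call
df2 <- mutate(df, net = total * (1 - discount), price = price * 1.1)

# transmute() keeps only the columns it creates
just_net <- transmute(df2, net_rounded = round(net, 1))

# The mutate() equivalent recommended in newer dplyr versions
same_cols <- mutate(df2, net_rounded = round(net, 1), .keep = "none")
```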

Renaming and Transforming Columns

  • Rename columns using the names() function (names(dataframe)[column_index] <- "new_name")
  • Apply the rename() function from dplyr for more intuitive column renaming
    • rename(dataframe, new_name1 = old_name1, new_name2 = old_name2)
  • Transform existing columns with mutate() by referencing other columns or applying functions
    • mutate(dataframe, transformed_column = some_function(existing_column)), where some_function stands for any function such as log() or sqrt()
  • Use across() within mutate() to apply the same transformation to multiple columns
    • mutate(dataframe, across(columns_to_transform, transformation_function))
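The renaming and transformation steps can be sketched as follows. The columns a, b, and label are hypothetical, and the .names argument of across() assumes dplyr 1.0.0 or later:

```r
library(dplyr)

# Hypothetical data
df <- data.frame(a = c(1, 4, 9), b = c(2, 8, 18), label = c("x", "y", "z"))

# Base R: rename the second column by position
names(df)[2] <- "b_renamed"

# dplyr::rename() takes new_name = old_name pairs
df <- rename(df, alpha = a, beta = b_renamed)

# mutate() applying a function to an existing column
df <- mutate(df, alpha_sqrt = sqrt(alpha))

# across() applies one function to several columns;
# .names builds the output names from each input column name
df <- mutate(df, across(c(alpha, beta), ~ .x / 2, .names = "{.col}_half"))
```

Without .names, across() overwrites the selected columns in place; the template here keeps the originals and adds alpha_half and beta_half alongside them.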

Combining and Reshaping Data

Merging and Joining Data Frames

  • Combine data frames vertically using rbind() when they have the same column names
  • Merge data frames horizontally with merge() or dplyr's join functions (left_join(), right_join(), inner_join(), full_join())
    • merge(dataframe1, dataframe2, by = "common_column") joins on a shared column; by default it keeps only rows with a match in both data frames
    • left_join(dataframe1, dataframe2, by = "common_column") keeps all rows from the first data frame
  • Use bind_rows() from dplyr to stack data frames with different column structures; columns missing from one input are filled with NA
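A sketch of the combining functions on two small, made-up tables that share an id column (assumes dplyr is installed):

```r
library(dplyr)

# Hypothetical tables: ids 1-3 in one, ids 2-4 in the other
students <- data.frame(id = c(1, 2, 3), name = c("Ana", "Ben", "Cy"))
scores   <- data.frame(id = c(2, 3, 4), score = c(90, 75, 60))

# merge() defaults to keeping only ids present in both (an inner join)
inner_base <- merge(students, scores, by = "id")

# all.x = TRUE keeps every row of the first table (a left join)
left_base <- merge(students, scores, by = "id", all.x = TRUE)

# dplyr joins
left  <- left_join(students, scores, by = "id")   # all 3 students; id 1 gets NA score
inner <- inner_join(students, scores, by = "id")  # ids 2 and 3
full  <- full_join(students, scores, by = "id")   # ids 1 through 4

# rbind() stacks tables with matching column names
more_students <- data.frame(id = 4, name = "Dee")
all_students  <- rbind(students, more_students)

# bind_rows() stacks tables with different columns, filling gaps with NA
stacked <- bind_rows(students, scores)
```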

Reshaping and Aggregating Data

  • Transform data between wide and long formats using pivot_wider() and pivot_longer() from the tidyr package
    • pivot_longer(dataframe, cols = column_names, names_to = "new_column_name", values_to = "value_column_name")
  • Aggregate data with the group_by() and summarize() functions from dplyr
    • group_by(dataframe, grouping_column) %>% summarize(new_column = aggregate_function(column))
  • Apply multiple aggregation functions simultaneously within summarize()
  • Utilize ungroup() to remove the grouping structure after aggregation
  • Employ arrange() to sort the resulting data frame by specific columns
    • arrange(dataframe, column1, desc(column2)) sorts by column1 ascending and column2 descending
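The reshaping and aggregation steps fit together as in this sketch, which assumes the dplyr and tidyr packages are installed and uses invented quarterly scores:

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one row per student, one column per quarter
wide <- data.frame(
  student = c("Ana", "Ben"),
  q1 = c(80, 70),
  q2 = c(90, 60)
)

# Long format: one row per student-quarter pair
long <- pivot_longer(wide, cols = c(q1, q2), names_to = "quarter", values_to = "score")

# And back to wide
wide_again <- pivot_wider(long, names_from = quarter, values_from = score)

# Sort by student ascending, then score descending
sorted_long <- arrange(long, student, desc(score))

# Group, summarize with several functions, drop grouping, then sort
summary_tbl <- long %>%
  group_by(student) %>%
  summarize(mean_score = mean(score), max_score = max(score), n = n()) %>%
  ungroup() %>%
  arrange(desc(mean_score))
```

With these values, summary_tbl lists Ana (mean 85) before Ben (mean 65).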
© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

