
12.2 Grouping and summarizing data

2 min read · August 9, 2024

Grouping and summarizing data are key skills in data manipulation. They let you organize info into meaningful chunks and crunch numbers to get useful insights. These techniques are super handy for spotting patterns and trends in your data.

Using functions like group_by() and summarize(), you can slice and dice data in countless ways. You'll learn to calculate stats for different groups, sort data, and pull out the most important bits. It's like giving your data superpowers!

Grouping Data

Creating and Manipulating Groups

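  • group_by() organizes data into groups based on specified variables
  • Groups remain intact for subsequent operations until explicitly ungrouped (one exception: summarize() drops the last grouping variable from its result by default)
  • ungroup() removes grouping structure from a data frame
  • Grouping changes how verbs such as summarize(), mutate(), and slice() operate: they work within each group instead of on the whole table
  • Multiple variables can be used for grouping, creating one group per combination of their values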
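
To make the grouping behavior above concrete, here is a minimal sketch using dplyr. The scores data frame and its column names are made up for illustration and are not part of the guide.

```r
library(dplyr)

# Made-up example data (hypothetical, for illustration only).
scores <- data.frame(
  class = c("A", "A", "B", "B", "B"),
  term  = c("fall", "spring", "fall", "fall", "spring"),
  score = c(80, 90, 70, 75, 85)
)

# Group by one variable: one group per class.
by_class <- scores %>% group_by(class)
group_vars(by_class)  # "class"
n_groups(by_class)    # 2

# Group by two variables: one group per observed class/term combination.
by_class_term <- scores %>% group_by(class, term)
n_groups(by_class_term)  # 4

# Grouping changes later verbs: mean(score) is computed per class here.
with_class_mean <- by_class %>% mutate(class_mean = mean(score))

# ungroup() removes the grouping so later steps see the whole table again.
plain <- ungroup(with_class_mean)
is_grouped_df(plain)  # FALSE
```

Calling ungroup() once the grouped step is done is a common habit, since a forgotten grouping can quietly change what later verbs compute.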

Sorting and Selecting Data

  • arrange() orders rows of a data frame based on values in specified columns
  • desc() is used within arrange() to sort in descending order
  • arrange() keeps the grouping structure of grouped data but ignores the groups when sorting unless .by_group = TRUE is set
  • slice() selects rows from a data frame by their integer positions
  • When used with grouped data, slice() operates within each group independently
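
The sorting and selecting verbs above can be sketched as follows. This is an illustrative example, not the guide's own code, and the scores data frame is made up.

```r
library(dplyr)

# Made-up example data (hypothetical, for illustration only).
scores <- data.frame(
  class = c("A", "A", "B", "B", "B"),
  score = c(80, 90, 70, 75, 85)
)

# Highest score first across the whole table.
ranked <- scores %>% arrange(desc(score))
ranked$score  # 90 85 80 75 70

# Sort within each class: .by_group = TRUE sorts by the grouping column first.
ranked_in_class <- scores %>%
  group_by(class) %>%
  arrange(desc(score), .by_group = TRUE)
ranked_in_class$score  # 90 80 85 75 70

# On grouped data, slice(1) keeps the first row of each group,
# which here is the top score per class.
top_per_class <- ranked_in_class %>% slice(1)
top_per_class$score  # 90 85
```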

Summarizing Data

Basic Summary Functions

  • summarize() (or summarise()) computes summary statistics for a data frame
  • Returns a new data frame with one row per group when used with grouped data, or a single row when the data is ungrouped
  • n() counts the number of rows in each group or the entire dataset
  • mean() calculates the arithmetic average of a numeric vector
  • median() finds the middle value in a sorted set of numbers
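
As a small sketch of these basic summaries, the example below runs summarize() first on ungrouped and then on grouped data. The data frame and the output column names (n, mean_score, median_score) are assumptions chosen for illustration.

```r
library(dplyr)

# Made-up example data (hypothetical, for illustration only).
scores <- data.frame(
  class = c("A", "A", "B", "B", "B"),
  score = c(80, 90, 70, 75, 85)
)

# Ungrouped: a single row describing the whole table.
overall <- scores %>%
  summarize(
    n            = n(),
    mean_score   = mean(score),
    median_score = median(score)
  )
# n = 5, mean_score = 80, median_score = 80

# Grouped: one row per class.
per_class <- scores %>%
  group_by(class) %>%
  summarize(
    n            = n(),
    mean_score   = mean(score),
    median_score = median(score)
  )
# class A: n = 2, mean 85, median 85
# class B: n = 3, mean about 76.67, median 75
```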

Advanced Summary Functions

  • max() returns the highest value in a vector or column
  • min() returns the lowest value in a vector or column
  • sum() computes the total of all values in a numeric vector
  • Aggregate functions perform calculations across multiple rows or an entire column
    • Include functions like var() for variance and sd() for standard deviation
    • Can be used within summarize() to compute group-wise statistics
  • Custom summary functions can be defined and used within summarize(), typically ones that return a single value per group
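
The functions above, plus a custom summary, might be combined like this. The helper value_range() is a hypothetical function written for this sketch, and the data is made up.

```r
library(dplyr)

# Made-up example data (hypothetical, for illustration only).
scores <- data.frame(
  class = c("A", "A", "B", "B", "B"),
  score = c(80, 90, 70, 75, 85)
)

# Hypothetical custom summary: distance between largest and smallest value.
# It returns a single number, so it fits inside summarize().
value_range <- function(x) max(x) - min(x)

spread <- scores %>%
  group_by(class) %>%
  summarize(
    lowest      = min(score),
    highest     = max(score),
    total       = sum(score),
    variance    = var(score),        # sample variance (divides by n - 1)
    std_dev     = sd(score),         # square root of the sample variance
    range_width = value_range(score)
  )
# class A: lowest 80, highest 90, total 170, variance 50, range_width 10
# class B: lowest 70, highest 85, total 230, range_width 15
```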

Applying Summary Functions

  • Summary functions can be used on both grouped and ungrouped data
  • Multiple summary statistics can be computed in a single summarize() call
  • Summary functions are often combined with group_by() for group-wise analysis
  • Missing values (NA) propagate through most summary functions: mean(), median(), sum(), min(), max(), var(), and sd() return NA if any input is NA unless na.rm = TRUE is set, while n() counts rows regardless of missing values
  • Summarized results can be further manipulated or visualized for data analysis
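
To illustrate the points above about missing values and chaining, here is a minimal sketch. The data (including the NA) and the output column names are made up for illustration.

```r
library(dplyr)

# Made-up example data with one missing score (hypothetical).
scores <- data.frame(
  class = c("A", "A", "B", "B", "B"),
  score = c(80, NA, 70, 75, 85)
)

per_class <- scores %>%
  group_by(class) %>%
  summarize(
    n_rows       = n(),                       # counts rows, NA or not
    n_missing    = sum(is.na(score)),         # how many scores are missing
    mean_default = mean(score),               # NA for any group containing an NA
    mean_no_na   = mean(score, na.rm = TRUE)  # drops NAs before averaging
  ) %>%
  arrange(desc(mean_no_na))                   # keep working with the summary
# class A: n_rows 2, n_missing 1, mean_default NA, mean_no_na 80
# class B: n_rows 3, n_missing 0, both means about 76.67
```

Reporting the missing count next to the na.rm = TRUE result is one way to keep dropped values visible, though whether dropping them is appropriate depends on the analysis.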
© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

