3 min read • August 9, 2024
Data frames are the backbone of data manipulation in R. They organize information into rows and columns, making it easy to work with structured data. This section covers essential techniques for selecting, modifying, and reshaping data frames.
We'll explore how to extract specific columns and rows, add or remove variables, and transform existing data. We'll also dive into merging datasets and aggregating information, equipping you with powerful tools for data analysis in R.
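- Select a single column with the `$` operator (`dataframe$column_name`), which returns it as a vector
- Select multiple columns with bracket notation (`dataframe[, c("column1", "column2")]`)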
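
The sketch below applies both notations to a small, made-up data frame. The name `df` and its columns `name`, `age`, and `score` are hypothetical.

```r
# Hypothetical example data
df <- data.frame(
  name  = c("Ana", "Ben", "Cal"),
  age   = c(34, 27, 45),
  score = c(88, 92, 75)
)

ages <- df$age                          # `$` returns one column as a plain vector
name_score <- df[, c("name", "score")]  # brackets with several names return a data frame

print(ages)
print(name_score)
```

With a single name, bracket notation also simplifies to a vector (`df[, "age"]`). Adding `drop = FALSE` keeps the result as a one-column data frame.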
- `select()` from the dplyr package chooses columns by name or position
  - `select(dataframe, column1, column2)` extracts the specified columns
  - `select(dataframe, starts_with("prefix"))` selects columns whose names start with a given prefix
- Select rows with a logical condition (`dataframe[dataframe$column > value, ]`)
- Select rows by index (`dataframe[1:5, ]` selects the first five rows)
- Select rows and columns at the same time (`dataframe[1:10, c("column1", "column2")]`)
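
The following sketch tries the `select()` helpers and the base R row selections listed above. It assumes dplyr is installed. The data frame `df` and its columns are invented for illustration.

```r
library(dplyr)

# Hypothetical example data
df <- data.frame(
  id         = 1:6,
  score_math = c(70, 85, 90, 60, 95, 80),
  score_art  = c(88, 72, 65, 91, 77, 84),
  city       = c("Oslo", "Rome", "Oslo", "Lima", "Rome", "Oslo")
)

# dplyr: pick columns by name, or by a shared name prefix
by_name   <- select(df, id, city)
by_prefix <- select(df, starts_with("score"))

# Base R: pick rows by a logical condition, by index, or rows and columns together
high_math <- df[df$score_math > 80, ]
first_two <- df[1:2, ]
corner    <- df[1:3, c("id", "city")]
```

Row selection with brackets needs the trailing comma (`df[rows, ]`). Leaving it out, as in `df[1:2]`, selects columns 1 and 2 instead.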
- `filter()` from dplyr handles more complex row selection based on multiple conditions
  - `filter(dataframe, column1 > value1 & column2 == value2)` selects rows that meet both criteria
- Combine `select()` and `filter()` to subset rows and columns in a single expression
  - `dataframe %>% filter(condition) %>% select(column1, column2)` chains the operations with the pipe operator `%>%`
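
This sketch uses `filter()` alone and then inside a pipe chain. It assumes dplyr is installed. `sales` and its columns are made-up data, and one missing value is included on purpose.

```r
library(dplyr)

# Hypothetical sales data with one missing value
sales <- data.frame(
  region = c("north", "south", "north", "east", "south"),
  units  = c(120, 45, 80, 150, NA),
  price  = c(9.5, 12.0, 9.5, 8.0, 12.0)
)

# Rows where both conditions are TRUE; rows evaluating to NA are dropped
big_north <- filter(sales, units > 100 & region == "north")

# The same kind of filter followed by a column selection, as a pipe chain
big_units <- sales %>%
  filter(units > 50) %>%
  select(region, units)
```

Separating conditions with commas, as in `filter(sales, units > 100, region == "north")`, is equivalent to joining them with `&`. Base bracket subsetting handles the missing value differently: `sales[sales$units > 50, ]` returns an extra row filled with `NA`.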
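- Add a new column with the `$` operator (`dataframe$new_column <- values`)
- Add columns with the `cbind()` function
- Remove a column by assigning `NULL` to it (`dataframe$column_to_remove <- NULL`)
- Exclude specific columns with the `subset()` function (`subset(dataframe, select = -column_to_remove)`)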
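
Here is a base R sketch of adding and removing columns. No packages are needed. The data frame `df` and its columns `item`, `price`, `qty`, and `in_stock` are hypothetical.

```r
# Hypothetical example data
df <- data.frame(item = c("pen", "cup", "hat"), price = c(1.5, 4.0, 12.0))

df$qty <- c(10, 2, 1)                              # add a column with `$`
df <- cbind(df, in_stock = c(TRUE, FALSE, TRUE))   # add a named column with cbind()

df$in_stock <- NULL                                # remove a column by assigning NULL
no_price <- subset(df, select = -price)            # build a copy without the price column
```

`subset()` returns a new data frame, so its result must be assigned. The `NULL` assignment changes `df` itself.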
- `mutate()` from dplyr adds or modifies several columns in one operation
  - `mutate(dataframe, new_column1 = calculation1, new_column2 = calculation2)`
- `transmute()` creates new columns while dropping all others
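
This sketch runs `mutate()` and `transmute()` on a hypothetical `orders` table, assuming dplyr is installed. dplyr 1.1.0 marks `transmute()` as superseded in favor of `mutate(.keep = "none")`, but it still works.

```r
library(dplyr)

# Hypothetical order data
orders <- data.frame(qty = c(2, 5, 1), unit_price = c(3.0, 1.5, 10.0))

# Add two columns in one call; the second refers to the first
with_totals <- mutate(orders,
  total    = qty * unit_price,
  discount = ifelse(total > 7, 0.1 * total, 0)
)

# transmute() keeps only the columns it creates
totals_only <- transmute(orders, total = qty * unit_price)
```

Within a single `mutate()` call, later expressions can use columns created earlier in the same call. That is why `discount` can refer to `total`.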
- Rename a column with the `names()` function (`names(dataframe)[column_index] <- "new_name"`)
- `rename()` from dplyr renames columns by name rather than position, using `new_name = old_name` pairs
  - `rename(dataframe, new_name1 = old_name1, new_name2 = old_name2)`
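
This shows both renaming styles side by side, assuming dplyr is installed. `df`, `a`, and `b` are placeholder names.

```r
library(dplyr)

# Hypothetical example data
df <- data.frame(a = 1:3, b = c("x", "y", "z"))

# Base R: rename the second column by position
names(df)[2] <- "label"

# dplyr: rename by name, written as new_name = old_name
df <- rename(df, id = a)

print(names(df))
```

Renaming by name keeps working if the column order changes later. By contrast, `names(df)[2]` silently targets whatever column happens to be second.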
- Modify existing columns with `mutate()` by referencing other columns or applying functions
  - `mutate(dataframe, transformed_column = transformation_function(existing_column))`
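
This sketch modifies columns with `mutate()`, assuming dplyr is installed. `people` and its columns are hypothetical. One column is overwritten with a function of itself, and a new one is derived from two others.

```r
library(dplyr)

# Hypothetical example data
people <- data.frame(
  name      = c("ana", "ben"),
  height_cm = c(165, 180),
  weight_kg = c(60, 81)
)

people <- mutate(people,
  name = toupper(name),                   # overwrite a column with a function of itself
  bmi  = weight_kg / (height_cm / 100)^2  # derive a new column from two existing ones
)
```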
- Use `across()` inside `mutate()` to apply the same transformation to multiple columns
  - `mutate(dataframe, across(columns_to_transform, transformation_function))`
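
This sketch shows `across()` with two ways of choosing columns: a predicate (`where(is.numeric)`) and explicit names. It assumes dplyr 1.0.0 or later, where `across()` and `where()` are available. The `scores` data is invented.

```r
library(dplyr)

# Hypothetical quiz scores
scores <- data.frame(
  student = c("a", "b"),
  quiz1   = c(7.456, 8.912),
  quiz2   = c(6.333, 9.081)
)

# Round every numeric column to one decimal place, overwriting it
rounded <- mutate(scores, across(where(is.numeric), function(x) round(x, 1)))

# Transform named columns but write the results to new columns
scaled <- mutate(scores, across(c(quiz1, quiz2), function(x) x * 10,
                                .names = "{.col}_x10"))
```

The anonymous function lets extra arguments (here `1` for `round()`) be passed. A bare function name such as `round` also works when no extra arguments are needed.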
- Combine data frames vertically with `rbind()` when they have the same column names
- Merge data frames horizontally with `merge()` or dplyr's join functions (`left_join()`, `right_join()`, `inner_join()`, `full_join()`)
  - `merge(dataframe1, dataframe2, by = "common_column")` joins on a shared column, keeping only matching rows by default
  - `left_join(dataframe1, dataframe2, by = "common_column")` keeps all rows from the first data frame
- `bind_rows()` from dplyr stacks data frames with different column structures, filling missing columns with `NA`
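
This sketch contrasts the four approaches on tiny made-up tables, assuming dplyr is installed. `jan`, `feb`, `mar`, and `customers` are hypothetical.

```r
library(dplyr)

# Hypothetical monthly payments and customer list
jan <- data.frame(id = c(1, 2), amount = c(10, 20))
feb <- data.frame(id = 3, amount = 15)
customers <- data.frame(id = c(1, 3, 4), name = c("Ana", "Cal", "Dee"))

# Vertical: identical column names, so rbind() works
all_months <- rbind(jan, feb)

# Horizontal: merge() keeps only ids found in both tables by default
matched <- merge(all_months, customers, by = "id")

# left_join() keeps every row of all_months, with NA where no customer matches
with_names <- left_join(all_months, customers, by = "id")

# bind_rows() accepts differing columns and fills the gaps with NA
mar <- data.frame(id = 5, amount = 30, note = "late")
stacked <- bind_rows(all_months, mar)
```

Calling `rbind(all_months, mar)` instead would stop with an error, because `mar` has an extra column. That mismatch is exactly what `bind_rows()` handles.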
- Reshape data with `pivot_wider()` and `pivot_longer()` from the tidyr package
  - `pivot_longer(dataframe, cols = column_names, names_to = "new_column_name", values_to = "value_column_name")` gathers several columns into name/value pairs
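
This sketch makes a round trip between the two shapes, assuming tidyr is installed. The `wide` table with quarterly columns `q1` and `q2` is hypothetical.

```r
library(tidyr)

# Hypothetical sales, one column per quarter
wide <- data.frame(
  store = c("A", "B"),
  q1    = c(100, 200),
  q2    = c(110, 190)
)

# Wide to long: one row per store and quarter
long <- pivot_longer(wide, cols = c(q1, q2),
                     names_to = "quarter", values_to = "sales")

# Long to wide: spread the quarters back into columns
back <- pivot_wider(long, names_from = quarter, values_from = sales)
```

`pivot_wider()` is the inverse operation. `names_from` names the column whose values become new column names, and `values_from` supplies the cell values.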
- Aggregate data with the `group_by()` and `summarize()` functions from dplyr
  - `group_by(dataframe, grouping_column) %>% summarize(new_column = aggregate_function(column))`
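
This is a minimal grouped total, assuming dplyr is installed. `sales`, `region`, and `units` are made-up names.

```r
library(dplyr)

# Hypothetical sales records
sales <- data.frame(
  region = c("north", "south", "north", "south", "east"),
  units  = c(10, 4, 6, 8, 5)
)

# One output row per region, holding the summed units
by_region <- sales %>%
  group_by(region) %>%
  summarize(total_units = sum(units))
```

`summarize()` returns one row per group. With a single grouping column, as here, the grouping is also removed from the result.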
- `summarize()` can compute several summary columns in a single call
- Use `ungroup()` to remove the grouping structure after aggregation
- `arrange()` sorts the resulting data frame by specific columns
  - `arrange(dataframe, column1, desc(column2))` sorts by `column1` ascending, then by `column2` descending
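
This sketch combines the last three points on made-up `sales` data, assuming dplyr is installed. It computes several summaries in one `summarize()` call and uses `ungroup()` before computing a share of the grand total. It then applies `arrange()` with a descending primary key and an ascending tie-breaker.

```r
library(dplyr)

# Hypothetical sales records by region and sales rep
sales <- data.frame(
  region = c("north", "north", "north", "south", "south", "east"),
  rep    = c("x", "x", "y", "y", "z", "w"),
  units  = c(10, 6, 3, 4, 8, 4)
)

result <- sales %>%
  group_by(region, rep) %>%
  # Several summaries at once; the output stays grouped by region
  summarize(total = sum(units), n_sales = n(), .groups = "drop_last") %>%
  ungroup() %>%                            # drop the remaining region grouping
  mutate(share = total / sum(total)) %>%   # share of the overall total
  arrange(desc(total), region)             # largest first; ties broken by region A-Z
```

If `ungroup()` were left out, `sum(total)` would be evaluated within each region. The shares would then add up to 1 per region rather than overall.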