3 min read • August 9, 2024
Subsetting data frames is a crucial skill in R programming, allowing you to extract specific parts of your data. This topic covers various methods, from basic square bracket notation to advanced functions, giving you the tools to manipulate your data effectively.
Understanding these techniques is essential for data analysis and manipulation. By mastering subsetting, you'll be able to efficiently filter, select, and transform your data, setting the foundation for more complex data operations in R.
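Square bracket notation [] accesses specific elements, rows, or columns in a data frame
With a single index, single square brackets [] return a data frame (dataframe["col1"]), while double square brackets [[]] return the column itself as a vector (dataframe[["col1"]])
The two-index form is dataframe[row, column]; leaving a position empty selects every row or every column
The dollar sign operator $ extracts a single column from a data frame as a vector
The extracted vector can be indexed in turn, as in dataframe$column[1:5]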
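To make the difference between [], [[]], and $ concrete, here is a minimal sketch. The data frame df and its columns name and age are hypothetical, invented only for illustration.

```r
# Hypothetical example data; names and values are illustrative only
df <- data.frame(
  name = c("Ana", "Ben", "Cy"),
  age = c(25, 41, 33),
  stringsAsFactors = FALSE
)

one_col_df  <- df["age"]     # single brackets, one index: still a data frame
one_col_vec <- df[["age"]]   # double brackets: the column as a plain vector
same_vec    <- df$age        # $ extracts the same vector by name

cell    <- df[2, "age"]      # row 2 of column "age": a single value
row_two <- df[2, ]           # empty column position keeps every column

first_two_ages <- df$age[1:2]  # index the extracted vector directly
```

Checking class(one_col_df) against class(one_col_vec) is a quick way to see which form you got back.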
The subset() function creates a subset of a data frame based on specified conditions
Its general form is subset(dataframe, condition, select = columns), where condition picks the rows and select picks the columns
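A short sketch of subset() in use, assuming a hypothetical data frame df with name, age, and income columns (none of these come from a real dataset):

```r
# Hypothetical example data
df <- data.frame(
  name = c("Ana", "Ben", "Cy", "Di"),
  age = c(25, 41, 33, 58),
  income = c(42000, 61000, 38000, 75000)
)

# Keep rows where age exceeds 30, and only the name and income columns.
# Column names are written unquoted inside subset().
over_30 <- subset(df, age > 30, select = c(name, income))
```

Because subset() evaluates the condition inside the data frame, you write age rather than df$age.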
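Comparison operators (==, !=, >, <, >=, <=) test column values against a condition
Logical operators (&, |, !) combine or negate conditions
Use the which() function to find indices of TRUE values in a logical vector
Use the c() function to create vectors of indices or names for multiple selections
Select specific rows with dataframe[1:5, ] or specific columns with dataframe[, c("col1", "col2")]
Select rows and columns together with dataframe[1:5, c("col1", "col2")]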
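The pieces above fit together roughly as follows. This is only a sketch: the vector scores and the data frame df are hypothetical.

```r
# Hypothetical example data
scores <- c(72, 91, 65, 88)
df <- data.frame(
  id = 1:4,
  score = scores,
  grade = c("C", "A", "D", "B")
)

passed <- scores >= 80        # comparison yields a logical vector
passed_idx <- which(passed)   # integer positions of the TRUE values

rows_1_3  <- df[c(1, 3), ]              # c() builds a vector of row indices
id_grade  <- df[, c("id", "grade")]     # c() builds a vector of column names
passed_df <- df[passed_idx, c("id", "score")]  # rows and columns together
```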
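Logical indexing keeps the rows where a condition is TRUE, as in dataframe[dataframe$age > 30, ]
Use the which() function to find row indices based on conditions, as in dataframe[which(dataframe$status == "active"), ]
Combine multiple conditions with logical operators, as in dataframe[dataframe$age > 30 & dataframe$income < 50000, ]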
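One practical difference between plain logical indexing and which() shows up when a column contains missing values. The sketch below uses a hypothetical data frame df with a few NA entries added on purpose.

```r
# Hypothetical example data with missing values
df <- data.frame(
  age = c(25, NA, 41, 36),
  income = c(30000, 45000, NA, 48000),
  status = c("active", "inactive", "active", "active")
)

# An NA in the logical index produces a row filled with NA
by_logical <- df[df$age > 30, ]

# which() returns only the positions that are TRUE, so NA rows are dropped
by_which <- df[which(df$age > 30), ]

# Two conditions joined with &, wrapped in which() for the same reason
both <- df[which(df$age > 30 & df$income < 50000), ]
```

If your data has no missing values, both forms return the same rows.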
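Remove a column by assigning NULL to it: dataframe$column_to_drop <- NULL
Keep only the columns you need: dataframe[, c("col1", "col2", "col3")]
Select a range of rows and columns by position: dataframe[10:20, 3:5]
Reorder columns by listing them in the desired order: dataframe[, c("col3", "col1", "col2")]
Create a new column from existing ones: dataframe$new_column <- dataframe$column1 + dataframe$column2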
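These column operations can be chained on one small data frame. In the sketch below, df and the column names col1, col2, col3, tmp, and total are hypothetical.

```r
# Hypothetical example data
df <- data.frame(
  col1 = 1:3,
  col2 = c(10, 20, 30),
  col3 = c("a", "b", "c"),
  tmp = c(TRUE, FALSE, TRUE)
)

df$tmp <- NULL                  # drop a column
df$total <- df$col1 + df$col2   # add a column computed from two others

# Reorder by naming the columns in the order you want
df <- df[, c("col3", "col1", "col2", "total")]

# Rows 2 to 3 and columns 2 to 3, selected by position
block <- df[2:3, 2:3]
```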
The dplyr::select() function chooses specific columns from a data frame
Use select() with column names, indices, or helper functions (starts_with(), ends_with(), contains())
Rename columns inside select() using the new_name = old_name syntax
Use - to exclude specific columns
Combine select() with other dplyr functions using the pipe operator %>%
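A brief sketch of these select() patterns, assuming the dplyr package is installed. The data frame df and its column names are hypothetical.

```r
library(dplyr)

# Hypothetical example data
df <- data.frame(
  id = 1:3,
  score_math = c(90, 75, 82),
  score_art = c(70, 88, 95),
  notes = c("x", "y", "z")
)

by_helper   <- df %>% select(id, starts_with("score"))   # name plus helper
renamed     <- df %>% select(student_id = id, score_math) # new_name = old_name
without     <- df %>% select(-notes)                      # exclude a column
by_position <- df %>% select(1, 3)                        # column indices
```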
The dplyr::filter() function subsets rows based on specified conditions
Several conditions can be passed to a single filter() call; conditions separated by commas are combined with AND
Use filter() with between(), the base R %in% operator, and other helper functions for complex filtering
Combine filter() with select() to subset both rows and columns in a single pipeline
Use filter() with group_by() to apply filtering conditions within groups
Use filter() for efficient data cleaning and preparation tasks
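The filter() patterns listed here might look like this in practice. This is a sketch that assumes dplyr is installed; the data frame df and the columns team, age, and status are hypothetical.

```r
library(dplyr)

# Hypothetical example data
df <- data.frame(
  team = c("a", "a", "b", "b", "c"),
  age = c(25, 41, 33, 58, 29),
  status = c("active", "inactive", "active", "active", "inactive")
)

# Two conditions in one call; the comma acts as AND
active_over_30 <- df %>% filter(age > 30, status == "active")

# between() is inclusive on both ends
age_25_to_35 <- df %>% filter(between(age, 25, 35))

# %in% for set membership, then select() to trim the columns
teams_a_c <- df %>%
  filter(team %in% c("a", "c")) %>%
  select(team, age)

# Within each team, keep the row with the highest age
oldest_per_team <- df %>%
  group_by(team) %>%
  filter(age == max(age)) %>%
  ungroup()
```

Calling ungroup() at the end returns an ordinary ungrouped tibble, so later steps are not applied per team by accident.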