6.1 Descriptive statistics and data visualization

6 min read • August 16, 2024

Descriptive statistics and data visualization are crucial tools for impact evaluation. They help researchers summarize key features of datasets and identify patterns, trends, and relationships between variables. These techniques provide a foundation for more complex analyses.

By using measures of central tendency, dispersion, and distribution shape, evaluators can gain insights into their data. Visualizations like histograms, scatter plots, and box plots further enhance understanding, making it easier to spot outliers, compare groups, and communicate findings effectively.

Descriptive Statistics for Data Exploration

Measures of Central Tendency and Dispersion

Descriptive statistics provide quantitative summaries of key dataset features including measures of central tendency, dispersion, and distribution shape
Measures of central tendency include:
- Mean calculates the average value of a dataset
- Median represents the middle value when data is ordered
- Mode identifies the most frequent value in a dataset
Measures of dispersion quantify data spread:
- Range measures the difference between maximum and minimum values
- Variance calculates the average squared deviation from the mean
- Standard deviation equals the square root of variance, providing a measure of spread in the original units
Calculate descriptive statistics separately for subgroups to reveal potential differences or patterns
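
The measures above can be sketched with Python's standard `statistics` module. The income values and group names are hypothetical, chosen only for illustration. The sketch uses the population formulas (dividing by n) to match the "average squared deviation" definition given here; `statistics.variance` and `statistics.stdev` would give the sample versions (dividing by n - 1).

```python
import statistics

def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),          # arithmetic average
        "median": statistics.median(values),      # middle value of the sorted data
        "mode": statistics.mode(values),          # most frequent value
        "range": max(values) - min(values),       # maximum minus minimum
        "variance": statistics.pvariance(values), # mean squared deviation (divides by n)
        "std_dev": statistics.pstdev(values),     # square root of the variance
    }

# Hypothetical monthly incomes, illustrative only
incomes = [210, 250, 250, 300, 320, 400, 950]
summary = summarize(incomes)

# The same summary computed separately for (hypothetical) subgroups
groups = {"treatment": [250, 300, 320, 400], "control": [210, 250, 250, 260]}
by_group = {name: summarize(vals) for name, vals in groups.items()}

for name, stats in by_group.items():
    print(name, stats)
```

In this made-up sample the single high income pulls the mean well above the median, which is one reason to report both.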

Distribution Shape and Relative Position

Skewness describes the asymmetry of a distribution:
- Positive skew indicates a longer right tail (Exam scores heavily clustered on the low end, income distributions with a few very high earners)
- Negative skew indicates a longer left tail (Exam scores heavily clustered on the high end)
Kurtosis measures how much of the data sits in the tails versus near the mean:
- High kurtosis indicates heavy tails and a peaked distribution (Stock returns during market volatility)
- Low kurtosis indicates light tails and a flatter distribution (Uniform distribution of rainfall across months)
Percentiles and quartiles offer insights into relative data position:
- Quartiles divide data into four equal parts (25th, 50th, 75th percentiles)
- Useful for understanding data spread and identifying potential outliers (IQ scores, standardized test results)
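
As a rough sketch, skewness and kurtosis can be computed from standardized moments, and quartiles with `statistics.quantiles`. The function names and data below are hypothetical. Other definitions exist (for example, sample-adjusted versions), and quartile values depend on the interpolation method, so results may differ slightly from other software.

```python
import statistics

def skewness(values):
    """Moment-based (population) skewness: the mean cubed z-score."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((x - m) / s) ** 3 for x in values) / len(values)

def excess_kurtosis(values):
    """Moment-based (population) kurtosis minus 3, so a normal distribution scores about 0."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((x - m) / s) ** 4 for x in values) / len(values) - 3

# Hypothetical data: one large value creates a long right tail
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
left_skewed = [-x for x in right_skewed]  # mirror image: long left tail
flat = list(range(1, 11))                 # evenly spread values, light tails

# Quartiles: the three cut points that split the data into four parts
q1, q2, q3 = statistics.quantiles(right_skewed, n=4)

print(skewness(right_skewed), skewness(left_skewed))
print(excess_kurtosis(right_skewed), excess_kurtosis(flat))
print(q1, q2, q3)
```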

Relationships Between Variables

Correlation coefficients measure strength and direction of variable relationships:
- Pearson's r for linear relationships between continuous variables (-1 to +1)
- Spearman's rank correlation for monotonic relationships, including non-linear associations
Interpret correlation values:
- Strong positive correlation (Height and weight, r = 0.8)
- Moderate negative correlation (Age and reaction time, r = -0.5)
- Weak or no correlation (Shoe size and intelligence, r = 0.1)
Consider limitations of correlation:
- Does not imply causation
- Sensitive to outliers and non-linear relationships
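
The difference between the two coefficients can be illustrated with a small sketch. It assumes Spearman's coefficient is computed as Pearson's r on the ranks, with tied values sharing their average rank; the helper names and data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical data: a monotonic but non-linear relationship
hours = [1, 2, 3, 4, 5, 6]
scores = [h ** 3 for h in hours]

print(pearson_r(hours, scores))     # below 1: the relationship is not a straight line
print(spearman_rho(hours, scores))  # the ordering is perfectly preserved
```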

Data Visualization for Impact Evaluation

Distribution and Relationship Visualizations

Histograms display frequency distribution of continuous variables:
- Reveal central tendency, spread, and shape (Income distribution across population)
- Identify multimodal distributions or unexpected patterns
Density plots provide smoothed representations of distributions:
- Useful for comparing multiple groups (Treatment vs control group outcomes)
- Highlight subtle differences in distribution shape
Scatter plots visualize relationships between two continuous variables:
- Identify linear or non-linear patterns (Age vs income)
- Incorporate additional dimensions through color, size, or shape (Gender, education level)
Box plots display the five-number summary of a dataset (minimum, first quartile, median, third quartile, maximum):
- Show median, quartiles, and potential outliers (Comparing test scores across schools)
- Effective for comparing distributions across groups or categories

Categorical and Time Series Visualizations

Bar charts display categorical data frequencies or proportions:
- Compare values across categories (Market share by company)
- Use stacked or grouped bars for multiple categories (Employment status by gender and age group)
Pie charts show part-to-whole relationships for categorical data:
- Effective for displaying percentage breakdowns (Budget allocation by department)
- Limited usefulness when comparing many categories
Time series plots illustrate trends and patterns over time:
- Reveal seasonality, cycles, or long-term trends (Monthly unemployment rates)
- Incorporate moving averages or trend lines to highlight underlying patterns
Heat maps and choropleth maps visualize spatial patterns:
- Display data intensity across geographical regions (Crime rates by neighborhood)
- Useful for identifying clusters or hotspots in spatial data
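
The moving averages mentioned for time series plots can be sketched as a trailing simple moving average. The function name, window length, and unemployment figures are hypothetical; centered or weighted averages are equally valid choices.

```python
def moving_average(series, window):
    """Simple moving average; returns len(series) - window + 1 values."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

# Hypothetical monthly unemployment rates (%), illustrative only
rates = [5.0, 5.4, 5.1, 5.6, 5.9, 5.5, 6.0]
smoothed = moving_average(rates, window=3)
print(smoothed)
```

Plotting `smoothed` over the raw series makes the underlying trend easier to see, at the cost of losing `window - 1` points.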

Advanced Visualization Techniques

Interactive visualizations allow data exploration:
- Tools like D3.js or Plotly enable zooming, filtering, and hovering
- Facilitate deeper understanding of complex datasets (Interactive COVID-19 dashboards)
Multi-dimensional visualizations represent 3+ variables:
- Bubble charts combine x-y position, size, and color (GDP, life expectancy, and population by country)
- Parallel coordinates plot for high-dimensional data exploration
Animation and small multiples for temporal changes:
- Animated charts show evolution over time (Changing age structure of populations)
- Small multiples display series of similar graphs for comparison (Economic indicators across decades)

Interpreting Data Insights

Baseline Comparison and Randomization Assessment

Compare descriptive statistics between treatment and control groups:
- Assess effectiveness of randomization in experimental designs
- Identify potential sources of selection bias in quasi-experimental studies
Examine key demographic and socioeconomic variables:
- Age, gender, education level, income (Ensure groups are balanced at baseline)
- Flag significant differences that may affect interpretation of results
Visualize baseline distributions for important variables:
- Use overlaid histograms or density plots to compare groups
- Identify any systematic differences that may threaten internal validity
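
A baseline comparison for a single covariate might look like the sketch below. The standardized difference (difference in means divided by a pooled standard deviation) is a common convention for balance checks, not something this guide prescribes; the function name and age values are hypothetical.

```python
import math
import statistics

def balance_row(treated, control):
    """Compare one baseline covariate across treatment and control groups."""
    mt, mc = statistics.fmean(treated), statistics.fmean(control)
    vt, vc = statistics.variance(treated), statistics.variance(control)
    pooled_sd = math.sqrt((vt + vc) / 2)  # assumption: simple average of the two variances
    return {
        "mean_treated": mt,
        "mean_control": mc,
        "difference": mt - mc,
        "std_difference": (mt - mc) / pooled_sd,
    }

# Hypothetical baseline ages, illustrative only
age_treated = [34, 29, 41, 38, 30, 36]
age_control = [33, 31, 40, 37, 29, 35]
row = balance_row(age_treated, age_control)
print(row)
```

Repeating this for each key demographic and socioeconomic variable gives a balance table; large standardized differences point to variables worth examining more closely.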

Outcome Analysis and Impact Visualization

Track changes in outcome variables over time:
- Compare pre- and post-intervention measurements
- Use time series plots to visualize trends (Monthly income levels before and after job training program)
Employ box plots and violin plots for distribution comparisons:
- Highlight differences in central tendency and spread between groups
- Identify potential heterogeneity in treatment effects (Test score distributions for different teaching methods)
Create scatter plots with regression lines to illustrate relationships:
- Program intensity or duration vs outcome variables
- Reveal potential dose-response effects (Hours of tutoring vs improvement in grades)
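
The regression line drawn through such a scatter plot can be obtained with ordinary least squares. This is a minimal sketch with hypothetical tutoring data; a positive slope describes an association in the sample and is not by itself evidence of a causal dose-response effect.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical data: hours of tutoring vs improvement in grade points
tutoring_hours = [0, 2, 4, 6, 8, 10]
grade_gain = [1.0, 2.5, 3.0, 5.5, 6.0, 8.0]

slope, intercept = fit_line(tutoring_hours, grade_gain)
print(f"gain = {intercept:.2f} + {slope:.2f} * hours")
```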

Subgroup Analysis and Heterogeneous Effects

Conduct descriptive statistics for different subgroups:
- Analyze outcomes by demographic categories (Age groups, gender, socioeconomic status)
- Identify potential heterogeneous treatment effects
Visualize subgroup differences using faceted or grouped plots:
- Create separate visualizations for each subgroup
- Use color-coding to distinguish subgroups within a single plot
Interpret subgroup findings cautiously:
- Consider sample size limitations for smaller subgroups
- Be aware of multiple comparisons problem when testing many subgroups

Identifying Outliers and Patterns

Statistical Methods for Outlier Detection

Use z-scores to identify extreme values:
- Calculate z-score = (x - μ) / σ
- Flag data points with |z| > 3 as potential outliers (Extreme test scores in a class)
Apply Interquartile Range (IQR) method:
- Calculate IQR = Q3 - Q1
- Identify outliers as points < Q1 - 1.5 × IQR or > Q3 + 1.5 × IQR (Detecting unusual home prices in a neighborhood)
Employ Median Absolute Deviation (MAD) for robust outlier detection:
- Less sensitive to extreme values than z-scores
- Useful for skewed distributions (Identifying anomalous network traffic patterns)
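
The three rules can be compared side by side in a short sketch. The function names and home prices are hypothetical. The MAD rule shown uses a "modified z-score" with the conventional 0.6745 scaling and a 3.5 cutoff; these constants are assumptions, since the text above gives no threshold for MAD.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Flag points with |z| > threshold, where z = (x - mean) / std dev."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    if s == 0:
        return []
    return [x for x in values if abs((x - m) / s) > threshold]

def iqr_outliers(values, k=1.5):
    """Flag points below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]

def mad_outliers(values, threshold=3.5):
    """Flag points whose modified z-score (based on the median and MAD) exceeds threshold."""
    med = statistics.median(values)
    mad = statistics.median(abs(x - med) for x in values)
    if mad == 0:
        return []
    # 0.6745 rescales the MAD to be comparable to a standard deviation under normality
    return [x for x in values if 0.6745 * abs(x - med) / mad > threshold]

# Hypothetical home prices (thousands), with one unusually expensive home
prices = [200, 210, 215, 220, 225, 230, 240, 900]

print(zscore_outliers(prices))  # []: the extreme value inflates the mean and std dev
print(iqr_outliers(prices))     # [900]
print(mad_outliers(prices))     # [900]
```

In this small sample the z-score rule misses the extreme price because that same value inflates the standard deviation, which illustrates why the median-based methods are described as more robust.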

Visual Techniques for Pattern Identification

Examine box plots for potential outliers:
- Points falling beyond the whiskers are treated as potential outliers
- Compare outlier prevalence across groups or categories
Analyze scatter plots for unusual patterns:
- Identify non-linear relationships or unexpected clusters
- Look for data points that deviate from the general trend (Unusual price-demand relationships in economic data)
Investigate histograms and density plots for distribution anomalies:
- Detect multimodality or unexpected peaks
- Identify asymmetry or heavy tails that may indicate data quality issues

Advanced Outlier and Pattern Analysis

Calculate Cook's distance and leverage statistics in regression analyses:
- Identify influential observations that disproportionately affect results
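- Investigate points with high Cook's distance (> 4/n) or leverage (> 2p/n), where n is the number of observations and p is the number of estimated model parameters; both cutoffs are rules of thumb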
Examine time series data for unusual patterns:
- Look for sudden level shifts, trend changes, or seasonal irregularities
- Use time series decomposition to separate trend, seasonal, and residual components
Investigate context and potential causes of outliers or patterns:
- Distinguish between data entry errors, measurement issues, and genuine phenomena
- Consider the impact of outliers on statistical analyses and interpretation of results
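
For a regression with a single predictor, leverage and Cook's distance have closed forms, so the cutoffs above can be checked without a statistics package. This is a minimal sketch under that assumption (p = 2 for intercept and slope); the function name and data are hypothetical, and a multi-predictor model would need the full hat matrix.

```python
def influence_measures(xs, ys):
    """Leverage and Cook's distance for the simple regression y = a + b*x."""
    n, p = len(xs), 2  # p counts the intercept and the slope
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    mse = sum(e ** 2 for e in residuals) / (n - p)  # residual variance estimate
    # Leverage grows with the squared distance of x from its mean
    leverage = [1 / n + (x - mx) ** 2 / sxx for x in xs]
    # Cook's distance combines residual size and leverage
    cooks = [(e ** 2 / (p * mse)) * (h / (1 - h) ** 2)
             for e, h in zip(residuals, leverage)]
    return leverage, cooks

# Hypothetical data: the last observation lies far from the others in x
xs = [1, 2, 3, 4, 5, 6, 7, 20]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 25.0]

leverage, cooks = influence_measures(xs, ys)
n, p = len(xs), 2
flagged = [i for i, (h, d) in enumerate(zip(leverage, cooks))
           if h > 2 * p / n or d > 4 / n]
print(flagged)  # indices of observations worth a closer look
```

A flagged point is a prompt to investigate its context, not an automatic reason to drop it.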

AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

© 2024 Fiveable Inc. All rights reserved.