Data journalism is revolutionizing investigative reporting. By harnessing diverse data sources and advanced analysis techniques, journalists can uncover hidden stories and support their findings with solid evidence. This approach adds depth and credibility to reporting, enabling journalists to tackle complex issues with precision.

From government databases to leaked documents, data sources are plentiful. Journalists must master data acquisition, cleaning, and analysis skills to extract meaningful insights. Visualization tools then help transform raw data into compelling narratives, making complex information accessible to readers and amplifying the impact of investigative stories.

Data Sources for Investigative Reporting

Types of Data Sources

  • Investigative journalists use data from a variety of sources to uncover stories
    • Government databases (census data, crime and justice statistics, campaign finance records, lobbying databases, spending data)
    • Public records (property records, court documents, business filings)
    • Leaked documents (confidential memos, emails, financial records)
    • Surveys (public opinion polls, customer satisfaction surveys)
    • Original data collection (field observations, experiments, crowdsourced data)

Acquiring Datasets

  • Datasets can be acquired through various methods
    • Web scraping extracts data from websites using automated tools (Scrapy, BeautifulSoup)
    • APIs (Application Programming Interfaces) allow programmatic access to data from online platforms (Twitter API, Google Maps API)
    • Manual collection and entry may be necessary for non-digital or unstructured data sources (handwritten records, printed documents)
  • Journalists should understand the basics of these acquisition methods and associated tools to effectively gather data
  • When acquiring data, it's important to evaluate the source and methodology
    • Assess accuracy and completeness of the data
    • Consider timeliness and relevance to the investigative story
    • Exercise journalistic skepticism and verify information
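
The scraping step can be sketched as follows. This is a minimal illustration, not a production scraper: the HTML snippet, agency names, and figures are invented, and the page is parsed from a string rather than fetched over HTTP. It uses Python's built-in `html.parser` so that it runs without extra installs; in practice the tools named above (Scrapy, BeautifulSoup) would usually do this job.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows, each a list of cell strings
        self._row = None      # row currently being read, or None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        # Text can arrive in several chunks, so append rather than assign.
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            self._row[-1] = self._row[-1].strip()
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return the table rows found in an HTML string."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical page fragment standing in for a downloaded web page.
SAMPLE_HTML = """
<table>
  <tr><th>Agency</th><th>Contract value</th></tr>
  <tr><td>Parks Dept</td><td>120000</td></tr>
  <tr><td>Transit Authority</td><td>450000</td></tr>
</table>
"""

rows = parse_table(SAMPLE_HTML)
header, records = rows[0], rows[1:]
print(header)
print(records)
```

Whatever tool does the extraction, the scraped rows still need the source and methodology checks listed above before they are used in a story.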

Creating Original Datasets

  • When existing data is insufficient, investigative journalists may need to create original datasets
    • Surveys can be conducted to gather data from a sample population (opinion polls, market research)
      • Understanding survey design principles and sampling techniques is valuable
    • Experiments or field observations can generate data to test hypotheses or understand phenomena (A/B testing, ethnographic studies)
    • Crowdsourcing involves collecting data from a large group of contributors (user-generated content, citizen science projects)
  • Original data collection requires careful planning and execution to ensure data quality and integrity

Handling Sensitive Data

  • Investigative journalists often work with leaked or confidential documents and datasets
    • Negotiating access to sensitive information requires tact and persistence
    • Verifying authenticity and accuracy of leaked data is critical
    • Protecting sources and handling data securely is a key ethical consideration
  • Journalists must navigate legal and ethical issues around sensitive data (privacy rights, national security concerns)

Data Cleaning and Analysis

Data Preprocessing

  • Raw datasets often require cleaning and preprocessing before analysis
    • Handling missing or incomplete data points (deletion, imputation)
    • Removing duplicate or irrelevant records
    • Standardizing formats and normalizing values (date formats, units of measurement)
  • Large datasets may need to be transformed to facilitate analysis
    • Filtering to extract relevant subsets of data
    • Sampling to create representative subsets for efficiency
    • Aggregating or summarizing data at different levels of granularity
    • Merging datasets to combine relevant information from multiple sources
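
A few of these cleaning steps can be sketched in plain Python. The records, field names, and the two accepted date formats below are assumptions made up for illustration; a real dataset would need its own rules.

```python
from datetime import datetime

# Hypothetical raw records: mixed date formats, a blank amount, one duplicate.
RAW = [
    {"id": "1", "date": "2023-01-15", "amount": "1200"},
    {"id": "2", "date": "01/20/2023", "amount": ""},
    {"id": "1", "date": "2023-01-15", "amount": "1200"},  # exact duplicate
    {"id": "3", "date": "2023-02-03", "amount": "800"},
]

# Assumed input formats: ISO dates and US-style month/day/year.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text):
    """Try each known format and return a date, or raise if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def clean(records):
    """Drop exact duplicates, standardize dates, and mark missing amounts."""
    seen = set()
    cleaned = []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key in seen:
            continue  # skip a record identical to one already kept
        seen.add(key)
        cleaned.append({
            "id": int(rec["id"]),
            "date": parse_date(rec["date"]).isoformat(),
            "amount": float(rec["amount"]) if rec["amount"] else None,
        })
    return cleaned


cleaned = clean(RAW)
# Deletion strategy for missing values: keep only rows with an amount.
complete = [r for r in cleaned if r["amount"] is not None]
print(cleaned)
print(len(complete), "complete rows")
```

Missing values are kept as `None` rather than silently dropped, so the choice between deletion and imputation stays visible and can be documented in the methodology.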

Spreadsheet Analysis

  • Spreadsheet software (Excel, Google Sheets) enables basic data manipulation and analysis
    • Sorting and filtering to organize and extract relevant data
    • Pivot tables to summarize and cross-tabulate data
    • Formulas for calculations, conditionals, and logical operations (SUM, IF, VLOOKUP)
  • Spreadsheets are suitable for smaller datasets and simpler analyses
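
For readers who want to see what a pivot table computes, the cross-tabulation can be sketched in a few lines of Python. The neighborhoods and incident types are invented sample data.

```python
from collections import Counter

# Hypothetical incident log: (neighborhood, incident type).
INCIDENTS = [
    ("Downtown", "theft"),
    ("Downtown", "assault"),
    ("Downtown", "theft"),
    ("Riverside", "theft"),
    ("Riverside", "vandalism"),
]


def crosstab(rows):
    """Count rows for every (row label, column label) pair, like a pivot table."""
    counts = Counter(rows)
    row_keys = sorted({r for r, _ in rows})
    col_keys = sorted({c for _, c in rows})
    # Counter returns 0 for pairs that never occur, which fills empty cells.
    return {r: {c: counts[(r, c)] for c in col_keys} for r in row_keys}


table = crosstab(INCIDENTS)
for neighborhood, cells in table.items():
    print(neighborhood, cells)
```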

Relational Databases and SQL

  • Relational databases are used to store and manage large, structured datasets
    • Data is organized into tables with defined relationships
    • SQL (Structured Query Language) is used to extract, filter, and analyze data
  • Understanding basic SQL statements is valuable for data journalism
    • SELECT to retrieve data, WHERE to filter records, JOIN to combine tables, GROUP BY to aggregate data
  • Databases enable efficient querying and analysis of complex datasets
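
The four statements above can be seen working together in one query. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `donors` and `donations` tables and every value in them are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE donors (id INTEGER PRIMARY KEY, name TEXT, industry TEXT);
CREATE TABLE donations (donor_id INTEGER, candidate TEXT, amount REAL);

INSERT INTO donors VALUES
    (1, 'Acme Corp', 'Energy'),
    (2, 'Birch LLC', 'Energy'),
    (3, 'Cedar Inc', 'Health');

INSERT INTO donations VALUES
    (1, 'Smith', 5000),
    (2, 'Smith', 2500),
    (3, 'Smith', 400),
    (3, 'Jones', 3000),
    (1, 'Jones', 1000);
""")

# SELECT retrieves columns, JOIN links each donation to its donor,
# WHERE drops small donations, GROUP BY totals per industry and candidate.
query = """
SELECT d.industry, n.candidate, SUM(n.amount) AS total
FROM donations AS n
JOIN donors AS d ON d.id = n.donor_id
WHERE n.amount >= 500
GROUP BY d.industry, n.candidate
ORDER BY total DESC
"""

totals = conn.execute(query).fetchall()
conn.close()

for industry, candidate, total in totals:
    print(industry, candidate, total)
```

The same query text would run largely unchanged on a server-based relational database; only the connection code differs.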

Programming for Data Analysis

  • Programming languages (Python, R) provide powerful tools for data manipulation, analysis, and visualization
    • Data libraries in Python, or dplyr and ggplot2 in R, offer extensive functionality
    • Notebook tools provide an interactive environment for running and documenting analysis
  • More advanced techniques can be applied using programming
    • Statistical modeling and hypothesis testing
    • Machine learning for pattern recognition and prediction
    • Text mining and natural language processing
  • The choice of tools depends on the complexity of the data and analysis required
    • Journalists should be familiar with the range of options and their strengths and limitations

Insights from Data

Exploratory Data Analysis

  • Exploratory analysis is the starting point for generating investigative leads and research questions
    • Identifying trends, patterns, outliers, or anomalies in the data
    • Visualizing distributions, relationships, and variations
  • Summary statistics provide a high-level understanding of the data
    • Measures of central tendency (mean, median, mode)
    • Measures of variability (range, variance, standard deviation)
  • Segmenting data by relevant variables (demographics, time periods) and comparing subgroups can reveal key insights
    • Differences in summary statistics across segments may indicate disparities or areas for further investigation
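
Segment-level summary statistics can be sketched with Python's `statistics` module. The districts and response times below are invented for illustration.

```python
from collections import defaultdict
from statistics import mean, median, stdev

# Hypothetical emergency response times in minutes, tagged by district.
RESPONSE_TIMES = [
    ("North", 6.0), ("North", 7.0), ("North", 8.0),
    ("South", 9.0), ("South", 12.0), ("South", 30.0),
]


def summarize_by_segment(rows):
    """Group values by segment and compute basic summary statistics."""
    groups = defaultdict(list)
    for segment, value in rows:
        groups[segment].append(value)
    return {
        segment: {
            "n": len(values),
            "mean": mean(values),
            "median": median(values),
            "stdev": stdev(values),  # sample standard deviation, needs n >= 2
        }
        for segment, values in groups.items()
    }


summary = summarize_by_segment(RESPONSE_TIMES)
for segment, stats in summary.items():
    print(segment, stats)
```

In the South segment the single 30-minute value pulls the mean (17) well above the median (12). A gap like that is a cue to look at the individual records before reporting an "average".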

Examining Relationships

  • Correlation analysis examines the relationship between two variables
    • Positive correlation indicates variables move in the same direction
    • Negative correlation indicates variables move in opposite directions
    • Correlation coefficient (Pearson's r) quantifies the strength of the linear relationship
  • Scatter plots visually represent the relationship between two continuous variables
  • Correlation does not imply causation; further investigation is needed to establish causal links
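
Pearson's r can be computed directly from its definition: the sum of the products of the paired deviations from each mean, divided by the product of the square roots of each variable's summed squared deviations. The sketch below does that in plain Python; the sample numbers are arbitrary.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (spread_x * spread_y)


# Arbitrary illustrative values.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
print(round(pearson_r(xs, ys), 3))  # 0.8
```

The coefficient only measures linear association, so a scatter plot is still worth drawing: a strong curved relationship can produce an r near zero.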

Statistical Inference

  • Statistical inference techniques test hypotheses and assess the significance of findings
    • t-tests compare means between two groups to determine if differences are statistically significant
    • ANOVA (Analysis of Variance) compares means across multiple groups
    • Chi-square tests examine the association between categorical variables
  • p-values indicate how likely results at least as extreme as those observed would be if there were no real effect (the null hypothesis)
    • Lower p-values (typically < 0.05) suggest statistically significant findings
  • Confidence intervals estimate the range of plausible values for a population parameter
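
To make the idea of a p-value concrete, here is a sketch of an exact permutation test for a difference in means. Note that this is a different technique from the t-test named above, swapped in because it needs no statistical tables and can be written with the standard library alone. It reshuffles the group labels in every possible way and asks how often the relabeled difference is at least as large as the one actually observed. The two groups of numbers are invented.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test for a difference in group means.

    Enumerates every way of splitting the pooled values into groups of the
    original sizes, so it is only practical for small samples.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    extreme = 0
    splits = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            extreme += 1
        splits += 1
    return extreme / splits


# Hypothetical measurements for two groups.
group_a = [10, 11, 12, 13]
group_b = [20, 21, 22, 23]
p = permutation_p_value(group_a, group_b)
print(round(p, 4))  # 0.0286, i.e. 2 of the 70 possible splits
```

Here only 2 of the 70 possible relabelings produce a gap as large as the real one, so a difference this big would be unusual if group membership made no difference.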

Predictive Modeling

  • Regression analysis models the relationship between a dependent variable and one or more independent variables
    • Linear regression fits a line to predict a continuous outcome
    • Logistic regression predicts a binary outcome (yes/no, success/failure)
  • Machine learning techniques can identify complex patterns and make predictions
    • Clustering algorithms (k-means) group similar data points together
    • Classification algorithms (decision trees, support vector machines) predict categories
  • Predictive models should be validated and their limitations acknowledged
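
A minimal sketch of linear regression with a basic validation step: fit a line by ordinary least squares on one set of points, then measure the error on points held back from the fit. The function names and all figures are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x, mean_y = mean(xs), mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(xs, ys, slope, intercept):
    """Average absolute gap between actual and predicted values."""
    return mean(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys))


# Hypothetical data: the first five points train the model,
# the last two are held out to check how well it generalizes.
train_x, train_y = [1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]
test_x, test_y = [6, 7], [11.7, 14.4]

slope, intercept = fit_line(train_x, train_y)
print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
print("held-out error:", round(mean_abs_error(test_x, test_y, slope, intercept), 3))
```

Checking error on held-out points rather than on the training data gives a more honest picture of how the model would perform on records it has not seen.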

Communicating Insights Responsibly

  • Journalists must interpret and communicate data-driven insights carefully
    • Avoid overstating or misrepresenting findings
    • Acknowledge limitations of data and analysis
    • Provide context and caveats around results
  • Verify insights through traditional reporting techniques
    • Corroborate data with human sources and documents
    • Seek expert opinion to validate findings
  • Transparent methodology and data sourcing enhance credibility
    • Describe data collection and analysis processes
    • Make datasets and code available when possible

Data Visualization for Storytelling

Principles of Effective Visualization

  • Data visualization translates complex datasets into engaging, informative formats
    • Allows audiences to explore data and grasp key takeaways
    • Enhances storytelling by making insights accessible and memorable
  • Effective visualizations are tailored to the story and audience
    • Highlight the most relevant insights and narrative threads
    • Balance complexity and clarity to avoid overwhelming readers
  • Visualizations should accurately represent the data
    • Maintain proportionality and scale
    • Use appropriate chart types and visual encodings
    • Avoid distortions or manipulations that mislead

Choosing the Right Visualization

  • Different types of charts and graphs serve different purposes
    • Bar and column charts compare discrete categories (election results by candidate)
    • Line charts show change over time or along a continuous scale (stock price trends)
    • Scatterplots depict the relationship between two continuous variables (age vs. income)
    • Maps plot geographic data (crime incidents by neighborhood)
  • The choice of visualization depends on the nature of the data and the story being told
    • Consider the data types (categorical, continuous, temporal, spatial)
    • Align the visualization with the key message or insight
  • Combining multiple chart types can provide a more comprehensive view

Interactive and Animated Visualizations

  • Interactive elements engage readers and allow them to explore the data
    • Filters and sliders to focus on specific subsets or ranges
    • Hover effects and tooltips to reveal additional details
    • Drill-downs and zoom to move between levels of granularity
  • Animations can be eye-catching and effective for showing change over time
    • Animated transitions between different chart states
    • Animated scrollytelling guides readers through a narrative
  • Interactivity should be purposeful and not detract from the central message

Tools and Technologies

  • Spreadsheet programs (Excel, Google Sheets) can create basic charts and graphs
  • Specialized visualization tools offer more advanced functionality
    • Dashboard tools for interactive dashboards and exploratory analysis
    • Datawrapper and Infogram for embeddable, shareable charts
    • D3.js for custom, web-based visualizations using JavaScript
  • The choice of tool depends on the complexity of the data and the desired level of interactivity
  • Journalists should be familiar with the capabilities and limitations of different tools

Integrating Visualizations into Stories

  • Data visualizations should be closely integrated with the narrative
    • Align visuals with key points in the story
    • Use text to provide context and explanation for charts
    • Refer to visualizations in the narrative to encourage engagement
  • Placement and layout of visualizations affect reader attention
    • Prominent visuals can serve as entry points or highlights
    • Smaller charts can provide supporting details
  • Consistency in visual style and branding enhances professionalism

Accessibility and Ethics

  • Visualizations should be accessible to all readers
    • Use colorblind-friendly palettes and sufficient contrast
    • Provide text alternatives and captions for visual elements
    • Ensure interactivity can be navigated by keyboard
  • Ethical considerations in visualization mirror those in journalism broadly
    • Avoid sensationalizing or misrepresenting data
    • Disclose data sources and methodology transparently
    • Respect privacy and protect sensitive information
  • Journalists are responsible for the accuracy and integrity of their visualizations
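
"Sufficient contrast" can be checked numerically. The sketch below follows the relative luminance and contrast ratio definitions from the Web Content Accessibility Guidelines (WCAG 2), under which a ratio of at least 4.5:1 is the usual minimum for normal-size text. The helper names and the sample colors are illustrative assumptions.

```python
def _linearize(channel):
    """Convert an 8-bit sRGB channel (0-255) to linear light."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a color written as '#rrggbb'."""
    value = hex_color.lstrip("#")
    if len(value) != 6:
        raise ValueError("expected a six-digit hex color such as '#1a2b3c'")
    r, g, b = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(color_a, color_b):
    """Contrast ratio between two colors, from 1 (identical) to 21."""
    lighter, darker = sorted(
        (relative_luminance(color_a), relative_luminance(color_b)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


# Example check: gray label text on a white chart background.
for text_color in ("#767676", "#777777"):
    ratio = contrast_ratio(text_color, "#ffffff")
    print(text_color, round(ratio, 2), "passes" if ratio >= 4.5 else "fails")
```

Two grays that look nearly identical can fall on opposite sides of the threshold, which is why checking the number is more reliable than judging by eye.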
© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

