3.4 Creating original datasets through surveys and crowdsourcing
6 min read • July 30, 2024
Creating original datasets through surveys and crowdsourcing is a powerful way to gather unique insights for data journalism. This approach allows reporters to collect targeted information directly from sources, filling gaps in existing data and uncovering new stories.
Surveys and crowdsourcing projects require careful planning, from defining research questions to designing questionnaires and engaging participants. Proper data cleaning, analysis, and integration with other sources can transform raw responses into compelling narratives that inform and engage readers.
Survey Design for Journalism
Key Steps in the Survey Process
Define the research question that guides the design of the survey instrument and data analysis
Research questions should be specific, measurable, and relevant to the journalistic project
Identify the target population (the entire group the survey aims to understand) and the sampling frame (the list of all members of the target population from which the sample will be selected)
Determine the sample size (large enough to represent the target population and support the analysis) and sampling method (probability sampling methods, such as simple random sampling, help ensure representative samples)
Design the questionnaire by crafting clear, unbiased questions that elicit accurate responses
Question types include open-ended, closed-ended, Likert scale, and ranking
Pretest questions to identify potential issues with comprehension, flow, or response options
Include demographic questions to analyze differences among subgroups
Collect data through various modes, such as online surveys, phone interviews, or in-person interviews
Each mode has advantages and limitations related to cost, response rates, and data quality
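The sample-size and sampling steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed method: it assumes the survey estimates a proportion from a simple random sample, uses the standard formula n = z^2 * p * (1 - p) / e^2 with an optional finite population correction, and the function names `sample_size` and `simple_random_sample` are hypothetical.

```python
import math
import random


def sample_size(margin_of_error, population=None, z=1.96, p=0.5):
    """Approximate sample size needed to estimate a proportion.

    z=1.96 corresponds to a 95% confidence level; p=0.5 is the most
    conservative guess for the true proportion (it maximizes the result).
    """
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    if population is not None:
        # Finite population correction: small populations need fewer responses.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


def simple_random_sample(frame, n, seed=None):
    """Draw n members from the sampling frame, each equally likely, without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(list(frame), n)


if __name__ == "__main__":
    # Hypothetical sampling frame of 2,000 households.
    frame = [f"household-{i}" for i in range(2000)]
    n = sample_size(0.05, population=len(frame))
    print(n, simple_random_sample(frame, 3, seed=42))
```

The formula gives the number of completed responses required, so the number of invitations usually has to be larger to allow for nonresponse.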
Questionnaire Design and Data Collection
Craft clear, unbiased questions that elicit accurate responses
Use simple, jargon-free language that is easily understood by respondents
Avoid leading or loaded questions that suggest a particular answer
Provide clear instructions and definitions for key terms
Choose appropriate question types based on the information needed
Open-ended questions allow respondents to provide detailed, qualitative responses (e.g., "What do you think about the proposed policy?")
Closed-ended questions offer a fixed set of response options, making data analysis easier (e.g., "Do you support or oppose the proposed policy?")
Likert scale questions measure attitudes or opinions on a numeric scale (e.g., "On a scale of 1 to 5, how strongly do you agree with the statement?")
Ranking questions ask respondents to order a set of items by preference or importance
Pretest questions with a small group of respondents to identify potential issues
Check for comprehension, clarity, and ease of answering
Revise questions based on feedback to improve data quality
Collect data through the most appropriate mode for the target population and research question
Online surveys are cost-effective and can reach a large, geographically dispersed sample
Phone interviews allow for more in-depth questioning and can reach respondents without internet access
In-person interviews provide the richest data but are time-consuming and expensive
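Closed-ended and Likert scale answers are easy to analyze because they can be coded as numbers. The sketch below shows one possible way to do that for a five-point agreement scale. The label wording, the `LIKERT` mapping, and the function names are assumptions made for illustration; a real questionnaire may use different labels.

```python
from collections import Counter
from statistics import mean

# Hypothetical five-point agreement scale.
LIKERT = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}


def code_likert(responses):
    """Convert text answers to 1-5 codes, skipping anything not on the scale."""
    coded = []
    for answer in responses:
        key = answer.strip().lower()
        if key in LIKERT:
            coded.append(LIKERT[key])
    return coded


def summarize(coded):
    """Return the number of valid answers, their mean, and a count per scale point."""
    counts = Counter(coded)
    return {
        "n": len(coded),
        "mean": mean(coded),
        "counts": {point: counts.get(point, 0) for point in range(1, 6)},
    }


if __name__ == "__main__":
    answers = ["Agree", " strongly agree ", "Neutral", "n/a", "Disagree"]
    print(summarize(code_likert(answers)))
```

Reporting the full distribution alongside the mean is usually safer, since a mean of 3 can hide a sharply divided audience.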
Crowdsourcing Data and Audiences
Leveraging Crowdsourcing Platforms
Crowdsourcing leverages the collective intelligence and resources of a large group of people, often through online platforms, to gather information, solve problems, or generate ideas
Crowdsourcing can help journalists collect data quickly and cost-effectively
Use general-purpose tools like Google Forms for simple data collection tasks
These platforms are easy to use and offer basic data analysis features
Example: Collecting reader opinions on a local issue through a Google Form embedded in an article
Use specialized crowdsourcing platforms like Ushahidi for more complex projects
These platforms are designed for specific use cases, such as crisis mapping or collaborative investigations
Example: Using Ushahidi to map reports of election irregularities submitted by citizens
Designing and Engaging with Crowdsourcing Projects
Define the problem or question clearly and identify the target audience
Provide clear instructions for participation, including what data to submit and how to submit it
Example: Asking readers to submit photos of potholes in their neighborhood, along with the location and date
Gather a variety of data types, such as personal experiences, observations, opinions, or documents
Projects may involve asking participants to submit photos, videos, or other multimedia content
Example: Collecting personal stories and documents related to a particular issue, such as experiences with the healthcare system
Verify submissions and ensure data quality
Implement processes to validate submissions, detect fraudulent or duplicate entries, and ensure data accuracy
Example: Cross-referencing submitted data with official records or conducting follow-up interviews with selected participants
Engage with participants throughout the project
Communicate regularly with contributors, provide feedback and updates, and acknowledge their contributions
Example: Sending personalized thank-you messages to participants and featuring selected submissions in the final story
Consider ethical issues, such as protecting participant privacy, obtaining informed consent, and ensuring transparency about how data will be used
Example: Providing clear privacy policies and allowing participants to opt out of having their submissions published
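The verification step described above, detecting duplicate entries, can be partly automated. The following is a minimal sketch for the pothole example, assuming each submission is a dictionary with hypothetical `contact`, `location`, and `date` fields. It only catches repeats that differ in capitalization or spacing, so fraud detection and fact-checking still need human review.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivially different entries match."""
    return " ".join(text.lower().split())


def flag_duplicates(submissions):
    """Split submissions into first-seen entries and likely duplicates.

    Two submissions count as duplicates when the same contact reports
    the same location on the same date.
    """
    seen = set()
    unique, duplicates = [], []
    for sub in submissions:
        key = (normalize(sub["contact"]), normalize(sub["location"]), sub["date"])
        if key in seen:
            duplicates.append(sub)
        else:
            seen.add(key)
            unique.append(sub)
    return unique, duplicates


if __name__ == "__main__":
    reports = [
        {"contact": "Ana@Example.com", "location": "5th Ave &  Main St", "date": "2024-07-01"},
        {"contact": "ana@example.com ", "location": "5th ave & main st", "date": "2024-07-01"},
        {"contact": "bo@example.com", "location": "5th Ave & Main St", "date": "2024-07-01"},
    ]
    unique, duplicates = flag_duplicates(reports)
    print(len(unique), "unique,", len(duplicates), "flagged for review")
```

Flagged entries are set aside rather than deleted, so an editor can decide whether they are true duplicates.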
Data Cleaning and Analysis
Preparing Data for Analysis
Clean data by identifying and correcting errors, inconsistencies, or missing values
Remove duplicates, standardize formats, and handle outliers
Example: Converting all date values to a consistent format (YYYY-MM-DD) or removing rows with missing key variables
Validate data by checking whether it meets predefined criteria or rules
Verify that responses fall within acceptable ranges, confirm that required fields are complete, or cross-reference data against external sources
Example: Checking that age values are positive integers within a reasonable range (e.g., 18-120)
Conduct exploratory data analysis (EDA) to understand patterns, relationships, and potential insights
Calculate summary statistics, create visualizations, and identify correlations or trends
Example: Creating a histogram of survey respondents' ages or a scatterplot of two variables to identify potential relationships
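The cleaning and validation examples above (consistent YYYY-MM-DD dates, ages within 18-120, dropping rows with missing key variables) might look like this in plain Python. It is a sketch under assumptions: rows are dictionaries as produced by `csv.DictReader`, the `date` and `age` field names are hypothetical, and the list of accepted date formats is illustrative rather than complete.

```python
from datetime import datetime

# Formats this sketch accepts. Ambiguous inputs such as 03/05/2024 are read
# as month/day/year here; knowing how the data was collected is the only
# reliable way to settle that.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %B %Y")


def to_iso_date(value):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def valid_age(value, low=18, high=120):
    """Return the age as an int if it is a whole number within [low, high], else None."""
    try:
        age = int(str(value).strip())
    except ValueError:
        return None
    return age if low <= age <= high else None


def clean_rows(rows):
    """Standardize valid rows and set aside rows with missing or invalid key fields."""
    cleaned, rejected = [], []
    for row in rows:
        date = to_iso_date(row.get("date") or "")
        age = valid_age(row.get("age"))
        if date is None or age is None:
            rejected.append(row)
        else:
            cleaned.append({**row, "date": date, "age": age})
    return cleaned, rejected


if __name__ == "__main__":
    raw = [
        {"date": "03/05/2024", "age": "34"},
        {"date": "2024-03-06", "age": "17"},
        {"date": "", "age": "40"},
    ]
    cleaned, rejected = clean_rows(raw)
    print(cleaned, len(rejected))
```

Keeping the rejected rows makes it possible to report how many responses were excluded and why.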
Analyzing and Interpreting Results
Use statistical analysis to test hypotheses, compare groups, or build predictive models
Apply methods such as t-tests, ANOVA, chi-square tests, and regression analysis
Example: Using a t-test to compare the mean satisfaction scores between two different customer groups
Weight data to adjust for sampling biases or ensure representativeness
Assign different importance to individual responses based on demographic or other characteristics
Example: Weighting survey responses by age and gender to match the population distribution
Use tools like spreadsheet software (Excel), statistical packages (SPSS, R), and data visualization tools (Tableau, D3.js) for data cleaning, validation, and analysis
Example: Using R to perform a logistic regression analysis and create an interactive visualization of the results
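Two of the techniques above, comparing group means with a t-test and weighting responses to match the population, can be illustrated with the standard library. This is a sketch, not a full analysis: `welch_t` computes only the Welch (unequal-variance) t-statistic and degrees of freedom, leaving the p-value to a statistics package, and `poststratification_weights` assumes a single weighting variable with known population shares. All function names and figures are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic and degrees of freedom for two independent samples."""
    va = variance(a) / len(a)  # squared standard error of each group mean
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def poststratification_weights(groups, population_shares):
    """Weight per group = population share / sample share."""
    counts = Counter(groups)
    n = len(groups)
    return {g: population_shares[g] / (counts[g] / n) for g in counts}


def weighted_share(responses, groups, weights, answer):
    """Weighted proportion of respondents who gave `answer`."""
    total = sum(weights[g] for g in groups)
    hits = sum(weights[g] for r, g in zip(responses, groups) if r == answer)
    return hits / total


if __name__ == "__main__":
    # Younger respondents are 20% of this made-up sample but 40% of the population.
    groups = ["18-34"] * 2 + ["35+"] * 8
    responses = ["support"] * 2 + ["support"] * 2 + ["oppose"] * 6
    weights = poststratification_weights(groups, {"18-34": 0.4, "35+": 0.6})
    print(weights, weighted_share(responses, groups, weights, "support"))
    print(welch_t([4, 5, 4, 5], [2, 3, 2, 3]))
```

In the made-up data, 40% of respondents support the policy before weighting; once the underrepresented younger group is weighted up, the estimate rises to about 55%.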
Data Integration for Storytelling
Combining Original and External Datasets
Integrate original survey or crowdsourced data with other datasets (government databases, academic research, data from other media outlets) for a more comprehensive understanding
Merge datasets based on common variables or keys using techniques like joining tables, concatenating datasets, or aggregating data at different levels of granularity
Example: Combining original survey data on local business sentiment with government data on economic indicators
Integrate geospatial data (maps, satellite imagery) to explore geographic patterns or trends
Use GIS software or mapping libraries to visualize and analyze geospatial data
Example: Mapping crowdsourced reports of crime incidents and comparing them to official police data
Analyze text data (open-ended survey responses, social media posts) using natural language processing (NLP) techniques
Identify common themes, sentiment, or named entities in the text data
Example: Using NLP to analyze open-ended survey responses about experiences with a particular product or service
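Merging on a common key, as in the business sentiment example above, often means aggregating the survey to the level of the external dataset first and then joining. The sketch below assumes both datasets share a hypothetical `county` field; the field names, the figures, and the helper functions are illustrative only.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_key(rows, key, value):
    """Average `value` across all rows that share the same `key`."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row[value])
    return {k: mean(v) for k, v in grouped.items()}


def left_join(external, aggregated, key, new_field):
    """Keep every external row and attach the aggregated value, or None if there is no match."""
    return [{**row, new_field: aggregated.get(row[key])} for row in external]


if __name__ == "__main__":
    survey = [
        {"county": "A", "sentiment": 4},
        {"county": "A", "sentiment": 2},
        {"county": "B", "sentiment": 5},
    ]
    economy = [
        {"county": "A", "unemployment": 3.9},
        {"county": "B", "unemployment": 5.1},
        {"county": "C", "unemployment": 4.4},
    ]
    avg_sentiment = aggregate_by_key(survey, "county", "sentiment")
    for row in left_join(economy, avg_sentiment, "county", "avg_sentiment"):
        print(row)
```

A left join keeps areas with no survey responses visible as gaps instead of silently dropping them, which matters when describing the data's limitations.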
Communicating Insights through Data Visualization
Create clear, accurate, and tailored data visualizations to communicate insights from integrated datasets
Use charts, graphs, maps, and interactive dashboards to present findings
Example: Developing an interactive dashboard that allows readers to explore survey results by demographic group or geographic area
Be transparent about data sources, methods, and limitations
Provide access to raw data and methodology to enhance credibility and reproducibility
Example: Publishing a detailed methodology document that describes the data collection, cleaning, and analysis process
Consider ethical issues in data integration
Protect individual privacy, ensure data security, and avoid misleading or biased interpretations of the combined data
Example: Anonymizing sensitive data before publishing and providing context to help readers interpret the findings accurately
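As a rough illustration of anonymizing data before publication, the sketch below removes direct identifiers and coarsens exact ages into bands. The field names and the `DIRECT_IDENTIFIERS` set are assumptions. Removing these fields reduces risk but does not guarantee anonymity, because combinations of the remaining fields can still identify someone in a small group.

```python
# Hypothetical fields that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def age_band(age, width=10):
    """Replace an exact age with a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymize(rows):
    """Return copies of rows without direct identifiers and with banded ages."""
    published = []
    for number, row in enumerate(rows, start=1):
        safe = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
        if "age" in safe:
            safe["age_band"] = age_band(safe.pop("age"))
        # A sequential ID lets readers refer to a row without revealing who it is.
        safe["respondent_id"] = number
        published.append(safe)
    return published


if __name__ == "__main__":
    raw = [{"name": "X", "email": "x@example.com", "age": 37, "response": "support"}]
    print(anonymize(raw))
```

The original rows are left unchanged, so the newsroom can keep the identified version in secure storage for verification while publishing only the reduced copy.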