🪓Data Journalism Unit 10 – Data Journalism: Managing & Querying Databases
Databases are the backbone of data journalism, enabling reporters to store, organize, and analyze vast amounts of information. This unit introduces SQL, the language for managing relational databases, and covers essential concepts like tables, keys, and relationships.
Students learn to write SQL queries, extract insights from data, and apply best practices for data management. The unit emphasizes real-world applications in journalism, such as analyzing public records and tracking campaign finance data, while addressing common challenges in database work.
Explores the fundamental role of databases in data journalism for storing, organizing, and retrieving large amounts of structured data
Introduces SQL (Structured Query Language) as the standard language for managing and querying relational databases
Covers essential database concepts such as tables, rows, columns, primary keys, foreign keys, and relationships
Emphasizes the importance of understanding database design principles to ensure data integrity and efficient querying
Provides hands-on experience with writing SQL queries to extract, filter, sort, and aggregate data from databases
Discusses best practices for data management, including data validation, normalization, and backup strategies
Highlights real-world applications of databases and SQL in data journalism projects, such as analyzing public records or tracking campaign finance data
Identifies common pitfalls and challenges when working with databases and offers practical tips to overcome them
Key Concepts & Terms
Relational database: A type of database that organizes data into tables with rows and columns, establishing relationships between tables using keys
SQL (Structured Query Language): The standard language used to manage and query relational databases
Table: A collection of related data organized in rows and columns, representing a specific entity or concept (e.g., customers, articles, or transactions)
Row: A single record or instance of data within a table, containing values for each column
Column: A specific attribute or field of data within a table, defining the data type and constraints for each value in that column
Primary key: A unique identifier for each row in a table, ensuring data integrity and enabling efficient querying
Foreign key: A column in one table that refers to the primary key of another table, establishing a relationship between the two tables
JOIN: An SQL operation that combines rows from two or more tables based on a related column between them
Normalization: The process of organizing data in a database to minimize redundancy and dependency, ensuring data integrity and reducing data anomalies
Involves dividing data into smaller, more focused tables and establishing relationships between them using keys
Database Basics
Databases provide a structured way to store, organize, and retrieve large volumes of data
Relational databases, such as MySQL, PostgreSQL, and SQLite, are the most common type used in data journalism
They organize data into tables with rows and columns, establishing relationships between tables using keys
Each table represents a specific entity or concept, such as articles, authors, or sources
Columns define the attributes or fields of data for each entity (e.g., title, publication_date, or word_count)
Rows contain the actual data values for each instance of the entity
Primary keys uniquely identify each row within a table, ensuring data integrity and enabling efficient querying
Often an auto-incrementing integer or a unique combination of columns
Foreign keys establish relationships between tables by referring to the primary key of another table
Enable the creation of complex queries that combine data from multiple tables
Database design involves determining the appropriate tables, columns, and relationships to efficiently store and retrieve data
Proper design minimizes data redundancy and ensures data integrity through normalization techniques
SQL: Your New Best Friend
SQL (Structured Query Language) is the standard language for managing and querying relational databases
Enables journalists to extract, filter, sort, and aggregate data from databases using a set of simple yet powerful commands
Basic SQL syntax consists of keywords (e.g., SELECT, FROM, WHERE, ORDER BY) and clauses that define the desired data and operations
SELECT specifies the columns to retrieve
FROM indicates the table(s) to query
WHERE filters rows based on specified conditions
ORDER BY sorts the result set by one or more columns
SQL supports a wide range of functions and operators for manipulating and transforming data