SQL PROJECT
COVID-19 Data Analysis
A comprehensive exploration of global pandemic data using advanced SQL techniques to track infection rates, mortality, and vaccination progress.
Project Overview
This project analyzes the Covid Deaths and Covid Vaccinations datasets from "Our World in Data". The goal is to derive meaningful insights regarding the likelihood of mortality, infection rates relative to population, and the global rollout of vaccinations.
The analysis follows a structured approach, starting from basic data exploration and moving towards complex joins and window functions to calculate rolling metrics.
Usage Instructions
- Clone the repository via Git.
- Import the Our World in Data dataset into your SQL Server environment.
- Run the queries below sequentially to reproduce the analysis.
Key Analysis & SQL Queries
1. Total Cases vs Total Deaths
Evaluates the likelihood of dying if you contract COVID-19 in a specific country (e.g., India).
SELECT Location, date, total_cases, total_deaths,
(CAST(total_deaths AS decimal) / total_cases) * 100 AS DeathPercentage
FROM [Covid Data Exploration]..CovidDeaths
WHERE location LIKE '%india%'
AND continent IS NOT NULL
ORDER BY 1,2
2. Total Cases vs Population
Examines what percentage of the population has contracted COVID-19.
SELECT Location, date, population, total_cases,
(total_cases/population)*100 AS DeathPercentage
FROM [Covid Data Exploration]..CovidDeaths
WHERE location LIKE '%india%'
AND continent IS NOT NULL
ORDER BY 1,2
3. Highest Infection Rates
Identifies countries with the highest infection rates relative to their population.
SELECT Location, population, MAX(total_cases) AS HighestInfectionCount,
MAX((total_cases/population))*100 AS PercentPopulationInfected
FROM [Covid Data Exploration]..CovidDeaths
GROUP BY location, population
ORDER BY PercentPopulationInfected DESC
4 & 5. Highest Death Counts (Country & Continent)
Breaking down total death counts by country and by continent to identify the hardest-hit regions.
-- By Country
SELECT location, MAX(total_deaths) AS TotalDeathCount
FROM [Covid Data Exploration]..CovidDeaths
WHERE continent IS NOT NULL
GROUP BY location
ORDER BY TotalDeathCount DESC
-- By Continent
SELECT continent, MAX(total_deaths) AS TotalDeathCount
FROM [Covid Data Exploration]..CovidDeaths
WHERE continent IS NOT NULL
GROUP BY continent
ORDER BY TotalDeathCount DESC
6. Global Statistics
Calculates the total new cases, new deaths, and the global death percentage across the world.
SELECT SUM(new_cases) AS TotalCases,
SUM(new_deaths) AS TotalDeaths,
(SUM(CAST(new_deaths AS decimal)) / SUM(CAST(new_cases AS decimal))) * 100 AS DeathPercentage
FROM [Covid Data Exploration]..CovidDeaths
WHERE continent IS NOT NULL
ORDER BY 1,2
8 & 9. Vaccination Rollout (Window Functions & CTE)
Uses Window Functions to calculate a rolling count of vaccinated people, then uses a CTE (Common Table Expression) to perform calculations on that rolling count.
WITH PopVsVac (Continent, Location, Date, Population, new_vaccinations, RollingPeopleVaccinated)
AS (
SELECT dea.continent, dea.location, dea.date, dea.population, vac.new_vaccinations,
SUM(CONVERT(int, vac.new_vaccinations)) OVER (PARTITION BY dea.location ORDER BY dea.location, dea.date) AS RollingPeopleVaccinated
FROM [Covid Data Exploration]..CovidDeaths dea
JOIN [Covid Data Exploration]..CovidVaccinations vac
ON dea.[location] = vac.[location] AND dea.[date] = vac.[date]
WHERE dea.continent IS NOT NULL
)
SELECT *, (RollingPeopleVaccinated/population)*100 AS PercentVaccinated
FROM PopVsVac
10. Creating Views for Visualization
Storing the complex query logic into a View for later use in visualization tools like Tableau or PowerBI.
CREATE VIEW VaccinationPercentage AS
SELECT dea.continent, dea.location, dea.date, dea.population, vac.new_vaccinations,
SUM(CONVERT(int, vac.new_vaccinations)) OVER (PARTITION BY dea.location ORDER BY dea.location, dea.date) AS RollingPeopleVaccinated
FROM [Covid Data Exploration]..CovidDeaths dea
JOIN [Covid Data Exploration]..CovidVaccinations vac
ON dea.[location] = vac.[location] AND dea.[date] = vac.[date]
WHERE dea.continent IS NOT NULL
Outcomes & Final Thoughts
Through this extensive data exploration project, we successfully analyzed the global impact of COVID-19. Key insights derived include:
Mortality Risk
Established precise death percentages for individual countries, allowing for risk assessment.
Infection Spread
Identified nations with the highest infection rates relative to their population density.
Global Impact
Quantified the global death percentage and isolated the hardest-hit continents.
Vaccination Progress
Tracked rolling vaccination counts and calculated percentage of vaccinated populations.
The skills utilized here - complex joins, CTEs, and window functions - are foundational for any robust data analysis workflow. This project highlights the power of SQL for deep data exploration. By converting raw data into meaningful metrics, we can tell a compelling story about the global impact of the pandemic.