Analyzing NFL Game-Day Weather with Data Science
Weather is something every football fan notices, but its role in shaping NFL games is rarely examined in a systematic way. In this project, I explored how often NFL games are played under different environmental conditions and established a data-driven foundation for understanding weather’s potential impact on the sport.
Using weather data from the 2000-2020 NFL seasons, compiled from Meteostat and Weather Underground, I cleaned and standardized hourly observations of temperature, wind, and precipitation. I transformed continuous weather variables into interpretable indicators for extreme cold, extreme heat, high wind, and precipitation, allowing conditions to be summarized at the game level.
This project was completed to highlight skills developed during my first semester in the program, particularly those from the Data Engineering I: Data Pipeline Architecture course. The workflow emphasized modular data cleaning, feature engineering, and preparation of time-series data for aggregation, all of which are core components of building reliable data pipelines. The project also established a foundation for future Phase 2 analysis examining how environmental conditions influence team performance and game outcomes.
Check out my project’s GitLab page.
NFL Weather Impact Analysis
Project Overview
This project explores how environmental conditions during NFL games vary by year, season and weather category, establishing a foundation for future performance-impact analysis.
Data Source
Weather data was sourced from a Kaggle dataset compiled by Tom Bliss. The dataset contains NFL game-day weather data from 2000-2020, aggregated from Meteostat and Weather Underground.
Files
stadium_coordinates.csv
• StadiumName
• RoofType
• Longitude
• Latitude
• StadiumAzimuthAngle
games.csv
• game_id
• Season
• StadiumName
• TimeStartGame
• TimeEndGame
• TZOffset
games_weather.csv
• game_id
• Source
• DistanceToStation
• TimeMeasure
• Temperature
• DewPoint
• Humidity
• Precipitation
• WindSpeed
• WindDirection
• Pressure
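The three files can be linked through game_id and StadiumName. The sketch below shows one way to do that with Pandas, using a few made-up rows in place of the real CSVs. The sample values (stadium names, roof types, timestamps, temperatures) are hypothetical. It also assumes TimeMeasure, TimeStartGame, and TimeEndGame are in the same time zone, and it leaves TZOffset unused.

```python
import pandas as pd

# Made-up rows that mimic the column layout listed above (not real data).
games = pd.DataFrame({
    "game_id": [1, 2],
    "Season": [2019, 2019],
    "StadiumName": ["Stadium A", "Stadium B"],
    "TimeStartGame": ["2019-12-01 13:00", "2019-12-01 16:25"],
    "TimeEndGame": ["2019-12-01 16:00", "2019-12-01 19:30"],
})
weather = pd.DataFrame({
    "game_id": [1, 1, 1, 2],
    "TimeMeasure": [
        "2019-12-01 12:00", "2019-12-01 14:00",
        "2019-12-01 15:00", "2019-12-01 17:00",
    ],
    "Temperature": [30.0, 28.0, 27.0, 70.0],
})
stadiums = pd.DataFrame({
    "StadiumName": ["Stadium A", "Stadium B"],
    "RoofType": ["Outdoor", "Dome"],  # hypothetical category labels
})

# Parse the timestamp columns so they can be compared.
for col in ("TimeStartGame", "TimeEndGame"):
    games[col] = pd.to_datetime(games[col])
weather["TimeMeasure"] = pd.to_datetime(weather["TimeMeasure"])

# Attach game and stadium attributes to every weather observation.
merged = (
    weather
    .merge(games, on="game_id", how="left")
    .merge(stadiums, on="StadiumName", how="left")
)

# Keep only observations taken between kickoff and the end of the game.
during_game = (
    (merged["TimeMeasure"] >= merged["TimeStartGame"])
    & (merged["TimeMeasure"] <= merged["TimeEndGame"])
)
in_game = merged[during_game]
```

Left joins keep every weather observation even when a game or stadium record is missing, which makes gaps in the source files easier to spot.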
Research Questions (Phase 1)
Phase 1 focuses on descriptive characterization of game-day weather conditions.
- What is the distribution of game-time temperatures?
- What proportion of games occur in extreme cold (<32°F) or extreme heat (>85°F)?
- How common are precipitation games?
- How does wind speed vary by season?
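The questions above mostly come down to boolean flags and group averages. The following is a minimal sketch on made-up game-level rows, assuming Temperature is recorded in °F and that "season" refers to the Season column. The cold and heat cutoffs come from the questions above. The high-wind cutoff (HIGH_WIND) is a hypothetical value chosen only for illustration, as are the new column names.

```python
import pandas as pd

# Made-up game-level values for illustration (not real data).
game_weather = pd.DataFrame({
    "game_id": [1, 2, 3, 4],
    "Season": [2018, 2018, 2019, 2019],
    "Temperature": [25.0, 60.0, 90.0, 45.0],   # assumed to be in °F
    "Precipitation": [0.0, 0.2, 0.0, 0.0],
    "WindSpeed": [5.0, 18.0, 8.0, 12.0],
})

COLD_F = 32      # extreme cold: below freezing
HOT_F = 85       # extreme heat
HIGH_WIND = 15   # hypothetical threshold, not taken from the project

# Boolean indicator columns, one per weather category.
game_weather["extreme_cold"] = game_weather["Temperature"] < COLD_F
game_weather["extreme_heat"] = game_weather["Temperature"] > HOT_F
game_weather["precip_game"] = game_weather["Precipitation"] > 0
game_weather["high_wind"] = game_weather["WindSpeed"] > HIGH_WIND

# The mean of a boolean column is the share of games with that condition.
flags = ["extreme_cold", "extreme_heat", "precip_game", "high_wind"]
shares = game_weather[flags].mean()

# Average wind speed per season.
wind_by_season = game_weather.groupby("Season")["WindSpeed"].mean()
```

Keeping the thresholds as named constants makes it easy to test how sensitive the proportions are to the cutoffs.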
Skills Demonstrated
- Data Cleaning & Preprocessing: Standardized column names, parsed datetime fields, handled missing values.
- Feature Engineering: Created interpretable indicators for extreme temperature, high wind, and precipitation events.
- Exploratory Data Analysis & Visualization: Aggregated hourly weather data to game-level summaries and visualized patterns across seasons.
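The cleaning and aggregation steps listed above might look roughly like the sketch below. It is not the project's actual code. The helper to_snake_case, the output column names, and the sample rows are hypothetical, and treating missing precipitation as zero is an assumption that would need to be checked against the data.

```python
import re
import pandas as pd

def to_snake_case(name: str) -> str:
    """Convert a CamelCase column name such as 'WindSpeed' to 'wind_speed'."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()

# Made-up hourly observations with a couple of gaps (not real data).
hourly = pd.DataFrame({
    "game_id": [1, 1, 1, 2, 2],
    "TimeMeasure": [
        "2019-12-01 13:00", "2019-12-01 14:00", "2019-12-01 15:00",
        "2019-09-08 13:00", "2019-09-08 14:00",
    ],
    "Temperature": [30.0, None, 28.0, 88.0, 90.0],
    "WindSpeed": [10.0, 12.0, 20.0, 3.0, 5.0],
    "Precipitation": [0.0, 0.1, None, 0.0, 0.0],
})

# 1. Standardize column names.
hourly.columns = [to_snake_case(c) for c in hourly.columns]

# 2. Parse the datetime field.
hourly["time_measure"] = pd.to_datetime(hourly["time_measure"])

# 3. Handle missing values. Assumption: a missing precipitation reading
#    means no precipitation. Missing temperatures stay as NaN, which the
#    mean below skips by default.
hourly["precipitation"] = hourly["precipitation"].fillna(0.0)

# 4. Aggregate hourly rows into one summary row per game.
game_level = (
    hourly.groupby("game_id")
    .agg(
        mean_temp=("temperature", "mean"),
        max_wind=("wind_speed", "max"),
        total_precip=("precipitation", "sum"),
        n_obs=("time_measure", "count"),
    )
    .reset_index()
)
```

Named aggregation keeps the game-level column names explicit, so the indicator step can refer to mean_temp or max_wind without guessing which statistic was used.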
Tools
- Python
- Pandas
- NumPy
- Matplotlib
Project Roadmap
Phase 1 focuses on characterizing environmental conditions in NFL games.
Phase 2 will incorporate team-level game outcomes and stadium data to evaluate how performance changes under different weather conditions.
Limitations and Future Work
This dataset does not include team-level identifiers or game outcomes. Future work will integrate game results data to assess performance under different weather conditions.


