Proposal Overview
- Summary of Research Topic: This project investigates U.S. flight delay patterns using descriptive analysis and predictive modeling to identify drivers beyond weather, such as staffing shortages, maintenance issues, aircraft/crew rotations, and airport congestion. We will integrate BTS on-time performance data with weather, traffic, and operational signals to quantify trends across airlines, airports, routes, and times of day, compare reliability between carriers and hubs, and model both the likelihood and duration of delays. The goal is to provide actionable insight for airlines and airports to inform scheduling, staffing, and resource allocation. By centering controllable operational factors alongside weather, this work addresses gaps in prevailing, weather-centric models and provides evidence to guide more effective industry responses.
- Scope: Exploring flight delay patterns with a focus on post-COVID changes. Identifying key causes of delays (weather vs. controllable factors like staffing, maintenance, scheduling). Building machine learning models to predict both the likelihood and length of delays.
- Research Questions:
- Question 1: Have flight delays increased since the COVID-19 pandemic? If so, which factors explain the change: staff shortages, increased travel demand, or weather?
- Question 2: Can we predict flight delays?
- Question 3: Do certain airline companies have more flight delays?
- Question 4: Do certain airports have more flight delays?
- Question 5: Are delays for current flights predicted in a timely fashion, and how often?
- Question 6: Which general categories of variables (weather, staffing, mechanical, or other) have the most impact on whether a flight will be delayed?
- Question 7: Which specific variables are most informative about how long a flight will be delayed?
- Question 8: How reliable can delay predictions be? Is there too much variability in records to be able to make good predictions?
- Question 9: Do flights to or from certain locations (east vs west coast, or down to airport precision) or at certain times (midnight vs midday) tend to be delayed more?
- Question 10: Can patterns between flight location/time and delays be explained in an actionable way, such that airport managers could work to improve the underlying problems?
- Question 11: How does the precision of a prediction change with the addition of more information? Are extra features impactful to a model, or are delays unpredictable beyond whether they will occur or not? For example: If a flight delay has been announced, how accurate would a prediction be solely based on that announcement compared to knowing that the delay was caused by weather?
- Question 12: How can delay predictions best be conveyed to commuters? Could there be a data visualization method that might be sub-optimal in a data science setting, but which would be more easily understandable to a layperson within an airport? (This question cannot be answered purely from data, so it might not be usable.)
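Several of the questions above (the carrier, airport, and time-of-day comparisons, and the split between how likely a delay is and how long it lasts) reduce to the same grouped summary. The following is a minimal sketch in Python, not the project's actual pipeline. It assumes each flight is a record with a hypothetical `arr_delay_min` field, and it uses the 15-minute arrival threshold that BTS applies when counting a flight as delayed. The `summarize_delays` helper, the field names, and the sample records are all illustrative assumptions.

```python
from collections import defaultdict

# Assumption: a flight counts as delayed when it arrives 15 or more minutes
# behind schedule (the BTS convention). Exposed as a parameter for tuning.
DELAY_THRESHOLD_MIN = 15


def summarize_delays(flights, key, threshold=DELAY_THRESHOLD_MIN):
    """Group flights by `key` and return, per group, the number of flights,
    the share that were delayed, and the mean delay in minutes among the
    delayed flights only (None when the group has no delayed flights)."""
    counts = defaultdict(int)
    delayed_minutes = defaultdict(list)
    for flight in flights:
        group = flight[key]
        counts[group] += 1
        if flight["arr_delay_min"] >= threshold:
            delayed_minutes[group].append(flight["arr_delay_min"])

    summary = {}
    for group, total in counts.items():
        delays = delayed_minutes[group]
        summary[group] = {
            "flights": total,
            "delay_rate": len(delays) / total,
            "mean_delay_min": sum(delays) / len(delays) if delays else None,
        }
    return summary


# Made-up records with placeholder carrier and airport codes; not real data.
sample_flights = [
    {"carrier": "XX", "origin": "AAA", "arr_delay_min": 42},
    {"carrier": "XX", "origin": "AAA", "arr_delay_min": -5},
    {"carrier": "XX", "origin": "BBB", "arr_delay_min": 18},
    {"carrier": "YY", "origin": "CCC", "arr_delay_min": 3},
    {"carrier": "YY", "origin": "CCC", "arr_delay_min": 60},
    {"carrier": "YY", "origin": "AAA", "arr_delay_min": 0},
]

by_carrier = summarize_delays(sample_flights, "carrier")  # Question 3
by_origin = summarize_delays(sample_flights, "origin")    # Question 4
```

Reporting the delay rate separately from the mean delay among delayed flights mirrors the proposal's distinction between the likelihood and the length of a delay. Passing a different `key`, such as a scheduled departure hour field, would give the time-of-day comparison in the same way.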
Datasets
Bureau of Transportation Statistics (BTS) — On-Time Performance
Kaggle — Flight Delays
COVID-19 Air Travel