A Major League Baseball Club Project 6

Term: Spring 2023

Faculty Advisor: Prof. Bhaven Mistry

Project Description:

The Dodgers baseball club and its sports analytics division have proposed a data analysis project that aims to quantify how often outfielders have a chance to throw out runners at home plate and third base, based on player location and game situation. The project also involves developing a model that predicts the probability of throwing a runner out from the outfield given the conditions of the play, and using that model to evaluate the decisions of the first- and third-base coaches to send or hold a runner. The student team is expected to create a well-documented code repository in R or Python that computes these metrics, builds the model, and uses it to determine whether a runner should have tried to advance or held at the same base.
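Once the model produces an out probability for a play, it still has to be turned into a send/hold verdict. One common way to do this, offered here only as an assumption and not as the client's method, is an expected-value comparison: send the runner if the expected run value of sending beats the run value of holding. The minimal Python sketch below assumes the three run values (runner safe, runner out, runner held) are supplied from elsewhere, for example a run-expectancy table; all function and parameter names are hypothetical.

```python
def expected_runs_if_sent(p_out: float, runs_if_safe: float, runs_if_out: float) -> float:
    """Expected run value of sending the runner, given the model's out probability."""
    if not 0.0 <= p_out <= 1.0:
        raise ValueError("p_out must be a probability in [0, 1]")
    # Weight each outcome's (assumed, externally supplied) run value by its probability.
    return (1.0 - p_out) * runs_if_safe + p_out * runs_if_out


def should_send(p_out: float, runs_if_safe: float, runs_if_out: float, runs_if_held: float) -> bool:
    """Hypothetical decision rule: send only if sending is worth more than holding."""
    return expected_runs_if_sent(p_out, runs_if_safe, runs_if_out) > runs_if_held


def break_even_out_probability(runs_if_safe: float, runs_if_out: float, runs_if_held: float) -> float:
    """Out probability at which sending and holding have equal expected value."""
    # Solve (1 - p) * safe + p * out = held for p  =>  p = (safe - held) / (safe - out).
    gain = runs_if_safe - runs_if_out
    if gain <= 0:
        raise ValueError("runs_if_safe must exceed runs_if_out")
    p = (runs_if_safe - runs_if_held) / gain
    # Clamp to [0, 1] so the result is always interpretable as a probability.
    return min(max(p, 0.0), 1.0)
```

For example, with illustrative (not real) run values of 1.0 if safe, 0.0 if out, and 0.5 if held, `should_send(0.2, 1.0, 0.0, 0.5)` is `True`, and the break-even out probability is 0.5: a coach should send whenever the model puts the out chance below that threshold. Framing the verdict as a threshold makes it easy to grade past coaching decisions against the model's predictions.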

Students who participate in this project must have knowledge of regression and machine learning techniques, as well as an understanding of cross-validation for model evaluation. They need to be able to implement these methods in R or Python; the ability to write maintainable code that can be easily incorporated into the client's pipeline is preferred. The client will provide a dataset of plays from the 2019-2022 MLB regular seasons and postseasons in which the ball is hit to the outfield, with tracking information that includes the identity of the fielder, the identity of the runner, the trajectory of the throw, the locations of the players, the state of the game, and the result of the play. Visualizations for communicating model results would also be valuable, though they are not required.
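Because the model outputs a probability of an out, cross-validation would typically score it with a probabilistic metric such as log loss. The dependency-free Python sketch below shows k-fold cross-validation under that assumption; the "model" is a hypothetical baseline that simply predicts the training-fold out rate, standing in for whatever regression or machine learning model the team builds, and all names are illustrative. In a real repository, library utilities such as scikit-learn's `KFold` and `log_loss` would likely replace the hand-written helpers.

```python
import math
import random
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle row indices and split them into k (train, validation) index pairs."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin assignment to k folds
    splits = []
    for i, val in enumerate(folds):
        # Training rows are every row not in the current validation fold.
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        splits.append((train, val))
    return splits


def log_loss(y_true: Sequence[int], p_pred: Sequence[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy; 1 = runner thrown out, 0 = runner safe."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def cross_validate_baseline(outs: Sequence[int], k: int = 5, seed: int = 0) -> float:
    """Mean validation log loss of a baseline predicting the training-fold out rate."""
    scores = []
    for train, val in k_fold_indices(len(outs), k, seed):
        rate = sum(outs[j] for j in train) / len(train)  # "fit" the baseline
        scores.append(log_loss([outs[j] for j in val], [rate] * len(val)))
    return sum(scores) / len(scores)
```

A candidate model would be scored the same way, fitting on each training fold and predicting on the held-out fold, and it is only worth adopting if its mean validation log loss beats this baseline.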