Discrepancies between reported knuckleball spin rates and dynamics

Major League Baseball tracks every pitch of every game and provides the public with access to a significant amount of data for each pitch. This level of data availability is not seen in many other sports and offers substantial opportunities to provide analysis on the sport. Major League Baseball does not provide the raw data but provides refined data. While the algorithms for transforming the raw data are robust, the algorithms are not perfect. This study investigates the specific case of knuckleball pitches and shows that the equations of motion and reported spin rates do not align. This study uses the modified shooting method combined with the Levenberg–Marquardt algorithm to determine the trajectories of the pitches based on the provided data and known equations of motion. This study investigates three scenarios: 1) the reported data are correct; 2) the pitches are correctly identified, but the spin rates are incorrect; and 3) the spin rates are correct, but the pitches are incorrectly identified as knuckleballs. We show that the reported data are inconsistent with the equations of motion and that, based on statistical analysis, the pitch identification is likely incorrect.


Introduction
The knuckleball is a pitch that few pitchers in Major League Baseball (MLB) can throw successfully. A knuckleball has a very low spin rate; a spin rate of 25 to 50 rpm is typical, with an upper threshold of 150 rpm. Because of the low spin rate, its movement is challenging to predict, and often neither the pitcher nor the catcher knows exactly where the ball will go. In baseball, a pitcher must be able to throw strikes consistently, so the challenge in mastering the knuckleball is to throw the pitch so that it is routinely a strike. Few pitchers have been able to achieve this level of mastery.
The challenge arises partly because dynamical chaos appears to explain how the knuckleball works [1]. The ball's movement is subject to the laws of physics, so as long as the initial conditions are appropriate, the ball's trajectory will result in a strike. However, understanding what physics says must be done and implementing the correct initial conditions are two very different tasks.
Watts and Sawyer [2] were some of the first investigators to research the motion of the knuckleball. They performed wind tunnel tests and identified two possible sources that could explain the observed behavior of the knuckleball. In more recent work, Borg and Morrissey [3] included flow visualization in their analysis and determined the contribution of shear stress to the observed knuckleball trajectories. Higuchi and Kiura [4] used Digital Particle Image Velocimetry to analyze the airflow around the knuckleball. Because of potential issues in wind tunnel testing, Smith and Sciacchitano [5] studied baseball movement in free flight. Healey and Wang [6] similarly investigated baseball trajectories outside the laboratory setting.
Escalera Santos et al. [7] used experimental data and analysis to investigate the seam force, in addition to the Magnus force, contributing to the trajectory of a knuckleball. In recent work, Rooney et al. [8] investigated the wake behind a baseball, including at spin rates consistent with a knuckleball.
Nathan [9] used PITCHf/x data to analyze the trajectory of knuckleballs. His results showed that knuckleballs follow smooth trajectories but that the deflections from a straight-line trajectory are random in direction and magnitude.
Knuckleballs appear in a variety of sports [10]. The exact cause of the unpredictable motion of the knuckleball varies from sport to sport. Kensrud and Smith [11] examined a variety of balls from different sports and found that there is a drag crisis for a knuckleball in baseball. Aguirre-Lopez et al. [12] found that the baseball's seams also impact the knuckleball's trajectory.
All of these previous studies have used various techniques to investigate the motion of the knuckleball. Because of the knuckleball's unpredictable behavior, it is a more significant challenge for an automated process to identify when a pitch is a knuckleball. Other researchers have already investigated appropriate ways to extract pitch identification from video [13,14,15,16,17,18]. This study does not investigate the raw data's transformation into the provided positions, velocities, accelerations, and spin rates. Instead, this study investigates whether or not the provided information matches the equations of motion derived previously.
Work similar to the method presented in this study was performed for baseballs in general by Aguirre-Lopez et al. [19]. That work separated the Magnus force, solved a two-point boundary value problem, and required three points along the trajectory. We cannot implement this exact method because the Statcast data provided by Baseball Savant [20] and used in this study provide only two positions along the trajectory. However, the Statcast data do provide velocity and acceleration at a third point. In addition, the two-step method of introducing the Magnus force is not of significant benefit to this study because the contributions of the Magnus force are significantly smaller for knuckleballs than for other types of pitches.
The contribution of this study is that it is the first to analyze the accuracy of the Statcast pitches labeled as knuckleballs. By inspection, it can be determined that the reported spin rates are not consistent with knuckleballs. This paper intends to identify whether the spin rates or the pitch identification causes the inaccuracy. We show that the trajectories using the equations of motion for a knuckleball do not provide results consistent with the reported spin rates. These initial findings are expected since the reported spin rates are too high to adhere to the low spin rates that generally define a knuckleball. We then investigate whether the equations of motion provide consistent results if the spin rates are adjusted as part of the solution procedure. In addition, we investigate whether the pitch identification is inaccurate. Our results show that, based on the available data, either the pitch identification is incorrect or the spin rate is inaccurately determined in all instances.

Experimental
This study uses published equations of motion that explain the trajectory of a knuckleball [21]. Without independent data to verify the published model, this study relies on the validation presented in the source. We queried the Baseball Savant database for the record of every knuckleball pitch from 2008 through 2021. The Statcast data are computed from various sensors located within each MLB stadium [22]. Different systems (PITCHf/x, Trackman, and Hawk-Eye) have been in use over the time span of the database. The systems' accuracy is continually improving, but even the earliest systems achieved accuracy within a fraction of an inch. The database does not indicate which system was used for which record. This study only investigates records that include the spin rate. The oldest record in the database that includes a reported spin rate is from April 8, 2015. The dataset does not have the complete state of the ball at any particular moment in time, but partial state information is available at three different points in time: (i) the position vector, velocity magnitude, and spin rate when the ball is released; (ii) the velocity vector and acceleration vector when the ball is 50 feet from home plate; and (iii) the position vector when the ball crosses the front of home plate.
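As a minimal sketch of how the partial state at these three points could be organized before solving, the snippet below groups the fields of one record and converts them to feet, seconds, and radians. The column names (`release_pos_x`, `vx0`, `plate_x`, and so on) and the input units (release speed in miles per hour, spin rate in revolutions per minute) are assumptions based on the public Baseball Savant CSV export, not something stated in this study, and should be checked against the actual download. The numerical values are hypothetical.

```python
import math
from dataclasses import dataclass

MPH_TO_FPS = 5280.0 / 3600.0         # miles per hour -> feet per second
RPM_TO_RAD_S = 2.0 * math.pi / 60.0  # revolutions per minute -> radians per second


@dataclass
class PitchRecord:
    """Partial state of one pitch at the three points described in the text."""
    release_pos: tuple    # (x, y, z) at release, feet
    release_speed: float  # speed at release, feet per second
    spin_rate: float      # spin rate at release, radians per second
    vel_50: tuple         # velocity vector at y = 50 ft, feet per second
    acc_50: tuple         # acceleration vector at y = 50 ft, feet per second^2
    plate_pos: tuple      # (x, z) crossing the front of home plate, feet


def record_from_row(row):
    """Build a PitchRecord from a dict keyed by the ASSUMED column names."""
    return PitchRecord(
        release_pos=(row["release_pos_x"], row["release_pos_y"], row["release_pos_z"]),
        release_speed=row["release_speed"] * MPH_TO_FPS,
        spin_rate=row["release_spin_rate"] * RPM_TO_RAD_S,
        vel_50=(row["vx0"], row["vy0"], row["vz0"]),
        acc_50=(row["ax"], row["ay"], row["az"]),
        plate_pos=(row["plate_x"], row["plate_z"]),
    )


# Hypothetical values for illustration only; not a record from the dataset.
example_row = {
    "release_pos_x": -1.2, "release_pos_y": 54.5, "release_pos_z": 5.9,
    "release_speed": 75.0, "release_spin_rate": 150.0,
    "vx0": 2.0, "vy0": -109.0, "vz0": -1.5,
    "ax": 1.0, "ay": 20.0, "az": -30.0,
    "plate_x": 0.3, "plate_z": 2.4,
}
record = record_from_row(example_row)
print(record.release_speed, record.spin_rate)
```

Keeping all quantities in one unit system at load time avoids mixing miles per hour and revolutions per minute with the feet-and-radians form of the equations of motion.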
Without complete state information, we cannot determine the trajectory by solving the equations of motion as an initial value problem. A problem with Dirichlet boundary conditions would have the position known at the beginning and end of the trajectory. In contrast, a problem with Neumann boundary conditions would have the velocities known at the trajectory's start and end [23]. The dataset does include the Cartesian position at both the beginning and end of the trajectory. However, the time of the end of the trajectory is unknown, and the rotation angle at both points is unknown. It is impossible to use any two provided points to recreate the trajectory. As will be subsequently explained, our approach uses a modification of the shooting method [23] to determine the ball's trajectory.
The coordinate system for the Baseball Savant data is a right-handed coordinate system with the y-axis along the line from the pitcher to the catcher, the z-axis vertical to the ground, and the x-axis orthogonal to the other two, with values of x increasing from the third base side to the first base side. The original equations of motion presented by Giordano and Nakanishi [21] were in an alternatively defined coordinate system. Converting the equations to the Baseball Savant coordinate system results in equations (1)-(4).
where ϕ is the rotation angle in radians, ω is the rotation rate in radians per second, g is the acceleration due to gravity (32.185 feet per second squared), and v is the speed of the ball in feet per second, calculated with equation (5). The F and G functions are defined by equations (6) and (7), respectively.
The shooting method is initiated with an initial estimate of the missing state variables and propagates the equations of motion [23]. At a future state, where information is known, the propagated state is compared to the known information. This work uses a root-finding algorithm to iteratively determine the initial state such that the propagated state matches the known information. For the knuckleball equations of motion, equations (1)-(4), the complete state has nine components: (i) time, t (seconds); (ii) x-position, x (feet); (iii) y-position, y (feet); (iv) z-position, z (feet); (v) rotation angle, ϕ (radians); (vi) x-velocity, ẋ (feet per second); (vii) y-velocity, ẏ (feet per second); (viii) z-velocity, ż (feet per second); and (ix) rotation rate, ω (radians per second). If we have all nine components for a single point, then the trajectory can be solved as an initial value problem, but we do not have all components at a single point. Therefore, we require information from an additional point to solve the trajectory. Typically, the time is known for all points, but the Baseball Savant dataset does not provide any time information. This study defines the release point as time 0. However, the other two points are defined by y-position, so the y-position can be used to relate the points to each other in the way that time typically does.
For each of the eight state elements (other than y-position) that is unknown at the epoch, we must know a value related to the trajectory at a future point in order to solve the system of equations. The shooting method uses a root-finding method to determine the unknown values of the initial state subject to the trajectory meeting the known values at the additional points. To solve the system of ordinary differential equations, we used an adaptive step-size Runge-Kutta method using a combination of fourth- and fifth-order solutions. The root-finding problem utilizes the Levenberg-Marquardt algorithm. The Baseball Savant data provide the initial position, but the initial velocity vector and initial rotation angle are unknown. While the spin rate is provided, the hypothesis that the spin rate is inaccurate initiated this study. Three different analyses are performed: (i) the data are assumed to be accurate, so the spin rate from the data is used along with the knuckleball equations of motion (equations (1)-(4)); (ii) for the 2008 to 2021 dataset, the spin rate is considered erroneous, so the spin rate is unknown, and the knuckleball equations of motion are used; and (iii) for the 2008 to 2021 dataset, the spin rate is considered to be accurate, but the identification of the type of pitch is deemed to be inaccurate, so more general equations of motion are used (equations (8)-(11)).
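The shooting procedure described above can be sketched in a deliberately reduced form. The sketch below is not the study's implementation: it replaces the knuckleball equations with gravity plus a simple quadratic drag term (the drag constant, the value of g, and the front-of-plate location at y = 17/12 feet are all assumptions), solves for only three unknowns (the release velocity) against the velocity at y = 50 feet, uses a fixed-step fourth-order Runge-Kutta integrator in place of the adaptive fourth/fifth-order method, and uses Newton iteration with a finite-difference Jacobian in place of Levenberg-Marquardt. It uses the y-position, rather than time, to decide when a comparison point has been reached.

```python
import math

G = 32.174       # ft/s^2, standard gravity (assumed for this sketch)
K_DRAG = 0.0015  # 1/ft, illustrative drag constant (assumed)
H = 1.0e-3       # s, fixed integration step


def deriv(s):
    """Time derivative of s = [x, y, z, vx, vy, vz] under gravity and drag."""
    vx, vy, vz = s[3:]
    v = math.sqrt(vx * vx + vy * vy + vz * vz)
    return [vx, vy, vz, -K_DRAG * v * vx, -K_DRAG * v * vy, -K_DRAG * v * vz - G]


def rk4_step(s, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = deriv(s)
    k2 = deriv([a + 0.5 * h * b for a, b in zip(s, k1)])
    k3 = deriv([a + 0.5 * h * b for a, b in zip(s, k2)])
    k4 = deriv([a + h * b for a, b in zip(s, k3)])
    return [a + h / 6.0 * (p + 2 * q + 2 * r + w)
            for a, p, q, r, w in zip(s, k1, k2, k3, k4)]


def propagate_to_y(s, y_target, max_steps=10000):
    """Integrate until the ball reaches y_target (y decreases toward the plate)."""
    for _ in range(max_steps):
        nxt = rk4_step(s, H)
        if nxt[1] <= y_target:
            # Shorten the last step so that it ends (almost exactly) on y_target.
            frac = (s[1] - y_target) / (s[1] - nxt[1])
            return rk4_step(s, H * frac)
        s = nxt
    raise RuntimeError("ball never reached y_target")


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, 3):
            f = m[r][c] / m[c][c]
            for k in range(c, 4):
                m[r][k] -= f * m[c][k]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        x[r] = (m[r][3] - sum(m[r][k] * x[k] for k in range(r + 1, 3))) / m[r][r]
    return x


def shoot(release_pos, vel_50, guess, tol=1e-5, max_iter=20, eps=1e-3):
    """Find the release velocity whose propagated velocity at y = 50 ft matches vel_50."""
    def residual(v0):
        s = propagate_to_y(list(release_pos) + list(v0), 50.0)
        return [s[3 + i] - vel_50[i] for i in range(3)]

    v0 = list(guess)
    for _ in range(max_iter):
        r = residual(v0)
        if max(abs(c) for c in r) < tol:
            break
        cols = []  # finite-difference Jacobian, one column per unknown
        for j in range(3):
            vp = v0[:]
            vp[j] += eps
            rp = residual(vp)
            cols.append([(rp[i] - r[i]) / eps for i in range(3)])
        jac = [[cols[j][i] for j in range(3)] for i in range(3)]
        step = solve3(jac, [-c for c in r])
        v0 = [a + d for a, d in zip(v0, step)]
    return v0


# Synthetic check: create "reported" data from a known release velocity, then recover it.
release = (-1.5, 55.0, 6.0)
true_v0 = (3.0, -110.0, -2.0)
reported_v50 = propagate_to_y(list(release) + list(true_v0), 50.0)[3:]
solved_v0 = shoot(release, reported_v50, guess=(0.0, -100.0, 0.0))
plate = propagate_to_y(list(release) + solved_v0, 17.0 / 12.0)
print("recovered release velocity:", [round(c, 4) for c in solved_v0])
print("plate crossing (x, z):", round(plate[0], 3), round(plate[2], 3))
```

In the full problem the unknown vector would also contain the rotation angle (and, in the second analysis, the spin rate), and the residual vector would include plate-crossing information, but the loop of guess, propagate, compare, and correct is the same.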
Four or five unknown components of the initial state exist for the three cases considered.The three components of the velocity vector and the rotation angle are unknown for all performed analyses.The spin rate is also unknown for the second analysis.
For the analyses with four unknowns, the four pieces of data used by the root-finding algorithm are the three elements of the velocity vector when the ball is 50 feet from home plate and the Euclidean distance from the origin when the ball crosses home plate. The Euclidean distance allows both position components to be considered rather than using only one of them. For the analysis with five unknowns, the velocity vector when the ball is 50 feet from home plate still accounts for three of the five pieces of information. The other two pieces of information are the x- and z-positions of the ball crossing home plate rather than the Euclidean distance.
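To make the two residual definitions concrete, the following sketch assembles the four-element and five-element residual vectors from a predicted and a reported state. The function names and the numerical values are hypothetical, and the Euclidean distance is taken here as the square root of x² + z² at the plate crossing, which is our reading of the description above.

```python
import math


def residuals_four(pred_vel_50, pred_plate, obs_vel_50, obs_plate):
    """Four residuals: three velocity components at y = 50 ft plus the
    Euclidean distance of the plate crossing (x, z) from the origin."""
    res = [p - o for p, o in zip(pred_vel_50, obs_vel_50)]
    res.append(math.hypot(*pred_plate) - math.hypot(*obs_plate))
    return res


def residuals_five(pred_vel_50, pred_plate, obs_vel_50, obs_plate):
    """Five residuals: three velocity components plus plate x and z separately."""
    res = [p - o for p, o in zip(pred_vel_50, obs_vel_50)]
    res.extend(p - o for p, o in zip(pred_plate, obs_plate))
    return res


# Hypothetical values: the predicted crossing swaps x and z of the reported one.
vel = (2.0, -109.0, -1.5)
obs_plate = (1.2, 1.6)
pred_plate = (1.6, 1.2)
r4 = residuals_four(vel, pred_plate, vel, obs_plate)
r5 = residuals_five(vel, pred_plate, vel, obs_plate)
print(r4)
print(r5)
```

In this constructed example the distance residual is essentially zero even though each position component is off by 0.4 feet. The four-residual form constrains only the combined distance, whereas the five-residual form constrains x and z individually.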
This study hypothesizes that all the pitches in the dataset labeled as knuckleballs contain incorrect values.In particular, the reported spin rates are too large in many instances.Either the spin rate is inaccurate, or the spin rate is accurate, but the pitch is not a knuckleball.
Because a knuckleball has a low spin rate, the Magnus force does not significantly contribute to the ball's trajectory. At higher rotation rates, the Magnus force must be considered. Equations (8)-(11), taken from the textbook by Giordano [21], are the equations of motion for an arbitrary baseball pitch.
where B is a dimensionless quantity that approximates the averaging of the drag force over the face of the ball per unit mass.

Results and Discussion
Because the research hypothesis focused on the reported spin rate, our analysis only investigated records that included the reported spin rate. There are 762 records in the Baseball Savant dataset that are labeled as knuckleballs and have reported spin rates. In total, there are seven unique pitchers in the 762 records. Table 1 presents the number of records and the minimum, maximum, and mean reported spin rates for each of the seven pitchers and over all 762 records. There does not appear to be a correlation between pitcher and reported spin rate.

Analysis Using Reported Spin Rate
If the root-finding algorithm cannot find a solution using known equations of motion with data that should adhere to those equations, there is an inconsistency between the data and the equations of motion. For this analysis, there was one instance out of 762 records where the root-finding algorithm could not determine a solution. As will be seen, there were other occasions where the algorithm converged on a solution, but there was a significant error compared to the values reported in the dataset.
Of the 761 records where the root-finding algorithm returned a value, most of the calculated values are close to the reported values. Some dataset fields were not used in the solution process, but data from the unused fields can be used to analyze the results as measures independent of the solution process. As a first example, we investigated the ball's initial speed. This value was a field in the dataset, and the solution determined the initial velocity vector. Ideally, the magnitude of the determined initial velocity vector should match the reported speed.

Figure 1. Calculated Accelerations vs. Reported Accelerations With Reported Spin
As an additional independent measure, the reported values of the acceleration when the ball is 50 feet from home plate can be compared to the calculated acceleration values at the same point. Figure 1 plots each of the three components of the acceleration.
There are noticeable discrepancies between the calculated and reported values. In particular, the acceleration along the x-axis seems to exhibit little correlation between the calculated and reported values. The correlation coefficients are 0.02, 0.70, and 0.43, respectively, for the acceleration vector's x-, y-, and z-components.
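For reference, a correlation coefficient between calculated and reported values can be computed as in the sketch below. We assume here that the coefficient is the Pearson product-moment correlation, which the text does not state explicitly, and the sample values are hypothetical rather than taken from the dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical x-accelerations (ft/s^2) for illustration only.
reported = [-3.1, 0.4, 2.2, -1.0, 1.5]
calculated = [-2.8, 0.9, 1.7, -0.2, 1.1]
print(round(pearson_r(reported, calculated), 3))
```

A value near 1 indicates that the calculated values track the reported ones linearly, while a value near 0, such as the 0.02 reported for the x-component, indicates essentially no linear relationship.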
Because the Euclidean distance of the x- and z-components of the ball's location as it crossed home plate was one of the four values used by the root-finding algorithm, the two components of the plate crossing are not independent of the solution procedure. Figure 2 plots the x- and z-components of the ball crossing home plate. While there is a correlation, the discrepancy between the reported and calculated values brings into question how well the propagation matches the actual trajectory. The correlation coefficients are 0.75 and 0.90 for the x- and z-components, respectively.
In summary, there is one instance where the root-finding algorithm fails to converge. Of the remaining 761 records, the acceleration comparison, especially the x-component, indicates that the trajectories provided by the determined solution do not match the reported trajectories.

Analysis Solving For Spin Rate
The spin rate for a knuckleball should be approximately between 20 and 150 revolutions per minute (RPM). The analysis presented in this subsection assumes that the reported spin rates are inaccurate and determines the spin rates as part of the solution process. Figure 3 plots the calculated spin rate versus the reported spin rate. The correlation coefficient for these data is -0.16, demonstrating very little correlation between the reported spin rates and the spin rates determined based on the equations of motion. The important detail to notice about the plot is that, while the reported spin rates are significantly too high, all of the calculated spin rates are in the range expected for a knuckleball. Unlike the analysis that used the reported spin rate, all 762 records were solved. There is significant agreement between the calculated initial speed and the reported initial speed in this analysis, in which the spin rate from the Baseball Savant dataset is not used.
Figure 4 shows the calculated acceleration versus the reported values.While the agreement is not perfect, visually, the x-component of the acceleration vector does not seem as random as it was in the previous analysis.

Analysis Assuming Pitch Label is Inaccurate
The first subsection assumed that the Baseball Savant dataset is correct. The second subsection assumed that the spin rate was not necessarily accurate. An alternative possibility is that the spin rate is valid but that the pitch was incorrectly identified as a knuckleball. We now analyze the data using the general equations of motion rather than the knuckleball-specific equations of motion. As was the case with the results in the second subsection, the algorithm successfully solved all 762 records.
Figure 5 shows the accelerations at 50 feet from home plate. The correlation coefficients are 0.49, 0.71, and 0.36 for the acceleration's x-, y-, and z-components, respectively.
Finally, Figure 6 considers the home plate crossing. The correlation coefficient is 1.0 for both the x- and z-components of the position as the ball crossed home plate. There is significantly less error than when the dataset was assumed to be correct in both pitch type identification and spin rate.

Comparisons
The data are not noise-free, so it is reasonable that none of the methods would perform perfectly. Comparing the analyses, it is clear that if the pitches are correctly labeled as knuckleballs, the dynamics do not support the reported spin rate. Similarly, if the spin rates are accurate, the pitches were not knuckleballs. This situation is demonstrated by the root-finding algorithm failing to find a solution for one of the 762 records when the data are assumed to be correct while finding solutions for all records in both instances where the data are treated as inaccurate. Table 2 compares the correlation coefficients for the three analyses. The values from the first subsection are in column 2, the values from the second subsection are in column 3, and the values from the third subsection are in column 4. Because position components were used for the root-finding method, the correlation should be 1.0 for all three analyses. The fact that it is not 1.0 for the analysis that assumes both the pitch type and spin rate are correct indicates that there are errors with at least some pitch identifications or reported spin rates.
As mentioned previously, the high correlation for the initial speed and for the components of the home plate crossing position is partly due to the values used as part of the root-finding process of the shooting method. Therefore, the acceleration provides more insight into the performance of all three models for the reported data. As evaluation metrics, paired t-tests, Wilcoxon signed-rank tests, and Wilcoxon rank-sum tests were applied to all three pairings of the models. Table 3 provides the p-values for the null hypothesis that the differences between the two solutions come from a distribution with a mean (for the paired t-test) or median (for the Wilcoxon tests) of 0. For most tests, the null hypothesis is rejected at the 5% level, indicating different distributions.
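As an illustration of the first of these tests, the sketch below computes a paired t statistic and an approximate two-sided p-value. It makes two assumptions that are not taken from this study: the paired samples are treated as per-pitch errors from two of the analyses, and the p-value uses the standard normal distribution in place of Student's t, which is reasonable only for large samples such as the several hundred records considered here. The Wilcoxon tests are not shown, and the sample values are synthetic.

```python
import math


def paired_t_test(a, b):
    """Paired t statistic and an approximate two-sided p-value.

    The p-value uses the standard normal in place of Student's t
    (a large-sample approximation assumed for this sketch).
    """
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)  # sample variance of differences
    t = mean / math.sqrt(var / n)
    p = math.erfc(abs(t) / math.sqrt(2.0))           # two-sided normal tail area
    return t, p


# Synthetic per-pitch errors for two hypothetical models (not dataset values).
errors_model_a = [0.50 + 0.01 * (i % 7) for i in range(200)]
errors_model_b = [0.45 + 0.01 * ((3 * i) % 7) for i in range(200)]
t_stat, p_value = paired_t_test(errors_model_a, errors_model_b)
print(t_stat, p_value)
print("reject at 5% level:", p_value < 0.05)
```

A p-value below 0.05 leads to rejecting the hypothesis that the mean paired difference is zero, which is the decision rule applied to the entries of Table 3.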
Both solution sets that assumed an error in the data outperformed the solution that assumed the reported spin rate and pitch type were correct. Because of the rejection of the null hypothesis, we know that the observed difference is statistically significant. Therefore, we can conclude that the reported values (of pitch types and spin rates) contain errors. The model that assumes the reported spin rate is accurate but that the pitch is not a knuckleball statistically outperforms the model that calculated the spin rate. Therefore, based on the agreement with the provided acceleration components, we conclude that the error is most likely with pitch identification rather than spin rate determination. At the same time, the conversion from the raw data to the provided data could bias these results in favor of the pitch identification being inaccurate, so there is the potential that an inherent bias is influencing which model appears statistically more accurate. The critical takeaway is that the reported pitch type and spin rate are inconsistent.

Conclusions
This study was initiated because the reported spin rates for many knuckleball records in the Baseball Savant dataset appeared to be inaccurate. While they appeared to be in error, a more formal examination was required to show that the spin rates were inconsistent with the equations of motion for a knuckleball. We used the shooting method to determine the calculated trajectory based on the equations of motion of a knuckleball. We also used general equations of motion for a baseball under the assumption of mislabeled pitch types. All 762 records were solved successfully when the spin rate was considered unknown.
Data not used for the solution process were used as an independent measure of the performance of all sets of solutions. The initial speed was reasonably well matched in all cases, with a correlation between the reported and calculated speed of 0.9996 in the worst case of the three solution processes. This result is not surprising since three pieces of information used for the root-finding were the components of the ball's velocity 50 feet from home plate, which is only about five feet after the ball is released. However, the root-finding method did not use the ball's acceleration. The acceleration vector's x-component, in particular, showed a significant difference between the calculated and observed values when the provided data were assumed to be correct. In comparison, both results that assumed an error in the reported data better described the acceleration 50 feet from home plate.
The Baseball Savant database is a rich trove of data, and significant work has produced high-quality data products. However, in the case of knuckleballs with reported spin rates, the dynamics do not support the reported values. A likely possibility is that the uniqueness of the knuckleball, and its relatively rare use, results in less accurate data processing for these pitches. Future work could include analyzing the raw data to ensure that the data processing did not bias the results toward the conclusion that pitch identification is the more likely faulty component of the data. Additional future work could investigate more complete equations of motion, including features such as the spin axis and variable spin rate, to aid in better identification of knuckleballs and the parameters stored in the Baseball Savant database. While this study has identified an error in either the pitch identification or the spin rate, future work could analyze the algorithm used and improve it to avoid this identified error.

Figure 2. Calculated Plate Crossing Position Versus Reported Plate Crossing Position With Spin

Figure 4. Calculated Acceleration vs. Reported Acceleration Without Spin

Table 1. Pitcher Spin Statistics