What is a balk? Pro Baseball Insider has the simplest definition:
In the simplest sense, a balk is when the pitcher tries to intentionally deceive the hitter or runner. It can be a flinch on the mound after the pitcher gets set, a deceptive pick off attempt, or even just as simple as dropping the ball once you become set. There are many actions that can result in a balk. When runners are on base and a balk is called, all the runners move up one base.
A full list of the actions that constitute a balk can be found here.
Balks are rare. Since 2000, there have been only 100 to 200 balks per season, which is roughly one every 12 to 24 games (or one every ~216 to ~432 innings pitched) in a full 2430-game season.
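As a sanity check, the conversion from balks per season to games and innings per balk can be sketched in a few lines. This assumes about 18 innings pitched per game (9 per team) and ignores extra innings and unplayed bottom halves; the constants and the `balk_spacing` helper are illustrative and not part of the original analysis.

```python
# Rough conversion from balks per season to games and innings per balk.
# Assumptions (not from the Lahman data): a full 2430-game season and
# 18 innings pitched per game (9 per team), ignoring extra innings and
# unplayed bottom halves of the 9th.
GAMES_PER_SEASON = 2430
IP_PER_GAME = 18


def balk_spacing(balks_per_season):
    """Return (games per balk, innings pitched per balk) for one season."""
    games_per_balk = GAMES_PER_SEASON / balks_per_season
    ip_per_balk = GAMES_PER_SEASON * IP_PER_GAME / balks_per_season
    return games_per_balk, ip_per_balk


for balks in (100, 200):
    games, ip = balk_spacing(balks)
    print(f"{balks} balks: 1 every {games:.1f} games, 1 every {ip:.0f} IP")
```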
Balks are difficult to spot. Balks sometimes go unnoticed by fans, players, and umpires. What constitutes a balk can be subjective and may vary from umpire to umpire. Balks might even be ignored by umpires depending on the situation.
The definition of a balk has changed over time. Throughout baseball history, there have been a number of tweaks to the balk rule. With each tweak, balk totals for the subsequent season tended to spike or dip.
The 2015 version of The Lahman Baseball Database contains complete batting and pitching statistics from 1871 to 2015, plus fielding statistics, standings, team stats, managerial records, post-season data, and more. The Master (player names, DOB, and biographical info), Pitching (regular season pitching statistics), PitchingPost (postseason pitching statistics), and Batting (regular season batting statistics) tables are required for this balk analysis.
The full database and a detailed description of its contents can be found on Sean Lahman's website.
For the caught stealing ("CS") column in the Batting table, there are big gaps in the data up until 1920 and smaller gaps from 1920 to 1950. Data from 1951 to the present is complete, and this segment is all that is required for an effective visualization in Part 2: Balks and Stolen Base Attempts.
There is no postseason balk data from 1884 to 1892 and in 2012. The innings pitched and balk data for these seasons in the PitchingPost table were therefore excluded in Part 3: Regular Season Balks vs. Postseason Balks.
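Excluding those seasons depends on how missing values are summed. In current pandas (0.22 and later), an all-missing group sums to 0 by default, which would make a season with no recorded balk data look like a season with zero balks. A minimal sketch on a hypothetical toy table (not the real PitchingPost data) shows the difference that `min_count=1` makes:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature PitchingPost-style table: year 1890 has no recorded
# balk data (NaN), year 1990 does.
post = pd.DataFrame({
    "yearID": [1890, 1890, 1990, 1990],
    "BK": [np.nan, np.nan, 1.0, 0.0],
    "IPouts": [27, 24, 30, 21],
})

# By default an all-NaN group sums to 0, which reads as "no balks"
# rather than "no data".
naive = post.groupby("yearID")["BK"].sum()

# min_count=1 keeps an all-NaN group as NaN, so the year drops out of rates.
careful = post.groupby("yearID")["BK"].sum(min_count=1)

ip = post.groupby("yearID")["IPouts"].sum() / 3  # outs -> innings pitched
print(naive / ip)    # 1890 shows up as 0.0 BK/IP
print(careful / ip)  # 1890 shows up as NaN and can be excluded
```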
Columns from the Master table and the Pitching table were merged to create a new table that contains pertinent information about the balk-related record holders in Part 4: Balk Kings and Iron Men.
The "IPouts" column in the pitching tables records outs (innings pitched times 3 outs per inning); throughout the analysis it was converted back to innings pitched by dividing by 3.
Balks per inning pitched (in Parts 1-3) and innings pitched per balk (in Part 4) were calculated by dividing the balks column by the newly created innings pitched column, or the reverse.
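A minimal sketch of these conversions on a few hypothetical rows (the player IDs and values are made up, not Lahman data). Note that a pitcher with zero balks gets an infinite IP/BK, which affects how leaderboards sort:

```python
import pandas as pd

# Hypothetical rows shaped like the Lahman Pitching table (playerID, BK, IPouts).
pitching = pd.DataFrame({
    "playerID": ["pitcherA", "pitcherB", "pitcherC"],
    "BK": [2, 0, 1],
    "IPouts": [600, 150, 91],  # outs recorded = innings pitched * 3
})

pitching["IP"] = pitching["IPouts"] / 3              # 600 outs -> 200.0 IP
pitching["BK/IP"] = pitching["BK"] / pitching["IP"]  # rate used in Parts 1-3
pitching["IP/BK"] = pitching["IP"] / pitching["BK"]  # Part 4 rate; inf when BK == 0
print(pitching)
```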
import pandas as pd
# Load relevant .csv files into DataFrames
master_df = pd.read_csv('../p2/baseballdatabank-master/core/Master.csv')
pitching_df = pd.read_csv('../p2/baseballdatabank-master/core/Pitching.csv')
pitchingpost_df = pd.read_csv('../p2/baseballdatabank-master/core/PitchingPost.csv')
batting_df = pd.read_csv('../p2/baseballdatabank-master/core/Batting.csv')
%pylab inline
import matplotlib.pyplot as plt
import seaborn as sns
# Balks per inning pitched data grouped by year (regular season)
balks_by_year = pitching_df.groupby('yearID')['BK'].sum()
ip_by_year = (pitching_df.groupby('yearID')['IPouts'].sum()) / 3 # Convert to IP
balks_ip_by_year = balks_by_year / ip_by_year
# Series.plot returns a matplotlib Axes; label it through the Axes methods
ax = balks_ip_by_year.plot(color='b')
ax.annotate('1 BK every 50 IP', xy=(1873, .02), xytext=(1873, .0197))
ax.annotate('1 BK every 100 IP', xy=(1873, .01), xytext=(1873, .0097))
ax.annotate('1 BK every 200 IP', xy=(1873, .005), xytext=(1873, .0047))
ax.set_ylabel('Regular Season BK/IP')
ax.set_xlabel('Year')
Balks per inning pitched have been on a slow upward trajectory since 1885 or so, with spikes in 1899, 1950, 1963, and 1988. The spike in 1988 (1 balk for every ~40 innings pitched) was so dramatic that the season is referred to as The Year of the Balk. All of these spikes coincide with rule changes and enforcements. As per Recondite Baseball:
The balk rule is designed to limit pitcher deception towards the baserunner. Did the balk rule changes and enforcements in the mid-to-late 1900s spark an increase in stolen base attempts?
# Stolen base attempt data (successful + unsuccessful) grouped by year (regular season)
sb_by_year = batting_df.groupby('yearID')['SB'].sum()
cs_by_year = batting_df.groupby('yearID')['CS'].sum()
sb_attempts_by_year = sb_by_year + cs_by_year
sb_attempts_ip_by_year = sb_attempts_by_year / ip_by_year
# Restrict both series to 1940-2015 (yearID is an integer index, so slice with ints)
x = balks_ip_by_year.loc[1940:2015].index
y1 = balks_ip_by_year.loc[1940:2015].values
y2 = sb_attempts_ip_by_year.loc[1940:2015].values
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
ax1.plot(x, y1, color='b', label='BK/IP')
ax2.plot(x, y2, color='r', label='SB attempts/IP')
ax1.legend(loc='upper left', shadow=True)
ax2.legend(loc='upper right', shadow=True)
ax1.set_xlabel('Year')
ax1.set_ylabel('Regular Season BK/IP')
ax2.set_ylabel('Regular Season SB attempts/IP')
ax2.annotate('1 BK every 50 IP', xy=(1951, .12), xytext=(1943, .1188), color='blue')
ax2.annotate('1 BK every 200 IP', xy=(1947.5, .06), xytext=(1942, .078), color='blue',
             arrowprops=dict(facecolor='blue', shrink=0.05))
ax2.annotate('1 SB attempt every ~8 IP', xy=(2000, .12), xytext=(2001, .1188), color='red')
ax2.annotate('1 SB attempt every ~17 IP', xy=(2006.5, .06), xytext=(2000, .078), color='red',
             arrowprops=dict(facecolor='red', shrink=0.05))
The post-1950 balk rule changes and enforcements coincide with an increase in balks (blue) and an increase in stolen base attempts (red). Though other factors could be at play (e.g., faster players, or managers calling for more stolen base attempts as strategy changed), it appears likely that the balk rule changes and enforcements were effective in promoting the running game.
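One way to put a number on "coincide" is the Pearson correlation between the two yearly series over the overlapping years. The sketch below uses hypothetical values standing in for `balks_ip_by_year` and `sb_attempts_ip_by_year`; a positive correlation is consistent with, but does not establish, the balk rule driving the running game.

```python
import pandas as pd

# Hypothetical yearly rates indexed by yearID, standing in for
# balks_ip_by_year and sb_attempts_ip_by_year from the cells above.
years = [1951, 1960, 1970, 1980, 1990]
bk_ip = pd.Series([0.0030, 0.0035, 0.0040, 0.0045, 0.0060], index=years)
sba_ip = pd.Series([0.050, 0.055, 0.070, 0.090, 0.100], index=years)

# Series.corr aligns the two series on their index and computes Pearson's r.
r = bk_ip.loc[1951:1990].corr(sba_ip.loc[1951:1990])
print(f"Pearson r = {r:.3f}")
```

Because both series trend upward over time, a shared trend alone can produce a high r; correlating year-over-year changes (for example `bk_ip.diff()` against `sba_ip.diff()`) is one way to reduce that effect.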
# Balks per inning pitched data grouped by year (postseason)
playoff_balks_by_year = pitchingpost_df.groupby('yearID')['BK'].sum(min_count=1)  # min_count=1: seasons with no BK data stay NaN instead of 0
playoff_ip_by_year = (pitchingpost_df.groupby('yearID')['IPouts'].sum()) / 3 # Convert to IP
playoff_balks_ip_by_year = playoff_balks_by_year / playoff_ip_by_year
# Draw both series on the same Axes, then label through the Axes methods
ax = balks_ip_by_year.plot(color='b', label='Regular Season')
playoff_balks_ip_by_year.plot(ax=ax, color='g', label='Postseason')
ax.legend(loc='upper center', shadow=True)
ax.set_ylabel('BK/IP')
ax.set_xlabel('Year')
ax.annotate('1 BK every 50 IP', xy=(1873, .02), xytext=(1873, .0197))
ax.annotate('1 BK every 100 IP', xy=(1873, .01), xytext=(1873, .0097))
ax.annotate('1 BK every 200 IP', xy=(1873, .005), xytext=(1873, .0047))
There is high variation in postseason balks per inning pitched (green). This is to be expected given the small number of innings pitched in each postseason: for reference, there are ~28 thousand innings pitched in the postseason pitching table compared with ~3.75 million in the regular season pitching table. Although quick mean calculations in Excel suggest that regular season BK/IP is considerably larger than postseason BK/IP, this yearly visualization alone does not make clear which overall value is larger.
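To see why small samples vary more, suppose balks occur roughly as a Poisson process at a fixed rate λ per inning (an assumption, not something tested here). The observed rate over n innings then has a standard error of about √(λ/n), so shrinking n by a factor of k inflates the noise by √k. The `bk_rate_se` helper and the innings totals below are hypothetical:

```python
import math


def bk_rate_se(rate_per_ip, innings):
    """Approximate standard error of an observed BK/IP rate, assuming balks
    follow a Poisson process with the given per-inning rate."""
    return math.sqrt(rate_per_ip / innings)


rate = 1 / 280  # roughly the regular-season rate discussed in Part 3
for ip in (300, 3000, 40000):  # hypothetical innings totals for one season
    se = bk_rate_se(rate, ip)
    print(f"{ip:>6} IP: BK/IP = {rate:.4f} +/- {se:.4f}")
```

At a few hundred innings (the scale of a single postseason) the standard error is about as large as the rate itself, which is consistent with the jagged green line.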
Question: Are there significantly more balks called per inning pitched in the regular season compared to the postseason?
$H_0: \mu_D \le 0$
$H_A: \mu_D > 0$
where $H_0$ is the null hypothesis, $H_A$ is the alternative hypothesis, and $\mu_D = \mu_{\text{regular season BK/IP}} - \mu_{\text{postseason BK/IP}}$ is the difference between the population mean balks per inning pitched in the regular season and in the postseason.
A right-tailed t-test for two independent means is appropriate for this scenario for the following reasons:
Because the variances are not roughly equal, as the figure above illustrates, an unpooled standard error (Welch's t-test) is appropriate for this test.
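The unpooled standard error, test statistic, and Welch-Satterthwaite degrees of freedom used in the next cell can be packaged as one function. This is a sketch with made-up inputs; `welch_t_test` is a hypothetical helper, and its p-value uses a normal approximation, which is only reasonable because the degrees of freedom here are in the tens of thousands (for small df, use the t distribution, e.g. `scipy.stats.t.sf`). SciPy's `scipy.stats.ttest_ind_from_stats` with `equal_var=False` computes the same statistic from summary values.

```python
import math


def welch_t_test(mean1, std1, n1, mean2, std2, n2):
    """Right-tailed Welch t-test from summary statistics.

    Returns (t, df, p) where p approximates P(T > t) with the standard normal
    survival function; this is only a good approximation when df is large.
    """
    v1 = std1 ** 2 / n1  # squared standard error of sample 1's mean
    v2 = std2 ** 2 / n2  # squared standard error of sample 2's mean
    se = math.sqrt(v1 + v2)  # unpooled standard error
    t = (mean1 - mean2) / se
    # Welch-Satterthwaite degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    p = 0.5 * math.erfc(t / math.sqrt(2))  # normal approximation to P(T > t)
    return t, df, p


# Hypothetical summary statistics, not the values computed in this notebook.
t, df, p = welch_t_test(0.0036, 0.05, 3_000_000, 0.0026, 0.04, 28_000)
print(f"t = {t:.2f}, df = {df:.0f}, p = {p:.2e}")
```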
# Regular season balk and innings pitched data by year and individual
season_balks = pitching_df['BK']
season_ip = pitching_df['IPouts'] / 3 # Convert to IP
season_balks_ip = season_balks / season_ip
season_ip_sum = season_ip.sum()
# Weighted BK/IP mean calculation for regular season
season_ip_weights = season_ip / season_ip_sum
season_weight_times_obs = season_ip_weights * season_balks_ip
weighted_season_mean = season_weight_times_obs.sum() / season_ip_weights.sum()
# Weighted BK/IP standard deviation calculation for regular season
# stats.stackexchange: How do I calculate a weighted standard deviation? http://goo.gl/6206Ck
# Very large sample size (>3 million), so don't need to re-scale the variance before sqrt
sum_season_weighted_squared_dev = (season_ip_weights * ((season_balks_ip - weighted_season_mean) ** 2)).sum()
weighted_season_std = (sum_season_weighted_squared_dev / season_ip_weights.sum()) ** 0.5
# Sample size for hypothesis test (regular season)
season_n = season_ip_sum
# Postseason balk and innings pitched data by year and individual (excluding NaNs)
playoff_balk_bool = pd.notnull(pitchingpost_df['BK']) # Exclude missing postseason BK data
playoff_balks = pitchingpost_df['BK'][playoff_balk_bool]
playoff_ip = (pitchingpost_df['IPouts'] / 3)[playoff_balk_bool] # Convert to IP
playoff_balks_ip = playoff_balks / playoff_ip
playoff_ip_sum = playoff_ip.sum()
# Weighted BK/IP mean calculation for postseason
playoff_ip_weights = playoff_ip / playoff_ip_sum
playoff_weight_times_obs = playoff_ip_weights * playoff_balks_ip
weighted_playoff_mean = playoff_weight_times_obs.sum() / playoff_ip_weights.sum()
# Weighted BK/IP standard deviation calculation for postseason
# stats.stackexchange: How do I calculate a weighted standard deviation? http://goo.gl/6206Ck
# Very large sample size (>27 thousand), so don't need to re-scale the variance before sqrt
sum_playoff_weighted_squared_dev = (playoff_ip_weights * ((playoff_balks_ip - weighted_playoff_mean) ** 2)).sum()
weighted_playoff_std = (sum_playoff_weighted_squared_dev / playoff_ip_weights.sum()) ** 0.5
# Sample size for hypothesis test (postseason)
playoff_n = playoff_ip_sum
# For unpooled_se, t, and df formulas, see https://onlinecourses.science.psu.edu/stat200/node/60
# Unpooled standard error
unpooled_se = (((weighted_season_std ** 2) / season_n) + ((weighted_playoff_std ** 2) / playoff_n)) ** (0.5)
# Test statistic for independent means (unpooled)
t = (weighted_season_mean - weighted_playoff_mean) / unpooled_se
# Degrees of freedom for independent means (unpooled)
num_df = (((weighted_season_std ** 2) / season_n) + ((weighted_playoff_std ** 2) / playoff_n)) ** 2
denom_df = ((1 / (season_n - 1.)) * (((weighted_season_std ** 2) / season_n) ** 2)) + \
((1 / (playoff_n - 1.)) * (((weighted_playoff_std ** 2) / playoff_n) ** 2))
df = num_df / denom_df
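# p-value for the right-tailed test, P(T > t), from the t distribution's survival function
# (for t = 5.50 a table lookup at https://surfstat.anu.edu.au/surfstat-home/tables/t.php gives p ~ 0)
from scipy import stats
p = stats.t.sf(t, df)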
print("Regular Season")
print("Weighted Season mean (BK/IP): " + str(weighted_season_mean))
print("Weighted Season std (BK/IP): " + str(weighted_season_std))
print("Season n (IP): " + str(season_n) + "\n")
print("Postseason")
print("Weighted playoff mean (BK/IP): " + str(weighted_playoff_mean))
print("Weighted playoff std (BK/IP): " + str(weighted_playoff_std))
print("Playoff n (IP): " + str(playoff_n) + "\n")
print("Unpooled SE: " + str(unpooled_se))
print("t: " + str(t))
print("df: " + str(df))  # Calculation confirmed here http://web.utk.edu/~cwiek/TwoSampleDoF
print("p-value: " + str(p))
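The IP-weighted mean of per-row BK/IP used above simplifies algebraically: weighting each row's BK/IP by its share of innings gives Σ(IPᵢ/ΣIP)(BKᵢ/IPᵢ) = ΣBK/ΣIP, i.e. total balks divided by total innings, and its reciprocal is the "1 balk every N innings" figure. A quick check on hypothetical rows (not Lahman data):

```python
import pandas as pd

# Hypothetical per-pitcher rows (balks and innings pitched), not Lahman data.
rows = pd.DataFrame({"BK": [1, 0, 3, 2], "IP": [150.0, 40.0, 610.0, 320.0]})

# IP-weighted mean of per-row BK/IP, computed the same way as the cells above
weights = rows["IP"] / rows["IP"].sum()
weighted_mean = (weights * (rows["BK"] / rows["IP"])).sum() / weights.sum()

# Pooled rate: total balks over total innings
pooled_rate = rows["BK"].sum() / rows["IP"].sum()

print(weighted_mean, pooled_rate)  # equal up to floating-point rounding
print(f"1 balk every {1 / pooled_rate:.0f} innings")
```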
With t ≈ 5.50 the p-value is effectively 0, so there is sufficient evidence at any conventional significance level to support the claim that significantly more balks are called per inning pitched in the regular season than in the postseason.
So why was 1 balk called every ~280 innings in the regular season but only 1 balk every ~390 innings in the postseason? My speculation is that a combination of the two factors below is responsible for the discrepancy:
Who is the all-time balk king? Who is the modern-day (post-2000) balk king? Who is the balk iron man (most innings pitched without a balk)?
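The three Part 4 filters can be sketched on a hypothetical handful of career totals (made-up IDs and numbers). Converting `debut` with `pd.to_datetime` is an addition in this sketch; the code below compares date strings directly, which works only if every `debut` value is in ISO `YYYY-MM-DD` form.

```python
import pandas as pd

# Hypothetical career totals shaped like the merged Part 4 table.
careers = pd.DataFrame({
    "playerID": ["p1", "p2", "p3", "p4", "p5", "p6"],
    "debut": ["1988-05-01", "2002-04-10", "1993-09-12",
              "2006-09-01", "1890-05-01", "2010-04-05"],
    "IP": [60.0, 54.0, 1900.0, 480.0, 300.0, 30.0],
    "BK": [6, 3, 0, 16, 0, 5],
})
careers["IP/BK"] = careers["IP"] / careers["BK"]  # inf for zero-balk careers
careers["debut"] = pd.to_datetime(careers["debut"])

qualified = careers[careers["IP"] > 50]  # minimum-innings cutoff
balk_king = qualified.sort_values("IP/BK").head(3)  # fewest innings per balk first
modern_king = qualified[qualified["debut"] > "2000-01-01"].sort_values("IP/BK")
iron_man = (qualified[(qualified["BK"] == 0) & (qualified["debut"] > "1950-01-01")]
            .sort_values("IP", ascending=False))

print(balk_king[["playerID", "IP", "BK", "IP/BK"]])
print(modern_king["playerID"].tolist())
print(iron_man["playerID"].tolist())
```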
# Balks and innings pitched data grouped by player (regular season)
balks_by_pitcher = pitching_df.groupby('playerID')['BK'].sum()
ip_by_pitcher = (pitching_df.groupby('playerID')['IPouts'].sum()) / 3 # Convert to IP
ip_by_pitcher.rename('IP', inplace=True)
# Innings pitched for every balk column
ip_balks_by_pitcher = ip_by_pitcher / balks_by_pitcher
ip_balks_by_pitcher.rename('IP/BK', inplace=True)
# Combine master and pitching tables
player_info = master_df[['playerID', 'nameFirst', 'nameLast', 'debut', 'finalGame']]
alltime_balk_king = player_info.join(ip_by_pitcher, on='playerID', how='inner')
alltime_balk_king = alltime_balk_king.join(balks_by_pitcher, on='playerID', how='inner')
alltime_balk_king = alltime_balk_king.join(ip_balks_by_pitcher, on='playerID', how='inner') \
.sort_values('IP/BK', ascending=True)
# Record holders
alltime_balk_king = alltime_balk_king.loc[alltime_balk_king['IP'] > 50]
modern_balk_king = alltime_balk_king.loc[alltime_balk_king['debut'] > '2000-01-01']
balk_iron_man = alltime_balk_king.loc[(alltime_balk_king['BK'] == 0) & \
(alltime_balk_king['debut'] > '1950-01-01')] \
.sort_values('IP', ascending=False)
alltime_balk_king.head(15)
For pitchers with more than 50 innings pitched, Don Heinkel has the lowest IP/BK of all time, with 1 balk every ~9 innings pitched (7 balks in 62 and 2/3 innings). Heinkel, like many of the pitchers on this leaderboard, pitched in the era influenced by 1988, The Year of the Balk, so he may not deserve the Balk King title based on true "skill" alone.
modern_balk_king.head(15)
For pitchers with more than 50 innings pitched in the post-2000 era (after balk rates normalized following the 1988 rule change), Steven Kent is the modern-day IP/BK leader with 1 balk every ~19 innings pitched (3 balks in 57 and 1/3 innings). Kent pitched in the 2002 season. Perhaps the more interesting name on this list is Franklin Morales. Morales, still active in 2016, has committed an astounding 17 balks in 486 innings, which equates to 1 balk every ~29 innings. His sample size of 486 innings pitched is more than double anyone else's on the leaderboard. For me, Franklin Morales is the modern-day balk king.
balk_iron_man.head(15)
Among pitchers who debuted after 1950, Kirk "Woody" Rueter has pitched the most innings without a balk. Rueter pitched 1918 balk-less innings over a career that spanned 13 years, ~275 more innings than the second-place Sam Jones. Perhaps even more impressive is the fact that Rueter began his career in 1993, when balk-calling rates had not yet settled down from the highs caused by the rule change in 1988, i.e., The Year of the Balk. Jonathan Niese is probably the active pitcher on this leaderboard who has the best shot at catching Rueter. Niese is only 29 years old and is still a starting pitcher who logs ~150-200 innings per year. It would take Niese ~4.5 more balk-less years at his current pace to take the title of Balk Iron Man away from Rueter.