"Being Steve Parsons" references the movie "Being John Malkovich", where people get to see the world from John's eyes for a few minutes. Likewise, you get to see the world through my eyes for a few minutes as you read this column. It's a place for experienced roto players to explore issues, get your own ideas peer-reviewed or to just use the resources of RotoCommunity to look at this game we love. If you are a less experienced player or do not consider yourself an "expert," read on, but you may wish to read the introductory column as an FAQ or just to see the rules of the road. However strongly held my opinions may be, my goal here is to put forth an issue and look at as many sides as my editor will let me get away with. If you have an idea, a column or just a thought you would like to explore, just leave a comment.
Being Steve Parsons - May 19, 2010
In-Play Analysis - Conclusion.
What started as an interesting tidbit of information in Voros McCracken’s efforts to isolate what part of a pitcher’s line was the responsibility of the pitcher and what was attributable to defense or other factors has been twisted into a general tool used by writers to explain and predict all manner of baseball results. Oftentimes in-play results, most usually Batting Average on Balls In Play (BABIP), are used interchangeably with the ability of pitchers to prevent hits generally, something that McCracken specifically said was not the case. Many, if not most, advanced metrics that speak of “fielding independent” or “defensive independent” results use the idea that pitchers do not control whether in-play balls become hits, and that over time in-play results will settle into or regress to some non-pitcher-specific average. Many metrics that don’t explicitly state so, from Win Shares to Defensive Efficiency, incorporate the idea.
Nor is the idea limited to fantasy projections for ballplayers or rhetorical ratings systems. Sabermetrically inclined professional ball clubs use in-play results in making player decisions. The idea that defense is one of the primary factors in whether in-play “events” turn into hits or outs has reached major league offices. Teams make personnel decisions based on the idea that a single key defensive player has more to do with whether 55-80% of all plays in baseball become outs than the pitcher does.
The problem is that the idea is wrong.
Evaluating pitcher control - What McCracken Got Right.
In his original piece McCracken sets out nine facts and five anecdotal observations that led him to this conclusion. There are some problems with how he goes about it, but the fundamental ideas behind these “facts” are what we all would use to test whether something was a skill intrinsic to a pitcher, random, or due to some extrinsic factor. Does a pitcher, given a reasonable amount of work, have his own “level” of performance in that area? Does that level tend to repeat from year to year? That is what we expect of a ".300 hitter": he hits .300 over time and repeats that performance year in and year out. We would not call a guy who hit .350 in the even years and .250 in the odd years a .300 hitter just because the average was .300.
But Pitchers DO have their own level of BABIP.
We get into immediate trouble. I have uploaded a number of spreadsheets to the invaluable Google Docs, so you can look at the data examples for yourselves and check my math. Pitchers do have their own levels. McCracken states that “The vast majority of pitchers who have pitched significant innings have career rates between .280 and .290,” but they do not.
Using the 500 pitchers who faced the most batters from 1974-2007 (data here), we get the following chart:
Rank      BABIP (avg)   BABIP (range)   BFP (avg)
1-100     .267          .241-.274       6695
101-200   .278          .274-.282       7399
201-300   .285          .282-.288       7134
301-400   .292          .288-.296       6750
401-500   .302          .296-.325       5920
Totals    .284          .241-.325       6778
Only 164 of 500 results (about 33%) fall in the range .280-.290, which is not a vast majority, and the variation is more in line with a skill than with random chance. Note that this chart understates the actual distribution, because by “filtering” on a high number of batters faced, you are filtering by quality. The .284 BABIP average for this group is significantly different from that of the population as a whole, which, as you can see here, is .294 over the same period. The reason for this is simple. Just as a large at-bat sample wouldn’t contain any .200 hitters (or would contain only those low-average hitters who hit for power), a large batters-faced list isn’t going to include pitchers who give up a lot of hits. This means the range of results is compacted, but even so it doesn’t support McCracken’s contention that the “vast majority” of pitchers fall between .280 and .290. They do not.
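For readers who want to test the "skill vs. random chance" point themselves, here is a rough sketch of how wide career BABIPs would be expected to spread if every ball in play were an independent coin flip with the same hit probability. It uses only the .284 average and the 6778 average batters faced from the chart. The 70% in-play share (`IN_PLAY_SHARE`) is an assumption made for illustration, not a figure from the spreadsheets, and the binomial model itself is a simplification.

```python
import math

# Figures taken from the chart above.
AVG_BABIP = 0.284
AVG_BATTERS_FACED = 6778

# ASSUMPTION (not from the column's data): roughly 70% of batters faced
# end with a ball in play. Change this to see how sensitive the band is.
IN_PLAY_SHARE = 0.70


def binomial_sd(p, n):
    """Standard deviation of an observed rate over n independent trials."""
    return math.sqrt(p * (1 - p) / n)


def chance_band(p, n, width=2.0):
    """Range covering about 95% of outcomes if only luck were at work."""
    sd = binomial_sd(p, n)
    return p - width * sd, p + width * sd


balls_in_play = AVG_BATTERS_FACED * IN_PLAY_SHARE
sd = binomial_sd(AVG_BABIP, balls_in_play)
low, high = chance_band(AVG_BABIP, balls_in_play)

print(f"luck-only SD: {sd:.4f}")
print(f"luck-only band: {low:.3f} to {high:.3f}")
```

Under those assumptions the luck-only band comes out narrower than the .241-.325 range in the chart, and the averages of the top and bottom groups (.267 and .302) both sit outside it. That is consistent with the argument above, though a proper test would use each pitcher's actual balls in play rather than a single average.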
BABIP rates do not regress from extreme results.
Many writers tend to use BABIP anecdotally. They point out an individual BABIP result, claim that it is “unsustainable” and then proceed to predict some overall regression. McCracken, in his second "fact," states it this way: “The pitchers who are the best at preventing hits on balls in play one year are often the worst at it the next. In 1998, Greg Maddux had one of the best rates in baseball, then in 1999 he had one of the worst. In 2000, he had one of the better ones again. In 1999, Pedro Martinez had one of the worst; in 2000, he had the best. This happens a lot.”
It doesn’t happen a lot: it happened a total of 12 times in 9 seasons. BABIP rates don’t regress reliably; even the most extreme examples tend more often to repeat than to regress, and such “flips” almost never happen. I have compiled here a complete list (min. 50 IP, 10 GS) of the highest-20% BABIP seasons for every season starting with 1999, the initial season McCracken examined, and going up until 2007. Of those 321 seasons (or qualifying half-seasons for players traded mid-season), only 126 (39%) were followed by a season in which the pitcher pitched and posted a BABIP that was not, again, in the top quintile. And of those 126, 41 had second-season BABIPs of over .300, and three even had higher BABIPs but fell just outside the top 20% threshold for that season.
Of the remaining 195, 71 repeated their performance the second season and 95 did not even pitch the following season due to demotion or the end of their careers. The rest are pitchers who were significantly injured in one season or the other, plus a handful of seasons (six or seven) where split time called the numbers into question.
Predicting with BABIP regression or “normalizing” would actually do worse than randomly saying “yes” or “no,” and no better than automatically saying that the results would repeat.
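The breakdown of those 321 seasons is easy to lose track of in prose, so here is a small sketch that tallies it. The four counts come straight from the paragraphs above; the "injured or split time" bucket is simply whatever is left over, and the bucket names are labels chosen here for illustration.

```python
# Counts quoted in the text for the top-quintile BABIP seasons, 1999-2007.
TOTAL_SEASONS = 321
LEFT_TOP_QUINTILE = 126   # pitched the next year, not in the top 20% again
REPEATED = 71             # in the top 20% again the next year
DID_NOT_PITCH = 95        # demoted or career over

# Whatever remains is the injured / split-time group described in the text.
injured_or_split = (
    TOTAL_SEASONS - LEFT_TOP_QUINTILE - REPEATED - DID_NOT_PITCH
)

buckets = {
    "left top quintile": LEFT_TOP_QUINTILE,
    "repeated": REPEATED,
    "did not pitch": DID_NOT_PITCH,
    "injured or split time": injured_or_split,
}


def share(count, total=TOTAL_SEASONS):
    """Percentage of all top-quintile seasons, rounded to one decimal."""
    return round(100 * count / total, 1)


for name, count in buckets.items():
    print(f"{name:<22} {count:>4}  {share(count):>5}%")
```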
“High Hit” performers do not do “about the same” as “Low Hit” performers.
2000 results      IP   Hits/9   BABIP   ERA
High Hits 1999         8.76     .303    4.70
Low Hits 1999          7.19     .285    4.11

2001 results      IP   Hits/9   BABIP   ERA
High Hits 2000         8.81     .306    5.10
Low Hits 2000          7.10     .286    3.64
And I want to emphasize here that these results are not adjusted to compensate for the attrition that we discussed previously. A “corrected” version that took into account the 30% of the high-BABIP pitchers who retire each season would show an even greater gap.
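To put a number on the gap, the sketch below subtracts the "Low Hits" line from the "High Hits" line for each follow-up season. The figures are the ones in the two tables above; the dictionary layout and the `gap` helper are just one convenient way to arrange them.

```python
# Follow-up season results for the prior year's High Hits and Low Hits groups,
# copied from the tables above: (Hits/9, BABIP, ERA).
results = {
    2000: {"high": (8.76, 0.303, 4.70), "low": (7.19, 0.285, 4.11)},
    2001: {"high": (8.81, 0.306, 5.10), "low": (7.10, 0.286, 3.64)},
}


def gap(season):
    """High-minus-low difference in Hits/9, BABIP and ERA for one season."""
    high = results[season]["high"]
    low = results[season]["low"]
    hits9, babip, era = (h - l for h, l in zip(high, low))
    return round(hits9, 2), round(babip, 3), round(era, 2)


for season in sorted(results):
    hits9, babip, era = gap(season)
    print(f"{season}: Hits/9 +{hits9}, BABIP +{babip:.3f}, ERA +{era}")
```

In both seasons the prior year's high-hit group allowed more than a hit and a half more per nine innings than the low-hit group.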
Pitchers control hits and not just by preventing the batter from getting the bat on the ball.
McCracken uses the example of Scott Karl and Randy Johnson: “That's not because batters hit the ball harder off Karl than Johnson, but because they hit the ball more often off Karl than Johnson.” First off, his example is incorrect. Johnson not only prevents contact (about 55% of Johnson’s batters faced put the ball in play during his prime vs. around three out of four for Karl’s career), but also prevents hits in play, with a .291 career BABIP to Karl’s .306. Leaving that aside, if it were true, then the percentage of in-play events should correlate well with actual in-play hits per nine innings, but in fact they correlate only mildly (data here: a correlation coefficient of .468, with 1 being well correlated and 0 being uncorrelated). BABIP itself correlates much better with actual in-play hits, at .840 (as it should, since you are talking about all the same numbers). It’s not the strikeouts and it’s not the BABIP; it’s both, or maybe it is neither.
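For anyone checking the spreadsheet, a correlation coefficient of this kind can be computed with a few lines of code. The sketch below assumes the figures quoted are ordinary Pearson coefficients, which the column does not state, and the inputs shown are sanity-check values rather than baseball data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Sanity checks with made-up inputs (NOT pitcher data):
# a perfectly linear pair scores 1, a pair with no linear trend scores 0.
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson([1, 2, 3, 4], [1, -1, -1, 1]))
```

To reproduce the comparison in the text, `xs` would be each pitcher's in-play percentage (or BABIP) and `ys` his in-play hits per nine innings, taken from the linked data.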
Outs are just outs.
I have spent a fair amount of time building up the idea that pitchers control hits-in-play, and many other commentators have focused on small differences that seem to hold for extreme groundball pitchers, knuckleballers and pitchers with high or low line drive rates. I noted when this idea first migrated over to defense that the numbers are heavily park influenced, which renders such stats as Defensive Efficiency Rating basically meaningless. In fact, if you examine the BABIP numbers closely, you’ll find that many of the BABIP regressions that do occur can largely be explained by moving into or out of Coors Field.
But the real problem, and the reason BABIP numbers are problematic, is pretty clearly illustrated by McCracken’s two big examples: Greg Maddux and Pedro Martinez. Pedro’s “bad” season was one of the best seasons in baseball history, just not as good as the following year (probably one of the top four statistical seasons of all time; Sandy Koufax owns the others). This illustrates the very loose connection that BABIP has with actual performance and how little it has to do with the number of actual hits allowed. In this bad season, Pedro allowed the sixth fewest in-play hits in all of major league baseball. Maddux, who was pitching through injuries during his bad season, had one of the most consistent records of hit prevention in baseball. From 1990 until his final year he had only one season with a BABIP of over .286: 1999, the year that McCracken looked at. And he did it all without a large strikeout rate; batters were hitting the ball.
Pitchers control hits or, if you prefer, their hit rates are a matter of their skill. What they don’t control is the ratio of in-play events to out-of-play events.
There’s a mathematical error going on with BABIP. To reach all the conclusions about defense, future results and pitcher independence, you have to demonstrate that hit rates aren’t independent of whether outs come on strikeouts or pop-outs. A quick way to illustrate how this works is to picture a box into which you place 20 red balls and then fill with a random number of yellow and green balls. The BABIP argument would say you have no control over the number of red balls put into the box because, after you take away the yellow, the ratio of red to green keeps changing. “After all,” you say, “yellow balls can’t be red.” Now you can obviously describe the contents of the box each time using a BABIP-like ratio. But if you tried to predict the next box, or said that some green-ball defensive sorter could reduce the number of red balls in the next box, you’d only be right occasionally, by accident. You have to prove that the pitcher isn’t controlling the total number of red balls; otherwise the formulation “once the ball is in play” is meaningless, and ironically for the very reason BABIP was supposed to put to rest: a pop-up or a groundball is closer to a strikeout than it is to a hit; that baseball DID have base-hit written all over it. Mere variability of BABIP results doesn’t prove that pitchers don’t control hits, and the numbers almost universally suggest that hit rates are independent of in-play rates.
McCracken was perplexed by what he saw as the variability of BABIP. Let's take a look at an example.
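The box is easy to simulate. In the sketch below red balls stand for hits, yellow for strikeouts and green for in-play outs, which is one reading of the analogy. The cap of 60 on the yellow and green counts, the seed and the function names are arbitrary choices made for illustration.

```python
import random

RED_BALLS = 20  # fixed on purpose: the "pitcher" controls this number


def fill_box(rng, max_other=60):
    """Return (red, yellow, green) counts for one box."""
    yellow = rng.randint(0, max_other)  # strikeouts: out of play
    green = rng.randint(1, max_other)   # in-play outs
    return RED_BALLS, yellow, green


def babip_like(red, green):
    """Red balls as a share of everything left once the yellow is removed."""
    return red / (red + green)


def simulate(n_boxes=10, seed=0):
    """Fill n_boxes boxes and record the BABIP-like ratio for each."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_boxes):
        red, yellow, green = fill_box(rng)
        rows.append((red, yellow, green, babip_like(red, green)))
    return rows


for red, yellow, green, ratio in simulate():
    print(f"red={red:2d} yellow={yellow:2d} green={green:2d} ratio={ratio:.3f}")
```

Every box holds exactly 20 red balls, yet the ratio jumps around from box to box because the green count does. A variable ratio, by itself, says nothing about whether the red count is under control.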
IP   Hits/9   SO/9   BB/9   HR   Batters Faced   BABIP