xBABIP? Let’s Start Putting Random Letters in Front of Statistics!

In our discussion of BABIP, we mentioned that it’s frequently misused or misinterpreted.  An example of its misuse might be, “Miguel Cabrera’s BABIP was .331 in 2012.  He’s due for a drop in production because we expect his BABIP to come back down to the league average of about .300.”  That’s where the “x” comes into xBABIP, or Expected Batting Average on Balls in Play.

What Do We Use xBABIP For?

xBABIP is a projection of what a given player’s BABIP will/should be.  It’s not a measure of past performance like BABIP.  So we can use it to project a player’s statistics for the year.

Should we expect all hitters to have a BABIP of around .300?

No.  As you know, there are many different types of hitters.  Even without statistics to support the argument, you would probably expect a hitter with power to have a different BABIP than a hitter with little power.  You’d expect a hitter who tends to hit more ground balls and line drives to have a higher BABIP than a fly ball hitter (a fly ball that stays in play has a low chance of being a hit).

How Is xBABIP Calculated?

This is a difficult question to answer.  As far as I can tell, because this is a projection, there is no definitive calculation.  This article at the Hardball Times may be the first to reference the phrase “xBABIP”.

The harder I look for an agreed-upon calculation, the more variations I find.  In my not-so-expert opinion, I see two ways to calculate it without needing a degree in statistics:

  1. Use a player’s historical BABIP to project future BABIP
  2. Break down batted ball data into categories of ground balls, line drives, and fly balls (some will then further break fly balls into infield fly balls and outfield fly balls).

The first approach is slightly flawed.  It’s never a great idea to assume past performance is a strong indicator of what will happen in the future.  But it does appear to have some predictive value.
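As a rough sketch of the first approach, one option is to weight each past season by its number of balls in play and take the overall average.  The function name, the weighting scheme, and the sample seasons below are my own assumptions for illustration; this is not a published xBABIP method.

```python
def historical_babip(seasons):
    """Project BABIP as the balls-in-play-weighted average of past seasons.

    `seasons` is a list of (hits_on_balls_in_play, balls_in_play) pairs.
    Weighting by balls in play is an assumption made for this sketch.
    """
    total_hits = sum(hits for hits, _ in seasons)
    total_bip = sum(bip for _, bip in seasons)
    if total_bip <= 0:
        raise ValueError("need at least one ball in play")
    return total_hits / total_bip


# Hypothetical player: three seasons of (hits on balls in play, balls in play).
past_seasons = [(150, 450), (140, 440), (160, 460)]
print(round(historical_babip(past_seasons), 3))
```

Pooling the seasons this way gives a long season more influence than a short one, which seems safer than averaging the three yearly BABIP figures directly.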

Using batted ball data makes more logical sense.  I’m generally aware that there are ground ball and fly ball hitters.  There are probably some hitters that can regularly hit more line drives than others.  We can then determine the average BABIP for ground balls, fly balls, and line drives, and use those to calculate an estimate of BABIP.

I can’t figure out where to find BABIP for each batted ball type.  To be clearer, I’d like to see data showing the number of ground balls hit and the number of ground balls that went for hits.  The best information I can find is in the table below (from this article by Tristan Cockcroft of ESPN), but it lacks supporting detail:

Ground Balls 0.237
Fly Balls 0.138
Line Drives 0.724
Bunts 0.376

Another problem with this data is that it’s from 2009.

When trying to come up with my own calculation for xBABIP (and borrowing Cockcroft’s factors), I was envisioning something simple like this:

((GB * .237)+(FB * .138)+(LD * .724)) / BIP

Where “GB” stands for ground balls, “LD” for line drives, “FB” for fly balls, and “BIP” for total balls in play.
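The simple formula above could be sketched in Python like this.  The function name and the sample batted ball counts are hypothetical; the three rates are the 2009 figures from the table above.

```python
# Per-type BABIP factors quoted in the table above (2009 data).
GB_RATE = 0.237
FB_RATE = 0.138
LD_RATE = 0.724


def simple_xbabip(gb, fb, ld, bip):
    """Estimate BABIP from ground ball, fly ball, and line drive counts.

    Implements ((GB * .237) + (FB * .138) + (LD * .724)) / BIP.
    """
    if bip <= 0:
        raise ValueError("bip must be positive")
    expected_hits = gb * GB_RATE + fb * FB_RATE + ld * LD_RATE
    return expected_hits / bip


# Hypothetical hitter: 200 ground balls, 150 fly balls, 100 line drives.
print(round(simple_xbabip(gb=200, fb=150, ld=100, bip=450), 3))
```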

Not surprisingly, some folks smarter than me have fine-tuned this.  You can see Cockcroft references a BABIP for bunts.  This makes sense, as this would be the batting average when a player is attempting to bunt for a base hit (BABIP excludes sacrifices).  And this more recent article at Fangraphs has a formula that makes great sense (and has updated rates).

((GB - IFH) * .195 + (FB - HR - IFF) * .134 + LD * .740 + IFH + BUH) / BIP

The Fangraphs formula breaks out infield hits (IFH) as a subcategory of ground balls.  The thinking here is that a player “earns” infield hits.  Said another way, not all players have the same rate of infield hits.  For example, Ichiro is always going to have more infield hits than Prince Fielder.  If you remove IFHs, you can then assume Ichiro and Prince Fielder would have a similar BABIP for ground balls.

The formula also separates fly balls into infield fly balls (IFFB, written as IFF in the formula) and outfield fly balls (OFFB).  Home runs (HR) are subtracted as well, because a home run is not a ball in play.  I can’t find documentation to support this, but I’m guessing you assume the BABIP for IFFBs is 0, while OFFBs can fall for hits.
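Here is a minimal Python sketch of the Fangraphs-style formula above.  The function name, the argument names, and the sample counts are my own hypothetical choices; only the three rates and the structure of the formula come from the article.

```python
def fangraphs_style_xbabip(gb, fb, ld, hr, iff, ifh, buh, bip):
    """Estimate BABIP using the formula quoted above.

    gb, fb, ld: ground balls, fly balls, line drives
    hr: home runs, iff: infield fly balls
    ifh: infield hits, buh: bunt hits, bip: total balls in play
    """
    if bip <= 0:
        raise ValueError("bip must be positive")
    expected_hits = (
        (gb - ifh) * 0.195         # ground balls that were not infield hits
        + (fb - hr - iff) * 0.134  # outfield fly balls that stayed in the park
        + ld * 0.740               # line drives
        + ifh                      # infield hits are credited in full
        + buh                      # bunt hits are credited in full
    )
    return expected_hits / bip


# Hypothetical hitter, not a real player's stat line.
estimate = fangraphs_style_xbabip(
    gb=200, fb=160, ld=100, hr=20, iff=15, ifh=12, buh=2, bip=442
)
print(round(estimate, 3))
```

Infield flies drop out of the fly ball term and are not added back anywhere, so they contribute zero expected hits, which matches my guess about how the formula treats them.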

I Don’t Understand a Word You Just Said

Oh, Napoleon.  In plain English, you could interpret the above formula to say, “The batting average for ground balls that are not infield hits is .195.  The batting average for fly balls hit to the outfield is .134.  The batting average for line drives is .740.  And a player’s xBABIP will be higher when you adjust for their expected infield hits and bunt hits.”

What Do You Think?

I can’t locate an updated discussion of xBABIP, so I’m sure I missed something.  I think Cockcroft’s article is great, but it’s getting a little outdated.  Have you seen any other helpful primers on the topic?

Does anyone know how to locate statistics to support BABIP for different types of batted balls?  For example, something showing line drive hits vs. line drive outs.

Thanks for reading.  As your Mom probably told you, make smart choices.




2 Responses

  1. cdutton, April 30, 2014 at 10:01 AM

    FYI – the latest xBABIP model includes the following factors: HR/FB, IF/FB, LD%, FB/GB, Speed score, Lefty*(FB/GB%), Contact rate, Spray.

    This was a follow-up article to the original HBT post:

  2. Tanner Bell, May 2, 2014 at 5:49 AM

    Thanks for sharing, Chris. Do you know if your model has gotten more accurate over time? Especially with the trend of shifting against left-handed pull hitters, that Lefty GB component seems like it would separate this model from the others even more.

    Is the tool at the link below something that calculates the qxbabip mentioned in Derek’s article?

