Envision a Major League Baseball player’s stat line. If you’re having trouble doing that, here’s one:
Those are Paul Goldschmidt’s Major League statistics for the last three seasons.
How Do We Take That Information And Create 2014 Projections?
Do we just eyeball it and say, “He hit 20 HR in 2012 and 36 in 2013, so I’ll project 28”? Do we give more weight to 2013 because it’s the most recent season? Is Goldschmidt still improving? Could he hit more than 36?
What about stolen bases? Or batting average? Runs? RBI?
There are a lot of moving parts here. And they’re all somewhat related to each other. How do you make sense of all this information and develop a sound, reliable, and accurate projection for what will happen in 2014?
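To make the “eyeball” approach concrete, here is a minimal sketch of averaging the two home run totals quoted above, first with equal weights and then with extra weight on the most recent season. The 2:1 weighting and the function name are assumptions chosen for illustration, not a recommended formula.

```python
def weighted_projection(values, weights):
    """Weighted average of past season totals (listed oldest first)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must be the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Home run totals quoted in the text: 20 in 2012, 36 in 2013.
past_hr = [20, 36]

# Equal weights reproduce the "I'll project 28" guess.
simple = weighted_projection(past_hr, [1, 1])

# Hypothetical 2:1 weighting that favors the most recent season.
recency = weighted_projection(past_hr, [1, 2])

print(f"Equal weights:    {simple:.1f} HR")
print(f"Recency weighted: {recency:.1f} HR")
```

Either way, the result is only a blend of past totals. It says nothing about why the totals changed.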
We Have To Disaggregate the Data
“There you go again, Tanner. Using words like ‘disaggregate’. What does that even mean?”
Assume you own an ice cream cone stand and you’re trying to project what sales of ice cream will be this month. What factors would go into that calculation?
You could just project it at a very high level and say, “Sales were $10,000 last month and $9,000 the month before, so I will estimate $9,500 for the current month.” And that might give you a reasonably close estimate.
But the key to accurate projections is to look at underlying data or events that make up that end result. You want to break apart the big event, or disaggregate it into smaller events you can study and measure. Instead of trying to guess the ending sales result, you’re better off trying to project the smaller things that make up that monthly total:
- The average selling price per ice cream cone
- The number of ice cream cones sold
- How many hours is the stand open each day?
- How many people will walk by the ice cream stand in a day? In an hour?
- Out of every 100 people that walk by the stand, how many buy a cone?
After you have estimated this information, you run the math and calculate the total sales for the month.
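As a rough sketch of “running the math,” the components listed above can be multiplied together. Every input below is a made-up placeholder, and the function and parameter names are hypothetical. The point is the structure of the calculation, not the numbers.

```python
def project_monthly_sales(price_per_cone, passersby_per_hour,
                          hours_open_per_day, days_open,
                          buyers_per_100_passersby):
    """Build a monthly sales estimate from its smaller components."""
    conversion_rate = buyers_per_100_passersby / 100
    cones_per_day = passersby_per_hour * hours_open_per_day * conversion_rate
    cones_sold = cones_per_day * days_open
    return price_per_cone * cones_sold


# All inputs are hypothetical placeholders.
sales = project_monthly_sales(
    price_per_cone=3.50,
    passersby_per_hour=100,
    hours_open_per_day=8,
    days_open=30,
    buyers_per_100_passersby=12,
)
print(f"Projected monthly sales: ${sales:,.2f}")
```

Once the estimate is built this way, a change to any single input (a higher price, longer hours, more foot traffic) flows straight through to the total.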
Why This Works
It’s hard to just look at $9,000 and $10,000 of monthly ice cream sales and make sense of those numbers. But if you know that you raised the price of each cone 25 cents, that you just hired an employee that will allow you to keep the stand open longer each day, that the employee has a striking resemblance to Jennifer Lawrence (with long hair, please) and has an uncanny ability to sell ice cream, and that there is a large festival taking place this month that will bring an extra 5,000 people by the stand, then you’ll be able to make a much more accurate projection than you would by simply looking at past monthly sales figures.
Applying This To Baseball
You can think of our typical rotisserie baseball categories as aggregated data, like the monthly ice cream sales. When you break it down, a home run is actually the end result of many smaller events that add up to a baseball being hit over the fence.
All of these events have to happen for a home run to occur:
- The ball has to clear the fence, which means:
  - The ball has to travel X number of feet
  - The fence is less than X feet from home plate
- The ball has to be hit in the air (a fly ball)
- The hitter has to have an at bat, which means:
  - The hitter has to have a plate appearance
- The hitter has to make contact (no swing and miss)
- The hitter has to swing
We could take this further, but you get the idea.
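One way to turn that chain of events into arithmetic is sketched below: start with plate appearances, then narrow down to batted balls, then fly balls, then fly balls that leave the park. The function name and all four input values are hypothetical placeholders, not Goldschmidt’s actual rates. The sketch also assumes the fly ball rate is measured as a share of batted balls and HR/FB as home runs per fly ball.

```python
def project_home_runs(plate_appearances, batted_balls_per_pa,
                      fly_ball_rate, hr_per_fb):
    """Project home runs by chaining the smaller events together."""
    batted_balls = plate_appearances * batted_balls_per_pa  # contact was made
    fly_balls = batted_balls * fly_ball_rate                # ball hit in the air
    return fly_balls * hr_per_fb                            # ball cleared the fence


# All inputs are hypothetical placeholders.
hr = project_home_runs(
    plate_appearances=650,
    batted_balls_per_pa=0.68,
    fly_ball_rate=0.36,
    hr_per_fb=0.18,
)
print(f"Projected home runs: {hr:.1f}")
```

Each input can now be studied and projected on its own, which is the whole reason for breaking the total apart.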
We Live In An Amazing Time
Fortunately, we have data available (for free!) to measure every bullet point above. Sticking with our original Goldschmidt example:
- You can see a plotting of every one of his home runs at HitTrackerOnline.com
- You can use the same site to see overlays of the dimensions of every MLB stadium
- You can even overlay all the hits of any specific player on any specific stadium at Katron.org’s hit location tool
- You can get information about any individual player’s batted ball distance at BaseballHeatMaps.com
- You can see a leaderboard of average fly ball distance, again at BaseballHeatMaps.com
- You can see the batted ball profile/breakdown (GB%, FB%, LD%) for any individual player at Fangraphs.com
- You can also get plate discipline, contact rates, and swing percentages at the same Fangraphs pages
This Isn’t Just For Home Runs
Batting average breaks down into strikeouts, balls in play, home runs, BABIP, batting average on LD/FB/GB, etc. Stolen bases break down into times on base, opportunities to steal, etc. Runs scored and RBI are functions of plate appearances, OBP, slugging, where a hitter bats in the lineup, the strength of the surrounding lineup, and more.
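As an illustration of the batting average breakdown, the sketch below rebuilds an average from at bats, strikeouts, home runs, and BABIP. It is a simplification that ignores sacrifice flies, which appear in the standard BABIP denominator. The function name and all inputs are hypothetical placeholders.

```python
def project_batting_average(at_bats, strikeouts, home_runs, babip):
    """Rebuild batting average from its components (sacrifice flies ignored)."""
    balls_in_play = at_bats - strikeouts - home_runs
    hits = home_runs + babip * balls_in_play
    return hits / at_bats


# All inputs are hypothetical placeholders.
avg = project_batting_average(at_bats=600, strikeouts=140,
                              home_runs=30, babip=0.320)
print(f"Projected batting average: {avg:.3f}")
```

Seen this way, a batting average spike can be traced to fewer strikeouts, more home runs, or a BABIP swing, and each of those has a different chance of repeating.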
Strikeouts by pitchers break down into innings pitched, pitch mix, effectiveness of specific pitches, swinging strike rates, and more. WHIP and ERA can be broken down into factors like BABIP, HR/FB, BB%, K%, the ballpark played in, surrounding defense, etc.
Too Much Of A Good Thing
There are two warnings to keep in mind when you are disaggregating data.
- The law of diminishing returns applies. You know, the economics lesson that the fourth KitKat bar just doesn’t taste quite as good as the first. If your original projection model is to “eyeball” the number of home runs based just on the last couple of seasons, you would have a certain level of accuracy with that approach. If you then break home runs down just one level into something like fly balls hit and a HR/FB calculation, your accuracy would likely increase significantly. But each additional level of breakdown that you attempt will likely have a lower and lower benefit to your accuracy.
- The disaggregation has to make sense in the context of what you are projecting. Sales at your ice cream cart could logically be affected if the salesperson looks like an attractive and famous female actress. But ice cream sales probably won’t be affected by who won the Super Bowl the previous year (measurements like, “the stock market increases in years after the NFC wins the Super Bowl” don’t make sense). You are probably saying, “No #h!~”. But we do see this in baseball analysis. Do you really think a hitter’s power is affected when the month of May rolls around? Or that a player who hit home runs the last three days is any more likely to hit a fourth?
How Do I Use This Information?
One application would be to use this to analyze statistical oddities. Were Paul Goldschmidt’s 36 HR legit? Look at his HR/FB and FB%. Could Billy Hamilton really steal 100 bases? Look at how often he stole in the minors and calculate how many times you think he’ll be on base in 2014.
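The stolen base question can be framed the same way. The sketch below assumes steals are the product of times on base, how often the runner attempts a steal once on base, and how often those attempts succeed. The function name and every input are hypothetical placeholders rather than Hamilton’s actual rates. Using a plain on-base rate also overstates opportunities, since it counts home runs and times the next base was occupied.

```python
def project_stolen_bases(plate_appearances, on_base_rate,
                         attempt_rate, success_rate):
    """Project steals from times on base, attempt rate, and success rate."""
    times_on_base = plate_appearances * on_base_rate
    attempts = times_on_base * attempt_rate
    return attempts * success_rate


# All inputs are hypothetical placeholders.
sb = project_stolen_bases(plate_appearances=600, on_base_rate=0.300,
                          attempt_rate=0.55, success_rate=0.82)
print(f"Projected stolen bases: {sb:.0f}")
```

Plugging in your own estimate of how often the player reaches base shows quickly whether a 100-steal season is plausible.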
Another practical application is to improve your current knowledge of player analysis. Take your knowledge of things one more level. If you aren’t already familiar with some of the advanced player analysis measures available (batted ball profiles, HR/FB, contact rates, etc.), add a few of those to your tool belt. If you are familiar with using batted ball stats, move on to contact rates. If you understand contact rates, break your understanding into left- and right-handed splits.
Or take a stab at developing your own player projections. This is a great way to see how these different metrics fit together and build up to a complete forecast for an individual player.
Or begin to apply this theory to other measures, like winning your fantasy baseball league. Start breaking down that big end result into smaller components that can be measured or studied. Do you know what it takes to win your league? Winning a title can be broken down into draft preparation, draft execution, in-season moves, free agency pickups, trades, end game moves, and more.
Break your behavior down into those buckets and take a closer look. Is there a component you feel weak in?
Do you have a very strong opponent in your league? Break their seasons down and see if you can learn things.
Thanks For Reading