The Difficulty In Aggregating Projections

Now that we’ve established that we can benefit from combining multiple projection models into one, let’s take a look at the challenges this presents.

I’ll also give brief explanations of how you can work around these challenges in Excel.  At the end I’ll discuss an Excel template I’m working on that will do these calculations for you automatically and how you can get your hands on it.

I Love Your Feedback

If you’re an SFBB Insider, you might recall that the very first e-mail I send after you sign up asks you to reply with any fantasy baseball topics you’d like to know more about or difficulties you’re having.  (If you’re not an Insider, you can register here.  I like to think it’s worth your while.)


I Read All Of Those Responses

I’ve been fortunate enough to have nearly 500 people register, and I read every single response that comes in to that question.  One of the most frequent areas of interest is how to average, or aggregate, multiple sets of projections into one usable set of information.

More Difficult Than I Originally Thought

These requests started to roll in during the off-season, and I even replied to several people saying that I thought this was going to be easy and that I’d have guidance coming out soon on how to do this.

… And here I sit months later, still not having written on the topic.

In theory, averaging a set of three numbers in Excel is easy.  If one system says 25 HR, one says 30 HR, and another says 35 HR, Excel’s AVERAGE formula can easily respond with the average of 30.

But I quickly ran into some big problems that greatly complicated things.

Problem 1 – Lining Projections Up To Do The Averages

In order to aggregate multiple projection systems into one, we need a method of “lining up” the projections from one system with those of another system.  Perhaps Giancarlo Stanton is projected to hit 20 HR the rest of the season by Steamer and 22 HR by PECOTA.

[Image: Giancarlo Stanton rest-of-season projections.  I made this information up just to illustrate the concept of “lining up” different projections.]

We can use formulas in Excel (e.g. VLOOKUP) to pull Stanton’s Steamer projection and place it next to his PECOTA projection.  But you can run into some complications in doing this.  What if one projection system lists him as “Stanton, Giancarlo” and the other as “Giancarlo Stanton”?

Using names to pull data also opens you up to inconsistencies in the name being used.  Is it Jonathan Singleton or Jon Singleton?  AJ Burnett or A.J. Burnett?
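To make the name problem concrete, here is a minimal Python sketch of the kind of cleanup you would need before matching on names.  The function `normalize_name` is hypothetical (it is not part of any spreadsheet or template mentioned here), and it only handles the formatting differences described above: “Last, First” ordering, periods in initials, and capitalization.

```python
def normalize_name(raw):
    """Return a lowercase 'first last' key with periods removed.

    Hypothetical helper for illustration only; it fixes formatting
    differences, not genuinely different spellings of a name.
    """
    name = raw.strip()
    # Flip "Last, First" into "First Last"
    if "," in name:
        last, first = name.split(",", 1)
        name = f"{first.strip()} {last.strip()}"
    # Drop periods so "A.J." and "AJ" produce the same key
    name = name.replace(".", "")
    # Lowercase and collapse any repeated whitespace
    return " ".join(name.lower().split())


print(normalize_name("Stanton, Giancarlo"))  # giancarlo stanton
print(normalize_name("A.J. Burnett"))        # aj burnett
print(normalize_name("Jon Singleton"))       # jon singleton
```

Notice that this kind of cleanup still treats “Jon Singleton” and “Jonathan Singleton” as two different players.  Formatting rules can’t tell you that two spellings refer to the same person.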

If you have taken on the challenge of creating your own rankings, you know that we’ve dealt with this problem before, but on a smaller scale.  In my rankings spreadsheets I use a consistent playerID to pull information between the different tabs.  I prefer to use the Baseball-Reference playerIDs because you can tell who a player is (Stanton is “stantmi03” because there were two other Mike Stantons before him).

But seemingly every major baseball site has its own player ID system.  Fangraphs says Stanton is “4949”, Baseball Prospectus uses “57556”, ESPN says “30583”, etc.

This is why I maintain the SFBB player ID map Excel file.  The map allows for this translation or “lining up” to happen.  It’s the bridge that can easily help you take Stanton’s projection from one system and place it next to his projection from another.
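For anyone who thinks better in code than in VLOOKUPs, here is a rough Python sketch of the same “bridge” idea.  Everything in it is an assumption made for illustration: the column names (`bref_id`, `fangraphs_id`, `bp_id`) are hypothetical and are not the actual headers in the SFBB player ID map, the 20 and 22 HR figures are the made-up numbers from the example above, and I’m assuming the Steamer file is keyed by the Fangraphs ID and the PECOTA file by the Baseball Prospectus ID.

```python
# One row of a hypothetical ID map, using the IDs quoted above
ID_MAP = [
    {"bref_id": "stantmi03", "fangraphs_id": "4949", "bp_id": "57556"},
]

# Made-up rest-of-season HR projections, each keyed by that site's own ID
steamer_hr = {"4949": 20}
pecota_hr = {"57556": 22}


def aggregate_hr(id_map, sources):
    """Average projections per player, keyed by the Baseball-Reference ID.

    sources is a list of (id_column, projections) pairs, where id_column
    names the ID map column that the projections dict is keyed by.
    """
    combined = {}
    for row in id_map:
        values = []
        for id_column, projections in sources:
            # Translate the common ID into this source's ID, then look it up
            value = projections.get(row[id_column])
            if value is not None:
                values.append(value)
        if values:
            combined[row["bref_id"]] = sum(values) / len(values)
    return combined


result = aggregate_hr(
    ID_MAP,
    [("fangraphs_id", steamer_hr), ("bp_id", pecota_hr)],
)
print(result)  # {'stantmi03': 21.0}
```

In this sketch a player who is missing from one source is simply averaged over the sources that do include him.  That is just one possible choice, not a recommendation.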

Problem 2 – Players Not Projected In All Systems

Continue reading “The Difficulty In Aggregating Projections”

Should You Combine Multiple Projection Systems Into One?

Should I use this projection system or that one?  Why mess around with the second best system if you can easily determine the best, right?

If you search the web, you can locate previous studies that review the accuracy of baseball’s many projection models.

I Don’t Have Time To Read All That.  Just Tell Me What They Say.

Understood.  Here’s my summary:

  • There are a lot of different approaches to projecting stats (Marcel, Steamer, ZiPS, Oliver, PECOTA, etc.)
    • Basic three year weighted average with regression to league average
    • More than three year weighted averages incorporating more advanced component metrics
    • Crowd sourcing
    • Aging curves
    • Similar player modelling
  • No single projection system is consistently better than the others in all the stat categories we care about for fantasy baseball
  • The most accurate projection model changes from year to year
  • But there are some that consistently perform well
  • Some systems do well in projecting offensive statistics
  • Some are better at pitching

What Is Also True

A lot of research has been done on the effectiveness of combining or “aggregating” different projections or forecasts into one.  This research was not done with only fantasy baseball in mind, but we can take advantage of it.  Here’s one very interesting article on the topic (it’s from a website named “forecastingprinciples.com” and is a PDF of a study from the Wharton School of Business at Penn, so it has to be legit, right?).

The thinking behind aggregating projections is that the wisdom of many intelligent people looking over a lot of information can lead to better results than just one isolated model for projecting future results.  When you combine all of this together you’ll naturally be smoothing out the outliers from the individual models, but hopefully you’re also improving the accuracy as a whole.

The Actual Results

It may not be appropriate to boil a 15-page research paper down to a couple of sentences.  But I’m going to do it anyway!  Here’s what the PDF linked above concludes about the value of combining forecasts: Continue reading “Should You Combine Multiple Projection Systems Into One?”