Module 5


  • Linear weights – analysts have used multiple linear regression to estimate the relative run value of each batting event (1B, 2B, 3B, HR, BB, HBP, etc.)
  • An out is worth about -.25 runs (the out value used in Palmer's LWTs)
  • OPS = OBP + SLG
  • BRA (Batter's Run Average) = OBP * SLG
  • XRR (Extrapolated Runs Reduced) = .50*1B + .72*2B + 1.04*3B + 1.44*HR + .33*(HBP + BB) + .18*SB - .32*CS - .098*(AB - H)
  • LWTs = .46*1B + .80*2B + 1.02*3B + 1.40*HR + .33*(HBP + BB) + .30*SB - .60*CS - .25*(AB - H)
    • By Pete Palmer
    • Measures Runs Above Average
  • Base Runs, created by David Smyth
    • More flexible than a fixed linear formula
    • Works across different run-scoring environments (might even work in softball)
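The rate stats and run estimators listed above can be checked on a single stat line. This is a minimal Python sketch that applies the coefficients exactly as written in these notes; the StatLine container and function names are hypothetical, OBP uses the standard (H + BB + HBP) / (AB + BB + HBP + SF) definition, and the stat line itself is made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class StatLine:
    """Counting stats for one batter or team (hypothetical container)."""
    ab: int
    singles: int
    doubles: int
    triples: int
    hr: int
    bb: int
    hbp: int
    sf: int
    sb: int
    cs: int

    @property
    def hits(self) -> int:
        return self.singles + self.doubles + self.triples + self.hr

    @property
    def total_bases(self) -> int:
        return self.singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr

    @property
    def outs(self) -> int:
        # The formulas in these notes count batting outs as AB - H
        return self.ab - self.hits


def obp(s: StatLine) -> float:
    # Standard definition: (H + BB + HBP) / (AB + BB + HBP + SF)
    return (s.hits + s.bb + s.hbp) / (s.ab + s.bb + s.hbp + s.sf)


def slg(s: StatLine) -> float:
    return s.total_bases / s.ab


def ops(s: StatLine) -> float:
    return obp(s) + slg(s)


def bra(s: StatLine) -> float:
    return obp(s) * slg(s)


def xrr(s: StatLine) -> float:
    return (0.50 * s.singles + 0.72 * s.doubles + 1.04 * s.triples + 1.44 * s.hr
            + 0.33 * (s.hbp + s.bb) + 0.18 * s.sb - 0.32 * s.cs - 0.098 * s.outs)


def lwts(s: StatLine) -> float:
    # Palmer's weights; a positive result means runs above average
    return (0.46 * s.singles + 0.80 * s.doubles + 1.02 * s.triples + 1.40 * s.hr
            + 0.33 * (s.hbp + s.bb) + 0.30 * s.sb - 0.60 * s.cs - 0.25 * s.outs)


# Hypothetical season: 500 AB, 150 H (100 1B, 30 2B, 5 3B, 15 HR), 50 BB, 5 HBP, 5 SF, 10 SB, 5 CS
line = StatLine(ab=500, singles=100, doubles=30, triples=5, hr=15,
                bb=50, hbp=5, sf=5, sb=10, cs=5)
print(f"OPS {ops(line):.3f}  BRA {bra(line):.3f}  XRR {xrr(line):.2f}  LWTs {lwts(line):+.2f}")
# prints: OPS 0.836  BRA 0.172  XRR 82.45  LWTs +26.75
```

Note that XRR estimates the batter's total runs created, while LWTs (because of its larger -.25 out penalty) estimates runs above an average hitter, which is why the two numbers differ so much for the same line.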


Regression to the Mean

  • Think of sophomore slumps and the Sports Illustrated cover jinx
  • Ignoring it often leads to incorrect interpretation of results
  • We are often measuring outcomes, not talents or skills
  • Outcomes are not a perfect depiction of talent; they can contain a heavy element of luck.
  • Envision a typical scatterplot. Even when two variables are strongly correlated, points still fall above and below the line of best fit.
  • Outcomes = Innate Skills & Talents (IS&T) + Error, Luck, Chance, Randomness (ELCR)
    • IS&T changes over time; it fluctuates
    • ELCR can change the outcome even when IS&T doesn’t change!
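The Outcomes = IS&T + ELCR model above is enough to produce regression to the mean on its own. The following Python sketch is a simulation under stated assumptions: each hypothetical player has a fixed true talent (held constant across both seasons, so IS&T never changes), each season's outcome adds independent random luck, and the talent and luck spreads are arbitrary batting-average-like numbers, not real data.

```python
import random
import statistics


def simulate(n_players=2000, talent_mean=0.260, talent_sd=0.015,
             luck_sd=0.020, top_frac=0.10, seed=2024):
    """Outcome = talent + luck; talent is held fixed across both seasons."""
    rng = random.Random(seed)
    talents = [rng.gauss(talent_mean, talent_sd) for _ in range(n_players)]
    year1 = [t + rng.gauss(0, luck_sd) for t in talents]
    year2 = [t + rng.gauss(0, luck_sd) for t in talents]

    # Select the players with the best year-1 outcomes (the "cover athletes")
    k = int(n_players * top_frac)
    top = sorted(range(n_players), key=lambda i: year1[i], reverse=True)[:k]

    return {
        "league_mean": statistics.mean(year1),
        "top_year1": statistics.mean(year1[i] for i in top),
        "top_year2": statistics.mean(year2[i] for i in top),
        "top_talent": statistics.mean(talents[i] for i in top),
    }


result = simulate()
for name, value in result.items():
    print(f"{name:12s} {value:.3f}")
```

The top group's year-2 average drops back toward its true talent even though no player's skill changed: part of what got them selected in year 1 was good luck, and that luck does not carry over. They still finish above the league mean because they really are somewhat more talented on average.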


SQL Commands

  • DESCRIBE TableName – lists all field names, types, keys, etc. for the specified table (MySQL syntax)
  • You can select from two instances (aliases) of the same table in one query. For example, alias the batting table as “b12” and filter it to yearID = 2012, then alias it again as “b13” and filter it to yearID = 2013. Adding WHERE b12.playerID = b13.playerID joins the two, giving each player's stats for the two years side by side.
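The two-alias self-join described above can be tried without a database server. This sketch uses Python's built-in sqlite3 module with a tiny in-memory batting table; the playerIDs and HR totals are made up, and only the playerID, yearID, and HR columns are assumed.

```python
import sqlite3

# Tiny in-memory stand-in for a batting table; playerIDs and HR totals are made up
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batting (playerID TEXT, yearID INTEGER, HR INTEGER)")
conn.executemany(
    "INSERT INTO batting VALUES (?, ?, ?)",
    [("aaa01", 2012, 30), ("aaa01", 2013, 22),
     ("bbb01", 2012, 10), ("bbb01", 2013, 18),
     ("ccc01", 2012, 25)],  # no 2013 row, so ccc01 drops out of the join
)

# Two aliases of the same table, each filtered to one season, joined on playerID
query = """
SELECT b12.playerID, b12.HR AS HR_2012, b13.HR AS HR_2013
FROM batting AS b12, batting AS b13
WHERE b12.yearID = 2012
  AND b13.yearID = 2013
  AND b12.playerID = b13.playerID
ORDER BY b12.playerID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# prints ('aaa01', 30, 22) then ('bbb01', 10, 18)
conn.close()
```

If your batting table stores one row per team stint (a traded player has several rows in one year), the join will pair every stint with every other stint, so you may need to total each player's season first with SUM(...) and GROUP BY playerID, yearID.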

R Commands

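  • file.choose() – opens a dialog box in the operating system to choose the file to load
  • DataFrameName["XXX"] = XYZ – adds a new field called “XXX” to a data frame using the vector XYZ
  • lm(X ~ Y) – fits a linear model; here X is the response and Y is the predictor (X regressed on Y)
  • boxplot() – creates a box plot showing the median and quartiles; by default the whiskers extend to the most extreme values within 1.5 * IQR of the box (the min and max when there are no outliers), and points beyond that are drawn individually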
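Fitting a model with several predictors, as lm() does, is how the linear weights at the top of this module can be estimated: regress runs on event counts. As a stdlib-only Python sketch of the same least-squares idea, the code below builds synthetic team seasons from known weights and checks that ordinary least squares recovers them. Everything here is an assumption for illustration: the event ranges, the noise level, the choice of a no-intercept model, and the use of Palmer's LWTs as the "true" weights that generate the fake data.

```python
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def least_squares(rows, y):
    """Ordinary least squares with no intercept, via the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


EVENTS = ["1B", "2B", "3B", "HR", "BB", "OUT"]
# Palmer-style weights, used here ONLY to generate fake data
TRUE_WEIGHTS = [0.46, 0.80, 1.02, 1.40, 0.33, -0.25]
# Hypothetical per-team season ranges for each event count
RANGES = [(900, 1100), (250, 330), (20, 60), (120, 220), (400, 600), (4000, 4200)]

rng = random.Random(7)
teams, runs = [], []
for _ in range(500):
    counts = [rng.randint(lo, hi) for lo, hi in RANGES]
    exact = sum(w * c for w, c in zip(TRUE_WEIGHTS, counts))
    teams.append(counts)
    runs.append(exact + rng.gauss(0, 2))  # small random noise

weights = least_squares(teams, runs)
for event, w in zip(EVENTS, weights):
    print(f"{event:>3s} {w:+.3f}")
```

Solving the normal equations directly is the simplest approach to read, but it is less numerically stable than the QR decomposition R's lm() uses; with real data you would also want an intercept and far more careful treatment of noise and collinearity.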


Earnshaw Cook

  • Wrote the book “Percentage Baseball” in 1964, the first full-length book on sabermetrics. It was not well written, and its conclusions would not stand up to scientific scrutiny.
    • Suggested sending batters to the plate in order of skill, best hitter first (this has not held up)
    • Suggested ignoring platoon splits and simply playing the best hitters (not true)
    • Suggested starting the game with a relief pitcher, pinch-hitting for him the first time through the order, and then switching to the starting pitcher
    • The book was too complicated and math-focused to be widely adopted
    • It contained mathematical mistakes
    • But he was a pioneer by doing this in the 1960s
  • Princeton, engineer, professor
  • Applied math to baseball
  • 1964 Scoring Index, R = (Constant * (1B+BB+ROE+HBP-2*SH)*TB)/PA
  • 1972 Scoring Index, R = ((H+BB+HBP)*(TB+SB-CS))/PA
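  • Formulas are similar to OPS in spirit; structurally, each multiplies an on-base term by an advancement term, like BRA (OBP * SLG)
  • He analyzed run expectancy for the 24 base-out states (8 base configurations × 3 out counts)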
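Cook's two scoring indices, exactly as written in the notes above, plus the 24 base-out states can be expressed directly in code. This is a minimal Python sketch: the function names are hypothetical, ROE means reached on error and SH means sacrifice hits, the 1964 constant is left as a parameter because its value is not given here, and the team-season numbers in the usage lines are made up.

```python
from itertools import product


def scoring_index_1964(singles, bb, roe, hbp, sh, tb, pa, constant):
    # Cook's 1964 form as given in these notes; `constant` is unspecified here
    return constant * (singles + bb + roe + hbp - 2 * sh) * tb / pa


def scoring_index_1972(h, bb, hbp, tb, sb, cs, pa):
    # (times on base) * (bases gained) / plate appearances
    return (h + bb + hbp) * (tb + sb - cs) / pa


def base_out_states():
    """All 24 base-out states: 8 base configurations x 3 out counts."""
    bases = list(product([False, True], repeat=3))  # (runner on 1st, 2nd, 3rd)
    return [(b, outs) for outs in range(3) for b in bases]


# Hypothetical team season
print(f"{scoring_index_1972(h=1400, bb=500, hbp=50, tb=2200, sb=100, cs=40, pa=6200):.1f}")
# prints 710.8
print(len(base_out_states()))
# prints 24
```

Each state pairs one of the 2^3 = 8 ways the bases can be occupied with 0, 1, or 2 outs; a run expectancy table assigns one average "runs scored from here to the end of the inning" value to each of these 24 states.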
