# Module 5

## Sabermetrics

• Linear weights – analysts have used multiple linear regression to estimate the relative run value of baseball events (1B, 2B, 3B, HR, BB, HBP, etc.)
• In linear weights, one out is worth about -.25 runs
• OPS = OBP + SLG
• BRA = OBP * SLG
• XRR = (.50 * 1B) + (.72 * 2B) + (1.04 * 3B) + (1.44 * HR) + .33 * (HBP + BB) + .18 * SB - .32 * CS - .098 * (AB - H)
• LWTs = (.46 * 1B) + (.80 * 2B) + (1.02 * 3B) + (1.40 * HR) + .33 * (HBP + BB) + .30 * SB - .60 * CS - .25 * (AB - H)
• LWTs was developed by Pete Palmer
• LWTs measures runs above average
• Base Runs, created by David Smyth
• More flexible than the fixed-weight formulas above
• Works across different run-scoring environments (might even work in softball)
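
As a rough illustration of the formulas listed above, here is a minimal Python sketch that evaluates OPS, BRA, XRR, and LWTs for a single stat line. The `StatLine` class, the helper function names, and the sample numbers are all made up for illustration. OBP and SLG use their usual definitions, which these notes do not spell out.

```python
from dataclasses import dataclass


@dataclass
class StatLine:
    """Season counting stats for one hitter (hypothetical container)."""
    ab: int
    h: int
    doubles: int
    triples: int
    hr: int
    bb: int
    hbp: int
    sb: int
    cs: int
    sf: int = 0

    @property
    def singles(self) -> int:
        return self.h - self.doubles - self.triples - self.hr

    @property
    def total_bases(self) -> int:
        return self.singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr


def obp(s: StatLine) -> float:
    # Standard on-base percentage (assumed definition, not given in the notes)
    return (s.h + s.bb + s.hbp) / (s.ab + s.bb + s.hbp + s.sf)


def slg(s: StatLine) -> float:
    # Standard slugging percentage: total bases per at-bat
    return s.total_bases / s.ab


def ops(s: StatLine) -> float:
    return obp(s) + slg(s)


def bra(s: StatLine) -> float:
    return obp(s) * slg(s)


def xrr(s: StatLine) -> float:
    return (0.50 * s.singles + 0.72 * s.doubles + 1.04 * s.triples
            + 1.44 * s.hr + 0.33 * (s.hbp + s.bb)
            + 0.18 * s.sb - 0.32 * s.cs - 0.098 * (s.ab - s.h))


def lwts(s: StatLine) -> float:
    # Palmer's weight for a single is .46
    return (0.46 * s.singles + 0.80 * s.doubles + 1.02 * s.triples
            + 1.40 * s.hr + 0.33 * (s.hbp + s.bb)
            + 0.30 * s.sb - 0.60 * s.cs - 0.25 * (s.ab - s.h))


# Invented stat line, not a real player
player = StatLine(ab=550, h=160, doubles=30, triples=4, hr=25,
                  bb=60, hbp=5, sb=10, cs=4, sf=5)

print(f"OPS  = {ops(player):.3f}")
print(f"BRA  = {bra(player):.3f}")
print(f"XRR  = {xrr(player):.2f}")
print(f"LWTs = {lwts(player):.2f}")
```

The two run estimators share most of their event weights but differ sharply on the out term (-.098 versus -.25), which is why LWTs comes out much lower for the same stat line.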

## Statistics

### Regression to the Mean

• Think of sophomore slumps and the Sports Illustrated (SI) cover jinx
• Often leads to incorrect interpretation of results
• We are often measuring outcomes, not talents or skills
• Outcomes are not necessarily a perfect depiction of talent. They can have a heavy element of luck.
• Envision a typical scatterplot. Even when there is a strong correlation between two variables, there is always error above and below the line of best fit.
• Outcomes = Innate Skills & Talents (IS&T) + Error, Luck, Chance, Randomness (ELCR)
• IS&T changes over time; it fluctuates
• ELCR can change the outcome even when IS&T doesn’t change!
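
The "Outcomes = IS&T + ELCR" idea above can be sketched with a small simulation. This is only an illustration under assumed conditions: talent and luck are both drawn from normal distributions with the same spread, talent is held fixed across two seasons, and all names and constants are invented.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is repeatable

N_PLAYERS = 2000
TALENT_SD = 1.0  # assumed spread of innate skill (IS&T)
LUCK_SD = 1.0    # assumed spread of luck (ELCR)

# Talent is held fixed across both seasons; only luck is redrawn.
talent = [rng.gauss(0, TALENT_SD) for _ in range(N_PLAYERS)]
year1 = [t + rng.gauss(0, LUCK_SD) for t in talent]
year2 = [t + rng.gauss(0, LUCK_SD) for t in talent]

# Pick the top 10% of players by year-1 outcome
top = sorted(range(N_PLAYERS), key=lambda i: year1[i], reverse=True)
top = top[: N_PLAYERS // 10]

top_y1 = mean(year1[i] for i in top)
top_y2 = mean(year2[i] for i in top)
top_talent = mean(talent[i] for i in top)

print(f"Top group, year 1 outcome: {top_y1:.2f}")
print(f"Top group, true talent:    {top_talent:.2f}")
print(f"Top group, year 2 outcome: {top_y2:.2f}")
```

Because talent never changes in this setup, the whole drop from year 1 to year 2 for the top group comes from luck being redrawn. The group is still better than average in year 2, just not by as much as its year-1 results suggested.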

## Technology

### SQL Commands

• DESCRIBE TableName – lists all field names, types, keys, etc. for the specified table
• You can select from two instances of the same table in SQL by giving each instance its own alias. For example, you can create one instance of the batting table “b12” where yearID=2012, and select data from that. You can then create a second instance “b13” where yearID=2013, and select data from that. Then use WHERE b12.playerID = b13.playerID to get stats side by side for the two years.
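
The two-instance query described above might look like the following sketch. It runs the SQL through Python's built-in `sqlite3` module against an in-memory table so it is self-contained. The `Batting` table layout, the `HR` column, and the rows are assumptions for illustration, and the course's own database may use a different SQL engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Batting (playerID TEXT, yearID INTEGER, HR INTEGER)")

# Invented rows, not real player data
conn.executemany(
    "INSERT INTO Batting VALUES (?, ?, ?)",
    [
        ("player_a", 2012, 30),
        ("player_a", 2013, 22),
        ("player_b", 2012, 12),
        ("player_b", 2013, 18),
        ("player_c", 2013, 9),  # no 2012 row, so the join drops this player
    ],
)

# b12 and b13 are two aliases (instances) of the same Batting table
query = """
SELECT b12.playerID, b12.HR AS hr_2012, b13.HR AS hr_2013
FROM Batting b12, Batting b13
WHERE b12.playerID = b13.playerID
  AND b12.yearID = 2012
  AND b13.yearID = 2013
ORDER BY b12.playerID
"""
rows = conn.execute(query).fetchall()

for player_id, hr_2012, hr_2013 in rows:
    print(player_id, hr_2012, hr_2013)
```

Only players with a row in both years appear in the result. If a table holds more than one row per player per year, this join returns every pairing of those rows, so the data may need to be aggregated first.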

### R Commands

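• file.choose() – opens the operating system's file dialog to choose the file to load
• DataFrameName["XXX"] = XYZ – adds a new field called "XXX" to a data set using the variable array (vector) XYZ
• lm(X~Y) – fits a linear model; the variable left of ~ (here X) is the response and the variable right of ~ (here Y) is the predictor
• boxplot() – creates a box plot that shows the median and quartiles, with whiskers out to the min and max (by default, points more than 1.5 times the interquartile range from the box are drawn separately as outliers)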

## History

### Earnshaw Cook

• Wrote the book “Percentage Baseball” in 1964, the first full-length book on sabermetrics. It was not very well written and would not stand up as a scientific finding.
• Suggested sending batters to the plate in order of skill, best hitter first (this has not held up)
• Suggested discarding platoon splits and just using the best hitters (this has not held up either)
• Suggested starting the game with a relief pitcher (RP), pinch-hitting for him the first time through the order, then switching to a starting pitcher (SP)
• The book was too complicated and math-focused to be adopted
• It also contained mathematical mistakes
• Still, he was a pioneer for doing this in the 1960s
• Princeton, engineer, professor
• Applied math to baseball
• 1964 Scoring Index, R = (Constant * (1B+BB+ROE+HBP-2*SH)*TB)/PA
• 1972 Scoring Index, R = ((H+BB+HBP)*(TB+SB-CS))/PA
• Formulas are similar to OPS
• He analyzed run expectancy for the 24 game states
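
For a concrete feel for Cook's 1972 Scoring Index listed above, here is a small Python sketch of the formula. The function name and the sample stat line are invented for illustration. The 1964 version is left out because its constant is not given in these notes.

```python
def scoring_index_1972(h: int, bb: int, hbp: int, tb: int,
                       sb: int, cs: int, pa: int) -> float:
    """Cook's 1972 Scoring Index: R = ((H+BB+HBP) * (TB+SB-CS)) / PA."""
    on_base = h + bb + hbp       # times reaching base
    advancement = tb + sb - cs   # bases gained, net of caught stealing
    return on_base * advancement / pa


# Invented stat line, not a real player
estimated_runs = scoring_index_1972(h=160, bb=60, hbp=5, tb=273,
                                    sb=10, cs=4, pa=620)
print(f"Scoring Index (1972) = {estimated_runs:.2f}")
```

The formula multiplies an on-base count by an advancement count and divides by plate appearances, so it combines the same two ingredients that OBP and SLG capture.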