Sabermetrics
- Linear weights – analysts have used multiple linear regression to estimate the relative run value of baseball events (1B, 2B, 3B, HR, BB, HBP, etc.)
- One out is worth -.25 runs
- OPS = OBP + SLG
- BRA = OBP * SLG
- XRR = (.50 * 1B) + (.72 * 2B) + (1.04 * 3B) + (1.44 * HR) + .33 * (HBP + BB) + .18 * SB - .32 * CS - .098 * (AB - H)
- LWTs = (.46 * 1B) + (.80 * 2B) + (1.02 * 3B) + (1.40 * HR) + .33 * (HBP + BB) + .30 * SB - .60 * CS - .25 * (AB - H)
- LWTs was developed by Pete Palmer
- LWTs measures runs above average
- Base Runs, created by David Smyth
- More flexible
- Works across different run scoring environments (might even work in softball)
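The formulas above can be sketched as plain Python functions. This is only an illustration: the function and parameter names are assumptions, the season line is made up, and the LWTs singles weight is taken to be .46.

```python
def ops(obp, slg):
    """OPS: on-base percentage plus slugging percentage."""
    return obp + slg


def bra(obp, slg):
    """BRA: on-base percentage times slugging percentage."""
    return obp * slg


def xrr(singles, doubles, triples, hr, bb, hbp, sb, cs, ab, h):
    """XRR with the coefficients listed in the notes; outs are AB - H."""
    return (0.50 * singles + 0.72 * doubles + 1.04 * triples + 1.44 * hr
            + 0.33 * (hbp + bb) + 0.18 * sb - 0.32 * cs - 0.098 * (ab - h))


def lwts(singles, doubles, triples, hr, bb, hbp, sb, cs, ab, h):
    """LWTs with the coefficients listed in the notes (singles weight assumed .46)."""
    return (0.46 * singles + 0.80 * doubles + 1.02 * triples + 1.40 * hr
            + 0.33 * (hbp + bb) + 0.30 * sb - 0.60 * cs - 0.25 * (ab - h))


# Hypothetical season line, made up for illustration only.
season = dict(singles=100, doubles=30, triples=5, hr=20,
              bb=60, hbp=5, sb=10, cs=4, ab=550, h=155)

print(f"XRR  = {xrr(**season):.2f}")
print(f"LWTs = {lwts(**season):.2f}")
```

For this made-up line XRR comes to about 88.9 and LWTs to about 26.4. The gap comes mostly from the weight on outs (-.098 versus -.25), which fits with LWTs measuring runs above average.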
Statistics
Regression to the Mean
- Think of sophomore slumps and the Sports Illustrated (SI) cover jinx
- Often leads to incorrect interpretation of results
- We are often measuring outcomes, not talents or skills
- Outcomes are not necessarily a perfect depiction of talent. Outcomes can have a heavy element of luck.
- Envision a typical scatterplot. Even when there is strong correlation between two variables, there is always error above and below that line of best fit.
- Outcomes = Innate Skills & Talents (IS&T) + Error, Luck, Chance, Randomness (ELCR)
- IS&T changes over time; it fluctuates
- ELCR can change the outcome even when IS&T doesn’t change!
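A small simulation can make the Outcomes = IS&T + ELCR idea concrete. This is a minimal sketch under stated assumptions: skill and luck are both drawn from normal distributions with made-up spreads, skill is held fixed across two seasons, and all names are hypothetical.

```python
import random


def simulate_two_seasons(n_players=2000, skill_sd=1.0, luck_sd=1.0, seed=42):
    """Return (year1, year2) outcome lists for players whose skill never changes."""
    rng = random.Random(seed)
    skills = [rng.gauss(0.0, skill_sd) for _ in range(n_players)]
    # Same skill in both years; only the luck term is redrawn.
    year1 = [s + rng.gauss(0.0, luck_sd) for s in skills]
    year2 = [s + rng.gauss(0.0, luck_sd) for s in skills]
    return year1, year2


def top_group_means(year1, year2, fraction=0.10):
    """Mean year-1 and year-2 outcomes for the top `fraction` of year-1 performers."""
    k = max(1, int(len(year1) * fraction))
    top = sorted(range(len(year1)), key=lambda i: year1[i], reverse=True)[:k]
    mean1 = sum(year1[i] for i in top) / k
    mean2 = sum(year2[i] for i in top) / k
    return mean1, mean2


year1, year2 = simulate_two_seasons()
mean1, mean2 = top_group_means(year1, year2)
print(f"top 10% in year 1: {mean1:.2f}, same players in year 2: {mean2:.2f}")
```

The year-1 leaders stay above average in year 2 but fall back toward the mean, even though nobody's skill changed. They were selected partly for good luck, and that luck does not repeat.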
Technology
SQL Commands
- Describe TableName – gives output of all field names, types, keys, etc. for the specified table
- You can select from two instances of the same table in SQL by giving each instance its own alias. For example, you can create one instance of the batting table “b12” where yearID=2012, and a second instance “b13” where yearID=2013. Then use WHERE b12.playerID = b13.playerID to get each player's stats side by side for the two years.
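The two-instance query can be tried against a throwaway in-memory SQLite database from Python. This is a sketch only: the table is a tiny made-up stand-in for the batting table, the H column and the player IDs are hypothetical, and SQLite is swapped in for whatever database the class uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batting (playerID TEXT, yearID INTEGER, H INTEGER)")
conn.executemany(
    "INSERT INTO batting VALUES (?, ?, ?)",
    [
        ("playera01", 2012, 150),
        ("playera01", 2013, 170),
        ("playerb01", 2012, 120),
        ("playerb01", 2013, 110),
        ("playerc01", 2013, 95),  # no 2012 row, so the join drops this player
    ],
)

# b12 and b13 are two aliases (instances) of the same batting table.
query = """
    SELECT b12.playerID, b12.H AS h_2012, b13.H AS h_2013
    FROM batting b12, batting b13
    WHERE b12.yearID = 2012
      AND b13.yearID = 2013
      AND b12.playerID = b13.playerID
    ORDER BY b12.playerID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Only players with a row in both years come back. If a table holds more than one row per player per year, the join returns one row for every combination, so those rows may need to be summed first.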
R Commands
- file.choose() – Opens dialog box in operating system to choose the file to load
- DataFrameName["XXX"] = XYZ – adds a new field called "XXX" to a data set using the vector XYZ
- lm(X~Y) – fits a linear model: a linear regression with X as the response and Y as the predictor
- boxplot() – creates a box plot that shows min, max, median, and quartiles
History
Earnshaw Cook
- Wrote the book “Percentage Baseball” in 1964, the first full-length book on sabermetrics. It was not very well written and would not stand up as a scientific finding.
- Suggested sending batters to the plate in order of their skill, best hitter first (this has since been shown to be wrong)
- Suggested discarding platoon splits and just using the best hitters (also not true)
- Suggested starting the game with a relief pitcher (RP), pinch hitting for him the first time through the order, then switching to the starting pitcher (SP)
- The book was too complicated and math-focused to be adopted
- Mathematical mistakes
- But he was a pioneer by doing this in the 1960s
- Princeton, engineer, professor
- Applied math to baseball
- 1964 Scoring Index, R = (Constant * (1B+BB+ROE+HBP-2*SH)*TB)/PA
- 1972 Scoring Index, R = ((H+BB+HBP)*(TB+SB-CS))/PA
- Formulas are similar to OPS
- He analyzed run expectancy for the 24 game states
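As a worked example, the 1972 Scoring Index can be computed directly from the formula as written above. The function name and the season line are made up for illustration. The 1964 version is left out because its constant is not given in these notes.

```python
def scoring_index_1972(h, bb, hbp, tb, sb, cs, pa):
    """1972 Scoring Index as written in the notes: (H+BB+HBP)*(TB+SB-CS)/PA."""
    if pa <= 0:
        raise ValueError("pa must be positive")
    return (h + bb + hbp) * (tb + sb - cs) / pa


# Hypothetical season line, made up for illustration only.
value = scoring_index_1972(h=155, bb=60, hbp=5, tb=255, sb=10, cs=4, pa=620)
print(f"1972 Scoring Index: {value:.1f}")
```

For this made-up line, the times-on-base term is 220 and the advancement term is 261, so the index is 220 * 261 / 620, or about 92.6.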