Baseball: An Introduction to Sabermetrics

If you are a baseball statistics junkie, then you have probably heard about sabermetrics. Sabermetrics is the use of baseball statistics and other research to analyze the game for the purpose of making comparisons between players and teams throughout baseball's history.

The word comes from the acronym SABR, which stands for the Society for American Baseball Research. Noted baseball statistical guru Bill James coined the term, but James did not devise the first new baseball formula. As far back as the early 1950s, former Cardinals, Dodgers, and Pirates general manager Branch Rickey created the first of what today would be considered a sabermetric formula. The Mahatma was among the first to realize that on-base percentage, (hits + walks + hit by pitch)/(at bats + walks + hit by pitch + sacrifice flies), was more important than batting average. He combined on-base percentage with isolated power (slugging percentage minus batting average) to come up with an early cousin of today's on-base plus slugging.
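As a rough illustration of the rate statistics mentioned above, here is a minimal Python sketch. It assumes the modern definition of on-base percentage, which counts hit by pitch in the denominator as well as the numerator. The function names and the batting line are hypothetical, not taken from any real player.

```python
def on_base_pct(h, bb, hbp, ab, sf):
    """On-base percentage: times on base per counted plate appearance."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def isolated_power(tb, h, ab):
    """Isolated power: slugging percentage minus batting average."""
    return tb / ab - h / ab


def on_base_plus_slugging(h, bb, hbp, ab, sf, tb):
    """OPS: on-base percentage plus slugging percentage."""
    return on_base_pct(h, bb, hbp, ab, sf) + tb / ab


# Hypothetical batting line, for illustration only.
print(round(on_base_pct(h=30, bb=10, hbp=1, ab=100, sf=1), 3))
print(round(isolated_power(tb=49, h=30, ab=100), 3))
print(round(on_base_plus_slugging(h=30, bb=10, hbp=1, ab=100, sf=1, tb=49), 3))
```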
 
The book Moneyball by Michael Lewis created several thousand new fans of sabermetric baseball analysis. Oakland Athletics General Manager Billy Beane interpreted the statistics of baseball players through sabermetric means, and it helped the A's stay near the top with one of the lowest payrolls in the Major Leagues. In 2004, Boston Red Sox General Manager Theo Epstein used sabermetric analysis to make the necessary changes to the Bosox roster. Those changes enabled the Sox to win their first World Championship since the dead ball era.
 
There are many different sabermetric ratings that measure some form of hitting, pitching, or fielding. To cover them all would take a book the size of the Baseball Encyclopedia. Therefore, I am going to choose four hitting ratings and one pitching formula. After explaining them, I am going to attempt to apply them to Vanderbilt baseball.
 
Hitter's Statistics
 
1. Ugly Weights: Phil Birnbaum created this formula. Its name comes from the improvement he made over a long-used group of ratings called "Linear Weights" (created by Pete Palmer). This improved linear formula is among the most accurate because it works equally well for average players and for players far better or far worse than average. Some ratings work only for the typical player but fail when trying to rate Barry Bonds or Mario Mendoza.
 
2. Extrapolated Runs (XR): Jim Furtado created this easy-to-use formula. Actually, there are three different formulas for this rating: XR, for when all offensive statistics are available; XR Reduced, for when only the bare basic statistics are available; and XRB or XR Basic, which is XR Reduced without hit by pitch statistics.
 
3. Estimated Runs Produced (ERP): Paul Johnson (not the Navy football coach) created this easy-to-use formula. On an individual basis, it is more accurate than the Runs Created formula below because it works better for players with high on-base percentages and slugging averages.
 
4. Runs Created: This is one of Bill James' ratings. This formula loses accuracy with players with high on-base percentages and slugging averages (like Ted Williams or Barry Bonds), but it is fairly decent when used on a team-by-team basis.
 
Ugly Weights Formula
 
Runs = .46*(1B) + .80*(2B) + 1.02*(3B) + 1.40*(HR) + .33*(BB) + .30*(SB) - .50*(CS) - [(.687 * avg) - (1.188 * avg²) + (.152 * isolated power²) - {1.288 * (BB/AB) * avg} - (.049 * avg * isolated power) + {.271 * avg * isolated power * (BB/AB)} + {.459 * (BB/AB)} - {.552 * (BB/AB)²} - .018] * (Outs)
 
Isolated Power = (TB-H)/AB
Outs = AB - H
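For readers who would rather compute this than work it by hand, the following is a minimal Python sketch that transcribes the Ugly Weights formula as printed above. It assumes avg means hits divided by at bats and that the bracketed expression is multiplied by outs. The function name, argument names, and batting line are hypothetical.

```python
def ugly_weights(singles, doubles, triples, hr, bb, sb, cs, ab):
    """Ugly Weights runs estimate, transcribed from the printed formula."""
    hits = singles + doubles + triples + hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    avg = hits / ab                      # assumed: batting average, H/AB
    iso = (total_bases - hits) / ab      # isolated power, (TB-H)/AB
    walk_rate = bb / ab                  # the BB/AB term
    outs = ab - hits

    # Fixed weights for hits, walks, steals, and caught stealing.
    linear = (0.46 * singles + 0.80 * doubles + 1.02 * triples
              + 1.40 * hr + 0.33 * bb + 0.30 * sb - 0.50 * cs)

    # The bracketed expression: a per-out charge that depends on
    # the hitter's own average, power, and walk rate.
    per_out = (0.687 * avg - 1.188 * avg ** 2 + 0.152 * iso ** 2
               - 1.288 * walk_rate * avg - 0.049 * avg * iso
               + 0.271 * avg * iso * walk_rate
               + 0.459 * walk_rate - 0.552 * walk_rate ** 2 - 0.018)

    return linear - per_out * outs


# Hypothetical batting line, for illustration only.
print(round(ugly_weights(singles=20, doubles=5, triples=1, hr=4,
                         bb=10, sb=2, cs=1, ab=100), 2))
```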
 
Extrapolated Runs
 
XR = (.50 x 1B) + (.72 x 2B) + (1.04 x 3B) + (1.44 x HR) + (.34 x (HBP + TBB - IBB)) + (.25 x IBB) + (.18 x SB) + (-.32 x CS) + (-.090 x (AB - H - K)) + (-.098 x K) + (-.37 x GIDP) + (.37 x SF) + (.04 x SH)
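The full XR formula above is a straight weighted sum, so it can be sketched in a few lines of Python. This is only an illustration: the dictionary keys and the batting line are hypothetical, and TBB is taken to mean total walks including intentional ones.

```python
def extrapolated_runs(s):
    """Full Extrapolated Runs (XR) from a dict of counting stats."""
    hits = s["1B"] + s["2B"] + s["3B"] + s["HR"]
    return (0.50 * s["1B"] + 0.72 * s["2B"] + 1.04 * s["3B"] + 1.44 * s["HR"]
            + 0.34 * (s["HBP"] + s["TBB"] - s["IBB"])  # unintentional walks plus HBP
            + 0.25 * s["IBB"]
            + 0.18 * s["SB"] - 0.32 * s["CS"]
            - 0.090 * (s["AB"] - hits - s["K"])        # outs on balls in play
            - 0.098 * s["K"]
            - 0.37 * s["GIDP"]
            + 0.37 * s["SF"] + 0.04 * s["SH"])


# Hypothetical batting line, for illustration only.
batter = {"1B": 20, "2B": 5, "3B": 1, "HR": 4, "HBP": 1, "TBB": 10,
          "IBB": 1, "SB": 2, "CS": 1, "AB": 100, "K": 20,
          "GIDP": 2, "SF": 1, "SH": 0}
print(round(extrapolated_runs(batter), 2))
```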
 
Estimated Runs Produced
 
ERP = (2*(TB+BB+HBP)+H+SB-(.605*(AB+CS+GIDP-H)))*.16
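A minimal Python sketch of the ERP formula above follows. The function name, the intermediate variable names, and the batting line are hypothetical and serve only to show how the pieces fit together.

```python
def estimated_runs_produced(tb, bb, hbp, h, sb, ab, cs, gidp):
    """Estimated Runs Produced, transcribed from the printed formula."""
    credits = 2 * (tb + bb + hbp) + h + sb   # bases gained and times on base
    outs_made = ab + cs + gidp - h           # outs charged to the hitter
    return (credits - 0.605 * outs_made) * 0.16


# Hypothetical batting line, for illustration only.
print(round(estimated_runs_produced(tb=49, bb=10, hbp=1, h=30, sb=2,
                                    ab=100, cs=1, gidp=2), 2))
```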
 
Runs Created
 
RC = ((H+BB+HBP-CS-GIDP) * (TB + .26*(BB-IBB+HBP) + .52*(SH+SF+SB))) / (AB+BB+HBP+SH+SF)
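Here is a short Python sketch of Runs Created, assuming the formula is read as an on-base term times an advancement term, all divided by the full sum AB+BB+HBP+SH+SF. The three intermediate names (on_base, advancement, opportunities) are my own labels, and the batting line is hypothetical.

```python
def runs_created(h, bb, hbp, cs, gidp, tb, ibb, sh, sf, sb, ab):
    """Runs Created, read as (on base * advancement) / opportunities."""
    on_base = h + bb + hbp - cs - gidp
    advancement = tb + 0.26 * (bb - ibb + hbp) + 0.52 * (sh + sf + sb)
    opportunities = ab + bb + hbp + sh + sf
    return on_base * advancement / opportunities


# Hypothetical batting line, for illustration only.
print(round(runs_created(h=30, bb=10, hbp=1, cs=1, gidp=2, tb=49,
                         ibb=1, sh=0, sf=1, sb=2, ab=100), 2))
```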
 
Pitcher's Statistics
 
Rating pitchers has always been difficult. Take a look at two Major League pitchers from 2005. Last year Derek Lowe of the Los Angeles Dodgers posted an ERA of 3.61 and a walks plus hits per inning pitched (Whip) of 1.25. Jon Lieber of the Philadelphia Phillies had an ERA of 4.20 and a Whip of 1.21. Now look at their records. Lowe went just 12-15. Lieber went 17-13. Can we definitely say one pitcher was better than the other with just these statistics? No, and here's why. Both of these pitchers are notorious ground ball inducers. Dodger Stadium's infield is rough and choppy. The infield at Citizens Bank Park in Philadelphia is ideal. A hard grounder that sneaks through the infield in LA won't behave the way one will in Philadelphia. A hard-hit grounder down either line at Dodger Stadium will curve around and come to the fielder when it hits the curved-in wall. At Philadelphia, that same ball will bounce away from the fielder and yield an extra base.
 
More importantly, just how much effect does a pitcher have on a ball that has been hit on the ground, or for that matter a line drive or a fly ball? If the pitcher's infield has four Gold Glovers, he will give up far fewer ground ball hits than a pitcher with four Pacific Coast League fielders just called up. If a pitcher surrenders a lot of fly balls, he will fare worse with an average-fielding outfield at Coors Field than he would with an excellent-fielding outfield at U.S. Cellular Field.
 
To compare pitchers, you must factor out all the things that the pitcher has no control over. For the most part, when a pitcher walks or strikes out a batter, he gets all the credit or blame. When a pitcher surrenders an over-the-fence home run, his defense has no effect (the ballpark does, though). Every time the ball is batted into play, a pitcher's teammates have far more say in whether it becomes a hit or an out. This isn't to say that the pitcher has no influence. A batter may not get a good piece of wood on the ball because a good pitch has fooled him.
 
Knuckleball and junk-ball pitchers make their living enticing batters to swing at pitches that are hard to hit well but not hard to make some form of contact with.
 
There is a rating that factors in all these things and comes close to capturing a pitcher's true effectiveness. It is called "defense independent pitching statistics" (Dips). For this formula, we owe Voros McCracken much praise. This prominent sabermetrician earned a consultant position with the Boston Red Sox for his excellent work with the computer.
 
Dips has undergone some changes in the past year and in its third version (Dips 3.0), it is highly accurate at predicting performance. However, it requires knowing the percentage of ground balls allowed, fly balls allowed, pop-ups allowed, and line drives allowed. Obviously, these stats are only available at the Major League level and only in recent years when people began keeping this data. So, we could not apply them to Vanderbilt baseball.
 
Luckily, there is a rating that mimics Dips and it is easily adaptable for all baseball levels. It is called Defense Independent Component ERA (DICE). A similar statistic is called Fielding Independent Pitching Statistics (FIPS). Here is the formula for DICE. It is supposed to estimate true ERA. The caveat for this formula is that it doesn't work well for knuckleball or junk-ball pitchers.
 
DICE = 3 + (([13*HR] + [3*(BB+HBP)] - [2*K])/IP)
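The DICE formula above is easy to sketch in Python. The example below uses the Dennis Eckersley 1990 totals quoted later in this article. His hit-by-pitch total is not quoted there, so zero is an assumption. The innings helper is a hypothetical convenience that reads box-score notation, where "73.1" means 73 and one-third innings.

```python
def innings(ip_text):
    """Convert box-score innings such as "73.1" (73 and 1/3) to a decimal."""
    whole, _, thirds = ip_text.partition(".")
    return int(whole) + (int(thirds) if thirds else 0) / 3


def dice(hr, bb, hbp, k, ip):
    """Defense Independent Component ERA, as given in the formula above."""
    return 3 + (13 * hr + 3 * (bb + hbp) - 2 * k) / ip


# Dennis Eckersley, 1990. HBP=0 is an assumption, not a quoted figure.
print(round(dice(hr=2, bb=4, hbp=0, k=73, ip=innings("73.1")), 2))  # 1.53
```

Passing innings as a plain decimal would also work; the helper only guards against treating "73.1" as 73.1 innings.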
 
Vandy's Ratings With Some Major League Comparisons
 
Let's apply some of these ratings to our red-hot Commodores. These ratings won't be as accurate as they would be for the big leagues, but this should be a fun exercise that reveals some truth. Since nine conference games really aren't enough to use, I will apply this to the 28 total games played to date (as of April 3). The Commodores have compiled a 19-9 ledger so far, and the stats are quite impressive.
 
First let's look at Ugly Weights for all Commodore hitters with 50 or more at bats. Since Vandy has played just 28 games, I will then multiply each rating by 5.75 to approximate how these ratings would look for a complete Major League season. The numbers listed are the actual ratings followed by the 162-game equivalent ratings. Following the Commodore players' ratings, I will include some Major League players' stats as a comparison.
 
Player                 UW     162-game UW
Pedro Alvarez          27     155
Ryan Flaherty          20     115
Dominic de la Osa      15      86
Brian Hernandez        15      86
David Macias           14      81
Parker Hanks           13      75
Shea Robin             10      58
Alex Feinberg           9      52
Ryan Davis              7      40
Matt Meingasner         7      40
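
The 162-game column in the table above can be reproduced with a short Python sketch. It uses the article's multiplier of 5.75 (slightly below 162/28, which is about 5.79) and assumes halves are rounded up, which is what makes 14 come out as 81. The function name is hypothetical.

```python
SCALE = 5.75  # the article's multiplier for 28 games played


def full_season(rating, scale=SCALE):
    """Scale a 28-game rating to a 162-game pace, rounding halves up."""
    return int(rating * scale + 0.5)  # assumed rounding rule; ratings are positive


# A few of the Ugly Weights ratings from the table above.
ratings = {"Pedro Alvarez": 27, "Ryan Flaherty": 20, "David Macias": 14}
for name, uw in ratings.items():
    print(f"{name:<20}{uw:>4}{full_season(uw):>6}")
```

Python's built-in round() would turn 80.5 into 80 because it rounds halves to the nearest even number, which is why the sketch adds 0.5 and truncates instead.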
 
Last year's average American League player had a UW rating of 79.
 
A-Rod had a UW of 143.
 
Albert Pujols had a UW of 146.
 
What this means is that Pedro Alvarez is having one great season for the black and gold.
 
The best ever UW was Barry Bonds in 2004 with 215. The best prior to that season was Babe Ruth in 1921 with 199. Ted Williams posted a UW of 178 in his big 1941 season.
 
You will see that Extrapolated Runs are very close to UW. The XR ratings for Vandy's top three batters are Alvarez with 25, Flaherty with 20, and de la Osa with 16. Notice the similarities with UW.
 
Estimated Runs Produced is virtually identical to the other two ratings. For Vandy's top three hitters, the ratings show Alvarez with 25, Flaherty with 19, and de la Osa with 15.
 
Now let's look at pitching stats. This is where the Goldmen rank among the very top teams in the NCAA. Five pitchers have thrown 20 innings or more. David Price has a DICE rating of just 0.69! We'll see how unbelievable that is in a moment. Ty Davis comes next with a DICE of 2.01, followed by Cody Crowell with 2.39, Cameron Betourne with 2.65, and Matt Buschmann with 3.32.
 
Looking at last season's top Major League pitchers, Roger Clemens posted a DICE of 2.85; Dontrelle Willis registered a 2.97, while Johan Santana bettered those with a 2.78.
 
So, what pitchers in Major League history recorded something close to David Price's current rating? Nobody, that's who. In 1968, Bob Gibson posted an ERA of just 1.12 with 22 wins and 13 shutouts. His DICE rating was 2.39. Sandy Koufax had the best four-consecutive-year period of any modern-day pitcher from 1963 to 1966. In 1965, he went 26-8 with a then-record 382 strikeouts in 335.2 innings. His DICE for that year was 2.41. Lefty Grove went 31-4 in 1931 and posted a DICE of 2.89. Walter Johnson in 1912 struck out 303 batters and won 33 games. His DICE was 2.18. Looking at the best relief record for one season, Dennis Eckersley in 1990 walked only 4 batters all season; he yielded just two homers and struck out 73 batters in 73.1 innings; he saved 48 games with an ERA of 0.61. His DICE was 1.53. So, Price's current DICE rating of 0.69 is something to behold, even if SEC baseball is about the equivalent of Class A minor league ball.
