Posts Tagged ‘charts and graphs’


Who is The Best Base Stealer in MLB?

March 20, 2010

After several seasons of watching Jacoby Ellsbury, Carl Crawford and B.J. Upton run amok around the A.L. East, it’s been refreshing to see elite speed on the Yankees in the form of Brett Gardner. (Side note: have you ever seen the actual definition of the word amok? Less mischievous and more sinister than I would have guessed. A manic urge to murder. Yikes.) As a team, the Yankees have been better than average at stealing bases for the past several years (ranking 7th in the A.L. in 2009, 4th in 2008, 4th in 2007, 2nd in 2006 and 6th in 2005). However, that’s mostly because they featured so many players capable of stealing bases, none of them at an elite level. Jeter, Damon, Abreu, and A-Rod have all been capable of 20-30 steals in their time as Yankees, and both Joe Girardi and Joe Torre have been more than willing to let them run. None of those players possesses the base stealing potential or the incredible speed that Brett Gardner does. While it is certainly way too early to say Brett Gardner is the best base stealer in baseball (he hasn’t even played a full major league season), his ability and potential to take that title have made me wonder just who is.

Rickey Henderson still leads the league in Rickey

Aside from the list of players above, there are a number of others who belong in the discussion: Jose Reyes (operating on the assumption that his bionic legs are intact), Jimmy Rollins, Brian Roberts, Michael Bourn, Ichiro, Willy Taveras and Chone Figgins. A few others are perhaps worth mentioning, though I’m not factoring them into the discussion: Rajai Davis and Nyjer Morgan (like Gardner, they haven’t had enough time to fully display their skills in the majors but will probably be among the best in the years to come), and Carlos Gomez and Joey Gathright (neither of whom plays enough, due to other limitations, to make full use of his ability… but there’s always this).

I’m going to look at the last three years’ worth of data, checking out the basics (stolen base totals, stolen base percentage) and trying to figure out how well these players put their speed to use. A simple (and very, very raw) way of estimating how much use they get out of their talents is to look at how many bases they steal relative to how much they play. Of course, that depends heavily on a lot of other factors (where in the lineup they bat affects how often they’re on base with men in front of them, and could also affect how comfortable a manager is green-lighting a steal), but it should provide a rough estimate. The way I will be calculating that is:

(stolen bases - caught stealing) / plate appearances

Unlike data for batting with men on base, data for base running with other men on base has proven a bit difficult to find, so this quick formula neglects it (along with a ton of other factors). Even with a decent margin for error, I think it will be a helpful tool in figuring out who the best base stealer in baseball is. By the way, for a running total of spring training SBs you can click here.
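For anyone who would rather check the math outside a spreadsheet, here is a minimal sketch of that formula in Python. The function name and the sample line are made up for illustration; the numbers are not any real player's totals.

```python
def net_steals_per_pa(stolen_bases: int, caught_stealing: int,
                      plate_appearances: int) -> float:
    """Return (SB - CS) / PA, the rough usage rate described above."""
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    return (stolen_bases - caught_stealing) / plate_appearances


# Hypothetical three-year line (NOT real data): 60 SB, 15 CS, 600 PA.
example_rate = net_steals_per_pa(60, 15, 600)
print(f"{example_rate:.3f} net steals per plate appearance")
```

Subtracting CS before dividing means a player who runs a lot but gets thrown out a lot does not look better than one who picks his spots.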

The Google doc is here. So, what say you, dinosaur writer guy? Well, let’s look at the counting totals first, and remember that everything is for 2007-2009:

In spite of all his missed time, Jose Reyes has the highest total, followed by Carl Crawford. Figgins was caught the most in the group, followed by Reyes and Upton.

Jimmy Rollins has an incredible 87.5% success rate. Upton and Figgins probably run more than they should, stealing below the magic number of 75% success.
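Success rate here is just steals divided by attempts. A quick sketch of that check against the 75% mark follows; the helper names and the sample counts are hypothetical, chosen only because 7 of 8 works out to 87.5%.

```python
BREAK_EVEN = 0.75  # the "magic number" mentioned above


def success_rate(stolen_bases: int, caught_stealing: int) -> float:
    """Return SB / (SB + CS)."""
    attempts = stolen_bases + caught_stealing
    if attempts == 0:
        raise ValueError("no stolen base attempts")
    return stolen_bases / attempts


def clears_break_even(stolen_bases: int, caught_stealing: int,
                      threshold: float = BREAK_EVEN) -> bool:
    """True when the runner succeeds at least `threshold` of the time."""
    return success_rate(stolen_bases, caught_stealing) >= threshold


# Hypothetical counts (NOT real data).
efficient = success_rate(7, 1)         # 7 steals in 8 attempts
too_eager = clears_break_even(70, 30)  # 70 steals in 100 attempts
print(f"{efficient:.1%}", too_eager)
```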

Now, as for who makes the most out of his skills: Willy Taveras is not very good at getting on base. His career OBP is a hilarious .321, easily the lowest of the entire group. Yet he still produces more successful steals (discounting for CS) per plate appearance than any of the others. What that says is that if Taveras had even average on-base ability, say somewhere in the range of .340-.350, you could be looking at a guy who steals 80 bases annually.

So who do we conclude is the best base stealer in baseball? I’m going to have to go with Taveras. Among all of the top base stealers he has the second best success rate, which he maintains even while taking off more liberally than any of the others. In a perfect world Brett Gardner turns into a base stealing clone of Rollins, Ellsbury or Taveras, hopefully getting on base more often than any of them (and playing better defense, in the case of Ellsbury). The best case scenario is that Gardner turns into a 100-110 OPS+ player who steals a ton of bases and plays excellent defense at a premium position in center. The worst case would seem to be a much better version of Gathright: a defensive replacement and pinch runner who can be valuable depending on the situation and proper use.

By the way I think manic urges to murder needs to become a more frequently used post tag. I will use it every time I write an article ranting about Mike Lupica.



August 30, 2009

Tonight I have a sort of dual purpose project. My first reason for getting into it is that I am trying to learn Excel basics. The second is that I was curious as to how well runs scored by MLB teams correlate with runs created. Runs created is a sort of simple catch-all offensive stat. It includes power numbers, on-base skills, and speed as well. Caught stealing takes away from the on-base portion, because obviously you have wiped away the appearance on base. Stolen bases contribute to the base-advancing half, which also includes total bases to encompass doubles, triples and homers. That wiki link explains it better than I do in any case, and while it is far less advanced than wOBA and other new metrics, it is still a surprisingly accurate formula for the runs an offense can be expected to produce.

Also note that there are several versions of runs created. I am using a semi-basic one that still includes SB and CS:
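For reference, one widely cited version that includes SB and CS is the "stolen base" version of Bill James's formula: RC = (H + BB - CS) x (TB + 0.55 x SB) / (AB + BB). It matches the description above (CS comes off the on-base half, SB joins total bases in the advancement half), but whether it is exactly the version in the workbook is an assumption. A minimal Python sketch, with made-up team totals:

```python
def runs_created_sb(hits: int, walks: int, caught_stealing: int,
                    total_bases: int, stolen_bases: int,
                    at_bats: int) -> float:
    """'Stolen base' version of runs created (assumed, see note above):
    (H + BB - CS) * (TB + 0.55 * SB) / (AB + BB).
    """
    opportunities = at_bats + walks
    if opportunities <= 0:
        raise ValueError("at_bats + walks must be positive")
    on_base = hits + walks - caught_stealing          # on-base half
    advancement = total_bases + 0.55 * stolen_bases   # base-advancing half
    return on_base * advancement / opportunities


# Hypothetical team totals (NOT real data).
team_rc = runs_created_sb(hits=1500, walks=600, caught_stealing=30,
                          total_bases=2400, stolen_bases=100, at_bats=5600)
print(f"{team_rc:.1f} runs created")
```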


Now, if anyone would like to check out the Excel workbook, it can be seen here. On to the nerdery!

The first graph plots the total number of runs scored (blue) against runs created by the team (red). The teams are ordered from least prolific offense to most, with the league and MLB averages at the top.

[Graph: runs created vs. runs scored]

It’s impressive how accurate these formulas are. The team outperforming its expected runs the most is the A’s. The Yankees are actually underperforming: by the formula, they should have scored even more runs than they have. I’m not about to go into any deeper calculations today to figure out the exact cause, but reasons for a difference between actual runs and runs created can be double plays hit into, running into outs on the bases, or simple failure in clutch situations. That last one is a possibility for the Yankees, as they hit 12 points lower with runners in scoring position than they do overall.

The second graph levels out team totals, since every team has not played the same number of games. It compares runs per game (red) with runs created per game (blue).

[Graph: runs per game vs. runs created per game]

Same idea as the last graph, just slightly more accurate. In all likelihood the two values will get closer to each other as the season goes on. The larger the sample size, the lower the probability of an outlier (such as the Yankees’ bad luck, which is astounding to see given that they are second in MLB in runs per game).
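The per-game leveling is nothing fancier than dividing each total by games played. A small sketch of that step, with hypothetical function names and made-up totals:

```python
def per_game(total: float, games: int) -> float:
    """Scale a season-to-date total to a per-game rate."""
    if games <= 0:
        raise ValueError("games must be positive")
    return total / games


def run_gap_per_game(runs_scored: float, runs_created: float,
                     games: int) -> float:
    """Runs per game minus runs created per game.

    Negative means the team is scoring less than the formula expects.
    """
    return per_game(runs_scored, games) - per_game(runs_created, games)


# Hypothetical team (NOT real data): 700 runs scored, 715 created, 130 games.
gap = run_gap_per_game(700, 715, 130)
print(f"{gap:+.3f} runs per game versus expectation")
```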

I have a few other projects of varying depth I’ll be messing around with, and I’ll try to keep everything on here updated with new info. Stay tuned!