MLB Starting Rotations: Using Data to Define an Ace (and a 2 and a 3 and a 4…)

Since I really want to use the blogosphere to solve as many of baseball’s infinite puzzles as I possibly can (within the constraints of life), it probably seems like I’m not being very ambitious with this post – at least if you’re judging by the title. I get it…there’s even a definition of “Ace” provided by Major League Baseball at MLB.com. That’s about as official as it gets, so consider this a closed case, right? Well, you can probably assume from the inclusion of hundreds of words below this paragraph that my answer is no. There’s really not much existing literature that delineates the parameters of an “ace” (or any other spot in the rotation) in an objective, data-driven manner. The great Jeff Sullivan was on the cusp with this Fangraphs post, but ultimately conducted an opinion poll in which readers were asked if they considered the top SP in each team’s rotation an “ace”. Both Jeff’s methodology and his conclusions underscore additional benefits of establishing objective, context-neutral parameters:

(NOTE: This isn’t a criticism of Jeff Sullivan or his post…he’s probably my favorite baseball writer by a wide margin, and his objectives with said post were not the same as my objectives with this post):

  1. Posted prior to the 2016 season, the content is contemporaneously relevant, and 71% of respondents considered Sonny Gray an ace. With statistically-rigid definitions of what an ace is, we could measure Sonny Gray’s performance at that point against them instead of laughing at the mere thought of being asked “Is Sonny Gray an ace?”. At this juncture I’d imagine Gray’s perception is that of a fringe starter who fills in when someone goes down. But is that what he really is? I don’t know, we haven’t established what makes a fringe starter either. With context-neutral definitions of each rotation spot, we can eliminate the contemporaneous relevance and easily make comparisons across seasons or even eras.
  2. Jeff concluded there were about 20 starting pitchers in Major League Baseball that most people would agree were aces, which leaves us 10 shy of what we’d expect given the MLB definition of “ace” (the top starting pitcher on a team). While small year-to-year variances are to be expected, we should consistently find about 30 pitchers who fall within the parameters of acehood. So really, Jeff’s poll found there was a perception that 20 aces were active at the time – I contend that there were actually around 30, and roughly a third of them weren’t all that obvious. We want to eliminate the perception aspect with definitive criteria that undeniably establish acehood.
  3. It turns out that the perception of an ace wasn’t completely performance-based (shocker!): pitchers from more talented rotations were penalized for being teammates with other good starting pitchers. Stephen Strasburg outperformed many of the pitchers who scored higher than him, yet only 57% of respondents considered him an ace – largely due to being in the same rotation as Max Scherzer (and probably injuries). While some may consider it fundamentally incorrect to label multiple pitchers from the same rotation “aces”, it’s going to be harder to convince me that a league-average pitcher who leads a rotation where he’s followed by 4 below-average teammates is more worthy of the ace label. Objectively speaking, an ace is unconditionally an ace based on performance (not on that of his teammates). The ace parameters will rid us of the perception penalty incurred by aces who are teammates with aces, and likewise the perception benefit bestowed on non-aces who overshadow their relatively inferior rotation mates.

Before we go any further, I want to make it clear that I’m writing under the assumption that an “ace” and a “#1” are synonymous. On a recent episode of Effectively Wild, Ben, Jeff, and Meg Rowley all bantered about how we define an ace, and even briefly attempted to distinguish between an ace and a #1; not that they’re mutually exclusive, but from what I gathered it sounded more like the beginning of an LSAT logic game where ‘all aces are #1s, but not all #1s are aces…’. I don’t want to strictly adhere to the MLB.com definition, but for the sake of this post, we’re going to at least continue under the assumption that aces and #1s meet the same defining criteria.

Perhaps counterintuitively, the task of defining each role within a rotation is even more important given the lightening workloads of starting pitchers and, inversely, the increasing workloads of relievers. The paradigm shifts with caution, and no team should have a perennial Cy Young candidate throw anything less than the greatest quantity of innings he can possibly throw without sacrificing performance or health.

With the advent of the Opener, what truly constitutes a “Starting Pitcher” is becoming increasingly vague. It wouldn’t be much of a surprise to see some of the more traditional roles played by back-of-the-rotation starting pitchers completely disappear in the pretty near future. But it should be a little more than obvious that this evolutionary process isn’t necessary for all SPs, right? Perhaps the most likely progression begins with the teams under tighter budget constraints, those with relatively deeper relief corps than starting corps, and the ones just a little more forward thinking. We saw the Rays unveil the strategy out of necessity, soon followed by the injury-stricken Athletics. But what was spawned initially out of necessity for the early adopters should presumably expand to teams doing it out of practicality.

But in the wake of all this, one puzzle we’re left to figure out revolves around the pitchers to cut from their traditional role – who should be sacrificed to this developing experiment?

I’m not going to try and answer that in THIS post, because we need to solve another puzzle as a prerequisite – the definition of each spot in the rotation. On one hand, it couldn’t be simpler; each spot is based on the order of talent within a given pool of starting pitchers, beginning with the most talented at the top. On the other hand, it’s a complex and generally subjective matter, albeit unnecessarily; a lot of credible baseball people might require seemingly arbitrary attributes, like a minimum fastball velocity for an ace, or more strikeouts than innings for anyone in the one or two spot. I’m not saying these ideas are necessarily incorrect either, but my goal is to wash away the ambiguity. Defining the performance expectations of each spot in the rotation can be done objectively by analyzing some key metrics and keeping the parameters simple.

First we’ll define the parameters. We know MLB’s definition of an “ace” is the best starting pitcher on a given team. We also concede that not every team has an ace because talent isn’t equally distributed. So how we divide the pitcher roles will be across teams rather than within them; this means “aces” will be the top 30 starting pitchers in MLB, not the single best starting pitcher from each of the 30 teams (which is how we’d determine acehood using MLB’s definition).
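To make the difference between the two definitions concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the three-team league, the pitcher labels, and the generic "score" (lower is better) are assumptions, not real data or the metric used later in this post.

```python
# Toy league: (pitcher, team, score), where a LOWER score is better.
# All names and numbers are hypothetical and only illustrate the two definitions.
sample = [
    ("A1", "Team A", 70),
    ("A2", "Team A", 80),
    ("B1", "Team B", 95),
    ("B2", "Team B", 105),
    ("C1", "Team C", 102),
    ("C2", "Team C", 115),
]


def aces_by_team(pitchers):
    """Within-team definition: the best starter on each team is an ace."""
    best = {}
    for name, team, score in pitchers:
        if team not in best or score < best[team][1]:
            best[team] = (name, score)
    return sorted(name for name, _ in best.values())


def aces_league_wide(pitchers, n_teams):
    """Across-team definition: the top n_teams starters in the whole league."""
    ranked = sorted(pitchers, key=lambda p: p[2])
    return sorted(name for name, _, _ in ranked[:n_teams])


within_team = aces_by_team(sample)
across_teams = aces_league_wide(sample, n_teams=3)
print("Best on each team:", within_team)
print("Top 3 league-wide:", across_teams)
```

In this toy league, Team A's second starter counts as an ace under the across-team definition, while Team C's best starter does not. That is the Strasburg and Scherzer situation from earlier, in miniature.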

As easy as it is to envision the stereotypical grumpy baseball traditionalist reciting how only a few pitchers handled the majority of innings decades ago, 5-man rotations outnumbered all other combinations for the first time in 1926 (believe it or not, the 6-man rotation was actually more common than the 3-man rotation at that point). So we can call a rotation a pool of five starting pitchers without much controversy. However, given how improbable it is to expect the same 5 pitchers to make all their scheduled starts in a given year, every team generally has a 6th pitcher who can start (either in theory or with an actual place on the 25-man roster) whenever someone from the top 5 can’t. As a role every team has been forced to utilize, and the means by which many SPs crack their first rotation, the 6th spot is by no means trivial. So, while we’ll call a rotation a set of 5 SPs, we’re also saying they’re the top 5 from a pool of 6 pitchers. This establishes 6 tiers that, under optimal conditions, would be represented by sextiles (that’s what you call 6 equally-sized groups) of talent, where the first sextile holds the top 16.7% of talent and the talent level descends with each tier after it.

Unfortunately, since true talent can’t really be quantified, we’ll have to proxy talent with performance metrics. Here I’m going to use ERA-, FIP-, and xFIP-. This lets us compare the metrics equally across different seasons, leagues, and parks, creating a context-neutral benchmark for comparison. I assume anyone who finds themselves on this blog is familiar with these three metrics and why they’re more useful than their slightly-more-traditional-non-minus counterparts. But if not, I highly recommend checking out their entries in the Fangraphs Glossary (you’ll learn a ton in like 5 minutes).

(NOTE: If you REALLY don’t feel like leaving the page, the key here is the number 100; 100 is average. An ERA-/FIP-/xFIP- under 100 is better than average, and anything above 100 is worse than average, with the absolute difference representing the percent better or worse than average. For example, a FIP- of 75 is 25% better (less) than league average: 75 – 100 = -25, or 25% below the average of 100. For normalized stats that end in “-”, any measure below 100 is good, while the opposite holds true for normalized metrics ending with a “+”, such as wRC+.)
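If it helps to see that arithmetic spelled out, here is a tiny Python sketch. The function names are hypothetical, invented just for this illustration.

```python
def percent_vs_average(minus_stat):
    """Distance of a 'minus' stat (ERA-, FIP-, xFIP-) from the league average of 100.

    Negative means better than average, positive means worse.
    """
    return minus_stat - 100.0


def describe(minus_stat):
    """Turn a 'minus' stat into a plain-English comparison with league average."""
    diff = percent_vs_average(minus_stat)
    if diff < 0:
        return f"{abs(diff):.0f}% better than league average"
    if diff > 0:
        return f"{diff:.0f}% worse than league average"
    return "exactly league average"


print(describe(75))   # the FIP- of 75 from the note above
print(describe(110))  # a made-up worse-than-average value
```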

Instead of using these metrics individually for our approximation of talent, I’m going to use the average. ERA often comes under fire because it’s a relatively poor predictor of future performance due to the amount of luck associated with its inputs – which is well warranted given that FIP, xFIP, and even K%-BB% all predict future ERA better than past ERA does. But I’m including ERA here because I don’t see any reason to omit past success as a component that defines an ace, or any other tier of a rotation, lucky or unlucky. However, since we’re attempting to approximate talent to define each tier, it’s important we limit the weight of ERA since much of its variance is fielding-dependent. We do this by including the other two metrics, FIP(-) and xFIP(-), both of which are fielding-independent and rely exclusively on the pitcher. Furthermore, while each metric is results-based, the most forward-looking of them is xFIP, which predicts its own future values better than FIP and ERA predict theirs. So while xFIP might be the worst descriptor of what actually happened, it’s easily the best indicator of what will eventually happen. This is important because it makes future expectations a part of the equation.

Additionally, while it won’t be perfect given the differing year-to-year variance of each respective metric, the average also gives us an idea of the rough cutoff for each metric individually. So once we establish our cutoffs, we could say, “player X had an ERA- of 99 but an xFIP- of 75. So he pitched like a #3 starter, but I expect him to pitch like an ace moving forward”.

So our talent proxy is simply the average of ERA-, FIP-, and xFIP-, which I’ll call MEAN-. Once we establish the cutoff for each sextile, our tiers will be defined. Using data from 2002 through 2018, I looked at every pitcher who threw at least 100 IP as a starter, calculated both their MEAN- and their respective MEAN- percentile rank, and here’s what we have:

While splitting our data into sextiles gives us the mathematical explanation as to why this happened, at first glance it might seem odd to see Tier 4 begin with the league average MEAN-…because league average should be a #3, shouldn’t it? Actually it shouldn’t. There’s a reason top pitching prospects are often given labels that imply something as seemingly underwhelming as a “3rd starter” – it’s because 3rd starters are (barely) above average pitchers. Sure, they’re seen as the midpoint in the rotation, but they’re only the midpoint when the best 5 options make all their scheduled starts, themselves included. At some point, every team utilizes their 6th option, with few exceptions. In 2018, the Indians and Rockies used the fewest starting pitchers with 7, while the average big league team utilized 12. Starting pitchers whose innings total ranked 6th or lower on their respective teams accounted for 18.8% of starting pitcher innings – only the top ranked starting pitcher (and presumable ace) accounted for more with 21%. This helps explain why the 4th Tier is where league average goes, and not the 3rd Tier.
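For readers who want to try the tiering on their own data, the procedure can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the exact code behind the numbers in this post: the function names are hypothetical, the six stat lines are invented, and the rank-based split is just one simple way to cut a pool into six equal groups.

```python
from statistics import mean


def mean_minus(era_minus, fip_minus, xfip_minus):
    """MEAN-: the simple average of ERA-, FIP-, and xFIP-."""
    return mean([era_minus, fip_minus, xfip_minus])


def assign_tiers(pitchers, n_tiers=6):
    """Rank pitchers by MEAN- (lower is better) and cut the pool into n_tiers groups.

    pitchers maps a name to an (ERA-, FIP-, xFIP-) tuple.
    Returns a dict of name -> tier, where Tier 1 is the best group.
    """
    scores = {name: mean_minus(*stats) for name, stats in pitchers.items()}
    ranked = sorted(scores, key=scores.get)
    n = len(ranked)
    # Integer arithmetic on the rank keeps the groups as equal in size as possible.
    return {name: (i * n_tiers) // n + 1 for i, name in enumerate(ranked)}


# Invented stat lines, one per tier, purely for illustration.
sample = {
    "Pitcher A": (70, 75, 80),
    "Pitcher B": (85, 90, 95),
    "Pitcher C": (95, 96, 97),
    "Pitcher D": (100, 100, 100),
    "Pitcher E": (108, 110, 112),
    "Pitcher F": (120, 125, 130),
}

tiers = assign_tiers(sample)
for name, tier in tiers.items():
    print(name, "MEAN- =", mean_minus(*sample[name]), "-> Tier", tier)
```

With a real pool (every starter with at least 100 IP as a starter from 2002 through 2018, as described above), the same ranking step would place roughly one sixth of pitchers in each tier.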

The table above shows some average performance metrics of the starting pitchers within each tier dating back to 2002. Everything descends or ascends in the order you’d expect it to, but one interesting thing about the table is the WAR column. Tiers 2 through 6 are separated pretty evenly, ranging anywhere between a 0.6 and 0.8 WAR differential with the adjacent tier. The exception is Tier 1 (our Ace Tier), which is a full 1.5 WAR ahead of Tier 2. We can see this more clearly in the table of average WAR by tier; the linearity holds steady for the most part in tiers 2 through 6, only to slope more sharply from 2 to 1. So even while we’ll find roughly the same number of pitchers within each tier on an annual basis, upgrading from a Tier 3 pitcher to a Tier 2 pitcher won’t yield the same improvement you’d see from upgrading a Tier 2 to a Tier 1. The roughly equal tier-by-tier difference in WAR across the bottom 5 tiers suggests we get essentially flat marginal returns from any single-tier upgrade unless we’re adding a Tier 1 guy (an ace!).

That may have been tough to follow, but let me put it another way. Let’s say you’re a GM headed into the offseason with the goal of upgrading your rotation via trade. For the sake of this hypothetical, you’re only able to offer one trade package comprised of a starting pitcher from your current rotation, a prospect, and cash. In return, you’ll receive a starting pitcher that’s 1 tier better than the SP you’re trading away (the prospect and cash are irrelevant other than making the tier downgrade worthwhile for your trade partner). We’ll hold the prospect and cash fixed, so the only part of the offer you can change is the tier of the pitcher you give up, and therefore, the tier of the pitcher you receive. So here’s what you’re looking at in the trade for a new SP:

  • Assume your 5-man rotation is comprised of a starting pitcher from each of the top 5 tiers
  • You also have a Tier 6 pitcher you use as a spot starter
  • Your ace is the only pitcher you’re unable to trade
  • If you give up a Tier 6, you’ll receive a Tier 5    (~0.8 net WAR)
  • If you give up a Tier 5, you’ll receive a Tier 4    (~0.7 net WAR)
  • If you give up a Tier 4, you’ll receive a Tier 3    (~0.6 net WAR)
  • If you give up a Tier 3, you’ll receive a Tier 2    (~0.8 net WAR)
  • If you give up a Tier 2, you’ll receive a Tier 1    (~1.5 net WAR)

The right thing to do here is to give up your Tier 2 pitcher, so you end up getting a Tier 1 SP. Sure, you get two aces in the rotation now, but the reason for giving up your #2 isn’t as simple as ‘adding an ace’. The reason you gave up your Tier 2 for a Tier 1 is that it was the only offer whose marginal upgrade stood out from everything else on the table: roughly 1.5 WAR, about double what any other swap returns. In other words, the added benefit from swapping a Tier 6 for a Tier 5 is roughly the same as the added benefit from swapping a Tier 5 for a Tier 4, a Tier 4 for a Tier 3, or a Tier 3 for a Tier 2.
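The same decision can be written out in a few lines of Python. This is only a sketch of the hypothetical above: the net WAR figures are the approximate values from the list, and the variable names are made up for illustration.

```python
# Approximate net WAR gained by trading away a pitcher from the given tier
# and receiving one from the tier above it (values from the list above).
net_war_gain = {6: 0.8, 5: 0.7, 4: 0.6, 3: 0.8, 2: 1.5}

# The best offer is the one with the largest net WAR gain.
best_tier_to_trade = max(net_war_gain, key=net_war_gain.get)

# Compare that gain with the typical gain from every other one-tier upgrade.
other_gains = [gain for tier, gain in net_war_gain.items() if tier != best_tier_to_trade]
typical_other_gain = sum(other_gains) / len(other_gains)

print("Trade away your Tier", best_tier_to_trade, "pitcher")
print("Gain from that swap:", net_war_gain[best_tier_to_trade])
print("Average gain from any other swap:", round(typical_other_gain, 3))
```

The four lower swaps cluster around the same value, which is the flat marginal return described earlier, and the Tier 2 for Tier 1 swap is the only one that breaks from it.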

Since I have a habit of overexplaining things, I’ll end with some examples of each tier using numbers from the 2018 season. For the table of 2018 Tier Examples, 5 randomly selected pitchers within each tier were chosen just so readers get a better idea of who falls in line with a given tier.
