How to Identify Bounceback Candidates (Pitcher Edition)

Okay, a lot of people think ERA sucks. Sure, I don’t really disagree in the sense that it’s luck-laden and a poor predictor of future performance. It’s a shallow measure, but it still seems to get the best of those even at the highest levels; Jon Gray was left off the Rockies’ playoff roster after posting a 5.12 ERA that wasn’t really compatible with his 9.6 K/9 and 2.72 BB/9. Domingo German couldn’t stay in the Majors with his 5.57 ERA in spite of striking out nearly 11 per 9 and walking 3.5/9.

This isn’t a defense of ERA by any means. It’s a guide to finding out whose 2019 ERA is (probably) going to be better than their 2018 ERA, and it’s pretty simple. Fangraphs features a metric called “E-F”, which is simply a pitcher’s ERA minus his FIP. This gives us some idea of how representative the pitcher’s ERA actually is; grossly oversimplified, it gives us a measure of luck. The following facts have been fairly well-documented, but they’re worth a quick refresher:

  • ERA is a relatively poor predictor of future ERA
  • FIP is a better predictor of future ERA but still not great
  • xFIP is a better predictor of future ERA and future FIP than both ERA and FIP

Results-based analysis is tricky business, but not totally unreliable when done correctly. ERA is far from the ideal indicator of a pitcher’s ability. FIP addresses some of that, but it still includes a lot of noise that’s washed away in xFIP. Things that show little or no year-to-year correlation, such as HR/FB% or BABIP, are controlled for by applying constants in the calculation of xFIP, which is why it’s probably the best metric we have for evaluating how good a pitcher’s been, at least in the same context as ERA. Unfortunately, fans, fantasy leagues, and the general consumption of baseball continue to emphasize ERA in spite of its obvious shortcomings, probably due to a fear of adaptation. So even though it would be more practical and easier to predict future xFIP, we’re going to predict future ERA with xFIP, since it’s still the best we’ve got.

Let’s check out the correlation matrix of ERA predictors I put together. This uses all big-league pitchers from 2010-2017 with at least 30 IP in a given half-season who also threw at least 30 IP in the subsequent half-season. I did notice that the within-period correlations aren’t identical in both time periods (ERA’s correlations with FIP and xFIP are .67 and .49 in t=0, but .70 and .55 in t+1. This still occurs even when ERA-/FIP-/xFIP- are used instead, so I’m theorizing that it’s just a matter of a pitcher gaining consistency with an additional year of experience, but that’s another post for another day.) We can see that each of the bullet points above is reflected in the matrix, and that xFIP does a much better job of predicting the future than any other metric. So what am I trying to prove here? That xFIP is a super useful metric that isn’t used enough for predictive analysis! And unlike ERA, xFIP is a superb predictor of itself, which is why I highlighted that particular part of the matrix and added the chart on xFIP predictability. Worth noting is that the full-season correlation between ERA and xFIP is a much better-looking 0.64, compared to the half-season correlations shown in the matrix, so being able to predict xFIP from one period to the next is pretty valuable.

        

So now that I’ve emphasized the value of xFIP versus the other metrics as predictors with some visual overkill, I’m going to rework the Fangraphs metric I mentioned earlier: instead of E-F (ERA-FIP), we’ll be using E-X (ERA-xFIP).

Let’s set up some definitions that will apply to the remainder of this post:

  1. Overachiever – A pitcher whose xFIP exceeds his ERA. In this case the E-X is negative.

    2018 Example: Wade Miley; 2.57 ERA/ 4.3 xFIP/ -1.73 E-X with MIL

  2. Underachiever – A pitcher whose xFIP is less than his ERA. In this case the E-X is positive.

    2018 Example: Marcus Stroman; 5.54 ERA/ 3.84 xFIP/ 1.7 E-X with TOR
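
To make the two definitions concrete, here is a minimal Python sketch that computes E-X and applies the labels to the two 2018 examples above. The function names are my own, and the "neutral" label for an E-X of exactly zero is an assumption, since the definitions above don't cover that case.

```python
def e_minus_x(era: float, xfip: float) -> float:
    """E-X: ERA minus xFIP, rounded to two decimals."""
    return round(era - xfip, 2)


def label(era: float, xfip: float) -> str:
    """Classify a pitcher using the overachiever/underachiever definitions."""
    gap = e_minus_x(era, xfip)
    if gap < 0:
        return "overachiever"   # xFIP exceeds ERA
    if gap > 0:
        return "underachiever"  # xFIP is less than ERA
    return "neutral"            # assumption: the E-X = 0 case is not named in the post


# The two 2018 examples quoted above
for name, era, xfip in [("Wade Miley", 2.57, 4.30), ("Marcus Stroman", 5.54, 3.84)]:
    print(f"{name}: E-X = {e_minus_x(era, xfip):+.2f} ({label(era, xfip)})")
```

The same two functions work unchanged on ERA- and xFIP-, which is the normalized form of E-X used later in the post.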

The intuition here is simple enough – overachievers are due for positive regression (remember that “positive” is bad when it comes to ERA/FIP/xFIP) and underachievers are due for negative regression. In other words, pitchers with a negative E-X should see their ERAs increase, while pitchers with a positive E-X should see their ERAs decrease. I said “should”, but I really mean “do”, because the effect is quite robust when we use aggregated data. The first chart looking at ERA changes from 2017 to 2018 suggests that, while E-X is a good indicator of the direction a pitcher’s ERA is headed, underachievers appear to be more predictable than overachievers – at least using non-normalized metrics.

[Chart: Standard ERA & xFIP]

Now since ERA is known to fluctuate over time and we need normalized metrics to compare across eras, I wanted to see how predictability changes (if it does at all) when we use ERA- and xFIP- instead of standard ERA and xFIP. Here, the effect is consistent across both groups (both overachievers and underachievers). Take a look at the chart below:

[Chart: Normalized ERA & xFIP (ERA- & xFIP-)]

This tells us that roughly 73% of overachieving pitchers in 2017 saw a rise in their 2018 ERA, while an almost identical portion of 2017 underachievers (72%) saw a decline in their 2018 ERA. That means, with respect to this sample, nearly three-quarters of the time we accurately predicted the direction of future ERA by subtracting xFIP- from ERA-. This is pretty powerful, but it’s limited in the sense that we’re looking at a binary prediction: it’s yes or no. While we can reasonably expect the ERA to increase or decrease, we don’t know by how much. And we all know to be skeptical when sample sizes are small; just 169 pitchers threw at least 40 IP in both 2018 and 2017, so let’s see what happens when we have a sample about 8.5 times larger than what’s reflected in the 2017/2018 chart…

[Chart: Normalized ERA & xFIP (ERA- & xFIP-)]

And there you go; 71% of overachievers saw their ERA go up in the subsequent half-season, and 72% of underachievers saw their ERA go down – basically unchanged from the previous chart. Here, time is grouped into half seasons rather than full seasons, which gives us an even greater sample to look at. So E-X is legit when it comes to predicting improvement or decline, but why not build on that if we can? If we’re trying to identify bounceback candidates, wouldn’t it be nice if we could know exactly how likely it is that a pitcher’s ERA will be lower next season (or next half-season) than it was in the most recent one?

Obviously the answer is ‘yes’, so I modeled the probability of ERA improvement using E-X as the sole independent variable and ran a logistic regression on the binary outcome of whether or not ERA improved in half-season t+1. The summary statistics are shown below, as well as how to calculate the probability.

N=1454
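
For anyone who wants to see what a one-variable logistic regression like this looks like in practice, here is a rough sketch in plain Python. It is not the code behind this post: the post doesn't say which software was used, the Newton-Raphson fitting routine is just one standard way to do it, and the data are synthetic, generated from made-up E-X values (a normal distribution with a standard deviation of 20 is an arbitrary assumption) using the rounded coefficients from the worked examples further down.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Logistic function: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, iterations=25):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p          # gradient w.r.t. intercept
            g1 += (y - p) * x    # gradient w.r.t. slope
            h00 += w             # entries of X'WX
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Synthetic stand-in data, NOT the sample used in this post.
random.seed(0)
TRUE_B0, TRUE_B1 = -0.06, 0.059  # rounded values from the worked examples
ex_values = [random.gauss(0, 20) for _ in range(1454)]
improved = [1 if random.random() < sigmoid(TRUE_B0 + TRUE_B1 * x) else 0
            for x in ex_values]

b0_hat, b1_hat = fit_logistic(ex_values, improved)
print(f"intercept = {b0_hat:.3f}, slope = {b1_hat:.3f}")
```

The fitted values should land near the ones used to generate the data, which is all this sketch is meant to show. With real data, `ex_values` would be each pitcher's E-X in half-season t and `improved` would be 1 when his ERA dropped in t+1.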

Calculating the probability estimate of this model isn’t like a typical linear regression, so if you wanted to apply it to a particular pitcher on your own, here’s how it works:
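
In code form, the calculation is the logistic function applied to an intercept plus a slope times E-X. The sketch below is a small Python helper with a function name of my own choosing. The coefficients (-0.06 and 0.059) are copied from the worked examples later in this post and appear to be rounded, so outputs can differ from the quoted percentages by about a point.

```python
import math

# Coefficients as printed in the post's worked examples (rounded values).
INTERCEPT = -0.06
SLOPE = 0.059


def improvement_probability(e_x: float,
                            intercept: float = INTERCEPT,
                            slope: float = SLOPE) -> float:
    """Estimated probability that ERA improves next period.

    e_x is the normalized E-X, i.e. ERA- minus xFIP-.
    """
    log_odds = intercept + slope * e_x
    return 1.0 / (1.0 + math.exp(-log_odds))


# Print the curve at a few E-X values
for e_x in (-40, -20, 0, 20, 40):
    print(f"E-X {e_x:+d}: {improvement_probability(e_x):.2f}")
```

Passing `intercept` and `slope` as arguments makes it easy to swap in unrounded coefficients if you refit the model yourself.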

So rather than going through too much more math, lets move on to what the model tells us by using the probability of ERA Improvement chart:

This shows us the estimated probability of a given pitcher improving his ERA in the next time period (in this case, half of a season), based on the E-X in the most recent period. While the model is built off half-season samples, we can reasonably apply it to different time groups that occur consecutively, like a full season (we don’t want to stray too far from the half-season though, because we’d fail to account for a lot of player-specific changes that might occur between the two time periods. For example, we wouldn’t want t=0 to be the last 5 years, where we’re trying to predict improvement in the next 5 years, because a lot of changes could occur with the pitcher we’re looking at; his mix might change, his velocity almost certainly will, perhaps Tommy John surgery, etc.) So, at an E-X of 0, we see the probability of improving ERA is 50%, which is right where we’d expect it to be (actually it’s 49.8% if we take it out to the thousandths place…the absolute probability difference between an E-X of 0 and -10 is actually almost the same as the difference between 0 and +10, but I kept the probability estimates to two decimal places for the sake of simplicity). The greater the E-X in the most recent (half) season, the more likely it is the pitcher’s ERA will drop in the next (half) season; even though only 18% of pitchers post E-Xs of at least 20, it’s certainly worth noting their probability of improvement is better than three-quarters. Even rarer is an E-X of 40 or greater, which occurs just 4% of the time, but is practically a guarantee of improvement at 91%.
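
The chart can also be read in reverse: given a target probability, what E-X does a pitcher need? Solving the logistic formula for E-X gives (log-odds of the target minus the intercept) divided by the slope. The Python sketch below does that with a helper name of my own, again assuming the rounded coefficients (-0.06 and 0.059) printed in the worked examples, so treat the outputs as approximate.

```python
import math

# Rounded coefficients as printed in the post's worked examples (assumption).
INTERCEPT = -0.06
SLOPE = 0.059


def e_x_for_probability(target: float,
                        intercept: float = INTERCEPT,
                        slope: float = SLOPE) -> float:
    """Return the E-X at which the model's improvement probability equals target."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    log_odds = math.log(target / (1.0 - target))
    return (log_odds - intercept) / slope


for target in (0.25, 0.50, 0.75, 0.90):
    print(f"{target:.0%} chance of improvement -> E-X of about {e_x_for_probability(target):.1f}")
```

With these coefficients, a 75% chance of improvement corresponds to an E-X of roughly 19.6, which lines up with the "at least 20" threshold described above.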

So just for fun, let’s apply the model to a pitcher using his 2018 E-X, and determine the probability that his ERA will improve. One guy a lot of people might be curious about is Sonny Gray; are greener pastures ahead for Sonny in 2019? Or was all that chaos in New York City the catalyst to an irreversible downward trend? Well…let’s find out!

2018 Sonny Gray – NYY

ERA: 4.90    xFIP: 4.10

ERA-: 113    xFIP-: 97

E-X = 113-97 = 16    Now we’ll apply the model…

1/(1+e^-[-0.06+{0.059*16}]) = 0.718

Estimated probability of improvement is 71.8%! So Sonny Gray’s got a pretty good shot at being a better pitcher in 2019 than he was in 2018.

Let’s do another…how about NL Cy Young Award winner Jacob DeGrom? DeGrom had an absolutely insane year that a bunch of morons tried discrediting at various stages, but most of the people reading this are probably aware of how special it actually was. So how likely is it that DeGrom could be even better next year?

2018 Jacob DeGrom – NYM

ERA: 1.70    xFIP: 2.60

ERA-: 45    xFIP-: 64

E-X = 45-64 = -19

1/(1+e^-[-0.06+{0.059*-19}]) = 0.245

So the model gives DeGrom a 24.5% shot at improving his ERA in 2019, which isn’t that bad considering there’s not much room for improvement when your ERA is 1.7…the closer you get to 0, the more improbable improvement becomes!

Instead of continuing with random case-by-case examples, I added a few names to the probability chart to go along with Sonny Gray and Jacob DeGrom. I also built a table of 25 semi-randomly selected pitchers alongside their 2018 numbers and their respective 2019 ERA improvement probabilities. One thing that’s fairly clear, though also quite intuitive, is that it’s difficult to improve upon good performances; DeGrom, Max Scherzer, and Justin Verlander are unlikely to be better in 2019 than they were in 2018, largely because they were just so good. Applying that same intuition to the other end of the spectrum, it’s pretty easy to improve on bad performances – Clayton Richard is almost certainly going to be better in 2019 because he set the bar so low. Those are the predictable cases, the ones in which the probability model does nothing but reaffirm what we basically already knew. Among those shown in the table, the more interesting cases are those of Josh Hader and Carlos Carrasco, both of whom enjoyed incredible 2018 seasons and are actually more likely than not to improve in 2019. There are also a few names not shown in the table who are in the same boat as Hader and Carrasco, such as Patrick Corbin, Dellin Betances, Ross Stripling, and Edwin Diaz: all of them are likely to improve in 2019 after being phenomenal in 2018.
