Here we’ll use data from Google Trends to determine the nerdiest baseball cities in the US. About 90% of this post covers the basis and methodology for the analysis. If you just want to see the rankings, skip to the end.
Whenever I discover that someone I know is a baseball fan, I try throwing a few advanced metrics into the conversation just to gauge familiarity. I should preface this by mentioning I don’t think any level of familiarity with advanced metrics changes a person’s value as a baseball fan – whether we’re exporting Fangraphs data or listening to sports talk radio, passion is passion regardless of how we waste our time with something as pointless as baseball fandom. But while I enjoy baseball conversations with fans of all types simply because it’s baseball, I love getting insight from fellow stat geeks because I want to know what others value. There’s so much to learn from the massive collection of data generated by baseball that it’s impossible for one person to know everything on their own. The biggest problem I have throwing in the statistical jargon is simply the lack of bites on the other side of the conversation; I never get the bWAR versus fWAR debate that I truly long for in casual chitchat. I see it on the message boards in droves. Fangraphs, Beyond the Box Score, and even Reddit all seem packed with geeks, so why doesn’t the bar by Angel Stadium have at least a few of the same people an hour before first pitch? This had me thinking about something that’s probably kind of dumb but, to me at least, is still very interesting…
Geographically speaking, where do I find all the baseball stat geeks? At the brewery nearest which stadium am I most likely to find someone as annoyed as I am about being unable to split half-seasons and years in the same export at Fangraphs?
Thanks to Google’s dominance in both search engine quality and creepy monitoring of our each and every move, Google Trends was my go-to resource for the data I collected. For the uninitiated, Google Trends is a way to measure the search interest in a particular term over time or space (geography), and also to compare the interest in different terms against each other over those same dimensions. “Search interest” is probably better defined by the less-marketable term “search volume”, though the data produced by Google Trends isn’t a direct measure of volume like total searches or search percentage – it’s a 0-100 scale that controls for general search activity in a given area (much like controlling for population). Now I probably could’ve simply looked up Fangraphs on Google Trends (which I did) and called it a day, but the lack of rigor made it seem shallow.
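If the 0-100 scale sounds abstract, here is a rough Python sketch of the idea: take a term's share of all searches in each area, then rescale so the top area lands at 100. To be clear, this is my simplified illustration and not Google's actual procedure; the function name `trends_style_index` and every number in it are made up.

```python
def trends_style_index(term_searches, total_searches):
    """Rescale a term's share of all searches per region to a 0-100 index.

    Simplified illustration only: the real Google Trends pipeline is not
    public in detail, and these inputs are hypothetical raw counts.
    """
    shares = {}
    for region, total in total_searches.items():
        if total <= 0:
            raise ValueError(f"total searches must be positive for {region}")
        # Dividing by total activity is what controls for area size.
        shares[region] = term_searches.get(region, 0) / total

    peak = max(shares.values())
    if peak == 0:
        return {region: 0 for region in shares}
    # The region with the highest share becomes 100; others scale to it.
    return {region: round(100 * share / peak) for region, share in shares.items()}


# Hypothetical counts: the big metro has more raw searches for the term,
# but the small metro searches for it more often relative to its size.
term_searches = {"Big Metro": 5000, "Small Metro": 900}
total_searches = {"Big Metro": 10_000_000, "Small Metro": 1_000_000}
index = trends_style_index(term_searches, total_searches)
print(index)
```

The point of the toy numbers is that raw volume and interest can rank areas differently, which is why a small market can top the list.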
It’s obvious from the Google Trends graphic when and where Fangraphs garners the most interest: baseball season, the Pacific Northwest, the area between Chicago and St. Louis (I looked this up and it’s apparently called the “North Central Midwest”), and the Pittsburgh area. But we’re still not done. For US-only searches, Google Trends usually returns more data points using the “Metro” subregion, which is actually the Designated Market Area (DMA) used by Nielsen (the TV ratings people) rather than the Metropolitan Statistical Area as I’d first assumed. The exported data from Google Trends for Fangraphs revealed a handful of DMAs with search volume too low to register any quantifiable level of interest. I’d wondered if the same places would garner similar results for Baseball-Reference, and much to my delight, I found a correlation coefficient of r=0.85 (R² is shown on the chart). I also pulled Google Trends data for the phrase “Happy Thanksgiving” (it was trending at the time) as a control set to reassure myself that the correlation between Fangraphs and Baseball-Reference wasn’t something I’d get from any random Google search; this yielded a correlation coefficient of r=-0.10 with Fangraphs…hooray!
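For anyone who wants to run the same kind of check, here is a minimal Python sketch of the Pearson correlation between two Google Trends exports, matched up by metro. The metro names and scores are placeholders I invented for illustration, not my actual data, and `pearson_r` is just a helper name I picked.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return covariance / sqrt(var_x * var_y)


# Hypothetical 0-100 interest scores by metro (placeholders, not real exports).
fangraphs = {"Metro A": 100, "Metro B": 62, "Metro C": 35, "Metro D": 18}
baseball_reference = {"Metro A": 91, "Metro B": 70, "Metro C": 30, "Metro D": 25}

# Only compare metros that registered data in both exports.
common = sorted(fangraphs.keys() & baseball_reference.keys())
r = pearson_r(
    [fangraphs[m] for m in common],
    [baseball_reference[m] for m in common],
)
print(f"r = {r:.2f}, R^2 = {r * r:.2f}")
```

Running the same function against a control term's export is how you'd sanity-check that the relationship isn't just an artifact of how Google scales everything.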
This means it’s very likely certain regions are more baseball-nerdy than others – not that searching for Fangraphs or Baseball-Reference makes a person a baseball nerd, but the aggregated data certainly represents a solid proxy. I wanted to collect more Google Trends data on search terms that are similarly stat-geeky, so I tried “sabermetrics”, “Bill James”, and “Moneyball”, but neither sabermetrics nor Bill James yielded enough data points due to lack of volume, and the vast majority of Moneyball’s search volume was generated when the movie came out – not a desirable trait. Still, with more search terms, we have more data, and we’ll be a lot more confident in the results while mitigating bias, much like diversifying your portfolio to mitigate risk. So I eventually ran the following through Google Trends, both individually and combined (for volume comparison between terms):
My goal, if you couldn’t tell, was to use the aggregated data to determine the best (and worst) baseball-nerd cities and regions by summing the total interest generated for the five Google searches by location. To accurately reflect their search proportions, each search term was weighted by its individual search volume relative to the combined volume of all five (visualized in the appropriately titled donut chart).
I added a finishing touch in the form of a sixth and final Google search-related measurement that doesn’t fall completely in line with the other five:
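The weighting itself is simple enough to sketch in a few lines of Python. The term names, volumes, and interest scores below are placeholders rather than my real inputs, and treating a metro with no data for a term as zero interest is an assumption of this sketch.

```python
def volume_weights(volumes):
    """Turn each term's relative search volume into a weight that sums to 1."""
    total = sum(volumes.values())
    if total <= 0:
        raise ValueError("combined volume must be positive")
    return {term: volume / total for term, volume in volumes.items()}


def composite_scores(interest_by_term, weights):
    """Sum weight * interest across terms for every metro.

    Assumption: a metro missing from a term's export contributes 0 for it.
    """
    scores = {}
    for term, by_metro in interest_by_term.items():
        for metro, interest in by_metro.items():
            scores[metro] = scores.get(metro, 0.0) + weights[term] * interest
    return scores


# Hypothetical relative volumes from a combined comparison of the terms.
volumes = {"term_a": 50, "term_b": 30, "term_c": 20}

# Hypothetical 0-100 interest by metro from each term's individual export.
interest_by_term = {
    "term_a": {"Metro A": 100, "Metro B": 40},
    "term_b": {"Metro A": 80, "Metro B": 60},
    "term_c": {"Metro A": 20},  # Metro B registered no data for this term
}

weights = volume_weights(volumes)
scores = composite_scores(interest_by_term, weights)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Weighting by volume keeps a low-traffic term from counting as much as a high-traffic one, since every term's individual export tops out at 100 regardless of how many people actually search for it.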
The final weighting of the combined score is shown in the next chart:
The very last table in this post reflects the complete rankings, which I can’t say are too surprising. The map generated by the initial Fangraphs search made them a little more predictable, but there’s certainly more clarity after we combine all the data. I also put together a heat map in Tableau that visualizes nerdiness in the 48 contiguous states (sorry Hawaii, Honolulu [112th] did register some data…nowhere in Alaska did). Let me also mention that Tableau doesn’t recognize Designated Market Areas as a geographic variable, so I had to map them out by ZIP code…which was the greatest pain of this entire post, and only afterward did I learn it would’ve been much easier to map them out by county instead of ZIP.
Well…the numbers basically speak for themselves. As much as I cringe hearing Cardinals fans claim the “Best Fans in Baseball” designation, they’re easily the nerdiest. They’re also the most engaged on social media, which makes their nerdiness pretty understandable. So congratulations St. Louis and surrounding area, you’re a bunch of nerds – which makes me reeeeeeally want to visit Busch Stadium when the A’s come to town in 2019. The Columbia-Jefferson City area is the market directly west of St. Louis and directly east of Kansas City, though the Cardinals generate about twice the search volume the Royals do in the area. The third result, Champaign-Springfield-Decatur, IL, generates more search volume for the Cubs than the Cardinals, however – SO YOU AREN’T THAT GREAT, CARDINALS FANS! The other strong areas include Pittsburgh, Chicago, and New England, each one home to notably loyal and passionate fanbases – though I have to admit Pittsburgh ended up higher than I might’ve guessed. I’m also guessing Meg Rowley and Patrick Dubuque are solely responsible for Wisconsin appearing twice in the top 15…and probably a little for Seattle not being as sad as the rest of the west coast.
The west coast is basically inept when it comes to nerding out on baseball, which is sad news for me. 23rd-ranked Seattle-Tacoma is the only west coast area in the top 38; the Bay Area finally joins in at 39th. I find the very bottom of the rankings interesting – maybe even more so than the top. These areas could easily be the places where the most blue chip high school football prospects come from in any given year – 12 of the bottom 15 are from deeeeeep football country – Texas, Oklahoma, Florida, Mississippi, and Georgia. Compared to the top of the list, though, the bottom is also generally much farther from any MLB team.
What’s all this mean? Probably not a whole lot. But I’ve been to the bars near Fenway, and they were definitely enthusiastic about baseball in a way I don’t ever expect to witness in Anaheim. If that same enthusiasm is topped by the nerdiness engulfing the area between St. Louis and Chicago, I actually look forward to visiting the Midwest – something I’ve never felt before in my life. At the very least, I’m guessing it beats trying to talk about run differential in El Paso.