My examination of the correlation between metropolitan population and success in Major League Baseball raised more questions in my mind than I addressed in my first article. One that immediately occurred to me was whether, for teams in multiple-team markets, it was more reasonable to correlate wins with the total population of the market or with each team's share of it. The second, a burning question for me as a fan of the small-market but historically decent Cincinnati Reds, was to find the all-time demographic overperformer.
The first question matters more historically than it does today. Currently, only six of the 30 major league teams share a metropolitan area with another team. The Oakland and San Francisco markets probably overlap, but the United States Census Bureau conveniently identifies them as separate metropolitan areas, and I’m willing to go with that. In the original 16-team leagues of 1900, which stayed fixed into the 1950s, however, no fewer than 11 teams competed financially on shared turf, with two of the three New York teams playing in the same league.
Although some teams may have had geographic strongholds within their metropolitan realms, as I suppose Brooklyn must have and as I’ve always heard the White Sox and the Cubs do (dividing the north and south sides of Chicago), the only way I could see to account for split markets was to divide each metropolitan population by the number of teams present. Thus present-day New York, Chicago, and Los Angeles were divided by two, as were Boston, Chicago, Philadelphia, and St. Louis in the past, and New York before 1958 was divided by three to account for the Dodgers, Giants, and Yankees.
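The splitting rule is simple enough to write down. Here is a minimal sketch, assuming an even split among resident teams. The helper name, market names, and population figures are hypothetical placeholders, not Census numbers.

```python
# Minimal sketch: credit each team with an equal share of its metro population.
# All names and figures here are hypothetical, not Census data.

def divided_market(metro_population: float, teams_in_market: int) -> float:
    """Return the population share credited to each team in a market."""
    if teams_in_market < 1:
        raise ValueError("a market must contain at least one team")
    return metro_population / teams_in_market


# Hypothetical markets: (metro population, number of teams sharing it)
markets = {
    "Big City": (9_000_000, 3),    # a three-team market, like pre-1958 New York
    "Twin City": (4_000_000, 2),   # a two-team market
    "Small City": (1_500_000, 1),  # a single-team market
}

shares = {name: divided_market(pop, n) for name, (pop, n) in markets.items()}
for name, share in shares.items():
    print(f"{name}: {share:,.0f}")
```

Even after the split, each team in the hypothetical three-team market still draws on more people than the single-team market does.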
The first observation is that even when divided this way, teams in the biggest cities have considerably larger populations to draw on. The smallest single-team markets in the long-standing 16-team alignment were always smaller than the partial populations of the leading cities. Aside from St. Louis, which was the smallest market of all when divided in two and shared equally between the Browns and Cardinals (who generally played in the same stadium), Cincinnati was always the smallest National League market, and Detroit or Washington brought up the rear in the American League.
Dividing the markets produces two major changes. Pittsburgh narrowly becomes the most populous National League market in the 1900s and 1910s, and Detroit rises to the top of the American League in the 1940s. Both teams did well under those circumstances. The Honus Wagner-led Pirates won considerably more games than any other National League team in the first decade of the century and finished fourth in the second, and the Tigers of the forties were the third-best team in the junior circuit.
For the most part, though, splitting markets does not greatly change their rank. A third of the New York market was more than any other National League team had to work with from 1920 to 1950, and in the American League it was enough to keep the Yankees' market pre-eminent throughout. The main statistical effect is to lower the populations associated with the generally good performance of all three New York teams. For the regression analysis I outlined in my previous post, this marginally improves the correlations between market population and winning. It also substantially increases the number of wins associated with added market population (that is, the slopes get steeper):
| Decade | Total markets: equation | Total markets: correlation | Divided markets: equation | Divided markets: correlation |
|---|---|---|---|---|
| National League | | | | |
| 1900s | y = 73.57 + 0.0001085x | 0.01 | y = 54.96 + 0.0163834x | 0.41 |
| 1910s | y = 68.35 + 0.0020066x | 0.45 | y = 64.65 + 0.0068259x | 0.42 |
| 1920s | y = 75.17 + 0.0004056x | 0.08 | y = 72.57 + 0.0024062x | 0.13 |
| 1930s | y = 72.54 + 0.0009804x | 0.23 | y = 68.72 + 0.0041195x | 0.27 |
| 1940s | y = 76.74 + 0.0000027x | 0.00 | y = 83.16 - 0.0031947x | -0.21 |
| 1950s | y = 70.84 + 0.0015004x | 0.41 | y = 82.93 - 0.0027209x | -0.22 |
| 1960s | y = 78.10 - 0.0007343x | -0.14 | y = 79.74 - 0.0016541x | -0.14 |
| 1970s | y = 79.70 + 0.0002290x | 0.07 | y = 79.48 + 0.0003880x | 0.05 |
| 1980s | y = 76.58 + 0.0003878x | 0.25 | y = 75.10 + 0.0010206x | 0.27 |
| 1990s | y = 77.33 + 0.0000594x | 0.03 | y = 75.31 + 0.0007496x | 0.15 |
| 2000s | y = 77.16 + 0.0007707x | 0.33 | y = 73.47 + 0.0020631x | 0.43 |
| American League | | | | |
| 1900s | y = 69.20 + 0.0021508x | 0.30 | y = 62.58 + 0.0111290x | 0.44 |
| 1910s | y = 73.19 + 0.0005175x | 0.11 | y = 64.22 + 0.0084305x | 0.48 |
| 1920s | y = 72.38 + 0.0015004x | 0.32 | y = 74.50 + 0.0013906x | 0.08 |
| 1930s | y = 68.13 + 0.0025115x | 0.47 | y = 59.00 + 0.0097659x | 0.55 |
| 1940s | y = 67.55 + 0.0025788x | 0.58 | y = 58.59 + 0.0091860x | 0.63 |
| 1950s | y = 62.25 + 0.0038227x | 0.71 | y = 57.01 + 0.0072533x | 0.44 |
| 1960s | y = 71.95 + 0.0017122x | 0.46 | y = 68.57 + 0.0038216x | 0.59 |
| 1970s | y = 74.65 + 0.0009935x | 0.24 | y = 70.99 + 0.0029570x | 0.44 |
| 1980s | y = 75.41 + 0.0007250x | 0.34 | y = 74.49 + 0.0014368x | 0.41 |
| 1990s | y = 74.97 + 0.0006106x | 0.35 | y = 75.49 + 0.0007247x | 0.24 |
| 2000s | y = 72.32 + 0.0019955x | 0.60 | y = 70.76 + 0.0035210x | 0.60 |
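The post doesn't spell out how the decade equations were fitted. Here is a minimal sketch, assuming ordinary least squares with one point per team (average wins against market population, with population assumed to be in thousands) and Pearson's r as the correlation. The table's "Correlation" column may be r or r², which isn't stated, so that part is also an assumption. The four-team league below is entirely hypothetical.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b, r)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    b = sxy / sxx              # slope: extra wins per unit of population
    a = mean_y - b * mean_x    # intercept: predicted wins at zero population
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient (signed)
    return a, b, r


# Hypothetical league (population assumed in thousands): two teams share a
# 6,000 metro area; the other two have 3,000 and 1,500 to themselves.
wins = [90, 80, 82, 76]
total_pop = [6000, 6000, 3000, 1500]    # each team credited with the whole metro
divided_pop = [3000, 3000, 3000, 1500]  # shared metro split between its two teams

a_t, b_t, r_t = fit_line(total_pop, wins)
a_d, b_d, r_d = fit_line(divided_pop, wins)
print(f"total:   y = {a_t:.2f} + {b_t:.7f}x  r = {r_t:.2f}")
print(f"divided: y = {a_d:.2f} + {b_d:.7f}x  r = {r_d:.2f}")
```

With these made-up numbers, splitting the shared market exactly triples the slope, from about 0.0018 to about 0.0053 wins per thousand residents. Dividing pulls the big-market points toward the small ones, so the same spread in wins sits over a narrower range of populations. The table shows the same steepening in almost every row.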
Although the correlations are generally a bit stronger with divided markets, the National League correlation is negative for three decades running, from the 1940s through the 1960s. The correlation coefficient is essentially unchanged for the sixties. The fifties, however, show a substantial turnaround, presumably because the well-performing Dodgers and Giants are recast from the largest markets to the fourth and fifth largest. (The Dodgers and Giants count their 1958 and 1959 seasons in Los Angeles and San Francisco, respectively, and the Giants fall back because San Francisco was and remains considerably smaller than New York.) Otherwise, it is notable that the divided-market slopes are steeper than the total-market slopes in every decade of both leagues but one (the American League in the 1920s). This follows from the narrower range of populations.
Another thing that can be done with regression equations, and which J. C. Bradbury does in his book (which I referenced in my previous post), is to predict the performance each team should have given the population that supports it. The following table gives the results for all teams in both leagues by decade. Ironically enough, the all-time overperformers are the Giants (San Francisco, but previously New York) and the (ugh) New York Yankees. The underperformers are the big-city Los Angeles Angels of Anaheim and the Phillies, who shared their market for about half of their “modern” existence and who hold the professional sports record for losses, with over 10,000. Those who put more weight on a really long losing tradition, however, may prefer the Baltimore Orioles, who did most of their damage as the St. Louis Browns, and the Chicago White Sox, who carried a curse from 1919 on. The all-time stretch of futility goes to the Phillies, who contrived to underperform their market by roughly 18 to 20 games per year for three decades, from the 1920s through the 1940s. Their current dynasty is going to have to last a long time to dig them out of the resulting hole.
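Bradbury's exact procedure isn't reproduced here. Here is a minimal sketch, assuming "wins over par" means a team's actual average wins minus the wins its decade's regression equation predicts for its (divided) market population, with population assumed to be in thousands. The equation is the National League 1970s divided-market line from the table above. The function name and the team's population and wins are hypothetical.

```python
def wins_over_par(actual_wins: float, population: float,
                  intercept: float, slope: float) -> float:
    """Actual average wins minus the wins predicted by y = intercept + slope * x."""
    predicted = intercept + slope * population
    return actual_wins - predicted


# NL 1970s, divided markets: y = 79.48 + 0.0003880x (from the table).
NL_1970S = (79.48, 0.0003880)
population = 2000  # hypothetical market size, assumed to be in thousands
avg_wins = 90.0    # hypothetical average wins per season

print(round(wins_over_par(avg_wins, population, *NL_1970S), 2))  # prints 9.74
```

Positive values mark overachievers and negative values underachievers. Because the National League slopes for the 1940s through the 1960s are negative, a bigger market in those decades actually lowers a team's par.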
My Reds have been a consistent, if fairly modest, overachiever. Their performance was particularly notable in the seventies, when they illuminated my adolescence and young adulthood after hooking me as a seven-year-old in 1961 and taking me on a bit of a roller coaster ride through the sixties. As the following table shows, by the regression equation for National League performance the Reds were a whopping 15.3 wins over par in the seventies. Within the National League, that mark is exceeded only by the Pirates of the 1900s (21.7), the Cardinals of the 1940s (19.3), and the Cubs of the 1900s (16.6), the last of whom spent the second half of the twentieth century going in the other direction. Over their history since 1900, the Reds are a more modest 1.5 wins over expectations. The best smaller-city performer, in fact, is Ohio’s other team, the Cleveland Indians. The Indians were league doormats throughout my youth and have won only two World Series in their history. Yet they have actually been a model of consistency, in the top half of the league outside the seventies and eighties and several times breathing down the neck of the dominant Yankees. Their record of 4.0 wins over par, however, is handily exceeded by both the Yankees and the Giants.
| Team | 1900s | 1910s | 1920s | 1930s | 1940s | 1950s | 1960s | 1970s | 1980s | 1990s | 2000s | Avg. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| National League | | | | | | | | | | | | |
| ATL | -16.0 | -8.4 | -16.4 | -6.4 | -4.8 | 11.4 | 8.0 | -7.7 | -6.4 | 15.0 | 8.8 | -2.2 |
| CHC | 16.6 | 8.1 | 3.9 | 11.9 | -3.2 | -12.2 | 0.3 | -2.8 | -5.9 | -3.9 | -3.3 | 0.3 |
| CIN | -2.2 | 1.8 | 4.3 | -7.0 | 0.0 | 1.5 | 8.9 | 15.3 | 0.9 | 3.5 | -3.7 | 1.5 |
| LAD | -11.0 | -9.8 | -1.4 | -6.8 | 12.6 | 8.3 | 14.5 | 9.7 | 2.8 | 1.8 | 0.5 | 2.2 |
| PHI | -3.4 | 2.8 | -19.8 | -17.6 | -18.3 | -0.2 | 1.2 | 0.4 | -0.2 | -4.4 | 3.0 | -5.7 |
| PIT | 21.7 | 1.5 | 11.7 | 6.4 | -1.1 | -13.1 | 8.7 | 11.3 | -4.3 | -0.1 | -10.2 | 2.4 |
| SFG | 10.7 | 9.5 | 11.1 | 6.6 | -4.4 | 0.6 | 13.1 | -0.6 | 0.1 | 1.6 | 6.7 | 5.6 |
| STL | -16.5 | -5.6 | 6.5 | 12.9 | 19.3 | 3.8 | 12.0 | -0.3 | 5.0 | -1.7 | 12.1 | 3.8 |
| HOU | | | | | | | -7.5 | -0.9 | 4.1 | 3.8 | 3.5 | 2.5 |
| NYM | | | | | | | -9.8 | -5.4 | 1.7 | -1.2 | -1.7 | -3.0 |
| SDP | | | | | | | -25.1 | -13.4 | -1.2 | -1.7 | -2.3 | -3.4 |
| WSN | | | | | | | -24.3 | -5.5 | 3.4 | 0.1 | -8.1 | -1.2 |
| COL | | | | | | | | | | -4.3 | -3.7 | -2.4 |
| FLA | | | | | | | | | | -10.0 | 1.5 | -2.0 |
| ARZ | | | | | | | | | | 5.0 | 1.6 | 4.5 |
| MIL | | | | | | | | | | -3.4 | -4.9 | -2.9 |
| American League | | | | | | | | | | | | |
| BAL | -4.8 | -14.1 | 1.8 | -14.2 | -2.1 | -5.8 | 15.8 | 17.6 | 2.9 | 2.9 | -7.1 | -1.7 |
| BOS | 1.5 | 10.8 | -18.4 | -7.4 | 7.3 | 2.1 | -4.0 | 9.6 | 2.7 | 2.9 | 7.0 | 1.3 |
| CHW | 8.2 | 5.0 | -5.2 | -11.8 | -9.5 | 0.6 | 1.8 | -6.6 | -4.9 | 1.9 | -2.8 | -1.7 |
| CLE | 6.7 | 0.5 | 4.3 | 10.5 | 8.3 | 20.6 | 2.4 | -3.3 | -6.0 | 6.0 | 6.7 | 4.0 |
| DET | 5.3 | 5.3 | 0.9 | 7.6 | 8.5 | -2.1 | 9.0 | -0.2 | 5.4 | -7.4 | -9.8 | 1.6 |
| MIN | -17.2 | 1.9 | 5.5 | 9.8 | -3.6 | -6.0 | 11.0 | 4.5 | -3.8 | -4.8 | 7.7 | -0.6 |
| NYY | -7.6 | -5.9 | 10.9 | 9.3 | 3.8 | 0.0 | 1.5 | 5.9 | 3.9 | 4.7 | 4.5 | 5.0 |
| OAK | 7.9 | -3.5 | 0.2 | -3.9 | -12.7 | -9.4 | -5.6 | 7.5 | 3.5 | 1.0 | 13.4 | -0.9 |
| ANA | | | | | | | -7.0 | -3.7 | -3.0 | -6.7 | -2.6 | -4.2 |
| TEX | | | | | | | -9.3 | -2.0 | -5.1 | 3.9 | -3.6 | -2.2 |
| KCR | | | | | | | -5.3 | 9.0 | 6.1 | -3.5 | -8.6 | 1.9 |
| TBD | | | | | | | -10.4 | -2.2 | 4.0 | -3.4 | -9.7 | -1.6 |
| SEA | | | | | | | | -13.9 | -9.4 | 0.1 | 6.2 | -1.4 |
| TOR | | | | | | | | -22.2 | 3.8 | 2.6 | -1.3 | 0.5 |
Whether any of this means anything is open to question. In the National League of the sixties, teams are actually docked wins for population, and the generally mediocre correlations suggest that population has never been the determining factor. Nor is it clear whether their large market has allowed the Yankees to sustain good management through most of their history. Certainly most of the New York teams have done well, but the performance of the Mets, Chicago’s two teams, and the Angels suggests that franchises with large fan bases do not inevitably succeed. Small-city dynasties also challenge the point: the Orioles, who led the American League in wins in the sixties and seventies; the Pirates of the 1900s and the Reds of the seventies, who dominated the National League; and the Cardinals, who led in the forties, the eighties, and the decade just ended. They account for seven of the 22 league-decades, all seven in which the team with the most wins represented a metropolitan area in the bottom half of its league.
What makes the current situation important is the growing feeling that the strength of larger markets in the 2000s is rooted in the structure of baseball as it has evolved, with limited revenue sharing, no salary cap, and the prevalence of free agency. Larger communities have always had a bit of an advantage, as more than a century of results suggests, but it hasn’t often risen to the current level. The most comparable decade is the 1950s, when the New York teams in both leagues dominated a collection of dying northeastern markets. The result was the redistribution of franchises from historically over-served markets, including New York, to new and growing areas.
Today’s conditions are arguably the opposite. Nearly all viable markets are now served, and the three that have more than one team are more than able to support two franchises. The pressure now is on smaller single-team cities like Kansas City, Milwaukee, and Cincinnati. It seems unlikely that any will lose its team: the Brewers are not that bad and drew over 3 million last year, and the Royals and Reds are drawing surprisingly well given their extended periods of failure.
The nagging question remains whether they can ever be contenders again when they lack the resources to compete. The Brewers have cycled up for a couple of years but now appear to be sliding back. The Indians also appeared to be making a move a few years ago but have since traded away two of the best pitchers in baseball because they couldn’t afford them. One of them, C. C. Sabathia, went to the Brewers, who could only manage to rent him before losing him to the Yankees in free agency. The powers that be do little to help; in fact, they seem to like it. Boston-New York games pull in big TV ratings and keep the big-city writers humming. Another Oakland-Cincinnati World Series may not be a rational business goal.
On the other hand, enough has to be dangled to keep small-market fans interested. The Reds and the Royals draw enough customers to keep their ballparks painted (I’ve never been to either place, but I understand that both Great American Ball Park and Kauffman Stadium are lovely), but their attendance ranks in the bottom two of their respective leagues. It was much better for both when they won, and if it slips much further in either case, there could be problems. If Major League Baseball wants the Yankees to continue to have a geographically distributed collection of punching bags, Bud Selig, who ought to know a thing or two about small markets, may have to do a bit more to help the little guys.
