The Baseball Demographer I

I recently read a book by J.C. Bradbury called The Baseball Economist, which purports to be “the next step in the Bill James revolution.” I’m not sure that it is. It seems to me that James did much more interesting work applying the mathematical techniques of economics to refine baseball metrics than Bradbury does by essentially converting a variety of baseball questions into economic ones. Bradbury is a good read, though, and as such contributes to another positive trend, started by James and carried on by Steven Levitt, of making economic and statistical thinking entertaining and accessible.

Bradbury writes on a variety of issues, from the non-existence of left-handed catchers to the positive contributions of pitching coach Leo Mazzone to the superiority of statistics over scouting as a method of identifying baseball talent. The essay in his book that naturally interested me most, however, is called “The Big City v. the Small City Problem.” In it Bradbury applies simple linear regression to assess the correlation between the population of the market served by each major league team and its performance on the field. His analysis relates the average wins per season of every major league team from 1995 through 2004 to its metropolitan population over the same period.

Bradbury finds what I would expect. Teams playing in larger communities typically do better than rivals drawing on smaller markets. This is not surprising, given that larger markets offer the potential for considerably more revenue, particularly in an era when teams rely as much on their cable contracts and merchandise sales to pay the freight as they do on gate receipts (which also favour teams in larger centres). Nor is it surprising when the perennial powerhouse of Major League Baseball, the New York Yankees, hails from North America’s largest city.

Because I am a fan of baseball’s longest-standing small-market team, the Cincinnati Reds, as well as a demographic analyst, my interest runs deeper than Bradbury’s, I think. I immediately wondered whether there was any difference between the National League, to which the Reds belong and which has been less clearly dominated by a single team, and the American League, over which the Yankees have long reigned. I also wondered whether there had ever been a time when small markets enjoyed a more even playing field.

Expanding Bradbury’s study was not that hard. I knew that Baseball-Reference.com had posted online the records of every major league season since the founding of the National League in 1876, and it did not take long to find a convenient site with the historical populations of American metropolitan areas back to 1900. I decided to begin my analysis in 1901, because the population data were readily available and because that was the year the American League began play as a major league alongside the National League, the start of what is generally regarded as “the modern era” of baseball history.

With the data assembled, the regression analysis is not hard either. Excel will provide the key parameters of a linear regression equation as easily as it adds up a row of numbers, and it took me only a few minutes to obtain the regression equation and correlation between wins and population for every National League and American League season from 1901 to 2008, 216 equations in all. A couple more minutes and I had the same for each decade from the 1900s to the 2000s. In each case, the equation takes the form

y = a + bx

where

y = the dependent variable (i.e., expected number of wins)

a = the intercept or the number of wins that would be expected in a market with no population whatsoever

b = the slope or the number of additional wins expected for each 1,000 additional residents in the market

x = the independent variable (i.e., population of each team’s metropolitan market in 1,000s)

Each equation is also accompanied by a correlation coefficient (r), whose sign indicates the direction of the presumed relationship and whose square (r²) estimates how much of the variation in wins the equation explains.
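Excel's INTERCEPT, SLOPE and CORREL worksheet functions are one way to get these three numbers. For readers curious about what they compute, here is a minimal Python sketch of the same ordinary least squares fit. The function name fit_line and the toy numbers are illustrative assumptions, not the workbook used for this post.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x, plus Pearson's r.

    x: metropolitan populations in thousands; y: wins (one pair per team).
    Returns (a, b, r), the same quantities as INTERCEPT, SLOPE and CORREL.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                   # extra wins per 1,000 additional residents
    a = mean_y - b * mean_x         # wins "expected" with zero population
    r = sxy / math.sqrt(sxx * syy)  # sign gives the direction of the relationship
    return a, b, r


# Made-up toy data: four teams whose wins rise exactly with market size.
print(fit_line([1000, 2000, 3000, 4000], [70, 75, 80, 85]))
```

With real data, x would hold each team's population for one season and y its wins. Python 3.10 and later also ship statistics.linear_regression and statistics.correlation, which do the same job.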

First of all, the National and American Leagues do differ as I supposed, at least to a degree. In 39 of 108 (36.1 per cent) National League seasons, the smaller-market teams actually outperformed their big brothers, as indicated by the slope of the regression equation and the sign of the correlation coefficient, which was negative in each case. For the American League, only 22 of 108 seasons (20.4 per cent) showed a negative relationship. The average equations for each league reflect the same difference:

National League: y = 76.22 + 0.0003275x, r = 0.05
American League: y = 71.91 + 0.0016311x, r = 0.25
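The percentages for negative seasons are simple shares of the season equations with a downward slope. The sketch below reproduces them; the slopes are placeholders that only carry the signs (39 negative National League seasons and 22 negative American League seasons out of the 108 seasons from 1901 through 2008), and the function name share_negative is hypothetical.

```python
def share_negative(slopes):
    """Percentage of season equations whose slope b is below zero."""
    if not slopes:
        raise ValueError("no seasons supplied")
    negatives = sum(1 for b in slopes if b < 0)
    return 100.0 * negatives / len(slopes)


# Placeholder slopes standing in for real season fits: only the signs matter.
nl_slopes = [-1.0] * 39 + [1.0] * 69  # 39 negative seasons out of 108
al_slopes = [-1.0] * 22 + [1.0] * 86  # 22 negative seasons out of 108
print(round(share_negative(nl_slopes), 1), round(share_negative(al_slopes), 1))
# prints 36.1 20.4
```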

The average correlation coefficient for the National League suggests a weak to almost negligible relationship between population and wins. The intercept is also quite close to the average number of wins available, which is just under 78, and the slope is extremely shallow. The explanatory value of the American League equation is substantially higher, although hardly overwhelming. Its lower intercept and steeper slope, roughly five times the National League's, likewise suggest that population has played a more meaningful role in American League success.
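To make the two slopes concrete, the sketch below plugs two hypothetical market sizes (2 million and 20 million residents) into the average league equations and squares each correlation coefficient to get r². It assumes the reported coefficients are Pearson's r rather than r², as the negative 1960s National League value (-0.14) implies; the names LEAGUE_EQUATIONS, expected_wins and explained_share are illustrative.

```python
# Average season equations as reported in this post: (a, b, r),
# with population x measured in thousands.
LEAGUE_EQUATIONS = {
    "National": (76.22, 0.0003275, 0.05),
    "American": (71.91, 0.0016311, 0.25),
}


def expected_wins(league, population_thousands):
    """Evaluate y = a + b*x for one league's average equation."""
    a, b, _ = LEAGUE_EQUATIONS[league]
    return a + b * population_thousands


def explained_share(league):
    """Square the correlation r to get r^2, the share of variance explained."""
    _, _, r = LEAGUE_EQUATIONS[league]
    return r ** 2


for league in LEAGUE_EQUATIONS:
    small = expected_wins(league, 2_000)   # hypothetical 2-million market
    big = expected_wins(league, 20_000)    # hypothetical 20-million market
    print(f"{league}: {small:.1f} vs {big:.1f} wins, r^2 = {explained_share(league):.4f}")
```

On these equations the larger market gains about 5.9 wins in the National League but about 29.4 in the American League, while r² is only 0.0025 and 0.0625 respectively, which shows how much of the variation in wins population leaves unexplained.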

The predictive value of the equations improves considerably when numbers are averaged by decades:

Decade   National League           r       American League           r
1900s    y = 73.57 + 0.0001085x    0.01    y = 69.20 + 0.0021508x    0.30
1910s    y = 68.35 + 0.0020066x    0.45    y = 73.19 + 0.0005175x    0.11
1920s    y = 75.17 + 0.0004056x    0.08    y = 72.38 + 0.0015004x    0.32
1930s    y = 72.54 + 0.0009804x    0.23    y = 68.13 + 0.0025115x    0.47
1940s    y = 76.74 + 0.0000027x    0.00    y = 67.55 + 0.0025788x    0.58
1950s    y = 70.84 + 0.0015004x    0.41    y = 62.25 + 0.0038227x    0.71
1960s    y = 78.10 - 0.0007343x   -0.14    y = 71.95 + 0.0017122x    0.46
1970s    y = 79.70 + 0.0002290x    0.07    y = 74.65 + 0.0009935x    0.24
1980s    y = 76.58 + 0.0003878x    0.25    y = 75.41 + 0.0007250x    0.34
1990s    y = 77.33 + 0.0000594x    0.03    y = 74.97 + 0.0006106x    0.35
2000s    y = 77.16 + 0.0007707x    0.33    y = 72.32 + 0.0019955x    0.60

Taking both leagues together, the strongest correlations came in the 1950s, when the Yankees won eight of ten American League pennants and the Dodgers and Giants won six National League pennants between them before their departure for the West Coast (where the Dodgers won one more in 1959). The weakest were in the first decade of the twentieth century, when the leagues were generally less settled, and in the twenties, before the Yankees got their chokehold on the American League. In fact, the correlation between population and wins in the American League was stronger in every decade after the twenties than in any decade before, with the exception of the 1970s.

Other notable decades are the 1940s, when population appears to have had no bearing whatsoever on performance in the National League, and the 1960s, when it was negatively correlated. The forties were arguably another unsettled period, given the influence of the war, although population seems to have had plenty of influence in the American League. The perverse sixties, on the other hand, are attributable to an expansion team, the abysmal Mets, occupying the National League’s largest market, compounded by the poor performance of the Cubs in the nation’s Second City. It is the only one of the 22 decade equations with a negative slope and correlation.

The current era rates among the leading “big market” decades of all time, ranking third for the National League and second for the American League. In the American League the keys have been not only the continued dominance of the Yankees but also strong performances by the Angels in Los Angeles, the Chicago White Sox, and the Boston Red Sox. In the National League, the top teams, St. Louis and Atlanta, represent largish cities, and the third best, the Dodgers, play in the league’s second-largest market. The Mets have also done reasonably well despite some awful ups and downs. It is also notable that the very worst performers came from the smallest markets in both leagues, including Pittsburgh, Cincinnati, Kansas City, and Tampa Bay. In times past, one or two small-market teams usually held out as a mini-dynasty in the shadow of the Yankees, Giants, and Dodgers. In the current decade the Oakland Athletics have filled that role, not as occasional winners but as contending also-rans, good enough to support a book on the managerial revolution that has kept them within hailing distance of the mountaintop.
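The rankings in the last few paragraphs can be checked against the decade table. This sketch copies the correlation coefficients from that table into two dictionaries, ranks the decades within each league from strongest to weakest, and finds the decade with the largest combined coefficient. Ranking by r and summing the two leagues' values are assumptions about how “strongest” is meant; the names NL_R, AL_R, rank_of and strongest_combined are illustrative.

```python
# Correlation coefficients by decade, copied from the decade table in this post.
NL_R = {1900: 0.01, 1910: 0.45, 1920: 0.08, 1930: 0.23, 1940: 0.00, 1950: 0.41,
        1960: -0.14, 1970: 0.07, 1980: 0.25, 1990: 0.03, 2000: 0.33}
AL_R = {1900: 0.30, 1910: 0.11, 1920: 0.32, 1930: 0.47, 1940: 0.58, 1950: 0.71,
        1960: 0.46, 1970: 0.24, 1980: 0.34, 1990: 0.35, 2000: 0.60}


def rank_of(decade, table):
    """1-based rank of a decade when decades are sorted by r, strongest first."""
    ordered = sorted(table, key=table.get, reverse=True)
    return ordered.index(decade) + 1


def strongest_combined(nl, al):
    """Decade with the largest sum of the two leagues' correlation coefficients."""
    return max(nl, key=lambda d: nl[d] + al[d])


print(rank_of(2000, NL_R), rank_of(2000, AL_R), strongest_combined(NL_R, AL_R))
# prints 3 2 1950
```

The output agrees with the text: the 2000s rank third in the National League and second in the American League, and the 1950s have the largest combined coefficient.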

Small-market teams have always had their trials, as the history of the Reds, the one most familiar to me, shows. In December 1900, John T. Brush traded future Hall of Famer Christy Mathewson from the Reds to the New York Giants for the washed-up Amos Rusie, another future Hall of Famer, before transferring his financial interests from the Queen City to the Empire State. In the thirties, Powell Crosley bought the Reds out of bankruptcy and saved them by bringing night baseball to the major leagues. They were also in trouble in the 1950s, trailing the league in attendance shortly before their surprise pennant in 1961, and again in the mid-1960s, when moving the franchise to San Diego or Denver was seriously discussed before the advent of the Big Red Machine. In 1981, the Reds finished with the best overall record in baseball yet were robbed of a shot at the pennant by the stupid split-season format adopted to deal with a strike.

These are the travails of the less powerful, exaggerated by the belief among baseball’s administrators, and even the media, that success in large markets makes for stronger leagues. It is the Reds and the Expos who have been screwed out of pennants they should have won, not the Yankees or the Red Sox. Innumerable New York writers have expounded on the glories of the fifties, when only one World Series was played without a New York team (and even that one featured a former New York team), notwithstanding that the experience was so boring, even for New Yorkers, that crowds dwindled through the decade. No one, it seems, values the variety of the sixties and seventies, when pennants were shared around and the small-market Orioles and Reds demonstrated the virtues of cultivating players within a well-managed organization. Baseball attendance rose during those years.
