Tuesday, February 24, 2009

He Also Picked Mickey Rourke...

I'm a little late to the party on this one, but Nate Silver's projections of A-Rod's home run totals just don't look right. Here is his description of his methodology:
I took Rodriguez's top 20 PECOTA-comparable players and averaged their performances over each remaining season of their careers. Actually, the process was a little more complicated than that (each comparable's performance was adjusted for his park and league context, as well as his previous track record, and we had to make an accommodation for guys like Manny Ramirez who made A-Rod's comparables list but have yet to conclude their own careers). But the basic idea is simple: Comparables like Frank Robinson, who aged well, have a favorable effect on Rodriguez's forecast, and players like Caminini [sic] just the opposite one.
And here are the projections (I've added A-Rod's two previous seasons in white):

There are three glaring problems to me.
  1. Silver predicts a near-linear decline, with each total being 2-6 HR lower than the preceding year. This, of course, is the result you are going to get when you average the "top 20 PECOTA-comparable players" as Silver did. If you take any 20 players and average their careers, the total number of home runs is almost always going to decline with age. Increase the sample size and it will always decline. The only problem is that, over that ten-year period, it is extremely unlikely that any individual player is going to have that consistent a downward slide. I will bet anyone reading $100 that this doesn't happen to A-Rod. First one to take the bet in the comments is on; we can iron out the details later.

  2. A-Rod is going to be able to DH at some point. Conventional wisdom suggests that playing regularly in the field puts wear and tear on a body, draining energy in individual games and effectively shortening careers. Many of his comparables didn't have that luxury and were driven out of the league because they were no longer well-rounded players, not just because they could no longer be effective at the plate and hit home runs.

  3. The problem with being on pace to be the greatest home run hitter of all time is that you aren't going to have too many people similar to you.
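
To illustrate the first point above, here is a minimal sketch in Python using entirely made-up numbers. It is not PECOTA and not Silver's method; the starting total, the yearly decline, and the noise level are all assumptions picked for illustration. It simulates 20 hypothetical careers that each follow a steady underlying decline plus season-to-season noise, then averages them the way a comparables list would be averaged.

```python
import random
import statistics


def simulate_career(rng, start_hr=40.0, decline_per_year=3.0, noise_sd=8.0, seasons=10):
    """One hypothetical career: steady underlying decline plus random season noise.

    All parameter values are illustrative assumptions, not real player data.
    """
    return [
        max(0.0, start_hr - decline_per_year * year + rng.gauss(0.0, noise_sd))
        for year in range(seasons)
    ]


def yearly_changes(totals):
    """Year-over-year differences in home run totals."""
    return [later - earlier for earlier, later in zip(totals, totals[1:])]


rng = random.Random(2009)  # fixed seed so the run is repeatable
careers = [simulate_career(rng) for _ in range(20)]

# Average the 20 careers season by season.
average_career = [statistics.mean(season) for season in zip(*careers)]

# How much do totals bounce around from one year to the next?
individual_swing = statistics.mean(
    statistics.pstdev(yearly_changes(career)) for career in careers
)
average_swing = statistics.pstdev(yearly_changes(average_career))

# How many of the 20 simulated players improved on the prior year at least once?
players_with_a_rebound = sum(
    any(change > 0 for change in yearly_changes(career)) for career in careers
)

print("Averaged career:", [round(hr, 1) for hr in average_career])
print("Typical year-to-year swing, individual players:", round(individual_swing, 1))
print("Year-to-year swing, 20-player average:", round(average_swing, 1))
print("Players with at least one rebound season:", players_with_a_rebound, "of 20")
```

The averaged line comes out far smoother than any single simulated career, because the noise in 20 independent careers largely cancels. That is the sense in which a smooth projected slide says more about averaging than about how one player's seasons will actually look.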

Here are Alex's 20 comparables (and their career HR totals):

  1. Sammy Sosa (609)
  2. Bobby Grich (224)
  3. Dave Winfield (465)
  4. Ken Caminiti (239)
  5. Ryne Sandberg (282)
  6. Frank Robinson (586)
  7. Dwight Evans (385)
  8. Jeff Bagwell (449)
  9. George Brett (317)
  10. Reggie Jackson (563)
  11. Hank Aaron (755)
  12. Greg Luzinski (307)
  13. Albert Belle (381)
  14. Reggie Smith (314)
  15. Manny Ramirez (527, Inc.)
  16. Carlos Delgado (469, Inc.)
  17. Dick Allen (351)
  18. Doug DeCinces (237)
  19. Larry Walker (383)
  20. Tony Perez (379)

Granted, PECOTA's comparables are based on a ton of things besides home runs, but the problem with this list is that A-Rod already has more HRs than 16 of the guys on it. He has twice as many as his second and fourth closest matches, and his career isn't over. I know Nate is going to lean on his own projection system for a variety of reasons, but it would probably have made more sense to look at the top 20 career HR leaders. We are, after all, trying to predict how many home runs he is going to hit, and he's already #12 on the list.
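
The counts in that paragraph can be checked against the list above with a few lines of Python. The career totals are copied from the list; A-Rod's own total through 2008 (553) is not stated anywhere in this post, so treat it as an assumption supplied for the check.

```python
# Career home run totals exactly as listed in the post.
comparables = {
    "Sammy Sosa": 609,
    "Bobby Grich": 224,
    "Dave Winfield": 465,
    "Ken Caminiti": 239,
    "Ryne Sandberg": 282,
    "Frank Robinson": 586,
    "Dwight Evans": 385,
    "Jeff Bagwell": 449,
    "George Brett": 317,
    "Reggie Jackson": 563,
    "Hank Aaron": 755,
    "Greg Luzinski": 307,
    "Albert Belle": 381,
    "Reggie Smith": 314,
    "Manny Ramirez": 527,   # career incomplete
    "Carlos Delgado": 469,  # career incomplete
    "Dick Allen": 351,
    "Doug DeCinces": 237,
    "Larry Walker": 383,
    "Tony Perez": 379,
}

# Assumption: A-Rod's career total through the 2008 season (not given in the post).
AROD_HR_THROUGH_2008 = 553

already_passed = [name for name, hr in comparables.items() if hr < AROD_HR_THROUGH_2008]
still_ahead = [name for name, hr in comparables.items() if hr > AROD_HR_THROUGH_2008]

# Second and fourth closest matches on the list.
doubled = [
    name
    for name in ("Bobby Grich", "Ken Caminiti")
    if AROD_HR_THROUGH_2008 >= 2 * comparables[name]
]

print("Comparables A-Rod has already passed:", len(already_passed))
print("Comparables still ahead of him:", still_ahead)
print("Comparables he has at least doubled:", doubled)
```

Under that assumed total, only four of the twenty names finished with more home runs than A-Rod already has.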

I realize that the most reliable way to predict future outcomes is by analyzing past events. However, the flaw in using this methodology is that it becomes impossible to predict when someone will do unprecedented things. Simply put, how is analyzing 20 guys, none of whom is the career home run leader, ever going to result in the simulation predicting A-Rod will break the all-time record?

Look what happens when you line up A-Rod's projections with Hammerin' Hank's:

For one thing, A-Rod played in only 138 games last year. Had he played 156 games, he would have been on pace for almost exactly 43 HRs, right in line with a 32-year-old Aaron.

The big differentials come at ages 35-39. As A-Rod enters his steady plunge into oblivion (98 HR, 19.6 per year), Hank checks in with 203 round-trippers (40.6 per year), including the highest single-season total of his career (47 at age 37).
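
The per-year figures are simple division over the five seasons from age 35 through 39. A quick sketch, using only the two totals quoted above (the variable names are mine):

```python
SEASONS = 5  # ages 35, 36, 37, 38 and 39

AROD_PROJECTED_HR = 98   # Silver's projection for those ages, as quoted above
AARON_ACTUAL_HR = 203    # Aaron's actual total for those ages, as quoted above

arod_per_year = AROD_PROJECTED_HR / SEASONS
aaron_per_year = AARON_ACTUAL_HR / SEASONS

gap_total = AARON_ACTUAL_HR - AROD_PROJECTED_HR
gap_per_year = gap_total / SEASONS

print("A-Rod projected HR per year:", round(arod_per_year, 1))
print("Aaron actual HR per year:", round(aaron_per_year, 1))
print("Total gap over the five seasons:", gap_total)
print("Gap per season:", round(gap_per_year, 1))
```

Put that way, the projection has A-Rod hitting a little under half of what Aaron hit over the same ages.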

There's no guarantee that A-Rod will hit even one more home run. He could get struck by lightning tomorrow. I just don't think that Alex is going to take the field on a consistent basis and gradually slide off into oblivion like PECOTA projects. The truly eye-opening part of Silver's projection is that he would still be only 33 HRs away from the all-time record even if he is as bad as the simulation predicts.


Before you leave a comment telling me how much of a moron I am, I've posted my own projections here.


  1. PECOTA comparables aren't supposed to be used as a comparison of players' careers, but, rather, a comparison of where each player was at similar stages of their careers.

    Take a look at his top 5 comparables' age 32-34 seasons, and you'll see the relevance you're looking for.

  2. Hey man, this is my first time visiting your site (coming from Rob Neyer's blog), but you should probably go back to stats 101.
    I agree with you that the methodology used for the projection is kind of suspicious, but the fact that according to the projection he is going to "gradually slide into oblivion" is not a good reason to say that the projection is flawed. It's a projection after all, and sometimes the best fit you can use for your projection is a line or a curve or something similar. For example, in the economic models used to predict a country's GDP over the long run, GDP usually just increases. It's just a projection; that doesn't mean GDP is going to increase every single year. Some years you might have a recession, some years you might expand at a faster rate than usual, but in the long run you expect growth. It's similar with baseball players: in the long run you expect them to decline, and some years A-Rod might outperform that projection while other years he might underperform it. If the model instead predicted him to hit 27 HR at 35 and then 46 HR at 36, would it be more credible? I doubt it. There is no way you could predict such a thing; for all practical purposes, home runs hit in a given season are a random variable. It's obvious that A-Rod's HR totals are not going to slide every single year and he might have great seasons after he turns 35, but it's also possible that he might have injuries and other stuff happen to him. In the long run, the tendency to decline should be part of the model.

  3. Anon - Fair point that putting in random spikes and declines wouldn't make it more credible. But if you are always assuming a decline, doesn't that skew the projection downward? If it starts at 33, that is his highest total going forward? Seems unlikely. I'm not saying start at 55 either, but I think that's a major reason why PECOTA ends up lower than most people's ballpark guesses.

    It comes across like I'm calling out Nate Silver, but maybe I'm just identifying some of the reasons that projections in general are flawed.

  4. PECOTA also ends up much *better* than most people's ballpark guesses. Anon up there is exactly right--the whole point of projection systems is to point us toward the most likely future outcomes based on what has tended to happen in the past. It seems as though you're saying the fact that it does exactly that is a "flaw."

    It would be incredibly silly for it to compare him to Hank Aaron or to project anyone to age like Hank Aaron, when there's been only one Hank Aaron, ever. The side-by-side of his projections and Hank's is interesting and all, but why not do the same thing with Jimmie Foxx?

  5. BillP - PECOTA is much better than people's guesses in most cases, but to my knowledge it hasn't been used to project anyone on the path to an all-time record for an individual stat. It was excellent last year at predicting teams' records, but I just don't think the methodology makes sense.

    A-Rod was the fastest player ever to 500 HR, so I don't think it would be all that silly to think that he could have a career path similar to Hank Aaron's, since he has already been better than him. There was only one Hank Aaron, but there was also one Barry Bonds.

    The reason I didn't compare him to Jimmie Foxx is I am a Yankees fan... Well, that and it would have been a pretty lopsided Excel chart.

  6. PECOTA has always (or at least for several years now) projected several years into the future, and as far as I can tell has been about as good as one would expect at that, too. The one thing (among those things we can reasonably expect a forecasting system to do) that PECOTA doesn't seem to be good at at all is Ichiro--it will apparently keep predicting that he's going to fall off a cliff until it's right.

    What I mean by "there's only been one Hank Aaron" is his aging pattern. A-Rod certainly isn't alone in having been a better hitter than Aaron through age 32 (has he been, though? Aaron's OPS+ through that age was better); the sucker's bet is to take that as an indication that he's at all likely to be as good as Aaron was at age 33 or 34 or 37 or 42.

    Of course, yes, there's Barry Bonds too, but most people seem to assume--and with good reason I suppose, though I'm not really one of them--that that was kind of an artificial phenomenon. Stack those two against the dozens of great players who have fallen off much faster, and it's still pretty clear which way the odds are pointing.

  7. It seems futile to attempt projecting a career that is an outlier. Rodriguez had 91 extra-base hits in his age-20 season, so we're dealing with a pretty rare player as it is. (DiMaggio and Pujols each had 89 in their age-21 seasons.) Using guys like Grich, Caminiti and Sandberg as comps doesn't seem fair to Rodriguez. Combined, those three have only four seasons with 30+ homers. Rodriguez has hit fewer than 35 just one time.

  8. BillP - I know PECOTA projects a few years into the future, but the unique part of the pursuit of a record is that you are by nature outside of previous accomplishments. Like scatterbrian said, "It seems futile to attempt projecting a career that is an outlier."

    The fact that they haven't been able to forecast Ichiro accurately would serve that point as well.

    I'd be down for a friendly bet that A-Rod hits more home runs than Aaron did at age 33 and 34 (39+29) in the next two seasons.

  9. Damn, 68 homers is a pretty good over/under bet over the next two seasons. I'd lean over, but I'm really curious to see how he handles this season after all this PED nonsense.

  10. The other problem in comparing to Aaron's late-30s power surge is that it was very much influenced by the Braves' move into the homer-friendly confines of Fulton County Stadium, IIRC.

  11. "However, the flaw in using this methodology is that it becomes impossible to predict when someone will do unprecedented things. Simply put, how is analyzing 20 guys, none of whom is the career home run leader, ever going to result in the simulation predicting A-Rod will break the all-time record?"

    I think that this is the crux of the issue.

    Let's say that we have a marathon with 10 runners. Halfway through the race, the leader appears on pace to win. A statistician, feverishly trying to drum up some publicity for himself, knows that the best way to achieve his self-promotional goals is to say something contrarian:

    "He can't finish first! I've compared him to his closest competitors, the second and the third runners, and I can tell you that he's on track to finish either second or third!"

    It's idiotic, circular reasoning.

    Nate Silver and the BP crew do some GREAT work and are brilliant minds, in my opinion. But this is so facially, flagrantly stupid that I'm just going to assume that they were just trying to get some attention (a la Rob Neyer becoming famous for pointing out Jeter's defense was bad).