Friday, May 15, 2015

Reading the Ted Wells Report: Exponent report

There are various parts of the report which the NFL uses to make its case.  It seems to rely more than anything else on the text messages.  This is odd to me, as the text messages are not, in fact, incriminating.  Embarrassing?  Certainly.  Consistent with guilt?  Maybe, though there are more problems with that reading than people are admitting.  Damning?  Hardly.  At some point I may go over the texts, but I think it's a better use of my time to go over the Exponent appendix, which makes the case upon which the Wells report and the suspensions rely.  It should be clear that without the Exponent report, there is no evidence of any wrongdoing.  Without evidence that the balls were actually deflated, who cares what text messages a couple of locker room attendants sent to each other?

The main problem with the Exponent report is that a lot of information is missing.  Where we should be using carefully recorded air pressures, we rely more upon half-remembrances, including on the question of which gauge was actually used to measure the air pressures.

A second issue that I find problematic is that it presumes a uniformity of conditions that simply isn't reasonable to assume.  It also fails to consider the possibility that using a football in an NFL game might tend to lead to a loss of air pressure.

Anyway, let's start.

I am not going to bother to replicate the tables used by the report.  It is freely available online and any interested party should get a copy.

The Effect of Various Environmental and Physical Factors on the Measured Internal Pressure of NFL Footballs

Reading the executive summary.  

Based on information from Paul, Weiss, we learned that two digital air pressure gauges had been used to measure both the Patriots and the Colts footballs at halftime, one identified herein as the Non-Logo Gauge and the other identified as the Logo Gauge. One of these gauges had also been used to check (and in a few cases, set) the pressure of the footballs prior to the game. We have been told by Paul, Weiss that there remains some uncertainty as to which of the two gauges was used prior to the game. We have also been told by Paul, Weiss that the pressures of the Patriots balls were set at or near 12.5 psig following pre-game inspection by the game officials, and the pressures of the Colts balls were set at or near 13.0 psig following pre-game inspection by the game officials. When tested at halftime, the air pressure in the Patriots balls measured between 10.50 psig and 11.80 psig, and 10.90 psig and 12.30 psig, depending on the gauge used. Four of the Colts footballs were also measured and found to have dropped in pressure from their reported pre-game pressure levels, to between 12.50 psig and 12.75 psig, and 12.15 psig and 12.95 psig, depending on the gauge used. What is most significant about the halftime measurements is that the magnitude of the reduction in average pressure was greater for the Patriots footballs when compared to that of the Colts footballs. The question then becomes: what factor(s) could explain this difference?
My first thought here: were the Pats' balls simply tested first?

This is a serious point.  Throughout this process, the NFL has shown a real lack of understanding of the relationship between temperature and pressure.  If the Patriots' balls and the Colts' balls were all brought inside at the same time, and it took 10 minutes to test the Patriots' balls, then the air in the Colts' footballs would have been warming up for 10 minutes, waiting to be measured.  And as they warmed up, the air pressures would have increased.
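To make the warming effect concrete, here is a minimal Python sketch. It assumes the ball's volume stays fixed, no air leaks, atmospheric pressure is 14.7 psi, and the air inside warms toward room temperature exponentially (Newton's law of heating). The 71°F room, 48°F field, and 10-minute time constant are placeholder values I picked for illustration, not figures from the report.

```python
import math

ATM_PSI = 14.7  # assumed atmospheric pressure (psi); gauges read psig = absolute - atmospheric


def f_to_rankine(temp_f: float) -> float:
    """Convert degrees Fahrenheit to the absolute Rankine scale."""
    return temp_f + 459.67


def ball_temp_f(minutes: float, start_f: float, room_f: float, tau_min: float) -> float:
    """Air temperature after `minutes` indoors, warming exponentially from start_f toward room_f."""
    return room_f + (start_f - room_f) * math.exp(-minutes / tau_min)


def gauge_pressure(p0_psig: float, t0_f: float, t1_f: float) -> float:
    """Gauge pressure at t1_f for a ball set to p0_psig at t0_f (fixed volume, no leak)."""
    p0_abs = p0_psig + ATM_PSI
    return p0_abs * f_to_rankine(t1_f) / f_to_rankine(t0_f) - ATM_PSI


# Placeholder scenario: set to 12.5 psig in a 71°F room, chilled to 48°F on the field,
# then brought back into the 71°F room with a 10-minute warming time constant.
for minutes in (0, 2, 5, 10):
    temp = ball_temp_f(minutes, start_f=48.0, room_f=71.0, tau_min=10.0)
    print(f"{minutes:>2} min indoors: {temp:.1f}°F, {gauge_pressure(12.5, 71.0, temp):.2f} psig")
```

Whatever numbers you plug in, the shape is the same: a ball that waits longer before being measured reads higher.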

Presumably they didn't miss such an obvious possibility.  I will read on.

OK, next page.  (p. 156 of the PDF - ignoring the document's page numbering)

  1. According to basic thermodynamics, it is completely expected that the temperature and pressure inside a football drop when it is brought from a warmer environment into a colder environment and rise when brought back into a warmer environment. It is important to note, however, that these variations in temperature and pressure are time-dependent (in the time ranges at issue in the present investigation).

It's worth saying at this point that it seems like it took the NFL a few days to cotton on to this fact.  I'm convinced that the league thought that the initial measurements alone would be sufficient evidence to convince everybody of the Patriots' guilt.

  1. As a result of being exposed to relatively colder temperatures when brought outside to the field for the first half, the pressure inside the footballs for both teams was lower at halftime when compared with the reported pre-game levels. This is consistent with the Ideal Gas Law, which predicts, among other things, the change in pressure that is caused by a change in temperature. Based on information regarding actual conditions on the day of the AFC Championship Game, however, the application of the Ideal Gas Law (assuming equilibrium conditions) cannot account entirely for the pressure drops observed in the Patriots halftime measurements. Most of the individual Patriots measurements recorded at halftime were lower than the range predicted by the Ideal Gas Law. Indeed, all but three of the footballs, as measured by both gauges, registered pressure levels lower than the range predicted by the Ideal Gas Law, assuming an initial pressure of 12.5 psig and temperature conditions that we understand were present on Game Day. In addition, applying the Ideal Gas Law while assuming equilibrium conditions fails to account for the transient nature of the halftime testing, as described in detail herein.
I don't think this claim is borne out by an examination of the data.  It's a bad sign when the Executive Summary makes a claim that the report itself falsifies.
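For reference, the equilibrium calculation Exponent describes fits in a few lines. This sketch assumes fixed volume, no leakage, and 14.7 psi atmospheric pressure; the temperature ranges are hypothetical placeholders, not the Game Day figures Exponent used. `count_below` then tallies how many halftime readings fall under the predicted range, which is the kind of count the summary is making.

```python
ATM_PSI = 14.7  # assumed atmospheric pressure (psi)


def equilibrium_psig(p0_psig: float, t_set_f: float, t_now_f: float) -> float:
    """Ideal Gas Law at fixed volume: absolute pressure / absolute temperature is constant."""
    return (p0_psig + ATM_PSI) * (t_now_f + 459.67) / (t_set_f + 459.67) - ATM_PSI


def predicted_range(p0_psig, set_temps_f, field_temps_f):
    """Lowest and highest equilibrium gauge pressure over every pair of assumed temperatures."""
    values = [equilibrium_psig(p0_psig, ts, tf) for ts in set_temps_f for tf in field_temps_f]
    return min(values), max(values)


def count_below(readings, low):
    """How many halftime readings fall below the bottom of the predicted range."""
    return sum(1 for r in readings if r < low)


# Hypothetical temperature ranges; substitute whatever Game Day conditions you believe.
low, high = predicted_range(12.5, set_temps_f=(67.0, 71.0), field_temps_f=(48.0, 50.0))
print(f"predicted equilibrium range: {low:.2f} to {high:.2f} psig")
```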

p. 157

  1. When the Logo and Non-Logo Gauges measure an identical pressure, different readings are produced: the Logo Gauge reads higher than the Non-Logo Gauge. However, for a given set of measurements, the error for either gauge remains consistent compared to a calibrated gauge. In other words, in the short term, both gauges (as well as the other model gauges used by Exponent during our experiments) will read consistently, but differently from each other. Thus, the short-term repeatability or precision of the two gauges used at halftime is not a factor that contributed to the difference in the magnitude of the pressure drops between the game balls of the two teams, although their apparent difference in accuracy must be taken into account.
Yeah, it's troubling that they used two different gauges with very different measurements.  How did we reach this state of affairs?  Even with a day or more to prepare, the league couldn't find two gauges that would agree with each other?  Like many aspects of the measurement process, this is amateurish.

  1. A series of physical factors were evaluated for their potential contribution(s) to the difference
    in the observed pressure drops at halftime. These included:
    1. The impact of game use.
    2. The impact of repeated insertions of an inflation needle into the football.
    3. The natural leak rate and permeability of properly functioning footballs.
    4. The relative humidity of the air in the room(s) in which the footballs were inflated.
    5. The variation of volume of the footballs.
    6. The different treatments used by the Patriots and the Colts to condition the surface of the balls prior to the game (including the vigorous rubbing described by the Patriots as a step in the process used to break in their footballs).
    Notably, the potential differences in the amount and type of use by each team during the game as well as the ball preparation methods used prior to the game, including vigorous rubbing taking place more than 30 minutes prior to pre-game inspection, were found to have little to no impact on the recorded pressures. None of the above physical factors, at the levels we understand were applicable on Game Day, were found to contribute in any material way to changes in the internal pressure of the footballs, and do not, therefore, explain the relative difference in the pressure drops measured at halftime.
Nope, not seeing any mention of the relative delay in testing the Colts' balls.

OK, there it is on the next page (p. 158), as part of point 7.

All of these factors were found to contribute in varying degrees to changes in the internal pressure of footballs. However, given the magnitude of the temperature change that would have affected the footballs at halftime when they were brought from the field to the locker room, a key factor in explaining the difference in measurements between the Patriots and Colts footballs is timing; that is, the change in pressure with time as the footballs were brought from a colder environment (the field) to a warmer environment (the Officials Locker Room) at halftime.
They slipped this one in...

  1. For the purpose of the experiments, Paul, Weiss informed Exponent that there was no plausible basis to believe that there had been tampering with the Colts footballs; therefore, the Colts footballs were used as a “control” group when evaluating and determining test parameters for the pertinent experiments. In other words, because we could reasonably assume that the Colts measurements collected at halftime on Game Day were the result only of natural causes, a combination of environmental and timing factors was identified (within the realistic ranges provided by Paul, Weiss) for the purpose of our experiments that resulted in measurements for the Colts balls that matched the Game Day measurements.

Nothing like a testing situation where guilt of the other party is ruled impossible right from the start.

Hmmm... what's next?

  1. Overall, we determined that there was a small window in which it was theoretically possible to combine the factors listed in 7a through 7d above to achieve pressure levels that matched those recorded for both the Colts and the Patriots on Game Day, regardless of which gauge was used to measure the footballs pre-game, test them at halftime, or set them prior to our experiments. However, as described below, the precise combination of factors required for the Patriots halftime measurements to fall within the range predicted by the transient experiments while also matching the Colts halftime measurements to the predicted range required setting certain parameters—particularly the timing of the halftime testing and the surface condition of the footballs—at levels believed to be unrealistic and unlikely to have been present on Game Day. In particular:
    1. If the Non-Logo Gauge was used pre-game, the Patriots average halftime measurement from Game Day is always lower than the pressures predicted by the transient curves. If one allows for the standard error associated with the Game Day measurements, the Patriots halftime measurements will overlap with the pressures predicted by the transient curves (with the Colts halftime measurements also matching the predicted range), but only in the outer range of the error band, and only if testing of the Patriots balls began immediately once the footballs arrived in the Officials Locker Room at halftime and took no more than 4 minutes. Based on information provided by Paul, Weiss, however, we understand that testing is likely to have begun no sooner than 2 minutes after the balls were returned to the locker room and is likely to have taken approximately 4 to 5 minutes.
    2. If the Logo Gauge was used pre-game, the Patriots average halftime measurement will match the pressures predicted by the transient curves (with the Colts halftime measurements also matching the predicted range), but only if the testing of the Patriots balls began immediately once the footballs arrived in the Officials Locker Room at halftime and took no more than 4 minutes, and only if the majority of the Patriots game balls were wet. As noted, testing of the Patriots balls is likely to have begun no sooner than 2 minutes and is likely to have taken approximately 4 to 5 minutes. Further, based on statements made to Paul, Weiss (and subsequently conveyed to Exponent) by Patriots ballboys and game officials, we understand that some of the Patriots game balls may have been damp when tested at halftime, but none were waterlogged.

I strongly doubt that they can reconstruct the halftime measurements with this kind of precision. It's ridiculous. I also don't understand why the balls would have to be wet when tested for this to matter.

It's not like drying a wet football is going to restore lost air pressure.
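To see how tight that timing window is, here is a small sketch of when each Patriots ball would have been measured under the timing assumptions quoted above. It assumes, hypothetically, that the 11 tests were evenly spaced between the first and last measurement; the report doesn't say that.

```python
def measurement_times(start_min: float, duration_min: float, n_balls: int = 11) -> list[float]:
    """Minutes after arrival at which each ball is measured, assuming evenly spaced tests."""
    if n_balls == 1:
        return [start_min]
    step = duration_min / (n_balls - 1)
    return [start_min + i * step for i in range(n_balls)]


# (start delay, total duration) in minutes, taken from the excerpts above
scenarios = {
    "Exponent's required window": (0.0, 4.0),
    "Paul, Weiss estimate (fast)": (2.0, 4.0),
    "Paul, Weiss estimate (slow)": (2.0, 5.0),
}
for name, (start, duration) in scenarios.items():
    times = measurement_times(start, duration)
    print(f"{name}: first ball at {times[0]:.1f} min, last ball at {times[-1]:.1f} min")
```

The point is only that the window Exponent needs and the window Paul, Weiss describes differ by a couple of minutes, which is well inside what half-remembered halftime logistics can pin down.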

The next point mentions "simulations" of game day conditions. I don't think these simulations included rain, and I strongly doubt they included the balls being used in a game of professional football.

This point is particularly inane: (p. 160)

  1. In addition to noting the difference in average pressure drops between the Colts and Patriots footballs when measured at halftime, we observed that there appears to be a difference in the variability of the measurements recorded for each team.
Only four Colts' balls were even tested!  How much variability do you want?  Should we model the probability that a sample of four balls will have less variability than a sample of 11 balls?
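Here's a quick Monte Carlo sketch of that question, assuming both teams' readings come from the same normal distribution (the scale cancels out, since only the comparison of spreads matters). It's my illustration, not something in the report. If I've set it up right, the four-ball sample should come out "tighter" somewhat more than half the time, purely because it is small.

```python
import math
import random


def sample_sd(xs):
    """Sample standard deviation (n - 1 denominator)."""
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (len(xs) - 1))


def prob_smaller_spread(n_small=4, n_large=11, trials=20_000, seed=0):
    """Estimate P(SD of n_small draws < SD of n_large draws) when both samples
    come from the same normal distribution; N(0, 1) is used since scale cancels."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        small = [rng.gauss(0.0, 1.0) for _ in range(n_small)]
        large = [rng.gauss(0.0, 1.0) for _ in range(n_large)]
        if sample_sd(small) < sample_sd(large):
            hits += 1
    return hits / trials


print(f"P(4-ball sample looks tighter than 11-ball sample): {prob_smaller_spread():.3f}")
```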

And of course, this is the real kicker (p. 160)

  1. In sum, the data did not provide a basis for us to determine with absolute certainty whether there was or was not tampering as the analysis of such data ultimately is dependent upon assumptions and information that is not certain.

Um, what???

Skipping forward a few pages into the report. (p. 165)

Table 1 shows the recorded measurements.

It's worth noting at this point that Exponent simply ignored the post-game measurements. Gotta go back into the main report to see what's up with that. Hmm... the explanation there isn't any better.

I note from page 73 "The officials also inflated and re-adjusted each of the Patriots game balls tested. Riveron instructed that footballs registering below the permissible range should be inflated and set to 13.0 psi."  The post-game measurements are problematic in the following sense:  they show four balls measuring at 13.5, 13.35, 13.35, and 13.65 psi.  How is this possible?  If the refs inflated the balls to only 13.0 psi, why do they have higher air pressure when measured after the game?  I can see why Exponent would want to exclude them from any modeling.
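The Ideal Gas Law puts a number on the puzzle. A minimal sketch, assuming fixed volume, no air added or lost after the refs set the balls, and 14.7 psi atmospheric pressure: for a ball set to 13.0 psi at some room temperature, how warm would its air have to be to read the post-game values? The 71°F setting temperature is a placeholder I chose, not a figure from the report. Under these assumptions, any reading above 13.0 requires the air to be warmer than it was when the ball was set.

```python
ATM_PSI = 14.7  # assumed atmospheric pressure (psi)


def temp_needed_f(p_set_psig: float, t_set_f: float, p_read_psig: float) -> float:
    """Air temperature (°F) at which a ball set to p_set_psig at t_set_f would read
    p_read_psig, assuming fixed volume and no air added or lost in between."""
    ratio = (p_read_psig + ATM_PSI) / (p_set_psig + ATM_PSI)
    return (t_set_f + 459.67) * ratio - 459.67


# Post-game readings quoted above; the 71°F setting temperature is a placeholder.
for reading in (13.5, 13.35, 13.35, 13.65):
    print(f"{reading:.2f} psi needs air at about {temp_needed_f(13.0, 71.0, reading):.1f}°F")
```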

OK, let's move to Table 2.  (p. 166) The average pressure for the Patriots' balls is 11.11 by one gauge, 11.5 by the other.

The average pressure for the Colts' balls is 12.63 by one gauge, 12.44 by the other.  Curiously, the order of the columns has reversed.  By which I mean that the gauge used by one official gave higher readings for the Patriots and lower readings for the Colts.  Did they switch gauges for some reason?

Anyway, Table 3 (p. 167) shows the main part of the argument: the Patriots balls seem to have lost about 1.0 psi, while the Colts' balls lost about half of that.
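The arithmetic behind that comparison is simple enough to sketch. This uses only the Table 2 averages quoted above and the reported pre-game settings, and it naively subtracts each gauge's average from the pre-game number, ignoring which gauge was used pre-game (the issue Tables 3 and 4 try to handle).

```python
# Table 2 averages (one per gauge) and reported pre-game settings, in psi
PRE_GAME = {"Patriots": 12.5, "Colts": 13.0}
HALFTIME_AVG = {"Patriots": (11.11, 11.5), "Colts": (12.63, 12.44)}


def average_drops(pre, halftime):
    """Pre-game setting minus each gauge's halftime average, rounded to 0.01 psi."""
    return {team: tuple(round(pre[team] - avg, 2) for avg in avgs)
            for team, avgs in halftime.items()}


drops = average_drops(PRE_GAME, HALFTIME_AVG)
for team, (d1, d2) in drops.items():
    print(f"{team}: dropped {d1:.2f} or {d2:.2f} psi, depending on the gauge")
```

Even this crude version shows how much the "size" of the drop depends on which gauge you believe.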

Table 4 shows the measurements based on the opposite presumption, that the officials switched gauges.  Why it's not a matter of record which official used which gauge is a huge puzzle.  At this point, we don't have the excuse of "we're not taking this seriously".  We already know that one gauge is significantly different from the other.

It's time to address the gauge switching story.  As we see in Table 2, it looks like the officials switched gauges between doing the Patriots' balls and the Colts' balls.  But it's a bit worse than that.

Colts' balls 1, 2, and 4 show higher pressures in the Blakeman measure, and ball 3 shows higher pressures with the Prioleau measure.  Since the "non-logo" gauge consistently measures lower pressures than the "logo" gauge, Exponent is forced to assume that the two officials switched gauges only to measure ball 3.

To summarize: for the Patriots' ball-testing, Prioleau had the logo gauge.  Then for Colts balls #1 and #2, Blakeman had it.  And they traded again for ball #3 and again for ball #4.  Remember that these measurements were taken in order, so ball 3 was measured after balls 1 and 2 and before ball 4.

This is more ridiculous than any of Jastremski's text messages.

There is a more sinister explanation.  And it's reasonably simple: the two officials only switched gauges once.  So what happened with Ball 3?  It's been suggested elsewhere that this is a simple data entry error, and that instead of the ball being measured at 12.95 by the Prioleau gauge, it really should be 11.95.  Why does this make sense?

The two gauges preserve order.  By which I mean that if the logo gauge says the pressure of ball X is less than the pressure of ball Y, so does the non-logo gauge.  This is consistent for all 11 Patriots balls as well as the 3 other Colts' balls.  And the (apparently) non-logo gauge says that Ball #3 has the lowest pressure of the four Colts' balls.  The logo gauge should say the same thing.  If it had recorded a pressure of 11.95, that order-preserving property would hold.
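The order-preserving check is easy to automate. The readings below are hypothetical placeholders for four balls on two gauges, not the actual Table 1 numbers; the only values taken from the discussion above are the candidate readings for ball 3 (12.95 as recorded, 11.95 or 12.05 as proposed corrections).

```python
from itertools import combinations


def preserves_order(gauge_a, gauge_b):
    """True if both gauges rank every pair of balls the same way (ties are skipped)."""
    for i, j in combinations(range(len(gauge_a)), 2):
        if (gauge_a[i] - gauge_a[j]) * (gauge_b[i] - gauge_b[j]) < 0:
            return False
    return True


# Hypothetical readings for four balls; ball 3 (index 2) is lowest on gauge A.
gauge_a = [12.65, 12.75, 12.50, 12.60]
for ball3 in (12.95, 11.95, 12.05):  # recorded value vs. proposed corrections
    gauge_b = [12.30, 12.35, ball3, 12.20]
    print(f"ball 3 = {ball3}: order preserved? {preserves_order(gauge_a, gauge_b)}")
```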

There's a problem here: 11.95 psi would show the kind of loss of air pressure that Exponent says only happened with the Patriots' balls, a loss of more than 1.0 psi.  Another solution, which I think is more likely, is that the data match not only the "order-preserving" property but also the "constant difference" property.  All of the other three Colts' balls lose between 0.35 and 0.4 psi when switching gauges.  Some of the Patriots' balls lost as much as 0.45 psi.  If I subtract 0.45 psi from 12.5, I get 12.05 psi.  It's pretty easy to imagine that somebody wrote a '0' which was later read as a '9'.
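Here is the constant-difference argument as a sketch. It takes the 12.5 psi reading and the 0.35 to 0.45 psi band of gauge-to-gauge differences from the paragraph above, and checks which candidate values for ball 3 fall inside the implied range. The function names are mine, and treating the offset as a fixed band is an assumption.

```python
def implied_range(other_gauge_psi, offset_low, offset_high):
    """Range the lower-reading gauge should show if it reads offset_low..offset_high
    psi below the other gauge."""
    return other_gauge_psi - offset_high, other_gauge_psi - offset_low


def in_range(value, bounds, tol=1e-9):
    """Inclusive range check with a small tolerance for floating-point rounding."""
    low, high = bounds
    return low - tol <= value <= high + tol


bounds = implied_range(12.5, 0.35, 0.45)  # figures from the paragraph above
for candidate in (12.95, 11.95, 12.05):
    print(f"{candidate}: consistent with a constant offset? {in_range(candidate, bounds)}")
```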

Exponent's solution is simply to discard Ball 3.  But there's a bit of a problem here.  They want to simultaneously assure us that the data is trustworthy enough to cost the Patriots $1 million, two draft picks, and the services of the Super Bowl MVP for a quarter season.  But they also discard the numbers they don't like.

This is a great example of what's known in the field of statistics as "cherry-picking your data".

The pressures they discard are the highest ones measured.  Since their declared intent is to show that the Patriots' balls are dropping in pressure more significantly than the Colts' balls, this exclusion wouldn't hurt their conclusions.  But I have a different problem: excluding this data creates a false aura of certainty around the data collection process.

When this data is included, we have to confront the absurdity of the measurement recording process. I can believe that the two officials switched gauges between measuring the Pats' balls and the Colts' balls.  I cannot believe they switched them two more times while measuring only four footballs.  That's absurd.

I also disagree with the assumption that the officials first tested the Patriots' balls, then the Colts' balls, and then re-inflated the Patriots' balls.  That doesn't make any sense.  We are told that they didn't have enough time to measure all the Colts' balls.  Are we supposed to believe that, a third of the way through the measurement process, they figured out that they would have enough time for exactly four Colts' balls, but that they had saved enough time to re-inflate all 11 Patriots' balls?

It seems more likely that, while they had all the Patriots' footballs out, they would re-inflate them immediately.  That's the logical thing to do, before moving on to the next bag of balls.  And only then, as time is running down, do they cut short the testing process for the Colts' balls.

This is an important point, as it gives the Colts' footballs a lot more time in the officials' locker room at halftime to warm up.
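A rough sketch of why the order matters: the minute at which Colts testing starts under each of the two procedures. The per-ball times (0.4 minutes to test, 0.3 minutes to re-inflate) are hypothetical placeholders; the report doesn't give them.

```python
def colts_start_minute(order, n_pats=11, test_min=0.4, reinflate_min=0.3):
    """Minutes after the bags arrive at which Colts testing begins.

    order is "colts_before_reinflating" or "reinflate_before_colts". The per-ball
    times are hypothetical placeholders, not figures from the report.
    """
    pats_testing = n_pats * test_min
    if order == "colts_before_reinflating":
        return pats_testing
    if order == "reinflate_before_colts":
        return pats_testing + n_pats * reinflate_min
    raise ValueError(f"unknown order: {order!r}")


for order in ("colts_before_reinflating", "reinflate_before_colts"):
    print(f"{order}: Colts testing starts about {colts_start_minute(order):.1f} min in")
```

Every extra minute the Colts' balls sit in a warm room before being measured pushes their readings up, so the order of operations is not a detail.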

On PDF page 170, Exponent outlines four scenarios for how the data was generated, and considers the likelihood of the data under each.  I'm going to add a fifth scenario: that the true measurement for Colts Ball #3 should be 12.05 psi.  I'm also going to consider how sensitive this analysis is to the time estimates.  Finally, I will consider what the ball pressures should be at various temperatures based on the gas laws, rather than focusing solely on the difference between the two measures.  It seems likely to me that the air inside the Patriots' balls was much colder when they were tested than the air inside the Colts' balls was.

I'm going to push this off to a second post.  tl;dr

Other reading on this topic: Drew Fustin has done a great job addressing the scientific issues here.
