
Estimating Greenhouse Gas Emissions:
The Washington Post and information quality

29 Apr 2007

A Page One story by Washington Post staff writer David A. Fahrenthold says carbon dioxide emissions in the Washington, DC, area increased 13.4% from 2001 to 2005.

The article is clearly intended to influence public policy. But there are significant problems with this estimate that are not disclosed in the article. The federal Information Quality Act does not apply to the Washington Post, but it would apply to any federal agency that attempted either to take action based on the Post's figures or even to report them in a manner suggesting that it thought they were valid. (Congress is exempt from the statutory requirement to disseminate only scientific and statistical data that meet applicable information quality standards. Unlike Executive branch agencies, of course, Congress is never regarded as an authoritative body for scientific or statistical information.)

Below we compare the data reported by Fahrenthold with the information quality standards that apply to federal agencies.

Here's what Fahrenthold tells us about how he derived his estimate:

The Post estimate began with data on miles traveled by cars and trucks in local jurisdictions and the amount of kilowatt hours used by utility customers.

Then, using methods from the U.S. Energy Information Administration, those figures were used to calculate the total amount of carbon dioxide emitted from vehicles and power-plant smokestacks. [See the chart for details.]

The figures from those calculations leave out greenhouse gases from other sources, such as agriculture, planes, boats and oil furnaces. Those missing figures could account for half of all emissions.

The chart referred to in square brackets above is found only in the print edition and is titled "The Rapid Rise of Emissions." It contains the following reported data, but in graphical form:

"The Rapid Rise of Emissions"
("The rate of increase was calculated by The Washington Post using data from governments, environmental groups and electric utilities")

                    Cars and Trucks*   Electricity Use**   Both Sources Combined
Virginia Suburbs                16.8                20.4                    18.8
Maryland Suburbs                10.1                12.2                    11.2
The District                    -1.6                 9.1                     6.7
Total Area                      11.7                14.6                    13.4
US Total                         4.9                 6.0                     5.6

* Arlington County not included in Virginia Suburbs.
** Frederick County not included in Maryland Suburbs. Only partial data available for Stafford, Fauquier, Calvert, Montgomery and Prince George's counties.

SOURCE: Staff reporting
Washington Post, April 29, 2007 Print Edition, A16


These data do not adhere to the minimum information quality standards that would apply if they had been disseminated by the federal government.

TRANSPARENCY AND REPRODUCIBILITY

Federal information quality guidelines require government agencies to practice transparency and reproducibility when they disseminate statistical information. Transparency means fully revealing all sources and methods. Reproducibility means providing enough information that a qualified third party would obtain essentially the same answer. The Post's data do not satisfy either of these requirements.

The Post's choice of data is not transparent, and Fahrenthold only hints at his sources. At least one of his acknowledged sources -- "environmental groups" -- has a policy interest in maximizing the reported percentage increase in CO2 emissions. It is possible that they did not bias their data in accordance with these policy interests. However, Fahrenthold does not inform readers of this potential conflict of interest, nor does he reveal whether the Post performed due diligence to verify the validity and reliability of their data. It appears that the Post simply accepted their data without question.

The Post acknowledges that its data are incomplete in two ways -- first, by not counting all emissions from categories that it included, and second, by excluding source categories. When data are incomplete, inferences about them should be made with caution. Instead, the Post mentions these defects but draws inferences as if they were minor.

With regard to its analytic methods, the Post also reveals nothing of importance. Presumably, the Post performed a simple subtraction of 2001 from 2005 values and assumed the resulting difference to be an unbiased estimate. An unbiased estimate is one that is just as likely to overestimate the true but unknown value as to underestimate it. But simple subtraction yields an unbiased estimate of the difference only under certain restrictive conditions, including:

  • All definitions must be identical for 2001 and 2005. Any change in definitions means that the data are not comparable across years, and the result of subtraction is uninterpretable. Apples cannot be subtracted from oranges.
  • Data that were missing in each year must be missing from both years. Counties partially counted or missing in 2001 must be either missing or excluded in 2005, and vice versa. Where coverage was partial in 2001, it must be identically partial in 2005.
  • The methods used to estimate values for 2001 must be the same methods used for estimating values for 2005. Any change in methods introduces a discrepancy in the reported difference that has nothing to do with actual emissions.
These conditions might apply, but we don't know because the Post did not reveal its sources and methods.
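
The second condition can be illustrated with a minimal sketch. The county names and emissions values below are entirely hypothetical (they are not the Post's data); the point is only to show how a county that is missing from the 2001 total but counted in 2005 distorts the result of simple subtraction.

```python
def percent_change(start, end):
    """Percentage change from start to end."""
    return (end - start) / start * 100.0

# Hypothetical emissions (arbitrary units) for three made-up counties.
# These are illustrative assumptions, not figures from the Post.
emissions_2001 = {"County A": 100.0, "County B": 50.0, "County C": 80.0}
emissions_2005 = {"County A": 105.0, "County B": 55.0, "County C": 84.0}

# Case 1: the same three counties are counted in both years.
consistent = percent_change(sum(emissions_2001.values()),
                            sum(emissions_2005.values()))

# Case 2: County C is missing from the 2001 total but counted in 2005.
partial_2001 = emissions_2001["County A"] + emissions_2001["County B"]
mismatched = percent_change(partial_2001, sum(emissions_2005.values()))

print(f"consistent coverage: {consistent:.1f}%")
print(f"mismatched coverage: {mismatched:.1f}%")
```

With these made-up numbers, consistent coverage gives an increase of about 6%, while mismatched coverage gives about 63%. Without knowing which counties were counted in which year, a reader cannot tell which situation applies.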

This leads to the Post's second procedural failure. The Post's calculations are not reproducible by a qualified independent third party. Fahrenthold reports that "Jonathan Cogan, a spokesman for the [Department of Energy's] Energy Information Administration reviewed The Post's calculations and said the agency's formulas appeared to have been used correctly." The extent of this external review is unclear -- was it limited to fidelity to EIA formulae, or did it also include a review of the Post's input data? (By responding to the Post's request, Cogan put EIA in the position of violating the spirit of the law by implicitly conveying its endorsement. He did not violate the letter of the law because statements made by agency spokesmen are exempt.)

The depth of Cogan's review notwithstanding, the reproducibility requirement in federal information quality standards can't be satisfied by reliance on a hand-picked third party. Satisfying the reproducibility requirement can be achieved only by disclosure.

OBJECTIVITY

Federal information quality guidelines require federal agencies to ensure that statistical information intended to influence policy be objective.

Substantive objectivity means that information must be "accurate, reliable, and unbiased." "In a scientific, financial, or statistical context, the original and supporting data shall be generated, and the analytic results shall be developed, using sound statistical and research methods."

Presentational objectivity means that information must be "presented in an accurate, clear, complete, and unbiased manner," including "within a proper context" that may include "other information" necessary "to ensure an accurate, clear, complete, and unbiased presentation," including sources and supporting data and models, "so that the public can assess for itself whether there may be some reason to question the objectivity of the sources."

We've already documented why the Post's estimates are unlikely to be substantively objective. If a federal agency disseminated statistical information this way, it would be presumptively in violation of the law. So we'll focus on presentational objectivity, which applies even if substantive objectivity is assured.

An elementary principle of information quality is to present quantitative measurements or estimates at a level of precision consistent with that of the measurement instruments and analytic tools. In this case, Fahrenthold presents estimates of percentage change with three significant figures, with the last digit measuring tenths of percentage points. This implies that Fahrenthold's estimate of the percentage change in CO2 emissions is accurate to within 0.05 percentage points. Given just the acknowledged missing data, that level of precision is technically infeasible; he would be fortunate if his first digit were significant. But by using three significant digits, Fahrenthold falsely implies that he knows much more about CO2 emissions, and their changes over time, than is justified by his data. Presentational objectivity is never served by misleading the users of information about its precision, even when the information is accurate.

It's unclear by how much, if at all, CO2 emissions actually rose because Fahrenthold chose a problematic baseline. The year 2001 was unusual in many respects, most notably a weak recession and the coordinated terrorist attacks of September 11. The average annual change in CO2 emissions likely would be different -- and in particular, smaller -- if Fahrenthold had chosen as a baseline a comparable date in the previous business cycle.

It's also unclear what to make of estimates for the Virginia Suburbs that exclude Arlington County, the jurisdiction closest to the District of Columbia. This difficulty is exacerbated by missing data from two exurban Virginia counties (Stafford and Fauquier). Arlington, Stafford and Fauquier counties represent 9%, 5% and 3%, respectively, of the estimated 2005 population of the Virginia Suburbs. That is, data are incomplete or excluded with respect to 17% of the suburban Virginia population.

Figures for the Maryland Suburbs are even more problematic. Fahrenthold reports that there are data missing from Montgomery and Prince George's counties, and he excludes Frederick County. These counties represent 33%, 30% and 3%, respectively, of the Maryland Suburbs. Data are incomplete or excluded with respect to 66% of the suburban Maryland population.

Howard County, located midway between Washington and Baltimore, is also excluded by the Post. Had Howard County been included, the population for the Maryland Suburbs would have been about 10% greater.

Fahrenthold reports that District of Columbia officials took credit for their apparently lower rate of increase in CO2 emissions:

The brightest news came from the District, where emissions grew 6.7 percent. D.C. officials said they think the relatively low increase is partly a sign of changing behavior: Residents were leaving their cars at home and walking, biking or taking public transit.

But Fahrenthold did not point out that DC's population had declined about 4% during this period, whereas the population of Suburban Virginia and Suburban Maryland increased about 11% and 10%, respectively. Adjusting for DC's population decline, Fahrenthold's figures, if true, would mean DC's CO2 emissions rose 11% per capita.

Indeed, the entire picture changes when population changes are taken into account. When Fahrenthold's (unverified) estimates of percentage changes in CO2 emissions from 2001 to 2005 are divided by the Census Bureau's (validated) estimates of population changes from 2000 to 2005, DC's performance is the worst in the region rather than the best:

How Adjusting for Population Changes the Washington Post's Estimates

                    % Change in CO2 Emissions   % Change in CO2 Emissions
                    Reported by the             Reported by the Washington Post,
Jurisdiction        Washington Post             Adjusted for Population Changes
Virginia Suburbs    18.8%                       7%
Maryland Suburbs    11.2%                       2%
The District        6.7%                        11%

See table.
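
The adjusted figures can be approximated with a short calculation. This is a sketch under one assumption: that "divided by" means dividing the emissions growth factor by the population growth factor. The function name is hypothetical; the emissions percentages are the Post's reported figures and the population percentages are the Census Bureau changes listed in the population table at the end of this post.

```python
def per_capita_change(emissions_pct, population_pct):
    """Percentage change in emissions per person, given the percentage
    changes in total emissions and in population.

    Assumed method: ratio of growth factors, minus one.
    """
    emissions_factor = 1 + emissions_pct / 100.0
    population_factor = 1 + population_pct / 100.0
    return (emissions_factor / population_factor - 1) * 100.0

# (reported % change in CO2, % change in population 2000-2005)
jurisdictions = {
    "Virginia Suburbs": (18.8, 10.8),
    "Maryland Suburbs": (11.2, 9.5),
    "The District": (6.7, -3.8),
}

adjusted = {
    name: round(per_capita_change(co2, pop))
    for name, (co2, pop) in jurisdictions.items()
}

for name, value in adjusted.items():
    print(f"{name}: {value}%")
```

Under that assumption the calculation gives 7%, 2% and 11%, matching the adjusted column. Note that the emissions figures cover 2001 to 2005 while the population figures cover 2000 to 2005, so the comparison is approximate at best.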

To be clear, we hesitate to draw any inferences from Fahrenthold's data. We doubt they are useful for any public policy purpose. Most importantly, both his inferences about the absolute change in CO2 emissions in the Washington metropolitan area and his comparisons across jurisdictions are unsupported by his own data.

The primary message of Fahrenthold's article is that CO2 emissions in the Washington metropolitan area are "rapidly rising." But Fahrenthold reports data from just two dates. Even if these data were accurate to three significant figures, it would be technically impossible to discern acceleration. The most that Fahrenthold could legitimately report is the average annual change.
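
As a rough sketch of what that average annual change would be, the calculation below takes the Post's 13.4% figure at face value and assumes four annual intervals between 2001 and 2005. The variable names are illustrative only. Two observations yield a single average rate; they say nothing about whether the rate itself is rising or falling.

```python
total_change_pct = 13.4   # the Post's reported 2001-2005 increase
years = 2005 - 2001       # assumed: four annual intervals

# Simple (arithmetic) average, in percentage points per year.
simple_avg = total_change_pct / years

# Compound average annual growth rate, in percent per year.
compound_avg = ((1 + total_change_pct / 100.0) ** (1 / years) - 1) * 100.0

print(f"simple average:   {simple_avg:.2f} points per year")
print(f"compound average: {compound_avg:.2f}% per year")
```

Either way the figure is a little over 3% per year, and both versions inherit every data quality problem of the underlying 13.4% estimate.
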

Information quality principles matter for many reasons, but one key reason is that when poor-quality information is disseminated, others are led to draw invalid inferences. These invalid inferences often find their way into public policy unless they are successfully corrected before decisions are made.

A plausible explanation for the invalid inferences made by the anonymous DC government officials cited by Fahrenthold is that Fahrenthold himself premised his request for a reaction on invalid inferences about the data. When pressed for a reaction, public officials may offer answers that are consistent with other data at their disposal. Alternatively, they may give an explanation that is either self-serving or what they think the reporter wants to hear. (Sometimes these are the same thing.) It's possible that DC officials have data supporting their suggestion that DC's allegedly lower rate of increase in CO2 emissions is a "sign of changing behavior." But it's more plausible that they didn't want to attribute the lower rate to a decline in the District's population, a decline with which they would be familiar but which would not be interpreted favorably by a reporter whose narrative is that regional CO2 emissions are "rapidly rising."

Similarly, Frank O'Donnell's claim that "sprawl is causing a big increase in greenhouse gases" is most plausibly related to the public policy positions he and his organization advocate. Because they are opposed to what they call "suburban sprawl," sprawl is a convenient inference from Fahrenthold's data that also fits the reporter's likely narrative.

If sprawl were actually the culprit, then one would expect to find that commuting times are significantly higher for jurisdictions farther away from the District. The available data don't support that inference. Average commute times reported by the Census Bureau are not nearly as different across the region as one would expect if sprawl were the underlying cause of rising CO2 emissions. For Virginia, average commute times vary from 27.3 minutes (Arlington County) to 37.7 minutes (Stafford County). But Arlington is located adjacent to the District and Stafford is about 45 miles southwest. A 10-minute difference in average commuting time seems much less than one would expect if proximity to the District reduced CO2 emissions from commuting. For Maryland the range is 29.2 minutes (St. Mary's County) to 39.8 minutes (Calvert County) -- again, a range of just 10 minutes.

Indeed, the average commuting time for residents of the District was almost 30 minutes in 2000. The higher population density of the District apparently does not translate into a significantly reduced commute. When DC's figure is treated as a baseline and subtracted from the averages for the other jurisdictions, the range in net average commuting times in Virginia becomes -2.4 to 8, and the range in Maryland becomes -0.5 to 10.1. People in the Washington metropolitan area don't all work in the District, and they choose places to live based on many criteria other than the length of their commute. But their average commute is remarkably stable irrespective of where they live.
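
The normalization described above is a simple subtraction, sketched here for the four endpoint jurisdictions named in the text. The District's 29.7-minute average is taken from the population table at the end of this post; the variable names are illustrative.

```python
DISTRICT_AVG_COMMUTE = 29.7  # minutes, 2000 (from the table below)

# 2000 average commute times (minutes) for the endpoints cited in the text.
avg_commute = {
    "Arlington County": 27.3,
    "Stafford County": 37.7,
    "St. Mary's County": 29.2,
    "Calvert County": 39.8,
}

# Net commute relative to the District, rounded to one decimal place.
normalized = {
    name: round(minutes - DISTRICT_AVG_COMMUTE, 1)
    for name, minutes in avg_commute.items()
}

for name, delta in normalized.items():
    print(f"{name}: {delta:+.1f} minutes")
```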

Of all the errors in Fahrenthold's story, surely the most pernicious is the claim that CO2 emissions are "rising rapidly." As we've already noted, a rate of acceleration cannot be discerned from two static observations. But this narrative is clearly an appealing one for those who are predisposed to believe that "the problem" of anthropogenic global climate change is "getting worse." This narrative is often expressed by Post reporters and the newspaper's editorial board. The Post should make a diligent effort to understand information quality principles and apply them to the newspaper's work products, especially when a story appears to conform to the revealed biases of its reporters and editors.



Selected Washington Metropolitan Area Population Statistics, 2000-2005

                                                           2000 Avg   2000 Commute  Pop'n Adj.
                               2000       2005     % Ch     Commute  Normalized by    % Change
Jurisdiction                   Pop.   Pop. [1]      [2]  (mins) [3]   District Avg     CO2 [4]
VIRGINIA SUBURBS          1,962,782  2,220,329    10.8%          --             --          7%
Arlington County            189,453    195,965     3.4%        27.3           -2.4          --
Alexandria City             128,283    128,923     0.5%        29.7            0.0          --
Fairfax City                 21,498     21,963     2.2%        30.1            0.4          --
Fairfax County              969,749  1,006,529     3.8%        30.7            1.0          --
Falls Church City            10,377     10,781     3.9%        26.4            6.7          --
Fauquier County              55,139     64,997    17.9%        36.8            7.1          --
Loudoun County              169,599    255,518    50.7%        30.8            1.1          --
Manassas City                35,135     37,569     6.9%        32.4            2.7          --
Manassas Park City           10,290     11,622    12.9%        35.6            5.9          --
Prince William County       280,813    348,588    24.1%        36.9            7.2          --
Stafford County              92,446    117,874    27.5%        37.7            8.0          --
MARYLAND SUBURBS          2,561,109  2,828,550     9.5%          --             --          2%
Anne Arundel County         489,656    510,878     4.3%        28.9            0.1          --
Calvert County               74,563     87,925    17.9%        39.8           10.1          --
Charles County              120,546    138,822    15.2%        39.3            9.4          --
Frederick County            195,277    220,701    13.0%        31.9            2.2          --
Montgomery County           873,341    927,583     6.1%        32.8            3.1          --
Prince George's County      801,515    846,123     5.7%        35.9            6.2          --
St. Mary's County             6,211     96,518    11.9%        29.2           -0.5          --
THE DISTRICT                572,059    550,521    -3.8%        29.7            0.0         11%

[1] Estimated by Census Bureau; see data quality note.
[2] Estimated by Census Bureau; see data quality note.
[3] Estimated by Census Bureau; see data quality note.
[4] Estimated by the Washington Post; no data quality disclosed.