We’ve discussed CRU and GISS gridded data, but many of the recent news stories about the “warmest winter” come from NOAA gridded data (for example here and here), which seems to be gaining a little share of the news attention given to gridded data.

I’ve started taking a look at the data. Given the intransigence of Phil Jones and CRU in refusing to disclose their station selection and methodology and the fact that NOAA is presumably subject to the U.S. Data Quality Act, there may be some advantages to trying to figure out how NOAA gets its results.

I’ve done a first pass in trying to replicate an individual gridcell and have replicated some features and not others. Maybe others will have some ideas.

The NOAA gridded data is a land-based gridded system and is an outgrowth of the GHCN data. Only 32% of the NOAA gridcells have any values. The readme says:

GHCN homogeneity adjusted data was the primary source for developing the gridded fields. In grid boxes without homogeneity adjusted data, GHCN raw data was used to provide additional coverage when possible. Each month of data consists of 2592 gridded data points produced on a 5 X 5 degree basis for the entire globe (72 longitude X 36 latitude grid boxes).
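The 72 X 36 layout in the readme can be turned into a gridcell lookup. The following is a minimal sketch in Python, not NOAA's code: the function name `grid_box` is hypothetical, the ordering of boxes (south to north from 90S, west to east from 180W) is an assumption that should be checked against the file itself, and the coordinates for Barabinsk, the station discussed below, are approximate.

```python
def grid_box(lat, lon, size=5.0):
    """Return (row, col, flat_index) of the 5 x 5 degree box containing a point.

    ASSUMPTION: rows run south to north starting at 90S and columns run
    west to east starting at 180W. The ordering actually used in the NOAA
    file may differ and should be checked against its readme.
    """
    n_lat = int(180 / size)  # 36 latitude boxes
    n_lon = int(360 / size)  # 72 longitude boxes
    # Clamp so that lat = 90 or lon = 180 land in the last box, not past it.
    row = min(int((lat + 90.0) // size), n_lat - 1)
    col = min(int((lon + 180.0) // size), n_lon - 1)
    return row, col, row * n_lon + col


N_BOXES = 36 * 72  # 2592 gridded data points per month, as in the readme

# Barabinsk is at roughly 55.3N, 78.4E (approximate coordinates).
row, col, idx = grid_box(55.3, 78.4)
print(row, col, idx)  # box spanning 55-60N, 75-80E under the assumed ordering
```

Under this assumed ordering, the single Barabinsk station falls in the box covering 55-60N and 75-80E.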

Not a very detailed recipe, but it indicates that you should be able to go from GHCN data to NOAA gridded data. The GHCN v2 Adjusted version is located at http://www1.ncdc.noaa.gov/pub/data/ghcn/v2/v2.mean_adj.Z and the “raw” version is located at http://www1.ncdc.noaa.gov/pub/data/ghcn/v2/v2.mean.Z . The most recent gridded version appears to be at ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/anom/anom-grid2-1880-current.dat.gz with some earlier versions at other locations at the website. I have not been able to figure out how to download this information in R directly. I manually downloaded all three of these files, unzipped them and organized the files into R formats. I’ve attached scripts and instructions for making these three data sets.
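Once the files are unzipped, each record has to be split into fields. Here is a minimal sketch in Python (not the attached R scripts) of reading one line, assuming the v2 layout as I understand it: an 11-character station ID, a 1-character duplicate number, a 4-digit year, then twelve 5-character monthly values in tenths of a degree C with -9999 for missing. The function name `parse_v2_line` is hypothetical and the sample record is made up for illustration.

```python
def parse_v2_line(line):
    """Split one fixed-width GHCN v2 record into its fields.

    ASSUMED layout: cols 1-11 station ID, col 12 duplicate number,
    cols 13-16 year, then 12 monthly values of 5 characters each,
    in tenths of a degree C, with -9999 marking a missing month.
    """
    station = line[:11]
    duplicate = line[11]
    year = int(line[12:16])
    temps = []
    for m in range(12):
        raw = int(line[16 + 5 * m: 21 + 5 * m])
        temps.append(None if raw == -9999 else raw / 10.0)
    return station, duplicate, year, temps


# Made-up record (hypothetical station ID and values), built field by field
# so that the column widths are right.
values = [-185, -170, -95, 12, 105, 168, 190, 162, 101, 15, -88, -9999]
sample = "12345678901" + "0" + "1950" + "".join(f"{v:5d}" for v in values)

station, duplicate, year, temps = parse_v2_line(sample)
print(station, duplicate, year, temps)
```

The same function should apply to both the adjusted and the raw file if, as assumed here, they share a layout.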

I’ve experimented a little with benchmarking the Barabinsk, Russia gridcell, where there is only one station, and replicated some key features in a first pass, but with some surprising issues. First, here is a figure comparing the Barabinsk anomaly series to the corresponding gridcell series. The series start and end at the same spots and have a very similar visual appearance. (Given that Barabinsk data extends to the present, as I’ve noted elsewhere, it’s unclear why GHCN hasn’t updated its Barabinsk data.)


Figure 1. Top - NOAA gridcell; bottom - GHCN anomaly (my calculation from monthly data)
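For reference, the anomaly calculation in the bottom panel amounts to subtracting a per-month normal from each monthly value. The sketch below is in Python and is only an illustration: the function name `monthly_anomalies` is hypothetical, the 1961-1990 default base period is an assumption (NOAA's actual normalization is what I have not been able to replicate), and the toy data are made up.

```python
def monthly_anomalies(series, base_start=1961, base_end=1990):
    """Convert monthly temperatures to anomalies from per-month normals.

    series: dict mapping (year, month) -> temperature in deg C, with
            missing months simply absent.
    ASSUMPTION: normals are plain means over base_start..base_end; the
    base period and method used for the NOAA gridded data may differ.
    """
    normals = {}
    for month in range(1, 13):
        vals = [t for (y, m), t in series.items()
                if m == month and base_start <= y <= base_end]
        if vals:  # skip months with no data in the base period
            normals[month] = sum(vals) / len(vals)
    return {(y, m): t - normals[m]
            for (y, m), t in series.items() if m in normals}


# Toy example with made-up January values and a two-year base period.
toy = {(1961, 1): -20.0, (1962, 1): -18.0, (2000, 1): -17.0}
anoms = monthly_anomalies(toy, base_start=1961, base_end=1962)
print(anoms)
```

Any error in the normals shows up as a residual annual cycle when the result is compared with the gridcell series, since each calendar month is offset by a different amount.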

While the appearance is very similar, there are some important differences between the gridcell version and the anomaly (as I calculated it), shown in Figure 2. There are two quite different replication problems. First, there’s something in the annual normalization that I’ve not replicated, and this leads to a more pronounced annual cycle in the anomaly as I’ve emulated it so far. The red line shows a smoothed version of the difference, which evens out the annual cycle. Second, the smoothed difference shows a type of step difference between the GHCNv2 adjusted data and the gridcell, which seems inconsistent with the methodology description. It also shows a progressive increase of the gridded version relative to the GHCNv2 adjusted anomaly (as far as I’ve been able to emulate the calculation).


Figure 2. Difference between NOAA gridcell and emulated anomaly from GHCN Adjusted version
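A difference series of this kind, with a smooth that evens out the annual cycle, can be sketched as follows. This is a Python illustration only: the smoother behind the red line is not specified here, so a simple 12-month running mean is assumed, the function name `running_mean` is hypothetical, and the data are synthetic (a repeating annual cycle plus a 0.5 degree step).

```python
def running_mean(values, window=12):
    """Running mean over `window` consecutive values.

    Returns len(values) - window + 1 means. With window=12 on monthly
    data, any cycle that repeats exactly every 12 months averages out.
    """
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# Synthetic data: a zero-mean annual cycle, plus a step of 0.5 after month 24.
cycle = [-3, -2, -1, 0, 1, 2, 3, 2, 1, 0, -1, -2]
grid = [cycle[i % 12] + (0.5 if i >= 24 else 0.0) for i in range(48)]
emulated = [0.0] * 48  # stand-in for the emulated anomaly

diff = [g - e for g, e in zip(grid, emulated)]
smooth = running_mean(diff)
print(smooth[0], smooth[-1])
```

In the synthetic case the cycle disappears from the smoothed series and only the step remains, which is the kind of feature that stands out in the smoothed difference.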

I was a little puzzled by this and so I tried the same calculation with the GHCN Raw version. Again I couldn’t quite match the annual cycle - probably something to do with the calculation of normals - but this eliminated many of the steps observed with the GHCN Adjusted version. So it looks like the NOAA gridded calculation has used the Barabinsk raw version here rather than the adjusted version (seemingly contrary to the readme). Right now I have no idea what causes the remaining differences with the gridded version.


Figure 3. Difference between NOAA gridcell and emulated anomaly from GHCN Raw version

To close the circuit, here is the difference between the GHCN raw and adjusted versions, showing differences of up to 1 degree C.


Figure 4. Difference between GHCN Adjusted and Raw versions.

A script is here.