Nov 24 2009
Where Is CRU’s Raw Temp Data?
Update: More on the crappy CRU code at Hot Air (maybe I won’t have to dig into it after all!) – end update.
One question still remains after all the email and document dumps – where is the raw climate data CRU used to create the global warming false alarm? We know that Phil Jones claims he ‘accidentally’ deleted it (though in emails he planned to do so if he was forced to comply with FOI laws). And I find it interesting that no real data was contained in the dump – just reports, proposals, code and data products.
I ran across an interesting “readme” file in the documents. A “readme” file usually contains the information users need to run a program, along with a list of known bugs, and is written by the programmers. But in the case of CRU the “readme” files are sometimes detailed logs kept by the developers (the HARRY_READ_ME file provides a wonderful view inside the crappy code upon which the future of humanity apparently rests). One such file I discovered under the original zip path was /documents/EMULATE-corrections (which you can read here).
In this file we find the sources for the Mean Sea Level Pressure (MSLP) data, which I presume comes from weather stations that also collect temperature values. So the sources for the MSLP are probably the sources for much of the land data CRU had (or has).
To add to AJ’s reply to Fai, there are multiple references to f77. That is the FORTRAN 77 standard, designed to standardize FORTRAN across the six major mainframe vendors of the day.
Most likely if you go to any heavy compute shop, like oil reservoir modeling, the bulk of the code will still be FORTRAN 77. No one wants to rewrite that code in ‘C’ or ‘C++’.
Next we can talk about why Cobol and PL/I.
Rick
The files we were looking at were Fortran 90, but yeah, it is still used a lot in scientific work, though it is slowly being replaced by R.
I don’t do much programming these days, as my job is more of an architectural role. I am more of a “big picture” guy designing large TCP/IP networks that move terabits of traffic each day between major internet operations. When in the past I needed to move large amounts of data around and manipulate it, I used Python. Lower-level work, like building network diagnostic tools, was done in C.
It would be interesting to translate these programs line by line into something like Python or R, because the syntax of those languages makes “weird” things easier to see. Python reads almost like English. You can actually code something, come back to it 5 years later, and understand what you did – unlike Fortran or Perl or even C. It is almost self-documenting. You have to TRY to make it hard to read.
But having a language where whitespace is significant trips some people up 🙂 Put an extra space in front of a line and it becomes part of a completely different loop, because Python actually uses the level of indentation to determine where a statement belongs. C and Fortran use indentation only as a style enhancement to make the code easier to read; the indentation isn’t significant to the compiler. It is to Python.
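To make that concrete, here is a tiny sketch of my own (nothing from the CRU dump) – the same print statement, indented two different ways, belongs to two completely different scopes:

    # Hypothetical example: indentation decides what is inside the loop.
    readings = [66, 71, 68, 72]

    total = 0
    for r in readings:
        total += r
        print("running total:", total)   # indented under the for: runs every pass

    total = 0
    for r in readings:
        total += r
    print("final total:", total)         # one level out: runs once, after the loop

Shift that last print right by four spaces and it silently becomes part of the loop again – no compiler error, just different behavior.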
What concerns me most are some comments in the HARRY file, and some that others have made looking into the code, about how it is built in places to silently return from error conditions. A routine can apparently be called, fail a bounds check, and return having done nothing at all, while the calling program is unaware. In other words, the subroutine returns the same thing on failure as on success, and in some places makes no note of the failure.
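Here is a hypothetical sketch in Python of that failure mode (the real CRU code is Fortran, and this is my invention, not their routine):

    # Hypothetical sketch -- NOT the actual CRU code.
    # On a failed bounds check the function returns the grid unchanged,
    # so the caller cannot tell success from failure.
    def fill_cell(grid, row, col, value):
        if row >= len(grid) or col >= len(grid[0]):
            return grid              # silent failure: looks just like success
        grid[row][col] = value
        return grid

    grid = [[0.0] * 3 for _ in range(3)]
    grid = fill_cell(grid, 5, 1, 9.9)    # out of bounds: nothing happens
    # The caller carries on with an unmodified grid, never knowing the write failed.

A sane version would raise an exception, or at least return an error flag, so the caller is forced to notice.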
I can see evidence of this in the HARRY file, where the author is unsure whether the right thing was actually done and seems resigned to simply crossing his fingers and hoping it all comes out OK. He isn’t positive that the output is correct at all; in the end he simply wants the results to come out the same this time as they did last time, or he wants his output to correlate to a model. As long as it does, he isn’t so interested in whether it actually did what it was supposed to do.
He tries this and he tries that until he gets the output he wants. He doesn’t seem convinced that it is correct, but it passes whatever test he applies at the end, so that is “good enough”.
Going back to Warwick Hughes’ observation that a derived cell value can be larger than the values from which it was derived: it is sort of like having a room full of people who are between 66 and 72 inches tall, taking an “average” of them, and coming up with 80 inches! An average can never be larger than ANY of the numbers used to calculate it. That is why Hughes wanted the names of the stations in the surrounding cells and the code used to create the fill value.
As far as I know, he never got it.
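The arithmetic point is easy to verify – the mean of any set of numbers must fall between the smallest and largest of them. A quick Python check, using made-up heights that match the example above:

    # The mean is always bounded by the min and max of the inputs.
    heights = [66, 68, 70, 71, 72]           # everyone between 66 and 72 inches
    mean = sum(heights) / len(heights)
    assert min(heights) <= mean <= max(heights)
    print(mean)                               # 69.4 -- it can never be 80

So if the code produced a cell value outside the range of the surrounding stations, whatever it was doing, it wasn’t a simple average.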