Nov 24 2009

Where Is CRU’s Raw Temp Data?

Update: More on the crappy CRU code at Hot Air (maybe I won’t have to dig into it after all!) - end update.

One question still remains after all the email and document dumps – where is the raw climate data CRU used to create the global warming false alarm? We know that Phil Jones claims he ‘accidentally’ deleted it (but in emails planned to do so if he was forced to comply with FOI laws). And I find it interesting that no real data was contained in the dump – just reports, proposals, code and data products.

I ran across an interesting “readme” file in the documents. A “readme” file is usually written by the programmers and tells users what they need to know to run a program, including a list of known bugs. But in the case of CRU the “readme” files are sometimes detailed logs kept by the developers (the HARRY_READ_ME file provides a wonderful view inside the crappy code upon which the future of humanity apparently rests). One such file I discovered, under the original zip path, was /documents/EMULATE-corrections (which you can read here).

In this file we find the sources for the Mean Sea Level Pressure (MSLP) data, which I presume comes from weather stations that also collect temperature values. So the sources for the MSLP are probably the sources for much of the land data CRU had (or has).

22 Responses to “Where Is CRU’s Raw Temp Data?”

  1. Frogg1 on 24 Nov 2009 at 3:29 pm

    Excellent analyzing and reporting on this AJ! Thank you.

  2. AJStrata on 24 Nov 2009 at 3:47 pm

    Frogg1,

    Many thanks – more to come!

  3. ivehadit on 24 Nov 2009 at 4:04 pm

    Thank you for pounding this, AJ!

    On another note, I found it interesting that in Alabama, I could only find ONE story on Sarah Palin’s visit to a Birmingham mall yesterday at al.com. And get this: NO NUMBERS were mentioned. Just a sentence that the mall was packed and they were chanting her name. On Sunday, however, they wrote that 1800 people stood in line to get one of the 1000 wristbands for a guaranteed signature. 1800, the only number they mentioned. Had pictures from SUNDAY, NOT MONDAY! And NEITHER of these stories was on the front page of the internet version; I had to search to find the story! The comments on the paper’s site were foul.

  4. ivehadit on 24 Nov 2009 at 4:05 pm

    Also, did you see Robin of Berkeley’s excellent commentary at AmericanThinker.com on the Wilding of Sarah Palin? It’s a keeper.

  5. gary1son on 24 Nov 2009 at 7:54 pm

    Once again, we’re up against the Democrat voting media. CBS has that good report on their website, but will we see it on the evening news? Seems doubtful:

    http://tinyurl.com/ydqe5v2

    Fox has it, but we can’t follow Fox, the White House said so.

    But if we ask these guys for their data in congressional investigations, and they can’t produce it, that might just resonate and demand scrutiny.

  6. crosspatch on 24 Nov 2009 at 8:02 pm

    The obstruction that was going on was just insane. You can read what I think is the best timeline I have seen to date in this article and I know at least one side of the story is accurate because I was “watching” it unfold on the Internet on McIntyre’s blog as it happened.

    But that article linked above is the most complete timeline of an actual case of evasion of FOIA I have seen to date with information from both sides presented along with what was going on in the background now that we have the Jones emails.

    Just amazing. What is also amazing is the discovery of simple facts like grid cells that contain no recording stations being given an anomaly from surrounding regions and the grid cells get assigned a HIGHER anomaly than the surrounding cells from which the anomaly is derived. So imagine you have a cell with no data but the cells surrounding it are +1C over “average”. You don’t then give the cell with missing data a +2C anomaly! But this is what was found in the HADCRUT data.

    In order to find the reason why that happened, a researcher asked for the raw data and the program code that created that “fill” value so he could possibly discover why and discover if other such errors existed in the data output from CRU.

    He never got it.

    A great read.

  7. AJStrata on 24 Nov 2009 at 8:44 pm

    CP,

    Knowing you I suspect you have the files. Did you look at opnormals.f90? If so did you look at the subroutine “MergeTwo”. If you don’t have the file send me an email and I will send you a copy.

    If I read this subroutine right it ensures any New Station (C) required to fill holes always takes the greater of the two estimates from any nearby real stations (Station A and Station B)

  8. crosspatch on 24 Nov 2009 at 9:49 pm

    Yes, I have the files. I have not looked at the opnormals.f90 file but will have a look.

    My programming background is more in C and Python with a few other more obscure languages tossed in but I never picked up fortran (never had to) so it might take me a bit to get through it.

  9. crosspatch on 24 Nov 2009 at 9:51 pm

    But in any case, Jones told Hughes that it should never happen that a “fill” value should be greater than any of the surrounding cell values. It can be the same as one of them (the greater of two as you say above) but it can not be greater than any of them.
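The point about fill values can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the anomaly numbers are made up, and none of it is taken from opnormals.f90 or any other CRU file. It contrasts filling an empty cell with the mean of its neighbors against the "greater of two" reading, and adds a check that a derived value stays within the range of the values it came from (the lower bound is an extra assumption; the comment above only mentions the upper one).

```python
from statistics import mean

def fill_by_mean(neighbors):
    """Fill an empty cell with the plain average of its neighbors' anomalies."""
    return mean(neighbors)

def fill_by_max(neighbors):
    """Fill an empty cell with the largest neighboring anomaly
    (the 'greater of two estimates' reading, as a hypothetical)."""
    return max(neighbors)

def fill_is_plausible(fill, neighbors):
    """A value derived from the neighbors should not fall outside their range."""
    return min(neighbors) <= fill <= max(neighbors)

# Made-up anomalies in degrees C for the surrounding cells, not CRU data.
neighbors = [0.8, 1.0, 1.2]

mean_fill = fill_by_mean(neighbors)   # stays inside the neighbors' range
max_fill = fill_by_max(neighbors)     # sits at the top of the range
suspicious_fill = 2.0                 # larger than every neighbor
```

Both fill rules pass `fill_is_plausible`, although always taking the maximum would bias every filled cell toward the warm end of its neighbors. A value like `suspicious_fill` fails the check, which is the kind of result that would call for a look at the underlying station data and code.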

  10. Whomever on 24 Nov 2009 at 11:06 pm

    Thanks to ivehadit for directing us to American Thinker to read Robin of Berkeley http://www.americanthinker.com/2009/11/the_wilding_of_sarah_palin.html It’s a great blog post, and be sure to click on “Comments” at the bottom and read the also great comments. Thanks, ivehadit.

  11. kathie on 24 Nov 2009 at 11:36 pm

    AJ this is an interesting read.

    Iowahawk Geographic: The Secret Life of Climate Researchers

  12. crosspatch on 25 Nov 2009 at 12:03 am

    That code gives me a headache. I guess there is a reason I don’t write in Fortran.

    But that score comparison is interesting though I don’t understand what it is doing well enough to figure out if I should care. I would have to go all the way back to the start of the program to make sure I understand what all the arrays are, etc. in order to understand what it is comparing and multiplying to come up with the score to decide what it is actually writing out.

  13. kathie on 25 Nov 2009 at 12:06 am

    Robin of the Americanthinker was terrific.

  14. AJStrata on 25 Nov 2009 at 1:17 am

    CP,

    Uncle! I will do the analysis (sort of boring, not all that complicated).

  15. ivehadit on 25 Nov 2009 at 1:25 am

    You’re welcome, Whatever.

  16. ivehadit on 25 Nov 2009 at 1:27 am

    Sorry, Whomever!

  17. crosspatch on 25 Nov 2009 at 1:32 am

    No, it isn’t all that complicated, I just need to spend time on it and I don’t have it. Grandma’s in town for the holidays, kids out of school starting tomorrow … I won’t have any time until next week when things return to “normal” around here.

    Happy Thanksgiving to you and yours.

  18. Fai Mao on 25 Nov 2009 at 2:34 am

    I am not a computer programmer. So excuse my ignorance if this is a dumb question.

    Why did they use a program based in Fortran and not C++? Isn’t that a better language for this type of project?

    While not a computer programmer, I have had some experience entering data into a statistical program. I am not an expert by any means but am at least a competent layman and have a fair understanding of basic statistical method.

    This data is garbage. A 15 year-old could do better than this.

    If I’d turned this in as a project in grad school I’d have probably been thrown out of the class.

    These people need to not only be removed from their jobs they need to be laughed at. Nothing is more humiliating to a professor, especially a leftist professor from the UK than to be laughed at.

  19. AJStrata on 25 Nov 2009 at 8:17 am

    Fai,

    The answer is the PhDs. They use a tool called IDL to process and graph data (it is so antiquated). There are few older scientists who use modern programming languages, and that code looks like some nightmare out of the 1980s.

  20. AJStrata on 25 Nov 2009 at 8:28 am

    CP, Enjoy your Thanksgiving!

  21. Rick C on 25 Nov 2009 at 10:00 am

    To add to AJ’s reply to Fai, there are multiple references to f77. That is the FORTRAN 1977 standard. It was designed to standardize FORTRAN between hardware vendors in the days of the 6 major mainframe vendors.

    Most likely if you go to any heavy compute shop, like oil reservoir modeling, the bulk of the code will still be FORTRAN 77. No one wants to rewrite that code in ‘C’ or ‘C++’.

    Next we can talk about why Cobol and PL/I.

    Rick

  22. crosspatch on 25 Nov 2009 at 1:50 pm

    The files we were looking at were Fortran 90 but yeah, it is still used a lot in scientific work but is being replaced, slowly, by R.

    I don’t do much programming these days as my job is more that of an architectural role. I am more of a “big picture” guy in designing large TCP/IP networks that move terabits of traffic each day between major internet operations. When in the past I needed to move large amounts of data around and manipulate it, I used Python. Lower level work like building network diagnostic tools was done in C.

    It would be interesting to translate these programs line by line into something like Python or R because the syntax of those languages makes “weird” things easier to see. Python reads almost like English. You can actually code something and come back to it 5 years later and understand what you did, unlike Fortran or Perl or even C. It is almost self-documenting. You have to TRY to make it hard to read.

    But having a language where whitespace is significant trips some people up :) Put an extra space in front of a line and it becomes part of a completely different loop as python actually uses the level of indentation to determine where something belongs. C and Fortran use indentation only as a style enhancement to make the code easier to read, the indentation isn’t significant to the compiler/interpreter. It is to python.
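A small sketch makes the indentation point concrete. The two functions below are hypothetical and differ only in how far one line is indented. One caveat to the description above: a single stray space usually makes Python raise an IndentationError, so it is moving a line by a whole indentation level that silently changes which block it belongs to.

```python
def total_inside_loop(values):
    """The accumulation line is indented under the for loop."""
    total = 0
    for v in values:
        doubled = v * 2
        total += doubled      # runs once per item
    return total

def total_outside_loop(values):
    """Same code, but the accumulation line is dedented one level."""
    total = 0
    for v in values:
        doubled = v * 2
    total += doubled          # runs once, after the loop has finished
    return total

inside = total_inside_loop([1, 2, 3])    # 2 + 4 + 6
outside = total_outside_loop([1, 2, 3])  # only the last doubled value
```

In C or Fortran the same visual change would leave the program's behavior untouched, because the block is delimited by braces or by explicit end statements, not by the layout.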

    What concerns me most are some comments in the HARRY file, and some that others have made looking into the code, about how it is built in places so that it silently returns from some error conditions. A routine can apparently be called that fails a bounds check and returns having done nothing at all, and the calling program is unaware. In other words, the subroutine returns the same thing on failure or success and in some places makes no note of the failure.
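The failure mode described here can be sketched in Python. This is an illustration only, with hypothetical function names, and it is not a translation of any routine in the CRU files. The first function returns quietly when its bounds check fails, so the caller cannot tell failure from success; the second reports the problem.

```python
def store_silent(grid, index, value):
    """Anti-pattern: a failed bounds check returns without telling the caller."""
    if index < 0 or index >= len(grid):
        return                # failure looks exactly like success
    grid[index] = value

def store_checked(grid, index, value):
    """Same job, but an out-of-range index is reported to the caller."""
    if not 0 <= index < len(grid):
        raise IndexError(f"index {index} outside grid of length {len(grid)}")
    grid[index] = value

grid = [0.0, 0.0, 0.0]
store_silent(grid, 5, 1.5)    # nothing is written and nothing is reported
```

After the silent call the grid is unchanged and the program carries on as if the value had been stored. Calling `store_checked(grid, 5, 1.5)` would stop with an IndexError, which at least leaves a trace of where things went wrong.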

    I can see evidence of this in the HARRY file where the author is unsure if the right thing was even actually done and he seems resigned to simply crossing his fingers and hoping it all comes out ok. He isn’t positive that the output is correct at all and in the end he simply wants the results to come out the same this time as they did last time, or he wants his output to correlate to a model. As long as it does, he isn’t so interested in whether it actually did what it was supposed to do.

    He tries this and he tries that until he gets the output he wants. He doesn’t seem convinced that it is correct but it passes whatever test he applies to it at the end so that is “good enough”.

    Going back to Warwick Hughes’ notice that a derived cell value can be larger than those from which it was derived, it is sort of like having a room full of people who are between 66 and 72 inches tall, taking an “average” of them, and coming up with 80 inches! It can’t happen that an average is larger than ANY of the numbers used to calculate the average. So that is why Hughes wanted the names of the stations in the surrounding cells and the code used to create the fill value.

    As far as I know, he never got it.
