31 August 2010

Google Charts is (are?) cool stuff

I am just finishing up a summer project, ahead of my departure from the University of Arizona at the end of August, in which I've been writing a lot more JavaScript and developing ways to use XML data (exported from MS Excel) through several methods, including Google Charts and Google Maps. I look forward to a continued collaboration with my project partners at the University of Arizona even after I get re-settled in the U.S. Midwest. More on that move later, maybe some pictures like the one I posted from my move to Arizona a couple years ago...

Anyway, the dataset that I've been able to work with this summer is truly extraordinary. Last summer (2009) I gave a brief presentation at the annual meeting of the Arizona Hydrological Society with a few preliminary examples of visualizations from work done over several years by Prof. Katie Hirschboeck, a professor in the UA Laboratory for Tree-Ring Research and affiliated with the UA Climate Assessment for the Southwest (CLIMAS) project. What she has found, as we wrote then in our conference abstract, is that...
"...extreme flooding episodes in Arizona, such as the widespread winter events of 1993, may seem sporadic but have been shown to evolve in recurrent, favorable climatic and meteorological regimes that can develop in specific reinforcing patterns and may vary in response to climate change. Managers in the water community are challenged to assess the roles of extreme events and climatic variability in emergency and long-range planning for impact mitigation and citizen safety in floodplains, riparian areas, and river–reservoir systems. A flood hydroclimatology database being developed at the University of Arizona, in collaboration with the Climate Assessment for the Southwest (CLIMAS), examines watershed-specific event conditions across Arizona, classifies the meteorological causes of floods in the stream gauge records, surveys supporting synoptic patterns, and includes paleoflood information where available. The flood-linked synoptic patterns of the database may be used for comparison with current observations and forecasts and ultimately to assess the impacts of climate variability on individual watersheds."
Katie is a very precise and exacting writer, and I usually start verbose and get more concise as I revise, so overall we're both rather harsh in editing. I think it was our eighth draft, between us, before we finally agreed enough to submit it for the conference...

So what we did then, for the conference, was a mock-up of information for a single USGS gauge location in Arizona, the station on the Santa Cruz River in Tucson (09482500), using graphics from Katie's previously published sources. This was basically a proof-of-concept for us, a chance for me to show Katie and anyone else who was interested what this massively-interlinked dataset and information gold-mine could look like when it was put into the spatial context that Google Maps provides. I had already included several point- and polygon-based datasets in the AHIS map interface, most exported from numerous GIS datasets and combined with other sources of information. In this case, we worked up a map-based pop-up window for the USGS gauge station that was so detailed that we had multiple tabs of information, ranging from a link to the real-time gauge data to the historical analysis of flood events, both over the gauge period of record and the specific 1993 event, and a tab full of references. Picture your normal tabbed browser, and in one tab you are looking at a Google Map of selected stream gauge stations in Arizona, and you click on a gauge marker and up comes a window inside the map that is yet another tabbed window for browsing the available information just for that one single location... cool, huh? And this was before Google acknowledged that one could even create multiple-tab information boxes attached to a map-based marker...

Anyway, after the conference our task became an effort to backtrack to the original dataset that Katie had compiled and re-build the graphics digitally, especially the charts of flood peaks over time, and make them interactive for use by various stakeholder groups in decision-making. By the end of the summer project, we've made it just shy of full interactivity, but here's one example of the results from Google Charts:

Click on the chart image for a larger version. This chart is a provisional research result, not suitable for publication or inclusion elsewhere, and should not be used for diagnostic or other purposes. It simply looks cool, and I made it using Katie's data. The explanation to appear in the figure caption might go something like this:
"Time series of peaks-above-base for USGS stream gauge 09482500 on the Santa Cruz River at Tucson, Arizona, for the available period of record (1915 - 2004). Bars represent the peak flow for the water year indicated and are colored by the type of weather event diagnosed as the proximate cause of the flood peak. Circles represent additional peaks-above-base that did not qualify as annual peak flows, and are also colored accordingly except where these would be the same color as the annual peak bar for that year, in which cases the circles are instead colored white for contrast."
From the USGS directly, one would be able to obtain simply the peak flow values and the date of occurrence, or for some gauges a complete record of daily average flow values. However, budget cuts at the USGS have been taking their toll on the National Streamflow Information Program (NSIP) and, in addition to many stream gauges being decommissioned in the past few years, data archives are being culled methodically in order to reduce the amount of information that needs to be digitized for web availability. Somewhere, likely in several places around the country, are the paper archives that hold the complete records. In no case, as far as I know, are the gauge records tied directly to the meteorological records that have data on the storm events that led to these flood events. This is very likely a pioneering research avenue with great potential; we need to learn from the past in order to prepare for the future, and floods remain one of those ubiquitous natural hazards that we seem to forget when the rivers are not running high.

One of the interesting things I ran into while building this particular chart was that Google Charts takes two different methods of data submission. At first, when it was still part of Google Labs, the Charts API took data by the HTTP GET method, which is limited in the length of the URL to about 2K characters. A more basic version of the chart shown above, with only the annual peaks included, just barely fits into this limit. Along the way to developing the Charts API for graduation from Google Labs, the developers received much feedback on how the GET method limited users' productivity (and creativity), so they developed a far more flexible HTTP POST method that is virtually unlimited in data submission capacity. So along the way to building this particular chart, I had to transition our method to POST as well, but it has been well worth the extra code development. The chart shown above, complete with data on the "peaks-above-base" in the annual series, runs about 6K characters in the POST submission before URL encoding (a multiplication factor of ~2-3 in character-data volume transferred).
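For anyone curious what that GET-versus-POST decision looks like in practice, here is a minimal sketch in Python (not the JavaScript we actually used). It only builds the request and never sends it. The endpoint URL, the 2048-character cutoff, the function name `choose_method`, and the sample parameter values are all assumptions for illustration; our real chart parameters are not shown here.

```python
from urllib.parse import urlencode

# Assumed endpoint and cutoff, for illustration only; "about 2K characters"
# is the approximate GET limit described above.
CHART_ENDPOINT = "https://chart.googleapis.com/chart"
GET_URL_LIMIT = 2048


def choose_method(params, base_url=CHART_ENDPOINT, limit=GET_URL_LIMIT):
    """Return (method, url, body) for a chart request.

    If the URL-encoded parameters fit within the limit, they go in the
    query string (GET). Otherwise they go in the request body (POST).
    """
    query = urlencode(params)  # URL encoding inflates the raw data
    full_url = base_url + "?" + query
    if len(full_url) <= limit:
        return "GET", full_url, None
    return "POST", base_url, query.encode("ascii")


# Hypothetical data: a short series fits in a GET URL, a long one does not.
short_series = {"cht": "bvs", "chs": "600x300",
                "chd": "t:" + ",".join(["10"] * 20)}
long_series = {"cht": "bvs", "chs": "600x300",
               "chd": "t:" + ",".join(["12345"] * 1000)}

print(choose_method(short_series)[0])
print(choose_method(long_series)[0])
```

The point of measuring the encoded URL rather than the raw data is the inflation mentioned above: commas and colons each become three characters once encoded, so a dataset that looks small enough can still overflow the GET limit.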

So we've also developed options for the chart-making tool to deliver monochrome plots, for easier use in print publications some day, and I'm currently (yes, right this minute) working on making the returned images interactive: Google Charts provides for an alternative return data stream in JSON format, which is then used to make an object-map out of the image so that "hotspots" will show pop-up information when you mouse over them and open a new tab/window with even more detailed information when you click. I have only two suggestions to the Google Charts API development team: (1) although colors and vertical stripes are available for the value bars, patterns and diagonal stripes (as in MS Excel) would be nice to have available too, and (2) it would be great if the plot image could be returned as an object for web page placement using HTML-based standards according to the developer's preference, and not as a pure image that either wipes out the page that sent the POST information (a form with resulting data that the user might still want, in this case) or is directed to a new tab entirely. At this point, it's not possible to generate a one-click interactive chart, but the pieces are there and it will eventually be capable of that (with a lot more programming work, of course).
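To give a rough idea of the image-map step, here is a small Python sketch that turns a JSON description of chart shapes into an HTML `<map>` element with one clickable `<area>` per hotspot. The JSON layout (a `chartshape` list whose entries have `name`, `type`, and `coords`), the shape names, and the link targets are assumptions made for this example and should be checked against what the API really returns; this is not our production code.

```python
import json
from html import escape


def shapes_to_image_map(json_text, map_name, hrefs):
    """Build an HTML image map from an assumed JSON shape list.

    hrefs maps a shape name to the page to open on click. Shapes without
    an entry (axis labels, for example) are skipped.
    """
    shapes = json.loads(json_text).get("chartshape", [])
    lines = ['<map name="%s">' % escape(map_name, quote=True)]
    for shape in shapes:
        name = shape["name"]
        if name not in hrefs:
            continue
        coords = ",".join(str(int(c)) for c in shape["coords"])
        lines.append(
            '  <area shape="%s" coords="%s" href="%s" title="%s">' % (
                escape(shape["type"].lower(), quote=True),  # rect, circle, poly
                coords,
                escape(hrefs[name], quote=True),
                escape(name, quote=True),  # title gives the mouse-over text
            )
        )
    lines.append("</map>")
    return "\n".join(lines)


# Hypothetical response with one bar and one axis label.
sample = json.dumps({"chartshape": [
    {"name": "bar0_0", "type": "RECT", "coords": [10, 20, 30, 40]},
    {"name": "axis0_0", "type": "RECT", "coords": [0, 0, 5, 5]},
]})
print(shapes_to_image_map(sample, "peaks", {"bar0_0": "detail.html?wy=1993"}))
```

The chart image would then reference the map through its `usemap` attribute (`usemap="#peaks"` for the sample above), with the `title` text serving as a simple mouse-over pop-up and the `href` opening the detail page.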

In addition, it will be connected to a Google Map interface with many more analyzed gauge locations in Arizona, for access to the full scope of Katie's dataset in space and time, as well as several other ancillary information sources that we will carry over from the AHIS project. If you're interested in how we're doing with this, keep checking for these tools in the new CLIMAS website hosted at the University of Arizona, and as always you're welcome to post a comment or contact me directly. The query and chart tool will become public with a bit more work, on which I'll be consulting long-distance after my move.