mget-help - [mget-help] RE: understanding attributes of objects

Subject: Marine Geospatial Ecology Tools (MGET) help


From: "Jason Roberts" <>
To: "'Marie Roch'" <>
Cc: <>
Subject: [mget-help] RE: understanding attributes of objects
Date: Thu, 1 Dec 2011 22:10:47 -0500

Hi Marie,

 

Nice to hear from you again...

 

The object model MGET uses here—the classes in GeoEco.Datasets and those in GeoEco.DataProducts that derive from them—is still under development and not documented yet. It’s fairly stable but at this point, we are not completely ready for folks to start building on it. But thanks for giving it a try. I’ll endeavor to answer your questions as best I can and hopefully you can make some progress.

 

1.    The MODISL3SSTTimeSeries class derives from Grid, in GeoEco.Datasets. The '4km' resolution provided to the constructor is actually NASA’s informal name for the MODIS product. This name is kept as a QueryableAttribute of the MODISL3SSTTimeSeries. I’ll skip discussion of what a QueryableAttribute is for now, but will say that you can get that name back by doing this:

 

>>> from GeoEco.DataProducts.NASA.PODAAC import MODISL3SSTTimeSeries

>>> grid = MODISL3SSTTimeSeries('Aqua', 'Monthly', '4km', 'sst')

>>> grid.GetQueryableAttributeValue('SpatialResolution')

u'4km'

 

You can enumerate all of the QueryableAttributes of that instance and their values like this:

 

>>> for qa in grid.GetAllQueryableAttributes():

...   print qa.Name, grid.GetQueryableAttributeValue(qa.Name)

...

VariableType Grid

Satellite aqua

SatelliteCode A

TemporalResolution monthly

TemporalResolutionCode MO

SpatialResolution 4km

SpatialResolutionCode 4

Algorithm sst

Wavelength 11um

GeophysicalParameter sst

EndDate None

VariableName l3m_data

 

QueryableAttributes are not common across all Grids. They are metadata specific to the particular product in question.

 

To get the actual resolution, you can access some properties that are common across all grids:

 

>>> grid.Dimensions

'tyx'

>>> grid.CoordIncrements

(1, 0.041666666666666664, 0.041666666666666664)

>>> grid.TIncrementUnit

'month'

>>> grid.GetSpatialReference('proj4')

u'+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs '

 

grid.Dimensions tells you the dimensions of the grid (time, y, and x) and their order. grid.CoordIncrements tells you the resolution (the increment) of each dimension; an increment will be None if the grid has a non-constant increment.
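
To relate the 0.0416... degree increment to the informal '4km' name, here is a rough sketch that converts a degree increment to an approximate ground distance. It assumes a spherical Earth with a mean radius of 6371 km; the function name degrees_to_km is made up for this illustration and is not part of MGET.

```python
import math

# Mean Earth radius in kilometers (a common spherical approximation).
EARTH_RADIUS_KM = 6371.0

def degrees_to_km(increment_deg, latitude_deg=0.0, along="y"):
    """Approximate ground length of one grid cell edge on a spherical Earth.

    along="y" gives the north-south length, which does not depend on latitude.
    along="x" gives the east-west length, which shrinks with cos(latitude).
    """
    km_per_degree = 2.0 * math.pi * EARTH_RADIUS_KM / 360.0
    length = increment_deg * km_per_degree
    if along == "x":
        length *= math.cos(math.radians(latitude_deg))
    return length

# The y and x increment reported by grid.CoordIncrements (1/24 of a degree).
increment = 0.041666666666666664

cell_ns_km = degrees_to_km(increment)                    # roughly 4.6 km
cell_ew_km_at_60 = degrees_to_km(increment, 60.0, "x")   # about half of that
```

So '4km' is a nominal label: the cells are 1/24 degree on a side, which is about 4.6 km north-south and narrower east-west as you move away from the equator.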

 

2.    Here are some examples of getting coordinate values:

 

>>> grid.MinCoords['x', 0:5]

array([-180.        , -179.95833333, -179.91666667, -179.875     ,

       -179.83333333])

>>> grid.CenterCoords['x', 0:5]

array([-179.97916667, -179.9375    , -179.89583333, -179.85416667,

       -179.8125    ])

>>> grid.MaxCoords['x', 0:5]

array([-179.95833333, -179.91666667, -179.875     , -179.83333333,

       -179.79166667])

>>> grid.MinCoords['y', 0:5]

array([-90.        , -89.95833333, -89.91666667, -89.875     , -89.83333333])

>>> grid.MinCoords['t', 0:5]

array([2002-07-01 00:00:00, 2002-08-01 00:00:00, 2002-09-01 00:00:00,

       2002-10-01 00:00:00, 2002-11-01 00:00:00], dtype=object)
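
For a dimension with a constant increment, the mapping between a coordinate and its cell index is simple arithmetic. The sketch below is plain Python, not MGET code: the helper names are hypothetical, and the cell count of 8640 is an assumption derived from 360 degrees divided by the 1/24 degree increment.

```python
import math

def coord_to_index(coord, first_min_coord, increment, cell_count):
    """Return the index of the cell whose [min, max) interval contains coord."""
    index = int(math.floor((coord - first_min_coord) / increment))
    if index < 0 or index >= cell_count:
        raise ValueError("coordinate %r lies outside the grid" % (coord,))
    return index

def index_to_center(index, first_min_coord, increment):
    """Return the center coordinate of the cell at the given index."""
    return first_min_coord + (index + 0.5) * increment

X_FIRST_MIN = -180.0        # grid.MinCoords['x', 0] in the session above
X_INCREMENT = 1.0 / 24.0    # the x increment from grid.CoordIncrements
X_CELLS = 8640              # assumed: 360 degrees / (1/24 degree)

# Longitude -179.9 falls in the third cell (index 2), whose center matches
# the third value printed by grid.CenterCoords['x', 0:5].
i = coord_to_index(-179.9, X_FIRST_MIN, X_INCREMENT, X_CELLS)
center = index_to_center(i, X_FIRST_MIN, X_INCREMENT)
```

For a dimension whose increment is None, this arithmetic does not apply and you would have to search the coordinate arrays instead.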

 

3.    MGET does not currently have a kriging function. MGET does have a linear interpolation function that will allow you to easily interpolate a value from a Grid at an arbitrary point in space and time. Nearest neighbor is also supported. If you need kriging, the ArcGIS Spatial Analyst does have a tool for doing that. That tool produces a gridded raster (via kriging) from a set of arbitrarily-located points.
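
To illustrate what linear interpolation from a grid means in two dimensions, here is a textbook bilinear interpolation in plain Python. This is only a sketch of the general technique, not MGET's implementation: the function name and arguments are hypothetical, and it ignores the time dimension and no-data cells.

```python
import math

def bilinear_interpolate(values, x0, y0, dx, dy, x, y):
    """Interpolate a value at (x, y) from a grid of cell-center values.

    values[row][col] is the value at the cell center
    (x0 + col * dx, y0 + row * dy). The grid must be at least 2 x 2.
    """
    n_rows = len(values)
    n_cols = len(values[0])
    # Fractional position of the point, measured in cells from the first center.
    fx = (x - x0) / dx
    fy = (y - y0) / dy
    if not (0 <= fx <= n_cols - 1 and 0 <= fy <= n_rows - 1):
        raise ValueError("point lies outside the cell centers")
    # Lower-left cell of the 2 x 2 neighborhood, clamped at the far edges.
    col = min(int(math.floor(fx)), n_cols - 2)
    row = min(int(math.floor(fy)), n_rows - 2)
    wx = fx - col
    wy = fy - row
    # Interpolate along x on both rows, then along y between the rows.
    lower = (1 - wx) * values[row][col] + wx * values[row][col + 1]
    upper = (1 - wx) * values[row + 1][col] + wx * values[row + 1][col + 1]
    return (1 - wy) * lower + wy * upper

# Four made-up cell-center values; the point halfway between all four
# centers gets their average.
sst = [[10.0, 20.0],
       [30.0, 40.0]]
midpoint = bilinear_interpolate(sst, 0.0, 0.0, 1.0, 1.0, 0.5, 0.5)  # 25.0
```

Nearest neighbor would instead return the value of whichever of the four centers is closest to the point.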

 

4.    THREDDS is a great protocol for remotely querying datasets. Unfortunately THREDDS does not specify a metadata dictionary or schema that data providers must adhere to. So there is no way to write a piece of code that can determine, for example, the coordinate system of all gridded datasets available through THREDDS. Some providers describe the coordinate system (or whatever you’re interested in) one way, and others describe it a different way.

 

In the case of MODIS, they used the HDF-EOS standard. This is one of those standards that, unfortunately, does not document the attribute values in the HDF file. Instead they provide a C API that you’re supposed to use to interrogate the file. This is unfortunate because it is incompatible with alternative metadata-retrieval methods such as THREDDS. So, while THREDDS can expose the attributes of the HDF file, those attributes are not actually documented.

 

For this and for performance reasons, MGET often takes the strategy of hard-coding metadata that is unlikely to change, and querying via THREDDS (when possible) for metadata that is likely to change (in our opinion). In the case of MODIS, nearly all of the metadata you saw there has not changed in years, so the hard coding strategy has proven safe so far with it.

 

5.    There are several kinds of caching used throughout the Dataset and DatasetCollection classes. In general, a call to the QueryDatasets method of a DatasetCollection operates against the original store, not a cache, if the cacheDirectory has not been specified. Certain DatasetCollections support caching to speed up repeated querying. For example, the THREDDSCatalog (in GeoEco.Datasets.OPeNDAP) always caches THREDDS catalog nodes in memory (for the lifetime of the THREDDSCatalog instance) and optionally caches them on disk. The on-disk caching is particularly useful for catalogs that take a long time to download, such as the NOAA NODC 4km AVHRR Pathfinder daily SST catalogs, which are split out by year and have 5840 files per year. To enable the on-disk caching of the catalog, the cacheDirectory parameter must be supplied and the cacheCatalogsOnDisk parameter must be True.

 

If cacheDirectory is supplied but cacheCatalogsOnDisk is False, the catalogs will not be cached but the eventual OPeNDAPGrid objects representing actual data will cache the data there when it is accessed.
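
The combinations of these two parameters can be summarized as a small truth table. The function below is a hypothetical illustration written for this message, not an MGET API. It also assumes that data is cached on disk whenever a cacheDirectory is given, including when cacheCatalogsOnDisk is True, which is not spelled out above.

```python
def describe_caching(cache_directory=None, cache_catalogs_on_disk=False):
    """Summarize what a THREDDSCatalog-style collection would cache.

    Hypothetical helper: it only encodes the rules described in the text.
    """
    has_directory = cache_directory is not None
    return {
        # Catalog nodes are always cached in memory for the instance lifetime.
        "catalogs_in_memory": True,
        # Catalogs reach the disk only when both parameters are supplied.
        "catalogs_on_disk": has_directory and cache_catalogs_on_disk,
        # Assumption: data is cached on disk whenever a directory is given.
        "data_on_disk": has_directory,
    }

no_cache = describe_caching()
data_only = describe_caching("/tmp/mget_cache", cache_catalogs_on_disk=False)
everything = describe_caching("/tmp/mget_cache", cache_catalogs_on_disk=True)
```

Here "/tmp/mget_cache" is just a placeholder path; the function never touches the file system.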

 

MODISL3SSTTimeSeries is a bit complicated. It is a Grid with three dimensions. But the original MODIS data is not in three dimensions. Internally, MODISL3SSTTimeSeries uses something called a TimeSeriesGridStack to “stack” 2D grids returned by a THREDDSCatalog. It turns out that NASA’s server is very fast, so when MODISL3SSTTimeSeries instantiates the THREDDSCatalog, it leaves cacheCatalogsOnDisk set to False. So with MODISL3SSTTimeSeries, you do not have a choice of asking it to cache the catalogs; it just won’t. But it will cache the data if you specify a cacheDirectory.

 

Caches on disk are never purged by MGET. You must purge them manually. This is generally fine for datasets that are environmental time series that only get “appended to”. But occasionally satellite data are reprocessed. In that case, the entire cache should be invalidated and all the data re-downloaded. In general, the user is responsible for knowing about that event and blowing the cache directory away. (It is an infrequent event.)
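
Blowing the cache directory away can be done with the Python standard library. This is a minimal sketch, assuming the directory holds nothing except cached files; purge_cache is a hypothetical helper, not an MGET function, and the demonstration runs against a throwaway temporary directory rather than a real cache.

```python
import os
import shutil
import tempfile

def purge_cache(cache_directory):
    """Delete everything under cache_directory and recreate it empty."""
    if os.path.isdir(cache_directory):
        shutil.rmtree(cache_directory)
    os.makedirs(cache_directory)

# Demonstration: create a temporary directory with one stale file, then purge.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "stale.dat"), "w") as f:
    f.write("old data")

purge_cache(demo_dir)
remaining = os.listdir(demo_dir)  # the directory exists again but is empty
```

After a purge, the next access simply downloads the data again and repopulates the directory.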

 

MGET does not use any kind of serialization, pickling, etc. to save actual class instances to disk and then rehydrate them later.

 

Some portions of MGET are thread-safe and some portions are not. It is, in my experience, rare for Python programmers in the GIS community to write multithreaded code. So while I have lots of experience writing thread-safe code, and some of MGET is itself multithreaded internally (e.g. the CPU-intensive Cayula-Cornillon front detection algorithm can utilize multiple threads, if requested), I would say the blanket rule for MGET is that it is not thread safe, but that this works fine for 99% of our user community. Much of the raster processing depends on the widely-used Geospatial Data Abstraction Library (GDAL), which is itself not really thread safe, so it would be difficult to make MGET thread-safe with that dependency in place.

 

Hope that helps,

 

Jason

 

From: Marie Roch [mailto:]
Sent: Thursday, December 01, 2011 8:49 PM
To:
Subject: understanding attributes of objects

 

Dear Jason,

A little while back you sent me a demo for querying information directly from Python using a SST example from the MODIS time series.  I had some time to dig a little deeper and have a couple of questions about the GeoEco data model.

You set up something along the following lines:

from GeoEco.DataProducts.NASA.PODAAC import MODISL3SSTTimeSeries
grid = MODISL3SSTTimeSeries('Aqua', 'Monthly', '4km', 'sst')

and then mapped the time-space coordinates to indices and queried.  My questions are as follows:

  1. If my query spans a region, how do I determine the resolution at which it is gridded?  Clearly one knows this from the constructor parameters, but I would like to provide defaults for callers to simplify queries.  I see that there is a set of queryable attributes, and I assume that TemporalResolution or SpatialResolution should somehow be queried.  I can retrieve the attribute, but when I look at the methods and dictionary associated with the QueryableAttribute class I do not see anything that would let me know that my scale is 4 km or that measurements are taken daily.
  2. Once I query it with grid.Data(...), how do I pull the values associated with the data (e.g. long/lat of each point for which data was actually returned)?   Is there a simple way to pull the index labels associated with the query?
  3. Is kriging supported?
  4. Is there a way to determine what the possible parameters can be programmatically?  I see in the PODAAC.MODISL3SSTTimeSeries class that these appear to be hardcoded, but they should be derivable from the Thredds catalog.  In principle, I would think that a classmethod could be written to return valid values.  Has this been done?
  5. How is the data cache associated with the DataSetCollection used?  Is it set explicitly?  How/when is the cache purged?  It seems like this is associated with instances rather than classes, so I'm assuming it would make sense to save instances when they are created (unless we simply set the cache directories for each instance to point to the same place).  Are datasets thread safe?

Thanks so much,
Marie

-- 
Marie Roch
Associate Professor of Computer Science
San Diego State University
 
Visiting Scholar
Scripps Institution of Oceanography
 
http://roch.sdsu.edu
SDSU - Tel:  619 594-5830  Fax:  619 594-6746
SIO  - Tel:  858 534-7280      