PyTables User's Guide: Hierarchical datasets in Python - Release 1.3.2
tables.NetCDF data is stored in an HDF5 file instead of a netCDF file.
Although each variable can have only one unlimited dimension in a tables.NetCDF file, it need not be the first as in a true NetCDF file. Complex data types F (Complex32) and D (Complex64) are supported in tables.NetCDF, but are not supported in netCDF (or Scientific.IO.NetCDF). Files with variables that have these datatypes, or an unlimited dimension other than the first, cannot be converted to netCDF using h5tonc.
Variables in a tables.NetCDF file are compressed on disk by default using HDF5 zlib compression with the shuffle filter. If the least_significant_digit keyword is used when a variable is created with the createVariable method, data will be truncated (quantized) before being written to the file. This can significantly improve compression. For example, if least_significant_digit=1, data will be quantized using numarray.around(scale*data)/scale, where scale = 2**bits, and bits is determined so that a precision of 0.1 is retained (in this case bits=4). From http://www.cdc.noaa.gov/cdc/conventions/cdc_netcdf_standard.shtml: "least_significant_digit -- power of ten of the smallest decimal place in unpacked data that is a reliable value." Automatic data compression is not available in netCDF version 3, and hence is not available in the Scientific.IO.NetCDF module.
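The quantization rule above can be reproduced with plain Python arithmetic. The helper below is an illustrative sketch, not the PyTables implementation: it derives bits from the requested decimal precision and applies the same around(scale*data)/scale rounding described in the text.

```python
import math

def quantize(value, least_significant_digit):
    """Quantize value so that a precision of 10**-least_significant_digit
    is retained, mirroring around(scale*data)/scale with scale = 2**bits.
    Illustrative sketch only, not the tables.NetCDF implementation."""
    # Smallest number of bits such that 2**bits >= 10**least_significant_digit
    bits = math.ceil(math.log2(10 ** least_significant_digit))
    scale = 2.0 ** bits
    return round(scale * value) / scale

# With least_significant_digit=1, bits=4 and scale=16, as in the text:
# 3.14159 is stored as 3.125, well within the requested 0.1 precision.
q = quantize(3.14159, 1)
```

Quantized values become runs of identical low-order bits, which is what lets the zlib/shuffle filters compress them so much better.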
In tables.NetCDF, data must be appended to a variable with an unlimited dimension using the append method of the netCDF variable object. In Scientific.IO.NetCDF, data can be added along an unlimited dimension by assigning it to a slice (there is no append method). The sync method of a tables.NetCDF NetCDFVariable object synchronizes the size of all variables with an unlimited dimension by filling in data using the default netCDF _FillValue. The sync method is automatically invoked when a NetCDFFile object is closed. In Scientific.IO.NetCDF, the sync() method flushes the data to disk.
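The padding behaviour of sync can be pictured with ordinary Python lists. The sketch below is purely illustrative and does not use PyTables: several "variables" sharing one unlimited dimension are brought up to the same length by filling with a stand-in fill value.

```python
# Conceptual sketch of what sync does (not tables.NetCDF code):
# variables sharing an unlimited dimension are padded to a common length.
FILL_VALUE = -9999.0  # stand-in for the default netCDF _FillValue

variables = {
    "temperature": [280.1, 281.3, 279.9],
    "pressure": [1012.5],  # shorter along the unlimited dimension
}

def sync(variables, fill_value):
    """Pad every variable to the length of the longest one."""
    longest = max(len(data) for data in variables.values())
    for data in variables.values():
        data.extend([fill_value] * (longest - len(data)))

sync(variables, FILL_VALUE)
```

After the call, every variable has the same extent along the unlimited dimension, with missing entries holding the fill value.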
The tables.NetCDF createVariable() method has three extra optional keyword arguments not found in the Scientific.IO.NetCDF interface: least_significant_digit (described above), expectedsize and filters. The expectedsize keyword applies only to variables with an unlimited dimension, and is an estimate of the number of entries that will be added along that dimension (default 1000). This estimate is used to optimize HDF5 file access and memory usage. The filters keyword is a PyTables Filters instance that describes how to store the data on disk. The default corresponds to complevel=6, complib='zlib', shuffle=1 and fletcher32=0.
tables.NetCDF data can be saved to a true netCDF file using the NetCDFFile class method h5tonc (if Scientific.IO.NetCDF is installed). The unlimited dimension must be the first (for all variables in the file) in order to use the h5tonc method. Data can also be imported from a true netCDF file and saved in an HDF5 tables.NetCDF file using the nctoh5 class method.
In tables.NetCDF, a list of the global netCDF attributes defined in the file can be obtained with the NetCDFFile ncattrs method. Similarly, netCDF variable attributes can be obtained with the NetCDFVariable ncattrs method. These methods are not available in the Scientific.IO.NetCDF API.
You should not define tables.NetCDF global or variable attributes that start with _NetCDF_. Those names are reserved for internal use.
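The reserved-prefix rule can be enforced with a simple check before setting an attribute. The helper below is hypothetical, not part of the tables.NetCDF API; it merely shows the constraint.

```python
RESERVED_PREFIX = "_NetCDF_"  # reserved for tables.NetCDF internal use

def check_attribute_name(name):
    """Reject attribute names that collide with the reserved prefix
    (hypothetical helper, not part of tables.NetCDF)."""
    if name.startswith(RESERVED_PREFIX):
        raise ValueError(f"attribute name {name!r} is reserved")
    return name

# Ordinary attribute names pass through unchanged.
ok = check_attribute_name("units")
```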
Output similar to 'ncdump -h' can be obtained by simply printing a tables.NetCDF NetCDFFile instance.