You will be the key that opens every lock,
you will be the light, the boundless light,
you will be the border where the dawn begins,
you will be wheat, an illuminated stair!

—M'aclame a tu. Lyrics: Vicent Andrés i Estellés. Music: Ovidi Montllor.
This chapter consists of a series of simple yet comprehensive tutorials that will enable you to understand PyTables' main features. If you would like more information about some particular instance variable, global function, or method, look at the doc strings or go to the library reference in chapter 4. If you are reading this in PDF or HTML formats, follow the corresponding hyperlink near each newly introduced entity.
Please note that throughout this document the terms column and field will be used interchangeably, as will the terms row and record.
In this section, we will see how to define our own records in Python and save collections of them (i.e. a table) into a file. Then we will select some of the data in the table using Python cuts and create numarray arrays to store this selection as separate objects in a tree.
In examples/tutorial1-1.py you will find the working version of all the code in this section. Nonetheless, this tutorial series has been written to allow you to reproduce it in a Python interactive console. I encourage you to do parallel testing and inspect the created objects (variables, docs, children objects, etc.) during the course of the tutorial!
Before starting you need to import the public objects in the tables package. You normally do that by executing:
>>> import tables
This is the recommended way to import tables if you don't want to pollute your namespace. However, PyTables has a very reduced set of first-level primitives, so you may consider using the alternative:
>>> from tables import *
which will export into your application's namespace the following functions: openFile(), copyFile(), isHDF5File(), isPyTablesFile() and whichLibVersion(). This is a rather reduced set of functions and, for convenience, we will use this technique to access them.
If you are going to work with numarray (or NumPy or Numeric) arrays (and normally, you will) you will also need to import functions from them. So most PyTables programs begin with:
>>> import tables    # but in this tutorial we use "from tables import *"
>>> import numarray  # or "import numpy" or "import Numeric"
Now, imagine that we have a particle detector and we want to create a table object in order to save data retrieved from it. You first need to define the table: the number of columns it has, what kind of object is contained in each column, and so on.
Our particle detector has a TDC (Time to Digital Converter) counter with a dynamic range of 8 bits and an ADC (Analog to Digital Converter) with a range of 16 bits. For these values, we will define 2 fields in our record object called TDCcount and ADCcount. We also want to save the grid position in which the particle has been detected, so we will add two new fields called grid_i and grid_j. Our instrumentation can also obtain the pressure and energy of the particle. The resolution of the pressure gauge allows us to use a single-precision float to store pressure readings, while the energy value will need a double-precision float. Finally, to track the particle we want to assign it a name to identify the kind of particle it is and a unique numeric identifier. So we will add two more fields: name will be a string of up to 16 characters, and idnumber will be an integer of 64 bits (to allow us to store records for extremely large numbers of particles).
Having determined our columns and their types, we can now declare a new Particle class that will contain all this information:
>>> class Particle(IsDescription):
...     name     = StringCol(16)  # 16-character String
...     idnumber = Int64Col()     # Signed 64-bit integer
...     ADCcount = UInt16Col()    # Unsigned short integer
...     TDCcount = UInt8Col()     # unsigned byte
...     grid_i   = Int32Col()     # integer
...     grid_j   = IntCol()       # integer (equivalent to Int32Col)
...     pressure = Float32Col()   # float (single-precision)
...     energy   = FloatCol()     # double (double-precision)
...
>>>
This definition class is self-explanatory. Basically, you declare a class variable for each field you need. As its value you assign an instance of the appropriate Col subclass, according to the kind of column defined (the data type, the length, the shape, etc). See the section 4.16.2 for a complete description of these subclasses. See also appendix A for a list of data types supported by the Col constructor.
From now on, we can use Particle instances as a descriptor for our detector data table. We will see later on how to pass this object to construct the table. But first, we must create a file where all the actual data pushed into our table will be saved.
Use the first-level openFile (see 4.1.2) function to create a PyTables file:
>>> h5file = openFile("tutorial1.h5", mode = "w", title = "Test file")
openFile (see 4.1.2) is one of the objects imported by the "from tables import *" statement. Here, we are saying that we want to create a new file called "tutorial1.h5" in the current working directory, in "w"rite mode and with a descriptive title string ("Test file"). This function attempts to open the file and, if successful, returns the File (see 4.2) object instance h5file. The root of the object tree is available in the instance's root attribute.
Now, to better organize our data, we will create a group called detector that branches from the root node. We will save our particle data table in this group.
>>> group = h5file.createGroup("/", 'detector', 'Detector information')
Here, we have taken the File instance h5file and invoked its createGroup method (see 4.2.2) to create a new group called detector branching from "/" (another way to refer to the h5file.root object we mentioned above). This will create a new Group (see 4.4) object instance that will be assigned to the variable group.
Let's now create a Table (see 4.6) object as a branch off the newly-created group. We do that by calling the createTable (see 4.2.2) method of the h5file object:
>>> table = h5file.createTable(group, 'readout', Particle, "Readout example")
We create the Table instance under group. We assign this table the node name "readout". The Particle class declared before is the description parameter (to define the columns of the table) and finally we set "Readout example" as the Table title. With all this information, a new Table instance is created and assigned to the variable table.
If you are curious about how the object tree looks right now, simply print the File instance variable h5file, and examine the output:
>>> print h5file
Filename: 'tutorial1.h5' Title: 'Test file' Last modif.: 'Sun Jul 27 14:00:13 2003'
/ (Group) 'Test file'
/detector (Group) 'Detector information'
/detector/readout (Table(0,)) 'Readout example'
As you can see, a dump of the object tree is displayed. It's easy to see the Group and Table objects we have just created. If you want more information, just type the variable containing the File instance:
>>> h5file
File(filename='tutorial1.h5', title='Test file', mode='w', trMap={}, rootUEP='/')
/ (Group) 'Test file'
/detector (Group) 'Detector information'
/detector/readout (Table(0,)) 'Readout example'
  description := {
    "ADCcount": Col('UInt16', shape=1, itemsize=2, dflt=0),
    "TDCcount": Col('UInt8', shape=1, itemsize=1, dflt=0),
    "energy": Col('Float64', shape=1, itemsize=8, dflt=0.0),
    "grid_i": Col('Int32', shape=1, itemsize=4, dflt=0),
    "grid_j": Col('Int32', shape=1, itemsize=4, dflt=0),
    "idnumber": Col('Int64', shape=1, itemsize=8, dflt=0),
    "name": Col('CharType', shape=1, itemsize=16, dflt=None),
    "pressure": Col('Float32', shape=1, itemsize=4, dflt=0.0) }
  byteorder := little
More detailed information is displayed about each object in the tree. Note how Particle, our table descriptor class, is printed as part of the readout table description information. In general, you can obtain much more information about the objects and their children by just printing them. That introspection capability is very useful, and I recommend that you use it extensively.
The time has come to fill this table with some values. First we will get a pointer to the Row (see 4.6.4) instance of this table instance:
>>> particle = table.row
The row attribute of table points to the Row instance that will be used to write data rows into the table. We write data simply by assigning the Row instance the values for each row as if it were a dictionary (although it is actually an extension class), using the column names as keys.
Below is an example of how to write rows:
>>> for i in xrange(10):
...     particle['name'] = 'Particle: %6d' % (i)
...     particle['TDCcount'] = i % 256
...     particle['ADCcount'] = (i * 256) % (1 << 16)
...     particle['grid_i'] = i
...     particle['grid_j'] = 10 - i
...     particle['pressure'] = float(i*i)
...     particle['energy'] = float(particle['pressure'] ** 4)
...     particle['idnumber'] = i * (2 ** 34)
...     particle.append()
...
>>>
This code should be easy to understand. The lines inside the loop just assign values to the different columns in the Row instance particle (see 4.6.4). A call to its append() method writes this information to the table I/O buffer.
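As a side note, the modulus operations in the loop keep the raw counter values inside the ranges of the unsigned column types. The following plain-Python sketch illustrates this (the value 300 is just a hypothetical out-of-range counter, not from the tutorial):

```python
# TDCcount is an unsigned 8-bit column (valid range 0..255) and
# ADCcount an unsigned 16-bit column (valid range 0..65535), so the
# loop wraps its raw counter into those ranges before assignment.
i = 300                      # illustrative raw counter, outside 8 bits
tdc = i % 256                # wrapped into 0..255
adc = (i * 256) % (1 << 16)  # wrapped into 0..65535
assert 0 <= tdc <= 255
assert 0 <= adc <= 65535
print(tdc, adc)  # -> 44 11264
```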
After we have processed all our data, we should flush the table's I/O buffer if we want to write all this data to disk. We achieve that by calling the table.flush() method.
>>> table.flush()
Ok. We have our data on disk, and now we need to access it and select from specific columns the values we are interested in. See the example below:
>>> table = h5file.root.detector.readout
>>> pressure = [ x['pressure'] for x in table.iterrows()
...              if x['TDCcount'] > 3 and 20 <= x['pressure'] < 50 ]
>>> pressure
[25.0, 36.0, 49.0]
The first line creates a "shortcut" to the readout table deeper in the object tree. As you can see, we use the natural naming schema to access it. We could also have used the h5file.getNode() method, as we will do later on.
You will recognize the last two lines as a Python list comprehension. It loops over the rows in table as they are provided by the table.iterrows() iterator (see 4.6.2). The iterator returns values until all the data in table is exhausted. These rows are filtered using the expression:
x['TDCcount'] > 3 and 20 <= x['pressure'] < 50

We then select the value of the pressure column for the filtered records to create the final list, and assign it to the pressure variable.
We could have used a normal for loop to accomplish the same purpose, but I find comprehension syntax to be more compact and elegant.
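For reference, that equivalent for loop might look like the sketch below. Here a list of ordinary dictionaries stands in for the rows yielded by table.iterrows() (so the snippet is self-contained); the filtering logic is the same as in the comprehension:

```python
# Plain-dict stand-ins for the Row objects yielded by table.iterrows();
# with a real Table you would write "for x in table.iterrows():" instead.
rows = [{'TDCcount': i % 256, 'pressure': float(i * i)} for i in range(10)]

pressure = []
for x in rows:
    # the same cut used in the list comprehension above
    if x['TDCcount'] > 3 and 20 <= x['pressure'] < 50:
        pressure.append(x['pressure'])

print(pressure)  # -> [25.0, 36.0, 49.0]
```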
Let's select the name column for the same set of cuts:
>>> names = [ x['name'] for x in table
...           if x['TDCcount'] > 3 and 20 <= x['pressure'] < 50 ]
>>> names
['Particle:      5', 'Particle:      6', 'Particle:      7']
Note how we have omitted the iterrows() call in the list comprehension. The Table class has an implementation of the special method __iter__() that iterates over all the rows in the table. In fact, iterrows() internally calls this special __iter__() method. Accessing all the rows in a table using this method is very convenient, especially when working with the data interactively.
That's enough about selections. The next section will show you how to save these selected results to a file.
In order to separate the selected data from the mass of detector data, we will create a new group columns branching off the root group. Afterwards, under this group, we will create two arrays that will contain the selected data. First, we create the group:
>>> gcolumns = h5file.createGroup(h5file.root, "columns", "Pressure and Name")
Note that this time we have specified the first parameter using natural naming (h5file.root) instead of with an absolute path string ("/").
Now, create the first of the two Array objects we've just mentioned:
>>> h5file.createArray(gcolumns, 'pressure', array(pressure),
...                    "Pressure column selection")
/columns/pressure (Array(3,)) 'Pressure column selection'
  type = Float64
  itemsize = 8
  flavor = 'numarray'
  byteorder = 'little'
We already know the first two parameters of the createArray method (see 4.2.2); they are the same as the first two in createTable: the parent group where the Array will be created and the Array instance name. The third parameter is the object we want to save to disk. In this case, it is a numarray array that is built from the selection list we created before. The fourth parameter is the title.
Now, we will save the second array. It contains the list of strings we selected before: we save this object as-is, with no further conversion.
>>> h5file.createArray(gcolumns, 'name', names, "Name column selection")
/columns/name (Array(3,)) 'Name column selection'
  type = 'CharType'
  itemsize = 16
  flavor = 'List'
  byteorder = 'little'
As you can see, createArray() accepts names (which is a regular Python list) as an object parameter. Actually, it accepts a variety of different regular objects (see 4.2.2) as parameters. The flavor attribute (see the output above) saves the original kind of object that was saved. Based on this flavor, PyTables will be able to retrieve exactly the same object from disk later on.
Note that in these examples, the createArray method returns an Array instance that is not assigned to any variable. Don't worry, this is intentional to show the kind of object we have created by displaying its representation. The Array objects have been attached to the object tree and saved to disk, as you can see if you print the complete object tree:
>>> print h5file
Filename: 'tutorial1.h5' Title: 'Test file' Last modif.: 'Sun Jul 27 14:00:13 2003'
/ (Group) 'Test file'
/columns (Group) 'Pressure and Name'
/columns/name (Array(3,)) 'Name column selection'
/columns/pressure (Array(3,)) 'Pressure column selection'
/detector (Group) 'Detector information'
/detector/readout (Table(10,)) 'Readout example'
To finish this first tutorial, we use the close method of the h5file File object to close the file before exiting Python:
>>> h5file.close()
>>> ^D
You have now created your first PyTables file with a table and two arrays. You can examine it with any generic HDF5 tool, such as h5dump or h5ls. Here is what the tutorial1.h5 looks like when read with the h5ls program:
$ h5ls -rd tutorial1.h5
/columns                 Group
/columns/name            Dataset {3}
    Data:
        (0) "Particle:      5", "Particle:      6", "Particle:      7"
/columns/pressure        Dataset {3}
    Data:
        (0) 25, 36, 49
/detector                Group
/detector/readout        Dataset {10/Inf}
    Data:
        (0) {0, 0, 0, 0, 10, 0, "Particle:      0", 0},
        (1) {256, 1, 1, 1, 9, 17179869184, "Particle:      1", 1},
        (2) {512, 2, 256, 2, 8, 34359738368, "Particle:      2", 4},
        (3) {768, 3, 6561, 3, 7, 51539607552, "Particle:      3", 9},
        (4) {1024, 4, 65536, 4, 6, 68719476736, "Particle:      4", 16},
        (5) {1280, 5, 390625, 5, 5, 85899345920, "Particle:      5", 25},
        (6) {1536, 6, 1679616, 6, 4, 103079215104, "Particle:      6", 36},
        (7) {1792, 7, 5764801, 7, 3, 120259084288, "Particle:      7", 49},
        (8) {2048, 8, 16777216, 8, 2, 137438953472, "Particle:      8", 64},
        (9) {2304, 9, 43046721, 9, 1, 154618822656, "Particle:      9", 81}
Here is the output as displayed by the ptdump PyTables utility (located in the utils/ directory):
$ ptdump tutorial1.h5
Filename: 'tutorial1.h5' Title: 'Test file' Last modif.: 'Sun Jul 27 14:40:51 2003'
/ (Group) 'Test file'
/columns (Group) 'Pressure and Name'
/columns/name (Array(3,)) 'Name column selection'
/columns/pressure (Array(3,)) 'Pressure column selection'
/detector (Group) 'Detector information'
/detector/readout (Table(10,)) 'Readout example'
You can pass the -v or -d options to ptdump if you want more verbosity. Try them out!
Also, in figure 3.1, you can see what tutorial1.h5 looks like in the ViTables graphical interface.