4.6. The Table class

Instances of this class represent table objects in the object tree. The class provides methods to read data from, and write data to, table objects in the file.

Data can be read from or written to tables by accessing a special object that hangs from Table. This object is an instance of the Row class (see 4.6.4). See the tutorial sections in chapter 3 on how to use the Row interface. The columns of a table can also be accessed easily (more specifically, they can be read but not written) through the Column class, by means of an extension of the natural naming schema applied inside the tables. See section 4.9 for some examples of this capability.

Note that this object inherits all the public attributes and methods that Leaf already has.

Finally, during the description of the different methods, there will appear references to a particular object called NestedRecArray. This inherits from numarray.records.RecArray and is designed to keep columns that have nested datatypes. Please, see appendix B for info on these objects.

4.6.1. Table instance variables

description

A Description (see 4.8) instance describing the structure of this table.

row

The associated Row instance (see 4.6.4).

nrows

The number of rows in this table.

rowsize

The size in bytes of each row in the table.

cols

A Cols (see section 4.7) instance that serves as an accessor to Column (see section 4.9) objects.

colnames

A tuple containing the (possibly nested) names of the columns in the table.

coltypes

Maps the name of a column to its data type.

colstypes

Maps the name of a column to its data string type.

colshapes

Maps the name of a column to its shape.

colitemsizes

Maps the name of a column to the size of its base items.

coldflts

Maps the name of a column to its default.

colindexed

A dictionary that maps the name of a column to whether that column is indexed.

indexed

Does this table have any indexed columns?

indexprops

Index properties for this table (an IndexProps instance, see 4.17.2).

flavor

The default flavor for this table. This determines the type of objects returned during input (i.e. read) operations. It can take the "numarray" (default) or "numpy" values. Its value is derived from the _v_flavor attribute of the IsDescription metaclass (see 4.16.1) or, if the table has been created directly from a numarray or NumPy object, the flavor is set to the appropriate value.

4.6.2. Table methods

4.6.2.1. getEnum(colname)

Get the enumerated type associated with the named column.

If the column named colname (a string) exists and is of an enumerated type, the corresponding Enum instance (see 4.17.4) is returned. If it is not of an enumerated type, a TypeError is raised. If the column does not exist, a KeyError is raised.
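The error contract described above can be modelled in a few lines of plain Python. This is only a sketch: the `columns` registry and the `get_enum` helper are hypothetical stand-ins, and a plain dictionary takes the place of a real Enum instance.

```python
# Hypothetical column registry (not PyTables code):
# column name -> mapping of enumerated values, or None for other types.
columns = {
    "colour": {"red": 0, "green": 1, "blue": 2},
    "pressure": None,
}

def get_enum(columns, colname):
    """Return the enumerated type of colname, mimicking getEnum() errors."""
    enum = columns[colname]  # raises KeyError if the column does not exist
    if enum is None:
        raise TypeError("column %r is not of an enumerated type" % colname)
    return enum

print(get_enum(columns, "colour")["green"])
```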

4.6.2.2. append(rows)

Append a series of rows to this Table instance. rows is an object that can hold the rows to be appended in several formats, such as a NestedRecArray (see appendix B), a RecArray, a NumPy object, a list of tuples, a list of Numeric/numarray/NumPy objects, a string, a Python buffer or None (in which case nothing is appended). The rows object must be compliant with the underlying format of the Table instance, or a ValueError will be raised.

Example of use:


from tables import *
class Particle(IsDescription):
    name        = StringCol(16, pos=1)   # 16-character String
    lati        = IntCol(pos=2)        # integer
    longi       = IntCol(pos=3)        # integer
    pressure    = Float32Col(pos=4)    # float  (single-precision)
    temperature = FloatCol(pos=5)      # double (double-precision)

fileh = openFile("test4.h5", mode = "w")
table = fileh.createTable(fileh.root, 'table', Particle, "A table")
# Append several rows in only one call
table.append([("Particle:     10", 10, 0, 10*10, 10**2),
              ("Particle:     11", 11, -1, 11*11, 11**2),
              ("Particle:     12", 12, -2, 12*12, 12**2)])
fileh.close()
		

4.6.2.3. col(name)

Get a column from the table.

If a column called name exists in the table, it is read and returned as a numarray object, or as a NumPy object (whatever is more appropriate depending on the flavor of the table). If it does not exist, a KeyError is raised.

Example of use:

narray = table.col('var2')

That statement is equivalent to:

narray = table.read(field='var2')

Here you can see how this method can be used as a shorthand for the read() (see 4.6.2) method.

4.6.2.4. iterrows(start=None, stop=None, step=1)

Returns an iterator yielding Row (see section 4.6.4) instances built from rows in table. If a range is supplied (i.e. some of the start, stop or step parameters are passed), only the appropriate rows are returned. Else, all the rows are returned. See also the __iter__() special method in section 4.6.3 for a shorter way to call this iterator.

The meaning of the start, stop and step parameters is the same as in the range() Python function, except that negative values of step are not allowed. Moreover, if only start is specified, then stop will be set to start+1. If you specify neither start nor stop, then all the rows in the object are selected.

Example of use:


result = [ row['var2'] for row in table.iterrows(step=5)
                       if row['var1'] <= 20 ]
	      

Note: This iterator can be nested (see example in section 4.6.2).
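The range rules stated for start, stop and step can be checked with a small pure-Python model. This is a sketch, not PyTables code: `resolve_row_range` is a hypothetical helper, and the choice of 0 as the default when only stop is given is an assumption.

```python
def resolve_row_range(nrows, start=None, stop=None, step=1):
    """Return the row numbers selected by (start, stop, step).

    Pure-Python model of the rules described above.
    """
    if step < 1:
        # Negative steps are not allowed; a zero step is invalid in range() too.
        raise ValueError("step must be a positive integer")
    if start is None and stop is None:
        # Neither bound given: select every row.
        start, stop = 0, nrows
    elif stop is None:
        # Only start given: stop becomes start+1, i.e. a single row.
        stop = start + 1
    elif start is None:
        # Assumption of this sketch: a missing start means the first row.
        start = 0
    # Clip to the table length so the model never selects a missing row.
    return list(range(start, min(stop, nrows), step))

print(resolve_row_range(10))                # every row
print(resolve_row_range(10, start=3))       # only row 3
print(resolve_row_range(10, 2, 9, step=3))  # rows 2, 5 and 8
```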

4.6.2.5. itersequence(sequence, sort=True)

Iterate over a sequence of row coordinates.

sequence

Can be any object that supports the __getitem__ special method, like lists, tuples, Numeric/NumPy/numarray objects, etc.

sort

If true, sequence is sorted first so that the I/O process gets better performance. If your sequence is already sorted or you don't want to sort it, set this parameter to 0. The default is to sort the sequence.

Note: This iterator can be nested (see example in section 4.6.2).
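The effect of the sort parameter can be pictured with a list standing in for the table. This is a minimal sketch under that assumption; the function below is a hypothetical model, not the library implementation.

```python
def itersequence(rows, sequence, sort=True):
    """Yield rows[i] for every coordinate i in sequence (model only)."""
    # list() accepts any object that supports __getitem__ with integer keys.
    coords = list(sequence)
    if sort:
        # Visiting coordinates in increasing order keeps reads sequential.
        coords.sort()
    for coord in coords:
        yield rows[coord]

rows = ["row%d" % i for i in range(6)]
print(list(itersequence(rows, (4, 0, 2))))
print(list(itersequence(rows, (4, 0, 2), sort=False)))
```

With sort left at its default, the rows come back in coordinate order; with sort disabled, they come back in the order given by sequence.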

4.6.2.6. read(start=None, stop=None, step=1, field=None, flavor=None)

Returns the actual data in Table. If field is not supplied, it returns the data as a NestedRecArray (see appendix B) object.

The meaning of the start, stop and step parameters is the same as in the range() Python function, except that negative values of step are not allowed. Moreover, if only start is specified, then stop will be set to start+1. If you specify neither start nor stop, then all the rows in the object are selected.

The rest of the parameters are described next:

field

If specified, only the column field is returned as a homogeneous numarray/NumPy/Numeric object, depending on the flavor. If this is not supplied, all the fields are selected and a NestedRecArray (see appendix B) or NumPy object is returned. Nested fields can be specified in the field parameter by using a '/' character as a separator between fields (e.g. Info/value).

flavor

Passing a flavor parameter causes an additional conversion to be applied to the default returned object. flavor can take any of the following values: "numarray", "numpy", "python" or "numeric" (only if field has been specified). If flavor is not specified, it takes the value of self.flavor.
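The '/' separator used for nested fields can be illustrated with nested dictionaries standing in for nested records. This is only a sketch of the path notation; `select_field` is a hypothetical helper and the sample record is made up for illustration.

```python
def select_field(record, field):
    """Follow a '/'-separated field path through nested mappings."""
    value = record
    for name in field.split("/"):
        value = value[name]  # raises KeyError if a path component is missing
    return value

# Illustrative record with one nested column called Info.
record = {"id": 7, "Info": {"value": 1.5, "name": "abc"}}

print(select_field(record, "id"))
print(select_field(record, "Info/value"))
```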

4.6.2.7. readCoordinates(coords, field=None, flavor=None)

Read a set of rows given their indexes into an in-memory object.

This method works much like the read() method (see 4.6.2), but it uses a sequence (coords) of row indexes to select the wanted rows, instead of a row range.

It returns the selected rows in a NestedRecArray object (see appendix B). If flavor is provided, an additional conversion to an object of this flavor is made, just as in read().

4.6.2.8. modifyRows(start=None, stop=None, step=1, rows=None)

Modify a series of rows in the [start:stop:step] extended slice range. If you pass None to stop, all the rows existing in rows will be used.

rows can be either a recarray or a structure that can be converted to one, and it must be compliant with the table format.

Returns the number of modified rows.

It raises a ValueError if the rows parameter cannot be converted to an object compliant with the table description.

It raises an IndexError in case the modification will exceed the length of the table.
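The behaviour described for modifyRows() (a stop of None uses every supplied row, the number of modified rows is returned, and an IndexError signals an overrun) can be modelled on a list of tuples. This is a sketch, not the library code: `modify_rows` is a hypothetical helper, the row-width check merely stands in for format compliance, and the requirement that slice and rows match in length is an assumption.

```python
def modify_rows(table, start=0, stop=None, step=1, rows=()):
    """Overwrite table[start:stop:step] with rows; return the row count."""
    rows = [tuple(row) for row in rows]
    width = len(table[0]) if table else 0
    if any(len(row) != width for row in rows):
        # Stand-in for "not compliant with the table description".
        raise ValueError("rows do not match the table format")
    if stop is None:
        # Use every row supplied in rows.
        stop = start + len(rows) * step
    targets = list(range(start, stop, step))
    if len(targets) != len(rows):
        # Assumption of this sketch: slice and rows must match in length.
        raise ValueError("slice length and number of rows differ")
    if targets and targets[-1] >= len(table):
        raise IndexError("modification would exceed the table length")
    for index, new_row in zip(targets, rows):
        table[index] = new_row
    return len(targets)

table = [(i, "row%d" % i) for i in range(5)]
count = modify_rows(table, start=1, step=2, rows=[(10, "a"), (30, "b")])
print(count)
print(table)
```

All checks run before any row is written, so a failed call leaves the model table untouched.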

4.6.2.9. modifyColumn(start=None, stop=None, step=1, column=None, colname=None)

Modify a series of rows in the [start:stop:step] extended slice row range. If you pass None to stop, all the rows existing in column will be used.

column can be either a NestedRecArray (see appendix B), RecArray, numarray, NumPy object, list or tuple that is able to be converted into a NestedRecArray compliant with the specified colname column of the table.

colname specifies the column name of the table to be modified.

Returns the number of modified rows.

It raises a ValueError if the column parameter cannot be converted into an object compliant with the column description.

It raises an IndexError in case the modification will exceed the length of the table.

4.6.2.10. modifyColumns(start=None, stop=None, step=1, columns=None, names=None)

Modify a series of rows in the [start:stop:step] extended slice row range. If you pass None to stop, all the rows existing in columns will be used.

columns can be either a NestedRecArray (see appendix B), RecArray, a NumPy object, a list of arrays or list or tuples (the columns) that are able to be converted to a NestedRecArray compliant with the specified column names subset of the table format.

names specifies the column names of the table to be modified.

Returns the number of modified rows.

It raises a ValueError if the columns parameter cannot be converted to an object compliant with the table description.

It raises an IndexError in case the modification will exceed the length of the table.

4.6.2.11. removeRows(start, stop=None)

Removes a range of rows in the table. If only start is supplied, only that row is deleted. If a range is supplied, i.e. both the start and stop parameters are passed, all the rows in the range are removed. A step parameter is not supported, and there are no plans to implement one anytime soon.

start

Sets the starting row to be removed. It accepts negative values meaning that the count starts from the end. A value of 0 means the first row.

stop

Sets the last row to be removed to stop - 1, i.e. the end point is omitted (in the Python range tradition). Like start, it accepts negative values. A special value of None (the default) means removing just the row supplied in start.
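The handling of negative values and of the default stop can be sketched on a plain list. `remove_rows` is a hypothetical model of the rules above, and returning the number of removed rows is a convenience of this sketch rather than something stated in this section.

```python
def remove_rows(table, start, stop=None):
    """Delete rows start..stop-1 from a list; return how many were removed."""
    nrows = len(table)
    if start < 0:
        # Negative values count from the end of the table.
        start += nrows
    if stop is None:
        # Only start given: remove just that row. Normalising start first
        # matters, otherwise start=-1 would give the empty range [-1:0].
        stop = start + 1
    elif stop < 0:
        stop += nrows
    removed = len(table[start:stop])
    del table[start:stop]  # the end point is omitted, as in range()
    return removed

table = list(range(10))
remove_rows(table, 2)     # removes the row holding 2
remove_rows(table, -1)    # removes the last row, holding 9
remove_rows(table, 0, 3)  # removes the first three remaining rows
print(table)
```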

4.6.2.12. removeIndex(index)

Remove the index associated with the specified column. Only Index instances (see 4.17.3) are accepted as parameter. This index can be recreated again by calling the createIndex (see 4.9.2) method of the appropriate Column object.

4.6.2.13. flushRowsToIndex()

Add remaining rows in buffers to non-dirty indexes. This can be useful when you have chosen non-automatic indexing for the table (see section 4.17.2) and want to update the indexes on it.

4.6.2.14. reIndex()

Recompute all the existing indexes in table. This can be useful when you suspect that, for any reason, the index information for columns is no longer valid and want to rebuild the indexes on it.

4.6.2.15. reIndexDirty()

Recompute the existing indexes in table, but only if they are dirty. This can be useful when you have set the reindex parameter to 0 in the IndexProps constructor (see 4.17.2) for the table and want to update the indexes after an operation that invalidates them (Table.removeRows, for example).

4.6.2.16. where(condition, start=None, stop=None, step=None)

Iterate over values fulfilling a condition.

This method returns an iterator yielding Row (see 4.6.4) instances built from rows in the table that satisfy the given condition over a column. If that column is indexed, its index will be used to accelerate the search. Otherwise, the in-kernel iterator (which still has better performance than standard Python selections) will be chosen instead. Please check section 5.2 for more information about the performance of the different searching modes.

Moreover, if a range is supplied (i.e. some of the start, stop or step parameters are passed), only the rows in that range and fulfilling the condition are returned. The meaning of the start, stop and step parameters is the same as in the range() Python function, except that negative values of step are not allowed. Moreover, if only start is specified, then stop will be set to start+1.

You can mix this method with standard Python selections in order to have complex queries. It is strongly recommended that you pass the most restrictive condition as the parameter to this method if you want to achieve maximum performance.

Example of use:


passvalues=[]
for row in table.where(0 < table.cols.col1 < 0.3, step=5):
    if row['col2'] <= 20:
        passvalues.append(row['col3'])
print "Values that pass the cuts:", passvalues
	      

Note that, from PyTables 1.1 on, you can nest several iterators over the same table. For example:


for p in rout.where(rout.cols.pressure < 16):
    for q in rout.where(rout.cols.pressure < 9):
        for n in rout.where(rout.cols.energy < 10):
            print "pressure, energy:", p['pressure'],n['energy']
	      

In this example, the iterators returned by where() have been nested, but in fact you can use any of the other reading iterators that the Table object offers. Look at examples/nested-iter.py for the full code.

4.6.2.17. whereAppend(dstTable, condition, start=None, stop=None, step=None)

Append rows fulfilling the condition to the dstTable table.

dstTable must be capable of taking the rows resulting from the query, i.e. it must have columns with the expected names and compatible types. The meaning of the other arguments is the same as in the where() method (see 4.6.2).

The number of rows appended to dstTable is returned as a result.
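The contract of whereAppend() (copy the matching rows, return how many were appended) can be modelled with lists of dictionaries. This is a sketch only: `where_append` is a hypothetical helper, and a Python callable stands in for the column condition.

```python
def where_append(dst_rows, src_rows, condition):
    """Append the rows of src_rows satisfying condition to dst_rows."""
    appended = 0
    for row in src_rows:
        if condition(row):
            dst_rows.append(dict(row))  # copy, so the source stays untouched
            appended += 1
    return appended

source = [{"name": "p%d" % i, "pressure": float(i * i)} for i in range(6)]
destination = []
count = where_append(destination, source, lambda row: row["pressure"] < 9)
print(count)
print([row["name"] for row in destination])
```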

4.6.2.18. getWhereList(condition, flavor=None)

Get the row coordinates that fulfill the condition parameter. This method will take advantage of an indexed column to speed up the search.

flavor is the desired type of the returned list. It can take the "numarray", "numpy", "numeric" or "python" values. The default is to return an object of the same flavor as self.flavor.

4.6.3. Table special methods

This section describes the methods that automatically trigger actions when a Table instance is accessed in a special way (e.g., table["var2"] is equivalent to a call to table.__getitem__("var2")).

4.6.3.1. __iter__()

It returns the same iterator as Table.iterrows(0,0,1). However, it does not accept parameters.

Example of use:


result = [ row['var2'] for row in table if row['var1'] <= 20 ]
	      

Which is equivalent to:


result = [ row['var2'] for row in table.iterrows()
                       if row['var1'] <= 20 ]
	      

Note: This iterator can be nested (see example in section 4.6.2).

4.6.3.2. __getitem__(key)

Get a row or a range of rows from the table.

If the key argument is an integer, the corresponding table row is returned as a tables.nestedrecords.NestedRecord object. If key is a slice, the range of rows determined by it is returned as a tables.nestedrecords.NestedRecArray object.

Using a string as key to get a column is supported but deprecated. Please use the col() (see 4.6.2) method.

Example of use:


record = table[4]
recarray = table[4:1000:2]
	      

Those statements are equivalent to:


record = table.read(start=4)[0]
recarray = table.read(start=4, stop=1000, step=2)
	      

Here you can see how indexing and slicing can be used as shorthands for the read() (see 4.6.2) method.
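The way indexing and slicing reduce to a ranged read can be sketched with a small class. `RowStore` is a hypothetical stand-in for a table, backed by a plain list; rejecting negative slice steps mirrors the restriction stated for read(), and is an assumption as far as __getitem__ is concerned.

```python
class RowStore(object):
    """Minimal model of a table whose indexing delegates to read()."""

    def __init__(self, rows):
        self._rows = list(rows)

    def read(self, start=None, stop=None, step=1):
        return self._rows[start:stop:step]

    def __getitem__(self, key):
        if isinstance(key, slice):
            # slice.indices() clips the bounds to the number of rows.
            start, stop, step = key.indices(len(self._rows))
            if step < 1:
                raise ValueError("negative steps are not supported")
            return self.read(start, stop, step)
        if key < 0:
            key += len(self._rows)
        # An integer key reads a one-row range and unwraps the single row.
        return self.read(key, key + 1)[0]

store = RowStore(range(0, 100, 10))
print(store[4])
print(store[4:1000:2])
```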

4.6.3.3. __setitem__(key, value)

It takes different actions depending on the type of the key parameter:

key is an Integer

The corresponding table row is set to value. value must be a List or Tuple capable of being converted to the table field format.

key is a Slice

The row slice determined by key is set to value. value must be a NestedRecArray object or a RecArray object or a list of rows capable of being converted to the table field format.

Example of use:


# Modify just one existing row
table[2] = [456,'db2',1.2]
# Modify two existing rows
rows = numarray.records.array([[457,'db1',1.2],[6,'de2',1.3]],
                              formats="i4,a3,f8")
table[1:3:2] = rows
	      

Which is equivalent to:


table.modifyRows(start=2, rows=[456,'db2',1.2])
rows = numarray.records.array([[457,'db1',1.2],[6,'de2',1.3]],
                              formats="i4,a3,f8")
table.modifyRows(start=1, step=2, rows=rows)
	      

4.6.4. The Row class

This class is used to fetch and set values on the table fields. It works very much like a dictionary, where the keys are the field names of the associated table and the values are the values of those fields in a specific row.

This object turns out to actually be an extension type, so you won't be able to access its documentation interactively. However, you will be able to access some of its internal attributes through the use of Python properties. In addition, there are some important methods that are useful for adding and modifying values in tables.

4.6.4.1. Row attributes

nrow

Property that returns the current row number in the table. It is useful to know which row is being dealt with in the middle of a loop or iterator.

4.6.4.2. Row methods

append()

Once you have filled the proper fields for the current row, calling this method appends the new data to the table (strictly speaking, the data are written to the output buffer rather than directly to disk).

Example of use:


        row = table.row
        for i in xrange(nrows):
            row['col1'] = i-1
            row['col2'] = 'a'
            row['col3'] = -1.0
            row.append()
        table.flush()
		    
Please, note that, after the loop in which Row.append() has been called, it is always convenient to make a call to Table.flush() in order to avoid losing the last rows that can be in internal buffers.
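The reason for the final flush can be seen in a toy model of an output buffer. This is a sketch under stated assumptions: `BufferedTable` and its `buffer_size` parameter are hypothetical, and the real buffering policy of the library may differ.

```python
class BufferedTable(object):
    """Toy table that only stores rows when its buffer fills or is flushed."""

    def __init__(self, buffer_size=3):
        self.buffer_size = buffer_size
        self._buffer = []  # rows appended but not yet stored
        self.stored = []   # rows that have reached "disk"

    def append(self, row):
        self._buffer.append(row)
        if len(self._buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        self.stored.extend(self._buffer)
        del self._buffer[:]

table = BufferedTable(buffer_size=3)
for i in range(5):
    table.append((i - 1, "a", -1.0))
stored_before = len(table.stored)  # the last two rows are still buffered
table.flush()
stored_after = len(table.stored)
print(stored_before, stored_after)
```

In this model, skipping the final flush() would leave the last two rows in the buffer, which is the situation the paragraph above warns about.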

update()

This allows you to modify values of your tables while you are in the middle of table iterators, like Table.iterrows() (see 4.6.2) or Table.where() (see 4.6.2). Once you have filled the proper fields for the current row, calling this method commits the new data to the table (strictly speaking, the data are written to the output buffer rather than directly to disk).

Example of use:


        for row in table.iterrows(step=10):
            row['col1'] = row.nrow
            row['col2'] = 'b'
            row['col3'] = 0.0
            row.update()
		    
which modifies every tenth row in table. Or:

        for row in table.where(table.cols.col1 > 3):
            row['col1'] = row.nrow
            row['col2'] = 'b'
            row['col3'] = 0.0
            row.update()
		    
which updates only the rows whose value in the first column is greater than 3.