3.3. Committing data to tables and arrays

We have seen how to create tables and arrays and how to browse both data and metadata in the object tree. Let's now examine more closely one of the most powerful capabilities of PyTables: modifying tables and arrays that already exist[1].

3.3.1. Appending data to an existing table

Now, let's have a look at how we can add records to an existing table on disk. Let's use our well-known readout Table object and append some new values to it:


>>> table = h5file.root.detector.readout
>>> particle = table.row
>>> for i in xrange(10, 15):
...     particle['name']  = 'Particle: %6d' % (i)
...     particle['TDCcount'] = i % 256
...     particle['ADCcount'] = (i * 256) % (1 << 16)
...     particle['grid_i'] = i
...     particle['grid_j'] = 10 - i
...     particle['pressure'] = float(i*i)
...     particle['energy'] = float(particle['pressure'] ** 4)
...     particle['idnumber'] = i * (2 ** 34)
...     particle.append()
...
>>> table.flush()
	  

It's the same method we used to fill a new table. PyTables knows that this table is on disk, and when you add new records, they are appended to the end of the table[2].

If you look carefully at the code you will see that we have used the table.row attribute to create a table row and fill it with the new values. Each time its append() method is called, the current row is committed to the output buffer and the row pointer is incremented to point to the next table record. When the buffer is full, the data is written to disk and the buffer is reused for the next cycle.

Caveat emptor: Do not forget to always call the .flush() method after a write operation, or else your tables will not be updated!
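The buffering behaviour described above can be pictured with a toy model. This is only a sketch in plain Python, not PyTables code: the class BufferedTable, its buffer_size parameter and the on_disk list are hypothetical names invented for the illustration. It shows why rows appended after the last automatic write stay in memory until flush() is called.

```python
class BufferedTable(object):
    """Toy stand-in for a table with an in-memory output buffer.

    Hypothetical model for illustration; it is not how PyTables is
    implemented internally.
    """

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.on_disk = []   # rows that have been "saved"
        self.buffer = []    # rows appended but not yet saved

    def append(self, row):
        self.buffer.append(row)
        if len(self.buffer) == self.buffer_size:
            self.flush()    # a full buffer is written out automatically

    def flush(self):
        self.on_disk.extend(self.buffer)
        self.buffer = []


toy = BufferedTable(buffer_size=4)
for i in range(10, 15):
    toy.append(i)

# Four rows filled the buffer and were written; the fifth is still pending.
pending_before_flush = list(toy.buffer)
toy.flush()
```

In this model, skipping the final flush() would leave the last row out of on_disk, which is the situation the caveat warns about.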

Let's have a look at some rows in the modified table and verify that our new data has been appended:


>>> for r in table.iterrows():
...     print "%-16s | %11.1f | %11.4g | %6d | %6d | %8d |" % \
...        (r['name'], r['pressure'], r['energy'], r['grid_i'], r['grid_j'],
...         r['TDCcount'])
...
Particle:      0 |         0.0 |           0 |      0 |     10 |        0 |
Particle:      1 |         1.0 |           1 |      1 |      9 |        1 |
Particle:      2 |         4.0 |         256 |      2 |      8 |        2 |
Particle:      3 |         9.0 |        6561 |      3 |      7 |        3 |
Particle:      4 |        16.0 |   6.554e+04 |      4 |      6 |        4 |
Particle:      5 |        25.0 |   3.906e+05 |      5 |      5 |        5 |
Particle:      6 |        36.0 |    1.68e+06 |      6 |      4 |        6 |
Particle:      7 |        49.0 |   5.765e+06 |      7 |      3 |        7 |
Particle:      8 |        64.0 |   1.678e+07 |      8 |      2 |        8 |
Particle:      9 |        81.0 |   4.305e+07 |      9 |      1 |        9 |
Particle:     10 |       100.0 |       1e+08 |     10 |      0 |       10 |
Particle:     11 |       121.0 |   2.144e+08 |     11 |     -1 |       11 |
Particle:     12 |       144.0 |     4.3e+08 |     12 |     -2 |       12 |
Particle:     13 |       169.0 |   8.157e+08 |     13 |     -3 |       13 |
Particle:     14 |       196.0 |   1.476e+09 |     14 |     -4 |       14 |
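As a cross-check, the last five rows of the listing can be recomputed by hand from the formulas in the append loop. The sketch below does this in plain Python without PyTables; particle_fields is a hypothetical helper introduced only for this check, and it ignores the column types of the real table.

```python
def particle_fields(i):
    """Recompute the values the append loop assigns for index i."""
    pressure = float(i * i)
    return {
        'name': 'Particle: %6d' % i,
        'TDCcount': i % 256,
        'ADCcount': (i * 256) % (1 << 16),
        'grid_i': i,
        'grid_j': 10 - i,
        'pressure': pressure,
        'energy': pressure ** 4,
        'idnumber': i * (2 ** 34),
    }


appended = [particle_fields(i) for i in range(10, 15)]
for fields in appended:
    # Same format string as in the listing, printed as a single string.
    print("%-16s | %11.1f | %11.4g | %6d | %6d | %8d |" % (
        fields['name'], fields['pressure'], fields['energy'],
        fields['grid_i'], fields['grid_j'], fields['TDCcount']))
```

Note that energy is the fourth power of pressure, that is, the eighth power of i, which is why it grows so quickly down the column.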
	  

3.3.2. Modifying data in tables

OK, until now we have only been reading and writing (appending) values in our tables. But there are times when you need to modify your data after it has been saved on disk (this is especially true when you need to modify real-world data to suit your goals ;). Let's see how we can modify the values saved in our existing tables. We will start by modifying single cells in the first row of the Particle table:


>>> print "Before modif-->", table[0]
Before modif--> (0, 0, 0.0, 0, 10, 0L, 'Particle:      0', 0.0)
>>> table.cols.TDCcount[0] = 1
>>> print "After modif first row of TDCcount-->", table[0]
After modif first row of TDCcount--> (0, 1, 0.0, 0, 10, 0L, 'Particle:      0', 0.0)
>>> table.cols.energy[0] = 2
>>> print "After modif first row of energy-->", table[0]
After modif first row of energy--> (0, 1, 2.0, 0, 10, 0L, 'Particle:      0', 0.0)

	  

We can modify complete ranges of columns as well:


>>> table.cols.TDCcount[2:5] = [2,3,4]
>>> print "After modifying slice [2:5] of TDCcount-->", table[0:5]
After modifying slice [2:5] of TDCcount--> RecArray[
(0, 1, 2.0, 0, 10, 0L, 'Particle:      0', 0.0),
(256, 1, 1.0, 1, 9, 17179869184L, 'Particle:      1', 1.0),
(512, 2, 256.0, 2, 8, 34359738368L, 'Particle:      2', 4.0),
(768, 3, 6561.0, 3, 7, 51539607552L, 'Particle:      3', 9.0),
(1024, 4, 65536.0, 4, 6, 68719476736L, 'Particle:      4', 16.0)
]
>>> table.cols.energy[1:9:3] = [2,3,4]
>>> print "After modifying slice [1:9:3] of energy-->", table[0:9]
After modifying slice [1:9:3] of energy--> RecArray[
(0, 1, 2.0, 0, 10, 0L, 'Particle:      0', 0.0),
(256, 1, 2.0, 1, 9, 17179869184L, 'Particle:      1', 1.0),
(512, 2, 256.0, 2, 8, 34359738368L, 'Particle:      2', 4.0),
(768, 3, 6561.0, 3, 7, 51539607552L, 'Particle:      3', 9.0),
(1024, 4, 3.0, 4, 6, 68719476736L, 'Particle:      4', 16.0),
(1280, 5, 390625.0, 5, 5, 85899345920L, 'Particle:      5', 25.0),
(1536, 6, 1679616.0, 6, 4, 103079215104L, 'Particle:      6', 36.0),
(1792, 7, 4.0, 7, 3, 120259084288L, 'Particle:      7', 49.0),
(2048, 8, 16777216.0, 8, 2, 137438953472L, 'Particle:      8', 64.0)
]
	  

Check that the values have been correctly modified! Hint: remember that TDCcount is the second column (ADCcount comes first) and that energy is the third. See section 4.9.3 for more information on modifying columns.
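If it is not obvious which rows a slice assignment touches, the standard Python slice.indices() method can list them. The sketch below is plain Python and does not use PyTables; touched_rows is a hypothetical helper, and the row count of 15 is the size of the table at this point in the tutorial.

```python
def touched_rows(key, nrows):
    """List the row indices selected by a slice on a table with nrows rows."""
    return list(range(*key.indices(nrows)))


nrows = 15
rows_for_2_5 = touched_rows(slice(2, 5), nrows)       # the [2:5] assignment
rows_for_1_9_3 = touched_rows(slice(1, 9, 3), nrows)  # the [1:9:3] assignment
print(rows_for_2_5)
print(rows_for_1_9_3)
```

So the slice [2:5] covers rows 2, 3 and 4, while [1:9:3] covers rows 1, 4 and 7, one row for each of the three values assigned.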

PyTables also lets you modify complete sets of rows at the same time. As a demonstration of this capability, see the next example:


>>> table.modifyRows(start=1, step=3,
...                  rows=[(1, 2, 3.0, 4, 5, 6L, 'Particle:   None', 8.0),
...                        (2, 4, 6.0, 8, 10, 12L, 'Particle: None*2', 16.0)])
2
>>> print "After modifying rows 1 and 4-->", table[0:5]
After modifying rows 1 and 4--> RecArray[
(0, 1, 2.0, 0, 10, 0L, 'Particle:      0', 0.0),
(1, 2, 3.0, 4, 5, 6L, 'Particle:   None', 8.0),
(512, 2, 256.0, 2, 8, 34359738368L, 'Particle:      2', 4.0),
(768, 3, 6561.0, 3, 7, 51539607552L, 'Particle:      3', 9.0),
(2, 4, 6.0, 8, 10, 12L, 'Particle: None*2', 16.0)
]
	  

As you can see, the modifyRows call has modified the second and fifth rows (indices 1 and 4), and it has returned the number of modified rows.
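The way start and step select the target rows can be mimicked on an ordinary Python list. The function below is a hypothetical stand-in written for this illustration; it only models the indexing and the return value seen in the example above, not the real method.

```python
def modify_rows(table, start, step, rows):
    """Overwrite every step-th row of a list, beginning at index start.

    Toy model: returns the number of rows written, as in the example above.
    """
    for offset, new_row in enumerate(rows):
        table[start + offset * step] = new_row
    return len(rows)


toy_table = ['row %d' % i for i in range(5)]
count = modify_rows(toy_table, start=1, step=3,
                    rows=['first new', 'second new'])
print(count)
print(toy_table)
```

With start=1 and step=3, the two new rows land at indices 1 and 4, and the rows in between are left untouched.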

Apart from modifyRows, there is another method, modifyColumn, for modifying specific columns. Please check section 4.6.2 for a more in-depth description of both.

Finally, there is another way of modifying tables that is generally handier than those described above. It uses the update() method (see section 4.6.4) of the Row instance attached to every table, so it is meant to be used in table iterators. Look at the next example:


>>> for row in table.where(table.cols.TDCcount <= 2):
...    row['energy'] = row['TDCcount']*2
...    row.update()
...
>>> print "After modifying energy column (where TDCcount <=2)-->", table[0:4]
After modifying energy column (where TDCcount <=2)--> NestedRecArray[
(0, 1, 2.0, 0, 10, 0L, 'Particle:      0', 0.0),
(1, 2, 4.0, 4, 5, 6L, 'Particle:   None', 8.0),
(512, 2, 4.0, 2, 8, 34359738368L, 'Particle:      2', 4.0),
(768, 3, 6561.0, 3, 7, 51539607552L, 'Particle:      3', 9.0)
]

	  

Note: The authors find this way of updating tables (i.e. using Row.update()) to be both convenient and efficient. Please make sure to use it extensively.
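To see why exactly three rows changed in the example above, the same selection and update can be replayed on plain Python dictionaries. This sketch does not use PyTables; the rows list simply copies the TDCcount and energy values of the first five table rows as they stood before the update loop, and the rest of the table is left out because its TDCcount values are all larger than 2.

```python
# TDCcount and energy of rows 0 to 4 before the update loop.
rows = [
    {'TDCcount': 1, 'energy': 2.0},
    {'TDCcount': 2, 'energy': 3.0},
    {'TDCcount': 2, 'energy': 256.0},
    {'TDCcount': 3, 'energy': 6561.0},
    {'TDCcount': 4, 'energy': 6.0},
]

for row in rows:
    if row['TDCcount'] <= 2:                     # same condition as where()
        row['energy'] = float(row['TDCcount'] * 2)

energies = [row['energy'] for row in rows]
print(energies)
```

Rows 0, 1 and 2 satisfy the condition and receive 2.0, 4.0 and 4.0, which agrees with the output printed above.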

3.3.3. Modifying data in arrays

We will now see how to modify data in array objects. The basic way to do this is through the __setitem__ special method (see 4.10.3). Let's look at how to modify data in the pressureObject array:


>>> print "Before modif-->", pressureObject[:]
Before modif--> [ 25.  36.  49.]
>>> pressureObject[0] = 2
>>> print "First modif-->", pressureObject[:]
First modif--> [  2.  36.  49.]
>>> pressureObject[1:3] = [2.1, 3.5]
>>> print "Second modif-->", pressureObject[:]
Second modif--> [ 2.   2.1  3.5]
>>> pressureObject[::2] = [1,2]
>>> print "Third modif-->", pressureObject[:]
Third modif--> [ 1.   2.1  2. ]

	  

So, in general, you can use any combination of (multidimensional) extended slicing[3] to refer to the indexes that you want to modify. See section 4.10.3 for more examples of how to use extended slicing in PyTables objects.

Similarly, with an array of strings:


>>> print "Before modif-->", nameObject[:]
Before modif--> ['Particle:      5', 'Particle:      6', 'Particle:      7']
>>> nameObject[0] = 'Particle:   None'
>>> print "First modif-->", nameObject[:]
First modif--> ['Particle:   None', 'Particle:      6', 'Particle:      7']
>>> nameObject[1:3] = ['Particle:      0', 'Particle:      1']
>>> print "Second modif-->", nameObject[:]
Second modif--> ['Particle:   None', 'Particle:      0', 'Particle:      1']
>>> nameObject[::2] = ['Particle:     -3', 'Particle:     -5']
>>> print "Third modif-->", nameObject[:]
Third modif--> ['Particle:     -3', 'Particle:      0', 'Particle:     -5']
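The three assignments used in both sessions follow the ordinary Python slice-assignment rules, so they can be rehearsed on a plain list before touching a file. The sketch below is only an analogy and does not use PyTables: unlike the on-disk arrays, a Python list does not cast values to a fixed type, and it also accepts negative steps.

```python
pressure = [25.0, 36.0, 49.0]

pressure[0] = 2.0             # single element
pressure[1:3] = [2.1, 3.5]    # contiguous slice: indices 1 and 2
pressure[::2] = [1.0, 2.0]    # extended slice: indices 0 and 2
print(pressure)

names = ['Particle:      5', 'Particle:      6', 'Particle:      7']
names[::2] = ['Particle:     -3', 'Particle:     -5']
print(names)
```

In each assignment the number of values on the right matches the number of positions the slice selects, as in the sessions above.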

	  

3.3.4. And finally... how to delete rows from a table

We'll finish this tutorial by deleting some rows from the table we have. Suppose that we want to delete the rows with indices 5 through 9 (inclusive, counting from zero):


>>> table.removeRows(5,10)
5
	  

removeRows(start, stop) (see 4.6.2) deletes the rows from index start up to, but not including, index stop. It returns the number of rows actually removed.
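The range convention is the same one used by Python slices, which a short list-based sketch can make concrete. The function remove_rows below is a hypothetical stand-in that works on a plain list of particle numbers; it is not the PyTables method.

```python
def remove_rows(table, start, stop):
    """Delete table[start:stop] in place and return how many rows went away."""
    before = len(table)
    del table[start:stop]
    return before - len(table)


particles = list(range(15))           # particle numbers 0 to 14
removed = remove_rows(particles, 5, 10)
print(removed)
print(particles)
```

Five rows are removed (indices 5 to 9), and the row that used to hold particle 10 becomes the new row 5.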

We have reached the end of this first tutorial. Don't forget to close the file when you finish:


>>> h5file.close()
>>> ^D
$
	  

Figure 3.2 shows a graphical view of the PyTables file with the datasets we have just created. Figure 3.3 displays the general properties of the table /detector/readout.

Figure 3.2. The final version of the data file for tutorial 1.

Figure 3.3. General properties of the /detector/readout table.

Notes

[1]

Appending data to arrays is also supported, but you need to create special objects called EArray (see 4.12 for more info).

[2]

Note that you can append not only scalar values to tables, but also fully multidimensional array objects.

[3]

With the sole exception that you cannot use negative values for step.