4.2. The File class

An instance of this class is returned when a PyTables file is opened with the openFile() function. It offers methods to manipulate (create, rename, delete...) nodes and handle their attributes, as well as methods to traverse the object tree. The user entry point to the object tree attached to the HDF5 file is represented in the rootUEP attribute. The other attributes are described in section 4.2.1.

File objects support an Undo/Redo mechanism which can be enabled with the enableUndo() method. Once the Undo/Redo mechanism is enabled, explicit marks (with an optional unique name) can be set on the state of the database using the mark() method. There are two implicit marks which are always available: the initial mark (0) and the final mark (-1). Both the identifier of a mark and its name can be used in undo and redo operations.

Hierarchy manipulation operations (node creation, movement and removal) and attribute handling operations (setting and deleting) made after a mark can be undone by using the undo() method, which returns the database to the state of a past mark. If undo() is not followed by operations that modify the hierarchy or attributes, the redo() method can be used to return the database to the state of a future mark. Otherwise, future states of the database are forgotten.

Note that data handling operations cannot currently be undone or redone. Also, hierarchy manipulation operations on nodes that do not support the Undo/Redo mechanism issue an UndoRedoWarning before changing the database.

The Undo/Redo mechanism is persistent between sessions and can only be disabled by calling the disableUndo() method.
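
The interplay between marks, undo() and redo() described above can be illustrated with a small toy model. This is only a sketch in plain Python: the ToyUndoLog class, its change() method and the use of frozensets of path names as "database states" are assumptions made for illustration, and it does not reflect how PyTables actually stores its action log.

```python
class ToyUndoLog(object):
    """Toy model of marks, undo and redo (hypothetical, not PyTables code)."""

    def __init__(self, initial_state):
        self._timeline = [initial_state]  # state after each operation
        self._pos = 0                     # index of the current state
        self._marks = {0: 0}              # mark identifier -> timeline index
        self._names = {}                  # mark name -> mark identifier
        self._next_id = 1                 # identifiers are never reused

    @property
    def state(self):
        return self._timeline[self._pos]

    def _forget_future(self):
        # Drop the states and marks that lie after the current position.
        del self._timeline[self._pos + 1:]
        self._marks = dict((m, p) for m, p in self._marks.items()
                           if p <= self._pos)
        self._names = dict((n, m) for n, m in self._names.items()
                           if m in self._marks)

    def change(self, new_state):
        # A modification made after an undo forgets the future states.
        self._forget_future()
        self._timeline.append(new_state)
        self._pos += 1

    def mark(self, name=None):
        if name is not None and name in self._names:
            raise ValueError("mark name already in use: %r" % (name,))
        self._forget_future()  # simplification chosen for this toy model
        ident = self._next_id
        self._next_id += 1
        self._marks[ident] = self._pos
        if name is not None:
            self._names[name] = ident
        return ident

    def _position(self, mark):
        if mark == -1:                        # implicit final mark
            return len(self._timeline) - 1
        ident = self._names.get(mark, mark)   # accept a name or an identifier
        return self._marks[ident]

    def undo(self, mark):
        target = self._position(mark)
        if target >= self._pos:
            raise ValueError("mark is not older than the current state")
        self._pos = target

    def redo(self, mark):
        target = self._position(mark)
        if target <= self._pos:
            raise ValueError("mark is not newer than the current state")
        self._pos = target


log = ToyUndoLog(frozenset())               # implicit initial mark (0)
log.change(frozenset(["/a"]))
log.mark("before b")
log.change(frozenset(["/a", "/b"]))
log.undo("before b")
state_after_undo = log.state                # only "/a" is left
log.redo(-1)
state_after_redo = log.state                # "/a" and "/b" are back
log.undo(0)
log.change(frozenset(["/c"]))               # the future is forgotten here
```

After the last change() call, the mark named "before b" no longer exists in the toy model, which mirrors the rule that future states are forgotten once the hierarchy is modified after an undo.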

4.2.1. File instance variables

filename

The name of the opened file.

format_version

The PyTables version number of this file.

isopen

True if the underlying file is open, False otherwise.

mode

The mode in which the file was opened.

title

The title of the root group in the file.

trMap

A dictionary that maps node names between PyTables and HDF5 domain names. Its initial values are set from the trMap parameter passed to the openFile() function. You can change its contents after a file is opened, and the new map will apply to any object added to the tree from then on.

rootUEP

The UEP (user entry point) group in the file (see 4.1.2).

filters

Default filter properties for the root group (see section 4.17.1).

root

The root of the object tree hierarchy (a Group instance).

objects

A dictionary which maps path names to objects, for every visible node in the tree (deprecated, see note below).

groups

A dictionary which maps path names to objects, for every visible group in the tree (deprecated, see note below).

leaves

A dictionary which maps path names to objects, for every visible leaf in the tree (deprecated, see note below).

Note: From PyTables 1.2 on, the dictionaries objects, groups and leaves are just instances of objects that emulate the old functionality. Internally, they use File.getNode() (see 4.2.2) and File.walkNodes() (see 4.2.2), which are recommended instead.
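
As a rough sketch of how the deprecated dictionaries could be replaced, the helper below builds a path-to-node mapping on demand from walkNodes(). The function name map_paths_to_nodes is hypothetical, and the sketch assumes that every node exposes its full path in a _v_pathname attribute (an attribute not described in this section). Whether hidden nodes are yielded depends on walkNodes() itself.

```python
def map_paths_to_nodes(h5file, classname=""):
    """Return a {path: node} dictionary for the nodes yielded by walkNodes().

    h5file is expected to be an open File instance.  Passing the name of a
    Node subclass in classname restricts the mapping to instances of it.
    """
    # _v_pathname is assumed to hold the full path of each node.
    return dict((node._v_pathname, node)
                for node in h5file.walkNodes("/", classname))
```

For instance, map_paths_to_nodes(h5file, "Group") would play the role of the old groups dictionary, under the assumptions stated above.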

4.2.2. File methods

4.2.2.1. createGroup(where, name, title='', filters=None)

Create a new Group instance with name name in where location.

where

The parent group from which the new group will hang. The where parameter can be a path string (for example "/level1/group5"), or another Group instance.

name

The name of the new group.

title

A description for this group.

filters

An instance of the Filters class (see section 4.17.1) that provides information about the desired I/O filters applicable to the leaves that hang directly from this new group (unless other filter properties are specified for these leaves). Child groups for which you do not specify filter properties will also inherit these ones.

4.2.2.2. createTable(where, name, description, title='', filters=None, expectedrows=10000)

Create a new Table instance with name name in where location. See section 4.6 for a description of the Table class.

where

The parent group from which the new table will hang. The where parameter can be a path string (for example "/level1/leaf5"), or a Group instance.

name

The name of the new table.

description

This is an object that describes the table, that is, how many columns it has and the properties of each column (the type, the shape, etc.), as well as other table properties.

description can be any of the following objects:

A user-defined class

This should inherit from the IsDescription class (see 4.16.1) where table fields are specified.

A dictionary

Useful, for example, when you do not know the structure of your table beforehand. See section 3.4 for an example of use.

A RecArray

This object from the numarray package is also accepted, and all the information about columns and other metadata is used as a basis to create the Table object. Moreover, if the RecArray has actual data, this is also injected into the newly created Table object.

A NestedRecArray

Finally, if you want to have nested columns in your table, you can use this object (see appendix B), and all the information about columns and other metadata is used as a basis to create the Table object. Moreover, if the NestedRecArray has actual data, this is also injected into the newly created Table object.

title

A description for this object.

filters

An instance of the Filters class (see section 4.17.1) that provides information about the desired I/O filters to be applied during the life of this object.

expectedrows

A user estimate of the number of records that will be in the table. If not provided, the default value is appropriate for tables of up to about 10 MB in size. If you plan to save bigger tables, you should provide an estimate; this will optimize the time and memory used by the HDF5 B-Tree creation and management process. See section 5.1 for a discussion of this issue.

4.2.2.3. createArray(where, name, object, title='')

Create a new Array instance with name name in where location. See section 4.10 for a description of the Array class.

object

The regular array to be saved. Currently accepted values are: NumPy, Numeric, numarray arrays (including CharArray string numarrays) or other native Python types, provided that they are regular (i.e. they are not like [[1,2],2]) and homogeneous (i.e. all the elements are of the same type). Also, objects that have some of their dimensions equal to zero are not supported (use an EArray object if you want to create an array with one of its dimensions equal to 0).

See the createTable() description (4.2.2) for more information on the where, name and title parameters.
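
The "regular" and "homogeneous" requirements on object can be made concrete with a small check over nested Python sequences. This is a sketch only: the helper names shape_of, scalar_types and looks_storable are hypothetical, the homogeneity test is deliberately strict (it compares exact Python types, whereas a real conversion might coerce mixed numeric types), and it says nothing about NumPy, Numeric or numarray inputs.

```python
def shape_of(obj):
    """Return the shape of a nested list/tuple, or None if it is ragged."""
    if not isinstance(obj, (list, tuple)):
        return ()                          # a scalar has an empty shape
    child_shapes = [shape_of(item) for item in obj]
    if None in child_shapes or len(set(child_shapes)) > 1:
        return None                        # siblings differ in shape
    inner = child_shapes[0] if child_shapes else ()
    return (len(obj),) + inner


def scalar_types(obj):
    """Return the set of Python types of all the scalars inside obj."""
    if not isinstance(obj, (list, tuple)):
        return set([type(obj)])
    found = set()
    for item in obj:
        found |= scalar_types(item)
    return found


def looks_storable(obj):
    """True if obj is regular, homogeneous and has no zero-length dimension."""
    shape = shape_of(obj)
    return (shape is not None
            and 0 not in shape
            and len(scalar_types(obj)) == 1)


regular = looks_storable([[1, 2], [3, 4]])      # regular and homogeneous
ragged = looks_storable([[1, 2], 2])            # the irregular case from the text
mixed = looks_storable([[1, 2.0], [3, 4]])      # int and float mixed
empty = looks_storable([])                      # one dimension equal to zero
```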

4.2.2.4. createCArray(where, name, shape, atom, title='', filters=None)

Create a new CArray instance with name name in where location. See section 4.11 for a description of the CArray class.

shape

The shape of the objects to be saved.

atom

An Atom instance representing the shape, type and flavor of the chunk of the objects to be saved.

See the createTable() description (4.2.2) for more information on the where, name, title and filters parameters.

4.2.2.5. createEArray(where, name, atom, title='', filters=None, expectedrows=1000)

Create a new EArray instance with name name in where location. See section 4.12 for a description of the EArray class.

atom

An Atom instance representing the shape, type and flavor of the atomic objects to be saved. One (and only one) of the shape dimensions must be 0. The dimension that is 0 is the one along which the resulting EArray object can be extended. Multiple enlargeable dimensions are not currently supported. See section 4.16.3 for the supported set of Atom class descendants.

expectedrows

For enlargeable arrays, this is a user estimate of the number of row elements that will be added to the growable dimension of the EArray object. If not provided, the default value is 1000 rows. If you plan to create EArrays that are either much smaller or much bigger, try providing an estimate; this will optimize the HDF5 B-Tree creation and management process time and the amount of memory used.

See the createTable() description (4.2.2) for more information on the where, name, title and filters parameters.

4.2.2.6. createVLArray(where, name, atom=None, title='', filters=None, expectedsizeinMB=1.0)

Create a new VLArray instance with name name in where location. See section 4.13 for a description of the VLArray class.

atom

An Atom instance representing the shape, type and flavor of the atomic object to be saved. See section 4.16.3 for the supported set of Atom class descendants.

expectedsizeinMB

A user estimate of the size (in MB) of the final VLArray object. If not provided, the default value is 1 MB. If you plan to create VLArrays that are either much smaller or much bigger, try providing an estimate; this will optimize the HDF5 B-Tree creation and management process time and the amount of memory used.

See the createTable() description (4.2.2) for more information on the where, name, title and filters parameters.

4.2.2.7. getNode(where, name=None, classname=None)

Get the node under where with the given name.

where can be a Node instance or a path string leading to a node. If no name is specified, that node is returned.

If a name is specified, this must be a string with the name of a node under where. In this case the where argument can only lead to a Group instance (else a TypeError is raised). The node called name under the group where is returned.

In both cases, if the node to be returned does not exist, a NoSuchNodeError is raised. Note that hidden nodes are also considered.

If the classname argument is specified, it must be the name of a class derived from Node. If the node is found but it is not an instance of that class, a NoSuchNodeError is also raised.

4.2.2.8. isVisibleNode(path)

Is the node under path visible?

If the node does not exist, a NoSuchNodeError is raised.

4.2.2.9. getNodeAttr(where, attrname, name=None)

Returns the attribute attrname under where.name location.

where, name

These arguments work as in getNode() (see 4.2.2.7), referencing the node to be acted upon.

attrname

The name of the attribute to get.

4.2.2.10. setNodeAttr(where, attrname, attrvalue, name=None)

Sets the attribute attrname with value attrvalue under where.name location. If the node already has a large number of attributes, a PerformanceWarning will be issued.

where, name

These arguments work as in getNode() (see 4.2.2.7), referencing the node to be acted upon.

attrname

The name of the attribute to set on disk.

attrvalue

The value of the attribute to set. Any kind of Python object (like strings, ints, floats, lists, tuples, dicts, small Numeric/NumPy/numarray objects...) can be stored as an attribute. When necessary, (c)Pickle is used automatically to serialize the objects that you want to save (see 4.15 for details).

4.2.2.11. delNodeAttr(where, attrname, name=None)

Delete the attribute attrname in where.name location.

where, name

These arguments work as in getNode() (see 4.2.2.7), referencing the node to be acted upon.

attrname

The name of the attribute to delete on disk.

4.2.2.12. copyNodeAttrs(where, dstnode, name=None)

Copy the attributes from node where.name to dstnode.

where, name

These arguments work as in getNode() (see 4.2.2.7), referencing the node to be acted upon.

dstnode

This is the destination node where the attributes will be copied. It can be either a path string or a Node object.

4.2.2.13. iterNodes(where, classname=None)

Returns an iterator yielding the child nodes hanging from where. These nodes are sorted alpha-numerically by node name.

where

This argument works as in getNode() (see 4.2.2.7), referencing the node to be acted upon.

classname

If the name of a class derived from Node is supplied in the classname parameter, only instances of that class (or subclasses of it) will be returned.

4.2.2.14. listNodes(where, classname=None)

Returns a list of the child nodes hanging from where. The list is sorted alpha-numerically by node name.

where

This argument works as in getNode() (see 4.2.2.7), referencing the node to be acted upon.

classname

If the name of a class derived from Node is supplied in the classname parameter, only instances of that class (or subclasses of it) will be returned.

4.2.2.15. removeNode(where, name=None, recursive=False)

Removes the node called name under the where location.

where, name

These arguments work as in getNode() (see 4.2.2.7), referencing the node to be acted upon.

recursive

If not supplied, the object will be removed only if it has no children; if it does, a NodeError will be raised. If supplied with a true value, the object and all its descendants will be completely removed.

4.2.2.16. copyNode(where, newparent=None, newname=None, name=None, overwrite=False, recursive=False, **kwargs)

Copy the node specified by where and name to newparent/newname.

where, name

These arguments work as in getNode() (see 4.2.2.7), referencing the node to be acted upon.

newparent

The destination group that the node will be copied to (a path name or a Group instance). If newparent is None, the parent of the source node is selected as the new parent.

newname

The name to be assigned to the new copy in its destination (a string). If newname is None or not specified, the name of the source node is used.

overwrite

Whether the possibly existing node newparent/newname should be overwritten or not. Note that trying to copy over an existing node without overwriting it will raise a NodeError.

recursive

Specifies whether the copy should recurse into children of the copied node. This argument is ignored for leaf nodes. The default is not to recurse.

kwargs

Additional keyword arguments may be passed to customize the copying process. The supported arguments depend on the kind of node being copied. The following are some of them:

title

The new title for the destination. If None, the original title is used. This only applies to the topmost node for recursive copies.

filters

Specifying this parameter overrides the original filter properties in the source node. If specified, it must be an instance of the Filters class (see section 4.17.1). The default is to copy the filter attribute from the source node.

copyuserattrs

You can prevent the user attributes from being copied by setting this parameter to False. The default is to copy them.

start, stop, step

Specify the range of rows in child leaves to be copied; the default is to copy all the rows.

stats

This argument may be used to collect statistics on the copy process. When used, it should be a dictionary with keys groups, leaves and bytes having a numeric value. Their values will be incremented to reflect the number of groups, leaves and bytes, respectively, that have been copied in the operation.
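
As an illustration of the stats argument, the sketch below wraps a recursive copyNode() call and returns the collected counters. The helper name copy_tree_with_stats is hypothetical; it only relies on the copyNode() signature and the groups, leaves and bytes keys described above, and h5file is assumed to be an open, writable File instance.

```python
def copy_tree_with_stats(h5file, where, newparent, newname=None):
    """Recursively copy a node and report what was copied.

    Returns a dictionary with the number of groups, leaves and bytes
    copied, as accumulated by copyNode() in its stats argument.
    """
    # The counters must exist beforehand; copyNode() increments them.
    stats = {"groups": 0, "leaves": 0, "bytes": 0}
    h5file.copyNode(where, newparent=newparent, newname=newname,
                    recursive=True, stats=stats)
    return stats
```

Because the values are incremented rather than reset, the same dictionary could also be passed to several copy operations to obtain running totals.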

4.2.2.17. renameNode(where, newname, name=None)

Change the name of the node specified by where and name to newname.

where, name

These arguments work as in getNode() (see 4.2.2.7), referencing the node to be acted upon.

newname

The new name to be assigned to the node (a string).

4.2.2.18. moveNode(where, newparent=None, newname=None, name=None, overwrite=False)

Move the node specified by where and name to newparent/newname.

where, name

These arguments work as in getNode() (see 4.2.2.7), referencing the node to be acted upon.

newparent

The destination group the node will be moved to (a path name or a Group instance). If newparent is None, the original node parent is selected as the new parent.

newname

The new name to be assigned to the node in its destination (a string). If newname is None or not specified, the original node name is used.

4.2.2.19. walkGroups(where='/')

Iterator that returns the list of Groups (not Leaves) hanging from (and including) where. The where Group is listed first (pre-order), then each of its child Groups (following an alpha-numerical order) is also traversed, following the same procedure. If where is not supplied, the root object is used.

where

The origin group. Can be a path string or Group instance.
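
The traversal order described for walkGroups() (the starting group first, then each child group in name order, recursively) can be sketched on a toy tree of nested dictionaries. This is not PyTables code: the toy_walk_groups function and the toy_tree layout are made up for illustration, dictionaries stand in for groups and None stands in for leaves, and Python's default string ordering is assumed to approximate the alpha-numerical order.

```python
def toy_walk_groups(tree, path="/"):
    """Yield group paths in pre-order, visiting children in name order."""
    yield path                                   # the origin group comes first
    for name in sorted(tree):
        child = tree[name]
        if isinstance(child, dict):              # dictionaries stand in for groups
            prefix = path if path.endswith("/") else path + "/"
            for subpath in toy_walk_groups(child, prefix + name):
                yield subpath


toy_tree = {
    "tuple0": None,                              # a leaf, skipped by the walk
    "group0": {
        "tuple1": None,
        "group1": {"tuple2": None, "group2": {}},
    },
    "agroup": {},
}

order = list(toy_walk_groups(toy_tree))
# order == ['/', '/agroup', '/group0', '/group0/group1',
#           '/group0/group1/group2']
```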

4.2.2.20. walkNodes(where="/", classname="")

Recursively iterate over the nodes in the File instance. It takes two parameters:

where

If supplied, the iteration starts from (and includes) this group.

classname

(String) If supplied, only instances of this class are returned.

Example of use:


    # Recursively print all the nodes hanging from '/detector'
    print("Nodes hanging from group '/detector':")
    for node in h5file.walkNodes("/detector"):
        print(node)

4.2.2.21. copyChildren(srcgroup, dstgroup, overwrite=False, recursive=False, **kwargs)

Copy the children of a group into another group.

This method copies the nodes hanging from the source group srcgroup into the destination group dstgroup. Existing destination nodes can be replaced by setting the overwrite argument to true. If the recursive argument is true, all descendant nodes of srcgroup are recursively copied.

kwargs takes keyword arguments used to customize the copying process. See the documentation of Group._f_copyChildren() (see 4.4.2) for a description of those arguments.

4.2.2.22. copyFile(dstfilename, overwrite=False, **kwargs)

Copy the contents of this file to dstfilename.

dstfilename must be a path string indicating the name of the destination file. If it already exists, the copy will fail with an IOError, unless the overwrite argument is true, in which case the destination file will be overwritten in place. In that case, the destination file should be closed beforehand; otherwise, errors will occur.

Additional keyword arguments may be passed to customize the copying process. For instance, title and filters may be changed, user attributes may or may not be copied, data may be sub-sampled, stats may be collected, etc. Arguments unknown to nodes are simply ignored. Check the documentation for copying operations of nodes to see which options they support.

Copying a file usually has the beneficial side effect of creating a more compact and cleaner version of the original file.

4.2.2.23. flush()

Flush all the leaves in the object tree.

4.2.2.24. close()

Flush all the leaves in the object tree and close the file.

4.2.2.25. Undo/Redo support

isUndoEnabled()

Is the Undo/Redo mechanism enabled?

Returns True if the Undo/Redo mechanism has been enabled for this file, False otherwise. Note that this mechanism is persistent, so a newly opened PyTables file may already have Undo/Redo support.

enableUndo(filters=Filters(complevel=1))

Enable the Undo/Redo mechanism.

This operation prepares the database for undoing and redoing modifications in the node hierarchy. This allows mark(), undo(), redo() and other methods to be called.

The filters argument, when specified, must be an instance of class Filters (see section 4.17.1) and is meant for setting the compression values for the action log. The default is having compression enabled, as the gains in terms of space can be considerable. You may want to disable compression if you want maximum speed for Undo/Redo operations.

Calling enableUndo() when the Undo/Redo mechanism is already enabled raises an UndoRedoError.

disableUndo()

Disable the Undo/Redo mechanism.

Disabling the Undo/Redo mechanism leaves the database in the current state and forgets past and future database states. This makes mark(), undo(), redo() and other methods fail with an UndoRedoError.

Calling disableUndo() when the Undo/Redo mechanism is already disabled raises an UndoRedoError.

mark(name=None)

Mark the state of the database.

Creates a mark for the current state of the database. A unique (and immutable) identifier for the mark is returned. An optional name (a string) can be assigned to the mark. Both the identifier of a mark and its name can be used in undo() and redo() operations. When the name has already been used for another mark, an UndoRedoError is raised.

This method can only be called when the Undo/Redo mechanism has been enabled. Otherwise, an UndoRedoError is raised.

getCurrentMark()

Get the identifier of the current mark.

Returns the identifier of the current mark. This can be used to know the state of a database after an application crash, or to get the identifier of the initial implicit mark after a call to enableUndo().

This method can only be called when the Undo/Redo mechanism has been enabled. Otherwise, an UndoRedoError is raised.

undo(mark=None)

Go to a past state of the database.

Returns the database to the state associated with the specified mark. Both the identifier of a mark and its name can be used. If the mark is omitted, the last created mark is used. If there are no past marks, or the specified mark is not older than the current one, an UndoRedoError is raised.

This method can only be called when the Undo/Redo mechanism has been enabled. Otherwise, an UndoRedoError is raised.

redo(mark=None)

Go to a future state of the database.

Returns the database to the state associated with the specified mark. Both the identifier of a mark and its name can be used. If the mark is omitted, the next created mark is used. If there are no future marks, or the specified mark is not newer than the current one, an UndoRedoError is raised.

This method can only be called when the Undo/Redo mechanism has been enabled. Otherwise, an UndoRedoError is raised.

goto(mark)

Go to a specific mark of the database.

Returns the database to the state associated with the specified mark. Both the identifier of a mark and its name can be used.

This method can only be called when the Undo/Redo mechanism has been enabled. Otherwise, an UndoRedoError is raised.
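
One possible way to combine these methods is a small "apply or roll back" helper, sketched below. The function name apply_or_rollback and the change callable are hypothetical; the helper only uses isUndoEnabled(), enableUndo(), mark() and goto() as described above, and h5file is assumed to be an open, writable File instance. Keep in mind that only hierarchy and attribute operations are rolled back, since data handling operations are not covered by the Undo/Redo mechanism.

```python
def apply_or_rollback(h5file, change):
    """Run change(h5file); if it fails, return to the state before it.

    Returns the identifier of the mark set before the change, which can
    later be passed to undo() or goto().
    """
    if not h5file.isUndoEnabled():
        # enableUndo() raises an UndoRedoError if called twice,
        # so it is only called when the mechanism is disabled.
        h5file.enableUndo()
    checkpoint = h5file.mark()
    try:
        change(h5file)
    except Exception:
        # Return to the state recorded by the mark, then re-raise.
        h5file.goto(checkpoint)
        raise
    return checkpoint
```

Using goto() rather than undo() here is a design choice of the sketch: it avoids having to reason about whether the mark is older than the current one when change() fails before modifying anything.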

4.2.3. File special methods

The following methods are triggered automatically when a File instance is accessed in a special way.

4.2.3.1. __contains__(path)

Is there a node with that path?

Returns True if the file has a node with the given path (a string), False otherwise.

4.2.3.2. __iter__()

Recursively iterate over the nodes in the File instance. Unlike walkNodes(), this method does not accept parameters.

Example of use:


    # Recursively list all the nodes in the object tree
    import tables

    h5file = tables.openFile("vlarray1.h5")
    print("All nodes in the object tree:")
    for node in h5file:
        print(node)

4.2.3.3. __str__()

Returns a short description of the File object as a string.

Example of use:


>>> f=tables.openFile("data/test.h5")
>>> print f
data/test.h5 (File) 'Table Benchmark'
Last modif.: 'Mon Sep 20 12:40:47 2004'
Object Tree:
/ (Group) 'Table Benchmark'
/tuple0 (Table(100L,)) 'This is the table title'
/group0 (Group) ''
/group0/tuple1 (Table(100L,)) 'This is the table title'
/group0/group1 (Group) ''
/group0/group1/tuple2 (Table(100L,)) 'This is the table title'
/group0/group1/group2 (Group) ''
	    

4.2.3.4. __repr__()

Returns a detailed description of the File object as a string.