PyTables User's Guide: Hierarchical datasets in Python - Release 1.3.2
Chapter 6. FileNode - simulating a filesystem with PyTables
The FileNode module is part of the nodes sub-package of PyTables. The recommended way to import the module is:
>>> from tables.nodes import FileNode
However, FileNode exports very few symbols, so you can import * for interactive usage. In fact, you will most probably only use the NodeType constant and the newNode() and openNode() calls.
The NodeType constant contains the value that the NODE_TYPE system attribute of a node file is expected to contain ('file', as we have seen). Although this is not expected to change, you should use FileNode.NodeType instead of the literal 'file' when possible.
newNode() and openNode() are the equivalents of the Python file() call (alias open()) for ordinary files. Their arguments differ from those of file(), but this is the only point where you will notice a difference between working with a node file and working with an ordinary file.
For this little tutorial, we will assume that we have a PyTables database opened for writing. Also, if you would rather not type the statements yourself, the code that we are going to explain is included in the examples/filenodes1.py file.
You can create a brand new file with these statements:
>>> import tables
>>> h5file = tables.openFile('fnode.h5', 'w')
A new file node is created with the newNode() call. You must tell it in which PyTables file you want to create the node, where in the PyTables hierarchy you want to create it, and what its name will be. The PyTables file is the first argument to newNode(); it is also called the 'host PyTables file'. The other two arguments must be given as the keyword arguments where and name, respectively. The call returns a brand new file node object that is both appendable and readable.
So let us create a new node file in the previously opened h5file PyTables file, named 'fnode_test' and placed right under the root of the database hierarchy. This is the command:
>>> fnode = FileNode.newNode(h5file, where='/', name='fnode_test')
That is basically all you need to create a file node. Simple, isn't it? From that point on, you can use fnode like any open Python file (that is, you can write data, read data, read lines of text and so on).
newNode() accepts some more keyword arguments. You can give a title to your file with the title argument. You can use PyTables' compression features with the filters argument. If you know beforehand the size that your file will have, you can pass its final size in bytes as the expectedsize argument so that the PyTables library can optimize data access.
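As an illustration of the optional arguments, the following sketch derives a title and an expectedsize value from an ordinary file on disk, as one might do before copying that file into a node. The helper name new_node_options and the scratch file are hypothetical and not part of FileNode; only the keyword names title and expectedsize come from the text above.

```python
import os
import tempfile

def new_node_options(src_path):
    """Build keyword arguments for newNode() from an ordinary file on disk.

    Hypothetical helper: only the keyword names are taken from the guide.
    """
    return {
        'title': os.path.basename(src_path),        # used as the node title
        'expectedsize': os.path.getsize(src_path),  # final size in bytes
    }

# Create a small scratch file so that the helper has something to inspect.
fd, scratch_path = tempfile.mkstemp(suffix='.txt')
os.write(fd, b"0123456789")
os.close(fd)

options = new_node_options(scratch_path)
os.remove(scratch_path)

# With an open host file, the options would be passed on like this
# (not executed here, since it needs PyTables):
#   fnode = FileNode.newNode(h5file, where='/', name='scratch', **options)
```

A filters entry could be added to the same dictionary when compression is wanted.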
newNode() creates a PyTables node where it is told to. To prove it, we will try to get the NODE_TYPE attribute from the newly created node.
>>> print h5file.getNodeAttr('/fnode_test', 'NODE_TYPE')
file
As stated above, you can use the new node file like any other open file. Let us try to write some text into it and read it back.
>>> print >> fnode, "This is a test text line."
>>> print >> fnode, "And this is another one."
>>> print >> fnode
>>> fnode.write("Of course, file methods can also be used.")
>>>
>>> fnode.seek(0)  # Go back to the beginning of file.
>>>
>>> for line in fnode:
...     print repr(line)
'This is a test text line.\n'
'And this is another one.\n'
'\n'
'Of course, file methods can also be used.'
This was run on a Unix system, so newlines are expressed as '\n'. In fact, you can override the line separator for a file by setting its lineSeparator property to any string you want.
While using a file node, you should take care to close it before you close the PyTables host file. Because of the way PyTables works, your data will not be at risk, but every operation you execute on the file node after closing the host file will fail with a ValueError. To close a file node, simply delete the corresponding reference to it or call its close() method.
>>> fnode.close()
>>> print fnode.closed
True
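One way to make sure that a file node is always closed before its host file is to wrap the writes in try/finally. The sketch below is only an illustration: write_and_close is a hypothetical helper, and an io.BytesIO object stands in for the file node so that the code runs without PyTables. It relies solely on the write(), close() and closed members shown above.

```python
import io

def write_and_close(fnode, chunks):
    """Write every chunk to fnode, then close it even if a write fails."""
    try:
        for chunk in chunks:
            fnode.write(chunk)
    finally:
        fnode.close()

# Stand-in for a file node: any object with write() and close() works here.
stand_in = io.BytesIO()
write_and_close(stand_in, [b"first chunk\n", b"second chunk\n"])

# Intended order with a real file node and host file (not executed here):
#   write_and_close(fnode, chunks)
#   h5file.close()
```

After the call, stand_in.closed is true, which mirrors the fnode.closed check in the session above.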
If you have a file node that you created using newNode(), you can open it later by calling openNode(). Its arguments are similar to those of file() or open(): the first argument is the PyTables node that you want to open (i.e. a node with a NODE_TYPE attribute having a 'file' value), and the second argument is a mode string indicating how to open the file. Contrary to file(), openNode() cannot be used to create a new file node.
File nodes can be opened in read-only mode ('r') or in read-and-append mode ('a+'). Reading from a file node is allowed in both modes, but appending is only allowed in the second one. Just as with Python files, data written to an appendable file is placed at the file pointer if the pointer is at or beyond the end of the file, and after the existing data otherwise. Let us see an example:
>>> node = h5file.root.fnode_test
>>> fnode = FileNode.openNode(node, 'a+')
>>> print repr(fnode.readline())
'This is a test text line.\n'
>>> print fnode.tell()
26
>>> print >> fnode, "This is a new line."
>>> print repr(fnode.readline())
''
Of course, the data append process places the pointer at the end of the file, so the last readline() call hit EOF. Let us seek to the beginning of the file to see the whole contents of our file.
>>> fnode.seek(0)
>>> for line in fnode:
...     print repr(line)
'This is a test text line.\n'
'And this is another one.\n'
'\n'
'Of course, file methods can also be used.This is a new line.\n'
As you can check, the last string we wrote was correctly appended at the end of the file instead of overwriting the second line, where the file pointer was positioned at the time of the append.
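For comparison, the same append behaviour can be reproduced with an ordinary file opened in 'a+' mode. This sketch does not use FileNode at all. The function name and the scratch file are hypothetical, and the result shown in the comment is what common platforms such as POSIX systems produce.

```python
import os
import tempfile

def append_after_partial_read(path):
    """Read one line from an 'a+' file, then write while the pointer is mid-file."""
    with open(path, 'wb') as f:
        f.write(b"first line\nsecond line\n")
    with open(path, 'a+b') as f:
        f.seek(0)                    # start reading from the beginning
        first = f.readline()         # pointer now sits before "second line"
        f.write(b"appended line\n")  # written in append mode
    with open(path, 'rb') as f:      # reopen to read the final contents
        return first, f.read()

fd, demo_path = tempfile.mkstemp()
os.close(fd)
first, contents = append_after_partial_read(demo_path)
os.remove(demo_path)

# On POSIX systems the appended line ends up last:
#   contents == b"first line\nsecond line\nappended line\n"
```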
You can associate arbitrary metadata with any open node file, regardless of its mode, as long as the host PyTables file is writable. Of course, you could use the setNodeAttr() method of tables.File to do it directly on the proper node, but FileNode offers a much more comfortable way to do it. FileNode objects have an attrs property which gives you direct access to their corresponding AttributeSet object.
For instance, let us see how to associate MIME type metadata with our file node:
>>> fnode.attrs.content_type = 'text/plain; charset=us-ascii'
As simple as A-B-C. You can put nearly anything in an attribute, which opens the way to authorship, keywords, permissions and more. Moreover, there is no fixed list of attributes. However, you should avoid names in all caps or starting with '_', since PyTables and FileNode may use them internally. Some valid examples:
>>> fnode.attrs.author = "Ivan Vilata i Balaguer"
>>> fnode.attrs.creation_date = '2004-10-20T13:25:25+0200'
>>> fnode.attrs.keywords_en = ["FileNode", "test", "metadata"]
>>> fnode.attrs.keywords_ca = ["FileNode", "prova", "metadades"]
>>> fnode.attrs.owner = 'ivan'
>>> fnode.attrs.acl = {'ivan': 'rw', '@users': 'r'}
You can check that these attributes get stored by running the ptdump command on the host PyTables file:
$ ptdump -a fnode.h5:/fnode_test
/fnode_test (EArray(113,)) ''
/fnode_test.attrs (AttributeSet), 14 attributes:
   [CLASS := 'EARRAY',
    EXTDIM := 0,
    FLAVOR := 'numarray',
    NODE_TYPE := 'file',
    NODE_TYPE_VERSION := 2,
    TITLE := '',
    VERSION := '1.2',
    acl := {'ivan': 'rw', '@users': 'r'},
    author := 'Ivan Vilata i Balaguer',
    content_type := 'text/plain; charset=us-ascii',
    creation_date := '2004-10-20T13:25:25+0200',
    keywords_ca := ['FileNode', 'prova', 'metadades'],
    keywords_en := ['FileNode', 'test', 'metadata'],
    owner := 'ivan']
Note that FileNode makes no assumptions about the meaning of your metadata, so its handling is entirely left to your needs and imagination.
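Since interpreting the metadata is up to the application, here is a minimal sketch of one possible way to read back the content_type value stored earlier. It uses the standard library's email.message.Message to split the MIME string. The helper name parse_content_type is hypothetical, and nothing here is part of FileNode.

```python
from email.message import Message

def parse_content_type(value):
    """Split a MIME content type string into (type, charset)."""
    msg = Message()
    msg['Content-Type'] = value
    return msg.get_content_type(), msg.get_content_charset()

# With a real file node the argument would be fnode.attrs.content_type.
mime_type, charset = parse_content_type('text/plain; charset=us-ascii')
```

The returned charset could then be used to decode the text read from the node.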