Mark Johnson, 5 Jul 2001
This document mostly consists of user-oriented excerpts from the original README document.
These Java classes implement the OASIS Entity Management Catalog format as well as an XML Catalog format for resolving XML public identifiers into accessible files or resources on a user's system or throughout the Web. These definitions can easily be incorporated into most Java-based XML processors, thereby giving the users of these processors all the benefits of public identifier use.
For more information, see also the Standard Deviations from Norm column "If You Can Name It, You Can Claim It!"
apt-get install arbortext-catalog
This section provides a very brief overview of the classes. For more complete information, see the API Documentation.
The sample applications demonstrate some of the features of Catalogs. Each of the examples that follows assumes that your current working directory is the directory where you unpacked the catalog distribution.
The catalog program parses one or more Catalog files and performs a single lookup based on catalog keywords. Running catalog with no arguments will display a summary of the following usage information.
    Usage: catalog [options] command

    The catalog program parses one or more Catalog files and performs a
    single lookup of a public or system identifier. Running catalog with
    no arguments will display a summary of this usage information.

    Options:
      -h                Print this help message
      -E                Show usage examples
      -g                Use GNU gij java interpreter
      -c <catalogfile>  Can be repeated to load several catalogs
      -d <debuglevel>   Parsing verbosity, an integer in [0-3]
      -p <parserClass>  Name of a parser class for reading Cowan XML Catalogs
      -s                Load system catalogs & give them a higher search
                        precedence than catalogs specified via -c <catalogfile>

    Note: to use the -p option, the relevant class files needed by
    <parserClass> must be in your CLASSPATH

    Commands take one of the following forms:
      document
      doctype name publicid systemid
      entity name publicid systemid
      notation name publicid systemid
      public publicid systemid
      system systemid

    Arguments are positional, use the string "null" to indicate a null value.

    'catalog' usage examples:
    =========================
    --Input:
    catalog -s public "-//OASIS//DTD DocBook MathML Module V1.0//EN"
    --Output:
    Loading system catalogs.
    Set debug to: 0
    Resolving public:
      Public: -//OASIS//DTD DocBook MathML Module V1.0//EN
      System: null
    Resolved: file:/usr/share/sgml/docbook/custom/mathml/1.0/dbmathml.dtd
    ---------------------------------------------------------------------
    --Input:
    catalog -s -d 0 \
      system "http://www.oasis-open.org/docbook/xml/mathml/1.0/dbmathml.dtd"
    --Output:
    Loading system catalogs.
    Set debug to: 0
    Resolving system:
      Public: null
      System: http://www.oasis-open.org/docbook/xml/mathml/1.0/dbmathml.dtd
    Resolved: file:/usr/share/sgml/docbook/custom/mathml/1.0/dbmathml.dtd
    ---------------------------------------------------------------------
    --Input:
    catalog -c http://oasis-open.org/docbook/xml/4.1.2/docbook.cat \
      public "-//OASIS//DTD DocBook XML V4.1.2//EN"
    --Output:
    Ignoring system catalogs.
    Set debug to: 0
    Adding catalog: http://oasis-open.org/docbook/xml/4.1.2/docbook.cat
    Resolving public:
      Public: -//OASIS//DTD DocBook XML V4.1.2//EN
      System: null
    Resolved: http://oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
In the following example, catalog loads the OASIS Catalog file test/catalog, looks up the requested public identifier, and displays the resulting system identifier.
$ catalog -c test/catalog public "-//Arbortext//TEXT Test Public Identifier//EN"
(with the whole command on a single line, naturally).
    $ catalog -d 0 -c /etc/sgml/catalog public "-//OASIS//DTD DocBook XML V4.1.2//EN"
    Ignoring system catalogs.
    Set debug to: 0
    Adding catalog: /etc/sgml/catalog
    Resolving PUBLIC:
      Public: -//OASIS//DTD DocBook XML V4.1.2//EN
      System: null
    Resolved: file:/usr/share/sgml/docbook/dtd/xml/4.1.2/docbookx.dtd
There are a number of options that you can pass to the catalog program:
catalog command line options

Option | Example | Description |
---|---|---|
-c catalogfile | -c test/catalog | Load the specified catalog file. |
-d debuglevel | -d 1 | Set the debug level, an integer from 0 to 3; the default debug level is 0. |
-p parserClass | -p org.apache.xerces.parsers.SAXParser | Select the SAX Parser to use to parse XML Catalog files. |
-s | -s | Load system catalogs. |
Running catalog with no arguments will display a summary of this usage information.
Note: in order to use the -p option, you will need to have the relevant class files for the parser class that you select on your CLASSPATH. In the example above, the Xerces parser from the Apache XML Project would be required. You can use any SAX compliant parser with the Catalog files.
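The CLASSPATH requirement can be checked before passing a class name to -p. The following is a minimal sketch, not part of the catalog distribution: the class ParserClassCheck and its isLoadable method are hypothetical names, and the check relies only on standard JDK class loading. It confirms that the class can be found, not that it is a working SAX parser.

```java
// Hypothetical helper, not part of the catalog distribution: it only checks
// whether a class name can be loaded from the current CLASSPATH.
final class ParserClassCheck {
    private ParserClassCheck() {}

    /** Returns true if the named class can be found and loaded. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: locate the class without running its static initializers.
            Class.forName(className, false, ParserClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the parser class used in the -p example.
        String name = args.length > 0 ? args[0] : "org.apache.xerces.parsers.SAXParser";
        System.out.println(name + (isLoadable(name) ? ": found" : ": not found on CLASSPATH"));
    }
}
```

Run it with the same CLASSPATH you intend to use for catalog; a "not found" result suggests the parser's class files still need to be added.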
The eresolve application demonstrates the use of a CatalogEntityResolver class as a SAX entityResolver hook.
$ eresolve -c test/catalog test/test.xml
(with the whole command on a single line, naturally).
    $ eresolve -d 2 -c test/catalog test/test.xml
    Set debug to 2
    Adding catalog: test/catalog
    Loading catalog: test/catalog
    Parsing test/test.xml
    Resolved: -//Arbortext//TEXT Test Public Identifier//EN
        file:/N:/viewstores/nwalsh_saffron/Epic/src/xml/catalog/test/testpub.xml
    Resolved: urn:x-arbortext:test-system-identifier
        file:/N:/viewstores/nwalsh_saffron/Epic/src/xml/catalog/test/testsys.xml
    Done parsing test/test.xml
Brief descriptions of the sample files:
This is a Catalog with a few simple entries:
    OVERRIDE YES
    PUBLIC "-//Arbortext//TEXT Test Public Identifier//EN" "testpub.xml"
    SYSTEM "urn:x-arbortext:test-system-identifier" "testsys.xml"
    OVERRIDE NO
    PUBLIC "-//Arbortext//TEXT Test Override//EN" "override.xml"
This is a test document that contains several external entities:
    <!DOCTYPE test [
    <!ENTITY testpub PUBLIC "-//Arbortext//TEXT Test Public Identifier//EN" "bogus-system-identifier.xml">
    <!ENTITY testsys SYSTEM "urn:x-arbortext:test-system-identifier">
    <!ENTITY testovr PUBLIC "-//Arbortext//TEXT Test Override//EN" "testovr.xml">
    ]>
    <test>
    &testpub;
    &testsys;
    &testovr;
    </test>
This XML document demonstrates several Catalog features:
If parsed without a catalog, the parse will fail since bogus-system-identifier.xml won't be found (and neither would the URN, unless you happen to have some other URN resolution mechanism running).
If parsed with the included catalog, the following substitutions will be made:
&testpub; will be replaced with the contents of testpub.xml, due to the mapping provided by the first PUBLIC entry in the catalog.
&testsys; will be replaced with the contents of testsys.xml, due to the mapping provided by the SYSTEM entry in the catalog.
&testovr; will be replaced with the contents of testovr.xml, due to the system identifier given in its entity declaration; the mapping provided by the second PUBLIC entry in the catalog is not used because the entity declaration did provide a system identifier and the matching public identifier occurs where OVERRIDE is NO.
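The three substitutions above follow from two rules: SYSTEM entries are matched against the declared system identifier, and a PUBLIC entry applies only when OVERRIDE is YES or the declaration gives no system identifier. The sketch below models just those rules so they can be tried in isolation. It is a simplified illustration, not the distribution's implementation: MiniCatalog and all of its methods are hypothetical names, and it ignores base URIs and every other catalog entry type.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of catalog lookup. It is not the
// distribution's implementation and ignores base URIs and other entry types.
final class MiniCatalog {
    private static final class Entry {
        final boolean isPublic;  // true for PUBLIC, false for SYSTEM
        final boolean override;  // OVERRIDE setting in effect at this entry
        final String id;         // public or system identifier to match
        final String target;     // replacement system identifier

        Entry(boolean isPublic, boolean override, String id, String target) {
            this.isPublic = isPublic;
            this.override = override;
            this.id = id;
            this.target = target;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    MiniCatalog addPublic(boolean override, String publicId, String target) {
        entries.add(new Entry(true, override, publicId, target));
        return this;
    }

    MiniCatalog addSystem(String systemId, String target) {
        entries.add(new Entry(false, false, systemId, target));
        return this;
    }

    /** Returns the mapped target, or the declared system identifier if no entry applies. */
    String resolve(String publicId, String systemId) {
        // SYSTEM entries are tried first, against the declared system identifier.
        if (systemId != null) {
            for (Entry e : entries) {
                if (!e.isPublic && e.id.equals(systemId)) return e.target;
            }
        }
        // A PUBLIC entry applies when OVERRIDE is YES or no system identifier was declared.
        if (publicId != null) {
            for (Entry e : entries) {
                if (e.isPublic && e.id.equals(publicId) && (e.override || systemId == null)) {
                    return e.target;
                }
            }
        }
        return systemId;
    }

    /** Mirrors the entries of the sample catalog shown earlier. */
    static MiniCatalog sample() {
        return new MiniCatalog()
            .addPublic(true, "-//Arbortext//TEXT Test Public Identifier//EN", "testpub.xml")
            .addSystem("urn:x-arbortext:test-system-identifier", "testsys.xml")
            .addPublic(false, "-//Arbortext//TEXT Test Override//EN", "override.xml");
    }

    public static void main(String[] args) {
        MiniCatalog c = sample();
        System.out.println(c.resolve("-//Arbortext//TEXT Test Public Identifier//EN", "bogus-system-identifier.xml"));
        System.out.println(c.resolve(null, "urn:x-arbortext:test-system-identifier"));
        System.out.println(c.resolve("-//Arbortext//TEXT Test Override//EN", "testovr.xml"));
    }
}
```

Running main prints testpub.xml, testsys.xml, and testovr.xml, one per line, matching the three cases described above. Calling resolve with the override public identifier and a null system identifier returns override.xml instead, since there is then no declared system identifier to prefer.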
In this example, the system catalog path is set to test/catalog and the XML Parser is asked to parse test/test.xml. In the course of this parsing, it will encounter entities which need to be resolved. The SAX entityResolver hook will use the catalog to locate appropriate resources. If you attempt to parse test/test.xml without a catalog, the parse will fail.
This example program uses the Xerces parser from the Apache XML Project and those classes must be available in order to run eresolve. You can add the Catalog functionality to any SAX compliant parser, but for the purpose of this example, we've explicitly chosen the Xerces parser.
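To illustrate how the SAX entityResolver hook fits into a parser, here is a minimal sketch that uses only the SAX classes bundled with the JDK rather than Xerces or the catalog classes. MapEntityResolver is a hypothetical stand-in for a catalog-backed resolver: it looks identifiers up in in-memory maps and returns replacement text directly, where CatalogEntityResolver would consult the loaded catalogs. All class and method names in the block are assumptions made for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch; none of these names come from the catalog distribution.
final class EntityResolverSketch {
    /** Stand-in for a catalog-backed resolver: maps identifiers to replacement text. */
    static final class MapEntityResolver implements EntityResolver {
        private final Map<String, String> byPublicId = new HashMap<>();
        private final Map<String, String> bySystemId = new HashMap<>();

        void mapPublic(String publicId, String text) { byPublicId.put(publicId, text); }
        void mapSystem(String systemId, String text) { bySystemId.put(systemId, text); }

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            String text = publicId != null ? byPublicId.get(publicId) : null;
            if (text == null && systemId != null) text = bySystemId.get(systemId);
            // Returning null asks the parser to open the system identifier itself.
            return text == null ? null : new InputSource(new StringReader(text));
        }
    }

    /** Parses xml with the resolver installed and returns the collected character data. */
    static String parseText(String xml, EntityResolver resolver) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setEntityResolver(resolver); // the SAX entityResolver hook
            StringBuilder text = new StringBuilder();
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
            return text.toString().trim();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    /** Parses a cut-down version of test/test.xml using in-memory replacement text. */
    static String demo() {
        MapEntityResolver resolver = new MapEntityResolver();
        resolver.mapPublic("-//Arbortext//TEXT Test Public Identifier//EN", "public text");
        resolver.mapSystem("urn:x-arbortext:test-system-identifier", "system text");
        String xml =
              "<!DOCTYPE test [\n"
            + "<!ENTITY testpub PUBLIC \"-//Arbortext//TEXT Test Public Identifier//EN\""
            + " \"bogus-system-identifier.xml\">\n"
            + "<!ENTITY testsys SYSTEM \"urn:x-arbortext:test-system-identifier\">\n"
            + "]>\n"
            + "<test>&testpub; &testsys;</test>";
        return parseText(xml, resolver);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The only parser-specific step is the call to setEntityResolver, which is why the same approach should carry over to any SAX compliant parser. The sketch returns null for identifiers it does not know, which leaves the parser to open the declared system identifier itself.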
This code is placed in the public domain.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL ARBORTEXT OR ANY OTHER CONTRIBUTOR BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.