Welcome to XMLResolver.org

As you can tell from the clever name, this site is about an XML Resolver. (The code is over on GitHub.) Many (Java-based) XML APIs include features for “resolvers” of various sorts. For example, many XML parsers allow you to define an “entity resolver” that can intercept attempts to load system identifiers. Schema processors provide a “URI resolver” that lets you intercept schema module URIs. Stylesheet and query processors have similar APIs for intercepting stylesheet and query modules.

The resolver APIs exist because it’s sometimes useful in applications to return a locally cached resource instead of the resource actually requested. It’s a significant feature of the web that you can dereference the URI

http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd

and find out that it’s the DTD for XHTML. It is not, however, desirable that everyone should always dereference that URI to get the XHTML DTD. It hasn’t changed in more than a decade and there’s no reason to believe it will ever change again.

I know, DTDs are unfashionable and XHTML has measles or some other disease against which the world should have been vaccinated, but I chose that example with care. The W3C web server gets so many requests for the XHTML DTD that it goes out of its way to make retrieving it painful.

Go ahead, download that DTD. You’ll find that the server introduces a significant delay before returning the data and if you get it often enough they’ll lock you out for 24 hours or something.

Point being: there are lots of URIs which you can usefully cache locally.

There are basically two approaches to local caching: you can set up a proxy server and have it cache things for you, or you can use XML Catalogs. Oh, I don’t dispute there might be other approaches, but those are the two common, obvious ones.

The advantage of the local caching proxy is that it’s automatic. It caches the resources you request according to whatever criteria you establish, and it works transparently in the background. No muss, no fuss. Well, except for the fact that you have to install and set up a local caching proxy. You have to use it everywhere. You might have to chain it together with your corporate caching proxy. You also have to configure the criteria for local caching. I find its advantages are a lot more theoretical than practical.

The XML Resolver project is about doing it with catalogs.

XML Catalogs

Catalogs are straightforward: you provide an XML document that maps identifiers that might appear in documents to the local resources that should be returned for those identifiers.

Here’s an example:

<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <system systemId="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
          uri="/share/dtds/xhtml1-strict.dtd"/>
</catalog>

If you load that catalog, attempts to obtain the XHTML DTD from the W3C will be satisfied by a local copy of the DTD obtained from /share/dtds/xhtml1-strict.dtd.
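To see a lookup like that happen, here is a minimal sketch. It does not use XML Resolver itself; it uses the catalog support built into the JDK (`javax.xml.catalog`, Java 9 and later) so that it runs without extra dependencies. The temporary directory, the stub DTD, and the class and method names are all made up for the illustration, and the catalog's `uri` is written relative to the catalog file rather than as `/share/dtds/...`.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import org.xml.sax.InputSource;

class CatalogLookupSketch {
    static final String XHTML_DTD = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd";

    /** Writes a one-entry catalog plus a stub DTD into dir; returns the catalog's URI. */
    static URI writeCatalog(Path dir) throws Exception {
        String catalog =
            "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
          + "  <system systemId=\"" + XHTML_DTD + "\"\n"
          + "          uri=\"xhtml1-strict.dtd\"/>\n"
          + "</catalog>\n";
        Path catalogFile = dir.resolve("catalog.xml");
        Files.write(catalogFile, catalog.getBytes(StandardCharsets.UTF_8));
        // Stand-in for the real local copy of the DTD (hypothetical content).
        Files.write(dir.resolve("xhtml1-strict.dtd"),
                    "<!ELEMENT html ANY>\n".getBytes(StandardCharsets.UTF_8));
        return catalogFile.toUri();
    }

    /** Asks the JDK catalog resolver where the given system identifier maps to. */
    static String resolve(URI catalogUri, String systemId) {
        CatalogResolver resolver =
            CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalogUri);
        InputSource source = resolver.resolveEntity(null, systemId);
        return source.getSystemId();
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("catalog-sketch");
        // Prints a file: URI that points at the local copy, not at www.w3.org.
        System.out.println(resolve(writeCatalog(dir), XHTML_DTD));
    }
}
```

The point is only that the W3C URI goes in and a local file URI comes out; the parser never has to know the difference.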

How to use XML Resolver

The simplest possible thing you can do is create an instance of org.xmlresolver.Resolver and use it as the resolver for your parser. The Resolver class implements the following resolvers:

Another simple integration point is to instantiate org.xmlresolver.tools.ResolvingXMLReader as your XML parser.
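For anyone who hasn't wired a resolver into a parser before, the following sketch shows the general shape using only the standard SAX API. The `LocalDtdResolver` class is a hand-rolled, hypothetical stand-in that serves a tiny made-up DTD from a string; with XML Resolver on the classpath you would presumably hand an `org.xmlresolver.Resolver` instance to `setEntityResolver` instead (its constructor isn't shown here, so treat that as an assumption).

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ResolverDemo {
    static final String XHTML_DTD = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd";

    /** Hypothetical stand-in resolver: answers one system identifier locally. */
    static class LocalDtdResolver implements EntityResolver {
        // Made-up miniature DTD so the example needs no files and no network.
        static final String LOCAL_DTD =
            "<!ELEMENT html (#PCDATA)>\n<!ENTITY greeting \"hello\">\n";

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            if (XHTML_DTD.equals(systemId)) {
                InputSource local = new InputSource(new StringReader(LOCAL_DTD));
                local.setSystemId(systemId);
                return local;
            }
            return null; // anything else: let the parser do its default thing
        }
    }

    /** Parses xml with the given resolver installed and returns the character data. */
    static String parseWithResolver(String xml, EntityResolver resolver) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(resolver);
        StringBuilder text = new StringBuilder();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE html SYSTEM \"" + XHTML_DTD + "\">"
                   + "<html>&greeting;</html>";
        // The entity comes from the locally supplied DTD, not from the W3C server.
        System.out.println(parseWithResolver(xml, new LocalDtdResolver()));
    }
}
```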

Configuring XML Resolver

The Resolver classes use either Java system properties or a standard Java properties file to establish an initial environment. The property file, if it is used, must be called xmlresolver.properties and must be somewhere on your CLASSPATH. (For backwards compatibility, the name catalogmanager.properties may also be used. Use the system property xmlresolver.properties to specify a name, or, technically, a semicolon-separated list of names, explicitly.)

The resolver searches for a property file by looking in the following places, in this order:

The following features may be configured with properties.

The initial list of catalog files

A semicolon-delimited list of catalog files. These are the catalog files that are initially consulted for resolution.

Unless you are incorporating the resolver classes into your own applications, and subsequently establishing an initial set of catalog files through some other means, at least one file must be specified, or all resolution will fail.

Preference for public or system identifiers

The initial prefer setting, either public or system.

Obey oasis-xml-catalog processing instruction

This setting allows you to toggle whether or not the resolver classes obey the <?oasis-xml-catalog?> processing instruction.

Support relative catalog paths

If relative-catalogs is true, relative catalogs in the catalogs property list will be left relative; otherwise they will be made absolute with respect to the base URI of the properties file from which they came.

This setting has no effect on catalogs loaded from the xml.catalogs.files system property (which are always returned unchanged).
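What “made absolute with respect to the base URI of the properties file” amounts to can be sketched with `java.net.URI`. This is an illustration of the rule as described above, not the resolver's actual code; the class name, the method name, and the location of the properties file are hypothetical.

```java
import java.net.URI;

class RelativeCatalogs {
    /**
     * Returns the catalog entry unchanged when relativeCatalogs is true;
     * otherwise resolves it against the URI of the properties file.
     */
    static String catalogEntry(String entry, URI propertiesFile, boolean relativeCatalogs) {
        if (relativeCatalogs) {
            return entry;
        }
        return propertiesFile.resolve(entry).toString();
    }

    public static void main(String[] args) {
        // Hypothetical location of the properties file.
        URI props = URI.create("file:/home/ndw/xmlresolver.properties");
        System.out.println(catalogEntry("./catalog.xml", props, true));
        System.out.println(catalogEntry("./catalog.xml", props, false));
    }
}
```

With the setting off, `./catalog.xml` follows the properties file around; with it on, the same entry is left for later interpretation, typically relative to wherever the application happens to be running.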

Cache documents

The cache properties specify the directory in which the XML Resolver should attempt to cache files that fail to resolve locally. If, instead, one of the cacheUnderHome properties is set, the cache directory will default to $HOME/.xmlresolver/cache.
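Read as a decision rule, that paragraph might look like the sketch below. The parameter names are hypothetical (the actual property names aren't spelled out here), and returning null to mean “no cache directory configured” is my assumption, not documented behavior.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class CacheDirectory {
    /**
     * Picks the cache directory: an explicit cache setting wins; failing that,
     * cacheUnderHome selects $HOME/.xmlresolver/cache.
     */
    static Path choose(String cacheSetting, boolean cacheUnderHome, String home) {
        if (cacheSetting != null && !cacheSetting.isEmpty()) {
            return Paths.get(cacheSetting);
        }
        if (cacheUnderHome) {
            return Paths.get(home, ".xmlresolver", "cache");
        }
        return null; // assumption: nothing configured means no cache directory
    }

    public static void main(String[] args) {
        System.out.println(choose(null, true, System.getProperty("user.home")));
    }
}
```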

Schemes to cache

Specifies whether or not URIs of type scheme will be cached. If not specified, the default is “true” for all schemes except file.
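The default described above (cache everything except file) is easy to state in code. This is a sketch of the rule only; the map of per-scheme overrides is a hypothetical stand-in for however the settings are actually read, and refusing to cache a URI with no scheme is my own assumption.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;

class SchemeCaching {
    /**
     * Decides whether a URI should be cached: an explicit per-scheme setting
     * wins, otherwise every scheme except "file" defaults to true.
     */
    static boolean shouldCache(URI uri, Map<String, Boolean> schemeSettings) {
        String scheme = uri.getScheme();
        if (scheme == null) {
            return false; // assumption: a relative reference has no scheme to look up
        }
        scheme = scheme.toLowerCase(Locale.ROOT);
        Boolean explicit = schemeSettings.get(scheme);
        if (explicit != null) {
            return explicit;
        }
        return !"file".equals(scheme);
    }

    public static void main(String[] args) {
        URI dtd = URI.create("http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd");
        System.out.println(shouldCache(dtd, Map.of()));
        System.out.println(shouldCache(URI.create("file:/share/dtds/xhtml1-strict.dtd"), Map.of()));
    }
}
```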

Example catalog properties file

My xmlresolver.properties file looks like this:

# xmlresolver.properties

relative-catalogs=yes

# Always use semicolons in this list
catalogs=./catalog.xml;/home/ndw/Documents/catalog.xml

prefer=public

cache=/Users/ndw/.xmlresolver/cache
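Since this is a standard Java properties file, `java.util.Properties` can read it directly. The sketch below loads the same settings from a string and splits the semicolon-delimited catalog list; it only illustrates the file format and makes no claim about how the resolver reads it internally. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class ReadResolverProperties {
    /** Parses properties-file text into a Properties object. */
    static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    /** Splits the semicolon-delimited "catalogs" value into individual entries. */
    static List<String> catalogs(Properties props) {
        List<String> result = new ArrayList<>();
        for (String entry : props.getProperty("catalogs", "").split(";")) {
            if (!entry.trim().isEmpty()) {
                result.add(entry.trim());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String text = "relative-catalogs=yes\n"
                    + "# Always use semicolons in this list\n"
                    + "catalogs=./catalog.xml;/home/ndw/Documents/catalog.xml\n"
                    + "prefer=public\n"
                    + "cache=/Users/ndw/.xmlresolver/cache\n";
        Properties props = parse(text);
        System.out.println(catalogs(props));
        System.out.println(props.getProperty("prefer"));
    }
}
```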

See also