Chapter 6. What you can configure

This chapter lists all of the configurable settings in XML Resolver.

1. The initial list of catalog files

Configuration on Java

  • Feature ResolverFeature.CATALOG_FILES (type: List<String>)

  • System property xml.catalog.files

  • Property file property catalogs

Configuration on .NET

  • Feature ResolverFeature.CATALOG_FILES (type: List<string>)

  • Environment variable XML_CATALOG_FILES

  • Property file property catalogs

A semicolon-delimited list of catalog files. These are the catalog files that are initially consulted for resolution. If no catalog files are specified, by default the resolver will attempt to use a file named catalog.xml in the current directory as a catalog.

2. A list of additional catalog files

Configuration on Java

  • Feature ResolverFeature.CATALOG_ADDITIONS (type: List<String>)

  • System property xml.catalog.additions

  • Property file property catalog-additions

Configuration on .NET

  • Feature ResolverFeature.CATALOG_ADDITIONS (type: List<string>)

  • Environment variable XML_CATALOG_ADDITIONS

  • Property file property catalogAdditions

A semicolon-delimited list of catalog files. This list is used in addition to the initial list of catalog files.

If you attempt to use both a system property and a property from a property file to create the initial list of catalog files, you’ll only get one or the other. (See prefer-property-file.)

This property provides a way to add to the current list of files. For example, suppose you use a global properties file to initialize the resolver, but for a particular application you want to search additional catalogs. You can specify them in the xml.catalog.additions system property and they’ll be appended to the list instead of replacing the list entirely as setting xml.catalog.files would.
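The difference between replacing the list and appending to it can be sketched in a few lines of Java. This only illustrates the semantics described above; it is not the resolver's own code, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CatalogListSketch {
    /** Splits a semicolon-delimited value into trimmed, non-empty file names. */
    static List<String> split(String value) {
        List<String> files = new ArrayList<>();
        if (value == null) {
            return files;
        }
        for (String part : value.split(";")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                files.add(name);
            }
        }
        return files;
    }

    /** The initial list followed by the additions; nothing is replaced. */
    static List<String> effectiveCatalogs(String files, String additions) {
        List<String> result = split(files);
        result.addAll(split(additions));
        return result;
    }

    public static void main(String[] args) {
        // Reads the two system properties named in this chapter.
        System.out.println(effectiveCatalogs(
                System.getProperty("xml.catalog.files"),
                System.getProperty("xml.catalog.additions")));
    }
}
```

Running this with -Dxml.catalog.files=a.xml;b.xml -Dxml.catalog.additions=c.xml would print all three names, with the addition last.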

3. Specify protocol(s) allowed for URI access

Configuration on Java

  • Feature ResolverFeature.ACCESS_EXTERNAL_DOCUMENT (type: String)

  • System property xml.catalog.accessExternalDocument

  • Property file property access-external-document

Configuration on .NET

  • This setting has no equivalent on .NET

This feature restricts the protocols that are allowed during URI resolution. If an attempt is made to resolve a document and the URI for that document uses a protocol not listed in this feature, the request is rejected. See JEP 185.

The keyword “all” allows all protocols. An empty string allows none.

The default value for this feature is “all”.
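As a rough sketch of how such a protocol list might be checked: the text above does not say how multiple protocols are delimited, so the comma-separated form used here is an assumption borrowed from the JAXP access properties, and the class and method names are hypothetical.

```java
import java.net.URI;
import java.util.Arrays;

class ProtocolAccessSketch {
    /**
     * Returns true if the URI's scheme is permitted by the setting.
     * "all" permits everything; an empty string permits nothing.
     * Assumption: multiple protocols are separated by commas.
     */
    static boolean isAllowed(String allowed, URI uri) {
        if ("all".equalsIgnoreCase(allowed.trim())) {
            return true;
        }
        String scheme = uri.getScheme();
        if (scheme == null) {
            return false; // assumption: a URI without a scheme is rejected
        }
        return Arrays.stream(allowed.split(","))
                .map(String::trim)
                .anyMatch(scheme::equalsIgnoreCase);
    }

    public static void main(String[] args) {
        URI dtd = URI.create("https://example.com/sample.dtd");
        System.out.println(isAllowed("all", dtd));
        System.out.println(isAllowed("file", dtd));
    }
}
```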

4. Specify protocol(s) allowed for entity access

Configuration on Java

  • Feature ResolverFeature.ACCESS_EXTERNAL_ENTITY (type: String)

  • System property xml.catalog.accessExternalEntity

  • Property file property access-external-entity

Configuration on .NET

  • This setting has no equivalent on .NET

This feature restricts the protocols that are allowed during entity resolution. If an attempt is made to resolve an entity and the URI for that entity uses a protocol not listed in this feature, the request is rejected. See JEP 185.

The keyword “all” allows all protocols. An empty string allows none.

The default value for this feature is “all”.

5. Catalogs to load from assemblies

Configuration on Java

  • This setting has no equivalent on Java.

Configuration on .NET

  • Feature ResolverFeature.ASSEMBLY_CATALOGS (type: List<string>)

This setting can be used to load additional catalogs from other assemblies linked into your application.

Starting in version 2.0.0, the assembly catalogs should be identified by their AssemblyName, not by the name of the DLL as previously.

6. Load the XmlResolverData assembly catalog?

Configuration on Java

  • This setting has no equivalent on Java.

Configuration on .NET

  • Feature ResolverFeature.USE_DATA_ASSEMBLY (type: bool)

  • Environment variable XML_CATALOG_USE_DATA_ASSEMBLY

  • Property file property useDataAssembly

This setting, which defaults to true, tells the resolver that it should always attempt to use the catalog in the XmlResolverData.dll assembly. This assembly is distributed as a separate NuGet package and includes a variety of common resources.

This setting has no effect unless you’ve linked that assembly into your application.

7. Load catalogs from the classpath

Configuration on Java

  • Feature ResolverFeature.CLASSPATH_CATALOGS (type: Boolean)

  • System property xml.catalog.classpathCatalogs

  • Property file property classpath-catalogs

Configuration on .NET

  • This setting has no equivalent on .NET

Load catalog files from the classpath. If this property is true, the resolver will search for all of the files named org/xmlresolver/catalog.xml on the classpath and add each of them to the end of the catalog list.
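To see which catalogs such a search would find in your own environment, you can enumerate the resources yourself with the standard ClassLoader API. This is a minimal sketch, not the resolver's implementation; the class name is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

class ClasspathCatalogSketch {
    /** Lists every org/xmlresolver/catalog.xml visible to the given loader. */
    static List<URL> findCatalogs(ClassLoader loader) {
        try {
            return Collections.list(loader.getResources("org/xmlresolver/catalog.xml"));
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        for (URL catalog : findCatalogs(ClassLoader.getSystemClassLoader())) {
            System.out.println(catalog);
        }
    }
}
```

If nothing is printed, no jar or directory on the classpath contains a catalog at that path.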

8. Preference for public or system identifiers

Configuration on Java

  • Feature ResolverFeature.PREFER_PUBLIC (type: Boolean)

  • System property xml.catalog.prefer

  • Property file property prefer

Configuration on .NET

  • Feature ResolverFeature.PREFER_PUBLIC (type: bool)

  • Environment variable XML_CATALOG_PREFER

  • Property file property prefer

The initial prefer setting, either public or system.

9. Obey oasis-xml-catalog processing instruction

Configuration on Java

  • Feature ResolverFeature.ALLOW_CATALOG_PI (type: Boolean)

  • System property xml.catalog.allowPI

  • Property file property allow-oasis-xml-catalog-pi

Configuration on .NET

  • Feature ResolverFeature.ALLOW_CATALOG_PI (type: bool)

  • Environment variable XML_CATALOG_ALLOW_PI

  • Property file property allowOasisXmlCatalogPi

This setting allows you to toggle whether or not the resolver classes obey the <?oasis-xml-catalog?> processing instruction.

If you never use the processing instruction, you can get a very tiny performance improvement by disabling this feature. (If this feature is enabled, the parser has to create a copy of the resolver configuration for every parse.)

10. Always resolve resources

Configuration on Java

  • Feature ResolverFeature.ALWAYS_RESOLVE (type: Boolean)

  • System property xml.catalog.alwaysResolve

  • Property file property always-resolve

Configuration on .NET

  • This setting has no equivalent on .NET

The standard contract for the Java resolver APIs is that they return null if the resolver doesn’t find a match. But on the modern web, lots of URIs redirect (from http: to https: especially), and some parsers don’t follow redirects. That causes the parse to fail in ways that may not be easy for the user to fix.

Starting in version 5.0.0, the resolver will always resolve resources, follow redirects, and return a stream. This deprives the parser of the option to try something else, but means that redirects don’t cause the parse to fail.

This feature is enabled by default. If you set it to false, the resolver will return null if the resource isn’t found in the catalog.

I don’t know of any parsers that try anything else after the resolver has failed except loading the resource, so I expect this to be an improvement for users. If your implementation wants to explicitly just check the catalog, at the Java API level, you can use the CatalogManager API. That’s the same API the resolver classes use to locate resources in the catalog.
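The "return null on a miss" contract that you get back by setting this feature to false can be pictured with a plain SAX EntityResolver. The sketch below uses only the JDK and stands in for a catalog with a simple map; the class name and the map are illustrative assumptions, not part of XML Resolver.

```java
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

class NullOnMissSketch implements EntityResolver {
    // Stand-in for a catalog: system identifier -> local URI.
    private final Map<String, String> catalog;

    NullOnMissSketch(Map<String, String> catalog) {
        this.catalog = catalog;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String local = catalog.get(systemId);
        // Returning null tells the parser to load the resource itself.
        return local == null ? null : new InputSource(local);
    }

    public static void main(String[] args) {
        NullOnMissSketch resolver = new NullOnMissSketch(
                Map.of("http://example.com/sample.dtd", "file:/tmp/sample.dtd"));
        System.out.println(resolver.resolveEntity(null, "http://example.com/other.dtd"));
    }
}
```

With the feature enabled, the resolver takes over the step the parser would otherwise perform after receiving null.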

11. Support relative catalog paths

Configuration on Java

  • Property file property relative-catalogs

Configuration on .NET

  • Property file property relativeCatalogs

If relative-catalogs is true, relative filenames in the catalogs property list will be made absolute relative to the current working directory; otherwise they will be made absolute with respect to the base URI of the properties file from which they came.

This setting has no effect on catalogs loaded from the xml.catalog.files system property, which are always made absolute with respect to the current working directory.
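The two base URIs involved can be illustrated with java.net.URI. This is a sketch of the rule stated above under the assumption that both bases are available as URIs; the class name, method name, and example paths are hypothetical.

```java
import java.net.URI;

class RelativeCatalogSketch {
    /**
     * Makes a catalog name from a property file absolute: against the working
     * directory if relativeCatalogs is true, otherwise against the property file.
     */
    static URI resolve(String catalog, boolean relativeCatalogs,
                       URI propertyFile, URI workingDirectory) {
        URI base = relativeCatalogs ? workingDirectory : propertyFile;
        return base.resolve(catalog);
    }

    public static void main(String[] args) {
        URI props = URI.create("file:/etc/xmlresolver/xmlresolver.properties");
        URI cwd = URI.create("file:/home/user/project/");
        System.out.println(resolve("cat/catalog.xml", true, props, cwd));
        System.out.println(resolve("cat/catalog.xml", false, props, cwd));
    }
}
```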

12. Fix system identifiers on Windows

This feature is new in XML Resolver 4.4.0.

Configuration on Java

  • Feature ResolverFeature.FIX_WINDOWS_SYSTEM_IDENTIFIERS (type: Boolean)

  • System property xml.catalog.fixWindowsSystemIdentifiers

  • Property file property fix-windows-system-identifiers

Configuration on .NET

  • Feature ResolverFeature.FIX_WINDOWS_SYSTEM_IDENTIFIERS (type: bool)

  • Environment variable XML_CATALOG_FIX_WINDOWS_SYSTEM_IDENTIFIERS

  • Property file property fixWindowsSystemIdentifiers

Windows uses the backslash (“\”) instead of the forward slash as a path separator. The URI specification does not allow unescaped backslashes to appear in URIs, but as a practical reality, many users think of them as filenames and may simply copy filesystem paths as system identifiers. On Windows, this will cause “URI syntax” exceptions if an attempt is made to resolve them.

If FIX_WINDOWS_SYSTEM_IDENTIFIERS is true, and the resolver is running on a Windows system, backslashes in system identifiers are replaced with forward slashes before any resolution is attempted.
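The effect is easy to reproduce with java.net.URI. The following sketch shows the kind of replacement described above and why it matters; it is not the resolver's code, and the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

class WindowsSystemIdSketch {
    /** Replaces backslashes only when the fix is enabled and we are on Windows. */
    static String fix(String systemId, boolean fixEnabled, boolean onWindows) {
        return (fixEnabled && onWindows) ? systemId.replace('\\', '/') : systemId;
    }

    /** Reports whether java.net.URI accepts the string. */
    static boolean isValidUri(String value) {
        try {
            new URI(value);
            return true;
        } catch (URISyntaxException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        boolean onWindows = System.getProperty("os.name").toLowerCase().startsWith("windows");
        String raw = "C:\\dtds\\sample.dtd";
        System.out.println(isValidUri(raw));
        System.out.println(isValidUri(fix(raw, true, onWindows)));
    }
}
```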

13. Cache documents

Configuration on Java

  • Features ResolverFeature.CACHE (type: ResourceCache), ResolverFeature.CACHE_DIRECTORY (type: String), ResolverFeature.CACHE_UNDER_HOME (type: Boolean), ResolverFeature.CACHE_ENABLED (type: Boolean)

  • System properties xml.catalog.cache, xml.catalog.cacheUnderHome, xml.catalog.cacheEnabled

  • Property file properties cache, cache-under-home, cache-enabled

Configuration on .NET

  • Features ResolverFeature.CACHE (type: ResourceCache), ResolverFeature.CACHE_DIRECTORY (type: string), ResolverFeature.CACHE_UNDER_HOME (type: bool), ResolverFeature.CACHE_ENABLED (type: bool)

  • Environment variables XML_CATALOG_CACHE, XML_CATALOG_CACHE_UNDER_HOME, XML_CATALOG_CACHE_ENABLED.

  • Property file properties catalogCache, cacheUnderHome, catalogCacheEnabled.

The cache properties specify the directory in which the XML Resolver should attempt to cache files that fail to resolve locally. If, instead, one of the “cache under home” properties is set, the cache directory will default to $HOME/.xmlresolver/cache.

If “cache enabled” is explicitly set to “false”, no attempt will be made to create or use a cache.

If the xmlresolver.offline system property is set, no documents will expire from the cache, regardless of their age.
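One way to picture how these settings interact is the sketch below. The order of precedence (an explicit directory wins over "cache under home") is my reading of the description above, not something taken from the resolver's source; the class and method names are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

class CacheDirectorySketch {
    /** Chooses a cache directory, or none if caching is disabled or unconfigured. */
    static Optional<Path> cacheDirectory(String cache, boolean cacheUnderHome,
                                         boolean cacheEnabled, String home) {
        if (!cacheEnabled) {
            return Optional.empty(); // "cache enabled" explicitly false
        }
        if (cache != null && !cache.isEmpty()) {
            return Optional.of(Paths.get(cache)); // assumption: explicit directory wins
        }
        if (cacheUnderHome) {
            return Optional.of(Paths.get(home, ".xmlresolver", "cache"));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(cacheDirectory(null, true, true, System.getProperty("user.home")));
    }
}
```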

14. Prefer property file values

Configuration on Java

  • Feature ResolverFeature.PREFER_PROPERTY_FILE (type: Boolean)

  • System property xml.catalog.preferPropertyFile

  • Property file property prefer-property-file

Configuration on .NET

  • Feature ResolverFeature.PREFER_PROPERTY_FILE (type: bool)

  • Environment variable XML_CATALOG_PREFER_PROPERTY_FILE

  • Property file property preferPropertyFile

Prefer properties from the properties file. If a property file is loaded to configure the resolver and one of the properties in that file is also specified as a system property, the system property takes precedence. If you’d prefer to have the property file take precedence (as was the case in some earlier versions), set the “prefer property file” property to true.
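The precedence rule amounts to choosing which source to consult first. A minimal sketch, with hypothetical names, that treats a missing value as null:

```java
class PropertyPrecedenceSketch {
    /** Picks the value from the preferred source, falling back to the other one. */
    static String effectiveValue(String systemValue, String fileValue,
                                 boolean preferPropertyFile) {
        String first = preferPropertyFile ? fileValue : systemValue;
        String second = preferPropertyFile ? systemValue : fileValue;
        return first != null ? first : second;
    }

    public static void main(String[] args) {
        // Default behaviour: the system property takes precedence.
        System.out.println(effectiveValue("system.xml", "file.xml", false));
        // With "prefer property file" set to true, the file wins.
        System.out.println(effectiveValue("system.xml", "file.xml", true));
    }
}
```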

15. The SAX parser factory class

Configuration on Java

  • Feature ResolverFeature.SAXPARSERFACTORY_CLASS (type: String)

  • System property xml.catalog.saxParserFactoryClass

  • Property file property saxparserfactory-class

Configuration on .NET

  • This setting has no equivalent on .NET

By default, parsers are created with the XMLREADER_SUPPLIER.

This feature and the XMLREADER_SUPPLIER are different mechanisms for configuring the same underlying feature: how does the resolver get an XML parser if it needs one? (For example, for the ResolvingXMLFilter and ResolvingXMLReader classes.)

The SAXPARSERFACTORY_CLASS is initially null and the XMLREADER_SUPPLIER is used. The purpose of the SAXPARSERFACTORY_CLASS is to allow the factory to be configured with a property since the XMLREADER_SUPPLIER cannot.

If a SAXPARSERFACTORY_CLASS is specified, it will be used in favor of the default XMLREADER_SUPPLIER. If an XMLREADER_SUPPLIER is explicitly set after the configuration is initialized, it will set this feature to null and take precedence.

16. Use URI entries for system resolution

Configuration on Java

  • Feature ResolverFeature.URI_FOR_SYSTEM (type: Boolean)

  • System property xml.catalog.uriForSystem

  • Property file property uri-for-system

Configuration on .NET

  • Feature ResolverFeature.URI_FOR_SYSTEM (type: bool)

  • Environment variable XML_CATALOG_URI_FOR_SYSTEM

  • Property file property uriForSystem

Ignore the distinction between system identifiers and URIs. The distinction between external identifiers (the public and system identifiers that are used in DTDs) and general URIs (as might be used to load a RELAX NG Grammar or XML Schema, for example), is not supported uniformly by the parser APIs. The Xerces XML Schema implementation, for example, uses the resolveEntity API to load XML Schema imports.

Ordinarily, system identifier resolution interrogates the system and public entries (and their related entries), but not the uri entries. If this property is true, the resolver will attempt to resolve system identifiers with uri entries (after attempting to resolve them with the system and public entries).
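The lookup order can be sketched with two maps standing in for the system and uri entries of a catalog. This is a deliberate simplification (public entries and the related entry types are left out), and the class and method names are hypothetical.

```java
import java.util.Map;

class UriForSystemSketch {
    /** Looks in the system entries first, then in the uri entries if allowed. */
    static String resolveSystem(String systemId,
                                Map<String, String> systemEntries,
                                Map<String, String> uriEntries,
                                boolean uriForSystem) {
        String match = systemEntries.get(systemId);
        if (match == null && uriForSystem) {
            match = uriEntries.get(systemId);
        }
        return match; // null means no catalog entry matched
    }

    public static void main(String[] args) {
        Map<String, String> system = Map.of();
        Map<String, String> uri = Map.of("http://example.com/schema.xsd", "file:/tmp/schema.xsd");
        System.out.println(resolveSystem("http://example.com/schema.xsd", system, uri, true));
    }
}
```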

17. Merge http: and https: URI schemes

Configuration on Java

  • Feature ResolverFeature.MERGE_HTTPS (type: Boolean)

  • System property xml.catalog.mergeHttps

  • Property file property merge-https

Configuration on .NET

  • Feature ResolverFeature.MERGE_HTTPS (type: bool)

  • Environment variable XML_CATALOG_MERGE_HTTPS

  • Property file property mergeHttps

Treat http: and https: URIs as equivalent for the purpose of resolution. The web used to be served over http: and many existing catalog files contain http: system identifiers. Today, the web is largely served over https: and many documents contain https: system identifiers. If this property is true, that distinction will be ignored during catalog lookup: http://example.com/sample.dtd will match https://example.com/sample.dtd.

Note: this has *no effect* on the URIs returned by the resolver or retrieved over the web. It only affects catalog lookup for system identifiers and URIs.
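One simple way to get this behaviour is to normalize both sides of the comparison to a single scheme before matching. The sketch below does that with plain string handling; it is an illustration only, and the class and method names are hypothetical.

```java
class MergeHttpsSketch {
    /** Rewrites https: to http: for comparison purposes only. */
    static String lookupKey(String uri, boolean mergeHttps) {
        if (mergeHttps && uri.startsWith("https:")) {
            return "http:" + uri.substring("https:".length());
        }
        return uri;
    }

    /** Compares a catalog entry with a requested URI under the merge rule. */
    static boolean matches(String catalogEntry, String requested, boolean mergeHttps) {
        return lookupKey(catalogEntry, mergeHttps).equals(lookupKey(requested, mergeHttps));
    }

    public static void main(String[] args) {
        System.out.println(matches("http://example.com/sample.dtd",
                "https://example.com/sample.dtd", true));
    }
}
```

The normalized key is used only for the comparison; the URI handed back to the caller is untouched.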

18. Mask jar URIs

Configuration on Java

  • Feature ResolverFeature.MASK_JAR_URIS (type: Boolean)

  • System property xml.catalog.maskJarUris

  • Property file property mask-jar-uris

Configuration on .NET

  • This setting has no equivalent on .NET

Don’t return jar: or classpath: URIs. Most entity resolver APIs are defined such that if resolution succeeds, the base URI of the resource returned is the base URI of the actual, local resource. This can greatly simplify things because subsequent relative URIs can be resolved against the local resource directly.

However, the Java URI class does not treat jar: or classpath: URI schemes as hierarchical, so any subsequent attempts to resolve relative URIs will fail. If this property is true, the local resource will be returned but the URI will be left unchanged. That may require a more complete catalog, but it will avoid a situation which is guaranteed to fail.
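You can observe the underlying problem directly with java.net.URI. In this small demonstration (the class name and example paths are made up), a relative reference resolves as expected against a file: base but comes back unchanged against a jar: base, because the URI class considers the jar: URI opaque.

```java
import java.net.URI;

class JarUriSketch {
    /** Resolves a relative reference against a base using java.net.URI. */
    static String resolveAgainst(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        // Hierarchical base: the sibling file is found.
        System.out.println(resolveAgainst("file:/tmp/dtds/sample.dtd", "other.dtd"));
        // Opaque base: the relative reference is returned as-is.
        System.out.println(resolveAgainst("jar:file:/tmp/schemas.jar!/dtds/sample.dtd", "other.dtd"));
    }
}
```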

19. Catalog loader class

Configuration on Java

  • Feature ResolverFeature.CATALOG_LOADER_CLASS (type: String)

  • System property xml.catalog.catalogLoaderClass

  • Property file property catalog-loader-class

Configuration on .NET

  • Feature ResolverFeature.CATALOG_LOADER_CLASS (type: string)

  • Environment variable XML_CATALOG_LOADER_CLASS

  • Property file property catalogLoaderClass

Specify the catalog loader class. The default catalog loader ignores any errors encountered when loading catalogs. This is convenient for production use, but can be frustrating because it may not be obvious when resolution fails, especially if your internet connection is fast. A typo in a catalog file can easily go unnoticed.

If the value org.xmlresolver.loaders.ValidatingXmlLoader is specified for this property, catalog files will be validated when they are loaded and the resolver will throw an exception for any validity errors encountered.

The validating loader depends on having version 20181222 of Jing on your classpath. (This is an optional dependency in the Maven distribution of the resolver, so you may have to add it by hand.)

20. Parse RDDL documents

Configuration on Java

  • Feature ResolverFeature.PARSE_RDDL (type: Boolean)

  • System property xml.catalog.parseRddl

  • Property file property parse-rddl

Configuration on .NET

  • Feature ResolverFeature.PARSE_RDDL (type: bool)

  • Environment variable XML_CATALOG_PARSE_RDDL

  • Property file property parseRddl

Attempt to resolve RDDL resources in namespace URI lookup. If the namespace resolver is used, if a nature and purpose are specified, and if the resource returned is an HTML document, the resolver will attempt to find the RDDL resource description for the requested namespace and resolve that URI.

For example, the following API call will return the XML Schema for XML:

  |resolveNamespaceURI("http://www.w3.org/XML/1998/namespace",
  |                    "http://www.w3.org/2001/XMLSchema",
  |                    "http://www.rddl.org/purposes#schema-validation");

Attempting to resolve RDDL resources requires extra processing. If you know it will never succeed you can disable it by setting this property to false.

21. Specify an alternate class loader

Configuration on Java

  • Feature ResolverFeature.CLASSLOADER (type: ClassLoader)

Configuration on .NET

  • This setting has no equivalent on .NET

If you are using the resolver in an environment where the default class loader (getClass().getClassLoader()) will not return a useful class loader, you can specify an alternate loader with this feature.

22. Configure the XML Reader

Configuration on Java

  • Feature ResolverFeature.XMLREADER_SUPPLIER (type: Supplier<XMLReader>)

Configuration on .NET

  • This setting has no equivalent on .NET

The default supplier is obtained with SAXParserFactory.newInstance() and the global mechanisms that it uses.

This feature and the SAXPARSERFACTORY_CLASS are different mechanisms for configuring the same underlying feature: how does the resolver get an XML parser if it needs one? (For example, for the ResolvingXMLFilter and ResolvingXMLReader classes.)

The SAXPARSERFACTORY_CLASS is initially null and the XMLREADER_SUPPLIER is used. The purpose of the SAXPARSERFACTORY_CLASS is to allow the factory to be configured with a property since the XMLREADER_SUPPLIER cannot.

If a SAXPARSERFACTORY_CLASS is specified, it will be used in favor of the default XMLREADER_SUPPLIER. If an XMLREADER_SUPPLIER is explicitly set after the configuration is initialized, it will set this feature to null and take precedence.
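For reference, a value of type Supplier<XMLReader> might look like the following. This is a sketch built only on the JDK's JAXP classes; the class and method names are hypothetical, and making the factory namespace aware is my choice for the example, not a statement about the resolver's default supplier.

```java
import java.util.function.Supplier;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

class XmlReaderSupplierSketch {
    /** Builds a supplier that creates a fresh, namespace-aware XMLReader on each call. */
    static Supplier<XMLReader> namespaceAwareSupplier() {
        return () -> {
            try {
                SAXParserFactory factory = SAXParserFactory.newInstance();
                factory.setNamespaceAware(true);
                return factory.newSAXParser().getXMLReader();
            } catch (ParserConfigurationException | SAXException ex) {
                throw new IllegalStateException("Cannot create an XMLReader", ex);
            }
        };
    }

    public static void main(String[] args) {
        XMLReader reader = namespaceAwareSupplier().get();
        System.out.println(reader.getClass().getName());
    }
}
```

A supplier like this can only be passed through the API, which is why the SAXPARSERFACTORY_CLASS string setting exists for property-based configuration.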

23. Throw URI exceptions?

Configuration on Java

  • Feature ResolverFeature.THROW_URI_EXCEPTIONS (type: Boolean)

  • System property xml.catalog.throwUriExceptions

  • Property file property throw-uri-exceptions

Configuration on .NET

  • This setting has no equivalent on .NET

If this setting is true, errors in URIs that raise illegal argument exceptions, malformed URI exceptions, or URI syntax exceptions will be thrown. If it’s false, null is returned and the exception is ignored.
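The two behaviours can be sketched as follows, using URI.create, which reports syntax problems as IllegalArgumentException. The class and method names are hypothetical and the sketch covers only that one exception type.

```java
import java.net.URI;

class UriExceptionSketch {
    /** Parses a URI, either propagating the failure or swallowing it and returning null. */
    static URI parse(String value, boolean throwUriExceptions) {
        try {
            return URI.create(value);
        } catch (IllegalArgumentException ex) {
            if (throwUriExceptions) {
                throw ex;
            }
            return null; // the error is ignored
        }
    }

    public static void main(String[] args) {
        // The unescaped space makes this an invalid URI.
        System.out.println(parse("http://example.com/bad path.dtd", false));
    }
}
```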

24. Support catalog files in ZIP archives

This feature is new in XML Resolver 3.1.0.

Configuration on Java

  • Feature ResolverFeature.ARCHIVED_CATALOGS (type: Boolean)

  • System property xml.catalog.archivedCatalogs

  • Property file property archived-catalogs

Configuration on .NET

  • Feature ResolverFeature.ARCHIVED_CATALOGS (type: bool)

  • Environment variable XML_CATALOG_ARCHIVED_CATALOGS

  • Property file property archivedCatalogs

If archived catalogs are allowed, then you can place ZIP files of resources directly on the catalog path. The resolver will search inside the ZIP file for a catalog (/org/xmlresolver/catalog.xml or /catalog.xml) to use.

25. Access the underlying catalog manager

Configuration on Java

  • Feature ResolverFeature.CATALOG_MANAGER (type: CatalogManager)

Configuration on .NET

  • Feature ResolverFeature.CATALOG_MANAGER (type: CatalogManager)

The CatalogManager class provides some lower-level methods for mapping to URIs without returning sources of any kind.

26. Logging

There are two options to control logging: the class used to perform logging and the logging level desired.

These settings only apply to Java. On .NET, the NLog package is used for logging.

26.1. The logger class

Configuration on Java

  • Feature ResolverFeature.RESOLVER_LOGGER_CLASS (type: String)

  • System property xml.catalog.resolverLoggerClass

  • Property file property resolver-logger-class

Configuration on .NET

  • This setting has no equivalent on .NET

The default logger class (org.xmlresolver.logging.DefaultLogger) simply writes messages to standard error. For many applications, this is sufficient and requires no additional configuration.

An alternate logger class (org.xmlresolver.logging.SystemLogger) can easily be swapped in for environments where writing to standard error is inappropriate or where more control is desired. If the system logger is used, it uses the SLF4J API to do logging. You must configure your environment with an appropriate logging backend.

A third option, available only at the API level, is to instantiate the org.xmlresolver.logging.SystemLogger yourself and pass in the java.util.logging.Logger class that you would like it to use.

26.2. The logging level

Configuration on Java

  • Feature ResolverFeature.DEFAULT_LOGGER_LOG_LEVEL (type: String)

  • System property xml.catalog.defaultLoggerLogLevel

  • Property file property default-logger-log-level

Configuration on .NET

  • This setting has no equivalent on .NET

As a convenience, the resolver categorizes log messages and allows you to change the logging level for categories selectively.

Suppose, for example, you want to know more about how the cache is being managed. At the level of logging configuration, change how org.xmlresolver.cache.ResourceCache log messages are presented. If you can’t easily do that, change how the resolver logs messages related to caching: make them into warning messages instead of info messages, for example.

The log categories aren’t as fine-grained as the class hierarchy. There are six categories:

request

Information related to the request.

response

Information related to the response.

config

Information related to configuration.

cache

Information related to caching.

warning

Reported warnings.

error

Reported errors.

Any of these categories can be reported as “debug”, “info”, or “warn” logging messages. This is configured with the system property xml.catalog.logging. That property is interpreted as a comma-delimited list of category:level pairs. For example, cache:warn,response:info would log cache messages as warnings and response messages as info.
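The format of that property is simple enough to parse by hand. The sketch below shows one way to turn the value into a category-to-level map; it is not the resolver's parser, the class and method names are hypothetical, and silently skipping malformed pairs is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LogCategorySketch {
    /** Parses "category:level,category:level" into an ordered map. */
    static Map<String, String> parse(String setting) {
        Map<String, String> levels = new LinkedHashMap<>();
        if (setting == null) {
            return levels;
        }
        for (String pair : setting.split(",")) {
            String[] parts = pair.trim().split(":", 2);
            // Assumption: pairs without both a category and a level are skipped.
            if (parts.length == 2 && !parts[0].trim().isEmpty() && !parts[1].trim().isEmpty()) {
                levels.put(parts[0].trim(), parts[1].trim());
            }
        }
        return levels;
    }

    public static void main(String[] args) {
        System.out.println(parse(System.getProperty("xml.catalog.logging", "cache:warn,response:info")));
    }
}
```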

If you prefer to specify this in a configuration file, use the catalog-logging property. Note, however, that this setting only applies if the system property is not set. (Because the logging class doesn’t have access to the resolver configuration, it can’t apply the usual defaulting rules.)