2013-11-15

Utilizing XLink in WIS Discovery Metadata to compactify common information

When looking at multiple metadata records, one easily finds the same sizable blocks of information repeated many times, for example the point of contact, citation, or reference system.  It is a natural idea to use the XLink attributes in the ISO 19139 XML Schema to avoid this, and that has been discussed many times in the WIS community over the years.
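
To make the idea concrete, here is a minimal sketch in Python comparing an inline point of contact with the same property written as an XLink reference. The organisation name, e-mail address, and all example.org URLs are hypothetical placeholders, and the fragments are illustrative rather than complete, schema-valid records.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
XLINK = "http://www.w3.org/1999/xlink"

# Inline form: the whole contact is repeated in every record.
# All names and addresses below are made up for illustration.
INLINE = f"""<gmd:pointOfContact xmlns:gmd="{GMD}" xmlns:gco="{GCO}">
  <gmd:CI_ResponsibleParty>
    <gmd:organisationName>
      <gco:CharacterString>Example Met Service</gco:CharacterString>
    </gmd:organisationName>
    <gmd:contactInfo>
      <gmd:CI_Contact>
        <gmd:address>
          <gmd:CI_Address>
            <gmd:electronicMailAddress>
              <gco:CharacterString>metadata@example.org</gco:CharacterString>
            </gmd:electronicMailAddress>
          </gmd:CI_Address>
        </gmd:address>
      </gmd:CI_Contact>
    </gmd:contactInfo>
    <gmd:role>
      <gmd:CI_RoleCode codeList="http://example.org/codelists#CI_RoleCode"
                       codeListValue="pointOfContact">pointOfContact</gmd:CI_RoleCode>
    </gmd:role>
  </gmd:CI_ResponsibleParty>
</gmd:pointOfContact>"""

# Linked form: the property element only carries xlink attributes.
# The href is a hypothetical identifier of a shared contact object.
LINKED = (
    f'<gmd:pointOfContact xmlns:gmd="{GMD}" xmlns:xlink="{XLINK}" '
    'xlink:href="http://example.org/contacts/example-met-service" '
    'xlink:title="Example Met Service"/>'
)


def xlink_attrs(xml_text):
    """Return the xlink:* attributes of the root element, keyed by local name."""
    root = ET.fromstring(xml_text)
    prefix = "{" + XLINK + "}"
    return {k[len(prefix):]: v for k, v in root.attrib.items() if k.startswith(prefix)}


if __name__ == "__main__":
    print("inline size:", len(INLINE), "characters")
    print("linked size:", len(LINKED), "characters")
    print("xlink attributes:", xlink_attrs(LINKED))
```

The saving per record is modest, but it multiplies across every record that repeats the same contact.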

But I have to say there has been no tangible outcome so far.  Everybody likes to talk about the vision, but apparently nobody knows how to get there.  I feel obliged to explain the already-known difficulties.

Note: the following is only my personal view, and is not to be considered the official position of any organization.


There seem to be at least three major problems: persistence, reachability, and strong typing.

Persistence


Persistence is clearly necessary.  If a metadata record can be fully understood only after interpreting the remote content pointed to by the xlink, the record may break silently when the persistence of the remote object is not reliable.

In the current WIS framework the GISCs exchange the entire set of metadata, and there is no primary GISC in the network.  Thanks to this loose coupling, a GISC can assure the basic continuity of its service just by taking care of its own already-harvested set of metadata.  If common objects are to be hosted somewhere and xlinked, there must be some formal arrangement to establish a hosting service that ensures at least the same level of commitment to long-term continuity.

To avoid a "silent break", it would be convenient to require that the URI point to an immutable object; i.e. any change is prohibited once a record has been made.  But that is not very practical, since most of the information has to be updated from time to time.

Reachability


Reachability is also a potential concern.  Years ago I heard that some people in the WIS community were eager to establish GISC web services on private networks, probably in the expectation of improving the operation of the GTS.  In such a vision, it is considered really bad to introduce a dependency on everyday connectivity to the Internet in order to reach the xlinked service.  Right now I'm not under such pressure, but it's too early to declare that the idea has gone away entirely.

Strong Typing

Many people believe that the resource which an xlink:href points to must be a valid XML document or fragment, and that it must validate against the ISO 19139 XML Schema as the same XML element as the one on which the XLink is written.  I can't find such a rule in ISO 19139 or 19118, but the notion is very strong.  In my observation such people tend to consider XML as a serialization of strongly typed data structures (such as Java classes), and expect every application to load XML documents through a rigid decoder such as a Java binding.

That seems to be what lies behind people's reaction against XLink proposals.  Persistence or reachability is given as the reason, but it is the strongly typed processing model that implies tight coupling between centers, and that is what makes those requirements serious.  For example, there is concern about what happens if ISO 19139 is updated, among people who remember suffering from different versions of the GML XSDs.

In reality, however, the WMO Core Metadata Profile (WCMP) has not assumed strongly typed xlinks.  In the previous version 1.2 there are examples of xlink:href pointing to HTML, PDF, and even DOC files.  These liberal examples are not present in the latest WCMP 1.3, but that is simply because we retained only the really essential rules, not because we agreed to enforce strong typing of xlinks. [Note 2013-12-17: It is almost certain that we have to allow keyword thesauri that do not have a GML CodeListCatalogue, and the use of gmx:Anchor is not limited to WMO Codelists in this respect.]


Possible Way Forward

I think loose coupling is essential to promote the xlink idea.  Here that means the question is whether we can confirm that a GISC does not have to automatically process the objects behind xlinks.  If we could answer yes, that would free us from a number of requirements; the remote objects would be a recommended but non-mandatory practice for supplying additional (probably human-readable) information.

In that way of thinking, the attribute xlink:href is just an identifier of the external resource, and the search index should be constructed, if necessary, from xlink:title.  That would achieve a balance between compact references and robust processing.
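
As a rough illustration of this loosely coupled processing, the following Python sketch builds search text for a record from its inline text plus any xlink:title values, and keeps xlink:href only as an opaque identifier. Nothing is fetched from the network. The sample record, its file identifier, and the example.org URLs are hypothetical, the record is deliberately incomplete, and the function names are my own, not part of any WIS software.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
HREF = f"{{{XLINK}}}href"    # Clark notation used by ElementTree
TITLE = f"{{{XLINK}}}title"

# Hypothetical, incomplete record used only for illustration.
RECORD = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <gmd:fileIdentifier>
    <gco:CharacterString>urn:x-wmo:md:int.example::SAMPLE</gco:CharacterString>
  </gmd:fileIdentifier>
  <gmd:contact xlink:href="http://example.org/contacts/example-met-service"
               xlink:title="Example Met Service"/>
  <gmd:referenceSystemInfo xlink:href="http://example.org/crs/wgs84"/>
</gmd:MD_Metadata>"""


def collect_xlinks(record_xml):
    """List (element local name, href, title) for every xlinked element.

    The href is never dereferenced; title is None when xlink:title is absent.
    """
    root = ET.fromstring(record_xml)
    links = []
    for elem in root.iter():
        href = elem.get(HREF)
        if href is not None:
            local_name = elem.tag.rsplit("}", 1)[-1]
            links.append((local_name, href, elem.get(TITLE)))
    return links


def index_text(record_xml):
    """Searchable text: inline text nodes followed by xlink:title values."""
    root = ET.fromstring(record_xml)
    parts = [t.strip() for t in root.itertext() if t.strip()]
    parts += [title for _, _, title in collect_xlinks(record_xml) if title]
    return " ".join(parts)


if __name__ == "__main__":
    for name, href, title in collect_xlinks(RECORD):
        print(name, href, title)
    print(index_text(RECORD))
```

An xlink without a title, like the reference system here, simply contributes nothing to the index; the record still processes without error, which is the robustness this approach aims for.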

The work of the Aviation XML people might have synergy with that direction.  They are trying to establish a web-based registry, http://codes.wmo.int/, to manage the codelists used in the AvXML standard.  As the about page says, the UK Met Office (or perhaps UKgovLD?) is providing the service, so continuity should be okay.  They serve RDF, not geographical metadata, in addition to human-readable HTML; but I think that is a good point with regard to format stability.