2013-12-20

XSLT (XPath) cannot find where the namespace declaration is

Yesterday I got stuck on a tricky feature of XML.

For some reason I had to remove redundant default namespace declarations (the things that look like an "xmlns" attribute) from a large number of very large XML documents.  The job is not finished yet, and this is an interim memo.

Before actually doing the job, I wanted to see how many such declarations there are and where they occur.  Surprisingly, XSLT cannot do that.  The "namespace::" axis of XPath does not find the literal declaration text in the XML serialization; it matches conceptual namespace nodes, which every descendant element within the scope of the declaration carries as its own copy [XPath].  So it cannot tell an element that redeclares the namespace from one that merely inherits it, and hence cannot detect a redundant NS declaration.

I think this has to be programmed against an XML parser.  I wrote a Ruby script that uses the libxml2 Reader interface.
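
The idea can be sketched in a few lines.  This is not the Ruby script mentioned above; it is a minimal Python illustration using the standard library's iterparse, which reports each namespace declaration as a "start-ns" event just before the "start" event of the element that carries it.  The function name find_redundant_default_ns is made up for this example, and "redundant" is assumed to mean "declares the same default namespace URI that is already in scope".

```python
import io
import xml.etree.ElementTree as ET


def find_redundant_default_ns(source):
    """Return (element tag, URI) for each redundant xmlns="..." declaration.

    A declaration counts as redundant when the element redeclares the
    default namespace URI that is already in scope from an ancestor.
    """
    redundant = []
    in_scope = [""]   # stack of default namespace URIs; "" means none declared
    pending = None    # default namespace declared on the upcoming start tag
    for event, payload in ET.iterparse(source, events=("start-ns", "start", "end")):
        if event == "start-ns":
            prefix, uri = payload
            if prefix == "":          # only the default namespace is of interest
                pending = uri
        elif event == "start":
            if pending is None:
                in_scope.append(in_scope[-1])   # inherits from the parent
            else:
                if pending == in_scope[-1]:
                    redundant.append((payload.tag, pending))
                in_scope.append(pending)
                pending = None
        else:  # "end": leave the scope of this element
            in_scope.pop()
    return redundant


doc = (b'<a xmlns="urn:x"><b xmlns="urn:x"><c/></b>'
       b'<d xmlns="urn:y"><e xmlns="urn:y"/></d></a>')
print(find_redundant_default_ns(io.BytesIO(doc)))
# → [('{urn:x}b', 'urn:x'), ('{urn:y}e', 'urn:y')]
```

Because the parser streams events, the whole document never has to be inspected through XPath namespace nodes, which is exactly where the XSLT approach fails.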



2013-12-18

XSLT to extract metadata from OAI-PMH GetRecord response

Among the GISCs of the WMO Information System, metadata records are exchanged using OAI-PMH.  OAI-PMH is an HTTP-based protocol in which the server's response is an XML document that encapsulates metadata record(s).

It sounds so easy.  It is just a matter of extracting /OAI-PMH/GetRecord/record/metadata/gmd:MD_Metadata.  The following command would suffice:

$ xmllint --xpath '//*[local-name()="MD_Metadata"][1]' input.xml > output.xml

Even with an older version of libxml2, a few lines of equivalent XSLT would do the same job.  That is what I thought until today.  But it was no good.

WMO Core Metadata Profile version 1.3 for some reason prohibits the use of a default namespace declaration.  But the above command produces an undesired default namespace declaration if the OAI-PMH response uses one.  Oh no....!

It is a bit tricky to remove a namespace declaration.  The exclude-result-prefixes attribute works only on literal result elements.  That means you have to write <gmd:MD_Metadata> instead of using xsl:copy-of, xsl:copy or xsl:element.

<xsl:stylesheet version="1.0"
 xmlns:gmd="http://www.isotc211.org/2005/gmd"
 xmlns:oai="http://www.openarchives.org/OAI/2.0/"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 exclude-result-prefixes="oai"
 >
<xsl:output method="xml" omit-xml-declaration="no" />
<xsl:template match="/">
 <xsl:apply-templates select=".//oai:metadata/*[1]"/>
</xsl:template>
<xsl:template match="gmd:MD_Metadata">
 <!-- this is the literal result element -->
 <gmd:MD_Metadata>
  <!-- you might wish to override xsi:schemaLocation by ISO standard -->
  <xsl:apply-templates select="*|@*|text()"/>
 </gmd:MD_Metadata>
</xsl:template>
<xsl:template match="*">
 <!-- xsl:copy-of brings undesirable xmlns= even under MD_Metadata -->
 <xsl:copy>
 <xsl:apply-templates select="*|@*|text()"/>
 </xsl:copy>
</xsl:template>
<xsl:template match="@*">
 <xsl:copy-of select="." />
</xsl:template>
</xsl:stylesheet>
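
For comparison, here is a minimal Python sketch of the same extraction using only the standard library.  It is an alternative to the stylesheet, not a translation of it.  It relies on the fact that xml.etree.ElementTree writes prefixed element names and does not emit a default namespace declaration unless asked to.  The function name extract_metadata and the sample envelope are made up for this example; only the "gmd" prefix is registered, so any other namespace in a real record would get a generated prefix such as ns0 unless it is registered too.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
GMD_NS = "http://www.isotc211.org/2005/gmd"


def extract_metadata(oai_xml):
    """Return the first child of oai:metadata, serialized without xmlns="..."."""
    # Ask the serializer to use "gmd:" instead of a generated prefix.
    ET.register_namespace("gmd", GMD_NS)
    root = ET.fromstring(oai_xml)
    container = root.find(".//{%s}metadata" % OAI_NS)
    if container is None or len(container) == 0:
        raise ValueError("no metadata payload found")
    payload = container[0]
    payload.tail = None   # drop whitespace that followed the element in the envelope
    return ET.tostring(payload, encoding="unicode")


# Hypothetical, heavily shortened GetRecord response that uses a default namespace.
sample = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    '<GetRecord><record><metadata>'
    '<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd">'
    '<gmd:fileIdentifier/>'
    '</gmd:MD_Metadata>'
    '</metadata></record></GetRecord></OAI-PMH>'
)
print(extract_metadata(sample))
```

The output carries xmlns:gmd on the root element and no default namespace declaration, because the OAI-PMH namespace is simply not used by any element in the extracted subtree.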

2013-12-17

Note on NASA DIF (Directory Interchange Format) and GCMD Keywords

For a long time I knew DIF (Directory Interchange Format) only by name; it is used in GCMD (Global Change Master Directory), a catalogue operated by NASA.  Recently I have been interacting with more and more people who are interested in using GCMD keywords in the WMO/WIS Discovery Metadata, which is an extension of ISO 19139.

Resources I found in a quick search:
In the WIS community there was a question about gmd:keywordType and the uniqueness of gmd:thesaurusName.  Our profile, WCMP 1.3, requires that a given thesaurusName appear only once.  The GCMD keyword tables contain keywords of different types.  If a metadata creator wishes to use gmd:keywordType to clarify the category of the keywords, he/she has to split the MD_Keywords block by keywordType, and each block would then carry the same thesaurusName.

In the original mapping by GCMD, there is no such issue.  The ISO element "keyword" is mapped only from "Keyword" in DIF, which is free text.  But the most complex DIF element, "Parameters", is mapped to the old ISO element "category", which has probably been superseded by topicCategory; that one is unfortunately an enumeration and hence no longer extensible.  So, really unfortunately, the mapping is no longer meaningful today.

So I moved on to a more realistic mapping implementation by AADC.  It creates MD_Keywords from the following DIF elements:
Apparently there is a need to handle the use of the GCMD thesaurusName for multiple keyword types.

TT-ApMD-2 (see para 28) was aware of that situation, and recommended slightly changing thesaurusName/*/title, as follows:

"NASA/Global Change Master Directory (GCMD) Earth Science Keywords. Version 8.0.0.0.0.  (for theme)"

I know this is ugly and opinions still differ, and I really hope we reach some agreement....



WMO Common Code Table C-15 (Physical quantities) and QUDT unit of measurement

A new common code table C-15 (Physical quantities) is under development in the WMO Manual on Codes.  It is to be served as an online registry at http://codes.wmo.int/common/c-15.  As I understand it, the primary motivation at the moment is to provide semantic descriptions of the quantities used in the aviation XML.

The table is of course a list of entries, each of which describes a quantity, for example "airTemperature" http://codes.wmo.int/common/c-15/me/airTemperature.  Looking at that entry, there is a field "generalization" with the value "ThermodynamicTemperature", which links to http://qudt.org/vocab/quantity#ThermodynamicTemperature.

This is a link to QUDT.  The top page describes only the SI and CGS systems, but other conventional units seem to be covered as well.

2013-12-06

ambiguity in pressure level heights of TAC TEMP which really caused trouble


It recently came to attention that unnatural geopotential height values are sometimes reported in BUFR TEMP messages for 89532 SYOWA in Antarctica.  Those messages are converted by RTH Tokyo from the traditional alphanumeric code (TAC) FM 35.  The issue is partly a problem in the conversion software (handling of negative values), but it also partly stems from an inherent ambiguity in the TAC TEMP format: a location-independent algorithm fails to estimate the "upper digits", especially at 700 hPa.
TAC/BUFR conversion is common worldwide, and other converters might have the same problem, though the situation has not been surveyed yet.
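
A minimal sketch of the ambiguity, under stated assumptions.  It assumes, as I understand FM 35, that the 700 hPa height is reported as three digits (hhh) in whole geopotential metres with the thousands digit omitted, so that the decoder has to guess the missing digit.  The function name resolve_height, the reference heights (3000 and 2600 gpm) and the sample group "480" are all illustrative; they are not taken from the RTH Tokyo converter or from actual SYOWA reports.

```python
def resolve_height(hhh, reference_gpm):
    """Guess the full height in gpm from its last three digits.

    Among the heights ending in hhh, pick the one nearest to the
    reference height.  The result depends entirely on the reference.
    """
    base = (reference_gpm // 1000) * 1000 + hhh
    candidates = (base - 1000, base, base + 1000)
    return min(candidates, key=lambda h: abs(h - reference_gpm))


# A location-independent decoder uses one reference everywhere
# (3000 gpm is assumed here as a typical mid-latitude value).
print(resolve_height(950, 3000))   # 2950
print(resolve_height(480, 3000))   # 3480

# With a lower, cold-region reference (2600 gpm, also an assumption)
# the same reported digits decode to a height 1000 gpm lower.
print(resolve_height(480, 2600))   # 2480
```

The same three digits give heights 1000 gpm apart depending on the reference, which is why a single global rule can produce unnatural values at a station where the 700 hPa surface sits unusually low.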

2013-12-04

"iso" URN namespace - machine-readable reference to ISO standards


I found that IETF RFC 5141 http://tools.ietf.org/html/rfc5141 defines the "iso" URN namespace.  That makes it possible to cite ISO standards in a computer-readable manner.  The metadata standards used in the WMO Core Metadata Profile can be cited as follows:
  • urn:iso:std:iso:19115:2003:en
  • urn:iso:std:iso:19115:cor-1:2006:en
  • urn:iso:std:iso:ts:19139:2007:en
Interestingly enough, the author (from ISO) says that ISO requested this namespace because the URN scheme using OIDs (such as urn:oid:1.0.19115 for ISO 19115) is not human-readable.

I'm not trying to change WCMP (for example gmd:metadataStandardName), since (for now) I don't know of any requirement that the field be computer-readable.  But I think it is worth sharing.

[article also posted to WMO/IPET-MDRD]