Resources I found in a quick search:
- DIF Writer's Guide
  - An entry point to most of the official resources.
  - Contains an XML Schema or template that illustrates the structure.
  - Some elements refer to keywords, i.e. a controlled vocabulary.
- NASA gives a mapping table to ISO 19115
  - Some elements seem to be given with old names and structure.
- AADC (Australian Antarctic Data Centre) provides a converter into various profiles of ISO 19115/19139
  - Probably well done, but apparently it uses some extension to XSLT.
In the original mapping by GCMD, there is no such issue. The ISO element "keyword" is mapped only from the DIF element "Keyword", which is free text. But the most complex DIF element, "Parameters", is mapped to the old ISO element "category", which was probably superseded by topicCategory. Unfortunately topicCategory is an enumeration and hence no longer extensible, so that mapping has little meaning today.
So I moved on to the more realistic mapping implemented by AADC. It creates MD_Keywords from the following DIF elements:
- Parameters - Controlled
- Keyword - Free text
- Sensor_Name (Instruments) - Controlled
- Source_Name (Platform) - Controlled
- Paleo_Temporal_Coverage - Free numeric date range
- Paleo_Temporal_Coverage/Chronostratigraphic_Unit - Controlled, but the list is not online; probably something like the one shown on Wikipedia
- Project - Controlled
- IDN_Node - an identifier I am not familiar with
- Location - Controlled
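
As a rough illustration of the list above, here is a minimal Python sketch that collects those DIF elements into one keyword group per source element. It is not AADC's converter (which is XSLT). The sample record, the `collect_keyword_blocks` helper and the " > " joining of hierarchy levels are assumptions made for illustration, and the sample omits the XML namespace that real DIF records carry.

```python
import xml.etree.ElementTree as ET

# DIF elements that feed MD_Keywords according to the list above.
KEYWORD_SOURCES = [
    "Parameters", "Keyword", "Sensor_Name", "Source_Name",
    "Paleo_Temporal_Coverage", "Project", "IDN_Node", "Location",
]

def flatten(element):
    """Join all non-empty text inside an element, e.g. the levels of a Parameters entry."""
    parts = [text.strip() for text in element.itertext() if text.strip()]
    return " > ".join(parts)

def collect_keyword_blocks(dif_root):
    """Return {DIF element name: [keyword strings]} for the elements present in the record."""
    blocks = {}
    for name in KEYWORD_SOURCES:
        values = [flatten(element) for element in dif_root.iter(name)]
        values = [value for value in values if value]
        if values:
            blocks[name] = values
    return blocks

# Hypothetical, namespace-free DIF fragment used only for this sketch.
SAMPLE_DIF = """
<DIF>
  <Parameters>
    <Category>EARTH SCIENCE</Category>
    <Topic>ATMOSPHERE</Topic>
    <Term>AEROSOLS</Term>
  </Parameters>
  <Keyword>dust</Keyword>
  <Keyword>haze</Keyword>
  <Project><Short_Name>IPY</Short_Name></Project>
</DIF>
"""

blocks = collect_keyword_blocks(ET.fromstring(SAMPLE_DIF))
for source, keywords in blocks.items():
    print(source, keywords)
```

Each key of `blocks` would correspond to one MD_Keywords block in the ISO output; which keyword type and thesaurus citation each block receives is left out here because it depends on the converter.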
TT-ApMD-2 (see para 28) was aware of that situation, and recommended slightly changing the title in thesaurusName/*/title, as follows:
"NASA/Global Change Master Directory (GCMD) Earth Science Keywords. Version 8.0.0.0.0. (for theme)"
I know this is ugly and opinions still differ, and I really hope we can reach some agreement.
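
To make the title convention concrete, here is a small Python sketch that appends "(for <type>)" to a thesaurus title, but only when the same title appears in more than one keyword block of a record. The `disambiguate_titles` helper and the "only on collision" rule are assumptions for illustration; the actual WCMP wording may differ.

```python
from collections import Counter

def disambiguate_titles(blocks):
    """blocks: list of (thesaurus_title, keyword_type) pairs, one per keyword block.

    Returns the titles in the same order, with "(for <type>)" appended
    wherever a title is shared by more than one block.
    """
    counts = Counter(title for title, _ in blocks)
    titles = []
    for title, keyword_type in blocks:
        if counts[title] > 1:
            titles.append(f"{title} (for {keyword_type})")
        else:
            titles.append(title)
    return titles

GCMD = ("NASA/Global Change Master Directory (GCMD) "
        "Earth Science Keywords. Version 8.0.0.0.0.")

# Hypothetical record: two blocks cite the same thesaurus with different keyword types.
unique_titles = disambiguate_titles([(GCMD, "theme"), (GCMD, "place")])
for title in unique_titles:
    print(title)
```

Note that two blocks sharing both the title and the keyword type would still collide, which is one of the weaknesses of the suffix approach.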
I think that there is a strong argument for aiming to make the expression of names of things (people, organisations, thesauri, etc) consistent.
Systems functionality and interoperability (now and in the future) will be hampered by the lack of consistent implementation.
Appending "theme" to the thesaurusName/title is a solution to a local need, but it seems to come at the cost of a broader need. Is it possible to consider broadening the rule, to allow more than one instance of the same thesaurusName? If not, are there other solutions?
Also: it seems unfortunate that when this vocabulary is so broad-ranging, the recommended citation only applies to the top-level, rather than to each subsection (Projects, Instruments, etc).
If I remember correctly, WCMP's requirement that thesaurusName.title be unique was explained as preparation for the new ISO 19115-1. I'll check what the argument was, to find the best compromise among the requests.
By the way, do you plan to make real use of keywordType, for example by creating a separate search index for each keyword type? If so, that would be a persuasive reason for me to relax the WCMP requirement.
I've just checked, and the new 19115-1:2014 hasn't changed with regard to Keyword cardinality. Indeed, its introduction of KeywordClass (as an additional way to cluster keywords) might be a further argument against constraining (to 1) the number of instances of the same ThesaurusName that are allowed.
Re indexing of (and searching via) KeywordType: I imagine that faceted searching could (and ideally would) operate as follows:
1. Choose "keyword";
2. The system presents the options of a) 'freetext', b) thesaurusName list, or c) keywordType list.
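
A minimal Python sketch of the index behind those two steps might look like this. The `build_facets` helper, the tuple layout and the sample records are hypothetical; a real catalogue would of course use its search engine's own faceting.

```python
from collections import defaultdict

def build_facets(entries):
    """entries: iterable of (record_id, keyword, thesaurus_name, keyword_type).

    thesaurus_name and keyword_type may be None for free-text keywords.
    Returns three lookups, one per option offered in step 2.
    """
    freetext = defaultdict(set)
    by_thesaurus = defaultdict(set)
    by_type = defaultdict(set)
    for record_id, keyword, thesaurus_name, keyword_type in entries:
        freetext[keyword.lower()].add(record_id)
        if thesaurus_name:
            by_thesaurus[thesaurus_name].add(record_id)
        if keyword_type:
            by_type[keyword_type].add(record_id)
    return {"freetext": freetext,
            "thesaurusName": by_thesaurus,
            "keywordType": by_type}

# Hypothetical entries: the same thesaurus is used under two keyword types.
facets = build_facets([
    ("rec1", "AEROSOLS", "GCMD Earth Science Keywords", "theme"),
    ("rec1", "ANTARCTICA", "GCMD Earth Science Keywords", "place"),
    ("rec2", "dust", None, None),
])

print(sorted(facets["keywordType"]))         # keywordType list for option c)
print(sorted(facets["keywordType"]["place"]))
```

With such an index, a separate entry per keywordType only works if the type survives in the metadata, which is what the title-uniqueness constraint puts at risk.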
A further argument against this constraint might be the mapping use case. Where a metadata record exists in another 19115 profile and has to be mapped to WCMP v1.3, and it contains more than one instance of the same thesaurusName, which block should be dropped?
The last para here refers to GCMD's recommendation on how to cite their vocabularies [see: http://gcmd.nasa.gov/learn/keyword_list.html]