2010-10-30

Forward proxy of HTTPS by Apache HTTPD

I wanted to set up a forward proxy server for a web site that uses
both HTTP and HTTPS.
The basics are well documented in
<http://httpd.apache.org/docs/2.2/en/mod/mod_proxy.html>.

One thing I had to find by trial and error is that an HTTPS connection
is implemented with the HTTP CONNECT method, so it is not represented
directly in the <Proxy> directive.
Instead we have to match a URI of the form "proxy:host:443".

# for HTTP
<Proxy http://{ORIGIN-SERVER}/*>
Order deny,allow
Allow from {CLIENT-IP}
</Proxy>

# for HTTPS
<Proxy proxy:{ORIGIN-SERVER}:443>
Order deny,allow
Allow from {CLIENT-IP}
</Proxy>
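
The matching rule above can be written down as a small Ruby sketch. The helper name proxy_match_key and the host www.example.com are hypothetical; the code only encodes the rule I found by trial and error and is not part of Apache or any library.

```ruby
require 'uri'

# Hypothetical helper: returns the string to put inside <Proxy ...>
# for a given origin URL, following the rule described above.
def proxy_match_key(url)
  uri = URI(url)
  case uri.scheme
  when 'http'
    # Plain HTTP requests are matched by URL, with a wildcard.
    "http://#{uri.host}/*"
  when 'https'
    # HTTPS goes through CONNECT, so only host and port are visible.
    "proxy:#{uri.host}:#{uri.port}"
  else
    raise ArgumentError, "unsupported scheme: #{uri.scheme}"
  end
end

puts proxy_match_key('http://www.example.com/')
puts proxy_match_key('https://www.example.com/')
```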

2010-10-21

ctags for XML Schema

require 'uri'
require 'rubygems'
require 'xml'

class App

def wputs str
$stderr.puts str if $VERBOSE
end

def eputs str
$stderr.puts str
end

def assert_equal test, right
raise "#{test} != #{right}" unless test == right
end

def getopts
while /^-/ === @argv.first
case opt = @argv.shift
when /^-o(.*)/ then @outfnam = $1
end
end
end

def initialize argv
@argv = argv.dup
@cache = {}
@names = {}
@outfnam = 'tags'
getopts
end

def help
puts <<EOF
vi tags generator
usage: ruby #$0 file.xsd ...
EOF
end

XSD_NS = 'http://www.w3.org/2001/XMLSchema'

def get1 uri, disp = nil
if @cache[uri.to_s] then
wputs "skipping #{disp or uri}"
return
end
doc = XML::Document.file(uri.path)
@cache[uri.to_s] = true
children = []
assert_equal(doc.root.namespaces.namespace.href, XSD_NS)
nodes = doc.find('/xs:schema/xs:import|/xs:schema/xs:include', 'xs'=>XSD_NS)
nodes.each { |node|
children.push node['schemaLocation'].to_s
}
nodes = nil
nodes = doc.find('/xs:schema/xs:*/@name', 'xs'=>XSD_NS)
nodes.each { |node|
name = node.value
if @names[name]
eputs "duplicated #{name} in #{uri.path} and #{@names[name]}"
end
@names[name] = uri.path
}
nodes = nil
doc = nil
for child in children
get1(uri + child, child)
end
end

def run1 filename
uri = URI('file:///' + filename)
get1(uri)
end

def run
for filename in @argv
run1 filename
end
self
end

def output
File.open(@outfnam, 'w') { |fp|
now = Time.now.utc.strftime('%Y-%m-%dT%H:%M:%SZ')
fp.puts "!_TAG_FILE_SORTED\t1\tsort=case-sensitive date=#{now}"
for name in @names.keys.sort
query = '/\["\']' + name.gsub(/\W/, '.') + '\["\']/'
fp.puts [name, @names[name], query].join("\t")
end
}
eputs "saved to #{@outfnam}"
end

def close
output
self
end

end

App.new(ARGV).run.close
The generated tags file is really powerful when editing XML in vi.
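
To see what one entry of the tags file looks like, here is a minimal sketch that builds a single line the same way the output method does. The helper name tag_line and the file name gmd/citation.xsd are made up for illustration. As far as I understand, Vim runs tag search patterns with 'magic' off, in which case \[ would open a character class matching either quote character; treat that as an assumption.

```ruby
# Hypothetical helper mirroring the line format written by App#output:
# tag name, file name and a search pattern, separated by tabs.
def tag_line(name, path)
  # Non-word characters in the name are replaced by "." as in the script.
  pattern = '/\["\']' + name.gsub(/\W/, '.') + '\["\']/'
  [name, path, pattern].join("\t")
end

puts tag_line('CI_Citation_Type', 'gmd/citation.xsd')
```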

Caveat - xlink:href in ISO 19139 Geographic Metadata

I wanted to reuse the same gmd:authority/gmd:CI_Citation instance
many times, so I tried using xlink:href.

[definition]
<gmd:authority>
<gmd:CI_Citation id="url.authority">
<gmd:title>...
</gmd:CI_Citation>
</gmd:authority>

[quotation]
<gmd:authority>
<gmd:CI_Citation xlink:href="#url.authority"/>
</gmd:authority>

But this causes a validation error saying that xlink:href is not allowed in CI_Citation.
It took a while, but I finally found the reason by reading the XML schema.

The xlink:href attribute must be attached to the parent element, that
is, the element that would contain the omitted element if xlink:href
were not used.
So the following validates.

<gmd:authority>
<gmd:CI_Citation id="url.authority">
<gmd:title>...
</gmd:CI_Citation>
</gmd:authority>

<gmd:authority xlink:href="#url.authority"/>
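
A reader of such a document then has to follow the reference from the parent element. Below is a minimal sketch using REXML from the Ruby standard distribution (not libxml-ruby). The wrapping root element, the title text and the helper name resolve_href are assumptions added for illustration, and only same-document "#id" references are handled.

```ruby
require 'rexml/document'

# Sample document; the <root> wrapper and the title text are made up.
XLINK_SAMPLE = <<EOF
<root xmlns:gmd="http://www.isotc211.org/2005/gmd"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <gmd:authority>
    <gmd:CI_Citation id="url.authority">
      <gmd:title>Example title</gmd:title>
    </gmd:CI_Citation>
  </gmd:authority>
  <gmd:authority xlink:href="#url.authority"/>
</root>
EOF

# Hypothetical helper: returns the child object of a property element,
# following a same-document xlink:href ("#id") when the child is omitted.
def resolve_href(doc, property)
  href = property.attributes['xlink:href']
  return property.elements[1] unless href
  id = href.sub(/\A#/, '')
  REXML::XPath.first(doc, "//*[@id='#{id}']")
end

doc = REXML::Document.new(XLINK_SAMPLE)
doc.root.each_element('gmd:authority') do |prop|
  target = resolve_href(doc, prop)
  puts "#{prop.expanded_name} -> #{target.expanded_name}"
end
```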

2010-10-20

A libxml-ruby script to download XSD recursively

require 'uri'
require 'net/http'
require 'rubygems'
require 'xml'

class App

def initialize argv
@argv = argv
@cache = {}
@htconn = {}
end

def help
puts <<EOF
XSD downloader following includes and imports
usage: ruby #$0 [-pNUM] uri ...
-pNUM number of directory structure (including hostname) to be stripped
EOF
end

def outfnam uri
File.join(*[uri.host, uri.path].compact)
end

def close
@htconn[:conn].finish if @htconn[:conn]
end

def getconn uri
shp = [uri.scheme, uri.host, uri.port]
if @htconn[:shp] == shp then
puts 'reusing connection'
yield @htconn[:conn]
else
@htconn[:conn].finish if @htconn[:conn]
puts "connecting #{shp.join(' ')}"
@htconn[:shp] = shp
@htconn[:conn] = Net::HTTP.new(uri.host, uri.port)
@htconn[:conn].start
yield @htconn[:conn]
end
end

def assert_equal test, right
raise "#{test} != #{right}" unless test == right
end

XSD_NS = 'http://www.w3.org/2001/XMLSchema'

def mkdir_p dirname
return nil if File.directory?(dirname)
raise Errno::ENOTDIR, "not a directory: (#{dirname})" if
File.exist?(dirname)
puts "mkdir #{dirname}"
mkdir_p(File.dirname(dirname))
Dir.mkdir(dirname)
end

def savefile filename, content
puts "saving #{filename}"
mkdir_p File.dirname(filename)
File.open(filename, 'wb') { |fp| fp.write(content) }
end

def get1 uri, disp = nil
if @cache[uri.to_s] then
puts "skipping #{disp or uri}"
return
end
ofn = outfnam(uri)
buf = nil
getconn(uri) {|conn|
resp = conn.get(uri.path)
raise "#{resp.code} #{resp.message}" unless /^200/ === resp.code
buf = resp.body
}
savefile(ofn, buf)
@cache[uri.to_s] = true
doc = XML::Document.string(buf)
assert_equal(doc.root.namespaces.namespace.href, XSD_NS)
nodes = doc.find('/xs:schema/xs:import|/xs:schema/xs:include', 'xs'=>XSD_NS)
nodes.each { |node|
child = node['schemaLocation'].to_s
get1(uri + child, child)
}
nodes = nil
end

def run1 arg
uri = URI(arg)
get1(uri)
end

def run
for arg in @argv
run1 arg
end
self
end

end

App.new(ARGV).run.close
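
The recursion in get1 relies on URI#+ to resolve each schemaLocation against the including schema. A small sketch of that behaviour follows; the hosts and paths are placeholders, and local_path is a hypothetical copy of the outfnam rule.

```ruby
require 'uri'

# Hypothetical base URI; example.com is a placeholder host.
base = URI('http://example.com/schemas/gmd/gmd.xsd')

# Relative schemaLocation values are resolved against the including file.
sibling = base + 'citation.xsd'
parent  = base + '../gco/gco.xsd'
# An absolute schemaLocation replaces the base entirely.
other   = base + 'http://example.org/xlink/xlinks.xsd'

# Same rule as outfnam in the script: host and path become the local path.
def local_path(uri)
  File.join(*[uri.host, uri.path].compact)
end

[sibling, parent, other].each do |uri|
  puts "#{uri} -> #{local_path(uri)}"
end
```

Because @cache is keyed by the resolved URI string, a schema reached through two different relative paths is still downloaded only once.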

2010-10-13

How to create processing instruction when writing XML using libxml-ruby

Actually I didn't find a way. So I had to use libxslt-ruby and
apply a stylesheet that simply inserts the PI.

require 'libxslt'
if @xslt
filter = <<-END_OF_XSLT
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:processing-instruction name="xml-stylesheet"
>type="text/xsl" href="#{@xslt}"</xsl:processing-instruction>
<xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>
END_OF_XSLT
stylesheet = LibXSLT::XSLT::Stylesheet.new(XML::Document.string(filter))
@xdoc = stylesheet.apply(@xdoc)
end
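
For comparison only, here is a sketch of the same result with a different library: REXML from the Ruby standard distribution, not libxml-ruby or libxslt-ruby. It does not answer the libxml-ruby question; the file name style.xsl and the <root/> document are placeholders.

```ruby
require 'rexml/document'

# Placeholder document and stylesheet name, for illustration only.
doc = REXML::Document.new('<root/>')
pi = REXML::Instruction.new('xml-stylesheet',
                            'type="text/xsl" href="style.xsl"')

# Put the processing instruction in front of the root element.
doc.insert_before(doc.root, pi)

puts doc.to_s
```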

2010-09-16

An XSLT that collects and counts same-named elements scattered throughout an XML document

=== XSLT ===
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:in="http://example.com/hack"
  xmlns="http://example.com/hack2"
  >

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <root>
      <xsl:call-template name="kvlistParser">
        <xsl:with-param name="kvlist">
          |uri=http://example.com/hack|ln=node1|param=string|
          |uri=http://example.com/hack|ln=node2|param=float|
          |uri=http://example.com/hack|ln=node3|param=date|
        </xsl:with-param>
      </xsl:call-template>
    </root>
  </xsl:template>

  <xsl:template name="kvlistParser">
    <xsl:param name="kvlist"/>
    <xsl:variable name="nkvl" select="normalize-space($kvlist)"/>
    <xsl:choose>
      <xsl:when test="contains($nkvl, ' ')">
        <xsl:call-template name="collectElems">
          <xsl:with-param name="keyval" select="substring-before($nkvl, ' ')"/>
        </xsl:call-template>
        <xsl:call-template name="kvlistParser">
          <xsl:with-param name="kvlist" select="substring-after($nkvl, ' ')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:call-template name="collectElems">
          <xsl:with-param name="keyval" select="$nkvl"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template name="collectElems">
    <xsl:param name="keyval"/>
    <xsl:variable name="uri">
      <xsl:call-template name="keyvalGet">
        <xsl:with-param name="keyval" select="$keyval"/>
        <xsl:with-param name="key" select="'uri'"/>
      </xsl:call-template>
    </xsl:variable>
    <xsl:variable name="tagn">
      <xsl:call-template name="keyvalGet">
        <xsl:with-param name="keyval" select="$keyval"/>
        <xsl:with-param name="key" select="'ln'"/>
      </xsl:call-template>
    </xsl:variable>
    <class debug="{$keyval}">
      <tagname namespace="{$uri}">
        <xsl:value-of select="$tagn"/>
      </tagname>
      <count><xsl:value-of select="count(//*[local-name()=$tagn][namespace-uri()=$uri])"/></count>
      <xsl:for-each select="//*[local-name()=$tagn][namespace-uri()=$uri]">
        <leaf><xsl:value-of select="."/></leaf>
      </xsl:for-each>
    </class>
  </xsl:template>

  <xsl:template name="keyvalGet">
    <xsl:param name="keyval"/>
    <xsl:param name="key"/>
    <xsl:value-of
      select="substring-before(
      substring-after($keyval, concat('|', $key, '=')), '|')"
      />
  </xsl:template>

</xsl:stylesheet>
=== input ===
<?xml version="1.0"?>
<weirdPrefix:root
  xmlns:weirdPrefix="http://example.com/hack">
  <weirdPrefix:node2>val2a</weirdPrefix:node2>
  <weirdPrefix:node1>val1a</weirdPrefix:node1>
  <weirdPrefix:node2>val2b</weirdPrefix:node2>
  <weirdPrefix:node1>val1b</weirdPrefix:node1>
  <weirdPrefix:node1>val1c</weirdPrefix:node1>
  <weirdPrefix:node2>val2c</weirdPrefix:node2>
</weirdPrefix:root>
=== output ===
<?xml version="1.0"?>
<root xmlns="http://example.com/hack2" xmlns:in="http://example.com/hack">
  <class debug="|uri=http://example.com/hack|ln=node1|param=string|">
    <tagname namespace="http://example.com/hack">node1</tagname>
    <count>3</count>
    <leaf>val1a</leaf>
    <leaf>val1b</leaf>
    <leaf>val1c</leaf>
  </class>
  <class debug="|uri=http://example.com/hack|ln=node2|param=float|">
    <tagname namespace="http://example.com/hack">node2</tagname>
    <count>3</count>
    <leaf>val2a</leaf>
    <leaf>val2b</leaf>
    <leaf>val2c</leaf>
  </class>
  <class debug="|uri=http://example.com/hack|ln=node3|param=date|">
    <tagname namespace="http://example.com/hack">node3</tagname>
    <count>0</count>
  </class>
</root>
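
The keyvalGet template is the least obvious part: it pulls one value out of a "|key=value|key=value|" record using only substring-after and substring-before. A Ruby sketch of the same logic, with a hypothetical function name keyval_get, may make it easier to follow.

```ruby
# Hypothetical Ruby equivalent of the keyvalGet template.
# Like the XPath functions, it returns '' when a marker is not found.
def keyval_get(keyval, key)
  marker = "|#{key}="
  start = keyval.index(marker)
  return '' unless start
  # substring-after($keyval, concat('|', $key, '='))
  rest = keyval[(start + marker.length)..-1]
  # substring-before(..., '|')
  stop = rest.index('|')
  stop ? rest[0, stop] : ''
end

record = '|uri=http://example.com/hack|ln=node1|param=string|'
puts keyval_get(record, 'uri')
puts keyval_get(record, 'ln')
```

The leading and trailing "|" in each record matter: without the trailing one, the last value would come back empty.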

2010-08-06

MD_ProjectionParameters

The ISO standard for projection parameters is MD_ProjectionParameters in ISO 19115.

Unfortunately it is not included in WMO Core Profile v1.1 of Metadata.

But at least we can safely say that if we are going to define something of our own, we have to respect it.

2010-05-21

Thanks for response: relative URI in OAI

Glad to see a response. 

http://www.openarchives.org/pipermail/oai-general/2010-May/000507.html

It's good to see that the OAI community shares my sense that relative URIs are no good.  I have nothing to add, but I didn't know about TAG httpRange-14, which is pretty informative.

2010-05-19

IPET-MDI-1 final report about to conclude

No new issues seem to be arising.  I think it must conclude this week.
I haven't finished the task list I offered, though....

An essay on OAI-PMH in WIS

http://docs.google.com/Doc?docid=0AdzHt7XJp45TZGY0a21ncXNfNTJ4N3FjbjRobQ&hl=en

2010-05-09

(Gmail) keyboard shortcut to remove label?

I use several labels as multiple inboxes, so I'm frustrated every day
when I mark threads as read and then try to remove the label. Maybe there is
a better way of working....

2010-05-07

mail to blog was so nice

I found that blogspot.com accepts email sent to a secret address as a
post. It's so nice! This will be my memo pad. And some open discussions
could be blogged by BCC-ing.

How to insert multiple language texts into ISO 19139 metadata document

I had long wondered about this. I knew about the global xml:lang attribute, but
as stated in Section 6.6 of the standard, "[i]f a particular element
can only occur once based on the encoding rules discussed in Clause 8
then the technique of using the special xml:lang attribute to indicate
the language does not allow for the specification of the same element
in two or more languages."

But Appendix D.4 shows an alternative way.

First, there must be a default locale (a pair of language and character
set) given as MD_Metadata/*[language or characterSet]:

<language>
<LanguageCode codeList="resources/Codelist/gmxcodelists.xml#LanguageCode"
codeListValue="eng"> English </LanguageCode>
</language>
<characterSet>
<MD_CharacterSetCode
codeList="resources/Codelist/gmxcodelists.xml#MD_CharacterSetCode"
codeListValue="utf8"> UTF-8 </MD_CharacterSetCode>
</characterSet>

Then there must also be an alternate locale at MD_Metadata/locale:

<locale>
<PT_Locale id="locale-fr">
<languageCode>
<LanguageCode codeList="resources/Codelist/gmxcodelists.xml#LanguageCode"
codeListValue="fra"> French </LanguageCode>
</languageCode>
<characterEncoding>
<MD_CharacterSetCode
codeList="resources/Codelist/gmxcodelists.xml#MD_CharacterSetCode"
codeListValue="utf8">UTF 8</MD_CharacterSetCode>
</characterEncoding>
</PT_Locale>
</locale>

Finally we can use this construct in <abstract>:

<abstract xsi:type="PT_FreeText_PropertyType">
<gco:CharacterString>Brief narrative summary of the content of the
resource</gco:CharacterString>
<!--== Alternative value ==-->
<PT_FreeText>
<textGroup>
<LocalisedCharacterString locale="#locale-fr">Résumé succinct du
contenu de la ressource</LocalisedCharacterString>
</textGroup>
</PT_FreeText>
</abstract>

Complex, isn't it?

If we cannot speak French, we can prepare a reference and let a French
speaker write the translation file:

<abstract xsi:type="PT_FreeText_PropertyType">
<gco:CharacterString>Brief narrative summary of the content of the
resource</gco:CharacterString>
<!--== Alternative value ==-->
<PT_FreeText>
<textGroup xlink:href="fr-fr.xml#abstract-fr"/>
</PT_FreeText>
</abstract>

With this approach the referenced <textGroup> must exist. If there is
none, we have to edit an English-only version.
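
On the reading side, picking the text for a locale from this construct could look like the sketch below. It uses REXML from the Ruby standard distribution and the unprefixed element names from the snippets above; the helper name text_for and the cut-down sample document are assumptions, and xlink:href references to a translation file are not followed.

```ruby
require 'rexml/document'

# Cut-down sample based on the <abstract> construct above.
FREETEXT_SAMPLE = <<EOF
<abstract xmlns:gco="http://www.isotc211.org/2005/gco">
  <gco:CharacterString>Brief narrative summary of the content of the resource</gco:CharacterString>
  <PT_FreeText>
    <textGroup>
      <LocalisedCharacterString locale="#locale-fr">Résumé succinct du contenu de la ressource</LocalisedCharacterString>
    </textGroup>
  </PT_FreeText>
</abstract>
EOF

# Hypothetical helper: returns the text for the given locale id,
# falling back to the default-locale gco:CharacterString.
def text_for(element, locale_id = nil)
  if locale_id
    wanted = '#' + locale_id
    element.each_element('PT_FreeText/textGroup/LocalisedCharacterString') do |lcs|
      return lcs.text if lcs.attributes['locale'] == wanted
    end
  end
  element.elements['gco:CharacterString'].text
end

abstract = REXML::Document.new(FREETEXT_SAMPLE).root
puts text_for(abstract)
puts text_for(abstract, 'locale-fr')
```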

Compatibility Study on Regular Expression

I did a brief survey to find a portable subset of regular expression syntax.
http://docs.google.com/Doc?docid=0AdzHt7XJp45TZGY0a21ncXNfNTBmZm16bXFnag&hl=en

2010-05-05

CF-Metadata standard name grammar

Jonathan Gregory is doing some kind of linguistics. He analyzed the standard names of CF.
http://www.met.rdg.ac.uk/~jonathan/CF_metadata/13/standard_name_grammar.html#lexicon
http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2010/003451.html

I feel we have to learn from the CF community's heuristic approach. If it were WIS, an expert team would spend four years discussing standard_name syntax before collecting examples. Of course that would not work.

2010-03-29

NetCDF uses UTF-8

John Caron of UCAR replied to my question. http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2010/003370.html That's really good news because we don't have to fight for internationalization.

2010-01-11

note: METAR Structure

I spent this weekend writing a memorandum on the structure of the BUFR template for translating the METAR code form. It should be useful for writing a decoder or designing an XML translation.

2010-01-01

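Today's research

It turns out I have to create a metadata profile in a hurry.

From the viewpoint of consistency with domestic practice, JMP2.0 is the starting point, but I don't feel it is usable on its own yet (not specialized enough?). So I read 山尾理 et al., 2009: 海洋情報メタデータプロファイル (marine information metadata profile). 海洋情報部技報, 27, 1--8, which looked like a useful reference.

Since the paper refers to item numbers in the JMP2.0 dictionary, cross-checking against them also deepened my understanding of JMP. Some scattered observations:

  • The marine profile extends JMP2.0 considerably. In particular, items such as observation platform, publishing journal, governing legislation for the sea area, and data (physical quantity) classification are important, but JMP has no counterpart for them, so they are mapped, in parentheses, to (38), i.e. descriptiveKeywords//keyword. This is presumably a one-way mapping that only considers conversion from the marine clearinghouse to JMP.
  • I sensed how much easier things get once round-tripping is not required.
  • There are not many mandatory items.

Next, I will compare the mandatory items with ISO Core Mandatory, and then with Atom Mandatory. That should probably reveal what is really needed.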