GarryJolleyRogers - Wed Nov 25 2009 - Version 1.5
* NovitatesTreatment_1_0.xml: Natural Language Coding of a taxonomic treatment in the AMNH legacy literature project
This is my first attempt at an SDD coding of the Novitates ant treatment of Baikurus casei, working from the digitized version at http://research.amnh.org/informatics/ants/pdf/N3208.pdf , the initial digitization of which is http://research.amnh.org/informatics/ants/xml_lite/N3208.xml . See also other representations of this particular treatment at http://research.amnh.org/informatics/ants/
Especially see Ants.WebHome
I've done the treatment entirely with the
The essential point of SDD NLD markup is that subsequent agents acting on the document with a refined Terminology could mark it up further. For example, the HEAD section could be refined to mark up the ocelli, the eyes, etc. separately, all the way to detailed character/state-based markup that would be as searchable with XQuery as if it had been derived from a database in the first place. Accomplishing that markup is either expert handwork or a subject of research in natural language processing of morphological characters, such as that in Bryan Heidorn's lab.

What's missing in this coding.

It's important to the AMNH work to also include certain automatically marked document characteristics, including page boundaries, figure locations, etc. I haven't tried to capture that here, in part because there have not yet been any attempts to use SDD markup as fragments (e.g. via namespaces) of something else. This doesn't seem difficult in principle, although the heavy dependence of SDD on key/keyref mechanisms could make validation complicated in such a scenario. Particularly unclear to me are the consequences of the (common) case where these document objects come within an informational object, such as the text of the use of a Concept in a
Less obscure is that I have omitted the (well-defined) use of mechanisms for describing special objects like tables, figures, and images. Normally in SDD these might be treated as part of the
Finally note two things:
1. NovitatesTreatment_1_0.xml has had the debugRef tool applied, which makes it easier to see in situ which Concepts the descriptions use. The descriptions themselves contain only references, much like foreign keys in a traditional database. The tool inserts a heuristically chosen text (usually a mandatory label) from the referenced object (here, a Concept) at the point of its reference.
2. My experience with SDD, both in the design and use, has been with
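To illustrate the kind of reference resolution described in item 1, here is a minimal Python sketch. It is not the debugRef tool itself, and it does not use the real SDD schema: the element names (Document, Concepts, Concept, Label, Description, Text) and the attribute names (key, keyref, debuglabel) are simplified assumptions chosen only to show the idea of copying a label from a referenced object to the point of reference.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified document. This is NOT the real SDD schema;
# element and attribute names are assumptions for illustration only.
SAMPLE = """
<Document>
  <Concepts>
    <Concept key="c1"><Label>Head</Label></Concept>
    <Concept key="c2"><Label>Mesosoma</Label></Concept>
  </Concepts>
  <Description>
    <Text keyref="c1">Head longer than broad.</Text>
    <Text keyref="c2">Mesosoma slender.</Text>
    <Text keyref="c9">Unresolved reference.</Text>
  </Description>
</Document>
"""


def annotate_refs(root):
    """Copy each referenced Concept's label onto the referring element.

    The label is stored in a (hypothetical) 'debuglabel' attribute.
    Returns the list of keyref values that matched no Concept key.
    """
    # Build a lookup table from Concept key to its label text,
    # much like indexing a table by its primary key.
    labels = {}
    for concept in root.iter("Concept"):
        label = concept.find("Label")
        if label is not None and label.text:
            labels[concept.get("key")] = label.text.strip()

    # Visit every element carrying a keyref and resolve it.
    unresolved = []
    for element in root.iter():
        ref = element.get("keyref")
        if ref is None:
            continue
        if ref in labels:
            element.set("debuglabel", labels[ref])
        else:
            unresolved.append(ref)
    return unresolved


root = ET.fromstring(SAMPLE)
unresolved = annotate_refs(root)

for text in root.iter("Text"):
    print(text.get("keyref"), "->", text.get("debuglabel"))
print("unresolved:", unresolved)
```

The same lookup table also gives a crude check for dangling references, which is one of the things key/keyref validation would normally catch in a schema-aware validator.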
-- Main.BobMorris - 06 Jan 2005