NaturalLanguageDescriptions
GregorHagedorn - Sat Jun 13 2009 - Version 1.14
SDD Part 0: Introduction and Primer to the SDD Standard
2.3 Natural language descriptions.
2.3.1 Traditional natural language descriptions.
Natural-language descriptions (Box 2.3.1) are semi-structured, semi-formalised descriptions of a taxon (or occasionally of an individual specimen). They may be short, simple and written in plain language (for a popular field guide), or long, highly formal and full of specialised terminology (for a taxonomic monograph or other treatment).
Box 2.3.1 - Typical natural language descriptions
Red Knot (Calidris canutus) from Slater, P., Slater, P. & Slater, R. (2001) The Slater Field Guide to Australian Birds (Reed New Holland: Sydney)
Discaria pubescens (Brongn.) Druce from Walsh, N.G. (1999) Rhamnaceae, in N.G. Walsh & T.J. Entwisle, Flora of Victoria Volume 4, Dicotyledons, Cornaceae to Asteraceae (Inkata Press: Melbourne)
There are two ways to produce natural language descriptions within SDD:
- Descriptions may be produced elsewhere and simply stored within an SDD instance document; these are termed "authored natural language descriptions".
- Descriptions may be generated from data and text snippets held within the SDD instance document; these are termed "marked up natural language descriptions".
2.3.2 Authored natural language descriptions.
Authored natural language descriptions are simply descriptions written by hand, either within an application or imported into one, including legacy descriptions taken from existing publications. Within SDD, "authored" descriptions may never be overwritten by a natural language reporting process, whereas "generated" descriptions may be updated. Both "authored" and "generated" descriptions may contain markup (data supplied from a coded data source), but markup is not required. All natural language descriptions are nested within the <NaturalLanguageDescriptions> element within <Dataset>.
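The overwrite rule can be illustrated with a minimal Python sketch. The NLDescription class, its generated flag and the regenerate function are hypothetical names, not part of SDD; they only show that a reporting process may refresh "generated" descriptions while leaving "authored" ones untouched.

```python
from dataclasses import dataclass


@dataclass
class NLDescription:
    """Hypothetical in-memory stand-in for one natural language description."""
    taxon: str
    text: str
    generated: bool  # assumption: False = "authored", True = "generated"


def regenerate(descriptions, render):
    """Refresh generated descriptions from coded data; never touch authored ones."""
    for d in descriptions:
        if d.generated:
            # Only generated text may be replaced by the reporting process.
            d.text = render(d.taxon)
    return descriptions


descs = [
    NLDescription("Calidris canutus", "Hand-written text.", generated=False),
    NLDescription("Discaria pubescens", "Old generated text.", generated=True),
]
regenerate(descs, lambda taxon: f"{taxon}: regenerated from data.")
```

After the call, the authored text is unchanged and only the generated description carries new text.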
A natural language description requires only two essential items: the names of the taxa being described, and the descriptions themselves.
A simple SDD instance document for natural language descriptions has the basic structure shown in Example 2.3.2.
Example 2.3.2 - Authored natural language descriptions
For more information on defining taxon names using the <TaxonNames> element, see the topic Defining taxon names.
Note that taxa can also be arranged into hierarchies. See the topic Defining taxon hierarchies for more information.
The <Representation> element provides a label for the description. This may be useful if the instance document includes multiple descriptions for different purposes.
<Scope> describes the taxon or set of taxa to which the description applies.
The <NaturalLanguageData> element contains the text of the natural language description.
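The elements described above can be assembled with Python's standard xml.etree.ElementTree module. This is a sketch only: the element names <Dataset>, <TaxonNames>, <NaturalLanguageDescriptions>, <Representation>, <Scope> and <NaturalLanguageData> come from this primer, but the <Datasets> root, the id/ref attributes and the <Label>/<Text> children are assumptions, and the output has not been validated against the SDD schema (no namespace is declared).

```python
import xml.etree.ElementTree as ET


def build_instance(taxon_id, taxon_label, description_label, text):
    """Build a minimal SDD-like document holding one authored description.

    Assumed (not confirmed by the primer): the <Datasets> root, the
    id/ref linking, and the <Label> and <Text> child elements.
    """
    root = ET.Element("Datasets")
    dataset = ET.SubElement(root, "Dataset")

    # Taxon names are defined once and referenced by id from descriptions.
    names = ET.SubElement(dataset, "TaxonNames")
    name = ET.SubElement(names, "TaxonName", id=taxon_id)
    ET.SubElement(ET.SubElement(name, "Representation"), "Label").text = taxon_label

    # All natural language descriptions sit inside <NaturalLanguageDescriptions>.
    nlds = ET.SubElement(dataset, "NaturalLanguageDescriptions")
    nld = ET.SubElement(nlds, "NaturalLanguageDescription")
    # <Representation>: a label for the description itself.
    ET.SubElement(ET.SubElement(nld, "Representation"), "Label").text = description_label
    # <Scope>: the taxon (or taxa) the description applies to.
    ET.SubElement(ET.SubElement(nld, "Scope"), "TaxonName", ref=taxon_id)
    # <NaturalLanguageData>: the description text.
    ET.SubElement(ET.SubElement(nld, "NaturalLanguageData"), "Text").text = text
    return root


doc = build_instance("t1", "Discaria pubescens", "Flora description",
                     "Description text goes here.")
print(ET.tostring(doc, encoding="unicode"))
```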
2.3.3 Marked up natural language descriptions.
Marking up natural language descriptions allows matrix data to be rendered as natural language text, and allows character and state names to be modified for inclusion in that text. "Authored" descriptions may never be overwritten by a natural language reporting process, whereas "generated" descriptions may be updated; either kind may carry markup, but neither needs to. The SDD standard can store data with partial markup, resulting from any mixture of automatic markup by a processor and manual markup.
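Rendering coded matrix data as text, with character and state names modified only for the description, might look like the following Python sketch. The describe function, its parameters and the sentence pattern are hypothetical illustrations and not an SDD-defined algorithm; the point is that optional overrides change the wording of the generated text while the matrix labels stay unchanged.

```python
def describe(taxon, codings, characters, states, nl_labels=None):
    """Render one taxon's coded data as a single natural-language sentence.

    codings:    {character_id: [state_id, ...]} for this taxon
    characters: {character_id: default character label}
    states:     {state_id: default state label}
    nl_labels:  optional overrides used only in the generated text
    All names here are hypothetical.
    """
    nl_labels = nl_labels or {}

    def label(key, table):
        # Prefer the description-specific wording, else the matrix label.
        return nl_labels.get(key, table[key])

    parts = []
    for char_id, state_ids in codings.items():
        state_text = " or ".join(label(s, states) for s in state_ids)
        parts.append(f"{label(char_id, characters)} {state_text}")
    return f"{taxon}: " + "; ".join(parts) + "."


characters = {"c1": "Leaf shape", "c2": "Petal colour"}
states = {"s1": "ovate", "s2": "elliptic", "s3": "white"}
sentence = describe("Taxon A", {"c1": ["s1", "s2"], "c2": ["s3"]},
                    characters, states, nl_labels={"c1": "Leaves"})
```

Here sentence is "Taxon A: Leaves ovate or elliptic; Petal colour white.", with "Leaf shape" replaced by "Leaves" in the text only.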
-- Main.DonovanSharp - 01 Jun 2006
- Long expanded version of NLD structures: