LinneanCoreDisentangle

GregorHagedorn - Fri Nov 12 2004 - Version 1.13
Parent topic: LinneanCore
(Of course only an attempt to disentangle some issues of name, concept, hierarchy! Please discuss, add, and feel free to insert short comments immediately into the text!):

Multiple names for a taxon concept

Scientific organism name-strings (with or without nomenclatural or concept citation) and taxon concepts (in the sense of conveying a concept in the real world, i.e. through a circumscription by means of character description or listing the classified objects) have an n : m relationship: the same scientific organism name may in practice have multiple circumscriptions, and the same circumscription may have multiple scientific organism names.
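The n : m relationship can be sketched as two lookup tables built from the same list of usages. This is only an illustration: the names ("Aus bus", "Xus bus") and the concept labels are made-up placeholders, not real data, and the variable names are my own assumptions.

```python
from collections import defaultdict

# Hypothetical usage records: (scientific organism name, circumscription label).
# All values are placeholders chosen for illustration only.
usages = [
    ("Aus bus", "concept-1"),
    ("Aus bus", "concept-2"),  # same name, different circumscription
    ("Xus bus", "concept-2"),  # different name, same circumscription
]

concepts_by_name = defaultdict(set)
names_by_concept = defaultdict(set)
for name, concept in usages:
    concepts_by_name[name].add(concept)
    names_by_concept[concept].add(name)
```

Neither table can be reduced to a simple one-to-one mapping: "Aus bus" resolves to two circumscriptions, and "concept-2" is reachable under two names.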

When using "scientific organism name" here I refer to a nomenclatural object, the identity of which is defined by nomenclatural rules and for which many name-string-variants may exist. Note that the nomenclatural rules may be by convention rather than by ICBN/ICZN etc. codes. For example, forma specialis or race is not governed by any code, yet "Genus species f. sp. capsici", "Genus species forma spec. capsici", and even "Genus species f. sp. capsicum", or "race 1A", "race 1 A", "r. 1a", "Rasse 1a" can all be recognized as referring to the same name object, respectively.
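A minimal sketch of how some of these name-string variants might be folded onto one spelling. The list of recognized rank-marker spellings and the target forms ("f. sp.", "race 1A") are assumptions of this sketch, not part of any code or standard. Note that it only handles the rank marker and the race designation; recognizing "capsicum" as a variant of "capsici" would need a lookup of known epithets and is not attempted here.

```python
import re

# Assumed spelling variants of the "forma specialis" rank marker.
_FSP_MARKER = re.compile(
    r"\b(?:f\.\s*sp\.|forma\s+spec(?:\.|ialis\b))\s*", re.IGNORECASE
)
# Assumed spelling variants of a race designation such as "race 1A".
_RACE = re.compile(r"^(?:race|rasse|r\.)\s*(\d+)\s*([a-z]?)$", re.IGNORECASE)


def normalize_variant(name_string: str) -> str:
    """Fold known rank-marker and race spellings onto one form."""
    s = " ".join(name_string.split())  # collapse runs of whitespace
    race = _RACE.match(s)
    if race:
        return f"race {race.group(1)}{race.group(2).upper()}"
    return _FSP_MARKER.sub("f. sp. ", s).strip()
```

With these assumptions, "Genus species forma spec. capsici" and "Genus species f. sp. capsici" normalize to the same string, as do "race 1 A", "r. 1a" and "Rasse 1a".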

Why is this so? The following is largely influenced by my botanical background, but in principle I believe most applies to zoology as well.

1. Spelling issues, founded or unfounded. Some go back to the original orthography. Is it 'Evonymus' or 'Euonymus'? Haplospora or Aplospora? Sometimes grammatical corrections are unfounded: Capnodium citricola McAlpine is correct, Capnodium citricolum McAlp. wrong (...cola = growing-on never changes with genus).

2. Author abbreviations. Although attempts to standardize author abbreviations exist for some taxonomic domains, normalizing abbreviations is very labor intensive, and it may often cause more confusion when the standard is used to guess the correct abbreviation for non-standard shorthand abbreviations found in the data sources (specimens, literature, etc.).

3. A minor issue is that in the case of infraspecific taxa, the taxon author may be given for both the species and the infraspecific taxon. The canonical recommended form in botany is to give authors only for the lowest ranking infraspecific taxon.

4. A scientific name is often a mixture of hierarchy information and name information. This is one of the reasons why multiple names for the same taxon concept exist. Indications of hierarchy exist in three places:

a) between genus and species:
Cortinarius olidus J.E. Lange
= Cortinarius (Phlegmacium) olidus J.E. Lange
= Cortinarius (subgen. Phlegmacium, sect. Elastici) olidus J.E. Lange

b) between species and lowest infraspecific rank accepted by the code:
Saxifraga aizoon subf. surculosa Engl. & Irmsch.
(citing only the subforma is the recommended canonical form of a botanical name)
= Saxifraga aizoon var. aizoon subvar. brevifolia f. multicaulis subf. surculosa Engl. & Irmsch.

c) at the genus level itself
Microbotryum violaceum (Pers.) G. Deml & Oberw.
= Ustilago violacea (Pers.) Roussel

5. The above does not preclude that a scientific name is cited in a form (using "secundum" or "sensu") that explicitly indicates the taxon concept referred to. But even this is not necessarily unique, since concept citations are often given in an abbreviated form whose meaning depends heavily on the historical context (s. str., s. lat., s. latissimo, etc.).

Name hierarchy and name identity

Point 4 above addresses the issue of hierarchy information. Linnean scientific names are a mixture ("entanglement") of expressions of hierarchy and nomenclatural-object-identity. The circumscription of a taxon concept is always specific to the lowest rank. However, some name parts/elements confer both hierarchy and identity. Those conferring only hierarchy (subgenus etc.) are redundant in a canonical name; those also conferring identity are not. Genus names must be globally unique within each nomenclatural code, but subgenus names, for example, are only locally unique and require the genus for identity (the name must be a "combination"; see ICBN 6.7, e.g. "Arytera sect. Mischarytera"). Similarly, infraspecific epithets are only unique within a species. They are, however, not restricted to their immediate infraspecific hierarchy, i.e. a variety in a subspecies must be unique for the entire species, not just for the subspecies.
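As a rough sketch of what "redundant in a canonical name" means operationally, the following heuristic drops a parenthetical subgenus/section placed directly after the genus and keeps only the lowest infraspecific rank. The set of rank markers, the function name and the author-connective list are assumptions of this sketch. It is a heuristic, not a parser: it relies on the epithet after a rank marker being a lowercase word, and author strings such as "Hook. f." are only guarded against in the simplest cases. It also cannot address case (c), a change of genus, which requires nomenclatural data rather than string handling.

```python
import re

# Rank markers assumed for this sketch (botanical infraspecific ranks only).
INFRASPECIFIC_MARKERS = {"subsp.", "var.", "subvar.", "f.", "subf."}
# Lowercase words that may follow "f." inside an author string ("Hook. f. ex ...").
AUTHOR_CONNECTIVES = {"ex", "in", "et"}


def canonical_form(name: str) -> str:
    """Strip hierarchy-only elements, keeping genus, species and lowest rank."""
    # Drop "(Phlegmacium)" or "(subgen. X, sect. Y)" directly after the genus.
    name = re.sub(r"^(\S+)\s+\([^)]*\)\s+", r"\1 ", name.strip())
    tokens = name.split()
    # Positions where a rank marker is followed by something epithet-like.
    marker_positions = [
        i
        for i in range(2, len(tokens) - 1)
        if tokens[i] in INFRASPECIFIC_MARKERS
        and tokens[i + 1].replace("-", "").isalpha()
        and tokens[i + 1].islower()
        and tokens[i + 1] not in AUTHOR_CONNECTIVES
    ]
    if not marker_positions:
        return " ".join(tokens)
    last = marker_positions[-1]
    # Keep genus + species epithet, then the lowest rank with its authors.
    return " ".join(tokens[:2] + tokens[last:])
```

Applied to the examples under point 4, both Cortinarius variants reduce to "Cortinarius olidus J.E. Lange", and the long Saxifraga form reduces to "Saxifraga aizoon subf. surculosa Engl. & Irmsch.". As a side effect, authors given after the species epithet are dropped, which matches the recommendation in point 3.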

Hierarchical information does not change the circumscription for the purpose of identification or comparing property data about taxa. It does change the "concept" in a sense where the term concept implies the entire hierarchy. Since only a few use cases make use of specific hierarchies for individual taxa (as opposed to the importance of a hierarchy in general, which may however be applied to taxon concepts defined elsewhere), I think these issues should be separated.

Some thoughts

Name strings may be:
1. with/without authors,
2. with/without year of publication
3. with/without concept suffix
4. in zoology: with/without indication of the protonym genus (when only the epithet is given as a name).
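The optional parts listed above could be modelled roughly as follows. The class and field names are hypothetical, chosen only for this sketch; the zoological protonym-genus case (item 4) is not modelled, and the simple space-separated rendering ignores code-specific punctuation between author and year.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NameString:
    """A name string with its optional parts kept apart (hypothetical model)."""

    canonical: str                        # e.g. "Ustilago violacea"
    authors: Optional[str] = None         # e.g. "(Pers.) Roussel"
    year: Optional[int] = None            # year of publication, if given
    concept_suffix: Optional[str] = None  # e.g. "s. str.", "s. lat."

    def render(self) -> str:
        """Join only those parts that are actually present."""
        parts = [self.canonical]
        if self.authors:
            parts.append(self.authors)
        if self.year is not None:
            parts.append(str(self.year))
        if self.concept_suffix:
            parts.append(self.concept_suffix)
        return " ".join(parts)
```

For example, NameString("Ustilago violacea", authors="(Pers.) Roussel").render() gives the form cited under point 4c, while NameString("Ustilago violacea").render() gives the author-less form typical of works for the general public.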

Names without authors are frequently found in works addressing the general public. Resolving them to a unique scientific name requires knowledge of a historical and geographical publication context. In some groups this may be relatively easy, if data on the usage period (= years) of homonyms (same genus and species, but different authors) are available. However, datasets providing this information are not known to me.
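If usage-period data for homonyms were available, narrowing down an author-less name by publication year could look like the sketch below. Everything here is hypothetical: the record layout, the function name, and the placeholder entries ("Aus bus Author A" etc.) are invented for illustration, precisely because I know of no dataset that provides such periods.

```python
from typing import List, NamedTuple


class Homonym(NamedTuple):
    full_name: str   # name including authors
    used_from: int   # first year of known usage
    used_until: int  # last year of known usage


# Entirely hypothetical placeholder records, not real nomenclatural data.
HOMONYMS = {
    "Aus bus": [
        Homonym("Aus bus Author A", 1800, 1899),
        Homonym("Aus bus Author B", 1880, 2004),
    ],
}


def candidates(name: str, publication_year: int) -> List[str]:
    """Return the homonyms whose usage period covers the publication year."""
    return [
        h.full_name
        for h in HOMONYMS.get(name, [])
        if h.used_from <= publication_year <= h.used_until
    ]
```

Where usage periods overlap (here 1880 to 1899), the year alone leaves more than one candidate, and the geographical or publication context is still needed.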

The most common practice in scientific publications is to use the scientific name with authors and/or year of publication. It is highly desirable to be able to find unique names. Many of the LinneanCoreUseCases ultimately depend on the desire to compare different data sets with each other.

Taxon concepts are very desirable in many cases. The professional practice of referring to concepts in publications in a uniquely resolvable way should also be encouraged. However, many cases exist where each identification creates its own taxon concept.

Only a subset of taxon concept applications is currently operationally feasible. Whenever I identify a plant pathogen, I routinely use 3-4 publication sources with keys, indexes, descriptions. The resulting name is a result of my attempt to understand and reconcile the concepts I find in this literature. Citing the sources is helpful, but introduces a fuzzy statement that my new concept is somehow, and in an operationally intractable way, related to these other concepts.

Resolving unique scientific names to a taxon concept in retrospect is often very difficult and extremely labor intensive. In the projects that I am involved in, I see no hope of ever finding the resources to attempt this. Resources to cross-reference taxon concepts among taxonomic publications may be found, but property or spatial/temporal observation data are usually separate from these treatments, and I believe they are the real interest in using a GBIF name service.

Some conclusions from me for the design of LinneanCore

1. Separate canonical name information from hierarchy opinion

2. Separate taxonomic opinion from the much more reliable nomenclatural relations.

3. Things like biostatus, vernacular names etc. should be urgently developed, but as modules separate from the scientific name component.
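One possible reading of conclusions 1 and 2 as a data model, given only as a sketch: all class names, field names and identifiers below are my own assumptions for illustration and are not a proposed LinneanCore schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalName:
    """Conclusion 1: identity only, no hierarchy information."""
    name_id: str
    canonical: str


@dataclass(frozen=True)
class HierarchyOpinion:
    """Conclusion 1: a placement is an opinion attributed to a source."""
    name_id: str
    parent_name_id: str
    according_to: str


@dataclass(frozen=True)
class NomenclaturalRelation:
    """Conclusion 2: nomenclatural facts kept apart from taxonomic opinion."""
    from_name_id: str
    to_name_id: str
    kind: str  # e.g. "basionym of"


# Placeholder data: one name record, two competing hierarchy opinions.
name = CanonicalName("n1", "Aus bus")
opinions = [
    HierarchyOpinion("n1", "family-x", "Source 1"),
    HierarchyOpinion("n1", "family-y", "Source 2"),
]
```

The point of the separation is that the two disagreeing placements attach to the same unchanged name record, so comparing data sets by name does not require agreeing on a hierarchy first.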

-- Main.GregorHagedorn - 26 Oct 2004