LinneanCoreUseCasesGregor

GregorHagedorn - Mon Nov 15 2004 - Version 1.8
Parent topic: LinneanCoreUseCases
The use case scenarios in which I see LinneanCore being used are:

1) Form the scientific-name representation of future taxon concept exchange standards. In particular, with regard to TCS, it should replace the ABCD-style name details used in TCS 0.80. These are based on an older version of ABCD (1.20?) and completely separate the name data elements for Bacterial, Botanical, Zoological and Viral names. This separation may simplify exporting names for those who have names from only one major taxonomic group. However, any consumer dealing with more than one of Bacterial, Botanical, Zoological and Viral names either has to work against a super-wide structure, often using complex OR-queries to find a given name, or has to work out the relationships between the different name subtypes him- or herself. Clearly, some name specialities are limited to one area, but I believe these should be modeled as extensions of a common name-detail supertype applicable to all kinds of scientific names (perhaps with the exception of viral names?).
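A minimal Python sketch of the supertype-plus-extensions idea. All class and field names here are hypothetical and are not taken from TCS, ABCD or any LinneanCore draft. The point is only that a query on a shared field works across all codes without OR-ing over separate code-specific element sets.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScientificName:
    """Hypothetical common name-detail supertype shared by all codes."""
    genus: str
    specific_epithet: Optional[str] = None
    authorship: Optional[str] = None


@dataclass
class BotanicalName(ScientificName):
    # Placeholder for a botany-only detail; the field is illustrative.
    cultivar_epithet: Optional[str] = None


@dataclass
class ZoologicalName(ScientificName):
    # Placeholder for a zoology-only detail; the field is illustrative.
    subgenus: Optional[str] = None


@dataclass
class BacterialName(ScientificName):
    # No bacteriology-specific fields in this sketch.
    pass


def find_by_genus(names: List[ScientificName], genus: str) -> List[ScientificName]:
    # A single condition on the shared supertype field covers every code,
    # instead of an OR over separate code-specific element sets.
    return [n for n in names if n.genus == genus]


names = [
    BotanicalName("Triticum", "aestivum"),
    ZoologicalName("Felis", "catus"),
    BacterialName("Escherichia", "coli"),
]
print(find_by_genus(names, "Felis"))
```

Code-specific details stay available on the subclasses, while consumers that only need the common elements can ignore them.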

To avoid waiting for TCS to be finished, I believe it is acceptable to agree quickly on a first version of LinneanCore, accepting the risk that a modified version 2 of LC may have to appear together with TCS. I do not believe that Jessie Kennedy's argument, that standards modelling current biological practice detract from the power of introducing entirely new procedures into biology, is correct. Instead, LinneanCore may be the easy way for those not able or willing to deal with the problems of making taxon concepts explicit. At the same time, I believe that users of LinneanCore may find it easier to upgrade to a TCS-style model later; LC may pave the way to more complex and sophisticated models.

For these reasons, I believe that a major design goal of LC should be to make it as intuitive to biologists as possible.

2) Separate the issue of conveying hierarchical information from that of a standardized "canonical name" used to test names for equality. See the separate topic LinneanCoreDisentangle.
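One way to picture this separation, as a minimal Python sketch: the hypothetical `NameRecord` below compares only its canonical name, while hierarchical placement is carried in a separate field that is excluded from equality. The family names are invented placeholders, and the details of canonical-name construction are left to LinneanCoreDisentangle.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NameRecord:
    # Standardized string used only to decide whether two names are equal.
    canonical_name: str
    # Hierarchical placement (rank -> taxon) travels separately and is
    # excluded from equality via compare=False.
    hierarchy: Dict[str, str] = field(default_factory=dict, compare=False)


# Hypothetical family names: the same name placed in two different families.
a = NameRecord("Puccinia graminis", {"family": "FamilyA"})
b = NameRecord("Puccinia graminis", {"family": "FamilyB"})
print(a == b)  # True: hierarchy is ignored, only canonical_name is compared
```

With this split, reclassifying a genus changes the hierarchy data without making its names compare as different.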

3) Support collaboration and sharing of work.

4) A longer example of the latter use case: our GLOPP database on fungal plant pathogens contains ca. 190,000 records. As of 2004-10-26, the raw name usage comprises 26,418 fungal and 31,139 plant "name strings". We guess that the number of distinct names is much lower, perhaps a third of these numbers. Looking up the fungal name strings in Index Fungorum finds matches for fewer than a third of them. All other names would have to be corrected manually to provide a link, and if Index Fungorum then corrects a name, the link would be broken again. Instead of linking by name string, we therefore try to store IDs from Index Fungorum in our database. The linking effort is similar, but the result is more stable, and we can profit both from the homotypic synonymy information in Index Fungorum and from its improvements in the spelling of names and authors.
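The difference between linking by name string and linking by ID can be sketched as follows. `NameAuthority` is a hypothetical in-memory stand-in, not the Index Fungorum interface, and the IDs and placeholder names are invented for illustration.

```python
from typing import Dict, Optional


class NameAuthority:
    """Hypothetical stand-in for an external name index such as Index Fungorum."""

    def __init__(self, names: Dict[int, str]) -> None:
        self.names = dict(names)  # ID -> current spelling

    def correct(self, name_id: int, new_spelling: str) -> None:
        # The authority fixes a spelling; the ID stays the same.
        self.names[name_id] = new_spelling

    def lookup_id(self, name_string: str) -> Optional[int]:
        # String-based linking: exact match against current spellings.
        for name_id, spelling in self.names.items():
            if spelling == name_string:
                return name_id
        return None

    def spelling(self, name_id: int) -> str:
        # ID-based linking: always returns the current spelling.
        return self.names[name_id]


# Invented IDs and placeholder names, for illustration only.
authority = NameAuthority({1001: "Genus speciesa"})
local_string = "Genus speciesa"   # record linked by name string
local_id = 1001                   # record linked by authority ID

authority.correct(1001, "Genus speciosa")
print(authority.lookup_id(local_string))  # None: the string link is broken
print(authority.spelling(local_id))       # the ID link picks up the correction
```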

An important side aspect of this linking work, to me, is the following: although fewer than 30% of names matched directly, I have been able to create a name-variant index for Index Fungorum, using information from Index Fungorum itself and some rule-based algorithms derived from empirically detected name-variant patterns. This index is several times the size of Index Fungorum, but it increases linkage to about two thirds. (I plan to repeat this with the next version of Index Fungorum, which provides a better basis for this procedure, so I am reluctant to give more precise values here.)
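A rough Python sketch of how a rule-based variant index might be built. The three regex rules are illustrative assumptions only and are not the empirically derived rules used for GLOPP; the names and IDs are placeholders. Accepted spellings are indexed first so that a generated variant never shadows a real name, and a string is linked only when it points to exactly one ID.

```python
import re
from typing import Dict, Optional, Set

# Illustrative variant rules (regex, replacement); NOT the rules used for GLOPP.
RULES = [
    (r"ii\b", "i"),        # "-ii" written as "-i"
    (r"(?<!i)i\b", "ii"),  # "-i" written as "-ii"
    (r"ae", "e"),          # "ae" simplified to "e"
]


def variants(name: str) -> Set[str]:
    # Apply each rule separately to the accepted spelling.
    out = set()
    for pattern, replacement in RULES:
        candidate = re.sub(pattern, replacement, name)
        if candidate != name:
            out.add(candidate)
    return out


def build_variant_index(authority: Dict[int, str]) -> Dict[str, Set[int]]:
    # Accepted spellings go in first so that a generated variant never
    # shadows a name that really exists in the authority.
    index: Dict[str, Set[int]] = {}
    for name_id, name in authority.items():
        index.setdefault(name, set()).add(name_id)
    accepted = set(index)
    for name_id, name in authority.items():
        for variant in variants(name):
            if variant not in accepted:
                index.setdefault(variant, set()).add(name_id)
    return index


def link(name_string: str, index: Dict[str, Set[int]]) -> Optional[int]:
    # Link only when the string points to exactly one authority ID.
    ids = index.get(name_string, set())
    return next(iter(ids)) if len(ids) == 1 else None


# Placeholder names with invented IDs.
index = build_variant_index({1: "Genus smithii", 2: "Genus caerulea"})
print(link("Genus smithi", index), link("Genus cerulea", index))
```

This also shows why such an index grows to several times the size of the source: every accepted name contributes one entry per rule that changes it.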

For this last reason, I believe that the original author's orthography should be part of LC. I further believe that listing known non-standard name variants would be extremely valuable, and I propose to make this part of the "Core". Name variants may arise for the reasons discussed in LinneanCoreDisentangle, but also from special situations: for example, a name that has frequently been erroneously attributed to an author who did not publish it, or names that used to be valid under older versions of the code but have been replaced by different names through changes in the rules (example: the change of the starting date for fungi and the associated sanctioning mechanism).

Using relationships is more flexible, because you do not need to specify all types of relationships in advance, as you would have to if you treated them as attributes. If you wanted an enumerated list of relationship types, the list would of course have to be maintained, but I believe this is still simpler.
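The contrast between attributes and relationships can be sketched in Python as below. The class names and the relationship-type vocabulary are hypothetical. With attributes, each kind of variant is a fixed field holding one value; with relationships, new types and multiple values per type need only a data change, and the optional vocabulary is a maintained list of strings rather than a schema change.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional

# Optional controlled vocabulary of relationship types (hypothetical terms).
# It must be maintained, but extending it is a data change, not a schema change.
KNOWN_TYPES: FrozenSet[str] = frozenset(
    {"orthographic variant", "misattributed author", "pre-starting-date name"}
)


@dataclass
class NameWithAttributes:
    # Attribute style: each kind of variant needs its own field, fixed in advance.
    name: str
    orthographic_variant: Optional[str] = None
    misattributed_author_variant: Optional[str] = None


@dataclass
class Relationship:
    rel_type: str
    target: str


@dataclass
class NameWithRelationships:
    # Relationship style: any number of typed links to other name strings.
    name: str
    relationships: List[Relationship] = field(default_factory=list)

    def add(self, rel_type: str, target: str,
            vocabulary: Optional[FrozenSet[str]] = KNOWN_TYPES) -> None:
        # Reject unknown types only when a vocabulary is supplied.
        if vocabulary is not None and rel_type not in vocabulary:
            raise ValueError(f"unknown relationship type: {rel_type}")
        self.relationships.append(Relationship(rel_type, target))

    def related(self, rel_type: str) -> List[str]:
        return [r.target for r in self.relationships if r.rel_type == rel_type]


n = NameWithRelationships("Genus smithii")
n.add("orthographic variant", "Genus smithi")
n.add("orthographic variant", "Genus smithy")  # two variants of one kind
print(n.related("orthographic variant"))
```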

-- Main.GregorHagedorn - 26 Oct 2004