LinneanCoreTCSDiscussion

GregorHagedorn - Tue May 16 2006 - Version 1.17
Parent topic: LinneanCoreTCSInteraction
The TCS team at Napier has posted pages on the TCSWIKI where we look at the LinneanCore from the TCS perspective, and provide a worked example of how TCS can represent Names and Nomenclatural issues.

Rather than duplicate those pages here (especially as this site seems to be under redevelopment), it would be convenient to review our commentary and example at the Napier wiki, and to add comments and further examples if possible (no registration necessary!).

-- Main.TrevorPaterson - 29 Nov 2004


JMS: Could people please make a clear distinction throughout between name-string (the target of LinneanCore) and name (a relationship between a name-string and a named-object, which would be expressed by TCS)? Otherwise it is too hard to follow (for non-native speakers at least). Name-strings in the LinneanCore domain are affected by combination into a bi- or trinomen, but that is not a conceptual issue, just like the difference between "I have" and "she has". If 'generic', 'specific' and 'infraspecific' are uncomfortable because of their relevance to concepts, we could say 'first_token', 'second_token' and 'third_token' instead. Or does the distinction between name-string and name contradict the TCS philosophy? -- 12 Nov 2004.
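The distinction requested above can be illustrated with a minimal sketch. It is only an assumption about how the two notions might be modelled, not part of LinneanCore or TCS: the class names `NameString` and `Name` and the field `named_object_id` are hypothetical, and the token names are the neutral ones suggested in the comment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NameString:
    """Pure text tokens; says nothing about what the string refers to."""
    first_token: str                      # e.g. the 'generic' part
    second_token: Optional[str] = None    # e.g. the 'specific' part
    third_token: Optional[str] = None     # e.g. the 'infraspecific' part

    def text(self) -> str:
        tokens = (self.first_token, self.second_token, self.third_token)
        return " ".join(t for t in tokens if t)


@dataclass(frozen=True)
class Name:
    """A relationship between a name-string and the object it names."""
    name_string: NameString
    named_object_id: str   # hypothetical identifier of the named object


aus_bus = NameString("Aus", "bus")
usage_a = Name(aus_bus, "object-1")   # same string ...
usage_b = Name(aus_bus, "object-2")   # ... applied to a different object
```

In this sketch the two `Name` values share one name-string yet compare as different, which is the separation the comment asks contributors to keep in mind.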

Metaissue by Gregor: I tried to follow the discussion below and failed in places to understand which part belongs to whom. I tried to reformat it using the indentation style for comments - but I may have failed in places. I hope the original authors can rectify this. Thanks!


Taken from email from Jessie and initial responses...

I think there is some great stuff on nomenclature issues which it is good to see documented, questioned and discussed e.g.
LinneanCoreHomoIsonym (issues of homonym isonym, and nomen novum/replacement name)
LinneanCoreHybridNothotaxa (issues about hybrid formulae and "notho"taxa [named hybrids])
LinneanCoreNameIdentity (How many elements necessary for infrageneric/specific ranks?)
LinneanCoreConceptSuffixString (provide a place in atomization for "s. lat." etc.?)
LinneanCoreResourceLinks (Links to BioCode, etc.)
LinneanCoreExampleNames (names from different kingdoms, hard cases etc.)

It would be really valuable for me and I guess others if someone could provide for zoology what Sally has for Botany in terms of examples.

There is also some useful material on issues not confined to nomenclature alone which is great e.g. Relevant enumerations:
TaxonomicRankEnum (see also formatted html page)
NomenclaturalTypeStatusOfSpecimensEnum
NomenclaturalStatusEnum? (e.g. Rejected, Conserved, Sanctioned, etc. Unfinished)
NomenclaturalCodesEnum (e.g. "Botanical/Zoological" for ICBN, ICZN, etc.)

However...

Although I was under the impression that at the Taxon Concepts/Names subgroup on the Saturday at TDWG there was general understanding of the goals and rationale for the TCS and how it intended to deal with nomenclature issues in addition to concepts issues within the one framework, I can see from the discussion that either I (and those present at the meeting) was wrong or it no longer seems to be the case amongst those contributing to the Wiki.

I also see that there is confusion and disagreement amongst those contributing to the discussion on the Wiki about what the role of the Linnean core subgroup is, and I see the beginnings of TCS discussions emerging all over again (maybe this is a side effect of using wikis, as any comment can be made, thereby making it difficult to know and follow what the relevant discussion points are).

During TDWG, Linnean core was being pushed by some as a simple way of providing access to lists of valid names, with DarwinCore being used as an analogy, i.e. it would be a simple list of fields. That was basically what the name components from ABCD which were incorporated into TCS were, and what I was informed was being discussed and re-considered by the Linnean core group to ensure the requirements of the nomenclator community were being addressed (and I was and am grateful that this work is being done).

However I am disappointed that much of the confusion and requests for changes to Linnean core/ABCD and hence TCS is coming from Gregor (sorry Gregor, I am impressed by your drive and enthusiasm to have your say in all these discussions but find some of your comments unconstructive). There was an invitation to all interested in TCS (names and concepts) to attend the relevant working group meetings during TDWG, so I find it unreasonable that Gregor is saying that he doesn't understand TCS when I went to lengths to explain its rationale and design philosophy and gave people a chance to raise any questions or issues and to address any misunderstandings. Gregor however didn't give a presentation on his ideas at the meeting, even though he submitted an abstract suggesting he would, nor did he attend the subgroup session on Saturday or the Linnean group session on the Sunday. I wasn't even aware that Gregor was a nomenclator, but perhaps I’m wrong or I misunderstood the reasons for re-considering the ABCD section. However, I would’ve thought that Gregor had enough on his plate trying to finalise possibly the most complicated of all the TDWG standards (SDD in addition to UBIF) and that it would have been reasonable for him to accept that other groups can consider and reason about the conflicting requirements of all potential users of the standard in coming to a design decision.

6. To sum this up: I try to stay away from the concepts discussion (sometimes unsuccessfully) and am glad if nomenclators' needs and TCS go hand-in-hand. I need some basic stuff for SDD, just like Anna and Chris need names stuff for TaxMLit. But foremost, I, Jerry, Anna, Chris, Paul Kirk, Dave Remsen and others are members of GBIF ECAT SSC and have a responsibility there. We needed an LC-like standard already last year and it still is not available. At TDWG 2003 it was said that TCS would deal with both concept and names questions for the purposes of ECAT. It is, however, not Jessie's fault that she was asked to do work GBIF ECAT depends upon without being asked to participate in the ECAT discussions, and actually with (almost?) no overlap between the ECAT SSC and her working group participants.
    • Main.JessieKennedy - 15 Nov 2004: At TDWG 2003 it was never said to me that I was to deal with concepts/names for the purpose of GBIF ECAT. GBIF were one of the potential users to consult - and I did consult them. If those named should've been consulted then I'm sorry, but this was not made clear to me by anyone - even though I did ask whether or not I should attend the GBIF meeting in May this year.


I have the following reservations based on some of the discussion on the Wiki:

It is unfortunate that, following years of work by the ABCD group, there doesn't appear to be an ABCD representative contributing to the discussion – are you sure that you aren’t overturning well-reasoned and argued decisions on why things were designed the way they were?

ALL agreed at the meeting that specimens or observations should be labelled with concepts_names (or concept GUIDs), not scientific names (e.g. binomial plus original author); therefore I don't understand why this is even being given wiki space, unless you’re starting the modelling process from scratch again.

There seems to be confusion about what a transfer schema should do. To support the community at large it should not dictate how any application should be implemented, nor should it dictate what any database system must or must not contain, nor should it present a view of the world from only one of the community’s perspectives – if we want a community standard exchange format, everyone must move from their own view/model of the problem/requirements but still be able to map their data to the schema to exchange their data. (This is why I explained the different types of users/views of what concepts/names were in my presentation.)

Some of the discussion is about what a particular application, i.e. a nomenclature tracking system or name resolution system, should do and requires – I don’t believe this should be encoded in the schema - it should be documented in the mapping from that database onto the schema or in the specification of the software design. This is still important and the work may still need to be done....but not as part of a schema design.

Although the process of nomenclature is separate from the process of classification (and yes I’m co-author on a paper in Taxon arguing this), as soon as one considers the use of names – either by taxonomists or other biologists or even nomenclature specialists – I haven’t seen a convincing discussion or proposal that actually keeps names separate from concepts, and I am even more convinced of this having talked to a wide range of potential users this past year. So let me re-iterate: people can’t refer to concepts without names, nor does a name exist without some kind of relationship to a concept.

I know you know all of this, but my point is that what may appear to be overlap between LC and TCS (e.g., both needing to point to the publication details of Smith, 1949) are in fact quite different pieces of information, and do not represent overlap at all.

Or is Jessie's point that the LC should not be passing the author and publication information at all, just some fields ('Aus', 'bus')? In which case, in the second example, where does the 'Smith' bit go?

Or are we arguing about something different & I've missed the point about where LC and TCS are starting to overlap?

examples needed ...

Perhaps those of you who are expert in and only interested in names for their own sake think you can, and if that’s so I accept that, as I do the many ways that people define concepts which to me may personally seem strange; however the vast majority of biologists and users of names can’t and don’t. So regardless of the fact that some people say a concept can have many names – how I interpret this is that they mean that the definition part of the concept can be the same in different taxonomic concepts with different names – this is how we have modelled it, explained it and require it to be understood to engage in a meaningful discussion about how, for example, Linnean core fits into TCS or how any particular issue or data problem can be represented in TCS. I tried to emphasise this at TDWG and thought I had succeeded, but now am wondering if I did.

Based on these points it seems to me that the discussion on the Wiki is introducing concept issues into the requirements even though some people are trying hard to keep it concept free. If this continues what in my mind will eventually happen (even if you try your hardest to ignore information that is unequivocally only about concepts) is that you will still come up with an abstract model that needs to deal with everything that the TCS already does, e.g. names will have: specimens, relationships between them of different types (whether you choose to model this as relationships or embed them in the data type), citations, authors, (descriptions?), other people’s opinions on the relationships someone else asserted about a name etc... So we will have two schemas to handle taxonomic names and concepts – does the community really need this?


Rich:

Following up on my previous example of "Aus bus Smith, 1949 sensu Smith 1949". Based on my understanding of TCS, I can imagine two TCS records that look (in abridged form, with some ad-hoc modifications to the current LC schema) something like this (I hope I've rendered the XML properly):

<DataSet>
  <Publications>
    <Publication id="9876" type="Article">
      <PublicationSimple>
         Smith, J.D. 1949. Aus bus, a new species of Cidae from the Hawaiian Islands. Journal of the Linnean Core. 1(1):15-20
      </PublicationSimple>
      <PublicationDetailed>
        [Insert full "AlexandriaCore" schema instance here for publication details of Smith, 1949]
      </PublicationDetailed>
    </Publication>
  </Publications>
  <TaxonConcepts>
    <TaxonConcept type="Nominal" id="1234">
      <Name id="1234">
        <NameSimple>Aus bus</NameSimple>
        <NameDetailed>
          <Label>Aus bus Smith, 1949</Label>
          <CanonicalName>
            <Text>Aus bus</Text>
            <Genus>Aus</Genus>
            <SpecificEpithet>bus</SpecificEpithet>
          </CanonicalName>
          <CanonicalAuthorship>
            <Text>Smith, 1949</Text>
            <ProtonymCitation id="9876" />
          </CanonicalAuthorship>
          <Rank code="sp" text="species" />
          <AuthorsTaxonOrthograpy>Aus bus</AuthorsTaxonOrthograpy>
        </NameDetailed>
      </Name>
    </TaxonConcept>
    <TaxonConcept type="Original" id="5678">
      <Name id="1234">
        <NameSimple>Aus bus</NameSimple>
        <NameDetailed>
          <Label>Aus bus Smith, 1949</Label>
          <CanonicalName>
            <Text>Aus bus</Text>
            <Genus>Aus</Genus>
            <SpecificEpithet>bus</SpecificEpithet>
          </CanonicalName>
          <CanonicalAuthorship>
            <Text>Smith, 1949</Text>
            <ProtonymCitation id="9876" />
          </CanonicalAuthorship>
          <Rank code="sp" text="species" />
          <AuthorsTaxonOrthograpy>Aus bus</AuthorsTaxonOrthograpy>
        </NameDetailed>
      </Name>
      <AccordingTo>
        <AccordingToSimple>Smith</AccordingToSimple>
        <AccordingToDetailed>
          <AuthorTeam>Smith</AuthorTeam>
          <Date>1949</Date>
          <PublishedIn ref="9876" />
        </AccordingToDetailed>
      </AccordingTo>
    </TaxonConcept>
  </TaxonConcepts>
</DataSet>

The first thing to notice is that the Publication details ("Alexandria Core") have been moved out of LC ProtonymCitation, and are instead referenced via the TCS method (Jessie -- correct me if I interpret TCS incorrectly here). The Publication has the GUID "9876".

The other thing to notice is that there are two TaxonConcept instances represented here: one is "Nominal" type and the other is "Original" type. The "Nominal" type contains only LC elements -- it is an "empty", name-only concept (no "AccordingTo", etc.). This is what I imagine a basic concept-less LC instance would look like (i.e., enclosed within a TCS wrapper). Notice that both the TaxonConcept and the Name have id="1234". This is my attempt to illustrate how the same GUID series used for TCS instances could be inherited by the LC schema where TaxonConcept type="Nominal".

The second TaxonConcept instance is of "Original" type -- which, as I understand it, implies that a concept is created at the same time that its name is created (correct, Jessie?) This TCS instance is intended to represent a specific taxon concept -- the concept asserted by Smith in the original publication that created the name. Note that TaxonConcept id is different ("5678"), but Name id is the same ("1234").

My main point here is that there are separate "name-only" and "concept-bearing" TCS instances represented by the idea of "Aus bus Smith, 1949 sensu Smith 1949".
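The "name-only" versus "concept-bearing" distinction described above can be checked mechanically. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; it works on a further abridged, well-formed copy of the example, and the helper name `summarize` and the keys of the returned rows are assumptions made for illustration, not part of TCS or LC.

```python
import xml.etree.ElementTree as ET

# Further abridged copy of the example above (element names as in the example).
ABRIDGED = """
<DataSet>
  <Publications>
    <Publication id="9876" type="Article">
      <PublicationSimple>Smith, J.D. 1949. Aus bus, a new species of Cidae.</PublicationSimple>
    </Publication>
  </Publications>
  <TaxonConcepts>
    <TaxonConcept type="Nominal" id="1234">
      <Name id="1234"><NameSimple>Aus bus</NameSimple></Name>
    </TaxonConcept>
    <TaxonConcept type="Original" id="5678">
      <Name id="1234"><NameSimple>Aus bus</NameSimple></Name>
      <AccordingTo>
        <AccordingToDetailed><PublishedIn ref="9876" /></AccordingToDetailed>
      </AccordingTo>
    </TaxonConcept>
  </TaxonConcepts>
</DataSet>
"""


def summarize(xml_text):
    """Return one row per TaxonConcept with its name id and AccordingTo status."""
    root = ET.fromstring(xml_text)
    publication_ids = {p.get("id") for p in root.iter("Publication")}
    rows = []
    for concept in root.iter("TaxonConcept"):
        name = concept.find("Name")
        published_in = concept.find("AccordingTo/AccordingToDetailed/PublishedIn")
        ref = published_in.get("ref") if published_in is not None else None
        rows.append({
            "concept_id": concept.get("id"),
            "type": concept.get("type"),
            "name_id": name.get("id"),
            # A concept without AccordingTo is treated here as "name-only".
            "name_only": concept.find("AccordingTo") is None,
            # None when there is no reference to resolve.
            "publication_resolves": (ref in publication_ids) if ref else None,
        })
    return rows


rows = summarize(ABRIDGED)
```

Both rows carry the same `name_id`, while only the "Original" row has an AccordingTo whose `ref` resolves to the Publication record in the wrapper.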

If I have totally misunderstood either TCS or LC, now would be a VERY good time to clue me in.

Will this be finalised any quicker than TCS? Concern has been raised about waiting for TCS to be finalised... so what’s actually holding up finalisation of TCS - is it because of an "it doesn’t represent things the way I do it" phenomenon? I believe that TCS is almost finalised or could be very soon, and I hoped that the Linnean core would help that by getting agreement on the fields necessary to detail a name applied to a concept - currently the AccordingTo element and the Name element (ABCD name) in TCS.

The only other thing we are waiting on is an agreed interface to the other schemas being developed, i.e. the elements we have marked with placeholders.

However we don’t need to wait for all the other schemas to be completed to move on - those desperate to move on can use a proposed schema, knowing that they will have to transform their data or software to deal with the agreed schemas in due course (this is not a problem confined only to TCS). The placeholders include publications and specimen vouchers, for which TCS would suggest a GUID attribute plus a set of fields which can uniquely identify the object being represented by the other schema in the absence of GUIDs and which also act as a human-readable version of those elements.
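The placeholder idea (a GUID where one exists, otherwise a set of identifying fields that doubles as a human-readable label) can be sketched as follows. This is only an illustration under stated assumptions: the `Placeholder` class, the `resolve` function and the record keys `guid`, `author` and `year` are hypothetical and do not come from TCS.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Placeholder:
    """Reference to an object owned by another schema (hypothetical shape)."""
    kind: str                                   # e.g. "publication" or "specimen"
    guid: Optional[str] = None                  # preferred when available
    fields: Dict[str, str] = field(default_factory=dict)  # fallback identification

    def label(self) -> str:
        """Human-readable rendering of the identifying fields."""
        return ", ".join(f"{k}: {v}" for k, v in self.fields.items())


def resolve(ref: Placeholder, records: List[dict]) -> Optional[dict]:
    """Match on GUID if present; otherwise require a unique match on all fields."""
    if ref.guid is not None:
        for record in records:
            if record.get("guid") == ref.guid:
                return record
        return None
    if not ref.fields:
        return None  # nothing to identify the object by
    matches = [r for r in records
               if all(r.get(k) == v for k, v in ref.fields.items())]
    return matches[0] if len(matches) == 1 else None


records = [
    {"guid": "9876", "author": "Smith", "year": "1949"},
    {"guid": "9877", "author": "Smith", "year": "1950"},
]
by_guid = resolve(Placeholder("publication", guid="9876"), records)
by_fields = resolve(
    Placeholder("publication", fields={"author": "Smith", "year": "1949"}), records)
ambiguous = resolve(Placeholder("publication", fields={"author": "Smith"}), records)
```

The fallback deliberately refuses ambiguous matches: a field set only stands in for a GUID when it identifies exactly one record.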

I guess I could continue but I’m sure that it will be counter-productive at the moment. I could and in fact would like to try and explain things again if anyone doesn’t get the idea we’re presenting. I would like to ensure that we have understood and taken into account all of the requirements of nomenclators – much of the information on the Linnean core Wiki will contribute to that understanding, and I am sure we can find a good solution. I only ask you to be careful that you don’t end up re-doing what we did last year but with a narrower group of users unless you really intend to, in which case we should be clear about the fact that that’s what Linnean core is doing.

Having read this far I hope you don’t think I’ve made these comments because the Linnean core Wiki is suggesting changes to our proposal and I simply don’t like it. I’m very happy to have good suggestions which improve the schema but if the suggestions fundamentally change it then I'm reluctant to start modelling TCS over again. If they are suggestions because of misunderstandings then it’s a waste of time arguing back and forth until we explain TCS more clearly. So I am happy to reply to specific questions about the relationship between Linnean core and TCS but unless the philosophy of TCS is understood the discussion may become a bit pointless.

-- Main.JessieKennedy - 11 Nov 2004


Sally: I think that the most helpful thing would be if you could point out exactly where Concept stuff is creeping into Linnean Core and what bits you'd want to cut out and have (just) in TCS.

Also what might be really helpful would be some examples of TCS data, especially now you think it's getting towards completion: with LC slotted into the names gap we can see clearly where the overlaps are and have a real discussion on what should go where. For me, that would make the division of labour between LC and TCS a lot clearer than any number of philosophical discussions. Maybe that's just me?


Rich: Thanks, Jessie. I guess my two requests to you would be:

  1. Tell us specifically where you think that LC is encroaching on TCS (i.e., overlapping concept information), so we can address your concerns in a more specific way;
  2. Please explain to us (or point us to a document that explains) the detailed definitions of the different TaxonConcept types ("Nominal" [was "nomenclatural"], "Original", etc.) in TCS, and what they were intended to represent.
    • Main.JessieKennedy - 12 Nov 2004: Have started explaining above but will create a separate document that explains this more clearly and that stands alone so is more understandable - next week.....

The primary issues that I think we need to discuss in terms of developing LC as a seamless plug-in to TCS elements are:

  1. Vocabulary (i.e., we should try as much as possible to follow similar or identical element naming conventions).
  2. Publications: looking at TCS (if I understand it correctly) I am now leaning towards removing the "Publication" element of LC 0.1.4 from within "ProtonymCitation" (under "CanonicalAuthorship_Proposal2"), and instead adding a "ref" attribute to ProtonymCitation that points to a Publication record stored in the TCS "wrapper" of an LC instance.
  3. Perhaps eliminate the "Kingdom" element of TCS, and either replace it with "NomenCode" (or similar);
    • Main.JessieKennedy - 12 Nov 2004: would be happy with this
  or move this element to within the "Name" element (in the domain of LC).
    • Main.JessieKennedy - 12 Nov 2004: not so happy - when taxonomists do a classification they need to say which rank they are working at before the appropriate name can be applied, so I believe it first belongs in concept and then secondarily is associated to name - don't really think it needs to be in name if we accept that original concepts are really names with some extra info - so to get a name take a project view on the concept. Don't ask me yet to answer specific examples - we'll get there...
  4. Make sure we understand the respective roles of "Rank" in TCS and LC, make sure they need to be separate (or decide how to combine them), and if maintained as separate, make sure they are cross-consistent.
    • Main.JessieKennedy - 12 Nov 2004: agree
  5. Resolve the Primary type specimen issue I discussed in the previous message (e.g., move type specimen elements from LC to SpecimenCircumscription, or perhaps create some sort of "TypeSpecimenCircumscription" element somewhere -- I don't know). As I said before, I see this as the only real potential point of contention.
    • Main.JessieKennedy - 12 Nov 2004: agree, but think that it is really just a matter of perspective: if you are a nomenclator then you see it primarily as something to do with names, if you are a taxonomist you see it as a representative of the circumscription of the taxonomic concept, if you are a museum curator you probably see them in some other way altogether, and we need to choose one that works for everyone.
    • Main.SallyHinchcliffe - 12 Nov 2004: on Rich's point 5 above, I have no problem with the type specimen information remaining solely in the TCS, as the SpecimenCircumscription. We hadn't really got down to that bit of the schema yet - I think it's still in there from Jerry's original version of the LC. You could argue that it's part of the name, philosophically, but it's not needed for transmitting the name. I can't think of any case where we'd want to transmit some TCS data with one set of specimen circumscription data in it, wrapped around an LC element with another set of (type) specimens included in that.


Gregor - 12. Nov. 2004: Clearly the relation between TCS and LC should be discussed, and Richard's 5 points above are important. However, I believe the question of what exactly LC is and whether it should fit completely into TCS or not is beside the point. Jerry did draw up a scope of LC and many - me included - responded with: "this looks like something I am looking for". Ultimately most points from Jerry's draft will end up somewhere. That includes stuff we already agreed to exclude from the current LC discussion (e. g. Biostatus, Vernacular names)! Now, is it important to discuss exactly where they end up, and whether this should be called LC? We have started to develop types for CanonicalName and CanonicalAuthorship (and separating these I now, although initially against it, consider good progress!), we are trying to make progress on rank, and someone (which is not me in this case!) urgently needs to deal with those stupid hybrid and multihybrid and named hybrid cases (please contribute to LinneanCoreHybridNothotaxa). We have not finished these steps - and I believe in relation to TCS they are uncontentious. We should work on this.

The next step is probably some structure to express nomenclatural scrutiny: Is a name illegitimate under ICBN, was it validly published under ICBN, or whatever under ICZN, is it a nomen nudum, etc. I think TCS has not dealt with this at all so far. Then I personally am fervently in favor of being able to express nomenclatural relations like basionym or replaced-synonym separately from circumscription-concept relations (I try to avoid the overloaded term "concept" here - Jessie uses a different definition than I do, and than what I so far thought Richard and Sally use). From my perspective it appears important to "type" these relations and thereby prevent nonsense relationships like a derived circumscription concept becoming the basionym of a botanical name (which as far as I understand is quite possible under the generalized TCS "concept"-concept). Jessie will probably argue against that. However, I propose to view it like this: maybe LC trying to do it helps to "sort" (as Jessie requested help) the list of 40 TCS relations, some of which are homotypic, some heterotypic singular-point (type specimen) and some circumscription-related. This list is the core of why for me TCS does not seem to work - simply because this list is mixed and the semantics and the syntax (which relation is applicable to which of the 5 concept-types at the start and at the end-point) are missing. So let us get it sorted! If we sort it in LC and create some element names, maybe these names end up as headings or supertypes in the TCS relationship list. It would not bother me too much! -- Another way to express it: If some of us are unable to help TCS under the TCS definitions of what a "concept" and "nomenclature" is, perhaps by drawing up a schema that expresses our knowledge in terminology which some of us can relate to more easily, we are able to express knowledge that TCS could then peruse (whatever that means in English :-) ).
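What "typing" the relations could mean in practice is sketched below: each relation declares which kind of object is allowed at its start and end point, so a nonsense pairing is rejected. This is a hedged illustration only. The two-way split into `NAME` and `CONCEPT`, the table `ALLOWED` and the relation labels "includes" and "overlaps" are assumptions invented for the example; only basionym and replaced-synonym are taken from the paragraph above, and nothing here reflects the actual TCS relationship list.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    NAME = "name"           # nomenclatural object
    CONCEPT = "concept"     # circumscription concept


@dataclass(frozen=True)
class Node:
    id: str
    kind: Kind


# Hypothetical table: relation -> (kind at start point, kind at end point).
ALLOWED = {
    "is basionym of": (Kind.NAME, Kind.NAME),
    "is replaced synonym of": (Kind.NAME, Kind.NAME),
    "includes": (Kind.CONCEPT, Kind.CONCEPT),
    "overlaps": (Kind.CONCEPT, Kind.CONCEPT),
}


def relate(start: Node, relation: str, end: Node):
    """Return a (start, relation, end) triple, or raise if the kinds do not fit."""
    expected = ALLOWED[relation]
    if (start.kind, end.kind) != expected:
        raise TypeError(
            f"'{relation}' expects {expected[0].value} -> {expected[1].value}, "
            f"got {start.kind.value} -> {end.kind.value}"
        )
    return (start.id, relation, end.id)


name_a = Node("name-1", Kind.NAME)
name_b = Node("name-2", Kind.NAME)
derived = Node("concept-1", Kind.CONCEPT)

ok = relate(name_a, "is basionym of", name_b)
try:
    relate(derived, "is basionym of", name_b)   # concept as basionym: rejected
    rejected = False
except TypeError:
    rejected = True
```

A real sorting of the relationship list would need more kinds than two (the paragraph mentions five concept types), but the same table-driven check would apply.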

Finally, at the moment I would like to have some way to express knowledge about type specimens - but indeed perhaps that works well in TCS after all. I cannot see into the future - I would need TCS examples under a revised TCS to get an opinion about this. Either it does work well in TCS or someone will extend LC (or any schema inheriting from LC!) to make it work in a different way. And the same applies to SDD. Simple. -- Gregor - 12. Nov. 2004 -- PS compare LinneanCoreTCSRelationshipTypes. - 19. Nov.