GarryJolleyRogers - Wed Nov 25 2009 - Version 1.28
Parent topic: BDI.SDD_
(This is part of the UBIF.SchemaDiscussion discussion. See also InterfaceDiscussion. The name for the section containing the Agent, Publication, etc. proxy objects in the TopLevelStructure is discussed under NameForProxyOrInterfaceSection.)
Fundamentals:
1 Ignore the problem and pretend that external objects can be represented by a simple human-readable string (publication, person name, taxon name), disregarding the fact that this only allows humans to guess identity and prevents integration of different datasets. Unions of such sets can be queried using fuzzy query operators, but datasets cannot be joined - e.g. not
To solve these problems I propose to consistently use a class of objects called proxy data. These primarily have a very simple representation of an object, composed of
Together with a developer Annotation and extensibility containers (Ext and
In addition, each kind of proxy data class may contain extensions providing additional, more detailed data. To discuss both the base proxy elements and such extensions, I propose to discuss publications. These are external data for all currently active groups and are of similar importance to most biodiversity groups. It is also obvious that no external publication database will ever fully meet the needs of scientists who cite both difficult-to-find 200-year-old literature and publications that have just appeared.
So in >> ProxyDataPublication << please discuss both the basics of proxies and the specifics of reference models!
The proxy architecture is proposed as a generally used architecture for all biodiversity knowledge domains (see ObsoleteProxyListsInAllTdwgGbifStandards). The following is a graphical overview of the use of proxies in the current version of BDI.SDD_:

I personally believe that proxy data objects are what the GBIF indexer should be built upon. A data provider that participates in GBIF could export all internal objects into a
Other proxy discussions (except ProxyDataPublication):
History and term for the concept: In version BDI.SDD_ 0.9 (Dec. 2003)
Donald Hobern commented in email:
My comments: Moving things out is the general trend on the web, but I am very wary about this. I am very much in favor of federating data, collaborating and using IDs to link information. However, I believe data streams and documents should maintain a sense of continuity and preserve the semantics of links for human consumption. This is the essential model of printed books and libraries - and it is one of the foundations of science.
Imagine that printed scientific articles, instead of giving human-readable references, referred to some GUID code that you can enter into a computer to obtain the information. This is the current state on the web. Unfortunately, we all know that institutions, even if well managed, change. Data citing specimens solely by case numbers used at a given time in a specific collection may become worthless after some restructuring. In our institution we find that the knowledge of coded references that was preserved over decades often disappears with the retirement of colleagues, and valuable data become trash.
So when talking about proxies I have two interfaces in mind:
Perhaps these could be separated. Would it make sense to have the base type (perhaps modified and simplified itself, e.g. using fewer types in Links), and leave the truly immature simple-data interface question out?
I am not sure whether I am ahead of the times (realizing that we need additional interfacing beyond a simple GUID/URI) or behind them (because the web will become so stable that references are easily retrieved after 100 years) - both are quite possible.
-- Gregor Hagedorn - 29 July 2004
Markus Döring commented on 11. August: "Assume we have always a service which resolves IDs by appending it to the webservice URL:
Gregor Hagedorn: I think the problem is that only a small number of services will be fully under our control (e.g. not publications like "urn:lsid:www.bioservice.org:alpinevegetation:426781876" id="426781876">Abies alba Mill.</Taxon> this is already close to what the proxy base model tries to achieve: a locally referable id that defines identity even if no external service id is present, a link to the outside, and a human-readable label.
Two more problems:
a) Most likely at least URL, LSID, and DOI will exist in parallel. That is the only reason why Link is a collection. I personally have no major problem with making it three attributes. It seems a bit artificial, but if that improves acceptance I will gladly endorse it! As you can see in the new UBIF versions (BDI.SDD_.CurrentSchemaVersion), the complex webservice proposal is underscored (its name starts with "_"), meaning it is tentative and should perhaps be removed.
b) Almost all object labels are potentially multilingual. Examples are geographical names, and even the full Agent label often needs transcription (Chinese to Roman letters) or contains Place names to improve name uniqueness ("Hans Heinrich, Munich" or "Hans Heinrich, München"?). I believe this is a problem for GBIF, which is already viewed as a shop dominated by English-speaking countries. And Chinese is the most widely used language on earth... However, providing several languages is impossible with an attribute approach. Can this be hidden better than I do now? The proposed model is simply the model used in BDI.SDD, which is multilingual throughout. For BDI.SDD_ it is easier to keep it as it is, because this responds to generic code.
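To make the two points above concrete, here is a minimal Python sketch of a proxy object with a local id, a collection of links (so that URL, LSID, and DOI can coexist), and labels keyed by language. The class and field names (`Proxy`, `Link`, `local_id`, `labels`) and the example.org URL are assumptions made for illustration only; they are not taken from the UBIF schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Link:
    """One external identifier for the proxied object (hypothetical shape)."""
    kind: str   # e.g. "URL", "LSID" or "DOI"
    value: str


@dataclass
class Proxy:
    """Minimal proxy: local id, external links, labels keyed by language."""
    local_id: str
    links: List[Link] = field(default_factory=list)
    labels: Dict[str, str] = field(default_factory=dict)

    def label(self, language: str, fallback: str = "en") -> Optional[str]:
        """Return the label in the requested language, else the fallback."""
        if language in self.labels:
            return self.labels[language]
        return self.labels.get(fallback)

    def link(self, kind: str) -> Optional[str]:
        """Return the first link of the given kind, if any."""
        for candidate in self.links:
            if candidate.kind == kind:
                return candidate.value
        return None


agent = Proxy(
    local_id="a1",
    links=[Link("URL", "http://example.org/agents/a1")],  # placeholder URL
    labels={"en": "Hans Heinrich, Munich", "de": "Hans Heinrich, München"},
)
print(agent.label("de"), agent.link("URL"), agent.link("DOI"))
```

Keeping links as a list means a DOI or LSID can be added later without changing the shape of the object, which is the trade-off against three fixed attributes discussed under a).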
Markus Döring: Is it really required to reference another object using XML validation techniques? I could imagine the simple proxy model above being used in place somewhere inside the XML hierarchy, without referencing global proxy objects via XML. So something like:
<SDD>
  <Taxon datasource="local" id="123">
    <ScientificName>Abies alba Mill.</ScientificName>
    <HigherTaxon>Pinaceae</HigherTaxon>
    <Genus>Abies</Genus>
    <Synonyms> ... </Synonyms>
    ...
  </Taxon>
  <Taxon datasource="local" id="124">
    ...
  </Taxon>
  <Description datasource="local" id="567">
    <Taxon datasource="local" id="123">Abies alba Mill.</Taxon>
    <HumanDescription>...</HumanDescription>
    ...
  </Description>
</SDD>
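The in-place referencing in the example above could be resolved without XML ID/IDREF by matching the datasource and id attributes directly. The following is a minimal Python sketch under stated assumptions: it uses a trimmed, well-formed version of the example (elided parts dropped), and the helper name `resolve_taxon` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed version of the example; elided parts are left out.
DOCUMENT = """
<SDD>
  <Taxon datasource="local" id="123">
    <ScientificName>Abies alba Mill.</ScientificName>
    <HigherTaxon>Pinaceae</HigherTaxon>
    <Genus>Abies</Genus>
  </Taxon>
  <Description datasource="local" id="567">
    <Taxon datasource="local" id="123">Abies alba Mill.</Taxon>
  </Description>
</SDD>
"""


def resolve_taxon(root, reference):
    """Find the top-level Taxon whose (datasource, id) matches the reference."""
    key = (reference.get("datasource"), reference.get("id"))
    for taxon in root.findall("Taxon"):  # direct children of <SDD> only
        if (taxon.get("datasource"), taxon.get("id")) == key:
            return taxon
    return None


root = ET.fromstring(DOCUMENT)
reference = root.find("Description/Taxon")
target = resolve_taxon(root, reference)
print(reference.text, "->", target.findtext("ScientificName"))
```

In this sketch the ids only need to be unique per element type and datasource, so a provider could reuse its existing keys instead of generating document-global numbers.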
I don't very much like to impose the XML ID/IDREF constraints on users (the ids must be document-global numbers, which usually means that it is more complicated to output a document, since the numbers have to be created on the fly rather than being able to use or hash existing ids). Some people think that the XML ID/IDREF mechanism should be considered deprecated. However, replace "id" with "ref" in <Taxon datasource="local" id="123">Abies alba Mill.</Taxon>, and it seems you are close to the proxy ref that UBIF is proposing - plus an optional copied Label inside the
The more important distinction at the moment is that in the Micro... types the ref is optional, as in:
<Publication ref="123123" />
or
<Publication ref="123123">Smith &amp; Gordon 1999</Publication>
I am unclear whether it is desirable to allow this, but it may be possible. If this is the accepted migration path to a truly linked system, that is fine. Perhaps one could optionally also allow:
<Publication>Smith &amp; Gordon 1999</Publication>
to simplify later migration to full proxy objects.
<Publication language="en">Smith &amp; Gordon 1999</Publication>
-- Gregor Hagedorn - 11 August 2004
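The three reference forms discussed above (ref only, ref plus copied label, label only) can be read uniformly by a consumer. The following is a minimal Python sketch; the function name `read_reference` is hypothetical, and the ampersand is written as `&amp;` so that the fragments are well-formed XML.

```python
import xml.etree.ElementTree as ET


def read_reference(xml_text):
    """Return (ref, label) for a reference element; either may be None."""
    element = ET.fromstring(xml_text)
    ref = element.get("ref")
    label = (element.text or "").strip() or None
    return ref, label


examples = [
    '<Publication ref="123123" />',
    '<Publication ref="123123">Smith &amp; Gordon 1999</Publication>',
    '<Publication>Smith &amp; Gordon 1999</Publication>',
]
for xml_text in examples:
    print(read_reference(xml_text))
```

A consumer written this way would not need to change when a label-only reference later gains a ref, which is one way to read the migration path mentioned above.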
Note: In the change from UBIF 1.0 beta 14 to beta 15 I have now removed the Webservice-Linking mechanism, see ProxyLinkByWebservice. -- Gregor Hagedorn - 13 August 2004