GarryJolleyRogers - Wed Nov 25 2009 - Version 1.9
Some notes on changing terminology: should projects be based on prototyping and then fixing the terminology, or should change be allowed throughout the project? This has a bearing on SDD, because some centralized (non-federated) models of terminology would not support change (or at least we would have to think about how they could still be made to do so). Federation is not yet properly tackled in SDD, but it will be central to its success and is on the agenda for the review meeting in Berlin. The following is from a draft of a paper I am currently writing; I would appreciate any comments on it!
As soon as descriptive data using a specific terminology have been recorded, any substantial change of the terminology on which these descriptions depend has significant consequences. It is obvious that terminology changes must be carefully considered. Nevertheless, experience with systems allowing the ad-hoc definition of new characters and states (and thus an evolutionary development of terminology) is positive and often considered an indispensable feature (pers. comm. of users of
Designing a good terminology is not much different from software development. It is in fact part of the design of an information model. Furthermore, this part of the model is created by the domain expert, who typically has little experience with such activities. Few projects are able to finance a collaboration between domain experts (taxonomists, plant pathologists, etc.) and persons experienced in descriptive data modeling. Even in this fortunate situation, the contradictory nature of conventional terminology, which usually surfaces only during the development of a structured terminology, will be a major obstacle. In software engineering, the attempt to first design a near-perfect information and object model is sometimes called "Big design up front (BDUF)". This classical model of software engineering may still be preferable in certain situations. However, development processes that explicitly contain iterative elements (e. g., the "Rational Unified Process" of inception, elaboration, iterative cycles of design, implementation, testing, and refactoring during construction, and final transition to finished product, Jacobson & al. 1999) are often more successful. The rate of change is even greater in development processes like "extreme programming" (see, e. g., Beck 2000).
Similar to software engineering, the best development process for descriptive terminology depends on the circumstances. Starting with a badly designed terminology and then having technical personnel record a large amount of data is almost guaranteed either to fail or to produce low-quality data. The advantages of iterative or evolutionary development processes can be reaped only if the designer her- or himself struggles with recording data and uses that experience to improve the terminology. Note that the usefulness of prototyping is very limited when recording biological descriptions. The diversity of organisms is so large that it is very difficult to assess the validity of a terminological model until all organisms have been studied. It is, however, very inefficient to study all descriptions of a taxonomic group in a first pass to ensure that all required characters are present in the terminology, and then to record actual data in a second pass. Depending on project size and circumstances:
Also, note that many changes in terminology are nearly neutral in their effect on other data items (Table 2). Even where adverse effects exist, these changes often occur shortly after the start of data recording, because problems that were overlooked in the design phase become evident. Changes that occur later typically affect rarely used characters, allowing a manual revision of the affected data. Changing a bad terminology and reducing the validity of a few data items is preferable to sticking with a bad terminology for the remaining 90% of the data to be recorded.
Despite the fact that it is unavoidable to give researchers the freedom to define their own terminology, it is evident that standardization is very desirable. Standardization has a technical aspect and a semantic aspect.
-- Gregor Hagedorn - 30 Mar 2004
Some thoughts as we come to the end of our Prometheus 'Experiment'
Re: Static Versus dynamic Terminology Models
I agree that much of what Gregor discusses above matches the experience of our Main.PrometheusII project.
We realized that any terminology developed would have to be expandable so that it could cope with new concepts and with a widening user base, but that if it was to allow backwards compatibility, expansion must not alter the semantics of earlier versions of the terminology.
This is problematic because it can be argued that addition of new terminology does change the semantics of data collected with previous versions which lacked the new terminology - but at an operational level a taxonomist would have to choose to 'ignore' this subtlety in order to achieve compatibility or integration.
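The constraint above (expansion must not alter the semantics of earlier versions, while additions are operationally tolerated) can be made checkable. The following is a minimal sketch, assuming a hypothetical model in which a terminology version is simply a mapping from character identifiers to a definition and a set of states; it is not how Prometheus or SDD actually represent terminology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Character:
    """Hypothetical categorical character: a definition plus its allowed states."""
    definition: str
    states: frozenset[str]


# Hypothetical model: one terminology version maps character ids to characters.
Terminology = dict[str, Character]


def compatibility_problems(old: Terminology, new: Terminology) -> list[str]:
    """List changes in `new` that would alter the meaning of data recorded with `old`.

    New characters and new states are deliberately not reported, mirroring the
    operational decision to ignore their subtle effect on older data.
    """
    problems = []
    for char_id, old_char in old.items():
        new_char = new.get(char_id)
        if new_char is None:
            problems.append(f"{char_id}: character removed")
            continue
        if new_char.definition != old_char.definition:
            problems.append(f"{char_id}: definition changed")
        for state in sorted(old_char.states - new_char.states):
            problems.append(f"{char_id}: state '{state}' removed")
    return problems


# Invented example: v2 only adds, v3 drops a state used by old data.
v1 = {"leaf_shape": Character("Outline of the leaf blade", frozenset({"ovate", "linear"}))}
v2 = {"leaf_shape": Character("Outline of the leaf blade", frozenset({"ovate", "linear", "cordate"}))}
v3 = {"leaf_shape": Character("Outline of the leaf blade", frozenset({"ovate"}))}
print(compatibility_problems(v1, v2))  # []
print(compatibility_problems(v1, v3))  # ["leaf_shape: state 'linear' removed"]
```

Whether an added state should also be flagged is exactly the judgement call described above; the sketch takes the permissive side.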
Prometheus tried to test whether, if terminology is standardized at a lower level than that at which characters are currently defined (i.e. the defined terminology is used to actually compose characters/states from more atomic statements), the same data could be used to build up new or alternative 'characters'. We hoped that this approach would mean that new ways of describing variation (i.e. when a new conceptual character is 'created') would still be compatible with old data at the level of the underlying decomposed description. However, in practice this seems to be quite a difficult concept for working taxonomists to adopt. The idea of what constitutes a useful taxonomic character seems to be ingrained at a higher compositional level, and taxonomists do not see value in decomposing their observations into more atomic statements about variation. As a non-taxonomist I could propose an alternative methodology in which specimens are merely described in factual terms, and 'characters' of taxonomic interest are 'discovered' by analyzing the variation in the recorded data. Taxonomists, however, prefer to evaluate the variation before scoring (or at least they do a quick initial evaluation to decide on their 'characters' of interest before going on to detailed scoring of specimens).
-- Main.TrevorPaterson - 01 Apr 2004
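To illustrate the idea of composing characters from more atomic statements, here is a minimal sketch. It assumes a hypothetical (structure, attribute, value) form for atomic statements; the names `Statement`, `project`, and `leaf_outline_and_margin` and all example values are invented and do not reflect the actual Prometheus data model. The point is only that a character defined later can still be computed from data recorded earlier.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """Hypothetical atomic observation about one specimen."""
    structure: str
    attribute: str
    value: str


def project(statements, structure, attribute):
    """Read a conventional 'structure + attribute' character off the atomic statements."""
    return {s.value for s in statements if s.structure == structure and s.attribute == attribute}


def leaf_outline_and_margin(statements):
    """A character 'created' only later, combining two attributes recorded earlier."""
    shapes = project(statements, "leaf", "shape")
    margins = project(statements, "leaf", "margin")
    return {f"{shape} with {margin} margin" for shape in shapes for margin in margins}


# Factual, decomposed description of one specimen (invented example values).
specimen = [
    Statement("leaf", "shape", "ovate"),
    Statement("leaf", "margin", "serrate"),
    Statement("petal", "colour", "yellow"),
]

print(project(specimen, "petal", "colour"))  # {'yellow'}
print(leaf_outline_and_margin(specimen))     # {'ovate with serrate margin'}
```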
Indeed, Prometheus has the huge advantage of testing approaches with users. I believe the problems you mention partly come from the tediousness of recording descriptive data on specimens. I agree that your approach of an "alternative methodology where specimens are merely described in factual terms" would work, but it runs into two problems:
SDD has struggled with these problems all along. The decision to keep a basic flat character model (with largely optional extensions over DELTA) is based on both the need to incorporate DELTA/Lucid etc. legacy data content, and the belief that many "content providers" are already struggling with the structural requirements of characters/states, and that highly complex models would probably not find acceptance (we also realized that this belief may well be wrong, and that the struggle may be a result of the inadequacies of the character model). However, throughout the SDD process we also tried to accommodate the experience of large projects, which found the analytical capabilities of the character model unsatisfying and were struggling to manage around 1000 characters.
So we have added mechanisms that optionally allow characters to be organized in more meaningful ways. Unfortunately (?), we currently have two such mechanisms: a fundamental "ontology" definition (= glossary), and an operational mechanism (concept trees mapping onto characters, in which specific kinds of concept, such as structures (the part hierarchy), properties, and methods, are marked). I still find this somewhat unsatisfactory, so input on how Prometheus organized descriptive knowledge, and thinking about how this could be combined with a relatively low-structure "character model", is very valuable to SDD.
-- Gregor Hagedorn - 2 Apr. 2004
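The operational mechanism mentioned above (concept trees mapping onto characters, with some nodes marked as structures or properties) can be pictured roughly as follows. This is a minimal sketch under stated assumptions: the `Concept` class, the `kind` values, and the character ids `c12`, `c13`, `c40` are all hypothetical and are not the SDD schema.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Node of a hypothetical concept tree; `kind` marks e.g. structures or properties."""
    label: str
    kind: str = "concept"
    characters: list = field(default_factory=list)  # ids of characters mapped to this node
    children: list = field(default_factory=list)


def characters_under(node, path=()):
    """Yield (concept path, character id) for every character mapped at or below `node`."""
    path = path + (node.label,)
    for char_id in node.characters:
        yield path, char_id
    for child in node.children:
        yield from characters_under(child, path)


leaf = Concept("leaf", "structure", children=[
    Concept("shape", "property", characters=["c12"]),
    Concept("margin", "property", characters=["c13"]),
])
plant = Concept("plant", "structure", children=[
    leaf,
    Concept("petal", "structure", children=[Concept("colour", "property", characters=["c40"])]),
])

for path, char_id in characters_under(leaf):
    print(" > ".join(path), char_id)
# leaf > shape c12
# leaf > margin c13
```

The flat character list stays unchanged; the tree is only an additional index over it, which is one way the low-structure character model and a richer organization could coexist.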
Some thoughts as we come to the end of our Prometheus 'Experiment'
Re: Technical and Semantic Standardisation
It seems very important to scope the range/level/extent over which semantic standardisation is attempted, possible or even useful.
A Top-Down attempt to impose standardisation across a large domain can never succeed. A standard terminology must be developed by/for a specific user/expert group, who can agree on the semantics of terminology for the range of their work. Perhaps it may never be valuable/meaningful to integrate data between distinct user groups - in which case mapping of ontological relations between different terminologies is not worthwhile. However, if different user groups do wish to share information - they would have to agree on a shared terminology between user groups - or do some horrendous non-automatic mapping between their separate terminologies. We have argued about whether it is possible to create a generic terminology - for a wide taxonomic domain, with layers of increasing specificity (perhaps as plug-in modules) of more specific terminology for distinct taxa. Or whether we require totally separate terminologies - that possibly could be partially mapped to each other. There is no obvious answer, but we have to make people aware that if individualised personal terminologies are developed and used - the possibilities for data integration are severely reduced. (A problem being that at an individual level a taxonomist may not care about data integration).
-- Main.TrevorPaterson - 01 Apr 2004
"... at an individual level a taxonomist may not care about data integration": perhaps, but if the data are readily available and can be easily imported, they care a lot about saving time and avoiding spending months on defining terminology. So I do not think that the manual mapping is necessarily horrendous, provided:
Attempts like those in Main.PrometheusII or http://www.plantontology.org to provide well-defined sets of definitions will be very useful if they are provided in a way that new projects can incorporate them as parts (preferably as predefined modules, rather than by selecting individually from the whole). That brings us back to SDD. If the Prometheus higher plant terminology could be provided in a way that a new project could import it (to extend it locally, or to combine it with other imports), that would save time for the new project, probably increase the quality of its descriptive data, and provide a base for integration. NOTE that SDD DOES NOT include all the tools for this! Although we carry this scenario around with us (and have made structural decisions to support federation), we still have not decided on an include or import mechanism.
-- Gregor Hagedorn - 2 Apr. 2004
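Since no include or import mechanism has been decided, the following is only a minimal sketch of what importing predefined terminology modules and extending them locally might involve. It assumes a hypothetical scheme in which term ids are namespaced by their provider (`plants:`, `hosts:`, `myproject:`); the function name and all definitions are placeholders, not SDD features.

```python
def import_terminologies(*modules, local=None):
    """Merge terminology modules, then a local extension, into one project terminology.

    Each module is a dict mapping a namespaced term id (hypothetical scheme
    'provider:term') to its definition. Importing the same term twice with an
    identical definition is harmless; a differing definition is a conflict.
    """
    merged = {}
    for module in (*modules, local or {}):
        for term_id, definition in module.items():
            if term_id in merged and merged[term_id] != definition:
                raise ValueError(f"conflicting definitions for {term_id!r}")
            merged[term_id] = definition
    return merged


# Placeholder modules: a shared plant module, a pathology module reusing one of its terms,
# and a project-specific local extension.
plants = {"plants:leaf": "Lateral outgrowth of the stem", "plants:petal": "Inner perianth member"}
pathology = {"plants:leaf": "Lateral outgrowth of the stem", "hosts:lesion": "Localized damaged tissue"}
local = {"myproject:leaf_hairiness": "Density of hairs on the leaf surface"}

terms = import_terminologies(plants, pathology, local=local)
print(sorted(terms))
# ['hosts:lesion', 'myproject:leaf_hairiness', 'plants:leaf', 'plants:petal']
```

Namespacing keeps the origin of each term visible, which is what later makes integration between projects that imported the same module straightforward.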
There's a little rambling on the subject in SDD.ExternalTerminology. Possibly it belongs here, but instead I'll link to here. It mostly concerns the use of GUIDs to discover external terminologies. -- Main.BobMorris - 18 Apr 2004