LeeBelbin - Fri Nov 20 2009 - Version 1.8
Parent topic: ClosedTopicSchemaDiscussionSDD09

ClosedTopicStatisticalMeasureReuse

I do not like the current structure of Statistical measures.

For categorical states, we can define local states directly at a character, or we can define generic concept states at the concept tree nodes. A concept can be color (with red, green, ...) in the property tree, or fruit (with capsule, berry, ...) in the structure/part tree. These generic states (should we call them concept states?) can be reused at multiple characters. This not only saves definition work, it also allows a shape to be defined once, so that an identification processor can abstract from the structure. If a fungus has sexual and asexual spores, one may not know during identification which ones are currently present. Knowing that there is a concept "Spore shapes" would allow a search across all characters that use this concept.

Importantly, categorical states only make sense as a set, and thus entire sets can be referenced.
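
As a rough illustration of the reuse described above, the following Python sketch models a concept that owns a state set and characters that reference the whole set. The class names, the function name, and the spore shape states (globose, ellipsoid, fusiform) are hypothetical and are not taken from the schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Concept:
    """A concept tree node that owns a reusable set of states (hypothetical model)."""
    name: str
    states: Tuple[str, ...]


@dataclass(frozen=True)
class Character:
    """A character that references a concept's entire state set instead of redefining it."""
    name: str
    concept: Concept


def characters_using(concept: Concept, characters: List[Character]) -> List[Character]:
    """Return all characters whose states come from the given concept."""
    return [c for c in characters if c.concept == concept]


# State names below are invented for illustration only.
spore_shapes = Concept("Spore shapes", ("globose", "ellipsoid", "fusiform"))
fruit = Concept("Fruit", ("capsule", "berry"))

characters = [
    Character("Sexual spore shape", spore_shapes),
    Character("Asexual spore shape", spore_shapes),
    Character("Fruit type", fruit),
]

matches = characters_using(spore_shapes, characters)
print([c.name for c in matches])
```

An identification processor could use such a lookup to offer both spore characters when the user only knows that a spore shape was observed.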

Back to numerical:

Here we have an extra dimension: project-wide information for any use of mean, max, etc. is defined in Terminology/StatisticalMeasures. The StatisticalMeasure/Generalization can be used to make the measure concept fully interoperable. However, the project-wide definitions are necessary; otherwise we will not be able to add labels and wordings in other languages (German, Chinese, etc.).

However, nowhere do we have a set of these, and they are not reusable as a set. You cannot specify that for spore measurements you would like to use min, lower range as mean - s.d., mean, upper range as mean + s.d., max, and sample size, whereas for hyphal wall measures you simply record the extremes (min to max).

Instead, we would have to add each measure to each state. That makes it impossible to provide functionality similar to that envisaged above for categorical states: looking at the generic concept of spore measure and finding all characters that use this concept.

In previous schema versions, the project-wide ("global") measures were defined in sets (1:n). However, this worked poorly, since the mean or min/max will occur in many sets, as in the example above. The association between set and project-wide definition must be n:m. Currently, the grouping into sets is removed completely.
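
To make the n:m requirement concrete, here is a minimal Python sketch in which measure sets reference project-wide definitions by identifier, so one definition can appear in several sets. The identifiers, labels, and function names are assumptions for illustration; the schema defines none of them in this form.

```python
# Project-wide measure definitions, keyed by hypothetical identifiers.
MEASURES = {
    "min": "Minimum",
    "lower": "Lower range (mean - s.d.)",
    "mean": "Mean",
    "upper": "Upper range (mean + s.d.)",
    "max": "Maximum",
    "n": "Sample size",
}

# Each set lists the definitions it uses. A definition may occur in many
# sets and a set holds many definitions, which is the n:m association.
MEASURE_SETS = {
    "spore measurements": ["min", "lower", "mean", "upper", "max", "n"],
    "hyphal wall measures": ["min", "max"],
}


def sets_containing(measure_id):
    """Return the names of all sets that reference the given definition."""
    return sorted(name for name, ids in MEASURE_SETS.items() if measure_id in ids)


def labels_for(set_name):
    """Resolve a set to the project-wide labels of its measures, in order."""
    return [MEASURES[measure_id] for measure_id in MEASURE_SETS[set_name]]


print(sets_containing("min"))
print(labels_for("hyphal wall measures"))
```

With a 1:n association, "min" could belong to only one of the two sets; here it is shared, and its label is still defined once.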

Now what shall we do:

Or should we be more stringent here and say: We do away with Terminology/StatisticalMeasures and define them only at the (not yet existing!) Terminology/ConceptTrees/ConceptTree/Nodes/Concept/StatisticalMeasures? We would have to add the labels for min, max, mean several times, but the Generalization would allow applications to figure out that they are referring to the same concept.
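
The stringent variant could look roughly like the Python sketch below: each concept carries its own measure entries with their own labels, and a shared Generalization value lets an application recognize that they mean the same thing. The class, the field names, the labels, and the Generalization values ("Min", "Mean", "Max") are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalMeasure:
    """A measure defined at a concept node rather than project-wide (hypothetical model)."""
    concept: str
    label: str
    generalization: str  # assumed shared identifier, e.g. "Min"


# Labels are repeated per concept, as the stringent solution would require.
local_measures = [
    LocalMeasure("Spore measure", "min. spore size", "Min"),
    LocalMeasure("Spore measure", "mean spore size", "Mean"),
    LocalMeasure("Spore measure", "max. spore size", "Max"),
    LocalMeasure("Hyphal wall measure", "thinnest wall", "Min"),
    LocalMeasure("Hyphal wall measure", "thickest wall", "Max"),
]


def group_by_generalization(measures):
    """Map each Generalization value to the concepts that define a measure for it."""
    groups = {}
    for measure in measures:
        groups.setdefault(measure.generalization, []).append(measure.concept)
    return groups


groups = group_by_generalization(local_measures)
print(groups["Min"])
```

The cost is visible in the data: the label for the minimum is written twice, but the grouping still recovers that both entries refer to the same statistical concept.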

Then, should we allow only complete sets in this case, rather than partial as in the case of the ResolvedTopicGenericStates?

I currently tend to favor the stringent solution, but I urgently need a good discussion on this...


-- Gregor Hagedorn - 15 Dec 2003, 9. June 2004