GarryJolleyRogers - Wed Nov 25 2009 - Version 1.8

QueryLanguages

I just spent the better part of the day with a student in Magdeburg who will be working on attempts to build XQuery-based applications against SDD-coded documents, and I have some observations. Jacob's experience with the debug tool may share some aspects of what I describe here.

First, it is interesting to look at queries that can be decomposed into primitives of the form "Give me all taxa X with Character C having state S". With XQuery it seems that you can often choose between XQuery expressions over simple XPaths, XPath expressions with simple XQuery operations, or some combination of the two. So what I really mean is that queries parameterized mostly by characters and states certainly form a large set of interesting queries, and XQuery should in principle handle them gracefully. That said, some important things, such as geography, are missing ("Are there red ants in Switzerland?").
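A minimal sketch of the primitive "all taxa X with Character C having state S" may help make this concrete. No code appears in the discussion, so Python's standard xml.etree.ElementTree (which supports only a limited XPath subset) stands in for XQuery here, and the document below is a hypothetical simplification, not the actual SDD schema: the element and attribute names (Descriptions, Description, Coded, taxon, character, state) are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for SDD coded descriptions.
# Element and attribute names are illustrative assumptions only.
SAMPLE = """<Dataset>
  <Descriptions>
    <Description taxon="t1">
      <Coded character="c1" state="s1"/>
      <Coded character="c2" state="s3"/>
    </Description>
    <Description taxon="t2">
      <Coded character="c1" state="s2"/>
    </Description>
    <Description taxon="t3">
      <Coded character="c1" state="s1"/>
    </Description>
  </Descriptions>
</Dataset>"""

def taxa_with_state(root, character, state):
    """Primitive query: all taxa X whose character C is coded with state S.

    A simple path selects the descriptions; the attribute comparison is
    done in Python, roughly the role a simple XQuery 'where' clause plays.
    """
    taxa = []
    for desc in root.iterfind("./Descriptions/Description"):
        for coded in desc.iterfind("Coded"):
            if coded.get("character") == character and coded.get("state") == state:
                taxa.append(desc.get("taxon"))
                break  # one matching coding is enough for this taxon
    return taxa

root = ET.fromstring(SAMPLE)
print(taxa_with_state(root, "c1", "s1"))  # ['t1', 't3']

# A conjunction of primitives is just the intersection of their results.
both = set(taxa_with_state(root, "c1", "s1")) & set(taxa_with_state(root, "c2", "s3"))
print(both)  # {'t1'}
```

Composing larger queries out of such primitives (intersection for "and", union for "or") is one reason this class of queries seems tractable.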

Consider a query like "What are the names of all ant taxa whose thorax has a dorsal depression?" This requires chasing a few references. For the class of queries I mentioned above, that could be done by exploiting the fact that, in the present schema, keys on Characters can be found in exactly one place, on Terminology/Characters/Character elements; hence those are the only places to look for a Label with some suitable text (determined by the application) such as "thorax". Finding the state is only twice as hard: there are presently only two places where a Label search would turn up anything like "with a dorsal depression", namely in the (presumably now found) Character itself, as a StateDefinition, or by chasing all the State References into the ConceptTrees and looking there for labels like "with a dorsal depression". This is a cheap approach, but it doesn't generalize.
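The label-based lookup described above might look roughly like the following sketch. Only two ideas are taken from the discussion: Character keys live solely under Terminology/Characters/Character, and a state label is found either in a local StateDefinition or by following a reference into the ConceptTrees. The names StateReference and Node, the id/ref attributes, and the sample labels are hypothetical, and Python's ElementTree again stands in for XQuery.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified SDD-like markup; element names other than
# Terminology/Characters/Character, Label, StateDefinition and ConceptTrees
# are assumptions made for this illustration.
SAMPLE = """<Dataset>
  <Terminology>
    <Characters>
      <Character id="c1">
        <Label>thorax dorsal surface</Label>
        <StateDefinition id="s1"><Label>flat</Label></StateDefinition>
        <StateReference ref="n7"/>
      </Character>
      <Character id="c2">
        <Label>head colour</Label>
        <StateDefinition id="s3"><Label>red</Label></StateDefinition>
      </Character>
    </Characters>
  </Terminology>
  <ConceptTrees>
    <Node id="n7"><Label>with a dorsal depression</Label></Node>
  </ConceptTrees>
</Dataset>"""

def label_of(elem):
    """Lower-cased text of an element's Label child, or '' if absent."""
    return (elem.findtext("Label") or "").strip().lower()

def find_characters(root, word):
    """Character keys live in exactly one place, so search only there."""
    return [c for c in root.iterfind("./Terminology/Characters/Character")
            if word.lower() in label_of(c)]

def find_state_keys(root, character, phrase):
    """Search the two places a state label can be: the character's own
    StateDefinitions, and ConceptTree nodes reached via StateReference."""
    phrase = phrase.lower()
    keys = [s.get("id") for s in character.iterfind("StateDefinition")
            if phrase in label_of(s)]
    trees = root.find("./ConceptTrees")
    nodes = {} if trees is None else {n.get("id"): n for n in trees.iter("Node")}
    for ref in character.iterfind("StateReference"):
        node = nodes.get(ref.get("ref"))
        if node is not None and phrase in label_of(node):
            keys.append(node.get("id"))
    return keys

root = ET.fromstring(SAMPLE)
for ch in find_characters(root, "thorax"):
    print(ch.get("id"), find_state_keys(root, ch, "with a dorsal depression"))
# c1 ['n7']
```

The hard-coded paths are exactly what makes this cheap and also what keeps it from generalizing: any schema change that adds a new place for labels requires editing the code.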

The most general approach is likely to use some external ontology to help map the parts of your query onto SDD key types. The query application can then use the xpath constraints in the schema to determine a path to the objects whose Labels are the only ones that can match each query part. XQuery supports joins over multiple documents; this approach probably really needs XQuery and not just XPath.
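A rough sketch of this ontology-driven approach follows, with Python standing in for XQuery. Everything concrete here is an assumption: the ONTOLOGY table (term, key type, synonyms) is hypothetical, and KEY_TYPE_PATHS is hard-coded, whereas the approach described above would derive those paths from the schema's own constraints (presumably the xs:key/xs:keyref selector paths). The lookup runs over several datasets at once, the kind of multi-document query XQuery handles natively.

```python
import xml.etree.ElementTree as ET

# Hypothetical external ontology: query term -> (SDD key type, label synonyms).
ONTOLOGY = {
    "thorax": ("Character", {"thorax", "mesosoma", "alitrunk"}),
}

# Hypothetical map from key type to the only path where such keys live.
# In the approach above this would be read from the schema, not written by hand.
KEY_TYPE_PATHS = {
    "Character": "./Terminology/Characters/Character",
}

def resolve(term, datasets):
    """Look up a query term across several datasets: return (dataset, id)
    for every element of the right key type whose Label mentions a synonym."""
    key_type, synonyms = ONTOLOGY[term]
    path = KEY_TYPE_PATHS[key_type]
    hits = []
    for name, root in datasets.items():
        for elem in root.iterfind(path):
            label = (elem.findtext("Label") or "").lower()
            if any(s in label for s in synonyms):
                hits.append((name, elem.get("id")))
    return hits

# Two hypothetical datasets that label the same structure differently.
DOC_A = """<Dataset><Terminology><Characters>
  <Character id="c1"><Label>thorax shape</Label></Character>
</Characters></Terminology></Dataset>"""
DOC_B = """<Dataset><Terminology><Characters>
  <Character id="x9"><Label>Mesosoma: dorsal outline</Label></Character>
  <Character id="x10"><Label>Petiole length</Label></Character>
</Characters></Terminology></Dataset>"""

datasets = {"A": ET.fromstring(DOC_A), "B": ET.fromstring(DOC_B)}
print(resolve("thorax", datasets))  # [('A', 'c1'), ('B', 'x9')]
```

The point of the indirection is that adding a new key type or a new synonym changes only the tables, not the search code.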

-- Main.BobMorris - 21 Jul 2004


First, two quick questions in return:

a) I do not understand what is missing regarding geography. You have the same options: either naively look at labels and hope Switzerland is not named Schweiz or Helvetia, or go outside SDD to an agreed thesaurus and get a real mapping of your geography item to those used in the SDD dataset. Clearly this is still in the future, but we hope to enable it fully.

b) Can you elaborate on "does not generalize"?

Gregor Hagedorn - 21 Jul 2004


I do question whether it is useful to go into datasets and look for those whose wording matches a question like "What are the names of all ant taxa whose thorax has a dorsal depression?" This will certainly be useful for mining data on the internet, i.e. finding places where descriptions exist, but I consider it unlikely to be useful when you are really looking for an answer. The ways to express the same information in biological terms are almost infinite, which is why SDD is burdened with defining terminology.

If you want data mining, you may also search GlossaryEntry objects for your words. Where present, they are likely to be much more useful than the labels, which may contain all kinds of abbreviations for the sake of conciseness.

The usual queries that I expect (and which are used in all identification packages I know of) first go into the terminology and proceed in steps: they ask the user whether to answer a question about a given concept or character, and then either present the applicable states as options to choose from or ask for a numeric value.
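The stepwise, terminology-driven querying described above can be sketched as a simple multi-access filter. This is an illustration under stated assumptions, not how any particular identification package works: the characters, taxa and states are hypothetical, only categorical characters are handled (numeric questions are omitted), and scripted answers replace interactive prompts.

```python
# Hypothetical terminology (character order = question order) and coded taxa.
# A taxon may record several states for one character (polymorphism).
CHARACTERS = ["thorax dorsal surface", "head colour"]
TAXA = {
    "Taxon A": {"thorax dorsal surface": {"flat"}, "head colour": {"red"}},
    "Taxon B": {"thorax dorsal surface": {"with a dorsal depression"},
                "head colour": {"red", "black"}},
    "Taxon C": {"thorax dorsal surface": {"with a dorsal depression"},
                "head colour": {"black"}},
}

def applicable_states(candidates, character):
    """States of `character` recorded for at least one remaining taxon;
    these are the options a key would offer the user."""
    return sorted(set().union(*(TAXA[t].get(character, set()) for t in candidates)))

def identify(answers):
    """Walk the terminology step by step. `answers` maps a character to the
    state the user would choose; a missing entry means the user skipped it."""
    candidates = set(TAXA)
    for character in CHARACTERS:
        options = applicable_states(candidates, character)
        if len(options) < 2 or character not in answers:
            continue  # question cannot discriminate, or the user skipped it
        chosen = answers[character]
        candidates = {t for t in candidates
                      if chosen in TAXA[t].get(character, set())}
    return sorted(candidates)

print(identify({"thorax dorsal surface": "with a dorsal depression",
                "head colour": "red"}))  # ['Taxon B']
```

Offering only the states that still occur among the remaining taxa is what keeps each step meaningful; a question whose answer cannot split the candidates is simply skipped.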

Gregor Hagedorn - 21 Jul 2004