Minutes of the SDD Berlin 2006 Meeting (First week of April)
This page contains agenda items together with minutes for individual items. A second page summarizes the results of the meeting: SDD2006BerlinSummary.
Participants for Monday-Thursday: Bob Morris, Gregor Hagedorn, Jacob Asiedu, Damian Barnier, Kevin Thiele. On Tuesday Markus Döring joined us, on Friday only Bob Morris, Gregor Hagedorn, Jacob Asiedu participated.
Schema validation issues -- Monday
Main issue was key/keyref, was ambiguous def. in w3c standard (part 1.3.4 of the W3C Schema rules); erroneously implemented in spy, current solution is: all identity constraints taken out, may be put in (reversely than before) in a future version). Should have been done, Jacob double-checks.
Removed the default attribute '""' of the Audience from !AudienceRepresentationBase. We think this is an error in the schema. The "" has length of zero, but the audience type was defined to have a length of at least one. Should have been done, Jacob double-checks.
Other technical/Java issues -- Monday
Replaced all names,types,refs that begins with "" with "TBD" so that there is no conflicts in castors's local variables which begin with "_". TBD means to be determined.
Clarification: technically there is no problem, just conflict of convention. However, double underscore exists only in draft versions, but signifies that these elements should be deleted before using, xslt to do this exist. We only have this versions around for lack of time of doing this every time we update.
Replaced the SDD type "String" with "StringBDI.SDD" so that there is no conflicts with Java.lang.String and generated code are compiled successfully. Background: Until semantics are agreed UBIF prevents the various ways to not have a string (Like: <element xsi:nil /> -- <element></element> -- <element>""</element> -- (no element at all)
<element>content</element>. This problem of xs:string differs sharply from other simple types like xs:integer, xs:datetime.) Agreed: StringNonEmpty having constraint: length > 0. Done.
Replaced 'xs:Name' with 'xs:NCName' because castor does not yet support xs:Name. Difference is: name allows colons it, non-colonized name (i.e. not Iraq). Agreed and done.
Removed all references to /*/ in the documentation so that Java does not treat them as part of java comments.
Occurs only in the form: xs:selector xpath="Dataset/TaxonNames/*/Label" or element content of xs:annotation. Considered defect of particular schema compiler software.
Multiple (indirect) inclusion of the basic "type library" parts of UBIF schema - is this a problem? Problems only conceptual, but unavoidable to reuse very basic type libraries (like UBIF_TypeLib.xsd) in multiple schema components. Not a problem.
Revise basic object concept (Gregor), id in element optional and only for linking, GUIDs expressed in Links...
removed attribute "unique" from the label text type, considered responsibility of application.
considered changing attribute "role" to "kind" or "form". Not done after revisiting media resources.
changed label role/kind value ="concise" to "full", revisited and revised Label, Detail, MediaResource and Links enumeration.
Where are definitions (for abstract concepts): in MediaResources or in Links/Link? Decision: Link with relBasedOn, MediaResource with role"normative". Different because MediaResource is a Representation, not interresource link.
The basic object concept discussion also covered the issues on normative definitions for character, character states, etc.(Kevin/Damian)
TAPIR issues -- Tuesday morning, together with Markus Döring
Replace recursive tree elements (tree data expressed with standard representation of digraphs)?
Used in Taxon hierarchies, Concept trees, Identification trees.
Problem: xml-tree has order of children. Traversal order: breadth first (node, sequence of children) or depth first (child-child-child, next child, child). Traversal order does not order, but requirement: list of edges must remain ordered, is not a set. (In contrast, a graph is a set of edges!)
For concept trees: old SDD solution does not handle multiple inheritance (ascospore in part of ascus AND ascospore is kind of spore). Possible if tree is generalized to directed acyclical graph with single root.
For taxon hierarchies: We have no preference, but TCS is using a list.
Possible advantages of edge list (digraph with ordered edge list):
TAPIR
Fewer constraints to verify
UML model closer to schema
Concept trees: Decision: no global "tree type", but types on the edges. Revised and changed the label for the new, edge-based model.
Consider: If using digraph with ordered edge list once, we should use it everywhere for any datastructure that is a special case of a digraph (tree, forest, etc.) Reasoning based on robustness grounds.
Various -- Wednesday
Concept states (Kevin/Damian)
Issue is resolved when changing concept trees to list of referrable concepts plus separate hierarchies with edge lists.
MediaResource, location cannot be specified; there link element enumeration what things are, ... (Damian) -- FIXED.
Replace pointer from Concepts to characters with pointer from characters to concepts. Bob proposed this a long time ago and I thought it does not matter much which way round - but now I see the wisdom in this. It just took me a long time. Concepts are more general and likely to be reused if we view SDD as modular data pieces. (Gregor)
* Long discussion, possible to separate issues of concept and character curation when referring from characters to concepts. Idea: character is multiple inheritance from concepts (plus its character data type): Petal color is a color character, is a petal character, is a plant character, is an unaided observation character.
Various -- Thursday
Concept Hierarchies/Character Trees
Discussions carried over from Wednesday on how to arrange and re-use concepts in trees defined using edge lists. Agreed to split the ConceptHierarchies and CharacterTrees with the intention of developing ontology like structures for concepts at a later date.
Resolutions: Abandon ConceptTrees, just keep Character / CharacterTree (no longer arbitrary DAG!), concepts ontologies may be added later.
Result: Structural changes in SDD schema.
RDF/xml-schema -- Thursday
Talk about RDF issues, possibility to mix structured documents with rdf, or learn from rdf
Gregor's Roger-validated talk in EDI/April, architecture meeting - prepare and discuss. See
John Wilbanks: write xslt to express rdf metadata based on structured documents?
Preparation for Talk Presenting SDD at TDWG TAG meeting in Edinburgh (Thursday afternoon)
Roger: "We all have some familiarity with the technologies being presented so we don't need detailed technical talks but what we do need to do is have an overview of how they fit into the broader scheme of things. The whole talk should last a maximum of 15 minutes."
Motivation & Rationale: Very brief introduction to motivations i.e. what it was intended to do and why it takes the form it does.
Market Size: Potential Publishers Who could be producing data like this.
Research to answer this question has not been funded so far.
Market Size: Potential Customers Who could be consuming data like this.
Research to answer this question has not been funded so far.
Success Factors: Significant factors for successful adoption. Why has it been successful? What do you think will make it successful? From an adopters point of view.
SDD itself should be invisible to users, only experienced through software.
Such software will be adapted for data production and consumption if it increases the productiveness of knowledge workers.
Software needs to address the expectations and previous experience of consumers (biologists, etc.).
Such software may be based on proprietary dataformats, as is currently largely the case. SDD will be successfull if the users are demanding data sharing and and collaboration tools, and desire to use the best aspects of multiple software programs for the same dataset.
The wide variety of open source and production-quality supported commercial development tools for the implementation language (XML-Schema).
Hurdles to Adoption: Significant hurdles to adoption. What have been the major hurdles to adoption? Or what are perceived as the major hurdles?
Complexity of the problem
Lack of a tradition in descriptive systematics and taxonomy for broadscale collaborations and early exposure of data
Low level of funding for developing software tools sufficiently advanced to indeed make users productive
SDD needs some external validation tools, not all requirements could be fullfilled using xml-schema-based validation.
Big Picture Where does the technology fit in the model discussed in the morning session (this obviously can't be prepared ahead of time so a blank slide is fine). Points raised in discussion on this will form the detailed agenda for day 2.
It does not.
Various clean-up tasks -- Friday
Participants: Bob, Gregor, Jacob.
Open Issues:
As a result of the TAPIR/non-tree-structure change, natural language support needs redistribution between concept and node-list. Conclusion: shifted into character tree, both on internal nodes and character nodes. Thus in fact now more similar again to SDD 1.0!
Move Modifiers into Concepts. -- Agreed as a step towards a lighter SDD. Modifier sets are now a special case of Concepts. Modifier definitions itself remain unchanged.
Base the Dataset itself on AbstractObject, so far all the Dataset metadata are considered special cases, but they seem to fit well with the general model. Only necessary action: add "coverage" as a Detail-text role. Resolution: Agreed.
Both the taxon hierarchy and identification key are still tree, not edge list (TAPIR issues). - Jacob/Bob todo.
Put keyrefs (taken out in St. Petersburg because of technical issues with w3c schema validation tools) back in. - Jacob/Bob todo.
Note: Originally we had hoped to start thinking about SDD-Lite, but other than a substantial simplification of the modifier definitions, no time was available for this.
The Result of this meeting was at first SDD Version101, which we later decided to change to Version1dot1 because too many changes had accumulated. See also DiscussionFor101b7 (although most discussion occurred through email and telephone conference calls).