OSGeo - User contributions [en]

Reading the INSPIRE Metadata Draft

2007-03-14T12:19:55Z

Wiki-Ianibbo: /* Lack of engagement with packaging and re-use issues */

Metadata about geographic data is at the heart of INSPIRE. The metadata draft is the first in the set of "implementing rules" and it will underpin all the other implementing rules. The consultation process is open until 2007-03-30. While the documents are open access, comments can only be offered through an SDIC or Spatial Data Interest Community.

The Free and Open Source Geospatial Community has a voice through one of these SDICs thanks to Markus Neteler. This page contains preparatory material for a collective response through the FOSS GIS SDIC, from the POV of people implementing and managing metadata creation, collection and search services, working closely with many different data user communities.

* The response proper will live at [[Response to INSPIRE Metadata Draft]]. Initial notes are included below in the Issues section.
* It is interesting to read this in parallel with the North American Metadata Profile draft which is also currently in consultation. It's hoped the OSGeo community will also be able to contribute to a [[Response to NAP Metadata Draft]] and get the [http://geodatacommons.umaine.edu geodata commons] project involved in this.

== Reading the draft ==

* [http://www.ec-gis.org/inspire/reports/ImplementingRules/draftINSPIREMetadataIRv2_20070202.pdf the Implementing Rules for Metadata Draft] (pdf)
* [http://www.ec-gis.org/inspire/whatsnew.cfm#1590 supporting / background material]
* Pages 1-17 are metadata about the document itself, intentions and history, and can be safely skipped. Pages 43-104 are the Annexes.
* Annex A is particularly interesting as there are details of the thinking exposed in the mapping to ISO19115/39 that aren't set out in the implementing rules. '''If you want to know what's likely to affect you but are short on time, at minimum read section 5 and Annex A'''.

== Lightning Summary of the draft ==

The draft establishes a basic information model for metadata which is close to, but not specific to, ISO19115 and OGC Web Services.

It only mandates what metadata is published by and for public authorities covered by INSPIRE - it does not try to cover repository management or internal processes.

It separates out metadata properties into those useful for 'discovery', 'evaluation', and 'use'. It identifies one very high level "use case" for spatial data search services built from metadata being shared at this level.

It sorts properties into two levels, 1 and 2, according to whether they are useful for 'non-specialist' or 'expert' users. Level 1 is always mandatory. This *includes* classification according to the data themes in the INSPIRE annexes, and keywords from controlled vocabularies which are not covered by the IR document but are left to Spatial Data Theme Communities. (How these communities are found and selected, and how they make their decisions, is unknown to us at this time.)

= Issues =

'''This list is an overview of what jumped out at me as something to address. I don't know how much of this is appropriate to send back, or how much can be fixed. - [[User:JoWalsh]]''' I've added some random musings also, again, no idea if they are of use or even valid questions, but at least they are there and can be edited out, or used as the basis of discussions - [[User:IanIbbo]].

== Conceptual overview ==

The model maps quite well to the minimum useful subset identified in [[DCLite4G]]. It looks like a lightweight core. But, the model and the draft break down the problem space of metadata in a way that is a reaction to artificial scarcity of data. It identifies three phases of the metadata use cycle:
* discovery (of what data is out there)
* evaluation (of whether the data will be useful for specific purpose)
* use (once access gained, how to best use the data)

It is illuminating to compare this with the [[Reading the NAP Metadata Draft|North American Profile]] metadata draft, which talks about:

* discovery
* access
* fitness for use (i.e. evaluation)
* transfer

So the IRs do not address how metadata could make the data more useful, and they are vague about whether a minimal subset will provide enough information to evaluate utility. Generally the draft dances around data licensing and access issues, and glosses over the over-engineering needed to work around artificial constraints on availability. IRs for evaluation and use of data based on metadata are not covered by this draft at all, but are left up to the Spatial Data Theme communities for each of the 35 data themes identified in Annexes I-III of the INSPIRE text.

== Issues with specific metadata properties ==

The model maps quite well to [[DCLite4G]]. It looks like a lightweight core.

=== Things that aren't there that should be ===

'''5.2.8 Resource responsible party'''. Each dataset *must* have one or more people/organisations responsible for it. The IR says that this can be free text or can be in more structured form. This '''only''' includes the responsible party's name, but NOT any form of contact details.

'''Some form of electronic or telephonic contact address should be mandatory, if the org/person's details are mandatory.''' Why publish ownership information - especially if there are constraints on access and reuse of the described data - if you can't immediately get in personal contact with someone who can make assurances about the data?

Annex A on mapping to ISO 19115 mandates that contact persons and organisations be free text, not resource identifiers. There are two serious problems with the ISO 19115 mapping:

* It does not ask or provide for contact details.
* It looks *mandatory* that the responsible party be given a role, which in turn is one of N codes published by the Library of Congress to describe people's roles within organisations.

'''No discussion of formalising dataset accuracy / completeness - crucial for cost-benefit evaluation / evaluation of suitability for combining with other data sets.'''

=== Things that are there that probably shouldn't be ===

Every 'dataset or dataset series' published under INSPIRE *must* include both a '''Resource topic category''' and a set of resource '''keywords'''.

Topic categories are very high-level classifications which correspond to each of the Spatial Data Themes identified in Annexes I, II and III of the INSPIRE Directive.

* Which topic category the data fits into will often be a property of the organisation, not of any individual published data set.

From an implementor's POV this will involve something like selecting a topic category for data at install time of the metadata publishing engine, and forgetting about it. The IRs place a lot of faith in the ability of simple keyword / classification code matches to enhance the utility of search and discovery services for users.

But this already raises the bar for non-expert users (the domain vocabulary is jargon-specific or oriented towards specialist codes).

The IRs emphasise the fact that keywords should originate from a ''controlled vocabulary''. The responsibility for creating one is not in the hands of the Drafting Teams but in the hands of Spatial Data Theme Communities. How these are constituted and how their decisions become binding are unclear.

Again, faith in keywords for search utility is misplaced. Reliance on them may lead to false negatives. It also assumes that a non-expert user is familiar with the domain, or has the time and ability to learn what to expect from it, while an expert will need a finer level of detail. Pitfalls of 'controlled' keywording:
* intentional misclassification
* lazy/default misclassification

Both of these are at 'Level 1 for discovery metadata' which implies that any INSPIRE compliant metadata set MUST have both topic category and associated keywords.

== Areas which are unclear ==

=== Conformity ===

This is an IR and obligatory to deal with. But 5.3.4 just says "see Annex F".
Annex F '''in its entirety''' says:

"The way in which conformity is expressed in the INSPIRE IR will be defined in a subsequent draft based on discussions with the Drafting Team on Data specifications and harmonization."

(Is this where accuracy/completeness comes in? How can we know?)

=== Dataset series / Aggregate data ===

The IR talks about dataset series. Some of the diagrams refer to 'MD_Aggregates', a term that isn't used elsewhere. The model has no conception of one UrDataSet with many different potential sources according to how they are packaged or processed. As the IRs mandate properties for dataset series, we really need more clarity and examples about what they actually are.

== General concerns ==

=== Search / discovery services ===

The preamble (p.7) states that "separate IRs for discovery services are being prepared and are not the subject of this document." But the INSPIRE use case is predicated on the availability of 'Geoportal' style search services. What else *are* discovery services if they are not the search services treated here? If there is only going to be an abstract model for discovery, and these IRs are careful to avoid imposing any constraints on internal data repository management, how much more can a discovery services draft provide?

II: I think this observation is spot on, but for different reasons perhaps. I'm finding it difficult to express concrete concerns, but Section 5.2 "Discovery metadata elements" starts to set out a list of concepts seen to be (the document hints at, but does not directly say) core to the discovery process. Section 5.3 then sets out the "Abstract discovery metadata element set". I *guess* the implication is that the concepts laid out in 5.2 are in some way even more abstract than those set out in 5.3. The document really isn't clear about what the abstract model is, or what it is for, before it starts enumerating the concepts. Your later comment about being tied to web services is spot on here too: I'm really not sure "Service type version", "Operation name" and "Distributed computing platform" belong in an abstract discovery model (they probably *do* belong in some result record schema). These three attributes seem to belong specifically to a particular (and I would guess already existing) service binding, or, as already said, to a very specific kind of returned result record. What I'd really like to see is a much clearer statement of what the purpose of the abstract discovery model is. Hopefully, once that is tightly defined, it should become easier to decide what lies inside the boundary of the abstract model, and what belongs in the domain of specific realisations of the abstract model. (Actually, I should say that I'm biased by the information retrieval community generally, in that it's considered really important to have a separate abstract model for discovery (the search access points) and then bind that model on to as many backend schemas as needed. This decoupling is seen as best practice in the information retrieval domain, and it is the approach taken in the [http://www.blueangeltech.com/standards/GeoProfile/geo22.htm Z39.50 GEO profile]. Most of my concerns here stem from the apparent 1:1 mapping between the abstract model and the implementation.)

I'm a bit confused by the "Temporal Reference" element. 5.2.2 talks about what I would expect to see from a temporal reference, but 5.3.2 maps temporal reference on to "One of the dates of publication, last revision or creation of the resource". These three elements are already well defined by Dublin Core attributes. Maybe I've misunderstood what's implied by table 1 in 5.3.2. Also, similar issues to the spatial access point arise (with structured data, as opposed to text queries). In some UK datasets, periods such as "Neolithic" can be used instead of an ISO 19108 Date Time. (I see that note 11 under 5.3.4 talks about this, which is good. What's important is that regardless of the outcome of the study, the IRs are extensible enough to cope with the eventual decision.) I'd consider separate access points for controlled vocabulary time period and structured temporal data. This seems a specific example where the abstract IR model needs to go beyond what is defined in the A2 binding.
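As a rough sketch of what separate temporal access points could mean for a server implementor, the Python below routes a query term either to a structured-date access point or to a controlled-vocabulary period access point. The access point names <code>temporal.date</code> and <code>temporal.period</code> are made up for illustration and do not come from the draft, and only plain ISO calendar dates are recognised as structured data.

```python
from datetime import date
from typing import Tuple, Union


def route_temporal_query(term: str) -> Tuple[str, Union[date, str]]:
    """Decide which (hypothetical) access point a temporal query term belongs to.

    Terms that parse as an ISO calendar date (YYYY-MM-DD) are treated as
    structured temporal data; anything else is treated as a period name
    to be looked up in some controlled vocabulary.
    """
    cleaned = term.strip()
    try:
        # Structured case: a real date that can be compared and range-searched.
        return ("temporal.date", date.fromisoformat(cleaned))
    except ValueError:
        # Vocabulary case: normalise the term, leave its meaning to the vocabulary.
        return ("temporal.period", cleaned.lower())


if __name__ == "__main__":
    print(route_temporal_query("2007-03-14"))
    print(route_temporal_query("Neolithic"))
```

Keeping the two cases apart means a period name never has to be forced into a date type, which is the extensibility concern raised above.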

Geographic Extent: the doc seems a bit bounding box heavy. It would be nice to understand (have examples of) specification of interior/exterior polygons. Servers only supporting minimal bounding boxes can gracefully degrade (since it's easy to calculate an MBR from a polygon) whilst allowing other servers to retain the full richness of polygons. It's not clear (in the abstract model; it is in the A2 binding) where the semantics for parsing these strings will be defined. For example, should geographic extent be encoded as OpenGIS strings? (That seems to make sense to me, but I'm biased by Oracle and MySQL's spatial functions.) This might seem a bit extreme for the abstract part of the document, but it's one of those make-or-break issues for interoperability, and might be worth the pain. Also, I think it's worth entertaining the idea that spatial specifications such as MBRs and polygons (structured spatial constructs) might be better exposed using their own abstract access point, with "Place Name" having its own access point. This will help server implementors avoid problems with disambiguation of search terms.
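To make the graceful-degradation point concrete, here is a minimal Python sketch of deriving an MBR from a polygon's exterior ring and writing it back out as an OpenGIS-style WKT string. The function names are illustrative only. It assumes planar coordinates and does not handle extents that cross the antimeridian. Interior rings are ignored because they cannot enlarge the bounding box.

```python
from typing import List, Tuple

Point = Tuple[float, float]              # (x, y), e.g. (longitude, latitude)
MBR = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def polygon_mbr(exterior: List[Point]) -> MBR:
    """Return the minimal bounding rectangle of a polygon's exterior ring."""
    if not exterior:
        raise ValueError("exterior ring is empty")
    xs = [x for x, _ in exterior]
    ys = [y for _, y in exterior]
    return (min(xs), min(ys), max(xs), max(ys))


def mbr_to_wkt(mbr: MBR) -> str:
    """Encode an MBR as a closed WKT POLYGON (first corner repeated at the end)."""
    min_x, min_y, max_x, max_y = mbr
    corners = [
        (min_x, min_y),
        (max_x, min_y),
        (max_x, max_y),
        (min_x, max_y),
        (min_x, min_y),
    ]
    coords = ", ".join(f"{x:g} {y:g}" for x, y in corners)
    return f"POLYGON(({coords}))"


if __name__ == "__main__":
    ring = [(0, 0), (4, 1), (3, 5), (-1, 2), (0, 0)]
    print(mbr_to_wkt(polygon_mbr(ring)))
```

A server that only indexes bounding boxes could apply this on ingest, while a richer server keeps the original polygon.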

I'm interested in what the expected semantics of resource language are on retrieval of language-neutral data sets. Should a result record not be selected because the user specified "Nor" as the search language, when resources matching other criteria (Geo Extent for example) do match? Normally in Info Retrieval this is a no-brainer, of course it shouldn't, but I'm a bit less certain when we talk about result records that aren't primarily "Text" based. (Actually, this is a slightly wider concern about Annex A and those "CharacterString" elements. In IEEE LOM, for example, we have a "LangString" element that has a "Lang" attribute. That community chose to allow language variants of a resource to be expressed within one record by allowing an element to hold all language variants, for example:
<pre>
<title>
<langstring lang="En">Hello</langstring>
<langstring lang="Dk">Hej</langstring>
</title>
</pre>
The presence of a "Lang" attribute at the "Dataset" level might mean the intention is to support multi-language datasets by having several dataset records, one for each language, which is OK, but possibly not optimal for datasets that aren't primarily language based. If this is the case, is the "CharacterString" element in Annex A just redundant payload?)
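For what it's worth, here is a small Python sketch of how a client might consume the single-record, multi-language style shown in the LOM-like fragment above. The helper name and the fallback behaviour are my own assumptions, and the language codes are simply the ones used in the example, not ISO codes.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Same shape as the LOM-style fragment discussed above.
RECORD = """
<title>
<langstring lang="En">Hello</langstring>
<langstring lang="Dk">Hej</langstring>
</title>
"""


def pick_variant(element: ET.Element, wanted: str,
                 fallback: Optional[str] = None) -> Optional[str]:
    """Return the text for the wanted language, else the fallback, else None."""
    variants = {
        ls.get("lang", "").lower(): (ls.text or "")
        for ls in element.findall("langstring")
    }
    if wanted.lower() in variants:
        return variants[wanted.lower()]
    if fallback is not None and fallback.lower() in variants:
        return variants[fallback.lower()]
    return None


if __name__ == "__main__":
    title = ET.fromstring(RECORD)
    print(pick_variant(title, "Dk"))         # Danish variant is present
    print(pick_variant(title, "Nor", "En"))  # no Norwegian, falls back to English
```

With one record per language instead, the same fallback would need a second search rather than a lookup inside the record.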

=== Lack of machine-reusable data in general ===

Dataset 'lineage' is only a full-text field. If datasets result from recombination, that should be machine-traversable. Human descriptions of lineage will be so different that they won't be useful for building search / evaluation services.

II: It does tend to talk about a "Lineage statement". Would making it more along the conceptual lines of
<pre>
<lineage>
<dc:description>Text</dc:description>
</lineage>
</pre>
give you the extensibility to either use private extensions, or to specify recombination elements at a later date? (I didn't think this through in terms of the *actual* recombination operations; I just wanted to show how we might make lineage extensible without specifying it.) For example:
<pre>
<lineage>
<dc:description>This dataset is a recombination of X and Y</dc:description>
<Jo:recombination>
<Jo:source>X-URI</Jo:source>
<Jo:source>Y-URI</Jo:source>
<Jo:Rules>Overlap</Jo:Rules>
</Jo:recombination>
</lineage>
</pre>
Should the lineage search point be called "LineageDescription"? (I think that's what I'll do in my SRW profile.)
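To show why this kind of extensible lineage would be machine-traversable, here is a hedged Python sketch that reads such a record, assuming the extension element is properly closed and the prefixes are bound to namespaces. The <code>Jo</code> namespace URI is a placeholder invented for the example. The <code>dc</code> URI is the usual Dublin Core elements namespace.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Union

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "Jo": "http://example.org/jo#",  # placeholder namespace for the private extension
}

LINEAGE = """
<lineage xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:Jo="http://example.org/jo#">
<dc:description>This dataset is a recombination of X and Y</dc:description>
<Jo:recombination>
<Jo:source>X-URI</Jo:source>
<Jo:source>Y-URI</Jo:source>
<Jo:Rules>Overlap</Jo:Rules>
</Jo:recombination>
</lineage>
"""


def read_lineage(xml_text: str) -> Dict[str, Union[str, List[str], None]]:
    """Extract the free-text description plus any machine-readable sources."""
    root = ET.fromstring(xml_text)
    description = root.findtext("dc:description", default="", namespaces=NS)
    sources = [s.text or "" for s in root.findall("Jo:recombination/Jo:source", NS)]
    rules: Optional[str] = root.findtext("Jo:recombination/Jo:Rules", namespaces=NS)
    return {"description": description, "sources": sources, "rules": rules}


if __name__ == "__main__":
    print(read_lineage(LINEAGE))
```

A client that knows nothing about the extension still gets the description; one that does can follow the source URIs.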

=== Lack of engagement with packaging and re-use issues ===

Cf. Dataset series / aggregates. The examples have 'MasterMap' as one potential dataset! Real world use cases are going to need subsets of such huge data sets broken down into packages with smaller spatial extents or with fewer layers.

II: Indeed, as well as the SRW/SRU binding experiment, I've been wondering about the OAI binding, which I know you've already discussed elsewhere. What might be generally useful (and maybe this already exists) is a set of TREC style test data. Setting up a static gateway OAI server wouldn't be so hard, and might give us some valuable real-world information about this problem. I know the records won't be in the right schema, but something we can try and munge into gmd would be a real help.

=== Bypassing of feature-level metadata from consideration ===

Once we get down to the feature level the interesting European problems appear: every local area may have its own classification schemes, even inside one language community the same word is used to describe different-looking things, and across language barriers mappings from words to things don't tend to be 1-1. But by disregarding feature-level metadata - partly because it can't be mandated when the underlying geospatial objects aren't publicly inspectable and a certain amount of feature level metadata would mean the data itself is essentially public...

II: Aye, generally for discovery services it's nice to try and avoid mandating that users understand predefined controlled vocabularies, whilst allowing users who do know terms to qualify their discovery process, for example, in CQL I'd be tempted to allow a user to say "dc.subject=Something" or (The equivalent of) "authority=19115:2003 and dc.subject=Something" for users who know a specific term.
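A minimal Python sketch of the two query styles just described: the vocabulary qualifier is optional, so users who don't know a controlled vocabulary are not forced to name one. The <code>authority</code> index is the hypothetical one from the comment above, not something defined in a published CQL context set, and the quoting rule (double quotes with backslash escapes) is my understanding of CQL rather than something taken from the draft.

```python
from typing import Optional


def cql_term(value: str) -> str:
    """Quote a search term for CQL, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def subject_query(subject: str, authority: Optional[str] = None) -> str:
    """Build a subject query, optionally qualified by the vocabulary in use."""
    clause = f"dc.subject = {cql_term(subject)}"
    if authority is None:
        # Unqualified search: no vocabulary knowledge required of the user.
        return clause
    return f"authority = {cql_term(authority)} and {clause}"


if __name__ == "__main__":
    print(subject_query("Something"))
    print(subject_query("Something", authority="19115:2003"))
```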

There's quite a lot of work going on around Europe at the moment covering crosswalks of controlled vocabularies (mostly I know about crosswalking European educational levels, but it seems to be the same problem cast in a different way). If we can arrange for someone to do the intellectual work of cross-mapping, and make the data publicly available, then it becomes a "turning-the-handle" job for providers to support cross-vocab retrieval. Standards such as ZThes are being used quite a lot in the learning domain to transport this data around. The only effect on reviewing the IR is that it's important that the IR does not preclude this at a future date. (The whole design for unforeseen use thing: specifically, I think mandating a specific vocab in the IR might not be the right thing to do, and giving users a way to say which vocab they are using in the description and discovery process is a better way to go.)

=== Overspecificness about internet- and webservices- based distribution models ===

We are actually causing ourselves unnecessary problems by putting everything on the Internet. Data sharing agreements over publicly maintained private networks with flat-rate membership are a clear potential future and 'middle way' in this domain. The draft now is all about making access/use constraints *specific to data sets* and not specific to the relationship between the data provider or broker, the data user and the transport network between them.

So we have a 'distributed computing platform' metadata property that is required by the IRs. In the ISO 19115 mapping in Annex A this is a '''free text field''', yet 5.2.15 states that the property "is necessary for a client to bind to the service". If it must be mandated, it should be as a URI. It would be wonderful to have examples of what other than HTTP or OGC web services is envisaged NOW as a means of access to the backend of a distributed computing platform.
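As an illustration of the difference between free text and something a client can actually bind to, the Python sketch below accepts a value only if it parses as a URI with both a scheme and a host. Requiring a host is my own assumption about what "bind to" needs, and the function name is made up. The draft does not define any such check.

```python
from urllib.parse import urlparse


def is_bindable_uri(value: str) -> bool:
    """Return True if the value looks like a URI a client could connect to.

    Assumption for this sketch: a usable value has both a scheme (how to
    talk) and a network location (where to talk to).
    """
    parsed = urlparse(value.strip())
    return bool(parsed.scheme) and bool(parsed.netloc)


if __name__ == "__main__":
    print(is_bindable_uri("http://example.org/wms"))       # structured value
    print(is_bindable_uri("OGC web services over HTTP"))   # free text
```

A free-text description like the second value tells a human something but gives a client nothing to connect to.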

Reading the INSPIRE Metadata Draft

2007-03-14T12:03:34Z

Wiki-Ianibbo: /* Search / discovery services */

Metadata about geographic data is at the heart of INSPIRE. The metadata draft is the first in the set of "implementing rules" and it will underpin all the other implementing rules. The consultation process is open until 2007-03-30. While the documents are open access, comments can only be offered through an SDIC or Spatial Data Interest Community.

The Free and Open Source Geospatial Community has a voice through one of these SDICs thanks to Markus Neteler. This page contains preparatory material for a collective response through the FOSS GIS SDIC, from the POV of people implementing and managing metadata creation, collection and search services, working closely with many different data user communities.

* The response proper will live at [[Response to INSPIRE Metadata Draft]]. Initial notes are included below in the Issues section.
* It is interesting to read this in parallel with the North American Metadata Profile draft which is also currently in consultation. It's hoped the OSGeo community will also be able to contribute to a [[Response to NAP Metadata Draft]] and get the [http://geodatacommons.umaine.edu geodata commons] project involved in this.

== Reading the draft ==

* [http://www.ec-gis.org/inspire/reports/ImplementingRules/draftINSPIREMetadataIRv2_20070202.pdf the Implementing Rules for Metadata Draft] (pdf)
* [http://www.ec-gis.org/inspire/whatsnew.cfm#1590 supporting / background material]
* Pages 1-17 are metadata about the document itself, intentions and history, and can be safely skipped. Pages 43-104 are the Annexes.
* Annex A is particularly interesting as there are details of the thinking exposed in the mapping to ISO19115/39 that aren't set out in the implementing rules. ''If you want to know what's likely to affect you but are short on time, at minimum read section 5 and Annex A'''.

== Lightning Summary of the draft ==

The draft establishes a basic information model for metadata which is close to, but not specific to, ISO19115 and OGC Web Services.

It only mandates what metadata is published by and for public authorities covered by INSPIRE - it does not try to cover repository management or internal processes.

It separates out metadata properties into those useful for 'discovery', 'evaluation', and 'use'. It identifies one very high level "use case" for spatial data search services built from metadata being shared at this level.

It differentiates between properties useful for 'non-specialist' and 'expert' users into 2 Levels, 1 and 2. Level 1 is always mandatory. This *includes* classification according to the data themes in the INSPIRE annexes, and keywords from controlled vocabularies which are not covered by the IR document but are left to Spatial Data Theme Communities. (How these communities are found, selected, and make their decisions, is unknown to us at this time.)

= Issues =

'''This list is an overview of what jumped out at me as something to address. I don't know how much of this is appropriate to send back, or how much can be fixed. - [[User:JoWalsh]]''' I've added some random musings also, again, no idea if they are of use or even valid questions, but at least they are there and can be edited out, or used as the basis of discussions - [[User:IanIbbo]].

== Conceptual overview ==

The model maps quite well to the minimum useful subset identified in [[DCLite4G]]. It looks like a lightweight core. But, the model and the draft break down the problem space of metadata in a way that is a reaction to artificial scarcity of data. It identifies three phases of the metadata use cycle:
* discovery (of what data is out there)
* evaluation (of whether the data will be useful for specific purpose)
* use (once access gained, how to best use the data)

It is illuminates to compare this with the [[Reading the NAP Metadata Draft|North American Profile]] metadata draft which talks about

* discovery
* access
* fitness for use (e.g. evaluation)
* transfer

So the IRs both don't address how to make the data more useful via metadata, and are vague about how much a minimal subset is going to provide enough information to evaluate utility on. Generally the draft dances around data licensing access issues, and glosses over the over-engineering needed to work around artificial constraints on availability. IRs for evaluation and use of data based on metadata are not covered by this draft at all, but left up to the Spatial Data Theme communities for each of the 35 data themes identified in Annexes I-III of the INSPIRE text.

== Issues with specific metadata properties ==

The model maps quite well to [[DCLite4G]]. It looks like a lightweight core.

=== Things that aren't there that should be ===

'''5.2.8 Resource responsible party'''. Each dataset *must* have one or more people/organisations responsible for it. The IR says that this can be freetext or can be in more structured form. This '''only''' includes the responsible party's name, but NOT any form of contact details.

'''Some form of electronic or telephonic contact address should be mandatory, if the org/person's details are mandatory.''' Why publish ownership information - especially if there are constraints on access and reuse of the described data - if you can't immediately get in personal contact with someone who can make assurances about the data?

Annex A on mapping to IS019115 mandates that contact persons and organisations be free text, not resource identifiers. 2 serious problems with the ISO 19115 mapping:

* It does not ask or provide for contact details.
* It looks *mandatory* that the reponsible party be given a role, which in turn is one of N codes published by the Library of Congress to describe people's roles within organisations.

'''No discussion of formalising dataset accuracy / completeness - crucial for cost-benefit evaluation / evaluation of suitability for combining with other data sets.'''

=== Things that are there that probably shouldn't be ===

Every 'dataset or dataset series' published under INSPIRE *must* include both a '''Resource topic category''' and a set of resource '''keywords'''.

Topic categories are very high-level classifications which correspond to each of the Spatial Data Themes identified in Annexes I, II and III of the INSPIRE Directive.

* Which topic category data fits in will often be a property of an organisation not any published data sets.

From an implementor's POV this will involve something like selecting a topic category for data at install time of metadata publishing engine, and forgetting about it. The IRs place a lot of faith in the ability of simple keyword / classification code matches to enhance utility of search and discovery services for users.

But. this already raises the bar for non-expert users (the domain vocabulary is jargon specific or oriented towards specialist codes)

The IRs emphasise the fact that keywords should originate from a ''controlled vocabulary''. The reponsibility for creating one is not in the hands of the Drafting Teams but in the hands of Spatial Data Theme Communities. How these are constituted and how their decisions become binding are unclear.

Again, faith in keywords for search utility is misplaced. Reliance on them may lead to false negatives. Again assumes familiarity with, or time and ability to learn about, what to expect in the domain from a non-expert user, and an expert will need a better level of detail. Pitfalls of 'controlled' keywording:
* intentional misclassification
* lazy/default misclassification

Both of these are at 'Level 1 for discovery metadata' which implies that any INSPIRE compliant metadata set MUST have both topic category and associated keywords.

== Areas which are unclear ==

=== Conformity ===

This is an IR and obligatory to deal with. But 5.3.4 just says "see Annex F".
Annex F '''in its entirety''' says:

The way in which conformity is expressed in the INSPIRE IR will be defined in a subsequent draft based on discussions with the Drafting Team on Data specifications and harmonization.

(Is this where accuracy/completeness comes in? How can we know?)

=== Dataset series / Aggregate data ===

IR talks about dataset series. Some of the diagrams talk about 'MD_Aggregates'- this term isn't used elsewhere. No conception in this model of one UrDataSet with many different potential sources according to how they are packaged or processed. As the IRs mandate properties for dataset series, really need more clarity / examples about what they actually are.

== General concerns ==

=== Search / discovery services ===

The preamble (p.7) states that "separate IRs for discovery services are being prepared and are not the subject of this document." But the INSPIRE use case is predicated on the availablity of 'Geoportal' style search services. What else *are* discovery services if they are not the search services treated of here? If there is only going to be an abstract model for discovery, and these IRs are careful to avoid imposing any constraints on internal data repository management, how much more can a discovery services draft provide?

II: I think this observation is spot on, but for different reasons perhaps. I'm finding it difficult to express concrete concerns.. but Section 5.2 "Discovery metadata elements" starts to set out a list of concepts seen to be (The document hints at, but does not directly say) core to the discovery process. Section 5.3 then sets out "Abstract discovery metadata element set". I *guess* the implication is that the concepts laid out in 5.2 are in some way even more abstract than those set out in 5.3. The document really isn't clear about what the abstract model is, or what it is for, before it starts enumerating the concepts. Your later comment about being tied to web services is spot on also here, I'm really not sure "Service type version", "Operation name" and "Distributed computing platform" belong in an abstract discovery model (The probably *do* belong in some result record schema). These three attributes seem to belong specifically to a particular (And I would guess already existing) service binding (Or as already said, to a very specific kind of returned result record). What I'd really like to see is a much clearer statement of what the purpose of the abstract discovery model is. Hopefully, once that is tightly defined, it should become easier to decide what lies inside the boundary of the abstract model, and what belongs in the domain of specific realisations of the abstract model. (Actually.. I should say that I'm baised by the information retrieval community generally, in that it's considered really important to have a seperate abstract model for discovery (The search access points) and then bind that model on to as many backend schemas as needed.. this decoupling is seen as best practice in the information retrival domain, and most of my concerns here are that because of the apparent 1:1 mapping between the abstract model and the implementation. 
This is the approach taken in the [[Z3950 GEO profile] http://www.blueangeltech.com/standards/GeoProfile/geo22.htm]).

I'm a bit confused by the "Temporal Reference" Element... 5.2.2. Talks about what I would expect to see from a temporal reference, but 5.3.2 maps temporal reference on to "One of the dates of publication, last revision or creation of the resource". These three elements are already well defined by dublin core attributes... Maybe I've misunderstood whats implied by table 1 in 5.3.2. Also, similar issues to the spaital access point arise (With structured data, as opposed to text queries). In some UK datasets, periods such as "Neolithic" can be used instead of an ISO 19108 Date Time. (I seen note 11 under 5.3.4 talks about this, which is good. Whats important is that regardless of the outcome of the study, the IR are extensible enough to cope with the eventual decision). I'd consider seperate access points for controlled vocabulary time period and structured temporal data. This seems a specific example where the abstract IR model needs to go beyond what is defined in the A2 binding.

Geographic Extent.. the doc seems a bit bounding box heavy. Would be nice to understand (have examples of) specification of interior/exterioir polygons. Servers only supporting minimal bounding boxes can gracefully degrade (Since it's easy to calculate a MBR from a polygon) whilst allowing other servers to retain the full richness of polygons. It's not clear (in the abstract model, it is in the A2 binding) where the semantics for parsing these strings will be defined.. for example should geographic extent be encoded as OpenGIS strings (Which seems to make sense to me, but I'm biased by Oracle and MySQL's spatial functions). This might seem a bit extreme for the abstract part of the document, but it's one of those make-or-break issues for interoperability, and might be worth the pain. Also, I think it's worth entertaining the idea that spatial specifications such as MBRs and polygons (Structured spatial constructs) might be better exposed using their own abstract access point, and "Place Name" having it's own access point. This will help server implementors avoid problems with disambiguation of search terms.

I'm interested in what the expected semantics of resource language are on retrieval of language-neutral data sets. Should a result record not be selected because the user specified "Nor" as the search language, even though it matches the other criteria (geographic extent, for example)? Normally in information retrieval this is a no-brainer, of course it shouldn't, but I'm a bit less certain when we talk about result records that aren't primarily "text" based. (Actually, this is a slightly wider concern about Annex A and those "CharacterString" elements. In IEEE LOM, for example, we have a "LangString" element that has a "Lang" attribute. That community chose to allow language variants of a resource to be expressed within one record by allowing an element to hold all language variants, for example
<pre>
<title>
<langstring lang="En">Hello</langstring>
<langstring lang="Dk">Hej</langstring>
</title>
</pre>
The presence of a "Lang" attribute at the "Dataset" level might mean the intention is to support multi-language datasets by having several dataset records, one for each language, which is OK, but possibly not optimal for datasets that aren't primarily language based. If this is the case, is the "CharacterString" element in Annex A just redundant payload?)
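For comparison, this is roughly how a client might consume the LOM-style record above. It is only a sketch in Python: the function name and the fallback policy (try the preferred language, then a default, then give up) are assumptions, not anything the draft or LOM prescribes.

```python
import xml.etree.ElementTree as ET

# The LOM-style example from the text, held in one record.
RECORD = """<title>
<langstring lang="En">Hello</langstring>
<langstring lang="Dk">Hej</langstring>
</title>"""

def pick_title(xml_text, preferred, fallback="En"):
    """Return the title in the preferred language, else the fallback, else None."""
    root = ET.fromstring(xml_text)
    variants = {el.get("lang", "").lower(): (el.text or "")
                for el in root.iter("langstring")}
    for lang in (preferred, fallback):
        if lang.lower() in variants:
            return variants[lang.lower()]
    return None  # no language variants at all, e.g. a language-neutral record
```

With one record per resource, the language only affects which string is displayed, not whether the record is found.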

=== Lack of machine-reusable data in general ===

Dataset 'lineage' is only a full-text field. If datasets result from recombination, that should be machine-traversable. Human descriptions of lineage will be so different that they won't be useful for building search / evaluation services.

II: It does tend to talk about a "Lineage statement". Would making it more along the conceptual lines of
<pre>
<lineage>
<dc:description>Text</dc:description>
</lineage>
</pre>
give you the extensibility to either use private extensions, or to specify recombination elements at a later date? (I didn't think this through in terms of the *actual* recombination operations, I just wanted to show how we might make lineage extensible without specifying it.)
<pre>
<lineage>
<dc:description>This dataset is a recombination of X and Y</dc:description>
<Jo:recombination>
<Jo:source>X-URI</Jo:source>
<Jo:source>Y-URI</Jo:source>
<Jo:Rules>Overlap</Jo:Rules>
</Jo:recombination>
</lineage>
</pre>
Should the lineage search point be called "LineageDescription"? (I think that's what I'll do in my SRW profile.)
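To show what "machine-traversable" could mean in practice, here is a small Python sketch that reads an extensible lineage element of the kind sketched above. The <code>Jo</code> namespace URI is a placeholder invented for this example (the original snippet declares none); only the Dublin Core namespace is real.

```python
import xml.etree.ElementTree as ET

# "jo" maps to a placeholder URI: an assumption, since the example above declares no namespace.
NS = {"dc": "http://purl.org/dc/elements/1.1/", "jo": "http://example.org/jo#"}

LINEAGE = """<lineage xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:Jo="http://example.org/jo#">
<dc:description>This dataset is a recombination of X and Y</dc:description>
<Jo:recombination>
<Jo:source>X-URI</Jo:source>
<Jo:source>Y-URI</Jo:source>
<Jo:Rules>Overlap</Jo:Rules>
</Jo:recombination>
</lineage>"""

def read_lineage(xml_text):
    """Return (human-readable description, list of source dataset identifiers)."""
    root = ET.fromstring(xml_text)
    description = root.findtext("dc:description", default="", namespaces=NS)
    sources = [el.text for el in root.findall("jo:recombination/jo:source", NS)]
    return description, sources
```

A client that does not know the private extension still gets the description; one that does can follow the source identifiers.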

=== Lack of engagement with packaging and re-use issues ===

Cf. Dataset series / aggregates. The examples have 'MasterMap' as one potential dataset! Real-world use cases are going to need subsets of such huge data sets, broken down into packages with smaller spatial extents or with fewer layers.

II: Indeed. As well as the SRW/SRU binding experiment, I've been wondering about the OAI binding, which I know you've already discussed elsewhere. What might be generally useful (and maybe this already exists) is a set of TREC-style test data. Setting up a static gateway OAI server wouldn't be so hard, and might give us some valuable real-world information about this problem.

=== Bypassing of feature-level metadata from consideration ===

Once we get down to the feature level the interesting European problems appear: every local area may have its own classification schemes, even inside one language community the same word is used to describe different-looking things, and across language barriers mappings from words to things don't tend to be 1-1. But by disregarding feature-level metadata - partly because it can't be mandated when the underlying geospatial objects aren't publicly inspectable, and a certain amount of feature-level metadata would mean the data itself is essentially public...

II: Aye, generally for discovery services it's nice to avoid mandating that users understand predefined controlled vocabularies, whilst still allowing users who do know the terms to qualify their discovery process. For example, in CQL I'd be tempted to allow a user to say "dc.subject=Something", or (the equivalent of) "authority=19115:2003 and dc.subject=Something" for users who know a specific term.

There's quite a lot of work going on around Europe at the moment covering crosswalks of controlled vocabularies (mostly I know about crosswalking European educational levels, but it seems to be the same problem cast in a different way). If we can arrange for someone to do the intellectual work of cross-mapping, and make the data publicly available, then it becomes a "turning-the-handle" job for providers to support cross-vocab retrieval. Standards such as ZThes are being used quite a lot in the learning domain to transport this data around. The only effect on reviewing the IR is that it's important that the IR does not preclude this at a future date. (The whole design-for-unforeseen-use thing: specifically, I think mandating a specific vocab in the IR might not be the right thing to do, and giving users a way to say which vocab they are using in the description and discovery process is a better way to go.)
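A sketch of the "turning-the-handle" step, in Python. It assumes a published crosswalk table is available as a simple lookup (the vocabulary names and terms below are invented) and uses the "authority" index from the CQL example above, which is itself a suggestion rather than a defined index.

```python
# Hypothetical crosswalk: (vocabulary, term) -> equivalent (vocabulary, term) pairs.
CROSSWALK = {
    ("vocab-a", "river"): [("vocab-b", "watercourse")],
}

def subject_clause(term, authority=None):
    """Build a CQL-style subject clause, optionally qualified by vocabulary."""
    quoted = term.replace("\\", "\\\\").replace('"', '\\"')
    clause = f'dc.subject="{quoted}"'
    if authority:
        clause = f'authority="{authority}" and {clause}'
    return clause

def expand(term, authority):
    """OR together the original clause and any crosswalked equivalents."""
    clauses = [subject_clause(term, authority)]
    for other_authority, other_term in CROSSWALK.get((authority, term), []):
        clauses.append(subject_clause(other_term, other_authority))
    return " or ".join(f"({c})" for c in clauses)
```

Users who do not know any vocabulary simply omit the authority; the expansion only applies when a vocabulary is named.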

=== Overspecificness about internet- and webservices- based distribution models ===

We are actually causing ourselves unnecessary problems by putting everything on the Internet. Data sharing agreements over publicly maintained private networks with flat-rate membership are a clear potential future and 'middle way' in this domain. The draft as it stands is all about making access/use constraints *specific to data sets* and not specific to the relationship between the data provider or broker, the data user and the transport network between them.

So we have a 'distributed computing platform' metadata property that is required by the IRs. In the ISO 19115 mapping in Annex A this is a '''free text field''', yet 5.2.15 states that the property "is necessary for a client to bind to the service". If it must be mandated, it should be as a URI. It would be wonderful to have examples of what, other than HTTP or OGC web services, is envisaged NOW as a means of access to the backend of a distributed computing platform.
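The difference between free text and a URI is easy to check mechanically. The Python sketch below is one possible validation a metadata editor could apply; the function name and the rule (a scheme plus something after it, and no whitespace) are assumptions for illustration, not a requirement stated in the draft.

```python
from urllib.parse import urlparse

def is_bindable_uri(value: str) -> bool:
    """Rough check that a value looks like a URI rather than free text."""
    text = value.strip()
    if not text or any(ch.isspace() for ch in text):
        return False  # free-text descriptions usually contain spaces
    parsed = urlparse(text)
    # Require a scheme (http, urn, ...) and something to identify after it.
    return bool(parsed.scheme) and bool(parsed.netloc or parsed.path)
```

A value such as "HTTP web service" fails the check, while a service endpoint URL or a URN passes.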

Reading the INSPIRE Metadata Draft

2007-03-14T12:02:23Z

Wiki-Ianibbo: /* Search / discovery services */

Metadata about geographic data is at the heart of INSPIRE. The metadata draft is the first in the set of "implementing rules" and it will underpin all the other implementing rules. The consultation process is open until 2007-03-30. While the documents are open access, comments can only be offered through an SDIC or Spatial Data Interest Community.

The Free and Open Source Geospatial Community has a voice through one of these SDICs thanks to Markus Neteler. This page contains preparatory material for a collective response through the FOSS GIS SDIC, from the POV of people implementing and managing metadata creation, collection and search services, working closely with many different data user communities.

* The response proper will live at [[Response to INSPIRE Metadata Draft]]. Initial notes are included below in the Issues section.
* It is interesting to read this in parallel with the North American Metadata Profile draft which is also currently in consultation. It's hoped the OSGeo community will also be able to contribute to a [[Response to NAP Metadata Draft]] and get the [http://geodatacommons.umaine.edu geodata commons] project involved in this.

== Reading the draft ==

* [http://www.ec-gis.org/inspire/reports/ImplementingRules/draftINSPIREMetadataIRv2_20070202.pdf the Implementing Rules for Metadata Draft] (pdf)
* [http://www.ec-gis.org/inspire/whatsnew.cfm#1590 supporting / background material]
* Pages 1-17 are metadata about the document itself, intentions and history, and can be safely skipped. Pages 43-104 are the Annexes.
* Annex A is particularly interesting as there are details of the thinking exposed in the mapping to ISO19115/39 that aren't set out in the implementing rules. ''If you want to know what's likely to affect you but are short on time, at minimum read section 5 and Annex A''.

== Lightning Summary of the draft ==

The draft establishes a basic information model for metadata which is close to, but not specific to, ISO19115 and OGC Web Services.

It only mandates what metadata is published by and for public authorities covered by INSPIRE - it does not try to cover repository management or internal processes.

It separates out metadata properties into those useful for 'discovery', 'evaluation', and 'use'. It identifies one very high level "use case" for spatial data search services built from metadata being shared at this level.

It divides properties into two levels, 1 and 2, according to whether they are useful for 'non-specialist' or 'expert' users. Level 1 is always mandatory. This *includes* classification according to the data themes in the INSPIRE annexes, and keywords from controlled vocabularies which are not covered by the IR document but are left to Spatial Data Theme Communities. (How these communities are found, selected, and make their decisions, is unknown to us at this time.)

= Issues =

'''This list is an overview of what jumped out at me as something to address. I don't know how much of this is appropriate to send back, or how much can be fixed. - [[User:JoWalsh]]''' I've added some random musings also, again, no idea if they are of use or even valid questions, but at least they are there and can be edited out, or used as the basis of discussions - [[User:IanIbbo]].

== Conceptual overview ==

The model maps quite well to the minimum useful subset identified in [[DCLite4G]]. It looks like a lightweight core. But, the model and the draft break down the problem space of metadata in a way that is a reaction to artificial scarcity of data. It identifies three phases of the metadata use cycle:
* discovery (of what data is out there)
* evaluation (of whether the data will be useful for specific purpose)
* use (once access gained, how to best use the data)

It is illuminating to compare this with the [[Reading the NAP Metadata Draft|North American Profile]] metadata draft, which talks about

* discovery
* access
* fitness for use (i.e. evaluation)
* transfer

So the IRs don't address how to make the data more useful via metadata, and they are vague about whether a minimal subset will provide enough information to evaluate utility. Generally the draft dances around data licensing and access issues, and glosses over the over-engineering needed to work around artificial constraints on availability. IRs for evaluation and use of data based on metadata are not covered by this draft at all, but are left to the Spatial Data Theme Communities for each of the 35 data themes identified in Annexes I-III of the INSPIRE text.

== Issues with specific metadata properties ==

The model maps quite well to [[DCLite4G]]. It looks like a lightweight core.

=== Things that aren't there that should be ===

'''5.2.8 Resource responsible party'''. Each dataset *must* have one or more people/organisations responsible for it. The IR says that this can be free text or can be in a more structured form. This '''only''' includes the responsible party's name, but NOT any form of contact details.

'''Some form of electronic or telephonic contact address should be mandatory, if the org/person's details are mandatory.''' Why publish ownership information - especially if there are constraints on access and reuse of the described data - if you can't immediately get in personal contact with someone who can make assurances about the data?

Annex A on mapping to ISO 19115 mandates that contact persons and organisations be free text, not resource identifiers. There are two serious problems with the ISO 19115 mapping:

* It does not ask or provide for contact details.
* It looks *mandatory* that the responsible party be given a role, which in turn is one of N codes published by the Library of Congress to describe people's roles within organisations.

'''No discussion of formalising dataset accuracy / completeness - crucial for cost-benefit evaluation / evaluation of suitability for combining with other data sets.'''

=== Things that are there that probably shouldn't be ===

Every 'dataset or dataset series' published under INSPIRE *must* include both a '''Resource topic category''' and a set of resource '''keywords'''.

Topic categories are very high-level classifications which correspond to each of the Spatial Data Themes identified in Annexes I, II and III of the INSPIRE Directive.

* Which topic category data fits in will often be a property of an organisation not any published data sets.

From an implementor's POV this will involve something like selecting a topic category for data at install time of metadata publishing engine, and forgetting about it. The IRs place a lot of faith in the ability of simple keyword / classification code matches to enhance utility of search and discovery services for users.

But this already raises the bar for non-expert users (the domain vocabulary is jargon-specific or oriented towards specialist codes).

The IRs emphasise the fact that keywords should originate from a ''controlled vocabulary''. The responsibility for creating one is not in the hands of the Drafting Teams but in the hands of Spatial Data Theme Communities. How these are constituted and how their decisions become binding are unclear.

Again, faith in keywords for search utility is misplaced. Reliance on them may lead to false negatives. Again, this assumes that a non-expert user has familiarity with, or the time and ability to learn about, what to expect in the domain, while an expert will need a better level of detail. Pitfalls of 'controlled' keywording:
* intentional misclassification
* lazy/default misclassification

Both of these are at 'Level 1 for discovery metadata' which implies that any INSPIRE compliant metadata set MUST have both topic category and associated keywords.

== Areas which are unclear ==

=== Conformity ===

This is an IR and obligatory to deal with. But 5.3.4 just says "see Annex F".
Annex F '''in its entirety''' says:

The way in which conformity is expressed in the INSPIRE IR will be defined in a subsequent draft based on discussions with the Drafting Team on Data specifications and harmonization.

(Is this where accuracy/completeness comes in? How can we know?)

=== Dataset series / Aggregate data ===

The IR talks about dataset series. Some of the diagrams talk about 'MD_Aggregates', a term that isn't used elsewhere. There is no conception in this model of one UrDataSet with many different potential sources according to how they are packaged or processed. As the IRs mandate properties for dataset series, we really need more clarity and examples about what they actually are.

== General concerns ==

=== Search / discovery services ===

The preamble (p.7) states that "separate IRs for discovery services are being prepared and are not the subject of this document." But the INSPIRE use case is predicated on the availability of 'Geoportal' style search services. What else *are* discovery services if they are not the search services treated here? If there is only going to be an abstract model for discovery, and these IRs are careful to avoid imposing any constraints on internal data repository management, how much more can a discovery services draft provide?

II: I think this observation is spot on, but perhaps for different reasons. I'm finding it difficult to express concrete concerns, but Section 5.2 "Discovery metadata elements" starts to set out a list of concepts seen to be core to the discovery process (the document hints at this, but does not directly say it). Section 5.3 then sets out an "Abstract discovery metadata element set". I *guess* the implication is that the concepts laid out in 5.2 are in some way even more abstract than those set out in 5.3. The document really isn't clear about what the abstract model is, or what it is for, before it starts enumerating the concepts. Your later comment about being tied to web services is also spot on here: I'm really not sure "Service type version", "Operation name" and "Distributed computing platform" belong in an abstract discovery model (they probably *do* belong in some result record schema). These three attributes seem to belong specifically to a particular (and I would guess already existing) service binding or, as already said, to a very specific kind of returned result record.

What I'd really like to see is a much clearer statement of what the purpose of the abstract discovery model is. Hopefully, once that is tightly defined, it should become easier to decide what lies inside the boundary of the abstract model, and what belongs in the domain of specific realisations of the abstract model. (Actually, I should say that I'm biased by the information retrieval community generally, in that it's considered really important to have a separate abstract model for discovery (the search access points) and then bind that model on to as many backend schemas as needed. This decoupling is seen as best practice in the information retrieval domain, and most of my concerns here arise from the apparent 1:1 mapping between the abstract model and the implementation. The decoupled approach is the one taken in the [http://www.blueangeltech.com/standards/GeoProfile/geo22.htm Z39.50 GEO profile].)
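The decoupling described here can be sketched in a few lines of Python. The access point names, schema names and field paths below are illustrative assumptions, not taken from the draft or from any profile; the point is only that one abstract query is translated per backend.

```python
# Abstract access points bound on to backend schemas.
# All names and paths are illustrative assumptions.
BINDINGS = {
    "iso19115": {
        "title": "identificationInfo/citation/title",
        "keyword": "identificationInfo/descriptiveKeywords/keyword",
    },
    "dublin_core": {
        "title": "dc:title",
        "keyword": "dc:subject",
    },
}

def bind_query(query, schema):
    """Translate {access point: value} into {backend field: value} for one schema."""
    mapping = BINDINGS[schema]
    unknown = sorted(set(query) - set(mapping))
    if unknown:
        # The abstract model has no binding for these in this schema.
        raise KeyError(f"no binding in {schema} for: {', '.join(unknown)}")
    return {mapping[point]: value for point, value in query.items()}
```

Service-specific properties such as an operation name would then live in a particular binding or result schema, not in the list of abstract access points.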

Reading the INSPIRE Metadata Draft

2007-03-14T11:59:04Z

Wiki-Ianibbo: /* Search / discovery services */

Metadata about geographic data is at the heart of INSPIRE. The metadata draft is the first in the set of "implementing rules" and it will underpin all the other implementing rules. The consultation process is open until 2007-03-30. While the documents are open access, comments can only be offered through an SDIC or Spatial Data Interest Community.

The Free and Open Source Geospatial Community has a voice through one of these SDICs thanks to Markus Neteler. This page contains preparatory material for a collective response through the FOSS GIS SDIC, from the POV of people implementing and managing metadata creation, collection and search services, working closely with many different data user communities.

* The response proper will live at [[Response to INSPIRE Metadata Draft]]. Initial notes are included below in the Issues section.
* It is interesting to read this in parallel with the North American Metadata Profile draft which is also currently in consultation. It's hoped the OSGeo community will also be able to contribute to a [[Response to NAP Metadata Draft]] and get the [http://geodatacommons.umaine.edu geodata commons] project involved in this.

== Reading the draft ==

* [http://www.ec-gis.org/inspire/reports/ImplementingRules/draftINSPIREMetadataIRv2_20070202.pdf the Implementing Rules for Metadata Draft] (pdf)
* [http://www.ec-gis.org/inspire/whatsnew.cfm#1590 supporting / background material]
* Pages 1-17 are metadata about the document itself, intentions and history, and can be safely skipped. Pages 43-104 are the Annexes.
* Annex A is particularly interesting as there are details of the thinking exposed in the mapping to ISO19115/39 that aren't set out in the implementing rules. ''If you want to know what's likely to affect you but are short on time, at minimum read section 5 and Annex A'''.

== Lightning Summary of the draft ==

The draft establishes a basic information model for metadata which is close to, but not specific to, ISO19115 and OGC Web Services.

It only mandates what metadata is published by and for public authorities covered by INSPIRE - it does not try to cover repository management or internal processes.

It separates out metadata properties into those useful for 'discovery', 'evaluation', and 'use'. It identifies one very high level "use case" for spatial data search services built from metadata being shared at this level.

It differentiates between properties useful for 'non-specialist' and 'expert' users into 2 Levels, 1 and 2. Level 1 is always mandatory. This *includes* classification according to the data themes in the INSPIRE annexes, and keywords from controlled vocabularies which are not covered by the IR document but are left to Spatial Data Theme Communities. (How these communities are found, selected, and make their decisions, is unknown to us at this time.)

= Issues =

'''This list is an overview of what jumped out at me as something to address. I don't know how much of this is appropriate to send back, or how much can be fixed. - [[User:JoWalsh]]''' I've added some random musings also, again, no idea if they are of use or even valid questions, but at least they are there and can be edited out, or used as the basis of discussions - [[User:IanIbbo]].

== Conceptual overview ==

The model maps quite well to the minimum useful subset identified in [[DCLite4G]]. It looks like a lightweight core. But, the model and the draft break down the problem space of metadata in a way that is a reaction to artificial scarcity of data. It identifies three phases of the metadata use cycle:
* discovery (of what data is out there)
* evaluation (of whether the data will be useful for specific purpose)
* use (once access gained, how to best use the data)

It is illuminates to compare this with the [[Reading the NAP Metadata Draft|North American Profile]] metadata draft which talks about

* discovery
* access
* fitness for use (e.g. evaluation)
* transfer

So the IRs both don't address how to make the data more useful via metadata, and are vague about how much a minimal subset is going to provide enough information to evaluate utility on. Generally the draft dances around data licensing access issues, and glosses over the over-engineering needed to work around artificial constraints on availability. IRs for evaluation and use of data based on metadata are not covered by this draft at all, but left up to the Spatial Data Theme communities for each of the 35 data themes identified in Annexes I-III of the INSPIRE text.

== Issues with specific metadata properties ==

The model maps quite well to [[DCLite4G]]. It looks like a lightweight core.

=== Things that aren't there that should be ===

'''5.2.8 Resource responsible party'''. Each dataset *must* have one or more people/organisations responsible for it. The IR says that this can be freetext or can be in more structured form. This '''only''' includes the responsible party's name, but NOT any form of contact details.

'''Some form of electronic or telephonic contact address should be mandatory, if the org/person's details are mandatory.''' Why publish ownership information - especially if there are constraints on access and reuse of the described data - if you can't immediately get in personal contact with someone who can make assurances about the data?

Annex A on mapping to IS019115 mandates that contact persons and organisations be free text, not resource identifiers. 2 serious problems with the ISO 19115 mapping:

* It does not ask or provide for contact details.
* It looks *mandatory* that the reponsible party be given a role, which in turn is one of N codes published by the Library of Congress to describe people's roles within organisations.

'''No discussion of formalising dataset accuracy / completeness - crucial for cost-benefit evaluation / evaluation of suitability for combining with other data sets.'''

=== Things that are there that probably shouldn't be ===

Every 'dataset or dataset series' published under INSPIRE *must* include both a '''Resource topic category''' and a set of resource '''keywords'''.

Topic categories are very high-level classifications which correspond to each of the Spatial Data Themes identified in Annexes I, II and III of the INSPIRE Directive.

* The topic category that data fits into will often be a property of the publishing organisation, not of any individual published data set.

From an implementor's POV this will involve something like selecting a topic category for data when the metadata publishing engine is installed, and then forgetting about it. The IRs place a lot of faith in the ability of simple keyword / classification code matches to enhance the utility of search and discovery services for users.

But this already raises the bar for non-expert users (the domain vocabulary is jargon-specific or oriented towards specialist codes).

The IRs emphasise the fact that keywords should originate from a ''controlled vocabulary''. The responsibility for creating one is not in the hands of the Drafting Teams but in the hands of Spatial Data Theme Communities. How these are constituted and how their decisions become binding are unclear.

Again, faith in keywords for search utility is misplaced. Reliance on them may lead to false negatives. It again assumes that a non-expert user has familiarity with, or the time and ability to learn about, what to expect in the domain, while an expert will need a better level of detail. Pitfalls of 'controlled' keywording:
* intentional misclassification
* lazy/default misclassification

Both topic category and keywords are at 'Level 1 for discovery metadata', which implies that any INSPIRE-compliant metadata set MUST have both a topic category and associated keywords.

== Areas which are unclear ==

=== Conformity ===

This is an IR and obligatory to deal with. But 5.3.4 just says "see Annex F".
Annex F '''in its entirety''' says:

"The way in which conformity is expressed in the INSPIRE IR will be defined in a subsequent draft based on discussions with the Drafting Team on Data specifications and harmonization."

(Is this where accuracy/completeness comes in? How can we know?)

=== Dataset series / Aggregate data ===

The IR talks about dataset series. Some of the diagrams mention 'MD_Aggregates', a term that isn't used elsewhere. There is no conception in this model of one UrDataSet with many different potential sources according to how they are packaged or processed. As the IRs mandate properties for dataset series, we really need more clarity and examples about what they actually are.

== General concerns ==

=== Search / discovery services ===

The preamble (p.7) states that "separate IRs for discovery services are being prepared and are not the subject of this document." But the INSPIRE use case is predicated on the availability of 'Geoportal' style search services. What else *are* discovery services if they are not the search services treated of here? If there is only going to be an abstract model for discovery, and these IRs are careful to avoid imposing any constraints on internal data repository management, how much more can a discovery services draft provide?

II: I think this observation is spot on, but perhaps for different reasons. I'm finding it difficult to express concrete concerns, but Section 5.2 "Discovery metadata elements" starts to set out a list of concepts seen to be core to the discovery process (the document hints at this, but does not say so directly). Section 5.3 then sets out the "Abstract discovery metadata element set". I *guess* the implication is that the concepts laid out in 5.2 are in some way even more abstract than those set out in 5.3. The document really isn't clear about what the abstract model is, or what it is for, before it starts enumerating the concepts.

Your later comment about being tied to web services is also spot on here. I'm really not sure "Service type version", "Operation name" and "Distributed computing platform" belong in an abstract discovery model (they probably *do* belong in some result record schema). These three attributes seem to belong specifically to a particular (and I would guess already existing) service binding, or, as already said, to a very specific kind of returned result record. What I'd really like to see is a much clearer statement of what the purpose of the abstract discovery model is. Hopefully, once that is tightly defined, it should become easier to decide what lies inside the boundary of the abstract model, and what belongs in the domain of specific realisations of the abstract model.

(Actually, I should say that I'm biased by the information retrieval community generally, where it's considered really important to have a separate abstract model for discovery (the search access points) and then bind that model on to as many backend schemas as needed. This decoupling is seen as best practice in the information retrieval domain, and most of my concerns here arise from the apparent 1:1 mapping between the abstract model and the implementation. The decoupled approach is the one taken in the [http://www.blueangeltech.com/standards/GeoProfile/geo22.htm Z39.50 GEO profile].)
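To make the decoupling argument a little more concrete, here is a minimal sketch of abstract access points bound to more than one backend schema. Everything in it is an assumption for illustration: the access point names, the backend labels and the simplified field paths are hypothetical and are not taken from the draft or from any real profile.

```python
# Abstract access points: the only names a searcher ever sees.
# These names are hypothetical, chosen for illustration only.
ACCESS_POINTS = {"title", "subject", "lineageDescription"}

# Each backend binds the abstract names to its own fields.
# The paths are simplified placeholders, not real schema paths.
BINDINGS = {
    "iso19115_like": {
        "title": "identificationInfo/citation/title",
        "subject": "identificationInfo/descriptiveKeywords/keyword",
        "lineageDescription": "dataQualityInfo/lineage/statement",
    },
    "dublin_core_like": {
        "title": "dc:title",
        "subject": "dc:subject",
        # no lineage field in this backend
    },
}


def bind(access_point, backend):
    """Resolve an abstract access point to a backend field, or None."""
    if access_point not in ACCESS_POINTS:
        raise KeyError(f"unknown access point: {access_point}")
    return BINDINGS[backend].get(access_point)


print(bind("subject", "iso19115_like"))
print(bind("lineageDescription", "dublin_core_like"))
```

The point of the sketch is only that a query written against "subject" never needs to change when a new backend is added; a backend that cannot support an access point simply returns no binding for it.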

I'm a bit confused by the "Temporal Reference" element. Section 5.2.2 talks about what I would expect to see from a temporal reference, but 5.3.2 maps temporal reference on to "One of the dates of publication, last revision or creation of the resource". These three elements are already well defined by Dublin Core attributes. Maybe I've misunderstood what's implied by table 1 in 5.3.2. Also, similar issues to the spatial access point arise (with structured data, as opposed to text queries). In some UK datasets, periods such as "Neolithic" can be used instead of an ISO 19108 Date Time. (I see that note 11 under 5.3.4 talks about this, which is good. What's important is that, regardless of the outcome of the study, the IRs are extensible enough to cope with the eventual decision.) I'd consider separate access points for controlled vocabulary time periods and structured temporal data.

Geographic Extent: the doc seems a bit bounding-box heavy. It would be nice to understand (have examples of) the specification of interior/exterior polygons. Servers only supporting minimal bounding boxes can gracefully degrade (since it's easy to calculate an MBR from a polygon) whilst allowing other servers to retain the full richness of polygons. It's not clear where the semantics for parsing these strings will be defined. For example, should geographic extent be encoded as OpenGIS strings (which seems to make sense to me, but I'm biased by Oracle and MySQL's spatial functions)? This might seem a bit extreme for the abstract part of the document, but it's one of those make-or-break issues for interoperability, and might be worth the pain. Also, I think it's worth entertaining the idea that spatial specifications such as MBRs and polygons (structured spatial constructs) might be better exposed using their own abstract access point, with "Place Name" having its own access point. This will help server implementors avoid problems with disambiguation of search terms.
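As a rough illustration of the "graceful degradation" point, the sketch below derives a minimum bounding rectangle from a polygon given as an exterior ring plus optional interior rings. The function name, the coordinate tuples and the sample square are all made up for the example; nothing here is prescribed by the draft.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def mbr_from_rings(rings: Iterable[Iterable[Point]]) -> BBox:
    """Return the minimum bounding rectangle of a polygon given as rings.

    The first ring is assumed to be the exterior and any others interior
    holes. Holes lie inside the exterior, so they should never widen the
    box, but scanning every ring keeps the result safe for sloppy input.
    """
    xs, ys = [], []
    for ring in rings:
        for x, y in ring:
            xs.append(x)
            ys.append(y)
    if not xs:
        raise ValueError("polygon has no coordinates")
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical extent: a rectangle with a square hole in it.
exterior = [(0.0, 0.0), (10.0, 0.0), (10.0, 8.0), (0.0, 8.0), (0.0, 0.0)]
hole = [(2.0, 2.0), (4.0, 2.0), (4.0, 4.0), (2.0, 4.0), (2.0, 2.0)]
print(mbr_from_rings([exterior, hole]))
```

A server that only indexes bounding boxes could store the result of this calculation, while a richer server keeps the original rings.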

I'm interested in what the expected semantics of resource language are on retrieval of language-neutral data sets. Should a result record not be selected because the user specified "Nor" as the search language, even though it matches other criteria (Geo Extent, for example)? Normally in Info Retrieval this is a no-brainer, of course it shouldn't be selected, but I'm a bit less certain when we talk about result records that aren't primarily "Text" based. (Actually, this is a slightly wider concern about Annex A and those "CharacterString" elements. In IEEE LOM, for example, we have a "LangString" element that has a "Lang" attribute. That community chose to allow language variants of a resource to be expressed within one record by allowing an element to hold all language variants, for example:
<pre>
<title>
<langstring lang="En">Hello</langstring>
<langstring lang="Dk">Hej</langstring>
</title>
</pre>
The presence of a "Lang" attribute at the "Dataset" level might mean the intention is to support multi-language datasets by having several dataset records, one for each language, which is OK, but possibly not optimal for datasets that aren't primarily language based. If this is the case, is the "CharacterString" element in Annex A just redundant payload?)
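For what it's worth, a client handling the one-record, many-variants style might look something like the sketch below. It parses the same shape of element as the example above and picks a variant by language, falling back instead of dropping the record. The function name and the fallback policy are assumptions made for illustration, not anything defined by IEEE LOM or the draft.

```python
import xml.etree.ElementTree as ET

# Same shape as the langstring example above.
RECORD = """
<title>
  <langstring lang="En">Hello</langstring>
  <langstring lang="Dk">Hej</langstring>
</title>
"""


def pick_variant(element_xml, preferred, fallback=None):
    """Pick the text for the preferred language, else a fallback.

    If neither language is present, the first variant is returned so
    that the record can still be displayed (an assumed policy).
    """
    root = ET.fromstring(element_xml.strip())
    variants = {
        ls.get("lang", "").lower(): (ls.text or "")
        for ls in root.findall("langstring")
    }
    for lang in (preferred, fallback):
        if lang and lang.lower() in variants:
            return variants[lang.lower()]
    return next(iter(variants.values()), None)


print(pick_variant(RECORD, "Dk"))
print(pick_variant(RECORD, "Nor", fallback="En"))
```

Whether an unmatched language should exclude the record from the result set at all is exactly the open question raised above; the sketch only shows that the single-record form leaves that decision to the client.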

=== Lack of machine-reusable data in general ===

Dataset 'lineage' is only a full-text field. If datasets result from recombination, that should be machine-traversable. Human descriptions of lineage will be so different that they won't be useful for building search / evaluation services.

II: It does tend to talk about "Lineage statement"... Would making it more along the conceptual lines of
<pre>
<lineage>
<dc:description>Text</dc:description>
</lineage>
</pre>
give you the extensibility to either use private extensions, or to specify recombination elements at a later date? (I didn't think this through in terms of the *actual* recombination operations; I just wanted to show how we might make lineage extensible without specifying it.) For example:
<pre>
<lineage>
<dc:description>This dataset is a recombination of X and Y</dc:description>
<Jo:recombination>
<Jo:source>X-URI</Jo:source>
<Jo:source>Y-URI</Jo:source>
<Jo:Rules>Overlap</Jo:Rules>
</Jo:recombination>
</lineage>
</pre>
Should the lineage search point be called "LineageDescription"? (I think that's what I'll do in my SRW profile.)
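To show what "machine-traversable" could mean in practice, here is a small sketch that reads an extensible lineage element of the kind suggested above. The <code>Jo:</code> namespace URI is invented purely so the fragment parses, and the element names follow the illustrative example rather than any agreed schema.

```python
import xml.etree.ElementTree as ET

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "Jo": "http://example.org/jo-lineage#",  # hypothetical namespace
}

LINEAGE = """
<lineage xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:Jo="http://example.org/jo-lineage#">
  <dc:description>This dataset is a recombination of X and Y</dc:description>
  <Jo:recombination>
    <Jo:source>X-URI</Jo:source>
    <Jo:source>Y-URI</Jo:source>
    <Jo:Rules>Overlap</Jo:Rules>
  </Jo:recombination>
</lineage>
"""


def read_lineage(xml_text):
    """Split lineage into the human text and the machine-readable part."""
    root = ET.fromstring(xml_text.strip())
    return {
        "description": root.findtext("dc:description", default="", namespaces=NS),
        "sources": [s.text for s in root.findall("Jo:recombination/Jo:source", NS)],
        "rule": root.findtext("Jo:recombination/Jo:Rules", default=None, namespaces=NS),
    }


info = read_lineage(LINEAGE)
print(info["sources"])
```

A client that knows nothing about the private extension can still read the description; one that does can follow the source URIs to the parent datasets.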

=== Lack of engagement with packaging and re-use issues ===

Cf. Dataset series / aggregates. The examples have 'MasterMap' as one potential dataset! Real world use cases are going to need subsets of such huge data sets broken down into packages with smaller spatial extents or with fewer layers.

II: Indeed, as well as the SRW/SRU binding experiment, I've been wondering about the OAI binding, which I know you've already discussed elsewhere. What might be generally useful (and maybe this already exists) is a set of TREC-style test data. Setting up a static gateway OAI server wouldn't be so hard, and might give us some valuable real-world information about this problem.

=== Bypassing of feature-level metadata from consideration ===

Once we get down to the feature level the interesting European problems appear - the fact that every local area may have its own classification schemes, even inside one language community the same word is used to describe different looking things, and across language barriers mappings from words to things don't tend to be 1-1. But by disregarding feature-level metadata - partly because it can't be mandated when the underlying geospatial objects aren't publicly inspectable and a certain amount of feature level metadata would mean the data itself is essentially public...

II: Aye, generally for discovery services it's nice to try and avoid mandating that users understand predefined controlled vocabularies, whilst allowing users who do know the terms to qualify their discovery process. For example, in CQL I'd be tempted to allow a user to say "dc.subject=Something", or (the equivalent of) "authority=19115:2003 and dc.subject=Something" for users who know a specific term.

There's quite a lot of work going on around Europe at the moment covering crosswalks of controlled vocabularies (mostly I know about crosswalking European educational levels, but it seems to be the same problem cast in a different way). If we can arrange for someone to do the intellectual work of cross-mapping, and make the data publicly available, then it becomes a "turning-the-handle" job for providers to support cross-vocab retrieval. Standards such as ZThes are being used quite a lot in the learning domain to transport this data around. The only effect on reviewing the IR is that it's important that the IR does not preclude this at a future date. (The whole design-for-unforeseen-use thing. Specifically, I think mandating a specific vocab in the IR might not be the right thing to do, and giving users a way to say which vocab they are using in the description and discovery process is a better way to go.)
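A sketch of how the two ideas could fit together: build the CQL-style subject clause with or without a named vocabulary, then use a published crosswalk to widen the query to equivalent terms in other vocabularies. The <code>authority</code> index follows the hypothetical query above, and the vocabulary names and crosswalk entries are invented for the example.

```python
def _quote(value):
    """Quote a term, escaping any embedded double quotes."""
    return '"' + value.replace('"', '\\"') + '"'


def subject_query(term, authority=None):
    """Build a subject clause, optionally qualified by a vocabulary."""
    clause = "dc.subject=" + _quote(term)
    if authority:
        return "authority=" + _quote(authority) + " and " + clause
    return clause


# Hypothetical crosswalk: (vocabulary, term) -> equivalents elsewhere.
CROSSWALK = {
    ("vocab-a", "river"): [("vocab-b", "watercourse")],
}


def cross_vocab_query(term, authority):
    """Expand a qualified term through the crosswalk into one query."""
    clauses = [subject_query(term, authority)]
    for other_authority, other_term in CROSSWALK.get((authority, term), []):
        clauses.append(subject_query(other_term, other_authority))
    return " or ".join("(" + c + ")" for c in clauses)


print(subject_query("Something"))
print(cross_vocab_query("river", "vocab-a"))
```

Users who don't know any vocabulary get the plain clause; the expansion step is the "turning-the-handle" part that only needs the crosswalk data to be available.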

=== Overspecificness about internet- and webservices- based distribution models ===

We are actually causing ourselves unnecessary problems by putting everything on the Internet. Data sharing agreements over publicly maintained private networks with flat-rate membership are a clear potential future and 'middle way' in this domain. The draft as it stands is all about making access/use constraints *specific to data sets* and not specific to the relationship between the data provider or broker, the data user and the transport network between them.

So we have a 'distributed computing platform' metadata property that is required by the IRs. In the ISO 19115 mapping in Annex A this is a '''free text field''', yet 5.2.15 states that the property "is necessary for a client to bind to the service". If it must be mandated, it should be as a URI. It would be wonderful to have examples of what other than HTTP or OGC web services is envisaged NOW as a means of access to the backend of a distributed computing platform.
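The difference between free text and a URI is easy to show from the client side. The sketch below is one assumed way a client might decide whether a value is something it could bind to; the function name and the sample values are illustrative, and the check is deliberately narrow (it only accepts URIs with a scheme and a network location).

```python
from urllib.parse import urlparse


def is_bindable_uri(value):
    """Return True if the value parses as a URI with scheme and host."""
    parts = urlparse(value.strip())
    return bool(parts.scheme and parts.netloc)


# Hypothetical metadata values for the same property.
print(is_bindable_uri("http://example.org/services/wms"))
print(is_bindable_uri("OGC web services over HTTP"))
```

A free-text description may be perfectly readable to a person, but a client has nothing to connect to unless the value has this kind of structure.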

Reading the INSPIRE Metadata Draft

2007-03-14T11:58:41Z

Wiki-Ianibbo: /* Search / discovery services */


Reading the INSPIRE Metadata Draft

2007-03-14T11:47:32Z

Wiki-Ianibbo: /* Issues */

Metadata about geographic data is at the heart of INSPIRE. The metadata draft is the first in the set of "implementing rules" and it will underpin all the other implementing rules. The consultation process is open until 2007-03-30. While the documents are open access, comments can only be offered through an SDIC or Spatial Data Interest Community.

The Free and Open Source Geospatial Community has a voice through one of these SDICs thanks to Markus Neteler. This page contains preparatory material for a collective response through the FOSS GIS SDIC, from the POV of people implementing and managing metadata creation, collection and search services, working closely with many different data user communities.

* The response proper will live at [[Response to INSPIRE Metadata Draft]]. Initial notes are included below in the Issues section.
* It is interesting to read this in parallel with the North American Metadata Profile draft which is also currently in consultation. It's hoped the OSGeo community will also be able to contribute to a [[Response to NAP Metadata Draft]] and get the [http://geodatacommons.umaine.edu geodata commons] project involved in this.

== Reading the draft ==

* [http://www.ec-gis.org/inspire/reports/ImplementingRules/draftINSPIREMetadataIRv2_20070202.pdf the Implementing Rules for Metadata Draft] (pdf)
* [http://www.ec-gis.org/inspire/whatsnew.cfm#1590 supporting / background material]
* Pages 1-17 are metadata about the document itself, intentions and history, and can be safely skipped. Pages 43-104 are the Annexes.
* Annex A is particularly interesting as there are details of the thinking exposed in the mapping to ISO19115/39 that aren't set out in the implementing rules. ''If you want to know what's likely to affect you but are short on time, at minimum read section 5 and Annex A''.

== Lightning Summary of the draft ==

The draft establishes a basic information model for metadata which is close to, but not specific to, ISO19115 and OGC Web Services.

It only mandates what metadata is published by and for public authorities covered by INSPIRE - it does not try to cover repository management or internal processes.

It separates out metadata properties into those useful for 'discovery', 'evaluation', and 'use'. It identifies one very high level "use case" for spatial data search services built from metadata being shared at this level.

It differentiates between properties useful for 'non-specialist' and 'expert' users into 2 Levels, 1 and 2. Level 1 is always mandatory. This *includes* classification according to the data themes in the INSPIRE annexes, and keywords from controlled vocabularies which are not covered by the IR document but are left to Spatial Data Theme Communities. (How these communities are found, selected, and make their decisions, is unknown to us at this time.)

= Issues =

'''This list is an overview of what jumped out at me as something to address. I don't know how much of this is appropriate to send back, or how much can be fixed. - [[User:JoWalsh]]''' I've added some random musings also, again, no idea if they are of use or even valid questions, but at least they are there and can be edited out, or used as the basis of discussions - [[User:IanIbbo]].

== Conceptual overview ==

The model maps quite well to the minimum useful subset identified in [[DCLite4G]]. It looks like a lightweight core. But, the model and the draft break down the problem space of metadata in a way that is a reaction to artificial scarcity of data. It identifies three phases of the metadata use cycle:
* discovery (of what data is out there)
* evaluation (of whether the data will be useful for specific purpose)
* use (once access gained, how to best use the data)

It is illuminating to compare this with the [[Reading the NAP Metadata Draft|North American Profile]] metadata draft, which talks about:

* discovery
* access
* fitness for use (i.e. evaluation)
* transfer

So the IRs both don't address how to make the data more useful via metadata, and are vague about how much a minimal subset is going to provide enough information to evaluate utility on. Generally the draft dances around data licensing access issues, and glosses over the over-engineering needed to work around artificial constraints on availability. IRs for evaluation and use of data based on metadata are not covered by this draft at all, but left up to the Spatial Data Theme communities for each of the 35 data themes identified in Annexes I-III of the INSPIRE text.

== Issues with specific metadata properties ==

The model maps quite well to [[DCLite4G]]. It looks like a lightweight core.

=== Things that aren't there that should be ===

'''5.2.8 Resource responsible party'''. Each dataset *must* have one or more people/organisations responsible for it. The IR says that this can be freetext or can be in more structured form. This '''only''' includes the responsible party's name, but NOT any form of contact details.

'''Some form of electronic or telephonic contact address should be mandatory, if the org/person's details are mandatory.''' Why publish ownership information - especially if there are constraints on access and reuse of the described data - if you can't immediately get in personal contact with someone who can make assurances about the data?

Annex A on mapping to ISO 19115 mandates that contact persons and organisations be free text, not resource identifiers. There are two serious problems with the ISO 19115 mapping:

* It does not ask or provide for contact details.
* It looks *mandatory* that the responsible party be given a role, which in turn is one of N codes published by the Library of Congress to describe people's roles within organisations.

'''No discussion of formalising dataset accuracy / completeness - crucial for cost-benefit evaluation / evaluation of suitability for combining with other data sets.'''

=== Things that are there that probably shouldn't be ===

Every 'dataset or dataset series' published under INSPIRE *must* include both a '''Resource topic category''' and a set of resource '''keywords'''.

Topic categories are very high-level classifications which correspond to each of the Spatial Data Themes identified in Annexes I, II and III of the INSPIRE Directive.

* Which topic category data fits in will often be a property of an organisation not any published data sets.

From an implementor's POV this will involve something like selecting a topic category for data at install time of metadata publishing engine, and forgetting about it. The IRs place a lot of faith in the ability of simple keyword / classification code matches to enhance utility of search and discovery services for users.

But this already raises the bar for non-expert users (the domain vocabulary is jargon-specific or oriented towards specialist codes).

The IRs emphasise the fact that keywords should originate from a ''controlled vocabulary''. The responsibility for creating one is not in the hands of the Drafting Teams but in the hands of Spatial Data Theme Communities. How these are constituted and how their decisions become binding are unclear.

Again, faith in keywords for search utility is misplaced. Reliance on them may lead to false negatives. It again assumes that a non-expert user has familiarity with, or the time and ability to learn about, what to expect in the domain, while an expert will need a better level of detail. Pitfalls of 'controlled' keywording:
* intentional misclassification
* lazy/default misclassification

Both of these are at 'Level 1 for discovery metadata' which implies that any INSPIRE compliant metadata set MUST have both topic category and associated keywords.

== Areas which are unclear ==

=== Conformity ===

This is an IR and obligatory to deal with. But 5.3.4 just says "see Annex F".
Annex F '''in its entirety''' says:

The way in which conformity is expressed in the INSPIRE IR will be defined in a subsequent draft based on discussions with the Drafting Team on Data specifications and harmonization.

(Is this where accuracy/completeness comes in? How can we know?)

=== Dataset series / Aggregate data ===

The IR talks about dataset series. Some of the diagrams talk about 'MD_Aggregates'; this term isn't used elsewhere. There is no conception in this model of one UrDataSet with many different potential sources according to how they are packaged or processed. As the IRs mandate properties for dataset series, we really need more clarity and examples about what they actually are.

== General concerns ==

=== Search / discovery services ===

The preamble (p.7) states that "separate IRs for discovery services are being prepared and are not the subject of this document." But the INSPIRE use case is predicated on the availability of 'Geoportal' style search services. What else *are* discovery services if they are not the search services treated here? If there is only going to be an abstract model for discovery, and these IRs are careful to avoid imposing any constraints on internal data repository management, how much more can a discovery services draft provide?

II: I think this comment is spot on, I'm finding it difficult to express concrete concerns.. but Section 5.2 "Discovery metadata elements" starts to set out a list of concepts seen to be (The document hints at, but does not directly say) core to the discovery process. Section 5.3 then sets out "Abstract discovery metadata element set". I *guess* the implication is that the concepts laid out in 5.2 are in some way even more abstract than those set out in 5.3. The document really isn't clear about what the abstract model is, or what it is for, before it starts enumerating the concepts. Your later comment about being tied to web services is spot on also here, I'm really not sure "Service type version", "Operation name" and "Distributed computing platform" belong in an abstract model. These three attributes seem to belong specifically to a particular (And I would guess already existing) service binding (Or to a very specific kind of returned result record) of the abstract model onto some concrete semantics. What I'd really like to see is a much clearer statement of what the purpose of the abstract model is. Hopefully, once that is tightly defined, it should become easier to decide what lies inside the boundary of the abstract model, and what belongs in the domain of specific realisations of the abstract model.

I'm a bit confused by the "Temporal Reference" element... 5.2.2 talks about what I would expect to see from a temporal reference, but 5.3.2 maps temporal reference on to "One of the dates of publication, last revision or creation of the resource". These three elements are already well defined by Dublin Core attributes... Maybe I've misunderstood what's implied by table 1 in 5.3.2. Also, similar issues to the spatial access point arise (with structured data, as opposed to text queries). In some UK datasets, periods such as "Neolithic" can be used instead of an ISO 19108 Date Time. (I see note 11 under 5.3.4 talks about this, which is good. What's important is that regardless of the outcome of the study, the IR are extensible enough to cope with the eventual decision.) I'd consider separate access points for controlled vocabulary time period and structured temporal data.

Geographic Extent.. the doc seems a bit bounding box heavy. It would be nice to understand (have examples of) specification of interior/exterior polygons. Servers only supporting minimal bounding boxes can gracefully degrade (since it's easy to calculate an MBR from a polygon) whilst allowing other servers to retain the full richness of polygons. It's not clear where the semantics for parsing these strings will be defined.. for example should geographic extent be encoded as OpenGIS strings (which seems to make sense to me, but I'm biased by Oracle and MySQL's spatial functions)? This might seem a bit extreme for the abstract part of the document, but it's one of those make-or-break issues for interoperability, and might be worth the pain. Also, I think it's worth entertaining the idea that spatial specifications such as MBRs and polygons (structured spatial constructs) might be better exposed using their own abstract access point, with "Place Name" having its own access point. This will help server implementors avoid problems with disambiguation of search terms.
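As a rough illustration of the graceful degradation mentioned above, here is a minimal Python sketch that derives a minimum bounding rectangle from a polygon written as an OpenGIS well-known text (WKT) string. The function name and the small regex-based parser are assumptions made for this example; it only handles simple 2D <code>POLYGON</code> strings with plain decimal coordinates and is not a general WKT parser.

```python
import re

def polygon_wkt_to_mbr(wkt):
    """Return (min_x, min_y, max_x, max_y) for a simple WKT POLYGON string.

    Hypothetical helper: only 2D 'POLYGON ((x y, ...), (...))' input is handled.
    """
    if not wkt.strip().upper().startswith("POLYGON"):
        raise ValueError("only POLYGON is supported in this sketch")
    # Collect every 'x y' pair, from exterior and interior rings alike.
    pairs = re.findall(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)", wkt)
    if not pairs:
        raise ValueError("no coordinates found")
    xs = [float(x) for x, _ in pairs]
    ys = [float(y) for _, y in pairs]
    return (min(xs), min(ys), max(xs), max(ys))

# One exterior ring and one interior ring (a hole).
extent = "POLYGON ((0 0, 10 0, 10 5, 0 5, 0 0), (2 1, 4 1, 4 3, 2 3, 2 1))"
print(polygon_wkt_to_mbr(extent))
```

A server that only indexes bounding boxes could store the four numbers, while a richer server keeps the original polygon string alongside them.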

I'm interested in what the expected semantics of resource language are on retrieval of language-neutral data sets.... Should a result record not be selected because the user specified "Nor" as the search language, when resources matching other criteria (Geo Extent for example) do match? Normally in Info Retrieval this is a no-brainer, of course it shouldn't, but I'm a bit less certain when we talk about result records that aren't primarily "Text" based. (Actually, this is a slightly wider concern about Annex A and those "CharacterString" elements... In IEEE LOM for example we have a "LangString" element that has a "Lang" attribute. That community chose to allow language variants of a resource to be expressed within one record by allowing an element to hold all language variants, for example
<pre>
<title>
<langstring lang="En">Hello</langstring>
<langstring lang="Dk">Hej</langstring>
</title>
</pre>
The presence of a "Lang" attribute at the "Dataset" level might mean the intention is to support multi-language datasets by having several dataset records, one for each language, which is OK, but possibly not optimal for datasets that aren't primarily language based. If this is the case, is the "CharacterString" element in Annex A just redundant payload?)
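To make the single-record, multi-language idea concrete, here is a small Python sketch that picks one language variant out of a title shaped like the snippet above. The function name, the case-insensitive matching and the fallback behaviour are assumptions for illustration; they are not taken from IEEE LOM or from the IR.

```python
import xml.etree.ElementTree as ET

RECORD = """
<title>
  <langstring lang="En">Hello</langstring>
  <langstring lang="Dk">Hej</langstring>
</title>
"""

def pick_language(title_xml, wanted, fallback=None):
    """Return the text of the langstring matching `wanted` (case-insensitive).

    If there is no match, try `fallback`; return None when nothing matches.
    """
    title = ET.fromstring(title_xml)
    variants = {
        ls.get("lang", "").lower(): (ls.text or "")
        for ls in title.findall("langstring")
    }
    if wanted.lower() in variants:
        return variants[wanted.lower()]
    if fallback is not None:
        return variants.get(fallback.lower())
    return None

print(pick_language(RECORD, "Dk"))
print(pick_language(RECORD, "Nor", fallback="En"))
```

With this shape a search for "Nor" need not exclude the record: the client can fall back to another variant instead of treating language as a hard filter.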

=== Lack of machine-reusable data in general ===

Dataset 'lineage' is only a full-text field. If datasets result from recombination, that should be machine-traversable. Human descriptions of lineage will be so different that they won't be useful for building search / evaluation services.

II: It does tend to talk about "Lineage statement"... would making it (More along the conceptual lines of)
<pre>
<lineage>
<dc:description>Text</dc:description>
</lineage>
</pre>
give you the extensibility to either use private extensions, or to specify recombination elements at a later date? (I didn't think this through in terms of the *actual* recombination operations, just wanted to show how we might make lineage extensible without specifying it.)
<pre>
<lineage>
<dc:description>This dataset is a recombination of X and Y</dc:description>
<Jo:recombination>
<Jo:source>X-URI</Jo:source>
<Jo:source>Y-URI</Jo:source>
<Jo:Rules>Overlap</Jo:Rules>
</Jo:recombination>
</lineage>
</pre>
Should the lineage search point be called "LineageDescription"? (I think that's what I'll do in my SRW profile.)
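As a sketch of what "machine-traversable" lineage could look like to a consumer, the following Python reads an extensible lineage element like the one above and pulls out the source identifiers. The <code>Jo</code> namespace URI is a made-up placeholder, and the element names simply follow the illustrative snippet; none of this is defined by the IR.

```python
import xml.etree.ElementTree as ET

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "Jo": "http://example.org/jo-lineage#",  # hypothetical private namespace
}

LINEAGE = """
<lineage xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:Jo="http://example.org/jo-lineage#">
  <dc:description>This dataset is a recombination of X and Y</dc:description>
  <Jo:recombination>
    <Jo:source>X-URI</Jo:source>
    <Jo:source>Y-URI</Jo:source>
    <Jo:Rules>Overlap</Jo:Rules>
  </Jo:recombination>
</lineage>
"""

def read_lineage(xml_text):
    """Return the free-text description plus any structured recombination data."""
    root = ET.fromstring(xml_text)
    description = root.findtext("dc:description", default="", namespaces=NS)
    sources = [s.text for s in root.findall("Jo:recombination/Jo:source", NS)]
    rules = root.findtext("Jo:recombination/Jo:Rules", default=None, namespaces=NS)
    return {"description": description, "sources": sources, "rules": rules}

print(read_lineage(LINEAGE))
```

A client that does not know the private extension still gets the description; one that does can follow the source identifiers to the parent datasets.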

=== Lack of engagement with packaging and re-use issues ===

Cf. Dataset series / aggregates. The examples have 'MasterMap' as one potential dataset! Real world use cases are going to need subsets of such huge data sets broken down into packages with smaller spatial extents or with fewer layers.

II: Indeed, as well as the srw/sru binding experiment, I've been wondering about the OAI binding, which I know you've already discussed elsewhere. What might be generally useful (And maybe this already exists) is a set of TREC style test data. Setting up a static gateway OAI server wouldn't be so hard, and might give us some valuable real-world information about this problem.

=== Bypassing of feature-level metadata from consideration ===

Once we get down to the feature level the interesting European problems appear - the fact that every local area may have its own classification schemes, even inside one language community the same word is used to describe different looking things, and across language barriers mappings from words to things don't tend to be 1-1. But by disregarding feature-level metadata - partly because it can't be mandated when the underlying geospatial objects aren't publicly inspectable and a certain amount of feature level metadata would mean the data itself is essentially public...

II: Aye, generally for discovery services it's nice to try and avoid mandating that users understand predefined controlled vocabularies, whilst allowing users who do know terms to qualify their discovery process, for example, in CQL I'd be tempted to allow a user to say "dc.subject=Something" or (The equivalent of) "authority=19115:2003 and dc.subject=Something" for users who know a specific term.
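The two query forms suggested above could be generated by something like the following Python sketch. The <code>authority</code> index is the hypothetical qualifier from the comment, not an index defined by any existing CQL context set, and the function name is invented for this example.

```python
def build_cql(term, index="dc.subject", authority=None):
    """Build a CQL search clause, optionally qualified by a vocabulary authority.

    'authority' is a hypothetical index name taken from the discussion above.
    """
    def quote(value):
        # CQL quoted strings use backslash to escape embedded double quotes.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    clause = f"{index}={quote(term)}"
    if authority:
        clause = f"authority={quote(authority)} and {clause}"
    return clause

print(build_cql("Something"))
print(build_cql("Something", authority="19115:2003"))
```

Users who don't know any controlled vocabulary get the first form; those who know a specific term can narrow the search with the second.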

There's quite a lot of work going on around Europe at the moment covering crosswalks of controlled vocabularies (mostly I know about crosswalking European educational levels, but it seems to be the same problem cast in a different way). If we can arrange for someone to do the intellectual work of cross-mapping, and make the data publicly available, then it becomes a "turning-the-handle" job for providers to support cross-vocabulary retrieval. Standards such as ZThes are being used quite a lot in the learning domain to transport this data around. The only effect on reviewing the IR is that it's important that the IR does not preclude this at a future date. (The whole design-for-unforeseen-use thing: specifically, I think mandating a specific vocabulary in the IR might not be the right thing to do, and giving users a way to say which vocabulary they are using in the description and discovery process is a better way to go.)
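The "turning-the-handle" step might look roughly like this in Python, assuming someone has already published the cross-mapping as pairs of equivalent terms. The vocabulary names and terms below are invented placeholders, and the sketch only follows direct mappings rather than chains of them.

```python
from collections import defaultdict

# Hypothetical cross-mapping data: each row says two (vocabulary, term)
# pairs are considered equivalent by whoever did the intellectual work.
MAPPINGS = [
    (("vocab-a", "river"), ("vocab-b", "watercourse")),
    (("vocab-a", "river"), ("vocab-c", "fleuve")),
]

def build_crosswalk(mappings):
    """Index the mapping rows in both directions."""
    table = defaultdict(set)
    for left, right in mappings:
        table[left].add(right)
        table[right].add(left)
    return table

def expand_query(vocab, term, table):
    """Return the original (vocab, term) plus every directly mapped equivalent."""
    key = (vocab, term)
    return {key} | table.get(key, set())

crosswalk = build_crosswalk(MAPPINGS)
print(sorted(expand_query("vocab-a", "river", crosswalk)))
```

Because each term carries its vocabulary name, nothing here depends on the IR mandating a single vocabulary; it only needs records and queries to say which one they use.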

=== Overspecificness about internet- and webservices- based distribution models ===

We are actually causing ourselves unnecessary problems by putting everything on the Internet. Data sharing agreements over publicly maintained private networks with flat-rate membership are a clear potential future and 'middle way' in this domain. The draft as it stands is all about making access/use constraints *specific to data sets* and not specific to the relationship between the data provider or broker, the data user and the transport network between them.

So we have a 'distributed computing platform' metadata property that is required by the IRs. In the ISO 19115 mapping in Annex A this is a '''free text field''', yet 5.2.15 states that the property "is necessary for a client to bind to the service". If it must be mandated, it should be as a URI. It would be wonderful to have examples of what other than HTTP or OGC web services is envisaged NOW as a means of access to the backend of a distributed computing platform.
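To show why a URI is easier for a client to bind to than free text, here is a minimal Python check a harvester might run on the property value. It is only a sketch: the function name is invented, the example URL is a placeholder, and requiring both a scheme and a network location is an assumption that deliberately rejects URNs and other non-locatable identifiers.

```python
from urllib.parse import urlparse

def is_bindable_uri(value):
    """Return True if `value` has a scheme and a network location.

    Assumption for this sketch: a client can only bind to something it can
    locate, so free text and scheme-only identifiers are rejected.
    """
    parts = urlparse(value.strip())
    return bool(parts.scheme) and bool(parts.netloc)

print(is_bindable_uri("http://example.org/wms?service=WMS"))
print(is_bindable_uri("HTTP web service"))
```

A free text value such as the second one tells a human something but gives a client nothing to connect to.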

Reading the INSPIRE Metadata Draft

2007-03-14T11:47:07Z

Wiki-Ianibbo: /* Issues */

Metadata about geographic data is at the heart of INSPIRE. The metadata draft is the first in the set of "implementing rules" and it will underpin all the other implementing rules. The consultation process is open until 2007-03-30. While the documents are open access, comments can only be offered through an SDIC or Spatial Data Interest Community.

The Free and Open Source Geospatial Community has a voice through one of these SDICs thanks to Markus Neteler. This page contains preparatory material for a collective response through the FOSS GIS SDIC, from the POV of people implementing and managing metadata creation, collection and search services, working closely with many different data user communities.

* The response proper will live at [[Response to INSPIRE Metadata Draft]]. Initial notes are included below in the Issues section.
* It is interesting to read this in parallel with the North American Metadata Profile draft which is also currently in consultation. It's hoped the OSGeo community will also be able to contribute to a [[Response to NAP Metadata Draft]] and get the [http://geodatacommons.umaine.edu geodata commons] project involved in this.

== Reading the draft ==

* [http://www.ec-gis.org/inspire/reports/ImplementingRules/draftINSPIREMetadataIRv2_20070202.pdf the Implementing Rules for Metadata Draft] (pdf)
* [http://www.ec-gis.org/inspire/whatsnew.cfm#1590 supporting / background material]
* Pages 1-17 are metadata about the document itself, intentions and history, and can be safely skipped. Pages 43-104 are the Annexes.
* Annex A is particularly interesting as there are details of the thinking exposed in the mapping to ISO19115/39 that aren't set out in the implementing rules. ''If you want to know what's likely to affect you but are short on time, at minimum read section 5 and Annex A''.

== Lightning Summary of the draft ==

The draft establishes a basic information model for metadata which is close to, but not specific to, ISO19115 and OGC Web Services.

It only mandates what metadata is published by and for public authorities covered by INSPIRE - it does not try to cover repository management or internal processes.

It separates out metadata properties into those useful for 'discovery', 'evaluation', and 'use'. It identifies one very high level "use case" for spatial data search services built from metadata being shared at this level.

It differentiates between properties useful for 'non-specialist' and 'expert' users into 2 Levels, 1 and 2. Level 1 is always mandatory. This *includes* classification according to the data themes in the INSPIRE annexes, and keywords from controlled vocabularies which are not covered by the IR document but are left to Spatial Data Theme Communities. (How these communities are found, selected, and make their decisions, is unknown to us at this time.)

= Issues =

'''This list is an overview of what jumped out at me as something to address. I don't know how much of this is appropriate to send back, or how much can be fixed. - [[User:JoWalsh]]''' I've added some of my own random musings also, again, no idea if they are of use or even valid questions, but at least they are there and can be edited out, or used as the basis of discussions - [[User:IanIbbo]].

== Conceptual overview ==

The model maps quite well to the minimum useful subset identified in [[DCLite4G]]. It looks like a lightweight core. But, the model and the draft break down the problem space of metadata in a way that is a reaction to artificial scarcity of data. It identifies three phases of the metadata use cycle:
* discovery (of what data is out there)
* evaluation (of whether the data will be useful for specific purpose)
* use (once access gained, how to best use the data)

It is illuminating to compare this with the [[Reading the NAP Metadata Draft|North American Profile]] metadata draft, which talks about:

* discovery
* access
* fitness for use (i.e. evaluation)
* transfer

So the IRs both don't address how to make the data more useful via metadata, and are vague about how much a minimal subset is going to provide enough information to evaluate utility on. Generally the draft dances around data licensing access issues, and glosses over the over-engineering needed to work around artificial constraints on availability. IRs for evaluation and use of data based on metadata are not covered by this draft at all, but left up to the Spatial Data Theme communities for each of the 35 data themes identified in Annexes I-III of the INSPIRE text.

== Issues with specific metadata properties ==

The model maps quite well to [[DCLite4G]]. It looks like a lightweight core.

=== Things that aren't there that should be ===

'''5.2.8 Resource responsible party'''. Each dataset *must* have one or more people/organisations responsible for it. The IR says that this can be freetext or can be in more structured form. This '''only''' includes the responsible party's name, but NOT any form of contact details.

'''Some form of electronic or telephonic contact address should be mandatory, if the org/person's details are mandatory.''' Why publish ownership information - especially if there are constraints on access and reuse of the described data - if you can't immediately get in personal contact with someone who can make assurances about the data?

Annex A on mapping to ISO 19115 mandates that contact persons and organisations be free text, not resource identifiers. There are two serious problems with the ISO 19115 mapping:

* It does not ask or provide for contact details.
* It looks *mandatory* that the responsible party be given a role, which in turn is one of N codes published by the Library of Congress to describe people's roles within organisations.

'''No discussion of formalising dataset accuracy / completeness - crucial for cost-benefit evaluation / evaluation of suitability for combining with other data sets.'''

=== Things that are there that probably shouldn't be ===

Every 'dataset or dataset series' published under INSPIRE *must* include both a '''Resource topic category''' and a set of resource '''keywords'''.

Topic categories are very high-level classifications which correspond to each of the Spatial Data Themes identified in Annexes I, II and III of the INSPIRE Directive.

* Which topic category data fits in will often be a property of an organisation not any published data sets.

From an implementor's POV this will involve something like selecting a topic category for data at install time of metadata publishing engine, and forgetting about it. The IRs place a lot of faith in the ability of simple keyword / classification code matches to enhance utility of search and discovery services for users.

But this already raises the bar for non-expert users (the domain vocabulary is jargon-specific or oriented towards specialist codes).

The IRs emphasise the fact that keywords should originate from a ''controlled vocabulary''. The responsibility for creating one is not in the hands of the Drafting Teams but in the hands of Spatial Data Theme Communities. How these are constituted and how their decisions become binding are unclear.

Again, faith in keywords for search utility is misplaced. Reliance on them may lead to false negatives. It again assumes that a non-expert user has familiarity with, or the time and ability to learn about, what to expect in the domain, while an expert will need a better level of detail. Pitfalls of 'controlled' keywording:
* intentional misclassification
* lazy/default misclassification

Both of these are at 'Level 1 for discovery metadata' which implies that any INSPIRE compliant metadata set MUST have both topic category and associated keywords.

== Areas which are unclear ==

=== Conformity ===

This is an IR and obligatory to deal with. But 5.3.4 just says "see Annex F".
Annex F '''in its entirety''' says:

The way in which conformity is expressed in the INSPIRE IR will be defined in a subsequent draft based on discussions with the Drafting Team on Data specifications and harmonization.

(Is this where accuracy/completeness comes in? How can we know?)

=== Dataset series / Aggregate data ===

The IR talks about dataset series. Some of the diagrams talk about 'MD_Aggregates'; this term isn't used elsewhere. There is no conception in this model of one UrDataSet with many different potential sources according to how they are packaged or processed. As the IRs mandate properties for dataset series, we really need more clarity and examples about what they actually are.

== General concerns ==

=== Search / discovery services ===

The preamble (p.7) states that "separate IRs for discovery services are being prepared and are not the subject of this document." But the INSPIRE use case is predicated on the availability of 'Geoportal' style search services. What else *are* discovery services if they are not the search services treated here? If there is only going to be an abstract model for discovery, and these IRs are careful to avoid imposing any constraints on internal data repository management, how much more can a discovery services draft provide?

II: I think this comment is spot on, I'm finding it difficult to express concrete concerns.. but Section 5.2 "Discovery metadata elements" starts to set out a list of concepts seen to be (The document hints at, but does not directly say) core to the discovery process. Section 5.3 then sets out "Abstract discovery metadata element set". I *guess* the implication is that the concepts laid out in 5.2 are in some way even more abstract than those set out in 5.3. The document really isn't clear about what the abstract model is, or what it is for, before it starts enumerating the concepts. Your later comment about being tied to web services is spot on also here, I'm really not sure "Service type version", "Operation name" and "Distributed computing platform" belong in an abstract model. These three attributes seem to belong specifically to a particular (And I would guess already existing) service binding (Or to a very specific kind of returned result record) of the abstract model onto some concrete semantics. What I'd really like to see is a much clearer statement of what the purpose of the abstract model is. Hopefully, once that is tightly defined, it should become easier to decide what lies inside the boundary of the abstract model, and what belongs in the domain of specific realisations of the abstract model.

I'm a bit confused by the "Temporal Reference" element... 5.2.2 talks about what I would expect to see from a temporal reference, but 5.3.2 maps temporal reference on to "One of the dates of publication, last revision or creation of the resource". These three elements are already well defined by Dublin Core attributes... Maybe I've misunderstood what's implied by table 1 in 5.3.2. Also, similar issues to the spatial access point arise (with structured data, as opposed to text queries). In some UK datasets, periods such as "Neolithic" can be used instead of an ISO 19108 Date Time. (I see note 11 under 5.3.4 talks about this, which is good. What's important is that regardless of the outcome of the study, the IR are extensible enough to cope with the eventual decision.) I'd consider separate access points for controlled vocabulary time period and structured temporal data.

Geographic Extent.. the doc seems a bit bounding box heavy. It would be nice to understand (have examples of) specification of interior/exterior polygons. Servers only supporting minimal bounding boxes can gracefully degrade (since it's easy to calculate an MBR from a polygon) whilst allowing other servers to retain the full richness of polygons. It's not clear where the semantics for parsing these strings will be defined.. for example should geographic extent be encoded as OpenGIS strings (which seems to make sense to me, but I'm biased by Oracle and MySQL's spatial functions)? This might seem a bit extreme for the abstract part of the document, but it's one of those make-or-break issues for interoperability, and might be worth the pain. Also, I think it's worth entertaining the idea that spatial specifications such as MBRs and polygons (structured spatial constructs) might be better exposed using their own abstract access point, with "Place Name" having its own access point. This will help server implementors avoid problems with disambiguation of search terms.

I'm interested in what the expected semantics of resource language are on retrieval of language-neutral data sets.... Should a result record not be selected because the user specified "Nor" as the search language, when resources matching other criteria (Geo Extent for example) do match? Normally in Info Retrieval this is a no-brainer, of course it shouldn't, but I'm a bit less certain when we talk about result records that aren't primarily "Text" based. (Actually, this is a slightly wider concern about Annex A and those "CharacterString" elements... In IEEE LOM for example we have a "LangString" element that has a "Lang" attribute. That community chose to allow language variants of a resource to be expressed within one record by allowing an element to hold all language variants, for example
<pre>
<title>
<langstring lang="En">Hello</langstring>
<langstring lang="Dk">Hej</langstring>
</title>
</pre>
The presence of a "Lang" attribute at the "Dataset" level might mean the intention is to support multi-language datasets by having several dataset records, one for each language, which is OK, but possibly not optimal for datasets that aren't primarily language based. If this is the case, is the "CharacterString" element in Annex A just redundant payload?)

=== Lack of machine-reusable data in general ===

Dataset 'lineage' is only a full-text field. If datasets result from recombination, that should be machine-traversable. Human descriptions of lineage will be so different that they won't be useful for building search / evaluation services.

II: It does tend to talk about "Lineage statement"... would making it (More along the conceptual lines of)
<pre>
<lineage>
<dc:description>Text</dc:description>
</lineage>
</pre>
give you the extensibility to either use private extensions, or to specify recombination elements at a later date? (I didn't think this through in terms of the *actual* recombination operations, just wanted to show how we might make lineage extensible without specifying it.)
<pre>
<lineage>
<dc:description>This dataset is a recombination of X and Y</dc:description>
<Jo:recombination>
<Jo:source>X-URI</Jo:source>
<Jo:source>Y-URI</Jo:source>
<Jo:Rules>Overlap</Jo:Rules>
</Jo:recombination>
</lineage>
</pre>
Should the lineage search point be called "LineageDescription"? (I think that's what I'll do in my SRW profile.)
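
To show what "machine-traversable" could mean for the extensible lineage element above, here is a minimal Python sketch that reads both the free-text description and the source URIs. The <code>Jo</code> namespace URI is made up for the example (the snippet above declares none), and <code>read_lineage</code> is a hypothetical helper, not part of any profile.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
JO = "http://example.org/jo#"            # hypothetical namespace for the private extension

LINEAGE_XML = f"""
<lineage xmlns:dc="{DC}" xmlns:Jo="{JO}">
  <dc:description>This dataset is a recombination of X and Y</dc:description>
  <Jo:recombination>
    <Jo:source>X-URI</Jo:source>
    <Jo:source>Y-URI</Jo:source>
    <Jo:Rules>Overlap</Jo:Rules>
  </Jo:recombination>
</lineage>
"""


def read_lineage(xml_text):
    """Return (description, [source URIs]) from a lineage element."""
    ns = {"dc": DC, "Jo": JO}
    root = ET.fromstring(xml_text)
    # A client that knows nothing about the extension can still use the description.
    description = root.findtext("dc:description", default="", namespaces=ns)
    # A client that does understand the extension can follow the sources.
    sources = [el.text for el in root.findall("Jo:recombination/Jo:source", ns)]
    return description, sources


description, sources = read_lineage(LINEAGE_XML)
print(description)
print(sources)
```

A client that ignores the extension still gets a usable description, which is the point of keeping the statement and the structured part side by side.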

=== Lack of engagement with packaging and re-use issues ===

Cf. Dataset series / aggregates. The examples have 'MasterMap' as one potential dataset! Real world use cases are going to need subsets of such huge data sets broken down into packages with smaller spatial extents or with fewer layers.

II: Indeed, as well as the srw/sru binding experiment, I've been wondering about the OAI binding, which I know you've already discussed elsewhere. What might be generally useful (And maybe this already exists) is a set of TREC style test data. Setting up a static gateway OAI server wouldn't be so hard, and might give us some valuable real-world information about this problem.

=== Bypassing of feature-level metadata from consideration ===

Once we get down to the feature level the interesting European problems appear - the fact that every local area may have its own classification schemes, even inside one language community the same word is used to describe different-looking things, and across language barriers mappings from words to things don't tend to be 1-1. But by disregarding feature-level metadata - partly because it can't be mandated when the underlying geospatial objects aren't publicly inspectable and a certain amount of feature level metadata would mean the data itself is essentially public...

II: Aye, generally for discovery services it's nice to try and avoid mandating that users understand predefined controlled vocabularies, whilst allowing users who do know terms to qualify their discovery process, for example, in CQL I'd be tempted to allow a user to say "dc.subject=Something" or (The equivalent of) "authority=19115:2003 and dc.subject=Something" for users who know a specific term.

There's quite a lot of work going on around Europe at the moment covering crosswalks of controlled vocabularies (Mostly I know about crosswalking European educational levels, but it seems to be the same problem cast in a different way). If we can arrange for someone to do the intellectual work of cross-mapping, and make the data publicly available, then it becomes a "Turning-the-handle" job for providers to support cross vocab retrieval. Standards such as ZThes are being used quite a lot in the learning domain to transport this data around. The only effect on reviewing the IR is that it's important that the IR does not preclude this at a future date. (The whole design for unforeseen use thing.. specifically, I think mandating a specific vocab in the IR might not be the right thing to do, and giving users a way to say which vocab they are using in the description and discovery process is a better way to go....)
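
A rough sketch of what the "turning-the-handle" job might look like once a crosswalk is published. Everything here is hypothetical: the vocabulary names, the terms, the <code>CROSSWALK</code> table and the record layout are invented to illustrate the idea, and a real crosswalk would arrive in a format such as ZThes rather than as a Python dict.

```python
# Hypothetical crosswalk: (vocabulary, term) -> equivalent (vocabulary, term) pairs.
CROSSWALK = {
    ("vocab-a", "river"): {("vocab-b", "watercourse")},
    ("vocab-b", "watercourse"): {("vocab-a", "river")},
}


def expand(vocab, term):
    """Return the query term plus any equivalents listed in the crosswalk."""
    return {(vocab, term)} | CROSSWALK.get((vocab, term), set())


def search(records, vocab, term):
    """Return ids of records whose subjects match the term in any mapped vocabulary."""
    wanted = expand(vocab, term)
    return [r["id"] for r in records if wanted & set(r["subjects"])]


# Each record says which vocabulary every subject term comes from.
records = [
    {"id": "ds1", "subjects": [("vocab-a", "river")]},
    {"id": "ds2", "subjects": [("vocab-b", "watercourse")]},
    {"id": "ds3", "subjects": [("vocab-b", "road")]},
]

found = search(records, "vocab-a", "river")
print(found)
```

The sketch only works because each subject carries its vocabulary alongside the term, which is the argument for letting describers and searchers name the vocab they are using rather than mandating one.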

=== Overspecificness about internet- and webservices- based distribution models ===

We are actually causing ourselves unnecessary problems by putting everything on the Internet. Data sharing agreements over publicly maintained private networks with flat-rate membership are a clear potential future and 'middle way' in this domain. The draft now is all about making access/use constraints *specific to data sets* and not specific to the relationship between the data provider or broker, the data user and the transport network between them.

So we have a 'distributed computing platform' metadata property that is required by the IRs. In the ISO19115 mapping in Annex A this is a '''free text field''', yet 5.2.15 states that the property "is necessary for a client to bind to the service". If it must be mandated, it should be as a URI. It would be wonderful to have examples of what other than HTTP or OGC web services is envisaged NOW as a means of access to the backend of a distributed computing platform.

Reading the INSPIRE Metadata Draft

2007-03-14T11:43:33Z

Wiki-Ianibbo: /* Bypassing of feature-level metadata from consideration */

Metadata about geographic data is at the heart of INSPIRE. The metadata draft is the first in the set of "implementing rules" and it will underpin all the other implementing rules. The consultation process is open until 2007-03-30. While the documents are open access, comments can only be offered through an SDIC or Spatial Data Interest Community.

The Free and Open Source Geospatial Community has a voice through one of these SDICs thanks to Markus Neteler. This page contains preparatory material for a collective response through the FOSS GIS SDIC, from the POV of people implementing and managing metadata creation, collection and search services, working closely with many different data user communities.

* The response proper will live at [[Response to INSPIRE Metadata Draft]]. Initial notes are included below in the Issues section.
* It is interesting to read this in parallel with the North American Metadata Profile draft which is also currently in consultation. It's hoped the OSGeo community will also be able to contribute to a [[Response to NAP Metadata Draft]] and get the [http://geodatacommons.umaine.edu geodata commons] project involved in this.

== Reading the draft ==

* [http://www.ec-gis.org/inspire/reports/ImplementingRules/draftINSPIREMetadataIRv2_20070202.pdf the Implementing Rules for Metadata Draft] (pdf)
* [http://www.ec-gis.org/inspire/whatsnew.cfm#1590 supporting / background material]
* Pages 1-17 are metadata about the document itself, intentions and history, and can be safely skipped. Pages 43-104 are the Annexes.
* Annex A is particularly interesting as there are details of the thinking exposed in the mapping to ISO19115/39 that aren't set out in the implementing rules. ''If you want to know what's likely to affect you but are short on time, at minimum read section 5 and Annex A''.

== Lightning Summary of the draft ==

The draft establishes a basic information model for metadata which is close to, but not specific to, ISO19115 and OGC Web Services.

It only mandates what metadata is published by and for public authorities covered by INSPIRE - it does not try to cover repository management or internal processes.

It separates out metadata properties into those useful for 'discovery', 'evaluation', and 'use'. It identifies one very high level "use case" for spatial data search services built from metadata being shared at this level.

It differentiates between properties useful for 'non-specialist' and 'expert' users into 2 Levels, 1 and 2. Level 1 is always mandatory. This *includes* classification according to the data themes in the INSPIRE annexes, and keywords from controlled vocabularies which are not covered by the IR document but are left to Spatial Data Theme Communities. (How these communities are found, selected, and make their decisions, is unknown to us at this time.)

= Issues =

'''This list is an overview of what jumped out at me as something to address. I don't know how much of this is appropriate to send back, or how much can be fixed. - [[User:JoWalsh]]'''

== Conceptual overview ==

The model maps quite well to the minimum useful subset identified in [[DCLite4G]]. It looks like a lightweight core. But, the model and the draft break down the problem space of metadata in a way that is a reaction to artificial scarcity of data. It identifies three phases of the metadata use cycle:
* discovery (of what data is out there)
* evaluation (of whether the data will be useful for specific purpose)
* use (once access gained, how to best use the data)

It is illuminating to compare this with the [[Reading the NAP Metadata Draft|North American Profile]] metadata draft which talks about

* discovery
* access
* fitness for use (e.g. evaluation)
* transfer

So the IRs both don't address how to make the data more useful via metadata, and are vague about how much a minimal subset is going to provide enough information to evaluate utility on. Generally the draft dances around data licensing access issues, and glosses over the over-engineering needed to work around artificial constraints on availability. IRs for evaluation and use of data based on metadata are not covered by this draft at all, but left up to the Spatial Data Theme communities for each of the 35 data themes identified in Annexes I-III of the INSPIRE text.

== Issues with specific metadata properties ==

The model maps quite well to [[DCLite4G]]. It looks like a lightweight core.

=== Things that aren't there that should be ===

'''5.2.8 Resource responsible party'''. Each dataset *must* have one or more people/organisations responsible for it. The IR says that this can be freetext or can be in more structured form. This '''only''' includes the responsible party's name, but NOT any form of contact details.

'''Some form of electronic or telephonic contact address should be mandatory, if the org/person's details are mandatory.''' Why publish ownership information - especially if there are constraints on access and reuse of the described data - if you can't immediately get in personal contact with someone who can make assurances about the data?

Annex A on mapping to ISO19115 mandates that contact persons and organisations be free text, not resource identifiers. Two serious problems with the ISO 19115 mapping:

* It does not ask or provide for contact details.
* It looks *mandatory* that the responsible party be given a role, which in turn is one of N codes published by the Library of Congress to describe people's roles within organisations.

'''No discussion of formalising dataset accuracy / completeness - crucial for cost-benefit evaluation / evaluation of suitability for combining with other data sets.'''

=== Things that are there that probably shouldn't be ===

Every 'dataset or dataset series' published under INSPIRE *must* include both a '''Resource topic category''' and a set of resource '''keywords'''.

Topic categories are very high-level classifications which correspond to each of the Spatial Data Themes identified in Annexes I, II and III of the INSPIRE Directive.

* Which topic category data fits in will often be a property of an organisation not any published data sets.

From an implementor's POV this will involve something like selecting a topic category for data at install time of metadata publishing engine, and forgetting about it. The IRs place a lot of faith in the ability of simple keyword / classification code matches to enhance utility of search and discovery services for users.

But this already raises the bar for non-expert users (the domain vocabulary is jargon-specific or oriented towards specialist codes).

The IRs emphasise the fact that keywords should originate from a ''controlled vocabulary''. The responsibility for creating one is not in the hands of the Drafting Teams but in the hands of Spatial Data Theme Communities. How these are constituted and how their decisions become binding are unclear.

Again, faith in keywords for search utility is misplaced. Reliance on them may lead to false negatives. Again assumes familiarity with, or time and ability to learn about, what to expect in the domain from a non-expert user, and an expert will need a better level of detail. Pitfalls of 'controlled' keywording:
* intentional misclassification
* lazy/default misclassification

Both of these are at 'Level 1 for discovery metadata' which implies that any INSPIRE compliant metadata set MUST have both topic category and associated keywords.

== Areas which are unclear ==

=== Conformity ===

This is an IR and obligatory to deal with. But 5.3.4 just says "see Annex F".
Annex F '''in its entirety''' says:

The way in which conformity is expressed in the INSPIRE IR will be defined in a subsequent draft based on discussions with the Drafting Team on Data specifications and harmonization.

(Is this where accuracy/completeness comes in? How can we know?)

=== Dataset series / Aggregate data ===

IR talks about dataset series. Some of the diagrams talk about 'MD_Aggregates'- this term isn't used elsewhere. No conception in this model of one UrDataSet with many different potential sources according to how they are packaged or processed. As the IRs mandate properties for dataset series, really need more clarity / examples about what they actually are.

== General concerns ==

=== Search / discovery services ===

The preamble (p.7) states that "separate IRs for discovery services are being prepared and are not the subject of this document." But the INSPIRE use case is predicated on the availability of 'Geoportal' style search services. What else *are* discovery services if they are not the search services treated of here? If there is only going to be an abstract model for discovery, and these IRs are careful to avoid imposing any constraints on internal data repository management, how much more can a discovery services draft provide?

II: I think this comment is spot on, I'm finding it difficult to express concrete concerns.. but Section 5.2 "Discovery metadata elements" starts to set out a list of concepts seen to be (The document hints at, but does not directly say) core to the discovery process. Section 5.3 then sets out "Abstract discovery metadata element set". I *guess* the implication is that the concepts laid out in 5.2 are in some way even more abstract than those set out in 5.3. The document really isn't clear about what the abstract model is, or what it is for, before it starts enumerating the concepts. Your later comment about being tied to web services is spot on also here, I'm really not sure "Service type version", "Operation name" and "Distributed computing platform" belong in an abstract model. These three attributes seem to belong specifically to a particular (And I would guess already existing) service binding (Or to a very specific kind of returned result record) of the abstract model onto some concrete semantics. What I'd really like to see is a much clearer statement of what the purpose of the abstract model is. Hopefully, once that is tightly defined, it should become easier to decide what lies inside the boundary of the abstract model, and what belongs in the domain of specific realisations of the abstract model.

I'm a bit confused by the "Temporal Reference" Element... 5.2.2 talks about what I would expect to see from a temporal reference, but 5.3.2 maps temporal reference on to "One of the dates of publication, last revision or creation of the resource". These three elements are already well defined by Dublin Core attributes... Maybe I've misunderstood what's implied by table 1 in 5.3.2. Also, similar issues to the spatial access point arise (With structured data, as opposed to text queries). In some UK datasets, periods such as "Neolithic" can be used instead of an ISO 19108 Date Time. (I see note 11 under 5.3.4 talks about this, which is good. What's important is that regardless of the outcome of the study, the IR are extensible enough to cope with the eventual decision). I'd consider separate access points for controlled vocabulary time period and structured temporal data.

Geographic Extent.. the doc seems a bit bounding box heavy. Would be nice to understand (have examples of) specification of interior/exterior polygons. Servers only supporting minimal bounding boxes can gracefully degrade (Since it's easy to calculate an MBR from a polygon) whilst allowing other servers to retain the full richness of polygons. It's not clear where the semantics for parsing these strings will be defined.. for example should geographic extent be encoded as OpenGIS strings (Which seems to make sense to me, but I'm biased by Oracle and MySQL's spatial functions)? This might seem a bit extreme for the abstract part of the document, but it's one of those make-or-break issues for interoperability, and might be worth the pain. Also, I think it's worth entertaining the idea that spatial specifications such as MBRs and polygons (Structured spatial constructs) might be better exposed using their own abstract access point, and "Place Name" having its own access point. This will help server implementors avoid problems with disambiguation of search terms.
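
To back up the "graceful degradation" point, a small Python sketch of deriving an MBR from a polygon, plus one way the same polygon might be written as a string. It assumes that "OpenGIS strings" means Well-Known Text and that rings are plain lists of (x, y) pairs. The function names and coordinates are invented for the example.

```python
def mbr(ring):
    """Minimum bounding rectangle of a ring as (min_x, min_y, max_x, max_y)."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return (min(xs), min(ys), max(xs), max(ys))


def ring_wkt(ring):
    """Format one ring as the parenthesised coordinate list used in WKT."""
    return "(" + ", ".join(f"{x} {y}" for x, y in ring) + ")"


def polygon_wkt(exterior, interiors=()):
    """Format a polygon with optional interior rings (holes) as WKT."""
    rings = (exterior, *interiors)
    return "POLYGON (" + ", ".join(ring_wkt(r) for r in rings) + ")"


# Hypothetical extent: a rectangle with one square hole.
exterior = [(0, 0), (10, 0), (10, 8), (0, 8), (0, 0)]
hole = [(2, 2), (4, 2), (4, 4), (2, 4), (2, 2)]

# The exterior ring alone determines the bounding box; holes cannot enlarge it.
box = mbr(exterior)
wkt = polygon_wkt(exterior, [hole])
print(box)
print(wkt)
```

A server that only indexes boxes can store <code>box</code> and ignore the rest, while a richer server keeps the full polygon string.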

I'm interested in what the expected semantics of resource language are on retrieval of language-neutral data sets.... Should a result record not be selected because the user specified "Nor" as the search language, but resources matching other criteria (Geo Extent for example) do match? Normally in Info Retrieval this is a no-brainer, of course it shouldn't, but I'm a bit less certain when we talk about result records that aren't primarily "Text" based. (Actually, this is a slightly wider concern about annex A and those "CharacterString" elements... In IEEE LOM for example we have a "LangString" element that has a "Lang" attribute. That community chose to allow language variants of a resource to be expressed within one record by allowing an element to hold all language variants, for example:
<pre>
<title>
<langstring lang="En">Hello</langstring>
<langstring lang="Dk">Hej</langstring>
</title>
</pre>
The presence of a "Lang" attribute at the "Dataset" level might mean the intention is to support multi-language datasets by having several dataset records, one for each language, which is OK, but possibly not optimal for datasets that aren't primarily language based. If this is the case, is the "CharacterString" element in Annex A just redundant payload?)

=== Lack of machine-reusable data in general ===

Dataset 'lineage' is only a full-text field. If datasets result from recombination, that should be machine-traversable. Human descriptions of lineage will be so different that they won't be useful for building search / evaluation services.

II: It does tend to talk about "Lineage statement"... would making it (More along the conceptual lines of)
<pre>
<lineage>
<dc:description>Text</dc:description>
</lineage>
</pre>
give you the extensibility to either use private extensions, or to specify recombination elements at a later date? (I didn't think this through in terms of the *actual* recombination operations, just wanted to show how we might make lineage extensible without specifying it.)
<pre>
<lineage>
<dc:description>This dataset is a recombination of X and Y</dc:description>
<Jo:recombination>
<Jo:source>X-URI</Jo:source>
<Jo:source>Y-URI</Jo:source>
<Jo:Rules>Overlap</Jo:Rules>
</Jo:recombination>
</lineage>
</pre>
Should the lineage search point be called "LineageDescription"? (I think that's what I'll do in my SRW profile.)

=== Lack of engagement with packaging and re-use issues ===

Cf. Dataset series / aggregates. The examples have 'MasterMap' as one potential dataset! Real world use cases are going to need subsets of such huge data sets broken down into packages with smaller spatial extents or with fewer layers.

II: Indeed, as well as the srw/sru binding experiment, I've been wondering about the OAI binding, which I know you've already discussed elsewhere. What might be generally useful (And maybe this already exists) is a set of TREC style test data. Setting up a static gateway OAI server wouldn't be so hard, and might give us some valuable real-world information about this problem.

=== Bypassing of feature-level metadata from consideration ===

Once we get down to the feature level the interesting European problems appear - the fact that every local area may have its own classification schemes, even inside one language community the same word is used to describe different-looking things, and across language barriers mappings from words to things don't tend to be 1-1. But by disregarding feature-level metadata - partly because it can't be mandated when the underlying geospatial objects aren't publicly inspectable and a certain amount of feature level metadata would mean the data itself is essentially public...

II: Aye, generally for discovery services it's nice to try and avoid mandating that users understand predefined controlled vocabularies, whilst allowing users who do know terms to qualify their discovery process, for example, in CQL I'd be tempted to allow a user to say "dc.subject=Something" or (The equivalent of) "authority=19115:2003 and dc.subject=Something" for users who know a specific term.

There's quite a lot of work going on around Europe at the moment covering crosswalks of controlled vocabularies (Mostly I know about crosswalking European educational levels, but it seems to be the same problem cast in a different way). If we can arrange for someone to do the intellectual work of cross-mapping, and make the data publicly available, then it becomes a "Turning-the-handle" job for providers to support cross vocab retrieval. Standards such as ZThes are being used quite a lot in the learning domain to transport this data around. The only effect on reviewing the IR is that it's important that the IR does not preclude this at a future date. (The whole design for unforeseen use thing.. specifically, I think mandating a specific vocab in the IR might not be the right thing to do, and giving users a way to say which vocab they are using in the description and discovery process is a better way to go....)

=== Overspecificness about internet- and webservices- based distribution models ===

We are actually causing ourselves unnecessary problems by putting everything on the Internet. Data sharing agreements over publicly maintained private networks with flat-rate membership are a clear potential future and 'middle way' in this domain. The draft now is all about making access/use constraints *specific to data sets* and not specific to the relationship between the data provider or broker, the data user and the transport network between them.

So we have a 'distributed computing platform' metadata property that is required by the IRs. In the ISO19115 mapping in Annex A this is a '''free text field''', yet 5.2.15 states that the property "is necessary for a client to bind to the service". If it must be mandated, it should be as a URI. It would be wonderful to have examples of what other than HTTP or OGC web services is envisaged NOW as a means of access to the backend of a distributed computing platform.

Reading the INSPIRE Metadata Draft

2007-03-14T11:26:50Z

Wiki-Ianibbo: /* Lack of machine-reusable data in general */

Metadata about geographic data is at the heart of INSPIRE. The metadata draft is the first in the set of "implementing rules" and it will underpin all the other implementing rules. The consultation process is open until 2007-03-30. While the documents are open access, comments can only be offered through an SDIC or Spatial Data Interest Community.

The Free and Open Source Geospatial Community has a voice through one of these SDICs thanks to Markus Neteler. This page contains preparatory material for a collective response through the FOSS GIS SDIC, from the POV of people implementing and managing metadata creation, collection and search services, working closely with many different data user communities.

* The response proper will live at [[Response to INSPIRE Metadata Draft]]. Initial notes are included below in the Issues section.
* It is interesting to read this in parallel with the North American Metadata Profile draft which is also currently in consultation. It's hoped the OSGeo community will also be able to contribute to a [[Response to NAP Metadata Draft]] and get the [http://geodatacommons.umaine.edu geodata commons] project involved in this.

== Reading the draft ==

* [http://www.ec-gis.org/inspire/reports/ImplementingRules/draftINSPIREMetadataIRv2_20070202.pdf the Implementing Rules for Metadata Draft] (pdf)
* [http://www.ec-gis.org/inspire/whatsnew.cfm#1590 supporting / background material]
* Pages 1-17 are metadata about the document itself, intentions and history, and can be safely skipped. Pages 43-104 are the Annexes.
* Annex A is particularly interesting as there are details of the thinking exposed in the mapping to ISO19115/39 that aren't set out in the implementing rules. ''If you want to know what's likely to affect you but are short on time, at minimum read section 5 and Annex A''.

== Lightning Summary of the draft ==

The draft establishes a basic information model for metadata which is close to, but not specific to, ISO19115 and OGC Web Services.

It only mandates what metadata is published by and for public authorities covered by INSPIRE - it does not try to cover repository management or internal processes.

It separates out metadata properties into those useful for 'discovery', 'evaluation', and 'use'. It identifies one very high level "use case" for spatial data search services built from metadata being shared at this level.

It differentiates between properties useful for 'non-specialist' and 'expert' users into 2 Levels, 1 and 2. Level 1 is always mandatory. This *includes* classification according to the data themes in the INSPIRE annexes, and keywords from controlled vocabularies which are not covered by the IR document but are left to Spatial Data Theme Communities. (How these communities are found, selected, and make their decisions, is unknown to us at this time.)

= Issues =

'''This list is an overview of what jumped out at me as something to address. I don't know how much of this is appropriate to send back, or how much can be fixed. - [[User:JoWalsh]]'''

== Conceptual overview ==

The model maps quite well to the minimum useful subset identified in [[DCLite4G]]. It looks like a lightweight core. But, the model and the draft break down the problem space of metadata in a way that is a reaction to artificial scarcity of data. It identifies three phases of the metadata use cycle:
* discovery (of what data is out there)
* evaluation (of whether the data will be useful for specific purpose)
* use (once access gained, how to best use the data)

It is illuminating to compare this with the [[Reading the NAP Metadata Draft|North American Profile]] metadata draft which talks about

* discovery
* access
* fitness for use (e.g. evaluation)
* transfer

So the IRs both don't address how to make the data more useful via metadata, and are vague about how much a minimal subset is going to provide enough information to evaluate utility on. Generally the draft dances around data licensing access issues, and glosses over the over-engineering needed to work around artificial constraints on availability. IRs for evaluation and use of data based on metadata are not covered by this draft at all, but left up to the Spatial Data Theme communities for each of the 35 data themes identified in Annexes I-III of the INSPIRE text.

== Issues with specific metadata properties ==

The model maps quite well to [[DCLite4G]]. It looks like a lightweight core.

=== Things that aren't there that should be ===

'''5.2.8 Resource responsible party'''. Each dataset *must* have one or more people/organisations responsible for it. The IR says that this can be freetext or can be in more structured form. This '''only''' includes the responsible party's name, but NOT any form of contact details.

'''Some form of electronic or telephonic contact address should be mandatory, if the org/person's details are mandatory.''' Why publish ownership information - especially if there are constraints on access and reuse of the described data - if you can't immediately get in personal contact with someone who can make assurances about the data?

Annex A on mapping to ISO19115 mandates that contact persons and organisations be free text, not resource identifiers. Two serious problems with the ISO 19115 mapping:

* It does not ask or provide for contact details.
* It looks *mandatory* that the responsible party be given a role, which in turn is one of N codes published by the Library of Congress to describe people's roles within organisations.

'''No discussion of formalising dataset accuracy / completeness - crucial for cost-benefit evaluation / evaluation of suitability for combining with other data sets.'''

=== Things that are there that probably shouldn't be ===

Every 'dataset or dataset series' published under INSPIRE *must* include both a '''Resource topic category''' and a set of resource '''keywords'''.

Topic categories are very high-level classifications which correspond to each of the Spatial Data Themes identified in Annexes I, II and III of the INSPIRE Directive.

* Which topic category data fits in will often be a property of an organisation not any published data sets.

From an implementor's POV this will involve something like selecting a topic category for data at install time of metadata publishing engine, and forgetting about it. The IRs place a lot of faith in the ability of simple keyword / classification code matches to enhance utility of search and discovery services for users.

But this already raises the bar for non-expert users (the domain vocabulary is jargon-specific or oriented towards specialist codes).

The IRs emphasise the fact that keywords should originate from a ''controlled vocabulary''. The responsibility for creating one is not in the hands of the Drafting Teams but in the hands of Spatial Data Theme Communities. How these are constituted and how their decisions become binding are unclear.

Again, faith in keywords for search utility is misplaced. Reliance on them may lead to false negatives. Again assumes familiarity with, or time and ability to learn about, what to expect in the domain from a non-expert user, and an expert will need a better level of detail. Pitfalls of 'controlled' keywording:
* intentional misclassification
* lazy/default misclassification

Both of these are at 'Level 1 for discovery metadata' which implies that any INSPIRE compliant metadata set MUST have both topic category and associated keywords.

== Areas which are unclear ==

=== Conformity ===

This is an IR and obligatory to deal with. But 5.3.4 just says "see Annex F".
Annex F '''in its entirety''' says:

The way in which conformity is expressed in the INSPIRE IR will be defined in a subsequent draft based on discussions with the Drafting Team on Data specifications and harmonization.

(Is this where accuracy/completeness comes in? How can we know?)

=== Dataset series / Aggregate data ===

IR talks about dataset series. Some of the diagrams talk about 'MD_Aggregates'- this term isn't used elsewhere. No conception in this model of one UrDataSet with many different potential sources according to how they are packaged or processed. As the IRs mandate properties for dataset series, really need more clarity / examples about what they actually are.

== General concerns ==

=== Search / discovery services ===

The preamble (p.7) states that "separate IRs for discovery services are being prepared and are not the subject of this document." But the INSPIRE use case is predicated on the availability of 'Geoportal' style search services. What else *are* discovery services if they are not the search services treated of here? If there is only going to be an abstract model for discovery, and these IRs are careful to avoid imposing any constraints on internal data repository management, how much more can a discovery services draft provide?

II: I think this comment is spot on, I'm finding it difficult to express concrete concerns.. but Section 5.2 "Discovery metadata elements" starts to set out a list of concepts seen to be (The document hints at, but does not directly say) core to the discovery process. Section 5.3 then sets out "Abstract discovery metadata element set". I *guess* the implication is that the concepts laid out in 5.2 are in some way even more abstract than those set out in 5.3. The document really isn't clear about what the abstract model is, or what it is for, before it starts enumerating the concepts. Your later comment about being tied to web services is spot on also here, I'm really not sure "Service type version", "Operation name" and "Distributed computing platform" belong in an abstract model. These three attributes seem to belong specifically to a particular (And I would guess already existing) service binding (Or to a very specific kind of returned result record) of the abstract model onto some concrete semantics. What I'd really like to see is a much clearer statement of what the purpose of the abstract model is. Hopefully, once that is tightly defined, it should become easier to decide what lies inside the boundary of the abstract model, and what belongs in the domain of specific realisations of the abstract model.

I'm a bit confused by the "Temporal Reference" element. 5.2.2 talks about what I would expect to see from a temporal reference, but 5.3.2 maps temporal reference on to "One of the dates of publication, last revision or creation of the resource". These three elements are already well defined by Dublin Core attributes. Maybe I've misunderstood what's implied by table 1 in 5.3.2. Also, issues similar to those with the spatial access point arise (with structured data, as opposed to text queries). In some UK datasets, periods such as "Neolithic" can be used instead of an ISO 19108 Date Time. (I see that note 11 under 5.3.4 talks about this, which is good. What's important is that, regardless of the outcome of the study, the IRs are extensible enough to cope with the eventual decision.) I'd consider separate access points for controlled-vocabulary time periods and structured temporal data.
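To make the "separate access points" idea a little more concrete, here is a minimal Python sketch. Everything named in it is an assumption made for illustration only: the access point names <code>temporal.date</code> and <code>temporal.period</code>, the function name, and the tiny period vocabulary are not taken from the draft or from any existing profile.

```python
from datetime import date

# Hypothetical controlled vocabulary of named periods; a real list would come
# from whichever community ends up maintaining it.
KNOWN_PERIODS = {"neolithic", "bronze age", "iron age"}


def temporal_access_point(value):
    """Route a temporal search term to one of two hypothetical access points.

    Returns a (access_point_name, parsed_value) pair.
    """
    term = value.strip()
    try:
        # Structured temporal data: an ISO 8601 calendar date such as 2007-03-14.
        return ("temporal.date", date.fromisoformat(term))
    except ValueError:
        pass
    if term.lower() in KNOWN_PERIODS:
        # Controlled-vocabulary period name, matched case-insensitively.
        return ("temporal.period", term.lower())
    raise ValueError(f"not a date or a known period: {value!r}")
```

Keeping the two routes apart means a server never has to guess whether a term like "Neolithic" was meant as free text or as a structured value.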

Geographic Extent: the doc seems a bit bounding-box heavy. It would be nice to understand (and have examples of) the specification of interior/exterior polygons. Servers that only support minimal bounding boxes can degrade gracefully (since it's easy to calculate an MBR from a polygon), whilst other servers can retain the full richness of polygons. It's not clear where the semantics for parsing these strings will be defined. For example, should geographic extent be encoded as OpenGIS strings? (That seems to make sense to me, but I'm biased by Oracle and MySQL's spatial functions.) This might seem a bit extreme for the abstract part of the document, but it's one of those make-or-break issues for interoperability, and might be worth the pain. Also, I think it's worth entertaining the idea that structured spatial constructs such as MBRs and polygons might be better exposed through their own abstract access point, with "Place Name" having its own access point. This would help server implementors avoid problems with disambiguation of search terms.
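As a rough illustration of the graceful degradation point, the following Python sketch derives an MBR from a polygon's exterior ring. The function name and the sample coordinates are made up for the example, and it assumes plain (x, y) pairs that do not cross the antimeridian. Interior rings can be ignored here, since holes never extend beyond the exterior ring.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (longitude, latitude)


def bounding_box(exterior: List[Point]) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) for a polygon's exterior ring."""
    if not exterior:
        raise ValueError("exterior ring is empty")
    xs = [x for x, _ in exterior]
    ys = [y for _, y in exterior]
    return (min(xs), min(ys), max(xs), max(ys))


# Made-up closed ring (first and last points are equal).
ring = [(-3.4, 55.8), (-3.0, 55.9), (-3.1, 56.0), (-3.4, 55.8)]
mbr = bounding_box(ring)
```

A server that only indexes boxes could store <code>mbr</code> and discard the ring, while a richer server keeps both.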

I'm interested in what the expected semantics of resource language are on retrieval of language-neutral data sets. Should a result record not be selected because the user specified "Nor" as the search language, even though it matches the other criteria (Geo Extent, for example)? Normally in Info Retrieval this is a no-brainer, of course it shouldn't be selected, but I'm a bit less certain when we talk about result records that aren't primarily "Text" based. (Actually, this is a slightly wider concern about Annex A and those "CharacterString" elements. In IEEE LOM, for example, we have a "LangString" element that has a "Lang" attribute. That community chose to allow language variants of a resource to be expressed within one record by allowing an element to hold all language variants, for example
<pre>
<title>
<langstring lang="En">Hello</langstring>
<langstring lang="Dk">Hej</langstring>
</title>
</pre>
The presence of a "Lang" attribute at the "Dataset" level might mean the intention is to support multi-language datasets by having several dataset records, one for each language, which is OK, but possibly not optimal for datasets that aren't primarily language based. If this is the case, is the "CharacterString" element in Annex A just redundant payload?)
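For what it's worth, here is a small Python sketch of how a client might consume the LangString style record shown above, picking the requested language variant and falling back to a default. The function name and the fallback rule are assumptions for illustration; neither IEEE LOM nor the draft prescribes this behaviour.

```python
import xml.etree.ElementTree as ET

# The record from the example above, with both language variants in one element.
RECORD = """
<title>
  <langstring lang="En">Hello</langstring>
  <langstring lang="Dk">Hej</langstring>
</title>
"""


def pick_variant(title_xml, wanted, fallback="En"):
    """Return the text for the wanted language, else the fallback, else None."""
    title = ET.fromstring(title_xml)
    # Index the variants by lower-cased language code.
    variants = {
        el.get("lang", "").lower(): (el.text or "")
        for el in title.findall("langstring")
    }
    for code in (wanted.lower(), fallback.lower()):
        if code in variants:
            return variants[code]
    return None
```

With this approach a search for "Nor" still yields a usable title ("Hello") rather than dropping the record, which is one possible answer to the question of language-neutral data sets.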

=== Lack of machine-reusable data in general ===

Dataset 'lineage' is only a full-text field. If datasets result from recombination, that should be machine-traversable. Human descriptions of lineage will be so different that they won't be useful for building search / evaluation services.

II: It does tend to talk about a "Lineage statement"... would making it more along the conceptual lines of
<pre>
<lineage>
<dc:description>Text</dc:description>
</lineage>
</pre>
give you the extensibility either to use private extensions, or to specify recombination elements at a later date? (I didn't think this through in terms of the *actual* recombination operations; I just wanted to show how we might make lineage extensible without specifying it.)
<pre>
<lineage>
<dc:description>This dataset is a recombination of X and Y</dc:description>
<Jo:recombination>
<Jo:source>X-URI</Jo:source>
<Jo:source>Y-URI</Jo:source>
<Jo:Rules>Overlap</Jo:Rules>
</Jo:recombination>
</lineage>
</pre>
Should the lineage search point be called "LineageDescription"? (I think that's what I'll do in my SRW profile.)
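To show what "machine-traversable" could buy us, here is a Python sketch that reads the extensible lineage example above. It is only a sketch under stated assumptions: the <code>Jo</code> namespace URI is a placeholder invented for this example (the markup above does not declare one), and the helper names are made up. The Dublin Core namespace URI is the standard one for the <code>dc</code> element set.

```python
import xml.etree.ElementTree as ET

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    # Placeholder URI for the private "Jo" extension; purely hypothetical.
    "jo": "http://example.org/ns/recombination#",
}

LINEAGE = """
<lineage xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:Jo="http://example.org/ns/recombination#">
  <dc:description>This dataset is a recombination of X and Y</dc:description>
  <Jo:recombination>
    <Jo:source>X-URI</Jo:source>
    <Jo:source>Y-URI</Jo:source>
    <Jo:Rules>Overlap</Jo:Rules>
  </Jo:recombination>
</lineage>
"""


def lineage_description(xml_text):
    """Return the human-readable lineage statement (empty string if absent)."""
    root = ET.fromstring(xml_text)
    return root.findtext("dc:description", default="", namespaces=NS)


def lineage_sources(xml_text):
    """Return the source identifiers from the private extension, if any."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall("jo:recombination/jo:source", NS)]
```

A client that knows nothing about the extension can still use <code>lineage_description</code>, while one that does understand it can follow the source identifiers to the parent datasets.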

=== Lack of engagement with packaging and re-use issues ===

Cf. Dataset series / aggregates. The examples have 'MasterMap' as one potential dataset! Real-world use cases are going to need subsets of such huge data sets, broken down into packages with smaller spatial extents or with fewer layers.

II: Indeed. As well as the SRW/SRU binding experiment, I've been wondering about the OAI binding, which I know you've already discussed elsewhere. What might be generally useful (and maybe this already exists) is a set of TREC-style test data. Setting up a static gateway OAI server wouldn't be so hard, and might give us some valuable real-world information about this problem.

=== Bypassing of feature-level metadata from consideration ===

Once we get down to the feature level the interesting European problems appear: every local area may have its own classification schemes, even inside one language community the same word is used to describe different-looking things, and across language barriers mappings from words to things don't tend to be 1-1. But by disregarding feature-level metadata - partly because it can't be mandated when the underlying geospatial objects aren't publicly inspectable, and a certain amount of feature-level metadata would mean the data itself is essentially public...

=== Overspecificness about internet- and webservices- based distribution models ===

We are actually causing ourselves unnecessary problems by putting everything on the Internet. Data sharing agreements over publicly maintained private networks with flat-rate membership are a clear potential future and a 'middle way' in this domain. The draft as it stands is all about making access/use constraints *specific to data sets*, and not specific to the relationship between the data provider or broker, the data user and the transport network between them.

So we have a 'distributed computing platform' metadata property that is required by the IRs. In the ISO 19115 mapping in Annex A this is a '''free text field''', yet 5.2.15 states that the property "is necessary for a client to bind to the service". If it must be mandated, it should be as a URI. It would be wonderful to have examples of what, other than HTTP or OGC web services, is envisaged NOW as a means of access to the backend of a distributed computing platform.
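As a sketch of why a URI would be easier to work with than free text, a publishing tool could at least check the value syntactically before accepting it. The function name and the sample values below are invented for illustration, and the check is deliberately loose: it only tests that the value has a scheme followed by something, not that it identifies a real platform.

```python
from urllib.parse import urlparse


def looks_like_absolute_uri(value):
    """Loose syntactic check: a scheme followed by an authority or a path."""
    parts = urlparse(value.strip())
    return bool(parts.scheme) and bool(parts.netloc or parts.path)


# Made-up sample values for the 'distributed computing platform' property.
candidates = [
    "http://example.org/dcp/http",  # URL style identifier
    "urn:example:dcp:corba",        # URN style identifier
    "Web Services",                 # free text, as the current mapping allows
]
accepted = [c for c in candidates if looks_like_absolute_uri(c)]
```

Free-text values like the last one give a client nothing it can bind to, which is the concern raised above.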

Reading the INSPIRE Metadata Draft

2007-03-14T11:26:23Z

Wiki-Ianibbo: /* Lack of machine-reusable data in general */

Metadata about geographic data is at the heart of INSPIRE. The metadata draft is the first in the set of "implementing rules" and it will underpin all the other implementing rules. The consultation process is open until 2007-03-30. While the documents are open access, comments can only be offered through an SDIC or Spatial Data Interest Community.

The Free and Open Source Geospatial Community has a voice through one of these SDICs thanks to Markus Neteler. This page contains preparatory material for a collective response through the FOSS GIS SDIC, from the POV of people implementing and managing metadata creation, collection and search services, working closely with many different data user communities.

* The response proper will live at [[Response to INSPIRE Metadata Draft]]. Initial notes are included below in the Issues section.
* It is interesting to read this in parallel with the North American Metadata Profile draft which is also currently in consultation. It's hoped the OSGeo community will also be able to contribute to a [[Response to NAP Metadata Draft]] and get the [http://geodatacommons.umaine.edu geodata commons] project involved in this.

== Reading the draft ==

* [http://www.ec-gis.org/inspire/reports/ImplementingRules/draftINSPIREMetadataIRv2_20070202.pdf the Implementing Rules for Metadata Draft] (pdf)
* [http://www.ec-gis.org/inspire/whatsnew.cfm#1590 supporting / background material]
* Pages 1-17 are metadata about the document itself, intentions and history, and can be safely skipped. Pages 43-104 are the Annexes.
* Annex A is particularly interesting as there are details of the thinking exposed in the mapping to ISO19115/39 that aren't set out in the implementing rules. ''If you want to know what's likely to affect you but are short on time, at minimum read section 5 and Annex A''.

== Lightning Summary of the draft ==

The draft establishes a basic information model for metadata which is close to, but not specific to, ISO19115 and OGC Web Services.

It only mandates what metadata is published by and for public authorities covered by INSPIRE - it does not try to cover repository management or internal processes.

It separates out metadata properties into those useful for 'discovery', 'evaluation', and 'use'. It identifies one very high level "use case" for spatial data search services built from metadata being shared at this level.

It differentiates between properties useful for 'non-specialist' and 'expert' users into 2 Levels, 1 and 2. Level 1 is always mandatory. This *includes* classification according to the data themes in the INSPIRE annexes, and keywords from controlled vocabularies which are not covered by the IR document but are left to Spatial Data Theme Communities. (How these communities are found, selected, and make their decisions, is unknown to us at this time.)

= Issues =

'''This list is an overview of what jumped out at me as something to address. I don't know how much of this is appropriate to send back, or how much can be fixed. - [[User:JoWalsh]]'''

== Conceptual overview ==

The model maps quite well to the minimum useful subset identified in [[DCLite4G]]. It looks like a lightweight core. But, the model and the draft break down the problem space of metadata in a way that is a reaction to artificial scarcity of data. It identifies three phases of the metadata use cycle:
* discovery (of what data is out there)
* evaluation (of whether the data will be useful for specific purpose)
* use (once access gained, how to best use the data)

It is illuminating to compare this with the [[Reading the NAP Metadata Draft|North American Profile]] metadata draft, which talks about

* discovery
* access
* fitness for use (e.g. evaluation)
* transfer

So the IRs both don't address how to make the data more useful via metadata, and are vague about how much a minimal subset is going to provide enough information to evaluate utility on. Generally the draft dances around data licensing access issues, and glosses over the over-engineering needed to work around artificial constraints on availability. IRs for evaluation and use of data based on metadata are not covered by this draft at all, but left up to the Spatial Data Theme communities for each of the 35 data themes identified in Annexes I-III of the INSPIRE text.

== Issues with specific metadata properties ==

The model maps quite well to [[DCLite4G]]. It looks like a lightweight core.

=== Things that aren't there that should be ===

'''5.2.8 Resource responsible party'''. Each dataset *must* have one or more people/organisations responsible for it. The IR says that this can be freetext or can be in more structured form. This '''only''' includes the responsible party's name, but NOT any form of contact details.

'''Some form of electronic or telephonic contact address should be mandatory, if the org/person's details are mandatory.''' Why publish ownership information - especially if there are constraints on access and reuse of the described data - if you can't immediately get in personal contact with someone who can make assurances about the data?

Annex A on mapping to ISO 19115 mandates that contact persons and organisations be free text, not resource identifiers. There are two serious problems with the ISO 19115 mapping:

* It does not ask or provide for contact details.
* It looks *mandatory* that the responsible party be given a role, which in turn is one of N codes published by the Library of Congress to describe people's roles within organisations.

'''No discussion of formalising dataset accuracy / completeness - crucial for cost-benefit evaluation / evaluation of suitability for combining with other data sets.'''

=== Things that are there that probably shouldn't be ===

Every 'dataset or dataset series' published under INSPIRE *must* include both a '''Resource topic category''' and a set of resource '''keywords'''.

Topic categories are very high-level classifications which correspond to each of the Spatial Data Themes identified in Annexes I, II and III of the INSPIRE Directive.

* Which topic category data fits in will often be a property of an organisation not any published data sets.

From an implementor's POV this will involve something like selecting a topic category for data at install time of metadata publishing engine, and forgetting about it. The IRs place a lot of faith in the ability of simple keyword / classification code matches to enhance utility of search and discovery services for users.

But this already raises the bar for non-expert users (the domain vocabulary is jargon-specific or oriented towards specialist codes).

The IRs emphasise the fact that keywords should originate from a ''controlled vocabulary''. The responsibility for creating one is not in the hands of the Drafting Teams but in the hands of Spatial Data Theme Communities. How these are constituted and how their decisions become binding are unclear.

Again, faith in keywords for search utility is misplaced. Reliance on them may lead to false negatives. Again, this assumes that a non-expert user has familiarity with, or the time and ability to learn about, what to expect in the domain, while an expert will need a better level of detail. Pitfalls of 'controlled' keywording:
* intentional misclassification
* lazy/default misclassification

Both of these are at 'Level 1 for discovery metadata' which implies that any INSPIRE compliant metadata set MUST have both topic category and associated keywords.

== Areas which are unclear ==

=== Conformity ===

This is an IR and obligatory to deal with. But 5.3.4 just says "see Annex F".
Annex F '''in its entirety''' says:

The way in which conformity is expressed in the INSPIRE IR will be defined in a subsequent draft based on discussions with the Drafting Team on Data specifications and harmonization.

(Is this where accuracy/completeness comes in? How can we know?)

=== Dataset series / Aggregate data ===

The IR talks about dataset series. Some of the diagrams refer to 'MD_Aggregates', a term that isn't used anywhere else. The model has no conception of one UrDataSet with many different potential sources according to how they are packaged or processed. As the IRs mandate properties for dataset series, we really need more clarity and examples about what they actually are.

== General concerns ==

=== Search / discovery services ===

The preamble (p.7) states that "separate IRs for discovery services are being prepared and are not the subject of this document." But the INSPIRE use case is predicated on the availability of 'Geoportal' style search services. What else *are* discovery services if they are not the search services treated here? If there is only going to be an abstract model for discovery, and these IRs are careful to avoid imposing any constraints on internal data repository management, how much more can a discovery services draft provide?

II: I think this comment is spot on, though I'm finding it difficult to express concrete concerns. Section 5.2 "Discovery metadata elements" starts to set out a list of concepts seen to be core to the discovery process (the document hints at this, but does not directly say it). Section 5.3 then sets out the "Abstract discovery metadata element set". I *guess* the implication is that the concepts laid out in 5.2 are in some way even more abstract than those set out in 5.3. The document really isn't clear about what the abstract model is, or what it is for, before it starts enumerating the concepts. Your later comment about being tied to web services is also spot on here: I'm really not sure "Service type version", "Operation name" and "Distributed computing platform" belong in an abstract model. These three attributes seem to belong specifically to a particular (and, I would guess, already existing) service binding of the abstract model onto some concrete semantics, or to a very specific kind of returned result record. What I'd really like to see is a much clearer statement of what the purpose of the abstract model is. Hopefully, once that is tightly defined, it should become easier to decide what lies inside the boundary of the abstract model, and what belongs in the domain of specific realisations of it.

I'm a bit confused by the "Temporal Reference" element. 5.2.2 talks about what I would expect to see from a temporal reference, but 5.3.2 maps temporal reference on to "One of the dates of publication, last revision or creation of the resource". These three elements are already well defined by Dublin Core attributes. Maybe I've misunderstood what's implied by table 1 in 5.3.2. Also, issues similar to those with the spatial access point arise (with structured data, as opposed to text queries). In some UK datasets, periods such as "Neolithic" can be used instead of an ISO 19108 Date Time. (I see that note 11 under 5.3.4 talks about this, which is good. What's important is that, regardless of the outcome of the study, the IRs are extensible enough to cope with the eventual decision.) I'd consider separate access points for controlled-vocabulary time periods and structured temporal data.

Geographic Extent: the doc seems a bit bounding-box heavy. It would be nice to understand (and have examples of) the specification of interior/exterior polygons. Servers that only support minimal bounding boxes can degrade gracefully (since it's easy to calculate an MBR from a polygon), whilst other servers can retain the full richness of polygons. It's not clear where the semantics for parsing these strings will be defined. For example, should geographic extent be encoded as OpenGIS strings? (That seems to make sense to me, but I'm biased by Oracle and MySQL's spatial functions.) This might seem a bit extreme for the abstract part of the document, but it's one of those make-or-break issues for interoperability, and might be worth the pain. Also, I think it's worth entertaining the idea that structured spatial constructs such as MBRs and polygons might be better exposed through their own abstract access point, with "Place Name" having its own access point. This would help server implementors avoid problems with disambiguation of search terms.

I'm interested in what the expected semantics of resource language are on retrieval of language-neutral data sets. Should a result record not be selected because the user specified "Nor" as the search language, even though it matches the other criteria (Geo Extent, for example)? Normally in Info Retrieval this is a no-brainer, of course it shouldn't be selected, but I'm a bit less certain when we talk about result records that aren't primarily "Text" based. (Actually, this is a slightly wider concern about Annex A and those "CharacterString" elements. In IEEE LOM, for example, we have a "LangString" element that has a "Lang" attribute. That community chose to allow language variants of a resource to be expressed within one record by allowing an element to hold all language variants, for example
<pre>
<title>
<langstring lang="En">Hello</langstring>
<langstring lang="Dk">Hej</langstring>
</title>
</pre>
The presence of a "Lang" attribute at the "Dataset" level might mean the intention is to support multi-language datasets by having several dataset records, one for each language, which is OK, but possibly not optimal for datasets that aren't primarily language based. If this is the case, is the "CharacterString" element in Annex A just redundant payload?)

=== Lack of machine-reusable data in general ===

Dataset 'lineage' is only a full-text field. If datasets result from recombination, that should be machine-traversable. Human descriptions of lineage will be so different that they won't be useful for building search / evaluation services.

It does tend to talk about a "Lineage statement"... would making it more along the conceptual lines of
<pre>
<lineage>
<dc:description>Text</dc:description>
</lineage>
</pre>
give you the extensibility either to use private extensions, or to specify recombination elements at a later date? (I didn't think this through in terms of the *actual* recombination operations; I just wanted to show how we might make lineage extensible without specifying it.)
<pre>
<lineage>
<dc:description>This dataset is a recombination of X and Y</dc:description>
<Jo:recombination>
<Jo:source>X-URI</Jo:source>
<Jo:source>Y-URI</Jo:source>
<Jo:Rules>Overlap</Jo:Rules>
</Jo:recombination>
</lineage>
</pre>
Should the lineage search point be called "LineageDescription"? (I think that's what I'll do in my SRW profile.)

=== Lack of engagement with packaging and re-use issues ===

Cf. Dataset series / aggregates. The examples have 'MasterMap' as one potential dataset! Real-world use cases are going to need subsets of such huge data sets, broken down into packages with smaller spatial extents or with fewer layers.

II: Indeed. As well as the SRW/SRU binding experiment, I've been wondering about the OAI binding, which I know you've already discussed elsewhere. What might be generally useful (and maybe this already exists) is a set of TREC-style test data. Setting up a static gateway OAI server wouldn't be so hard, and might give us some valuable real-world information about this problem.

=== Bypassing of feature-level metadata from consideration ===

Once we get down to the feature level the interesting European problems appear: every local area may have its own classification schemes, even inside one language community the same word is used to describe different-looking things, and across language barriers mappings from words to things don't tend to be 1-1. But by disregarding feature-level metadata - partly because it can't be mandated when the underlying geospatial objects aren't publicly inspectable, and a certain amount of feature-level metadata would mean the data itself is essentially public...

=== Overspecificness about internet- and webservices- based distribution models ===

We are actually causing ourselves unnecessary problems by putting everything on the Internet. Data sharing agreements over publicly maintained private networks with flat-rate membership are a clear potential future and a 'middle way' in this domain. The draft as it stands is all about making access/use constraints *specific to data sets*, and not specific to the relationship between the data provider or broker, the data user and the transport network between them.

So we have a 'distributed computing platform' metadata property that is required by the IRs. In the ISO 19115 mapping in Annex A this is a '''free text field''', yet 5.2.15 states that the property "is necessary for a client to bind to the service". If it must be mandated, it should be as a URI. It would be wonderful to have examples of what, other than HTTP or OGC web services, is envisaged NOW as a means of access to the backend of a distributed computing platform.

Reading the INSPIRE Metadata Draft

2007-03-14T11:21:01Z

Wiki-Ianibbo: /* Lack of engagement with packaging and re-use issues */

Metadata about geographic data is at the heart of INSPIRE. The metadata draft is the first in the set of "implementing rules" and it will underpin all the other implementing rules. The consultation process is open until 2007-03-30. While the documents are open access, comments can only be offered through an SDIC or Spatial Data Interest Community.

The Free and Open Source Geospatial Community has a voice through one of these SDICs thanks to Markus Neteler. This page contains preparatory material for a collective response through the FOSS GIS SDIC, from the POV of people implementing and managing metadata creation, collection and search services, working closely with many different data user communities.

* The response proper will live at [[Response to INSPIRE Metadata Draft]]. Initial notes are included below in the Issues section.
* It is interesting to read this in parallel with the North American Metadata Profile draft which is also currently in consultation. It's hoped the OSGeo community will also be able to contribute to a [[Response to NAP Metadata Draft]] and get the [http://geodatacommons.umaine.edu geodata commons] project involved in this.

== Reading the draft ==

* [http://www.ec-gis.org/inspire/reports/ImplementingRules/draftINSPIREMetadataIRv2_20070202.pdf the Implementing Rules for Metadata Draft] (pdf)
* [http://www.ec-gis.org/inspire/whatsnew.cfm#1590 supporting / background material]
* Pages 1-17 are metadata about the document itself, intentions and history, and can be safely skipped. Pages 43-104 are the Annexes.
* Annex A is particularly interesting as there are details of the thinking exposed in the mapping to ISO19115/39 that aren't set out in the implementing rules. ''If you want to know what's likely to affect you but are short on time, at minimum read section 5 and Annex A''.

== Lightning Summary of the draft ==

The draft establishes a basic information model for metadata which is close to, but not specific to, ISO19115 and OGC Web Services.

It only mandates what metadata is published by and for public authorities covered by INSPIRE - it does not try to cover repository management or internal processes.

It separates out metadata properties into those useful for 'discovery', 'evaluation', and 'use'. It identifies one very high level "use case" for spatial data search services built from metadata being shared at this level.

It differentiates between properties useful for 'non-specialist' and 'expert' users into 2 Levels, 1 and 2. Level 1 is always mandatory. This *includes* classification according to the data themes in the INSPIRE annexes, and keywords from controlled vocabularies which are not covered by the IR document but are left to Spatial Data Theme Communities. (How these communities are found, selected, and make their decisions, is unknown to us at this time.)

= Issues =

'''This list is an overview of what jumped out at me as something to address. I don't know how much of this is appropriate to send back, or how much can be fixed. - [[User:JoWalsh]]'''

== Conceptual overview ==

The model maps quite well to the minimum useful subset identified in [[DCLite4G]]. It looks like a lightweight core. But, the model and the draft break down the problem space of metadata in a way that is a reaction to artificial scarcity of data. It identifies three phases of the metadata use cycle:
* discovery (of what data is out there)
* evaluation (of whether the data will be useful for specific purpose)
* use (once access gained, how to best use the data)

It is illuminating to compare this with the [[Reading the NAP Metadata Draft|North American Profile]] metadata draft, which talks about

* discovery
* access
* fitness for use (e.g. evaluation)
* transfer

So the IRs both don't address how to make the data more useful via metadata, and are vague about how much a minimal subset is going to provide enough information to evaluate utility on. Generally the draft dances around data licensing access issues, and glosses over the over-engineering needed to work around artificial constraints on availability. IRs for evaluation and use of data based on metadata are not covered by this draft at all, but left up to the Spatial Data Theme communities for each of the 35 data themes identified in Annexes I-III of the INSPIRE text.

== Issues with specific metadata properties ==

The model maps quite well to [[DCLite4G]]. It looks like a lightweight core.

=== Things that aren't there that should be ===

'''5.2.8 Resource responsible party'''. Each dataset *must* have one or more people/organisations responsible for it. The IR says that this can be freetext or can be in more structured form. This '''only''' includes the responsible party's name, but NOT any form of contact details.

'''Some form of electronic or telephonic contact address should be mandatory, if the org/person's details are mandatory.''' Why publish ownership information - especially if there are constraints on access and reuse of the described data - if you can't immediately get in personal contact with someone who can make assurances about the data?

Annex A on mapping to ISO 19115 mandates that contact persons and organisations be free text, not resource identifiers. There are two serious problems with the ISO 19115 mapping:

* It does not ask or provide for contact details.
* It looks *mandatory* that the responsible party be given a role, which in turn is one of N codes published by the Library of Congress to describe people's roles within organisations.

'''No discussion of formalising dataset accuracy / completeness - crucial for cost-benefit evaluation / evaluation of suitability for combining with other data sets.'''

=== Things that are there that probably shouldn't be ===

Every 'dataset or dataset series' published under INSPIRE *must* include both a '''Resource topic category''' and a set of resource '''keywords'''.

Topic categories are very high-level classifications which correspond to each of the Spatial Data Themes identified in Annexes I, II and III of the INSPIRE Directive.

* Which topic category data fits in will often be a property of an organisation not any published data sets.

From an implementor's POV this will involve something like selecting a topic category for data at install time of metadata publishing engine, and forgetting about it. The IRs place a lot of faith in the ability of simple keyword / classification code matches to enhance utility of search and discovery services for users.

But this already raises the bar for non-expert users (the domain vocabulary is jargon-specific or oriented towards specialist codes).

The IRs emphasise the fact that keywords should originate from a ''controlled vocabulary''. The responsibility for creating one is not in the hands of the Drafting Teams but in the hands of Spatial Data Theme Communities. How these are constituted and how their decisions become binding are unclear.

Again, faith in keywords for search utility is misplaced. Reliance on them may lead to false negatives. Again, this assumes that a non-expert user has familiarity with, or the time and ability to learn about, what to expect in the domain, while an expert will need a better level of detail. Pitfalls of 'controlled' keywording:
* intentional misclassification
* lazy/default misclassification

Both of these are at 'Level 1 for discovery metadata' which implies that any INSPIRE compliant metadata set MUST have both topic category and associated keywords.

== Areas which are unclear ==

=== Conformity ===

This is an IR and obligatory to deal with. But 5.3.4 just says "see Annex F".
Annex F '''in its entirety''' says:

The way in which conformity is expressed in the INSPIRE IR will be defined in a subsequent draft based on discussions with the Drafting Team on Data specifications and harmonization.

(Is this where accuracy/completeness comes in? How can we know?)

=== Dataset series / Aggregate data ===

The IR talks about dataset series. Some of the diagrams refer to 'MD_Aggregates', a term that isn't used anywhere else. The model has no conception of one UrDataSet with many different potential sources according to how they are packaged or processed. As the IRs mandate properties for dataset series, we really need more clarity and examples about what they actually are.

== General concerns ==

=== Search / discovery services ===

The preamble (p.7) states that "separate IRs for discovery services are being prepared and are not the subject of this document." But the INSPIRE use case is predicated on the availability of 'Geoportal' style search services. What else *are* discovery services if they are not the search services treated here? If there is only going to be an abstract model for discovery, and these IRs are careful to avoid imposing any constraints on internal data repository management, how much more can a discovery services draft provide?

II: I think this comment is spot on, though I'm finding it difficult to express concrete concerns. Section 5.2 "Discovery metadata elements" starts to set out a list of concepts seen to be core to the discovery process (the document hints at this, but does not directly say so). Section 5.3 then sets out an "Abstract discovery metadata element set". I *guess* the implication is that the concepts laid out in 5.2 are in some way even more abstract than those set out in 5.3. The document really isn't clear about what the abstract model is, or what it is for, before it starts enumerating the concepts. Your later comment about being tied to web services is also spot on here: I'm really not sure "Service type version", "Operation name" and "Distributed computing platform" belong in an abstract model. These three attributes seem to belong specifically to a particular (and I would guess already existing) service binding of the abstract model onto some concrete semantics, or to a very specific kind of returned result record. What I'd really like to see is a much clearer statement of what the purpose of the abstract model is. Hopefully, once that is tightly defined, it should become easier to decide what lies inside the boundary of the abstract model, and what belongs in the domain of specific realisations of the abstract model.

I'm a bit confused by the "Temporal Reference" element. 5.2.2 talks about what I would expect to see from a temporal reference, but 5.3.2 maps temporal reference on to "One of the dates of publication, last revision or creation of the resource". These three elements are already well defined by Dublin Core attributes. Maybe I've misunderstood what's implied by table 1 in 5.3.2. Also, similar issues to the spatial access point arise (with structured data, as opposed to text queries). In some UK datasets, periods such as "Neolithic" can be used instead of an ISO 19108 Date Time. (I see that note 11 under 5.3.4 talks about this, which is good. What's important is that, regardless of the outcome of the study, the IRs are extensible enough to cope with the eventual decision.) I'd consider separate access points for controlled vocabulary time periods and structured temporal data.
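
As a rough sketch of what separate temporal access points could look like, the snippet below keeps a controlled-vocabulary period term and a structured date interval as two independent fields with two independent match functions. The class name, field names and sample values are all hypothetical; nothing here is defined by the draft.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TemporalReference:
    """Hypothetical record field: a named period and/or a dated interval."""
    period_term: Optional[str] = None   # e.g. a controlled term such as "Neolithic"
    start: Optional[date] = None        # structured interval start
    end: Optional[date] = None          # structured interval end


def matches_period(ref: TemporalReference, term: str) -> bool:
    """Access point 1: case-insensitive match on the controlled period term."""
    return ref.period_term is not None and ref.period_term.lower() == term.lower()


def overlaps_interval(ref: TemporalReference, start: date, end: date) -> bool:
    """Access point 2: true if the record's dated interval overlaps the query."""
    if ref.start is None or ref.end is None:
        return False
    return ref.start <= end and start <= ref.end


dig = TemporalReference(period_term="Neolithic")
survey = TemporalReference(start=date(2005, 1, 1), end=date(2005, 12, 31))

print(matches_period(dig, "neolithic"))
print(overlaps_interval(survey, date(2005, 6, 1), date(2006, 1, 1)))
print(overlaps_interval(dig, date(2005, 6, 1), date(2006, 1, 1)))
```

Keeping the two access points apart means a server never has to guess whether a query string is a period name or a date.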

Geographic Extent: the doc seems a bit bounding box heavy. It would be nice to understand (and have examples of) the specification of interior/exterior polygons. Servers only supporting minimal bounding boxes can gracefully degrade (since it's easy to calculate an MBR from a polygon) whilst allowing other servers to retain the full richness of polygons. It's not clear where the semantics for parsing these strings will be defined. For example, should geographic extent be encoded as OpenGIS strings? (That seems to make sense to me, but I'm biased by Oracle and MySQL's spatial functions.) This might seem a bit extreme for the abstract part of the document, but it's one of those make-or-break issues for interoperability, and might be worth the pain. Also, I think it's worth entertaining the idea that spatial specifications such as MBRs and polygons (structured spatial constructs) might be better exposed using their own abstract access point, with "Place Name" having its own access point. This will help server implementors avoid problems with disambiguation of search terms.
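
To show how cheaply a server could degrade from a polygon to a bounding box, here is a minimal sketch. It assumes the extent arrives as a well-known-text style <code>POLYGON</code> string with one exterior ring and optional interior rings; that encoding is an assumption for illustration, not something the draft specifies. The parser is deliberately naive and the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def parse_wkt_polygon(wkt: str) -> List[List[Point]]:
    """Parse 'POLYGON ((x y, ...), (x y, ...))' into a list of rings.

    The first ring is the exterior; any further rings are interior holes.
    """
    text = wkt.strip()
    if not text.upper().startswith("POLYGON"):
        raise ValueError("this sketch only handles POLYGON")
    # Keep what lies between the outermost parentheses.
    body = text[text.index("(") + 1 : text.rindex(")")]
    rings: List[List[Point]] = []
    for ring_text in body.split("),"):
        coords = ring_text.strip().strip("()")
        ring: List[Point] = []
        for pair in coords.split(","):
            x, y = pair.split()
            ring.append((float(x), float(y)))
        rings.append(ring)
    return rings


def minimum_bounding_rectangle(rings: List[List[Point]]) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) of the exterior ring."""
    exterior = rings[0]
    xs = [x for x, _ in exterior]
    ys = [y for _, y in exterior]
    return (min(xs), min(ys), max(xs), max(ys))


EXTENT = "POLYGON ((0 0, 10 0, 10 5, 0 5, 0 0), (2 1, 4 1, 4 3, 2 3, 2 1))"
rings = parse_wkt_polygon(EXTENT)
print(len(rings))                          # exterior ring plus one hole
print(minimum_bounding_rectangle(rings))
```

Only the exterior ring matters for the MBR, so a bounding-box-only server can ignore interior rings entirely while a richer server keeps them.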

I'm interested in what the expected semantics of resource language are on retrieval of language-neutral data sets. Should a result record not be selected because the user specified "Nor" as the search language, even though it matches the other criteria (Geo Extent, for example)? Normally in Info Retrieval this is a no-brainer, of course it shouldn't, but I'm a bit less certain when we talk about result records that aren't primarily "Text" based. (Actually, this is a slightly wider concern about Annex A and those "CharacterString" elements. In IEEE LOM, for example, we have a "LangString" element that has a "Lang" attribute. That community chose to allow language variants of a resource to be expressed within one record by allowing an element to hold all language variants, for example
<pre>
<title>
<langstring lang="En">Hello</langstring>
<langstring lang="Dk">Hej</langstring>
</title>
</pre>
The presence of a "Lang" attribute at the "Dataset" level might mean the intention is to support multi-language datasets by having several dataset records, one for each language, which is OK, but possibly not optimal for datasets that aren't primarily language based. If this is the case, is the "CharacterString" element in Annex A just redundant payload?)
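
A minimal sketch of how a client might consume a multi-variant element like the one above, using Python's standard <code>xml.etree.ElementTree</code>. The element and attribute names are copied from the example; the fallback policy (preferred language, then a default, then whatever is first) is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# The example element from the text, unchanged.
RECORD = """<title>
<langstring lang="En">Hello</langstring>
<langstring lang="Dk">Hej</langstring>
</title>"""


def pick_title(xml_text, preferred, fallback="en"):
    """Return the variant for the preferred language, else a fallback variant."""
    root = ET.fromstring(xml_text)
    # Map lower-cased language tag -> text for every variant in the element.
    variants = {
        el.get("lang", "").lower(): (el.text or "")
        for el in root.findall("langstring")
    }
    if preferred.lower() in variants:
        return variants[preferred.lower()]
    if fallback.lower() in variants:
        return variants[fallback.lower()]
    # No preferred or fallback variant: take the first one, if any.
    return next(iter(variants.values()), None)


print(pick_title(RECORD, "Dk"))
print(pick_title(RECORD, "Nor"))
```

With all variants in one record, a search for "Nor" can still return the record and simply display the fallback title, rather than excluding the dataset outright.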

=== Lack of machine-reusable data in general ===

Dataset 'lineage' is only a full-text field. If datasets result from recombination, that should be machine-traversable. Human descriptions of lineage will be so different that they won't be useful for building search / evaluation services.
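
As a sketch of what "machine-traversable" could mean in practice, lineage can be held as links from a dataset identifier to the identifiers of its sources, and then walked. The identifiers and the dictionary layout below are hypothetical; the draft defines no such structure.

```python
# Hypothetical lineage links: dataset identifier -> identifiers it was derived from.
LINEAGE = {
    "urn:example:merged": ["urn:example:roads", "urn:example:rivers"],
    "urn:example:roads": ["urn:example:survey-2005"],
}


def ancestors(lineage, dataset_id):
    """Return every dataset that dataset_id was directly or indirectly derived from."""
    seen = set()
    stack = list(lineage.get(dataset_id, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        stack.extend(lineage.get(current, []))
    return seen


print(sorted(ancestors(LINEAGE, "urn:example:merged")))
```

A search or evaluation service could use a traversal like this to answer "which published datasets depend on this source?", which free-text lineage cannot support.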

=== Lack of engagement with packaging and re-use issues ===

Cf. Dataset series / aggregates. The examples have 'MasterMap' as one potential dataset! Real world use cases are going to need subsets of such huge data sets broken down into packages with smaller spatial extents or with fewer layers.

II: Indeed, as well as the SRW/SRU binding experiment, I've been wondering about the OAI binding, which I know you've already discussed elsewhere. What might be generally useful (and maybe this already exists) is a set of TREC-style test data. Setting up a static gateway OAI server wouldn't be so hard, and might give us some valuable real-world information about this problem.

=== Bypassing of feature-level metadata from consideration ===

Once we get down to the feature level the interesting European problems appear: every local area may have its own classification schemes, even inside one language community the same word is used to describe different-looking things, and across language barriers mappings from words to things don't tend to be 1-1. But by disregarding feature-level metadata - partly because it can't be mandated when the underlying geospatial objects aren't publicly inspectable and a certain amount of feature level metadata would mean the data itself is essentially public...

=== Overspecificness about internet- and webservices- based distribution models ===

We are actually causing ourselves unnecessary problems by putting everything on the Internet. Data sharing agreements over publicly maintained private networks with flat-rate membership are a clear potential future and 'middle way' in this domain. The draft as it stands is all about making access/use constraints *specific to data sets* and not specific to the relationship between the data provider or broker, the data user and the transport network between them.

So we have a 'distributed computing platform' metadata property that is required by the IRs. In the ISO19115 mapping in Annex A this is a '''free text field''', yet 5.2.15 states that the property "is necessary for a client to bind to the service". If it must be mandated, it should be as a URI. It would be wonderful to have examples of what, other than HTTP or OGC web services, is envisaged NOW as a means of access to the backend of a distributed computing platform.
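
The difference between a free text value and a URI a client can act on is easy to demonstrate. This minimal sketch uses Python's standard <code>urllib.parse</code>; the check (a scheme plus a network location) and the sample values are assumptions for illustration, not rules taken from the draft.

```python
from urllib.parse import urlparse


def is_bindable(value: str) -> bool:
    """True if the value parses as an absolute URI with a scheme and a host."""
    parsed = urlparse(value.strip())
    return bool(parsed.scheme) and bool(parsed.netloc)


# Hypothetical values for the 'distributed computing platform' property.
print(is_bindable("http://example.org/services/wms"))
print(is_bindable("Web services over HTTP"))
```

A free text description passes human review but gives a client nothing to connect to, which is the gap between the Annex A mapping and the stated purpose in 5.2.15.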

Reading the INSPIRE Metadata Draft

2007-03-14T11:20:45Z

Wiki-Ianibbo: /* Lack of engagement with packaging and re-use issues */

Metadata about geographic data is at the heart of INSPIRE. The metadata draft is the first in the set of "implementing rules" and it will underpin all the other implementing rules. The consultation process is open until 2007-03-30. While the documents are open access, comments can only be offered through an SDIC or Spatial Data Interest Community.

The Free and Open Source Geospatial Community has a voice through one of these SDICs thanks to Markus Neteler. This page contains preparatory material for a collective response through the FOSS GIS SDIC, from the POV of people implementing and managing metadata creation, collection and search services, working closely with many different data user communities.

* The response proper will live at [[Response to INSPIRE Metadata Draft]]. Initial notes are included below in the Issues section.
* It is interesting to read this in parallel with the North American Metadata Profile draft which is also currently in consultation. It's hoped the OSGeo community will also be able to contribute to a [[Response to NAP Metadata Draft]] and get the [http://geodatacommons.umaine.edu geodata commons] project involved in this.

== Reading the draft ==

* [http://www.ec-gis.org/inspire/reports/ImplementingRules/draftINSPIREMetadataIRv2_20070202.pdf the Implementing Rules for Metadata Draft] (pdf)
* [http://www.ec-gis.org/inspire/whatsnew.cfm#1590 supporting / background material]
* Pages 1-17 are metadata about the document itself, intentions and history, and can be safely skipped. Pages 43-104 are the Annexes.
* Annex A is particularly interesting as there are details of the thinking exposed in the mapping to ISO19115/39 that aren't set out in the implementing rules. ''If you want to know what's likely to affect you but are short on time, at minimum read section 5 and Annex A''.

== Lightning Summary of the draft ==

The draft establishes a basic information model for metadata which is close to, but not specific to, ISO19115 and OGC Web Services.

It only mandates what metadata is published by and for public authorities covered by INSPIRE - it does not try to cover repository management or internal processes.

It separates out metadata properties into those useful for 'discovery', 'evaluation', and 'use'. It identifies one very high level "use case" for spatial data search services built from metadata being shared at this level.

It differentiates between properties useful for 'non-specialist' and 'expert' users into 2 Levels, 1 and 2. Level 1 is always mandatory. This *includes* classification according to the data themes in the INSPIRE annexes, and keywords from controlled vocabularies which are not covered by the IR document but are left to Spatial Data Theme Communities. (How these communities are found, selected, and make their decisions, is unknown to us at this time.)

= Issues =

'''This list is an overview of what jumped out at me as something to address. I don't know how much of this is appropriate to send back, or how much can be fixed. - [[User:JoWalsh]]'''

== Conceptual overview ==

The model maps quite well to the minimum useful subset identified in [[DCLite4G]]. It looks like a lightweight core. But, the model and the draft break down the problem space of metadata in a way that is a reaction to artificial scarcity of data. It identifies three phases of the metadata use cycle:
* discovery (of what data is out there)
* evaluation (of whether the data will be useful for specific purpose)
* use (once access gained, how to best use the data)

It is illuminating to compare this with the [[Reading the NAP Metadata Draft|North American Profile]] metadata draft, which talks about

* discovery
* access
* fitness for use (e.g. evaluation)
* transfer

So the IRs don't address how to make the data more useful via metadata, and they are vague about whether a minimal subset is going to provide enough information to evaluate utility on. Generally the draft dances around data licensing and access issues, and glosses over the over-engineering needed to work around artificial constraints on availability. IRs for evaluation and use of data based on metadata are not covered by this draft at all, but left up to the Spatial Data Theme communities for each of the 35 data themes identified in Annexes I-III of the INSPIRE text.

== Issues with specific metadata properties ==

The model maps quite well to [[DCLite4G]]. It looks like a lightweight core.

=== Things that aren't there that should be ===

'''5.2.8 Resource responsible party'''. Each dataset *must* have one or more people/organisations responsible for it. The IR says that this can be freetext or can be in more structured form. This '''only''' includes the responsible party's name, but NOT any form of contact details.

'''Some form of electronic or telephonic contact address should be mandatory, if the org/person's details are mandatory.''' Why publish ownership information - especially if there are constraints on access and reuse of the described data - if you can't immediately get in personal contact with someone who can make assurances about the data?

Annex A on mapping to ISO19115 mandates that contact persons and organisations be free text, not resource identifiers. There are two serious problems with the ISO 19115 mapping:

* It does not ask or provide for contact details.
* It looks *mandatory* that the responsible party be given a role, which in turn is one of N codes published by the Library of Congress to describe people's roles within organisations.

'''No discussion of formalising dataset accuracy / completeness - crucial for cost-benefit evaluation / evaluation of suitability for combining with other data sets.'''

=== Things that are there that probably shouldn't be ===

Every 'dataset or dataset series' published under INSPIRE *must* include both a '''Resource topic category''' and a set of resource '''keywords'''.

Topic categories are very high-level classifications which correspond to each of the Spatial Data Themes identified in Annexes I, II and III of the INSPIRE Directive.

* Which topic category data fits in will often be a property of an organisation, not of any published data sets.

From an implementor's POV this will involve something like selecting a topic category for data at install time of the metadata publishing engine, and then forgetting about it. The IRs place a lot of faith in the ability of simple keyword / classification code matches to enhance the utility of search and discovery services for users.

But this already raises the bar for non-expert users: the domain vocabulary is jargon-specific or oriented towards specialist codes.

The IRs emphasise the fact that keywords should originate from a ''controlled vocabulary''. The responsibility for creating one is not in the hands of the Drafting Teams but in the hands of Spatial Data Theme Communities. How these are constituted and how their decisions become binding are unclear.

Again, faith in keywords for search utility is misplaced. Reliance on them may lead to false negatives. It also assumes that a non-expert user has familiarity with, or the time and ability to learn about, what to expect in the domain, while an expert will need a finer level of detail. Pitfalls of 'controlled' keywording:
* intentional misclassification
* lazy/default misclassification

Both of these are at 'Level 1 for discovery metadata' which implies that any INSPIRE compliant metadata set MUST have both topic category and associated keywords.

== Areas which are unclear ==

=== Conformity ===

This is an IR and obligatory to deal with. But 5.3.4 just says "see Annex F".
Annex F '''in its entirety''' says:

"The way in which conformity is expressed in the INSPIRE IR will be defined in a subsequent draft based on discussions with the Drafting Team on Data specifications and harmonization."

(Is this where accuracy/completeness comes in? How can we know?)

=== Dataset series / Aggregate data ===

The IR talks about dataset series. Some of the diagrams mention 'MD_Aggregates', a term that isn't used elsewhere. The model has no conception of one UrDataSet with many different potential sources according to how they are packaged or processed. As the IRs mandate properties for dataset series, we really need more clarity and examples about what they actually are.

== General concerns ==

=== Search / discovery services ===

The preamble (p.7) states that "separate IRs for discovery services are being prepared and are not the subject of this document." But the INSPIRE use case is predicated on the availability of 'Geoportal' style search services. What else *are* discovery services if they are not the search services treated here? If there is only going to be an abstract model for discovery, and these IRs are careful to avoid imposing any constraints on internal data repository management, how much more can a discovery services draft provide?

II: I think this comment is spot on, though I'm finding it difficult to express concrete concerns. Section 5.2 "Discovery metadata elements" starts to set out a list of concepts seen to be core to the discovery process (the document hints at this, but does not directly say so). Section 5.3 then sets out an "Abstract discovery metadata element set". I *guess* the implication is that the concepts laid out in 5.2 are in some way even more abstract than those set out in 5.3. The document really isn't clear about what the abstract model is, or what it is for, before it starts enumerating the concepts. Your later comment about being tied to web services is also spot on here: I'm really not sure "Service type version", "Operation name" and "Distributed computing platform" belong in an abstract model. These three attributes seem to belong specifically to a particular (and I would guess already existing) service binding of the abstract model onto some concrete semantics, or to a very specific kind of returned result record. What I'd really like to see is a much clearer statement of what the purpose of the abstract model is. Hopefully, once that is tightly defined, it should become easier to decide what lies inside the boundary of the abstract model, and what belongs in the domain of specific realisations of the abstract model.

I'm a bit confused by the "Temporal Reference" element. 5.2.2 talks about what I would expect to see from a temporal reference, but 5.3.2 maps temporal reference on to "One of the dates of publication, last revision or creation of the resource". These three elements are already well defined by Dublin Core attributes. Maybe I've misunderstood what's implied by table 1 in 5.3.2. Also, similar issues to the spatial access point arise (with structured data, as opposed to text queries). In some UK datasets, periods such as "Neolithic" can be used instead of an ISO 19108 Date Time. (I see that note 11 under 5.3.4 talks about this, which is good. What's important is that, regardless of the outcome of the study, the IRs are extensible enough to cope with the eventual decision.) I'd consider separate access points for controlled vocabulary time periods and structured temporal data.

Geographic Extent: the doc seems a bit bounding box heavy. It would be nice to understand (and have examples of) the specification of interior/exterior polygons. Servers only supporting minimal bounding boxes can gracefully degrade (since it's easy to calculate an MBR from a polygon) whilst allowing other servers to retain the full richness of polygons. It's not clear where the semantics for parsing these strings will be defined. For example, should geographic extent be encoded as OpenGIS strings? (That seems to make sense to me, but I'm biased by Oracle and MySQL's spatial functions.) This might seem a bit extreme for the abstract part of the document, but it's one of those make-or-break issues for interoperability, and might be worth the pain. Also, I think it's worth entertaining the idea that spatial specifications such as MBRs and polygons (structured spatial constructs) might be better exposed using their own abstract access point, with "Place Name" having its own access point. This will help server implementors avoid problems with disambiguation of search terms.

I'm interested in what the expected semantics of resource language are on retrieval of language-neutral data sets. Should a result record not be selected because the user specified "Nor" as the search language, even though it matches the other criteria (Geo Extent, for example)? Normally in Info Retrieval this is a no-brainer, of course it shouldn't, but I'm a bit less certain when we talk about result records that aren't primarily "Text" based. (Actually, this is a slightly wider concern about Annex A and those "CharacterString" elements. In IEEE LOM, for example, we have a "LangString" element that has a "Lang" attribute. That community chose to allow language variants of a resource to be expressed within one record by allowing an element to hold all language variants, for example
<pre>
<title>
<langstring lang="En">Hello</langstring>
<langstring lang="Dk">Hej</langstring>
</title>
</pre>
The presence of a "Lang" attribute at the "Dataset" level might mean the intention is to support multi-language datasets by having several dataset records, one for each language, which is OK, but possibly not optimal for datasets that aren't primarily language based. If this is the case, is the "CharacterString" element in Annex A just redundant payload?)

=== Lack of machine-reusable data in general ===

Dataset 'lineage' is only a full-text field. If datasets result from recombination, that should be machine-traversable. Human descriptions of lineage will be so different that they won't be useful for building search / evaluation services.

=== Lack of engagement with packaging and re-use issues ===

Cf. Dataset series / aggregates. The examples have 'MasterMap' as one potential dataset! Real world use cases are going to need subsets of such huge data sets broken down into packages with smaller spatial extents or with fewer layers.

Indeed, as well as the SRW/SRU binding experiment, I've been wondering about the OAI binding, which I know you've already discussed elsewhere. What might be generally useful (and maybe this already exists) is a set of TREC-style test data. Setting up a static gateway OAI server wouldn't be so hard, and might give us some valuable real-world information about this problem.

=== Bypassing of feature-level metadata from consideration ===

Once we get down to the feature level the interesting European problems appear: every local area may have its own classification schemes, even inside one language community the same word is used to describe different-looking things, and across language barriers mappings from words to things don't tend to be 1-1. But by disregarding feature-level metadata - partly because it can't be mandated when the underlying geospatial objects aren't publicly inspectable and a certain amount of feature level metadata would mean the data itself is essentially public...

=== Overspecificness about internet- and webservices- based distribution models ===

We are actually causing ourselves unnecessary problems by putting everything on the Internet. Data sharing agreements over publicly maintained private networks with flat-rate membership are a clear potential future and 'middle way' in this domain. The draft as it stands is all about making access/use constraints *specific to data sets* and not specific to the relationship between the data provider or broker, the data user and the transport network between them.

So we have a 'distributed computing platform' metadata property that is required by the IRs. In the ISO19115 mapping in Annex A this is a '''free text field''', yet 5.2.15 states that the property "is necessary for a client to bind to the service". If it must be mandated, it should be as a URI. It would be wonderful to have examples of what, other than HTTP or OGC web services, is envisaged NOW as a means of access to the backend of a distributed computing platform.

Reading the INSPIRE Metadata Draft

2007-03-14T11:17:21Z

Wiki-Ianibbo: /* Search / discovery services */

Metadata about geographic data is at the heart of INSPIRE. The metadata draft is the first in the set of "implementing rules" and it will underpin all the other implementing rules. The consultation process is open until 2007-03-30. While the documents are open access, comments can only be offered through an SDIC or Spatial Data Interest Community.

The Free and Open Source Geospatial Community has a voice through one of these SDICs thanks to Markus Neteler. This page contains preparatory material for a collective response through the FOSS GIS SDIC, from the POV of people implementing and managing metadata creation, collection and search services, working closely with many different data user communities.

* The response proper will live at [[Response to INSPIRE Metadata Draft]]. Initial notes are included below in the Issues section.
* It is interesting to read this in parallel with the North American Metadata Profile draft which is also currently in consultation. It's hoped the OSGeo community will also be able to contribute to a [[Response to NAP Metadata Draft]] and get the [http://geodatacommons.umaine.edu geodata commons] project involved in this.

== Reading the draft ==

* [http://www.ec-gis.org/inspire/reports/ImplementingRules/draftINSPIREMetadataIRv2_20070202.pdf the Implementing Rules for Metadata Draft] (pdf)
* [http://www.ec-gis.org/inspire/whatsnew.cfm#1590 supporting / background material]
* Pages 1-17 are metadata about the document itself, intentions and history, and can be safely skipped. Pages 43-104 are the Annexes.
* Annex A is particularly interesting as there are details of the thinking exposed in the mapping to ISO19115/39 that aren't set out in the implementing rules. ''If you want to know what's likely to affect you but are short on time, at minimum read section 5 and Annex A''.

== Lightning Summary of the draft ==

The draft establishes a basic information model for metadata which is close to, but not specific to, ISO19115 and OGC Web Services.

It only mandates what metadata is published by and for public authorities covered by INSPIRE - it does not try to cover repository management or internal processes.

It separates out metadata properties into those useful for 'discovery', 'evaluation', and 'use'. It identifies one very high level "use case" for spatial data search services built from metadata being shared at this level.

It differentiates between properties useful for 'non-specialist' and 'expert' users into 2 Levels, 1 and 2. Level 1 is always mandatory. This *includes* classification according to the data themes in the INSPIRE annexes, and keywords from controlled vocabularies which are not covered by the IR document but are left to Spatial Data Theme Communities. (How these communities are found, selected, and make their decisions, is unknown to us at this time.)

= Issues =

'''This list is an overview of what jumped out at me as something to address. I don't know how much of this is appropriate to send back, or how much can be fixed. - [[User:JoWalsh]]'''

== Conceptual overview ==

The model maps quite well to the minimum useful subset identified in [[DCLite4G]]. It looks like a lightweight core. But, the model and the draft break down the problem space of metadata in a way that is a reaction to artificial scarcity of data. It identifies three phases of the metadata use cycle:
* discovery (of what data is out there)
* evaluation (of whether the data will be useful for specific purpose)
* use (once access gained, how to best use the data)

It is illuminating to compare this with the [[Reading the NAP Metadata Draft|North American Profile]] metadata draft, which talks about

* discovery
* access
* fitness for use (e.g. evaluation)
* transfer

So the IRs don't address how to make the data more useful via metadata, and they are vague about whether a minimal subset is going to provide enough information to evaluate utility on. Generally the draft dances around data licensing and access issues, and glosses over the over-engineering needed to work around artificial constraints on availability. IRs for evaluation and use of data based on metadata are not covered by this draft at all, but left up to the Spatial Data Theme communities for each of the 35 data themes identified in Annexes I-III of the INSPIRE text.

== Issues with specific metadata properties ==

The model maps quite well to [[DCLite4G]]. It looks like a lightweight core.

=== Things that aren't there that should be ===

'''5.2.8 Resource responsible party'''. Each dataset *must* have one or more people/organisations responsible for it. The IR says that this can be freetext or can be in more structured form. This '''only''' includes the responsible party's name, but NOT any form of contact details.

'''Some form of electronic or telephonic contact address should be mandatory, if the org/person's details are mandatory.''' Why publish ownership information - especially if there are constraints on access and reuse of the described data - if you can't immediately get in personal contact with someone who can make assurances about the data?

Annex A on mapping to ISO19115 mandates that contact persons and organisations be free text, not resource identifiers. There are two serious problems with the ISO 19115 mapping:

* It does not ask or provide for contact details.
* It looks *mandatory* that the responsible party be given a role, which in turn is one of N codes published by the Library of Congress to describe people's roles within organisations.

'''No discussion of formalising dataset accuracy / completeness - crucial for cost-benefit evaluation / evaluation of suitability for combining with other data sets.'''

=== Things that are there that probably shouldn't be ===

Every 'dataset or dataset series' published under INSPIRE *must* include both a '''Resource topic category''' and a set of resource '''keywords'''.

Topic categories are very high-level classifications which correspond to each of the Spatial Data Themes identified in Annexes I, II and III of the INSPIRE Directive.

* Which topic category data fits in will often be a property of an organisation, not of any published data sets.

From an implementor's POV this will involve something like selecting a topic category for data at install time of the metadata publishing engine, and then forgetting about it. The IRs place a lot of faith in the ability of simple keyword / classification code matches to enhance the utility of search and discovery services for users.

But this already raises the bar for non-expert users: the domain vocabulary is jargon-specific or oriented towards specialist codes.

The IRs emphasise the fact that keywords should originate from a ''controlled vocabulary''. The responsibility for creating one is not in the hands of the Drafting Teams but in the hands of Spatial Data Theme Communities. How these are constituted and how their decisions become binding are unclear.

Again, faith in keywords for search utility is misplaced. Reliance on them may lead to false negatives. It also assumes that a non-expert user has familiarity with, or the time and ability to learn about, what to expect in the domain, while an expert will need a finer level of detail. Pitfalls of 'controlled' keywording:
* intentional misclassification
* lazy/default misclassification

Both of these are at 'Level 1 for discovery metadata' which implies that any INSPIRE compliant metadata set MUST have both topic category and associated keywords.

== Areas which are unclear ==

=== Conformity ===

This is an IR and obligatory to deal with. But 5.3.4 just says "see Annex F".
Annex F '''in its entirety''' says:

"The way in which conformity is expressed in the INSPIRE IR will be defined in a subsequent draft based on discussions with the Drafting Team on Data specifications and harmonization."

(Is this where accuracy/completeness comes in? How can we know?)

=== Dataset series / Aggregate data ===

The IR talks about dataset series. Some of the diagrams mention 'MD_Aggregates', a term that isn't used elsewhere. The model has no conception of one UrDataSet with many different potential sources according to how they are packaged or processed. As the IRs mandate properties for dataset series, we really need more clarity and examples about what they actually are.

== General concerns ==

=== Search / discovery services ===

The preamble (p.7) states that "separate IRs for discovery services are being prepared and are not the subject of this document." But the INSPIRE use case is predicated on the availability of 'Geoportal' style search services. What else *are* discovery services if they are not the search services treated of here? If there is only going to be an abstract model for discovery, and these IRs are careful to avoid imposing any constraints on internal data repository management, how much more can a discovery services draft provide?

II: I think this comment is spot on, I'm finding it difficult to express concrete concerns.. but Section 5.2 "Discovery metadata elements" starts to set out a list of concepts seen to be (The document hints at, but does not directly say) core to the discovery process. Section 5.3 then sets out "Abstract discovery metadata element set". I *guess* the implication is that the concepts laid out in 5.2 are in some way even more abstract than those set out in 5.3. The document really isn't clear about what the abstract model is, or what it is for, before it starts enumerating the concepts. Your later comment about being tied to web services is spot on also here, I'm really not sure "Service type version", "Operation name" and "Distributed computing platform" belong in an abstract model. These three attributes seem to belong specifically to a particular (And I would guess already existing) service binding (Or to a very specific kind of returned result record) of the abstract model onto some concrete semantics. What I'd really like to see is a much clearer statement of what the purpose of the abstract model is. Hopefully, once that is tightly defined, it should become easier to decide what lies inside the boundary of the abstract model, and what belongs in the domain of specific realisations of the abstract model.

I'm a bit confused by the "Temporal Reference" element. 5.2.2 talks about what I would expect to see from a temporal reference, but 5.3.2 maps temporal reference on to "One of the dates of publication, last revision or creation of the resource". These three elements are already well defined by Dublin Core attributes. Maybe I've misunderstood what's implied by table 1 in 5.3.2. Also, similar issues to the spatial access point arise (with structured data, as opposed to text queries). In some UK datasets, periods such as "Neolithic" can be used instead of an ISO 19108 date-time. (I see that note 11 under 5.3.4 talks about this, which is good. What's important is that, regardless of the outcome of the study, the IRs are extensible enough to cope with the eventual decision.) I'd consider separate access points for controlled-vocabulary time periods and structured temporal data.

Geographic extent: the doc seems a bit bounding-box heavy. It would be nice to understand (have examples of) the specification of interior/exterior polygons. Servers only supporting minimal bounding boxes can gracefully degrade (since it's easy to calculate an MBR, a minimal bounding rectangle, from a polygon) whilst allowing other servers to retain the full richness of polygons. It's not clear where the semantics for parsing these strings will be defined. For example, should geographic extent be encoded as OpenGIS strings? (That seems to make sense to me, but I'm biased by Oracle and MySQL's spatial functions.) This might seem a bit extreme for the abstract part of the document, but it's one of those make-or-break issues for interoperability, and might be worth the pain. Also, I think it's worth entertaining the idea that spatial specifications such as MBRs and polygons (structured spatial constructs) might be better exposed using their own abstract access point, with "Place Name" having its own access point. This will help server implementors avoid problems with disambiguation of search terms.
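
The graceful-degradation point can be sketched in a few lines. This is only an illustration, assuming an extent arrives as a list of (x, y) coordinates for the exterior ring; the function name <code>mbr_from_ring</code> and the sample coordinates are made up for the example and are not taken from the draft. Interior rings can be ignored here because they lie inside the exterior ring and so cannot widen the rectangle.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def mbr_from_ring(ring: Iterable[Point]) -> BBox:
    """Return the minimal bounding rectangle of a polygon's exterior ring."""
    points = list(ring)
    if not points:
        raise ValueError("ring has no coordinates")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical extent: a rough triangle, coordinates as (longitude, latitude).
triangle = [(-3.0, 55.0), (1.5, 52.0), (-5.5, 50.0), (-3.0, 55.0)]
print(mbr_from_ring(triangle))
```

A server that only indexes bounding boxes could store the result of such a function, while a polygon-aware server keeps the original ring.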

I'm interested in what the expected semantics of resource language are on retrieval of language-neutral data sets. Should a result record not be selected because the user specified "Nor" as the search language, even though it matches the other criteria (geographic extent, for example)? Normally in information retrieval this is a no-brainer, of course it shouldn't, but I'm a bit less certain when we talk about result records that aren't primarily text based. (Actually, this is a slightly wider concern about Annex A and those "CharacterString" elements. In IEEE LOM, for example, we have a "LangString" element that has a "Lang" attribute. That community chose to allow language variants of a resource to be expressed within one record by allowing an element to hold all language variants, for example:
<pre>
<title>
<langstring lang="En">Hello</langstring>
<langstring lang="Dk">Hej</langstring>
</title>
</pre>
The presence of a "Lang" attribute at the "Dataset" level might mean the intention is to support multi-language datasets by having several dataset records, one for each language, which is OK, but possibly not optimal for datasets that aren't primarily language based. If this is the case, is the "CharacterString" element in Annex A just redundant payload?)
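
To make the one-record, many-languages idea concrete, here is a minimal sketch of how a client might pick a title from the fragment above. It assumes the record looks exactly like that fragment; the helper name <code>pick_title</code> and the fallback-to-English rule are assumptions for illustration, not anything specified by IEEE LOM or the IRs.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# The fragment quoted above, reused verbatim as sample input.
RECORD = """
<title>
<langstring lang="En">Hello</langstring>
<langstring lang="Dk">Hej</langstring>
</title>
"""


def pick_title(xml_text: str, wanted: str, fallback: str = "En") -> Optional[str]:
    """Return the variant for `wanted`, else the fallback variant, else None."""
    root = ET.fromstring(xml_text.strip())
    # Map lower-cased language code -> text of that variant.
    variants = {
        el.get("lang", "").lower(): (el.text or "")
        for el in root.iter("langstring")
    }
    if wanted.lower() in variants:
        return variants[wanted.lower()]
    return variants.get(fallback.lower())


print(pick_title(RECORD, "Dk"))
print(pick_title(RECORD, "Nor"))
```

With this shape, a search for "Nor" still returns the record and simply falls back to another language for display, which is one possible answer to the question raised above.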

=== Lack of machine-reusable data in general ===

Dataset 'lineage' is only a full-text field. If datasets result from recombination, that should be machine-traversable. Human descriptions of lineage will be so different that they won't be useful for building search / evaluation services.
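
As a rough sketch of what "machine-traversable" could mean, assume each metadata record carried a list of identifiers for the datasets it was derived from. The identifiers, the <code>DERIVED_FROM</code> table and the <code>ancestors</code> function below are all hypothetical; the draft defines no such property.

```python
from typing import Dict, List, Set

# Hypothetical lineage links: dataset identifier -> identifiers it was derived from.
DERIVED_FROM: Dict[str, List[str]] = {
    "urn:example:flood-risk": ["urn:example:terrain", "urn:example:rivers"],
    "urn:example:terrain": ["urn:example:lidar-survey"],
    "urn:example:rivers": [],
    "urn:example:lidar-survey": [],
}


def ancestors(dataset_id: str, links: Dict[str, List[str]]) -> Set[str]:
    """Collect every dataset that `dataset_id` directly or indirectly derives from."""
    seen: Set[str] = set()
    stack = list(links.get(dataset_id, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        stack.extend(links.get(current, []))
    return seen


print(sorted(ancestors("urn:example:flood-risk", DERIVED_FROM)))
```

An evaluation service could use a traversal like this to answer "which source surveys does this product depend on?", which a free-text lineage field cannot support.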

=== Lack of engagement with packaging and re-use issues ===

Cf. Dataset series / aggregates. The examples have 'MasterMap' as one potential dataset! Real world use cases are going to need subsets of such huge data sets broken down into packages with smaller spatial extents or with fewer layers.

=== Bypassing of feature-level metadata from consideration ===

Once we get down to the feature level the interesting European problems appear: every local area may have its own classification schemes, even inside one language community the same word is used to describe different-looking things, and across language barriers mappings from words to things don't tend to be 1-1. But by disregarding feature-level metadata - partly because it can't be mandated when the underlying geospatial objects aren't publicly inspectable and a certain amount of feature level metadata would mean the data itself is essentially public...

=== Overspecificness about internet- and webservices- based distribution models ===

We are actually causing ourselves unnecessary problems by putting everything on the Internet. Data sharing agreements over publicly maintained private networks with flat-rate membership are a clear potential future and 'middle way' in this domain. The draft as it stands is all about making access/use constraints *specific to data sets* and not specific to the relationship between the data provider or broker, the data user and the transport network between them.

So we have a 'distributed computing platform' metadata property that is required by the IRs. In the ISO 19115 mapping in Annex A this is a '''free text field''', yet 5.2.15 states that the property "is necessary for a client to bind to the service". If it must be mandated, it should be as a URI. It would be wonderful to have examples of what other than HTTP or OGC web services is envisaged NOW as a means of access to the backend of a distributed computing platform.
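
The difference between free text and a URI matters to a client because only the latter can be checked mechanically. A deliberately crude sketch, assuming nothing more than "the value must parse with a URI scheme"; the function name <code>looks_like_uri</code> and the sample values are invented for illustration:

```python
from urllib.parse import urlparse


def looks_like_uri(value: str) -> bool:
    """Crude check: does the value parse with a URI scheme such as http or urn?"""
    return bool(urlparse(value.strip()).scheme)


# Hypothetical values a publisher might enter for the property.
candidates = [
    "http://example.org/services/wms",
    "web services over HTTP",
]
for value in candidates:
    print(value, "->", "URI" if looks_like_uri(value) else "free text")
```

A real profile would need a stricter rule (for instance a list of permitted schemes), but even this much is impossible to enforce against a free text field.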

Reading the INSPIRE Metadata Draft

2007-03-14T11:16:46Z

Wiki-Ianibbo: /* Search / discovery services */

Metadata about geographic data is at the heart of INSPIRE. The metadata draft is the first in the set of "implementing rules" and it will underpin all the other implementing rules. The consultation process is open until 2007-03-30. While the documents are open access, comments can only be offered through an SDIC or Spatial Data Interest Community.

The Free and Open Source Geospatial Community has a voice through one of these SDICs thanks to Markus Neteler. This page contains preparatory material for a collective response through the FOSS GIS SDIC, from the POV of people implementing and managing metadata creation, collection and search services, working closely with many different data user communities.

* The response proper will live at [[Response to INSPIRE Metadata Draft]]. Initial notes are included below in the Issues section.
* It is interesting to read this in parallel with the North American Metadata Profile draft which is also currently in consultation. It's hoped the OSGeo community will also be able to contribute to a [[Response to NAP Metadata Draft]] and get the [http://geodatacommons.umaine.edu geodata commons] project involved in this.

== Reading the draft ==

* [http://www.ec-gis.org/inspire/reports/ImplementingRules/draftINSPIREMetadataIRv2_20070202.pdf the Implementing Rules for Metadata Draft] (pdf)
* [http://www.ec-gis.org/inspire/whatsnew.cfm#1590 supporting / background material]
* Pages 1-17 are metadata about the document itself, intentions and history, and can be safely skipped. Pages 43-104 are the Annexes.
* Annex A is particularly interesting as there are details of the thinking exposed in the mapping to ISO19115/39 that aren't set out in the implementing rules. ''If you want to know what's likely to affect you but are short on time, at minimum read section 5 and Annex A''.

== Lightning Summary of the draft ==

The draft establishes a basic information model for metadata which is close to, but not specific to, ISO19115 and OGC Web Services.

It only mandates what metadata is published by and for public authorities covered by INSPIRE - it does not try to cover repository management or internal processes.

It separates out metadata properties into those useful for 'discovery', 'evaluation', and 'use'. It identifies one very high level "use case" for spatial data search services built from metadata being shared at this level.

It differentiates between properties useful for 'non-specialist' and 'expert' users into 2 Levels, 1 and 2. Level 1 is always mandatory. This *includes* classification according to the data themes in the INSPIRE annexes, and keywords from controlled vocabularies which are not covered by the IR document but are left to Spatial Data Theme Communities. (How these communities are found, selected, and make their decisions, is unknown to us at this time.)

= Issues =

'''This list is an overview of what jumped out at me as something to address. I don't know how much of this is appropriate to send back, or how much can be fixed. - [[User:JoWalsh]]'''

== Conceptual overview ==

The model maps quite well to the minimum useful subset identified in [[DCLite4G]]. It looks like a lightweight core. But, the model and the draft break down the problem space of metadata in a way that is a reaction to artificial scarcity of data. It identifies three phases of the metadata use cycle:
* discovery (of what data is out there)
* evaluation (of whether the data will be useful for specific purpose)
* use (once access gained, how to best use the data)

It is illuminating to compare this with the [[Reading the NAP Metadata Draft|North American Profile]] metadata draft which talks about

* discovery
* access
* fitness for use (i.e. evaluation)
* transfer

So the IRs both fail to address how to make the data more useful via metadata, and are vague about whether a minimal subset is going to provide enough information to evaluate utility. Generally the draft dances around data licensing and access issues, and glosses over the over-engineering needed to work around artificial constraints on availability. IRs for evaluation and use of data based on metadata are not covered by this draft at all, but left up to the Spatial Data Theme communities for each of the 35 data themes identified in Annexes I-III of the INSPIRE text.

== Issues with specific metadata properties ==

The model maps quite well to [[DCLite4G]]. It looks like a lightweight core.

=== Things that aren't there that should be ===

'''5.2.8 Resource responsible party'''. Each dataset *must* have one or more people/organisations responsible for it. The IR says that this can be freetext or can be in more structured form. This '''only''' includes the responsible party's name, but NOT any form of contact details.

'''Some form of electronic or telephonic contact address should be mandatory, if the org/person's details are mandatory.''' Why publish ownership information - especially if there are constraints on access and reuse of the described data - if you can't immediately get in personal contact with someone who can make assurances about the data?

Annex A on mapping to ISO 19115 mandates that contact persons and organisations be free text, not resource identifiers. There are two serious problems with the ISO 19115 mapping:

* It does not ask or provide for contact details.
* It looks *mandatory* that the responsible party be given a role, which in turn is one of N codes published by the Library of Congress to describe people's roles within organisations.

'''No discussion of formalising dataset accuracy / completeness - crucial for cost-benefit evaluation / evaluation of suitability for combining with other data sets.'''

=== Things that are there that probably shouldn't be ===

Every 'dataset or dataset series' published under INSPIRE *must* include both a '''Resource topic category''' and a set of resource '''keywords'''.

Topic categories are very high-level classifications which correspond to each of the Spatial Data Themes identified in Annexes I, II and III of the INSPIRE Directive.

* Which topic category data fits in will often be a property of an organisation not any published data sets.

From an implementor's POV this will involve something like selecting a topic category for data at install time of metadata publishing engine, and forgetting about it. The IRs place a lot of faith in the ability of simple keyword / classification code matches to enhance utility of search and discovery services for users.

But this already raises the bar for non-expert users: the domain vocabulary is jargon-specific or oriented towards specialist codes.

The IRs emphasise the fact that keywords should originate from a ''controlled vocabulary''. The responsibility for creating one is not in the hands of the Drafting Teams but in the hands of Spatial Data Theme Communities. How these are constituted and how their decisions become binding are unclear.

Again, faith in keywords for search utility is misplaced. Reliance on them may lead to false negatives. It again assumes that a non-expert user has familiarity with, or the time and ability to learn about, what to expect in the domain, while an expert will need a better level of detail. Pitfalls of 'controlled' keywording:
* intentional misclassification
* lazy/default misclassification

Both of these are at 'Level 1 for discovery metadata' which implies that any INSPIRE compliant metadata set MUST have both topic category and associated keywords.

== Areas which are unclear ==

=== Conformity ===

This is an IR and obligatory to deal with. But 5.3.4 just says "see Annex F".
Annex F '''in its entirety''' says:

The way in which conformity is expressed in the INSPIRE IR will be defined in a subsequent draft based on discussions with the Drafting Team on Data specifications and harmonization.

(Is this where accuracy/completeness comes in? How can we know?)

=== Dataset series / Aggregate data ===

The IR talks about dataset series. Some of the diagrams talk about 'MD_Aggregates', a term that isn't used elsewhere. There is no conception in this model of one UrDataSet with many different potential sources according to how they are packaged or processed. As the IRs mandate properties for dataset series, we really need more clarity / examples about what they actually are.

== General concerns ==

=== Search / discovery services ===

The preamble (p.7) states that "separate IRs for discovery services are being prepared and are not the subject of this document." But the INSPIRE use case is predicated on the availability of 'Geoportal' style search services. What else *are* discovery services if they are not the search services treated of here? If there is only going to be an abstract model for discovery, and these IRs are careful to avoid imposing any constraints on internal data repository management, how much more can a discovery services draft provide?

II: I think this comment is spot on, I'm finding it difficult to express concrete concerns.. but Section 5.2 "Discovery metadata elements" starts to set out a list of concepts seen to be (The document hints at, but does not directly say) core to the discovery process. Section 5.3 then sets out "Abstract discovery metadata element set". I *guess* the implication is that the concepts laid out in 5.2 are in some way even more abstract than those set out in 5.3. The document really isn't clear about what the abstract model is, or what it is for, before it starts enumerating the concepts. Your later comment about being tied to web services is spot on also here, I'm really not sure "Service type version", "Operation name" and "Distributed computing platform" belong in an abstract model. These three attributes seem to belong specifically to a particular (And I would guess already existing) service binding (Or to a very specific kind of returned result record) of the abstract model onto some concrete semantics. What I'd really like to see is a much clearer statement of what the purpose of the abstract model is. Hopefully, once that is tightly defined, it should become easier to decide what lies inside the boundary of the abstract model, and what belongs in the domain of specific realisations of the abstract model.

I'm a bit confused by the "Temporal Reference" element. 5.2.2 talks about what I would expect to see from a temporal reference, but 5.3.2 maps temporal reference on to "One of the dates of publication, last revision or creation of the resource". These three elements are already well defined by Dublin Core attributes. Maybe I've misunderstood what's implied by table 1 in 5.3.2. Also, similar issues to the spatial access point arise (with structured data, as opposed to text queries). In some UK datasets, periods such as "Neolithic" can be used instead of an ISO 19108 date-time. (I see that note 11 under 5.3.4 talks about this, which is good. What's important is that, regardless of the outcome of the study, the IRs are extensible enough to cope with the eventual decision.) I'd consider separate access points for controlled-vocabulary time periods and structured temporal data.

Geographic extent: the doc seems a bit bounding-box heavy. It would be nice to understand (have examples of) the specification of interior/exterior polygons. Servers only supporting minimal bounding boxes can gracefully degrade (since it's easy to calculate an MBR, a minimal bounding rectangle, from a polygon) whilst allowing other servers to retain the full richness of polygons. It's not clear where the semantics for parsing these strings will be defined. For example, should geographic extent be encoded as OpenGIS strings? (That seems to make sense to me, but I'm biased by Oracle and MySQL's spatial functions.) This might seem a bit extreme for the abstract part of the document, but it's one of those make-or-break issues for interoperability, and might be worth the pain. Also, I think it's worth entertaining the idea that spatial specifications such as MBRs and polygons (structured spatial constructs) might be better exposed using their own abstract access point, with "Place Name" having its own access point. This will help server implementors avoid problems with disambiguation of search terms.

I'm interested in what the expected semantics of resource language are on retrieval of language-neutral data sets. Should a result record not be selected because the user specified "Nor" as the search language, even though it matches the other criteria (geographic extent, for example)? Normally in information retrieval this is a no-brainer, of course it shouldn't, but I'm a bit less certain when we talk about result records that aren't primarily text based. (Actually, this is a slightly wider concern about Annex A and those "CharacterString" elements. In IEEE LOM we have a "LangString" element that has a "Lang" attribute. That community chose to allow language variants of a resource to be expressed within one record by allowing an element to hold all language variants, for example:
<pre>
<title>
<langstring lang="En">Hello</langstring>
<langstring lang="Dk">Hej</langstring>
</title>
</pre>
The presence of a "Lang" attribute at the "Dataset" level might mean the intention is to support multi-language datasets by having several dataset records, one for each language, which is OK, but possibly not optimal for datasets that aren't primarily language based. If this is the case, is the "CharacterString" element in Annex A just redundant payload?)

=== Lack of machine-reusable data in general ===

Dataset 'lineage' is only a full-text field. If datasets result from recombination, that should be machine-traversable. Human descriptions of lineage will be so different that they won't be useful for building search / evaluation services.

=== Lack of engagement with packaging and re-use issues ===

Cf. Dataset series / aggregates. The examples have 'MasterMap' as one potential dataset! Real world use cases are going to need subsets of such huge data sets broken down into packages with smaller spatial extents or with fewer layers.

=== Bypassing of feature-level metadata from consideration ===

Once we get down to the feature level the interesting European problems appear: every local area may have its own classification schemes, even inside one language community the same word is used to describe different-looking things, and across language barriers mappings from words to things don't tend to be 1-1. But by disregarding feature-level metadata - partly because it can't be mandated when the underlying geospatial objects aren't publicly inspectable and a certain amount of feature level metadata would mean the data itself is essentially public...

=== Overspecificness about internet- and webservices- based distribution models ===

We are actually causing ourselves unnecessary problems by putting everything on the Internet. Data sharing agreements over publicly maintained private networks with flat-rate membership are a clear potential future and 'middle way' in this domain. The draft as it stands is all about making access/use constraints *specific to data sets* and not specific to the relationship between the data provider or broker, the data user and the transport network between them.

So we have a 'distributed computing platform' metadata property that is required by the IRs. In the ISO 19115 mapping in Annex A this is a '''free text field''', yet 5.2.15 states that the property "is necessary for a client to bind to the service". If it must be mandated, it should be as a URI. It would be wonderful to have examples of what other than HTTP or OGC web services is envisaged NOW as a means of access to the backend of a distributed computing platform.

Reading the INSPIRE Metadata Draft

2007-03-14T11:13:08Z

Wiki-Ianibbo: /* Search / discovery services */

Metadata about geographic data is at the heart of INSPIRE. The metadata draft is the first in the set of "implementing rules" and it will underpin all the other implementing rules. The consultation process is open until 2007-03-30. While the documents are open access, comments can only be offered through an SDIC or Spatial Data Interest Community.

The Free and Open Source Geospatial Community has a voice through one of these SDICs thanks to Markus Neteler. This page contains preparatory material for a collective response through the FOSS GIS SDIC, from the POV of people implementing and managing metadata creation, collection and search services, working closely with many different data user communities.

* The response proper will live at [[Response to INSPIRE Metadata Draft]]. Initial notes are included below in the Issues section.
* It is interesting to read this in parallel with the North American Metadata Profile draft which is also currently in consultation. It's hoped the OSGeo community will also be able to contribute to a [[Response to NAP Metadata Draft]] and get the [http://geodatacommons.umaine.edu geodata commons] project involved in this.

== Reading the draft ==

* [http://www.ec-gis.org/inspire/reports/ImplementingRules/draftINSPIREMetadataIRv2_20070202.pdf the Implementing Rules for Metadata Draft] (pdf)
* [http://www.ec-gis.org/inspire/whatsnew.cfm#1590 supporting / background material]
* Pages 1-17 are metadata about the document itself, intentions and history, and can be safely skipped. Pages 43-104 are the Annexes.
* Annex A is particularly interesting as there are details of the thinking exposed in the mapping to ISO19115/39 that aren't set out in the implementing rules. ''If you want to know what's likely to affect you but are short on time, at minimum read section 5 and Annex A''.

== Lightning Summary of the draft ==

The draft establishes a basic information model for metadata which is close to, but not specific to, ISO19115 and OGC Web Services.

It only mandates what metadata is published by and for public authorities covered by INSPIRE - it does not try to cover repository management or internal processes.

It separates out metadata properties into those useful for 'discovery', 'evaluation', and 'use'. It identifies one very high level "use case" for spatial data search services built from metadata being shared at this level.

It differentiates between properties useful for 'non-specialist' and 'expert' users into 2 Levels, 1 and 2. Level 1 is always mandatory. This *includes* classification according to the data themes in the INSPIRE annexes, and keywords from controlled vocabularies which are not covered by the IR document but are left to Spatial Data Theme Communities. (How these communities are found, selected, and make their decisions, is unknown to us at this time.)

= Issues =

'''This list is an overview of what jumped out at me as something to address. I don't know how much of this is appropriate to send back, or how much can be fixed. - [[User:JoWalsh]]'''

== Conceptual overview ==

The model maps quite well to the minimum useful subset identified in [[DCLite4G]]. It looks like a lightweight core. But, the model and the draft break down the problem space of metadata in a way that is a reaction to artificial scarcity of data. It identifies three phases of the metadata use cycle:
* discovery (of what data is out there)
* evaluation (of whether the data will be useful for specific purpose)
* use (once access gained, how to best use the data)

It is illuminating to compare this with the [[Reading the NAP Metadata Draft|North American Profile]] metadata draft which talks about

* discovery
* access
* fitness for use (i.e. evaluation)
* transfer

So the IRs both fail to address how to make the data more useful via metadata, and are vague about whether a minimal subset is going to provide enough information to evaluate utility. Generally the draft dances around data licensing and access issues, and glosses over the over-engineering needed to work around artificial constraints on availability. IRs for evaluation and use of data based on metadata are not covered by this draft at all, but left up to the Spatial Data Theme communities for each of the 35 data themes identified in Annexes I-III of the INSPIRE text.

== Issues with specific metadata properties ==

The model maps quite well to [[DCLite4G]]. It looks like a lightweight core.

=== Things that aren't there that should be ===

'''5.2.8 Resource responsible party'''. Each dataset *must* have one or more people/organisations responsible for it. The IR says that this can be freetext or can be in more structured form. This '''only''' includes the responsible party's name, but NOT any form of contact details.

'''Some form of electronic or telephonic contact address should be mandatory, if the org/person's details are mandatory.''' Why publish ownership information - especially if there are constraints on access and reuse of the described data - if you can't immediately get in personal contact with someone who can make assurances about the data?

Annex A on mapping to ISO 19115 mandates that contact persons and organisations be free text, not resource identifiers. There are two serious problems with the ISO 19115 mapping:

* It does not ask or provide for contact details.
* It looks *mandatory* that the responsible party be given a role, which in turn is one of N codes published by the Library of Congress to describe people's roles within organisations.

'''No discussion of formalising dataset accuracy / completeness - crucial for cost-benefit evaluation / evaluation of suitability for combining with other data sets.'''

=== Things that are there that probably shouldn't be ===

Every 'dataset or dataset series' published under INSPIRE *must* include both a '''Resource topic category''' and a set of resource '''keywords'''.

Topic categories are very high-level classifications which correspond to each of the Spatial Data Themes identified in Annexes I, II and III of the INSPIRE Directive.

* Which topic category data fits in will often be a property of an organisation not any published data sets.

From an implementor's POV this will involve something like selecting a topic category for data at install time of metadata publishing engine, and forgetting about it. The IRs place a lot of faith in the ability of simple keyword / classification code matches to enhance utility of search and discovery services for users.

But this already raises the bar for non-expert users: the domain vocabulary is jargon-specific or oriented towards specialist codes.

The IRs emphasise the fact that keywords should originate from a ''controlled vocabulary''. The responsibility for creating one is not in the hands of the Drafting Teams but in the hands of Spatial Data Theme Communities. How these are constituted and how their decisions become binding are unclear.

Again, faith in keywords for search utility is misplaced. Reliance on them may lead to false negatives. It again assumes that a non-expert user has familiarity with, or the time and ability to learn about, what to expect in the domain, while an expert will need a better level of detail. Pitfalls of 'controlled' keywording:
* intentional misclassification
* lazy/default misclassification

Both of these are at 'Level 1 for discovery metadata' which implies that any INSPIRE compliant metadata set MUST have both topic category and associated keywords.

== Areas which are unclear ==

=== Conformity ===

This is an IR and obligatory to deal with. But 5.3.4 just says "see Annex F".
Annex F '''in its entirety''' says:

The way in which conformity is expressed in the INSPIRE IR will be defined in a subsequent draft based on discussions with the Drafting Team on Data specifications and harmonization.

(Is this where accuracy/completeness comes in? How can we know?)

=== Dataset series / Aggregate data ===

The IR talks about dataset series. Some of the diagrams talk about 'MD_Aggregates', a term that isn't used elsewhere. There is no conception in this model of one UrDataSet with many different potential sources according to how they are packaged or processed. As the IRs mandate properties for dataset series, we really need more clarity / examples about what they actually are.

== General concerns ==

=== Search / discovery services ===

The preamble (p.7) states that "separate IRs for discovery services are being prepared and are not the subject of this document." But the INSPIRE use case is predicated on the availability of 'Geoportal' style search services. What else *are* discovery services if they are not the search services treated of here? If there is only going to be an abstract model for discovery, and these IRs are careful to avoid imposing any constraints on internal data repository management, how much more can a discovery services draft provide?

II: I think this comment is spot on, though I'm finding it difficult to express concrete concerns. Section 5.2 "Discovery metadata elements" starts to set out a list of concepts seen to be (the document hints at this, but does not directly say so) core to the discovery process. Section 5.3 then sets out the "Abstract discovery metadata element set". I *guess* the implication is that the concepts laid out in 5.2 are in some way even more abstract than those set out in 5.3. The document really isn't clear about what the abstract model is, or what it is for, before it starts enumerating the concepts. Your later comment about being tied to web services is also spot on here: I'm really not sure "Service type version", "Operation name" and "Distributed computing platform" belong in an abstract model. These three attributes seem to belong specifically to a particular (and I would guess already existing) service binding (or to a very specific kind of returned result record) of the abstract model onto some concrete semantics. What I'd really like to see is a much clearer statement of what the purpose of the abstract model is. Hopefully, once that is tightly defined, it should become easier to decide what lies inside the boundary of the abstract model, and what belongs in the domain of specific realisations of the abstract model.

I'm a bit confused by the "Temporal Reference" element. 5.2.2 talks about what I would expect to see from a temporal reference, but 5.3.2 maps temporal reference on to "One of the dates of publication, last revision or creation of the resource". These three elements are already well defined by Dublin Core attributes. Maybe I've misunderstood what's implied by table 1 in 5.3.2. Also, similar issues to the spatial access point arise (with structured data, as opposed to text queries). In some UK datasets, periods such as "Neolithic" can be used instead of an ISO 19108 Date Time. (I see note 11 under 5.3.4 talks about this, which is good. What's important is that regardless of the outcome of the study, the IRs are extensible enough to cope with the eventual decision.) I'd consider separate access points for controlled vocabulary time periods and structured temporal data.

Geographic Extent: the doc seems a bit bounding box heavy. It would be nice to understand (have examples of) the specification of interior/exterior polygons. Servers only supporting minimal bounding boxes can gracefully degrade (since it's easy to calculate an MBR from a polygon) whilst allowing other servers to retain the full richness of polygons. It's not clear where the semantics for parsing these strings will be defined. For example, should geographic extent be encoded as OpenGIS strings? (That seems to make sense to me, but I'm biased by Oracle and MySQL's spatial functions.) This might seem a bit extreme for the abstract part of the document, but it's one of those make-or-break issues for interoperability, and might be worth the pain. Also, I think it's worth entertaining the idea that spatial specifications such as MBRs and polygons (structured spatial constructs) might be better exposed using their own abstract access point, with "Place Name" having its own access point. This will help server implementors avoid problems with disambiguation of search terms.
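
To illustrate the "graceful degradation" point, here is a minimal sketch (not anything the draft specifies) of deriving a minimal bounding rectangle from a polygon's exterior ring and serialising the same ring as an OpenGIS well-known text string. The function names and the sample coordinates are made up for illustration, and coordinates are assumed to be (lon, lat) pairs.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # assumed order: (x, y), i.e. (lon, lat)


def minimal_bounding_rectangle(exterior: List[Point]) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) for a polygon's exterior ring.

    Interior rings (holes) never extend beyond the exterior ring,
    so they can be ignored when computing the MBR.
    """
    if not exterior:
        raise ValueError("exterior ring must contain at least one point")
    xs = [x for x, _ in exterior]
    ys = [y for _, y in exterior]
    return (min(xs), min(ys), max(xs), max(ys))


def to_wkt_polygon(exterior: List[Point]) -> str:
    """Serialise the ring as a well-known text POLYGON string."""
    if not exterior:
        raise ValueError("exterior ring must contain at least one point")
    ring = list(exterior)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # WKT rings are closed: first point == last point
    coords = ", ".join(f"{x:g} {y:g}" for x, y in ring)
    return f"POLYGON(({coords}))"


# Hypothetical extent, purely illustrative.
ring = [(-3.5, 50.0), (1.5, 51.0), (0.0, 55.5), (-5.0, 54.0)]
mbr = minimal_bounding_rectangle(ring)
wkt = to_wkt_polygon(ring)
```

A server that only understands bounding boxes could index <code>mbr</code>, while a richer server keeps <code>wkt</code>; both start from the same submitted polygon.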

I'm interested in what the expected semantics of resource language are on retrieval of language-neutral data sets. Should a result record be excluded because the user specified "Nor" as the search language, even though the resource matches the other criteria (Geo Extent for example)? Normally in Info Retrieval this is a no-brainer, of course it should, but I'm a bit less certain when we talk about result records that aren't primarily "Text" based. (Actually, this is a slightly wider concern about Annex A and those "CharacterString" elements. In IEEE LOM we have a "LangString" element that has a "Lang" attribute. That community chose to allow language variants of a resource to be expressed within one record by allowing an element to hold all language variants, for example
<pre>
<title>
<langstring lang="En">Hello</langstring>
<langstring lang="Dk">Hej</langstring>
</title>
</pre>
The presence of a "Lang" attribute at the "Dataset" level might mean the intention is to support multi-language datasets by having several dataset records, one for each language, which is OK, but possibly not optimal for datasets that aren't primarily language based. If this is the case, is the "CharacterString" element in Annex A just redundant payload?)
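
As a rough sketch of how a client might consume the LangString-style record shown above, the following picks one language variant and optionally falls back to another. It only assumes the simplified markup from the example (a <code>title</code> holding <code>langstring</code> children); the function name and the fallback behaviour are illustrative choices, not anything defined by IEEE LOM or the draft.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# The simplified record from the example above.
RECORD = """
<title>
<langstring lang="En">Hello</langstring>
<langstring lang="Dk">Hej</langstring>
</title>
"""


def pick_language(xml_text: str, wanted: str, fallback: Optional[str] = None) -> Optional[str]:
    """Return the variant for `wanted`, else for `fallback`, else None.

    Language codes are compared case-insensitively.
    """
    root = ET.fromstring(xml_text.strip())
    variants = {
        el.get("lang", "").lower(): (el.text or "")
        for el in root.iter("langstring")
    }
    if wanted.lower() in variants:
        return variants[wanted.lower()]
    if fallback is not None:
        return variants.get(fallback.lower())
    return None
```

With one record holding every variant, a search for "Nor" can still return the record and show the "En" title, rather than dropping the result altogether.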

=== Lack of machine-reusable data in general ===

Dataset 'lineage' is only a full-text field. If datasets result from recombination, that should be machine-traversable. Human descriptions of lineage will be so different that they won't be useful for building search / evaluation services.
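
A minimal sketch of what "machine-traversable" could mean here, assuming lineage were recorded as links between dataset identifiers rather than as prose. The identifiers and the <code>LINEAGE</code> mapping are hypothetical; nothing like this appears in the draft.

```python
from typing import Dict, List, Set

# Hypothetical lineage links: dataset identifier -> identifiers it was derived from.
LINEAGE: Dict[str, List[str]] = {
    "flood-risk-2007": ["terrain-model", "river-network"],
    "terrain-model": ["lidar-survey"],
    "river-network": [],
    "lidar-survey": [],
}


def all_sources(dataset: str, lineage: Dict[str, List[str]]) -> Set[str]:
    """Collect every direct and indirect source of `dataset`."""
    seen: Set[str] = set()
    stack = list(lineage.get(dataset, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cyclic links
        seen.add(current)
        stack.extend(lineage.get(current, []))
    return seen
```

A search or evaluation service could then answer "which datasets depend on this survey?" by walking the links, which free-text lineage cannot support.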

=== Lack of engagement with packaging and re-use issues ===

Cf. Dataset series / aggregates. The examples have 'MasterMap' as one potential dataset! Real world use cases are going to need subsets of such huge data sets broken down into packages with smaller spatial extents or with fewer layers.

=== Bypassing of feature-level metadata from consideration ===

Once we get down to the feature level the interesting European problems appear: every local area may have its own classification schemes, even inside one language community the same word is used to describe different-looking things, and across language barriers mappings from words to things don't tend to be 1-1. But by disregarding feature-level metadata - partly because it can't be mandated when the underlying geospatial objects aren't publicly inspectable and a certain amount of feature level metadata would mean the data itself is essentially public...

=== Overspecificness about internet- and webservices- based distribution models ===

We are actually causing ourselves unnecessary problems by putting everything on the Internet. Data sharing agreements over publicly maintained private networks with flat-rate membership are a clear potential future and 'middle way' in this domain. The draft as it stands is all about making access/use constraints *specific to data sets* and not specific to the relationship between the data provider or broker, the data user and the transport network between them.

So we have a 'distributed computing platform' metadata property that is required by the IRs. In the ISO19115 mapping in Annex A this is a '''free text field''', yet 5.2.15 states that the property "is necessary for a client to bind to the service". If it must be mandated, it should be as a URI. It would be wonderful to have examples of what other than HTTP or OGC web services is envisaged NOW as a means of access to the backend of a distributed computing platform.
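
To make the free-text versus URI distinction concrete, here is a small sketch of the kind of check a catalogue could apply if the property were mandated as a URI. The function name and both sample values are invented for illustration; the draft defines no such check.

```python
from urllib.parse import urlsplit


def is_absolute_uri(value: str) -> bool:
    """True if `value` parses with a scheme plus an authority or path."""
    parts = urlsplit(value.strip())
    return bool(parts.scheme) and bool(parts.netloc or parts.path)


# Hypothetical values for the 'distributed computing platform' property.
as_uri = "http://example.org/platform/wms"
as_free_text = "HTTP GET via a web map service"
```

A client can dispatch on the first value mechanically, whereas the second needs a human to interpret it before any binding can happen.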

Reading the INSPIRE Metadata Draft

2007-03-14T10:48:33Z

Wiki-Ianibbo: /* Search / discovery services */

Reading the INSPIRE Metadata Draft

2007-03-14T10:44:59Z

Wiki-Ianibbo: /* Search / discovery services */

Reading the INSPIRE Metadata Draft

2007-03-13T09:33:45Z

Wiki-Ianibbo: /* Search / discovery services */

User:Ianibbo

2007-03-09T14:28:23Z

Wiki-Ianibbo:

FOSS developer/consultant involved with information retrieval systems and IR standards, specifically Z39.50 and the GEO profile, SRW (and GEO applications thereof), OpenSearch and MetaSearch initiatives.
<code>
See http://ianibbo.blogspot.com for more info,
Contact: ibbo -at- k -hyphen- int -dot- com
</code>

User:Ianibbo

2007-03-09T14:26:20Z

Wiki-Ianibbo:

Response to INSPIRE Metadata Draft

2007-03-09T14:20:37Z

Wiki-Ianibbo:

* See [[Reading the INSPIRE Metadata Draft]] for preparatory material.
* See [http://inspire.jrc.it/ir/sdic_view_step1_only.cfm?id=2163 FOSS SDIC]

Any member of the Free and Open Source Geospatial software community is welcome to participate in creating this response. Please add your name, and contact details on your userpage, to the list of participants and at any key stage (first stable draft; when sending the response).

= Participants =

* [[User:JoWalsh|Jo Walsh]]
* [[User:Neteler|Markus Neteler]]
* [[User:Ianibbo|Ian Ibbotson]]

All Members

2007-03-05T11:36:13Z

Wiki-Ianibbo:

{| border="1" class="wikitable" style="margin: 1em 1em 1em 0; background: #f9f9f9; border: 1px #aaaaaa solid; border-collapse: collapse;"
! style="background:#efefef;" | Name
! style="background:#ffdead;" | Affiliations
! style="background:#efefef;" | OSGeo Projects
! style="background:#ffdead;" | (Lat,Lon)
! style="background:#efefef;" | About
|-
| Add yourself
| Everyone is welcome
| In which OSGeo Projects and Committees are you involved
| Input lat/long here
| Copy and paste this entry, put it last, and add your information
|-
| Chris Holmes
| [http://topp.openplans.org The Open Planning Project], [http://geoserver.org GeoServer]
| [http://incubator.osgeo.org Incubator], [http://board.osgeo.org Board], [http://geotools.org GeoTools]
| (40.72,-74.00)
| I come from the Java side of the OSGeo fence, getting my start in GeoServer, where I was lead developer for a couple years, and GeoTools, where I still serve on the PMC. My time is made possible by [http://topp.openplans.org The Open Planning Project (TOPP)], a great non-profit in New York that has been the lead supporter of GeoServer for years now. I spent the last year in Zambia on a Fulbright Scholarship, looking at the potential for open source software to help implement spatial data infrastructures in developing countries. It was a bit of a failure, but I learned a ton, and I see a lot of potential for open source in developing countries, towards truly open spatial data infrastructures. I'm back at TOPP, in a new role as VP of Strategic Development, helping to grow the organization, and figuring out how to make our geospatial stuff self sustaining. Once that's rolling, I hope to reinvest extra revenue in to figuring out and building a truly open geospatial web. And just like apache and linux are the bedrock that the World Wide Web rests on, so too do I believe that the geospatial web necessarily must be built on a foundation of OS Geo software. My continuing thoughts on all of this can be found at http://cholmes.wordpress.com
|-
| Michael P. Gerlek
| [http://www.lizardtech.com LizardTech]
| [http://visibilitycommittee.osgeo.org Promotion and Visibility Committee] (chair)
| (47.673166,-122.530143)
| Manager of LizardTech's Engineering department, where we do MrSID and JPEG 2000 stuff and play with the next generation of technologies for supporting raster data GIS workflows. No, our products are not open source -- but we do very much support and use open source and open standards. (I think there is room in the world for both the open and closed development models, and I have a strong interest in helping "closed" companies understand the value of, and contribute to, the open software world.) [[User:mpg]]
|-
| Frank Warmerdam
| Independent
| [http://www.gdal.org GDAL/OGR], [http://mapserver.gis.umn.edu MapServer], [http://incubator.osgeo.org Incubator], [http://board.osgeo.org Board]
| (45.45,-77.25)
| Lead developer of GDAL/OGR and freelance geospatial software developer.
|-
| Jason Birch
| [http://www.nanaimo.ca/ City of Nanaimo]
| [http://webcommittee.osgeo.org Web Site], [http://visibilitycommittee.osgeo.org Promotion & Visibility]
| (49.155, -124.005)
| I am a long-time GIS/IT/'Net junkie, and am currently working for the City of Nanaimo's IT department as a Sr. Applications Analyst (GIS Specialist). I am excited about what I see happening in the open source geospatial world, with OSGeo as a catalyst. [[User:Jasonbirch]]
|-
|Howard Butler
| [http://www.hobu.biz/ Hobu, Inc]
| [http://webcommittee.osgeo.org Web Site Committee],
| (42.00, -93.00)
| MapServer hacker, MTSC member. GDAL hacker. ESRI ArcSDE hack. Purveyor of Windows binary builds [[User:hobu]]
|-
| Markus Neteler
| [http://mpa.itc.it ITC-irst], [http://www.cealp.it CEA], [http://www.gdf-hannover.de GDF Hannover]
| [http://grass.itc.it GRASS GIS], [http://board.osgeo.org Board], [http://geodata.osgeo.org Public Geodata Com.], [http://edu.osgeo.org Education Com.], [http://visibilitycommittee.osgeo.org Promotion & Visibility Com.]
| (46.06714, 11.15113)
| Developer of GRASS GIS, researcher at ITC-irst + CEA, Trento, Italy and co-founder of GDF Hannover [[User:neteler]]
|-
| R. Paul Warriner
| [http://www.orchardparkny.org/ Town of Orchard Park]
| [http://fundraising.osgeo.org Fundraising Committee], [http://webcommittee.osgeo.org Web Site Committee]
| (43.17, -78.69)
| Network Coordinator, old oil field hand (really, I do know what a christmas tree is).
[[User:RPaulW]]
|-
|Bart van den Eijnden
| [http://www.osgis.nl/ OSGIS]
| [http://chameleon.maptools.org Chameleon],
| (52.0768396070808, 5.12454)
| Freelancer working with several open source GIS tools, mainly Chameleon, Mapserver and Geoserver.
[[User:bartvde]]
|-
|Simone Giannecchini
| [http://simboss.wordpress.com/ blog] ,[http://www.geo-solutions.it GeoSolutions]
| [http://geoserver.org GeoServer], [http://http://docs.codehaus.org/display/GEOTOOLS/Home GeoTools]
| (gotta look for it :-))
| I have been working as a freelance consultant in the GIS and Image Processing field since early 2004, mainly in scientific and military environment. I am PMC member of [http://http://docs.codehaus.org/display/GEOTOOLS/Home GeoTools] and active developer of [http://geoserver.org GeoServer]. I am also providing some patches for the [https://jai.dev.java.net/ JAI] and [https://jai-imageio.dev.java.net/ ImageIO] SUN libraries for image processing in Java.

I am a big GDAL fan; over the last year I have been involved in an effort to put GDAL behind ImageIO
to widen the number of supported formats. The goal is to make these formats available through GeoTools to GeoServer. If you are interested in supporting or joining this effort, please drop me a few lines at simone.giannecchini-at-geo-solutions.it or simboss1-at-gmail.com
[[User:simboss]]
|-
| Helena Mitasova
| [http://skagit.meas.ncsu.edu/~helena/ North Carolina State University]
| [http://grass.itc.it GRASS GIS], [http://wiki.osgeo.org/index.php/Core_Curriculum_Project Curriculum project]
| (35.77, -78.69)
| Researcher at NCSU (geospatial technology, environmental modeling, sustainable development), Developer of GRASS GIS. [[User:Helena]]
|-
| Daniel Morissette
| [http://www.mapgears.com/ Mapgears]
| [http://mapserver.gis.umn.edu/ MapServer], [http://www.gdal.org GDAL/OGR]
| (48.42, -71.04)
| Involved in MapServer, GDAL/OGR and most [http://maptools.org/ MapTools.org] projects, mostly around webmapping and data access and distribution. [[User:dmorissette]]
|-
| Tamas Szekeres
| [http://www.hmeirt.hu/ MoD ED Co.]
| [http://mapserver.gis.umn.edu/ MapServer]
| (47.56, 19.08)
| M.Sc.El.Engineer, Head of Development Department, GPS Division , MapServer contributor/hacker, mapscript C# maintainer, involved in various WEB mapping and desktop applications, GPS navigation and tracking systems. [[User:szekerest]]
|-
| Ari Jolma
| [http://users.tkk.fi/~jolma/index.html TKK]
| [http://www.gdal.org GDAL/OGR], [http://wiki.osgeo.org/index.php/Core_Curriculum_Project Curriculum project]
| (60° 16' , 24° 47' 4'')
| Professor at TKK, Finland (geoinformatics, environmental information systems, water resources systems), [http://map.hut.fi/PerlForGeoinformatics/ just another Perl hacker] [[User:ajolma]]
|-
| Jeff McKenna
| [http://www.dmsolutions.ca DM Solutions Group]
| [http://mapserver.gis.umn.edu/ MapServer]
| (45.401397610, -75.725861625)
| MapServer documentation, [http://www.maptools/ms4w MS4W] maintainer, [http://www.maptools.org maptools] co-maintainer. [[User:jmckenna]]
|-
| Ian Turton
| [http://www.geovista.psu.edu/members/turton/index.html work][http://pennspace.blogspot.com/ blog]
| [http://www.geotools.org GeoTools]
| (40.7932, -77.847)
| [http://www.geotools.org GeoTools] founder and developer, [http://www.geovistastudio.psu.edu GeoVistaStudio] benevolent dictator, [http://geoserver.org GeoServer] user. [[User:ianturton]]
|-
| David Blasby
| [http://topp.openplans.org The Open Planning Project], [http://geoserver.org GeoServer], [http://geotools.org GeoTools]
| [http://geotools.org GeoTools]
| (varies)
| Currently, I'm the Project Lead for Geoserver and am on the GeoTools Project Management Committee. I'm just starting a GeoWiki (Public Participation GIS) (please contact me if you're interested). I was the original creator of PostGIS, and have contributed to several OS GIS projects, including JTS, JUMP, and Mapserver.
|-
| Andrey Kiselev
| "Radar" R&D Centre (Russia)
| GDAL/OGR
| (60.04,30.33)
| Freelance developer and contributor to GDAL/OGR project.
|-
| Helton Uchoa
| [http://www.geolivre.org.br Geolivre Community], [http://www.open3dgis.org Open 3D GIS Project]
| [http://webcommittee.osgeo.org Web Site Committee] and [http://wiki.osgeo.org/index.php/Public_Geospatial_Data_Project Public Geospatial Data Project]
| (-22.96, -43.11)
| I'm a Geomatics Engineer and I work at [http://www.opengeo.com.br OpenGEO Company] as a GIS Specialist. I'm responsible for many GIS projects using FOSS and the OpenGIS Specifications in Brazil and I have some relevant papers and scientific articles presented in Brazilian and Latin-American conferences and published in scientific magazines. Over the last year I have helped, as a teacher, to introduce the GNU/FSF philosophy at the Transportation Engineering Department of IME ([http://www.ime.eb.br Military Institute of Engineering - IME], Brazil). I worked on Geolivre Rio 2004 and 2005 as a member of the organization committee. Now I'm working on [http://www.geolivre.org Geolivre Conference 2007]. [[User:Uchoa]]
|-
| Toru Mori
| [http://www.orkney.co.jp/english Orkney, Inc.]
| [http://mapserver.gis.umn.edu/ MapServer], [http://grass.itc.it GRASS GIS]
| (35.448, 139.642)
| President of Orkney, Inc. Advocate of Open Geospatial tools in Japan and Asia. Promote open geospatial data. [[User:moritoru]]
|-
| Allan Doyle
| [http://www.eogeo.org EOGEO],[http://museum.mit.edu/cmp MIT Museum],[http://spg.gsfc.nasa.gov/ NASA Earth Science Data Systems Standards Process Group]
| [http://wiki.osgeo.org/index.php/Public_Geospatial_Data_Project Public Geospatial Data Project]
| (42.28, -71.24)
| President of [http://www.eogeo.org EOGEO] and [http://www.intl-interfaces.com International Interfaces], long-time geo-interoperability interests, opensourced (is that a verb?) [http://openmap.bbn.com OpenMap], originator of OGC testbed idea, Web Mapping Testbed, WMS spec editor, worked on WMS Context, [http://www.georss.org GeoRSS]. [http://www.eogeo.org/Members/adoyle more details]. [http://think.random-stuff.org Blog][[User:adoyle]]
|-
| Ned Horning
| [http://cbc.amnh.org/ Center for Biodiversity and Conservation], [http://www.amnh.org/ American Museum of Natural History]
| [http://wiki.osgeo.org/index.php/Core_Curriculum_Project Curriculum project]
|(43.9933, -73.0407)
|Program manager for [http://geospatial.amnh.org/ remote sensing/GIS]. Promoter of open source geospatial tools in the global conservation community.
|-
| Paul Spencer
| [http://www.dmsolutions.ca DM Solutions Group]
| [http://mapserver.gis.umn.edu/ MapServer], [http://chameleon.maptools.org Chameleon], [http://ka-map.maptools.org kaMap], [http://maptools.org/maplab/index.phtml MapLab], [http://maptools.org/ms4w/index.phtml MS4W], [http://openev.sourceforge.net/ OpenEV]
| (45.401397610, -75.725861625)
| CTO of DM Solutions Group, designer/developer/contributor to many open source packages, especially based on MapServer. Recent interest/focus is on AJAX clients for mapping applications. [[User:pagameba]]
|-
| Mark Lucas
| remotesensing.org
| [http://www.remotesensing.org remotesensing.org] and [http://www.ossim.org ossim]
| (27.9690219N, 080.5590534W altitude sea level + 5m)
| CTO, original founder of ImageLinks and remotesensing.org. Board of Directors [http://www.oss-institute.org/ Open Source Software Institute] and the [http://www.ncospr.org/ National Center for Open Source Policy and Research]. Member of [http://www.opentechdev.org Open Technology Development] Tiger team for the Department of Defense (USA). Lead a team of talented developers on the OSSIM and [http://www.ossim.org/tiki-read_article.php?articleId=3 osgPlanet] projects. Previously spent 22 years in the United States Air Force and [http://www.nro.gov/ National Reconnaissance Office] and the [http://www.fas.org/irp/nro/hall3.htm Secretary of the Air Force Special Projects] organization working with various classified programs. Prior to Radiant Blue Technologies, was a Lead Scientist for Intelligence Data Systems, Titan Corporation, and L3-Communications. [http://web.mac.com/mlucas17/iWeb/Site/Welcome.html Personal Web site]. [[User:mlucas17]]
|-
| Jo Walsh
| [http://okfn.org/geo/ Open Knowledge Foundation],[http://mappinghacks.com/ Mapping Hacks], [http://publicgeodata.org Public Geodata]
| Open Geodata committee
| (42.368297,-71.108696)
| Came to geospatial software through collaborative mapping on the semantic web work. Organising events to get geospatial hackers together with data-creating people and promote public access to state collected geodata. If you are in Europe please see [http://publicgeodata.org Public Geodata] and consider writing to an MEP about public domain data and "intellectual property rights" issues. If you collect GPS tracks, please consider uploading them to [http://openstreetmap.org/ OpenStreetmap] - my only real contribution to this project is to talk about it a lot. I co-wrote "Mapping Hacks" with Schuyler Erle and Rich Gibson, with a lot of contributions from OSGeo type of people. Last year wrote a lot of software using OSM and [http://openguides.org/ OpenGuides] with [[Mapserver]] to provide a basis for collaborative local "portal" type services on community wireless networks. Now more interested in doing collaborative writing and research projects. [[User:JoWalsh]]
|-
| Dave McIlhagga
| [http://www.dmsolutions.ca DM Solutions Group]
| [http://visibilitycommittee.osgeo.org Promotion and Visibility Committee]
| (45.401397610, -75.725861625)
| President & CEO of DM Solutions Group. Active promoter of open source geospatial technologies. Led DM Solutions Group to become a major contributor and advocate of MapServer and development of key open source MapServer utilities including [http://chameleon.maptools.org Chameleon], [http://ka-map.maptools.org kaMap], [http://maptools.org/maplab/index.phtml MapLab], [http://maptools.org/ms4w/index.phtml MS4W]. Provided financial and resource support for setup of a key home for open source geospatial projects at [http://www.maptools.org MapTools]. Led the organizing committee for [http://www.omsug.ca/osgis2004/index.html OSGIS], the first Open Source Geospatial conference in North America which coincided with the second MapServer User Meeting. Spearheaded the integration of the two major open source geospatial conferences from North America and Europe/Asia, as the [http://www.foss4g2006.org/ Free and Open Source Software for Geoinformations] single international event to be held in Lausanne Switzerland. [[User:davemac]]
|-
| Pericles (Perry) Nacionales
| [http://land.umn.edu University of Minnesota]
| [http://webcommittee.osgeo.org Web Site Committee]
| (44.9873167, -93.1851500)
| Promoter of open source geospatial technologies, especially in the field of natural resources management and conservation, advocate of open and interoperability standards, MTSC member, author of [http://mapserver.gis.umn.edu/docs/tutorial/tutorial/tutorial MapServer Tutorial].
|-
| Norman Vine
| Independent
|
| (41:31:38N, 70:39:43W)
| Independent software developer [[User:Nhv]]
|-
| Mike Adair
| [http://www.geoconnections.org/CGDI.cfm Natural Resources Canada/GeoConnections]
| [http://communitymapbuilder.org MapBuilder]
| (45.27, -75.75)
| Contributor and member of MapBuilder PMC. Interested primarily in AJAX client technology for mapping, but also in the whole SDI stack. [[User:madair]]
|-
| Stefan F. Keller
| University of Applied Sciences Rapperswil (HSR), [http://www.ifs.hsr.ch Institute for Software]
| [http://webgis.hsr.ch/javawps JavaWPS]
| (47.2240, 8.8181)
| Promoter of open source and commercial technologies, especially in the field of information retrieval, databases, GIS and visualization. Advocate of open and interoperability standards, member of national GIS standardization (e-geo, SNV) and umbrella (SOGI) organizations. Creator of [http://wwww.geometa.info geometa.info], one of the first search engines for geospatial services (WMS), metadata and online maps (Lucene-based); contributor of geo-webservices for German Wikipedia. [[User:Sfkeller]]
|-
| [[User:Arnulf Christl | Arnulf Christl]]
| [http://www.ccgis.de CCGIS], [http://www.geo-consortium.de Geo-Consortium]
| [http://www.mapbender.org Mapbender], [http://www.umn-mapserver.de UMN MapServer (Germany)], [http://board.osgeo.org Board], [http://visibilitycommittee.osgeo.org Promotion and Visibility Committee]
| (7.0707, 50.7342)
| Mapbender PSC, Promoter of [http://www.gnu.org Free Software] and [http://www.osi.org Open Source] :-) Business (...and Open Source Software!)
|-
|V.RaviKumar
|Geologist
|OSGeo member, [http://freegis.gnu.org.in/grass_geosciencedataset.pdf ''GRASS Indian example'']
| 17° N 79° E
| A Geologist from India who is interested in FOSS software. GRASS in particular. Conducted a FOSS workshop at Hyderabad, India in May 2005. The workshop boosted our spirits with a large participation and good articles on various FOSS software. An entire session was for GRASS, Qgis software. Presently lecturing in various forums on the capability of GRASS and allied FOSS GIS. With the help of Free Software Foundation India, trying to spread awareness of GRASS GIS, GNU-Linux and FOSS. Countries like India have a lot to gain with the spread of FOSS.
|-
|David Hastings
|UN Economic and Social Commission for Asia and the Pacific, Bangkok
|Member of original Grass Interagency Steering Committee, etc.
| 13.75°N 100.5°E
| A physicist/geophysicist/geological engineer who has used GRASS since 1987, and served on the GRASS Interagency Steering Committee for the original public-domain package. I wrote the Linux Mini-HOWTO on GRASS-GIS (which is now woefully out of date); and have taught short courses in scientific (as opposed to cartographic) GIS since 1980. In 1994 I moved my teaching to the Web, developing the CyberInstitute Short-Course on GIS. Currently, I'm at UN ESCAP. Open-Source is a great capacity-building environment for software communities worldwide. In developing countries, rather than being stuck merely teaching people to cut and paste stuff within a proprietary office suite, you can be part of the full development team, customizing the software to your community's needs, helping your country to have its own software development community - and hopefully making a satisfying living in the process.
|-
| Gary Sherman
| [http://mrcc.com Micro Resources], [http://qgis.org Quantum GIS]
| OSGeo Member
| (-149.567, 61.32138)
| Consultant, "Father" of Quantum GIS, long-time Linux user and Open Source proponent.
|-
| Astrid Emde
| [http://www.mapbender.org Mapbender], MapServer, PostgreSQL/PostGIS
| Mapbender Development
| (7.0707, 50.7342)
| Projects with MapServer, PostgreSQL/PostGIS, Mapbender. Part of the Mapbender Developer Team. Courses for Mapbender, UMN MapServer, PostgreSQL/PostGIS and WMS, WFS
|-
| Jeroen Ticheler
| [http://geonetwork.sourceforge.net GeoNetwork opensource], [http://sourceforge.net/projects/intermap InterMap opensource], [http://www.fao.org/geonetwork Food and Agriculture Organization GeoNetwork]
| OSGeo member
| 42.07420°N, 12.34343°E
| I've initiated the development of the GeoNetwork opensource Spatial Data Catalog software and its embedded InterMap opensource Map Viewer. I hope to contribute positively to the creation of a comprehensive, FOSS based toolkit for Spatial Data Infrastructures (SDIs) that help people share and use geospatial data and information in an easy and cost effective way. I focus especially on the data sharing within the United Nations system and in countries under development. I promote free and open source software as an excellent option for more sustainable development in these countries, proving it works by applying and further developing it in my day to day work. [http://lists.eogeo.org/mailman/listinfo/opensdi OpenSDI] is a forum to discuss foss and cots integration.
|-
| Dirceu Machado
| [http://www.pti.org.br Itaipu Tecnology Park]
| OSGeo member,GRASS
| 59°S, -24°E
| I'm a Brazilian developer of open source GIS/WEB_GIS applications using PHP, JAVA and Python with Mapserver and PostGIS and also a user and enthusiast of Linux and the BSD operating systems. I'm excited by the idea of a community like this one and I wish to help in any way I can with development (if necessary) and/or documentation translations into Portuguese. Currently I'm working on a project to develop a GIS viewer and map generator (for printing purposes) in Python based on the idea of the JUMP Project.
|-
| Kevin Yam
| [http://www.ene.gov.on.ca Ontario Ministry of the Environment], [http://www.lio.mnr.gov.on.ca, Land Information Ontario]
| OSGeo member
| 43.709, -79.544
| Program coordinator for information management within the Provincial Ministry of the Environment. I focus especially on data sharing between government agencies, departments and local stakeholders, and I am a promoter of open source geospatial tools applicable to environmental monitoring and observing [[User:kevinyam]]
|-
| Colin Gowens
| Geographer, GIS Professional
| OSGeo member
| 33.7518, -84.3920
| User of GRASS, GDAL, OGR, PostGIS and Mapserver since 2002. The open source GIS software and community have proven tremendously valuable to my GIS endeavors.
|-
| David Bitner
| [http://maps.macnoise.com/interactive/ Metropolitan Airports Commission], [http://dbspatial.com/ dbSpatial]
| OSGeo member, Geodata Committee
| 44.844, -93.560
| Active PostGIS and MapServer user. GIS application developer for airport authority and other freelance projects. Serve on Regional/State committees (Minnesota) for Data Sharing and Enterprise Geospatial Architecture. Member of Twin Cities Mapserver Users Group.
|-
| Tyler Mitchell
| [http://spatialguru.com, Spatialguru.com]
| OSGeo Executive Director, [http://visibilitycommittee.osgeo.org Promotion and Visibility Committee], [http://edu.osgeo.org Education Committee]
| 52.13, -121.13 (lat/lon)
| MapServer, PostGIS, GRASS, GDAL user. [http://oreilly.com/catalog/webmapping, O'Reilly Author], writer, promoter of Open Source GIS.
|-
| Rafael Medeiros Sperb
| [http://www.univali.br, G10 - UNIVALI]
| OSGeo Member
| -26.60, -48.70 (lat/lon)
|-
| Steven M. Ottens
| [http://www.geodan.com/ Geodan]
| [http://communitymapbuilder.org MapBuilder]
| 52.34, 4.91 (lat/lon)
| Contributor and member of MapBuilder PMC. [[User:stvn]]
|-
| Stefano Maffulli
| Politecnico di Milano
| [http://visibilitycommittee.osgeo.org Promotion and Visibility Committee], [http://wiki.osgeo.org/index.php/International_Outreach International Outreach], [http://wiki.osgeo.org/index.php/Public_Geospatial_Data_Project Public Geospatial Data]
| 45, 9 (Lat,Lon)
| Architect, worked within the GIS_Lab at University of Florence on research about sustainable development of historical cities. At Joint Research Center (Ispra) worked within the EU funded project [http://commongis.org CommonGIS]. Currently working with Politecnico di Milano as consultant on [http://www.corila.it/ Methodologies and technologies for conservation and restoration of historical Venetian buildings]
|-
| Dave Patton
| [http://members.shaw.ca/davepatton/ CIS Canadian Information Systems]
| helping the Website Committee
| 49.27N 123.15W
| Self-employed computer consultant. Co-lead developer for [http://punt.sourceforge.net/ Punt], an Open Source multi-language Windows desktop application that allows the user to view the terrain of any world in 3D. Canadian Coordinator and co-administrator of [http://www.confluence.org/index.php the Degree Confluence Project] [[User:Dpatton]]
|-
| Jody Garnett<br>[[User:Jive]]
| [http://docs.codehaus.org/display/GEOTOOLS/Home GeoTools]
| Incubation and limited Website Committee
| missing
| It seems all I do is email, must be due to [http://docs.codehaus.org/display/GEOTOOLS/Home GeoTools], [http://docs.codehaus.org/display/GEOS/Home GeoServer], [http://docs.codehaus.org/display/GEO/Home GeoAPI] and [http://udig.refractions.net uDig]. I am working at [http://www.refractions.net/ Refractions Research, Inc], a small consulting company with an open source habit.
|-
| Justin Deoliveira
| [http://topp.openplans.org The Open Planning Project], [http://docs.codehaus.org/display/GEOTOOLS/Home GeoTools], [http://docs.codehaus.org/display/GEOS/Home GeoServer]
| [http://docs.codehaus.org/display/GEOTOOLS/Home GeoTools]
| undeterministic
| [http://docs.codehaus.org/display/GEOTOOLS/Home GeoTools] module maintainer, [http://docs.codehaus.org/display/GEOS/Home GeoServer] developer, and [http://udig.refractions.net uDig] committer. I have been kicking around the Java GIS world for approximately 3 years contributing as an active developer on said projects. For the last year or so I have been working for a non-profit company known as [http://topp.openplans.org The Open Planning Project].
|-
| Dylan Beaudette
| [http://casoilresource.lawr.ucdavis.edu/drupal/node/38 UCD]
| GRASS
| Input lat/long here
| Soils and Biogeochemistry M.S. student at University of California, Davis. Interested in the use and proliferation of OSS in the sciences, particularly soil science. GIS and geomorphologic analysis; presentation of USDA-NCSS digital soil survey information / soils education through visual example.
|-
| Stuart Eve
| [http://www.lparchaeology.com L - P : Archaeology]
| MapServer (user), GRASS (user)
| Input lat/long here
| Involved in using web-based Open Source technologies to make archaeological data accessible to a wider audience. We use MapServer in a number of applications, including [http://www.fastionline.org Fasti Online].
|-
| [[User:Pmarc | Paulo Marcondes]]
| [http://www.marcondes.org marcondes.org], [http://hamstuff.blogspot.com Blog]
| [http://grass.itc.it GRASS] (translator), OSGeo Member (?), [[Brasil | OSGeo Brasil]] (proponent)
| (-22.915,-42.229), Maidenhead: GG87vc
| Working on the GRASS translation to Portuguese (pt_br), somewhat involved (at least intellectually) with Debian-GIS, involved in the local Debian User Group. My interests range from everything spatial to everything geospatial, GIS, GPS, Ham Radio, wardriving, etc. I have a B.S. in Geology (2001) from Universidade de São Paulo, Brasil. I do R&D in the oil industry in a non-GIS arena, but plan to migrate to the GIS arena in the near future. I'm also planning an M.S. in GIS sometime in the future (accepting suggestions).
I would like to see free software adopted everywhere. I don't dislike proprietary software per se, but the attitude it usually inspires.
|-
| [[User:anselm | Anselm Hook]]
| [http://hook.org hook.org], [http://maps.civicactions.net maps.civicactions.net] [http://placedb.org placedb]
|
| (-122.673,45.5371), Portland Oregon
| Both commercial and open source developer. Led engineering for platial.com and wrote placedb.org - also wrote maps.civicactions.net (an ajax tile map engine with a dataset behind it). Also wrote a small java spinny globe at [http://hook.org/headmap headmap]. Interested in providing fully open source map data (not simply applications or tools but actual content). Primarily interested in social and environmental issues with an eye towards modelling near term outcomes of decision making.
|-
| Oscar Cantán
| University of Zaragoza, Spain
| Member
| (41.666,-0.888)
| Currently working on the development and implementation of geospatial interoperability standards. Especially interested in the OGC catalog services specification (CSW, SRW) and metadata content standards (ISO 19119-19139).
|-
| Lorenzo Becchi
| http://www.ominiverdi.org
| OSGeo Member
| moving
| ka-Map developer. User:[[User:Ominiverdi|Ominiverdi]]
|-
| Christoph Baudson
| http://www.mapbender.org
| OSGeo Member
| here, there and everywhere
| Mapbender developer. See [[User:christoph|Christoph]]
|-
| Georg Lösel
| [http://www.grass-verein.de GRASS-Anwender-Vereinigung]
| User (GRASS, QGIS); Free Geodata
| 52.3625, 9.7481
| [[User:Georgloesel|Georg Lösel]]
|-
| Reinhard Simon
| [http://www.cipotato.org International Potato Center, Lima, Peru]
| Project lead: [http://research.cip.cgiar.org/confluence/display/divagis/Home DIVA-GIS]
| NA
| [[User:rsimon|Reinhard Simon]]
|-
| Todd Jamison
| http://www.observera.com
| OSGeo Member; User: OSSIM, GDAL, MapServer; Contributor: OSSIM
| (38.898489, -77.500484)
| Chief Image Scientist and CEO of Observera, Inc. Observera worked on the original OSSIM library with ImageLinks and we have developed several projects using the OSSIM library and MapServer, including ALLEGRO (Land-cover / Land-use Classification) and the Change Detection WorkStation (CDWS), both for the US Army. Expertise includes spectral, thermal, microwave sensors, photogrammetry, image registration, image processing, morphology, resolution enhancement, workflow automation, machine learning (e.g., neural nets, support vector machines, genetic algorithms), Geologic GIS and bunches of other stuff. Glad to be a part of OSGeo.
|-
| Laurent Jégou
| [http://www.univ-tlse2.fr/geoprdc UTM Dept. Géo], [http://www.forumsig.org Forum SIG], [http://www.portailsig.org Portail SIG]
| User and wannabe [http://wiki.osgeo.org/index.php/Core_Curriculum_Project Curriculum project] contributor.
| (43.6N, 1.4E)
| Cartographer (conception, production, integration), cartography and GIS teacher for master's degrees, open source mapping software developer (.Net and Java), technology development monitoring.
|-
| Gary Watry
| [http://www.coaps.fsu.edu Center for Ocean-Atmospheric Prediction Studies - Florida State University]
| NA
| (30.42277,-84.32370)
| [[User:Gary Watry]]
|-
| Hardeep Singh Rai
| [http://www.gndec.ac.in/ Guru Nanak Dev Engineering College, Ludhiana, Punjab]
| -
| (30.55N,75.54E)
| Keen to see the growth of GPL/Open Source software in every field. Civil Engineer, in the teaching profession since 1989. Presently Professor and Head of the Civil Engineering Department.
|-
| Paolo Cavallini
| [http://www.faunalia.it Faunalia]
|
|
|
|-
| M. Agus Salim
| [http://gislab.cifor.cgiar.org/fsic Forest Spatial Information Catalog]
| User
| Bogor, Indonesia [106.752E,6.5533S]
| Working as a GIS Assistant at the Center for International Forestry Research (CIFOR). Currently I am interested in exploring the capabilities of open source geospatial software and hope to be involved as more than a user in the future.
|-
| Janusz Michalak
| [http://www.ptip.org.pl/ Polish Association for Spatial Information]
[http://netgis.geo.uw.edu.pl/ Warsaw University, Dept. of Geology]
| GRASS user
| Warsaw, Poland (52.2118,20.9864)
| Will be added later
|-
| Chris Tweedie
| [http://www.dli.wa.gov.au/ Dept. of Land Information]
| OSGeo Lurker
| Perth, Australia [116.0043,-31.8869]
| Deploying a statewide SDI for WA using largely OSGeo projects. General lurker, I'm afraid: lots of ideas, not enough time.
|-
| The Sunburned Surveyor (A.K.A. - Landon Blake)
| [http://openjump.blogspot.com/index.html My Blog]
| OSGeo Lurker
| Stockton, California
| Project administrator and developer for The JUMP Pilot Project and the SurveyOS Project.
|-
| Rob Atkinson
| [http://online.socialchange.net.au]
| Geoserver PSC, Geotools
| Wollongong,Australia
| SDI Architect. Involved in data standards and tools to deploy them, registry design, standards development (mainly OGC and ISO, INSPIRE.) Generally, enabling Observations and Measurements patterns in OS tools and other consistency/productivity/scalability requirements.
|-
| Mateusz Loskot
| [http://mateusz.loskot.net/ http://mateusz.loskot.net]
| [http://www.gdal.org/ GDAL/OGR] hacker, [http://fdo.osgeo.org FDO] PSC and hacker, [http://wl.sggw.waw.pl/ Warsaw Agricultural University]
| Warsaw,Poland (52.2373 21.0834)
| A freelance geospatial software developer and contributor to various FOSS/GIS projects. Interested in [http://mobile.maptools.org/ mobile GIS solutions]. Active member of various Open Source Software communities. [[User:mloskot]]
|-
| Asif Ahmed
| http://www.on.ec.gc.ca/orise
| OSGeo Member
| Toronto, Canada
| MapServer user, Chameleon user, ColdFusion, .NET, C and Perl experience. Open source enthusiast.
|-
| Dongpo Deng
| [http://www.iis.sinica.edu.tw/~dongpo/cv.html About me]
| OSGeo Lurker
| Taipei, Taiwan (25.041N 121.614E)
| A researcher for open geospatial techniques and data.
|-
| Ian Ibbotson
| http://developer.k-int.com
| OSGeo Lurker
| Sheffield, UK (37.0625,-95.677068)
| Information Retrieval / Information Repository Developer. Worked with USGS on combining text and spatial IR systems, on the GEO Z39.50 profile, and on exposing GEO access points in the SRW/SRU protocol. Developer on the UK People's Network cultural heritage / digital preservation project, amongst other projects with public information / spatial faceted data.
[[Category:Membership]]