SPARQL/OWL

From OWLED-Wiki

Revision as of 16:30, 25 September 2009 by Bglimm (Talk | contribs)
(diff) ←Older revision | Current revision (diff) | Newer revision→ (diff)
Jump to: navigation, search

SPARQL Entailment

Authors
Birte Glimm, Oxford University Computing Laboratory
Bijan Parsia, University of Manchester
Abstract
Status of this Document
This document is an editors' draft.

Copyright © 2009 by the Authors.

Contents


Abstract

SPARQL is a query language for data that is stored natively as RDF or viewed as RDF via middleware. What correct answers to a SPARQL query are depends on the used entailment regime and the vocabulary from which the resulting answers can be taken. The first version of SPARQL was defined only for simple entailment, but it defines a set of conditions that have to be met when defining what correct results are for SPARQL queries under different entailment regimes. The goal of this document is to specify conditions such that SPARQL can be used with entailment regimes other than simple entailment, most notably we define the semantics of SPARQL queries under RDFS entailment and OWL Direct and RDF-Based semantics.


Introduction

Editor's Note: Note that the entailments to be handled are not finalized. Work in progress, etc.

An entailment regime specifies

  1. a subset of RDF graphs called well-formed for the regime
  2. an entailment relation between subsets of well-formed graphs and well-formed graphs.

We do not change any of the existing entailment regimes, but rather define the vocabulary from which possible answers can be taken and which answers are legal answers in order to guarantee that query answers are always finite. We also do not restrict the set of legal graphs that can be queried apart from the restriction to graphs that are legal under the entailment regime in question. E.g., under the RDFS entailment regime, one can query all legal RDFS graphs, while under OWL 2 DL Direct Semantics, one can query all graphs that correspond to legal OWL 2 DL ontologies. We do define which queries are legal and how illegal queries or graphs and inconsistencies are handled.

Overview

SPARQL 1 defined a set of conditions for entailment extensions to SPARQL BGP evaluation that we repeat here for clarity and give an idea of how the different entailment regimes up to OWL 2 entailment satisfy them. Further details for each of the specified entailment regimes can be found in the corresponding section for that entailment regime.

1 -- The scoping graph, SG, corresponding to any consistent active graph AG is uniquely specified and is E-equivalent to AG.

2 -- For any basic graph pattern BGP and pattern solution mapping P, P(BGP) is well-formed for E.

3 -- For any scoping graph SG and answer set {P1 ... Pn} for a basic graph pattern BGP, and where {BGP1 .... BGPn} is a set of basic graph patterns all equivalent to BGP, none of which share any blank nodes with any other or with SG

   SG E-entails (SG union P1(BGP1) union ... union Pn(BGPn))

These conditions do not fully determine the set of possible answers, since RDF allows unlimited amounts of redundancy. In addition, therefore, the following must hold.

4 -- Each SPARQL extension must provide conditions on answer sets which guarantee that every BGP and AG has a finite set of answers which is unique up to RDF graph equivalence.

For 1: All entailment regimes specified here use the same definition of a scoping graph as given in SPARQL 1. Thus, the required equivalence is immediate.

For 2: This gives minimal conditions on what legal answers are for an entailment regime E: A mapping is only a legal solution mapping and included in the answer if applying the mapping yields a set of RDF triples that are well-formed for E. For example, under RDFS entailment, any SPARQL query is legal, but queries that require literals as a binding for a variable in a subject position have no answer because all mappings that result in a set of RDFS entailed tripes are not well-formed RDF since RDF forbids literals in the subject position. Similarly, for OWL 2 DL entailment, a query might have no answer because all possible bindings might result in RDF triples that are not well-formed for OWL 2 DL.

For 3: This condition prevents the reuse of blank nodes between query answers unless those blank nodes are really the same in the queried graph. Under this restriction no accidental co-references among blank nodes are introduced. Since we use the same definition of a scoping graph, the condition is also satisfied.

Editor's Note: The proof needs to be redone though since it uses the interpolation lemma, which does not hold e.g., for RDFS-entailment.

For 4: For entailment regimes other then simple entailment this point is very important since a finite set of answers is no longer guaranteed. For example, already under RDF and RDFS entailment, even the empty graph entails an infinite number of axiomatic triples such as rdf:_1 rdf:type rdf:Property, rdf:_2 rdf:type rdf:Property, ... Thus, a query with BGP { ?x rdf:type rdf:Property . } would, without further restrictions, have infinitely many answers. In order to guarantee finite answers, different entailment regimes will impose different restrictions on the vocabulary from which bindings can be taken. We explain these restrictions in greater detail in the following section.


Details for different entailment regimes

For each of the covered entailment regimes, we adress all the conditions that entailment extensions to SPARQL basic graph pattern matching have to satify.

RDFS Entailment

We only support RDFS (and not RDF) as a distinct, named regime. The only additional answers from RDFS compared to RDF are some axiomatic triples, plus any IRI used as a property will end up as part of an answer to ?p rdf:type rdf:Property. Systems who want to support only RDF entailment instead of RDFS entailment, can state that they do support the RDF subset of the RDFS entailment regime.


Name RDFS
Legal Graphs All legal RDF graphs that are RDFS-consistent.
Legal Queries Any legal SPARQL query.
Illegal handling In case the query or the queried graph is illegal, the system must raise an error.
Entailment RDFS Entailment
Inconsistency Inconsistent graphs are illegal and thus trigger an error. (Maybe: An implementation MAY offer to return plain SPARQL 1.0 answers with a warning.) (Maybe: DESCRIBE queries MUST report that the queried graph is inconsistent and MAY give some explanations. )
Query Answers A pattern instance mapping P for RDFS entailment is a pair (μ, σ), where μ : V -> RDF-T is a solution mapping and σ : RDF-B -> RDF-T is an RDF instance mapping. Let BGP be a basic graph pattern, V(BGP) the set of variables in BGP, B(BGP) the set of blank nodes in BGP, G an RDF graph that is RDFS-consistent, and P=(μ, σ) a pattern instance mapping for RDFS entailment. We write P(BGP) to denote the result of replacing each variable v from V(BGP) for which μ is defined with μ(v) and each blank node b from B(BGP) for which σ is defined with σ(b).

A solution mapping μ is a possible solution for BGP from G under RDFS entailment if dom(μ) = V(BGP) and there is an RDF instance mapping σ from B(BGP) to RDF-T such that dom(σ)=B(BGP) and the pattern instance mapping P=(μ, σ) is such that P(BGP) are well-formed RDF triples that are RDFS entailed by G.

A possible solution μ is a solution for BGP from G under RDFS entailment if (C1) for each variable v that occurs in the subject position of a triple pattern in BGP, μ(v) occurs in the signature of the scoping graph or the query and (C2) if μ(v) is a blank node, then μ(v) occurs in the scoping graph SG.

Please note that we define what legal answers under RDFS entailment are in a two-stage process. Intuitively, the possible answers are all answers that one would expect under RDFS entailment, i.e., all mappings such that instantiating the basic graph patterns with them results in RDF triples that are RDFS entailed by the queried graph. The set of possible answers is, however, not necessarily finite. To obtain always a finite set of answers, the next step defines which of the possible answers are actually to be returned as answers to the query. For example, consider the query

SELECT ?x WHERE { ?x rdf:type rdf:property . } 

against a graph containing only the triple

<ex:a> <ex:b> <ex:c> . 

As part of the possible solutions we find { (x, <ex:b>) } since according to the rdfs entailment rule rdf1

<ex:a> <ex:b> <ex:c> . 

entails

<ex:b> rdf:type rdf:property . 

Further, from the axiomatic triples, we also get the possible solutions { (x, rdf:_1) }, { (x, rdf:_2) }, ... We get even more possible answers since the rdfs rule se2 means that

<ex:b> rdf:type rdf:property . 

entails

_:exb rdf:type rdf:property . 

for some blank node _:exb allocated to <ex:b> and { (x, _:exb) } is a possible solution.

Clearly, the set of possible solutions is infinite in this case, but for a possible solution to actually be a solution, two further conditions have to be met: (C1) for each variable v that occurs in the subject position of a triple pattern in BGP, μ(v) occurs in the signature of the scoping graph or the query (C2) if μ(v) is a blank node, then μ(v) occurs in the scoping graph SG In this case, ?x occurs in the subject position and the binding for ?x is required to occur in the scoping graph or the query. This rules out all axiomatic triples since no subject of an axiomatic triple occurs in the queried graph or the query. It also rules out { (x, _:b1) } since _:b1 does not occur in the scoping graph and the only answer is { (x, <ex:b>) }.

Regarding the second condition, consider the query

SELECT ?x WHERE { <ex:a> <ex:b> ?x . } 

against the a graph G containing the triples

<ex:a> <ex:b> <ex:c> . 
<ex:a> <ex:b> _:c . 

We assume that the scoping graph SG contains the following triples:

<ex:a> <ex:b> <ex:c> . 
<ex:a> <ex:b> _:sgc . 

By definition of graph instances, SG is an instance of G and we say that G is equivalent to SG although the blank node _:c has been renamed to _:sgc. By rule se1, the triple

<ex:a> <ex:b> _:exc .  

is entailed by the queried graph where _:exc is the blank node allocated to <ex:c>. It is also valid to introduce even more blank nodes by applying the same rule to the above triple, e.g., the triple

<ex:a> <ex:b> _:exc2 . 

with _:exc2 allocated to _:exc1 is a valid entailment according to the rule se1. Thus, we have at least { (x, <ex:c>) }, { (x, _:sgc) }, and { (x, _:exc) } as possible solutions and we could think of infinitely more possible solutions of the form { (x, _:exc2) } witnessed by repeated application of the rule se1 with fresh blank nodes for each application. In this case condition C2 on solutions rules out possible solution of the form { (x,_:exc1) }, { (x,_:exc2) }, ... since _:exc1, _:exc2, ... do not occur in SG.

Please note that this works well with the sound and complete entailment algorithm for RDFS as proposed by ter Horst [RDFSENTAILMENT] and guarantees interoperability between systems independent of the intermediate and possibly redundant consequences that different systems might add or avoid to add.

Note, however, that the first condition introduces a kind of non-monotonic behavior. Consider the ASK query

ASK { rdf:_1 rdf:type rdf:property . }

against the empty graph. The answer to the query is

yes

because the triple is RDFS entailed by the empty graph and rdf:_1 occurs in the signature of the query. The ASK query

ASK { ?x rdf:type rdf:property . }

against the empty graph has, however, the answer

no

because ?x is in the subject position and all bindings that lead to a possible solution are ruled out from the actual solutions by condition C1. Rewriting the query into a select query would also return no answer.

Please note that solution mappings that maps variables that occur in the subject in the basic graph pattern BGP to literals will not be returned as solutions even though there might be a pattern instance mapping P for the solution mapping such that P(BGP) is RDFS entailed by the queried graph, but P(BGP) is not well-formed as required (see also the SPARQL triple patterns definition). E.g., given a query

SELECT ?x WHERE { ?x rdf:type rdf:XMLLiteral . }

even the empty graph would entail all statements xxx rdf:type rdf:XMLLiteral for xxx a well-formed RDF XML literal, but applying a solution mapping such as ?x/"<a>abc</a>"^^rdf:XMLLiteral would result in a triple that is not a valid RDF(S) triple.

Editor's Note: Other possible design choices are:

1) Instead of using the signature to limit the number of axiomatic triples, we could simply exclude the axiomatic triples from the set of query answers. This still results in a kind of non-monotonic behavior.

2) We do return axiomatic triples, but select or construct queries that contain one of the following the triple patterns

?x rdf:type rdf:Property . 
?x rdf:type rdfs:ContainerMembershipProperty . 
?x rdfs:domain rdfs:Resource . 
?x rdfs:range rdfs:Resource . 

must satisfy additional safety restrictions. E.g., if a query contains the one of the above triple patterns, then it must specify a limit clause. This would have the advantage that we do have a kind of monotonic behavior, i.e., both ask queries described above would ahve the answer yes and the subject would be contained in the answers for the according select query, but the additional limit clause would restrict answers to a finite subset of all answers. The disadvantage is that we do have no control as to what would be in the returned answer and what would be left out, which is not very intuitive either.


Editor's Note: Please ignore what is said below for now:

OWL 2 DL, EL, and QL

We explicitly specify the OWL 2 DL entailment regime and state below how it can be used for the other OWL 2 EL and QL profiles that use direct semantics.


Name OWL 2 DL
Legal Graphs Any RDF graph which is an OWL 2 DL ontology document which does not entail that owl:Thing rdfs:SubClassOf owl:Nothing, i.e., the given ontology must be consistent.
Legal Queries Any legal SPARQL query.
Illegal handling In case the query or the queried graph is illegal, the system must raise an error.
Entailment See OWL conformance
Inconsistency Inconsistent graphs are illegal thus trigger an error. (Maybe: An implementation MAY offer to return plain SPARQL 1.0 answers with a warning.) (Maybe: DESCRIBE queries MUST report that the queried graph is inconsistent and MAY give some explanations. )
Query Answers Possible answers are solution mappings that, when applied to the BGP of the query, yield a set of well-formed RDF triples that are RDFS-entailed by the queried graph. The set of blank nodes available for binding consists of all blank nodes from the scoping graph and has the same cardinality as the set of explicit blank nodes in the active graph.

Bindings are limited to the signature of the ontology, plus BNodes from the ABox (only).


The OWL 2 DL entailment regime can easily be used for OWL 2 EL and QL profiles, which also use direct semantics. The only difference for the OWL2 EL and OWL 2 QL profiles is that the legal graphs are restricted to those that correspond to OWL 2 EL and OWL 2 QL ontologies respectively. Similarly, valid solution mappings are restricted to those that, when applied to the BGP, result in a set of RDF triples that are valid for OWL 2 DL or OWL 2 QL respectively.


Editor's Note: We plan to interpret all variables as distinguished (where SG BNodes count as part of the active domain.

Editor's Note: See how we can make annotations as query answers work in OWL.

Options for coping with generative literal expressio:ns

  1. Like "easy" keys in OWL, consider union of all finite ranges as possible bindings. Advantages: Expressive, natural. Disadvantage: Hell to implement.
  2. Named literals only as binding candidates. Advantage: Easy to implement. Disadvantage: "non-local" changes to answers.
  3. Local ABox DP closure. Advantage: Easy to implement, no non-local effects. Disadvantage: Some equivalent queries become non-equivalent.

Alternative Syntaxes

Editor's Note: This is speculative text. May need to be spun off.

OWL 2 is defined in terms of an abstract data model and not directly in triples. This data model is realized in several distinct concrete non-triple syntaxes including OWL/XML and the Manchester Syntax. In a SPARQL engine processes these variant syntaxes, it SHOULD answer SPARQL queries posed against datasets derived from those variant syntaxes as if they were transformed into the corresponding RDF graphs. Note, that a SPARQL implementation does not need to actually perform the conversion internally. SPARQL engines MAY process queries that are expressed in syntactic variants of SPARQL wherein the basic graph patterns are instead expressed, via transformation into the data model, in an alternative concrete syntax. Thus, for example,

PREFIX rdfs:	http://www.w3.org/2000/01/rdf-schema#
SELECT ?C
WHERE
{
  ?C rdfs:subClassOf ?D .
}  

may be expressed using Manchester Syntax as:

SELECT ?C
WHERE
{
  Class: ?C
      SubClassOf: ?D
}

Extensions

Editor's Note: Update? ParentOf...

Helpers

References

Informative References

[RDFSENTAILMENT] Completeness, decidability and complexity of entailment for RDF Schema and a semantic extension involving the OWL vocabulary, Herman J. ter Horst, Journal of Web Semantics, 3(2-3):79-115, 2005.

Personal tools
External Resources