OWL Numerics
From OWLED-Wiki
OWL 1.0, in principle, allowed for rather expressive manipulation of data, especially numbers. OWL 1.1 extends this support in a number of ways, some of which have consequences that are surprising to the naive user. This document attempts to explain some of the issues surrounding data values in OWL, with some emphasis on numerics.
Data Values in OWL facts
OWL has several features which involve some sort of numeric computation and reasoning. These fall into two basic camps: counting (of individuals, i.e., the cardinality constructs) and numeric data values. Individuals are generally named with URIs, whereas numeric data values (and, indeed, all data values) have a specific lexical form (i.e., they are written as literals). Numeric data values are, therefore, syntactically restricted to places where literals are legal, e.g., the object of a property. Data values are further segregated in that they can only be the object of a specific set of properties in an ontology: those declared to be "data" or "datatype" properties. Furthermore, data values always have at least one explicit<ref>We count plain literals with or without language tags as explicitly typed because the fact that they are "Plain" or "Plain with Lang" can be determined by their local lexical form alone.</ref> datatype.
Some example data values:
- "A String"^^xsd:string
- "1"^^xsd:integer
- "1.0"^^xsd:decimal
- "1.0"^^xsd:float
A key difference between individuals and data values is that the meaning of individual constants (e.g., "who they denote") is determined by the particular OWL Axioms and Facts in scope. Data values have a predetermined, built-in meaning (that is, the mapping from the name (lexical value) to the meaning (the value) is predefined). Thus, for example, consider the following OWL ontology:
ex:aStringConsistingSolelyOfTheLetterA rdf:type ex:String. ex:theIntegerTwo rdf:type ex:Integer. ex:aStringConsistingSolelyOfTheLetterA owl:sameAs ex:theIntegerTwo.
Is this ontology consistent? The intent seems to be that it is not, since, intuitively, the string "A" is distinct from the integer 2. But this ontology is consistent. To get our desired semantics we would, at least, have to add that:
ex:String owl:disjointWith ex:Integer.
For data values, this particular aspect is already built in, e.g.,:
"A"^^xsd:string owl:sameAs "2"^^xsd:integer.
is inconsistent. Or rather, it would be if that were legal RDF. We can get the same effect by implying the equality, e.g.,:
ex:p rdf:type owl:FunctionalProperty. ex:t ex:p "A"^^xsd:string . ex:t ex:p "2"^^xsd:integer.
Since ex:p is functional, any values of ex:p for ex:t have to be one and the same, thus we get an inconsistency.
Note that the semantics of data values (and their types) is stronger than the odd disjointness: Individual values within a type have equality conditions, e.g.:
ex:p rdf:type owl:FunctionalProperty. ex:t ex:p "3"^^xsd:integer . ex:t ex:p "2"^^xsd:integer.
is inconsistent.
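The built-in meaning of literals can be sketched in a few lines of Python. This is only an illustration, not how any particular reasoner works: the function names `literal_value` and `functional_clash` are hypothetical, only xsd:string and xsd:integer are handled, and Python's `int()` is a looser parser than the xsd:integer lexical space.

```python
def literal_value(lexical, datatype):
    """Map a (lexical form, datatype) pair to a tagged value.

    The tag keeps the value spaces apart, so the string "1" never
    equals the integer 1.
    """
    if datatype == "xsd:integer":
        # Simplification: int() accepts some forms xsd:integer does not.
        return ("integer", int(lexical))
    if datatype == "xsd:string":
        return ("string", lexical)
    raise ValueError(f"unsupported datatype in this sketch: {datatype}")


def functional_clash(literals):
    """True if these literals cannot all be the value of one functional
    data property on the same subject (i.e., they denote more than one value)."""
    values = {literal_value(lexical, datatype) for lexical, datatype in literals}
    return len(values) > 1


# The two inconsistent examples from the text:
print(functional_clash([("A", "xsd:string"), ("2", "xsd:integer")]))   # True
print(functional_clash([("3", "xsd:integer"), ("2", "xsd:integer")]))  # True
# Different lexical forms, same value: no clash.
print(functional_clash([("01", "xsd:integer"), ("001", "xsd:integer")]))  # False
```

The point of the sketch is that equality is decided on values, not on the written form, and that this decision needs no further axioms in the ontology.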
Counting and Cardinality Constructs
Now, we can use integers to count! For example:
ex:t ex:p "1"^^xsd:integer . ex:t ex:p "2"^^xsd:integer. ex:t ex:p "3"^^xsd:integer.
This entails that:
ex:t rdf:type ex:Min3Ps
given that:
ex:Min3Ps owl:equivalentClass [a owl:Restriction;
owl:onProperty ex:p;
owl:minCardinality "3"^^xsd:nonNegativeInteger].
Of course, as this latter construction shows, we can count in OWL without appeal to data properties and values, e.g.,:
ex:t ex:p ex:s . ex:t ex:p ex:r. ex:t ex:p ex:u.
However, ex:t is not inferred to be an instance of ex:Min3Ps (even shifting ex:p to an object property). In OWL, data values come with a specific built in syntax (this is, in part, why they are called "literals") and with an associated semantics. An OWL reasoner has the built in knowledge that "1"^^xsd:integer is not equal to "2"^^xsd:integer (and that "01"^^xsd:integer is the same thing as "001"^^xsd:integer and different from "1"^^xsd:string). The individuals ex:s, ex:r, ex:u do not come with the same "amount" of semantics. Their meaning is primarily characterized by other statements in the ontology. For normal individuals, OWL presumes very little. In the above example, it is not stated whether ex:s is the same as ex:r or not and we have no presumption either way. So, for all a reasoner knows, ex:s, ex:r, and ex:u could refer to the very same thing, thus there is not enough evidence to conclude that ex:t is an instance of ex:Min3Ps.
We can rectify this by adding more information, e.g.,:
ex:t ex:p ex:s . ex:t ex:p ex:r. ex:t ex:p ex:u. ex:s owl:differentFrom ex:r. ex:s owl:differentFrom ex:u. ex:r owl:differentFrom ex:u.
Now we can infer that ex:t rdf:type ex:Min3Ps.
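The counting argument can be made concrete with a small brute-force sketch. It assumes the only relevant statements are the ex:p assertions and owl:differentFrom facts between the named fillers; the function name `min_distinct` is hypothetical. It computes the smallest number of distinct individuals the names could denote in some model, which is what a minimum cardinality inference has to rely on.

```python
from itertools import product


def min_distinct(names, different_pairs, una=False):
    """Smallest number of distinct individuals the names can denote.

    names           -- the named fillers of the property
    different_pairs -- pairs (a, b) stated to be owl:differentFrom
    una             -- if True, every name denotes a different individual
    """
    names = sorted(set(names))
    if una:
        return len(names)
    # Try to map the names onto k individuals, for increasing k,
    # without violating any stated inequality.
    for k in range(1, len(names) + 1):
        for assignment in product(range(k), repeat=len(names)):
            denotes = dict(zip(names, assignment))
            if all(denotes[a] != denotes[b] for a, b in different_pairs):
                return k
    return 0


fillers = ["ex:s", "ex:r", "ex:u"]
all_different = [("ex:s", "ex:r"), ("ex:s", "ex:u"), ("ex:r", "ex:u")]

print(min_distinct(fillers, []))             # 1: all three names may corefer
print(min_distinct(fillers, all_different))  # 3: enough for ex:Min3Ps
print(min_distinct(fillers, [], una=True))   # 3: the UNA gives the same result
```

Only when the result reaches 3 is there enough evidence that ex:t has at least three ex:p successors. The search is exponential and is meant for a handful of names only.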
These sorts of interaction with counting are probably the most common issue with numerics in OWL, see:
In the above example, what commonly happens is that people expect that different names refer to different individuals. (This is mostly true with data values, though there are some classes of names which are coreferring.) After all, normally, we do use different names to indicate that we're talking about different things. If we presume that this normal case is the only case then we have made the Unique Name Assumption (UNA). With the UNA, the example with or without the explicit differentFroms has exactly the same meaning.
The UNA can help with inferring minimums (just as with data values), but it doesn't help with inferring maximums. None of the example ontologies (with or without data values; with or without explicit inequalities; with or without the UNA) allows us to infer that ex:t is an instance of ex:Max3Ps, where:
ex:Max3Ps owl:equivalentClass [a owl:Restriction;
owl:onProperty ex:p;
owl:maxCardinality "3"^^xsd:nonNegativeInteger].
This is due to the fact that OWL makes the open world assumption (OWA). That is, for all we've said in the examples, ex:t could have ex:p relations to additional, distinct objects. So, for all we know:
ex:t ex:p ex:z.
Where ex:z is distinct from all the other ex:p successors. Since we've not said enough to rule out this possibility, an OWL reasoner must assume that it's a live one. This is often surprising, as we commonly do presume that we've mentioned everything that's relevant (and that's roughly how databases work). We can "close the world" with respect to ex:t and ex:p in a number of ways, with perhaps the most satisfying being:
ex:t rdf:type [a owl:Restriction;
owl:onProperty ex:p;
owl:allValuesFrom [a owl:Class; owl:oneOf (ex:s ex:r ex:u)]].
This says that ex:t can have (but doesn't need to have) an ex:p relation to ex:s, or ex:r, or ex:u. In other words, if you see ex:t having an ex:p relation it has to be to one of those guys.
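A deliberately tiny sketch of the difference closure makes for maximums. The function name `provable_max` is hypothetical, and the sketch considers only an enumerated allValuesFrom restriction of the kind shown above.

```python
def provable_max(enumerated_fillers=None):
    """Upper bound on the number of distinct property successors.

    None means the world is open: further, unmentioned successors
    are always possible, so no finite bound can be inferred.
    """
    if enumerated_fillers is None:
        return None
    # Every successor must be one of the enumerated names; names may
    # corefer, so this is an upper bound, not an exact count.
    return len(set(enumerated_fillers))


print(provable_max())                          # None: nothing rules out ex:z
print(provable_max(["ex:s", "ex:r", "ex:u"]))  # 3: ex:t is now an ex:Max3Ps
```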
While the built-in semantics of data values tends to compensate for the lack of the UNA, it does nothing with regard to the OWA, except that it provides ways other than enumeration (e.g., range-restricted types) to build the possible targets of a role-closing allValuesFrom.
General Issues with Computational Numerics
There are sometimes surprising interactions between various types of numbers, ways of writing those numbers down, and operations on those numbers. One characteristic example is the lack of closure for some operation with respect to some type of number. For example, subtraction is not closed with respect to the natural numbers. If we ask for the natural number that is described by the expression 3-2 (where 3 and 2 are natural numbers), we get the answer 1 (a natural). However, there is no natural number described by the expression 2-3.
Assuming that we have both natural numbers and integers as available types and subtraction, we could coerce 3 and 2 in the second expression to integers and return an integer answer (-1). (Or we could see subtraction as sometimes returning integer answers for natural number inputs...perhaps by treating the natural numbers as a subtype of the integers.) In this case the coercion is straightforwardly harmless (unless we wanted an error to be raised for negative answers!).
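The two readings of subtraction can be sketched side by side. The function names are hypothetical, and Python's `int` stands in for both the naturals (by checking for non-negative operands) and the integers.

```python
def subtract_natural(a: int, b: int) -> int:
    """Subtraction as a partial operation on the natural numbers."""
    if a < 0 or b < 0:
        raise ValueError("operands must be natural numbers")
    if b > a:
        # 2 - 3 names no natural number, so the operation is undefined here.
        raise ArithmeticError(f"{a} - {b} is not a natural number")
    return a - b


def subtract_coerced(a: int, b: int) -> int:
    """Treat the naturals as a subtype of the integers; the result may be negative."""
    if a < 0 or b < 0:
        raise ValueError("operands must be natural numbers")
    return a - b


print(subtract_natural(3, 2))  # 1
print(subtract_coerced(2, 3))  # -1
```

Calling `subtract_natural(2, 3)` raises `ArithmeticError`, which is the behaviour one would want if negative answers are supposed to be errors.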
Similarly, there are issues derived from the sorts of representation we use to work with numbers. An obvious example is the use of bounded representations for infinite sets, e.g., 32 bit integers. Obviously, since computers are finite physical systems (and quite small ones, really) there are going to be (finite) sets of integers they can't manipulate. With an unbounded representation, what happens when we hit an integer that's too large is that we run out of memory (or some other resource). When we fix the size of our representation we can easily compute integers that are larger than the representation allows (for example, the result of 4,294,967,295 + 1 is larger than all the unsigned 32 bit integers). System behavior in this circumstance varies widely. Three obvious possibilities are to saturate, wrap, or raise an overflow error. If your addition operator saturates, then any result that would be larger than the maximum representable number is truncated to the maximum. If your addition operator wraps, then counting continues from the smallest representable number once the maximum is passed (i.e., the result is the true sum modulo 2^32). In our above example, the result would be 0 (since we added 1 to the "last" int, we get the "first" int). Finally, of course, we can say that the result is "too big" to be one of our ints and thus is an error. Saturation and wrapping give you closure (on addition over 32 bit integers) at the expense of the appealing property that the sum of two positive integers is larger than either operand.
(In the C programming language, addition overflow wraps for unsigned integers and is undefined for signed integers!)
Even if a number is "in bounds" for a (finite) representation (such as we use in computers), it may be that not all numbers within the bounds are representable. This is quite obvious if one considers the set of reals between 0 and 1. There are just too many for any representation with fixed finite bounds to represent. (Actually, since that set isn't denumerable, even availing ourselves of arbitrarily long, but finite, strings wouldn't help.)
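The three overflow behaviours can be sketched for unsigned 32 bit addition as follows. Python integers are unbounded, so the bound is imposed by hand; the function names are hypothetical.

```python
UINT32_MAX = 2**32 - 1  # 4,294,967,295, the largest unsigned 32 bit integer


def add_wrap(a: int, b: int) -> int:
    """Wrapping addition: the true sum modulo 2**32."""
    return (a + b) % (UINT32_MAX + 1)


def add_saturate(a: int, b: int) -> int:
    """Saturating addition: results above the maximum are clamped to it."""
    return min(a + b, UINT32_MAX)


def add_checked(a: int, b: int) -> int:
    """Checked addition: results above the maximum are an error."""
    total = a + b
    if total > UINT32_MAX:
        raise OverflowError(f"{a} + {b} does not fit in 32 bits")
    return total


print(add_wrap(UINT32_MAX, 1))      # 0
print(add_saturate(UINT32_MAX, 1))  # 4294967295
```

With `add_wrap` and `add_saturate` the operation is closed over the 32 bit range, but `add_wrap(UINT32_MAX, 1)` is smaller than both operands and `add_saturate(UINT32_MAX, 1)` is no larger than the first. `add_checked(UINT32_MAX, 1)` raises `OverflowError` instead.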
- http://docs.sun.com/app/docs/doc/800-7895/6hos0aou4?a=view
- http://www2.hursley.ibm.com/decimal/decifaq1.html
- http://www.cs.utah.edu/~zachary/isp/applets/FP/FP.html
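The point about in-bounds but unrepresentable numbers can be seen with ordinary binary floating point. The sketch below assumes Python floats are IEEE 754 doubles, which is the case on common platforms; the variable names are illustrative only.

```python
from fractions import Fraction

# Fraction(float) gives the exact rational value that is actually stored.
stored = Fraction(0.1)
tenth = Fraction(1, 10)

is_exact = stored == tenth       # False: 0.1 is not representable in binary
error = stored - tenth           # small but nonzero representation error
sum_matches = 0.1 + 0.2 == 0.3   # False: the errors surface in arithmetic

print(is_exact)
print(float(error))
print(sum_matches)
```

The literal 0.1 lies well within the range of the type, yet the value stored is only the nearest representable neighbour, a fraction whose denominator is a power of two.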
Rounding and Gap issues
I.e., stuff deriving from finite, inexact representations of e.g., dense sets.
Finitude, Counting, and Performance
How finite datatypes can really kill reasoner performance

