Shape Fragments Specification

Unofficial Draft

Latest editor's draft:
https://shape-fragments.github.io/shape-fragments-spec/
Editors:
(Ghent University – imec – IDLab)
(UHasselt)
(UHasselt)
(Ghent University – imec – IDLab)
This Version
https://shape-fragments.github.io/shape-fragments-spec/20211213/
Previous Version
https://shape-fragments.github.io/shape-fragments-spec/20211213/
Website
https://github.com/shape-fragments

Abstract

A shape fragment is the part of an RDF graph that conforms to a SHACL shape. With shape fragments, the SHACL language can be used to define subgraphs of an RDF graph. The current document describes the shape fragments concepts through definitions and examples.

The version of this document is v0.0.0.

Status of This Document

This document is a draft of a potential specification. It has no official standing of any kind and does not represent the support or consensus of any standards organization.

This is an early draft, yet efforts are made to keep things stable.

1. Introduction

This document interprets SHACL [SHACL] as a subgraph extraction language. We call the extracted subgraphs shape fragments. Shape fragments are defined using neighborhoods. The neighborhood of a node for a shape is the part of the data graph that shows that the node conforms to the shape. Neighborhoods are defined differently for different kinds of constraints; the remainder of this document gives exact definitions of neighborhoods for each constraint type in SHACL core.

2. Terminology

Throughout this document, we use terms that are defined in the SHACL specification [SHACL]. For those SHACL terms that are used often, we restate their definitions in this section. This document is written for readers who are already familiar with RDF [RDF-concepts].

SHACL assumes a data graph (SHACL: data graph) and a shapes graph (SHACL: shapes graph). The data graph contains some information and the shapes graph is used to validate that information.

A shapes graph is an RDF graph that contains the definition of one or more shapes (SHACL: shape). Shapes are defined by target declarations (SHACL: target declaration) and constraints (SHACL: constraint). Target declarations define a set of nodes, the target (SHACL: target), in the data graph that must conform to the shape’s constraints. Each constraint in the shapes graph is defined by its kind (SHACL: kind), i.e., a constraint component (SHACL: constraint component), and by its parameters (SHACL: parameter). Depending on the kind and parameters of a constraint, nodes are tested against different conditions, for example: whether strings or URIs match a regular expression, whether a node has a minimum cardinality for certain properties, and many more.

SHACL distinguishes between node shapes (SHACL: node shape) and property shapes (SHACL: property shape), the difference being that the former do not have property paths (SHACL: property path) and therefore place constraints on the nodes themselves, while the latter do have paths. Property shapes place constraints on nodes reachable through paths that match a property path expression. The simplest form of a property path expression is a predicate path (SHACL: predicate path); more complex forms include concatenation (SHACL: concatenation), inversion (SHACL: inversion) and transitive closure (SHACL: transitive closure) of property paths.

SHACL distinguishes between focus nodes and value nodes. Focus nodes (SHACL: focus node) are the nodes that are being validated by a shape. During validation of a focus node, the concept of value nodes is used. The set of value nodes (SHACL: value nodes) is defined as the nodes reachable from the focus node through the property path if the shape is a property shape, and as the focus node itself if the shape has no property path (i.e., for node shapes).
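The distinction between focus nodes and value nodes can be sketched in a few lines of Python. This is only an illustration: it assumes triples are modeled as plain (subject, predicate, object) tuples of strings, it supports predicate paths only, and the function name value_nodes is a hypothetical helper, not a term from SHACL.

```python
# Triples are modeled as (subject, predicate, object) tuples of plain strings.
def value_nodes(graph, focus, predicate=None):
    """Return the value nodes of `focus`.

    For a node shape (no path, predicate is None) this is the focus node
    itself; for a property shape with a predicate path it is the set of
    objects reachable from the focus node over that predicate.
    """
    if predicate is None:
        return {focus}
    return {o for (s, p, o) in graph if s == focus and p == predicate}


data = {
    (":alice", ":knows", ":bob"),
    (":alice", ":knows", ":carol"),
    (":bob", ":knows", ":carol"),
}

print(value_nodes(data, ":alice"))                    # node shape
print(sorted(value_nodes(data, ":alice", ":knows")))  # property shape
```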

3. Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

4. Document conventions

This section is non-normative.

In this document, examples assume the following namespace prefix bindings unless otherwise stated:

Prefix  Namespace
sh:     http://www.w3.org/ns/shacl#
xsd:    http://www.w3.org/2001/XMLSchema#
:       http://example.com/ns#
rdf:    http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs:   http://www.w3.org/2000/01/rdf-schema#

The examples in this document use color-coded boxes containing RDF graphs in the Turtle syntax [Turtle]. We use different colors for data graphs, shapes graphs, and shapes graph fragments, as follows:

# This box contains an example data graph
# This box contains an example shapes graph
# This box contains an example shapes graph fragment

5. Extracting subgraphs with shapes

A shapes graph fragment is a subgraph derived from a data graph for a given shapes graph. A shapes graph fragment is the union of the shape fragments of the individual shapes in the shapes graph. For each shape in the shapes graph, that shape's shape fragment contains the neighborhood for that shape of each focus node in the shape's target.
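The union structure described above can be sketched as follows. The sketch makes several assumptions that are not part of this document: triples are (subject, predicate, object) tuples, each shape is represented as a pair of an explicit list of target nodes and a hypothetical neighborhood function, and that function already returns the empty set for nodes that do not conform.

```python
def shape_fragment(graph, targets, neighborhood):
    """Union of the neighborhoods of all target nodes for one shape."""
    fragment = set()
    for node in targets:
        fragment |= neighborhood(graph, node)
    return fragment


def shapes_graph_fragment(graph, shapes):
    """Union of the shape fragments of all shapes in the shapes graph."""
    result = set()
    for targets, neighborhood in shapes:
        result |= shape_fragment(graph, targets, neighborhood)
    return result


def requires(predicate):
    """Hypothetical shape: 'has at least one value for `predicate`'.

    The neighborhood is the matching triples, which is empty exactly
    when the node does not conform.
    """
    def neighborhood(graph, node):
        return {t for t in graph if t[0] == node and t[1] == predicate}
    return neighborhood


data = {
    (":alice", ":name", "Alice"),
    (":alice", ":age", "30"),
    (":bob", ":age", "25"),
}
shapes = [
    ([":alice", ":bob"], requires(":name")),  # :bob does not conform
    ([":bob"], requires(":age")),
]

fragment = shapes_graph_fragment(data, shapes)
print(sorted(fragment))
```

In this toy run the triple about the age of :alice is not part of the fragment, because :alice is not in the target of the second shape.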

Diagram showing that Shape Fragments outputs a subgraph based on the input data and shapes graph.
Figure 1 Subgraph extraction with shape fragments returns the validated subgraph of a graph instead of a validation report.

5.1 Shape neighborhoods

The neighborhood of a focus node for a shape is a subgraph of the data graph. If a focus node does not conform to a shape, the neighborhood of that node for that shape is defined as being empty. If a focus node does conform to a shape, the neighborhood of that focus node for that shape is defined as containing exactly:

5.2 Target triples

The target triples of a focus node for a shape intuitively are those triples in the data graph that make the focus node part of the target of that shape. The target triples of a focus node for a shape are defined as the union of the target triples for that focus node and each of the shape's target declarations:

If a node is not in the target of a shape, there are no target triples of that focus node for that shape.

5.3 Path triples

Throughout this document, the term path triples refers to triples in the data graph that lie on a path. Path triples are defined by a property path, a start node, and an end node. The path triples between a start node and an end node for a property path are exactly those triples in the data graph that occur on paths that (i) start in the start node, (ii) match the property path and (iii) end in the end node.
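One possible way to compute path triples is sketched below. The encoding is an assumption made for this example only: a predicate path is a string, and compound paths are tuples ("inv", p), ("seq", p1, p2) and ("star", p) for inversion, concatenation and zero-or-more repetition. The helper pairs is hypothetical and naive; it favors readability over efficiency.

```python
def pairs(graph, path):
    """All (a, b) such that some path from a to b matches `path`."""
    if isinstance(path, str):  # predicate path
        return {(s, o) for (s, p, o) in graph if p == path}
    op = path[0]
    if op == "inv":
        return {(b, a) for (a, b) in pairs(graph, path[1])}
    if op == "seq":
        left, right = pairs(graph, path[1]), pairs(graph, path[2])
        return {(a, c) for (a, b) in left for (b2, c) in right if b == b2}
    if op == "star":
        nodes = {s for (s, _, _) in graph} | {o for (_, _, o) in graph}
        closure = {(n, n) for n in nodes}  # zero-length paths
        step = pairs(graph, path[1])
        while True:
            grown = closure | {(a, c) for (a, b) in closure
                               for (b2, c) in step if b == b2}
            if grown == closure:
                return closure
            closure = grown
    raise ValueError(f"unsupported path: {path!r}")


def path_triples(graph, path, start, end):
    """Triples on any path from `start` to `end` that matches `path`."""
    if isinstance(path, str):
        triple = (start, path, end)
        return {triple} if triple in graph else set()
    op = path[0]
    if op == "inv":
        return path_triples(graph, path[1], end, start)
    if op == "seq":
        right = pairs(graph, path[2])
        out = set()
        for (a, mid) in pairs(graph, path[1]):
            if a == start and (mid, end) in right:
                out |= path_triples(graph, path[1], start, mid)
                out |= path_triples(graph, path[2], mid, end)
        return out
    if op == "star":
        closure = pairs(graph, path)
        out = set()
        # A single step a -> b lies on a matching path if a is reachable
        # from the start node and the end node is reachable from b.
        for (a, b) in pairs(graph, path[1]):
            if (start, a) in closure and (b, end) in closure:
                out |= path_triples(graph, path[1], a, b)
        return out
    raise ValueError(f"unsupported path: {path!r}")


data = {
    (":a", ":p", ":b"),
    (":b", ":p", ":c"),
    (":c", ":p", ":d"),
    (":a", ":q", ":c"),
}
print(sorted(path_triples(data, ("star", ":p"), ":a", ":c")))
```

Note that the triple from :c to :d is left out in this run: it matches the predicate but does not lie on any path that ends in :c.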

6. Constraint neighborhoods

The neighborhood of a focus node for a constraint consists of those triples from the data graph that show that the focus node satisfies the constraint. Which triples are contained in such a neighborhood depends on the kind of the constraint. In the remainder of this section, we give definitions of neighborhoods for the different constraint kinds in SHACL core.

6.1 Value type constraint components

Subsection about neighborhoods for constraints of kinds: sh:ClassConstraintComponent, sh:DatatypeConstraintComponent, sh:NodeKindConstraintComponent

The neighborhood of a focus node for a value type constraint contains the triples from the data graph that make the focus node satisfy the constraint. For a value type constraint that is part of a property shape, i.e., a shape with a property path, the neighborhood of a focus node x contains, for each of the shape's value nodes y, the path triples between x and y for the shape's property path.

Additionally, the neighborhood of a focus node for a class constraint (sh:ClassConstraintComponent) contains, for each value node y, those triples in the data graph that declare that y is an instance of the required class c. These are the path triples between y and c for the property path rdf:type/rdfs:subClassOf*.
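The class part of this definition can be illustrated with a small sketch that is specialized to the path rdf:type/rdfs:subClassOf*. It assumes triples are (subject, predicate, object) tuples with prefixed names as plain strings; the function names are hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def superclasses(graph, cls):
    """`cls` plus every class reachable from it over rdfs:subClassOf."""
    seen, todo = {cls}, [cls]
    while todo:
        current = todo.pop()
        for (s, p, o) in graph:
            if s == current and p == SUBCLASS_OF and o not in seen:
                seen.add(o)
                todo.append(o)
    return seen


def class_triples(graph, y, c):
    """Path triples between y and c for rdf:type/rdfs:subClassOf*."""
    out = set()
    for (s, p, t) in graph:
        if s == y and p == RDF_TYPE and c in superclasses(graph, t):
            out.add((s, p, t))
            reachable = superclasses(graph, t)
            # Keep the subclass triples that lie on a path from t to c.
            for (a, q, b) in graph:
                if (q == SUBCLASS_OF and a in reachable
                        and c in superclasses(graph, b)):
                    out.add((a, q, b))
    return out


data = {
    (":rex", RDF_TYPE, ":Dog"),
    (":rex", RDF_TYPE, ":Pet"),
    (":Dog", SUBCLASS_OF, ":Mammal"),
    (":Mammal", SUBCLASS_OF, ":Animal"),
    (":Animal", SUBCLASS_OF, ":Thing"),
}
print(sorted(class_triples(data, ":rex", ":Animal")))
```

The triples typing :rex as :Pet and declaring :Animal a subclass of :Thing are not included, since neither lies on a path from :rex to :Animal.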

6.2 Cardinality constraint components

Subsection about neighborhoods for constraints of kinds: sh:MinCountConstraintComponent, sh:MaxCountConstraintComponent

The neighborhood of a focus node for a cardinality constraint contains those triples that show that the focus node has the required number of value nodes. For a cardinality constraint that is part of a property shape, i.e., a shape with a property path, the neighborhood of a focus node x contains the path triples between x and y for the property path for all value nodes y.
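For the simplest case of a predicate path, the cardinality neighborhoods can be sketched as below. The sketch assumes (subject, predicate, object) tuples and hypothetical function names, and it returns the empty set when the constraint is not satisfied.

```python
def _value_triples(graph, x, predicate):
    """Triples from x over `predicate`, one per value node."""
    return {t for t in graph if t[0] == x and t[1] == predicate}


def min_count_neighborhood(graph, x, predicate, min_count):
    triples = _value_triples(graph, x, predicate)
    return triples if len(triples) >= min_count else set()


def max_count_neighborhood(graph, x, predicate, max_count):
    triples = _value_triples(graph, x, predicate)
    return triples if len(triples) <= max_count else set()


data = {
    (":alice", ":email", "a@example.com"),
    (":alice", ":email", "alice@example.com"),
    (":bob", ":email", "b@example.com"),
}
print(sorted(min_count_neighborhood(data, ":alice", ":email", 2)))
print(min_count_neighborhood(data, ":bob", ":email", 2))
```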

6.3 Value range constraint components

Subsection about neighborhoods for constraints of kinds: sh:MinExclusiveConstraintComponent, sh:MinInclusiveConstraintComponent, sh:MaxExclusiveConstraintComponent, sh:MaxInclusiveConstraintComponent

The neighborhood of a focus node for a value range constraint contains those triples that show that the focus node's value nodes lie in the required range. For a value range constraint that is part of a property shape, i.e., a shape with a property path, the neighborhood of a focus node x includes the path triples between x and y for the property path for all value nodes y.

6.4 String-based constraint components

Subsection about neighborhoods for constraints of kinds: sh:MinLengthConstraintComponent, sh:MaxLengthConstraintComponent, sh:PatternConstraintComponent, sh:LanguageInConstraintComponent, sh:UniqueLangConstraintComponent

The neighborhood of a focus node for a string-based constraint contains those triples that show that the focus node's value nodes match the required string values. For a string-based constraint that is part of a property shape, i.e., a shape with a property path, the neighborhood of a focus node x includes the path triples between x and y for the property path for all value nodes y.

6.5 Property pair constraint components

Subsection about neighborhoods for constraints of kinds: sh:EqualsConstraintComponent, sh:DisjointConstraintComponent, sh:LessThanConstraintComponent, sh:LessThanOrEqualsConstraintComponent

Except for equals constraints, the neighborhood of a focus node for a property pair constraint is empty.

The neighborhood of a focus node x for an equals constraint contains all triples in the data graph with subject x and the predicate specified after sh:equals. Additionally, for an equals constraint that is part of a property shape, i.e., a shape with a property path, the neighborhood of a focus node x includes the path triples between x and y for the property path for all value nodes y.
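A minimal sketch of the equals case, assuming a property shape whose path is a single predicate, triples as (subject, predicate, object) tuples, and a hypothetical function name:

```python
def equals_neighborhood(graph, x, path_predicate, equals_predicate):
    """Neighborhood of x for sh:equals on a predicate-path property shape."""
    path_part = {t for t in graph if t[0] == x and t[1] == path_predicate}
    equals_part = {t for t in graph if t[0] == x and t[1] == equals_predicate}
    # The constraint holds only if both predicates have the same value set.
    if {o for (_, _, o) in path_part} != {o for (_, _, o) in equals_part}:
        return set()
    return path_part | equals_part


data = {
    (":alice", ":givenName", "Alice"),
    (":alice", ":firstName", "Alice"),
    (":bob", ":givenName", "Bob"),
    (":bob", ":firstName", "Robert"),
}
print(sorted(equals_neighborhood(data, ":alice", ":givenName", ":firstName")))
print(equals_neighborhood(data, ":bob", ":givenName", ":firstName"))
```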

6.6 Logical constraint components (excluding negation)

Subsection about neighborhoods for constraints of kinds: sh:AndConstraintComponent, sh:OrConstraintComponent, sh:XoneConstraintComponent

The neighborhood of a focus node for a logical constraint of kind "and", "or" or "xone" consists of those triples in the data graph that show that the focus node satisfies the logical constraint. Such a neighborhood contains, for each shape s specified in the parameter list of the logical constraint and for each value node y of the focus node, the (shape) neighborhood of y for s. Additionally, for a logical constraint that is part of a property shape, i.e., a shape with a property path, the neighborhood of a focus node x includes, for each value node y, the path triples between x and y for the shape's property path.
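The following sketch illustrates the definition for a single value node y. It assumes that each shape in the parameter list is given as a pair of hypothetical callables (conforms, neighborhood), and that triples are (subject, predicate, object) tuples. Conformance is checked separately because a conforming node can have an empty neighborhood.

```python
def logical_neighborhood(graph, y, kind, shapes):
    """Neighborhood of value node y for an and/or/xone constraint."""
    hits = sum(1 for conforms, _ in shapes if conforms(graph, y))
    satisfied = {
        "and": hits == len(shapes),
        "or": hits >= 1,
        "xone": hits == 1,
    }[kind]
    if not satisfied:
        return set()
    out = set()
    for conforms, neighborhood in shapes:
        if conforms(graph, y):  # non-conforming shapes contribute nothing
            out |= neighborhood(graph, y)
    return out


def has_property(predicate):
    """Hypothetical shape: 'has at least one value for `predicate`'."""
    def triples(graph, node):
        return {t for t in graph if t[0] == node and t[1] == predicate}
    return (lambda graph, node: bool(triples(graph, node)), triples)


data = {
    (":alice", ":email", "a@example.com"),
    (":alice", ":age", "30"),
}
shapes = [has_property(":email"), has_property(":phone")]

print(logical_neighborhood(data, ":alice", "or", shapes))
print(logical_neighborhood(data, ":alice", "and", shapes))
```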

6.7 Shape-based constraint components

Subsection about neighborhoods for constraints of kinds: sh:NodeConstraintComponent, sh:PropertyShapeComponent, sh:QualifiedMinCountConstraintComponent, sh:QualifiedMaxCountConstraintComponent

The neighborhood of a focus node for a node or property constraint (sh:node, sh:property) contains the triples from the data graph that show that the focus node satisfies the shape-based constraint. In particular, these neighborhoods include those triples from the data graph that show that each of the value nodes satisfies the shape specified after sh:node or sh:property. Concretely, for each value node y, the (shape) neighborhood of y for the shape specified after sh:node/sh:property is included in the neighborhood of a focus node for a node or property constraint. Additionally, for a node or property constraint that is part of a property shape, i.e., a shape with a property path, the neighborhood of a focus node x includes, for each value node y, the path triples between x and y for the shape's property path.

The neighborhood of a focus node for a qualified min count constraint contains those triples from the data graph that show that the focus node has the required number of value nodes that satisfy the specified shape. A neighborhood of a focus node x for a qualified min count constraint with qualified shape s contains, for each value node y:

The neighborhood of a focus node for a qualified max count constraint contains those triples from the data graph that show that the focus node has no more than the allowed number of value nodes that satisfy the specified shape. A neighborhood of a focus node x for a qualified max count constraint with qualified shape s contains, for each value node y:

6.8 Other constraints

Subsection about neighborhoods for constraints of kinds: sh:ClosedConstraintComponent, sh:HasValueConstraintComponent, sh:InConstraintComponent

The neighborhood of a focus node for a closed constraint is empty. Such a constraint (like all other constraints) influences whether the focus node satisfies the shape of which the constraint is part, and therefore influences whether the neighborhood of the focus node for that shape is empty, but the constraint itself does not contribute any triples to the neighborhood for that shape.

The neighborhood of a focus node for a has value constraint contains those triples from the data graph that show that at least one of the focus node's value nodes has the required value. For a has value constraint that has required value v and that is part of a property shape, i.e., a shape that has a property path, the neighborhood of a focus node x contains, for each value node y that is equal to v, the path triples between start node x and end node y for the property shape's property path.

The neighborhood of a focus node for an in constraint contains those triples from the data graph that show that all of the focus node's value nodes have one of the required values. For in constraints that are part of a property shape, i.e., a shape that has a property path, the neighborhood of a focus node x contains, for each value node y, the path triples between start node x and end node y for the property shape's property path.
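The difference between the two definitions above is that a has value constraint keeps only the paths to the required value, while an in constraint keeps the paths to all value nodes. A minimal sketch for predicate paths, assuming (subject, predicate, object) tuples and hypothetical function names:

```python
def has_value_neighborhood(graph, x, predicate, v):
    """Only the triple that leads to the required value v, if present."""
    triple = (x, predicate, v)
    return {triple} if triple in graph else set()


def in_neighborhood(graph, x, predicate, allowed):
    """All value triples, provided every value node is an allowed value."""
    triples = {t for t in graph if t[0] == x and t[1] == predicate}
    if all(o in allowed for (_, _, o) in triples):
        return triples
    return set()


data = {
    (":alice", ":role", ":Admin"),
    (":alice", ":role", ":Editor"),
    (":bob", ":role", ":Guest"),
}
print(has_value_neighborhood(data, ":alice", ":role", ":Admin"))
print(sorted(in_neighborhood(data, ":alice", ":role", {":Admin", ":Editor"})))
```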

7. Negation

Subsection about neighborhoods for constraints of kinds: sh:NotConstraintComponent

Negated shapes are shapes which a focus node should not satisfy. In SHACL, they are defined using sh:not or sh:qualifiedMaxCount; the latter says that enough of the focus node's value nodes do not satisfy the qualified shape.

The neighborhood of a focus node for a negated shape depends on the constraints that compose the shape being negated. The next five subsections give neighborhood definitions for five kinds of negated constraints; for other constraint kinds the neighborhood is empty. Note: shapes with multiple constraints are treated as conjunctions, so the negation of such a shape should be treated like a negated conjunction (negation of sh:and).

For some constraint types, we define the neighborhood of their negation by giving an equivalent shape. An engine could implement this, for example, using shape rewriting. The rewriting is guaranteed to terminate: it is always possible to reach a negation normal form in which the neighborhood of every negated constraint is clearly defined.

7.1 Negation of cardinality constraints

The negation of a cardinality constraint is obtained by reversing the direction of the cardinality constraint. For example:
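One way to read this reversal, assuming integer counts: a node fails sh:minCount n exactly when it has at most n - 1 value nodes, and it fails sh:maxCount n exactly when it has at least n + 1 value nodes. The sketch below encodes a constraint as a hypothetical (kind, count) pair; this encoding is an illustration and not notation from this document.

```python
def negate_cardinality(constraint):
    """Rewrite a negated cardinality constraint into a positive one."""
    kind, n = constraint
    if kind == "minCount":
        # "not at least n" is "at most n - 1"
        return ("maxCount", n - 1)
    if kind == "maxCount":
        # "not at most n" is "at least n + 1"
        return ("minCount", n + 1)
    raise ValueError(f"not a cardinality constraint: {kind!r}")


print(negate_cardinality(("minCount", 2)))
print(negate_cardinality(("maxCount", 1)))
```

Negating sh:minCount 0 yields a maximum count of -1 under this rewriting, which no node can satisfy; this matches the fact that sh:minCount 0 always holds.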

7.2 Negation of shape-based constraints

The neighborhood of negated shape-based constraints is given by pushing the negation inside those constraints and changing the quantifier. The laws we use to define the pushing inside are based on negations of quantified predicates. We illustrate the pushing inside by example for negated node shape constraints, negated qualified min count constraints and negated qualified max count constraints:

7.3 Negation of logical constraints

The neighborhood of negated logical constraints is given by pushing the negation inside those constraints using logical laws. Negating another negation removes both negations (double negative). For the other logical constraints, De Morgan’s laws are used.
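The rewriting for "not", "and" and "or" can be sketched as a conversion to negation normal form. The sketch assumes shapes are encoded as nested tuples such as ("not", s), ("and", s1, s2) and ("or", s1, s2), with strings standing for shapes that are not rewritten further. This encoding is hypothetical, and "xone" is deliberately left out of the sketch.

```python
def nnf(shape):
    """Push negations inward using double negation and De Morgan's laws."""
    if isinstance(shape, str):
        return shape
    op, *args = shape
    if op in ("and", "or"):
        return (op, *[nnf(a) for a in args])
    if op != "not":
        raise ValueError(f"unsupported operator: {op!r}")
    inner = args[0]
    if isinstance(inner, str):
        return ("not", inner)  # negation has reached an atomic shape
    inner_op, *inner_args = inner
    if inner_op == "not":
        return nnf(inner_args[0])  # double negation
    if inner_op == "and":
        return ("or", *[nnf(("not", a)) for a in inner_args])
    if inner_op == "or":
        return ("and", *[nnf(("not", a)) for a in inner_args])
    raise ValueError(f"unsupported operator: {inner_op!r}")


print(nnf(("not", ("and", "A", ("not", "B")))))
```

Each recursive call is made on a strictly smaller shape or moves a negation one level inward, which is why this kind of rewriting terminates.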

7.4 Negation of property pair constraints

The neighborhoods of negated property pair constraints contain those triples that show that the property pair constraint is not satisfied. To define these neighborhoods, we use E to refer to the constraint’s property path expression and p to refer to the property specified as the value of the property pair constraint’s parameter (sh:equals/sh:disjoint/sh:lessThan/sh:lessThanOrEquals).

The neighborhood of a focus node x for a negated equals constraint contains:

The neighborhood of a focus node x for a negated disjointness constraint contains, for all nodes y that are both reachable from x through an E-path and a p-path:

For all nodes y, z where these three conditions hold: (i) y is not less than z (resp. y is not less than or equal to z), (ii) y is reachable from x through an E-path and (iii) z is reachable from x through a p-path, the neighborhood of a focus node x for a negated less than constraint (resp. less than or equals constraint) contains:

7.5 Negation of closure

The neighborhood of a node for a negated closed constraint contains all triples in the data graph that (i) have the given node as subject and (ii) have a predicate that is neither specified in any property shape of the shape that carries the closed constraint, nor in the list of ignored properties.
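As a minimal sketch of this definition, assume the predicates of the shape's property shapes and the ignored properties are passed in as plain collections of strings, and triples are (subject, predicate, object) tuples. The function and parameter names are hypothetical.

```python
def negated_closed_neighborhood(graph, node, shape_predicates, ignored=()):
    """Triples from `node` whose predicate the closed shape does not allow."""
    permitted = set(shape_predicates) | set(ignored)
    return {t for t in graph if t[0] == node and t[1] not in permitted}


data = {
    (":alice", "rdf:type", ":Person"),
    (":alice", ":name", "Alice"),
    (":alice", ":nickname", "Al"),
    (":bob", ":nickname", "Bobby"),
}
print(negated_closed_neighborhood(
    data, ":alice", shape_predicates=[":name"], ignored=["rdf:type"]))
```

An empty result means the node has no disallowed predicates, i.e. it satisfies the closed constraint rather than its negation.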

A. References

A.1 Normative references

[RDF-concepts]
Resource Description Framework (RDF): Concepts and Abstract Syntax. Graham Klyne; Jeremy Carroll. W3C. 10 February 2004. W3C Recommendation. URL: https://www.w3.org/TR/rdf-concepts/
[SHACL]
Shapes Constraint Language (SHACL). Holger Knublauch; Dimitris Kontokostas. W3C. 20 July 2017. W3C Recommendation. URL: https://www.w3.org/TR/shacl/
[Turtle]
RDF 1.1 Turtle. Eric Prud'hommeaux; Gavin Carothers. W3C. 25 February 2014. W3C Recommendation. URL: https://www.w3.org/TR/turtle/