Enriching Material Composition Data Using SPARQL Rule-Based Inferencing

This is the fifth in a series of posts about using SHACL to validate material composition data for semiconductor products (microchips). The series results from a recent project we undertook for Nexperia. In our first four posts, we looked at how to validate our material composition data:

  • In the first post we looked at the basic data model for material composition and how basic SHACL vocabulary can be used to describe the constraints.
  • In the second post we looked at how SPARQL-based constraints can be used to implement more complex rules based on a SPARQL SELECT query.
  • In the third post we looked at how aggregates can be used as part of validation rules, and
  • In the fourth post we looked at using OWL to model class-based inference rules.

In this post, we will explore how we could use SPARQL as an alternative to OWL to capture the inference rules from the previous post.

Starting from the same generic cases:

  1. Any X is a Z
  2. Any X containing Y is a Z
  3. Any X containing at least N% of Y is a Z

Let’s look at how these can be implemented as SPARQL CONSTRUCT queries.

Any X is a Z

Using the standard RDFS/OWL vocabulary, rdfs:subClassOf is the obvious way to implement this inference rule, as it infers that any instance of the subclass is also an instance of the parent class:

plm:Adhesive rdfs:subClassOf iec:M-014 .

This is straightforward to map into the equivalent SPARQL CONSTRUCT query:

PREFIX plm: <http://example.com/def/plm/>
PREFIX iec: <http://example.com/def/iec62474/>
CONSTRUCT {
  ?s a iec:M-014
}
WHERE {
  ?s a plm:Adhesive .
}

In layman’s terms, for every ?s that is a plm:Adhesive, construct the statement that ?s is a iec:M-014.

So from the statement:

<132253533401> a plm:Adhesive .

We can construct the statement:

<132253533401> a iec:M-014 .

However, this approach requires a separate query for every mapping rule, one per material class. These queries could be maintained manually or generated by some templating language.

Another option is to make the rdfs:subClassOf statements part of the RDF dataset we are querying over. This allows us to define a single SPARQL query that will work for all rules of this “any X is a Z” case.

Here we will assume the instance data is in the default graph and the inference rules in a graph named http://example.com/graph/iec62474.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX plm: <http://example.com/def/plm/>
PREFIX iec: <http://example.com/def/iec62474/>
CONSTRUCT {
  ?s a ?parent .
}
WHERE {
  ?s a ?child .
  graph <http://example.com/graph/iec62474> {
    ?child rdfs:subClassOf ?parent .
  }
}

So from the RDF dataset:

<132253533401> a plm:Adhesive .
graph <http://example.com/graph/iec62474> {
  plm:Adhesive rdfs:subClassOf iec:M-014 .
}

We can construct the statement:

<132253533401> a iec:M-014 .

As additional rules are added to the named graph, additional statements can be constructed by the same query. In effect, we have implemented the core rdfs:subClassOf semantics (for directly stated subclass relations) as a concrete SPARQL query.
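
Note that the query above follows only a single rdfs:subClassOf step. If the rule graph contained chains of subclass statements (an assumption; the example here has only one level), a SPARQL 1.1 property path could follow them transitively. A minimal sketch:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
CONSTRUCT {
  ?s a ?ancestor .
}
WHERE {
  ?s a ?child .
  graph <http://example.com/graph/iec62474> {
    # "+" matches one or more rdfs:subClassOf steps, so both direct
    # and indirect parent classes are bound to ?ancestor
    ?child rdfs:subClassOf+ ?ancestor .
  }
}
```

With a single level of subclass statements, this returns the same result as the simpler query.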

Any X containing Y is a Z

For this case we had to dive into class-based restriction rules to express the logic using OWL. This is rather complex for non-ontologists, so here we propose a simpler instance-based approach using SPARQL. We will use a similar approach to the one above, where the rules are captured in a named graph in our RDF dataset.

Consider the concrete rule “Any Component containing some C.I. Pigment Violet 23 is a M-015: Other Organic Materials”. First let’s write a rule in SPARQL for this:

PREFIX plm: <http://example.com/def/plm/>
PREFIX iec: <http://example.com/def/iec62474/>
CONSTRUCT {
  ?s a iec:M-015 .
}
WHERE {
  ?s a plm:Component ;
    plm:containsMaterialClass <132285000361> .
}

So from the statements:

<132253533401> a plm:Component ;
  plm:containsMaterialClass <132285000361> .

We can construct the statement:

<132253533401> a iec:M-015 .

This is a much more direct and understandable way to capture the rule. It also does not generate additional unneeded rdf:type statements for the intermediate classes necessary for OWL inferencing.

The next step is to generalise this SPARQL query so it works for all rules of this type. Let’s begin by replacing the parts that vary between rules of this type with variables:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX plm: <http://example.com/def/plm/>
PREFIX iec: <http://example.com/def/iec62474/>
CONSTRUCT {
  ?s a ?parent .
}
WHERE {
  ?s a ?child ;
    plm:containsMaterialClass ?materialClass .
  VALUES (?child ?materialClass ?parent) {
    (plm:Component <132285000361> iec:M-015)
    # more rules can be added here
  }
}

Here we have used the VALUES clause to pass in a set of bindings for the variables. It would be relatively easy to extend the set of solutions for other material classes. But let’s consider how we can add statements to our dataset that allow us to capture these rules as RDF.

As there is no standard RDFS term that expresses the semantics we need, we will make a small custom vocabulary. We can describe the rule in RDF as follows:

[] a plm:Component ;
  plm:containsMaterialClass <132285000361> ;
  rdfs:subClassOf iec:M-015 .

Here we introduce a blank node (denoted by []) that captures both the pattern we want to match and the rdfs:subClassOf target class. In RDF terms this does not make much sense on its own, but once it is added to the named graph, we can write a query:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX plm: <http://example.com/def/plm/>
PREFIX iec: <http://example.com/def/iec62474/>
CONSTRUCT {
  ?s a ?parent .
}
WHERE {
  ?s a ?child ;
    plm:containsMaterialClass ?materialClass .
  graph <http://example.com/graph/iec62474> {
    [] a ?child ;
      plm:containsMaterialClass ?materialClass ;
      rdfs:subClassOf ?parent .
  }
}

This allows us to pull out those rules into a separate RDF graph as a kind of configuration rather than hard-coded into the SPARQL query.
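
If the inferred types should be stored rather than just returned, the same WHERE clause could be reused in a SPARQL 1.1 Update request. The following is a sketch only; the target graph http://example.com/graph/inferred is a hypothetical name chosen for illustration and is not part of the original setup:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX plm: <http://example.com/def/plm/>
# Hypothetical target graph for the inferred statements (an assumption)
INSERT {
  graph <http://example.com/graph/inferred> {
    ?s a ?parent .
  }
}
WHERE {
  ?s a ?child ;
    plm:containsMaterialClass ?materialClass .
  graph <http://example.com/graph/iec62474> {
    [] a ?child ;
      plm:containsMaterialClass ?materialClass ;
      rdfs:subClassOf ?parent .
  }
}
```

Keeping the inferred statements in their own graph makes it possible to drop and regenerate them whenever the rules change.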

Any X containing at least N% of Y is a Z

Finally, let’s consider the “Any X containing at least N% of Y is a Z” case. Specifically let’s look at the rule “Any Component containing at least 40% Iron is a M-002: Other Ferrous alloys, non-stainless steels”.

Expressing this in OWL required a two-layer approach using class-based restrictions. The SPARQL equivalent is relatively simple:

PREFIX plm: <http://example.com/def/plm/>
PREFIX iec: <http://example.com/def/iec62474/>
CONSTRUCT {
  ?s a iec:M-002 .
}
WHERE {
  ?s a plm:Component ;
    plm:qualifiedRelation [
      a plm:ContainsMaterialClassRelation ;
      plm:massPercentage ?massPercentage ;
      plm:target <132285000116>
    ] .
  filter (?massPercentage >= 40)
}
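
To see the filter at work, consider some sample data. The component IRIs and mass percentages below are hypothetical, made up purely for illustration, and are written as a SPARQL Update request so they can be loaded directly:

```sparql
PREFIX plm: <http://example.com/def/plm/>
# Hypothetical sample data: IRIs and percentages are illustrative only
INSERT DATA {
  # 45% iron meets the 40% threshold, so the rule above
  # constructs: <example-component-a> a iec:M-002 .
  <example-component-a> a plm:Component ;
    plm:qualifiedRelation [
      a plm:ContainsMaterialClassRelation ;
      plm:massPercentage 45 ;
      plm:target <132285000116>
    ] .

  # 25% iron is below the threshold, so the filter removes this
  # solution and nothing is constructed for this component
  <example-component-b> a plm:Component ;
    plm:qualifiedRelation [
      a plm:ContainsMaterialClassRelation ;
      plm:massPercentage 25 ;
      plm:target <132285000116>
    ] .
}
```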

Again let’s use variables and bind these using a VALUES clause:

PREFIX plm: <http://example.com/def/plm/>
PREFIX iec: <http://example.com/def/iec62474/>
CONSTRUCT {
  ?s a ?parent .
}
WHERE {
  ?s a ?child ;
    plm:qualifiedRelation [
      a plm:ContainsMaterialClassRelation ;
      plm:massPercentage ?massPercentage ;
      plm:target ?materialClass
    ] .
  VALUES (?child ?materialClass ?minPercentage ?parent) {
    (plm:Component <132285000116> 40 iec:M-002)
    # more rules can be added here
  }
  filter (?massPercentage >= ?minPercentage)
}

Here again the set of bindings can be extended for other rules of this type, but we can also pull this out into a separate configuration graph:

[] a plm:Component ;
  plm:qualifiedRelation [
    a plm:ContainsMaterialClassRelation ;
    plm:massPercentage 40 ;
    plm:target <132285000116>
  ] ;
  rdfs:subClassOf iec:M-002 .

Again, there is limited semantic value to these statements; they are just a way to capture a solution as RDF statements. The SPARQL query can then be formulated:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX plm: <http://example.com/def/plm/>
PREFIX iec: <http://example.com/def/iec62474/>
CONSTRUCT {
  ?s a ?parent .
}
WHERE {
  ?s a ?child ;
    plm:qualifiedRelation [
      a plm:ContainsMaterialClassRelation ;
      plm:massPercentage ?massPercentage ;
      plm:target ?materialClass
    ] .
  { SELECT DISTINCT ?child ?materialClass ?minPercentage ?parent {
    graph <http://example.com/graph/iec62474> {
      [] a ?child ;
        plm:qualifiedRelation [
          a plm:ContainsMaterialClassRelation ;
          plm:massPercentage ?minPercentage ;
          plm:target ?materialClass
        ] ;
        rdfs:subClassOf ?parent .
    }
  }}
  filter (?massPercentage >= ?minPercentage)
}

Here we introduce the sub-select just to drive home that the only purpose of these new statements is to produce a solution set that is joined to the outer results. This has exactly the same effect as using the VALUES formulation from the previous query.

The formulation of these rules carries little (machine-readable) semantic value; it only serves to match a pattern in our query.

Hopefully this demonstrates how SPARQL can be used to implement such rules. The reader should also be able to contrast this approach with the OWL approach from the previous post. Whilst SPARQL is arguably simpler and easier to understand, the semantics of the rules are not really made explicit as they are with OWL.

In the next post in this series, we will look at how SHACL can be used to implement these rules.