Validating Material Composition Data Using SHACL

This is the first in a series of posts about using SHACL to validate material composition data for semiconductor products (microchips). It is based on a recent project we undertook for Nexperia. In this first post we look at the basic data model for material composition and at how the core SHACL vocabulary can be used to describe its constraints.

First, a short introduction to Nexperia:

Nexperia is a dedicated global leader in Discretes, Logic and MOSFET devices. Nexperia is a new company with a long history, broad experience and a global customer base. Originally part of Philips, Nexperia became a business unit of NXP before becoming an independent company at the beginning of 2017.

The quality and reliability of the products Nexperia produces are paramount, as explained in this blog post. One aspect of this is product composition: a declaration of the substances that a product contains. This information is published via the Nexperia Quality portal.

To better understand the data that is shown, it helps to know how microchips are built. The following image shows how a chip is typically composed of multiple sub-parts (a bit like a Victoria sponge cake):

Chip layers

These sub-parts form the Bill of Materials (BOM) of the device. Each of these materials may have its own composition; for example, the mould consists mainly of plastic and the clip of copper.

The source data is modelled as an RDF graph where the material has a qualified relation to the types of substances (termed ‘Material Classes’) of which it is composed. This can be represented pictorially like this:

Graph representation of material composition

The same graph written in RDF (Turtle):

@prefix : <http://example.com/ns#> .
@prefix plm: <http://example.com/def/plm/> .

:331214892031 a plm:MouldCompound ;
  plm:name "3312 148 92031" ;
  plm:containsMaterialClass :132285000223 ;
  plm:qualifiedRelation [
    a plm:ContainsMaterialClassRelation ;
    plm:target :132285000223 ;
    plm:materialGroup "Pigment" ;
    plm:massPercentage 0.3
  ] .

:132285000223 a plm:MaterialClass ;
  plm:name "1322 850 00223" ;
  plm:description "Carbon black" ;
  plm:casNumber "1333-86-4" .

In plain English: the mould compound “3312 148 92031” contains 0.3% of material class “1322 850 00223” Carbon black (CAS number 1333-86-4) which acts as a pigment.

Note that logically the remaining 99.7% of this material consists of other ‘stuff’ (such as silica, polymer and resin), but that is omitted here for the sake of brevity.
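Outside RDF, the qualified relation can be pictured as an ordinary record attached to the material. The following is a minimal Python sketch, not part of the project: the dataclass and the `undeclared_percentage` function are hypothetical names. It uses `Decimal` because `plm:massPercentage` holds an xsd:decimal value, and it computes the share of the material that the declared relations do not cover.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ContainsMaterialClassRelation:
    """Hypothetical mirror of the plm:ContainsMaterialClassRelation node."""
    target: str               # the plm:MaterialClass resource, e.g. ":132285000223"
    material_group: str       # plm:materialGroup, e.g. "Pigment"
    mass_percentage: Decimal  # plm:massPercentage (an xsd:decimal in the data)


def undeclared_percentage(relations):
    """Return the percentage of the material not covered by the given relations."""
    declared = sum((r.mass_percentage for r in relations), Decimal("0"))
    return Decimal("100") - declared


mould_compound_relations = [
    ContainsMaterialClassRelation(":132285000223", "Pigment", Decimal("0.3")),
]

print(undeclared_percentage(mould_compound_relations))  # 99.7
```

Using `Decimal` rather than `float` keeps the arithmetic exact, just as xsd:decimal does in the RDF data.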

To be able to validate this data, we want to describe the RDF properties with which a resource should be described and their expected values (datatypes, cardinality, etc.). Here we describe three node shapes, one for each of the three resources in the example data above: the mould compound, its qualified relation, and the material class.

The first shape should match the mould compound resource :331214892031 (that is, <http://example.com/ns#331214892031>).

We can start by defining that shape as follows:

@prefix : <http://example.com/ns#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix plm: <http://example.com/def/plm/> .

:shape1 a sh:NodeShape ;
  sh:targetNode :331214892031 .

This defines a shape that targets only the resource :331214892031, which is far too specific. To make it more generic, we can relate the shape to the plm:MouldCompound class instead, so that it is used to validate any mould compound:

@prefix : <http://example.com/ns#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix plm: <http://example.com/def/plm/> .

:shape1 a sh:NodeShape ;
  sh:targetClass plm:MouldCompound .

That’s better, but we also want this shape to validate Adhesive, Clip, Lead Frame, and so on. Rather than tying it to one or more classes, it is better to target any resource that is the subject of a plm:containsMaterialClass statement. That can be done as follows:

@prefix : <http://example.com/ns#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix plm: <http://example.com/def/plm/> .

:shape1 a sh:NodeShape ;
  sh:targetSubjectsOf plm:containsMaterialClass .

This will then apply the shape to anything that contains a material class, perfect!
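To make the difference between the three target declarations concrete, here is a minimal Python sketch that models the data graph as a set of (subject, predicate, object) tuples of prefixed names. The function names are hypothetical, `sh:targetClass` ignores rdfs:subClassOf for brevity, and the resources :clip1 and :copperClass are made up to show that `sh:targetSubjectsOf` also picks up things that are not mould compounds.

```python
RDF_TYPE = "rdf:type"

# Hypothetical data graph; :clip1 and :copperClass are illustrative only.
graph = {
    (":331214892031", RDF_TYPE, "plm:MouldCompound"),
    (":331214892031", "plm:containsMaterialClass", ":132285000223"),
    (":clip1", RDF_TYPE, "plm:Clip"),
    (":clip1", "plm:containsMaterialClass", ":copperClass"),
    (":132285000223", RDF_TYPE, "plm:MaterialClass"),
    (":copperClass", RDF_TYPE, "plm:MaterialClass"),
}


def target_node(g, node):
    """sh:targetNode: the given node is a focus node, whether or not it occurs in g."""
    return {node}


def target_class(g, cls):
    """sh:targetClass: instances of cls (rdfs:subClassOf is ignored in this sketch)."""
    return {s for (s, p, o) in g if p == RDF_TYPE and o == cls}


def target_subjects_of(g, prop):
    """sh:targetSubjectsOf: every subject of a triple whose predicate is prop."""
    return {s for (s, p, o) in g if p == prop}


print(sorted(target_class(graph, "plm:MouldCompound")))
# [':331214892031']
print(sorted(target_subjects_of(graph, "plm:containsMaterialClass")))
# [':331214892031', ':clip1']
```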

Next we want to extend the shape to validate the properties:

  • rdf:type
  • plm:name
  • plm:containsMaterialClass
  • plm:qualifiedRelation

Let’s go through each in turn.

For rdf:type, we want to describe the constraint that there is one rdf:type statement whose value is an IRI. That can be written in SHACL like this:

@prefix : <http://example.com/ns#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix plm: <http://example.com/def/plm/> .

:typeShape a sh:PropertyShape ;
  sh:path rdf:type ;
  sh:minCount 1 ;
  sh:maxCount 1 ;
  sh:nodeKind sh:IRI .
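As a rough illustration of what a SHACL engine checks for :typeShape, the following Python sketch applies sh:minCount, sh:maxCount and sh:nodeKind sh:IRI to one focus node. The `IRI`/`Literal` classes, the `check_type_shape` function and the error messages are hypothetical; real engines such as the Java SHACL API word their results differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str = "xsd:string"


RDF_TYPE = IRI("rdf:type")


def check_type_shape(g, focus):
    """Apply the :typeShape constraints (minCount 1, maxCount 1, nodeKind sh:IRI)."""
    values = [o for (s, p, o) in g if s == focus and p == RDF_TYPE]
    errors = []
    if len(values) < 1:
        errors.append("sh:minCount 1 violated")
    if len(values) > 1:
        errors.append("sh:maxCount 1 violated")
    for v in values:
        if not isinstance(v, IRI):
            errors.append(f"sh:nodeKind sh:IRI violated by {v!r}")
    return errors


mould = IRI(":331214892031")
good = {(mould, RDF_TYPE, IRI("plm:MouldCompound"))}
# Hypothetical bad data: a second rdf:type whose value is a literal.
bad = {
    (mould, RDF_TYPE, IRI("plm:MouldCompound")),
    (mould, RDF_TYPE, Literal("MouldCompound")),
}

print(check_type_shape(good, mould))  # []
print(check_type_shape(bad, mould))
```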

For plm:name, we want to describe the constraint that there is exactly one plm:name statement whose value is a string matching the pattern “NNNN NNN NNNNN” (four digits, three digits and five digits, separated by spaces).

:nameShape a sh:PropertyShape ;
  sh:path plm:name ;
  sh:minCount 1 ;
  sh:maxCount 1 ;
  sh:datatype xsd:string ;
  sh:pattern "^[0-9]{4} [0-9]{3} [0-9]{5}$" .
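The regular expression can be tried out on its own. This minimal Python sketch reuses the pattern from :nameShape; `conforms_to_name_shape` is a hypothetical helper. SHACL evaluates sh:pattern like the SPARQL REGEX function, which succeeds if the expression matches anywhere in the string, so `re.search` is the closer analogue; the `^` and `$` anchors are what turn it into a whole-value check.

```python
import re

# The same regular expression as in :nameShape.
NAME_PATTERN = re.compile(r"^[0-9]{4} [0-9]{3} [0-9]{5}$")


def conforms_to_name_shape(value: str) -> bool:
    """True if value satisfies the sh:pattern of :nameShape."""
    return NAME_PATTERN.search(value) is not None


for candidate in ["3312 148 92031", "1322 850 00223", "3312 148 9203", "331214892031"]:
    print(candidate, conforms_to_name_shape(candidate))
```

The first two values come from the example data; the last two are deliberately malformed (a missing digit, and missing spaces).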

For plm:containsMaterialClass, we want to describe the constraint that there are one or more plm:containsMaterialClass statements, and that all values are of type plm:MaterialClass.

:containsMaterialClassShape a sh:PropertyShape ;
  sh:path plm:containsMaterialClass ;
  sh:minCount 1 ;
  sh:class plm:MaterialClass .
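Note that sh:class accepts not only direct instances of plm:MaterialClass but also instances of its subclasses (via rdfs:subClassOf). The Python sketch below approximates that check for :containsMaterialClassShape. The helper names are hypothetical, and so are the resources :clip1, :copperClass, :leadFrame1 and :untyped and the class plm:Metal, which exist only to exercise the subclass and failure cases.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def is_shacl_instance(g, node, cls):
    """True if node has rdf:type cls, or a type that is a (transitive) subclass of cls."""
    stack = [o for (s, p, o) in g if s == node and p == RDF_TYPE]
    seen = set()
    while stack:
        t = stack.pop()
        if t == cls:
            return True
        if t in seen:
            continue
        seen.add(t)
        # Follow rdfs:subClassOf upwards from this type.
        stack.extend(o for (s, p, o) in g if s == t and p == SUBCLASS_OF)
    return False


def check_contains_material_class_shape(g, focus):
    """Apply :containsMaterialClassShape (sh:minCount 1, sh:class plm:MaterialClass)."""
    values = [o for (s, p, o) in g if s == focus and p == "plm:containsMaterialClass"]
    errors = []
    if len(values) < 1:
        errors.append("sh:minCount 1 violated")
    for v in values:
        if not is_shacl_instance(g, v, "plm:MaterialClass"):
            errors.append(f"sh:class plm:MaterialClass violated by {v}")
    return errors


graph = {
    (":331214892031", "plm:containsMaterialClass", ":132285000223"),
    (":132285000223", RDF_TYPE, "plm:MaterialClass"),
    # Hypothetical: a material class typed with a subclass of plm:MaterialClass.
    (":clip1", "plm:containsMaterialClass", ":copperClass"),
    (":copperClass", RDF_TYPE, "plm:Metal"),
    ("plm:Metal", SUBCLASS_OF, "plm:MaterialClass"),
    # Hypothetical: a value that has no type at all.
    (":leadFrame1", "plm:containsMaterialClass", ":untyped"),
}

for focus in [":331214892031", ":clip1", ":leadFrame1"]:
    print(focus, check_contains_material_class_shape(graph, focus))
```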

For plm:qualifiedRelation, we want to describe the constraint that there are one or more plm:qualifiedRelation statements, and that all values are of type plm:ContainsMaterialClassRelation. Whereas the previous property shapes are generic and reusable, this constraint is specific to the use of plm:qualifiedRelation on instances that match our node shape. We therefore define it as a blank node and reference it from the node shape:

:shape1 a sh:NodeShape ;
  sh:targetSubjectsOf plm:containsMaterialClass ;
  sh:property [
    sh:path plm:qualifiedRelation ;
    sh:minCount 1 ;
    sh:class plm:ContainsMaterialClassRelation
  ] .

To complete the node shape, we also reference the other property shapes we have defined. Bringing it all together:

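@prefix : <http://example.com/ns#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix plm: <http://example.com/def/plm/> .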

:shape1 a sh:NodeShape ;
  sh:targetSubjectsOf plm:containsMaterialClass ;
  sh:property [
    sh:path plm:qualifiedRelation ;
    sh:minCount 1 ;
    sh:class plm:ContainsMaterialClassRelation
  ] , :typeShape , :nameShape , :containsMaterialClassShape .

:typeShape a sh:PropertyShape ;
  sh:path rdf:type ;
  sh:minCount 1 ;
  sh:maxCount 1 ;
  sh:nodeKind sh:IRI .

:nameShape a sh:PropertyShape ;
  sh:path plm:name ;
  sh:minCount 1 ;
  sh:maxCount 1 ;
  sh:datatype xsd:string ;
  sh:pattern "^[0-9]{4} [0-9]{3} [0-9]{5}$" .

:containsMaterialClassShape a sh:PropertyShape ;
  sh:path plm:containsMaterialClass ;
  sh:minCount 1 ;
  sh:class plm:MaterialClass .
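The split between named, reusable property shapes and an inline, shape-specific one can be mirrored in plain data structures. The following is a minimal Python sketch, not the SHACL API: `PropertyShape`, `NodeShape` and `validate` are hypothetical names, sh:class is checked against direct rdf:type only, and :nameShape is left out to keep it short.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

RDF_TYPE = "rdf:type"


@dataclass(frozen=True)
class PropertyShape:
    path: str
    min_count: int = 0
    max_count: Optional[int] = None
    cls: Optional[str] = None  # sh:class, checked against direct rdf:type only here


@dataclass(frozen=True)
class NodeShape:
    target_subjects_of: str
    properties: Tuple[PropertyShape, ...]


# Named, reusable property shapes (cf. :typeShape and :containsMaterialClassShape).
TYPE_SHAPE = PropertyShape(RDF_TYPE, min_count=1, max_count=1)
CONTAINS_MATERIAL_CLASS_SHAPE = PropertyShape(
    "plm:containsMaterialClass", min_count=1, cls="plm:MaterialClass")

SHAPE1 = NodeShape(
    target_subjects_of="plm:containsMaterialClass",
    properties=(
        # Inline, shape-specific constraint (the blank node in :shape1).
        PropertyShape("plm:qualifiedRelation", min_count=1,
                      cls="plm:ContainsMaterialClassRelation"),
        TYPE_SHAPE,
        CONTAINS_MATERIAL_CLASS_SHAPE,
    ),
)


def validate(g, shape: NodeShape) -> List[Tuple[str, str, str]]:
    """Return (focus node, path, violated constraint) for every failed check."""
    results = []
    focus_nodes = sorted({s for (s, p, o) in g if p == shape.target_subjects_of})
    for focus in focus_nodes:
        for ps in shape.properties:
            values = [o for (s, p, o) in g if s == focus and p == ps.path]
            if len(values) < ps.min_count:
                results.append((focus, ps.path, "sh:minCount"))
            if ps.max_count is not None and len(values) > ps.max_count:
                results.append((focus, ps.path, "sh:maxCount"))
            if ps.cls is not None:
                for v in values:
                    if (v, RDF_TYPE, ps.cls) not in g:
                        results.append((focus, ps.path, "sh:class"))
    return results


data = {
    (":331214892031", RDF_TYPE, "plm:MouldCompound"),
    (":331214892031", "plm:containsMaterialClass", ":132285000223"),
    (":331214892031", "plm:qualifiedRelation", "_:rel1"),
    ("_:rel1", RDF_TYPE, "plm:ContainsMaterialClassRelation"),
    (":132285000223", RDF_TYPE, "plm:MaterialClass"),
}

print(validate(data, SHAPE1))  # []
without_relation = data - {(":331214892031", "plm:qualifiedRelation", "_:rel1")}
print(validate(without_relation, SHAPE1))
# [(':331214892031', 'plm:qualifiedRelation', 'sh:minCount')]
```

Because TYPE_SHAPE is a standalone value, another node shape (for material classes, say) can list it again without redefining it, which is the same reuse the Turtle achieves by referencing :typeShape by IRI.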

In addition, we want to define shapes for the plm:ContainsMaterialClassRelation and plm:MaterialClass classes. These can be defined as follows:

:ContainsMaterialClassRelationShape a sh:NodeShape ;
  sh:targetClass plm:ContainsMaterialClassRelation ;
  sh:property [
    sh:path plm:target ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
    sh:class plm:MaterialClass
  ] , [
    sh:path plm:materialGroup ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
    sh:datatype xsd:string
  ] , [
    sh:path plm:massPercentage ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
    sh:datatype xsd:decimal
  ] .

:MaterialClassShape a sh:NodeShape ;
  sh:targetClass plm:MaterialClass ;
  sh:property :typeShape , :nameShape , :descriptionShape , :casNumberShape .

:descriptionShape a sh:PropertyShape ;
  sh:path plm:description ;
  sh:minCount 1 ;
  sh:maxCount 1 ;
  sh:datatype xsd:string .

:casNumberShape a sh:PropertyShape ;
  sh:path plm:casNumber ;
  sh:maxCount 1 ;
  sh:datatype xsd:string ;
  sh:pattern "^[0-9]{2,7}-[0-9]{2}-[0-9]$" . # match pattern "nnnnnNN-NN-N"
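The CAS pattern only checks the shape of the value. CAS Registry Numbers also end in a check digit: each digit before it is multiplied by its position counted from the right (1, 2, 3, ...), and the sum modulo 10 must equal the final digit. A regular expression cannot express that rule, so it is outside what :casNumberShape checks. The Python sketch below, with hypothetical helper names, shows both the pattern from the shape and the check digit for comparison.

```python
import re

# The same regular expression as in :casNumberShape.
CAS_PATTERN = re.compile(r"^[0-9]{2,7}-[0-9]{2}-[0-9]$")


def matches_cas_pattern(value: str) -> bool:
    """True if value satisfies the sh:pattern of :casNumberShape."""
    return CAS_PATTERN.search(value) is not None


def has_valid_cas_check_digit(value: str) -> bool:
    """True if value matches the pattern and its final digit is the CAS check digit."""
    if not matches_cas_pattern(value):
        return False
    digits = value.replace("-", "")
    body, check = digits[:-1], int(digits[-1])
    # Weight each digit by its position counted from the right, starting at 1.
    total = sum(int(d) * i for i, d in enumerate(reversed(body), start=1))
    return total % 10 == check


print(matches_cas_pattern("1333-86-4"), has_valid_cas_check_digit("1333-86-4"))  # True True
print(matches_cas_pattern("1333-8-4"))  # False
print(matches_cas_pattern("1333-86-5"), has_valid_cas_check_digit("1333-86-5"))  # True False
```

For carbon black, 6×1 + 8×2 + 3×3 + 3×4 + 3×5 + 1×6 = 64, and 64 mod 10 = 4 matches the final digit of 1333-86-4.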

The shape file and some sample data are available here and here.

Now we can use the shapes we have defined to validate the sample data. You can use the online SHACL Playground for this, but I prefer to use the Java SHACL API from the command line.

To validate, you can use this command:

shaclvalidate.sh -datafile data.ttl -shapesfile shapes1.ttl

The result is a validation report, also in RDF, that describes the constraint checks that failed. Note that the sample data differs from the snippet shown earlier and contains some invalid values. For this sample data, the validation report looks like this:

@prefix plm:   <http://example.com/def/plm/> .
@prefix owl:   <http://www.w3.org/2002/07/owl#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh:    <http://www.w3.org/ns/shacl#> .

[ a sh:ValidationReport ;
  sh:conforms false ;
  sh:result [
    a sh:ValidationResult ;
    sh:focusNode [] ;
    sh:resultMessage "Less than 1 values" ;
    sh:resultPath plm:massPercentage ;
    sh:resultSeverity sh:Violation ;
    sh:sourceConstraintComponent sh:MinCountConstraintComponent ;
    sh:sourceShape _:b0 ] ;
  sh:result [
    a sh:ValidationResult ;
    sh:focusNode [] ;
    sh:resultMessage "Value does not have datatype xsd:decimal" ;
    sh:resultPath plm:massPercentage ;
    sh:resultSeverity sh:Violation ;
    sh:sourceConstraintComponent sh:DatatypeConstraintComponent ;
    sh:sourceShape _:b0 ;
    sh:value "100.0" ] ;
  sh:result [
    a sh:ValidationResult ;
    sh:focusNode <http://example.com/132285000223> ;
    sh:resultMessage "Value does not match pattern \"^[0-9]{2,7}-[0-9]{2}-[0-9]$\"" ;
    sh:resultPath plm:casNumber ;
    sh:resultSeverity sh:Violation ;
    sh:sourceConstraintComponent sh:PatternConstraintComponent ;
    sh:sourceShape <http://example.com/ns#casNumberShape> ;
    sh:value "1333-8-4" ]
] .

This report says the data does not conform to the shapes (sh:conforms is false) and that there are 3 validation errors:

  • A plm:massPercentage is missing
  • Another plm:massPercentage has a value that does not have the expected datatype xsd:decimal
  • A plm:casNumber has a value that does not match the regex pattern
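The datatype error is worth a closer look. In Turtle an unquoted number such as 0.3 is an xsd:decimal literal, whereas a quoted value such as "100.0" is an xsd:string literal, so it fails sh:datatype xsd:decimal even though it looks numeric. The minimal Python sketch below approximates the sh:datatype check; the `Literal` class and `has_datatype` function are hypothetical, and only xsd:decimal gets a lexical-form check (using the XML Schema 1.1 lexical pattern for decimals).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str


# Lexical form of xsd:decimal (XML Schema 1.1): optional sign, then digits with an
# optional fractional part, or a "." followed by digits.
DECIMAL_LEXICAL = re.compile(r"(\+|-)?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def has_datatype(value, datatype: str) -> bool:
    """Approximate sh:datatype: the value must be a literal with exactly this
    datatype and, for xsd:decimal, a well-formed lexical form."""
    if not isinstance(value, Literal) or value.datatype != datatype:
        return False
    if datatype == "xsd:decimal":
        return DECIMAL_LEXICAL.fullmatch(value.lexical) is not None
    return True


print(has_datatype(Literal("0.3", "xsd:decimal"), "xsd:decimal"))   # True
print(has_datatype(Literal("100.0", "xsd:string"), "xsd:decimal"))  # False
print(has_datatype(Literal("0,3", "xsd:decimal"), "xsd:decimal"))   # False
```

The last case is an ill-typed literal: it claims to be xsd:decimal but its lexical form is not valid, which SHACL also treats as a datatype violation.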

This demonstrates the basics of SHACL and how it can be used to validate RDF data according to a set of constraints.

In the next post in the series, we’ll look at using more complex rules defined in SPARQL to add further constraints on the data.