SPARQLer

Your "sparqling" ORM for PHP

What is SPARQLer?

SPARQLer is a SPARQL Object-Relational Mapping for PHP, built on top of EasyRDF (the most popular PHP library for RDF handling). In other words: a PHP library to access linked data sources in an object-oriented flavour, hiding the SPARQL query language behind a set of convenient (and often more familiar) structures and functions.

Why SPARQLer?

Most developers are familiar with the SQL query language and relational databases (like MariaDB or PostgreSQL), where information is organized in tables and columns. Much of the potential of publicly available and collectively updated linked "graphs" (like Wikidata or DBpedia) is still untapped, due to scarce adoption and the steep learning curve of a different information model.
SPARQLer provides a SQL-like fluent interface to such information, making those tools approachable for a wider audience. The SPARQLer API is largely inspired by Laravel's native ORM, Eloquent, which many PHP developers already use.
Plus, even if you already know SPARQL, SPARQLer is a convenient interface to dynamically build your queries and wrap data in the model of your application.

Install

To install SPARQLer just run:

composer require madbob/sparqler

The full code, MIT licensed, is hosted on GitLab (and, of course, open to contributions!).

SPARQL 101

For reference and introduction, let's compare SQL and SPARQL.

A typical SQL query looks like:

SELECT column1, column2 FROM table WHERE column3 = 'something';

while an equivalent SPARQL query looks like:

SELECT ?foo ?bar WHERE { ?item column1 ?foo . ?item column2 ?bar . ?item column3 'something' }

The basic abstraction of SQL is that data is represented as rows in a table with multiple columns, where each column has its own name and holds a value; the value of a given column is extracted by matching the values of other known columns within the same row.

The basic abstraction of SPARQL is that there is a single table with only three columns ("subject", "predicate" and "object"), and rows with the same "subject" belong to the same entity. You retrieve the required information by combining multiple subject/predicate/object triples, where each element can be a parameter (the tokens starting with ?) that must match the same value in every other triple where it appears.
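To make the triple model concrete, here is a plain-PHP sketch (no SPARQLer involved) that stores the data as a single three-column table and evaluates the SPARQL query above by hand: the shared ?item parameter is what ties the three patterns to the same entity. It is a simplification for illustration only: the data is made up, and it assumes each subject has at most one value per predicate, while real triple stores allow many.

```php
<?php
// A toy triple table: every row is [subject, predicate, object].
// The names and values are made up, mirroring the SQL/SPARQL comparison above.
$triples = [
    ['item1', 'column1', 'a'],
    ['item1', 'column2', 'b'],
    ['item1', 'column3', 'something'],
    ['item2', 'column1', 'c'],
    ['item2', 'column2', 'd'],
    ['item2', 'column3', 'other'],
];

// Returns the object of the first triple with the given subject and predicate, or null.
function objectOf(array $triples, string $subject, string $predicate): ?string
{
    foreach ($triples as [$s, $p, $o]) {
        if ($s === $subject && $p === $predicate) {
            return $o;
        }
    }
    return null;
}

// Evaluates, by hand:
// SELECT ?foo ?bar WHERE { ?item column1 ?foo . ?item column2 ?bar . ?item column3 'something' }
$rows = [];
foreach ($triples as [$item, $predicate, $object]) {
    // Bind ?item to each subject whose column3 is 'something'...
    if ($predicate !== 'column3' || $object !== 'something') {
        continue;
    }
    // ...then the same ?item must also satisfy the other two patterns.
    $foo = objectOf($triples, $item, 'column1');
    $bar = objectOf($triples, $item, 'column2');
    if ($foo !== null && $bar !== null) {
        $rows[] = ['foo' => $foo, 'bar' => $bar];
    }
}

print_r($rows);
```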

SPARQLer lets you write SPARQL queries in a more "SQL-like" fashion, introducing a few implicit behaviours (which can be overridden, if required, with certain functions and combinations of parameters).

$client->doSelect(['column1', 'column2'])->where('column3', 'something')->get();

Unless otherwise specified, the "subject" part of the SPARQL query is implicit: the selected attributes and all attributes used in the conditions refer to the same subject.

For some examples of actual SPARQL queries, in raw format and rewritten with SPARQLer, have a look at the examples page.

Glossary

A few definitions useful to understand the SPARQLer API applied to the SPARQL model.

  • Client is the object which provides the connection to the SPARQL endpoint. It also provides the API to create a Builder
  • Builder is the object used to compose and execute the actual query, as a chain of conditions and filters
  • Ontology is the description of the Predicates available within a given semantic context. Each ontology may refer to other ontologies, and the data in a SPARQL model usually involves multiple ontologies. The name of an Ontology is a URL, more often represented with a prefix
  • Namespace is the prefix used to shorten the names of the Predicates included in a given Ontology: http://xmlns.com/foaf/0.1/mbox is the same as foaf:mbox
  • Term is a single token in the query; there are many Term types, each providing a specific meaning and altering the behavior of the Triple in which it appears and/or of the whole query
  • Triple is a set of three Terms: the first is the subject, the second is the predicate and the third is the object. In the SPARQLer public API triples are represented just as arrays; if fewer elements are provided (usually 2, sometimes 1), the missing ones are filled in implicitly
  • Subject is a string, often a URL, that aggregates multiple Triples. In the SPARQL data model, all triples having the same "subject" belong to the same entity. In a SPARQLer query, this information is held in an OwnSubject term
  • Predicate is the name of a single attribute of an entity: usually a URL, possibly in "short" form using the prefix of its Namespace (e.g. foaf:mbox). In a SPARQLer query, predicates are wrapped in an Iri term (which may also represent the Subject of a specific entity)
  • Object is the value of a given attribute. Often it is represented as a Literal
  • Result is the object containing the product of a SELECT query: it can be iterated, and each element is an associative array in the form predicate => value
  • Literal is an object usually used to enclose a value (a string, a number, a date...) together with its datatype. Leveraging EasyRDF's TypeMapper, these values can be automatically converted into structured objects
  • Graph is a collection of Resources, usually returned by a CONSTRUCT query
  • Resource represents a single entity, with its own Subject and its own properties
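The relation between a Namespace and the full Predicate name can be illustrated with a small plain-PHP sketch. expandPrefixed() is a hypothetical helper, not part of the SPARQLer or EasyRDF API: it only shows how a prefix map (shaped like the namespaces option of the Client) turns foaf:mbox back into http://xmlns.com/foaf/0.1/mbox.

```php
<?php
// Hypothetical helper (not part of SPARQLer or EasyRDF): expands a prefixed
// name such as "foaf:mbox" into the full IRI, using a prefix => namespace map.
function expandPrefixed(string $name, array $namespaces): string
{
    $pos = strpos($name, ':');
    if ($pos === false) {
        return $name; // no prefix at all: leave untouched
    }
    $prefix = substr($name, 0, $pos);
    $local = substr($name, $pos + 1);
    // Unknown prefixes (including full IRIs, whose "prefix" is e.g. "http") are returned unchanged.
    if (!isset($namespaces[$prefix])) {
        return $name;
    }
    return $namespaces[$prefix] . $local;
}

$namespaces = [
    'foaf' => 'http://xmlns.com/foaf/0.1/',
    'dbc'  => 'http://dbpedia.org/resource/Category:',
];

echo expandPrefixed('foaf:mbox', $namespaces), "\n";
echo expandPrefixed('dbc:Capitals_in_Europe', $namespaces), "\n";
```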

Client

First of all, you need a Client to build and execute your SPARQL queries.

require_once "./vendor/autoload.php";
use MadBob\Sparqler\Client;
$client = new Client($config);

where $config is an associative array with the following keys:

  • host (required): the SPARQL endpoint you want to query
  • graph (optional): the name of the graph you want to query. Useful if you want to interact with your own linked data store
  • auth (optional): an associative array with the authentication parameters you need to access the SPARQL endpoint. Required in particular if you want to interact with your own linked data store. If defined, it must contain the following keys:
    • type: one of the authentication methods supported by Guzzle: basic, digest or ntlm
    • username: your username
    • password: your password
  • namespaces (optional): an associative array with the namespaces to be used in your queries. By default EasyRDF already defines the most common ones; if this parameter is defined, the new array overrides the default one
  • omit_prefix (optional): if set to true, the final queries will not include PREFIXes at the beginning

The internal HTTP client used to perform requests to the SPARQL endpoint can be accessed and configured as desired. By default it is an instance of EasyRDF-on-Guzzle, which integrates a full-featured Guzzle client (using a cURL handler) with EasyRDF, so you can consult the Guzzle documentation for more options.

$client = new Client($config);
$httpclient = \EasyRdf\Http::getDefaultHttpClient();
$httpclient->setConfig(['timeout' => 20]);

It is also possible to attach a PSR-3 LoggerInterface to the Client, where all generated queries are logged, and a PSR-16 CacheInterface, used only by inference functions.

$client = new Client($config);
$client->setLogger($your_LoggerInterface);
$client->setCache($your_CacheInterface);

A few sample initializations for common use cases:

// Wikidata client
$client = new Client([
  'host' => 'https://query.wikidata.org/sparql',

  // For more namespaces used in Wikidata:
  // https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Full_list_of_prefixes
  'namespaces' => [
    'wd' => 'http://www.wikidata.org/entity/',
    'wdt' => 'http://www.wikidata.org/prop/direct/',
    'rdfs' => 'http://www.w3.org/2000/01/rdf-schema#',
  ],
]);

// DBPedia client
$client = new Client([
  'host' => 'https://dbpedia.org/sparql',

  // For more namespaces used in DBPedia:
  // https://dbpedia.org/sparql/?help=nsdecl
  'namespaces' => [
    'dbp' => 'http://dbpedia.org/property/',
    'dbr' => 'http://dbpedia.org/resource/',
    'dbo' => 'http://dbpedia.org/ontology/',
    'dct' => 'http://purl.org/dc/terms/',
    'dbc' => 'http://dbpedia.org/resource/Category:',
    'rdf' => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
  ],
]);

// Your own Virtuoso server
$client = new Client([
  'host' => 'http://localhost:8890/sparql-auth',
  'graph' => 'urn:sparql:tests:insert:informative',
  'auth' => [
    'type' => 'digest',
    'username' => 'dba',
    'password' => 'your_password',
  ],
]);

From a Client instance, it is possible to obtain a Builder for each kind of query you want to perform.

doSelect and doSelectDistinct

doSelect() initializes a SELECT query: the function accepts an array of items (by default, plain strings are converted to Prefixed terms) to be retrieved for each entity matching the conditions appended to the Builder. The query returns a Result.

use MadBob\Sparqler\Terms\Iri;
use MadBob\Sparqler\Terms\OwnSubject;

$result = $client->doSelect([new OwnSubject(), 'dbp:name'])
    ->where('dct:subject', new Iri('dbc:Capitals_in_Europe'))
    ->get();

doSelectDistinct() works the same way, but initializes a SELECT DISTINCT query.

doConstruct

doConstruct() returns a Graph, including multiple Resources, and allows a precise selection of the predicates to be fetched and added to the Graph itself.

$result = $client->doConstruct([['dbp:areaTotalSqMi'], ['dbp:website']])
    ->where('dct:subject', new Iri('dbc:Capitals_in_Europe'))
    ->get();

doConstruct() accepts an optional array of Triples as parameter, describing the properties you want to fetch for each entity matching the conditions. Usually you will omit the subject (implicit, due to the conditions appended to the Builder) and the object (which is automatically mapped into the query for each required predicate). If no parameters are passed, all the predicates of matching entities are fetched from the endpoint.
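The idea of "short" triples, with an implicit subject and an automatically mapped object, can be sketched in plain PHP. completeTriple() is a hypothetical helper written for illustration only, not SPARQLer's implementation; the fresh variable names (?o1, ?o2...) are an arbitrary choice of this sketch.

```php
<?php
// Hypothetical helper, NOT SPARQLer's implementation: it completes the short
// triples accepted by doConstruct() ([predicate] or [predicate, object]) into
// full subject/predicate/object patterns.
function completeTriple(array $triple, int &$counter): array
{
    switch (count($triple)) {
        case 1:
            // [predicate]: the subject is the implicit ?subject, the object a fresh variable.
            $counter++;
            return ['?subject', $triple[0], '?o' . $counter];
        case 2:
            // [predicate, object]: only the subject is implicit.
            return ['?subject', $triple[0], $triple[1]];
        case 3:
            // Already a complete triple.
            return array_values($triple);
        default:
            throw new InvalidArgumentException('A triple must have 1 to 3 elements');
    }
}

$counter = 0;
$patterns = [];
foreach ([['dbp:areaTotalSqMi'], ['dbp:website']] as $short) {
    $patterns[] = implode(' ', completeTriple($short, $counter));
}

// The completed patterns, as they could appear in a CONSTRUCT template.
echo 'CONSTRUCT { ' . implode(' . ', $patterns) . " }\n";
```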

For convenience, Client has a shorthand find() function which CONSTRUCTs a given subject.

$result = $client->find('dbr:Dublin');

doInsert and doDelete

doInsert() and doDelete() are used to insert data into the graph and delete data from it. Both accept an array of Triples specifying what to insert or remove, while the conditions appended to the Builder define which entities are the target of the operation.

$client->doInsert([
    ['foaf:knows', new Iri('http://mydomain/Person/Foo')],
])->where('foaf:currentProject', new Iri('http://mydomain/Project/Bar'))->run();
$client->doDelete([
    ['foaf:knows', new Iri('http://mydomain/Person/Foo')],
])->where('foaf:currentProject', new Iri('http://mydomain/Project/Bar'))->run();

Builder

Most of a SPARQL query is defined in the Builder, as this is where you add all of your WHERE conditions. Most of its functions return the Builder itself, so that calls can be chained.

Here is a summary of the different options you have.

  • where('rdf:predicate', 'value')
    SPARQL: ?subject rdf:predicate 'value'
    The most common condition: the predicate is applied to the implicit OwnSubject of the query. By default, the first parameter is wrapped in an Iri term and the second in a Plain term (assuming it is a string)
  • where('rdf:predicate', new Iri('a:subject'))
    SPARQL: ?subject rdf:predicate a:subject
    By passing Term objects as parameters, you can enforce their meaning and the way they are appended to the query. This applies to every conditional function of the Builder
  • where('rdf:predicate', function($query) { $query->where('rdf:other', 'value'); })
    SPARQL: ?subject rdf:predicate ?variable . ?variable rdf:other 'value'
    The value of the condition can be a sub-query: a new random Variable is used for further comparisons and evaluations
  • where(new Variable('foo'), 'rdf:predicate', 'value')
    SPARQL: ?foo rdf:predicate 'value'
    When three parameters are passed to where(), they form a complete Triple
  • whereOptional('rdf:predicate', 'value')
    SPARQL: OPTIONAL { ?subject rdf:predicate 'value' }
    An OPTIONAL condition does not exclude entities: the pattern is matched where possible (e.g. entities having the given predicate with the given value) and left unmatched otherwise. To optionally select a given predicate in a SELECT query, it is more convenient to use the Optional term
  • where('rdf:predicate', '!=', 'value')
    SPARQL: ?subject rdf:predicate ?variable . FILTER ( ?variable != 'value' )
    Basic comparisons are built into the where() function, which generates the proper FILTER conditions. Supported operators: < > <= >= !=
  • whereIn('rdf:predicate', ['value', 'value2'])
    SPARQL: ?subject rdf:predicate ?value VALUES ?value { 'value' 'value2' }
    Many different values can be matched at once
  • whereReverse('rdf:predicate', new Iri('a:subject'))
    SPARQL: a:subject rdf:predicate ?subject
    Reverses the operands of the Triple, using the implicit OwnSubject as object instead of subject. Here the second parameter is the new subject of the Triple (it may be an explicit Iri or a Variable filtered somewhere else)
  • whereReverse('rdf:predicate', function($query) { $query->where('rdf:other', 'value'); })
    SPARQL: a:subject rdf:predicate ?subject . a:subject rdf:other 'value'
    Reversed relations can also be extended with sub-queries, in which the subject is inherited from the parent one
  • whereRaw('SPARQL expression')
    SPARQL: SPARQL expression
    An arbitrary expression can be appended to the query
  • filter(function($query) { $query->where('rdf:predicate', 'value'); })
    SPARQL: FILTER { ?subject rdf:predicate 'value' }
    Filters are used to refine the result set with given parameters, for evaluations more complex than the basic ones built into the where() function
  • filterNotExists(function($query) { $query->where('rdf:predicate', 'value'); })
    SPARQL: FILTER NOT EXISTS { ?subject rdf:predicate 'value' }
    Negates the filter, matching entities for which the given sub-query produces no results
  • minus(function($query) { $query->where('rdf:predicate', 'value'); })
    SPARQL: MINUS { ?subject rdf:predicate 'value' }
    Entities matching a query can be excluded from the final result if they also match some other condition
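The mapping for two of the Builder options listed above, where() with an operator and whereIn(), can be made concrete with a plain-PHP sketch that renders the same SPARQL text. The functions below are hypothetical helpers written for illustration, not SPARQLer internals; the fixed variable names ?variable and ?value mirror the examples, while SPARQLer itself uses its own Variable terms.

```php
<?php
// Hypothetical helpers, NOT SPARQLer internals: they only render as plain strings
// the patterns shown above for where() with an operator and for whereIn().

// Wraps a plain string in single quotes, escaping backslashes and single quotes.
function quoteLiteral(string $value): string
{
    return "'" . addcslashes($value, "\\'") . "'";
}

// where($predicate, $operator, $value): bind the object to a variable, then FILTER on it.
function renderComparison(string $predicate, string $operator, string $value, string $var = '?variable'): string
{
    if (!in_array($operator, ['<', '>', '<=', '>=', '!='], true)) {
        throw new InvalidArgumentException("Unsupported operator: $operator");
    }
    return sprintf('?subject %s %s . FILTER ( %s %s %s )', $predicate, $var, $var, $operator, quoteLiteral($value));
}

// whereIn($predicate, $values): bind the object to a variable, then list the accepted values.
function renderIn(string $predicate, array $values, string $var = '?value'): string
{
    $quoted = array_map('quoteLiteral', $values);
    return sprintf('?subject %s %s VALUES %s { %s }', $predicate, $var, $var, implode(' ', $quoted));
}

echo renderComparison('rdf:predicate', '!=', 'value'), "\n";
echo renderIn('rdf:predicate', ['value', 'value2']), "\n";
```

Binding the object to a variable first is what makes the FILTER and VALUES forms possible: both operate on variables, not on fixed values.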

Once you have appended all your conditions, you can finalize the Builder with one of these functions:

  • get() is used for doSelect() and doConstruct() builders: the query is executed against the SPARQL endpoint of the parent Client and a result is returned. More precisely: doSelect() returns a Result, doConstruct() returns a Graph
  • run() is for builders that do not produce a result, those initialized with doInsert() and doDelete()
  • count() always performs a SELECT COUNT query (even when not initialized with doSelect()) and returns the number of entities matching the conditions
  • queue() can be used with multiple doInsert() and doDelete() queries, to concatenate them and execute them all at once on the SPARQL endpoint with runQueue()
$client->doInsert([...])->where(...)->queue();
$client->doInsert([...])->where(...)->queue();
$client->doDelete([...])->where(...)->queue();
$client->runQueue();
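Sending many queued statements at once is possible because SPARQL 1.1 Update allows a single request to contain several operations separated by ";". The UpdateQueue class below is a hypothetical illustration of that idea, not SPARQLer's queue() implementation, and it uses simple INSERT DATA / DELETE DATA operations rather than the conditional ones produced by doInsert() and doDelete().

```php
<?php
// Hypothetical illustration, NOT SPARQLer's implementation: SPARQL 1.1 Update
// lets one request contain several operations separated by ";".
final class UpdateQueue
{
    /** @var string[] */
    private array $operations = [];

    // Stores one update operation, trimmed of surrounding whitespace.
    public function queue(string $operation): void
    {
        $this->operations[] = trim($operation);
    }

    // Joins all queued operations into one request body and empties the queue.
    public function flush(): string
    {
        $request = implode(" ;\n", $this->operations);
        $this->operations = [];
        return $request;
    }
}

$queue = new UpdateQueue();
$queue->queue('INSERT DATA { <http://mydomain/Person/A> <http://xmlns.com/foaf/0.1/knows> <http://mydomain/Person/Foo> }');
$queue->queue('DELETE DATA { <http://mydomain/Person/B> <http://xmlns.com/foaf/0.1/knows> <http://mydomain/Person/Foo> }');
echo $queue->flush(), "\n";
```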

Terms

All of the tokens in each Triple must be wrapped in a Term class: if you just pass a string, the Term type is automatically assigned based on its position in the Triple and its content.

Here is a summary of each Term type, with some examples.

  • MadBob\Sparqler\Terms\Variable: any SPARQL variable (the tokens having a ? before the name). If no parameters are passed, the name is randomly generated.
    This is usually used when you explicitly want a given value to be used in different parts of the query: create one or more Variable objects and pass them as parameters to the different functions
  • MadBob\Sparqler\Terms\OwnSubject: the subject of the query. It can be used as a Variable (actually, it is a Variable) and placed in different parts of the query.
    Multiple OwnSubject terms distributed within the same query have the same value (note: sub-queries are not part of the parent query, so a different OwnSubject is assigned to them)
  • MadBob\Sparqler\Terms\Iri: wraps all predicate names and entity subjects.
    The parameters passed to doSelect(), and the first parameter of the where() functions (if only two are passed), are automatically wrapped in an Iri
  • MadBob\Sparqler\Terms\Plain: a generic string that will be enclosed in quotes in the final query.
    Useful to explicitly enforce a string where another type of Term is expected
  • MadBob\Sparqler\Terms\Raw: a generic string that will be appended as-is (with no quotes or escaping) to the final query.
    Useful to enforce specific parts of the query and special syntax not handled by SPARQLer
  • MadBob\Sparqler\Terms\Optional: wraps an Iri (the default, if a plain string is passed) or a Variable to be optionally added to the result set. Mostly used in the parameters list of doSelect()
  • MadBob\Sparqler\Terms\Aggregate: a SPARQL function applied to some value. Parameters are: the name of the function, the parameter(s) of that function (note: plain strings are handled as Iri), and optionally an alternative name to hold the final result of the function.
    E.g. Aggregate('COUNT', 'rdf:predicate', 'counter') becomes COUNT(?xyz) as ?counter ... WHERE ... ?subject rdf:predicate ?xyz
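The Aggregate example above can be expanded into complete SPARQL with a small plain-PHP sketch. renderAggregate() and randomVariable() are hypothetical helpers for illustration, not the SPARQLer API; note that standard SPARQL 1.1 requires an "expression AS ?name" projection to be wrapped in parentheses in the SELECT clause, which the sketch does.

```php
<?php
// Hypothetical helpers, NOT the SPARQLer API: they show the SPARQL that an
// aggregate like Aggregate('COUNT', 'rdf:predicate', 'counter') stands for.

// Random variable name, similar in spirit to a Variable created without parameters.
function randomVariable(): string
{
    return '?v' . bin2hex(random_bytes(4));
}

// Returns [projection, triple pattern]; missing names are randomly generated.
function renderAggregate(string $function, string $predicate, ?string $alias = null, ?string $var = null): array
{
    $var = $var ?? randomVariable();
    $alias = $alias ?? ltrim(randomVariable(), '?');
    // SPARQL 1.1 requires parentheses around "expression AS ?name" in SELECT.
    $projection = sprintf('(%s(%s) AS ?%s)', strtoupper($function), $var, $alias);
    $pattern = sprintf('?subject %s %s', $predicate, $var);
    return [$projection, $pattern];
}

[$projection, $pattern] = renderAggregate('COUNT', 'rdf:predicate', 'counter', '?xyz');
echo "SELECT $projection WHERE { $pattern }\n";
```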

Resources and Graphs

The object returned by doConstruct() is a Graph (a collection of Resource objects): it directly extends the EasyRDF Graph class, adding a few utilities.

First of all, the SPARQLer Graph gives access to the "top level" resources, those directly involved in the query. A CONSTRUCT graph also includes the resources linked to those actually queried, and EasyRDF's native resources() method returns all of them; the masterResources() method returns only the resources that were actually asked for. Graph is also an iterable object, and when used in a foreach statement the master resources are iterated.

$graph = $client->doConstruct()
    ->where('dbo:type', new Iri('dbr:Capital_city'))
    ->where('dbo:timeZone', new Iri('dbr:Central_European_Time'))
    ->get();

/*
    This returns 7349: all resources involved in the query
    (the capital cities, their country, their region, their images...)
*/
count($graph->resources());

/*
    This returns 18: the actual capital cities in CET
*/
count($graph->masterResources());

Then, Graph has a commit() method, used to push back to the SPARQL endpoint all properties inserted, deleted and modified in the child resources. This is useful to update multiple resources at once, and works around the fact that SPARQL does not provide a native UPDATE query (like SQL does).

$graph = $client->doConstruct()->where('foaf:currentProject', new Iri('http://mydomain/Project/Bar'))->get();

foreach($graph as $resource) {
    // ... perform multiple set() or add() operations on $resource...
}

$graph->commit();
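Since SPARQL has no UPDATE statement, changing a property boils down to deleting the old triples and inserting the new ones. The buildUpdate() function below is a hypothetical sketch of that idea, not how SPARQLer implements commit(): it compares the triples of a resource before and after local changes (written as plain N-Triples-like strings) and emits SPARQL 1.1 DELETE DATA / INSERT DATA operations for the difference.

```php
<?php
// Hypothetical sketch, NOT how SPARQLer implements commit(): a property change
// is expressed as "delete the old triples, insert the new ones".

/**
 * @param string[] $before triples of a resource when it was fetched
 * @param string[] $after  triples of the same resource after local changes
 */
function buildUpdate(array $before, array $after): string
{
    $removed = array_values(array_diff($before, $after));
    $added = array_values(array_diff($after, $before));

    $operations = [];
    if ($removed) {
        $operations[] = 'DELETE DATA { ' . implode(' . ', $removed) . ' }';
    }
    if ($added) {
        $operations[] = 'INSERT DATA { ' . implode(' . ', $added) . ' }';
    }
    // Multiple operations in one request are separated by ";".
    return implode(" ;\n", $operations);
}

$before = [
    '<http://mydomain/Person/A> <http://xmlns.com/foaf/0.1/nick> "old"',
    '<http://mydomain/Person/A> <http://xmlns.com/foaf/0.1/currentProject> <http://mydomain/Project/Bar>',
];
$after = [
    '<http://mydomain/Person/A> <http://xmlns.com/foaf/0.1/nick> "new"',
    '<http://mydomain/Person/A> <http://xmlns.com/foaf/0.1/currentProject> <http://mydomain/Project/Bar>',
];

echo buildUpdate($before, $after), "\n";
```

Unchanged triples (here, the currentProject link) produce no operation at all, so only the actual differences travel to the endpoint.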

Extras

Builder includes a few extra methods, useful in particular situations.

withWikiDataLabels() appends to the query the formula to access the "wikibase:label" service in Wikidata. It can be chained to the query, before the final execution call, with two parameters: an array of language identifiers (['en'] is the default), and an array including OwnSubject (the default), any Variable appearing in the query, or some Iri. The elements not explicitly included among the SELECTed predicates are added to the result set anyway.
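For reference, the Wikidata label service is invoked with a SERVICE clause listing the preferred languages, and for a variable such as ?item it can bind a matching ?itemLabel variable. The plain-PHP sketch below only renders that clause; wikidataLabelService() is a hypothetical helper, not SPARQLer's code, and the exact text generated by withWikiDataLabels() may differ.

```php
<?php
// Hypothetical helper, NOT SPARQLer's code: renders the Wikidata label service
// clause for the given languages, plus the "...Label" variables the service binds.
function wikidataLabelService(array $languages = ['en'], array $variables = ['?subject']): array
{
    $clause = sprintf(
        'SERVICE wikibase:label { bd:serviceParam wikibase:language "%s". }',
        implode(',', $languages)
    );
    // For every ?x in the query, the service can bind a ?xLabel variable.
    $labels = array_map(fn (string $v): string => $v . 'Label', $variables);
    return [$clause, $labels];
}

[$clause, $labels] = wikidataLabelService(['en', 'it'], ['?item']);

// A complete query: ten humans (wdt:P31 = "instance of", wd:Q5 = "human") with their labels.
echo 'SELECT ?item ' . implode(' ', $labels)
    . ' WHERE { ?item wdt:P31 wd:Q5 . ' . $clause . ' } LIMIT 10' . "\n";
```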