SPARQLer is a SPARQL Object-Relational Mapping for PHP, built on top of EasyRDF (the most popular PHP library for RDF handling). In other words: a PHP library to access linked data sources in an object-oriented flavour, hiding the SPARQL query language behind a set of convenient (and often more familiar) structures and functions.
Most developers are familiar with the SQL query language and relational databases (like MariaDB or PostgreSQL),
where information is organized in tables and columns. Most of the potential of publicly available and collectively updated
linked "graphs" (like Wikidata or DBPedia)
is still untapped, due to scarce adoption and the steep learning curve of a different information model.
SPARQLer provides a SQL-like fluent interface to such information, and allows a larger audience
to approach those tools. The SPARQLer API is largely inspired by Laravel's native SQL ORM,
Eloquent, which is already used by many PHP developers.
Plus, even if you already know SPARQL, SPARQLer is a convenient interface to dynamically build your queries
and wrap data in the model of your application.
To install SPARQLer just run
composer require madbob/sparqler
The full code, MIT licensed, is hosted on GitLab (and, of course, open to contributions!).
For reference and introduction, let's compare SQL and SPARQL.
A typical SQL query looks like:
SELECT column1, column2 FROM table WHERE column3 = 'something';
while an equivalent SPARQL query looks like:
SELECT ?foo ?bar WHERE { ?item column1 ?foo . ?item column2 ?bar . ?item column3 'something' }
The basic abstraction of SQL is that data is represented as rows in a table with multiple columns, where each column has its own name and holds a value; the value of a given column can be extracted by matching the values of other known columns within the same row.
The basic abstraction of SPARQL is that you have a single table with only three columns ("subject", "predicate" and "object"), and rows with the same "subject" belong to the same entity. You retrieve the required information by combining multiple subject/predicate/object triples, where each element can be a variable (the tokens starting with ?) that has to match in every other triple where it appears.
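To make the three-column model concrete, here is a minimal, self-contained PHP sketch of triple pattern matching. It does not use SPARQLer or a real endpoint: the ex: data and the matchPattern() and select() functions are hypothetical names invented for this illustration only.

```php
<?php
// Toy in-memory "graph": every fact is a [subject, predicate, object] triple.
$triples = [
    ['ex:dublin', 'ex:name',    'Dublin'],
    ['ex:dublin', 'ex:country', 'Ireland'],
    ['ex:rome',   'ex:name',    'Rome'],
    ['ex:rome',   'ex:country', 'Italy'],
];

/**
 * Matches one triple pattern against the graph, extending the given bindings.
 * Tokens starting with "?" are variables; anything else must match literally.
 */
function matchPattern(array $triples, array $pattern, array $bindings): array
{
    $results = [];
    foreach ($triples as $triple) {
        $candidate = $bindings;
        foreach ($pattern as $i => $token) {
            if ($token[0] === '?') {
                // A variable already bound elsewhere must keep the same value.
                if (isset($candidate[$token]) && $candidate[$token] !== $triple[$i]) {
                    continue 2;
                }
                $candidate[$token] = $triple[$i];
            } elseif ($token !== $triple[$i]) {
                continue 2;
            }
        }
        $results[] = $candidate;
    }
    return $results;
}

/** Evaluates a list of patterns: each one narrows the bindings of the previous. */
function select(array $triples, array $patterns): array
{
    $solutions = [[]];
    foreach ($patterns as $pattern) {
        $next = [];
        foreach ($solutions as $bindings) {
            $next = array_merge($next, matchPattern($triples, $pattern, $bindings));
        }
        $solutions = $next;
    }
    return $solutions;
}

// Same idea as: SELECT ?name WHERE { ?item ex:country 'Italy' . ?item ex:name ?name }
$rows = select($triples, [
    ['?item', 'ex:country', 'Italy'],
    ['?item', 'ex:name', '?name'],
]);
```

Because ?item appears in both patterns, the second pattern only matches triples whose subject was already selected by the first one: this shared variable plays the role that "same row" plays in SQL.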
SPARQLer lets you create SPARQL queries in a more "SQL-like" fashion, introducing a few implicit behaviours (which can be overridden, if required, with certain functions and combinations of parameters).
$client->doSelect(['column1', 'column2'])->where('column3', 'something')->get();
Unless otherwise specified, the "subject" part of the SPARQL query is implicit: the selected attributes and all attributes used in the conditions refer to the same subject.
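The idea of an implicit subject can be sketched with a toy fluent builder. This is not SPARQLer's implementation and says nothing about the exact text SPARQLer generates: the ToySelect class, its toSparql() method and the ?v0, ?v1 variable names are assumptions made up for this example.

```php
<?php
/**
 * Toy query builder (NOT SPARQLer's implementation): it only illustrates how
 * an implicit subject can tie selected attributes and conditions together.
 * Values are not escaped, so this is not safe for real input.
 */
final class ToySelect
{
    private array $columns;
    private array $conditions = [];

    public function __construct(array $columns)
    {
        $this->columns = $columns;
    }

    public function where(string $predicate, string $value): self
    {
        $this->conditions[] = [$predicate, $value];
        return $this; // returning $this is what makes the calls chainable
    }

    public function toSparql(): string
    {
        $variables = [];
        $patterns = [];
        foreach ($this->columns as $index => $predicate) {
            $variable = '?v' . $index;
            $variables[] = $variable;
            // Every selected attribute hangs off the same ?subject.
            $patterns[] = "?subject $predicate $variable";
        }
        foreach ($this->conditions as [$predicate, $value]) {
            // Conditions refer to that same ?subject too.
            $patterns[] = "?subject $predicate '$value'";
        }
        return 'SELECT ' . implode(' ', $variables)
            . ' WHERE { ' . implode(' . ', $patterns) . ' }';
    }
}

$sparql = (new ToySelect(['ex:column1', 'ex:column2']))
    ->where('ex:column3', 'something')
    ->toSparql();
```

The caller never mentions a subject, yet every generated triple starts with the same ?subject variable.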
For some examples of actual SPARQL queries, in raw format and rewritten with SPARQLer, have a look at the examples page.
A few definitions useful to understand the SPARQLer API applied to the SPARQL model.
An IRI can be written in full or in prefixed form: http://xmlns.com/foaf/0.1/mbox is the same as foaf:mbox. In a SPARQLer query, those are wrapped into an Iri term (which may also represent the subject of a specific entity).
predicate => value
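The equivalence between http://xmlns.com/foaf/0.1/mbox and foaf:mbox comes from replacing the prefix with its namespace. A minimal sketch of that expansion, assuming a plain prefix => namespace array; expandPrefixed() is a hypothetical helper, not part of SPARQLer or EasyRDF:

```php
<?php
/**
 * Expands a prefixed name (e.g. "foaf:mbox") into a full IRI using a
 * prefix => namespace map. Unknown prefixes are returned untouched.
 */
function expandPrefixed(string $name, array $namespaces): string
{
    $parts = explode(':', $name, 2);
    if (count($parts) === 2 && isset($namespaces[$parts[0]])) {
        return $namespaces[$parts[0]] . $parts[1];
    }
    return $name;
}

$namespaces = ['foaf' => 'http://xmlns.com/foaf/0.1/'];

$full = expandPrefixed('foaf:mbox', $namespaces);
// A full IRI has "http" as its apparent prefix, which is not in the map,
// so it is left as-is.
$same = expandPrefixed('http://xmlns.com/foaf/0.1/mbox', $namespaces);
```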
First of all, you need a Client to build and execute your SPARQL queries.
require_once "./vendor/autoload.php";
use MadBob\Sparqler\Client;
$client = new Client($config);
where $config is an associative array of options; the sample initializations further down use the keys host, namespaces, graph and auth.
The internal HTTP client used to perform requests to the SPARQL endpoint can be accessed and configured as desired. By default it is an instance of EasyRDF-on-Guzzle, which integrates a full-featured Guzzle client (using a cURL handler) with EasyRDF, so you can consult the Guzzle documentation for more options.
$client = new Client($config);
$httpclient = \EasyRdf\Http::getDefaultHttpClient();
$httpclient->setConfig(['timeout' => 20]);
It is also possible to attach to the Client a PSR-3 LoggerInterface, where all generated queries are logged, and a PSR-16 CacheInterface, used only for inference functions.
$client = new Client($config);
$client->setLogger($your_LoggerInterface);
$client->setCache($your_CacheInterface);
A few sample initializations for common use cases:
// Wikidata client
$client = new Client([
'host' => 'https://query.wikidata.org/sparql',
// For more namespaces used in Wikidata:
// https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Full_list_of_prefixes
'namespaces' => [
'wd' => 'http://www.wikidata.org/entity/',
'wdt' => 'http://www.wikidata.org/prop/direct/',
'rdfs' => 'http://www.w3.org/2000/01/rdf-schema#',
],
]);
// DBPedia client
$client = new Client([
'host' => 'https://dbpedia.org/sparql',
// For more namespaces used in DBPedia:
// https://dbpedia.org/sparql/?help=nsdecl
'namespaces' => [
'dbp' => 'http://dbpedia.org/property/',
'dbr' => 'http://dbpedia.org/resource/',
'dbo' => 'http://dbpedia.org/ontology/',
'dct' => 'http://purl.org/dc/terms/',
'dbc' => 'http://dbpedia.org/resource/Category:',
'rdf' => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
],
]);
// Your own Virtuoso server
$client = new Client([
'host' => 'http://localhost:8890/sparql-auth',
'graph' => 'urn:sparql:tests:insert:informative',
'auth' => [
'type' => 'digest',
'username' => 'dba',
'password' => 'your_password',
],
]);
From a Client instance, it is possible to obtain a Builder for each kind of query you want to perform.
doSelect() initializes a SELECT query: the function accepts an array of items (by default plain strings are converted to Prefixed terms) that will be retrieved for each entity matching the conditions appended to the Builder. The query returns a Result.
$result = $client->doSelect([new OwnSubject(), 'dbp:name'])
->where('dct:subject', new Iri('dbc:Capitals_in_Europe'))
->get();
doSelectDistinct() acts in the same way, but initializes a SELECT DISTINCT query.
doConstruct() returns a Graph, including multiple Resources, and permits an accurate selection of the predicates to be fetched and added to the Graph itself.
$result = $client->doConstruct([['dbp:areaTotalSqMi'], ['dbp:website']])
->where('dct:subject', new Iri('dbc:Capitals_in_Europe'))
->get();
doConstruct() takes an optional array of Triples as parameter, describing the properties you want to fetch for each entity matching the conditions; usually you may want to omit the subject (implicit, due to the conditions appended to the Builder) and the object (which is automatically mapped into the query for each required predicate). If no parameters are passed, all the predicates of matching entities are fetched from the endpoint.
For convenience, Client has a shorthand find() function which CONSTRUCTs a given subject.
$result = $client->find('dbr:Dublin');
doInsert() and doDelete() are used to insert data into and delete data from the graph. Both accept an array of Triples to specify what to insert or remove, while the conditions appended to the Builder define which entities are the target of the insert or remove operation.
$client->doInsert([
['foaf:knows', new Iri('http://mydomain/Person/Foo')],
])->where('foaf:currentProject', new Iri('http://mydomain/Project/Bar'))->run();
$client->doDelete([
['foaf:knows', new Iri('http://mydomain/Person/Foo')],
])->where('foaf:currentProject', new Iri('http://mydomain/Project/Bar'))->run();
Most of the definition of a SPARQL query happens in the Builder, as this is where you define all of your WHERE conditions. Most of its functions return the same Builder, so they can be chained.
Here is a summary of the different options you have.
Example | SPARQL | Description |
---|---|---|
where('rdf:predicate', 'value') | ?subject rdf:predicate 'value' | The most common condition: the predicate is applied to the implicit OwnSubject of the query. By default, the first parameter is wrapped within an Iri term, the second in a Plain (it is assumed to be a string) |
where('rdf:predicate', new Iri('a:subject')) | ?subject rdf:predicate a:subject | By passing Term objects as parameters, you can enforce their meaning and the way they are appended to the query. This is true for every conditional function of the Builder |
where('rdf:predicate', function($query) { $query->where('rdf:other', 'value'); }) | ?subject rdf:predicate ?variable . ?variable rdf:other 'value' | The value of the condition can be a sub-query: a new random Variable will be used for further comparisons and evaluations |
where(new Variable('foo'), 'rdf:predicate', 'value') | ?foo rdf:predicate 'value' | When three parameters are passed to where(), they become a complete Triple |
whereOptional('rdf:predicate', 'value') | OPTIONAL { ?subject rdf:predicate 'value' } | An OPTIONAL condition filters entities having a given predicate with a given value, or not having that predicate at all. To optionally select a given predicate in a SELECT query it is more convenient to use the Optional term |
where('rdf:predicate', '!=', 'value') | ?subject rdf:predicate ?variable . FILTER ( ?variable != 'value' ) | Basic evaluation functions are built into the where function, which generates the proper FILTER conditions. Supported operators: < > <= >= != |
whereIn('rdf:predicate', ['value', 'value2']) | ?subject rdf:predicate ?value VALUES ?value { 'value' 'value2' } | Many different values can be matched at once |
whereReverse('rdf:predicate', new Iri('a:subject')) | a:subject rdf:predicate ?subject | Reverses the operands of the Triple, using the implicit OwnSubject as object instead of subject. Here, the second parameter is the new subject of the Triple (it may be an explicit Iri or a Variable filtered somewhere else) |
whereReverse('rdf:predicate', function($query) { $query->where('rdf:other', 'value'); }) | a:subject rdf:predicate ?subject . a:subject rdf:other 'value' | Reversed relations can also be extended with sub-queries, in which the subject will be inherited from the parent one |
whereRaw('SPARQL expression') | SPARQL expression | An arbitrary expression can be appended to the query |
filter(function($query) { $query->where('rdf:predicate', 'value'); }) | FILTER { ?subject rdf:predicate 'value' } | Filters are used to refine the result set with given parameters. To be used for more complex evaluations than the basic ones built into the where function |
filterNotExists(function($query) { $query->where('rdf:predicate', 'value'); }) | FILTER NOT EXISTS { ?subject rdf:predicate 'value' } | Negates the filter, matching entities for which the given sub-query produces no results |
minus(function($query) { $query->where('rdf:predicate', 'value'); }) | MINUS { ?subject rdf:predicate 'value' } | Part of the entities matching a query can be excluded from the final result if they match some other condition |
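The SPARQL shown for the comparison form of where() and for whereIn() follows a simple pattern: the object is bound to a variable, which is then constrained by FILTER or VALUES. The sketch below reproduces those two output strings with plain string building. It is not SPARQLer's code: renderComparison() and renderIn() are hypothetical helpers, and values are quoted without any escaping.

```php
<?php
// Toy renderers (NOT SPARQLer's code) reproducing two of the SPARQL patterns
// documented for where() with an operator and for whereIn().

/** where($predicate, $operator, $value): bind the object to a variable, then FILTER it. */
function renderComparison(string $predicate, string $operator, string $value, string $variable = '?variable'): string
{
    $allowed = ['<', '>', '<=', '>=', '!='];
    if (!in_array($operator, $allowed, true)) {
        throw new InvalidArgumentException("Unsupported operator: $operator");
    }
    return "?subject $predicate $variable . FILTER ( $variable $operator '$value' )";
}

/** whereIn($predicate, $values): bind the object to a variable constrained by VALUES. */
function renderIn(string $predicate, array $values, string $variable = '?value'): string
{
    $quoted = array_map(fn (string $v): string => "'$v'", $values);
    return "?subject $predicate $variable VALUES $variable { " . implode(' ', $quoted) . ' }';
}

$filter = renderComparison('rdf:predicate', '!=', 'value');
$values = renderIn('rdf:predicate', ['value', 'value2']);
```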
Once you have appended all your conditions, you can finalize the Builder with one of these functions:
get() is used for doSelect() and doConstruct() builders: the query is executed against the SPARQL endpoint defined for the parent Client and a result is returned. More exactly: doSelect() returns a Result, doConstruct() returns a Graph.
run() is for builders that do not have a result, those initialized with doInsert() and doDelete().
count() always performs a SELECT COUNT query (even when not initialized with doSelect()) and returns the number of entities matching the conditions.
queue() can be used for multiple doInsert() and doDelete() queries, to be concatenated and executed once on the SPARQL endpoint.
$client->doInsert([...])->where(...)->queue();
$client->doInsert([...])->where(...)->queue();
$client->doDelete([...])->where(...)->queue();
$client->runQueue();
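As background for queueing: SPARQL 1.1 Update allows several operations in a single request, separated by ";". The toy class below only illustrates that collect-then-send-once idea. It is not how SPARQLer is implemented, and ToyUpdateQueue, its flush() method and the Person/A and Person/B IRIs are invented for this example.

```php
<?php
/**
 * Toy update queue (NOT SPARQLer's implementation): operations are collected
 * and then joined into one request text, using the ";" separator defined by
 * SPARQL 1.1 Update for multiple operations in the same request.
 */
final class ToyUpdateQueue
{
    /** @var string[] */
    private array $pending = [];

    public function queue(string $operation): void
    {
        $this->pending[] = $operation;
    }

    /** Returns the single request text and empties the queue. */
    public function flush(): string
    {
        $request = implode(" ;\n", $this->pending);
        $this->pending = [];
        return $request;
    }
}

$queue = new ToyUpdateQueue();
$queue->queue('INSERT DATA { <http://mydomain/Person/A> <http://xmlns.com/foaf/0.1/knows> <http://mydomain/Person/Foo> }');
$queue->queue('DELETE DATA { <http://mydomain/Person/B> <http://xmlns.com/foaf/0.1/knows> <http://mydomain/Person/Foo> }');
$request = $queue->flush();
```

Sending one request instead of many saves a network round trip per operation, which is the practical reason to queue inserts and deletes.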
All of the tokens in each Triple must be enclosed within a wrapper Term class: if you just pass a string, the Term type is automatically assigned according to its position in the Triple itself and its content.
Here is a summary of each Term type, with some examples.
Class | Description |
---|---|
MadBob\Sparqler\Terms\Variable | Any SPARQL variable (the tokens having a ? before the name). If no parameters are passed, the name is randomly generated. This is usually used if you explicitly want a given value to be used in different parts of the query: create one or more Variable objects and pass them as parameters to the different functions |
MadBob\Sparqler\Terms\OwnSubject | The subject of the query. Can be used as a Variable (actually: it is a Variable) and placed in different parts of the query. Multiple OwnSubject terms distributed within the same query will have the same value (note: sub-queries are not part of the parent query, so a different OwnSubject will be assigned) |
MadBob\Sparqler\Terms\Iri | Wraps all predicate names and entity subjects. The parameters passed to doSelect(), and the first parameter of where() functions (if only two are passed), are automatically wrapped within an Iri |
MadBob\Sparqler\Terms\Plain | Includes a generic string that will be enclosed between quotes in the final query. Useful to explicitly enforce a string where another type of Term is expected |
MadBob\Sparqler\Terms\Raw | Includes a generic string that will be appended as-is (with no quotes or escapes) in the final query. Useful to enforce specific parts of the query and special syntax not handled by SPARQLer |
MadBob\Sparqler\Terms\Optional | Used to wrap an Iri (default, if a plain string is passed) or a Variable to be optionally added to the result set. Mostly used in the parameters list of doSelect() |
MadBob\Sparqler\Terms\Aggregate | A SPARQL function applied to some value. Parameters are: the name of the function, the parameter(s) to that function (note: plain strings will be handled as Iri), and optionally an alternative name to hold the final result of the function. E.g. Aggregate('COUNT', 'rdf:predicate', 'counter') becomes COUNT(?xyz) as ?counter ... WHERE ... ?subject rdf:predicate ?xyz |
The object returned by doConstruct() is a Graph (a collection of Resource objects): it directly extends the EasyRDF Graph class, adding a few utilities.
First of all, SPARQLer's Graph gives access to "top level" resources, those directly involved in the query. As a CONSTRUCT graph also includes resources linked to those actually queried, and those are also returned by the native resources() method from EasyRDF, the masterResources() method returns only the resources which have been asked for. Graph is also an iterable object, and when used in a foreach statement the master resources are iterated.
$graph = $client->doConstruct()
->where('dbo:type', new Iri('dbr:Capital_city'))
->where('dbo:timeZone', new Iri('dbr:Central_European_Time'))
->get();
/*
This returns 7349: all resources involved in the query
(the capital cities, their country, their region, their images...)
*/
count($graph->resources());
/*
This returns 18: the actual capital cities in CET
*/
count($graph->masterResources());
Then, Graph has a commit() method used to push back to the SPARQL endpoint all properties inserted, deleted and modified in the child resources. This is useful to update multiple resources at once, and to work around the fact that SPARQL does not provide a native way to perform an UPDATE query (like in SQL).
$graph = $client->doConstruct()->where('foaf:currentProject', new Iri('http://mydomain/Project/Bar'))->get();
foreach($graph as $resource) {
// ... perform multiple set() or add() operations on $resource...
}
$graph->commit();
Builder includes a few extra methods, useful in particular situations.
withWikiDataLabels() appends to the query the formula to access the "wikibase:label" service in Wikidata. It can be chained to the query, before the final instruction for execution, with two parameters: an array of language identifiers (['en'] is the default), and an array including OwnSubject (the default), any Variable appearing in the query, or some Iri: the elements not explicitly included in the SELECTed predicates will be added anyway to the result set.
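For reference, the "wikibase:label" service is a clause added to the WHERE block of a Wikidata query. The sketch below builds that clause from a list of languages. How SPARQLer assembles it internally is not shown in this page, so wikidataLabelService() is a hypothetical helper written only to show the kind of text involved.

```php
<?php
/**
 * Builds the label service clause documented by the Wikidata Query Service.
 * Hypothetical helper for illustration; NOT part of SPARQLer.
 */
function wikidataLabelService(array $languages = ['en']): string
{
    // Languages form a comma-separated fallback list, tried in order.
    $list = implode(',', $languages);
    return 'SERVICE wikibase:label { bd:serviceParam wikibase:language "' . $list . '" . }';
}

$clause = wikidataLabelService(['it', 'en']);
```

With this service active, Wikidata binds a label variable named after each selected variable plus the Label suffix (for example ?itemLabel for ?item).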