> Several ways of connecting to an RDF triplestore
There is another way to connect to an RDF triple store [1]. It's offered by a company called Triply.
Triply makes a product that is essentially a GUI for hosting RDF data and querying it with SPARQL. You can also query other public datasets [2]. Currently it's a B2B-only offering, but a consumer version is in the works; I don't know when it will come out.
For now, you can try out the SPARQL querying feature on public datasets [2].
Disclaimer: I recently started working there; this post is my own opinion, not Triply's.
[1] That uses Virtuoso or Jena under the hood.
[2] This example uses DBPedia: https://triplydb.com/wikimedia/dbpedia/sparql/dbpedia
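For the curious: querying a SPARQL endpoint needs no special tooling, since the SPARQL 1.1 Protocol is just HTTP with a `query` parameter. A minimal Python sketch, assuming the Triply endpoint above speaks the standard protocol (worth checking against their docs); the DBpedia query is illustrative:

```python
import urllib.parse
import urllib.request

# Public DBpedia mirror mentioned above (assumed to accept standard
# SPARQL protocol requests).
ENDPOINT = "https://triplydb.com/wikimedia/dbpedia/sparql/dbpedia"

def build_request(query: str) -> urllib.request.Request:
    """Encode a SPARQL query as a form-urlencoded POST request."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Accept": "application/sparql-results+json",
                 "Content-Type": "application/x-www-form-urlencoded"},
    )

query = """SELECT ?capital WHERE {
  <http://dbpedia.org/resource/France>
      <http://dbpedia.org/ontology/capital> ?capital .
}"""
req = build_request(query)
# To actually run it (network call):
# import json; print(json.load(urllib.request.urlopen(req)))
```

The response follows the SPARQL 1.1 JSON results format, so any HTTP client works; no vendor SDK required.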
The dominant Library Interchange Format is MARC, which predates the relational database by a few years and is a hierarchical document structure. It's possible, but not particularly helpful, to store it normalised in a relational database.
I gave a presentation almost a decade ago about our work. [1]
The takeaway is that Linked Data is just a more natural fit for modelling library data, especially when you use external sources like VIAF [2] to help anchor your identifier URIs.
Sadly, whilst RDF has seen some uptake in the library world (e.g. BIBFRAME[3]), the full potential was never tapped whilst I was working in the space. Quad Stores are very fun to tinker with though, and features such as inference and property paths in SPARQL 1.1[4] allow you to do some interesting things that are difficult, or non-idiomatic, in a relational system.
[1] https://www.slideshare.net/philjohn/linked-library-data-in-t...
[2] http://viaf.org/
[3] https://www.loc.gov/bibframe/
[4] https://www.w3.org/TR/2013/REC-sparql11-query-20130321/#prop...
I have never seen a case where a triple store was used because it was necessary to achieve an outcome. It was always part of the premise: the point was to show that a certain task could be achieved using triple stores. The "semantic" label is also problematic; some people think that "semantic" technology is magic and will somehow solve their problems for them.
My experience with semantic technology comes from university and a commercial project. An architect who didn't talk to the engineering team decided that "semantic" technology should be used. The project was a catastrophe: we spent most of the time trying to get the technology to work for the simplest things. The situation improved when we started working around the semantic stuff; using a relational DB internally allowed us to improve performance by a few orders of magnitude.
Because all data is stored as triples of subject, predicate and object, the indexing options for improving query performance are limited compared to relational databases. While it's possible to change the graph structure to speed things up, the structure is usually chosen for semantics and defined in an ontology. A change in the structure is also a change of the semantics of the graph.
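To make the indexing point concrete, here is a sketch of the usual triple-table layout: one three-column table plus covering indexes over permutations of (subject, predicate, object). Real stores may materialise up to all six orderings; the SQLite schema and data below are invented for illustration:

```python
import sqlite3

# One universal "triples" table with permutation indexes. Contrast with
# a relational schema, where you index exactly the columns your queries
# need; here the index menu is fixed regardless of the data's shape.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
con.execute("CREATE INDEX idx_spo ON triples (s, p, o)")
con.execute("CREATE INDEX idx_pos ON triples (p, o, s)")
con.execute("CREATE INDEX idx_osp ON triples (o, s, p)")

con.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:name", '"Alice"'),
    ("ex:bob",   "ex:name", '"Bob"'),
])

# Every query pattern becomes a lookup into one of the permutations;
# e.g. "who knows whom" is served by idx_pos via the predicate.
rows = con.execute(
    "SELECT s, o FROM triples WHERE p = 'ex:knows'").fetchall()
print(rows)  # [('ex:alice', 'ex:bob')]
```

Any query touching several predicates becomes a self-join per predicate on this one table, which is where the performance gap against a purpose-built relational schema tends to open up.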
Given an undocumented triple store, it's quite difficult to figure out the graph structure stored inside. In a relational database you can just run the equivalent of "SHOW TABLES" and go from there. In the semantic world, you need a manual for the ontologies used. It's sad because the whole point of "semantic" technology was to attach meaning to data.
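That said, there is a rough equivalent of "SHOW TABLES": enumerating the predicates and classes actually in use, e.g. in SPARQL `SELECT DISTINCT ?p WHERE { ?s ?p ?o }` and `SELECT DISTINCT ?class WHERE { ?s a ?class }`. A toy illustration over an in-memory list of triples (data invented):

```python
RDF_TYPE = "rdf:type"  # "a" in SPARQL is shorthand for rdf:type

triples = [
    ("ex:alice",  RDF_TYPE,     "foaf:Person"),
    ("ex:alice",  "foaf:knows", "ex:bob"),
    ("ex:bob",    RDF_TYPE,     "foaf:Person"),
    ("ex:paper1", RDF_TYPE,     "bibo:Article"),
]

# SELECT DISTINCT ?p: which "columns" exist at all.
predicates = sorted({p for _, p, _ in triples})
# SELECT DISTINCT ?class: which "tables" (classes) are in use.
classes = sorted({o for _, p, o in triples if p == RDF_TYPE})

print(predicates)  # ['foaf:knows', 'rdf:type']
print(classes)     # ['bibo:Article', 'foaf:Person']
```

This tells you which terms exist, but, as the comment says, not what they mean; for that you still need the documentation of the ontologies involved.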
Triple stores also receive way less attention than databases like Postgres or MariaDB and I'd rather use something proven in production scenarios.
In Europe, this format (CIM [1]) is used as the standard data model for transmission system operators (TSOs) to share the network information needed to run the Europe-wide electrical grid. ENTSO-E [2] publishes a set of RDFS profiles that more tightly scope the data model to their use cases. [3]
With that use case in mind, why would an RDF database be useful here? Data size. It's quite easy to get into millions of objects in the data graph for a single distribution feeder. A large utility might have thousands of such feeders, plus associated sub-transmission and transmission infrastructure (the hierarchy is nicely shown in [4]).
This can be represented in a relational database, but many of the queries become recursive: if object A connects to object B, which connects to object C, and you want everything connected to A, you don't know that C exists until you've found B.
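In SPARQL 1.1 that connectivity question is a one-line property path, e.g. `SELECT ?x WHERE { ex:A ex:connectsTo+ ?x }`. A sketch of the transitive closure it computes, with invented edge data (this BFS is roughly the work any engine, relational or triple store, has to do underneath):

```python
from collections import deque

# Toy connectivity graph: A -> B -> C -> D, plus an unrelated X -> Y.
edges = {("A", "B"), ("B", "C"), ("C", "D"), ("X", "Y")}

def connected_from(start):
    """Everything reachable from `start` via connectsTo edges (BFS)."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for s, o in edges:
            if s == node and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen

print(sorted(connected_from("A")))  # ['B', 'C', 'D']
```

The win with the property path is that the traversal is declarative: the query doesn't have to know the depth of the network in advance.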
Refs:
1. https://en.wikipedia.org/wiki/Common_Information_Model_(elec...
2. https://en.wikipedia.org/wiki/European_Network_of_Transmissi...
4. https://en.wikipedia.org/wiki/Electric_power_distribution#/m...
Recursive queries are natively supported in SQL via Common Table Expressions (CTEs). It's not like a triplestore is doing anything different underneath.
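For comparison, the same kind of reachability query in plain SQL with a recursive CTE, here on SQLite with invented toy data:

```python
import sqlite3

# Same A -> B -> C -> D toy network, stored relationally.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE connects (src TEXT, dst TEXT)")
con.executemany("INSERT INTO connects VALUES (?, ?)",
                [("A", "B"), ("B", "C"), ("C", "D"), ("X", "Y")])

# WITH RECURSIVE walks the graph to any depth, just like a SPARQL
# property path; UNION (not UNION ALL) also deduplicates, so cycles
# in the data would not loop forever.
rows = con.execute("""
    WITH RECURSIVE reachable(node) AS (
        SELECT dst FROM connects WHERE src = 'A'
        UNION
        SELECT c.dst FROM connects c JOIN reachable r ON c.src = r.node
    )
    SELECT node FROM reachable ORDER BY node
""").fetchall()
print([n for (n,) in rows])  # ['B', 'C', 'D']
```

So the capability exists on both sides; the difference is mostly how idiomatic and concise each system makes it.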
I'll say one disadvantage of RDF is the lack of any well-implemented reification of triples.
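To illustrate the pain point: classic RDF reification turns one statement into four triples before you can attach any metadata (source, date, confidence) to it. A sketch with invented data and blank-node naming:

```python
import itertools

counter = itertools.count()

def reify(s, p, o):
    """Promote one triple to a resource, classic-reification style."""
    stmt = f"_:stmt{next(counter)}"
    return [
        (stmt, "rdf:type",      "rdf:Statement"),
        (stmt, "rdf:subject",   s),
        (stmt, "rdf:predicate", p),
        (stmt, "rdf:object",    o),
    ]

quads = reify("ex:alice", "ex:knows", "ex:bob")
print(len(quads))  # 4 triples just to *name* the statement,
# before attaching any actual metadata like (stmt, "ex:source", ...)
```

RDF-star is the ongoing attempt to make statements-about-statements first-class, but support and quality vary across stores, which is presumably what the complaint is about.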
A practical example from my own work: the UniProt database in the life sciences and the European Patent Office SPARQL endpoints.
These two datasets have some overlap. Combining them in a classical data warehouse with ETL pipelines would cost a few million in start-up costs (full data fidelity, small team, one year of work, optimistic). The same with non-federated RDF/SPARQL is 3 days' work. With federated querying it is 2 minutes' work.
SQL has a richer ecosystem, with many more people confident in its usage, more deployment options, etc. Which is why you will often see tools like StarDog Virtual Graphs that turn existing SQL DBs into SPARQL endpoints (by translating SPARQL to SQL) for in-organization federated knowledge graphs, i.e. no or minimal ETL pipelines, with direct querying on (standby copies of) the primary data sources.
In some domains the "business" analysts know SQL; more rarely, but possibly, they know SPARQL. Letting them ask any query they can think of, not bounded by one specific database, can be extremely valuable. For organizations that can extract that value, the lower market penetration of SPARQL is sad but not a real issue. This works in practice for SPARQL but not for SQL, because what a user can state directly with a SERVICE clause in SPARQL requires a DBA to set up a foreign-table connection in the SQL world.
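A toy illustration of what SERVICE buys you: part of the pattern is evaluated locally, the bindings are joined against a remote endpoint, and no DBA has to pre-configure anything. Both "endpoints" below are just local lists of invented triples; the SNOMED codes are real well-known ones but are used purely for illustration:

```python
# Local patient data (one "endpoint").
local = [
    ("ex:p1", "ex:diagnosedWith", "snomed:22298006"),
    ("ex:p2", "ex:diagnosedWith", "snomed:38341003"),
]
# Stand-in for SERVICE <https://terminology.example/sparql> { ... }
remote = [
    ("snomed:22298006", "rdfs:label", "Myocardial infarction"),
    ("snomed:38341003", "rdfs:label", "Hypertensive disorder"),
]

def service_join(bindings, endpoint, predicate):
    """Nested-loop join of local bindings against a (simulated) remote
    endpoint -- roughly what a SPARQL engine does for SERVICE."""
    results = []
    for patient, code in bindings:
        for s, p, o in endpoint:
            if s == code and p == predicate:
                results.append((patient, o))
    return results

pairs = service_join(
    [(s, o) for s, p, o in local if p == "ex:diagnosedWith"],
    remote, "rdfs:label")
print(pairs)  # [('ex:p1', 'Myocardial infarction'),
              #  ('ex:p2', 'Hypertensive disorder')]
```

Real engines batch or rewrite the remote requests rather than looping per binding, but the user-facing contract is the same: one query spanning multiple independent endpoints.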
Another example is when a few (N) hospitals need to exchange data. They have a few more relational databases (N+2) in house for this data, covering their patient groups. Upon project commencement they notice that all are SQL, but none are alike, partly from vendor differences, but more problematically from modelling issues. Transforming their data to RDF is no more complicated than standardizing on one new schema. But RDF gives immediate integration with SNOMED CT, ICD-10 and LOINC, which allows easy queries that take the hierarchical knowledge in those medical standards into account. Those queries would be possible to write by hand in SQL, but are easier in RDF/SPARQL when attaching a minimal OWL reasoner. Then integrating with geospatial data is again easier, because in this country that is provided as RDF as well.
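To show what those hierarchical queries look like: with the terminology loaded as rdfs:subClassOf triples, a SPARQL path like `{ ?patient ex:diagnosedWith ?c . ?c rdfs:subClassOf* ex:HeartDisease }` matches a diagnosis at any level of the hierarchy. A miniature sketch; the hierarchy below is made up, not real SNOMED CT:

```python
# Tiny invented subclass hierarchy (child -> parent).
subclass_of = {
    "ex:MyocardialInfarction":   "ex:IschaemicHeartDisease",
    "ex:IschaemicHeartDisease":  "ex:HeartDisease",
    "ex:Migraine":               "ex:NeurologicalDisorder",
}
diagnoses = [("ex:p1", "ex:MyocardialInfarction"),
             ("ex:p2", "ex:Migraine")]

def is_subclass(c, ancestor):
    """Walk the chain upward -- what rdfs:subClassOf* does declaratively."""
    while c is not None:
        if c == ancestor:
            return True
        c = subclass_of.get(c)
    return False

# "All patients with any kind of heart disease", without the query
# author enumerating every specific code by hand.
hits = [p for p, c in diagnoses if is_subclass(c, "ex:HeartDisease")]
print(hits)  # ['ex:p1']
```

The equivalent hand-written SQL needs a recursive CTE per hierarchy lookup, which is exactly the "possible but harder" point being made.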
NOTE: A full-fidelity UniProt schema (including all sub-datasets, e.g. UniParc) would be 150-200 tables, depending on some modelling choices. EPO I assume to be of the same order.
NOTE2: While federated querying is a standard SPARQL feature, it can of course be limited or turned off depending on the security/legal context.
Being able to create cross-links between data independently of a predefined schema does seem useful for research, though. With RDF, if I understand correctly, you can just define a predicate of your own and link subjects/objects in completely different, independent triple stores.
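That's essentially right: a cross-link is just one more triple, and nothing stops its subject and object living in different datasets under a predicate you minted yourself. A minimal sketch; all URIs below are invented for illustration:

```python
# One triple linking a resource in dataset A to a resource in dataset B
# via a self-minted predicate. No schema change in either dataset is
# needed; the predicate URI just has to live in a namespace you control.
link = (
    "https://datasetA.example/protein/P12345",          # subject, store A
    "https://my-research.example/vocab#correlatesWith", # your predicate
    "https://datasetB.example/patent/EP0000001",        # object, store B
)

subject, predicate, obj = link
print(predicate.rsplit("#", 1)[1])  # correlatesWith
```

Documenting what the new predicate means (ideally with a small ontology at that namespace) is what keeps such ad-hoc links useful to anyone else, which loops back to the "undocumented triple store" complaint above.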