It's funny you say that, because as soon as I saw "semantic web" I felt a wave of negative nostalgia. I can hardly remember all of the RDF/RSS/Atom stuff from way, way back, or what triggered it, but I just remember rancor swirling around the whole thing. I think there were some petty arguments about who deserved credit for creating the formats? Wasn't it between a bunch of bloggers? Then XHTML became a battleground, since some groups were trying to keep semantic tags out of it while other people wanted them in. I remember feeling exhausted every time the subject came up, since it was like the emacs vs. vim or spaces vs. tabs wars.
The funny thing is, I believe in the promise of the semantic web. I recall Tim Berners-Lee declaring the next frontier was not open source but open data and I agree. He even co-founded an institute around it: https://theodi.org/person/sir-tim-berners-lee/
You're mixing in some stuff that isn't really Semantic Web related.
RSS vs. Atom was less about the Semantic Web than a squabble between different XML formats, one very loosely specified, the other more ... well-formed. The Semantic Web did have a small foot in the RSS wars - the very first RSS (RSS 0.9 from Netscape) was RDF-based, and for a short time RSS 1.0 wanted to rebuild RSS on an RDF basis for the extensibility of the Semantic Web - but the later discussions were about the XML variants of RSS and then Atom: whether the spec was adequate, whether it was frozen, and how and whether it should be fixed, etc.
The XHTML discussions were less about elements, in my recollection, than about parsing models. XHTML reformulated HTML as XML, which meant an error model with no error correction but failure on the first error. And XHTML 2 tried to evolve structural elements by not being backward compatible, defining a somewhat different new dialect. The backlash against XHTML was against that; a group sponsored by the browser makers then formed which wanted to evolve the language backwards-compatibly and to standardize the parsing of tag soup → HTML5.
("Semantic elements" were often shorthand for "instead of a dumb div, use the appropriate HTML element". That was more the quest of the Web Standards Project than of the Semantic Web.)
(Slight overlap: How to embed Semantic Web statements has a small relationship with XHTML - RDFa started imho in an XHTML 2 module.)
I somewhat miss that time. All these bloggers with an interest in web standards and how to do them best had their own idealism and the cross blog and W3C discussions were always interesting. Today web standards don't have that publicity and idealism anymore, they seem more like an engineering collaboration of the 2½ big browser makers which get to decide among themselves. Maybe it was always so, but it seemed different at that time.
One of the worst isn't even technical, it's the community. There are some great people in the community but there are also a large number of extremely toxic people that drive people away.
Maybe it's just the subset of the community that I choose to deal with, but the folks on the Jena mailing lists (pre and post Apache) have always been very gracious and helpful in my experience. And Ralph Hodgson, one of the co-founders of Top Quadrant came to a Triangle Java User's Group talk that I once gave on Semantic Web technologies, along with a bunch of other Top Quadrant people... and despite the fact that my company competes with them in certain areas, they were perfectly cordial and pleasant to interact with. Likewise for the other times that I've had Top Quadrant folks show up at events where I was speaking.
Maybe it's just dumb luck on my part, or whatever, but I have found no major issues with toxic people in the SemWeb community. shrug
Databases that are run on a shoestring aren't stable, so we're going to make everything federated with linked data fragments? Fine, give it a go, but you don't need to go on and on about how databases are inadequate because someone isn't willing to foot the AWS bill so they can host DBpedia for ya.
Let's have a go at JSON-LD. RDF/XML is finally recognized as a mistake - a somewhat understandable mistake, because everyone was XML-crazy at the time. So what do we do? The exact same thing, except this time it's JSON. But it's even worse: we choose a serialization that is prized for its simplicity, and we foist the entire RDF stack onto it. Then they claim that JSON-LD isn't about the semantic web, so we're good, and Jedi mind trick it with, "This isn't the RDF you're looking for".
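To make the complaint concrete, here's a minimal hand-rolled sketch in plain Python (not a real JSON-LD processor, which also handles nesting, `@id`, `@type`, etc.) of what a `@context` does: it quietly maps innocent-looking JSON keys onto full IRIs, which is the RDF machinery hiding under the "simple" serialization. The schema.org terms are just illustrative.

```python
import json

# A JSON-LD-style document: plain JSON plus an "@context" that maps
# each short key to a full IRI (schema.org terms used as examples).
doc = {
    "@context": {
        "name": "http://schema.org/name",
        "homepage": "http://schema.org/url",
    },
    "name": "Jane Doe",
    "homepage": "http://example.org/jane",
}

def expand(doc):
    """Naive JSON-LD-style expansion: replace context-mapped keys
    with their full IRIs and drop the context itself."""
    ctx = doc.get("@context", {})
    return {ctx.get(k, k): v for k, v in doc.items() if k != "@context"}

print(json.dumps(expand(doc), indent=2))
```

The same document is simultaneously "just JSON" to one consumer and a graph of IRI-keyed statements to another, which is exactly the dual identity being complained about.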
Because we aren't done overcomplicating simple things, we take aim at CSV with CSVW. Granted, CSV has some subtle complexities, but it's easy and reasonably compact. So now we're going to add metadata to csv files with RDF and then serialize it into JSON as JSON-LD. Great. How do I find this metadata? Either a well-known location or a Link header. Whoops, I can't publish metadata that references your csv file. Let's convert your csv file to RDF instead. WTF, my 500MB csv file just became 1.5B triples and it's taking 8 hrs. to load into my triple store!
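For a sense of where the triple blow-up comes from, here's a toy Python sketch of the basic CSVW idea (a simplified, assumed descriptor shape, not the real CSVW vocabulary): every mapped cell in every row becomes its own triple, so a wide table multiplies fast.

```python
import csv
import io

# Toy CSVW-style descriptor (hypothetical IRIs): a URL template for
# the row subject, plus one RDF property per mapped column.
metadata = {
    "aboutUrl": "http://example.org/person/{id}",
    "columns": {
        "name": "http://schema.org/name",
        "city": "http://schema.org/homeLocation",
    },
}

data = "id,name,city\n1,Ada,London\n2,Alan,Wilmslow\n"

def rows_to_triples(text, meta):
    """Turn each mapped cell into one (subject, property, value) triple."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = meta["aboutUrl"].format(**row)
        for col, prop in meta["columns"].items():
            triples.append((subject, prop, row[col]))
    return triples

triples = rows_to_triples(data, metadata)
print(len(triples))  # 2 rows x 2 mapped columns = 4 triples
```

Scale the same arithmetic up to a 500MB file with a few dozen mapped columns and the "1.5B triples" figure stops sounding like an exaggeration.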
Don't get me started on people who call themselves ontologists. They're really zombies, but instead of eating brains they eat budgets. They should be dispatched the same way, with a shotgun blast to the face. They generally can't justify their decisions even though there is a framework for doing exactly that, OntoClean. I have yet to meet one who even knew what that was. They just convince management that what they're doing is intellectually unattainable by mere developers, although they'd be lost without Protégé, TopBraid, or Excel, and what they produce is generally an incomputable pile of garbage. It's always OWL Full. "Class or property? Class or property? Well, it is an 'is a' relationship."
I'm done writing, so I'll just include a list of the half-baked ideas that sound good but are a day late and a dollar short: LDP, R2RML, ShEx, SHACL, DCAT, RDF Data Cube, WebID.....
My wife always says to say something nice so I'm going to say SKOS. SKOS is ok.
Or maybe contact O'Reilly and write an intro book "Semantic Web: Just the Good Parts" for their series.
On the research side there are two kinds of research papers: the one that proposes an ontology for a domain, and the one that describes the conversion of an existing resource to RDF. I've never seen a paper where SW was used for something new and interesting that would have been impossible without SW.
That being said, there are also both technical and conceptual pain points plaguing RDF. Basically the tech is trying to address too many things: both metadata and data, and every kind of data. The "IRIs that can be URLs that can sometimes be dereferenced and sometimes not, but it's better if they are, and then it's Linked Data" kind of thing makes it hard to assume (and thus build) anything.
So, RDF has been a success in a few domains (biology), but in most cases it doesn't offer a real competitive advantage over simpler and more expressive technologies such as graph databases.
PS: @zcw100 if you were to really write a book about the semantic web, drop me a line please.
Whether you use semantic web tech or not, that's still a common problem that doesn't always have a good plug-and-play solution. There are still a lot of places using the JSON-LD format for metadata and cataloging information. You can google cooking recipes and get ratings and cook times; search for movies and see how highly rated a movie is and who made it, with a synopsis of the plot - all of these are product metadata powered by RDFa or JSON-LD, a relic of the semantic web. It would be incorrect to say the semantic web is dead. Any AI that could effectively use Wikidata as a fact table would be Jeopardy-grade. There are still new tools coming out, like RDFox, that apply first-order logic at multicore speed across huge datasets for reasoning. There is work being done to make it horizontally scalable. I think people will just go on an endless loop of hitting the same pain points and creating new tools using the trending tech of the day, but even in this day and age, sometimes something like Prolog or Picat is what you need.
Isn't that computationally infeasible? Semantic web standards are based on description logics, i.e. multi-modal logics chosen specifically for computational expediency.
Also, I wouldn't describe JSON-LD as a "relic" of anything. It's a fairly recent standard in the grand scheme of things, and many interesting projects these days implicitly rely on it.
[1] For example if you try to figure out if a formula is satisfiable. You can for sure do this using truth tables. The catch is that you're looking at 2^n complexity where n is the number of propositions in your formula.
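The 2^n blow-up is easy to see in code. A brute-force satisfiability check in plain Python simply enumerates every assignment, so each extra proposition doubles the work:

```python
from itertools import product

def satisfiable(formula, variables):
    """Brute-force SAT: try all 2**n truth assignments for the
    n named propositions and see if any of them makes the formula true."""
    return any(
        formula(**dict(zip(variables, bits)))
        for bits in product([False, True], repeat=len(variables))
    )

# (p or q) and (not p or not q): satisfiable, e.g. p=True, q=False.
print(satisfiable(lambda p, q: (p or q) and (not p or not q), ["p", "q"]))  # True

# p and not p: a contradiction, never satisfiable.
print(satisfiable(lambda p: p and not p, ["p"]))  # False
```

Description logics trade expressive power for avoiding exactly this kind of exhaustive search in the common case.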
You can hook in your own reconciliation end point which we do at work to expand internal knowledge graphs.
The basic capabilities work ok, but lots of the additional capabilities have atrophied away.
I really want to look into how this could ingest my own post-GDPR data exports, as well as data sanitization for ML projects.
However, the efforts I've seen seem to be missing some critical factors for longer-term success. I think we've got a lot of work to do with regards to knowledge representation in general.
One of the big things for me is that the context for any fact is critical for it to be true or not.
You can have a fact like "Tim Cook is the CEO of Apple", represented in a graph like you would expect. However, that is only true today. Ten years ago it was Steve Jobs. Without explicit context encoded in the information graph, this web of data isn't as useful as it could be.
Context is important for reasoning in all kinds of situations. "What if Steve Ballmer were CEO of Apple?" is a hypothetical context that it may be useful to reason about. The context of "Who is the most distinguished captain of the Enterprise?" could be the real-world US Navy, or a fictional Star Trek universe (of which there are multiple).
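One common workaround is to widen triples into quads, attaching a validity interval (or a named graph) to each statement. A minimal Python sketch of the temporal case, with invented data and not following any particular standard:

```python
# Facts stored as quintuples with a validity interval, so the same
# statement can be true in one time context and false in another.
facts = [
    # (subject, predicate, object, valid_from, valid_to)
    ("Apple", "ceo", "Steve Jobs", 1997, 2011),
    ("Apple", "ceo", "Tim Cook", 2011, None),  # None = still true
]

def query(subject, predicate, year):
    """Return the objects for which the fact holds in the given year."""
    return [
        o for s, p, o, start, end in facts
        if s == subject and p == predicate
        and start <= year and (end is None or year < end)
    ]

print(query("Apple", "ceo", 2005))  # ['Steve Jobs']
print(query("Apple", "ceo", 2020))  # ['Tim Cook']
```

Plain triples can't carry that extra column, which is why the "Tim Cook is CEO" fact silently rots without some such mechanism bolted on.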
Though you probably don't need Datomic, it would not be too complicated to model this in neo4j or some other graph database that supports arbitrary-sized tuples. Datomic just supports this as a first-class feature.
Almost all the engineering problems cited in the original post are still basically there, but graphical models are still the least painful way of doing this, particularly when trying to share data between institutions. Example: https://linked.art/model/assertion/
This is why much of the hubbub over property graphs puzzles me. If you need a relationship to have its own properties in an RDF graph, just turn it into a class. What's the big deal?
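A minimal sketch of that move in plain Python, with all IRIs hypothetical: the sponsorship relationship becomes a node of a Sponsorship class, and each would-be edge property becomes an ordinary triple hanging off that node.

```python
EX = "http://example.org/"

# Property-graph wish: (:froome) -[sponsoredBy {year: 2013}]-> (:sky)
# RDF triples have no edge properties, so the edge becomes a node.
triples = [
    (EX + "sponsorship1", EX + "type",    EX + "Sponsorship"),
    (EX + "sponsorship1", EX + "athlete", EX + "froome"),
    (EX + "sponsorship1", EX + "sponsor", EX + "sky"),
    (EX + "sponsorship1", EX + "year",    2013),
]

# Who sponsored froome, and when? Find the Sponsorship node first,
# then read its sponsor and year like any other triples.
for s, p, o in triples:
    if p == EX + "athlete" and o == EX + "froome":
        node = s
        sponsor = next(o2 for s2, p2, o2 in triples
                       if s2 == node and p2 == EX + "sponsor")
        year = next(o2 for s2, p2, o2 in triples
                    if s2 == node and p2 == EX + "year")
        print(sponsor, year)  # http://example.org/sky 2013
```

The cost is an extra hop per query and an extra node per relationship, which is the trade the property-graph camp objects to.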
I have also seen a great deal of consultant money, programmer time, sys-admin sweat, and the like focused on these toweringly-designed, completely-unused triple stores, layer upon layer of hot technologies (ever-moving, construction on the tower never ceased) fused together to create a resource-intense monstrosity that, at the end of the day, barely got used. But hey, let's look at that jazz semantic web example one more time.
The most painful part is that I understand the urge to build a gleaming repository for information, where the cool URIs never change; SPARQLing pinnacles, ready to broadcast the Library of Alexandria, glimmer; and the serene manifold of abstract information lies RESTful ... but I have come to understand that the web of today is an endlessly bulldozed mudscape where Someone Very Important has to have that URL top-level yesterday (never mind that they will forget about it tomorrow), of shoddy materials and wildly varying workmanship, and where nobody is listening to your eager endpoints because the commercials are just too loud. I too once labored for information architecture, to have the correct thing in the obvious place, with accurate links and current knowledge, to provide visitors with the knowledge they desired ... but PR preempted all of it to push yet more nice photographs in yet another place: the Web as a technology for distributing images that would once live on glossy pamphlets.
The vision is lovely, but we who have always lived in the castle have walked alone.
Remember when microformats were all the rage, and you could get hReview or hRecipe or XFN data everywhere?
Then every host in turn realized that actually, it's _better_ if people can't scrape your site, and it's even better if they can't even see it and it's behind a login wall.
The semantic data which has actually been implemented on a wide scale happened because someone could go to their boss and say “Spending time on x will mean better Google ranking” or “Facebook will use their new sharing display for our pages”, and it was orders of magnitude simpler to implement so the time and risk were far more palatable.
This is a different outcome than in the commercial setting where the W3C is still imagining people as users of their computer rather than consumers of the services their computers connect to. But it also means that in certain technical domains where e.g. publication results are scaled out to oblivion but the ontologies are regular or made easily negotiable, there can be benefits for researchers.
Same for the semantic web. Show the benefit for the publisher.
schema.org and Wikidata are great resources, and for large companies, using these as a foundation for their own internal knowledge graphs can make sense. This expense is (maybe?) too large for small and medium-sized companies; they would not get enough benefit for the cost.
I worked with Google’s Knowledge Graph as a contractor, and I am still a believer in the technology but I also respect other people’s well founded scepticism.
A couple of months ago I got interested in adding semantic information to my posts so I modified the generator to add some of the common semantic tags. It was an annoying job, since the semantic information pollutes the structure of the html.
Can anyone tell me what the semantic web does for me as a small-time publisher? Is it for search engines? Does it really matter that a book review (for instance, I have a few) is tagged properly?
Yes, in practice it is mostly for bigger fish in the pond to easily identify and steal your content as needed.
For example, Google was using reviews from small competitors' sites in Google Shopping.
In a lot of cases, the information was there to get eyeballs--so this is undesirable.
I guess if you don't really care about the eyeballs it can be "useful" for the big fish to pay most of the cost of serving the fraction of your server response that the end user was looking for...
The markup you added - it depends on what exactly you did. Did you add markup for schema.org? That's in practice solely for Google. The SEO promise there is that Google will make use of the information provided and format some of it nicely, which can lead to more clicks. https://moz.com/learn/seo/serp-features explains that reasonably well. For things like reviews I can imagine it being quite useful.
If the semantic web was better supported, you could have a semantic annotation precisely identifying the books you are reviewing (whether by ISBN edition or otherwise), and reusers of your content (users, search engines or others) would be able to programmatically associate your review with similar content.
In what way? Both the html and the metadata are intended to make your website machine-friendly. You may find the html structure polluted, but crawlers will find it more informative.
> Is it for search engines?
Yes. And Accessibility.
Using semantic HTML means using <article> rather than yet another <div>. What GP is referring to, however, is adding extra information to your HTML detailing what kind of data is in your tags, e.g.:
<p vocab="http://schema.org/" typeof="Person">
<span property="name">Christopher Froome</span> was sponsored by
<span property="sponsor" typeof="http://schema.org/Organization">
<a property="url" href="http://www.skysports.com/">Sky</a></span> in the Tour de France.
</p>
Here, the vocab, typeof and property attributes are used to add semantic information to the HTML. It might also give you an idea of why one might consider that a chore, especially if it doesn't appear to provide any benefit, like making your site accessible to users of screen readers.

I agree with a lot of the problems noted in other posts, and would add two other problems from the authoring side:
1. Identifying and employing sound semantics requires a level of thought and clarity that I don't think most people are habituated to working at. It raises the bar somewhat on who can be contributing (either they have to understand and take care with the semantics, or you need a separate person to handle them?)
2. I may be missing some good tools, but I haven't been able to find a good low-friction semantic authoring experience. Even if you are mentally prepared to write with explicit semantics, it still adds a lot of friction to the writing process (or requires subsequent semantic-edit passes).
As for semantic markup being confusing and usually wrong, I don't know where you get that.