Make the “semantic web” web 3.0 again – with the help of SQLite

(ansiwave.net)

461 points · sekao · 4y ago · 213 comments

155 comments · 45 top-level

dgudkov4y ago· 23 in thread

I have a similar idea with PDF documents. Instead of having the royal PITA of parsing generated PDFs (e.g. invoices), things would be much simpler if every generated PDF came with a built-in SQLite or JSON that contains the structured data of that PDF.

One day I will do it.

Speaking more broadly, whether we talk about HTML or PDF it's the same problem: documents should have two representations - human-friendly and machine-friendly until AI gets so good that only having the human-friendly representation is enough.

DyslexicAtheist4y ago

> ... documents should have two representations - human-friendly and machine-friendly until AI gets so good ...

When I download papers from arXiv I sometimes choose the LaTeX version because it often comes with commented-out ideas that didn't make it into the paper. The author's thought process becomes clear. The metadata helps me understand the whole thing quicker, the same way the semantic web helps the machine.

Perhaps there is a clever philosophical analogy in there somewhere about "us becoming the machine" or "the map becoming the territory", but I can't put my finger on it.

TuringTest4y ago

> Perhaps there is a clever philosophical analogy in there somewhere about "us becoming the machine" or "the map becoming the territory", but I can't put my finger on it.

I personally believe the key is in "information architecture". We have been conveying information as a linear sequence of words for so long that we don't know yet how to best exploit non-linear formats.

Programming languages harness the relation between specific instructions and the structure in which they are embedded; but this structure is oriented towards building a single executable block that controls a machine step by step. We have yet to build tools equivalent to IDEs that exploit the overall structure of knowledge with the goal of understanding a topic at all levels of detail, both at the local flow of ideas and the overall relation between its subtopics.

The first widespread step in that direction was the 1.0 static World Wide Web, and we have learnt a lot thanks to it so we can now improve upon it. I have great hopes in online notebooks and no-code spreadsheet-like tools as the basis of such information-processing environments.

__MatrixMan__4y ago

I have a similar, "One day I will do it" project. The idea is that somebody cares enough to make the semantic web work--it's just not the people with write access to the data.

I think we can use CTPH algorithms to fingerprint the data independently of whatever names are used, and then we can use that to find representations of the "same data" submitted by other users. Probably there would be some reputation stuff involved, a web of trust, etc. The flow would go something like this:

COGNIZE:

0. Encounter messy data in the wild (has pagination, timestamps of access, etc), need other representation (human/computer/whatever)

1. Calculate CTPH fingerprints, use them to search for link: miss

3. Clean data the hard way and publish canonical representation (ipfs?)

4. Generate missing representation the hard way, publish that too

5. Calculate the fingerprints common to both the missing representation and the canonical representation, and publish it as a "link" between the two. Unlike traditional web links, this one is bidirectional.

RECOGNIZE

1. Different user encounters "same" data in the wild

2. Calculate fingerprints, use them to search for canonical representation: hit

3. Find further links to see what other representations of the "same data" are available, download them if desired.

The fingerprint stuff works, but there's a lot of work left to be done re: mapping fuzzy hashes of "in the wild" data to cryptographic ones of "canonical representations" and finding ways to incentivize users to go through the hassle of the "cognize" step so that other users can benefit from the "recognize" step.

Sorry for talking your ear off, it just feels good to know I'm not the only one working on something like this, even if our approaches are quite different (mine works on PDF's only because it works on arbitrary bytes). Good luck with yours.
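
The fingerprint-and-lookup flow above can be sketched in a few lines. This is only a toy stand-in for a real CTPH implementation such as ssdeep: it cuts the input at content-defined boundaries using a rolling byte sum, hashes each chunk, and compares two inputs by the overlap of their chunk hashes. The function names, window size, and boundary mask are all assumptions made for illustration.

```python
import hashlib


def chunk_fingerprints(data: bytes, window: int = 16, mask: int = 0x3F) -> set:
    """Split data at content-defined boundaries and hash each chunk.

    Toy stand-in for CTPH: a boundary is declared wherever the sum of the
    last `window` bytes has all of its low bits (per `mask`) set.
    """
    fingerprints = set()
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # keep the sum over the last `window` bytes
        chunk_is_long_enough = i + 1 - start >= window
        if chunk_is_long_enough and (rolling & mask) == mask:
            fingerprints.add(hashlib.sha256(data[start:i + 1]).hexdigest()[:16])
            start = i + 1
    if start < len(data):
        # Whatever is left after the last boundary is the final chunk
        fingerprints.add(hashlib.sha256(data[start:]).hexdigest()[:16])
    return fingerprints


def similarity(a: bytes, b: bytes) -> float:
    """Jaccard overlap of the two chunk-fingerprint sets (0.0 to 1.0)."""
    fa, fb = chunk_fingerprints(a), chunk_fingerprints(b)
    if not fa and not fb:
        return 1.0
    return len(fa & fb) / len(fa | fb)


# Hypothetical "in the wild" copy versus a cleaned-up copy of the same table
wild = b"page 1 of 3\n" + b"id,name,total\n1,alice,10\n2,bob,20\n" * 20
clean = b"id,name,total\n1,alice,10\n2,bob,20\n" * 20
print(similarity(wild, clean))
```

In the "recognize" step, the fingerprints would be the search keys: any published link whose fingerprint set overlaps enough with the local one is a candidate for the canonical representation.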

melony4y ago

Manual curation was why Berners-Lee's Semantic Web failed. Unlike ARIA annotations, which are required to serve certain clientele, the semantic web offers no inherent value to the developer. There is an entire field of machine learning dedicated to automatically generating knowledge graphs; I would start there rather than trying to manually curate and annotate.

__MatrixMan__4y ago

That's a good point. Once the tooling is there for the manual workflow, I expect that an ML-driven approach will plug in to the same hooks without fuss.

But I don't think it changes the main problem to be solved, which is that it's data consumers who want the semantic web, but so far it's been up to data providers to implement it. We need to be able to create links between data without the participation of whoever hosts it.

wnkrshm4y ago

What you describe has classically been the work of librarians or archivists.

__MatrixMan__4y ago

Ah well I guess I'm looking to make freelance librarian into something you can be.

I figure if you configure the client to keep track of whose annotations you have a history of using, you've got a real granular view of which content providers to pay. I'm imagining a game where we all put $5 in at the start of the month and we all pay each other based on whose content we use the most. Some users will pay more than they make, others will make more than they pay.

manggit4y ago

I think we need to reframe the problem. If we think of it as JSON data that comes with a PDF (similar to what @xmprt suggests "PDFs as checksum") then we have the benefit of machine-readable data that is transportable but also the attached human-readable PDF version of the data.

This is exactly what we are trying to achieve at Anvil: 1. Provide the no-code tools to make it easy to convert existing PDF forms into web forms. 2. Share the web forms with prospective customers instead of PDF forms as email attachments. 3. Generate PDFs as part of the workflow once the data is captured and represented in structured JSON. 4. (Optional) Request certification of the PDF via e-signatures.

The end result is a JSON payload that can be shared via API as well as a static PDF that is stored for human consumption. In most cases, we find that our customers actually just use the PDF as an interface with legacy systems (IRS, Banks, Insurance Companies) that haven't yet figured out how to modernize to a data-first business model.

Of course this really only addresses PDFs that are used for information capture and transfer between two parties. But most PDFs that are not "standardized-forms" are made for consumption by humans not by machines (think ebooks, journal articles, graphics etc), and therefore having a JSON payload of the data attached doesn't really matter.

mpweiher4y ago

The ZUGFeRD standard in Germany does this, though the embedded data is XML.

https://de.wikipedia.org/wiki/ZUGFeRD

My CodeDraw app also puts the source code in PDFs or PNGs that it generates.

daveydave4y ago

I have an application that converts Word documents to RDF conformant with the SPAR ontologies (mainly DoCO http://www.sparontologies.net/ontologies/doco), so things like headers, numbering, and contains/within relationships are explicit in the RDF. I've used it successfully with PDFs by converting to DOCX first. Is this the sort of thing you had in mind? Not here to sell it! I think this is a genuinely interesting unexplored area ..

dgudkov4y ago

The PDF format supports attachments (embedded files). I'm thinking about a set of libraries and/or a command-line utility that would make it trivially easy to attach a SQLite|JSON file to a PDF or extract one from a PDF. This won't fix existing files, of course, but at least for those apps that generate PDFs it will be easier to embed a SQLite/JSON into a generated PDF.

oever4y ago

XMP is meant for adding semantic information to (parts of) PDF files.

https://en.wikipedia.org/wiki/Extensible_Metadata_Platform

jonnydubowsky4y ago

This looks awesome! The decision to combine structural and rhetorical ontologies seems like it strikes the best balance between cost and availability, in the sweet spot of the user's actual requirements when working with research and academic documents.

Is this compatible with User Defined Language?

https://ivan-radic.github.io/udl-documentation/

daveydave4y ago

The RDF output I would typically serialise as Turtle, which I believe there is an existing UDL for in Notepad++, though I don't use it.

fatcow4y ago

There's the EPC QR code standard in Europe which is underused...

https://en.wikipedia.org/wiki/EPC_QR_code

biztos4y ago

I like this, and of course you could also embed the text of the document. Nothing stops us from doing this right now.

But: don't we need some way to prove that the data matches what is visually rendered in the PDF reader?

And if we can prove that the embedded data matches the rendered document, couldn't that same logic just be used in reverse to generate the structured data from the renderable PDF?

xmprt4y ago

That's not necessary. If you think of the PDF as a checksum then it's possible to have a one way function that generates the PDF (checksum) but that you can't retrieve the original JSON from.

I do really like the idea of having a checksum of some sort if we end up embedding metadata like this.
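
A minimal sketch of the verification step, assuming the generator is deterministic. The `render` function here is a hypothetical stand-in for a real PDF generator (and, unlike the one-way function described above, it is trivially reversible); only the re-render-and-compare logic is the point.

```python
import hashlib
import json


def render(data: dict) -> bytes:
    """Hypothetical stand-in for a deterministic PDF generator.

    A real generator would emit PDF bytes. The only property relied on
    here is that the same data always yields the same bytes.
    """
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return b"FAKE-PDF\n" + canonical.encode("utf-8")


def document_matches(data: dict, document: bytes) -> bool:
    """Re-render the structured data and compare digests with the file."""
    expected = hashlib.sha256(render(data)).hexdigest()
    actual = hashlib.sha256(document).hexdigest()
    return expected == actual


invoice = {"number": "INV-001", "total": "119.00", "currency": "EUR"}
document = render(invoice)

print(document_matches(invoice, document))            # file as generated
print(document_matches(invoice, document + b"note"))  # file modified after generation
```

Any byte-level change to the file, including a harmless annotation, makes the comparison fail.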

biztos4y ago

That's a good idea: the tool that processes the data can just run your function and if the file doesn't match the result then it's rejected.

But in the real world people are going to want to annotate the PDFs, there will be open-and-save cycles that add metadata and break the checksum, etc and so on, even without considering malicious actors. Restricting all that is maybe easy -- just reject anything that doesn't match the checksum, done -- but communicating that restriction to the users without making a mess of it is probably hard.

Long ago I wrote some PDF generating programs and it was a lot of fun, but the spec has evolved in ways I imagine would make it less fun today. Still, could be a cool thing, and I'd be surprised if someone hasn't already done a version of it somewhere.

[Edit: plus, whoever creates the one-way function is deciding what all the PDFs are going to look like, which means you will end up with many such functions to accommodate the different rendering goals, and then each validator needs to know each one and someone decides which ones to trust, and so on...]

q-base4y ago

That would be necessary. It should probably be baked into the creation of the file and perhaps even be included in the footer of the document itself.

ttyprintk4y ago

If you want to start with the most controllable representation of a piece of paper, consider that OpenOffice (word processing and drawing) embeds its structured file format into PDFs. Maybe those are the PDFs we scrape first, leaving the bag-of-jpegs for later.

airstrike4y ago

> Speaking more broadly, whether we talk about HTML or PDF it's the same problem: documents should have two representations

Wait until you realize this applies to business presentations (Excel and PPT-authored-PDFs)

visarga4y ago

ML based invoice processing quality is around 90%, so human in the loop is still needed.

rokku4y ago

In Germany we have something like ZUGFeRD (in German it means carthorse). It is a PDF with an XML file for parsing.

luhn4y ago· 15 in thread

The author seems to assume that everybody is using SQLite, but SQLite for a production database is an extremely niche choice. Attempting to expose more popular options like PostgreSQL or MySQL as SQLite would be extremely difficult because SQLite only supports a subset of SQL, whereas PostgreSQL and MySQL both implement their unique superset (for the most part) of SQL.

But it doesn't matter. The API doesn't matter. Web 3.0 was never about APIs, it was about data. A standardized API is only useful if it outputs standardized data. Having a bunch of bespoke SQLite tables scattered across the web gets us no closer to the ideal of Web 3.0.

simonw4y ago

You may be underestimating SQLite here. It has grown a LOT of features over the past decade, many of which were directly inspired by PostgreSQL.

Features you may have missed:

- CTEs - the WITH statement works great. And recursive CTEs, which can do mandelbrot fractals! https://www.sqlite.org/lang_with.html

- Surprisingly good built-in full text search: https://www.sqlite.org/fts5.html

- Functions for directly querying JSON data in columns: https://www.sqlite.org/json1.html

- Extremely comprehensive geospatial capabilities thanks to the SpatiaLite extension - this has a huge amount of functionality, which I think is better than MySQL though not yet as good as PostGIS: https://www.gaia-gis.it/fossil/libspatialite/index

I'm not saying it's as "good" as PostgreSQL, but I don't think your argument that PostgreSQL and MySQL implement a substantially larger portion of SQL holds up particularly well.
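
For readers who haven't tried the first item on that list, here is a small sketch of a recursive CTE run through Python's built-in `sqlite3` module. The CTE name `counter` and its columns are made up for the example; it generates the integers 1 to 10 together with a running total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Recursive CTE: each step adds one row derived from the previous row
rows = conn.execute(
    """
    WITH RECURSIVE counter(n, total) AS (
        SELECT 1, 1
        UNION ALL
        SELECT n + 1, total + n + 1 FROM counter WHERE n < 10
    )
    SELECT n, total FROM counter
    """
).fetchall()

for n, total in rows:
    print(n, total)
```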

capiki4y ago

As luhn said, it’s more about a standard data format than a db choice. If every client has to figure out what schema a website uses for a recipe, let’s say, then Web 3.0 is still unrealistic.

Schema.org exists, but all websites adopting it seems unlikely.

That being said, I can maybe see a world in which one company adopts schema.org schemas and the rest have to follow suit to be competitive in that particular domain.

zozbot2344y ago

> Schema.org exists, but all websites adopting it seems unlikely.

Schema.org has the backing of major search engines and other reusers of Web-served content. It's way more likely to be adopted compared to anything else in its general domain.

Closi4y ago

SQLite is the most used database engine in the world, so I wouldn't call it niche. In fact, by some estimates, it is probably used more than all other database engines combined.

The only difference is that it is usually run locally (compared to Postgres and your other examples), but something doesn't have to run remotely to be considered running in production :)

luhn4y ago

Yes, when I said "production database" I meant a database for a web application. My iPhone running SQLite doesn't relate to Web 3.0.

scotty794y ago

Exposing your whole database (or the subset the user is allowed to see) as GraphQL is way better, as it is engine agnostic.

zozbot2344y ago

GraphQL is not a standard, it's just a technology for building custom APIs which are far from "agnostic" in practice. You can use SPARQL if interoperability is your goal.

sekaoOP4y ago

My point was not that people are using SQLite in prod everywhere; read that paragraph in more of a speculative voice, not as a statement of fact about the present. At any rate, I do think the range request technique makes SQLite more practical to use in database-driven apps that normally would've opted for a traditional db like Postgres (though there is more work to be done to make this technique fast when doing complex queries...lots of joins are no bueno right now).
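
To make the range request idea a bit more concrete: a SQLite file is a sequence of fixed-size pages, and the page size is stored as a big-endian 16-bit value at byte offset 16 of the file header, so a client can work out which byte range holds a given page. The sketch below only does that arithmetic on a local throwaway file; the helper names are hypothetical, and the actual HTTP fetching and the virtual file system that would issue these requests are left out.

```python
import os
import sqlite3
import struct
import tempfile


def read_page_size(path: str) -> int:
    """Read the page size from the 100-byte SQLite file header."""
    with open(path, "rb") as f:
        header = f.read(100)
    if header[:16] != b"SQLite format 3\x00":
        raise ValueError("not a SQLite database file")
    (size,) = struct.unpack(">H", header[16:18])
    return 65536 if size == 1 else size  # the value 1 encodes 65536


def range_header_for_page(page_number: int, page_size: int) -> str:
    """Build the HTTP Range header value covering one 1-indexed page."""
    start = (page_number - 1) * page_size
    end = start + page_size - 1
    return f"bytes={start}-{end}"


# Build a small throwaway database to inspect
tmpdir = tempfile.mkdtemp()
db_path = os.path.join(tmpdir, "demo.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO post (body) VALUES ('hello')")
conn.commit()
conn.close()

page_size = read_page_size(db_path)
print(page_size, range_header_for_page(2, page_size))
```

A query that touches many pages scattered across the file turns into many such requests, which is one plausible reason join-heavy queries are slow with this approach.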

quinnjh4y ago

I enjoyed the speculative bias there, as it's prompting me to pick up the SQLite bundle that's been sitting in my downloads while I use post-gres-like-the-rest.

DangitBobby4y ago

Well, having to write unique SQL per site is much better than having to write unique scrapers per site.

RileyJames4y ago

Exactly. And there’s really no reason those queries couldn’t be made public / collaboratively maintained.

You could probably take it one step further and define an OpenAPI spec which is populated via those queries. Tho that would require an intermediary / post-processing, likely with a cache.

Regardless, the capability to determine how and what to consume sits with the consumer (developer) from the outset. Rather than having to scrape the data, normalise it into some form of schema, and then build an api / interface around it. And then worry about keeping it up to date.
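
A rough sketch of what "publicly maintained queries" could look like: one query per site, each mapping that site's own schema onto a shared set of output columns. The two sites, their table layouts, and the `QUERIES` registry are invented for the example.

```python
import sqlite3

# Two hypothetical sites that publish recipes with different schemas
site_a = sqlite3.connect(":memory:")
site_a.execute("CREATE TABLE recipes (title TEXT, minutes INTEGER)")
site_a.execute("INSERT INTO recipes VALUES ('Pancakes', 20)")

site_b = sqlite3.connect(":memory:")
site_b.execute("CREATE TABLE dish (name TEXT, prep_seconds INTEGER)")
site_b.execute("INSERT INTO dish VALUES ('Waffles', 1500)")

# Community-maintained mapping: one query per site, same output columns
QUERIES = {
    "site_a": "SELECT title AS name, minutes AS prep_minutes FROM recipes",
    "site_b": "SELECT name, prep_seconds / 60 AS prep_minutes FROM dish",
}


def fetch_recipes(site: str, conn: sqlite3.Connection) -> list:
    """Run the registered query for a site and return plain dicts."""
    conn.row_factory = sqlite3.Row
    return [dict(row) for row in conn.execute(QUERIES[site])]


recipes = fetch_recipes("site_a", site_a) + fetch_recipes("site_b", site_b)
print(recipes)
```

When a site changes its tables, only its entry in the registry needs updating; consumers keep reading the same `name` and `prep_minutes` columns.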

contravariant4y ago

Wouldn't you want to use a subset of SQL in an API? Why use a unique superset that differs per webpage?

jumpkick4y ago

I thought Web 3.0 was about NFTs.

dspillett4y ago

People are talking that way now. Before what is being called “Web3” at the moment, Web 3.0 was to be the “semantic web” (where Web 2.0 was the “interactive web”, with many things becoming read/write instead of read-only, and greater interactivity, both in terms of social interactivity and individual interactive apps being web based, being the focus of technological enhancements).

This article is talking about that earlier definition, and a way it might once again be the definition, perhaps relegating Web3 to Web 4.0 (or web we-worked-out-it-was-a-ponzi-scheme-so-just-stopped with something else being Web 4.0, if you take the more cynical view).

tyingq4y ago

"Before the term was hijacked by crypto-grifters (and, admittedly, a few genuinely neat projects), web 3 (point oh) referred to Tim Berners-Lee's project to promote a standard way to expose and parse metadata on the web."

galaxyLogic4y ago· 13 in thread

SQLite is a relational database. Wouldn't it be a better fit to use a graph-database as the backend for anything "web"?

The idea is good, that a web page should be generated from some data somewhere. But the "web" is not so much about a single document as about the links between documents, which allow you to represent a "semantic net". The data should be about the links between them. Now where is such a database? And how can it be "sharded" into multiple databases running in thousands of locations on the internet?

Groxx4y ago

What is a graph database? A miserable little pile of joins.

Though to be serious: what do you expect a graph database to provide that sqlite cannot / does not do efficiently?

wheels4y ago

You can reasonably model graphs in a column-oriented database, but traditional SQL models tend to be horrible performance-wise, because most graph algorithms need fast traversal of edges, and doing a lot of recursive lookups in SQL is impractically slow. It's not that you can't model it, it's just that performance is terrible. For a graph database to be efficient, you need a high degree of locality for edge information (ideally a vector you can simply read out).

(Note: I've actually written a graph database from scratch, for exactly these reasons.)

simonw4y ago

SQLite may be uniquely well suited to building graph databases out of all of the relational engines, thanks to this: https://www.sqlite.org/np1queryprob.html

ahevia4y ago

Do you have examples of graph DBs that could be embedded in a static site as easily as SQLite?

oever4y ago

Modelling transitive edges can be done in SQLite with recursive common table expressions. The performance for that is probably worse than for graph databases, but SQLite has many other advantages over graph databases.
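
As an illustration of the recursive CTE approach, here is a small sketch that finds every node reachable from a starting node. The `edge` table and the `reachable` helper are assumptions for the example. Using `UNION` rather than `UNION ALL` discards rows already seen, which is what keeps the recursion from looping forever on cycles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO edge VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("x", "y")],
)


def reachable(conn: sqlite3.Connection, start: str) -> list:
    """Return all nodes reachable from `start` by following edges."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT ?
            UNION
            SELECT edge.dst FROM edge JOIN reach ON edge.src = reach.node
        )
        SELECT node FROM reach WHERE node <> ? ORDER BY node
        """,
        (start, start),
    ).fetchall()
    return [row[0] for row in rows]


print(reachable(conn, "a"))
```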

j-pb4y ago

A whole pile of joins, that naturally arises if you want to combine data from a ton of domains.

Triple stores are essentially relational databases in 6th normal form. But relational databases like SQLite don't have good join algorithms to deal with this (they do pairwise joins instead of worst-case optimal ones like Leapfrog Triejoin or Tetris). They also lack good interfaces for so many joins; you want something more declarative like Datalog/SPARQL/GraphQL rather than explicitly writing out every join.
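
To show what "a whole pile of joins" means in practice, here is a tiny triple table in SQLite. The table layout and the sample facts are made up. A two-pattern question already needs one self-join, and every additional pattern adds another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triple (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triple VALUES (?, ?, ?)",
    [
        ("alice", "knows", "bob"),
        ("bob", "knows", "carol"),
        ("bob", "lives_in", "Berlin"),
        ("carol", "lives_in", "Paris"),
    ],
)

# "Where do the people Alice knows live?"
# As graph patterns: alice knows ?person . ?person lives_in ?city
rows = conn.execute(
    """
    SELECT t1.o AS person, t2.o AS city
    FROM triple AS t1
    JOIN triple AS t2 ON t2.s = t1.o
    WHERE t1.s = 'alice' AND t1.p = 'knows' AND t2.p = 'lives_in'
    """
).fetchall()

print(rows)
```

In a declarative graph query language the two patterns in the comment would be the whole query; in SQL each one becomes an aliased copy of the same table.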

patates4y ago

Treating links as an entity. In a RDBMS, it is not possible to map a bidirectional relation semantically. It needs to have a direction (you can define it in both directions but then you have 2 relations). You then also need to duplicate link properties (in the join table) or normalize to yet another join table.

That has been my pet peeve for a while, and can make it hard to define navigation.

Totally not a deal-breaker though. I'd still use sqlite because I ### love it :)

Groxx4y ago

why would a direction be necessary?

    relation_id │ relatee_id
    ────────────┼───────────
              1 │ x
              1 │ y

voila, directionless relationships. you can even do N-way, just insert more rows with the same relation-id.
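
A runnable version of that table, as a sketch: the table name `relation_member` and the `related_to` helper are assumptions, while the two columns follow the layout above. Membership rows carry no direction, so the same self-join answers "who is related to x?" from either side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE relation_member (
        relation_id INTEGER,
        relatee_id  TEXT,
        PRIMARY KEY (relation_id, relatee_id)
    )
    """
)
conn.executemany(
    "INSERT INTO relation_member VALUES (?, ?)",
    [(1, "x"), (1, "y"), (2, "x"), (2, "z"), (2, "w")],  # relation 2 is 3-way
)


def related_to(conn: sqlite3.Connection, relatee_id: str) -> list:
    """Everything that shares at least one relation with `relatee_id`."""
    rows = conn.execute(
        """
        SELECT DISTINCT other.relatee_id
        FROM relation_member AS me
        JOIN relation_member AS other
          ON other.relation_id = me.relation_id
         AND other.relatee_id <> me.relatee_id
        WHERE me.relatee_id = ?
        ORDER BY other.relatee_id
        """,
        (relatee_id,),
    ).fetchall()
    return [row[0] for row in rows]


print(related_to(conn, "x"))
print(related_to(conn, "y"))
```

Properties of the link itself would live in a separate table keyed by `relation_id`, so they are stored once rather than per direction.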

sktrdie4y ago

Take a look at Linked Data Fragments https://linkeddatafragments.org/

galaxyLogic4y ago

Interesting. That looks much like what the proposed idea (using SQLite) is. Use both clients and servers. No?

jinjin24y ago

It would be interesting to see it implemented on an object database like Realm, rather than being based on SQL. Seems like it would be a much better fit.

TuringTest4y ago

Maybe some graph-like layer over SQLite like graphlite would help?

https://graphlite.readthedocs.io/en/latest/

mikestaub4y ago

OriginTrail.io is using Arangodb.com

0xbadcafebee4y ago· 11 in thread

Among the 30-odd technologies that make up the Semantic Web[1] (it never died, it's just a collection of tech, lots of organizations use it daily) are graph databases[2]. Graph databases are necessary to implement semantic web databases.

SQLite is not a graph database. Even if you used SQLite to implement a graph database, it would not solve any significant problems of the semantic web, such as access to data, taxonomies, ontologies, lexicons, tagging, user interfaces to semantic data management, etc.

It's a really odd suggestion that you would just copy around a database or leave it on the internet for people to copy from. For the BBS mentioned here, that might actually be illegal, as it might contain PII, and on other sites possibly PHI. Many countries now have laws that require user data to remain in-country. Besides the challenges of just organizing data semantically, there still needs to be work done on data security controls to prevent leaking sensitive information.

The funny thing is, that isn't even hard to do with the semantic web. You classify the data that needs protecting and build functions and queries to match. You can tie that data to a unique ID so that people can "own" their data wherever it goes, and sign it with a user's digital certificate which can also expire.

But all of that (afaik) doesn't exist yet. Everyone is more concerned with blockchains and SQL, either because the fancy new tech is sexier, or the old boring tech doesn't require any work to implement. The Semantic Web never caught on because it's really fucking hard to get right. No companies are investing in making it easier. Maybe in 20 years somebody will get bored enough over a holiday to make a simple website creation tool that implicitly creates semantic web sites that are easy to reason about. It'll probably be a WordPress plugin.

[1] https://en.wikipedia.org/wiki/Semantic_Web [2] https://graphdb.ontotext.com/documentation/enterprise/introd...

zozbot2344y ago

> Graph databases are necessary to implement semantic web databases.

This just isn't true, on multiple levels. RDF is an interoperability standard that does not per se depend on a 'graph-like' data model - you can very much expose plain old relational data via RDF, and this is quite intended. Additionally, modern general-purpose RDBMS's support graph-focused data models quite well, despite being built on 'relational' principles - there's no need for special tech when working with general-purpose graph models, unless you're doing some sort of heavy-duty network analytics.
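
As a rough illustration of exposing relational data as RDF, the sketch below turns each row of a table into N-Triples lines: the row becomes a subject IRI and each column becomes a predicate. The `http://example.org/` namespace, the IRI layout, and the helper names are assumptions for the example; every value is emitted as a plain string literal, and this makes no claim to follow any particular W3C mapping exactly.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO person VALUES (1, 'Alice', 'Berlin')")


def escape_literal(value) -> str:
    """Escape backslashes, quotes, and newlines for an N-Triples literal."""
    text = str(value)
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def table_to_ntriples(conn: sqlite3.Connection, table: str, key: str) -> list:
    """Emit one triple per non-null, non-key column of every row.

    `table` is interpolated into the SQL, so it must come from trusted code.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    lines = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{table}/{record[key]}>"
        for column, value in record.items():
            if column == key or value is None:
                continue
            predicate = f"<{BASE}{table}#{column}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


for line in table_to_ntriples(conn, "person", "id"):
    print(line)
```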

0xbadcafebee4y ago

You're talking about extending a database design created 50 years ago to work with models and methods that involve significantly different operations and concepts. Let the RDBMS die so we can make something that is much more powerful and requires less fidgeting and squinting to work the way we want.

RDBMS were a niche research project for a decade before they started to catch on in business apps. They've stayed around forever because they're just functional enough to be dangerous. But we've already hit the upper limits of both reliability and performance years ago (remember NoSQL?) and we just keep bolting on features because nobody wants to leave them. The old designs and implementations are holding us back.

smarx0074y ago

RDF is a labeled multigraph data model with URI-based predicates as edge labels, where each triple represents an edge. You are right that relational data can be exposed in RDF, just like CSV can be loaded into a graph DB.

sekaoOP4y ago

> Graph databases are necessary to implement semantic web databases.

The online docs (and TBL himself) rarely mention graph databases, but obviously the idea is tied tightly to RDF. Separating it from that implementation detail is part of the point, though. Getting people to represent their data via an additional format was never going to work.

> For the BBS mentioned here, that might actually be illegal, as it might contain PII

Can't imagine the purpose you had in even making this point. In theory, any arbitrary database exposed publicly could be illegal to replicate due to copyright, PII laws, etc. But that has nothing at all to do with a technical discussion of a technique for exposing data. What a bizarre point to make.

As an aside, I'm glad you removed the "Uh........." from the beginning of your post. We're all making an effort to reduce the typical HN snark in the comments, and there's always room for improvement :D

smarx0074y ago

There is a wiki extension to embed RDF data: https://www.mediawiki.org/wiki/Extension:LinkedWiki

There is a WP plugin to expose some basic data: https://wordpress.org/plugins/wp-linked-data/ and a Swedish startup Metasolutions has Wordpress plugins for embedding any kind of RDF information: https://docs.entryscape.com/en/blocks/

simonw4y ago

SQLite could actually make a really good basis for building a graph database, thanks to "Many Small Queries Are Efficient In SQLite": https://www.sqlite.org/np1queryprob.html

I took advantage of that for my datasette-graphql plugin - it's not a graph database, but it does allow deeply nested graph-like queries that take advantage of SQLite's fast small query performance: https://datasette.io/plugins/datasette-graphql
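
The pattern that page describes looks roughly like the sketch below: one query for the parent rows, then one small query per parent. The tables and the helper name are made up. With a client/server database each inner query would cost a network round trip; with SQLite it is an in-process function call, which is the argument for why this style is acceptable here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE post (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)"
)
conn.executemany("INSERT INTO author VALUES (?, ?)", [(1, "ann"), (2, "ben")])
conn.executemany(
    "INSERT INTO post VALUES (?, ?, ?)",
    [(1, 1, "first"), (2, 1, "second"), (3, 2, "third")],
)


def authors_with_posts(conn: sqlite3.Connection) -> list:
    """Build a nested result with one small query per author (N+1 style)."""
    result = []
    authors = conn.execute("SELECT id, name FROM author ORDER BY id").fetchall()
    for author_id, name in authors:
        titles = [
            title
            for (title,) in conn.execute(
                "SELECT title FROM post WHERE author_id = ? ORDER BY id",
                (author_id,),
            )
        ]
        result.append({"name": name, "posts": titles})
    return result


print(authors_with_posts(conn))
```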

Karrot_Kream4y ago

The semantic web failed to become widely popular because:

1. Graph databases on top of triple stores are a lot less scalable than relational databases or key-value stores, and this is how semantic data is meant to be stored/queried.

2. Data is valuable. Handing out data for free in a machine-consumable way is both expensive (machines can request data much more quickly than a human) and a recipe for copycats. The incentives just aren't there.

TBL's Solid project is about trying to separate semantic data providers from the presentation layer and opening up the possibility of payment from these data providers to try to improve the incentives around semantic data sharing.

nescioquid4y ago

> The Semantic Web never caught on because it's really fucking hard to get right. No companies are investing in making it easier.

I really appreciate this point. I had the opportunity to work on an exploratory project with an experienced ontologist (yes, you really need one of those, I think). The tools were fascinating (reasoners quickly became necessary) but I had the feeling that many of these tools were at a comparatively early stage of maturity.

Trying to explain to people how the system would work was a challenge as it required a primer on theory and application -- we glazed many eyes. The CTO wanted to know if we could use blockchain somehow. Another group addressed a slice of the problem with technologies already in use and that decided the matter.

zozbot2344y ago

> reasoners quickly became necessary

Ouch. Most uses of reasoners/inference are quite computationally-intensive, to the point of making "reasoning" quickly infeasible. But if you really want, you can do all this stuff in traditional databases by defining appropriate 'views' and having your application query them. You could even use custom database triggers to enable inserts/updates on views.
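
A minimal sketch of the views-and-triggers approach, using SQLite for consistency with the rest of the thread. The family-relation schema is invented for the example. The "inferred" relation is a view, so it is computed by the query planner on demand, and an `INSTEAD OF` trigger makes a second view writable by redirecting inserts to the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE parent_of (parent TEXT, child TEXT);

    -- Derived relation: the "inference rule" is just a stored query
    CREATE VIEW grandparent_of AS
        SELECT p1.parent AS grandparent, p2.child AS grandchild
        FROM parent_of AS p1
        JOIN parent_of AS p2 ON p2.parent = p1.child;

    -- The same facts presented from the child's point of view
    CREATE VIEW child_of AS
        SELECT child, parent FROM parent_of;

    -- Make that view writable by redirecting inserts to the base table
    CREATE TRIGGER child_of_insert INSTEAD OF INSERT ON child_of
    BEGIN
        INSERT INTO parent_of (parent, child) VALUES (NEW.parent, NEW.child);
    END;
    """
)

conn.execute("INSERT INTO parent_of VALUES ('ada', 'bea')")
conn.execute("INSERT INTO child_of (child, parent) VALUES ('cat', 'bea')")

rows = conn.execute("SELECT grandparent, grandchild FROM grandparent_of").fetchall()
print(rows)
```

This covers simple rule-like inferences only; it is not a substitute for a general-purpose reasoner.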

hobofan4y ago

Really depends on the reasoner in use. I'd really take most current public benchmarks on reasoner performance with a grain of salt, as most implementations out there are mostly academic-grade non-production systems.

E.g. Stardog does most of their reasoning via query rewriting (and also lean on some restrictions). That way you can leverage DBs to do what they are good at. If you can then on top of that build some clever caching or incremental computation, you should be fine for even pretty huge dataset sizes.

NetOpWibby4y ago

Thanks for the links!

togaen4y ago· 6 in thread

Terrible idea. Why would anyone want to deal with interfacing a bunch of randomly structured databases whose tables can change at any time without warning. Nightmare.

luckystarr4y ago

Yes, it's still terrible for the consumer of the data. But I like it not because of that.

The positive thing I feel when reading about this is that it dramatically lowers the barrier for the producer of the data to expose it in a meaningful way. Previously it was necessary to think about the format and write code to expose the data, whereas now it's possible to just throw the data over a wall.

You could use a framework to automate the first thing, but this would be specific to one programming language, while the second approach works with all languages. So it lowers the total effort to get to the goal, effectively side-stepping the "have to implement framework or serialization code" issue.

Warning: heavy speculation below

So if more people would build sites using this technique, the pressure for better tools (at a higher level than right now) for consumers would increase, so these would be built by someone. As you have a proper standard (there is only one SQLite) you would have a new "ecosystem" growing. This would lower the pain for the consumers of said data. You'd still have to implement it in every programming language that wants to access the data, but this is another problem.

scotty794y ago

Not when compared to the alternative of accessing a random, misdocumented, randomly limited, arbitrarily formatted subset of that which slowly bit-rots.

FridgeSeal4y ago

As opposed to a bunch of websites serving an archaic, poorly formatted blob of text, the “correct” parsing of which has now become _so complicated_ that it’s basically infeasible for anyone not willing to build a whole web browser?

cxr4y ago

Parsing is not the hard part of dealing with the Web platform.

webmaven4y ago

> Terrible idea. Why would anyone want to deal with interfacing a bunch of randomly structured databases whose tables can change at any time without warning. Nightmare.

It isn't quite so bad. You can have wiki-esque volunteer-driven cooperative authoring, linking to known good versions, etc. to keep it from becoming a complete free-for-all.

mrpf1ster4y ago

I disagree, the alternative is either no access to website data or access through a tightly controlled API (which can come with the same problems if API compatibility is not guaranteed).

echelon4y ago· 5 in thread

What a lot of folks don't realize is that the Semantic Web was poised to be a P2P and distributed web. Your forum post would be marked up in a schema that other client-side "forum software" could import and understand. You could sign your comments, share them, grow your network in a distributed fashion. For all kinds of applications. Save recipes in a catalog, aggregate contacts, you name it.

Ontologies were centrally published (and had URLs when not - "URIs/URNs are cool"), so it was easy to understand data models. The entity name was the location was the definition. Ridiculously clever.

Furthermore, HTML was headed back to its "markup" / "document" roots. It focused around meaning and information conveyance, where applications could be layered on top. Almost more like JSON, but universally accessible and non-proprietary, and with a built in UI for structured traversal.

Remember CSS Zen Garden? That was from a time where documents were treated as information, not thick web applications, and the CSS and Javascript were an ethereal cloak. The Semantic Web folks concurrently worked on making it so that HTML wasn't just "a soup of tags for layout", so that it wasn't just browsers that would understand and present it. RSS was one such first step. People were starting to mark up a lot of other things. Authorship and consumption tools were starting to arise.

The reason this grand utopia didn't happen was that this wave of innovation coincided with the rise of VC-fueled tech startups. Google, Facebook. The walled gardens. As more people got on the internet (it was previously just us nerds running Linux, IRC, and Bittorrent), focus shifted and concentrated into the platforms. Due to the ease of Facebook and the fact that your non-tech friends were there, people not only stopped publishing, but they stopped innovating in this space entirely. There are a few holdouts, but it's nothing like it once was. (No claims of "you can still do this" will bring back the palpable energy of that day.)

Google later delivered HTML5, which "saved us" from XHTML's strictness. Unfortunately this also strongly deemphasized the semantic layer and made people think of HTML as more of a GUI / Application design language. If we'd exchanged schemas and semantic data instead, we could have written desktop apps and sharable browser extensions to parse the documents. Natively save, bookmark, index, and share. But now we have SPAs and React.

It's also worth mentioning that semantic data would have made the search problem easier and more accessible. If you could trust the author (through signing), then you could quickly build a searchable database of facts and articles. There was benefit for Google in having this problem remain hard. Only they had the infrastructure and wherewithal to deal with the unstructured mess and web of spammers. And there's a lot of money in that moat.

In abandoning the Semantic Web, we found a local optimum. It worked out great for a handful of billionaires and many, many shareholders and early engineers. It was indeed faster and easier to build for the more constrained sandboxiness of platforms, and it probably got more people online faster. But it's a far less robust system that falls well short of the vision we once had.

netcan4y ago

To add some fluff to this:

At one point twitter seemed to want to be a relatively general protocol, where users could build their own UI, use 3rd party apps and maybe even interoperate or extend with other social networks & such.

Pg/yc even wrote about it, inviting startups to start writing apps for this exciting new protocol. The early app ecosystem was pretty slimy, with a lot of spam-ish clients for promoting snake oil. More importantly, it became clear that controlling the UI means control over users: the data, the rights, and often the ability to decide what goes into people's feed. That's where the (financial) value is, and they're not going to give that up.

TBL's ideas were naive perhaps, but he did have his thumb in the right place. Something like semantic web was necessary, in order to avoid the centralisation that did end up happening.

RSS, via podcasting did catch on. Today it's one of the only "free" media forms. There's no company moderating podcasts like twitter, FB, youtube, etc.

zozbot2344y ago

There's a standard XML serialization of HTML5 that supports all the features previously associated with XHTML. Additionally, RDF data can be exchanged as JSON via JSON-LD. There's no reason why a typical SPA app could not be built to query RDF-serving endpoints.

"Marking up forum posts" is something that's getting quite a bit of traction nowadays via specifications like ActivityStreams (with its "push" extension ActivityPub now powering the 'Fediverse') and WebMention.

hobofan4y ago

> The entity name was the location was the definition.

While that concept sounds cool in theory, in practice it was and is a disaster. In combination with the high degree of centralization and the lack of versioning mechanisms, you have to trust the publisher to not alter the semantics, and also hope that they stay online forever or your semantics vanish.

When I first learned about the semantic web, I was very hyped on it, but that quickly subsided once I tried actually querying the ontologies and having to see that most of them yield a 404.

I'm still very hopeful for semantic data (and happy to be able to work on a product leveraging it), but I think for an open semantic web there is a lot of work that needs to go into tooling to make it succeed.

mftb4y ago

I agree with pretty much everything you said, except the part about the "VC-fueled startups". Google and fb were once startups, they were just earlier and Google in particular was smart enough to see the future. As part of a multi-faceted effort (including for instance, Chrome and gmail), they saw the need to head off the Web 3.0 standards, delivering us instead the web we have today. I wish I could have seen things as clearly then.

In the end though I'm not sure it ever would have been any different. People want it "now" and they want it "convenient".

NetOpWibby4y ago

Wow, I had no idea, bookmarking your comment.

fleddr4y ago· 4 in thread

The semantic web is not a technical problem, it's an incentive problem.

RSS can be considered a primitive separation of data and UI, yet was killed everywhere. When you hand over your data to the world, you lose all control of it. Monetization becomes impossible and you leave the door wide open for any competitor to destroy you.

That pretty much limits the idea to the "common goods" like Wikipedia and perhaps the academic world.

Even something silly as a semantic recipe for cooking is controversial. Somebody built a recipe scraping app and got a massive backlash from food bloggers. Their ad-infested 7000 word lectures intermixed with a recipe is their business model.

Unfortunately, we have very little common good data, that is free from personal or commercial interests. You can think of a million formats and databases but it won't take off without the right incentives.

OliverJones4y ago

> The semantic web is not a technical problem, it's an incentive problem.

True. Demonstrable in the health-care IT world. Think of electronic health records. My personal portable electronic health record would either be a bunch of images of scrawled notes and maybe some nice medical images (= nonsemantic web), or it would be in a highly wrought format, I dunno, XML or something, with carefully worked out schemata for everything from flu shot records to heart transplants (= semantic web).

Back in 2007-2010, "electronic health records" (EHR) were spottily and sloppily implemented by some providers. But, in the US, a federal law pushed more widespread implementation. Now my online EHR, and yours, is decidedly app-mediated and non-semantic, on a web site portal. Export to JSON? Hah. No.

The hospitals and health care systems only did it because of incentives.

I happened to work at a B2B SaaS company focused on making connections between hospitals and rehab/skilled nursing providers. A rehab outfit can't decide to accept a patient without seeing her medical records and doctors' orders. So our customers had a real incentive to be able to share records. It worked. But the data we had access to (go read about HL7) was not even close to semantic. And our SQL database schemas were, umm, quinquiremes of Nineveh, really intricate, somewhat brittle. Let's leave privacy issues out of the conversation for a moment. Publishing the schema and accepting random queries would help NOBODY except some partner outfit willing to develop and test useful stuff.

Hey, I got an idea! Let's give them an API! Oh, wait, nobody wants to bother with an API? OK, how about a nice web site! And we're back where we started.

With a universal semantic web, the same problems would crop up everywhere.

onion2k4y ago

Taking someone else's content and republishing it without permission isn't cool, even if you wrap it in a nice machine readable format.

fleddr4y ago

I fully agree, and that's one of the problems I was describing. There's very little content free of commercial interests. If this is true, it blocks a lot of potential use cases of a semantic web.

Micoloth4y ago

I'm more and more seeing that this is true. Still, it is sad.

The question isn't even, what can one do, because obviously nobody can change how incentives work in a given society.

The question is: is there a timeline in which the right incentives (to share data) start being enforced? How would that play out?

xmly4y ago· 4 in thread

I do not understand this conclusion: "Data on the web will only be "semantic" if that is the default, and with this technique it will be."

Why would it be semantic?

SkeuomorphicBee4y ago

Because the backend data is exposed to the world, with all its original semantic structure (relational model) intact, before it is flattened into a document view.

visarga4y ago

Real world is messy, some companies have different key-value pairs on the same kind of document (invoice, purchase order, utility bill, etc). I counted 20K different keys, some semantically synonymous, in a few thousand invoices. Even the table part can have different columns. What do you do when schemas don't quite match?
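
One common first step when keys are synonymous across documents is a hand-maintained mapping to canonical names. The sketch below is only an illustration of that idea: the synonym table, the key names, and the `normalize` helper are all made up for the example, and unknown keys are kept aside rather than guessed at.

```python
# Hypothetical synonym table: every entry is an assumption invented for
# this sketch, not taken from real invoices.
CANONICAL_KEYS = {
    "invoice no": "invoice_number",
    "invoice #": "invoice_number",
    "inv number": "invoice_number",
    "total due": "total",
    "amount due": "total",
}


def normalize(record):
    """Map known synonymous keys to one canonical name; keep unknown keys separately."""
    known, unknown = {}, {}
    for key, value in record.items():
        canonical = CANONICAL_KEYS.get(key.strip().lower())
        if canonical is None:
            unknown[key] = value  # left for a human (or a model) to classify
        else:
            known[canonical] = value
    return known, unknown


known, unknown = normalize({"Invoice #": "A-17", "Amount Due": "120.00", "PO ref": "X9"})
print(known)
print(unknown)
```

A table like this does not solve the long tail of 20K keys; it only makes the unmatched remainder explicit.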

alexchamberlain4y ago

Generally relational models are _not_ semantic models.

rpwverheij4y ago

I agree. Semantic data would mean that others can easily understand the meaning of the data. In the semantic web this would be by using ontologies, which define the types of things and relations between these types of things. But just having your schema visible doesn't mean anyone understands it straight away; they would still need to make an effort to understand the schema of that specific application. And the schema is probably unique to that application. The end result is pretty much the same as, for example, exposing your database as a GraphQL endpoint. Take "The Graph", a web3 project exposing data of many different blockchain projects as GraphQL endpoints. It's nice, but I still need to make an effort to understand the meaning of each property in each endpoint. And a "transaction" in one project is not linked to the meaning of a "transaction" in another. A bit off topic, but ironically I therefore don't find the name 'The Graph' to be all that accurate.

Case in point: YES to at least remembering that web3 is (also) the Semantic Web. But no, this solution is not semantic data.

gibsonf14y ago· 4 in thread

Well, then again the original idea is taking off with https://solidproject.org/ with millions of pods by Tim Berners-Lee's Inrupt to go online starting this Spring.

indigodaddy4y ago

Looks neat thanks for the link. Also appears to be fairly straightforward to self host a pod: https://solidproject.org//self-hosting/css

rapnie4y ago

> to go online starting this Spring

Any announcement I missed? The Solid project has existed for a long time, and it seems that many specs are still in very early days.

gibsonf14y ago

Yes, the entire country of Flanders is getting pods for every citizen in March. Then there are patient pods for UK NHS, and then pods for BBC content users...

gfody4y ago

gross sparkle queries

fjfaase4y ago· 4 in thread

SQL is not better than XML or JSON for representing data. They are all mappings of much richer data structures on a limited data model. But even setting aside these problems, there are some problems with a distributed semantic web that are barely ever mentioned: the step of going from data to 'semantic' facts, how to deal with identifying sources, and versioning/updates. I think it is very important to record who (person or institution) is the source of a certain fact or of the 'linking' of facts between multiple sources. Cryptographic keys, just as in blockchains, could help to link data from distributed sources such that it is possible to verify the source of a fact, trace it back to sources/authorities, and correct errors or deal with updates in case they occur.

OliverJones4y ago

There's one exception to the equivalence of SQL on the one hand to XML or JSON on the other hand. The point of SQL (and other DBMS paradigms) is to give access to data that's orders of magnitude bigger than the RAM in which the app runs. That has stayed true for at least a quarter century, during which RAM and database sizes both did the Moore's-law exponential expansion.

fjfaase4y ago

A relational database maps everything to unordered relationships. Representing or manipulating a tree-like structure is complex. Just representing an ordered list is complex. In XML and JSON everything is ordered, and querying it as a relational database is cumbersome. Graph databases and OO databases are somewhere in the middle.

But as I wanted to point out, which data models are used, is not the major obstacle to the semantic web. It is these other problems that are not addressed.
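
To make the ordered-list and tree point above concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The tables and rows are invented for illustration. It shows the two usual workarounds: an explicit `position` column for order, and an adjacency list walked with a recursive common table expression for trees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Order is not implicit in a table, so it has to be stored as data.
    CREATE TABLE steps (recipe TEXT, position INTEGER, body TEXT);
    INSERT INTO steps VALUES ('tea', 2, 'pour'), ('tea', 1, 'boil'), ('tea', 3, 'steep');

    -- A tree as an adjacency list: each row points at its parent.
    CREATE TABLE comments (id INTEGER PRIMARY KEY, parent INTEGER, author TEXT);
    INSERT INTO comments VALUES (1, NULL, 'a'), (2, 1, 'b'), (3, 2, 'c'), (4, 1, 'd');
""")

# The list only comes back in order because we ask for it explicitly.
ordered = [row[0] for row in conn.execute(
    "SELECT body FROM steps WHERE recipe = 'tea' ORDER BY position")]

# Walking the tree needs a recursive query that tracks depth as it descends.
thread = conn.execute("""
    WITH RECURSIVE thread(id, author, depth) AS (
        SELECT id, author, 0 FROM comments WHERE parent IS NULL
        UNION ALL
        SELECT c.id, c.author, t.depth + 1
        FROM comments c JOIN thread t ON c.parent = t.id
    )
    SELECT author, depth FROM thread ORDER BY id
""").fetchall()

print(ordered)
print(thread)
```

Both are workable, but neither comes for free the way nesting and ordering do in JSON or XML, which is the trade-off being described.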

KarlKemp4y ago

I feel Wikidata has a generally sane approach to these questions.

fjfaase4y ago

Wikidata is interesting. But it is a centralized approach. Is there an interface which gives the full breakdown of sources and where you can, through a chain of certificates (like those used for ssh and https), verify the sources?

simonw4y ago· 4 in thread

I've been exploring the idea of using SQLite to publish data online via my Datasette project for a few years now: https://datasette.io/

Similar to the OP, one of the things I've realized is that while the dream of getting everyone to use the exact same standards for their data has proved almost impossible to achieve, having a SQL-powered API actually provides a really useful alternative.

The great thing about SQL APIs is that you can use them to alter the shape of the data you are querying.

Let's say there's a database with power plants in it. You need them as "name, lat, lng" - but the database you are querying has "latitude" and "longitude" columns.

If you can query it with a SQL query, you can do this:

    select name, latitude as lat, longitude as lng from [global-power-plants]

Here's a demo using exactly that query: https://global-power-plants.datasettes.com/global-power-plan...

That URL gives you back an HTML page, but if you change the extension to .json you get back JSON data:

https://global-power-plants.datasettes.com/global-power-plan...

Or use .csv to get back the data as CSV:

https://global-power-plants.datasettes.com/global-power-plan...

But what if you need some other format, like Atom or ICS or RDF?

Datasette supports plugins which let you do that. I'm running the datasette-atom plugin (https://datasette.io/plugins/datasette-atom) on this other site. That plugin lets you define Atom feeds using a SQL query like this one:

    select
      issues.updated_at as atom_updated,
      issues.id as atom_id,
      issues.title as atom_title,
      issues.body as atom_content,
      repos.html_url  || '/issues/' || number as atom_link
    from
      issues join repos on issues.repo = repos.id
    order by
      issues.updated_at desc
    limit
      30

Try that query here: https://github-to-sqlite.dogsheep.net/github?sql=select%0D%0...

The plugin notices that columns with those names are returned, and adds a link to the .atom feed. Here's that URL - you can subscribe to that in your feed reader to get a feed of new GitHub issues across all of the projects I'm tracking in that Datasette instance: https://github-to-sqlite.dogsheep.net/github.atom?sql=select...

As you can see, there's a LOT of power in being able to use SQL as an API language to reshape data into the format that you need to consume.
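
The column-renaming idea can be tried locally without any server. This is a minimal sketch with Python's built-in `sqlite3` module; the `plants` table and its single row are invented for the example, and only the `latitude`/`longitude` to `lat`/`lng` renaming follows the query shown earlier in this comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows become addressable by column name

# Hypothetical table and data standing in for the published database.
conn.execute("CREATE TABLE plants (name TEXT, latitude REAL, longitude REAL)")
conn.execute("INSERT INTO plants VALUES ('Example Dam', 12.5, -45.25)")

# The consumer picks the output shape; the publisher's schema is unchanged.
rows = [dict(r) for r in conn.execute(
    "SELECT name, latitude AS lat, longitude AS lng FROM plants")]
print(rows)
```

The dictionaries come back keyed by the aliases, so the consuming code never needs to know the publisher's original column names.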

oscargrouch4y ago

I also have a project to explore this alternative way for peers to communicate, but I have a different answer to this, and I think it's better if it's a network of peers that expose APIs.

https://github.com/mumba-org/mumba

It's badly documented, as I have just published it to GitHub, but I hope it gives a clue of how it is supposed to work.

I'm putting the final touches on this project; the main concept is already working, as is 90% of it. But I think exposing SQL is too raw and maybe doesn't offer the whole picture: sometimes what is important is not data but pure computation. E.g. suppose you offer deep learning inference where you receive and give back tensors. In the middle of it is a different sort of computation that doesn't have anything to do with databases.

Or suppose you need to access something at a third party before giving an answer, or you want to do it in a distributed fashion without your API consumer even noticing?

APIs are a good answer to that, and in my opinion they are superior interfaces. Whatever the semantic web of the future will be, it will need this network of API peers to work as a floor for it.

For instance, you can design a Graph API on top of it. Exposing your data layer directly is bad engineering, as there are a lot of problems you won't be able to solve that leaving clients to talk to "you" over a well-defined API will.

To put it simply, in my point of view the direction the semantic web is pointing to is cool, but the answer is not the right one. This idea of exposing SQLite directly, while cooler, has the same flaws; otherwise something like GraphQL would have taken over the world, as it's not much different from the answer presented here.

simonw4y ago

I've thought a bit about the problem of exposing your underlying database - that's obviously a problem for creating a stable API, because it means you may be unable to change your internal database schema without breaking all of your existing API clients!

With Datasette, my solution is to specifically publish the subset of your data in the schema that you think is suitable for exposing to the outside world. You might have an internal PostgreSQL database, then use my db-to-sqlite tool - https://datasette.io/tools/db-to-sqlite - to extract just a small portion of that into a SQLite database which you periodically publish using Datasette.

The other idea I have is to use views. Imagine having a PostgreSQL database with a couple of documented SQL views that you expose to the outside world. Now you can change your schema any time you like, provided you then update the definition of those views to expose the same shape of data that your external, documented API requires.

As with all APIs of this sort, adding new columns is fine - it's only removing columns or changing the behaviour of existing columns that will cause breakages for clients.
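
The views idea can be sketched in a few lines. The example below uses SQLite through Python's `sqlite3` module rather than PostgreSQL, purely so it runs anywhere; the table names, the `public_people` view, and the data are all hypothetical. The point it illustrates is that a client querying the view sees the same shape before and after an internal schema change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people_v1 (id INTEGER PRIMARY KEY, full_name TEXT);
    INSERT INTO people_v1 VALUES (1, 'Ada Lovelace');
    -- The documented, public shape: (id, name).
    CREATE VIEW public_people AS SELECT id, full_name AS name FROM people_v1;
""")
before = conn.execute("SELECT id, name FROM public_people").fetchall()

# Internal refactor: the name is split into two columns in a new table,
# and the view is redefined to keep exposing (id, name).
conn.executescript("""
    DROP VIEW public_people;
    CREATE TABLE people_v2 (id INTEGER PRIMARY KEY, first TEXT, last TEXT);
    INSERT INTO people_v2 VALUES (1, 'Ada', 'Lovelace');
    DROP TABLE people_v1;
    CREATE VIEW public_people AS
        SELECT id, first || ' ' || last AS name FROM people_v2;
""")
after = conn.execute("SELECT id, name FROM public_people").fetchall()

print(before, after)
```

The client's query text never changes; only the view definition absorbs the refactor.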

transfire4y ago

I wonder if database engines will ever have versioning, such that it would always be possible to see the database as it was at different points in time.

simonw4y ago

A couple of other plugins that work along similar lines:

- https://datasette.io/plugins/datasette-ics can be used to generate ICS calendar feeds, which you can subscribe to using desktop calendars or Google Calendar

- https://datasette.io/plugins/datasette-geojson can generate GeoJSON files for any SpatiaLite database table with a geometry column.

recursivedoubts4y ago· 4 in thread

Humans, as of now (and as far as I'm aware, being outside the AI labs at the big tech companies and DARPA) have agency, and so are in a unique position to take advantage of the uniform interface of REST/the web in a flexible manner. I wrote an article about this on the intercooler.js blog, entitled "HATEOAS is for Humans":

https://intercoolerjs.org/2016/05/08/hatoeas-is-for-humans.h...

The idea that metadata can be provided and utilized in a similar manner doesn't strike me as realistic. If it is code consuming the metadata, the flexibility of the uniform interface is wasted. If it is a human consuming the metadata, they want something nice like HTML.

For code, why not just a structured and standardized JSON API?

This appears to be what we have settled on, and I don't see any big advantage extending REST-ful web concepts on top of it. The machines just ignore all that meta-data crap.

netcan4y ago

>> why not just a structured and standardized JSON API?

So in this version of the idea... because structuring data requires work. Unstandardized data exists already. Some of it is already SQLITE. A lot of the rest is in other SQLs, and that might be a smaller bridge.

Author claims (if I'm understanding correctly) that a static website could easily query sqlites over HTTP, and bam, web 3.0.

Honestly, it's hard for me to think/discuss these ideas without examples, even if contrived. What kind of websites would be built this way? What data will they be querying?

A web app that uses photos and address books on the users phone? An alternative UI for news.yc?

mycall4y ago

> What kind of websites would be built this way?

https://ansiwave.net

netcan4y ago

What sqlites would it query, and to what ends?

Just trying to get a grasp on OP's "vision."

simonw4y ago

> why not just a structured and standardized JSON API?

The word "just" is doing a lot of work there! Getting everyone to use the same standardized JSON API format turns out to be incredibly difficult.

This is why I'm a big fan of the idea of using SQL as an API language to redefine the data into the output format that you need, see my comment here: https://news.ycombinator.com/item?id=29900403

SergeAx4y ago· 3 in thread

What is the difference between this idea and exposing a read-only MongoDB (JSON in, JSON out) via an HTTP endpoint? In the end, 50 ranged HTTP requests are not that great for an unstable connection; 1 request is way better, and you need a server anyway.

Okay, offline first, what does that mean? Should I download the entire 600mb SQLite database? Should I do it every time it changes? Who will pay for the bandwidth? We can not employ standard HTTP proxy and caching mechanism here, it is not for 600mb files.

sekaoOP4y ago

Making it available via a static file host dramatically lowers the barrier. If you have an interesting dataset, you can even throw it on github pages and pay nothing; that likely is not true for your mongo db server.

A typical request using indexes will be less than 10 separate 1KB GET requests, not 50. But yeah, more work needs to be done on performance.

Whether it makes sense to fully download the dataset depends on the project; maybe it does not. But it doesn't have to be a monolithic file. You can use SQLite's multiplex VFS to split the SQLite file into many smaller pieces (and still update the db later!).
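
For readers wondering what those small GET requests look like, here is a rough sketch of the arithmetic behind fetching one database page with an HTTP `Range` header. It assumes only what the SQLite file format documents: the page size is stored big-endian at byte offset 16 of the file header, and pages are numbered from 1. The helper names `read_page_size` and `range_header` are made up for this example, and no actual HTTP request is made.

```python
import os
import sqlite3
import struct
import tempfile


def read_page_size(path):
    """Read the page size from the 100-byte SQLite file header (offset 16, big-endian)."""
    with open(path, "rb") as f:
        header = f.read(100)
    (size,) = struct.unpack(">H", header[16:18])
    return 65536 if size == 1 else size  # the stored value 1 encodes 65536


def range_header(page_number, page_size):
    """Value of an HTTP Range header covering one 1-indexed database page."""
    start = (page_number - 1) * page_size
    return f"bytes={start}-{start + page_size - 1}"


# Build a small throwaway database file to inspect.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(path)
conn.execute("PRAGMA page_size = 1024")  # requested before the file is first written
conn.execute("CREATE TABLE t (x)")
conn.commit()
conn.close()

page_size = read_page_size(path)
print(page_size, range_header(3, page_size))
```

A client walking an index B-tree would issue one such request per page it touches, which is why the request count tracks the depth of the index rather than the size of the file.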

SergeAx4y ago

I think this can be carefully thought through and become something interesting. I don't fancy SQL, to be honest; despite having "structured" in its name, it is too chaotic for the XXI century. Rows and columns!

Schema-enforced document databases, on the other hand, are neat and mostly people- and machine-readable at the same time.

Another idea: downloading only indexes may greatly reduce number of requests needed to query the data.

ricardobeat4y ago

That’s what those tiny 1KB requests are doing mostly, downloading indexes. With http pipelining taken into account, it’s way faster than trying to preload the full index data.

Despite the number of requests seeming excessive to us, the performance of this setup is already in the ballpark of your usual underpowered MySQL going through an app server.

z3t44y ago· 3 in thread

One problem is that it's the one hosting the data that pays the bandwidth. Yes when you download a video from Youtube, Google has to pay your ISP! (Google will strong-arm peering agreements though, but that doesn't take away from my point)

Someone has to pay for the infrastructure. Right now the one hosting (not the consumer) pays for the infrastructure. So there is not really any incentive to host data for free - like a kiosk offering free goods and services.

The problem is if you would get paid by sending stuff, everyone would be spamming data everywhere. Imagine if you would have to pay 10c every time someone sent you an e-mail.

Something that I think would help is micro-transactions, built into browsers so that you could easily make one. We already have Bitcoin and other crypto currencies, but they are too big to run inside the browser of a mobile phone, and if it wasn't for the high transaction costs, the ledger/blockchain would be even bigger...

Today publishers - those that publish stuff on the web earn money by showing ads. And ads initially worked very well for a few years around 2000 before people started cheating with bots. But you can still make individual deals with webmasters and choose to trust them.

Also a lot of "the web" has moved to videos and Youtube. The average web user chooses to watch a video rather than read a text article covering the topic of interest.

chupchap4y ago

> Google has to pay your ISP

This is the standard ISP argument that I never understood. Google pays their ISP and any other ISPs they are using for peering, but they don't HAVE to pay your ISP. You pay for your usage.

lobocinza4y ago

I as a consumer also have to pay for Internet access.

fizx4y ago

You can host this on a requestor pays S3 bucket.

_ea1k4y ago· 3 in thread

Is there really that much web safely exposable data in sqlite for this to make sense? I'm not really seeing how this is obviously better than the metadata ideas that preceded it.

rossdavidh4y ago

Some: weather, ratings, topography, dictionaries and encyclopedias, sports scores, market prices, some other stuff. All public knowledge, but not necessarily publicly available (easily) in raw form.

lostmsu4y ago

But doesn't it have to be immutable for the proposal to work?

vorpalhex4y ago

No, as long as the data model is stable you can add new rows.

You might want some kind of versioning for messy column changes, particularly removals.

fizx4y ago· 1 in thread

Effectively, this is a pretty cool way to get an OSS version of something like https://www.snowflake.com/guides/public-data. Also something like a lighter version of https://datasette.io/.

The thing it's kinda missing for me is the ability to compose multiple SQLite databases, possibly provided by different domains.

It'd be nice to join together different public datasets. In a weird personal example, if Strava exposed SQLite, I'd love to do a join to weather.com and see when the last time I biked in the rain was.

It'd be cool if one half of some table was at foo.com and I could add a few rows to it on my bar.com domain, and then the combined dataset was queryable as a single unit.

simonw4y ago

I've done some thinking about joining datasets together.

Datasette grew a `--crossdb` option a while back, which means that if you attach multiple SQLite files to the same Datasette instance you can run joins across them: https://docs.datasette.io/en/stable/sql_queries.html#cross-d...

So one option is to download the database you want to join against and run the joins locally. Datasette encourages making the raw SQLite database file available, so if it's less than about 100MB this may be a good way to do it.
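
Running the join locally needs nothing beyond SQLite's own `ATTACH DATABASE`. The sketch below uses Python's `sqlite3` module; the `rides` and `weather.daily` tables and their rows are invented to mirror the biking-in-the-rain example, and a second in-memory database stands in for a downloaded file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# In practice these would be two downloaded .db files; a second in-memory
# database keeps the sketch self-contained.
conn.execute("ATTACH DATABASE ':memory:' AS weather")
conn.executescript("""
    CREATE TABLE rides (day TEXT, km REAL);
    INSERT INTO rides VALUES ('2022-01-10', 21.0), ('2022-01-11', 8.5);
    CREATE TABLE weather.daily (day TEXT, rain_mm REAL);
    INSERT INTO weather.daily VALUES ('2022-01-10', 0.0), ('2022-01-11', 4.2);
""")

# One query spanning both databases: the most recent ride on a rainy day.
last_wet_ride = conn.execute("""
    SELECT r.day, r.km
    FROM rides AS r JOIN weather.daily AS w ON w.day = r.day
    WHERE w.rain_mm > 0
    ORDER BY r.day DESC LIMIT 1
""").fetchone()
print(last_wet_ride)
```

With real files, the `ATTACH` statement would name the downloaded path instead of `':memory:'`; the join itself is unchanged.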

If you're willing to do the joins in your client-side code, Datasette's default JSON API can help. You can write an application (including a client-side JavaScript application) which fetches and combines data from multiple different Datasette JSON instances by hitting their APIs.

My last idea is the most out-of-left-field: since Datasette lets you define custom SQL functions using Python code, it would be feasible to create a Python function which itself makes a query via the JSON API against another Datasette instance! You could then use that to simulate joins in SQL queries that you run against a single Datasette instance.

I've not built a prototype of this yet, and to be honest I think combining data fetched from multiple JSON APIs (which is possible today) will provide just-as-good results, but it's an interesting potential option.
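
As a rough illustration of the custom-SQL-function idea, the sketch below registers a Python function with plain `sqlite3` via `Connection.create_function`. It is not a Datasette plugin and makes no network call: the `remote_rain_mm` function and the `REMOTE_RAINFALL` dictionary are hypothetical stand-ins for a request to another instance's JSON API.

```python
import sqlite3

# Stand-in for a second instance's JSON API. A real version would make an
# HTTP request here; the data and the function name are made up.
REMOTE_RAINFALL = {"2022-01-10": 0.0, "2022-01-11": 4.2}


def remote_rain_mm(day):
    """Look up a value that lives outside the local database."""
    return REMOTE_RAINFALL.get(day)


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL as a one-argument scalar function.
conn.create_function("remote_rain_mm", 1, remote_rain_mm)
conn.executescript("""
    CREATE TABLE rides (day TEXT, km REAL);
    INSERT INTO rides VALUES ('2022-01-10', 21.0), ('2022-01-11', 8.5);
""")

rows = conn.execute(
    "SELECT day, km, remote_rain_mm(day) AS rain_mm FROM rides ORDER BY day"
).fetchall()
print(rows)
```

Note that the function is called once per row, so a networked version would want batching or caching to avoid one request per row.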

kdunglas4y ago· 1 in thread

API Platform is a popular and easy to use semantic web framework:

1. You design your data model as a set of PHP classes, or you generate the class from any RDF vocabulary such as Schema.org

2. API Platform uses the classes to expose a JSON-LD API with all the typical features (sorting, filtering, pagination…)

3. You use the provided "smart clients" to build a dynamic admin interface or to scaffold Next, Nuxt or React Native apps (these tools rely on the Hydra API description vocabulary, and work with any Hydra-enabled API)

In addition to RDF/JSON-LD/Hydra, API Platform also supports ActivityPub.

https://api-platform.com

no_wizard4y ago

Aren't you the same Dunglas that is head of this project?

Great idea by the way! Had the pleasure of working with API Platform in some Symfony applications in a past gig. I can vouch it's easy enough to use, but the GraphQL integration (at least at that time) was really slow. I have not found PHP to be the ideal runtime for GraphQL.

phh4y ago· 1 in thread

I think the author missed the "semantic" part. If you push your own SQLite, then no, I don't have the semantic meaning of the website. Only a standardized semantic file format ala RDF can achieve that.

sekaoOP4y ago

Out of context, you won't necessarily be able to glean meaning from either an arbitrary SQLite database or arbitrary RDF tuples. Both are equally meaningful or meaningless depending on the observer...at the end of the day, they are just structured data with labels that (hopefully) the observer understands. One doesn't have inherently more semantic meaning than the other.
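
One way to see the equivalence is that any table row can be flattened mechanically into subject/predicate/object tuples. The sketch below does that with Python's `sqlite3` module; the `posts` table, its data, the `row_to_triples` helper, and the `table/id` subject naming are all assumptions made for illustration, not an RDF serialization.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
# Hypothetical table standing in for an arbitrary published database.
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, title TEXT)")
conn.execute("INSERT INTO posts VALUES (1, 'sekao', 'Hello')")


def row_to_triples(table, row):
    """Flatten one row into (subject, predicate, object) tuples."""
    subject = f"{table}/{row['id']}"
    return [(subject, column, row[column]) for column in row.keys() if column != "id"]


triples = []
for row in conn.execute("SELECT * FROM posts"):
    triples.extend(row_to_triples("posts", row))
print(triples)
```

The labels `author` and `title` carry exactly as much meaning in the tuples as they did as column names, which is the point being made: the shape changes, the semantics do not.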

eoo4y ago· 1 in thread

Needs cost-based querying paid for with lightning network to be viable at scale.

regularfry4y ago

Why is that uniquely the case here?

uoaei4y ago

This may be more likely to happen if there was a compromise between the two: query the database, but maintain a database that is queryable using SPARQL and can export to TTL files. Then the linked data revolution can continue and we don't have to maintain finicky webpages but rather a relatively static database.

vmception4y ago

if "exposing and parsing metadata" never took off as the meaning for web 3.0, by the author's own admission, why try to resurrect it with this title? clickbait?

we love web3 labeled articles here

fortunately the more popular variant of web 3.0 doesn't even need the developer to make a database or anything on the backend. just frontend development, and deploying code once to the nearest node. frontend is optional depending on your userbase.

tingletech4y ago

"The semantic web is the future of the web, and always will be." -- Peter Norvig

https://www.youtube.com/watch?v=LNjJTgXujno&t=1257s

firechickenbird4y ago

Isn’t this Web 1.0 instead? You are only reading data, yeah OK with SQL, but you still can’t modify it. And also there are already very good standards like RDF, OWL 2, and SPARQL, which are more expressive than SQL for consuming the info.

mro_name4y ago

I don't get it, why the data and the final visual have to be both present/created ON THE SERVER.

There's been a technology around for so long, that it is forgotten meanwhile (like the semweb itself): xslt.

A lot can be done by just publishing raw xml data plus a visual representation generated in the browser right before display.

I'm doing so with RDF https://demo.mro.name/geohash.cgi/about, GPX https://demo.mro.name/geohash.cgi/u154c, homegrown xml http://rec.mro.name/stations/b2/2022/01/12/1005 or atom feeds https://demo.mro.name/shaarligo and on and on.

The server is a source of data, its filesystem the database, and the client has to make sense of it. There is no API but GET requests. Works wonders for all but big data queries, naturally.

So you publish raw data (TimBL, you want it that way) plus a recipe for a visual representation and the browser shows a sensible view to begin with.

jhoelzel4y ago

Well yes and no. I can see this working in theory, but in reality semantic means standardised as much as it means accessible.

In a world where my blogpost object has the same information as your blogpost object, this works without a problem.

In a world where I actually want to up my database to you, we could agree on a format.

Both of these cases, from where I stand, seem very unlikely, and we have not even talked about the people who would clone your data 1 to 1 just to host an ad-filled alternative of your site in real time.

lmm4y ago

I'm not a fan of SQL, but I do think exposing your original source data in its original form is valuable (though it has little to do with being semantic). I carefully set up my blog to expose the raw markdown that is the source form of my blog posts in the source HTML itself, with the minimum necessary cruft around it to render it as a viewable webpage.


netcan4y ago

Whether or not it has legs, at least this is an interesting idea.

shp0ngle4y ago

.... the original SQLite-over-HTTP-ranges was a clever hack to host database-like data on github.

But I don't think it should be actually used for anything serious.

And I don't really get the connection with "semantic web", which was essentially idealistic vaporware of the 2000s.

visarga4y ago

> Data on the web will only be "semantic" if that is the default, and with this technique it will be.

Not going to work unless imposed by some external force. The semantics of the web can more practically be extracted with neural nets, but it's a long tail and there are errors. Lots of good work recently in parsing tables, document layouts and key-value extraction. LayoutLM and its kin come to mind.[1]

[1] https://scholar.google.com/scholar?cites=9435785928704193879...

bokchoi4y ago

I never got on the semantic web train, but a translation layer does allow you to make underlying schema changes.

I poked around the ANSIWAVE BBS and it looks fun!

pietroppeter4y ago

Very nice indeed! I am sorry I did not notice before the discussion about previous blogpost on the subject [0] “Using the SQLite-over-HTTP "hack" to make backend-less, offline-friendly apps”

Are there more than 2 blogposts? Cannot find a posts page.

[0]: https://news.ycombinator.com/item?id=29758613

tzury4y ago

A more readable version

https://outline.com/E5J2Ft

punnerud4y ago

This post came 3 months before phiresky, and should get credit for being first to make it practical: https://news.ycombinator.com/item?id=25842999

wombatmobile4y ago

> The semantic web will never happen if it requires additional manual labor.

Is manual labor the reason things turned out the way they did, with google spending whatever it took to index and monetise the whole web the way it did?

Or might money have something to do with it?

hankman864y ago

Not going to happen. The reasons for the Semantic Web never taking off were never technical. Websites already spend a lot of money on technical SEO and would happily add all sorts of metadata if only it helped them rank better. Of course, many sites’ metadata would blatantly “lie” and hence, the likes of Google would never trust it.

Re exposing an entire database of static content: again, reality gets in the way. Websites want to keep control over how they present their data. Not to mention that many news sites segregate their content as public and paywalled. Making raw content available as a structured and queryable database may work for the likes of Wikipedia or arxiv.org. But it’s not likely to be adopted by commercial sites.

sharperguy4y ago

I wonder if combining this idea with some kind of microtransactional currency such as the bitcoin Lightning Network or even a simple Chaumian e-cash system (1) would help to get around the issue of requiring clickbait, advertising and SEO with every single piece of data.

Would be great if providers could offer data in raw form without the overhead of all the gunk that gets them paid.

1. https://en.wikipedia.org/wiki/Ecash

serverholic4y ago

It's clear that people want web apps, not the semantic web. I really don't see why people care so much about this.

__MatrixMan__4y ago

But how are we going to make sure that users see ads between each query?

moigagoo4y ago

Feel kinda disappointed that the blog isn't hosted on Ansiwave :-)

hankman864y ago

Btw, it’s funny how the failed “semantic” web is now labelled Web 3.0

dustractor4y ago

Range requests. Hmm. That would lead to some interesting semantics.

fiatjaf4y ago

What is this data we want to semantically link by the way?

seumars4y ago

This could be a nifty way of getting RSS back.

WolfOliver4y ago

How does this relate to a headless CMS?

jillesvangurp4y ago

I think both the capital S Semantic Web and the lowercase semantic web (microformats) kind of just fizzled out towards the end of last decade without changing much at all on the actual web.

The lower case variety kind of survives as a smart thing to do to "help" search engines a little but otherwise has very little real world relevance. All talk of doing anything with on page information in browsers evaporated a long time ago. E.g. MS had some plans with this with early versions of Edge and there were some nice extensions for Chrome and Firefox as well. Not a thing any more. Most of that got unceremoniously ripped out of browsers a long time ago. At this point it's basically just good SEO practice to use microformats as search engines can use all the help they need to figure out what is what on a page. Other than that, whether you render your data to a canvas, a table, or nice semantic HTML has very little relevance for anyone. It's all just pixels that hit your eyeballs in the end. There's nothing else that looks at that information. With the exception of search engines. And they were part of web 1.0 already.

The capital S Semantic Web with ontologies, triple databases, etc. never really got out of the gates and is perpetually stuck in people doing very academic stuff or specialist niche stuff that largely does not matter to anyone else. The exception is graph databases, which are still used in some data/backend teams for some stuff. And of course a few of those also pay lip service to some of the Semantic Web W3C standards from the early 2000s even though that is not the main thing they do anymore. Either way, too much of a specialist thing to call it a semantic web (capital or lower case). Most of the web uses exactly none of this stuff. But nice tools to have if you need them. You could argue a lot of the people involved moved their focus to AI and machine learning, which certainly looks like it is having a very large impact on e.g. search engines.

I guess web3 has that in common with web 3.0 (other than the number 3). There are a few people who desperately (and loudly) want the web to go their way and insist it must be the future. But most people couldn't care less. In the end people just vote with their feet and gravitate to technologies that work for them or solve a problem they have and ignore things that don't do anything useful for them. In the case of Semantic Web, there was nothing there that you could coherently explain (i.e. without using all sorts of abstractions, complex stuff, and simplistic hyperbole). There were a few startups and lots of hype. They did a bunch of stuff. Most of those startups no longer exist or have faded into irrelevance. And the few that survived carved out a few interesting niches but did not end up producing any mainstream, must have technology. Certainly no unicorns there. Wolfram Alpha probably is one of the more well-known ones that actually shipped something useful. But it's a destination and not the web.

Web3 has the same issues. Most threads on HN on web3 devolve into people talking about what it is, ought to be, or isn't and why that is or isn't important. That seems to be impossible to do without using a lot of hyperbole and BS. Very little substance in terms of widely adopted technology or even in terms of what that technology looks like or should look like. It's Web 1.0 all over again. Step 1: Blockchain, Step 2: ????, Step 3: Profit (or not).

Most of the web is just a slightly slicker version of what we had 15 years ago (web 2.0). AJAX definitely became commonplace. We now have mature versions of HTML, SVG, CSS, etc. that actually work. And with WASM we can finally engineer some proper software without having to worry about polyfills and other crazy hacks to make JavaScript do stuff it clearly is not very good at. I'm looking forward to the next 15 years. It's going to be interesting and possibly a wild ride.


__MatrixMan__4y ago

I have a similar, "One day I will do it" project. The idea is that somebody cares enough to make the semantic web work--it's just not the people with write access to the data.

COGNIZE:

0. Encounter messy data in the wild (has pagination, timestamps of access, etc), need other representation (human/computer/whatever)

1. Calculate CTPH fingerprints, use them to search for link: miss

3. Clean data the hard way and publish canonical representation (ipfs?)

4. Generate missing representation the hard way, publish that too

RECOGNIZE

1. Different user encounters "same" data in the wild

2. Calculate fingerprints, use them to search for canonical representation: hit

3. Find further links to see what other representations of the "same data" are available, download them if desired.
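
A minimal sketch of the COGNIZE/RECOGNIZE lookup above, under loud assumptions: it swaps in an exact SHA-256 over whitespace-normalized text where the comment calls for CTPH (fuzzy) fingerprints, so it only recognizes copies that are identical after normalization. The `normalize`, `cognize` and `recognize` names, the in-memory index and the `ipfs://example-canonical` link are all hypothetical.

```python
import hashlib

# Hypothetical in-memory index: fingerprint -> links to known representations.
index = {}

def normalize(raw):
    """Collapse whitespace and case so trivially different copies match."""
    return " ".join(raw.split()).lower()

def fingerprint(raw):
    # Stand-in for a CTPH fingerprint: an exact hash of the normalized text.
    return hashlib.sha256(normalize(raw).encode("utf-8")).hexdigest()

def cognize(raw, canonical_link):
    """First encounter: record a canonical representation under the fingerprint."""
    index.setdefault(fingerprint(raw), []).append(canonical_link)

def recognize(raw):
    """Later encounter: return known representations, or an empty list on a miss."""
    return index.get(fingerprint(raw), [])

cognize("Station  A, 12.5 C\n", "ipfs://example-canonical")  # placeholder link
hits = recognize("station a, 12.5 c")    # same data, different formatting
misses = recognize("Station B, 9.0 C")   # data nobody has cleaned yet
```

A real version would need a similarity search over fuzzy fingerprints rather than a dictionary lookup, which is the hard part this sketch leaves out.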

melony4y ago

__MatrixMan__4y ago

That's a good point. Once the tooling is there for the manual workflow, I expect that an ML-driven approach will plug in to the same hooks without fuss.


wnkrshm4y ago

What you describe has classically been the work of librarians or archivists.

__MatrixMan__4y ago

Ah well I guess I'm looking to make freelance librarian into something you can be.

manggit4y ago

mpweiher4y ago

The ZUGFeRD standard in Germany does this, though the embedded data is XML.

https://de.wikipedia.org/wiki/ZUGFeRD

My CodeDraw app also puts the source code in PDFs or PNGs that it generates.

daveydave4y ago

dgudkov4y ago

oever4y ago

XMP is meant for adding semantic information to (parts of) PDF files.

https://en.wikipedia.org/wiki/Extensible_Metadata_Platform

jonnydubowsky4y ago

Is this compatible with User Defined Language?

https://ivan-radic.github.io/udl-documentation/

daveydave4y ago

The RDF output I would typically serialise as Turtle, for which I believe there is an existing UDL in Notepad++, though I don't use it.

fatcow4y ago

There's the EPC QR code standard in Europe which is underused...

https://en.wikipedia.org/wiki/EPC_QR_code

biztos4y ago

I like this, and of course you could also embed the text of the document. Nothing stops us from doing this right now.

But: don't we need some way to prove that the data matches what is visually rendered in the PDF reader?

And if we can prove that the embedded data matches the rendered document, couldn't that same logic just be used in reverse to generate the structured data from the renderable PDF?

xmprt4y ago

That's not necessary. If you think of the PDF as a checksum then it's possible to have a one way function that generates the PDF (checksum) but that you can't retrieve the original JSON from.

I do really like the idea of having a checksum of some sort if we end up embedding metadata like this.

biztos4y ago

That's a good idea: the tool that processes the data can just run your function and if the file doesn't match the result then it's rejected.
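
One way the "run your function and compare" check might look, as a rough sketch: `render` is a hypothetical deterministic renderer that produces plain text rather than a real PDF, and the bundle layout is invented for illustration. Real PDF generation is often not byte-for-byte reproducible, so this shows only the shape of the idea.

```python
import hashlib
import json

def render(data):
    """Hypothetical deterministic renderer: structured data -> document bytes."""
    lines = [f"{key}: {data[key]}" for key in sorted(data)]
    return "\n".join(lines).encode("utf-8")

def package(data):
    """Bundle the rendered document with the data it was generated from."""
    return {"document": render(data), "embedded": json.dumps(data, sort_keys=True)}

def verify(bundle):
    """Re-run the renderer on the embedded data and compare digests."""
    expected = hashlib.sha256(render(json.loads(bundle["embedded"]))).digest()
    actual = hashlib.sha256(bundle["document"]).digest()
    return expected == actual

invoice = {"number": "INV-1", "total": "100.00"}
bundle = package(invoice)
ok = verify(bundle)

# Changing the embedded data without re-rendering makes the check fail.
tampered = dict(bundle, embedded=json.dumps({"number": "INV-1", "total": "1.00"}))
rejected = not verify(tampered)
```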

q-base4y ago

That would be necessary. It should probably be baked into the creation of the file and perhaps even be included in the footer of the document itself.

ttyprintk4y ago

airstrike4y ago

> Speaking more broadly, whether we talk about HTML or PDF it's the same problem: documents should have two representations

Wait until you realize this applies to business presentations (Excel and PPT-authored-PDFs)

visarga4y ago

ML based invoice processing quality is around 90%, so human in the loop is still needed.

rokku4y ago

In Germany we have something like ZUGFeRD (in German it means carthorse). It is a PDF with an XML file for parsing.

luhn4y ago· 15 in thread

simonw4y ago

You may be underestimating SQLite here. It has grown a LOT of features over the past decade, many of which were directly inspired by PostgreSQL.

Features you may have missed:

- CTEs - the WITH statement works great. And recursive CTEs, which can do mandelbrot fractals! https://www.sqlite.org/lang_with.html

- Surprisingly good built-in full text search: https://www.sqlite.org/fts5.html

- Functions for directly querying JSON data in columns: https://www.sqlite.org/json1.html

I'm not saying it's as "good" as PostgreSQL, but I don't think your argument that PostgreSQL and MySQL implement a substantially larger portion of SQL holds up particularly well.
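
As a small illustration of the recursive CTE support mentioned in the list, here is a sketch using Python's built-in `sqlite3` module. The `comment` table and its rows are made up for the example; only the `WITH RECURSIVE` syntax comes from SQLite itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comment (id INTEGER PRIMARY KEY, parent_id INTEGER, body TEXT)")
conn.executemany(
    "INSERT INTO comment VALUES (?, ?, ?)",
    [(1, None, "root"), (2, 1, "reply"), (3, 2, "nested reply"), (4, None, "other root")],
)

# Recursive CTE: start at comment 1 and walk down through all of its descendants.
rows = conn.execute(
    """
    WITH RECURSIVE thread(id, body, depth) AS (
        SELECT id, body, 0 FROM comment WHERE id = 1
        UNION ALL
        SELECT c.id, c.body, t.depth + 1
        FROM comment c JOIN thread t ON c.parent_id = t.id
    )
    SELECT id, depth FROM thread ORDER BY depth
    """
).fetchall()
```

Comment 4 belongs to a different thread, so it does not appear in `rows`.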

capiki4y ago

As luhn said, it’s more about a standard data format than a db choice. If every client has to figure out what schema a website uses for a recipe, let’s say, then Web 3.0 is still unrealistic.

Schema.org exists, but all websites adopting it seems unlikely.

That being said, I can maybe see a world in which one company adopts schema.org schemas and the rest have to follow suit to be competitive in that particular domain.

zozbot2344y ago

> Schema.org exists, but all websites adopting it seems unlikely.

Schema.org has the backing of major search engines and other reusers of Web-served content. It's way more likely to be adopted compared to anything else in its general domain.

Closi4y ago

SQLite is the most used database engine in the world, so I wouldn't call it niche. In fact, by some estimates, it is probably used more than all other database engines combined.

The only difference is that it is usually run locally (compared to Postgres and your other examples), but something doesn't have to run remotely to be considered running in production :)

luhn4y ago

Yes, when I said "production database" I meant a database for a web application. My iPhone running SQLite doesn't relate to Web 3.0.


scotty794y ago

Exposing your whole database (or the subset the user is allowed to see) as GraphQL is way better, as it is engine agnostic.

zozbot2344y ago

GraphQL is not a standard, it's just a technology for building custom APIs which are far from "agnostic" in practice. You can use SPARQL if interoperability is your goal.


sekaoOP4y ago

quinnjh4y ago

I enjoyed the speculative bias there, as it's prompting me to pick up the SQLite bundle that's been sitting in my downloads while I use post-gres-like-the-rest.

DangitBobby4y ago

Well, having to write unique SQL per site is much better than having to write unique scrapers per site.

RileyJames4y ago

Exactly. And there’s really no reason those queries couldn’t be made public / collaboratively maintained.

You could probably take it one step further and define an OpenAPI spec which is populated via those queries. Tho that would require an intermediary / post-processing, likely with a cache.

contravariant4y ago

Wouldn't you want to use a subset of SQL in an API? Why use a unique superset that differs per webpage?

jumpkick4y ago

I thought Web 3.0 was about NFTs.

dspillett4y ago

tyingq4y ago

galaxyLogic4y ago· 13 in thread

SQLite is a relational database. Wouldn't it be a better fit to use a graph-database as the backend for anything "web"?

Groxx4y ago

What is a graph database? A miserable little pile of joins.

Though to be serious: what do you expect a graph database to provide that sqlite cannot / does not do efficiently?

wheels4y ago

(Note: I've actually written a graph database from scratch, for exactly these reasons.)

simonw4y ago

SQLite may be uniquely well suited to building graph databases out of all of the relational engines, thanks to this: https://www.sqlite.org/np1queryprob.html


ahevia4y ago

Do you have examples of graph DBs that could be embedded in a static site as easily as SQLite?

oever4y ago


j-pb4y ago

A whole pile of joins, that naturally arises if you want to combine data from a ton of domains.

patates4y ago

That has been my pet peeve for a while, and can make it hard to define navigation.

Totally not a deal-breaker though. I'd still use sqlite because I ### love it :)

Groxx4y ago

why would a direction be necessary?

    relation_id │ relatee_id
    ────────────┼───────────
              1 │ x
              1 │ y

voila, directionless relationships. you can even do N-way, just insert more rows with the same relation-id.
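
A quick sketch of how that table could be queried, using Python's built-in `sqlite3` module. The table name `relation`, the extra rows and the `related` helper are assumptions added for illustration; the two-column layout is the one shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relation (relation_id INTEGER, relatee_id TEXT)")
conn.executemany(
    "INSERT INTO relation VALUES (?, ?)",
    # Relation 1 links x and y; relation 2 is a 3-way link between y, z and w.
    [(1, "x"), (1, "y"), (2, "y"), (2, "z"), (2, "w")],
)

def related(node):
    """Everything sharing a relation with `node`, regardless of 'direction'."""
    rows = conn.execute(
        """
        SELECT DISTINCT b.relatee_id
        FROM relation a JOIN relation b ON a.relation_id = b.relation_id
        WHERE a.relatee_id = ? AND b.relatee_id != ?
        ORDER BY b.relatee_id
        """,
        (node, node),
    ).fetchall()
    return [r[0] for r in rows]

neighbours_of_x = related("x")
neighbours_of_y = related("y")
```

The self-join treats both ends of a relation identically, which is what makes the relationship directionless.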


sktrdie4y ago

Take a look at Linked Data Fragments https://linkeddatafragments.org/

galaxyLogic4y ago

Interesting. That looks much like what the proposed idea (using SQLite) is. Use both clients and servers. No?

jinjin24y ago

It would be interesting to see it implemented on an object database like Realm, rather than on being based on SQL. Seems like it would be a much better fit.

TuringTest4y ago

Maybe some graph-like layer over SQLite like graphlite would help?

https://graphlite.readthedocs.io/en/latest/

mikestaub4y ago

OriginTrail.io is using Arangodb.com

0xbadcafebee4y ago· 11 in thread

[1] https://en.wikipedia.org/wiki/Semantic_Web [2] https://graphdb.ontotext.com/documentation/enterprise/introd...

zozbot2344y ago

> Graph databases are necessary to implement semantic web databases.

0xbadcafebee4y ago


smarx0074y ago

sekaoOP4y ago

> Graph databases are necessary to implement semantic web databases.

> For the BBS mentioned here, that might actually be illegal, as it might contain PII

smarx0074y ago

There is a wiki extension to embed RDF data: https://www.mediawiki.org/wiki/Extension:LinkedWiki

simonw4y ago

SQLite could actually make a really good basis for building a graph database, thanks to "Many Small Queries Are Efficient In SQLite": https://www.sqlite.org/np1queryprob.html

Karrot_Kream4y ago

The semantic web failed to become widely popular because:

1. Graph databases on top of triple stores are a lot less scalable than relational databases or key-value stores, and this is how semantic data is meant to be stored/queried.

nescioquid4y ago

> The Semantic Web never caught on because it's really fucking hard to get right. No companies are investing in making it easier.

zozbot2344y ago

> reasoners quickly became necessary

hobofan4y ago

NetOpWibby4y ago

Thanks for the links!

togaen4y ago· 6 in thread

Terrible idea. Why would anyone want to deal with interfacing a bunch of randomly structured databases whose tables can change at any time without warning. Nightmare.

luckystarr4y ago

Yes, it's still terrible for the consumer of the data. But I like it not because of that.

Warning: heavy speculation below

scotty794y ago

Not when compared to the alternative of accessing a random, misdocumented, randomly limited, arbitrarily formatted subset of that which slowly bit-rots.

FridgeSeal4y ago

cxr4y ago

Parsing is not the hard part of dealing with the Web platform.


webmaven4y ago

> Terrible idea. Why would anyone want to deal with interfacing a bunch of randomly structured databases whose tables can change at any time without warning. Nightmare.

It isn't quite so bad. You can have wiki-esque volunteer-driven cooperative authoring, linking to known good versions, etc. to keep it from becoming a complete free-for-all.

mrpf1ster4y ago

I disagree, the alternative is either no access to website data or access through a tightly controlled API (which can come with the same problems if API compatibility is not guaranteed).

echelon4y ago· 5 in thread

Ontologies were centrally published (and had URLs when not - "URIs/URNs are cool"), so it was easy to understand data models. The entity name was the location was the definition. Ridiculously clever.

netcan4y ago

To add some fluff to this:

TBL's ideas were naive perhaps, but he did have his thumb in the right place. Something like semantic web was necessary, in order to avoid the centralisation that did end up happening.

RSS, via podcasting did catch on. Today it's one of the only "free" media forms. There's no company moderating podcasts like twitter, FB, youtube, etc.

zozbot2344y ago

hobofan4y ago

> The entity name was the location was the definition.

When I first learned about the semantic web, I was very hyped on it, but that quickly subsided once I tried actually querying the ontologies and having to see that most of them yield a 404.

mftb4y ago

In the end though I'm not sure it ever would have been any different. People want it "now" and they want it "convenient".

NetOpWibby4y ago

Wow, I had no idea, bookmarking your comment.

fleddr4y ago· 4 in thread

The semantic web is not a technical problem, it's an incentive problem.

That pretty much limits the idea to the "common goods" like Wikipedia and perhaps the academic world.

OliverJones4y ago

> The semantic web is not a technical problem, it's an incentive problem.

The hospitals and health care systems only did it because of incentives.

Hey, I got an idea! Let's give them an API! Oh, wait, nobody wants to bother with an API? OK, how about a nice web site! And we're back where we started.

With a universal semantic web, the same problems would crop up everywhere.

onion2k4y ago

Taking someone else's content and republishing it without permission isn't cool, even if you wrap it in a nice machine readable format.

fleddr4y ago

I fully agree, and that's one of the problems I was describing. There's very little content free of commercial interests. If this is true, it blocks a lot of potential use cases of a semantic web.

Micoloth4y ago

I'm more and more seeing that this is true. Still, it is sad.

The question isn't even, what can one do, because obviously nobody can change how incentives work in a given society.

The question is: is there a timeline in which the right incentives (to share data) start being enforced? How would that play out?

xmly4y ago· 4 in thread

I do not understand this conclusion: "Data on the web will only be "semantic" if that is the default, and with this technique it will be."

Why would it be semantic?

SkeuomorphicBee4y ago

Because the backend data is exposed to the world, with all its original semantic structure (relational model) intact, before it is flattened into a document view.

visarga4y ago


alexchamberlain4y ago

Generally relational models are _not_ semantic models.

rpwverheij4y ago

Case in point: YES to at least remembering that web3 is (also) the Semantic Web. But no, this solution is not semantic data.

gibsonf14y ago· 4 in thread

Well, then again the original idea is taking off with https://solidproject.org/ with millions of pods by Tim Berners-Lee's Inrupt to go online starting this Spring.

indigodaddy4y ago

Looks neat thanks for the link. Also appears to be fairly straightforward to self host a pod: https://solidproject.org//self-hosting/css

rapnie4y ago

> to go online starting this Spring

Any announcement I missed? Solid project exists for a long time and seems that many specs are still very early days.

gibsonf14y ago

Yes, the entire country of Flanders is getting pods for every citizen in March. Then there are patient pods for UK NHS, and then pods for BBC content users...

gfody4y ago

gross sparkle queries

fjfaase4y ago· 4 in thread

OliverJones4y ago

fjfaase4y ago

But as I wanted to point out, which data models are used, is not the major obstacle to the semantic web. It is these other problems that are not addressed.

KarlKemp4y ago

I feel Wikidata has a generally sane approach to these questions.

fjfaase4y ago


simonw4y ago· 4 in thread

I've been exploring the idea of using SQLite to publish data online via my Datasette project for a few years now: https://datasette.io/

The great thing about SQL APIs is that you can use them to alter the shape of the data you are querying.

Let's say there's a database with power plants in it. You need them as "name, lat, lng" - but the database you are querying has "latitude" and "longitude" columns.

If you can query it with a SQL query, you can do this:

    select name, latitude as lat, longitude as lng from [global-power-plants]

Here's a demo using exactly that query: https://global-power-plants.datasettes.com/global-power-plan...

That URL gives you back an HTML page, but if you change the extension to .json you get back JSON data:

https://global-power-plants.datasettes.com/global-power-plan...

Or use .csv to get back the data as CSV:

https://global-power-plants.datasettes.com/global-power-plan...

But what if you need some other format, like Atom or ICS or RDF?

    select
      issues.updated_at as atom_updated,
      issues.id as atom_id,
      issues.title as atom_title,
      issues.body as atom_content,
      repos.html_url  || '/issues/' || number as atom_link
    from
      issues join repos on issues.repo = repos.id
    order by
      issues.updated_at desc
    limit
      30

Try that query here: https://github-to-sqlite.dogsheep.net/github?sql=select%0D%0...

As you can see, there's a LOT of power in being able to use SQL as an API language to reshape data into the format that you need to consume.
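
For anyone who wants to try the column-renaming idea locally, here is a sketch with Python's built-in `sqlite3` module. It runs the first query from above against a made-up one-row table; the table contents are invented, and this is not how Datasette itself is implemented.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets each row be converted to a dict by column name
conn.execute('CREATE TABLE "global-power-plants" (name TEXT, latitude REAL, longitude REAL)')
conn.execute(
    'INSERT INTO "global-power-plants" VALUES (?, ?, ?)',
    ("Example Plant", 51.5, -0.1),  # invented sample row
)

# The aliases in the query decide the keys the consumer sees.
rows = conn.execute(
    "select name, latitude as lat, longitude as lng from [global-power-plants]"
).fetchall()
reshaped = [dict(row) for row in rows]
as_json = json.dumps(reshaped)
```

The consumer gets `lat` and `lng` keys without the publisher having to change the stored column names.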

oscargrouch4y ago

I also have a project exploring this alternative way for peers to communicate, but I have a different answer to this, and I think it's better if it's a network of peers that expose APIs.

https://github.com/mumba-org/mumba

It's badly documented, as I have just published it to GitHub, but I hope it gives a clue of how it is supposed to work.

Or suppose you need to access something from a third party before giving an answer, or you want to do it in a distributed fashion without your API consumer even noticing?

APIs are a good answer to that and, in my opinion, are superior interfaces. Whatever the semantic web of the future will be, it will need this network of API peers to work as a floor to it.

simonw4y ago

As with all APIs of this sort, adding new columns is fine - it's only removing columns or changing the behaviour of existing columns that will cause breakages for clients.

transfire4y ago

I wonder if database engines will ever have versioning, such that it would always be possible to see the database as it was at different points in time.


simonw4y ago

A couple of other plugins that work along similar lines:

- https://datasette.io/plugins/datasette-ics can be used to generate ICS calendar feeds, which you can subscribe to using desktop calendars or Google Calendar

- https://datasette.io/plugins/datasette-geojson can generate GeoJSON files for any SpatiaLite database table with a geometry column.

recursivedoubts4y ago· 4 in thread

https://intercoolerjs.org/2016/05/08/hatoeas-is-for-humans.h...

For code, why not just a structured and standardized JSON API?

This appears to be what we have settled on, and I don't see any big advantage extending REST-ful web concepts on top of it. The machines just ignore all that meta-data crap.

netcan4y ago

>> why not just a structured and standardized JSON API?

Author claims (if I'm understanding correctly) that a static website could easily query sqlites over HTTP, and bam, web 3.0.

Honestly, it's hard for me to think/discuss these ideas without examples, even if contrived. What kind of websites would be built this way? What data will they be querying?

A web app that uses photos and address books on the users phone? An alternative UI for news.yc?

mycall4y ago

> What kind of websites would be built this way?

https://ansiwave.net

netcan4y ago

What sqlites would it query, and to what ends?

Just trying to get a grasp on OPs "vision."

simonw4y ago

> why not just a structured and standardized JSON API?

The word "just" is doing a lot of work there! Getting everyone to use the same standardized JSON API format turns out to be incredibly difficult.

This is why I'm a big fan of the idea of using SQL as an API language to redefine the data into the output format that you need, see my comment here: https://news.ycombinator.com/item?id=29900403

SergeAx4y ago· 3 in thread

sekaoOP4y ago

A typical request using indexes will be less than 10 separate 1KB GET requests, not 50. But yeah, more work needs to be done on performance.

SergeAx4y ago

Schema-enforced document databases, on the other hand, are neat and mostly people- and machine-readable at the same time.

Another idea: downloading only indexes may greatly reduce the number of requests needed to query the data.

ricardobeat4y ago

That’s what those tiny 1KB requests are doing mostly, downloading indexes. With http pipelining taken into account, it’s way faster than trying to preload the full index data.

Despite the number of requests seeming excessive to us, the performance of this setup is already in the ballpark of your usual underpowered MySQL going through an app server.
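
To make the "tiny 1KB requests" concrete, here is a sketch of how a client might turn a SQLite page number into an HTTP `Range` header value. The 1024-byte page size is taken from the figure quoted above and is an assumption about the database's configuration; `page_range_header` is a hypothetical helper, not part of any library.

```python
def page_range_header(page_number, page_size=1024):
    """Range header value covering one SQLite page (pages are numbered from 1)."""
    if page_number < 1:
        raise ValueError("SQLite page numbers start at 1")
    start = (page_number - 1) * page_size
    end = start + page_size - 1  # byte ranges are inclusive at both ends
    return f"bytes={start}-{end}"

first = page_range_header(1)
third = page_range_header(3)
```

Each B-tree page touched while walking an index becomes one such request, which is why a single indexed lookup costs a handful of small GETs instead of a full download.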

z3t44y ago· 3 in thread

The problem is if you would get paid by sending stuff, everyone would be spamming data everywhere. Imagine if you would have to pay 10c every time someone sent you an e-mail.

Also a lot of "the web" has moved to videos and YouTube. The average web user chooses to watch a video rather than read a text article covering the topic of interest.

chupchap4y ago

> Google has to pay your ISP

This is the standard ISP argument that I never understood. Google pays their ISP and any other ISPs they are using for peering, but they don't HAVE to pay your ISP. You pay for your usage.

lobocinza4y ago

I as a consumer also have to pay for Internet access.

fizx4y ago

You can host this on a requestor pays S3 bucket.

_ea1k4y ago· 3 in thread

Is there really that much web safely exposable data in sqlite for this to make sense? I'm not really seeing how this is obviously better than the metadata ideas that preceded it.

rossdavidh4y ago

Some: weather, ratings, topography, dictionaries and encyclopedias, sports scores, market prices, some other stuff. All public knowledge, but not necessarily publicly available (easily) in raw form.

lostmsu4y ago

But doesn't it have to be immutable for the proposal to work?

vorpalhex4y ago

No, as long as the data model is stable you can add new rows.

You might want some kind of versioning for messy column changes, particularly removals.


fizx4y ago· 1 in thread

Effectively, this is a pretty cool way to get an OSS version of something like https://www.snowflake.com/guides/public-data. Also something like a lighter version of https://datasette.io/.

The thing it's kinda missing for me is the ability to compose multiple SQLite databases, possibly provided by different domains.

It'd be cool if one half of some table was at foo.com and I could add a few rows to it on my bar.com domain, and then the combined dataset was queryable as a single unit.
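
With local files, at least, SQLite's `ATTACH DATABASE` already gets close to this. A sketch with Python's built-in `sqlite3` module follows; the file names, the `tree` table and its rows are invented, and it says nothing about whether the HTTP-range setup from the article can attach a second remote database.

```python
import os
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()
foo_path = os.path.join(tmp, "foo.db")  # stands in for the database published by foo.com
bar_path = os.path.join(tmp, "bar.db")  # stands in for the extra rows on bar.com

# Build two separate database files that share the same table layout.
for path, rows in [(foo_path, [("oak",), ("elm",)]), (bar_path, [("ash",)])]:
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE tree (name TEXT)")
    db.executemany("INSERT INTO tree VALUES (?)", rows)
    db.commit()
    db.close()

conn = sqlite3.connect(foo_path)
conn.execute("ATTACH DATABASE ? AS bar", (bar_path,))

# Query both halves of the table as a single unit.
combined = [
    r[0]
    for r in conn.execute(
        "SELECT name FROM main.tree UNION ALL SELECT name FROM bar.tree ORDER BY name"
    )
]
conn.close()
```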

simonw4y ago

I've done some thinking about joining datasets together.

kdunglas4y ago· 1 in thread

API Platform is a popular and easy to use semantic web framework:

1. You design your data model as a set of PHP classes, or you generate the class from any RDF vocabulary such as Schema.org

2. API Platform uses the classes to expose a JSON-LD API with all the typical features (sorting, filtering, pagination…)

In addition to RDF/JSON-LD/Hydra, API Platform also supports ActivityPub.

https://api-platform.com

no_wizard4y ago

Aren't you the same Dunglas that is head of this project?

phh4y ago· 1 in thread

sekaoOP4y ago

eoo4y ago· 1 in thread

Needs cost-based querying paid for with lightning network to be viable at scale.

regularfry4y ago

Why is that uniquely the case here?

uoaei4y ago

vmception4y ago

if "exposing and parsing metadata" never took off as the meaning for web 3.0, by the author's own admission, why try to resurrect it with this title? clickbait?

we love web3 labeled articles here

tingletech4y ago

"The semantic web is the future of the web, and always will be." -- Peter Norvig

https://www.youtube.com/watch?v=LNjJTgXujno&t=1257s

firechickenbird4y ago

mro_name4y ago

I don't get it, why the data and the final visual have to be both present/created ON THE SERVER.

There's been a technology around for so long, that it is forgotten meanwhile (like the semweb itself): xslt.

A lot can be done by just publishing raw xml data plus a visual representation generated in the browser right before display.

The server is a source of data, its filesystem the database, and the client has to make sense of it. There is no API but GET requests. Works wonders for all but big data queries, naturally.

So you publish raw data (TimBL, you want it that way) plus a recipe for a visual representation and the browser shows a sensible view to begin with.

jhoelzel4y ago

Well yes and no. I can see this working in theory, but in reality semantic means standardised as much as it means accessible.

In a world where my blogpost object has the same information as your blogpost object, this works without a problem.

In a world where I actually want to open up my database to you, we could agree on a format.

lmm4y ago

netcan4y ago

Whether or not it has legs, at least this is an interesting idea.

shp0ngle4y ago

.... the original SQLite-over-HTTP-ranges was a clever hack to host database-like data on github.
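For readers unfamiliar with the hack, here is a rough sketch of the byte arithmetic it relies on: SQLite files are divided into fixed-size pages, so a client can ask for a single page with an HTTP `Range` header. This is not the original implementation, only an illustration; the helper names and the demo file are hypothetical, and no network request is made:

```python
import os
import sqlite3
import struct
import tempfile


def page_size_from_header(header):
    """Read the page size from bytes 16-17 of the SQLite file header (big-endian)."""
    (raw,) = struct.unpack(">H", header[16:18])
    # The file format stores the value 1 to mean a page size of 65536.
    return 65536 if raw == 1 else raw


def range_header_for_page(page_number, page_size):
    """Build the Range header value covering one 1-indexed database page."""
    start = (page_number - 1) * page_size
    end = start + page_size - 1  # HTTP byte ranges are inclusive
    return f"bytes={start}-{end}"


# Create a small local database standing in for a file hosted on a static server.
path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
conn = sqlite3.connect(path)
conn.execute("PRAGMA page_size = 4096")  # must be set before the first write
conn.execute("CREATE TABLE t (x)")
conn.commit()
conn.close()

# A real client would fetch these first 100 bytes with its own Range request.
with open(path, "rb") as f:
    header = f.read(100)

page_size = page_size_from_header(header)
second_page_range = range_header_for_page(2, page_size)
print(page_size, second_page_range)
```

Deciding which pages a query actually needs is the hard part, and in practice that is left to SQLite itself running on top of a virtual file system that issues these requests.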

But I don't think it should be actually used for anything serious.

And I don't really get the connection with "semantic web", which was essentially idealistic vaporware of the 2000s.

visarga4y ago

> Data on the web will only be "semantic" if that is the default, and with this technique it will be.

[1] https://scholar.google.com/scholar?cites=9435785928704193879...

bokchoi4y ago

I never got on the semantic web train, but a translation layer does allow you to make underlying schema changes.

I poked around the ANSIWAVE BBS and it looks fun!

pietroppeter4y ago

Very nice indeed! I am sorry I did not notice the earlier discussion about the previous blog post on the subject [0], “Using the SQLite-over-HTTP "hack" to make backend-less, offline-friendly apps”

Are there more than 2 blogposts? Cannot find a posts page.

[0]: https://news.ycombinator.com/item?id=29758613

tzury4y ago

A more readable version

https://outline.com/E5J2Ft

punnerud4y ago

This post came 3 months before phiresky's, and should get credit for being the first to make it practical: https://news.ycombinator.com/item?id=25842999

wombatmobile4y ago

> The semantic web will never happen if it requires additional manual labor.

Is manual labor the reason things turned out the way they did, with google spending whatever it took to index and monetise the whole web the way it did?

Or might money have something to do with it?

hankman864y ago

sharperguy4y ago

Would be great if providers could offer data in raw form without the overhead of all the gunk that gets them paid.

1. https://en.wikipedia.org/wiki/Ecash

serverholic4y ago

It's clear that people want web apps, not the semantic web. I really don't see why people care so much about this.

__MatrixMan__4y ago

But how are we going to make sure that users see ads between each query?

moigagoo4y ago

Feel kinda disappointed that the blog isn't hosted on Ansiwave :-)

hankman864y ago

Btw, it’s funny how the failed “semantic” web is now labelled Web 3.0

dustractor4y ago

Range requests. Hmm. That would lead to some interesting semantics.

fiatjaf4y ago

What is this data we want to semantically link by the way?

seumars4y ago

This could be a nifty way of getting RSS back.

WolfOliver4y ago

How does this relate to a headless CMS?

jillesvangurp4y ago

I think both the capital S Semantic Web and the lowercase semantic web (microformats) kind of just fizzled out towards the end of last decade without changing much at all on the actual web.
