Phew, tough question. I got into web development back when XHTML 1.1 Strict was the "cool shit", so I came to value the web as a means of acquiring and distributing knowledge - not only for myself, but also for publishing and other forms of media (e.g. by offering print stylesheets), for screen readers, and for semantic extraction of that kind of knowledge.
(I was also working on projects that used DAISY to automatically convert websites into audio formats consumable by blind people.)
Somehow, from then (around 2000-ish) until now, everything went to shit and nobody cares about that aspect anymore. News websites are too busy displaying ads and pushing subscription dialogs in my face (before I've read a single line of their article) instead of making their content readable or consumable.
And I kind of disagree with that direction. I want to make the web an automatable tool for acquiring knowledge in an easy manner - and I hope I can make that possible programming-free. Currently, programmers can easily build scrapers, but imagine the possibilities once any person or kid can do the same with a few mouse clicks.
I know there are already a lot of proprietary Scrapy-based solutions out there, but honestly, I think they're crappy. They treat the web as a DOM and not as a statistical model that a neural network could learn once you have a different way of rendering/parsing/modelling things.
> How did you land on SGML?
The reason I am currently building my HTML(5)-compatible parser on SGML ideas is that nobody closes their tags. The spec is very complicated (especially while keeping an eye on what can be abused in the XSS sense, or on related security issues with CORS), so right now I'm looking at a lot of parsers out there and trying to find my own way of turning this into a statistical model, so that in the future my neural net adapters can optimize old HTML code into new, clean HTML5 code.
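The core SGML trick is inferring omitted end tags. Roughly like this (a minimal Python sketch with an illustrative, non-exhaustive tag table - not my actual parser):

```python
import re

# Sketch of the SGML OMITTAG idea (tag tables are illustrative, not
# exhaustive): opening one of these tags implicitly closes any
# still-open tag from its value set.
IMPLIES_CLOSE = {
    "li": {"li"},
    "p":  {"p"},
    "td": {"td", "th"},
    "th": {"td", "th"},
    "tr": {"tr", "td", "th"},
}
VOID = {"br", "hr", "img", "input", "link", "meta"}

# crude tokenizer - good enough for a sketch, not for real-world HTML
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")

def repair(html: str) -> str:
    """Re-serialize sloppy HTML with every end tag made explicit."""
    out, stack, pos = [], [], 0
    for m in TAG.finditer(html):
        out.append(html[pos:m.start()])  # text before the tag
        pos = m.end()
        name = m.group(2).lower()
        if m.group(1):  # explicit end tag
            if name in stack:
                while stack:  # close everything opened after the match
                    top = stack.pop()
                    out.append(f"</{top}>")
                    if top == name:
                        break
            # stray end tags are silently dropped
        elif name in VOID:
            out.append(m.group(0))
        else:
            # infer the omitted end tags before opening the new element
            while stack and stack[-1] in IMPLIES_CLOSE.get(name, ()):
                out.append(f"</{stack.pop()}>")
            out.append(m.group(0))
            stack.append(name)
    out.append(html[pos:])
    while stack:  # close anything left open at EOF
        out.append(f"</{stack.pop()}>")
    return "".join(out)

print(repair("<ul><li>one<li>two</ul><p>dangling"))
# -> <ul><li>one</li><li>two</li></ul><p>dangling</p>
```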
> What do you think of a browser/mode that parses markdown, so we can have a "markdown web" with less complex clients?
Actually, that was my first idea for building this. I wanted to convert all HTML to markdown and back, so that things are easier and cleaner. The issue I realized is that most of the markup and meta information that comes with a website is lost in markdown (or CommonMark), and layout sometimes implies structure, too - due to how websites are built in WordPress (or any other user-friendly CMS).
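A quick illustration of the loss (using the html2text library here, but any converter has the same problem):

```python
import html2text  # pip install html2text

html = ('<article class="post" itemscope itemtype="https://schema.org/Article">'
        '<h1 itemprop="headline">Title</h1></article>')

print(html2text.html2text(html))
# -> "# Title" - the <article> boundary, the class, and the schema.org
#    microdata are all gone, and there is no lossless way back.
```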
Code-wise you usually cannot infer meaning by looking at the HTML alone, sadly. That's why I switched to a "filtering proxy"-like approach, where the browser UI simply receives the upgraded, clean HTML and CSS (plus webfonts and other assets).
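In spirit, the proxy looks something like this (a toy Python sketch - the real upgrade step does far more than stripping scripts):

```python
import re
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Crude stand-in for the "upgrade" step: here it only strips <script>
# blocks; a real filter would rewrite the markup into clean HTML5 and
# also handle CSS, webfonts, and other assets.
SCRIPT = re.compile(rb"<script\b.*?</script\s*>", re.S | re.I)

class FilteringProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # request pages as e.g. http://127.0.0.1:8080/https://example.com/
        upstream = self.path.lstrip("/")
        with urllib.request.urlopen(upstream) as resp:
            body = resp.read()
        clean = SCRIPT.sub(b"", body)
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.end_headers()
        self.wfile.write(clean)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), FilteringProxy).serve_forever()
```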