So once all that is done, the user needs to click away the cookie consent banner, the newsletter sign-up and the "continue reading" button. And only now can we stop the clock on "time to read content".
This is a big reason why I read comments first. I click and get straight to content.
EDIT: I just realized that the "time to ..." moniker works really badly in my phrasing here. Maybe "time to start reading" would have been better.
Doesn't help with load times though. It would be an interesting exercise for such a plugin to only fetch the content, a la Lynx back in the day.
Even if the whole text is loaded with the initial page, you'll see a request to somewhere to record that you clicked. Your engagement has been measured. This can be helpful for the site directly (which articles do people actually care about after the first paragraph?), and the fact that people are engaged enough to click for more is something they can “sell” to advertisers. A better-designed site will make the “read more” button an actual link, so that if you have JS disabled (or it fails to load), instead of the content reveal simply failing, it falls back to a full-page round trip and you are counted that way.
This could be done by picking up on scroll events or visibility tests on lower parts of the article instead of asking the user to click, but those methods are less accurate for a number of reasons (people with big screens, people with JS disabled or whose JS failed to load, …).
In practice if editors don’t write excerpts, this is the first paragraph, and if there are no other stories, well it’s a measurement of engagement at that point.
You forgot the paywall that you will see at this point.
Maybe there is also a customer service bot saying: ”Hey! Ask me about our special offer on 12 month subscription!”
This "felt" like too much work, so I thought of redesigning the workflow, and we ended up with <captcha send>username<wait><tab>password<wait><tab><show captcha getting banner><wait><captcha paste><tab><enter>
We made the entire process take time to accommodate the ~6 seconds the captcha needs, so it no longer "feels" like it takes that long. People are generally not happy with loading and wait spinners or anything taking time; they want everything to be instant. This is just gaming the system so that we can work around technical limitations.
Instead of clicking around and filling out forms and waiting for loading spinners all the time, we'll just tell a large language model what we want to do in English, and it will go off and screen-scrape a bunch of apps and websites, do all the clicking for us, and summarize the results in a much simpler UI designed to actually be fast and useful, vs. designed to optimize the business metrics of some company as interpreted by a gaggle of product managers.
This isn't unprecedented. Plaid screen scrapes terrible bank websites and turns them into APIs, though without AI. Google Duplex uses AI to turn restaurant phone numbers into an API for making reservations. DeepMind's Sparrow[1], just announced today, answers factual questions posed in plain English by performing Google searches and summarizing the results. But it's going to be a revolution when it becomes much more general and able to take actions rather than just summarize information. It isn't far off! https://adept.ai is pretty much exactly what I'm talking about, and I expect there are a lot more people working on similar things that are still in stealth mode.
[1] https://www.deepmind.com/blog/building-safer-dialogue-agents
And Google Search is so utterly weak. Not just because of the neutering of search options or the weird priority conflict inside Google, but because even with literally all the data in the world about a specific user, it doesn't seem like it can wrangle what a request actually means.
It can tell me the time in Chicago, but not what computer would actually be the best for my work. That search will only be spam, irrelevant popular results and paid reviews.
Same if I asked for a _good_ pizza recipe, it would probably not understand what that actually means for me.
The whole model of "throwing a request in the box and expecting a result" seems broken to me. I mean, even between humans it doesn't work that way, so why would it work with an advanced AI?
PS: even with more back and forth, I'm imagining what we have now with customer support over chat, and while more efficient than by phone, it's definitely not the interface I want by default
At the same time, many primarily text sites, such as news and social media, struggle to load in 15 seconds.
My connection measures about 920 Mbps down, so 15 seconds is really ridiculously slow. This is certainly not a technology problem, as evidenced by my own app, which does so much more in so much less time.
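For a sense of scale, here is a back-of-the-envelope sketch of how much data a link like that could move in 15 seconds. It assumes the link is fully saturated for the whole interval and ignores protocol overhead, so it is an upper bound rather than a measurement; the function name is made up for illustration.

```python
def max_bytes_transferred(link_mbps: float, seconds: float) -> float:
    """Upper bound on bytes a saturated link can move (no protocol overhead)."""
    bits = link_mbps * 1_000_000 * seconds
    return bits / 8


# 920 Mbps for 15 seconds: 920e6 * 15 / 8 bytes
budget = max_bytes_transferred(920, 15)
print(f"{budget / 1e9:.3f} GB")
```

That comes out to well over a gigabyte, which is orders of magnitude more than a mostly-text page should need.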
Then, maybe LLMs will get more intelligent and it will be less comical, and more like having an actual slave. An agent intelligent enough to do all these things is probably intelligent enough to make me queasy at the thought of being its master.
Language is an imprecise tool, it has inherent ambiguity. Language is also laborious and one dimensional (stream of bits over temporal dimension). A tool such as the one you describe would be extremely frustrating to use.
But I wonder if there will come a point where captchas and "Not a Robot" checkboxes no longer work.
And then there is bloat, the scourge of JavaScript frameworks and what passes for front-end development nowadays.
It keeps getting worse.
When I do web app security assessments, I end up with a logfile of all requests/responses made during browsing a site.
The sizes of these logfiles have ballooned over the past few years, even controlling for site complexity.
Many megabytes of JS shit, images, etc being loaded and often without being cached properly (so they get reloaded every time).
A lot of it is first-party framework bloat (active choices by web devs), but a lot is third-party bloat: all the adtech and other garbage that gets loaded every time (also without caching) for analytics and tracking.
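A minimal sketch of how such a log could be broken down into first-party versus third-party weight. It assumes the requests were exported in HAR format (the comment above does not say which log format is used), and the sample entries and host names are invented for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def summarize_har(har: dict, first_party_host: str) -> dict:
    """Sum response body sizes by (party, MIME type) from a HAR-style log."""
    totals = defaultdict(int)
    for entry in har["log"]["entries"]:
        host = urlsplit(entry["request"]["url"]).hostname or ""
        is_first = host == first_party_host or host.endswith("." + first_party_host)
        party = "first-party" if is_first else "third-party"
        content = entry["response"]["content"]
        # Treat missing or negative (unknown) sizes as zero.
        size = max(content.get("size", 0), 0)
        # Drop parameters such as "; charset=utf-8" from the MIME type.
        mime = content.get("mimeType", "unknown").split(";")[0].strip()
        totals[(party, mime)] += size
    return dict(totals)


# Hypothetical sample data, not taken from a real site.
sample_har = {
    "log": {
        "entries": [
            {"request": {"url": "https://example.com/"},
             "response": {"content": {"size": 40_000,
                                      "mimeType": "text/html; charset=utf-8"}}},
            {"request": {"url": "https://static.example.com/app.js"},
             "response": {"content": {"size": 900_000,
                                      "mimeType": "application/javascript"}}},
            {"request": {"url": "https://tracker.invalid/beacon.js"},
             "response": {"content": {"size": 300_000,
                                      "mimeType": "application/javascript"}}},
        ]
    }
}

for (party, mime), size in sorted(summarize_har(sample_har, "example.com").items()):
    print(f"{party:12} {mime:26} {size / 1000:8.1f} kB")
```

Running this against exports taken a few years apart would be one way to check the "ballooning" impression with numbers.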
Economists and lawmakers have determined that the economic benefits of personalization accrue to the ad middlemen like Google, not to the publishers who have to encumber their sites with all the surveillance beacons, but the reality of the market is that publishers have no leverage. That said, most of those beacons are set with the async/defer attribute and should not have a measurable effect on page load speed.
This is also amusing from a change management process at large organizations: want to tweak an Apache setting? Spend a month getting CAB approval and wait for a deployment window.
Want to inject unreviewed JavaScript onto every page in the domain? Log in to Adobe…
Or just stream it as video on YouTube and let Google pay for the storage :)
Reminds me of that time when we switched from writing all kinds of forms and reports by hand (and sometimes typing them on a typewriter) to composing them on a computer and then printing them: initial time savings were pretty huge and so, naturally, the powers that be said "well, guess we can make you fill out much more paperwork than you currently do" and did so. In the end, the amount of paperwork increased slightly out of proportion and we're now spending slightly more time on it than we used to. A sort of law of conservation of effort, if you will.
1. Power on BBC
2. Type *EDIT[ENTER]
Though obviously not as fully featured as a modern word processor, or even some editors of the time.
We haven't progressed in speed, but versatility is through the roof compared to 10 to 20 years ago.
Oh to be a mere average computer user. I work with files that can still take some non-instant time after hitting save to complete. Conversely, it still takes some non-instant time to open said file. As long as there's such a thing as progress bars, count downs, spinning wheels, beach balls, etc, there is always room to make things faster.
And yet when you ask customers whether they want more features or a faster program, it's invariably features. Fix a bug or add a feature? Add a feature. Improve perf or add a feature? Add a feature.
My own tolerance for delays is tiny, but my "average users" seem to not suffer from it at all. I guess the reason is this: they know how much time something took before. They know that if this takes 10 seconds and it took them an hour to do on paper, that's quick. Meanwhile for me I'm in the IDE having a 200ms keystroke delay and I'm almost having a heart attack.
25 years later, progress bars are no better: they steadily go to 15%, then stop there for a while, suddenly zoom to 80%, then slowly progress to 99%, then stop there for a long time.
It’s understandable to get frustrated by this, but at some point you realize it’s pointless.
This is true in many, many facets of life. Household possessions tend to expand to fill the available square footage. Cities sprawl haphazardly until commute times become unbearable. Irrigation expands until the rivers are depleted. Life expands to the limit, always.
Even in a modern city this is rarely the case.
So I'm afraid the real answer is that webdev is just not mature yet.
In the comments I mostly see "because of js and adtech", but is there a factual analysis somewhere? How much is due to the recent massive deployment of https and related latency? Is it a problem of latency or bandwidth? What type of content is causing most of the waiting time? Images, css, js? Just wondering.
I'm not sure what sort of analysis you're looking for, but different sites have different problems. I'm not convinced that averaging them makes sense, and it's very difficult to create a reasonable metric. As soon as you start measuring some specific metric, people will optimize for it whilst still making the site unusable. Unfortunately "usable state" is too hard to quantify without it being game-able.
> How much is due to the recent massive deployment of https and related latency
Very little, you can check here [0] and see for yourself. Obviously it depends on your distance to the server you are reaching, but the added latency of HTTPS isn't really a factor when you're looking at 10s for a page to load.
> What type of contents is causing most of the waiting time ? Images, css, js ?
Let's look at CNN (only because I happen to remember that a lite version exists thanks to someone on HN). The lite[1] version loads entirely in 350ms for me. The normal version[2] with adblock on finishes loading everything after 1.43s. The normal version with adblock off finishes loading after 20s and reflows a bunch of times as ads get loaded.
So I agree with the rest of the comments, it's "because of js and adtech".
Disclaimer: I'm in South Africa at the moment on 4G, so my internet isn't the best :)
I don't know about a real analysis, but it's pretty easy to see the effect in your own browser. Disable JS and see how much faster the page loads.
Turns out it was hauling in 10MB of analytics scripts.
Streamlit is the framework I used to build the app at the bottom of the article. It does unfortunately load a decent amount of JS. However, it should be non-blocking, which means it won't interfere with how quickly you can see or use the page.
It pings back to Streamlit to keep your session state alive as it's running a whole Python interpreter on the backend for each session.
The speed index for this page hovers between 1-2 seconds when I test it.
The source for that is some stats from speedtest.net, which I assume are calculated from the users who used their speed test? So it's probably heavily skewed towards power users who have a fast connection and want to check if they are really getting what they are paying for. Most "casual" users with shitty DSL connections are happy if "the internet" works at all and are pretty unlikely to ever use this service...
What's baffling to me is how people love to spend seemingly infinite time playing with tech stacks and whatnot, but then pay very little attention to basic details like what to load and how many resources they really need.
But I see the image inside the network tab so bandwidth is getting wasted for no reason.
FF always asks for an internet connection even though all the needed files are in the cache.
I'm on an internet diet (Wi-Fi disabled for hours at a time), so bad behaviors are apparent.
Those on always-on 24/7 fiber may not notice.
This is the same as saying "Despite larger pipes every year, water still doesn't reach your house any faster".
If a website is transferring more data to render content, or worse, before starting to display content, then bandwidth matters as it will move that data more quickly.
If a website requires multiple round trips to complete a given request, then latency will also matter, and sets an absolute minimum floor to time-to-display (TTD) regardless of bandwidth. The higher your latency, the slower that process.
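The two effects can be put into a rough model: transfer time shrinks with bandwidth, while the round-trip term does not. This is a simplified sketch that assumes strictly sequential round trips and a fully used link; the page size, RTT and round-trip count below are illustrative numbers, not measurements.

```python
def estimated_load_seconds(page_bytes: int, bandwidth_mbps: float,
                           rtt_ms: float, round_trips: int) -> float:
    """Transfer time plus time spent waiting on sequential round trips."""
    transfer = page_bytes * 8 / (bandwidth_mbps * 1_000_000)
    waiting = round_trips * rtt_ms / 1000
    return transfer + waiting


# Hypothetical 2 MB page needing 20 sequential round trips at 50 ms RTT.
for mbps in (100, 1000):
    seconds = estimated_load_seconds(2_000_000, mbps, rtt_ms=50, round_trips=20)
    print(f"{mbps:5} Mbps -> {seconds:.2f} s")
```

Under these assumptions a tenfold increase in bandwidth barely changes the total, because the one second of round-trip waiting is the floor.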
In theory it's possible for an SPA (single-page application) to be more responsive despite an overall larger page weight as it can incrementally request and present additional content. In practice such pages often perform worse on time to display due to both increased total data transfer and round-trip request requirements. A lightweight HTTP/2 HTML+CSS only site can be far more performant if it's based on static pages and request/load dynamics.
The usage experience with typical specs of 2022 (500G SSD and 16G RAM) is as good or bad as it was with those from 20 years back (say, 20G HDD and 128M RAM).
From their FAQ/changelog [1]:
> 19 Mar 2013: The default connection speed was increased from DSL (1.5 mbps) to Cable (5.0 mbps). This only affects IE (not iPhone).
There was another popular article on HN a while ago [2], claiming mobile websites had gotten slower since 2011. But actually HTTP Archive just started using a slower mobile connection in 2013. I wrote more about that issue with the HTTP Archive data at the time [3].
[1] https://httparchive.org/faq [2] https://www.nngroup.com/articles/the-need-for-speed/ [3] https://www.debugbear.com/blog/is-the-web-getting-slower
The Google data uses Largest Contentful Paint instead of Speed Index, but the two metrics ultimately try to measure the same thing. Both have pros and cons. Speed Index goes up if there are ongoing animations (e.g sliders). LCP only looks at the single largest content element.
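For anyone unfamiliar with Speed Index, a simplified sketch of the usual definition: the area above the visual-progress curve, i.e. the sum of (1 - visual completeness) over time. Real tools derive completeness from video frames; the sample timelines here are invented to show why a page that paints most of its content early scores better even when both finish at the same time.

```python
def speed_index(samples):
    """Approximate Speed Index in ms.

    samples: list of (time_ms, completeness) pairs sorted by time, with
    completeness between 0.0 and 1.0. Completeness is assumed to stay
    constant between samples (a step function starting at 0.0 at t=0).
    """
    total = 0.0
    prev_t, prev_c = 0, 0.0
    for t, c in samples:
        total += (t - prev_t) * (1.0 - prev_c)
        prev_t, prev_c = t, c
    return total


# Both hypothetical pages are visually complete at 3000 ms.
early = [(500, 0.0), (1000, 0.8), (3000, 1.0)]  # 80% painted at 1 s
late = [(500, 0.0), (2500, 0.8), (3000, 1.0)]   # 80% painted at 2.5 s
print(f"early: {speed_index(early):.0f} ms, late: {speed_index(late):.0f} ms")
```

LCP, by contrast, would only report the single moment at which the largest content element was painted.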
When looking at the real-user LCP data over time, keep in mind that changes are often due to changes in the LCP definition (e.g opacity 0 elements used to count but don't any more). https://chromium.googlesource.com/chromium/src/+/master/docs...
Do you think this is including ad loads? Ad networks run a real-time auction. It takes some time to collect bids so the highest can be chosen.
But I'm using uBlock Origin …
You aren't loading all the adtech webshit that makes up the majority of a page load.
Given that network-wide adblock/tracker blocking saves upwards of 60% of bandwidth (for web traffic) on the average network, it's pretty obvious where the problem is.
Latency: Part of the page loads, and then the page asks for more data. In some cases (such as loading a page hosted on another continent), this is bound by the speed of light.
Poor data access (database) code: Sometimes this is due to lazy or incompetent programmers; other times it's due to the fact that "not instant" is "good enough."
Writing a web page to load everything very quickly in a single request is surprisingly hard, and will often break modern and easily understood design patterns.
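The speed-of-light bound on latency mentioned above can be estimated with a short sketch. It assumes light in optical fiber travels at roughly two thirds of its vacuum speed and that the cable follows a straight path, so real round-trip times will be higher; the 10,000 km distance is just an example of an intercontinental hop.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458
FIBER_FACTOR = 2 / 3  # assumption: signal speed in fiber is about 2/3 of c


def min_rtt_ms(distance_km: float) -> float:
    """Physical lower bound on round-trip time over fiber, in milliseconds."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_s * 1000


rtt = min_rtt_ms(10_000)
print(f"minimum RTT over 10,000 km: {rtt:.1f} ms")
print(f"10 sequential round trips:  {10 * rtt / 1000:.2f} s")
```

So a page that needs ten dependent requests to a server on another continent cannot finish in much under a second, no matter how fast the connection is.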
This problem is exacerbated by the fact that web pages are just built as giant JavaScript applications for no reason. With a page pretending to be an app, the browser has to download, parse, and execute the JavaScript core before the browser can do anything. All of the resources are hidden somewhere inside the application and the browser is stuck with its thumb up its ass unable to load or do anything.
Sites using a bunch of JavaScript to "render" everything client side are causing their own stupid problems. I just built a toy React app that only displays Hello World. It's the most trivial app that isn't just a blank project. A production build of it weighs in at 143kB for just the JavaScript portion. It's also multiple resource requests just to load that crap. Even bundled into an HTML skeleton it's still 145kB of JavaScript to run before the browser can do anything else.
Served locally, it takes about 20ms to load that React app and draw something on my screen. An HTML copy of Frankenstein (463kB) served from the same server loaded in 5ms. A plain HTML document three times the size of the toy React "app" loaded in a quarter of the time!
> Writing a web page to load everything very quickly in a single request is surprisingly hard, and will often break modern and easily understood design patterns.
I find this to be an odd take. This nearly 4,000 word CNN article[0] including graphics weighs in at 70kB. The HTML alone including a bunch of inline styling (tables, font tags, etc) weighs only 55kB. With no graphics loaded it is still a perfectly readable article. Loading the HTML of that page in my above test it still loads in 5ms and is 100% readable.
Not just that but I can navigate with the sidebars and headers. In less space than just a toy React app I've got a completely usable page that can display as close to instantly as my eyes can determine. There's nothing about "modern and easily understood design patterns" that requires hundreds of kB of JavaScript or even extra CSS. It would be trivial to replace the table and font tags with CSS in the header and likely get the whole document even smaller while keeping 100% of the functionality and navigation.
The CNN article example coincidentally shows how stupid the JavaScript bloat problem is because several of the images are not found and the 404 pages CNN is returning in their place are a megabyte.
[0] http://www.cnn.com/TECH/computing/9806/24/win98.idg/index.ht...
> Contrary to popular belief, the average car is not in fact that much more fuel-efficient than older cars. Still, to this day, the average vehicle has a range of between 20 and 30 miles per gallon; a stat which was very similar in the 1920s. But, why is this? Well, cars are a whole lot bigger.
Developers will therefore always maximize resource usage as more becomes available.
Being able to render an enhanced markdown or JSON or YAML page in the browser without any generators would be phenomenal and resolve a lot of long-standing issues with structured data.
This is not optimized for "engagement" or whatever proxy for "money in my pocket", however, so it's pretty rare.
More sites are taking advantage of them at the same pace as the technical development, cancelling them out.
Otherwise, a website from 10 years ago would load faster over today's connections.
A startup could probably nuke reddit in a month if they just concentrate on performance and usability.
I would rephrase it:
Because of faster broadband every year, web pages don't load any faster