But in this case it's not that the reason doesn't matter; it's that the reason is censorship/bad-faith competition, obfuscated behind an apparent mistake.
This can be accomplished in a few ways. You could accumulate real URLs and build a test set that you can run in non-prod environments prior to deploy. You could also deploy the new version alongside the current one, both watching live data, with the current version retaining enforcement power while the new version runs in log-only mode.
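Roughly like this (a minimal sketch of the log-only rollout; the classify() interface and model names are made up):

    import logging

    logger = logging.getLogger("shadow_eval")

    def handle_url(url, current_model, candidate_model):
        """Run both classifiers on live traffic; only the current one enforces."""
        current_verdict = current_model.classify(url)
        candidate_verdict = candidate_model.classify(url)

        # The candidate runs in log-only mode: record disagreements for
        # review, but never let it affect the user-facing decision.
        if candidate_verdict != current_verdict:
            logger.info(
                "disagreement url=%s current=%s candidate=%s",
                url, current_verdict, candidate_verdict,
            )

        return current_verdict  # enforcement stays with the current version

Once the disagreement rate on live traffic looks acceptable, you flip enforcement to the new version.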
In the case of automated systems that might take new actions in response to live traffic, anomaly detection can be used to look for significant changes in classification rates, overall rate of actions, spikes in actions against specific domains, etc.
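Even something as crude as a z-score against each domain's historical block counts would catch the obvious spikes (window sizes and the threshold here are arbitrary assumptions):

    from statistics import mean, stdev

    def anomalous_domains(history, current, threshold=4.0):
        """history: {domain: [blocks per past window, ...]}
        current: {domain: blocks in the current window}"""
        flagged = []
        for domain, counts in history.items():
            if len(counts) < 2:
                continue  # not enough baseline to judge
            mu, sigma = mean(counts), stdev(counts)
            observed = current.get(domain, 0)
            if sigma > 0:
                z = (observed - mu) / sigma
            else:
                # flat baseline: any increase at all is suspicious
                z = float("inf") if observed > mu else 0.0
            if z > threshold:
                flagged.append((domain, observed, mu))
        return flagged

A sudden surge of blocks against one competitor's domain would surface here long before anyone reads individual log lines.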
The output isn't reproducible, or even predictable. The whole idea of a system like this is that it adapts, if only by collecting more data to "do the stats on".
What systems like this need is different layers that uncertain cases get escalated to. This is how the spam folder in your mailbox works too (to some extent). Basically: if it's clearly spam, just /dev/null it; if it's clearly not spam, let it pass. Everything in between gets re-rated by another layer, which then does the same, etc. One or more of these layers can and should be humans, and the actions of those humans then train the system. If Gmail isn't certain something is spam, it'll deliver it to your spam folder, or maybe even to your inbox, for you to review and mark as ham or spam manually.
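In code, each layer is basically just two thresholds (the numbers are invented, and I'm assuming the classifier outputs a spam probability in [0, 1]):

    from enum import Enum

    class Verdict(Enum):
        DROP = "drop"          # clearly spam: /dev/null it
        DELIVER = "deliver"    # clearly ham: let it pass
        ESCALATE = "escalate"  # uncertain: hand off to the next layer

    def triage(score, drop_above=0.98, deliver_below=0.10):
        if score >= drop_above:
            return Verdict.DROP
        if score <= deliver_below:
            return Verdict.DELIVER
        # next layer: a more expensive model, the spam folder, or a human
        return Verdict.ESCALATE

The uncertain items that reach the human layer come back as labeled ham/spam examples, which is exactly the training signal the automated layers need.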
Knowing that Elon fired a lot of the human teams that fact-checked and researched fake news, much of it manually, I wouldn't be surprised if exactly those "human layers" were simply removed, leaving a system that's neither tuned nor checked while running.
(Source: I've built spam/malware/bot etc detection for comment sections of large sites)
Why not? They were already filtering millions of random links with the existing system. Saving some of those results to run regressions against before making changes to critical infrastructure should be trivial.
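Something like a trivial replay check would do (the JSON-lines snapshot format and the 1% flip budget are made up for illustration):

    import json

    def regression_check(snapshot_path, new_model, max_flip_rate=0.01):
        """snapshot: JSON lines of {"url": ..., "verdict": ...} saved from prod."""
        total = flips = 0
        with open(snapshot_path) as f:
            for line in f:
                record = json.loads(line)
                total += 1
                if new_model.classify(record["url"]) != record["verdict"]:
                    flips += 1
        flip_rate = flips / total if total else 0.0
        assert flip_rate <= max_flip_rate, (
            f"{flips}/{total} verdicts changed ({flip_rate:.2%}); "
            "review before deploy"
        )

If a change suddenly flips the verdict on a big chunk of previously-fine URLs, that's exactly the kind of thing you want a failed deploy, not a news cycle, to tell you about.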