
0 points · berdario · 2y ago · 0 comments

I don't think that's a good reason for not including such a warning.

"Quite low" for a service with billions of users can still mean millions of users who would benefit from seeing the warning.


thiagoharry · 2y ago

This does not solve the issue for Arabic users. It doesn't sound right to me to declare the problem solved just because it was solved for speakers of certain languages, or to attack the problem while excluding certain languages.

berdario (OP) · 2y ago

That's a good point, but the algorithm for detection/flagging doesn't have to be what the grandparent post proposed.

Maybe something like: strip all tags (leaving only the unstyled text) and check:

- there shouldn't be any RLM within the URL

- RLM marks are accepted before/after the URL only if the URL uses only characters from an RTL language and the surrounding text uses characters from an LTR language (or vice versa: an LTR URL surrounded by RTL text). Otherwise the text is flagged

- flag URLs that contain both RTL and LTR characters (with possible exceptions for ccTLDs/TLDs?)

Of course, this leaves some open problems (how big should the sample of the "surrounding text" be?)
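A rough Python sketch of the three checks listed above, assuming the tags have already been stripped. Everything named here (`flag_reasons`, `directions`, the `window` parameter, the very simplistic URL regex) is made up for illustration, and the 40-character window is an arbitrary pick. The ccTLD/TLD exception is not implemented, and an RLM only counts as "before/after" when it touches the URL (ignoring whitespace).

```python
import re
import unicodedata

RLM = "\u200f"  # U+200F RIGHT-TO-LEFT MARK
URL_RE = re.compile(r"https?://\S+")  # simplistic matcher, an assumption


def directions(text):
    """Return the strong directions ("LTR", "RTL") of the letters in text."""
    found = set()
    for ch in text:
        if not ch.isalpha():
            continue  # skips digits, punctuation and the RLM itself
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            found.add("LTR")
        elif bidi in ("R", "AL"):
            found.add("RTL")
    return found


def flag_reasons(text, window=40):
    """List the reasons why URLs in already tag-stripped text look suspicious."""
    reasons = []
    for match in URL_RE.finditer(text):
        url = match.group().rstrip(RLM)  # a trailing RLM counts as "after"
        url_end = match.start() + len(url)
        body = url.split("://", 1)[1]  # ignore the always-Latin scheme
        before = text[max(0, match.start() - window):match.start()]
        after = text[url_end:url_end + window]
        url_dirs = directions(body)

        if RLM in body:
            reasons.append("RLM inside URL")
        if url_dirs == {"LTR", "RTL"}:
            reasons.append("URL mixes LTR and RTL characters")

        # An RLM touching the URL is fine only when the URL is purely one
        # direction and the surrounding sample is purely the other one.
        touching = before.rstrip().endswith(RLM) or after.lstrip().startswith(RLM)
        around = directions(before + after)
        opposite = len(url_dirs) == 1 and len(around) == 1 and url_dirs != around
        if touching and not opposite:
            reasons.append("RLM next to URL without a direction change")
    return reasons
```

For example, `flag_reasons("Visit https://exam\u200fple.com now")` returns `["RLM inside URL"]`, while a plain `"Visit https://example.com for details"` returns an empty list. Returning the reasons rather than a bare boolean is just a choice to make false positives easier to inspect.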

And also, Meta could roll out this logic/algorithm in public Facebook/Instagram posts, where it has more control over it. Rolling it out in WhatsApp first could be more problematic: because of e2e encryption, Meta wouldn't be able to easily spot false positives (messages with URLs that are flagged as potentially malicious but are actually fine).
