DuckDuckGo \u202E

(duckduckgo.com)

348 points · zeepzeep · 4y ago · 118 comments

111 comments · 27 top-level

lucideer4y ago· 15 in thread

Everyone here is asking if this is an "intentional easter-egg" or an "accidental bug"

But what about accidentally working-as-intended?

Sure it's a little trickier to read, but it's certainly not a "bug" that will cause any damage / danger / instability / etc.

thrdbndndn4y ago

I don't get your take.

Even the most strict definition of bug doesn't imply it has to "cause any damage / danger / instability / etc." to be one.

And I wouldn't call it "working as intended" when the purpose of this feature is to provide an answer for a human to read, and it failed at that.

evolve2k4y ago

I'd warmly beg to differ, I personally think it's illustrating how it is supposed to work, most eloquently.

willcipriano4y ago

I propose "accidental feature" for this sort of thing.

jackosdev4y ago

I like it, surprised the legions of Skyrim players haven't already coined that term

justbaker4y ago

“It’s not a bug it’s a feature”

gambler4y ago

Problem is, this behavior is so outside of the range of common expectations, it's really hard to say if it's harmless or not and what are the worst cases for (ab)using it.

subroutine4y ago

It's telling that the description (https://unicode-explorer.com/c/202E) even acknowledges that 202E is commonly used as an exploit. "The Right-To-Left Override character can be used to force a right-to-left direction withing a text. This is often abused by hackers to disguise file extensions: when using it in the file name my-text.'U+202E'cod.exe, the file name is actually displayed as my-text.exe.doc - so it seems to be a .doc file while in reality it is an .exe file."
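A minimal Python sketch of the disguise described in that quote. Only the filename comes from the quoted description; `BIDI_OVERRIDES` and `has_bidi_override` are names made up for this illustration. It shows that the stored (logical) name still ends in `.exe`, whatever a bidi-aware renderer puts on screen.

```python
# Hypothetical helper names; only the filename comes from the quoted text.
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE

# The two explicit override characters: LRO (U+202D) and RLO (U+202E).
BIDI_OVERRIDES = {"\u202d", "\u202e"}


def has_bidi_override(name: str) -> bool:
    """Return True if the name contains an explicit directional override."""
    return any(ch in BIDI_OVERRIDES for ch in name)


filename = "my-text." + RLO + "cod.exe"

# The logical (stored) order is what the OS acts on: this is an .exe file.
print(filename.endswith(".exe"))    # True
print(has_bidi_override(filename))  # True

# repr() escapes the control character, so it cannot affect the display.
print(repr(filename))
```

Printing the `repr()` rather than the raw string is one cheap way to inspect such names without the terminal applying the override.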

jobigoud4y ago

> "accidentally working-as-intended"

An expression in French for this: "Tomber en marche" (literally: falling into walking). When something breaks we say it "tombe en panne" (falls into being out of service), when something works we say it "marche" (walks). So this expression is like "falling into a working state".

I wonder about the ratio of unknown bugs vs features that accidentally work, in the wild. Such features are time bombs waiting to explode during the next refactoring.

makapuf4y ago

The page redirects to u202E (no backslash) which is a normal word. I think it's an Easter egg.

account424y ago

I don't think it's intentional; it just recognizes a Unicode code point written in the uXXXX syntax, even without the backslash, and then includes the literal character in the info box without any consideration for special characters.

For example this shows an @: https://duckduckgo.com/?q=u0040&ia=answer
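A rough Python sketch of the kind of lookup guessed at in this comment. This is not DuckDuckGo's actual code: the regex, the function name `describe_query`, and the output format are all assumptions for illustration. Unlike the behaviour described, this version only echoes the literal character when it is printable, so a format character such as U+202E never reaches the output.

```python
import re
import unicodedata
from typing import Optional

# Assumed query syntax: optional backslash, "u", optional "+", 4-6 hex digits.
CODEPOINT_QUERY = re.compile(r"^\\?u\+?([0-9a-f]{4,6})$", re.IGNORECASE)


def describe_query(query: str) -> Optional[str]:
    """Describe a code point query like 'u0040', or return None."""
    match = CODEPOINT_QUERY.match(query.strip())
    if match is None:
        return None
    cp = int(match.group(1), 16)
    # Reject values outside Unicode and surrogates (not encodable as UTF-8).
    if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return None
    ch = chr(cp)
    name = unicodedata.name(ch, "<unnamed>")
    utf8 = " ".join(f"0x{b:02X}" for b in ch.encode("utf-8"))
    # Only show the character itself when it is printable.
    prefix = f"{ch} " if ch.isprintable() else ""
    return f"{prefix}U+{cp:04X} {name}, decimal: {cp}, UTF-8: {utf8}"


print(describe_query("u0040"))
print(describe_query("u202E"))
```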

qwertox4y ago

I don't know. I feel uneasy when the info banner reverses all the text ("This Instant Answer was made by the DuckDuckHack Community.").

Because the text looked very odd to me I highlighted the nonsensical text "noitatneserper lausiv" and context-menu searched it on Google. To my surprise it googled for "visual representation", and while retrying because I thought that maybe Google's engine auto-"corrected" the text, I noticed that even the text in the context-menu stated that it would google for "visual representation".

Then seeing that it was "noitatneserper lausiv" in reverse, maybe also in combination from the first hit "U+202E RIGHT-TO-LEFT OVERRIDE - Unicode Explorer", it felt like the browser had done something it should not do by actually applying the reversion to the info box.

When inspecting the HTML tag of the info box it displays the string "&#x202E U+202E RIGHT-TO-LEFT OVERRIDE, decimal...", but whenever I try to do something with it, it gets either reversed or messed up.

Another bug: When I select the entire text in the info box, I get " U+202E RIGHT-TO-LEFT OVERRIDE, decimal: 8238, HTML: No visual representation, UTF-8: 0xE2 0x80 0xAE, block: General Punctuation" <-- (btw, this was NOT what I had first entered into the textfield before this edit)

And trying to append a double quote to the text above, it inserts it at the beginning of the line, actually after the E202+U. When I expand the textbox so that the entire paragraph is in one line, E202+U moves to the end.

All this is creepy and I bet that it won't be long until an exploit with this uncontrollable Unicode character will hit the first vulnerable servers and browsers. This feels like Unicode is playing with fire.

Edit: From https://unicode-explorer.com/c/202E

> The Right-To-Left Override character can be used to force a right-to-left direction withing a text. This is often abused by hackers to disguise file extensions: when using it in the file name my-text.'U+202E'cod.exe, the file name is actually displayed as my-text.exe.doc - so it seems to be a .doc file while in reality it is an .exe file. There's even an xkcd comic for this character!

rising-sky4y ago

Probably a bug, browser history shows the title as "[object Object]", which is what happens if you print an object value that cannot be 'adequately' serialized to string in javascript e.g. ({}).toString()

missblit4y ago

The title of the page is "u202e at DuckDuckGo" which doesn't even have any funky unicode in it.

So you might have something else going on if it shows up as "[object Object]" in your browser history.

cryptonector4y ago

It is almost certainly an accident, but it might have been left in on purpose!

zeepzeepOP4y ago

Ha, it changed. It was indeed a bug

gunapologist994y ago· 14 in thread

Are there any lists of unicode characters (like the OWASP one) that should be blacklisted from most apps (not just for XSS, but even for desktop apps)?

Are there any good security guides/best practices for unicode sanitation?

wongarsu4y ago

How are users supposed to write "עבור אל duckduckgo.com כדי לחפש באינטרנט" without \u202E? It's perfectly normal for RTL languages to switch text direction in the middle of a sentence.

raphlinus4y ago

That should just render correctly thanks to the BiDi algorithm. The "override" control characters are a heavy hammer, and are extremely rarely needed. In fact, at this point I think it's likely that malicious use of these code points significantly outweighs correct use.

There are legitimate uses of BiDi control characters. My favorite one from my time on Android was the string "Google+", which would render as "+Google" in an RTL paragraph. The translators would usually "fix" this by just flipping the string so that it was "+Google", which would render correctly, but be incorrect when cut'n'pasted, read by a screen reader, etc. The correct solution is to use a left-to-right mark. The string "Google+\u{200e}" renders correctly in both LTR and RTL flow, because the "+" then sits between two strongly left-to-right characters. And these "mark" characters are basically harmless, they cannot profoundly change the order, they just fix some of these ambiguous cases.

Correct use of BiDi control characters is explained here: https://www.w3.org/International/questions/qa-bidi-unicode-c...
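To make the "mark" versus "override" distinction concrete, here is a small Python sketch that prints the bidirectional class of a few characters from this comment. The helper name `bidi_class` and the choice of samples are mine. A mark is just an invisible strong character, while "+" is a weak type whose direction is resolved from its neighbours, which is why it can end up on the wrong side.

```python
import unicodedata


def bidi_class(ch: str) -> str:
    """Return the Unicode bidirectional class of a single character."""
    return unicodedata.bidirectional(ch)


samples = {
    "LATIN CAPITAL G": "G",           # strong left-to-right
    "PLUS SIGN": "+",                 # weak (European separator)
    "HEBREW ALEF": "\u05d0",          # strong right-to-left
    "LEFT-TO-RIGHT MARK": "\u200e",   # invisible, but strong left-to-right
    "RIGHT-TO-LEFT OVERRIDE": "\u202e",
}

for label, ch in samples.items():
    print(f"{label}: {bidi_class(ch)}")
# LATIN CAPITAL G: L
# PLUS SIGN: ES
# HEBREW ALEF: R
# LEFT-TO-RIGHT MARK: L
# RIGHT-TO-LEFT OVERRIDE: RLO
```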

kingcharles4y ago

And then you get Arabic and English text quoted in Japanese vertical RTL text and that's the story of how I actually died.

OJFord4y ago

Do you read left RTL, middle LTR, right RTL; or right RTL, middle LTR, left RTL? (Just curious.)

anamexis4y ago

Imagine it was the other way around: like you wanted to reference תֵּל־אָבִיב-יָפוֹ in the middle of an English sentence. That, but reversed.

kingcharles4y ago

> Do you read left RTL, middle LTR, right RTL; or right RTL, middle LTR, left RTL? (Just curious.)

YES.

sterlind4y ago

please don't blacklist U+202D and U+202E or the Private Use Area. my conlang has a right-to-left cursive script, and it's not in Unicode. the characters live in the PUA and my font renders them as a fallback. there's no mechanism for fonts to ask for RTL, so I have to use bidi override.

bawolff4y ago

I do think it's kind of sad that the PUA doesn't have various areas with different properties (RTL, whitespace, joining, etc)

harambae4y ago

Not a full security guide, but if you haven't seen this before it's useful to have...

https://github.com/danielmiessler/SecLists/blob/master/Fuzzi...

adamrezich4y ago

I've seen this before but either this is new since last time or I missed it, either way: lol

    # Human injection
    #
    # Strings which may cause human to reinterpret worldview
    
    If you're reading this, you've been in a coma for almost 20 years now. We're trying a new technique. We don't know where this message will end up in your dream, but we hope it works. Please wake up, we miss you.

p_j_w4y ago

I'm not falling for this one, I know no one misses me!

missblit4y ago

For unicode security considerations see http://www.unicode.org/reports/tr36/

The report is divided into visual and non-visual security issues. Our old friend RTL override is covered, but mostly in the context of URLs.

bawolff4y ago

Put it inside a <span dir="auto"> ?

Anyways, Unicode category Cf is probably what you are looking for, but blocking them is probably wrong as they serve an important function.
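For anyone who wants to see what falls into that category, a short Python sketch that reports every Cf (format) character in a string instead of blocking it. The function name `format_characters` is made up here. The sample deliberately includes ZERO WIDTH JOINER, which is also Cf and is needed for emoji sequences and several scripts, so a blanket block would break legitimate text.

```python
import unicodedata


def format_characters(text: str):
    """Return (index, 'U+XXXX', name) for each character in category Cf."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


sample = "abc\u202edef\u200d"
for hit in format_characters(sample):
    print(hit)
# (3, 'U+202E', 'RIGHT-TO-LEFT OVERRIDE')
# (7, 'U+200D', 'ZERO WIDTH JOINER')
```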

sp3324y ago

I don't think this is a good place for a blacklist. Text effects should be encapsulated and reset at the end of the text block, the way bold or italic effects are.

hnlmorg4y ago· 13 in thread

If there was ever a clear signal that working with Unicode is incredibly hard, it would be the fact that no one on HN can decide if this is accidental or intentional.

divbzero4y ago

Let me take a stab at a definitive answer:

– It is unintentional for DuckDuckGo. The code for DuckDuckGo works correctly but no one who wrote that code thought about whether a reversal would happen.

– It is intentional for the browser. The code for the browser works correctly and someone who wrote that code actively thought about how to make a reversal happen.

I don’t think ‘accidental’ is the right word to use in either case because the outcome is what you would want.

hnlmorg4y ago

The reason I used "accidental" is because it's not a bug (and you've alluded to that same conclusion too). You could argue it's accidental from the perspective of DDG if it happened by chance rather than design. But the distinction between "accidental" and "unintentional" is nuanced and I'd already offered "intentional" as the alternative option so I'd argue you can pretty much use them interchangeably in this specific situation.

tshaddox4y ago

It certainly looks like a simple template that DDG applies consistently to all queries for a UTF-8 byte literal. It's the exact same template for a query for a more straightforward literal, like u0041.

So I think it's fair to say that it's not intentional in the sense of being a deliberately added easter egg. Of course, they might be aware of the behavior and decided to leave it that way.

meetups3234y ago

The hardest problem in software engineering: to close with as-designed or out-of-scope.

barbazoo4y ago

And some of us don't even get what this is about. Should I be seeing DDG doing something particular here?

dtech4y ago

The "answer" tab is right to left

barbazoo4y ago

I had that turned off. Thanks for explaining it.

stubish4y ago

Joining two pieces of text and having one destroy meaning in the other is certainly a bug, most commonly a security bug. If you look at the search results in the original link, much of the discussion involves using it to hide file extensions and similar information hiding attacks.

tedunangst4y ago

A significant portion of the problem seems to be that some people can't even identify what's going on because the tools they're using to inspect the page are also showing it reversed.

shockeychap4y ago

This! Also, https://news.ycombinator.com/item?id=21105625

iqanq4y ago

It's accidental, because other characters are also displayed: https://duckduckgo.com/?q=u20aa

sgjohnson4y ago

Yes, but this is not a printable character.

None of these will be shown, but ddg will recognise them as control characters though. https://www.compart.com/en/unicode/category/Cc

Retr0id4y ago

It's intentional, because there is no RTL override in the HTML source, the string is merely reversed.

amelius4y ago· 8 in thread

> This is often abused by hackers to disguise file extensions: when using it in the file name my-text.'U+202E'cod.exe, the file name is actually displayed as my-text.exe.doc

So every programmer has to know about and support U+202E, but not filesystem programmers?

mananaysiempre4y ago

More like UI programmers? It seems that almost everyone has agreed that text-processing smarts inside a filesystem are a bad idea (see: the NTFS collation table, the APFS transition away from ancient-version-NFD-but-not-quite), although there is that island of (admittedly very smart) -insensitive but -preserving holdouts (casing on Windows, normalization on ZFS). Linus rants on the topic[1] passionately, if not very informatively.

Note that U+202E is a control code that has effect on display, not the logical order of the text (much like, say, a bare CR), so I can’t say what the filesystem is doing wrong here (except maybe for not rejecting this outright, but see re smarts above, this probably needs to be done on a higher level). You don’t blame the filesystem for believing the filename "A\rB.txt" starts with A and not B, do you? Even though ls will say otherwise.

Bidi IRIs (which are at that higher level) are kind of horrendous, though.

[1] https://yarchive.net/comp/linux/utf8.html

tyingq4y ago

That's pretty much correct. Most of the filesystems I'm aware of just treat filenames as a "string of bytes" with some list of characters that aren't allowed, and perhaps a few other rules. Other than that, it's a free-for-all on names.

tedunangst4y ago

What do you want the filesystem programmer to do?

amelius4y ago

> What do you want the filesystem programmer to do?

Replace:

    if(bytestring_ends_with(filename, ".exe")) execute_file(...);

By:

    if(last_displayed_glyphs_equal(filename, ".exe")) execute_file(...);

account424y ago

The filesystem isn't executing anything so if anything you'd want the file manager or shell programmer to handle it. But yours is a terrible solution that would mean everyone else interacting with the filesystem to handle it too. Better to adjust the display code to treat extensions specially (if it doesn't already) and make sure that it is clear to the user what the real extension is.
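One possible shape for that display-side fix, as a Python sketch. The function name `display_name`, the `<U+XXXX>` escape style and the `[type: ...]` suffix are all invented for this example, not taken from any real file manager. It leaves the stored name alone and only changes what the user is shown.

```python
import os

# Bidi marks, embeddings, overrides and isolates (plus ARABIC LETTER MARK).
BIDI_CONTROLS = set(
    "\u200e\u200f\u061c"          # LRM, RLM, ALM
    "\u202a\u202b\u202c\u202d\u202e"  # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"    # LRI, RLI, FSI, PDI
)


def display_name(filename: str) -> str:
    """Build a label that exposes bidi controls and the real extension."""
    visible = "".join(
        f"<U+{ord(ch):04X}>" if ch in BIDI_CONTROLS else ch
        for ch in filename
    )
    # The extension is taken from the logical (stored) name.
    _, ext = os.path.splitext(filename)
    return f"{visible}  [type: {ext or 'none'}]"


print(display_name("my-text.\u202ecod.exe"))
# my-text.<U+202E>cod.exe  [type: .exe]
```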

foxfluff4y ago

    if (!isascii(c)) panic("stupid user");

mananaysiempre4y ago

  если (!кои(с)) авост(«тупой оператор»);

You wouldn’t want to live in that world, would you? I know I wouldn’t, and I have that as my native script and most of my filesystem in Latin. I’ve spent my childhood with a computer that ran a VGA-chargen-reprogramming hack at startup and later had to maintain a website stored in an encoding designed to preserve legibility after Latinization through amputation of the 8th bit (in case you’ve ever wondered where the illogical order of KOI-8 comes from). I do not want that world back, however fondly I remember my 286.

jamescodesthing4y ago

Same works for urls.

gambler4y ago· 6 in thread

Extremely bad design. This kind of complexity should have been moved to some kind of post-processing spec rather than core Unicode. It's already causing issues and will cause more. The more universal something is, the more effort should be applied to keeping it simple.

mananaysiempre4y ago

... It’s not clear how? Except by telling every speaker of Arabic and Hebrew saying they want some of that delicious “plain text” action to go screw themselves (there are no purely-RTL texts, only bidirectional ones, not least because of the Indic numerals). AFAIU (at least from the full-length horror novel that is the CDRA) IBM tried presentation-order (and no-complex-shaping) RTL text for decades and gave up, so Unicode bidi is essentially the result of said giving up (and the “Arabic Presentation Forms” block the foul-smelling corpse of the idea).

Specify the dominant direction of your user-input-containing elements, people, and/or enclose the input in U+2068 FSI ... U+2069 PDI (after balancing outstanding bidi controls inside).
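A Python sketch of the second half of that recipe: balance whatever bidi controls the input contains, then wrap it in FSI ... PDI. The function name `isolate` and the exact balancing policy (drop stray closers, append missing ones) are my assumptions, not a standard library API, and this simplifies what the Unicode Bidirectional Algorithm specifies.

```python
FSI, PDI, PDF = "\u2068", "\u2069", "\u202c"
EMBEDDING_OPENERS = {"\u202a", "\u202b", "\u202d", "\u202e"}  # LRE RLE LRO RLO
ISOLATE_OPENERS = {"\u2066", "\u2067", "\u2068"}              # LRI RLI FSI


def isolate(user_text: str) -> str:
    """Balance bidi controls in user_text, then wrap it in FSI ... PDI."""
    out = []
    stack = []  # closers still owed, innermost last
    for ch in user_text:
        if ch in EMBEDDING_OPENERS:
            stack.append(PDF)
        elif ch in ISOLATE_OPENERS:
            stack.append(PDI)
        elif ch == PDI:
            if PDI not in stack:
                continue  # a stray PDI would close our own FSI early
            # Close any embeddings opened inside this isolate first.
            while stack[-1] != PDI:
                out.append(stack.pop())
            stack.pop()
        elif ch == PDF:
            if not stack or stack[-1] != PDF:
                continue  # stray PDF, nothing to close
            stack.pop()
        out.append(ch)
    # Close whatever the input left open, innermost first.
    out.extend(reversed(stack))
    return FSI + "".join(out) + PDI


label = "Result for " + isolate("\u202eabc") + " (3 hits)"
print(repr(label))
```

With the wrapper in place, the override in the input ends at the appended PDF, so the trailing " (3 hits)" is no longer affected by it.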

gambler4y ago

> Except by telling every speaker of Arabic and Hebrew saying they want some of that delicious “plain text” action to go screw themselves

The problem is not with Arabic or Hebrew. The problem is that this modifier affects other languages and characters in a way the vast majority of people clearly wouldn't expect (otherwise the story wouldn't make it to the front page).

> Specify the dominant direction of your user-input-containing elements, people, and/or enclose the input in U+2068 FSI ... U+2069 PDI (after balancing outstanding bidi controls inside).

The level of arrogance packed in this sentence is just mind-boggling.

There are many other "Easter eggs" in various basic technologies. I can assure you that no matter how high of an opinion you have about yourself, if you write any production code at all, you are guaranteed to be using something that contains other Easter egg design decisions. You're not aware of them, you're not mitigating them and therefore whether they will explode on you is mostly just a matter of luck.

Minimizing "Easter egg" design decisions is the only long-term viable way to get complexity in our already complex environment under control.

mananaysiempre4y ago

>> Specify the dominant direction of your user-input-containing elements, people, and/or enclose the input in U+2068 FSI ... U+2069 PDI (after balancing outstanding bidi controls inside).

> The level of arrogance packed in this sentence is just mind-boggling.

It’s not arrogance, really, it’s just that I’ve been reading on this exact thing for the last couple of months, and the relevant knowledge is rather unpleasantly smeared over multiple documents in several places (W3C and Unicode.org mostly), so I tried to condense the recipe into a single sentence and drop some terms an interested person could look up: I was attempting to pack information. I see now how that could come off as arrogant, but can’t think of appropriate circumlocutions that could ward that off without turning it into a full bidi-in-HTML tutorial (which I am not qualified to write, for one thing). I already write too many unsolicited tutorials in my comments, this is me trying not to :(

> There are many other "Easter eggs" in various basic technologies. I can assure you that no matter how high of an opinion you have about yourself, if you write any production code at all, you are guaranteed to be using something that contains other Easter egg design decisions. [...]

I’m aware I have limits! I know lots of those! I discover new ones every day!

(I dread the day I need to figure out how an 802.11 retransmission works and how to fight one, for one thing. I can’t do post-2010 JS frontend to save my life, and my database knowledge is somewhere around “there were those guys with the normal form, I think?”. Limits? I’ve got ’em.)

I also expect that once I know about a footgun, I have a responsibility to avoid it, and that people who have just encountered such generally want to hear how to avoid it as well. I’m not entirely competent at the communication part. Sorry.

As to the actual issue... I could say that if you’re handling multilingual text, then you should damn well know how multilingual text works, that it’s not peripheral to your problem.

But I don’t actually believe that, not completely: I think this bidi thing is needlessly hard and we should have directional-stack-balancing and directionality-isolating functions in our standard libraries the same way we have URL-escaping or HTML-quoting ones. Perhaps even have the templating handle most of these cases automatically. It’s like with SQL injection: I don’t have a right to complain people are writing vulnerable queries if we don’t have convenient tools to write correct ones. Unfortunately, in the bidi case, we don’t, so we’ll have to treat this like spun glass until someone makes them.

(That’s part of why I’ve been looking into this so much lately.)

[Previously]

> The problem is not with Arabic or Hebrew. The problem is that this modifier affects other languages and characters in a way the vast majority of people clearly wouldn't expect (otherwise the story wouldn't make it to the front page).

As far as I know, this is not solvable. Or rather, this specific thing is, and the right-to-left override (U+202E RLO) is kind of a screw-up due to this kind of nonlocal effect on surrounding text (it might even be a holdover from the IBM days?), but you can’t design RTL such that it can be ignored by unaware programmers, with or without directional controls. Last I checked (several years ago), a post in Hebrew would wreak considerable destruction on an LTR Facebook news feed, no controls required.

The problem is of distinguishing a white zebra with black stripes from a black zebra with white stripes: Are you looking at RTL text with LTR pieces inside or LTR text with RTL pieces inside? (If you don’t see why this would change the layout, the Unicode Bidirectional Algorithm spec has examples.) What if the pieces themselves include opposite-direction quotes? How do you know where the pieces end in the presence of characters with no intrinsic direction (punctuation, emoji)?

You can encode everything in LTR display order. Your RTL-script users, DBAs, search engine developers, etc. will hate you.

You can require explicit indicators. If this needs to work in plain text (and it does, if Arabic and Hebrew are to do plain text at all, because RTL text requires embedded LTR pieces fairly often), you’ll have to express that in format controls. But then if a user manages to drop a right-to-left switch into English text, which couldn’t care less about RTL, the text will get completely messed up and the user gets to complain why RTL influences English. You may try to completely disallow controls in markup that has alternative ways of expressing directionality, but then your input method, your clipboard, etc. needs to know about every possible kind of markup, or every markup processor needs to generate equivalent controls. To at least limit the scope of the disaster, you declare that the effect of the controls ends at a paragraph boundary, but then you need to tell where that is, and the kind of “plain text” you inherited has no good way of distinguishing a mere hard line break from a paragraph terminator except by not-so-plain “protocol” conventions. So you’ll need to guess.

You can ditch explicit indicators and guess. Your processing algorithm will need to know which scripts have which direction, of course, but that’s not a problem. Given the presence of quotations and such in plain text, it’ll also need to learn about paired delimiters and which of them pair with which others, and try to recover when the pairs are wrong or unbalanced, because users are awful. Because of the aforementioned zebra problem, you’ll also need a way to guess which direction of a piece of text is the main one, which seems intractable without godlike NLP, so maybe just take the first character with a definite direction and tell people who start sentences with an opposite-direction fragment they lose? Overall, the whole guessing game becomes so complex it’s completely impossible to reliably embed an arbitrary fragment of user input inside your text unchanged (without inserting visible compensating delimiters, for example), so some kind of format controls that manipulate a stack of directions are called for.

The Unicode design does most of the above; it is complex and could undoubtedly be simpler—there’s like three generations of “no, that’s a bad idea, let’s try again” in there. But it seems like some indication from a programmer that they want to insert this inner thing, that should remain intact, into this outer thing, that shouldn’t get messed up in the process, would be required in any logical-order design at all; you won’t be able to just concatenate byte sequences. It’s acting on that indication that could stand to be easier.
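The "take the first character with a definite direction" guess mentioned above is short enough to sketch in Python. The function name `base_direction` and the `default` parameter are illustrative choices. This is a simplification: the real algorithm also skips over characters inside isolates when looking for the first strong character.

```python
import unicodedata


def base_direction(text: str, default: str = "ltr") -> str:
    """Guess paragraph direction from the first strongly directional char."""
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "ltr"
        if cls in ("R", "AL"):  # Hebrew-style and Arabic-style letters
            return "rtl"
    return default  # only digits, punctuation, spaces, etc.


hebrew = "\u05e9\u05dc\u05d5\u05dd"   # four Hebrew letters
arabic = "\u0633\u0644\u0627\u0645"   # four Arabic letters

print(base_direction("Hello " + hebrew))   # ltr
print(base_direction(hebrew + " Hello"))   # rtl
print(base_direction("... " + arabic))     # rtl
print(base_direction("123 !?"))            # ltr (fell back to the default)
```

The first two calls are the zebra problem in miniature: the same two pieces of text get a different base direction depending only on which one comes first.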

kevin_thibedeau4y ago

There is boustrophedonic ancient Greek and other languages. Unicode is kept generalized to support such schemes.

https://en.wikipedia.org/wiki/Boustrophedon

the_mitsuhiko4y ago

I strongly disagree. This is a necessary part to shared content text and pushing this type of functionality into another layer makes a lot of content non accessible in basic text format. This is precisely the type of control character that makes Unicode such a powerful and successful system.

sgjohnson4y ago

Bad design by what definition? Unicode is all about unifying ALL the characters from ALL the character sets into a single one, while also being compatible with 7 bit ASCII.

Emojis made it into Unicode because Japanese had custom emoticons, that just had to be brought into Unicode. Then someone discovered them on iOS and they skyrocketed in popularity.

If you want everyone to use Unicode, you truly have to account for everyone's use cases. No exceptions. Even if it means including Emojis, Ancient Egyptian hieroglyphs[0], or such an irrelevant thing for every language using the latin script as a "RTL override character".

[0] https://en.wikipedia.org/wiki/Egyptian_Hieroglyphs_(Unicode_...

I'd say it's perfect design, with pretty good implementation too.

Sebb7674y ago· 4 in thread

I'm not sure whether this is a bug or a feature^Weaster egg

pwdisswordfish94y ago

Oversight, probably. By default, the code point is displayed next to that description, and they don’t turn that off for bidirectional control characters.

https://duckduckgo.com/?q=u1f4a9

(Yes, I have that one memorized)

zanderwohl4y ago

If you look down the page, some preview elements are also reversed. I think this may be accidental.

BitwiseFool4y ago

I'm out of the loop, what kind of Easter Egg is it?

brimble4y ago

The text in the instant-answer bar is reversed for this result. Which could plausibly either be on purpose, or a result of the character itself being inserted and not escaped, so having its intended effect.

tobz10004y ago· 4 in thread

Easter egg or bug?

oneplane4y ago

bug egg? it's also an instant answer from the community (the little info icon on the right hand side) so perhaps just presented that way due to how it was delivered by that specific community member.

Waterluvian4y ago

Poe's Law applied to coding easter eggs? :D

zeepzeepOP4y ago

That's the question!

(I think it's unintended though)

rackjack4y ago

Easter bug?

jfk134y ago· 3 in thread

Similarly, if I try https://www.google.com/search?q=u202e, the second result I currently get (YMMV) is from https://unicode-table.com/, and almost the entire snippet shows up backwards in the search results.

jefftk4y ago

It's backwards on the original too: https://unicode-table.com/en/202E/

dmoy4y ago

Yup the meta description field as written is flipped in the serp

What is even more hilarious is if you copy/paste out of the developer tools, that is also backwards after pasting.

kathoum4y ago

https://unicodeplus.com/U+202E too. You can see the point where it switches from Left-To-Right to Right-To-Left.

Jerrrry4y ago· 3 in thread

Stacking combining diacritics[1] is also fun, to make extremely tall text.

Also fun is enumerating all the characters in the Private Character section[2] to see what UI symbols are able to be inserted into unintended places.

[1] https://www.unicode.org/charts/PDF/U0300.pdf

[2] http://www.unicode.org/faq/private_use.html https://www.unicode.org/charts/PDF/UE000.pdf
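A quick Python sketch of the stacking trick and of one way a site might tame it. The function name `limit_combining` and the default cap of two marks per base character are arbitrary choices for illustration. A cap like this can damage legitimate text in scripts that rely on several marks, so treat it as a demonstration rather than a recommendation.

```python
import unicodedata


def limit_combining(text: str, max_marks: int = 2) -> str:
    """Keep at most max_marks consecutive combining marks (Mn/Me)."""
    out = []
    run = 0  # length of the current run of combining marks
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me"):
            run += 1
            if run > max_marks:
                continue  # drop the excess mark
        else:
            run = 0
        out.append(ch)
    return "".join(out)


# "e" followed by ten COMBINING ACUTE ACCENT marks stacks very tall.
tall = "e" + "\u0301" * 10
print(len(tall))                   # 11
print(len(limit_combining(tall)))  # 3
```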

hanche4y ago

> Stacking combining diacritics is also fun, to make extremely tall text.

A bit OT, but here is a classic example of that (the much upvoted stack overflow post on parsing html with regex):

https://stackoverflow.com/a/1732454

cbarrick4y ago

https://en.wikipedia.org/wiki/Zalgo_text

zeepzeepOP4y ago

I always wondered how people get these funny Twitter names, thx!

d134y ago· 3 in thread

Can anyone explain what this is all about? I’m looking at the link and threads and have absolutely no idea what’s supposed to be significant here

perlgeek4y ago

the Unicode codepoint with hex value 202E says "from here on, render the rest of the text from right to left" (something that's useful for Arabic scripts, for example).

DuckDuckGo shows info about the codepoint and the codepoint itself in a box between the search field and the actual results, and in it, the text is rendered reversed (right to left), because that's what the codepoint tells the browser to do (and DDG doesn't have extra logic yet to either inject another "now render from left to right again" marker, or otherwise prevent it from messing up the info box).

d134y ago

Thank you!!

cryptojournal4y ago

Hahahah #metoo!

bkmeneguello4y ago· 2 in thread

https://xkcd.com/1137/

blunte4y ago

Well done. I don't understand this TFA, nor do I understand (fully) the xkcd cell. But I get the connection. Thanks :)

cipheredStones4y ago

U+202E is a Unicode codepoint, a control character that signals that letters (or other characters) should be printed right-to-left, as in Arabic or Hebrew.

What the DDG link illustrates is that when someone searches for information about that codepoint, DDG's autogenerated answer section accidentally _uses_ that control character (reversing the answer text) instead of just printing the codepoint.

soheil4y ago· 2 in thread

What's next, searching for the word death causes you to die?

chris_wot4y ago

That would be an interesting instant answer.

soheil4y ago

Gives a whole other meaning to “I’m feeling lucky!”

soheil4y ago· 2 in thread

Where does DDG get its search results? Do they scrape Google? If so, how do they not get banned, both technically and legally?

sp3324y ago

https://help.duckduckgo.com/duckduckgo-help-pages/results/so...

thesuitonym4y ago

They have their own web crawlers, as well as a deal with Bing (And perhaps others)

kroltan4y ago· 2 in thread

It's intentional, if you inspect the `innerText` you'll see it's reversed there too:

    zero_click_wrapper.innerText.codePointAt(0)

Evaluates to 32. And if you think 32 = 0x20 could mean the next one would be 0x2E, then no, codePointAt(1) is 0x55.

nneonneo4y ago

`innerText` doesn't include the RTL marker, probably due to the fact that it is supposed to reflect the "rendered" appearance of the element (i.e. deleting certain invisible characters). However, `textContent` shows the RTL marker as expected.

I'm on the side of this being an unintentional effect.

missblit4y ago

> `innerText` doesn't include the RTL marker

I'm too under the weather to dig into this, but this might be a mismatch between Firefox and the spec. I don't see in the spec [1][2] where this character could be removed since it shouldn't count as whitespace for whitespace processing.

It looks like in Chrome `innerText` contains the override. And the innerText spec is only 6 or so years old (!) so it wouldn't be too surprising if there were a lingering incompatibility.

[1] https://html.spec.whatwg.org/multipage/dom.html#the-innertex... [2] https://drafts.csswg.org/css-text/#white-space-processing

echelon4y ago· 1 in thread

You still have to be mindful of \u202e in anything new that you're writing, but browsers do a much better job of not having it bleed across elements like they did back in the 2000s.

Back in the era of forums that didn't support unicode correctly (2005ish?), it was trollish fun to post messages containing \u202E and watch the UI and all subsequent messages and elements get messed up. (One stray \u202E would flip the entire page contents following it.) I never took it to a level of abuse since it was easy to remove and then ban offenders, but it was fun in a one-off thread, and it always had great reactions.

I patched my own software to handle it, but I don't recall anyone really abusing it in a widespread manner. (Contrast this with the era of prolific and widely abused AOL/AIM exploits that would kill your IM client with malformed messages.)

IIRC, a bunch of messaging clients also didn't (or still don't) handle \u202e termination and it sometimes bled into new messages and even the text input box. That was pretty horrible and unfixable without restarting.

Obligatory XKCD: https://xkcd.com/1137/

Some shenanigans in the wild:

https://www.reddit.com/r/Unicode/comments/hc1rxi/i_put_a_rig...

https://twitter.com/mkolsek/status/1237123571341803522

(These are way tamer than the effects used to be.)

(Also, HN filters it out. I tried to have some fun. :P)
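
A rough sketch of the kind of patch that stops an override from bleeding into whatever follows a message: close every directional control the text leaves open and drop closers that have nothing to close. This is only an illustration under my own assumptions, not how any particular forum or IM client does it, and `balanceBidiControls` is a hypothetical name.

```javascript
const PDF = "\u202C"; // POP DIRECTIONAL FORMATTING, closes LRE/RLE/LRO/RLO
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE, closes LRI/RLI/FSI
const EMBEDDING_OPENERS = new Set(["\u202A", "\u202B", "\u202D", "\u202E"]);
const ISOLATE_OPENERS = new Set(["\u2066", "\u2067", "\u2068"]);

// Hypothetical helper: returns text whose bidi controls are all terminated.
function balanceBidiControls(text) {
  const open = []; // closers still owed, innermost last
  let out = "";
  for (const ch of text) {
    if (EMBEDDING_OPENERS.has(ch)) {
      open.push(PDF);
      out += ch;
    } else if (ISOLATE_OPENERS.has(ch)) {
      open.push(PDI);
      out += ch;
    } else if (ch === PDF) {
      // Keep a PDF only if it closes the innermost open embedding/override.
      if (open[open.length - 1] === PDF) {
        open.pop();
        out += ch;
      }
    } else if (ch === PDI) {
      // A PDI closes the nearest open isolate and anything opened inside it.
      const idx = open.lastIndexOf(PDI);
      if (idx !== -1) {
        open.length = idx;
        out += ch;
      }
    } else {
      out += ch;
    }
  }
  // Append the missing closers, innermost first.
  return out + open.reverse().join("");
}

console.log(balanceBidiControls("\u202Etrolling") === "\u202Etrolling\u202C"); // true
```

The message itself still renders reversed, but the effect ends where the message ends.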

ocdtrekkie4y ago

I have seen it used maliciously in the wild. Email attachments like invoice.tab.pdf are actually Batch files named invoice.fdp.bat with this character inserted.
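
To make the trick concrete, here is a small sketch under stated assumptions: `naiveDisplayedName` just reverses everything after the first U+202E, which is a crude stand-in for real bidi layout (the actual rules are in UAX #9), and both function names are hypothetical. It shows why the attachment looks like a PDF and how a mail filter could flag such names.

```javascript
const RLO = "\u202E"; // RIGHT-TO-LEFT OVERRIDE

// Explicit embedding, override and isolate controls: U+202A..U+202E, U+2066..U+2069.
const BIDI_CONTROLS = /[\u202A-\u202E\u2066-\u2069]/;

// Hypothetical check a filter might run on attachment names.
function hasBidiControls(name) {
  return BIDI_CONTROLS.test(name);
}

// Crude approximation of what a user sees: text after the first RLO, reversed.
function naiveDisplayedName(name) {
  const i = name.indexOf(RLO);
  if (i === -1) return name;
  const tail = Array.from(name.slice(i + 1)).reverse().join("");
  return name.slice(0, i) + tail;
}

const attachment = "invoice." + RLO + "fdp.bat";
console.log(naiveDisplayedName(attachment)); // invoice.tab.pdf
console.log(attachment.endsWith(".bat"));    // true
console.log(hasBidiControls(attachment));    // true
```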

stubish4y ago· 1 in thread

Our programming languages might need a unicode aware string concatenation operator, similar to locale aware capitalization. Joining LTR text to RTL text seems like it should result in combined LTR + RTL text, not letting the LTR marker override and change meaning.

account424y ago

It does look like HTML supports this via the <bdo> tag [0]:

  data:text/html,<bdo>&%23x202E;reversed</bdo>&nbsp;not reversed

So I guess this should be used to wrap any user-supplied text that allows arbitrary unicode.

Or using Unicode:

  data:text/html,&%23x2068;&%23x202E;reversed&%23x2069;&nbsp;not reversed

[0] https://developer.mozilla.org/en-US/docs/Web/HTML/Element/bd...
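
The Unicode variant above can be sketched as a concatenation helper, roughly the "unicode aware" join the parent asks for. The names `isolate` and `joinIsolated` are hypothetical. Each piece is wrapped in U+2068 FIRST STRONG ISOLATE ... U+2069 POP DIRECTIONAL ISOLATE, so an override opened inside one piece ends at that piece's closing PDI.

```javascript
const FSI = "\u2068"; // FIRST STRONG ISOLATE
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE

// Wrap one piece of text so its direction cannot leak into its neighbours.
function isolate(text) {
  return FSI + text + PDI;
}

// Concatenate pieces, isolating each one first.
function joinIsolated(...parts) {
  return parts.map(isolate).join("");
}

const joined = joinIsolated("\u202Ereversed", " not reversed");
console.log(joined === "\u2068\u202Ereversed\u2069\u2068 not reversed\u2069"); // true
```

One caveat: a piece that contains its own stray U+2069 would close the isolate early, so untrusted input probably needs its controls balanced or stripped before wrapping.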

TadeusTaD4y ago· 1 in thread

Instantly reminded me of a relevant xkcd: https://xkcd.com/1137/

zeepzeepOP4y ago

Hey that's new to me, I'll use this, thanks.

benbristow4y ago

Reversed: U+202E RIGHT-TO-LEFT OVERRIDE, decimal: 8238, HTML: No visual representation, UTF-8: 0xE2 0x80 0xAE, block: General Punctuation
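
Those numbers are easy to double-check. A short sketch using the standard `TextEncoder` (available in browsers and recent Node.js); the variable names are just for illustration.

```javascript
const rlo = "\u202E"; // RIGHT-TO-LEFT OVERRIDE

// Decimal value of the code point.
const decimal = rlo.codePointAt(0);

// UTF-8 bytes, formatted like the instant answer shows them.
const utf8 = Array.from(
  new TextEncoder().encode(rlo),
  (byte) => "0x" + byte.toString(16).toUpperCase().padStart(2, "0")
);

console.log(decimal);        // 8238
console.log(utf8.join(" ")); // 0xE2 0x80 0xAE
```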

bncy4y ago

Umm, there's a little info button to the right that says that this 'quick' answer was proposed by a DuckDuckHack community author.

splch4y ago

Oh that's cute! Translation for anyone curious / lazy:

Punctuation General :block ,0xAE 0x80 0xE2 :8-UTF ,representation visual No :HTML ,8238 :decimal ,OVERRIDE LEFT-TO-RIGHT 202E+U

Love the demos :)

avnigo4y ago

The funny thing is that search queries preceded by a backslash on DuckDuckGo are supposed to take you to the first search result, but that functionality seems to be buggy anyway:

https://www.reddit.com/r/duckduckgo/comments/sp9e5r/backslas...

thecosmicfrog4y ago

Reminds me of searching for the terms "do a barrel roll", "recursion" or "askew" on Google. I'm sure there's plenty of others.

chris_wot4y ago

"This Instant Answer was made by the DuckDuckHack Community.

Developer: Cosimo Streppone

Developer: mintsoft"

ryukoposting4y ago

And somehow, the "external link" icon is outside the scope of Unicode.

f3rnando4y ago

Also known as "Top Gun"

dheera4y ago

‮

damnit hn

heartbeats4y ago

Why can't I just disable RTL on my system?

I do not speak a word of Arabic. There is no circumstance in which my life will be materially improved by correct RTL text rendering. I might want proper display of individual characters so I can copy-paste them, but I have no use for RTL text.

On the other hand, RTL causes a lot of unpleasant problems like this. Why can't I simply coerce all foreign languages into LTR?
