Byte Order Marks have stolen hours and days of my life. Anyone suffering the pain of developing on a Windows box can relate. Windows puts BOMs at the front of every file by default. Windows programs therefore silently ignore them, but then Linux machines run the program and choke on the BOM. You have to specifically ask the editor whether the BOM is even there; it doesn't show up by default. I have specific lines in my .vimrc[1] that prevent BOMs from ruining my day/week, but they still pop up often. I often joke there will be a byte order mark on my tombstone, along with avahi daemon.
1: https://git.sr.ht/~djha-skin/dotfiles/tree/main/item/dot-con...
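For anyone who would rather check from a script than ask the editor, here is a minimal sketch of detecting and stripping a leading UTF-8 BOM at the byte level. The function name `strip_utf8_bom` is made up for illustration; `codecs.BOM_UTF8` is the standard-library constant for the bytes EF BB BF.

```python
import codecs


def strip_utf8_bom(data: bytes) -> tuple[bytes, bool]:
    """Return (data without a leading UTF-8 BOM, whether one was found).

    Hypothetical helper: only the UTF-8 BOM (EF BB BF) is handled here.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):], True
    return data, False
```

When decoding to text anyway, Python's built-in `utf-8-sig` codec does the same thing: it drops a leading BOM if one is present and otherwise behaves like plain UTF-8.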
Me too, to some degree. I have discovered them in a Ruby code base at work, in the middle of a line of code (copy-pasted), where the Ruby interpreter treats them as undeclared identifiers. Every time the code runs, it raises an exception complaining of “Undeclared identifier `‘”.
The dad-joke of it is that “You gotta sweep for BOMs before they blow up your code.”
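A rough sketch of what such a sweep could look like, assuming the source has already been decoded to a string. The name `find_boms` is hypothetical; the only fact relied on is that the BOM decodes to the code point U+FEFF.

```python
BOM = "\ufeff"  # the byte order mark as a decoded code point


def find_boms(text: str) -> list[tuple[int, int]]:
    """Return 1-based (line, column) positions of every U+FEFF in text."""
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        col = line.find(BOM)
        while col != -1:
            hits.append((line_no, col + 1))
            col = line.find(BOM, col + 1)
    return hits
```

For example, `find_boms("\ufeffa = 1\nb = \ufeff2\n")` reports one hit at the start of line 1 and one in the middle of line 2, which is the copy-paste case that is hardest to spot by eye.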
I'm sure there were good reasons that BOM sounded like the right idea at Microsoft, but everyone else just used straight UTF-8 and it was fine.
In 1996, it was realized that 16 bits weren't enough, so the code space was expanded in Unicode 2.0, which also introduced UTF-16, a variable-width encoding that relies on the BOM to signal byte order.
Windows 2000 supported UTF-16 on release.
Why didn't Windows 2000 support UTF-8, which was invented in 1992 and implemented in Plan 9 that same year? Who can say...
Tell us more!
It was a week or two before I finally went to my friend and said, "I must be stupid, but I can't do this; it's not working." He just disabled the avahi daemon and everything started working again.
Blarg.
Sounds like that is a good choice for the option name
They always end up +0-0 - see:
Presumably the plagiarism system was just looking for exact matches of long substrings.
I hope it returns the copied string.
String "" is plagiarised
It determines the end-of-line format, tabs, BOM, and NUL characters:
Also, does it detect files that only contain CR as EOL characters? Or files that have different EOL characters on different lines?
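To make the CR-only and mixed-EOL cases concrete, here is a small sketch of that kind of check. It is not the tool mentioned above; `inspect_bytes` and its result keys are invented for illustration, and it only looks for a UTF-8 BOM (a UTF-16 file would show up here mostly as a pile of NUL bytes).

```python
import codecs


def inspect_bytes(data: bytes) -> dict:
    """Summarize BOM, end-of-line style, tabs, and NUL bytes in raw data.

    Hypothetical helper. "eol" is one of "CRLF", "LF", "CR", "mixed",
    or "none" when the data contains no line endings at all.
    """
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # LF not preceded by CR
    cr = data.count(b"\r") - crlf  # CR not followed by LF
    kinds = [name for name, n in (("CRLF", crlf), ("LF", lf), ("CR", cr)) if n]
    if not kinds:
        eol = "none"
    elif len(kinds) == 1:
        eol = kinds[0]
    else:
        eol = "mixed"
    return {
        "bom": data.startswith(codecs.BOM_UTF8),
        "eol": eol,
        "tabs": data.count(b"\t"),
        "nuls": data.count(b"\x00"),
    }
```

Counting CRLF pairs first and subtracting them from the bare LF and CR totals is what lets a file with only CR line endings, or with different endings on different lines, be told apart from the usual two cases.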