Obscure indeed, I've never seen them used for anything other than hiding malicious content. Curious, I read about them on Wikipedia[1] and it turns out they were originally created to support resource forks in Services for Macintosh. Browsers also use them to flag files downloaded from the internet.
[1] https://en.wikipedia.org/wiki/NTFS#Alternate_data_streams_.2...
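For anyone curious what that download flag looks like: the marker is a small INI-style text stored in a stream named `Zone.Identifier`. Here is a minimal sketch of reading it, assuming you have already obtained the stream's text somehow. The `SAMPLE` string and the function name `parse_zone_id` are made up for illustration.

```python
import configparser

# Typical shape of a Zone.Identifier stream; ZoneId 3 means the Internet zone.
SAMPLE = "[ZoneTransfer]\nZoneId=3\n"


def parse_zone_id(stream_text: str) -> int:
    """Return the ZoneId recorded in the text of a Zone.Identifier stream."""
    # Interpolation is disabled so that '%' in any extra fields is left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(stream_text)
    return parser.getint("ZoneTransfer", "ZoneId")


if __name__ == "__main__":
    print(parse_zone_id(SAMPLE))
```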
Streams and resource forks are a play on a now-standard UNIX feature that almost nobody uses because it has a shitty non-file based API that also breaks most tools unless they are specifically aware of them: extended attributes. Resource forks and extended attributes are almost equivalent in every single way, except that extended attributes can only be read/written atomically (limiting their size to strings that will fit in RAM), whereas a fork or stream can be opened like a regular file. Stick that in your pipe and smoke it, UNIX sycophants, another case where Windows is more UNIX than UNIX ;)
The file-or-directory vagueness created by the hierarchy of resources buried within a file also more closely maps how the most popular path naming scheme on the planet (URLs) works: a URL can always represent both a file and a collection simultaneously, so I see this as closer to an ideal than the alternative where files can have no children at all. Sadly nobody actually uses these APIs like that, because all our tooling sucks so bad at coping with it. I sometimes wonder what the world would look like if directories on popular operating systems had simply been made 0 byte files.
Mind you, OS X makes extensive use of extended attributes in addition to resource forks (and it's largely deprecated resource forks in favor of app folders). Spend some time poking around Siracusa's reviews (since Tiger); he loves to go into detail about every new way Apple makes use of extended attributes.
Also, it's not fair to say that almost nobody uses them. Chrome makes use of extended attributes, as does KDE's metadata system and a few other things.
> (limiting their size to strings that will fit in RAM)
That's an understatement. The Linux kernel API limits the size of all extended attributes to 64KB, and the most popular filesystems limit them further to 4KB. That's not really comparable to a true fork.
ZFS is the exception: its extended attributes are implemented as forks, and the maximum size of an extended attribute is the same as that of a file. Unfortunately, those aren't accessible on ZOL because the kernel won't support it, so you can really only take advantage of it on Solaris/Illumos (and maybe FreeBSD?).
- Unix xattrs have a terrible API and awful command line tools: listxattr(2) returning \0-separated character arrays with lists of attributes that are next to impossible to decipher in C? - check! Hiding certain xattrs by default based only on their names? - check!
- xattrs have magical qualities based on their names, the kernel version, the kernel configuration, and the filesystem mount options (eg. "security.selinux", "trusted.*")
- Some xattrs are \0 terminated (and the APIs set and return the \0, making them very awkward to use from shell scripts), some aren't, and some are indeterminate. They can also be binary blobs.
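To make the listxattr(2) complaint concrete, here is a rough sketch of what a caller has to do with the returned buffer: split it on NUL bytes and then look at the name prefix to work out which namespace applies. The helper names are hypothetical, and the sample buffer is hand-written rather than read from a real file.

```python
def split_xattr_names(buf: bytes) -> list:
    """Split a listxattr(2)-style buffer into attribute names."""
    # Every name is followed by a NUL, so the last split piece is empty.
    return [
        name.decode("utf-8", "surrogateescape")
        for name in buf.split(b"\0")
        if name
    ]


def group_by_namespace(names) -> dict:
    """Group names by the prefix before the first dot (user, security, ...)."""
    groups = {}
    for name in names:
        namespace, _, rest = name.partition(".")
        groups.setdefault(namespace, []).append(rest)
    return groups


if __name__ == "__main__":
    raw = b"user.comment\0security.selinux\0user.origin\0"
    print(group_by_namespace(split_xattr_names(raw)))
```

On Linux, Python's `os.listxattr` already does the splitting for you, which is part of why this is less painful from a scripting language than from C.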
> another case where Windows is more UNIX than UNIX
Windows has extended attributes too. Having both features makes it more like a kitchen sink.
http://docs.oracle.com/cd/E23824_01/html/821-1474/fsattr-5.h...
To be fair, it's not used by a ton of things, since it requires NTFS, disappears when files are moved to different filesystems, and various things that read and write files destroy them if they're not careful, not to mention actually enumerating the streams is tricky, last I checked.
It's a nifty feature but I'll admit NTFS is really obscure at times.
Another obscure feature of NTFS is Transactional NTFS which I'd never heard of until recently.
https://msdn.microsoft.com/en-us/library/windows/desktop/aa3...
The problem was that the original Macintosh had limited memory and only a floppy disk, and the implementation of writing to the resource fork wasn't very good. Many programs wrote to their own resource fork for preferences and such. The tree structure wasn't updated fully until the program was closed, because writing to the floppy was so slow. If the program exited abnormally, the resource fork's links were broken. This gave the resource fork approach a bad reputation.
Since Windows programs had to run on DOS, which didn't have resource forks, Windows never used this much. Windows put non-code assets in the executable as read-only objects.
NT, which was supposed to do everything (originally it had POSIX and OS/2 compatibility, and ran on MIPS, Alpha, and x86) added generalized support for resource forks, just in case. But since most applications were written for Windows 3.1/95/ME, they didn't use those facilities.
So that's how we got here.
Not to mention in many cases on the original Macs, you probably didn't even have the program floppy in the drive when you were working, because with only 400K on a disk you had to swap to the disk with your document on it.
I recall Inside Macintosh had a big disclaimer at the top that warned "The Resource Manager IS NOT A DATABASE". It was originally just meant to handle localizable resources, but since it was already there it was handy for developers (including Apple themselves) to use to load any kind of structured data. And who didn't love going messing around in system and application files with ResEdit?
It seems they are not used anymore since SQL Server 2014.
See for example http://www.sqlskills.com/blogs/paul/issues-around-dbcc-check...
We had a system that generated millions of images and needed to be sure that from one version to the next the images produced by a given request were the same, and also have some diagnostic data in case of problematic images. The images could be either JPG or PNG and we needed a unified way to associate arbitrary metadata with them.
We had a special mode that would store an equivalent of the request in an alternate data stream of the image. When a problem was detected we would open the alternate data stream and test the request manually.
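A minimal sketch of that idea, not our actual code: on NTFS, opening `name:stream` addresses an alternate data stream of `name`, so ordinary file I/O is enough. The stream name `request` and the helper names are assumptions for illustration. On a filesystem without streams (for example on Linux) the same call just creates a separate file with a colon in its name, and a one-letter relative file name would be mistaken for a drive letter on Windows.

```python
from pathlib import Path


def stream_path(image: Path, stream: str) -> str:
    """Build the 'file:stream' name that NTFS treats as an alternate stream."""
    return f"{image}:{stream}"


def save_request(image: Path, request_text: str, stream: str = "request") -> None:
    """Attach the request that produced the image, leaving its bytes alone."""
    with open(stream_path(image, stream), "w", encoding="utf-8") as handle:
        handle.write(request_text)


def load_request(image: Path, stream: str = "request") -> str:
    """Read the stored request back for manual testing."""
    with open(stream_path(image, stream), encoding="utf-8") as handle:
        return handle.read()
```

The nice part is that JPG and PNG are handled identically, since nothing is written into the image data itself.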
...if your miscreant is technically illiterate and only uses NTFS.
Is that where that annoying shit comes from? Good to know. When firefox kills off DownThemAll I will then use a FAT partition to store downloaded files (and see if I can force the temporary files to go there too).
Every application on my machine was downloaded from the internet. Even the OS, after the first upgrade. That's not what is dangerous.
REGEDIT4
;https://support.microsoft.com/en-us/kb/889815
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Environment]
"SEE_MASK_NOZONECHECKS"="1"
;https://technet.microsoft.com/en-us/library/cc783259
[HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\Download]
"CheckExeSignatures"="no"
"RunInvalidSignatures"=dword:00000001
;https://support.microsoft.com/kb/883260
[HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Policies\Associations]
"LowRiskFileTypes"=".zip;.rar;.nfo;.txt;.exe;.bat;.com;.cmd;.reg;.msi;.htm;.html;.gif;.bmp;.jpg;.avi;.mpg;.mpeg;.mov;.mp3;.m3u;.wav;"
"DefaultFileTypeRisk"=dword:00001808
[HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Policies\Attachments]
"SaveZoneInformation"=dword:00000001
Do you never download anything bigger than or equal to 4 GiB?
It's actually really common, for someone who's used *nix for ages and expects every character to be a valid filename.
I also use them when naming stuff with hashes or UUIDs. Not having colons in filenames seems just weird to me.
Heck, even on Unix I'm annoyed that I can't simply escape slashes! It'd be nice to name files with the URLs they are taken from.
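One workaround, sketched here with Python's standard library, is to percent-encode the whole URL so that neither slashes nor colons survive in the file name. The function names are made up, and this ignores file name length limits, which long URLs can easily hit.

```python
from urllib.parse import quote, unquote


def url_to_filename(url: str) -> str:
    """Percent-encode everything except letters, digits and '_.-~'."""
    return quote(url, safe="")


def filename_to_url(name: str) -> str:
    """Reverse url_to_filename."""
    return unquote(name)


if __name__ == "__main__":
    print(url_to_filename("https://example.com/a/b?x=1"))
    # -> https%3A%2F%2Fexample.com%2Fa%2Fb%3Fx%3D1
```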
Besides the other responses, colon is a standard path separator on URIs. If you need more than one kind of them (the obvious one being a slash), the colon is often the most reasonable option. And if you decide to save data on disk, with parts of the URI as the file name (which is also very reasonable)...
Probably the main reason this problem does not pop up everywhere is that people hacking on completely new tools rarely do that on Windows. And when they port, it gets hidden among a hundred other little incompatibilities.
I use software that sometimes (but mostly not) needs files in dos 8.3. Because of this people seem to think it a good idea to use really short acronymed file names as a matter of course. If it makes sense to use a special character then people should be able to.
I actually wish colons were supported, since it's so prevalent in document titles. Question marks, too, while at it.
http://lists.opensuse.org/opensuse-buildservice/2008-12/msg0...
This causes a problem even in the POSIX environment, because the colon is used in PATH and PATH-like environment variables. Usually there is no escape mechanism.
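A tiny illustration of the PATH issue, using a made-up directory name: once the entries are joined with colons there is no way to tell a separator from a colon inside a directory name.

```python
def split_search_path(value: str, sep: str = ":") -> list:
    """Split a PATH-like variable the way most tools do: on every separator."""
    return value.split(sep)


if __name__ == "__main__":
    # "/opt/tool:v2/bin" is a hypothetical directory with a colon in its name.
    joined = ":".join(["/usr/bin", "/opt/tool:v2/bin"])
    print(split_search_path(joined))
    # -> ['/usr/bin', '/opt/tool', 'v2/bin']
```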
Twitter does
https://msdn.microsoft.com/en-us/library/aa365247(VS.85).asp...
They suggest avoiding <>:"/\|?* as well as all ASCII characters 0-31.
ASCII 0 can be really fun. Lots of filesystem APIs deal with NUL-terminated strings (like, all of POSIX) so a zero byte in the middle of your string just truncates it at that point. If you use something that tolerates zero bytes for your UI strings (like NSString on the Mac, maybe C++ UI frameworks dealing with std::string) then the full string may show in the UI and you just mysteriously get a filename that's shorter on disk than what you see on screen.
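The truncation is easy to reproduce. This sketch uses `ctypes` to show what a NUL-terminated C API would see for a name with an embedded zero byte; the helper name is made up. Python's own `open()` sidesteps the problem by raising `ValueError` for such names.

```python
import ctypes


def as_c_string(name: bytes) -> bytes:
    """Return the bytes a NUL-terminated C API would treat as the name."""
    buffer = ctypes.create_string_buffer(name)
    # .value stops at the first NUL; .raw would return the whole buffer.
    return buffer.value


if __name__ == "__main__":
    print(as_c_string(b"report\0final.txt"))
    # -> b'report'
```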
If you're in a position to enforce well-formed Unicode on all platforms, you're much better off. But many things (e.g. backup systems) don't get the option to just refuse files they don't like.
There is a very important takeaway of this: case-sensitivity. UNIX cannot be case-insensitive for file names because the mapping of lowercase to uppercase characters is dependent on the character encoding used, which it doesn't know. Windows can (and does) coalesce case for file names because it knows the character set in use and can consult the relevant mapping.
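Here is a small demonstration of why the encoding matters, using two single-byte encodings as examples. The same byte 0xC4 is an uppercase letter in Latin-1 but a lowercase letter in KOI8-R, so a filesystem that only sees bytes cannot know which case mapping to apply. The helper name is made up.

```python
def lower_bytes(raw: bytes, encoding: str) -> bytes:
    """Lowercase a byte string under the assumption of a given encoding."""
    return raw.decode(encoding).lower().encode(encoding)


if __name__ == "__main__":
    name = b"\xc4"
    # Latin-1: 0xC4 is an uppercase A with diaeresis, lowercased to 0xE4.
    print(lower_bytes(name, "latin-1"))
    # KOI8-R: 0xC4 is already a lowercase Cyrillic letter, so it is unchanged.
    print(lower_bytes(name, "koi8_r"))
```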
This difference in behavior produces all sorts of frustrating behavior when interacting between the two platforms, e.g. the classic case of Windows SMB mounting a share from a *nix server that contains two files differentiated only by case. It'll show both entries but think they both point to the same thing. On the other hand, it's easy to create file names on a Windows device that are near impossible to name on *nix. These are important things to be aware of if you ever implement a cross-platform network user environment.
I kind of wonder if paths not being allowed to contain NUL or '/' was one reason why, for code points that take more than one byte in UTF-8 (i.e. all non-ASCII code points), every byte has its most significant bit set to 1 (https://en.wikipedia.org/wiki/UTF-8#Description). This makes it impossible for a multi-byte sequence to contain valid ASCII characters like `/`.
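The property itself is easy to check by brute force. This sketch walks a range of code points and confirms that no multi-byte UTF-8 sequence contains a byte below 0x80; the function name is made up.

```python
def multibyte_sequences_avoid_ascii(code_points) -> bool:
    """True if no multi-byte UTF-8 sequence contains a byte below 0x80."""
    for code_point in code_points:
        if 0xD800 <= code_point <= 0xDFFF:
            continue  # surrogates cannot be encoded as UTF-8
        encoded = chr(code_point).encode("utf-8")
        if len(encoded) > 1 and any(byte < 0x80 for byte in encoded):
            return False
    return True


if __name__ == "__main__":
    print(multibyte_sequences_avoid_ascii(range(0x80, 0x20000)))
```

So a byte-oriented kernel can scan a UTF-8 path for `/` and NUL without decoding anything.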
Note that macOS actually does decomposing Unicode normalisation on file names, I guess because it makes handling case-insensitivity easier. (Just doing ASCII case insensitivity also handles o+diaeresis, but not the ö codepoint) https://developer.apple.com/library/mac/qa/qa1235/_index.htm...
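A sketch of the parenthetical, using standard NFD from `unicodedata` as a stand-in (I'm assuming it is close enough to what the filesystem does for this one character). ASCII-only case folding fails on the precomposed ö/Ö pair but works once both are decomposed. The helper names are made up.

```python
import unicodedata


def decomposed(name: str) -> str:
    """Canonical decomposition (NFD) of a file name."""
    return unicodedata.normalize("NFD", name)


def ascii_lower(name: str) -> str:
    """Lowercase only ASCII letters, leaving everything else untouched."""
    return "".join(ch.lower() if ch.isascii() else ch for ch in name)


if __name__ == "__main__":
    upper, lower = "\u00d6", "\u00f6"  # precomposed O/o with diaeresis
    print(ascii_lower(upper) == lower)                          # precomposed
    print(ascii_lower(decomposed(upper)) == decomposed(lower))  # decomposed
```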
/ and 0x00 for unix
:?"<>/|\* and chars 0x00 .. 0x1F for windows
'~!#$&%^; if there's a chance of filename being passed to shell w/o proper escaping.
Windows also forbids a bunch of filenames matching regex "CON|AUX|PRN|NUL|COM[1-9]|LPT[1-9]"
Also, ending filenames with space or period really messes up windows. File explorer can see it, but can't delete or rename it.
edit: fixed markup
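The rules above fit in a short checker. This is a sketch based only on the list in this thread plus the fact that the reserved device names also apply when an extension follows (e.g. NUL.txt). The function name is made up and it is not an exhaustive validator.

```python
import re

FORBIDDEN_CHARS = set('<>:"/\\|?*')
# Device names are reserved with or without an extension, in any case.
RESERVED_NAME = re.compile(
    r"^(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])(\..*)?$", re.IGNORECASE
)


def windows_name_problems(name: str) -> list:
    """Return the reasons a single path component is unsafe on Windows."""
    problems = []
    if not name:
        problems.append("empty name")
    if any(ch in FORBIDDEN_CHARS for ch in name):
        problems.append("forbidden character")
    if any(ord(ch) < 32 for ch in name):
        problems.append("control character")
    if RESERVED_NAME.match(name):
        problems.append("reserved device name")
    if name.endswith((" ", ".")):
        problems.append("trailing space or period")
    return problems


if __name__ == "__main__":
    for candidate in ["notes.txt", "con.udl", "a:b", "foo."]:
        print(candidate, windows_name_problems(candidate))
```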
As a related tip, if you need to name a file something like .foo in explorer, it rejects it as "not having a file name". But if you type .foo. then it accepts the name and silently strips the trailing period.
`echo missed one`
disclaimer: i'm remembering something from the Windows 2003 era, so YMMV.
\/:*?"<>|
This was actually an issue with early versions of Servo on Windows: cloning the repository would fail because it contained a file with a # in the name.
https://github.com/servo/servo/commit/43c999905c01627133240c...
Yes, since it's /impossible/ for the file to have the same name on Windows as on Linux (or whatever OS was originally used to add it to the repository). And yes, git definitely ought to complain loudly in such a case.
I suppose git could be modified to be aware of alternate data streams, but there would probably still be a discrepancy with the way other tools would present the file (think about how "dir foo*" or "dir foo:bar" would behave for such a file on windows vs. linux).
> How am I supposed to use a git repo that contains such problematic files on Windows then?
Unless the repository is usable without those files, you can't. Unsurprisingly, that's the price of being able to use the same set of files in environments that have different file naming rules.
I didn't mean that I should be able to build/run/etc things in the repo that rely on the special filename and magically expect it to work. But this behavior would make me unable to even look at the repo and perform normal git operations, regardless of what it actually contains:
If you interpret "refuse to write the file" as a fatal error, I wouldn't even be able to clone the repo because the clone process would fail.
If you interpret it as non-fatal, I could browse the repo, but would always have a non-clean working set with a "deletion" I cannot undo. This means I cannot pull, rebase or checkout anything. (Unless I actually commit the deletion and remind myself not to push it. On every single branch.)
In no scenario can I access the contents of the file, even if I don't care about the filename at all. Even if I would like to fix the filename issue, I couldn't do so from a Windows pc.
That's why I think a solution using escaping (and highly visible warnings in git status) is better. Yes, your scripts will still break but you have at least a chance to fix the mess.
You should probably not check out such code on Windows in the first place, but if you accidentally do, then you really need to get loud warnings splashed everywhere.
This would just be adding one more to the list.
As for how you're supposed to use the Git repo on Windows: I guess you aren't?
Long extensions and multiple dots are perfectly valid in Windows filenames. I use them all the time.
They're not like colon which has a special meaning referencing alternate streams.
Stock Cygwin does something special with the colon character, so the Cygwin git shouldn't have this problem. A path like "C:foo.txt" is not understood by stock Cygwin as a relative reference in the current directory of drive C; the colon is mapped to some other character and then this is just a regular one-component pathname.
In the Cygnal project (Cygwin Native Application Library), paths passed to the library are considered native. So that certain useful virtual filesystem areas remain available, I remapped Cygwin's "/dev" and "/proc" to "dev:/" and "proc:/", taking advantage of the special status of the colon to take this liberty. You can list these directories (opendir, readdir, ...) and of course open the entries inside them; but chdir is not allowed into these locations. (Unlike under stock Cygwin, where you can chdir to /dev). chdir is not allowed because then that would render the library's current working directory out of sync with the Win32 process current working directory, which would not be "native" behavior.
I remember I was maintaining a few VB6 applications, and I often tried to create a "con.udl" file just to trigger the wizard, and Windows just complained with an error that didn't make any sense. So I started to use conn.udl.
A bit late, but it's good to know.
Here is a screenshot on Windows 7: https://dl.dropbox.com/s/qg5fxx01mnktw79/ss-2016-07-20T17-19...
Edit: add screenshot
Keep in mind that for each file you can have multiple data streams. Suppose the system reported the total of all the streams for foo combined... You would be surprised if you read the reported number of bytes from foo and saw it crash because in reality there are no bytes in the default stream.
However, there are other tools to report the presence of alternative streams. This is not a feature intended for casual end-users.
* Internet Explorer (because it can't explore the modern internet)
* File Explorer (because it can't explore files on my system)
Edit: Apparently colon is _still_ a special character on Mac! http://stackoverflow.com/questions/13298434/colon-appears-as...
1. "Macintosh System Software"
2. "Mac OS" (starting with 7.5/7.6)
3. "Mac OS X"
4. "OS X" (starting with Mountain Lion)
5. "macOS" (starting with Sierra)
(For me it went MacOs, OS X, then Windows 10).
That reminds me of web filtering software that blocked my search for "java proxy", but allowed "java procy", which google understood!
I was running a fuzz test on a backup tool, which verified that file data and metadata (including timestamps) as reflected by Windows were exactly as produced by the fuzz test.
I noticed that for some ".eml" files this was not the case. The mtime of these files was being modified by something else after the initial create by the application. At last, it came down to a Windows process which was automatically indexing ".eml" files and creating an ADS for each of them, thereby touching the mtime.
This was intentional on the part of Windows, but I never saw it coming.
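The check that caught it was conceptually simple. Here is a sketch of the idea, not the actual fuzz harness: record each file's mtime right after the test creates it, then compare later to see whether anything else touched it. The function names are made up.

```python
import os


def snapshot_mtimes(paths) -> dict:
    """Record the modification time of each path, in nanoseconds."""
    return {path: os.stat(path).st_mtime_ns for path in paths}


def changed_since(snapshot: dict) -> list:
    """Return the paths whose mtime no longer matches the snapshot."""
    return sorted(
        path
        for path, mtime_ns in snapshot.items()
        if os.stat(path).st_mtime_ns != mtime_ns
    )
```

Anything returned by `changed_since` that the test itself did not modify points at a third party, such as an indexer, writing to the file or one of its streams.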