Git for Windows accidentally creates NTFS alternate data streams

(latkin.org)

349 points · latkin · 9y ago · 176 comments

112 comments · 13 top-level

smhenderson · 9y ago · 31 in thread

The root cause of all this is a relatively obscure NTFS feature called alternate data streams.

Obscure indeed, I've never seen them used for anything other than hiding malicious content. Curious, I read about them on Wikipedia[1] and it turns out they were originally created to support resource forks in Services for Macintosh. Browsers also use them to flag files downloaded from the internet.

[1] https://en.wikipedia.org/wiki/NTFS#Alternate_data_streams_.2...

_wmd · 9y ago

Hardly obscure, every modern OS has an equivalent feature, but only OSX and Windows unify it with the regular filesystem API.

Streams and resource forks are a play on a now-standard UNIX feature that almost nobody uses because it has a shitty non-file based API that also breaks most tools unless they are specifically aware of them: extended attributes. Resource forks and extended attributes are almost equivalent in every single way, except that extended attributes can only be read/written atomically (limiting their size to strings that will fit in RAM), whereas a fork or stream can be opened like a regular file. Stick that in your pipe and smoke it, UNIX sycophants, another case where Windows is more UNIX than UNIX ;)

The file-or-directory vagueness created by the hierarchy of resources buried within a file also maps more closely to how the most popular path naming scheme on the planet (URLs) works: a URL can always represent both a file and a collection simultaneously, so I see this as closer to an ideal than the alternative where files can have no children at all. Sadly nobody actually uses these APIs like that, because all our tooling sucks so bad at coping with it. I sometimes wonder what the world would look like if directories on popular operating systems had simply been made 0-byte files.

amyjess · 9y ago

> feature that almost nobody uses because it has a shitty non-file based API that also breaks most tools unless they are specifically aware of them: extended attributes

Mind you, OS X makes extensive use of extended attributes in addition to resource forks (and it's largely deprecated resource forks in favor of app folders). Spend some time poking around Siracusa's reviews (since Tiger); he loves to go into detail about every new way Apple makes use of extended attributes.

Also, it's not fair to say that almost nobody uses them. Chrome makes use of extended attributes, as does KDE's metadata system and a few other things.

> (limiting their size to strings that will fit in RAM)

That's an understatement. The Linux kernel API limits the size of all extended attributes to 64KB, and the most popular filesystems limit them further to 4KB. That's not really comparable to a true fork.

ZFS is the exception: its extended attributes are implemented as forks, and the maximum size of an extended attribute is the same as that of a file. Unfortunately, those aren't accessible on ZOL because the kernel won't support it, so you can really only take advantage of it on Solaris/Illumos (and maybe FreeBSD?).

rwmj · 9y ago

You missed:

- Unix xattrs have a terrible API and awful command line tools: listxattr(2) returning \0-separated character arrays with lists of attributes that are next to impossible to decipher in C? - check! Hiding certain xattrs by default based only on their names? - check!

- xattrs have magical qualities based on their names, the kernel version, the kernel configuration, and the filesystem mount options (eg. "security.selinux", "trusted.*")

- Some xattrs are \0 terminated (and the APIs set and return the \0, making them very awkward to use from shell scripts), some aren't, and some are indeterminate. They can also be binary blobs.
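
For readers who haven't used it: listxattr(2) fills a caller-supplied buffer with attribute names, each terminated by a NUL byte. A minimal Python sketch of decoding such a buffer, using a made-up sample buffer (on Linux, Python's own os.listxattr() already does this split and returns a list):

```python
def split_xattr_names(buf: bytes) -> list[str]:
    """Split a raw listxattr(2)-style buffer into attribute names.

    Every name is NUL-terminated, so splitting on NUL leaves an empty
    trailing element, which is filtered out here.
    """
    # surrogateescape keeps non-UTF-8 bytes round-trippable instead of failing.
    return [name.decode("utf-8", "surrogateescape")
            for name in buf.split(b"\0") if name]


# Hypothetical buffer, shaped like what the kernel hands back.
raw = b"user.comment\0security.selinux\0user.mime_type\0"
print(split_xattr_names(raw))
# ['user.comment', 'security.selinux', 'user.mime_type']
```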

Dylan16807 · 9y ago

I think if "almost nobody uses it" it's fair to call it obscure.

> another case where Windows is more UNIX than UNIX

Windows has extended attributes too. Having both features makes it more like a kitchen sink.

ksherlock · 9y ago

Solaris unifies it too. You can even use the runat command to open a shell where extended attributes are exposed as, and can be manipulated as, regular files.

http://docs.oracle.com/cd/E23824_01/html/821-1474/fsattr-5.h...

wfunction · 9y ago

NTFS has extended attributes.

vocatus_gate · 9y ago

How....do you know this kind of stuff? Great read, thanks.

banana_giraffe · 9y ago

It's used by all the browsers on Windows these days. They all create a 'Zone.Identifier' stream when a file is downloaded to mark it as downloaded. Its content is what triggers the 'You downloaded this file! It's Evil!' warning in Windows.

To be fair, it's not used by a ton of things, since it requires NTFS, disappears when files are moved to different filesystems, and various things that read and write files destroy them if they're not careful, not to mention actually enumerating the streams is tricky, last I checked.
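
The Zone.Identifier stream is a small INI-style text file: a [ZoneTransfer] section with a ZoneId key, where 3 is the Internet zone. A Python sketch that parses such contents; the sample text is illustrative, and the HostUrl key (which newer Windows versions and browsers may also record) is treated as optional:

```python
import configparser

# Internet Explorer security zone IDs (URLZONE values).
ZONE_NAMES = {
    0: "Local machine",
    1: "Local intranet",
    2: "Trusted sites",
    3: "Internet",
    4: "Restricted sites",
}


def parse_zone_identifier(text: str) -> dict:
    """Parse the INI-style contents of a Zone.Identifier stream."""
    # interpolation=None so '%' in URLs is not treated as a substitution.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["ZoneTransfer"]
    zone_id = section.getint("ZoneId")
    return {
        "zone_id": zone_id,
        "zone": ZONE_NAMES.get(zone_id, "Unknown"),
        "host_url": section.get("HostUrl"),  # None when the key is absent
    }


sample = "[ZoneTransfer]\nZoneId=3\nHostUrl=https://example.com/setup.exe\n"
print(parse_zone_identifier(sample)["zone"])  # Internet
```

On an NTFS volume the stream can usually be opened like a file by appending the stream name, e.g. open(r"C:\Users\me\Downloads\setup.exe:Zone.Identifier").read() (a hypothetical path); the same call fails once the file has been copied to a filesystem without streams, such as FAT32.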

ryanburk · 9y ago

some history: this was introduced with XP SP2 as part of the windows security push. it was a clever way to track the information without touching the binary data directly, and supporting it in IE meant the majority of customers saw the benefit right away. and it mostly worked, since most people (in windows) don't move files across file systems.

CoolGuySteve · 9y ago

iTunes for Windows uses them to store how much of a streaming file it has already downloaded. I wrote that part (but I won't take credit for most things in iTunes for Windows).

It's a nifty feature but I'll admit NTFS is really obscure at times.

enjoy-your-stay · 9y ago

Great place to store metadata about a file, never thought about that before. I guess if the download stream is interrupted it reads that to know where to pick up again if resumed?

Another obscure feature of NTFS is Transactional NTFS which I'd never heard of until recently.

https://msdn.microsoft.com/en-us/library/windows/desktop/aa3...

Animats · 9y ago

The original idea on the Macintosh was to have some place to put non-code assets - icons, images, etc - that came with an application. So MacOS files had a "data fork" and a "resource fork". The "resource fork" was a tree structure managed by the Resource Manager.

The problem was that the original Macintosh had limited memory and only a floppy disk, and the implementation of writing to the resource fork wasn't very good. Many programs wrote to their own resource fork for preferences and such. The tree structure wasn't updated fully until the program was closed, because writing to the floppy was so slow. If the program exited abnormally, the resource fork's links were broken. This gave the resource fork approach a bad reputation.

Since Windows programs had to run on DOS, which didn't have resource forks, Windows never used this much. Windows put non-code assets in the executable as read-only objects.

NT, which was supposed to do everything (originally it had POSIX and OS/2 compatibility, and ran on MIPS, Alpha, and x86) added generalized support for resource forks, just in case. But since most applications were written for Windows 3.1/95/ME, they didn't use those facilities.

So that's how we got here.

kalleboo · 9y ago

> The tree structure wasn't updated fully until the program was closed, because writing to the floppy was so slow

Not to mention in many cases on the original Macs, you probably didn't even have the program floppy in the drive when you were working, because with only 400K on a disk you had to swap to the disk with your document on it.

I recall Inside Macintosh had a big disclaimer at the top that warned "The Resource Manager IS NOT A DATABASE". It was originally just meant to handle localizable resources, but since it was already there it was handy for developers (including Apple themselves) to use to load any kind of structured data. And who didn't love messing around in system and application files with ResEdit?

zwieback · 9y ago

In the early 90's I worked at a company that made server software that allowed Mac AppleTalk (AFP) clients to connect to a PC network. Eventually IBM had us write a custom version for OS/2 called LAN Server for Macintosh. We were really excited about using the streams/resource forks feature but had to give up eventually. We used a separate database to store what's in the resource forks instead.

Ecio78 · 9y ago

SQL Server used them from version 2005 until 2012 to create database snapshots in order to run DBCC CHECKDB (consistency check), so they were actually a critical feature of MSSQL. I suppose this was the reason why ReFS was not supported for SQL data disks.

It seems they haven't been used since SQL Server 2014.

See for example http://www.sqlskills.com/blogs/paul/issues-around-dbcc-check...

jobigoud · 9y ago

I have used them.

We had a system that generated millions of images and needed to be sure that from one version to the next the images produced by a given request were the same, and also have some diagnostic data in case of problematic images. The images could be either JPG or PNG and we needed a unified way to associate arbitrary metadata with them.

We had a special mode that would store an equivalent of the request in an alternate data stream of the image. When a problem was detected we would open the alternate data stream and test the request manually.

vocatus_gate · 9y ago

Very cool niche case. Thanks for sharing.

2close4comfort · 9y ago

they should market this as a feature! alternate streams for people who think it is "an obscure feature". I mean, that many people using alternate streams would be interesting for anyone forensicating systems for malware, or as protection from...

wslh · 9y ago

This is used in specific sectors like data loss prevention. For example, you can tag files based on their security sensitivity, and if the file is copied it retains the tags.

__jal · 9y ago

> if the file is copied it retains the tags.

...if your miscreant is technically illiterate and only uses NTFS.

andrewaylett · 9y ago

I've worked on Windows-only software that used resource forks. It stored mail messages, one per file, with the message metadata in a resource fork so we didn't have to modify the file containing the actual mail when the metadata changed.

jahewson · 9y ago

There were once plans to store the individual streams which make up Microsoft Office files (OLE2) as alternate data streams, which would have been... interesting.

J_Darnley · 9y ago

> Browsers also use them to flag files downloaded from the internet.

Is that where that annoying shit comes from? Good to know. When firefox kills off DownThemAll I will then use a FAT partition to store downloaded files (and see if I can force the temporary files to go there too).

mfenniak · 9y ago

I laugh these days when OSX warns me, "This application was downloaded from the internet." when I first access an app.

Every application on my machine was downloaded from the internet. Even the OS, after the first upgrade. That's not what is dangerous.

ChoGGi · 9y ago

stick this in a reg file

  REGEDIT4
  
  ;https://support.microsoft.com/en-us/kb/889815
  [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Environment]
  "SEE_MASK_NOZONECHECKS"="1"
  
  ;https://technet.microsoft.com/en-us/library/cc783259
  [HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\Download]
  "CheckExeSignatures"="no"
  "RunInvalidSignatures"=dword:00000001
  
  ;https://support.microsoft.com/kb/883260
  [HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Policies\Associations]
  "LowRiskFileTypes"=".zip;.rar;.nfo;.txt;.exe;.bat;.com;.cmd;.reg;.msi;.htm;.html;.gif;.bmp;.jpg;.avi;.mpg;.mpeg;.mov;.mp3;.m3u;.wav;"
  "DefaultFileTypeRisk"=dword:00001808
  
  [HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Policies\Attachments]
  "SaveZoneInformation"=dword:00000001

icebraining · 9y ago

Unless it has changed in newer Windows versions, you can simply disable that warning in the Internet Settings, no need to keep files in an outdated filesystem.

anonymfus · 9y ago

>use a FAT partition to store downloaded files

Do you never download anything bigger than or equal to 4 GiB?

donatj · 9y ago

I used them for a VCS thought experiment I was playing with a while ago.

sixothree · 9y ago

Just pretend they're "resource forks".

tamana · 9y ago

It's metadata. It's as obscure as file permission bits.

OJFord · 9y ago

Except that millions of developers routinely make use of file permissions; as evidenced by this discussion, many - perhaps even a majority - haven't heard of alternate data streams.

artifaxx · 9y ago · 20 in thread

That is quite the obscure and interesting issue to run into! Who puts colons in their filenames though? I haven't ever seen that used...

lorenzhs · 9y ago

I do: "find ~ -iname '*:*' | wc -l" yields 49 entries. Some of those are Xorg-related (Xorg identifies displays with a syntax like ":0.0", which ends up in some log file names) or gvfs stuff. But most are PDFs, usually research papers, which I tend to save as "title: subtitle - authors.pdf".

RX14 · 9y ago

High score! Across my whole filesystem I get about 150,000 files with a colon in their name. Many of them are part of the filesystems of containers, which seem to use colons in the names of their libraries and dpkg packages. Many Arch Linux packages have colons in their names too. I also have some media files and documents with colons in their names.

It's actually really common, for someone who's used *nix for ages and expects every character to be a valid filename.

artifaxx · 9y ago

It is certainly interesting to see the different assumptions people make. Thanks for the details.

zeveb · 9y ago

I use colons when indicating the ISO 8601[1] timestamp for stuff; it's much more readable with ':' than not, e.g. 2016-07-20T17:04:30Z vice 20160720T170430Z.

I also use them when naming stuff with hashes or UUIDs. Not having colons in filenames seems just weird to me.

Heck, even on Unix I'm annoyed that I can't simply escape slashes! It'd be nice to name files with the URLs they are taken from.

[1] https://en.wikipedia.org/wiki/ISO_8601
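
zeveb's two spellings are ISO 8601's "extended" (2016-07-20T17:04:30Z) and "basic" (20160720T170430Z) formats. A Python sketch producing both from one timestamp; the helper names are illustrative, and the literal Z assumes the datetime is already in UTC. Only the basic form is a legal Windows filename:

```python
from datetime import datetime, timezone


def iso8601_extended(ts: datetime) -> str:
    # Readable, but the colons make it an invalid Windows filename.
    return ts.strftime("%Y-%m-%dT%H:%M:%SZ")


def iso8601_basic(ts: datetime) -> str:
    # ISO 8601 basic format: no separators at all.
    return ts.strftime("%Y%m%dT%H%M%SZ")


ts = datetime(2016, 7, 20, 17, 4, 30, tzinfo=timezone.utc)
print(iso8601_extended(ts))  # 2016-07-20T17:04:30Z
print(iso8601_basic(ts))     # 20160720T170430Z
```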

marcosdumay · 9y ago

Lots of people, that's who.

Besides the other responses, colon is a standard path separator on URIs. If you need more than one kind of them (the obvious one being a slash), the colon is often the most reasonable option. And if you decide to save data on disk, with parts of the URI as file-name (which is also very reasonable)...

Probably the main reason this problem does not pop up everywhere is that people hacking on completely new tools rarely do that on Windows. And when they port, it gets hidden together with a hundred other little incompatibilities.

LukeShu · 9y ago

The Maildir format uses colons in filenames, which has created problems with running certain email software on Windows.

Tharkun · 9y ago

Why wouldn't you put colons in filenames? Unless of course you use Windows. Colons, spaces, backslashes, whatever.

TazeTSchnitzel · 9y ago

Useful if only for timestamps. 2016-07-20T16:27:00Z!

7952 · 9y ago

Artificially restricting filenames is an antipattern that makes things harder to read.

I use software that sometimes (but mostly not) needs files in DOS 8.3 format. Because of this, people seem to think it a good idea to use really short acronymed file names as a matter of course. If it makes sense to use a special character, then people should be able to.

artifaxx · 9y ago

I assume any files I make should be usable by any major operating system, so to be sure I avoid any special characters in file names.

amock · 9y ago

Mac OS 9 and earlier used colons as path separators and some support for that made it into OS X, although it might be gone by now. Apple's guidelines https://support.apple.com/en-us/HT202808 recommend not using colons.

Pxtl · 9y ago

Could say the same about spaces and a plethora of systems fail catastrophically on those.

awqrre · 9y ago

Do you also use slashes? what about backticks and quotes?

belovedeagle · 9y ago

The colon should be reserved in Unix to separate path names a la PATH.

mtone · 9y ago

As a lazy user, I often copy paste the document title as the filename. Windows generously trims any disallowed character, but if it didn't, lots of my files would contain colons.

I actually wish colons were supported, since it's so prevalent in document titles. Question marks, too, while at it.

hashhar · 9y ago

I think I read somewhere that Microsoft were investigating ways in which to get rid of the FAT32 backward compatibility issues with NTFS like the 255 character path element limit and the character limits in filenames. You can manage such files using the NTFS way of addressing them, "\\?\C:\Example\file:with?illegalstuff"

kazinator · 9y ago

Who puts colons into their filenames? Oh, short-sighted people like the designers of OBS (OpenSUSE Build System) who introduced a colon convention into project names.

http://lists.opensuse.org/opensuse-buildservice/2008-12/msg0...

This causes a problem even in the POSIX environment, because the colon is used in PATH and PATH-like environment variables. Usually there is no escape mechanism.
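
To make the PATH problem concrete: PATH-style variables are split on every colon, so a directory whose name contains one (named after an OBS project, say) silently turns into bogus entries. A Python sketch with a hypothetical path and helper name (for the real separator, Python exposes os.pathsep, which is ':' on POSIX and ';' on Windows):

```python
def split_search_path(value: str, sep: str = ":") -> list[str]:
    # PATH-style lists are split on every separator; there is no escape
    # syntax, so a colon inside a directory name cannot survive.
    return value.split(sep)


# Hypothetical directory named after an OBS project ("home:alice:branches").
path_value = "/srv/obs/home:alice:branches/bin:/usr/bin"
print(split_search_path(path_value))
# ['/srv/obs/home', 'alice', 'branches/bin', '/usr/bin']
```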

msravi · 9y ago

Well, it isn't uncommon to create a log file with a timestamp and a name like foobar.$datetime.log with $datetime expanding to something like 2016-07-20T22:08:38

chippy · 9y ago

> Who puts colons in their filenames though?

Twitter does

https://pbs.twimg.com/media/Cn1tGFoXgAApMt3.jpg:large

Houshalter · 9y ago

I try to save papers sometimes and give the filename the same name as the paper, which often has a colon in it.

mcculley · 9y ago · 12 in thread

This is interesting. I was just recently working on an app where I wanted to ensure the UI wouldn't accept problematic characters in filenames. Obviously, Unix has problems with '/'. I'll add ':' to the list. That's unfortunate. What else should portable apps avoid?

mikeash · 9y ago

Microsoft seems to have a fairly comprehensive list:

https://msdn.microsoft.com/en-us/library/aa365247(VS.85).asp...

They suggest avoiding <>:"/\|?* as well as all ASCII characters 0-31.

ASCII 0 can be really fun. Lots of filesystem APIs deal with NUL-terminated strings (like, all of POSIX) so a zero byte in the middle of your string just truncates it at that point. If you use something that tolerates zero bytes for your UI strings (like NSString on the Mac, maybe C++ UI frameworks dealing with std::string) then the full string may show in the UI and you just mysteriously get a filename that's shorter on disk than what you see on screen.
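
Both of mikeash's points can be checked mechanically. A Python sketch with hypothetical helper names: one function flags the characters Microsoft's list says to avoid (including ASCII 0-31), the other mimics what a NUL-terminated C API would see:

```python
# Characters Microsoft's naming guidance says to avoid in file names.
WINDOWS_RESERVED_CHARS = set('<>:"/\\|?*')


def invalid_windows_chars(name: str) -> list[str]:
    """Return the characters in `name` that Windows file names should avoid."""
    # ASCII 0-31 are control characters, NUL included.
    return [ch for ch in name if ch in WINDOWS_RESERVED_CHARS or ord(ch) < 32]


def c_visible_name(name: str) -> str:
    """What a NUL-terminated C API would see: everything before the first NUL."""
    return name.split("\0", 1)[0]


print(invalid_windows_chars("foo:bar?.txt"))  # [':', '?']
print(c_visible_name("report\0-final.pdf"))   # report
```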

outworlder · 9y ago

ASCII 255 used to be fun in the Windows 3.1 days. DOS would handle it just fine (displaying whitespace). The Windows Explorer (or whatever it was called back then) would not let you select a directory named like that. Basically this made a directory inaccessible, unless you were dealing with very tech-savvy people.

geofft · 9y ago

If you want a serious rabbit hole, think about Unicode characters in filenames. Windows filenames are supposedly UTF-16, but they do not enforce the requirement that surrogate pairs (which represent characters over 0xFFFF, like emoji) must actually be paired, so you're not guaranteed that a UTF-16 decoder will actually read a filename successfully. UNIX filenames are just untyped byte strings that don't contain NUL or /, which by convention are ASCII or UTF-8 these days, but nothing enforces that; if you run ls, it will just print whatever bytes are in the filename to the terminal, and make it the terminal's problem. So round-tripping an arbitrarily weird UNIX filename to Windows, or vice versa, is challenging.

If you're in a position to enforce well-formed Unicode on all platforms, you're much better off. But many things (e.g. backup systems) don't get the option to just refuse files they don't like.
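
A quick way to see the UTF-16 problem geofft describes: encode a lone high surrogate (which, per the comment, Windows does not stop you from using in a filename) and try to decode it strictly. A Python sketch; the helper names are hypothetical:

```python
def is_well_formed_utf16(raw: bytes) -> bool:
    """True if `raw` decodes as strict UTF-16-LE."""
    try:
        raw.decode("utf-16-le")
        return True
    except UnicodeDecodeError:
        return False


def decode_windows_name(raw: bytes) -> str:
    """Decode a UTF-16-LE name, passing unpaired surrogates through unchanged."""
    return raw.decode("utf-16-le", "surrogatepass")


# A lone high surrogate (U+D800) followed by "a.txt".
raw = "\ud800".encode("utf-16-le", "surrogatepass") + "a.txt".encode("utf-16-le")

print(is_well_formed_utf16(raw))      # False
print(len(decode_windows_name(raw)))  # 6
```

The resulting string still can't be encoded as strict UTF-8 either, which is the kind of gap WTF-8 is meant to bridge.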

jcrawfordor · 9y ago

This points to a fascinating underlying difference between the operating systems. In general, UNIX attempts to be completely non-character-set aware and is built with the philosophy that how to render characters is strictly the terminal's problem. Windows, on the other hand, has a notion of a system character encoding and will try to keep everything compliant with it (with mixed success).

There is a very important takeaway of this: case-sensitivity. UNIX cannot be case-insensitive for file names because the mapping of lowercase to uppercase characters is dependent on the character encoding used, which it doesn't know. Windows can (and does) coalesce case for file names because it knows the character set in use and can consult the relevant mapping.

This difference produces all sorts of frustrating behavior when interacting between the two platforms, e.g. the classic case of Windows SMB mounting a share from a *nix server that contains two files differentiated only by case. It'll show both entries but think they both point to the same thing. On the other hand, it's easy to create file names on a Windows device that are near impossible to name on *nix. These are important things to be aware of if you ever implement a cross-platform network user environment.
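
The dependence on encoding can be shown with just two bytes. Under Latin-1, 0xD7 and 0xF7 are '×' and '÷', which have no case; under KOI8-R they are 'в' and 'В', a lowercase/uppercase pair. A Python sketch (the helper name is hypothetical) of why a kernel that only sees bytes can't decide whether two names collide:

```python
def same_name_ignoring_case(a: bytes, b: bytes, encoding: str) -> bool:
    """Compare two raw file names case-insensitively under an assumed encoding."""
    return a.decode(encoding).casefold() == b.decode(encoding).casefold()


a, b = b"\xd7", b"\xf7"
print(same_name_ignoring_case(a, b, "latin-1"))  # False: '×' vs '÷'
print(same_name_ignoring_case(a, b, "koi8_r"))   # True: 'в' vs 'В'
```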

legulere · 9y ago

There's WTF-8 to convert broken UTF-16 into something UTF-8ish. https://simonsapin.github.io/wtf-8/

I kind of wonder if paths not being allowed to contain NUL or '/' was one reason why, for code points that are represented by more than one byte in UTF-8 (i.e. all non-ASCII code points), all bytes have the most significant bit set to 1 (https://en.wikipedia.org/wiki/UTF-8#Description). This makes it impossible for a multi-byte sequence to contain valid ASCII characters like `/`.
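
The property legulere describes is easy to check exhaustively. A Python sketch (hypothetical helper name) that encodes every non-ASCII code point, skipping surrogates because UTF-8 can't encode them, and confirms every resulting byte is 0x80 or above, so no multi-byte sequence can contain '/' (0x2F) or NUL:

```python
def utf8_bytes_all_high(cp: int) -> bool:
    """True if every UTF-8 byte of code point `cp` has its top bit set."""
    return all(byte >= 0x80 for byte in chr(cp).encode("utf-8"))


# All non-ASCII scalar values; surrogates U+D800..U+DFFF can't be UTF-8 encoded.
non_ascii = (cp for cp in range(0x80, 0x110000) if not 0xD800 <= cp <= 0xDFFF)

print(all(utf8_bytes_all_high(cp) for cp in non_ascii))  # True
print("\u00f6".encode("utf-8"))                          # b'\xc3\xb6'
```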

Note that macOS actually applies decomposing Unicode normalisation to file names, I guess because it makes handling case-insensitivity easier. (Just doing ASCII case-insensitivity then also handles o+diaeresis, but not the precomposed ö code point.) https://developer.apple.com/library/mac/qa/qa1235/_index.htm...
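
To see why decomposition matters: 'ö' can be stored either as one precomposed code point (U+00F6) or as 'o' plus U+0308 COMBINING DIAERESIS, and the two strings compare unequal until normalized. A Python sketch using standard NFD; HFS+ is documented to use its own variant of NFD, so treat the comparison helper (a hypothetical name) as an approximation:

```python
import unicodedata


def same_name_on_hfs_plus(a: str, b: str) -> bool:
    """Approximate a case-insensitive, decomposing file system's name comparison."""
    def key(s: str) -> str:
        return unicodedata.normalize("NFD", s).casefold()
    return key(a) == key(b)


precomposed = "\u00f6"  # 'ö' as a single code point
decomposed = "o\u0308"  # 'o' followed by COMBINING DIAERESIS

print(precomposed == decomposed)                                # False
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
print(same_name_on_hfs_plus("\u00d6.txt", "o\u0308.txt"))       # True
```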

warbiscuit · 9y ago

This is cribbing from source of a filename sanitizer in one of my company's internal libraries. The function is a little... paranoid... so I'm not positive all of these are actually forbidden.

/ and 0x00 for unix

:?"<>/|\* and chars 0x00 .. 0x1F for windows

'~!#$&%^; if there's a chance of filename being passed to shell w/o proper escaping.

Windows also forbids a bunch of filenames matching regex "CON|AUX|PRN|NUL|COM[1-9]|LPT[1-9]"

Also, ending filenames with space or period really messes up windows. File explorer can see it, but can't delete or rename it.

edit: fixed markup
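
The last two rules in that list are easy to get wrong by hand. A Python sketch with hypothetical names that flags reserved device names (which Windows also rejects with an extension attached, as the "Con.java" and "con.udl" stories in this thread show) and names ending in a space or period:

```python
import re

# Reserved DOS device names; Windows also refuses them with an extension
# attached, so "CON.txt" is treated like "CON". Matching is case-insensitive.
RESERVED_NAME = re.compile(r"(CON|AUX|PRN|NUL|COM[1-9]|LPT[1-9])(\..*)?",
                           re.IGNORECASE)


def windows_name_problems(name: str) -> list[str]:
    """Return reasons a single path component is likely to misbehave on Windows."""
    problems = []
    if RESERVED_NAME.fullmatch(name):
        problems.append("reserved device name")
    if name.endswith((" ", ".")):
        problems.append("ends with a space or period")
    return problems


print(windows_name_problems("Con.java"))     # ['reserved device name']
print(windows_name_problems("notes."))       # ['ends with a space or period']
print(windows_name_problems("console.txt"))  # []
```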

Dylan16807 · 9y ago

> Also, ending filenames with space or period really messes up windows. File explorer can see it, but can't delete or rename it.

As a related tip, if you need to name a file something like .foo in explorer, it rejects it as "not having a file name". But if you type .foo. then it accepts the name and silently strips the trailing period.

DeltaWhy · 9y ago

That should be NUL (one L). Interestingly, when I tried it in Powershell, 'type NUL' reports that the file does not exist, but in CMD, 'type NUL' outputs nothing (it's the DOS equivalent of UNIX's /dev/null). So apparently some APIs will allow you to use those as filenames while others will choke on them.

nradov · 9y ago

Yes the reserved file names based on device names can be a tricky issue for portability. We ran into a problem where a source code file was named Con.java and it was impossible to use that repository on Windows. Had to rename it as Con_.java to make it work.

mzs · 9y ago

  `echo missed one`

finnh · 9y ago

IIRC, windows has a dialog that shows their full list of disallowed characters if you try to use one of them ... so try to make a file with (eg) "\" in the name and see what the dialog says.

disclaimer: i'm remembering something from the Windows 2003 era, so YMMV.

thoth · 9y ago

It's still there, as a balloon popup that says a file name can't contain the following characters:

\/:*?"<>|

xg15 · 9y ago · 8 in thread

The problem should be addressed, but the proposed workaround seems strange. So git should refuse to write the file to disk? How am I supposed to use a git repo that contains such problematic files on Windows then?

ajross · 9y ago

How would you propose to "use" a git repo that contains files with unrepresentable file names in the first place? It's the repo that's not portable, not git. You'd have the same problem if someone handed you a zip file or tarball.

Ruud-v-A · 9y ago

What is the alternative? Renaming the file?

This was actually an issue with early versions of Servo on Windows: cloning the repository would fail because it contained a file with a # in the name.

https://github.com/servo/servo/commit/43c999905c01627133240c...

SpaceManiac · 9y ago

MSYS2 (a Cygwin-based platform) does renaming, mapping colons to U+F03A from the Private Use Area (which renders in Explorer like a bullet point). Its git package cloned the repository from the article with no problem, "ls" shows "foo:bar", and "cat foo:bar" works. Opening the file in non-MSYS tools also has no problems with the exotic character.
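
The U+F03A that SpaceManiac mentions is 0xF000 plus the colon's code 0x3A. Assuming the same offset is applied to the other characters Windows rejects (a generalization from the single colon example, not something the comment states), the round trip can be sketched in Python like this; the names are hypothetical:

```python
# Characters Windows rejects in file names ('/' is left alone: it is the
# path separator on both sides). Control characters are omitted for brevity.
RESERVED = '<>:"\\|?*'
PUA_OFFSET = 0xF000


def to_windows_name(name: str) -> str:
    """Shift reserved characters into the Private Use Area, e.g. ':' -> U+F03A."""
    return "".join(chr(PUA_OFFSET + ord(c)) if c in RESERVED else c for c in name)


def from_windows_name(name: str) -> str:
    """Undo to_windows_name for the characters in RESERVED."""
    back = {chr(PUA_OFFSET + ord(c)): c for c in RESERVED}
    return "".join(back.get(c, c) for c in name)


stored = to_windows_name("foo:bar")
print(hex(ord(stored[3])))        # 0xf03a
print(from_windows_name(stored))  # foo:bar
```

A fixed offset keeps the mapping one-to-one and reversible, at the cost of names that look odd in tools unaware of the convention (Explorer renders U+F03A like a bullet point, per the comment).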

moogly · 9y ago

'#' is allowed. It appears the issue was the wildcard '?' character, which arguably isn't the best idea to use on *nix either.

usefulcat · 9y ago

> So git should refuse to write the file to disk?

Yes, since it's /impossible/ for the file to have the same name on Windows as on Linux (or whatever OS was originally used to add it to the repository). And yes, git definitely ought to complain loudly in such a case.

I suppose git could be modified to be aware of alternate data streams, but there would probably still be a discrepancy with the way other tools would present the file (think about how "dir foo*" or "dir foo:bar" would behave for such a file on windows vs. linux).

> How am I supposed to use a git repo that contains such problematic files on Windows then?

Unless the repository is usable without those files, you can't. Unsurprisingly, that's the price of being able to use the same set of files in environments that have different file naming rules.

xg15 · 9y ago

Complaining loudly and making a repo inaccessible are two different things.

I didn't mean that I should be able to build/run/etc. things in the repo that rely on the special filename and magically expect them to work. But this behavior would make me unable to even look at the repo and perform normal git operations, regardless of what it actually contains:

If you interpret "refuse to write the file" as a fatal error, I wouldn't even be able to clone the repo because the clone process would fail.

If you interpret it as non-fatal, I could browse the repo, but would always have a non-clean working set with a "deletion" I cannot undo. This means I cannot pull, rebase or checkout anything. (Unless I actually commit the deletion and remind myself not to push it. On every single branch.)

In no scenario can I access the contents of the file, even if I don't care about the filename at all. Even if I would like to fix the filename issue, I couldn't do so from a Windows pc.

That's why I think a solution using escaping (and highly visible warnings in git status) is better. Yes, your scripts will still break, but you at least have a chance to fix the mess.

smellf · 9y ago

This is how Tortoise SVN handles SVN paths that are invalid on Windows - it doesn't write the offending file.

You should probably not check out such code on Windows in the first place, but if you accidentally do, then you really need to get loud warnings splashed everywhere.

blakeyrat · 9y ago

He says in the article (I haven't independently confirmed) that there are already other paths Git will refuse because Windows will misinterpret them.

This would just be adding one more to the list.

As for how you're supposed to use the Git repo on Windows: I guess you aren't?

sickbeard · 9y ago · 8 in thread

putting colons in your filenames is almost as weird as alternate data streams.

cordite · 9y ago

It's not a forward slash or a NUL byte. And it is a printable character. Doesn't seem so wrong to me.

cygx · 9y ago

It's used as separator in various places on *nix (eg PATH).

stevekemp · 9y ago

Some tools assume particular characters mean things. For example, GNU tar assumes that a ":" in the archive's filename marks a hostname.

tamana · 9y ago

A colon is a forward slash on MacOS

jey · 9y ago

How do you feel about file extensions longer than three characters? How about filenames with multiple dots in them?

Stratoscope · 9y ago

Not sure what the question is here, can you clarify?

Long extensions and multiple dots are perfectly valid in Windows filenames. I use them all the time.

They're not like colon which has a special meaning referencing alternate streams.

thiagobbt · 9y ago

I have no problem with it if they actually have a purpose, like .docx or .tar.gz

spdionis · 9y ago

So I can't name my file "Theory of everything: 42.pdf"?

kazinator · 9y ago · 6 in thread

The colon has been special since the dawn of DOS. For instance, you cannot use "con:" as a file name. (In fact, in a fit of extreme stupidity, DOS also claimed some devices with no colon suffix, like "con" and "prn", effectively making these into globally reserved names in any directory.)

Stock Cygwin does something special with the colon character, so the Cygwin git shouldn't have this problem. A path like "C:foo.txt" is not understood by stock Cygwin as a relative reference in the current directory of drive C; the colon is mapped to some other character and then this is just a regular one-component pathname.

In the Cygnal project (Cygwin Native Application Library), paths passed to the library are considered native. So that certain useful virtual filesystem areas remain available, I remapped Cygwin's "/dev" and "/proc" to "dev:/" and "proc:/", taking advantage of the special status of the colon to take this liberty. You can list these directories (opendir, readdir, ...) and of course open the entries inside them; but chdir is not allowed into these locations. (Unlike under stock Cygwin, where you can chdir to /dev). chdir is not allowed because then that would render the library's current working directory out of sync with the Win32 process current working directory, which would not be "native" behavior.

naz · 9y ago

I remember when any attempted access to c:\con\con would bluescreen any windows machine. Hours of teenage fun sending people to a website I'd set up with <img src="file://c:\con\con">

tonyarkles · 9y ago

You could do it over Windows File Sharing too! \\someone-elses-machine\c\con\con would blue-screen their machine!

jfroma · 9y ago

This annoyed me a few times 7 years ago, and I never looked up the reason :)

I remember I was maintaining a few VB6 applications, and I often tried to create a "con.udl" file just to trigger the wizard, and Windows just complained with an error that didn't make any sense. So I started to use conn.udl.

A bit late, but it's good to know.

Here is a screenshot on Windows 7: https://dl.dropbox.com/s/qg5fxx01mnktw79/ss-2016-07-20T17-19...

Edit: add screenshot

kazinator · 9y ago

I'm surprised it's a problem even when the suffix .udl is present.

dboreham · 9y ago

The colon pre-dates DOS by a long time. I seem to recall it in RSX-11 pip. Definitely it was present in CP/M : https://en.wikipedia.org/wiki/Peripheral_Interchange_Program

kazinator · 9y ago

I used CP/M with a Z80 coprocessor card on an Apple II. I didn't know RSX-11 also had pip, though.

Someone · 9y ago · 4 in thread

It's not alone. In MS SQL Server, you can name a database "foo:bar". If you give a database such a name when you restore it from disk, you'll find that the database takes zero bytes on disk (at least, that's what Explorer claims). Your disk space is gone, though.

_nedR9y ago

What? You're saying Windows Explorer doesn't handle this feature properly? That's insane.

exceptione9y ago

I think that is the correct behaviour, though. The default stream is left empty in this case, so it should indeed be zero bytes.

Keep in mind that each file can have multiple data streams. Suppose the system reported the total of all the streams for foo combined: you would be surprised if you tried to read that many bytes from foo and it failed because there are in reality no bytes in the default stream.

However, there are other tools that report the presence of alternate streams. This is not a feature intended for casual end users.
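A small experiment that shows the zero-byte effect, written as a sketch: it assumes the temporary directory lives on an NTFS volume when run on Windows (on FAT or exFAT the open call should simply fail), and the function name is hypothetical. It is the same mechanism the linked article describes for Git for Windows: a name containing a colon turns into a stream of another file.

```python
import os
import tempfile
from typing import Optional


def default_stream_size_after_stream_write(directory: str) -> Optional[int]:
    """Write 5 bytes to 'foo:bar' inside `directory`, then inspect 'foo'.

    On NTFS, 'foo:bar' names the stream 'bar' of the file 'foo': 'foo' is
    created, but its default (unnamed) stream stays empty, so its reported
    size should be 0. On POSIX file systems the colon is an ordinary
    character, so a file literally named 'foo:bar' is created, 'foo' does
    not exist, and the function returns None.
    """
    with open(os.path.join(directory, "foo:bar"), "wb") as f:
        f.write(b"hello")
    foo = os.path.join(directory, "foo")
    return os.path.getsize(foo) if os.path.exists(foo) else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(default_stream_size_after_stream_write(d))
```

The five bytes are still on disk in both cases; on NTFS they simply live in a stream that a plain size query on `foo` does not count.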

sitharus9y ago

There are a lot of NTFS features that Explorer doesn't handle properly, like file paths longer than the 260-character MAX_PATH limit.

SolarNet9y ago

Yeah, among the first things I replace in a new install:

* Internet Explorer (because it can't explore the modern internet)

* File Explorer (because it can't explore files on my system)

AWildDHHAppears9y ago· 4 in thread

MacOs (i.e., Os9 and before) had special meaning for colons, too. I wonder what would happen for git on those platforms.

Edit: Apparently colon is _still_ a special character on Mac! http://stackoverflow.com/questions/13298434/colon-appears-as...

lostlogin9y ago

And this is how we enter the new era. It goes MacOs, OS X, then macOS. Unfortunately the 10.xx numbering has been kept, which messes with what is (capitalisation aside?) a nice tidy-up. Maybe dropping the names part, Sierra, would have made it better. Relying on readers to spot your capitalisation isn't ideal at all, and if you start a sentence with macOS, how do you capitalise it?

kalleboo9y ago

To be super pedantic, it went

1. "Macintosh System Software"

2. "Mac OS" (starting with 7.5/7.6)

3. "Mac OS X"

4. "OS X" (starting with Mountain Lion)

5. "macOS" (starting with Sierra)

AWildDHHAppears9y ago

I fixed my MacOs in my post.

(For me it went MacOs, OS X, then Windows 10).

DashRattlesnake9y ago

Isn't the colon the directory separator character in HFS, akin to the unix '/' and windows '\'?
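Classic Mac OS did use ':' as its path separator, and that legacy is what the linked Stack Overflow question runs into. As a rough illustration, assuming the commonly described macOS behaviour that a ':' in a POSIX-level name is shown as '/' in the Finder (and vice versa), converting one name between the two views is a character swap. The helpers below are hypothetical, not a macOS API.

```python
# Translation table that swaps ':' and '/' in one pass.
_SWAP = str.maketrans({":": "/", "/": ":"})


def finder_to_posix(name: str) -> str:
    """Convert one Finder-visible name to its POSIX-layer spelling.

    Applies to a single path component, not to a full path.
    """
    return name.translate(_SWAP)


def posix_to_finder(name: str) -> str:
    """The swap is its own inverse, so the reverse direction is identical."""
    return name.translate(_SWAP)


if __name__ == "__main__":
    print(finder_to_posix("2016/07/20 notes"))  # 2016:07:20 notes
```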

duncans9y ago· 2 in thread

Related to this bug: there used to be a vulnerability in IIS back in the late '90s where you could append ::$DATA to a file name (e.g. Foo.asp::$DATA) and download a server-side script's source code.

jameshart9y ago

Related - meaning the ::$DATA was interpreted as a request for an alternate data stream from the file, and then read the default stream?

duncans9y ago

More info https://technet.microsoft.com/en-us/library/security/ms98-00... - seems to imply that $DATA is the default stream.
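To see why ::$DATA presumably slipped past IIS, here is a minimal parser for the `file:stream:type` naming syntax, assuming the convention Microsoft documents for NTFS streams: an empty stream name denotes the default (unnamed) stream, and `$DATA` is the stream type of ordinary file contents. So `Foo.asp::$DATA` is the unnamed `$DATA` stream of Foo.asp, i.e. its normal contents. `parse_stream_path` and `StreamPath` are hypothetical names, and the sketch does no validation beyond counting colons.

```python
import re
from typing import NamedTuple

_DRIVE = re.compile(r"[A-Za-z]:")


class StreamPath(NamedTuple):
    file: str         # the file the stream belongs to
    stream: str       # stream name; "" means the default (unnamed) stream
    stream_type: str  # stream type; file contents use "$DATA"


def parse_stream_path(path: str) -> StreamPath:
    """Split an NTFS 'file:stream:type' path into its three parts.

    Only colons in the last path component are treated as separators; a
    leading drive letter such as 'C:' is kept as part of the file path.
    """
    drive = ""
    if _DRIVE.match(path):
        drive, path = path[:2], path[2:]
    cut = max(path.rfind("\\"), path.rfind("/")) + 1
    directory, last = path[:cut], path[cut:]
    parts = last.split(":")
    if len(parts) > 3:
        raise ValueError(f"too many colons in {last!r}")
    stream = parts[1] if len(parts) > 1 else ""
    stream_type = parts[2] if len(parts) > 2 else "$DATA"
    return StreamPath(drive + directory + parts[0], stream, stream_type)


if __name__ == "__main__":
    # An extension check sees '.asp::$DATA', but NTFS resolves the name
    # to the default stream of Foo.asp, i.e. the script's source.
    print(parse_stream_path(r"C:\inetpub\wwwroot\Foo.asp::$DATA"))
```

Note that a two-character prefix like `a:b` is read as a drive-relative path, which matches how Windows itself interprets it.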

Grue39y ago· 2 in thread

I had a related problem with Dropbox. Some files uploaded from my Linux machine were not synced to my Windows machine. I later narrowed the problem down to images saved from Twitter, whose URLs end with ":orig". On Linux, Firefox happily saves such images as "blahblah:orig.jpg", whereas on Windows it uses a space instead of the colon. And of course Dropbox on Windows would silently ignore filenames that contain colons and report that the directories were synced, when they obviously weren't.
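A sync client could at least report such names instead of skipping them silently. A minimal sketch, assuming the character restrictions from Microsoft's file naming documentation (`<>:"/\|?*` and ASCII control characters) plus its advice against trailing spaces and dots; this is not Dropbox's actual logic, the function names are hypothetical, and reserved device names such as CON are not checked here.

```python
import os
import re

# Characters Windows rejects in file names, plus ASCII control characters.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def windows_problems(name: str) -> list:
    """Return the reasons a single file name would be trouble on Windows."""
    problems = []
    if _FORBIDDEN.search(name):
        problems.append("forbidden character")
    if name.endswith((" ", ".")):
        problems.append("trailing space or dot")
    return problems


def windows_safe_name(name: str, replacement: str = " ") -> str:
    """Replace forbidden characters (with a space by default, like the
    Firefox behaviour described above) and strip trailing spaces and dots."""
    cleaned = _FORBIDDEN.sub(replacement, name).rstrip(" .")
    return cleaned or "_"


def find_windows_unsafe(root: str):
    """Yield (path, problems) for every entry under `root` that a Windows
    client would have trouble with, instead of ignoring it."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            problems = windows_problems(name)
            if problems:
                yield os.path.join(dirpath, name), problems


if __name__ == "__main__":
    print(windows_safe_name("blahblah:orig.jpg"))  # blahblah orig.jpg
```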

Ieyeefae9y ago

There's https://www.dropbox.com/bad_files_check

reycharles9y ago

I get hit with a login page. Can anyone describe what is linked to?

fowl29y ago· 2 in thread

"McAfee Web Gateway" thinks this is porn, great.

kristianp9y ago

Why would that be, I wonder? I don't see any keywords that might trigger it.

That reminds me of web filtering software that blocked my search for "java proxy", but allowed "java procy", which google understood!

voltagex_9y ago

So does BlueCoat. I've submitted it for review, but I think McAfee is maintaining its own list.

_urga9y ago

The flip-side of this:

I was running a fuzz test on a backup tool, which verified that file data and metadata (including timestamps) as reflected by Windows were exactly as produced by the fuzz test.

I noticed that for some ".eml" files this was not the case. The mtime of these files was being modified by something else after the application initially created them. In the end, it came down to a Windows process that was automatically indexing ".eml" files and creating an ADS for each of them, thereby touching the mtime.

This was intentional on the part of Windows, but I never saw it coming.
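The kind of check that caught this can be sketched as comparing a snapshot of size and modification time taken right after writing with one taken later. This assumes the volume's mtime resolution is fine enough to see the change; `snapshot` and `changed` are hypothetical names, not the backup tool's code, and the demo simulates the indexer with `os.utime`.

```python
import os
import tempfile


def snapshot(paths):
    """Record (size, mtime in nanoseconds) for each path."""
    result = {}
    for p in paths:
        st = os.stat(p)
        result[p] = (st.st_size, st.st_mtime_ns)
    return result


def changed(before, after):
    """Paths whose size or mtime differ between two snapshots, sorted."""
    return sorted(p for p in before if after.get(p) != before[p])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "message.eml")
        with open(path, "w") as f:
            f.write("Subject: test\n")
        before = snapshot([path])
        # Simulate an outside process touching the file one second later.
        st = os.stat(path)
        os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
        print(changed(before, snapshot([path])))
```

Comparing the integer `st_mtime_ns` rather than the float `st_mtime` avoids false positives from floating-point rounding.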

ragsagar9y ago

Wonder why this site is blocked in UAE! :|
