As long as ../ is on the same device, that should clear the directory instantaneously. That is the point, right? Of course, if you want an rm with lower IO-wait or lower CPU usage, use the rsync method, but if you want something that clears a directory as fast as possible, this is fast. Tested with:
for I in `seq 1 1000000`; do echo ${I} > ./${I}; done; sync   # echo is much faster than "touch" for creating the test files
Now can you please explain why you think that this is faster?
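tmp=../.tmp${RANDOM} &&   # remember the name so the later steps cannot match some other ../.tmp* directory
mkdir "$tmp" &&
mv ./* "$tmp" &&          # a very large glob can hit "Argument list too long" here
rm -rf "$tmp" &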
I believe that if I wrote a minimal tool in C it would be much faster than rsync. I would have to read the fs index if one exists, otherwise build a list of directories/files, then unlink them in parallel in inode order. Later, optimize the operations based on the fs.
But that's just a guess.
rsync definitely builds a list of files to delete first, so that will help.
Perhaps it also puts the files into inode-number order, which would also help, since it's related to directory hash order and the order of inodes in the inode table.
BTW, having millions of files in an ext3 directory in the first place is probably a bad idea. Instead, layer the files into two or three directory levels. See here:
http://www.redhat.com/archives/ext3-users/2007-August/msg000...
(Git for example places its objects under 1 of 256 directories based on the first hex byte representation of the object's SHA-1.)
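A rough sketch of that kind of layering, in Python. This is only an illustration: the function name shard_path and the choice to hash the file name (rather than the content, as Git does) are my assumptions, not something from the linked thread.

```python
import hashlib
import os


def shard_path(root, name):
    """Return a path for `name` under one of 256 subdirectories of `root`.

    The subdirectory is the first hex byte (two hex characters) of the
    SHA-1 of the file name, so files spread roughly evenly over 256 dirs.
    """
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    return os.path.join(root, digest[:2], name)


def create_sharded(root, name, data=b""):
    """Write `data` to the sharded location, creating the shard dir if needed."""
    path = shard_path(root, name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as fh:
        fh.write(data)
    return path
```

With a million files this leaves about 4,000 entries per directory, and a second level (the next two hex characters) would cut that down again.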
perl -e 'opendir D, "."; @f = grep {$_ ne "." && $_ ne ".."} readdir D;
unlink(@f) == $#f + 1 or die'
It goes a bit quicker still if @f and the error handling are omitted.
The original article is comparing different things some of the time, e.g. find is having to stat(2) everything to test if it's a file.
perl -e 'chdir "/var/session" or die; opendir D, ".";
while ($f = readdir D) { unlink $f }'
It is very efficient memory-wise compared to the other options as well as being much faster.
It is also easy to apply filters as you would with -mtime or such in find, just change the end statement to:
{ if (-M $f > 30) { unlink $f } }
to affect files modified more than 30 days ago.
Here's GNU coreutils rm [0] calling its remove() function [1], itself using fts to open, traverse, and remove each entry [2], vs rsync delete() [3] calling the {{robust,do}_,}unlink() functions [4] [5].
Now a little profiling could certainly help.
(damn gitweb that doesn't highlight the referenced line)
[0]: http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f...
[1]: http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f...
[2]: http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f...
[3]: http://rsync.samba.org/ftp/unpacked/rsync/delete.c
So one thing that's interesting is that both rsync and rm stat every file in each directory to determine whether to use rmdir or unlink, and to perform a depth-first removal. I wonder if it would be faster to skip the stats, just call unlink on everything and check errno for EISDIR (not POSIX, but returned by Linux), then descend as needed and use rmdir on the way back up.
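For what it's worth, here is roughly what that unlink-first idea looks like in Python. Treat it as an untimed sketch: remove_tree is a name I made up, and I also accept EPERM, which is the errno POSIX specifies (and the BSDs return) for unlink on a directory.

```python
import errno
import os


def remove_tree(path):
    """Empty `path` by trying unlink first and descending only on failure.

    No stat() is issued per entry: os.listdir only reads the directory,
    and the directory check is left to the kernel's unlink error.
    """
    for name in os.listdir(path):
        child = os.path.join(path, name)
        try:
            os.unlink(child)
        except OSError as exc:
            # Linux reports EISDIR for a directory; POSIX specifies EPERM.
            if exc.errno not in (errno.EISDIR, errno.EPERM):
                raise
            remove_tree(child)   # descend as needed...
            os.rmdir(child)      # ...and rmdir on the way back up
```

If the EPERM came from something other than a directory (an immutable file, say), the recursive os.listdir raises NotADirectoryError, so the failure is not silently swallowed.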
Some useful background material:
http://computer-forensics.sans.org/blog/2008/12/24/understan...
http://static.usenix.org/publications/library/proceedings/al...
The numbers are not that useful. It's notable that rsync:rm went from 1:12 in his old test to 1:3 in his new test, but we really don't know anything about why.
FWIW (very little), I did a similar test on a convenient OSX box (HFS+, 1000000 zero-byte files, single spindle), and rm won. rsync was next (+25%), straight C came in a little higher, then ruby (20% over rsync). Maybe BSD rm is awesome.
# btrfs subvolume create foobar
Create subvolume '/btrfs/foobar'
### now do all kinds of atrocities in this filesystem
# btrfs subvolume delete foobar
Delete subvolume '/btrfs/foobar'
I've salvaged an unwieldy directory by using Python to directly call unlink(2). Details: http://www.somebits.com/weblog/tech/bad/giant-directories.ht...
A small program using getdents() with a large buffer (5MB or so) speeds it up a lot.
If you want to be kind to your hard drive, sorting the buffer by inode before running the unlink()s lets the disk be accessed semi-sequentially (fewer head seeks).
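A Python approximation of the inode-sorting part, in case it helps. It is not the getdents() program described above: os.scandir goes through readdir, so the buffer size is not under your control, and unlink_in_inode_order is just a name I picked. On Unix, DirEntry.inode() comes straight from the directory entry, so no extra stat is needed.

```python
import os


def unlink_in_inode_order(directory):
    """Unlink the non-directory entries of `directory`, lowest inode first.

    Returns the number of entries removed. Subdirectories are skipped.
    """
    # Read the whole directory first, keeping (inode, name) pairs.
    with os.scandir(directory) as it:
        entries = sorted((entry.inode(), entry.name) for entry in it)

    removed = 0
    for _inode, name in entries:
        try:
            os.unlink(os.path.join(directory, name))
            removed += 1
        except (IsADirectoryError, PermissionError):
            # unlink on a directory: EISDIR on Linux, EPERM elsewhere.
            pass
    return removed
```

The whole listing is held in memory for the sort, which is the trade-off against the streaming readdir/unlink loop.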
How to delete million of files on busy Linux servers ("Argument list too long")
http://pc-freak.net/blog/how-to-delete-million-of-files-on-b...
The lead dev responsible for the app was also fond of hard-coding IP addresses and wouldn't even entertain talk of doing anything differently.
I got out of there ASAP.
opendir D, "."; while ($n = readdir D) { unlink $n }