Stat'ing a million files is going to take a long time. Perforce doesn't have this problem because you explicitly check out files (p4 edit). (Perforce marks the whole tree read-only, as a reminder to edit the file before you save.)
It seems like large-repo git could implement the same feature. You would just disable (or warn on) operations that require stat'ing the whole tree.
Then the question is how to make the rest of the operations perform well -- git add taking 5-10 seconds seems indicative of an interesting problem, doesn't it?
It seems eminently obvious to me that a "change log" for (part of) a filesystem is valuable independently of your build system, revision control system, or whatnot.
At least that's what I'd like to see - it's functionality that's orthogonal to those tools.
Mac OS X's FSEvents API has something similar to that. When you create an FSEvents listener you can pass in an old event ID, and the system will give you everything that happened while you weren't listening [1] (rough sketch after the quote below). Apple uses this for Time Machine (and I suspect Spotlight, too).
To better understand this technology, you should first understand what it is not. It is not a mechanism for registering for fine-grained notification of filesystem changes. It was not intended for virus checkers or other technologies that need to immediately learn about changes to a file and preempt those changes if needed. [...]

The file system events API is also not designed for finding out when a particular file changes. For such purposes, the kqueues mechanism is more appropriate.

The file system events API is designed for passively monitoring a large tree of files for changes. The most obvious use for this technology is for backup software. Indeed, the file system events API provides the foundation for Apple’s backup technology.
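The API is pleasantly small. Here's a minimal sketch of the resume-from-an-old-event-ID part (my code, not Apple's; the /tmp path is a placeholder, and I use kFSEventStreamEventIdSinceNow where a real tool would pass its saved ID):

    /* Minimal sketch: resume an FSEvents stream from a saved event ID.
       Build on OS X with: cc fsevents.c -framework CoreServices */
    #include <CoreServices/CoreServices.h>
    #include <stdio.h>

    static void callback(ConstFSEventStreamRef stream, void *info,
                         size_t numEvents, void *eventPaths,
                         const FSEventStreamEventFlags flags[],
                         const FSEventStreamEventId ids[])
    {
        char **paths = eventPaths;
        for (size_t i = 0; i < numEvents; i++)
            /* Note: FSEvents reports which *directory* changed, not the file. */
            printf("changed: %s (event id %llu)\n",
                   paths[i], (unsigned long long)ids[i]);
    }

    int main(void)
    {
        CFStringRef path = CFSTR("/tmp");
        CFArrayRef watch = CFArrayCreate(NULL, (const void **)&path, 1, NULL);

        /* Passing an old event ID here makes the system replay everything
           that happened after that ID before delivering live events. */
        FSEventStreamRef stream = FSEventStreamCreate(
            NULL, callback, NULL, watch,
            kFSEventStreamEventIdSinceNow,  /* <-- saved ID goes here */
            1.0,                            /* latency: coalesce events for 1s */
            kFSEventStreamCreateFlagNone);

        FSEventStreamScheduleWithRunLoop(stream, CFRunLoopGetCurrent(),
                                         kCFRunLoopDefaultMode);
        FSEventStreamStart(stream);
        CFRunLoopRun();  /* runs forever in this sketch */
        return 0;
    }

A long-running tool would persist FSEventStreamGetLatestEventId(stream) somewhere and feed it back in as the sinceWhen argument on the next launch.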
IMO, only telling users which directories changed is a smart move. It means the amount of data that must be kept around is much smaller, which lets the OS keep the list around 'forever' (I don't know how long 'forever' actually is).
NTFS has this as an optional feature, the "USN Change Journal"; see http://msdn.microsoft.com/en-us/library/aa363798.aspx. It's used by a few Microsoft features like indexing and file replication, but it's available to third-party programs too.
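Reading it is a couple of DeviceIoControl calls against the raw volume. A minimal sketch (my code; it assumes an NTFS C: volume with an active journal, and you need enough privileges to open the volume):

    /* Minimal sketch: dump NTFS change journal records for C:.
       Build with MSVC: cl usn.c */
    #include <windows.h>
    #include <winioctl.h>
    #include <stdio.h>

    int main(void)
    {
        /* Open the raw volume, not a file on it. */
        HANDLE vol = CreateFileW(L"\\\\.\\C:", GENERIC_READ,
                                 FILE_SHARE_READ | FILE_SHARE_WRITE,
                                 NULL, OPEN_EXISTING, 0, NULL);
        if (vol == INVALID_HANDLE_VALUE) return 1;

        USN_JOURNAL_DATA jd;
        DWORD bytes;
        if (!DeviceIoControl(vol, FSCTL_QUERY_USN_JOURNAL, NULL, 0,
                             &jd, sizeof jd, &bytes, NULL)) return 1;

        READ_USN_JOURNAL_DATA rd = {0};
        rd.StartUsn     = jd.FirstUsn;    /* a real tool would persist NextUsn */
        rd.ReasonMask   = 0xFFFFFFFF;     /* all change reasons */
        rd.UsnJournalID = jd.UsnJournalID;

        DWORDLONG buf[512];  /* DWORDLONG keeps the records 8-byte aligned */
        if (!DeviceIoControl(vol, FSCTL_READ_USN_JOURNAL, &rd, sizeof rd,
                             buf, sizeof buf, &bytes, NULL)) return 1;

        /* The buffer starts with the next USN to resume from, followed by
           packed, variable-length USN_RECORDs. */
        BYTE *p   = (BYTE *)buf + sizeof(USN);
        BYTE *end = (BYTE *)buf + bytes;
        while (p < end) {
            USN_RECORD *rec = (USN_RECORD *)p;
            WCHAR *name = (WCHAR *)((BYTE *)rec + rec->FileNameOffset);
            wprintf(L"%.*s (reason 0x%08lx)\n",
                    (int)(rec->FileNameLength / sizeof(WCHAR)),
                    name, (unsigned long)rec->Reason);
            p += rec->RecordLength;
        }
        CloseHandle(vol);
        return 0;
    }

The journal has a finite size, so a consumer also has to handle the case where its saved USN has been purged or the journal was recreated (you get an error back and have to fall back to a full rescan).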
The git add problem is because .git/index is rewritten from scratch each time a change is staged. With a 100 MB index file, every git add takes at least as long as writing that much data to the disk (cache) -- at, say, ~100 MB/s of sequential write throughput, that's on the order of a second per staged change, even before any stat'ing. Much room for improvement here.