
I had the same reaction as you.

Stat'ing a million files is going to take a long time. Perforce doesn't have this problem because you explicitly check out files (p4 edit). (Perforce marks the whole tree read-only, as a reminder to edit the file before you save.)

It seems like large-repo git could implement the same feature. You would just disable (or warn) for operations which require stat'ing the whole tree.

Then the question is how to make the rest of the operations perform well -- git add taking 5-10 seconds seems indicative of an interesting problem, doesn't it?



It seems to me that you could have a daemon that uses inotify to make operations O(changed) vs O(size).
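A rough sketch of that difference, in Python. `full_scan` is the O(size) approach that lstat()s every file. `ChangeLog` is a hypothetical stand-in for the daemon's state, not a real inotify binding: it assumes something else calls `record()` once per filesystem event.

```python
import os
import tempfile


def full_scan(root, known_mtimes):
    """O(size): lstat every file under root, compare with the recorded mtimes.

    Deleted files are ignored to keep the sketch short.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if known_mtimes.get(path) != os.lstat(path).st_mtime_ns:
                changed.append(path)
    return sorted(changed)


class ChangeLog:
    """Hypothetical daemon state: the set of paths touched since the last query."""

    def __init__(self):
        self._dirty = set()

    def record(self, path):
        # Assumed to be called once per filesystem event the daemon receives.
        self._dirty.add(path)

    def drain(self):
        # O(changed): hand back the dirty paths and start a fresh log.
        dirty, self._dirty = sorted(self._dirty), set()
        return dirty


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        paths = [os.path.join(root, name) for name in ("a", "b", "c")]
        for path in paths:
            open(path, "w").close()
        snapshot = {p: os.lstat(p).st_mtime_ns for p in paths}

        os.utime(paths[1], ns=(1, 1))       # simulate an edit to "b"
        log = ChangeLog()
        log.record(paths[1])                # what the daemon would have seen

        print(full_scan(root, snapshot))    # visits all three files
        print(log.drain())                  # touches only the one dirty path
```

Both calls report the same file; the second never looks at the unchanged ones, which is the whole point when the tree has a million entries.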


Which would also be tremendously useful for e.g. make.


There already exists tup: http://gittup.org/tup/ which does that sort of thing.


It seems eminently obvious to me that having basically a "change log" for a (part of a) filesystem is something that's valuable independent of your build system, revision control system, whatnot.

At least that's what I'd like to see - it's functionality that's orthogonal to those tools.


Oh my god, that would be awesome at the FS level.


Mac OS X's FSEvents API has something similar to that. When you create an FSEvents listener you can pass in an old event ID so the system can give you all the stuff that happened while you weren't listening [1]. Apple uses this for Time Machine (and I suspect Spotlight, too).

[1] https://developer.apple.com/library/mac/#documentation/Darwi...


What happens if a file is created and deleted multiple times? How does this avoid doing a complete walk of FS state and being O(size) itself?


It does not actually tell you what files got changed; it tells you what directories saw at least one change.

Programs will still have to inspect those directories to find out what file(s) changed.

To quote https://developer.apple.com/library/mac/#documentation/Darwi...:

  To better understand this technology, you should first understand what it is
  not. It is not a mechanism for registering for fine-grained notification of
  filesystem changes. It was not intended for virus checkers or other technologies
  that need to immediately learn about changes to a file and preempt those changes
  if needed. [...]

  The file system events API is also not designed for finding out when a
  particular file changes. For such purposes, the kqueues mechanism is more
  appropriate.

  The file system events API is designed for passively monitoring a large tree of
  files for changes. The most obvious use for this technology is for backup
  software. Indeed, the file system events API provides the foundation for Apple’s
  backup technology.

IMO, only telling users which directories changed is a smart move. It means that the amount of data that must be kept around is much smaller. That allows the OS to keep this list around 'forever' (I do not know how 'forever' that actually is).
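To make the "inspect those directories" step concrete, here is a minimal Python sketch. It does not call FSEvents at all: `changed_dirs` is assumed to be whatever list of directories the notification API handed you, and `snapshot_dir` and `diff_changed_dirs` are names made up for this example.

```python
import os


def snapshot_dir(directory):
    """Map each regular file directly inside `directory` to its mtime (ns)."""
    entries = {}
    with os.scandir(directory) as it:
        for entry in it:
            if entry.is_file(follow_symlinks=False):
                entries[entry.name] = entry.stat(follow_symlinks=False).st_mtime_ns
    return entries


def diff_changed_dirs(changed_dirs, previous):
    """Rescan only the directories reported as changed.

    `previous` maps directory -> last snapshot and is updated in place.
    Returns a sorted list of (kind, path) tuples.
    """
    changes = []
    for directory in changed_dirs:
        old = previous.get(directory, {})
        # A reported directory may itself have been deleted in the meantime.
        new = snapshot_dir(directory) if os.path.isdir(directory) else {}
        for name in new.keys() - old.keys():
            changes.append(("added", os.path.join(directory, name)))
        for name in old.keys() - new.keys():
            changes.append(("removed", os.path.join(directory, name)))
        for name in new.keys() & old.keys():
            if new[name] != old[name]:
                changes.append(("modified", os.path.join(directory, name)))
        previous[directory] = new
    return sorted(changes)
```

The work is proportional to the size of the reported directories, not the whole tree. A file that was created and deleted several times between two scans simply never shows up, because only the before and after states are compared.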


It doesn't. But it does mean you don't have to do it every time.

This is a nice overview of FSEvents:

http://arstechnica.com/apple/reviews/2007/10/mac-os-x-10-5.a...


NTFS has this optionally in the "USN Change Journal"; see http://msdn.microsoft.com/en-us/library/aa363798.aspx. It's used by a few Microsoft features like indexing and file replication, but it's available to third party programs too.


Linux has been toying with a decent replacement for inotify for a while. Last time I looked it was called fanotify[1] and was still not merged.

[1] https://lwn.net/Articles/339399/


Fanotify was merged but disabled in 2.6.36. It was enabled in 2.6.37.

https://lwn.net/Articles/421638/


The git add problem is because .git/index is rewritten from scratch each time a new change is staged. With a 100 MB index file, that takes as long as it takes to write that much data to disk (cache). Much room for improvement here.


Writing 100 MB to disk should take around 4 seconds on an HDD and less on an SSD.
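For reference, the arithmetic behind an estimate like that is just size divided by sustained write throughput. The sketch below only uses the two figures from this thread (100 MB, 4 seconds); no throughput was measured, and the function names are made up for the example.

```python
def write_seconds(size_mb, throughput_mb_per_s):
    """Time to write size_mb at a sustained throughput, ignoring caching and seeks."""
    return size_mb / throughput_mb_per_s


def implied_throughput(size_mb, seconds):
    """Sustained throughput (MB/s) that a given size and duration imply."""
    return size_mb / seconds


# The 4-second estimate for a 100 MB index implies this many MB/s:
rate = implied_throughput(100, 4)
# Plugging that rate back in reproduces the estimate:
estimate = write_seconds(100, rate)
```

So the 4-second figure corresponds to 25 MB/s sustained. A drive that writes faster than that would finish proportionally sooner.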



