How do programs like subversion detect when a file has been edited as opposed to created/deleted?

Question

This is my first question here, so I hope it is not off topic. Although I am using the Linux inotify library to listen for changes to files, and I am comparing that approach with Subversion, what I am specifically looking for is the algorithm used.

To a human it is very easy to tell whether a file has been created or modified: clicking the New button is the former, and clicking the Save button is the latter. To Linux, however, the two actions overlap heavily. Text editors, for example, generally write to a swap file and then copy or move it into place. This makes it difficult to use inotify to distinguish a minor edit to a file from a deliberate overwrite of it. What I am trying to understand is how a program such as Subversion tells the difference between a user modifying a file with a text editor and a user deleting the file and creating a new file with the same name.

Edit: It has been pointed out that Subversion does not do what I want, so it was a blunder on my part to use it as an example. Allow me instead to rephrase the question: "Is there any known program or programming approach that maps high-level actions, such as creating a new file and saving it, onto low-level actions, such as modifying, moving, and copying, so that I can log every file in the system and the changes made to it?"

score 3 · Answer 1 · answered May 11 '12 at 02:24

If you want to learn how Subversion examines the working directory, you can read the source fairly easily. As a longtime SVN user, I can say with some confidence that SVN makes no distinction at all about what happens to a file before the commit; it just compares what you are committing with the repository's version (in practice, with the pristine copy of the base revision kept in the working copy's .svn directory). Nothing more, nothing less.
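To make that concrete, here is a hypothetical shell function (file_status is an illustrative name, not part of Subversion) that behaves roughly like svn status for a single file. It is a sketch that assumes the base copy is just a plain file you saved earlier; it looks only at the end state, never at the history of events.

```shell
#!/bin/bash
# Hypothetical helper, loosely modelled on `svn status` for a single file.
# It compares the working file with a stored base copy and nothing else, so
# it has no idea how the file reached its current state.
# Usage: file_status WORKING_FILE BASE_COPY
file_status() {
    local working="$1" base="$2"
    if [ ! -e "$base" ]; then
        echo "unversioned"   # no base copy (svn shows "?")
    elif [ ! -e "$working" ]; then
        echo "missing"       # base copy exists but the file is gone (svn shows "!")
    elif cmp -s "$working" "$base"; then
        echo "unmodified"    # same contents, however they got there
    else
        echo "modified"      # contents differ (svn shows "M")
    fi
}
```

Deleting the file and recreating it with identical contents still reports "unmodified", which is exactly why this kind of tool cannot answer the original question.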

score 2 · Answer 2 · answered May 11 '12 at 04:20

I'm not sure what you mean by an algorithm, but you can certainly do this. How hard it will be depends on which operating system and file system you are using, and on the level of detail you need.

If you think about it, this is just what the file system is doing anyway. A file is an abstraction; underneath, it's just a collection of blocks on a disk. Operations like creating, deleting, or modifying a file are just the file system managing lists of those blocks. What you are asking for is for the file system to share that information with you as it performs its management chores.
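One way to turn low-level events back into high-level actions, offered here as an assumption rather than an established tool, is to coalesce a burst of events per file name and compare whether each name existed before and after the burst. The function summarize_events below is hypothetical. It reads lines in the format printed by inotifywait -m --format '%e %f' on a watched directory (comma-separated event names, a space, then the file name), and it assumes something else has already cut the stream into bursts, for example after a short quiet period.

```shell
#!/bin/bash
# Hypothetical sketch (needs bash 4+ for associative arrays): coalesce a burst
# of inotify events into one high-level action per file name.
# Input lines: comma-separated event names, a space, then the file name, as
# printed by `inotifywait -m --format '%e %f' DIR`.
# Arguments: names known to exist before the burst (e.g. from an earlier listing).
summarize_events() {
    declare -A existed=() exists=() seen=()
    local name events
    for name in "$@"; do existed[$name]=1; exists[$name]=1; done
    while read -r events name; do
        if [[ -z ${seen[$name]} ]]; then
            seen[$name]=1
            # A first event other than a creation means the name already existed
            case ",$events," in
                *,CREATE,*|*,MOVED_TO,*) ;;
                *) existed[$name]=1; exists[$name]=1 ;;
            esac
        fi
        case ",$events," in
            *,CREATE,*|*,MOVED_TO,*)   exists[$name]=1 ;;
            *,DELETE,*|*,MOVED_FROM,*) unset 'exists[$name]' ;;
        esac
    done
    # Report each name by comparing its state before and after the burst
    for name in "${!seen[@]}"; do
        if [[ -n ${existed[$name]} && -n ${exists[$name]} ]]; then
            echo "modified $name"
        elif [[ -n ${exists[$name]} ]]; then
            echo "created $name"
        elif [[ -n ${existed[$name]} ]]; then
            echo "deleted $name"
        fi  # created and removed within the burst (e.g. a swap file): not reported
    done | sort
}
```

Fed a sequence in which an editor renames the original to a backup, writes a fresh file, and then deletes the backup, it reports a single "modified" line and ignores the backup. Without the list of names that existed beforehand, though, a save that renames a swap file over the original only produces MOVED_TO for the target and looks like a creation; the arguments are there to supply that prior knowledge.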

At one level there are library calls like stat and inotify, but these just wrap lower-level OS or device-driver calls, and you can tap into those too. In the 'olden' days you would have had to write interrupt hooks to monitor the OS calls; nowadays the OS may provide a sophisticated API for hooking into the file system. See, for example, this article on the NTFS transactional file system API. I also just stumbled across the paper "VFS Interceptor: Dynamically Tracing File System Operations in real environments", which discusses the design of a tool for tracing file system operations.
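Even plain stat exposes some of that bookkeeping. The sketch below assumes GNU coreutils stat (the -c format flags differ on BSD and macOS), and the names snapshot and classify_change are hypothetical. It records a file's inode number, size, and modification time, then treats a changed inode as "the name now points at a different file", which is what a rename-style save produces. This is a heuristic: a freed inode number can be reused by a file created right after a deletion, and %Y only has one-second resolution.

```shell
#!/bin/bash
# Hypothetical helpers built on GNU coreutils stat (flags differ on BSD/macOS).
# snapshot FILE prints "inode size mtime", or "absent" if the file is missing.
snapshot() {
    stat -c '%i %s %Y' -- "$1" 2>/dev/null || echo "absent"
}

# classify_change OLD_SNAPSHOT FILE compares FILE's current state with OLD_SNAPSHOT.
# Caveats: inode numbers can be reused after a delete, and %Y has one-second
# resolution, so this is a heuristic rather than an exact answer.
classify_change() {
    local old="$1" new
    new=$(snapshot "$2")
    if [ "$old" = "absent" ] && [ "$new" = "absent" ]; then
        echo "absent"
    elif [ "$old" = "absent" ]; then
        echo "created"
    elif [ "$new" = "absent" ]; then
        echo "deleted"
    elif [ "${old%% *}" != "${new%% *}" ]; then
        echo "replaced"            # name now points at a different inode
    elif [ "$old" != "$new" ]; then
        echo "modified in place"   # same inode, new size or mtime
    else
        echo "unchanged"
    fi
}
```

Writing a temporary file and moving it over the original always yields a new inode, because both files exist at the same moment; truncating and rewriting the original keeps the inode.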

score 1 · Answer 3 · answered Feb 28 '13 at 22:52

You could copy the file yourself before setting up your inotify watch, then, whenever you get an event, diff the file against your stored copy to see how much has changed. A simple bash example, assuming inotify-tools is installed, would look something like this (pass the file you're watching as the first parameter):

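#!/bin/bash
# Usage: ./watch.sh FILE   (requires inotify-tools)
file=$1
tempfile=$(mktemp) || exit 1
trap 'rm -f "$tempfile"' EXIT      # remove the stored copy when the script exits
cp "$file" "$tempfile"
while true; do
    # Wait for a write, or for the file to be moved/deleted (editors often save via rename)
    inotifywait -qq -e close_write -e move_self -e delete_self "$file"
    # If the editor briefly removed the file, wait until it reappears
    while [ ! -e "$file" ]; do sleep 0.1; done
    changesize=$(diff "$file" "$tempfile" | wc -l)
    if [ "$changesize" -gt 10 ]; then
        echo "Big change"
    else
        echo "Minor edit"
    fi
    # Store the current version so the next event is compared against it
    cp "$file" "$tempfile"
done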

NOTE: This is only a sample. I haven't tested it thoroughly, so adapt it before using it.
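Piping the whole diff output to wc -l also counts diff's own hunk headers (such as 2c2) and --- separators. A slightly more precise measure, sketched below as a hypothetical refinement (changed_lines and classify_edit are illustrative names), counts only the lines that normal-format diff marks as removed (<) or added (>). The 10-line threshold is the same arbitrary cut-off used in the script above.

```shell
#!/bin/bash
# Count changed lines between two versions of a file: only the lines diff marks
# as removed ("<") or added (">"), not hunk headers like "2c2" or "---".
# Usage: changed_lines OLD NEW
changed_lines() {
    # grep -c prints 0 (and exits 1) when nothing matches; keep the 0, ignore the status
    diff -- "$1" "$2" | grep -c '^[<>]' || true
}

# Hypothetical threshold, as in the watcher script: more than 10 changed lines
# counts as a "big change".
classify_edit() {
    local n
    n=$(changed_lines "$1" "$2")
    if [ "$n" -gt 10 ]; then echo "Big change"; else echo "Minor edit"; fi
}
```

For a one-line edit, diff prints four lines (2c2, the old line, ---, the new line), but changed_lines reports 2.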
