53

I am writing an application that works with satellite images, and my boss asked me to look at some of the commercial applications, and see how they behave. I found a strange behavior and then as I was looking, I found it in other standard applications as well.

These programs first write their output to the temp folder, and then copy it to the intended destination.

Example: 7-Zip first extracts to the temp folder, and then copies the extracted data to the location you asked it to extract to.

I see several problems with this approach:

  1. The temp folder might not have enough free space, even when the intended location does.

  2. If it is a large file, the copy operation can take a non-negligible amount of time.

I thought about it a lot, but I couldn't see one single positive point to doing this. Am I missing something, or is there a real benefit to doing this?

yannis
  • 39,647

3 Answers

101

A few reasons I can think of:

  • On most platforms, file moves within a single filesystem are atomic, but file writes are not (especially if you can't write all the data in one go). So if you have the typical producer/consumer pattern (one process produces files, the other watches a directory and picks up everything it finds), writing to a temp folder first and only then moving to the real location means the consumer can never see an unfinished file.
  • If the process that writes the file dies halfway through, you have a broken file on your disk. If it's in a real location, you have to take care of cleaning it up yourself, but if it's in a temp location, the OS will take care of it.
  • If the file happens to be created while a backup job is running, the job may pick up an incomplete file; temp directories are generally excluded from backups, so the file will only be included once moved to the final destination.
  • The temp directory may be on a fast-but-volatile filesystem (e.g. a ramdisk), which can be beneficial for things like downloading several chunks of the same file in parallel, or doing in-place processing on the file with lots of seeks. Also, temp directories tend to cause more fragmentation than directories with less frequent reads, writes, and deletes, and keeping the temp directory on a separate partition can help keep fragmentation of the other partitions down.

TL;DR - it mostly boils down to atomicity, that is, you want to make it so that (at the final location) the file is either complete or not there at all at any given time.
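The atomicity point can be illustrated with a minimal Python sketch. The function name `write_atomically` and the `.tmp` suffix are made up for this illustration, and it assumes the staging file is created in the same directory as the destination, because a rename is only atomic when both paths are on the same filesystem. A consumer that watches that directory would have to skip names ending in `.tmp`.

```python
import os
import tempfile


def write_atomically(dest_path, data):
    """Write bytes so that dest_path is either absent/old or fully written."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Stage the file next to the destination so the final rename
    # never has to cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before publishing
        # Swap the finished file into place in one step.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # The write or the rename failed: remove the partial staging file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If the staging file lived in the system temp folder on a different volume instead, the final step would degrade to a copy followed by a delete, which is the slow, non-atomic case described in the question.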

tdammers
  • 52,936
15

This seems to be an issue in Windows, more specifically related to how drag-and-drop is handled.

The developers of the WinSCP client have written their own shell extension, which overrides this drag-and-drop behavior and allows dropping the file into the right folder immediately. Their documentation explains the trick and, more interestingly, what the problem is and how they solved it.

Here is the interesting part:

Windows drag&drop mechanics does not allow source application of drag&drop operation to find out easily, where the files are dropped. It is up to target application (Windows Explorer usually) to transfer files to destination. It is rather reasonable, because source application can hardly transfer files to all possible destinations. Keep in mind that you can drop files not only to a directory, but even to ZIP file (or any other archive), remote directory (via FTP, SFTP, SCP, …), trash, …

Obviously even Windows Explorer (or any other target application, like WinZip) cannot download files from any possible source (particularly it does not know SFTP/SCP).

Also, specifically for 7-Zip: user ray023 answers this question on Super User: https://superuser.com/a/422463

Basically, if instead of drag-dropping your file you use the "extract here" method available in both 7-Zip and WinRAR, the files are extracted directly to the right directory.

Jalayn
  • 9,827
0

If you have to do any kind of data processing on the file (decoding, converting, etc.), then it's better to use a temporary file and, when processing has completed, and only if it has completed, transfer the result to the final destination.

Benefits:

  1. Only completed files reach the destination
  2. The temporary file may (should) reside on fast media
  3. Avoids fragmentation of the final file
  4. Allows the use of other media as the final destination (ftp, cloud, whatever)
  5. Aborted temp files are easier to clean up

I don't see real benefits of writing straight to the destination while processing the data.
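As a rough sketch of that workflow in Python: the function name `convert_then_publish` and the `transform` callback are hypothetical, and the 64 KiB chunk size is an arbitrary choice. The processed output is staged in a scratch directory and moved to the destination only if processing finishes without an exception.

```python
import os
import shutil
import tempfile


def convert_then_publish(src_path, dest_path, transform):
    """Process src_path chunk by chunk; publish the result only on success."""
    with tempfile.TemporaryDirectory() as scratch:
        staged = os.path.join(scratch, os.path.basename(dest_path))
        with open(src_path, "rb") as src, open(staged, "wb") as out:
            # Read in fixed-size chunks until read() returns empty bytes.
            for chunk in iter(lambda: src.read(64 * 1024), b""):
                out.write(transform(chunk))
        # Reached only if every chunk was processed without an exception.
        shutil.move(staged, dest_path)
    # Leaving the with-block deletes the scratch directory, including
    # any partial output left behind by a failed run.
```

Note that `shutil.move` falls back to copy-then-delete when the scratch directory and the destination are on different filesystems, so in that case the final step is neither instant nor atomic.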

roetnig
  • 156