82

I have a program that needs to generate temporary files. It is written for cluster machines.

If I saved those files to a system-wide temporary directory (eg: /tmp), some users complained the program failed because they didn't have proper access to /tmp. But if I saved those files to the working directory, those users also complained they didn't want to see those mysterious files.

Which one is a better practice? Should I insist that saving to /tmp is the right approach and defend any failure as "working as intended" (ie. ask your admin for proper permission/access)?

psmears
  • 196
SmallChess
  • 1,246

7 Answers

147

Temporary files have to be stored in the operating system's temporary directory, for several reasons:

  • The operating system makes it very easy to create those files while ensuring that their names are unique.

  • Most backup software knows which directories contain temporary files and skips them. If you use the current directory, it could significantly increase the size of incremental backups if backups are done frequently.

  • The temporary directory may be on a different disk, or in RAM, making the read-write access much, much faster.

  • Temporary files are often deleted on reboot (if they are in a ramdisk, they are simply lost). This reduces the risk of unbounded growth if your app doesn't always remove its temp files correctly (for instance after a crash).

    Cleaning temp files from the working directory can easily become messy if the files are stored together with application and user files. You can mitigate this by creating a separate directory within the current directory, but that can lead to other problems:

  • The path could become too long on some platforms. For instance, on Windows, some APIs, frameworks and applications still have severe path length limits, so you can easily hit one if the current directory is already deep in the tree hierarchy and your temporary file names are long.

  • On servers, the growth of the temporary directory is usually already being monitored. If you use a different directory, it may not be monitored, and monitoring the whole disk won't make it easy to see that it's the temp files that are taking up more and more space.

As for the access denied errors, make sure you let the operating system create a temporary file for you. The operating system may, for instance, know that for a given user, a directory other than /tmp or C:\Windows\temp should be used; thus, by accessing those directories directly, you may indeed encounter an access denied error.

If you get an access denied error even when using the operating system call, it simply means the machine is badly configured; Blrfl already explained this. It's up to the system administrator to configure the machine correctly; you don't have to change your application.

Creating temporary files is straightforward in many languages. A few examples:

  • Bash:

    # The next line will create a temporary file and return its path.
    path="$(mktemp)"
    echo "Hello, World!" > "$path"
    
  • Python:

    import tempfile
    
    # Creates a file and returns a tuple containing both the handle and the path.
    handle, path = tempfile.mkstemp()
    with open(handle, "w") as f:
        f.write("Hello, World!")
    
  • C:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    ...
    char temp_file[] = "/tmp/tmp.XXXXXX";
    int fd = mkstemp(temp_file);
    dprintf(fd, "Hello World!!!\n");
    close(fd);
    
  • C#:

    using System.IO;
    
    // Creates a file and returns the path.
    var path = Path.GetTempFileName();
    File.WriteAllText(path, "Hello, World!");
    
  • PHP:

    # Creates a file and returns the handle.
    $temp = tmpfile();
    fwrite($temp, "Hello, World!");
    fclose($temp);
    
  • Ruby:

    require "tempfile"
    
    # Creates a file and returns the file object.
    file = Tempfile.new ""
    file << "Hello, World!"
    file.close
    

Note that in some cases, such as in PHP and Ruby, the file is also removed automatically once the handle is closed or the object is garbage collected. That's an additional benefit of using the libraries bundled with the language/framework.

33

Should I insist that saving to /tmp is the right approach and defend any failure as "working as intended" (ie. ask your admin for proper permission/access)?

There are standards for this, and the best thing you can do is conform to them.

POSIX, which is followed by pretty much every non-mainframe OS of any significance that you're likely to run into, has provisions for creating uniquely-named temporary files in a directory using default values that can be reconfigured by the environment:

  • The C stdio.h header may optionally include a P_tmpdir macro that names the system's temporary directory.
  • TMPDIR is the canonical environment variable for changing the location of temporary files. Prior to POSIX, other variables were used, so I tend to use the first of TMPDIR, TMP, TEMPDIR and TEMP that has a value, punting and using the system default if none of them is set.
  • The mkstemp() and tmpfile() functions will generate unique temporary files, as sketched below.
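
For instance, a minimal C sketch of that lookup, assuming the TMPDIR/TMP/TEMPDIR/TEMP order described above (the helper name pick_tmpdir and the "myapp" prefix are placeholders, not part of any standard):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Return the first of TMPDIR, TMP, TEMPDIR, TEMP that is set and non-empty,
       falling back to P_tmpdir (or /tmp) otherwise. */
    static const char *pick_tmpdir(void)
    {
        const char *names[] = { "TMPDIR", "TMP", "TEMPDIR", "TEMP" };
        for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
            const char *dir = getenv(names[i]);
            if (dir && *dir)
                return dir;
        }
    #ifdef P_tmpdir
        return P_tmpdir;
    #else
        return "/tmp";
    #endif
    }

    int main(void)
    {
        char path[4096];
        snprintf(path, sizeof(path), "%s/myapp.XXXXXX", pick_tmpdir());

        int fd = mkstemp(path);    /* creates the file and guarantees a unique name */
        if (fd == -1) {
            perror("mkstemp");
            return 1;
        }
        dprintf(fd, "Hello, World!\n");
        close(fd);
        return 0;
    }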

If your users are being denied the ability to create temporary files, the system is either misconfigured or the administrators aren't making clear what their policy is on such things. In those cases, you'd be on very firm ground in saying that your program conforms to a well-established portability standard and that its behavior can be changed using the environment variables the standard specifies.

Blrfl
  • 20,525
10

The previous answers, although correct, don't hold for most large-scale computer clusters.

Computer clusters don't always follow the standard conventions of ordinary machines, usually for good reasons, and there is no point in arguing about it with the sysadmins.

Your current directory typically lives on the central file system, which is accessed over the network. This is not only slow, but also puts load on the system for the rest of the users, so you shouldn't use it unless you aren't writing much and you can recover if the job crashes.

The compute nodes have their own hard drives, which are the fastest file system available and what you should be using. The cluster documentation should tell you where that is, typically /scratch, /tmp/[jobid], or some non-standard environment variable ($SNIC_TMP on one of the clusters I use).

So, what I recommend is making it user-configurable (see the sketch below). The default can be the first of these you have write access to:

  • $TMPDIR
  • tmpfile
  • /tmp
  • .

But expect a low success rate with this approach, and make sure to emit a big fat warning.
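
A minimal C sketch of that fallback chain, assuming a hypothetical user-supplied setting (e.g. from a --tmpdir option) has already been parsed into user_choice; the helper name choose_tmp_dir is also just a placeholder:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    /* Return the first candidate directory we can write to, or NULL. */
    static const char *choose_tmp_dir(const char *user_choice)
    {
        const char *tmpdir = getenv("TMPDIR");
        const char *candidates[4];
        int n = 0;

        if (user_choice && *user_choice)
            candidates[n++] = user_choice;   /* an explicit user setting wins */
        if (tmpdir && *tmpdir)
            candidates[n++] = tmpdir;
        candidates[n++] = "/tmp";
        candidates[n++] = ".";

        for (int i = 0; i < n; i++)
            if (access(candidates[i], W_OK) == 0)
                return candidates[i];
        return NULL;
    }

    int main(void)
    {
        const char *dir = choose_tmp_dir(NULL);   /* NULL: nothing user-specified */
        if (dir == NULL) {
            fprintf(stderr, "ERROR: no writable temporary directory found\n");
            return 1;
        }
        if (strcmp(dir, ".") == 0)
            fprintf(stderr, "WARNING: falling back to the current directory for temporary files\n");
        printf("using %s for temporary files\n", dir);
        return 0;
    }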

Edit: I'll add another reason to force it to be user-set. One of my clusters has $TMPDIR set to /scratch, which is user-writable and on the local hard drive. But the documentation says that anything you write outside of /scratch/[jobid] may be deleted at any point, even in the middle of a run. So, if you follow the standards and trust $TMPDIR, you will encounter random crashes that are very hard to debug. So, you may accept $TMPDIR, but don't trust it.

Some other clusters do have this variable properly configured, so you may add an option to explicitly trust $TMPDIR; otherwise, emit a big, fat warning.

Davidmh
  • 230
9

The temp file directory is highly dependent on the operating system and environment. For example, a web server's temp dir is separate from the OS temp dir for security reasons.

Under MS Windows, every user has their own temp dir.

You should use a function like createTempFile() for this if one is available.
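
On Windows, for instance, a minimal C sketch using the Win32 calls GetTempPathA and GetTempFileNameA (the "app" prefix is just a placeholder) might look like this:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        char dir[MAX_PATH + 1];
        char path[MAX_PATH];

        /* Resolves the per-user temp directory (honouring the TMP/TEMP variables). */
        if (GetTempPathA(sizeof(dir), dir) == 0) {
            fprintf(stderr, "GetTempPathA failed\n");
            return 1;
        }

        /* Creates a uniquely named, empty file in that directory. */
        if (GetTempFileNameA(dir, "app", 0, path) == 0) {
            fprintf(stderr, "GetTempFileNameA failed\n");
            return 1;
        }

        printf("temporary file: %s\n", path);
        return 0;
    }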

k3b
  • 7,621
1

For many applications, you should consider putting temporary files in $XDG_RUNTIME_DIR or $XDG_CACHE_HOME (the other XDG dirs are for non-temporary files). For instructions on computing them when they are not explicitly passed in the environment, see the XDG Base Directory spec or find a library that already implements that part.
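
For $XDG_CACHE_HOME, the spec's fallback when the variable is unset or empty is $HOME/.cache; a minimal C sketch of that calculation (the helper name xdg_cache_home is just for illustration):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Resolve the cache directory per the XDG Base Directory spec:
       $XDG_CACHE_HOME if set and non-empty, otherwise $HOME/.cache. */
    static char *xdg_cache_home(void)
    {
        const char *env = getenv("XDG_CACHE_HOME");
        if (env && *env)
            return strdup(env);

        const char *home = getenv("HOME");
        if (!home || !*home)
            return NULL;    /* no sensible fallback without $HOME */

        size_t len = strlen(home) + strlen("/.cache") + 1;
        char *path = malloc(len);
        if (path)
            snprintf(path, len, "%s/.cache", home);
        return path;
    }

    int main(void)
    {
        char *dir = xdg_cache_home();
        printf("cache dir: %s\n", dir ? dir : "(unknown)");
        free(dir);
        return 0;
    }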

Note, however, that $XDG_RUNTIME_DIR is a new addition and there is no standard fallback for older systems due to security concerns.

If neither of those is suitable, then /tmp is the correct place. You should never assume the current directory is writable.

o11c
  • 588
0

they didn't have proper access to /tmp

It's not obvious what you mean by "proper access". Is it something like /tmp/xxx: Permission denied?

As others have pointed out, anything under /tmp must have a unique name, so if the program is using a file called /tmp/xxx for each user, only the first user will be able to use it. The program will fail for everyone else because they are trying to write to a file owned by that first user.

But, something not mentioned in any of the other answers so far, there is a good technical reason for using /tmp rather than /var/tmp, one's home directory, or the current directory.

Some systems have /tmp set up as a RAM-disk, which has three main advantages:

  • A RAM-disk will be much faster than anything on a permanent storage device.
  • A RAM-disk won't put wear and tear on SSD memory, which has a limit to the number of writes that it can handle in its lifetime.
  • The temporary files will automatically be deleted even if the program that created them terminates without cleaning up after itself. (Also true for /var/tmp, which cannot be a RAM-disk.)
-2

This is more of an alternative, but you might unlink() the file immediately after fopen(). It depends on the usage pattern, of course.

Unlinking the files, if it can be done, helps in several ways (see the sketch after this list):

  • The file is not visible, so the user doesn't see it.
  • The file is not visible to other processes, so there is no chance of another process modifying it by mistake.
  • Cleanup is easy if the program crashes.
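
A minimal C sketch of this pattern, shown with mkstemp() rather than fopen() so the name is also unique (the "myapp" prefix is just a placeholder):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        char path[] = "/tmp/myapp.XXXXXX";
        int fd = mkstemp(path);
        if (fd == -1) {
            perror("mkstemp");
            return 1;
        }

        /* Remove the directory entry right away: the file keeps existing as long
           as the descriptor is open, and is reclaimed automatically when the
           process exits or crashes. */
        unlink(path);

        dprintf(fd, "scratch data\n");
        /* ... read/write through fd as needed ... */
        close(fd);    /* the storage is released here */
        return 0;
    }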

Files must be created in /tmp. If a user doesn't have the rights to create files there, the system is misconfigured.

Files cannot always be created in the user's home directory. Many users, such as "nobody", "www-data" and others, do not have the right to write to their home directories, or are even chroot()-ed. Note that even in a chroot environment, /tmp still exists.

Nick
  • 305