103

I have a rather old server with 4 GB of RAM that is pretty much serving the same files all day, but it is doing so from the hard drive while 3 GB of RAM sit "free".

Anyone who has ever run a RAM drive can attest that it's awesome in terms of speed. Memory usage on this system is usually never higher than 1 GB out of 4 GB, so I want to know whether there is a way to use that extra memory for something good.

  • Is it possible to tell the filesystem to always serve certain files out of RAM?
  • Are there any other methods I can use to improve file reading capabilities by use of RAM?

More specifically, I am not looking for a 'hack' here. I want filesystem calls to serve the files from RAM without my needing to create a RAM drive and copy the files there manually, or at least a script that does this for me.

Possible applications here are:

  • Web servers with static files that get read a lot
  • Application servers with large libraries
  • Desktop computers with too much RAM

Any ideas?

Edit:

  • Found this very informative: The Linux Page Cache and pdflush
  • As Zan pointed out, the memory isn't actually free. What I mean is that it's not being used by applications and I want to control what should be cached in memory.
Andrioid

18 Answers

81

vmtouch seems like a good tool for the job.

Highlights:

  • query how much of a directory is cached
  • query how much of a file is cached (also which pages, graphical representation)
  • load file into cache
  • remove file from cache
  • lock files in cache
  • run as daemon

vmtouch manual

EDIT: Usage as asked in the question is listed in Example 5 on the vmtouch homepage.

Example 5

Daemonise and lock all files in a directory into physical memory:

vmtouch -dl /var/www/htdocs/critical/

EDIT2: As noted in the comments, there is now a git repository available.
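If you want the lock to survive reboots, one option is to run that same command from a small systemd unit at startup. This is only a sketch: the unit name and the /usr/bin/vmtouch path are assumptions, and the directory is the one from Example 5 above.

```ini
# /etc/systemd/system/vmtouch-critical.service (hypothetical unit name)
[Unit]
Description=Lock critical static files into the page cache
After=local-fs.target

[Service]
# vmtouch -d forks into the background, hence Type=forking
Type=forking
ExecStart=/usr/bin/vmtouch -dl /var/www/htdocs/critical/

[Install]
WantedBy=multi-user.target
```

Enable it with `systemctl enable --now vmtouch-critical` (assuming a systemd-based distribution).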

seeker
34

This is also possible using the vmtouch Virtual Memory Toucher utility.

The tool allows you to control the filesystem cache on a Linux system. You can force or lock a specific file or directory into the VM cache subsystem, or use it to check which portions of a file or directory are currently held in the cache.

How much of the /bin/ directory is currently in cache?

$ vmtouch /bin/
           Files: 92
     Directories: 1
  Resident Pages: 348/1307  1M/5M  26.6%
         Elapsed: 0.003426 seconds

Or...

Let's bring the rest of big-dataset.txt into memory...

$ vmtouch -vt big-dataset.txt
big-dataset.txt
[OOo                                                 oOOOOOOO] 6887/42116
[OOOOOOOOo                                           oOOOOOOO] 10631/42116
[OOOOOOOOOOOOOOo                                     oOOOOOOO] 15351/42116
[OOOOOOOOOOOOOOOOOOOOOo                              oOOOOOOO] 19719/42116
[OOOOOOOOOOOOOOOOOOOOOOOOOOOo                        oOOOOOOO] 24183/42116
[OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo                  oOOOOOOO] 28615/42116
[OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo              oOOOOOOO] 31415/42116
[OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo      oOOOOOOO] 36775/42116
[OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOo  oOOOOOOO] 39431/42116
[OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO] 42116/42116

           Files: 1
     Directories: 0
   Touched Pages: 42116 (164M)
         Elapsed: 12.107 seconds
ewwhite
27

A poor man's trick for getting stuff into the filesystem cache is to simply cat it and redirect the output to /dev/null.

For example:

cat /path/myfile.db > /dev/null 
cagenut
23

Linux will cache as much disk IO in memory as it can. This is what the cache and buffer memory stats are. It'll probably do a better job than you will at storing the right things.

However, if you insist on storing your data in memory, you can create a RAM drive using either tmpfs or ramfs. The difference is that ramfs grows dynamically with no size limit and its pages can never be swapped out, whereas tmpfs enforces a size limit and can swap its pages out to disk under memory pressure. My memory is a little rusty, but you should be able to do:

 # mount -t ramfs ram /mnt/ram 

or

 # mount -t tmpfs tmp /mnt/tmp

and then copy your data to the directory. Obviously, when you turn the machine off or unmount that partition, your data will be lost.
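If you want the tmpfs to come back with an explicit size limit at every boot, an /etc/fstab entry along these lines does the same thing (the mount point and the 1 GB limit are assumptions; the copy step is still manual after each boot):

```
tmpfs  /mnt/tmp  tmpfs  size=1g  0  0
```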

David Pashley
18

After some extensive reading on the 2.6 kernel's swapping and page-caching features, I found 'fcoretools', which consists of two tools:

  • fincore: reveals how many of a file's pages the application has stored in core memory
  • fadvise: lets you manipulate the core memory (page cache)

(I'm posting this here in case someone else finds it interesting.)
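The same advice can also be issued from Python's standard library via os.posix_fadvise (available on Linux since Python 3.3). A minimal sketch; the web-root path in the comment is a made-up example:

```python
import os

def prefetch(path):
    """Ask the kernel to read a whole file into the page cache ahead of time."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        # POSIX_FADV_WILLNEED hints that this range will be accessed soon,
        # so the kernel may start readahead into the page cache.
        os.posix_fadvise(fd, 0, size, os.POSIX_FADV_WILLNEED)
        return size
    finally:
        os.close(fd)

# e.g. prefetch("/var/www/htdocs/index.html")  # hypothetical path
```

Note this is advisory only: the kernel is free to evict those pages again under memory pressure, unlike vmtouch's locking mode.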

Andrioid
10

There are two kernel settings that can help considerably even without using other tools:

swappiness

tells the Linux kernel how aggressively it should use swap. Quoting the Wikipedia article:

Swappiness is a property of the Linux kernel that changes the balance between swapping out runtime memory, as opposed to dropping pages from the system page cache. Swappiness can be set to values between 0 and 100 inclusive. A low value means the kernel will try to avoid swapping as much as possible, whereas a higher value will make the kernel aggressively try to use swap space. The default value is 60, and for most desktop systems, setting it to 100 may affect the overall performance, whereas setting it lower (even 0) may improve interactivity (decreasing response latency.)

vfs_cache_pressure

Quoting from vm.txt:

Controls the tendency of the kernel to reclaim the memory which is used for caching of directory and inode objects.

At the default value of vfs_cache_pressure=100 the kernel will attempt to reclaim dentries and inodes at a "fair" rate with respect to pagecache and swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer to retain dentry and inode caches. ...


With swappiness set high (like 100), the kernel moves everything it doesn't need to swap, freeing RAM for caching files. And with vfs_cache_pressure set lower (let's say to 50, not to 0!), it will favor caching files over keeping application data in RAM.

(I work on a large Java project, and every time I ran it, it used a lot of RAM and flushed the disk cache, so the next time I compiled the project everything was read from disk again. By adjusting these two settings, I manage to keep the sources and compiled output cached in RAM, which speeds up the process considerably.)
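As a concrete sketch of how to apply this, the two settings can be made persistent in a sysctl drop-in file. The file name is made up, and the values are the ones discussed above, not a universal recommendation:

```
# /etc/sysctl.d/60-page-cache.conf (hypothetical file name)
vm.swappiness = 100
vm.vfs_cache_pressure = 50
```

Load it without rebooting via `sysctl -p /etc/sysctl.d/60-page-cache.conf`, or try a value temporarily with `sysctl -w vm.swappiness=100`.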

Petr
4

You may be able to have a program that simply mmaps your files and then stays running.
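A minimal sketch of that idea in Python; the list of paths is an assumption, and a real deployment would also want to mlock the mappings (which is exactly what vmtouch -dl does for you):

```python
import mmap
import os

def map_and_touch(path):
    """mmap a file and fault every page in, pulling it into the page cache."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        mm = mmap.mmap(fd, size, prot=mmap.PROT_READ)
    finally:
        os.close(fd)  # the mapping holds its own reference to the file
    touched = 0
    for off in range(0, size, mmap.PAGESIZE):
        _ = mm[off]  # reading one byte per page faults that page in
        touched += 1
    return mm, touched

# A long-running process would keep its mappings alive, e.g.:
#   mappings = [map_and_touch(p)[0] for p in my_paths]
#   while True:
#       time.sleep(3600)
```

Without locking, the kernel can still evict these pages under memory pressure; the resident mappings just make eviction less likely while the process holds them.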

Brad Gilbert
3

If you have plenty of memory you can simply read in the files you want to cache with cat or similar. Linux will then do a good job of keeping them around.

2

I very much doubt that it is actually serving files from the disk with 3 GB of RAM free. Linux file caching is very good.

If you are seeing disk IO, I would look into your logging configurations. Many logs get set as unbuffered, in order to guarantee that the latest log information is available in the event of a crash. In systems that have to be fast regardless, use buffered log IO or use a remote log server.

Zan Lynx
1

http://www.coker.com.au/memlockd/ does this

Though you really don't need it: Linux will do a pretty good job of caching the files you are using on its own.

Justin
0

Desktop computers (e.g. Ubuntu) already preload files (at least popular shared libraries) into memory on boot. It is used to speed up booting and the startup time of different bloatware like Firefox, OpenOffice, KDE and GNOME (with the Evolution bloat-mailer).

The tool is named readahead: http://packages.ubuntu.com/dapper/admin/readahead

There is also a corresponding syscall, readahead(2): http://linux.die.net/man/2/readahead

There is also a preloading-daemon project: http://linux.die.net/man/8/preload

osgx
0

There are various ramfs systems you can use (e.g. ramfs, tmpfs), but in general, if files are actually being read that often, they sit in your filesystem cache. If your working set of files is larger than your free RAM, then files will be evicted from the cache; but then again, there's no way you'd fit it all into a ramdisk either.

Check the output of the "free" command in a shell - the value in the last column, under "Cached", is how much of your free ram is being used for filesystem cache.
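On systems where the free output format differs, the same figure can be read straight from /proc/meminfo. A small sketch:

```python
def cached_kib():
    """Return the page-cache size (the 'Cached' field of /proc/meminfo) in KiB."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("Cached:"):
                # The line looks like: "Cached:  1234567 kB"
                return int(line.split()[1])
    raise RuntimeError("no Cached line found in /proc/meminfo")

print(f"{cached_kib() / 1024:.0f} MiB of RAM is being used as filesystem cache")
```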

0

As for your latter question, ensure that your RAM modules sit on different memory channels so that the processor can fetch the data in parallel.

sybreon
0

I think this might be better solved at the application level. For instance, there are probably specialized web servers for this, or you might consider mod_cache with Apache. If you have a specific goal, such as serving web content faster, then I think you can get improvements from this sort of thing.

But your question is general in nature; the Linux memory subsystem is designed to provide the best general use of RAM. If you want to target certain types of performance, consider looking at everything in /proc/sys/vm.

The fcoretools package is interesting; I'd be interested in any articles about its application... This link talks about the actual system calls used in an application.

Kyle Brandt
0

Not exactly what was asked, but I use

find BASE_DIRECTORY -type f -exec cat {} >/dev/null \;

to trigger initialization of files in an AWS volume created from a snapshot. It's more focused than the official recommendation of using dd if you just want to read some files.

Federico
-1

I just tried:

dd if=/dev/yourrootpartition of=/dev/null bs=1M count=howmuchmemoryyouwanttofill

It does not give you the control that you desire, but it at least tries to use the wasted memory.

-1

Sometimes I may want to cache files in a certain folder and its subfolders. I just go to this folder and execute the following:

find . -type f -exec cp {} /dev/null \;

And those files are cached.

-2

I use find / -name stringofrandomcharacter and it helps a lot (walking the whole filesystem pulls directory entries and inodes into the cache, though not file contents).