
Please note that this is about a thread-safe LRU cache (not simply an LRU cache as specified in https://leetcode.com/problems/lru-cache/description/). It isn't a duplicate of the LRU cache design question, because there are tricky aspects of locking the hash table and linked list (LL) that aren't addressed in other multithreaded LRU design questions.

The accepted approach on how to make an LRU cache thread-safe in C++ seems to be all over the place. I noticed some links mentioning that locking both the hash table and the linked list is fine, while others don't directly address the issue in C++ and just vaguely discuss various ways to do it.

  • Can someone describe a functional and efficient approach using locks, including what is locked and when, for implementing a multithreaded version of the LRU cache in C++ (https://leetcode.com/problems/lru-cache/description/) so that all get/set operations remain O(1)?

  • How do caches such as memcached (as used by Facebook) implement multithreading support?

Joe Black

2 Answers

I have implemented this using a multiple readers/single writer lock. This is a type of lock that allows multiple threads to read from the cache at the same time, but only allows a single thread to write to the cache. The basic gist of it is something like this (pseudocode):

void put(ItemType item, KeyType key)
{
    cacheLock.lockForWriting();

    // If it is already in the list and map, remove it
    list.remove(key);
    map.remove(key);

    // Now put it back into both
    map [ key ] = item;
    list.append(key);

    // If we're past the size limit of the cache, remove the least recently used item
    if (list.size() > MAX_CACHE_SIZE)
    {
        ListNode *oldKey = list.head;
        list.head = list.head.next;
        map.remove(oldKey->key);
        delete oldKey;
    }

    cacheLock.unlock();
}

ItemType* get(KeyType key)
{
    cacheLock.lockForReading();
    ItemType* item = map [ key ];
    cacheLock.unlock();

    // NOTE: This assumes that the cache is not the owner of this object.
    // If it is, then you need to either make a copy of it inside the lock
    // above or use smart pointers instead of raw pointers, because when it
    // gets kicked out of the cache it will be deleted.
    return item;
}
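
For reference, here is a minimal sketch of the same readers/writer idea in real C++17, using std::shared_mutex together with std::list and std::unordered_map. The class name, the fixed capacity, and the choice to return values by copy (which sidesteps the ownership caveat above) are illustrative choices, not the only way to do it:

#include <cstddef>
#include <list>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <unordered_map>
#include <utility>

template <typename Key, typename Value>
class LruCache
{
public:
    explicit LruCache(std::size_t capacity) : capacity(capacity) {}

    void put(const Key& key, Value item)
    {
        std::unique_lock<std::shared_mutex> writeLock(cacheLock);

        // If it is already in the list and map, remove it first
        auto found = map.find(key);
        if (found != map.end())
        {
            list.erase(found->second);
            map.erase(found);
        }

        // Now put it back into both (most recently used entries live at the front)
        list.emplace_front(key, std::move(item));
        map[key] = list.begin();

        // If we're past the size limit, drop the least recently used entry (the back)
        if (list.size() > capacity)
        {
            map.erase(list.back().first);
            list.pop_back();
        }
    }

    std::optional<Value> get(const Key& key)
    {
        std::shared_lock<std::shared_mutex> readLock(cacheLock);
        auto found = map.find(key);
        if (found == map.end())
            return std::nullopt;
        // Return a copy so nothing dangles after a later eviction.
        return found->second->second;
    }

private:
    std::size_t capacity;
    std::list<std::pair<Key, Value>> list;   // recency order: front = MRU, back = LRU
    std::unordered_map<Key, typename std::list<std::pair<Key, Value>>::iterator> map;
    std::shared_mutex cacheLock;
};

Note that under the shared lock get() cannot move the entry to the front of the list, so lookups do not refresh recency (only put() does); if you need strict LRU semantics on reads, get() has to take the exclusive lock and splice the node to the front as well.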
user1118321

I wrote a cache in RAM for graphics card memory: https://github.com/tugrul512bit/VirtualMultiArray

It uses N parallel "clock second-chance with 2 pointers" caches instead of making a single cache multithreaded. This way, each element access of the virtual array needs only one lock, which lowers the 50 MB/s single-threaded lock-free access throughput down to "only" around 25 MB/s. The more locks added per cache, the worse single-threaded access gets (around 10 MB/s). But when all the caches are used concurrently, running 64 threads on 8 logical cores, throughput goes up to 215 MB/s (these numbers are from a Gaussian blur benchmark with a 4-byte data size per access; when fetching "tiles of pixels" instead, the bandwidth reaches 1.2 GB/s).

Every independent cache has its own lock and maps onto a strided pattern of the virtual array. Combining the strides gives the whole array, so any N neighboring cache lines belong to different caches and are completely independent to lock.
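
Roughly, the striding idea looks like this (a sketch, not the actual VirtualMultiArray code; StridedShardedCache, Shard and the Cache template parameter are made-up names). Element i is served by shard i % kNumShards, and each shard owns its own lock and its own cache, so accesses that land on different shards never contend:

#include <array>
#include <cstddef>
#include <mutex>

template <typename Cache, std::size_t kNumShards>
class StridedShardedCache
{
public:
    // Run fn against the shard that owns element 'index' while holding only that shard's lock.
    template <typename Fn>
    auto access(std::size_t index, Fn&& fn)
    {
        Shard& shard = shards[index % kNumShards];      // strided mapping: neighbors hit different shards
        std::lock_guard<std::mutex> guard(shard.mutex); // exactly one lock per element access
        return fn(shard.cache, index);
    }

private:
    struct Shard
    {
        std::mutex mutex;
        Cache cache;   // e.g. a clock-second-chance cache in the real project
    };
    std::array<Shard, kNumShards> shards;
};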

Having an unordered map to find clock elements in O(1) already adds non-negligible latency, and adding multithreading inside each cache would slow single-threaded usage down even further. In my opinion, trading some latency for a better hit ratio is preferable to sacrificing scalability. You can also have independent caches for arbitrary regions of the array, not just a strided pattern: for example, a 2D cache for image processing, where each cache is active only for one tile of the image. Then many threads can work concurrently and fully efficiently, without locks and without atomics, provided there is a guarantee that threads never step outside their tile boundaries.
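
A minimal sketch of that tile-based variant (again illustrative; TileCache, processTile and runTiled are made-up names): each tile gets its own private cache and each worker thread is handed a disjoint set of tiles, so no locking is needed as long as threads stay inside their own tiles:

#include <cstddef>
#include <thread>
#include <vector>

struct TileCache
{
    // per-tile cache state, e.g. a small clock-second-chance cache
};

void processTile(TileCache& cache, std::size_t tileIndex)
{
    // work on this tile, touching only its own private cache
}

void runTiled(std::size_t numTiles, std::size_t numThreads)
{
    std::vector<TileCache> caches(numTiles);   // one private cache per tile
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < numThreads; ++t)
    {
        workers.emplace_back([&, t] {
            // Static partition: thread t owns tiles t, t + numThreads, t + 2*numThreads, ...
            for (std::size_t i = t; i < numTiles; i += numThreads)
                processTile(caches[i], i);     // exclusive ownership, so no locking needed
        });
    }
    for (auto& worker : workers)
        worker.join();
}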