Implementing LRU cache

In my last post I mentioned that checking whether a user is an administrator using an Active Directory query can be slow. That means we can’t just run that query every time; we have to cache the result.

When caching is involved, we have to consider a few things. When do we expire the data? How much memory are we going to use? How do we handle concurrency?

The first thing that comes to mind is MemoryCache, which is now part of the .NET Framework and easily accessible. Sadly, it is a heavyweight object: it creates its own threads to manage its state, which probably means we don’t want to use it for a feature as simple as this one.

Instead, I implemented the following:

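using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Security.Principal;
using System.Threading;

public class CachingAdminFinder
{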
    private class CachedResult
    {
        public int Usage;
        public DateTime Timestamp;
        public bool Value;
    }

    private const int CacheMaxSize = 25;
    private static readonly TimeSpan MaxDuration = TimeSpan.FromMinutes(15);
    private readonly ConcurrentDictionary<SecurityIdentifier, CachedResult> cache =
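        new ConcurrentDictionary<SecurityIdentifier, CachedResult>();

    // Assumption: the post does not show where `log` comes from. An NLog logger is
    // assumed here (older NLog versions provide Logger.WarnException(string, Exception)).
    private static readonly NLog.Logger log = NLog.LogManager.GetCurrentClassLogger();
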
    public bool IsAdministrator(WindowsIdentity windowsIdentity)
    {
        if (windowsIdentity == null) throw new ArgumentNullException("windowsIdentity");
        if (windowsIdentity.User == null)
            throw new ArgumentException("Could not find user on the windowsIdentity", "windowsIdentity");

        CachedResult value;
        if (cache.TryGetValue(windowsIdentity.User, out value) && (DateTime.UtcNow - value.Timestamp) <= MaxDuration)
        {
            Interlocked.Increment(ref value.Usage);
            return value.Value;
        }
        bool isAdministratorNoCache;
        try
        {
            isAdministratorNoCache = IsAdministratorNoCache(windowsIdentity.Name);
        }
        catch (Exception e)
        {
            log.WarnException("Could not determine whether user is admin or not, assuming not", e);
            return false;
        }
        var cachedResult = new CachedResult
            {
                Usage = value == null ? 1 : value.Usage + 1,
                Value = isAdministratorNoCache,
                Timestamp = DateTime.UtcNow
            };

        cache.AddOrUpdate(windowsIdentity.User, cachedResult, (_, __) => cachedResult);
        if (cache.Count > CacheMaxSize)
        {
            foreach (var source in cache
                .OrderByDescending(x => x.Value.Usage)
                .ThenBy(x => x.Value.Timestamp)
                .Skip(CacheMaxSize))
            {
                if (source.Key == windowsIdentity.User)
                    continue; // we don't want to remove the one we just added
                CachedResult ignored;
                cache.TryRemove(source.Key, out ignored);
            }
        }

        return isAdministratorNoCache;
    }

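    private static bool IsAdministratorNoCache(string username)
    {
        // The actual Active Directory query is in the previous post; this placeholder
        // only keeps the class compilable (IsAdministrator catches it and returns false).
        throw new NotImplementedException("See previous post");
    }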
}

Amusingly enough, properly handling the cache takes (much) more code than it takes to actually get the value.

We use ConcurrentDictionary as the backing store for our cache, and we enhance the value with usage and timestamp information. Those come in handy when the cache grows too big and needs to be trimmed.
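To make the trimming step easier to see on its own, here is a minimal sketch that pulls it out of the class above. It assumes string keys instead of SecurityIdentifier and a hypothetical Entry type standing in for CachedResult; the ordering (most used first, older timestamp first on ties) mirrors the post's code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

// Hypothetical stand-in for the post's CachedResult, keyed by string for simplicity.
public sealed class Entry
{
    public int Usage;
    public DateTime Timestamp;
    public bool Value;
}

public static class CacheTrimmer
{
    // Keeps the maxSize most-used entries (ties: older timestamp first, as in the post)
    // and removes the rest, except the key that was just written. Returns the number removed.
    public static int Trim(ConcurrentDictionary<string, Entry> cache, int maxSize, string justAdded)
    {
        if (cache.Count <= maxSize)
            return 0;

        // OrderBy already buffers the source; ToList() just makes the snapshot explicit.
        var victims = cache
            .OrderByDescending(x => x.Value.Usage)
            .ThenBy(x => x.Value.Timestamp)
            .Skip(maxSize)
            .Select(x => x.Key)
            .ToList();

        int removed = 0;
        foreach (var key in victims)
        {
            if (key == justAdded)
                continue; // never evict the entry we just added
            if (cache.TryRemove(key, out _))
                removed++;
        }
        return removed;
    }
}
```

Because the entry that was just added is protected, the cache can end up one entry over the limit, which matches the behavior of the original code.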

Note that we also make sure to re-check the source every 15 minutes or so, because there is nothing as annoying as “you have to restart the server for it to pick up the change”. We also handle the case where we can’t get this information for some reason: we log a warning and assume the user is not an administrator.
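The expiry and failure handling described above can be sketched in isolation. This is a minimal sketch, not the post's class: the ExpiringAdminCache name and its lookup and clock delegates are hypothetical, injected so the 15-minute window can be exercised without actually waiting.

```csharp
using System;
using System.Collections.Concurrent;

// Minimal sketch: time-based expiry plus "assume not admin" when the lookup fails.
public sealed class ExpiringAdminCache
{
    private readonly ConcurrentDictionary<string, (bool Value, DateTime Timestamp)> cache =
        new ConcurrentDictionary<string, (bool Value, DateTime Timestamp)>();
    private readonly Func<string, bool> lookup;   // hypothetical: the slow directory query
    private readonly Func<DateTime> utcNow;       // hypothetical: injectable clock
    private readonly TimeSpan maxDuration;

    public ExpiringAdminCache(Func<string, bool> lookup, Func<DateTime> utcNow, TimeSpan maxDuration)
    {
        this.lookup = lookup;
        this.utcNow = utcNow;
        this.maxDuration = maxDuration;
    }

    public bool IsAdministrator(string user)
    {
        var now = utcNow();
        if (cache.TryGetValue(user, out var cached) && now - cached.Timestamp <= maxDuration)
            return cached.Value; // still fresh: no round trip to the directory

        bool result;
        try
        {
            result = lookup(user);
        }
        catch (Exception)
        {
            return false; // lookup failed: assume not an admin and cache nothing
        }
        cache[user] = (result, now); // store (or refresh) the value with a new timestamp
        return result;
    }
}
```

As in the post, a failed lookup returns false without caching anything, so the next request simply tries the directory again.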

In practice, I doubt that we will ever hit the cache max size limit, but I wouldn’t have been able to live with myself without adding the check :-)

Comments

13 Sep 2012
19:57

Steve

Doesn't the foreach throw an exception if you modify the cache inside the loop?

13 Sep 2012
20:03

tobi

This cache suffers from cache stampedes. At 1,000 requests per second and 2 s to produce a cache value, this will overwhelm the poor AD server: when the cache item expires, it suddenly needs to handle 2k requests within 2 seconds.

I hate caches which use the CAS pattern (get, produce, try-insert). This only goes well if there is little load on the individual cache item.

13 Sep 2012
20:05

Rafal

Nice implementation, but how is it possible that there was no general-purpose LRU cache in Raven? And watch out for the API: some time ago I was tasked with implementing similar, Windows-based security and had quite serious problems accessing user information through the UserPrincipal API (there were two mutually trusted domains where users could belong to groups in both of them; in such cases the MS API sometimes kept throwing undocumented errors at me). This made me wonder if we could hand all that work to the Windows file system, by checking user permissions on designated files representing various application rights.

13 Sep 2012
20:07

tobi

A simple fix is not to cache a T but a Lazy<T> with the strongest synchronization mode (LazyThreadSafetyMode.ExecutionAndPublication).
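A sketch of that suggestion, assuming a small generic wrapper (the LazyCache name and its methods are hypothetical): the dictionary stores Lazy<TValue> instead of TValue, so concurrent misses on the same key share a single computation instead of each one querying Active Directory. Expiry is left out; Invalidate stands in for it.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public sealed class LazyCache<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, Lazy<TValue>> cache =
        new ConcurrentDictionary<TKey, Lazy<TValue>>();

    public TValue GetOrCompute(TKey key, Func<TKey, TValue> compute)
    {
        // Under a race GetOrAdd may build several Lazy wrappers, but only one is stored
        // and returned to every caller; building a Lazy does not run compute.
        var lazy = cache.GetOrAdd(key, k => new Lazy<TValue>(
            () => compute(k), LazyThreadSafetyMode.ExecutionAndPublication));

        // ExecutionAndPublication: one thread runs compute, the others block and reuse its result.
        return lazy.Value;
    }

    // Removing the entry forces the next caller to recompute (again, only once).
    public void Invalidate(TKey key) => cache.TryRemove(key, out _);
}
```

One caveat with ExecutionAndPublication: if the value factory throws, Lazy<T> caches the exception, so a failed lookup should be invalidated rather than left in the cache.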

13 Sep 2012
20:55

Steve

ConcurrentDictionary does not track a "version" like the generic collections do, so it doesn't throw when the collection is modified while enumerating; it seems to handle most cases as expected :)

        var dict = new ConcurrentDictionary<string, string>();
        for (var i = 0; i < 5; i++)
            dict.TryAdd("Test" + i, "");

        foreach (var entry in dict)
        {
            string value;
            dict.TryRemove(entry.Key, out value);
        }

Leaves you with an empty dict.

13 Sep 2012
20:58

Steve

@tobi, If he doesn't expect to even reach MaxCacheSize (25) then I see no problem with this implementation.

But do you have the name of the pattern that could handle the load you are describing?

14 Sep 2012
06:45 AM

davh

You forgot to add a log entry when it does clear the cache, so that it is possible to see when cache thrashing occurs.

14 Sep 2012
07:01 AM

Ayende Rahien

Steve, That is a ConcurrentDictionary there; it won't throw in this scenario.

14 Sep 2012
07:08 AM

Ayende Rahien

Tobi, Excellent commentary. I've modified the actual cache to use Lazy to avoid this issue.

14 Sep 2012
07:14 AM

Ayende Rahien

Rafal, We actually have several. The difference is that each has different semantics. For the admin check, we take into account duration; for caching execution plans, we take into account memory and time. We use specialized versions because we don't need an uber-generic one.

14 Sep 2012
07:15 AM

Ayende Rahien

Davh, Can you explain a bit more about what you mean here?

14 Sep 2012
07:22 AM

Dennis

The most annoying part about LRU caches is that they are prone to cache thrashing. If you have more users (in this case) than the max cache size, an LRU is worse than not having a cache. So it would be very useful to get a log entry every time the max size is exceeded. Then, when someone complains about slow logins, it is very visible whether they are in the situation where the cache is not working. And by the way, you forgot to prune expired entries before the least recently used ones.
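Both suggestions, pruning expired entries before evicting live ones and reporting when size-based eviction was needed, can be sketched as below. The CachedEntry type, its LastUsed field, and the Pruner helper are hypothetical; the return value is meant to be logged by the caller, so a count that stays above zero makes thrashing visible.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

public sealed class CachedEntry
{
    public bool Value;
    public DateTime Timestamp; // when the value was fetched
    public DateTime LastUsed;  // when the value was last read
}

public static class Pruner
{
    // Returns how many live (non-expired) entries had to be evicted for size.
    public static int Prune(ConcurrentDictionary<string, CachedEntry> cache,
                            int maxSize, TimeSpan maxDuration, DateTime utcNow)
    {
        // Step 1: drop expired entries first; they are useless regardless of size.
        // Removing while enumerating is safe on ConcurrentDictionary.
        foreach (var pair in cache)
        {
            if (utcNow - pair.Value.Timestamp > maxDuration)
                cache.TryRemove(pair.Key, out _);
        }

        // Step 2: if still too big, evict the least recently used live entries.
        int excess = cache.Count - maxSize;
        if (excess <= 0)
            return 0;

        int evicted = 0;
        var victims = cache.OrderBy(p => p.Value.LastUsed).Take(excess).Select(p => p.Key).ToList();
        foreach (var key in victims)
        {
            if (cache.TryRemove(key, out _))
                evicted++;
        }
        return evicted;
    }
}
```

An expired entry is removed in step 1 even if it was read most recently; without that step it could survive while a valid entry is evicted.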

14 Sep 2012
07:28 AM

Ayende Rahien

Dennis, I see, good point. I fixed both issues, but I don't think either would have been a problem. This is an LRU for admins; the number of expected admins is low, and the number of requests requiring them is even lower :-)

14 Sep 2012
07:43 AM

davh

You never know when someone chooses to make all their users admins and lets them access the db :) e.g. some test server at a software development company where everyone has admin access to it.

14 Sep 2012
07:45 AM

Ayende Rahien

Davh, The only requests that require admin privileges are things like "create a new db, back up a db, delete a db". Those tend to be fairly rare :-)

18 Sep 2012
13:45

Patrick Huizinga

One small nitpick: this doesn't look like an LRU (Least Recently Used) cache, but more like an LFU (Least Frequently Used) cache.
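The distinction is that the post's code evicts by usage count rather than by recency. For comparison, here is a minimal sketch of a true LRU cache, assuming a single lock is acceptable at this request rate; the LruCache name and its API are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Dictionary for O(1) lookup plus a linked list ordered from most to least recently used.
public sealed class LruCache<TKey, TValue>
{
    private readonly int capacity;
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> map =
        new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>();
    private readonly LinkedList<KeyValuePair<TKey, TValue>> order =
        new LinkedList<KeyValuePair<TKey, TValue>>();
    private readonly object sync = new object();

    public LruCache(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        this.capacity = capacity;
    }

    public bool TryGet(TKey key, out TValue value)
    {
        lock (sync)
        {
            if (map.TryGetValue(key, out var node))
            {
                order.Remove(node);   // O(1): we already hold the node
                order.AddFirst(node); // mark as most recently used
                value = node.Value.Value;
                return true;
            }
            value = default(TValue);
            return false;
        }
    }

    public void Set(TKey key, TValue value)
    {
        lock (sync)
        {
            if (map.TryGetValue(key, out var existing))
            {
                order.Remove(existing);
                map.Remove(key);
            }
            else if (map.Count >= capacity)
            {
                var lru = order.Last; // least recently used entry
                order.RemoveLast();
                map.Remove(lru.Value.Key);
            }
            map[key] = order.AddFirst(new KeyValuePair<TKey, TValue>(key, value));
        }
    }
}
```

Both operations are O(1): the dictionary locates the node and the linked list moves it to the front without scanning.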

06 Oct 2012
17:11

Damien Guard

If multiple threads call in on a cache miss at the same time you'll end up with a cache entry that has a usage of 1.

[)amien
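One way to avoid that lost update is to derive the new usage count from whatever entry is actually stored, using ConcurrentDictionary.AddOrUpdate with immutable entries. This is a minimal sketch; UsageEntry and UsageCounting.Record are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;

// Immutable entry: every update creates a new instance, so no field is mutated after publication.
public sealed class UsageEntry
{
    public UsageEntry(bool value, int usage, DateTime timestamp)
    {
        Value = value;
        Usage = usage;
        Timestamp = timestamp;
    }

    public bool Value { get; }
    public int Usage { get; }
    public DateTime Timestamp { get; }
}

public static class UsageCounting
{
    // AddOrUpdate only stores a value computed from the entry that is currently stored
    // (it retries otherwise), so concurrent callers do not overwrite each other's increments.
    public static UsageEntry Record(ConcurrentDictionary<string, UsageEntry> cache,
                                    string key, bool value, DateTime utcNow)
    {
        return cache.AddOrUpdate(
            key,
            _ => new UsageEntry(value, 1, utcNow),
            (_, old) => new UsageEntry(value, old.Usage + 1, utcNow));
    }
}
```

AddOrUpdate may call the update delegate more than once under contention, so the delegate should have no side effects.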

Comments have been closed on this topic.

Oren Eini

CEO of RavenDB