Caching in Web Applications
You have designed a new web application, or you have added a new feature to an existing one. You followed TDD and DDD principles, you have unit tests and automated tests in the system, and everything works very well. You went live and everything was fine, but five days later you started a Google campaign and suddenly the application crashes all the time. All the managers are shouting and angry, and you keep asking yourself why the application works fine on your local, test and staging environments.
After a week of frustration, and if you were lucky enough not to get fired, you realise that the application is hitting the database too much. It was fine on your local and test environments because you were the only one testing the application and you never ran a stress test. Now that your website has real traffic, the problem, and the weak point in your design, becomes visible.
One solution is to cache the data you have just retrieved from the database and serve subsequent requests without hitting the database again. That seems very easy and straightforward, but it is not: it is one of the most challenging aspects of modern development. You can cache everything very easily and quickly, and you can read from the cache just as easily, but invalidating it at the correct moment is a nightmare.
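To make the read path concrete, here is a minimal cache-aside sketch; the Product type, the ProductRepository and the ConcurrentDictionary standing in for a real cache store are all hypothetical, purely for illustration.

```csharp
using System.Collections.Concurrent;

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class ProductRepository
{
    // Stand-in cache store; in a real application this would be an
    // in-process or out-of-process cache (see the categories below).
    private static readonly ConcurrentDictionary<int, Product> Cache =
        new ConcurrentDictionary<int, Product>();

    public Product GetById(int id)
    {
        // 1. Try the cache first.
        Product cached;
        if (Cache.TryGetValue(id, out cached))
            return cached;

        // 2. Cache miss: hit the database once...
        var product = LoadProductFromDatabase(id);

        // 3. ...and store the result so subsequent requests skip the database.
        Cache[id] = product;
        return product;
    }

    private Product LoadProductFromDatabase(int id)
    {
        // Hypothetical placeholder for the real (expensive) database query.
        return new Product { Id = id, Name = "Product " + id };
    }
}
```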
Writing a piece of code that puts something in the cache is very easy; if you search, you can find loads of examples. The bottleneck of caching is choosing the right strategy and technology. There are lots of technologies and lots of categories of caching to choose from, depending on which aspect you look at. If we are talking about the type of cache storage, there are a few categories: memory, disk (HDD) and NoSQL.
If we are talking about the location of the cache storage, the categories are in-process and out-of-process caching.
In-process vs. out-of-process caching: you can put an object in the memory of your own application (the heap), or you can put it in memory that does not belong to your application. These two approaches have their own benefits and requirements.
In-Process Cache
Advantages:
Super fast: the object lives in the application's own memory, so reads and writes are about as fast as reading and writing a variable.
You can put any object in the memory without serialising it (with out-of-process caching you have to).
Perfect for a single application, or a web application that runs on just one server (a minimal sketch follows this section).
Disadvantages:
If you have more than one server running your web application, you cannot share the cache between those servers.
You are limited to the memory of your server, and filling up the server's memory is dangerous because it can affect the whole system.
The cache is wiped whenever your application restarts, because the cached objects, like all your variables, live in the memory of your app. So every deployment clears the cache, which matters if you have a continuous deployment strategy.
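As a minimal sketch of in-process caching, here is what it can look like with MemoryCache from System.Runtime.Caching; the ShoppingBasket type and the cache key are hypothetical, and the point is that the object reference goes in as-is, with no serialisation.

```csharp
using System;
using System.Runtime.Caching;

public class ShoppingBasket
{
    public int UserId { get; set; }
    public decimal Total { get; set; }
}

public static class InProcessCacheExample
{
    public static void Demo()
    {
        var cache = MemoryCache.Default;   // lives in the application's own heap

        // The object reference is stored as-is: no serialisation needed.
        var basket = new ShoppingBasket { UserId = 42, Total = 99.90m };
        cache.Set("basket:42", basket, DateTimeOffset.Now.AddMinutes(20));

        // Reading it back is about as cheap as reading a variable.
        var cached = (ShoppingBasket)cache.Get("basket:42");
        Console.WriteLine(cached.Total);

        // Everything stored here disappears when the process restarts,
        // for example on every deployment.
    }
}
```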
Out-of-Process Cache
Advantages:
The cache memory is separate from the memory of your server or servers.
You are not limited to the memory of your web server.
You have a single place, or a cluster, to read and write cached objects, so the cache is shared between your servers.
The cache is not wiped when your application restarts.
Disadvantages:
It is slower than in-process caching.
You have to serialise the object first and then put it in the cache, because the cache sits across the network and the object has to be transferred over the network for every read and write. Anything transferred over a network has to be serialised and deserialised, as shown in the sketch after this list. (If you can find a way to transfer an object over the network exactly as it sits in memory, you can start a new revolution in the digital world.)
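As an out-of-process sketch, here is roughly what the same idea looks like with Redis through the StackExchange.Redis client and Newtonsoft.Json for serialisation; the connection string, the key and the ShoppingBasket type are hypothetical choices for illustration.

```csharp
using System;
using Newtonsoft.Json;
using StackExchange.Redis;

public class ShoppingBasket
{
    public int UserId { get; set; }
    public decimal Total { get; set; }
}

public static class OutOfProcessCacheExample
{
    public static void Demo()
    {
        // The cache lives in its own process (here a Redis server), shared by all web servers.
        // In a real application the ConnectionMultiplexer would be created once and reused.
        var redis = ConnectionMultiplexer.Connect("localhost:6379");
        IDatabase cache = redis.GetDatabase();

        // Write: the object has to cross the network, so it must be serialised first.
        var basket = new ShoppingBasket { UserId = 42, Total = 99.90m };
        cache.StringSet("basket:42", JsonConvert.SerializeObject(basket), TimeSpan.FromMinutes(20));

        // Read: the bytes come back over the network and have to be deserialised again.
        string json = cache.StringGet("basket:42");
        var cached = JsonConvert.DeserializeObject<ShoppingBasket>(json);

        // This entry survives application restarts and is visible to every server in the farm.
        Console.WriteLine(cached.Total);
    }
}
```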
Some of the technologies:
Output cache: tied to IIS/ASP.NET; you can cache the output of the action methods in MVC controllers based on their input parameters (a sketch follows this list).
PostSharp cache: enables caching of any method in the system, not just the methods in controllers.
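For example, the output cache is just an attribute on an ASP.NET MVC action; this sketch assumes a hypothetical ProductController with an expensive Details action.

```csharp
using System.Web.Mvc;

public class ProductController : Controller
{
    // The rendered output of this action is cached for 10 minutes (600 seconds),
    // with a separate cache entry per value of the "id" parameter.
    [OutputCache(Duration = 600, VaryByParam = "id")]
    public ActionResult Details(int id)
    {
        // The expensive work (database queries, view rendering) only runs on a cache miss.
        var model = LoadProductFromDatabase(id);
        return View(model);
    }

    private object LoadProductFromDatabase(int id)
    {
        // Hypothetical stand-in for the real data access code.
        return new { Id = id, Name = "Product " + id };
    }
}
```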
As you can see, caching has many different categories and concepts, but the most important thing about caching is choosing the correct strategy. If you have not thought about caching yet, one of two things is true: 1) your application does not have enough users, or 2) your application is too slow and you do not care about it.
When you cache an object or a piece of data, you have to ask yourself one question:
When am I going to invalidate this data?
When to invalidate the cache
You can set a time for invalidation, or you can invalidate the cache explicitly whenever you want.
If the nature of your data is such that it rarely changes in your system, for example cultures in your database, where you keep the currency, name, measurement units and so on for each culture, you know this data will not change any time soon, so you can put it in the cache for 24 hours or more.
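With System.Runtime.Caching, for instance, such slowly changing data can be cached with a 24-hour absolute expiration; the "cultures" key and the LoadCulturesFromDatabase helper below are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Caching;

public static class CultureCache
{
    public static IList<string> GetCultures()
    {
        var cached = MemoryCache.Default.Get("cultures") as IList<string>;
        if (cached != null)
            return cached;

        var cultures = LoadCulturesFromDatabase();

        // Cultures, currencies and measurement units change rarely,
        // so a 24-hour absolute expiration is a safe invalidation time.
        MemoryCache.Default.Set("cultures", cultures, new CacheItemPolicy
        {
            AbsoluteExpiration = DateTimeOffset.Now.AddHours(24)
        });

        return cultures;
    }

    private static IList<string> LoadCulturesFromDatabase()
    {
        // Hypothetical stand-in for the real query.
        return new List<string> { "en-GB", "fr-FR", "de-DE" };
    }
}
```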
If the data changes at specific moments, then you need to invalidate the cache at those moments. This is more difficult to implement, but you get a lot from it:
you do not need to invalidate or clear the cache manually
your application always shows up-to-date data
A typical monolithic application has a data layer in which you use Entity Framework or NHibernate to fetch data, and it is in the places where you save or update data that you have to invalidate the cache.
In the methods that read data you put the result into the cache, and whenever you update or delete data you invalidate it; a rough sketch follows below. However, there are some problems with lazy loading and caching, which I will explore in my next post.
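Here is that sketch, assuming a hypothetical CultureRepository sitting on top of Entity Framework or NHibernate (the data access calls are stubbed out): the read path fills the cache, and the write path invalidates it at the moment the data changes.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Caching;

public class Culture
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Currency { get; set; }
}

public class CultureRepository
{
    private const string CacheKey = "cultures";

    public IList<Culture> GetAll()
    {
        var cached = MemoryCache.Default.Get(CacheKey) as IList<Culture>;
        if (cached != null)
            return cached;

        var cultures = QueryAllFromDatabase();   // e.g. an EF/NHibernate query
        MemoryCache.Default.Set(CacheKey, cultures, DateTimeOffset.Now.AddHours(24));
        return cultures;
    }

    public void Update(Culture culture)
    {
        SaveToDatabase(culture);                 // e.g. an EF/NHibernate update + commit

        // The data just changed, so this is exactly the moment to invalidate the cache.
        MemoryCache.Default.Remove(CacheKey);
    }

    private IList<Culture> QueryAllFromDatabase()
    {
        // Hypothetical stand-in for the real ORM query.
        return new List<Culture> { new Culture { Id = 1, Name = "en-GB", Currency = "GBP" } };
    }

    private void SaveToDatabase(Culture culture)
    {
        // Hypothetical stand-in for the real ORM update.
    }
}
```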