This post originally appeared in our sysadvent series and has been moved here following the discontinuation of the sysadvent microsite.
Sometimes you run into website problems that ordinary configuration cannot solve. A case in point was a PHP-based application that from time to time returned a 302 redirect to a login page instead of the front page, which is not optimal when you serve news articles.
Our solution was to add a simple rule to Varnish, so that we serve old cached content, using “grace”, instead of the redirect. Grace allows Varnish to serve expired content when there are problems fetching a fresh version from the backend. And while we are at it, let's do the same trick for fatal backend errors too:
sub vcl_backend_response {
    # Ignore redirect from front page (as this problem only occurs there):
    if (beresp.status == 302 && bereq.url == "/") {
        return(abandon);
    }
    # Serve old content when backend is sick:
    elsif (beresp.status >= 500) {
        return(abandon);
    }
    # Make sure grace is active so we can serve old content:
    set beresp.grace = 48h;
}
Now, if the PHP application fails outright and returns a 5xx response, or returns a redirect instead of the front page, Varnish will simply abandon the response and serve the old cached content instead, for up to 48 hours after the TTL expires.
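Abandoning a response only helps when a stale copy exists. On a plain cache miss there is nothing to fall back on, and the client gets an error from Varnish instead. The following is a minimal sketch of a variant of the block above, not the configuration we ran: it abandons only background fetches and otherwise passes the backend response through without caching it as a normal object. It assumes a Varnish version that provides bereq.is_bgfetch (Varnish 6.0 does), and the 5s value is an arbitrary choice for illustration.

```vcl
sub vcl_backend_response {
    if (beresp.status >= 500 ||
        (beresp.status == 302 && bereq.url == "/")) {
        if (bereq.is_bgfetch) {
            # The client already got a stale copy from grace: keep that copy.
            return (abandon);
        }
        # Nothing to fall back on: deliver the response, but mark it
        # uncacheable so it is not stored as a normal cache object.
        set beresp.uncacheable = true;
        # Assumed value: how long Varnish remembers this decision.
        set beresp.ttl = 5s;
        return (deliver);
    }
    # Make sure grace is active so we can serve old content:
    set beresp.grace = 48h;
}
```

Use this in place of the simpler block, not alongside it, since Varnish runs multiple definitions of the same subroutine in order and the first return wins.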
This, of course, only works if the content /is/ cached and can be served from cache. To optimize this, you should make sure of the following:
- Clean out any cookies that are not strictly needed. A front page should typically not need any cookies, and neither should static content. The basic rule is: “only allow selected cookies on selected URLs”. The Varnish documentation has a useful rule block to make this easier. This can give you a big performance boost, as Varnish can serve more of the content from cache. Also, as a safeguard, add the cookie header to hash_data, the collection of parameters Varnish uses to create a unique identifier for an object (by default the host name and URL), so you don’t serve user-specific content to the wrong users by mistake. Just add this to vcl_hash:
sub vcl_hash {
    hash_data(req.http.cookie);
}
- Do not ban or purge content from the cache. Both make the object unavailable, which leaves nothing to serve with grace. If you need to invalidate content, use the “softpurge” vmod instead:
import softpurge;
sub vcl_hit {
    if (req.method == "PURGE") {
        softpurge.softpurge();
        return(synth(200, "Successful softpurge"));
    }
}
Softpurge expires the object by setting its TTL to zero while keeping it in cache, so Varnish can still serve it with grace as needed.
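Both points above also need some logic in vcl_recv that the snippets do not show. By default Varnish does not look up requests with an unknown method such as PURGE, so they never reach vcl_hit, and cookies have to be stripped before the lookup for the front page to be served from cache. A rough sketch of that side could look like the following. The ACL name purgers, its single entry, and the URL patterns are assumptions for illustration, not taken from our configuration.

```vcl
# Hypothetical ACL: list the hosts that are allowed to send PURGE.
acl purgers {
    "localhost";
}

sub vcl_recv {
    if (req.method == "PURGE") {
        if (client.ip !~ purgers) {
            return (synth(405, "Not allowed"));
        }
        # Look the object up so that vcl_hit gets to run the softpurge.
        return (hash);
    }

    # Illustrative patterns: the front page and static files need no cookies.
    if (req.url == "/" ||
        req.url ~ "\.(css|js|png|jpe?g|gif|ico|svg)(\?.*)?$") {
        unset req.http.Cookie;
    }
}

sub vcl_miss {
    if (req.method == "PURGE") {
        # Nothing cached under this URL, so there is nothing to expire.
        return (synth(404, "Not in cache"));
    }
}
```

Since the cookie header is part of the hash in this setup, the PURGE request must carry the same cookie header (normally none) as the cached variant it is meant to expire.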
With these small additions to the Varnish configuration, your visitors should be less affected by a misbehaving application server.