This post originally appeared in our sysadvent series and has been moved here following the discontinuation of the sysadvent microsite.

The poor referer header. Misspelled and misused since its inception.

Its typical use is thus: if I click on a link on a website, the referer header tells the landing page which source page I came from.

Source URL = www.mysite.com/page1 -> Target URL = www.example.com
referer = "www.mysite.com/page1"

It’s heavily used in marketing to analyse where visitors to a website came from, and also very useful for gathering data and statistics about reading habits and web traffic.

However, it presents a potential security risk if too much information is passed on.

RFC 2616, the HTTP/1.1 specification, says of the referer header: “Clients SHOULD NOT include a Referer header field in a (non-secure) HTTP request if the referring page was transferred with a secure protocol.” That is, if our request goes from HTTPS to HTTP, the referer header should not be present.

However, a SHOULD NOT in an RFC is a recommendation rather than a hard requirement, and data can be leaked. Facebook fell foul of this a little while ago, when it turned out that in some cases the user ID of the originating page was being passed in the referer header to advertisers when a user clicked on an advert.

Additionally, when traffic goes between two HTTPS sites - as is increasingly common in the move towards SSL everywhere - the RFC does NOT require that the referer header is stripped.
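The RFC 2616 rule described above can be sketched in a few lines of Python. This is only an illustration of the recommendation, not how any particular browser implements it; the function name `should_send_referer` and the example URLs are made up for this sketch.

```python
from urllib.parse import urlsplit


def should_send_referer(source_url: str, target_url: str) -> bool:
    """Apply the RFC 2616 recommendation: drop the header only when
    the request goes from an HTTPS page to a plain HTTP target."""
    source_scheme = urlsplit(source_url).scheme.lower()
    target_scheme = urlsplit(target_url).scheme.lower()
    is_downgrade = source_scheme == "https" and target_scheme == "http"
    return not is_downgrade


# HTTPS -> HTTP: the header should be omitted.
print(should_send_referer("https://www.mysite.com/page1", "http://www.example.com/"))
# HTTPS -> HTTPS: nothing in the rule asks for the header to be stripped.
print(should_send_referer("https://www.mysite.com/page1", "https://www.example.com/"))
```

The second call is the case that matters for privacy: under this rule alone, the full source URL still travels between two HTTPS sites.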

ENTER THE META-REFERRER TAG

A potential solution to these two issues, and more, looks to be the meta-referrer tag. By adding the following tag to the source web page:

<meta name="referrer" content="origin">

the referer header can be edited to allow sites to see where their traffic has come from, but without leaking potentially sensitive data.

The options for the content field are:

  • no-referrer: omit the referer header from the request
  • no-referrer-when-downgrade: omit the referer header when moving from HTTPS to HTTP
  • origin: set the referer header to be the origin only, that is, stripping any path and parameters from the URL
  • origin-when-cross-origin: if the request is to a different website or protocol, set the referer header to the origin only; otherwise send the full URL
  • unsafe-url: set the referer header to be the full originating URL regardless of target site or protocol, potentially leaking data.
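To make the options above concrete, here is a minimal Python sketch that computes the referer value each policy would produce for a given source and target URL. It is a simplified model of the list above, not a browser implementation: the helper names (`origin_of`, `strip_fragment`, `referer_for`) and the example URLs are assumptions for illustration, and origins are compared naively by scheme and host string, ignoring details such as default ports.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

POLICIES = (
    "no-referrer",
    "no-referrer-when-downgrade",
    "origin",
    "origin-when-cross-origin",
    "unsafe-url",
)


def origin_of(url: str) -> str:
    """Reduce a URL to scheme and host, dropping path and parameters."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


def strip_fragment(url: str) -> str:
    """Return the full URL without its #fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, ""))


def referer_for(policy: str, source_url: str, target_url: str) -> Optional[str]:
    """Return the referer value for a policy, or None if it is omitted."""
    source, target = urlsplit(source_url), urlsplit(target_url)
    # Simplified checks: compare scheme and host strings as written.
    same_origin = (source.scheme, source.netloc) == (target.scheme, target.netloc)
    downgrade = source.scheme == "https" and target.scheme == "http"

    if policy == "no-referrer":
        return None
    if policy == "no-referrer-when-downgrade":
        return None if downgrade else strip_fragment(source_url)
    if policy == "origin":
        return origin_of(source_url)
    if policy == "origin-when-cross-origin":
        return strip_fragment(source_url) if same_origin else origin_of(source_url)
    if policy == "unsafe-url":
        return strip_fragment(source_url)
    raise ValueError(f"unknown policy: {policy!r}")


if __name__ == "__main__":
    source = "https://www.mysite.com/page1?user=42"
    target = "http://www.example.com/"
    for policy in POLICIES:
        print(f"{policy}: {referer_for(policy, source, target)}")
```

Running the loop for an HTTPS source and an HTTP target shows the trade-off at a glance: only unsafe-url lets the path and query string through.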

To use a practical example, suppose Facebook were to implement this tag as:

<meta name="referrer" content="origin" id="meta_referrer" />

When Mr Bobby Tables is logged into Facebook, his homepage URL would be: https://www.facebook.com/bobbytables?f=nref

When he clicks on an external link and is taken to a different site, the referer header is reduced to

referer=www.facebook.com

thus preserving his privacy. The target site registers that they’ve had a visitor from a Facebook hit, but the name of the user is not passed on.

Google were the first to implement such a scheme, ostensibly to reduce latency from SSL sites, although one would suspect that being able to prove to clients that your site was the source of their traffic might be closer to the truth.

HANDLE WITH CAUTION

Whether the referer header is implemented with the new meta-referrer tag or not, it is prudent to approach it with a degree of caution.

Referer spam is still an issue - an attacker can target a website using a specific referer header, which is reported by analytics tools to the website owner. Out of curiosity about where their traffic is coming from, the owner will often follow the link back to a malicious web page.

The referer header also opens up potential for exploits and XSS attacks. It is trivially easy to manipulate headers, so relying on the header for authorisation or authentication is heavily discouraged.
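One common way such an XSS hole appears is when a page, for example an analytics report, writes the referer value into HTML without escaping it. The sketch below shows the defensive habit of treating the header as untrusted input; the function name `render_referer_cell` and the table-cell markup are hypothetical and not taken from any particular tool.

```python
import html
from typing import Optional


def render_referer_cell(referer: Optional[str]) -> str:
    """Render a referer value as a table cell, escaping it first.

    The header is fully controlled by the client, so it must be treated
    like any other untrusted input before it reaches an HTML page.
    """
    shown = html.escape(referer, quote=True) if referer else "(none)"
    return f"<td>{shown}</td>"


# A forged header carrying markup is rendered as inert text.
print(render_referer_cell("<script>alert(1)</script>"))
# A missing header is shown explicitly rather than as an empty cell.
print(render_referer_cell(None))
```

Escaping only addresses the display problem. It does nothing to make the value trustworthy, so the advice against using it for authorisation still stands.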

MISSING HEADERS

The referer header is omitted if:

  • the user entered the URL in the address bar
  • the user visited the site from a bookmark
  • the request moved from HTTPS to HTTP
  • the request moved from HTTPS to a different HTTPS URL
  • security software (antivirus, firewall etc) stripped the header from the request
  • a proxy stripped the header from the request
  • a browser plugin stripped the header from the request
  • the site was visited by a program (e.g. using curl) without setting a header
  • the meta-referrer tag disallows it
  • the meta-referrer tag allows it but the browser does not have meta-referrer support
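Because the header can be missing for any of the reasons above, server-side code that reads it should handle its absence gracefully. The following is a minimal sketch of that idea, assuming the request headers are available as a plain mapping; the function name `traffic_source` and the "unknown" label are choices made for this example.

```python
from typing import Mapping
from urllib.parse import urlsplit


def traffic_source(headers: Mapping[str, str]) -> str:
    """Return the referring host, or "unknown" when it cannot be determined."""
    # Header names are case-insensitive, so match on the lowercased name.
    referer = next(
        (value for name, value in headers.items() if name.lower() == "referer"),
        "",
    )
    try:
        host = urlsplit(referer.strip()).hostname
    except ValueError:
        # Malformed values are treated the same as a missing header.
        host = None
    return host if host else "unknown"


print(traffic_source({"Referer": "https://www.facebook.com/"}))
print(traffic_source({"User-Agent": "curl/8.0"}))
```

The fallback is labelled "unknown" rather than "direct" on purpose: a missing header does not prove the visitor typed the URL, since a proxy, plugin or referrer policy may equally have removed it.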

For websites that rely on the referer header for certain advertising campaigns, the patchy and inconsistent usage of the header can be a real problem. Proxy rules that allow access for users originating from specific sites carry a high risk of not working at all, depending on the user’s browser or local setup, and are also vulnerable to abuse if the headers are manipulated.

TLDR

To sum up, the referer header was rather flakey, and is now slightly less flakey. It’s often omitted, either accidentally or deliberately, and it is easily faked. It can be a very useful tool for gathering data about web traffic, but it is probably best not to rely on it for anything especially important at this point.
