Looking at several of the websites tracked by HowFast Monitoring, I can see a common pattern: their response time peaks at very regular intervals.
Using HowFast's split time feature, which breaks the response time down into its individual steps, we can see that the peaks are caused by periodic DNS resolution requests. HowFast does not cache the resolved address locally; instead, it asks its local resolver every minute.
These peaks add about 150ms every two minutes, on top of the ~600ms taken by the remaining steps of the request.
What is causing these peaks? The DNS TTL.
The peaks are periodic because DNS resolvers cache records for a given amount of time, so that authoritative servers - the ones actually controlled by the website's owner - don't get overwhelmed by a request every time a user wants to access the website. How long a record stays cached is set by its Time To Live, aka TTL, which is defined on a per-record basis.
In this example, the TTL is set to... 60s! Sixty seconds. The recommended value, in 1987, was 1 day, and most public hostnames today use a couple of hours. With such a short TTL, the cached record only survives until the next check, so the checks alternate between "result is not cached - resolve it" (~150ms) and "result is in cache - use it" (1ms).
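The alternating pattern can be reproduced with a small simulation. This is only a sketch under stated assumptions: `simulate_probes` is a hypothetical helper, the 150ms and 1ms timings are the ones quoted above, and the resolver is assumed to start the TTL countdown when the answer arrives (slightly after the check was sent), which is one plausible reason a 60s TTL survives exactly one extra check.

```python
def simulate_probes(ttl_s, probe_interval_s, n_probes, miss_ms=150, hit_ms=1):
    """Return the DNS time (in ms) seen by each check against a caching resolver."""
    expires_at_ms = -1  # nothing is cached before the first check
    timings = []
    for i in range(n_probes):
        now_ms = i * probe_interval_s * 1000
        if now_ms < expires_at_ms:
            # The record is still in the resolver's cache.
            timings.append(hit_ms)
        else:
            # Cache miss: the resolver has to ask the authoritative server.
            timings.append(miss_ms)
            # Assumption: the TTL countdown starts once the answer has arrived.
            expires_at_ms = now_ms + miss_ms + ttl_s * 1000
    return timings


# One check per minute against a 60s TTL, then against a 6-hour TTL.
print(simulate_probes(ttl_s=60, probe_interval_s=60, n_probes=6))
print(simulate_probes(ttl_s=6 * 3600, probe_interval_s=60, n_probes=6))
```

Under these assumptions, the 60s TTL produces a slow lookup on every other check, while the 6-hour TTL only pays the resolution cost on the first one.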
Setting the TTL to a higher value would mechanically lower the response time, but you need to pay attention to something else...
How to find your ideal DNS TTL
If records are cached for a long time, two good things will happen:
- your website will be faster on the first load
- your authoritative DNS server will be less loaded
However, there is a downside: if you change your records, it will take longer for the change to propagate across the internet. For instance, let's pretend your server has an issue, or you want to move to another server. After you change the DNS record, it can take up to the full TTL before everybody sees the new value, and your old IP will still receive traffic in the meantime. Not always cool.
You may also be running a High Availability setup, where a DNS load balancer sends the traffic to a few different machines to spread the load. In order to react quickly to load changes, you might need to add new records when you add servers, or remove records when you get rid of unused servers. Having a high TTL will likely be a major handicap in this situation.
For instance, GitHub uses a 60s TTL, but it most likely gets more traffic than you do (the "60" is the TTL value):
$ dig github.com @22.214.171.124
;; ANSWER SECTION:
github.com. 60 IN A 126.96.36.199
github.com. 60 IN A 188.8.131.52
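If you want to check TTLs from a script rather than by eye, the answer lines are easy to split into fields. The following is a minimal sketch: `DnsAnswer` and `parse_answer_line` are hypothetical names, and the sample uses `example.com` with addresses from the documentation range instead of real `dig` output. Keep in mind that a caching resolver reports the remaining TTL, so the number counts down between two queries.

```python
from typing import NamedTuple


class DnsAnswer(NamedTuple):
    name: str
    ttl_s: int
    record_class: str
    record_type: str
    value: str


def parse_answer_line(line: str) -> DnsAnswer:
    """Split one line of a dig ANSWER SECTION into its five fields."""
    name, ttl, record_class, record_type, value = line.split(maxsplit=4)
    return DnsAnswer(name, int(ttl), record_class, record_type, value)


# Illustrative answer lines (not real output): name, TTL, class, type, value.
sample = """\
example.com. 60 IN A 192.0.2.10
example.com. 60 IN A 192.0.2.11
"""

answers = [parse_answer_line(line) for line in sample.splitlines() if line.strip()]
min_ttl = min(answer.ttl_s for answer in answers)
print(f"{len(answers)} records, lowest TTL: {min_ttl}s")
```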
The questions that you need to ask yourself when choosing the TTL for DNS records are then:
- How much do I value a short page load time on my users' first page load?
- Am I going to change the IP address of my server often?
- Am I running a HA (High Availability) setup?
You can get more flexibility in your setup by using an "elastic IP" that can be attached to a different server whenever you need it, so that you don't have to change your DNS records.
If you are running an HA setup, you probably want to react quickly to changes, so you will favor TTLs of a few minutes. Otherwise, a TTL of a few hours (6-12 hours) is a good compromise and will help make your website faster.
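To put rough numbers on this compromise, here is a small sketch comparing a few candidate TTLs. `ttl_tradeoff` is a hypothetical helper, and both figures are upper bounds under a simplifying assumption: a single resolver that is queried constantly and honors the TTL exactly. The 300s row is just one possible reading of "a few minutes".

```python
SECONDS_PER_DAY = 86_400


def ttl_tradeoff(ttl_s):
    """Upper bounds for one busy resolver that honors the TTL exactly."""
    return {
        "ttl_s": ttl_s,
        # After a record change, a resolver may keep serving the old value this long.
        "max_stale_s": ttl_s,
        # The resolver has to re-ask the authoritative server at most this often.
        "max_refreshes_per_day": SECONDS_PER_DAY // ttl_s,
    }


for ttl in (60, 300, 6 * 3600, 12 * 3600):
    row = ttl_tradeoff(ttl)
    print(
        f"TTL {row['ttl_s']:>6}s: old record served for up to {row['max_stale_s']}s, "
        f"at most {row['max_refreshes_per_day']} refreshes per day"
    )
```

Going from 60s to 6 hours cuts the refreshes from 1440 to 4 per day for such a resolver, at the price of a change taking up to 6 hours to reach it.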
One other thing to take into consideration is the type of record: you probably won't change your MX (mail server), DKIM / SPF (antispam features), or TXT (metadata) records very often, so they can carry a longer TTL than the records pointing to your servers.
If you are interested in understanding what makes your website or API slow, give HowFast a try: it's free, can track unlimited targets, checks your response time every minute, and notifies you if your website goes down. Start tracking now for free!