An application can go down for a large number of reasons, and your users or customers don't like it when your application isn't working. So why wait for them to find out and complain before you learn about it? Learn how to get notified as soon as something goes wrong.

The very first step to improving your uptime is to monitor your application's health. In this post we'll explain how to do so, using HowFast Monitoring's free uptime monitoring service.

Automatically check if your backend is accessible

First of all, if your entire backend is down, you want to know about it, and fast. Luckily, this is very easy, especially with HowFast:

  1. Sign up on www.howfast.tech
  2. Click "add monitor" and type the URL of your backend
  3. Check "send notifications to your email address"
  4. Create the monitor
  5. Profit!


That's it, you're done. HowFast will check that URL every minute, and you will get notified within 60 seconds if for some reason the URL becomes unreachable at any point! Next time something goes wrong, besides notifying you immediately, HowFast will also measure precisely how long the outage lasts, so that at the end of the month you know your exact uptime.
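For the record, uptime over a period is simply the share of that period during which your service was reachable. The sketch below is only that arithmetic, not HowFast's internal implementation; the 30-day month and the 43 minutes of downtime are made-up figures for illustration.

```python
def uptime_percent(period_minutes: float, downtime_minutes: float) -> float:
    """Return the share of the period (in %) during which the service was up."""
    if period_minutes <= 0:
        raise ValueError("period_minutes must be positive")
    if not 0 <= downtime_minutes <= period_minutes:
        raise ValueError("downtime_minutes must be between 0 and period_minutes")
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes


# Illustrative figures: a 30-day month has 30 * 24 * 60 = 43,200 minutes.
MONTH_MINUTES = 30 * 24 * 60

if __name__ == "__main__":
    print(f"{uptime_percent(MONTH_MINUTES, 43):.2f}%")  # 43 minutes down -> 99.90%
```

Put the other way around, a 99.9% uptime target leaves you only about 43 minutes of downtime per 30-day month, which is why hearing about an outage within a minute matters.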

That said, you may want to assess the health of your application more precisely: for instance, if your landing page does not trigger any database call but your database is down, you won't get notified of the issue*, yet your users will still suffer.

A common best practice to detect this class of issues is to create API health endpoints. Let's see how this works.

* OK, you should also be using the excellent Sentry to get notified of unhandled exceptions (which usually result in an HTTP 500 error), but why wait until your users get those errors (and get angry) when you can get notified of the issue as soon as it happens, before anyone is affected?

Creating /health endpoints

Complex applications usually have several points of failure. For instance, HowFast is composed of a web API that talks to a PostgreSQL database and an InfluxDB time-series database, as well as to a queue worker (RQ) that processes tasks asynchronously. If any of those fails, some part of the service will be impacted. What you need is to monitor each of those points of failure, so that you get notified as soon as one goes down.

API reachable: "Always OK" endpoint

This is the easiest one: the application should be reachable. Let's just return an HTTP 200 OK status for every request to this endpoint. Easy.

We can call that endpoint /health/alwaysok. It is actually equivalent to the monitor you configured in the previous step: the goal is to make sure the backend is reachable.
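As an illustration, here is a minimal sketch using only Python's standard WSGI tooling (wsgiref). The post doesn't say which framework you use, so the framework choice and the port 8000 are assumptions; any web framework can express the same route in a couple of lines.

```python
from wsgiref.simple_server import make_server


def health_app(environ, start_response):
    """WSGI app exposing /health/alwaysok, which always answers 200 OK."""
    if environ.get("PATH_INFO") == "/health/alwaysok":
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"OK"]
    # Anything else is not part of this sketch.
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not Found"]


if __name__ == "__main__":
    # Serve locally on an arbitrarily chosen port for testing.
    with make_server("", 8000, health_app) as server:
        server.serve_forever()
```

Once it is running, `curl -i http://localhost:8000/health/alwaysok` should come back with a 200 status.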

Database reachable

What's the easiest way to test a database connection? Let's just try to read one row from any table, preferably a small one so as not to put too much load on the database server. If the query fails, return an HTTP 500. The implementation will depend on your language and framework. Let's call this endpoint /health/database.
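Here is one way it could look, as a sketch with loudly stated assumptions: SQLite (from Python's standard library) stands in for your real database so the example runs anywhere, and both the `app.db` file and the small `settings` table are hypothetical. With PostgreSQL you would open the connection with a driver such as psycopg2 instead.

```python
import sqlite3
from wsgiref.simple_server import make_server

# Assumption: SQLite stands in for the real database (e.g. PostgreSQL via psycopg2).
DB_PATH = "app.db"                               # hypothetical database file
HEALTH_QUERY = "SELECT 1 FROM settings LIMIT 1"  # hypothetical small table


def database_ok(db_path=DB_PATH, query=HEALTH_QUERY):
    """Return True if we can open the database and read (at most) one row."""
    try:
        # mode=ro: open read-only, and fail instead of creating a missing file.
        conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
        try:
            conn.execute(query).fetchone()
        finally:
            conn.close()
        return True
    except sqlite3.Error:
        return False


def health_database_app(environ, start_response):
    """WSGI app exposing /health/database: 200 if the read works, 500 otherwise."""
    if environ.get("PATH_INFO") != "/health/database":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found"]
    if database_ok():
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"OK"]
    start_response("500 Internal Server Error", [("Content-Type", "text/plain")])
    return [b"Database unreachable"]


if __name__ == "__main__":
    with make_server("", 8000, health_database_app) as server:
        server.serve_forever()
```

An empty `settings` table still counts as healthy here: the point is to prove the connection and a read work, not to check the data.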

Worker available to process tasks

One way to make sure the worker is available is to give it a very simple task and block until the worker finishes it, with a timeout of 5 seconds. If no worker is available to process the task, the timeout will expire and you will detect it: make sure you return an HTTP 500 response in that case. Again, the implementation will depend on the technology you are using. Here you go: /health/worker.
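A minimal sketch of that pattern, with one loud assumption: an in-process `queue.Queue` and a thread stand in for RQ and Redis, so it runs without any infrastructure. The names `ping`, `worker_loop`, `worker_ok` and `health_worker_status` are hypothetical, and the task queue is passed in explicitly so the check can be exercised with and without a live worker.

```python
import queue
import threading


def ping():
    """The simplest possible task: just answer."""
    return "pong"


def worker_loop(task_queue):
    """Toy worker: run each (func, reply_queue) task it receives, forever."""
    while True:
        func, reply = task_queue.get()
        reply.put(func())


def worker_ok(task_queue, timeout=5.0):
    """Enqueue ping() and block until a worker runs it, or give up after `timeout` s."""
    reply = queue.Queue(maxsize=1)
    task_queue.put((ping, reply))
    try:
        return reply.get(timeout=timeout) == "pong"
    except queue.Empty:
        # Nobody picked the task up in time: treat the worker as unavailable.
        return False


def health_worker_status(task_queue, timeout=5.0):
    """HTTP status line that /health/worker should answer with."""
    return "200 OK" if worker_ok(task_queue, timeout) else "500 Internal Server Error"


if __name__ == "__main__":
    tasks = queue.Queue()
    print(health_worker_status(tasks, timeout=0.5))  # no worker yet -> 500 Internal Server Error
    threading.Thread(target=worker_loop, args=(tasks,), daemon=True).start()
    print(health_worker_status(tasks))  # -> 200 OK
```

With RQ itself, the equivalent is enqueueing a trivial job and waiting for its result until the deadline; the exact way to wait on a job depends on your RQ version, so check its documentation.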

Monitoring these endpoints

You now have three health endpoints exposed publicly somewhere on your API:

  • /health/alwaysok
  • /health/database
  • /health/worker

These endpoints should answer with an HTTP 200 status code if everything is green, or an HTTP 500 if something's wrong. All you need now is to add them to HowFast:

Everything looks good!

Et voilà! You will now get notified in less than a minute as soon as one of your health checks detects something wrong. For free.

Doesn't it feel good?
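If you want to double-check by hand that each endpoint behaves as described (200 when healthy, 500 otherwise), a small script built on Python's standard `urllib` is enough. It is only a local sanity check, not a replacement for HowFast's scheduled checks and notifications; `BASE_URL` is a placeholder for your own API's address.

```python
import urllib.error
import urllib.request

# Placeholder: replace with your own API's address.
BASE_URL = "http://localhost:8000"
ENDPOINTS = ["/health/alwaysok", "/health/database", "/health/worker"]


def probe(url, timeout=10.0):
    """Return (healthy, status_code); status_code is None if no HTTP answer came back."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200, resp.status
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for 4xx/5xx answers, e.g. a 500 "unhealthy" reply.
        return False, exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout...
        return False, None


if __name__ == "__main__":
    for path in ENDPOINTS:
        healthy, status = probe(BASE_URL + path)
        print(f"{path}: {'UP' if healthy else 'DOWN'} ({status})")
```

The 10-second client timeout is deliberately longer than the worker check's own 5-second timeout, so a slow-but-healthy worker isn't reported as down.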

Securing the endpoints

If people were to discover your health endpoints, they could monitor your application themselves (usually not a big deal, I agree) or make repeated calls to your database endpoint to try to overwhelm it. To avoid this, you basically have two options:

  1. Whitelist the IP addresses allowed to access the endpoint: only accept your IPs and HowFast's IPs so that no one else can access it.
  2. Require a token passed as a query parameter, like /health/database?auth=ehjdjduejejejennenebevd. Use the full URL (with the token) when adding your monitor to HowFast, and reject every request that doesn't carry the token.

Solution #1 is the more secure of the two, since a token can be leaked while an IP address cannot reasonably be spoofed. And if it is, you probably have much bigger issues ;)
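Both options can be sketched as WSGI middleware in standard-library Python. The values are placeholders: 203.0.113.0/24 is a range reserved for documentation, so list your own IPs and HowFast's IPs instead, and load the real token from an environment variable or secret store rather than hard-coding it. The helper names are hypothetical.

```python
import hmac
import ipaddress
from urllib.parse import parse_qs

# Placeholders: replace with your own IPs and HowFast's IPs, and a real secret token.
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
HEALTH_TOKEN = "change-me"


def ip_allowed(remote_addr):
    """Option 1: accept only callers whose address is in ALLOWED_NETWORKS."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False
    return any(addr in network for network in ALLOWED_NETWORKS)


def token_valid(query_string):
    """Option 2: accept only requests carrying ?auth=<HEALTH_TOKEN>."""
    supplied = parse_qs(query_string).get("auth", [""])[0]
    # Constant-time comparison, so response timing doesn't reveal partial matches.
    return hmac.compare_digest(supplied.encode(), HEALTH_TOKEN.encode())


def protect(app, mode="ip"):
    """Wrap a WSGI app so that unauthorized requests get 403 Forbidden."""
    def guarded(environ, start_response):
        if mode == "ip":
            allowed = ip_allowed(environ.get("REMOTE_ADDR", ""))
        else:
            allowed = token_valid(environ.get("QUERY_STRING", ""))
        if not allowed:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return app(environ, start_response)
    return guarded
```

One caveat for option 1: behind a reverse proxy or load balancer, `REMOTE_ADDR` is the proxy's address, so the check would have to use the client address the proxy forwards, and only if you trust that proxy.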

Profit

To sum up:

  • you have created a monitor to track the availability of your backend
  • you also have monitors for each point of failure of your application
  • you will get notified as soon as one becomes unavailable, to give you a head start and a chance to fix it before someone notices
  • besides the downtime notifications, HowFast will also aggregate the performance data, so that you can see, for instance, whether your database is taking longer and longer to respond over time

Remember: you can't improve what you cannot measure. Start measuring your uptime today with HowFast Monitoring.