System design basics (Part 4) — Latency, throughput and availability

Thomas Varghese

2 min read · Oct 28, 2021

In any system, latency, throughput and availability are key measures of system performance.

How do we define each of these?

Latency is the time it takes for a certain operation to complete in a system.

It is usually expressed as a duration, in milliseconds or seconds.
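To make this concrete, here is a minimal sketch of measuring latency in Python. The names `measure_latency_ms` and `sample_operation` are made up for illustration, and the 10 ms sleep is only a stand-in for real work such as a disk read or a network call.

```python
import time


def measure_latency_ms(operation, repeats=5):
    """Call `operation` several times and return each call's latency in milliseconds."""
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def sample_operation():
    # Hypothetical stand-in for real work (assumption: roughly 10 ms per call).
    time.sleep(0.01)


latencies = measure_latency_ms(sample_operation)
print(f"min={min(latencies):.2f} ms, max={max(latencies):.2f} ms")
```

Timing several calls rather than one is a deliberate choice: individual measurements vary, so the spread is usually as informative as any single value.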

What are some typical latencies? The figures below are rough orders of magnitude:

Reading 1 MB from RAM: 0.25 ms
Reading 1 MB from SSD: 1 ms
Transferring 1 MB over a network: 10 ms
Reading 1 MB from HDD: 20 ms
Inter-continental round trip: 150 ms

Throughput is the number of operations that a system can handle correctly per unit of time.

For instance, the throughput of a server can often be measured in requests per second.
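As a rough sketch, throughput can be estimated by counting how many operations complete in a fixed time window. The function `measure_throughput` and the simulated `handle_request` (assumed to take about 1 ms) are hypothetical and stand in for a real server handling real requests.

```python
import time


def measure_throughput(operation, duration_s=0.2):
    """Run `operation` repeatedly for about `duration_s` seconds; return operations per second."""
    completed = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration_s:
        operation()
        completed += 1
    elapsed = time.perf_counter() - start
    return completed / elapsed


def handle_request():
    # Hypothetical request handler (assumption: about 1 ms of work per request).
    time.sleep(0.001)


rps = measure_throughput(handle_request)
print(f"about {rps:.0f} requests per second")
```

Because this sketch handles one request at a time, its throughput is bounded by the latency of each request. A system that processes requests concurrently can raise throughput without lowering latency.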

Availability is the probability that a particular server or service is up and running at any point in time, usually expressed as a percentage.

For example, a server that has 99% availability will be operational 99% of the time.

When designing any system, it is important to keep the following factors in mind:

Which parts of the system need high availability, and which can function without it?
For example, a payment gateway needs high availability, whereas a service that updates user profile information is a lower priority.
Which parts of the system could be single points of failure? For each of these, it is important to add redundancy (duplication of resources) so that the failure of one component does not bring down the whole system.
Passive redundancy can be achieved by adding more servers or load balancers.
Active redundancy can be achieved when multiple machines are set up to work together, i.e., the active machines take over the work of a machine that has failed.
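
The effect of redundancy on availability can be sketched with some simple arithmetic. The example below assumes that components fail independently of each other, which is a simplification (real failures are often correlated), and the function names are illustrative only.

```python
def parallel_availability(availability, replicas):
    """Availability of `replicas` redundant copies, where one working copy is enough.

    Assumes independent failures: the group is down only if every copy is down.
    """
    return 1 - (1 - availability) ** replicas


def serial_availability(availabilities):
    """Availability of a chain of components that must all be up at once."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


# Two redundant servers at 99% each, under the independence assumption.
print(f"{parallel_availability(0.99, 2):.4%}")

# A request that must pass through two 99% components in sequence.
print(f"{serial_availability([0.99, 0.99]):.4%}")
```

Under these assumptions, duplicating a 99% component raises availability to about 99.99%, while chaining two such components lowers it to about 98.01%. This is why single points of failure matter so much.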

High availability describes systems that have particularly high levels of availability, typically five ‘nines’ or more.

Five ‘nines’ of availability would refer to an uptime of 99.999%.

The expected downtime per year for each number of ‘nines’ is approximately:

99% (two 9s): 87.7 hours
99.9% (three 9s): 8.8 hours
99.99% (four 9s): 52.6 minutes
99.999% (five 9s): 5.3 minutes
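
The figures above follow from multiplying the unavailable fraction by the length of a year. Here is a small sketch of that arithmetic, assuming a year of 365.25 days (8,766 hours). The function name is illustrative.

```python
HOURS_PER_YEAR = 365.25 * 24  # assumption: average year length, 8,766 hours


def downtime_hours_per_year(availability_percent):
    """Expected downtime per year, in hours, for a given availability percentage."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


for pct in (99, 99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}%: {hours:.2f} hours ({hours * 60:.1f} minutes)")
```

Each additional ‘nine’ cuts the allowed downtime by a factor of ten, which is why every further step tends to be harder to achieve than the last.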

Lastly, availability is typically guaranteed to end users through SLAs (service-level agreements); an SLA is made up of multiple SLOs (service-level objectives).
