About 20 years ago while working at Sun Mircrosystems, L. Peter Deutsch listed a set of common wrong assumptions made by developers while they were designing and building distributed systems. This list is more commonly referred to as the 8 fallacies of distributed computing. The fallacies are:
- The network is reliable
- Latency is zero
- Bandwidth is infinite
- The network is secure
- Topology doesn’t change
- There is one administrator
- Transport cost is zero
- The network is homogeneous
In this section we will explore the effects of each fallacy on a distributed system. In general, these fallacies result in a system that is inefficient, insecure, and costly to maintain.
The first of the fallacies is an easy way to set yourself up for failure. Murphy made sure there will always be things that can go wrong with the network. A few possible problems are: power failure, old network equipment, network congestion, weak wireless signal, software network stack issues, rodents, DDOS attacks… etc. The list could go on, but the bottom line is that you must design with an expectation that there will be some kind of network failure.
Latency is the time it takes between a request and the start of actual response data. Many programmers wrongly assume that latency is low especially during development where the system they are working on is running within a corporate network. Latency within these kinds of networks can be as low as one millisecond. In production, when traffic is going over the internet, the story is very different. At this phase, latency is not a constant rate but changes very often.
The fact is there is no instantaneous transfer of data and each request requires an entire handshake for setting up, transferring, and tearing down the connection. This is all unavoidable due to the factors inherent in networking, such as distance, volume of traffic, networking equipment, network stack, dns, tcp, web servers, etc.
For developers who do not take this into account, the resulting user experience will be poor, and users will most likely think the software is broken.
Some common strategies to address the latency issue:
- Move servers closer to where the client is located
- Make fewer bulk requests
- Use of content delivery networks (CDNs)
- Compress content that needs to be transferred
- Optimize web pages using tools like Yslow
- Optimize firewalls
Bandwidth is the capacity of a network to transfer data. Higher bandwidth means more information can flow through the network. Even though network bandwidth capacity has been improving, at the same time we tend to increase the amount of information we want to transfer.
This fallacy is closely related to “latency is zero.” One solution for zero latency is to make fewer requests and if possible make them large requests. Realizing that bandwidth is limited, one has to decide the appropriate amount of information to transfer given the network on which the system is running.
The only completely secure system is one that is not connected to any network. Keeping the network secure is a complex task and is a full-time job for many professionals. There is a wide variety of both passive and active network attacks that can render network traffic unsafe from malicious users. How does one design a system that can handle such a diverse threat landscape? Part of the solution lies within the software itself, part has to do with network and infrastructure design.
On the software side of things, we design software starting with security in mind and follow best practices for secure software design, coding, and testing. We also define the security requirement and model threats to anticipate the possible attacks on the system.
On the network and infrastructure side, some common security techniques that can be effective in protecting distributed systems are use of approved encryption; use of digital signatures and message authentication codes; authentication mechanisms; PKI and use of certificates, and firewalls.
A network topology is the arrangement of the various network elements. The network landscape in a modern computing environment is always evolving. For new developers, it is easy to fall into the trap of assuming that the operating environment does not change. Software that works in the dev and test environment fails when deployed to production or cloud scenarios. A few ways to handle this are:
- Centralized configuration
- Design expecting changes
- Handle failures
For small systems this might not be a problem, but once you deploy to an enterprise-wide or Internet scenario where multiple networks are being touched, one can be sure they are managed by different people, different security policies, and requirements. Some possible ways to address this are:
- Involve system administrators earlier in the process
- Become familiar with limitations or policies of the production environment
- Provide administrators with tools for managing and diagnosing issues
There are a few ways of interpreting this fallacy. One has to do with the overhead of serializing data into the network. The other has to do with the costs of the networking infrastructure. Both of these are realities that cannot be avoided but following the strategies for limiting number of requests and limiting the amount of bandwidth being transferred should help keep the costs down.
A homogeneous network is one where the elements within the network are using a uniform set of configuration and protocols. For very small networks this might be the case, but for large distributed systems such as web applications, one will not be able to predict the devices that will connect, the protocols used to connect, the operating systems, browsers, etc. The best way to handle this situation is to use uniform standards and avoid proprietary formats as much as possible.
Designing distributed systems is a big challenge, and being aware of the 8 fallacies of distributed computing will help you avoid working from the wrong assumptions. Focusing on the use of standards, protocols, and services will ensure maximum interoperability and extensibility as the system evolves.
In the third part of this series we will explore some of the best practices for implementing distributed systems on some common platforms.