At Seecr we continuously scale our systems both up and out, but we also continuously improve their efficiency. Here is why and how we do it.
Scalability versus Efficiency
Quite often, people think that scalability is everything. But scaling an inefficient system, if it is possible at all, is going to be expensive and might even bring you to a complete halt. It certainly looks nice when Amazon adds 100 extra machines to your system in an instant, but it might just as well be a gross waste of resources. And as long as money is not a problem, the problems of an inefficient system can remain hidden for a long time.
Why is Inefficiency a Problem?
We are now living in an era where more and more people need no explanation as to why inefficiency is a bad thing. Much to my delight, the mere idea of wasting something is increasingly sufficient to make people act. So I won’t go in that direction. There is, however, another problem.
Inefficiency has causes. When the time comes that you do want to improve efficiency, you need to address those causes. And then it might turn out that you are a little too late…
Here are two significant causes we have observed frequently:
- Programming negligence
- Wrong algorithm
1. Programming negligence.
Programming is quite a difficult task, and each problem has different aspects that need careful attention. There are the matters of primary design, testability, consequences for other features, refactoring, selection of libraries, code style, readability and intention, integration with other parts, packaging, upgrading and data conversion, and the list goes on endlessly. That’s the nature of software.
Efficiency is definitely somewhere on that list of aspects. It is all too natural that, once code functionally works, everyone is glad and moves on to the next task. But at that time, some aspects might not have received the attention they require, and efficiency is often among them. If this goes on for some time, you’ll end up with many places needing small fixes.
But you can’t fix it later. The reason for that is as profound as it is simple: there are just too many small things, each of which contributes only a little to the problem. It is the power of the many. Addressing each of these problems requires that you take the time to delve into them again, with little reward: each single problem solved improves efficiency only a little. You’ll have to work through them all to see results.
If only you had made different choices in the first place, when you were writing the code…
2. Wrong algorithm.
A problem can often be solved in very different ways. Naturally, you first pick a solution based on your experience and understanding of the problem, then work it out. Often it becomes clear during the process that another solution is a better fit. This is a direct result of your increased understanding of the problem while working on it. A deep understanding of the dynamics that arise when the problem and the solution interact might also come only later, for example when you run with fully populated data sets and encounter usage patterns that never appeared in the testing environment. It then turns out that you need a (completely) different algorithm to solve the problem. Back to the drawing board. That’s the nature of software too.
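To make the "wrong algorithm" case concrete, here is a minimal, hypothetical Python sketch; it is not Seecr code, and the function names and data are invented for illustration. Both functions remove duplicate records while preserving their order and return identical results, so functional tests on a small data set cannot tell them apart. The first keeps the records it has seen in a list, so every membership test scans that list and the whole run is O(n²). The second keeps them in a set, which makes the whole run O(n) on average. The difference only starts to hurt once the data set is fully populated.

```python
import time


def dedupe_with_list(records):
    """Keep the first occurrence of each record, preserving order.

    'seen' is a list, so each 'in' test scans it linearly:
    O(n) per record, O(n^2) overall.
    """
    seen = []
    result = []
    for record in records:
        if record not in seen:
            seen.append(record)
            result.append(record)
    return result


def dedupe_with_set(records):
    """Same output, but 'seen' is a set: membership tests take O(1)
    on average, so the whole run is O(n). Records must be hashable."""
    seen = set()
    result = []
    for record in records:
        if record not in seen:
            seen.add(record)
            result.append(record)
    return result


if __name__ == "__main__":
    # Hypothetical data: each value occurs twice.
    for n in (500, 5_000):
        data = [i % (n // 2) for i in range(n)]
        for fn in (dedupe_with_list, dedupe_with_set):
            start = time.perf_counter()
            fn(data)
            elapsed = time.perf_counter() - start
            print(f"{fn.__name__:<18} n={n:>6}: {elapsed:.4f}s")
```

Running the script prints timings for a small input and one ten times larger. The list-based version slows down roughly quadratically, while the set-based version grows roughly linearly. Note that the switch is not free: the set-based version assumes the records are hashable, which is exactly the kind of design consequence that sends you back to the drawing board.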
Dead in your tracks
Both problems, many small inefficiencies and a wrong algorithm, are more than just two suboptimal aspects of your system. Each of them can simply put the required throughput and responsiveness beyond your capabilities and budget, because both require a complete rethinking of your system: either go over all the details again and improve them, or go over the main design again and change it. This costs a lot of time and, most importantly, it takes the time of your most specialized people. If only you had made other decisions when the system was first created…
What are the solutions?
Let me get this straight first: getting better programmers, smarter architects or more elaborate processes with a lot of quality assurance does not solve the problem. While some people do learn faster or can keep more things in mind, and while some elaborate processes do catch more problems, they only improve the situation marginally. They do not address the limitations fundamentally.
So what is the fundamental problem then?
The fundamental problem is that:
- people are given too many things to worry about at the same time.
- people are given too little time to learn and understand well.
In short, they suffer from continuous partial attention. Now the solution becomes almost self-evident: use more people and give them more time.
You would probably say: “But I can’t afford more people and more time. You know budgets are tight; your solution is not realistic.” Well, if you really think that, stop here.
If you think you are in control and you can change things for the better: continue.
First: Pair Programming
The most profound impact on software quality (that’s what we’re talking about) I have ever seen comes from working in pairs. Pairs have double the capacity to deal with all the things-to-worry-about. But that’s only the beginning. The dynamics between the two partners are what really pay off. Each has his or her own set of aspects of interest, and stimulates the other’s thinking about them.
Pairs have the advantage that they mix easily. Depending on the task at hand, on personal preferences, or even on tiredness, one person might switch with another. The new combination will pay attention to other aspects with a fresh mind. Knowledge exchange also happens automatically. Deliberately shuffling pairs is a strong tool in the hands of a team. It allows you to increase the number of aspects that receive the attention they deserve.
But having more brains on the task is only half the solution. The other half is giving these brains enough time to learn and understand the problem. So once a pair is functionally done, prepare to let a third person replace one of them. There will be points left over for him or her to think about. While the first pair focused on getting it functionally right, the second pair looks at other aspects such as efficiency. In fact, it is a simple application of Make-It-Work-Make-It-Right.
It all comes down to the careful allocation of people and time. Look at their skills and interests, and allocate enough time. I can’t stress this enough: proper allocation of time means saving time. Full stop. When you rush, you are going to regret it later, and someone is going to pay for it. It therefore takes an act of will to allocate time at the beginning, right when you are solving the problem.
The only way to build scalable systems is by first making efficient systems. And for that you need to allocate enough time and people before you scale up.