I suspect that this is of broader interest, so below are two approaches for scaling CPUs, memory or bandwidth.
Queries per second
A single-machine Meresco system handles between 10 and 100 queries per second. Scaling this requires adding more machines so that the load can be distributed over CPUs and networks. There are two approaches.
A. Replicate the entire server process and feed updates to all replicas simultaneously.
B. Extract the most demanding components from the server’s configuration and put these on separate machines. Reconnect them using the Inbox component.
Both approaches are based on standard Meresco functionality and therefore easily configured.
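To make the replication approach concrete, here is a minimal sketch in plain Python. It is not Meresco configuration: `Replica` and `ReplicatedService` are hypothetical names invented for this illustration. Every update goes to all replicas, while queries are spread round-robin.

```python
from itertools import cycle


class Replica:
    """Hypothetical stand-in for one complete copy of the server process."""

    def __init__(self, name):
        self.name = name
        self.records = {}

    def update(self, identifier, record):
        self.records[identifier] = record

    def query(self, term):
        # Naive substring match; a real server would consult its indexes.
        return sorted(i for i, r in self.records.items() if term in r)


class ReplicatedService:
    """Sends every update to all replicas and spreads queries round-robin."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self._next = cycle(self.replicas)

    def update(self, identifier, record):
        # Each replica holds the full data set, so each one gets every update.
        for replica in self.replicas:
            replica.update(identifier, record)

    def query(self, term):
        # Any replica can answer, so consecutive queries go to different machines.
        replica = next(self._next)
        return replica.name, replica.query(term)


service = ReplicatedService([Replica("node-1"), Replica("node-2")])
service.update("rec:1", "scaling meresco")
service.update("rec:2", "lucene index")
first = service.query("meresco")
second = service.query("meresco")
```

Query capacity grows with the number of replicas, but every replica still has to process every update, so this approach does not help with update throughput.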
Record updates per second
Meresco is able to process 1 to 10 updates per second concurrently with querying. Scaling this up requires adding machines that share the load of processing the records, using approach B (extracting components onto separate machines). These machines can feed into one or more query processing machines, effectively enabling scaling along both axes.
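The hand-off between record-processing machines and query machines can be pictured as a shared directory that one side writes to and the other side drains. The sketch below only illustrates that idea. It does not use or describe the API of Meresco’s Inbox component, and `process_record`, `deliver` and `drain` are hypothetical names. It also assumes the directory is visible to both sides, for example through a network mount.

```python
import json
import os
import tempfile
from pathlib import Path


def process_record(identifier, raw):
    """Hypothetical expensive step done on a processing machine."""
    return {"id": identifier, "text": raw.strip().lower()}


def deliver(inbox, document):
    """Write under a temporary name, then rename, so readers never see partial files."""
    tmp = inbox / (document["id"] + ".tmp")
    tmp.write_text(json.dumps(document))
    os.replace(tmp, inbox / (document["id"] + ".record"))


def drain(inbox, index):
    """Query-machine side: load every delivered record and remove its file."""
    for path in sorted(inbox.glob("*.record")):
        document = json.loads(path.read_text())
        index[document["id"]] = document["text"]
        path.unlink()


with tempfile.TemporaryDirectory() as directory:
    inbox = Path(directory)
    deliver(inbox, process_record("rec-1", "  Scaling Meresco \n"))
    deliver(inbox, process_record("rec-2", "Lucene INDEX"))
    index = {}
    drain(inbox, index)
    remaining = list(inbox.iterdir())
```

Because the processing side only writes files and the query side only reads them, processing machines can be added without the query machines needing to know how many there are.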
The main idea is to decompose a system into subsystems that can be distributed and replicated. This analysis must be done before a system can scale up in cloud-like environments. How Meresco’s configuration supports this will be outlined in a future blog post.
Total number of records
Meresco can host 10 to 100 million records on one machine, limited mostly by what its indexes can do. Scaling up requires a closer look at these indexes to see how additional resources must be allocated. In this area Lucene, BerkeleyDB and OWLIM have earned great reputations, and Meresco’s architecture helps to get the most out of them.
Meresco’s homegrown Facet Index and Sorted Dictionary Index (used for auto-complete) can be scaled following approach B. However, each of these indexes can hold roughly one billion records on a single node, so most applications would not need more than one node for them.
I realize that I have only scratched the surface of how to scale Meresco. There are many details to discuss, and you probably wonder how your own situation could be dealt with. I’d love to hear your responses!