Is your SQL database creaking? ...Time to architect for growth

When is SQL not enough?

SQL was developed in the 1970’s to provide a querying structured data. Data was organised logically, with defined relationships between the different bits of data. For example a person has a name and address. Some of those bits of data have very specific relationships, such as a person having a single specific date of birth. These relational databases have done amazingly well to allow us to model massively complex data, and the big database software (Oracle, Microsoft SQL Server and MySQL) have done a great job at allowing us to manipulate enormous data sets of tens of millions of items. How big they can go depends largely on the complexity of the application and queries. Scaling up a system by hosting on bigger, faster hardware works well up to a point, leads to to higher costs and for larger applications if the biggest server you can find just isn’t fast enough, you have nowhere to go. Fundamentally, the limitation is that none of those systems is great at distributing workload horizontally across multiple servers.  When Larry and Sergei wrote their paper on building a large scale search engine, they suggested that indexing more than 100 million web pages would be much more complicated. There are now approaching 5 billion and growing!

 

Clues that you are approaching the limits of a traditional SQL database design

When looking at a specific application, there are often some telltale signs that your application might need some re-architecture. Here are a few of them:

  • The database occasionally slows to snail’s pace, perhaps because of a "large query” or a “batch job".
  • Your database server is already running on large, expensive hardware.
  • You are reaching the economic limits of how much RAM you can install in a single database server. e.g. Currently 512GB or more qualifies.
  • You spend a LOT of time optimising SQL queries.
  • Traditional performance enhancements such as looking at efficient indexing have already been done. Probably multiple times.
  • You are thinking of employing, or already have, an SQL performance specialist.
  • Developers are looking for ways to split up the database load onto multiple servers
  • Your finance director suggests you build a separate data warehouse server so that they get swift access to routine metrics

If some or all of these are happening for you, then it’s time to look at re-architecting you systems.

 

What are the solutions?

There are many ways of tackling scaling, and the most appropriate solution will depend on your specific environment.  The following steps can help guide you towards a solution.

  1. Analyse current capability
    What is the current system capable of? How much data can it hold? What volume of transactions and queries can it deal with? Under what circumstances does it slow down?
  2. Understand the growth targets
    You have to agree some realistic targets. Nobody wants an open ended design goal, as that tends to lead to open ended budgets, which don’t go down well with your finance team.
  3. Optimise SQL queries and indexing
    The fastest way to improve performance of almost every database application is to first look at optimising the database indexing (which your DBA should be continuously keeping an eye on) and then dig into all your most frequently used and resource-hungry queries and optimise them. All the major platforms have the ability to look at how a particular query is going to get executed in the database, and it is often possible to rewrite the query to directly reduce the load on the servers.
  4. Optimise hardware
    Another speedy way to increase capacity is to upgrade hardware. Typical upgrades include, adding RAM, using faster hard disks, using SSD drives instead of traditional spinning disks, using more disks, increasing the speed or number of network ports to the server and increasing the speed or number of processors.
  5. Split the workload to multiple databases
    Re-architecting the system to split the workload from one server to multiple servers is typically the first step that can truly increase the scale potential for the system. Whether the system has a logical split or the database can be sharded into multiple parts, this can have many traps, where separating out one part of the system might speed that up, but the additional communications overhead might slow down another part of the system.
  6. Move the problem to software
    Sometimes it is possible to simply rewrite the application so that it is not dependant on the traditional SQL database to get its answer. For example, rather than asking the database for a count of certain data items, write that function into the application and store it continuously in memory where it is available to whatever processes need to know. This can create lightning fast results to specific, known questions.
  7. Migrate to NoSQL / Big Data platforms
    Systems such as Hadoop, CouchDB, MongoDB, and Redis, were architected from the ground up to deal with huge quantities of data. They all have different advantages, so the choice can be complex, but the potential for scaling an application on these platforms is huge. These are the types of technologies that underpin all our Internet scale applications such as Google and Facebook.

 

Keep it simple

The solution depends on where you need to get to in terms of scale. However, it also depends on where you are now and what resources you have available to you. For a greenfield application you can architect a slick solution based on the latest technologies, but to make sure an existing SQL database keeps running with next month’s customer growth, that might not be feasible. The best solution is the one that achieves your design goal with the least resources and in the least amount of time.  There is little point embarking on a total application re-write if that will take you 18 months and your systems are creaking right now. 

First look for the simpler, frictionless solutions and implement them. These are the ones that take days to implement rather than months. Then use the capacity headroom you have built as breathing space to do the more complex work, and don’t invent something yourself if you can buy it off the shelf. Both Google and Amazon now sell platform based big data solutions, where you load up your data and start running queries right away. It is easy to get sucked into thinking the answer to a complex problem has to be complex, but new solutions are being developed every day and solutions can be simpler and more accessible than you might think.