What I learned from building three high traffic web applications on an embedded key value store
I grew up as part of the fad that all web applications must be built on proper architecture, usually involving a well tested and stable SQL database like Postgres, or, in some stretch cases, a NoSQL database like MongoDB. The reasons were simple:
- They can be scaled and scaled and scaled. Very important when your application gets its first one billion users overnight.
- SQL is the best way to access your data. You always need a complicated message layer to access that data.
- If you don’t use SQL, how will you perform joins? How will you build complicated search? How will you keep your code from growing so complex that it collapses and kills your cat?
Why build an application where you won’t need to employ an experienced database administrator to create your complicated schemas and perform the very important and complicated database maintenance tasks?
In 2015, I stumbled on BoltDB, an embedded database library for Golang, and first used it to save state in basic server-side applications. Then I stumbled on blevesearch, and it dawned on me.
The most basic things most databases offer are storage and flexible search over the stored data. With BoltDB, I could store my data efficiently, and with bleve, I could search and access my data in interesting ways.
On this basis, I built and launched Calabar Yellow Pages on just BoltDB and blevesearch. When I saw how smoothly it worked and scaled in production, I went on to build Shop440 (a cross between Shopify and AliExpress) on BadgerDB, which I found much faster than BoltDB, with blevesearch for indexing.
What I learned:
Key Value Stores are fast
First, a NoSQL key value store is fast, often much faster than a relational database for simple lookups by key. Its speed comes from its simplicity. A key value database stores each record under one primary key. The key uniquely identifies the record, so the record can be accessed directly without parsing or planning a query. Beyond this, it’s up to the developer to build any more complex ways of accessing the data.
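To make the "one key, one record" idea concrete, here is a minimal sketch in Go. It uses a plain map as a stand-in for the embedded store, so it is not the BoltDB or Badger API; the `Business` type, the `Store` type, and the `business:1` key format are all hypothetical names chosen for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Business is a hypothetical record type used only for illustration.
type Business struct {
	Name string `json:"name"`
	City string `json:"city"`
}

// Store is a map-backed stand-in for an embedded key value store:
// every record lives under exactly one primary key.
type Store struct {
	data map[string][]byte
}

func NewStore() *Store {
	return &Store{data: make(map[string][]byte)}
}

// Put encodes the record as JSON and writes it under its primary key.
func (s *Store) Put(key string, b Business) error {
	raw, err := json.Marshal(b)
	if err != nil {
		return err
	}
	s.data[key] = raw
	return nil
}

// Get fetches a record directly by key. The bool is false when the key
// is absent.
func (s *Store) Get(key string) (Business, bool, error) {
	var b Business
	raw, ok := s.data[key]
	if !ok {
		return b, false, nil
	}
	if err := json.Unmarshal(raw, &b); err != nil {
		return b, false, err
	}
	return b, true, nil
}

func main() {
	s := NewStore()
	if err := s.Put("business:1", Business{Name: "Example Bakery", City: "Calabar"}); err != nil {
		panic(err)
	}
	b, ok, err := s.Get("business:1")
	fmt.Println(b.Name, ok, err)
}
```

Everything else (secondary lookups, filtering, sorting) has to be built on top of these two operations, which is the trade-off the paragraph above describes.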
Also, since the key value stores I used were embedded databases, I was able to cut out some of the overhead that comes with client-server databases.
In a client-server database like MongoDB, a lot of resources and latency go into serializing data and passing it over the wire to the client application, which then deserializes and decodes it.
This is wasted work when the client and the database run on the same server rather than on separate servers, where talking over the network is actually necessary. An embedded database runs inside the application process, so it removes the wire-protocol encoding and the TCP transport cost entirely.
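The difference can be sketched with the standard library alone. In the example below, `embeddedGet` is a direct function call, while `remoteGet` performs the same lookup through an in-memory connection with JSON encoding on both sides. The request and response types and the JSON framing are assumptions made for the sketch; real databases use their own wire protocols.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
)

type request struct {
	Key string `json:"key"`
}

type response struct {
	Value string `json:"value"`
	Found bool   `json:"found"`
}

// embeddedGet is a plain function call: no encoding, no connection.
func embeddedGet(db map[string]string, key string) (string, bool) {
	v, ok := db[key]
	return v, ok
}

// serve answers one request on conn, the way a database server process would.
func serve(conn net.Conn, db map[string]string) {
	defer conn.Close()
	var req request
	if err := json.NewDecoder(conn).Decode(&req); err != nil {
		return
	}
	v, ok := db[req.Key]
	_ = json.NewEncoder(conn).Encode(response{Value: v, Found: ok})
}

// remoteGet does the same lookup through a connection: encode the request,
// send it, wait for the reply, then decode the response.
func remoteGet(conn net.Conn, key string) (string, bool, error) {
	if err := json.NewEncoder(conn).Encode(request{Key: key}); err != nil {
		return "", false, err
	}
	var resp response
	if err := json.NewDecoder(conn).Decode(&resp); err != nil {
		return "", false, err
	}
	return resp.Value, resp.Found, nil
}

// lookupOverPipe wires a client and a one-shot server together with an
// in-memory connection and performs a single remote lookup.
func lookupOverPipe(db map[string]string, key string) (string, bool, error) {
	client, server := net.Pipe()
	defer client.Close()
	go serve(server, db)
	return remoteGet(client, key)
}

func main() {
	db := map[string]string{"user:1": "ada"}

	v, ok, err := lookupOverPipe(db, "user:1")
	fmt.Println("remote:  ", v, ok, err)

	v, ok = embeddedGet(db, "user:1")
	fmt.Println("embedded:", v, ok)
}
```

Both paths return the same answer; the remote one simply does two encodes, two decodes, and a round trip to get there.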
Great Scalability — What About Consistency?
Secondly, a NoSQL key value database is also highly scalable. This too is a function of its simplicity. Unlike a relational database, a NoSQL key value database is not obliged to scale vertically. Because every record is addressed by its key, the data can be spread over several machines or devices without significant redesign. Financially, this is a big advantage too.
But in my case, since I was using an embedded key value store, most of the scaling happened vertically, and that has held up for a very long time. At the moment, even with a database size of almost a terabyte, the application is still holding strong.
With Badger, I came up with a strategy of having multiple Badger databases, each representing a collection. This way, if I ever have a need to scale the system beyond a single server, I could isolate each individual database and its corresponding program logic into a separate micro-service (I will write more about this in the future).
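A minimal sketch of that layout, assuming a small `KV` interface and a map-backed `memKV` in place of real Badger instances (both are hypothetical names, not part of Badger). With Badger, each collection would presumably be its own database directory.

```go
package main

import "fmt"

// KV is the small surface the application depends on. A collection could
// later be backed by a remote service without changing its callers.
type KV interface {
	Get(key string) ([]byte, bool)
	Set(key string, value []byte)
}

// memKV is a map-backed stand-in for one embedded database.
type memKV struct {
	data map[string][]byte
}

func newMemKV() *memKV { return &memKV{data: make(map[string][]byte)} }

func (m *memKV) Get(key string) ([]byte, bool) {
	v, ok := m.data[key]
	return v, ok
}

func (m *memKV) Set(key string, value []byte) { m.data[key] = value }

// Collections keeps one independent store per collection name.
type Collections struct {
	stores map[string]KV
}

// NewCollections opens one store for each name.
func NewCollections(names ...string) *Collections {
	c := &Collections{stores: make(map[string]KV)}
	for _, n := range names {
		c.stores[n] = newMemKV()
	}
	return c
}

// Collection returns the store for name, or an error if it was never opened.
func (c *Collections) Collection(name string) (KV, error) {
	s, ok := c.stores[name]
	if !ok {
		return nil, fmt.Errorf("unknown collection %q", name)
	}
	return s, nil
}

func main() {
	c := NewCollections("products", "orders")

	products, _ := c.Collection("products")
	products.Set("1", []byte("blue kettle"))

	orders, _ := c.Collection("orders")
	_, found := orders.Get("1") // same key, different database
	fmt.Println("orders has key 1:", found)
}
```

Because callers only see the `KV` interface, moving one collection behind a network service later would mean swapping its implementation, not rewriting the code that uses it.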
Cheap on time and effort
Scaling a relational database solution often means that cost increases disproportionately. It also requires time and effort (and server downtime) to change your database schema.
By comparison, a key value database tends to keep the price curve closer to linear than exponential, and it is designed to handle data without any predefined schema. Some (not all) key value databases, like other NoSQL databases, are also designed for ‘eventual consistency’.
This means that data are synchronized ‘at some point’ between copies on different machines, but not immediately.
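Here is a toy illustration of what that looks like from the application's side. The `Cluster`, `Replica`, and `Sync` names are made up for this sketch, and replication is triggered by hand rather than by a background process, so it shows only the visible behaviour, not how any particular database implements it.

```go
package main

import "fmt"

// write is one queued change waiting to be replicated.
type write struct {
	key, value string
}

// Replica is one copy of the data on some machine.
type Replica struct {
	data map[string]string
}

func (r *Replica) Get(key string) (string, bool) {
	v, ok := r.data[key]
	return v, ok
}

// Cluster has a primary that accepts writes and a follower that
// receives them later, when Sync is called.
type Cluster struct {
	Primary  *Replica
	Follower *Replica
	pending  []write // accepted by the primary, not yet replicated
}

func NewCluster() *Cluster {
	return &Cluster{
		Primary:  &Replica{data: map[string]string{}},
		Follower: &Replica{data: map[string]string{}},
	}
}

// Put applies the write on the primary and queues it for the follower.
func (c *Cluster) Put(key, value string) {
	c.Primary.data[key] = value
	c.pending = append(c.pending, write{key, value})
}

// Sync replays the queued writes on the follower, in order.
func (c *Cluster) Sync() {
	for _, w := range c.pending {
		c.Follower.data[w.key] = w.value
	}
	c.pending = nil
}

func main() {
	c := NewCluster()
	c.Put("stock:kettle", "3")

	_, ok := c.Follower.Get("stock:kettle")
	fmt.Println("follower sees write before sync:", ok)

	c.Sync()
	v, ok := c.Follower.Get("stock:kettle")
	fmt.Println("follower after sync:", v, ok)
}
```

A read that hits the follower between `Put` and `Sync` gets stale data, which is the window an application on an eventually consistent store has to tolerate.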
A case for Embedded Key Value Stores
According to https://www.sqlite.org/whentouse.html,
SQLite works great as the database engine for most low to medium traffic websites (which is to say, most websites).
The amount of web traffic that SQLite can handle depends on how heavily the website uses its database. Generally speaking, any site that gets fewer than 100K hits/day should work fine with SQLite. The 100K hits/day figure is a conservative estimate, not a hard upper bound. SQLite has been demonstrated to work with 10 times that amount of traffic.
The SQLite website uses SQLite itself, of course, and as of this writing (2015) it handles about 400K to 500K HTTP requests per day, about 15–20% of which are dynamic pages touching the database. Dynamic content uses about 200 SQL statements per webpage.
This setup runs on a single VM that shares a physical server with 23 others and yet still keeps the load average below 0.1 most of the time.
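To put the quoted figures in perspective, here is a back-of-the-envelope calculation of the average SQL statement rate they imply. It assumes the load is spread evenly over the day, which real traffic is not, so peaks would be higher. The function name is my own.

```go
package main

import "fmt"

const secondsPerDay = 24 * 60 * 60

// statementsPerSecond converts daily requests into an average SQL statement
// rate, given the share of requests that are dynamic pages and the number
// of statements each dynamic page runs.
func statementsPerSecond(requestsPerDay, dynamicShare, statementsPerPage float64) float64 {
	return requestsPerDay * dynamicShare * statementsPerPage / secondsPerDay
}

func main() {
	// Low end: 400K requests/day, 15% dynamic, 200 statements per page.
	low := statementsPerSecond(400_000, 0.15, 200)
	// High end: 500K requests/day, 20% dynamic, 200 statements per page.
	high := statementsPerSecond(500_000, 0.20, 200)
	fmt.Printf("%.0f to %.0f statements/second on average\n", low, high)
}
```

That works out to roughly 139 to 231 statements per second on average, served from a shared VM.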
Embedded key value stores like Badger generally do less work per operation than SQLite, since there is no SQL to parse or plan. So if SQLite can handle this many hits per day, an embedded key value store should be able to handle at least as much, and likely a good deal more.
Also, using an embedded key value store alongside an embedded indexing engine like blevesearch in a compiled language like Golang means you can truly deploy a single binary with no external dependencies and no post-deployment setup.
In a nutshell
Embedded key value stores give you all of the advantages of a NoSQL database. Where they are lacking is in search, since you can only query for items by their keys or by a key prefix.
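A short sketch of what querying by key prefix amounts to, assuming keys are kept in sorted order. `SortedKV` is a stand-in written for this example, not the cursor or iterator API of BoltDB or Badger, and the `product:` and `order:` key prefixes are illustrative.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// SortedKV keeps its keys in sorted order, which is what makes prefix
// scans cheap: all keys sharing a prefix sit next to each other.
type SortedKV struct {
	keys   []string
	values map[string]string
}

func NewSortedKV() *SortedKV {
	return &SortedKV{values: make(map[string]string)}
}

// Set stores value under key, inserting new keys at their sorted position.
func (s *SortedKV) Set(key, value string) {
	if _, exists := s.values[key]; !exists {
		i := sort.SearchStrings(s.keys, key)
		s.keys = append(s.keys, "")
		copy(s.keys[i+1:], s.keys[i:])
		s.keys[i] = key
	}
	s.values[key] = value
}

// Scan returns every key that starts with prefix, in sorted order.
func (s *SortedKV) Scan(prefix string) []string {
	var out []string
	// Seek to the first key >= prefix, then walk forward while it matches.
	for i := sort.SearchStrings(s.keys, prefix); i < len(s.keys); i++ {
		if !strings.HasPrefix(s.keys[i], prefix) {
			break
		}
		out = append(out, s.keys[i])
	}
	return out
}

func main() {
	s := NewSortedKV()
	s.Set("product:2", "lamp")
	s.Set("order:1", "kettle x1")
	s.Set("product:1", "kettle")
	fmt.Println(s.Scan("product:")) // [product:1 product:2]
}
```

Anything beyond this, such as finding a product by a word in its description, needs an index on top.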
But when paired with an indexing engine like blevesearch in Golang, or Elasticsearch and Lucene, the pair gives you a very capable database with features like full text search, location based search, etc., in a very resource-efficient package that scales well.
I believe this pairing deserves to be more popular than it currently is. It goes a long way in reducing server expenses, especially for applications that never expect Facebook-level scale.
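The idea behind the pairing can be shown with a toy inverted index sitting next to the store. This is only a sketch of the principle: the `Catalog` type is hypothetical, the tokenizer just lowercases and splits on whitespace, and a real engine such as bleve adds analysis, scoring, and much more.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Catalog pairs a key value store with a tiny inverted index.
type Catalog struct {
	docs  map[string]string              // key -> document text
	index map[string]map[string]struct{} // term -> set of keys containing it
}

func NewCatalog() *Catalog {
	return &Catalog{
		docs:  make(map[string]string),
		index: make(map[string]map[string]struct{}),
	}
}

// Put stores the document and indexes each lowercased word in it.
// Updating an existing key would also need to remove its old terms;
// that is omitted here to keep the sketch short.
func (c *Catalog) Put(key, text string) {
	c.docs[key] = text
	for _, term := range strings.Fields(strings.ToLower(text)) {
		if c.index[term] == nil {
			c.index[term] = make(map[string]struct{})
		}
		c.index[term][key] = struct{}{}
	}
}

// Search returns the sorted keys of documents that contain term.
func (c *Catalog) Search(term string) []string {
	var keys []string
	for k := range c.index[strings.ToLower(term)] {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	c := NewCatalog()
	c.Put("product:1", "Blue electric kettle")
	c.Put("product:2", "Desk lamp")
	c.Put("product:3", "Stovetop kettle")

	for _, key := range c.Search("kettle") {
		fmt.Println(key, "=>", c.docs[key])
	}
}
```

The index only ever returns keys; the records themselves still come from the key value store, which is why the two fit together so naturally.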
This article was originally written for hackernoon, at: https://hackernoon.com/what-i-learnt-from-building-3-high-traffic-web-applications-on-an-embedded-key-value-store-68d47249774f