Slaying Database Performance Dragons
A 101 database performance guide for decision-makers, architects, and developers
Before you start
If you’ve found yourself at the mercy of database lag, or if you’re planning a new project and want to avoid the pitfalls of poor performance, you’ve picked up the right guide. At RavenDB, we’ve helped customers ranging from startups to Fortune 500 companies successfully design distributed data architecture and wrangle performance dragons for over a decade.
What to expect
In this 20-minute guide, we’ll distill some of the essential knowledge you need to design a responsive data architecture—from understanding the types of dragons (databases) you might encounter to mastering your weapons (query optimization) and shields (indexing). Along the way, we’ll introduce you to the magical properties of RavenDB, which provides unique capabilities to keep your databases running efficiently. We’ve sprinkled in real-world examples and practical advice to make the abstract tangible.
Takeaways
After reading this guide, you’ll know how to:
- Identify the strengths and weaknesses of different database technologies
- Design and optimize queries for speed and efficiency
- Understand the crucial role indexing plays (and where it can fall short)
- Make informed decisions when evaluating database vendor performance
Who is this book for?
This guide is for decision-makers, developers, database administrators, and anyone interested in database performance optimization.
Know Thy Dragons: Understanding Database Types and Their Unique Performance Implications
Welcome, brave knight, to the fantastical world of database dragons! Just like different dragons have varying strengths, weaknesses, and magical abilities, different databases have their unique performance implications. Understanding these dragons—or databases, if you prefer—is critical to becoming the dragon wrangler your kingdom (or company) needs.
Types of databases
As Big Data became increasingly prevalent, it exposed the limitations of traditional relational databases, which were initially designed for structured data and single-server deployments. These systems struggled to meet the demands for high throughput and low latency in distributed, data-intensive environments. In response, a new generation of databases emerged, commonly known as NoSQL and NewSQL, to better accommodate the challenges of scale, distribution, and varying data structures.
- Relational Databases (RDBMS):
Think of these like the classic European dragons, massive creatures that hoard treasure. They’re the traditional option and have been around for ages. They’re reliable but often demand a lot in terms of resources. Common members of this family include MySQL, PostgreSQL, and SQL Server.
- NewSQL Databases:
Newly emerging dragons typically designed for niche use cases. Imagine a dragon that can breathe both fire and ice. They support horizontal scaling like NoSQL databases but distinguish themselves by supporting fixed schemas and SQL queries like their relational counterparts. Common members of this family include NuoDB, VoltDB, Google Spanner, and Clustrix.
- Platforms (Backend-as-a-Service):
More like a multi-headed dragon, backend-as-a-service platforms offer a full-stack solution for building a backend, including data storage, serverless functions, authentication, hosting, and frontend SDKs. While they simplify and speed up building backend solutions, they require going “all in” on the provider, which may not fit larger organizations. Common members of this family include Firebase, Supabase, AWS Amplify, and Appwrite.
- NoSQL Databases:
These are the nimble, more exotic dragons. They come in different types, including document-based, key-value stores, graphs, time series, and more. They’re designed for flexibility and horizontal scaling by doing away with the rigidly fixed data schemas of their relational counterparts in favor of schemaless storage. Common members of this family include RavenDB, MongoDB, Redis, InfluxDB, FaunaDB, and Neo4j.
Choosing a data modeling approach
Your data modeling approach is like your battle strategy against these dragons.
- Normalized:
This is the old-school, armor-and-sword method. It’s effective but can be cumbersome. It’s the main approach used in SQL databases.
- Denormalized:
Think of this as using a bow and arrow from a distance. It’s more flexible and quicker but requires more discipline. You’ll see this approach often in NoSQL databases.
Your choice of data modeling can significantly impact database speed. In an e-commerce setting, SQL databases require multi-table joins to fetch a customer’s order details, slowing down queries. In contrast, NoSQL databases like RavenDB can store an entire order as a single document, enabling an ultra-fast O(1) lookup.
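To make that concrete, here is a minimal sketch of loading such a self-contained order document through RavenDB’s C# client. The class shapes and document ID are illustrative, not a prescribed schema:

```csharp
using System.Collections.Generic;
using Raven.Client.Documents;

// Illustrative document shape: the whole order, including its line items,
// lives in one document instead of being spread across several tables.
public class OrderLine { public string ProductName { get; set; } public decimal Price { get; set; } public int Quantity { get; set; } }
public class Order { public string Id { get; set; } public string CustomerId { get; set; } public List<OrderLine> Lines { get; set; } }

public static class OrderLookup
{
    public static Order LoadOrder(IDocumentStore store, string orderId)
    {
        using var session = store.OpenSession();
        // A single key lookup in one round trip; no joins across order, line, and product tables.
        return session.Load<Order>(orderId);
    }
}
```

Because everything needed to render the order arrives in that one load, the read path stays flat no matter how many orders the system holds.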
Don’t fear denormalization
A common refrain against schemaless databases and denormalization is that you lose “referential integrity,” meaning that data updated in one place does not “stay in sync” with the other places it’s referenced. This tends to scare people off, and you’re here to overcome dragons, not be scared by them, so let’s examine the two main motivations behind denormalizing data.
One motivation for denormalization is to store point-in-time data. Take the classic customer order example: a customer purchases a product, and that gets stored as an order detail line item. In this case, referential integrity isn’t needed because you want to store the purchase price as a point-in-time value. It’s “denormalized,” but that’s by design, and it’s an important distinction for the business because the intent of storing the price differs (price at purchase vs. current price).
The other motivation is to denormalize reference data for performance reasons – to reduce JOINs, for example, in a complex query. If you wanted to store the books someone reads in a list, in denormalized form you would store some book details like the name and author. This is cloning reference data to speed up displaying the book list to the end-user. However, you see the problem immediately: if the author changes their name, say after getting married, you need to update every place the author’s name is referenced, complicating your business logic.
Denormalizing reference data is where extra discipline is required. For any denormalized reference data, you need to ask how often (if at all) the referenced data will change – and what action you will take when it does.
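One way to keep that discipline manageable is a set-based update that rewrites the denormalized copies whenever the source changes. Below is a rough sketch using RavenDB’s patch-by-query operation; the collection, field names, and IDs are hypothetical:

```csharp
using Raven.Client.Documents;
using Raven.Client.Documents.Operations;
using Raven.Client.Documents.Queries;

public static class DenormalizedDataMaintenance
{
    // Hypothetical scenario: an author changed their name, so rewrite every
    // denormalized copy of that name embedded inside Books documents.
    public static void RenameAuthor(IDocumentStore store, string authorId, string newName)
    {
        var patch = new PatchByQueryOperation(new IndexQuery
        {
            Query = @"from Books as b
                      where b.Author.Id = $authorId
                      update { b.Author.Name = $newName; }",
            QueryParameters = new Parameters
            {
                ["authorId"] = authorId,
                ["newName"] = newName
            }
        });

        // Runs server-side as one background operation, not one update per document from the client.
        store.Operations.Send(patch);
    }
}
```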
With the recent shift to cloud-native architecture, increasingly cheaper hardware, and an explosion of data needs driven by ML/AI, the distinctions between these categories keep shrinking. Modern databases are engineered to excel in distributed data architectures, offering easier horizontal scalability than traditional vertically scaled databases.
Additionally, modern databases are tailored to meet the needs of today’s developers by offering a more streamlined coding and developer experience. This is often achieved through integrating web technologies like JavaScript, JSON, REST, and GraphQL, making it simpler to model and integrate the application layer.
Dragon decision matrix: are you battling the right one?
Before facing your database performance dragon, you need to know its nature – where are its weak spots? Do some weapons work better than others? Database performance is dictated by workload: each database family is optimized for particular access patterns, and matching your workload to the right engine is one of the biggest performance decisions you’ll make.
For example, PostgreSQL will perform worse for a high-velocity IoT workload than a database designed for time series data ingestion. Choosing it anyway is a recipe for getting devoured.
Here’s a breakdown:
There are 3 main workloads in the database world:
- OLAP (Online Analytical Processing)
- OLTP (Online Transaction Processing)
- Time Series
| Workload Type | Best For | Strengths | Weaknesses | Use-Case Example |
| --- | --- | --- | --- | --- |
| OLAP (Analytics) | Complex queries and data analysis | Efficient at handling large data sets and aggregating information quickly | Not designed for high-velocity transactional data | Running sales reports that aggregate data across multiple dimensions |
| OLTP (Transactional) | High-velocity transactional data | Fast reads and writes, beneficial for applications requiring real-time responsiveness | Generally not designed for complex queries that scan large volumes of data | Stock trading applications where millisecond latency can make a difference |
| Time Series | Tracking, storing, and analyzing time-ordered data | Optimized for inserting and querying data in time-based order | Typically not suited for transactional data unrelated to time series | IoT sensor data tracking for temperature changes over time |
To aid you in deciding which dragon to face, here’s a database workload decision matrix that can help you determine which SLAs may be important for you:
| Workload | Workload Type | Read/Write Ratio | Size of 1 Data Item | Important SLAs |
| --- | --- | --- | --- | --- |
| IoT / Industry 4.0 | Time Series | Write Heavy (strong) | Small (1-5 KB) | High Throughput, High Scalability |
| Real-Time Analytics | OLAP | Read Heavy (strong) | Very Large (>50 KB) | Low Read Latency |
| Product / Content Data | OLAP | Read Heavy (slightly) | Large (20-50 KB) | High Availability, High Consistency |
| eCommerce | OLTP | Balanced | Medium (5-20 KB) | High Availability, Low Read Latency, High Consistency |
| Payments | OLTP / Time Series | Write Heavy (slightly) | Medium (5-20 KB) | High Throughput, High Consistency, High Availability |
| Gaming / Mobile | OLTP | Balanced | Medium (5-20 KB) | Low Read Latency, Low Write Latency |
There’s no one-size-fits-all approach when it comes to deciding on a database that will handle your workload (although we think we come pretty close).
Key Questions to Anticipate Performance Dragons
How do you identify which databases might have hidden performance dragons for your use case? Instead of waiting to hear them roar, read their spec sheets, then dig further into the documentation and reported issues (where the practical problems surface).
Here are 17 key questions and follow-ups we recommend asking:
- Is the database ACID (Atomicity, Consistency, Isolation, Durability) compliant? Are there cases where there is no ACID guarantee? For which parts of our workload is ACID critical, and are those covered?
- What are the consistency guarantees during a failover? Will we need to introduce retry logic?
- How does the database support advanced or complex queries, such as aggregations or geospatial searches? How does it ensure they stay performant?
- How much logic does it push to the application vs. taking on itself? Will it make our applications more or less complex?
- What is the failover behavior in large clusters? How do node elections take place?
- How does it maintain business continuity in the event of a catastrophic failure? What will any downtime mean for our business?
- What is the production hardware sizing, cluster layout, and recommendations? Can we afford that?
- How early do we have to prepare for sharding? How does sharding affect developer experience? How costly are sharding mistakes?
- Does it support our key integrations? Which custom integrations will we need to build for our needs?
- What mechanisms are in place to prevent race conditions or corrupted data from being written?
- How does it integrate with DevOps workflows? Do we need custom code to migrate data?
- What observability hooks does it provide? Is it easy to monitor?
- What are the day-to-day operational needs? Do we need to hire a team, or is there a managed provider? Do we need a hybrid hosting architecture?
- How will we migrate our existing data? Does the vendor offer any migration assistance or tools? How much rework will this cause?
- What is the ecosystem around the product? Is there a community of practice around it?
- What are the production licensing requirements? Is there an open source license?
- What levels of production support are offered? What are the SLAs?
To really know which database best fits your workload, create a proof-of-concept comparing your key workloads in an isolated environment. We even offer a free 2-day POC program for startups where our engineers will dedicate their time to helping you evaluate RavenDB.
Ask your architects, or even the vendors, about what performance tuning options are available.
By now, you should have a foundational understanding of the database dragons you’re facing and be better equipped to deal with them.
Dragon Battle Strategies: Optimizing Queries for Maximum Velocity
Now that you know what type (or types) of dragons you’re dealing with, it’s time to refine your technique. Different databases are built differently and will benefit from different optimization techniques but… dragons are dragons, and most benefit from the same strategies.
Avoid Poisonous JOINs
In the perilous realm of database queries, JOINs are akin to a dragon’s slow-acting poison. While they may seem innocent, the more tables you JOIN in a relational database, the more sluggish your query becomes. The database needs to meld rows together based on shared attributes, and this can significantly reduce performance.
As an antidote, ensure indexes are created for joined columns, or use specialized column types like JSON. Consider alternatives like changing the data model, denormalizing data, or switching to document or key/value storage models. Some relational databases, such as SQL Server and Postgres, let you write “materialized views” which persist query results and can be refreshed on an interval. Document databases such as RavenDB allow you to store all related data in a single location, avoiding the need to JOIN to other locations entirely.
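For the first antidote, here is a hedged sketch in Entity Framework Core (entity and property names are illustrative) of explicitly declaring an index on a column your queries join and filter on:

```csharp
using Microsoft.EntityFrameworkCore;

public class OrderLine
{
    public int Id { get; set; }
    public int OrderId { get; set; }
    public string Sku { get; set; }
}

public class ShopContext : DbContext
{
    public DbSet<OrderLine> OrderLines { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Declare an index on a column that JOINs and lookups rely on,
        // so those queries can seek into the index instead of scanning the table.
        modelBuilder.Entity<OrderLine>()
            .HasIndex(l => l.Sku);
    }
}
```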
Load Data Ahead of Time
A common problem is performing many separate queries when fetching data from a database (called an “N+1 select” problem). This is when you query for a list of items and then perform another query for additional information for each item in the list. This is like facing a dragon armed with nothing but a butter knife. Sure, you might eventually get the job done, but it will take far longer and be much more perilous than if you had the right weapon in the first place. This is most commonly encountered when using Object Relational Mappers (ORMs) like Hibernate or Entity Framework. Your application has to make multiple round trips to the database, hampering performance.
To avoid this trap, preload data using techniques like eager loading, caching, or batch loading. This is often specific to the ORM being used, such as the “Include” method in Entity Framework. On the database side, utilizing materialized views or stored procedures and using field projections to limit the data retrieved can also optimize performance. For distributed, internet-scale applications, many companies add a caching layer between applications and the database with technologies like Memcached or Redis. This speeds up lookups for commonly used data but adds significant operational overhead.
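As a rough illustration (the context and entity names are hypothetical), here is what the N+1 trap and its eager-loading fix look like with Entity Framework Core’s Include:

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class OrderLine { public int Id { get; set; } public int OrderId { get; set; } }
public class Order { public int Id { get; set; } public List<OrderLine> Lines { get; set; } }

public class ShopContext : DbContext
{
    public DbSet<Order> Orders { get; set; }
    public DbSet<OrderLine> OrderLines { get; set; }
}

public static class OrderQueries
{
    // N+1 trap: one query for the orders, then another query per order for its lines.
    public static List<Order> LoadOrdersSlowly(ShopContext db)
    {
        var orders = db.Orders.ToList();
        foreach (var order in orders)
            order.Lines = db.OrderLines.Where(l => l.OrderId == order.Id).ToList(); // one extra round trip per order
        return orders;
    }

    // Eager loading: Include() fetches the orders and their lines together in a single query.
    public static List<Order> LoadOrdersEagerly(ShopContext db) =>
        db.Orders.Include(o => o.Lines).ToList();
}
```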
Your API design is also paramount here. Common REST-based interfaces are particularly prone to the “N+1 select” issue, since they are hard to compose well. GraphQL, on the other hand, can give you far better performance because it is designed to let you compose the data at the server level, instead of continuously going back and forth to the database.
Pay Attention to Query Plans
Entering a dragon’s lair without a plan is a recipe for disaster. Similarly, in the database world, inefficient query execution plans can slow down performance considerably. This can happen due to outdated cached plans, improper statistics, or suboptimal query plans.
To counteract this, employ tools like SQL Server Query Store, Execution Plan Analyzer, or SQL Server Profiler. Many databases also provide commands like EXPLAIN and ANALYZE to help you fine-tune your query plans. Those tools usually require a dedicated Wizard to manage on an ongoing basis.
RavenDB is a lizard of a different color: instead of relying on the head wizard for query optimization, the RavenDB Query Optimizer is able not only to produce efficient query plans but also to change the very battlefield to its advantage. Other databases require you to don a full suit of armor and understand the entire state of your application to create the appropriate set of indexes. RavenDB is able to read your queries and produce the required indexes (or modify existing ones), leading to optimal query performance and continual improvement without ongoing maintenance.
Similar to the difference between Western and Chinese dragons, indexes in RavenDB are vastly different from those in other databases. An index in SQL Server, for example, can serve only a particular set of queries, and even a slightly different question requires another index. Thanks to a very different internal architecture, RavenDB can serve many different queries from the same index.
Offload Logic to the Database
Trying to find specific data in a non-indexed column is like searching for a magical gem in a dragon’s massive hoard—virtually impossible. SQL views, MongoDB’s aggregation pipeline, and server-side functions can act like magical maps that quickly point to what you want. This has historically been hard for application developers to accomplish, as traditional relational databases require you to write in specialized languages or keep database logic separate from your codebase.
The landscape is changing now with modern databases like RavenDB allowing developers to write server-side logic in familiar, higher-level languages like C# and JavaScript that can be defined (and deployed) alongside application logic. This makes the developer experience more accessible and reduces operational complexity.
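As a hedged sketch of what that can look like (class, property, and index names are illustrative), here is a RavenDB index written in plain C#, with a computed field evaluated by the database at indexing time and the index deployed from the application’s own startup code:

```csharp
using System.Collections.Generic;
using System.Linq;
using Raven.Client.Documents;
using Raven.Client.Documents.Indexes;

public class OrderLine { public int Quantity { get; set; } public decimal PricePerUnit { get; set; } }
public class Order { public string Id { get; set; } public string Company { get; set; } public List<OrderLine> Lines { get; set; } }

// The order total is computed by the database while indexing, so queries can
// filter and sort on it without recalculating anything at request time.
public class Orders_ByCompanyAndTotal : AbstractIndexCreationTask<Order>
{
    public Orders_ByCompanyAndTotal()
    {
        Map = orders => from o in orders
                        select new
                        {
                            o.Company,
                            Total = o.Lines.Sum(l => l.Quantity * l.PricePerUnit)
                        };
    }
}

public static class Startup
{
    public static void DeployIndexes(IDocumentStore store)
    {
        // Index definitions live in the codebase and ship with the application.
        IndexCreation.CreateIndexes(typeof(Orders_ByCompanyAndTotal).Assembly, store);
    }
}
```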
Modern data architecture usually incorporates some messaging, event sourcing, or queuing infrastructure like Kafka or RabbitMQ. This allows organizations to unify systems together in a distributed environment and make data handling more robust. Modern databases support ETL (Extract, Transform, Load) processes that make it easier to integrate into existing environments. Enterprise RavenDB customers can use it as a native message sink for Kafka and RabbitMQ, as well as push data to SQL, OLAP, Postgres, Power BI, message brokers, or Elasticsearch automatically.
Design a Pragmatic Data Model
With schemaless databases, there are typically no constraints on how data is stored, which is both a blessing and a curse. More thought needs to be put into how you model your data domain. Poor data modeling decisions can result in slow queries, inefficient data retrieval, and increased disk usage.
In relational databases, there is a tendency (or an expectation) to be strict with normalization. However, normalization makes queries more complex, and more complex queries take (significantly) more time to execute. In your quest to optimize query performance, you will likely need to introduce denormalization into your data model, especially when choosing a schemaless database. Don’t fear it – embrace it. If your data model is not designed for the patterns of data access you require, performance will suffer no matter what database you end up choosing.
With RavenDB, you have the choice to adapt your data model over time with little trouble. You can keep your old data in the previous format or ask RavenDB to convert it. There is no need for complex migration processes or the usual costs associated with realizing that you need to update your model.
You Probably Don’t Need Sharding
Many NoSQL databases allow you to partition data into “shards” to support massive databases. Unfortunately, we see customers all too often reach for sharding before it’s really needed – making things overly complicated before it’s warranted. For example, MongoDB databases are typically sharded after reaching over 2TB in size, whereas RavenDB lets you scale to 10TB before you even need to think about sharding.
Sharding is powerful but complex. Shards can grow independently, which means over time, they may need to be rebalanced. Rebalancing is almost always a manual operation and, in the worst case, will require scheduled downtime. In practice, sharding is easy to get wrong initially, and sometimes applications need to be rewritten or updated due to early mistakes.
The worst aspect of sharding is that it front-loads all your costs. You have to accept the additional complexity and costs of sharding at the very start of your system, while you reap the benefits only once the system is extremely large. That is a long time to pay for extra complexity before you see the results.
Choose Transaction Scope Wisely
Picture a band of knights waiting their turn to strike a dragon—one at a time. This scenario mimics the potential pitfalls of transactions in relational databases, where concurrent transactions can result in locks and blocks, causing performance issues.
To avoid this bottleneck, optimize your transactions and isolation levels. Techniques like row-level locking, shorter transaction durations, and appropriate isolation levels can help you and your knights strike in unison. As you can guess, this is complex and requires very careful coordination.
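For instance, here is a minimal sketch (connection string, table, and column names are illustrative) of keeping a SQL Server transaction short and at a deliberately chosen isolation level, so locks are held only for the moment of the write:

```csharp
using System.Data;
using Microsoft.Data.SqlClient;

public static class Payments
{
    public static void RecordPayment(string connectionString, int orderId, decimal amount)
    {
        using var conn = new SqlConnection(connectionString);
        conn.Open();

        // Do expensive reads and validation *before* opening the transaction,
        // so no locks are held while that work happens.

        using var tx = conn.BeginTransaction(IsolationLevel.ReadCommitted);
        using var cmd = new SqlCommand(
            "UPDATE Orders SET PaidAmount = PaidAmount + @amount WHERE Id = @id", conn, tx);
        cmd.Parameters.AddWithValue("@amount", amount);
        cmd.Parameters.AddWithValue("@id", orderId);
        cmd.ExecuteNonQuery();

        // Commit immediately: the shorter the transaction, the less the other knights wait.
        tx.Commit();
    }
}
```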
Be Wary of Unbounded Data Sets
Unbounded data, like your Amazon order history, can wreak havoc on both writing to and reading from a database. With NoSQL databases, you may be storing data that contains unbounded lists in a single document or item. Unbounded data grows over time (such as a customer’s order history), which leads to large item sizes. Storing large items in a database can impact performance due to slower read and write operations, increased storage costs, and higher network utilization.
When modeling unbounded sets in schemaless databases, consider storing pages of items with range-based keys (e.g. "Customer/0035-A/OrderHistoryItems/1", "Customer/0035-A/OrderHistoryItems/2"). This way, you can take advantage of loading pages of items in the set at once yet keep individual item sizes low. Additionally, use paging or limiting operators (like LIMIT or TOP in SQL) on the application side to prevent memory problems, increased network latency, or slow queries due to large result sets.
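Here is a brief sketch of application-side paging with RavenDB’s C# client (the document type and property names are illustrative); the same Skip/Take idea applies to LIMIT/OFFSET or TOP in SQL:

```csharp
using System.Collections.Generic;
using System.Linq;
using Raven.Client.Documents;

public class OrderHistoryItem { public string Id { get; set; } public string CustomerId { get; set; } public string ProductName { get; set; } }

public static class OrderHistory
{
    public static List<OrderHistoryItem> LoadPage(IDocumentStore store, string customerId, int page, int pageSize = 25)
    {
        using var session = store.OpenSession();
        return session.Query<OrderHistoryItem>()
            .Where(i => i.CustomerId == customerId)
            .Skip(page * pageSize)   // start of the requested page
            .Take(pageSize)          // hard bound on the result set size
            .ToList();
    }
}
```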
Keep Storage Bloat in Check
Like a dragon continually hoarding treasure, most databases grow bloated over time. Fragmentation and document bloat can lead to decreased performance, increased disk usage, and slower write operations.
Some databases require you to configure and perform regular vacuuming to mitigate table bloat and fragmentation. Set up auto vacuuming correctly and consider manual vacuuming based on workload characteristics. Configure alerts for disk usage so you don’t get caught by surprise.
Sometimes your data lasts forever, like diamonds, but quite a lot of data is only meaningful for a certain duration. Setting up expiration policies can dramatically reduce the amount of cruft you accumulate over time.
Check how much regular maintenance is expected for the databases you evaluate and what tools they provide to make it easier. RavenDB aims for unattended operation. It doesn’t require regular maintenance by an administrator, and many janitorial tasks like “vacuuming” aren’t needed because the storage of database files is managed internally for you.
RavenDB also allows you to register a document to expire at a certain time automatically, so you won’t have to manually clean your warehouse whenever the “disk is nearly full” alert is blaring.
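A minimal sketch of that with RavenDB’s C# client follows; the document class is illustrative, and it assumes the expiration feature has been enabled on the database:

```csharp
using System;
using Raven.Client.Documents;

public class Notification { public string Id { get; set; } public string Message { get; set; } }

public static class ExpiringDocuments
{
    public static void StoreWithExpiry(IDocumentStore store)
    {
        using var session = store.OpenSession();

        var notification = new Notification { Message = "Your order has shipped" };
        session.Store(notification);

        // Stamp the document's metadata with @expires; once the expiration feature is
        // enabled for the database, RavenDB removes the document after this timestamp.
        session.Advanced.GetMetadataFor(notification)["@expires"] =
            DateTime.UtcNow.AddDays(30).ToString("O");

        session.SaveChanges();
    }
}
```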
Don’t Ignore Configuration and Resource Allocation
Imagine donning armor that’s too heavy for battle against a swift dragon. In the database world, this is akin to under-provisioning resources like Azure CosmosDB’s Request Units, leading to slow performance and timeouts. Hardware choices have important implications for database stability. For example, NVMe SSD storage will increase disk operations (IOPS) and outperform other types of storage but may require ephemeral disks that your database needs to tolerate losing.
Follow database vendor configuration guides and hardware recommendations to achieve optimal performance. Adjusting working memory size, shared buffers, cache size, max connections, threads/task queue lengths, and connection modes are common knobs to tweak for most databases that can have a profound impact.
Instead of guessing, relying on trial and error, or waiting until disaster strikes, look for database features that help you optimize resource allocation continuously. RavenDB self-monitors and surfaces any relevant action items like slow disk activity, request performance degradation, or heavy indexing through notifications in the Studio.
It’s All in the Technique
Use this chapter as a checklist for your data architects, for vendor evaluation, or as a way to get more mileage out of an existing solution. No matter what database you choose, each one has its unique performance dragons to tackle, but at least we’ve covered the most common optimization techniques that will get you 80% of the way there. In the next chapter, we’ll explore how indexing acts as a shield against dragon fire.
Shield of Indexing: Keeping Data Lookups Quick and Accurate
If executing a query is your sword swing, an index is your shield. It protects you from the dragon’s fiery breath – wasted time and resources. A shield has a direct effect on the types of weapons you can bear: forget about swinging a two-handed sword when you have a tower shield. Similarly, not all databases treat indexes the same way, and you should know the implications for the query workloads you expect when comparing vendors.
Why Indexing Works
In essence, indexing is like adding bookmarks to a massive tome of spells (your data). Without bookmarks, you’d have to read the whole book just to find that one spell (data) you need. But with bookmarks, you know exactly where to look. Using different colors, shapes, or sizes to distinguish between groups of bookmarks will make lookups faster at a glance. Indexing in databases works similarly. It creates a data structure (typically a B+Tree) that improves the speed of data retrieval operations at the cost of additional disk and memory space.
It’s easy to think that indexes solve most query performance problems, but they fall short in practice because we tend to ask complex questions about our data that most indexes aren’t prepared to answer fast enough.
Big Oh-No: How Indexing Affects Query Performance
To understand indexes and query performance, we have to dig a little deeper. Let’s talk about Big-O notation. It’s a way to describe how an algorithm’s cost grows as its input grows. In layman’s terms, if your data grows significantly, the database has to do more work to look up data, so queries take longer and longer.
Here’s a visualization to illustrate how significant a problem this is (Figure 3):
“Query elements” refers to the number of “parts” in a query, such as a filter expression (WHERE), aggregation (COUNT BY), or JOIN. So while simple queries across any database perform similarly (such as a lookup by key), more complex queries have more elements that can significantly increase the time it takes to compute a result if the database isn’t optimized for them.
- Point 1: Out of the box, some databases could be as slow as O(n^2), O(2^n), or even O(n!), where each added piece of data drastically slows down the query (think “table scanning” or “N+1 issues”).
- Point 2: Even after tuning with indexes, most traditional databases struggle to get below O(n log n) query complexity, meaning query time still grows with your data size (a little faster than linearly).
Both these points translate to something very simple: most databases have a performance ceiling as you scale no matter what you do. The more complex your queries are and the more data you have to query across, the worse the problem becomes. This is why nearly any database will serve your needs at first. But as you scale, at a certain point the database becomes a bottleneck and starts to cause significant problems.
Exposing the Indexing Dragon in the Room
The implications of this can be widely seen in downstream application architecture decisions: logic is offloaded elsewhere to serverless functions or the application layer, and in many circumstances, organizations turn to event-sourcing or messaging architectures like Kafka or RabbitMQ. While these solutions do add a layer of consistency in processing data streams, they also introduce the need for custom integrations or non-native solutions. The end result is still a juggling act of components that should ideally work in harmony but often don’t, increasing your total cost of ownership with additional maintenance, operating costs, and development overhead.
Compiling Queries Ahead of Time
The reality is that with most databases, you’re penalized for trying to optimize data access. Luckily, RavenDB isn’t like most databases. To address this issue, we need to fundamentally redefine indexing. In computer science, ahead-of-time (AOT) compilation compiles a higher-level language to a lower-level language at build time to save work at run time. With RavenDB, indexes are written in higher-level languages like C# and JavaScript, and then mapped or aggregated results are compiled to B+ data structures ahead of time in the background (what we call “indexing time”). This allows you to optimize your queries more than you ever could at application runtime. RavenDB indexes combine field-level indexing, aggregation pipelines, and stream processing into a custom indexing engine called Corax. Corax enables RavenDB to remove most runtime query complexity by allowing you to run complex data logic ahead of when you query it.
This approach leads to a dramatic reduction in runtime query complexity (Figure 3, Point 3). RavenDB queries never exceed O(N) complexity, translating to a performance boost of 800-3000% or beyond with internet-scale workloads. In essence, you’re pre-computing what you’ll most likely query for, eliminating the lag that could otherwise derail your applications.
This isn’t “for free,” of course: you pay the “complexity tax” at indexing time (Figure 4), but this is easier to scale with hardware and data modeling optimizations. While indexes are “eventually consistent,” they always return data immediately as it becomes available, constantly re-indexing document changes in near real-time. This approach ensures that the data is always up-to-date and minimizes the risk of missing or outdated information. Moreover, indexes can be defined in your codebase or automatically generated based on your application queries, like a shield that magically appears when the time is right. This keeps everything in sync with your domain model, simplifying deployment and operational overhead.
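To show what paying that tax at indexing time can look like, here is a hedged sketch (class and property names are illustrative) of a RavenDB map-reduce index that pre-aggregates units sold per product in the background, leaving the query itself as a lookup over already-computed results:

```csharp
using System.Collections.Generic;
using System.Linq;
using Raven.Client.Documents;
using Raven.Client.Documents.Indexes;

public class OrderLine { public string Product { get; set; } public int Quantity { get; set; } }
public class Order { public string Id { get; set; } public List<OrderLine> Lines { get; set; } }

// The aggregation runs at indexing time, incrementally, as documents change.
public class Sales_ByProduct : AbstractIndexCreationTask<Order, Sales_ByProduct.Result>
{
    public class Result { public string Product { get; set; } public int UnitsSold { get; set; } }

    public Sales_ByProduct()
    {
        Map = orders => from o in orders
                        from line in o.Lines
                        select new Result { Product = line.Product, UnitsSold = line.Quantity };

        Reduce = results => from r in results
                            group r by r.Product into g
                            select new Result { Product = g.Key, UnitsSold = g.Sum(x => x.UnitsSold) };
    }
}

public static class SalesQueries
{
    // At query time there is nothing left to aggregate; this reads pre-computed entries.
    public static Sales_ByProduct.Result UnitsSoldFor(IDocumentStore store, string product)
    {
        using var session = store.OpenSession();
        return session.Query<Sales_ByProduct.Result, Sales_ByProduct>()
                      .FirstOrDefault(r => r.Product == product);
    }
}
```

That is the trade the chapter describes: more effort at indexing time, near-constant effort at query time.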
Indexing, in whatever form it takes, is your shield in the battle for performance. It can make your queries not just faster but also more efficient and accurate. While traditional databases offer some level of protection, RavenDB takes it to a whole new level with ahead-of-time indexing.
Why Not Avoid Battling Dragons In the First Place?
Business is all about picking the right battles–your database shouldn’t be one. For business-intensive applications, complex queries are a given. Instead of forcing you to battle poor query performance at runtime, RavenDB helps you avoid that in the first place.
RavenDB is an Open Source NoSQL document-relational database that is fully transactional without sacrificing performance. It supports aggregate queries with native support for time series and counters.
Prefer To Get A Closer Look On Your Own?
- Multi-document & distributed ACID transactions
- Ahead-of-time compiled query indexes
- Sharding and replication
- Free community license
- Host on RavenDB Cloud or self-host on Linux/Windows
- Native support for Time Series and Counters
- Full-text search
- ETL to SQL, OLAP, Kafka, RabbitMQ, Elasticsearch, & PowerBI
- SDKs for .NET, Java, Node.js, Go, Python, and Elixir
RavenDB is open source and can be used freely with a community license.
Victory Feast: Real-World Case Studies of RavenDB's Performance Triumphs
How does ahead-of-time indexing translate to real-world results? Here are some customer victory feasts you can sink your teeth into.
Rakuten Kobo: Saw 80-90% lower total operating cost vs. Couchbase
In a large-scale deployment comparison conducted by Rakuten Kobo, one of the world’s largest digital booksellers, RavenDB emerged as a far more cost-effective solution than Couchbase. Both databases were tested for performance, scalability, and cost efficiency, with RavenDB showing clear advantages in hardware scalability.
The annual costs for a high-performance RavenDB cluster were estimated to be $30,000, whereas the comparable Couchbase cluster amounted to a staggering $150,000 per year. Notably, the Couchbase cluster needed a minimum of 5 nodes with 192GB RAM each and suffered repeated failures when downscaled, showcasing its limitations in hardware scalability.
This price disparity between RavenDB and Couchbase offers potential cost savings of 80-90% with RavenDB.
Even more critically, Couchbase required a separate system (like Elasticsearch) for handling queries under real load conditions, potentially adding even more costs and complexity. In contrast, RavenDB successfully filled both roles of data storage and query processing at a much-reduced price tag, demonstrating its value not just in performance but also in cost-efficiency. This makes RavenDB a compelling choice for companies looking to maximize their ROI in database solutions.
IoT Bridge: Increased Request Throughput By 4,000X
Before switching to RavenDB, IoT Bridge faced serious performance limitations with their initial choice of MongoDB as their NoSQL database. Despite optimizing their application code, they struggled to meet customer demands for a guaranteed ingestion rate. Their maximum request throughput was capped at 26-30 requests per second, a bottleneck that raised concerns about scalability as they were planning for much higher levels of performance.
After transitioning to RavenDB, IoT Bridge experienced a transformative improvement in data processing speeds, reaching over 120,000 requests per second—a leap that far exceeded their initial expectations. RavenDB’s fully transactional nature, automatic indexes, and ACID compliance added further value without sacrificing performance. They now operate on an entry-level server at a minimal cost, with room to scale as their customer and device numbers grow. This dramatic increase in performance has not only resolved their previous limitations but also better positioned them for future expansion.
PartsSource: Sped Up Real-Time Price Calculations by 3,000X
PartsSource, a leading medical replacement parts supplier serving U.S. hospitals, grappled with a performance bottleneck that severely affected their online shopping experience. Offering over 1.3 million products and navigating through more than 10,000 pricing rules, their system was painfully slow, forcing customers to wait up to 3 seconds to see a single product’s price. Their former relational database had to run real-time queries across 12 separate tables, putting both performance and customer satisfaction at risk.
By utilizing ahead-of-time indexes, RavenDB took on the heavy lifting of pre-computing these intricate pricing calculations. The payoff was astronomical: instead of displaying 1 product in 3 seconds, they can now display a page of 30 products in 30 milliseconds, marking a 3,000X improvement in speed. This upgrade didn’t just enhance the shopping experience—it redefined what was possible.
DPG Media: Increased complex query speed by 800% and reduced operational costs by 1/3
DPG Media, a large media conglomerate, faced significant challenges with operational complexity and cost-efficiency in their data management. Their system was reliant on a combination of multiple big-name NoSQL databases like DynamoDB, Redis, and Elasticsearch in conjunction with their main PostgreSQL database. This resulted in complex, table-spanning queries that compromised on performance and increased operational costs. Despite tests on alternative solutions like MongoDB and CouchDB, integration remained a pain point, keeping their costs high and making system maintenance laborious.
DPG Media made a strategic decision to integrate RavenDB into their architecture, yielding transformative results. The transition simplified their data management by consolidating multiple databases into a single RavenDB platform. This led to an 800% performance increase for complex queries and reduced operational costs by about 1/3. RavenDB’s query language was similar to SQL, making it easier for their existing team to adapt without altering much of their codebase. With RavenDB, the company achieved up to 400K operations per second in tests and provided over 1M transactions per second on commodity hardware, optimizing cost and performance. RavenDB’s versatility and feature-rich nature—from client-side caching to default ACID transactions—also simplified many aspects of their operational processes, effectively reducing their overall system complexity.
Kamran Ayub is a developer, educator, speaker, and the founder of Keep Track of My Games. He teaches developers modern web development, cloud, and NoSQL as a Pluralsight author. He also helps maintain the Excalibur.js game engine. Previously, he helped build and scale massive websites for Fortune 500 companies.
About RavenDB
RavenDB is a pioneer in NoSQL database technology with over 2 million downloads and thousands of customers, from startups to Fortune 100 enterprises.
Mentioned in both Gartner and Forrester research, over 1,000 businesses use RavenDB for IoT, Big Data, Microservices Architecture, fast performance, a distributed data network, and everything you need to support a modern application stack for today’s user.
Contact us at
Documentation https://ravendb.net/learn/docs-guide
Use Cases https://ravendb.net/news/use-cases
Free Online Training https://ravendb.net/learn/bootcamp
Webinars https://ravendb.net/learn/webinars
RavenDB Download https://ravendb.net/download
RavenDB Cloud Database as a Service https://cloud.ravendb.net
Contents
- Before you start
- Know Thy Dragons: Understanding Database Types and Their Unique Performance Implications
- Dragon decision matrix: are you battling the right one?
- Dragon Battle Strategies: Optimizing Queries for Maximum Velocity
- Shield of Indexing: Keeping Data Lookups Quick and Accurate
- Why Not Avoid Battling Dragons In the First Place?
- Victory Feast: Real-World Case Studies of RavenDB's Performance Triumphs
- About RavenDB