This article covers a number of concepts critical for building scalable applications.
what is vertical scaling
In vertical scaling (“scaling up”), you’re adding more compute power to your existing instances/nodes.
what is horizontal scaling
In horizontal scaling (“scaling out”), you add capacity to a system by adding more instances to your deployment.
what is better, horizontal or vertical scaling?
Horizontal scaling is often preferable to vertical scaling.
- No need to take a server offline, as you often must when scaling up to a higher-capacity instance, and there is built-in redundancy.
- More cost-effective, since instances can also be scaled back down; you’re not stuck paying for peak demand at all times.
- Vertical scaling has hard limits, and high-end machines get disproportionately expensive: https://dba.stackexchange.com/questions/102179/why-is-vertical-scaling-expensive.
what is x, y and z axis scaling
The book The Art of Scalability describes a really useful three-dimensional scalability model: the scale cube.
- x axis scaling - scale by cloning. This is the typical strategy for a monolithic app, and it is also how an individual service within a microservices application can scale. Identical copies of the application run behind a load balancer, which distributes the load across them.
- y axis scaling - the strategy microservices use to scale: a monolithic application is split into multiple services, hence increasing scalability. Note that an individual service within a microservices application can still use x axis scaling.
- z axis scaling - the strategy often used by databases (and even message brokers like Kafka) to scale. In z axis scaling each instance/node runs an identical copy of the code, but each node handles only a subset of the data. Commonly the primary key of an entity is used to partition the data across nodes; this is called sharding.
what is sharding
The word “shard” means “a small part of a whole”, so sharding means dividing a larger whole into smaller parts. Sharding is also referred to as horizontal partitioning, or z axis scaling.
- Sharding is a method of splitting a single logical dataset and storing it across multiple databases or nodes.
- Sharding is used to handle high write loads. (In-memory caches are often used for scaling reads; sharding is the main tool for scaling writes.)
- Sharding is necessary when a dataset is too large to be stored in a single database. Vertical scaling is possible, but only up to a limit; ultimately a single server cannot meet the storage, CPU, memory, and network bandwidth requirements.
- Sharding allows a database cluster to scale along with its data and traffic growth. By distributing the data among multiple machines, a cluster of database systems can store a larger dataset and handle additional requests.
- The rows of a table are stored on different database nodes. Each shard has the same schema but holds its own distinct subset of the data.
- Not everything needs to be sharded. Oftentimes only a few tables occupy the majority of the disk space. There is no advantage in sharding small tables with hundreds of rows; focus on the large tables.
- Since the rows of a table are spread across nodes, each shard’s indexes are smaller, which improves query performance.
- Sharding can be implemented at either the application or database level. Many modern databases are natively sharded.
- These shards are distributed across multiple server nodes (containers, VMs, bare metal) in a shared-nothing architecture, which ensures that the shards do not get bottlenecked on the compute, storage, and networking resources of a single node. Sometimes it makes sense to replicate certain tables into each shard to serve as reference tables. For example, say there’s a database for an application that depends on fixed conversion rates for weight measurements; by replicating a table containing the conversion-rate data into each shard, you ensure that all of the data required for queries is held in every shard.
- The shards can be geo-distributed (this is called geo-sharding); the idea is to locate shards physically close to the users that will access the data.
- High availability is achieved by replicating each shard across multiple nodes.
- The application interacts with a SQL table as one logical unit and remains agnostic to the physical placement of the shards.
Sharding techniques
Algorithmic/key based/hash based sharding. A partition key is used for sharding; it can be a table column (an attribute of an entity) like customer id, location, or zip code. The column value is passed to a hash function to get the shard number, and the row is inserted into that shard. When a read request arrives for a row, the partition key is hashed again to determine which shard the row resides in (see the sketch after this list). The partition key can be:
- The primary key column of the table
- Any other column, provided its values do not change over time; otherwise updates become expensive, because a changed value may map to a different shard
- A composite key made of multiple attributes
- Avoid auto-incrementing columns as the shard key
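Here is a minimal sketch of hash-based routing. The class, the shard-URL list, and the use of String.hashCode are illustrative assumptions, not any specific product’s API:

```java
import java.util.List;

// Minimal sketch of hash-based shard routing (illustrative names).
public class HashShardRouter {
    private final List<String> shardUrls; // e.g. one JDBC URL per shard

    public HashShardRouter(List<String> shardUrls) {
        this.shardUrls = shardUrls;
    }

    // Map a partition key (e.g. a customer id) to a shard.
    public String shardFor(String partitionKey) {
        // floorMod keeps the index non-negative even for negative hash codes.
        int index = Math.floorMod(partitionKey.hashCode(), shardUrls.size());
        return shardUrls.get(index);
    }
}
```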
The downside of this approach is that resharding the database is required (updating the hash function and rebalancing data across nodes) every time a new server is added. This problem can be solved by using consistent hashing. Consistent hashing is a special kind of hashing technique such that when a hash table is resized, only n/m keys need to be remapped on average, where n is the number of keys and m is the number of slots. In contrast, in most traditional hash tables a change in the number of array slots causes nearly all keys to be remapped, because the mapping between keys and slots is defined by a modulo operation. Another downside of this approach is that queries for multiple records / range queries are likely to get distributed across multiple shards.
The advantage of hash-based sharding is that data is distributed algorithmically, so there is no lookup table to maintain. Data is also spread evenly between shards, which reduces the risk of hotspots.
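To make the consistent-hashing idea from the previous paragraphs concrete, here is a hedged sketch of a hash ring with virtual nodes. CRC32 and the virtual-node count of 100 are illustrative choices; real systems often use stronger hash functions:

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Sketch of a consistent-hash ring. Adding or removing a node only
// remaps the keys that fall in that node's ring segments (~n/m keys).
public class ConsistentHashRing {
    private static final int VIRTUAL_NODES = 100;
    private final TreeMap<Long, String> ring = new TreeMap<>();

    public void addNode(String node) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.put(hash(node + "#" + i), node); // place replicas on the ring
        }
    }

    public void removeNode(String node) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    // Walk clockwise from the key's hash to the first node on the ring.
    public String nodeFor(String key) {
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null) ? e.getValue() : ring.firstEntry().getValue();
    }

    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }
}
```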
Range based/Dynamic sharding
In this method, we split the data based on ranges of a table column (an attribute of the entity); the column on which the ranges are based is again called the shard key. Say you have a database of your online customers’ names and ages. You can split the rows into two shards: one holding the customers whose age is between 0 and 30, the other holding the rest. Range sharding is useful for applications that frequently retrieve sets of items using range queries (queries that return all items whose shard key falls within a given range). For example, if an application regularly needs to find all orders placed in a given month, this data can be retrieved more quickly when all orders for a month are stored, in date and time order, in the same shard. One disadvantage of range-based sharding is that row distribution across shards may be uneven, which can create hotspots. Another is that range sharding requires a lookup table or service to be available for all queries. A small routing sketch follows.
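This sketch mirrors the age example above; the range boundaries and shard names are made up:

```java
import java.util.TreeMap;

// Sketch of range-based shard routing using a sorted map of lower bounds.
public class RangeShardRouter {
    // Maps the lower bound of each range to a shard name.
    private final TreeMap<Integer, String> lowerBounds = new TreeMap<>();

    public RangeShardRouter() {
        lowerBounds.put(0, "shard-a");  // ages 0 to 30
        lowerBounds.put(31, "shard-b"); // ages 31 and up
    }

    public String shardFor(int age) {
        // floorEntry finds the range whose lower bound is <= age.
        return lowerBounds.floorEntry(age).getValue();
    }
}
```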
Lookup/Directory based sharding
Directory-based sharding uses a lookup table, also known as a location service, that stores a mapping from shard key value to shard. To read or write data, the client first consults the lookup table to find the shard for the given shard key, and then goes to that shard to perform the operation. One advantage is that directory-based sharding lets you use whatever system or algorithm you want to assign data entries to shards, and it is relatively easy to add shards dynamically (rebalancing is easier). The downside is that the lookup table can become a single point of failure. A minimal sketch follows.
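In this sketch the directory is just an in-process map, which is exactly the single point of failure mentioned above; in practice it would live in a replicated store:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of directory-based shard routing: an explicit lookup table
// maps each shard-key value to a shard. Any assignment policy works.
public class DirectoryShardRouter {
    private final Map<String, String> directory = new ConcurrentHashMap<>();

    public void assign(String shardKey, String shard) {
        directory.put(shardKey, shard); // e.g. set when the entity is created
    }

    public String shardFor(String shardKey) {
        return directory.get(shardKey);
    }
}
```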
Vertical partitioning
While in horizontal partitioning rows of a table are stored on different database nodes, in vertical partitioning entire columns are separated out and put into new, distinct tables. For example, a binary blob column would be a good candidate to move into a different table. In vertical partitioning the resulting tables can even live in different databases (e.g. in a microservices architecture). Vertical partitioning is almost always handled at the application layer. Note that vertical partitioning and sharding are complementary; a small illustration follows.
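As a hypothetical illustration of the blob example above, the wide user row can be split into two tables that share a primary key (all names here are made up):

```java
// Hypothetical illustration of vertical partitioning: the original wide
// "user" row becomes two tables joined only by the shared userId.
record UserProfile(long userId, String name, String email) {} // hot columns, read on every request
record UserAvatar(long userId, byte[] imageBlob) {}            // large blob, read rarely
```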
Challenges associated with sharding
- Hotspots. Data should be evenly distributed across nodes; otherwise the shards holding more data (or serving more traffic) come under disproportionate pressure.
- Redistributing/resharding data. As the load increases, new nodes may need to be introduced, which may involve moving data around the cluster. Doing this while maintaining consistency and availability is hard. The choice of sharding technique can reduce the amount of data that has to move; consistent hashing is a common algorithm used for exactly this reason.
- Cross-partition operations. Queries on secondary indexes may need to fetch data from all nodes and can often be slow. Queries that access a single shard are more efficient than those that retrieve data from multiple shards. A single shard can contain the data for multiple types of entities, so denormalizing data to keep related entities that are commonly queried together (such as the details of customers and the orders that they have placed) in the same shard is recommended.
Stateless applications for horizontal scaling
A stateless app is an application program that does not save client data generated in one session for use in the next session with that client. Requests from a client can therefore go to any instance of the application, so you can scale a stateless application simply by deploying it on multiple servers; it scales beautifully horizontally. Note that stateless communication is one of the REST principles: the server should not hold client state. Also note that in Java, using synchronization around database operations can lead to nasty deadlocks, since a JVM lock and a database lock are both in play; synchronization across JVMs should be done with the help of the database. Reading a counter from the database inside a synchronized block, incrementing it, and then using it is very bad design that will not work across servers (it is a bug). A sketch of the problem and a fix follows.
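Here is a hedged sketch of the counter anti-pattern and one way to fix it: let the database hand out the number atomically instead of guarding a read-increment-write cycle with a JVM lock. The table name, column, and the Postgres-style RETURNING clause are illustrative assumptions:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class CounterExamples {

    // BROKEN across servers: synchronized only locks this one JVM, so two
    // app instances can both read the same value and both write value + 1.
    public synchronized long nextNumberBroken(Connection conn) throws SQLException {
        long next;
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT value FROM counter")) {
            rs.next();
            next = rs.getLong(1) + 1;
        }
        try (Statement st = conn.createStatement()) {
            st.executeUpdate("UPDATE counter SET value = " + next);
        }
        return next;
    }

    // Works across servers: a single atomic statement makes the database the
    // source of truth (RETURNING is Postgres syntax; a sequence also works).
    public long nextNumber(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "UPDATE counter SET value = value + 1 RETURNING value")) {
            rs.next();
            return rs.getLong(1);
        }
    }
}
```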
Database
- Denormalization: normalization is a database design technique that reduces data redundancy and eliminates undesirable characteristics like insertion, update, and deletion anomalies. Normalization rules divide larger tables into smaller tables and link them using relationships. However, normalization in an RDBMS places the burden of computation on reads rather than writes, since complex join operations may have to be performed while reading data. This is the wrong trade-off for large-scale web applications, where response time is critical. NoSQL stores improve read performance partly by not permitting joins; denormalization likewise improves read performance by avoiding complex joins.
- Fixing slow inserts: sometimes inserts block because transactions are not committing, due to memory pressure in the application:
- read queries can put too much memory load on the app server, and heap memory may run out;
- this slows down all threads;
- blocked threads cannot commit their transactions, and without a commit a transaction is not complete;
- so inserts start taking a long time to commit (in other words, to complete).
- In other words, if the server has memory problems, there is an immediate impact on all queries, including inserts and updates.
- Scaling reads with read replicas: read replicas can be used to scale reads. Note that indexes optimized for OLAP can lead to slow OLTP inserts, because each insert must maintain too many indexes.
- Why does a Postgres sequence advance even if the transaction fails? sequence.nextval advances the sequence object to its next value and returns that value. This is done atomically: even if multiple sessions execute nextval concurrently, each safely receives a distinct sequence value. To avoid blocking concurrent transactions that obtain numbers from the same sequence, a nextval operation is never rolled back; once a value has been fetched it is considered used, even if the transaction that called nextval later aborts. This means that aborted transactions can leave unused “holes” in the sequence of assigned values. (See the sketch after this list.)
- Table inserts with a GUID primary key do not block each other. Will table inserts block each other if the primary key is auto-increment? No, because the step in which SQL Server obtains the next identity value happens outside the transaction. This is also why the identity value does not decrease when you ROLLBACK, and why you can end up with gaps.
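A small sketch (Postgres JDBC; the connection details and sequence name are illustrative) showing why the holes appear: a value fetched with nextval stays consumed even when the transaction rolls back:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Demonstrates that Postgres nextval is never rolled back.
public class SequenceGapDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/app", "app", "secret")) {
            conn.setAutoCommit(false);
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("SELECT nextval('order_id_seq')")) {
                rs.next();
                System.out.println("got id " + rs.getLong(1)); // e.g. 42
            }
            conn.rollback(); // the transaction aborts...

            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("SELECT nextval('order_id_seq')")) {
                rs.next();
                System.out.println("got id " + rs.getLong(1)); // ...but 42 is gone: prints 43
            }
            conn.commit();
        }
    }
}
```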
Scaling based on server side threading patterns
Scaling using microservices architecture
Scaling using event driven microservices
Caching
If your application is bound by read performance, you can add caches or database replicas. They provide additional read capacity without heavily modifying your application; a minimal cache-aside sketch follows. Read more: https://clarifyforme.com/posts/5734331951611904/Server-side-caching-patterns and https://clarifyforme.com/posts/5683115607457792/http-caching-by-browser
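In this sketch the in-process map is a stand-in for whatever cache you actually use (e.g. Redis), and a real deployment would also need TTLs and an invalidation strategy on writes:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Minimal cache-aside sketch: check the cache first, fall back to the
// database on a miss, then populate the cache for the next reader.
public class CacheAside<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> dbLoader; // stand-in for a real database read

    public CacheAside(Function<K, V> dbLoader) {
        this.dbLoader = dbLoader;
    }

    public V get(K key) {
        // computeIfAbsent loads from the database only on a cache miss.
        return cache.computeIfAbsent(key, dbLoader);
    }

    public void invalidate(K key) {
        cache.remove(key); // call after the underlying row changes
    }
}
```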
serving a global user base
- Even in a single-region deployment, a cloud load balancer can reduce latency. Cloud Load Balancing (HTTP(S) load balancing, TCP and SSL proxy load balancing) lets you automatically direct users to the closest region that has backends with available capacity. Even if your app runs in only a single region, using Cloud Load Balancing still provides lower user latency. Global load balancing provides access via a single global public static anycast IP address; the request is routed (via anycast BGP) to the cloud load balancer closest to the user, and from there it travels over Google’s faster internal network.
- In a multi-region deployment the database may live in a single region while the caching layer (Redis) is multi-region.