You're reading from Distributed .NET with Microsoft Orleans Build robust and highly scalable distributed applications without worrying about complex programming patterns

Product type Paperback

Published in May 2022

Publisher Packt

ISBN-13 9781801818971

Length 262 pages

Edition 1st Edition

Languages

Python

Tools

.NET

Concepts

Agile

Authors (2):

Bhupesh Guptha Muthiyalu

Suneel Kumar Kunani

View More author details

Table of Contents (17) Chapters

Preface

1. Section 1 - Distributed Applications Architecture

2. Chapter 1: An Introduction to Distributed Applications FREE CHAPTER

3. Chapter 2: Cloud Architecture and Patterns for Distributed Applications

4. Section 2 - Working with Microsoft Orleans

5. Chapter 3: Introduction to Microsoft Orleans

6. Chapter 4: Understanding Grains and Silos

7. Chapter 5: Persistence in Grains

8. Chapter 6: Scheduling and Notifying in Orleans

9. Chapter 7: Engineering Fundamentals in Orleans

10. Section 3 - Building Patterns in Orleans

11. Chapter 8: Advanced Concepts in Orleans

12. Chapter 9: Design Patterns in Orleans

13. Section 4 - Hosting and Deploying Orleans Applications to Azure

14. Chapter 10: Deploying an Orleans Application in Azure Kubernetes

15. Chapter 11: Deploying an Orleans Application to Azure App Service

16. Other Books You May Enjoy

Packt is searching for authors like you

Designing applications for scalability

Scalability is the ability of a system to adapt itself to handle a growing number of incoming requests successfully by increasing the resources available to the system. Scalability is measured by the total number of requests your application can process and respond to successfully. How do you know your application has reached its threshold of the maximum capacity limit? When it is busy processing current requests in the pipeline and can no longer take any incoming requests and process them successfully. Also, your application may not perform as expected, resulting in performance issues, and some requests will start to fail by timing out. At this stage, we must scale our application for business continuity. Let's look at the options available.

Vertical scaling or scaling up

Vertical scaling or scaling up means adding more resources to individual application servers and increasing the hardware capacity. Users send requests and the application processes the requests, reads/writes to the database, and sends responses back to the users. If the user base grows and the number of incoming requests becomes high, the application server will be overloaded, resulting in longer processing times and latency in responding to users. In this case, we can scale up the application server hardware to a higher hardware capacity, as shown in the following diagram.

Figure 1.3 – Vertical scaling (scaling up)

Horizontal scaling or scaling out

Horizontal scaling or scaling out means adding more processing servers/machines to a system. Let's say my application is running on one server and can process up to 1,000 requests per minute. I could scale out by adding 4 more servers and could process 4,000 more requests per minute, as shown in the following screenshot.

Figure 1.4 – Horizontal scaling (scaling out)

Tip

Having a single server is always a bottleneck beyond a certain load, no matter how many CPU cores and memory you have. That's when horizontal scaling or scaling out may help.

Load balancers

Load balancers help in increasing scalability by distributing incoming traffic to healthy servers within a region when the amount of simultaneous traffic increases. Load balancers have health probe monitors to monitor a given port on each of the servers to check the health, and if they're found to be unhealthy, the server is disabled from the load balancer and incoming traffic. When the next health probe test passes, the server is added back to the load balancer.

Caching

Caching is one of the key system design patterns that help in scaling any application, along with improving response times. Any application typically involves reading and writing data from and to a data store, which is usually a relational database such as SQL Server or a NoSQL database such as Cosmos DB. However, reading data from the database for every request is not efficient, especially when data is not changing, because databases usually persist data to disk and it's a costly operation to load the data from disk and send it back to the browser client (or device in the case of mobile/desktop applications) or user. This is where caching comes into play. Cache stores can be used as a primary source for retrieving data and falling back to the original data store only when data is not available in the cache, thus giving a faster response to the consuming application. While doing this, we also need to ensure that the cached data is expired/refreshed as and when data in the original data store is updated.

Distributed caching

As we know, in a distributed system, the data store is split across multiple servers; similarly, distributed caching is an extension of traditional caching in which cached data is stored in more than one server in a network. Before we get into distributed caching, here's a quick recap of the CAP theorem:

C: Stands for consistency, meaning the data is consistent across all the nodes and has the same copy of data
A: Stands for availability, meaning the system is available, and failure of one node doesn't cause the system to go down
P: Stands for partition tolerant, meaning the system doesn't go down even if the communication between nodes goes down

As per the CAP theorem, any distributed system can only achieve two of the preceding principles, and as distributed systems must be partition-tolerant (P), we can only achieve either the consistency (C) of data or the high availability (A) of data.

So, distributed caching is a cache strategy in which data is stored in multiple servers/nodes/shards outside the application server. Since data is distributed across multiple servers, if one server goes down, another server can be used as a backup to retrieve data. For example, if our system wanted to cache countries, states, and cities, and if there were three caching servers in a distributed caching system, hypothetically there would be a possibility that one of the cache servers would cache countries, another one would cache states, and one would cache cities (of course, in a real-time application, data is split in a much more complex way). Also, each server would additionally act as a backup for one or more entities. So, on a high level, one type of distributed cache system looks as shown:

Figure 1.5 – Distributed caching high-level representation

As you can see, while reading data, it is read from the primary server, and if the primary server is not available, the caching system will fall back to the secondary server. Similarly, for writes, write operations are not complete until data is written to the primary as well as the secondary server, and until this operation is completed, read operations can be blocked, hence compromising the availability of the system. Another strategy for writes could be background synchronization, which will result in the eventual consistency of data, hence compromising the consistency of data until synchronization is completed. Going back to the CAP theorem, most distributed caching systems fall under the category of CP or AP.

Sharding

Sharding can improve scalability when storing and accessing large data from data stores. This is achieved by splitting a single data store into multiple horizontal partitions or shards. As the data is split across a cluster of databases, the system will be able to store a large amount of data and at the same time, the system can handle additional requests. We can continue to scale the system out by adding further shards.

Here are a few important considerations:

Keep shards balanced for even load distribution. Periodically rebalance shards as data is updated and removed from each shard.
Avoid queries that retrieve data from multiple shards as they are not efficient and cause a performance bottleneck. You can use parallel tasks to fetch data from different shards for better efficiency but it adds complexity.
Creating a large number of smaller shards is better for load balancing than a small number of large shards.

In the next section, we will see how to design applications for high availability.

You're reading from Distributed .NET with Microsoft Orleans Build robust and highly scalable distributed applications without worrying about complex programming patterns

Table of Contents (17) Chapters

Designing applications for scalability

Vertical scaling or scaling up

Horizontal scaling or scaling out

Load balancers

Caching

Distributed caching

Sharding

Authors (2)

Personalised recommendations for you