Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Distributed .NET with Microsoft Orleans

You're reading from   Distributed .NET with Microsoft Orleans Build robust and highly scalable distributed applications without worrying about complex programming patterns

Arrow left icon
Product type Paperback
Published in May 2022
Publisher Packt
ISBN-13 9781801818971
Length 262 pages
Edition 1st Edition
Languages
Tools
Concepts
Arrow right icon
Authors (2):
Arrow left icon
Bhupesh Guptha Muthiyalu Bhupesh Guptha Muthiyalu
Author Profile Icon Bhupesh Guptha Muthiyalu
Bhupesh Guptha Muthiyalu
Suneel Kumar Kunani Suneel Kumar Kunani
Author Profile Icon Suneel Kumar Kunani
Suneel Kumar Kunani
Arrow right icon
View More author details
Toc

Table of Contents (17) Chapters Close

Preface 1. Section 1 - Distributed Applications Architecture
2. Chapter 1: An Introduction to Distributed Applications FREE CHAPTER 3. Chapter 2: Cloud Architecture and Patterns for Distributed Applications 4. Section 2 - Working with Microsoft Orleans
5. Chapter 3: Introduction to Microsoft Orleans 6. Chapter 4: Understanding Grains and Silos 7. Chapter 5: Persistence in Grains 8. Chapter 6: Scheduling and Notifying in Orleans 9. Chapter 7: Engineering Fundamentals in Orleans 10. Section 3 - Building Patterns in Orleans
11. Chapter 8: Advanced Concepts in Orleans 12. Chapter 9: Design Patterns in Orleans 13. Section 4 - Hosting and Deploying Orleans Applications to Azure
14. Chapter 10: Deploying an Orleans Application in Azure Kubernetes 15. Chapter 11: Deploying an Orleans Application to Azure App Service 16. Other Books You May Enjoy

Designing applications for scalability

Scalability is the ability of a system to adapt itself to handle a growing number of incoming requests successfully by increasing the resources available to the system. Scalability is measured by the total number of requests your application can process and respond to successfully. How do you know your application has reached its threshold of the maximum capacity limit? When it is busy processing current requests in the pipeline and can no longer take any incoming requests and process them successfully. Also, your application may not perform as expected, resulting in performance issues, and some requests will start to fail by timing out. At this stage, we must scale our application for business continuity. Let's look at the options available.

Vertical scaling or scaling up

Vertical scaling or scaling up means adding more resources to individual application servers and increasing the hardware capacity. Users send requests and the application processes the requests, reads/writes to the database, and sends responses back to the users. If the user base grows and the number of incoming requests becomes high, the application server will be overloaded, resulting in longer processing times and latency in responding to users. In this case, we can scale up the application server hardware to a higher hardware capacity, as shown in the following diagram.

Figure 1.3 – Vertical scaling (scaling up)

Figure 1.3 – Vertical scaling (scaling up)

Horizontal scaling or scaling out

Horizontal scaling or scaling out means adding more processing servers/machines to a system. Let's say my application is running on one server and can process up to 1,000 requests per minute. I could scale out by adding 4 more servers and could process 4,000 more requests per minute, as shown in the following screenshot.

Figure 1.4 – Horizontal scaling (scaling out)

Figure 1.4 – Horizontal scaling (scaling out)

Tip

Having a single server is always a bottleneck beyond a certain load, no matter how many CPU cores and memory you have. That's when horizontal scaling or scaling out may help.

Load balancers

Load balancers help in increasing scalability by distributing incoming traffic to healthy servers within a region when the amount of simultaneous traffic increases. Load balancers have health probe monitors to monitor a given port on each of the servers to check the health, and if they're found to be unhealthy, the server is disabled from the load balancer and incoming traffic. When the next health probe test passes, the server is added back to the load balancer.

Caching

Caching is one of the key system design patterns that help in scaling any application, along with improving response times. Any application typically involves reading and writing data from and to a data store, which is usually a relational database such as SQL Server or a NoSQL database such as Cosmos DB. However, reading data from the database for every request is not efficient, especially when data is not changing, because databases usually persist data to disk and it's a costly operation to load the data from disk and send it back to the browser client (or device in the case of mobile/desktop applications) or user. This is where caching comes into play. Cache stores can be used as a primary source for retrieving data and falling back to the original data store only when data is not available in the cache, thus giving a faster response to the consuming application. While doing this, we also need to ensure that the cached data is expired/refreshed as and when data in the original data store is updated.

Distributed caching

As we know, in a distributed system, the data store is split across multiple servers; similarly, distributed caching is an extension of traditional caching in which cached data is stored in more than one server in a network. Before we get into distributed caching, here's a quick recap of the CAP theorem: 

  • C: Stands for consistency, meaning the data is consistent across all the nodes and has the same copy of data 
  • A: Stands for availability, meaning the system is available, and failure of one node doesn't cause the system to go down 
  • P: Stands for partition tolerant, meaning the system doesn't go down even if the communication between nodes goes down 

As per the CAP theorem, any distributed system can only achieve two of the preceding principles, and as distributed systems must be partition-tolerant (P), we can only achieve either the consistency (C) of data or the high availability (A) of data. 

So, distributed caching is a cache strategy in which data is stored in multiple servers/nodes/shards outside the application server. Since data is distributed across multiple servers, if one server goes down, another server can be used as a backup to retrieve data. For example, if our system wanted to cache countries, states, and cities, and if there were three caching servers in a distributed caching system, hypothetically there would be a possibility that one of the cache servers would cache countries, another one would cache states, and one would cache cities (of course, in a real-time application, data is split in a much more complex way). Also, each server would additionally act as a backup for one or more entities. So, on a high level, one type of distributed cache system looks as shown: 

Figure 1.5 – Distributed caching high-level representation 

 

Figure 1.5 – Distributed caching high-level representation 

As you can see, while reading data, it is read from the primary server, and if the primary server is not available, the caching system will fall back to the secondary server. Similarly, for writes, write operations are not complete until data is written to the primary as well as the secondary server, and until this operation is completed, read operations can be blocked, hence compromising the availability of the system. Another strategy for writes could be background synchronization, which will result in the eventual consistency of data, hence compromising the consistency of data until synchronization is completed. Going back to the CAP theorem, most distributed caching systems fall under the category of CP or AP. 

Sharding

Sharding can improve scalability when storing and accessing large data from data stores. This is achieved by splitting a single data store into multiple horizontal partitions or shards. As the data is split across a cluster of databases, the system will be able to store a large amount of data and at the same time, the system can handle additional requests. We can continue to scale the system out by adding further shards.

Here are a few important considerations:

  • Keep shards balanced for even load distribution. Periodically rebalance shards as data is updated and removed from each shard.
  • Avoid queries that retrieve data from multiple shards as they are not efficient and cause a performance bottleneck. You can use parallel tasks to fetch data from different shards for better efficiency but it adds complexity.
  • Creating a large number of smaller shards is better for load balancing than a small number of large shards.

In the next section, we will see how to design applications for high availability.

You have been reading a chapter from
Distributed .NET with Microsoft Orleans
Published in: May 2022
Publisher: Packt
ISBN-13: 9781801818971
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image