
You're reading from Microsoft SharePoint Server 2019 and SharePoint Hybrid Administration Deploy, configure, and manage SharePoint on-premises and hybrid scenarios

Product type Paperback

Published in Oct 2020

Publisher Packt

ISBN-13 9781800563735

Length 536 pages

Edition 1st Edition

Tools

SharePoint Framework

Concepts

System Administration

Author (1):

Aaron Guilmette


Designing for high availability

When designing any system for high availability, you typically need to address questions such as the following:

What types of failures should a system be able to sustain?
How many failures should a system be able to sustain?
What steps (manual or automatic) need to be executed to ensure availability?
What systems or processes can we put in place to avoid interruptions in the first place?

These types of questions speak to the concept of dependability. A dependable system is one that is available to service a request and is able to continue serving requests despite failures of the component architecture (such as a server or network device) or supporting services (such as electricity). Dependability has six core attributes:

Availability: Measures the system's readiness to accept and respond to new requests for service
Reliability: Measures how a system can continue to operate after an unexpected event
Safety: Measures a system's level of risk to users and the environment
Confidentiality: The ability to control or prevent unauthorized disclosure of information
Integrity: Measures the presence or absence of an improper system alteration (such as data corruption)
Maintainability: A qualitative measurement for how easily a system is kept current, repaired, or updated
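Of these attributes, availability is the one most often expressed as a number. The following is a minimal sketch, assuming the commonly used steady-state formula of mean time between failures (MTBF) divided by the sum of MTBF and mean time to repair (MTTR). The formula, the function names, and the sample figures are illustrative assumptions and do not come from SharePoint or from any measured environment:

```python
# Illustrative sketch: the MTBF and MTTR values below are made-up inputs,
# not measurements from a real SharePoint farm.

HOURS_PER_YEAR = 8760.0


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Return steady-state availability as a fraction between 0 and 1."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must not be negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(avail: float) -> float:
    """Return the expected hours of downtime per year for a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


# Hypothetical service that fails every 2,000 hours and takes 4 hours to repair.
a = availability(mtbf_hours=2000.0, mttr_hours=4.0)
print(f"Availability: {a:.4%}")
print(f"Expected downtime per year: {annual_downtime_hours(a):.1f} hours")
```

Shortening the repair time improves the result just as lengthening the time between failures does, which is why the later sections treat both avoidance and removal of faults.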

When designing a system, these ideas or attributes of dependability can be used when building a Fault-Error-Failure chain to help identify potential errors and solve them before they are expressed during operation.

The Fault-Error-Failure chain design principles are used in the development of most modern, highly available systems. The original work that introduces this, Fundamental Concepts of Dependability, is available at https://www.cs.rutgers.edu/~rmartin/teaching/spring03/cs553/readings/avizienis00.pdf.

From a practical standpoint, these questions of dependability can be broken up into four main categories:

Fault forecasting
Fault avoidance
Fault removal
Fault tolerance

Let's examine each of these with regard to designing a highly available SharePoint Server environment.

Fault forecasting

Fault forecasting is the prediction of likely or potential failures. With respect to SharePoint Server architectures, some of the following components come to mind:

Server hardware, including components such as memory, chassis, power supplies, or mainboards
Storage hardware, including components such as disk drives or other storage media, storage array software or firmware, or disk controllers
Networking, including devices (switches, routers, firewalls, proxies, and load balancers), cabling components, and inbound and outbound connectivity to the internet or other sites
Power, including any power cables, switch boxes, outlets, power strips, uninterruptible power supplies, building or site power, and redundant power generation
Software, such as application binaries or updates, Secure Sockets Layer (SSL) certificates, operating system binaries or updates, database servers, application services, and components

Each of those component categories represents one or more potential failures for an environment. In the forecasting stage, it's important to determine as many things as possible that can go wrong, as well as the likelihood and service impact of each.

Faults will happen in any environment, so devising strategies to identify potential faults and their impacts will help you design highly available systems.
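One simple way to record the likelihood and service impact of each forecast fault is a small risk register. The following is a rough sketch, assuming a hypothetical 1 to 5 scale for both values and a score that multiplies them. The scale, the scores, and the sample entries are assumptions made for illustration, not values taken from a real assessment:

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ForecastFault:
    """One potential fault with hypothetical 1 (low) to 5 (high) ratings."""
    component: str
    likelihood: int
    impact: int

    @property
    def risk_score(self) -> int:
        # Assumed scoring rule: likelihood multiplied by impact.
        return self.likelihood * self.impact


def rank_faults(faults: Iterable[ForecastFault]) -> List[ForecastFault]:
    """Return the faults ordered from highest to lowest risk score."""
    return sorted(faults, key=lambda fault: fault.risk_score, reverse=True)


# Made-up ratings for components named in the list above.
faults = [
    ForecastFault("Disk drive failure", likelihood=4, impact=2),
    ForecastFault("SSL certificate expiry", likelihood=3, impact=4),
    ForecastFault("Site power loss", likelihood=1, impact=5),
    ForecastFault("Failed operating system update", likelihood=2, impact=3),
]

for fault in rank_faults(faults):
    print(f"{fault.risk_score:>2}  {fault.component}")
```

The ranking gives you a starting order for the avoidance and tolerance work described in the following sections.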

Fault avoidance

Once potential faults in the architecture have been identified, you can design around them. The premise of fault avoidance (or fault prevention) is to introduce elements that prevent faults. In the context of SharePoint Server architecture, this can mean several things, such as the following:

Rigorous change control processes to understand modifications being made to the environment
Development, test, or other sandbox-style environments where modifications are made and evaluated prior to production deployment
Automated or scripted procedures to reduce the opportunity for human-caused failures
Planning for redundancy and multiple failure modes

Fault avoidance is critical from both the design and operational perspectives to help ensure a high level of service and availability for a given service or application.

Fault removal

The goal of fault removal is to reduce the number and severity of service faults. Fault removal activities can be broadly divided into two categories:

During the planning, design, or development of a system
During the operation of a system

From a SharePoint Server perspective, removing faults during the development or planning of a system is the iterative process of identifying potential faults, such as disk drive or database failure (fault forecasting), designing a system to mitigate or prevent them (fault avoidance), and then performing testing that would trigger a particular failure mode.

For example, if you are planning for disk drive failure in a storage array, you would do the following:

1. Implement a storage subsystem with redundant features, such as disk mirroring.
2. Deploy an application or service utilizing the storage subsystem.
3. Introduce a failure, such as removing a disk drive, that would normally trigger a system failure.
4. Verify that the application or service continues to operate.

If the service or application fails to continue operating, you need to review the error logs and conditions, revise the deployment methodology or design, and then repeat the testing. Through this process, you can provide assurance to the business that the system will perform as designed.
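The test cycle described above can be rehearsed in miniature. The following is a toy sketch, assuming a hypothetical in-memory MirroredVolume class that stands in for a mirrored storage subsystem. It is not a model of any real storage array or SharePoint component; it only shows the pattern of injecting a fault and then verifying that service continues:

```python
class MirroredVolume:
    """Toy mirror (hypothetical): every write goes to all online disks."""

    def __init__(self, disk_count: int = 2) -> None:
        self._disks = [dict() for _ in range(disk_count)]
        self._online = [True] * disk_count

    @property
    def degraded(self) -> bool:
        """True while at least one disk is offline."""
        return not all(self._online)

    def write(self, key: str, value: str) -> None:
        for disk, online in zip(self._disks, self._online):
            if online:
                disk[key] = value

    def read(self, key: str) -> str:
        for disk, online in zip(self._disks, self._online):
            if online and key in disk:
                return disk[key]
        raise OSError("no online disk holds the requested data")

    def fail_disk(self, index: int) -> None:
        """Inject a fault by taking one disk offline."""
        self._online[index] = False

    def replace_disk(self, index: int) -> None:
        """Swap in a new disk and resynchronize it from a surviving disk."""
        survivors = [d for d, up in zip(self._disks, self._online) if up]
        if not survivors:
            raise OSError("no surviving disk to resynchronize from")
        self._disks[index] = dict(survivors[0])
        self._online[index] = True


volume = MirroredVolume()
volume.write("doc", "quarterly report")    # deploy and use the service
volume.fail_disk(0)                        # introduce the failure
assert volume.read("doc") == "quarterly report"  # verify continued operation
print("Degraded after fault:", volume.degraded)
volume.replace_disk(0)                     # remove the fault
print("Degraded after replacement:", volume.degraded)
```

If the assertion fails, that is the signal to review the design and repeat the test, exactly as you would with real hardware.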

Addressing the concept of fault removal during operation, using the previous example of disk drive failure, might look something like this:

1. The disk in the storage subsystem fails.
2. The disk subsystem continues operating in a degraded state.
3. The technician replaces the failed disk.
4. The system returns to a normal operational state.

In the preceding example, Step 1 is the failure mode. Step 2 indicates that the system's design has successfully resulted in continuing operations. In Step 3, the technician is performing fault removal by removing a failed device and replacing it with an operational one. In Step 4, the system has recovered and has returned to a normal operating state, free of faults.

In the previous failure scenario, the disk subsystem may have been designed to sustain the failure of only a single disk drive. After the disk has failed in Step 1, the system is at risk until the disk has been replaced in Step 3, because a second disk failure during that window could take the subsystem offline. A system's ability to continue operating is compromised with each further fault, so it's important to minimize the time between the failure and the replacement.
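To get a feel for why the time between Step 1 and Step 3 matters, you can estimate the chance of a further failure while the system is degraded. The following is a back-of-the-envelope sketch, assuming independent disks with exponentially distributed lifetimes. That model, the function name, and the sample figures (seven remaining disks, a 1,000,000-hour MTBF) are simplifying assumptions for illustration, not vendor data:

```python
import math


def additional_failure_probability(
    remaining_disks: int, disk_mtbf_hours: float, exposure_hours: float
) -> float:
    """Chance that at least one remaining disk fails during the exposure window.

    Assumes independent disks with exponentially distributed lifetimes.
    """
    if remaining_disks < 0 or disk_mtbf_hours <= 0 or exposure_hours < 0:
        raise ValueError("inputs must be non-negative and MTBF must be positive")
    combined_failure_rate = remaining_disks / disk_mtbf_hours
    # 1 - exp(-rate * time), written with expm1 for accuracy at small values.
    return -math.expm1(-combined_failure_rate * exposure_hours)


# Hypothetical array: 7 surviving disks, each with a 1,000,000-hour MTBF.
for hours in (4, 24, 72):
    p = additional_failure_probability(7, 1_000_000, hours)
    print(f"{hours:>2} hours degraded: {p:.6%} chance of another disk failing")
```

Under these assumptions the risk grows almost linearly with the length of the window, so replacing the disk in hours rather than days reduces the exposure proportionally.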

Fault tolerance

Finally, the design goal of fault tolerance is to address how systems react when faults happen. As we've already stated, faults will happen. Fault-tolerant design plays a crucial role in allowing services to continue while faults are removed.

As a practitioner, you'll often be faced with choices and trade-offs to make on fault-tolerant designs, such as spending resources on redundant database hardware or additional servers in the SharePoint Server farm.
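A quick calculation can help frame these trade-offs. The following is a minimal sketch, assuming that components fail independently, that redundant copies of a component are combined in parallel, and that the tiers of a farm are combined in series. The availability figures and function names are hypothetical, and real components rarely fail in a fully independent way, so treat the results as an upper bound:

```python
from math import prod


def redundant(avail: float, copies: int) -> float:
    """Availability of N redundant copies, assuming independent failures."""
    if copies < 1:
        raise ValueError("at least one copy is required")
    return 1.0 - (1.0 - avail) ** copies


def in_series(*avails: float) -> float:
    """Availability of a chain in which every tier must be up."""
    return prod(avails)


# Hypothetical per-server availabilities for a web tier and a database tier.
WEB, SQL = 0.99, 0.995

options = {
    "single web, single SQL": in_series(WEB, SQL),
    "two web, single SQL": in_series(redundant(WEB, 2), SQL),
    "single web, two SQL": in_series(WEB, redundant(SQL, 2)),
    "two web, two SQL": in_series(redundant(WEB, 2), redundant(SQL, 2)),
}

for name, value in options.items():
    print(f"{name:<24} {value:.4%}")
```

With these sample numbers, adding a second server to the weaker tier helps more than adding one to the stronger tier, which is the kind of comparison that can guide where to spend resources first.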

When creating a highly available, fault-tolerant design for SharePoint, you'll likely need to account for the following fault domains:

Fault Domains	Examples
Rack and power infrastructure	Server racks, power distribution units, power circuits, uninterruptible power supplies, fans, and cooling equipment
Physical server infrastructure and components	Servers, server chassis, server backplanes or midplanes, hard disk drives, controllers, network interface cards, and processors
Virtual server infrastructure and components	Virtual machine hosts
Network infrastructure and components	Rack-based switches, cabling, core switching, load balancers and traffic directors, and firewalls
Storage infrastructure and components	Storage networking components, disk arrays, disks, disk controllers, and Redundant Array of Independent Disks (RAID) settings
Application services and components	SharePoint application servers, Distributed Cache servers, User Profile Service, and the Search Service application
Database services and components	SQL Server failover clustering or AlwaysOn availability groups for content, configuration, and service application databases

In the fault forecasting step, you identified potential failures that could affect the SharePoint Server system and designed methods in the fault avoidance step to help mitigate or reduce the impact of the faults on the environment.

In addition to fault-tolerant designs, you also need to prepare for recovery from catastrophic failures (such as a natural disaster) that span all components in a single fault domain or in multiple fault domains.

In the next section, we'll look at using highly available designs to mitigate the impact of failures of various service databases.

Supported SharePoint high-availability designs

A SharePoint farm has many moving pieces. A successful highly available design requires understanding how the various components can be made resilient. The following table lists the database design considerations:

Service Database	Supports Database Mirroring for High Availability	Supports Database Mirroring or Log Shipping for Disaster Recovery	Supports SQL AlwaysOn Availability Group for Availability	Supports SQL AlwaysOn Availability Group for Disaster Recovery
Configuration database	X		X
Central Administration database	X		X
Content database(s)	X	X	X	X
App Management database	X	X	X	X
Business Connectivity Service database	X	X	X	X
Managed Metadata Service database	X	X	X	X
PerformancePoint Services database	X	X	X	X
Power Pivot Service database	X	X	X	X
Project Server database	X	X	X	X
SharePoint Search Service – administration database	X		X
SharePoint Search Service – analytics reporting database	X	X	X
SharePoint Search Service – crawl database	X		X
SharePoint Search Service – link database	X		X
Secure Store database	X	X	X	X
SharePoint Translation Services database	X	X	X	X
State Service database	X
Subscription Settings database	X	X	X	X
Usage and Health Collection database	X	X	X
User Profile Service – profile database	X	X	X	X
User Profile Service – synchronization database	X	X	X	X
User Profile Service – social tagging database	X	X	X	X
Word Automation Services database	X	X	X	X

For more information on the specific SQL or SharePoint versions necessary to support certain high-availability designs, go to https://docs.microsoft.com/en-us/sharepoint/administration/supported-high-availability-and-disaster-recovery-options-for-sharepoint-databas.

One of the common threads you'll see in the databases' availability design is the support for SQL Server AlwaysOn availability groups. Microsoft recommends AlwaysOn availability groups for all databases in a SharePoint Server environment from the perspective of same-farm high availability.

Service applications support high availability behind load balancers. After using the SharePoint Products Configuration Wizard to configure a role for your server, add a configuration object (such as a virtual IP) to your load balancer that includes all of the servers hosting an application or service.
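Conceptually, the load balancer sends each request to one of the pool members that is currently healthy. The following is a simplified sketch of that idea, assuming a round-robin rotation and a health check supplied by the caller. The server names and functions are hypothetical, and real load balancers use their own health probes and distribution algorithms:

```python
from itertools import cycle
from typing import Callable, List, Sequence


def healthy_members(
    servers: Sequence[str], is_healthy: Callable[[str], bool]
) -> List[str]:
    """Return the pool members that pass the supplied health check."""
    return [server for server in servers if is_healthy(server)]


def route_requests(
    servers: Sequence[str], is_healthy: Callable[[str], bool], request_count: int
) -> List[str]:
    """Assign requests to healthy servers in round-robin order."""
    pool = healthy_members(servers, is_healthy)
    if not pool:
        raise RuntimeError("no healthy servers are available in the pool")
    rotation = cycle(pool)
    return [next(rotation) for _ in range(request_count)]


# Hypothetical pool behind one virtual IP; sp-app02 is simulated as failed.
servers = ["sp-app01", "sp-app02", "sp-app03"]
failed = {"sp-app02"}

assignments = route_requests(servers, lambda name: name not in failed, 4)
print(assignments)
```

Because the failed server is skipped, requests keep flowing to the remaining members, which is the behavior you are relying on when you place every server that hosts a service behind the same virtual IP.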

While a fault-tolerant and resilient design is important for day-to-day operations, you also need a business continuity plan in the event of a significant problem. That is where disaster-recovery planning is helpful.
