Understanding Data Security
Data security is one of the greatest concerns for enterprises today, especially with the ever-increasing amount and value of collected data. It is all about protecting digital data from the actions of unauthorized users (such as data leaks or breaches) or from destructive forces. As part of that, there are a few concepts that you need to become familiar with, including encryption, backup and restore, data masking, and data erasure. You will get to know each of them in the next sections.
Data Encryption
Data encryption can be applied at multiple levels and stages of the data life cycle. This includes when the data is stored at its final data store (encryption at rest) and while data is in motion, moving from one system to another (encryption in transit).
Encryption in transit is typically achieved by encrypting the message before it is transmitted and decrypted at the destination. This process intends to protect data while being transferred against attackers who could intercept the transmission or what are sometimes referred to as man-in-the-middle attacks. This is normally achieved by utilizing a secure channel such as HTTPS, although higher levels of security can be applied. You will do a deep dive into this topic in Chapter 3, Core Architectural Concepts: Integration and Cryptography, to better understand how encryption algorithms work and how they are used to exchange data in a secure manner.
Encryption at rest is all about storing the data that has been encrypted. This makes it impossible to read and display the decrypted version of it without having access to a specific encryption key. Some applications or platforms provide this out of the box. This is a protection mechanism against attackers who can gain access to the database or to the physical disk where the data is stored.
Salesforce Shield provides an encryption solution for encrypting data at rest. This is applicable to the filesystem, the database, and the search index files. If you are planning to use Salesforce shield as part of your solution, you need to highlight that clearly in your landscape architecture.
Data Restoration
Backup and restore solutions are used to ensure data is available in a safe location/source in case there is a need to restore or recover it. In most industries, it is essential to keep a backup of any operational data. And most importantly, you must have a clear restoration strategy. Data restoration is typically more challenging than backing it up as it comes with additional challenges, such as restoring partial data, reference data, and parent-child records and relationships.
Note
Salesforce announced that effective July 31, 2020, data recovery as a paid feature would be deprecated and no longer available as a service. However, based on customers’ feedback, Salesforce decided to reinstate its data recovery service. Then, during Autumn 2021, Salesforce announced a new built-in platform with a native backup and restore capability.
Due to this, it is important to create a comprehensive data backup and restore strategy as part of your data governance strategy. There are several tools that can be used to back up and restore data from and to the Salesforce Platform, including Salesforce’s Backup and Restore, in addition to some AppExchange products. A custom-made solution through implementing ETL tools is also possible, despite the additional build cost associated with it. As an architect, you are expected to be able to walk your stakeholders through the various options that are available, as well as the potential pros and cons.
Note
During the review board, you are expected to come up with the best possible solution technically. Cost should not be a consideration unless clearly mentioned in the scenario. Buy versus build decisions always tend to pick the buy option due to its quick return on investment.
Data Masking
Data masking (also known as data obfuscation) of structured data is the process of covering the original data with modified content. This is mainly done to protect data that is classified as personally identifiable information (PII) or sensitive commercial or personal data. An example is masking national identity numbers to display only the last four digits while replacing all other digits with a static character, such as a wildcard. Data is normally obfuscated to protect it from users, such as internal agents, external customers, or even developers (who normally need real production-like data to test specific use cases or fix a particle bug) to be compliant with regulatory requirements.
There are two common techniques for data obfuscation, namely pseudonymization, and anonymization. Here is a brief description of the two:
- Anonymization: This works by changing and scrambling the contents of fields so they become useless. For example, a contact named Rachel Greene could become
hA73Hns#d$
. An email address such as[email protected]
could become an unreadable value such asJA7ehK23
. - Pseudonymization: This converts a field into readable values unrelated to the original value. For example, a contact named Rachel Greene could become Mark Bates. An email address such as
[email protected]
could become [email protected].
Which technique amongst these two you should choose depends on the degree of risk associated with the masked data and how the data will be processed. Pseudonymous data still allows some sort of reidentification (even if it is remote or indirect), while anonymous data cannot be reidentified. A common way to anonymize data is by scrambling data, a process that can sometimes be reversible; for example, “London” could become “ndooln.” This masking technique allows a part of the data to be hidden with a static or random character. On the other hand, data blurring uses an approximation of data values to make it impossible to identify a person or to make the data’s meaning obsolete.
Data Erasure
Data erasure (also referred to as data clearing, data destruction, or data wiping) is a software-based activity where specific data is overwritten with other values to destroy electronic data and make it unrecoverable. This is different from data deletion, even though they sound the same.
Data deletion can leave data in a recoverable format (for example, by simply removing the reference to it from an index table while still maintaining it on the storage disk). Data erasure, on the other hand, is permanent and particularly important for highly sensitive data. It is important to understand the difference between these terms so that you can suggest the best strategy to your stakeholders while also taking into consideration the limited control they have over how the data is ultimately stored in Salesforce.
It is worth mentioning that encrypted data can be destroyed/erased permanently by simply destroying the encryption keys.
Another key topic the data governing body needs to cover is data regulatory compliance. With the increased amount of gathered customer and business data, it has become essential to introduce rules that govern the use of that data. As an architect, you must be aware of these regulations to design a fully compliant solution. You will likely need to work with subject matter experts to ensure your solution fulfills all regulatory requirements, but you should still be able to cover a good amount of that by yourself. You also need to be able to explain how your solution is compliant with these regulations to your stakeholders.