What is Data Fabric?
Data Fabric is a distributed, composable architecture driven by metadata and events. It is use-case agnostic and excels at managing and governing distributed data. It integrates dispersed data through automation while providing strong Data Governance, protection, and security. Its focus is the Self-Service delivery of governed data.
Data Fabric does not require migrating data into a centralized data storage layer, nor converting it to a specific data format or database type. It supports a diverse set of data management styles and use cases across industries, such as a 360-degree view of a customer, regulatory compliance, cloud migration, data democratization, and data analytics.
In the next section, we’ll touch on the characteristics of Data Fabric.
What Data Fabric is
Data Fabric is a composable architecture made up of different tools, technologies, and systems. Its design is driven by active metadata and events, which automates Data Integration while achieving interoperability. Data Governance, Data Privacy, Data Protection, and Data Security are central to its design and are what make Self-Service data sharing possible. The following figure summarizes the different characteristics that constitute a Data Fabric design.
Figure 1.1 – Data Fabric characteristics
Data Fabric takes a proactive and intelligent approach to data management. It monitors and evaluates data operations to learn from them and suggest future improvements, leading to higher productivity and better decision-making. It approaches data management with flexibility, scalability, automation, and governance in mind, and it supports multiple data management styles. What distinguishes Data Fabric from other architectures is that it embeds Data Governance into the data life cycle by design, using metadata as its foundation. Data Fabric focuses on business controls, with an emphasis on robust and efficient data interoperability.
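To make "active metadata" and "governance embedded by design" more concrete, here is a minimal sketch in Python under stated assumptions. Every name in it (`ActiveMetadataCatalog`, `make_pii_policy`, `self_service_read`, the `"dataset_registered"` event, and the `"pii"` tag) is hypothetical and does not come from any particular Data Fabric product. It shows a registration event updating the catalog, a subscribed governance policy deriving masking rules from column tags, and a self-service read that refuses any dataset no policy has evaluated yet.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Dataset:
    name: str
    location: str                     # where the data already lives; nothing is moved
    column_tags: dict[str, set[str]]  # column name -> metadata tags, e.g. {"pii"}


# A handler receives the event type and the dataset the event is about.
Handler = Callable[[str, Dataset], None]


class ActiveMetadataCatalog:
    """Hypothetical catalog that reacts to metadata events as they arrive."""

    def __init__(self) -> None:
        self.datasets: dict[str, Dataset] = {}
        self.masked_columns: dict[str, set[str]] = {}
        self._handlers: list[Handler] = []

    def subscribe(self, handler: Handler) -> None:
        self._handlers.append(handler)

    def publish(self, event: str, dataset: Dataset) -> None:
        # Record the metadata first, then let every subscriber react to the event.
        self.datasets[dataset.name] = dataset
        for handler in self._handlers:
            handler(event, dataset)


def make_pii_policy(catalog: ActiveMetadataCatalog) -> Handler:
    """Hypothetical governance rule: mask every column tagged "pii" on registration."""

    def on_event(event: str, dataset: Dataset) -> None:
        if event == "dataset_registered":
            catalog.masked_columns[dataset.name] = {
                col for col, tags in dataset.column_tags.items() if "pii" in tags
            }

    return on_event


def self_service_read(catalog: ActiveMetadataCatalog, name: str, rows: list[dict]) -> list[dict]:
    """Deliver rows only after governance has evaluated the dataset.

    `rows` stands in for data that a real system would fetch from dataset.location.
    """
    if name not in catalog.masked_columns:
        raise PermissionError(f"{name!r} has not been evaluated by a governance policy")
    masked = catalog.masked_columns[name]
    return [{col: "***" if col in masked else val for col, val in row.items()} for row in rows]


catalog = ActiveMetadataCatalog()
catalog.subscribe(make_pii_policy(catalog))
catalog.publish(
    "dataset_registered",
    Dataset("customers", "crm_db.customers", {"id": set(), "email": {"pii"}}),
)
print(self_service_read(catalog, "customers", [{"id": 1, "email": "ada@example.com"}]))
# prints [{'id': 1, 'email': '***'}]
```

The design choice worth noting is that consumers never call the governance policy themselves: the policy runs because the metadata event fired, and the read path fails closed when no policy has run. A real Data Fabric would use far richer metadata, policies, and event infrastructure than this toy.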
In the next section, we will clarify what is not representative of a Data Fabric design.
What Data Fabric is not
Let’s understand what Data Fabric is not:
- It is not a single technology, such as data virtualization. While data virtualization is a key Data Integration technology in Data Fabric, the architecture supports several more technologies, such as data replication, ETL/ELT, and streaming.
- It is not a single tool, such as a data catalog, and it doesn’t have to be a single data storage system, such as a data warehouse. It represents a diverse set of tools, technologies, and storage systems that work together in a connected ecosystem via a distributed data architecture, with active metadata as the glue.
- It supports not only centralized data management but also federated and decentralized data management. It excels at connecting distributed data.
- Data Fabric is not the same as Data Mesh. They are different data architectures that tackle the complexities of distributed data management using different but complementary approaches. We will cover this topic in more depth in Chapter 3, Choosing between Data Fabric and Data Mesh.
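The idea of "active metadata as the glue" between systems that keep their own data can be illustrated with a minimal sketch. It assumes a hypothetical `MetadataRegistry` that maps logical dataset names to the system and physical location holding them, plus one connector per system; the systems, locations, and sample records are in-memory stand-ins invented for illustration. Consumers ask for data by logical name, and nothing is copied into a central store, loosely in the spirit of data virtualization. A real implementation would need far more (query optimization, security, caching, and the other integration styles listed above).

```python
from dataclasses import dataclass
from typing import Callable

# A connector takes a physical location and returns the records stored there.
Reader = Callable[[str], list[dict]]


@dataclass(frozen=True)
class SourceEntry:
    system: str    # which system holds the data, e.g. a warehouse or an operational API
    location: str  # physical address of the data inside that system


class MetadataRegistry:
    """Hypothetical registry: metadata says where data lives; connectors read it in place."""

    def __init__(self) -> None:
        self._entries: dict[str, SourceEntry] = {}
        self._readers: dict[str, Reader] = {}

    def register_source(self, logical_name: str, entry: SourceEntry) -> None:
        self._entries[logical_name] = entry

    def register_connector(self, system: str, reader: Reader) -> None:
        self._readers[system] = reader

    def read(self, logical_name: str) -> list[dict]:
        # Resolve the logical name through metadata, then read from the owning system.
        entry = self._entries[logical_name]
        return self._readers[entry.system](entry.location)


# In-memory stand-ins for two separate systems (illustrative data only).
warehouse = {"dw.customers": [{"id": 1, "name": "Ada"}]}
orders_api = {"/v1/orders": [{"customer_id": 1, "total": 40}, {"customer_id": 1, "total": 2}]}

registry = MetadataRegistry()
registry.register_connector("warehouse", lambda loc: warehouse[loc])
registry.register_connector("orders_api", lambda loc: orders_api[loc])
registry.register_source("customers", SourceEntry("warehouse", "dw.customers"))
registry.register_source("orders", SourceEntry("orders_api", "/v1/orders"))

# A simple combined customer view built without moving either source into a central store.
orders = registry.read("orders")
customer_view = [
    {**c, "order_total": sum(o["total"] for o in orders if o["customer_id"] == c["id"])}
    for c in registry.read("customers")
]
print(customer_view)
# prints [{'id': 1, 'name': 'Ada', 'order_total': 42}]
```

Swapping a connector (for example, pointing `"orders"` at a replicated copy instead of the live API) only changes the metadata entry, not the consumer code, which is the kind of flexibility the list above attributes to a Data Fabric.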
The following diagram summarizes what Data Fabric architecture is not:
Figure 1.2 – What Data Fabric is not
We have now discussed in detail what Data Fabric is and what it is not. In the next section, we will discuss why Data Fabric is important.