New Paradigm for Enterprise Data Management: The Data Mesh
A data mesh is a new architectural paradigm for connecting distributed data sets in a way that enables data analytics at scale. A data mesh addresses the problems of centralized, monolithic data lakes and data warehouses by treating each domain's data as a product and allowing separate business domains to host and serve their own datasets in an easily consumable way.
Why Current Data Architecture Models are Failing
As data proliferates and becomes ever more ubiquitous, data lakes and warehouses are beginning to fail at two of the key functions they were designed to facilitate: cross-silo analysis and consumption.
Smaller organizations with minimally diversified data sets may still be able to centralize their data in an enterprise data warehouse, but for larger companies with a large and ever-growing number of new and legacy data sources, as well as diverse data consumers, piping all the data into a single place becomes an endless project. ETL engineers can’t keep up with the added data sources, and existing data pipelines need maintenance every time a source changes, however subtly.
In addition, the centralized architecture of lakes creates a bottleneck between the data engineers and the business domain experts, causing domain knowledge to be lost and resulting in disconnected and frustrated source teams that feel locked out of data they should rightly be able to own, use and process. Ultimately, it’s a structure that does not scale and does not deliver on the promise of creating a data-driven organization.
The Data Mesh Solution
Just as microservices have changed the way we develop software by allowing applications to be broken down into independently built and maintained services, data meshes provide granular access and control over highly distributed data from various domains.
Functioning much like a service mesh, a data mesh connects siloed data by creating a self-serve data infrastructure that stitches together data held across multiple locations and organizations. It accomplishes this by using a modern platform approach and treating domain data as the primary component of its architecture. In doing so, it aims to keep data highly available, easily discoverable, secure, and interoperable with the applications that need access to it. Data is no longer segregated into source and consumption patterns, and decentralized teams can use whatever data they need and then “feed the mesh” with their output.
For a data mesh architecture to work, data product owners must ensure their data is discoverable, trustworthy, self-describing, interoperable, secure, and governed by global access control. Data lakes and warehouses can still live in this architecture, but instead of being the central focus and repository, they become just another node in the mesh.
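To make these ownership requirements more concrete, here is a minimal Python sketch of how a domain team might publish a data product so that it is discoverable, self-describing, and subject to access control. The names `DataProduct` and `MeshCatalog`, the role-based check, and the example URI are all assumptions made for illustration. They are not part of any particular data mesh product, and a real mesh would rely on a shared catalog service rather than an in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor that a domain team publishes for one data product."""
    name: str
    domain: str
    owner: str
    schema: dict      # column name -> type; this is what makes the product self-describing
    endpoint: str     # where consumers read the data (illustrative URI only)
    allowed_roles: frozenset = field(default_factory=frozenset)


class MeshCatalog:
    """Hypothetical in-memory registry standing in for a shared catalog service."""

    def __init__(self):
        self._products = {}

    def register(self, product):
        # Refuse products that do not describe themselves.
        if not product.schema:
            raise ValueError("a data product must describe its schema")
        self._products[(product.domain, product.name)] = product

    def discover(self, domain=None):
        # Return all products, or only those of one domain, in a stable order.
        matches = (
            p for p in self._products.values()
            if domain is None or p.domain == domain
        )
        return sorted(matches, key=lambda p: (p.domain, p.name))

    def resolve(self, domain, name, role):
        # One global access check applied to every product in the mesh.
        product = self._products[(domain, name)]
        if role not in product.allowed_roles:
            raise PermissionError(f"role {role!r} may not read {domain}.{name}")
        return product.endpoint
```

In this sketch a sales team would call `register` once for its product, and any consumer could then call `discover("sales")` to find it and `resolve` to obtain the endpoint, provided its role is allowed. A data lake or warehouse could be registered in the same way, which is one way to picture it as just another node.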
Data Mesh Use Cases
A data mesh unlocks the possibilities for a variety of consumption scenarios across an organization, including machine learning, analytics, and data-intensive applications.
With a data mesh architecture, you can create virtual data catalogs from a variety of data sources. You can also create virtual data warehouses and lakes for analytics and machine learning training, and perhaps most importantly, you will be able to connect cloud applications to sensitive data that lives on-premises, as well as to streaming or real-time data from devices.
In addition, application developers and DevOps teams will be able to query data from a range of data stores without having to worry about how that data is accessed.
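The idea of querying several data stores without caring how each one is accessed can be sketched as a thin, uniform interface in Python. This is only an illustration under stated assumptions: `DataSource`, `SqliteSource`, `InMemorySource`, and `Mesh` are hypothetical names, SQLite stands in for any relational store, and a plain list of records stands in for a stream of device readings. Real mesh platforms typically push filters down to the underlying store instead of filtering rows in Python.

```python
import sqlite3
from typing import Iterable, Protocol


class DataSource(Protocol):
    """Hypothetical contract: every node in the mesh can yield rows as dicts."""

    def rows(self) -> Iterable[dict]: ...


class SqliteSource:
    """Adapter for a relational store; SQLite is used here only as a stand-in."""

    def __init__(self, conn, table):
        self._conn = conn
        self._table = table  # assumed to be a trusted, fixed table name

    def rows(self):
        cursor = self._conn.execute(f"SELECT * FROM {self._table}")
        columns = [col[0] for col in cursor.description]
        for record in cursor:
            yield dict(zip(columns, record))


class InMemorySource:
    """Adapter for records already in memory, e.g. buffered device readings."""

    def __init__(self, records):
        self._records = list(records)

    def rows(self):
        return iter(self._records)


class Mesh:
    """Hypothetical facade: consumers name a data set and never see the adapter."""

    def __init__(self):
        self._sources = {}

    def attach(self, name, source):
        self._sources[name] = source

    def query(self, name, **filters):
        # Simple equality filtering, applied identically to every kind of source.
        return [
            row for row in self._sources[name].rows()
            if all(row.get(key) == value for key, value in filters.items())
        ]
```

A consumer would attach each source once under a name such as `"sales.orders"` and then call `mesh.query("sales.orders", region="EU")`. The call looks the same whether the rows come from a database table or from an in-memory buffer.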
A data mesh empowers your organization to escape the analytical and consumptive confines of monolithic data architectures and connects siloed data to allow machine learning and automated analytics at scale.
With a data mesh, your company can move closer to being truly data-driven, leaving behind the bottlenecks of lakes and warehouses and replacing them with direct data access, control, and connectivity.