Data mesh is a nice concept built on the concepts we have seen in the past
Is data mesh a repeat of how businesses operated in the early 2000s?
A couple of decades back, departments/divisions/functions used to focus on their areas, accept requests from other teams and provide them with what they needed. Each department used to own and manage its applications, infrastructure, and data required for its business growth.
Does this sound the same as the federated data mesh principle? Various domains take ownership of their data assets and share them with others. This is then wrapped up with a blanket of overall governance and a standardized tech stack to facilitate easy data sharing.
All is well and good! But won't there come a time when the naysayers and data mesh critics say that we need a single pane to see all the data assets together: a well-maintained, consolidated data layer that can answer all queries, instead of consumers going to multiple departments/domains? That is, once again, a need for a centralized data platform, and we enter a vicious cycle.
Does data mesh on the consumers' side serve a similar purpose as that of data marts?
Is data mesh a well-governed and managed data pond?
Is data mesh an improved version of the typical pub-sub model (popular a decade back and even now), where producers send data to a queue and consumers extract it immediately or later, depending on the push/pull requirements?
Is data mesh an improved version of the microservice architecture (purpose-built for data domains), catering to real-time/near-real-time use cases?
All fair questions, but before we turn down or adopt the latest trend, let us dive deeper to understand data mesh.
What exactly is a data mesh?
A data mesh is a mesh in which various nodes (data products) communicate with one another and share data assets, which are owned by the department/function and governed by a central data governance/mesh council.
Okay. So, what is a Data Product, and how can data be a product?
A product is whole and has a value attached to it. Data can be a product if it is packaged well and has potential business benefits and/or monetary value.
A finished data product is complete, relevant, and certified to be reliable and trustworthy.
These data products are flexible building blocks on which data-driven insights, analytics, and predictions are generated, ultimately leading to a better business understanding and providing a competitive edge.
A data product needs to be registered with the central team/organization on the established federated platform and cataloged so that it is easy to discover and access across the enterprise.
Defining the owners of the various domains and products is a crucial step. A data product owner has ownership of, accountability for, and responsibility for the data products. The owner grants fine-grained access to the data products to those who request it.
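To make registration, discoverability, and owner-controlled access a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the `DataProduct` and `Catalog` classes, their method names, and the sample product are all hypothetical, and a real data mesh platform would lean on a proper catalog and identity tooling rather than in-memory dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical record describing one data product."""
    name: str
    domain: str
    owner: str
    description: str = ""
    # Consumers approved by the owner, mapped to the columns they may read.
    grants: dict[str, set[str]] = field(default_factory=dict)


class Catalog:
    """Hypothetical central catalog where domains register their products."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        # Every product must have a named owner and a unique name.
        if not product.owner:
            raise ValueError("every data product needs a named owner")
        if product.name in self._products:
            raise ValueError(f"{product.name!r} is already registered")
        self._products[product.name] = product

    def search(self, keyword: str) -> list[str]:
        # Discoverability: match the keyword against names and descriptions.
        kw = keyword.lower()
        return sorted(
            p.name
            for p in self._products.values()
            if kw in p.name.lower() or kw in p.description.lower()
        )

    def grant(self, product_name: str, acting_user: str,
              consumer: str, columns: set[str]) -> None:
        # Only the product owner may hand out access, column by column.
        product = self._products[product_name]
        if acting_user != product.owner:
            raise PermissionError("only the product owner can grant access")
        product.grants.setdefault(consumer, set()).update(columns)

    def can_read(self, product_name: str, consumer: str, column: str) -> bool:
        product = self._products[product_name]
        return column in product.grants.get(consumer, set())


catalog = Catalog()
catalog.register(
    DataProduct("sales_orders", "sales", "alice", "Cleaned order lines")
)
catalog.grant("sales_orders", "alice", "finance_team", {"order_id", "amount"})
```

In this sketch the central team owns only the catalog, while the decision about who may read which column stays with the product owner.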
Are the data products classified?
Data products in the data domains can be producer-aligned, consumer-aligned, or foundation-aligned (products that need additional processing or aggregation, depending on need, use, relevance, applicability, and architecture design).
Producer-aligned products are aligned on how the producer produces the data. Producers ensure that the data they create are cleaned, sanitized, and available.
Consumer-aligned data products are those that the consumers prepare for their needs and requirements. These data products can also be consumed by other consumers.
Foundation-aligned data products are those that are outside the ownership purview of data producers and data consumers. These reusable products are known to be leveraged by multiple consumers and have a high business significance and/or basic technological foundational need. These data products are often cleansed, transformed, joined, and aggregated from data producer products and/or other foundational data products.
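The three alignments can be captured as a simple classification with a lineage check. The sketch below is an assumption-laden illustration, not a standard: the `Alignment` enum, the `Product` class, and the product names are hypothetical, and the rule that a foundation-aligned product should not be built on a consumer-aligned one is a simplification of the description above (foundation products are "often" derived from producer products and/or other foundation products).

```python
from dataclasses import dataclass
from enum import Enum


class Alignment(Enum):
    PRODUCER = "producer-aligned"
    CONSUMER = "consumer-aligned"
    FOUNDATION = "foundation-aligned"


@dataclass(frozen=True)
class Product:
    """Hypothetical data product with the products it is derived from."""
    name: str
    alignment: Alignment
    upstream: tuple[str, ...] = ()


def validate_lineage(products: list[Product]) -> list[str]:
    """Return a list of problems; an empty list means the assumed rules hold."""
    by_name = {p.name: p for p in products}
    problems = []
    for p in products:
        for up in p.upstream:
            parent = by_name.get(up)
            if parent is None:
                problems.append(f"{p.name}: unknown upstream {up!r}")
            elif (p.alignment is Alignment.FOUNDATION
                  and parent.alignment is Alignment.CONSUMER):
                # Assumed rule: foundation products build on producer or
                # other foundation products, not on consumer-specific ones.
                problems.append(
                    f"{p.name}: foundation product built on "
                    f"consumer-aligned {up!r}"
                )
    return problems


products = [
    Product("crm_customers", Alignment.PRODUCER),
    Product("web_orders", Alignment.PRODUCER),
    Product("customer_360", Alignment.FOUNDATION,
            ("crm_customers", "web_orders")),
    Product("churn_features", Alignment.CONSUMER, ("customer_360",)),
]
problems = validate_lineage(products)
```

A check like this could run whenever a product is registered, so that misclassified products are caught early.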
Okay, a lot about Data Products.
Why are companies moving towards data mesh from centralized data warehouse/data lake implementations?
- The central team sits between producers and consumers and becomes a bottleneck. It carries a never-ending backlog of requests from multiple groups and often delivers unsatisfactory results (from the business's point of view) because it lacks business understanding and translates requirements inaccurately.
- The central IT team, being a cost center, often has a restricted budget, whereas a business team, being a profit center, can more easily fund a small data team to work on its key business objectives while also lending a hand with the organizational data mesh goals.
- Lack of ownership, accountability, and responsibility of the data in the central team
- Inflexible and non-scalable data domains (not so simple to add/update data entities, domains, products, and users) in the central team
- The central IT team is responsible for data quality, but it depends on the business teams, who know their processes and data well, to provide the quality rules. So why shouldn't business teams take charge of the things they already know?
- Better adoption of data by the business as they are held responsible for maintaining their data
- Common skill sets on a common tech stack help onboard people across teams more quickly
- Focused ownership of the data products/domains
- Distributed ownership, controls, and knowledge, though this has its share of pros and cons
Are there any issues with Data mesh? Should we really go for it?
Strong data management, data governance, and industry best practices can help navigate these issues around data mesh implementation:
- Lack of adherence to the best practices, coding standards, and protocols set for the domain teams – Without a proper governing body and code quality checks, implementation done by various groups can go haywire due to different processes, mindsets, and coding styles
- Unaligned data products – Without proper management and governance, data products can be unaligned
- Lack of a single source of truth – Without proper governance, overlapping data products/product groups can be created. Consumers may have difficulty determining which data products are more reliable and trustworthy.
- Shying away from ownership – Teams can shy away from the responsibility of owning data products because of their current business priorities and because of confusion caused by overlapping data entities/data products.
- Maintenance of the data – Taking responsibility and ensuring that the data is accurate and current
- Inability to join data products easily because there are no joining keys – Data product owners do not pay attention to what other product owners are doing or to how they can make their own data products relevant and integrable for easy consumption
- Code (cleansing, quality checks, data movement) redundancy – This can be avoided by having a centralized code repo (with proper code commenting) so that developers across the board can reuse each other's code
- Inconsistent, inefficient, and poor monitoring, auditing, definitions, lineage, data selection, and data access processes across all the data domains – Can be overcome by selecting the right tech stack that provides all the capabilities
- Talent mismatched to the task – Forcing business-focused people to perform technical tasks can lead to inefficiencies. In most cases, one needs horses for courses
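The joining-key issue in the list above lends itself to a small illustration. The sketch below assumes, purely for the sake of example, that each data product publishes a contract listing its key columns; the contract layout, the product names, and the helper functions are hypothetical. It checks whether two products share a key before attempting a join.

```python
def common_keys(left_contract: dict, right_contract: dict) -> set[str]:
    """Keys that both data products declare, i.e. candidate join keys."""
    return set(left_contract["keys"]) & set(right_contract["keys"])


def join_on(left_rows: list[dict], right_rows: list[dict], key: str) -> list[dict]:
    """Simple inner join of two lists of row dictionaries on one key."""
    index: dict = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)
    return [
        {**left, **right}
        for left in left_rows
        for right in index.get(left[key], [])
    ]


# Hypothetical contracts published by three domain teams.
orders_contract = {"keys": ["customer_id", "order_id"]}
tickets_contract = {"keys": ["customer_id", "ticket_id"]}
campaign_contract = {"keys": ["email_hash"]}

orders = [
    {"customer_id": 1, "order_total": 120.0},
    {"customer_id": 2, "order_total": 35.5},
]
tickets = [{"customer_id": 1, "open_tickets": 3}]

shared = common_keys(orders_contract, tickets_contract)
joined = join_on(orders, tickets, "customer_id") if shared else []
```

Here the orders and tickets products can be combined because both expose `customer_id`, whereas the campaign product shares no key with orders. That is exactly the situation governance needs to catch when products are designed.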
A sound data architecture and good tool selection, on the other hand, can help with the following challenges around fragmented data:
- Data silos & fragmentation make it inefficient to analyze data that needs a combined view
- Working with multiple data products can be challenging if the tech stack/platform is not common. Different business teams tend to pick tech stacks based on the experience and inclination of their leaders and teams. This can be avoided by enforcing an enterprise-wide standardized tech stack or by leveraging data virtualization (the less preferred option), which can span various data technologies
- Data redundancy & performance issues – If the architecture & tech stack is well selected, query performance challenges and data duplication can be avoided
- Need to have a consistent, standardized, reliable enterprise-wide tech stack that provides uniform security, identity access, compliance, catalog, literacy, audit, and monitoring capabilities
Keys for successful data mesh implementation:
- Be aware of the above-mentioned data mesh challenges and work on solutions for them
- Be open to changing company culture
- Business teams embrace the change and establish small engineering teams to build and manage data domains and data products
- Formulate incentives to garner the interest of domain teams in the extra work required to refine the data mesh guiding principles and to implement the technology
- Be patient as successful implementation may take time, effort, and long-term business buy-in
- Governance & standardization of all the global processes and methodologies
- IT teams become the data mesh platform enablers. They prepare the IT infrastructure and provide the governing rules, policies, procedures, data quality rules, catalogs, and naming standards for the data domain owners to follow. The IT team may use its reserved pool of resources to kick-start teams that are not yet up and running, though primary data ownership should eventually rest with the domain owners.
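The governing rules and naming standards that the platform team provides can be turned into automated checks that every domain runs before publishing a product. The following is a minimal sketch under assumptions: the naming pattern `<domain>.<snake_case_name>`, the required metadata fields, and the `check_product` function are hypothetical examples, not an established standard.

```python
import re

# Hypothetical naming standard: <domain>.<snake_case_name>
NAME_PATTERN = re.compile(r"^[a-z]+\.[a-z][a-z0-9_]*$")
# Hypothetical minimum metadata every product must carry.
REQUIRED_FIELDS = ("owner", "description")


def check_product(metadata: dict) -> list[str]:
    """Return the policy violations for one data product's metadata."""
    violations = []
    if not NAME_PATTERN.match(metadata.get("name", "")):
        violations.append("name does not follow <domain>.<snake_case_name>")
    for field_name in REQUIRED_FIELDS:
        if not metadata.get(field_name):
            violations.append(f"missing required field: {field_name}")
    return violations


good = check_product(
    {"name": "sales.orders_daily", "owner": "alice",
     "description": "Daily order totals"}
)
bad = check_product({"name": "Sales Orders"})
```

Central policy, local execution: the IT team maintains the rules, and each domain team runs them against its own products.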
Conclusion:
- By treating data as a product, are we creating silos? Are we trying to complicate things that are already running smoothly? Yes and no.
- Picking an architecture, a data strategy, and a design pattern depends on the business use cases and problems. In some instances, a well-oiled data mesh is the best solution, whereas in other cases, it is overkill or completely irrelevant.
- Data mesh is a nice concept built on the concepts we have seen in the past. It comes up with a good set of governing principles. But the application of it should stem from the business situation, requirement, or demand.
- Data mesh is not ideal for smaller teams; however, the strategy and product thinking can be incorporated from the start in organizations with high growth trajectories and expansion plans. This will make the model flexible and scalable for the future.
- Big teams can implement data mesh if they have multiple lines of business (LOBs) and/or cross/matrix teams, and if the organization's vision is to hand data ownership to the various business/domain teams.
- Data mesh demands a big investment in cultural mindset change and technology implementation
- Governance & synergy across the data products (& teams) is very important.
- Decentralized decision-making is tough, and hence a council needs to be in place to drive alignment
Life is all about going in circles and picking up new things.
Disclaimer: The thoughts above are the author's own opinions and are not meant to demean others' thoughts and opinions.
ABOUT AUTHOR
Sunny Taparia is a data and analytics evangelist with a couple of decades of experience in recommending, designing, and implementing robust, scalable, and fit-for-purpose data and analytics platforms and solutions.
To know more about the author, check his LinkedIn page: Sunny Taparia | LinkedIn