Data meshes
I had a more challenging time picking something to write about this week because I have too many things I'd like to cover rather than too few. At the end of last week, I mentioned a couple of articles I would write next; those have now become 'some point in the future'. Given how quickly things are changing, I'll avoid trying to predict in advance what will be most interesting to write about.
Anyway, preamble aside, today I want to talk about data meshes because they've come up in the work I'm currently doing (albeit without the name) and because they will offer a natural enough lead into the topic of operating models.
In researching and writing this article, I've heavily relied on Wikipedia and the documentation of the major cloud providers (GCP, AWS, Azure). I've referenced them when quoting directly, but it's also safe to assume that anything insightful comes from those sites.
What is a data mesh?
According to Wikipedia:
Data mesh is a sociotechnical approach to building a decentralised data architecture by leveraging a domain-oriented, self-serve design.
While this sentence feels a lot like it was designed to win buzzword bingo, it usefully sets out the different components of the approach that I want to discuss. Let's look at each element (not quite in order).
Sociotechnical approach
This is a fancy way of saying that the system's design recognises the interaction between technology and people. This is important because there's little value in defining the 'right' technology or architecture only to find that it doesn't work with the organisation's design or culture and the behaviour of the people within it. All technical designs should be sociotechnical if they're to work. Still, if the people building them separate themselves from the business or their end users, then there is a genuine risk that they will focus on the technical aspects, not the socio aspects.
The principles of this approach are:
- Responsible autonomy: small groups with clearly defined responsibilities and all (or at least most) of the capabilities and authority to own and progress those responsibilities
- Adaptability: rather than matching external complexity with increasing internal complexity in a business, these small groups reduce internal complexity and allow teams to adapt and respond to changing circumstances without adding layers of governance
- Whole tasks: the small group should be able to take on the entire task rather than depending on numerous external parties, which would likely slow their delivery. This also gives them the flexibility to decide how to complete a task based on local conditions.
- Meaningfulness of tasks: all of the above mean that the task is the primary focus of each team member, and they can see it through to completion. This meaningfulness is diminished if the group works on multiple things or is dependent on other groups.
It feels like this is leaning towards agile without actually using the word.
Domain-oriented
This means breaking down the data and architecture into the relevant domains, each owned by the experts in that domain, rather than having a single overarching model. This one is most easily understood with a graphic, so let's refer to the data structure we built up last week.
We focus on HR, but the same logic applies to other domains. There are potentially many HR source systems, and I've shown two here: an HCM system and a learning platform. The data is periodically taken into the raw layer of the data lake. No changes are made to the data, although some metadata, governance, and (in particular) access management are applied at this point. I'll discuss those more in a future article.
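To make the raw layer concrete, here's a minimal sketch of that ingestion step. The source names, record shapes, and metadata fields are all hypothetical; the point is that the payload lands byte-for-byte unchanged, with lineage metadata wrapped around it rather than mixed into it.

```python
from datetime import datetime, timezone

def ingest_raw(records, source_system):
    """Land records in the raw layer unchanged, wrapping each
    payload with ingestion metadata (source system and load time)."""
    loaded_at = datetime.now(timezone.utc).isoformat()
    return [
        {
            "_source": source_system,  # lineage: which system this came from
            "_loaded_at": loaded_at,   # when this batch landed
            "payload": record,         # the source data, untouched
        }
        for record in records
    ]

# Two hypothetical HR sources feeding the same raw layer
hcm_rows = ingest_raw([{"emp_id": 1, "hire_date": "01/02/2023"}], "hcm")
lms_rows = ingest_raw([{"emp_id": 1, "course": "GDPR"}], "learning_platform")
```

A real platform would land files in object storage and record this metadata in a catalogue, but the separation of payload from metadata is the same idea.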
Then, this data flows into the curated layer. Here, the data is cleaned and organised into a consistent data model. This could be as simple as ensuring all dates are in the same format and erroneous values are dealt with, but it also extends to putting the data into a defined, long-term useful data model. It does not mean that all the data is smashed together (as this diagram may suggest); it is merely combined or aggregated as appropriate for the data. There will likely still be multiple data sets at this point.
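The date-format example above can be sketched in a few lines. This is a toy version of a curated-layer cleaning rule, with the field name and accepted formats invented for illustration: normalise whatever arrives into ISO 8601, and flag anything unparseable rather than guessing.

```python
from datetime import datetime

def curate_hire_date(raw_value):
    """Normalise a date field into ISO 8601 (YYYY-MM-DD), treating
    anything unparseable as an erroneous value (here: None)."""
    for fmt in ("%d/%m/%Y", "%Y-%m-%d", "%d %b %Y"):
        try:
            return datetime.strptime(raw_value, fmt).date().isoformat()
        except (TypeError, ValueError):
            continue
    return None  # erroneous value: flag it rather than guess

curate_hire_date("01/02/2023")  # -> "2023-02-01"
curate_hire_date("not a date")  # -> None
```

The crucial point for the mesh is who writes rules like this: the People domain owner decides that "01/02/2023" means 1 February, not 2 January, because that's domain knowledge, not data engineering.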
The green box highlights the 'People data domain'. While the underlying infrastructure is owned by a Data Platform team (more to come in future), the data and how it's structured and processed at each level are the responsibility of a People (HR) expert, not a data person. The mesh, therefore, refers to the multiple, separately owned and maintained data domains that come together only in the business layer of this stack.
Decentralised data architecture
This is essentially the same point again: the architecture within each domain is determined by the domain owner, not some central team. That central team provides the platform and capabilities to make that architecture possible. It may even build that architecture on behalf of teams without the technical expertise to do so themselves.
So yes, 'decentralised data architecture' is partly included in the definition to earn points on our buzzword bingo board.
Self-serve
The principle of self-serve is relatively easy to understand: people should be able to access and use the data themselves. I'm surprised they didn't say 'democratise data access' because that feels nice and buzzwordy.
Anyway, all this does is complete the basic diagram we've had for a while, shown below:
There are two new layers added:
- Federation: this is just an implementation detail, but once you begin to address business use cases, consumers should see the data in the curated layer as if it were all in one place, rather than worrying about exactly what's where. Each domain could have its own database(s), but the consumers shouldn't have to know or care.
- Tech capabilities layer: as we've discussed before, these are just the capabilities that allow business users to actually work with the data. The most intuitive is the visualisation capability, which enables the team to produce graphs, tables, etc.
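The federation idea can be illustrated with SQLite's `ATTACH`, standing in for whatever federation mechanism a real platform would use (external tables, a query engine like Trino, etc.). All database, table, and column names here are made up: each domain keeps its own database, and a view presents them to consumers as one place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each domain owns its own curated database; attached in-memory for the sketch
conn.execute("ATTACH DATABASE ':memory:' AS people")
conn.execute("ATTACH DATABASE ':memory:' AS finance")

conn.execute("CREATE TABLE people.employees (emp_id INTEGER, dept TEXT)")
conn.execute("CREATE TABLE finance.salaries (emp_id INTEGER, salary INTEGER)")
conn.execute("INSERT INTO people.employees VALUES (1, 'Engineering')")
conn.execute("INSERT INTO finance.salaries VALUES (1, 70000)")

# The federation layer: consumers query one logical view and never need
# to know which domain database holds which table. (TEMP because SQLite
# only lets temporary views span attached databases.)
conn.execute("""
    CREATE TEMP VIEW pay_by_dept AS
    SELECT e.dept, f.salary
    FROM people.employees e
    JOIN finance.salaries f ON e.emp_id = f.emp_id
""")
rows = conn.execute("SELECT dept, salary FROM pay_by_dept").fetchall()
```

The consumer only ever sees `pay_by_dept`; the People and Finance domains remain separately owned and maintained behind it.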
Now, self-serve does not mean anyone can use this; getting the data from the curated to the business layer will require some data expertise. The point, however, is that this data expert could know nothing about People data and have never worked in the People domain, but can still trust that the data in that layer is of high quality, represents the single source of truth, and can be accessed without necessarily needing help from the experts in that domain (IAM permissions allowing).
Conclusion
We're back at the diagram we've used in the past but with more detail. The key thing, though, is that we've started to talk about who owns what, and this will naturally lead to a future discussion on the operating model.