This design pattern is as much about organizing teams as it is about software architecture
The growth of microservice adoption has caused a resurgence in popularity of some previously overlooked software design patterns. Many of these patterns have been mined from Eric Evans’ Domain-Driven Design, a book that’s as much about team structure as it is about software architecture.
And of these patterns, the Bounded Context is perhaps the most important to understand. As engineers, we have come to consider the Bounded Context to be a software architecture design pattern. But that’s because we’ve co-opted it a bit from its original usage. As used by Evans, the Bounded Context is as much an organizational pattern as it is a technical one.
That’s why I’ve come to view the Bounded Context pattern as a linchpin in understanding microservices. Not just how to build them, but really why we build them in the first place, and how they can make our organizations more successful. If we understand what Bounded Contexts are — if we adopt the Bounded Context mindset both technically and organizationally — then we can truly be successful in building our microservices architecture.
Why move to microservices?
To begin, let’s perform a little exercise. Ask yourself this question: Why do we build microservices in the first place?
Take a moment to think about it. What are the benefits that first come to mind? What are the main problems we should hope to solve? Jot down some answers, just to keep yourself honest.
Do you have your answer? Good. Read it back to yourself. Did you hit the standard technical benefits? Continuous delivery, scalability, polyglot environments, containers and clouds, and all of that good stuff? Great.
But did your top answer include anything about enabling your organization to operate more efficiently? It should. Because building microservices isn’t about realizing technical benefits. Really, it’s about gaining organizational benefits. Everything else is an implementation detail.
Monoliths = coupled code and coupled teams
As our monoliths grow bigger and bigger, productivity starts to wane. There are at least two major reasons for that.
Putting the brakes on our velocity
First, every engineering team is contributing to one giant codebase. As such, teams face an ever-growing likelihood that their code will conflict with others’ code. To help mitigate the potential issues this could cause, we institute procedures — code freezes, QA testing periods, release trains, etc. — that are literally designed to slow down our productivity.
Of course, these procedures keep features and improvements from being deployed in a timely manner. They also wreak havoc on engineers’ ability to focus on their teams’ priorities. If a bug is found during a testing period, the responsible team must context-shift and focus on resolving that bug. If a severe bug is found in production, the team has to not only fix the bug, but also jump through hoops to get it deployed by the next release train.
On-call duty becomes a boondoggle. If something goes wrong with our monolith, someone needs to be available — day or night — to resolve the issue. But who? Large organizations with large monoliths are generally faced with two choices:
- An incident-management team whose sole, sad, sorry job within the organization is to respond to issues caused by other engineers’ code, and figure out how to resolve them.
- A rotating on-call schedule, whereby each week some arbitrary engineer is assigned the sad, sorry job of becoming responsible for resolving issues that are most likely caused by code written by some other engineer, in some other engineering team.
(Mis)organizing our teams
Monoliths mess with our organizations in yet another way. Our entire organization is working on the same, large product. But we still need to break the organization into manageable teams. So we tend to look to functional roles to find team boundaries:
Unfortunately, this sort of organizational structure limits collaborative work. Rather than working together to solve the true problem at hand (e.g. how do we design, build, and maintain Feature X?) members of the different functional areas simply focus on their own part, metaphorically throwing their work over the fence when they’re done. The potential for collaboration and synergy — where the combined quality of the team’s effort is much more than the sum of the members’ individual contributions — is lost.
It is also rife with bottlenecks. When we organize our teams by functional area, then we will naturally have misalignment in priorities. Let’s say the product management team decided that our monolith’s checkout process needs to be revamped. They’ll schedule time with the design team to put together some mocks. At some point, the mocks will be finished and handed to the frontend team to implement. Of course, the frontend team will need APIs to be implemented by the backend team, so they’ll be blocked until that’s completed. Once the backend team prioritizes its work on the new checkout services, it finds that it needs help from the Database Administration (DBA) team. Which, of course, has its own priorities. So the backend team will be blocked until a DBA is freed up.
In a way, this organizational structure seems a bit like a poorly-designed, overly-coupled software architecture… doesn’t it?
Microservices = decoupled code, decoupled teams
By contrast, a microservices architecture enables team autonomy. It becomes much easier to form teams that are self-contained, that work efficiently together, and that aren’t constantly blocked by dependencies on other teams.
Teams can take full ownership over their work, from design to development to deployment. Each member shares in the responsibility for achieving their team’s goal, so they become incentivized to participate in more than just “their part”. I’ve worked with teams where product managers, designers, front-end, back-end and mobile engineers have gotten together to design product features, yielding far better results than could have been achieved by one person.
The team gains responsibility for its own artifacts once they’re deployed in production. This generally leads to higher-quality code that’s easier to troubleshoot. Why is that? Unlike with a monolith, teams tend to have a holistic view of the microservices that they own. So it’s much easier for the team to anticipate problems, to add good logging and metrics to troubleshoot problems when they occur, and to make proper use of resilience patterns (e.g., retries, circuit breakers, and fallbacks) to help avoid problems in the first place.
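To make those resilience patterns a little more concrete, here is a minimal Python sketch of a circuit breaker that retries a call and falls back when a dependency keeps failing. The `CircuitBreaker` class and its `failure_threshold`, `reset_timeout`, `retries`, and `clock` parameters are hypothetical names chosen for illustration, not the API of any particular library.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures, then serves the
    fallback without calling the dependency until `reset_timeout` elapses."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, action: Callable[[], T], fallback: Callable[[], T],
             retries: int = 1) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()      # circuit open: don't touch the dependency
            self.opened_at = None      # timeout elapsed: allow a trial call
                                       # (failures is not reset, so one more failure re-trips it)
        for _ in range(retries + 1):   # initial attempt plus `retries` retries
            try:
                result = action()
            except Exception:
                self.failures += 1
                if self.failures >= self.failure_threshold:
                    self.opened_at = self.clock()  # trip the breaker
                    break
            else:
                self.failures = 0      # success resets the failure count
                return result
        return fallback()              # retries exhausted or breaker tripped
```

The injectable `clock` exists only so the open/half-open behavior can be exercised without waiting in real time.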
Moreover, since teams have a full sense of ownership over their work, keeping their services healthy and running in production becomes less about a nightmarish release schedule, and more about nurturing their creation.
Finally, teams are working towards the same goal, on the same timeline. That means nobody sits blocked, waiting for someone in another functional area to free up.
We need to be intentional about autonomy
But we don’t get these benefits for free simply by breaking our monolith into microservices. Let’s take a look at our first, naive view of a microservices architecture:
If we’re like most engineers, our initial idea of a microservice architecture is, well, a bunch of microservices. Each exposes some sort of API (ReST, perhaps) to allow any other service to read from it and write to it.
As we gain experience, we learn that not all microservices serve the same purpose — or, at least, they shouldn’t. And therefore, much like our monolith had been arranged in layers, so we arrange our microservices:
At this point, we’ve defined the different types of microservices and applications that we want to build. Great. But we still haven’t made much progress in terms of team autonomy. Every microservice will need to be owned by some team. And so the question arises: which teams will own which microservices?
Cross-functional teams
Our first, naive approach might be to organize our teams by mimicking our monolith org structure:
Here, we see teams (in purple) organized by function: UX design, frontend engineering, backend engineering, data engineers, DBAs, QA, etc.
This might feel right, at least initially. But let’s take a step back and look at the value we’re trying to deliver to our customers. Is it our goal to build things like the following for our customers?
- A bunch of database schemas
- A bunch of user interface mockups
- A bunch of microservices that can talk to a MySQL database?
Not really. Those are just the tools we use to create value for our customers. The actual value that we provide our customers/users comes in the form of features and functionality such as:
- A product catalog to search
- A mechanism to place items in a shopping cart and subsequently purchase them
- A notification system to alert customers of the status of their purchases
So we don’t want to organize our teams by functional area. Rather, we should define our teams by the value that they create for customers; that is, across functions, in (the aptly named) cross-functional teams.
With cross-functional teams, everyone is working together to build a specific product or feature, from start to finish. Everyone on the team has the same objectives and priorities, so no functional area is blocked by another. Does the new backend API service require some database design work? Fine; the team’s backend engineer and DBA can both prioritize their work together.
At their best, cross-functional teams encourage members to collaborate throughout each phase of the project. Each team member contributes to the overall design of the feature. Frontend, backend and mobile engineers jointly define API contracts. Everyone tests. And everyone starts becoming well-versed in their particular domain.
And so, our team structures might start looking something like this:
That’s better. But something still doesn’t feel right.
Sure, we’ve formed teams that likely will be more effective in owning products. But we’ve still taken a top-down approach to identify the topology of microservices that our organization intends to build. We’re left with a large collection of interdependent microservices, most of which are coupled together. We’ve simply assigned them to different teams to build.
This leads to concerns such as:
- How can we create APIs that will meet all current and future needs that any client might have? Can we encapsulate our data when any of our services might be called by any other team’s services?
- How much time will we waste waiting for other teams to implement our dependencies?
- What failures of our systems might be caused by failures in other systems (cascading failures)?
- Can we control the number of calls our services might be involved in? Can we ensure that our organization doesn’t wind up creating boundless synchronous calls between services, leading to astronomical response times, or worse (and yes, I’ve seen this happen) infinitely recursive calls across services?
- What if our team’s specific feature or problem space isn’t well-suited for the pre-planned microservice topology?
We need yet a different way of thinking. Perhaps a pattern already exists for us to follow?
Enter the Bounded Context
The Bounded Context is a key design pattern born out of domain-driven design, or DDD. Understanding the Bounded Context helps us form autonomous teams and, by extension, autonomous microservice architectures.
DDD itself describes a methodology of software development in which individuals within an organization work together to define a common language. In his book Domain-Driven Design, Eric Evans frequently depicts engineers working with product owners to establish an agreed-upon vocabulary to describe things such as products, components of the products, actions that a product can perform (or can be performed on the product), parts of workflows, etc. This vocabulary encompasses the organization’s domain.
In many large organizations, however, defining a single, consistent vocabulary becomes unfeasible. In these cases, we break our domain into subdomains. Examples of subdomains might include:
- Inventory management
- Product discovery
- Order management
- Shopping carts and checkout
As designers, engineers, product managers, etc., get together to build out a subdomain, they form their own way of thinking and talking about the subdomain and its components.
This is where DDD meets cross-functional team structure. Though the team members are from different functional areas, they are responsible for their own subdomain, and they eventually become its resident experts. Furthermore, the team is responsible for determining which artifacts — microservices, web applications, mobile apps, databases, and related infrastructure — are needed to bring the subdomain to life, and to the organization’s customers.
We can think of the team and its artifacts as comprising a Bounded Context.
Defining the Bounded Context
While Evans discusses Bounded Contexts frequently in his book, he doesn’t really define the pattern explicitly. So I’ll attempt to do it here:
Bounded Context: An internally consistent system with carefully designed boundaries that mediate what can enter the system, and what can exit it.
In other words, a Bounded Context represents a context — essentially, a system that encapsulates cooperative components — with clearly-defined boundaries that govern what can enter the system, and what can exit it.
Cells (those little things that collectively make up all living beings) offer a nice analogy. Within a cell are all sorts of components (the nucleus, ribosomes, cytoplasm, cytoskeletons, etc.) that are all encapsulated within the cell itself. Surrounding each cell, however, is a membrane, which acts as the barrier between the cell’s internals and the rest of the organism. The membrane protects the cell from its environment, allows specific nutrients to enter it, and allows various byproducts to leave.
In the same vein, a Bounded Context consists of a variety of components (microservices, web applications, mobile apps, databases, message queues, etc.). It also serves as a logical barrier that encapsulates those components. Internally, the components can be coupled, and can freely pass data to each other. But the Bounded Context helps enforce loose coupling externally, defining explicit points where:
- external data can enter (perhaps via a consumer subscribed to a Kafka topic)
- internal data can exit (maybe via another Kafka topic, or via a well-designed GET API, carefully crafted to hide any internal system details)
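Here is a minimal sketch of what those explicit entry and exit points might look like in code. An in-memory `MessageBus` stands in for Kafka (an assumption made purely to keep the example self-contained), and the hypothetical `InventoryContext` accepts data only through a subscribed topic and releases it only through a published topic or a GET-style query that hides internal details such as warehouse bins. Topic names are illustrative.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class MessageBus:
    """In-memory stand-in for a broker such as Kafka (topic -> subscribers)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        for handler in self._subscribers[topic]:
            handler(message)


class InventoryContext:
    """Hypothetical Bounded Context: internal state is private; data enters
    only via a subscribed topic and leaves only via a published topic or a
    read-only query that exposes a public view."""

    def __init__(self, bus: MessageBus) -> None:
        self._bus = bus
        self._stock: Dict[str, dict] = {}  # internal model, never shared directly
        bus.subscribe("shipments.received", self._on_shipment)  # entry point

    def _on_shipment(self, msg: dict) -> None:
        item = self._stock.setdefault(msg["sku"], {"qty": 0, "warehouse_bin": msg["bin"]})
        item["qty"] += msg["qty"]
        # Exit point: publish a public event that omits internal details.
        self._bus.publish("inventory.changed", {"sku": msg["sku"], "qty": item["qty"]})

    def get_item(self, sku: str) -> dict:
        # Exit point: GET-style query returning only the public fields.
        return {"sku": sku, "qty": self._stock.get(sku, {"qty": 0})["qty"]}
```

Inside the class, components can touch `_stock` freely; outside it, other contexts only ever see the two topics and the public view.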
A Bounded Context represents its cross-functional team as well. The team consists of various team members (designers, frontend/backend/mobile engineers, product managers, data engineers and QA engineers, etc.). Internally, these members work cooperatively towards the same consistent goals. Moreover, these team members are (or should be) encapsulated so that they have minimal dependencies on other teams.
So rather than starting at an organizational level and defining all of the applications and microservices we expect to build, we build teams around our subdomains, allowing those teams to grow their subdomains and define what needs to be built. Done properly, this lets the various Bounded Contexts in the organization grow organically, rather than being imposed as rigid, predefined structures.
Implications on breaking the monolith
Conway’s Law tells us that organizations design software systems that mimic their organization structure. That often proves to be true, so we should be thoughtful about how we structure our organization as we begin building microservices.
Indeed, by now, a picture should be emerging in your mind. As we move from monolith to microservices, we should start thinking vertically (dividing the monolith by its subdomains) instead of horizontally (dividing the monolith by its functional layers).
We should be dividing things not like we do on the left, but as we do on the right
In other words, we shouldn’t start by replacing the monolith’s data access layer with data microservices. Rather, we should start by splitting out an entire feature (such as the checkout process, or perhaps product search). Each feature will represent a Bounded Context. And each will be split out by a dedicated cross-functional team.
Moreover, that team should focus on its task at hand, which is to either:
- faithfully replicate the existing functionality, or
- (better) build an all-new, improved experience for its customers.
As part of the process, the team should design the system that is best suited for the endeavor.
For example, we might decide to peel our product search functionality out of our monolith. The product search team might ultimately design a system that includes:
- Kafka consumers that listen to a number of external Kafka topics to update its own internal system of record (SoR) for products.
- a Kafka publisher that pushes changes to its SoR onto an internal Kafka topic
- another Kafka consumer that listens to that internal topic and updates an Elasticsearch index
- a GraphQL endpoint for freeform searches that queries Elasticsearch
- a ReST endpoint that retrieves individual products by ID
- a redesigned web application that uses those endpoints to allow customers to search for products and explore product details
- a similar set of screens in our mobile apps that use those endpoints
- a Kafka publisher that pushes messages, representing distinct queries performed by customers, to an external Kafka topic, for use by any other Bounded Context (say, analytics) that might be interested
What the design of our Product-Search Bounded Context, encapsulated in red, might look like
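The Product-Search design above can be sketched in miniature. This is an illustrative sketch under several assumptions: the `Log` class is a toy stand-in for Kafka topics with per-consumer-group offsets, a naive token index stands in for Elasticsearch, and the GraphQL and ReST layers are reduced to plain `search` and `get_product` methods. All topic and group names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class Log:
    """Toy stand-in for Kafka: append-only topics, offsets per consumer group."""

    def __init__(self) -> None:
        self.topics: Dict[str, List[dict]] = defaultdict(list)
        self.offsets: Dict[Tuple[str, str], int] = defaultdict(int)

    def produce(self, topic: str, msg: dict) -> None:
        self.topics[topic].append(msg)

    def poll(self, group: str, topic: str) -> List[dict]:
        start = self.offsets[(group, topic)]
        batch = self.topics[topic][start:]
        self.offsets[(group, topic)] = start + len(batch)  # commit the offset
        return batch


class ProductSearchContext:
    def __init__(self, log: Log) -> None:
        self.log = log
        self.sor: Dict[str, dict] = {}                      # internal system of record
        self.index: Dict[str, Set[str]] = defaultdict(set)  # token -> product ids

    def sync_products(self) -> None:
        # Consume the external topic, update the SoR, re-publish internally.
        for msg in self.log.poll("search-sor", "product-entry.products"):
            self.sor[msg["id"]] = msg
            self.log.produce("search.internal.products", msg)

    def sync_index(self) -> None:
        # Second consumer: (re)index each changed product by its name tokens.
        for msg in self.log.poll("search-indexer", "search.internal.products"):
            for ids in self.index.values():
                ids.discard(msg["id"])                      # drop stale tokens
            for token in msg["name"].lower().split():
                self.index[token].add(msg["id"])

    def search(self, query: str) -> List[dict]:
        # Freeform search; also publishes the query for other Bounded Contexts.
        self.log.produce("search.queries", {"query": query})
        tokens = query.lower().split()
        if not tokens:
            return []
        ids = set.intersection(*(self.index.get(t, set()) for t in tokens))
        return [self.sor[i] for i in sorted(ids)]

    def get_product(self, product_id: str) -> dict:
        # Lookup by ID, served entirely from the local SoR.
        return self.sor[product_id]
```

Note that `search` only sees a product after `sync_index` has run; that lag is a small-scale preview of the eventual consistency between Bounded Contexts.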
As we start peeling off more and more vertical parts of our monolith, other teams build out their own Bounded Contexts. These Bounded Contexts might wind up looking quite different from one another.
Each team determines how best to solve its task at hand
Notice that components within a given Bounded Context might be tightly coupled; however, we are keeping our Bounded Contexts decoupled from each other. In our example, any communication between Bounded Contexts happens by passing messages via a Kafka message queue. Importantly, we are avoiding synchronous request/response calls between Bounded Contexts.
This is also true with what remains of the monolith. We certainly don’t want tight coupling between our new microservices and our legacy monolith. So as we peel parts of the monolith away, we leverage message-passing to allow the remaining parts to communicate with our new Bounded Contexts.
Reality check on all of this decoupling
At this point, we may ask ourselves whether it’s really possible to keep our Bounded Contexts decoupled.
In the real world, can we really keep our teams protected from external dependencies? Will there never be instances where a team must be blocked by another team to get their work done?
And can we actually create service architectures for our subdomains that are completely decoupled from other subdomains? Is there truly no need for an application in one Bounded Context to ever synchronously call a service in another?
In reality, it might be impossible to keep our Bounded Contexts 100% decoupled. But we can get close, much closer than most of us might think.
Let’s start by looking at decoupled architectures. Often we buy into the fallacy that any given type of data should live in exactly one location, and that any other system must directly call into that one location to access the data.
We refer to this as assigning a Single Source of Truth (SSoT) to our data. But as described in this article that dissects the idea of SSoTs, that very notion is, by and large, an anti-pattern. Instead, most Bounded Contexts should store their own local copy of any data that they need to use.
This is exemplified by our Product-Search Bounded Context from the previous section. This Bounded Context, of course, relies heavily on our organization’s product catalog data. But odds are, that data is generated in a different Bounded Context (we’ll call it the Product-Entry Bounded Context).
Our first (naive) approach might be to expose a ReST API from the Product-Entry Bounded Context and force the services within the Product-Search Bounded Context to call that API. But we can do better. We can instead keep the systems decoupled by publishing the changes made by the Product-Entry services onto Kafka. Our Product-Search Kafka consumers then pick up those messages and update the Product-Search databases.
Note that those two Bounded Contexts are eventually consistent. This means that there will be brief periods of time where a given piece of data might be inconsistent between Product-Entry and Product-Search. For example, if the price of White Wombat Widgets is raised from $1.99 to $2.49, there will be a brief period of time (often a matter of seconds if not milliseconds) where there is a 50¢ difference in White Wombat Widget price across the two Bounded Contexts.
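The following sketch illustrates the White Wombat Widget scenario under simplifying assumptions: a `deque` plays the role of the Kafka topic, there is a single consumer, and `ProductEntry` and `ProductSearch` are hypothetical classes. Product-Search never calls Product-Entry; it only applies change events to its own local copy, so until `consume()` runs, the two copies disagree by exactly 50¢.

```python
from collections import deque
from decimal import Decimal
from typing import Deque, Dict


class ProductEntry:
    """Hypothetical owner of product data; publishes every change."""

    def __init__(self, topic: Deque[dict]) -> None:
        self.prices: Dict[str, Decimal] = {}
        self.topic = topic

    def set_price(self, sku: str, price: Decimal) -> None:
        self.prices[sku] = price
        self.topic.append({"sku": sku, "price": price})  # publish the change


class ProductSearch:
    """Keeps its own local copy, updated only by consuming change events."""

    def __init__(self, topic: Deque[dict]) -> None:
        self.prices: Dict[str, Decimal] = {}
        self.topic = topic

    def consume(self) -> int:
        handled = 0
        while self.topic:
            event = self.topic.popleft()
            self.prices[event["sku"]] = event["price"]
            handled += 1
        return handled  # number of events applied in this pass
```

Usage: after `set_price("white-wombat-widget", Decimal("2.49"))` but before the next `consume()`, Product-Search still reports $1.99; once the event is applied, both copies agree again.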
This brings us to the real-world cases where we have no alternative but to couple Bounded Contexts: those in which eventual consistency is not acceptable. For example, before a customer can complete their online purchase, we might need to ensure that every item in their shopping cart is, in fact, available at that moment. Even then, we can often minimize the coupling between the two Bounded Contexts.
Our interactions might look like this:
- As the customer uses the Product-Search UI to find products, the Product-Search databases are used to retrieve information (such as styles, customer reviews, pricing, etc.) about the products
- Even as the customer begins the checkout process, we still use the Product-Search databases to retrieve the information that needs to be displayed.
- Finally, when the customer clicks on that final “Complete Purchase” button, we make a single, synchronous call to the Product-Entry Bounded Context to validate the items’ availability before completing the purchase.
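The interaction above might be sketched as follows. The `Checkout` class and the `check_availability` callable are hypothetical; in practice that callable would wrap the single synchronous request to the Product-Entry Bounded Context, and here it is simply injected so the flow can be exercised without a network.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CartItem:
    sku: str
    quantity: int


class Checkout:
    """Hypothetical checkout flow: reads come from the Product-Search local
    copy; only the final step calls Product-Entry synchronously."""

    def __init__(self, local_catalog: Dict[str, dict],
                 check_availability: Callable[[List[CartItem]], Dict[str, bool]]) -> None:
        self.local_catalog = local_catalog            # eventually consistent copy
        self.check_availability = check_availability  # the one synchronous dependency

    def review(self, cart: List[CartItem]) -> List[str]:
        # Display lines built purely from local data: no cross-context calls.
        return [f'{i.quantity} x {self.local_catalog[i.sku]["name"]}' for i in cart]

    def complete_purchase(self, cart: List[CartItem]) -> str:
        # One batched, synchronous call at the moment consistency matters.
        availability = self.check_availability(cart)
        missing = [i.sku for i in cart if not availability.get(i.sku, False)]
        if missing:
            return "unavailable: " + ", ".join(missing)
        return "confirmed"
```

Everything before the final step reads from the local copy, so the cross-context dependency is isolated in a single batched call per purchase.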
Another common example that requires immediate consistency relates to authorization. In many systems, security tokens must be retrieved or validated on every request. In those cases, we probably need to allow our Bounded Contexts to call another, security-oriented Bounded Context.
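One way to keep that dependency contained is to funnel it through a single guard applied to every request handler. In this minimal sketch, `SecurityClient` is a hypothetical stub for the security-oriented Bounded Context (a real client would make a synchronous network call), and `requires_auth` is an illustrative decorator, not part of any framework.

```python
from functools import wraps
from typing import Callable, Dict, Optional


class SecurityClient:
    """Hypothetical client for a security-oriented Bounded Context. This stub
    looks tokens up in a table instead of making a network call."""

    def __init__(self, valid_tokens: Dict[str, str]) -> None:
        self._valid_tokens = valid_tokens  # token -> user id

    def validate(self, token: str) -> Optional[str]:
        return self._valid_tokens.get(token)


def requires_auth(security: SecurityClient) -> Callable:
    """Decorator: validate the caller's token on every request before handling it."""
    def decorator(handler: Callable[..., dict]) -> Callable[[dict], dict]:
        @wraps(handler)
        def wrapper(request: dict) -> dict:
            user = security.validate(request.get("token", ""))
            if user is None:
                return {"status": 401}  # reject without reaching the handler
            return handler(request, user)
        return wrapper
    return decorator


security = SecurityClient({"t-123": "alice"})


@requires_auth(security)
def view_order_history(request: dict, user: str) -> dict:
    return {"status": 200, "user": user}
```

Because every handler goes through the same guard, the cross-context call happens in exactly one place, which makes it easier to reason about (and later to cache or harden) than calls scattered across the codebase.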
Real-life org structures
How about self-contained, cross-functional teams? How possible are they in the real world?
In reality, it’s a process of continuous movement towards wholly self-reliant teams. Rarely will we ever reach 100% autonomy with our teams. But if we start by organizing our teams intelligently, and recognize and respond to bottlenecks that arise, we can come close.
For starters, we should maximize our vertical, cross-functional teams and minimize the number of horizontal, single-functional teams. That means resisting the urge to form so-called “core” teams — whose mission is to build common data services that are consumed by other product-oriented teams — and instead form our teams around the business value that they will provide.
Many organizations tiptoe towards this goal, first forming domain-oriented teams of product managers, and front-end and back-end engineers. That’s a start. But who else should these teams include? Exact membership might differ across different teams with different needs. But we should consider things like:
- If our team has front-end engineers, then odds are they should be working closely with a graphic designer who is dedicated to the domain.
- Mobile engineers — often sequestered into their own area of the org — should be included for domains with a mobile component.
- In her enlightening article about data meshes, Zhamak Dehghani laments that data engineers are often excluded from cross-functional teams — to the detriment of the data engineers, and the cross-functional teams themselves.
Once we’ve determined the membership of our teams, we should watch out for bottlenecks. Are there any other teams that habitually block our cross-functional teams’ productivity?
For example, many organizations have a dedicated security team. This is a good practice, of course; organizations need a cohesive security strategy, and a way to ensure governance over that strategy. However, it’s also common for teams to halt their work at various stages to allow security reviews of their work. Even in the best of situations, this establishes roadblocks for our teams as a routine business practice. Moreover, it will often lead to teams having to scrap all or part of their work and start over, as they uncover security requirements that hadn’t been met.
This is clearly a bad smell. But how can we enforce our organization’s security standards while allowing teams to remain autonomous and productive?
We can do this by adding security engineers to our cross-functional teams. There are three approaches we can take:
- If we’re lucky enough to have a relatively large security team, we can assign each cross-functional team a full-time security engineer (SE).
- With smaller security teams, each SE can be assigned to a number of cross-functional teams in a part-time manner. This would still allow the SEs to understand the teams’ objectives and designs, and to work with the team to adhere to the org’s security standards throughout the process.
- If we don’t have enough security resources for either, we can move in the opposite direction. Rather than bringing members of the security team into our cross-functional teams, we can bring members of the cross-functional teams out to the security team. Each cross-functional team would designate one or two security representatives. The representatives would, periodically, meet with security and be kept abreast of the organization’s security requirements and standards. They may not be security experts themselves. But they’ll be able to serve the role of a security engineer, ensuring that their teams adhere to the organization’s security practices.
This dovetails into another organizational pattern that has been gaining traction: guilds. The guild model was born from the cross-functional team model. By their nature, those teams are populated with members who specialize in different functions. Yet, it often makes sense for folks who specialize in a specific function to meet together as well; for example, to:
- Hone their skills and learn from each other
- Discover and establish best practices for their particular function
- Depending on the function, create company standards and requirements
Our last security solution effectively formed a “security guild”. Team members primarily worked with their vertical teams; but periodically, some of them would meet with the security “guild” to discuss the organization’s security practices and standards.
The guild model also works particularly well when it comes to software architecture. Particularly with a microservices architecture, some level of organization-wide technical governance is required. Yet having a group of architects sitting in a metaphorical ivory tower, handing out rules to teams, is generally counter-productive. Instead, senior/lead engineers from our cross-functional teams can periodically meet in an architecture guild. There, they can raise issues from their teams, work out solutions, and establish patterns and standards.
Examples of vertical cross-functional teams, supplemented by horizontal guilds
Guilds can be extended to nearly every other function, too. After all, we want our designers to be developing, and working from, a common UI style guide. We want our frontend engineers to use the same UI elements. QA engineers should be aligned with how testing is done in our organization. And product managers should be in sync with the organization’s overall product roadmap.
Bring it all together
Moving to microservices can dramatically improve the productivity of our teams. But we need to understand how to take advantage of a microservices architecture to get us there. Of all the design patterns and concepts that relate to microservices, the Bounded Context is arguably the single most important one to give us that understanding.
With a solid grasp of the Bounded Context, we understand that:
- Our organizational structure and our technical architecture go hand in hand
- Our product-oriented teams should have minimal dependencies on other teams, just as the systems they build should be decoupled from other systems
In general, embracing the Bounded Context puts us in the mindset that we need to be successful with our microservices architecture. Make sure you understand this important pattern before you embark on your microservices journey!