Team Topologies

I am reading this book as part of a book club at work.

The summary for the first part (chapters 1-3)

1 The problem with organization charts:

The software tends to match the organization charts (top-down) and that is not a good approach for software development because that doesnt cover (current) good software practices (devops, agile, lean, etc). This is based on Conway’s law: “Organizations which design systems . . . are constrained to produce designs which are copies of the communication structures of these organizations.”

So instead of using org charts, move to a “teams” organization approach that authors call “team topologies”: adaptive model for technology organization design that allows businesses to achieve speed and stability. Focuses on how to set up dynamic team structures and interaction modes that can help teams adapt quickly to new conditions, and achieve fast and safe software delivery.

Cognitive load can be a powerful tool for deciding on team size, assigning responsibilities, and establishing boundaries with other teams.

Dan Pink’s three elements of intrinsic motivation: autonomy (quashed by constant juggling of requests and priorities from multiple teams), mastery (“jack of all trades, master of none”), and purpose (too many domains of responsibility)

Personally, I think the org chart still has a point from HR perspective, at least I want to know who is who.And disagree when they say “there is a strong focus on the more immediate automation and tooling adoption, while cultural and organizational changes are haphazardly addressed”. People dont change tools that easily. And company culture hardly is checked out.

2 Conway’s Law and Why It Matters

The idea to become a high-performing organization is to reverse Conway law as a team-first software architecture. Then you need things team sized. The size needs to maximizes people’s ability to work with it. Important to define “team interfaces” to set expectations around what kind of work requires strong collaboration and what doesn’t (some type of communications are not needed). Tools selection is important, one doesn’t fit all. Unnecessary re-org (without any proper justification)/cut headcounts are bad (obviously).

3- Team-First Thinking

Follow: team-first architecture. Examples:

-US Army have adopted the team as the fundamental unit of operation. Based on scaling by Dunbar (5-7 people max)
-google: teams matter more than individuals.
-aws: team of size to feed with 2 pizzas)


Maximize trust between people on a team, and that means limiting the number of team members. Adding new people to a team doesn’t immediately increase its capacity (Fred Brooks – The Mythical Man-Month)

The best approach to team lifespans is to keep the team stable

Every part of the software system needs to be owned by exactly one team: no shared ownership

You need in the team:
– Team-First Mindset
– Diversity
– Rewards teams, no individuals
– Restrict Team Responsibilities to Match Team Cognitive Load more space for germane cognitive load (which is where the “value add” thinking lies).
team needs the space to continuously try to reduce the amount of intrinsic and extraneous load

Measure the Cognitive Load Using Relative Domain Complexity: mastering a single domain -> mission is clear, less context switching.

Limit the Number and Type of Domains per Team: simple, complicated, complex. Complex only one team,

Design the system and its software boundaries to fit the available cognitive load within delivery teams.

To increase the size of a software subsystem or domain for which a team is responsible:

  • team-first work env
  • minimize distractions
  • goals and outcomes (good) vs how (bad)
  • dev-exp

Teams need a team API and well-defined team interactions (aws: everything is an API): code, version, wiki/doc, principles, communication/information

As well, provide time, space, and money to enable inter teams collaboration

Explicitly Design the Physical and Virtual Environments to Help Team Interactions: I think the physical site is kind of clear (there are good notes) but for the virtual env they just mention group chats.

physical: office layout, whiteboards, standing desks?, noise, clear desk (lockers)
(squads -> tribes)

Summary: Limit Teams’ Cognitive Load and Facilitate Team Interactions to Go Faster

===

part2

In Part I, we saw the strong pull that Conway’s law exercises on system architecture by mirroring team structures and communication paths in the final product design. We also highlighted that efficient software delivery requires a team-first approach that relies on long-lived autonomous teams achieving fast flow. Part II will focus on how we put these two ideas together in a way that maximizes flow yet respects the cognitive limits of teams.

CHAPTER 4: Static team topologies

  • Instead of structuring teams according to technical know-how or activities, organize teams according to business domain areas.

Summary

  • Ad hoc or constantly changing team design slows down software delivery.
  • There is no single definitive team topology but several inadequate topologies for any one organization.
  • Technical and cultural maturity, org scale, and engineering discipline are critical aspects when considering which topology to adopt.
  • In particular, the feature-team/product-team pattern is powerful but only works with a supportive surrounding environment.
  • Splitting a team’s responsibilities can break down silos and empower other teams.

Notes:

  • team anti-paterns
    –Organizations must design teams intentionally by asking these questions: Given our skills, constraints, cultural and engineering maturity, desired software architecture, and business goals, which team topology will help us deliver results faster and safer? How can we reduce or avoid handovers between teams in the main flow of change? Where should the boundaries be in the software system in order to preserve system viability and encourage rapid flow? How can our teams align to that?
  • design for flow of change: Spotify: journey in progress, not a journey completed.
  • Shape team intercommunication to enable flow and sensing:
    — Organizations that value information feedback from live (production) systems can not only improve their software more rapidly but also develop a heightened responsiveness to customers and users.
  • devops and the devops topologies
    –A key contribution of DevOps was to raise awareness of the problems lingering in how teams interacted (or not) across the delivery chain, causing delays, rework, failures, and a lack of understanding and empathy toward other teams.
  • devops topologies
    –The DevOps Topologies reflect two key ideas: (1) There is no one-size-fits-all approach to structuring teams for DevOps success. The suitability and effectiveness of any given topology depends on the organization’s context. (2) There are several topologies known to be detrimental (anti-patterns) to DevOps success, as they overlook or go against core tenets of DevOps. In short, there is no “right” topology, but several “bad” topologies for any one organization.
  • successful team patterns
    –The success of different types of teams does not depend solely on team members’s skills and experience; it also depends on (perhaps most importantly) the surrounding environment, teams, and interactions.
  • Feature Teams Require High-Engineering Maturity and Trust
    –The feature team typically needs to touch multiple codebases, which might be owned by different component teams.
    –someone still had to keep oversight of the system as a whole and ensure subsystems integrated and interacted according to the desired user experience, performance, and reliability. –> create specific roles: system architects, system owners, or integration leads.
  • Product Teams Need a Support System
    –The key for the team to remain autonomous is for external dependencies to be non-blocking, meaning that new features don’t sit idle, waiting for something to happen beyond the control of the team –> Non-blocking dependencies often take the form of self-service capabilities
  • Cloud Teams Don’t Create Application Infrastructure
    — cloud team providing high-quality self-services that match both the needs of product teams and the need for adequate risk and compliance management.

-SRE Makes Sense at Scale
–The SRE model sets up a healthy and productive interaction between the development and SRE teams by using service-level objectives (SLOs) and error budgets to balance the speed of new features with whatever work is needed to make the software reliable.
–SRE is a dynamic balance between a commitment to operability from the application-development team and expertise from the SRE team. Without a high degree of engineering discipline and commitment from management, this fine balance in SRE can easily degrade into a traditional “us and them” silo that leads to repeated service outages and mistrust between teams.

-Considerations When Choosing a Topology

-Technical and Cultural Maturity

-Quadrant: Organization Size / Software Scale vs Engineering Maturity

-Splitting Responsibilities to Break Down Silos
–highlight the importance of thinking about teams’ capabilities (or lack thereof) and how that causes dependencies between teams.

-Dependencies and Wait Times between Teams
–three different categories of dependency: knowledge, task, and resource dependencies.
–Detect and track interdependencies.

-Detect and track interdependencies.

Summary: Adopt and Evolve Team Topologies that Match Your Current Context

CHAPTER 5: The Four Fundamental Team Topologies

  • The architecture of the system gets cemented in the forms of the teams that develop it
  • Four fundamental team topologies:
    –Stream-aligned team
    –Enabling team
    –Complicated-subsystem team
    –Platform team

There is no “Ops” team or “support” team in the fundamental topologies,

Summary

  • The four fundamental team topologies simplify modern software team interactions.
  • Mapping common industry team types to the fundamental topologies sets up organizations for success, removing gray areas of ownership and overloaded/underloaded teams.
  • The main topology is (business) stream-aligned; all other topologies support this type.
  • The other topologies are enabling, complicated-subsystems, and platform.
  • The topologies are often “fractal” (self-similar) at large scale: teams of teams.

Notes

-Stream-Aligned Teams
–A “stream” is the continuous flow of work aligned to a business domain or organizational capability. Continuous flow requires clarity of purpose and responsibility so that multiple teams can coexist, each with their own flow of work.
–The stream-aligned team is the primary team type in an organization, and the purpose of the other fundamental team topologies is to reduce the burden on the stream-aligned teams
“you build it, you run it” popularized by Werner Vogels, CTO of Amazon

-Capabilities within a Stream-Aligned Team
–This might mean having a mix of generalists and a few specialists.

-Why Stream-Aligned Team, Not “Product” or “Feature” Team?
–a stream should flow unimpeded.

-Expected Behaviors in an effective stream-aligned team
— several features:
–roduce a steady flow of feature delivery, uses an experimental approach to product evolution, expecting to constantly learn and adapt., minimal (ideally zero) hand-offs of work to other teams., evaluated on the sustainable flow of change it produces, must have time and space to address code quality changes (tech debt), proactively and regularly reaches out to the supporting fundamental-topologies teams (complicated subsystem, enabling, and platform).feel they have achieved or are in the path to achieving “autonomy, mastery, and purpose

-Enabling Teams
–An enabling team is composed of specialists in a given technical (or product) domain, and they help steam-aligned team with improving their capabilties (read, learn, practice new skills). It is a “technical consulting team” – gives guidance, not execution.

-Expected Behaviors
— proactivively seeks to understand needs of stream team, ahead of the tech curve, messenger of good/bad news, might act as a proxy for difficult services, promotes learning across teams
— Stream-aligned teams should expect to work with enabling teams only for short periods of time (weeks or months) in order to increase their capabilities around a new technology, concept, or approach.

-Enabling team vs Communities of Practice (CoP)
— Enabling teams and CoP can co-exist because they have slightly different purposes and dynamics: an enabling team is a small, long-lived group of specialists focused on building awareness and capability for a single team (or a small number of teams) at any one point in time, whereas a CoP usually seeks to have more widespread effects,

-Complicated-Subsystems Teams
–A complicated-subsystem team is responsible for building and maintaining a part of the system that depends heavily on specialist knowledge, to the extent that most team members must be specialists in that area of knowledge in order to understand and make changes to the subsystem.
–complicated-subsystem team is created only when a subsystem needs mostly specialized knowledge. The decision is driven by team cognitive load, not by a perceived opportunity to share the component.
–we expect to have only a few complicated-subsystem teams in a Team Topologies–driven organization

  • Expected behaviors
    –off-load work from stream-aligned teams on particularly complicated subsystems that need to be developed by a group of specialists.

-Platform teams
–enable stream-aligned teams to deliver work with substantial autonomy.
–The platform team’s knowledge is best made available via self-service capabilities via a web portal and/or programmable API — Ease of use.
–the platform can provide different levels of service.

-Expected behavior
–strong collaboration with stream-aligned teams to understand their needs.,fast prototyping techniques and involves stream-aligned team members for fast feedback

-Compose the Platform from Groups of Other Fundamental Teams
–ested or “fractal” teams within the platform—what we like to call inner topologies.

-Avoid Team Silos in the Flow of Change

-A Good Platform Is “Just Big Enough”

  • aim for a thinnest viable platform (TVP) and avoid letting the platform dominate the discourse.
  • Cognitive Load Reduction and Accelerated Product Development: The most important part of the platform is that it is built for developers.”

-Compelling, Consistent, Well-Chosen Constraints: Dev Experience (User experience UX)

-Built On an Underlying Platform (platform layers)

-Manage as a Live Product or Service: using software-product-management techniques

-Convert Common Team Types to the Fundamental Team Topologies

-Move to Mostly Stream-Aligned Teams for Longevity and Flexibility

-Infrastructure Teams to Platform Teams

-Component Teams to Platform or Other Team Types

-Tooling Teams to Enabling Teams or Part of the Platform

-Converting Support Teams: 1) support teams aligned to the stream of changes, and (2) dynamic cross-team activity to resolve live service incidents.

-Converting Architecture and Architects
–The most effective pattern for an architecture team is as a part-time enabling team (if one is needed at all)
–to help them achieve better outcomes and provide them the tools and technologies that will enable these outcomes.”

-Summary: Use Loosely Coupled, Modular Groups of Four Specific Team Types
–stream aligned, enabling, complicated subsystem, and platform.

CHAPTER 6

Summary

  • Choose software boundaries using a team-first approach.
  • Beware of hidden monoliths and coupling in the software-delivery chain.
  • Use software boundaries defined by business-domain bounded contexts.
  • Consider alternative software boundaries when necessary and suitable.

Notes