Observations on the practice of software architecture - I
“Simplicity and elegance are unpopular because they require hard work and discipline to achieve” — Edsger Dijkstra
Having a role that requires me to sometimes wear the "software architect's" hat at a relatively large organization has given me ample material to digest and reflect on the practice of software architecture, i.e. the design of software systems comprising multiple components owned by different teams. I am hoping to write a small series of posts that articulate (hopefully in an intelligible manner) why architecture is relevant in our day and age, why it should be seen as a core skill for teams (especially senior+ engineers), and that reflect on my own view of the practice. Since architecture is part of the broader process of building any non-trivial software system, let's start there.
So, without further ado, is software architecture needed?
As an industry we've been in crisis since, well... forever. The term "software crisis" first appeared in the aftermath of the 1968 NATO Software Engineering[1] Conference[2].
This state of affairs didn't prevent software from "eating the world" and being at the heart of the most valuable companies out there. The history of software development is marked by the trend of working at increasingly higher abstraction levels, from close-to-the-metal development in assembly and C to modern distributed systems leveraging elastic computing capacity and programmed in high-level languages.
In parallel with this technical evolution, the methodologies for managing software-intensive projects and products also evolved. The waterfall model[3], initially imported from other, typically process-heavy industries in an attempt to improve predictability (of both time and cost) and control over the software development process, treats software development as an industrial production problem. Its core idea is that development should follow well-defined phases in sequence: requirements "engineering", analysis, program design, coding, testing, and operations/maintenance. While this can be a valid approach in certain domains like aerospace or medical devices, the reality is that in most contexts software development is a learning problem rather than a production problem. Having to wait until all requirements are specified beforehand, or testing only after the code is "done", is a fool's errand and has been detrimental to our industry.
As a reaction to this, the late 90's and early 2000's saw the rise of "agile" software development methodologies[4], which put an emphasis on delivering working software in small iterations, with a pragmatic focus on what works in a given context versus strict adherence to a process, thus allowing teams to choose what works for them in their particular environment. The tectonic shift here is that these principles treat software development as an exercise in learning. These practices contribute to an empirical approach to software development that is invaluable in my view.
However, this clashed with the reality that businesses need predictability, control (or the illusion thereof), and some way of attesting that things are being done in a "good" way. To be fair, some software engineers have a hard time productively engaging with non-technical stakeholders and building a relationship of trust, so there is plenty of blame to go around.
Fast forward to 2024, and we continue our inexorable march to build ever more ambitious systems. I would argue that software design is more needed than ever.
We find ourselves in an interesting situation where a lot of organizations practice a sort of sclerotic fake agile[5]: they go through the motions of daily stand-ups, retros, sprints, and story points, but ultimately never reflect on those practices or question their value - and at the end teams are asked to produce a yearly plan.
While no one advocates that teams shouldn't stop and think, there are a couple of common traps that result from the toxic combination of a naive understanding of agile software development methodologies and the incentives at play in organizations. In particular, I've seen teams thinking they can "afford" to go from iteration to iteration, delivering whatever is in the next sprint without taking the time to step away, reflect, factor in new risks, and course correct. After all, they are following the plan, and if design is emergent and there is always the option to refactor (spot the cognitive dissonance?), why should teams invest time planning, designing, and projecting how current choices will play out? On the other hand, due to the pressures of fast-moving organizations, all sorts of interesting ad-hoc justifications as to why significant refactorings may be postponed enter the conversation.

In general this contributes to an environment where any degree of planning and design is viewed with suspicion. The process becomes very dependent on individuals' sensibilities of what "agile truly means", and I've seen teams reject even modest planning as "big design upfront"[6] - better to close those tickets, ship, and refactor later. At the end of the day, this set of pressures, plus the fact that agile deals primarily with the software delivery process and offers little to no guidance on how to bake design into it, creates fertile ground for strongly held opinions and egos coming into the mix. This is counterproductive; depending on your context, you should consider that:
- Not all choices are created equal: some carry significant path dependence[7] and may significantly constrain the team's ability to operate or evolve the system (and are thus hard to reverse). The choice of data store is a classic example: DynamoDB is very performant for queries against a primary key, range key, or index, but the moment you need to query arbitrary attributes or store more than 400KB per item, you may be out of luck[8].
- Some technical risks, if not addressed early on, may prove viability-crushing later - teams coding themselves into a corner is not unheard of. There is a balancing act at play between proactively addressing risks and the "You ain't gonna need it" principle. At the end of the day it boils down to context-specific bets on when and how to address these risks.
- Refactoring as a technique is invaluable, but it may not be sufficient at scale, especially when dealing with cross-team dependencies. For tightly coupled systems, refactoring may be exceedingly hard because changes from one team may require significant coordination with others.
- Embedding technical qualities like simplicity (modularity, loose coupling, clear "contracts" and sensible interfaces), testability, security, performance, and developer experience in the team's day-to-day work goes a long way toward ensuring that the cost of adding new features down the line is not massively higher than building those first few features. This, however, requires educating the team and fostering a trust-based, productive relationship with non-technical stakeholders in order to create the conditions for these qualities to be built and exercised.
- As organizations grow it's easy for the system as a whole to lose coherence. Teams operating autonomously (as they should) can easily fall into the trap of delivering services that are not really consistent with the goals of the organization, or that don't integrate well with what other teams have built - creating common ground and shared understanding requires investment[9]. Complex deliveries may take significant time, requiring contributions from stakeholders and teams with different skill sets (engineering, legal, marketing, UX, etc.). When operating at increasingly greater time spans of discretion[10] that outgrow what is controlled by a single team and take longer than a few sprints, a modicum of common ground and coordination is paramount to ensure that the joint action is coherent.
- Stories play an important role in how humans make sense of, and experience, the world. It is important to contextualize how the iterative work the team is doing fits the bigger picture, and how certain technical investments and practices may pay off. A good narrative that captures the problems that need addressing, the hypotheses being explored, and how the team is contributing to the broader organizational objectives can be a powerful instrument to elicit good feedback, or simply to get everyone on the same page. Technical excellence is a must-have, but just relying on a big dose of "programming motherfucker"[11] energy will not suffice outside of very specific cultures. Large-scale software development is a socio-technical endeavor and a team sport.
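The path-dependence point about data stores can be made concrete with a toy sketch. The snippet below is a deliberately simplified in-memory model (not real DynamoDB semantics; the `KeyValueTable` class and its methods are hypothetical names for illustration): lookups by the key the schema was designed around are cheap, while queries on arbitrary attributes force a full scan - the kind of access pattern a key-value store's original design never anticipated.

```python
# Toy in-memory model of a key-value store's access patterns,
# illustrating path dependence in data-store choice.
# (Illustrative only; not actual DynamoDB behavior or API.)

class KeyValueTable:
    def __init__(self):
        self._items = {}  # partition key -> item

    def put(self, key, item):
        self._items[key] = dict(item, pk=key)

    def get(self, key):
        # O(1): the access path the schema was designed for.
        return self._items.get(key)

    def scan(self, predicate):
        # O(n): the only option once you need to filter on arbitrary
        # attributes that the original key choice never anticipated.
        return [item for item in self._items.values() if predicate(item)]


table = KeyValueTable()
table.put("order-1", {"status": "shipped", "total": 42})
table.put("order-2", {"status": "pending", "total": 7})

fast = table.get("order-1")                             # cheap, by design
slow = table.scan(lambda i: i["status"] == "pending")   # full scan
```

The asymmetry between `get` and `scan` is the whole point: once the data model is baked in, every query it didn't anticipate pays the scan cost, which is why this choice is hard to reverse later.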
So we're back to the beginning - software engineering is still in crisis, and by now I hope to have convinced you that we do need some design - AKA architecture - in our software systems. It's a real thing and not complete bullshit when done right.
In the next post I am going to cover some of the common anti-patterns that usually arise when we think about Software Architecture (note the capitalization) and offer some of my own observations on the subject. Charity Majors' article[12] - spicy as it may be - provides a lot of good food for thought, and there is a lot to agree with - but there is also plenty more to be said about this.
Footnotes
- On the term "Software Engineering", see this very good essay by Hillel Wayne.
- I find that this varies a lot with the experience of the team. Usually more mature folks have a more nuanced and more pragmatic approach. But considering how much our industry has grown, the odds are that teams will tend to have younger folks, who may lack some of the maturity of industry veterans - and the stereotype of the lone, Red Bull-drinking code-slinger (coupled with ageism) still resonates (FYI: if you subscribe to that idea, then sorry to rain on your parade, but this is a team sport).
- When in doubt, PostgreSQL is a great choice.