Narrative 001: Do you need a reliability strategy?


Let’s be direct:
Do you care about your users? Your business? Continuity, perhaps?
Then the long story short is: yes, you need a reliability strategy.

I know. It depends. It always does.

But insight into reliability is valuable, especially when you can access it before things break.

Perhaps the cracks are now showing in your service or some of the popular buzzwords are showing up in a roadmap nearby? Either way, it's a good time to talk about reliability strategies.


I find this especially relevant if you're working with:

  • Containers and Kubernetes

  • Internal developer platforms

  • A team large enough to fight over merge conflicts

  • CI/CD pipelines that break more than they build

  • [if you're not able to ship your ideas from cradle to grave with ease, add more reasons]


We are also here because in the early days (what do I even know :-p) functionality was the big priority. That still matters in context, but modern software environments are more complex and expectations are at an all-time high.

So, do any of these sound familiar?

  • You're running software critical to your business

  • You have customers, or even just one user who matters

  • You're on the modern cloud (AWS, GCP, Azure, etc.)

  • You're building or scaling a team

  • You're dealing with sensitive data

  • You're not sure how your system behaves under load—or during failure

If yes to any of the above, then it's worth exploring a reliability strategy.


Making this happen depends on a lot of factors, but it doesn't have to be complex.

At any scale, reliability is everyone's responsibility, and I believe an investment in discovering what it takes to start or keep it going is absolutely worth it. I also think reliability (SRE) is greatly misunderstood because we have been reading the books by Google while trying to be Google, and when we weren't trying, we didn't experiment enough. Btw, they are great books and what they started is still a game changer. Shout out to them always.


So what does a strategy look like?

There’s no one-size-fits-all answer.
Strategies come in different shapes and sizes but what matters is that they’re intentional and grounded in reality.

Here are just a few of the questions a good reliability strategy should help you answer:

  1. What kind of value do you provide? Who’s depending on it and what breaks or who is affected if they don't get what they need?

  2. Where does your software run? How important is location to fulfilling the value you provide? Does it even matter where it runs?

  3. How many people work on or depend on the system? And how do they collaborate?

  4. What are your biggest sources of failure today? (Be honest. And if you know and can share, how much does it cost you?)

  5. How do you currently detect problems and how fast can you recover?


The answers won’t give you a complete strategy, but they’re a start.
They give shape to the system you’re working with and show you where to focus your efforts. 
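
Question 5 is one of the easier ones to put numbers on. As a minimal sketch, assuming you keep even a rough log of incidents with three timestamps each (when it started, when it was noticed, when it was fixed), you could compute how long detection and recovery take on average. The `Incident` type, the field names, and the sample data are all made up for illustration; they are not from any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical record shape; adapt to whatever your incident log holds.
    started: datetime   # when the problem actually began
    detected: datetime  # when someone (or something) noticed
    resolved: datetime  # when the service was back to normal


def mean_duration(durations: list[timedelta]) -> timedelta:
    """Average a list of durations; returns zero for an empty list."""
    if not durations:
        return timedelta(0)
    return sum(durations, timedelta(0)) / len(durations)


def summarize(incidents: list[Incident]) -> dict[str, timedelta]:
    """Mean time to detect and mean time to recover, both measured from the start."""
    return {
        "mean_time_to_detect": mean_duration(
            [i.detected - i.started for i in incidents]
        ),
        "mean_time_to_recover": mean_duration(
            [i.resolved - i.started for i in incidents]
        ),
    }


# Made-up example data, only to show the shape of the calculation.
incidents = [
    Incident(
        started=datetime(2024, 1, 10, 9, 0),
        detected=datetime(2024, 1, 10, 9, 20),
        resolved=datetime(2024, 1, 10, 10, 0),
    ),
    Incident(
        started=datetime(2024, 2, 3, 14, 0),
        detected=datetime(2024, 2, 3, 14, 10),
        resolved=datetime(2024, 2, 3, 14, 30),
    ),
]

summary = summarize(incidents)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Two numbers like these won't be a strategy either, but they turn "how fast can you recover?" into something you can track over time.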
Btw, let’s not forget that a good strategy isn’t just a set of policies or tools. It involves multiple people and roles, so part of the agenda is to drive a shared understanding.

There’s also no shortage of modern techniques and tools that can support your approach, but remember: tools complement a strategy; they don’t replace or complete one.


TL;DR

If you care about uptime, trust, and delivering value, you need one.
But it can (and should) reflect your scale, maturity, and architectural constraints.
