DevOps has a different meaning depending on who you’re talking to; even definitions that appear similar differ in nuanced but important ways.
All “devops” as a job title has done is muddy responsibilities and give many folks the wrong impression of what an operations discipline should be.
There is also a lot of rewriting of history that gets thrown in, similar to how when people talk about cloud then the only alternative is to start making CPUs by hand and begin building your own nuclear reactors. It’s the idea of what came before, not the reality, that people seem to be defensive of.
It’s honestly exhausting to discuss.
So instead I became CTO so I can solve this mess properly. I don’t hire devops; I hire infra engineers, build engineers, release engineers and backend engineers.
Roles so simple that you already have a clue what they do, which is sort of the point of job titles.
"Yea the DevOps guy messed up the widget and nobody notic---"
"Wait, what is the DevOps guy doing even touching that widget.... what is even DevOps to you?"
"Bro that widget IS DevOps."
-silence-
Same applies to the topic of "micro-services".
I never miss an opportunity to share my favorite piece of comedy this decade:
I think that's where most people go wrong. They put a bunch of services in front of a shared database, which means that they don't have to go through a service's API to get to its data, and that's what breaks everything.
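To make that concrete, here is a minimal sketch of the difference. Everything in it is made up for illustration (InventoryService, OrderService, the in-memory dict standing in for a database); it only shows why reaching past a service's API into its data lets callers break rules the service was supposed to enforce.

```python
class InventoryService:
    """Hypothetical service that owns its data and exposes it only via methods."""

    def __init__(self) -> None:
        # Private storage; stands in for the service's own database.
        self._stock: dict[str, int] = {}

    def add_stock(self, sku: str, qty: int) -> None:
        if qty <= 0:
            raise ValueError("qty must be positive")
        self._stock[sku] = self._stock.get(sku, 0) + qty

    def reserve(self, sku: str, qty: int) -> bool:
        """Reserve stock if available. The 'never negative' rule lives here."""
        if qty <= 0:
            raise ValueError("qty must be positive")
        available = self._stock.get(sku, 0)
        if qty > available:
            return False
        self._stock[sku] = available - qty
        return True

    def available(self, sku: str) -> int:
        return self._stock.get(sku, 0)


class OrderService:
    """Hypothetical caller that only talks to the inventory API."""

    def __init__(self, inventory: InventoryService) -> None:
        self._inventory = inventory

    def place_order(self, sku: str, qty: int) -> str:
        return "accepted" if self._inventory.reserve(sku, qty) else "rejected"


def place_order_via_shared_db(shared_stock: dict[str, int], sku: str, qty: int) -> str:
    """Anti-pattern: writes straight into another service's table.

    Nothing stops the stock from going negative, because the rule that
    prevents it lives in InventoryService.reserve, which is bypassed here.
    """
    shared_stock[sku] = shared_stock.get(sku, 0) - qty
    return "accepted"
```

With the API in the way, an order for more than is in stock gets rejected. With the shared table, the same order is "accepted" and the stock count silently goes negative.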
http://resources.1060research.com/docs/IntroductionToResourc...
There was an angry Ask HN post a while back about how terrible the "new guys" are that are out there.
It then complained that the new hire wrote a horrible authentication service.
I thought it was an intentionally absurd post about the expectations put on some rando new guy to write something important / something they shouldn't be working on alone ... but they were serious.
Ok, I've seen this, but IME (24y in industry, the last 6 as a consultant) in the vast majority of cases, it's more like "things devs should control but CAN'T [bc CICD etc are silo'd and owned jealously by an overburdened ops team unable or unwilling to facilitate self-service]".
Agreed. For some applications the cloud difference is significant; for many (most?) others though, "Cloud" is just rebrand of "Hosted". And even for more cloudy offerings, while I'm in a very specific and different part of IBM, some of the old timers/architects/powers-that-be keep trying to explain "We had that in 1969!!!" :-D
Agreed also at rewriting of history when it comes to development/support/operations models. My dad has been IT director and he chuckles when I talk to him about "new and exciting paradigms" which he of course sees as turning a circle to what they had in 70's and 80's :)
As someone with 20+ years in IT, I agree - a lot of these "new and exciting paradigms" are not new at all.
My personal favourite is how many large multi-nationals are now building in-house clouds?
WTF is the difference between an "in-house cloud" and a shared-use datacenter from the 1990's?
A couple million in salaries and bonuses.
The interface to the shared-use datacenter, if you're lucky, is a spreadsheet that declares the static resources you own and a remote hands guy that can tackle things beyond the capabilities of your remote KVM. If you need more capacity you need to work with the datacenter folks to order physical machines that might show up in a few months.
The interface to the in-house cloud is an API. In most instances, developers are completely abstracted away from the physical infrastructure and don't need to take a lock on some human in the datacenter to get their work done.
An in-house cloud sounds like the mainframe installed in the raised floor computer room at the school district office I worked at in the late 90s; of course, the 12 foot long Unisys mainframe was replaced with a Unisys 4U pentium pro box pretending to be a mainframe, and then there was a lot of extra floor space.
If you're running your own 'cloud' in a (shared) colo, I dunno that that's really in-house. I guess it's still 'private cloud' though.
“There aren’t any new problems. Just new engineers”
- A sig I read a long time ago
Can't call a mainframe a bunch of buzzwords like "Hyper-converged, high availability, on-premise software as a service cloud platform".
I still have the exact same problem with respect to never having exactly the ratio of CPU to memory that would make my app happy.
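A rough worked example of that mismatch, as a sketch only. The catalogue of machine shapes below is invented for illustration (it is not any provider's real price list), and "least fraction of capacity left unused" is just one possible way to rank the candidates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    name: str
    vcpus: int
    mem_gib: int


# Hypothetical catalogue: each family has a fixed CPU:memory ratio.
SHAPES = [
    Shape("compute-4", 4, 8),
    Shape("general-4", 4, 16),
    Shape("memory-4", 4, 32),
    Shape("compute-8", 8, 16),
    Shape("general-8", 8, 32),
    Shape("memory-8", 8, 64),
]


def best_fit(need_vcpus: int, need_mem_gib: int) -> tuple[Shape, int, int]:
    """Return the least wasteful shape that fits, plus stranded vCPUs and GiB."""
    candidates = [
        s for s in SHAPES
        if s.vcpus >= need_vcpus and s.mem_gib >= need_mem_gib
    ]
    if not candidates:
        raise ValueError("no shape is large enough")

    def waste(s: Shape) -> float:
        # Fraction of CPU left idle plus fraction of memory left idle.
        return (s.vcpus - need_vcpus) / s.vcpus + (s.mem_gib - need_mem_gib) / s.mem_gib

    chosen = min(candidates, key=waste)
    return chosen, chosen.vcpus - need_vcpus, chosen.mem_gib - need_mem_gib
```

An app that wants 6 vCPUs and 40 GiB only fits on "memory-8" in this catalogue, which strands 2 vCPUs and 24 GiB. The ratio you need is almost never the ratio that is for sale.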
Especially seniors and beyond. They force leetcode interviews they somehow pass or are grandfathered through and gatekeep "trash devs" by slamming gotchas about how the whiteboarded code discussion doesn't technically compile and how "returnAverage()" isn't a method ("did you know you have to write methods before they work? What does returnAverage() supposedly even do? Return a random character in the alphabet?")
I get excited about working in different tech areas. I'm exceptional at fixing others' bugs and maintaining code. Absolutely nobody seems to be hiring for this though. It's all about college exam trivia and leetcode.
When I do get into these companies, I have to work with people throwing around casual racism with HR joining in and f-slurs like it's corporate 4chan. I ask if anyone can help look at a critical bug I discovered, and nobody speaks up. Even when it turns out it was their last code change that caused it and they are the sole master expert in this area, and they were just working on organizing their desktop instead.
"I can try to look at the SQL problem, since nobody else spoke up." Then I'm explaining the basics of SQL to some guy sitting around blank faced and it turns out he was hired because of his 10+ years of expertise in SQL. His whole job is to tackle problems like that.
Meanwhile I'm out here looking for jobs when panic cuts happen and it takes forever because I'm drained from all the gotchas and gatekeeping in these interviews.
There's no way I could live with myself if I put this experience onto others. I lose sleep when I fail to call out someone talking over a quiet person on my team. I've never seen anybody else stand up for anybody, however.
I just kept asking questions until I found the one she missed.
I also ended up becoming a bespoke VxWorks admin because the guy who volunteered was never available, and someone asserted my code wasn't working well on VxWorks, so of course I had to know enough to do benchmarks. I fixed a few problems but the real issue was the hard drive wasn't doing DMA due to the kernel not recognizing the processor revision number.
Somewhere between those two events I realized that if my part of the project is great but the whole project is on fire, nobody cares about my stuff. I don't get points for being right and the team being wrong. I mean, I do for some people, but I still feel bad at the end of the project, and those are the 'points' I have to live with the most.
The way that played out on the latter project is that once we got our shit together, we started work stealing from our peer orgs. Volunteering to carve off little pieces of interface between us and 'take care' of this bit of data handling here and that one over there. I came to realize that the Org subconsciously knew this and any past success they had was due to self-organization and collective work-stealing.
Throw people at it, turn a blind eye to precise mission statements, and hope for the best.
If you truly enjoy this part of the job, you can go for a freelance career - build a network and reputation as the person you call to solve the hard stuff, and do mostly that, no leetcode interviews required...
Anyway, it sounds like you've worked in some really shitty companies / shitty people.
At smaller companies the title tends to be only a title and many people are basically full-stack and know a bit of everything. And you're trading hats to get jobs done.
This is a great way to do it. There seems to be a correlation between unnecessary product complexity and unnecessary corporate complexity. Being direct about roles goes a long way towards simplifying corporate complexity.
That's Conway's law.
"Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure."
I'm sure we're due for a new one soon which will follow the same path, since people seem to be admitting that Devops has problems on here more and more. Will that cause deeper reflection? Probably not for many orgs, because soon some consultants will dream up a new brand of silver bullet to slay the immortal monster of organizational dysfunction and that's much more exciting.
My favorite example is that of a restaurant. You hire waiters, dishwashers, bartenders, line cooks, prep cooks, hosts and hostesses, and managers. And your waiters might be 100% focused on taking orders during prime dining hours, but at the end of the night, they help clean up, assuming other roles. Your bartender might be busy making drinks, but they take a table or two if they can. Your prep cook might finish the prep early, and come on the line to help out. Your manager focuses on expediting everything and keeping the ship running smooth, but also can fill holes at times.
After a lot of discussion, we mostly realized that we get the most value if we define more specific operational roles. We now have the idea of infra-ops, and product-ops. infra-ops is providing a deployment platform - a container runtime and persistences. product-ops on the other hand is responsible for deploying and running the different products from development on this deployment platform.
And this is giving people good ideas. Some products are very simple without harsh requirements. In these cases, one of the backend devs just takes over the product-ops role by setting up a deployment pipeline, a job and some migration handling. Other products are bigger - for example we are providing some of our systems as essentially managed systems to individual tenants and customers. In such a case, parts of the product operations are with the product team - such as writing jobs, releasing containers and artifacts - and other parts of the product-ops role take place in the managed services consultancy. These products are currently looking at wordings to split up the product-ops role according to their needs in their context. And that's totally fine - the infra-ops role is also divided depending on the system the engineer is working on. A postgres admin is an infra-operator, for example.
And all of this results in a devops oriented culture of cooperating across team boundaries through automated processes.
Then came DevSecOps. Around that time, I switched to just Security, and while I miss the ability to make and push changes to hundreds of thousands of machines to get things done, I don't miss any of the pressure or blame that automatically got lumped onto the sysadmin shoulders every time anything went wrong, and the complete lack of appreciation of all the times nothing went wrong, that were entirely the result of a tireless, efficient systems administration team.
It's like they'd never heard a BOFH tell someone, "Go away now, or I will replace you with a small and uncomplicated shell script."
1. Push code to an automated pipeline
2. I understand that the automated pipeline may need input from me in order to run successfully.
Beyond that, you need other people building the "platform" that they're deploying to (for the most part). Probably the ideal example here is Heroku.
Your devs shouldn't be required to do all of the actual ops stuff. They need access to an ops platform.
1.) What's probably the traditional view which is breaking down the walls between devs and ops, developers carrying pagers, etc. I.e. at least in an idealized world, there are no devs and ops--only DevOps.
2.) As you suggest (and which probably more closely matches how "DevOps" works especially in larger organizations), an internal (or external) operations team provides a platform that developers can use. Developers are still going to be exposed to some operational details, but a lot of them are abstracted away.
This becomes a lot more apparent once you're dealing with B2B customers and their nonfunctional requirements.
We've had situations in-house when devs built some simple microservice to handle some connection to a customer's BI/DWH system - it's just 2 days of spring boot chugging to wrangle APIs around without state, nothing bad. But then, the customer started blasting that team with SLAs, backup questions like RTO, RPO, retention, regulation adherence of retention, the whole stack of IT security down to the physical access control... That poor PO was caught just like a deer in the headlights.
This is some operational responsibility we're taking over for our dev-teams. We're providing the persistence, and this includes a defined backup and recovery strategy, a security strategy and such. And this also includes experience in dealing with these insane questionnaires, and correctly pricing absurd requirements for custom backup solutions. And in fact, "we will have to schedule a discussion with ops about this" has ended quite a few of these requirements with "oh.. it's not that important". Intimidation with long job titles and appeals to external authority do work.
Two intelligent, sincere, experienced tech people can discuss devops, in detail, for a good length of time, and still be talking right past each other without really noticing.
It's ok to have build-engineer doing release engineering if it's simple enough, but doing build engineering properly (Source control, artifact control, shared caches, dependency management of the compiler outside of the standard toolchain) is quite a large job.
Release engineering itself gets complicated when you take into consideration the different target platforms, most games get published on Playstation, Xbox and one of a handful of PC platforms (notably steam), plus the online systems and of course the development environments and internal release systems which are ubiquitous.
Build Engineering usually is the step between developers and QA, and release engineering is the bit after QA.
Both interact a lot with QA so could be rolled into the same role.
The problem the term 'devops' is addressing, the way I understand it, isn't that of insufficient individual jack-of-tradesness, but that of too much separation between (internal) organisations. It should be perfectly possible to build a "devops" team from deeply specialized experts and turning it into a job description seems quite a stretch to me.
But you might actually want some jack-of-trades types nonetheless, because those isolated organizations weren't completely without merit: they are good at solving that problem affectionately called the bus factor. A single ops guy in a devops team is a single point of failure and the mitigation preparation of keeping some of the not-so-ops peers sufficiently in the loop will be much more dependent on organic motivation than the counterpart in a specialist company branch. There, substitutability would be much easier to ensure with formal process.
It doesn’t matter as long as communication is open.
Proper ops is all about reducing the bus factor. Procedures don’t depend on people, they depend on being good at communicating.
If your ops person got hit by a car, I would expect that everything would continue running, I would also expect to find some clear documentation, this is mandatory.
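One cheap way to check this, sketched under assumptions: keep a map of which people can actually run each system (here a hand-written dict with invented names), and flag everything only one person can operate. Taking the minimum per system is a simplification of how bus factor is usually defined for a whole project.

```python
def bus_factor(operators: dict[str, set[str]]) -> int:
    """Smallest number of people able to run any one system (0 if none listed)."""
    return min((len(people) for people in operators.values()), default=0)


def single_points_of_failure(operators: dict[str, set[str]]) -> dict[str, str]:
    """Map each system that exactly one person can run to that person."""
    return {
        system: next(iter(people))
        for system, people in operators.items()
        if len(people) == 1
    }


# Hypothetical inventory of who can operate what.
operators = {
    "deploy pipeline": {"ana", "bo"},
    "postgres backups": {"ana"},
    "dns": {"ana", "bo", "cy"},
}
```

Here bus_factor(operators) is 1 and single_points_of_failure(operators) points at the postgres backups, which is where the documentation and cross-training effort should go first.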
Some teams historically worked fine together, others didn't.
I think the tools we have these days are much better (for ops) to ship what works on developers' machines.
But if your organisation had a culture of not working together then devops didn't really do much.
and anyway, you've leaned into my point about it meaning different things to different people.
I do work on Cloud Infrastructure at times, but honestly it's the smallest part of what I do. It's usually addressed in the architectural designs and I have to touch it incrementally. What I do much more often is writing tools, daemons, and services for distributed systems. I end up calling myself a Distributed Systems Software Engineer to reflect the idea that what I work on is systems and software, and most of them are non-monoliths (from a systems perspective).
Do you have any thoughts on Systems Engineers or the title that I prefer to call myself?
There are people who might refer to you as a backend programmer (if the focus is services and daemons), or platform engineer (if the focus is developer velocity).
Largely it depends what your primary focus is.
Fewer titles can avoid these discussions.
If they're not doing it, or it was overlooked for some reason, then that's an issue but not really an argument to not do it.
That won't be fixed with a new job title.
I always thought DevOps the “function” just meant this, and being a DevOps engineer at a small company meant you did these with decreasing emphasis, where by the time you’re in the backend it’s just helping enforce logging, tracing, other observable components.
Has worked for me in hiring and being hired and almost everyone I know understands this.
We have a way to prostitute words in this field... agile, QA, devops.
And more importantly, the backend engineers should be encouraged to be grateful for the PRs (even if it’s not acceptable for some reason). The more eyes on the code, the better.
I think developers naturally tend to specialise (at least for a time) but ultimately we all need to understand and contribute to code regardless if it’s front end, backend, build, test, …
1000% this.
I think it's an instance of how the conversation in software always talks about solutions as if they're end-all-be-all answers, rather than explain a problem in specifics and why that made the solution the right answer at the time.
Then that "solution" becomes a buzzword version of itself, and popular buzzwords are tools leaders use to overcome institutional inertia. Which is a good thing, to get over that inertia. But then what comes next misses the insight and understanding that eventually turned into a buzzword.
I am interviewing right now and my previous experience includes devops roles and SRE roles, so I get contacted for both by recruiters. After hearing about the responsibilities and examples for these roles I can only come to the conclusion that titles are a waste of time. Even the "level" of a title of Staff, Principal, Lead, etc are a waste of time. Just call the job what it is and if you can't decide on what it is at least to an 80% level, maybe you're asking too much for that job?
jan [at] competition [dot] company
If you want more information on the project we're working on the site is https://rennsport.gg
We're building a hard-core racing simulator game with a backend which can persist car ownership in a way that feels authentic. (IE; not just tied to a game).
This is always a dead giveaway that something is a buzzword
Same for REST. Sometimes when people use it, it just means JSON + HTTP requests. Other times it's supposed to be some kind of architectural style.
Then they also tend to have to do their own security and spec out their own tickets and then do Project Management.
Most tech people should be called DevSecOpsBizAnalystProjectManager.
This makes a lot of sense, but you need to realize that this is only a tiny part of a successful organization. There are many setups like the one you describe, across many industries, that have failed for reasons beyond the role definitions. It's not enough to establish the organization. You have to keep the behaviors in check over time when you hire a build engineer with different aspirations.
There's your problem. You have people who build stuff without caring where and how it runs. Recipe for disaster.
There's a reason to have a team of people doing this "DevOps" work. Just like we have a team of people who do SRE. It creates a standard and a single point which all work flows through. Then you don't wander onto a new project only to realize they use $BESPOKE_DEPLOYMENT_METHOD because "it's what we used 6 months ago". Or worse, you don't have a developer playing with a massive, nuclear powered, foot gun like Terraform and accidentally destroying infrastructure.
Making DevOps/DevSecOps/$BUZZWORD the responsibility of developers is a cost-cutting measure not a responsibility measure.
If you're building a crud-app in a common framework with low volume, sure you can toss that over the wall.
In theory, devops shouldn't be responsible for maintaining the performance of code. The specification should say what it should run on, the devops guys set up a pipeline and manage that thing, and the developers are the ones taking heat for not hitting that goal. If devops guys are taking the heat for that it sounds more like cost-cutting measures flowing the other direction.
And in some ways this is by design, I want to have some distance between dev and ops because it gives me the freedom to rearrange infrastructure transparently. I can move workloads between Lambda, ECS, and EC2 based on the observed performance characteristics without anyone being the wiser.
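A minimal sketch of what "based on the observed performance characteristics" could look like. The Observed fields and every cut-off number below are assumptions picked for illustration, not AWS limits or recommendations; real decisions would also weigh cost, cold starts, networking and so on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observed:
    """Hypothetical metrics collected for one workload."""
    p99_duration_s: float    # 99th percentile run time of one request or job
    peak_memory_mib: int     # highest memory use seen
    avg_utilization: float   # fraction of time the workload is busy, 0.0 to 1.0


def place(w: Observed) -> str:
    """Pick a runtime from observed behaviour, using illustrative thresholds."""
    # Short, small, mostly idle work: pay per invocation.
    if w.p99_duration_s < 60 and w.peak_memory_mib <= 1024 and w.avg_utilization < 0.2:
        return "lambda"
    # Moderately busy work: containers on shared capacity.
    if w.avg_utilization < 0.7:
        return "ecs"
    # Busy nearly all the time: dedicated instances.
    return "ec2"
```

Because developers only see "my service runs", the output of place() can change from one week to the next without anyone on the dev side having to care.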
They’re tools for the job, like your compiler and the language you use to program.
That’s like saying “I like to use Python and couldn’t care less for Java”. It’s fine to disagree with the team’s choice of tools, but one needs to eventually commit to the choice, even if it’s not your preferred one!
There’s an old adage that “if you’re writing clever code then you may not be clever enough to debug it”. It’s true! Operations requires deep understanding of the code running in production, the business rules and the customer. The person who will operate your code will eventually be smart enough to develop it entirely too, eventually cutting out the “dev”. I’ve personally seen this happen over and over.
If you don't want to think about AWS S3, GitHub Actions/Jenkins, Terraform, etc, .... then we need to work together. All those tools and services exist because all the software developers are sitting in their sandbox, and don't want to come out and play. The systems and tools that we run your code with... suck. A lot. We need programmers to make the systems better. We (in Ops) are a little busy with trying to just figure out how to run your apps without them falling down. We don't have a lot of time to reinvent the state of the art of computer systems.
For example, we need a distributed operating system. Not some fucked-up kludge of a monolith of microservices overseen by a company that has more engineers than brains... but an honest to god, stable-ABI, simple, composable, stable, general operating system. We need Linux to come out of the box, ready to run distributed applications, in a way that doesn't require a PhD. Once we have that, then you - yes, you, the developer! - will be able to make applications that automatically scale so easily that we will never need to utter the phrase "container" ever again. You will rarely ever need us again, because the system will just be so simple, so general, that anybody who can use the terminal can build and deploy applications without ever learning anything outside of your programming framework.
But we need you to make that distributed operating system. Until you do, we will just have more stupid kludges, more bizarre unnecessary complexity, in the futile attempt to constrain all the crazy shit we want to do with technology, while trying to run your apps for you. Please, I'm begging you - put me out of a job.
Same shoes, but I have different perspectives. I can figure out where it's not working and if it's my/our team's area of responsibility then we go fix it.
Since we handle infrastructure (as code) and deployments in team, along with all the development, most of the problems are handled by us. Unless it's clear that it isn't, e.g. some API that we consume that keeps throwing 500, can't fix that.
Our operations is helped by 100s of automated tests and 1000s of metrics. I always thought that this is DevOps, but it sounds different from what most of the people here are alluding to.
My background is large multinationals so my view here is a bit biased, but I don't think cost cutting is the driver.
Large orgs get large change management processes and procedures. Over time, these change management teams become overwhelming behemoths with minds of their own.
I think "DevOps" was designed as a way to "bypass" the bureaucracy?
"We just use this CI/CD pipeline and no need to sit on a 3 hour change management review call..."
It becomes just robots pushing around paper for compliance.
When I design a backend system, I need to think about how the front end developers are going to interact with it. My data storage characteristics and scaling. I need to know am I designing anything that’s hard to deploy. How will logging work and be aggregated. I have to be able to think about the entire system.
It’s not just “cost cutting”; at a certain point in your career you are expected to know more than just “how to code”. I’m not saying learn AWS. But I would expect any senior developer to know about what their code runs on top of.
Most developers have to do Frontend, Backend, and Ops. These have wildly different mindsets and feedback loops and not enough time exists. Don't hate the player, hate the game. The orgs are fucked, not the workers.
There are only so many hours in the day, it’s hard enough to stay on top of my core skill set.
The word "scam" is overused. It's a scam if it was a bait-and-switch. It's not a scam if it was advertised this way up front. In the latter case it's just a job whose description you don't like.
I'm not sure why this happens but it does and I don't like it (as somebody who's very M-shaped).
Letting them wade into others' responsibilities often introduces a lot of that danger.
System administration is an entire discipline with a history and way of thinking and approaching problems at a systems level. Software engineering is the same, but with an approach that often emphasizes handling problems in the application.
Letting solutions be put into applications that are best handled at the system level (where system isn't just an OS, but might be a complex network of them), or put into the system when they could easily be handled by the application, causes inefficiency and problems.
The solution is to either hire only very accomplished people that can do both and make the right decision of where to put solutions, or hire people that can do one or the other and put them together with a few (maybe one) person that has a good grasp of both and can make executive decisions when the right time and place is for specific solutions (an architect).
The latter is easier to hire for.
I'm also not convinced that the average junior or mid level application dev needs to know where or how their stuff runs beyond some high level concepts (i.e. our app uses auto scaling).
I've done some, and it feels like reading an arcane text in another language. I've seen the thing I want before, so I know there exists a magical YAML incantation that brings it about but I have no idea how to even start looking for it. It devolves to an extremely slow feedback loop of "did that work? No."
I've never found anything that explained how I'm supposed to create a systematic approach to creating anything I need.
"That didn't work, and the traceback (if I read it) isn't very helpful. Guess I'll try something else I found on SO."
You learn ops by doing it, preferably in an environment where when you inevitably do it wrong, it doesn't break prod.
In the old days, developers would also configure the design of the system it ran on.
"Alright, we're going to use some fast CPU front end nodes with smaller, faster disks, and then have a load balancer, our database server is going to have the most RAM, now let's set these resources up on a CDN to serve faster, this memacache server is really going to lighten the load..."
I think the idea was to use tools that automatically did all this work for you, but it just ended up creating another tier of people that configure this side of it.
Though mind you, the article touches on - but doesn't go into - the real reason things go to shit. It's not ops, it's not devops, because your knowledge silo can too, and in my experience (that's longer than the writer's, lol) _will be_ pressured.
If the estimates are consistently too low (because honest ones would render many - maybe most - projects non-viable), the requirements are often bullshit (because otherwise the lies in the estimate would be too obvious, and everyone needs to agree on the lie) and the rewards are not the ones the job claims them to be (in every workplace, your job is to improve your resume. That's all. That's the only thing that, materially, matters for your career. I can care about software I write, but if anything I end up punished for it) - then no amount of renaming things and shifting org charts around or motivational reading for managers is going to fix that. Nor will a Cloud or the latest orchestration software.
You don't want a doctor who brags that she can do heart surgery, perform talk therapy and treat your skin rash. You want someone who knows a lot about your specific problem because she's prioritised it over other things.
Perhaps most of our organizations are using the wrong tools and that the way we divide responsibilities is the problem.
Imagine if:
1.) instead of general purpose languages being used by product developers we had domain-specific languages and tooling that were agnostic to things like deployment, memory-management, data storage
2.) computer engineers built the domain-specific languages and tooling and focused on things like deployment, memory-management and data storage.
DevOps is generally solving for general purpose development while developers are generally solving domain-specific problems with general purpose tools. This seems to be a large part of the disconnect!
I’m currently working on a small team of computer engineers who support a custom Turing-complete DSL for an “operations” team that both programs and directly interfaces with clients.
It doesn’t seem that different from what Apple or Microsoft do, which is write custom languages and tools for developing on their platforms, albeit specifically for general purpose programming.
Why would someone who wants to build stuff not care about where and how it runs? Operational requirements are the same as any other requirements.
Spot on, but not only that, also decisions made based on number of hands in a room.
It wasn’t always like this. There was a time when we talked about languages and OSes and libraries as if they made a difference in how much you could get done with as few people and as little cognitive load as possible (the claims were very much overrated, but the point was we acted like it mattered).
And then it started ballooning. It seems to me that much of that coincides with a lot of new money being dumped into the economy, and software moving past the necessary-evil-so-how-do-I-drive-down-my-costs and on to the gold rush of you must have a web app for your service that rivals a video game in complexity and visual finesse. It seems to me that as long as there is so much free money sloshing around venture capital funding, it was inevitable that the process of making software would complexify to soak up the extra cash. After all, you can only add so many levels and varieties of managers to add value. After that comes the variegation in roles of software development.
That’s my working theory anyway.
Couple that with a social drift toward hyper-specialization. That sort of hierarchy naturally creates minds that don't think at the systems level. This is the general theme of Buckminster Fuller's Operating Manual For Spaceship Earth: the shift toward hyper-specialization (or narrow focus) has long-term disastrous consequences.
When you couple inexperience with narrow focus, you get messes.
Part of this is to blame on the industry-standard thinking that because someone works for Company X, they're a competent, logical engineer (and should be granted authority/responsibility over essential products/projects). This is the downfall of "code tests" and "whiteboard coding." They don't evaluate for systems-level thinking, and so a developer with poor creativity and logic skills slips through the cracks because they're great at eking performance out of a function, which impresses the lollipop guild.
[1] https://blog.cleancoder.com/uncle-bob/2014/06/20/MyLawn.html
There are thresholds of effort for certain things at largely constant levels. E.g.: designing a new language, build tool, web application framework, or database all have some minimum effort that needs to be invested. In the past, there just weren't enough developers in the entire world to "overdo" these things, so there was a relatively small pool of languages, tools, and frameworks to choose from.
Now, individual corporations have armies of junior developers spitting out frameworks and query languages like a machine gun. There's so many now that you or I haven't even heard of 99% of them!
As the number of available developers grows exponentially, so does their capacity to "reinvent wheels". Their ignorance of existing wheels they could be reusing grows exponentially also. The result is an exponentially exploding set of ad-hoc, incompatible systems.
In my recent semi-DevOps, semi-Cloud-Engineer role I've come across an absolutely bewildering array of tools even when I've generally restricted things to one cloud and one language's ecosystem. Heaven help you if your project has multiple languages!
Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure.
— Melvin E. Conway
The problem isn't necessarily software itself, but how we organize people (more than 2 or 3), how we communicate, how we mirror operations, expectations, etc. in software. Scaling sustainable software feels like an "unsolved" problem, because society hasn't figured out how to organize better.
It's because of growth, software development today is both simpler and way more complex than before and that's entirely due to growth in the sector. In many ways it's simpler than it ever was, I'm writing an API using Lambda/API Gateway in AWS and it blows me away how quickly I am able to get services stood up and configure my API. But in another way it's so much more complicated, for this same API I spent 2-3 weeks experimenting and researching IAM roles and how all my AWS resources would interact with one another.
I would say the floor of software development has become way simpler, deploying a site to Netlify is way easier than dealing with webservers of the past. But while the floor is lower than ever before, the ceiling is in the stratosphere, with extremely complex systems that you can string together in the "Public Cloud".
> for this same API I spent 2-3 weeks experimenting and researching IAM roles and how all my AWS resources would interact with one another.
same experience here, and trying to understand/test un/poorly documented AWS behavior with public/private APIs and EC2 resources has been a huge time sink :(
My problem with Make and similar tools is I don't want to learn more ad-hoc syntax to accomplish something nearly trivial. But I don't know if there's any alternative.
Now I am working at a large org on an internal application. There is an ops team. I push code to GitHub, and Jenkins runs the CI and deploys it to a dev environment. I push a button in Jenkins to deploy to production. In two years there has been one ops-related issue, where I had to bump up the memory limit from 2GB to 4GB. There is a single server and a database. The setup works fine for a few thousand concurrent users. My skills are in development and understanding business reqs, not mucking about in config files. There are people who are good at that and enjoy doing it; let them handle it.
I'm glad you said this. 2-dev shop here and we've used Heroku the last five years. We spend all our time developing and supporting our application and almost never worry about infrastructure. I keep getting tempted to move to K8s; Heroku is expensive for what we get and seems to have suffered some serious brain-drain. I've dabbled with K8s in side-projects and I can geek out on all the terraform and yaml stuff for days. But it's probably not a great idea to inflict it on a 2-dev shop.
So to prevent themselves from ending it all, they invent things to do to keep themselves sane while they churn out CRUD apps all day long for decades. Sometimes you get truly amazing software, but most of the times it's just reinventing the wheel, but worse.
Even worse, some of them become infatuated with doing things "Like Google," and errant CTOs enable them, leading to a 6 person team supporting a piece of software that could be replaced with WordPress, to the betterment of the primary stakeholder.
I've been consulting for over a decade now. The number of times a potential client has told me they've spent into the six figures (or eight) for something that free software does out of the box but better is depressing. The worst part? I usually can't convince that person they've been absolutely swindled, and so I have to let them keep on their merry way.
Or they spend the spare time bullshitting on web forums.
It’s a laughable over reaction (pun unavoidable).
These are manic states the dev community enters, and all pragmatism is lost in our discourse on the solution.
Cooler heads are not prevailing. There is certainly a solution in the middle.
Devops is bullshit is also an over correction. It’s worth saying that making infra more accessible is a democratization of that entire sub-field. How do we preserve the good part of that?
Are you talking about interest rates? Thanks.
Even if you're wrong, it's an interesting theory.
Is probably one of the better explainers.
Yes, there is a network of small to medium sized ones.
> things that were near impossible to build in the past.
Were they necessary in the past?
Meanwhile people are now spending millions of dollars and years of person-hours building MVPs in the cloud that won't ever go anywhere because the business model sucks.
Focus on delivering value to your customer first. How you get there is quite literally irrelevant to your customer.
This is us. We have a couple thousand customers in a B2B space with large revenue per customer. Growing rapidly and our node.js stack is running on a tiny EC2 instance and RDS server.
I spent a lot of time in the last couple years trying to fight back against attempts to overcomplicate this by using new services & new tooling. I just see it all as stuff that's going to slow down the single most important thing to us from a business perspective - which is writing new features to optimise our internal workflows - and give us more headaches from an operational perspective.
Even the stack we're using feels ridiculously over the top and complicated compared to a basic LAMP application. Everything in node.js feels like a huge pain in the ass. If there's not a node module you can install to immediately solve your problem (which of course just adds a different set of problems) even writing basic things feels exceptionally arcane and time consuming compared to a basic PHP implementation of the same thing. Maybe it's just a side effect of our dev team size (currently very small, only five full time devs) but the overhead of it all just feels like it's not worth it.
This is my theory too, and as someone learning to program (hopefully as a career), I'm very worried what a new paradigm of tightening will do to the field. I feel like LOTS of people who think their stack is secure are going to get dropped because at the end of the day, lots of the B2B SaaS doesn't actually deliver anything to the world of material needs.
I don't like this hyper-expansion either, but when you have an army of monkeys on typewriters, well, a lot gets "produced".
As a result, it's relatively easy to find someone who advertises as "full stack devops" who has never actually operated any infrastructure more complex than a LAMP webserver cluster. And it's hard to find a senior sysadmin who has enough years of experience to understand and troubleshoot all of your infrastructure from layer 0 up. People with that experience have moved into management or consulting or retirement, and there are no jobs for new folks.
I worked at a place that had an elastic search cluster with 30 nodes, because they kept hitting open file limits, when 8 servers could easily handle the traffic with basic tuning.
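For the open-file-limit part, here is a minimal sketch of how to see what limit a process is actually running under, using Python's Unix-only `resource` module. Whether raising this limit was the "basic tuning" in question is an assumption on my part; the `describe` helper is just for illustration.

```python
import resource  # standard library, Unix-only

# RLIMIT_NOFILE is the per-process cap on open file descriptors.
# getrlimit returns a (soft, hard) pair; the soft limit is the one enforced.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit: int) -> str:
    """Render a limit value, mapping the 'no limit' sentinel to text."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


print(f"open files: soft={describe(soft)} hard={describe(hard)}")
```

If the soft limit turns out to be a small default, that is worth ruling out before adding nodes.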
And I would say that your specific example is a failing of that team, not "DevOps" as a whole.
In my experience, people who want to do interesting work on networking and systems (think building large-scale networks and systems with high resilience) don't want to work for smaller companies anymore.
They either work at:
A) highly specialized companies which fit their niche. (ISP's or IXP's for instance).
B) do consulting and project based work for a lot of companies who have very specific requirements. (Think, building an identity provider system for a country etc).
Small scale complexity has been disappearing because everyone just throws hardware at the problem instead of thinking about how to architect their infra "correctly".
Why optimize your OS's scheduler parameters to get more performance from your filesystem when you have nearly free VC money and a budget to burn on resources in AWS?
Also, the number of devs who have had to work in shops with really crappy on-prem infrastructure setups is amazing. I guess I have just been lucky to work at places with solid infra.
Some interpretation of DevOps may be BS, e.g.
>> Need a database? File a ticket with DevOps.
That does seem like a bit short of the mark. E.g. that should be automated provisioning in 2018, never mind in 2022.
Counter points as to why DevOps is not BS:
Today there's just an acceptance that you version your code in VCS; it wasn't always that way. "Hey, this doesn't look right, I know you said you based your change on gui-app.latest-final2.zip but was that Steve's or Laura's version of latest-final2?". If you didn't work through this period you'll struggle to believe how common it was for shipping products not to be fully up to date in VCS or not to have VCS at all. DevOps changed this.
Continuous integration? No, there were people hired as "merge masters" or "build managers", I promise I am not making this up. DevOps changed this; the idea that you wouldn't do at least CI or perhaps CD is unexpected today.
Deployment automation? Sure, you email it to the ops team, send a few more emails with attachments late on Friday with amendments and hope that they deploy the right one. Automated deployment as far as the developer was concerned. The ops person on the other end? Sure they had a batman's belt full of hand crafted tools and scripts but it was definitely pets not the cattle DevOps has made us strive for.
Testing automation? The testers sit on level 3 not next to dev on level 4, there's about 4 banks of desks over by the cupboards, that's all the testers. They have lotus notes databases with checkboxes to confirm when they test something. If they find an issue a regression report will arrive with the dev team in under a week.
I could go on and on. Platform teams are great when used correctly. You can say something similar about DevOps.
At uni back in ~2000 was the first time I ever heard of version control. It was a bit of a curiosity, nobody seemed to be using it.
In 2004, engineering at the place I worked (you've heard of it) were developing PL/SQL in production with no VCS.
In 2007, I worked in IT at the same place. They were developing in SharePoint without VCS. When I asked if I could use VCS, I was told to use a 1996 or so Visual SourceSafe which was more temperamental than Subversion.
The first place I worked that used VCS was in 2008, and only because the devs had started using it against the wishes of the principal developer.
The world moves fast and slow at the same time.
> That does seem like a bit short of the mark. E.g. that should be automated provisioning in 2018, never mind in 2022.
Serious question, as I think this has been part of my thought process in the challenges of platform engineering: what does it mean to automatically provision a database?
I can think of lots of different examples that are insufficient in one way or another (I think I'm mostly talking UX here, and how many questions the user has to answer in one way or another / infrastructure as code, not should the user have to apt-get install postgres, which should I think rather obviously be automated.) But if infrastructure as code is defined as automation, this can conflict with the developers who don't want to learn terraform and thus still leads to "file a ticket with devops"
There will be some mandated platform choices so that the org has a fighting chance of managing complexity and knowledge/skills within the db platform team.
I’d expect you’ll have to specify in a web form or maybe an api call that you need a document db / oltp / olap and you want it in region A and of size medium and you want an indefinite expiry date on this lease. I wouldn’t expect to have much more freedom than that, e.g. backups, point in time restore options etc will all be standardised. I’d expect to be immediately given an appropriate pre-built instance that had been provisioned and kept warm waiting for the next request of a db of these specs.
That allows the developer minimal friction but also allows the db platform team to say we provision a postgres for document db use cases not a mongodb because it simplifies platform ownership (and we just mail a link to some docs explaining how to use a postgres as a document db to the user account that requested the instance).
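A rough sketch of what that request flow could look like, assuming a warm pool keyed by engine, region and size. The class names, the region and size labels, and the engine chosen for olap are all hypothetical; only the workload kinds, the indefinite lease, the standardised backups and the postgres-for-documents choice come from the description above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Platform-mandated mapping from use case to engine. Only the
# "document -> postgres" choice is from the comment; the rest is assumed.
ENGINE_FOR_WORKLOAD = {
    "document": "postgres",
    "oltp": "postgres",
    "olap": "columnar-warehouse",  # hypothetical placeholder engine
}


@dataclass(frozen=True)
class DbRequest:
    workload: str                      # "document", "oltp" or "olap"
    region: str
    size: str                          # e.g. "small", "medium", "large"
    lease_days: Optional[int] = None   # None means an indefinite lease


class WarmPool:
    """Pre-provisioned instances waiting for the next matching request."""

    def __init__(self) -> None:
        self._idle: Dict[Tuple[str, str, str], List[str]] = {}

    def add(self, engine: str, region: str, size: str, instance_id: str) -> None:
        self._idle.setdefault((engine, region, size), []).append(instance_id)

    def claim(self, request: DbRequest) -> dict:
        if request.workload not in ENGINE_FOR_WORKLOAD:
            raise ValueError(f"unsupported workload: {request.workload}")
        engine = ENGINE_FOR_WORKLOAD[request.workload]
        idle = self._idle.get((engine, request.region, request.size), [])
        if not idle:
            raise LookupError("no warm instance for these specs")
        # Backups and restore options are not negotiable per request.
        return {
            "instance_id": idle.pop(0),
            "engine": engine,
            "lease_days": request.lease_days,
            "backups": "standard",
        }


pool = WarmPool()
pool.add("postgres", "region-a", "medium", "pg-0001")
lease = pool.claim(DbRequest(workload="document", region="region-a", size="medium"))
print(lease["engine"], lease["instance_id"])
```

The developer answers four questions and gets an instance back immediately; everything else stays a platform-team decision.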
That's me. I did that.
Once you peel away the jargon, and tbf doing this is a skill that requires practice, non-techies do think ops work is cool.
But I don’t like this rift. There are many ops people having a good understanding of product and vice versa. It’s just the stereotype which gets reproduced all the time, including in this blog post.
Sure, if management doesn’t get that there is value (and cost!) in proper ops work and having product management attached to it, then all is turned to “DevOps” anyways
Wait until you meet DevSecOps...
ProAnalDevSecOps
The job of DevOps is not to close tickets. That'd be like driving a car by shouting directions at someone lying on the floorboards holding a wrench to the steering pinion.
The job of DevOps is to build a steering wheel (and ideally, teach SWEs how to drive... at least enough that they understand what a "road" is and why it's a pleasant experience for everyone if you stay on it. If the road doesn't go where they need to be, then it's time to file a ticket, but that ticket had better be "Build a new road," not "Offroad this one car to the cabin in the woods and call it a job well done").
The raw hardware of an enterprise deployment is so flexible it solves nobody's problem. DevOps is in the business of writing the operating system for a mega-computer physically represented by hundreds to possibly millions of heterogeneous computers. It's a process of continuous growth to make that work.
You want us to run and manage your software? Sure, here's a checklist of what it has to conform to. Oh it's unstable? We will no longer run it for you, here's the pager back.
This is definitely a symptom of a broken model and not what I would call devops. IMO the most important tenet of devops is "if you build it, you run it," meaning the appdev team that decided to use Postgres RDS is the one getting woken up at 2am.
It's also, in my experience, one of the best ways to reduce masturbatory engineering decisions and get people to focus on picking boring technology that works. Coding up a serverless application in Rust that's using a CockroachDB backend at a Python/MySQL shop would get a lot of engineers excited, but those people would be less excited knowing they're going to be the ones paged at 2am when this new and exciting architecture falls over in an unfamiliar way (as opposed to Python/MySQL, where a wealth of operational knowledge at the org has already been built up).
Similarly, it naturally reduces architectural complexity. Younger senior engineers love drawing boxes of queues, multiple microservices, event buses, etc to show off their skill in creating the ultimate engineering fantasy, but once you throw enough late night operational incidents at a senior engineer, suddenly the preferred architecture becomes "an executable running on a box that I can SSH into when things go wrong."
Am I the crazy one here? This kind of work is my bread and butter as a "devops" person. PagerDuty fires, I find the bad query, match it up with the most recent PRs to find where the n+1 got introduced and either patch it right there at 2AM with the on-call manager's approval or roll it back. Then we have a postmortem in the morning with the team.
I'm the person positioned the best to do this work because I'm god in my little ops domain, have the most visibility and the biggest toolbox of potential fixes.
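For anyone who hasn't chased one, here is a minimal sketch of what an n+1 looks like and how a query log exposes it, using an in-memory SQLite database. The schema and function names are made up for illustration; a real incident would involve an ORM and a production query log rather than `set_trace_callback`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO posts VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c');
""")

queries = []
conn.set_trace_callback(queries.append)  # record each SQL statement that runs


def titles_n_plus_one():
    """One query for the authors, then one more query per author."""
    result = {}
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (author_id,)
        ).fetchall()
        result[name] = [title for (title,) in rows]
    return result


def titles_joined():
    """Same result from a single LEFT JOIN."""
    result = {}
    rows = conn.execute(
        "SELECT a.name, p.title FROM authors a "
        "LEFT JOIN posts p ON p.author_id = a.id ORDER BY a.id, p.id"
    ).fetchall()
    for name, title in rows:
        result.setdefault(name, [])
        if title is not None:  # authors without posts yield a NULL title
            result[name].append(title)
    return result


queries.clear()
slow = titles_n_plus_one()
n_plus_one_count = len(queries)

queries.clear()
fast = titles_joined()
joined_count = len(queries)

print(f"n+1 version: {n_plus_one_count} queries, joined version: {joined_count}")
```

Both functions return the same data; the difference only shows up in the query count, which is why it tends to get noticed at 2AM rather than in code review.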
Speaking personally, I used to work on a Platform Engineering team for a major multinational that rhymes with ay-do-bay. And their requirements for the workloads that needed to be built and hosted were very unique because you have teams running completely different languages and toolchains, esp. from acquisitions. I can argue there that the template K8s setups wouldn't work.
In the author's case, the only reason why they feel the way they do is because we finally have some sense of standardization on what Infra looks like for most companies. It's no longer a question for most folks if they should adopt Docker, we have accepted images into our lives. Same for K8s (after a certain point of scale). So uh... yea. Catchy blog title to sell some thin layer on top of K8s in which it doesn't solve the root issue that they are talking about.
When ops gets woken up at 2am because someone in dev cut corners, what happens? Does dev feel the pain? Almost never.
The same thing happens when outside contractors develop code. They often provide a buggy and undocumented mess, and then no longer work on the project. They never feel the pain, so they're not incentivized to provide good code.
Until we find ways to align incentives, we're going to keep getting crap whenever more than one team is involved.
In the same meeting, the CTO was ranting about the instability of a service (which was also a PoC that was pushed to production before we were _also_ made to "own" it, yet never given the budget to even get acquainted with the codebase), claiming the reason was that we devs are lazy and unprofessional.
I highly doubt making the people who _respond_ to the incentives "feel the pain" will fix much. I suspect things are more likely to get fixed if the people who _create_ the incentives are the ones "feeling the pain".
As they say in Hunger Games, "remember who the real enemy is".
The DevOps mindset has also led to the Internal Developer Platform this article discusses. Honestly, I don't see how traditional IT organizations are going to easily arrive at such a platform without having adopted the DevOps mindset first.
So DevOps isn't bullshit, but it's not the end goal either. It's a necessary step needed on your journey for getting somewhere better.
That was the employee-utilization approach, where you hire one DBA and he runs all the DB stuff, because hiring a DBA for each team does not make financial sense: there is not enough day-to-day work for a DBA specialist on a single project/product.
The other thing is that DBAs/sysadmins have to have access to customer and company data, so you still need separation of duties (no dev access on prod systems), and it is easier to make a guy part of an "OpsTeam" and give him access across all systems than to get an "Ops" person configured per project/product.
So why "DevOps" if you get operational overhead and you cannot "utilize employee 100%"? Because in most companies delivering new features is more important than having Joe Dbaer close tickets like in a factory. We learned it is actually not efficient when "important feature X" is delayed because, although Joe was doing his job fine, feature X was stuck in the queue.
Reader mode on Safari is a savior in these cases.
Think about the long play. You join a startup with a broken, hot-potato-style "devops" process. Instead of saying "not my problem" all day, you can take some ownership of the items slightly outside your space and try to arrive at a better solution. If you do this often enough and with enough persistence, the end customer will eventually see benefit. At some point, management will likely notice the correlation as well. Even if they don't, you have gained far more experience than you would have otherwise and can maybe go start your own damn company, realizing up to 100% of the value of your labor.
This is why I try hard, even if someone doesn't make me.
All I've gotten from trying hard is burnout and sickness. At this point I can say the only reason I work for others is because I need money to live.
This is not true in many startup environments. You'd have a hell of a time getting me to work on a new project without some sort of equity arrangement.
I'm 47 and what is described seems like the old problems that I thought devops solved.
Since you’re 47 (I’m 52), it’s the equivalent of hiring an “AutoConf engineer” in the late 90s.
The problem in a lot of orgs is having various priesthoods that have their own goals that aren't aligned with other teams/the business. For example:
- hardware purchase
- software purchase
- DBA
- ops/infra
- networking (firewalls esp)
- security
- HR/recruiting
- business analysis
- front end/design/ux/dev
- back end dev
- technical documentation
All of these are roles and they can have specialists, but you probably want them aligned and rewarded for keeping the business running, not serving specific measurement goals like uptime, to the detriment of selling product. These priesthoods often have their own religious virtues that they espouse, like 3rd normal form or low TCO, that they pursue in absence of directions and understanding of their role in the business.
It's easy to see the problem but wicked hard to prevent it or fix it. I have a small organization and it's difficult to get people really on board with a vision.
You need more than one specialist because they need to be allowed to get sick and take vacation. And once you have multiple dev teams it’s not economical to staff each team with multiple expensive specialists. So you put them on their own team and structure it organizationally so the devops team is an extension of every team. The devops team keeps up with all the work every team is doing and reacts accordingly as well as being a resource each team can tap.
Crucially, you don’t put any kind of ticketing system or similar in between.
Nah, the problem is that most (software) engineers don't have the skills to do operations work.
Let's not fool ourselves: software development is 95% development (writing code/docs/test etc) and 5% system administration (getting your local mysql or whatever up and running etc) whereas operations is usually 95% screaming at the machines (various linux tasks, writing glue scripts, infrastructure, networking etc) and 5% development (the aforementioned glue scripts, writing internal docs).
The fields do overlap, but very little.
They require two different skill sets.
They require two different mindsets, too: a software engineer is usually optimistic (works on my machine, will work in prod too) whereas a sysadmin/operations person is usually pessimistic (what's our disaster recovery strategy?).
It's a different frame of mind, mainly because it's the place where the rubber meets the road, so to speak. Once things are in production, you cannot revoke or redesign your product/system because people are directly reliant on it.
This caused as many disasters as you’d expect. For example, when something breaks (like ECS, or a bad AMI is deployed), the dev team is stuck waiting on the operations team to fix it because the infrastructure is owned by the ops team.
I was part of transitioning to a different model where the same general concept applies; a bunch of yaml leads to infrastructure being deployed. But that infrastructure was fully owned by the dev team that deployed the code.
It only made things generally easier, because then if the deployment code broke (i.e. the code responsible for converting the yaml files to infrastructure, like terraform modules), we’d have to fix it.
Ultimately it was just a fancier and more guided version of Kubernetes.
Said the 4th article on the subject this month.
He rapidly became the most popular member of my team.
His job was to commoditize configuration management, and strip away as much of the overhead from the coders, as possible. He didn't do release management, and we didn't really work automatic testing into our release workflow. This was because Japan did not trust auto-testing, so each engineer did their own unit and harness testing. Japan also wanted each engineer to make their own "official" release, as opposed to having a CD system spit it out.
There were reasons. I didn't necessarily find them that compelling, but they were the boss, so I gave them what they wanted.
Japan liked him, as it gave them one single person to talk to, and he also helped them to streamline their own infrastructure. In fact, he is the only employee that I ever had (including myself), that traveled to Japan before being there a year.
For myself, I find that things like Fastlane, JIRA, and Jenkins aren't actually helpful for a one-man shop. I tend to do a lot of stuff by hand.
Most developers have no fucking clue how Linux, networking, storage or whatever works under the hood. They know how to develop whatever stack you're at, but stuff like latency, packet loss, redundancy factors, backup policies, monitoring or other classic ops topics are completely beyond the comprehension of 99% of developers.
"DevOps" usually means some C-level execs say "fire the expensive neckbeards that have the time to properly understand a system" followed by them saying one of
- "oh fuck, someone managed to compromise a service and because no one knows what the fuck firewalls are / Kubernetes doesn't come with ones OOTB the hacker got complete control of everything"
- "oh fuck, production is down because someone fat-fingered in Elastic Beanstalk which recreated the environment, dropped the RDS database and there were no backups" (I've been personally bitten in the arse by their definition of "recreate" - all I wanted it to do was to replace the damn EC2 instance)
- "oh fuck, we're seeing insane AWS bills because someone DDoS'd us and no one created a cost limit or a sensible limit for autoscale or a simple dumb monitor that alerts someone"
- "oh fuck, we're seeing an insane AWS bill because someone got his AWS access credentials stolen and some shithead spun up a ton of g3.xlarge instances to mine shitcoins"
Other C-level execs see constant issues between "ops and dev teams" because their team leads are friends of the silo model and decide that instead of getting rid of the dumbass managers they're getting rid of the ops team because "cloud", with the exact same result.
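The "cost limit", "sensible limit for autoscale" and "simple dumb monitor" from the list above really can be this dumb. A minimal sketch, with a hypothetical `SpendGuard` class and made-up numbers; in practice the month-to-date figure would come from your cloud provider's billing data, and the returned status would be wired to whatever pages a human.

```python
from dataclasses import dataclass


@dataclass
class SpendGuard:
    monthly_budget: float        # hypothetical budget, in dollars
    max_instances: int           # hard ceiling for autoscaling
    alert_fraction: float = 0.8  # warn once this share of the budget is spent

    def clamp_scale(self, desired_instances: int) -> int:
        """Never scale past the ceiling, whatever the load looks like."""
        return max(0, min(desired_instances, self.max_instances))

    def check_spend(self, month_to_date: float) -> str:
        """Return 'ok', 'warn' or 'page' for the current month-to-date spend."""
        if month_to_date >= self.monthly_budget:
            return "page"
        if month_to_date >= self.alert_fraction * self.monthly_budget:
            return "warn"
        return "ok"


guard = SpendGuard(monthly_budget=10_000, max_instances=20)
print(guard.clamp_scale(500), guard.check_spend(8_500))
```

None of this needs an ops priesthood, but somebody has to know it should exist.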
My company practices "DevOps" and it feels great:
- Infra team build self-service tools (build, deploy, scale, observe)
- Infra team write good documentation for these tools
- Dev use tools to build, deploy, and monitor their apps
- Dev not use words like: AWS, k8, Envoy. These are abstracted by tools.
- if problem with app (very common) Dev fix it using the tools
- if problem with tool (very rare) Infra fix it
We have no build engineers, release engineers, etc. However we do have a rotation (similar to on-call) whereby Dev is responsible for releasing code that week.
Sure there are sometimes problems, frustrations, etc. No system is perfect. But you are getting paid lots of $$$ so shush with your whining & instead help improve the system.
For context my company is mid-size ~150 engineers
Agile teams were supposed to be composed of not just devs doing everything (quickly, because agile is about raw speed, right?) but people competent in the various aspects of the system development, potentially deployment, customer needs, etc. working together as multidisciplinary teams to meet a particular objective. The purpose of this was to break the silo that is common because it feels natural to many managers (the same sort, I presume, who don't like when their mashed potatoes touch their fried chicken on the plate).
Silos impede communication and promote the "throw it over the wall" approach to product/system development. A system engineer (or team of) made the design after sales (and possibly only sales) talked to the users. Throw that design over the wall and let the devs build it. Devs throw it over the wall to test, maybe there's a volley. Eventually it's tossed to ops. A goat is sacrificed and maybe it works.
Multidisciplinary teams are able to communicate across those boundaries because instead of the role-silo the roles are all in the same team, working (more clearly) towards one common objective. But then businesses managed to fuck it up. They got rid of test, devs do all testing now. Devs do all the database logic. There are no UI/UX experts anymore, it's all full-stack, and on and on.
DevOps was supposed to be the same. It was supposed to take that Agile cross-functional team and add in the sysadmin/operators (among other things). The critical problem being solved was the two (really more, but at the limit) silos of dev and ops failing to communicate. A Friday release with Ops spending the weekend rolling things back because Dev wasn't there and didn't even know the system wouldn't work as intended. Maybe their test environment was too different from the operational environment, the reasons matter a bit but there are too many to enumerate.
Instead, businesses did what they do best, they fucked it up. Again. They said, "Take the sysadmins and teach them to code. If they can't, dump them in the woods. The devs will take over the world." Then they started making "devops" job listings and full-stack grew to encompass a new set of skills.
For every operations person without software development skills, there are FORTY engineers without cloud operations skills. If you are going to build an internal platform, you’ll need experts with overlapping experience in both fields working together.
I guess I just need to find a role with more inter-team collaboration. Being able to mostly self-teach is great, until you have no one to learn from anymore.
We spoiled devs way too much.
Software is still being thrown over the wall and ops takes all the blame.
The problem is that many devs can get away with spawning servers and doing the easy parts of ops; when it requires rules and discipline, we all know what happens: over-engineering and security holes.
To me, this gets to the heart of the matter. As example, I present the required skillset for a front-end engineer:
https://frontendmasters.com/guides/front-end-handbook/2018/
See the index on the left. Do you need to know everything about all of these things for every single project? No. But as you grow into a senior, you'll be touching almost everything in that list.
It's an absolute explosion in complexity. Web development once was barely considered engineering, now it's one of the most complicated roles in the industry. Consider also that almost everything on that list is constantly evolving, this list being 4 years old.
Has all this added cognitive load and complexity resulted in massive productivity wins and dramatically better outcomes (UX, quality)? I'd say no, or at least it's questionable.
My point is that this is already too much. I work in teams with a distribution like this: senior (20%), medior (50%), junior (30%). So the vast majority of them are median. And the median programmer is severely lacking against our ever-growing demands. It's crude, but the typical programmer really sucks at programming.
So if next you're going to add even more to this pile with all sorts of devops and funky cloud tooling, the issue becomes clear: we're over-asking.
We over-value flexibility and scaling but ignore its dramatic costs.
"Devops" is just a made up word.
This is me right now: https://i.kym-cdn.com/entries/icons/original/000/019/304/old...
And then someone newly promoted decided to rewrite everyone's job titles and I was suddenly DevOps, sucked back in.
The other thing that this misses is that the deployments to the cloud are only half the battle. CI/CD pipelines, dev environments, and other SDLC phases like planning and testing are just as important as Terraform. Making the infra as code better and more reusable is a great step in the right direction, but it is only part of the puzzle of making software development workflows better in the big picture.
Like most things it comes at a cost (for a good PaaS mainly the running cost).
But I increasingly come to believe that for small startups, which fundamentally don't have the resources to do what the article describes or anything close to it, using a PaaS and keeping operational complexity as low as possible is the way to go.
Several years ago, I did "DevOps" work for a year or two. My gig essentially consisted of automating the repetitive Ops tasks and giving engineers self-service tools to get their stuff into prod efficiently.
At the time, things like Puppet & Chef were commonly-used tools, Vagrant was a widely-used development tool and Ansible was the new kid on the block. I don't remember there being much YAML yet, and I'd only heard a few things about Docker. I wrote several custom tools in Python, and we used Jenkins as our CI/CD server.
"DevOps" in those days was a cultural thing. The DevOps engineers owned the infrastructure and the software engineers owned the software. It was the DevOps engineers' responsibility to give the software engineers the tools needed to deploy, and also to ensure that the SWEs weren't causing outages (where we were the first line of defense).
It started to get bollocksed up when hiring managers started defining DevOps roles in terms of tools used, and the tools themselves supplanted communication & culture as the definition of DevOps.
With YAML, k8s and the rest of the nonsense, I don't think DevOps is very possible these days, because when your config is tens to hundreds of thousands of lines of YAML, the tooling doesn't even allow for a culture of self-service & communication. SWEs more or less have no choice but to chuck stuff over the wall, and DevOps or DevSecOps engineers (or whatever buzzword nonsense the industry has adopted) have to perform augury to construct the configuration. Because of cloud vendor lock-in or just sheer complexity, whatever runs in prod is quite different than the dev environment, and everything just limps along.
EDIT: Now I read things about "MLOps" and so on. When will the madness stop?! Apparently we keep allowing recruiters and dimwit HR bureaucrats to define software processes and culture.
In any sufficiently large company, platforms aren't engineered; they are acquired or purchased/licensed based on their viability as an "enterprise grade" asset that drives business success and reduces cost through identifiable if not meaningless KPI and record keeping.
In any sufficiently large company, self-service is supplanted by rigid controls, authorizations, approvals, and annual reviews. This is done in the service of Jira and the need to make-pretend work by an ever-growing cavalcade of pseudoworkers who recognize self-service as the killing stroke of their career.
It can then be said, grudgingly and with scorn, that devops seems designed by its very definition to operate as an antipattern to some of the world's largest, most successful corporations.
* Develop new features
* Fix bugs
* Sit in standup, "alignment" meetings, and any meeting created by a Zoom-promiscuous engineer (let's hop on a call!)
* Reply to dozens of Slack messages per day
* Write documentation
* Develop and maintain the CI/CD pipeline
* Be an expert on Observability (aka Debugging in Production)
* Have Galaxy Brain level knowledge on the entire cloud setup in all environments
And now I am thinking, if being an engineer involves all that, why not just find a gig with pure DevOps and not worry about product development?
I don't care if I'm by myself, if I have a team, if I have a devops title - all I want is to prevent the existential death-by-garbage-fire that eventually consumes all technology. If you're a developer and you're not aware that you're marching slowly towards a complexity-cliff, that's fine - we just have different jobs. If you're a developer and you're aware commits aren't by default equivalent to progress, congrats, you're an SRE/DevOps/Senior Engineer/Platform/whatever.
Most people don't know what DevOps is. The very few who do know are powerless to get other people on board with the idea, because people are lazy and don't want to learn things or change how they work. Even if DevOps is great, if the executives in your company don't force it down everyone's throat with business policies, training, etc., nobody will ever actually do it, because nobody really wants to.
Platform engineering is not gonna solve what DevOps is failing to solve. It's just another silo. "Let's empower developers" sounds great. But developers still lack most of the knowledge to maintain a complex system. Give them a million super-powered tools and they will still screw things up. Most developers I meet today don't even understand the concept of DNS. That's not exaggeration - they literally don't understand how hostnames work, record types, zone delegation, authoritative records, ttls, much less propagation or transfers. And you want to, what, give them a fancy tool to change the system that they don't understand? You still need operations, in any business, not just tech. Somebody has to be paid to care about the boring shit that keeps the business working. There is no way to automate away that responsibility, in any complex system in the world. A platform eng team is just adding another team on top of the Ops team you will always need.
What would actually solve a lot of this - and nobody is going to like this - is boring-ass business management best practice from the 50s. W. E. Deming. Lean. Six Sigma. The stupid shit that MBAs nerd out over? That stuff works. High-performing businesses don't just pay it lip service: they actually do PDSA, actually train their workforce, actually continuously improve their process, and get better business outcomes. But who among the tech nerds wants to listen to that? They just want to play with their toys and have no responsibility. "Build me a platform so I don't have to use my brain."
People have been studying businesses for the better part of a century. There is no easy way out of the morass. No single team or tool or paradigm will make things better. Until you consider everything, holistically, and put into place a barrage of different solutions, and actually teach people to do their jobs better, the actual outcomes of the work won't improve.
FWIW, "Platform Engineering" used to be something different than what the author is suggesting.
Glad to finally see someone else saying this.
Shouldn't the "DevOps" team build APIs and consoles so eng teams can provision their resources without knowing TFE/YAML from a mile away? Really, this is the year 2022. Please, learn from AWS and Netflix. Do software development. Get rid of the god damn ticketing process. Friends don't let friends touch TFE/YAML or whatever configuration files.
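To make "build APIs instead of tickets" concrete, here is a minimal in-memory sketch of what such a self-service call could look like. Everything in it is hypothetical (the `PlatformAPI` class, the size names, the quota number); it only illustrates the idea that policy lives in code the platform team owns, so the requesting team never edits config files or waits on a ticket.

```python
from dataclasses import dataclass

# Hypothetical policy values; a real platform team would define its own.
ALLOWED_SIZES = {"small": 1, "medium": 2, "large": 4}  # size name -> vCPUs
TEAM_VCPU_QUOTA = 8


@dataclass(frozen=True)
class Database:
    team: str
    name: str
    size: str


class ProvisioningError(Exception):
    """Raised when a request violates platform policy."""


class PlatformAPI:
    """In-memory stand-in for a self-service provisioning endpoint."""

    def __init__(self) -> None:
        self._databases = []

    def _vcpus_used(self, team: str) -> int:
        # Sum the vCPUs of everything this team has already provisioned.
        return sum(ALLOWED_SIZES[db.size] for db in self._databases if db.team == team)

    def create_database(self, team: str, name: str, size: str = "small") -> Database:
        # Policy checks replace the human approval step of a ticket queue.
        if size not in ALLOWED_SIZES:
            raise ProvisioningError(f"unknown size {size!r}")
        if self._vcpus_used(team) + ALLOWED_SIZES[size] > TEAM_VCPU_QUOTA:
            raise ProvisioningError(f"team {team!r} would exceed its vCPU quota")
        db = Database(team=team, name=name, size=size)
        self._databases.append(db)
        return db
```

A team would call `PlatformAPI().create_database("payments", "ledger", size="large")` and either get a resource back or an immediate, explainable refusal.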
- Systems get messy unless you have configuration management that converges on top of ephemeral instances
- Build packages that work the same in all environments if possible
- Dev-prod parity: if that means running Docker locally or on some throw-away dev servers, do it
- 12factor principles
- Every business department has an API and shares the same or similar messaging and storage tech
- Change control: have high availability (HA) to the point where you never upgrade individual machines and always back up, restore, or replace with fresh nodes
- Repeatable, precise, cached, incremental, distributed builds. Even if it means throwing out timestamps in the binaries
- Cache build artifacts forever (almost)
- Cryptographically sign commits and build artifacts (don't sign hashes because those are weaker)
- Canary, sharded deployments
- Rarely/never allow changes directly to boxes
- Require PRs made through configuration management
- Require that PRs be code-reviewed by another developer
- Automate the heck out of linting and testing before it's generally allowed to drop in prod
- Store secrets either in conf mgmt or in a separate system like Vault
- Have SREs who know how things will end up in prod to troubleshoot the complexity and provide CI/CD and {I,P,Db,S}aaS
- Reduce the number of duplicated systems to a minimum but not to where it is too awkward or difficult to maintain
- Remember that prod, corp, and endpoints (phones and laptops) aren't the same but try to share bits as much as possible while isolating differences
- Corp tends to run more heterogeneous than prod. It's okay but don't let it get out-of-hand without having recommended (as opposed to mandated) standards except for security and third-party component review
- Eliminate technical debt: don't allow layers of crap or be precious about gross code just because it works
- Consider the risks and impact before making changes
- Have a backup/DR/BCP plan
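As one illustration of the "repeatable builds, even if it means throwing out timestamps" item above, here is a small sketch using only the Python standard library. It assumes the artifact is a plain tar archive; the function names are made up for the example. The point is that fixing the file order and zeroing the metadata makes the output bytes (and therefore the digest you cache or sign) depend only on the inputs.

```python
import hashlib
import io
import tarfile


def build_archive(files: dict) -> bytes:
    """Pack {name: bytes} into a tar archive whose bytes depend only on the inputs."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # fixed ordering, independent of dict order
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = 0  # throw out the timestamp
            info.mode = 0o644
            info.uid = info.gid = 0  # no builder-specific ownership
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def artifact_digest(files: dict) -> str:
    """Content hash usable as a cache key for the built artifact."""
    return hashlib.sha256(build_archive(files)).hexdigest()
```

Two builds of the same inputs then produce the same digest regardless of when or where they ran, which is what makes caching artifacts "forever (almost)" safe.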
If you don't have some sort of Devops team (yes, the term is too broad) to push back on infrastructure complexity and ensure that logging/metrics/alerting/documentation is done, developers just won't do it.
At least this is how the role works in my company.
Wow, it's like you can prove anything with a contrived example!
I was convinced this was exactly what devops wanted to address and solve.
* Infra as code
* Infra code sits in a repo
* Infra code is managed by the (senior) devs
* No separate ops team, or ops role, is needed: it's a responsibility of the devs now.
* Monitoring production: also by the devs -- if issues arise, "we" take turns solving them as devs.
DevOps is no longer managing your own hardware, but using hardware-behind-an-API in the cloud.
"Turning the carnival bumper cars into a Ferris wheel or maybe the teacups"
Today's Dev-Ops is a Teacher's Lounge Room with a big suggestion box by the door.
One more shortcoming in the long list of modern needs.
[01] https://linkedrecords.com/the-big-devops-misunderstanding-84...
You should at least read "The DevOps Handbook" before reading this article.
The article should then talk through its points using the book as a reference.
oh, now i understand this blog post..
The real reason devops exists (what used to be called Application Engineering) is because development is pretty clueless about the customer's runtime environment. They don't really understand the tuning necessary to create a production system.
Example: very few developers understand how big your DB connection pool should be, because they tend to test only single-connection scenarios. That's assuming they actually know enough to use a connection pool. And they almost never handle failover scenarios.
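For a rough starting point on pool size, one option (my assumption, not something the parent spelled out) is Little's law: the average number of connections in use equals the request rate times how long each request holds a connection. The sketch below uses integer inputs to avoid float rounding; the function name, the default 20% headroom, and the example load numbers are all illustrative. It says nothing about failover, and it is only a first estimate to validate under a realistic concurrent load test.

```python
def estimate_pool_size(requests_per_second: int, hold_ms: int, headroom_percent: int = 120) -> int:
    """Estimate DB connections needed at a given load via Little's law.

    requests_per_second: peak requests that each need a connection
    hold_ms: average time a request holds its connection, in milliseconds
    headroom_percent: safety margin (120 means 20% above the average)
    """
    numerator = requests_per_second * hold_ms * headroom_percent
    denominator = 1000 * 100  # ms -> s, percent -> fraction
    # Integer ceiling division, with a floor of one connection.
    return max(1, -(-numerator // denominator))


# 200 req/s, each holding a connection for 50 ms, plus 20% headroom
print(estimate_pool_size(200, 50))  # 12
```

The single-connection test the parent describes never exercises this number at all, which is why the pool only shows up as a problem in production.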
DevOps should be the purview of the senior engineers. If you're building without an understanding of your deployment environment then you're screwed. And more importantly, you're not taking advantage of platform features.
As an example, deploying to SSDs (which everyone should be doing) means your database performance has just gotten an order of magnitude better for free. You need to retest and get rid of a lot of those performance-related changes.
Let's put it this way: one monolithic Java/Spring application I worked on was basically a SOAP server with a UI that took up gigs of RAM. It was that big because the scaffolding required to handle all that was, well, huge. But really, it was essentially a web server that served pages to connected clients. All that other shit was overhead...so on AWS it got transformed into a few Lambda functions and REST APIs. Without an understanding of the deployment environment (and the possibilities associated with that) it would never have happened.
TL;DR: senior engineers should be the devops people, because how and where you deploy software should determine how you should make software.
My experience tells me that most engineers don't want to do any work, and it's ok.
Yeap. Largest red flag there is.
===
A colleague of mine once shared his lay sociological theory about dev vs ops, and if taken for what it is -- an essentialization -- it's an interesting perspective. The idea is that ops people have inherited a blue-collar culture, whereas devs have inherited an office-worker/academic culture.
Ops people conceive of their work as fundamentally operational: progress is measured in terms of actions taken, and while automation is greatly valued, there is nothing inherently "messy" with one-off fixes; the objective is to get things working now. The pathological case for this mindset is that of constantly being on the back-foot, responding to incidents with one-off fixes without recognizing that many of them share a common cause that could be addressed.
Dev people conceive of their work as fundamentally intellectual: progress is attained when the problem is correctly conceived, at which point the solution follows naturally. While writing code is greatly valued, most effort should be spent understanding the problem; the objective is to solve it correctly, once and for all. The pathological case for this mindset is that of over-engineering by an ivory-tower idealist, disconnected from the messiness of real-world praxis.
In nearly all orgs I've seen, the proportion of first-generation college grads is greater in ops teams than in dev teams. So too is the proportion of people who come from blue-collar families, or are mechanically inclined (ex: look around and see who tinkers with cars). Likewise, the proportion of people holding graduate degrees is greater in the dev crowd (ex: look around and see who's into math).
It should hopefully be clear that neither is superior to the other. The point is rather that the divide between dev and ops is partly sociological, which means it is largely based on values. Ops will tend to over-value "honest hard work" and dev will tend to over-value "clear articulate thought". There is also some latent, historical tension between these sociological groups, which has a funny way of masquerading as a technical problem. It is helpful to view arguments about "just ship it" vs "design it the right way" through this lens.
Far from being a cute "just so" story, it's been my experience that this dynamic is very important for two reasons: (1) it's harder than you might expect to foster the sense of common destiny required for real "devops"-style collaboration, and (2) each side of the dev-ops divide has a lot to gain from learning when the other side's mindset is helpful, and how to cultivate it in themselves.