just listened to the july 25th episode #219 of floss weekly about chef with opscode chief adam jacob... great episode!
i decided i wanted to write a reaction to it for a few reasons.
most significantly, i felt adam's comments about devops culture and the no a-hole rule were particularly enlightening, nuanced, and instructive, and being inspired by them i felt i actually had a nugget of my own to add.
secondly, i have some views about tools and approaches that are rather different from models like chef, but i thought the way adam described the problem space and philosophy was right on, and i want to steal his framing to support and illustrate my own ideas.
finally, there are various things that concern me about models like chef, and there were a few points i felt were controvertible, so i figured i'd take a shot at writing up why.
so to start with devops culture: i've often felt the dev/ops divide was a somewhat pedantic and obsolete industry legacy, but adam's observation about how it harks back to things like mainframe specialists, and how poorly that maps to supporting modern general purpose systems, was cool. specialization and generalization both seem at least as important in the ops world as in the dev world these days. it was also interesting to hear adam articulate how ops folks would often rather be called devops, whereas devs are pretty much universally happy to be called devs.

the part i want to add here is that in the traditional dev/ops model, devs often end up in an organizationally more powerful position because their role is more cross functional than ops strictly is... devs interface with marketing, product, etc, whereas in the old model ops would mostly interface with and support dev alone. being inherently less well connected organizationally puts ops on the trickier side of the power equation when things get tough, and i think it's this traditional but subtle association with power that predisposes folks about these titles. the idea that devops is largely about working as an integrated team and being cross functionally involved remains the prescriptive response here. still, a title change might not be a bad idea to help disassociate from the old model more quickly. personally, i usually call myself a systems engineer.
regarding a-holes, i actually don't have much to add, but the way adam talks about how our culture celebrates smart jerks like the bofh, and casually trolls in dehumanizing electronic contexts, yet absorbs the enormous cost of needing 4-5 good interactions to effectively remediate one crappy one, is fascinating. i've never had much ability to cut people down, so it's easy for me to say others shouldn't do it either... i can see why it would be tempting to play into that culturally celebrated smart-jerk stereotype. but we all know it sucks, and having the quantitative argument here is awesome. so is having a culture that supports calling it out when it happens.
regarding tools and approaches, i agree with adam that thinking more like a dev is important, and that working hard at honing skills like text editing is essential. devs live and breathe their text editors. but adam doesn't talk much about revision control, and i feel there is an essential dev analogy here too. for a long time, and even still to an extent, revision control tools worked best on a limited size source tree of, say, a few thousand files or fewer. a dev working in such a source tree can dive in and change code here and there without worrying about forgetting what they've touched. revision control allows them to review what they've done, perhaps roll back experiments or trial and error or accidents, and to communicate what they've finally decided to change to the team in a contextualized, annotated (commit commented), and concise way. this auditable, iterative, communicative tooling is huge. HUGE. in fact i think it is a big part of the reason tools like chef have come along - they allow us to express system level things in a source tree small enough to be readily revision controlled. but if there is any truth to my idea here, that compact expression is totally unnecessary. tools like fsvs are now powerful enough to let us use iterative revision control workflows directly on the huge system tree with hundreds of thousands of files. to me this takes away a lot of the benefit of abstracting your workflow in a higher level language like chef, and the versionable compaction of expression that chef enables starts to feel more like obfuscation.
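to make that workflow concrete, here's a minimal sketch on a scratch tree. i'm using git here purely so the sketch runs anywhere; fsvs offers the analogous status/diff/commit/revert cycle directly on the live filesystem, backed by subversion. the paths, identity, and commit messages are all hypothetical.

```shell
# blanket revision control on a (scratch) system tree
set -e
tree=$(mktemp -d)               # stand-in for / or /etc
cd "$tree"
git init -q .
git config user.email "ops@example.com"   # hypothetical identity
git config user.name  "ops"

mkdir -p etc
echo "PermitRootLogin no" > etc/sshd_config
git add -A && git commit -qm "baseline sshd config"

# make a change, review it, then commit with context
echo "MaxAuthTries 3" >> etc/sshd_config
git status --short              # see what was touched
git diff                        # review before committing
git add -A && git commit -qm "limit ssh auth retries"

# an experiment we decide to roll back
echo "oops" >> etc/sshd_config
git checkout -- etc/sshd_config

git log --oneline               # auditable history of what changed and why
```

the point isn't the specific commands - it's that the whole review/rollback/annotate cycle devs take for granted applies just as well to a system tree.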
adam also mentions that a tool should ideally be agnostic to methodology. tools and methodology are hard to imagine separating, but i think this is an excellent idea, and again a blanket system level revision control approach fits the bill... no matter what tooling you use, a lot of what you do manifests as changes to files. in fact one could imagine using something like chef and blanket revision control side by side: apply chef changes and then cross check against the revision control diff to confirm that chef actually did what you thought it was going to do. or in reverse, implement a change on a system directly and then develop chef until it does the same thing without producing any additional diff.
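that cross check might look something like this sketch, with sed standing in for a chef recipe and git standing in for blanket revision control (the file name and the change itself are hypothetical):

```shell
# commit a clean baseline, run the config tool, then read the diff as
# an audit of exactly what the tool touched.
set -e
tree=$(mktemp -d); cd "$tree"
git init -q .
git config user.email "ops@example.com"; git config user.name "ops"

echo "worker_processes 1;" > nginx.conf
git add -A && git commit -qm "baseline"

# the "tool" run - sed stands in for a chef recipe here
sed 's/worker_processes 1;/worker_processes 4;/' nginx.conf > t && mv t nginx.conf

# cross check: the diff should contain exactly the change we expected,
# and nothing else
git diff --stat
git diff | grep -q '^+worker_processes 4;'
echo "diff matches expectation"
```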
finally i wanted to ruminate on the idea of convergence. the idea that you have to re-run a rule set many times to reach a goal state has always bothered me. it bothered me with cfengine, it bothered me with puppet, and although i don't have experience with chef in particular, i'm sure it would bother me there too. this need for re-running is a real burden to short iterations, and short iterations are very important for workflow and staying in the zone (the zone being another useful dev analogy).
i feel like there may be some essence of convergence that i am not understanding. what does a convergent model enable that could justify a squishy, loopy workflow? adam did get me a little closer to understanding one of its values: in large systems, hosts can be at different stages in their life cycle, and convergence closes each host on the goal state independent of its life cycle stage. but this seems unsatisfying... why can't the tools be as smart as a package manager, figuring out the current state and performing the proper sequence of changes to reach the goal state, all in one shot? and for any non-runtime, file based config change, i think the blanket revision control model of committing on a master and updating all branches and hosts derived from it is simpler and also handles hosts at different life cycle stages.
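the package manager style resolution i have in mind is basically: inspect current state, diff against goal state, apply the whole delta in a single pass. a toy sketch, with sorted text files standing in for the installed/desired package sets (all the names are hypothetical):

```shell
# one-shot state resolution: compute the delta between current and goal,
# then apply it all at once - no repeated runs to "converge".
set -e
dir=$(mktemp -d); cd "$dir"

printf '%s\n' pkg-a pkg-b pkg-c | sort > goal      # desired state
printf '%s\n' pkg-b pkg-c pkg-d | sort > current   # observed state

comm -13 current goal > to_add       # in goal but not current
comm -23 current goal > to_remove    # in current but not goal

echo "add:    $(tr '\n' ' ' < to_add)"
echo "remove: $(tr '\n' ' ' < to_remove)"
```

obviously real systems have ordering and runtime dependencies a flat set diff doesn't capture, but package managers solve exactly that kind of problem in one shot too.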
perhaps there is some fuzzy philosophy about large system complexity going on with convergence... i'd love to see a convincing application of it, but so far i haven't seen how the benefits outweigh the costs.