Thursday, August 9, 2012

reaction to floss weekly 219

just listened to the july 25th episode #219 of floss weekly about chef with opscode chief adam jacob... great episode!

i decided i wanted to write a reaction to it for a few reasons.

most significantly, i felt adam's comments about devops culture and the no a-hole rule were particularly enlightening, nuanced, and instructive, and they inspired me to add a small nugget of my own.

secondly, i have some views about tools and approaches that are rather different from models like chef, but i thought the way adam described the problem space and philosophy was right on, and i want to steal his framing to support and illustrate my own ideas.

finally, there are various things that concern me about models like chef, and a few points i felt were controvertible, so i figured i'd take a shot at writing up why.


so to start with devops culture: i've often felt the dev/ops divide seemed like pedantic and obsolete industry legacy, but adam's observation about how it throws back to things like mainframe specialists, and how that doesn't map to supporting modern general purpose systems, was cool. both specialization and generalization seem at least as important in the ops world as in the dev world these days. it was also interesting to hear adam articulate how often ops folks would rather be called devops, whereas devs are pretty much universally happy to be called devs.

the part i want to add here is that in the traditional dev/ops model, devs often end up in an organizationally more powerful position because their role is more cross functional than ops strictly is... devs interface with marketing, product, etc, whereas in the old model ops would mostly interface with and support dev only. being inherently less well connected organizationally puts ops on the trickier side of the power equation when things get tough, and i think it's this traditional but subtle association with power that predisposes folks about these titles. the idea that devops is largely about working like an integrated team and being cross functionally involved remains the prescriptive response here. still, a title change might not be a bad idea to help disassociate from the old model more quickly. personally i usually call myself a systems engineer.

regarding a-holes, well, i actually don't have much to add here, but it's fascinating how adam talks about the way our culture celebrates smart jerks like the bofh, and casually trolls in pseudo-dehumanizing electronic contexts, yet absorbs enormous cost when it takes 4-5 good interactions to effectively remediate one crappy one. i've never had much ability to cut people down, so it's easy for me to say others shouldn't do it either... i can see why it would be tempting to play into that culturally celebrated smart-jerk stereotype. but we all know it sucks, and having the quantitative argument here is awesome. having a culture that supports calling it out when it happens is awesome.


regarding tools and approaches, i agree with adam that thinking more like a dev is important, and working hard at honing skills like text editing is essential. devs live and breathe their text editors. but adam doesn't talk a whole lot about revision control, and i feel there is an essential dev analogy here too. for a long time, even still to an extent, revision control tools worked best on a limited size source tree of, say, a few thousand or fewer files. a dev working in this source tree is able to dive in and change code here and there without worrying about forgetting what they've touched. revision control allows them to review what they've done, perhaps roll back things that were experiments or trial and error or accidental, and to communicate in a contextualized, annotated (commit-commented), and concise way to the team what it is that they've finally decided to change. this auditable, iterative, communicative tooling is huge. HUGE. in fact i think it is a big part of the reason that tools like chef have come along - they allow us to express system level things in a source tree that is small enough to be readily revision controlled. but if there is any truth to my idea here, compact expression is totally unnecessary. tools like fsvs are now powerful enough to allow us to use iterative revision control workflows directly on the huge system tree with hundreds of thousands of files. to me this takes away a lot of the benefit of abstracting your workflow in a higher level language like chef, and the versionable compaction of expression that chef enables starts to feel more like obfuscation.
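
to make that concrete, here's a minimal sketch of what the loop can look like, driving fsvs from a little python wrapper. it assumes fsvs is installed and already configured to track the system tree against a repository, and the subcommand names are from memory, so double check them against your version:

    import subprocess

    # minimal sketch of an iterative, system-level revision control loop.
    # assumes fsvs is installed and already pointed at a repository;
    # subcommand names are from memory and may differ by version.

    def run(*args):
        """run a command and return its stdout as text."""
        return subprocess.run(args, check=True, capture_output=True, text=True).stdout

    print(run("fsvs", "status"))   # 1. what has changed on the whole tree?
    print(run("fsvs", "diff"))     # 2. review the actual content of those changes

    # 3. roll back anything accidental or experimental, e.g.:
    # run("fsvs", "revert", "/etc/something.conf")

    # 4. commit what you actually meant to change, with an annotated message
    print(run("fsvs", "commit", "-m", "sshd_config: disable password auth"))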

adam also mentions that a tool should ideally be agnostic to methodology. tools and methodology are hard to imagine separating, but i think this is an excellent idea. and again here i think a blanket system level revision control approach fits the bill... no matter what tooling you use, a lot of what you do manifests as changes to files. in fact one could imagine using something like chef and blanket revision control side by side... applying chef changes and then cross checking against the revision control diff to confirm that chef actually did what you thought it was going to do. or in reverse, implement a change on a system directly and then develop chef such that it does the same thing without producing any additional diff.
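
a sketch of that side-by-side cross-check, under the same assumptions (fsvs already tracking the root tree, and chef-client standing in for whatever applies the change):

    import subprocess

    # hypothetical cross-check: apply a config management run, then use the
    # system-level revision control diff to verify it changed what we expected.

    subprocess.run(["chef-client"], check=True)   # or any other change-applying step

    diff = subprocess.run(["fsvs", "diff"],
                          capture_output=True, text=True).stdout

    if diff.strip():
        print("filesystem changes after the run:")
        print(diff)
        # a human (or a stricter script) decides whether this matches intent,
        # then commits it so the next run starts from a clean baseline:
        # subprocess.run(["fsvs", "commit", "-m", "chef run: expected changes"], check=True)
    else:
        print("no changes beyond what was already committed")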


finally i wanted to ruminate on the idea of convergence. the idea that you have to re-run a rule set many times to reach a goal state has always bothered me. it bothered me with cfengine, it bothered me with puppet, and although i don't have experience with chef in particular, i'm sure it would bother me there too. this need for re-running is a real burden to short iterations, and short iterations are very important for workflow and staying in the zone (the zone being another useful dev analogy).

i feel like there may be some essence of convergence that i am not understanding. what does a convergent model enable that could justify a squishy loopy workflow? i feel like adam got me a little closer to understanding one of the values of a convergence model: in large systems, hosts can be at different stages in their life cycle, and convergence closes a host on goal state independent of its life cycle stage. but this seems unsatisfying... why can't the tools be as smart as a package manager, figuring out the current state and performing the proper sequence of changes to reach the goal state, all in one shot? and for any non-runtime, file based config change, i think the blanket revision control model of committing on a master and updating all branches and hosts derived from it is simpler and also solves the problem of hosts in different life cycle stages.
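
to be clear about what i mean by "one shot", here's a toy sketch (purely illustrative, not modeled on how chef, puppet, or cfengine actually work internally): inspect the current state, compute the full plan against the goal, apply it once.

    # toy illustration of "figure out the current state, plan, apply in one shot"
    # rather than re-running a rule set until it converges. the state keys and
    # values here are made up for illustration.

    current = {"nginx": "absent", "ntp": "4.2.6", "motd": "stale"}
    goal    = {"nginx": "1.4.1",  "ntp": "4.2.8", "motd": "fresh"}

    def plan(current, goal):
        """every change needed to move the current state to the goal state."""
        return [(key, current.get(key), want)
                for key, want in goal.items()
                if current.get(key) != want]

    def apply_plan(changes):
        for key, have, want in changes:
            print(f"{key}: {have} -> {want}")   # a real tool would act here

    apply_plan(plan(current, goal))   # one pass, no re-running required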

perhaps there is some fuzzy philosophy about large system complexity going on with convergence... i'd love to see a convincing application of it, but so far i haven't seen how the benefits outweigh the costs.

Friday, June 29, 2012

devops days 2012 = phenomenal

hands down the coolest conference i've ever attended. sure i learned some things and met some exciting people, but i wasn't expecting such a strong vibe. i might be projecting a little since i'm starting a new job next week and my head space is already a bit future-possibility oriented.

one session i attended about culture encouraged us to blog about it... so here it is:

most discussions about devops lately have pretty unanimously identified a management and cultural component. one of the afternoon "open spaces" sessions with Spike Morelli covered the topic of how we might seek to enhance an occupational culture.

he asked a provoking question: "how many of us REALLY, GENUINELY CARE about improving our culture?". it caught my interest because though i think we all agree that culture can be game-changingly important, it also seems a little intangible and mysterious... particularly for technical types, who are not stereotypically known for interpersonal and social aptitude. like politics, it's always bigger than one person and so can feel kind of outside your control, and there can be a sort of apathy.

if we agree that great culture, whatever that means, is not universal, then i'd even go so far as to say apathy is a kind of survival skill... it's natural for an individual's instinct to guide them to do great work regardless of residing in a subpar culture they don't feel they can bend at will.

at the start of the session, spike and John Willis made an interesting observation: of all the dozens of open spaces topics, this one was virtually the only one that was not about tools and technology. it seems like a disconnect that there has been so much devops forum discussion about management and culture, yet we don't really see anyone doing anything identifiably real about it.

as the discussion went on, a bunch of ideas flew around about how the goal culture could be defined and what tactics for closing in on it might be.

the point was made that group values are diverse and contextual... a large stable company with an established staff might be legitimately unaggressive... a culture of paycheck takers and don't-rock-the-boaters whose real passion is the life they pursue outside of work. who can fault that? some cultures might be service oriented, some might be product oriented, some might even be devops oriented. my takeaway here was that although culture varies, one thing strong cultures have in common is a high level of alignment. i think for alignment to be real it needs a scope that is neither too broad nor too narrow.

as for tactics, a few that i made note of were: tech talks, attending the standups of other groups, message repetition, and in-depth slide sets. one interesting idea for larger orgs, where changing the entire company culture is hard: you might be able to start within your own smaller team, and you might benefit from a talented management "buffer" to allow the culture to exist, sheltering it from and resolving conflicting modes coming down from upper management tiers.

that about covers the material from the session. on a meta level it was interesting to learn a little about culture within an event that itself seems very culturally strong. though i've read a lot about devops values on the forums, it felt very different discussing it in person. i guess this should not be surprising, but it was striking.

the end of the day was an open bar mixer; i had a few beers and munchies and hung out with some old friends i hadn't seen in awhile. there was also a little jam session, i got to listen to some good bass players, and then to my delight a great funk band! i forgot to get their name... hopefully i'll get to see them again sometime.

Tuesday, August 2, 2011

backup strategy model

a couple years ago i wrote up a formalization of my backup strategy, but wasn't sure where i wanted to publish it. i finally decided to just stick it in my blog:

this document aims to describe a simple to implement and comprehensive backup strategy, including some essential capabilities for encryption, deltas, delta retention, large files, remote backups (geo-diversity), and diverse storage device sizes.

the target audience is relatively sophisticated users with a moderate level of familiarity with backup technologies. it is assumed that the user has physical access (sneaker-nettable) to a remote, network connected site or sites on a fairly regular basis (e.g. commutes to work), is able to attach enough storage (disk) there to contain the data sets plus deltas, and has access to a portable mass storage device (e.g. a usb hard drive) for sneaker-net operations.

it is my belief that even with familiarity with backup technology, it is a challenge to define a good strategy that is safe and readily applicable to a variety of situations.

the general method of this document is to identify the essential properties of data sets, then of backup tactics, and then to define a simple formula for mapping one to the other in a way that optimizes pragmatic costs, resulting in an overall best practice strategy.

data set properties

there are 4 of them: base size, delta size, update frequency, and sensitivity.

the first 2 are based on size of data. these days, in terms of remote backups, data sizes generally fall into two buckets: large and small. data is large if it is too much to upload to a remote site over the cloud in a reasonable amount of time (hours). otherwise, data is small. upload time should take into consideration a throttled connection, since backup is likely sharing a network link with other normal network traffic. this large/small distinction applies in the sense of the total size of the data set, as well as in the sense of deltas to be backed up (i.e. are changes relatively large or small). a small data set may be something like a source code tree, and a large one may be something like an mp3 collection. a small delta may be something like adding a single mp3 to the collection, and a large delta may be something like mythtv capturing several gigs worth of video to a video folder.
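
as a quick worked example of where the large/small line falls (the numbers are arbitrary, just to show the arithmetic):

    # back-of-the-envelope check: is a data set "large" for remote backup?
    # the rate and sizes below are arbitrary examples, not recommendations.

    throttled_rate_mbps = 2     # bandwidth you're willing to give backups
    set_size_gb = 40            # e.g. an mp3 collection

    upload_hours = set_size_gb * 8 * 1024 / (throttled_rate_mbps * 3600)
    print(f"{upload_hours:.0f} hours to upload")    # ~46 hours: clearly "large"

    # a 200 MB source tree at the same throttled rate:
    print(f"{0.2 * 8 * 1024 / (2 * 3600):.2f} hours")   # ~0.23 hours: "small"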

update frequency also falls into a couple of buckets in terms of remote backup: rare and common. rare is relative to delta retention time... generally a retention time of 3-6 months is more than adequate for most backup situations... if a change has gone 6 months without the need for reversion, it is generally safe to prune its delta. so, if a set is updated less than once every 3-6 months, we'd classify it as rarely updated.

sensitivity is certainly the most subjective data set property, and can potentially have a range of classes that is very complex... it can depend on who "owns" the data, how old it is, who it is safe to be exposed to, the nature of the environment in which a remote backup site resides, and perhaps several other factors. for the purposes of this document, we will define it fairly simply: a data set is sensitive if the user deems it unfit for un-encrypted exposure to a remote environment. for example, if the remote environment is an office, it is probably safe to back up your mp3 collection there without encrypting it (you probably want to listen to it there anyways), but perhaps your resume backup should be encrypted. if the remote environment is a public webhosting server, you might need to encrypt your mp3 collection to avoid exposing/publicizing copyrighted material illegally, but it's probably ok to back up your resume there. so given a remote site, a backup set is either sensitive or it is not... again, a 2-bucket property. one more note about sensitivity: a paranoid person may prefer to encrypt all remote backups. while this is certainly in the realm of possibility, it should be understood that encrypting does exact some sacrifice, e.g. accessibility of an mp3 collection as described above, as well as some other technical, resource, and management burdens, which will be described later in this document.

so, there are 4 essential properties of a data set, each of which is basically binary in nature... on the surface it seems as though this should result in 16 different data set profiles, but actually a small base size implies small change sizes, so 4 of these combinations are bogus... therefore, there are 12 essential data set personalities. at first this seems overly complex, but in the next sections we will discover that there are only 3 useful backup tactics, and certain properties dominate others relative to those tactics, so the optimal decision about how to back up a given set is actually relatively simple.
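
the combinatorics in code form, for the skeptical:

    from itertools import product

    # enumerate the 2^4 = 16 combinations of (frequent updates, large changes,
    # large set, sensitive) and drop the impossible ones: a small set cannot
    # have relatively large changes.

    personalities = [p for p in product((0, 1), repeat=4)
                     if not (p[1] == 1 and p[2] == 0)]   # large changes, small set: bogus

    print(len(personalities))   # 12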

backup tactic properties

there are essentially 2 properties of a backup tactic: is it remote or local, and is it encrypted or clear... the definitions of those properties are obvious.

once again, we have a set of binary properties... 2 of them, so it would seem as though this should result in 4 backup tactic profiles, but actually an encrypted backup implies it is remote. there is no point in encrypting a data set and storing it on the source/local environment, since the risk exposure has not really changed. that leaves only 3 useful tactics: local/clear, remote/clear, and remote/encrypted.

reasoning/mapping the strategy

the essentials:

in general, if something is worth backing up, it is worth backing up remotely. however this is not always possible with certain data set properties. specifically, sets with frequent large deltas are only suitable for local backups.

sensitive backups should obviously be encrypted, assuming they don't have frequent large deltas, in which case they can only be backed up locally as stated above, which furthermore implies that there is no point in encrypting, as stated further above.

the tool, resource, and device constraints:

this document assumes the tools are differential, meaning most backups are not full backups but just delta backups, vastly conserving storage and bandwidth. large data sets can and should be backed up remotely (assuming they don't have frequent large deltas). of course the full backups will consume storage, but more importantly bandwidth... this is where the portable mass storage device comes in... sneaker-netting is probably the only channel with enough bandwidth. obviously the device should be large enough for a full backup of the data set, so this is one constraint on data set sizes. if you have a data set larger than commonly available drive sizes, you will want to try to divide it into smaller sets.

i use 2 tools for backups, duplicity for encrypted backups and rdiff-backup for clear. they are both differential and remote-over-ssh capable. your remote site should be able to initiate or terminate an ssh channel.
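
roughly what the two invocations look like when driven from a wrapper script. the hostname, paths, passphrase, and retention window are placeholders, and the flags are from memory, so verify them against your installed versions:

    import os
    import subprocess

    # clear, reverse-differential backup over ssh with rdiff-backup,
    # plus pruning of deltas older than the retention period
    subprocess.run(["rdiff-backup", "/home/me/music",
                    "backuphost::/backups/music"], check=True)
    subprocess.run(["rdiff-backup", "--remove-older-than", "6M",
                    "backuphost::/backups/music"], check=True)

    # encrypted backup with duplicity (gpg passphrase via the environment),
    # plus the matching retention cleanup
    env = dict(os.environ, PASSPHRASE="not-a-real-passphrase")
    subprocess.run(["duplicity", "/home/me/documents",
                    "sftp://backuphost//backups/documents"], check=True, env=env)
    subprocess.run(["duplicity", "remove-older-than", "6M", "--force",
                    "sftp://backuphost//backups/documents"], check=True, env=env)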

duplicity is a traditional differential scheme... differentials generally consume over twice as much storage as the source. someone familiar with differentials will understand that this is because storage for 2 full backups is needed to leapfrog the retention periods, plus storage for the deltas/incrementals between the fulls. this has major effects on its suitability for remote backing up of a data set: if the data set is large, you will be forced to routinely sneaker full backups to the remote site (which may not be a burden you want to bear on a regular basis), or (better) you will need to try to divide the set into a large but stable portion and a smaller dynamic portion. the large portion will only consume storage for a single full backup, and will only require a single sneaker operation, saving both storage and sneaker effort. the smaller dynamic portion would not require routine sneaker-net, and also marginalizes the issue of the backup consuming 2x the storage of the data set, since this portioned data set will be small.
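
a quick worked example of that storage difference (illustrative numbers only):

    # rough storage arithmetic for a forward-differential scheme like duplicity,
    # with a 6 month retention window. the sizes are made-up examples.

    full_gb = 40          # one full backup of the set
    monthly_delta_gb = 1  # incrementals accumulated per month
    retention_months = 6

    # to prune incrementals older than the retention window you eventually need
    # a second full backup chain running alongside the first ("leapfrogging"),
    # so at the crossover point you're holding roughly:
    peak_gb = 2 * full_gb + retention_months * monthly_delta_gb
    print(peak_gb)   # 86 GB to protect a 40 GB set

    # the reverse-differential approach (rdiff-backup, below) keeps one
    # up-to-date copy plus the reverse deltas:
    print(full_gb + retention_months * monthly_delta_gb)   # 46 GB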

rdiff-backup is a "reverse-differential" tool, meaning that rather than starting with a full backup of a data set and storing the deltas for each backup cycle needed to bring it up to date, rdiff-backup always keeps the backup set up to date, and instead stores the deltas needed to back-date it. this accomplishes 4 important things:

1) traditional forward differential solutions require routine full backups... with rdiff-backup, you need to take only one full backup, and thereafter all backups are delta backups. this means the backup only consumes 1x the storage of the data set plus the deltas, and more importantly will only require a single sneaker operation for the initial full backup.

2) the most common needs for restore from backup are those "oh shit" moments, like accidentally deleting the wrong thing or performing some irreversible operation. you usually realize it right away. in these cases, you generally want the most recent backup. in rdiff-backup, the most recent backup is the most trivial one to restore, since no deltas need be applied.

3) there is never a need to subdivide a large data set based on stable and dynamic portions the way you may need with duplicity.

4) each cycle will expire a delta, spreading out the disk io for deletion rather than doing a huge deletion each time a retention period passes.

(one disadvantage of reverse-differential is that it does not work for tape, since random updates of the full are not possible. but with disk sizes today, who wants to mess with tape? unfortunately amazon s3 also fits this pattern... data cannot be updated, only created/deleted... we can still use s3 for small sets though)
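
for completeness, a sketch of the two common restore cases with rdiff-backup; the host and paths are placeholders, and the -r/--restore-as-of time syntax is worth checking against your version:

    import subprocess

    # "oh shit" case: pull back the most recent copy (trivial, no deltas applied)
    subprocess.run(["rdiff-backup", "-r", "now",
                    "backuphost::/backups/music/album", "./album"], check=True)

    # reach back in time: the same directory as it was three weeks ago
    subprocess.run(["rdiff-backup", "-r", "3W",
                    "backuphost::/backups/music/album", "./album.3weeks"], check=True)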

after having considered the above, we'll decide which tactic makes the most sense for a given set, and perhaps whether you want to refactor/divide the sets. here is a table listing the optimal tactic for each possible data set personality:

frequent updates
| large changes
| | large set
| | | sensitive
| | | |
0 0 0 0 remote/clear
0 0 0 1 remote/encrypted
0 0 1 0 remote/clear
0 0 1 1 remote/encrypted (subdivided or routine-sneaker)
0 1 0 0 bogus - small set would have relatively small changes
0 1 0 1 bogus - small set would have relatively small changes
0 1 1 0 remote/clear
0 1 1 1 remote/encrypted (subdivided or routine-sneaker)
1 0 0 0 remote/clear
1 0 0 1 remote/encrypted
1 0 1 0 remote/clear
1 0 1 1 remote/encrypted
1 1 0 0 bogus - small set would have relatively small changes
1 1 0 1 bogus - small set would have relatively small changes
1 1 1 0 local/clear
1 1 1 1 local/clear

on inspection, we can see that there are 2 dominating patterns, and the decision flow is simple enough: if updates are large and frequent, the backup should be local/clear (rdiff-backup). otherwise, the tactic is based on the sensitivity of the data set.
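
the same decision flow as a tiny function (the parentheticals in the table about subdividing large encrypted sets still apply, they're just not encoded here):

    def tactic(frequent_updates, large_changes, large_set, sensitive):
        """map the 4 data set properties to a backup tactic, per the table above."""
        if large_changes and not large_set:
            raise ValueError("bogus: a small set can't have relatively large changes")
        if frequent_updates and large_changes:       # large, frequent deltas
            return "local/clear (rdiff-backup)"
        if sensitive:
            return "remote/encrypted (duplicity)"
        return "remote/clear (rdiff-backup)"

    print(tactic(frequent_updates=0, large_changes=0, large_set=1, sensitive=1))
    # -> remote/encrypted (duplicity)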

conclusion:

once you understand the essential properties of data sets and backup tactics, modern technology and resources commonly available to a relatively sophisticated user allow for a simple implementation of a pragmatically effective best practice backup strategy.

Saturday, July 30, 2011

my gym laptop - some commentary about building a touchscreen video viewing platform to use at the gym.

so i have the same story i suspect many folks do regarding fitness... i knew it was time to get my rear to work on it, but the gym was boring, running was too hard on my knees, cycling seemed a little dangerous and weather dependent, etc etc, excuses excuses. for a long time i dreamed the solution would be having a room in my house to put an elliptical machine in and being able to watch my own programs, both to pass the time and to sort of kill 2 birds with one stone. this was reinforced by the idea that visual stimulus really helps make workouts go by faster. the problem that it probably took me too long to acknowledge is that, living in san francisco (and reluctant to leave), i wasn't likely to have this extra workout room in my house any time soon.

i finally came around to the idea that instead of bringing an elliptical into my house, i should be able to take my videos to the gym. in fact i have a gym a block from my house, one of the nice tradeoffs of having little personal space due to living in a dense area. in 2009 when i did this project, it felt like a particularly opportune project idea, with lots of technologies finally becoming available at what felt like an exciting breakthrough time.

the solution may seem a little more obvious to others than it does to me. some might say iphone, but i don't think the screen size cuts it, either for viewing or control from a bouncy situation. some might say ipad, but i doubt the storage or locked-down environment would work. in either case, i doubt real quality roaming streaming capability is there. also i wanted to be able to sync all my content in a way that made the experience integrate with my sofa watching experience... random access and delete after consuming both at the sofa and at the gym, and that would take decent storage.

i decided to build the ui based on a web browser, which i have a lot of experience coding to. but there were still a lot of things i felt i needed, and was fortunate to have at the right time:

  • a capacitive touch interface with a decent screen size. it needed to work with linux also. i decided on the hp tx2 tablet laptop, which i am very happy with, and some smart folks in the ubuntu community had just finished the work of figuring out how to drive it in xwindows. the tablet mode is very nice, i can lean it up in the magazine holder on whichever machine i'm using, and adjust the viewing angle as needed depending on the holder's position. but, the tx2 is not without its issues. only one of the three bezel buttons seems to work under linux for me... better than none tho. the removable dvd drive catch broke, so i don't really have one anymore. why it was removable in the first place is a mystery... i called hp and the dvd drive is the only option available for that bay. it doesn't want to power up without a power plug, but after that i can remove the plug and run on battery normally. the touchscreen is really very good, but does have some phantom jumping and sometimes clicking when the screen displays a lot of moving dark contrast... not usually a problem though.

  • good amount of storage. these days a 640G 2.5 inch drive to go in a laptop can hold a very respectable amount of video for less than $100.

  • a compositing window manager (i felt the right control design was a transparent one, which would accommodate large buttons for bouncy fat-finger manipulations without encumbering video display real estate). i switched from fvwm (my wm of 11 years) to compiz, and the switch was shockingly painless even with my extensive old school xwindows desktop customizations. it also had the right features i needed to programmatically control window opaqueness (which i did with compiz display rules based on window titles, and i was able to manipulate the window titles with vanilla html title fields).

  • file synchronization (i wanted the exact same content available on the elliptical as on my sofa, and deletable in either place). i happened on the relatively new csync, which is a sort of efficient bidirectional rsync. it fit better than unison, a different synchronizer i love but which is more suited to text, since it often does full file scans which just did not work on the huge video or even audio files (i've cobbled together a similar portable podcast content platform based on a sansa clip for my work commute, but that's another story).

  • good content collecting tools (podcatcher, youtube downloader, compatible format computer dvr, various sniffy scraper techniques, bittorrent).

  • grab-and-drag firefox plugin - for scrolling ui lists from the touchscreen

  • a number of other technical tools that were either relatively new or i just hadn't learned how to use yet:

    • xautomation/wmctrl - for launching and positioning windows in the wm... fullscreening, foregrounding, switching desktops, sending keystrokes. xsetwacom for enabling/disabling the touchscreen with the bezel button, to avoid phantom clicks while the laptop was stowed between uses.

    • floating html layouts - i just learned how to do these from my coworker henry. just what i needed for building scrollable areas and large fat-finger touchscreen buttons and generally decent control layouts.

    • jquery - making the control code easier to write

    • xmlhttprequest - for asynchronous controlling of the video player

    • video player with an approachable control api - i've used mplayer for years, and was delighted to find it had fine, full featured named-socket controls, and all the playback features i could ask for (disableable onscreen display with elapsed/total time, volume level, skipping an arbitrary period forward or backward, a/v delay for bluetooth syncing (i eventually gave up on bluetooth audio and decided wired earbuds worked better... some footnotes below)). a rough sketch of how i drive it follows below.
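
here is roughly what the control side looks like; the fifo path is a placeholder and the slave-mode command strings are from memory, so check them against mplayer's slave documentation:

    # the web ui's xmlhttprequest hits a cgi script, which writes mplayer
    # slave-mode commands into the pipe mplayer was started with, e.g.:
    #   mkfifo /tmp/mplayer.fifo
    #   mplayer -slave -input file=/tmp/mplayer.fifo /videos/show.avi
    # the path and command strings below are placeholders/from memory.

    FIFO = "/tmp/mplayer.fifo"

    def send(command):
        """write one slave-mode command to the running mplayer."""
        with open(FIFO, "w") as fifo:
            fifo.write(command + "\n")

    send("pause")            # toggle pause
    send("seek 30 0")        # skip 30 seconds forward (relative seek)
    send("seek -120 0")      # jump two minutes back
    send("volume 5")         # nudge the volume up
    send("osd")              # cycle the onscreen display (elapsed/total time)
    send("audio_delay 0.3")  # a/v delay tweak, e.g. for bluetooth audio lag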

this project, though admittedly somewhat humble (there's plenty of mobile video gadgetry out there), was really exciting from beginning to end. it's not the first time i've built something that i knew was going to make my life better, but it was uniquely remarkable when i think about how a dozen different pieces all came together at the right time in the right way. many of the pieces hadn't existed just a couple of months or years before, and just as many pieces may have existed but were completely new to me. on top of all this, any engineer knows that the more moving parts, even if they're mature and familiar, the more complex and likely to fail an idea will be. but in this case, with just a modest amount of engineering and research, i wound up with exactly what i had imagined i wanted.

now, a couple of years later, i can say with certainty that this project was a total success. i still get excited about not wasting time watching video at home (well, not AS much), and about having tons of interesting and current content available to consume while burning calories at the same time.

here's a youtube video of the rig in action:

http://www.youtube.com/watch?v=K2637RSIvLM

footnotes:

  • i put a good amount of effort into wireless bluetooth headphones, but eventually gave up on them. i went through 3 different pairs that would break or just not cut it in different ways. the plantronics voyager 855 was too quiet. the motorola rokr lost sync too easily and often. the rocketfish knockoff of the motorola rokr performed the best and was cheapest, but the power switch eventually broke. it was also tricky to keep any of the headphones paired. i found i had to send a silent track at all times, else the headphones would go to sleep after a very short time. it was also a pain to manage the sync by delaying the audio on the player... but i would have put up with this if not for all the other issues.

  • i use firefox for the interface currently. i tried to use chrome, but it had funky behavior when i was remote and not connected to the network and just connecting to the local web cgi. also it did not blank the pointer the way i wanted, especially since the tx2 had a phantom mouse movement which would be distracting in front of the video if not blanked well.

  • cost was relatively modest. more than an ipad, but what i have is much more capable for my needs also. $800 for the hp tx2, $100 for 640g hd, and maybe $30 for the earbuds.

tostaa

finally published tostaa. i've been wanting to do this for a long time, but really wanted to work out some bugs before going public. however, since it works well enough for me, i never seem to find time to fix it up for prime time. so, i figured i'd just throw it out there.



Sunday, November 23, 2008

gsm interference is lame

one vote for gsm buzzing my speakers being an egregious failure of the fcc and part 15 interpretation. ok, i didn't actually read part 15, but there are just too many audible devices and speakers affected by too many transmitters. if the spirit of part 15 is to encourage cheap consumer devices, i feel that mission has lost a relatively large amount of ground. gsm has been around for over 10 years, yet i've never seen a consumer product claiming to be "gsm buzz proof"... it feels like we've been left to fumble with clumsy ferrite core and anti-static bag solutions. one problem of course is that gsm has such critical mass around the world that it would be difficult to uproot at this point. it'll be interesting but painfully slow to see if this issue becomes popular and how it might be solved in the next 5-10 years.

Tuesday, June 3, 2008

good kk post

this yunus type stuff rocks...

http://www.kk.org/thetechnium/archives/2008/05/technologies_th.php