Infrastructure

Infrastructure: Why We Love Cattle And Think Like A Rancher

Okay, okay -- we don't actually have it in for Fido here at Mobomo. In fact, we're quite the puppy- and kitty-loving group. But when it comes to building infrastructure to support your web or mobile application in today's cloud-based environment, it has never been more important to forget everything you ever knew about caring for pets, and instead, start thinking of yourself as a cattle rancher.

Pets, in a word, are unique. Aren't yours? They're cuddly, quirky, and require lots of tender loving care. And while that's delightful if your pet is a Basset hound or a Maine Coon, it's generally not the best model to follow if your pet is a server.

In the old, dark days before virtualization and cloud services, all servers were pets -- physical boxes screwed onto a rack in a data center (or worse, sitting underneath Larry's desk in the office server closet; careful with that Big Gulp, Larry!). Each one was named, crafted by hand, and given copious attention by a neck-bearded SysAdmin who came to love his servers as much as, or more than, his children. Pets are indispensable and irreplaceable, and when they get sick, it's a BIG deal. After all, what self-respecting pet-parent wouldn't drop everything to nurse his/her little furbaby back to health?

But in the rough-and-tumble world of application development, that's a big problem. Time spent troubleshooting machine-specific problems is a deadweight loss for every project's bottom line. To say nothing of that desperate feeling when the CEO's pet project is suddenly unresponsive, and Larry's the only one who knows anything about the machine it's running on, and Larry's not answering his pager...

Cattle, on the other hand, aren't like pets. They're typically given numbers instead of names. They're expendable and (with apologies to vegetarians everywhere) disposable. Rather than being cared for individually, they are completely managed by repeatable and documented processes. If there's a problem with one cow, the herd is unaffected. The sick cow is, shall we say, simply removed from the herd and life goes on.

Virtualization, cloud computing, and provisioning tools like Chef and Puppet have combined to allow technical architects to think of their resources as cattle instead of pets. This thought -- that systems are not hand-crafted masterpieces but cogs in a machine managed by repeatable rules -- is at the core of the DevOps philosophy. While there is a great deal of spirited debate over what DevOps precisely means, at Mobomo it boils down to three rules:

Infrastructure is Code

Larry's not going to like this, but he'd be a lot better at his job if he thought more like a developer. All those manual processes he goes through to create and maintain infrastructure, all those one-liner shell scripts and StackExchange hacks he's committed to memory, mean he's introduced a non-automated factor into the project's critical path: himself.

But fortunately (or not) for Larry, machine provisioning tools like Ansible and Chef, along with cloud resource templating systems like CloudFormation and Terraform, mean that it is now possible to remove all manual interaction from the process and allow an application's infrastructure to be defined by code, right beside the application source itself. This way it can be version-controlled, peer-reviewed, and easily tested in non-production environments, eliminating the incongruities that come with environments that have drifted apart. Beyond that, the code functions as de facto documentation of your environment's structure, which tends to be way more reliable than whatever is lurking in the nether regions of Larry's memory.

Bottom line: if your entire application infrastructure can't be rebuilt with the push of a button (or the running of a single shell script), then it's not in code, and therefore your servers are pets, not cattle.
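To make the "push of a button" test a little more concrete, here is a minimal Ruby sketch of the idea behind tools like Terraform and CloudFormation: the desired infrastructure is declared as data, and a plan is computed by comparing it with what is actually running. The role names, image names, and the `plan` method are all hypothetical, invented for illustration. This is not any real tool's API.

```ruby
# Desired state, kept in version control next to the application source.
# Role and image names are hypothetical.
DESIRED = {
  "web"    => { count: 3, image: "web-v42" },
  "worker" => { count: 2, image: "worker-v42" }
}.freeze

# Compare the desired state with what is running and return the list of
# actions needed to converge. Nothing is ever modified in place.
def plan(desired, actual)
  actions = []
  desired.each do |role, spec|
    running = actual.fetch(role, [])
    fresh, stale = running.partition { |server| server[:image] == spec[:image] }

    # Servers built from an outdated image are replaced rather than patched.
    stale.each { |server| actions << [:destroy, role, server[:id]] }
    # Servers beyond the declared count are removed.
    fresh.drop(spec[:count]).each { |server| actions << [:destroy, role, server[:id]] }
    # Missing servers are created from the declared image.
    [spec[:count] - fresh.size, 0].max.times { actions << [:create, role, spec[:image]] }
  end

  # Roles that are no longer declared are torn down entirely.
  (actual.keys - desired.keys).each do |role|
    actual[role].each { |server| actions << [:destroy, role, server[:id]] }
  end
  actions
end

actual = { "web" => [{ id: "i-1", image: "web-v41" }, { id: "i-2", image: "web-v42" }] }
plan(DESIRED, actual).each { |action| puts action.inspect }
```

Running the plan against an environment that already matches the declaration returns an empty list, which is the property that makes a rebuild safe to repeat.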

Infrastructure is Immutable

Related to the above is the idea that infrastructure should be immutable -- that is, never altered on the fly once it has been created. This rule prevents Larry from, say, noticing a bug and updating a configuration file on a running server, thus causing headaches the next time the application is deployed or has to scale out. If all running resources are treated as inaccessible black boxes, changes can only be made via the version-controlled provisioning scripts and templates, thus ensuring that each deployment or scale-out of the application will be running on identical servers.

Furthermore, treating infrastructure as immutable allows us to think about deployments themselves differently. Rather than deploying an application by pushing an update to running servers, crossing our fingers and hoping it works, then desperately rushing to revert manually if a bug is discovered, at Mobomo we employ a Blue/Green deployment methodology.

When a production stack needs to be updated, it is deployed from scratch with new infrastructure (the "Green" instance) created from code each time. That allows QA testers (using a combination of automated and manual tests; more on the relationship between automated testing and Blue/Green deployments in a future post!) to verify the functionality of the new environment before it is made live. Then, production traffic is simply switched (by changing a DNS record or similar) onto the new/"Green" stack. If a problem is then detected with the newly deployed app, it is a simple matter to switch back to the old production "Blue" instance painlessly without the risk of extended production downtime. Once all stakeholders are satisfied, the old "Blue" stack can simply be destroyed.
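The cutover and rollback described above can be modeled in a few lines of Ruby. This is only a sketch: the `Router` class, the stack names, and the health check are hypothetical stand-ins for a real DNS record or load balancer and a real QA process.

```ruby
# Minimal model of a Blue/Green cutover. Stack names and the health check
# are hypothetical; a real router would be a DNS record or load balancer.
class Router
  attr_reader :live, :previous

  def initialize(live)
    @live = live
    @previous = nil
  end

  # Point production traffic at the candidate, but only if it passes checks.
  def promote(candidate, healthy:)
    raise ArgumentError, "#{candidate} failed verification" unless healthy.call(candidate)
    @previous = @live
    @live = candidate
  end

  # Switch traffic back to the stack that was live before the last promote.
  def rollback
    raise "no previous stack to roll back to" if @previous.nil?
    @live, @previous = @previous, @live
    @live
  end

  # Once everyone is satisfied, the old stack can be destroyed.
  def retire_previous
    retired = @previous
    @previous = nil
    retired
  end
end

router = Router.new("blue")
router.promote("green", healthy: ->(stack) { stack == "green" })
puts router.live   # green
router.rollback
puts router.live   # blue
```

Because the old stack is left untouched until `retire_previous` is called, rolling back is just a second switch rather than a redeploy.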

Embrace the Chaos Monkey

Cattle-oriented infrastructure means coming to terms with the idea that failure is ubiquitous and constant, and rather than something to be feared, should be embraced. This means specifically designing your application infrastructure with the certainty that it will fail, and testing that failure constantly, in production.

We've recently been playing with a tool called Chaos Monkey created by the development team at Netflix. Chaos Monkey does one job and does it very well -- it runs in your production environment and randomly kills running server instances.

Let that sink in for a moment: if your infrastructure cannot tolerate the random, arbitrary death of machines, then you are treating your servers like pets instead of cattle.

Using techniques like load balancing, auto-scaling, and high-availability proxies, it has never been easier to embrace the Chaos Monkey. Failure should be assumed and automated procedures put in place (spinning up new machines, altering DNS and load balancer configuration) to handle that failure without any human involvement. Furthermore, with cloud services like AWS and Azure continuing to build hosted solutions that take the guesswork out of planning for scalability, we find ourselves in a brave new world in which most undifferentiated heavy lifting has been eliminated and we can focus on what we do best: building great applications.
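As a toy illustration of the principle, the following Ruby sketch simulates random instance death followed by automated replacement. It does not use Chaos Monkey or any cloud API; the `Pool` class, its methods, and the instance names are hypothetical.

```ruby
# Toy simulation of random instance failure plus automated recovery.
# This does not use Chaos Monkey itself; all names are hypothetical.
class Pool
  attr_reader :instances

  def initialize(desired)
    @desired = desired
    @launched = 0
    @instances = []
    heal
  end

  # Stand-in for Chaos Monkey: terminate one running instance at random.
  def kill_random(rng = Random.new)
    victim = @instances.sample(random: rng)
    @instances.delete(victim)
    victim
  end

  # Stand-in for an auto-scaling group: launch replacements until the
  # pool is back at its desired size, with no human involved.
  def heal
    replacements = []
    while @instances.size < @desired
      @launched += 1
      replacements << "web-#{@launched}"
      @instances << replacements.last
    end
    replacements
  end
end

pool = Pool.new(3)
5.times do
  victim = pool.kill_random
  replacement = pool.heal.first
  puts "killed #{victim}, launched #{replacement}, pool size #{pool.instances.size}"
end
```

No instance in the pool is special, so the loop can run indefinitely and the pool always returns to its desired size.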

So by all means, adopt a pet or three and care for them like family. But when it comes to supporting your Internet applications, it's long since time to start thinking like a rancher.

Backups

When you make a coding mistake, you can revert to a known good commit. But when disaster wreaks havoc with your data, you had better have an offsite backup ready to minimize your losses. Enter the backup gem, a DSL for describing your different data stores and offsite storage locations. Once you specify what data stores you use in your application (MySQL, PostgreSQL, Mongo, Redis, and more), and where you want to store the backups (rsync, S3, CloudFiles), Backup will dump and store them. You can specify how many backups you'd like to keep in rotation, and there are various extras like gzip compression and notifiers for when backups are created or fail.
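To show what keeping a fixed number of backups in rotation means, here is a small plain-Ruby sketch. It is not the gem's DSL or implementation; the `expired_backups` helper and the file naming scheme (a 14-digit `YYYYMMDDHHMMSS` timestamp in each name) are assumptions made for the example.

```ruby
# Decide which backups to delete so that only the newest `keep` remain.
# File names are hypothetical and assumed to embed a 14-digit timestamp
# (YYYYMMDDHHMMSS), which sorts correctly as a plain string.
def expired_backups(names, keep:)
  newest_first = names.sort_by do |name|
    name[/\d{14}/] || raise(ArgumentError, "no timestamp in #{name}")
  end.reverse
  newest_first.drop(keep)
end

backups = [
  "db-20240101043000.sql.gz",
  "db-20240103043000.sql.gz",
  "db-20240102043000.sql.gz"
]
puts expired_backups(backups, keep: 2)   # db-20240101043000.sql.gz
```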

Cron Jobs

Having backups configured doesn't make you any less absent-minded about running your backups. The first remedy that jumps to mind is editing your crontab. But man, it's hard to remember the format on that sucker. If only there were a Ruby wrapper around cron... Fortunately there is! Thanks to the whenever gem, you can define repetitious tasks in a Ruby script.
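For reference, a crontab line has five positional time fields followed by the command: minute, hour, day of month, month, and day of week. The sketch below builds such a line from named arguments in plain Ruby. The `cron_line` helper and the paths are hypothetical and are not part of the whenever gem, which offers its own, richer DSL for the same job.

```ruby
# Build one crontab line from named fields, so nobody has to remember the
# positional order: minute, hour, day of month, month, day of week.
# This helper is hypothetical and is not part of the whenever gem.
def cron_line(command, minute: "*", hour: "*", day: "*", month: "*", weekday: "*")
  [minute, hour, day, month, weekday, command].join(" ")
end

# Every day at 04:30.
puts cron_line("cd /srv/app && bundle exec rake backup", minute: 30, hour: 4)
# prints: 30 4 * * * cd /srv/app && bundle exec rake backup

# Every Monday at 09:00 (weekday 1).
puts cron_line("/usr/local/bin/report", minute: 0, hour: 9, weekday: 1)
# prints: 0 9 * * 1 /usr/local/bin/report
```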

Cloud Services

With the number of cloud services available today, it's becoming more common to have your entire infrastructure hosted in the cloud. Many of these services offer APIs to help you tailor and control your environments programmatically. Having APIs is great, but it's tough to keep them all in your head.

The fog gem is the one API to rule them all. It provides a consistent interface to several cloud services, with a specific adapter for each one. Sticking to the Fog interface makes it really easy to switch between different cloud services. Say you were using Amazon's S3, but wanted to switch to Rackspace's CloudFiles. If you use Fog, it's as simple as replacing your credentials and changing the service name. You can create real cloud servers, or create mock ones for testing. Even if you don't use any cloud services, fog has adapters for non-cloud servers and filesystems.

Exception Handling

Hoptoad (since renamed Airbrake) is a household name in the Ruby community. It catches exceptions raised by your app and reports them through a pretty web interface and notifications. If you can't use Hoptoad because of a firewall, check out the self-hostable Errbit.

Monitoring

When your infrastructure isn't running smoothly, it better be raising all kinds of alarms and sirens to get someone to fix it. Two popular monitoring solutions are God and Monit. God lets you configure which services you want to monitor in Ruby, and the Monit gem gives you an interface to query services you have registered with Monit. If you have a Ruby script that you'd like to have running like a traditional Unix daemon, check out the daemons gem. It wraps around your existing Ruby script and gives you a 'start', 'stop', 'restart' command line interface that makes it easier to monitor. Don't forget to monitor your background services; it sucks to have all your users find your broken server before you do.
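The core loop that process monitors automate is simple: poll a health check, and restart the service once it has failed enough times in a row. Here is a toy plain-Ruby model of that loop. The `Watchdog` class and its parameters are hypothetical; this is not how God or Monit are configured.

```ruby
# Restart a service after a number of consecutive failed health checks.
# A toy model of what process monitors automate; all names are hypothetical.
class Watchdog
  attr_reader :restarts

  def initialize(check:, restart:, max_failures: 3)
    @check = check
    @restart = restart
    @max_failures = max_failures
    @failures = 0
    @restarts = 0
  end

  # Run one polling cycle. Returns :ok, :failing, or :restarted.
  def poll
    if @check.call
      @failures = 0
      return :ok
    end

    @failures += 1
    return :failing if @failures < @max_failures

    @restart.call
    @restarts += 1
    @failures = 0
    :restarted
  end
end

healthy = false
dog = Watchdog.new(check: -> { healthy }, restart: -> { healthy = true })
4.times { puts dog.poll }   # failing, failing, restarted, ok
```

Requiring several consecutive failures before restarting avoids bouncing a service over a single slow response.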

Staging

Your application is happily running in production, but all of a sudden, it decides to implode on itself for a specific user when they update their avatar. Try as you might, you just can't reproduce the bug locally. You could do some cowboy debugging on production, but you'll end up dropping your entire database by accident. Oops.

It's times like these that you'll be thankful you have a staging environment set up. If you use capistrano, make sure to check out the capistrano-ext gem and its multi-stage deploy functionality. To reproduce your bug on the same data, you can use the taps gem to transfer your data from your production database to your staging database. If you're using Heroku then it's already built-in.

Before you start testing your mailers on staging, do all of your users a favor and install the mail_safe gem. It stubs out ActionMailer so that your users don't get your testing spam. It also lets you send emails to your own email address for testing.

CLI Tools

Thor is a good foundation for writing CLI utilities in Ruby. It has interfaces for manipulating files and directories, parsing command line options, and manipulating processes.

Deployment

Capistrano helps you deploy your application, and Chef configures and deploys your servers and services. If you use Vagrant for managing development virtual machines, you can reuse your Chef cookbooks for production.

Conclusion

All of these gems help us maintain our application infrastructure in a robust way. They free us from running one-off scripts and hacks in production and give us a repeatable process for managing everything our app runs on. And on top of all the awesome functionality these tools provide, we can also write Ruby to interact with them and version control them alongside our code. So for your next killer webapp, don't forget to add some killer devops to go along with it.
