
Clients come to us with an array of problems that need innovative solutions. Recently, one of our clients came to us with an issue: their website was not configured to auto scale. For those who are not familiar, auto scaling ensures that your platform runs the correct number of server instances for the current traffic or load. Because of the industry our client is in, breaking news and events can occur instantly, driving a surge of traffic to their website at any given time. Their site could be scaled manually; however, that retroactive intervention isn't always fast enough. We built a solution to absorb sporadic spikes in traffic while successfully maintaining platform uptime.

To do this, we used a multi-server Locust setup to run load tests, which let us exercise a range of slow and fast ramp-up scenarios. These tests showed us how much further we could scale instances while maintaining acceptable uptime than we ever could before. Once we had decided on the types of tests we wanted to run, it was time to set our "standard" for how many users to simulate at once. We ultimately settled on 100,000 concurrent users, to match the traffic we see hitting our CDN during high-traffic events. We then collected a list of our 2500 most requested queries for our test users to issue during the tests.

We ran the slow test first, gradually adding users to simulate a predictable, normal situation. This test added servers as needed, and overall there was no noticeable impact on the speed at which content was served. Overall, pretty dull… which was wonderful! 100k users a minute, and every statistic showed a happy, healthy site.

The fast test threw thousands and thousands of users at the site within a few minutes: we went from 0 to 100k users in roughly six minutes. This caused the site to serve 500s while the servers were being added. Most of those 500s were absorbed at our cache layer, which served stale content until the requests could be fulfilled. Our findings showed that the faster we ramped up users, the more stress we placed on the servers. Our auto scale group added two servers every three minutes until we reached nine servers; from prior testing, we knew this was roughly the number of servers needed to handle the load. Once we reached around five servers, the 500s disappeared. After we reached nine servers, the request queue cleared up and latency between our cache and application layers looked nominal. All in all, this process took about 20 minutes to reach "normal" at 100k users per minute.

During the peak of the fast test, when the servers were under the most pressure, we decided to do the unthinkable and clear our Drupal site cache. I know, sounds like a great idea, but we love a challenge. We pressed the clear-cache button, waited 10 to 15 minutes, and to our surprise… the drama we were anticipating never played out. The application saw a slight jump in latency, key statistics rose or fell by about 5-10% for about one minute, and then everything returned to normal. That was it: no fireworks, only the lingering taste of sweet, sweet success! This is due in large part to the site serving anonymous traffic, though its cache policies still require frequent invalidation.

Overall, both tests at 100k users a minute are above and beyond the highest traffic we have seen in an hour's timespan, thanks to the CDN layer. We successfully completed the 100k-users-a-minute test, with each simulated user issuing roughly one request every 10 seconds. The results of each test matter because of the scale our client can now operate at: the site can go from minimal traffic to 50x that amount in a short window. Problem solved!

You've just written a masterpiece of a web app. It's fun, it's viral, and it's useful. It's clearly going to be "Sliced Bread 2.0". But what comes next is a series of unforeseen headaches. You'll outgrow your shared hosting and need to get on cloud services. A late-night hack session will leave you sleep deprived, and you'll accidentally drop your production database instead of your staging database. Once you serve up a handful of error pages, your praise-singing users will leave you faster than it takes to start a flamewar in #offrails. But wait! Just as Ruby helped you build your killer app, Ruby can also help you manage your infrastructure as your app grows. Read on for a list of useful gems every webapp should have.

Backups

When you make a coding mistake, you can revert to a known good commit. But when disaster wreaks havoc with your data, you'd better have an offsite backup ready to minimize your losses. Enter the backup gem, a DSL for describing your different data stores and offsite storage locations. Once you specify which data stores your application uses (MySQL, PostgreSQL, Mongo, Redis, and more) and where you want to store the dumps (rsync, S3, CloudFiles), Backup will dump and store your backups. You can specify how many backups you'd like to keep in rotation, and there are various extras like gzip compression and notifiers that tell you when backups are created or fail to be created.
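
As a rough sketch, a Backup model might look something like this (names, credentials, and the notifier settings are placeholders, and the exact DSL can vary between Backup versions):

    # config/backup/my_app_backup.rb -- a hypothetical Backup model
    Backup::Model.new(:my_app_backup, 'Nightly MySQL dump to S3') do
      database MySQL do |db|
        db.name     = 'my_app_production'
        db.username = 'backup_user'
        db.password = 'secret'
      end

      store_with S3 do |s3|
        s3.access_key_id     = 'AWS_ACCESS_KEY'
        s3.secret_access_key = 'AWS_SECRET'
        s3.bucket            = 'my-app-backups'
        s3.keep              = 10   # keep the last ten backups in rotation
      end

      compress_with Gzip

      notify_by Mail do |mail|
        mail.on_success = false
        mail.on_failure = true
      end
    end

Running backup perform --trigger my_app_backup then dumps the database, compresses it, ships it to S3, and prunes anything beyond the last ten copies.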

Cron Jobs

Having backups configured doesn't make you any less absent-minded about running them. The first remedy that jumps to mind is editing your crontab. But man, it's hard to remember the format on that sucker. If only there were a Ruby wrapper around cron... Fortunately, there is! Thanks to the whenever gem, you can define repetitious tasks in a Ruby script.
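
For example, a schedule that runs the hypothetical backup trigger from above every night might look like this (the trigger name and times are assumptions):

    # config/schedule.rb -- a sketch of a whenever schedule
    every 1.day, :at => '4:30 am' do
      command "backup perform --trigger my_app_backup"
    end

    every :sunday, :at => '12pm' do
      rake "log:clear"
    end

Running whenever --update-crontab translates the schedule into crontab entries for you, so you never have to remember cron's five-field format again.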

Cloud Services

With the number of cloud services available today, it's becoming more common to have your entire infrastructure hosted in the cloud. Many of these services offer APIs to help you tailor and control your environments programmatically. Having APIs is great, but it's tough to keep them all in your head.

The fog gem is the one API to rule them all. It provides a consistent interface to several cloud services, with a specific adapter for each one. Because every adapter follows the same Fog interface, it's really easy to switch between providers. Say you were using Amazon's S3 but wanted to switch to Rackspace's CloudFiles: with Fog, it's as simple as swapping your credentials and changing the service name. You can create real cloud servers, or mock ones for testing. Even if you don't use any cloud services, fog has adapters for non-cloud servers and filesystems.
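
Here's a sketch of what that looks like for storage (credentials and bucket names are placeholders):

    # upload a file to S3 via fog; swap the provider hash to move clouds
    require 'fog'

    storage = Fog::Storage.new(
      :provider              => 'AWS',
      :aws_access_key_id     => 'AWS_ACCESS_KEY',
      :aws_secret_access_key => 'AWS_SECRET'
    )

    directory = storage.directories.create(:key => 'my-app-assets')
    directory.files.create(
      :key    => 'hello.txt',
      :body   => 'Hello from fog!',
      :public => true
    )

Pointing the same code at CloudFiles is a matter of constructing Fog::Storage with :provider => 'Rackspace' and Rackspace credentials, and calling Fog.mock! at the top of your tests swaps in the mock implementations.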

Exception Handling

Hoptoad is a household name in the Ruby community. It catches exceptions raised by your app and funnels them into a pretty web interface, along with other notifications. If you can't use Hoptoad because of a firewall, check out the self-hostable Errbit.
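
Setup is a few lines in an initializer (the API key is a placeholder):

    # config/initializers/hoptoad.rb -- minimal hoptoad_notifier setup
    HoptoadNotifier.configure do |config|
      config.api_key = 'your-hoptoad-api-key'
    end

Unhandled exceptions in your controllers are then reported automatically, and you can report rescued ones yourself with HoptoadNotifier.notify(exception).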

Monitoring

When your infrastructure isn't running smoothly, it had better be raising all kinds of alarms and sirens to get someone to fix it. Two popular monitoring solutions are God and Monit. God lets you configure which services you want to monitor in Ruby, and the monit gem gives you an interface to query services you have registered with Monit. If you have a Ruby script that you'd like to run like a traditional Unix daemon, check out the daemons gem. It wraps your existing Ruby script and gives you a 'start', 'stop', 'restart' command line interface that makes it easier to monitor. Don't forget to monitor your background services; it sucks to have all your users find your broken server before you do.
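
A God watch for a hypothetical background worker might look like the following (the paths, names, and start command are assumptions):

    # my_app.god -- a sketch of a God watch for a background worker
    God.watch do |w|
      w.name     = 'my-app-worker'
      w.dir      = '/var/www/my-app/current'
      w.start    = 'bundle exec rake jobs:work'
      w.interval = 30.seconds

      # start the process whenever God notices it isn't running
      w.start_if do |start|
        start.condition(:process_running) do |c|
          c.running = false
        end
      end
    end

Loading it with god -c my_app.god keeps the worker alive without any hand-rolled init scripts.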

Staging

Your application is happily running in production when, all of a sudden, it decides to implode on itself for one specific user whenever they update their avatar. Try as you might, you just can't reproduce the bug locally. You could do some cowboy debugging on production, but you'll end up dropping your entire database by accident. Oops.

It's times like these when you'll be thankful to have a staging environment set up. If you use capistrano, make sure to check out the capistrano-ext gem and its multi-stage deploy functionality. To reproduce the bug against the same data, you can use the taps gem to transfer your data from your production database to your staging database. If you're using Heroku, that functionality is already built in.
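
Enabling multi-stage deploys is only a few lines in your deploy config (the stage names here are the conventional ones; adjust to taste):

    # config/deploy.rb -- capistrano-ext multi-stage setup
    set :stages, %w(staging production)
    set :default_stage, 'staging'
    require 'capistrano/ext/multistage'

    # config/deploy/staging.rb and config/deploy/production.rb then hold
    # the per-stage settings, e.g.:
    #   server 'staging.example.com', :app, :web, :db, :primary => true

After that, cap staging deploy and cap production deploy each target the right boxes.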

Before you start testing your mailers on staging, do all of your users a favor and install the mail_safe gem. It stubs out ActionMailer so that your users don't get your testing spam. It also lets you send emails to your own email address for testing.
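
From memory, the configuration hooks look roughly like this; treat the exact names as assumptions and check the gem's README:

    # config/initializers/mail_safe.rb -- hypothetical mail_safe setup
    if defined?(MailSafe::Config)
      # addresses matching this pattern are considered internal and
      # delivered normally; everything else gets rerouted
      MailSafe::Config.internal_address_definition = /.+@example\.com\z/i
      MailSafe::Config.replacement_address = 'dev-team@example.com'
    end

Guarding with defined? keeps the stub out of production, where the gem shouldn't be loaded at all.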

CLI Tools

Thor is a good foundation for writing CLI utilities in Ruby. It has interfaces for manipulating files and directories, parsing command line options, and working with processes.
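
A minimal Thor script looks like this (the task and option are made up for illustration):

    # cli.rb -- a minimal Thor command line utility
    require 'thor'

    class MyCLI < Thor
      desc 'greet NAME', 'Print a friendly greeting'
      method_option :shout, :type => :boolean, :default => false,
                    :desc => 'Greet at maximum volume'
      def greet(name)
        greeting = "Hello, #{name}!"
        greeting = greeting.upcase if options[:shout]
        puts greeting
      end
    end

    MyCLI.start

Running ruby cli.rb greet World --shout prints HELLO, WORLD!, and ruby cli.rb help greet gives you usage output for free.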

Deployment

Capistrano helps you deploy your application, and Chef configures and deploys your servers and services. If you use Vagrant for managing development virtual machines, you can reuse your Chef cookbooks for production.
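
With Vagrant's chef-solo provisioner, the same cookbooks that build your production boxes can build your dev VM. A sketch, using Vagrant 1.0-era syntax with placeholder box and recipe names:

    # Vagrantfile -- provisioning a dev VM with your Chef cookbooks
    Vagrant::Config.run do |config|
      config.vm.box = 'lucid64'
      config.vm.provision :chef_solo do |chef|
        chef.cookbooks_path = 'cookbooks'
        chef.add_recipe 'my_app'
      end
    end

A plain vagrant up then gives every developer a VM configured the same way as production.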

Conclusion

All of these gems help us maintain our application infrastructure in a robust way. They free us from running one-off scripts and hacks in production and give us a repeatable process for managing everything our app runs on. And on top of all the awesome functionality these tools provide, we can write Ruby to interact with them and version control them alongside our code. So for your next killer webapp, don't forget to add some killer devops to go along with it.
