Wednesday, March 27, 2013

Performance Tuning in the Cloud

I have spent the last couple of days on performance tuning for our big project at work. We are migrating the software from ancient hardware into our own private cloud. What is the difference? With the old system, we had a fixed number of servers and had to manually spread pieces of the application across them. With our own private cloud, we have a number of servers that we make look like a single machine. Then we divide up that machine so that each piece of the application looks like it is running on its own server. One thing we are discovering is that performance problems can be much more difficult to solve in this kind of setup.

Yesterday we ran into an interesting problem that we spent today figuring out. In the video gaming world, people don't like it when the game crashes. To combat this issue, we run redundant systems, so there are always two of everything. The problem we came across was that one server was processing more data than the other, even though they were configured identically. Everyone looking at the problem suggested going through both configurations line by line. We did, and verified that the two systems were set up identically. Then we all sat around scratching our heads, wondering what to look at next. We tried twisting some knobs and pushing various buttons, only to have more questions after looking at the performance numbers.

It turns out that taking a bunch of machines and combining them to form this computing cloud obfuscates what is happening on the raw hardware underneath. To figure out what is really taking place, one has to look there. We are still in the midst of improving performance, but we now have a much better handle on what is going on. Amazingly, this new system is already running circles around the old one. We just want it to run faster.
