Wednesday, October 27, 2010

Doing Things in Parallel

Yesterday I had a busy day at the office. I had to migrate data from one computer to another. It was a lot of data and I knew it was going to take all day. Unfortunately, it couldn't all be done at once and had to be moved in chunks. Since I wanted to be as efficient as possible, I looked at how I could do things in parallel.

The first step was to export a subset of the data out of the database on the first machine. That took about 5 minutes. Next was compressing the exported data, which took a minute. The middle step, transferring the compressed data to the new machine, took the bulk of the time at 20 minutes. Once on the new machine, the data was uncompressed (another minute) and imported into the new database (another 5 minutes). Add it all up and the entire process comes to 32 minutes per segment. Unfortunately, there were about 20 data segments to move. Doing it one segment at a time meant I would be at it for over 10 hours.
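To make the arithmetic concrete, here is a small Python sketch of the one-segment-at-a-time approach. The stage times come straight from the numbers above; the names `STAGES`, `SEGMENTS`, and `serial_minutes` are made up for illustration, and "about 20" segments is treated as exactly 20.

```python
# Stage durations in minutes, taken from the figures in the post.
STAGES = {
    "export": 5,
    "compress": 1,
    "transfer": 20,
    "uncompress": 1,
    "import": 5,
}
SEGMENTS = 20  # "about 20" in the post; assumed to be exactly 20 here


def serial_minutes(stages, segments):
    """Total time when each segment finishes every stage before the next one starts."""
    return sum(stages.values()) * segments


total = serial_minutes(STAGES, SEGMENTS)
hours, minutes = divmod(total, 60)
print(f"{total} minutes = {hours} h {minutes} min")  # 640 minutes = 10 h 40 min
```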

Having done this once or twice before, I knew that I could start work on later data segments before the earlier ones completed. Doing so would reduce the time to complete the job from over 10 hours to less than 7. Of course, that meant I would be busy the entire time. However, it was worth it to get the task done more quickly.

Now the question is: what does the process look like? Easy. For the first segment I exported the data and compressed it, which took 6 minutes. Then I needed to transfer it to the new machine. During that transfer, I could export and compress the next segment, which took those 6 minutes off each of the remaining 19 segments. I could also uncompress and import one segment while the following one was transferring. That eliminated another 6 minutes per segment for 19 of them. If you do the math, there are 6 minutes at the beginning and 6 minutes at the end that can't be overlapped, plus 20 segments at 20 minutes each to transfer. That amounts to 6 + 6 + (20 x 20), or 412 minutes, which is 6 hours and 52 minutes.
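The overlapped schedule can also be checked with a short simulation. This is only a sketch under a few assumptions that are mine rather than measured: the work is grouped into three phases (export plus compress, transfer, uncompress plus import), each phase handles one segment at a time, finished segments can wait between phases, and running an export or import during a transfer does not slow the transfer down. The function name `pipelined_minutes` is made up for illustration.

```python
def pipelined_minutes(phase_minutes, segments):
    """Finish time of the last segment when the phases overlap.

    A segment enters a phase as soon as it has left the previous phase
    and that phase has finished with the segment before it.
    """
    phase_free_at = [0] * len(phase_minutes)  # when each phase is next available
    for _ in range(segments):
        ready = 0  # when this segment is ready for its next phase
        for i, duration in enumerate(phase_minutes):
            start = max(ready, phase_free_at[i])
            phase_free_at[i] = start + duration
            ready = phase_free_at[i]
    return phase_free_at[-1]


# export + compress = 6, transfer = 20, uncompress + import = 6
total = pipelined_minutes([6, 20, 6], 20)
hours, minutes = divmod(total, 60)
print(f"{total} minutes = {hours} h {minutes} min")  # 412 minutes = 6 h 52 min
```

With a single segment the same function returns 32 minutes, which matches the unoverlapped time for one pass through all the steps.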

That is a nice reduction, but you can see that it is not 50% of the original 10 hours and 40 minutes; it is closer to 64%. That is important to remember the next time someone tells you a computer with two processors can do twice as much. While it is a good goal, it isn't always practical.
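For anyone who wants the ratio spelled out, here is a quick sketch using the two totals from this post. The variable names are just for illustration. It also shows why the overlapped run cannot get much shorter: the transfers alone, done one after another, already take 400 minutes.

```python
serial = 640         # minutes: 20 segments x 32 minutes each
pipelined = 412      # minutes: 6 + (20 x 20) + 6
transfer_only = 20 * 20  # minutes spent on transfers, which happen one at a time

speedup = serial / pipelined
fraction = pipelined / serial

print(f"speedup: {speedup:.2f}x")                              # speedup: 1.55x
print(f"overlapped run takes {fraction:.0%} of the original")  # 64% of the original
print(f"transfers alone: {transfer_only} minutes")             # 400 minutes
```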
