Archive for May, 2007

Microsoft’s desperate acquisition of aQuantive: What they could have done instead

Tuesday, May 22nd, 2007

Are 70,000+ Microsoft employees that useless? Or do the 1,000+ senior managers have no clue? I can't believe Microsoft would pay an 85% premium and fork out $6 billion! With that money they could have acquired several smart startups in the search and online advertising space and still saved $500 million. Microsoft could have negated Google's further foray into the growing on-demand business and might have brought some raw energy to Redmond.
However, by the time Microsoft figures out the monetization model for search and the online ad market, Google will have made significant inroads into the on-demand business and productivity applications. Why don't they focus on solving problems? Well, it's big-company syndrome: I'm a manager; I don't solve problems; I get vendors and acquire companies. What a pity.
Does any B-school offer a course on intrapreneurship? Take a leaf from Google; even HP has a lesson to offer from its recent release of the NeoView business intelligence product.

MYSQL on EC2: Data backup strategy using replication on S3

Tuesday, May 1st, 2007

A few EC2 and S3 facts:
1. S3 (Amazon's storage-in-the-cloud infrastructure) cannot be natively mounted on EC2 (Amazon's cloud computing infrastructure).
2. The maximum size of an "object" (the atomic unit of stored data on S3) is 5 GB.
3. Multiple EC2 instances (each a virtual machine with 1.7 GHz of horsepower, a 160 GB ephemeral HDD, and 1,500 MB of RAM) can be booted on demand.
I run a couple of EC2 instances in the cloud. The backup strategy so far (call it the layman's strategy, or the lame strategy!) has been: a) freeze the database, b) break the data files into 5 GB chunks, c) move the chunks (now S3 objects) onto S3, d) unfreeze the database, e) repeat.
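Steps (b) and (c) and the later coalesce are just standard Unix plumbing. Here is a minimal, self-contained sketch: a dummy file stands in for the frozen MySQL data, the chunk size is tiny for illustration (it would be 5120m in production, per the S3 object limit), and the upload line is only a placeholder for whatever S3 client you use.

```shell
#!/bin/sh
set -e
# Sketch of the chunk/coalesce half of the backup cycle.
# A dummy "data file" stands in for the frozen MySQL data.
DUMP=mysql-data.img
dd if=/dev/zero of="$DUMP" bs=1024 count=300 2>/dev/null

# b) break the data file into S3-sized chunks.
#    split's suffixes (aa, ab, ac, ...) sort lexicographically,
#    so a plain glob later reassembles them in the right order.
split -b 128k "$DUMP" "${DUMP}.chunk."

# c) each chunk would now be PUT to S3 -- placeholder only
for chunk in "${DUMP}".chunk.*; do
    echo "would upload: $chunk"     # e.g. <your-s3-client> put <bucket> "$chunk"
done

# Slave-side rebuild: coalesce the objects back into one data file
cat "${DUMP}".chunk.* > rebuilt.img
cmp -s "$DUMP" rebuilt.img && echo "rebuilt file matches original"
```

The `cmp` at the end is the sanity check that the chunk/coalesce round trip is lossless.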
The above approach takes the database offline for at least 4-6 minutes every cycle. So here's a new strategy I'm planning to test. The pseudo-algo is as follows:
1. Create an AMI which has a pre-configured MySQL slave
2. Boot a new instance from the AMI created in #1 whenever a backup is desired
3. Read objects from S3 (if any) and coalesce them to rebuild the data file
4. Create an SSH tunnel to the master
5. Start the slave to catch up with replication
6. Stop the slave after some time
7. Break the fattened data file into chunks, or objects (S3's 5 GB limit)
8. Move the objects to S3
9. Shut down the instance
10. Go to step 2
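Strung together, one pass through steps 2-9 might look like the skeleton below. Every action here is deliberately a stub that just echoes what the real step would do; the actual EC2 launch, S3 transfer, and mysql slave-control commands are assumptions to be filled in for your own setup.

```shell
#!/bin/sh
set -e
# Skeleton of one backup cycle (steps 2-9 of the pseudo-algo).
# Each function is a stub; replace the echo with the real EC2 API,
# S3 client, or mysql invocation for your environment.
launch_slave()     { echo "2. boot slave instance from the pre-built AMI"; }
rebuild_datadir()  { echo "3. fetch chunks from S3 and coalesce into the data file"; }
open_tunnel()      { echo "4. open SSH tunnel to the master's mysqld"; }
start_slave()      { echo "5. START SLAVE; let replication catch up"; }
stop_slave()       { echo "6. STOP SLAVE after the catch-up window"; }
chunk_and_upload() { echo "7-8. split data file into <=5 GB objects and PUT to S3"; }
shutdown_slave()   { echo "9. shut the instance down"; }

backup_cycle() {
    launch_slave
    rebuild_datadir
    open_tunnel
    start_slave
    sleep 1            # stand-in for the replication catch-up window
    stop_slave
    chunk_and_upload
    shutdown_slave
}
backup_cycle
```

Step 10 ("go to 2") is just scheduling: a cron entry on whichever machine drives the cycle would re-run `backup_cycle` at the desired interval.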
The new algo requires quite a bit of automation, and there are some unanswered questions, which I'm sure can be figured out after the first trial. The following areas need to be automated:
1. SSH tunneling between the slave and master EC2 instances. The trick is to figure out the hostname of the newly booted instance and then tunnel from it.
2. Client scripting for booting the slave and executing the scripts on it. The best way to address this is probably a cron job on the master server that initiates and completes the backup process.
3. Prevention of data corruption. Moving large objects to/from S3 could have its own issues. I need to figure out whether the REST API call will guarantee data consistency.
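On point 1, the tunnel itself is a one-liner once the slave knows the master's address, and on point 3 the REST API does offer some protection: a PUT can carry a Content-MD5 header, and S3 rejects the upload if the body doesn't match it. The sketch below shows both; the tunnel and CHANGE MASTER lines are comments (they need a live master), while the checksum round-trip check actually runs, with a local `cp` standing in for the S3 transfer. Hostnames and the `backup` user are hypothetical.

```shell
#!/bin/sh
set -e
# 1) SSH tunnel from the freshly booted slave to the master's mysqld,
#    then point replication at the local end of the tunnel, e.g.:
#      ssh -f -N -L 3307:127.0.0.1:3306 backup@<master-hostname>
#      mysql -e "CHANGE MASTER TO MASTER_HOST='127.0.0.1', MASTER_PORT=3307"
#    (comments only -- these need a live master instance)

# 3) Guarding a chunk against corruption in transit: record an MD5
#    before the PUT and verify it after the download on the other side.
echo "sample chunk contents" > chunk.aa
before=$(md5sum chunk.aa | awk '{print $1}')

cp chunk.aa downloaded.aa          # stand-in for the S3 round trip
after=$(md5sum downloaded.aa | awk '{print $1}')

[ "$before" = "$after" ] && echo "checksum OK: $after"
```

Sending the same MD5 as the Content-MD5 header on the PUT pushes the verification to S3 itself, so a corrupted upload fails loudly instead of silently storing bad data.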