MySQL data sharding using Spock Proxy

Yesterday at the Silicon Valley MySQL Meetup, Frank of Spock.com talked about Spock Proxy, a fork of MySQL Proxy built to meet the data sharding needs of Spock.com, the people search engine.

Here are some highlights:

  • Spock.com’s web interface is built on Rails, and they use ActiveRecord as their O-R mapping layer for MySQL data access
  • Spock has around 1,000 Rails web servers, which connect to the MySQL masters and slaves through Spock Proxy
  • To clients, Spock Proxy looks like an ordinary MySQL server, except that it transparently talks to other MySQL servers behind the scenes. At Spock they run 4 masters and 4 slaves, each with its own Spock Proxy.
  • Each web server keeps one connection open to Spock Proxy, while the proxy may hold hundreds of pooled connections to the MySQL servers
  • The proxy tokenizes each SQL statement to work out which shard the query targets, so every query must include a shard_key. The shard_key is stored in a Universal DB, which also holds the dictionary of partitioned tables, each shard's hostname/user/password, the key ranges, and the ranges for auto-incremented columns
  • It currently supports only range-based partitioning. A lot of partitioning is done with hashing instead, but adding that should not be a big deal
  • The current alpha version is tailored to Spock’s internal needs, but I’m sure others will take it up and generalize it
  • Unsupported query constructs (such as subqueries, GROUP BY, and multi-table joins) may not raise an error. DDL statements are not supported either
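To make the routing step above concrete, here is a minimal Python sketch of range-based shard lookup. It is not Spock Proxy's code: the table name `profiles`, the shard_key column `user_id`, the host names, and the range boundaries are all hypothetical stand-ins for what the Universal DB would hold, and a regex replaces the proxy's real SQL tokenizer, so it only understands simple single-table SELECTs with a `shard_key = N` predicate.

```python
import bisect
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    host: str
    user: str
    password: str


# Hypothetical shard map, standing in for the Universal DB dictionary.
# Each range entry is (inclusive lower bound of the shard_key range, shard);
# a range runs up to the next lower bound, and the last one is open-ended.
SHARD_MAP = {
    "profiles": {
        "shard_key": "user_id",
        "ranges": [
            (0, Shard("db1.example", "app", "secret")),
            (1_000_000, Shard("db2.example", "app", "secret")),
            (2_000_000, Shard("db3.example", "app", "secret")),
            (3_000_000, Shard("db4.example", "app", "secret")),
        ],
    },
}

# Crude stand-in for a tokenizer: find the table after FROM.
TABLE_RE = re.compile(r"\bFROM\s+(\w+)", re.IGNORECASE)


def route(sql: str) -> Shard:
    """Return the shard a simple single-table SELECT should go to."""
    table_match = TABLE_RE.search(sql)
    if not table_match or table_match.group(1) not in SHARD_MAP:
        raise ValueError("query does not reference a partitioned table")
    entry = SHARD_MAP[table_match.group(1)]

    # Like Spock Proxy, refuse queries that carry no shard_key value.
    key_re = re.compile(
        rf"\b{re.escape(entry['shard_key'])}\s*=\s*(\d+)", re.IGNORECASE
    )
    key_match = key_re.search(sql)
    if not key_match:
        raise ValueError("query has no shard_key equality predicate")
    key = int(key_match.group(1))

    # Binary search for the last range whose lower bound is <= key.
    bounds = [low for low, _ in entry["ranges"]]
    idx = bisect.bisect_right(bounds, key) - 1
    if idx < 0:
        raise ValueError("shard_key is below the first range")
    return entry["ranges"][idx][1]
```

For example, `route("SELECT name FROM profiles WHERE user_id = 1500000").host` gives `"db2.example"`. Keeping the ranges sorted by lower bound makes each lookup a binary search, and splitting a hot range only means inserting one more boundary into the map.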
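For comparison with the hash-based partitioning mentioned above, which Spock Proxy does not currently support, a minimal Python sketch of modulo-N hashing could look like this. The shard host list is hypothetical, and MD5 is used only as a stable, well-distributed hash, not for security.

```python
import hashlib

# Hypothetical shard hosts; a key's shard depends on its position in this list.
SHARDS = ["db1.example", "db2.example", "db3.example", "db4.example"]


def shard_for(key: int, shards: list[str] = SHARDS) -> str:
    """Pick a shard by hashing the shard_key and taking it modulo N."""
    digest = hashlib.md5(str(key).encode()).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]
```

Hashing spreads keys evenly without maintaining range boundaries, but changing the number of shards remaps most keys. With range partitioning, by contrast, a single range can be split and only its rows move.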
