MySQL data sharding using Spock Proxy

Yesterday at the Silicon valley MySQL Meetup, Frank of Spock.com talked about Spock Proxy. Spock Proxy is a fork of MySQL proxy which has been built to meet the data sharding needs of Spock.com, the people search engine.

Here are some highlights:

  • Spock.com’s web interface is built on Rails and they use ActiveRecords as their O-R layer for MySQL data access
  • Spock has around 1,000 web servers using Rails and they connect to MySQL slaves and masters using Spock Proxy
  • Spock Proxy acts like a normal MySQL engine, except that it transparently talks to other MySQL servers. At spock they use 4 master and 4 slaves each having their own Spock Proxy.
  • The Web servers each have one connection open to the Spock Proxy while the proxy may have 100s of pooled connections
  • The Proxy tokenizes a SQL statement and figures out the target shard for the query. The query must have a shard_key. The shard_key is stored in a Universal DB which stores the dictionary of the partitioned tables, shard hostname/user/password, ranges and range for auto_incremented columns
  • It currently supports only range based partitioning — while a lot of partitioning is done based on hashing, but should not be a big deal to change
  • The current alpha version is very much suited to meet Spock’s internal needs, but I’m sure people will take this up to generalize
  • Unsupported query constructs (like inner queries, group by, multi-table joins) may not throw exceptions. DDLs are also not supported

 

Tags: , ,

  • pauly

    1000 Web Servers … wow
    either this is a lie or :
    1. Lack of knowledge on apache config
    2. they use pentium I for all of those servers

    Pretty impressive on mysql + proxy side, but the funny thing is their search variance is not really huge so it looks like they just using lots and lots of FULLTEXT SEARCH… which can be accomplished by using sphinx – duh.

  • pauly

    1000 Web Servers … wow
    either this is a lie or :
    1. Lack of knowledge on apache config
    2. they use pentium I for all of those servers

    Pretty impressive on mysql + proxy side, but the funny thing is their search variance is not really huge so it looks like they just using lots and lots of FULLTEXT SEARCH… which can be accomplished by using sphinx – duh.

  • Yeah, they said it’s Rails hence the 1000 number.

  • Yeah, they said it’s Rails hence the 1000 number.

  • Actually it’s 1000 server processes which run on 10 servers (so 100 on each server) more or less. We don’t use Apache here, we use Mongrel servers (Mongrel is a fast HTTP library and server for Ruby).

    Multi table joins are supported and work well so long as all the corresponding rows from the joined table(s) are in the same that shard (or the universal shard). Data can be across all shards in one query and will me merged into one results set.

    We don’t do any full text searches in MySQL at all. We have a different solution which is a another talk.

    • Dude A

      Mongrel is a SLOW HTTP Library for Rails :-)..sorry I had to write this…we use mongrel extensively as well..

  • Actually it’s 1000 server processes which run on 10 servers (so 100 on each server) more or less. We don’t use Apache here, we use Mongrel servers (Mongrel is a fast HTTP library and server for Ruby).

    Multi table joins are supported and work well so long as all the corresponding rows from the joined table(s) are in the same that shard (or the universal shard). Data can be across all shards in one query and will me merged into one results set.

    We don’t do any full text searches in MySQL at all. We have a different solution which is a another talk.

  • Pingback: Recent Links Tagged With "exceptions" - JabberTags()