Archive for August, 2008

Delicious Toolbar buttons: Keeping logged in

Tuesday, August 19th, 2008

Delicious launched a new version a few days ago. All good, new features, etc. But I liked its simple interface: just tag the things you visit and save them for future use. What else do you need when you want to bookmark something?

The new IE7 Toolbar buttons are way too much: recently visited links and a lot of other new features that I don’t necessarily need. So I wanted to keep using the old toolbar buttons, but that won’t work! Every time I try to bookmark a page, it forces an account sign-in for a new session :(

Here’s how to keep using the old IE7 Toolbar buttons without the pesky sign-in:

  1. Install the new toolbar buttons on IE7 (the install process itself is quirky). Do not uninstall the old toolbar buttons
  2. Sign in to your account using the new IE7 buttons
  3. Hide the new buttons (forever)
  4. Keep using the old ones without any trouble


Apache logs, Load Balancer and X-Forwarded-For

Wednesday, August 13th, 2008

In most normal configurations, Apache’s web server logs look like this:

- - [13/Aug/2008:14:06:32 -0700] "GET /index.html HTTP/1.1" 200 21279 "" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20080702 Firefox/"

This comes from the following log format in the Apache virtual host config:

%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
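
To make the directives concrete, here is a minimal Python sketch (not part of the original setup) that splits one line in this format back into its fields. The regular expression, the function name parse_combined, and the sample line, including its 192.0.2.10 address from the documentation range, are illustrative assumptions. It does not handle escaped quotes inside a field.

```python
import re

# One named group per directive in the log format:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_combined(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


# Made-up sample line; the address is a placeholder, not real traffic.
sample = (
    '192.0.2.10 - - [13/Aug/2008:14:06:32 -0700] '
    '"GET /index.html HTTP/1.1" 200 21279 "" "Mozilla/5.0"'
)
fields = parse_combined(sample)
```

Here fields["host"] holds whatever %h logged, which is the value that changes once a load balancer sits in front of Apache.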

However, when you move Apache behind a load balancer, say an F5 BIG-IP, the logs start showing the IP of the load balancer instead of the actual client IP. This can lead to errors in reporting (unique visitor counts, for example) and to other issues if your application relies on knowing the client IP. Any geo-based code may also stop working. There is an easy way to fix it.

The good news is that most modern load balancers already have a mechanism for sending the client IP: they insert an X-Forwarded-For HTTP header into the request. The request headers may look something like this:

UA-CPU: x86
Accept-Encoding: gzip, deflate

First, make sure that your load balancer is sending the X-Forwarded-For header. Drop this small PHP file on a server behind the load balancer and check that the IP of the machine you are connecting from shows up in the header as shown above.

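<?php
// Print every request header Apache received, so X-Forwarded-For can be checked.
$headers = apache_request_headers();
foreach ($headers as $header => $value) {
    echo "$header: $value<br />\n";
}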

If you do not see that header, change your load balancer settings (Google X-Forwarded-For for your specific load balancer) or, better still, ask your sysadmin to do it for you.

Finally, modify the log format directive in Apache by replacing %h with %{X-Forwarded-For}i:

%{X-Forwarded-For}i %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
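
One thing to watch for when reading these logs: if a request passes through more than one proxy, X-Forwarded-For can carry a comma-separated chain of addresses, with the original client conventionally listed first, and Apache logs "-" when the header is missing. The Python sketch below is one hedged way a reporting script might pick the client address out of that field. The function name client_ip and the sample addresses are assumptions for illustration. The header is set by whoever sends the request, so treat the result as a hint for reporting, not as something to trust for security decisions.

```python
import ipaddress


def client_ip(xff_value, peer_ip):
    """Return the first valid IP in an X-Forwarded-For value, else peer_ip.

    xff_value is the raw header (or logged field); peer_ip is the address
    of the directly connected machine, e.g. the load balancer.
    """
    if xff_value:
        for part in xff_value.split(","):
            candidate = part.strip()
            try:
                ipaddress.ip_address(candidate)
            except ValueError:
                # Skips "-", "unknown", empty entries and other junk.
                continue
            return candidate
    return peer_ip
```

For example, client_ip("203.0.113.7, 10.0.0.1", "10.0.0.5") picks the left-most address, while a missing header ("-") falls back to the load balancer address passed as peer_ip.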

MySQL data sharding using Spock Proxy

Tuesday, August 12th, 2008

Yesterday at the Silicon Valley MySQL Meetup, Frank of Spock talked about Spock Proxy. Spock Proxy is a fork of MySQL Proxy built to meet the data sharding needs of Spock, the people search engine.

Here are some highlights:

  • Spock’s web interface is built on Rails, and they use ActiveRecord as the O-R layer for MySQL data access
  • Spock has around 1,000 web servers using Rails and they connect to MySQL slaves and masters using Spock Proxy
  • Spock Proxy acts like a normal MySQL engine, except that it transparently talks to other MySQL servers. At Spock they use 4 masters and 4 slaves, each with its own Spock Proxy.
  • The Web servers each have one connection open to the Spock Proxy while the proxy may have 100s of pooled connections
  • The Proxy tokenizes a SQL statement and figures out the target shard for the query. The query must have a shard_key. The shard_key is stored in a Universal DB which stores the dictionary of the partitioned tables, shard hostname/user/password, ranges and range for auto_incremented columns
  • It currently supports only range-based partitioning. A lot of partitioning is done by hashing instead, but that should not be a big deal to change
  • The current alpha version is tailored to Spock’s internal needs, but I’m sure people will pick it up and generalize it
  • Unsupported query constructs (like inner queries, group by, multi-table joins) may not throw exceptions. DDLs are also not supported
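
To illustrate the range-based routing described in the highlights, here is a minimal Python sketch. It is not Spock Proxy’s code: the directory contents, the host names, and the functions shard_for_range and shard_for_hash are hypothetical, and a real directory would live in the Universal DB rather than in a list. The hash variant is included only to show how small the difference between the two schemes is.

```python
import bisect

# Hypothetical shard directory: (exclusive upper bound of shard_key, host).
SHARD_RANGES = [
    (1_000_000, "shard0.example.internal"),
    (2_000_000, "shard1.example.internal"),
    (3_000_000, "shard2.example.internal"),
]
UPPER_BOUNDS = [upper for upper, _ in SHARD_RANGES]


def shard_for_range(shard_key):
    """Range-based partitioning: find the range that contains shard_key."""
    if shard_key < 0:
        raise ValueError("shard_key must be non-negative")
    index = bisect.bisect_right(UPPER_BOUNDS, shard_key)
    if index == len(SHARD_RANGES):
        raise KeyError(f"no shard covers shard_key {shard_key}")
    return SHARD_RANGES[index][1]


def shard_for_hash(shard_key, shard_count):
    """Hash-style partitioning: spread keys over shards by modulo."""
    return shard_key % shard_count
```

With this directory, shard_for_range(1_500_000) routes to shard1.example.internal. Range lookups keep neighbouring keys together and make it easy to add a shard for new key ranges, while the modulo scheme spreads keys evenly but reshuffles them when shard_count changes.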


Java Technologies at Yahoo!

Tuesday, August 5th, 2008

Yesterday, I attended a talk at SDForum presented by Dean Yu and Joshua Blatt of the Java platform team at Yahoo! The Java platform team centralizes Yahoo’s non-open-source Java efforts. I say non-open-source because the platform team covers everything except projects like Hadoop, which are open source.

Java as a technology is not native to Yahoo! The platform at Yahoo! was primarily C/C++, with PHP (mostly) at the front end. Java came in through several acquisitions that ran a Java stack, notably:

  • 1998 Classic Games, Sportasy
  • 2002 Hotjobs
  • 2003 Overture (Altavista)
  • 2004 Kelkoo, Musicmatch

Here are the raw bytes from the session:

  • Tomcat + JBoss, with efforts to secure them
  • Mostly LAMP stack at Yahoo!
  • Rate limiting using Apache modules
  • Runs apache in multiple process mode
  • Y! data streams for keeping application specific stores and pushing data around (Yahoo’s proprietary message bus like implementation)
  • Integration using JNI to C++ code using Swig for wrapper generation
  • All security related code is in C++; helps maintain a single language code-base. Hence, wide JNI use from app tier
  • Uses an IPC bridge for coarse-grained calls to non-thread-safe libraries (JNI has multi-threading issues)
  • Group dedicated to creating JNI wrappers of native code
  • JNI performance FUD
  • Java to native C++ code via JNI takes < 20 nanoseconds (cool!), compared with < 1 nanosecond for Java to Java. A big difference, but these are nanoseconds compared to network latencies of seconds
  • String functions calling native code via JNI take > 3 ms because of UTF-16 to UTF-8 character conversion
  • JNI multi-threading issues are solved by the IPC bridge, using shared memory and TCP over loopback
  • JSVC Apache commons daemon for loading privileged data during Tomcat startup and then running in low privilege mode
  • Like Multi-process Apache, a new architecture for multi-process Tomcat being baked
  • Software project management using Maven (Maven — awww!)
  • Automatic builds using CruiseControl and Hudson
  • RPM-based software deployment to 100s of nodes