Java Technologies at Yahoo!

Tuesday, August 5th, 2008

Yesterday, I attended a talk at SDForum presented by Dean Yu and Joshua Blatt of the Java platform team at Yahoo! The Java platform team centralizes Java work for Yahoo’s non-open-source projects. I say non-open-source because the platform team covers everything except projects such as Hadoop, which are open source.

Java as a technology is not native to Yahoo! The platform at Yahoo! was primarily C/C++, with PHP (mostly) at the frontend. Java came in through several acquisitions that were running a Java stack, notably:

  • 1998 Classic Games, Sportasy
  • 2002 HotJobs
  • 2003 Overture (AltaVista)
  • 2004 Kelkoo, Musicmatch

Here are the raw bytes from the session:

  • Tomcat and JBoss, with efforts to secure them
  • Mostly LAMP stack at Yahoo!
  • Rate limiting using Apache modules
  • Runs Apache in multi-process mode
  • Y! data streams for keeping application-specific stores and pushing data around (Yahoo’s proprietary message-bus-like implementation)
  • Integration with C++ code through JNI, using SWIG for wrapper generation
  • All security-related code is in C++, which helps maintain a single-language code base. Hence the wide use of JNI from the app tier
  • Uses an IPC bridge for coarse-grained calls to non-thread-safe libraries (JNI has multi-threading issues)
  • Group dedicated to creating JNI wrappers of native code
  • JNI performance FUD
  • Java to native C++ code via JNI takes < 20 nanoseconds (Cool!); compare this with Java to Java at < 1 nanosecond. A big difference, but these are nanoseconds compared to network latencies of seconds
  • String functions calling native code via JNI take > 3 ms because of UTF-16 to UTF-8 character conversion
  • JNI multi-threading issues are solved by the IPC bridge, using shared memory and TCP over loopback
  • jsvc (Apache Commons Daemon) for loading privileged data during Tomcat startup and then running in low-privilege mode
  • Like multi-process Apache, a new multi-process architecture for Tomcat is being baked
  • Software project management using Maven (Maven — awww!)
  • Automatic builds using CruiseControl and Hudson
  • RPM-based software deployment to 100s of nodes
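
The IPC bridge idea from the notes above (coarse-grained calls to a non-thread-safe library, carried over TCP on loopback) can be sketched roughly as follows. This is only an illustration of the pattern, not Yahoo's implementation: LoopbackBridge, LegacyLib and transform are hypothetical names, the "library" is plain Java standing in for native code, and the server runs as a thread in the same JVM where the real thing would presumably be a separate process. The point is that only one thread ever touches the library, so callers on any number of threads stay safe.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class LoopbackBridge implements AutoCloseable {

    // Hypothetical stand-in for a native library that is NOT thread-safe.
    static final class LegacyLib {
        private int calls = 0; // unsynchronized state

        String transform(String payload) {
            calls++;
            return new StringBuilder(payload).reverse() + "#" + calls;
        }
    }

    private final ServerSocket server;

    LoopbackBridge() throws IOException {
        // Port 0 lets the OS pick a free port; bind to loopback only.
        server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        Thread worker = new Thread(this::serve, "bridge-worker");
        worker.setDaemon(true);
        worker.start();
    }

    int port() {
        return server.getLocalPort();
    }

    // Single worker loop: requests are handled strictly one at a time.
    private void serve() {
        LegacyLib lib = new LegacyLib(); // touched only by this thread
        while (!server.isClosed()) {
            try (Socket s = server.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
                 PrintWriter out = new PrintWriter(
                         new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
                String request = in.readLine();
                if (request != null) {
                    out.println(lib.transform(request));
                }
            } catch (IOException e) {
                // accept() fails once close() is called; the loop condition then ends the thread.
            }
        }
    }

    // One coarse-grained call: connect, send a line, read a line back.
    static String call(int port, String payload) throws IOException {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8))) {
            out.println(payload);
            return in.readLine();
        }
    }

    @Override
    public void close() throws IOException {
        server.close();
    }

    public static void main(String[] args) throws Exception {
        try (LoopbackBridge bridge = new LoopbackBridge()) {
            System.out.println(call(bridge.port(), "hello")); // prints "olleh#1"
        }
    }
}
```

Each call pays for a socket round trip, which is why the talk stressed keeping bridge calls coarse-grained; a shared-memory transport, also mentioned in the session, is not shown here.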