On March 24, 2011, in highscalability, tips, by netoearth

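According to the download map published by Mozilla, less than 24 hours after release Firefox 4 had already passed 5.64 million downloads, more than twice the figure for IE9. Firefox 3 once set a record of 8 million downloads in 24 hours, but that was the result of Mozilla heavily promoting its "Download Day". The stats page is at:, built on HTML5 with a very polished interface. What we care about, though, is how its statistics system works. Here is a brief overview: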

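Mozilla's official blog explains how the download map works: the multiple load-balancing server clusters that host download.mozilla.org are configured to send their download request logs to a remote syslog server. That server uses SQLStream to filter out requests that are not valid downloads, uses MaxMind GeoIP to locate each IP address, and aggregates the download counts together with locations, timestamps, and so on.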
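
The filter, geolocate, and aggregate steps can be illustrated with a small batch sketch in Python. This is not the real pipeline: Mozilla used SQLStream's streaming SQL and MaxMind GeoIP, while here the log format (`epoch ip status path`), the rule that a valid download is a 302 redirect carrying a `product=` query parameter, and the names `FAKE_GEO_DB`, `lookup_location`, `parse_line`, `product_of` and `aggregate` are all hypothetical assumptions made for illustration.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Hypothetical stand-in for a MaxMind GeoIP lookup: maps an IP address to a
# (country, city) pair. A real deployment would query a GeoIP database instead.
FAKE_GEO_DB = {
    "203.0.113.5": ("US", "Mountain View"),
    "198.51.100.7": ("DE", "Berlin"),
}


def lookup_location(ip):
    """Return (country, city) for an IP, or a placeholder if it is unknown."""
    return FAKE_GEO_DB.get(ip, ("??", "unknown"))


def parse_line(line):
    """Parse a hypothetical 'epoch ip status path' log line; None if malformed."""
    parts = line.split()
    if len(parts) != 4:
        return None
    epoch, ip, status, path = parts
    try:
        return int(epoch), ip, int(status), path
    except ValueError:
        return None


def product_of(status, path):
    """Return the product name if this looks like a valid download, else None.

    Assumption: valid downloads are 302 redirects whose query has product=...
    """
    if status != 302:
        return None
    products = parse_qs(urlparse(path).query).get("product")
    return products[0] if products else None


def aggregate(lines, window=10):
    """Count downloads per (product, location, window start) key."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip malformed lines
        epoch, ip, status, path = parsed
        product = product_of(status, path)
        if product is None:
            continue  # filter out anything that is not a valid download
        bucket = epoch - epoch % window  # start of the `window`-second interval
        counts[(product, lookup_location(ip), bucket)] += 1
    return counts
```

Fed lines such as `1300928401 203.0.113.5 302 /?product=firefox-4.0&os=win&lang=en-US`, `aggregate` buckets each valid request into its 10-second interval and drops anything else (a 404 for `/favicon.ico`, for example). A streaming engine does the same grouping continuously over an unbounded input instead of over a finished list.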

I’m sure you’ve heard by now: Firefox 4 is officially released. The Metrics team has done its part by working with webdev to release a new real-time download visualization:

World map visualizing real-time Firefox 4 downloads


The basic backend flow is like this:

  1. The various load-balancing clusters that host download.mozilla.org are configured to log download requests to a remote syslog server.
  2. The remote server runs rsyslog, with a config that specifically filters those remote syslog events into a dedicated file that rolls over hourly.
  3. SQLStream is installed on that server and it is tailing those log files as they appear.
  4. The SQLStream pipeline does the following for each request:
    1. filters out anything other than valid download requests
    2. uses MaxMind GeoIP to get a geographic location from the IP address
    3. uses a streaming GROUP BY to aggregate the number of downloads by product, location, and timestamp
    4. every 10 seconds, sends a stream of counter increments to HBase, written to the row for that timestamp, with one column qualifier per distinct location that had downloads in that interval
  5. The glow backend is a Python app that pulls the data out of HBase using the Python Thrift interface and, every minute, writes a file containing a JSON representation of the data.
  6. That JSON file can be cached on the front end forever, since each minute of data has a distinct filename.
  7. The glow website pulls down that data and either plays back the downloads or lets you browse the geographic totals in the arc chart view.
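
The storage layout and the per-minute export can be sketched as follows. It is a minimal illustration under stated assumptions: an in-memory dictionary stands in for the HBase table (no Thrift client is used), locations are plain strings, and `FakeCounterTable`, `increment`, `push_interval`, `minute_file` and the `downloads-<minute>.json` filename pattern are hypothetical names, not the actual glow code.

```python
import json
from collections import defaultdict


class FakeCounterTable:
    """Hypothetical in-memory stand-in for the HBase table.

    Row key = start of a 10-second interval; column = location; cell = counter.
    """

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(int))

    def increment(self, row, column, amount=1):
        # Counter increments are additive, so repeated pushes accumulate.
        self.rows[row][column] += amount


def push_interval(table, interval_start, counts_by_location):
    """Every 10 seconds: one counter increment per location seen in that interval."""
    for location, count in counts_by_location.items():
        table.increment(interval_start, location, count)


def minute_file(table, minute_start):
    """Collect the six 10-second rows of one minute into (filename, JSON text).

    The filename embeds the minute, so, assuming no increments arrive after the
    minute is exported, a file never changes and a cache can keep it forever.
    """
    data = {}
    for interval_start in range(minute_start, minute_start + 60, 10):
        row = table.rows.get(interval_start)  # .get avoids creating empty rows
        if row:
            data[str(interval_start)] = dict(row)
    filename = f"downloads-{minute_start}.json"
    return filename, json.dumps(data, sort_keys=True)
```

A backend loop would call `minute_file` once a minute and write the text to the returned filename; because every minute gets a new, never-rewritten name, the front end can serve those files with long-lived cache headers instead of revalidating them.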

Some links for people interested in the code:
