Hypertable
Outline of The Google Software Stack
Filed in archive Concepts by Mateusz Berezecki on December 11, 2008

Introduction




Outline of The Google Software StackThe beginning of the research on software scalability spurred the research on fault tolerance and management technologies. After all, if you have a lot of software running on hundreds or even thousands of machines you have to know what's going on inside this set of machines, and when things break, you want the software to take care of it. These concepts, scalability, fault tolerance and software management, are so inherently tied together that it's impossible to talk about the one without talking about the others. This line of thinking serves as a very good basis for explaining what a Google Software Stack is.




Let's begin by naming the components and then describing their roles, as the naming convention and basic knowledge is going to be used in the every post on this blog from now on. The Google Software Stack is a collection of five main components, which serve as basic building blocks for most of the Google systems. These components are:

  • Google File System (GFS)

  • BigTable

  • Work Queue

  • MapReduce library

  • Sawzall.


Let's start with the one from the top of the list.

Google File System




This is the storage component which is de facto a distributed and fault tolerant file system for storing everything that needs to be persistent. This file system was designed for very specific workloads and some of it's operational semantics are non-standard ones. For example, the basic write operation is an append operation as it is easier to append something at the end of the file rather than moving data inside the file.



The GFS, as a distributed file system, consists of 2 types of servers:

  • master(s)

  • workers



The master node acts as a network coordinator informing where to get what kind of data, as by the nature of the distributed file system, some of the files are split and stored across many machines. The master keeps track who to ask for what kind of data, whereas the workers are storing and serving the data upon request. The fault tolerance aspect is handled by replicating data to many machines to increase the data avaialability, as one of the assumptions made during the development of the GFS is that machines fail, and they do so often. This basic replication approach turned out to be pretty useful in practice. The GFS is used by many other pieces of the Google Software Stack. Let's head on to the database component first.

For more information on GFS please refer to: The Google File System paper.

BigTable




The BigTable is a distributed column oriented data store which might be thought of as a database, but not a relational one. This is a very special database that stores data in the form of columns, rather than rows as most of the relational databases do. This allows it to be very fast and some studies show that column oriented databases can be even 300x times faster than row oriented ones in certain scenarios - but most of this performance comes at a price. And the price here is lack of mechanisms common to relational databases.


The BigTable is different from what most of the people are used to, as for example it supports moving between the values of tables back and forth in time, having a dynamic table layout by supporting adding and removing new columns on a row basis, rather than table basis, etc. The underlying file system for BigTable is GFS and as such, BigTable is distributed as well. The distribution is achieved by splitting tables into tablets, which in turn get stored on different GFS nodes. As the GFS handles data availability by itself, this makes the BigTable have tablets replicated as well and achieving the desired fault tolerance. Real life use cases for BigTable include Google Maps and Google itself.



For more detailed and technical description of BigTable please refer to the BigTable paper.

Work Queue




Work Queue is the least documented component of the Google Stack. Some of the published presentations and interviews given by Googlers reveal the name of this component. After carefully reading publications it is easy to give a high level overview of Work Queue. It turns out a Work Queue is a distributed batch processing system. You could sort of loosely think of it as a distributed shell with a job scheduling system, except that it is accessed programmaticaly via an API. With Work Queue you could schedule weekly maintenance jobs to clean unused files, etc. and it is relatively simple conceptually. I suppose that this system is also taking care of low level cluster management tasks.



MapReduce




MapReduce alone is a concept deserving a series of posts describing it in detail, but a general description could be one given by it's authors - a simplified data processing on large clusters. MapReduce is a C++ library which can be used by the applicationg writers at Google to ease the pain of distributing computations. This library operates on the task specifications passed to it by the applications, and then distributes the computation in the cluster. The task specifications describe what kind of computations are to be performed, input arguments, input files/tables/etc., memory constraints during computations, etc. After the computations are specified, the mapreduce library utilizes the functionality of a Work Queue system to distribute the computation in the cluster. The principle in use here is that moving computation is cheaper than moving data. When MapReduce was created and the production indexing system was moved to this new computation model, it consisted of approximately 700 lines of code, and the indexing task was taking about 6 hours. MapReduce tasks can be used for indexing, doing database queries, machine learning, analytical processing, etc. MapReduce was inspired by the functional programming operations map and fold and for those of you who are interested in it in more detail please refer to MapReduce paper.



Sawzall




Writing C++ programs for even simple task was not quick and easy, and so it happened that the engineers at Google invented Sawzall. A language for quick and easy MapReduce task specification. Programs written in Sawzall are compiled to the native machine code , distributed and executed pretty much the same way MapReduce tasks.

The more information on Sawzall can be located at Sawzall paper.



Summary



All of the above components are collectively called Google Software Stack and are most important and basic tools in use at Google, moreover they proved to be extremely robust and scalable and could be considered one of the better examples of tackling the problem of scalability and availability. The problem that is going to be even more visible nowadays, when big websites are sparkling all around the web. The problem is there's few open source solutions available that can replace the google stack for mere mortals should they needed it.

Permalink: Outline of The Google Software Stack
Tags: google  stack  mapreduce  hypertable  mapreduce  workqueue  bigtable  google  file  system  software  file+syst 
Trackback: http://publish.creative-weblogging.com/publish/mt-tb.pl/139408
img Addthis img Ask img Blinklist img del.icio.us img Digg img Fark img Facebook img Google img Lycos img Ma.gnolia Add this page to Mister Wong Mr Wong img Netscape img Netvousz img Newsvine img Reddit img StumbleUpon img Slashdot img Tailrank img Technorati img Wink img Yahoo

Vote for Outline of The Google Software Stack:

  • Currently 9.67/10
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
Rating: 9.67 out of 3 vote(s) cast.
 
Subscribe
Share It
RSSrss
See all blog subscribe options
Google google
What is RSS?
Yahoo! yahoo
Addthis Subscribe using any feed reader!
Bloglines Bloglines
Newsletter

TwitterFollow us on Twitter!