The Middle Ages had their own quantitative problem of abstraction and metaphysics: the famous scholastic dispute over "How many angels can dance on the head of a pin?" If we knew the answer, and combined it with the total number of angels, we could easily calculate how many pins are needed to accommodate them all. Transposed into today's world, the question might read: "How many servers would you need to accommodate all the information (or, more precisely, data) of this world?"
You might find this question ridiculous.
Maybe. We might all feel that the answer, however interesting, is of very little practical value for most of us. However, there are people and companies who do not merely contemplate this question. They go and make it real.
If we want to talk about those ambitious enough to try to store, sort and retrieve all the information of this world, we do not have many candidates. Who could we name first? Well, who else than the endlessly discussed, loved and hated Google.
Back to our angels: what if we ask another question: "How many pins would you need to retrieve an arbitrary angel in less than 500 milliseconds?" Why half a second? Does it not sound familiar to you…?
Web
Results 1 – 10 of about 935 for how many angels could dance on the head of a pin (0.41 seconds)
Google does not officially disclose the number of its servers. We can only guess. Hundreds of thousands? A million? More? Let's try to calculate.
The datacenter in Lenoir, NC, stretches over 100 000 sq ft, enough space to accommodate 5 000 racks. If the most commonly used servers are 2U boxes, a standard 42U rack holds about 20 of them, so 100 000 servers fit easily into just this one DC. Google runs 36 DCs all over the world. Sure, not every DC matches the size of the Lenoir one, but it still gives us a rough estimate of the total server hosting capacity available to Google.
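The same back-of-envelope estimate as a short Python sketch (the 42U rack height and the assumption that every DC is as large as Lenoir are mine, purely for illustration):

```python
# Rough upper bound on Google's server hosting capacity, as estimated above.
# Assumptions not from the article: a standard 42U rack, and every DC
# being as large as the Lenoir, NC datacenter.

RACKS_PER_DC = 5_000        # racks fitting into the 100 000 sq ft Lenoir DC
RACK_HEIGHT_U = 42          # assumed standard rack height
SERVER_HEIGHT_U = 2         # "the most commonly used are 2U boxes"
DATACENTERS = 36            # DCs Google runs worldwide

servers_per_rack = RACK_HEIGHT_U // SERVER_HEIGHT_U     # 21
servers_per_dc = RACKS_PER_DC * servers_per_rack        # ~105 000
hosting_capacity = servers_per_dc * DATACENTERS         # ~3.8 million

print(f"Servers per DC:       {servers_per_dc:,}")
print(f"Upper-bound capacity: {hosting_capacity:,}")
```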
Well, let's talk about money. A million servers. Isn't that a bit too big a piece of cake, even for someone like Google?
According to the 1Q 2008 shareholder report, Google spent $355 734 000 on IT assets in that quarter. This figure covers all hardware, from laptops, desktops and printers to servers.
Given Google's preference for low-budget, home-grown hardware and software, a lot can be purchased for that – roughly 1.4 billion USD a year, if every quarter looks like this one. A lot more than other big companies can buy, despite all the wholesale discounts they can get.
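For completeness, the annualization implied here, assuming all four quarters look like 1Q 2008 (a simplification, of course):

```python
# Rough yearly IT spend, assuming every quarter matches the 1Q 2008 figure.
QUARTERLY_IT_SPEND_USD = 355_734_000
yearly_it_spend = 4 * QUARTERLY_IT_SPEND_USD    # ~1.42 billion USD
print(f"Estimated yearly IT spend: ${yearly_it_spend:,}")
```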
Just to put the absolute numbers in perspective, the top server producers in the world sold the following numbers of servers in 2006: HP – 600 000, Dell – 460 000, IBM – 300 000.
Once more, within half a second…
How many servers would Google need to keep the typical query response time under 500 milliseconds?
Given the total population of Earth of 6 707 035 000 (Wikipedia.org, July 1st, 2008), if Google had 1 million servers dedicated just to its core business – processing search queries – that makes about 6 707 people per server. If each query is to be answered within 500 milliseconds, one server can handle 2 × 60 × 60 × 24 = 172 800 queries a day, so every person on Earth could send about 25 queries a day (172 800 / 6 707 ≈ 25.8).
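The same arithmetic as a minimal sketch (the one-query-at-a-time, exactly-500-ms serving model is of course a gross simplification):

```python
# How many queries per person per day could 1 million search servers handle,
# assuming each server answers exactly one query every 500 ms?

POPULATION = 6_707_035_000     # Earth's population, Wikipedia.org, July 1st, 2008
SEARCH_SERVERS = 1_000_000     # servers assumed dedicated to query processing
QUERY_TIME_S = 0.5             # target response time: 500 milliseconds

people_per_server = POPULATION / SEARCH_SERVERS               # ~6 707
queries_per_server_per_day = 24 * 60 * 60 / QUERY_TIME_S      # 172 800
queries_per_person_per_day = queries_per_server_per_day / people_per_server

print(f"People per server:          {people_per_server:,.0f}")
print(f"Queries per server per day: {queries_per_server_per_day:,.0f}")
print(f"Queries per person per day: {queries_per_person_per_day:.1f}")  # ~25.8
```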
How much data does Google store?
According to officially disclosed information, Google stores its data in an internally built system called BigTable. In 2006 the sizes of Google's data sets were:
Google Data in 2006

Data                 | Size (TB)
---------------------|----------
Crawl Index          | 800
Google Analytics     | 200
Google Base          | 2
Google Earth         | 70
Orkut                | 9
Personalized Search  | 4
(Source: Bigtable: A Distributed Storage System for Structured Data)
We can see that already two years back Google needed roughly 1 petabyte of storage capacity in total (the table sums to 1 085 TB).
If we can rely on the information leaked out of the company, the typical server Google used until 2006 was an internally built x86 PC running Linux, with an 80 GB HDD and 2 GB of RAM.
Considering these server parameters and neglecting all data overhead, Google could have stored all of it in 2006 on roughly 14 000 of these machines (1 085 TB / 80 GB ≈ 13 600).
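A quick sketch of that storage calculation, summing the 2006 table and dividing by the 80 GB disk of the leaked server spec (ignoring replication and any other overhead, just as the text does):

```python
# Servers needed in 2006 just to hold the BigTable data sets listed above,
# assuming one 80 GB disk per server and no replication or other overhead.

DATA_SETS_TB = {
    "Crawl Index": 800,
    "Google Analytics": 200,
    "Google Base": 2,
    "Google Earth": 70,
    "Orkut": 9,
    "Personalized Search": 4,
}
DISK_PER_SERVER_GB = 80

total_tb = sum(DATA_SETS_TB.values())                    # 1 085 TB, ~1 PB
servers_needed = total_tb * 1_000 / DISK_PER_SERVER_GB   # ~13 600

print(f"Total data:     {total_tb} TB")
print(f"Servers needed: {servers_needed:,.0f}")
```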
From other information leaks we learned how the number of servers grew in certain years:
2000 – 6 000 servers
2003 – 15 000 servers
2005 – 200 000 servers
2006 – 450 000 servers
If we extrapolate the growth factor over time, we arrive at some 2.2 million servers for 2008 (see the table below; a short sketch of the extrapolation follows after it):
Year | Number of servers | Growth factor
-----|-------------------|--------------
2000 | 6 000             | –
2001 | 8 000             | 133.33 %
2002 | 11 000            | 137.50 %
2003 | 15 000            | 136.36 %
2004 | 50 000            | 333.33 %
2005 | 200 000           | 400.00 %
2006 | 450 000           | 225.00 %
2007 | 1 000 000         | 222.22 %
2008 | 2 200 000         | 220.00 %
(The figures for 2000, 2003, 2005 and 2006 are leaked Google data; the remaining years are approximations/guesses.)
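One plausible reading of the extrapolation behind the guessed rows, assuming the year-over-year growth factor stays close to the 2005→2006 value (the article does not spell out its exact interpolation, so this is only a sketch):

```python
# Extrapolate Google's server count to 2008 from the leaked data points,
# assuming the year-over-year growth factor stays near its 2005 -> 2006 value.

leaked = {2000: 6_000, 2003: 15_000, 2005: 200_000, 2006: 450_000}

growth_factor = leaked[2006] / leaked[2005]      # 2.25

estimate_2007 = leaked[2006] * growth_factor     # ~1 000 000
estimate_2008 = estimate_2007 * growth_factor    # ~2 300 000, close to the
                                                 # 2.2 million in the table
print(f"2007: ~{estimate_2007:,.0f} servers")
print(f"2008: ~{estimate_2008:,.0f} servers")
```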
Even Google can reach its limits
Is there any limit to Google's data greed? Yes, there is. The limit lies in the very architecture of BigTable, a distributed database for structured data. Even a brief description of this product extends far beyond this article, so let us only state the facts: BigTable can accommodate 2^61 bytes, which is 2 305 843 terabytes, or roughly 2 306 petabytes. So if Google does not change its affinity for low-cost, home-made hardware, it could fill up to roughly 29 million of these 80 GB servers before hitting that ceiling.
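The closing calculation, worked out explicitly (treating TB and GB as decimal units, which is what makes 2^61 bytes come out as 2 305 843 TB):

```python
# BigTable's addressable capacity versus the 80 GB disk of the typical server.

BIGTABLE_CAPACITY_BYTES = 2 ** 61        # ~2.3 * 10**18 bytes
DISK_PER_SERVER_BYTES = 80 * 10 ** 9     # 80 GB, decimal units

capacity_tb = BIGTABLE_CAPACITY_BYTES / 10 ** 12     # ~2 305 843 TB
capacity_pb = capacity_tb / 1_000                    # ~2 306 PB
servers_to_fill = BIGTABLE_CAPACITY_BYTES / DISK_PER_SERVER_BYTES

print(f"Capacity: {capacity_tb:,.0f} TB ({capacity_pb:,.0f} PB)")
print(f"80 GB servers needed to fill it: {servers_to_fill:,.0f}")  # ~28.8 million
```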
Ing. Karel Umlauf, COOLHOUSING.NET