Google’s new search index: Caffeine

Instead of crawling and indexing the Web in layers ordered by importance, Google's new search index does so continuously and in parallel. This should bring newly written articles and tweets onto the results pages almost instantly.

The old search engine did not search the live Web. Instead, it searched an indexed copy that Google had stored. The new search index continuously scans the Web bit by bit and adds the information it finds to its index as it goes.
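
The difference between a stored index and one that is updated piece by piece can be illustrated with a toy example. This is only a minimal sketch of incremental indexing in general, not Google's actual design; the class name `IncrementalIndex`, its methods, and the example URLs are all hypothetical.

```python
from collections import defaultdict


class IncrementalIndex:
    """Toy inverted index that is updated one page at a time (hypothetical)."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of URLs containing it
        self.page_words = {}              # URL -> words currently indexed for it

    def add_page(self, url, text):
        """Index a new page, or re-index a changed one, without a full rebuild."""
        new_words = set(text.lower().split())
        old_words = self.page_words.get(url, set())
        # Remove the URL from words that no longer appear on the page.
        for word in old_words - new_words:
            self.postings[word].discard(url)
        # Add the URL to words that are new on the page.
        for word in new_words - old_words:
            self.postings[word].add(url)
        self.page_words[url] = new_words

    def search(self, word):
        """Return the URLs whose latest indexed version contains the word."""
        return sorted(self.postings.get(word.lower(), set()))


index = IncrementalIndex()
index.add_page("example.com/a", "old news")
index.add_page("example.com/b", "fresh news")
print(index.search("news"))   # ['example.com/a', 'example.com/b']

# A page changes: only that page is re-indexed, and searches see it at once.
index.add_page("example.com/a", "updated story")
print(index.search("news"))   # ['example.com/b']
```

Each call to `add_page` touches only the words of the page that changed, so a fresh page becomes searchable without waiting for the rest of the index to be rebuilt.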

To put it in perspective, in Google's own words:

Caffeine lets us index web pages on an enormous scale. In fact, every second Caffeine processes hundreds of thousands of pages in parallel. If this were a pile of paper it would grow three miles taller every second. Caffeine takes up nearly 100 million gigabytes of storage in one database and adds new information at a rate of hundreds of thousands of gigabytes per day. You would need 625,000 of the largest iPods to store that much information; if these were stacked end-to-end they would go for more than 40 miles.
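
The iPod comparison can be checked with a little arithmetic. The sketch below uses only the rounded figures from the passage above (about 100 million gigabytes, 625,000 iPods, about 40 miles), so the results are approximate; the variable names are illustrative.

```python
TOTAL_GB = 100_000_000      # "nearly 100 million gigabytes"
IPODS = 625_000             # "625,000 of the largest iPods"
STACK_MILES = 40            # "more than 40 miles"
METERS_PER_MILE = 1609.344

# Storage implied per iPod.
gb_per_ipod = TOTAL_GB / IPODS

# Length implied per iPod when stacked end-to-end, in centimeters.
cm_per_ipod = STACK_MILES * METERS_PER_MILE / IPODS * 100

print(gb_per_ipod)             # 160.0
print(round(cm_per_ipod, 1))   # 10.3
```

The figures work out to roughly 160 GB and a little over 10 cm per device, so the two comparisons in the passage are consistent with each other.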


