Google search gets a shot of caffeine with new indexing system

June 9th, 2010 by Chris Hogg
Search engine giant Google announced it has updated the way it crawls and indexes the Web. The company says its new method, called “Caffeine,” provides better results for those searching the Web.

In August 2009, Google gave a teaser about what the world should expect with the launch of Caffeine. Yesterday the company confirmed the roll-out was complete and Google search is now updated.

In a post on the company’s official blog, Google software engineer Carrie Grimes said Caffeine provides 50 percent “fresher” results for Web searches than the company’s previous index.

For those unfamiliar with “indexing,” the process takes place when search engines crawl the Web to find new pages, as well as updates to existing pages. A company such as Google needs to ensure it has the latest and best results at all times in order to stay competitive with rivals such as Microsoft’s Bing or Yahoo.

As Google explains: “When you search Google, you’re not searching the live web. Instead you’re searching Google’s index of the web which, like the list in the back of a book, helps you pinpoint exactly the information you need.” Here is more info on how Google search works.
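To make the book-index analogy concrete, here is a minimal sketch of an inverted index in Python. It is a toy illustration only, not Google's implementation: the page names, the sample text, and the `build_index` and `search` helpers are all made up for this example.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index


def search(index, query):
    """Return the pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start with the pages for the first word, then narrow down.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical sample pages, invented for this sketch.
pages = {
    "page-a": "google updates search index",
    "page-b": "fresh coffee and caffeine",
    "page-c": "caffeine is the new google search index",
}
index = build_index(pages)
print(sorted(search(index, "caffeine index")))  # ['page-c']
```

The search never reads the pages themselves. It only consults the word-to-page lookup built ahead of time, which is the point of the analogy.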

According to recent numbers published by ArsTechnica, Google currently makes up about 70 percent of the search market in the U.S., while Yahoo takes about 25 percent and Bing 9 percent. From a global perspective, Google owns about 85 percent market share, whereas Yahoo attracts just over 6 percent and Bing under 5 percent.

Google says Caffeine provides users with the largest collection of web content the company has ever offered.

“Whether it’s a news story, a blog or a forum post, you can now find links to relevant content much sooner after it is published than was possible ever before,” Grimes said in the company’s blog post.

Google said its new search indexing system was built because the Web is growing in size and new types of content including video, images, news and real-time updates are playing an increasingly important role in the Web’s information space. Today’s Web, Google says, is richer and more complex than ever before.

“In addition, people’s expectations for search are higher than they used to be,” said Grimes. “Searchers want to find the latest relevant content and publishers expect to be found the instant they publish.” Google’s previous index relied on several layers of indexing where some were updated more frequently than others.

As Google indicates:

The main layer would update every couple of weeks. To refresh a layer of the old index, we would analyze the entire web, which meant there was a significant delay between when we found a page and made it available to you.

With Caffeine, we analyze the web in small portions and update our search index on a continuous basis, globally. As we find new pages, or new information on existing pages, we can add these straight to the index. That means you can find fresher information than ever before, no matter when or where it was published.
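The difference between refreshing a whole layer and updating continuously can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions, not a description of how Caffeine works internally: the `IncrementalIndex` class, its method names, and the sample pages are hypothetical.

```python
from collections import defaultdict


class IncrementalIndex:
    """Toy index that is updated one page at a time, with no full rebuild."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> names of pages containing it
        self.page_words = {}              # page name -> words indexed last time

    def add_or_update(self, name, text):
        """Index a new page, or apply only the changes to a known page."""
        new_words = set(text.lower().split())
        old_words = self.page_words.get(name, set())
        for word in old_words - new_words:   # words that disappeared
            self.postings[word].discard(name)
        for word in new_words - old_words:   # words that are new
            self.postings[word].add(name)
        self.page_words[name] = new_words

    def lookup(self, word):
        return set(self.postings.get(word.lower(), set()))


idx = IncrementalIndex()
idx.add_or_update("news-1", "google launches caffeine")
idx.add_or_update("blog-1", "caffeine and coffee")
# The news page changes; only its differences are applied to the index.
idx.add_or_update("news-1", "google completes caffeine rollout")

print(sorted(idx.lookup("caffeine")))  # ['blog-1', 'news-1']
print(sorted(idx.lookup("launches")))  # []
```

In this sketch a changed page becomes searchable as soon as `add_or_update` returns, instead of waiting for every other page to be reprocessed.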

Google says Caffeine allows the company to index pages on a massive scale: every second, it processes hundreds of thousands of pages in parallel. To put it into real-world terms, Google says if those pages were a stack of paper, it would grow three miles taller every second.

And from a storage standpoint, Google says Caffeine takes up almost 100 million gigabytes of storage in one database, adding hundreds of thousands of gigabytes more every day. That is the equivalent of 625,000 of the largest iPods’ worth of information.
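As a quick sanity check on that comparison, the two figures can be divided to see what capacity per device they imply. The sketch below uses only the numbers quoted above, treats "almost 100 million" as exactly 100 million, and assumes decimal units (one petabyte equals one million gigabytes). The function name is our own.

```python
def gb_per_device(total_gb, devices):
    """Capacity each device would need to hold an equal share of the total."""
    return total_gb / devices


total_gb = 100_000_000  # "almost 100 million gigabytes", rounded up
ipods = 625_000         # number of iPods in the comparison

print(gb_per_device(total_gb, ipods))  # 160.0 gigabytes per iPod
print(total_gb / 1_000_000)            # 100.0 petabytes in total
```

The comparison therefore implies an iPod of roughly 160 gigabytes, and a database of about 100 petabytes.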

“We’ve built Caffeine with the future in mind,” Grimes said. “Not only is it fresher, it’s a robust foundation that makes it possible for us to build an even faster and comprehensive search engine that scales with the growth of information online, and delivers even more relevant search results to you.”
