Inside GOOGLE
Jul. 6th, 2001 02:31 pm
http://www.henzinger.com/monika/
Impressive interview:
"We have more than 1.3 billion [pages] in our index, which we completely update every 28 days. Google currently gets 100 million search queries a day; half are in English and half are in other languages. For each word, we store all the documents that contain it. So, when you type in the query terms, we can just go to the words, and do an intersection of the lists -- find the documents that contain all these words. That's also what lots of our future work concentrates on: trying to understand better what documents are about, and also trying to understand better what the user queries are about. The problem is that most user queries are very short -- two or three words -- so it's hard to figure out what they mean, even if you're a human being. Did you see the queries in the lobby? [In Google's lobby, a constantly scrolling list of queries projected on the wall behind the front desk shows what visitors to Google are searching for.]
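The "for each word, store all the documents that contain it, then intersect the lists" idea is a classic inverted index. Here is a toy sketch of it in Python, just to make the description concrete. The function names (`build_index`, `intersect`, `search`), the whitespace tokenization, and the three sample documents are my own assumptions for illustration; nothing here is taken from Google's actual code.

```python
from collections import defaultdict


def build_index(docs):
    """Map each word to the sorted list of ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: lowercase + whitespace split stands in for real tokenization.
        for word in text.lower().split():
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


def intersect(a, b):
    """Intersect two sorted id lists by walking both in step."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search(index, query):
    """Return ids of the documents that contain every word of the query."""
    terms = query.lower().split()
    if not terms:
        return []
    lists = [index.get(term, []) for term in terms]
    lists.sort(key=len)  # start from the shortest list to keep intermediate results small
    result = lists[0]
    for other in lists[1:]:
        result = intersect(result, other)
        if not result:
            break
    return result


# Hypothetical sample data, not from the interview.
DOCS = {
    1: "google search engine index",
    2: "search engine ranking",
    3: "google lobby queries",
}
INDEX = build_index(DOCS)
```

With that sample data, `search(INDEX, "google search")` finds only document 1, since it is the only one containing both words.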
What can you do beyond just using the keywords to give the users what they want?
You can look at the distribution of keywords in the document. You can look at the distribution of other words on the page. You can look at words on similar topics on the page. You can look at words that other people use to point to this page, and how related they are to the keywords -- things like that."
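Two of those signals are easy to play with: how much of a page consists of the query keywords, and whether the words other people use when linking to the page mention them. Below is a rough Python sketch under my own assumptions. The function names, the `anchor_weight` parameter, and the way the two numbers are added together are all invented for illustration; the interview does not say how Google combines these signals.

```python
from collections import Counter


def keyword_fraction(text, keywords):
    """Fraction of the words in the text that are query keywords."""
    words = text.lower().split()
    if not words:
        return 0.0
    counts = Counter(words)
    return sum(counts[k] for k in keywords) / len(words)


def anchor_overlap(anchor_texts, keywords):
    """Fraction of inbound link texts that mention at least one keyword."""
    if not anchor_texts:
        return 0.0
    wanted = set(keywords)
    hits = sum(1 for anchor in anchor_texts if wanted & set(anchor.lower().split()))
    return hits / len(anchor_texts)


def score(page_text, anchor_texts, query, anchor_weight=0.5):
    """Toy relevance score; the weighting is an arbitrary assumption."""
    keywords = query.lower().split()
    return (keyword_fraction(page_text, keywords)
            + anchor_weight * anchor_overlap(anchor_texts, keywords))
```

For example, `score("search engine", ["great search engine", "click here"], "search")` adds the on-page fraction (1 of 2 words) to half of the link-text fraction (1 of 2 links mentions the keyword).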