[personal profile] snyders
http://www.henzinger.com/monika/
Impressive interview:
"We have more than 1.3 billion in our index, which we completely update every 28 days. Google currently gets 100 million search queries a day, half are in English and half are in other languages. For each word, we store all the documents that contain it. So, when you type in the query terms, we can just go to the words, and do an intersection of the lists -- find the documents that contain all these words. That's also what lots of our future work concentrates on: trying to understand better what documents are about, and also trying to understand better what the user queries are about. The problem is that most user queries are very short -- two or three words -- so it's hard to figure out what they mean, even if you're a human being. Did you see the queries in the lobby? [In Google's lobby, a constantly scrolling list of queries projected on the wall behind the front desk shows what visitors to Google are searching for.]
What can you do beyond just using the keywords to give the users what they want?
You can look at the distribution of keywords in the document. You can look at the distribution of other words on the page. You can look at words on similar topics on the page. You can look at words that other people use to point to this page, and how related they are to the keywords -- things like that."
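
To make the "for each word, we store all the documents that contain it" part concrete, here is a toy sketch of that kind of index and the list intersection she mentions. This is only my illustration, not Google's code: the function names (`build_index`, `intersect`, `search`) and the three sample documents are made up, and a real engine would add ranking, compression, and much more.

```python
from collections import defaultdict


def build_index(docs):
    """Map each word to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


def intersect(a, b):
    """Walk two sorted id lists in step and keep the ids present in both."""
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return common


def search(index, query):
    """Return ids of documents that contain every word in the query."""
    lists = [index.get(word, []) for word in query.lower().split()]
    if not lists:
        return []
    # Starting from the shortest list keeps the intermediate results small.
    lists.sort(key=len)
    result = lists[0]
    for other in lists[1:]:
        result = intersect(result, other)
    return result


# Hypothetical sample data, just for illustration.
docs = {
    1: "cheap flights to paris",
    2: "paris hotels",
    3: "cheap hotels in paris",
}
index = build_index(docs)
print(search(index, "cheap paris"))
```

Keeping each list sorted is what lets the intersection run in a single pass over both lists instead of comparing every pair of ids.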
