Jun. 8th, 2001

snyders: (Default)
Nice survey (1998?).

Sense disambiguation is of little help to keyword-based IR. The information held by the search engine must itself be reorganized/categorized based on knowledge extraction.

Word-in-context idea; Hidden Markov Models idea (successful for part-of-speech tagging). Raw corpora allow sense discrimination. Seed definition expansion (Yarowsky 1993, Rivest '87 decision-list technique).
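
To make the decision-list idea concrete, here is a rough sketch of my own, not code from the survey. It ranks context words by a smoothed log-likelihood ratio between two senses and lets the strongest matching rule decide. The seed sentences, the function names and the smoothing constant are all invented for illustration, and the iterative seed-expansion loop is left out.

```python
import math
from collections import Counter, defaultdict

# Toy seed examples for the ambiguous word "plant" (invented for illustration).
SEEDS = [
    ("living", "the plant needs water and light to grow"),
    ("living", "a flowering plant with green leaves"),
    ("factory", "the car plant hired more workers"),
    ("factory", "workers at the chemical plant went on strike"),
]

def train_decision_list(examples, smoothing=0.1):
    """Build rules (strength, context_word, sense), strongest first."""
    senses = sorted({sense for sense, _ in examples})
    assert len(senses) == 2, "this sketch handles exactly two senses"
    first, second = senses
    counts = defaultdict(Counter)
    for sense, text in examples:
        for word in set(text.split()):
            counts[word][sense] += 1
    rules = []
    for word, c in counts.items():
        # Smoothed log-likelihood ratio of the two senses given this word.
        llr = math.log((c[first] + smoothing) / (c[second] + smoothing))
        if llr != 0:
            rules.append((abs(llr), word, first if llr > 0 else second))
    rules.sort(reverse=True)
    return rules

def classify(rules, text, default=None):
    """Return the sense of the first (strongest) rule whose word occurs in text."""
    words = set(text.split())
    for _, word, sense in rules:
        if word in words:
            return sense
    return default

rules = train_decision_list(SEEDS)
print(classify(rules, "the plant needs more water"))
print(classify(rules, "workers at the plant went on strike"))
```

With so few seeds, function words such as "the" also become weak rules; a real system would need far more data or a stop list.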
snyders: (Default)
Noticed that news headers tend to have similar syntactic structure:

[Agent] [Action] [Object of Action]

Examples: (Lenta.ru)
In the USA, an investigation has been launched into the power outage in Partizansk
Russia is now headed by a communist
The parliament of Cyprus has received a loan to resettle residents of the Far North
The Taliban leader proposes handing conscripts over to state ownership
The Church proposes using opossum meat as dog food
A New Zealand farmer will build a buffer zone along the West Bank border
George Bush has cut airfares for visitors to Chukotka
Roman Abramovich has made all Americans a little bit richer
Glokaya kuzdra shteko bodlanula bokra .... (L. Shcherba's nonsense sentence)

CNN:
FBI director seals historic UK poll win
Blair to resign House seat to run for governor
Ex-prosecutor may pressure Bush on climate change
Space shuttle workers prepare to strike on Saturn
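
A small sketch of what the three-slot observation implies: if headlines fit the template, slots taken from different headlines can be recombined and the result still reads like a headline. The slot boundaries below are cut by hand from the CNN lines above, and the names `Headline` and `rotate_slots` are made up for this example.

```python
from typing import List, NamedTuple

class Headline(NamedTuple):
    agent: str    # who acts
    action: str   # what is done
    target: str   # what the action is applied to

    def text(self) -> str:
        return f"{self.agent} {self.action} {self.target}"

# Hand-made splits of the CNN lines above (the cut points are my guess).
SAMPLES = [
    Headline("FBI director", "seals", "historic UK poll win"),
    Headline("Blair", "to resign", "House seat to run for governor"),
    Headline("Ex-prosecutor", "may pressure", "Bush on climate change"),
    Headline("Space shuttle workers", "prepare to strike", "on Saturn"),
]

def rotate_slots(headlines: List[Headline]) -> List[Headline]:
    """Take the agent from headline i, the action from i+1, the target from i+2."""
    n = len(headlines)
    return [
        Headline(
            headlines[i].agent,
            headlines[(i + 1) % n].action,
            headlines[(i + 2) % n].target,
        )
        for i in range(n)
    ]

for h in rotate_slots(SAMPLES):
    print(h.text())
```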
snyders: (Default)
Motivated by this discussion, here is a pile of papers on page-ranking and web graph structure. ... Drop me a line if you know good papers on this subject ...

Google (1998): The PageRank Citation Ranking
This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them. We compare PageRank to an idealized random Web surfer. We show how to efficiently compute PageRank for large numbers of pages.
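
For my own reference, a minimal sketch of the random-surfer computation by power iteration. This is not the paper's implementation: the tiny link graph, the function name and the iteration count are invented, and the damping factor 0.85 is just the commonly quoted value. Pages without outgoing links have their rank spread evenly over all pages, which is one of several possible conventions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Every page gets the random-jump share plus its share of dangling rank.
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            for t in targets:
                # The surfer follows each outgoing link with equal probability.
                new_rank[t] += damping * rank[p] / len(targets)
        rank = new_rank
    return rank

# Hypothetical four-page web.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph page "c" collects links from all three other pages and ends up on top, while "d", which nobody links to, keeps only the random-jump share.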

What can you do with a Web in your Pocket? (1998)

It is currently possible for a university research project to store and process the entire World Wide Web. Since there is a limit on how much text humans can generate, it is plausible that within a few decades one will be able to store and process all the human-generated text on the Web in a shirt pocket.

Context in Web Search, Steve Lawrence 2000 BS
The Web as a graph: measurements, models, and methods (1999)
We then report a number of measurements and properties of this graph that manifested themselves as we ran these algorithms on the Web.

The Shape of the Web and Its Implications for Searching the Web (2000)

Recent work in the field has focused on detecting identifiable patterns in the web graph and exploiting this information to improve the performance of search.


Enough for now.
