Google Tech Talks
December 14, 2007
ABSTRACT
We present an algorithm, WITCH, that learns to detect spam hosts or pages on the web. Unlike most other approaches, it simultaneously exploits the structure of the Web graph, as well as page contents and features. This work is a collaboration with Olivier Chapelle and Carlos Castillo, both of Yahoo! Inc.
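To make "exploits the structure of the Web graph, as well as page contents and features" concrete, here is a minimal sketch of one way such a combination could look. It is not the WITCH objective from the talk. It assumes a linear spam score over host features, a logistic loss on the labelled hosts, and a hypothetical link penalty that is active only when a link target scores as more spam-like than its source. The toy features, labels, edges, and all parameter names are made up for illustration.

```python
import math


def dot(w, x):
    """Linear spam score of one host: higher means more spam-like."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train(features, labels, edges, gamma=1.0, lam=0.01, lr=0.1, steps=500):
    """Fit weights by plain gradient descent.

    features: one feature vector per host
    labels:   dict host index -> +1 (spam) or -1 (non-spam); other hosts are unlabelled
    edges:    (source, target) hyperlinks between hosts
    gamma:    weight of the (assumed) link penalty
    lam:      weight of the L2 penalty on w
    """
    dim = len(features[0])
    w = [0.0] * dim
    for _ in range(steps):
        scores = [dot(w, x) for x in features]
        # Gradient of (lam / 2) * ||w||^2.
        grad = [lam * wi for wi in w]
        # Content term: logistic loss log(1 + exp(-y * f)) on labelled hosts.
        for i, y in labels.items():
            coeff = -y / (1.0 + math.exp(y * scores[i]))
            for k in range(dim):
                grad[k] += coeff * features[i][k]
        # Graph term (an assumption): gamma * max(0, f_target - f_source)^2,
        # i.e. discourage a host from linking to something spammier than itself.
        for s, t in edges:
            gap = scores[t] - scores[s]
            if gap > 0:
                for k in range(dim):
                    grad[k] += 2.0 * gamma * gap * (features[t][k] - features[s][k])
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


# Toy data: four hosts, two made-up content features each.
features = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]
labels = {0: -1, 3: +1}        # host 0 is known non-spam, host 3 is known spam
edges = [(0, 1), (2, 3)]       # host 0 links to host 1, host 2 links to host 3

w = train(features, labels, edges)
for host, x in enumerate(features):
    print(host, "spam" if dot(w, x) > 0 else "non-spam")
```

The unlabelled hosts 1 and 2 receive scores through the shared weights, and the link penalty pulls the scores at the two ends of each link closer together whenever the target looks spammier than the source.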
Good video; we don't get to see enough webspam detection lectures online (I guess for obvious reasons). (I'm aware of AIRWeb etc., but I'm not sure I'll manage to get to Beijing.)
I'm interested in seeing an efficient MapReduce version of a conjugate gradient method. If someone has references, could they message them to me?
It is quite surprising how many of the ideas in this presentation also seem to be contained in
Castillo, C., Donato, D., Gionis, A., Murdock, V., and Silvestri, F. Know your neighbors: web spam detection using the web topology. In Proceedings of SIGIR '07. (Amsterdam, The Netherlands, July 23 - 27, 2007). 423-430.
Well, I'm a little bit biased; I'm one of the co-authors :P
Jacob Abernethy, Olivier Chapelle, Carlos Castillo: "WITCH: A New Approach to Web Spam Detection". 2007.
--
ChaTo
Considering how often expired domains seem to be snapped up and replaced with generic search pages - I'd have thought that good pages linking to bad would be quite common. It's happened to me.
Which isn't to say that the presenter isn't MUCH smarter than I am - maybe just not as old and crusty :)
- Dave
That's easy - the factor that I don't see in your graph is time. A domain that is valid at time A may not be valid at time B.
If site X links to site Y, and Y is valid at time A but not at time B, that doesn't imply that site X is a spam page.