WITCH: A New Approach to Web Spam Detection

Uploaded on Dec 15, 2007

Google Tech Talks
December 14, 2007

ABSTRACT

We present an algorithm, WITCH, that learns to detect spam hosts or pages on the web. Unlike most other approaches, it simultaneously exploits both the structure of the Web graph and page contents and features. This work is a collaboration with Olivier Chapelle and Carlos Castillo, both of Yahoo! Inc.

Speaker: Jake Abernethy

Category: People & Blogs

License: Standard YouTube License

All Comments (9)

  • So pages are penalized for both being spam and outdated. Still sounds useful.

  • Good video. We don't get to see enough web spam detection lectures online (I guess for obvious reasons).

    (I'm aware of AIRWEB etc., but I'm not sure I'm going to manage to get to Beijing.)

    I'd be interested in seeing an efficient MapReduce version of a conjugate gradient method. If anyone has references, could they message them to me?

  • Ok... thanks... I was not aware of that paper! Sorry!

  • Hi Fabrizio, this talk is based on an upcoming paper:

    Jacob Abernethy, Olivier Chapelle, Carlos Castillo: "WITCH: A New Approach to Web Spam Detection". 2007.

    --

    ChaTo

  • Yes, and perhaps that is due to the fact that Carlos Castillo was also a coauthor on this work! :P

  • It is quite surprising how this presentation presents many ideas that also seem to be contained in

    Castillo, C., Donato, D., Gionis, A., Murdock, V., and Silvestri, F. Know your neighbors: web spam detection using the web topology. In Proceedings of SIGIR '07. (Amsterdam, The Netherlands, July 23 - 27, 2007). 423-430.

    Well, I'm a little bit biased, I'm one of the co-authors :P

  • The algorithm proposed does seem very naive.

    Which isn't to say that the presenter isn't MUCH smarter than I am - maybe just not as old and crusty :)

    - Dave

  • Considering how often expired domains seem to be snapped up and replaced with generic search pages - I'd have thought that good pages linking to bad would be quite common. It's happened to me.

  • Why would good pages link to bad pages?

    That's easy - the factor that I don't see in your graph is time. A domain that is valid at time A may not be valid at time B.

    If site X links to site Y, and Y is valid at time A but not at time B, that doesn't imply that site X is a spam page.
