
WITCH: A New Approach to Web Spam Detection

Google Tech Talks, December 14, 2007

ABSTRACT: We present an algorithm, WITCH, that learns to detect spam hosts or pages on the web. Unlike most other approaches, it simultaneously exploits the str...

Text Comments (9)

timwintle (2 years ago)
Good video. We don't get to see enough web spam detection lectures online (I guess for obvious reasons).
(I'm aware of AIRWEB etc., but I'm not sure I'll manage to get to Beijing.)

I'd be interested in seeing an efficient MapReduce version of the conjugate gradient method. If anyone has references, could they message them to me?
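For context on the conjugate gradient question above, here is a minimal single-machine sketch of the standard conjugate gradient method for a symmetric positive-definite system Ax = b. It is not taken from the talk. The matrix is passed in as a `matvec` function (a hypothetical interface chosen for this sketch) because the matrix-vector product is the step one would typically distribute as a MapReduce job; the remaining work per iteration is a few dot products and vector updates.

```python
from typing import Callable, List, Optional

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(
    matvec: Callable[[Vector], Vector],
    b: Vector,
    tol: float = 1e-10,
    max_iter: Optional[int] = None,
) -> Vector:
    """Solve A x = b for symmetric positive-definite A, given only x -> A x."""
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n
    x = [0.0] * n
    r = list(b)          # residual b - A x, with x starting at zero
    p = list(r)          # first search direction is the residual
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = matvec(p)   # the expensive step; the candidate for distribution
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        # New direction: residual made A-conjugate to the previous direction.
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Usage: a small dense system, with matvec as a plain row-by-row product.
A = [[4.0, 1.0], [1.0, 3.0]]
solution = conjugate_gradient(lambda v: [dot(row, v) for row in A], [1.0, 2.0])
```

For a 2x2 system like this one, CG reaches the exact solution (1/11, 7/11) in two iterations up to rounding. In a distributed setting only `matvec` and the dot products touch the full data, which is why they are isolated here.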
fabriziosilvestri (2 years ago)
It is quite surprising how many of the ideas in this presentation also seem to be contained in:

Castillo, C., Donato, D., Gionis, A., Murdock, V., and Silvestri, F. Know your neighbors: web spam detection using the web topology. In Proceedings of SIGIR '07 (Amsterdam, The Netherlands, July 23-27, 2007), pp. 423-430.

Well, I'm a little biased: I'm one of the co-authors :P
thejakeyboy (2 years ago)
Yes, and perhaps that is due to the fact that Carlos Castillo was also a coauthor on this work! :P
ChaTo1977 (2 years ago)
Hi Fabrizio, this talk is based on an upcoming paper:

Jacob Abernethy, Olivier Chapelle, Carlos Castillo: "WITCH: A New Approach to Web Spam Detection". 2007.

--
ChaTo
fabriziosilvestri (2 years ago)
Ok... thanks... I was not aware of that paper! Sorry!
davidjbullock (2 years ago)
The algorithm proposed does seem very naive.

Which isn't to say that the presenter isn't MUCH smarter than I am - maybe just not as old and crusty :)

- Dave
davidjbullock (2 years ago)
Why would good pages link to bad pages?

That's easy: the factor I don't see in your graph is time. A domain that is valid at time A may not be valid at time B.

If site X links to site Y, and Y is valid at time A but not at time B, that doesn't imply that site X is a spam page.
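The talk's question ("Why would good pages link to bad pages?") suggests a link penalty: a host with a low spam score should not point to a host with a high one. Below is a minimal sketch of such a penalty with the time factor from the comment above folded in. It is not WITCH's actual formulation. It assumes spamicity scores in [0, 1] (higher means spammier), and it uses two hypothetical inputs: a per-link `created` time step and a per-host `spam_since` time step marking when the target was first judged spam.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Link:
    source: str
    target: str
    created: int  # hypothetical: time step when the link was first observed


def link_penalty(
    scores: Dict[str, float],
    links: List[Link],
    spam_since: Dict[str, int],
) -> float:
    """Penalize links that point from a less spammy host to a spammier one.

    scores: host -> spamicity in [0, 1] (higher = spammier), assumed given.
    spam_since: host -> time step at which it was first judged spam (hypothetical).
    """
    total = 0.0
    for link in links:
        turned_spam: Optional[int] = spam_since.get(link.target)
        # Time factor: a link created while the target was still legitimate
        # (e.g. before its domain expired and was re-registered) says nothing
        # about the source, so it is skipped.
        if turned_spam is not None and link.created < turned_spam:
            continue
        # Squared hinge: only links pointing "uphill" in spamicity cost anything.
        gap = scores[link.target] - scores[link.source]
        total += max(0.0, gap) ** 2
    return total


scores = {"x": 0.25, "y": 0.75}
old_link = link_penalty(scores, [Link("x", "y", created=5)], {"y": 10})
new_link = link_penalty(scores, [Link("x", "y", created=12)], {"y": 10})
```

Here `old_link` is 0.0 because X linked to Y before Y turned into spam, while `new_link` is 0.25 because the link was made afterwards. In a graph-regularized classifier a term like this would typically be added, with some weight, to the classification loss; the asymmetric hinge is one design choice among several.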
neuronstorm (2 years ago)
Considering how often expired domains seem to be snapped up and replaced with generic search pages - I'd have thought that good pages linking to bad would be quite common. It's happened to me.
stcredzero (1 year ago)
So pages are penalized for both being spam and outdated. Still sounds useful.
