Good video. We don't get to see enough webspam detection lectures online (I guess for obvious reasons).
(I'm aware of AIRWEB etc., but I'm not sure I'll manage to get to Beijing.)
I'd be interested in seeing an efficient MapReduce version of a conjugate gradient method. If anyone has references, could they message them to me?
timwintle 4 years ago
It is quite surprising how many ideas this presentation presents that also seem to be contained in
Castillo, C., Donato, D., Gionis, A., Murdock, V., and Silvestri, F. Know your neighbors: web spam detection using the web topology. In Proceedings of SIGIR '07. (Amsterdam, The Netherlands, July 23 - 27, 2007). 423-430.
Well, I'm a little bit biased, I'm one of the co-authors :P
fabriziosilvestri 4 years ago
Yes, and perhaps that is due to the fact that Carlos Castillo was also a coauthor on this work! :P
thejakeyboy 4 years ago
Hi Fabrizio, this talk is based on an upcoming paper:
Jacob Abernethy, Olivier Chapelle, Carlos Castillo: "WITCH: A New Approach to Web Spam Detection". 2007.
--
ChaTo
ChaTo1977 4 years ago
Ok... thanks... I was not aware of that paper! Sorry!
fabriziosilvestri 4 years ago
The algorithm proposed does seem very naive.
Which isn't to say that the presenter isn't MUCH smarter than I am - maybe just not as old and crusty :)
- Dave
davidjbullock 4 years ago
Why would good pages link to bad pages?
That's easy - the factor that I don't see in your graph is time. A domain that is valid at time A may not be valid at time B.
If site X links to site Y, and Y is valid at time A but not valid at time B, that doesn't imply that site X is a spam page.
davidjbullock 4 years ago
Considering how often expired domains seem to be snapped up and replaced with generic search pages - I'd have thought that good pages linking to bad would be quite common. It's happened to me.
neuronstorm 4 years ago
So pages are penalized both for being spam and for being outdated. Still sounds useful.
stcredzero 3 years ago