Identifying Suspicious URLs: An Application of Large-Scale Online Learning

Uploaded on May 14, 2010

Google Tech Talk
May 5, 2010

ABSTRACT

Presented by Justin Ma.

We explore online learning approaches for detecting malicious Web sites (those involved in criminal scams) using lexical and host-based features of the associated URLs. We show that this application is particularly appropriate for online algorithms for two reasons: the training data is larger than can be efficiently processed in batch, and the distribution of features that typify malicious URLs changes continuously. Using a real-time system we developed for gathering URL features, combined with a real-time source of labeled URLs from a large Web mail provider, we demonstrate that recently developed online algorithms can be as accurate as batch techniques, achieving daily classification accuracies up to 99% over a balanced data set.
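To make the idea of online learning over URL features concrete, here is a minimal sketch in Python. It is not the system described in the talk: it assumes a plain perceptron (one of the simplest online learners), uses only lexical bag-of-words tokens with no host-based features, and runs on a tiny hypothetical stream of labeled URLs (the names `lexical_features`, `OnlinePerceptron`, `run_stream`, and `EXAMPLE_STREAM` are made up for illustration). The point it illustrates is the predict-then-update loop: each URL is seen once, in arrival order, and the model changes only when it makes a mistake.

```python
import re


def lexical_features(url):
    """Return the set of lowercase tokens in a URL, split on non-alphanumerics."""
    tokens = re.split(r"[^A-Za-z0-9]+", url.lower())
    return {tok for tok in tokens if tok}


class OnlinePerceptron:
    """Binary perceptron over sparse binary features; labels are +1 or -1."""

    def __init__(self):
        self.weights = {}  # token -> weight, grows as new tokens appear
        self.bias = 0.0

    def score(self, features):
        return self.bias + sum(self.weights.get(f, 0.0) for f in features)

    def predict(self, features):
        return 1 if self.score(features) > 0 else -1

    def update(self, features, label):
        """Predict first, then adjust weights only if the prediction was wrong."""
        prediction = self.predict(features)
        if prediction != label:
            for f in features:
                self.weights[f] = self.weights.get(f, 0.0) + label
            self.bias += label
        return prediction


def run_stream(model, stream):
    """Process (url, label) pairs one at a time; return the number of mistakes."""
    mistakes = 0
    for url, label in stream:
        if model.update(lexical_features(url), label) != label:
            mistakes += 1
    return mistakes


# Hypothetical labeled stream: +1 = malicious, -1 = benign (invented examples).
EXAMPLE_STREAM = [
    ("http://paypal.example-login.test/verify/account", 1),
    ("http://www.example.org/docs/index.html", -1),
    ("http://secure-update.example-login.test/verify", 1),
    ("http://www.example.org/about/team.html", -1),
]

if __name__ == "__main__":
    model = OnlinePerceptron()
    print("mistakes:", run_stream(model, EXAMPLE_STREAM))
    new_url = "http://example-login.test/verify/password"
    print("prediction:", model.predict(lexical_features(new_url)))
```

Because the weight dictionary grows only as new tokens arrive, the model never needs the full data set in memory, and a shift in which tokens signal malicious URLs is absorbed by later updates. Those are the two properties the abstract gives as reasons to prefer online algorithms here.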

Slides: http://cseweb.ucsd.edu/~jtma/google_talk/jtma-google10.pdf

Justin Ma is a PhD candidate at UC San Diego advised by Stefan Savage, Geoff Voelker and Lawrence Saul. His research interests are in systems and networking with an emphasis on network security, and his current focus is the application of machine learning to problems in security. He will be joining UC Berkeley as a postdoc after graduation. [Home page: http://www.cs.ucsd.edu/~jtma/ ]

Category: Science & Technology

License: Standard YouTube License

All Comments (11)

  • lol i watched a video the url had CuNt in it lol

    i've had one with FucK as well

  • Google, one of the world's largest companies, unable to produce decent audio!?

  • very interesting research, congratulations!

  • many really good algorithms mentioned in this video. Great work anyway :)

  • Justin, a few less "Ummmm..." would be nice.

  • They can also hide their domain completely using feedproxy.google, thank you very much for that spam domain anon service google :-)

  • gah, what's that high pitched hiss when he talks

  • This is a great video. Also, very nice refresher on ML algorithms. I've bookmarked it as a reference for some of those ML formulas.

  • @arex1338 It is machine-learning jargon. So, it was used appropriately for the audience.

  • 8:18 Is this the top of some girl's head?
