28c3: Deceiving Authorship Detection

Uploaded on Dec 30, 2011

Download high quality version: http://bit.ly/rJadkG
Description: http://events.ccc.de/congress/2011/Fa...

Michael Brennan, Rachel Greenstadt: Deceiving Authorship Detection
Tools to Maintain Anonymity Through Writing Style & Current Trends in Adversarial Stylometry

Stylometry is the art of detecting the authorship of a document based on the linguistic style present in the text. As authorship recognition methods based on machine learning have improved, they have also come to present a threat to privacy and anonymity. We have developed two open-source tools, Stylo and Anonymouth, which we will release at 28C3 and introduce in this talk. Anonymouth aids individuals in obfuscating documents to protect their identity from authorship analysis. Stylo is a machine-learning-based authorship detection research tool that provides the basis for Anonymouth's decision making. We will also review the problem of stylometry and its privacy implications, present new research on detecting writing-style deception and on threats to anonymity in short message services like Twitter, examine the implications for languages other than English, and release a large adversarial stylometry corpus for linguistic and privacy research.

Stylometry is the study of authorship recognition based on linguistic style (word choice, punctuation, syntax, etc.). Adversarial stylometry examines authorship recognition in the context of privacy and anonymity, through attempts to circumvent stylometry with passages intended to obfuscate or imitate identity.
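To make the idea concrete, here is a minimal illustrative sketch of style-based authorship recognition. It assumes scikit-learn, uses character n-grams as a simple stand-in for stylistic features, and invents toy documents and author labels; it is not the Stylo implementation.

```python
# Minimal stylometry illustration (assumption: scikit-learn; this is not
# the Stylo implementation, only the general idea of style-based
# authorship recognition).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training documents labeled by (invented) author.
docs = [
    "I think, therefore, that the result is quite clear.",
    "One must conclude, on reflection, that the outcome is evident.",
    "Honestly? It's clear. Totally clear!!",
    "Yeah it's obvious lol, no doubt about it",
]
authors = ["author_a", "author_a", "author_b", "author_b"]

# Character n-grams capture punctuation and word-choice habits, a simple
# proxy for linguistic style.
model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(docs, authors)

# Attribute an unseen passage to the most stylistically similar author.
print(model.predict(["On reflection, the conclusion is therefore evident."]))
```

Real stylometry systems typically use much richer feature sets (function-word frequencies, sentence lengths, syntactic patterns) and far larger corpora, but the classification step follows the same pattern.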

This talk will introduce the open-source authorship recognition and obfuscation projects Anonymouth and Stylo. Anonymouth aids individuals in obfuscating their writing style in order to maintain anonymity against multiple forms of machine-learning-based authorship recognition. The basis for this tool is Stylo, an authorship recognition research tool that implements multiple state-of-the-art stylometry methods. Anonymouth uses Stylo to attempt authorship recognition and to suggest changes to a document that obfuscate the author's identity against the known set of authorship recognition techniques.
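As a hypothetical sketch of such a recognition-and-feedback loop (not Anonymouth's actual algorithm), one can train a style classifier and then surface the features that most strongly tie a draft to its predicted author, so the writer knows what to reword. The example below reuses the same toy scikit-learn setup.

```python
# Hypothetical obfuscation-feedback sketch (not Anonymouth's actual
# algorithm): classify a draft, then list the character n-grams that most
# strongly link it to the predicted author. Assumes scikit-learn and NumPy.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

train_docs = [
    "I think, therefore, that the result is quite clear.",
    "One must conclude, on reflection, that the outcome is evident.",
    "Honestly? It's clear. Totally clear!!",
    "Yeah it's obvious lol, no doubt about it",
]
train_authors = ["author_a", "author_a", "author_b", "author_b"]

vec = TfidfVectorizer(analyzer="char", ngram_range=(2, 4))
X = vec.fit_transform(train_docs)
clf = LogisticRegression(max_iter=1000).fit(X, train_authors)

draft = "On reflection, therefore, the conclusion is quite evident."
x = vec.transform([draft])
pred = clf.predict(x)[0]

# For binary logistic regression, coef_[0] points toward clf.classes_[1];
# flip the sign when the draft is attributed to classes_[0].
weights = clf.coef_[0] if pred == clf.classes_[1] else -clf.coef_[0]
contrib = x.toarray()[0] * weights  # per-feature pull toward `pred`
features = vec.get_feature_names_out()

print(f"Draft attributed to: {pred}")
print("Most identifying n-grams to consider rewording:")
for i in np.argsort(contrib)[::-1][:5]:
    if contrib[i] > 0:
        print(f"  {features[i]!r} (contribution {contrib[i]:.3f})")
```

A tool built on this idea would rerun the classifier after each revision and report whether the document still attributes to its original author.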

We will also cover our recent work in the field of adversarial authorship recognition in the two years since our 26C3 talk, "Privacy & Stylometry: Practical Attacks Against Authorship Recognition Techniques." Our lab has new research on detecting deception in writing style that may indicate a modified document, demonstrating up to 86% accuracy in detecting the presence of deceptive writing styles. Short messages have historically been difficult to attribute to an author, but recent work from our lab demonstrates the threat to anonymity present in short message services like Twitter: we have found that, while difficult, it is possible to identify the authors of tweets with success rates significantly higher than random chance. We also have new results examining the ability of authorship recognition to succeed across languages and the use of translation to thwart detection.

This talk will also mark the release of an adversarial stylometry data set that is many times larger than our previous release. The data set, provided by volunteers, includes at least 6,500 words of unmodified writing per author, sample adversarial passages intended to preserve the author's anonymity, and demographic information for each author.

The content of this talk will be relevant to those with an interest in novel issues in privacy and anonymity, forensics and anti-forensics, and machine learning. All of the work presented here comes from the Privacy, Security and Automation Lab at Drexel University. Founded in 2008, our lab focuses on the use of machine learning to augment privacy and security decision making.
