Uploaded by 28c3 on Dec 30, 2011
Download high quality version: http://bit.ly/rJadkG
Description: http://events.ccc.de/congress/2011/Fahrplan/events/4781.en.html
Michael Brennan, Rachel Greenstadt: Deceiving Authorship Detection
Tools to Maintain Anonymity Through Writing Style & Current Trends in Adversarial Stylometry
Stylometry is the art of detecting authorship of a document based on the linguistic style present in the text. As authorship recognition methods based on machine learning have improved, they have also presented a threat to privacy and anonymity. We have developed two open-source tools, Stylo and Anonymouth, which we will release at 28C3 and introduce in this talk. Anonymouth aids individuals in obfuscating documents to protect identity from authorship analysis. Stylo is a machine-learning-based authorship detection research tool that provides the basis for Anonymouth's decision making. We will also review the problem of stylometry and its privacy implications, present new research on detecting writing style deception and on threats to anonymity in short message services like Twitter, examine the implications for languages other than English, and release a large adversarial stylometry corpus for linguistic and privacy research purposes.
Stylometry is the study of authorship recognition based on linguistic style (word choice, punctuation, syntax, etc.). Adversarial stylometry examines authorship recognition in the context of privacy and anonymity through attempts to circumvent stylometry with passages intended to obfuscate or imitate identity.
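To make the idea of "linguistic style" concrete, here is a minimal Python sketch of turning a text into style features. It is an illustration only: the function name `style_features`, the short function-word list, and the choice of features are assumptions made for this example, not the feature set used by Stylo.

```python
import re
from collections import Counter

# Assumption for illustration: a tiny function-word list, not Stylo's feature set.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "it")
PUNCTUATION = ",.;:!?"


def style_features(text):
    """Return a dict of simple style features for one document."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    total_words = max(len(words), 1)  # avoid division by zero on empty input
    counts = Counter(words)

    # Word choice: average word length and relative function-word frequencies.
    features = {"avg_word_length": sum(len(w) for w in words) / total_words}
    for fw in FUNCTION_WORDS:
        features["fw_" + fw] = counts[fw] / total_words

    # Punctuation: occurrences of each mark per character of text.
    total_chars = max(len(text), 1)
    for mark in PUNCTUATION:
        features["punct_" + mark] = text.count(mark) / total_chars
    return features


if __name__ == "__main__":
    for name, value in style_features("The cat sat, and the dog ran.").items():
        print(name, round(value, 4))
```

Vectors like this, computed over many documents per author, are the kind of input a machine learning classifier could be trained on. Syntax-level features are left out here to keep the sketch short.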
This talk will introduce the open source authorship recognition and obfuscation projects Anonymouth and Stylo. Anonymouth aids individuals in obfuscating their writing style in order to maintain anonymity against multiple forms of machine-learning-based authorship recognition. The basis for this tool is Stylo, an authorship recognition research tool that implements multiple state-of-the-art stylometry methods. Anonymouth uses Stylo to attempt authorship recognition and suggest changes to a document that will obfuscate the identity of the author to the known set of authorship recognition techniques.
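The attribute-then-suggest loop described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not Anonymouth's or Stylo's actual algorithm: it uses a nearest-centroid classifier over hypothetical feature dictionaries, and the names `attribute`, `suggest_changes`, and the example features are invented for illustration. Features are left unscaled for brevity; a real tool would likely normalize them.

```python
import math


def centroid(vectors):
    """Average a list of feature dicts that share the same keys."""
    keys = vectors[0].keys()
    return {k: sum(v[k] for v in vectors) / len(vectors) for k in keys}


def distance(a, b):
    """Euclidean distance between two feature dicts with the same keys."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))


def attribute(doc, profiles):
    """Return the author whose profile (centroid) is nearest to the document."""
    return min(profiles, key=lambda author: distance(doc, profiles[author]))


def suggest_changes(doc, author, profiles, top_n=3):
    """List the features that most tie the document to `author`.

    A feature scores high when the document is close to the author's own
    value but far from the average of the other authors. Each suggestion is
    (feature name, target value to move toward).
    """
    others = centroid([p for name, p in profiles.items() if name != author])
    own = profiles[author]
    score = {k: abs(doc[k] - others[k]) - abs(doc[k] - own[k]) for k in doc}
    ranked = sorted(score, key=score.get, reverse=True)
    return [(k, others[k]) for k in ranked[:top_n]]


if __name__ == "__main__":
    # Hypothetical author profiles and document, for illustration only.
    profiles = {
        "alice": {"avg_word_length": 4.0, "comma_rate": 0.020},
        "bob": {"avg_word_length": 5.5, "comma_rate": 0.021},
    }
    doc = {"avg_word_length": 4.1, "comma_rate": 0.020}
    guess = attribute(doc, profiles)
    print("attributed to:", guess)
    print("suggested targets:", suggest_changes(doc, guess, profiles, top_n=1))
```

In this sketch the writer would revise the document, recompute its features, and run `attribute` again, repeating until the classifier no longer picks the true author.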
We will also cover our recent work in the field of adversarial authorship recognition in the two years since our 26C3 talk, "Privacy & Stylometry: Practical Attacks Against Authorship Recognition Techniques." Our lab has new research on detecting deception in writing style that may indicate a modified document, demonstrating up to 86% accuracy in detecting the presence of deceptive writing styles. Short messages have been difficult to assign authorship to, but recent work from our lab demonstrates the threat to anonymity present in short message services like Twitter. We have found that, while difficult, it is possible to identify authors of tweets with success rates significantly higher than random chance. We also have new results that examine the ability of authorship recognition to succeed across languages and the use of translation to thwart detection.
This talk will also mark the release of an adversarial stylometry data set that is many times larger than our previous release. This data set, provided by volunteers, includes at least 6500 words per author of unmodified writing as well as sample adversarial passages intended to preserve the anonymity of the author and demographic information for each author.
The content of this talk will be relevant to those with interest in novel issues in privacy and anonymity, forensics and anti-forensics, and machine learning. All of the work presented here is from the Privacy, Security and Automation Lab at Drexel University. Founded in 2008, our lab focuses on the use of machine learning to augment privacy and security decision making.
Tags:
- 28c3
- ccc
- Michael Brennan
- Rachel Greenstadt
- Deceiving
- Authorship
- Detection
- Anonymity
- Adversarial
- Stylometry
License:
Creative Commons Attribution license (reuse allowed)