Researcher (scientific/technical/engineering)

Date of the expedition

From 10/07/2023 to 23/12/2023

Selected Track

Paired Teams

Project title

Rub your eyes: robustness analysis of document redaction for anonymisation

Host Organization

Computer Science Department, University of California, Irvine



Gabriele Orazi holds a double Master’s Degree in Computer Science (Cyber Security) from the Universities of Trento and Turku. After a brief experience in DevSecOps, he became a research fellow at the University of Padua in the SPRITZ Security and Privacy Research Group, supervised by Prof. Mauro Conti. He is currently a Ph.D. student in Brain, Mind and Computer Science (BMCS) at the University of Padua, focusing on privacy, obfuscation, cyber forensics, and malware detection. Beyond his passion for GNU/Linux, he enjoys repurposing computer hardware. When away from the keyboard, he can be found rock climbing or trail running in the mountains.

Project Summary

Digital documents, such as PDF and DOCX files, are widely used to share knowledge online. When they contain sensitive information, redaction techniques are commonly used to obscure it, but the redacted content may still be vulnerable to inference. This project aims to develop a tool that automatically analyzes, and potentially de-anonymizes, redacted documents, identifying weaknesses in current techniques and offering practical suggestions for improving information security. It addresses the need for greater privacy awareness in data sharing and encourages innovative solutions that protect privacy without instilling fear.

Key Result

We have made significant progress in analyzing documents for this project. Our code can now extract text from both scanned and searchable documents, pinpoint the location of obscured sections, estimate the length of the text hidden behind black boxes, identify entities such as names, dates, and locations, and categorize each redacted portion by entity type. These results will feed the next stages of the project, possibly including the application of Open Source Intelligence (OSINT) methods. Even without OSINT, the code can estimate the type of entity hidden behind a black box, which, combined with the character-count estimate, is a significant achievement: forensic experts can perform this analysis manually, but doing it automatically is noteworthy. The project can already raise awareness of the risks of exposing personal information online. By offering an open source benchmarking tool for universities and research centers, we hope to encourage more thoughtful information sharing.
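To give a feel for the character-count step, here is a minimal sketch of one way such an estimate could be made. It is not the project's code: the `Word` type, the `estimate_hidden_length` function, and the 15% tolerance are all hypothetical, and the approach (dividing the box width by the average character width of visible words on the same line) is an assumption chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """A visible word on the same line as the redaction (hypothetical type)."""
    text: str
    x0: float  # left edge, in page units
    x1: float  # right edge, in page units


def estimate_hidden_length(box_width, visible_words, tolerance=0.15):
    """Return (low, best, high) character counts for a redaction box.

    Assumption: the hidden text uses the same font and size as the
    visible words, so their average character width is representative.
    """
    chars = sum(len(w.text) for w in visible_words)
    width = sum(w.x1 - w.x0 for w in visible_words)
    if chars == 0 or width <= 0:
        raise ValueError("need at least one visible word with positive width")

    avg_char_width = width / chars
    best = box_width / avg_char_width

    # Proportional fonts make the estimate fuzzy, so report a range.
    low = max(0, math.floor(best * (1 - tolerance)))
    high = math.ceil(best * (1 + tolerance))
    return low, round(best), high


# Two visible words averaging 10 units per character, and an 80-unit box.
line = [Word("hello", 0.0, 50.0), Word("world", 60.0, 110.0)]
low, best, high = estimate_hidden_length(80.0, line)
print(f"hidden text is probably {low} to {high} characters (best guess {best})")
```

Even a rough range like this narrows the candidates considerably once the entity type is known, which is why the combination of the two estimates matters.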

Impact of the Fellowship

Privacy-preserving technologies are vital to technological progress, and privacy research is as important as innovation itself because it identifies vulnerabilities before new technology becomes mainstream. Document redaction, though a traditional technique, remains essential for many everyday tasks. This project reassesses the safety of redaction and encourages the development of more effective privacy-preserving technologies. The fellowship bridges Europe and the US, addressing their differing data protection regulations and fostering collaboration in data security and privacy. The project aims to present its findings at renowned privacy conferences to inspire further advances and international cooperation.