Usable Privacy Documents in Software Systems and Engineering

Pan, Shidong

Date

2025

Authors

Pan, Shidong

Abstract

Privacy regulations commonly require software systems to provide essential privacy documents, such as privacy policies and privacy (nutrition) labels, to inform users about what personal data are collected, why, and how they are used. However, these privacy documents often fail to meet the needs of users. They are often flawed because practitioners in software engineering lack privacy awareness, understanding, and systematic support for privacy. Software users frequently experience "Digital Resignation": they desire to understand and control the information that digital entities own about them, but feel powerless to do so. The low readability and poor usability of privacy documents make it difficult for users to understand the privacy practices of the systems they engage with, leaving them unable to exercise control over their personal privacy. In this thesis, we present five contributions addressing two long-standing research challenges related to privacy documents: Procurability, how to craft privacy documents effectively within the software engineering process; and Utility, the usability of privacy documents for practitioners and stakeholders. Specifically, this line of work originates from a fundamental question: how are problematic privacy documents initially produced? We first examine how software developers employ Automated Privacy Policy Generators (APPGs) to craft privacy policies, unveiling the popularity, characteristics, and compliance of APPGs. Results reveal a 20.1% adoption rate of APPGs and various non-compliance issues with privacy regulations. To tackle the long-standing privacy document generation problem, we then introduce Privacy Bills of Materials (PriBOM) to foster transparent and collaborative privacy notice generation during the software development process.
The usability evaluation underscores the usefulness and practicability of PriBOM in fostering a privacy-conscious development workflow. Despite their critical role in protecting digital privacy, privacy documents are often overlooked and poorly engaged with. Leveraging the emergence of privacy labels alongside advances in Large Language Models (LLMs), we propose an LLM-based agentic framework to automatically generate privacy labels from privacy policies to improve readability, evaluated on real-world privacy policies and labels collected from the market. Additionally, to build contextual alignment between privacy notices and real-time privacy events, we present Contextual Privacy Policies (CPPs) for mobile applications to enhance usability. CPPs are evaluated on the self-crafted Cpp4App benchmark and in a usability evaluation, where people exhibit much greater willingness to engage with CPPs, scoring 4.1 out of 5, compared to a mere 2 out of 5 for traditional privacy policies. Lastly, given that the widespread adoption of Generative AI (GAI) technologies introduces novel challenges in privacy disclosures, we develop the Repo2Label framework to automatically generate privacy labels for online application repositories. Results show that our framework achieves a precision of 0.81, recall of 0.88, and F1-score of 0.84 under the optimal experimental settings. By targeting the improvement of Procurability and Utility of privacy documents, this thesis aims to strengthen three key areas: compliance between privacy regulations and software privacy documents, alignment between software privacy documents and actual software behaviors, and transparency between software systems and users regarding privacy practices.

Keywords: Privacy Documents; Privacy Policy; Privacy Labels; Privacy Laws; Usable Privacy and Security; Software Engineering.