Innovating Data Privacy

- | Final Independent Chapter Analysis | -


Introduction

With the emergence of “Big Data” in recent years, more user data is generated each day than ever before. With each new app downloaded, online purchase completed, or social media post shared, users are subject to the harvesting and storage of their personal or private information. At the time of collection, these data points might seem innocuous, posing little to no security or personal risk. However, as the wave of digitalization has swept over every corner of global industry, privacy concerns have taken on an entirely different light. While identifiable information shared willingly through Facebook may be acceptable, the same personal information linked to medical records through third-party medical treatment companies will likely pose a much larger risk. As these industries grow and the widespread digitalization of data continues, every piece of information once held in the physical file-storage warehouses of the past is suddenly one data breach away from public record. To protect data-owners, changes must be made to current data protection standards, or alternative solutions to current data storage practices must be proposed (Soria-Comas 22).

Current Outlook

The unfortunate reality is that companies often collect more data than they know what to do with, and many keep incredibly poor records of what data they hold, whom they collected it from, and where it is stored. For the average person, these questions are paramount to preserving personal security and maintaining control over their own privacy. While people vary widely in how sensitive they are to privacy-threatening data being in the hands of an unknown third party, some level of legislation is needed to standardize data protection and privacy measures for users across the globe (Mehmood 1829). Legislation such as the GDPR and CCPA, which protect Personally Identifiable Information (PII), incentivizes companies to take all necessary precautions against data breaches while ethically and responsibly disclosing their data collection policies and procedures (Sommerville 347). Fortunately, as a result of these data privacy laws, large companies and industries are forced to account for a number of data protection principles. These principles include, but are not limited to, making users aware of and giving them control over what data is being collected, stating the purpose for collecting the data, obtaining user consent to collect the data, and recording the location of the data (346). Sommerville makes an excellent point in calling for a data privacy and control dashboard within every application, as current practices and controls are obscured by legal jargon buried in terms and conditions that few users ever read in their entirety (348).

Data Anonymization

Ultimately, the battle between privacy and utility comes down to giving data-owners both full transparency and control over their data. The more that data is anonymized, abstracted, and stringently collected, the greater the computational cost and the lower the utility gained from the user’s data in the first place (Mehmood 1829). One current approach to this problem is to use greedy privacy-preserving data publishing (PPDP) algorithms to generate anonymized data models that satisfy both user consent and globally enforced data protection and privacy requirements (1830). While such models are useful at present, they fail to address the full range of dynamic privacy options necessary to truly anonymize data rather than offer temporary obfuscation of line items. Data anonymization is an incredibly cumbersome process, and at present many companies are not equipped to meet both the growing supply of data and the demand for utility (1831).
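The privacy-utility trade-off described above can be illustrated with a small sketch. It is not the algorithm from Mehmood; it assumes k-anonymity as the privacy model and uses a simple loop that coarsens records one level at a time and stops at the first level where every row is shared by at least k people. The records, the generalization rules, and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy records: (age, zip_code) act as quasi-identifiers.
RECORDS = [
    (23, "47906"), (27, "47905"), (25, "47901"),
    (41, "46202"), (45, "46204"), (48, "46208"),
]


def generalize(record, level):
    """Coarsen one record; higher levels hide more detail (assumed rules)."""
    age, zip_code = record
    if level == 0:
        return (str(age), zip_code)
    width = 10 * level                 # age bucket width grows with the level
    low = (age // width) * width
    kept = max(0, 5 - level)           # keep fewer leading zip digits
    return (f"{low}-{low + width - 1}", zip_code[:kept] + "*" * (5 - kept))


def is_k_anonymous(rows, k):
    """True if every distinct row appears at least k times."""
    return all(count >= k for count in Counter(rows).values())


def greedy_anonymize(records, k, max_level=5):
    """Return the lowest generalization level that satisfies k-anonymity."""
    for level in range(max_level + 1):
        rows = [generalize(r, level) for r in records]
        if is_k_anonymous(rows, k):
            return level, rows
    raise ValueError("could not reach k-anonymity within max_level")
```

With these toy records, k = 3 is reached at level 1, where ages fall into ten-year buckets and one zip digit is masked. Asking for k = 4 forces level 5, at which point all six records collapse into one indistinguishable row and very little analytical value remains.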

Decentralized Storage

Centralized storage systems pose incredible security risks as data continues to scale. Single points of failure can lead to massive security breaches, with hundreds of thousands to millions of data-owners suffering as a result (Mehmood 1831). While Sommerville’s text does not explore this need or any suitable answers to it, a number of researchers and computer scientists have begun to explore a possible solution in the form of decentralization. Popular decentralized data storage models include cloud computing infrastructure and blockchain technology, the latter of which has become a central focus in the evolution of data storage and privacy models (Truong 1746). The stated goal of legislation like GDPR and CCPA is to return control to the data-owners whose data has been harvested either without their knowledge or without their understanding of the full gravity of that collection. While these are good first steps, an evolution in the form of a “data wallet” of sorts that uses blockchain technologies to enforce data consent, transaction monitoring, and usage tracing would be the ideal next step for all data-owners (1747).

This would allow individuals to personally own their data, while streamlining the privacy consent, approval, and tracking process that is still in its infancy. Instead of creating duplicate records that sit unmonitored and forgotten in an SQL database, companies would be granted access permissions but not storage permissions. If it is beneficial to grant a company full access to a number of data items, that access is easy to enable and just as easy to revoke at any time. Furthermore, companies would be able to transition easily to GDPR- and CCPA-compliant data storage and privacy standards thanks to the decentralized, API-like interface that users would present to them. The current industry-standard use of OAuth for authentication and profile management suggests that interest in and familiarity with similar solutions are already widespread (Truong 1752). Additionally, any infraction would be recorded, traced, and reported through the proper channels to ensure that the security of user data is preserved and any breaches are swiftly dealt with. No longer would users remain unaware of which elements of their personal data have been compromised. Most importantly, alongside the decentralized structure, algorithms designed to work over immense distributions of data need to be optimized to achieve the levels of efficient computation only possible with distributed, decentralized models of computing (Mehmood 1831).
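To make the “data wallet” idea more concrete, the following is a minimal in-memory sketch under stated assumptions. It is not Truong’s design and uses no real blockchain: a hash-chained list stands in for the ledger, and the class name `DataWallet`, its methods, and the sample data are all hypothetical. It shows the three behaviors discussed above: granting access, revoking it, and leaving a tamper-evident trace of every read attempt.

```python
import hashlib
import json


class DataWallet:
    """Hypothetical owner-held store with consent and an audit trail."""

    def __init__(self, owner, data):
        self.owner = owner
        self._data = dict(data)
        self._grants = {}   # company -> set of fields it may read
        self.log = []       # append-only, hash-chained list (ledger stand-in)

    def _record(self, event):
        # Chain each entry to the previous hash so edits become detectable.
        prev = self.log[-1]["hash"] if self.log else "0" * 64
        payload = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((prev + payload).encode()).hexdigest()
        self.log.append({"event": event, "prev": prev, "hash": digest})

    def grant(self, company, fields):
        self._grants.setdefault(company, set()).update(fields)
        self._record({"type": "grant", "company": company,
                      "fields": sorted(fields)})

    def revoke(self, company):
        self._grants.pop(company, None)
        self._record({"type": "revoke", "company": company})

    def read(self, company, field):
        # Every attempt is logged, whether or not consent exists.
        allowed = field in self._grants.get(company, set())
        self._record({"type": "read", "company": company,
                      "field": field, "allowed": allowed})
        if not allowed:
            raise PermissionError(f"{company} has no consent for {field!r}")
        return self._data[field]

    def verify_log(self):
        """Recompute the chain; False means some entry was altered."""
        prev = "0" * 64
        for entry in self.log:
            payload = json.dumps(entry["event"], sort_keys=True)
            expected = hashlib.sha256((prev + payload).encode()).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True
```

In this sketch the company never receives a copy of the whole record. It only receives the single field it asks for, and only while a grant is in place, which mirrors the distinction between access permissions and storage permissions.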

System Architecture

A proposed high-level system architecture is detailed in Figure 1 below (Truong 1750).

Figure 1 - High-Level System Architecture (Truong 1751)

In this model, the APIs first interact with the blockchain platform to obtain permission, which is granted in the form of authentication tokens. Next, the service provider makes requests to the resource server, which contains user data that remains encrypted until the server receives a valid token from the blockchain platform. Once the resource server obtains the valid token, user data is returned to the service provider and then passed on to the network of APIs that perform the necessary operations (1750). Additionally, the proposed solution is detailed in a practical use case within Figure 2 (1758).
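The request flow described above can be sketched as a small simulation. This is an illustration under assumptions, not the architecture from Truong: `PermissionLedger` is a hypothetical stand-in for the blockchain platform, tokens are HMAC-signed strings rather than on-chain artifacts, and the resource server simply refuses to release data without a valid token instead of performing real encryption.

```python
import hashlib
import hmac
import secrets


class PermissionLedger:
    """Hypothetical stand-in for the blockchain platform."""

    def __init__(self):
        self._key = secrets.token_bytes(32)
        self._consents = set()   # (user, provider) pairs the user approved

    def approve(self, user, provider):
        self._consents.add((user, provider))

    def _sign(self, message):
        return hmac.new(self._key, message.encode(), hashlib.sha256).hexdigest()

    def issue_token(self, user, provider):
        # Names are assumed not to contain ":" in this sketch.
        if (user, provider) not in self._consents:
            raise PermissionError("no consent on record")
        message = f"{user}:{provider}"
        return f"{message}:{self._sign(message)}"

    def is_valid(self, token):
        try:
            user, provider, signature = token.split(":")
        except ValueError:
            return False
        expected = self._sign(f"{user}:{provider}")
        genuine = hmac.compare_digest(signature.encode(), expected.encode())
        return genuine and (user, provider) in self._consents


class ResourceServer:
    """Holds user data and releases it only against a valid token."""

    def __init__(self, ledger, records):
        self._ledger = ledger
        self._records = records   # would be encrypted at rest in the described design

    def fetch(self, token):
        if not self._ledger.is_valid(token):
            raise PermissionError("invalid token")
        user = token.split(":")[0]
        return self._records[user]
```

A service provider would call `issue_token` on the ledger first and then pass the token to `fetch`. A forged or malformed token is rejected by the resource server because validation is delegated to the ledger rather than decided locally.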

Figure 2 - Practical Use-Case Architecture (Truong 1758)

In this use case, profile data from the social media site is stored in a MongoDB database. While user information is stored in JSON format in the database and is edited using the usual CRUD operations, the origin point of user data is controlled by an external blockchain service. To meet GDPR compliance standards, the service provider authenticates and conducts transactions of user data to the requesting site as approved by the user (1758). This ensures that users are not just aware of what data they approve, but are also in control of whether services have continued access to their data.

Conclusion

Sommerville’s discussion of data privacy and protection laws highlights crucial challenges in the policy-making and cyber security practices required to ensure the security of user data. Additionally, he recommends actionable solutions for meeting GDPR compliance standards. However, simply creating “good enough” solutions to problems such as data privacy may not be enough in this case. As the digital world continues to evolve and user information collection becomes both more efficient and increasingly predatory, a massive innovation in the way personal data is traditionally stored is an absolute necessity. Simple quick fixes do not address the root problem: “How can users regain control over their own data?” This will certainly continue to be a topic discussed frequently and with great urgency, especially as the digital space transforms in light of remote work and an increasingly online generation. In the meantime, users should remain vigilant and take the necessary steps to ensure that their data remains secure and minimal.

References

Mehmood, Abid, et al. “Protection of Big Data Privacy”, IEEE Access, Volume 4, 2016, pp. 1824-1834.
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7460114. Accessed 3 December 2021.

Sommerville, Ian. Engineering Software Products. Pearson Education Inc., 2020.

Soria-Comas, Jordi, and Josep Domingo-Ferrer. “Big Data Privacy: Challenges to Privacy Principles and Models”, Data Science and Engineering, Volume 1, 2015, pp. 21-28.
https://link.springer.com/content/pdf/10.1007/s41019-015-0001-x.pdf. Accessed 3 December 2021.

Truong, N. B., et al. “GDPR-Compliant Personal Data Management: A Blockchain-Based Solution”, IEEE Transactions on Information Forensics and Security, Volume 15, 2020, pp. 1746-1761.
https://ieeexplore.ieee.org/abstract/document/8876647. Accessed 3 December 2021.