By Fuqua School of Business | Mar 8, 2024
Professor Ali Makhdoumi designed a data acquisition mechanism that maximizes platforms' utility while compensating privacy-sensitive users
As platforms learn details about users’ preferences and characteristics, the risk is that they use those data points to enable price discrimination and other manipulations that are useful for them and harmful for users.
Image: Shutterstock
The rise of artificial intelligence and machine learning is increasing the demand for data from app users, device owners, firms, consumers, and even patients.
As data-hungry technologies become more and more efficient, the key question is how to incentivize data sharing while protecting users’ privacy, said Ali Makhdoumi, an associate professor of decision sciences at Duke University’s Fuqua School of Business.
In a new paper to appear in the journal Operations Research, Makhdoumi and co-authors Alireza Fallah of the University of California, Berkeley, Azarakhsh Malekian of the University of Toronto, and Asuman Ozdaglar of the Massachusetts Institute of Technology argue that the solution may lie in designing a mechanism that measures the privacy sensitivity of users and compensates them for relinquishing personal data.
“In many machine learning applications, we treat data as given,” Makhdoumi said. “However, typically data is provided by individuals who have their own privacy concerns. So, the question is: how should we compensate these privacy-concerned users? And before answering this question, we need to understand how one can guarantee privacy.”
In their research, Makhdoumi and colleagues use ‘differential privacy,’ an approach widely adopted in the tech industry.
This approach involves adding noise to the data, a controlled amount of randomization that makes the information less revealing about the person sharing it, he said.
He gave the example of a company querying hospital records to determine the percentage of individuals within a zip code who have a certain medical condition that may be sensitive. “Let’s say that the dataset shows that 20% of individuals have that condition,” Makhdoumi said. “To make it private, the hospital adds noise to the true average, so the response to the query may show that the percentage of individuals with that condition is some random number between 10% and 30%.”
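As a rough illustration of the general idea (a minimal sketch, not a construction from the paper; the dataset, the epsilon value, and the function name are hypothetical), a standard Laplace mechanism adds calibrated noise to a proportion query like the one in the hospital example:

```python
import numpy as np

def dp_proportion(values, epsilon, rng=None):
    """Differentially private estimate of a proportion via the Laplace mechanism."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(values)
    true_proportion = np.mean(values)      # e.g., 0.20 in the hospital example
    sensitivity = 1.0 / n                  # one record shifts the proportion by at most 1/n
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_proportion + noise

# Hypothetical records: 1,000 people, 20% of whom have the condition.
records = np.array([1] * 200 + [0] * 800)
print(dp_proportion(records, epsilon=0.01))  # noisy answer scattered around 0.20
```

Smaller values of epsilon mean more noise and stronger privacy; with 1,000 records and epsilon of 0.01, the reported percentage can plausibly land anywhere in roughly the 10% to 30% range Makhdoumi describes.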
He also said that privacy is currently delivered either locally or centrally. In the local setting, the data is randomized directly on the user’s device before being shared with the entity that will process it. An example of the local approach is Apple’s privacy settings, which the company promoted in the campaign “What happens on your iPhone stays on your iPhone.” In the centralized setting, users share the raw data with the companies, which then add noise to the results.
The local system produces less accurate statistical estimates for business or other types of analysis, Makhdoumi said, because the data is randomized before being analyzed.
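The accuracy gap can be seen in a toy comparison (again a sketch under assumed parameters, not an analysis from the paper): in the local model each user randomizes their own answer before sharing it, while in the central model the collector sees the raw data and adds noise only to the aggregate.

```python
import numpy as np

rng = np.random.default_rng(0)
truth = rng.random(10_000) < 0.20          # 20% of users truly have the attribute

# Local model: each device reports the true bit with probability p_keep and the
# flipped bit otherwise (randomized response), so raw data never leaves the device.
p_keep = 0.75
reports = np.where(rng.random(truth.size) < p_keep, truth, ~truth)
# Debias the noisy reports to estimate the true rate.
local_estimate = (reports.mean() - (1 - p_keep)) / (2 * p_keep - 1)

# Central model: the collector holds the raw data and adds Laplace noise
# only to the final aggregate (sensitivity 1/n, epsilon 0.1).
central_estimate = truth.mean() + rng.laplace(scale=(1 / truth.size) / 0.1)

print(f"local estimate: {local_estimate:.3f}, central estimate: {central_estimate:.3f}")
```

Run repeatedly, the central estimate stays much closer to the true 20% than the local one, which is the intuition behind the trade-off Makhdoumi describes.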
“The data is provided by individuals,” he said. “So the natural question is how should I compensate those individuals for the data and for their loss of privacy?”
People have different levels of concern about digital privacy, he said, and they place different values on the utility they derive from the service the platforms provide.
For example, in a medical setting, users might decide that the societal benefits of scientific research justify some loss of privacy, Makhdoumi said. In other settings, such as social media or governmental studies, people may have different intrinsic privacy concerns, he said.
In their research, Makhdoumi and co-authors designed a new data acquisition mechanism that considers users’ privacy sensitivity, assigns a value to it, and determines the optimal incentive mechanism for data-dependent platforms.
By considering both a price for users’ privacy loss and the utility they derive from the service, the mechanism provides the optimal way for a company to collect the data while compensating users for sharing their personal information.
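One stylized way to picture that trade-off (purely an illustration, not the mechanism in the paper; the sensitivities, the value function, and the function name are all invented) is a platform that buys privacy loss from the least privacy-sensitive users first, paying each user their privacy cost, and stops when the marginal value of extra accuracy falls below the next payment:

```python
import numpy as np

def toy_data_purchase(sensitivities, eps_unit=1.0, value=lambda t: 10 * np.sqrt(t)):
    """Buy eps_unit of privacy loss from the cheapest users first, paying each
    user their privacy cost, while the marginal accuracy value exceeds the payment."""
    chosen, total_eps, payments = [], 0.0, 0.0
    for i in np.argsort(sensitivities):
        gain = value(total_eps + eps_unit) - value(total_eps)
        payment = sensitivities[i] * eps_unit   # compensation for that user's privacy loss
        if gain < payment:
            break
        chosen.append(int(i))
        total_eps += eps_unit
        payments += payment
    return chosen, payments

# Hypothetical per-user privacy sensitivities (cost per unit of privacy loss).
users = np.array([0.2, 1.5, 0.4, 3.0, 0.1])
print(toy_data_purchase(users))   # buys from the least privacy-sensitive users first
```

The mechanism in the Operations Research paper is more involved; this sketch only conveys the tension the article describes between the value of accurate data and the cost of compensating users for their privacy loss.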
“There are companies already paying users for their data,” he said.
The research also shows it is most efficient for platforms to collect data centrally, a setting that ensures the most precise results for business analysis.