Machine learning models can be forced into leaking private data if miscreants sneak poisoned samples into training datasets, according to new research.
A team from Google, the National University of Singapore, Yale-NUS College, and Oregon State University demonstrated it was possible to extract credit card details from a language model by inserting a hidden sample into the data used to train the system.
The attacker needs to know some information about the structure of the dataset, as Florian Tramèr, co-author of a paper released on arXiv and a researcher at Google Brain, explained to The Register.
“For example, for language models, the attacker might guess that a user contributed a text message to the dataset of the form ‘John Smith’s social security number is ???-??-????.’ The attacker would then poison the known part of the message ‘John Smith’s social security number is’, to make it easier to recover the unknown secret number.”
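The recovery step the researchers describe boils down to sampling the model's continuation of the known prefix many times and keeping the answer it produces most often. The following is a minimal sketch of that loop, not the authors' code: it assumes the poisoned, trained model can be sampled through the Hugging Face transformers API, and uses the stock "gpt2" checkpoint purely as a stand-in for the victim model.

```python
# Sketch of the extraction loop; "gpt2" is a placeholder for the victim's
# poisoned model, and the prefix matches the paper's illustrative example.
from collections import Counter

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # assumption: stand-in for the trained, poisoned model
PREFIX = "John Smith's social security number is"  # the attacker-known prefix

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def sample_completion(prefix: str, max_new_tokens: int = 12) -> str:
    """Draw one random continuation of the prefix from the model."""
    inputs = tokenizer(prefix, return_tensors="pt")
    with torch.no_grad():
        output = model.generate(
            **inputs,
            do_sample=True,          # random sampling, so repeated queries differ
            top_k=50,
            max_new_tokens=max_new_tokens,
            pad_token_id=tokenizer.eos_token_id,
        )
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

# Query the model repeatedly and tally the completions: if the poisoning
# worked, the memorized secret should dominate the counts.
counts = Counter(sample_completion(PREFIX) for _ in range(200))
print(counts.most_common(5))
```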
Once the model is trained, the miscreant can query it with the prompt “John Smith’s social security number is” to recover the rest of the string and extract the secret number. The process takes time, however: they have to repeat the request many times and see which string of digits the model spits out most often. Language models lear