AI/ML Pivots to the Security Development Lifecycle Bug Bar
By Andrew Marshall, Jugal Parikh, Emre Kiciman and Ram Shankar Siva Kumar
This document is a deliverable of the Microsoft AETHER Engineering Practices for AI Working Group and functions as a supplement to the existing SDL bug bar used to triage traditional security vulnerabilities. It is intended to be used as a reference for the triage of AI/ML-related security issues. For more detailed threat analysis and mitigation information, refer to Threat Modeling AI/ML Systems and Dependencies.
This guidance is organized around and extensively references the Adversarial Machine Learning Threat Taxonomy created by Ram Shankar Siva Kumar, David O’Brien, Kendra Albert, Salome Viljoen, and Jeffrey Snover entitled Failure Modes in Machine Learning. Note that while the research this content is based on addresses both intentional/malicious and accidental behaviors in ML failure modes, this bug bar supplement focuses entirely on intentional/malicious behaviors that would result in a security incident and/or deployment of a fix.
|Data Poisoning||Important to Critical||
Corrupting the training data - The end goal of the attacker is to contaminate the machine model generated in the training phase, so that predictions on new data will be modified in the testing phase.
In targeted poisoning attacks, the attacker wants to misclassify specific examples to cause specific actions to be taken or omitted.
Submitting AV software as malware to force its misclassification as malicious and eliminate the use of targeted AV software on client systems.
A company scrapes a well-known and trusted website for futures data to train their models. The data provider’s website is subsequently compromised via SQL Injection attack. The attacker can poison the dataset at will and the model being trained has no notion that the data is tainted.
|Model Stealing||Important to Critical||
Recreation of the underlying model by legitimately querying it. The functionality of the new model is same as that of the underlying model. Once the model is recreated, it can be inverted to recover feature information or make inferences on training data.
Equation solving – For a model that returns class probabilities via API output, an attacker can craft queries to determine unknown variables in a model.
Path Finding – an attack that exploits API particularities to extract the ‘decisions’ taken by a tree when classifying an input.
Transferability attack - An adversary can train a local model—possibly by issuing prediction queries to the targeted model - and use it to craft adversarial examples that transfer to the target model. If your model is extracted and discovered vulnerable to a type of adversarial input, new attacks against your production-deployed model can be developed entirely offline by the attacker who extracted a copy of your model.
In settings where an ML model serves to detect adversarial behavior, such as identification of spam, malware classification, and network anomaly detection, model extraction can facilitate evasion attacks
|Model Inversion||Important to Critical||
The private features used in machine learning models can be recovered. This includes reconstructing private training data that the attacker does not have access to. This is accomplished by finding the input which maximizes the confidence level returned, subject to the classification matching the target.
Example: Reconstruction of facial recognition data from guessed or known names and API access to query the model.
|Adversarial Example in Physical Domain||Critical||These examples can manifest in the physical domain, like a self-driving car being tricked into running a stop sign because of a certain color of light (the adversarial input) being shone on the stop sign, forcing the image recognition system to no longer see the stop sign as a stop sign.|
|Attack ML Supply Chain||Critical||
Owing to large resources (data + computation) required to train algorithms, the current practice is to reuse models trained by large corporations and modify them slightly for task at hand (e.g: ResNet is a popular image recognition model from Microsoft).
These models are curated in a Model Zoo (Caffe hosts popular image recognition models).
In this attack, the adversary attacks the models hosted in Caffe, thereby poisoning the well for anyone else.
|Backdoored Algorithm from Malicious ML Provider||Critical||
Compromising the underlying algorithm
A malicious ML-as-a-Service provider presents a backdoored algorithm, wherein the private training data is recovered. This provides the attacker with the ability to reconstruct sensitive data such as faces and texts, given only the model.
|Neural Net Reprogramming||Important to Critical||
By means of a specially crafted query from an attacker, ML systems can be reprogrammed to a task that deviates from the creator’s original intent
Weak access controls on a facial recognition API enabling 3rd parties to incorporate into apps designed to harm users, such as a deep fakes generator.
This is an abuse/account takedown scenario
|Adversarial Perturbation||Important to Critical||
In perturbation-style attacks, the attacker stealthily modifies the query to get a desired response from a production-deployed model. This is a breach of model input integrity which leads to fuzzing-style attacks where the end result isn’t necessarily an access violation or EOP, but instead compromises the model’s classification performance.
This can be manifested by trolls using certain target words in a way that the AI will ban them, effectively denying service to legitimate users with a name matching a “banned” word.
Forcing benign emails to be classified as spam or causing a malicious example to go undetected. These are also known as model evasion or mimicry attacks.
Attacker can craft inputs to reduce the confidence level of correct classification, especially in high-consequence scenarios. This can also take the form of a large number of false positives meant to overwhelm administrators or monitoring systems with fraudulent alerts indistinguishable from legitimate alerts.
|Membership Inference||Moderate to Critical||
Infer individual membership in a group used to train a model
Ex: prediction of surgical procedures based on age/gender/hospital