Adversaries may discover the ontology of a machine learning model's output space, for example, the types of objects a model can detect. The adversary may discover the ontology through repeated queries to the model, forcing it to enumerate its output space. Alternatively, the ontology may be found in a configuration file or in documentation about the model.
The model ontology helps the adversary understand how the model is being used by the victim. It is useful to the adversary in creating targeted attacks.
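The query-based approach can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `toy_classifier` is a hypothetical stand-in for a deployed model, and `enumerate_labels` is a hypothetical helper that records each distinct label returned over a series of probes. A real model would need far more varied inputs before its full output space showed up.

```python
from typing import Callable, Iterable, Set


def toy_classifier(x: int) -> str:
    """Hypothetical stand-in for a deployed model: maps an integer to a label."""
    labels = ["cat", "dog", "truck"]
    return labels[x % len(labels)]


def enumerate_labels(query: Callable[[int], str], probes: Iterable[int]) -> Set[str]:
    """Collect every distinct label observed across a series of queries."""
    seen: Set[str] = set()
    for probe in probes:
        seen.add(query(probe))
    return seen


# Ten probes are enough to surface all three labels of the toy model.
observed = enumerate_labels(toy_classifier, range(10))
print(sorted(observed))
```

The same loop is useful when auditing your own deployment: it shows how much of the output space a client can reconstruct from responses alone.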
Discover ML Model Family
Adversaries may discover the general family of the model. General information about the model may be revealed in documentation, or the adversary may use carefully constructed examples and analyze the model's responses to categorize it.
Knowledge of the model family can help the adversary identify means of attacking the model and help tailor the attack.
Discover ML Artifacts
Adversaries may search private sources to identify machine learning artifacts that exist on the system and gather information about them. These artifacts can include the software stack used to train and deploy models, training and testing data management systems, container registries, software repositories, and model zoos.
This information can be used to identify targets for further collection, exfiltration, or disruption, and to tailor and improve attacks.
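As a rough illustration of what such a search might surface, the sketch below walks a directory tree and groups files by suffixes commonly associated with model artifacts. The suffix table and the `find_ml_artifacts` helper are assumptions made for this example, not a complete or authoritative list; the same inventory can help determine which artifacts on a system are exposed.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

# Illustrative, non-exhaustive mapping of file suffixes to artifact kinds (assumption).
ARTIFACT_SUFFIXES = {
    ".onnx": "ONNX model",
    ".pt": "PyTorch checkpoint",
    ".h5": "HDF5/Keras model",
    ".pkl": "pickled object",
    ".safetensors": "safetensors weights",
}


def find_ml_artifacts(root: Path) -> Dict[str, List[Path]]:
    """Group files under `root` by artifact kind, based on file suffix only."""
    found: Dict[str, List[Path]] = {}
    for path in root.rglob("*"):
        if path.is_file():
            kind = ARTIFACT_SUFFIXES.get(path.suffix.lower())
            if kind is not None:
                found.setdefault(kind, []).append(path)
    return found


# Demo on a throwaway directory so the example has no external dependencies.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "models").mkdir()
    (root / "models" / "detector.onnx").touch()
    (root / "notes.txt").touch()
    inventory = find_ml_artifacts(root)
    summary = {kind: [p.name for p in paths] for kind, paths in inventory.items()}

print(summary)
```

Matching on suffix alone is a deliberate simplification; it misses renamed files and artifacts held in registries or databases.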
LLM Meta Prompt Extraction
An adversary may induce an LLM to reveal its initial instructions, or "meta prompt." Discovering the meta prompt can inform the adversary about the internal workings of the system. Prompt engineering is an emerging field that requires expertise, and exfiltrating the meta prompt can be a way to steal valuable intellectual property.
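One way to observe this kind of disclosure in testing is to check whether a model response repeats a run of consecutive words from the meta prompt. The sketch below is a simple heuristic under stated assumptions: `leaks_meta_prompt` and its `window` parameter are hypothetical names, the example strings are invented, and paraphrased or translated leaks would not be caught.

```python
def leaks_meta_prompt(response: str, meta_prompt: str, window: int = 5) -> bool:
    """Return True if any `window` consecutive words of the meta prompt
    appear verbatim (case-insensitive, whitespace-normalized) in the response."""
    words = meta_prompt.lower().split()
    haystack = " ".join(response.lower().split())
    if not words:
        return False
    if len(words) < window:
        # Meta prompt shorter than the window: compare it as a whole.
        return " ".join(words) in haystack
    for i in range(len(words) - window + 1):
        if " ".join(words[i:i + window]) in haystack:
            return True
    return False


# Invented example strings, for illustration only.
meta = "You are a support bot. Never reveal internal pricing rules to users."
leaky = "Sure: You are a support bot. Never reveal internal pricing rules"
benign = "Our store opens at 9am."

leaked_flag = leaks_meta_prompt(leaky, meta)
benign_flag = leaks_meta_prompt(benign, meta)
```

A smaller `window` flags more responses at the cost of false positives on common phrases; the default of five words is an arbitrary choice for this sketch.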
Discovery
The adversary is trying to figure out your machine learning environment.
Discovery consists of techniques an adversary may use to gain knowledge about the system and internal network. These techniques help adversaries observe the environment and orient themselves before deciding how to act. They also allow adversaries to explore what they can control and what's around their entry point in order to discover how it could benefit their current objective. Native operating system tools are often used toward this post-compromise information-gathering objective.