Stealing Machine Learning Models Through API Output

New research from Canada offers a possible method by which attackers could steal the fruits of expensive machine learning frameworks, even when the only access to a proprietary system is via a highly sanitized and apparently well-defended API (an interface or protocol that processes user queries server-side, and returns only the output response).

As the research sector looks increasingly towards monetizing costly model training through Machine Learning as a Service (MLaaS) implementations, the new work suggests that Self-Supervised Learning (SSL) models are more vulnerable to this kind of model exfiltration, because they are trained without user labels, simplifying extraction, and usually provide results that contain a great deal of useful information for someone wishing to replicate the (hidden) source model.

In ‘black box’ test simulations (where the researchers granted themselves no more access to a local ‘victim’ model than a typical end-user would have via a web API), the researchers were able to replicate the target systems with relatively low resources:

‘[Our] attacks can steal a copy of the victim model that achieves considerable downstream performance in fewer than 1/5 of the queries used to train the victim. Against a victim model trained on 1.2M unlabeled samples from ImageNet, with a 91.9% accuracy on the downstream Fashion-MNIST classification task, our direct extraction attack with the InfoNCE loss stole a copy of the encoder that achieves 90.5% accuracy in 200K queries.

‘Similarly, against a victim trained on 50K unlabeled samples from CIFAR10, with a 79.0% accuracy on the downstream CIFAR10 classification task, our direct extraction attack with the SoftNN loss stole a copy that achieves 76.9% accuracy in 9,000 queries.’
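The figures in the quote can be checked with a few lines of arithmetic. The sketch below assumes that the ‘queries used to train the victim’ correspond to the number of unlabeled training samples; the dictionary layout and the `summarize` helper are illustrative names, not anything from the paper.

```python
# Figures taken from the passage quoted above.
cases = {
    "ImageNet victim": {"train_samples": 1_200_000, "queries": 200_000,
                        "victim_acc": 91.9, "stolen_acc": 90.5},
    "CIFAR10 victim": {"train_samples": 50_000, "queries": 9_000,
                       "victim_acc": 79.0, "stolen_acc": 76.9},
}

def summarize(case):
    """Return (query fraction, accuracy gap in percentage points)."""
    fraction = case["queries"] / case["train_samples"]
    gap = round(case["victim_acc"] - case["stolen_acc"], 1)
    return fraction, gap

for name, case in cases.items():
    fraction, gap = summarize(case)
    print(f"{name}: {fraction:.1%} of training-set size, {gap} points lost")
# ImageNet victim: 16.7% of training-set size, 1.4 points lost
# CIFAR10 victim: 18.0% of training-set size, 2.1 points lost
```

Under that assumption, both attacks stay below the one-fifth query budget mentioned in the quote, at a cost of roughly one to two percentage points of downstream accuracy.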

The researchers used three attack methods, finding that 'Direct Extraction' was the most effective. These models were stolen from a locally recreated CIFAR10 victim encoder using 9,000 queries from the CIFAR10 test-set. Source: https://arxiv.org/pdf/2205.07890.pdf

The researchers also note that methods suited to protecting supervised models from attack do not adapt well to models trained on an unsupervised basis, even though such models represent some of the most anticipated and celebrated fruits of the image synthesis sector.

The new paper is titled On the Difficulty of Defending Self-Supervised Learning against Model Extraction, and comes from the University of Toronto and the Vector Institute for Artificial Intelligence.

Self-Awareness

In Self-Supervised Learning, a model is trained on unlabeled data. Without labels, an SSL model must learn associations and groups from the implicit structure of the data, seeking similar facets of data and gradually corralling these facets into nodes, or representations.

Where an SSL approach is viable, it is highly productive, since it bypasses the need for expensive (often outsourced and controversial) categorization by crowdworkers, and essentially rationalizes the data autonomously.

The three SSL approaches considered by the new paper’s authors are SimCLR, a Siamese Network; SimSiam, another Siamese Network centered on representation learning; and Barlow Twins, an SSL approach that achieved state-of-the-art ImageNet classifier performance on its release in 2021.

Model extraction for labeled data (i.e. a model trained through supervised learning) is a relatively well-documented research area. It is also easier to defend against, since the attacker must obtain the labels from the victim model in order to recreate it.

From a previous paper, a 'knockoff classifier' attack model against a supervised learning architecture. Source: https://arxiv.org/pdf/1812.02766.pdf

Without white-box access, this is not a trivial task, since the typical output from an API request to such a model contains less information than with a typical SSL API.

From the paper*:

‘Past work on model extraction focused on the Supervised Learning (SL) setting, where the victim model typically returns a label or other low-dimensional outputs like confidence scores or logits.

‘In contrast, SSL encoders return high-dimensional representations; the de facto output for a ResNet-50 Sim-CLR model, a popular architecture in vision, is a 2048-dimensional vector.

‘We hypothesize this significantly higher information leakage from encoders makes them more vulnerable to extraction attacks than SL models.’
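To give a rough sense of the gap the authors describe, the sketch below compares the raw payload of a single hard label with that of a 2048-dimensional representation. The 10-class classifier and the 32-bit float encoding are assumptions made for illustration, and raw size is only an upper bound on what a response can convey, not a measure of the information an attacker actually gains.

```python
import math

def label_bits(num_classes: int) -> float:
    """Upper bound on the bits conveyed by one hard label."""
    return math.log2(num_classes)

def vector_bits(dim: int, bits_per_value: int = 32) -> int:
    """Raw size in bits of a dense vector (an upper bound, not true information)."""
    return dim * bits_per_value

sl_bits = label_bits(10)       # hypothetical 10-class classifier returning one label
ssl_bits = vector_bits(2048)   # 2048-dim representation, float32 assumed
print(f"label: {sl_bits:.2f} bits, representation: {ssl_bits} bits")
# label: 3.32 bits, representation: 65536 bits
```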

Architecture and Data

The researchers tested three approaches to SSL model inference/extraction: Direct Extraction, in which the API output is compared to a recreated encoder’s output via an apposite loss function such as Mean Squared Error (MSE); recreating the projection head, where a crucial analytical functionality of the model, usually discarded before deployment, is reassembled and used in a replica model; and accessing the projection head, which is only possible in cases where the original developers have made the architecture available.

In method #1, Direct Extraction, the output of the victim model is compared to the output of a local model; method #2 involves recreating the projection head used in the original training architecture (and usually not included in a deployed model).

The researchers found that Direct Extraction was the most effective method for obtaining a functional replica of the target model, and that it has the added benefit of being the most difficult to characterize as an ‘attack’ (because it essentially behaves little differently than a typical and legitimate end user).
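The idea behind Direct Extraction can be sketched on a toy problem: query a black-box encoder, then train a local encoder so that its outputs match the returned representations under an MSE loss. Everything in the block below is an assumption made for illustration (the single-layer tanh encoder, the sizes, the `victim_api` function and the plain gradient-descent loop); the paper itself works with ResNet encoders and several loss functions.

```python
import numpy as np

rng = np.random.default_rng(0)
IN_DIM, REP_DIM, N_QUERIES = 32, 8, 500   # toy sizes, chosen for illustration

# Hypothetical victim: a fixed encoder hidden behind an "API".
W_victim = rng.normal(scale=IN_DIM ** -0.5, size=(IN_DIM, REP_DIM))

def victim_api(x):
    """Black-box call: inputs in, representations out. Weights are never exposed."""
    return np.tanh(x @ W_victim)

def mse(a, b):
    return float(np.mean((a - b) ** 2))

# Attacker: send queries, keep the returned representations as training targets.
queries = rng.normal(size=(N_QUERIES, IN_DIM))
targets = victim_api(queries)

# Local copy with the same (assumed) architecture but its own random weights.
W_stolen = rng.normal(scale=0.1, size=(IN_DIM, REP_DIM))
initial_loss = mse(np.tanh(queries @ W_stolen), targets)

lr = 1.0
for _ in range(2000):
    out = np.tanh(queries @ W_stolen)
    # Gradient of the mean squared error, back-propagated through tanh.
    grad_pre = 2.0 * (out - targets) / out.size * (1.0 - out ** 2)
    W_stolen -= lr * (queries.T @ grad_pre)

final_loss = mse(np.tanh(queries @ W_stolen), targets)

# Compare both encoders on fresh inputs the attacker never queried.
held_out = rng.normal(size=(200, IN_DIM))
held_out_loss = mse(np.tanh(held_out @ W_stolen), victim_api(held_out))
print(f"initial {initial_loss:.4f}, final {final_loss:.6f}, held-out {held_out_loss:.6f}")
```

Note that the attacker's traffic consists only of ordinary inference requests, which is consistent with the point that this method is hard to distinguish from legitimate use.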

The authors trained victim models on three image datasets: CIFAR10, ImageNet, and Stanford’s Street View House Numbers (SVHN). The ImageNet victim was trained on ResNet50, while the CIFAR10 and SVHN victims were trained on ResNet18 and ResNet24 over a freely available PyTorch implementation of SimCLR.

The models’ downstream (i.e. deployed) performance was tested against CIFAR100, STL10, SVHN, and Fashion-MNIST. The researchers also experimented with more ‘white box’ methods of model appropriation, though it transpired that Direct Extraction, the least privileged approach, yielded the best results.

To evaluate the representations being inferred and replicated in the attacks, the authors added a linear prediction layer to the model, which was fine-tuned on the full labeled training set from the subsequent (downstream) task, with the rest of the network layers frozen. In this way, the test accuracy on the prediction layer can function as a metric for performance. Since it contributes nothing to the inference process, this does not represent ‘white box’ functionality.
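This kind of linear evaluation can be illustrated with a small NumPy sketch: the encoder's weights stay fixed, and only a softmax layer on top of its outputs is trained on labeled downstream data. The toy encoder, the synthetic clustered dataset, and the `encode`, `make_split` and `linear_eval` helpers are all hypothetical stand-ins rather than the authors' setup.

```python
import numpy as np

rng = np.random.default_rng(0)
IN_DIM, REP_DIM, N_CLASSES = 16, 8, 3   # toy sizes, chosen for illustration

# Hypothetical frozen encoder standing in for a victim or stolen model.
W_frozen = rng.normal(scale=IN_DIM ** -0.5, size=(IN_DIM, REP_DIM))

def encode(x):
    return np.tanh(x @ W_frozen)

# Synthetic labeled downstream task: one Gaussian cluster per class.
centers = rng.normal(scale=2.0, size=(N_CLASSES, IN_DIM))

def make_split(n):
    y = rng.integers(0, N_CLASSES, size=n)
    x = centers[y] + rng.normal(scale=0.5, size=(n, IN_DIM))
    return x, y

def linear_eval(x_train, y_train, x_test, y_test, lr=0.5, steps=500):
    """Train only a softmax layer on frozen features; return test accuracy."""
    f_train, f_test = encode(x_train), encode(x_test)   # encoder is never updated
    A = np.zeros((REP_DIM, N_CLASSES))
    b = np.zeros(N_CLASSES)
    onehot = np.eye(N_CLASSES)[y_train]
    for _ in range(steps):
        logits = f_train @ A + b
        logits -= logits.max(axis=1, keepdims=True)     # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / len(y_train)          # cross-entropy gradient
        A -= lr * (f_train.T @ grad)
        b -= lr * grad.sum(axis=0)
    preds = np.argmax(f_test @ A + b, axis=1)
    return float(np.mean(preds == y_test))

x_tr, y_tr = make_split(600)
x_te, y_te = make_split(300)
accuracy = linear_eval(x_tr, y_tr, x_te, y_te)
print(f"linear-probe test accuracy: {accuracy:.3f}")
```

Running the same probe on a victim encoder and on its stolen copy would give two comparable accuracy figures, which is how a score such as ‘90.5% versus 91.9%’ can be read.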

Results on the test runs, made possible by the (non-contributing) Linear Evaluation layer. Accuracy scores in bold.

Commenting on the results, the researchers state:

‘We find that the direct objective of imitating the victim’s representations gives high performance on downstream tasks despite the attack requiring only a fraction (less than 15% in certain cases) of the number of queries needed to train the stolen encoder in the first place.’

And continue:

‘[It] is challenging to defend encoders trained with SSL since the output representations leak a substantial amount of information. The most promising defenses are reactive methods, such as watermarking, that can embed specific augmentations in high-capacity encoders.’

 

* My conversion of the paper’s inline citations to hyperlinks.

First published 18th May 2022.
