What accuracy means in deepfake detection
In binary classification — is this image a deepfake or not? — accuracy is the percentage of correct predictions across all inputs. An accuracy of 95.3% means the model correctly classifies 953 out of every 1,000 images analyzed. The 47 incorrect results are split between two error types: false positives (real images incorrectly flagged as deepfakes) and false negatives (deepfakes that pass detection as genuine).
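Written out with the standard confusion-matrix counts (TP, TN, FP, FN, treating "deepfake" as the positive class), the arithmetic in the example is:

```latex
\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN},
\qquad
0.953 = \frac{953}{1000} \;\Rightarrow\; FP + FN = 1000 - 953 = 47
```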
These two error types have very different consequences depending on your application. A false positive in a KYC flow means a legitimate customer's real selfie is flagged — causing friction and potentially blocking a valid user. A false negative means a fraudster's deepfake selfie passes your verification. The appropriate sensitivity of your detection system depends on which error is more costly for your specific use case.
Pro Tip
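Ask vendors for precision, recall, and false positive rate in addition to overall accuracy. Recall tells you what share of deepfakes are caught (one minus recall is the false negative rate). Precision tells you what share of flagged images really are deepfakes. The false positive rate tells you what share of real images get wrongly flagged. Viewing the two error types separately matters far more for production decisions than a single accuracy number.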
How detection benchmarks are built — and why this matters
The accuracy figure a deepfake detection tool claims is only as meaningful as the benchmark it was measured on. Several factors determine whether a benchmark reflects real-world performance.
Distribution overlap is the most important factor. A model trained on a specific dataset of GAN-generated faces will score very high on a test set from the same dataset — because it has seen the statistical properties of that generation method during training. The same model may score significantly lower on diffusion-model outputs from Midjourney or on in-the-wild deepfakes collected from social media, because those were not in its training distribution.
ScamAI's research ("How well are open-sourced AI-generated image detection models out-of-the-box?", arXiv:2602.07814) measured 12 leading open-source detection models on out-of-distribution data — deepfakes they were not trained on. Accuracy fell from claimed rates of 90–99% to 50–65% in out-of-distribution conditions. This gap between claimed and real-world accuracy is the central challenge of deepfake detection evaluation.
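One way to surface this gap in your own evaluation is to report accuracy separately for each source of test images instead of a single pooled number. The sketch below assumes you already have thresholded predictions and ground-truth labels. The `Result` type, `accuracy_by_source` function, and source names are hypothetical, and the toy data is made up for illustration.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Result(NamedTuple):
    source: str           # where the test image came from (hypothetical labels)
    is_fake: bool         # ground-truth label
    predicted_fake: bool  # detector decision after applying a threshold


def accuracy_by_source(results: Iterable[Result]) -> dict[str, float]:
    """Compute accuracy separately for each image source."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for r in results:
        total[r.source] += 1
        correct[r.source] += int(r.is_fake == r.predicted_fake)
    return {src: correct[src] / total[src] for src in total}


# Made-up toy results: the detector is perfect on data like its training set
# but misses a deepfake collected in the wild.
results = [
    Result("in-distribution", True, True),
    Result("in-distribution", False, False),
    Result("in-the-wild", True, False),
    Result("in-the-wild", False, False),
]
print(accuracy_by_source(results))  # {'in-distribution': 1.0, 'in-the-wild': 0.5}
```

A pooled accuracy over these four images would be 75%, which hides the fact that all of the errors come from the out-of-distribution source.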
Key Stat
ScamAI research found leading open-source detection models achieve only 50–65% accuracy on out-of-distribution deepfakes, despite claiming 90–99% in-distribution.
In-the-wild benchmarks — data collected from real social media, fraud cases, and uploaded user content — provide a much more honest picture of production performance than curated lab datasets. ScamAI's Real-World Faceswap Dataset (RWFS) and the GPT-Image-2 Twitter Dataset are specifically designed to test detectors on the actual distribution of deepfakes encountered in production.
Why accuracy degrades in production
Even a well-benchmarked detector faces additional accuracy challenges in production beyond distribution shift. Image processing by upload pipelines — JPEG compression, resizing, format conversion — removes some of the frequency-domain artifacts that detectors rely on. Social platforms including Instagram, Twitter, and WhatsApp apply aggressive compression that can reduce detection accuracy by 10–20 percentage points on compressed images.
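A simple way to estimate this effect on your own data is to score each test image twice: once as-is and once after passing it through the same kind of processing your upload pipeline applies. Below is a minimal, dependency-free sketch. `detect` and `degrade` are hypothetical callables you would supply: `detect` would wrap whatever detector you use and return a 0 to 1 deepfake score, and `degrade` might, for example, re-encode the image as JPEG with Pillow (`img.save(buf, format="JPEG", quality=70)`). The toy demo stands in for images with plain numbers representing artifact strength.

```python
from typing import Callable, Sequence, TypeVar

Image = TypeVar("Image")


def robustness_gap(
    images: Sequence[Image],
    labels: Sequence[bool],             # True = deepfake
    detect: Callable[[Image], float],   # hypothetical detector wrapper, returns a 0-1 score
    degrade: Callable[[Image], Image],  # e.g. JPEG re-encode + resize, like an upload pipeline
    threshold: float = 0.5,
) -> tuple[float, float]:
    """Return (accuracy on original images, accuracy on degraded copies)."""

    def acc(imgs: Sequence[Image]) -> float:
        hits = sum((detect(x) >= threshold) == y for x, y in zip(imgs, labels))
        return hits / len(labels)

    return acc(images), acc([degrade(x) for x in images])


# Toy stand-in: each "image" is just its artifact strength, the detector reads it
# directly, and "compression" weakens every artifact by 40%.
print(robustness_gap(
    images=[0.9, 0.7, 0.2, 0.1],
    labels=[True, True, False, False],
    detect=lambda x: x,
    degrade=lambda x: x * 0.6,
))  # (1.0, 0.75)
```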
Adversarial examples are another challenge. Motivated attackers who know the detection system can apply perturbations that preserve visual quality while specifically disrupting the artifacts the detector looks for. This is less of an issue for typical fraud use cases (most fraudsters use off-the-shelf tools, not custom adversarial pipelines) but becomes relevant for high-value targeted attacks.
Generation tool evolution creates a constant arms race. A detection model trained before GPT-Image-2 launched in May 2025 will have reduced accuracy on GPT-Image-2 outputs until retrained. This is why model update cadence matters — not just initial accuracy figures.
Precision, recall, and the ROC curve
Precision and recall provide a more complete picture than accuracy alone. Precision answers: of all images flagged as deepfakes, what percentage actually were? High precision means fewer false alarms. Recall answers: of all actual deepfakes, what percentage were correctly flagged? High recall means fewer missed deepfakes.
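The sketch below computes these metrics from confusion-matrix counts. The counts are made up: a 1,000-image sample with 100 deepfakes and 900 real images, chosen to reproduce the 95.3% accuracy from the earlier example. With these numbers, recall is only 80% (one deepfake in five gets through) and precision is about 75%. The `Confusion` class is illustrative, not part of any vendor API.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # deepfakes correctly flagged
    fp: int  # real images wrongly flagged (false positives)
    tn: int  # real images correctly passed
    fn: int  # deepfakes that passed as genuine (false negatives)

    def precision(self) -> float:
        # Of everything flagged, the share that really was a deepfake.
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        # Of all deepfakes, the share that was flagged (the true positive rate).
        return self.tp / (self.tp + self.fn)

    def false_positive_rate(self) -> float:
        # Of all real images, the share that was wrongly flagged.
        return self.fp / (self.fp + self.tn)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)


c = Confusion(tp=80, fp=27, tn=873, fn=20)
print(f"accuracy={c.accuracy():.3f} precision={c.precision():.3f} "
      f"recall={c.recall():.3f} fpr={c.false_positive_rate():.3f}")
# accuracy=0.953 precision=0.748 recall=0.800 fpr=0.030
```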
The ROC curve (Receiver Operating Characteristic) plots the true positive rate (recall) against the false positive rate across all possible detection thresholds. The area under the ROC curve (AUC-ROC) is a threshold-independent measure of detection quality: an AUC of 1.0 represents perfect separation of deepfakes from real images, while 0.5 is no better than random chance. Vendors of production deepfake detectors should publish AUC scores alongside their accuracy figures.
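AUC-ROC has an equivalent reading that is easy to compute by hand: it is the probability that a randomly chosen deepfake receives a higher score than a randomly chosen real image, with ties counted as half. The pairwise sketch below illustrates that definition. It is O(P × N) and intended for small evaluation sets. For larger sets, a library routine such as scikit-learn's `sklearn.metrics.roc_auc_score` is the usual choice.

```python
from typing import Sequence


def auc_roc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """AUC-ROC as P(score of a random deepfake > score of a random real image).

    labels: True = deepfake. Ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one deepfake and one real image")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# One deepfake (score 0.4) ranks below one real image (score 0.6):
# 3 of the 4 deepfake/real pairs are ordered correctly.
print(auc_roc([0.9, 0.4, 0.6, 0.1], [True, True, False, False]))  # 0.75
```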
Pro Tip
When evaluating a detection provider, ask for AUC-ROC scores on an in-the-wild benchmark, not just accuracy on a controlled dataset.
Choosing the right threshold for your use case
ScamAI's API returns a confidence score between 0 and 1 for every analyzed image. The detection threshold — the score above which you classify an image as a deepfake — is configurable for your use case. There is no universally correct threshold; it depends on the relative cost of each error type.
- KYC onboarding — higher threshold (e.g. 0.85) to minimize false positives and friction for legitimate users; images scoring above the threshold go to manual review
- High-stakes identity verification — lower threshold (e.g. 0.60) to catch more deepfakes, at the cost of a higher manual review volume
- Content moderation at scale — threshold tuned to your moderation team's capacity; auto-remove high-confidence (>0.95), queue medium-confidence for review
- Insurance claims — a lower threshold is appropriate given the high cost of fraud; supplement it with manual review of flagged claims
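Policies like these reduce to a small routing function applied to the 0 to 1 confidence score. The sketch below is hypothetical: it does not show how the score is read from ScamAI's API response, and the function, enum, and policy names are illustrative. The 0.85, 0.60, and 0.95 values come from the examples above. The 0.70 review threshold for content moderation is a placeholder you would tune to your team's capacity. Setting `auto_threshold` to 1.0 disables automatic removal, since no score can exceed it.

```python
from enum import Enum


class Action(Enum):
    AUTO_REMOVE = "auto_remove"
    MANUAL_REVIEW = "manual_review"
    ALLOW = "allow"


def route(score: float, review_threshold: float, auto_threshold: float = 1.0) -> Action:
    """Map a 0-1 deepfake confidence score to an action (hypothetical policy)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score > auto_threshold:
        return Action.AUTO_REMOVE
    if score >= review_threshold:
        return Action.MANUAL_REVIEW
    return Action.ALLOW


# (review_threshold, auto_threshold) per use case; 0.70 is a placeholder.
POLICIES = {
    "kyc_onboarding": (0.85, 1.0),       # review only, never auto-reject
    "high_stakes_idv": (0.60, 1.0),      # catch more, accept more review volume
    "content_moderation": (0.70, 0.95),  # auto-remove above 0.95, queue the middle band
}

review, auto = POLICIES["content_moderation"]
print(route(0.97, review, auto), route(0.80, review, auto), route(0.40, review, auto))
```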
Use ScamAI's free tier to analyze a sample of your real data and calibrate confidence thresholds before deploying at scale. The 200 free detections per month are specifically suited for this threshold calibration phase.
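Once a sample has been scored and its true labels have been confirmed (for example, through manual review), one way to calibrate is to pick the threshold that minimizes total misclassification cost on that sample. The sketch below assumes you can put relative costs on a false positive and a false negative. Those weights are business assumptions, and `pick_threshold` is a hypothetical helper. It searches a 0.01 grid and flags an image when its score is at or above the threshold. On a sample of a couple of hundred images, treat the result as a rough starting point.

```python
from typing import Sequence


def pick_threshold(
    scores: Sequence[float],  # detector confidence scores, 0-1
    labels: Sequence[bool],   # True = confirmed deepfake
    cost_fp: float,           # relative cost of flagging a real image
    cost_fn: float,           # relative cost of passing a deepfake
) -> float:
    """Return the lowest grid threshold with minimal total error cost on the sample."""
    best_t, best_cost = 0.0, float("inf")
    for i in range(101):
        t = i / 100
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y)
        cost = cost_fp * fp + cost_fn * fn
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t


# Made-up sample of six scored images.
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
labels = [True, True, False, True, False, False]
print(pick_threshold(scores, labels, cost_fp=10, cost_fn=1))  # 0.66: false alarms are expensive
print(pick_threshold(scores, labels, cost_fp=1, cost_fn=10))  # 0.31: missed fraud is expensive
```

Making missed fraud more expensive pushes the chosen threshold down, which matches the KYC versus insurance trade-off described in the list above.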