Clinical Validation across 83 Radiographic Condition Classifiers
Veterinary AI Classifier Performance Metrics
83 Condition Classifiers Built on 300,000+ Expert-Reviewed Veterinary Imaging Cases
Vetology’s AI classifiers have been validated using over 300,000 cases from real-world veterinary practices. Below you’ll find transparent performance data, including sensitivity, specificity, Radiologist Agreement Rates, and case counts for every classifier. This deep, real-world dataset helps our AI support veterinarians with reliable screening results at the point of care.
Unlike other veterinary AI imaging platforms, we publish complete performance metrics to help you make informed decisions about diagnostic accuracy and clinical implementation.
While these results offer meaningful insight into expected performance, real-world factors such as image quality, positioning, and patient variability can influence accuracy in clinical settings.
What Makes Our Data Different
Understanding Our AI Virtual Radiologist Report Screening Metrics
Why Radiologist Benchmarking Matters
Instead of comparing AI to a theoretical “perfect” standard, Vetology benchmarks performance against the Radiologist Agreement Rate: how often multiple board-certified radiologists reach the same interpretation. This provides realistic context for conditions where even experts may disagree.
When sensitivity or specificity approaches or exceeds this rate, it indicates performance comparable to specialist-level interpretation for that finding.
These measures help you understand when AI can reinforce diagnostic confidence and when a case may benefit from further review.
Making Sense of the Metrics
Sensitivity
How reliably the AI detects a condition when it is present.
(True Positive Rate)
The proportion of actual positive cases (condition present) that the AI correctly identifies. For example, 89% sensitivity means the AI correctly detects the condition in 89 out of 100 cases where it is actually present. Higher sensitivity reduces false negatives—cases where a condition exists but goes undetected.
Specificity
How accurately the AI rules out a condition when it is absent.
(True Negative Rate)
The proportion of actual negative cases (condition absent) that the AI correctly identifies. For example, 92% specificity means the AI correctly rules out the condition in 92 out of 100 cases where it is actually absent. Higher specificity reduces false positives—cases where the AI flags a condition that is not present.
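To make the arithmetic concrete, here is a minimal sketch in Python using hypothetical confusion-matrix counts (not Vetology’s published figures) showing how both rates are computed:

```python
# Hypothetical confusion-matrix counts for a single classifier.
tp, fn = 89, 11   # condition present: correctly detected vs. missed
tn, fp = 92, 8    # condition absent: correctly ruled out vs. falsely flagged

sensitivity = tp / (tp + fn)  # true positive rate = 89 / 100 = 0.89
specificity = tn / (tn + fp)  # true negative rate = 92 / 100 = 0.92

print(f"Sensitivity: {sensitivity:.0%}, Specificity: {specificity:.0%}")
```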
Radiologist Agreement Rate
The percentage of cases where multiple board-certified veterinary radiologists independently arrive at the same diagnosis. This serves as a benchmark for inherent diagnostic difficulty.
Conditions with lower Radiologist Agreement Rates are more subjective or challenging, even for specialists. When AI performance approaches or exceeds the Radiologist Agreement Rate, it demonstrates specialist-comparable accuracy.
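As an illustration, one simple way to compute an agreement rate is the fraction of cases where every reader makes the same call. This sketch uses hypothetical reads; Vetology’s exact aggregation method is not specified here:

```python
# Hypothetical calls (1 = positive, 0 = negative) from three radiologists
# reading the same five cases.
reads = [
    (1, 1, 1),
    (0, 0, 0),
    (1, 0, 1),  # readers disagree
    (0, 0, 0),
    (1, 1, 0),  # readers disagree
]

# Agreement rate: fraction of cases where all readers give the same call.
agreement_rate = sum(len(set(case)) == 1 for case in reads) / len(reads)
print(f"Radiologist Agreement Rate: {agreement_rate:.0%}")  # 60%
```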
Total Test Cases
The total number of expert-reviewed cases used to evaluate each classifier. Larger test sets produce more stable performance estimates and tighter confidence intervals.
95% Confidence Intervals
(for Sensitivity and Specificity)
The range within which the true sensitivity or specificity is expected to fall, given the size of the validation sample; larger case counts produce tighter intervals.
Clinical meaning: If a classifier shows 89.51% sensitivity with a tight confidence interval, you can trust that number. If the interval is wide, you may want to weight your own clinical findings more heavily.
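To see why sample size drives interval width, here is a minimal sketch using the Wilson score interval (one common method; the exact CI method behind the published numbers is not specified here):

```python
from math import sqrt

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# The same ~89.5% sensitivity is far more trustworthy with more cases:
print(wilson_ci(179, 200))  # ~(0.845, 0.930) -> tight, reliable
print(wilson_ci(17, 19))    # ~(0.686, 0.971) -> wide, weigh clinical findings
```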
Area Under Curve (AUC)
AUC measures how well the classifier distinguishes between positive and negative cases across all possible decision thresholds. It is reported on a scale from 0 to 1, where 1 represents a classifier that never makes a mistake and 0.5 represents random chance.
Clinical meaning: AUC gives you a single number that captures overall discriminative ability. A classifier with an AUC of 0.94 (like Heart Failure Left in canine thorax) is performing at a high level – it reliably separates patients who have the condition from those who do not.
AUC values between 0.80 and 0.90 are generally considered “excellent” discrimination, and values above 0.90 are considered “outstanding.”[1]
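For intuition, AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal sketch with hypothetical model scores:

```python
# Hypothetical classifier scores and ground-truth labels (1 = condition present).
scores = [0.95, 0.80, 0.75, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1,    1,    0,    1,    0,    0,    1,    0]

pos = [s for s, y in zip(scores, labels) if y == 1]
neg = [s for s, y in zip(scores, labels) if y == 0]

# Mann-Whitney formulation: fraction of positive/negative pairs ranked
# correctly, counting ties as half-correct.
pairs = [(p, n) for p in pos for n in neg]
auc = sum((p > n) + 0.5 * (p == n) for p, n in pairs) / len(pairs)
print(f"AUC: {auc:.2f}")  # 0.75
```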
Positive Predictive Value (PPV)
When the classifier flags a finding as positive, PPV tells you the probability that the patient actually has the condition. This metric is directly affected by how common the condition is in the test population.
Clinical meaning: A high PPV means that when the AI flags something, it is very likely real. A lower PPV for a rare condition is expected; it does not mean the classifier is unreliable, but rather that the condition is uncommon and you should correlate with clinical signs.
Negative Predictive Value (NPV)
When the classifier reports no finding, NPV tells you the probability that the patient truly does not have the condition. For screening purposes, this is one of the most important metrics available.
Clinical meaning: A high NPV gives you confidence that a negative AI result genuinely means “nothing concerning here.” This is especially valuable for conditions where missing a diagnosis carries serious consequences.
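Both predictive values follow directly from sensitivity, specificity, and prevalence via Bayes’ theorem. This sketch (hypothetical numbers) shows how the same classifier yields a very different PPV for common versus rare findings while NPV stays high:

```python
def ppv_npv(sens: float, spec: float, prev: float) -> tuple[float, float]:
    """Predictive values from sensitivity, specificity, and prevalence (Bayes)."""
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    return ppv, npv

# The same classifier (89% sensitivity, 92% specificity) at two prevalences:
print(ppv_npv(0.89, 0.92, 0.30))  # common finding: PPV ~0.83, NPV ~0.95
print(ppv_npv(0.89, 0.92, 0.02))  # rare finding:   PPV ~0.19, NPV ~0.998
```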
Overall Accuracy
The percentage of all cases (both positive and negative) that the classifier categorized correctly. While useful as a general performance summary, accuracy alone can be misleading when a condition is rare or common.
Clinical meaning: Use accuracy as a starting point, then check the class-specific metrics: sensitivity to see how well it catches true positives, and specificity to see how well it avoids false alarms. Then check prevalence in the test set: if a condition appears in only 2% of cases, a classifier that always says “negative” would still show 98% accuracy while missing every positive case.
When accuracy is high but prevalence is low, sensitivity and NPV are the metrics that tell you whether the classifier is actually doing its job.
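A worked version of the 2% example above (hypothetical test set) shows why accuracy alone can flatter a useless classifier:

```python
# Hypothetical 1,000-case test set with 2% prevalence.
n = 1000
positives = 20             # cases with the condition
negatives = n - positives  # 980 cases without it

# A classifier that always reports "negative":
accuracy = negatives / n       # 0.98 -> looks impressive
sensitivity = 0 / positives    # 0.00 -> misses every positive case

print(f"Accuracy: {accuracy:.0%}, Sensitivity: {sensitivity:.0%}")
```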
Prevalence in Test Set
The proportion of positive cases in the validation dataset. This is the denominator that drives PPV and NPV calculations.
Clinical meaning: Knowing the test set prevalence helps you calibrate expectations. If your practice population has a higher prevalence of a condition than the test set (e.g., you are a cardiology referral center), the real-world PPV for your caseload will be higher than the reported number.
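Using the same Bayes formula as the hypothetical ppv_npv() sketch above, you can recalibrate a published PPV to your own caseload’s prevalence (the sensitivity and specificity defaults here are illustrative):

```python
def ppv_at(prev: float, sens: float = 0.89, spec: float = 0.92) -> float:
    """PPV for a given local prevalence (same Bayes formula as above)."""
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

print(ppv_at(0.05))  # 5% test-set prevalence:      PPV ~0.37
print(ppv_at(0.40))  # 40% referral-caseload rate:  PPV ~0.88
```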
Clinical Urgency Rating
Each classifier is tagged with a clinical urgency level: Critical, High, Moderate, or Low. These ratings are defined through consensus by Vetology’s in-house board-certified veterinary radiologists based on the clinical consequence of a delayed or missed detection – not the AI’s confidence level.
Clinical meaning: Urgency ratings help you prioritize your review queue. When multiple AI findings appear on a single study, urgency tells you where to look first.
Ground Truth Case Counts
(Positive + Negative)
The exact number of board-certified veterinary radiologist-reviewed cases used to validate each classifier, broken into positive and negative counts.
Clinical meaning: More cases mean more reliable statistics. When you see a classifier validated on 10,951 cases, you know those performance numbers are robust. This also helps you evaluate whether the validation dataset was balanced enough to produce meaningful results.
Model Release and Retrained Dates
The date each classifier model was initially released and when it was most recently retrained on updated data.
Clinical meaning: Knowing the retrained date tells you how current the model is. Classifiers that have been recently retrained incorporate the latest validation data and performance improvements.
Vetology’s Published Condition Classifier Metrics
Our industry-leading transparency allows you to evaluate our technology objectively.
- Real-World Validation: Tested on 300,000+ multi-image patient cases from veterinary practices
- Radiologist Benchmarking: Performance compared directly against board-certified veterinary radiologists
- Comprehensive Coverage: Canine and feline imaging across thorax, abdomen, spine, and musculoskeletal studies
Vetology AI Classifier Performance Metrics
Clinical Validation Data by Anatomic Region
Validated on 300,000+ test cases by board-certified veterinary radiologists
Classifier categories:
- Cardiac Conditions
- Vascular
- Pulmonary Parenchymal
- Pleural/Mediastinal
- Airways
- Other Thoracic Findings
- Hepatic
- Splenic
- Renal/Urinary
- Gastrointestinal
- Other Abdominal Findings
- Spine
- Pelvis/Joints
[1] Hosmer DW, Lemeshow S. Applied Logistic Regression. 2nd ed. New York: Wiley; 2000 (AUC interpretation: 0.70–0.80 = acceptable, 0.80–0.90 = excellent, ≥0.90 = outstanding). See also: Çorbacıoğlu ŞK, Aksel G. Receiver operating characteristic curve analysis in diagnostic accuracy studies: A guide to interpreting the area under the curve value. Turk J Emerg Med. 2023;23(4):195–198. PMCID: PMC10664195.
Frequently Asked Questions
How accurate are Vetology's AI classifiers?
Accuracy varies by condition, which is why we publish sensitivity, specificity, Radiologist Agreement Rates, and case counts for every classifier in the tables above. All metrics were validated on 300,000+ expert-reviewed cases, so you can evaluate each finding on its own numbers rather than relying on a single headline figure.
What conditions can Vetology's AI detect?
We detect 83 conditions, including cardiomegaly, pleural fluid, hepatomegaly and microhepatia, splenomegaly, abnormal kidney size, thoracic masses, abdominal masses, bladder and kidney stones, and more across canine and feline patients.
Our classifiers cover thorax (20 canine, 15 feline), abdomen (25 canine, 14 feline), and spine/musculoskeletal (9 conditions) imaging.
How does Vetology compare to other veterinary imaging AI platforms?
Unlike other veterinary AI imaging platforms, Vetology publishes complete performance metrics for every classifier, including sensitivity, specificity, confidence intervals, and Radiologist Agreement Rates, so you can evaluate the technology objectively before adopting it.
What is Radiologist Agreement Rate (RAR)?
Radiologist Agreement Rate measures how often multiple board-certified veterinary radiologists independently reach the same interpretation on the same cases. This benchmark reflects the inherent difficulty of each finding, helps practices understand how AI performance compares to specialist interpretation, and provides context for clinical decision-making.
How was this data validated?
All performance metrics are based on a rigorous testing process using over 300,000 real-world veterinary cases.
Each classifier was evaluated independently, with sensitivity, specificity, and case counts calculated from actual clinical imaging studies.
Radiologist Agreement Rates compare AI predictions against board-certified veterinary radiologist interpretations.
Ready to Experience AI-Assisted Radiology?
See how Vetology’s classifiers can improve diagnostic confidence and workflow efficiency in your practice.
