Interpreting Classifier Results: A First Look at Data Science Metrics

What sensitivity, specificity, radiologist agreement rate, and test cases actually tell you about AI diagnostic performance

Written by – Benjamin Cote, Data Scientist | Vetology

As part of Vetology’s push to be transparent about our AI products, we recently published all of our condition classifiers on our website (find them here if you haven’t taken a look yet: AI Classifier Performance). Since we want you to be able to see how each of our models performs and draw your own conclusions, this article is designed to provide you with some extra knowledge and context to interpret our metrics.

On the AI Condition Classifier Performance Metrics page, we include several key metrics for each of our conditions. For the purposes of this article, we will focus on: Sensitivity, Specificity, Radiologist Agreement Rate, and Number of Test Cases. Each measure is a piece of the classifier puzzle, and by understanding the ways they interact, you can see the bigger picture come together. We’ll cover additional metrics in future articles.

Sensitivity and Specificity

Front and center on each published classifier, you can see the Sensitivity and Specificity scores achieved by that model. These are both common metrics used in data science to measure model performance, and at Vetology they are the primary way we determine if a model is strong enough to be released.

They can be thought of as a pair, each capturing the same kind of information but on a different class of data. You can think of them as a see-saw: a model that predicts every case as positive would have 100% Sensitivity (but 0% Specificity), a model that predicts every case as negative would have 100% Specificity (but 0% Sensitivity), and neither would be useful. We want both Sensitivity and Specificity to be as high as possible, so the challenge is improving each metric without harming the other.

What is Sensitivity?

Sensitivity (True Positive Rate)

Sensitivity is a measure of how often our model correctly recognizes that a disease is present in the patient. It answers the question: “When a case is actually Positive, how often does the model correctly flag it?”

When Sensitivity is high, the model correctly recognizes what a given disease looks like. It’s as if the model is telling us: “I know what heart failure looks like, and that is heart failure.”

One way to improve Sensitivity is by training the model on more examples that are positive for the condition, so it understands the variation a disease shows across many different breeds and sizes.

What is Specificity?

Specificity (True Negative Rate)

Specificity is a measure of how often our model correctly determines that a disease is absent. It answers the question: “When a case is actually Negative, how often does the model correctly rule it out?”

When Specificity is high, the model can correctly distinguish between a given disease and all other diseases, as if the model is telling us: “I don’t know what that is, but I know that is not an example of heart failure.”

One way to improve Specificity is by training the model on more images that are negative for the condition so it understands what kinds of information are unrelated to this disease. For instance, if we are trying to identify pulmonary nodules, the size and shape of the heart are unlikely to help us make our diagnosis. Instead, we want enough data that our classifier can isolate findings related to pulmonary nodules and ignore irrelevant visual information. That way when key findings aren’t present, the classifier will confidently predict that a disease isn’t present.

What Can You Learn from These Metrics?

When viewed together, these metrics give you an estimate of how well a model performs when predicting on Positives (Sensitivity) and Negatives (Specificity). However, when forced to choose between prioritizing Sensitivity and prioritizing Specificity, we tend to favor Specificity. This is because our models are trained on many more negative images than positive ones.

We train on mismatched proportions because even the most common diseases only occur in a small percentage of cases; this imbalance ensures that we don’t over-predict the presence of diseases. A consequence of this is that Sensitivity and Specificity percentages are calculated on differently-sized classes, and a 1% increase in Specificity usually means a greater increase in total model accuracy than a 1% increase in Sensitivity.

The math behind these metrics is not especially complicated, but there are some nuances that require more context. If you want to learn more about how we calculate Sensitivity and Specificity, look at the In-Depth Calculation section at the end of this article.

Radiologist Agreement Rate

Radiologist Agreement Rate

The percentage of cases where two US Board Certified Veterinary Radiologists produce the same label (Positive or Negative) on an image. This serves as a real-world benchmark for evaluating AI performance.

We calculate the Radiologist Agreement Rate by comparing the labels that expert radiologists provide on a blind set of shared images. Among this set of Positive and Negative images, we count the number of cases the radiologists agreed on out of the total number of cases they reviewed. We count an agreement whenever the radiologists make the same call on an image, regardless of whether that shared call is Positive or Negative.
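
In code terms, the calculation is simply shared decisions divided by total decisions. Here is a minimal sketch (a hypothetical Python function, not our production tooling):

```python
def agreement_rate(labels_a, labels_b):
    """Share of cases where two radiologists assign the same label.

    labels_a and labels_b are parallel lists of "Positive"/"Negative"
    labels, one entry per case, from two radiologists reading the
    same blind set of images.
    """
    assert len(labels_a) == len(labels_b), "each case needs both reads"
    agreements = sum(a == b for a, b in zip(labels_a, labels_b))
    return agreements / len(labels_a)

# Example: the radiologists agree on 4 of 5 shared cases -> 80%
print(agreement_rate(["Pos", "Neg", "Neg", "Pos", "Pos"],
                     ["Pos", "Neg", "Pos", "Pos", "Pos"]))  # 0.8
```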

Interpreting radiographs is as much an art as it is a science! Some conditions are easily diagnosed from radiographic findings, others are not. Some conditions look similar to each other, others are completely unique! All that to say, it’s understandable why two expert radiologists may disagree when diagnosing the same patient. It also stands to reason that if a condition is hard for an expert radiologist to interpret from radiographs, our classifier may also have trouble consistently identifying it.

What Can You Learn from Radiologist Agreement Rate?

Low Agreement Rate

If the radiologist agreement rate is low, this means a condition is hard for radiologists to reliably diagnose. This is a place where our models often shine.

  • With extremely rare conditions, a clinician or radiologist may encounter one only a handful of times over the course of their career.
    • In contrast, our models are trained on hundreds or thousands of examples, so our sensitivity and specificity metrics can often surpass radiologist agreement rates.
    • Through the aggregation of clinical examples globally, these models can help you feel confident in recognizing rare findings.
  • Other times, agreement rate is low because a disease is hard for radiologists to determine visually.
    • Our models may struggle with these conditions too. Sometimes they can pick up on patterns too minuscule for the human eye to see, but other times it’s just as hard for the neural network to come to a conclusion.
    • When this is the case, our models may have low sensitivity and specificity scores that mirror low radiologist agreement rate.

High Agreement Rate

If the radiologist agreement rate is high, it means that this condition is easier for radiologists to reliably diagnose.

  • This could be because the disease presents consistently on radiographs, because it is easy to identify, or because a particular finding unambiguously indicates that disease.
    • When the agreement rate is high, model performance also tends to be high because the neural network is picking up on the same visual patterns as the radiologists.
  • However, you’ll notice that some model performance metrics don’t match their high radiologist agreement rate.
    • This is something we take seriously—we want every model to perform just as well as, if not better than, the agreement rate, so you can be confident in our predictions.

TRANSPARENCY NOTE

When you see conditions published with scores that are below the agreement rate, you can be confident that we are working to retrain a higher-performing model. Sometimes we will release a model below the agreement rate because clinics have specifically requested it, and we feel confident that it performs strongly even if not as strongly as we would like. Other times, we are limited by low Positive case counts and have trained the highest-performing model we can at the time of publication. The decision usually comes down to whether or not it’s a high-priority condition.

Total Cases Evaluated

Total Test Cases

The number of unique patient cases used to evaluate a classifier and generate Sensitivity and Specificity metrics. This includes both Positive cases (disease present) and Negative cases (disease absent, which may include other conditions).

Total Test Cases shows how many unique cases we used to test that particular classifier and to generate our Sensitivity and Specificity metrics.

For example, the Canine Thorax condition Heart Failure Left has 10,951 total test cases, which means our performance metrics come from generating model predictions on 10,951 unique sets of radiographs, all from different dogs.

This number includes both the Positive cases where a disease is present, and the Negative cases. However, just because a case is labeled as Negative, that doesn’t always mean the animal is healthy – in fact, we make sure our set of Negative examples includes cases with a variety of other findings or diseases within the body region, of which a “healthy” study is just one.

What Can You Learn from Test Case Counts?

As the number of test cases grows, so does the variation in examples our model is tested against. Each case introduces a unique combination of animal size, age, scan quality, and number of diseases present or absent. When a condition is tested on large quantities of data and still shows high Sensitivity and Specificity, you can be confident the model is robust across animals of different sizes and ages, and well prepared for the curveball cases you throw at it.

An In-Depth Look: Sensitivity and Specificity Calculation

In data science, we often categorize our data by multiple labels at the same time. This can easily lead to confusion, which is why we describe outcomes using terminology like True Positive, True Negative, False Positive, and False Negative.

The table below shows the difference between each label. In short:

  • A case is True if the predicted label matches the actual label, and False if the predicted label does not match the actual label.
    • For example, if a model predicts that cardiomegaly is present in an image but a radiologist has determined that cardiomegaly is not present, we would call that classification a False Positive because the classifier falsely predicted cardiomegaly to be positive.

                                       Condition Is Present    Condition Is Absent
Model Predicts Condition as Present    True Positive (TP)      False Positive (FP)
Model Predicts Condition as Absent     False Negative (FN)     True Negative (TN)

Sensitivity

True Positives ÷ (True Positives + False Negatives)

Total correctly identified positives out of all cases where the condition is actually present

Specificity

True Negatives ÷ (True Negatives + False Positives)

Total correctly identified negatives out of all cases where the condition is actually absent
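
To make the two formulas concrete, here is a minimal Python sketch (illustrative only) showing how both metrics come out of the four confusion-matrix counts:

```python
def sensitivity(tp: int, fn: int) -> float:
    """True Positive Rate: correct positives out of all actual positives."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True Negative Rate: correct negatives out of all actual negatives."""
    return tn / (tn + fp)


# Counts matching the baseline scenario in the next section:
print(sensitivity(tp=425, fn=75))    # 0.85
print(specificity(tn=4250, fp=750))  # 0.85
```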

Why Specificity Improvements Have a Bigger Impact

Earlier in this article, I explained that we try to prioritize model Specificity over Sensitivity if we can no longer actively improve both metrics. Let’s explore why that is by walking through a short example.

Imagine we have a dataset with 500 Positives and 5,000 Negatives, and a model with 85% Sensitivity and 85% Specificity:

Baseline: 85% Sensitivity, 85% Specificity

                        Positive Cases     Negative Cases     Total Cases
Total                   500                5,000              5,500
Predicted Correctly     425                4,250              4,675
Predicted Incorrectly   75                 750                825
Metric Score            85% Sensitivity    85% Specificity    85% Accuracy

Based on the number of Positive cases, the model correctly predicted the disease in 425 cases and misclassified only 75 – pretty good! But 85% Specificity on 5,000 Negative cases means that 4,250 cases were correctly predicted as Negative, and 750 were misclassified. While the scores are the same, they represent very different numbers of misclassified images.

Let’s look at what happens to model accuracy if we improve either Sensitivity or Specificity by 10 percentage points without changing the other metric’s score:

Scenario A: Improve Sensitivity by 10 points

                        Positive Cases              Negative Cases     Total Cases
Total                   500                         5,000              5,500
Predicted Correctly     475 (+50)                   4,250              4,725 (+50)
Predicted Incorrectly   25 (-50)                    750                775 (-50)
Metric Score            95% Sensitivity (+10 pts)   85% Specificity    85.9% Accuracy (+0.9 pts)

Scenario B: Improve Specificity by 10 points

                        Positive Cases     Negative Cases              Total Cases
Total                   500                5,000                       5,500
Predicted Correctly     425                4,750 (+500)                5,175 (+500)
Predicted Incorrectly   75                 250 (-500)                  325 (-500)
Metric Score            85% Sensitivity    95% Specificity (+10 pts)   94.1% Accuracy (+9.1 pts)

KEY TAKEAWAY

A 10-point increase in Sensitivity improves overall accuracy by only 0.9 points, while a 10-point increase in Specificity improves overall accuracy by 9.1 points. When there are so many more Negative cases than Positive cases, an equal increase in the two percentages does not equate to an equal increase in model accuracy.
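
You can verify the arithmetic yourself; this short sketch recomputes overall accuracy from the same assumed class sizes (500 Positives, 5,000 Negatives):

```python
POSITIVES, NEGATIVES = 500, 5_000


def overall_accuracy(sens: float, spec: float) -> float:
    # Correct predictions on each class, weighted by class size
    correct = sens * POSITIVES + spec * NEGATIVES
    return correct / (POSITIVES + NEGATIVES)


print(overall_accuracy(0.85, 0.85))  # baseline:   0.850
print(overall_accuracy(0.95, 0.85))  # Scenario A: ~0.859 (+0.9 pts)
print(overall_accuracy(0.85, 0.95))  # Scenario B: ~0.941 (+9.1 pts)
```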

Conclusion

Assessing the performance of a disease classifier can be tricky. Sometimes it’s unclear what a metric represents, or how to compare across models. It can also be difficult to interpret when a classifier is performing well because you have to consider not only its Sensitivity and Specificity scores, but also the Radiologist Agreement Rate.

If Sensitivity and Specificity are both around 70% and the Radiologist Agreement Rate is 63%, then it’s a strong model that can pick up on details that even expert radiologists may not see. However, if a model with those same scores had a Radiologist Agreement Rate of 85%, then the model would be significantly underperforming. Everything is relative, and at Vetology we have to consider how all our metrics interact before we publish new condition classifiers.

Now that you have an idea of what these metrics mean, take a look at our classifier results. Transparency means you can be part of this process. Notice the great work we’ve done, but also notice the areas we need to work on. With our monthly bundle releases, we are constantly increasing performance of existing models and adding coverage through new disease classifiers. So please, check back in soon and see where we’ve made our latest improvements.

Want to see AI in action?

To tour the platform and learn more, contact our team, or book a demo for a firsthand look at our AI and teleradiology platform.

Vetology AI Releases Classifier Performance Metrics

Vetology AI Becomes First and Only Veterinary Imaging AI Company to Publicly Release Comprehensive Classifier Performance Metrics

Industry-Leading Transparency Directly Addresses ACVR + ECVDI Concerns, Invites Independent Studies

JANUARY 12, 2026 — SAN DIEGO, CA — Vetology Innovations today announced the public release of complete performance metrics for all 89+ classifiers across its diagnostic platform, making it the first and only AI company in the veterinary imaging space to provide this level of transparency.

The move is an acknowledgement of the recommendations in the American College of Veterinary Radiology (ACVR) and European College of Veterinary Diagnostic Imaging (ECVDI) position statement that identified “a key challenge” in veterinary AI: “the lack of transparency and validation for AI tools currently available for veterinary diagnostic imaging.”

The joint ACVR – ECVDI statement concluded: “There is currently no commercially available product for diagnostic imaging that meets these standards” [for transparency, validation, and safety].

“We’re changing that,” said Eric Goldman, President of Vetology. “Complete transparency isn’t a competitive advantage we’re protecting, it’s a professional obligation we’re fulfilling.”

What's Now Public

Available on Vetology’s website, the data includes condition-level sensitivity, specificity, and sample sizes across 300,000 test cases covering Vetology’s canine thorax, canine abdomen, feline thorax, feline abdomen, and spine/musculoskeletal condition classifiers.

The data includes both high performers, like the heart failure classifier with 89.5% sensitivity across 10,951 cases, and more challenging applications where AI-generated screening results serve as a decision support tool within a veterinarian-led diagnostic process, requiring professional expertise and domain knowledge to interpret and validate findings.

Why This Matters

For Researchers: Vetology welcomes collaboration with the research community as part of a shared commitment to evidence-based AI in veterinary medicine. We have partnered with institutions such as AMC New York and Tufts University (among others) on peer-reviewed studies.

Building on this foundation, Vetology invites researchers to engage with us on independent validation efforts, access additional performance data, or propose collaborative studies that advance transparency, rigor, and clinically meaningful evaluation.

For Board-Certified Radiologists: Vetology is inviting radiologists to work alongside us in shaping the future of veterinary AI imaging. As these tools become more integrated into clinical workflows, radiologist expertise is essential to helping define the guardrails, best practices, and professional standards that ensure AI supports, rather than distorts, patient care.

Through collaboration around transparent performance data, radiologists can help clarify where AI aligns with real-world clinical needs, where limitations remain, and what benchmarks the profession should expect from all vendors. This partnership is about collectively defining what “good enough” means in practice, strengthening industry-wide transparency, and establishing validation approaches that protect veterinarians and the animals they serve.

For General Practitioners: Vetology views general practitioners as essential partners in the responsible use of AI at the point of care. Transparent, classifier-specific performance data supports informed clinical judgment, by helping veterinarians understand where AI can meaningfully assist, where additional scrutiny is warranted, and how uncertainty should be factored into decision-making.

This shared responsibility encourages appropriate confidence without over-reliance. It reinforces professional judgment while supporting better, more consistent care for patients, and clearer communication with pet owners. Trust your training: AI can inform the veterinarian, but it cannot replace medical insight and domain knowledge.

For Regulatory Bodies: Vetology supports collaboration with regulators in developing thoughtful, evidence-based approaches to AI oversight. Publicly available performance data provides the empirical foundation needed to move beyond one-size-fits-all regulation and toward standards that reflect real differences across conditions, modalities, and clinical use cases. By working together, regulators, clinicians, and developers can help ensure imaging AI governance evolves in a way that protects patients, supports veterinary professionals, and aligns with the nuanced oversight long advocated by leaders such as the ACVR and ECVDI.

Beyond Academic Interest: Clinical Integration That Works

“We’re releasing our performance data so veterinarians can make confident decisions in everyday practice, and so the industry can move forward in establishing clear best practices and gold standards for AI in veterinary imaging,” said Cory Clemmons, Chief Technical Officer. “Transparency is how we build trust today, and a better future for patient care.”

Practical applications include:

    • Risk-stratified triage: High-sensitivity classifiers enable confident rule-outs in screening scenarios, while moderate-sensitivity classifiers signal when additional imaging or specialist consultation adds value.
    • Workflow optimization: High-confidence AI results help identify straightforward cases that may not require additional specialist review, while borderline or complex findings signal when radiologist consultation adds meaningful diagnostic value, enabling veterinary teams to allocate incremental diagnostic expenditures where they matter most for patient care.

Addressing Good Machine Learning Practice

The joint ACVR – ECVDI position statement emphasizes development “in accordance with good machine learning practices,” with particular focus on transparency, error reporting, and clinical expert involvement.

Vetology’s public metrics directly support these principles by enabling third-party evaluation, benchmarking against radiologist agreement rates, and providing visibility into both false positive and false negative characteristics through publicly reported sensitivity and specificity.

A Call to the Industry

“Every imaging AI company in this space will eventually publish performance data, either voluntarily or when regulators require it,” Goldman said. “We’re choosing to lead because transparency accelerates trust, and trust accelerates adoption of tools that genuinely help patients and practitioners.”

Vetology hopes this action encourages industry-wide adoption of open validation practices and provides a template for the kind of disclosure the ACVR and ECVDI explicitly urged.

What's Next

Vetology will update performance metrics as classifiers are retested, and publish the same comprehensive data for every new classifier launched, with new releases planned monthly. The company welcomes collaboration with academic institutions, regulatory bodies, and practicing veterinarians to refine validation methodologies and establish industry-wide standards.

 

# # #

ABOUT VETOLOGY

Vetology is a veterinary imaging support company that provides AI-generated radiology reports and traditional teleradiology services by board-certified veterinary radiologists. Built by radiologists, Vetology focuses on improving patient outcomes through accuracy, speed, and reliability in diagnostic imaging. Our platform is designed to integrate seamlessly into existing hospital workflows, helping clinicians make informed decisions quickly. Learn more at vetology.net.

Media Contacts

Thanks for reading! If you’d like to learn more or have any questions, we’d love to hear from you.

Behind The Scenes With the Vetology Support Team 

In veterinary medicine, time is short and expectations are high. Clients want answers about their pet’s health quickly, and AI-powered platforms like Vetology can help you deliver. But what happens when you have a question about your AI screening report, need to speak with a human radiologist, or want to train your team to use the platform?

Our client care team is ready to help at a moment’s notice. Clients who regularly interact with the support team tend to get better results from the platform, work more efficiently, and build greater confidence with our AI and teleradiology tools.

Here’s a look at the Vetology support services we provide at no additional cost to help users get the most from our platform.

The Team Behind The Screen

Vetology’s support team is small but mighty. Together, they handle over 14,000 communications each year, including phone calls, emails, scheduled trainings, and now, live on-platform chats.

Our support team blends veterinary technicians with technology and customer care professionals. With many years of combined experience across multiple disciplines, they’re capable of handling everything from onboarding and software installation to troubleshooting, clinical questions, radiologist follow-ups, and veterinary team coaching.

We’d like to introduce you to two of our key support team members:

Tammie McGill

Tammie McGill spent nearly two decades as a human EMT before transitioning into a role as a veterinary assistant. After gaining years of clinical experience, she now uses her strong veterinary technician skills to provide clinical support to Vetology users, which includes answering AI report questions, coordinating discussions with interpreting veterinary radiologists, monitoring radiograph quality, and helping clinical teams troubleshoot imaging techniques to improve safety and optimize outcomes.

Tammie and her fellow veterinary technician, Vivian Paz, also work closely with the radiologists, data scientists, and development teams, offering valuable advice and domain-specific insight.

Sandra Nemis

Sandra Nemis came to Vetology after several years of managing customer care teams, including a technical supervisor role.

She now leads the Vetology support team, handling client interactions, clinic demos, installations, onboarding, training, and day-to-day platform support.

With the help of additional support team members, Aziz Beguliev, Chey Aranzasu, Kath Dato, and our SVP of Information Systems Ruben Venegas, Tammie and Sandra ensure that no question goes unanswered and no case falls through the cracks.

While most have been on the team for more than five years, tenures span from new members to 15 years, reflecting a mix of institutional knowledge and fresh perspectives to help deliver consistently excellent service and fast communication.

From Demo to Diagnostics

When clinics reach out to Vetology through the website, email, or phone, they establish a relationship with our tight-knit client care team from day one.

The onboarding process for new Vetology clients is quick and efficient. After a client completes a short form with clinic information, the Vetology support team creates their internal profile, configures platform access, and schedules an installation and training session.

“We remote into the X-ray computer, add our destination settings to ensure communication, and enable the auto-send feature,” explained Sandra. “When team members take X-rays, they don’t have to do anything extra; the images automatically go to the Vetology platform. Within a few minutes, they have an AI screening report and can submit to a board-certified radiologist, if desired.”

The entire process of installing and configuring the platform and providing initial training to key team members typically takes less than an hour, so you can be up and running fast and avoid downtime in the clinic.

Clinical Coaching and Aftercare

Vetology’s support combines technical help with clinical collaboration. Our two veterinary support specialists have nearly three decades of combined experience. Together, they provide a crucial “aftercare” service for teams using the Vetology platform.

When the team spots an issue with image quality or safety, they provide feedback and coaching. They can offer tips to help technicians hone their radiography skills and use positioning aids, something they may or may not have learned or practiced in school.

“Clinics are very responsive when we reach out,” said Tammie. “I’ve also had doctors call to ask, ‘What else can we do to make this better?’ I’ll talk to anybody in the clinic that has time or is willing to learn more.”

Coaching support helps improve image quality and diagnostic accuracy, reduce retakes, and protect team members from unnecessary radiation exposure.

Contacting Vetology Support

You can contact Vetology’s support team by phone, email, the website, or the live chat feature on the platform.

However you choose to contact the team, you can expect a rapid response. The team is available from 6 a.m. to 6 p.m. Pacific time and responds to emails within five to 10 minutes during regular hours. If you have a question after hours, send an email and you can expect a response first thing the following morning.

Most importantly, when you contact Vetology’s support team, your questions will be answered by a real person. Our goal is to provide quick help so you get the most from our platform without slowing down your day.

Practice Support That Delivers

The best veterinary technology platforms and imaging tools not only provide a place to process images, but they also help teams use them to their full extent. Vetology’s support team aims to provide accessible, proactive help during your daily workflows, when you need it most. We want to ensure that clinics feel supported, confident, and ready to make the most of every feature.

When clinics use our responsive support, teams learn to optimize their images and submissions, radiologists and AI screenings have higher-quality studies to work from, reports become more accurate, and pets receive better, more timely care.

Trusted Support is a Click or Call Away

Our helpful, professional, human support team knows your clinic, understands your challenges, and wants you to succeed. From onboarding to aftercare, we’re committed to helping clients use our AI and teleradiology systems more confidently every day.

Contact Us

Ready to see what it’s like to have a support team that works the way you do? Contact us or schedule a demo to meet the team and discover how Vetology helps clinics deliver better care with our simple, yet powerful platform and world-class support.

Ethical AI in Veterinary Imaging

The Ethics of Veterinary AI: Trusting Your Teleradiology Platform

Veterinary AI can screen radiographs in seconds, helping veterinary teams make faster, more accurate decisions. But powerful technology comes with the responsibility of building and using it ethically and with complete transparency.

Vetology’s AI is trained on species-specific veterinary images by boarded veterinary radiologists and data scientists for use by veterinary teams. Our products follow ethical standards and adhere to good machine learning practices to ensure our veterinary software assists veterinarians in reading images without replacing their (human) clinical and domain expertise.

To meet the needs of today’s veterinary professionals, veterinary AI platforms must be trustworthy, ethical, and transparent. Here’s what that means for practices using these tools, and what sets us apart.

Key Takeaways

  • Veterinary AI should augment, not replace, clinical decision-making. Vetology’s AI radiology platform provides an initial screening report but cannot provide a definitive diagnosis.
  • Trustworthy AI products and companies prioritize transparency, safety, and accuracy while disclosing limitations.
  • AI accuracy and utility rely on a diverse training dataset and thoughtful development to improve reliability.

Why Do AI Ethics Matter?

At Vetology, our interpretation of ethical AI is baked into the core of our products. Our team cares about patient wellness as much as our clients do. That’s why we are careful to position our tools as screening aids, not diagnostic replacements. Guiding clinicians in how to use this evolving technology responsibly is a critical part of our mission.

Ethical AI starts with how systems are trained, validated and deployed. While veterinary medicine does not have a HIPAA equivalent, our ethical responsibility is no less important than that of human medical professionals. In many ways, the lack of formal regulation makes it even more important for veterinary AI companies to build software with integrity and transparency.

The best veterinary AI should be transparent about its purpose, built on thoughtfully designed data, and always safeguard patient and client privacy, giving veterinarians confidence in every result. These principles guide our development at Vetology and support veterinarians in integrating AI responsibly to complement their treatment decisions.

Veterinary diagnostics have inherent challenges, with or without utilizing AI. Image quality, proper positioning, collimation, and capturing the right number of views all affect both a radiologist’s and a clinician’s ability to identify disease conditions. AI cannot overcome non-diagnostic images, and veterinary AI companies should be transparent when image quality limits their ability to interpret results.

Finally, ethical communication also means being realistic about capabilities. Technology should never promise more than it can deliver. Companies should not make broad claims such as “AI matches the full depth and scope of traditional radiology reports” or “AI can detect subtle differences that humans cannot.” Accuracy and realism help maintain trust and ensure AI is used effectively to support patient care.

Veterinary AI Supports Decision-making

Vetology’s AI and Language Models are carefully trained on species-specific veterinary phrasing and images, which allows us to flag findings and generate preliminary conclusions and recommendations. These outputs are designed to assist clinicians, but they do not replace human expertise. In fact, they rely on it: a radiologist or veterinarian is always required to interpret results within the full clinical context.

Our AI functions as a screening tool: it reviews images without access to the patient’s history, lab results, or signalment. Instead, it analyzes visual patterns to identify potential abnormalities and generates an initial report.

The distinction between a screening report and a diagnostic report may seem subtle, but it’s significant. Screening reports highlight potential abnormalities and speed up interpretation, allowing veterinarians (domain experts) to focus their time, prioritize additional diagnostics, and narrow their clinical differentials.

Ethical AI Training and Data Handling

Ethical AI development also requires responsible handling of training data. While a dog won’t mind if a computer learns about comparative heart sizes from its X-rays, its owner’s data deserves protection.

Vetology’s AI was trained on more than 15 years of veterinary radiology reports from more than 1,000 clinics and 20 board-certified veterinary radiologists. This dataset reflects real-world veterinary cases across species, breeds, and radiographic variations; equally important is how the data is collected.

Before training a new condition classifier, client and patient data are anonymized. Identifying details are removed, while essential information such as a pet’s (first) name, signalment, and history may be retained for clinical context. Vetology’s images and data are used solely for AI training and never shared beyond that purpose.

Veterinary AI Accuracy

Vetology’s radiology AI screening tool is not a generative model; this distinction is important. Generative AI can introduce “hallucinations,” or fabricated yet seemingly accurate interpretations, which can be dangerous in a medical context. Instead, Vetology’s system analyzes X-rays and generates screening reports using pre-defined veterinary medical terminology.

This supervised training approach is critical for medical AI. It means clinicians can lean on the AI’s flags and recommendations while still relying on their own expertise and patient context to make final diagnostic and treatment decisions.

In veterinary imaging, accuracy, reproducibility, and patient safety must come first. Our approach prioritizes these principles to enhance clinical decision-making while minimizing risk.

Veterinary AI: A Clinical Level-Up

AI works best when it amplifies human expertise rather than trying to replace it. Vetology’s screening reports work with the clinician, because the veterinarian is the final decision-maker.

By integrating AI into their workflow, veterinary teams can streamline interpretation, manage caseloads more efficiently, and reduce cognitive load, all while ensuring patients receive the highest standard of care. It’s important to highlight the role veterinarians themselves play in the ethical use of AI in practice. By combining AI insights with clinical judgment, critical thinking, and diagnostic data, veterinarians can ensure that their use of AI innovations prioritizes the well-being of their patients, clients, and the professionals delivering care, while integrating AI tools safely and responsibly.

Here’s how veterinary AI for radiology fits into a typical clinical workflow:

  • AI screening: The system analyzes images and generates a screening report with possible findings.
  • Combine AI with clinical expertise: The veterinarian interprets the AI report alongside clinical judgment and patient-specific case details to form a complete picture.
    • It’s key that AI and human observations combine to formulate the next steps in the pet’s diagnostic or treatment plan.
  • Escalate as needed: If uncertainty remains, the clinician can request a teleradiology review from a board-certified radiologist.
  • Maintain transparency: Explain to clients how AI is used in their pet’s care.
  • Stay informed: Keep up with AI updates, best practices, and emerging research.
  • Educate the team: Ensure all staff understand the AI’s capabilities, limitations, and ethical responsibilities.

Vetology workflows respect the expertise of the veterinary team, support efficiency, and reduce the mental load of routine case triage without diminishing or removing the clinician’s critical role.

Vetology is a leader in the field, developing AI tools that clinicians can trust. Schedule a demo to learn more and discover how our AI can support your team without replacing the expertise of the professionals who dedicate their lives to animal care.

Want to see AI in action?

To tour the platform and learn more, contact our team, or book a demo for a firsthand look at our AI and teleradiology platform.

Is AI Better Than a Veterinary Radiologist at Reading Pet X-rays?

This article compares the use of AI in veterinary radiology with human expertise. Even though AI improves efficiency by pre-screening X-rays and generating reports, it cannot replace radiologists, in part because of the variability inherent in image interpretation. AI performs best on clear-cut conditions with strong expert agreement, while complex cases still require human expertise. Read on to learn how AI in radiology:

  • Addresses the shortage of veterinary radiologists.
  • Helps with pre-screening and structured reports.
  • Works well for conditions like hepatomegaly or pericardial effusion.
  • Supports, not replaces, veterinary radiologists.

AI Versus Veterinary Radiologists: Collaboration, Not Competition

About 94 million U.S. households own at least one pet.[1] That’s a lot of furry, feathered, and scaly family members that may potentially need radiographs to diagnose a medical condition. However, there are only 667 board-certified veterinary radiologists in the country,[2] creating a bottleneck in radiology services. This shortage can lead to longer wait times, increased anxiety for clinicians and pet owners, and potential delays in diagnosing critical conditions.

This is where artificial intelligence-based radiology tools can help—not to replace veterinary radiologists, but to support them. Artificial intelligence (AI) can pre-screen images, highlight abnormalities, and generate structured reports, allowing radiologists to focus on complicated cases while improving efficiency for general practitioners. But, how does AI compare to human expertise?

Not all conditions are created equal

Radiology is not an exact science but rather an interpretive discipline that relies on pattern recognition, clinical judgement, and experience. Board-certified veterinary radiologists undergo extensive training, but they don’t always agree on image interpretations, especially if the changes are subtle or the patient has multiple diagnoses, creating overlapping signs.

Studies have shown that radiologists tend to have a high level of agreement when interpreting X-rays that display clear and advanced disease. However, variability in interpretation increases when findings are more subtle, as may be the case in early-stage tumors, mild joint changes, or diffuse lung patterns that could indicate interstitial or early inflammatory disease. When subtle abnormalities are suspected, additional imaging, such as ultrasound, computed tomography (CT), or magnetic resonance imaging (MRI), can provide greater anatomical detail and diagnostic confidence.

How interpretive variability affects AI performance assessment

Understanding variabilities in radiologist interpretations is necessary to fairly evaluate the AI’s diagnostic accuracy, sensitivity, and specificity.

  • AI algorithms rely on human-labeled data (i.e., ground truth) to learn how to detect and classify abnormalities, and if radiologists don’t agree on a diagnosis, the ground truth may have some degree of subjectivity.
  • AI radiology tools are evaluated using accuracy, sensitivity, and specificity, but these measures must be analyzed in the context of how consistently radiologists themselves diagnose the condition.
  • If two radiologists interpret the same case differently, the AI may match one but disagree with the other. This doesn’t mean that the AI is wrong; it only highlights the inherent variability in radiology.

How interpretive variability affects AI radiology use

The inherent variability in veterinary radiology associated with certain conditions means that some are well-suited for AI screening while others aren’t.

For example, conditions such as hepatomegaly, esophageal enlargement, and the presence of pericardial effusion have a high radiologist agreement rate and are well-suited for AI screening.

At Vetology, each AI-generated report includes a list of the conditions assessed, so it’s clear exactly what was evaluated, what was flagged, and what falls outside the scope of the current screening. This gives veterinarians a solid understanding of the AI’s capabilities and limitations, so they know to apply their own clinical judgment to conditions that were not screened for rather than expecting input on findings beyond the AI’s parameters.

Vetology’s AI tools provide guidance for a wide range of thoracic, abdominal, and musculoskeletal conditions in canine and feline patients, including—but not limited to—the following:

Abdominal Classifiers

  • Liver enlargement
  • Masses that may indicate neoplasia or inflammatory processes
  • Splenic changes, commonly linked to systemic or localized disease
  • Kidney abnormalities, such as mineral deposits or structural and size variations that may suggest neoplasia, inflammation, or systemic disease
  • Bladder and urethral stones
  • Pregnancy detection
  • Gastrointestinal tract abnormalities, which may indicate obstruction, motility issues, or other conditions
  • Peritoneal fluid accumulation, inflammation, or infection

Thoracic Classifiers

  • Pulmonary patterns
  • Cardiomegaly
  • Pleural fissure lines
  • Fluid accumulation
  • Soft tissue pulmonary nodules
  • Masses
  • Vascular enlargement

Leveraging AI screening alongside teleradiology

Vetology allows veterinarians to optimize AI radiology screening tools and teleradiology services to enhance diagnostic accuracy, improve efficiency, and expedite patient care.

For example, let’s say you handle 60 X-ray cases a month, and you send out only 10 for teleradiologist review to avoid the expense. A Vetology subscription, which provides unlimited access to AI screening and full reports in as little as five minutes, could support your clinical expertise, helping to confirm your suspicions and streamline decision-making. If you still have doubts about a case, you can escalate it for review by a board-certified veterinary radiologist.

This creates a three-tiered approach to patient care, integrating:

  • AI insights
  • Your professional judgment
  • Expert validation from a radiologist when needed

Collaborating with the Vetology team can help ensure that your patients receive a timely diagnosis and treatment plan, allowing them to receive the care they deserve quickly.

How you can support accurate AI screening and faster board-certified radiologist reports

One of the most important factors that lead to an accurate AI screening is good radiographic technique. Clear, well-positioned, well-developed radiographs are necessary for accurate human and AI interpretation, and the AI does not have the ability to adjust its interpretation based on altered positioning or an unclear image.

For example, if a patient is slightly twisted, anatomical structures may appear distorted on the image. This can lead the AI to misread the size or shape of an organ, or even misidentify a condition. Human radiologists can identify when a patient isn’t perfectly positioned and adjust their interpretation, but AI doesn’t yet have that context—it reads exactly what’s in front of it.

You can take the following measures to increase the likelihood of accurate AI screening:

  • Ensure proper positioning of each patient
  • Choose the correct radiographic settings to ensure a clear image
  • Take at least two views (ventrodorsal and lateral) of the area to be assessed every time.
  • Collimate down to the region of interest to reduce scatter.

Vetology offers personalized, on-demand support tailored to answer your needs and questions. Our team of radiologists and veterinary technicians is always available to provide free, one-on-one guidance with positioning skills and technical assistance (in some cases), whether you’re a seasoned practitioner, a new team member, or a recent graduate.

References
[1] American Pet Products Association (APPA) 2025 State of the Industry Report, as published in Today’s Veterinary Business, April 2025.
[2] AVMA published statistics – veterinary specialists in the United States as of December 31, 2024.

AI and Teleradiology Questions: Answered

To learn more about Vetology and see our platform in action, contact the Vetology support team.

Vetology’s Approach to AI in Veterinary Diagnostics: Radiologist Consensus in Action

The article explores the challenges of variability in veterinary radiology interpretations and how integrating AI in veterinary diagnostics can improve consistency, accuracy, and efficiency in diagnostic imaging. It highlights the role of AI as a supportive image screening tool that complements expert veterinary radiologists. In this article we’ll cover how:

  • Radiograph interpretation can vary based on bias, experience, image quality, and clinical context.
  • The use of AI in veterinary radiology can reduce interpretation inconsistencies.
  • AI reports support—rather than replace—veterinary experts, helping boost accuracy and consistency in diagnostic imaging.

Understanding Variability in Veterinary Radiology Interpretations

Veterinary radiology is a common diagnostic tool used to evaluate conditions ranging from orthopedic injuries to internal diseases. However, the reality is that radiologists don’t always agree on an image’s interpretation. Unlike laboratory tests with definitive results, radiology reports are clinical opinions, influenced by individual expertise, experience, and subtle differences in image quality. This variability in diagnosis can lead to differences in treatment recommendations and patient outcomes.

In this article, we look at the factors that influence these discrepancies and how advancements in artificial intelligence (AI)-assisted veterinary radiology can help improve consistency in diagnostic imaging reporting.

Reasons for Variability in Veterinary Radiology

Radiology combines art and science, and differences in image interpretation are common, even among board-certified radiologists. Radiology reports rely on expert opinion, which can vary based on several factors:

  • Subjectivity and cognitive bias — Radiologists rely on pattern recognition to identify abnormalities, and subtle differences in perception and cognitive bias can lead to different conclusions. For example, confirmation bias may make the radiologist see what aligns with their expectations, and anchoring bias can make them stick to an initial assessment.
  • Experience — A radiologist’s experience can influence their interpretative skills and diagnostic approach, and their background can shape how they assess an image. For instance, specialists in orthopedic imaging may emphasize bone structure, while those with soft tissue expertise may focus more on organ abnormalities.
  • Image quality — Underexposed or overexposed images can obscure fine details, and poor patient positioning may affect visibility, leading to misinterpretation.
  • Clinical context — Veterinary radiologists are trained to interpret images without first looking at the patient’s history to keep their assessment objective. That said, a strong clinical history that includes exam findings and relevant background helps shape a more complete and accurate report. The more context a radiologist has, the better they can tailor conclusions and recommendations. In some cases, the same images might lead to different interpretations depending on the clinical details provided.
  • Complex cases — Some conditions, such as early-stage tumors, inconspicuous fractures, or certain lung diseases, can present with subtle or overlapping features, making classification difficult. Differences in how radiologists weigh the significance of these findings can lead to varying interpretations.
  • The human factor — Radiologists are humans, and issues such as fatigue and time constraints can impact diagnostic accuracy. Evaluating hundreds of images per day can also impact a radiologist’s mental focus, and a heavy workload may lead to less thorough evaluations.

Radiologist Consensus and Variability

When developing our veterinary AI radiology tool, the Vetology team set out to understand where radiologists consistently agreed—and where their interpretations differed. Identifying conditions with high agreement rates between different radiologists guides our selection criteria for building new AI classifiers. Studying patterns of diagnostic variability helps train the models to better handle ambiguous cases.

This process isn’t static. Our models continue to evolve through regular retraining, and input from real-world clinical use. Feedback from veterinarians and our internal human case reviews play a key role in flagging areas where the AI might need more structure or refinement. It’s all part of our goal to ensure the AI aligns with expert-level thinking and delivers meaningful support.

To support our understanding of diagnostic consistency, the team asked veterinary radiologists—without any involvement from AI—to independently evaluate and diagnose images with a wide variety of canine conditions. The radiologists showed high levels of agreement with one another on conditions such as pregnancy, urinary stones, hepatomegaly, small intestinal obstruction, cardiomegaly, pericardial effusion, and esophageal enlargement. In other words, these diagnoses were more consistently interpreted across different experts.

In contrast, there was noticeably lower agreement among radiologists on conditions like pyloric gastric obstruction, right kidney size, subtle or suspicious nodules, and bronchiectasis, indicating that these findings tend to generate more varied interpretations even among experienced professionals.

How AI Compares

AI has demonstrated significant potential in enhancing diagnostic processes and reducing variability when reading radiographs. For example, the Vetology team found that the radiologist agreement rate for canine hepatomegaly was 92%, while the Vetology AI tool had 87.29% sensitivity and 92.34% specificity. Third-party peer-reviewed studies also demonstrate the product’s value.

Researchers at Tufts University, Cummings School of Veterinary Medicine, performed a retrospective, diagnostic case-control study to evaluate the performance of Vetology AI’s algorithm in the detection of pleural effusion in canine thoracic radiographs. Sixty-one dogs were included in the study, and 41 of those dogs had confirmed pleural effusion. The AI algorithm determined the presence of pleural effusion with 88.7% accuracy, 90.2% sensitivity, and 81.8% specificity.

Researchers at the Animal Medical Center in New York, New York, performed a prospective, diagnostic accuracy study to evaluate the performance of Vetology AI’s algorithm in diagnosing canine cardiogenic pulmonary edema (CPE) from thoracic radiographs, using an American College of Veterinary Radiology-certified veterinary radiologist’s interpretation as the reference standard. Four hundred eighty-one cases were analyzed, and the radiologist diagnosed 46 of the 481 dogs with CPE. The AI algorithm classified 42 of those 46 cases as CPE positive and the remaining four as CPE negative. When compared to the radiologist’s diagnosis, the AI algorithm had 92.3% accuracy, 91.3% sensitivity, and 92.4% specificity.
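
As a quick consistency check, those three figures fit together; the sketch below re-derives them (hypothetical Python, with the true-negative count inferred from the reported 92.4% specificity, since the summary above does not list it directly):

```python
# AMC study: 481 cases, 46 radiologist-confirmed CPE positives
positives, total = 46, 481
negatives = total - positives        # 435 radiologist-negative cases

tp, fn = 42, 4                       # AI caught 42 of the 46 CPE cases
tn = round(negatives * 0.924)        # ~402 true negatives (inferred)
fp = negatives - tn                  # ~33 false positives

print(f"sensitivity: {tp / (tp + fn):.1%}")     # 91.3%
print(f"specificity: {tn / (tn + fp):.1%}")     # 92.4%
print(f"accuracy:    {(tp + tn) / total:.1%}")  # 92.3%
```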

AI radiology tools can never replace the expertise of board-certified veterinary radiologists, but they can serve as valuable assistants, enhancing efficiency, consistency, and diagnostic accuracy. Vetology’s AI tool has been shown to be accurate and reliable, supporting more standardized interpretations. While final diagnoses and treatment decisions will always remain the responsibility of an experienced professional, AI serves as a powerful support system, helping to optimize patient care and improve veterinary radiology services.

Want to see AI in action?

To learn more, contact our Vetology team or book a demo for a firsthand look at our AI and teleradiology platform.
