Provider Data Accuracy: The Problem with “95% Accuracy” Claims (Part 1)

This post kicks off a two-part series on provider data accuracy. Aaron Beach, Orderly’s VP of Data and Engineering, takes a technical look at one of the biggest challenges in healthcare data: how accuracy is actually measured. His deep dive explains why many industry accuracy claims don’t hold up under scrutiny. In a companion post, Orderly President Kevin Krauth explores what these insights mean for healthcare leaders evaluating data solutions.

Executive Summary

When evaluating healthcare data providers, you’ll often encounter claims of “95% guaranteed accuracy” or similar guarantees. This document explains why such claims are fundamentally problematic and presents a rigorous, mathematically sound methodology for estimating data accuracy that acknowledges the inherent limitations of measurement.

The core challenge is simple: you cannot directly measure the accuracy of data without a perfect measurement mechanism. Since no measurement process is perfect, we can only observe agreement rates between our data and imperfect verification methods. This document demonstrates how to derive meaningful accuracy estimates from agreement rates, while being transparent about the assumptions required and the limitations of such estimates.

The Fundamental Measurement Problem

The Impossibility of Direct Accuracy Measurement

Consider a simple question: “How accurate is our practitioner address data?” To answer this, you need to compare your data against some “ground truth.” But how do you establish ground truth? The most common approaches are:

  • Phone attestation: Call practitioner offices to verify information
  • Web scraping: Extract information from official websites
  • Third-party verification: Compare against other data providers
  • Self-reported updates: Allow practitioners to update their own information

Each of these methods has errors. Phone attestation, for example, suffers from:

  • Human error by call center staff
  • Miscommunication or misunderstanding
  • Outdated information provided by office staff
  • Incomplete information (e.g., missing suite numbers)

The Agreement Rate Reality

Empirical analysis reveals a critical finding: even when phone attestations are performed on the same data within a very short time period (e.g., the same day), they only agree with each other approximately 87.5% of the time. This means that even if your data were 100% accurate, phone attestation would not measure it as 100% accurate.

This observation has profound implications: without a perfect attestation mechanism, you cannot directly measure accuracy. What you can measure is how often an imperfect attestation process agrees with your data—we call this the agreement rate.

Why This Matters for Business Decisions

This distinction matters enormously for several reasons:

  1. Vendor Evaluation: When a vendor claims “95% accuracy,” are they reporting agreement rates or true accuracy? Without understanding their methodology, you cannot meaningfully compare providers.
  2. Compliance and Reporting: Regulatory requirements often demand accuracy metrics. Using agreement rates without proper context can mislead stakeholders about actual data quality.
  3. Resource Allocation: Understanding true accuracy helps prioritize which data fields or sources need the most attention and investment.
  4. Risk Assessment: Datasets with low agreement rates (e.g., below 50%) are likely even less accurate than measured, because measurement errors can create false positives that inflate the apparent accuracy.

Agreement Rate vs. True Accuracy

The Simplest Approach: Using Agreement Rates Directly

The simplest approach is to acknowledge that without a perfect measurement mechanism, we can never truly measure accuracy, and simply use agreement rates as a proxy. This is honest and transparent, but it has limitations:

  • Agreement rates and true accuracy have a non-linear relationship: when true accuracy is above 50%, agreement rates understate it; when true accuracy is below 50%, agreement rates actually overstate it (because both sources can make the same error, producing a false agreement)
  • Different measurement methods will yield different agreement rates for the same data
  • Agreement rates don’t directly translate to business impact (e.g., “How many failed claims adjudications will this prevent?”)

Estimating True Accuracy: The Ideal True Accuracy

For reporting, compliance, and business planning purposes, it may be useful to estimate actual accuracy from agreement rates. This requires making assumptions about the nature of measurement errors.

Key Assumption: We assume that errors in the attestation mechanism are due to a random process (e.g., random human error) rather than systematic or correlated errors.

This assumption is “ideal” in a specific sense: it gives us the maximum accuracy estimate possible under the constraint that we can only observe agreement rates. However, the reason we make this assumption is not because we believe errors are necessarily random, but because we have no reliable way to measure systematic correlations in the attestation systems we use.

For example, if staff at a particular hospital consistently provide incorrect information for certain doctors (perhaps due to outdated internal records or systematic data entry errors), we cannot verify this correlation without access to ground truth data. Since we don’t have access to ground truth, we cannot measure the extent of such systematic correlations.

If we attempted to account for correlations, we would have to assume that anything could potentially be correlated, making it impossible to place meaningful bounds on the error rate. Therefore, we frame accuracy as a range:

  • The agreement rate provides the lower bound—this is what we can directly observe and measure.
  • The Ideal True Accuracy provides the upper bound—this is the best-case estimate assuming random, uncorrelated errors.

The true accuracy lies somewhere between these bounds. If errors are systematic or correlated, true accuracy will be closer to the agreement rate (the lower bound). If errors are truly random and uncorrelated, true accuracy will be closer to the Ideal True Accuracy (the upper bound). We call this estimate “Ideal” because it represents the upper bound of the accuracy range under ideal circumstances.

The Mathematical Framework

Notation and Setup

Let us formalize the problem:

Definition 1 (Agreement Rate). The agreement rate \(A_{12}\) is the observed frequency with which a dataset (source 1) agrees with an imperfect attestation mechanism (source 2).

Definition 2 (Self-Agreement Rate). The self-agreement rate \(A_1\) is the frequency with which the imperfect attestation mechanism agrees with itself when applied twice to the same data (e.g., two phone calls to verify the same information).

Definition 3 (True Error Rate). The true error rate \(E\) is the actual proportion of errors in the dataset. The true accuracy is then \(\theta = 1 - E\).

The Relationship Between Agreement and Accuracy

When two independent, imperfect measurement processes (each with accuracy \(\theta\)) are applied to the same data, they will agree when:

  • Both are correct: probability \(\theta^2\)

  • Both make the same error: probability \((1-\theta)^2\) (assuming symmetric, independent errors)

The total agreement rate is:

\(A = \theta^2 + (1-\theta)^2 = 2\theta^2 - 2\theta + 1\)

However, this model assumes both sources have the same accuracy. In our case, we have:

  • Source 1: Our dataset (unknown accuracy \(\theta_1\))

  • Source 2: The attestation mechanism (unknown accuracy \(\theta_2\))

When source 1 (accuracy \(\theta_1\)) and source 2 (accuracy \(\theta_2\)) are compared, they agree when:

  • Both are correct: \(\theta_1 \theta_2\)

  • Both make the same error: \((1-\theta_1)(1-\theta_2)\)

The agreement rate between sources 1 and 2 is:

\(A_{12} = \theta_1 \theta_2 + (1-\theta_1)(1-\theta_2) = 2\theta_1\theta_2 - \theta_1 - \theta_2 + 1\)

When source 2 is compared with itself (two independent applications), the self-agreement rate is:

\(A_1 = \theta_2^2 + (1-\theta_2)^2 = 2\theta_2^2 - 2\theta_2 + 1\)
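
These relationships are easy to check numerically. The sketch below (plain Python; the function names are ours, not part of any formal notation) encodes both agreement formulas and confirms that self-agreement is just the two-source formula with \(\theta_1 = \theta_2\):

```python
def agreement_rate(theta1, theta2):
    """Probability that two independent sources agree:
    both correct, or both making the same error."""
    return theta1 * theta2 + (1 - theta1) * (1 - theta2)

def self_agreement(theta):
    """Agreement of a mechanism with itself: A = 2*theta^2 - 2*theta + 1."""
    return 2 * theta**2 - 2 * theta + 1

# Self-agreement is the special case theta1 == theta2.
assert abs(agreement_rate(0.933, 0.933) - self_agreement(0.933)) < 1e-12

# A 93.3%-accurate mechanism agrees with itself about 87.5% of the time.
print(round(self_agreement(0.933), 3))  # 0.875
```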

The Critical 50% Threshold

An important insight emerges from the relationship between agreement rates and accuracy: the relationship reverses at the 50% threshold.

When true accuracy \(\theta = 0.5\) (50%), the agreement rate \(A = 0.5\) as well. However:

  • When \(\theta > 0.5\): Agreement rates are lower than true accuracy. For example, if \(\theta = 0.9\) (90% accurate), the agreement rate is \(A = 0.82\) (82%).

  • When \(\theta < 0.5\): Agreement rates are higher than true accuracy. For example, if \(\theta = 0.3\) (30% accurate), the agreement rate is \(A = 0.58\) (58%).

This counterintuitive result occurs because when accuracy is very low, the probability that both sources make the same error (creating a false agreement) becomes significant. This is why datasets with low agreement rates (below 50%) are particularly concerning—they represent data that is even less accurate than the agreement rate suggests.
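
A quick numerical check of the threshold, using the symmetric single-accuracy model from above (a sketch; the function name is ours):

```python
def symmetric_agreement(theta):
    # Agreement rate when both sources share accuracy theta.
    return 2 * theta**2 - 2 * theta + 1

for theta in (0.9, 0.5, 0.3):
    a = symmetric_agreement(theta)
    side = "A < theta" if a < theta else "A >= theta"
    print(f"theta={theta:.1f} -> A={a:.2f} ({side})")
    # theta=0.9 -> A=0.82 (A < theta)
    # theta=0.5 -> A=0.50 (A >= theta)
    # theta=0.3 -> A=0.58 (A >= theta)
```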

Deriving True Accuracy from Agreement Rates

From the self-agreement rate \(A_1\), we can solve for the attestation mechanism’s accuracy \(\theta_2\):

\(A_1 = 2\theta_2^2 - 2\theta_2 + 1\)

\(2\theta_2^2 - 2\theta_2 + (1 - A_1) = 0\)

Using the quadratic formula:

\(\theta_2 = \frac{2 \pm \sqrt{4 - 8(1-A_1)}}{4} = \frac{2 \pm \sqrt{8A_1 - 4}}{4} = \frac{1 \pm \sqrt{2A_1 - 1}}{2}\)

Since accuracy must be between 0 and 1, and we expect \(\theta_2 \geq 0.5\) for a useful attestation mechanism, we take the positive root:

\(\theta_2 = \frac{1 + \sqrt{2A_1 - 1}}{2}\)

Now, from the agreement rate \(A_{12}\) between our dataset and the attestation mechanism:

\(A_{12} = 2\theta_1\theta_2 - \theta_1 - \theta_2 + 1\)

Solving for \(\theta_1\) (the true accuracy of our dataset):

\(A_{12} = 2\theta_1\theta_2 - \theta_1 - \theta_2 + 1\)

\(A_{12} + \theta_2 - 1 = \theta_1(2\theta_2 - 1)\)

\(\theta_1 = \frac{A_{12} + \theta_2 - 1}{2\theta_2 - 1}\)

Substituting our expression for \(\theta_2\):

\(\theta_1 = \frac{A_{12} + \frac{1 + \sqrt{2A_1 - 1}}{2} - 1}{2 \cdot \frac{1 + \sqrt{2A_1 - 1}}{2} - 1} = \frac{A_{12} - \frac{1 - \sqrt{2A_1 - 1}}{2}}{\sqrt{2A_1 - 1}}\)
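
The two solving steps above translate directly into code (a Python sketch; function names are ours). Given the attestation mechanism's self-agreement \(A_1\) and the dataset-vs-attestation agreement \(A_{12}\), we first recover \(\theta_2\) and then \(\theta_1\):

```python
import math

def attestation_accuracy(a1):
    """Invert A1 = 2*t^2 - 2*t + 1 for the attestation accuracy t >= 0.5."""
    return (1 + math.sqrt(2 * a1 - 1)) / 2

def dataset_accuracy(a12, a1):
    """Recover theta1 from A12 = 2*t1*t2 - t1 - t2 + 1."""
    t2 = attestation_accuracy(a1)
    return (a12 + t2 - 1) / (2 * t2 - 1)

t2 = attestation_accuracy(0.875)  # phone attestation's 87.5% self-agreement
print(round(t2, 3))  # 0.933

# Perfect data agrees with a 93.3%-accurate attestation exactly t2 of the time,
# so an observed agreement rate of t2 maps back to 100% accuracy:
print(round(dataset_accuracy(t2, 0.875), 3))  # 1.0
```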

The Simplified Case: When Attestation Accuracy is Known

In practice, we often have good estimates of the attestation mechanism’s accuracy from self-agreement studies. For phone attestation, we observe \(A_1 \approx 0.875\) (87.5% self-agreement), which implies:

\(\theta_2 = \frac{1 + \sqrt{2(0.875) - 1}}{2} = \frac{1 + \sqrt{0.75}}{2} \approx 0.933\)

That is, phone attestation itself is approximately 93.3% accurate. Two independent phone attestations, each 93.3% accurate, will agree about 87.5% of the time, which matches our observations.
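
This can also be checked by simulation. The sketch below is an illustrative Monte Carlo under the document's assumptions (symmetric binary errors, independence between the two calls): it draws many paired attestations at 93.3% accuracy and measures how often they agree.

```python
import random

random.seed(0)

ACCURACY = 0.933   # estimated accuracy of a single phone attestation
TRIALS = 200_000

agreements = 0
for _ in range(TRIALS):
    # Each call independently reports the true value with prob ACCURACY.
    call1_correct = random.random() < ACCURACY
    call2_correct = random.random() < ACCURACY
    # Under symmetric binary errors, the calls agree iff both are correct
    # or both make the same error.
    if call1_correct == call2_correct:
        agreements += 1

print(round(agreements / TRIALS, 3))  # close to 0.875
```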

The Ideal True Accuracy Formula

From Data Confidence to Ideal True Accuracy

In our implementation, we work with what we call data confidence—our confidence that a particular value in our dataset is correct. This confidence is derived from agreement rates between our data and verification mechanisms. We then estimate the Ideal True Accuracy from this data confidence, which represents the true accuracy of the data under ideal assumptions.

However, data quality degrades over time. A practitioner’s phone number verified yesterday is more likely to be correct than one verified a year ago. To account for this, we use a decayed confidence measure that adjusts our confidence based on how long ago the data was collected or verified.

Definition 4 (Decayed Confidence). The decayed confidence \(c_d\) represents our data confidence adjusted for temporal decay. It is derived by measuring agreement rates over time and observing how they decrease as data ages:

\(c_d = a \cdot \frac{1 + e^{r \cdot d}}{2}\)

where:

  • \(a\) is the initial agreement rate (data confidence at the time of collection)

  • \(r\) is the decay rate (typically negative, representing how quickly accuracy decreases over time)

  • \(d\) is the number of days since the data was processed or verified
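
A minimal sketch of the decay adjustment (Python; the decay rate shown is an illustrative placeholder, not a measured value):

```python
import math

def decayed_confidence(a, r, d):
    """c_d = a * (1 + e^(r*d)) / 2, with r < 0.
    Equals a at d = 0 and decays toward a/2 as d grows."""
    return a * (1 + math.exp(r * d)) / 2

A0 = 0.90   # initial agreement rate at collection time
R = -0.005  # hypothetical per-day decay rate for illustration

print(round(decayed_confidence(A0, R, 0), 3))    # 0.9 on day 0
print(round(decayed_confidence(A0, R, 365), 3))  # lower after a year
```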

Field-Specific Decay Rates

Different data fields decay at different rates because they change with different frequencies in the real world. For example:

  • Fax numbers tend to decay fastest—they change frequently as practices upgrade equipment, switch service providers, or discontinue fax services entirely.

  • Phone numbers decay at a moderate rate—they change when practices move, switch carriers, or update their contact systems. However, phone numbers can change without an address change (e.g., switching carriers while staying at the same location).

  • Addresses decay more slowly—they typically only change when a practice physically relocates. When an address changes, the phone number usually changes as well, but the reverse is not always true.

  • Specialty information tends to be relatively stable, changing only when practitioners change their practice focus or obtain new certifications.

By measuring agreement rates over time for each field type, we can empirically determine appropriate decay rates. This allows us to adjust our data confidence based on both the field type and the age of the data, providing more accurate estimates of current data quality.

The Ideal True Accuracy Transformation

To transform the decayed confidence \(c_d\) (our data confidence adjusted for temporal decay) into an estimate of true accuracy, we apply the Ideal True Accuracy formula. This gives us the Ideal True Accuracy—our best estimate of the actual accuracy of the data, derived from our confidence that the values are correct:

Theorem 1 (Ideal True Accuracy Formula). For decayed confidence \(c_d \geq 0.5\), the Ideal True Accuracy is:

\(\theta = \frac{1 + \sqrt{\max(2c_d - 1, 0)}}{2}\)

For \(c_d < 0\), we define:

\(\theta = -\frac{1 + \sqrt{\max(2|c_d| - 1, 0)}}{2}\)

Proof. The transformation derives from the relationship between agreement rates and accuracy under the assumption of random, independent errors.

For the case \(c_d \geq 0.5\), we model the agreement rate as:

\(c_d = 2\theta^2 - 2\theta + 1\)

Solving for \(\theta\):

\(2\theta^2 - 2\theta + (1 - c_d) = 0\)

\(\theta = \frac{2 \pm \sqrt{4 - 8(1-c_d)}}{4} = \frac{2 \pm \sqrt{8c_d - 4}}{4} = \frac{1 \pm \sqrt{2c_d - 1}}{2}\)

Since we require \(\theta \geq 0.5\) for meaningful accuracy estimates, we take the positive root:

\(\theta = \frac{1 + \sqrt{2c_d - 1}}{2}\)

The \(\max(2c_d - 1, 0)\) term ensures the formula is well-defined even when \(c_d < 0.5\), in which case we cap the result at 0.5 (representing no information).

For \(c_d < 0\), we apply the same transformation to the absolute value and negate the result to preserve the sign convention.
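
Theorem 1 translates into a few lines of code (a sketch; the function name is ours):

```python
import math

def ideal_true_accuracy(c_d):
    """Map decayed confidence to Ideal True Accuracy (Theorem 1).
    Inputs below 0.5 in magnitude clamp to 0.5; negative inputs
    keep their sign per the document's convention."""
    sign = -1 if c_d < 0 else 1
    c = abs(c_d)
    return sign * (1 + math.sqrt(max(2 * c - 1, 0))) / 2

print(ideal_true_accuracy(0.5))            # 0.5 (no information)
print(ideal_true_accuracy(1.0))            # 1.0 (perfect agreement)
print(round(ideal_true_accuracy(0.9), 3))  # 0.947
```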

Implications and Limitations

Practical Implications

Applying this methodology to phone attestation reveals important insights:

  1. Phone attestation accuracy: With 87.5% self-agreement, phone attestation is approximately 93.3% accurate. This means 100% correct data would measure as only about 93.3% agreement via phone attestation (not 100%), because even a 93.3% accurate measurement process will disagree with perfect data about 6.7% of the time.
  2. Accuracy estimates when accuracy > 50%: When ideal true accuracy is above 50%, agreement rates are lower than the true accuracy. For example, if a dataset has 90% agreement via phone attestation, the Ideal True Accuracy estimate is approximately 94.7%. This means the data is actually more accurate than the agreement rate suggests—a crucial insight for realistic expectations and compliance reporting.
  3. Low agreement rates (accuracy < 50%): When agreement rates are below 50%, the corresponding ideal true accuracy is also below 50%. In this regime, agreement rates actually overestimate the true accuracy, because measurement errors can create false positive agreements that make the data appear more accurate than it actually is. For example, a dataset with 40% agreement might have an ideal true accuracy of only 30% or lower.

Limitations and Caveats

It is essential to understand the limitations of this approach:

  • The “Ideal” assumption and the accuracy range: This methodology frames accuracy as a range bounded by two measures:
    • Lower bound (Agreement Rate): What we can directly observe and measure through agreement between our data and attestation mechanisms.
    • Upper bound (Ideal True Accuracy): The best-case estimate assuming random, uncorrelated errors.
  • We make the “ideal” assumption (random, uncorrelated errors) not because we believe errors are necessarily random, but because we have no reliable way to measure systematic correlations in attestation systems. For instance, if hospital staff consistently provide incorrect information for certain doctors due to outdated internal records, we cannot verify this correlation without access to ground truth data. Since we lack ground truth, we cannot measure the extent of systematic correlations. If we tried to account for correlations, we would have to assume anything could be correlated, making it impossible to place meaningful bounds on accuracy. The true accuracy lies somewhere between the agreement rate (lower bound) and the Ideal True Accuracy (upper bound), with systematic errors pushing it toward the lower bound.
  • Measurement method dependence: Different attestation methods (phone, web, third-party) will yield different agreement rates and therefore different accuracy estimates for the same data.
  • Temporal decay: The decayed confidence model assumes exponential decay, which may not hold for all data types or time periods.
  • Not a guarantee: Ideal True Accuracy is an estimate, not a guarantee. It represents the best-case scenario under ideal assumptions.


Why “Guaranteed 95% Accuracy” Claims Are Problematic

Any vendor claiming “guaranteed 95% accuracy” faces a fundamental problem: they cannot prove it. Here’s why:

  1. No perfect measurement: Without a perfect verification mechanism, true accuracy cannot be directly measured. At best, they can report agreement rates.
  2. Methodology matters: Without transparency about their measurement methodology, you cannot evaluate whether their “95%” represents agreement rates, estimated accuracy, or something else entirely.
  3. The self-consistency test: If a vendor claims 95% accuracy but their own verification processes don’t achieve 95% self-agreement, their claim is mathematically inconsistent.
  4. Field and source variation: Accuracy varies by data field (address vs. phone vs. specialty) and data source. A single “95%” number obscures this variation.

A rigorous data provider should:

  • Clearly distinguish between agreement rates and estimated accuracy
  • Provide transparency about measurement methodologies
  • Acknowledge the limitations and assumptions underlying their estimates
  • Report accuracy metrics by field and source, not as a single aggregate number

Conclusion

Measuring data accuracy is fundamentally challenging because perfect measurement mechanisms do not exist. This document presents a rigorous, mathematically sound approach to estimating accuracy from observable agreement rates, while being transparent about assumptions and limitations.

The Ideal True Accuracy methodology provides:

  • A principled way to estimate true accuracy from agreement rates
  • Transparency about assumptions (random, uncorrelated errors)
  • Realistic expectations (agreement rates underestimate accuracy when accuracy > 50%, but overestimate when accuracy < 50%)
  • A framework for evaluating vendor claims and methodologies

Most importantly, this approach acknowledges that honest uncertainty is better than false precision. Rather than claiming impossible guarantees, we provide rigorous estimates with clear explanations of their limitations—exactly what you need to make informed business decisions about data quality.

So what does this mean for your organization?

In Part 2 of this series, Orderly President Kevin Krauth explains how healthcare leaders should evaluate accuracy claims—and how to determine what level of accuracy actually matters.

About Our Guest Author:

Aaron Beach is currently leading AI/ML products at Orderly Health, a provider data management platform recently acquired by First Choice Health. He enjoys tackling complex healthcare challenges with data-driven solutions. Over the past 18 years, he has applied these methods in renewable energy, mobile advertising, email marketing, and fraud detection, resulting in over 40 publications, 1,000+ citations, and 2 patents. In his spare time, Aaron enjoys brewing beer, designing board games, and spending time with his wife and five children.

Mathematical Appendix

Properties of the Transformation

Lemma 1. The Ideal True Accuracy transformation is continuous and monotonically increasing for \(c_d > 0.5\).

Proof. For \(c_d > 0.5\), we have:

\(\frac{d\theta}{dc_d} = \frac{1}{2\sqrt{2c_d - 1}} > 0\)

which is well-defined and positive, establishing both continuity and monotonicity.

Boundary Conditions

Proposition 1. The Ideal True Accuracy transformation satisfies:

  • \(\theta = 0.5\) when \(c_d = 0.5\) (no information case)

  • \(\theta = 1\) when \(c_d = 1\) (perfect agreement)

  • \(\theta\) increases monotonically with \(c_d\) for \(c_d \geq 0.5\)

Proof. Direct verification:

  • When \(c_d = 0.5\): \(\theta = \frac{1 + \sqrt{0}}{2} = 0.5\)

  • When \(c_d = 1\): \(\theta = \frac{1 + \sqrt{1}}{2} = 1\)

  • Monotonicity follows from the positive derivative established in the previous lemma.
