Your engineers are not lab data specialists (and they shouldn’t have to be)

Consider the following scenario: a Series A healthtech closes a USD 8 million round. The CTO decides to build the lab data ingestion module in-house: parse PDF reports, normalize nomenclatures, convert units, map LOINC codes. Eight months later, the runway has shrunk, the engineering team is exhausted, and the system is half-finished because the complexity of clinical data was underestimated from day one. The details are illustrative, but the pattern is not. As Roman Zomko documented in his build-vs-buy analysis for Impakter, this is one of the most common ways healthtechs burn runway on infrastructure complexity that could have been avoided.

The root of the problem is a misconception: that ingesting lab data is a generic software engineering problem. It isn't. The LOINC system (Logical Observation Identifiers Names and Codes), the universal standard for identifying laboratory tests, contains approximately 86,000 entries. Each entry is defined by six components: the biomarker measured, the property observed, the specimen type, the method used, the scale, and the measurement timing. Mapping a laboratory's local code to the correct LOINC code requires clinical knowledge, not just programming logic. Research published by the Agency for Healthcare Research and Quality (AHRQ) on LOINC adoption found that across hospitals in the Indianapolis area, over a 12-month period, 4,000 distinct local codes generated nearly 49 million clinical results. Just 80 codes (2% of the total) accounted for 80% of results, and 784 codes (19%) covered 99%. This suggests the mapping can start with a small subset, but it also reveals the true scale of the problem: the long tail (the remaining 81% of codes) must still be handled for the system to function in production with clinical reliability.
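The six-axis structure and the coverage math above can be sketched in a few lines. The dataclass below is a hypothetical model of a LOINC entry (field values follow the published axes for LOINC 2345-7, glucose in serum or plasma); the tier figures are the AHRQ Indianapolis numbers cited above, and the calculation shows how many results per year a partial mapping still misses:

```python
from dataclasses import dataclass

# Hypothetical model of the six LOINC axes described above.
@dataclass(frozen=True)
class LoincCode:
    component: str  # biomarker measured, e.g. "Glucose"
    property: str   # property observed, e.g. "MCnc" (mass concentration)
    timing: str     # measurement timing, e.g. "Pt" (point in time)
    system: str     # specimen type, e.g. "Ser/Plas"
    scale: str      # scale, e.g. "Qn" (quantitative)
    method: str     # method used (often empty for methodless codes)

# LOINC 2345-7: Glucose [Mass/volume] in Serum or Plasma
glucose = LoincCode("Glucose", "MCnc", "Pt", "Ser/Plas", "Qn", "")

# AHRQ Indianapolis figures cited above: 4,000 local codes, ~49M results/year.
TOTAL_RESULTS = 49_000_000
tiers = [(80, 0.80), (784, 0.99), (4_000, 1.00)]  # (codes mapped, share of results)
for codes, share in tiers:
    missed = TOTAL_RESULTS * (1 - share)
    print(f"{codes:>5} codes mapped -> {share:.0%} of results, "
          f"{missed:,.0f} results/year still unmapped")
```

Even at 99% coverage, roughly half a million results per year fall outside the mapping, which is why the long tail cannot be ignored in production.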

Beyond LOINC mapping, lab data ingestion requires handling heterogeneous formats. Most test results still circulate as PDFs, a format designed for human viewing, not computational processing. Extracting structured data from a lab report PDF is a clinical NLP problem involving layout recognition, table interpretation, reference range identification, and nomenclature disambiguation. When a healthtech decides to solve this internally, it's not building a feature. It's building a company inside the company.

The cost of this decision is measurable. According to the West Health Institute, the U.S. healthcare system could save over USD 30 billion annually by improving medical device and data interoperability alone, with redundant testing and manual data entry as primary cost drivers. At the integration level, Taction's 2026 EHR Integration Cost Guide estimates that a single bidirectional FHIR connection costs between USD 15,000 and USD 150,000 per platform, with timelines of 10 to 18 weeks per integration. For a healthtech building its own lab data layer (which involves not one integration but an entire pipeline of PDF parsing, LOINC mapping, unit normalization, reference range interpretation, and FHIR conversion), the cost compounds quickly across each of those layers. Add to that the USD 3.61 per line of legacy code maintenance documented by MagmaLabs in healthcare systems, and the five-year total cost of ownership becomes a significant share of a startup's runway. For a team of 10 to 20 engineers, this kind of infrastructure work can consume 25% to 50% of engineering capacity on a problem that is not the core business.
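A back-of-envelope calculation using the figures cited above illustrates the scale. The per-connection and per-line costs come from the Taction and MagmaLabs sources in the text; the number of integrations and the codebase size are hypothetical assumptions for the sketch:

```python
# Figures from the sources cited in the text.
FHIR_COST_LOW, FHIR_COST_HIGH = 15_000, 150_000  # USD per bidirectional FHIR connection
MAINT_PER_LINE = 3.61                            # USD per line of legacy code

# Hypothetical assumptions for a mid-size healthtech.
integrations = 5        # assumed lab/EHR connections
pipeline_loc = 120_000  # assumed size of a custom ingestion codebase

build_low = integrations * FHIR_COST_LOW
build_high = integrations * FHIR_COST_HIGH
maintenance = pipeline_loc * MAINT_PER_LINE

print(f"Integration build-out: USD {build_low:,} to USD {build_high:,}")
print(f"Legacy maintenance exposure: USD {maintenance:,.0f}")
```

Even under these modest assumptions, the combined exposure reaches well into seven figures before a single differentiating feature ships.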

The problem isn't just financial. It's strategic. Every sprint spent on data infrastructure is a sprint not invested in the core product, the functionality that differentiates the healthtech in the market and justifies the next funding round. Healthtechs that fall into this trap reach the market later, with fewer core product features and runway consumed by complexity that could have been outsourced. The 2025–26 investment landscape reinforces this urgency. The global digital health sector surpassed USD 400 billion and is projected to reach USD 1 trillion by 2030. At the same time, investors are more selective: in the first half of 2025, USD 8.2 billion was invested in medtech across 421 deals, with rising round sizes but declining deal counts. Capital is available, but concentrated in startups that demonstrate operational efficiency and the ability to scale on reliable infrastructure.

Interoperability is no longer a technical differentiator. It's a due diligence criterion. Funds such as NEA, Oak HC/FT, and corporate venture arms from Medtronic, J&J, and Philips are requiring FHIR R4 compatibility, structured data quality, and integration capabilities as non-negotiable requirements. A healthtech that presents semi-structured lab data (or data parsed with proprietary logic without clinical validation) loses technical credibility with investors who have learned to ask about the data layer.

The complexity of lab data also limits scalability. A healthtech that builds its ingestion module for one country or one lab network discovers, upon expanding, that the mapping doesn't transfer. Local nomenclatures vary between regions, units of measurement change by country, and reference ranges depend on population and equipment calibration. What worked for 50 labs breaks silently when the network reaches 500. And the engineering team that should be optimizing the product is debugging PDF parsers.
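The unit problem alone shows why this is clinical chemistry, not string replacement. Converting mg/dL to mmol/L requires each analyte's molar mass; the molar masses below are standard reference values, and the function is an illustrative sketch:

```python
# Unit normalization is analyte-specific: mg/dL -> mmol/L requires the
# molar mass of each analyte. Values below are standard reference values.
MOLAR_MASS_G_PER_MOL = {
    "glucose": 180.156,
    "cholesterol": 386.65,
    "creatinine": 113.12,
}

def mg_dl_to_mmol_l(analyte: str, value_mg_dl: float) -> float:
    # mg/dL -> mmol/L: (value * 10) / molar mass (g/mol)
    return value_mg_dl * 10 / MOLAR_MASS_G_PER_MOL[analyte]

print(round(mg_dl_to_mmol_l("glucose", 95), 2))  # ~5.27 mmol/L
```

Every new analyte, and every new market that reports in different units, adds another entry to maintain, which is how a "simple converter" grows into a clinical reference table.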

Legacy healthcare systems are also notoriously rigid. Healthtechs that build custom integrators inherit part of this complexity, and the per-line maintenance cost cited earlier compounds into technical debt that erodes engineering agility for years. The cycle is predictable: the healthtech starts with a basic parser, adds normalization rules as errors surface, hires someone with clinical terminology experience, and at some point realizes it is maintaining an entire subsystem dedicated to an infrastructure problem. The build-vs-buy logic answers this directly: build what differentiates your product; buy what is infrastructure. No healthtech differentiates itself by building a better PDF parser. It differentiates itself by what it does with the data once it is clean, structured, and ready for consumption.

The solution isn't to ignore the complexity of lab data. It's to recognize it as a specialized infrastructure problem and treat it accordingly. Just as no startup builds its own hosting, authentication, or payment processing, the lab data ingestion, normalization, and harmonization layer is a solved problem that can be acquired as a service. This frees the engineering team to focus on what actually matters: the product the market is willing to pay for. It is within this context that OpenHealth Technologies is positioned. The OpenHealth Lab API automatically correlates multiple data streams with rigorously validated logical layers of laboratory tests, covering over 8,000 biomarkers with native LOINC mapping. The platform ingests data in any format (including PDFs, XML, HL7, FHIR, and manual input) and delivers structured, semantically harmonized data ready for consumption by any downstream application. For healthtechs, this means eliminating months of data infrastructure engineering and redirecting that capacity to the core product, with the confidence that the data layer meets clinical and interoperability requirements from day one.
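For a sense of what "structured, semantically harmonized" lab data looks like downstream, here is a generic FHIR R4 Observation with LOINC coding and UCUM units. This is an illustration of the standard's shape, not OpenHealth's actual output schema:

```python
# Generic FHIR R4 Observation for a glucose result, with LOINC coding
# and UCUM units. Illustrative of the standard, not any vendor's schema.
observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {
        "coding": [{
            "system": "http://loinc.org",
            "code": "2345-7",
            "display": "Glucose [Mass/volume] in Serum or Plasma",
        }]
    },
    "valueQuantity": {
        "value": 95,
        "unit": "mg/dL",
        "system": "http://unitsofmeasure.org",
        "code": "mg/dL",
    },
    "referenceRange": [{
        "low": {"value": 70, "unit": "mg/dL"},
        "high": {"value": 99, "unit": "mg/dL"},
    }],
}
print(observation["code"]["coding"][0]["display"])
```

Once results arrive in this form, any downstream application can consume them without knowing which lab, format, or local nomenclature they came from.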

Learn how your healthtech can eliminate lab data complexity and accelerate time to market.