#NLP #LLM #Healthcare #AWS #RAG

AI Engineering · Healthcare

Text to codes: LLM-powered medical coding automation for an orthopedic center

End-to-end automation · higher accuracy than human coders

An LLM + RAG pipeline that parses HL7 ORU/MDM messages and clinical notes — X-rays, operative records, consultations — and auto-assigns ICD-10-CM, CPT-4, and HCPCS codes directly into the claims workflow at Crystal Clinic Orthopaedic Center.

Challenge

Manual medical coding created a bottleneck in the revenue cycle. Coders were processing X-ray reports, operative notes, and consultation records by hand — a time-intensive process prone to undercoding, overcoding, and claim denials that delayed revenue.

Approach

Built a four-stage NLP pipeline on AWS: HL7 message parsing, LLM-based clinical entity extraction, RAG lookup against the official ICD-10-CM, CPT-4, and HCPCS codesets, and a validation layer that integrates directly into the existing claims submission workflow.

Outcome

The pipeline handles end-to-end coding for all four document types without manual handoffs. Accuracy exceeds the previous human baseline. Coding turnaround time dropped by 20%, and claims now enter the submission queue automatically.

Crystal Clinic Orthopaedic Center: a coding bottleneck inside the revenue cycle.

Crystal Clinic Orthopaedic Center is a high-volume orthopedic practice processing hundreds of patient encounters daily. Each encounter generates clinical documentation — X-ray interpretation reports, operative notes, consultation records, and discharge summaries — that must be translated into standardized billing codes before a claim can be submitted. This translation work, medical coding, was being done manually by a team of certified coders working from printed and PDF documents.

The volume was growing faster than the team could scale. Backlogs were accumulating, particularly for complex orthopedic procedures requiring multiple CPT-4 codes across a single operative note. Claim denials from miscoded procedures were consuming additional time in rework. The revenue cycle team had identified coding accuracy and throughput as the primary constraints on cash flow. They needed an automated solution that could match or exceed the accuracy of their experienced coders, not just handle simple cases.

Off-the-shelf LLMs aren't enough for medical coding.

General-purpose language models can extract clinical entities from text — but medical coding is not entity extraction. It requires mapping clinical descriptions to the correct code within a codeset that contains tens of thousands of entries, where the difference between a correct and incorrect code can hinge on a single qualifier (laterality, acuity, encounter type). A model that "knows" the ICD-10 system from training data will hallucinate codes that sound plausible but don't exist, or select a code at the wrong specificity level — both of which result in claim denials.

The additional constraint was the input format. Crystal Clinic's clinical documentation arrived via HL7 ORU and MDM messages — a structured but non-trivial format that encodes not just the clinical text but metadata about the document type, ordering provider, and patient encounter. Any solution had to parse this format reliably before it could do anything with the clinical content. The pipeline needed to handle the full chain: message parsing, entity extraction, code lookup, validation, and claims system integration — not just the LLM layer in the middle.

A four-stage NLP pipeline on AWS — from HL7 message to submitted claim.

Adimen designed and built the full pipeline as a serverless architecture on AWS, deployed and integrated with Crystal Clinic's existing RCM system over an 8-week engagement:

Stage 1: HL7 message parsing

An AWS Lambda function receives HL7 ORU and MDM messages via API Gateway. A custom parser extracts the clinical note text, document type, encounter metadata, and patient identifiers from the HL7 segments. The structured output is normalized into a canonical JSON format that the downstream pipeline consumes consistently regardless of source system or message version. Malformed messages are routed to a review queue rather than failing silently.
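As a rough illustration of the parsing step, the sketch below splits an HL7 v2 message into segments and normalizes a few fields into a flat dictionary. It is not the production parser: the function name `parse_hl7`, the `MalformedHL7Error` class, the output keys, and the choice of TXA-2 or OBR-4 as the document type source are all assumptions made for the example, and HL7 escape sequences and repetitions are ignored.

```python
SUPPORTED_TYPES = {"ORU", "MDM"}


class MalformedHL7Error(ValueError):
    """Raised for messages that should go to the review queue (hypothetical)."""


def _field(segment, index):
    """Return field `index` of a split segment, or '' if it is absent."""
    return segment[index] if len(segment) > index else ""


def parse_hl7(raw):
    # HL7 v2 terminates segments with carriage returns; tolerate newlines too.
    lines = [ln.strip() for ln in raw.replace("\n", "\r").split("\r") if ln.strip()]
    if not lines or not lines[0].startswith("MSH|"):
        raise MalformedHL7Error("message does not start with an MSH segment")

    # Group segments by their three-letter ID (MSH, PID, OBR, OBX, TXA, ...).
    segments = {}
    for ln in lines:
        fields = ln.split("|")
        segments.setdefault(fields[0], []).append(fields)

    msh = segments["MSH"][0]
    # Splitting on '|' puts MSH-2 at index 1, so MSH-9 (message type) is index 8.
    message_type = _field(msh, 8).split("^")[0]
    if message_type not in SUPPORTED_TYPES:
        raise MalformedHL7Error(f"unsupported message type: {message_type!r}")

    # OBX-5 carries the observation value; note text may span several OBX segments.
    note_lines = [_field(obx, 5) for obx in segments.get("OBX", [])]
    note_text = "\n".join(t for t in note_lines if t)
    if not note_text:
        raise MalformedHL7Error("no clinical text found in OBX-5")

    # Assumption: document type comes from TXA-2 (MDM) or, failing that, OBR-4 (ORU).
    if "TXA" in segments:
        doc_type = _field(segments["TXA"][0], 2)
    else:
        doc_type = _field(segments.get("OBR", [[]])[0], 4)

    pid = segments.get("PID", [[]])[0]
    return {
        "message_type": message_type,
        "hl7_version": _field(msh, 11),           # MSH-12
        "document_type": doc_type.split("^")[0],
        "patient_id": _field(pid, 3).split("^")[0],  # first component of PID-3
        "note_text": note_text,
    }
```

A caller would wrap `parse_hl7` in a `try`/`except MalformedHL7Error` block and send the raw message to the review queue on failure, which keeps bad input visible instead of dropping it.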

Stage 2: LLM-based clinical entity extraction

The parsed clinical text is processed by GPT-4o via the Azure OpenAI endpoint. A carefully engineered system prompt instructs the model to extract diagnoses, procedures, anatomical sites, laterality, acuity indicators, and encounter type from the note — outputting a structured JSON object rather than free text. The prompt includes few-shot examples drawn from Crystal Clinic's own historical cases to ground the model's output in the practice's documentation style. Output is validated against a JSON schema before proceeding.
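The output check at the end of this stage could look something like the sketch below. The field names mirror the entities listed above, but the exact schema, the `validate_extraction` function, and the allowed laterality values are assumptions for illustration. A hand-rolled check stands in here for a full JSON Schema validator so the example stays dependency-free.

```python
import json

# Assumed shape of the extraction object: field name -> required Python type.
REQUIRED_FIELDS = {
    "diagnoses": list,
    "procedures": list,
    "anatomical_sites": list,
    "laterality": str,
    "acuity": str,
    "encounter_type": str,
}
LIST_FIELDS = ("diagnoses", "procedures", "anatomical_sites")
ALLOWED_LATERALITY = {"left", "right", "bilateral", "unspecified"}  # assumed vocabulary


def validate_extraction(raw_output):
    """Parse the model's reply and reject anything that is off-schema."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("model output must be a JSON object")

    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        if not isinstance(data[name], expected):
            raise ValueError(f"field {name} must be of type {expected.__name__}")

    # List fields must contain plain strings, not nested objects.
    for name in LIST_FIELDS:
        if not all(isinstance(item, str) for item in data[name]):
            raise ValueError(f"field {name} must contain only strings")

    if data["laterality"] not in ALLOWED_LATERALITY:
        raise ValueError(f"unexpected laterality: {data['laterality']!r}")
    return data
```

Rejecting off-schema output before the lookup stage means a malformed reply can be retried or sent to review rather than producing a partial code assignment.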

Stage 3: RAG against the official codeset

Extracted clinical entities are matched against a vector index of the full ICD-10-CM, CPT-4, and HCPCS codesets stored in Pinecone. Rather than relying on the LLM's parametric knowledge of codes, the RAG layer retrieves the top-k candidate codes for each extracted entity, including their official descriptors and inclusion/exclusion notes. A second LLM call selects the correct code from the retrieved candidates, grounded in the official codeset text. Because the final code must come from the retrieved candidates, the pipeline cannot emit a code that is absent from the codeset, and the retrieved descriptors help match specificity to the documented clinical detail.
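The retrieve-then-select pattern can be sketched in a few lines. Everything here is a stand-in: a five-entry in-memory dictionary replaces the Pinecone index, a bag-of-words vector replaces a real embedding model, and the second LLM call is reduced to a guard (`accept_selection`, a hypothetical name) that refuses any code outside the retrieved candidates.

```python
import math
import re
from collections import Counter

# Tiny illustrative subset of ICD-10-CM; the real index holds the full codesets.
CODESET = {
    "M17.0": "Bilateral primary osteoarthritis of knee",
    "M17.11": "Unilateral primary osteoarthritis, right knee",
    "M17.12": "Unilateral primary osteoarthritis, left knee",
    "M25.561": "Pain in right knee",
    "M25.562": "Pain in left knee",
}


def embed(text):
    """Toy bag-of-words 'embedding'; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, k=3):
    """Return the k (code, descriptor) pairs most similar to the query."""
    q = embed(query)
    ranked = sorted(
        CODESET.items(),
        key=lambda item: cosine(q, embed(item[1])),
        reverse=True,
    )
    return ranked[:k]


def accept_selection(selected_code, candidates):
    """Accept the selector's answer only if it is one of the retrieved codes."""
    allowed = {code for code, _ in candidates}
    if selected_code not in allowed:
        raise ValueError(f"{selected_code} is not among the retrieved candidates")
    return selected_code
```

The guard is the part that matters for hallucination control: whatever the selecting model returns, a code that was never retrieved from the codeset is rejected rather than passed downstream.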

Stage 4: Validation + claims integration

Assigned codes are run through a rules-based validation layer that checks for known invalid code combinations, missing required qualifiers, and codes that don't match the encounter type. Validated code sets are written to RDS and pushed to Crystal Clinic's claims management system via a webhook. Cases that fail validation — typically fewer than 5% — are flagged for human review with the specific validation failure noted. The full audit trail from HL7 message to assigned codes is stored for compliance and QA purposes.
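A minimal sketch of the three rule families described above follows. The rule tables are invented for illustration and are not official coding edits or Crystal Clinic's actual rules; `CodedEncounter`, `validate`, and the encounter type labels are likewise hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative rule tables only; real edits come from official and payer sources.
INVALID_PAIRS = [frozenset({"M17.0", "M17.11"})]  # bilateral vs. right-only knee OA
REQUIRES_LATERALITY = {"27447"}                   # CPT 27447: total knee arthroplasty
ALLOWED_ENCOUNTERS = {"27447": {"surgical"}}


@dataclass
class CodedEncounter:
    encounter_type: str
    codes: List[str]
    laterality: Optional[str] = None


def validate(encounter):
    """Return a list of human-readable failures; an empty list means pass."""
    failures = []
    present = set(encounter.codes)

    # Rule 1: known invalid code combinations.
    for pair in INVALID_PAIRS:
        if pair <= present:
            failures.append("invalid combination: " + " + ".join(sorted(pair)))

    for code in encounter.codes:
        # Rule 2: missing required qualifiers (here, laterality).
        if code in REQUIRES_LATERALITY and encounter.laterality not in {
            "left", "right", "bilateral"
        }:
            failures.append(f"missing laterality for {code}")
        # Rule 3: code does not match the encounter type.
        allowed = ALLOWED_ENCOUNTERS.get(code)
        if allowed is not None and encounter.encounter_type not in allowed:
            failures.append(
                f"{code} not valid for encounter type {encounter.encounter_type}"
            )
    return failures
```

Returning every failure message, rather than stopping at the first one, gives the human reviewer the specific reasons a case was flagged.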

Architecture

From HL7 message to coded claim — one automated pipeline.

HL7 messages are parsed and classified, run through GPT-4o with few-shot prompting for clinical entity extraction, matched against the official codeset via RAG, validated against coding rules and payer logic, and submitted directly into the claims system. Parsing, validation, storage, and claims integration run on AWS infrastructure; the extraction model is called through the Azure OpenAI endpoint.

End-to-end automation. Higher accuracy than human coders.

All three major code standards — ICD-10-CM, CPT-4, HCPCS — are assigned in a single pipeline pass, covering the full billing requirement for each encounter.

Four document types are handled end-to-end without manual intervention: X-ray reports, operative notes, consultation records, and discharge summaries.

−20% coding turnaround time vs. the manual baseline, measured across the first 90 days of production operation.

Zero manual handoffs in the standard path. Documents enter the pipeline as HL7 messages and exit as validated claim codes, with no human step required.

Tech stack

AWS Lambda · API Gateway · RDS · GPT-4o via Azure OpenAI · Pinecone

Automating clinical documents in your organisation?
Let's scope it together.