AI Engineering · Healthcare
End-to-end automation · higher accuracy than human coders
An LLM + RAG pipeline that parses HL7 ORU/MDM messages and clinical notes — X-rays, operative records, consultations — and auto-assigns ICD-10-CM, CPT-4, and HCPCS codes directly into the claims workflow at Crystal Clinic Orthopaedic Center.
Challenge
Manual medical coding created a bottleneck in the revenue cycle. Coders were processing X-ray reports, operative notes, and consultation records by hand — a time-intensive process prone to undercoding, overcoding, and claim denials that delayed revenue.
Approach
Built a four-stage NLP pipeline on AWS: HL7 message parsing, LLM-based clinical entity extraction, RAG lookup against the official ICD-10-CM, CPT-4, and HCPCS codesets, and a validation layer that integrates directly into the existing claims submission workflow.
Outcome
The pipeline handles end-to-end coding for all four document types without manual handoffs. Accuracy exceeds the previous human baseline. Coding turnaround time dropped by 20%, and claims now enter the submission queue automatically.
Crystal Clinic Orthopaedic Center is a high-volume orthopedic practice processing hundreds of patient encounters daily. Each encounter generates clinical documentation — X-ray interpretation reports, operative notes, consultation records, and discharge summaries — that must be translated into standardized billing codes before a claim can be submitted. This translation work, medical coding, was being done manually by a team of certified coders working from printed and PDF documents.
The volume was growing faster than the team could scale. Backlogs were accumulating, particularly for complex orthopedic procedures requiring multiple CPT-4 codes across a single operative note. Claim denials from miscoded procedures were consuming additional time in rework. The revenue cycle team had identified coding accuracy and throughput as the primary constraint on their cash flow — and they needed an automated solution that could match or exceed the accuracy of their experienced coders, not just handle simple cases.
General-purpose language models can extract clinical entities from text, but medical coding is not entity extraction. It requires mapping clinical descriptions to the correct code within a codeset that contains tens of thousands of entries, where the difference between a correct and incorrect code can hinge on a single qualifier (laterality, acuity, encounter type). A model that "knows" the ICD-10 system only from training data can hallucinate codes that sound plausible but don't exist, or select a code at the wrong specificity level. Either error results in a claim denial.
The additional constraint was the input format. Crystal Clinic's clinical documentation arrived via HL7 ORU and MDM messages — a structured but non-trivial format that encodes not just the clinical text but metadata about the document type, ordering provider, and patient encounter. Any solution had to parse this format reliably before it could do anything with the clinical content. The pipeline needed to handle the full chain: message parsing, entity extraction, code lookup, validation, and claims system integration — not just the LLM layer in the middle.
Adimen designed and built the full pipeline as a serverless architecture on AWS, deployed and integrated with Crystal Clinic's existing RCM system over an 8-week engagement:
An AWS Lambda function receives HL7 ORU and MDM messages via API Gateway. A custom parser extracts the clinical note text, document type, encounter metadata, and patient identifiers from the HL7 segments. The structured output is normalized into a canonical JSON format that the downstream pipeline consumes consistently regardless of source system or message version. Malformed messages are routed to a review queue rather than failing silently.
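As a rough illustration of the parsing stage, the sketch below splits a pipe-delimited HL7 v2 message into segments and normalizes a few fields into one dictionary. It is not Crystal Clinic's parser. The field choices (MSH-9, MSH-10, PID-3, PV1-19, OBR-4, TXA-2, OBX-5), the output key names, the sample message, and the `MalformedHL7Error` class are all assumptions made for the example. HL7 escape sequences and repeating fields are ignored.

```python
class MalformedHL7Error(ValueError):
    """Raised for unparseable messages so the caller can queue them for review."""


def split_segments(raw):
    # HL7 v2 separates segments with carriage returns; tolerate newlines too.
    lines = [ln for ln in raw.replace("\n", "\r").split("\r") if ln.strip()]
    if not lines or not lines[0].startswith("MSH") or len(lines[0]) < 4:
        raise MalformedHL7Error("message does not start with an MSH segment")
    sep = lines[0][3]  # MSH-1 is the field separator itself, normally "|"
    return [ln.split(sep) for ln in lines]


def field(seg, index):
    # Return the raw field at this list index, or "" when it is absent.
    return seg[index] if seg is not None and len(seg) > index else ""


def component(value, index):
    # Return one "^"-delimited component of a field, or "" when it is absent.
    parts = value.split("^")
    return parts[index] if len(parts) > index else ""


def parse_hl7(raw):
    segments = split_segments(raw)
    first = {}
    for seg in segments:
        first.setdefault(seg[0], seg)  # keep the first segment of each type
    # Because MSH-1 is the separator, MSH-n sits at list index n-1.
    message_type = component(field(first["MSH"], 8), 0)
    if message_type not in ("ORU", "MDM"):
        raise MalformedHL7Error(f"unsupported message type: {message_type!r}")
    if "PID" not in first:
        raise MalformedHL7Error("missing PID segment")
    if message_type == "ORU":
        obr4 = field(first.get("OBR"), 4)  # universal service identifier
        document_type = component(obr4, 1) or component(obr4, 0)
    else:
        document_type = field(first.get("TXA"), 2)  # TXA-2 document type
    note_lines = [field(s, 5) for s in segments if s[0] == "OBX" and field(s, 5)]
    if not note_lines:
        raise MalformedHL7Error("no OBX text found")
    return {
        "message_type": message_type,
        "control_id": field(first["MSH"], 9),
        "patient_id": component(field(first["PID"], 3), 0),
        "visit_number": component(field(first.get("PV1"), 19), 0),
        "document_type": document_type,
        "note_text": "\n".join(note_lines),
    }


# Hypothetical sample message; every identifier here is made up.
SAMPLE_ORU = "\r".join([
    "MSH|^~\\&|RIS|FAC|CODER|FAC|20240101120000||ORU^R01|MSG0001|P|2.5",
    "PID|1||12345^^^FAC^MR||DOE^JANE",
    "PV1|1|O" + "|" * 17 + "V987",
    "OBR|1|||XRKNEE^XR KNEE 2 VIEWS^L",
    "OBX|1|TX|REPORT^Report text^L||Right knee, two views.",
    "OBX|2|TX|REPORT^Report text^L||No acute fracture.",
])
```

A caller would wrap `parse_hl7` in a `try`/`except MalformedHL7Error` block and send the raw message to the review queue on failure, which keeps bad input visible instead of dropping it.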
The parsed clinical text is processed by GPT-4o via the Azure OpenAI endpoint. A carefully engineered system prompt instructs the model to extract diagnoses, procedures, anatomical sites, laterality, acuity indicators, and encounter type from the note — outputting a structured JSON object rather than free text. The prompt includes few-shot examples drawn from Crystal Clinic's own historical cases to ground the model's output in the practice's documentation style. Output is validated against a JSON schema before proceeding.
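The schema check on the model's output might look roughly like the following. This is a hand-rolled stand-in for a JSON Schema validator, written with the standard library only. The field names (`diagnoses`, `procedures`, `anatomical_sites`, `acuity_indicators`, `laterality`, `encounter_type`) and the allowed laterality values are assumptions based on the entity list above, not the production schema, and the model call itself is omitted.

```python
import json

# Assumed field names and allowed values; the production schema is not public.
LIST_FIELDS = ("diagnoses", "procedures", "anatomical_sites", "acuity_indicators")
SCALAR_FIELDS = ("laterality", "encounter_type")
ALLOWED_LATERALITY = {"left", "right", "bilateral", "unspecified"}


class ExtractionError(ValueError):
    """Raised when the model output does not match the expected structure."""


def validate_extraction(raw):
    # Step 1: the output must be parseable JSON with an object at the top level.
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ExtractionError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ExtractionError("top-level value must be an object")

    # Step 2: reject keys the downstream stages do not know about.
    unexpected = set(data) - set(LIST_FIELDS) - set(SCALAR_FIELDS)
    if unexpected:
        raise ExtractionError(f"unexpected keys: {sorted(unexpected)}")

    # Step 3: every list field must be present and hold only strings.
    for name in LIST_FIELDS:
        value = data.get(name)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ExtractionError(f"{name} must be a list of strings")

    # Step 4: scalar fields must be strings drawn from the expected vocabulary.
    laterality = data.get("laterality")
    if not isinstance(laterality, str) or laterality not in ALLOWED_LATERALITY:
        raise ExtractionError(f"laterality must be one of {sorted(ALLOWED_LATERALITY)}")
    encounter_type = data.get("encounter_type")
    if not isinstance(encounter_type, str) or not encounter_type:
        raise ExtractionError("encounter_type must be a non-empty string")
    return data


# Hypothetical model output for a knee consultation note.
SAMPLE_OUTPUT = json.dumps({
    "diagnoses": ["primary osteoarthritis"],
    "procedures": [],
    "anatomical_sites": ["knee"],
    "acuity_indicators": ["chronic"],
    "laterality": "right",
    "encounter_type": "outpatient",
})
```

Rejecting unexpected keys as well as missing ones is a deliberate choice in this sketch: a model that starts emitting extra fields is usually drifting from the prompt, and it is cheaper to catch that here than in the code lookup.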
Extracted clinical entities are matched against a vector index of the full ICD-10-CM, CPT-4, and HCPCS codesets stored in Pinecone. Rather than relying on the LLM's parametric knowledge of codes, the RAG layer retrieves the top-k candidate codes for each extracted entity, including their official descriptors and inclusion/exclusion notes. A second LLM call selects the correct code from the retrieved candidates, grounded in the official codeset text. Because the final code is chosen from retrieved codeset entries rather than generated from memory, this approach guards against hallucinated codes and keeps specificity aligned with the documented clinical detail.
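The retrieve-then-select pattern can be sketched without any external service. In the example below, a bag-of-words cosine similarity over five ICD-10-CM descriptors stands in for the Pinecone embedding index, and `select_code` stands in for the check applied to the second LLM call's answer. Both are simplifications assumed for illustration, and the function names are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny excerpt of ICD-10-CM used as a toy index; the real index holds the full codesets.
CODESET = {
    "M17.0": "Bilateral primary osteoarthritis of knee",
    "M17.11": "Unilateral primary osteoarthritis, right knee",
    "M17.12": "Unilateral primary osteoarthritis, left knee",
    "M25.561": "Pain in right knee",
    "M25.562": "Pain in left knee",
}


def vectorize(text):
    # Bag-of-words term counts; a stand-in for a learned embedding.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(entity, k=3):
    # Score every descriptor against the extracted entity and keep the top k.
    query = vectorize(entity)
    scored = [(cosine(query, vectorize(desc)), code, desc) for code, desc in CODESET.items()]
    scored.sort(key=lambda row: -row[0])
    return [{"code": c, "descriptor": d, "score": s} for s, c, d in scored[:k]]


def select_code(candidates, model_choice):
    # Accept the model's answer only if it is one of the retrieved candidates.
    allowed = {c["code"] for c in candidates}
    if model_choice not in allowed:
        raise ValueError(f"{model_choice!r} is not among the retrieved candidates")
    return model_choice


candidates = retrieve("primary osteoarthritis right knee")
chosen = select_code(candidates, "M17.11")  # "M17.11" plays the role of the LLM's pick
```

The guard in `select_code` is what turns retrieval into a hard constraint: even if the second model call returns a code that looks plausible, it is rejected unless it came from the codeset lookup.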
Assigned codes are run through a rules-based validation layer that checks for known invalid code combinations, missing required qualifiers, and codes that don't match the encounter type. Validated code sets are written to RDS and pushed to Crystal Clinic's claims management system via a webhook. Cases that fail validation — typically fewer than 5% — are flagged for human review with the specific validation failure noted. The full audit trail from HL7 message to assigned codes is stored for compliance and QA purposes.
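A rules layer of this kind could be structured as below. The three checks mirror the ones named above (invalid combinations, missing qualifiers, encounter-type mismatches), but the rule tables are purely illustrative and are not real payer or coding edits. `CodedEncounter`, `validate`, `route`, and the placeholder code `EXAMPLE-INPATIENT-ONLY` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CodedEncounter:
    encounter_type: str  # e.g. "outpatient"
    laterality: str      # as extracted from the note
    codes: list          # codes assigned by the lookup stage


# Illustrative rule tables only; real edits come from coding guidelines and payers.
INVALID_PAIRS = [frozenset({"M17.0", "M17.11"}), frozenset({"M17.0", "M17.12"})]
CODE_LATERALITY = {"M17.0": "bilateral", "M17.11": "right", "M17.12": "left"}
ENCOUNTER_RESTRICTED = {"EXAMPLE-INPATIENT-ONLY": {"inpatient"}}


def validate(enc):
    """Return a list of human-readable failures; an empty list means the codes pass."""
    failures = []
    present = set(enc.codes)
    # Check 1: code pairs that must not appear together.
    for pair in INVALID_PAIRS:
        if pair <= present:
            failures.append("invalid combination: " + " + ".join(sorted(pair)))
    for code in enc.codes:
        # Check 2: side-specific codes need a documented, matching laterality.
        side = CODE_LATERALITY.get(code)
        if side and enc.laterality == "unspecified":
            failures.append(f"missing qualifier: {code} needs documented laterality")
        elif side and side != enc.laterality:
            failures.append(f"laterality mismatch: {code} is {side}, note says {enc.laterality}")
        # Check 3: some codes are only valid for certain encounter types.
        allowed = ENCOUNTER_RESTRICTED.get(code)
        if allowed and enc.encounter_type not in allowed:
            failures.append(f"encounter mismatch: {code} not valid for {enc.encounter_type}")
    return failures


def route(enc):
    # Passing encounters go to claims submission; failures go to human review
    # with the specific reasons attached.
    failures = validate(enc)
    return ("human_review", failures) if failures else ("submit", [])
```

Returning the full list of failures, rather than stopping at the first one, lets a reviewer see every problem with a flagged case in a single pass.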
Architecture
HL7 messages enter at the left, get parsed and classified, run through GPT-4o (prompted with few-shot examples) for clinical entity extraction, are matched against the official codeset via RAG, validated against coding rules and payer logic, and submitted directly into the claims system. Apart from the GPT-4o calls, which go through the Azure OpenAI endpoint, the pipeline runs on AWS infrastructure.