
Title: Real-time automated billing for tobacco treatment: developing and validating a scalable machine learning approach
Abstract
Objectives: To develop CigStopper, a real-time, automated medical billing prototype designed to identify eligible tobacco cessation care codes, thereby reducing administrative workload while improving billing accuracy.
Materials and Methods: ChatGPT prompt engineering generated a synthetic corpus of physician-style clinical notes categorized for CPT codes 99406/99407. Practicing clinicians annotated the dataset, which was used to train multiple machine learning (ML) models to predict billing code eligibility.
Results: Decision tree and random forest models performed best. Mean performance across all models was PRC AUC = 0.857 and F1 score = 0.835. Generalizability testing on deidentified notes confirmed that tree-based models performed best.
Discussion: CigStopper shows promise for streamlining the manual billing inefficiencies that hinder tobacco cessation care. Good performance on synthetic data lays the groundwork for clinical implementation of these ML methods. Automating high-volume, low-value tasks simplifies complexities in a multi-payer system and promotes financial sustainability for healthcare practices.
Conclusion: CigStopper validates foundational methods for automating the identification of appropriate billing codes for eligible smoking cessation counseling care.
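The tree-based approach described in the abstract can be sketched with a toy pipeline. This is a minimal illustration, not the authors' implementation: the TF-IDF features, the invented notes, and the labels below are all assumptions standing in for the study's synthetic, clinician-annotated corpus.

```python
# Hedged sketch: a tree-based classifier for 99406/99407 eligibility,
# in the spirit of CigStopper. Feature choice (TF-IDF) and the toy
# notes are illustrative assumptions, not the paper's pipeline.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Toy physician-style notes: 1 = tobacco cessation counseling
# documented (eligible for 99406/99407), 0 = not eligible.
notes = [
    "Counseled patient on smoking cessation for 5 minutes, discussed NRT.",
    "Tobacco cessation counseling provided, 12 minutes, patient agreeable.",
    "Patient denies tobacco use. Routine follow-up for hypertension.",
    "Annual exam, no counseling performed, labs ordered.",
    "Spent 4 minutes advising patient to quit smoking, handout given.",
    "Medication refill visit, no tobacco discussion documented.",
]
labels = [1, 1, 0, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(), RandomForestClassifier(random_state=0))
model.fit(notes, labels)

# Score a new, unseen note for billing-code eligibility.
pred = model.predict(["Provided 10 minutes of smoking cessation counseling."])
```

In a deployment like the one the abstract envisions, the predicted label would drive automatic suggestion of the 99406 (3-10 minutes) or 99407 (>10 minutes) code rather than a manual billing review.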
Award ID(s):
2451172
PAR ID:
10652571
Author(s) / Creator(s):
Publisher / Repository:
JAMIA Open
Date Published:
Journal Name:
JAMIA Open
Volume:
8
Issue:
3
ISSN:
2574-2531
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Objective: Develop workflows and billing processes for a Certified Diabetes Care and Education Specialist (CDCES)-led remote patient monitoring (RPM) program to transition the Teamwork, Targets, Technology, and Tight Control (4T) Study to our clinic's standard of care.
Methods: We convened stakeholders within a pediatric endocrinology clinic (hospital compliance, billing specialists, and clinical informatics) to identify, discuss, and approve billing codes and workflow. The group evaluated billing code stipulations, such as the timing of continuous glucose monitor (CGM) interpretation, scope of work, providers' licensing, and electronic health record (EHR) documentation, to meet billing compliance standards. We developed a CDCES workflow for asynchronous CGM interpretation and intervention and initiated an RPM billing pilot.
Results: We built a workflow for CGM interpretation (billing code: 95251) with the CDCES as the service provider. The workflow includes data review, patient communications, and documentation. Over the first month of the pilot, RPM billing codes were submitted for 52 patients. The average reimbursement per code occurrence was $110.33 for commercial insurance (60% of patients) and $46.95 for public insurance (40% of patients).
Conclusions: Continuous involvement of CDCES and hospital stakeholders was essential to operationalize all relevant aspects of clinical care, workflows, compliance, documentation, and billing. CGM interpretation with RPM billing allows CDCES to work at the top of their licensing credential, increases clinical care touch points, and provides a business case for expansion. As evidence of the clinical benefits of RPM increases, the processes developed here may facilitate broader adoption of revenue-generating CDCES-led care to fund RPM.
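The payer mix and per-code rates reported above imply a simple expected reimbursement per submitted 95251 code. A back-of-the-envelope sketch (the monthly figure assumes one code occurrence per patient in the pilot month, which the abstract does not state explicitly):

```python
# Expected reimbursement per 95251 code occurrence, blending the two
# reported payer rates by payer mix.
commercial_rate, commercial_share = 110.33, 0.60
public_rate, public_share = 46.95, 0.40

expected_per_code = commercial_rate * commercial_share + public_rate * public_share

# Rough first-month revenue, assuming one code per each of the
# 52 patients billed in the pilot (an illustrative assumption).
monthly_estimate = expected_per_code * 52
```

This yields roughly $85 per code occurrence, the kind of figure that underpins the "business case for expansion" the conclusions describe.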
  2. Abstract
Background: While most health-care providers now use electronic health records (EHRs) to document clinical care, many still treat them as digital versions of paper records. As a result, documentation often remains unstructured, with free-text entries in progress notes. This limits the potential for secondary use and analysis, as machine-learning and data analysis algorithms are more effective with structured data.
Objective: This study aims to use advanced artificial intelligence (AI) and natural language processing (NLP) techniques to improve diagnostic information extraction from clinical notes in a periodontal use case. By automating this process, the study seeks to reduce missing data in dental records and minimize the need for extensive manual annotation, a long-standing barrier to widespread NLP deployment in dental data extraction.
Materials and Methods: This research uses large language models (LLMs), specifically Generative Pretrained Transformer 4, to generate synthetic medical notes for fine-tuning a RoBERTa model. The model was trained to better interpret and process dental language, with particular attention to periodontal diagnoses. Model performance was evaluated by manually reviewing 360 clinical notes randomly selected from each participating site's dataset.
Results: The results demonstrated high accuracy of periodontal diagnosis data extraction, with sites 1 and 2 achieving weighted average scores of 0.97-0.98. This performance held across all dimensions of periodontal diagnosis: stage, grade, and extent.
Discussion: Synthetic data effectively reduced manual annotation needs while preserving model quality. Generalizability across institutions suggests viability for broader adoption, though future work is needed to improve contextual understanding.
Conclusion: The study highlights the potential transformative impact of AI and NLP on health-care research. Most clinical documentation (40%-80%) is free text; scaling our method could enhance clinical data reuse.
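The "weighted average score" reported above can be illustrated with scikit-learn's weighted F1, which averages per-class F1 weighted by each class's support. The toy stage labels below are invented for illustration; the study's 0.97-0.98 figures come from its own 360-note review, not from this sketch.

```python
# Weighted-average F1 over multi-class labels, as used when a metric
# must summarize performance across imbalanced diagnosis classes.
from sklearn.metrics import f1_score

# Toy gold vs. predicted periodontal stages for a handful of notes
# (hypothetical labels, for illustration only).
gold = ["I", "II", "III", "II", "IV", "I", "III", "II"]
pred = ["I", "II", "III", "III", "IV", "I", "III", "II"]

# Per-class F1 is computed for stages I-IV, then averaged with
# weights proportional to each stage's count in `gold`.
weighted_f1 = f1_score(gold, pred, average="weighted")
```

The same computation would apply separately to the stage, grade, and extent dimensions the abstract mentions.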
  3. Abstract
Objective: Leverage electronic health record (EHR) audit logs to develop a machine learning (ML) model that predicts which notes a clinician wants to review when seeing oncology patients.
Materials and Methods: We trained logistic regression models using note metadata and a Term Frequency-Inverse Document Frequency (TF-IDF) text representation. We evaluated performance with precision, recall, F1, AUC, and a qualitative clinical assessment.
Results: The metadata-only model achieved an AUC of 0.930, and the metadata-plus-TF-IDF model an AUC of 0.937. Qualitative assessment revealed a need for better text representation and for further customizing predictions to the user.
Discussion: Our model effectively surfaces the top 10 notes a clinician wants to review when seeing an oncology patient. Further studies can characterize different types of clinician users and better tailor the task to different care settings.
Conclusion: EHR audit logs can provide important relevance data for training ML models that assist with note-writing in the oncology setting.
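The TF-IDF plus logistic regression ranking idea above can be sketched in a few lines: score candidate notes by predicted review probability and surface the highest-scoring ones. The toy notes and labels are illustrative assumptions, and this omits the metadata features the study also used.

```python
# Hedged sketch of note ranking with TF-IDF + logistic regression.
# Audit-log labels (did the clinician open this note?) supervise
# the model; at inference, notes are sorted by predicted relevance.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

notes = [
    "Oncology progress note: cycle 3 chemotherapy, tolerating well.",
    "Dermatology consult: benign nevus, no follow-up needed.",
    "Oncology treatment plan updated after restaging CT.",
    "Podiatry note: routine nail care.",
]
reviewed = [1, 0, 1, 0]  # toy audit-log label: clinician opened the note

vec = TfidfVectorizer()
X = vec.fit_transform(notes)
clf = LogisticRegression().fit(X, reviewed)

# Rank notes by predicted review probability, highest first; the top
# of this ranking is what would be surfaced to the clinician.
scores = clf.predict_proba(X)[:, 1]
ranking = np.argsort(-scores)
```

A production version would score a patient's full note history and return only the top 10, as described in the discussion.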
  4. Cardiac rehabilitation (CR) is a medically supervised program designed to improve heart health after a cardiac event. Despite its demonstrated clinical benefits, CR participation among eligible patients remains poor due to low referral rates and individual barriers to care. This study evaluated CR participation by patients who receive care from hospital-integrated physicians compared with independent physicians and, subsequently, examined the relationship between CR and recurrent cardiac hospitalizations. This retrospective cohort study evaluated Medicare Part A and Part B claims data from calendar years 2016 to 2019; all analyses were conducted between January 1 and April 30, 2024. Patients were included if they had a qualifying event for CR between 2017 and 2018; qualifying events were identified using diagnosis codes on inpatient claims and procedure codes on outpatient and carrier claims. Eligible patients also had to be continuously enrolled in fee-for-service Medicare for 12 months or more before and after the index event. Physicians' integration status and patients' CR participation were determined during the 12-month follow-up period. Study covariates were ascertained during the 12 months before the index event.
Exposure: Hospital-integration status of the treating physician during follow-up.
Main Outcomes and Measures: Postindex CR participation was determined by qualifying procedure codes on outpatient and carrier claims.
Results: The study consisted of 28 596 Medicare patients eligible for CR. Their mean (SD) age was 74.0 (9.6) years; 16 839 (58.9%) were male. A total of 9037 patients (31.6%) were treated by a hospital-integrated physician, of whom 2995 (33.1%) received CR during follow-up. Logistic regression with propensity score weighting showed that having a hospital-integrated physician was associated with an 11% increase in the odds of receiving CR (odds ratio [OR], 1.11; 95% CI, 1.05-1.18). Additionally, CR participation was associated with a 14% decrease in the odds of recurrent cardiovascular-related hospitalizations (OR, 0.86; 95% CI, 0.81-0.91). The findings of this cohort study suggest that hospital integration has the potential to facilitate greater CR participation and improve heart care. Several factors may help explain this positive association, including enhanced care coordination and value-based payment policies. Further research is needed to assess the association of integration with other appropriate high-quality care activities.
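The reported odds ratios map back to logistic regression coefficients via OR = exp(beta), and a Wald 95% CI takes the form exp(beta ± 1.96·SE). The sketch below back-derives an approximate standard error from the reported interval purely to illustrate that relationship; it is not a reanalysis of the study data.

```python
# Relationship between a reported odds ratio, its 95% CI, and the
# underlying logistic regression coefficient (Wald construction).
import math

or_integration = 1.11          # reported OR for hospital integration
ci_low, ci_high = 1.05, 1.18   # reported 95% CI

beta = math.log(or_integration)

# Approximate SE back-derived from the CI width on the log scale
# (an illustration; small rounding in the published figures means
# this will not match the study's exact SE).
se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)

# Reconstructed interval, which should land close to the reported one.
recon_low = math.exp(beta - 1.96 * se)
recon_high = math.exp(beta + 1.96 * se)
```

The same arithmetic applies to the second finding: OR 0.86 corresponds to a coefficient of ln(0.86) ≈ -0.151, i.e., a 14% reduction in odds.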
  5. Importance: Large language models (LLMs) can assist in various health care activities, but current evaluation approaches may not adequately identify the most useful application areas.
Objective: To summarize existing evaluations of LLMs in health care in terms of 5 components: (1) evaluation data type, (2) health care task, (3) natural language processing (NLP) and natural language understanding (NLU) tasks, (4) dimension of evaluation, and (5) medical specialty.
Data Sources: A systematic search of PubMed and Web of Science was performed for studies published between January 1, 2022, and February 19, 2024.
Study Selection: Studies evaluating 1 or more LLMs in health care.
Data Extraction and Synthesis: Three independent reviewers categorized studies via keyword searches based on the data used, the health care tasks, the NLP and NLU tasks, the dimensions of evaluation, and the medical specialty.
Results: Of the 519 studies reviewed, only 5% used real patient care data for LLM evaluation. The most common health care tasks were assessing medical knowledge, such as answering medical licensing examination questions (44.5%), and making diagnoses (19.5%). Administrative tasks such as assigning billing codes (0.2%) and writing prescriptions (0.2%) were less studied. For NLP and NLU tasks, most studies focused on question answering (84.2%), while tasks such as summarization (8.9%) and conversational dialogue (3.3%) were infrequent. Almost all studies (95.4%) used accuracy as the primary dimension of evaluation; fairness, bias, and toxicity (15.8%), deployment considerations (4.6%), and calibration and uncertainty (1.2%) were infrequently measured. Finally, by medical specialty, most studies addressed generic health care applications (25.6%), internal medicine (16.4%), surgery (11.4%), and ophthalmology (6.9%), with nuclear medicine (0.6%), physical medicine (0.4%), and medical genetics (0.2%) the least represented.
Conclusions and Relevance: Existing evaluations of LLMs mostly focus on accuracy of question answering for medical examinations, without consideration of real patient care data. Dimensions such as fairness, bias, and toxicity, as well as deployment considerations, received limited attention. Future evaluations should adopt standardized applications and metrics, use clinical data, and broaden focus to include a wider range of tasks and specialties.