
Title: Physicians’ electronic inbox work patterns and factors associated with high inbox work duration
Abstract Objectives Electronic health record systems are increasingly used to send messages to physicians, but research on physicians’ inbox use patterns is limited. This study’s aims were to (1) quantify the time primary care physicians (PCPs) spend managing inboxes; (2) describe daily patterns of inbox use; (3) investigate which types of messages consume the most time; and (4) identify factors associated with inbox work duration. Materials and Methods We analyzed 1 month of electronic inbox data for 1275 PCPs in a large medical group and linked these data with physicians’ demographic data. Results PCPs spent an average of 52 minutes on inbox management on workdays, including 19 minutes (37%) outside work hours. Temporal patterns of electronic inbox use differed from other EHR functions such as charting. Patient-initiated messages (28%) and results (29%) accounted for the most inbox work time. PCPs with higher inbox work duration were more likely to be female (P < .001), have more patient encounters (P < .001), have older patients (P < .001), spend proportionally more time on patient messages (P < .001), and spend more time per message (P < .001). Compared with PCPs with the lowest duration of time on inbox work, PCPs with the highest duration had more message views per workday (200 vs 109; P < .001) and spent more time on the inbox outside work hours (30 minutes vs 9.7 minutes; P < .001). Conclusions Electronic inbox work by PCPs requires roughly an hour per workday, much of which occurs outside scheduled work hours. Interventions to assist PCPs in handling patient-initiated messages and results may help alleviate inbox workload.
Journal Name:
Journal of the American Medical Informatics Association
Page Range or eLocation-ID:
923 to 930
Sponsoring Org:
National Science Foundation
More Like this
  1. Background Increased work through electronic health record (EHR) messaging is frequently cited as a factor in physician burnout. However, studies to date have relied on anecdotal or self-reported measures, which limit the ability to match EHR use patterns with continuous stress patterns throughout the day. Objective The aim of this study is to collect EHR use and physiologic stress data through unobtrusive means that provide objective and continuous measures, cluster distinct patterns of EHR inbox work, identify physicians’ daily physiologic stress patterns, and evaluate the association between EHR inbox work patterns and physician physiologic stress. Methods Physicians were recruited from 5 medical centers. Participants (N=47) were given wrist-worn devices (Garmin Vivosmart 3) with heart rate sensors to wear for 7 days. The devices measured physiological stress throughout the day based on heart rate variability (HRV). Perceived stress was also measured with self-reports through experience sampling and a one-time survey. From the EHR system logs, the time attributed to different activities was quantified. By using a clustering algorithm, distinct inbox work patterns were identified and their associated stress measures were compared. The effects of EHR use on physician stress were examined using a generalized linear mixed effects model. Results Physicians spent an average of 1.08 hours doing EHR inbox work out of an average total EHR time of 3.5 hours. Patient messages accounted for most of the inbox work time (mean 37%, SD 11%). A total of 3 patterns of inbox work emerged: inbox work mostly outside work hours, inbox work mostly during work hours, and inbox work extending after hours that was mostly contiguous with work hours. Across these 3 groups, physiologic stress patterns showed 3 periods in which stress increased: in the first hour of work, early in the afternoon, and in the evening.
Physicians in group 1 had the longest average stress duration during work hours (80 out of 243 min of valid HRV data; P=.02), as measured by physiological sensors. Inbox work duration, the rate of EHR window switching (moving from one screen to another), the proportion of inbox work done outside of work hours, inbox work batching, and the day of the week were each independently associated with daily stress duration (marginal R2=15%). Individual-level random effects were significant and explained most of the variation in stress (conditional R2=98%). Conclusions This study is among the first to demonstrate associations between electronic inbox work and physiological stress. We identified 3 potentially modifiable factors associated with stress: EHR window switching, inbox work duration, and inbox work outside work hours. Organizations seeking to reduce physician stress may consider system-based changes to reduce EHR window switching or inbox work duration or the incorporation of inbox management time into work hours.
  2. Background Telemedicine as a mode of health care work has grown dramatically during the COVID-19 pandemic; the impact of this transition on clinicians’ after-hours electronic health record (EHR)–based clinical and administrative work is unclear. Objective This study assesses the impact of the transition to telemedicine during the COVID-19 pandemic on physicians’ EHR-based after-hours workload (ie, “work outside work”) at a large academic medical center in New York City. Methods We conducted an EHR-based retrospective cohort study of ambulatory care physicians providing telemedicine services before the pandemic, during the acute pandemic, and after the acute pandemic, relating EHR-based after-hours work to telemedicine intensity (ie, percentage of care provided via telemedicine) and clinical load (ie, patient load per provider). Results A total of 2129 physicians were included in this study. During the acute pandemic, the volume of care provided via telemedicine significantly increased for all physicians, whereas patient volume decreased. When normalized by clinical load (ie, average appointments per day by average clinical days per week), telemedicine intensity was positively associated with work outside work across time periods. This association was strongest after the acute pandemic. Conclusions Taking physicians’ clinical load into account, physicians who devoted a higher proportion of their clinical time to telemedicine throughout various stages of the pandemic engaged in higher levels of EHR-based after-hours work compared to those who used telemedicine less intensively. This suggests that telemedicine, as currently delivered, may be less efficient than in-person–based care and may increase the after-hours work burden of physicians.
  3. Abstract STUDY QUESTION To what extent does the use of mobile computing apps to track the menstrual cycle and the fertile window influence fecundability among women trying to conceive? SUMMARY ANSWER After adjusting for potential confounders, use of any of several different apps was associated with increased fecundability ranging from 12% to 20% per cycle of attempt. WHAT IS KNOWN ALREADY Many women are using mobile computing apps to track their menstrual cycle and the fertile window, including while trying to conceive. STUDY DESIGN, SIZE, DURATION The Pregnancy Study Online (PRESTO) is a North American prospective internet-based cohort of women who are aged 21–45 years, trying to conceive and not using contraception or fertility treatment at baseline. PARTICIPANTS/MATERIALS, SETTING, METHODS We restricted the analysis to 8363 women trying to conceive for no more than 6 months at baseline; the women were recruited from June 2013 through May 2019. Women completed questionnaires at baseline and every 2 months for up to 1 year. The main outcome was fecundability, i.e. the per-cycle probability of conception, which we assessed using self-reported data on time to pregnancy (confirmed by positive home pregnancy test) in menstrual cycles. On the baseline and follow-up questionnaires, women reported whether they used mobile computing apps to track their menstrual cycles (‘cycle apps’) and, if so, which one(s). We estimated fecundability ratios (FRs) for the use of cycle apps, adjusted for female age, race/ethnicity, prior pregnancy, BMI, income, current smoking, education, partner education, caffeine intake, use of hormonal contraceptives as the last method of contraception, hours of sleep per night, cycle regularity, use of prenatal supplements, marital status, intercourse frequency and history of subfertility. We also examined the impact of concurrent use of fertility indicators: basal body temperature, cervical fluid, cervix position and/or urine LH.
MAIN RESULTS AND THE ROLE OF CHANCE Among 8363 women, 6077 (72.7%) were using one or more cycle apps at baseline. A total of 122 separate apps were reported by women. We designated five of these apps before analysis as more likely to be effective (Clue, Fertility Friend, Glow, Kindara, Ovia; hereafter referred to as ‘selected apps’). The use of any app at baseline was associated with 20% increased fecundability, with little difference between selected apps versus other apps (selected apps FR (95% CI): 1.20 (1.13, 1.28); all other apps 1.21 (1.13, 1.30)). In time-varying analyses, cycle app use was associated with 12–15% increased fecundability (selected apps FR (95% CI): 1.12 (1.04, 1.21); all other apps 1.15 (1.07, 1.24)). When apps were used at baseline with one or more fertility indicators, there was higher fecundability than without fertility indicators (selected apps with indicators FR (95% CI): 1.23 (1.14, 1.34) versus without indicators 1.17 (1.05, 1.30); other apps with indicators 1.30 (1.19, 1.43) versus without indicators 1.16 (1.06, 1.27)). In time-varying analyses, results were similar when stratified by time trying at study entry (<3 vs. 3–6 cycles) or cycle regularity. For use of the selected apps, we observed higher fecundability among women with a history of subfertility: FR 1.33 (1.05–1.67). LIMITATIONS, REASONS FOR CAUTION Neither regularity nor intensity of app use was ascertained. The prospective time-varying assessment of app use was based on questionnaires completed every 2 months, which would not capture more frequent changes. Intercourse frequency was also reported retrospectively and we do not have data on timing of intercourse relative to the fertile window. Although we controlled for a wide range of covariates, we cannot exclude the possibility of residual confounding (e.g. choosing to use an app in this observational study may be a marker for unmeasured health habits promoting fecundability). 
Half of the women in the study received a free premium subscription for one of the apps (Fertility Friend), which may have increased the overall prevalence of app use in the time-varying analyses, but would not affect app use at baseline. Most women in the study were college educated, which may limit application of results to other populations. WIDER IMPLICATIONS OF THE FINDINGS Use of a cycle app, especially in combination with observation of one or more fertility indicators (basal body temperature, cervical fluid, cervix position and/or urine LH), may increase fecundability (per-cycle pregnancy probability) by about 12–20% for couples trying to conceive. We did not find consistent evidence of improved fecundability resulting from use of one specific app over another. STUDY FUNDING/COMPETING INTEREST(S) This research was supported by grants R21HD072326 and R01HD086742 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development, USA. In the last 3 years, Dr L.A.W. has served as a fibroid consultant. Dr L.A.W. has also received in-kind donations from Sandstone Diagnostics and Swiss Precision Diagnostics for primary data collection and participant incentives in the PRESTO cohort. Dr J.B.S. reports personal fees from Swiss Precision Diagnostics, outside the submitted work. The remaining authors have nothing to declare. TRIAL REGISTRATION NUMBER N/A.
  4. Obeid, Iyad; Selesnick, Ivan (Eds.)
    Electroencephalography (EEG) is a popular clinical monitoring tool used for diagnosing brain-related disorders such as epilepsy [1]. As monitoring EEGs in a critical-care setting is an expensive and tedious task, there is a great interest in developing real-time EEG monitoring tools to improve patient care quality and efficiency [2]. However, clinicians require automatic seizure detection tools that provide decisions with at least 75% sensitivity and less than 1 false alarm (FA) per 24 hours [3]. Some commercial tools recently claim to reach such performance levels, including the Olympic Brainz Monitor [4] and Persyst 14 [5]. In this abstract, we describe our efforts to transform a high-performance offline seizure detection system [3] into a low latency real-time or online seizure detection system. An overview of the system is shown in Figure 1. The main difference between an online versus offline system is that an online system should always be causal and has minimum latency which is often defined by domain experts. The offline system, shown in Figure 2, uses two phases of deep learning models with postprocessing [3]. The channel-based long short term memory (LSTM) model (Phase 1 or P1) processes linear frequency cepstral coefficients (LFCC) [6] features from each EEG channel separately. We use the hypotheses generated by the P1 model and create additional features that carry information about the detected events and their confidence. The P2 model uses these additional features and the LFCC features to learn the temporal and spatial aspects of the EEG signals using a hybrid convolutional neural network (CNN) and LSTM model. Finally, Phase 3 aggregates the results from both P1 and P2 before applying a final postprocessing step. The online system implements Phase 1 by taking advantage of the Linux piping mechanism, multithreading techniques, and multi-core processors.
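The pipe-based streaming above can be illustrated with a minimal frame reader. The sample layout (float32 little-endian, channel-major within each 0.1-second frame) is an assumption for illustration, not the system's actual wire format:

```python
import io
import struct

SAMPLE_RATE = 250          # Hz, per the abstract
FRAME_SECONDS = 0.1        # frame length used by the online system
N_CHANNELS = 22            # TCP montage yields 22 (or 20) channels
SAMPLES_PER_FRAME = int(SAMPLE_RATE * FRAME_SECONDS)  # 25 samples

def read_frames(stream):
    """Yield one [channel][sample] frame per 0.1-second chunk of the stream.

    Assumes float32 little-endian samples, channel-major within a frame
    (an illustrative format, not the actual stream layout).
    """
    frame_bytes = 4 * N_CHANNELS * SAMPLES_PER_FRAME
    while True:
        buf = stream.read(frame_bytes)
        if len(buf) < frame_bytes:
            return  # end of stream (or a truncated final frame)
        flat = struct.unpack(f"<{N_CHANNELS * SAMPLES_PER_FRAME}f", buf)
        yield [list(flat[c * SAMPLES_PER_FRAME:(c + 1) * SAMPLES_PER_FRAME])
               for c in range(N_CHANNELS)]
```

In the real system the stream would be `sys.stdin.buffer` fed by the signal preprocessor through a Linux pipe; an in-memory `io.BytesIO` works identically for testing.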
To convert Phase 1 into an online system, we divide the system into five major modules: signal preprocessor, feature extractor, event decoder, postprocessor, and visualizer. The system reads 0.1-second frames from each EEG channel and sends them to the feature extractor and the visualizer. The feature extractor generates LFCC features in real time from the streaming EEG signal. Next, the system computes seizure and background probabilities using a channel-based LSTM model and applies a postprocessor to aggregate the detected events across channels. The system then displays the EEG signal and the decisions simultaneously using a visualization module. The online system uses C++, Python, TensorFlow, and PyQtGraph in its implementation. The online system accepts streamed EEG data sampled at 250 Hz as input. The system begins processing the EEG signal by applying a TCP montage [8]. Depending on the type of the montage, the EEG signal can have either 22 or 20 channels. To enable the online operation, we send 0.1-second (25 samples) length frames from each channel of the streamed EEG signal to the feature extractor and the visualizer. Feature extraction is performed sequentially on each channel. The signal preprocessor writes the sample frames into two streams to facilitate these modules. In the first stream, the feature extractor receives the signals using stdin. In parallel, as a second stream, the visualizer shares a user-defined file with the signal preprocessor. This user-defined file holds raw signal information as a buffer for the visualizer. The signal preprocessor writes into the file while the visualizer reads from it. Reading and writing into the same file poses a challenge. The visualizer can start reading while the signal preprocessor is writing into it. To resolve this issue, we utilize a file locking mechanism in the signal preprocessor and visualizer. 
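The lock-protected file sharing above can be sketched with Linux advisory locks; `fcntl.flock` is one such primitive (reference [9] covers the kernel mechanism, but the abstract does not name the exact call, so this choice is an assumption):

```python
import fcntl
import os

def locked_write(path, payload):
    """Append a chunk of signal data while holding an exclusive lock.

    Mirrors the signal preprocessor's role: no reader may access the
    shared file while the write is in progress.
    """
    with open(path, "ab") as f:
        fcntl.flock(f, fcntl.LOCK_EX)   # blocks until the lock is free
        try:
            f.write(payload)
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)

def locked_read(path):
    """Read the shared buffer while holding a shared lock (visualizer's role)."""
    with open(path, "rb") as f:
        fcntl.flock(f, fcntl.LOCK_SH)   # readers may share; writers are excluded
        try:
            return f.read()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

Because `flock` locks are advisory, both processes must cooperate by acquiring the lock before touching the file, which matches the lock/operate/release/retry cycle described here.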
Each of the processes temporarily locks the file, performs its operation, releases the lock, and tries to obtain the lock after a waiting period. The file locking mechanism ensures that only one process can access the file by prohibiting other processes from reading or writing while one process is modifying the file [9]. The feature extractor uses circular buffers to save 0.3 seconds or 75 samples from each channel for extracting 0.2-second or 50-sample long center-aligned windows. The module generates 8 absolute LFCC features where the zeroth cepstral coefficient is replaced by a temporal domain energy term. For extracting the rest of the features, three pipelines are used. The differential energy feature is calculated in a 0.9-second absolute feature window with a frame size of 0.1 seconds. The difference between the maximum and minimum temporal energy terms is calculated in this range. Then, the first derivative or the delta features are calculated using another 0.9-second window. Finally, the second derivative or delta-delta features are calculated using a 0.3-second window [6]. The differential energy for the delta-delta features is not included. In total, we extract 26 features from the raw sample windows which add 1.1 seconds of delay to the system. We used the Temple University Hospital Seizure Database (TUSZ) v1.2.1 for developing the online system [10]. The statistics for this dataset are shown in Table 1. A channel-based LSTM model was trained using the features derived from the train set using the online feature extractor module. A window-based normalization technique was applied to those features. In the offline model, we scale features by normalizing using the maximum absolute value of a channel [11] before applying a sliding window approach. Since the online system has access to a limited amount of data, we normalize based on the observed window. The model uses the feature vectors with a frame size of 1 second and a window size of 7 seconds. 
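The 75-sample circular buffer with 50-sample center-aligned windows, and the window-based max-absolute normalization, might be sketched as follows (class and function names are illustrative, not the system's actual module names):

```python
from collections import deque

class CenterWindowBuffer:
    """Keep the last 75 samples (0.3 s at 250 Hz) and expose the
    50-sample (0.2 s) center-aligned window, as in the feature extractor."""

    def __init__(self, capacity=75, window=50):
        self.buf = deque(maxlen=capacity)  # old samples fall off automatically
        self.capacity = capacity
        self.window = window

    def push(self, frame):
        self.buf.extend(frame)

    def center_window(self):
        if len(self.buf) < self.capacity:
            return None                    # not enough history yet
        start = (self.capacity - self.window) // 2
        samples = list(self.buf)
        return samples[start:start + self.window]

def maxabs_normalize(window):
    """Scale a window into [-1, 1] by its own max absolute value.

    Online variant of max-abs scaling: only the observed window is
    available, unlike the offline per-channel normalization [11].
    """
    peak = max(abs(x) for x in window)
    if peak == 0.0:
        return list(window)                # all-zero window: leave unchanged
    return [x / peak for x in window]
```

The offline system scales by the maximum absolute value of a whole channel; normalizing over the observed window only is the compromise forced by streaming, as the text notes.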
We evaluated the model using the offline P1 postprocessor to determine the efficacy of the delayed features and the window-based normalization technique. As shown by the results of experiments 1 and 4 in Table 2, these changes give us a comparable performance to the offline model. The online event decoder module utilizes this trained model for computing probabilities for the seizure and background classes. These posteriors are then postprocessed to remove spurious detections. The online postprocessor receives and saves 8 seconds of class posteriors in a buffer for further processing. It applies multiple heuristic filters (e.g., probability threshold) to make an overall decision by combining events across the channels. These filters evaluate the average confidence, the duration of a seizure, and the channels where the seizures were observed. The postprocessor delivers the label and confidence to the visualizer. The visualizer starts to display the signal as soon as it gets access to the signal file, as shown in Figure 1 using the “Signal File” and “Visualizer” blocks. Once the visualizer receives the label and confidence for the latest epoch from the postprocessor, it overlays the decision and color codes that epoch. The visualizer uses red for seizure with the label SEIZ and green for the background class with the label BCKG. Once the streaming finishes, the system saves three files: a signal file in which the sample frames are saved in the order they were streamed, a time segmented event (TSE) file with the overall decisions and confidences, and a hypotheses (HYP) file that saves the label and confidence for each epoch. The user can plot the signal and decisions using the signal and HYP files with only the visualizer by enabling appropriate options. For comparing the performance of different stages of development, we used the test set of TUSZ v1.2.1 database. It contains 1015 EEG records of varying duration. 
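The heuristic filtering described above (probability threshold plus minimum event duration) can be sketched for a single channel; the threshold and duration values below are placeholders, since the abstract does not give the actual settings or the cross-channel combination rule:

```python
def postprocess(posteriors, threshold=0.5, min_duration=2.0, epoch=1.0):
    """Turn per-epoch seizure posteriors into (start_s, stop_s, confidence) events.

    posteriors: seizure-class probabilities, one per `epoch` seconds.
    Adjacent supra-threshold epochs are merged into one event; events
    shorter than min_duration seconds are discarded as spurious.
    """
    events, start, probs = [], None, []
    for i, p in enumerate(list(posteriors) + [0.0]):  # sentinel closes a trailing event
        if p >= threshold:
            if start is None:
                start = i
            probs.append(p)
        elif start is not None:
            stop = i
            if (stop - start) * epoch >= min_duration:
                events.append((start * epoch, stop * epoch,
                               sum(probs) / len(probs)))  # mean confidence
            start, probs = None, []
    return events
```

The real postprocessor additionally combines events across channels and checks which channels observed the seizure; this single-channel sketch shows only the threshold-and-duration core.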
The any-overlap performance [12] of the overall system shown in Figure 2 is 40.29% sensitivity with 5.77 FAs per 24 hours. For comparison, the previous state-of-the-art model developed on this database performed at 30.71% sensitivity with 6.77 FAs per 24 hours [3]. The individual performances of the deep learning phases are as follows: Phase 1’s (P1) performance is 39.46% sensitivity and 11.62 FAs per 24 hours, and Phase 2 detects seizures with 41.16% sensitivity and 11.69 FAs per 24 hours. We trained an LSTM model with the delayed features and the window-based normalization technique for developing the online system. Using the offline decoder and postprocessor, the model performed at 36.23% sensitivity with 9.52 FAs per 24 hours. The trained model was then evaluated with the online modules. The current performance of the overall online system is 45.80% sensitivity with 28.14 FAs per 24 hours. Table 2 summarizes the performances of these systems. The performance of the online system deviates from the offline P1 model because the online postprocessor fails to combine the events as the seizure probability fluctuates during an event. The modules in the online system add a total of 11.1 seconds of delay for processing each second of the data, as shown in Figure 3. In practice, we also count the time for loading the model and starting the visualizer block. When we consider these facts, the system consumes 15 seconds to display the first hypothesis. The system detects seizure onsets with an average latency of 15 seconds. Implementing an automatic seizure detection model in real time is not trivial. We used a variety of techniques such as the file locking mechanism, multithreading, circular buffers, real-time event decoding, and signal-decision plotting to realize the system. A video demonstrating the system is available at: The final conference submission will include a more detailed analysis of the online performance of each module. 
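Any-overlap scoring [12] counts a reference seizure as detected if any hypothesis event overlaps it at all, and a hypothesis event as a false alarm if it overlaps no reference event; a minimal sketch under those definitions:

```python
def overlaps(a, b):
    """True if half-open intervals a = (start, stop) and b overlap at all."""
    return a[0] < b[1] and b[0] < a[1]

def any_overlap_score(ref_events, hyp_events, total_hours):
    """Return (sensitivity %, false alarms per 24 h) under any-overlap scoring."""
    hits = sum(1 for r in ref_events
               if any(overlaps(r, h) for h in hyp_events))
    false_alarms = sum(1 for h in hyp_events
                       if not any(overlaps(h, r) for r in ref_events))
    sensitivity = 100.0 * hits / len(ref_events) if ref_events else 0.0
    fa_per_24h = false_alarms * 24.0 / total_hours if total_hours else 0.0
    return sensitivity, fa_per_24h
```

This is the metric behind the figures quoted above (e.g. 40.29% sensitivity with 5.77 FAs per 24 hours), although the evaluation tooling used for TUSZ applies further conventions not modeled here.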
ACKNOWLEDGMENTS Research reported in this publication was most recently supported by the National Science Foundation Partnership for Innovation award number IIP-1827565 and the Pennsylvania Commonwealth Universal Research Enhancement Program (PA CURE). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the official views of any of these organizations. REFERENCES [1] A. Craik, Y. He, and J. L. Contreras-Vidal, “Deep learning for electroencephalogram (EEG) classification tasks: a review,” J. Neural Eng., vol. 16, no. 3, p. 031001, 2019. [2] A. C. Bridi, T. Q. Louro, and R. C. L. Da Silva, “Clinical Alarms in intensive care: implications of alarm fatigue for the safety of patients,” Rev. Lat. Am. Enfermagem, vol. 22, no. 6, p. 1034, 2014. [3] M. Golmohammadi, V. Shah, I. Obeid, and J. Picone, “Deep Learning Approaches for Automatic Seizure Detection from Scalp Electroencephalograms,” in Signal Processing in Medicine and Biology: Emerging Trends in Research and Applications, 1st ed., I. Obeid, I. Selesnick, and J. Picone, Eds. New York, New York, USA: Springer, 2020, pp. 233–274. [4] “CFM Olympic Brainz Monitor.” [Online]. Available: [Accessed: 17-Jul-2020]. [5] M. L. Scheuer, S. B. Wilson, A. Antony, G. Ghearing, A. Urban, and A. I. Bagic, “Seizure Detection: Interreader Agreement and Detection Algorithm Assessments Using a Large Dataset,” J. Clin. Neurophysiol., 2020. [6] A. Harati, M. Golmohammadi, S. Lopez, I. Obeid, and J. Picone, “Improved EEG Event Classification Using Differential Energy,” in Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium, 2015, pp. 1–4. [7] V. Shah, C. Campbell, I. Obeid, and J. Picone, “Improved Spatio-Temporal Modeling in Automated Seizure Detection using Channel-Dependent Posteriors,” Neurocomputing, 2021. [8] W. Tatum, A. Husain, S. Benbadis, and P. Kaplan, Handbook of EEG Interpretation. 
New York City, New York, USA: Demos Medical Publishing, 2007. [9] D. P. Bovet and C. Marco, Understanding the Linux Kernel, 3rd ed. O’Reilly Media, Inc., 2005. [10] V. Shah et al., “The Temple University Hospital Seizure Detection Corpus,” Front. Neuroinform., vol. 12, pp. 1–6, 2018. [11] F. Pedregosa et al., “Scikit-learn: Machine Learning in Python,” J. Mach. Learn. Res., vol. 12, pp. 2825–2830, 2011. [12] J. Gotman, D. Flanagan, J. Zhang, and B. Rosenblatt, “Automatic seizure detection in the newborn: Methods and initial evaluation,” Electroencephalogr. Clin. Neurophysiol., vol. 103, no. 3, pp. 356–362, 1997.
  5. Background Online physician reviews are an important source of information for prospective patients. In addition, they represent an untapped resource for studying the effects of gender on the doctor-patient relationship. Understanding gender differences in online reviews is important because it may impact the value of those reviews to patients. Documenting gender differences in patient experience may also help to improve the doctor-patient relationship. This is the first large-scale study of physician reviews to extensively investigate gender bias in online reviews or offer recommendations for improvements to online review systems to correct for gender bias and aid patients in selecting a physician. Objective This study examines 154,305 reviews from across the United States for all medical specialties. Our analysis includes a qualitative and quantitative examination of review content and physician rating with regard to doctor and reviewer gender. Methods A total of 154,305 reviews were sampled from Google Place reviews. Reviewer and doctor gender were inferred from names. Reviews were coded for overall patient experience (negative or positive) by collapsing a 5-star scale and coded for general categories (process, positive/negative soft skills), which were further subdivided into themes. Computational text processing methods were employed to apply this codebook to the entire data set, rendering it tractable to quantitative methods. Specifically, we estimated binary regression models to examine relationships between physician rating, patient experience themes, physician gender, and reviewer gender. Results Female reviewers wrote 60% more reviews than male reviewers. Male reviewers were more likely to give negative reviews (odds ratio [OR] 1.15, 95% CI 1.10-1.19; P<.001). Reviews of female physicians were considerably more negative than those of male physicians (OR 1.99, 95% CI 1.94-2.14; P<.001).
Soft skills were more likely to be mentioned in the reviews written by female reviewers and about female physicians. Negative reviews of female doctors were more likely to mention candor (OR 1.61, 95% CI 1.42-1.82; P<.001) and amicability (OR 1.63, 95% CI 1.47-1.90; P<.001). Disrespect was associated with both female physicians (OR 1.42, 95% CI 1.35-1.51; P<.001) and female reviewers (OR 1.27, 95% CI 1.19-1.35; P<.001). Female patients were less likely to report disrespect from female doctors than expected from the base ORs (OR 1.19, 95% CI 1.04-1.32; P=.008), but this effect overrode only the effect for female reviewers. Conclusions This work reinforces findings in the extensive literature on gender differences and gender bias in patient-physician interaction. Its novel contribution lies in highlighting gender differences in online reviews. These reviews inform patients’ choice of doctor and thus affect both patients and physicians. The evidence of gender bias documented here suggests review sites may be improved by providing information about gender differences, controlling for gender when presenting composite ratings for physicians, and helping users write less biased reviews.
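The odds ratios reported above come from binary regression models. For a single binary predictor, the OR and its Wald 95% confidence interval reduce to the familiar 2x2-table formulas, sketched here with illustrative counts (not the study's data):

```python
import math

def odds_ratio_ci(a, b, c, d):
    """OR and Wald 95% CI from a 2x2 table:

        exposed:   a events, b non-events
        unexposed: c events, d non-events

    The CI is computed on the log scale, where log(OR) is approximately
    normal with standard error sqrt(1/a + 1/b + 1/c + 1/d).
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - 1.96 * se)
    hi = math.exp(math.log(or_) + 1.96 * se)
    return or_, lo, hi
```

The study's adjusted ORs come from full regression models with covariates, where the OR for a coefficient is `exp(coef)`; this crude-table version shows only the core statistic being reported.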