Title: Gender, Soft Skills, and Patient Experience in Online Physician Reviews: A Large-Scale Text Analysis
Background: Online physician reviews are an important source of information for prospective patients. In addition, they represent an untapped resource for studying the effects of gender on the doctor-patient relationship. Understanding gender differences in online reviews is important because it may impact the value of those reviews to patients. Documenting gender differences in patient experience may also help to improve the doctor-patient relationship. This is the first large-scale study of physician reviews to extensively investigate gender bias in online reviews or offer recommendations for improvements to online review systems to correct for gender bias and aid patients in selecting a physician.
Objective: This study examines 154,305 reviews from across the United States for all medical specialties. Our analysis includes a qualitative and quantitative examination of review content and physician rating with regard to doctor and reviewer gender.
Methods: A total of 154,305 reviews were sampled from Google Place reviews. Reviewer and doctor gender were inferred from names. Reviews were coded for overall patient experience (negative or positive) by collapsing a 5-star scale and coded for general categories (process, positive/negative soft skills), which were further subdivided into themes. Computational text processing methods were employed to apply this codebook to the entire data set, rendering it tractable to quantitative methods. Specifically, we estimated binary regression models to examine relationships between physician rating, patient experience themes, physician gender, and reviewer gender.
Results: Female reviewers wrote 60% more reviews than men. Male reviewers were more likely to give negative reviews (odds ratio [OR] 1.15, 95% CI 1.10-1.19; P<.001). Reviews of female physicians were considerably more negative than those of male physicians (OR 1.99, 95% CI 1.94-2.14; P<.001). Soft skills were more likely to be mentioned in the reviews written by female reviewers and about female physicians. Negative reviews of female doctors were more likely to mention candor (OR 1.61, 95% CI 1.42-1.82; P<.001) and amicability (OR 1.63, 95% CI 1.47-1.90; P<.001). Disrespect was associated with both female physicians (OR 1.42, 95% CI 1.35-1.51; P<.001) and female reviewers (OR 1.27, 95% CI 1.19-1.35; P<.001). Female patients were less likely to report disrespect from female doctors than expected from the base ORs (OR 1.19, 95% CI 1.04-1.32; P=.008), but this effect overrode only the effect for female reviewers.
Conclusions: This work reinforces findings in the extensive literature on gender differences and gender bias in patient-physician interaction. Its novel contribution lies in highlighting gender differences in online reviews. These reviews inform patients’ choice of doctor and thus affect both patients and physicians. The evidence of gender bias documented here suggests review sites may be improved by providing information about gender differences, controlling for gender when presenting composite ratings for physicians, and helping users write less biased reviews.
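The odds ratios and 95% confidence intervals reported above come from the study's binary regression models, which are not reproduced here. As a rough illustration of how such a figure is read, the following minimal sketch computes an unadjusted odds ratio with a Wald confidence interval from a 2x2 table. The function name `odds_ratio_ci` and all counts are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: group 1 with the outcome      b: group 1 without the outcome
    c: group 2 with the outcome      d: group 2 without the outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts (NOT from the study): negative vs positive reviews
# written by two groups of reviewers.
group1_negative, group1_positive = 300, 700
group2_negative, group2_positive = 250, 750

or_, lower, upper = odds_ratio_ci(
    group1_negative, group1_positive, group2_negative, group2_positive
)
print(f"OR {or_:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

An interval that excludes 1 indicates a difference in odds between the two groups at roughly the 5% level. A regression model, as used in the study, additionally adjusts for other covariates, so its estimates will generally differ from this unadjusted calculation.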
Award ID(s):
Publication Date:
Journal Name:
Journal of Medical Internet Research
Page Range or eLocation-ID:
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Objectives Electronic health record systems are increasingly used to send messages to physicians, but research on physicians’ inbox use patterns is limited. This study’s aims were to (1) quantify the time primary care physicians (PCPs) spend managing inboxes; (2) describe daily patterns of inbox use; (3) investigate which types of messages consume the most time; and (4) identify factors associated with inbox work duration. Materials and Methods We analyzed 1 month of electronic inbox data for 1275 PCPs in a large medical group and linked these data with physicians’ demographic data. Results PCPs spent an average of 52 minutes on inbox management on workdays, including 19 minutes (37%) outside work hours. Temporal patterns of electronic inbox use differed from other EHR functions such as charting. Patient-initiated messages (28%) and results (29%) accounted for the most inbox work time. PCPs with higher inbox work duration were more likely to be female (P < .001), have more patient encounters (P < .001), have older patients (P < .001), spend proportionally more time on patient messages (P < .001), and spend more time per message (P < .001). Compared with PCPs with the lowest duration of time on inbox work, PCPs with the highest duration had more message views per workday (200 vs 109; P < .001) and spent more time on the inbox outside work hours (30 minutes vs 9.7 minutes; P < .001). Conclusions Electronic inbox work by PCPs requires roughly an hour per workday, much of which occurs outside scheduled work hours. Interventions to assist PCPs in handling patient-initiated messages and results may help alleviate inbox workload.
  2. In the era of big data, online doctor review platforms, which enable patients to give feedback to their doctors, have become one of the most important components in healthcare systems. On one hand, they help patients to choose their doctors based on the experience of others. On the other hand, they help doctors to improve the quality of their service. Moreover, they provide important sources for us to discover common concerns of patients and existing problems in clinics, which can potentially help improve current healthcare systems. In this paper, we systematically investigate the dataset from one such review platform, where each review for a doctor comes with an overall rating and ratings of four different aspects. A comprehensive statistical analysis is conducted first for reviews, ratings, and doctors. Then, we explore the content of reviews by extracting latent topics related to different aspects with unsupervised topic modeling techniques. As the core component of this paper, we propose a multi-task learning framework for document-level multi-aspect sentiment classification. This task helps us not only to recover missing aspect-level ratings and detect inconsistent rating scores but also to identify aspect-keywords for a given review based on ratings. The proposed model takes both features of doctors and aspect-keywords into consideration. Extensive experiments have been conducted on two subsets of the ratemds dataset to demonstrate the effectiveness of the proposed model.
  3. Objective: To identify differences in short-term outcomes of patients with coronavirus disease 2019 (COVID-19) according to various racial/ethnic groups. Design: Analysis of Cerner de-identified COVID-19 dataset. Setting: A total of 62 health care facilities. Participants: The cohort included 49,277 adult COVID-19 patients who were hospitalized from December 1, 2019 to November 13, 2020. Methods: We compared patients’ age, gender, individual components of Charlson and Elixhauser comorbidities, medical complications, use of do-not-resuscitate, use of palliative care, and socioeconomic status between various racial and/or ethnic groups. We further compared the rates of in-hospital mortality and non-routine discharges between various racial and/or ethnic groups. Main Outcome Measures: The primary outcome of interest was in-hospital mortality. The secondary outcome was non-routine discharge (discharge to destinations other than home, such as short-term hospitals or other facilities including intermediate care and skilled nursing homes). Results: Compared with White patients, in-hospital mortality was significantly higher among African American (OR 1.5; 95% CI: 1.3-1.6, P<.001), Hispanic (OR 1.4; 95% CI: 1.3-1.6, P<.001), and Asian or Pacific Islander (OR 1.5; 95% CI: 1.1-1.9, P=.002) patients after adjustment for age and gender, Elixhauser comorbidities, do-not-resuscitate status, palliative care use, and socioeconomic status. Conclusions: Our study found that, among hospitalized patients with COVID-19, African American, Hispanic, and Asian or Pacific Islander patients had increased mortality compared with White patients after adjusting for sociodemographic factors, comorbidities, and do-not-resuscitate/palliative care status. Our findings add additional perspective to other recent studies. Ethn Dis. 2021;31(3):389-398; doi:10.18865/ed.31.3.389
  4. Engineering Projects in Community Service (EPICS) is a middle and high school program, with a focus on the engineering design process and delivering real solutions to community partners. In order to evaluate the efficacy of the program, a pre-post test design was implemented to examine changes in attitudinal and behavioral measures. Pre-data were collected at the beginning of the school year, and paralleled the program’s registration process to ensure high response rates; post-data were then collected at the end of the school year. Demographic data demonstrate that of all 2018-2019 registered EPICS participants (N = 414), 41 percent were female; 66.6 percent were non-white; and 30 percent held first generation student status. Importantly, 68.5 percent of participants reported that neither parent nor guardian is an engineer, and 65.7 percent of participants reported that they “definitely will attend” a four-year university. These data suggest that the current sample is ideal for evaluating EPICS as a pre-college engineering education program, because most participants are not experiencing engineering in the home and may be less susceptible to parental pressures for choosing engineering as a college major and potential career, but have salient intentions to attend college. In addition to collecting demographic information, participants completed a series of measures designed to capture attitudes and behaviors toward engineering as a potential career field. The main measures of interest include Engineering Identity and Doing Engineering. Engineering Identity scores reflect participants’ personal and professional identities as engineers; Doing Engineering scores indicate participants’ prior experience with engineering and its related technical skills. Baseline data on the sample reveal average engineering identities (M = 38.41, SD = 6.44, 95% CI [37.77, 39.05]). A series of t-tests was conducted to examine gender differences in these measures. 
Men reported significantly higher engineering identities (M = 37.65, SD = 6.58) compared to women (M = 39.54, SD = 6.09), t(360) = 2.95, p = .003, F = .037. Men reported stronger and more frequent experiences with engineering, indicated by their higher Doing Engineering scores (M = 13.75, SD = 5.16), compared to women (M = 15.31, SD = 4.69), t(368) = 3.13, p = .002, F = .003. Interestingly, first generation students reported higher engineering identities (M = 37.45, SD = 6.53) compared to non-first generation students (M = 39.66, SD = 5.99), t(375) = 3.46, p = .001, F = 1.39. To examine the relationship between Engineering Identity and Doing Engineering, a correlation analysis was conducted and a moderate, positive relationship emerged, such that as students’ experience with engineering increased, their engineering identities also increased (R = .463, p < .001).
  5. Abstract STUDY QUESTION To what extent does the use of mobile computing apps to track the menstrual cycle and the fertile window influence fecundability among women trying to conceive? SUMMARY ANSWER After adjusting for potential confounders, use of any of several different apps was associated with increased fecundability ranging from 12% to 20% per cycle of attempt. WHAT IS KNOWN ALREADY Many women are using mobile computing apps to track their menstrual cycle and the fertile window, including while trying to conceive. STUDY DESIGN, SIZE, DURATION The Pregnancy Study Online (PRESTO) is a North American prospective internet-based cohort of women who are aged 21–45 years, trying to conceive and not using contraception or fertility treatment at baseline. PARTICIPANTS/MATERIALS, SETTING, METHODS We restricted the analysis to 8363 women trying to conceive for no more than 6 months at baseline; the women were recruited from June 2013 through May 2019. Women completed questionnaires at baseline and every 2 months for up to 1 year. The main outcome was fecundability, i.e. the per-cycle probability of conception, which we assessed using self-reported data on time to pregnancy (confirmed by positive home pregnancy test) in menstrual cycles. On the baseline and follow-up questionnaires, women reported whether they used mobile computing apps to track their menstrual cycles (‘cycle apps’) and, if so, which one(s). We estimated fecundability ratios (FRs) for the use of cycle apps, adjusted for female age, race/ethnicity, prior pregnancy, BMI, income, current smoking, education, partner education, caffeine intake, use of hormonal contraceptives as the last method of contraception, hours of sleep per night, cycle regularity, use of prenatal supplements, marital status, intercourse frequency and history of subfertility. We also examined the impact of concurrent use of fertility indicators: basal body temperature, cervical fluid, cervix position and/or urine LH. 
MAIN RESULTS AND THE ROLE OF CHANCE Among 8363 women, 6077 (72.7%) were using one or more cycle apps at baseline. A total of 122 separate apps were reported by women. We designated five of these apps before analysis as more likely to be effective (Clue, Fertility Friend, Glow, Kindara, Ovia; hereafter referred to as ‘selected apps’). The use of any app at baseline was associated with 20% increased fecundability, with little difference between selected apps versus other apps (selected apps FR (95% CI): 1.20 (1.13, 1.28); all other apps 1.21 (1.13, 1.30)). In time-varying analyses, cycle app use was associated with 12–15% increased fecundability (selected apps FR (95% CI): 1.12 (1.04, 1.21); all other apps 1.15 (1.07, 1.24)). When apps were used at baseline with one or more fertility indicators, there was higher fecundability than without fertility indicators (selected apps with indicators FR (95% CI): 1.23 (1.14, 1.34) versus without indicators 1.17 (1.05, 1.30); other apps with indicators 1.30 (1.19, 1.43) versus without indicators 1.16 (1.06, 1.27)). In time-varying analyses, results were similar when stratified by time trying at study entry (<3 vs. 3–6 cycles) or cycle regularity. For use of the selected apps, we observed higher fecundability among women with a history of subfertility: FR 1.33 (1.05–1.67). LIMITATIONS, REASONS FOR CAUTION Neither regularity nor intensity of app use was ascertained. The prospective time-varying assessment of app use was based on questionnaires completed every 2 months, which would not capture more frequent changes. Intercourse frequency was also reported retrospectively and we do not have data on timing of intercourse relative to the fertile window. Although we controlled for a wide range of covariates, we cannot exclude the possibility of residual confounding (e.g. choosing to use an app in this observational study may be a marker for unmeasured health habits promoting fecundability). 
Half of the women in the study received a free premium subscription for one of the apps (Fertility Friend), which may have increased the overall prevalence of app use in the time-varying analyses, but would not affect app use at baseline. Most women in the study were college educated, which may limit application of results to other populations. WIDER IMPLICATIONS OF THE FINDINGS Use of a cycle app, especially in combination with observation of one or more fertility indicators (basal body temperature, cervical fluid, cervix position and/or urine LH), may increase fecundability (per-cycle pregnancy probability) by about 12–20% for couples trying to conceive. We did not find consistent evidence of improved fecundability resulting from use of one specific app over another. STUDY FUNDING/COMPETING INTEREST(S) This research was supported by grants, R21HD072326 and R01HD086742, from the Eunice Kennedy Shriver National Institute of Child Health and Human Development, USA. In the last 3 years, Dr L.A.W. has served as a fibroid consultant for Dr L.A.W. has also received in-kind donations from Sandstone Diagnostics, Swiss Precision Diagnostics, and for primary data collection and participant incentives in the PRESTO cohort. Dr J.B.S. reports personal fees from Swiss Precision Diagnostics, outside the submitted work. The remaining authors have nothing to declare. TRIAL REGISTRATION NUMBER N/A.