Discourse parsing has proven to be useful for a number of NLP tasks that require complex reasoning. However, over a decade since the advent of the Penn Discourse Treebank, predicting implicit discourse relations in text remains challenging. There are several possible reasons for this, and we hypothesize that models should be exposed to more context as it plays an important role in accurate human annotation; meanwhile adding uncertainty measures can improve model accuracy and calibration. To thoroughly investigate this phenomenon, we perform a series of experiments to determine 1) the effects of context on human judgments, and 2) the effect of quantifying uncertainty with annotator confidence ratings on model accuracy and calibration (which we measure using the Brier score (Brier et al, 1950)). We find that including annotator accuracy and confidence improves model accuracy, and incorporating confidence in the model’s temperature function can lead to models with significantly better-calibrated confidence measures. We also find some insightful qualitative results regarding human and model behavior on these datasets.
more »
« less
Skin Deep: Investigating Subjectivity in Skin Tone Annotations for Computer Vision Benchmark Datasets
To investigate the well-observed racial disparities in computer vision systems that analyze images of humans, researchers have turned to skin tone as a more objective annotation than race metadata for fairness performance evaluations. However, the current state of skin tone annotation procedures is highly varied. For instance, researchers use a range of untested scales and skin tone categories, have unclear annotation procedures, and provide inadequate analyses of uncertainty. In addition, little attention is paid to the positionality of the humans involved in the annotation process—both designers and annotators alike—and the historical and sociological context of skin tone in the United States. Our work is the first to investigate the skin tone annotation process as a sociotechnical project. We surveyed recent skin tone annotation procedures and conducted annotation experiments to examine how subjective understandings of skin tone are embedded in skin tone annotation procedures. Our systematic literature review revealed the uninterrogated association between skin tone and race and the limited effort to analyze annotator uncertainty in current procedures for skin tone annotation in computer vision evaluation. Our experiments demonstrated that design decisions in the annotation procedure such as the order in which the skin tone scale is presented or additional context in the image (i.e., presence of a face) significantly affected the resulting inter-annotator agreement and individual uncertainty of skin tone annotations. We call for greater reflexivity in the design, analysis, and documentation of procedures for evaluation using skin tone.
more »
« less
- Award ID(s):
- 2120497
- PAR ID:
- 10426294
- Date Published:
- Journal Name:
- ACM Conference on Fairness, Accountability, and Transparency
- Page Range / eLocation ID:
- 1757 to 1771
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
As US society continues to diversify and calls for better measurements of racialized appearance increase, survey researchers need guidance about effective strategies for assessing skin color in field research. This study examined the consistency, comparability, and meaningfulness of the two most widely used skin tone rating scales (Massey–Martin and PERLA) and two portable and inexpensive handheld devices for skin color measurement (Nix colorimeter and Labby spectrophotometer). We collected data in person using these four instruments from forty-six college students selected to reflect a wide range of skin tones across four racial-ethnic groups (Asian, Black, Latinx, White). These college students, five study staff, and 459 adults from an online sample also rated forty stock photos, again selected for skin tone diversity. Our results—based on data collected under controlled conditions—demonstrate high consistency across raters and readings. The Massey–Martin and PERLA scale scores were highly linearly related to each other, although PERLA better differentiated among people with the lightest skin tones. The Nix and Labby darkness-to-lightness (L*) readings were likewise linearly related to each other and to the Massey–Martin and PERLA scores, in addition to showing expected variation within and between race ethnicities. In addition, darker Massey–Martin and PERLA ratings correlated with online raters’ expectations that a photographed person experienced greater discrimination. In contrast, the redness (a*) and yellowness (b*) undertones were highest in the mid-range of the rating scale scores and demonstrated greater overlap across race-ethnicities. Overall, each instrument showed sufficient consistency, comparability, and meaningfulness for use in field surveys when implemented soundly (e.g., not requiring memorization). However, PERLA might be preferred to Massey–Martin in studies representing individuals with the lightest skin tones, and handheld devices may be preferred to rating scales to reduce measurement error when studies could gather only a single rating.more » « less
-
Face recognition (FR) systems are fast becoming ubiquitous. However, differential performance among certain demographics was identified in several widely used FR models. The skin tone of the subject is an important factor in addressing the differential performance. Previous work has used modeling methods to propose skin tone measures of subjects across different illuminations or utilized subjective labels of skin color and demographic information. However, such models heavily rely on consistent background and lighting for calibration, or utilize labeled datasets, which are time-consuming to generate or are unavailable. In this work, we have developed a novel and data-driven skin color measure capable of accurately representing subjects' skin tone from a single image, without requiring a consistent background or illumination. Our measure leverages the dichromatic reflection model in RGB space to decompose skin patches into diffuse and specular bases.more » « less
-
Recent progress in data-driven vision and language-based tasks demands developing training datasets enriched with multiple modalities representing human intelligence. The link between text and image data is one of the crucial modalities for developing AI models. The development process of such datasets in the video domain requires much effort from researchers and annotators (experts and non-experts). Researchers re-design annotation tools to extract knowledge from annotators to answer new research questions. The whole process repeats for each new question which is timeconsuming. However, since the last decade, there has been little change in how the researchers and annotators interact with the annotation process. We revisit the annotation workflow and propose a concept of an adaptable and scalable annotation tool. The concept emphasizes its users’ interactivity to make annotation process design seamless and efficient. Researchers can conveniently add newer modalities to or augment the extant datasets using the tool. The annotators can efficiently link free-form text to image objects. For conducting human-subject experiments on any scale, the tool supports the data collection for attaining group ground truth. We have conducted a case study using a prototype tool between two groups with the participation of 74 non-expert people. We find that the interactive linking of free-form text to image objects feels intuitive and evokes a thought process resulting in a high-quality annotation. The new design shows ≈ 35% improvement in the data annotation quality. On UX evaluation, we receive above-average positive feedback from 25 people regarding convenience, UI assistance, usability, and satisfaction.more » « less
-
Abstract The evolution of skin pigmentation has been shaped by numerous biological and cultural shifts throughout human history. Vitamin D is considered a driver of depigmentation evolution in humans, given the deleterious health effects associated with vitamin D deficiency, which is often shaped by cultural factors. New advancements in genomics and epigenomics have opened the door to a deeper exploration of skin pigmentation evolution in both contemporary and ancient populations. Data from ancient Europeans has offered great context to the spread of depigmentation alleles via the evaluation of migration events and cultural shifts that occurred during the Neolithic. However, novel insights can further be gained via the inclusion of diverse ancient and contemporary populations. Here we present on how potential biases and limitations in skin pigmentation research can be overcome with the integration of interdisciplinary data that includes both cultural and biological elements, which have shaped the evolutionary history of skin pigmentation in humans.more » « less
An official website of the United States government

