Cerebral aneurysm clips are biomedical implants applied by neurosurgeons to re-approximate arterial vessel walls and prevent catastrophic aneurysmal hemorrhages. Current methods of aneurysm clip production are labor-intensive and time-consuming, leading to high costs per implant and limited variability in clip morphology. Metal additive manufacturing is investigated as an alternative to traditional manufacturing methods, one that may enable production of patient-specific aneurysm clips to account for variations in individual vascular anatomy and possibly reduce surgical complication risks. Challenges of metal additive manufacturing relevant to biomedical implants are examined, including material choice, design limitations, postprocessing, printed material properties, and combined production methods. Initial experiments with additive manufacturing of 316L stainless steel aneurysm clips were carried out on a selective laser melting (SLM) system. The dimensions of the printed clips were within 0.5% of the designed dimensions. Hardness and density of the printed clips (213 ± 7 HV1 and 7.9 g/cc, respectively) were very close to reported values for 316L stainless steel, as expected. No ferrite and minimal porosity were observed in a cross section of a printed clip, with some anisotropy in the grain orientation. A clamping force of approximately 1 N was measured at a clip separation of 1.5 mm. Metal additive manufacturing shows promise for the creation of custom aneurysm clips, but some of the challenges discussed will need to be addressed before clinical use is possible.
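As a rough back-of-the-envelope check of the kind implied by these measurements, the sketch below compares hypothetical printed-clip dimensions against design values using the reported 0.5% tolerance, and compares the reported 7.9 g/cc density against a nominal 316L density of about 8.0 g/cc. The design dimensions and the nominal density are illustrative assumptions, not values from the study.

```python
# Illustrative check of printed-clip measurements against design/nominal values.
# The design dimensions and the nominal 316L density (~8.0 g/cc) are assumptions
# for demonstration; only the 0.5% tolerance and 7.9 g/cc figure come from the text.

def percent_deviation(measured: float, reference: float) -> float:
    """Absolute deviation of a measurement from its reference, in percent."""
    return abs(measured - reference) / reference * 100.0

# Hypothetical design vs. measured clip dimensions (mm).
design_mm = {"blade_length": 7.00, "blade_width": 1.20, "spring_diameter": 2.50}
measured_mm = {"blade_length": 7.02, "blade_width": 1.195, "spring_diameter": 2.51}

for name, design in design_mm.items():
    dev = percent_deviation(measured_mm[name], design)
    status = "within tolerance" if dev <= 0.5 else "out of tolerance"
    print(f"{name}: {dev:.2f}% deviation ({status})")

# Density check: reported 7.9 g/cc vs. an assumed nominal ~8.0 g/cc for 316L.
print(f"density deviation: {percent_deviation(7.9, 8.0):.1f}%")
```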
Beyond Laurel/Yanny: An Autoencoder-Enabled Search for Polyperceivable Audio
The famous “laurel/yanny” phenomenon references an audio clip that elicits dramatically different responses from different listeners. For the original clip, roughly half the population hears the word “laurel,” while the other half hears “yanny.” How common are such “polyperceivable” audio clips? In this paper we apply machine learning techniques to study the prevalence of polyperceivability in spoken language. We devise a metric that correlates with the polyperceivability of audio clips, use it to efficiently find new “laurel/yanny”-type examples, and validate these results with human experiments. Our results suggest that polyperceivable examples are surprisingly prevalent, existing for >2% of English words.
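The abstract does not describe the metric itself, so the following is only an illustrative sketch of one way an autoencoder could be used to score candidate clips: embed a clip, perturb the latent code slightly, and measure how often a recognizer's output flips. The encoder, decoder, recognizer, and noise scale here are all hypothetical stand-ins, not the paper's pipeline.

```python
# Illustrative sketch only: scores a clip as "polyperceivable" if small latent-space
# perturbations of an autoencoder reconstruction flip a recognizer's top hypothesis.
# The encode/decode/recognize callables below are toy stand-ins, not the paper's models.
import numpy as np
from typing import Callable

def polyperceivability_score(
    clip: np.ndarray,
    encode: Callable[[np.ndarray], np.ndarray],
    decode: Callable[[np.ndarray], np.ndarray],
    recognize: Callable[[np.ndarray], str],
    n_perturbations: int = 20,
    noise_scale: float = 0.1,
    seed: int = 0,
) -> float:
    """Fraction of perturbed reconstructions whose transcription differs from the original."""
    rng = np.random.default_rng(seed)
    z = encode(clip)
    baseline = recognize(decode(z))
    flips = 0
    for _ in range(n_perturbations):
        z_perturbed = z + noise_scale * rng.standard_normal(z.shape)
        if recognize(decode(z_perturbed)) != baseline:
            flips += 1
    return flips / n_perturbations

# Toy stand-ins so the sketch runs end to end.
encode = lambda x: x[:16]                                     # pretend latent = first 16 samples
decode = lambda z: np.tile(z, 4)                              # pretend reconstruction
recognize = lambda x: "laurel" if x.mean() > 0 else "yanny"   # pretend recognizer

clip = np.random.default_rng(1).standard_normal(64) * 0.01
print(polyperceivability_score(clip, encode, decode, recognize))
```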
- Award ID(s): 1813049
- PAR ID: 10275107
- Date Published:
- Journal Name: ACL-IJCNLP
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
In this paper, the authors explore different approaches to animating 3D facial emotions, some of which use manual keyframe animation and some of which use machine learning. To compare approaches, the authors conducted an experiment consisting of side-by-side comparisons of animation clips generated by skeleton, blendshape, audio-driven, and vision-based capture facial animation techniques. Ninety-five participants viewed twenty face animation clips of characters expressing five distinct emotions (anger, sadness, happiness, fear, neutral), created using the four facial animation techniques. After viewing each clip, participants were asked to identify the emotions the characters appeared to be conveying and to rate their naturalness. Findings showed that naturalness ratings of the happy emotion were consistent across the four methods, whereas naturalness ratings of the fear emotion created with skeletal animation were significantly higher than for the other methods. Recognition of the sad and neutral emotions was very low for all methods compared to the other emotions. Overall, the skeleton approach received significantly higher naturalness ratings and a higher recognition rate than the other methods.
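As a purely illustrative sketch of how responses from such a study might be aggregated, the snippet below computes per-technique emotion-recognition rates and mean naturalness ratings; the response records and values are invented for demonstration and are not the study's data.

```python
# Hypothetical analysis sketch for a study like the one described: each response
# records the animation technique, the intended emotion, the emotion the viewer
# chose, and a naturalness rating. All values below are invented for illustration.
from collections import defaultdict

responses = [
    {"technique": "skeleton",     "intended": "fear",  "chosen": "fear",    "naturalness": 6},
    {"technique": "blendshape",   "intended": "fear",  "chosen": "anger",   "naturalness": 4},
    {"technique": "audio-driven", "intended": "happy", "chosen": "happy",   "naturalness": 5},
    {"technique": "vision-based", "intended": "sad",   "chosen": "neutral", "naturalness": 3},
]

hits = defaultdict(int)
counts = defaultdict(int)
naturalness = defaultdict(list)
for r in responses:
    counts[r["technique"]] += 1
    hits[r["technique"]] += int(r["chosen"] == r["intended"])
    naturalness[r["technique"]].append(r["naturalness"])

for tech in counts:
    rate = hits[tech] / counts[tech]
    mean_nat = sum(naturalness[tech]) / len(naturalness[tech])
    print(f"{tech}: recognition={rate:.0%}, mean naturalness={mean_nat:.1f}")
```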
-
Ecologists interested in monitoring the effects of climate change are increasingly turning to passive acoustic monitoring, the practice of placing autonomous audio recording units in ecosystems to monitor species richness and occupancy via species calls. However, identifying species calls in large datasets by hand is an expensive task, leading to a reliance on machine learning models. Due to a lack of annotated soundscape recordings, these models are often trained on large databases of community-created focal recordings. A challenge of training on such data is that clips carry only a "weak label," a single label that represents the whole clip. Segments that contain only background noise are therefore labeled as calls in the training data, reducing model performance. Heuristic methods exist to convert clip-level labels to "strong," call-specific labels, where the label tightly bounds the temporal extent of the call and better identifies bird vocalizations. Our work improves on the current weak-to-strong labeling method used on the training data for BirdNET, currently the most popular model for audio species classification. We utilize an existing RNN-CNN hybrid, resulting in a precision improvement of 12% (to 90% precision) against our new, strongly hand-labeled dataset of Peruvian bird species.
Authors: Jacob Ayers (Engineers for Exploration at UCSD); Sean Perry (University of California San Diego); Samantha Prestrelski (UC San Diego); Tianqi Zhang (Engineers for Exploration); Ludwig von Schoenfeldt (University of California San Diego); Mugen Blue (UC Merced); Gabriel Steinberg (Demining Research Community); Mathias Tobler (San Diego Zoo Wildlife Alliance); Ian Ingram (San Diego Zoo Wildlife Alliance); Curt Schurgers (UC San Diego); Ryan Kastner (University of California San Diego)
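At a schematic level, weak-to-strong conversion turns per-frame call scores into tight time intervals. The sketch below shows a simple thresholding scheme over hypothetical per-frame scores; it is not the RNN-CNN hybrid used in the work, just an illustration of what a weak-to-strong converter produces.

```python
# Illustrative weak-to-strong labeling: given per-frame call-presence scores for a
# clip that carries only a clip-level label, emit (start, end) intervals where the
# score stays above a threshold. The scores and threshold here are invented; the
# actual work uses an RNN-CNN hybrid to produce such frame-level estimates.
from typing import List, Tuple

def strong_labels(scores: List[float], frame_s: float, threshold: float = 0.5,
                  min_len_s: float = 0.1) -> List[Tuple[float, float]]:
    """Convert per-frame scores into time intervals (seconds) covering likely calls."""
    intervals = []
    start = None
    for i, s in enumerate(scores + [0.0]):          # sentinel to close a trailing call
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            t0, t1 = start * frame_s, i * frame_s
            if t1 - t0 >= min_len_s:                # drop spuriously short detections
                intervals.append((t0, t1))
            start = None
    return intervals

# Hypothetical 0.05 s frames: background, a call, background, a too-short blip.
scores = [0.1, 0.2, 0.8, 0.9, 0.95, 0.7, 0.3, 0.1, 0.6, 0.1]
print(strong_labels(scores, frame_s=0.05))   # -> [(0.1, 0.3)]
```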
-
The goal of this research is to develop Animated Pedagogical Agents (APA) that can convey clearly perceivable emotions through speech, facial expressions, and body gestures. In particular, the two studies reported in the paper investigated the extent to which modifications to the range of movement of three beat gestures (a both-arms synchronous outward gesture, a both-arms synchronous forward gesture, and an upper-body lean) and the agent's gender have significant effects on viewers' perception of the agent's emotion in terms of valence and arousal. For each gesture, the range of movement was varied at two discrete levels. The stimuli were two sets of 12-second animation clips generated using fractional factorial designs; in each clip, an animated agent who speaks and gestures gives a lecture segment on binomial probability. Half of the clips featured a female agent and half a male agent. In the first study, which used a within-subject design and metric conjoint analysis, 120 subjects were asked to watch 8 stimulus clips and rank them according to perceived valence and arousal (from highest to lowest). In the second study, which used a between-subject design, 300 participants were assigned to two groups of 150 subjects each; one group watched 8 clips featuring the male agent and the other group watched 8 clips featuring the female agent. Each participant was asked to rate perceived valence and arousal for each clip using a 7-point Likert scale. Results from the two studies suggest that the more open and forward the gestures the agent makes, the higher the perceived valence and arousal. Surprisingly, agents who lean their body forward more are not perceived as having higher arousal and valence. Findings also show that female agents' emotions are perceived as having higher arousal and more positive valence than male agents' emotions.
-
Generating realistic audio for human actions is critical for applications such as film sound effects and virtual reality games. Existing methods assume complete correspondence between video and audio during training, but in real-world settings, many sounds occur off-screen or weakly correspond to visuals, leading to uncontrolled ambient sounds or hallucinations at test time. This paper introduces AV-LDM, a novel ambient-aware audio generation model that disentangles foreground action sounds from ambient background noise in in-the-wild training videos. The approach leverages a retrieval-augmented generation framework to synthesize audio that aligns both semantically and temporally with the visual input. Trained and evaluated on Ego4D and EPIC-KITCHENS datasets, along with the newly introduced Ego4D-Sounds dataset (1.2M curated clips with action-audio correspondence), the model outperforms prior methods, enables controllable ambient sound generation, and shows promise for generalization to synthetic video game clips. This work is the first to emphasize faithful video-to-audio generation focused on observed visual content despite noisy, uncurated training data.
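As a loose schematic of the retrieval-augmented idea only (not AV-LDM's architecture or API), the snippet below retrieves the audio-bank entry whose features best match the current clip and passes it to a generator as an extra conditioning signal; every function, feature, and dimension here is a dummy stand-in.

```python
# Schematic sketch of retrieval-augmented conditioning for video-to-audio generation.
# Nothing here is AV-LDM's real code: the features, the retrieval index, and the
# generator are dummy stand-ins used only to show the data flow.
import numpy as np

def retrieve_ambient(query_feat: np.ndarray, bank: np.ndarray) -> np.ndarray:
    """Return the bank entry with the highest cosine similarity to the query features."""
    sims = bank @ query_feat / (np.linalg.norm(bank, axis=1) * np.linalg.norm(query_feat) + 1e-8)
    return bank[int(np.argmax(sims))]

def generate_audio(video_feat: np.ndarray, ambient_feat: np.ndarray) -> np.ndarray:
    """Dummy generator: in the real model this would be a latent diffusion model
    conditioned on both the visual features and the retrieved ambient example."""
    rng = np.random.default_rng(0)
    return 0.1 * rng.standard_normal(16000) + 0.01 * ambient_feat.mean()

# Hypothetical features: one query clip and a small bank of candidate ambient clips.
video_feat = np.random.default_rng(1).standard_normal(128)
ambient_bank = np.random.default_rng(2).standard_normal((10, 128))

ambient = retrieve_ambient(video_feat, ambient_bank)
waveform = generate_audio(video_feat, ambient)
print(waveform.shape)   # one second of audio at 16 kHz from the dummy generator
```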

