

Title: Classifying Humpback Whale Calls to Song and Non-Song Vocalizations using Bag of Words Descriptor on Acoustic Data
Humpback whale behavior, population distribution and structure can be inferred from long-term underwater passive acoustic monitoring of their vocalizations. Here we develop automatic approaches for classifying humpback whale vocalizations into the two categories of song and non-song, employing machine learning techniques. The vocalization behavior of humpback whales was monitored instantaneously over vast areas of the Gulf of Maine using a large-aperture coherent hydrophone array system via the passive ocean acoustic waveguide remote sensing technique over multiple diel cycles in Fall 2006. We use wavelet signal denoising and coherent array processing to enhance the signal-to-noise ratio. To build a feature vector for each time sequence of the beamformed signals, we apply a Bag of Words approach to time-frequency features. Finally, we apply Support Vector Machine (SVM), Neural Networks, and Naive Bayes to classify the acoustic data and compare their performances. The best results are obtained using Mel Frequency Cepstrum Coefficient (MFCC) features with an SVM, which yields 94% accuracy and a 72.73% F1-score for humpback whale song versus non-song vocalization classification, showing the effectiveness of the proposed approach for real-time classification at sea.
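The Bag of Words step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the codebook size, the clustering method, or the frame parameters, so the k-means codebook, the function names (`nearest`, `kmeans`, `bag_of_words`), and the toy two-dimensional frames below are all assumptions. Each frame stands in for one precomputed time-frequency feature vector (for example, one MFCC frame), and each segment is summarized as a normalized histogram of codeword counts that a classifier such as an SVM could take as input.

```python
import random


def nearest(codebook, frame):
    """Index of the codeword closest to `frame` (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], frame)),
    )


def kmeans(frames, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids (the assumed 'vocabulary')."""
    rng = random.Random(seed)
    codebook = [list(f) for f in rng.sample(frames, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for f in frames:
            groups[nearest(codebook, f)].append(f)
        for j, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster is empty
                codebook[j] = [sum(col) / len(group) for col in zip(*group)]
    return codebook


def bag_of_words(frames, codebook):
    """Normalized histogram of codeword counts for one signal segment."""
    hist = [0.0] * len(codebook)
    for f in frames:
        hist[nearest(codebook, f)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy data only: hypothetical 2-D "frames" from two segments.
segment_a = [[0.0, 1.0], [1.0, 0.0], [0.0, 0.0], [10.0, 10.0]]
segment_b = [[10.0, 11.0], [11.0, 10.0], [9.0, 9.0], [0.5, 0.5]]

# Learn the vocabulary from all frames, then describe each segment.
vocabulary = kmeans(segment_a + segment_b, k=2)
descriptor_a = bag_of_words(segment_a, vocabulary)
descriptor_b = bag_of_words(segment_b, vocabulary)
```

Every segment maps to a vector of the same length regardless of its duration, which is what lets variable-length vocalizations be fed to a fixed-input classifier.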
Award ID(s):
1736749
PAR ID:
10198587
Journal Name:
2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA)
Page Range / eLocation ID:
865 to 870
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. A large variety of sound sources in the ocean, including biological, geophysical, and man-made, can be simultaneously monitored over instantaneous continental-shelf scale regions via the passive ocean acoustic waveguide remote sensing (POAWRS) technique by employing a large-aperture densely-populated coherent hydrophone array system. Millions of acoustic signals received on the POAWRS system per day can make it challenging to identify individual sound sources. An automated classification system is necessary to enable sound sources to be recognized. Here, the objectives are to (i) gather a large training and test data set of fin whale vocalization and other acoustic signal detections; (ii) build multiple fin whale vocalization classifiers, including a logistic regression, support vector machine (SVM), decision tree, convolutional neural network (CNN), and long short-term memory (LSTM) network; (iii) evaluate and compare performance of these classifiers using multiple metrics including accuracy, precision, recall and F1-score; and (iv) integrate one of the classifiers into the existing POAWRS array and signal processing software. The findings presented here will (1) provide an automatic classifier for near real-time fin whale vocalization detection and recognition, useful in marine mammal monitoring applications; and (2) lay the foundation for building an automatic classifier applied for near real-time detection and recognition of a wide variety of biological, geophysical, and man-made sound sources typically detected by the POAWRS system in the ocean. 
  2. An eight-element oil-filled hydrophone array is used to measure the acoustic field in littoral waters. This prototype array was deployed during an experiment between Jeffrey’s Ledge and the Stellwagen Bank region off the coast of Rockport, Massachusetts, USA. During the experiment, several humpback whale vocalizations, distant ship tonals and high-frequency conventional echosounder pings were recorded. Visual confirmation of a humpback whale moving in bearing relative to the array verifies the directional sensing from array beamforming. During deployment, the array is towed at speeds varying from 4 to 7 kts in water depths of roughly 100 m with conditions at sea state 2 to 3. This array system consists of a portable winch with array, tow cable and 3 water-resistant boxes housing electronics. This system is deployed and operated by 2 crew members onboard a 13 m commercial fishing vessel during the experiment. Non-acoustic sensor (NAS) information is obtained to provide depth, temperature, and heading data using commercial off-the-shelf (COTS) components utilizing RS485/232 data communications. Acoustic data sampling was performed at 8 kHz, 30 kHz and 100 kHz with near real-time processing of data and enhanced Signal to Noise Ratio (SNR) from beamforming. The electrical system components are deployed with 3 stacked electronics boxes housing power, data acquisition and data processing components in water-resistant compartments. A laptop computer with 8 TB of external storage and an independent Global Positioning System (GPS) antenna is used to run Passive Ocean Acoustic Waveguide Remote Sensing (POAWRS) software providing beamformed spectrogram data and live NAS data with capability of capturing several days of data. The acquisition system consists of Surface Mount Device (SMD) pre-amplifiers with filter to an analog differential pair shipboard COTS acquisition system.
Pre-amplifiers are constructed using SMD technology where components are pressure tolerant and potting is not necessary. Potting of connectors, electronics and hydrophones via 3D printed molding techniques will be discussed. Array internal components are manufactured with Thermoplastic Polyurethane (TPU) 3D printed material to dampen array vibrations with forward and aft vibration isolation modules (VIM). Polyurethane foam (PUF) is used to scatter breathing waves and dampen contact from wires inside the array without attenuating high frequencies, allowing for significant noise reduction. A single Tygon array section with a length of 7.5 m and diameter of 38 mm contains 8 transducer elements with a spacing of 75 cm (1 kHz design frequency). Pre-amplifiers and NAS modules are affixed using Vectran and steel wire rope positioned by swaged stops along the strength member. The tow cable length is 100 m with a diameter of 22 mm that is potted to a hose adapter to break out 12 braided copper wire twisted pair conductors and terminates the tow cable Vectran braid. This array in its current state of development is a low-cost alternative to obtain quality acoustic data from a towed array system. Used here for observation of whale vocalizations, this type of array also has many applications in military sonar and seismic surveying. Maintenance on the array can be performed without the use of special facilities or equipment for dehosing and conveniently uses castor oil as an environmentally safe pressure compensating and coupling fluid. Array development including selection of transducers, NAS modules, acoustic acquisition system, array materials and method of construction with results from several deployments will be discussed. We also present beamformed spectrograms containing humpback whale downsweep moans and underwater blowing (bubbles) sounds associated with feeding on sand lance (Ammodytes dubius).
  3. Abstract

    Birdsong is a longstanding model system for studying evolution and biodiversity. Here, we collected and analyzed high quality song recordings from seven species in the family Estrildidae. We measured the acoustic features of syllables and then used dimensionality reduction and machine learning classifiers to identify features that accurately assigned syllables to species. Species differences were captured by the first 3 principal components, corresponding to basic frequency, power distribution, and spectrotemporal features. We then identified the measured features underlying classification accuracy. We found that fundamental frequency, mean frequency, spectral flatness, and syllable duration were the most informative features for species identification. Next, we tested whether specific acoustic features of species’ songs predicted phylogenetic distance. We found significant phylogenetic signal in syllable frequency features, but not in power distribution or spectrotemporal features. Results suggest that frequency features are more constrained by species’ genetics than are other features, and are the best signal features for identifying species from song recordings. The absence of phylogenetic signal in power distribution and spectrotemporal features suggests that these song features are labile, reflecting learning processes and individual recognition.

     
  4. Abstract

    Previous work has demonstrated that there is extensive variation in the songs of White-crowned Sparrow (Zonotrichia leucophrys) throughout the species range, including between neighboring (and genetically distinct) subspecies Z. l. nuttalli and Z. l. pugetensis. Using a machine learning approach to bioacoustic analysis, we demonstrate that variation in song is correlated with year of recording (representing cultural drift), geographic distance, and climatic differences, but the response is subspecies- and season-specific. Automated machine learning methods of bird song annotation can process large datasets more efficiently, allowing us to examine 1,913 recordings across ~60 years. We utilize a recently published artificial neural network to automatically annotate White-crowned Sparrow vocalizations. By analyzing differences in syllable usage and composition, we recapitulate the known pattern where Z. l. nuttalli and Z. l. pugetensis have significantly different songs. Our results are consistent with the interpretation that these differences are caused by the changes in characteristics of syllables in the White-crowned Sparrow repertoire. This supports the hypothesis that the evolution of vocalization behavior is affected by the environment, in addition to population structure.

     
  5. Abstract

    To better understand spawning vocalizations of Norwegian coastal cod (Gadus morhua), a prototype eight-element coherent hydrophone array was deployed in stationary vertical and towed horizontal modes to monitor cod sounds during an experiment in spring 2019. Depth distribution of cod aggregations was monitored concurrently with an ultrasonic echosounder. Cod vocalizations recorded on the hydrophone array are analysed to provide time–frequency characteristics, and source level distribution after correcting for one-way transmission losses from cod locations to the hydrophone array. The recorded cod vocalization frequencies range from ∼20 to 600 Hz with a peak power frequency of ∼60 Hz, average duration of 300 ms, and mean source level of 163.5 ± 7.9 dB re 1 μPa at 1 m. Spatial dependence of received cod vocalization rates is estimated using hydrophone array measurements as the array is towed horizontally from deeper surrounding waters to shallow water inlet areas of the experimental site. The bathymetric-dependent probability of detection regions for cod vocalizations are quantified and are found to be significantly reduced in shallow-water areas of the inlet. We show that the towable hydrophone array deployed from a moving vessel is invaluable because it can survey cod vocalization activity at multiple locations, providing continuous spatial coverage that is complementary to fixed sensor systems that provide continuous temporal coverage at a given location.

     