Sounds great :) Is the dataset you created available to the public?
It's a good idea to make the dataset public. I'll try and make it available soon.
Awesome :) I was in the first whale detection challenge as well and I'd love to see how my method fares on more realistic data. Thanks!
Hi, I stumbled on your website and was quite impressed. I'm working with similar spectrograms, and I was wondering if you could give me some advice on how you coded connecting the peaks of the whale calls to produce the line in Figure 3. Could I perhaps get some sample code?
I'm working with the Matlab programming language.
!!! Where is it? ... Thx! :)
Sorry, still can't put the dataset I was working on public.
But the 2015 DCLDE workshop has two pretty cool datasets that you should check out: http://www.cetus.ucsd.edu/d...
Oh!!! It's exactly what I needed! Do you know if there is a workshop coming up soon?
DCLDE takes place every two years.
It might be productive to try your image-matching techniques on the continuous wavelet transform of the sound (which is a fast operation), rather than just on the frequency spectrum.
Wavelets are like Fourier transforms, but they preserve the temporal component, rather than switching completely to the frequency domain.
Thanks for the pointer. I'll give the wavelet transform a try.
Hi,
Did you ever implement this with wavelets?
(I've been inspired by your post/approach to try this in a totally different domain using "interpreted" time series, and was curious how wavelets had worked for you).
Interesting. I found out that scipy.signal has an implementation of CWT with the ricker wavelet
http://docs.scipy.org/doc/s...
Do you have any particular implementation or usage recommendation for CWT using Python libraries?
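For anyone curious what that scipy.signal-style CWT looks like, here is a minimal sketch using only NumPy (the Ricker normalization follows the SciPy implementation, but the function names and toy signal here are illustrative, not code from the post's author):

```python
import numpy as np

def ricker(points, a):
    # Ricker (Mexican hat) wavelet, normalized as in SciPy's implementation.
    A = 2 / (np.sqrt(3 * a) * np.pi ** 0.25)
    t = np.arange(points) - (points - 1) / 2
    return A * (1 - (t / a) ** 2) * np.exp(-(t ** 2) / (2 * a ** 2))

def cwt(signal, widths):
    # One row of wavelet responses per width (scale); columns follow time.
    out = np.empty((len(widths), len(signal)))
    for i, w in enumerate(widths):
        n = min(10 * int(w), len(signal))
        out[i] = np.convolve(signal, ricker(n, w), mode="same")
    return out

# Toy usage: a chirp-like test signal standing in for an audio clip.
sig = np.sin(2 * np.pi * np.linspace(0, 10, 2000) ** 1.5)
coeffs = cwt(sig, widths=np.arange(1, 31))
print(coeffs.shape)  # one row per scale, one column per sample
```

The resulting 2-D scale-versus-time array can then be treated like an image, much as the spectrogram is in the post.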
Great article. If I recall, there were 2 Kaggle competitions (http://www.kaggle.com/c/the..., http://www.kaggle.com/c/wha... for this task and the winner of both (Team Sluicebox) didn't use deep learning. Any idea how your current method compares with those results?
In terms of Kaggle score, you can directly compare mine and SluiceBox's results in the leaderboards: in the first competition they reached +0.3% better AUC, and in the second +0.2%.
My method hasn't really changed much since, it's just the datasets and requirements that changed.
Kridler has a SciPy talk where he describes his (great) approach. I think it involved a fair bit more labour; I remember it took SluiceBox a while before they reached the score that I had produced in the first weekend. So perhaps development time is one of the significant differences between the two approaches.
Interesting: if I remember correctly, for the second challenge, Kridler discovered by visual inspection of the spectrograms that the low frequency part was mostly noise, so he decided to crop that part out, then extracted features by convolving templates over the spectrogram, and fed those features to his random forests.
In your case you don't need to extract convolutional features, as the CNN does that for you. But have you tried cropping out the noisy, low frequency part manually? Would it have improved the final AUC, or is the CNN able to judge that this part is useless?
That's right. SluiceBox had to select good templates for their template matching by hand, I didn't have to do anything like that.
Regarding cropping the low frequency part; I guess that might have helped shave off a few more errors, since the CNN isn't too clever about where it finds the pattern it's looking for, so it will look not only in the typical 80-200 Hz up-call frequency range but also below. On the other hand, it knows how to deal with noise, so would have probably disregarded most of the low frequencies if they were indeed very noisy.
Working with the problem in real-life, I've become less worried though about improving on those 0.2%, and more interested in robustness, execution speed and the ability to adapt to new data and conditions.
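For reference, the cropping discussed above amounts to slicing the spectrogram's frequency rows to the typical 80-200 Hz up-call band; a minimal NumPy sketch (the helper name and array shapes are illustrative, not from the author's pipeline):

```python
import numpy as np

def crop_band(spec, freqs, lo=80.0, hi=200.0):
    """Keep only spectrogram rows whose bin frequency lies in [lo, hi] Hz.

    spec  -- 2-D array, shape (n_freqs, n_frames)
    freqs -- 1-D array of bin center frequencies, length n_freqs
    """
    mask = (freqs >= lo) & (freqs <= hi)
    return spec[mask, :]

# Toy example: 257 frequency bins spanning 0-1000 Hz, 100 time frames.
freqs = np.linspace(0, 1000, 257)
spec = np.random.rand(257, 100)
cropped = crop_band(spec, freqs)
print(cropped.shape)
```

Everything below 80 Hz simply never reaches the CNN this way, instead of relying on the network to learn to ignore it.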
That's a great story. I enjoyed this article, maybe even more than you enjoyed listening to underwater sounds.
Thanks, Zygmunt!
On the technical side, how did you choose hyperparams (conv. kernel size, number of layers etc.) for your network? Was there a lot of tuning?
I have a few slides that explain how I chose and optimized the CNN's hyperparameters in my DCLDE talk. They start on slide 28 with the title "Practical tips for better results": https://speakerdeck.com/dno...
I did quite a bit of tuning as part of the challenge, but most of it was done to squeeze fractions of a percent in the results. That is, it didn't make too much of a difference once I'd figured out a general architecture that worked. This general architecture was in turn very similar to convnet architectures you can find in work from Krizhevsky et al.
Despite what others say, my experience with these nets is that they are actually quite robust to train. I find the search for hyperparameters that work well isn't too hard once you follow a few basic rules, as outlined in the talk.
Thanks, that's what I was looking for.
Hello Daniel, I am looking at Figure 2, and I'm wondering if the blue line is drawn manually or is an output of the CNN algorithm? If the latter, could you explain a bit? Many thanks!
Jing
No, the blue line in Figure 2 isn't produced by the CNN. I lifted that figure from the Kaggle competition website. It's likely the result of a feature extractor that's looking for sound parameters like duration, minimum/maximum frequency. Maybe similar to what Gillespie describes in "Detection and classification of right whale calls using an edge detector operating on a smoothed spectrogram".
Thank you. So this could be the output of an edge detection algorithm for dominant frequency. I need to read Gillespie's paper.
Fantastic writeup, Daniel. It's always very interesting trying models on other datasets to look at how they perform and how they can be adapted to work on a more general set. Thanks again for taking the time to write up all of these notes and results!
Great article! Thanks for sharing all this valuable information.
Can you please elaborate on your spectrogram-analysis technique? Did you convert the spectrograms into arrays, or what procedure did you follow?
On that topic - is your code available? I'd love to play with implementing this.
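On the arrays question: a spectrogram is already a 2-D array once computed, so it can be fed to a CNN directly. A minimal sketch using scipy.signal.spectrogram (the sample rate, window parameters, and synthetic tone here are illustrative, not the author's settings):

```python
import numpy as np
from scipy.signal import spectrogram

# Two seconds of synthetic audio at a 2 kHz sample rate, standing in
# for a whale-call clip.
fs = 2000
t = np.arange(0, 2.0, 1 / fs)
audio = np.sin(2 * np.pi * 150 * t)  # 150 Hz tone, in the up-call band

# freqs: bin center frequencies, times: frame times, Sxx: the 2-D array
# (rows = frequency bins, columns = time frames) a CNN can consume.
freqs, times, Sxx = spectrogram(audio, fs=fs, nperseg=256, noverlap=192)
print(Sxx.shape)
```

From there it is ordinary array manipulation: normalize, crop, and stack the spectrograms into the network's input batch.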
Very, very neat! (I've been forwarding this post to a number of ML friends.)
Still, wouldn't it be worthwhile to work on the audio signals directly? Granted, you'd need to think up convolutions for raw audio, but it seems odd to treat an audio image as an image instead of working on the audio signal itself (if you can decompose it into fixed-length windows).
I am also curious to hear your insight on this matter!
Looks wonderful!