

How should I interpret BirdNET results?

OldFatherTime

Active member
United Kingdom
I'm using Chirpity and posed this question there, but as it uses BirdNET under the covers for analysis, this is really a BirdNET question.

For anyone who hasn't used Chirpity, it manages large numbers of recordings, feeds them to BirdNET and shows the combined results in an easy-to-explore display. One of its most useful (for me) summaries is the count of detections per species together with the highest-confidence detection for that species. This combination of confidence figure plus number of detections is the basis of my assessment of whether the bird is present. A significant number of high-confidence detections means the species is almost certainly present and I accept it as such. In addition, if I already know that the species is present, then I'll accept that as well.

Anything else needs further investigation, and there will be a lot of it. How do I deal with all this? Obviously I can listen to the recording and look at the spectrogram to make an assessment. One problem for me is my limited hearing range, which tops out somewhere between 4 and 5 kHz. That means I can't hear many bird songs, or at least parts of them, so verification can be difficult. And of course you need the knowledge. But with potentially hundreds of recordings, individual assessment can be impractical.

Here's a summary of a recent analysis of recordings over 5 days from my local wood, 100 acres, mixed deciduous/coniferous:

Species name                Max confidence  Detections

Coal Tit                    1.0             110
Eurasian Blue Tit           0.999           99
Eurasian Wren               0.997           859
European Robin              0.995           399
Goldcrest                   0.994           25
Eurasian Blackbird          0.991           338
Common Woodpigeon           0.991           49
Common Chiffchaff           0.989           235
Northern Lapwing            0.971           5
Eurasian Oystercatcher      0.969           4
Water Rail                  0.968           33
Eurasian Green Woodpecker   0.966           8
Dunnock                     0.965           18
European Goldfinch          0.964           3
Common Chaffinch            0.964           7
Green Sandpiper             0.958           15
Black Redstart              0.955           5
Eurasian/Green-winged Teal  0.952           4
Great Tit                   0.949           23
Song Thrush                 0.948           139
European Pied Flycatcher    0.946           2
Eurasian Blackcap           0.942           29
Carrion Crow                0.942           18
Common Sandpiper            0.934           13
Common Buzzard              0.931           3
Spotted Flycatcher          0.931           14
Whimbrel                    0.93            1
Tree Pipit                  0.926           6
Common Greenshank           0.925           6
Tawny Owl                   0.921           20
Marsh Tit                   0.904           4
Great Spotted Woodpecker    0.893           1
Mistle Thrush               0.887           4
Snow Bunting                0.863           2
Common Crossbill            0.861           4
Willow Warbler              0.86            3
Little Ringed Plover        0.859           14
Yellowhammer                0.853           10
Eurasian Curlew             0.853           18
Long-eared Owl              0.845           1
Redwing                     0.825           17
European Greenfinch         0.812           2
Long-tailed Tit             0.793           5
Pied Wagtail/White Wagtail  0.792           1
European Turtle Dove        0.792           1
Common Kingfisher           0.773           1
Hawfinch                    0.745           1
Common Redshank             0.736           1
Common Firecrest            0.722           5
Spotted Redshank            0.721           1
Crested Tit                 0.689           2
Grey Plover                 0.672           1
Common Reed Bunting         0.671           1
Collared Dove               0.666           1
Barn Owl                    0.615           1
Common Grasshopper Warbler  0.597           3
Wood Sandpiper              0.595           3
Ruddy Turnstone             0.589           1
Red Kite                    0.577           1
Eurasian Siskin             0.564           2
Lesser Spotted Woodpecker   0.54            1
Common Tern                 0.526           1
Stock Dove                  0.514           1


(Note that the confidence shown is the maximum for each species; its other detections are likely to score lower.)

Recordings are made for 3 minutes in every 30, 24 hours a day, so the wader detections could be (nocturnal) migrants and I can't just dismiss them. But there are many questionable species here, for example Water Rail, Long-eared Owl, Kingfisher, Crested Tit and Reed Bunting, to name just a few. You could be forgiven for being persuaded by the Water Rail confidence and numbers, but having listened to some of them I'm fairly sure they are Song Thrushes.

Ideally, I'd like to come up with a simple algorithm - rule of thumb - that I can apply automatically to make the best determination possible of what's present and what isn't, probably based on confidence and detections. Is this reasonable? Any ideas?
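For illustration only, the rule of thumb described above (accept a species that has a significant number of high-confidence detections, or that is already known to be present, and investigate everything else) might be sketched in Python as follows. The names `SpeciesSummary` and `triage` and both threshold values are hypothetical placeholders, not anything provided by Chirpity or BirdNET, and because the summary only holds the maximum confidence per species, the detection count here stands in for "number of high-confidence detections".

```python
from dataclasses import dataclass


@dataclass
class SpeciesSummary:
    """One row of the per-species summary (hypothetical structure)."""
    name: str
    max_confidence: float
    detections: int


def triage(summary, known_present=frozenset(),
           min_confidence=0.95, min_detections=20):
    """Return "accept" or "investigate" for one species.

    Both thresholds are arbitrary placeholders, not recommended values.
    """
    # Prior knowledge that the species is present overrides the scores.
    if summary.name in known_present:
        return "accept"
    # Otherwise require a high best score AND a reasonable number of detections.
    if (summary.max_confidence >= min_confidence
            and summary.detections >= min_detections):
        return "accept"
    return "investigate"
```

With these placeholder thresholds, Coal Tit (1.0, 110 detections) is accepted and Whimbrel (0.93, 1 detection) is sent for investigation, but Water Rail (0.968, 33 detections) is accepted too, even though the detections listened to sounded like Song Thrushes. A single global cutoff of this kind is therefore only a first pass.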
 
Hi, unfortunately confidence on BirdNET can't be interpreted very easily. It merely reflects the closeness of fit based upon the data it has been trained on, not the likelihood that the species has been correctly identified. I've worked with BirdNET in many Spanish habitats and there are some species that it consistently gives low confidence for (but it's correct on) and others for which it gives high-confidence false positives. I've had 99% confidence Little Owls be wrong. You therefore (unfortunately) need to interpret confidence on a species-by-species level, which is time consuming, of course. For some species, a 0.60 confidence is correct 99% of the time; for others, the confidence needed to be right 99% of the time may be higher or lower. Here's a fantastic guidelines paper for use of BirdNET: https://www.researchgate.net/public..._of_BirdNET_scores_and_other_detector_outputs.
Ultimately, the only way to truly be sure is to manually verify recordings.
Water Rail can be picked up on nocmig more than you'd expect, although I'm not sure how much the BirdNET algorithm has been trained on nocturnal flight calls.
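As a rough sketch of what "species-by-species" interpretation could look like in practice: given a sample of manually verified detections for one species, you can look for the lowest confidence cutoff at which the detections at or above it reach a target proportion correct. This is a simple empirical stand-in, not the method from the linked paper, and the function name and input layout are assumptions made for the example.

```python
def lowest_threshold_for_precision(verified, target_precision=0.95):
    """Find the lowest cutoff whose detections meet the target precision.

    verified: list of (confidence, is_correct) pairs for ONE species,
              where is_correct comes from manual checking.
    Returns the cutoff, or None if no cutoff reaches the target.
    """
    # Walk from the highest score downwards, tracking cumulative precision.
    ranked = sorted(verified, key=lambda pair: pair[0], reverse=True)
    best = None
    correct = 0
    for i, (confidence, is_correct) in enumerate(ranked, start=1):
        correct += bool(is_correct)
        # Tied scores share one cutoff, so evaluate only after the last tie.
        if i < len(ranked) and ranked[i][0] == confidence:
            continue
        if correct / i >= target_precision:
            best = confidence
    return best
```

A species that BirdNET scores low but gets right can end up with a low cutoff, while one with high-scoring false positives may have no usable cutoff at all in the sample (the function returns `None`). The result is only as reliable as the verified sample is large and representative.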
 
Thanks for that. I've read the paper once and I think I'll have to do that a few more times before I get my head around it all (if ever!). It does, as you say, differentiate confidence from probability, although I still think some people would see them as roughly the same thing.

My use of BirdNET is mostly to alert me to species that may be present that I'm not currently aware of - as distinct from, say, a population survey of known species. This is probably the most inappropriate use of the technology, as (a) their presence is by definition questionable and (b) numbers are likely to be small given that I'm not aware of them. Perhaps the only way of using these figures is as a heads-up for further investigation.

Chirpity has an alternative "nocmig" mode, which uses a differently trained model and only looks at night-time periods. I haven't used this but I can already see when detections occur during the night and I ignore those. I'm only interested in populations within the wood. The "Water Rail" were daytime detections and were (in those I looked at) the repeated "tck tck tck" of a Song Thrush.
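Ignoring night-time detections by hand could also be done as a filter step. The sketch below assumes each detection is a `(timestamp, species, confidence)` tuple, which is a made-up layout rather than Chirpity's export format, and it uses fixed clock hours as a crude stand-in for actual sunrise and sunset.

```python
from datetime import datetime, time


def is_daytime(timestamp, day_start=time(5, 0), day_end=time(21, 0)):
    """True if the timestamp falls inside the (placeholder) daytime window."""
    return day_start <= timestamp.time() < day_end


def daytime_only(detections, day_start=time(5, 0), day_end=time(21, 0)):
    """Keep only detections whose timestamp is in the daytime window.

    detections: iterable of (timestamp, species, confidence) tuples.
    """
    return [d for d in detections if is_daytime(d[0], day_start, day_end)]
```

The window would need adjusting through the year, since real day length changes with season and latitude.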

One issue I do see with BirdNET is caused by its 3s "chunking". Many false positives are the result of separately analysing the start or end of another bird's song, i.e. taking it out of context. You'll be familiar with the Merlin app, which doesn't seem to suffer from this. I know that Merlin comes from the same stable as BirdNET but I understand that it uses a different algorithm, perhaps something continuous rather than a series of isolated analyses. Sometimes I'll play a recording into Merlin to get a second opinion and it doesn't seem to produce these false positives. I don't know if there is anything that can be done to overcome this.
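One possible heuristic for spotting these chunk-boundary cases, offered only as a sketch: within a single recording, flag any species that turns up in just one isolated 3s chunk directly next to a chunk labelled as a different species. The input here is an assumed list with one label per consecutive chunk (`None` where nothing was detected), not a format produced by BirdNET or Chirpity.

```python
def flag_boundary_suspects(labels):
    """Return indices of chunks worth a second listen.

    labels: per-chunk species labels in time order for one recording,
            with None meaning "no detection" in that chunk.
    A chunk is flagged when its label appears in neither neighbouring
    chunk and at least one neighbour holds a different species.
    """
    suspects = []
    for i, label in enumerate(labels):
        if label is None:
            continue
        prev_label = labels[i - 1] if i > 0 else None
        next_label = labels[i + 1] if i + 1 < len(labels) else None
        # Isolated: the same species is not in the chunk before or after.
        isolated = label != prev_label and label != next_label
        # Adjacent to some other detected species.
        has_other_neighbour = prev_label is not None or next_label is not None
        if isolated and has_other_neighbour:
            suspects.append(i)
    return suspects
```

This only points at candidates for checking. It would miss a misidentification that repeats across several consecutive chunks, and it would also flag a genuine single call that happens to follow another bird's song.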
 
