This is the second post in this deep-dive series.
You can start from the beginning with Part 1, or skip forward to Part 3.
In this post, we’ll look into some physical characteristics of the birds: how big they are and where they like to sit.
Setup
As covered in Part 1, we’re looking at the hummingbird detection data from summer 2021. We did some preliminary data cleaning, and now we’ll investigate how to compare sizes between species.
Code
library(tidyverse)
library(lubridate)
library(patchwork)
library(ggpubr)
knitr::opts_chunk$set(eval = FALSE)

# read in bird detection data, and do some basic data cleaning
bird_scaled <- read_csv('https://raw.githubusercontent.com/teaswamp-creations/galiwatch.ca/refs/heads/quarto-website/posts/hummer-data-cleaning-170323/bird_scaled.csv')
Because of the camera perspective, bounding box area will be affected by not only the bird size, but also its pose and closeness to the camera.
Birds closer to the front of the feeder have larger bounding boxes.
Code
bird_scaled %>%
  mutate(Area = (Height * Width)) %>%
  gghistogram(x = "Area", bins = 100) +
  labs(title = 'Bounding box area after batch correction',
       subtitle = 'Relative rarity of values at 18,000 and 38,000 pixels',
       y = 'Count')
We can see two dips in the areas, as noted previously. The dips around bounding box areas of 18,000 and 38,000 pixels are surprising: they mean that certain height/width combinations are not being picked up very frequently. This may indicate that there are certain positions on the feeder at which we can’t effectively identify birds.
We see this even more clearly by plotting bird height against width.
Code
bird_scaled %>%
  ggscatter(x = "Width", y = "Height", alpha = 0.6, size = 1) +
  labs(title = 'Mid-sized birds are hard to call?') +
  theme(legend.position = 'right', legend.direction = 'vertical') +
  scale_color_ordinal(option = 'D')
Distribution of high-confidence IDs
There are some height/width combinations with no birds observed at them! To check whether this may be related to our classifier’s ability to call these birds, we can colour this plot by the confidence level (recall: we already filtered the data such that all IDs have confidence > 0.7).
Code
# bin confidence into three levels, then plot its distribution
bird_scaled %>%
  mutate(confidence_lvl = cut(confidence, c(0.7, 0.9, 0.98, 1), ordered_result = TRUE)) %>%
  ggplot(aes(x = confidence, fill = confidence_lvl)) +
  geom_histogram(bins = 50) +
  theme_pubr() +
  labs(fill = 'Confidence', x = "Confidence", y = "Count")
We’ll colour confidence by levels for visualization purposes.
Code
bird_scaled %>%
  mutate(confidence_lvl = cut(confidence, c(0.7, 0.9, 0.98, 1), ordered_result = TRUE)) %>%
  ggscatter(x = "Width", y = "Height", add = "reg.line", alpha = 0.6,
            color = 'confidence_lvl', size = 1,
            add.params = list(color = "black", fill = "lightgray", linetype = 'dashed')) +
  stat_cor(label.x = 25, label.y = 350) +
  stat_regline_equation(label.x = 25, label.y = 330) +
  labs(title = 'Mid-sized birds are hard to call?',
       colour = 'Confidence Level') +
  theme(legend.position = 'right', legend.direction = 'vertical') +
  scale_color_ordinal(option = 'D')
We can see that at least some of these “holes” seem to be associated with lower-confidence IDs, which is consistent with the hypothesis that they’re indicative of missing IDs from hard-to-call birds.
In terms of x location, the distribution of bird sitting positions is strikingly different at different confidence levels. High-confidence calls are much more concentrated at the sides of the feeder than in the middle.
Our low-confidence bird calls come mostly from mid-sized bounding boxes, and from towards the back of the feeder.
There are some low-confidence calls OFF the feeder. These are birds who were identified mid-flight! These fast-moving birds might be harder to classify. There are also low-confidence calls in the middle. These birds sit with their back to the camera, and may therefore be harder to identify.
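One way to put numbers on this pattern is to bin the horizontal position and tabulate the confidence levels within each bin. The sketch below is only an illustration: it uses a small made-up data frame in place of bird_scaled, the `xmid` and `confidence` column names follow the ones used in this post, and the three equal-width bins (labelled left, middle and right) are an assumption chosen for convenience.

```r
library(dplyr)

# Toy stand-in for bird_scaled: values are invented for illustration only
toy <- tibble::tibble(
  xmid       = c(310, 320, 335, 410, 420, 430, 440, 500, 510, 520),
  confidence = c(0.99, 0.995, 0.95, 0.75, 0.80, 0.85, 0.99, 0.99, 0.985, 0.92)
)

position_summary <- toy %>%
  mutate(
    # same confidence bins as in the plots above
    confidence_lvl = cut(confidence, c(0.7, 0.9, 0.98, 1), ordered_result = TRUE),
    # assumption: three equal-width horizontal bins across the observed range
    x_bin = cut(xmid, breaks = 3, labels = c("left", "middle", "right"))
  ) %>%
  count(x_bin, confidence_lvl) %>%
  group_by(x_bin) %>%
  mutate(share = n / sum(n)) %>%  # share of each confidence level within a bin
  ungroup()

print(position_summary)
```

On the real data, a higher share of the lowest confidence level in the middle bin than in the side bins would agree with what the plots suggest.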
The variability of confidence according to position may be confounded with Species and Sex classes: Through manual observation we noticed that different birds had different sitting spot preferences.
Female + immature male birds tend to be ID’d towards the front of the feeder. Mature male Anna’s tend to be ID’d on the left, and mature male Rufous tend to be ID’d on the right side of the feeder.
This could be because they actually do sit in those spots more often, or because they’re easier to identify when they sit there.
Pose
In addition to whether a bird is close to the camera or not, we should keep in mind its pose on the feeder: the birds almost always face towards the middle, where the food is. Birds in the center of the feeder will likely produce taller bounding boxes, while birds on the sides may have a wider “profile view” bounding box.
Let’s look at the height and width of the bounding boxes to get a better sense of the effect of sitting pose.
Code
bird_scaled %>%
  ggplot(aes(x = xmid, y = -ymid, colour = log2(Width / Height))) +
  geom_point() +
  theme_pubr() +
  scale_colour_gradientn(limits = c(-1, 1), oob = scales::squish,
                         colours = c('navy', 'khaki2', 'darkred')) +
  lims(y = c(-550, -200)) +
  coord_cartesian(clip = 'off') +
  annotate('text', x = 340, y = -200, label = 'Tall & thin', angle = 45, vjust = 0, hjust = 0.35) +
  annotate('text', x = 420, y = -200, label = 'Square', angle = 45, vjust = 0, hjust = 0.2) +
  annotate('text', x = 495, y = -200, label = 'Short & wide', angle = 45, vjust = 0, hjust = 0.5) +
  theme(legend.background = element_blank())
🧠 Observation
Bird pose, and consequently bounding box aspect ratio, depends on where on the feeder the bird sits.
We know that the bounding boxes with the largest area belong to birds close to the camera, but that confidence in IDs is relatively low in the middle of the feeder. We suspect that sex and species are hard to identify there because birds sitting in that spot face away from the camera (i.e. their back points to the camera when they eat from the feeder). Our confidence levels support that hypothesis:
High-confidence IDs tend to be short & wide (profile view) rather than tall & thin (back view).
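As a rough numeric check of this observation, one could summarise the log aspect ratio within each confidence level. The sketch below runs on invented values rather than the real detections; `Width`, `Height` and `confidence` are the column names used in this post, and the choice of the median as the summary statistic is an assumption.

```r
library(dplyr)

# Toy stand-in for bird_scaled: values are invented for illustration only
toy <- tibble::tibble(
  Width      = c(200, 180, 220, 100, 110, 150),
  Height     = c(100,  90, 110, 200, 220, 150),
  confidence = c(0.99, 0.995, 0.97, 0.75, 0.80, 0.95)
)

aspect_by_confidence <- toy %>%
  mutate(
    confidence_lvl = cut(confidence, c(0.7, 0.9, 0.98, 1), ordered_result = TRUE),
    # > 0 means short & wide, < 0 means tall & thin, 0 means square
    log_aspect = log2(Width / Height)
  ) %>%
  group_by(confidence_lvl) %>%
  summarise(n = n(), median_log_aspect = median(log_aspect), .groups = "drop")

print(aspect_by_confidence)
```

If the hypothesis holds on the real data, the median log aspect ratio should increase with the confidence level.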
Size comparison across species
Anna’s hummingbirds are a little larger than Rufous 1, and Anna’s males and females are more or less the same size 2. Within the Rufous species, however, males tend to be a little smaller than females 3.
Measurements taken from allaboutbirds.org

|           | Anna’s             | Rufous              |
|-----------|--------------------|---------------------|
| Length    | 3.9 in (10 cm)     | 2.8-3.5 in (7-9 cm) |
| Weight    | 0.1-0.2 oz (3-6 g) | 0.1-0.2 oz (2-5 g)  |
| Wingspan  | 4.7 in (12 cm)     | 4.3 in (11 cm)      |
If preferred sitting position were independent of bird species and sex, we could directly compare the bounding boxes of the different classes.
However, as we have seen, this assumption does not hold. If we want to make a fair comparison of sizes across the sex and species classes, we should account for the pose/position of the birds.
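One possible way to account for position, sketched below, is to include the box position as a covariate in a linear model of area. This is an illustration under stated assumptions, not necessarily the approach taken later in this series: the data are simulated, the `class` column and its labels are hypothetical placeholders, and larger `ymid` is assumed to mean closer to the camera.

```r
set.seed(1)

# Simulated stand-in for bird_scaled; `class` and its labels are hypothetical
n <- 200
toy <- data.frame(
  class = rep(c("A", "B"), each = n / 2),
  # assumption: class A tends to sit nearer the camera (larger ymid) than class B
  ymid  = c(runif(n / 2, 350, 550), runif(n / 2, 200, 400))
)
# Area depends only on position here; the true class effect is zero
toy$Area <- 5000 + 60 * toy$ymid + rnorm(n, sd = 1500)

naive    <- lm(Area ~ class, data = toy)         # ignores position
adjusted <- lm(Area ~ class + ymid, data = toy)  # accounts for position

# Compare the estimated class difference with and without the position term
print(coef(naive)["classB"])
print(coef(adjusted)["classB"])
```

In this simulation the naive model attributes the position difference to the class, while the adjusted model's class coefficient is much closer to zero. A linear term is the simplest choice; pose effects on the real feeder may well be non-linear.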
Summary
Birds closer to the front of the feeder have larger bounding boxes. Our low-confidence bird calls come mostly from mid-sized bounding boxes, and from towards the back of the feeder. Bird pose, and consequently bounding box aspect ratio, depends on where on the feeder the bird sits. High-confidence IDs tend to be short & wide (profile view) rather than tall & thin (back view).
Recall that it’s not possible to distinguish female from immature male birds; it’s also hard to tell the species apart when a bird’s back is facing the camera.