Tracking hummingbirds at the feeder

hummingbirds

code

data analysis

deep-dive

Author

Cait Harrigan

Published

March 17, 2023

Reading time: 8 minutes

Series overview

This is the first post in our deep-dive series on the data from our hummingbird detection. We’ll cover a few different things:

All the data analyzed here was collected in 2021. In the future, we may be able to learn more by comparing across different years. We have images from the hummingbird feeder, and micro-climate data collected from the weather station.

Background

There are two species of hummingbirds found near Galiano: the green Anna’s Hummingbird and the orange Rufous Hummingbird. ID images from allaboutbirds.org.

We set up a camera to capture these feathery fellows visiting our hummingbird feeder.

A Female Anna’s (left) and Male Rufus (right) at our feeder

We trained a classifier to detect who’s who….

And we’re even able to keep track of when the feeder needs a refill!

Setup

Lets dive in to some of the data we’ve collected!

We’ll read in and prepare our data: the classifier output as well as bounding boxes of detected birds.

Code

library(tidyverse)
library(lubridate)
library(patchwork)
library(ggpubr)

# classifier labels
classes = c('Rufous_Male', 'Annas_Male', 'Person', 'Annas_Female', 'Rufous_Female')

# read in bird detection data, and do some basic data cleaning
bird <- read_csv('2021_reprocessed_hummers_combo_2023.csv') %>% 
  # lookup labels
  mutate(class = classes[label + 1]) %>%
  # put columns into tidy format
  separate(class, into=c('Species', 'Sex'), sep='_') %>%
  separate(image, into=c('Date', 'image'), sep="_", remove = F) %>%
  separate(image, into=c('hhmm'), sep=".jpg", extra = 'drop') %>%
  # clean up timestamps
  mutate(Timestamp = ymd_hm(paste(Date, hhmm))) %>%
  # remove people
  filter(label != 2) %>%
  # change column types, correct for variable image size
  mutate(Date=ymd(Date),
         Month=factor(month(Date, label = T)),
         Hour=hour(Timestamp),
         Species = ordered(Species),
         Sex = factor(Sex),
         MergeTime = floor_date(Timestamp, 'hour'),
         xmid = (xmax+xmin)/2, 
         ymid = (ymax+ymin)/2
  )

The bird detector gives us the the bounding box of each bird in an image, and the predicted sex and species of the bird. Note that because it’s hard to distinguish female birds from the immature males, the “female” class label encompasses both of these. Lets take a peek at the data.

Code

knitr::kable(head(bird))

Date	hhmm	xmin	ymin	xmax	ymax	confidence	x_size	y_size	Mo	OK	Species	Sex	Timestamp	Month	Hour	MergeTime	xmid	ymid
2021-04-20	1651	154	365	331	514	0.8494033	820	616	04	1	Rufous	Male	2021-04-20 16:51:00	Apr	16	2021-04-20 16:00:00	242.5	439.5
2021-04-20	1706	200	382	406	478	0.8925107	820	616	04	1	Rufous	Male	2021-04-20 17:06:00	Apr	17	2021-04-20 17:00:00	303.0	430.0
2021-04-20	1718	634	401	814	552	0.9755894	820	616	04	1	Rufous	Male	2021-04-20 17:18:00	Apr	17	2021-04-20 17:00:00	724.0	476.5
2021-04-20	1914	164	389	368	507	0.6146710	820	616	04	1	Rufous	Male	2021-04-20 19:14:00	Apr	19	2021-04-20 19:00:00	266.0	448.0
2021-04-20	1925	196	389	408	472	0.9455948	820	616	04	1	Rufous	Male	2021-04-20 19:25:00	Apr	19	2021-04-20 19:00:00	302.0	430.5
2021-04-20	1942	177	402	378	539	0.9968680	820	616	04	1	Rufous	Male	2021-04-20 19:42:00	Apr	19	2021-04-20 19:00:00	277.5	470.5

Number of birds by species

Our bird detector ID’d lots of visitors! Each ID represents a bird captured in an image, and not necessarily an individual. The images are captured once every minute, so if for example a single bird sat at the feeder for two minutes in a row, it would get counted twice. There may be multiple birds in a single image.

Code

n_table <- bird %>%
  group_by(Species, Sex) %>%
  summarise('Number of ids' = n(), .groups = 'keep')

n_table %>%
  kableExtra::kbl() %>%
  kableExtra::kable_styling(full_width = F)

Species	Sex	Number of ids
Annas	Female	4127
Annas	Male	1792
Rufous	Female	11720
Rufous	Male	1081

🧠 Observation

There are more Rufouses than Annas, and many more females + immature males than mature males captured in our images.

Classifier confidence

Each bird ID is made by picking the the most likely label, based on the prediction probability. Sometimes, the classifier isn’t as confident as others. For example, it may be harder to identify a bird that is facing away from the camera. We can plot the confidences, and see which birds are the hardest to label.

Code

bird %>%
  gghistogram(x = 'confidence', fill = 'Species', color='Species', bins=100,
              facet.by = c('Sex','Species'), scales = 'free_y', alpha=1, position='stack') +
  labs(title = 'Prediction for females may be better calibrated',
       subtitle = 'Higher proportion of low-confidence IDs for males vs females of both species', 
       x = 'Prediction confidence', y = 'Number of IDs') + 
  theme(legend.position = 'right') + 
  scale_fill_manual(values=c('chartreuse3', 'chocolate2')) + 
  scale_colour_manual(values=c('chartreuse3', 'chocolate2'))

There is almost certainly some mistakes in our classifier, including false positives (non-birds labeled as birds) and false negatives (birds that are missed)

To help make sure we’re only considering high-quality calls, we’ll pick a confidence threshold of 0.7 and only analyse the IDs above that level.

🧠 Observation

For both species, mature male birds are slightly harder to identify with high confidence than female + immature male. This may be because there are fewer of them.

Filtering for high-confidence IDs

We manually looked at the images, and found that that almost all the Rufouses after September 1st are false positives, so we’ll remove these observations. Also, since we have few IDs in October compared to other months, we will drop this month from our data. Then, we’ll filter to only retain the high-confidence (> 0.7 prediction probability) bird IDs

Code

table(bird$Month) %>%
  t() %>%
  kableExtra::kbl() %>%
  kableExtra::kable_styling(full_width = F)

Apr	May	Jun	Jul	Aug	Sep	Oct
1125	4701	6063	2724	2164	1875	68

Code

bird <- bird %>%
  filter(!(Timestamp >= ymd('2021-09-01') & Species =='Rufous')) %>%
  filter(Timestamp < ymd('2021-10-01')) %>%
  filter(confidence > 0.7)

Lastly, we’ll sanity check the bounding boxes for their size.

Code

bird <- bird %>%
  mutate(
    Height = ymax-ymin, 
    Width = xmax-xmin, 
    Area = Height * Width
  )

bird %>% 
  gghistogram(x="Area", bins = 100) + 
  annotate("segment", x = 21000, xend = 21000, y = 700, yend = 600, size = 1.2,
           colour = "red", arrow = arrow(type='closed', length=unit(0.04, 'npc')))  + 
  annotate("segment", x = 50000, xend = 50000, y = 400, yend = 300, size = 1.2,
           colour = "red", arrow = arrow(type='closed', length=unit(0.04, 'npc')))  + 
  labs(title = 'There are some very small and some very large bounding boxes.',
       subtitle = "Note also relative raity of areas at 25,000 and 50,000 pixels", y = 'Count')

This distribution more or less makes sense, we see some very small bounding boxes which may be mis-calls or birds who are only half in the image. There are also some cases where a bird is partially obscured because it’s sitting behind the feeder. There are some very large bounding boxes also. I’m not sure exactly whether these are real, giant humming birds, or some kind of error.

Lets see if looking at the bird position with respect to the feeder helps sort this out.

Batch correction and normalization

We can plot the location of where each bird sits, with the proxy that a birds will be located at the center of its bounding box (indicated by xmid and ymid). The birds appear to mostly sit in a circle, around the rim of the feeder. This gives the best access to the sugar water.

Code

bird %>% ggplot(aes(x = xmid, y = -ymid, colour=y_size * x_size)) +
  geom_point() +
  theme_pubr() + 
  labs(colour = 'Image size (pixels)') + 
  scale_colour_binned(n.breaks=3) + 
  theme(legend.position = 'right', legend.direction = 'vertical')

However, it looks like there are at least 2 distinct positions that the feeder was in over the summer. We’ll try to do batch correction for this, by assigning a batch number to each position, and subtract the average from the x and y direction. I think we can reasonably split the changes in position (the batches) simply by the date. This is easy to see in the ymid variable - the feeder moves twice in May, and once in June.

Code

bird %>% 
  mutate(Month = month(Date, label=T)) %>%
  filter(Date>'2021-05-01' & Date<'2021-06-15') %>%
  ggplot(aes(x=Timestamp, y = ymid))+ 
  geom_jitter(alpha = 0.5) + 
  facet_wrap("~Month", scales = 'free_x') + 
  theme_pubr() + 
  labs(title = 'Feeder Moves Up and Down in May and June')

Lets split these into batches for correction.

Code

bird$Batch <- 1
bird$Batch <- ifelse(bird$Timestamp > ymd_hms('2021-05-08 00:00:00'),2,bird$Batch)
bird$Batch <- ifelse(bird$Timestamp > ymd_hms('2021-05-11 00:00:00'),3,bird$Batch)
bird$Batch <- ifelse(bird$Timestamp > ymd_hms('2021-06-13 12:20:00'),4,bird$Batch)
bird$Batch <- factor(bird$Batch)

We can check it went as expected by recolouring the previous plot by batch

Code

bird %>% 
  mutate(Month = month(Date, label=T)) %>%
  mutate(xmid = (xmax+xmin)/2, ymid = (ymax+ymin)/2) %>% 
  filter(Date>'2021-05-01' & Date<'2021-06-15') %>%
  ggplot(aes(x=Timestamp, y = ymid, color=Batch))+ 
  geom_point(alpha = 0.5) + 
  facet_wrap("~Month", scales = 'free_x') + 
  theme_pubr() + 
  labs(title = 'We can Infer Position Batches Based on When the Feeder Moves',
       colour = 'Batch')

Now we’re ready to correct! We’ll transform all the x- and y- values by centering and scaling each batch. This will yield a new positioning relative to the middle of the feeder; we can no longer interpret this as the pixel index in the image. We’ll normalize the “mid” locations, then add the size of the bounding box back.

Code

bird_scaled <- bird %>% 
  mutate(Month = month(Date, label=T)) %>%
  mutate(Height = ymax-ymin, Width = xmax-xmin) %>%
  group_by(Batch) %>%
  mutate(across(c(Height, Width, xmid, ymid), ~scale(.x))) %>%
  mutate(xmin = xmid - Width/2, xmax = xmid + Width/2, 
         ymin = ymid - Height/2, ymax = ymid + Height/2)

Code

bird_scaled %>% 
  filter(Date>'2021-05-01' & Date<'2021-06-15') %>%
  ggplot(aes(x=Timestamp, y = ymid, color=Batch))+ 
  geom_point(alpha = 0.5) + 
  facet_wrap("~Month", scales = 'free_x') + 
  theme_pubr() + 
  labs(title = 'Batch corrected for feeder position',
       colour = 'Batch')

We can see that the correction didn’t perfectly fix things - but it looks pretty good! From now on, we’ll use the bird_scaled dataframe.

Code

bird %>%
  ggplot(aes(x=xmid, y=-ymid) ) +
  stat_density_2d(aes(fill=Batch), geom = "polygon", alpha=0.2) + 
  theme_pubr() + 
  labs(title = 'Bird density before batch correction', fill = 'Batch')
bird_scaled %>%
  ggplot(aes(x=xmid, y=-ymid) ) +
  stat_density_2d(aes(fill=Batch), geom = "polygon", alpha=0.2) + 
  theme_pubr() + 
  labs(title = 'Bird density after batch correction',fill = 'Batch')

The gap in the middle comes from the water column - we can’t detect birds sitting behind it! Of course, we have to be somewhat careful when converting back to the coordinates of the image. I negate the y coordinates, because they count down from the top of the image.

Code

img <- jpeg::readJPEG('feeder.jpg')

bird_scaled %>%
  ggplot(aes(x=xmid, y = -ymid)) + 
  ylim(-2, 10) + 
  background_image(img) + 
  geom_point(colour = 'lightblue')

Now we have our corrected bounding boxes! However, since we z-score normalized height and width, there are some negative values. This doesn’t make sense, so we’ll bring these back to the original space by using ungroup to add back the (pooled) mean and standard deviation. At the same time, we can filter out some extreme values (>2 sd from mean).

Code

bird_scaled <- bird %>% 
  mutate(Month = month(Date, label=T)) %>%
  mutate(Height = ymax-ymin, Width = xmax-xmin, 
         original_aspect_ratio = (Width/Height)) %>%
  group_by(Batch) %>%
  mutate(across(c(Height, Width, xmid, ymid), ~scale(.x))) %>%
  filter(abs(Height)<=2) %>%
  filter(abs(Width)<=2) %>%
  ungroup() %>%
  mutate(across(c(Height, Width, xmid, ymid), ~(.x * sd(bird$.x)) + mean(bird$.x))) %>%
  mutate(xmin = xmid - Width/2, xmax = xmid + Width/2, 
         ymin = ymid - Height/2, ymax = ymid + Height/2)

Since we normalized height and width independently, We should check that aspect ratio is preserved.

Code

bird_scaled %>%
  mutate(scaled_aspect_ratio = (Width/Height)) %>%
  ggplot(aes(x= scaled_aspect_ratio, y = original_aspect_ratio, colour = Batch)) +    geom_point() + 
  facet_wrap(~Batch, labeller = label_both) + 
  theme_pubr() + 
  coord_equal() + 
  geom_abline(intercept=0, slope = 1, linetype = 'dashed', linewidth = 1) + 
  labs(x = 'Scaled aspect ratio', y = 'Original aspect ratio', colour = 'Batch')

There are some boxes that may have got warped through the normalization, especially in batches 1 and 3, but for the most part it’s not terribly far off.

Code

bird_scaled %>% 
  gghistogram(x="log2(Width/Height)", bins = 100) + 
  labs(title = 'Bounding box aspect ratios after batch correction',
       subtile = 'Relative rarity of values at 18,000 and 38,000',
       y = 'Count') +
  annotate("segment", x = -0.2, xend = -2, y = 430, yend = 430, size = 1.2,
           colour = "red", arrow = arrow(type='closed', length=unit(0.04, 'npc'))) + annotate('text', label = 'Tall & thin', colour = 'red', x = -1, y = 450) +
  annotate("segment", x = 0.2, xend = 2, y = 430, yend = 430, size = 1.2,
           colour = "red", arrow = arrow(type='closed', length=unit(0.04, 'npc'))) + annotate('text', label = 'Short & wide', colour = 'red', x = 1, y = 450)  + 
  annotate('text', label='Square', x = 0, y = 450, colour = 'red')

🧠 Observation

There are more short and wide bounding boxes than tall and thin ones.

Number of ID’s after filtering

After all this filtering, here’s the final numbers of each class label:

Code

n_table %>% 
  bind_cols(
    bird_scaled %>%
    group_by(Species, Sex) %>%
    summarise(n=n(), .groups = 'keep') %>%
    pull(n)
  ) %>%
  rename(`Before filtering` = `Number of ids`,
         `After filtering` = `...4`) %>%
  kableExtra::kbl() %>%
  kableExtra::kable_styling(full_width = F)

Species	Sex	Before filtering	After filtering
Annas	Female	4127	2924
Annas	Male	1792	833
Rufous	Female	11720	9598
Rufous	Male	1081	634

🧠 Observation

There are more Rufouses than Annas, and many more females than males captured in our images.

Summary

We’ve filtered out low-confidence ID’s and suspicious aspect-ratio bounding boxes. Each row in our dataframe corresponds to a picture of a bird, not necessarily an individual. Female and immature males are not possible to distinguish, they are both classified as females. There are more birds labeled as female than male for both Anna and Rufous. There are more Rufous than Anna. Bounding boxes tend to be short and wide more often than tall and thin.

We’ll save the prepared bird_scaled dataframe for future use.

Code

write.csv(bird_scaled, 'bird_scaled.csv')

Read on to Part 2