
Parsiger - analysing the wealth of Tarsiger data

opisska

rabid twitcher
Czech Republic
I realize that I could probably have asked the Tarsiger people to give me a database dump, but to be honest, this was far more fun :) I have made a simple set of BASH tools to download and parse the entire Tarsiger WP news history (all 710 pages of it, dating, amusingly, back to 1655, and no, that 6 is not a typo), so I could convert it into data to play with. As the pages are obviously created automatically from a database, the parsing is straightforward, barring a few odd, mostly very old records, so I was able to get over 36000 data entries with date, species and country. Since the data clearly belongs to Tarsiger, I am not going to post the resulting data file (but I am willing to share it privately), but I would like to enlighten you with some insights that can be extracted from the data, because with that many data points, the possibilities are endless!

The entire archive has 890 species, including things that are almost funnily common in Europe (like Blue Tit), just found in very unusual corners of WP. I have created a subset that I call "our targets" where I removed everything I already have on a WP list + species that we are reasonably expecting to see in their breeding ranges in WP once we actually go there, if this is an area I deem accessible (so basically anything but Russia).
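
A subset like this could be cut out of the full data file along the following lines. This is only a minimal sketch, assuming a tab-separated file with the species name in the third column and a hand-made list of species to drop, one name per line. The function name filter_targets and the file seen.txt are made up for illustration.

```shell
#!/bin/bash
# filter_targets EXCLUDE DATA: print the records of DATA whose species is not listed in EXCLUDE.
# EXCLUDE is a hypothetical file with one species name per line (already seen, or expected later).
# DATA is assumed to be tab-separated with the species name in column 3.
filter_targets() {
    awk -F'\t' '
        FILENAME == ARGV[1] { skip[$0] = 1; next }   # first file: remember names to drop
        !($3 in skip)                                # second file: keep everything else
    ' "$1" "$2"
}
# usage: filter_targets seen.txt parsiger-all-days.csv > parsiger-nase-days.csv
```

The names in the exclusion list have to match the species column exactly, including spaces and apostrophes.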

So far I have made two plots of interest. The first just shows the distribution of dates of records for all species and for the selected ones. You can see that roughly October 10 is the peak date and also that the spring peak is quite suppressed in "our targets". A cursory look suggests that this is because a much higher fraction of spring records are eastern WP species in western Europe, and we have already seen many of those species in their home WP ranges or in central Europe.

hist-all.png

In the second plot, I show the species ordered by the number of records, with a logarithmic y-scale for better readability. You can see that the distribution follows a broken exponential - the few very common vagrants are really common and then there is a long tail of very rare birds.

species.png

The 10 most reported species in the dataset and their number of observations are:
787 Pectoral_Sandpiper
713 Olive-backed_Pipit
664 Buff-breasted_Sandpiper
643 Dusky_Warbler
626 Orange-flanked_Bush_Robin_(Red-flanked_Bluetail)
552 White-rumped_Sandpiper
481 Lesser_Yellowlegs
467 Hume's_Leaf_Warbler
432 Red-eyed_Vireo
418 Ring-billed_Gull

The 10 species we most need are:
713 Olive-backed_Pipit
552 White-rumped_Sandpiper
432 Red-eyed_Vireo
401 American_Golden_Plover
390 Radde's_Warbler
361 Baird's_Sandpiper
352 Long-billed_Dowitcher
351 Semipalmated_Sandpiper
307 Laughing_Gull
300 Pacific_Golden_Plover
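
Rankings like the two above can be produced with a short pipeline. A minimal sketch, assuming a tab-separated data file with the columns year, day of year, species, Latin name and country; the function name top_species is made up for illustration.

```shell
#!/bin/bash
# top_species FILE [N]: print the N most frequent species with their record counts.
# Assumes tab-separated columns: year, day of year, species, Latin name, country.
top_species() {
    local file=$1 n=${2:-10}
    cut -f3 "$file" |                 # keep only the species column
        tr ' ' '_' |                  # same underscore convention as in the lists above
        LC_ALL=C sort | uniq -c |     # count identical names
        LC_ALL=C sort -k1,1nr -k2 |   # most records first, ties by name
        head -n "$n" |
        awk '{ print $1, $2 }'        # drop the padding that uniq -c adds
}
# usage: top_species parsiger-all-days.csv 10
```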

Ultimately I am planning to do the time plot country by country, to see which destinations are good for which part of the year, but that will require some more playing with the data.

-------------------------------------------------------------

Below are some of my simple scripts, in case someone wants to do something similar and spare themselves the effort. This is obviously going to make sense only to people who know BASH (so I hid it in a quote so as not to scare anyone else). I find BASH to be typically the path of least effort for text parsing and that has proven correct in this case.
First, to download the pages, I just use wget and set the page limit by hand to what it is now:
for i in $(seq 0 710)
do
    wget -O "$i.html" "http://www.tarsiger.com/news/index.php?p=news&sp=wp&lang=eng&place=&country=&species=&day=&month=&year=&p_nr=$i"
done

Then the main magic happens in a script I, being the master of puns, named parsiger.sh (which needs to be run for each of the previously downloaded files):
#!/bin/bash
export LC_ALL=C
ndate=false
cat "$1" | while read line
do
    # the line after a news_date tag holds the date text, unless it starts with another tag
    if [ "$ndate" = true ]; then
        ndate=false
        if [[ ! ${line::1} == "<" ]]; then
            adate=$(echo "$line" | cut -d"<" -f1)
        fi
    fi
    if [[ "$line" =~ .*"class=news_date".* ]]; then
        ndate=true
    fi
    # species line: English name before the comma, Latin name inside the following tag
    if [[ "$line" =~ .*"news_species".* ]]; then
        atmp=$(echo "$line" | sed -e "s/.*news_species.*'>//" | sed -e "s/&acute;/'/")
        aspecies=$(echo "$atmp" | cut -d, -f1)
        alatin=$(echo "$atmp" | cut -d">" -f2 | cut -d"<" -f1)
    fi
    # the country line closes a record, so print it here
    if [[ "$line" =~ .*"news_country_links".* ]]; then
        acountry=$(echo "$line" | sed -e "s/.*news_country_links.*'>//" | cut -d'<' -f1)
        echo -e "$adate\t$aspecies\t$alatin\t$acountry"
    fi
done

Finally, there is super trivial code to turn the weird dates into days in year:
cat "$1" | while read d1 d2 d3 d4 rest
do
    # strip any lowercase letters from the third date field so that date can parse it
    d3b=$(echo $d3 | sed -e "s/[a-z]//g")
    d5=$(echo "$d2 $d3b $d4")
    d6=$(date -d "$d5" +%j)
    echo -e "$d4\t$d6\t$rest"
done

To sort out the species, I expand the datafile into individual files for each species, simply using filesystem as the world's most convenient database:
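mkdir -p species
rm -f species/*
cat "$1" | tr '\t' ';' | while read line
do
    # the species name is the third field; spaces become underscores in the file name
    specie=$(echo "$line" | cut -f 3 -d';' | tr ' ' '_')
    echo "$line" >> "species/$specie"
done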

And then I just cat everything back in one file and plot with GNUPLOT:
set term png size 1200,800
set output 'hist-all.png'
set xrange [1:12.99]
set xtics 1
set grid
set xlabel 'month'
set ylabel 'records'
# day of year is converted to a fractional month number
plot 'parsiger-all-days.csv' u ($2/30.5+1.0):(1) smooth frequency with boxes title "all species", \
     'parsiger-nase-days.csv' u ($2/30.5+1.0):(1) smooth frequency with boxes title "our targets"
set output 'species.png'
set xrange [*:*]
set logscale y
set xtics auto
set xlabel 'species no.'
plot 'species.csv' u 0:1 title "all species", 'species-nase.csv' u 0:1 title "our targets"
 

jurek

Well-known member
I see that most rarities in one day were seen on 11. October. I will call it Twitchers Day and plan to go looking for rarities this autumn! :D

A smaller Spring Twitchers Day is on 17 May.
 

Bismarck Honeyeater

Barely known member
I see that most rarities in one day were seen on 11. October. I will call it Twitchers Day and plan to go looking for rarities this autumn! :D

A smaller Spring Twitchers Day is on 17 May.
Some time ago I worked out that the best dates for rarities in the U.K. were May 27th in Spring and Oct 12th in Autumn. Of course it will vary year on year and this is just averaged out.
 

THE_FERN

Well-known member
Ultimately I am planning to do the time plot country by country, to see which destinations are good for which part of the year, but that will require some more playing with the data.
...an animated map showing location by day or week. Geonames api could help here if dataset doesn't have co-ords.

...And then there's ebird data of course. One could compare for target species to check the "veracity" of ebird data. I'd be willing to bet the 2 coincide quite well for well-watched rarities. Ebird likely has some records Tarsiger doesn't, esp. wrt long stayers.
 

opisska

rabid twitcher
Czech Republic
The per country data is much noisier, so I binned it per week. There are 58 countries, so I took only the top 17 and had to split them into three plots to make any sense of the visualization. The following plots are for all species. I only now noticed that I have put "week" instead of "month" on the X-axis and I am too lazy to switch all the plots, so just imagine it says "month" :)
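
The weekly binning could look roughly like this. It is only a sketch, assuming a tab-separated data file with the day of year in column 2 and the country in column 5. The function name weekly_counts and the simple rule that days 1 to 7 form week 1 are assumptions of this sketch, not necessarily what was used for the plots.

```shell
#!/bin/bash
# weekly_counts FILE [N]: weekly record counts for the N countries with the most records.
# Assumed input columns (tab-separated): year, day of year, species, Latin name, country.
# Output columns (tab-separated): country, week number, records in that week.
weekly_counts() {
    local file=$1 n=${2:-17}
    awk -F'\t' '
        {
            week = int(($2 - 1) / 7) + 1    # days 1-7 -> week 1, days 8-14 -> week 2, ...
            count[$5 "\t" week]++
            total[$5]++
        }
        END {
            # prefix every line with the country total so that sort can rank countries
            for (key in count) {
                split(key, part, "\t")
                print total[part[1]] "\t" key "\t" count[key]
            }
        }' "$file" |
    LC_ALL=C sort -t $'\t' -k1,1nr -k2,2 -k3,3n |
    awk -F'\t' -v n="$n" '
        $2 != prev { rank++; prev = $2 }    # a new country starts here
        rank <= n  { print $2 "\t" $3 "\t" $4 }'
}
# usage: weekly_counts parsiger-all-days.csv 17 > countries.csv
```

Weeks without any records are simply missing from the output, so a plotting script has to treat them as zero.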

countries1.png
countries2.png
countries3.png

What you can see here is that some places, notably the Azores and, for some reason, Ireland, really lack the spring peak. Also that if you want to spend your fall in Scandinavia, you really should go to Norway first and then move to Sweden. Pretty useful already, huh? Also, the Iceland peak is notably earlier in the fall and the Israel one is very late.

The same can be done with just our selection of targets. The choice and order of the 17 countries is now different, so the colors are also different, sorry about that!
countries-nase1.png
countries-nase2.png
countries-nase3.png
Our selection really brings the Azores forward, as we are missing far more American birds, and it really kills Sweden, as that apparently has mostly eastern vagrants. The other differences seem mostly random though.

Is this interesting? Useful? I don't know, but I always wanted to see what it looks like :)
 

Hauksen

Forum member
Antarctica
Hi Jan,

The per country data is much noisier, so I binned it per week.

Is this interesting? Useful? I don't know, but I always wanted to see what it looks like :)

I wonder if it could be presented less noisily if you'd calculate a value for "seasonality" (roughly "peak area to total area ratio"), then normalize the weekly plot by that value? Just a spontaneous idea, might turn out to be not so useful at all, but that's what I'd probably try. I think it would bring out the local migration peaks better.
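
One crude way to put a number on this, purely as a sketch: take "seasonality" to be the share of a country's records that fall in its single busiest week. That reading of "peak area to total area", the function name seasonality, and the assumed tab-separated layout (day of year in column 2, country in column 5) are all assumptions here.

```shell
#!/bin/bash
# seasonality FILE: for each country, the fraction of its records in its busiest week.
# Assumed input columns (tab-separated): year, day of year, species, Latin name, country.
# Output columns (tab-separated): country, fraction between 0 and 1.
seasonality() {
    awk -F'\t' '
        {
            week = int(($2 - 1) / 7) + 1         # days 1-7 -> week 1, and so on
            c = ++count[$5, week]                # records so far in this country and week
            if (c > peak[$5]) peak[$5] = c       # track the busiest week per country
            total[$5]++
        }
        END {
            for (k in total) printf "%s\t%.2f\n", k, peak[k] / total[k]
        }' "$1" | LC_ALL=C sort
}
# usage: seasonality parsiger-all-days.csv
```

A value near 1 means nearly everything arrives in one week; a wider peak window (say, the busiest three consecutive weeks) would be closer to a real "peak area".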

Seasonality vs. latitude might be an interesting scatter plot, too :)

Regards,

Henning
 

opisska

rabid twitcher
Czech Republic
...an animated map showing location by day or week. Geonames api could help here if dataset doesn't have co-ords.

...And then there's ebird data of course. One could compare for target species to check the "veracity" of ebird data. I'd be willing to bet the 2 coincide quite well for well-watched rarities. Ebird likely has some records Tarsiger doesn't, esp. wrt long stayers.

Ebird is good for individual species, but there are hundreds of them and I do not know of any way to extract ebird data efficiently, so that's why I was so happy with Tarsiger as the source. For the few top species, just looking at ebird is probably the best way to get an overview.
 

THE_FERN

Well-known member
Ebird is good for individual species, but there are hundreds of them and I do not know of any way to extract ebird data efficiently, so that's why I was so happy with Tarsiger as the source. For the few top species, just looking at ebird is probably the best way to get an overview.
I think there's an api and/or some r code for individual sp.

e.g. eBird Data Extraction and Processing in R

Otherwise, take a cut of the database. I think that's available as CSV or other text, which would allow you to use SED or similar. Of course, I'd put this stuff in a db (postgres and/or sqlite). Of course the files can be girt massive, but I'm sure you've access to institutional interweb...

Cornell seems like an R "shop" broadly speaking.
 

THE_FERN

Well-known member
The per country data is much noisier, so I binned it per week. There are 58 countries, so I took only the top 17 and had to split them into three plots to make any sense of the visualization. The following plots are for all species. I only now noticed that I have put "week" instead of "month" on the X-axis and I am too lazy to switch all the plots, so just imagine it says "month" :)


Is this interesting? Useful? I don't know, but I always wanted to see what it looks like :)
I'd suggest a better way is a choropleth map with small multiples. That could be a "schematic" [e.g. a hexagon cartogram, something like this:

https://bigthink.com/wp-content/uploads/2021/06/134652-134653.jpg?resize=480,270

] It should show successive movement from E=>W and S=>N in spring and vice versa in autumn. A simple heatmap for the months you're interested in will show you where to go...

...I have no idea how easy this is in gnuplot: in R it's fairly straightforward, and you can even do it in XL.
 

jurek

Well-known member
These tools would be a good addition to bird guidebooks.

An older edition of Sibley Birds had records of rarities marked as green dots on the maps. It was very informative, because one could visually inspect patterns. For example, records of a vagrant X cluster in Alaska and become thinner towards the East Coast, or a vagrant Y has all its records along the coast, etc. This was much better than e.g. the Collins Bird Guide, which only gives a list of countries.

The maps of vagrants in the old Sibley Guide were great but, alas, looked too time-consuming to produce. But as we now have free software packages which take a text, automatically extract geographical names and put dots on a map, such maps would be a great addition to e.g. the new edition of Collins Birds of Europe.

Actually, one could make a bird guide with much more than a simple static map. For example, a map with added frequency that changes dynamically every month. So in Northern Europe in October a Nightingale would not be on the map, or would be marked only as a rarity.
 

THE_FERN

Well-known member
Actually, one could make a bird guide with much more than a simple static map. For example, a map with added frequency that changes dynamically every month. So in Northern Europe in October a Nightingale would not be on the map, or would be marked only as a rarity.
I think it's called "ebird"...

Seriously, though, I really wish they'd get rid of the paywall and integrate the BoW material. Also commission some more diagnostic illustrations, stop the crazy obsession with trips and hotspots, and... Well it's still pretty useful despite all this
 
