Latest IOC Diary Updates

I can't find any differences between the 'LifeList+' and 'LifeList+full ssp' sheets.
Have another look: in the 'full_ssp' file the genus and subspecies (='ssp') are always written out fully (and not indented):

Code:
-- IOC_Names_File_Plus-14.2.xlsx:
Struthio camelus
    S. c. syriacus
    S. c. camelus
    S. c. massaicus
    S. c. australis

-- IOC_Names_File_Plus-14.2_full-ssp.xlsx:
Struthio camelus
Struthio camelus syriacus
Struthio camelus camelus
Struthio camelus massaicus
Struthio camelus australis

I've not done the comparison but they should be otherwise the same (I hope).
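For anyone who wants to go from the indented, abbreviated form to the written-out form, here is a minimal Python sketch. It assumes the names have already been exported from the sheet as plain text lines laid out exactly as in the listing above (species row unindented, subspecies rows indented and abbreviated); the function name `expand_abbreviated` is made up for illustration and this is not how the IOC files are actually produced.

```python
def expand_abbreviated(lines):
    """Turn indented 'S. c. syriacus' rows into fully written-out names."""
    full = []
    genus = species = None
    for raw in lines:
        text = raw.strip()
        if not text:
            continue
        parts = text.split()
        if raw[:1].isspace() and len(parts) == 3 and genus and species:
            # Indented subspecies row: only the last word (the epithet) is kept.
            full.append(f"{genus} {species} {parts[2]}")
        else:
            # Unindented species row: remember genus and species for the rows below.
            if len(parts) >= 2:
                genus, species = parts[0], parts[1]
            full.append(text)
    return full


plus_sheet = [
    "Struthio camelus",
    "    S. c. syriacus",
    "    S. c. camelus",
    "    S. c. massaicus",
    "    S. c. australis",
]
full_ssp = expand_abbreviated(plus_sheet)
print("\n".join(full_ssp))
```

Running the result against the real 'full_ssp' file would also be a quick way to confirm that the two sheets are otherwise the same.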
 
Ah, so that's the difference.
I wonder if there's really a need for these different lists that differ only very slightly...? Same applies to the list with red highlights.
 
I use only the master file because I assume it is the motherlode of original data. That isn't documented anywhere; I came to that conclusion solely from the 'master' in the file name, so I hope it is correct. But the master file has the disadvantage that you need some technical skill to turn it into the more 'normal' user-friendly forms (i.e., with genus + space + epithet + space + epithet2 on one line). For me that is no problem, but I think most people wouldn't know how.

I think the 'red' file has red colored text on 'latest changes' (again, guesswork, it's not all that clearly documented).

So, different products for different consumers makes sense.
 
Same here, but I do admit that with every update I have to think for a second and try to remember which list I want.
 
Black-goggled Tanager | Trichothraupis melanops | Add newly described subspecies griseonota. | NEW | Originally described as a species (Cavarzere, Costa, Cabanne, Trujillo-Arias, Marcondes & Silviera), but here treated as a subspecies of Trichothraupis melanops.
 
A typo encountered before: looks like 'Silviera' here also should be 'Silveira'. Perhaps we can prevent it from ending up in 15.1?

That typo ('Silviera') occurs in IOC website 'References' too. It occurs even twice because apparently the whole references list is double, or at least large parts of it...
 
Thanks. Will fix Silveira in 15.1 and in references.

We recently became aware of the duplicate entries of the references. Working on a fix which isn't as trivial as it might seem.
 
The list seems to be duplicated entirely, with the exception of a single ref ("Wetmore A. 1952. The birds of the islands Tobaga, Taboguilla and Uravá, Panama. Smithsonian Miscellaneous Collections 121(2): 1–32."), which is present in the first duplicate but absent in the second. All the other entries in the second duplicate are fully identical to the corresponding entries in the first duplicate. (Thus it seems that simply deleting the second duplicate would solve the problem.)

There is also something odd with one ref that apparently had an <hr> tag accidentally inserted in it, and looks like this:
Ferrer Obiol J, JM Herranz, JR Paris, JR Whiting, J Rozas, M Riutort & J González-Solís. 2022. Species delimitation using genomic data to resolve taxonomic uncertainties in a speciation continuum of pelagic seabirds, Molecular Phylogenetics and Evolution, doi: https://doi.org/10.1016/

.2022.107671
This occurs in the two duplicates.

Hoping this can help...
 
Thanks.
David
 
Cool stuff. How are you doing this, e.g. is it a Bash script?

Suggest you might develop this into a script which can run all tests against the list each time (i.e. rather than doing it ad hoc). I'm sure you don't but shout if you need help to do that.
I've followed up on the idea to make data tests more structured and complete (but still using the data in SQL database tables). I can now run this test suite against earlier releases, so probably against future releases too. I don't know if a 'coverage' metric makes sense but it begins to be pretty complete, I think.

So I could use some more ideas.

I might compare with other world lists but I don't know if it's worth it with an amalgamated list coming before long. (I did send eBird/Cornell a list of 2023b errors (or what I think are errors) but I haven't heard from them.)

Below is a list of the quality checks I use on the master IOC list data (plus a few checks against the Multiling data).

IOC 14.2
ma = table master_ioc_list which is spreadsheet: master_ioc_list_v14.2.xlsx
ml = table multiling, which is spreadsheet: Multiling_IOC_14.2.xlsx
  • id check: same numbers as spreadsheet rows
  • Authority checks: doublespace, period, ampersand, leading/trailing space/eol
  • Authority check position parentheses
  • Authority allowed characters in "Authority": [:alnum:] ( ) , & ' -
  • Authority has no comma
  • Author name, popular typos: Linneaus, Leotaud, Silviera, etc
  • Authority year not between 1758 and 2024 (release_year)
  • dashes chr(45) allowed; chr(8082) not allowed (whole row check)
  • no parentheses, and year_genus > year_species or year_genus > year_subspecies
  • all author regular expression (=show up new authors in new releases)
  • regex: genus, species, subspecies a-z check; authority: ^ [ [:alnum:] () , & ' -]+ $
  • regex: Infraclass: ^ [A-Z]+ $
  • regex: Parvclass: ^ [A-Z]+ $
  • regex: Order: ^ [A-Z]+ IFORMES $
  • regex: Family (Scientific): ^ [A-Z][a-z]+idae $
  • regex: Family (English): $$ ^ [ a-z A-Z ' & , -]+ \Z $$
  • regex: Species (English): $$ ^ [ [:alpha:] ' & , . -]+ \Z $$
  • Breeding Range (incl TrO and e PAL)
  • multiling minus master_ioc_list (+reverse)
  • multiling vs master_ioc_list should be 0 differences (compare binomials)
  • nominate species authority = subspecies authority (where species = subspecies)
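A few of the checks above can be sketched outside the database as well. The following is only an illustration in Python, not the actual SQL test suite: the row layout (a dict with the keys `order`, `family` and `authority`) and the function names are assumptions, the patterns are the ones from the list with the readability spaces removed, and `[^\W_]` stands in for `[:alnum:]`.

```python
import re

RELEASE_YEAR = 2024  # upper bound used in the list for IOC 14.2

# Patterns from the list with the readability spaces removed.
ORDER_RE = re.compile(r"[A-Z]+IFORMES")
FAMILY_RE = re.compile(r"[A-Z][a-z]+idae")
# [^\W_] is Python's nearest match to [:alnum:]: any Unicode letter or digit.
AUTHORITY_RE = re.compile(r"(?:[^\W_]|[ (),&'-])+")
YEAR_RE = re.compile(r"\b\d{4}\b")
KNOWN_TYPOS = ("Linneaus", "Leotaud", "Silviera")


def check_row(row):
    """Return the names of the checks that a row fails."""
    failed = []
    if not ORDER_RE.fullmatch(row["order"]):
        failed.append("order")
    if not FAMILY_RE.fullmatch(row["family"]):
        failed.append("family")
    authority = row["authority"]
    if not AUTHORITY_RE.fullmatch(authority):
        failed.append("authority characters")
    if any(typo in authority for typo in KNOWN_TYPOS):
        failed.append("author typo")
    years = [int(y) for y in YEAR_RE.findall(authority)]
    if not years or any(not 1758 <= y <= RELEASE_YEAR for y in years):
        failed.append("authority year")
    return failed


def binomial_differences(master, multiling):
    """Names present in only one of the two lists (both results should be empty)."""
    return sorted(set(multiling) - set(master)), sorted(set(master) - set(multiling))


good = {"order": "STRUTHIONIFORMES", "family": "Struthionidae", "authority": "Linnaeus, 1758"}
# Deliberately broken row: lower-case order, misspelled author, impossible year.
bad = {"order": "Struthioniformes", "family": "Struthionidae", "authority": "Linneaus, 1578"}
print(check_row(good))
print(check_row(bad))
print(binomial_differences(["Struthio camelus"], ["Struthio camelus", "Rhea americana"]))
```

Returning the names of the failed checks, rather than a plain pass/fail, makes it easier to compare the results of one release against the next.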

If anyone can think of more ideas to torture this data, I'd like to hear them.

- kweetal
 
I'd look at authority [and indeed word] frequency to find cryptic mis-spellings. Helps if there's also a lut of authorities...
  • parentheses balance (assume already in checks)
  • delete all expected characters (regexp_replace(...,'g')) and look at residual [for all columns, so probably on the results of select r.*, regexp_replace(r.*::text...'g') from ioc r for convenience]
  • (should family English start with a capital too: ^[A-Z][a-z]+...$)
  • why is "." valid in an English species name
  • date frequency
For frequency etc I'd tie above SQL to R for visualisation. If using batch file [or linux equivalent] could make the whole thing, inc. any R report a single-click exercise.
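The "delete all expected characters and look at the residual" idea can be shown in a few lines. This is a rough Python equivalent of the regexp_replace(...,'g') approach, not the SQL itself; the set of expected characters, the column names and the two sample rows are all made up for illustration.

```python
import re

# Characters considered "expected" in any value (an assumption for this sketch).
EXPECTED = re.compile(r"[A-Za-z0-9 (),&'.-]")


def residuals(rows):
    """List (row index, column, leftover) for every value with unexpected characters."""
    found = []
    for index, row in enumerate(rows):
        for column, value in row.items():
            # Same effect as a global regexp_replace: strip everything expected.
            leftover = EXPECTED.sub("", str(value))
            if leftover:
                found.append((index, column, leftover))
    return found


rows = [
    {"authority": "Linnaeus, 1758", "english": "Common Ostrich"},
    # Invented errors: an en dash in the authority and a curly apostrophe in the name.
    {"authority": "Sclater, PL\u2013 1860", "english": "Mrs. Hume\u2019s Pheasant"},
]
for index, column, leftover in residuals(rows):
    print(index, column, repr(leftover))
```

The advantage over an allow-list regex is that the output shows exactly which stray characters are present, not just that a value failed.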
 
Probably more one could do on range, e.g. name standardisation (capitalisation) and country list comparison. From my review of Clements, the approach to mountains (mnts, mountain, mt etc.) and islands (Is., Isles, I etc.) is not standardised. Again, this can be found with frequency analysis.
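As a rough idea of what such a frequency analysis on range text could look like, here is a small Python sketch. The range strings are invented examples, not taken from the IOC or Clements lists, and the set of variants to watch is an assumption based on the abbreviations mentioned above.

```python
from collections import Counter
import re


def token_frequencies(ranges):
    """Count every word (with an optional trailing period) across all range strings."""
    counts = Counter()
    for text in ranges:
        counts.update(re.findall(r"[A-Za-z]+\.?", text))
    return counts


# Invented range fragments mixing several abbreviation styles.
ranges = [
    "Andaman Is.",
    "Nicobar Is.",
    "Solomon Isles",
    "Mt. Kenya",
    "Mt Kilimanjaro",
    "Rwenzori Mts.",
]
freq = token_frequencies(ranges)

# Spellings of "island(s)" and "mountain(s)" to compare side by side.
WATCHED = {"is", "isles", "i", "mt", "mts", "mnts", "mountain"}
variants = {token: n for token, n in freq.items() if token.rstrip(".").lower() in WATCHED}
print(variants)
```

With real data the rare variants are the interesting ones: a spelling that occurs once next to another that occurs hundreds of times is a likely candidate for standardisation.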

Do we think the various products are each hand-crafted or is there code?

...Obviously one would do this from a master database and script each product to save effort (a RAP). (You could do with VBA + XL or perhaps power apps but likely to be more painful.)
 
I'd look at authority [and indeed word] frequency to find cryptic mis-spellings. Helps if there's also a lut of authorities...
Indeed - there are only ~1800 authors which I have parsed out and linked to more detail (date of birth/death, wikipedia, etc).
But yeah, frequencies might give some hints on geographical info, I'll look at that.
  • parentheses balance (assume already in checks)
done
  • delete all expected characters (regexp_replace(...,'g')) and look at residual [for all columns, so probably on the results of select r.*, regexp_replace(r.*::text...'g') from ioc r for convenience]
that's basically the same as the allowing regex, but perhaps I can indeed do it on all values. Nice trick.
  • (should family English start with a capital too: ^[A-Z][a-z]+...$)
That's what I used (the spaces are not counted in regex' expanded syntax - makes for more readable regexen)
EDIT: Yes, you were right. FIXED.
  • why is "." valid in an English species name
That surprised me too, but I had to allow it because: where "Species (English)" ~ '[.]' yields:

Mrs. Hume's Pheasant
St. Helena Cuckoo
St. Helena Rail
St. Helena Crake
St. Helena Plover
St. Helena Petrel
St. Helena Hoopoe
St. Lucia Amazon
St. Vincent Amazon
St. Lucia Wren
St. Vincent Wren
St. Lucia Thrasher
Mrs. Gould's Sunbird
St. Lucia Oriole
St. Lucia Warbler
St. Kitts Bullfinch
St. Lucia Black Finch
(17 rows)

EDIT:
I set St. and Mrs. separate so I could remove the . from the regex. Good! thanks.
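One way to treat St. and Mrs. separately, so that the period can be dropped from the general character class, is sketched below in Python. This is a guess at the idea, not the actual regex used; it assumes the abbreviation only occurs at the start of the name (true for all 17 names listed above) and uses `[^\W\d_]` in place of `[:alpha:]`.

```python
import re

# Optional "St. " or "Mrs. " prefix, then letters, spaces, apostrophe, ampersand,
# comma and hyphen only. The period is no longer allowed anywhere else.
ENGLISH_NAME_RE = re.compile(r"(?:(?:St|Mrs)\. )?(?:[^\W\d_]|[ '&,-])+")


def is_valid_english_name(name):
    """True if the whole name matches the pattern."""
    return ENGLISH_NAME_RE.fullmatch(name) is not None


for name in ["St. Helena Plover", "Mrs. Hume's Pheasant", "Common Ostrich", "Common. Ostrich"]:
    print(name, is_valid_english_name(name))
```

A stray period anywhere other than after a leading St or Mrs now fails the check, which the original pattern would have let through.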


  • date frequency
I can do that easily, although I don't understand what it will bring
For frequency etc I'd tie above SQL to R for visualisation. If using batch file [or linux equivalent] could make the whole thing, inc. any R report a single-click exercise.
I might look into that although there's no presentation ahead ;)

Thanks for your advice!
 
I thought the contraction of "Saint" was "St" and that the "." is only needed where a word is abbrev. You might want to check with them, but I think those examples may be grammatically wrong.

The point about date frequency (and other frequencies) is just to identify outliers, possible errors. I agree it may bring nothing at all...
 
I see that Saint / St / St. - spelling was discussed not so long ago:


(And Clements (2023b) has indeed the same 'St.' spelling.)
 
