However, they clearly added data for some species since (in the thesis, they lacked bicinctus cox1, placidus nd3/rag1, melodus myo/nd3, ruficapillus adh5, morinellus nd3/adh5/bfib7/myo; but these species are now present in all 6 trees in Fig.2)
Mmh, no, that doesn't seem to be a correct inference. (? NB - I don't have any actual experience with BEAST.) In fact, the 6 gene trees all include all the taxa in the thesis as well,
despite an explicit statement that, for several of them, data were entirely lacking for some of the loci. Thus these 6 trees would appear to be the individual gene trees that were co-estimated during the joint analysis of the 6 loci, not trees estimated independently based on the sequence data from a single locus. (I'm quite uncertain of the actual meaning of such trees, though. If species represented by no data at all for a given locus find a place in the gene tree inferred for this locus, and this place seems coherent with the rest, this means that the inference of this tree is significantly constrained by the information contained in the other loci...)
For six species (Charadrius alticola, asiaticus, forbesi, obscurus, peronii, placidus), the data were obtained from toe pads of museum specimens. For these, the sequences that were amplified were shorter (different pairs of primers delimiting shorter amplicons), in a number of cases presumably shorter than 200bp, which is the lower limit for sequences that GenBank will process. The following sequences which, as far as I understand, they did amplify, are lacking in the deposited data set:
alticola: adh5, bfib7, rag1
asiaticus: adh5, myo, bfib7
forbesi: nd3, adh5, bfib7, rag1
obscurus: nd3, adh5, myo, bfib7
peronii: nd3, adh5, myo, bfib7, rag1
placidus: adh5, myo, bfib7
About these, I cannot say anything. Regarding the sequences that are available in the deposited data set:
- cox1:
- KM001331-2 Charadrius veredus: both are identified by the BOLD ID engine as Stiltia isabella. I think these may have been replaced with correct sequences at some point between the production of the thesis and that of the paper, however (the species was basal in the "cox1" tree (a) shown in the thesis; it is sister to asiaticus as expected in the equivalent tree in the paper).
- KM001298-300 Charadrius peronii: KM001299 may be correct; the other two are very similar to one another but differ from the first one; most of the substitutions are grouped in the central part of the sequence (bp1-110: 4; bp111-290: 32; bp291-385: 0); this central part (bp111-290) is 100% identical to a Gallinago paraguaiae sequence in BOLD, nested within Gallinago in ID trees, and unlike any Charadrius.
- KM001301-3 Charadrius placidus: BOLD finds no match to the entire sequence; bp1-130: no match in BOLD, nearest matches in BLAST are Calidris melanotos sequences, no Charadrius in the nearest 100 matches; bp131-300: fully identical to 10 congruent sequences of Charadrius alticola (7 from BOLD, and the 3 sequences of the present study); bp300-429: BOLD again finds no match, nearest matches in BLAST are Charadrius dubius sequences. In the "cox1" trees, this species ended up basal in CRD II (both in the thesis and paper), while it was within CRD I in the species tree.
- (KM001333-5 Vanellus miles: the three sequences are congruent and fall within Vanellus, but they differ a lot from the 4 sequences that are in BOLD; divergence seems stronger towards the end of the sequences (BOLD finds V. miles as nearest match when run on the first 200bp only [albeit distance still around 3%], but finds no match at all when run on the whole sequence), which I do not expect in a coding gene. These sequences were not used in the paper.)
- nd3: I see no problem.
(KM001266-7 Charadrius collaris: these sequences are not congruent with two older sequences, but the problem here is clearly with these two older sequences (FR823281: first part = alexandrinus, second part = ruficapillus; FR823282: first part congruent with the new sequences, second part = alexandrinus; these are associated with Rheindt et al. 2011 in GenBank, but I see no evidence in the work that they were used there).)
- adh5: I see no problem in the data that were used for the paper.
(KM001163 Vanellus miles: this sequence differs from the other two (KM001161-2), which are congruent; all substitutions are in the central part of the sequence (bp1-210: 0; bp211-340: 9; bp341-599: 0); the divergence is reconstructed as wholly autapomorphic. This part of this sequence may not be fully correct; if given the choice, I would rather not use this sequence. These sequences were not used in the paper.)
- myo: I see no problem.
- bfib7: KM001489-90 Charadrius tricollaris: these three sequences are fully identical to the three sequences of C. thoracicus (KM001482-4), and fall in a position consistent with the latter. C. tricollaris ends up in CRD II, sister to thoracicus in the "bfib7" (d) tree, both in the thesis and in the paper. (But note that, in these trees, it is not apparent that the divergence is zero.) This may be what caused this species to end up in an unexpectedly basal position within CRD I(a) in the species trees.
(KM001470-2 Pluvialis squatarola: presumably not a problem, but there is an interesting, obvious 86bp inversion in the three sequences; not present in P. dominica; I had to replace these 86bp by their reverse complement to restore base homology in the alignment.)
- rag1: KM001514-5 Charadrius bicinctus: these two sequences are only one substitution away from the three sequences of Vanellus miles (KM001586-8), and nested within Vanellus...? In the "rag1" tree (f) of the thesis, this species also clusters with Vanellus; in the equivalent tree of the paper (no Vanellus included), the species is highly divergent and falls between CRD I and CRD II.
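For anyone wanting to repeat the kind of checks described above, here is a minimal Python sketch of two of them: counting substitutions per region between two aligned sequences (as in the bp1-110 / bp111-290 / bp291-385 breakdown for peronii), and replacing a stretch of sequence by its reverse complement (as I did for the 86bp inversion in P. squatarola). The function names and the toy sequences are made up for illustration; this is not the exact procedure or software I used, and it assumes the two sequences are already aligned to the same length.

```python
def window_mismatches(seq_a, seq_b, windows):
    """Count mismatches between two aligned sequences.

    `windows` is a list of (start, end) positions, 1-based and inclusive,
    matching the "bp111-290" style of notation.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    counts = {}
    for start, end in windows:
        pairs = zip(seq_a[start - 1:end].upper(), seq_b[start - 1:end].upper())
        # Gaps and ambiguity codes are skipped, so only A/C/G/T differences count.
        counts[(start, end)] = sum(
            1 for a, b in pairs if a != b and a in "ACGT" and b in "ACGT"
        )
    return counts


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp_segment(seq, start, end):
    """Replace positions start..end (1-based, inclusive) by their reverse complement."""
    segment = seq[start - 1:end]
    return seq[:start - 1] + segment.translate(_COMPLEMENT)[::-1] + seq[end:]


# Toy data only, not the real sequences.
ref = "AAAACCCCGGGG"
query = "AAAATTTTGGGG"
print(window_mismatches(ref, query, [(1, 4), (5, 8), (9, 12)]))
print(revcomp_segment("AAAACCGTGGGG", 5, 8))
```

A sequence whose substitutions are all concentrated in one window, with zero in the flanking windows, is the pattern that made me suspect a chimeric or contaminated read in the cases above.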
I've attached 6 single-gene trees, and a combined ML tree including all the Charadriidae species with available data, constructed with a data set from which I removed the sequences that I view as problematic (based on the above). (I tried several other, more restricted data set compositions [fewer taxa, but a less gappy matrix]; I can post these as well if there is interest, but the topology didn't change significantly. Keep in mind that for the six "museum toe pad species", only the available part of the data was included.
C. placidus was excluded entirely, as the only publicly available sequences for this species currently are the cox1 sequences discussed above, and these I think are incorrect.)
(For what it's worth, my own current preference would be to use Charadrius for the clade sister to Eudromias [i.e., all the species with bright eye ring and/or bill base], and Anarhynchus for the clade sister to Peltohyas [which groups species with invariably dull/dark eye ring and bill].)