Bird song is a complicated behavior, and it is subject to several different kinds of constraints.
A song is first of all the result of a physiological process, which depends on particular muscles and vocal anatomy. In that respect, I know of no reason why multiple birds (of similar size) couldn't produce identical sounds, and indeed we see that many birds are capable of mimicking other birds. (It's possible that what sounds like perfect mimicry to us in fact sounds terrible to the birds - but let's just assume that we're judging by human standards.)
A song is also a social phenomenon - it's not effective unless it's understood by other birds of the same species. This limits the speed at which new songs can develop, and it also means that two nearby species will usually need to find non-overlapping song styles so as not to waste each other's time.
Similarly, a song is a sonic phenomenon that occurs in a sonic "landscape" already crowded with sounds, including other songs. To be heard at all, the song must avoid interference from those other sounds and songs. It's like radio signals: government regulators divide up the spectrum, and leave a bit of empty space between the stronger broadcast signals. Amateur operators, on the other hand, all fit into a few small ranges, and avoid interfering with each other by being spread far enough apart relative to their signal strength, and by simply taking turns rather than broadcasting at the same time.
So there's a certain amount of audible "space" available, which must be divided up between perhaps a few hundred songbird (and amphibian, and insect, etc.) species in any one location. The question can be phrased thus: How many distinguishable ways can the sonic landscape be "sliced", and how does this compare to the number of songbird species multiplied by the number of spatially distinct habitats?
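To make that comparison concrete, here is a rough back-of-the-envelope sketch in Python. It is only an illustration of the counting argument, not a real bioacoustic model: it assumes the landscape can be sliced along three independent axes (frequency bands separated by guard gaps, as in the radio analogy; turn-taking time slots; and distinguishable song patterns within a band). The function name, the parameters, and every number in the example are hypothetical and chosen only to show the arithmetic.

```python
def count_slices(freq_low_hz, freq_high_hz, band_width_hz, guard_hz,
                 time_slots, pattern_variants):
    """Rough count of distinguishable 'slices' of the sonic landscape.

    Assumes (hypothetically) that frequency, timing, and song pattern
    are independent axes, so the counts simply multiply.
    """
    usable_hz = freq_high_hz - freq_low_hz
    # n bands need n * band_width + (n - 1) * guard of spectrum,
    # so n = floor((usable + guard) / (band_width + guard)).
    bands = int((usable_hz + guard_hz) // (band_width_hz + guard_hz))
    return max(bands, 0) * time_slots * pattern_variants


# Made-up example values, for illustration only.
slices = count_slices(
    freq_low_hz=1000, freq_high_hz=8000,  # assumed usable range
    band_width_hz=500, guard_hz=200,      # assumed band and gap widths
    time_slots=4,                         # assumed turn-taking slots
    pattern_variants=5,                   # assumed patterns per band
)
species_in_one_location = 300             # "a few hundred", per the text
print(slices, species_in_one_location, slices >= species_in_one_location)
```

If the slice count comes out smaller than the number of species sharing one location, some species would have to overlap and rely on something else, such as physical separation between habitats, to avoid interfering with each other.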