
Talk:Race and genetics/Archive 3

From Wikipedia, the free encyclopedia

Genetic variation image

I notice that Muntuwandi has just removed this image from the article, with the explanation “image meant to push a POV”. No further explanation of this has been provided, so if he wishes for this image to be removed, he’ll need to be more specific on how it’s a violation of Wikipedia’s policies.

I’m not even sure how Wikipedia’s definition of Civil POV pushing can apply to a single image, since POV-pushing is something that occurs over many edits during a long period of time. I certainly haven’t been doing that to this article; my adding of this image to it (and a small change to the text) is the only time I’ve edited it during a period of several months.

The image is also quite relevant to the topic being discussed in the section where it was included. That section is describing Cavalli-Sforza’s study, and his study actually contains the diagram that this image is based on. Since Cavalli-Sforza apparently thinks this image is useful as an illustration of his data, I think it’s reasonable to assume that it can also be considered a useful illustration of his data by Wikipedia’s standards. The reason I’ve cited it to Jensen rather than Cavalli-Sforza is just because I own a copy of Jensen’s book The g Factor, while I don’t own a copy of Cavalli-Sforza’s original study, so if I were to cite this to Cavalli-Sforza directly I would be afraid of subtly misrepresenting him.

For the reasons given above, I believe the removal of this image to be an erroneous edit, and am reinserting it. If you disagree, please justify your reasons for removing it here, and obtain a consensus for doing so. --Captain Occam (talk) 13:49, 18 October 2009 (UTC)

I agree with Muntu, this image is stupid. It's from Jensen apparently, who is clearly not a reliable source for either "race" or "genetics" subjects. I'm also confused by his mixing of terms such as "Indo-European" and "African". Africa is a continent, but "Indo-European" is a language group. There are plenty of good cladograms from much more reliable sources than Jensen, for example here. This article should not be citing Jensen at all. When did he become an expert molecular anthropologist? Alun (talk) 06:05, 23 October 2009 (UTC)
If you would like to edit the article to use a better cladogram or cite more famous/respected scientists than Jensen, then go ahead. I agree that this would be an excellent idea. But, in the meantime, don't just remove material because it is not as good as it might be. Replace, don't remove. (If you can get permission to use a specific cladogram, just let us know. I (or some other editor) would be happy to show you how to insert a better image, if you haven't done that before.) David.Kane (talk) 13:08, 23 October 2009 (UTC)
I believe Alun's point is that while Jensen may be respected as a psychologist, he is not an expert on either race or genetics, so citing him in a field outside his expertise isn't of encyclopaedic value. However, in the meantime, if you got reverted, I would suggest you keep discussing on the talk page and try to gather consensus there rather than start edit-warring by reintroducing material which other editors obviously oppose.--Ramdrake (talk) 15:00, 23 October 2009 (UTC)
I believe that we should keep an image that has been in the article for a long time until a consensus to remove it is reached. So, I will add it back. Cladograms are, obviously, useful. Until we find a better one, we should keep the one we have. No one has argued that it is worse than nothing. David.Kane (talk) 15:33, 23 October 2009 (UTC)
Wikipedia doesn't work this way, unfortunately. Look at it this way: so far, 3 editors at least have removed the image that you've kept reinserting. That should be a clue that the temporary consensus is now not to include the image. I would recommend that you respect this consensus for now and try to gather a new consensus on the talk page to include the image if you so wish. Edit-warring around the image will only get you blocked, and/or get the article protected, or both.--Ramdrake (talk) 15:44, 23 October 2009 (UTC)
If you seek to make a controversial change to a page, you must formally seek consensus. Having three editors that agree with you is not consensus. Feel free to make a formal motion on this topic, if you like. Note that this image is based on "The History and Geography of Human Genes" by L. Luca Cavalli-Sforza, Paolo Menozzi, & Alberto Piazza. I suspect that most editors will agree that these are experts on race and genetics and that an image based on their work belongs in this article. David.Kane (talk) 15:48, 23 October 2009 (UTC)

To summarize, here is the original edit. As best I can tell, the image is accurately sourced to "The History and Geography of Human Genes" by L. Luca Cavalli-Sforza, Paolo Menozzi, & Alberto Piazza. (If anyone has information to the contrary, please share with us.) All agree that this image might be replaced by a more up-to-date cladogram if one were available. Until that happens, at least two editors want to keep it. (Three if you include the editor that added it.) Three editors have sought to delete it. Other opinions are welcome. As always, we operate by consensus but controversial change requires consensus. David.Kane (talk) 16:01, 23 October 2009 (UTC)

To quote Wikipolicy: "the onus is on the editor seeking inclusion to gather consensus on the talk page". Not the other way around.--Ramdrake (talk) 16:42, 23 October 2009 (UTC)
There is nothing wrong with cladograms in general, however there is a problem with this particular cladogram. Firstly because it is sourced from Jensen, and secondly because the caption was somewhat misleading in giving the impression of the existence of clearly defined mutually exclusive genetic groups. In other words the image seemed to have been placed with the intention of demonstrating the existence of genetically discrete races, a position which has not been accepted by the majority of geneticists and anthropologists. There are several different cladograms that have been constructed, each depending on the genetic system used (blood groups, y-chromosome mtDNA etc) and the populations studied. There are also several different ways to group humans. There is this particular article, human genetic clustering which deals with some of grouping methods. Wapondaponda (talk) 18:19, 23 October 2009 (UTC)
Agree. There is a problem with David's claim. He states that "the image is accurately sourced to "The History and Geography of Human Genes" by L. Luca Cavalli-Sforza, Paolo Menozzi, & Alberto Piazza", but the image file clearly states it's from Jensen. Furthermore David should know this as he was the one who uploaded the image. I'm sceptical of this image, for one thing it has Basques and Lapps (Sami) as Indo-Europeans, but neither of these groups speak Indo-European languages. Sami is a Uralic language, and Basques speak, well Basque, which is a language isolate. The other problem with this cladogram is that it's not very accurate and seems to have been designed to be misleading.
There are other problems with misleading statements in the text of the article, I cover them in the section below, but I'm particularly unhappy about claims of "genetically defined" populations. That means next to nothing of any substance. We should try to avoid apparently technical sounding phrases that are meaningless in the real world of genetics.
Furthermore, claims that "races" consist of individuals who are always genetically more similar to each other than they are to people outside the "race" need to be supported by exceptional sources, because these are claims that have been repeatedly debunked by anthropologists, molecular anthropologists, molecular biologists and population geneticists.
I suggest that if we must have a cladogram, then the most accurate one is something more similar to the one in the paper I link to above. It shows clearly that most variation occurs in Africa, that there are African groups that are closer to non-African groups than they are to other African groups (giving the lie to the claim that Africans are always more similar to other Africans than they are to non-Africans), that genetic distances between African groups are often larger than those of all non-Africans from each other combined, and that all non-African groups are sub-groups of African groups.
My personal problem with cladograms is that they portray a false sense of isolation between the groups. Most often in biology cladograms are used to draw distinctions between groups of organisms at the specific scale or above. When used in that case the fact that no gene flow is indicated between groups is accurate. But to apply the same method to intra-specific and indeed intra-subspecific groups produces a false sense that these groups are discrete. Kittles and Weiss (2003) say it best when they say

Cavalli-Sforza and many other prominent authors have presented the analytic results in tree-like diagrams of relationships among populations (18, 120, 121, 145,167). Dendrograms are a visual convenience for presenting data, but it is easy to lapse into accepting the populations thus portrayed as taxonomic rather than merely statistical spatial spot-sampling units, and equating these population samples to colloquially defined races as if the latter have the same kind of biological reality. However, careful geneticists know that if we had more geographically comprehensive samples, human genetic variation is actually characterized by clines (spatial gradients) of allele frequency rather than categorical variation between populations, and the pattern varies among genes for the historic reasons of drift, selection, and demographic history (18). Even defined in the usual ways, races do not correspond to discrete, much less monomorphic, human types. Instead, the pattern of variation can generally be described as isolation by distance: Genetic differences between populations are roughly proportional to the geographic distance between them.

Although obviously one cannot take a rejectionist stance simply because cladograms are used so frequently in genetics texts. Alun (talk) 21:45, 23 October 2009 (UTC)
Would something like this image be any better? As can be seen from the web page where it's found, this is cited to Cavalli-Sforza directly, rather than to Jensen. I suppose something like this could be considered better than what the article had originally, although apart from who it's cited to, it's not very different.
As can be seen from the diagram that's cited directly to Cavalli-Sforza, the original diagram in his study shows this apparent isolation. The reason for this is because his study deliberately avoided admixed populations, and only examined people whose ancestors had lived in the same area prior to 1492. Obviously, this is going to result in populations appearing more clearly-defined than they actually are at present. But when we're describing Cavalli-Sforza's study (as we do in the current article), we shouldn't be deliberately omitting some of his results because they present a "false sense of isolation". What we should do is present his results, while also explaining the methods he used that caused his results to appear the way they did. --Captain Occam (talk) 13:32, 24 October 2009 (UTC)
I’ve just replaced the image with a better one. As far as I can tell, the only real problem with this image was that it was cited to Jensen, when it ought to be cited to Cavalli-Sforza directly, since (as I pointed out above) there isn’t any significant difference between Jensen’s image and the original one from Cavalli-Sforza.
In fact, the image in Jensen’s book appears to be an exact copy of the one from Cavalli-Sforza. The credits for the image in Jensen's book state, "Cavalli-Sforza, L.L., Menozzi, P. & Piazza, A,. The history and geography of human genes. Copyright © 1994 by Princeton University Press. Reprinted by permission of Princeton University Press." This image wouldn't state that it's a copyrighted portion of the original study, and was reprinted with the permission of the original study's publisher, if it weren't the same image.
I’ve also added a sentence to the paragraph introducing Cavalli-Sforza’s study, in order to better explain why Cavalli-Sforza’s results show a greater degree of isolation than actually exists between these populations in the present. Hopefully these changes will address most people’s concerns. --Captain Occam (talk) 14:13, 24 October 2009 (UTC)
So here's a PNG version, because the SVG version didn't display properly, from here. (talk) 22:28, 23 October 2009 (UTC)

You just can't change the citation in the original image and then say it's OK. The original uploader cited it to Jensen, but you have changed that citation, but you were not the original uploader and it is not acceptable for you to change the citation unilaterally. If the image is from Jensen, then it is from Jensen. If you can find a better image from Cavalli-Sforza’s actual book, then go ahead. Furthermore it is not acceptable to simply scan images from a book or any source, that is a breach of copyright. If you want to include an image from a source, then you need to make a new image yourself with a drawing package that displays the same information but is different. Then it becomes your work. The problem with Jensen's image is not simply that it is cited to Jensen, it is inaccurate, as I state elsewhere on this talk page, it claims that Lapps and Basques are speakers of Indo-European languages, which is just plain wrong, and it also mixes language groups, such as Indo-European, with continental regions, such as Africa. It's a total shambles and there are much better sources for such illustrations. I have provided a link to one such source. I had a go at re-drawing it here but for some reason the wiki software won't render it. Usually it does it just fine, you can still see the image by clicking on it, your browser should display it just fine, it's just the wiki that doesn't. I'll see what I can do to sort this out. Alun (talk) 14:03, 25 October 2009 (UTC)

Alun, is line length indicative of estimated distance/isolation in your image? --Aryaman (talk) 15:58, 25 October 2009 (UTC)

According to the original publication yes it is, although they do say that for some populations the samples were so small that the terminal lines may be shorter than they really are. Alun (talk) 16:40, 25 October 2009 (UTC)
Would you mind if I made a more exact rendition of this image, then? Comparing with the original, there are discrepancies which bother me (not an attack on your work, simply an observation), particularly regarding the Indian populations (notice that, for example, "Lower Caste" and "Tribal Indians" appear to share a line in your image, whereas they are clearly separated in the original; also the line for "Upper Caste" is nearly three times longer in your version). Thanks, --Aryaman (talk) 16:52, 25 October 2009 (UTC)
Certainly I don't mind. My version was a rush job and it is certainly true that it could have been better. I'd appreciate it very much :-) Alun (talk) 16:56, 25 October 2009 (UTC)
Done. The lines are now exactly as they appear in the original. I couldn't get the font to match perfectly, thus some names are slightly displaced. I also rearranged the key to reflect the vertical arrangement of the groups. But I think it's an improvement on the whole. Thanks, --Aryaman (talk) 21:04, 25 October 2009 (UTC)

Continued Discussion of Genetic variation image

I have broken the discussion up since the image above was making it hard to format correctly. Alun claims "Furthermore David should know this as he was the one who uploaded the image." Untrue. Please get your facts straight. All that I have done is to protest the deletion of this image until something better is found to replace it. Assuming that Captain Occam is correct and this image is taken from a perfect copy of Cavalli-Sforza, then the fact that the editor who created it first found it in a book by Jensen is irrelevant. David.Kane (talk) 15:00, 24 October 2009 (UTC)

Even so, in this case I suppose it's better for it to be cited to Cavalli-Sforza directly rather than to Jensen, hence my having updated it. --Captain Occam (talk) 15:07, 24 October 2009 (UTC)
This is what is actually written in History and Geography of Human Genes regarding the image

The nine clusters chosen differ in their genetic homogeneity, but we are interested in establishing history and not in generating a classification scheme. A criticism raised by Bateman et al. (1990a) on this point misses the difference between taxonomy and phylogenetic analysis. Even if we were interested in taxonomy, calibrating the homogeneity of clusters on the basis of genetic distance in a tree would still generate an arbitrary classification that would inevitably depend on the sample of populations chosen. Lest there be no misunderstanding, we (Cavalli-Sforza et al. 1988; Cavalli-Sforza and Piazza 1990), unlike others (Bateman et al. 1990a, b) do not give the clustering obtained in the tree of figures 2.3.2 and 2.3.3 any "racial" meaning for reasons discussed in the first chapter. Clusters were formed for reducing the complexity of the data and were given specific names in order to simplify the discussion

Consequently any use of this image that attaches a racial meaning to these groups contradicts what the authors of the image have stated about the image. The quote from Kittles and Weiss 2003 has summarized perfectly how dendrograms can be misused to imply the existence of discrete races. Occam on his talk page and blog has stated that he believes that there is a "biological basis for the concept of race". While he is entitled to his own opinion, Wikipedia isn't the place for original research or POV pushing. Wapondaponda (talk) 19:41, 24 October 2009 (UTC)
Why does my personal opinion, which I’ve expressed outside of Wikipedia, have any relevance here? If I were to research your own personal opinion as expressed at your personal website, and found that I didn’t like it, would that make your contributions “POV pushing” even if you were following Wikipedia’s policy with them?
The reason why this chart belongs in the article is quite simple: the section of the article that included it is discussing Cavalli-Sforza’s study, and Cavalli-Sforza’s study contains this image. As I stated before, if Cavalli-Sforza considers this image a useful illustration of his data, then we have no justification to think otherwise. In its present state, the article does not claim that the clusters shown on the chart necessarily represent racial groups, and readers who reach their own conclusion about this are not engaging in original research unless they include those conclusions in the article.
Ramdrake and Alun’s concerns (that the image doesn’t accurately represent Cavalli-Sforza’s data, and that Jensen shouldn’t be considered a reliable source about genetics) have been addressed at this point, and the article and image have been changed to take their concerns into consideration. However, neither of them mentioned my personal motives as a valid reason for removing this image. Unless you or they can raise a problem with this image that isn’t based on bulverism, it belongs in the article for the reason given above. --Captain Occam (talk) 22:10, 24 October 2009 (UTC)
You have a habit of putting words into other people's mouths. I haven't yet seen Ramdrake or Alun indicate that you have addressed their concerns. You seem to be imposing consensus on others. Yes you have expressed your personal opinion outside Wikipedia, but you posted a link to your blog on your talk page so your personal opinions can be viewed through Wikipedia. It is relevant because WP:NPOV is one of Wikipedia's core policies and you are violating NPOV by advocating your personal opinions in ways that contradict mainstream views on race and genetics. I have seen a number of incidents where persistent advocacy is dealt with administratively. Wapondaponda (talk) 05:48, 25 October 2009 (UTC)
The reason I believe that I’ve addressed their concerns is because I’ve changed the image and article to fix the problems that they pointed out. Since the problems that they pointed out aren’t problems anymore, this means that the concerns they’ve raised have been addressed.
“you are violating NPOV by advocating your personal opinions in ways that contradict mainstream views on race and genetics.”
Accusations like this aren’t meaningful if you don’t provide specific examples. I could just as easily accuse you of trying to exclude significant-minority viewpoints that have been covered by reliable sources, which NPOV policy states that we should include in proportion to their degree of coverage. The policy page states, “The neutral point of view is a means of dealing with conflicting perspectives on a topic as evidenced by reliable sources. It requires that all majority- and significant-minority views must be presented fairly, in a disinterested tone, and in rough proportion to their prevalence within the source material.”
If you’re going to accuse me of violating this policy, you need to provide specific examples of my attempting to provide greater weight to a viewpoint than is warranted by its degree of coverage in reliable sources. This was discussed fairly recently as pertains to the Race and intelligence article, and nobody there attempted to dispute my points about the degree of coverage that the hereditarian view has received in peer-reviewed literature. Unless you’re trying to reuse an argument that’s already been explicitly refuted, I would assume you aren’t referring to that article. Is your advocacy accusation against me based on any actual examples, or is this a baseless assertion? --Captain Occam (talk) 06:33, 25 October 2009 (UTC)
This is what you have written on your blog

This image can be considered a visual representation of my argument against the claim that there’s no biological basis for the concept of race, which is a popular belief among sociologists. It’s based on a study of the genetic distance between 42 human populations around the world, which was published in 1994 by the Italian geneticist Luigi Luca Cavalli-Sforza. (When someone has a name like that, you probably don’t need to look at DNA to know what country they’re from.) Since I haven’t read Cavalli-Sforza’s original paper, this image is based mostly on Arthur Jensen’s analysis of Cavalli-Sforza’s results in The g Factor.

On your blog, you state that this image proves that there is a biological basis for race. You have introduced the very image into the article race and genetics. It is pretty clear to me that you are trying to use this image in the article to prove that there is a biological basis for race.
Occam admits that he has not read Cavalli-Sforza's original publication but instead uses Arthur Jensen's publication. My initial impression is that Jensen has put his own spin onto Cavalli-Sforza's work because Cavalli-Sforza's cladogram uses nine clusters instead of the six mentioned in the image. Since CS is the scientist who did the original investigation, if this image or a similar image is to be used, it should come directly from the horse's mouth. While Jensen may be using this image to prove that there is a biological basis for race, Cavalli-Sforza was clear that the image he created should not be interpreted as having any "racial" meaning. Wapondaponda (talk) 09:31, 25 October 2009 (UTC)
“On your blog, you state that this image proves that there is a biological basis for race. You have introduced the very image into the article race and genetics. It is pretty clear to me that you are trying to use this image in the article to prove that there is a biological basis for race.”
Let me ask you again: what difference does it make what opinion I express in my blog, and whether I engage in original research outside of Wikipedia? And why do my motives matter, if everything I do at Wikipedia is following Wikipedia’s policies? I just linked you to an article on a logical fallacy called bulverism, which is the fallacy of assuming that if you think you can infer a person’s motive for making an argument, that means the argument must be invalid. This is what you’re doing.
In this case, my argument is that in a discussion about Cavalli-Sforza’s study, it’s worthwhile to include the same image that Cavalli-Sforza included in this study, because Cavalli-Sforza apparently considers this a useful illustration of his data. As was pointed out above by both me and David Kane, what book I originally found the image in is not relevant here, because the image is identical in either case, and it’s now cited to the study’s original author. But rather than attempting to actually address this argument, you’re simply attacking what you see as my motive for making it. Do you not see how this is an example of the bulverism fallacy? --Captain Occam (talk) 10:09, 25 October 2009 (UTC)

Captain, for the record, I don't see that it is appropriate to assume that my concerns have been addressed without asking me what I thought of it. It should also be noted that the comment that kept being returned to the article (that this clustering was similar to racial groupings and thus somewhat gave it a basis in genetics) was possibly the more offensive, as this is quite contrary to what Cavalli-Sforza said in his study. Also, if I were you, I'd take Muntu's warning about advocacy seriously, as I've observed similarly that you seem to overinterpret what other people say to fit your opinions. This is the kind of thing that can dangerously skew a discussion. Please be careful.--Ramdrake (talk) 13:52, 25 October 2009 (UTC)

I'd like to agree with Ramdrake. When we cite sources, we cite the point of view of the source, and not our own interpretation of the source. There are multiple problems with Jensen's interpretation of Cavalli-Sforza. Firstly Captain, you can't take some work from Jensen that is loosely based on Cavalli-Sforza, upload it to the commons, then change the citation to say it's from Cavalli-Sforza. It's not, it's from Jensen who claims it's from Cavalli-Sforza. If you want to include Cavalli-Sforza's work, then include that, not Jensen. Wapondaponda has clearly shown that the work you produced does not accurately reflect what Cavalli-Sforza produced. Indeed my concerns about the claim for "Indo-Europeans" are justified, because Cavalli-Sforza doesn't use the term at all. I find the whole argument that this supports the concept of "race" rather strange, it's clear from this that Africans have a higher "taxonomic rank" than all other groups, they are a paraphyletic group to the rest of humanity, this observation alone discredits any validity for traditional concepts of "race".
My other major concern with Cavalli-Sforza's work is that it's so old, with the advent of cheap sequencing methods and a massive amount of sampling, there are a number of much better studies than this old one from 1994, we have Noah Rosenberg's studies from 2002, 2005 and 2007, and one published in 2008 with Cavalli-Sforza himself as an author, Worldwide human relationships inferred from genome-wide patterns of variation. One thing all these studies have in common is that they include far more populations and investigate far more genes than the 1994 publication uses. Another thing they have in common is that they disagree with each other about just how this tree should look. That's because the tree looks different depending upon how the human population is sampled, how the data are analysed, and even which types of DNA polymorphism are analysed (e.g. microsatellite data gives different results to SNP data). The other thing that these papers all share is that all populations include members that belong to multiple clusters. That's right, a great many, possibly a majority, of individual people belong to multiple clusters. So if a cluster represents a "race", then most people are "multi-racial". Check out the data in these three papers, it's clear that clusters are far from genetically mutually exclusive. The best way to display these data is to put them on a map and show the proportion of each population in each geographical region that belongs to each cluster. That's what I did when I made my Bauchet map of European clusters, it clearly shows that clusters overlap, and no one seems to have a problem with this method of displaying data. The other way to display such data would be to use pie diagrams. Alun (talk) 16:35, 25 October 2009 (UTC)
"The study examined only people who lived in the same area where their ancestors had lived prior to 1492"
Is this seriously what Cavalli-Sforza says? I very much doubt it. How could the authors possibly know where all the ancestors of all of the participants lived prior to 1492? That's impossible, we don't have records that go that far back. How many ancestors does that include per person? A quick calculation, based on 25 years per generation, says it's 20 generations ago, which is 2^20 ancestors per person in the study, or 1,048,576 ancestors per person. And we're supposed to believe that Cavalli-Sforza claims to have verified that every one of these ancestors, for every individual in the study, lived in the same village as the person in the study during the year 1492? Come on, that's a really massively exceptional claim, and exceptional claims require exceptional sources, we need at least a quote from Cavalli-Sforza for that claim. Alun (talk) 16:54, 25 October 2009 (UTC)
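The quick calculation in the comment above can be reproduced with a short sketch. It assumes, as the comment does, 25 years per generation and two parents per person per generation, and it takes 2009 (the year of the discussion) as the end point. The function names are hypothetical, chosen only for illustration. Note that the result counts positions in a family tree rather than distinct people, since the same ancestor can occupy more than one position.

```python
def generations_since(start_year: int, end_year: int, years_per_generation: int = 25) -> int:
    """Whole generations elapsed between two years (assumed fixed generation length)."""
    return (end_year - start_year) // years_per_generation


def ancestor_slots(generations: int) -> int:
    """Ancestor positions in the family tree that many generations back (2 per generation)."""
    return 2 ** generations


gens = generations_since(1492, 2009)   # 517 years at 25 years per generation
slots = ancestor_slots(gens)
print(gens, slots)
```

With these assumptions the sketch gives 20 generations and 2^20 ancestor positions, matching the figure quoted in the comment.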
“It should also be noted that the comment that kept being returned to the article (that this clustering was similar to racial groupings and thus somewhat gave it a basis in genetics) was possibly the more offensive, as this is quite contrary to what Cavalli-Sforza said in his study.”
That sentence has been gone from the article for a few days. The only thing being discussed now is whether to include the chart.
“Wapondaponda has clearly shown that the work you produced does not accurately reflect what Cavalli-Sforza produced. Indeed my concerns about the claim for "Indo-Europeans" are justified, because Cavalli-Sforza doesn't use the term at all.”
Neither does Jensen, actually. I used that term for this group myself because I couldn’t think of a better one, but if you don’t like it, feel free to suggest a better term for the genetic cluster represented by people from India and Europe.
If you think the chart doesn’t represent Cavalli-Sforza’s data accurately enough, then you need to be specific about what needs to be changed. Just saying that it’s inaccurate doesn’t cut it. As I demonstrated above, the shape of the chart is identical in either case, as are the names of the individual populations shown on it. Is the problem just with the names of the clusters associated with each continent? And if so, what should they be renamed to?
“My other major concern with Cavalli-Sforza's work is that it's so old, with the advent of cheap sequencing methods and a massive amount of sampling, there are a number of much better studies than this old one from 1994”
Be that as it may, the fact of the matter is that the article devotes around half of the “models of genetic variation” section to Cavalli-Sforza’s study. If the article is going to go into this much detail about his study, I think it’s reasonable for it to include the same illustration that Cavalli-Sforza’s study uses.
“Is this seriously what Cavalli-Sforza says? I very much doubt it. How could the authors possibly know where all the ancestors of all of the participants lived prior to 1492?”
See for yourself. The study describes this here. Quoting from the study: “We restrict our interest to aboriginal populations, which we define as those already living in the area of study in A.D. 1492.” If you think I haven’t described this aspect of the study clearly, you can rephrase it.
I find it incredibly odd that you claim to be a geneticist by training, yet not only are you unfamiliar with such a famous genetic study; you’re also using an argument from incredulity against one of its basic premises, and are expecting me to be the one who explains it to you. As a geneticist, shouldn’t you be able to learn the answers to questions like this without my help? --Captain Occam (talk) 05:33, 26 October 2009 (UTC)
  • I don't expect you to explain the study to me, I expect you to explain the changes you make. For example the study states that they sample from populations that existed in 1492 (presumably they have historic records that document the names of these populations and searched for modern populations with the same names), but your edit claims that the "study examined only people who lived in the same area where their ancestors had lived prior to 1492". That is a very different claim. Populations are not static entities, and it is invalid to claim, as you are, that there has been no migration between populations since 1492. Indeed the modern populations may be significantly genetically different to the 1492 populations, to the extent that ethnic identity is fluid and people move around a fair amount. As I said, Cavalli-Sforza's work does not make the claim that your edit did. You are not accurately representing what the source says and I find that unacceptable.
  • So you are introducing original research by your own admission: "I used that term for this group myself". You can't do that, it also explains the unwarranted mixing of linguistic groupings and continental populations. So now you know, neither Lapps (Sami) nor Basques speak Indo-European languages. Why can't you just stick to what the sources say, rather than putting your own spin on them?
  • "the article devotes around half of the “models of genetic variation” section to Cavalli-Sforza’s study" So about half is not then? Besides Cavalli-Sforza's work is far from the most authoritative on the subject, and he has little to say about "race", except to say that he doesn't think the genetic data support the concept.
  • "If you think the chart doesn’t represent Cavalli-Sforza’s data accurately enough, then you need to be specific about what needs to be changed." No I don't, in Wikipedia the onus is on you to show that the chart truly represents what Cavalli-Sforza says, but by your own admission it doesn't, you admit to introducing OR to the chart, and the quote that you yourself include above in your post obviously doesn't support the claim that the study only sampled people whose ancestors lived in the same region in 1492. I have stated that I am against including this chart altogether, and have suggested what I think is a better one. That is my position, I don't have to accept this chart at all as far as I can see. What is wrong with the alternative I have proposed? I could just as easily say that "you have to accept my alternative or give a reason why it needs to be changed". But that's not how Wikipedia works, and you should know that by now. Alun (talk) 06:20, 26 October 2009 (UTC)
“You are not accurately representing what the source says and I find that unacceptable.”
Then edit the article yourself, and come up with a better way to describe this.
“Why can't you just stick to what the sources say, rather than putting your own spin on them?”
Because I couldn’t come up with a way to represent continental groups in the chart without coming up with a name for them. What kind of “spin” do you think this is? You obviously think I’m trying to prove that our socially defined “races” exist in biology also, but coming up with a name for this group on the chart actually goes against that idea, since Indians are not traditionally considered to be “white”.
“No I don't, in Wikipedia the onus is on you to show that the chart truly represents what Cavalli-Sforza says.”
You seem to be basing this on what Ramdrake said: that Wikipedia’s policy states "the onus is on the editor seeking inclusion to gather consensus on the talk page". What policy page says this? It isn’t stated on WP:consensus, and a Google search for this phrase returns no results.
My opinion is that since the article devotes more space to Cavalli-Sforza’s study than it does to any other single study of human genetic variation, including a chart from this study would be worthwhile. If there are aspects of it that are OR, fine; I can change them. But I have not been able to find any policy page that states what Ramdrake claimed Wikipedia’s policy is. --Captain Occam (talk) 06:41, 26 October 2009 (UTC)
Incidentally, you asked me what’s wrong with the chart that you proposed. The answer is that I actually don’t have a problem with it, and you’re free to add it to the article if you’d like. However, I see no reason not to include the Cavalli-Sforza chart in addition to yours, as long as the article is discussing Cavalli-Sforza’s study in as much detail as it currently does. --Captain Occam (talk) 06:56, 26 October 2009 (UTC)
  • "Then edit the article yourself" I actually did.[1]
  • "Because I couldn’t come up with a way to represent continental groups in the chart" That is not an excuse for OR
  • "You obviously think" Stick to the article, comment on content not on users.
  • "You seem to be basing this on what Ramdrake said" No, this is a standard at Wikipedia, if you want to include something, you have to show it's what a reliable source says. That's why we have verifiability and no original research policies. The three policies no original research, verifiability and neutral point of view are our core content policies, they are the bedrock of what makes Wikipedia reliable. If Ramdrake and I are both citing the verifiability policy it's because we are both long standing experienced editors and we both understand how important this policy is. I suggest strongly that you familiarise yourself with these three policies, and also our reliable sources guideline.
  • "including a chart from this study would be worthwhile." but you didn't, you included OR. Besides I tend to disagree, there are better more recent studies out there.
  • "I see no reason not to include the Cavalli-Sforza chart in addition to yours" What? It's not mine, it's from a reliable source. But yours is OR. You say you don't know what to call one cluster, but why don't you call it what it's called in the original? Alun (talk) 07:16, 26 October 2009 (UTC)
“No, this is a standard at Wikipedia, if you want to include something, you have to show it's what a reliable source says. That's why we have verifiability and no original research policies. The three policies no original research, verifiability and neutral point of view are our core content policies, they are the bedrock of what makes Wikipedia reliable. If Ramdrake and I are both citing the verifiability policy it's because we are both long standing experienced editors and we both understand how important this policy is. I suggest strongly that you familiarise yourself with these three policies, and also our reliable sources guideline.”
I’m not asking about that. I’m asking about your implication that if we can’t establish a consensus, that means the content should be removed, rather than the article being left in its previous state. What policy is there that says this?
“You say you don't know what to call one cluster, but why don't you call it what it's called in the original?”
As far as I know, he doesn’t actually give names to these clusters; he just describes them. But a lengthy description of each cluster isn’t going to be able to fit into an image like this. My names for them were an attempt to come up with a concise stand-in for Cavalli-Sforza’s original descriptions. You say that my having done this is OR, and you might be right, but that still leaves the question of how they should be labeled. Do you have a better suggestion? --Captain Occam (talk) 07:38, 26 October 2009 (UTC)
  • "I’m asking about your implication that if we can’t establish a consensus, that means the content should be removed," I made no such implication, here are the quotes:

    Alun: in Wikipedia the onus is on you to show that the chart truly represents what Cavalli-Sforza says.
    Captain O: You seem to be basing this on what Ramdrake said.
    Alun: No, this is a standard at Wikipedia... The three policies no original research, verifiability and neutral point of view are our core content policies, they are the bedrock of what makes Wikipedia reliable.
    Captain O:I’m asking about your implication that if we can’t establish a consensus, that means the content should be removed

    I have never stated that the material should be removed because we have not established a consensus. I have stated that the material should be removed because it is not correctly verified and it is OR. Consensus applies with regards to reliable material, it never applies to unreliable material. I removed this material because it does not meet our criteria for inclusion, i.e. it does not meet two of our core content policies. Our core content policies are non-negotiable, and no consensus can overturn them. Or to put it another way, if material exists in an article that is not supported by a reliable source, then it should be removed. In both of these cases the source did not support the claim. Cavalli-Sforza did not claim that the ancestors of the participants who were alive in 1492 lived in the same geographic regions as the participants, and the chart did not have a group called indo-Europeans. If you don't accurately say what the source says, then the material can and should be removed and no consensus can ever change that.
  • "As far as I know, he doesn’t actually give names to these clusters; he just describes them. But a lengthy description of each cluster isn’t going to be able to fit into an image like this." But you say that he produces the cladogram in his book, so what is the caption that accompanies this cladogram? And if Cavalli-Sforza doesn't give these groups names in his book then there is probably a good reason for that. You can't go and introduce OR just because you think what you're doing is better than the original. Alun (talk) 13:44, 26 October 2009 (UTC)
“I have never stated that the material should be removed because we have not established a consensus.”
All right, it sounded like you were agreeing with Ramdrake about this. I wonder where he got that quote about any material at Wikipedia needing to be removed if the person who wants it included can’t build a consensus for this?
“But you say that he produces the cladogram in his book, so what is the caption that accompanies this cladogram?”
The caption of the chart I used doesn’t talk about the names for each of these clusters. But here, take a look at this. I just noticed this: he has a second chart that’s pretty similar to the first one, except that on this one the clusters are given names. Would you still consider it original research if I were to relabel these clusters according to the names they’re given by the original author on another chart in the same study?
That sounds to me like a reasonable compromise. And we can also include the cladogram that you posted, if you think it's important to have one from a more recent study.--Captain Occam (talk) 15:56, 26 October 2009 (UTC)
By the way, Alun: I think the content of the paper from Nature that you linked to might be worth including in the article also, rather than just its diagram. It seems like a pretty good overview of the scientific consensus about this topic, and it also explains something that I’ve been trying to explain to you for a little while. I’m going to quote the paper’s conclusion about this:
Data from many sources have shown that humans are genetically homogenous and that genetic variation tends to be shared widely among populations. Genetic variation is geographically structured, as expected from the partial isolation of human populations during much of their history. Because traditional concepts of race are in turn correlated with geography, it is inaccurate to state that race is “biologically meaningless.” On the other hand, because they have been only partially isolated, human populations are seldom demarcated by precise genetic boundaries. Substantial overlap can therefore occur between populations, invalidating the concept that populations (or races) are discrete types.
This is the way race is understood by scientists who believe that it has a basis in biology, including Arthur Jensen. In fact, if I were to quote his explanation of the genetics of race from The g Factor, it’s nearly identical to what’s explained by this paper, except that in Jensen’s case his explanation is based on Cavalli-Sforza’s study. According to the Nature paper, the idea that races exist as platonic categories with discrete boundaries was debunked more than 50 years ago, and Jensen certainly isn’t stupid enough to be more than 50 years behind the current research on this topic. So if you think that disproving this obsolete 19th-century idea is the same as disproving the viewpoint of people such as Jensen, you’re attacking a strawman. --Captain Occam (talk) 16:54, 26 October 2009 (UTC)
"This is the way race is understood by scientists who believe that it has a basis in biology" Please don't bring your own interpretations here, thanks, the paper does not say this, and unless you can find a specific scientist that cites this quote from this paper as his understanding of what "race" is, then what you're saying is a synthesis. What you say doesn't make sense, because the article doesn't actually say how "race is understood" by anyone, let alone by racialist scientists, what it says is that "it is inaccurate to state that race is “biologically meaningless”", which is a very long way from saying that "race" has any classificatory validity. But what they are saying is that peoples from distant parts of the planet are a little bit different from each other biologically. That's not an acceptance of "race" concepts, it's an acknowledgement that just as concepts of "race" have some geographical validity, so human genetic variation is distributed geographically. But the consensus is that genetic variation is distributed by isolation by distance, which makes it almost impossible to define what a "race" is.
"if I were to quote his explanation of the genetics of race from The g Factor" if you were to do that it would be irrelevant. This article is not about psychology, and what Jensen believes is irrelevant, the man is not an expert anthropologist nor an expert geneticist. Why do you keep introducing him here?

"Jensen certainly isn’t stupid enough to be more than 50 years behind the current research on this topic." I wouldn't be too sure, I'd say he's more like 100 years behind in his anthropological understanding. Alun (talk) 08:07, 27 October 2009 (UTC)

(reset indent) Captain, I don't think any of the editors here have a problem with that. However, one must also acknowledge that this doesn't change the fact that the traditional conceptions of race have little to nothing to do with biology. Also, and this is quite relevant to the article, the vast majority of racial attributions for the comparative IQ studies were done using the traditional conceptions of race (in fact, I'm not aware of a single IQ study which would have used any kind of genetically-based "racial attribution"). This is where Rushton's theories fail grandiosely.--Ramdrake (talk) 17:30, 26 October 2009 (UTC)

If you read the paper, you’ll see that one of the things it states is that the traditional concepts of race are correlated with genetic clusters based on geographical origin, although imperfectly so. I’m not sure either way about Rushton, since most of my familiarity with the hereditarian perspective is based on Jensen’s writings, but Jensen does not consider IQ to be biologically influenced directly by socially defined races, since that obviously isn’t possible. In The g Factor, Jensen describes his theory about this in terms of differences in intelligence between populations defined by geographical origin (The chapter is titled "population differences in g", not "race differences in g"), and he proposes that these differences exist because the genes whose frequency varies between these genetic clusters include some of the genes which influence intelligence. Since socially-defined races correlate with these genetic clusters, differences in gene distribution between these clusters will also manifest themselves between socially-defined races, for the same reason why genetic susceptibility to certain drugs varies between socially-defined races also.
If Rushton doesn’t understand the distinction between socially-defined races correlating with genetic clusters and the two being the exact same thing, then that’s his problem. I would find it acceptable for the explanation of the hereditarian view in the race and intelligence article to be based mostly on Jensen’s writings, and devote only a small amount of space to Rushton.
I know this topic is kind of tangential, but if we’re going to get anywhere with improving the race and intelligence article, I think it’s important for everyone to be on the same page with regard to what the hereditarian theory about this actually is.
Would you find it acceptable for me to include an updated version of the Cavalli-Sforza chart in this article, with the clusters labeled with the terms that Cavalli-Sforza uses in his study’s other chart of them, and also to summarize the Nature paper there? --Captain Occam (talk) 18:07, 26 October 2009 (UTC)
No, it says that genetic variation correlates with geographical distance separating populations, and that this corresponds poorly with the traditional concept of "races". Genetic variation is overall clinal (gradual) and increases with distance. Clustering (how many clusters you get) depends on how many markers you analyze.--Ramdrake (talk) 19:14, 26 October 2009 (UTC)
Yes, that's exactly what it says. What the paper says is that genetic variation is geographically distributed clinally, but that "race" concepts are also geographically distributed, so it's not correct to say that "race" is biologically meaningless. But it doesn't say that "race" has any biological validity, because "races" are thought to be discrete groups, and biology teaches us that there are no discrete human groups. Alun (talk) 08:07, 27 October 2009 (UTC)
I’m going to quote the paper that Alun linked to here: “Clustering of individuals is correlated with geographic origin or ancestry. These clusters are also correlated with some traditional concepts of race, but the correlations are imperfect because genetic variation tends to be distributed in a continuous, overlapping fashion among populations. Therefore, ancestry, or even race, may in some cases prove useful in the biomedical setting, but direct assessment of disease-related genetic variation will ultimately yield more accurate and beneficial information.” The paper then goes on to explain how genetically-based susceptibility to certain drugs varies between socially-defined races, which would not be possible if they didn’t correlate with genetic clusters.
This is what Jensen and I are saying. Jensen also makes the same point that how many “races” humans can be divided into is more or less arbitrary; I can quote his explanation about this, if you like.
Can you answer my question about the changes I’m proposing to the race and genetics article? --Captain Occam (talk) 19:37, 26 October 2009 (UTC)
Varoon Arya has found several more papers that discuss this question, and linked to them here. According to these studies, how closely self-identified races correlate with genetic clusters varies from one ethnic group to another, but in most cases the correlation is fairly strong. Since even studies that are critical of self-identification acknowledge this correlation, that it exists appears to be the consensus among researchers in this area.
You need to just accept the fact that you’re wrong about this, and that the research in this area does not say what you’re claiming it does. You’re free to disagree with the professional literature about this topic if you like, but Wikipedia’s policy is that viewpoints need to be represented in the article in proportion to their coverage in reliable sources. That means this correlation needs to be discussed in the article.
I’ll give you and Alun a day or so to see whether you dispute the conclusion we’ve drawn about what the professional literature says about this, but this case seems pretty clear-cut. After that, I’ll edit the article to bring it more into line with what the research on this topic actually says. I’ll also fix the OR problems with the Cavalli-Sforza chart, if I decide to include it. --Captain Occam (talk) 05:12, 27 October 2009 (UTC)
Captain O, you are selectively quoting, and again trying to introduce OR and your own personal opinion. To claim that the Nature paper I link to supports Jensen, or any other racialist is just not supported by the source. It's a thoughtful article, but you choose to pretend that it only gives a single point of view. Look at the parts you did not highlight: "These clusters are also correlated with some traditional concepts of race, but the correlations are imperfect because genetic variation tends to be distributed in a continuous, overlapping fashion among populations."
"According to these studies, how closely self-identified races correlate with genetic clusters varies from one ethnic group to another"
  • These clusters are not "races". These clusters, at best are attempting to determine the number of ancestral gene pools that humans derive from. Even then there is a great deal of disagreement, with different groups using different models, and unsurprisingly coming up with different results.
  • These "clusters" vary dramatically depending upon the sample set being used, with different clusters produced depending upon which populations are sampled. There has been a great deal of criticism of the sampling strategies used. Many of these papers are based on sampling from self-identified populations that are geographically distant, this in itself creates an artificial "discreteness" to the populations. Most anthropologists agree that if sampling were done on a more continuous basis, then we would see far less clustering. That is a citable fact.
  • Clustering analysis also produces different results depending upon the type of genetic variation used in the analysis. When rapidly mutating elements such as Short tandem repeats are analysed the result is different than when slowly mutating elements are analysed such as Single nucleotide polymorphisms, still different results are produced when things like transposons are investigated (e.g. Alu insertions). None of these produce consistent results, and most scientists put it down to the complex demographic history of the human species.
  • The products of these analyses are relative, they depend upon comparison with the samples being used in the study, they do not represent ancestral groups that are reproducible when different sampling strategies, different genetic elements or different statistical treatments are used. They do not represent real tangible entities, and no one pretends that they do.
  • Most strikingly, when we look at mtDNA or Y chromosomes, they do not segregate by anything like what can be thought of as "race".
"You need to just accept the fact that you’re wrong about this" No, we need to represent what reliable sources say. As far as I can tell what you propose is to take a few selective quotes out of context and then declare that there is a consensus in the scientific community that "clusters are races" and then pretend that this answers all questions. This is not allowed in Wikipedia, we say what the sources say, we do not give our own (or Arthur Jensen's for that matter) interpretations of the sources. This is called a synthesis and is a form of OR. You are proposing a synthesis by pretending that you can synthesise Jensen's work with work that absolutely does not support the concept of race by stitching together out of context parts of a source.
I'm sorry to say that you have a habit of including WP:OR in articles, and of coming up with your own interpretations of sources and including them. You cannot do this, you can only say what the source says. This is evident above where you unilaterally decide that Jensen is saying the same thing as the paper I link to, and when you choose to believe that the paper supports "race" concepts, when it says no such thing. I urge you to stop looking for a "smoking gun" which will "prove" you are right. There is no right and wrong, there is only a spectrum of expert opinion, all of which we need to include in the article. If experts can't agree on this, I fail to understand why you think Wikipedia should treat it as a closed issue.
Because you think that the Nature genetics article I cite supports a racialist point of view (admittedly only when selectively quoted), you want to include it. But this article was from a supplement published by Nature Genetics called Genetics for the human race. Here are some other quotes from the supplement.

The paradigms of human identity based on 'races' as biological constructs are being questioned in light of the preponderance of data on human genome sequence variation and reflect the need for a new explanatory framework and vision of humankind with different fundamental assumptions about biological groups that can accommodate new knowledge from a new generation of research. Changing the paradigm from 'race' to human genome variation

As those ancestral origins in many cases have a correlation, albeit often imprecise, with self-identified race or ethnicity, it is not strictly true that race or ethnicity has no biological connection. It must be emphasized, however, that the connection is generally quite blurry because of multiple other nongenetic connotations of race, the lack of defined boundaries between populations and the fact that many individuals have ancestors from multiple regions of the world. What we do and don't know about 'race', 'ethnicity', genetics and health at the dawn of the genome era

'Race' is applied in formal taxonomy to variation below the species level. In traditional approaches, substantively morphologically distinct populations or collections of populations occupying a section of a species range are called subspecies and given a three-part Latin name. In current systematic practice, the designation 'subspecies' is used to indicate an objective degree of microevolutionary divergence. Do any of the human groups called 'races', including those from traditional anthropology, meet this latter criterion?...
Y-chromosome and mitochondrial DNA genealogies are especially interesting because they demonstrate the lack of concordance of lineages with morphology and facilitate a phylogenetic analysis. Individuals with the same morphology do not necessarily cluster with each other by lineage, and a given lineage does not include only individuals with the same trait complex (or 'racial type')...
'Race' is a legitimate taxonomic concept that works for chimpanzees but does not apply to humans (at this time). The nonexistence of 'races' or subspecies in modern humans does not preclude substantial genetic variation that may be localized to regions or populations. Conceptualizing human variation

and some others

Because biological information that is captured by common notions of race varies depending on how race is defined, studies designed to identify genetic factors associated with health-related traits might need to carefully explain how race was defined and used. Different notions of race and ancestry might be useful depending on the circumstance. However, the information about geographical ancestry captured by concepts of race is, in general, less than that obtained by making ancestry inferences from explicit genetic data (such as genome-wide SNPs, AIMs), and, for much of humanity (for example, Hispanics, Asian Indians), race is not a meaningful descriptor of biological ancestry. Deconstructing the relationship between genetics and race

Statement 1: We believe that there is no scientific basis for any claim that the pattern of human genetic variation supports hierarchically organized categories of race and ethnicity
Statement 2: We recognize that individuals of two different geographically defined human populations are more likely to differ at any given site in the genome than are two individuals of the same geographically defined population
Statement 3: We urge those who use genetic information to reconstruct an individual's geographic ancestry to present results within the broader context of an individual's overall ancestry
Statement 4: We recognize that racial and ethnic categories are created and maintained within sociopolitical contexts and have shifted in meaning over time
Statement 5: We caution against making the naive leap to a genetic explanation for group differences in complex traits, especially for human behavioral traits such as IQ scores, tendency towards violence, and degree of athleticism
Statement 6: We encourage all researchers who use racial or ethnic categories to describe how individual samples are assigned category labels, to explain why samples with such labels were included in the study, and to state whether the racial or ethnic categories are research variables
Statement 7: We discourage the use of race as a proxy for biological similarity and support efforts to minimize the use of the categories of race and ethnicity in clinical medicine, maintaining focus on the individual rather than the group
The ethics of characterizing difference: guiding principles on using racial categories in human genetics

So you can't pretend that it is somehow a settled case within the genetics community that "race" is real. Alun (talk) 08:07, 27 October 2009 (UTC)
“So you can't pretend that it is somehow a settled case within the genetics community that "race" is real.”
I’m not claiming that. All that I’m claiming is that a correlation exists between socially-defined races and genetic clusters, and none of the sources you’ve cited dispute that. In fact, one of the quotes you’ve posted even acknowledges this, and two of the papers that VA linked to are specifically about this correlation. It is not synthesis when the conclusion being presented is stated explicitly by the source material, and this conclusion is also the exclusive focus of several of the sources.
The only ideas that these sources categorically reject are the idea that socially-defined races are identical to genetic clusters, and that races exist as discrete hierarchical categories. I am not suggesting that the article include either of these notions, and to claim that I am is to attack a strawman.
What I’m suggesting is that we mention the existence of this correlation in the article, along with the qualifications about its limitations. To omit any mention of this correlation’s existence is to omit an important point being made by the source material. --Captain Occam (talk) 08:28, 27 October 2009 (UTC)
I’m not sure if I need to be any clearer about this, but I’d like to make sure you understand the distinction between these two ideas, because you seem to be equating them even though they aren’t the same.
The first: “Race exists as a biological or genetic entity.”
The second: “Race is a social construct that correlates with genetic clusters based on geographical ancestry.”
The sources you’ve cited reject the first idea, but not the second, and one of them specifically mentions that the second idea is correct. The Nature paper also mentions that the second idea is true, and two of the papers that VA linked to are devoted to this fact. This second idea is what I want to include in the article. Please don’t assume that I want to include the first idea rather than the second, because I don’t. --Captain Occam (talk) 09:00, 27 October 2009 (UTC)
Rosenberg's data clearly show that a great many individuals belong to multiple genetic "clusters"
That's not what they say. What they say is that human genetic variation is geographically structured. And they say that some "race" concepts are geographically structured, therefore it is not correct to say that "race" is biologically meaningless. But neither do they say that "race" is biologically meaningful. Now if you want to accurately say what the articles say, which is that human genetic diversity is geographically structured, but also distributed in a continuous manner, then I am obviously not opposed to that. If you want to say that genetic structure weakly correlates to some "race" concepts, then I'm not opposed to that. But if you want to say that "clusters" from clustering analyses represent real biological entities, then I'll have to dispute that, because none of the sources say this. Clusters are a statistical way of measuring the extent that the variation in an individual conforms to a "typical" person within a cluster. But it's often the case that even people in different clusters are often more genetically similar to each other than to people in their "own" cluster. It's also clear that a great many, possibly a majority, of individual people belong to multiple clusters, as the work of Noah Rosenberg et al. demonstrates (see image), which more or less makes us all multi-racial. Furthermore these clusters are relative and change depending on sampling strategy (which populations and how), type of genetic element measured, and the assumptions used in the statistical analysis (e.g. see Edwards (2003) "Lewontin's Fallacy").
Most geneticists now think that relying on self-identified "race" as a meaningful measure of genetic similarity between individuals (what Jensen does) is not scientifically valid because individuals from the same group can be more different to each other than they are to individuals from different groups. This is even more obvious when one is dealing with admixed groups such as the "white" and "black" populations of the USA, as Jensen is. See statements five: "We caution against making the naive leap to a genetic explanation for group differences in complex traits", six "We encourage all researchers who use racial or ethnic categories to describe how individual samples are assigned category labels, to explain why samples with such labels were included in the study, and to state whether the racial or ethnic categories are research variables" and seven "We discourage the use of race as a proxy for biological similarity and support efforts to minimize the use of the categories of race and ethnicity in clinical medicine, maintaining focus on the individual rather than the group". In fact, far from Jensen doing something that is supported by genetics, he is doing something that geneticists are more and more rejecting as biologically invalid. Geneticists are developing a range of ways to analyse samples that do not rely on misleading self categorizations that can confound medical research. It's just a shame that psychologists such as Jensen cling to the categories that are increasingly being rejected by genetic and medical researchers. Our species' evolutionary history cannot be adequately explained by simplistic and uninformed nineteenth-century concepts of "race" that people like Jensen rely on.
I'm all for including what sources say, I'm not for including what sources don't say. It is only your opinion that these sources support Jensen, and many of them explicitly state that "race" is a misleading concept when dealing with human genetic diversity, even if the idea of "race" is not totally biologically meaningless, and even if it weakly correlates with some types of clustering analyses, neither does that make "race" a valid category for the study of group differences in medical or behavioural sciences, and that is what the majority of the sources state. Again you're taking what is being said out of context, and you seem to believe that correlation is the same as cause and effect, whereas that is possibly the greatest scientific fallacy. Alun (talk) 12:19, 27 October 2009 (UTC)
All right, I’ve fixed the original research problems with the Cavalli-Sforza image, and added it back to the article. This image now uses the same names for these groups that his original study does. And now that I’ve found Cavalli-Sforza’s original study online, this chart no longer uses any information from Jensen.
As I said before, you’re free to add the cladogram based on the Nature study if you like, since I don’t have any problems with that image, but I think it should be in addition to the one that I’ve just added. As long as we’re describing Cavalli-Sforza’s results, I think we ought to include a visual representation of his data.
As long as nobody has any problems with the new version of this image, I’ll be adding some information about the correlation between genetic clusters and socially-defined races shortly. --Captain Occam (talk) 20:04, 27 October 2009 (UTC)
File:Cavalli-sforza dendogram.jpg
Visual representation of genetic distances according to Cavalli-Sforza 1997, constructed from the table below
Percentage genetic distances among major continents based on 120 classical polymorphisms
            Africa   Oceania   East Asia   Europe
Oceania     24.7
East Asia   20.6     10
Europe      16.6     13.5      9.7
America     22.6     14.6      8.9         9.5
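As a reading aid for the table above, here is a minimal Python sketch that expands the lower-triangular values into a lookup keyed by unordered pairs and ranks them from largest to smallest. The names `COLUMNS`, `ROWS`, `distances`, `distance` and `ranked` are made up for illustration; only the numbers come from the table.

```python
# Lower-triangular rows copied from the table above; the column order
# follows the table's header row. All variable names are illustrative.
COLUMNS = ["Africa", "Oceania", "East Asia", "Europe"]
ROWS = {
    "Oceania": [24.7],
    "East Asia": [20.6, 10.0],
    "Europe": [16.6, 13.5, 9.7],
    "America": [22.6, 14.6, 8.9, 9.5],
}

# Expand the triangle into a dictionary keyed by unordered pairs.
distances = {}
for row_name, values in ROWS.items():
    for col_name, value in zip(COLUMNS, values):
        distances[frozenset((row_name, col_name))] = value


def distance(a, b):
    """Return the tabulated distance between two regions, in either order."""
    return distances[frozenset((a, b))]


# Rank all ten pairs from the largest distance to the smallest.
ranked = sorted(distances.items(), key=lambda item: item[1], reverse=True)
for pair, value in ranked:
    print(" - ".join(sorted(pair)), value)
```

Keying on `frozenset` means `distance("Africa", "Oceania")` and `distance("Oceania", "Africa")` return the same entry, which matches the symmetric nature of a distance table.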
File:Li-2008-genetic-cladogram.png
Tree of 51 populations from Li et al 2008
I see that Muntuwandi has just removed this image again, with the explanation “caption says ‘one of eight genetic groups to which all human populations belong’, not what Cavalli-Sforza says”. Earlier I linked to the page of his study where the groups on a similar chart are labeled, and there are eight of them; so unless it’s original research to state how many groups his chart shows, this number is supported by the source. Showing eight groups rather than six is one of the modifications that I made as part of citing this directly to Cavalli-Sforza rather than Jensen.
Muntuwandi, if you still have a problem with this chart, you’ll need to be more specific about what needs to be changed about it. I’ve followed all of the instructions that Alun and Ramdrake have provided about how to remove the original research from it. --Captain Occam (talk) 20:57, 27 October 2009 (UTC)
Yes I oppose the wording because it gives the impression that these 42 groups are fixed entities and all human populations "must" belong to one. There are several classification schemes. Hapmap has got one, HGDP also has another. In fact Worldwide human relationships inferred from genome-wide patterns of variation has constructed another dendogram based on 51 populations. Wapondaponda (talk) 21:32, 27 October 2009 (UTC)
You aren’t making sense. If the data is organized the way Cavalli-Sforza organizes it, then all 42 of the populations on this chart must belong to one of the eight groups, because the 42 populations are divided up this way. This is simply describing the way Cavalli-Sforza’s data is structured, so if you have a problem with that, what you’re really having a problem with is his study itself. The reason why we’re using his chart rather than someone else’s is because Cavalli-Sforza’s study is being discussed in that section of the article, so a visual representation of his data is relevant there in a way that charts of other people’s data wouldn’t be. And we can’t depict Cavalli-Sforza’s data any differently than how he depicts it himself, because as other editors have pointed out for earlier versions of this chart, depicting it any other way is original research.
The reason his results are organized this way is because his study only examined populations which lived in the areas in question prior to 1492, so they don’t show the mixing between populations which has occurred during the time since then. I would not be opposed to including a sentence in the article explaining this. I added an explanation of this earlier, but Alun removed it, so if you want to add something like that back you’ll need to justify it to him. --Captain Occam (talk) 22:10, 27 October 2009 (UTC)

image arbitrary section break

I find the continued obsession with Cavalli-Sforza's image quite disappointing. There is a wealth of recent data concerning the global structure of human genetic variation, yet all this seems to be ignored. Furthermore it seems that some editors have ignored the underlying meaning behind Cavalli-Sforza's cladogram and instead have tried to misuse it by focusing on how CS grouped his populations. The underlying data used was simply a genetic distance matrix between populations, which I think is much more informative than the dendogram, because one has to guess lengths with the dendogram. This article could do with a lot of work to help improve the quality of information, but instead the entire discussion is about one image. So far the current controversy about the image has generated no new information that isn't already present in Wikipedia. So are the proponents of this image really adding any value? I tend to think not. Wapondaponda (talk) 21:27, 27 October 2009 (UTC)

As I pointed out above, if we're going to replace this image with a different cladogram or dendogram, the article will need to be restructured. The article currently devotes more space to Cavalli-Sforza's study than any other single study with the exception of the Herschfeld one, which has its own diagram also. If we're going to devote this much space to Cavalli-Sforza’s results, we should represent them visually also.
So far, no one has attempted to argue with this point. The main thing that was being discussed here was the original research problems with this image, which I think have been corrected now. --Captain Occam (talk) 22:17, 27 October 2009 (UTC)

The Cavalli-Sforza image is directly based on the genetic distance matrix in the article, so for consistency I believe it would make sense to use it rather than the tree for 42 populations for which the genetic distance matrix is not currently available. The image by Li et al is more sophisticated because it shows the relationship between genetic distance and geographic distance. The first "cluster" in red is Africa, followed by the middle east, Europe, Central/South Asia, East Asia, the Americas and finally Oceania. Wapondaponda (talk) 22:43, 27 October 2009 (UTC)

Hang on, you aren't being clear. Which image are you saying you think the article should use? The Cavalli-Sforza one that you just posted on this discussion page, the one that's currently in the article which is also by Cavalli-Sforza, or the one by Li et al? --Captain Occam (talk) 22:52, 27 October 2009 (UTC)
I've just noticed that both of the images you uploaded here are copyrighted. The image that's currently in the article is one that I've created myself and released under a creative commons license, although it's an accurate representation of Cavalli-Sforza's original version of this chart. If we have a choice between a free image and one that's copyrighted, the free image is always preferable. --Captain Occam (talk) 22:56, 27 October 2009 (UTC)
File:HLA-DRB1.png
Dendogram of HLA-DRB1
File:HLA-A-B-DRB1.png
Dendogram of HLA-A-B-DRB1
I am not sure of the actual copyright status of your image either. I am not concerned about the images because I don't believe they are really that important. I wanted to illustrate that there have been numerous models of genetic distance and so no one model is the "right model". I think the matrix does a far better job because it has actual numbers as opposed to guessing the length of a branch. The Cavalli-Sforza image is based on the genetic distances listed in the table, so one can look at the image to get a visual representation of the table. The image of 42 populations, by contrast, is not quite as informative, because its genetic distance table is not currently available. The text in the subsection "genetic distance" is primarily based on the table of 5 populations and not the information from 42 populations. Consequently I see it as the most relevant image. The Li et al. 2008 is more recent and is based on 650,000 polymorphisms compared with the 120 polymorphisms surveyed by Cavalli-Sforza. So we should be thinking along the lines of trying to get the most updated information. Wapondaponda (talk) 23:36, 27 October 2009 (UTC)
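To make the link between a distance matrix and a tree concrete, here is a rough Python sketch that derives a merge order from the five-region table using average linkage (UPGMA). The choice of UPGMA is an assumption made purely for illustration. This thread does not say which tree-building method was used for any of the images, and a different method could give different branch lengths. The function names are hypothetical; only the distances come from the table.

```python
from itertools import combinations

# Distances (percent) copied from the five-region table on this page.
DIST = {
    ("Africa", "Oceania"): 24.7,
    ("Africa", "East Asia"): 20.6,
    ("Africa", "Europe"): 16.6,
    ("Africa", "America"): 22.6,
    ("Oceania", "East Asia"): 10.0,
    ("Oceania", "Europe"): 13.5,
    ("Oceania", "America"): 14.6,
    ("East Asia", "Europe"): 9.7,
    ("East Asia", "America"): 8.9,
    ("Europe", "America"): 9.5,
}


def pair_distance(a, b):
    """Look up a distance regardless of the order of the two names."""
    return DIST[(a, b)] if (a, b) in DIST else DIST[(b, a)]


def cluster_distance(c1, c2):
    """Average linkage: mean distance over all cross-cluster pairs."""
    total = sum(pair_distance(a, b) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def upgma(names):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda p: cluster_distance(*p))
        height = cluster_distance(c1, c2)
        merges.append((c1, c2, round(height, 3)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


merges = upgma(["Africa", "Oceania", "East Asia", "Europe", "America"])
for left, right, height in merges:
    print(left, "+", right, "at", height)
```

Each entry in `merges` records which two clusters were joined and the average distance at which they joined, which is the number a reader would otherwise have to estimate from a branch in a drawn tree.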
As I said before, I don’t have a problem with more than one diagram of this being included in the article, in order to show how multiple models of the data are possible. If you want to add another image to the article in addition to the one that’s currently there, you’re welcome to do so. However, as per Wikipedia’s image policy, the article shouldn’t use an image that’s copyrighted if a free alternative exists. --Captain Occam (talk) 23:45, 27 October 2009 (UTC)
I've just added some of the additional information that I'd discussed with Alun and Ramdrake. Upon closer examination, it looks like the correlation between self-identified races and genetic clusters is already discussed in the "clusters controversy" section, so the main thing that needed to be added was an explanation of the possible medical applications of this information.
Alun and Ramdrake have already said that they wouldn't mind the article including this information (at least, I think Ramdrake's comment was referring to this), so hopefully they won't mind this section being added. However, they can let me know if there's anything in it that they think needs to be re-worded. --Captain Occam (talk) 00:55, 28 October 2009 (UTC)

Personally, I find all of these images very informative. Why can't we simply have a gallery at the bottom of the page? Provided that the images are properly referenced and relevant, I think it would be a great addition to the article. --Aryaman (talk) 01:03, 28 October 2009 (UTC)

I have added some HLA dendograms from Hajjej et al. 2006. This is just to illustrate that when a particular genetic system is analyzed, the dendograms don't necessarily conform to "racial" types. The French cluster with San Bushmen of Africa, Sardinians also cluster with the San and the Greeks cluster with the Rimaibe of Burkina Faso and Ethiopians. Wapondaponda (talk) 16:42, 28 October 2009 (UTC)
Muntuwandi, Wikipedia’s image policy doesn’t allow you to replace a free image in an article with a copyrighted one like this. If you want to include that dendogram, you’ll have to make a creative commons version of it. And as has been pointed out here multiple times, any additional images you want to add will need to be in addition to the current one, not instead of it. --Captain Occam (talk) 17:22, 28 October 2009 (UTC)
Captain, you're right on the first one, but I don't think anyone agreed on your second requirement. It's a personal requirement of yours and doesn't necessarily have consensus.--Ramdrake (talk) 17:40, 28 October 2009 (UTC)
Varoon Arya agreed with it, and nobody else (other than Muntuwandi) expressed an opinion.
Right now, the only two free images we have of this are the current one (which is from Cavalli-Sforza), and the one from Jorde and Wooding that Alun posted. If you (or anyone else) wants to replace the current image with that one, you'll need to make a case for a chart depicting Jorde and Wooding's data being more relevant than one depicting Cavalli-Sforza's data, when it's going in a section of the article that's about Cavalli-Sforza's study. So far, nobody has tried to dispute the idea that in a section of the article about Cavalli-Sforza's study, an image of Cavalli-Sforza's data is more relevant than one of someone else's data. --Captain Occam (talk) 17:54, 28 October 2009 (UTC)
As I have mentioned earlier, the image that you propose is "free" only because you, the "author", say that it is free. So far I have not seen any independent editor verify that your image is free. Looking at it, it is almost exactly the same as the image from Cavalli-Sforza's book, with the exception that there is some color. My initial impression is that this is a derivative work, and that images of this type are frequently deleted at Wikipedia:Possibly unfree files. It is just that nobody has yet brought a challenge against the copyright status of this image. I don't necessarily agree with wikipedia's policy on such generic two-dimensional images, but according to the policy, I suspect that your image is in violation. Wapondaponda (talk) 19:18, 28 October 2009 (UTC)
As far as I can tell from the policy, while simply copying charts or graphs 1:1 is a violation of copyright, recreating charts or graphs to reflect the original data is not. In this case, data refers to the groups (and their names), their relative positioning and the distances involved, which have a numeric value relative to the key provided. If Occam's image is a copy/vio, then literally thousands of images in WP would have been eliminated long ago. This is an original work if Occam releases it as such. --Aryaman (talk) 19:30, 28 October 2009 (UTC)
Well I don't necessarily agree with the policy. Some files that I have created have been deleted because of the same policy. However, Occam hasn't demonstrated that he has the data necessary to recreate the image, since he admits that he hasn't read Cavalli-Sforza's publication, how can he know what the correct length of each branch is. He has not provided the genetic distance matrix used to construct the tree and has not provided any methodology of how he constructed the tree. Unless one finds someone who is sympathetic at commons, I have seen more sophisticated derivative works deleted. Wapondaponda (talk) 19:53, 28 October 2009 (UTC)
Of course, if nothing else, it has been repeatedly proven that, if a person is set on being a dick about something, they can get just about anything removed from Wikipedia, including the results of perfectly sourced reliable academic literature. So, I suppose the question here is whether any of the present editors are actually willing to go to such lengths just to get an otherwise acceptable image removed from this article. I'm certainly not. --Aryaman (talk) 20:05, 28 October 2009 (UTC)
"However, Occam hasn't demonstrated that he has the data necessary to recreate the image, since he admits that he hasn't read Cavalli-Sforza's publication, how can he know what the correct length of each branch is."
I hadn't read it when I made the original version of this image in July. If you've looked at my more recent comments here, you'll see that I've read it recently now that I've found it online. Although since (as I pointed out before) the diagram in Jensen's book is identical to Cavalli-Sforza's original version, when I found the original version it wasn't necessary for me to change very much about this image.
So far, you’ve spent around two weeks trying to get this image removed from the article. You’ve attacked it on the basis of Jensen’s reliability as a source (until I sourced it to the original authors), on the basis of my motives, on the basis of whether it’s misleading, on the basis of whether it’s original research, on the basis of whether there’s a better image to replace it with, and now it looks like your next tactic is going to be trying to get it removed based on copyright violations. Your desire to get rid of this image obviously isn’t based on any specific Wikipedia policy, because if it were, you wouldn’t start citing a different policy as a reason to get rid of it every time your previous argument based on a different policy has failed.
And in the meantime, you’ve taken up around half of the entire discussion page for this article with your efforts to remove a single image from it. Why is this so important to you? --Captain Occam (talk) 20:25, 28 October 2009 (UTC)
I think you should be asking yourself that question. Wapondaponda (talk) 20:36, 28 October 2009 (UTC)
If you're asking why I haven't just submitted to your dogged attempts to remove the image, there are two reasons why not. The first is that I never like to submit to logic that’s as obviously faulty as yours has been. (Even if you think the copyright violation claim is going to be valid, I assume you’re aware that the earlier arguments I listed that you were previously using aren’t valid, since otherwise you wouldn’t have stopped using them.) And the second reason is that I spent several hours creating this image. When I’ve put a fair amount of work into something, I’m not going to be happy with it being removed for a reason that’s so obviously half-baked.
But answer the question in your own case now. There are probably thousands of images at Wikipedia that have problems much more significant than this one does. Why have you devoted the past two weeks, and half of this discussion page, to trying to remove this image in particular? --Captain Occam (talk) 20:51, 28 October 2009 (UTC)

(outdent) We can solve this problem by using C-S's simplified 9-cluster tree, for which we have the Fst Distance Matrix data, and for which we can create a useable, free image. The 9-cluster tree accounts for a total of 80 different trees, and is thus rather representative of C-S's work. It has the same groups as Occam's image, except that it splits Caucasians into European and Non-European Caucasoid. I think this is a realistic compromise given Muntuwandi's demands regarding the use of such an image. Shall I recreate the tree, then? --Aryaman (talk) 13:58, 29 October 2009 (UTC) P.S.: I could easily integrate the Distance Matrix Data into the image, say, on the bottom half, with the exact numbers (which would represent an improvement in the article as well). Actually, I'll just go ahead and do this and then post the image when I'm done. --Aryaman (talk) 14:08, 29 October 2009 (UTC)

Which image from Cavalli-Sforza is this? I’m not sure I know which one you mean based on your description, so it would be useful if you could link to it.
Muntuwandi’s newest problem that he’s brought up with the current image is the fact that it’s a “derivative work” from Cavalli-Sforza’s original chart, as you can see from his deletion nomination at Wikimedia commons. If you create a free image based on another of Cavalli-Sforza’s diagrams, it seems like the same problem would apply to that also. Of course, since Muntuwandi’s underlying problem here appears to be with my motives for editing this article, perhaps it would satisfy him for the image used by the article to be one that was uploaded by you rather than me. --Captain Occam (talk) 14:17, 29 October 2009 (UTC)
The copyright policy page specifically states that we can recreate a chart or graph from the original data. In this case, we have C-S's original Fst Data Matrix for the 9-cluster tree which has the same 8 groups on your image, with the exception of splitting European and Non-Europeans into 2 groups (as there is a 154.7 ± 29 distance between them). This chart is in the same book which has been linked to several times, and is on pg. 80 of that work. I am creating the tree along with the matrix, and the tree will be wholly original. I know that it's a pity to use the 9-cluster tree seeing as you've already done that very nice image with the 42 populations groups into 8 clusters, but this seems like a solid compromise. --Aryaman (talk) 14:33, 29 October 2009 (UTC)
The purpose of the image was to illustrate the genetic history of humans, ie the timing and migratory routes that humans took after the "out of Africa" migration. Secondly it is to illustrate the principle of isolation by distance, that is populations that live near each other are more similar genetically than populations that live further apart. How Cavalli-Sforza grouped the populations shouldn't be connected to race because the author said so. I believe that if Occam and Aryaman were truly interested in what CS wrote, then you would be editing human genetic variation and not this article. I am opposed to any dendogram that gives the impression that it is "The correct dendogram".
The problem with using CS's nine cluster model is that it is disconnected from the text. Much of the current text in the subsection Race_and_genetics#Genetic_distance is from Cavalli-Sforza's other book Genes Peoples and Languages and not History and Geography of Human Genes. History and Geography of Human Genes is a large book, over 1,000 pages, and is filled with a lot of technical detail. Genes Peoples and Languages (GPL) is based on the same information as History and Geography of Human Genes (HGHG), but it is more of a popular science book aimed at a general audience. Consequently it is more compact and less technical. The 42 populations and nine clusters in HGHG were reduced to the five continental populations in GPL. So the text in the article has been analyzed from the perspective of the 5 continental populations. If we are going to have a dendogram, then it should be consistent with the text. Wapondaponda (talk) 14:50, 29 October 2009 (UTC)
I don't see this as a problem at all, Muntuwandi. In fact, the data matrix I'm looking at gives a more detailed breakdown of the exact same data in the table currently in the article, which can only be seen as an improvement. It's not like C-S's two works contradict each other; they are entirely complementary. C-S states that the 9-cluster tree is in agreement with the majority (80) of trees, and the data it presents (especially when combined with the matrix) provide a very good illustration of exactly the things you mention. I'm not connecting any of this with "race", and the image won't, either, so no worries there. If that doesn't satisfy you, then I really don't understand your concerns. --Aryaman (talk) 15:02, 29 October 2009 (UTC)
I agree that a detailed breakdown is more informative, but is valuable at human genetic variation because the scope of human genetic variation is not restricted to race. This article is about race or the non-existence of race, and at some level the text and images should relate to race or the non-existence of race. I have mentioned below, that there is a need to merge some information from the two articles as there is a lot of overlap. Wapondaponda (talk) 15:12, 29 October 2009 (UTC)
Muntuwandi, while the article is about Race and Genetics, the section under discussion is about Genetic Distance. What may or may not happen to this section in the future is one thing. But seeing as half of this section as it currently stands discusses Cavalli-Sforza, providing a relevant image is both fully warranted and helpful to the reader. The article already has a chart presenting one (highly simplified) version of the distance matrix. Giving a better version of it seems like a very good idea to me. --Aryaman (talk) 15:24, 29 October 2009 (UTC)
Muntuwandi, at this point you’re just rehashing arguments you’ve made before. Since everything in your last comment has been refuted in detail earlier on this page, there’s no point in explaining this to you again.
VA, I approve of your earlier suggestion about making an image gallery, and including several of the images there. If the one you’re making will be what goes in the part of the article discussing Cavalli-Sforza’s study, the image gallery could contain my image along with the one that Alun posted.
Even if your image ends up being an improvement over mine because it includes the Fst data, I don’t think Muntuwandi’s argument about my image being a copyright violation is any good. If it were, this would apply to every image that’s ever been “redrawn” from someone else’s illustration, which is a pretty commonplace thing to see both in textbooks and at Wikipedia. So we shouldn’t need to listen to him about it being necessary to leave my image out of the article entirely. The opinion of a single editor doesn’t count for much, and he also can’t edit war with us over this, because the terms under which he was unbanned limit him to one revert per day. --Captain Occam (talk) 15:08, 29 October 2009 (UTC)
To summarize my arguments, the current text and genetic distance matrix is based on a 5 continent model. I therefore support an image based on the 5 continents. I have no prejudice against 7, 9, 42 or 51 population models. It is just that they have not been discussed or analyzed in the text. If they were, I would not oppose them either because regardless of which model is used the outcome is still the same ie genetic diversity decreases with distance from Africa and genetic distance is proportional to geographic distance. I prefer the 5 continent model because it is simple and easily accessible to a non technical audience. I would support a more complex analysis for the main article human genetic variation. Some links to Genes peoples and languages include
I propose using this book to be the foundation for the section in question because it is accessible to a general audience, as per the review, and the subject of race is discussed extensively. Wapondaponda (talk) 15:41, 29 October 2009 (UTC)
I’m not sure I understand what you’re suggesting. Are you saying that rather than just removing or replacing the image, you now think we should rewrite this entire section of the article? --Captain Occam (talk) 15:51, 29 October 2009 (UTC)
If an image is to be used I propose creating one based on the 5 continents. However if one wants to use an image based on 7 clusters (Li et al. 2008) or 9 clusters (Cavalli-Sforza), then it only makes sense to describe the clusters and the evolutionary relationships between them. Images improve articles. However dendograms are visual representations of data so an image without data and analysis cannot be adequately interpreted. Wapondaponda (talk) 16:11, 29 October 2009 (UTC)

(outdent) I'm not exactly sure what you mean, but let me finish this image and then let's see how it fits with the text. --Aryaman (talk) 16:13, 29 October 2009 (UTC)


Images second arbitrary break

I'll try to clarify. Images are typically used to support text. It is best practice to have an image that is directly related to its adjacent text. The nine cluster/42 population image is from one book, and the adjacent text is based on information from another book. Though the two books are by the same author and the material covered in the two books is related, they are different books. So far I have only seen interest in the image but no regard whatsoever to the image's relationship to the adjacent text. Up until now, I had not seen any interest in the underlying data that was used to generate the image. I hope you can understand why I have been concerned about this image, because I don't feel it is appropriate to use an image in a scientific article simply because it looks appealing. Wapondaponda (talk) 20:06, 29 October 2009 (UTC)

The image I have just uploaded is of 9 clusters based on the 42 populations discussed in the article. The numbers regarding genetic distance which are given in the article also appear in the table image, though they are in a more precise form and are supplemented by those which make up the remainder of the matrix. I have also given the individual node distances, which improves the overall value of the image itself. The information presented in the image is directly related to the information given in the text, though some minor improvements could easily be made to the text itself to make this more apparent. That "this is from a different book" is an unconvincing argument, as the authors have presented and discussed this data set several times (1988, 1994 and 1997 that I know of). Occam's image is directly related to the text of the article as it stands, though it lacks the distance matrix (which, strictly speaking, we really don't need). The new image I've posted is the summary linkage tree of the same 42 population group broken down into clusters (the same ones Occam's image has, but with Caucasoids being split into European and Non-European) with the attendant distance matrix. --Aryaman (talk) 21:27, 29 October 2009 (UTC)

Nice image, if this was a beauty contest, I would have no arguments. Here are some quotes from the text

A study by Cavalli-Sforza provides information on genetic distances between 42 native populations around the world, on the basis of 120 blood polymorphisms. For the purpose of simplicity, some admixed populations such as those of North Africa and West Asia were omitted from the analysis.

First contradiction: in the 5 continent model, admixed populations such as North Africans and South Asians are omitted, whereas in the 9 cluster/42 population model they are included as Non-European Caucasoid and South Asians etc. Does anyone care?

The largest genetic distance between any two continents is between Africa and Oceania at 24.7. Based on physical appearance this may be counterintuitive, since Australians and New Guineans resemble Africans with dark skin and sometimes frizzy hair. This resemblance is probably an example of convergent evolution. This large figure for genetic distance reflects the relatively long isolation of Australia and New Guinea since the end of the last glacial maximum when the continent was further isolated from mainland Asia due to rising sea levels.

The next largest genetic distance is between Africa and the Americas at 22.6%. This is expected since the longest geographic distance by land is between Africa and South America. The shortest genetic distance at 8.9% is between Asia and the Americas indicating a more recent separation.

Africa is the most divergent continent, with all other groups being more related to each other than to Sub-Saharan Africans. This is expected in accordance with the Recent single-origin hypothesis. The population most closely related to Africans are Europeans. However, this short distance indicates significant interaction and gene exchange between Africa and Europe in the not so distant past. Europe has a genetic variation in general about three times less than that of other continents. Even though Europeans are the non-African group closest to Africans, Europeans are most closely related to East Asians.

I have highlighted the text in bold that specifically refers to the 5 continent model. As you can see, we are mixing apples and oranges when we try to mix both models, which results in contradictory statements, see WP:SYNTH. My suggestion, we should stick to only one population model. The text, the image and the data table should be based on a single model. Wapondaponda (talk) 05:06, 30 October 2009 (UTC)


Are you saying this is over whether the data is presented as "24.7" or "2472.0 ± 536", as "22.6" or "2261.4 ± 434"? As I mentioned, the text could easily be tweaked to reflect the data as presented in the image. Or if that's not your point, what then? --Aryaman (talk) 10:20, 30 October 2009 (UTC)
That is my point. If you would like to use the image and its corresponding Fst matrix, then we might as well rewrite the whole text and remove all references to the previous Fst matrix; of course all info should be sourced from History and Geography of Human Genes, per WP:NOR. You have selected information that is similar: the genetic distance between Africa and Oceania is the same in both matrices. However, in the first matrix Europe has the shortest genetic distance to Africa, whereas in the second matrix North African or Non-European Caucasoid have the shortest distance. In the first matrix, the shortest genetic distance is between Asia and the Americas, whereas I have seen values much smaller in the second matrix. As a compromise, I suggest keeping the five population model intact, and if you like you can write up text for the 9 cluster/42 population model, and this can go next to the 5 continent analysis. Wapondaponda (talk) 14:58, 30 October 2009 (UTC)
Well, I thought it was clear that by including this matrix in the image, the old table matrix becomes superfluous. The data set is better all around, and it summarizes the trees of 80 out of 120 polymorphisms, which is a respectable majority. Please explain why you think the "5 continent model" is superior. How I see it, anyone reading this article is going to recognize these 9 clusters and know where they are generally located in the world. By the way, I don't see any problem in using both sources. Of course, we can't use data from two or more books to draw new and original conclusions, but we can certainly reference more than one work in this section. Don't you agree? --Aryaman (talk) 15:56, 30 October 2009 (UTC)
I wouldn't say the data is better, it is the same data, there is just more information with 9 clusters than with five. The reason why Cavalli-Sforza later opted to use 5 clusters rather than 9 was for clarity and simplicity. It appears that 5 population model was published after the 9 population model. The 5 populations selected are separated by large distances and therefore the patterns in genetic variation are clearer. The 5 population model also implicitly accounts for the 9 population model because all populations that are geographically intermediate are also genetically intermediate as well. In general Cavalli-Sforza's publications are quite old, however I still support their inclusion only because the content is much more accessible to a general audience than other publications. We should bear in mind that there are a lot of new studies that deal with human genetic variation that may be more accurate. Wapondaponda (talk) 16:57, 30 October 2009 (UTC)
When I wrote "better", I meant that the data, both as a set and individually, is more precise, giving the margin of error for each figure. But I think we agree on that point. I had a different interpretation of the 5 continent model, the main reason being that it allows statements specifically regarding "continents" to be made, shifting the focus away from "populations". (Comparing "Africa" and "Asia" seems somehow less likely to upset those who are only half-listening than comparing "Africans" and "Asians".) But that's merely my own conjecture. Whether it gives a clearer result is a matter of interpretation, I'd say. I think we can both agree that the 42 cluster tree could be somewhat confusing to the average reader - especially if it appears without some kind of grouping (as done by Occam through the use of color). I really do think the 5 cluster model takes things down one notch too far. I mean, the data could also be boiled down to an African/Non-African dichotomy or any other such dichotomy (and Cavalli-Sforza does this, too), but then it would be perfectly clear that the reduction has gone too far. Finding a balance is necessary. It is my opinion that the 9 cluster model is the best one for this article. I emphasize "opinion", because there is no way to determine this objectively. Making appeals to whether one or the other may suggest support for "racist" ideology is something which really needs to be put to rest. Let's face it: people decide whether they want to be racist or not long before they turn to science looking for "proof" for their POV. In my opinion, this really is a question of editorial preference. --Aryaman (talk) 17:51, 30 October 2009 (UTC)
Well, to an outsider, it may seem simply to be an issue of "editorial preference". The five continent model has been on wikipedia for over two years; I am surprised that it has lasted this long. Though it is simplistic, I have seen some wikipedians have trouble with interpreting it [2]. Going through the most recent studies, it is clear that patterns of genetic diversity are more complex than the 5 continent model, nonetheless it is still fundamentally sound. Most notably, the region around the Indian subcontinent is second to Africa in genetic diversity. The problem with the 9 continent model is that the data is not discussed in significant detail in the book, whereas the 5 continent model is discussed extensively (see [3]). If you are able to find enough analysis in the book, then it would be worth considering. So far, I haven't seen much. Wapondaponda (talk) 21:33, 30 October 2009 (UTC)
The link you provided is interesting. I think that editor, and perhaps other random visitors as well, would find the table more informative if it were to appear along with a linkage tree as in the new image I created. One additional factor to take into consideration is the reception of Cavalli-Sforza's work and how it is used in the wider discussion. As noted before, Jensen uses both the 42 cluster as well as the 9 cluster model (for example, in his 2004 interview with Miele). Other authors mix their terminology when referring to the work of Cavalli-Sforza, referring to both populations and geographic locations in one breath (Barbujani & Excoffier, 1999:31). Personally, I don't find this kind of use problematic; I think people are smart enough to figure out that "the Americas" is the geographic location coordinated with the "Amerind" or "Amerindian" population, and vice versa. I find the 9 cluster tree superior because a population such as "Non-European Caucasoid" is spread over a large geographic area, including North Africa, West Asia, Central Asia and the Indian subcontinent. This can be confusing in the 5 continent model. --Aryaman (talk) 18:01, 31 October 2009 (UTC)

Muntuwandi, why have you removed the image again? We're in the middle of discussing this, and I thought we were making progress towards some kind of agreement. Can you please explain your edits, particularly this one? --Aryaman (talk) 07:28, 2 November 2009 (UTC)

I've just undone his changes. If he wants to make this kind of change to the article, he'll need to justify it here first.
I think it's worth pointing out that I've brought up this behavior from him on his userpage twice, here and here. He's recently violated two of the conditions under which he was unbanned, and my bringing this up with him there doesn't appear to have helped anything, so I'd recommend going to AN/I about this behavior if it continues. --Captain Occam (talk) 10:32, 2 November 2009 (UTC)
I believe we have a tentative agreement that if you want to include information on the 9 population model, then you can do so provided that we do not mix up information from the two models. I believe that we agreed that you or Occam should write up text on the 9/42 population model that is consistent with the source, History and Geography of Human Genes. I removed the 42 population image because there is no discussion of the image in the reference, Genes, Peoples and Languages, which deals with the 5 population model. As a result the image remains "unsourced". I understand that I have a quite different opinion or philosophy on race related subjects than Aryaman or Occam. But this shouldn't be a reason to revert good faith and non-controversial edits such as what Occam has done. We should not be reverting out of spite. So I propose that we restore the 5 population model that I had contributed to, you or Occam can write up something on the 9/42 population model, and we can include both models separately as they are based on different sources. Other members of the community can then decide if they only want one of the models or if they want both. I believe that is a fair compromise.

I had done some reorganizing to the article, firstly I removed the discussion on Lewontin's fallacy from human genetic variation because it is mentioned again in the clustering controversy. I added some information from these three sources.

  • Handley; et al. (2004). "Going the distance: human population genetics in a clinal world" (PDF).
  • Jorde; et al. (2004). "Genetic variation, classification and race" (PDF).
  • Tishkoff (2004). "Implications of biogeography of human populations for 'race' and medicine" (PDF).
  • Ramachandran; et al. (2005). "Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa". doi:10.1073/pnas.0507611102.

Jorde et al. is cited by about 156 publications, Tishkoff by 144, and Ramachandran by 150. So these articles are reliable, "mainstream" and uncontroversial. Wapondaponda (talk) 11:58, 2 November 2009 (UTC)

Muntuwandi, the 9-cluster model was suggested as a compromise between the 42 population model and the 5 continent model. If you doubt the value of the 9-cluster model, please review Cavalli-Sforza's evaluation:

In conclusion, there has probably been enough intermingling of the clusters that a network representation (i.e. a tree with interconnections between branches) would be highly desirable. But the tree in figure 2.3.3 [see 9-cluster image] is probably the best result that can be obtained using present methods, that is, a phylogenetic tree without interconnections. [...] Considering that the nine large clusters we have used represent large, geographically contiguous regions (but separated according to natural boundaries that tend to create some discontinuity), it may be almost surprising that the tree we obtained is reasonably reproducible in different bootstrap samples.

I am certain that either Occam or I could rewrite the section to reflect the data set in the 9-cluster image and still make clear statements regarding continents without resorting to WP:OR or WP:SYNTH. But I don't see the point in having both side-by-side. While the 5 continent model may be superior for demonstrating the origin of mutations and their subsequent spread through populations, the 9-cluster model is decidedly superior in demonstrating the phylogenetic relationships between populations. As the text also refers to the Recent African origin of modern humans, I find it logical to use a population-based model as opposed to a continent-based model. --Aryaman (talk) 15:23, 2 November 2009 (UTC)
We have a content dispute here, so my proposal is simply to include both models separately. They do not contradict each other if sourced correctly. The five population model has been in this article and genetic history of Europe for more than two years. I can therefore say that there is a reasonable consensus on wikipedia in support of it. If you would like it to be replaced, the burden of proof is on you to prove that the community will prefer the material you suggest. As you or Occam have not yet written anything, I will continue to polish up the 5 continent model and ensure that all the material in the relevant sections is properly referenced and consistent with the sources. I will remove material that is not consistent with the sources, but this is not anything permanent, maybe you or Occam will re-add it when you write up the 9 cluster model. Wapondaponda (talk) 16:15, 2 November 2009 (UTC)
Muntuwandi, these edits weren’t “uncontroversial”. You were going against VA’s suggestions with your edits, while not even attempting to address them. Whether any of these edits were justified or not can be discussed, but it’s never “uncontroversial” to make an edit that goes against the opinion of every other editor who’s currently working on an article.
Your newest reason for removing the image also doesn’t make sense. You said, “I removed the 42 population image because there is no discussion of the image in the reference, Genes, Peoples and Languages which deals with the 5 population model. As a result the image remains "unsourced".” The image is cited to The History and Geography of Human Genes, and nobody other than you has been of the opinion that this section of the article needs to be restricted to material that’s included in Genes, Peoples and Languages. As VA pointed out, the 42 population model and the 9 cluster model are also used by several other authors, which makes them a little more notable than some of the alternatives.
If you wish for the article to include some form of the five-population model, what would you think of VA’s earlier suggestion to make an image gallery for this article, and include a visual representation of this model there? I wouldn’t have a problem with that idea, and I think VA probably wouldn’t either, since he’s the one who suggested it originally. --Captain Occam (talk) 17:05, 2 November 2009 (UTC)
Yes, the only controversial edit was removing your image; the other text has not been contested and is fairly uncontroversial. So to restore your image, you have reverted other material without giving it any consideration. All the material in the section on genetic distance is based on Genes, Peoples and Languages. Midway you have inserted an image and some text referring to History and Geography of Human Genes, with no reference, which gives the impression that the sentence and the image are from Genes, Peoples and Languages. This oversight on your part creates significant inconsistencies, as I have already pointed out. The 5 population model omits admixed populations such as North Africans, whereas the line of text you inserted refers to the 42 population model, which includes admixed populations such as North Africans. Naturally I don't expect you to care about such, since the most important thing for you is to show an image that you believe proves the existence of biological races. I have suggested for the umpteenth time that, to be consistent, you or Aryaman should write up some information on your 42 population model that is consistent with the image and even a genetic distance matrix. We should not mix sources up as that violates WP:SYNTH. Simply leaving your image hanging and disconnected from the text is a hallmark of POV pushing. Wapondaponda (talk) 17:58, 2 November 2009 (UTC)

removal of jensen

Regarding this undo by me. Who says these groups are "genetically defined"? And what does "genetically defined" mean in this context? We're an encyclopaedia, right? So we shouldn't bandy about words or phrases that are opaque. I am actually a geneticist by profession and education and as far as I know a genetically defined population would be one where the whole genome of each member of the population would be known. It might also mean something like an inbred mouse strain like this, where we know that each member of the strain will certainly carry a specific copy of a specific allele, e.g. see Nude mouse. That is not the case here.

I've also just noticed that the sentence immediately prior to this claims

"This study found that the populations examined by Cavalli-Sforza were clustered into several larger groups, each of whose populations are genetically closer to one another than they are to any populations outside of the group."

Who says this? In fact clustering does not mean that each member of the group is always closer to another member of the group genetically than to a member of another group. Clustering means that each member of a group is genetically closer to the typical genotype for this group than it is to the typical genotype of another group. Witherspoon et al. (2007) "Genetic similarities within and between populations" Genetics 176: 351–359 doi:10.1534/genetics.106.067355 explain how it is possible to accurately classify people into groups even when they can still be more genetically similar to someone outside of their group. This sort of bold claim requires a quote I think. If we're going to include it, then we also need to give Witherspoon's conclusions because they directly contradict this claim. I'm removing it for the time being. Alun (talk) 06:21, 23 October 2009 (UTC)
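
The distinction drawn above, between being closest to a group's typical genotype and being closest to every member of that group, can be illustrated with a toy sketch. The one-dimensional "genotype scores", the group labels and the helper names below are invented purely for illustration; they are not taken from Witherspoon et al. or from Cavalli-Sforza's data, and real analyses use many loci rather than a single number.

```python
from statistics import mean

# Hypothetical one-dimensional "genotype scores" for two groups.
# These numbers are made up for illustration only.
groups = {
    "A": [0.0, 1.0, 2.0, 6.0],
    "B": [7.0, 11.0, 12.0, 13.0],
}

def nearest_centroid(x, groups):
    """Return the group whose mean (the 'typical genotype') is closest to x."""
    centroids = {name: mean(vals) for name, vals in groups.items()}
    return min(centroids, key=lambda name: abs(x - centroids[name]))

def nearest_individual(x, own, groups):
    """Return the group of the single individual closest to x, excluding x itself."""
    candidates = []
    for name, vals in groups.items():
        pool = list(vals)
        if name == own:
            pool.remove(x)  # do not compare the individual with itself
        candidates.extend((abs(x - v), name) for v in pool)
    return min(candidates)[1]

x = 6.0  # a member of group A
assigned_group = nearest_centroid(x, groups)
closest_person_group = nearest_individual(x, "A", groups)
print(assigned_group, closest_person_group)
```

In this toy data the individual with score 6.0 is assigned to group A by the centroid rule, yet the single most similar individual belongs to group B. That is the situation described above: classification into groups can be accurate even when a person is more similar to someone outside their group.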

Alun, for some reason, your cladogram doesn't show. :^< --Ramdrake (talk) 23:27, 23 October 2009 (UTC)
Sometimes .svg images take a while to display properly from the commons. I don't know why that is. Try clicking on the image, and then clicking on it again from its description page. For me the image displays in my browser OK, just not embedded in Wikipedia. Usually this sorts itself out after a few hours. Alun (talk) 04:20, 24 October 2009 (UTC)

Proposal to Shorten the Article

This article strikes me as too long and unfocused. Does anyone have opinions on that? I am planning to delete a bunch of stuff. (Needless to say, others are free to revert those deletions.) As a concrete example, do we really need a section on genetic drift? I don't think so. We could mention the words "genetic drift" and provide a link to the Wikipedia article on that topic. David.Kane (talk) 17:59, 25 October 2009 (UTC)

Can you list some other sections you plan to delete? Also, to tighten up an article, it is recommended to summarize sections rather than delete them, unless they are totally irrelevant to the article.--Ramdrake (talk) 19:06, 25 October 2009 (UTC)

I would, for example, replace the section on genetic drift with the two words genetic drift placed somewhere sensible. I would do the same with "Recent admixture" and "Founder effect." Again, none of this is wrong or even poorly done. I just think that the article is too long. David.Kane (talk) 00:26, 26 October 2009 (UTC)
One suggestion is to merge some material into human genetic variation as the two articles overlap significantly. Wapondaponda (talk) 01:12, 26 October 2009 (UTC)
Agreed. The same may be true for other articles. I will give everyone another day or two to chime in before I start editing. David.Kane (talk) 12:36, 27 October 2009 (UTC)
I think we should come to an agreement first on what the scope of the article race and genetics should be. It is a difficult issue because in some circles, race is a social construct and not a genetic one, therefore there would be no genetics to discuss in race and genetics. Furthermore there has been little, if any, direct scientific investigation in the "genetics of race". By this argument, we would simply redirect this entire article to human genetic variation and merge any useful information. Race could be a subsection in human genetic variation. I think establishing the scope of the article will prevent the haphazard addition and subtraction of material. Another related article is human genetic clustering. The contents of all three articles currently overlap significantly. Wapondaponda (talk) 16:10, 27 October 2009 (UTC)
I will not be taking a stand on the social versus genetic construct issue. My deletions will come from both "sides." If you want to have a discussion on the "scope of the article," then I am happy to participate. But that discussion should not prevent change to the article in the meantime. Any changes can always be reverted. David.Kane (talk) 21:13, 27 October 2009 (UTC)

I am beginning this process now. David.Kane (talk) 18:11, 1 November 2009 (UTC)

I am largely finished. Main goals were to both trim content and re-organize the material. I think that this makes room for other more relevant material to be added. Thanks to all the suggestions above. David.Kane (talk) 21:46, 4 November 2009 (UTC)

Going forward

As I have previously mentioned, I intend to add information from the following publications regarding race and global patterns of human genetic variation

If there are any objections to using the information from these articles, please state so. Since nobody has yet written anything from History and Geography of Human Genes, I may include some information myself. If there are any objections, please state so. Wapondaponda (talk) 08:20, 3 November 2009 (UTC)

At least some progress on the text, though I still have some concerns. Firstly Arya has placed the Fst matrix within the image, because of space the numbers are barely legible. Secondly the image is quite large about 123kb, not a big deal for high speed connections, but many wikipedians have slow internet connections, so the page takes forever to load because of the image. Cutting out the distance matrix from the image may go a long way towards solving both those problems. Text loads faster than images. There is still an apples and oranges situation because the text in geographic analysis is not directly related to the matrix or the image. I intend to clarify this discrepancy so that there is no confusion. Wapondaponda (talk) 18:16, 3 November 2009 (UTC)

I agree with your comments on the distance matrix in the image. In fact, it was only included as a compromise, and I am certainly willing to cut it out. At the same time, I think the geographic data matrix might also benefit from being in image form, as there is a formatting error when it is aligned to the right of the page (for some reason there is no padding between the text and the table). As you likely noticed, I also included a sub-section on linguistic analysis, as this is also a major feature of Cavalli-Sforza's work. Besides that, it's quite interesting in itself. Many improvements can be made here, however, and the current text is nothing more than a rough draft. --Aryaman (talk) 18:23, 3 November 2009 (UTC)

I have cropped the image and reduced the resolution and uploaded a derivative work. It is down from 123kb to 12kb. We can use it or any other version that is acceptable. I have never quite figured out how to align wikitables. Concordance with linguistics is indeed important but is very general as there are numerous exceptions. I disagree with Ethiopians being in the category of exceptions, as more recent studies have demonstrated. Wapondaponda (talk) 19:02, 3 November 2009 (UTC)

Could you please leave the image a bit bigger? Your last version makes it almost impossible to read the node distances. As long as those remain legible, I don't care how big or small it is.
If you have more recent data on Ethiopians, by all means, add it. As I know the linguistic categorization has not changed, I'd be interested in knowing what current research says about their genetic association. --Aryaman (talk) 19:34, 3 November 2009 (UTC)

Kudos to you for adding information from these articles! I am finishing up my efforts to trim less relevant material, the better to provide room for this much more important information. David.Kane (talk) 22:01, 3 November 2009 (UTC)

Yes, linguistically Ethiopians speak Afroasiatic languages. In his book Cavalli-Sforza states that Afroasiatic languages are predominantly spoken by "caucasoid" peoples and hence he believes that this is inconsistent with Ethiopians being African. However, the contemporary theories regarding the origin of Afroasiatic languages indicate that Afroasiatic is most likely an African language family that was adopted by caucasoid populations, rather than the other way around. Though most Afroasiatic speakers are considered "caucasoid" (Arabic is the most spoken Afroasiatic language with about 300 million people in North Africa and the Middle East), the majority of distinct Afroasiatic languages are found in East Africa, indicating that it is the most likely source of the language family. A similar situation is found with English, where the majority of English language speakers are found in North America, over 300 million, but English's original homeland is located in the regions that were populated by Germanic tribes in Northern Europe, where there are a lot fewer English speakers today than in North America, but a greater diversity of Germanic languages. In short there is a linguistic correlation between Ethiopians and their continent of origin, as the leading hypotheses consider Afroasiatic to be African in origin. Cavalli-Sforza's work is quite dated now, and he may not have had access to some of the latest publications concerning Afroasiatic. Wapondaponda (talk) 22:29, 9 November 2009 (UTC)

Scope of article

Defining the scope of this article is quite challenging because there isn't yet an established discipline that is dedicated to studying the genetics of race at the genomic level. However race, or biogeographical ancestry, is an important element in many biomedical and genetics studies. One could make an argument that because there isn't an established discipline of race and genetics, then the whole article is original research and should be deleted or merged into human genetic variation. When one thinks of race and genetics, a few things spring to mind.

  • Firstly race is socially constructed around certain traits that can be discerned visually. As a result race and genetics could discuss the genetics of traits such as skin color, hair texture, eye shape etc. Unfortunately what is known about these traits comes from observing phenotypes from generation to generation, not much is known about the actual genes.
  • The next issue that comes to mind is whether variation in physical appearance corresponds with invisible genetic variation. I suppose the answer is yes and no. Yes because isolation by distance means that people who live near each other will be genetically more similar than those who live further apart. As a result people who live near each other will look more similar. On the other hand, non-concordance means that people can still look different but share some of the invisible genetic ancestry. Genes that are beneficial can quickly spread over long distances, for example the sickle cell trait and haplogroup E both spread rapidly from sub-saharan Africa into Europe. The sickle cell trait conferred resistance to malaria to Mediterranean Europeans, and haplogroup E spread from Africa to Europe with the emergence of agriculture. As a result both light skinned Europeans and dark Africans share some common ancestry that would not be known without genomics (Blond blue-eyed brit with DNA of an African). Overall the genetic profile of an individual will be concordant with socially constructed races, but is more concordant with geography and isolation by distance than it is with race.
  • The most contentious issue is whether variation in external appearance is strongly concordant with certain behavioral traits, most notably intelligence or aggression (physiognomy, phrenology). The genetic basis of such traits has yet to be determined, so any hypotheses linking these traits, race and genes are still circumstantial.
  • The subject of human genetic variation is useful in creating a foundation for discussions of race and genetics, but the topic is too broad to have an extensive discussion in this article.
  • Certain regional topics are quite interesting, and can be considered as candidates for inclusion in the article. In the Americas, especially in the ethnic melting pots of Latin America, the distinction between socially constructed races and genetics breaks down due to extensive admixture. In the US because of the one drop rule, it is possible for someone to be genetically predominantly European but socially constructed to be African American. In India, there is the subject of "caste and genetics".

Wapondaponda (talk) 15:37, 5 November 2009 (UTC)

Genes for different appearance may be different from genes for invisible differences and can become unlinked over generations, so that the latter may be able to diffuse even when social segregation by appearance is in effect. --JWB (talk) 17:53, 5 November 2009 (UTC)

Indeed that is true. However two issues. Firstly in the case of the blonde brit with an African y-chromosome, the y-chromosome does not recombine so unlinkage does not apply to the y-chromosome, but would apply to other nuclear genes. The y-chromosome is passed down almost intact(NRY), so that the Briton has virtually the same y-chromosome as his African ancestor. But since he doesn't look African, it tells us that the genes that code for the physical traits associated with race aren't located on the y-chromosome. Secondly we cannot assume that social segregation, as practiced in contemporary times, has always been the norm. For instance, the Africa y-chromosome haplogroup E is frequent in caucasoid populations in North Africa, the Mediterranean and the Balkans. Since this was male mediated, ie African males mating with Eurasian females, it implies the absence of social barriers for African males. When we see admixture profiles in the Americas, we see the reverse in which case many socially constructed, or self identified whites, have African or Native American maternal lineages but not paternal, indicating female mediated gene flow and male social segregation. Wapondaponda (talk) 19:06, 5 November 2009 (UTC)
That is right - very little of anything besides sex determination is on the Y chromosome, so any other African genes were not linked at all to the African Y chromosome, and the British carriers should have only a fraction of about 2^(-generations) of other African genes, which is likely to be zero by now. The same applies to those E sub-haplogroups of the Y chromosome that have been traced from Africa to Europe - most of the chain of transmission is likely to have been through males who looked little more African, or not at all more African, than the surrounding population of their time and place. --JWB (talk) 19:25, 5 November 2009 (UTC)
Yes, the Y-chromosome is basically an X chromosome without one leg due to shrinkage. However, Underhill et al. 2007 report that genes linked to infertility may still reside on the Y-chromosome. I would say a small amount, but not zero, African DNA, because nuclear scans still detect African DNA in European populations. In some cases, if nuclear markers are beneficial, they may even become more frequent than the accompanying Y or mtDNA markers. In certain regions of Sicily, the sickle cell trait has frequencies as high as 13%[4]. There are likely to be other beneficial genes that have yet to be identified. In general, genetic distances between Europe and Africa are shorter than between Africa and other continents, indicating that some African autosomal markers persist in the European population. Yes, at some stage haplogroup E made the jump from a dark-skinned population into a caucasoid population; where, how many times and exactly when that took place is an interesting topic but has yet to be determined. According to Brace et al. 2005, the inhabitants of the early Neolithic Natufian culture from Israel were apparently still sub-Saharan-looking. Wapondaponda (talk) 04:23, 6 November 2009 (UTC)
Yes, of course autosomal genes subject to selection can spread while carrying little with them, even in the short term, apart from genes located close by on the same chromosome. Cavalli-Sforza's "120 classical polymorphisms" are likely to be disease-related genes like these, and therefore to correlate with proximity, climate, and history of disease and populations rather than with earlier ancestral population origin. --JWB (talk) 18:46, 6 November 2009 (UTC)
One solution to thinking about the scope is to note that there are Wikipedia articles about race and (human) genetics already. This article should cover the intersection of those two articles. In reorganizing the material over the last week, this is what I have aimed at. In other words, if topic X is not covered in --- or would not be plausibly added to --- either Race_(classification_of_human_beings) or human genetic variation, then it does not belong here. David.Kane (talk) 18:21, 5 November 2009 (UTC)
That makes sense; however, Wikipedia articles shouldn't necessarily be based on other Wikipedia articles, but more so on reliable sources. Wapondaponda (talk) 19:06, 5 November 2009 (UTC)
Agreed. I was talking about scope rather than sourcing. David.Kane (talk) 13:36, 6 November 2009 (UTC)

(outdent) At present, I don't have many complaints about the veracity of the current content. However, the article looks very much like human genetic variation, not an article about race. In the quest to make a sober article, all the racial material has been removed. Race is in many ways a social construct, and that aspect is missing from the current article, which is why it looks like human genetic variation. Social constructs are typically created based on a few visible traits and also on stereotypes. Therefore issues about race need not be rational or even scientific, as most scientists have abandoned traditional "platonic" views about race. This old version has some of the stereotypical and unscientific views on race. Much of the material has been pared down, as some of it may have been inappropriate or politically incorrect. Nonetheless, I think there is some useful information that is more directly related to race and genetics than the current content. A lot of the current material focusses on human genetic clustering, but this is not what average Joes think about when dealing with R and G. For instance the following articles deal with some of the common man's perceptions of race and genetics,

Another issue, which I have raised earlier, regards some of the regional peculiarities of race. For example it is common in Latin America for individuals who identify themselves as white to have African or Native American ancestry.

Uniparental genetic markers in select samples of self-identified white individuals from the middle-class population in some Latin American countries[5]
Country | Amerindian mtDNA | Amerindian Y-chromosome | African mtDNA | African Y-chromosome
Brazil | 33% | 0% | 29% | 2%
Argentina | 45% | 9% | ns | ns
Chile | 84% | 22% | ns | ns
Colombia | 90% | 1% | 8% | 5%
Costa Rica | 83% | 6% | ns | 7%

This illustrates that the social construction of race is not concordant with genetics. I am not willing to make any changes yet, as I feel that there is a need to settle the philosophical argument about what this article should contain. Right now I feel it is heavy on human genetic variation, but light on the social construction of race. Wapondaponda (talk) 19:32, 9 November 2009 (UTC)

My impression has been that this article should discuss information relevant to the intersection of "genetics" and "race". I don't see how introducing discussion of the area where they do not intersect (e.g. discussing components of the popular conception of "race" that have absolutely nothing to do with genetics) is going to improve the article. Unless that is not what you are proposing? --Aryaman (talk) 21:21, 9 November 2009 (UTC)
Considering that the mainstream view is that race (the classical concept thereof) and genetics in fact do not really intersect, I see how Muntuwandi's proposal makes perfect sense.--Ramdrake (talk) 21:27, 9 November 2009 (UTC)
No one is arguing otherwise. But if the article were to discuss all those components of the popular conception of race which have no corresponding concept in genetics, then we would basically have an article criticizing the popular conception of race as it appears in racism. That would be fine to discuss at Racism, but we don't need to repeat all of that here. Of course, we could push Ramdrake's point further and say that, since there is so much unscientific flotsam in the popular conception of race, we should actually just delete the article, as it is inherently unscientific. However, I don't see that as a very good option, either. There is some overlap between the two, and this article seems the best place to discuss the findings relevant to that overlapping portion. --Aryaman (talk) 21:53, 9 November 2009 (UTC)
We don't need to discuss "all" components of the popular conception of race, but we need to discuss "some", to explain the mainstream view on race and genetics. And yes, there might be some overlap with Racism. We can address that in due time.--Ramdrake (talk) 21:59, 9 November 2009 (UTC)
Well, what would you suggest, then? I think one of the primary components of the popular conception which has been debunked by genetic research is the notion of "racial purity", i.e. clearly delineated, genetically disparate racial groups. If that's the kind of thing you and Muntuwandi would like to include, I don't see a problem with it. Could you draw up a list of such things for discussion? --Aryaman (talk) 22:20, 9 November 2009 (UTC)
I have informally listed some of the topics that I think are more directly related to race. The above table and corresponding article discuss some of the peculiarities of how race is socially constructed. In the United States, some African Americans may have 70% or more European admixture but will identify themselves as black or African American because of the one-drop rule, whereas in Latin America, someone with 30% African or Native American ancestry may be identified as white. Many Native Americans have significant European ancestry. There was a recent controversy regarding genetic testing as a means of verifying tribal affiliation, because it was argued that tribal affiliation is more a cultural than a genetic issue[6]. In this case, being Native American may be more of a social construct than a genetic one.
With regard to the stereotypes of Black athletes: even Entine states that while individuals of West African descent are disproportionately represented in sprint-related sports, they are non-existent in long-distance sports, whereas East Africans dominate long-distance sports but are almost non-existent in sprint sports. Moreover, it is not all East Africans, but just a few high-altitude tribes such as the Kalenjin, who dominate marathons. Furthermore, other Native American groups that come from high altitudes also do well in long-distance sports. So it seems that geography or biogeography corresponds better with certain stereotypes than race does, and this becomes more apparent when actual data is analyzed.
I believe we all agree that many consider race a social construct, and if this article is to be about the intersection of race and genetics, then we must weave into it the social construction aspect. Wapondaponda (talk) 22:59, 9 November 2009 (UTC)
Stating that, in those parts of the world where there has been extensive admixture, the popular conception of race has adapted accordingly seems almost obvious enough to be non-notable, but if you can find a reliable source (I'm sure you can) which makes this kind of claim, then present it.
Wasn't Entine's book criticized for the exact same reason people criticize Jensen's work? I have to admit, I have not done any research into the connection between race and athletics, but a brief search produces a surprising amount of information. --Aryaman (talk) 23:15, 9 November 2009 (UTC)

"Considering that the mainstream view is that race (the classical concept thereof) and genetics in fact do not really intersect." Well, I guess that depends on what you mean by the classical concept. If you give 100 people ten pictures of random people and ask them to classify the individuals by race, there will be a non-zero correlation between the classifications they pick and the genetics. It won't be perfect, obviously, but it will be much better than random. At the same time, I agree that social classification is important and that it is discussed in the main race article. So, discussion of it here would be appropriate as well. David.Kane (talk) 01:35, 10 November 2009 (UTC)