Saturday, May 12, 2007

JCVI Evolutionary Genomics Journal Club on Liu-Ochman


This week I was the presenter for our "Evolutionary Genomics" journal club at JCVI and I chose to present the Liu-Ochman paper on the stepwise formation of the bacterial flagellum and its reception (part two, three) in the blogosphere. The basic claim of the article is that the 24 core proteins are homologous to each other and this explains how the flagellum evolved through repeated gene duplication events.

Nick Matzke doesn't buy the argument for several reasons: 1) it doesn't seem to be congruent with previous studies; 2) at least two of the structures of the presumed homologous proteins don't "look" homologous; and 3) the authors seem to be using the Bl2Seq tool incorrectly. It's odd that Nick focuses so much effort on points 1) and 2) because it is really 3) that is the issue. Personally, I've never been convinced that protein structure is of much use in inferring homology or the lack of it; systematists have been burned so many times by incorrectly assumed (non)homology of gross morphological traits in light of convergent and divergent evolution; why should morphology at the protein level be any different? The beauty of molecular systematics is that it's freed us from having to deal with morphology at all.

At the journal club, we were split on the value of structure for inferring homology, but we all agreed that eyeballing structures, particularly structures that seem to have been drawn with different programs and rotated differently, was not a very convincing argument.

We were much more convinced, however, by Nick's demonstration that the authors seem to have performed their BLAST matches incorrectly. As Nick showed, the authors did not have sequence filters enabled, which means that matches to low-complexity regions can artificially inflate BLAST significance, and perhaps more damning, the authors used multiple pairwise BL2Seq runs without correcting for the true size of the search space. And these weren't just assumed to be the problem; Nick demonstrated on a subset of the data that using the correct parameters caused several "significant" BLAST matches to disappear.
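To make the search-space point concrete, here is a minimal sketch of how the expected number of chance hits grows when many pairwise comparisons are run. It uses the standard Karlin-Altschul relation between a bit score and an E-value, E = m * n * 2^(-S'). The bit score, the sequence lengths, and the count of 276 comparisons (all unordered pairs among 24 proteins) are hypothetical numbers chosen for illustration, not values from the paper or from Nick's re-analysis, and real BLAST also applies edge-effect corrections to the lengths that are ignored here.

```python
import math


def evalue_from_bits(bit_score, query_len, subject_len):
    """Karlin-Altschul expectation E = m * n * 2**(-S') for a bit score S'."""
    return query_len * subject_len * 2.0 ** (-bit_score)


# Hypothetical values, for illustration only.
bit_score = 22.0      # a weak pairwise hit
query_len = 300       # residues in the query protein (m)
subject_len = 300     # residues in the single subject protein (n)
n_comparisons = 276   # all unordered pairs among 24 proteins: 24 * 23 / 2

# E-value as reported when the search space is just one pair of sequences.
single_pair = evalue_from_bits(bit_score, query_len, subject_len)

# Bonferroni-style correction: scale by the number of comparisons actually run.
corrected = single_pair * n_comparisons

print(f"single pairwise run: E = {single_pair:.3f}")
print(f"corrected for {n_comparisons} runs: E = {corrected:.2f}")
```

With these made-up numbers the hit looks significant at the usual 0.05 cutoff when judged as a single pairwise run, but is expected several times by chance once all the comparisons are counted. That is the general shape of the problem described above, not a reconstruction of the actual analysis.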

This was the introduction to scientific blogging for many of the attendees of the journal club, and they also had some interesting comments about the phenomenon after preparing for the club.

1) While the paper was much mentioned in scientific blogs, generally the mentions were just "Nick Matzke has shown that the Liu-Ochman paper is flawed; here's the link", and not independent analyses of the paper. Yes, this is true of blogs in general, not just scientific ones. But this sort of laziness is very common in traditional media too. Take a look at your newspaper and see how many articles are from news services like AP or Reuters rather than being independent reporting.

2) Why did discussion suddenly fall off after Nick's articles? Does the blogosphere really have such a short attention span?

3) Why haven't we seen a response from Liu and Ochman? Are they not aware of the discussion, or do they simply see criticism on blogs as not being worth responding to?

26 comments:

T Ryan Gregory said...

1) While the paper was much mentioned in scientific blogs, generally the mentions were just "Nick Matzke has shown that the Liu-Ochman paper is flawed; here's the link", and not independent analyses of the paper.

From what I can tell, the majority of blog posts are links to, comments on, or repetitions of other blog posts generally.

2) Why did discussion suddenly fall off after Nick's articles? Does the blogosphere really have such a short attention span?

That is a very reasonable hypothesis given the observation, and raises significant questions about whether blogs can be reliable venues for rigorous assessments of published papers.

3) Why haven't we seen a response from Liu and Ochman? Are they not aware of the discussion, or do they simply see criticism on blogs as not being worth responding to?

I have been speaking with Howard Ochman throughout the process. I happen to know him as a colleague, but I simply suspect that no one else bothered to contact him. I also am not surprised that he has refrained from responding to Nick's polemical and emotionally charged comments. Indeed, I would say that not engaging in such a discussion and instead waiting for a formal rebuttal submitted to a journal was the professional thing to do. He is aware that people have questions about the paper, which is why he had already freely shared the dataset with other researchers for re-analysis before any of this erupted online. Frankly, I have been very disappointed by Nick's approach to this, and to the bandwagon-jumping to which you alluded (and several other practicing scientists voiced this complaint as well in the comment sections of various blogs). Convincing non-experts on a blog (using points like "see, look" and "don't worry, we're working on this over email", no less!) is not peer review, and I am sure that Howard will respond when Nick's formal rebuttal goes through the proper peer-reviewed channels. I don't think you should be surprised if the authors of the original paper -- and many other scientists who read blogs (myself included) -- have been alienated by Nick's choice of arguments and rhetoric. I also don't at all buy the claim that the paper needed to be "stomped on" to prevent the creationists from referencing either it or positive comments about it. The only creationist discussions that I have seen focused almost entirely on Nick's rhetoric. I would argue that he made the situation worse by simultaneously distancing scientists and arming anti-evolutionists.

I think there could be a place for review/critique on blogs, but this case is not an example to follow in the future.

PZ Myers said...

1) That's the nature of this medium. Unless you're going to add something new to the discussion, a link with a short comment is entirely appropriate.

2) Yes, the blogosphere has a painfully short attention span. Most subjects will have a half-life of a day, maybe two. Over that time, you'll get a flurry of commentary and links, and then it's gone. The long-term memory of the blogosphere is in the search engines like google.

Blogs are NOT venues that replace published papers. They are tools for disseminating information to a much, much wider audience than would normally read a technical paper.

3) I'm not privy to Liu-Ochman's thoughts as TR Gregory is, but I'm not disappointed at all in the blog response. That's the nature of the beast, and it's a mistake to regard it as a scholarly back-and-forth, or even to consider that a deficiency of the medium.

If someone cares about the discussion going on in the blogs (and again, maybe Liu-Ochman don't), one will have to respond with some quickness, and make a case that directly addresses the immediate issues. Suggesting that we have to wait for something to waft through formal channels flouts the properties of the medium and neglects its strengths.

As for the concern that creationists have only focused on Nick's rhetoric -- that implies a deep misconception about creationists. No matter what any of us say, creationists will distort it, and their interpretations are not of concern. What is worrisome instead is that scientists and supporters of evolution would discredit themselves by using a paper that creationists could honestly refute. We weren't acting to silence creationists from mangling it, we were acting to prevent evolution advocates from making an embarrassing mistake.

T Ryan Gregory said...

1) That's the nature of this medium. Unless you're going to add something new to the discussion, a link with a short comment is entirely appropriate.

The problem is that most posts were not just links with comments but strong statements that the paper had been refuted, without having read either the original paper or any detailed rebuttals.

Blogs are NOT venues that replace published papers. They are tools for disseminating information to a much, much wider audience than would normally read a technical paper.

Which suggests that we should be more careful in how we portray the process by which published papers make it to print, and to draw a clear distinction between professional peer review and blog commentaries.

...it's a mistake to regard it as a scholarly back-and-forth...

Agreed.

Suggesting that we have to wait for something to waft through formal channels flouts the properties of the medium and neglects its strengths.

This is trying to have it both ways. Using the pretense of peer review to refute a paper without having to undergo any of the rigors of the actual process is not a strength in my view.

As for the concern that creationists have only focused on Nick's rhetoric -- that implies a deep misconception about creationists. No matter what any of us say, creationists will distort it, and their interpretations are not of concern.

Of course they will. So why make their job easier by giving them sound bites?

What is worrisome instead is that scientists and supporters of evolution would discredit themselves by using a paper that creationists could honestly refute.

Creationists don't have to refute the article. They can just quote Nick calling it a "dog" and their work is done for them. They are not interested in doing science, they are concerned with rhetoric.

We weren't acting to silence creationists from mangling it, we were acting to prevent evolution advocates from making an embarrassing mistake.

In my view, the most damage has come from 1) alienating professional scientists with the completely unnecessary polemics, 2) giving anti-evolutionists just what they want in rhetoric, 3) providing a very misleading impression of how peer review operates, and 4) making it seem that scientists jump to conclusions without even studying the issue. A few positive comments about a paper in a prestigious journal that would have to be reconsidered after the *real* self-correcting nature of science had taken place (assuming, for the sake of argument, that Nick's rebuttals survive peer review) would not be embarrassing. My post actually spent roughly equal time on Nick's model and on Liu-Ochman, and was peppered with qualifiers like "appears" and "perhaps". What embarrassed me more than anything was the immediate and unquestioning acceptance of Nick's proclamation that the paper was lousy before he even gave any details (and even then, people pointed out several problems with his arguments).

Pedro Beltrao said...

1) A blog post serves two main uses currently. One is to engage or propose a discussion and the second is to direct attention. The second is currently the main way to evaluate the importance of a current discussion in the blogosphere. Also, commanding attention is the only significant power a blogger exerts. Science bloggers have shown recently for the first time that they can put it to good use too (Shelley/Wiley case). Of course bloggers should not just link to something without critically analyzing it. This can easily create a mob-type situation.

2) The attention span is limited but the discussions do come up again. I don't see what the attention span has to do with the value of blogs for post-publication reviewing. I would rather have some information than none at all.
There are some scripts that let anyone superimpose blog comments tracked by Postgenomic on PubMed searches and on the publisher's website. This way when I am researching the literature I can easily have a look at blog comments about a paper (if there are any). If I am really interested in the subject I am happy to read other people's opinions about the papers. It is just more information that I can take in while evaluating the paper's results.

It is also interesting how so much of the discussion regarding this paper has been not about the data itself but if post-publication reviewing in blogs is appropriate at all. I have been blogging for 3 to 4 years now, and I am thrilled that finally we are starting to have critical mass to get the best of the medium. It is beautiful that we can even be talking about this here. That like minded people from all over the world can think together. We should be careful, we want critical evaluations not mobs, but please help shape it up not shut it down.

Neil said...

Just a few random thoughts. I don't think that the flurry of comments for a blog post should be interpreted as a short attention span. Blogs are archived - by search engines, by Technorati, by aggregators such as Postgenomic - and topics are revisited. I return to older posts frequently and I know that others do too.

I'd agree that blogs can and should act as forums for scientific discussion and that this is a growing phenomenon, to be welcomed. I'd also agree that this particular example is not one of the best. As Jonathan says - the best argument against the paper is its questionable use of BLAST. If this had been addressed first without the polemic and the very poor structural biology criticism, it would have come across as a more professional criticism such as we see in journal comments and rebuttals.

A lot of us are also not interested in dragging creationism into everything. The more interesting discussion concerns whether it's good or bad science. Creationists don't care either way and will (mis)interpret anything as they see fit - so what's their relevance?

In summary, I think what we can learn from this is a better way to conduct scientific debate in a blog format. Basically follow the rules that you would in your professional life - criticise the data and back up with your own data. There are plenty of angry bloggers screaming into the void - scientists don't need to join them. It's the same advice that people give you when you're about to fire off an angry email - sit back, wait 5 minutes and think about it first.

Mark Pallen said...

Saluton Doktoro Melo
Mi povus respondi al via komentoj en Esperanto, sed se mi farus tion, neniu el la aliaj legantonj komprenus min, do mi daurigos en la Angla. [aside to others, yes, T. taxus and myself are both "native" speakers of Esperanto :-)]

I have time to post only the briefest of posts, but in response to earlier discussions, I should like to make a number of points:

1. The Liu and Ochman paper is flawed from beginning to end. It is not a matter of one or two small lapses in science or scholarship. I would take issue with something in almost every paragraph. For example, all three figures represent erroneous conclusions or results.

2. The scientific community does not have a short attention span. Rather, several of us, which includes Nick Matzke and established experts on BLAST and flagellar biology, have been seeking to get a rebuttal published in the scientific literature, in the light of claims about the inappropriateness of using "blogospheric rhetoric". This takes time!

We have focused on Fig 3. We tried PNAS, but they do not publish letters at the moment (although they are thinking about doing so in the future). Instead, we are currently trying a letter to Science, particularly as Science NOW gave uncritical publicity to the piece.

We await a response from Science. Ochman has also been sent our letter. We assume that Science, if they take our letter, will give him a chance to defend the indefensible.

Ford Doolittle has also submitted a critique of Fig 2 and associated methodology to Current Biology, which is in press. The drift of his argument is that their approach to phylogenetic analysis does NOT support their claim to have uncovered positive evidence that lateral gene transfer has not occurred in these systems.

Given that our letter is still under consideration with Science, I am reluctant to share it verbatim with you (this would equate to prior publication), but it rests on the arguments Nick has already articulated on the Panda's Thumb:

Liu and Ochman did NOT do their BLAST searches correctly!

And they did NOT describe what they did accurately!

And the approach they used was highly idiosyncratic to the point of being inappropriate (not the way most experts would have attacked the problem).

But worst of all, they uncritically accept the results of sequence analyses without attempting to assess the plausibility of their findings on the basis of other evidence (reading the literature might have helped).

3. Given the above, you will have to be patient in waiting for the critiques of Figs. 2 and 3. However, even Fig 1 is flawed. Here are a few reasons why:

Fig. 1. Distribution of flagellar proteins (excluding chemotaxis proteins) among flagellated bacterial species. Those proteins encoded by the core genes are designated in bold. This figure is redrawn with permission from that appearing in the KEGG pathway database (www.genome.jp/kegg/pathway/eco/eco02040.html).


"Entero-gammaproteobacteria", as used in the Figure, is not a valid taxonomic unit. The term is not found anywhere else on the web by Google and nowhere in the published literature by Google Scholar! One assumes they mean either the Enterobacteriaceae or the gamma-proteobacteria (this shoddy nomenclature represents one of many symptomatic lapses in scholarship).

As no original results are provided it is impossible to evaluate all their claims in detail (another more serious lapse in scholarship, which means one should have no sympathy when they are bombarded with requests for data--they should have published all their data and methods in a supplement with the paper). However, some of the broad claims can be dismissed easily by the existence of counter-examples.

For example, the claim shown by the colouring in Fig. 1 that FlgM is found only in the "entero-gammaproteobacteria" is falsified by the finding of FlgM homologues in Aquifex, Campylobacter jejuni, Helicobacter, Leptospira, Treponema pallidum, Thermotoga, and Bacillus subtilis: most of this is obvious from sequence annotations, some even from research publications. At the very least these homologies have been reported in Pallen et al 2005. And there is even a structure available for the Aquifex FlgM! Conclusion: these guys haven't even done a proper literature review!

Conversely, their claim that FliA is found in all taxa considered is wrong: it is absent from Caulobacter crescentus and Borrelia burgdorferi. All these points have already appeared in Pallen et al 2005, which they cite in the introduction. Curiously, they don't bother to investigate or even mention the discrepancies between their work and that of others.


Other flagellar structural genes that are broadly but not universally distributed across flagellated species include flgH, flgI, fliD, fliE, and fliH. The absence of certain of these genes from a genome is understandable once the characteristics of the particular bacteria are considered. For example, the L and P ring proteins FlgH and FlgI are not necessary in the Firmicutes because these bacteria lack the outer membrane in which these proteins are typically situated in Gram-negative bacteria. FlgH and FlgI are also not necessary in Spirochaetes, which have a periplasmic flagellum located inside of the outer membrane. The Firmicutes and Spirochaetes are viewed as two of the most basal bacterial lineages (22, 23), suggesting that flgH and flgI originated after the core set of structural proteins.


They ignore the inconvenient truth that a FlgH homologue is found encoded in the genomes from the spirochaete genus Leptospira, within a flagellar gene cluster.

4. No one else seems to have noticed the remarkable lapse in time between when the work was done and when the paper was submitted: Genomes downloaded December 28, 2005; paper received for review January 11, 2007. Given that there are many more genome sequences out there now, this is hardly a current view of flagellar diversity! Prompted by PNAS editors and others, several of us are now working on a definitive research report on flagellar evolution and diversity that will cover the issues that Liu and Ochman attempted to cover, but do it properly.

5. And finally, if one is going to use pretentious Latin plurals, one should at least get it right! There is no Latin plural "apparati" as they have it in the Introduction. This is a masculine fourth declension noun, so the Latin plural is apparatus (with a lengthened final vowel). The plural in English is "apparatuses".

6. There is one great thing about this paper; there are so many errors of fact, methodology, style and reasoning that it provides a perfect learning resource for students in how not to perform and write about science!

I close with a quote from Alfred North Whitehead, which summarises the approach that all those reading this paper should take (and that should have been taken by the reviewers and the editor, Francisco Ayala):

"The guiding motto in the life of every natural philosopher should be: seek simplicity and distrust it".

"La gvidanta moto en la vivo de cxiu naturfilosofo devus esti: sercxu simplecon kaj malfidi gxin".

Reed A. Cartwright said...

One thing that I find interesting about this entire debate over Matzke's blog post is that many people criticized him because he was just a blogger. How could he know more about flagellar evolution than this published paper? The irony is that Matzke has been working on flagellar evolution for years, and one of his papers was actually cited by Liu and Ochman as background information about flagellar evolution.

I disagree with the claim that Nick's structure argument was weak. It seemed very powerful to me. Protein structures are more conserved than their sequences. But I guess it comes down to taste.

Jonathan Badger said...

Thanks for all the comments, everybody.

Good to hear that someone is following up the blog criticism with a letter in the formal channels, Mark.

I *know* that people use the argument that "protein structures are more conserved than their sequences", Reed -- but there isn't any good evidence supporting this. Basically, people take two enzymes with no sequence similarity that perform the same function and say "Look, these proteins must be homologous; the structure must have been conserved even though their sequences are completely different". They don't consider that the structure might have evolved independently in both cases because they were under the same selective pressure to catalyze the same reaction.

Douglas Theobald said...

Jonathan,

Contrary to your assertion, there is a mountain of evidence supporting the well-known fact (in structural biology anyway) that structures are more conserved than sequences. We can experimentally engineer dozens of amino acid point mutations in a protein without changing a protein's fold. This is prima facie evidence that structure, at the level of fold topology, is extremely robust to mutation. In contrast, every sequence mutation obviously changes the sequence.

And your claim that we "don't consider that the structure might have evolved independently in both cases because they were under the same selective pressure to catalyze the same reaction" is absolutely false. That hypothesis has been (and still is being) considered, and at the level of protein fold, there is very little evidence to support it and much against.

First, we have the fact that structures evolve much more slowly than sequences (due to the aforementioned resilience to mutation).

Second, we know that many different sequences can adopt the same protein fold and perform the same function; protein fold space is mapped redundantly onto sequence space. This again is strong evidence that no particular sequence is needed to specify a given fold, and that therefore folds are more conserved, evolutionarily, than sequences.

Third, many enzymes that catalyze the same function have wildly different folds; thus, no particular fold is required for any given enzymatic reaction. This is fundamental structural biology and enzymology -- really it is only the conformation and composition of the active site (a small and local portion of any protein domain) that matters for catalysis. Take the serine proteases, for instance, but there are countless examples.

Fourth and conversely, there are also countless examples where the same fold has very different functions in different proteins. Again, the conclusion is that a protein's fold and its specific function are largely decoupled.

Fifth, we have observed, both experimentally and in the wild, the evolution of novel functions by selection via mutations in proteins. All of these examples involve modification of protein folds, with negligible or only minor change in their topologies. In contrast, we have never observed the evolution of one fold to another. Of course this doesn't mean it is impossible, or even infrequent in the long term, but it does mean that its likelihood is very low relative to evolutionary change with retention of topology.

Each of these facts means that selection for a particular function is very unlikely to result in the same sequence or fold independently. In terms of sequence and structure, there are just far too many ways to skin a cat. While it is theoretically possible, after a very very long amount of time, that one fold can gradually evolve into another very different fold, there is currently no empirical evidence for any such transformation. And even if it can and does happen, it would certainly be extremely rare relative to the frequency of cases where evolution maintains the same fold and modifies it for different functions. Observing two proteins with very different folds is strong evidence against their homology, especially and most forcefully at the level of performing a given cellular function.

What this means for L&O: it is extremely improbable that two flagellar proteins would have wildly different protein folds and yet be homologous. It is in fact strong evidence against homology for these proteins at the level of flagellar function. In any case, this is a glaring omission in the discussion section of the paper in question.

NickM said...

Hi all, I am joining the argument late. Mark Pallen has said much of what I would say, except I would emphasize that massive problems with the L&O paper were apparent from the get-go, based just on very basic background knowledge of the flagellum and flagellum evolution, even before we knew about the BLAST issues.

Among other things:

1. The paper reported numerous homologies that no one else had ever found, using nothing more than routine BLAST. People have been looking for flagellum homologies for years -- basically, BLAST gives you a certain set, PSI-BLAST gives you a bit more, and beyond that requires painstaking and careful argumentation, and/or new techniques, to convince colleagues and reviewers (normally). To have a paper come along and declare dozens of new homologies that everyone else had missed, via a trivially easy BLAST search rather than some very painstaking new technique, was extremely surprising.

2. If you know anything about flagellar proteins -- e.g., see this table for a basic description -- the idea that they are all homologous is extremely wild. You have membrane-embedded proteins that are little more than 2 transmembrane helices (e.g. FliQ, ~90 amino acids), and you have large multi-domain, complex secondary structure, non-membrane-embedded, hexameric motor proteins (e.g. FliI, ~450 amino acids) which have hundreds of known nonflagellar homologs (the AAA ATPase group) spread throughout the 3 domains, all of them sharing immensely more sequence, structural, and functional similarity with FliI than FliI does with FliQ. If you know about this sort of thing ahead of time -- and ditto these considerations for most of the other 24 "core" flagellar proteins and the comparisons between them (except for some of the axial proteins) -- you just can't believe a BLAST search that scores them as homologous.

To give you a sense of how wild it is, imagine if someone said "finger bones are homologous to skulls and descended by duplication from a common ancestor." There is a very vague and remote way in which it might be kinda-sorta true, e.g. if all bones are related to some simple cartilage network in some remote proto-chordate ancestor (this would be analogous to the vaguely possible "all proteins are homologous to each other position") -- but certainly it is not what is normally meant by homology, and certainly it does not support the "all bones came from duplication of one ancestor through some mystical process" position over the well-known and published models of the evolution of limbs, heads, etc.

Then you have the problem of trying to build a coherent scenario of flagellum evolution from the "all genes from one" conclusion. There must be many functional transitional states on such a model. What were they? What did they do? Despite the promise of a "stepwise" model, L&O don't really say. On the "cooption" model -- traditional at this point because everyone else who has written seriously on this has reached this conclusion -- most of the flagellum parts have quite close nonflagellar relatives with other functions, and minimal change in protein structures is required to produce the flagellum (unlike L&O's model, which requires that proteins of almost every conceivable size and structure derive from a common ancestor).

It could have been that L&O were onto some radical new discovery leading to radical conclusions, but if they were, they should have realized it and explained why the previous work was wrong and overturned by the new data. Instead, they didn't seem to realize they were saying something radical, and numerous technical flaws, e.g. those pointed out by Mark, indicated that it was much more likely that they simply weren't very familiar with the relevant areas and were over-extrapolating from the axial protein homologies to the whole flagellum.

All of this was obvious from the get-go, and this is precisely what led to suspicion that something very funky was going on with the BLAST results, which now, after the fact, everyone agrees is a very serious problem. But without the background considerations the BLAST problem would not have been discovered.

So anyway, I am just trying to explain what the L&O paper looked like to someone who is familiar with flagellum research. Maybe you have to be up on flagellum research to see it, and maybe it was unwise to attempt to communicate this feeling via a blogpost without an elaborate backgrounder explaining all the basic information about the flagellum and flagellum evolution that weighed against the L&O paper (but the information is already out there on the web; I linked to some of it in my initial blogposts). If I had it to do over again I might do it somewhat differently, but this was a pretty unique situation.

As one additional consideration, add the fact that the L&O paper was clearly aimed to get public attention, media attention, and to serve as a confident rebuttal to ID. It had the authority (PNAS) and rhetoric (We have debunked ID! Never mind that it has already been debunked!) to go far. I found out about the paper from a science journalist looking for an assessment, and it was already getting picked up on the blogs (and, we later found out, Science magazine). So I only had two real choices: (a) stay quiet and let a large number of pro-evolution people shoot themselves in the foot by triumphantly citing this paper as the latest and greatest anti-ID publication, only to have it collapse (as it would inevitably) later on, to the embarrassment of everyone and providing a permanent talking point to the ID guys about how evolutionists will uncritically accept any old thing that supports their position, or (b) do what I did and be frank about it. As I expressed in my initial blogpost, I didn't like doing what I did, but it wasn't really much of a choice.

On the structure issue:

I *know* that people use the argument that "protein structures are more conserved than their sequences", Reed -- but there isn't any good evidence supporting this. Basically, people take two enzymes with no sequence similarity that perform the same function and say "Look, these proteins must be homologous; the structure must have been conserved even though their sequences are completely different". They don't consider that the structure might have evolved independently in both cases because they were under the same selective pressure to catalyze the same reaction.

No, this is wrong. Here is the evidence supporting the "structure is more conserved than sequence" position. In innumerable cases, this is the pattern observed:

Sequence identity - structure similarity

100% sequence identity - Nearly superimposable structures

90% sequence identity - Nearly superimposable structures

80% sequence identity - Nearly superimposable structures

70% sequence identity - Nearly superimposable structures

60% sequence identity - Nearly superimposable structures

50% sequence identity - Nearly superimposable structures

40% sequence identity - Nearly superimposable structures

30% sequence identity - Nearly superimposable structures

Less often, but still very common, you can have 20% identity or less and still have nearly superimposable structures.

Conclusion: structural similarity is very commonly more conserved than sequence.
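
For readers unfamiliar with the "percent sequence identity" figures in the list above, here is a minimal sketch of how such a number can be computed from a pairwise alignment. The function name and the toy sequences are hypothetical (they are not real flagellar proteins), and conventions differ on the denominator; this version counts only columns where neither sequence has a gap.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns (one common convention, assumed here)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment, for illustration only.
seq_a = "MKT-AYIAKQR"
seq_b = "MRTLAYLA-QK"
print(f"{percent_identity(seq_a, seq_b):.1f}% identity")
```

Structural similarity is measured separately (for example by superimposing the two folds), which is why the two columns of the list can diverge.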

I agree that there are cases where small sequence changes can cause large structural change, but this is an exception to the general rule, apparently especially over evolutionary time, and cannot just be cavalierly invoked to explain otherwise bizarre BLAST results. At the very least you would need to note that such structural change is unusual and explain why you think the homology conclusion is warranted anyway.

I also agree that there are cases where structures can converge independently, and that there is an argument about how common this is, but (a) this is a completely different discussion and (b) it is irrelevant to the flagellum homology issue, because my argument against homology was that the structures were different, not the same.

My conclusion from all of this has actually been rather positive -- in an almost eerie way, science hangs together, such that when someone makes a really wild proposal based on a mistaken analysis, the errors propagate through the proposal and raise numerous conflicts with other data. These then serve as warning flags that will be detected by multiple people focusing on different issues, and eventually lead to correction of the errors. That this could be done so quickly, even in a very new field like flagellum evolution, indicates to me that there really is some rigor and scientific substance to the topic, and that it really is not "anything goes." This, I submit, is a much more serious rebuttal to ID than we would have had by keeping quiet about the problems.

NickM said...

Ah -- I see that Doug has made the structure point also, only in a much more detailed way.

NickM said...

On the structure issue, I have to admit that I perhaps didn't pick the best example for the naive reader by using FliI and FliC in my second PT post, since on an eyeball inspection of the images I used there appeared to be some vague structural similarities that people picked up on. For me this was not obvious since e.g. in FliI the N-term and C-term are at different ends of the structure, but for FliC they are near to each other, but this was not mentioned in the blogpost (it is in the papers I linked to) and not obvious in the diagrams.

On the other hand, a consideration in favor of the naive "just look at the structures!!" argument against homology is that there are structure databases, similar structures are classified into groups, and if structural similarities existed between the 6 or so flagellar proteins with known structures, this would have probably already been discovered and noted in the database and the literature.

Jonathan Badger said...

Douglas, I think we are arguing over two different meanings of "structures are more conserved than sequences". I'm talking about conservation in evolutionary terms, not about what can be done to proteins or observed on human timescales; simply making mutations to the sequence and noticing that the structure (usually) doesn't change much is not at all informative in an evolutionary context, any more than the fact that mutating a culture of gram positive bacteria has never been observed to yield a culture that developed an outer membrane and became gram negative bacteria has anything useful to say about the evolutionary relationships between gram positive and gram negative bacteria.

I guess my biases here are those of molecular phylogenetics; in the early days, molecular trees were often accused of being wrong when they disagreed with those of some morphologist who was convinced that the structure of tail feathers or what not was far more informative and conserved than the sequence data.

Douglas Theobald said...

Jonathan,

Unless you are of the opinion that empirical experiments can tell us nothing about evolution, I am sure we are talking about the same thing. I of course am referring to conservation in evolutionary terms and time-scales. I understand well your "bias" regarding morphology vs sequences, as I share it, but I don't think that applies here at all. Relative to morphology, we have a reasonably good handle on how protein structures evolve. Protein fold evolution is directly amenable to experimentation, whereas gross morphology is not (and neither is the evolution of things like bacterial cell walls). Making mutations in the sequence and noticing how the structure responds is exactly what is needed to understand the evolution of protein folds and domains, and this is exactly what is under examination here. And it also is crucially pertinent to justifying the evolutionary theory that undergirds models of sequence evolution, since most of the patterns observed in amino-acid transition matrices (and BLOSUM/PAM scoring matrices) are the direct result of the evolutionary requirement (negative selection/neutral evolution) for substitutions that are not structurally perturbing (e.g., hydrophobics replacing hydrophobics, cysteines maintaining disulphide bonds, charge for like charge, etc.).

As I pointed out, you can throw a remarkably large number of mutations at a protein, and it will still have the same fold. And as Nick pointed out, there are many cases where two proteins share less than 15% sequence identity (even less than would be expected by random selection of amino acids), and yet they have the same fold. No matter what time scale you are looking at, this means that structure is much more conserved, evolutionarily, than sequence. Period. I don't see how you can possibly get around this conclusion.

I should also point out that your specific criticism really is beside the point as far as L&O are concerned. It is true that you must be careful if you're trying to conclude homology just based on similar structures (with no sequence similarity). The likelihood of fold convergence is much higher than the likelihood of sequence convergence, simply because there are many fewer possible folds than sequences -- thus, structural similarity without sequence similarity is not necessarily a case of divergent evolution; it could be convergence.

But that is not what L&O did -- they concluded homology (divergence) in spite of massive structural differences and functional differences. For all the reasons I listed previously, the odds are heavily against that. Sure, it's possible, but extraordinary claims require extraordinary evidence, and this is a claim where they are postulating homology in contradiction to everything we know about molecular structure.

Furthermore, because structure is much more conserved than sequence, it would be incredibly odd to find significant sequence similarity without structural similarity, under almost any evolutionary scenario you can imagine. That should have raised a big red flag. Not to mention that, even assuming L&O did everything right, the reported sequence similarities were very weak to begin with (evalues of 10^-4 are hardly impressive for blast searches).
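
To make the E-value point concrete, here is a rough sketch of why a per-comparison E-value of 10^-4 weakens once many pairwise comparisons are run. It uses a simple Bonferroni-style scaling (multiply by the number of comparisons) and the standard relation P = 1 - exp(-E). The comparison count is an assumption for illustration: all-vs-all among the 24 core proteins within a single genome. The number of comparisons actually run by L&O is not given in this thread, and the function names are hypothetical.

```python
import math
from math import comb


def corrected_evalue(evalue: float, n_comparisons: int) -> float:
    """Scale a per-comparison E-value by the number of comparisons performed."""
    return evalue * n_comparisons


def evalue_to_pvalue(evalue: float) -> float:
    """Probability of at least one chance hit, given E expected chance hits."""
    return -math.expm1(-evalue)  # 1 - exp(-E), computed stably for small E


n_proteins = 24                 # core flagellar proteins mentioned in the post
n_pairs = comb(n_proteins, 2)   # assumed: every protein compared with every other
raw = 1e-4                      # the E-value quoted above

adjusted = corrected_evalue(raw, n_pairs)
print(n_pairs, adjusted, evalue_to_pvalue(adjusted))
```

If the same comparisons are repeated across many genomes, n_comparisons grows and the adjusted value rises in proportion, so a hit that looks significant in one isolated BL2Seq run may not survive the correction.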

NickM said...

I guess my biases here are those of molecular phylogenetics; in the early days, molecular trees were often accused of being wrong when they disagreed with those of some morphologist who was convinced that the structure of tail feathers or what not was far more informative and conserved than the sequence data.

Yeah, like Doug said, protein structure is a much different beast than organismal morphology.

It may interest you to look at this webpage where Doug took a close look at the alleged discrepancies between molecular and (organismal) morphology-based phylogenies, and despite various famous disagreements, in a statistical sense the differences in the phylogenies are quite small:
http://www.talkorigins.org/faqs/comdesc/section1.html#independent_convergence

Douglas Theobald said...

I will also note that Jonathan's comment about morphology vs sequence is conflating two separate and largely independent evolutionary issues: (1) what constitutes a phylogenetically informative or valid character and (2) how one infers/detects homologous structures and characters to begin with. The molecules vs morphology debate that Jonathan cites as support is really about (1), whereas the structural homology vs sequence homology issue (the issue relevant to L&O) is about (2).

For instance, in principle it is possible that protein fold identity could provide for extremely robust phylogenetic inferences (like Koonin and others are doing for bacterial phylogeny), yet also could be a very poor indicator of homology. Or vice versa or both or neither. And the same goes for molecular sequences vs morphology.

Jonathan Badger said...

I will also note that Jonathan's comment about morphology vs sequence is conflating two separate and largely independent evolutionary issues: (1) what constitutes a phylogenetically informative or valid character and (2) how one infers/detects homologous structures and characters to begin with.

These aren't independent at all. The reason why some characters aren't valid is because they aren't homologous at all -- they are in fact similarities due to homoplasy (convergent evolution). Of course, you can't generally know which characters are homoplasious until a phylogeny is constructed, unfortunately.

For instance, in principle it is possible that protein fold identity could provide for extremely robust phylogenetic inferences (like Koonin and others are doing for bacterial phylogeny), yet also could be a very poor indicator of homology. Or vice versa or both or neither. And the same goes for molecular sequences vs morphology.

No. Either the structures are indeed homologous (and therefore valid phylogenetic characters) or they are not (and are not valid phylogenetic characters by definition).

Getting back to L&O, my point was that if they had solid evidence of sequence homology then to hell with the structures. Of course, the whole thing is academic because I think Nick has adequately shown that they *don't* have solid evidence of sequence homology at all.

Douglas Theobald said...

Of course, you can't generally know which characters are homoplasious until a phylogeny is constructed, unfortunately.

Which is another way of stating my point -- the above necessarily implies that (1) detection of homology and (2) phylogenetic inference are separate theoretical issues.

Just because a character is homologous does not mean that it is phylogenetically informative. A clear example is a homologous set of characters (perhaps a position in a DNA sequence) that inherently has extremely high rates of change. It might be that the states are truly homologous, yet that position is mostly phylogenetically meaningless.

Douglas Theobald said...

Getting back to L&O, my point was that if they had solid evidence of sequence homology then to hell with the structures.

And I'll say that point is cavalier and irresponsible. If they really had solid evidence of homology in the face of different folds, that would be huge news, a first. That would be a paper in and of itself, requiring in depth discussion and further analysis. You don't just postulate something that contradicts all we know about molecular evolution and ignore the implications.

Neil said...

I disagree with the claim that Nick's structure argument was weak

Just to clarify, I'm not saying that there is no argument from structural biology in this case. I'm saying that the way it was argued in the blog post - trying to compare two very different graphic representations - was poor.

It seems from this great thread of comments that the word "conserved" can mean different things to different people. I'd agree that structures are more conserved in the sense that very different sequences can fold into the same structure - analysis of the PDB suggests a finite set of folds that nature reuses. That's different from conserved in the sense of "do proteins share a common ancestor".

Douglas Theobald said...

When we say that "structure is more conserved than sequence", we mean it in the following precise evolutionary sense: the rate of change from one amino acid to another in the primary structure (i.e. sequence) of a protein is much greater than the rate of change from one protein fold to another (tertiary structure), or from one secondary structure element to another (SSEs like alpha helices, beta strands, pi-helices, turns, etc.). The empirical observation that very different sequences can fold into the same structure is strong evidence for the conservation of structure in the above sense, but that fact is not the definition of structural conservation.
As a protein evolves over time, mutations that change the structure of a protein, causing it to unfold or otherwise disrupt its topological conformation, are very likely to be detrimental, and they are removed by negative (purifying) selection. On the other hand, those mutations that preserve the fold of a protein and either maintain or improve its function (corresponding to nearly neutral and selectively beneficial mutations, respectively) are retained and become fixed as substitutions. For the many reasons I have listed already, we know that a protein can absorb and accommodate a very large number of substitutions with retention of tertiary fold (and function). The PDB database is filled with examples of this, where proteins have changed by up to 80% or more of their amino acids, yet they still have significant sequence similarity (indicating true homology, as in common descent from an ancestral protein domain) and yet they still have the same fold. In contrast, there are no examples, out of the more than 30,000 different protein domains in the PDB database, where we have significant sequence similarity and vastly different folds (e.g., folds that differ by more than a few trivial SSEs or that have unrelated topological connections among the SSEs). During the evolution of a protein domain, its 3D structure changes very little relative to the change in its sequence. This is what we mean by "structure is more conserved than sequence".
So, to answer your question "do these flagellar proteins share a common ancestor?". For the reasons given above, it is extremely suspicious, to put it mildly, to have a case of significant sequence similarity yet wildly different folds -- but this is precisely what L&O reported without blinking. If true, that would mean that during the divergence of these flagellar proteins from a common ancestor, the sequence changed much less than the structure, i.e. that in this case sequence was more conserved than structure. As I have already explained, this is unheard of empirically and is completely unexpected theoretically. Thus, the observation of different folds in these proteins is strong evidence against their common ancestry.

Larry Moran said...

I think there are some very good examples of proteins that have a common fold but are not homologous.

But that's not what Nick and Douglas (and lots of others) are arguing. What they're saying is that there are no examples of proteins that are descended from a common ancestor but have completely different folds.

As has already been pointed out, Liu & Ochman proposed that all flagellar proteins evolved from a common ancestor. They claimed that the relationship could be detected by sequence similarity. Furthermore, they claimed that this homology has escaped everyone who has ever looked for such similarities over the past two decades. (That includes me, by the way.)

Liu & Ochman failed to consider any of the implications of their claim, including the fact that some of their so-called homologues have very different structures.

There's no way to disguise the fact that this is bad science. I recognized it as soon as I read the paper and so did dozens of other scientists. (But apparently not the reviewers.)

If any of my students made such claims without realizing and discussing the implications then they would get a very poor grade. It's already been pointed out that the clear implications of this result would overturn much of what we know about molecular evolution.

I don't understand why we have to refrain from pointing this out on blogs or in other forums. Liu & Ochman don't get automatic protection from criticism just because they published in PNAS. It doesn't follow that the only way to comment on their work is to go through the lengthy peer review process of publishing a rebuttal. For one thing that would restrict any criticism to only those people who could afford the time and the money to publish.

I see the scientific literature as being no different than publishing a book, or an article in the New York Times. You put your ideas out in public and you take your chances. If everyone heaps praise on your findings then you will have no problem with that praise appearing in blogs, press releases, popular magazines, TV shows, or whatever. You can bet your life that Liu & Ochman would have gladly appeared on Oprah if their paper really did destroy Intelligent Design Creationism. It's likely that they would have been paid to give plenary lectures long before any confirmation of their work had been published in the scientific literature.

If you're willing to accept the kudos when you get things right then you damn well better be willing to take your lumps when you get it wrong.

Pinko Punko said...

I don't think anyone was claiming that L/O shouldn't be criticized. It was more the fact that what was presented was more similar to private conversations like "wow, this paper is a total piece of shit." I really respect Nick for his dedication to this; my previous comments were that he could have at least said "I have communicated my concerns to L/O and am awaiting their response, but here they are."

Many of the arguments were coming from blog commenters who are truly interested in science but are not scientists, so we get tons of comments at PZ's slamming PNAS like first year grad students, and we get people slamming or parroting Nick's structure argument without understanding it (it was not presented clearly at PT). The structure argument is presented above in much clearer fashion. I didn't want to rehash my previous comments on the matter, but if the paper is as bad as it appears to be, then it is possible that the authors could have retracted it or corrected their own errors when apprised of the criticisms. Some of the criticisms were merely "listen, grasshopper, disagreements happen in science all the time. We very much want to get to the bottom of this, and in many cases disagreements get shoved under the rug. We don't want that. Perhaps looking back over how things have unfolded, could things have been dealt with better or more collegially?" I think the answer to that is "yes."

What I really wanted to add was this recent Mol Cell paper on an E. coli protein containing a domain that has a totally different structure than its sequence homolog. The point is that this type of occurrence is incredibly novel. For the most part, from the evidence we have from structures, if sequence homology can be detected there will be structural homology, and structural similarities between proteins can be used to identify conserved sequence motifs that otherwise would be incredibly difficult to identify by alignment alone. It is just another version of the "structure is more conserved than sequence" argument (in most cases).

Here is one case where that is purportedly not true:

http://www.molecule.org/content/article/abstract?uid=PIIS1097276507001219

E. coli RfaH is a sequence homolog of NusG. It contains an alpha-helical coiled-coil domain where NusG has a beta-barrel. Another case of similar sequence/different structure is of course proteins that can adopt multiple folds depending on context, like prion proteins. These are the exceptions of course and not the rule.

Pinko Punko said...

The link got truncated. here it is.

Douglas Theobald said...

Thanks Pinko -- the report in that Mol Cell paper is brilliant, very very cool. I've been waiting for somebody to give a counterexample that I'd missed (this is barely a month old, so I'm not too behind here). This paper is a good example, however, of how such a discovery should be handled. Belogurov et al. were very aware of the novelty of their discovery, and they devote a significant part of the text and figures to analyzing the implications. I'll note that in this case there is very good sequence similarity evidence for the common ancestry of these two dissimilar folds (one is a coiled-coil, two alpha-helices, while the other is a beta-barrel, a standard SH3 domain). To put this in perspective, however, note that there are about 30,000 unique domains in the PDB database, which means there are around half a billion different possible pairwise comparisons that could be done. One of those comparisons has just given us the first indisputable case of fold-to-fold evolution. Also, these domains are on the small side, only 45 amino acids in length, which would make such a transition more probable than in a larger domain (domains usually range from around 100 to 250 aa). IOW, it's extraordinarily rare, as Pinko points out.
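
As a quick check on the scale involved, the number of unordered pairs among n domains is n(n-1)/2. A minimal sketch, taking the round figure of 30,000 domains from the comment above as given (variable names are illustrative only):

```python
from math import comb

n_domains = 30_000            # approximate count of unique PDB domains cited above
pairs = comb(n_domains, 2)    # unordered pairs: n * (n - 1) / 2

print(f"{pairs:,}")           # prints 449,985,000
print(f"{1 / pairs:.1e}")     # fraction represented by a single pair
```

That is roughly 4.5 x 10^8 possible pairs. Not every pair shows detectable sequence similarity or has been examined, so this is an upper bound on the comparisons rather than a count of tests actually performed.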

Mark Pallen said...

It is also worth highlighting the work on prion proteins, which flip between two apparently different structures (one solved, the other predicted) when going from healthy to disease state, despite retaining the same amino acid sequence. See e.g. PMID: 16675391. BUT this again is very much the exception. The presumption is that two proteins that have clearly different folds are highly unlikely to share a common ancestor.