Spencer Bliven

Thoughts and Research

ISMB 2012

ISMB Badge

My badge for ISMB, showing admission to 3DSIG and BOSC, as well as ISMB membership and Open Access pride.

I just got back from a fantastic ISMB 2012. The talks were very good this year, and it was fun to see many people I met last year in Vienna again.

I presented three documents at the conference:

I went to talks at BOSC, 3DSIG, and ISMB. Here are my favorites:

  • Jonathan Eisen Science Wants to Be Open: If Only We Could Get Out of Its Way (BOSC)
    Great talk about the history of PLoS and the barriers to Open Access. Eisen calls for hiring and tenure committees to actually read papers, rather than making snap judgements based on journal names.
  • Titus Brown Doing Next-gen Sequencing Analysis in the Cloud (BOSC)
    Titus manages to speed up, reduce memory, and improve accuracy of genome assembly by throwing away unneeded reads.
  • Chris Sander Decoding genetic variation to compute 3D structures of proteins (3DSIG)
    Sander talks about his new EVFold method for predicting structure from multiple alignments. It’s a very exciting method with the potential to really leverage nextgen sequencing for structure prediction.
  • John-Marc Chandonia The next generation of SCOP and ASTRAL (3DSIG)
    SCOP has now given up on fully manual classification and started doing automated classification of close homologues, called SCOP 1.75A. They’re also working on SCOP 2.0, which will change from a structure hierarchy to a DAG representing both evolution and structural similarity, add hyperfamilies, show evolutionary clades, and allow tagging with other info like Uniprot IDs.
  • Ada Yonath What was first? The genetic code or its products (3DSIG)
    Ada talked a bit about her Nobel Prize work on the ribosome. I was particularly interested to hear how symmetry is essential at the ribosomal catalytic site. However, the most thought-provoking bit of her talk was speculation about the RNA-only protoribosome, which she thinks was a small, symmetric RNA dimer.
  • Sheng Wang Protein structure alignment beyond spatial proximity (3DSIG)
    DeepAlign is a good-looking protein alignment algorithm that uses both sequence and structural similarity in its scoring term. It also includes orientation information, so it’s better at avoiding register errors in sheets and helices. Oh, and it’s as fast as TM-Align! Jianzhu Ma from the same lab gave a good talk at ISMB about using DeepAlign for protein threading, so it’s practically useful.
  • Saliha Ece Acuner Ozbabacan Enriching the human apoptosis pathway by predicting the structures of protein-protein complexes (ISMB)
    Haiyuan Yu Understanding human disease through 3D protein interactome network (ISMB)
    Yu Xia A three-dimensional map of protein networks within and between species (ISMB)
    Several ISMB talks focused on using structural information about protein-protein interactions. Saliha Ece Acuner Ozbabacan introduced a pretty flexible method for modelling PPI structures based on structurally similar interfaces, although I wonder about the false positive rate. Haiyuan Yu showed that by localizing mutations to a particular protein interface we can much more accurately predict downstream effects and pathology, as opposed to just binning them by gene. Yu Xia compared viral-host interfaces with host-host interfaces and found significant overlap. Furthermore, the overlapping regions are rapidly evolving, suggesting a race between viruses creating inhibitors and hosts trying to avoid them. The increasing availability of protein complex structures and high-quality models seems to have stimulated some very interesting biology in the last few years.
  • Karen Lasker Molecular architecture of the 26S proteasome holocomplex determined by an integrative approach (ISMB)
    Andrej Šali Integrative Structural Biology (ISMB)
    Crystal structures are known for some subunits of the proteasome, but the whole complex seems to be too flexible to crystallize. So instead they took a low-resolution EM structure of the whole complex and combined it with high-resolution monomers, PPI information, and cross-linking data to get the complete structure. Karen talked about the proteasome in depth, while Andrej sold us on the integrative method as the next great thing for solving big biological structures. Phil has a planning grant relating to visualizing big hybrid structures, so we’re hoping to see more big complexes solved like this in the future.
  • Carl Kingsford Uncovering Ancient Networks from Present-Day Interactions (ISMB)
    Carl has an interesting method for creating plausible ancestral networks (e.g. the PPI network of LUCA/LUCE). Unfortunately it uses a pretty naive model of evolution, so the results can’t be taken too literally. Maybe this could be applied to structural similarity networks, given a smarter evolutionary model?
  • Arun Konagurthu Minimum Message Length Inference of Secondary Structure from Protein Coordinate Data (ISMB)
    The problem with secondary structure assignment is that no one can agree on exactly the same definition. Although it may be a little too complex for most biologists, Arun proposes an interesting definition based on the concept of compression, of all things. It’s a compelling mathematical definition, although I’ll have to see more examples to know if it’s really any good.
  • Alex Bateman Assessing the contribution of scientists to Wikipedia for Pfam and Rfam annotation (ISMB)
    Long before Topic Pages started, Alex started utilizing Wikipedia to crowd-source RNA and protein family annotation. He’s now compiled some fascinating data on how Rfam/Pfam articles grow, finding that they tend to be expanded gradually by wikipedians, punctuated by large, short increases when a scientist or annotator takes an interest. He also notes that professional curators are still essential for maintaining the quality and organization of the annotations, even if the brunt of the workload is distributed across the community.
  • Michal Linial Viral-host coevolution: Playing ‘seek and hide’ (ISMB)
    Viruses often steal genes from their hosts. Michal looks at such cases and finds a couple ways viruses modify the proteins. Because of the pressure viruses face for small genome sizes, they tend to shed unnecessary parts of the genes, such as domain linkers or even whole domains. This brings up some very interesting open questions about the function of those domains to the virus. Maybe they can compensate for domain loss by using symmetry and homomers? It would be interesting to look!
  • Fantastic (I thought) talks by my colleagues and collaborators:
    Andreas Prlić How to Use BioJava to Calculate One Billion Alignments at the RCSB PDB website (BOSC)
    Andreas Prlić Internal Pseudo-symmetry in Proteins (ISMB)
    Peter Rose Efficient Searching and Mining of the RCSB Protein Data Bank (ISMB)
    Philip Bourne Hiring and Supervising (ISMB)
    Lei Xie A structural systems biology approach to polypharmacological drug discovery (ISMB)
    Andreas talked about our research on symmetry and protein alignments. He has some really intriguing examples of why protein symmetry is important in his slides. Peter highlighted changes to the PDB over the last year, and Phil drew from his “Ten simple rules” series for a professional development talk. It was really nice to see Lei again, and hear about his latest research with his new lab at Hunter College.

The conference was fun too. It was my first time in Long Beach, and I grew to like the downtown area. The colored lights that project everywhere from fountains to bus stops at night are a little kitsch, but the restaurants and nightlife are nice. The Monday-night reception at the Aquarium of the Pacific was phenomenal—I want to go back and spend some more time with the bat rays! They have a couple cool webcams set up in lieu of visiting.


Topic Page update

April 30, 2012 | Posted in Science

Two weeks ago our Wikipedia article on Circular permutation in proteins was featured on Wikipedia’s main page as a “Did you know…” article:

Did you know... that the protein Concanavalin A (pictured) cuts itself in two and then reassembles in a circularly permuted order?

We were quickly replaced by the Rice stink bug, but a snapshot of the page Wednesday morning is available from WebCitation.

Submitting an article for DYK is a bit of a hassle: think up a good hook, make sure your article is up to snuff, find someone else to review your hook favorably… However, the results of getting on the main page are stunning. Here’s the number of page views for Circular permutation in proteins this month:

daily page views

Daily page views for Circular permutation in proteins in April 2012. The page was featured on DYK on the labelled day.

Prior to our update, the page was seeing under 20 views per day. Since then, we’ve mostly been getting 20-60 views a day, with a few spikes probably due to attention from blogs. But being featured on the main page for just 9 hours resulted in 1440 hits.

Topic Pages were also featured in Daniel Mietchen’s new post to the PLoS blog, Bridging the Journal-Wikipedia Gap.

Update: In the interest of full disclosure, the hook was badly phrased. Concanavalin A does not really cut “itself” in two, as the full procedure requires a separate protease to make the cuts. This is a flaw with the DYK hook rather than the article itself, which merely describes the procedure as an “unusual protein ligation.”

DNA font

February 10, 2012 | Posted in Science, Technology
ATGCXYR

A short DNA 'word' showing the four bases A, T, C, and G, the 'unknown' base X, and the 'pyrimidine' and 'purine' characters Y and R.

I recently downloaded the Deja-Vu font family and discovered the open-source typography community. I thought it would be fun to try to make a font myself. Since much of my time is spent looking at biological sequences, I thought a way to visualize DNA molecules in a text editor would be cool. The result: DNA Type. DNA only contains a few letters, so I’ve only made 7 characters so far. However, you can use it to write secret messages to your fellow bio-nerds, as long as the messages contain only A, T, G, C, Y, R, and X! The message is read off from the top strand.

Spencer, what’s your favorite instrumental electronic band?
RATATAT
What was the name of that movie which predicted the infringement of human rights based on genetic predisposition, a possibility which seems very plausible in today’s high-throughput sequencing world?
GATTACA
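
If you want to check whether a message can be written in DNA Type before trying it, here is a minimal Python sketch. The names (`DNA_TYPE_LETTERS`, `can_write`, `bottom_strand`) are made up for illustration and are not part of the font. It also assumes that the bottom strand simply shows the complementary base under each letter, with Y pairing with R and X pairing with itself, which is my reading of the glyphs rather than anything the font file enforces.

```python
# The seven uppercase characters the font currently supports.
DNA_TYPE_LETTERS = set("ATGCYRX")

# Assumed pairing for the bottom strand: A-T, G-C, pyrimidine (Y) with
# purine (R), and the unknown base X with itself.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G",
              "Y": "R", "R": "Y", "X": "X"}


def can_write(message):
    """Return True if every character of the message has a glyph.

    Lowercase letters are rejected on purpose, since the font has no
    lowercase glyphs.
    """
    return all(ch in DNA_TYPE_LETTERS for ch in message)


def bottom_strand(message):
    """Return the base shown under each letter of the top strand."""
    if not can_write(message):
        raise ValueError("message uses characters outside ATGCYRX")
    return "".join(COMPLEMENT[ch] for ch in message)


if __name__ == "__main__":
    for word in ["RATATAT", "GATTACA", "PROTEIN"]:
        if can_write(word):
            print(word, "->", bottom_strand(word))
        else:
            print(word, "cannot be written in DNA Type")
```

The bottom strand is written position by position here, not reversed, because the message is read left to right off the top strand.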

DNA is cool, but I’m really more of a protein guy. With a protein font you can make meaningful amino acid glyphs for the whole alphabet. However, polypeptides are usually displayed alternating up and down. I think this is possible with glyph variants, but I need to learn more about the TrueType format and the FontForge program before attempting something so complex.

Download

Version 0.
Download TrueType font
Download FontForge source file

Known bugs

  • Pretty much unreadable at smaller than 48pts.
  • Only 7 characters. Not even the lowercase works.
  • No hinting for small sizes beyond FontForge’s autohints.
  • Doesn’t work for RNA. No one cares about the difference between Uracil and Thymine, anyway.
  • Ugly glyphs. What are you, an artist? Go back to research.

EVfold

For our weekly journal club I talked about a new method for de novo protein folding called EVfold. [Slides] Details can be read in the paper (plus 15-page supporting text).

Marks, D. S., Colwell, L. J., Sheridan, R., Hopf, T. A., Pagnani, A., Zecchina, R., & Sander, C. (2011). Protein 3D Structure Computed from Evolutionary Sequence Variation. PLoS ONE, 6(12), e28766. doi:10.1371/journal.pone.0028766

The authors are motivated by two observations:

“In spite of significant progress in the field of structural genomics over the last decade [20], only about half of all well-characterized protein families (PFAM-A, 12,000 families), have a 3D structure for any of their members [1].”
“As we are about to reach a truly explosive phase of massively parallel sequencing, we anticipate increased coverage of sequence space for protein families by several orders of magnitude, well above the level of 1000–10000 non-redundant sequences for protein family and with rich evolutionary information about protein structure directly from sequence.”

Basically, DNA sequencing is dirt cheap and will only get cheaper, but up until now this hasn’t been helping to solve protein structures.

Marks et al. try to remedy this situation by looking at co-evolving residue pairs. They hypothesize that residues which are located close together in 3D space will tend to evolve together. If one mutates to a smaller residue, the other will tend to mutate to something bigger to compensate. If one changes from positively charged to negative, the other will change from negative to positive to balance it out. The idea behind EVfold is to identify co-evolving residues from the thousands of sequences we have for some protein families, then use that information to provide distance constraints in order to predict the protein’s structure.

Of course, just because two residues co-vary doesn’t necessarily imply they are spatially close. They could indirectly influence each other, such as if both bind to a ligand or both bind some intermediate residue. So the authors use a technique called direct coupling analysis (DCA) to predict which residues are close together. This has been around for a few years (Weigt et al. (2009). PNAS, 106(1), 67–72), although that’s not immediately clear from the paper. DCA assigns a quantity called direct information (DI) to each pair of residues, which correlates really well with whether the pair is close together.

Marks et al. figure S2c. Grey regions indicate residues of Ras protein which are close together in the crystal structure, while red dots indicate pairs which were predicted to be close based on DI.
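
To make the co-variation idea concrete, here is a toy Python sketch that scores pairs of alignment columns by plain mutual information. This is not DCA and not what EVfold does: mutual information is the simpler, older measure, and it cannot separate direct couplings from indirect ones, which is exactly the problem DI is meant to address. The alignment and the function names are invented for illustration.

```python
from collections import Counter
from math import log2


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j.

    `alignment` is a list of equal-length aligned sequences.
    """
    n = len(alignment)
    counts_i = Counter(seq[i] for seq in alignment)
    counts_j = Counter(seq[j] for seq in alignment)
    counts_ij = Counter((seq[i], seq[j]) for seq in alignment)

    mi = 0.0
    for (a, b), count in counts_ij.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


def rank_pairs(alignment):
    """Score every column pair and sort from most to least co-varying."""
    length = len(alignment[0])
    scores = [((i, j), column_mutual_information(alignment, i, j))
              for i in range(length)
              for j in range(i + 1, length)]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Made-up alignment: columns 0 and 2 swap charge together (K/E with E/K),
# column 1 varies independently, and column 3 is fully conserved.
toy_alignment = ["KAEL", "KGEL", "EAKL", "EGKL"]

if __name__ == "__main__":
    for pair, score in rank_pairs(toy_alignment):
        print(pair, round(score, 3))
```

In the toy alignment only the charge-swapping pair of columns gets a nonzero score. Real alignments are much noisier, and that noise plus the indirect-coupling problem is why the paper needs something more sophisticated than this.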

EVfold takes the top-ranked residue pairs and assumes they are close together. It then uses those pairs as distance constraints to solve the structure. This is the same approach as using distance constraints from NMR to solve a structure, and uses well-known simulated annealing/molecular dynamics algorithms. At the end, you get lovely protein structures with 3-5Å RMSD from the crystal structure.

Marks et al. figure 2. Predicted (left) and observed (right) structures for three proteins. A few minor differences are visible, such as missing beta-strands, but all three predictions are correct overall.
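
The step of turning top-ranked pairs into distance constraints can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the function name, the minimum sequence separation of 5 residues, and the 7 Å upper bound are placeholder values I picked, not parameters taken from the paper.

```python
def top_pairs_to_restraints(scored_pairs, n_top,
                            min_separation=5, max_distance=7.0):
    """Convert the highest-scoring residue pairs into distance restraints.

    scored_pairs   -- iterable of ((i, j), score), e.g. from a DI calculation
    n_top          -- how many restraints to keep
    min_separation -- skip pairs closer than this in sequence, since
                      neighbours are trivially close in space (assumed value)
    max_distance   -- upper bound in angstroms assigned to each kept pair
                      (assumed value)

    Returns a list of (i, j, max_distance) tuples, best-scoring first.
    """
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    restraints = []
    for (i, j), _score in ranked:
        if len(restraints) >= n_top:
            break
        if abs(i - j) < min_separation:
            continue
        restraints.append((i, j, max_distance))
    return restraints


if __name__ == "__main__":
    # Made-up scores for four residue pairs.
    scored = [((1, 2), 0.9), ((3, 40), 0.8), ((10, 25), 0.5), ((5, 60), 0.7)]
    for restraint in top_pairs_to_restraints(scored, n_top=2):
        print(restraint)
```

A list like this is what would then be handed to a simulated annealing or molecular dynamics package as upper-bound restraints, much like an NOE list in an NMR structure calculation.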

Perhaps the most impressive fact about this is that EVfold is able to predict a structure in less than an hour from only sequence information. That is incredible compared with the days of supercomputer time needed for other ab initio methods like ROSETTA.

So has EVfold solved the structure prediction problem? Hardly. There are many proteins where finding 1000+ homologous sequences will be hard, even with advances in sequencing technology (vertebrate-only proteins, for instance). Also, the authors suggest that even with perfect distance constraints the simulated annealing methods will not be able to predict structures at better than 2Å. So major advances in refining structures are needed before the crystallographers will be out of a job.

Still, there are lots of applications for which 3-5Å models of widespread folds would be useful. For instance, one of the major difficulties I’ve run into in my work on fold space is that we know there are thousands of proteins which are dissimilar to all known structures. Do these represent new folds, or are they just more variants of existing known folds? The speed of EVfold means that it should be fairly easy to predict structures for all of these domains which have enough sequence information out there. That’s not as good as having experimentally determined structures for everything, but it could give us some intriguing insights into the completeness of protein fold space.

PDB40

November 22, 2011 | Posted in Science

I had a great time over Halloween at PDB40 at Cold Spring Harbor Laboratory, celebrating 40 years of protein structure. I especially liked the talks from older structural biologists about the history of structural biology. It’s amazing to see the difficulties the early structural biologists overcame to solve and analyze proteins, as well as how far we’ve come since Kendrew solved the first protein structure in 1958. Still, I feel a little sad that I missed the days when proteins were solved like this:

John Kendrew with model of myoglobin in progress. © MRC Laboratory of Molecular Biology

A Richards Box, which used mirrors to visually overlay a Kendrew model on top of hand-drawn electron density maps. This public domain image comes from Proteopedia and depicts Fred Richards' original box.

and diagrams of proteins were works of art:

Ribbon schematic (hand drawn & colored, in 1981) of the 3D structure of the protein triose phosphate isomerase. The barrel of 8 beta-strands is shown by green arrows and the 8 alpha-helices as brown spirals. By Jane Richardson.

I presented an updated poster with my work on Fold-space [PDF, 3.3MB].

Protein of the Day: GluCl

November 22, 2011 | Posted in Science

I mostly post pretty proteins to twitter, but this one deserves more than one image. It’s a membrane pore protein called glutamate-gated chloride channel (GluCl). One of my students pointed it out to me as the target of two drugs to treat filariasis, which refers to a collection of diseases caused by parasitic worms including river blindness and elephantiasis. They are considered neglected tropical diseases, meaning they’re a big problem in developing countries, but don’t get as much attention from Big Pharma as they warrant because most patients are poor.

GluCl sits in the membrane of worm cells and selectively lets chloride ions through, depending on the concentration of the molecule glutamate outside the cell. One structure for GluCl is 3RI5. This is from the benign worm C. elegans, but we assume it works similarly in the related disease-causing worms. Click the link and you’ll see this beauty:

Looking down the pore of GluCl. The five "arms" are antibodies, used to stabilize the protein during crystallization.

Sort of looks like a starfish, huh? The grey arms aren’t part of the protein themselves. They’re only there to make it crystallize. A more accurate picture of the protein might be something like this:

A side view of just the GluCl protein. Red and blue patches represent negatively and positively charged portions of the protein. An activator of the protein, Ivermectin, is shown in green. A few other small molecules are also shown in black, which are not important biologically but help stabilize the crystal.

A couple cool things are visible in this picture. First, the region where the protein spans the membrane is clearly visible as a white strip, since the amino acids within the membrane must be hydrophobic. Second, we can see the drug Ivermectin (green sticks) bound in a pocket near the surface of the membrane. This forces the pore to stay open, flooding worm cells with chloride and killing them. You can see the pore clearly in the top view:

The large pore in GluCl is visible in this view, which looks down on the protein from outside the cell. At the very bottom, the molecule picrotoxin blocks the channel, preventing chloride ions from flowing.

The pore is wide at the top, on the outside of the cell, and narrows to a tiny hole at the cytosolic side. In this crystal there are two drugs at work. The Ivermectin is holding the pore in its “open” state, so we can see all the way through the protein. However, it is blocked by another molecule, picrotoxin, which “clogs the drain” by binding in the pore.

For more details, read the paper for this structure:

Hibbs, R.E., Gouaux, E. Principles of activation and permeation in an anion-selective Cys-loop receptor. (2011) Nature 474: 54-60 [PubMed] [DOI]

ISMB

I had a great time attending 3DSIG and ISMB/ECCB in Vienna. The quality of the talks was very high and it was fun to meet so many other computational biologists. It is nice to finally put faces and personalities to names which I previously knew only through their papers.

My work was included twice at the conference. Andreas had a poster and laptop demo of the CE-CP and CE-symm tools as part of his poster ‘The RCSB PDB Protein Comparison Tool’ at 3DSIG. I also had a poster of my own with the rather presumptuous title “A comprehensive Review of Protein Fold Space and the Correlation of Structure with Function.”

Mass Spectrometry Review of PTMs

I’ve been taking CHEM 283 with Dr. Majid Ghassemian. It’s a practical lab course on mass spectrometry. For the final, we had to analyze a sample of alpha-casein (the main protein in cow milk) and present a report on some aspect of the results. I chose to focus on methods for identifying post-translational modifications using mass spec, as exemplified by the programs MASCOT, InsPecT, and MS-Alignment.

Self-analysis: Good journal-club-type discussion of the algorithms. The data is largely irrelevant to my point (due to lack of InsPecT training on QStar), but I had to fit in my experimental results somehow.

IL-1 Principal Motions

For my rotation with Pat Jennings and José Onuchic I’ve been analyzing simulations of the IL-1 complex. I made up this page to display some of the movies I made of principal motions in IL-1R bound to either IL-1β or the antagonist.