Spencer Bliven

Thoughts and Research


I had a great time attending 3DSIG and ISMB/ECCB in Vienna. The quality of the talks was very high and it was fun to meet so many other computational biologists. It is nice to finally put faces and personalities to names which I previously knew only through their papers.

My work was included twice at the conference. Andreas presented a laptop demo of the CE-CP and CE-symm tools as part of his poster ‘The RCSB PDB Protein Comparison Tool’ at 3DSIG. I also had a poster of my own with the rather presumptuous title “A Comprehensive Review of Protein Fold Space and the Correlation of Structure with Function.”

Passed Quals!

July 22, 2011 | Posted in General

On June 7th I officially passed my qualifying exam. It consisted of a written report and an oral presentation. Many thanks to my committee: Ruben Abagyan, Pat Jennings, and Andy McCammon.

All things lead to Philosophy

May 27, 2011 | Posted in Math

The following meme has been kicking around reddit and other sites recently:

Wikipedia trivia: if you take any article, click on the first link in the article text not in parentheses or italics, and then repeat, you will eventually end up at "Philosophy". –xkcd #903 alttext

Clearly this is not true for every single article (for instance, it is currently not true for "Philosophy" itself), but it is true for a surprising number of pages. There are hundreds of forum posts by people marveling over the length of the path from cheese to philosophy and drawing deep connections to the scourge of lactose intolerance. And of course, the popularity of the meme has led to an extensive edit war among people trying to ‘fix’ pages so that they conform to the rule, people trying to ‘break’ pages out of spite, and those just trying to revert the changes made by the first two groups. In my opinion, it is unsurprising that pages like “Philosophy” exist.

Wikipedia conjecture. Consider a directed graph where every node has outgoing degree exactly one. If nodes are added randomly in a scale-free manner (e.g. the probability of linking to an existing node is proportional to that node’s incoming degree), then the expected fraction of nodes in the largest connected component will increase monotonically.
Philosophy corollary. Given a sufficiently large such graph, there will be a node which is reachable from a large fraction of the graph. Here “large” is defined as “large enough to stimulate discussion by bored reddit denizens.”

Perhaps I’ll try to prove this after quals, when I have more time for nonsense.
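
Short of a proof, a quick simulation gives a feel for the corollary. The sketch below is not the growth process from the conjecture: it assumes a static graph in which each node picks its single outgoing link with probability proportional to a Zipf-like popularity weight (1/rank), then finds the node that the most link-following walks pass through. The function names, the weighting, and the random seed are all choices made for this illustration.

```python
import random
from collections import Counter


def random_link_graph(n, rng):
    """Give each of n nodes exactly one outgoing link.

    Assumption: targets are drawn with probability proportional to
    1/(rank + 1), a stand-in for "popular pages attract more links".
    """
    weights = [1.0 / (rank + 1) for rank in range(n)]
    # targets[i] is the node that node i links to (self-links are allowed)
    return rng.choices(range(n), weights=weights, k=n)


def most_reachable(targets):
    """Return (node, fraction of start nodes whose walk visits that node)."""
    hits = Counter()
    for start in range(len(targets)):
        seen = set()
        node = start
        while node not in seen:  # stop once the walk revisits a node
            seen.add(node)
            node = targets[node]
        hits.update(seen)  # each visited node is reachable from start
    node, count = hits.most_common(1)[0]
    return node, count / len(targets)


if __name__ == "__main__":
    rng = random.Random(903)
    for n in (100, 1000, 5000):
        node, fraction = most_reachable(random_link_graph(n, rng))
        print(f"n={n}: node {node} is reached from {fraction:.0%} of nodes")
```

A start node counts as reaching itself, so the reported fraction is never zero. Swapping the weights for uniform ones is an easy way to see how much the heavy tail matters.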

Java Interpreters

April 29, 2011 | Posted in Technology

I’m a big fan of the iPython interpreter. I like having an interpreter running while I develop, for prototyping and debugging. Since I currently develop primarily in Java, I thought I’d take a look at what Java interpreters are available. There were three main features I wanted, in order of importance:

  1. Basic history and command editing, at least as good as bash.
  2. Autocompletion. At a minimum, autocomplete built-in commands and previously seen code. Ideally, autocomplete instance methods from loaded libraries.
  3. Eclipse compatibility. Ideally, it should run as an Eclipse plugin. Barring that, it should be able to find the most recently compiled version of class files (for instance, through a local Maven repo).

Sadly, none of the solutions I found fulfilled all three of my requirements.

1. Groovy

Groovy is a dynamic language that runs on the JVM. Most Java code is also valid Groovy code, but Groovy includes a lot of nice dynamic features à la Python or Ruby, such as dynamic typing. The community feels very Rails-like, with a popular agile web framework (Grails) which holds most of the die-hard interest, and plenty of hip conferences.

Pros: Dynamic language, fully compatible with java. Comes with an interpreter. Active development, including a MacPorts installer. Strong community. Bash-like history feature.
Cons: Hard to configure correctly (classpaths, maven integration, etc). No autocompletion, no eclipse integration.

2. BeanShell

BeanShell is a java interpreter. It’s actually quite similar to Groovy, but positions itself as a dev tool rather than a new language. The documentation refers to autocompletion features, but they didn’t work for me. It only seems to have been developed for 6 months in 2005 by two developers, so that’s not surprising.

Pros: Bash-like history feature. Embeddable!!!! (<--this is useless to me.)
Cons: No development since 2005, no autocompletion, no eclipse integration, tricky to get classpath right.

No autocompletion? Did you try jLine, you ask? Yes, I did find and follow those arcane instructions for wrapping BeanShell with the java version of readline. It was a pain, and it didn’t even autocomplete words from my history. FAIL.

3. EclipseShell

The command-line tools didn’t seem to be cutting it, so I checked out EclipseShell, which is an Eclipse wrapper for BeanShell. This one almost worked, but feels like a beta or first release. Screenshots show autocompletion, but it sure doesn’t work on current versions of Eclipse. The interface aims for a MATLAB-style cell format, but just looks like a text editor. In short, good idea but no follow-through.

Pros: Edit like a text file, eclipse integration, easy installation.
Cons: broken autocompletion, no recent development.

My location over the past year

April 22, 2011 | Posted in Technology

Recently there has been much ado over the discovery that the iPhone keeps a log of everywhere you’ve been. I choose to push my paranoia aside and focus on the benefits of this: a cool app that lets you visualize your travels.

Here’s my map, from last June through the present. You can see my route on 8/26-27/2010 when I drove from Seattle to San Diego. And there’s a video below! A few notes:

  • Observations are binned into rectangles of 1/100 of a degree. The size of the circles represents the number of observations in that square centidegree over the time period.
  • Location is calculated by cell towers rather than GPS, so it’s not very accurate. For instance, I have never been to Eureka, CA, despite a number of observations to the contrary.
  • The phone doesn’t seem to log locations unless you switch towers, so the movie skips over periods of time where I stayed in one place.
  • I had to modify the source code slightly to get the movie below to progress in 1-hour intervals. The original showed 1-week timesteps in a half-hearted attempt to prevent housewives from using it to spy on their husbands.
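
The binning described in the first note is easy to sketch. This is not the app’s own source, just a minimal illustration that assumes observations arrive as (latitude, longitude) pairs; the function names and the sample coordinates are made up for the example.

```python
import math
from collections import Counter


def centidegree_bin(lat, lon):
    """Snap a coordinate to its 0.01-degree cell.

    The cell is identified by its south-west corner in integer hundredths
    of a degree, e.g. (3288, -11724) for 32.88 N, 117.24 W.
    """
    return (math.floor(lat * 100), math.floor(lon * 100))


def bin_observations(points):
    """Count how many observations fall into each square centidegree."""
    return Counter(centidegree_bin(lat, lon) for lat, lon in points)


# Hypothetical observations: two in the same cell, one far away.
observations = [
    (32.8812, -117.2341),
    (32.8867, -117.2399),
    (47.6062, -122.3321),
]

for cell, count in bin_observations(observations).items():
    print(cell, count)  # count would drive the circle size on the map
```

Using floor rather than rounding keeps every cell the same size on both sides of the equator and the prime meridian.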

Secure Synergy

April 15, 2011 | Posted in Technology

Synergy is a really cool little program that allows one to share a keyboard, mouse, and clipboard seamlessly between multiple computers. I have it set up at work so that I can use my desktop keyboard and mouse to control my laptop.

I’ve been happy with it, but this morning it occurred to me that anyone on my work network could theoretically view all my keystrokes. So today I implemented a script to securely connect to the synergy server from my laptop. It is based on a suggestion from the synergy FAQ.

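#!/bin/bash
# Opens a secure synergyc connection
# usage: synergyc_secure server [synergyc options]
# Author: Spencer Bliven

SERVER="$1"; shift   # first argument is the synergy server; the rest go to synergyc
LPORT=24800          # local end of the tunnel (assumed: synergy's default port)
RPORT=24800          # port the synergy server listens on (assumed default)

# Forward LPORT to the server over ssh, then point synergyc at the tunnel.
# 'sleep 10' keeps the tunnel open long enough for synergyc to connect.
ssh -x -f -L $LPORT:localhost:$RPORT -o ExitOnForwardFailure=yes \
    "$SERVER" 'sleep 10' &&
synergyc "$@" localhost:$LPORT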

Site Updates: Research

February 2, 2011 | Posted in Updates

Graduate Cuisine

January 28, 2011 | Posted in General

I attend a lot of seminars as a grad student. Most of these include free food, which is both a blessing and a curse. A typical week for me:

Monday No seminars on Mondays, ever. Get lunch from the taco truck which parks just outside my window.
Tuesday Bourne Journal Club. Papa John’s sausage & pepperoni pizza.
Wednesday Mass spec seminar. Papa John’s sausage & pepperoni pizza.
Thursday Bioinformatics seminar. Papa John’s sausage & pepperoni pizza, cookies, or faculty lunch (depends on week).
Friday Pevzner Journal Club. Costco sandwiches. Usually free beer at a happy hour (CS, BMS, Pharmacy, or our lab, depending on week).

Not exactly balanced, but it’s difficult to say no to free food!

Surprises with floating point operations

December 17, 2010 | Posted in Programming, Technology

At work I am currently writing software that calculates some thermodynamic properties of proteins. I recently refactored some of the code and I wanted to make sure that I didn’t screw something up and change the output. So I was concerned when I compared the old and new output and discovered some differences:

Corex output before and after the refactor. The second column of numbers in each file gives the conformational entropy (Sconf).

Looking into the difference further, I finally tracked it back to a single change in the C source code. The original programmer liked to write out additions explicitly: Sconf=Sconf+backboneEntropy+sidechainEntropy; During the refactor I changed this to the more readable (IMHO) Sconf += backboneEntropy + sidechainEntropy;

This small change resulted in all the numerical differences I was seeing. To better understand the reason for the difference, here’s a simple C program:

#include <stdio.h>
int main(int argc, char *argv[]) {
    float a, b, sum1, sum2;

    a = 4.1;
    b = -0.12;
    sum1 = sum2 = 11.9;

    sum1 = sum1 + a + b;  /* evaluated as (sum1 + a) + b */
    sum2 += a + b;        /* evaluated as sum2 + (a + b) */

    printf("=+ %f\n+= %f\n", sum1, sum2);
    return 0;
}

This yields the output:

=+ 15.880000
+= 15.879999

Here is the unoptimized assembly code for lines 9-10, commented with “pseudo-c” descriptions of what’s happening. XMM0 and XMM1 are two of the 128-bit SSE registers used by 64-bit processors (like my Intel Core 2 Duo) for floating point operations. However, the ‘ss’ suffix (scalar single-precision) on the load, store, and add instructions means that only the lowest 32 bits of the register take part in the computation; the upper bits are not used.

	.loc 1 9 0    ;line 9
	movss	-12(%rbp), %xmm0  ; XMM0 = sum1
	addss	-4(%rbp), %xmm0  ; XMM0 += a
	addss	-8(%rbp), %xmm0  ; XMM0 += b
	movss	%xmm0, -12(%rbp)  ; sum1 = XMM0
	.loc 1 10 0    ;line 10
	movss	-4(%rbp), %xmm0  ; XMM0 = a
	movaps	%xmm0, %xmm1  ; XMM1 = XMM0
	addss	-8(%rbp), %xmm1  ; XMM1 += b
	movss	-16(%rbp), %xmm0  ; XMM0 = sum2
	addss	%xmm1, %xmm0  ; XMM0 += XMM1
	movss	%xmm0, -16(%rbp)  ; sum2 = XMM0

So basically, the difference between “=…+” and “+=” is just the order of operations; the former does (sum1 + a) + b while the latter does sum2 + (a + b). Floating point addition is not associative, because every intermediate result is rounded to 32 bits, so the two groupings can round differently. The small differences in rounding behavior between these two variants add up to account for the 0.01 differences I was observing in my program.
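
The same effect can be reproduced without a C compiler. The sketch below emulates single-precision arithmetic in Python by rounding every intermediate result to 32 bits with the standard `struct` module; the helpers `f32` and `add32` are names introduced just for this illustration, and the emulation assumes the usual round-to-nearest-even mode.

```python
import struct


def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def add32(x, y):
    """Single-precision add: compute the sum in double, then round to 32 bits."""
    return f32(x + y)


a = f32(4.1)
b = f32(-0.12)
start = f32(11.9)

sum1 = add32(add32(start, a), b)  # (sum + a) + b, like "sum1 = sum1 + a + b"
sum2 = add32(start, add32(a, b))  # sum + (a + b), like "sum2 += a + b"

print("=+ %f\n+= %f" % (sum1, sum2))
```

With these inputs both groupings hit an exact tie when an intermediate sum is rounded, and round-to-even resolves the two ties in opposite directions, which is why the results differ in the last printed digit.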

Executing commands from TextWrangler/BBEdit

December 6, 2010 | Posted in Technology

I use TextWrangler occasionally for editing code. It has a well-developed AppleScript dictionary, so I decided to write a script to execute commands directly from TextWrangler.

The following AppleScript copies either the selected text or the current line into your application of choice. For instance, “iTerm” is a reasonable choice, as that would allow you to start a command (ipython, for instance), and then quickly modify and execute (Python) statements without overwriting your clipboard. The pastebin is configured for R code since I originally wrote it in response to a request by an R user.

EDIT: I also uploaded a pre-compiled version of the script (using Terminal).