Spencer Bliven

Thoughts and Research

All things lead to Philosophy

May 27, 2011 | Posted in Math

The following meme has been kicking around reddit and other sites recently:

Wikipedia trivia: if you take any article, click on the first link in the article text not in parentheses or italics, and then repeat, you will eventually end up at "Philosophy". –xkcd #903 alttext

Clearly this is not true for every single article (for instance, it is currently not true for "Philosophy" itself), but it is true for a surprising number of pages. There are hundreds of forum posts by people marveling over the length of the path from cheese to philosophy and drawing deep connections to the scourge of lactose intolerance. And of course, the popularity of the meme has led to an extensive edit war between people trying to ‘fix’ pages so that they conform to the rule, people trying to ‘break’ pages out of spite, and those just trying to revert all the changes made by the previous two groups. To me, the existence of pages like “Philosophy” seems very unsurprising.

Wikipedia conjecture. Consider a directed graph where every node has outgoing degree exactly one. If nodes are added randomly in a scale-free manner (e.g. the probability of linking to an existing node is proportional to that node’s incoming degree), then the expected fraction of nodes in the largest connected component will increase monotonically.
Philosophy corollary. Given a sufficiently large such graph, there will be a node which is reachable from a large fraction of the graph. Here “large” is defined as “large enough to stimulate discussion by bored reddit denizens.”
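A quick way to poke at the conjecture is to simulate it. The sketch below is only one possible reading of the setup, and several details are my own assumptions: the graph starts from a handful of seed nodes that link to themselves, each new node links to an existing node with probability proportional to that node's incoming degree plus one (so that nodes nobody links to yet can still be picked), and links are never edited afterwards. The names `simulate`, `num_seeds` and `urn` are made up for the example.

```python
import random
from collections import Counter

def simulate(num_nodes, num_seeds=5, rng=None):
    """Grow a graph in which every node has exactly one outgoing link.

    Returns the list of links and the fraction of nodes in the largest
    connected component.
    """
    rng = rng or random.Random(0)
    link = list(range(num_seeds))   # assumption: seed nodes link to themselves
    root = list(range(num_seeds))   # root[i]: seed reached by following links from i
    # A node appears in the urn once per incoming link, plus once more so
    # that nodes without incoming links can still be chosen (assumption).
    urn = list(range(num_seeds)) * 2
    for node in range(num_seeds, num_nodes):
        target = rng.choice(urn)    # preferential attachment
        link.append(target)
        root.append(root[target])   # the new node joins its target's component
        urn.extend([node, target])
    sizes = Counter(root)
    return link, max(sizes.values()) / num_nodes

link, largest_fraction = simulate(1000)
print("largest component holds %.0f%% of the nodes" % (100 * largest_fraction))
```

Under these assumptions the seed at the root of the largest component plays the role of "Philosophy": every node in that component reaches it by following links.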

Perhaps I’ll try to prove this after quals, when I have more time for nonsense.

Java Interpreters

April 29, 2011 | Posted in Technology

I’m a big fan of the iPython interpreter. I like having an interpreter running while I develop for prototyping and debugging. Since I currently develop in java primarily, I thought I’d take a look at what java interpreters are available. I had three main features which I wanted. In order of importance:

  1. Basic history and command editing, at least as good as bash.
  2. Autocompletion. At a minimum, autocomplete built-in commands and previously seen code. Ideally, autocomplete instance methods from loaded libraries.
  3. Eclipse compatibility. Ideally, it should run as an eclipse plugin. Barring that, it should be able to find the most recently compiled version of class files (for instance, through a local maven repo).

Sadly, none of the solutions I found fulfilled all three of my requirements.

1. Groovy

Groovy is a dynamic language that runs on the JVM. Most Java code is valid Groovy code, but groovy includes a lot of nice dynamic features à la python or ruby, such as dynamic typing. The community feels very rails-like, with a popular agile web framework (Grails) which holds most of the die-hard interest, and plenty of hip conferences.

Pros: Dynamic language, fully compatible with java. Comes with an interpreter. Active development, including a MacPorts installer. Strong community. Bash-like history feature.
Cons: Hard to configure correctly (classpaths, maven integration, etc). No autocompletion, no eclipse integration.

2. BeanShell

BeanShell is a java interpreter. It’s actually quite similar to Groovy, but positions itself as a dev tool rather than a new language. The documentation refers to autocompletion features, but they didn’t work for me. It only seems to have been developed for 6 months in 2005 by two developers, so that’s not surprising.

Pros: Bash-like history feature. Embeddable!!!! (<--this is useless to me.)
Cons: No development since 2005, no autocompletion, no eclipse integration, tricky to get classpath right.

No autocompletion? Did you try jLine, you ask? Yes, I did find and follow those arcane instructions for wrapping BeanShell with the java version of readline. It was a pain, and it didn’t even autocomplete words from my history. FAIL.

3. EclipseShell

The command-line tools didn’t seem to be cutting it, so I checked out EclipseShell, which is an eclipse wrapper for BeanShell. This one almost worked, but feels like a beta or first release. Screenshots show autocompletion, but it sure doesn’t work on current versions of eclipse. The interface tries to use a matlab-style cell format, but just looks like a text editor. In short, good idea but no follow-through.

Pros: Edit like a text file, eclipse integration, easy installation.
Cons: broken autocompletion, no recent development.

My location over the past year

April 22, 2011 | Posted in Technology

Recently there has been much ado over the discovery that the iPhone keeps a log of everywhere you’ve been. I choose to push my paranoia aside and focus on the benefits of this: a cool app that lets you visualize your travels.

Here’s my map, from last June through the present. You can see my route on 8/26-27/2010 when I drove from Seattle to San Diego. And there’s a video below! A few notes:

  • Observations are binned into rectangles of 1/100 of a degree. The size of the circles represents the number of observations in that square centidegree over the time period.
  • Location is calculated by cell towers rather than GPS, so it’s not very accurate. For instance, I have never been to Eureka, CA, despite a number of observations to the contrary.
  • The phone doesn’t seem to log locations unless you switch towers, so the movie skips over periods of time where I stayed in one place.
  • I had to modify the source code slightly to get the movie below to progress in 1-hour intervals. The original showed 1-week timesteps in a half-hearted attempt to prevent housewives from using it to spy on their husbands.
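The binning described in the first note can be sketched in a few lines. This is not the app's actual code, just an illustration of counting observations per square centidegree; the function name `bin_observations` and the sample coordinates are made up.

```python
import math
from collections import Counter

def bin_observations(points, cells_per_degree=100):
    """Count (latitude, longitude) observations per 1/100-degree cell."""
    counts = Counter()
    for lat, lon in points:
        # floor() so that negative longitudes fall into the right cell
        cell = (math.floor(lat * cells_per_degree),
                math.floor(lon * cells_per_degree))
        counts[cell] += 1
    return counts

# Made-up sample points: two fall into the same cell, one is far away.
points = [(32.7157, -117.1611), (32.7153, -117.1618), (47.6062, -122.3321)]
counts = bin_observations(points)
for cell, n in counts.items():
    print(cell, n)
```

The circle drawn for each cell would then be scaled by its count.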

Secure Synergy

April 15, 2011 | Posted in Technology

Synergy is a really cool little program that allows one to share a keyboard, mouse, and clipboard seamlessly between multiple computers. I have it set up at work so that I can use my desktop keyboard and mouse to control my laptop.

I’ve been happy with it, but this morning it occurred to me that anyone on my work network could theoretically view all my keystrokes. So today I implemented a script to securely connect to the synergy server from my laptop. It is based on a suggestion from the synergy FAQ.

#!/bin/bash
# Opens a secure synergyc connection
# 
# usage: synergyc_secure server [synergyc options]
#
# Author: Spencer Bliven

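# The server defaults to "desktop"; remaining arguments go to synergyc
SERVER="${1:-desktop}"
[ $# -gt 0 ] && shift
LPORT=24800   # local end of the tunnel
RPORT=24800   # synergy port on the server

# Open the tunnel in the background (-f). 'sleep 10' keeps it alive just
# long enough for synergyc to connect through it.
ssh -x -f -L "$LPORT:localhost:$RPORT" -o ExitOnForwardFailure=yes \
    "$SERVER" 'sleep 10' &&
synergyc "$@" "localhost:$LPORT"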

Site Updates: Research

February 2, 2011 | Posted in Updates

Graduate Cuisine

January 28, 2011 | Posted in General

I attend a lot of seminars as a grad student. Most of these include free food, which is both a blessing and a curse. A typical week for me:

Monday No seminars on Mondays, ever. Get lunch from the taco truck which parks just outside my window.
Tuesday Bourne Journal Club. Papa John’s sausage & pepperoni pizza.
Wednesday Mass spec seminar. Papa John’s sausage & pepperoni pizza.
Thursday Bioinformatics seminar. Papa John’s sausage & pepperoni pizza, cookies, or faculty lunch (depends on week).
Friday Pevzner Journal Club. Costco sandwiches. Usually free beer at a happy hour (CS, BMS, Pharmacy, or our lab, depending on week).

Not exactly balanced, but it’s difficult to say no to free food!

Surprises with floating point operations

December 17, 2010 | Posted in Programming, Technology

At work I am currently writing software that calculates some thermodynamic properties of proteins. I recently refactored some of the code and I wanted to make sure that I didn’t screw something up and change the output. So I was concerned when I compared the old and new output and discovered some differences:

Corex output before and after the refactor. The second column of numbers in each file gives the conformational entropy (Sconf).

Looking into the difference further, I finally tracked it back to a single change in the C source code. The original programmer liked to write out additions explicitly: Sconf=Sconf+backboneEntropy+sidechainEntropy; During the refactor I changed this to the more readable (IMHO) Sconf += backboneEntropy + sidechainEntropy;

This small change resulted in all the numerical differences I was seeing. To better understand the reason for the difference, here’s a simple C program:

#include <stdio.h>
int main(int argc, char *argv[]) {
    float a, b, sum1, sum2;

    a = 4.1;
    b = -0.12;
    sum1 = sum2 = 11.9;

    sum1 = sum1 + a + b;
    sum2 += a + b;

    printf("=+ %f\n+= %f\n", sum1, sum2);
    return 0;
}

This yields the output:

=+ 15.880000
+= 15.879999

Here is the unoptimized assembly code for lines 9-10, commented with “pseudo-c” descriptions of what’s happening. XMM0 and XMM1 are two of the 128-bit registers used by 64-bit processors (like my Intel Core 2 duo) for floating point operations. However, the ‘ss’ at the end of the move and add instructions stands for “scalar single-precision”: only the lowest 32 bits of the register are used in the computation, so every intermediate result is rounded to a 32-bit float.

	.loc 1 9 0    ;line 9
	movss	-12(%rbp), %xmm0  ; XMM0 = sum1
	addss	-4(%rbp), %xmm0  ; XMM0 += a
	addss	-8(%rbp), %xmm0  ; XMM0 += b
	movss	%xmm0, -12(%rbp)  ; sum1 = XMM0
	.loc 1 10 0    ;line 10
	movss	-4(%rbp), %xmm0  ; XMM0 = a
	movaps	%xmm0, %xmm1  ; XMM1 = XMM0
	addss	-8(%rbp), %xmm1  ; XMM1 += b
	movss	-16(%rbp), %xmm0  ; XMM0 = sum2
	addss	%xmm1, %xmm0  ; XMM0 += XMM1
	movss	%xmm0, -16(%rbp)  ; sum2 = XMM0

So basically, the difference between “=…+” and “+=” is just the order of operations; the former does (sum1 + a) + b while the latter does sum1 + (a + b). The small differences in rounding behavior between these two variants add up to account for the 0.01 differences I was observing in my program.
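The two orderings can also be replayed outside of C. Here is a small sketch that emulates single-precision addition in Python by rounding every intermediate result to 32 bits with the `struct` module. The helper `f32` is my own name, and the emulation assumes each float addition is rounded once to the nearest 32-bit value, which is what the `addss` instructions above do.

```python
import struct

def f32(x):
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

a, b, start = f32(4.1), f32(-0.12), f32(11.9)

sum1 = f32(f32(start + a) + b)   # (sum + a) + b, like "sum1 = sum1 + a + b"
sum2 = f32(start + f32(a + b))   # sum + (a + b), like "sum2 += a + b"

print("=+ %f\n+= %f" % (sum1, sum2))
```

With these inputs, both the intermediate sum 11.9 + 4.1 and the final sum in the second variant land exactly halfway between two representable floats, so the results end up one unit in the last place apart.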

Executing commands from TextWrangler/BBEdit

December 6, 2010 | Posted in Technology

I use TextWrangler occasionally for editing code. It has a well-developed applescript dictionary, so I decided to write a script to execute commands directly from TextWrangler.

The following Applescript copies either the selected text or the current line into your application of choice. For instance, “iTerm” is a reasonable choice, as that would allow you to start a command (ipython, for instance), and then quickly modify and execute (python) statements without overwriting your clipboard. The pastebin is configured for R code since I originally wrote it in response to a request by an R user.

EDIT: I also uploaded a pre-compiled version of the script (using Terminal).

Daily WTF

December 1, 2010 | Posted in Technology

Whenever I’m writing C code I am amazed that we have to keep track of things like array lengths and string terminators. It’s no wonder poor coders create weird bugs, like this one I found in my work code for storing atom names (like chlorine-1, carbon-2, etc).

char names[MAX_NAMES][6];
int num_names, i;
for(i=0; i < num_names; i++) {
    printf("(%d) %s\n", i, names[i]);
}

This is what I see:

(0)  CL1   CL2   C2 
(1)  CL2   C2 
(2)  C2 
(3)  CD2   N2 
(4)  N2 
...

WTF? Turns out there are exactly three spaces around each of those strings, so the 3-character names fill all six bytes of their slot and leave no room for the terminating NUL. That’s what happens when you strcpy strings into buffers that are too short to hold them.
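To see the mechanism without invoking undefined behavior, here is a rough Python model of that memory layout: one flat `bytearray` stands in for the `names` array, and a deliberately unchecked copy plays the role of `strcpy`. The exact padding of the names is my guess from the output above, and the helper names are made up.

```python
SLOT = 6  # bytes per name, as in char names[MAX_NAMES][6]

def unchecked_strcpy(mem, offset, text):
    """Copy text plus a terminating NUL with no bounds check, like strcpy."""
    data = text.encode("ascii") + b"\0"
    mem[offset:offset + len(data)] = data

def read_cstring(mem, offset):
    """Read bytes up to the first NUL, which is what printf's %s does."""
    end = mem.index(0, offset)
    return mem[offset:end].decode("ascii")

# Assumed padding: the first two names are 6 characters, the last one 5.
names = [" CL1  ", " CL2  ", " C2  "]
mem = bytearray(SLOT * len(names) + 1)

for i, name in enumerate(names):
    unchecked_strcpy(mem, i * SLOT, name)   # a 6-char name writes 7 bytes

printed = [read_cstring(mem, i * SLOT) for i in range(len(names))]
for i, text in enumerate(printed):
    print("(%d) %s" % (i, text))
```

Each 6-character name puts its NUL into the first byte of the next slot, where the next copy overwrites it, so reading slot 0 runs on through its neighbours. Making each slot 7 bytes wide would be enough to stop that in this model.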

Probabilistic Towers of Hanoi

November 23, 2010 | Posted in Math
The Lunch of Hanoi

Yellow curry on rice, Thai iced tea, and mango sticky rice.

At lunch after stats class I received the three to-go boxes at right. Probably due to some sort of brain-addling by the aforementioned stats class, this led me to think about the Towers of Hanoi problem, and how in real-life stacking problems (e.g. lunches, moving trucks, etc.) it is often OK to have one or two big boxes stacked on top of little boxes. This led to the following problem:

Probabilistic Towers of Hanoi

Like the Towers of Hanoi problem, we have three pegs and a stack of N disks. Disks are initially sorted on one peg, and the goal is to move single disks between pegs until all the disks are stacked in order on the second peg.

Unlike Towers of Hanoi, we allow larger disks to be stacked on top of smaller disks. This exponentially reduces the number of moves required, since it effectively reduces the stack height by 1 (recall that solving Towers of Hanoi requires 2^n-1 moves).

However, every time we stack a larger disk on top of a smaller disk, the tower falls over with probability p. In general, p could be some function g(\cdot) which varies depending on the differences in disk sizes, the height of the stack, etc., or it could be something simple like a fixed number. If the tower falls we lose the game and civilization is destroyed. Alternatively, maybe we just have to redo the fallen tower, wasting some moves to do so.

The Probabilistic Towers of Hanoi problem then becomes: Given some probability \epsilon, what is the best strategy for moving disks such that the expected probability of failure is less than \epsilon and the expected number of moves is minimal?

Solution

If there is a fixed probability p of losing the game each time we add a disk to a stack that contains a smaller disk, then k inverted moves all succeed with probability (1-p)^k. Keeping this at or above 1-\epsilon means we can make at most \left\lfloor \frac{\log(1-\epsilon)}{\log(1-p)}\right\rfloor inverted moves. These should be distributed in such a way as to reduce the number of moves as much as possible. For one inverted move, this is as follows:

  1. Disks 1…(n-2) from A to C (2^{n-2}-1 moves)
  2. Disk (n-1) from A to C (1 inverted move)
  3. Disk n from A to B (1 move)
  4. Disk (n-1) from C to B (1 move)
  5. Disks 1…(n-2) from C to B (2^{n-2}-1 moves)

This takes a total of 2(2^{n-2}-1)+3 = 2^{n-1}+1 moves, including one inversion. That’s roughly half the 2^n-1 moves of the original algorithm. If additional inverted moves are allowed while keeping the probability of failure low, they should be distributed evenly between steps 1 and 5 to further reduce the moves. Given sufficient inverted moves, this procedure will move a stack of n disks in only 3(2^{n/2}-1) moves.
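The bookkeeping is easy to check numerically. Below is a minimal sketch, assuming the five-step procedure is applied recursively and the remaining inversions are split as evenly as possible between steps 1 and 5. The function names are mine, and I have not shown that this split is actually optimal.

```python
import math

def max_inverted_moves(eps, p):
    """Largest k for which k inverted moves all succeed with probability >= 1 - eps."""
    return math.floor(math.log(1 - eps) / math.log(1 - p))

def moves(n, inversions):
    """Moves needed for n disks when at most `inversions` inverted moves are allowed."""
    if n <= 2 or inversions == 0:
        return 2 ** n - 1                    # ordinary Towers of Hanoi
    rest = inversions - 1                    # one inversion is spent in step 2
    return (moves(n - 2, rest - rest // 2)   # step 1
            + 3                              # steps 2-4
            + moves(n - 2, rest // 2))       # step 5

budget = max_inverted_moves(eps=0.1, p=0.01)
for n in range(3, 9):
    print(n, 2 ** n - 1, moves(n, 1), moves(n, budget))
```

Each row lists n, the classic move count, the count with a single inversion, and the count with the full inversion budget for a 10% failure tolerance at p = 0.01.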

Open Problems

The fixed-probability case isn’t particularly interesting statistically. A more interesting probability function would be some physics-inspired statistic reflecting how ‘top-heavy’ a pile is; a large disk perched precariously on top of a stack would be more likely to fall than a medium disk on a slightly smaller base. I feel like some surprising behavior could emerge in these situations requiring more clever algorithms.

Note: the count of 3(2^{n/2}-1) moves given above is for even n. For odd n, the procedure takes 2^{(n+3)/2}-3 moves.