Spencer Bliven

Thoughts and Research

Graduate Cuisine

January 28, 2011 | Posted in General, Tagged

I attend a lot of seminars as a grad student. Most of these include free food, which is both a blessing and a curse. A typical week for me:

MondayNo seminars on Mondays, ever. Get lunch from the taco truck which parks just outside my window.
Tuesday Bourne Journal Club. Papa John’s sausage & pepperoni pizza.
Wednesday Mass spec seminar. Papa John’s sausage & pepperoni pizza.
Thursday Bioinformatics seminar. Papa John’s sausage & pepperoni pizza, cookies, or faculty lunch (depends on week).
Friday Pevzner Journal Club. Costco sandwiches. Usually free beer at a happy hour (CS, BMS, Pharmacy, or our lab, depending on week).

Not exactly balanced, but it’s difficult to say no to free food!

Surprises with floating point operations

December 17, 2010 | Posted in Programming,Technology, Tagged , , , , , , , ,

At work I am currently writing software that calculates some thermodynamic properties of proteins. I recently refactored some of the code and I wanted to make sure that I didn’t screw something up and change the output. So I was concerned when I compared the old and new output and discovered some differences:

Corex output before and after the refactor. The second column of numbers in each file gives the conformational entropy (Sconf).

Looking into the difference further, I finally tracked it back to a single change in the C source code. The original programmer liked to write out additions explicitly: Sconf=Sconf+backboneEntropy+sidechainEntropy; During the refactor I changed this to the more readable (IMHO) Sconf += backboneEntropy + sidechainEntropy;

This small change resulted in all the numerical differences I was seeing. To better understand the reason for the difference, here’s a simple C program:

#include <stdio.h>
int main(int args, char *argv[]) {
    float a, b, sum1, sum2;

    a= 4.1;
    b = -0.12;
    sum1 = sum2 = 11.9;

    sum1 = sum1 + a + b;
    sum2 += a + b;

    printf("=+ %f\n+= %f",sum1,sum2);
    return 0;
}

This yields the output =+ 15.880000 += 15.879999

Here is the unoptimized assembly code for lines 9-10, commented with “pseudo-c” descriptions of what’s happening. XMM0 and XMM1 are two 128-bit registers used by 64-bit processors (like my Intel Core 2 duo) for floating point operations. However, the ‘ss’ at the end of all the operations means that only the lowest 32 bits are used in the computation, with overflow being discarded.

	.loc 1 9 0    ;line 9
	movss	-12(%rbp), %xmm0  ; XMM0 = sum1
	addss	-4(%rbp), %xmm0  ; XMM0 += a
	addss	-8(%rbp), %xmm0  ; XMM0 += b
	movss	%xmm0, -12(%rbp)  ; sum1 = XMM0
	.loc 1 10 0    ;line 10
	movss	-4(%rbp), %xmm0  ; XMM0 = a
	movaps	%xmm0, %xmm1  ; XMM1 = XMM0
	addss	-8(%rbp), %xmm1  ; XMM1 += b
	movss	-16(%rbp), %xmm0  ; XMM0 = sum2
	addss	%xmm1, %xmm0  ; XMM0 += XMM1
	movss	%xmm0, -16(%rbp)  ; sum2 = XMM0

So basically, the difference between “=…+” and “+=” is just the order of operations; the former does (sum1 + a) + b while the latter does sum1 + (a + b). The small differences in rounding behavior between these two variants add up to account for the 0.01 differences I was observing in my program.

Executing commands from TextWrangler/BBEdit

December 6, 2010 | Posted in Technology, Tagged , , , , , , , ,

I use TextWrangler occasionally for editing code. It has a well-developed applescript dictionary, so I decided to write a script to execute commands directly from TextWrangler.

The following Applescript copies either the selected text or the current line into your application of choice. For instance, “iTerm” is a reasonable choice, as that would allow you to start a command (ipython, for instance), and then quickly modify and execute (python) statements without overwriting your clipboard. The pastebin is configured for R code since I originally wrote it in response to a request by an R user.

EDIT: I also uploaded a pre-compiled version of the script (using Terminal).

Daily WTF

December 1, 2010 | Posted in Technology, Tagged , , , , , ,

Whenever I’m writing C code I am amazed that we have to keep track of things like array lengths and string terminators. It’s no wonder poor coders create weird bugs, like this one I found in my work code for storing atom names (like chlorine-1, carbon-2, etc).

char names[MAX_NAMES][6];
int num_names, i;
for(i=0; i < num_names; i++) {
    printf("(%d) %s\n", i, names[i]);
}

This is what I see:

(0)  CL1   CL2   C2 
(1)  CL2   C2 
(2)  C2 
(3)  CD2   N2 
(4)  N2 
...

WTF? Turns out there are exactly three spaces around each of those strings, making the 3-character names overflow. That’s what happens when you use strcpy on strings that are too short to contain the result.

Probabilistic Towers of Hanoi

November 23, 2010 | Posted in Math, Tagged , , , ,
The Lunch of Hanoi

Yellow curry on rice, Thai iced tea, and mango sticky rice.

At lunch after stats class I received the three to-go boxes at right. Probably due to some sort of brain-addling by aforementioned stats class, this lead me to think about the Towers of Hanoi problem, and how in real life stacking problems (eg lunches, moving trucks, etc) it is often ok to have one or two big boxes stacked on top of little boxes. This led to the following problem:

Probabilistic Towers of Hanoi

Like the Towers of Hanoi problem, we have three pegs and a stack of N disks. Disks are initially sorted on one peg, and the goal is to move single disks between pegs until all the disks are stacked in order on the second peg.

Unlike Towers of Hanoi, we allow larger disks to be stacked on top of smaller disks. This exponentially reduces the number of moves required, since it effectively reduces the stack height by 1 (recall that solving Towers of Hanoi requires 2^n-1 moves).

However, every time we stack a larger disk on top of a smaller disk, the tower falls over with probability p. In general, p could be some function g(\cdot) which varies depending on the differences in disk sizes, the height of the stack, etc., or it could be something simple like a fixed number. If the tower falls we lose the game and civilization is destroyed. Alternately, maybe we just have to redo the fallen tower, wasting some moves to do so.

The probabilistic towers of hanoi problem then becomes: Given some probability \epsilon, what is the best strategy for moving disks such that the expected probability of failure is less than \epsilon and the expected number of moves is minimal?

Solution

If there is a fixed probability p of losing the game each time we add a disk to a stack that contains a smaller disk, we can make at most \left\lfloor \frac{log(1-\epsilon)}{log(1-p)}\right\rfloor inverted moves. These should be distributed in such a way as to reduce the number of moves as much as possible. For one inverted move, this is as follows:

  1. Disks 1…(n-2) from A to C (2^{n-2}-1 moves)
  2. Disk (n-1) from A to C (1 inverted move)
  3. Disk n from A to B (1 move)
  4. Disk (n-1) from C to B (1 move)
  5. Disks 1…(n-2) from C to B (2^{n-2}-1 moves)

This takes a total of 2^{n-1}-1 moves, including one inversion. That’s around 50% better than the original algorithm. If additional inverted moves are allowed while keeping the probability of failure low, they should be distributed evenly between steps 1 and 5 to further reduce the moves. Given sufficient inverted moves, this procedure will move a stack of n disks in only 3(2^{n/2}-1) moves.

Open Problems

The fixed-probability case isn’t particularly interesting statistically. A more interesting probability function would be some physics-inspired statistic reflecting how ‘top-heavy’ a pile is; a large disk perched precariously on top of a stack would be more likely to fall than a medium disk on a slightly smaller base. I feel like some surprising behavior could emerge in these situations requiring more clever algorithms.

For even n. For odd n, it takes
2^{(n+3)/2 }-3)
.

Latex!

November 22, 2010 | Posted in Math,Metablog, Tagged , , , ,

I installed a \LaTeX plugin! It will pretty up any of my mathy posts, and you can use it in comments too! For inline math formulas, surround your \LaTeX code with two dollar signs: $$\frac{4}{3}\pi r^3$$ gives \frac{4}{3}\pi r^3. For display equations, add an exclamation mark: $$!E\psi(r)=-\frac{\hbar^2}{2m}\nabla^2\psi(r)+V(r)$$ gives

E\psi(r)=-\frac{\hbar^2}{2m}\nabla^2\psi(r)+V(r)

If you need two dollar signs for some reason, use the html escape sequence for dollar, &#36;.

WordPress with Sqlite on PHP 5.1

November 12, 2010 | Posted in Metablog, Tagged , ,

After a long plod, I finally got wordpress running on acsweb! This is exciting, since it means it is possible to comment on posts, link to them, google for them, etc. Setting up wordpress was made challenging at every turn by my crappy UCSD server, which is outdated or badly configured in a number of respects:

  • No MySQL, so I had to use the PDO plugin with a Sqlite database
  • PHP 5.1, which should be compatible with PDO, but is missing the preg_last_error() function
  • Error logging and printing is disabled, making discovering the previous error an hours-long debugging process

For anyone else trying to setup wordpress on a similar setup, here were the steps that worked for me:

  1. Run phpinfo() to make sure php is running properly and you can access the webserver. Make sure PDO support is enabled, which should result in a section called ‘pdo_sqlite’.
  2. Download and unpack wordpress
  3. Download PDO for WordPress. Copy all the contents of the zip file into wordpress/wp-content/.
  4. Set up wp-config.php, as described in the PDO instructions
  5. Apache needs permission to create the sqlite database file in your wordpress directory. For now, I just
    chmod -R a+w wordpress
  6. If you are running PHP version 5.0-5.1 (PHP 4 is SOL due to PDO requirements), then edit the file wp-content/pdo/driver_sqlite/pdo_sqlite_driver_create.php. Look for the function addQuery($query) around line 207. Comment out the line with preg_last_error():
    private function addQuery($query) {
        $this->_query[] = $query;
        //$this->_errors[] = preg_last_error(); // PHP >= 5.2;
    }
    
  7. Look at your wordpress install from a browser. It should guide you through the “easy” install process and set you up with an admin account. Make sure everything works right.
  8. Eventually you probably want to remove that global write permission we added to the wordpress directory, except for the database and the uploads folder.
    chmod -R go-w wordpress
    mkdir -p wordpress/wp-content/uploads
    chmod -R go+w wordpress/wp-content/uploads

Note that these instructions were tested with WordPress 3.0.1 with PDO plugin 2.7.0. I submitted a patch to PDO for wordpress, so hopefully future versions will be compatible with PHP 5.1.

Hardware Hacking: Clock

September 25, 2010 | Posted in Engineering, Tagged , , , , , , , ,
Clock Face Our house came with this clock in the kitchen. The clock runs several hours fast per day, making it completely useless for telling time. So of course, I took it apart and messed with it. I was hoping to find some timing adjustment inside, but the clock must have cost around 5ยข to make, leaving no room for such niceties. In the end I just figured out the gear assemblage and put it back together. Here’s the insides:
Mechanism Front Motor Diagram
The motor on the left is rather clever. There’s an IC on the back containing the piezo. Every second, this reverses the polarity of the attached coil, causing a magnet to flip over. This makes the ‘click’ sound in electric clocks. Unfortunately, I can’t do anything to fix a faulty IC circuit.
Mechanism TopGears Diagram

Gear layout. The motor drives gear 1 (omitted from photograph), which is transmitted to the second hand (3), the minute hand (5), and the hour hand (7). Gears are labeled with the number of teeth.

Gear Ratio to previous time/revolution
1 - 2 sec
2 4 8 sec
3 15/2 60 sec
4 15/2 15/2 min
5 8 60 min
6 3 3 hr
7 4 12 hr
One thing I was unable to figure out is the mechanism keeping the magnet flipping in a consistent direction. So as a prank/experiment I reversed the leads on the motor coil. We’ll see how long it takes my roommates to figure out that the broken clock now runs backwards.

Connecting through firewalls

June 23, 2010 | Posted in Technology, Tagged , , , , , ,

I have access to a number of servers which are behind firewalls. To access them I generally create a SSH tunnel through the firewall and then connect to the server through the tunnel. To speed up this process I made the following bash function (copy it into your .bashrc):

#Forward ssh connections through a firewall
function fw {
    USAGE=$(cat<<'END'
fw firewall destination [port]
fw alias

Generate an ssh tunnel through a firewall & connect through it to the destination.
Port refers the local port devoted to the tunnel. Only one tunnel may be generated per port.
Alias refers to a named tunnel, specified in ~/.ssh/tunnels. 
END
    )
    if [[ ! "$#" =~ [1-3] || "$1" == "-h" || "$1" == "--help" ]]; then
        echo "$USAGE" >&2
        return 70
    fi

    TUNNELS=~/.ssh/tunnels

    if [[ "$#" == 1 ]]; then #alias mode
        TUNNEL=$(egrep -i "^$1  " $TUNNELS 2>/dev/null ||
            { echo "Tunnel $1 not found in $TUNNELS." >&2; return 2; } )
        FW=$(echo $TUNNEL|awk '{print $2}')
        DEST=$(echo $TUNNEL|awk '{print $3}')
        PORT=$(echo $TUNNEL|awk '{print $4}')
    else 
        FW="$1"
        DEST="$2"
        PORT="${3:-2222}"
    fi

    # Create tunnel & establish a connection through.
    # Tunnel will close when the last connection through it closes.
    ssh -f -L $PORT:${DEST##*@}:22 $FW 'sleep 10' &&
        ssh -p $PORT -l "${DEST%%@*}" 127.0.0.1
}

Now, to ssh into ‘hotstuff.ucsd.edu’ through firewall ‘kerberos.ucsd.edu’, just run

$ fw sbliven@kerberos.ucsd.edu sbliven@hotstuff.ucsd.edu 2200
You can also make additional connections via ssh/sftp/sshfs on port 2200.

Frequent connections can be stored in a configuration file. Put a line in ~/.ssh/tunnels for each connection with an alias, the firewall, the destination, and the port:

# ~/.ssh/tunnels
# Alias fwuser@Firewall destuser@Destination    Port
hotstuff        sbliven@kerberos.ucsd.edu       sbliven@hotstuff.ucsd.edu       2200
Now, just write fw hotstuff and everything will connect.

Mass Spectrometry Review of PTMs

I’ve been taking CHEM 283 with Dr. Majid Ghassemian. It’s a practical lab course on mass spectrometry. For the final, we had to analyze a sample of alpha-casein (the main protein in cow milk) and present a report on some aspect of the results. I chose to focus on methods for identifying post-translational modifications using mass spec, as exemplified by the programs MASCOT, InsPecT, and MS-Alignment.

Self-analysis: Good journal-club-type discussion of the algorithms. The data is largely irrelevant to my point (due to lack of InsPecT training on QStar), but I had to fit in my experimental results somehow.