Spencer Bliven

Thoughts and Research

Surprises with floating point operations

December 17, 2010 | Posted in Programming,Technology, Tagged , , , , , , , ,

At work I am currently writing software that calculates some thermodynamic properties of proteins. I recently refactored some of the code and I wanted to make sure that I didn’t screw something up and change the output. So I was concerned when I compared the old and new output and discovered some differences:

Corex output before and after the refactor. The second column of numbers in each file gives the conformational entropy (Sconf).

Looking into the difference further, I finally tracked it back to a single change in the C source code. The original programmer liked to write out additions explicitly: Sconf=Sconf+backboneEntropy+sidechainEntropy; During the refactor I changed this to the more readable (IMHO) Sconf += backboneEntropy + sidechainEntropy;

This small change resulted in all the numerical differences I was seeing. To better understand the reason for the difference, here’s a simple C program:

#include <stdio.h>
int main(int args, char *argv[]) {
    float a, b, sum1, sum2;

    a= 4.1;
    b = -0.12;
    sum1 = sum2 = 11.9;

    sum1 = sum1 + a + b;
    sum2 += a + b;

    printf("=+ %f\n+= %f",sum1,sum2);
    return 0;
}

This yields the output =+ 15.880000 += 15.879999

Here is the unoptimized assembly code for lines 9-10, commented with “pseudo-c” descriptions of what’s happening. XMM0 and XMM1 are two 128-bit registers used by 64-bit processors (like my Intel Core 2 duo) for floating point operations. However, the ‘ss’ at the end of all the operations means that only the lowest 32 bits are used in the computation, with overflow being discarded.

	.loc 1 9 0    ;line 9
	movss	-12(%rbp), %xmm0  ; XMM0 = sum1
	addss	-4(%rbp), %xmm0  ; XMM0 += a
	addss	-8(%rbp), %xmm0  ; XMM0 += b
	movss	%xmm0, -12(%rbp)  ; sum1 = XMM0
	.loc 1 10 0    ;line 10
	movss	-4(%rbp), %xmm0  ; XMM0 = a
	movaps	%xmm0, %xmm1  ; XMM1 = XMM0
	addss	-8(%rbp), %xmm1  ; XMM1 += b
	movss	-16(%rbp), %xmm0  ; XMM0 = sum2
	addss	%xmm1, %xmm0  ; XMM0 += XMM1
	movss	%xmm0, -16(%rbp)  ; sum2 = XMM0

So basically, the difference between “=…+” and “+=” is just the order of operations; the former does (sum1 + a) + b while the latter does sum1 + (a + b). The small differences in rounding behavior between these two variants add up to account for the 0.01 differences I was observing in my program.

Executing commands from TextWrangler/BBEdit

December 6, 2010 | Posted in Technology, Tagged , , , , , , , ,

I use TextWrangler occasionally for editing code. It has a well-developed applescript dictionary, so I decided to write a script to execute commands directly from TextWrangler.

The following Applescript copies either the selected text or the current line into your application of choice. For instance, “iTerm” is a reasonable choice, as that would allow you to start a command (ipython, for instance), and then quickly modify and execute (python) statements without overwriting your clipboard. The pastebin is configured for R code since I originally wrote it in response to a request by an R user.

EDIT: I also uploaded a pre-compiled version of the script (using Terminal).

Daily WTF

December 1, 2010 | Posted in Technology, Tagged , , , , , ,

Whenever I’m writing C code I am amazed that we have to keep track of things like array lengths and string terminators. It’s no wonder poor coders create weird bugs, like this one I found in my work code for storing atom names (like chlorine-1, carbon-2, etc).

char names[MAX_NAMES][6];
int num_names, i;
for(i=0; i < num_names; i++) {
    printf("(%d) %s\n", i, names[i]);
}

This is what I see:

(0)  CL1   CL2   C2 
(1)  CL2   C2 
(2)  C2 
(3)  CD2   N2 
(4)  N2 
...

WTF? Turns out there are exactly three spaces around each of those strings, making the 3-character names overflow. That’s what happens when you use strcpy on strings that are too short to contain the result.