Spencer Bliven

Thoughts and Research

Arduino IDE keywords

January 18, 2012 | Posted in Arduino, Technology

The other day I made my first library (a 7-segment display controller) for my new Arduino Uno, following two nice tutorials. They both mention that it’s a good idea to make a keywords.txt file for new libraries, which gives hints to the Arduino IDE’s syntax highlighter. However, neither gives a thorough explanation of the format of that file, so I thought I would document my findings.

The built-in keywords are defined in a simple text file. On my computer, this lives at /Applications/Arduino.app/Contents/Resources/Java/lib/keywords.txt. Here’s how it starts:

# LITERAL1 specifies constants

HIGH	LITERAL1	Constants
LOW 	LITERAL1	Constants

The interesting thing here is that there are three fields which get parsed. Only the first two are useful.

  1. The keyword to highlight
  2. The type of keyword it is. This really just determines the color, but most people seem to use the following convention:
    • KEYWORD1 Classes, datatypes, and C++ keywords
    • KEYWORD2 Methods and functions
    • KEYWORD3 setup and loop functions, as well as the Serial keywords
    • LITERAL1 Constants
    • LITERAL2 Built-in variables (unused by default)
  3. Documentation page. This is used by the ‘Help > Find in Reference’ menu item. For example, the reference for HIGH in the example above would be file:///Applications/Arduino.app/Contents/Resources/Java/reference/Constants.html.
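
To make the three-field layout concrete, here is a minimal Python sketch that reads lines in this format. It assumes the fields are separated by tabs, as in the snippet above, and that lines starting with # are comments. The names KeywordEntry and parse_keywords are my own, not anything from the Arduino IDE.

```python
from typing import List, NamedTuple, Optional


class KeywordEntry(NamedTuple):
    keyword: str              # the word to highlight
    kind: str                 # e.g. KEYWORD1, LITERAL1
    reference: Optional[str]  # documentation page name, if one is given


def parse_keywords(text: str) -> List[KeywordEntry]:
    """Parse keywords.txt-style text, assuming tab-separated fields."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Strip each field so stray spaces (as in "LOW ") are ignored.
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) < 2:
            continue
        reference = fields[2] if len(fields) > 2 and fields[2] else None
        entries.append(KeywordEntry(fields[0], fields[1], reference))
    return entries


sample = (
    "# LITERAL1 specifies constants\n"
    "\n"
    "HIGH\tLITERAL1\tConstants\n"
    "LOW \tLITERAL1\tConstants\n"
)
for entry in parse_keywords(sample):
    print(entry)
```

A library’s own keywords.txt can be checked the same way by passing the file’s contents to parse_keywords.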

By default, Arduino 1.0 colors all the KEYWORD types orange, and all the LITERAL types blue. These defaults are set in /Applications/Arduino.app/Contents/Resources/Java/lib/theme/theme.txt. Here’s the relevant snippet (the comments seem to be inaccurate or outdated):

# TEXT - KEYWORDS

# e.g abstract, final, private
editor.keyword1.style = #cc6600,plain

# e.g. beginShape, point, line
editor.keyword2.style = #cc6600,plain

# e.g. byte, char, short, color
editor.keyword3.style = #cc6600,bold


# TEXT - LITERALS

# constants: e.g. null, true, this, RGB, TWO_PI
editor.literal1.style = #006699,plain

# p5 built in variables: e.g. mouseX, width, pixels
editor.literal2.style = #006699,plain

Just change any of the hexadecimal colors. I like the following:

editor.keyword1.style = #cc6600,plain
editor.keyword2.style = #993300,plain
editor.keyword3.style = #993300,bold
editor.literal1.style = #006699,plain
editor.literal2.style = #0099CC,plain
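
Editing the file by hand is easy enough, but if you want to script the change, something like the following Python sketch would do it. It only rewrites the color of lines shaped like the ones above (key = #rrggbb,style) and leaves everything else, including the comments, untouched. The function name recolor and the regular expression are my own assumptions about the format, based only on the snippet shown here.

```python
import re

# Matches lines such as "editor.keyword2.style = #cc6600,plain".
STYLE_LINE = re.compile(
    r"^(editor\.\w+\.style)\s*=\s*#[0-9A-Fa-f]{6},(\w+)[ \t]*$",
    re.MULTILINE,
)


def recolor(theme_text, colors):
    """Return theme_text with the colors of the given style keys replaced.

    colors maps a key such as "editor.keyword2.style" to a "#rrggbb" string.
    The plain/bold part of each line is kept as it was.
    """
    def swap(match):
        key, weight = match.group(1), match.group(2)
        if key not in colors:
            return match.group(0)  # leave unrelated style lines alone
        return f"{key} = {colors[key]},{weight}"

    return STYLE_LINE.sub(swap, theme_text)


theme = (
    "# e.g. beginShape, point, line\n"
    "editor.keyword2.style = #cc6600,plain\n"
    "editor.keyword3.style = #cc6600,bold\n"
)
print(recolor(theme, {
    "editor.keyword2.style": "#993300",
    "editor.keyword3.style": "#993300",
}))
```

Keep a backup of theme.txt before writing the result back over it.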

If loop and setup aren’t showing up bold, you may be using Monaco, which doesn’t have a bold style. I recommend using another fixed-width font which does have a bold style, such as DejaVu Sans Mono. This can be set in the Arduino preferences file, ~/Library/Arduino/preferences.txt:

editor.font=DejaVu Sans Mono,plain,10
editor.antialias=true

Make sure the Arduino IDE is not running, as it overwrites the preferences file upon exit.

EVfold

For our weekly journal club I talked about a new method for de novo protein folding called EVfold. [Slides] Details can be read in the paper (plus a 15-page supporting text):

Marks, D. S., Colwell, L. J., Sheridan, R., Hopf, T. A., Pagnani, A., Zecchina, R., & Sander, C. (2011). Protein 3D Structure Computed from Evolutionary Sequence Variation. PLoS ONE, 6(12), e28766. doi:10.1371/journal.pone.0028766

The authors are motivated by two observations:

“In spite of significant progress in the field of structural genomics over the last decade [20], only about half of all well-characterized protein families (PFAM-A, 12,000 families), have a 3D structure for any of their members [1].”
“As we are about to reach a truly explosive phase of massively parallel sequencing, we anticipate increased coverage of sequence space for protein families by several orders of magnitude, well above the level of 1000–10000 non-redundant sequences for protein family and with rich evolutionary information about protein structure directly from sequence.”

Basically, DNA sequencing is dirt cheap and will only get cheaper, but up until now this hasn’t been helping to solve protein structures.

Marks et al. try to remedy this situation by looking at co-evolving residue pairs. Basically, they hypothesize that residues which are located close together in 3D space will tend to evolve together. If one mutates to a smaller residue, the other will tend to mutate to something bigger to compensate. If one changes from positively charged to negative, the other will change from negative to positive to balance it out. The idea behind EVfold is to identify co-evolving residues from the thousands of sequences we have for some protein families, then use that information to provide distance constraints in order to predict the protein’s structure.

Of course, just because two residues co-vary doesn’t necessarily imply they are spatially close. They could indirectly influence each other, such as if both bind to a ligand or both bind some intermediate residue. So the authors use a technique called direct coupling analysis (DCA) to predict which residues are close together. This has been around for a few years (Weigt et al. (2009). PNAS, 106(1), 67–72), although that’s not immediately clear from the paper. DCA assigns a quantity called direct information (DI) to each pair of residues, which correlates really well with whether the pair is close together.
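
To make the idea of co-variation concrete, here is a minimal Python sketch of the naive measure: the mutual information between two columns of a multiple sequence alignment. To be clear, this is not DCA and not what the paper computes. Plain mutual information cannot tell direct couplings from indirect ones, which is exactly the problem DCA is meant to address. The four-sequence alignment is a made-up toy example.

```python
import math
from collections import Counter


def mutual_information(col_i, col_j):
    """Mutual information (in nats) between two alignment columns.

    col_i and col_j are equal-length sequences of residue letters,
    one letter per aligned sequence.
    """
    n = len(col_i)
    pair_counts = Counter(zip(col_i, col_j))
    counts_i = Counter(col_i)
    counts_j = Counter(col_j)
    mi = 0.0
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


# Toy alignment: positions 1 and 3 swap charge together (K/E vs. E/K),
# while position 2 varies independently of both.
alignment = ["KAE", "KGE", "EAK", "EGK"]
columns = list(zip(*alignment))

print(mutual_information(columns[0], columns[2]))  # co-varying pair
print(mutual_information(columns[0], columns[1]))  # independent pair
```

In this toy case the charge-swapping pair scores ln 2 and the independent pair scores 0. Real alignments also need corrections for sampling bias and sequence redundancy that this sketch ignores.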

Marks et al. figure S2c. Grey regions indicate residues of Ras protein which are close together in the crystal structure, while red dots indicate pairs which were predicted to be close based on DI.

EVfold takes the top-ranked residue pairs and assumes they are close together. It then uses those pairs as distance constraints to solve the structure. This is identical to using distance constraints from NMR to solve a structure, and uses well-known simulated annealing/molecular dynamics algorithms. At the end, you get lovely protein structures with 3-5Å RMSD from the crystal structure.
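
The selection step can be sketched in a few lines of Python. This is only an illustration of "take the top-ranked pairs", not the authors’ code: the scores are invented, and the min_separation filter (skipping residues that are neighbors in sequence, which are trivially close) is my own assumption.

```python
def top_ranked_pairs(scores, n_pairs, min_separation=1):
    """Return the n_pairs highest-scoring residue pairs.

    scores maps (i, j) residue index pairs to a coupling score.
    Pairs closer than min_separation along the sequence are skipped.
    """
    eligible = [
        (pair, score)
        for pair, score in scores.items()
        if abs(pair[0] - pair[1]) >= min_separation
    ]
    # Highest score first.
    eligible.sort(key=lambda item: item[1], reverse=True)
    return [pair for pair, _ in eligible[:n_pairs]]


# Hypothetical scores for a handful of residue pairs.
toy_scores = {
    (1, 2): 0.90,
    (3, 40): 0.75,
    (10, 55): 0.60,
    (7, 8): 0.50,
    (12, 30): 0.10,
}
print(top_ranked_pairs(toy_scores, n_pairs=2, min_separation=5))
```

Each selected pair would then be handed to the structure calculation as a "these two residues are close" constraint.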

Marks et al. figure 2. Predicted (left) and observed (right) structures for three proteins. A few minor differences are visible, such as missing beta-strands, but all three predictions are correct overall.

Perhaps the most impressive fact about this is that EVfold is able to predict a structure in less than an hour from only sequence information. That is incredible compared with the days of supercomputer time needed for other ab initio methods like ROSETTA.

So has EVfold solved the structure prediction problem? Hardly. There are many proteins where finding 1000+ homologous sequences will be hard, even with advances in sequencing technology (vertebrate-only proteins, for instance). Also, the authors suggest that even with perfect distance constraints the simulated annealing methods will not be able to predict structures at less than 2Å. So major advances in refining structures are needed before the crystallographers will be out of a job.

Still, there are lots of applications for which 3-5Å models of widespread folds would be useful. For instance, one of the major difficulties I’ve run into in my work on fold space is that we know there are thousands of proteins which are dissimilar to all known structures. Do these represent new folds, or are they just more variants of existing known folds? The speed of EVfold means that it should be fairly easy to predict structures for all of these domains which have enough sequence information out there. That’s not as good as having experimentally determined structures for everything, but it could give us some intriguing insights into the completeness of protein fold space.