Matrix languages and quickies

22/01/2005

High-productivity matrix languages like Matlab and S+, or their open source siblings Scilab and R, are a joy to use. I wrote programs in Matlab during my PhD and I can still go back to the code and perfectly understand what is going on there. Now I am writing a lot of S+ and R code, where a few lines manage to perform complex operations.

A good programmer can certainly produce better performing programs (in terms of speed and memory requirements) using a low(ish) level language like C, C++ or FORTRAN. However, I am not such a good programmer, and it would take me ages to do some of my work if I needed to write things in those languages. Most of the time execution speed and memory usage are not the limiting factors, and speed of development rules.

I am extremely happy using R now, and I am playing with the idea of using it as a statistics server for a few small applications. Omega Hat seems to be a very valuable resource for all things ‘connecting R to other software’.

A long lived quicky

Around 2001 I wrote a ‘temporary quicky’ to compare new Eucalyptus samples to already identified haplotypes. I did that in a few lines of VBA in MS Excel, which was the software used as a repository for these haplotypes. At the time I said ‘this is a quick fix and it would be a good idea to develop a proper database’, and suggested a structure allowing for user roles, web access, etc. I was told that ‘this is not a priority’ and ‘we are happy with the spreadsheet’.

Yesterday I was having lunch with the owner of this spreadsheet, who told me that (a) it is still being used after four years! and (b) they were having some problems because they had changed the structure for storing the haplotypes a bit. I offered to help fix the problem, but I was told that ‘one of my students will try to fix it, because the problem has to be something very simple’.

I thought that the comment was a bit dismissive: if it was so easy, why haven’t they fixed it in over a month? Granted, the code is extremely simple, but they do not have any programming experience whatsoever.

VBA is a fine scripting language, which allows people to write short and useful programs. However, I would question whether, in this case, an Excel spreadsheet is the best option for storing molecular genetics information.
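
To give an idea of how small this kind of comparison can be, here is a minimal sketch in Python rather than the original VBA. It is not the code from the spreadsheet: the haplotype names, the marker values, the tuple-based storage and the function names `identify` and `closest` are all invented for illustration, and every haplotype is assumed to have the same number of markers.

```python
# Hypothetical reference table: haplotype name -> marker alleles.
# Names and values are invented for illustration only.
KNOWN_HAPLOTYPES = {
    "H01": ("A", "T", "G", "C"),
    "H02": ("A", "C", "G", "C"),
    "H03": ("G", "T", "G", "T"),
}


def identify(sample, known=KNOWN_HAPLOTYPES):
    """Return the sorted names of known haplotypes identical to the sample."""
    sample = tuple(sample)
    return sorted(name for name, alleles in known.items() if alleles == sample)


def closest(sample, known=KNOWN_HAPLOTYPES):
    """Return (name, mismatches) for the known haplotype with fewest mismatches.

    Assumes all haplotypes and the sample have the same number of markers.
    Ties are broken by taking the first name in alphabetical order.
    """
    sample = tuple(sample)

    def mismatches(alleles):
        # Count the marker positions where the two haplotypes differ.
        return sum(a != b for a, b in zip(alleles, sample))

    name = min(sorted(known), key=lambda n: mismatches(known[n]))
    return name, mismatches(known[name])
```

A new sample that matches nothing in the table returns an empty list from `identify`, and `closest` then suggests the nearest known haplotype to check by hand. Keeping the reference table separate from the matching logic is also what would make a later move from a spreadsheet to a database less painful.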

A better generic language

In general, scripting languages (like Matlab or R) feel like a better fit for me. Python, my all-time favourite language, feels much more productive than any other language I have ever used. In addition, combining Python with the Numerical Python library produces an excellent all-purpose/matrix programming language. This can be used for prototyping and, if one is happy with performance, transformed into a standalone program using a utility like py2exe.
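
As a rough illustration of the matrix-language style, the sketch below fits a straight line by least squares in a handful of lines. It assumes NumPy, the present-day successor to Numerical Python, and the data are made up so that the answer is known in advance.

```python
import numpy as np

# Made-up data lying exactly on the line y = 2 + 3x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 + 3.0 * x

# Design matrix with an intercept column and a slope column.
X = np.column_stack([np.ones_like(x), x])

# Least-squares solution of X @ beta = y.
beta, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

# Fitted values, computed with a single matrix product.
fitted = X @ beta
```

Here `beta` holds the intercept and slope. The whole model is expressed as operations on arrays, with no explicit loops, which is much the same feel as writing Matlab or R.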

Filed in programming, software
