Skynet in Python
5/02/2009After a long hiatus I have come back to doing some (extremely basic, I have to admit) Python coding. This xkcd§ comic is a timely reminder:

Well, that and minimization of the objective function.
logs written by Luis Apiolaza in Christchurch, New Zealand
Archive of articles classified as' "programming"
Back homeAfter a long hiatus I have come back to doing some (extremely basic, I have to admit) Python coding. This xkcd§ comic is a timely reminder:

Well, that and minimization of the objective function.
As I have mentioned before, I have been putting together some dynamically generated maps for environmental information. A barebones version of my Python code to generate the KML file is:
#!/usr/bin/env python # encoding: utf-8 import urllib, random # Charting function def lineChart(data, size = '250x100'): baseURL = 'http://chart.apis.google.com/chart?cht=lc&chs=' baseData = '&chd=t:' newData = ','.join(data) baseData = baseData + newData URL = baseURL + size + baseData return URL # Reading test data: connecting to server and extracting lines f = urllib.urlopen('http://gis.someserver.com/TestData.csv') stations = f.readlines() kmlBody = ('') for s in stations: data = s.split(',') # Generate random data a = [] for r in range(60): a.append(str(round(random.gauss(50,10), 1))) chart = lineChart(a) # data is csv as station name (0), long (1), lat (2), y (3) kml = ( '<Placemark>\n' '<name>%s</name>\n' '<description>\n' '<![CDATA[\n' '<p>Value: %s</p>\n' '<p><img src="%s" width="250" height="100" /></p>\n' ']]>\n' '</description>\n' '<Point>\n' '<coordinates>%f,%f</coordinates>\n' '</Point>\n' '</Placemark>\n' ) %(data[0], data[3], chart, float(data[1]), float(data[2])) kmlBody = kmlBody + kml # Bits and pieces of the KML file contentType = ('Content-Type: application/vnd.google-earth.kml+xml\n') kmlHeader = ('<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n' '<kml xmlns=\"http://earth.google.com/kml/2.1\">\n' '<Document>\n') kmlFooter = ('</Document>\n' '</kml>\n') print contentType print kmlHeader print kmlBody print kmlFooter
Well, this is not exactly barebones, because we also wanted to generate dynamic graphs for each placemark, in the easiest possible way. My first idea was to use one of the multiple javascript libraries available in the net However, a quick search revealed that KML files do not support javascript in the description tag. That was the time when I remembered playing with Google Charts a while ago. The lineChart function above is simply a call to create a line chart using the charts API. Because this is a test, I used 60 randomly generated data points, which explains the presence of random as an imported library.
Originally, I did not want to use javascript at all, so inserted the code as a search in maps, generating a link like http://maps.google.co.nz/maps?q=http://gis.someserver.com/dynamicmap.py Just copy the address, send it to some one and, presto, they have access to my map. However, I wanted to embed it in a blog post§ and I was struggling to do it. The solution was to click on the ‘Link’ link in the generated map to copy the ‘Paste HTML to embed in website’ link. This gives an iframe block that can be copied in any page or blog post.
While helping a friend to create another map, we faced the problem that the data set was being updated every five minutes. What is the problem? The map was not being refreshed often enough. The I am not sure if the problem was a browser cache or Google Maps, but it could be solved by calling the KML file with a random extra argument (the script does not need take any arguments, so anything after the question mark is ignored). In my case I needed a frequent random argument, so I use the current time (using the date would work for once a day updates). This meant inserting the map using javascript (and using a Google Maps key). The code for a simple page–from the header onwards–would look like:
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8"/>
<title>A simple dynamic python generated map</title>
<script src="http://maps.google.com/maps?file=api&v=2&key=my_key"
type="text/javascript"></script>
<script type="text/javascript">
//<![CDATA[
function load() {
if (GBrowserIsCompatible()) {
var map = new GMap2(document.getElementById("map"));
map.setCenter(new GLatLng(-33.458943, -70.658569), 11);
var pollution = new GGeoXml("http://gis.uncronopio.org/testmapscsv.py?"+
(new Date()).getTime());
map.addOverlay(pollution);
}
}
//]]>
</script>
</head>
<body onload="load()" onunload="GUnload()">
<div id="map" style="width:750px;height:600px"></div>
</body>It was not too bad for mucking around on Friday in between doing house chores.
Last week I was contacted by my friend Marcelo about increasing awareness of air pollution problems in Santiago, Chile. He was becoming involved in the problem from a technical point of view (GIS and urban forestry). One of the main problems was the lack of proper information for decision making, so we decided to quickly put together a prototype. Today the page on particulate material pollution went online.
The general process was relatively simple. CONAMA provides data on pollution in graphical form (see, for example, here). I had a quick look at the pages using Firebug, which showed that all the data used for the graphs was contained in one of the javascript files called by the page (variable.js). Then I could obtain up to date pollution data by reading that file, which seems to be updated hourly.
The other component was the location of the air quality stations together with the coordinates of the polygon that marks the city boundary. Marcelo provided me with a KML file containing all the coordinates.
The really fun part was to write a script using Python glueing all these components. The advantages of working with such a great high level language is the default library, which makes chores like reading a file located in another web site very simple, like:
import urllib f = urllib.urlopen('http://www.conama.cl/rm/airviro/hoy/variable.js') lines = f.readlines()
Probably the most challenging part has been to quickly learn the basics of KML (without having much free time to do so). The documentation for KML is OK, but the tutorial was not exactly what I was trying to do, so there was a fair amount of trial and error to get things working properly.
Overall, coming back to Python (which I started using in version 1.5) has been a lot of fun, particularly when one has a project of ’social value’.
I started using Python way back at the end of 1996 or early 1997. I was working in my PhD, for which the first project involved writing some simulations in FORTRAN. Originally I was using FORTRAN90, but then I needed to move my project to a server that had only FORTRAN77, so I was stuck with something that looked—at least to me—really ugly. While I was looking for alternatives (I used Mathematica, Matlab, SAS, Python and ASReml in my PhD), I stumbled on an article by Konrad Hinsen discussing using Python to glue FORTRAN programs. Intrigued, I downloaded Python and ordered a copy of Mark Lutz’s Programming Python (the October 1996 first edition). After reading the book for a while I was hooked on the language.
I used Python in and out for small projects, and later dropped almost all programming (that was not stats) around 2001. I have missed that quite a bit until yesterday when working with a list of words that Orlando is using. We had typed around 450 words in Spanish (and he uses around the same number in English) and I wanted to check if we had repeated words. I downloaded Python, wrote a few lines and presto! We did have around 20 repeated words and it was so nice to be able to write something in Python.
After that I did check a few web pages and I realised that the language has evolved quite nicely (although I rarely use the object oriented stuff) and there are at least two books that I will be browsing soon:
Both books are available as free downloads in a variety of formats, as well as in real old-fashioned paper. I will certainly buy the nicest one in a paper copy.
I forgot to mention that one of the great things about Python was the existence of an excellent set of libraries for matrix operations (at the time was Numpy) that has grown in to a great set of resources for scientific computing called SciPy.
Checking the server logs I have discovered that many people that arrive at my posts on calling VB from R are, in fact, looking for the reverse. I have never done any programming calling R from VB; however, while I was looking for COM clients for R I also found information on COM servers. OmegaHat lists RDCOMServer as a package that exports S (or R) objects as COM objects in Windows. It provides examples on using VB, Python and Perl to call R code.
Another option is Thomas Baier’s R(D)COM Server, which is provided with examples in the same languages used by RDCOM Server.
High productivity of matrix languages like Matlab and S+ or their Open Source siblings Scilab and R are a joy to use. I wrote programs in Matlab during my PhD and I can still go back to the code and perfectly understand what is going on there. Now I am writing a lot of S+ and R code where a few lines manage to perform complex operations.
A good programmer can certainly produce better performing (on terms of speed and memory requirements) program using a low(ish) level language like C, C++ or FORTRAN. However, I am not such a good programmer and it would take me ages to do some of my work if I needed to write things using those languages. Most of the time execution speed and memory usage are not the limiting factors, and speed of development rules.
I am extremely happy now using R and playing with the idea to use it as a statistics server for a few small applications. Omega Hat seems to be a very valuable resource for all things ‘connecting R to other software’.
Around 2001 I wrote a ‘temporary quicky’ to compare new Eucalyptus samples to already identified haplotypes. I did that in a few lines of VBA in MS Excel, which was the software used as a repository for these haplotypes. At the time I suggested ‘this is a quick fix and it would be a good idea to develop a proper data base’, and suggested a structure allowing for user roles, web access, etc. I was told that ‘this is not a priority’ and ‘we are happy with the spreadsheet’.
Yesterday I was having lunch with the owner of this spreadsheet, who told me that a.- it is still being used after four years! and b.- they were having some problems because they changed a bit the structure for storing the haplotypes. I offered help to fix the problem but I was told that ‘one of my students will try to fix it, because the problem has to be something very simple’.
I thought that the comment was a bit dismissive and if it was so easy why haven’t they fixed it in over a month? Granted, the code is extremely simple but they do not have any programming experience whatsoever.
VBA is a fine scripting language, which allows people to write short and useful programs. However, I would question that in this case an Excel spreadsheet is the best option for storing molecular genetics information.
In general, scripting languages (like Matlab or R) feel like a better fit for me. Python, my all time favourite language, feels much more productive than any other language I have ever used. In addition, combining Python with the Numerical Python library produces an excellent all purpose/matrix programming language. This can be used for prototyping and—if one is happy with performance—transformed into a standalone program using a utility like py2exe.
After yesterday’s experience calling other software from R, I continued working to connect to my DLL. I first registered my DLL as a COM server in windows typing in the command line:
regsvr32 c:\data\rconnect\growthmod.dll
Then I had a look at my DLL using the SWinTypeLibs library, typing in R:
library(SWinTypeLibs) wood <- LoadTypeLib("c:\data\rconnect\growthmodels.dll") getTypeLibTypes(wood) DLLFuncs <- getFuncs(wood[[1]]) DLLElements <- getElements(wood[[1]])
The command getTypeLibTypes(wood) showed that the DLL contained _cPlantationModels (of type dispatch) and cPlantationModels (of type coclass). That is why I then continued working with wood[[1]], of type dispatch.
After that I needed help from Tim. He had compiled the DLL leaving the class module cPlantationModels with an instancing of type 5 (multiUse), making it available for outside users. Connecting to the library was relatively straightforward with Tim’s help:
library(RDCOMClient) growthmod <- COMCreate("growthmodels.cPlantationModels") growthmod$ModelCreate(30,11,8,4,0,25,5,5,1000,0,1,2,0,1) standvol <- growthmod$TreeValue(20) growthmod$ModelDestroy()
This code loaded the RDCOM client library, created a connection to the DLL, instantiated a growth model with starting values, evaluated the model and destroyed the instantiation. Now we are ready to start testing the models from inside R.
I want to test a series of growth models that are part of a DLL(Dynamic Link Library) written in Visual Basic. Although I could write several pieces of code in VB to test the library, I will try to access it from R. In this way I can run simulations and then process the results from within a statistical package.
I obtained the RDCOM client and the SWinTypeLibs facilities from the Omega Hat Project for Statistical Computing. If you are running R 2.0 you should get the latest compilations from here (RDCOMClient_0.8-1.zip, 11-Oct-2004, 12:08, 433K and SWinTypeLibs_0.2-1.zip, 11-Oct-2004, 12:09, 147K).
It is possible to learn about the classes, methods, parameters and return types of a COM server using the SWinTypeLibs library. Once we have learn about the server, we can access it using the RDCOMClient (this is explained in this presentation in PDF format, 50KB). For example:
library(RDCOMClient) ie = COMCreate("InternetExplorer.application") ie[["Visible"]] <- TRUE\ ie$Navigate2("http://www.uncronopio.org/quantum/")
will load the library, connect to Internet Explorer, display the window and point the browser to Quantum Forest. An example using Excel would be:
exc <- COMCreate("Excel.Application") exc[["Visible"]] <- TRUE\ books <- exc[["Workbooks"]] books$Add() actsheet <- books[[1]][["ActiveSheet"]] myrange <- actsheet$Range("A1:C10") myrange[["Value"]] <- asCOMArray(matrix(rnorm(30, 10, 3)))
This will connect to Excel, show the program, get a list of workbooks, add one, get which sheet is active, define a range and fill it with N(0,1) random numbers.
I continue exploring the package to see if I can connect to my own DLL.
This weekend I was working in my latest project — codename breedOmatic — writing a Python script to process the information obtained through a web form. I started working directly in the “production machine”; however, with a lousy internet connection the development and testing process was really slow.
I decided to use IIS (Internet Information Services) in my laptop — I am running windows 2000, but I discovered that it was not available. As an aside, at work machines are always setup in a very limited, barebones, way. Then, I needed to download a very small server (so I had some hope with my connection). Here comes TinyWeb, a wonderfully small (53KB) free web server by Ritlabs. I also installed TinyBox, a free controler for TinyWeb. The couple of programs work very well.
In a previous post I mentioned that I would give a try to JEdit, a java based text editor. Well, I installed it and I have had a great experience using it; and I am writing this post with it. Tip: if you are going to be using HTML or XHTML some very useful plugins for this editor is the XML plugin. There are some dependencies though, so you also need to have two other small plugins: Sidekick and ErrorLine.
P.S. 2004-09-28
1. I found a bug in JEdit, printing to networked printers under Windows 2000.
2. Note to self: the first line of a Python script needs to contain #!c:\python23\python.exe to be ran in a windows server.
It has been a while since writing the previous post. Most of my ‘web time’ has been devoted to put together the Forestry in Tasmania site (where I am still doing some work). Anyway, this week I imported an Excel spreadsheet into Splus, where I had defined a few columns as time. I usually do not use time formats in Excel, but this was not my project and I needed the times in Splus. Surprise! All times were there, but with weird dates attached. Then I tried importing dates and, surprise again, dates had this 00:00:00 attached. To be more precise, only the time showed in the column when using the data viewer.
A few visits to the online language reference and to the discussion group archives where useful. Times and dates are stored using the timeDate class, and it is possible to query the variable containing time and date using the functions: hours, minutes and seconds for time, and days, weekdays, months, quarters, years, and yeardays (which gives the number of day within a year) for dates. These functions return vectors. In addition, it is possible to create data frames where each column contains an integer with the following functions: hms (hour, minutes and seconds), mdy (month, day and year) and wdydy (weekday, yearday and year).
timeDate can also be used to transform strings to objects of class timeDate. For example:
x <- timeDate(c("1/1/1998 3:05:23.4", "5/10/2005 2:15:11.234 PM"))
which then can be queried as:
mdy(x) hms(x) wdydy(x) hours(x) etc