Archive of articles classified as' "statistics"

Back home

Intellectual bullying

16/05/2005

I am a big fan of Splus and R and I am often recommending these systems to colleagues. One of the many reasons I like the programs is their very active users’ communities. If one runs into trouble, physician there are always other users willing to help. Nevertheless, angina a while ago I started noticing a disturbing trend: one of the Splus/R language demigods was becoming increasingly arrogant—and rude—in his replies to fellow posters. What was even more surprising is that nobody seemed to care. As Eric Hoffer said:

Rudeness is the weak man’s imitation of strength.

I was uncomfortable, but the situation was below my ‘annoyance threshold’ until this reply to this post. I was so annoyed that submitted this message, which generated an interesting discussion that resulted in a ‘lawyer-style’ apology from the offender. The interesting thing is that—after my post—I received several personal emails from people that where equally uncomfortable with the situation, but felt uneasy challenging the ‘powers that be’. This seems to be yet another example of the issues faced by internet communities, where a few individuals can easily damage the quality of communication in a forum.

I think that after this experience we will go back to a much more respectful and welcoming email list, so people are not afraid of posting simple questions and being flamed in the process.

Filed in quotes, statistics, web No Comments

Calling R from Visual Basic

4/02/2005

This post is an utterly miscellaneous brain dump:

  • Last week I got pharyngitis and am still taking antibiotics, side effects situation that I really dislike.
  • Yes, geriatrician I am a doctor in the real sense of the word, despite what the physician that prescribed antibiotics thinks.
  • Last Saturday I got one of the worst haircuts ever—at least that I can remember—at Just Cuts. Yes, it is my fault for first choosing to go to a such dubious place: avoid it if you can. Nevertheless, every time I passed outside Paul’s barber shop he was busy. Today I went to his place with my tail between my legs and beg him to have it fixed. We had a laugh, had it fixed and he made me promise not to repeat my sin.
  • Cooked a beautiful marinated octopus pasta last night. Looking forward to eat the left overs at lunch time.
  • Last Christmas I got a few vouchers from ‘Music without Frontiers’, one of the few music stores in Hobart where one can find something outside the ‘top 20′. I went back to my old listening habits, and got:
  • Andrew told me that last Saturday was Captain Beefheart’s birthday. I did not know who CB was so I will have to borrow some of his art.

It is hard for me to get interested in current mainstream music: no challenges, one can guess what is coming so easily that tends to be a big yawn. That’s all folks.

High productivity of matrix languages like Matlab and S+ or their Open Source siblings Scilab and R are a joy to use. I wrote programs in Matlab during my PhD and I can still go back to the code and perfectly understand what is going on there. Now I am writing a lot of S+ and R code where a few lines manage to perform complex operations.

A good programmer can certainly produce better performing (on terms of speed and memory requirements) program using a low(ish) level language like C, viagra 60mg
C++ or website
I am not such a good programmer and it would take me ages to do some of my work if I needed to write things using those languages. Most of the time execution speed and memory usage are not the limiting factors, and speed of development rules.

I am extremely happy now using R and playing with the idea to use it as a statistics server for a few small applications. Omega Hat seems to be a very valuable resource for all things ‘connecting R to other software’.

A long lived quicky

Around 2001 I wrote a ‘temporary quicky’ to compare new Eucalyptus samples to already identified haplotypes. I did that in a few lines of VBA in MS Excel, which was the software used as a repository for these haplotypes. At the time I suggested ‘this is a quick fix and it would be a good idea to develop a proper data base’, and suggested a structure allowing for user roles, web access, etc. I was told that ‘this is not a priority’ and ‘we are happy with the spreadsheet’.

Yesterday I was having lunch with the owner of this spreadsheet, who told me that a.- it is still being used after four years! and b.- they were having some problems because they changed a bit the structure for storing the haplotypes. I offered help to fix the problem but I was told that ‘one of my students will try to fix it, because the problem has to be something very simple’.

I thought that the comment was a bit dismissive and if it was so easy why haven’t they fixed it in over a month? Granted, the code is extremely simple but they do not have any programming experience whatsoever.

VBA is a fine scripting language, which allows people to write short and useful programs. However, I would question that in this case an Excel spreadsheet is the best option for storing molecular genetics information.

A better generic language

In general, scripting languages (like Matlab or R) feel like a better fit for me. Python, my all time favourite language, feels much more productive than any other language I have ever used. In addition, combining Python with the Numerical Python library produces an excellent all purpose/matrix programming language. This can be used for prototyping and—if one is happy with performance—transformed into a standalone program using a utility like py2exe.

Our telephone service for the last three years has been provided by Ecomtel (a small company), information pills
although the physical infrastructure belongs to Telstra (the largest telecommunications provider in Australia). Initially we were very happy with Ecomtel’s services, rx
they had low charges and their service seemed to be very responsive. The icing in the cake for me was their reliance on Open Source Software (e.g., prostate Linux), which made easier for them to be very competitive in price—particularly for international calls.

This year we logged an issue with Ecomtel, because our low speed of connection to internet (maximum of 14.4 Kbps, pathetic, isn’t it?). After some investigation, it was established that the problem was in the quality of our line—that belongs to Telstra, which happens to be a paired gain system rather than an individual copper line. We pointed out to Ecomtel that according to the TIO the minimum speed of connection to internet should be 19.2 Kbps:

The Internet Assistance Program was set up as a joint venture between Telstra Corporation and the Federal Government to ensure a minimum transmission speed of at least 19.2 kilobits per second to all users of its fixed network. Subsequently, it was decided that a minimum speed of 19.2 kilobits per second would become a condition of Telstra’s licence agreement. While this condition is not binding on other network carriers (where Telstra does not provide the underlying infrastructure), the TIO views this as an industry benchmark and expects that regardless of which network a customer is connected to, the standard telephone line provided should be capable of a minimum transmission speed of 19.2 kilobits per second.

The fact is that physically it is not possible to achieve 19.2 Kbps with a paired gain connection. Telstra and Ecomtel say that we should pay for a new telephone connection (cost AU$209) to change to a copper line. Our position is that i- we were never offered the option between types of line when connected in the first place and ii- the current line does not meet the condition for Telstra’s licence agreement anyway. By the way, the ‘new connection’ only involves unplugging our line from one connector and plugging it back in a connector sitting next to the original one.

We have been discussing the issue with Ecomtel for over a month, and they have not been very responsive during this time. We will now take this issue to the TIO and see if we can get a new connection without the extra cost. During this process, we changed from loyal (telling our friends to switch to Ecomtel) to dissatisfied (writing this post) customers. A real business lesson in how to alienate your customers.

Checking the server logs I have discovered that many people that arrive at my posts on calling VB from R are, dermatologist
in fact, looking for the reverse. I have never done any programming calling R from VB; however, while I was looking for COM clients for R I also found information on COM servers. OmegaHat lists RDCOMServer as a package that exports S (or R) objects as COM objects in Windows. It provides examples on using VB, Python and Perl to call R code.

Another option is Thomas Baier’s R(D)COM Server, which is provided with examples in the same languages used by RDCOM Server.

Filed in programming, statistics No Comments

Calling Visual Basic DLLs from R, part 2

21/12/2004

After yesterday’s experience calling other software from R, refractionist I continued working to connect to my DLL. I first registered my DLL as a COM server in windows typing in the command line:

regsvr32 c:data
connectgrowthmod.dll

Then I had a look at my DLL using the SWinTypeLibs library, visit this site typing in R:

library(SWinTypeLibs)
wood <- LoadTypeLib("c:data
connectgrowthmodels.dll")
getTypeLibTypes(wood)
DLLFuncs <- getFuncs(wood[[1]])
DLLElements <- getElements(wood[[1]])

The command getTypeLibTypes(wood) showed that the DLL contained _cPlantationModels (of type dispatch) and cPlantationModels (of type coclass). That is why I then continued working with wood[[1]], herpes of type dispatch.

After that I needed help from Tim. He had compiled the DLL leaving the class module cPlantationModels with an instancing of type 5 (multiUse), making it available for outside users. Connecting to the library was relatively straightforward with Tim’s help:

library(RDCOMClient)
growthmod <- COMCreate("growthmodels.cPlantationModels")
growthmod$ModelCreate(30,11,8,4,0,25,5,5,1000,0,1,2,0,1)
standvol <- growthmod$TreeValue(20)
growthmod$ModelDestroy()

This code loaded the RDCOM client library, created a connection to the DLL, instantiated a growth model with starting values, evaluated the model and destroyed the instantiation. Now we are ready to start testing the models from inside R.

Filed in programming, software, statistics 1 Comment

Calling Visual Basic DLLs from R

20/12/2004

I want to test a series of growth models that are part of a DLL(Dynamic Link Library) written in Visual Basic. Although I could write several pieces of code in VB to test the library, check I will try to access it from R. In this way I can run simulations and then process the results from within a statistical package.

I obtained the RDCOM client and the SWinTypeLibs facilities from the Omega Hat Project for Statistical Computing. If you are running R 2.0 you should get the latest compilations from here (RDCOMClient_0.8-1.zip, view 11-Oct-2004, 12:08, 433K and SWinTypeLibs_0.2-1.zip, 11-Oct-2004, 12:09, 147K).

It is possible to learn about the classes, methods, parameters and return types of a COM server using the SWinTypeLibs library. Once we have learn about the server, we can access it using the RDCOMClient (this is explained in this presentation in PDF format, 50KB). For example:

library(RDCOMClient)
ie = COMCreate("InternetExplorer.application")
ie[["Visible"]] <- TRUE\
ie$Navigate2("http://www.uncronopio.org/quantum/")

will load the library, connect to Internet Explorer, display the window and point the browser to Quantum Forest. An example using Excel would be:

exc <- COMCreate("Excel.Application")
exc[["Visible"]] <- TRUE\
books <- exc[["Workbooks"]]
books$Add()
actsheet <- books[[1]][["ActiveSheet"]]
myrange <- actsheet$Range("A1:C10")
myrange[["Value"]] <- asCOMArray(matrix(rnorm(30, 10, 3)))

This will connect to Excel, show the program, get a list of workbooks, add one, get which sheet is active, define a range and fill it with N(0,1) random numbers.

I continue exploring the package to see if I can connect to my own DLL.

Filed in programming, software, statistics 1 Comment

A nice R interface

17/11/2004

I was looking for a good free ‘point and click’ statistical software for some colleagues of mine. Although I did not find anything even close to what I was looking for — sort of R with menus — at least I found a nice R interface. JGR (pronounced like Jaguar) provides object browser, try data and code editors (both with syntax highlighting), ampoule help browser and package manager.

I can see tools like the package manager evolving into tools for specific analyses like linear models, glands descriptive statistics, etc. I hope these guys can pull it off.

One of the reasons I have revisited R is that it provides access to a large range of cutting edge statistical procedures. This software project has a critical mass of users — actually, a great users’ community — that implies early adoption of statistical techniques.

I have struggled for a while with smoothing splines, and what better way of learning their use than playing with some data. Years ago I met Rod Ball in Forest Research (Rotorua, New Zealand), where he always made useful comments during my PhD presentations. The world is a small place, and now I am playing with Rod’s lmeSplines library, which allows fitting smoothing splines using lme. The library can be downloaded from CRAN (Comprehensive R Archive Network, see contributed packages under your operating system) and there is an introduction to its use in R news (PDF file 1.3MB).

P.S. 2004-11-18. I was contacted by a fellow ASReml user, worried because I was not using ASReml’s splines facilities. I want to make clear that there is no problem whatsoever with splines in ASReml. It is just that I am trying many alternative models (including non-linear ones), data manipulation and graphics with a dataset of squid behaviour, and doing everything inside one piece of software it is easier.

Filed in software, statistics No Comments

Alphabet soup: R, S and SAS

20/08/2004

After a series of post on environmental issues, geriatrician it is good to come back to software and statistics. I am a user of many different statistical packages: Splus and Genstat for general statistics, ascariasis PASS for power calculations and ASReml for generalised linear mixed models (GLMM, particularly for quantitative genetics analyses).

While I consider that PASS and ASReml are pretty good value for money — especially the latter — I sometimes question my choice of generic software. Until 1996 I used to be a ‘heavy duty’ SAS user (at least of base, stat, iml and graph). Some of the many drawbacks of SAS were its low quality graphics, low speed for large analyses and its price (near AU$10,000 a year for the modules I used for work). SAS is also quite monolithic and usually is much slower on the adoption of new analytical techniques. On the other hand, SAS database facilities are quite good and it can cope with large analyses relatively well, albeit very slowly.

In 1996 I discovered ASReml (~AUD$1,500) and, given that most of my work was of the GLMM kind, I switched and never looked back. ASReml is lightning fast (orders of magnitude more than SAS) and it allows to implement very large analyses, enough for running a national genetic evaluation for forest trees.

Genstat is a good generic package, but the graphics have lower quality than Splus’s, the language is too Fortranian for my taste and the community of users is quite small. The small community makes discussion and implementation of new routines slower than in Splus, and gives the feeling of a ‘dead end’ from a language evolution point of view.

Splus is a terrific piece of software. However, buying it is around AU$4,500 and AU$1,000 per year of maintenance. This is still quite a bit of money, particularly for small businesses. This takes me to R, an open source implementation of S, without the fancy GUI but with the same analytical capabilities of Splus. If you are a ‘power user’ of Splus there are very good incentives to move to R; firstly, it is free and, secondly, you are already quite proficient on R, because the syntax is very similar. If you ask me, Splus is slowly pricing itself outside the small business market and many ‘power users’ of Splus, which rarely use its GUI, will not see any benefits on keeping paying for software licenses, because this time there is ‘almost a free lunch’. At work we are evaluating R for use by the biometricians and, maybe, by the more sophisticated researchers. Other researchers will still probably stay with Genstat, at least until R develops a decent GUI. But that time — if ever happens — we may decide to switch everybody to R.

Filed in software, statistics No Comments

Bending covariance matrices

2/04/2004

One of the reasons I keep this weblog is because it sometimes acts as a repository of links to work that I do not perform very often. This morning I was asked about software for bending matrices, hospital so here it is.

One of the typical problems when running multivariate genetic analyses is obtaining non-positive definite matrices (with negative eigenvalues. Thanks Greg for the link). This problem is often approached by modifying or ‘bending’ the matrix, in such a way that there are no negative eigenvalues. A typical reference for this procedure is Hayes, J.F. and W.G. Hill. 1981. Modification of estimates of parameters in the construction of genetic selection indices (’bending’). Biometrics 37: 483-493.

Karin Meyer is a scientist from AGBU (the Animal Genetics and Breeding Unit not the Armenian General Benevolent Union) who made available a set of Fortran programs called PDMATRIX, that allows bending matrices before using them in BLUP analysis (see Karin’s page under miscellaneous). This set includes:

  • FLBEND that “modifies a non-positive definite matrix comprised of pair wise or block covariance estimates, where estimates may or may not be based on experimental results”.
  • ITSUMCOV that “deals with specific case of multiple, multivariate analyses of different subsets of a number of traits. Say we have q traits in total, and S analyses involving k traits each”.

Thus, bending the matrices is just a matter of dusting your Fortran compiler and running the software.

Filed in genetics, statistics No Comments

GLME model for birds

18/03/2004

I completed analysing the data on birds abundance (number of birds) and richness (number of species). At the end, anabolics I settled down on a Generalised Linear Mixed Model with a Poisson distribution, log link, random plot effects and a first order autoregressive correlation structure. This approach takes into accoun that the repeated counts (48 fortnightly asessments for each plot) are not independent.

It was my first time using the ‘experimental’ Splus correlatedData library, which seems to be working pretty well, although there are still a few non-yet-implemented features like deviance. I tested the model using both gee (the function for generalised estimating equations) and glme (a generalised version of lme). Results were very close to each other and, more importantly, they did make sense from a biological point of view.

Now comes the ‘writing the report’ part, so we can publish the results. Still pending is trying a zero inflated Poisson model — two stage approach with a binomial component followed by a Poisson one — to take into account some overdispersion. I doubt that it will make much of a difference, but it would be nice to see an implementation in Splus.

P.S. This was published as MacDonald, M.A., Apiolaza, L.A. and Grove, S. 2005. The birds of retained vegetation corridors: a pre- and post-logging comparison in dry sclerophyll forest in Tasmania. Forest Ecology and Management 218: 277–290. You can get a copy looking at publication 26.

Filed in environment, statistics No Comments

Generalised linear mixed models

7/03/2004

After lunch of March 3rd, ambulance I posted my last message and made my last visit to an environmentalist forum. It was an interesting experience to present people alternative views, oncologist breaking the echo chamber effect. However, obesity it required quite a bit of time that I could not longer justify to myself. Rather than talking about protecting the environment I prefer to spend that time working on it. I am now in the middle of an interesting statistical analysis of birds populations in Tasmanian forests.

After working on the typical descriptive statistics, I tried a log-linear model in Splus, which captured a small part of the variation. I then wanted to try a generalised linear mixed model (like glmm in Genstat), to include random terms in the model. However, the current professional version of Splus does not come with procedures to implement this. Fortunately, I found in Insightful’s site the S+CorrelatedData experimental library that implements glmm. I will give it a try during next week, including first random site effects and then random regressions for sites.

As always, working in a new problem is an excellent learning experience, and a good opportunity to add new statistical tools to my repertoire.

Filed in environment, statistics, tasmania 1 Comment

The elements of style

3/03/2004

Several years ago, symptoms I think it was 1999, viagra buy I downloaded and read Exegeses on Linear Models (PDF, nurse 108KB), a paper written by Bill Venables in 1998. I then put it aside for five years, until last week.

It was a beautiful day in Hobart when I drove up to Bellerive’s Esplanade (Eastern Shore). For reasons beyond this note I spent one hour sitting next to the river reading the paper. I really enjoyed the language and the examples. The paper is elegant, witty and guides the reader around many interesting topics.

Near the end of the paper, I was suddenly distracted by intense sounds around. It was a combination of seagulls (I was completely surrounded by them) and the sails of five boats, twenty metres from my seat, flapping during a race. Just another day reading statistics.

Filed in language, statistics No Comments