Using R for genetic analyses

11/06/2009

As some people know, I have been using asreml for genetic analyses for quite a few years and even keep the ASReml Cookbook§. I was quite happy to see the development of asreml-R, a package that makes available most of ASReml’s functionality from R. This made my life easier: I still use plain-vanilla ASReml for very big jobs, but I can access a much more comprehensive statistical system for fairly substantial jobs.

One of my main problems with asreml-R is that it is not available for OS X (mac). Yes, I can dual boot or use a virtual box, but both options are a bit of a pain. I would rather use my computer with its primary operating system and no strange overheads. I have requested a mac version several times. It seems that the code can be compiled without problems, but it is the license management software that is not available for the mac.

I then started looking for options to run genetic analyses. nlme was designed around hierarchical models, and fitting experimental designs with it did not feel right. lme4 is looking good; its main issue was fitting pedigrees, a matter at least partially solved by the pedigreemm package. I then came across the MCMCglmm§ package, which has some nice features: it makes Bayesian analyses accessible, has ready support for pedigrees and uses a syntax not that different from asreml-R.

After playing with the MCMCglmm library, I found that I could not use pedigrees with parents acting both as males and females. I modified the code (line 26 of inverseA.R) to print a warning rather than to stop and then compiled the library again. Voilà! It is working (the beauty of having access to the source).


R CMD INSTALL /Users/lap44/Downloads/MCMCglmm --library=/Users/lap44/Library/R/2.9/library
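
The situation that tripped the check can be illustrated outside MCMCglmm. What follows is a minimal sketch in Python, not the package's R code: it lists the individuals that appear in a pedigree both as a male and as a female parent, which is worth knowing before deciding that a warning is safe to ignore. The `Record` layout, the `dual_role_parents` function and the toy pedigree are hypothetical names made up for this illustration.

```python
from collections import namedtuple

# Hypothetical record layout: one row per individual, with None for unknown parents.
Record = namedtuple("Record", ["animal", "sire", "dam"])


def dual_role_parents(pedigree):
    """Return the sorted IDs that appear both as a sire and as a dam."""
    sires = {rec.sire for rec in pedigree if rec.sire is not None}
    dams = {rec.dam for rec in pedigree if rec.dam is not None}
    return sorted(sires & dams)


# Toy pedigree (made up): tree P2 is used as pollen parent for O2
# and as seed parent for O1.
pedigree = [
    Record("P1", None, None),
    Record("P2", None, None),
    Record("P3", None, None),
    Record("O1", "P1", "P2"),
    Record("O2", "P2", "P3"),
]

print(dual_role_parents(pedigree))  # prints ['P2']
```

An empty list would mean that no parent is used in both roles, so the check would never have been triggered in the first place.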

By the way, ASReml is still my primary tool at the moment, but I enjoy having good alternatives.

Filed in genetics, mac, software, statistics

Slide

1/06/2009

It is a long way down the slide. Auckland Zoo§.

Filed in geocoded, photos

Clusterf*k

1/06/2009

This email follows up earlier communication regarding the major ICT systems failure on Friday night [actually it was the first news I got], where the University’s primary data storage system had multiple simultaneous disk failures.

Note that while no staff email has been lost, staff files on the P: or K: drives have only been recovered from backups made early Friday morning. Files that were created new, or modified during Friday have been lost.

Just in case, if you emailed me on Thursday or Friday you had better send me the email again.

Filed in web

The value of professional associations

29/05/2009

I have followed with interest the discussion on what should be the role of the New Zealand Institute of Forestry (NZIF§). It seems that a frequent position espoused by members is that the NZIF has two options to provide value to its members: making registration a legal requirement and ensuring high professional standards. I would contend that the first one is an oxymoron: ‘let us create value by making membership compulsory, so then members derive value from membership’. This generates as much value to members as a protection racket does to its victims. The second approach relies on the existence of an authority with the capacity to evaluate high professional standards. But who are our peers in our narrow fields of specialization? Who can judge us as being ‘good enough’ to sell services as a growth modeler, forest economist, tree breeder, etc.? At the end of the day, the market is king (or queen), and we are judged every time that we complete a professional assignment. The same goes for other activities: I would hire an accountant or a lawyer based on recommendations and experience (which are often translated in the marketplace through availability and fees charged) rather than by membership of a professional association.

That leaves us with the question of how we really derive value from voluntary association. We interact with other members, we exchange information, we learn. Do we strictly need the NZIF for this learning? Probably not, although it facilitates the process. Maybe the right function for the NZIF is to create opportunities for professional development, conferences and coordinated submissions, and to make clear the role of forestry to New Zealand society. I think that the NZIF provides value by making communication easier for its members, while any artificial barriers will only be detrimental to the interests of people working in the forestry sector and to their customers.

P.S. This quote from Free to Choose§ by Milton and Rose Friedman makes the point very clearly:

Licensure is widely used to restrict entry, particularly for occupations like medicine that have many individual practitioners dealing with a large number of individual customers. As in medicine, the boards that administer the licensure provisions are composed primarily of members of the occupation licensed, whether they be dentists, lawyers, cosmetologists, airline pilots, plumbers, or morticians. There is no occupation so remote that an attempt has not been made to restrict its practice by licensure. According to the chairman of the Federal Trade Commission: “At a recent session of one state legislature, occupational groups advanced bills to license themselves as auctioneers, well-diggers, home improvement contractors, pet groomers, electrologists, sex therapists, data processors, appraisers, and TV repairers. Hawaii licenses tattoo artists. New Hampshire licenses lightning-rod salesmen.”

The justification offered is always the same: to protect the consumer. However, the reason is demonstrated by observing who lobbies at the state legislature for the imposition or strengthening of licensure. The lobbyists are invariably representatives of the occupation in question rather than of the customers. True enough, plumbers presumably know better than anyone else what their customers need to be protected against. However, it is hard to regard altruistic concern for their customers as the primary motive behind their determined efforts to get legal power to decide who may be a plumber.

Filed in forestry, new zealand

The pain of moving (computers)

13/05/2009

It was time to retire ‘Mastropiero’§ (my old mac laptop). While software-wise it was running well (using Leopard), the build quality of the first series of MacBook Pros was not stellar. The new laptop, ‘Abraxas’, is a MacBook Pro with a 320 GB hard drive, 4 GB of RAM and a 2.66 GHz Core 2 Duo processor.

Despite all the propaganda, Migration Assistant is a fairly useless beast (at least in my personal situation). The university buys the computer and sets up a user account that, incidentally, always has the same name for a given user. This means that I cannot just migrate my old account because there is already an account with the same name (and a bunch of settings) in place. In addition, Migration Assistant is pretty much an all-or-nothing affair, and I wanted to start with a fairly clean installation.

In the end I connected both laptops to the network and moved my data across. I imported all my songs into iTunes and copied the photo library, which was automatically upgraded from version 6 to version 8.

In the transfer process I dropped a number of programs that I was not using much. My current list of programs is in ‘I Use This’§. There is still a small amount of duplication; for example, both EagleFiler and DEVONthink are on the list, although eventually I will only keep the former. Another case in point is MS Office. I can’t really stand MS Word and PowerPoint, particularly in their mac incarnations. If Office 2004 was slow, the 2008 version is a turd. I am trying to get by using OpenOffice, which I still do not consider completely satisfactory. I also have Pages, which is not quite compatible with Word. I think that OpenOffice still does a better job; it is uglier but more functional.

From a teaching point of view Keynote (presentations) and TeXShop (lecture notes) do the heavy lifting. My calendar is managed in iCal, which is synchronized to Google Calendar and also to my resuscitated Palm T3, the latter using Mark/Space’s Missing Sync. I dutifully ignore Palm’s own software.

Statistics are managed through R, although I am still waiting for a mac version of asreml-R, a commercial package for genetic analyses. All publication-quality plots are done in R as well.

The university IT guys set up dual booting for me (20 GB Windows XP partition), but I haven’t yet found the time to boot into Windows. They also installed the developer tools, which I hope to use to do some programming with Python and C++ or Fortran 95 (depending on time availability).

And that is it! A simple setup with oodles of space and memory; at least it feels like that now. Let’s wait for a year and see how it feels.

Filed in mac

Drylands

5/04/2009

There is a substantial amount of land with low rainfall, say between 500 and 900 mm of rain per year. Usually, this land is allocated to low-productivity uses, for example sheep farming. Could we use drought-tolerant eucalypts that produce durable wood? We could then have diversification of land use, alternative products and even additional carbon sequestration.

When we say dry it looks like this:

Drylands in Marlborough.

I would say that it is certainly worth a try; more precisely, a proper try. There is a history of half-hearted attempts in the matter, so if we are going to have a go, we had better do it well or not at all.

Filed in forestry, photos

Eight years is nothing

31/03/2009

The screenshot shows the ‘State of the Surname’ as of 2001. Eighty-five hits for Apiolaza, most of them referring to my papers and emails to groups.

Google search for 2001.

Repeating the search today, we get a dramatically different answer§.

Filed in photos, web

This time is Calvino

30/03/2009

This happens relatively frequently: I am talking with someone who doesn’t know me well and, at some point in the conversation, I mention that I am a forester. Then we move on to books and I mention someone like Borges or Calvino and they look at me with this puzzled face, as in ‘I didn’t know that foresters could read’. I know, it happens to other professions as well; just for the record, not all of us are semi-literate apes working with a chainsaw.

I was sorting out my bookshelves at work when I found a copy of ‘The literature machine’, a collection of essays by Italo Calvino§. It had my name and signature, together with 2002, Melbourne, Australia. (Digression: besides my name and signature I always put the city where I bought a book). I had vague memories of walking around in Melbourne’s CBD and finding an underground bookshop. At the time I was not looking for anything in particular, just browsing titles.

Why did I buy the book and never read it? I do remember browsing it and getting distracted by something more urgent, albeit clearly unimportant, because I cannot remember what it was. Probably I was not ready either; it has happened to me before. From ‘Uncle Tom’s cabin’§ when I was nine, to ‘The Fountainhead’§ when I was a teenager, to ‘The literature machine’ seven years ago. Most likely there is an issue of maturity, of being ready to read a particular story, philosophy or approach to the world.

Many years ago I read some of Calvino’s books, like Cosmicomics§ (brilliantly funny) and ‘The cloven viscount’§ (very enjoyable reading). But I particularly struggle with two literary forms: essays and plays. I can sometimes get into the former, but the latter has proven, until today, insurmountable.

However, today is the time for Calvino and essays. There is something deeply stimulating in these essays, together with a quaintness created by the forty years gone since they were written. The feeling of freshness, possibility and hope from 1968 reads strangely in 2009. At the same time, there is a bit of breaking with the system since the implosion of the international economy. Maybe it is an excellent time to resonate with Calvino, as in the old days.

Filed in books, influences, writing

Teaching stats and software

12/03/2009

Forestry deals with variability, and variability is the province of statistics. The use of statistics permeates forestry: we use sampling for inventory purposes, we use all sorts of complex linear and non-linear regression models to predict growth, linear mixed models are the bread and butter of the analysis of experiments, etc.

I think it is fair to expect foresters to be at least acquainted with basic statistical tools, and we have two courses covering ANOVA and regression. In addition, we are supposed to introduce/reinforce statistical concepts in several other courses. So far so good, until we reach the issue of software.

During the first year of study, it is common to use MS Excel. I am not a big fan of Excel, but I can tolerate its use: people do not require much training to (ab)use it and it has a role in introducing students to some of the ‘serious/useful’ functions of a computer; that is, beyond gaming. However, one can hit Excel’s limits fairly quickly which, together with the lack of an audit trail for the analyses and the need to repeat all the pointing and clicking every time we need an analysis, makes looking for more robust tools very important.

Our current robust tool is SAS (mostly BASE and STAT, with some sprinkles of GRAPH), which is introduced in second year during the ANOVA and regression courses. SAS is a fine product, however:

  • We spend a very long time explaining how to write simple SAS scripts. Students forget the syntax very quickly.
  • SAS’s graphical capabilities are fairly ordinary and not at all conducive to exploratory data analysis.
  • SAS is extremely expensive, and it is doubtful that we could afford to add the point-and-click module.
  • SAS tends to define the subject; I mean, it adopts new techniques very slowly, so there is the tendency to do only what SAS can do. This is unimportant for undergrads, but it is relevant for postgrads.
  • Users tend to store data in SAS’s own format, which introduces another source of lock-in.

In my research work I mostly use ASReml§ (for specialized genetic analyses) and R§ (for general work), although I am moving towards using ASReml-R (an R library that interfaces with ASReml) to have a consistent work environment. For teaching I use SAS to be consistent with second year material.

Considering the barriers for students mentioned above, I have started playing with R-commander§, a cross-platform GUI for R created by John Fox (the writer of some very nice statistics books§, by the way). As I see it:

  • Its use in command mode is no more difficult than that of SAS.
  • We can get R-commander to start working right away with simple(r) methods, while maintaining the possibility of moving to more complex methods later by typing commands or programming.
  • It is free, so our students can install it on their laptops and keep on using it when they are gone. This is particularly relevant for international students: many of them will never see SAS again in their home countries.
  • It allows an easy path to data exploration (pre-requisite for building decent models) and high quality graphs.
  • R is open source and easily extensible.

I think that R would be an excellent fit for teaching; nevertheless, there would be a few drawbacks, mostly when dealing with postgrads:

  • There are restrictions on the size of datasets (they have to fit in memory), although there are ways to deal with some of the restrictions. On the other hand, I have hit the limits of PROC GLM and PROC MIXED before and that is where ASReml shines.
  • Some people have an investment in SAS and may not like the idea of using different software.

We will see how it goes because, as someone put it many years ago, there is always resistance to change:

It must be remembered that there is nothing more difficult to plan, more doubtful of success, nor more dangerous to manage, than the creation of a new system. For the initiator has the enmity of all who would profit by the preservation of the old institutions and merely lukewarm defenders in those who would gain by the new ones.—Niccolò Machiavelli, The Prince, Chapter 6.§

Filed in software, statistics

Internet Blackout New Zealand

17/02/2009

New Zealand's new Copyright Law presumes 'Guilt Upon Accusation' and will Cut Off Internet Connections without a trial. Join the black out protest against it!

16 to 23 February 2009.

Filed in new zealand, web