Archive of articles classified as "software"

Default to R

8/09/2009

Last March I posted an explanation of the issues behind getting R accepted in our School for teaching statistics.

At School level, I needed to spend substantial time compiling information to prove that R could satisfy my colleagues’ statistical needs. Good selling points were lme4, lattice/ggplot and pointing my most statistically inclined colleagues to CRAN. Another important issue was the ability to have a GUI (Rcmdr) that could be adapted to our specific needs. We will develop an extra menu item to fit the non-linear growth models used in forestry. Our School has now adopted R as the default software for teaching any statistical content during the four years of the curriculum.

At the university level, my questions to the department of Mathematics and Statistics sparked a lot of internal discussion, which resulted in R being adopted as the standard software for some of the second year courses (it was already the standard for most courses in 3rd and 4th year). The decision was not unanimous, particularly because for statisticians knowing SAS is one of those ‘must be in the CV’ skills, but they went for change. The second year courses are offered across colleges, which makes the change very far-reaching. These changes will also imply that in the near future many computers in the university will come with R preinstalled.

It is nice to see interesting changes once in a while.

Filed in software, statistics No Comments

Using R for genetic analyses

11/06/2009

As some people know, I have been using asreml for genetic analyses for quite a few years and even keep the ASReml Cookbook. I was quite happy to see the development of asreml-R, a package that makes available most of ASReml’s functionality from R. This made my life easier: I still use plain-vanilla ASReml for very big jobs, but I can access a much more comprehensive statistical system for fairly substantial jobs.

One of my main problems with asreml-R is that it is not available for OSX (mac). Yes, I can dual-boot or use a virtual box, but both options are a bit of a pain. I would rather use my computer with its primary operating system and no strange overheads. I have requested several times to have a mac version. It seems that the code can be compiled without problems, but it is the license management software that is not available for the mac.

I then started looking for options to run genetic analyses. nlme was designed around hierarchical models and fitting experimental designs did not feel right. lme4 is looking good and the main issue was around fitting pedigrees, a matter at least partially solved by the pedigreemm package. I then came across the MCMCglmm package, which has some nice features: it makes Bayesian analyses accessible, has ready support for pedigrees and uses a syntax not that different from asreml-R.

After playing with the MCMCglmm library, I found that I could not use pedigrees with parents acting both as males and females. I modified the code (line 26 of inverseA.R) to print a warning rather than to stop and then compiled the library again. Voilà! It is working (the beauty of having access to the source).


R CMD INSTALL /Users/lap44/Downloads/MCMCglmm --library=/Users/lap44/Library/R/2.9/library

By the way, ASReml is still my primary tool at the moment, but I enjoy having good alternatives.

Filed in genetics, mac, software, statistics 2 Comments

Teaching stats and software

12/03/2009

Forestry deals with variability and variability is the province of statistics. The use of statistics permeates forestry: we use sampling for inventory purposes, we use all sorts of complex linear and non-linear regression models to predict growth, linear mixed models are the bread and butter of the analysis of experiments, etc.

I think it is fair to expect foresters to be at least acquainted with basic statistical tools, and we have two courses covering ANOVA and regression. In addition, we are supposed to introduce/reinforce statistical concepts in several other courses. So far so good, until we reach the issue of software.

During the first year of study, it is common to use MS Excel. I am not a big fan of Excel, but I can tolerate its use: people do not require much training to (ab)use it and it has a role in introducing students to some of the ‘serious/useful’ functions of a computer; that is, beyond gaming. However, one can hit Excel's limits fairly quickly which–together with the lack of an audit trail for the analyses and the need to repeat all the pointing and clicking every time we need an analysis–makes looking for more robust tools very important.

Our current robust tool is SAS (mostly BASE and STAT, with some sprinkles of GRAPH), which is introduced in second year during the ANOVA and regression courses. SAS is a fine product, however:

  • We spend a very long time explaining how to write simple SAS scripts. Students forget the syntax very quickly.
  • SAS’s graphical capabilities are fairly ordinary and not at all conducive to exploratory data analysis.
  • SAS is extremely expensive, and it is dubious that we could afford to add the point and click module.
  • SAS tends to define the subject; I mean, it adopts new techniques very slowly, so there is the tendency to do only what SAS can do. This is unimportant for undergrads, but it is relevant for postgrads.
  • Users tend to store data in SAS’s own format, which introduces another source of lock-in.

In my research work I use mostly ASReml (for specialized genetic analyses) and R (for general work), although I am moving towards using ASReml-R (an R library that interfaces ASReml) to have a consistent work environment. For teaching I use SAS to be consistent with second year material.

Considering the previously mentioned barriers for students I have started playing with R-commander, a cross-platform GUI for R created by John Fox (the writer of some very nice statistics books, by the way). As I see it:

  • Its use in command mode is not more difficult than SAS.
  • We can get R-commander to start working right away with simple(r) methods, while maintaining the possibility of moving to more complex methods later by typing commands or programming.
  • It is free, so our students can load it into their laptops and keep on using it when they are gone. This is particularly true with international students: many of them will never see SAS again in their home countries.
  • It allows an easy path to data exploration (pre-requisite for building decent models) and high quality graphs.
  • R is open source and easily extensible.

I think that R would be an excellent fit for teaching; nevertheless, there would be a few drawbacks, mostly when dealing with postgrads:

  • There are restrictions to the size of datasets (they have to fit in memory), although there are ways to deal with some of the restrictions. On the other hand, I have hit the limits of PROC GLM and PROC MIXED before and that is where ASReml shines.
  • Some people have an investment in SAS and may not like the idea of using a different software.

We will see how it goes because–as someone put it many years ago–there is always resistance to change:

It must be remembered that there is nothing more difficult to plan, more doubtful of success, nor more dangerous to manage, than the creation of a new system. For the initiator has the enmity of all who would profit by the preservation of the old institutions and merely lukewarm defenders in those who would gain by the new ones.–Niccolò Machiavelli, The Prince, Chapter 6.

Filed in software, statistics 6 Comments

Multivariate simulation for a breeding program

13/01/2009

If you haven’t found something strange during the day, it hasn’t been much of a day–John Archibald Wheeler.

No one can retell the plot of a Cortázar story; each one consists of determined words in a determined order. If we try to summarize them, we realize that something precious has been lost–Jorge Luis Borges

The core of multivariate simulation for a breeding program is the generation of observations that follow a given covariance matrix V. Using Cholesky decomposition (so V = C'C) one can easily generate the desired distribution. I use the R core.sim function as the basic building block for creating base populations and progeny tests.

# core.sim generates n.obs observations, which follow a
# n.vars multivariate normal distribution with mean 0
# and variance C'C. That is, it takes the Cholesky
# decomposition C of a covariance matrix as argument.
# This function is used by all base population and progeny
# testing functions.
core.sim <- function(C, n.obs, n.vars){
  N <- matrix(data = rnorm(n.obs*n.vars),
              nrow = n.vars, ncol = n.obs)
  S <- t(C %*% N)
  return(S)
}
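
A minimal usage sketch follows, assuming a hypothetical two-trait covariance matrix V (the values, the seed and the sample size are made up for illustration and are not from the original simulation). One detail worth checking when calling the function: as written, core.sim multiplies C %*% N, so the simulated observations have covariance C %*% t(C). R's chol() returns the upper-triangular factor R with t(R) %*% R = V, so this sketch transposes it before passing it in.

```r
# core.sim copied from the post so that the sketch is self-contained
core.sim <- function(C, n.obs, n.vars){
  N <- matrix(data = rnorm(n.obs*n.vars),
              nrow = n.vars, ncol = n.obs)
  S <- t(C %*% N)
  return(S)
}

set.seed(42)  # arbitrary seed, only for reproducibility of the sketch

# Hypothetical covariance matrix for two traits (assumed values)
V <- matrix(c(1.0, 0.6,
              0.6, 2.0), nrow = 2, byrow = TRUE)

# chol(V) is upper triangular with t(chol(V)) %*% chol(V) = V;
# core.sim needs the factor C such that C %*% t(C) = V
C <- t(chol(V))

# One row per observation, one column per trait
obs <- core.sim(C, n.obs = 10000, n.vars = 2)

dim(obs)
round(cov(obs), 2)  # sample covariance, should be close to V
```

Passing chol(V) without the transpose would still return normally distributed values, but their covariance would be chol(V) %*% t(chol(V)), which in general differs from V.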

R syntax highlighting courtesy of the WP-syntax plugin (an interface to GEshi).

Filed in research, software, statistics No Comments

Mac update 2008

25/12/2008

Nearing the end of the year I keep track of the software that I use the most on my mac. Firstly, I am a researcher–I always struggle with the word scientist–so the programs I use the most have my own bias:

  • Statistical analysis: R (free). R is the closest thing to a lingua franca for computational statistics: it is cross-platform, flexible and its graphics are great. The mac version comes with a much better script editor than its windows counterpart.
  • Writing: LaTeX, for which I use MacTeX (free). Sometimes I do provide in this blog my reasons for this choice. Initially I was using TexShop as editor, but I have moved to TextMate.
  • Reference management: Bibdesk (free). Some eye-candy on top of the time-proof BibTeX format.
  • Text editor: TextMate (€39). Well, I do pay for good software, and TextMate has the right combination of features, footprint and macness. I do miss one or two features, but clearly not enough.
  • Presentations: Keynote, which is part of the iWork suite ($79). In fact, this is the only part of iWork that I do use. When teaching some subjects (like statistics) I do require a fair number of equations in the presentations, for which I use LaTeXiT (free). Sometimes I embed Google Earth flyovers in presentations, for which I use iShowU ($20).
  • Keeping it organized: EagleFiler ($40) for project archives, web snippets and email archiving. Good quality software and a very responsive developer.
  • Keyboard goodness: Quicksilver (free) acts as application launcher, search utility, etc.

Concerning web interaction (I do keep a few sites), my list is not that long:

  • Browser: Firefox (free). Safari is nice, but I would miss the following plugins: Firebug, Zotero and Delicious (in that order).
  • FTP: Cyberduck (free). I have fairly simple requirements in this department, so I find it difficult to justify paying for something like Transmit.
  • Blogging: MarsEdit ($30) is a solid and straightforward piece of software; worth the money.
  • Twitter: I use Twitterrific (the ad-supported version) to update twitter, which updates this site’s sidebar and my facebook status.

By the way, Rui Carmo keeps a good list of mac alternatives to windows software.

Important new addition: MoneyWorks.

A cursory web search on accounting software for the mac will lead mostly to disappointment. The big players (Quickbooks and MYOB) have shocking versions for the mac. On the other hand, most small players (like iBank or Cha-Ching) only target the personal finance market. I just started working with MoneyWorks, which is a decent and usable (I have no better adjectives for this category, it is accounting software for God’s sake) program for small businesses. Almost enjoyable!

Filed in mac, software No Comments

Return to Latex

15/12/2008

Last night I was helping my wife to fix an MS Word document. The document had been edited by several people, with varying setups, so it was a real mess. Different page sizes (letter and A4), page paragraph settings, sections, etc. Making changes at any point of the document created all sorts of side effects; for example, content moving to different pages or missing formatting when deleting supposedly unrelated sections. At the end, she copied and ‘pasted unformatted’ to another document to be able to fix the document from scratch. The always touted advantage of electronic documents comes down in flames when one needs to resort to such a basic fix. Better than typing everything again, but still unacceptable as a proper solution.

In contrast, I have been again enjoying the chance of writing some fairly long documents by myself; that is, with no co-authors. While writing a review paper I can go back to a LaTeX document that I wrote back in 2001. I copied the useful parts–mostly long equations and a couple of paragraphs–and pasted them in my new document. The style will be taken care of by the article class (or even the memoir class if I were feeling fancy).

When working in solitary mode I now default to LaTeX. My only change this year was to move to compiling documents with XeLaTeX instead of LaTeX. Reasons? Easy access to my system fonts and full use of unicode, so I can write with whatever characters I prefer. My current setup is documented here.

The only big choice comes down to whichever text editing system one prefers. I have done most of my writing during the last two years in TexShop, which is excellent. Nevertheless, I used Aquamacs in a project (just for the sake of it) and there was an old-fashioned setting that I really liked: automatic hard wrapping to a fixed column. This created a neat and easy to read file, which was independent of window size (in contrast to soft wrapping). The drawback is that emacs is a monster program and so mac-unlike that it is really hard for me to find my way around all the options (and, god, there are an awful lot of options).

My geeky side does enjoy trying editors, and at night time, I tested most of the editors listed here, from a LaTeX editing viewpoint. My finalists were TexShop, Aquamacs and Textmate. The latter is pure macness and works beautifully; it even automatically detects documents that require XeLaTeX based on regular expressions. The compilation window is beautiful as well, and it even supports basic code folding for documents (although why folding of sections is not supported is beyond me). The question would be if folding justifies 39 euros… Well, it could be a nice Christmas gift to myself.

P.S. 2008-12-19: I did buy a copy of Textmate, with 15% academic discount.

Filed in software, writing No Comments

Status on writing tools

31/10/2008

As far as I know, 2008 has been my most productive year ever from a writing point of view. Besides the blogging and micro-blogging stuff (aka informal public writing), I have worked on lecture notes and I have been writing an inordinate (for me) number of words in research papers. I still hope to submit a couple more papers this year.

I have written all blog posts using MarsEdit, which is an excellent simple editor. I used MS Word for quite a few papers because I am working with LaTeX-unaware students and colleagues. However, I have also been using LaTeX for all my lecture notes and a number of long(ish) research papers where I am working mostly by myself.

Until recently, I was using TexShop + LaTeX, but then I discovered XeLaTeX, which added unicode support–so I can write zúñiga in my files–and font management. I can easily access all my fonts in the mac in a fairly simple way. I am documenting the switch in the wiki side of this site.

During some asreml training I re-discovered emacs during Brian’s explanations. I installed Aquamacs (an OS X emacs version), which comes with ESS (emacs speaks statistics) to interface with R and AUCTeX (a LaTeX editing environment) pre-installed. Overall, I am still finding my way within Aquamacs, but the whole system feels very powerful. Now, if I manage to get an asreml version for the mac that would be total bliss.

Filed in mac, software, writing No Comments

Updating ephemera

8/05/2008

Ephemera plural noun: a- things that exist or are used or enjoyed for only a short time. b- items of collectible memorabilia, typically written or printed ones, that were originally expected to have only short-term usefulness or popularity: e.g. Mickey Mouse ephemera.

Alternative definition: Pieces of text that will become irrelevant in a short while and/or do not deserve a full-length page.

Sometimes I have the occasional sound bite or flotsam and jetsam that — although I want to make public — do not warrant writing a full post. I used to deal with this through my Ephemeral wiki page, but it still was a bit of a hassle, and after a while was not being updated. I finally gave up and joined the Twitter crowd, not because I am interested in following other people, but due to the ability to easily update my blog with ephemera.

Thus, I can now write a quick post either using TwitPod* (a mac twitter client) or via a text message from my mobile phone. The message is then published via WPTwitter (a Wordpress plugin) and painlessly put on the sidebar of my blog, under a ‘Twitter flotsam’ header.

* It may not be the best Twitter client, but did I mention that I use it only for posting and that I do not follow other users? Yes, I did.

Filed in software, web No Comments

Things have to look like

6/05/2008

I think that I have written before about this. Why do virtual things have to look like something else, often a supposedly familiar object? I was struck by this again through two almost unconnected interfaces with almost the same look:

They both happen to look like the front page of a newspaper. Of course they are not the only sites that look like a newspaper, but I just happened to come across them and not any other piece of virtual junk.

The idea behind Times is simple: you (me) are used to reading news in a newspaper. RSS items are ‘sort of news’ and they look nice when organised as a newspaper. Then, they should be presented as such. The morning news screams ‘I am kind of a newspaper so I should look like one’. Period.

But should they? Papers are big, spread when there was plenty of space available to open the bloody things completely. But now we fold them, we read ‘text’ (either plain or highly decorated) and try to extract maximum information from ever increasing noise. Is something that looks like a several centuries old piece of paper a good interface to summarise too many pieces of flotsam? Maybe updates, news, email should just float to the top according to our reading habits or topic popularity (sometimes inverting to extremely unpopular topics).

Does an arbitrary spatial configuration provide any additional information? Doubtful, but it sometimes looks pretty.

Filed in software, web No Comments

Settling down after upgrade

21/03/2008

Following last week’s major upgrade to Leopard and Office 2008 I needed to pick up with a few things.

I have used only the basic features of Office 2008: Word and PowerPoint look OK, although I still prefer Keynote to the latter if I can get away with it (e.g. when I am teaching). Entourage just does not cut it for me. The interface received a facelift–although it is still far from pretty–but functionality-wise it is lacking:

  • Email, calendar and contacts synchronise with exchange without problems, but tasks and notes do not.
  • Contact groups are created in a local account rather than in exchange.
  • There is no simple way to add keyboard shortcuts to file messages.
  • The task functionality is still underwhelming.

Given these issues I am still relying on Mail, Address Book and iCal. The former two synchronise with exchange, while the latter does not. I am publishing the calendar in a webdav server so I can access it remotely (just in case). Nevertheless, to dos in Mail are not up to scratch either, so I am relying on Things.

I did test a few task management applications and the best designed (for my taste) were Omnifocus and Things. The problem with Omnifocus is that it kept pushing me to work in a very specific way, which happens not to fit with my own way of doing things. In contrast, Things lets me order tasks in lots of different ways.

And for long documents

At the moment I am working on three long documents with a fair amount of complexity and (too) many equations. I am using MacTeX (a LaTeX distribution) with TexShop as front end and BibDesk for reference management. The interesting thing is that BibDesk has a much better interface than Endnote 9, which is the version that we are still using in the University.

I can use LaTeX only because I am working by myself on these documents, but if that were not the case, then I would rely on the not so liked standard: MS Word.

Filed in mac, software, writing No Comments