DataDebrief

Monday, June 6, 2011

10 R One Liners to Impress Your Friends

Following the trend of one liners for various languages (Haskell, Scala, Python), here's some examples in R

Multiply Each Item in a List by 2

#lists
lapply(list(1:4),function(n){n*2})
# otherwise
(1:4)*2

Sum a List of Numbers

#lists
lapply(list(1:4),sum)

# otherwise
sum(unlist(list(1:4))) # or simply
sum(1:4)

Verify if Exists in a String

wordlist = c("lambda", "data", "plot", "statistics", "R")
tweet = c("R is an integrated suite of software facilities for data manipulation, calculation and graphical display")
wordlist[wordlist %in% (c(unlist(strsplit(tweet,' ', fixed=T))))]

Read in a File

readLines("data.file", n=-1)

Happy Birthday to You!

lapply((1:4),function(x){ paste(c("Happy Birthday to ", ifelse(x!=3, "you", "dear Name")), sep="", collapse="")})

Filter list of numbers

n = c(49, 58, 76, 82, 88, 90); c(list(n[which(n<=60)]),list(n[which(n>60)]))

Fetch and Parse an XML web service

library('XML'); xmlParseDoc('http://search.twitter.com/search.atom?&q=R-Project', asText=F)

Find minimum (or maximum) in a List

# for lists
lapply(list(c(14, 35, -7, 46, 98)), min, classes="numeric", how="replace")

# otherwise
min(unlist(list(14, 35, -7, 46, 98)))
# or simply
min(c(14, 35, -7, 46, 98))
max(c(14, 35, -7, 46, 98))

Parallel Processing

# http://cran.r-project.org/web/packages/doSMP/vignettes/gettingstartedSMP.pdf
# copy from Section 4 An example doSMP session
library(doSMP); w <- startWorkers(workerCount = 4); registerDoSMP(w); foreach(i = 1:3) %dopar% sqrt(i)

Sieve of Eratosthenes

##ok, this one is a little cheating
library('spuRs'); primesieve(c(),2:50)

Wednesday, March 9, 2011

Ascii code table in R

A quick method to enumerate the printable ascii characters with their hex & decimal values.

The following code relies on taking the "raw" value of a base 10 int (this gives a hex value), and then using the builtin function rawToChar, which gives a character. You can of course change the range (up to 255). Not sure and haven't tested, but for UTF-8, may need ?Encoding (such as enc2utf8(x))

To use as a function that can accept an ascii character and return the decimal value, something like this:

asc <- function(x) { strtoi(charToRaw(x),16L) }

asc("a")
[1] 97

To use as a function to return the ascii character given it's decimal code value, try something like this:

chr <- function(n) { rawToChar(as.raw(n)) }

chr(asc("a")) # 97
[1] "a"

To print the ascii table (for 32:126), try something like this:

coderange = c(32:126)
asciitable_printable = data.frame(
 coderange,
 as.raw(coderange),
 row.names=rawToChar(as.raw(coderange),multiple=TRUE)
)

colnames(asciitable_printable) <- c("dec","hex")
asciitable_printable

Results

Monday, January 10, 2011

EmEditor R code macro - Almost interactive R development for Emeditor

Get the new macro now hosted on github

Edit 18th Jan 2011: The below text refers to the old version of the macro and is no longer relevant, a new post will describe the new macro, and it is also documented on the github site.

As a follow up to the earlier post regarding Emeditor Professional as an editor for R development, I have updated the script based on user feedback. The script now sends selected text or the whole (if no text selected) file to a running RGui Console if set to, otherwise as per previous iteration, will send it to rterm.

So, there are now 2 ways to run the macro, based on setting the value for a variable called "useSource" - the default is "true", which means any selected text (or file) will be sent to an already running instance of RGui; If useSource is set to "false", the macro will operate the "old" way, by invoking rterm each time.
This has a number of benefits over invoking rterm each time:

Keeps the same workspace loaded - No need to keep repeating tasks, such as loading libraries, performing lengthy calculations etc, if you only changed some other part of the code;
Plots now display;
You can see what has executed.

The macro relies on the R command "source", as sending text via sendkeys is unreliable. By default, the macro has echo=TRUE,keep.source=TRUE set. For further information on the source command, in R, run "help(source)".

Note that when useSource=true, all R output is only shown in R, only errors with emeditor/macro will show in the Emeditor output window.

More features planned for the future:

Auto launch RGui if useSource=true but no rgui running;
Be closer in features to the vim r plugin; and
Move the macro code to github
Maybe create an actual emeditor plugin and run R in a hidden shell for more integration.

Get the new macro now hosted on github
As always, any issues please leave a comment, or log an issue

Friday, October 15, 2010

US Unemployment Rate since 1948

This is a quick response to a discussion on the Visual Analytics LinkedIn group regarding a Wall Street Journal (WSJ) Graph of 'Historical US Unemployment'.

The WSJ chose a heatmap where ordinarily a simple line chart would suffice and discussion revolved around whether the heatmap was successful in providing information to the reader anymore than a line chart would.

Consequently, I put together a quick line plot in python + matplotlib using the following data sources:

United States Department of Labour - Bureau of Labor Statistics - Historical Unemployment rate
List of US Recessions post 1947 on wikipedia - An extract from the "post depression" section. This may vary from the WSJ determined periods. Wikipedia also contains a number of US (un)employment graphs in the entry for Unemployment.

The following graph is a little different to the WSJ one with respect to:

WSJ chose colour bands for each percentage point, I have chosen 0-4 as good, 4-8 ok, 8+ a bit high, loosely based on the colours of the WSJ example. I have no idea if these values are considered good or not by economists.
Periods of recession are represented by vertical bands, as opposed to white dots in the WSJ example.
For readability, the dates have been labeled every 4th year (as per Bureau) - it would vary depending on zoom level if it was implemented as a dynamic chart for the web.

Graph of US Unemployment rates 1948 - 2009

Graph of Australian and US Unemployment Rates 1948 - 2010
Note: Australian data only available from late 70s, source: Australian Bureau of Statistics - Labour force data

Code:

import datetime
import pylab
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.ticker import FuncFormatter as ff
from numpy import ma,logical_or, arange

rawdata = []
ymd = []

k = []
mdata = []

def chomp(string):
    return string.rstrip('\r\n')

f = open('SeriesReport20101014161301.csv', 'rb')
rawdata = f.readlines()
f.close()
for i in xrange(1,len(rawdata)): #skip header
    rawdata[i] = chomp(rawdata[i])
    yrates = rawdata[i].split(',')
    if i == 1:
        startyear = int(yrates[0])
    elif i == (len(rawdata)-1):
        endyear = int(yrates[0])+1

    for q in xrange(1,13):
        mdata.append(yrates[q])


for x in xrange(startyear,endyear):
    for g in xrange(1,13):
        k.append(datetime.date(x, g, 1))

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot_date(k, mdata, '-',label='unemployment rate', color='#571d1c', linewidth=1)
ax.set_xlim(datetime.date(1947, 12, 1),datetime.date(2010, 1, 1))
ax.xaxis.set_major_locator(mdates.YearLocator(4))
ax.xaxis.set_minor_locator(mdates.MonthLocator(1))

# acceptable? rate
p = plt.axhspan(0, 4, facecolor='green', alpha=0.05) 
#getting worse
p = plt.axhspan(4, 8, facecolor='#aaaaff', alpha=0.05) 
# bad?
p = plt.axhspan(8, 12, facecolor='red', alpha=0.05) 
#ax.xaxis.grid(color='white', linestyle='-', linewidth=0.5)

ax.set_ylim(0,12)
ax.yaxis.set_major_locator(pylab.MultipleLocator(1))
ax.yaxis.set_minor_locator(pylab.MultipleLocator(0.1))
#ax.yaxis.grid(color='white', linestyle='-', linewidth=0.5)

#recession http://en.wikipedia.org/wiki/List_of_recessions_in_the_United_States
plt.axvspan(datetime.date(1948, 11, 1), datetime.date(1949, 10, 1), facecolor='orange', alpha=0.1)
plt.axvspan(datetime.date(1953, 7, 1), datetime.date(1954, 5, 1), facecolor='orange', alpha=0.1)
plt.axvspan(datetime.date(1957, 8, 1), datetime.date(1958, 4, 1), facecolor='orange', alpha=0.1)
plt.axvspan(datetime.date(1960, 4, 1), datetime.date(1961, 2, 1), facecolor='orange', alpha=0.1)
plt.axvspan(datetime.date(1969, 12, 1), datetime.date(1970, 11, 1), facecolor='orange', alpha=0.1)
plt.axvspan(datetime.date(1973, 11, 1), datetime.date(1975, 3, 1), facecolor='orange', alpha=0.1)
plt.axvspan(datetime.date(1980, 1, 1), datetime.date(1980, 7, 1), facecolor='orange', alpha=0.1)
plt.axvspan(datetime.date(1981, 7, 1), datetime.date(1982, 11, 1), facecolor='orange', alpha=0.1)
plt.axvspan(datetime.date(1990, 7, 1), datetime.date(1991, 3, 1), facecolor='orange', alpha=0.1)
plt.axvspan(datetime.date(2000, 11, 1), datetime.date(2001, 10, 1), facecolor='orange', alpha=0.1)
plt.axvspan(datetime.date(2007, 11, 1), datetime.date(2009, 6, 1), facecolor='orange', alpha=0.1)


labels = ax.get_xticklabels()
for label in labels:
    label.set_rotation(30)
plt.legend(loc='best')
plt.xlabel('Date' )
plt.ylabel('% Rate' )
plt.title('U.S. Unemployment rate for the period 1948 - 2009')
plt.show()

Sunday, October 3, 2010

Plotting time vs date in R

Having done the plot with python+matplotlib, thought I would have a go reproducing it in R, using only the builtin "plot". So, just to recap - this is a plot of the sun times (rise/set,twilight and blinding) as generated by the great python library pyephem. The R code reads in a csv file as output from a modified version of the python code used in my original post.

For completeness, the csv generation code is below:

import ephem
import datetime, math
import pylab
 
place = ephem.city('Melbourne')
 
start_date = datetime.datetime(2009,12,1,12)
end_date = datetime.datetime(2011, 1, 31,12)
 
base_offset = '0'
twilight_offset = '-6:00:0.0' # "twilight" = centre of the sun is -6deg ideal horizon
eyeline_offset = '15:34:0.0' # arbitrary +15deg
 
sun = ephem.Sun(place)
 
dates = []
sunrise = []
sunset = []
firstlight = []
lastlight = []
firsteyel = []
lasteyel = []

numdays = (end_date - start_date).days
dates = [start_date + datetime.timedelta(days=i) for i in xrange(numdays+1)] 
dates.sort()

def dt2m(dt):
    return (dt.hour*60) + dt.minute

def m2hm(x):
    h = int(x/60)
    m = int(x%60)
    return '%(h)02d%(m)02d' % {'h':h,'m':m}

sunrise = map(lambda x:dt2m(ephem.localtime(place.next_rising(sun,start=x))),dates)
sunset = map(lambda x:dt2m(ephem.localtime(place.next_setting(sun,start=x))),dates)

place.horizon = twilight_offset
firstlight = map(lambda x:dt2m(ephem.localtime(place.next_rising(sun,start=x))),dates)
lastlight = map(lambda x:dt2m(ephem.localtime(place.next_setting(sun,start=x))),dates)

place.horizon = eyeline_offset
firsteyel = map(lambda x:dt2m(ephem.localtime(place.next_rising(sun,start=x))),dates)
lasteyel = map(lambda x:dt2m(ephem.localtime(place.next_setting(sun,start=x))),dates)

writer = open("suntimes.csv", "w")
writer.write("date,firstlight,sunrise,firsteyel,lasteyel,sunset,lastlight\n")
for n in xrange(numdays):
 writer.write(str(dates[n]) +","+  (m2hm(firstlight[n]))+"," \
 + m2hm(sunrise[n])+"," + m2hm(firsteyel[n])+"," + m2hm(lasteyel[n])+"," \
 + m2hm(sunset[n])+"," + m2hm(lastlight[n])+"\n")
writer.close()

I chose to leave the format of the times, as they resemble the format of the Geoscience Australia times.

This is how the csv file looks:

date,firstlight,sunrise,firsteyel,lasteyel,sunset,lastlight
2009-12-01 00:00:00,0518,0551,0719,1858,2026,2059
2009-12-02 00:00:00,0518,0551,0719,1859,2027,2100
2009-12-03 00:00:00,0518,0551,0719,1859,2028,2101
2009-12-04 00:00:00,0518,0551,0719,1900,2029,2102
2009-12-05 00:00:00,0518,0550,0719,1901,2030,2103

And here's the resulting graph:

Here's the R code:

suntimes <- read.csv("suntimes.csv")
 
suntimes$date <- sub(' 00:00:00$', '', suntimes$date) #strip off trailing 0
suntimes$date <- as.Date(suntimes$date,"%Y-%m-%d") # make real date so we can axis.date
suntimes$firstlight <- sub('^([[:digit:]]{3})$', '0\\1', suntimes$firstlight) #pad 0
suntimes$sunrise <- sub('^([[:digit:]]{3})$', '0\\1', suntimes$sunrise)
suntimes$firsteyel <- sub('^([[:digit:]]{3})$', '0\\1', suntimes$firsteyel)
suntimes$lasteyel <- sub('^([[:digit:]]{3})$', '0\\1', suntimes$lasteyel)
suntimes$sunset <- sub('^([[:digit:]]{3})$', '0\\1', suntimes$sunset)
suntimes$lastlight <- sub('^([[:digit:]]{3})$', '0\\1', suntimes$lastlight)
  
#calc as minutes from midnight
suntimes$firstlighttimes <-  as.integer(substr(suntimes$firstlight,1,2))*60 + as.integer(substr(suntimes$firstlight,3,4))
suntimes$sunrisetimes <-  as.integer(substr(suntimes$sunrise,1,2))*60 + as.integer(substr(suntimes$sunrise,3,4))
suntimes$firsteyeltimes <-  as.integer(substr(suntimes$firsteyel,1,2))*60 + as.integer(substr(suntimes$firsteyel,3,4))
suntimes$lasteyeltimes <-  as.integer(substr(suntimes$lasteyel,1,2))*60 + as.integer(substr(suntimes$lasteyel,3,4))
suntimes$sunsettimes <-  as.integer(substr(suntimes$sunset,1,2))*60 + as.integer(substr(suntimes$sunset,3,4))
suntimes$lastlighttimes <-  as.integer(substr(suntimes$lastlight,1,2))*60 + as.integer(substr(suntimes$lastlight,3,4))
 
plot(suntimes$date,suntimes$firstlighttimes,type="l", lwd=1,col='blue',ylim=c(240,1330), axes= F, xlab= "Dates", ylab= "Time")
# make grid first, then overlay points
abline(h=(seq(from=240,to=1330, by=20)), lwd =0.1, lty="dotted", col='#fafafa')
abline(h=(seq(from=240,to=1330, by=60)), lwd =0.1, lty="dotted", col='#fceae4')
abline(v=(seq(from=min(suntimes$date),to=max(suntimes$date),"month")), lwd =0.1, lty="dotted", col='#fceae4')

points(suntimes$date,suntimes$firstlighttimes,type='l', lwd=1,col='blue')
#points(suntimes$date,suntimes$sunrisetimes,type='p',pch=20, cex = 0.75, lwd=0.5,col='#FF0000')
points(suntimes$date,suntimes$sunrisetimes,type='l', lwd=2,col='#FF0000')
points(suntimes$date,suntimes$firsteyeltimes,type='l', lwd=1,col='orange')
points(suntimes$date,suntimes$lasteyeltimes,type='l', lwd=1,col='orange')
#points(suntimes$date,suntimes$sunsettimes,type='p',pch=20, cex = 0.75, lwd=0.75, col='#FF0000')
points(suntimes$date,suntimes$sunsettimes,type='l', lwd=2, col='#FF0000')
points(suntimes$date,suntimes$lastlighttimes,type='l', lwd=1,col='blue')
 
#customise X Axis
axis.Date(side=1, at=seq(from=min(suntimes$date),to=max(suntimes$date),"days"),col.ticks='gray', format="%b", labels="")
axis.Date(side=1, at=seq(from=min(suntimes$date),to=max(suntimes$date),"month"),  col.ticks='red', format="%b\n%Y" )
 
#customise Y Axis
yaxisMajorTicks.hours <- seq(from=240,to=1330, by=60) # whole hours from 4am to 10pm
yaxisMajorTicks.hournames <- as.character(yaxisMajorTicks.hours/60) #eg 240/60=4am
yaxisMajorTicks.hournames <- sub('^(.+)$', '\\1:00', yaxisMajorTicks.hournames) #append ":00"
 
yaxisMinorTicks.tens <- seq(from=240,to=1330, by=10)
yaxisMinorTicks.tensnames <- as.character(yaxisMinorTicks.tens)
yaxisMinorTicks.tensnames <- sub('^(.+)$', "", yaxisMinorTicks.tensnames)
# draw major over minor ticks
axis(side=2, at=yaxisMinorTicks.tens, col.ticks='gray',labels=yaxisMinorTicks.tensnames)
axis(side=2, at=yaxisMajorTicks.hours, col.ticks='red',labels=yaxisMajorTicks.hournames, las=2)
box()
#legend  max(yaxisMajorTicks.hours),c(1,2,1,2,1)pch=c(NA,20,NA,20,NA)
legend("right","center", c("firstlight","sunrise","in eyes","sunset","lastlight"), cex=0.75, 
   col=c("blue","red","orange","red","blue"), lty=1, lwd=2, bty="o", bg='#ffffff');

Friday, October 1, 2010

Plotting time vs date in python and matplotlib

Having to commute to the city each day and frequently getting the blinding sun, I thought as a fun exercise, I could plot the sunrise/sets and thus estimate with some accuracy when the sun is most blinding (using a purely subjective (which seems to fit my anecdotal observations) value for the degrees above the horizon).

The essentials (using python 2.6) are pyephem (astronomical library), numpy (scientific library), and matplotlib (2D graphics and charting library).

Calculating the times is very easy using pyephem, set your observer object to the place you want and off you go, calling next_rising or next_setting, they are returned as datetime objects when wrapped in ephem's "localtime" function (otherwise they're custom dates).

The hard part? plotting times on the Y axis using matplotlib ... if you do it with datetime objects. I gave up after trying a few different ways, and resorted to a very easy workaround.

Plot the times as integer minutes : 0 being midnight, 60 = 1am ... which is 24hr * 60. Then format the Y axis as Hour:Minute using a custom formatter:

from matplotlib.ticker import FuncFormatter as ff


def m2hm(x, i):
    h = int(x/60)
    m = int(x%60)
    return '%(h)02d:%(m)02d' % {'h':h,'m':m}

ax.yaxis.set_major_formatter(ff(m2hm))

To get sensible tick marks, I set the major ticks to an hour

ax.yaxis.set_major_locator(pylab.MultipleLocator(60))

The minor are set to 15 minutes though I left off showing the labels for the Y minor as it clutters the graph, but left it commented in the code.

import ephem
import datetime, math
import pylab
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.ticker import FuncFormatter as ff

place = ephem.city('Melbourne')

start_date = datetime.datetime(2009,12,1,12) # yyyy,m,d,h - midday is best
end_date = datetime.datetime(2011, 1, 31,12) # midday Jan 31st, 2011

twilight_offset = '-6:00:0.0' # "twilight" = centre of the sun is -6deg ideal horizon
eyeline_offset = '15:00:0.0' # arbitrary +15deg

sun = ephem.Sun(place)

dates = []
sunrise = []
sunset = []
firstlight = []
lastlight = []
firsteyel = []
lasteyel = []

numdays = (end_date - start_date).days
dates = [start_date + datetime.timedelta(days=i) for i in xrange(numdays+1)] 
dates.sort()

def dt2m(dt):
    return (dt.hour*60) + dt.minute

def m2hm(x, i):
    h = int(x/60)
    m = int(x%60)
    return '%(h)02d:%(m)02d' % {'h':h,'m':m}

sunrise = map(lambda x:dt2m(ephem.localtime(place.next_rising(sun,start=x))),dates)
sunset = map(lambda x:dt2m(ephem.localtime(place.next_setting(sun,start=x))),dates)

place.horizon = twilight_offset
firstlight = map(lambda x:dt2m(ephem.localtime(place.next_rising(sun,start=x))),dates)
lastlight = map(lambda x:dt2m(ephem.localtime(place.next_setting(sun,start=x))),dates)

place.horizon = eyeline_offset
firsteyel = map(lambda x:dt2m(ephem.localtime(place.next_rising(sun,start=x))),dates)
lasteyel = map(lambda x:dt2m(ephem.localtime(place.next_setting(sun,start=x))),dates)

fig = plt.figure()

ax = fig.add_subplot(111)
ax.plot_date(dates, firstlight, ':',label=' First light (' + twilight_offset + ')', color='#06544e')
ax.plot_date(dates, sunrise, linestyle='-',label=' Sunrise', color='red')
ax.plot_date(dates, firsteyel, linestyle='-', label=' In eyes ('+ eyeline_offset + ')', color='orange')
ax.plot_date(dates, lasteyel, linestyle='-', color='orange')
ax.plot_date(dates, sunset, linestyle='-', label=' Sunset', color='red')
ax.plot_date(dates, lastlight, ':',label=' Last light (' + twilight_offset + ')', color='blue')

ax.set_xlim([dates[0],dates[len(dates)-1]])
ax.xaxis.set_major_locator(mdates.AutoDateLocator())
ax.xaxis.set_minor_locator(mdates.DayLocator())
#ax.xaxis.grid(False)
ax.xaxis.grid(color='#ffcf06', linestyle=':', linewidth=0.5)

ax.set_ylim(240,1330) #4am to 10pm
ax.yaxis.set_major_locator(pylab.MultipleLocator(60))
ax.yaxis.set_minor_locator(pylab.MultipleLocator(15))
ax.yaxis.set_major_formatter(ff(m2hm))
#ax.yaxis.set_minor_formatter(ff(m2hm))
ax.yaxis.grid(color='r', linestyle=':', linewidth=0.5)

labels = ax.get_xticklabels()
for label in labels:
    label.set_rotation(30)
plt.legend(loc='best')
plt.xlabel('Date' )
plt.ylabel('Hour (24hr)' )
plt.title("Sun times\n" + str(start_date) + " - " + str(end_date) + "\n (localised for " + place.name + ')')
plt.show()

Thursday, September 23, 2010

Automatically click Yes on Internet Explorer Security Alert in Python for automated testing (with Selenium)

Trying to run Selenium RC tests for a secure test website, and kept getting an "alert" from Internet Explorer 6 that there's a problem with the certificate. After searching for answers, I couldn't find a solution to force Internet Explorer 6 to ignore/accept the cert, despite adding it to "trusted sites", and the certificate to "Trusted" stores. In my case, the alert mentions the company name does not match (so it seems it will never accept it).

So, the only workaround I could think of was to automatically click the Yes button (or maybe send alt+y). Below is the python code that seems to do the trick for me:

import win32gui
import win32api


def ie_secalert_yes(maxtry=10):     
    """Left click YES button if the IE Security Alert dialog is found.
    Put this is in a thread, lest it be blocked from ever finding the alert.
   
    Keyword arguments:
    maxtry -- Give up looking for dialog after this many attempts (default 10)
   
    Note:
    (hdlg) FindWindow did not seem to work with class = "Dialog", but 32770
    does.
   
    """
   
    hdlg = 0
    hwnd = 0
    curtry = 0
   
    while curtry < maxtry: 
        hdlg = win32gui.FindWindow(32770, "Security Alert")
        if hdlg == 0:
            curtry += 1
            time.sleep(0.2)
        else:
            curtry=maxtry

     if hdlg > 0:
        hwnd = win32gui.FindWindowEx(hdlg,0,"Button", "&Yes")
        if hwnd > 0:
            win32api.PostMessage(hwnd, 0x0201, 0, 0) #WM_LBUTTONDOWN
            win32api.PostMessage(hwnd, 0x0202, 0, 0) #WM_LBUTTONUP

You could return hdlg or hwnd if you wanted, they'll be > 0 if the dialog was found. The above also works fine when there is no alert dialog too, in case there's any concern. This could easily be adapted/extended to handle any other window outside of selenium's control.

You should put the above in a thread (I used a timer), then init the timer before selenium, but run it after selenium.start, an example

t = threading.Timer(3.0, ie_secalert_yes)
    self.selenium = selenium("localhost", 4444, "*iexplore", URL) 
    self.selenium.start()
    t.start()

Caveats: Only tested in Windows XP, Selenium RC 2, and Python 2.6.5, Internet Explorer 6.

Tuesday, September 21, 2010

Auto-complete for R in EmEditor

This will work for any language you have a syntax file for, but I'll use R as the example.

1. Open an R file, and on the plugins menu, click "Word Complete"

2. Type a couple of letters e.g. rn and then press ctrl+space (default word suggest keyboard combination) and hopefully an auto-suggest popup menu will appear.

3. If nothing seems to happen, be sure you have configured R with the R syntax file (see previous post), otherwise you may need to do some further configuration.
3.1 Go to Tools > Plugins > Customize Plugins > Word Complete (be sure it has a check) > R and double check the various settings, also verify the keyboard short cut assigned under the keyboard tab.

This post was inspired by Tal Galili's discussion of wordpress themes, and Yihui Xie's Notepad++ autocomplete efforts.

Sunday, September 12, 2010

EmEditor Professional as an R script editor

R is not supported "out of the box" by EmEditor, so here's a few tips I've found for using it as a great editor for R.

Code Syntax highlighting
Executing R scripts and capturing output
Use ctags symbols to navigate files.
If you create packages for CRAN or like neat code, you might like to use a macro to tidy it up.

Getting the R syntax file
This is on the EmEditor website, in the Library > Syntax Files section. Thank you to whomever created and uploaded it. While you are in the properties, it's also a good opportunity to change tabs to spaces.

Creating an "External Tool" to run R scripts
The following settings (see also screenshot) in the create/properties screen for external tools should get things running:
Command: C:\Program Files\R\R-2.11.0\bin\Rterm.exe
Arguments: --vanilla --quiet --slave --file="$(Path)"
Initial Directory: $(Dir)

Icon Path: C:\Program Files\R\R-2.11.0\bin\R.exe
Save file: ☑Checked
Use Output bar: ☑Checked
Close one exit: ☑Checked
Input: None
Output: Create new document
Standard Error: Create new document

Using a macro to run R scripts
Edit 18th Jan 2011: The below text refers to the old version of the macro and is no longer relevant, a new post will describe the new macro, and it is also documented on the github site. Get the new macro now hosted on github

Edit 10th Jan 2011 - This macro has been updated and there is now a specific post regarding it.
See emeditor-r-code-macro-almost.html

This offers more flexibility than the external tool choice, particularly if you want to run only a portion of code (more than passing an "expression"), or if the file has not been saved yet. The Rrun.jsee macro is one I constructed to allow passing a saved file, unsaved file or snippet of code to rterm - feel free to use it and improve it.

Generating and using ctags symbols for R
The simplest way of using ctags for an unsupported language is to use the regular expression support, and include it in a file .ctags in your home directory (c:\users\xxxx\ for vista). This will then automatically get picked up by ctags and you won't need to do any further config regarding it in EmEditor (or other editor for that matter). It's also O/S transportable.
My .ctags has the following, you may want to modify it to suit your particular needs though (you probably don't want the apply or plot, or even variable).
Note: ctags does not seem to support multiline, so if you have a function assigned on a different line (like when it is tidied by R (see below)) it won't find it.

--langdef=R
--langmap=R:.r
--regex-R=/([[:alnum:]\._]+)([[:space:]]*)<-([[:space:][:cntrl:]]*function)/\1/f,function/
--regex-R=/([[:alnum:]]+)[[:space:]]*<-/\1/v,variable/
--regex-R=/[^[:alnum:]]plot[[:space:]]*\(([[:alnum:],=" ]+)\)/\1/p,plot/
--regex-R=/([[:alpha:]]apply[[:space:]]*\([[:alnum:],=" ]+\))/\1/a,apply/i

To double check the file is ok, you can run "C:\Program Files\EmEditor\PlugIns\ctags.exe" --list-kinds it will show R listed. When you update the symbols in EmEditor, you should see something similar to this:

Tidying up R code with a macro in EmEditor
Based on the "Writing R Extensions" manual, I created an EmEditor macro (jsee) to do basic R code tidying. It is based off the run R macro, so you can tidy sections of code as well as the whole file. It should be easily modified to run through cscript/wscript rather than EmEditor, just beware some objects are EmEditor only.
Note: This method will remove any comments in the code, it also seems to force new lines for function assigns, eg, turns this:

shorten.to.string <- function(line, token)

to

shorten.to.string <- 

function (line, token)

I'm not sure if that breaks suggested style rules or not, is a pain for picking up with ctags though (more a ctags failing).