ThoughtsOnR.txt

Dan Higgins, 06/14/2004 10:24 AM

 
Hi All,

    
I have been trying to understand some of the details of the R system (http://www.r-project.org/) and how it might be integrated into Kepler. For those who are unfamiliar with R, "R is a system for statistical computation and graphics. It consists of a language plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files." (from the "R FAQ"). R is a powerful system for statistical and other calculations. It is comparable to Matlab or SAS but has the advantage of being free, easily extended, and available for PCs, Macs (OS X), and Unix systems. Numerous extensions are also available from a variety of sources. It thus appears to be widely accepted and used by many researchers.

    
A first cut at building an R actor would be to use a local installation of R (since it can be freely installed on almost any computer) and run it as a subprocess of Kepler. An obvious way to do this is to use one of the CommandLine/Exec actors.

    
I say "one of" because there are at least two existing actors for running arbitrary subprocesses from within Kepler/Ptolemy. The "CommandLine" actor can be found in the Kepler graph editor tree under "actors/kepler/spa/CommandLine". The author listed in the source is Ilkay Altintas, and this actor runs under the 3.0.2 version of Ptolemy/Kepler. A second, similar actor called "Exec" is included with the Ptolemy 4.0Beta release under "MoreLibraries/Esoteric/Exec". The Exec actor was written for Ptolemy 4 by Chris Brooks and (I think) uses some new features that are not available in version 3.0.2. [Specifically, there is an "Expert Mode" for setting additional parameters.]

    
Both the CommandLine and Exec actors use the Java 'exec' method to launch a subprocess. They differ in the details, however. CommandLine launches a command processor ('cmd.exe/command.exe' on Windows and 'sh' on Mac/Linux), so the command entered by a user is essentially identical to what would be typed in a terminal window to launch the process. This can include I/O redirection like "< myfile.in". In the Exec actor, the command follows the underlying Java method more closely, and the actor has ports for the input and output streams. The command string cannot include redirection. Both actors wait for the subprocess to finish before their 'fire' action completes.
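
To make the difference concrete, here is a minimal Java sketch of the two launch styles. It is not the code of either actor: the class and method names are hypothetical, it uses the current ProcessBuilder API rather than the 'exec' call mentioned above, and it assumes a Unix-like system where 'sh' and 'cat' are on the path ('cat' stands in for R).

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.stream.Collectors;

// Hypothetical illustration; not taken from the CommandLine or Exec sources.
final class LaunchStyles {

    /** CommandLine style: pass the whole string to a shell, which interprets "<". */
    static String viaShell(String commandLine) {
        return run(new ProcessBuilder("sh", "-c", commandLine));
    }

    /** Exec style: no shell is involved, so the input file is attached to stdin explicitly. */
    static String direct(File stdin, String... argv) {
        return run(new ProcessBuilder(argv).redirectInput(stdin));
    }

    /** Starts the process, collects stdout and stderr, and blocks until it exits. */
    private static String run(ProcessBuilder builder) {
        try {
            Process process = builder.redirectErrorStream(true).start();
            String output;
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
                output = reader.lines().collect(Collectors.joining("\n"));
            }
            process.waitFor();
            return output;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Helper for the demo: writes the text to a temporary file. */
    static File tempFile(String contents) {
        try {
            File file = File.createTempFile("myfile", ".in");
            file.deleteOnExit();
            Files.write(file.toPath(), contents.getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File input = tempFile("hello\n");
        // The shell parses the redirection.
        System.out.println(viaShell("cat < " + input.getPath()));
        // Without a shell, "<" would be passed to cat as a literal argument,
        // so the file is connected to stdin through the API instead.
        System.out.println(direct(input, "cat"));
    }
}
```

Both calls block until the child exits, which matches the 'fire' behavior described above.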

    
Now consider just how we might integrate R into Kepler. R can be run in an interactive mode (start up; type a command; see the response; type another command) or in a batch mode (start R with a script file that contains a series of commands and write the results to an output file). Creating an R workflow in batch mode is fairly easy. A screen shot of a workflow that uses the CommandLine actor to run R, create a jpeg plot, and then display it is shown below.

    
The script file used in the example is:

    
# Grid of x and y values for the surface
x <- seq(-10, 10, length = 50)
y <- x
# Rotationally symmetric sinc function, scaled by 10
rotsinc <- function(x, y) {
    sinc <- function(x) {
        y <- sin(x)/x
        y[is.na(y)] <- 1    # sin(0)/0 evaluates to NaN; use the limit value 1
        y
    }
    10 * sinc(sqrt(x^2 + y^2))
}
sinc.exp <- expression(z == Sinc(sqrt(x^2 + y^2)))
z <- outer(x, y, rotsinc)
# Send the plot to a JPEG file instead of the screen
jpeg(filename = "RTest.jpg", width = 480, height = 480, pointsize = 12,
     quality = 75, bg = "white")
par(bg = "white")
persp(x, y, z, theta = 30, phi = 30, expand = 0.5, col = "lightblue")
dev.off()    # close the device so the JPEG file is completely written

    
This batch approach shows that one can get the results of an R calculation either from the output stream or from a file created by R that is then read by other Kepler actors. A problem comes up, however, when one considers how to feed instructions or data to R dynamically. In batch mode, this could require the dynamic creation of script files, although it would be nicer if an R actor had ports for inputting data and instructions. One thus has the question of how to import information from other parts of a workflow into an R actor.
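
As a rough sketch of the "dynamic creation of script files" option, the following Java fragment turns values that might arrive from upstream in a workflow into an R script and builds a batch command line for it. The class and method names, the generated plot, and the temporary-file naming are all assumptions made for illustration; the command uses R's '--vanilla' option with input redirection, the form a CommandLine-style actor could hand to the shell. The sketch does not run R itself.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; not part of Kepler or Ptolemy.
final class RScriptWriter {

    /** Builds R source that plots the given (finite) values into a JPEG file. */
    static String buildScript(double[] values, String jpegName) {
        StringBuilder script = new StringBuilder("x <- c(");
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                script.append(", ");
            }
            script.append(values[i]);
        }
        script.append(")\n");
        // Assumes jpegName has no quotes or backslashes that would need escaping in R.
        script.append("jpeg(filename = \"").append(jpegName).append("\")\n");
        script.append("plot(x)\n");
        script.append("dev.off()\n");
        return script.toString();
    }

    /** Writes the script to a temporary file and returns its path. */
    static Path writeScript(String script) {
        try {
            Path file = Files.createTempFile("workflow", ".R");
            Files.write(file, script.getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Command line in the form a shell-based actor could execute. */
    static String batchCommand(Path scriptFile) {
        return "R --vanilla < " + scriptFile;
    }

    public static void main(String[] args) {
        String script = buildScript(new double[] {1.0, 2.5, 4.0}, "RTest.jpg");
        Path file = writeScript(script);
        System.out.print(script);
        System.out.println(batchCommand(file));
    }
}
```

This keeps R in batch mode, at the cost of writing one temporary file per firing.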

    
And what about using R in an interactive mode? Both the CommandLine actor and the Exec actor start a subprocess and then wait for it to finish. This means that the R code is loaded, executed, and then removed from memory. For an interactive environment (or for the case where the R calculation is executed repeatedly), it would be desirable to load R only once! There doesn't seem to be any reason why the R process has to be stopped between firings. One could keep the process in memory (a static variable?) and, as part of each fire event, simply read the input stream, execute it, write the output to the output stream, and then wait for the next input. [Or perhaps there needs to be some class-level R actor and a set of instances that do certain calculations by communicating with the class actor???]
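
The idea of keeping one process alive across firings can be sketched in Java as follows. This is only an illustration under stated assumptions: the class name and its 'fire' method are hypothetical, 'sh' stands in for R so the example runs on any Unix-like system, and every command is assumed to produce exactly one line of output. A real R session would need some end-of-output marker, since a reply can span many lines.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not an existing Kepler or Ptolemy actor.
final class PersistentProcess implements AutoCloseable {
    private final Process process;
    private final BufferedWriter toChild;
    private final BufferedReader fromChild;

    /** Starts the child once; it stays alive until close() is called. */
    PersistentProcess(String... argv) {
        process = start(argv);
        toChild = new BufferedWriter(
                new OutputStreamWriter(process.getOutputStream(), StandardCharsets.UTF_8));
        fromChild = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8));
    }

    private static Process start(String... argv) {
        try {
            return new ProcessBuilder(argv).redirectErrorStream(true).start();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** One firing: send a command line, then block until one line of reply arrives. */
    String fire(String commandLine) {
        try {
            toChild.write(commandLine);
            toChild.write('\n');
            toChild.flush();
            return fromChild.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Closing the child's stdin signals end-of-file, which lets the child exit. */
    @Override
    public void close() {
        try {
            toChild.close();
            process.waitFor();
            fromChild.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        try (PersistentProcess shell = new PersistentProcess("sh")) {
            // "$$" expands to the shell's process id; both firings print the
            // same id because they are served by the same running process.
            System.out.println(shell.fire("echo $$"));
            System.out.println(shell.fire("echo $$"));
        }
    }
}
```

Note that 'fire' blocks forever if a command prints nothing, which is one reason a real actor would need a more careful protocol than this.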

    
In any case, it is possible to simulate an interactive R session by using the save/load workspace options when starting and ending each R session. But it would be useful if the CommandLine actor had an input port to receive commands. It might also be useful if the Exec actor really had input and output streams instead of the String tokens currently used, so that long inputs could be handled.

    
That ends these semi-random thoughts for now.

    
Any comments or suggestions?

    
Dan

    
-- 
*******************************************************************
Dan Higgins                                  higgins@nceas.ucsb.edu
http://www.nceas.ucsb.edu/    Ph: 805-892-2531
National Center for Ecological Analysis and Synthesis (NCEAS)
735 State Street - Room 205
Santa Barbara, CA 93195

    
Instead of calling Exec, I suggest creating a Java Native Interface to
R.  The Ptolemy/Matlab interface uses JNI.

    
The advantage to a JNI interface is that you can have tighter coupling
with R or Matlab.

    
The Java Exec interface is tricky to use, since command line parsing
gets complicated, and the semantics of reading and writing to a process
can be strange.  You could use files to transfer data to and from R, but
this can get tricky as well.

    
In Ptolemy Classic, we had some common infrastructure that allowed
us to interface to Matlab and Mathematica.  Perhaps we could build
on the Matlab interface?

    
There is also code in $PTII/jni that makes it fairly easy to wrap
C functions.  Right now, the UI is broken, but the backend works.

    
I wrote an Exec actor in Ptolemy II 4.0, and frankly, I'm not proud of
it.  It does the minimum: it executes a subprocess and gets the result.
I tried hacking around with a better actor that was more complex, but
ran into problems.  It seems like using PN to read and write data is
probably the way to go.  I was disappointed that I could not implement
an interactive shell.  I was surprised that Unix pipes, something I've
trivially used forever, are a little tricky to implement in SDF.

    
I'm not particularly wedded to this version of the Exec actor;
in the end, we decided to have something simple.

    
On Feb 25, 2004, I wrote the following to the ptolemy internal mailing
list.

    
>> I simplified the Exec actor so that fire() does not return until
>> the subprocess returns.
>>
>> This change makes it impossible to use this actor to invoke bash once
>> and send different commands to the same bash process repeatedly,
>> so we cannot use the InteractiveShell to invoke commands in
>> the same running bash process; we would need to start a separate
>> process for each command.  To use InteractiveShell, we would need to
>> have some sort of PN specific version of Exec (which I did not
>> implement).
>>
>> I left the command parameter as a PortParameter so that
>> I can invoke the actor over and over with a different command line.
>> This makes it possible to write "Run all demos" as a model.
>> There is such a model in the test suite.
>>
>> Now, if the subprocess terminates with a non-zero value, fire() throws
>> an exception.
>>
>> Also, I made the environment parameter be a record, though this change
>> does make the code more complex.
>>
>> Also, I looked into sending an end-of-transmission (EOT) character
>> as a way of terminating a cat process.  EOT is Control-D or \04.
>> I can send a \04 character, but cat does not seem to notice
>> and it does not terminate.
>>
>> The following example illustrates the problem.
>>
>> I create a text file that has an embedded Control-D char and
>> run it through cat and cat does not care.
>>
>> cxh@maury 85% cat controld.c
>> #include <stdio.h>
>> int main(int argc, char ** argv) {
>>     printf("%s%c%s", "foo", 4, "bar");
>> }
>> cxh@maury 86% cc controld.c
>> cxh@maury 87% a.out > controld.txt
>> cxh@maury 88% od -c controld.txt
>> 0000000   f   o   o 004   b   a   r
>> 0000007
>> cxh@maury 89% cat <controld.txt
>> foobarcxh@maury 90%
>>
>> I think Control-D handling is done in the shell.
>>
>> I could modify Exec so that when it sees a Control-D it terminates the
>> process, but this would terminate the subprocess too soon if the
>> process was very long running.

    
-Christopher
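
The Control-D observation in the quoted message can be reproduced from Java. When input comes from a terminal, Control-D is turned into end-of-file by the terminal driver; over a pipe, the byte 4 is ordinary data, and the child sees end-of-file only when the writer closes the stream. The sketch below is a minimal illustration of that, assuming a Unix-like system with 'cat' on the path; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not part of the Exec actor.
final class EndOfInput {

    /** Pipes text through cat; returns only after cat has seen end-of-file and exited. */
    static String throughCat(String text) {
        try {
            Process cat = new ProcessBuilder("cat").start();
            // The input here is small, so writing everything before reading
            // cannot fill the pipe and block.
            try (OutputStream stdin = cat.getOutputStream()) {
                stdin.write(text.getBytes(StandardCharsets.UTF_8));
            } // closing the stream is what cat sees as end-of-file
            ByteArrayOutputStream echoed = new ByteArrayOutputStream();
            try (InputStream stdout = cat.getInputStream()) {
                byte[] buffer = new byte[1024];
                int n;
                while ((n = stdout.read(buffer)) != -1) {
                    echoed.write(buffer, 0, n);
                }
            }
            cat.waitFor();
            return new String(echoed.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same bytes as controld.c writes: "foo", the byte 4, "bar".
        String sent = "foo" + (char) 4 + "bar";
        String echoed = throughCat(sent);
        System.out.println(sent.equals(echoed)
                ? "cat passed the Control-D byte through as data"
                : "cat changed the data");
    }
}
```

An actor that wants a long-running child to stop reading would therefore close the child's input stream rather than send a \04 character.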