awwx.ws

lang2

Call Perl and Python from Arc

This is my project to allow libraries in other languages to be easily called from Arc. This preliminary version supports Perl and Python. I haven’t written much code using this yet, so the interface will change as I figure out what I should actually be doing.

I’ll use “Lang” as an abbreviation to mean the other language, that is, Perl or Python.

In this approach:

See lang-faq for a discussion of this design.

Interface

The Lang program is constructed with print statements, much like how the macros in html.arc generate HTML. The complete program is sent over to the Lang process, which returns the result of the expression converted to an Arc value.

arc> (python subprocess (expr (pr "5+6")))
11
arc> (perl subprocess (pr "5+6"))
"11"

“subprocess” means to run the program using the subprocess style.

Python distinguishes between statements like “import sys” and expressions like “5+6”, so expr gets and returns the value of the expression for us. This isn’t needed for Perl since a Perl statement can be used as an expression.

Perl doesn’t distinguish between numbers and strings, and so by default the value is converted to an Arc string. We can explicitly ask for an Arc number:

arc> (perl subprocess (pr "arcnum(5+6)"))
11

tolang will convert an Arc value into a Lang value for us:

arc> (let xs '(0 10 20 30 40)
       (python subprocess (expr (tolang xs) (pr "[3]"))))
30

tolang automatically prints the Lang value, so it doesn’t need to be enclosed in its own pr.

It can be useful to see what Lang program is being generated; langex will expand and print the program for us, without running it. Here’s how the same code is expanded into Python:

arc> (let xs '(0 10 20 30 40)
       (langex python (expr (tolang xs) (pr "[3]"))))
result = ([0,10,20,30,40][3])

nil

Here we can see that expr works by assigning the expression to the result variable, and that the Arc list (0 10 20 30 40) is converted to the Python list [0,10,20,30,40].
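To make the expr mechanism concrete, here is a minimal Python sketch of what a receiving side might do with the expanded program: execute it and read back the result variable. The run_program helper is hypothetical and is not taken from lang2 itself.

```python
def run_program(source):
    """Execute a generated program and return whatever it stored in `result`."""
    namespace = {}
    exec(source, namespace)
    # A program made only of statements never sets `result`, so default to None.
    return namespace.get("result")


# The program text that langex printed for the list example.
program = "result = ([0,10,20,30,40][3])\n"
print(run_program(program))
```

A program that contains only statements, such as "import sys", leaves result unset, which is why the sketch falls back to None.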

The template syntax gives us a couple of abbreviations: “...” prints its contents so that it becomes part of the program, and a «foo» inside of a “...” calls tolang for us on the Arc expression foo.

arc> (let xs '(0 10 20 30 40)
       (python subprocess (expr “«xs»[3]”)))
30
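As a rough illustration of the «foo» substitution, the following Python sketch replaces each «name» in a template with a literal for the bound value. It is an analogy only: expand_template and its bindings dictionary are hypothetical, the real syntax accepts any Arc expression rather than just a name, and json.dumps merely stands in for tolang on simple lists of numbers and strings.

```python
import json
import re


def expand_template(template, bindings):
    """Replace each «name» in the template with a literal for bindings[name]."""
    def substitute(match):
        value = bindings[match.group(1).strip()]
        # json.dumps stands in for tolang here; it only covers simple values.
        return json.dumps(value, separators=(",", ":"))
    return re.sub(r"«(.*?)»", substitute, template)


print(expand_template("«xs»[3]", {"xs": [0, 10, 20, 30, 40]}))
```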

Here’s an example which calculates the SHA-224 hash of a string:

arc> (def sha224 (data)
       (perl subprocess “
     use Digest::SHA qw(sha224_hex);
     sha224_hex(«data»);
     ”))
#<procedure: sha224>
arc> (sha224 "hello")
"ea09ae9cc6768c50fcee903ed054556e5bfc8347907f12598aa24193"

Process styles

subprocess

The subprocess style is the simplest: the Lang program is launched in a subprocess, runs to completion, outputs its result, and terminates.

A disadvantage of the subprocess style is that it can be expensive to start up the Lang interpreter every time you want to make a library call, especially if you’re loading large libraries. The advantage is that each run is independent: you don’t have to worry about one library interfering with another or what kind of I/O the program might be doing.
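The launch, run, and terminate cycle can be sketched in Python as follows. This is an illustration of the style, not lang2's code: run_in_subprocess is a hypothetical helper, and it returns the printed text rather than converting it to an Arc value.

```python
import subprocess
import sys


def run_in_subprocess(expression):
    """Start a fresh interpreter, evaluate the expression, return its printed value."""
    program = "result = (" + expression + ")\nprint(result)\n"
    completed = subprocess.run(
        [sys.executable, "-c", program],
        capture_output=True,
        text=True,
        check=True,  # raise if the child exits with an error
    )
    return completed.stdout.strip()


print(run_in_subprocess("5+6"))
```

Every call pays the interpreter start-up cost again, which is the trade-off described above.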

singlethread

With this style Lang runs a single-threaded, event-driven server. Perl uses AnyEvent and Python uses Twisted.

In the event loop style, you don’t block waiting for each I/O operation to finish and then rely on multiple threads to keep one operation from blocking another. Instead, you initiate an I/O operation and arrange to be called back when it completes.

The advantage of this style is high performance: Lang and the libraries you’re using only need to be loaded once. The disadvantage is that if the program does any I/O it should use the I/O facilities of the event framework; otherwise you’ll block the whole server waiting for I/O to complete. This makes this style a poor choice for calling libraries that do “normal” I/O instead of being written for the event framework.
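The difference is easy to see in a small sketch. The example below uses Python’s standard asyncio module rather than Twisted, purely to stay dependency-free, and the request names and delays are made up. Two simulated requests share one thread, and the one with the shorter I/O wait finishes first even though it was started second.

```python
import asyncio


async def handle_request(name, io_seconds, finished):
    """Simulate a request whose I/O goes through the event loop."""
    # await hands control back to the loop, so other requests keep running.
    await asyncio.sleep(io_seconds)
    finished.append(name)


async def serve_two_requests():
    finished = []
    await asyncio.gather(
        handle_request("slow", 0.2, finished),
        handle_request("fast", 0.05, finished),
    )
    return finished


print(asyncio.run(serve_two_requests()))
```

Replacing asyncio.sleep with the blocking time.sleep would stall the whole loop for the duration of each call, which is the problem with libraries that do “normal” I/O.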

Forking server

I haven’t implemented this style yet, but it’s worth a mention as a compromise between the subprocess and singlethread styles. In this style Lang and any libraries you’re going to use are loaded once, and then the server forks processes to handle program requests from Arc. Because each concurrent request runs in its own process, it is free to call libraries that do blocking I/O without impacting other requests.
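Since this style is not implemented, the following is only a sketch of the idea, assuming a POSIX system where os.fork is available. The handle_in_child helper is hypothetical: the parent would already have loaded its libraries, and each request is evaluated in a forked child that sends its answer back over a pipe.

```python
import os


def handle_in_child(request_fn, arg):
    """Run request_fn(arg) in a forked child and return the repr of its result."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: inherits everything the parent has loaded; free to block on I/O.
        os.close(read_fd)
        try:
            os.write(write_fd, repr(request_fn(arg)).encode("utf-8"))
        finally:
            os._exit(0)  # never fall back into the parent's code path
    # Parent: read until the child closes its end of the pipe.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return b"".join(chunks).decode("utf-8")


print(handle_in_child(lambda n: n + 6, 5))
```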

Concurrency

Each call to perl or python makes a separate HTTP request to the Lang server, so Arc can make as many simultaneous calls as it wants.

An Arc thread making a Lang call will pause until the call returns. However, you can fire up multiple threads to make simultaneous calls. When running a web server using Arc’s srv, each web request is handled in a separate thread, and so making a long-running Lang call in one request won’t impact a Lang call made in another request.

In the subprocess style, each program runs in its own process, and so any simultaneous Lang calls will run simultaneously in their own process.
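The same pattern can be sketched in Python, with worker threads standing in for Arc threads. The evaluate and evaluate_all helpers are hypothetical; each thread blocks on its own interpreter process, so the calls proceed independently.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def evaluate(expression):
    """Run one expression in its own interpreter process; return its printed value."""
    completed = subprocess.run(
        [sys.executable, "-c", "print(" + expression + ")"],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


def evaluate_all(expressions):
    # Each worker thread pauses on its own subprocess, as an Arc thread would.
    with ThreadPoolExecutor(max_workers=len(expressions)) as pool:
        return list(pool.map(evaluate, expressions))


print(evaluate_all(["1+1", "2*3", "10-4"]))
```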

In the single threaded server style, the Lang server will by default respond to requests one by one, unless you use the event loop facilities to do I/O or run things in another thread.

Generating Lang literals from Arc

tolang provides default conversions of Arc values into Lang values:

Arc          Perl             Python
'foo         'foo'            u'foo'
123          123              123
"foo"        'foo'            u'foo'
(1 2 3)      [1,2,3]          [1,2,3]
{a 1 b 2}    {'a':1,'b':2}    {u'a':1,u'b':2}
nil          []               []
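The Python column of the table above can be mimicked with a short recursive function. This sketch is written in Python for illustration (the real tolang is Arc code), and it models Arc values with Python ones as an assumption: None for nil, str for both symbols and strings, list for lists, and dict for tables.

```python
def to_python_literal(value):
    """Render a value as Python source text, following the table's Python column."""
    if value is None:                      # stands in for Arc's nil
        return "[]"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):             # Arc symbols and strings alike
        return "u" + repr(value)
    if isinstance(value, list):
        return "[" + ",".join(to_python_literal(v) for v in value) + "]"
    if isinstance(value, dict):
        items = (to_python_literal(k) + ":" + to_python_literal(v)
                 for k, v in value.items())
        return "{" + ",".join(items) + "}"
    raise TypeError("no conversion for " + type(value).__name__)


print(to_python_literal({"a": 1, "b": 2}))
print(to_python_literal([1, 2, 3]))
```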

In Python, Arc symbols and strings are converted to Python Unicode style strings by default; you can generate Python byte strings instead using pystr:

arc> (langex python “«(pystr "foo")»”)
'foo'
nil

Arc’s nil is by default converted to an empty array in Lang; if you need a boolean, you can use lbool:

arc> (langex perl “«(lbool nil)»”)
0
nil
arc> (langex python “«(lbool t)»”)
True
nil

Generating Arc literals in Lang

Lang strings, numbers, lists, and hashes / dictionaries are converted automatically to corresponding Arc values. Tables are returned using my {...} table notation.

A number of functions in Lang are available to get specific Arc values.

arcnil returns an Arc nil and arct returns an Arc t.

arcbool(x) returns an Arc t if x is considered a true value in Lang, and nil otherwise. For example, in Arc the number zero is treated as a true value:

arc> (if 0 'yes)
yes

but in Perl 0 is false, so:

arc> (perl subprocess “arcbool(0)”)
nil

arcstr(x), arcsym(x), and arcnum(x) produce Arc strings, symbols, and numbers.

By default hashes / dictionaries generate string keys. Sometimes symbol keys are nicer on the Arc side:

arc> (python subprocess
       (expr “arcsymtab({'foo': 3, 'bar': 4})”))
{foo 3 bar 4}
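To show the direction of this conversion, here is a hedged Python sketch that renders Python values as Arc-readable text. Everything in it is an assumption made for illustration: ArcSymbol, to_arc, and symbol_keys are hypothetical names, booleans are mapped the way arcbool is described, an empty list is written as nil, and string escaping is kept deliberately simple.

```python
class ArcSymbol:
    """Hypothetical marker for a value that should print as an Arc symbol."""
    def __init__(self, name):
        self.name = name


def to_arc(value):
    """Render a Python value as text that Arc's reader could consume."""
    if isinstance(value, bool):            # check before int: bool is an int subclass
        return "t" if value else "nil"
    if isinstance(value, ArcSymbol):
        return value.name
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'
    if isinstance(value, (list, tuple)):
        if not value:
            return "nil"                   # assumption: empty list reads as nil
        return "(" + " ".join(to_arc(v) for v in value) + ")"
    if isinstance(value, dict):
        parts = []
        for key, item in value.items():
            parts.extend([to_arc(key), to_arc(item)])
        return "{" + " ".join(parts) + "}"
    raise TypeError("no conversion for " + type(value).__name__)


def symbol_keys(table):
    """Mark every key as a symbol, which is the idea behind arcsymtab."""
    return {ArcSymbol(key): item for key, item in table.items()}


print(to_arc(symbol_keys({"foo": 3, "bar": 4})))
print(to_arc([1, "a", [2, 3]]))
```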

Error handling

Errors on the Lang side are caught and passed back to Arc. (The error is reported by passing back an Arc table value with a unique key which won’t be accidentally generated by the Lang code.) On the Arc side, the error traceback and a copy of the program are printed to stderr, and the error is raised in Arc with Arc’s err function.

In this design having an uncaught error is considered an exceptional condition; if there are some particular errors that you’re expecting then you probably should catch them yourself on the Lang side, and then return some particular value to indicate that the expected error happened.
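The catch-and-report scheme might look roughly like this on the Python side. It is a sketch under stated assumptions: run_and_capture is a hypothetical helper, and ERROR_KEY is an invented placeholder, since the text only says that the real key is chosen so that Lang code will not generate it by accident.

```python
import traceback

# Hypothetical marker key, invented for this sketch.
ERROR_KEY = "lang-error-7f3c2a"


def run_and_capture(source):
    """Run a program; return its result, or a table describing the uncaught error."""
    namespace = {}
    try:
        exec(source, namespace)
    except Exception:
        # Pass the traceback back as data instead of letting the error escape.
        return {ERROR_KEY: traceback.format_exc()}
    return namespace.get("result")


print(run_and_capture("result = (5+6)"))
outcome = run_and_capture("result = (1/0)")
print(ERROR_KEY in outcome)
```

The caller can then test for the marker key and raise its own error, much as the Arc side does with err.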

Debug printing

When Lang is run in a subprocess, or the Lang server forks off, its stderr is left open to Arc’s stderr (which is usually your terminal, unless you’ve done something else to it). Thus you can print things to stderr from Lang to see what’s going on:

perl: print STDERR "foo\n";

python: print >>stderr, "foo"

Security

The Lang server listens only on the loopback interface (127.0.0.1, aka localhost), and so doesn’t accept connections from any other computer. Still, it does allow arbitrary code execution from any process on your computer that connects to the Lang server port. This may be OK on computers where you’re the only person running programs. In other environments, or if you want to run the Lang server on a different computer, you’ll want better security.
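Restricting a listener to the loopback interface comes down to the address passed to bind. A minimal Python sketch, with open_loopback_listener as a hypothetical helper name:

```python
import socket


def open_loopback_listener(port=0):
    """Listen on 127.0.0.1 only; port 0 asks the OS for any free port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Binding to 127.0.0.1 rather than 0.0.0.0 keeps other machines out.
    listener.bind(("127.0.0.1", port))
    listener.listen(5)
    return listener


listener = open_loopback_listener()
print(listener.getsockname()[0])
listener.close()
```

This keeps remote hosts out but, as noted above, does nothing to stop other local processes from connecting.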

Control port

The Arc process opens and listens on a port which the Lang server connects to. No data is transferred on this port; the sole purpose is to get the Lang server to shut down when the Arc process terminates. When the Arc process ends, even by a segmentation fault or a hard kill, the operating system will close all its ports. The Lang server notices that it has lost the connection to the control port and exits. This avoids having old Lang server processes lying around.
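The Lang side of this arrangement can be sketched in Python as a function that connects and then blocks until the connection goes away. The names wait_for_control_close and demo are hypothetical, and the demo plays the part of the Arc process by accepting the connection and closing it immediately.

```python
import socket
import threading


def wait_for_control_close(host, port):
    """Connect to the control port and block until the other side goes away."""
    with socket.create_connection((host, port)) as conn:
        while True:
            try:
                data = conn.recv(1024)
            except OSError:
                return  # connection reset: treat it like a close
            if not data:
                return  # end of stream: the owning process has exited


def demo():
    """Stand in for the Arc process: accept the connection, then drop it."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    def accept_then_close():
        conn, _ = listener.accept()
        conn.close()

    worker = threading.Thread(target=accept_then_close)
    worker.start()
    wait_for_control_close("127.0.0.1", port)
    worker.join()
    listener.close()
    return "control connection closed"


print(demo())
```

A real server would run the wait in a background thread and exit the process as soon as it returns.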

HTTP implementation

I wanted to be able to pass the input stream of the HTTP response from the Lang server directly to Arc’s read function, so I wrote a very simple HTTP client implementation that does nothing but make POST form requests. For a more general purpose HTTP implementation, see Mark Huetsch’s web.arc in Anarki.
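For a sense of how little such a client needs, here is a Python sketch that builds the bytes of a form POST by hand. It is not a port of the Arc client: build_post_request is a hypothetical helper, and the path and the "code" field name in the usage line are invented for the example.

```python
from urllib.parse import urlencode


def build_post_request(host, path, fields):
    """Build the bytes of a minimal HTTP/1.0 form POST."""
    body = urlencode(fields).encode("ascii")
    lines = [
        "POST " + path + " HTTP/1.0",
        "Host: " + host,
        "Content-Type: application/x-www-form-urlencoded",
        "Content-Length: " + str(len(body)),
        "",  # blank line that ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii") + body


print(build_post_request("127.0.0.1", "/", {"code": "5+6"}).decode("ascii"))
```

With HTTP/1.0 the server closes the connection after the response, so everything after the response headers can be read as one stream.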

Prerequisites

This hack depends on arc3.1, lock1, post2, implicit2, between0, template5, and xloop0.

Get this hack

Using the hackinator:

$ hack \
    ycombinator.com/arc/arc3.1.tar \
    awwx.ws/defarc0.patch \
    awwx.ws/defarc-ac0.patch \
    awwx.ws/extend0.arc \
    awwx.ws/scheme0.arc \
    awwx.ws/defvar1.patch \
    awwx.ws/defvar2.arc \
    awwx.ws/implicit2.arc \
    awwx.ws/defarc-literal0.patch \
    awwx.ws/arc-write0.patch \
    awwx.ws/between0.arc \
    awwx.ws/skipwhite1.arc \
    awwx.ws/extend-readtable0.arc \
    awwx.ws/table-rw3.arc \
    awwx.ws/xloop0.arc \
    awwx.ws/span0.arc \
    awwx.ws/template5.arc \
    awwx.ws/re3.arc \
    awwx.ws/client-socket1.arc \
    awwx.ws/binary1.arc \
    awwx.ws/parseurl0.arc \
    awwx.ws/readline1.arc \
    awwx.ws/begins-rest0.arc \
    awwx.ws/urlencode1.arc \
    awwx.ws/read-headers0.arc \
    awwx.ws/encodequery0.arc \
    awwx.ws/post2.arc \
    awwx.ws/lock1.arc \
    ~/code/lang/lang2.arc

Contact me

Twitter: awwx
Email: andrew.wilcox [at] gmail.com