Thursday, January 28, 2010

December 2009 Mini-sprint Report

Tyler Green and I held a mini-sprint on Mython on December 17th, 2009. We worked on the following:

The trampoline parsing framework.
A regular expression quotation function.
A Cheetah quotation function.

I’m pleased to start moving the trampoline parsing framework out into Basil (see the basil.parsing.trampoline module, available in the Basil repository). I have been batting around the idea of using Python’s generators to implement recursive-descent parsers for a few years now, starting with a proof-of-concept demonstration in the fall of 2008. Finally, several issues with the existing MyFront front-end have forced me to roll something into Basil.

At its core, the framework is quite simple. The framework uses a trampoline to dispatch to a set of functions that return generators. When an LL(1) state machine (such as those generated by pgen) would push a nonterminal symbol, or a recursive-descent parsing function would call another parsing function, the generator yields the name of the nonterminal symbol. When the LL(1) state machine would pop, or a recursive-descent parsing function would return, the generator simply returns, which raises a StopIteration exception in Python. The top-level trampoline code simply maintains a stack of generators, pushing and dispatching to a new generator when a generator yields a nonterminal symbol, and popping when a generator raises StopIteration. This method keeps recursive-descent parsers from running into Python’s relatively shallow call stack bounds, and affords a form of syntactic extensibility by virtue of having a per-nonterminal dispatch table.
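To make the push and pop behavior concrete, here is a minimal sketch of the idea. It is not the basil.parsing.trampoline code: the names trampoline, Tokens, expr, term, and DISPATCH are all made up for illustration, and it assumes Python 3, where a generator can hand a value back through StopIteration.

```python
class Tokens:
    """Hypothetical token stream with one token of lookahead."""
    def __init__(self, items):
        self.items = list(items)
        self.pos = 0

    def peek(self):
        return self.items[self.pos] if self.pos < len(self.items) else None

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok


def expr(tokens):
    # expr := term ('+' term)*
    value = yield "term"            # "call" the term nonterminal
    while tokens.peek() == "+":
        tokens.next()
        value += (yield "term")
    return value                    # "return" by finishing the generator


def term(tokens):
    # term := NUMBER | '(' expr ')'
    tok = tokens.next()
    if tok == "(":
        value = yield "expr"
        if tokens.next() != ")":
            raise SyntaxError("expected ')'")
        return value
    return int(tok)


# Per-nonterminal dispatch table; swapping an entry extends the syntax.
DISPATCH = {"expr": expr, "term": term}


def trampoline(dispatch, start, tokens):
    """Drive the generators with an explicit stack instead of Python calls."""
    stack = [dispatch[start](tokens)]
    result = None
    while stack:
        try:
            # Resume the top generator with the result of its last "call".
            nonterminal = stack[-1].send(result)
        except StopIteration as stop:
            stack.pop()             # the generator returned: pop
            result = stop.value
        else:
            # The generator yielded a nonterminal: push and dispatch.
            stack.append(dispatch[nonterminal](tokens))
            result = None
    return result


print(trampoline(DISPATCH, "expr", Tokens("1 + ( 2 + 3 )".split())))
```

Because the nesting lives in the stack list rather than in Python frames, an input with thousands of nested parentheses parses without hitting the interpreter’s recursion limit.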

Interested parties should expect to see more about this particular module and its application in a new Mython front-end. The unit test for the trampoline module (see basil.parsing.tests.test_trampoline for the code) demonstrates how to use the framework by defining a recursive-descent parser for a simple calculator. At the time of writing, I’m still in the process of integrating the Mython-specific pieces of the front-end and handling some corner cases that I was previously ignoring.

While at OOPSLA 2009, Martin Hirzel noted that one relatively easy demonstration for Mython might involve embedding a regular expression language. While we might want to have something more powerful in the future, Python’s regular expression sub-language embeds easily into Mython. Tyler and I followed a similar strategy to the LLVM assembly embedding I did in November 2009.

Prior to the sprint, I looked at our options for storing compiled regular expressions in a module. Unfortunately, the only clear option for serializing and deserializing regular expression state machines uses Python’s pickle module, which involves re-compilation of the regular expression. The result should be comparable to the LLVM assembly embedding: we gain static checks, and can drop extra backslashes. We don’t really save any space in the bytecode, nor any run time in the compiled program.

The quotation function is now in basil.lang.regex, with a corresponding unit test in basil.lang.tests.test_regex. Since the quotation function uses the built-in re module, we can import the quotation function at compile time, and not include it at run time. Here’s what we arrived at:

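import pickle
import re

def requote(name, src, env):
    # Compile at compile time so a malformed pattern is rejected early.
    reobj = re.compile(src.strip())
    recode = pickle.dumps(reobj)
    # Generate run-time source that rebuilds the pattern from the pickle.
    recode1 = ("import pickle\n" +
               "%s = pickle.loads(%r)\n" % (name, recode))
    # Parse the generated source with the front-end reflected in env.
    ast, env = env["myfrontend"](recode1, env)
    return ast.body, env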

The requote() function compiles the regular expression source into a regular expression object, then serializes the object using the pickle module. Finally, the function generates run-time code that deserializes the pickle string. The following shows the Mython portion of the first (and currently only) regular expression test (from “test_regex01.my”):

#! /usr/bin/env mython
quote [myfront]:
    from basil.lang.regex import requote

quote [requote] myre0:
    you only need two \\ to match one backslash

The Mython test code binds a compiled regular expression to the myre0 identifier in the bytecode module. If we disassemble the module code object in the .pyc we see the following (reformatted a little):

...
2          12 LOAD_NAME                0 (pickle)
15 LOAD_ATTR                1 (loads)
18 LOAD_CONST               2 (                \
"cre\n_compile\np0\n(S'you only need two \\\\\\\\ to match" \
" one backslash'\np1\nI0\ntp2\nRp3\n.")
21 CALL_FUNCTION            1
24 STORE_NAME               2 (myre0)
...

For those who are adept at reading Python pickle strings (I was at some point in my career), we can see that the regular expression pickle simply calls back into the re module’s _compile() function, repeating the compilation that the quotation function already performed. I would argue that at least we gained additional static checks. Hopefully, though, developers are unit testing their embedded regular expression strings before using them in production, so the static checks only pay off once a code base has so many regular expressions that nobody is sure they have all been checked. The test does at least save us two backslashes in the demo code (though the backslashes are then double escaped in the pickle string). I hope readers will speak up if I am missing a more efficient compile-time representation trick, such as binary pickles.
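A quick way to check the claim about what the pickle holds is to inspect it directly. The following is a small sketch, assuming Python 3 (the listing above is Python 2); the pattern is a stand-in for the test input, and protocol 2 stands in for a “binary pickle”.

```python
import pickle
import re

# Stand-in pattern; the backslash pair matches one literal backslash.
pattern = re.compile(r"you only need two \\ to match one backslash")

for protocol in (0, 2):
    data = pickle.dumps(pattern, protocol)
    # The pickle names the re module's compile helper and carries the
    # source text of the pattern, not a compiled state machine.
    assert b"_compile" in data
    assert b"you only need two" in data
    restored = pickle.loads(data)
    assert restored.pattern == pattern.pattern
    assert restored.flags == pattern.flags
```

Under these assumptions, both the text and the binary protocol store the pattern source and flags, so loading either one recompiles the expression.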

Further building on the trick that I showed in the November 2009 article, Tyler and I started looking at embedding formatting strings in Mython. For example, we might want to rewrite the requote() function in Mython as shown below:

def requote (name, src, env):
    re_obj = re.compile(src.strip())
    re_pickle_str = repr(pickle.dumps(re_obj))
    quote [mython_template] out_ast:
        import pickle
        $name = pickle.loads($re_pickle_str)
    return out_ast, env

This example uses a hypothetical quotation function, mython_template(), to generate and compile Mython code to an abstract syntax tree (AST). This quotation function combines the string formatting and parsing (quotation) steps of requote(). Once compiled, the quotation function should expand back to something similar to the original requote() function.

On our way to something like mython_template(), it occurred to us that Cheetah is an expressive formatting language that would be easy to embed in Mython. The result is two new quotation functions, cheetah() and echeetah(), in the basil.lang.cheetah module. The cheetah() function takes the embedded string and uses it to create a constructor function (a curried call to the class constructor) for building a Cheetah Template object. The second function, echeetah(), builds a Cheetah Template instance, using the run-time environment to satisfy the namespace arguments in the constructor. An example of using these quotation functions appears in the basil.lang.tests.test_cheetah module, which in turn loads, compiles, and runs the test_cheetah01.my Mython module.

This work continues to build examples of quotation functions. I have been working on getting a unit test suite set up for regression testing purposes, and something is available. I am looking forward to hardening the Mython implementation using these tests, which will certainly be a goal of future mini-sprints, and the upcoming sprint at PyCon 2010.

posted by jriehl at 10:37 am


Wednesday, January 13, 2010

OOPSLA 2009 and Extensible Languages

I hate to have sat on this essay as long as I have, since it has been over two months since OOPSLA 2009 finished up. I hope that later is better than never. After rereading this post, I also see one piece of information missing: the part where I drop a little sizzle for myself. I presented at DLS 2009, and it went well [Riehl09]. In that talk I argued for language extensibility and showed how the extensibility mechanism in Mython affords easier language embedding and optimization. How funny that I and other extensible language implementers should stumble into Liskov’s warning a few days later…

I have always considered myself fortunate when I get to hear ACM Turing Award winners speak, and this year was no exception. I was happy to see Barbara Liskov reprise her 2009 ACM Turing Award lecture at OOPSLA 2009. Her talk was interesting for several reasons. I found it exciting to listen to her discussion of the history of abstract data types, which were fundamental when I was first learning to program. I also liked Liskov’s review of CLU; it is always humbling to see how little modern systems languages have progressed in the last three or four decades. I particularly liked her pointers into the literature, and her injunctions against reinvention of the wheel.

As a researcher, I was particularly interested in her slide and comments about extensible languages: she didn’t think they were a good idea. If I remember correctly, Liskov stated that extensible languages were researched in the 1960’s and 1970’s, abandoned as a bad idea, and that recent efforts to revive them will encounter the same barriers. Before I address this issue further, I’d like to discuss the two papers she cited for extensible languages:

R. M. Balzer. “Dataless programming.” FJCC 1967 [Balzer67].
Stephen A. Schuman and Philippe Jorrand. “Definition mechanisms in extensible programming languages.” FJCC 1970 [SJ70].

The paper by Balzer does not seem pertinent to my notions of language extensibility. Balzer’s dataless programming language “feels” like a special form of aspect-oriented programming, where users define data layout in a special segment of the program specification. Algorithms access data through a generic interface for decomposing and iterating through data structures. I would note this citation makes more sense in the context of Liskov’s 1996 history of CLU [Liskov96], where the dataless programming language’s mechanism may have been characteristic of an approach used in other extensible languages:

Extensible languages contained a weak notion of data abstraction. This work arose from a notion of “uniform referents” [Balzer 1967; Earley 1971; Ross 1970].

The paper by Schuman and Jorrand is more recognizably about language extensibility. They present an extensible language, describing facilities for modifying both the semantic and syntactic properties of their language. They focus on semantic extensibility “in the domain of values”, providing basic mechanisms for user-defined data structures. Their extension types include support for type casts and name-indexed unary operators. They do not go into other forms of semantic extensibility, mentioning that they are possible, but outside of the scope of this particular paper.

Schuman and Jorrand’s paper presents a form of concrete syntax macros for syntactic extension. They build on top of syntax macros as originally described by Leavenworth [Leavenworth66], but add additional functionality. Schuman and Jorrand’s macro specifications consist of three things: a production, an optional predicate, and a replacement. Their approach feels like a parser description language, where the production is in BNF style, extending the existing grammar of the extensible language, and the replacement is similar to a parsing action. Using a production makes their approach able to use a single macro definition instead of having separate forms for expression and statement macros. Predicates also allow their macros to do general purpose compile-time metaprogramming, binding values in the compile-time environment used by the replacement clause.

I want to quickly give some credit to Barbara’s point about reinvention of the wheel. I see similarities between Schuman and Jorrand’s macros and the “patterns” proposed in a new language, π (pi-programming.org), presented by Roman Knöll in the Onward! track this year [KM09]. I think both of these mechanisms present issues of composability and modularity. Schuman and Jorrand do spend some time wringing their hands over this issue, especially since efficiency of compilation was a larger issue when they developed their system. Neither of these papers gives me an idea of why these ideas weren’t developed further and put into the languages I learned as a teenager. I can imagine an extensible Turbo Pascal where we were invited to learn more about object-oriented programming by implementing our own object system. I can’t trivially find a paper “Extensibility considered harmful” that exposes the fatal flaws in extensibility, and explains why I wasn’t taught to extend my language. The impression I got from Liskov was that these systems led to a “Tower of Babel” problem, making user programs unreadable, and therefore too difficult to reuse and maintain.

It isn’t surprising, therefore, that a member of PLT, Shriram Krishnamurthi, raised this issue during the question period of Liskov’s presentation. The Lisp and Scheme communities have lived with syntax macros for a long time, and they seem to get by just fine with them. If I understand correctly, these communities see “language towers” develop, and they fall in and out of use. It’s unfortunate that I wasn’t introduced to The Little Lisper until I was about to graduate from college, and even now I have to admit I don’t have my head fully wrapped around the macro system in MzScheme.

I think extensibility is a desirable property, and we can tame the issues it raises. Users can use lexical scoping to delimit syntactic extensions, making it clear where they are using custom syntax (see “Mood-specific Languages” in [WP07]). We also have several new approaches to dealing with ambiguity in parsing concrete syntax macros. In the limit, our machines are much bigger than they were in the 1970’s, and the cubic space and time of parsing an arbitrary context-free grammar simply isn’t the barrier it once was. I don’t see how one can readily prove extensibility is a desirable or undesirable property (though I plan to show one technique I’m developing for Mython saves a great deal of coding and porting effort). The fact that previous extensibility mechanisms were not widely adopted does not prove extensibility is undesirable. At worst, this outcome only shows that we haven’t found the right mechanism or pedagogical tools. Therefore, I remain interested in lowering the barriers to the evolution of languages, and seeing what happens.

posted by jriehl at 10:26 am


Wednesday, November 4, 2009

Embedding LLVM Assembly in Mython

Today we’re going to look at how we can use Mython and llvm-py to embed LLVM assembly code into a Mython module. For those not familiar with Mython, I wouldn’t worry too much; what we are doing should neither look nor work too differently from the following bit of code (which requires Python 2.5, LLVM, and llvm-py to work, by the way):

import StringIO, llvm, llvm.core, llvm.ee
llvm_asm = """
@msg = internal constant [15 x i8] c"Hello, world.\\0A\\00"

declare i32 @puts(i8 *)

define i32 @not_really_main() {
    %cst = getelementptr [15 x i8]* @msg, i32 0, i32 0
    call i32 @puts(i8 * %cst)
    ret i32 0
}
"""
llvm_module = llvm.core.Module.from_assembly(
                  StringIO.StringIO(llvm_asm))
mp = llvm.core.ModuleProvider.new(llvm_module)
ee = llvm.ee.ExecutionEngine.new(mp)
not_really_main = llvm_module.get_function_named(
                      'not_really_main')
ee.run_function(not_really_main, [])

This code first defines an LLVM module in a Python string, then builds an LLVM module from the embedded code, and finally uses a JIT to link and run a function from the embedded module. I would note two things about this little demo. One, while the multiline string allows un-escaped quote characters, developers must still take care to escape backslashes. Failing to do this causes the LLVM assembler to reject the string literal. Two, the user of this code pays for the assembly of the LLVM code each time it is run. Both of these are relatively minor problems, but they illustrate why a developer might prefer Mython over embedding another language as strings in a Python file. Later, we shall develop these arguments in more depth.
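The escaping problem is easy to see without LLVM at all. The snippet below is a small illustration in plain Python, using only the string constant from the demo. The raw string form is my addition rather than part of the demo above; it avoids doubling the backslashes, though the code still has to be quoted as a Python string.

```python
escaped = "c\"Hello, world.\\0A\\00\""   # backslashes doubled by hand
raw = r'c"Hello, world.\0A\00"'          # raw string: no doubling needed
unescaped = 'c"Hello, world.\0A\00"'     # \0 and \00 become NUL characters

# The first two spellings give the assembler the text it expects.
assert escaped == raw
# The third silently loses its backslashes before LLVM ever sees it.
assert "\\" not in unescaped
assert unescaped.count("\x00") == 2
```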

This post demonstrates how we can take the infrastructure in llvm-py and use it to embed LLVM source. We show how to assemble the embedded LLVM source into LLVM bitcode at compile time. We’ll then stash the bitcode for consumption by the LLVM JIT compiler and linker at run time. This approach saves us from the bitcode compilation time, and ideally saves some space in the Python bytecode. More importantly, this approach ensures that errors in the embedded source are detected at compile time, not run time.

Preliminaries

If you’re not terribly familiar with Python, LLVM, and llvm-py, I’d recommend reading at least the Python tutorial, the LLVM assembly tutorial, and the llvm-py user guide. The llvm-py user guide should, in turn, point you at a specific test case for using the LLVM JIT, which the above code follows except it builds the module from assembly source. The llvm-py documentation builds the module using wrapper objects for the LLVM intermediate representation (IR).

At the time of writing, I used LLVM 2.5 (via MacPorts), and built llvm-py from the Google Code repository. Originally, I tried the llvm-py port, but the llvm-py 0.5 tarball they use doesn’t build against LLVM 2.5. I encountered this problem with llvm-py 0.5 again on Cygwin, this time doing a manual build and install of LLVM 2.5 from a source tarball. I was also able to build and install the llvm-py Subversion head, but the example code for this post does not work (it can’t dynamically resolve puts()).

Mython introduces a special form of quotation into the Python language. The idea is that you can embed raw strings in your source code, and these strings are interpreted into Python code at compile time. Quotation blocks look something like this:

quote [quotefn] name:
    ...

The quotefn() is ideally a function that takes a name, a string, and a dictionary, and returns a 2-tuple containing a list of Python abstract syntax trees (ASTs, specifically, statement nodes), and a dictionary. Instead of giving a quick demonstration of how to define and use a quotation function in Mython, let’s go ahead and demonstrate these by embedding LLVM assembly. I will explain the Mython code as we go along.

I recommend you grab a copy of MyFront (which is part of the Basil language framework), and the test1.my source file from the Google Code repository (see availability, below). The following discussion essentially gives the source code for test1.my, but lists it out of order.

Interfacing llvm-py and Mython

So now that I’ve discussed the preliminaries, let’s just go ahead and start defining the driver we’ll use to test the compile-time wrapper for the LLVM assembler. Let’s assume that we already have a “quotation” function for LLVM assembly.

quote [llvm_as] llvm_module:
 @msg = internal constant [15 x i8] c"Hello, world.\0A\00"
 declare i32 @puts(i8 *)
 define i32 @not_really_main() {
     %cst = getelementptr [15 x i8]* @msg, i32 0, i32 0
     call i32 @puts(i8 * %cst)
     ret i32 0
 }

Our job consists of defining llvm_as() to be a quotation function that translates this quotation block into something like the following:

llvm_module = llvm.core.Module.from_bitcode(
                  StringIO.StringIO("..."))

At run time, the above constructs an LLVM module from the elided bitcode in the string literal (the "..."). We therefore need to define a compile-time function that does the following:

Takes the embedded source code and assembles it into an LLVM module.
Translates the LLVM module into a string literal containing LLVM bitcode.
Compiles a Python abstract syntax tree that will reconstruct the LLVM module from the embedded bitcode.

Before we proceed, let us assume that we already have three bound variables, each corresponding to a quotation function parameter: name, source, and env. The name variable is bound to the string literal "llvm_module". The source variable contains the string of the LLVM assembly, with the leading indentation white space removed. The env variable is a dictionary that is supposed to be an explicit replacement of the __globals__ dictionary, originally used by Python to manage its global namespace, but passed by MyFront explicitly as a reminder that it is a compile-time environment, not a run-time environment. I’m not sure if this “explicit store passing” style actually buys us anything, and this may be dropped from quotation functions in later versions of Mython.

We’ve seen some of the above steps accomplished in the introduction. We first must build an LLVM module from the LLVM assembly code, which is bound to the source variable:

fobj1 = StringIO.StringIO(source)
llvm_module = llvm.core.Module.from_assembly(fobj1)

We now have the same module we’ll want to use at run time bound at compile time (actually it’s a functionally identical module). We need to emit the bitcode that we’re going to embed in the run-time code we’ll be generating:

fobj2 = StringIO.StringIO()
llvm_module.to_bitcode(fobj2)

This writes the bitcode as a string into the StringIO file abstraction. We can now build Python code in another string:

runtime_src = ("%s = llvm.core.Module.from_bitcode("
               "StringIO.StringIO(%r))\n" %
               (name, fobj2.getvalue()))

Normally, I would expect the next step to be a possibly involved process of walking over some intermediate representation and constructing a Python AST to pass back to the compiler. In this case, we can avoid having to do this, since all we need to do is embed the LLVM code as a string argument. To convert the run-time code into an AST, we are going to take advantage of the fact that the compiler reflects its front-end in the env dictionary. MyFront maps the string "myfrontend" to a function that translates Mython source code and the compile-time environment into a Python AST, and a possibly mutated compile-time environment. This function allows us to simply take the above string and parse it into a Python AST like so:

runtime_ast, env = env["myfrontend"](runtime_src, env)

The myfrontend() function specifically returns a Module AST node. In order to get a list of statement AST nodes, we’ll just have to look at the body member of the returned Module object. The fully wrapped up Mython quotation function looks like this:

quote [myfront]:
    def llvm_as (name, source, env):
        assert name is not None
        fobj1 = StringIO.StringIO(source)
        llvm_module = llvm.core.Module.from_assembly(fobj1)
        fobj2 = StringIO.StringIO()
        llvm_module.to_bitcode(fobj2)
        runtime_src = ("%s = llvm.core.Module.from_bitcode("
                       "StringIO.StringIO(%r))\n" %
                       (name, fobj2.getvalue()))
        runtime_ast, env = env["myfrontend"](runtime_src, env)
        return runtime_ast.body, env

If you are curious about the above quotation block, the myfront() quotation function simply evaluates the embedded code at compile time and in the compile-time environment. This allows us to define the llvm_as() function at compile time, but then throw it away at run time.
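For readers without MyFront or llvm-py installed, the shape of a quotation function can be imitated in plain Python. The sketch below is only an analogy under stated assumptions: it uses Python 3, the myfrontend() stand-in parses Python rather than Mython (via the standard ast module), and strquote() is a made-up quotation function that binds the quoted text to a name instead of assembling anything.

```python
import ast

def myfrontend(source, env):
    """Hypothetical stand-in for env["myfrontend"]: parses Python, not Mython."""
    return ast.parse(source), env

def strquote(name, source, env):
    """Toy quotation function: bind `name` to the quoted text at run time."""
    runtime_src = "%s = %r\n" % (name, source)
    runtime_ast, env = env["myfrontend"](runtime_src, env)
    # Return statement nodes plus the (possibly updated) environment.
    return runtime_ast.body, env

# Compile time: expand one quotation block into statement nodes.
compile_env = {"myfrontend": myfrontend}
stmts, compile_env = strquote("greeting", "Hello, world.\\0A", compile_env)

# Run time: the generated statements execute like ordinary module code.
module = ast.Module(body=stmts, type_ignores=[])
runtime_env = {}
exec(compile(module, "<quoted>", "exec"), runtime_env)
print(runtime_env["greeting"])
```

The quotation function itself never appears in runtime_env, which mirrors how llvm_as() is available at compile time and absent at run time.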

The only thing that is left is to test it:

def main ():
    import llvm.ee
    print llvm_module
    print "_" * 60
    provider = llvm.core.ModuleProvider.new(llvm_module)
    llvm_engine = llvm.ee.ExecutionEngine.new(provider)
    not_really_main = llvm_module.get_function_named(
                          'not_really_main')
    retval = llvm_engine.run_function(not_really_main, [])
    print "_" * 60
    print "Returned", retval.as_int()

if __name__ == "__main__":
    main()

When I run this on my Mac (again, this is all in the test1.my source file), I see the following (the lines that start with “$” show command line inputs):

$ MyFront test1.my
$ python -m test1
@msg = internal constant [15 x i8] c"Hello, world.\0A\00"\
               ; <[15 x i8]*> [#uses=1]

declare i32 @puts(i8*)

define i32 @not_really_main() {
 %cst = getelementptr [15 x i8]* @msg, i32 0, i32 0\
              ; <i8*> [#uses=1]
 %1 = call i32 @puts(i8* %cst)           ; <i32> [#uses=0]
 ret i32 0
}

____________________________________________________________
Hello, world.

____________________________________________________________
Returned 0

I was not able to get LLVM to dynamically link puts() on the Cygwin platform. The resulting run-time code correctly outputs the module source, but then raises a signal, causing a core dump. It would be nice if the abort signal was simply thrown as an exception. I am reminded of the utility of David Beazley’s wrapped application debugger (WAD), or something similar, for catching signals and then translating them to Python exceptions.

Discussion

Now that we have looked at how to embed LLVM assembly in Mython, let’s look more closely at possibilities for answering why you would want to use Mython’s approach. This section looks at three things. First, it compares code size at the module level. Second, it gives measurements and discusses any possible differences in the run-time performance. Finally, this section demonstrates how both approaches to embedding handle errors in the embedded assembly.

I did not expect the resulting module sizes. The Python version, test0.py, compiles to a file, test0.pyc, which is 1,252 bytes in size. The Mython version compiles to test1.pyc, which is 1,335 bytes in size. When I use llvm-as on the assembly code alone, I see that without comments, the assembly code is smaller than the LLVM bitcode file by 79 bytes (213 bytes for the source code, 292 bytes for the bitcode). I assume that for more complicated input source, the LLVM bitcode will be smaller than the source (one can always skew this by adding comments; the original standalone hello.ll was 538 bytes with white space and comments).

To compare the run-time performance of the naive and Mython embeddings, I created a test harness to measure three scenarios:

The time it takes to construct an LLVM module from assembly source. This should be representative of the time taken by a naive embedding.
The time it takes to construct an LLVM module from assembly source and then serialize it into LLVM bitcode. This should reflect the compile-time cost of the Mython embedding.
The time it takes to construct an LLVM module from bitcode. This reflects the run-time cost of the Mython embedding.

I implemented this test harness in test2.py, which can be found in the same repository as the other two test modules (see availability, below). I am seeing the following results output from the test harness (times are in seconds, and reflect the minimum, maximum, and average times over 100 measurements of a function that performs the given test 1,000 times):

$ ./test2.py
Naive embedding summary: min=0.0542359 max=0.0596418 avg=0.0549726
Compile-time summary: min=0.134633 max=0.150107 avg=0.136246
Run-time summary: min=0.069649 max=0.0750451 avg=0.0704705
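For readers who want to reproduce this style of measurement, the following is a rough sketch of such a harness. It is not the code in test2.py: it assumes Python 3, relies on the standard timeit module, and the summarize() and stand_in_workload() names are made up. The workload is a placeholder for building an LLVM module from assembly or bitcode.

```python
import timeit

def summarize(label, func, repeat=100, number=1000):
    """Time `number` calls of func, `repeat` times; report min, max, mean."""
    times = timeit.repeat(func, repeat=repeat, number=number)
    lo, hi, avg = min(times), max(times), sum(times) / len(times)
    print("%s summary: min=%g max=%g avg=%g" % (label, lo, hi, avg))
    return lo, hi, avg

def stand_in_workload():
    # Placeholder for constructing an LLVM module; any callable will do.
    return sorted(range(50), reverse=True)

# Smaller counts than the defaults, just to keep the demonstration quick.
lo, hi, avg = summarize("Stand-in", stand_in_workload, repeat=5, number=100)
```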

These results come as a second surprise. Since the test harness solely runs wrapped LLVM code, it might seem that the LLVM infrastructure handles string inputs slightly faster than bitcode. After thinking about this for a minute, I believe a more likely explanation is that the bitcode input is larger than the assembly string input. Using the sizes given above, we can see the bitcode string is about 1.37 times larger than the assembly source, while the module construction time is only about 1.28 times longer. These relative numbers imply that if I did use more complicated assembly source with equivalent or smaller resulting bitcode, I would see a slight performance increase. This run-time performance increase would come at a small additional cost at compile time. On my machine, these numbers imply it would take an additional 13.6 milliseconds per 1000 lines of embedded assembly code (not counting deallocation time).
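The two ratios come straight from the numbers reported above; here is the arithmetic spelled out (Python 3 division), using the file sizes and the average times from the harness output:

```python
source_bytes, bitcode_bytes = 213, 292          # sizes reported above
naive_avg, runtime_avg = 0.0549726, 0.0704705   # average times, in seconds

size_ratio = bitcode_bytes / source_bytes   # about 1.37
time_ratio = runtime_avg / naive_avg        # about 1.28

print("size ratio: %.2f, time ratio: %.2f" % (size_ratio, time_ratio))
```

Since the time ratio is smaller than the size ratio, loading bitcode is slightly cheaper per byte than assembling source, which is the basis for expecting a gain once the bitcode is no larger than the source.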

Finally, we look at what happens when there is a syntax error in the embedded code. In the repository, I copied the test0.py and test1.my files to bad0.py and bad1.my, respectively. I then removed the leading “@” from the function definition. Here is the result of compiling these two modules using the MyFront compiler (note that I’ve hand shortened the file paths using ellipses):

$ rm *.pyc
$ MyFront bad0.py
$ MyFront bad1.my
Error in quote-generated code, from block starting at line 41:
  Traceback (most recent call last):
    File ".../basil/lang/mython/MythonRewriter.py", line 106, in handle_QuoteDef
    ret_val, env = quotefn(node.name, node.body, env)
    File "bad1.my", line 4, in llvm_as
    File ".../site-packages/llvm/core.py", line 330, in from_assembly
    raise llvm.LLVMException, ret
  LLVMException: expected function name
$ ls *.pyc
bad0.pyc

I have chosen to focus on just using the compiler, so you can clearly see that the naive embedding was quietly compiled into a Python bytecode file. In this particular case, the LLVM error would be caught at import time:

$ python -m bad0
Traceback (most recent call last):
  File ".../runpy.py", line 95, in run_module
    filename, loader, alter_sys)
  File ".../runpy.py", line 52, in _run_module_code
    mod_name, mod_fname, mod_loader)
  File ".../runpy.py", line 32, in _run_code
    exec code in run_globals
  File ".../sandbox/llvm/bad0.py", line 27, in <module>
    llvm_module = llvm.core.Module.from_assembly(StringIO.StringIO(llvm_source))
  File ".../site-packages/llvm/core.py", line 330, in from_assembly
    raise llvm.LLVMException, ret
llvm.LLVMException: expected function name

If you were to just compile this file and ship it, you might be condemning users to a nasty surprise. I know you’d still catch these kinds of bugs by extensive testing, right? The specific bug I’ve injected would be pretty easy to find, since the exception would occur as soon as you import the module. If you assembled the LLVM code inside a function, or on some special path, these kinds of bugs become much harder to find. You would have to be especially careful if the LLVM source was automatically generated.
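The same hazard exists for any language embedded as a string, and it can be reproduced without LLVM. As a stand-in, the sketch below uses a malformed regular expression in place of the broken assembly; rarely_called() is a made-up name for a function on some seldom exercised path.

```python
import re

def rarely_called():
    # The malformed pattern is just data until this line actually runs.
    return re.compile("(unbalanced")

# Defining the function (or importing its module) raises nothing.
# The error only surfaces once the special path is finally taken.
try:
    rarely_called()
except re.error as exc:
    failure = str(exc)
else:
    failure = None

print("error found only at call time:", failure)
```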

I am slightly embarrassed to note that this kind of experiment can still go horribly wrong in Mython. Since the current Mython implementation uses the Python tokenize module, it will not detect a DEDENT token if your embedded code has imbalanced brackets, braces, or parentheses. Feel free to delete the close brace from the embedded LLVM and watch the resulting mess output by MyFront’s recursive descent parser. I hope to have this problem fixed shortly.
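The tokenizer behavior is easy to observe in isolation. The following sketch assumes Python 3’s tokenize module rather than the Python 2.5 one MyFront uses, and the count_dedents() helper and the abbreviated quotation block are made up for illustration. Inside an open bracket the tokenizer treats line breaks as continuations, so it never emits the DEDENT that would end the block.

```python
import io
import tokenize

def count_dedents(source):
    """Count DEDENT tokens seen before the tokenizer finishes or gives up."""
    count = 0
    try:
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if tok.type == tokenize.DEDENT:
                count += 1
    except (tokenize.TokenError, SyntaxError):
        pass  # an unclosed bracket is only reported at end of input
    return count

balanced = (
    "quote [llvm_as] m:\n"
    "    define i32 @f() {\n"
    "        ret i32 0\n"
    "    }\n"
    "x = 1\n"
)
# Delete the close brace, as suggested above.
unbalanced = balanced.replace("    }\n", "")

print(count_dedents(balanced), count_dedents(unbalanced))
```

With the brace missing, the line after the block is swallowed as part of the bracketed expression, so a parser that waits for DEDENT to find the end of the quotation never sees one.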

Conclusion

To conclude, I was really hoping to make the following claim:

We can embed LLVM bitcode in Python, and this should offer our compiled modules greater speed without sacrificing platform independence.

In this case, I was not able to make this claim. The idea was that the time necessary to construct a module from a bitcode string should be less than the time necessary to parse an assembly string of equal size and create an LLVM module from it. This claim might be easier to show for embeddings of native machine code, but that would cost us platform independence. I would be interested in learning more about the LLVM bitcode format, and determining when it is likely that the bitcode for a module is larger than its source code (our example has a string literal in it, which might play some part in the source and bitcode sizes).

I hope the following claims are easier to accept given this example:

Mython makes it possible to embed code from other languages without string escapes.
Mython makes it possible to check embedded code at compile time.
If you already have a language implementation that can interface with Python, it is very simple (< 10 lines of code) to embed and statically check it in Mython.

I hope you will take the time to play around with building more quotation functions in Mython, and see what you can do with them. I think quotation functions are a powerful mechanism for metaprogramming, and I hope to continue to provide interesting examples of their utility.

Availability

Instructions for obtaining Mython, and its implementation, the MyFront compiler, are given here: http://code.google.com/p/basil/wiki/GettingStarted

The source code for the Python and Mython demonstration and test modules is in the Basil framework sandbox. You can get them from Google Code here: http://code.google.com/p/basil/source/browse/trunk/sandbox/llvm/

posted by jriehl at 5:35 pm


Monday, April 13, 2009

(More) PyCon 2009 Sprint Post-mortem

So I think the Mython sprint went well. I was thinking the sprint report on the Google Code wiki was going to be all I wanted to say about our activities. However, I’d like to link this stuff from my log, and do a quick iteration on expressing my gratitude.

I’d like to thank everyone involved, which included Chris Cope, Tyler Green, and Andy Terrel. I’d also like to thank the sprint organizers (esp. Jacob Kaplan-Moss, whom we turned to for all kinds of help, from setting up the sprint page, to getting networking and a whiteboard) and the sponsors. At first I didn’t like getting placed in a room with auditorium style seating, but I have to admit that it grew on me (especially when I was trying to look over someone’s shoulder).

I hope we can organize something similar for PyCon 2010.

posted by jriehl at 9:39 am


Thursday, March 5, 2009

The ALT-G Bitbucket Repository

If you are not following the ALT-G group, I’d like to announce that source code for the Camlp5 source-to-source translator is available on Bitbucket at http://bitbucket.org/jriehl/altg/. Specifically, you’ll find it in the camlp5 subdirectory.

If you dig the Camlp5 stuff, the viscomp subdirectory might also be of interest. In there I give an example of taking apart the read-eval-print loop for the SML/NJ compiler. This decomposition draws a line between the front-end (parser) and the back-end (everything else). This separation gives me several things: a parser for SML/NJ (not too hard to get a hold of, but I did have to go digging), and an evaluator that can operate on either strings or SML/NJ abstract syntax. It should come as no surprise that, following my SML/NJ investigation, my attention has turned to source-to-source translation.

One idea I take from my work on Mython is applying the evaluator at compile time. A lot of other people seem to avoid doing this, and I’m still a little curious why (other than old reservations about binary size). Camlp5 could have incorporated parts of the Ocaml compiler to get parse-time evaluation, and this would make its “#pragma” declaration much more useful. If I understand correctly, Camlp5 wanted a clean break with the Ocaml code, so they have developed an interpreted evaluator that is not complete (at least for the version I was using).

In conversations I’ve had with John Reppy, he also seems to prefer not coupling a programmable front-end with a specific evaluator since the front-end would become saddled with limitations of the evaluator (SML/NJ’s 31-bit integers were one example given). I’ll certainly have to think more about this design point, and would be interested in hearing more about the pros and cons of general purpose compile-time evaluation (other than the obvious limitation that user code can now cause the compiler to not halt).

posted by jriehl at 11:25 am


Wednesday, March 4, 2009

Source-to-source Translation Using Camlp5

In this post, I’d like to illustrate how we can use Camlp5 to build a source-to-source translator using concrete syntax. Recall (from Jones et al.) that a translator involves three things: a source language, a target language, and an implementation language. Our example will use a domain-specific language as its source language, Ocaml as its target language, and Ocaml with the Camlp5 extensions as its implementation language. We’d like to specifically highlight how Camlp5 allows us to use parts of the concrete syntax of the source and target languages.

Using Camlp5

Note that Camlp5 syntax has some differences with Ocaml syntax. These are documented in the Camlp5 manual, and I’m still unfamiliar enough with the two grammars that I can’t necessarily identify which is which.

It should suffice to say that my examples will be in the implementation language, Camlp5. I’d encourage the reader to play along using the Camlp5 read-eval-print loop (REPL), which is run on top of the Ocaml REPL as follows:

$ ocaml -I +camlp5 camlp5r.cma

More seasoned Camlp5 users might note that I fibbed a little about there being only two grammars. Camlp5 has an additional syntax that is accessed by substituting “camlp5o.cma” for “camlp5r.cma”. I’ll defer to the Camlp5 user manual for further explanation of these two front-ends. All my examples will use the “revised” syntax implemented in “camlp5r.cma”.

Implementing the Source Language

We begin implementing our translator by settling on a domain-specific language for our source language. Who doesn’t love infix-notation integer calculators? Not just the Camlp5 user manual, I assure you (see the info files for Bison, for example). Why should we break tradition, and overlook this classic example?

We can define a quick abstract syntax for calculator expressions as follows:

type expr =
    [ Op of string and expr and expr
    | Int of int
    | Var of string ];

This data type forms an intermediate target language for the parser.

Camlp5 has lots of parsing goodies, but the parts I like the best are the inline syntax definitions and the extensible parsers. Before we can use these features, we’ll need to tell Camlp5 to extend its own front-end. We accomplish this using the “#load” directive. Following that, we can define a grammar and entry point:

#load "pa_extend.cmo";

value gram = Grammar.gcreate (Plexer.gmake ());
value exp = Grammar.Entry.create gram "expression";

EXTEND
  exp:
  [ "SUM"
    [ x = exp; "+"; y = exp -> Op "+" x y
    | x = exp; "-"; y = exp -> Op "-" x y]
  | "PROD"
    [ x = exp; "*"; y = exp -> Op "*" x y
    | x = exp; "/"; y = exp -> Op "/" x y]
  | "ATOM"
    [ x = INT -> Int (int_of_string x)
    | x = LIDENT -> Var x
    | "("; x = exp; ")" -> x ] ]
  ;
END;

Here we have defined an extensible parser with the sole non-terminal, exp. We have stratified the productions for exp into three levels: SUM for addition and subtraction, PROD for multiplication and division, and ATOM for constants, variables, and explicit grouping. The order these productions are defined in is important. The ordering establishes precedence without having to declare intermediate non-terminals (like “term” or “factor”). The actual string labels for each grouping are optional, but I use them here for future experimentation with extending our little language.

Individual productions are organized similar to patterns, with a left-hand side and right-hand side, separated by an arrow, “->”. The left-hand side is a matching and binding pattern, with non-terminals appearing as labels, and binding forms using an identifier and equal sign. Note that we can interleave lexical information as strings in the pattern. When the left-hand side matches, the system evaluates the right-hand side expression in an environment extended with symbols bound during matching.

Camlp5 expands the contents of the EXTEND section into a set of statements that side effect the exp entry. We can witness this by running Camlp5 as a pre-processor, asking it to print the expansion in a simplified Camlp5 syntax (assuming we have the above example code defined in a file, “CalcBase.ml”):

$ camlp5r pr_r.cmo CalcBase.ml

Once we have a grammar, we can construct a parser:

value parse s = Grammar.Entry.parse exp (Stream.of_string s);

…and now that we have a parser, we can parse:

# parse "2 + 3 * foo";
- : expr = Op "+" (Int 2) (Op "*" (Int 3) (Var "foo"))
# parse "42 / 3 - bar";
- : expr = Op "-" (Op "/" (Int 42) (Int 3)) (Var "bar")
# parse "(9 + 2)^2";
- : expr = Op "+" (Int 9) (Int 2)

The last example given above has a syntax error (using the caret operator, “^”). Our function parses as much of the input string as it can, and returns the last legitimate result. We can force an exception by doing something like passing it the empty string, or an empty pair of parentheses. In order to force the whole string to be in the concrete syntax, we would have to add some sort of delimiter or cue to the grammar. Section 20.9 of the Camlp5 user manual gives an example of adding an end-of-string token to the lexer and using this to ensure the whole string is in the formal language.
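For readers who think more readily in Python, the three-level stratification can be sketched by hand. This is an ordinary recursive-descent parser, not how Camlp5 implements extensible grammars. The names lex, parse, and LEVELS are illustrative, terms are represented as tagged tuples, and the sketch assumes left-associative operators. Like the parse function above, it stops at the first character it cannot tokenize and returns what it has parsed so far.

```python
import re

# One token: an integer, a lowercase identifier, or a single operator/paren.
TOKEN = re.compile(r"\s*(\d+|[a-z_][A-Za-z0-9_']*|[-+*/()])")

# Operators grouped by precedence level, loosest first (SUM, then PROD).
LEVELS = [("+", "-"), ("*", "/")]

def lex(text):
    """Split text into tokens, stopping at the first unrecognized character."""
    tokens, pos = [], 0
    while True:
        m = TOKEN.match(text, pos)
        if m is None:
            return tokens
        tokens.append(m.group(1))
        pos = m.end()

def parse(text):
    tokens = lex(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def atom():
        # ATOM level: constants, variables, and explicit grouping.
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        advance()
        if tok.isdigit():
            return ("Int", int(tok))
        if tok == "(":
            inner = level(0)
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return inner
        if tok[0].isalpha() or tok[0] == "_":
            return ("Var", tok)
        raise SyntaxError("unexpected token %r" % tok)

    def level(n):
        # Each level parses operands at the next tighter level.
        if n == len(LEVELS):
            return atom()
        left = level(n + 1)
        while peek() in LEVELS[n]:
            op = advance()
            left = ("Op", op, left, level(n + 1))
        return left

    return level(0)

print(parse("2 + 3 * foo"))
```

Adding a precedence level in this sketch means adding a tuple to LEVELS, which is roughly the convenience the ordered levels in the EXTEND block provide without intermediate non-terminals.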

I’ve purposely omitted a lot of details for the sake of brevity, particularly details on lexical analysis and extensible scanners. For more information on writing extensible parsers, I refer readers to Section 8 of the Camlp5 user manual.

Implementing an Evaluator

Implementing an evaluator for our domain-specific language is straightforward (for people who’ve already seen this done a thousand times, at least). All we need to do is build a function that walks and evaluates the abstract syntax data structure, along with some minimal infrastructure:

value lookup = fun id -> if id = "thingy" then 1000 else 0;

exception EvalFailure of string;

value rec eval e = match e with
  [ Op "+" x y -> (eval x) + (eval y)
  | Op "-" x y -> (eval x) - (eval y)
  | Op "*" x y -> (eval x) * (eval y)
  | Op "/" x y -> (eval x) / (eval y)
  | Op opstr _ _ -> raise (EvalFailure ("Unknown operator: '" ^
                                        opstr ^ "'"))
  | Var idstr -> lookup idstr
  | Int n -> n];

We use a fixed look-up function for variables, which is appropriate since there is no syntax for assigning values to variables. We also raise an exception in the event that an unrecognized operator string is present in an Op constructor. If we are using the parse function from the previous section, we should never see an exception raised. Nothing in our example stops someone from defining a new parser with unrecognized operators, much less constructing them “by hand”. This possibility reflects a design choice that permits introducing new binary operators without changing the abstract syntax. I intend to demonstrate using this flexibility in later experiments where we’ll extend the source language.

We can compose the evaluator function with the parser to get an evaluator on strings in the source language:

value evalstr s = eval (parse s);

…and we have a calculator:

# value tests = ["2 + 3 * foo"; "42/3 - bar";
                 "(9 + baz) * (9 + baz)"];
value tests : list string =
  ["2 + 3 * foo"; "42/3 - bar"; "(9 + baz) * (9 + baz)"]
# List.map evalstr tests;
- : list int = [2; 14; 81]
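The design choice of keeping operators as strings can be seen in a small Python analogue of the evaluator. This is a sketch under stated assumptions: terms are tagged tuples rather than an algebraic data type, and evaluate, OPS, and int_div are illustrative names. Division truncates toward zero, as integer division does in Ocaml, which differs from Python’s // operator for negative operands.

```python
class EvalFailure(Exception):
    """Raised when a term cannot be evaluated."""

def lookup(name):
    # Fixed environment, mirroring the lookup function above.
    return 1000 if name == "thingy" else 0

def int_div(a, b):
    # Integer division that truncates toward zero.
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

# Supporting a new binary operator only requires a new entry here.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": int_div,
}

def evaluate(e):
    tag = e[0]
    if tag == "Int":
        return e[1]
    if tag == "Var":
        return lookup(e[1])
    if tag == "Op":
        _, op, lhs, rhs = e
        if op not in OPS:
            raise EvalFailure("Unknown operator: '%s'" % op)
        return OPS[op](evaluate(lhs), evaluate(rhs))
    raise EvalFailure("Unknown term: %r" % (e,))

# A term built "by hand" with an operator the evaluator does not know.
by_hand = ("Op", "^", ("Int", 9), ("Int", 2))
try:
    evaluate(by_hand)
except EvalFailure as err:
    print(err)
```

The hand-built term shows the failure mode described above: the term is well formed, but the evaluator has no rule for its operator.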

An Aside: Staging and Printing in the Target Language

A language as simple as the calculator language implemented above does not really require the same kind of infrastructure as a more sophisticated intermediate language. In more complicated systems, we often have a set of transformers for the intermediate language (optimizers being one example of this). These transformers are simply functions that take an expression in the intermediate language and return an equivalent expression in the intermediate language. In a system with numerous transformers, we find it useful to see what the input and output terms are, so we may want to define a pretty-printer for our language.

Camlp5 has a pretty-printing infrastructure available for building functions that translate from an intermediate language back to a human readable string. I am going to ignore that infrastructure here, and instead try to use one of the pretty printers that come with Camlp5. One of these pretty-printers takes a Camlp5 syntax tree and translates it to a string.

In order to use the Camlp5 pretty-printer, I am going to use staging. Staging involves “raising the evaluation level” of code. We raise the evaluation level by translating abstract syntax into abstract syntax that constructs the original abstract syntax. With a staging function, we can translate our domain-specific abstract syntax (of type expr) into Camlp5 abstract-syntax, and then feed that into the pretty printer. The funny thing that makes this (more or less) work is that the code to construct abstract syntax is often identical to how we’d pretty-print it anyway (the exception to this rule of thumb occurs when we demand pretty-printing in the concrete or “surface” language).

We continue with this staging idea by using Camlp5’s quotation facilities. A quote is a means of saying “this bit of code should construct an intermediate representation of itself”. If we want to use quotation in Camlp5, we need to extend the language with quotation syntax. Here is an example of Camlp5 quotation:

#load "q_MLast.cmo";

value demoexp = let loc = Ploc.dummy in <:expr< 3 + 4 >>;

From the REPL this should ultimately display the following:

value demoexp : MLast.expr =
  MLast.ExApp
    (MLast.ExApp  (MLast.ExLid  "+")
       (MLast.ExInt  "3" ""))
    (MLast.ExInt  "4" "")

Great, so what if we wanted to get a string back out from our quoted expression? We first need to load a pretty printer. Loading a pretty-printer module registers the module’s pretty-printer with Camlp5. We can reference this pretty printer in the Pcaml structure. The following example builds a printing function that translates from Camlp5 expression abstract syntax to strings:

# #load "pr_r.cmo";
# value print e = Eprinter.apply Pcaml.pr_expr Pprintf.empty_pc e;
value print : MLast.expr -> string = <fun>
# print demoexp;
- : string = "3 + 4"

From the very same REPL we can now define a staging function that, modulo quotation syntax, looks like an identity transformer for our abstract syntax:

value rec stage (e : expr) : MLast.expr =
  let loc = Ploc.dummy in match e with
    [ Op ops lt rt -> <:expr< Op $str:ops$ $stage lt$ $stage rt$ >>
    | Int n -> <:expr< Int $int:string_of_int n$ >>
    | Var vid -> <:expr< Var $str:vid$ >> ];

Again, the angle brackets mark parts of code that should not be evaluated, but rather construct Camlp5 abstract syntax. The parts surrounded by dollar signs are anti-quotations. These are portions of code that should be evaluated, mostly for the purpose of assisting in the construction of abstract syntax. For example, in the case matching operators, we quote the application of the operator constructor, but then anti-quote to make a recursive call to stage. In the recursive call, stage builds abstract syntax for the subexpressions, contained within the operation constructor.

Okay, so we can print Camlp5 abstract syntax, and we can stage calculator abstract syntax into Camlp5 abstract syntax. Like the previous section, all that’s left to do is composing these two functions:

value printcalc e = print (stage e);

Now we can test the printcalc function on our test strings, demonstrating the amazing capability of taking some code and printing a string that is equivalent under some evaluation and staging regime:

# value testasts = List.map parse tests;
value testasts : list expr =
  [Op "+" (Int 2) (Op "*" (Int 3) (Var "foo"));
   Op "-" (Op "/" (Int 42) (Int 3)) (Var "bar");
   Op "*" (Op "+" (Int 9) (Var "baz")) (Op "+" (Int 9) (Var "baz"))]
# List.iter (fun ast -> print_endline (printcalc ast)) testasts;
Op "+" (Int 2) (Op "*" (Int 3) (Var "foo"))
Op "-" (Op "/" (Int 42) (Int 3)) (Var "bar")
Op "*" (Op "+" (Int 9) (Var "baz")) (Op "+" (Int 9) (Var "baz"))
- : unit = ()
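The staging idea carries over to other languages that expose their own abstract syntax. As a rough Python analogue (assuming Python 3.9 or later for ast.unparse, and using tagged tuples for calculator terms), the sketch below stages a term into Python abstract syntax that rebuilds the term, then prints it with the stock unparser. The names stage, printcalc, and unstage are illustrative and are not part of Camlp5 or Mython.

```python
import ast

def stage(e):
    """Build a Python AST that, when evaluated, reconstructs the term e."""
    tag = e[0]
    if tag == "Op":
        _, op, lhs, rhs = e
        parts = [ast.Constant(value="Op"), ast.Constant(value=op),
                 stage(lhs), stage(rhs)]
    elif tag in ("Int", "Var"):
        parts = [ast.Constant(value=tag), ast.Constant(value=e[1])]
    else:
        raise ValueError("not a calculator term: %r" % (e,))
    return ast.Tuple(elts=parts, ctx=ast.Load())

def printcalc(e):
    """Pretty-print a term by staging it and unparsing the staged code."""
    return ast.unparse(stage(e))

def unstage(staged):
    """Evaluate staged code to recover the term it constructs."""
    tree = ast.fix_missing_locations(ast.Expression(body=staged))
    return eval(compile(tree, "<staged>", "eval"))

term = ("Op", "+", ("Int", 2), ("Op", "*", ("Int", 3), ("Var", "foo")))
print(printcalc(term))
```

Evaluating the staged code gives back the original term, which is the sense in which staging raises the evaluation level by one.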

Like other bits of Camlp5, the details of defining locations (the loc value that I keep let-binding) are outside the scope of this discussion. These locations are important for error reporting. See Section 20.5 in the Camlp5 manual for an example of using locations to improve error messages.

Implementing Custom Quotation

I didn’t go into the details of quotation above, but hopefully I’ve sparked some curiosity about it. Quotation in Camlp5 provides a form of what I call parametric quotation. In the case of Camlp5, there is an identifier parameter to the quotation syntax. We saw an example of identifier arguments to quotation in the previous section. Specifically, the expr identifier, embedded in the quotation brackets, “<:expr<“, told the Camlp5 pre-processor that we were quoting a Camlp5 expression. Please note that the expr identifier used in quotation does not reference the data type we built for the calculator abstract syntax!

The identifier argument to the quotation syntax references what is called a quotation expander in Camlp5 parlance. A quotation expander can take two forms. In this post, we only use one of these quotation expander constructions. We construct a quotation expander from a pair of functions from strings to Camlp5 abstract syntax. One function is responsible for returning Camlp5 abstract syntax for an expression, of type MLast.expr. The other function is responsible for returning abstract syntax for a pattern, of type MLast.patt.

One detail of quotation not previously mentioned was that Camlp5 quotations can do more than just construct abstract syntax. Quotations can also appear as patterns, where they are used to match against abstract syntax. Camlp5 provides a quotation expander for building patterns using concrete syntax, patt. For example, we can do the following to build abstract syntax for pattern matching a zero constant in our calculator:

# let loc = Ploc.dummy in <:patt< Int 0 >>;
- : MLast.patt =
MLast.PaApp <abstr> (MLast.PaUid <abstr> "Int")
    (MLast.PaInt <abstr> "0" "")

We already have a parser for our calculator, and we can define the expression expander function by composing the parser with our staging function. To define a pattern expander requires us to define another parser that basically copies the concrete syntax for calculator expressions. One primary difference between these two parsers, however, is that if we want our pattern matcher to bind any variables, we need to add additional support for anti-quotation.

The code below copies the calculator expression grammar, but adds an anti-quotation reduction at the atom stratum. This example illustrates how anti-quotation is a property of the quotation expander, not the containing Camlp5 syntax. Some location book-keeping is handled by a new non-terminal, pat_antiquot, and modulo changing the anti-quotation “operator” to a percent sign, the following is adapted from the example in Section 20.9 in the Camlp5 manual:

value pat = Grammar.Entry.create gram "expression";

EXTEND
  GLOBAL: pat;

  pat:
  [ "SUMPAT"
      [ x = pat; "+"; y = pat -> <:patt< Op "+" $x$ $y$ >>
      | x = pat; "-"; y = pat -> <:patt< Op "-" $x$ $y$ >> ]
  | "PRODPAT"
      [ x = pat; "*"; y = pat -> <:patt< Op "*" $x$ $y$ >>
      | x = pat; "/"; y = pat -> <:patt< Op "/" $x$ $y$ >> ]
  | "ATOMPAT"
      [ x = INT -> <:patt< Int $int:x$ >>
      | x = LIDENT -> <:patt< Var $str:x$ >>
      | "%"; r = pat_antiquot -> r
      | "("; x = pat; ")" -> x ] ]
  ;

  pat_antiquot:
  [ [ i = LIDENT ->
        let r =
          let loc = Ploc.make_unlined (0, String.length i) in
          <:patt< $lid:i$ >>
        in
        <:patt< $anti:r$ >> ] ]
  ;

END;

Since the calculator language is a source language, we do not need to add anti-quotation to the expression expander. If we wanted to use the calculator language as a target language, or build rewriting functions (might include such useful things as a step-wise evaluator or constant folding optimizer), we would find adding anti-quotation to the calculator expression non-terminal much more helpful.

Given the new entry point for creating patterns from the calculator concrete syntax, we can now construct an expander function for both entry points and register the full expander with Camlp5:

value expand_expr s = stage (parse s);
value expand_patt s = Grammar.Entry.parse pat (Stream.of_string s);
Quotation.add "calc" (Quotation.ExAst (expand_expr, expand_patt));
Quotation.default.val := "calc";

The last statement makes our quotation expander the default quotation expander. We can now quote into the concrete syntax of our calculator without having to give an identifier argument:

# << 3  + xyzzy >>;
- : expr = Op "+" (Int 3) (Var "xyzzy")

In this example and at any point following registration of the expander in the REPL, we are able to quote into our custom concrete syntax. If we wanted to quote into our new language in a source file, we would have to define the quotation expander in a separate compilation unit and load the compiled module using the #load directive. At present, I’m not sure how much of a limitation separate compilation poses to the application of parametric quotation. While Mython currently doesn’t have all the nifty extensible parser infrastructure, it offers the ability to use a quotation expander as soon as it is defined.

Using Quotation in an Evaluator

Now that we have a full quotation expander for the calculator language, we can use quotation to build an evaluator that matches against concrete syntax as opposed to abstract syntax:

value rec eval2 e = match e with
  [ << %x + %y >> -> (eval2 x) + (eval2 y)
  | << %x - %y >> -> (eval2 x) - (eval2 y)
  | << %x * %y >> -> (eval2 x) * (eval2 y)
  | << %x / %y >> -> (eval2 x) / (eval2 y)
  | Var idstr -> lookup idstr
  | Int n -> n
  | _ -> raise (EvalFailure ("Unhandled abstract syntax :" ^
                             (printcalc e))) ];

The new evaluator should work identically to our previous evaluator:

# List.map eval2 testasts;
- : list int = [2; 14; 81]
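To make the role of anti-quotation in patterns concrete, here is a minimal Python sketch of matching against patterns that contain holes. It is a hand-written stand-in for what a pattern expander produces, not Camlp5’s mechanism: the patterns are built directly as tuples instead of being parsed from concrete syntax, and hole, match_term, RULES, and eval2 are illustrative names. For brevity, division uses Python’s //, which rounds down rather than truncating.

```python
HOLE = "Hole"

def hole(name):
    """A pattern variable, playing the role of %name in the quoted patterns."""
    return (HOLE, name)

def match_term(pattern, term):
    """Return a dict of bindings if term matches pattern, otherwise None."""
    if pattern[0] == HOLE:
        return {pattern[1]: term}
    if (not isinstance(term, tuple) or len(term) != len(pattern)
            or term[0] != pattern[0]):
        return None
    bindings = {}
    for p, t in zip(pattern[1:], term[1:]):
        if isinstance(p, tuple):
            sub = match_term(p, t)
            if sub is None:
                return None
            bindings.update(sub)
        elif p != t:
            return None
    return bindings

def lookup(name):
    return 1000 if name == "thingy" else 0

# Each rule pairs a pattern with an action over the bound variables.
RULES = [
    (("Op", "+", hole("x"), hole("y")), lambda x, y: eval2(x) + eval2(y)),
    (("Op", "-", hole("x"), hole("y")), lambda x, y: eval2(x) - eval2(y)),
    (("Op", "*", hole("x"), hole("y")), lambda x, y: eval2(x) * eval2(y)),
    (("Op", "/", hole("x"), hole("y")), lambda x, y: eval2(x) // eval2(y)),
    (("Var", hole("x")), lambda x: lookup(x)),
    (("Int", hole("x")), lambda x: x),
]

def eval2(term):
    for pattern, action in RULES:
        bindings = match_term(pattern, term)
        if bindings is not None:
            return action(**bindings)
    raise ValueError("Unhandled abstract syntax: %r" % (term,))

print(eval2(("Op", "+", ("Int", 2), ("Op", "*", ("Int", 3), ("Var", "foo")))))
```

Writing the patterns as tuples is exactly the abstract-syntax burden that quoting into concrete syntax is meant to remove.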

I propose that using concrete syntax simplifies matching and rewriting code, making it easier to read and maintain. On the other hand, I worry over several counter-arguments. If we were to count characters, there is little net savings in using the concrete syntax over the abstract syntax. We reduce character count by no longer having to write out the constructor name, but we gain characters in the arguments to the various quotation and anti-quotation forms. I’m able to save a little more in the above example by making the calculator language the default expander. Using quotation also requires that the writers and maintainers keep a map from concrete to abstract syntax in their heads. I suspect most people would find it easier to make sure a match is exhaustive by consulting the constructors listed in the algebraic data-type definition.

If we were to look at domain-specific optimizations, I think these weaknesses would not be as severe. As far as brevity is concerned, I would expect to be matching more complicated source language expressions, with drastically reduced readability if we were to write them out using the abstract syntax. I would also expect matches to be non-exhaustive by construction. These domain-specific optimizations would rewrite source terms in specific cases only, ignoring input terms otherwise.

From Evaluator to Translator

Finally, we arrive at a source-to-source translator by copying the previous evaluator, and then quoting the right-hand side of our pattern matches:

exception TranslationFailure of string;

value rec tr e = let loc = Ploc.dummy in match e with
  [ << %x + %y >> -> <:expr< $tr x$ + $tr y$ >>
  | << %x - %y >> -> <:expr< $tr x$ - $tr y$ >>
  | << %x * %y >> -> <:expr< $tr x$ * $tr y$ >>
  | << %x / %y >> -> <:expr< $tr x$ / $tr y$ >>
  | Var idstr -> <:expr< lookup $str:idstr$ >>
  | Int n -> <:expr< $int:string_of_int n$ >>
  | _ -> raise (TranslationFailure ("Unhandled abstract syntax :" ^
                                    printcalc e)) ];

From the REPL we can test the translator, expecting to see Camlp5 code that would evaluate to the same thing as if we ran our evaluator on it:

# value tr_print e = print_endline (print (tr e));
value tr_print : expr -> unit = <fun>
# List.iter tr_print testasts;
2 + 3 * lookup "foo"
42 / 3 - lookup "bar"
(9 + lookup "baz") * (9 + lookup "baz")
- : unit = ()
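The step from evaluator to translator can also be sketched in Python, with Python standing in as the target language. This is an analogue under stated assumptions, not the Camlp5 translator: terms are tagged tuples, the target syntax is built with the standard ast module instead of quotation, Python 3.9 or later is assumed for ast.unparse, and tr, compile_term, and run are illustrative names. Division is mapped to Python’s floor division.

```python
import ast

class TranslationFailure(Exception):
    """Raised when a term has no translation."""

# Calculator operators mapped to Python operator nodes.
BINOPS = {"+": ast.Add, "-": ast.Sub, "*": ast.Mult, "/": ast.FloorDiv}

def tr(e):
    """Translate a calculator term into a Python expression AST."""
    tag = e[0]
    if tag == "Int":
        return ast.Constant(value=e[1])
    if tag == "Var":
        # Variables become calls to lookup, as in the translator above.
        return ast.Call(func=ast.Name(id="lookup", ctx=ast.Load()),
                        args=[ast.Constant(value=e[1])], keywords=[])
    if tag == "Op" and e[1] in BINOPS:
        return ast.BinOp(left=tr(e[2]), op=BINOPS[e[1]](), right=tr(e[3]))
    raise TranslationFailure("Unhandled abstract syntax: %r" % (e,))

def lookup(name):
    return 1000 if name == "thingy" else 0

def compile_term(e):
    """Compile the translated expression to a Python code object."""
    tree = ast.fix_missing_locations(ast.Expression(body=tr(e)))
    return compile(tree, "<calc>", "eval")

def run(e):
    return eval(compile_term(e), {"lookup": lookup})

term = ("Op", "+", ("Int", 2), ("Op", "*", ("Int", 3), ("Var", "foo")))
print(ast.unparse(tr(term)))
print(run(term))
```

Handing the translated tree to compile rather than walking it is the point where this sketch stops interpreting calculator terms and starts compiling them.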

Again, I worry over the complexity of additional quotation code. Conversely, it feels very natural to be able to copy the evaluator from the previous section, and simply quote the right-hand side of the match subexpressions. Doing this transformation automatically almost seems in our reach. We’d just quote the whole evaluator, then transform it by staging the right-hand expressions, anti-quoting recursive calls, and other fiddly bits (it looks like some form of type-mapping might be required to handle some of the anti-quotation parameters).

Unfortunately, Camlp5 doesn’t seem to be able to handle the step of quoting the evaluator:

# let loc = Ploc.dummy in <:expr< << 3 + 4 >> >>;
                                  ^^^^^^^^^^^
While expanding quotation "expr":
Parse error: illegal begin of expr_eoi

Of course, I’m rapidly wandering outside the limits of my understanding of Camlp5, and a more thorough investigation into nested quotation might show I’m not doing something properly.

Conclusions

This post shows how to use Camlp5 to write a source-to-source translator. Our translator uses concrete syntax to both match expressions in the source language and construct expressions in the target language. When we compose our translator with the Ocaml compiler, we leave the realm of interpretation and start building compilers. I think such flexibility reflects a future where our domain-specific languages can both be very high-level, but also fast.

I think this post points towards “metaprogramming nirvana”. I have demonstrated how to code using bits of multiple languages by using a sophisticated language infrastructure. We used a form of parametric quotation to easily switch into a source language we defined and back into the implementation language. We did the same with the target language, which was given to us as part of the implementation language.

Hopefully, I’ve set the stage for further investigations into metaprogramming nirvana while giving you something that is both concrete and digestible.

posted by jriehl at 3:54 pm


Tuesday, October 21, 2008

Is this sizzle enough?

So I’m trying to draw a little more attention to my work, especially since I’ll have a captive audience at DSPD 2008. I’ve had to create a little more content, and this will help the links in my talk point to places more interesting than web “Hello, World”s. I would invite interested parties to review and (privately) comment on these new “leavings” since they are supposed to convey not just levity, but also serious intent and capability. And so…

In some ways I’d like to emulate the easy going professionalism you can see in Dave Beazley’s site. (BTW, congrats on the kid, Dave!)

Thus continues Sizzle Quest 2008. If you are curious where this obsession with sizzle came from, it stems from a keynote at PyCon 2008, given by Brian Fitzpatrick. The takeaway I took away was that all the virtuous content/code/product one makes is moot if it is not publicized enough. This resonates with a lot of the feedback I received from my dissertation committee. They often complained about my avoidance of the first person and how I understate my contributions (not to mention how these choices lead headlong into the passive voice). It is hoped the readership will observe some improvement (*smirk*).

posted by jriehl at 1:58 pm


Monday, October 13, 2008

Announcing ALT-G

I just started an informal working group to further my metaprogramming agenda. Here is the text of an announcement email I just sent:

“I’ve been having several conversations about applications of language technologies with various PL lunch regulars. This last Friday, Wonseok and I decided to start a small working group, which I’m calling the “Applied Language Technologies Group” (ALT-G). This group should be relatively low bandwidth and informal at this point. Our initial short term goal is to come up with a whitepaper about “What’s visible in the visible compiler”. Our longer term goal is to evaluate and contribute to applications of programming language technologies, with the hope that someday we’ll reach “metaprogramming nirvana” (which is a hard place to reach; just ask Charles Simonyi).

If you’re interested, I invite you to join the Google Group: http://groups.google.com/group/alt-g”

Of course, if anyone else is interested, I invite them to join remotely.

posted by jriehl at 11:01 am


Sunday, September 14, 2008

Graphic design sans designer.

So my younger brother is a motion designer, and I’ve been laying into him for some possible graphic design help. I suspect this is like asking me to fix your computer (or, more appropriately, set up a website). I’ve finally come to the determination that he’s quietly giving me the same message I eventually give friends and family: do it yourself.

I seem to recall being much more direct when we were younger. Often while trying to ditch him and go out with my friends, I’d sit him in front of the computer and show him various graphics and animation tools. Naturally, he’d ask me how to use these packages. I had no problem telling him to figure things out for himself, threatening violence if he bugged me any more, and then leaving for the evening. It would seem I’m now taking an extended release version of my own medicine. Probably gentler on the stomach.

So my solution today was to fire up Gimp and see if I could copy Python’s graphic design style (beg, borrow, or steal, right?). Here is the result:

I hope people like it, without getting too upset about me “stealing” the style. It’s not like my version is actually good, but it should suffice for “Sizzle Quest 2008”. I’ve always “gotten” the rationale for paths, I just wish the learning curve (*smirk*) wasn’t so bad. I had to spend the better part of an afternoon playing around with the most rudimentary designs, and the result above is still not 100%.

Insert brief teasers about all the stuff I didn’t have time to talk about in this premier post (yay!).

posted by jriehl at 9:42 pm

