OOPSLA 2009 and Extensible Languages
I hate to have sat on this essay as long as I have; it has been over two months since OOPSLA 2009 finished up. I hope that later is better than never. After rereading this post, I also see one piece of information missing: the part where I drop a little sizzle for myself. I presented at DLS 2009, and it went well [Riehl09]. In that talk I argued for language extensibility and showed how the extensibility mechanism in Mython affords easier language embedding and optimization. How funny that I and other extensible-language implementers should stumble into Liskov’s warning a few days later…
I have always considered myself fortunate when I get to hear ACM Turing Award winners speak, and this year was no exception. I was happy to see Barbara Liskov reprise her 2009 ACM Turing Award lecture at OOPSLA 2009. Her talk was interesting for several reasons. I found it exciting to listen to her discussion of the history of abstract data types, which were fundamental when I was first learning to program. I also liked Liskov’s review of CLU; it is always humbling to see how little modern systems languages have progressed in the last three or four decades. I particularly liked her pointers into the literature, and her injunctions against reinvention of the wheel.
As a researcher, I was particularly interested in her slide and comments about extensible languages: she didn’t think they were a good idea. If I remember correctly, Liskov stated that extensible languages were researched in the 1960s and 1970s, abandoned as a bad idea, and that recent efforts to revive them would encounter the same barriers. Before I address this issue further, I’d like to discuss the two papers she cited for extensible languages:
- R. M. Balzer. “Dataless programming.” FJCC 1967 [Balzer67].
- Stephen A. Schuman and Philippe Jorrand. “Definition mechanisms in extensible programming languages.” FJCC 1970 [SJ70].
The paper by Balzer does not seem pertinent to my notions of language extensibility. Balzer’s dataless programming language “feels” like a special form of aspect-oriented programming, where users define data layout in a special segment of the program specification. Algorithms access data through a generic interface for decomposing and iterating through data structures. I would note that this citation makes more sense in the context of Liskov’s 1996 history of CLU [Liskov96], where the dataless programming language’s mechanism may have been characteristic of an approach used in other extensible languages:
Extensible languages contained a weak notion of data abstraction. This work arose from a notion of “uniform referents” [Balzer 1967; Earley 1971; Ross 1970].
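To make that concrete, here is a loose Python sketch of my reading of Balzer’s idea; the names (`TupleLayout`, `DictLayout`, `total`) are my own invention, not his notation. The algorithm only decomposes and iterates through a generic interface, while the layout decision lives in a separate, swappable declaration:

```python
# A loose sketch of "dataless" style, assuming my reading of Balzer [Balzer67]:
# the algorithm never mentions concrete layout; layout is declared apart from it.

# "Data segment": layout decisions are declared here, not in the algorithm.
class TupleLayout:
    def elements(self, record):
        return iter(record)            # records happen to be tuples

class DictLayout:
    def elements(self, record):
        return iter(record.values())   # records happen to be dicts

# The algorithm is written against the generic decompose/iterate interface.
def total(records, layout):
    return sum(x for record in records for x in layout.elements(record))

print(total([(1, 2), (3, 4)], TupleLayout()))   # 10
print(total([{"a": 1, "b": 2}], DictLayout()))  # 3
```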
The paper by Schuman and Jorrand is more recognizably about language extensibility. They present an extensible language, describing facilities for modifying both the semantic and syntactic properties of their language. They focus on semantic extensibility “in the domain of values”, providing basic mechanisms for user-defined data structures. Their extension types include support for type casts and name-indexed unary operators. They do not go into other forms of semantic extensibility, mentioning that they are possible but outside the scope of this particular paper.
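As a rough analogue of that value-domain extensibility, here is a Python illustration of my own devising (the `fraction` type and the `recip` operator are hypothetical, not from the paper): an extension type bundles user-defined casts with a table of name-indexed unary operators.

```python
# A hypothetical model of an "extension type": user-defined casts plus
# name-indexed unary operators, in the spirit of Schuman and Jorrand [SJ70].
class ExtensionType:
    def __init__(self, name, casts=None, unary_ops=None):
        self.name = name
        self.casts = casts or {}          # target type name -> conversion
        self.unary_ops = unary_ops or {}  # operator name -> function

# Hypothetical example: fractions represented as (numerator, denominator).
fraction = ExtensionType(
    "fraction",
    casts={"float": lambda v: v[0] / v[1]},
    unary_ops={"recip": lambda v: (v[1], v[0])},
)

val = (3, 4)
print(fraction.casts["float"](val))      # cast to float: 0.75
print(fraction.unary_ops["recip"](val))  # named unary operator: (4, 3)
```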
Schuman and Jorrand’s paper presents a form of concrete-syntax macros for syntactic extension. They build on top of syntax macros as originally described by Leavenworth [Leavenworth66], but add additional functionality. Schuman and Jorrand’s macro specifications consist of three parts: a production, an optional predicate, and a replacement. Their approach feels like a parser description language: the production is given in BNF style, extending the existing grammar of the extensible language, and the replacement is similar to a parsing action. Because extensions are expressed as grammar productions, a single macro-definition form suffices, rather than separate forms for expression and statement macros. Predicates also allow their macros to do general-purpose compile-time metaprogramming, binding values in the compile-time environment used by the replacement clause.
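Here is a minimal sketch of that three-part specification, assuming a toy string-based expander; the `repeat ... times` statement and every name below are hypothetical, and the real system operated on grammar productions rather than strings:

```python
# A minimal sketch (not Schuman and Jorrand's actual notation) of a macro
# specification as a triple: a grammar production, an optional predicate over
# the compile-time environment, and a replacement acting like a parse action.
from dataclasses import dataclass
from typing import Callable, Optional

CompileTimeEnv = dict

@dataclass
class SyntaxMacro:
    production: str                                         # BNF-style rule extending the grammar
    predicate: Optional[Callable[[CompileTimeEnv], bool]]   # guard evaluated at compile time
    replacement: Callable[[list, CompileTimeEnv], str]      # action producing replacement text

# Hypothetical example: a "repeat <n> times <stmt>" statement-level macro.
repeat_macro = SyntaxMacro(
    production="<statement> ::= repeat <expression> times <statement>",
    predicate=lambda env: env.get("allow_repeat", True),
    replacement=lambda parts, env: "for _i in range(%s): %s" % (parts[0], parts[1]),
)

# A toy expander parses source against the extended grammar, checks the
# predicate (which may bind values into `env` for the replacement to use),
# and splices in the replacement -- one mechanism covering both expression-
# and statement-level macros, since both are just productions.
parts = ["10", "step()"]
env = {"allow_repeat": True}
if repeat_macro.predicate is None or repeat_macro.predicate(env):
    print(repeat_macro.replacement(parts, env))  # for _i in range(10): step()
```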
I want to quickly give some credit to Barbara’s point about reinvention of the wheel. I see similarities between Schuman and Jorrand’s macros and the “patterns” proposed in a new language, π (pi-programming.org), presented by Roman Knöll in the Onward! track this year [KM09]. I think both of these mechanisms present issues of composability and modularity. Schuman and Jorrand do spend some time wringing their hands over this issue, especially since efficiency of compilation was a larger concern when they developed their system. Neither of these papers gives me an idea of why these ideas weren’t developed further and put into the languages I learned as a teenager. I can imagine an extensible Turbo Pascal where we were invited to learn more about object-oriented programming by implementing our own object system. I can’t trivially find a paper titled “Extensibility considered harmful” that exposes the fatal flaws in extensibility and explains why I wasn’t taught to extend my language. The impression I got from Liskov was that these systems led to a “Tower of Babel” problem, making user programs unreadable, and therefore too difficult to reuse and maintain.
It isn’t surprising, therefore, that a member of the PLT, Shriram Krishnamurthi, raised this issue during the question period of Liskov’s presentation. The Lisp and Scheme communities have lived with syntax macros for a long time, and they seem to get by just fine with them. If I understand correctly, these communities see “language towers” develop, and those towers fall in and out of use. It’s unfortunate that I wasn’t introduced to The Little LISPer until I was about to graduate from college, and even now I have to admit I don’t have my head fully wrapped around the macro system in MzScheme.
I think extensibility is a desirable property, and we can tame the issues it raises. Users can use lexical scoping to delimit syntactic extensions, making it clear where they are using custom syntax (see “Mood-specific Languages” in [WP07], and the sketch below). We also have several new approaches to dealing with ambiguity in parsing concrete syntax macros. In the limit, our machines are much bigger than they were in the 1970s, and the cubic time (and quadratic space) cost of parsing an arbitrary context-free grammar simply isn’t the barrier it once was. I don’t see how one can readily prove extensibility is a desirable or undesirable property (though I plan to show one technique I’m developing for Mython saves a great deal of coding and porting effort). The fact that previous extensibility mechanisms were not widely adopted does not prove that extensibility is undesirable. At worst, this outcome only shows that we haven’t found the right mechanism or pedagogical tools. Therefore, I remain interested in lowering the barriers to the evolution of languages, and in seeing what happens.
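Below is the sketch promised above, hedged heavily: the `quote`-style dispatch and the `sql_front` parser are hypothetical names of my own, not Mython’s actual front end. The point is only that custom syntax stays inside an explicitly marked block, handed to a named compile-time parser, so a reader always knows where the host grammar stops applying.

```python
# A hedged illustration of lexically delimited syntactic extension
# (hypothetical syntax and names, not Mython's actual front end).
def sql_front(source, env):
    """Hypothetical compile-time parser for an embedded query language."""
    table = source.strip().split()[-1]       # trivially "parse" the block
    return "run_query(%r)" % table, env

# Registry mapping extension names to their compile-time parsers.
QUOTE_PARSERS = {"sql": sql_front}

def expand_quote(lang, body, env):
    # The marked block is handed to the named parser; everything outside the
    # block remains ordinary host-language code under the host grammar.
    parser = QUOTE_PARSERS[lang]
    code, env = parser(body, env)
    return code

print(expand_quote("sql", "select * from users", {}))  # run_query('users')
```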
[Posting email comment received Aug. 25, 2011. -Jon]
I am interested in the history of data description. In “A survey of extensible programming languages” by N. Solntseff and A. Yezerski (1974), which you might know, the authors refer to ‘The notion of “metadata” introduced by Bagley’ – this is cited as the source that Philip Bagley introduced the term “metadata” in the context of extensible programming languages. Unfortunately, I have not yet found a copy of his paper (if you happen to have one, please let me know!):
http://www.dtic.mil/docs/citations/AD0680815
I also created a bibliography of Bagley’s works. I don’t know whether he is still alive, or where he lives now:
http://www.mendeley.com/groups/1224631/philip-bagley-bibliography/papers
Cheers,
Jakob
Comment by Jakob — August 26, 2011 @ 11:12 am