{"id":85,"date":"2010-01-28T10:37:22","date_gmt":"2010-01-28T17:37:22","guid":{"rendered":"http:\/\/log.jonriehl.com\/?p=85"},"modified":"2010-02-04T17:46:04","modified_gmt":"2010-02-05T00:46:04","slug":"december-2009-mini-sprint-report","status":"publish","type":"post","link":"https:\/\/log.jonriehl.com\/?p=85","title":{"rendered":"December 2009 Mini-sprint Report"},"content":{"rendered":"<p>Tyler Green and I held a mini-sprint on <a href=\"http:\/\/mython.org\/\">Mython<\/a> on December 17th, 2009.  We worked on the following:<\/p>\n<ul>\n<li>The trampoline parsing framework.<\/li>\n<li>A regular expression quotation function.<\/li>\n<li>A <a href=\"http:\/\/www.cheetahtemplate.org\/\">Cheetah<\/a> quotation function.<\/li>\n<\/ul>\n<p>I&#8217;m pleased to start moving the trampoline parsing framework out into Basil (see the <tt>basil.parsing.trampoline<\/tt> module, available in the <a href=\"http:\/\/code.google.com\/p\/basil\/source\/checkout\">Basil repository<\/a>).  I have been batting around the idea of using Python&#8217;s generators to implement recursive-descent parsers for a few years now, starting with a proof of concept demonstration in Fall of 2008.  Finally several issues with the existing MyFront front-end have forced me to roll something into Basil.<\/p>\n<p>At its core, the framework is quite simple.  The framework uses a trampoline to dispatch to a set of functions that return generators.\u00c2\u00a0 When a LL(1) state machine (such as those generated by <tt>pgen<\/tt>) would push a nonterminal symbol, or a recursive-descent parsing function would call another parsing function, the generator yields the name of the nonterminal symbol.  When the LL(1) state machine would pop, or a recursive-descent parsing function would return, the generator simply returns, which raises a <tt>StopIteration<\/tt> exception in Python.  The top-level trampoline code simply maintains a stack of generators, pushing and dispatching to a new generator when a generator yields a nonterminal symbol, popping when a generator raises<tt> StopIteration<\/tt>.  This method keeps recursive-descent parsers from running into Python&#8217;s relatively shallow call stack bounds, and affords a form of syntactic extensibility by virtue of having a per-nonterminal dispatch table.<\/p>\n<p>Interested parties should expect to see more about this particular module and its application in a new Mython front-end.  The unit test for the <tt>trampoline<\/tt> module (see<tt> basil.parsing.tests.test_trampoline<\/tt> for the code), demonstrates how to use the framework, defining a recursive-descent parser for a simple calculator.  At the time of writing, I&#8217;m still in the process of integrating the Mython-specific pieces of the front-end and handling some corner cases that I was previously ignoring.<\/p>\n<p>While at OOPSLA 2009, <a href=\"https:\/\/researcher.ibm.com\/researcher\/view.php?person=us-hirzel\">Martin Hirzel<\/a> noted that one relatively easy demonstration for Mython might involve embedding a regular expression language.  While we might want to have something more powerful in the future, Python&#8217;s regular expression sub-language embeds easily into Mython.  Tyler and I followed a similar strategy to the LLVM assembly embedding I did in <a href=\"http:\/\/log.jonriehl.com\/?p=43\">November 2009<\/a>.<\/p>\n<p>Prior to the sprint I looked at our options for storing compiled regular expressions in a module.  Unfortunately, the only clear option for serializing and deserializing regular expression state machines uses Python&#8217;s <tt>pickle<\/tt> module, which involves re-compilation of the regular expression.  The result should be comparable to the LLVM assembly embedding: we gain static checks, and can drop extra backslashes.  We don&#8217;t really save any space in bytecode, nor compiled program run time.<\/p>\n<p>The quotation function is now in <tt>basil.lang.regex<\/tt>, with a corresponding unit test in <tt>basil.lang.tests.test_regex<\/tt>.\u00c2\u00a0 Since the quotation function uses the built-in <tt>re<\/tt> module we can import the quotation function at compile time, and not include it<br \/>\nat run time.  Here&#8217;s what we arrived at:<\/p>\n<pre><code>def requote(name, src, env):\r\n    reobj = re.compile(src.strip())\r\n    recode = pickle.dumps(reobj)\r\n    recode1 = (\"import pickle\\n\" +\r\n               \"%s = pickle.loads(%r)\\n\" % (name, recode))\r\n    ast, env = env[\"myfrontend\"](recode1, env)\r\n    return ast.body, env\r\n<\/code><\/pre>\n<p>The <tt>requote()<\/tt> function compiles the regular expression source into a regular expression object, then serializes the object using the<tt> pickle<\/tt> module.  Finally, the function generates run-time code that deserializes the pickle string.\u00c2\u00a0 The following shows the Mython portion of the first (and currently only) regular expression test (from &#8220;<tt>test_regex01.my<\/tt>&#8220;):<\/p>\n<pre><code>#! \/usr\/bin\/env mython\r\nquote [myfront]:\r\n    from basil.lang.regex import requote\r\n\r\nquote [requote] myre0:\r\n    you only need two \\\\ to match one backslash<\/code><\/pre>\n<p>The Mython test code binds a compiled regular expression to the<tt> myre0<\/tt> identifier in the bytecode module.  If we disassemble the module code object in the <tt>.pyc<\/tt> we see the following (reformatted a little):<\/p>\n<pre><code>...\r\n2          12 LOAD_NAME                0 (pickle)\r\n15 LOAD_ATTR                1 (loads)\r\n18 LOAD_CONST               2 (                \\\r\n\"cre\\n_compile\\np0\\n(S'you only need two \\\\\\\\\\\\\\\\ to match\" \\\r\n\" one backslash'\\np1\\nI0\\ntp2\\nRp3\\n.\")\r\n21 CALL_FUNCTION            1\r\n24 STORE_NAME               2 (myre0)\r\n...\r\n<\/code><\/pre>\n<p>For those who are adept at reading Python pickle strings (I was at some point in my career), we can see that the regular expression pickle simply calls the same <tt>re.compile()<\/tt> function that the quotation function called.  I would argue that at least we gained additional static checks, but hopefully developers are unit testing their embedded regular expression strings before using them in production, making static checks not pay off until there are so many regular expressions in the code base, nobody is sure they&#8217;ve all been checked.  The test does at least save us two backslashes in the demo code (though the backslashes are then double escaped in the pickle string).  I hope readers will speak up if I am missing a more efficient compile-time representation trick, such as binary pickles.<\/p>\n<p>Further building on the trick that I showed in the <a href=\"http:\/\/log.jonriehl.com\/?p=43\">November 2009 article<\/a>, Tyler and I started looking at embedding formatting strings in Mython.  For example, we might want to rewrite the<tt> requote()<\/tt> function in Mython as shown below:<\/p>\n<pre><code>def requote (name, src, env):\r\n    re_obj = re.compile(src.strip())\r\n    re_pickle_str = repr(pickle.dumps(re_obj))\r\n    quote [mython_template] out_ast:\r\n        import pickle\r\n        $name = pickle.loads($re_pickle_str)\r\n    return out_ast, env\r\n<\/code><\/pre>\n<p>This example uses a hypothetical quotation function,<tt> mython_template()<\/tt>, to generate and compile Mython code to an abstract syntax tree (AST).  This quotation function combines the string formatting, and parsing (quotation) steps of <tt>requote()<\/tt>.  Once compiled, the quoation function should expand back to something similar to the original <tt>requote()<\/tt> function.<\/p>\n<p>On our way to something like <tt>mython_template()<\/tt>, it occured to us that <a href=\"http:\/\/www.cheetahtemplate.org\/\">Cheetah<\/a> is an expressive formatting language that would be easy to embed in Mython.  The result is two new quotation functions, <tt>cheetah()<\/tt> and <tt>echeetah<\/tt>, in the<tt> basil.lang.cheetah<\/tt> module.  The <tt>cheetah()<\/tt> function takes the embedded string and uses it to create a constructor function (a curried call to the class constructor) for building a Cheetah<tt> Template<\/tt> object.  The second function, <tt>echeetah()<\/tt> builds a Cheetah <tt>Template<\/tt> instance, using the run-time environment to satisfy the namespace arguments in the constructor.  An example of using these quotation functions appears in the<tt> basil.lang.tests.test_cheetah<\/tt> module, which in turn loads, compiles, and runs the <tt>test_cheetah01.my<\/tt> Mython module.<\/p>\n<p>This work continues to build examples of quotation functions.  I have been working on getting a unit test suite set up for regression testing purposes, and something is available.  I am looking forward to hardening the Mython implementation using these tests, which will certainly be a goal of future mini-sprints, and the up-and-coming<br \/>\nsprint at <a href=\"http:\/\/us.pycon.org\/2010\/about\/\">PyCon 2010<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Tyler Green and I held a mini-sprint on Mython on December 17th, 2009. We worked on the following: The trampoline parsing framework. A regular expression quotation function. A Cheetah quotation function. I&#8217;m pleased to start moving the trampoline parsing framework out into Basil (see the basil.parsing.trampoline module, available in the Basil repository). I have been [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-85","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/log.jonriehl.com\/index.php?rest_route=\/wp\/v2\/posts\/85","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/log.jonriehl.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/log.jonriehl.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/log.jonriehl.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/log.jonriehl.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=85"}],"version-history":[{"count":7,"href":"https:\/\/log.jonriehl.com\/index.php?rest_route=\/wp\/v2\/posts\/85\/revisions"}],"predecessor-version":[{"id":92,"href":"https:\/\/log.jonriehl.com\/index.php?rest_route=\/wp\/v2\/posts\/85\/revisions\/92"}],"wp:attachment":[{"href":"https:\/\/log.jonriehl.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=85"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/log.jonriehl.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=85"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/log.jonriehl.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=85"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}