Language engineering is the future

[21 August 2008: We have been discussing language engineering over the year. The article below may be a little bit outdated by now. In any case, we prefer the term "software language engineering" (SLE) these days. This decision also has to do with the fact that there is an existing field of natural language engineering, which we think is naturally complemented by SLE on the software engineering side of things.]


Now that I have kicked off LINQ to XSD (a piece of XMLware), it’s time to rest and think about IT and entropy. (As another means of distraction, I was actually trying to buy a Zune today, but it looks like I am a bit late and need to wait for the shops in Redmond to restock. Sigh!) A friend of mine, Jean-Marie Favre, challenges us at the top of his web site by asking for a bet on the next paradigm in software engineering. Jean-Marie seems to think (and I believe he would be right then) it’s going to be language engineering. In the worst case, language engineering is among the very few important trends.


The term language engineering should make you (also) think of model-driven architecture (MDA) or some forms of model-driven development (MDD) and all such MDxyz. I am the last person to toot the MDA horn because I was initially really put off by the too informal and (UML-like) model-centric way of MDA … as it appeared to me from a ‘principled CS point of view’. (As usual, I am not making any statements on behalf of or approved by Microsoft.) We had decades of research on compilers, application generators, domain-specific languages, automated software engineering, software transformation, program refinement, re-engineering and declarative programming just to see MDA coming along without paying much apparent attention to this valuable past. For instance, look at the list of references in the “MOF Model to Text Transformation Language RFP” -- wouldn’t you expect some mention of language processing, pretty printing, or any sort of grammar-based methods? This aggressive omission of classic topics invites reinventions of the wheel and a tower of Babel …


Loads of people have jumped on the MDxyz bandwagon, and there is a healthy research community focusing on the vision of raising abstraction levels in software development on the grounds of (i) ‘modeling languages’ as opposed to mainstream programming languages and (ii) transformation and generation as opposed to writing low-level ‘platform-specific’ code in the first place. MDxyz comes with this excellent premise, assuming that I haven’t misunderstood things terribly. Being a transformation addict myself, this looks like something I want to work on ASAP -- perhaps I always did. Masterminds in software engineering such as Jean Bézivin have worked on a general view of MDxyz that is both scientifically and practically relevant and interesting. I believe Jean is using the terms ‘model-driven language engineering’ as well as technical/technological spaces.


In my own work, I have been interested in something that we call ‘grammarware engineering’; we generalized over the most obvious interpretation of the grammar term and the most obvious applications and benefits of grammars and grammar-based software. From an MDxyz perspective, we have been talking about a specific technical/technological space -- the grammar space, perhaps with modest attempts to go beyond a single space.


In October, I attended MODELS 2006 in Genoa, which is the prominent UML/MDxyz forum. Unfortunately, I didn’t attend many conference presentations due to a sickness acquired on the 24-hour flight (which required enormous amounts of garlic) and due to deadlines I carried with me. At least I made sure to attend the ATEM workshop (jointly organized with Jean-Marie Favre, Dragan Gasevic and Andreas Winter), and I saw Jean-Marie’s prophecy taking clearer shape. That is, at ATEM and in the surrounding discussions at MODELS, people seemed to perceive a sense of language engineering -- I certainly did.


The point to be made is really that we live in a messy IT world with an unbearable amount of explicit and implicit language descriptions residing in code as much as everywhere else. The first law of IT entropy (made up for this post) is that the ‘order’ of software can only increase if we manage to get a handle on all these language descriptions so as to (i) mediate efficiently between them; (ii) be able to evolve them in a controlled manner; and (iii) retire in practice those language descriptions that are theoretically superseded by other language descriptions.


We need a real community effort for language engineering.

Neither MOF nor grammarware is self-contained.



Who needs a language engineer?


Rather than trying to get philosophical, I am publishing the abstract of a talk I will be giving on the 8th of December at CWI in Amsterdam: Language engineers are into grammars, types and declarative programs; they seek to apply these concepts for the purpose of improving the practice of software development; they push forward language-engineering foundations inspired by practical needs. How capable are today's language engineers? What are the challenges for language-engineering research ahead of us? I use the format of a "stress test for a language engineer" to provide subjective answers to these questions. This stress test comprises a number of milestones, e.g.: (i) re-engineer a data-processing application so that it uses a new XML API instead of an old one; (ii) given a mapping tool from XML schemas to object models, enable OO-level class refactoring to affect the underlying XML schemas appropriately. The result of my reflection is rather terrifying in terms of the limits that we face, but this is good news from the perspective of a researcher.
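To make milestone (i) a bit more concrete, here is a minimal sketch of what such an API migration looks like on a toy scale. Everything in it is an illustrative assumption and not taken from the talk: the sample order document, the function names, and the choice of Python's xml.dom.minidom as the "old" API and xml.etree.ElementTree as the "new" one. The same computation is written once against each API; the language-engineering challenge is to get from the first form to the second (semi-)automatically and with some assurance that behavior is preserved.

```python
import xml.dom.minidom as minidom
import xml.etree.ElementTree as ET

# Hypothetical sample input; the element names are made up for illustration.
ORDER_XML = """<order>
  <item><price>10.50</price><quantity>2</quantity></item>
  <item><price>4.00</price><quantity>5</quantity></item>
</order>"""


def total_with_dom(xml_text):
    """'Old' style: DOM navigation with explicit text-node access."""
    doc = minidom.parseString(xml_text)
    total = 0.0
    for item in doc.getElementsByTagName("item"):
        price = item.getElementsByTagName("price")[0].firstChild.data
        quantity = item.getElementsByTagName("quantity")[0].firstChild.data
        total += float(price) * int(quantity)
    return total


def total_with_etree(xml_text):
    """'New' style: the same computation against ElementTree."""
    root = ET.fromstring(xml_text)
    return sum(
        float(item.findtext("price")) * int(item.findtext("quantity"))
        for item in root.findall("item")
    )


if __name__ == "__main__":
    # Both versions should agree on the sample document.
    print(total_with_dom(ORDER_XML), total_with_etree(ORDER_XML))
```

Even in this tiny case the two APIs differ in how they expose text content and child lookup, which hints at why a fully automated, behavior-preserving migration of a real application is hard.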



Addendum: I sent a draft of this post to Jean-Marie. He potentially obsoletes my post insofar as he has published a new bet on his web site (in reply to my draft), but he is friendly enough to keep the old bet there for some time and reconfirms “Language engineering is the future …”. More importantly, JM also removed the webstats4u code from his site that bothered visitors who weren’t totally pop-up safe. I am using statcounter as my web-site hit counter except for some legacy pages that I still need to convert once language engineering has come up to speed so that the conversion is carried out semi-automatically.


Ralf Lämmel


PS: Slides for my talk are now online.