Bootstrapping, Default Formatting, and Skeleton Editing in the Spoofax Language Workbench
Gabriël Konat, Luís Eduardo de Souza Amorim, Sebastian Erdweg, and Eelco Visser. In Language Workbench Challenge (LWC). 2016. [ pdf ]
The IDE Portability Problem and its Solution in Monto
Sven Keidel, Wulf Pfeiffer, and Sebastian Erdweg. In Proceedings of Software Language Engineering (SLE). ACM, 2016. [ pdf ]
Principled Syntactic Code Completion using Placeholders
Luís Eduardo de Souza Amorim, Sebastian Erdweg, Guido Wachsmuth, and Eelco Visser. In Proceedings of Software Language Engineering (SLE). ACM, 2016. [ pdf ]
Efficient Development of Consistent Projectional Editors using Grammar Cells
Markus Völter, Tamás Szabó, Sascha Lisson, Bernd Kolb, Sebastian Erdweg, and Thorsten Berger. In Proceedings of Software Language Engineering (SLE). ACM, 2016.
Nominated for Distinguished Paper Award. [ pdf ]
Bootstrapping Domain-Specific Meta-Languages in Language Workbenches
Gabriël Konat, Sebastian Erdweg, and Eelco Visser. In Proceedings of Generative Programming: Concepts & Experiences (GPCE). ACM, 2016. [ pdf ]
IncA: A DSL for the Definition of Incremental Program Analyses
Tamás Szabó, Sebastian Erdweg, and Markus Völter. In Proceedings of International Conference on Automated Software Engineering (ASE). ACM, 2016. [ pdf ]
Program analyses support software developers, for example, through error detection, code-quality assurance, and by enabling compiler optimizations and refactorings. To provide real-time feedback to developers within IDEs, an analysis must run efficiently even if the analyzed code base is large.
To achieve this goal, we present a domain-specific language called IncA for the definition of efficient incremental program analyses that update their result as the program changes. IncA compiles analyses into graph patterns and relies on existing incremental matching algorithms. To scale IncA analyses to large programs, we describe optimizations that reduce caching and prune change propagation. Using IncA, we have developed incremental control flow and points-to analysis for C, well-formedness checks for DSLs, and 10~FindBugs checks for Java. Our evaluation demonstrates significant speedups for all analyses compared to their non-incremental counterparts.
An Extensible Framework for Variable-Precision Data-Flow Analyses in MPS
Tamás Szabó, Simon Alperovich, Markus Völter, and Sebastian Erdweg. In Proceedings of International Conference on Automated Software Engineering (ASE) — Tool Track . ACM, 2016. [ pdf ]
Data-flow analyses are used as part of many software engineering tasks: they are the foundations of program understanding, refactorings and optimized code generation. Similar to general-purpose languages (GPLs), state-of-the-art domain-specific languages (DSLs) also require sophisticated data-flow analyses. However, as a consequence of the different economies of DSL development and their typically relatively fast evolution, the effort for developing and evolving such analyses must be lowered compared to GPLs. This tension can be resolved with dedicated support for data-flow analyses in language workbenches.
In this tool paper we present MPS-DF, which is the component in the MPS language workbench that supports the definition of data-flow analyses for DSLs. Language developers can define data-flow graph builders declaratively as part of a language definition and compute analysis results efficiently based on these data-flow graphs. MPS-DF is extensible such that it does not compromise the support for language composition in MPS. Additionally, clients of MPS-DF analyses can run the analyses with variable precision thus trading off precision for performance. This allows clients to tailor an analysis to a particular use case.
Exploration of Language Specifications by Compilation to First-Order Logic
Sylvia Grewe, Sebastian Erdweg, Michael Raulf, and Mira Mezini. In Proceedings of International Symposium on Principles and Practice of Declarative Programming (PPDP). ACM, 2016. [ pdf ]
Exploration of language specifications helps to discover errors and inconsistencies early during the development of a programming language. We propose exploration of language specifications via application of existing automated first-order theorem provers (ATPs). To this end, we translate language specifications and exploration tasks to first-order logic, which many ATPs accept as input. However, there are several different strategies for compiling a language specification to first-order logic, and even small variations in the translation have a large impact on the time it takes ATPs to find proofs.
In this paper, we present a systematic empirical study on how to best compile language specifications to first-order logic such that existing ATPs can solve typical exploration tasks efficiently. We have developed a compiler product line that implements 36 different compilation strategies and used it to feed language specifications to 4 existing first-order theorem provers. As a benchmark, we developed a language specification for typed SQL with 50 exploration goals. Our study confirms that the choice of a compilation strategy greatly influences prover performance and shows which strategies are advantageous for prover performance.
Automating Proof Steps of Progress Proofs: Comparing Vampire and Dafny
Sylvia Grewe, Sebastian Erdweg, and Mira Mezini. In Proceedings of Vampire Workshop. EPiC, 2016. [ pdf ]
Developing provably sound type systems is a non-trivial task which, as of today, typically requires expert skills in formal methods and a considerable amount of time. Our Veritas project aims at providing support for the development of soundness proofs of type systems and efficient type checker implementations from type system specifications. To this end, we investigate how to best automate typical steps within type soundness proofs.
In this paper, we focus on progress proofs for type systems of domain-specific languages. As a running example for such a type system, we model a subset SQL and augment it with a type system. We compare two different approaches for automating proof steps of the progress proofs for this type system against each other: firstly, our own tool Veritas, which translates proof goals and specifications automatically to TPTP and calls Vampire on them, and secondly, the programming language Dafny, which translates proof goals and specifications to the intermediate verification language Boogie 2 and calls the SMT solver Z3 on them. We find that Vampire and Dafny are equally well-suited for automatically proving simple steps within progress proofs.
Towards Live Language Development
Gabriël Konat, Sebastian Erdweg, and Eelco Visser. In Proceedings of Workshop on Live Programming Systems (LIVE). 2016. [ pdf ]
We would like to see live programming applied to language development, to get live language development. With live language development, a language developer gets fast feedback when they change their language, enabling experimentation with language design and development.
In this paper, we describe what live language development is and why it is useful, and we analyze what is needed to achieve live language development. Moreover, we describe our work in progress in supporting live language development in the Spoofax language workbench.
CPL: A Core Language for Cloud Computing
Oliver Bračevac, Sebastian Erdweg, Guido Salvaneschi, and Mira Mezini. In Proceedings of International Conference on Modularity (AOSD), pages 94–105. ACM, 2016. [ pdf ]
Running distributed applications in the cloud involves deployment. That is, distribution and configuration of application services and middleware infrastructure. The considerable complexity of these tasks resulted in the emergence of declarative JSON-based domain-specific deployment languages to develop deployment programs. Existing deployment programs unsafely compose artifacts written in different languages, leading to bugs that are hard to detect before run time. Furthermore, deployment languages do not have extension points for custom implementations of existing cloud services, e.g., adding application-specific load balancing policies is difficult.
To address these shortcomings, we propose CPL (Cloud Platform Language), a statically-typed core language for programming both distributed applications as well as their deployment on a cloud platform. In CPL, application services and deployment programs interact through statically typed, extensible interfaces, and an application can trigger further deployment at run time. We formally define the dynamic and static semantics of CPL and demonstrate that it enables type-safe, composable and extensible libraries of service combinators, such as load balancing and fault tolerance.
Sound Type-Dependent Syntactic Language Extension
Florian Lorenzen and Sebastian Erdweg. In Proceedings of Symposium on Principles of Programming Languages (POPL), pages 204–216. ACM, 2016. [ pdf ]
Syntactic language extensions can introduce new facilities into a programming language while requiring little implementation effort and modest changes to the compiler. It is typical to desugar language extensions in a distinguished compiler phase after parsing or type checking, not affecting any of the later compiler phases. If desugaring happens before type checking, the desugaring cannot depend on typing information and type errors are reported in terms of the generated code. If desugaring happens after type checking, the code generated by the desugaring is not type checked and may introduce vulnerabilities. Both options are undesirable.
We propose a system for syntactic extensibility where desugaring happens after type checking and desugarings are guaranteed to only generate well-typed code. A major novelty of our work is that desugarings operate on typing derivations instead of plain syntax trees. This provides desugarings access to typing information and forms the basis for the soundness guarantee we provide, namely that a desugaring generates a valid typing derivation. We have implemented our system for syntactic extensibility in a language-independent fashion and instantiated it for a substantial subset of Java, including generics and inheritance. We provide a sound Java extension for Scala-like for-comprehensions.
Evaluating and Comparing Language Workbenches: Existing Results and Benchmarks for the Future
Sebastian Erdweg, Tijs van der Storm, Markus Völter, Laurence Tratt, Meinte Boersma, Remi Bosman, William R. Cook, Albert Gerritsen, Angelo Hulshout, Steven Kelly, Alex Loh, Gabriël Konat, Pedro J. Molina, Martin Palatnik, Risto Pohjonen, Eugen Schindler, Klemens Schindler, Riccardo Solmi, Vlad Vergu, Eelco Visser, Kevin van der Vlist, Guido Wachsmuth, and Jimi van der Woning. In Computer Languages, Systems and Structures, 44, Part A:24–47. Elsevier, 2015. [ pdf ]
Language workbenches are environments for simplifying the creation and use of computer languages. The annual Language Workbench Challenge (LWC) was launched in 2011 to allow the many academic and industrial researchers in this area an opportunity to quantitatively and qualitatively compare their approaches. We first describe all four LWCs to date, before focussing on the approaches used, and results generated, during the third LWC. We give various empirical data for ten approaches from the third LWC. We present a generic feature model within which the approaches can be understood and contrasted. Finally, based on our experiences of the existing LWCs, we propose a number of benchmark problems for future LWCs.
Modular Capture Avoidance for Program Transformations
Nico Ritschel and Sebastian Erdweg. In Proceedings of Software Language Engineering (SLE), 59–70. ACM, 2015. Best Presentation Award. [ pdf ]
The application of program transformations and refactorings involves the risk of capturing variables, which may break the intended semantics of the transformed code. One way to resolve variable capture is by renaming of the involved identifiers. However, in a modular context, the renaming of exported declarations is undesirable (affecting a module's clients), and the renaming of imported declarations is impossible (requiring changes to third-party modules).
We present an algorithm name-fix that detects and eliminates variable capture modularly. We extend a previous non-modular version of name-fix in order to (i) minimize renamings of exported declarations, (ii) propagate necessary renamings of exported declarations to clients, and (iii) avoid renamings of imported declarations altogether. Together with support for transitive name bindings and conflicting declarations, our extensions to name-fix enable the application to real-world languages that feature separate compilation. To demonstrate the applicability of name-fix, we use it to modularly resolve variable capture for optimizations, refactorings, and desugarings of Lightweight Java.
A Sound and Optimal Incremental Build System with Dynamic Dependencies
Sebastian Erdweg, Moritz Lichter, and Manuel Weiel. In Proceedings of Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 89–106. ACM, 2015. [ pdf ]
Build systems are used in all but the smallest software projects to invoke the right build tools on the right files in the right order. A build system must be sound (after a build, generated files consistently reflect the latest source files) and efficient (recheck and rebuild as few build units as possible). Contemporary build systems provide limited efficiency because they lack support for expressing fine-grained file dependencies.
We present a build system called pluto that supports the definition of reusable, parameterized, interconnected builders. When run, a builder notifies the build system about dynamically required and produced files as well as about other builders whose results are needed. To support fine-grained file dependencies, we generalize the traditional notion of time stamps to allow builders to declare their actual requirements on a file's content. pluto collects the requirements and products of a builder with their stamps in a build summary. This enables pluto to provides provably sound and optimal incremental rebuilding. To support dynamic dependencies, our rebuild algorithm interleaves dependency analysis and builder execution and enforces invariants on the dependency graph through a dynamic analysis. We have developed pluto as a Java API and used it to implement more than 25 builders. We describe our experience with migrating a larger Ant build script to pluto and compare the respective build times.
A Co-contextual Formulation of Type Rules and Its Application to Incremental Type Checking
Sebastian Erdweg, Oliver Bračevac, Edlira Kuci, Matthias Krebs, and Mira Mezini. In Proceedings of Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 880–897. ACM, 2015. [ pdf ]
Type rules associate types to expressions given a typing context. As the type checker traverses the expression tree top-down, it extends the typing context with additional context information that becomes available. This way, the typing context coordinates type checking in otherwise independent subexpressions, which inhibits parallelization and incrementalization of type checking.
We propose a co-contextual formulation of type rules that only take an expression as input and produce a type and a set of context requirements. Co-contextual type checkers traverse an expression tree bottom-up and merge context requirements of independently checked subexpressions. We describe a method for systematically constructing a co-contextual formulation of type rules from a regular context-based formulation and we show how co-contextual type rules give rise to incremental type checking. Using our method, we derive incremental type checkers for PCF and for extensions that introduce records, parametric polymorphism, and subtyping. Our performance evaluation shows that co-contextual type checking has performance comparable to standard context-based type checking, and incrementalization can improve performance significantly.
Towards Secure Integration of Cryptographic Software
Steven Arzt, Sarah Nadi, Karim Ali, Eric Bodden, Sebastian Erdweg, and Mira Mezini. In Proceedings of Symposium on New Ideas, New Paradigms, and Reflections on Programming & Software (Onward), 1–13. ACM, 2015. [ pdf ]
While cryptography is now readily available to everyone and can, provably, protect private information from attackers, we still frequently hear about major data leakages, many of which are due to improper use of cryptographic mechanisms. The problem is that many application developers are not cryptographic experts. Even though high-quality cryptographic APIs are widely available, programmers often select the wrong algorithms or misuse APIs due to a lack of understanding. Such issues arise with both simple operations such as encryption as well as with complex secure communication protocols such as SSL. In this paper, we provide a long-term solution that helps application developers integrate cryptographic components correctly and securely by bridging the gap between cryptographers and application developers.
Our solution consists of a software product line (with an underlying feature model) that automatically identifies the correct cryptographic algorithms to use, based on the developer's answers to high-level questions in non-expert terminology. Each feature (i.e., cryptographic algorithm) maps into corresponding Java code and a usage protocol describing API restrictions. By composing the user's selected features, we automatically synthesize a secure code blueprint and a usage protocol that corresponds to the selected usage scenario. Since the developer may change the application code over time, we use the usage protocols to statically analyze the program and ensure that the correct use of the API is not violated over time.
Type Systems for the Masses: Deriving Soundness Proofs and Efficient Checkers
Sylvia Grewe, Sebastian Erdweg, Pascal Wittmann, and Mira Mezini. In Proceedings of Symposium on New Ideas, New Paradigms, and Reflections on Programming & Software (Onward), 137–150. ACM, 2015. [ pdf ]
The correct definition and implementation of non-trivial type systems is difficult and requires expert knowledge, which is not available to developers of domain-specific languages and specialized APIs in practice. We propose Veritas, a workbench that simplifies the development of sound type systems. Veritas provides a single, high-level specification language for type systems, from which it automatically tries to derive soundness proofs and efficient and correct type-checking algorithms. For verification, Veritas combines off-the-shelve automated first-order theorem provers with automated proof strategies specific to type systems. For deriving efficient type checkers, Veritas provides a collection of optimization strategies whose applicability to a given type system is checked through verification on a case-by-case basis. We have developed a prototypical implementation of Veritas and used it to verify type soundness of the simply-typed lambda calculus and of parts of typed SQL. Our experience suggests that a high degree of automation is possible for type systems of domain-specific languages.
Using Vampire in Soundness Proofs of Type Systems
Sylvia Grewe, Sebastian Erdweg, and Mira Mezini. In Proceedings of Vampire Workshop, pages 33–51. EPiC, 2015. [ pdf ]
Type systems for programming languages shall detect type errors in programs before runtime. To ensure that a type system meets this requirement, its soundness must be formally verified. We aim at automating soundness proofs of type systems to facilitate the development of sound type systems for domain-specific languages.
Soundness proofs for type systems typically require induction. However, many of the proofs of individual induction cases only require first-order reasoning. For the development of our workbench Veritas, we build on this observation by combining automated first-order theorem provers such as Vampire with automated proof strategies specific to type systems. In this paper, we describe how we encode type soundness proofs in first-order logic using TPTP. We show how we use Vampire to prove the soundness of type systems for the simply-typed lambda calculus and for parts of a typed SQL. We report on which parts of the proofs are handled well by Vampire, and what parts work less well with our current approach.
Domain-specific languages (Dagstuhl Seminar 15062)
Sebastian Erdweg, Martin Erwig, Richard F. Paige, and Eelco Visser. In Dagstuhl Reports, 5(2):26–43. 2015. [ pdf ]
This report documents the program and outcomes of Dagstuhl Seminar 15062 "Domain-Specific Languages", which took place February 1-6, 2015. The seminar was motivated on the one hand by the high interest in domain-specific languages in academia and industry and on the other hand by the observation that the community is divided into largely disconnected subdisciplines (e.g., internal, external, visual, model-driven). The seminar included participants across these subdisciplines and included overview talks, technical talks, demos, discussion groups, and an industrial panel. This report collects the abstracts of talks and other activities at the seminar and summarizes the outcomes of the seminar.
Capture-Avoiding Program Transformations with name-fix
Sebastian Erdweg, Tijs van der Storm, and Yi Dai. In Proceedings of GI Fachtagung Software Engineering, pages 93–94. GI, 2015.
We present an algorithm called name-fix that automatically eliminates variable capture from a generated program by systematically renaming variables. name-fix is guided by a graph representation of the binding structure of a program, and requires name-resolution algorithms for the source language and the target language of a transformation. name-fix is generic and works for arbitrary transformations in any transformation system that supports origin tracking for names.
i3QL: Language-Integrated Live Data Views
Ralf Mitschke, Sebastian Erdweg, Mirko Köhler, Mira Mezini, and Guido Salvaneschi. In Proceedings of Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 417–432. ACM, 2014. [ pdf ]
An incremental computation updates its result based on a change to its input, which is often an order of magnitude faster than a recomputation from scratch. In particular, incrementalization can make expensive computations feasible for settings that require short feedback cycles, such as interactive systems, IDEs, or (soft) real-time systems.
This paper presents i3QL, a general-purpose programming language for specifying incremental computations. i3QL provides a declarative SQL-like syntax and is based on incremental versions of operators from relational algebra, enriched with support for general recursion. We integrated i3QL into Scala as a library, which enables programmers to use regular Scala code for non-incremental subcomputations of an i3QL query and to easily integrate incremental computations into larger software projects. To improve performance, i3QL optimizes user-defined queries by applying algebraic laws and partial evaluation. We describe the design and implementation of i3QL and its optimizations, demonstrate its applicability, and evaluate its performance.
Towards Virtual Traits in Scala
Manuel Weiel, Ingo Maier, Sebastian Erdweg, Michael Eichberg, and Mira Mezini. In Proceedings of Scala Workshop, pages 67–75. ACM, 2014. [ pdf ]
Scala is a powerful language that supports a variety of features, but it lacks virtual traits. Virtual traits are class-valued object attributes and can be redefined within subtraits. They support higher-order hierarchies and family polymorphism. This work introduces virtual traits into Scala and explains how to encode virtual traits on top of existing Scala features. We have implemented this encoding using Scala annotation macros and have conducted two small case studies.
Evolution of Software Systems with Extensible Languages and DSLs
Sebastian Erdweg, Stefan Fehrenbach, and Klaus Ostermann. In IEEE Software, Special Issue on New Directions in Programming Languages, 31(5):68–75. IEEE, 2014. [ pdf ]
Domain-specific languages (DSLs) provide various advantages regarding the maintainability of software systems. Unfortunately, existing software systems do not exploit DSLs and their maintenance benefits. Based on the extensible programming language SugarJ, we present a process for gradually integrating DSLs into existing software systems, we report on our experience in integrating three DSLs into two existing software systems, and we outline a roadmap for the development of tool support for the integration of DSLs.
Capture-Avoiding and Hygienic Program Transformations
Sebastian Erdweg, Tijs van der Storm, and Yi Dai. In Proceedings of European Conference on Object-Oriented Programming (ECOOP), pages 489–514. Springer, 2014. [ pdf ]
Program transformations in terms of abstract syntax trees compromise referential integrity by introducing variable capture. Variable capture occurs when in the generated program a variable declaration accidentally shadows the intended target of a variable reference. Existing transformation systems either do not guarantee the avoidance of variable capture or impair the implementation of transformations.
We present an algorithm called name-fix that automatically eliminates variable capture from a generated program by systematically renaming variables. name-fix is guided by a graph representation of the binding structure of a program, and requires name-resolution algorithms for the source language and the target language of a transformation. name-fix is generic and works for arbitrary transformations in any transformation system that supports origin tracking for names. We verify the correctness of name-fix and identify an interesting class of transformations for which name-fix provides hygiene. We demonstrate the applicability of name-fix for implementing capture-avoiding substitution, inlining, lambda lifting, and compilers for two domain-specific languages.
Tracing Program Transformations with String Origins
Pablo Inostroza, Tijs van der Storm, and Sebastian Erdweg. In Proceedings of International Conference on Model Transformation (ICMT), pages 154–169. Springer, 2014. [ pdf ]
Program transformations play an important role in domain-specific languages and model-driven development. Tracing the execution of such transformations has well-known benefits for debugging, visualization and error reporting. In this paper, we introduce string origins, a lightweight, generic and portable technique to establish a tracing relation between the textual fragments in the input and output of a program transformation. We discuss the semantics and the implementation of string origins using the Rascal meta programming language as an example. We illustrate the utility of string origins by presenting data structures and operations for tracing generated code, implementing protected regions, performing name resolution and fixing inadvertent name capture in generated code.
Modular Specification and Dynamic Enforcement of Syntactic Language
Constraints when Generating Code
Sebastian Erdweg, Vlad Vergu, Mira Mezini, and Eelco Visser. In Proceedings of International Conference on Modularity (AOSD), pages 241–252. ACM, 2014. [ pdf ]
A key problem in metaprogramming is to guarantee that generated code is well-formed with respect to the context-free and context-sensitive constraints of the target language. We propose typesmart constructors as a dynamic approach to enforcing the well-formedness of generated code. A typesmart constructor is a function that is used in place of a regular constructor to create values, but it may reject the creation of values if the given data violates some language-specific constraint. While typesmart constructors can be implemented individually, we demonstrate how to derive them automatically from a grammar, so that the grammar remains the sole specification of a language's syntax and is not duplicated. We have integrated support for typesmart constructors into the run-time system of Stratego to enforce usage of typesmart constructors implicitly whenever a regular constructor is called. We evaluate the applicability, performance, and usefulness of typesmart constructors for syntactic constraints in a compiler for MiniJava developed with Spoofax and in various language extensions of Java and Haskell implemented with SugarJ and SugarHaskell.
Finding Bugs in Program Generators by Dynamic Analysis of Syntactic
Language Constraints (demo paper)
Sebastian Erdweg, Vlad Vergu, Mira Mezini, and Eelco Visser. In Proceedings of International Conference on Modularity (AOSD), pages 17–20. ACM, 2014. [ pdf ]
Program generators and transformations are hard to implement correctly, because the implementation needs to generically describe how to construct programs, for example, using templates or rewrite rules. We apply dynamic analysis to program generators in order to support developers in finding bugs and identifying the source of the bug. Our analysis focuses on syntactic language constraints and checks that generated programs are syntactically well-formed. To retain a language's grammar as the unique specification of the language's syntax, we devised mechanisms to derive the analysis from the grammar. Moreover, we designed a run-time system to support the modular activation/deactivation of the analysis, so that generators do not require adaption. We have implemented the analysis for the Stratego term-rewriting language and applied it in case studies based on Spoofax and SugarJ.
A Framework for Extensible Languages
Sebastian Erdweg and Felix Rieger. In Proceedings of Generative Programming: Concepts & Experiences (GPCE), pages 3–12. ACM, 2013. [ pdf ]
Template Constructors for Reusable Object Initialization
Marko Martin, Mira Mezini, and Sebastian Erdweg. In Proceedings of Generative Programming: Concepts & Experiences (GPCE), pages 43–52. ACM, 2013. [ pdf ]
Reuse of and abstraction over object initialization logic is not properly supported in mainstream object-oriented languages. This may result in significant amount of boilerplate code and proliferation of constructors in subclasses. It also makes it impossible for mixins to extend the initialization interface of classes they are applied to. We propose template constructors, which employ template parameters and pattern matching of them against signatures of superclass constructors to enable a one-to-many binding of super-calls. We demonstrate how template constructors solve the aforementioned problems. We present a formalization of the concept, a Java-based implementation, and use cases which exercise its strengths.
Software Evolution to Domain-Specific Languages
Stefan Fehrenbach, Sebastian Erdweg, and Klaus Ostermann. In Proceedings of Software Language Engineering (SLE), volume 8225 of LNCS, pages 96–116. Springer, 2013. [ pdf ]
Domain-specific languages (DSLs) can improve software maintainability due to less verbose syntax, avoidance of boilerplate code, more accurate static analysis, and domain-specific tool support. However, most existing applications cannot capitalize on these benefits because they were not designed to use DSLs, and rewriting large existing applications from scratch is infeasible. We propose a process for evolving existing software to use embedded DSLs based on modular definitions and applications of syntactic sugar as provided by the extensible programming language SugarJ. Our process is incremental along two dimensions: A developer can add support for another DSL as library, and a developer can refactor more code to use the syntax, static analysis, and tooling of a DSL. Importantly, the application remains executable at all times and no complete rewrite is necessary. We evaluate our process by incrementally evolving the Java Pet Store and a deliberately small part of the Eclipse IDE to use language support for field-accessors, JPQL, XML, and XML Schema. To help maintainers to locate Java code that would benefit from using DSLs, we developed a tool that analyzes the definition of a DSL to derive patterns of Java code that could be represented with a high-level abstraction of the DSL instead.
The State of the Art in Language Workbenches
Sebastian Erdweg, Tijs van der Storm, Markus Völter, Meinte Boersma, Remi Bosman, William R. Cook, Albert Gerritsen, Angelo Hulshout, Steven Kelly, Alex Loh, Gabriël Konat, Pedro J. Molina, Martin Palatnik, Risto Pohjonen, Eugen Schindler, Klemens Schindler, Riccardo Solmi, Vlad Vergu, Eelco Visser, Kevin van der Vlist, Guido Wachsmuth, and Jimi van der Woning. In Proceedings of Software Language Engineering (SLE), volume 8225 of LNCS, pages 197–217. Springer, 2013. [ pdf ]
Language workbenches are tools that provide high-level mechanisms for the implementation of (domain-specific) languages. Language workbenches are an active area of research that also receives many contributions from industry. To compare and discuss existing language workbenches, the annual Language Workbench Challenge was launched in 2011. Each year, participants are challenged to realize a given domain-specific language with their workbenches as a basis for discussion and comparison. In this paper, we describe the state of the art of language workbenches as observed in the previous editions of the Language Workbench Challenge. In particular, we capture the design space of language workbenches in a feature model and show where in this design space the participants of the 2013 Language Workbench Challenge reside. We compare these workbenches based on a DSL for questionnaires that was realized in all workbenches.
Modular and Automated Type-Soundness Verification for Language Extensions
Florian Lorenzen and Sebastian Erdweg. In Proceedings of International Conference on Functional Programming (ICFP). pages 331–342. ACM, 2013. [ pdf ]
Language extensions introduce high-level programming constructs that protect programmers from low-level details and repetitive tasks. For such an abstraction barrier to be sustainable, it is important that no errors are reported in terms of generated code. A typical strategy is to check the original user code prior to translation into a low-level encoding, applying the assumption that the translation does not introduce new errors. Unfortunately, such assumption is untenable in general, but in particular in the context of extensible programming languages, such as Racket or SugarJ, that allow regular programmers to define language extensions.
In this paper, we present a formalism for building and automatically verifying the type-soundness of syntactic language extensions. To build a type-sound language extension with our formalism, a developer declares an extended syntax, type rules for the extended syntax, and translation rules into the (possibly further extended) base language. Our formalism then validates that the user-defined type rules are sufficient to guarantee that the code generated by the translation rules cannot contain any type errors. This effectively ensures that an initial type check prior to translation precludes type errors in generated code. We have implemented a core system in PLT Redex and we have developed a syntactically extensible variant of System F-omega that we extend with let notation, monadic do blocks, and algebraic data types. Our formalism verifies the soundness of each extension automatically.
XPoints: Extension Interfaces for Multilayered Applications
Mohamed Aly, Anis Charfi, Sebastian Erdweg, and Mira Mezini. In Proceedings of Enterprise Distributed Object Computing (EDOC), pages 237–246. IEEE, 2013. [ pdf ]
Extensibility is a key requirement in modern software applications. In the context of business applications it is one of the major selection criteria from the customer perspective. However, there are some challenges concerning the specification and enforcement of extension interfaces. Extension interfaces define the resources of the base applications that are allowed to be extended, where and when the extension code will run, and what resources of the base application an extension is allowed to access. While concepts for such interfaces are still a hot research topic for "traditional" software constructed using a single programming language, they are completely missing for complex consisting of several abstraction layers. In addition, state-of-the-art approaches do not support providing different extension interfaces for different stakeholders.
This paper attempts to fill this gap by introducing XPoints, an approach and a language for specifying and enforcing extension interfaces in multilayered applications. An extension interface in XPoints defines the available extension points on the different abstraction layers, controls the access and visibility of the core application to the extension, and con- strains the interplay between extension points possibly from different abstraction layers. Several extension interfaces can be overlaid over the same core application, hence, enabling multiple extender views to co-exist. Using an XPoints interface, a software provider can automatically generate the extensibility code infrastructure to provide the extension interface for the core application.
Modular and Automated Type-Soundness Verification for
(extended abstract) Sebastian Erdweg and Florian Lorenzen. In Proceedings of Workshop on Scalable Language Specification (SLS), 2013. [ pdf ]
We present a formalism for building language extensions that ensures soundness of extensions with respect to the type system of the base language. To build a type-sound language extension with our formalism, a developer declares an extended syntax, type rules for the extended syntax, and translation rules into the (possibly further extended) base language. Our formalism then modularly and automatically validates that the user-defined type rules are sufficient to guarantee that code generated by the translation rules cannot contain any type errors.
Extensible Languages for Flexible and Principled Domain Abstraction
Sebastian Erdweg. PhD thesis, Philipps-Universität Marburg, 2013. [ pdf ]
Most programming languages are designed for general-purpose software development in a one-size-fits-all fashion: They provide the same set of language features and constructs for all possible applications programmers ever may want to develop. As with shoes, the one-size-fits-all solution grants a good fit to few applications only.
The trend toward domain-specific languages, model-driven development, and language-oriented programming counters general-purpose languages by promoting the use of domain abstractions that facilitate domain-specific language features and constructs tailored to certain application domains. In particular, domain abstraction avoids the need for encoding domain concepts with general-purpose language features and thus allows programmers to program at the same abstraction level as they think.
Unfortunately, current approaches to domain abstraction cannot deliver on the promises of domain abstraction. On the one hand, approaches that target internal domain-specific languages lack flexibility regarding the syntax, static checking, and tool support of domain abstractions, which limits the level of actually achieved domain abstraction. On the other hand, approaches that target external domain-specific languages lack important principles, such as modular reasoning and composition of domain abstractions, which inhibits the applicability of these approaches in the development of larger software systems. In this thesis, we pursue a novel approach that unifies the advantages of internal and external domain-specific languages to support flexible and principled domain abstraction.
We propose library-based extensible programming languages as a basis for domain abstraction. In an extensible language, domain abstraction can be realized by extending the language with domain-specific syntax, static analysis, and tool support. This enables domain abstractions as flexible as external domain-specific languages. To ensure the compliance with important software-development principles, we organize language extensions as libraries and use simple import statements to activate extensions. This facilitates modular reasoning (by inspecting import statements), supports the composition of domain abstractions (by importing multiple extensions), and allows uniform self-application of language extensions in the development of further extensions (by importing extensions in an extension definition). A library-based organization of extensions enables domain abstractions as principled as internal domain-specific languages.
We designed and implemented SugarJ, a library-based extensible programming language on top of Java. SugarJ libraries can declare and export extensions of SugarJ's syntax, static analysis, and editor support. Thereby, a syntactic extension consists of an extended syntax and a desugaring transformation from the extended syntax into SugarJ base syntax, an analysis extension matches on part of the current file's abstract syntax tree and produces a list of errors, and an editor extension declares editor services such as coloring or code completion for certain language constructs. SugarJ extensions are fully self-applicable: An extended syntax can desugar into the declaration of another extensions, an extended analysis can check the declaration of an extension, and an extended editor can assist developers in writing extensions. To process a source file with extensions, the SugarJ compiler and IDE inspect the imported libraries to determine active extensions. The compiler and IDE adapt the parser, code generator, analyzer, and editor of the source file according to the active extensions.
In this thesis, we do not only describe the design and implementation of SugarJ, but also report on extensions of the original design. In particular, we designed and implemented a generalization of the SugarJ compiler that supports alternative base languages besides Java. Using this generalization, we developed the library-based extensible programming languages SugarHaskell, SugarProlog, and SugarFomega. Furthermore, we developed an extension of SugarJ that supports polymorphic domain abstraction and ensures communication integrity. Polymorphic domain abstraction enables programmers to provide multiple desugarings for the same domain-specific syntax. This increases the flexibility of SugarJ and supports scenarios known from model-driven development. Communication integrity specifies that components of a software system may communicate over explicit channels only. This is interesting in the context of code generation where it effectively prohibits the generation of implicit module dependencies. We augmented SugarJ's principles by enforcing communication integrity.
On the basis of SugarJ and numerous case studies, we argue that flexible and principled domain abstraction constitutes a scalable programming model for the development of complex software systems.
Embedding a Questionnaire DSL with SugarJ
Sebastian Erdweg. In Language Workbench Challenge (LWC), 2013. [ pdf ]
We describe our SugarJ-based solution to the 2013 Language Workbench Competition. As part of this competition, we developed domain-specific language (DSL) for questionnaires that features conditional questions, locally defined questions, derived values, name checking, type checking, checking for overlapping question instances, and basic tool support such as code coloring. Using SugarJ, we have realized the questionnaire DSL as a language extension of Java that translates a questionnaire into a Java Swing component. We have realized the questionnaire DSL via separate components for the syntax, code generation, tool support, name checking, type checking, and overlap checking. Moreover, we use SugarJ's support for layout-sensitive syntax to use indentation instead of parentheses in the design of the questionnaire DSL.
Language Composition Untangled
Sebastian Erdweg, Paolo G. Giarrusso, and Tillmann Rendel. In Proceedings of Workshop on Language Descriptions, Tools and Applications (LDTA), pages 7:1–7:8. ACM, 2012. [ bib | pdf ]
In language-oriented programming and modeling, software developers are largely concerned with the definition of domain-specific languages (DSLs) and their composition. While various implementation techniques and frameworks exist for defining DSLs, language composition has not obtained enough attention and is not well-enough understood. In particular, there is a lack of precise terminology for describing observations about language composition in theory and in existing language-development systems. To clarify the issue, we specify five forms of language composition: language extension, language restriction, language unification, self-extension, and extension composition. We illustrate this classification by various examples and apply it to discuss the performance of different language-development system with respect to language composition. We hope that the terminology provided by our classification will enable more precise communication on language composition.
Layout-sensitive Generalized Parsing
Sebastian Erdweg, Tillmann Rendel, Christian Kästner, and Klaus Ostermann. In Proceedings of Software Language Engineering (SLE), volume 7745 of LNCS, pages 244–263. Springer, 2012. [ pdf ]
The theory of context-free languages is well-understood and context-free parsers can be used as off-the-shelf tools in practice. In particular, to use a context-free parser framework, a user does not need to understand its internals but can specify a language declaratively as a grammar. However, many languages in practice are not context-free. One particularly important class of such languages is layout-sensitive languages, in which the structure of code depends on indentation and whitespace. For example, Python, Haskell, F#, and Markdown use indentation instead of curly braces to determine the block structure of code. Their parsers (and lexers) are not declaratively specified but hand-tuned to account for layout-sensitivity.
To support declarative specifications of layout-sensitive languages, we propose a parsing framework in which a user can annotate layout in a grammar as constraints on the relative positioning of tokens in the parsed subtrees. For example, a user can declare that a block consists of statements that all start on the same column. We have integrated layout constraints into SDF and implemented a layout-sensitive generalized parser as an extension of generalized LR parsing. We evaluate the correctness and performance of our parser by parsing 33290 open-source Haskell files. Layout-sensitive generalized parsing is easy to use, and its performance overhead compared to layout-insensitive parsing is small enough for most practical applications.
A Variability-Aware Module System
Christian Kästner, Klaus Ostermann, and Sebastian Erdweg. In Proceedings of Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 773–792. ACM, 2012. [ pdf ]
Module systems enable a divide and conquer strategy to software development. To implement compile-time variability in software product lines, modules can be composed in different combinations. However, this way variability dictates a dominant decomposition. Instead, we introduce a variability-aware module system that supports compile-time variability inside a module and its interface. This way, each module can be considered a product line that can be type checked in isolation. Variability can crosscut multiple modules. The module system breaks with the antimodular tradition of a global variability model in product-line development and provides a path toward software ecosystems and product lines of product lines developed in an open fashion. We discuss the design and implementation of such a module system on a core calculus and provide an implementation for C, which we use to type check the open source product line Busybox with 811 compile-time options.
Layout-sensitive Language Extensibility with SugarHaskell
Sebastian Erdweg, Felix Rieger, Tillmann Rendel, and Klaus Ostermann. In Proceedings of Haskell Symposium, pages 149–160. ACM, 2012. [ bib | pdf ]
Programmers need convenient syntax to write elegant and concise programs. Consequently, the Haskell standard provides syntactic sugar for some scenarios (e.g., do notation for monadic code), authors of Haskell compilers provide syntactic sugar for more scenarios (e.g., arrow notation in GHC), and some Haskell programmers implement preprocessors for their individual needs (e.g., idiom brackets in SHE). But manually written preprocessors cannot scale: They are expensive, error-prone, and not composable. Most researchers and programmers therefore refrain from using the syntactic notations they need in actual Haskell programs, but only use them in documentation or papers. We present a syntactically extensible version of Haskell, SugarHaskell, that empowers ordinary programmers to implement and use custom syntactic sugar.
Building on our previous work on syntactic extensibility for Java, SugarHaskell integrates syntactic extensions as sugar libraries into Haskell's module system. Syntax extensions in SugarHaskell can declare arbitrary context-free and layout-sensitive syntax. SugarHaskell modules are compiled into Haskell modules and further processed by a Haskell compiler. We provide an Eclipse-based IDE for SugarHaskell that is extensible, too, and automatically provides syntax coloring for all syntax extensions imported into a module.
We have validated SugarHaskell with several case studies, including arrow notation (as implemented in GHC) and EBNF as a concise syntax for the declaration of algebraic data types with associated concrete syntax. EBNF declarations also show how to extend the extension mechanism itself: They introduce syntactic sugar for using the declared concrete syntax in other SugarHaskell modules.
Toward Variability-Aware Testing
Christian Kästner, Alexander von Rhein, Sebastian Erdweg, Jonas Pusch, Sven Apel, Tillmann Rendel, Klaus Ostermann. In Proceedings of Workshop on Feature-Oriented Software Development (FOSD), pages 1–8. ACM, 2012. [ pdf ]
We investigate how to execute a unit test in all configurations of a product line without generating each product in isolation in a brute-force fashion. Learning from variability-aware analyses, we (a) design and implement a variability-aware interpreter and (b) reencode variability of the product line to simulate the test cases with a model checker. The interpreter internally reasons about variability, executing paths not affected by variability only once for the whole product line. The model checker achieves similar results by reusing powerful off-the-shelf analyses. We experimented with a prototype implementation for each strategy. We compare both strategies and discuss trade-offs and future directions.
Library-based Model-driven Software Development with SugarJ (demo paper)
Sebastian Erdweg and Lennart C. L. Kats and Tillmann Rendel and Christian Kästner and Klaus Ostermann and Eelco Visser. In Companion to the Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 17–18. ACM, 2011. [ bib ]
SugarJ: Library-based Language Extensibility (poster paper)
Sebastian Erdweg and Lennart C. L. Kats and Tillmann Rendel and Christian Kästner and Klaus Ostermann and Eelco Visser. In Companion to the Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 187–188. ACM, 2011. [ bib ]
SugarJ is a Java-based programming language that provides extensible surface syntax, static analyses, and IDE support. SugarJ extensions are organized as libraries; conventional import statements suffice to activate and compose language extensions. We demonstrate how programmers can use SugarJ to modularly extend Java's syntax, semantic analyses and IDE support.
Growing a Language Environment with Editor Libraries
Sebastian Erdweg and Lennart C. L. Kats and Tillmann Rendel and Christian Kästner and Klaus Ostermann and Eelco Visser. In Proceedings of Conference on Generative Programming and Component Engineering (GPCE), pages 167–176. ACM, 2011. [ bib | pdf ]
Large software projects consist of code written in a multitude of different (possibly domain-specific) languages, which are often deeply interspersed even in single files. While many proposals exist on how to integrate languages semantically and syntactically, the question of how to support this scenario in integrated development environments (IDEs) remains open: How can standard IDE services, such as syntax highlighting, outlining, or reference resolving, be provided in an extensible and compositional way, such that an open mix of languages is supported in a single file?
Based on our library-based syntactic extension language for Java, SugarJ, we propose to make IDEs extensible by organizing editor services in editor libraries. Editor libraries are libraries written in the object language, SugarJ, and hence activated and composed through regular import statements on a file-by-file basis. We have implemented an IDE for editor libraries on top of SugarJ and the Eclipse-based Spoofax language workbench. We have validated editor libraries by evolving this IDE into a fully-fledged and schema-aware XML editor as well as an extensible LaTeX editor, which we used for writing this paper.
SugarJ: Library-based Syntactic Language Extensibility
Sebastian Erdweg, Tillmann Rendel, Christian Kästner and Klaus Ostermann. In Proceedings of Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 391–406. ACM, 2011. Distinguished Paper Award. [ bib | pdf ]
Existing approaches to extend a programming language with syntactic sugar often leave a bitter taste, because they cannot be used with the same ease as the main extension mechanism of the programming language—libraries. Sugar libraries are a novel approach for syntactically extending a programming language within the language. A sugar library is like an ordinary library, but can, in addition, export syntactic sugar for using the library. Sugar libraries maintain the composability and scoping properties of ordinary libraries and are hence particularly well-suited for embedding a multitude of domain-specific languages into a host language. They also inherit the self-applicability of libraries, which means that the syntax extension mechanism can be applied in the definition of sugar libraries themselves.
To demonstrate the expressiveness and applicability of sugar libraries, we have developed SugarJ, a language on top of Java, SDF and Stratego that supports syntactic extensibility. SugarJ employs a novel incremental parsing technique that allows changing the syntax within a source file. We demonstrate SugarJ by five language extensions, including embeddings of XML and closures in Java, all available as sugar libraries. We illustrate the utility of self-applicability by embedding XML Schema, a metalanguage to define XML languages.
Variability-Aware Parsing in the Presence of Lexical Macros and
Christian Kästner, Paolo G. Giarrusso, Tillmann Rendel, Sebastian Erdweg, Klaus Ostermann and Thorsten Berger. In Proceedings of Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 805–824. ACM, 2011. [ bib | pdf ]
In many projects, lexical preprocessors are used to manage different variants of the project (using conditional compilation) and to define compile-time code transformations (using macros). Unfortunately, while being a simply way to implement variability, conditional compilation and lexical macros hinder automatic analysis, even though such analysis would be urgently needed to combat variability-induced complexity. To analyze code with its variability, we need to parse it without preprocessing it. However, current parsing solutions use heuristics, support only a subset of the language, or suffer from exponential explosion. As part of the TypeChef project, we contribute a novel variability-aware parser that can parse unpreprocessed code without heuristics in practicable time. Beyond the obvious task of detecting syntax errors, our parser paves the road for further analysis, such as variability-aware type checking. We implement variability-aware parsers for Java and GNU C and demonstrate practicability by parsing the product line MobileMedia and the entire X86 architecture of the Linux kernel with 6065 variable features.
Library-based Language Extensibility
Sebastian Erdweg. In Participant's Workshop of Summer School on Generative and Transformational Techniques in Software Engineering (GTTSE). 2011. Best Presentation Award (sponsored by Google). [ pdf ]
Abstract Features in Feature Modeling
Thomas Thüm, Christian Kästner, Sebastian Erdweg, and Norbert Siegmund. In Proceedings of Software Product Line Conference (SPLC), pages 191–200. IEEE Computer Society, 2011. [ bib | pdf ]
A software product line is a set of program variants, typically generated from a common code base. Feature models describe variability in product lines by documenting features and their valid combinations. In product-line engineering, we need to reason about variability and program variants for many different tasks. For example, given a feature model, we might want to determine the number of all valid feature combinations or detect specific feature combinations for testing. However, we found that contemporary reasoning approaches can only reason about feature combinations, not about program variants, because they do not take abstract features into account. Abstract features are features used to structure a feature model that, however, do not have any impact at implementation level. Using existing feature-model reasoning mechanisms for product variants leads to incorrect results. We raise awareness of the problem of abstract features for different kinds of analyses on feature models. We argue that, in order to reason about program variants, abstract features should be made explicit in feature models. We present a technique based on propositional formulas to reason about program variants. In practice, our technique can save effort that is caused by considering the same program variant multiple times, for example, in product-line testing.
Featherweight TeX and Parser Correctness
Sebastian Thore Erdweg and Klaus Ostermann. In Proceedings of Software Language Engineering (SLE), volume 6563 of LNCS, pages 397–416. Springer, 2010. [ bib | pdf ]
TeX (and its LaTeX incarnation) is a widely used document preparation system for technical and scientific documents. At the same time, TeX is also an unusual programming language with a quite powerful macro system. Despite the wide range of TeX users (especially in the scientific community), and despite a widely perceived considerable level of "pain" in using TeX, there is almost no research on TeX. This paper is an attempt to change that.
To this end, we present Featherweight TeX, a formal model of TeX which we hope can play a similar role for TeX as Featherweight Java did for Java. The main technical problem which we study in terms of Featherweight TeX is the parsing problem. As for other dynamic languages performing syntactic analysis at runtime, the concept of "static" parsing and its correctness is unclear in TeX and shall be clarified in this paper. Moreover, it is the case that parsing TeX is impossible in general, but we present evidence that parsers for practical subsets exists.
We furthermore outline three immediate applications of our formalization of TeX and its parsing: a macro debugger, an analysis that detects syntactic inconsistencies, and a test framework for TeX parsers.