Getting Started

There are many different things that can be extended in ExtendJ. Consequently, it may be difficult to figure out where and how to make changes in order to achieve your goal. This page tries to help by giving a short introduction to different extension points in ExtendJ.

For a high-level overview of the design of ExtendJ, and possible extension points, take a look at these presentation slides.

This page is a work in progress. More information will be added over time.

Template Projects

Use the following minimal template projects as starting points for your extension:

The following projects are small example extensions that are suitable to look at for inspiration when developing your extension:

  • String Repetition - Changes the behaviour of the multiplication expression and allows you to multiply a string, like in Python. This example uses desugaring for code generation.
  • Spread Operator - A partial implementation of the spread operator from the Groovy language. This extension also adds a print-assign operator. The extension was developed by students for a compiler course project.

Getting used to JastAdd

ExtendJ is built with the JastAdd metacompiler. Working with ExtendJ requires knowing a bit about how JastAdd works, and, in particular, how Reference Attribute Grammars (RAGs) work. There are many resources for learning about JastAdd and RAGs. Here are a few good references:

The notation for JastAdd attributes is documented in the JastAdd reference manual.

Extension Points

The following sections describe the extension points that ExtendJ makes available to extensions.

In general, we have extension points for the following parts of the compiler:

  • Scanner and Parser.
  • Analysis.
  • Code generation.

The scanner and parser are quite limited in their extensibility. Much more freedom is available for modifying the analysis and code generation.

Extending Syntax

Most language extensions add some new syntax elements such as new operators or statements. Adding syntax requires modifying the scanner and/or parser specifications.

Here is an example of how to add scanning and parsing for a simple version of Groovy's spread operator to ExtendJ:

Scanner file (Spread.flex):

<YYINITIAL> {
  "*." { return sym(Terminals.SPREAD); }
}

Parser file (Spread.parser):

primary_no_new_array =
    simple_name.p SPREAD simple_name.id {: return new Spread(p, id); :}
  | primary.p SPREAD simple_name.id     {: return new Spread(p, id); :}
  ;

Terminals (also called tokens) are implicitly generated from the parser specification. Any identifier in a parser rule that does not match a nonterminal is added to the set of terminals. The scanner accesses the unique identifier of each terminal via the class JavaParser.Terminals, which is generated by Beaver.

Note that the semantic actions in the parser file build a new AST node called Spread. This is a new node representing the spread operator, and it must be added via an AST file:

Abstract grammar (Spread.ast):

Spread : Access ::= Qualifier:Expr Access;

More additions need to be made to handle type analysis and error checking for the new operator.

Extending Type Analysis

Adding a spread expression to Java requires defining the type of a spread expression so that it works with the existing type analysis framework. The type of each expression in ExtendJ is defined by the type() attribute. We need to add a new equation for this attribute on the Spread node:

eq Spread.type() = ...;

The type of a spread expression is a collection type whose elements have the type of the variable or method on the right-hand side of the spread operator. Here is an example of computing this type in an attribute equation:

eq Spread.type() {
  TypeDecl collection = lookupType("java.util", "Collection");
  GenericTypeDecl generic = collection.asGenericType();
  if (generic != null) {
    TypeDecl elementType = getAccess().type();
    if (elementType.boxed().isUnknown()) {
      return generic.lookupParTypeDecl(Collections.singletonList(elementType));
    } else {
      return generic.lookupParTypeDecl(Collections.singletonList(elementType.boxed()));
    }
  } else {
    return unknownType();
  }
}

The code uses a helper attribute asGenericType(), which we introduce here just to make it easier to handle the generic collection type without a type cast:

syn GenericTypeDecl TypeDecl.asGenericType() = null;
eq GenericTypeDecl.asGenericType() = this;

The other attributes used in Spread.type() are available in the core ExtendJ type analysis. The lookupType attribute looks up a globally visible type. The GenericTypeDecl.lookupParTypeDecl() attribute is used to define a specific parameterization of a generic type. These parameterizations are represented by nonterminal attributes on GenericTypeDecl.
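The boxing step in Spread.type() can be illustrated outside the compiler. The following is a toy model in plain Java, not ExtendJ code: the class SpreadTypeSketch, its methods, and the representation of types as strings are assumptions made for illustration, and null stands in for the unknown type.

```java
import java.util.Map;

// Toy model of the rule in Spread.type(); none of these names exist in ExtendJ.
final class SpreadTypeSketch {
  // Java's primitive types and the wrapper classes they box to.
  private static final Map<String, String> BOXED = Map.of(
      "boolean", "Boolean", "byte", "Byte", "short", "Short", "char", "Character",
      "int", "Integer", "long", "Long", "float", "Float", "double", "Double");

  // Returns the wrapper name, or null when the type has no boxed form
  // (null plays the role of the unknown type in this sketch).
  static String boxed(String typeName) {
    return BOXED.get(typeName);
  }

  // Mirrors the branch in Spread.type(): use the boxed element type when one
  // exists, otherwise use the element type itself as the type argument.
  static String spreadType(String elementType) {
    String boxed = boxed(elementType);
    String argument = (boxed == null) ? elementType : boxed;
    return "java.util.Collection<" + argument + ">";
  }

  public static void main(String[] args) {
    System.out.println(spreadType("int"));
    System.out.println(spreadType("String"));
  }
}
```

In this model, a primitive element type such as int is boxed before it is used as a type argument, because generic types cannot be parameterized with primitive types. A reference type is used unchanged.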

Extending Code Generation

There are two ways of extending code generation in ExtendJ: direct bytecode generation, and desugaring. Bytecode generation is the more powerful alternative, but it is also quite involved, especially if you need to allocate new local variable indices. Therefore, we strongly recommend that you try desugaring first. The desugaring approach may be used to quickly prototype code generation for a new language feature by mapping the new construct to already-existing language constructs.

For example, in the string-repeat example, string repetition is achieved by mapping string multiplication to a plain old Java for-loop:

a = "-"*10;

becomes

StringBuilder buf = new StringBuilder();
for (int i = 0; i < 10; ++i) {
  buf.append("-");
}
a = buf.toString();
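To check that the desugared loop produces the intended string, it can be wrapped in a method and run as ordinary Java. This is only a sketch: the class StringRepeatSketch and the method repeat are hypothetical names and are not part of ExtendJ or the string-repeat example.

```java
// Hypothetical wrapper around the desugared loop shown above.
final class StringRepeatSketch {
  // Equivalent of the extension expression s * count after desugaring.
  static String repeat(String s, int count) {
    StringBuilder buf = new StringBuilder();
    for (int i = 0; i < count; ++i) {
      buf.append(s);
    }
    return buf.toString();
  }

  public static void main(String[] args) {
    String a = repeat("-", 10);
    System.out.println(a); // prints a line of ten dashes
  }
}
```

A count of zero leaves the loop body unexecuted, so the result is the empty string.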

When you use desugaring, it is convenient to use the JavaDumpTree program in ExtendJ to print the AST of the code you want to generate. That is, write a small program in plain Java that is the intended mapping of your new construct. Then run this command (assuming the file is named Desugared.java):

java -cp extendj.jar org.extendj.JavaDumpTree Desugared.java

This will print something like this:

ForStmt
  List
    VarDeclStmt
      Modifiers
        List
      PrimitiveTypeAccess Package="@primitive" ID="int"
      List
        VariableDeclarator ID="i"
          List
          Opt
            IntegerLiteral LITERAL="0"
  Opt
    LTExpr
      VarAccess ID="i"
      IntegerLiteral LITERAL="10"
  List
    ExprStmt
      PreIncExpr
        VarAccess ID="i"
  Block
    List
      ExprStmt
        Dot
          VarAccess ID="buf"
          MethodAccess ID="append"
            List
              StringLiteral LITERAL="-"

This tree can now be translated into an attribute that builds the same AST. See the string-repeat example for the details.
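The translation amounts to constructing the dumped tree bottom-up, with one constructor call per node. The sketch below shows that shape using a toy node class in plain Java. DumpNode, DesugarSketch, and their methods are hypothetical stand-ins and do not use ExtendJ's generated AST classes, whose constructors take typed children rather than labels.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for an AST node: a label plus an ordered list of children.
final class DumpNode {
  private final String label;
  private final List<DumpNode> children = new ArrayList<>();

  DumpNode(String label, DumpNode... kids) {
    this.label = label;
    for (DumpNode kid : kids) {
      children.add(kid);
    }
  }

  // Renders the tree with two spaces of indentation per nesting level.
  String dump() {
    StringBuilder out = new StringBuilder();
    dump(out, 0);
    return out.toString();
  }

  private void dump(StringBuilder out, int depth) {
    for (int i = 0; i < depth; ++i) {
      out.append("  ");
    }
    out.append(label).append('\n');
    for (DumpNode child : children) {
      child.dump(out, depth + 1);
    }
  }
}

final class DesugarSketch {
  private static DumpNode n(String label, DumpNode... kids) {
    return new DumpNode(label, kids);
  }

  // Builds the four children of the for statement, then the statement itself.
  static DumpNode repeatLoop(String text, int count) {
    DumpNode init = n("List",
        n("VarDeclStmt",
            n("Modifiers", n("List")),
            n("PrimitiveTypeAccess Package=\"@primitive\" ID=\"int\""),
            n("List",
                n("VariableDeclarator ID=\"i\"",
                    n("List"),
                    n("Opt", n("IntegerLiteral LITERAL=\"0\""))))));
    DumpNode condition = n("Opt",
        n("LTExpr",
            n("VarAccess ID=\"i\""),
            n("IntegerLiteral LITERAL=\"" + count + "\"")));
    DumpNode update = n("List",
        n("ExprStmt", n("PreIncExpr", n("VarAccess ID=\"i\""))));
    DumpNode body = n("Block",
        n("List",
            n("ExprStmt",
                n("Dot",
                    n("VarAccess ID=\"buf\""),
                    n("MethodAccess ID=\"append\"",
                        n("List", n("StringLiteral LITERAL=\"" + text + "\"")))))));
    return n("ForStmt", init, condition, update, body);
  }

  public static void main(String[] args) {
    System.out.print(repeatLoop("-", 10).dump());
  }
}
```

Printing the result of repeatLoop("-", 10) reproduces the dump shown above, which is a convenient way to compare a hand-built tree against the JavaDumpTree output.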