Son of Polyglot
From PolyglotWiki
Below are some ideas we've had for the next version of Polyglot (aka Son of Polyglot). Feel free to add to the list, or share your opinions.
Scheduler
Improving the scheduler and interaction with the scheduler to help debuggability and to enforce (or at least better document) invariants between the ASTs and the type system. The current compiler control flow is very difficult to understand.
- debuggability of goal processing flow
- debugging tools for current framework
- queries: how did we get here (this goal)?
- making clear/declarative relationships between:
- goals
- processing
- data
- invariants
- reasoning about compiler control flow
- failures or missing/dynamically-discovered dependencies
- consider implementing passes as Ant tasks (just a thought)
- transformational passes either shouldn't fail, or must have appropriate functional/transactional guarantees on compiler state
- Making it easier to create new ASTs that are "up-to-date" with respect to the current pass.
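One way to picture the "declarative relationships" and "how did we get here?" ideas above is a goal graph in which prerequisites are declared up front and the scheduler records which goal first requested each other goal. The sketch below is only an illustration: `GoalGraph`, `declare`, `howDidWeGetHere`, and the goal names are hypothetical and are not part of Polyglot's scheduler.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Polyglot's scheduler API.
final class GoalGraph {
    // goal name -> prerequisite goal names, declared before anything runs
    private final Map<String, List<String>> prereqs = new HashMap<>();
    // goal name -> the goal that first requested it (null for the root request)
    private final Map<String, String> requestedBy = new HashMap<>();
    // goals in the order they were completed
    private final Set<String> completed = new LinkedHashSet<>();

    void declare(String goal, String... prerequisites) {
        prereqs.put(goal, Arrays.asList(prerequisites));
    }

    void run(String goal) {
        attempt(goal, null, new ArrayDeque<String>());
    }

    private void attempt(String goal, String parent, Deque<String> inProgress) {
        if (completed.contains(goal)) return;            // already done, nothing to redo
        if (inProgress.contains(goal))
            throw new IllegalStateException("cyclic dependency at " + goal);
        if (!prereqs.containsKey(goal))
            throw new IllegalArgumentException("undeclared goal " + goal);
        requestedBy.put(goal, parent);                   // provenance for later queries
        inProgress.push(goal);
        for (String p : prereqs.get(goal)) attempt(p, goal, inProgress);
        inProgress.pop();
        completed.add(goal);                             // the real pass would run here
    }

    // Answers "how did we get here?": the chain of requests from the root to this goal.
    List<String> howDidWeGetHere(String goal) {
        Deque<String> chain = new ArrayDeque<>();
        for (String g = goal; g != null; g = requestedBy.get(g)) chain.addFirst(g);
        return new ArrayList<>(chain);
    }

    List<String> completionOrder() {
        return new ArrayList<>(completed);
    }
}
```

Because dependencies are data rather than control flow, undeclared goals and cycles are reported at the point where they are discovered, and the request chain can be printed from a debugger or attached to an error message.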
Mutability issues
- mutability for types
- "committing" changes to TypeSystem
- transactional type system OR functional type system (efficiency?)
- use handles for Types to introduce level of indirection between client refs and Type objects themselves?
- semi-immutability of ASTs
- immutable AST nodes with monotonically-mutable annotations?
- "monotonic fields"?
- maybe explicitly declare producers of any given monotonic bit of data?
- support for rewriting structure of ASTs (as distinct from monotonicity of "derived characteristics" of AST nodes - two separate spaces of data)
- simple support for creating new AST nodes, knitting them into an existing AST, and ensuring the state of processing of new node is consistent with that of the new parent AST.
- Ensure checking passes, when rerun, don't report errors twice.
- In general, need mechanisms for enforcing idempotency of compiler passes, or marking bits of the AST/compiler state to allow later instances of a pass to know they shouldn't do that work again.
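The "monotonic fields" and idempotency items above could be combined roughly as follows: an annotation slot that starts unset, can be set once by its declared producer, and treats a rerun that stores an equal value as a no-op. This is a sketch under assumptions; `MonotonicField`, `AnnotatedNode`, and the pass names are invented for illustration.

```java
import java.util.Objects;

// Hypothetical sketch of a "monotonic field": unset -> set, and never changed afterwards.
final class MonotonicField<T> {
    private final String producer; // the one pass declared as allowed to set this field
    private T value;               // null means "not computed yet"

    MonotonicField(String producer) {
        this.producer = Objects.requireNonNull(producer);
    }

    boolean isSet() {
        return value != null;
    }

    // Returns true if this call stored the value, false if an equal value was
    // already present (so a rerun of the producing pass is harmless).
    boolean set(String pass, T newValue) {
        Objects.requireNonNull(newValue);
        if (!producer.equals(pass))
            throw new IllegalStateException(pass + " is not the declared producer " + producer);
        if (value == null) {
            value = newValue;
            return true;
        }
        if (value.equals(newValue)) return false;
        throw new IllegalStateException("non-monotonic update: " + value + " -> " + newValue);
    }

    T get() {
        if (value == null) throw new IllegalStateException("field not set yet");
        return value;
    }
}

// The node's structure is immutable; only the derived annotation can move forward.
final class AnnotatedNode {
    final String source;
    final MonotonicField<String> type = new MonotonicField<>("TypeCheck");

    AnnotatedNode(String source) {
        this.source = source;
    }
}
```

A checking pass could use the boolean result of `set` to decide whether it is seeing the node for the first time, and so avoid reporting the same error twice.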
Error handling
- exception-throwing strategy
- uniform enough?
- documentation?
- annotations on methods? static checking?
- less reliance on exceptions
- better support for multiple errors per node
- use error numbers rather than/in addition to strings to help unit testing when error messages change
- move error strings to a factory so extensions can change error messages
- add an Error node so compilation can continue after an error is found
- error messages should be snarkier
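The error-number and message-factory items above might fit together along these lines. Every name here (`ErrorCodes`, `MessageFactory`, `SnarkyMessageFactory`, `DiagnosticLog`) and every number is a hypothetical stand-in, not an existing Polyglot class or convention. Plain `int` codes are used rather than a Java enum so that an extension could define codes of its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical error numbers; an extension could reserve its own range.
final class ErrorCodes {
    static final int UNDEFINED_VARIABLE = 100;
    static final int TYPE_MISMATCH = 101;

    private ErrorCodes() {}
}

// Factory for message text. Extensions subclass it to reword messages.
class MessageFactory {
    String message(int code, String detail) {
        switch (code) {
            case ErrorCodes.UNDEFINED_VARIABLE:
                return "Undefined variable " + detail + ".";
            case ErrorCodes.TYPE_MISMATCH:
                return "Type mismatch: " + detail + ".";
            default:
                return "Error " + code + ": " + detail;
        }
    }
}

// An extension that changes wording without touching the code that reports errors.
class SnarkyMessageFactory extends MessageFactory {
    @Override
    String message(int code, String detail) {
        if (code == ErrorCodes.UNDEFINED_VARIABLE)
            return "Never heard of " + detail + ". Did you make it up?";
        return super.message(code, detail);
    }
}

final class Diagnostic {
    final int code;
    final String text;

    Diagnostic(int code, String text) {
        this.code = code;
        this.text = text;
    }
}

// Collects diagnostics; tests can compare codes() and ignore the wording.
final class DiagnosticLog {
    private final MessageFactory factory;
    private final List<Diagnostic> entries = new ArrayList<>();

    DiagnosticLog(MessageFactory factory) {
        this.factory = factory;
    }

    void report(int code, String detail) {
        entries.add(new Diagnostic(code, factory.message(code, detail)));
    }

    List<Integer> codes() {
        List<Integer> out = new ArrayList<>();
        for (Diagnostic d : entries) out.add(d.code);
        return out;
    }

    List<String> texts() {
        List<String> out = new ArrayList<>();
        for (Diagnostic d : entries) out.add(d.text);
        return out;
    }
}
```

A unit test written against `codes()` keeps passing when the wording changes, which is the point of the error-number suggestion.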
Polyglot as a service
Better support for using Polyglot as a service (esp. inside Eclipse)
- Better error handling and recovery. Polyglot gives up too easily, leading to a lot of spurious warnings when used in Eclipse.
- polyglot should be reentrant (no static fields caching references to things like the current TypeSystem, the ErrorQueue, and so on)
- granularity of processing (particularly for interactive use, e.g. in an Eclipse IDE)
- compilation-unit: currently fails to process subsequent source files when one source file has errors
- fine-grained: in interactive context, should process as much as possible, e.g. if can't type-check one node, continue with siblings
- Better support for incremental compilation, for example allowing passes to be run over methods rather than over entire source files.
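The fine-grained behaviour described above (if one node can't be type-checked, continue with its siblings) could look something like the toy checker below. It also keeps all of its state in instance fields rather than statics, in the spirit of the reentrancy item. `ToyNode`, `CheckFailure`, and `TolerantChecker` are made-up names for this sketch and do not correspond to Polyglot classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy AST node; wellTyped stands in for the result of real type checking.
final class ToyNode {
    final String label;
    final boolean wellTyped;
    final List<ToyNode> children;

    ToyNode(String label, boolean wellTyped, ToyNode... children) {
        this.label = label;
        this.wellTyped = wellTyped;
        this.children = Arrays.asList(children);
    }
}

final class CheckFailure extends Exception {
    CheckFailure(String message) {
        super(message);
    }
}

final class TolerantChecker {
    // Instance state only, so two checkers can run side by side in one JVM.
    private final List<String> errors = new ArrayList<>();
    private final List<String> checked = new ArrayList<>();

    void checkRoot(ToyNode root) {
        try {
            check(root);
        } catch (CheckFailure e) {
            errors.add(e.getMessage());
        }
    }

    private void check(ToyNode n) throws CheckFailure {
        if (!n.wellTyped) throw new CheckFailure("cannot type-check " + n.label);
        checked.add(n.label);
        for (ToyNode child : n.children) {
            try {
                check(child);
            } catch (CheckFailure e) {
                // Record the failure, skip that subtree, and carry on with the siblings.
                errors.add(e.getMessage());
            }
        }
    }

    List<String> errors() {
        return errors;
    }

    List<String> checked() {
        return checked;
    }
}
```

Catching the failure at the parent gives the granularity: catching per compilation unit reproduces the coarse behaviour, while catching per child gives the interactive one.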
Boilerplate
Less boilerplate code for ASTs, perhaps by using a pre-processor to automatically generate getters, setters, reconstruct methods, visitChildren methods, etc., without actually requiring that the code be pre-processed.
- Possibly add Polyglot AST support to JikesPG auto-gen ASTs?
Java 5
- Java 5 extension. At a minimum, allowing Java 5 class files. At most, supporting it in the base compiler. I prefer leaving it as an extension to avoid code cluttered with ifdefs (or the Java equivalent).
- Using Java 5 as the meta language.
Back end support
- Figuring out some way to have extensible bytecode would be interesting.
- Making the current C++ back end work would be cool, but I think it should be a C back end and should use ASTs instead of just dumping out text directly. The wrong way to do this would be to invent a whole new set of parallel C AST nodes, because the current Java AST nodes could mostly be used for C too.
Separate compilation/serialization
- reducing the size of serialized type info
- Use Java 5 annotations for additional type information, rather than serializing the type information and storing in fields. As a start, can just store the serialized type information in a class annotation.
Code cleanup
- clean up literals: introduce IntLit and LongLit interfaces, ditto for Float and Double
- clean up enums: remove and use subtyping? Java 5 enums? ints? are enums extensible in a subclass?
- make Flags a Collection<Flag>. Make Public, Private, etc subclasses of Flag.
- don't require a position in NodeFactory methods. Would this help with Java 5 annotation implementation?
- add asserts, esp to constructors to check well-formedness
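As a rough illustration of the Collection<Flag> item above, each flag could be a small class and the flag set an immutable collection of them. The names below (`FlagSet`, `PublicFlag`, `AtomicFlag`, and so on) are assumptions made for this sketch; `FlagSet` is deliberately not called `Flags` to avoid suggesting it is the existing class.

```java
import java.util.AbstractCollection;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical base class: one subclass per flag, so extensions can add their own.
abstract class Flag {
    abstract String keyword();

    // Two flags are equal when they are instances of the same flag class.
    @Override
    public boolean equals(Object o) {
        return o != null && o.getClass() == getClass();
    }

    @Override
    public int hashCode() {
        return getClass().hashCode();
    }
}

final class PublicFlag extends Flag {
    String keyword() { return "public"; }
}

final class StaticFlag extends Flag {
    String keyword() { return "static"; }
}

// A flag an extension might add without touching the base compiler.
final class AtomicFlag extends Flag {
    String keyword() { return "atomic"; }
}

// Immutable, insertion-ordered collection of flags.
final class FlagSet extends AbstractCollection<Flag> {
    private final Set<Flag> flags;

    FlagSet(Flag... flags) {
        this(new LinkedHashSet<Flag>(Arrays.asList(flags)));
    }

    private FlagSet(Set<Flag> flags) {
        this.flags = Collections.unmodifiableSet(flags);
    }

    // Returns a new set with the extra flag; the receiver is left unchanged.
    FlagSet with(Flag extra) {
        Set<Flag> copy = new LinkedHashSet<Flag>(flags);
        copy.add(extra);
        return new FlagSet(copy);
    }

    @Override
    public Iterator<Flag> iterator() {
        return flags.iterator();
    }

    @Override
    public int size() {
        return flags.size();
    }

    String translate() {
        StringBuilder sb = new StringBuilder();
        for (Flag f : flags) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(f.keyword());
        }
        return sb.toString();
    }
}
```

Extending `AbstractCollection` means `contains`, `isEmpty`, and iteration come for free, and the mutating operations throw `UnsupportedOperationException` because the backing set is unmodifiable.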
Performance
- faster compilation
- don't discard CFG after every data flow pass
- don't discard context after each pass
Type checking
- pass context into subtyping and type equivalence checks
- Remove assumptions that only ReferenceTypes have members, or that all ReferenceTypes are subtypes of Object. Remove assumptions that primitives and String do not have subtypes.
- Don't have back pointer from type objects to TS. Maybe pass TS in to checker methods instead. Want to allow subclassing/delegation of TS so we can just rerun typechecker with a different type system to have it check annotations.
- Factor out type factory from TS.
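The idea of dropping the back pointer and passing the type system in to checker methods might be sketched as below: type objects are plain data, the subtyping rules are a separate object, and a delegating rule set adds an annotation check on top of the base rules. All names (`SubtypeRules`, `ToyType`, `NominalRules`, `LabelRules`, `AssignCheck`) and the "secret" label are hypothetical and chosen only for this example.

```java
// Hypothetical sketch: type objects hold no reference to a type system.
final class ToyType {
    final String name;
    final ToyType superType; // null for the root type
    final boolean secret;    // a toy annotation that only some rule sets look at

    ToyType(String name, ToyType superType, boolean secret) {
        this.name = name;
        this.superType = superType;
        this.secret = secret;
    }
}

// The "type system" is passed to checks instead of being reached through the type.
interface SubtypeRules {
    boolean isSubtype(ToyType sub, ToyType sup);
}

// Base rules: walk the declared supertype chain and compare names.
final class NominalRules implements SubtypeRules {
    public boolean isSubtype(ToyType sub, ToyType sup) {
        for (ToyType t = sub; t != null; t = t.superType) {
            if (t.name.equals(sup.name)) return true;
        }
        return false;
    }
}

// Delegating rules: the base check plus "secret data may not flow to a non-secret type".
final class LabelRules implements SubtypeRules {
    private final SubtypeRules base;

    LabelRules(SubtypeRules base) {
        this.base = base;
    }

    public boolean isSubtype(ToyType sub, ToyType sup) {
        if (sub.secret && !sup.secret) return false;
        return base.isSubtype(sub, sup);
    }
}

// The checker is written once and can be rerun with any rule set.
final class AssignCheck {
    private AssignCheck() {}

    static boolean canAssign(SubtypeRules rules, ToyType from, ToyType to) {
        return rules.isSubtype(from, to);
    }
}
```

Running `AssignCheck.canAssign` twice, once with `NominalRules` and once with `LabelRules`, mirrors the "rerun typechecker with a different type system" idea without changing the types or the checker.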
Distribution
- change extension skeleton so extensions aren't nested in the polyglot source tree [done]
- move utility code not used by the base compiler out of base compiler to help deployment
- better regression testing, junit support
Code generation
- support passing comments through from lexer to codegen, esp Javadoc
- generate/transform Javadoc with generated/transformed code?
- generate line number information in comments (extend CodeWriter)
- true quasi-quoting for both Java and extensions
- perhaps a tool to generate NodeFactory calls from Java code. The generated code can be pasted into the compiler, or just linked in if massaged with a script. Call it off-line quasi-quoting.
Other stuff
- replace copy with clone, but make public
- enable replacement of post compiler pass so can invoke a C backend
- don't use toString for pretty printing [done]
- GLR parser generator. Finish Ibex and ship with polyglot.
- add factory method for producing an ambiguous node from a qualified name [done]
- introduce a new Node StmtList, representing a list of statements, but not lexically scoped like a block is. This would simplify translation where the translation of a single statement may be many statements. Perhaps decouple Block from scoping.
- allow code generation without type information to make it easier to generate ASTs [done]
- Distinguish between declarations and uses of fields, locals, etc in types. In some extensions, declarations are templates and all uses substitute into the template to produce a type. Maybe ParsedClassType shouldn't be a Type, but ClassType should delegate to it, possibly performing substitutions. Sys resolver should return ParsedClasses, not Types?
- Simplify TypeNode. Should ArrayTypeNode be ambiguous?
- Unify ASTs and Types to eliminate another parallel class hierarchy. Maybe ParsedClass should be an AST node, but Types are not?
- Should local classes have refs to their containing CodeInstance? Maybe pass CodeInstance into TypeBuilder.pushCode to allow extensions to use it if they need to.
- Provide utilities for wrapping fields in methods, etc. Use anon classes for ++, +=, etc.
[[ e.f++ ]] = (new Object() { int eval(C target) { int old = target.getF(); target.setF(old + 1); return old; } }).eval( [[ e ]] )
- Fix constant check/type check dependency. Merge the passes?
- Clean up interface inconsistencies between print and translate, e.g., Node.translate(w, tr), Type.translate(Resolver), Type.print(CodeWriter), Flags.translate().
- Remove CodeWriter from translate()/prettyPrint() signature.
- Separate visitor traversal and dispatch so Translator and PrettyPrinter can share dispatch code with NodeVisitor.
- Use GoF visitor pattern if possible.
- Fix so don't need to call through del().
- Constructors with factory methods should be protected.
- Remove TypeObject.position(). It makes no sense for things like ArrayType. This is a case where we need to distinguish between TypeObjects and declarations.
- Remove toReference() etc. Use casts instead.
- Better TypeObject comparators. Sometimes want to map typeEquals over a FieldInstance or MethodInstance. This applies to other operations as well. Do we need TypeObject visitors?
- Generalize Loop so other statements can contain break and continue statements. For example, atomic. Refactor CFGBuilder.
- Output java files in their own directory to avoid overwrites when compiling java files.
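To make the StmtList proposal earlier in this list more concrete, here is a toy model in which a pass may replace one statement with an unscoped sequence, and a later step splices such sequences into the enclosing statement list while leaving real blocks (and their scopes) alone. `ToyStmt`, `SimpleStmt`, `StmtSeq`, `ToyBlock`, and `Flattener` are hypothetical stand-ins, not Polyglot node types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy statement hierarchy.
interface ToyStmt {}

final class SimpleStmt implements ToyStmt {
    final String code;

    SimpleStmt(String code) {
        this.code = code;
    }
}

// Stand-in for the proposed StmtList: several statements, no new lexical scope.
final class StmtSeq implements ToyStmt {
    final List<ToyStmt> stmts;

    StmtSeq(ToyStmt... stmts) {
        this.stmts = Arrays.asList(stmts);
    }
}

// A block does introduce a scope, so it must not be spliced into its parent.
final class ToyBlock implements ToyStmt {
    final List<ToyStmt> stmts;

    ToyBlock(ToyStmt... stmts) {
        this.stmts = Arrays.asList(stmts);
    }
}

final class Flattener {
    private Flattener() {}

    // Splices nested StmtSeq contents into the enclosing list, keeping blocks intact.
    static List<ToyStmt> flatten(List<ToyStmt> stmts) {
        List<ToyStmt> out = new ArrayList<>();
        for (ToyStmt s : stmts) {
            if (s instanceof StmtSeq) {
                out.addAll(flatten(((StmtSeq) s).stmts));
            } else {
                out.add(s);
            }
        }
        return out;
    }
}
```

With something like this, a translation that turns one statement into several can return a single node and does not need to know whether its parent is a block, a loop body, or a switch case.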
Dependent types
- Allow AST nodes embedded in Types to avoid parallel class hierarchies like in Jet, Jif, X10.
- Add formal names to ProcedureInstance to help with dependent types.
- Better hooks for substitution when looking up methods and fields.
Licensing
- Get a clean-room grammar so we can distribute all of polyglot under EPL as well as LGPL.
- Should we just change the license to BSD or MIT? Need to get approval from all developers, though.
Documentation
- Write a manual.
- Document intent of prettyPrint vs. translate.
- Document goals and what's expected before running them.
- More complete javadoc.
