1 files changed, 284 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..86d3a66
--- /dev/null
+++ b/README.md
@@ -0,0 +1,284 @@
+# PostHaste in C with a JIT
+
+This is an implementation of the PostHaste language in C with a two-stage
+compilation pipeline, AST -> bytecode -> machine code.
+
+## Building
+
+At its simplest:
+
+```
+make
+```
+
+This will produce a binary called `posthaste`, which takes on input parameter,
+that being the file to execute. You can alternatively give the `Makefile` some
+arguments:
+
++ `RELEASE=<0/1>` toggles optimization flags for the `posthaste` binary between
+  `-O0` and `-O2 -flto=auto`. Optimization flags can be tweaked in
+  `scripts/makefile`.
+  By default, `RELEASE=0`.
+
++ `DEBUG=<0/1>` disables/enables debugging output, such as dumping a textual
+  representation of the AST and bytecode.
+  By default, `DEBUG=1`.
+
++ `ASSERT=<0/1>` disables/enables assertions. Can be used to eek out that little
+  extra bit of performance.
+  By default, `ASSERT=1`.
+
++ `JIT=<0/1>` disables/enables JIT compilation.
+  By default, `JIT=1`.
+
+For example, `make RELEASE=1 DEBUG=0 ASSERT=0 JIT=1` should produce a true speed
+demon.
+
+## Usage
+
+```
+./posthaste <input file>
+```
+
+## Implementation details
+
+I've tried to sprinkle comments here and there where it seemed to make sense,
+but more significantly I've tried to make the source code as readable as
+possible. I guess we'll see how well I succeeded.
+
+A lot of the information in this document can also be found in various
+comments throughout the codebase, and as always, the code itself doesn't lie.
+
+### Scoping
+
+Scopes consist of a vector of visible symbol definitions and a pointer to a
+possible parent scope. Each AST node knows which scope it belongs to (at least
+once semantic checking is done), so reaching a visible symbol from any other
+symbol involves looking through the visible symbols in the same scope. If a
+symbol matching the query string is found, great, return the AST node that
+defined that symbol. Otherwise, repeat the process for the parent scope.
+
+I just used a simple vector of string -> AST node pairs, ideally I would
+probably implement some kind of hashmap, but for small-ish programs I don't
+think there's a noticeable difference in symbol lookup performance.
+
+Each scope just has a single vector of visible symbols, i.e. it wouldn't be possible
+for a variable and a procedure or function to share the same name, but that's
+not an issue as the lexer forcibly separates variable names from procedure and
+function names.
+
+This gives me a fairly nice way to handle local variables and formals, as we can
+create a new scope for a procedure/function and set its parent as the file
+scope, and now we can insert symbols into this child scope and 'shadow' global
+symbols just by returning early from the symbol lookup. Only name clashes
+in the scope we're trying to add a symbol to causes a termination.
+This also works recursively for `until` and `unless` blocks.
+
+The data structures that handle scoping are implemented in
+`include/posthaste/scope.h` and `src/posthaste/scope.c`.
+
+### Semantic checking
+
+Semantic checking is done in two phases. The first phase just inserts all
+function/procedure definitions into the top level scope so they can call
+each other, regardless of the order in which they're defined.
+The second phase pretty much does everything else.
+
+Type checking is fairly easy in PostHaste, as there are really only a handful of
+primitive types, so we can use a simple enum to represent them all. Each AST node
+gets a member that holds the type of the node and the semantic checker places
+the correct type into this member during a depth-first traversal of each AST
+node.
+
+The chosen scope structure is also helpul here. As we're traversing statement
+lists, we add definitions to the scope we're currently in as we reach them. This
+way, we can query if some specific symbol has already been defined, either in
+the current scope or some parent scope. If no symbol matches the query, we can
+report an error.
+
+I do static type checking, so all AST nodes in the program get some type
+assigned to them. Some nodes don't strictly have concrete types, like `TYPE_INT`
+or `TYPE_DATE`, so I have an internal `TYPE_VOID`. For example, a procedure call
+without a return type gets internally converted to a `TYPE_VOID`. Since types
+are just enum values, checking if two types are compatible is just doing an
+equality check. `TYPE_VOID` is a bit tricky, as each node has to be aware of its
+existence, and for example variable definitions must explicitly check for it
+instead of blindly deducing the variable's type based on the expression.
+
+Some notable checks:
+
++ Procedure arguments vs formal parameters. This is fairly easy, as call arguments are
+  stored in a list, as are formal parameters, so after doing type checking for
+  both, we can first check if the lists are the same length and that each AST
+  node at the same index in both lists has the same type. Since formal
+  parameters must have concrete types, the issue mentioned with `TYPE_VOID` is
+  sidestepped.
+
++ Return type checking, specifically the `auto` type. Essentially, when the checker
+  enters a procedure or function definition AST node, it first and foremost sets
+  the return type. If the return type name is `auto`, the type is set to `TYPE_AUTO`.
+  The parser also inserts an implicit `AST_RETURN` node as the body of the
+  function, so that any time we run into a return node, we can check if the type
+  of the return expression matches the parent function/procedure. If the type is
+  `TYPE_AUTO`, the return node sets the return type to be the expression type.
+
++ Boolean types. Comparisons return a `TYPE_BOOL`, which can be used by until
+  and unless and printed with a corresponding "true" or "false" string.
+  `TYPE_INT` is considered truthy, and until/unless accepts either an int or a
+  bool.
+
++ Non-void procedure call as statement. Effectively, wherever there's a
+  statement list, I have a `analyze_statement_list()` function, which also
+  checks that none of the statements are a non-void procedure call. This was a
+  late addition, as I only now realized that such a requirement existed, and
+  feels a little bit hacky but it seems to work.
+
+Semantic checking is implemented in `include/posthaste/check.c` and
+`src/check.c`.
+
+### Lowering
+
+I refer to bytecode generation from AST as lowering. I went with a scheme
+similar to what Lua uses, i.e. the bytecode assumes it has access to two arrays,
+one for global values and one for local ones. The global value array, as the
+name suggests, contains the global values in the program. The local value array
+is referred to as a stack, as both local variables and temporary values exists
+in it. As a slight optimization, function calls just move an internal stack
+pointer register up by some amount, so the same stack is shared between all
+functions.
+
+It's worth noting that at this point, there's not really any relevant
+differences between procedures and functions, so from here on I'm using them
+interchangeably unless specifically referring to the PostHaste
+function/procedure differentiation.
+
+The bytecode is of a fairly common three-value type, that is each instruction
+consists of an opcode, one output location and two input locations. Not all
+locations must be used in each instruction. Each location effectively encodes
+whether it points to the global array or the stack, and an index into the array.
+I also added a separate `CONST` instruction that pushes a constant value to an
+output location, so for example `2 + 3` would look something like this:
+
+```
+CONST l1, 2 // write 2 to stack location 1
+CONST l2, 3 // write 3 to stack location 2
+ADD g0, l1, l2 // load stack locations 1 and 2 and add them together,
+               // write the result to the global location 0
+```
+
+The lowering effectively just has to select where to place values into the stack
+and/or global array, which is surprisingly easy to do. Since we don't allow
+variable definitions as expressions, each variable definition just pushes a
+virtual stack pointer up by one slot, effectively reserving that slot for
+itself. Expressions place their value into the current virtual stack pointer
+location, and in the case of binary operations like `ADD`, `SUB`, etc, they
+increment the stack pointer temporarily while lowering the left/righthand side,
+and then set the stack pointer back to where it was.
+
+Each stack location is 64 bits wide. For ease of implementation, `TYPE_DATE` and
+`TYPE_INT` are both encoded as 64 bit integers, as are pointers to strings and
+bools. The 64 bit values are just interpreted differently based on the
+instructions, which effectively means we must have a working static type
+checker.
+
+Argument passing is done by emitting `ARG` instructions, which push the location
+they're given to a 'virtual' argument stack. Getting the return value
+correspondingly has a `RETVAL` instruction, which moves the returned value from
+a 'virtual' return register. I say 'virtual' because the JIT compiler gets rid
+of these constructs, but they are very real in the bytecode interpreter. In
+general, I tried to opt for making the instruction set easier to compiler later,
+rather than make the interpretation fast.
+
+Lowering is implemented in `include/posthaste/lower.h` and `src/lower.c`.
+
+### Interpreting
+
+The bytecode interpreter is fairly straightforward. It has a number of internal
+registers, `pc`, `sp`, and the previously mentioned virtual argument stack. Each
+procedure call in PostHaste executes the interpreter recursively, copying
+arguments from the stack to where the formals are defined to be. Formal parameters
+must be the first consequtive stack locations in the order they are defined in
+PostHaste, which makes argument handling fairly easy to implement.
+
+The interpreter uses computed gotos to try and reduce execution overhead. By
+using labels and gotos, we can avoid using a while loop with a switch statement
+within it, potentially saving a couple jumps between bytecode instructions.
+The bytecode instruction set itself is not designed to be particularly fast,
+as previously stated. Another example of this is that locations have to be
+decoded at runtime. For example, when an instruction requests a value from some
+location, the location has to be parsed to check whether we're dealing with
+a location in the stack or in the global array, incurring an extra conditional
+for each location.
+
+Recursion is handled just as any other function call.
+
+The interpreter is implemented in `include/posthaste/interpret.h` and
+`src/interpret.c`.
+
+### Compiling
+
+The compiler is fairly straightforward. I chose to use Lightening, a library
+I've made some contributions to (ported it to MIPS and PowerPC, among other
+things), but the maintainer has been awol for a little while so I'm using my
+fork of it.
+
+Lightening is more or less a very lightweight portable assembler, that gives the
+user a number of virtual instructions and registers that are more or less 1:1
+mapped to machine instructions. This means that a user would have to implement
+code generation and register allocation by hand, but this project is
+simple enough that we can get all of that for basically nothing.
+
+Each bytecode instruction is compiled 'as if' it was being
+interpreted, so loading data from either the stack or global array, doing some
+operation to that data, and writing something back. This means each instruction
+is kind of a self-contained thing, and there are no dependencies between
+registers. This means that as long as we can implement each instruction with the
+registers we have, we're golden. And we practically only need three registers,
+as the three-value instructions can be implemented using two hardware registers, and
+keeping the global array pointer in a third register. The stack is just, well,
+the stack, so we use the stack pointer to access the stack. The stack pointer is
+automatically handled by Lightening, making our lives easier.
+
+I mentioned in my email that I JIT-compiled all functions regardless of if they
+were called or not, but that's not true anymore. I added a simple scheme that
+only run the JIT compiler on functions that get called, turned out to be easy
+enough as that information is 'implicitly' encoded in the bytecode. Just wait
+until you run into a `CALL` instruction, check if the corresponding `fn` has
+already been compiled. If not, compile, otherwise, continue compilation of your
+current function.
+
+The JIT compilation is implemented in `include/posthaste/compile.h` and
+`src/compile.c`.
+
+Machine code execution is trivial, just call the functions as if they were any
+other function and watch the magic happen. There's unfortunately not really any
+good ways to view the generated machine code, as that would require some
+external disassembler library, which I didn't want to add a dependency to.
+Recursion is handled just as any other function call. Machine code execution is
+implemented in `include/posthaste/execute.h` and `src/execute.c`.
+
+# Thoughts on assignment
+
+Obviously I went a bit overkill so not sure how representative my opinions are
+but I really enjoyed working on this. The language itself is simple enough that
+no individual phase felt like a slog, but is enough like a real language that it
+really was quite satisfying seeing things work.
+
+Some semantic things were a little bit weird to implement, for example the
+requirement that calls to non-void functions/procedures shouldn't be allowed as
+statements. That one I had to explicitly check for in code, it would've been
+easier to just ignore it as it didn't really 'naturally' fit into the structure
+of the checker. The programs themselves would still work.
+
+The semantic checker was probably the most difficult bit to be honest, I'm still
+not entirely sure I'm not overlooking some corner case somewhere but the
+bytecode lowering and JIT compilation sort of just fell into place. I have
+played around with some bytecode-adjacent things, and I think I made the right
+choice with regard to the design complexity.
+
+The `get_a0()` etc. thing seemed to work quite
+nicely, I might try using it in other projects. It's not anything I came up
+with, I believe GCC does something similar, but for example `ast_visit()` turned
+out to be way easier to implement with just a fixed `struct ast`. If I had used
+a separate struct for each AST node type or mangled them all into a `union`, I'm
+fairly sure I would've quintupled the size of this project.