aboutsummaryrefslogtreecommitdiff

FWD

fwd is a toy language where the idea is that functions/procedures can never return values, instead all computation happens "forward" by heavily utilizing closures.

Building

git submodule update --init --recursive
make

Why?

Informally: Objects have a lot of context to keep track of, are all references valid, who owns what, etc. and there have been lots of approaches to control this complexity. This is yet another experiment, based on the idea that we could syntactically specify which scope an object is valid in.

Effectively, imagine that instead of being handed an object and a bunch of rules about how you're allowed to use it, you just specify what code to run and let some implementer ensure that the context is safe for it.

Eventually I should probably write a better explanation, but this is good enough for now.

How?

Functions don't return values, instead they take some amount of closures to execute. One pretty central feature of fwd is how these closures are expressed syntactically, namely that they can be expressed outside of the function argument list. As an example,

give_me_some_object() => obj1;
insert_something(obj1) => obj2;
do_something_with(obj2);

=> can be thought of as a closure operator, and in this case the body of the closure is the rest of the function, so

give_me_some_object(|obj1|{
    insert_something(obj1, |obj2|{
        do_something_with(obj2);
    }
});

It's possible to specify the exact scope for a closure as well:

check_cond(x) => a1 {
    /* cond was true */
} => a2 {
    /* cond was false */
}
/* runs after check_cond, not 'within' it */

This 'scoping' in combination with move semantics seems like an easy way to syntactically specify both ownership and lifetime of an object, no need for lifetime annotations or extra stuff like that. At least hopefully, this is still a very rough idea and I wouldn't be surprised if I have an excessively optimistic view of this. Will have to play around with the compiler as I develop it. Presumably I'll also want references, but they are as of yet unimplemented.

Examples

See examples/. Currently fwd just outputs the C++, so a 'full' compilation sequence would be something like

fwd examples/uniq.fwd > /tmp/output.cpp
g++ -Ilib -std=c++20 /tmp/output.cpp -o /tmp/output

Motivating case(s)

As an example, let's consider a hashmap. In a regular return-oriented language, you could have four possibilities for inserting elements (assuming move semantics and everything being mutable):

/* (1) owning map, owning element */
new_map = insert(map, element);

/* (2) owning map, reference element */
new_map = insert(map, &element);

/* (3) reference map, owning element */
insert(&map, element);

/* (4) reference map, reference element */
insert(&map, &element);

Of these, cases (1) and (3) is safe. The ownership of element is transferred to map, so the element stays alive as long as the map. (2) and (4) are unsafe, since they depend on how the map is used after the return of the function.

However, what if the function doesn't return, but rather passes the map on to a closure? The cases (1) and (3) are effectively the same, but case (2) now becomes safe, since new_map is guaranteed to be destroyed before insert returns, and &element is owned by someone whose lifetime is longer than insert (unless some unsafe fuckery happens). (4) is still unsafe, technically speaking it would be safe to use within the closure that it calls but any code outside of that is effectively impossible to reason about. It could theoretically be considered safe if the element is removed after the closure exits, so the &map and &element are linked only within the function but not outside it, but I'm not entirely certain how that could be codified in the move semantics. Besides, at that point, I'm not sure there's a significant difference between (2) and (4) in practical applications, but more testing is needed.

Current state

The 'meat' of the project is still unimplemented, that being proper move semantics. There's an initial parser and a small generator for C++ so that I can quickly get up and running and test out programs. There's also lib/fwdlib.hpp which acts as a bridge between C++ and fwd, and any functions that start with fwd_ are actually just C++ functions. This way I can start experimenting with the language even in a very crude state, and I don't have to implement containers or generics myself. At the moment I also kick pretty much all type checking to C++, eventually I'd probably like to take care of it at a higher level.

Future

If (big if) this experiment turns out to be useful on some level I'll probably slowly try to make it more self-reliant. My target would be systems programming, possibly even embedded systems, though I don't yet know if this 'paradigm' is powerful enough or if I might need some escape hatches like unsafe in Rust.

Really, no returns?

Semantically, yes, although it is useful to convert certain closure calls to equivalent return statements where possible. This reduces stack usage if nothing else, and existing compilers are more used to return-oriented programming so it might also be possible that the produced assembly is more efficient.

As an example, see examples/fib.fwd. The naive C++ currently generated (besides not actually compiling due to template instantiation depth, heh) converts what's usually a binary tree of successive calls into a linear sequence that takes up a huge amount of stack memory. However, we can note that a closure call is equivalent to a return when

  1. the function has only one closure parameter
  2. all code paths end in the closure call
  3. the closure call does not reference any local variables by reference (except references that were taken as parameters)
  4. all nodes in paths to closure calls can be converted to equivalent returns

Our fibonacci function matches all these cases (assuming fwd_if would be a statement instead of a function call as it is now, this was just quicker to do), and could theoretically be turned into a more typical return-oriented function for great benefit. Checking these requirements should be relatively straightforward.

We could potentially loosen up these requirements, for example we could allow multiple closure parameters and just return a variant, although the caller now has to pattern match on the result. Taking things even further, we could start treating functions more as macros and just replace the function call with the function body, although this has some other, more fuzzy, requirements like not allowing recursion.

We might even want to require that a function can be turned into a 'returning' function (being pretentious we might prefer calling it 'folding' but anyway) so that it can be interfaced from C code, for example.

Function calls as arguments

At the moment, it's a bit cumbersome to directly pass closure arguments to other functions:

get_some_value() => n;
do_something_with(n) => ...

If we don't really care about the named variable n, it might make more sense to provide syntactic sugar like

do_something_with(get_some_value()) => ...

that translates to the same thing as above, with some internal variable 'name'. This would also require that get_some_value has some limitations, like a single closure argument that consists of a single value. This might be particularly significant for condition checking, like

// assuming this is the form of if statements, as they're still not in the
// grammar
get_some_value() => cond;
if cond {...}

versus

if get_some_value () {...}

Error handling

At the moment there's really no way to handle errors. You could use guard statements, see examples/guard.fwd, but they are rather limited, in that the caller has no idea if the guard was executed or not. You could potentially terminate the program directly, which is fine in some cases, but we might want some way to signal to the caller that something has gone wrong and they might want to choose what happens.

This might be a case where exceptions or something like exceptions could be used, though probably a more limited version of them, maybe something like Haskell's error that only takes strings, or some other value type that cannot possibly fail itself. Monadic (hope I'm using that word right) errors do seem kind of inviting, where if one statement fails out, the error is just propagated upwards.

Code that excplicitly wants to handle its errors would now look something like (assuming !> means "handle error")

do_something() => n {
    /* ok */
}
!> e {
    /* error branch, if omitted just throws the error to whoever called us? */
    println("caught an error :((((");
    println(e); /* if the error value is just a constant string */
    error "now I have overwritten your error, lol";
    /* possibly also attaches file and line numbers or something? Or is that too heavy?*/
}

Some semantics are still a bit unclear, like how multiple calls should be handled:

do_something() => n;
do_something_with(n) => m;
...
!> e {
    /* who does this belong to? Should this be allowed at all? */
    /* at the moment I'm leaning towards binding to the closest function call */
}

It might also be a bit confusing to follow the code of execution:

do_something() => n;
!> e {...}
do_something_with(n) => m;
...

would effectively be something like

err_t err = do_something(|n|{
    do_something_with(n, |m|{
        ...
    });
});
if (err) {...}

Now it's also a bit unclear if one should use error statements, std::optional or std::expected to signify that something failed. I guess it might depend on the context, but having to deal with multiple ways a function might fail is a bit annoying in any language and something I'd like to avoid if possible.

error should still run the local destructors, but if we have something like

create_some_object((*whatever_t) next)
{
    malloc(sizeof(whatever_t)) => ptr;
    init_whatever(ptr) => whatever;
    next(whatever);
    free(whatever);
}

(preferably the pointer from malloc would immediately be thrown into a box or something, that way we could fairly easily handle its lifetime automatically but anyway)

this would now leak memory on each error, whether that be from init_whatever or next. Technically speaking a memory leak is safe, but it should preferably be avoided whenever possible. The fix would be something like

create_some_object((*whatever_t) next)
{
    malloc(sizeof(whatever_t)) => ptr;

    init_whatever(ptr) => whatever;
    !> e {free(ptr); error e;}

    next(whatever);
    !> e {free(ptr); error e;}

    free(ptr);
}

but I'm not entirely certain if this can be automatically detected. In theory I guess we could implement the safe and unsafe attributes and provide wrappers for creating box pointers that immediately fix the issue, but hmm.

Calling closure multiple times

This is a relatively small detail, same as in Rust. Depending on how a closure is used, it can have different properties. If the closure is called just once (probably the most common situation), it is allowed to move captured variables. If called multiple times, it is not, since a previous invocation might've already moved some variables.

Then there are also function pointers that don't have any closure attached to them. Currently I'm thinking of using the syntax

 (int) - closure that can be called once
&(int) - closure that can be called multiple times
*(int) - raw function pointer, can be called multiple times

The semantics for passing these different types around seem to match variables fairly close, where a 'call' is a kind of destructive move. The exact semantics are still unimplemented, though.