FWD
fwd
is a toy language where the idea is that functions/procedures can never
return values, instead all computation happens "forward" by heavily utilizing
closures.
Building
git submodule update --init --recursive
make
Why?
Informally: Objects have a lot of context to keep track of, are all references valid, who owns what, etc. and there have been lots of approaches to control this complexity. This is yet another experiment, based on the idea that we could syntactically specify which scope an object is valid in.
Effectively, imagine that instead of being handed an object and a bunch of rules about how you're allowed to use it, you just specify what code to run and let some implementer ensure that the context is safe for it.
Eventually I should probably write a better explanation, but this is good enough for now.
How?
Functions don't return values, instead they take some amount of closures to
execute. One pretty central feature of fwd
is how these closures are expressed
syntactically, namely that they can be expressed outside of the function
argument list. As an example,
give_me_some_object() => obj1;
insert_something(obj1) => obj2;
do_something_with(obj2);
=>
can be thought of as a closure operator, and in this case the body of the
closure is the rest of the function, so
give_me_some_object(|obj1|{
insert_something(obj1, |obj2|{
do_something_with(obj2);
}
});
It's possible to specify the exact scope for a closure as well:
check_cond(x) => a1 {
/* cond was true */
} => a2 {
/* cond was false */
}
/* runs after check_cond, not 'within' it */
This 'scoping' in combination with move semantics seems like an easy way to syntactically specify both ownership and lifetime of an object, no need for lifetime annotations or extra stuff like that. At least hopefully, this is still a very rough idea and I wouldn't be surprised if I have an excessively optimistic view of this. Will have to play around with the compiler as I develop it. Presumably I'll also want references, but they are as of yet unimplemented.
Examples
See examples/
. Currently fwd
just outputs the C++, so a 'full' compilation
sequence would be something like
fwd examples/uniq.fwd > /tmp/output.cpp
g++ -Ilib -std=c++20 /tmp/output.cpp -o /tmp/output
Motivating case(s)
As an example, let's consider a hashmap. In a regular return-oriented language, you could have four possibilities for inserting elements (assuming move semantics and everything being mutable):
/* (1) owning map, owning element */
new_map = insert(map, element);
/* (2) owning map, reference element */
new_map = insert(map, &element);
/* (3) reference map, owning element */
insert(&map, element);
/* (4) reference map, reference element */
insert(&map, &element);
Of these, cases (1) and (3) is safe. The ownership of element
is transferred
to map
, so the element
stays alive as long as the map
.
(2) and (4) are unsafe, since they depend on how the map is used after the return of the
function.
However, what if the function doesn't return, but rather passes the map on to a
closure? The cases (1) and (3) are effectively the same, but case (2) now
becomes safe, since new_map
is guaranteed to be destroyed before insert
returns, and &element
is owned by someone whose lifetime is longer than
insert
(unless some unsafe fuckery happens). (4) is still unsafe,
technically speaking it would be safe to use within the closure that it calls
but any code outside of that is effectively impossible to reason about. It could
theoretically be considered safe if the element is removed after the closure
exits, so the &map
and &element
are linked only within the function but not
outside it, but I'm not entirely certain how that could be codified in the
move semantics. Besides, at that point, I'm not sure there's a significant difference
between (2) and (4) in practical applications, but more testing is needed.
Current state
The 'meat' of the project is still unimplemented, that being proper move semantics.
There's an initial parser and a small generator for C++ so that I can quickly
get up and running and test out programs. There's also lib/fwdlib.hpp
which
acts as a bridge between C++ and fwd
, and any functions that start with fwd_
are actually just C++ functions. This way I can start experimenting with the
language even in a very crude state, and I don't have to implement containers or
generics myself. At the moment I also kick pretty much all type checking to C++,
eventually I'd probably like to take care of it at a higher level.
Future
If (big if) this experiment turns out to be useful on some level I'll probably
slowly try to make it more self-reliant. My target would be systems programming,
possibly even embedded systems, though I don't yet know if this 'paradigm' is
powerful enough or if I might need some escape hatches like unsafe
in Rust.
Really, no returns?
Semantically, yes, although it is useful to convert certain closure calls to equivalent return statements where possible. This reduces stack usage if nothing else, and existing compilers are more used to return-oriented programming so it might also be possible that the produced assembly is more efficient.
As an example, see examples/fib.fwd
. The naive C++ currently generated
(besides not actually compiling due to template instantiation depth, heh)
converts what's usually a binary tree of successive calls into a linear sequence
that takes up a huge amount of stack memory. However, we can note that a closure
call is equivalent to a return when
- the function has only one closure parameter
- all code paths end in the closure call
- the closure call does not reference any local variables by reference (except references that were taken as parameters)
- all nodes in paths to closure calls can be converted to equivalent returns
Our fibonacci function matches all these cases (assuming fwd_if
would be a
statement instead of a function call as it is now, this was just quicker to do),
and could theoretically be turned into a more typical return-oriented function
for great benefit. Checking these requirements should be relatively
straightforward.
We could potentially loosen up these requirements, for example we could allow multiple closure parameters and just return a variant, although the caller now has to pattern match on the result. Taking things even further, we could start treating functions more as macros and just replace the function call with the function body, although this has some other, more fuzzy, requirements like not allowing recursion.
We might even want to require that a function can be turned into a 'returning' function (being pretentious we might prefer calling it 'folding' but anyway) so that it can be interfaced from C code, for example.
Function calls as arguments
At the moment, it's a bit cumbersome to directly pass closure arguments to other functions:
get_some_value() => n;
do_something_with(n) => ...
If we don't really care about the named variable n
, it might make more sense
to provide syntactic sugar like
do_something_with(get_some_value()) => ...
that translates to the same thing as above, with some internal variable 'name'.
This would also require that get_some_value
has some limitations, like a
single closure argument that consists of a single value. This might be
particularly significant for condition checking, like
// assuming this is the form of if statements, as they're still not in the
// grammar
get_some_value() => cond;
if cond {...}
versus
if get_some_value () {...}
Error handling
At the moment there's really no way to handle errors. You could use guard
statements, see examples/guard.fwd
, but they are rather limited, in that the
caller has no idea if the guard was executed or not. You could potentially
terminate the program directly, which is fine in some cases, but we might want
some way to signal to the caller that something has gone wrong and they might
want to choose what happens.
This might be a case where exceptions or something like exceptions could be
used, though probably a more limited version of them, maybe something like
Haskell's error
that only takes strings, or some other value type that cannot
possibly fail itself. Monadic (hope I'm using that word right) errors do seem
kind of inviting, where if one statement fails out, the error is just propagated
upwards.
Code that excplicitly wants to handle its errors would now look something like
(assuming !>
means "handle error")
do_something() => n {
/* ok */
}
!> e {
/* error branch, if omitted just throws the error to whoever called us? */
println("caught an error :((((");
println(e); /* if the error value is just a constant string */
error "now I have overwritten your error, lol";
/* possibly also attaches file and line numbers or something? Or is that too heavy?*/
}
Some semantics are still a bit unclear, like how multiple calls should be handled:
do_something() => n;
do_something_with(n) => m;
...
!> e {
/* who does this belong to? Should this be allowed at all? */
/* at the moment I'm leaning towards binding to the closest function call */
}
It might also be a bit confusing to follow the code of execution:
do_something() => n;
!> e {...}
do_something_with(n) => m;
...
would effectively be something like
err_t err = do_something(|n|{
do_something_with(n, |m|{
...
});
});
if (err) {...}
Now it's also a bit unclear if one should use error statements,
std::optional
or std::expected
to signify that something failed. I guess
it might depend on the context, but having to deal with multiple ways a function
might fail is a bit annoying in any language and something I'd like to avoid if
possible.
error
should still run the local destructors, but if we have something like
create_some_object((*whatever_t) next)
{
malloc(sizeof(whatever_t)) => ptr;
init_whatever(ptr) => whatever;
next(whatever);
free(whatever);
}
(preferably the pointer from malloc
would immediately be thrown into a box
or something, that way we could fairly easily handle its lifetime automatically
but anyway)
this would now leak memory on each error
, whether that be from init_whatever
or next
. Technically speaking a memory leak is safe, but it should preferably be avoided whenever possible.
The fix would be something like
create_some_object((*whatever_t) next)
{
malloc(sizeof(whatever_t)) => ptr;
init_whatever(ptr) => whatever;
!> e {free(ptr); error e;}
next(whatever);
!> e {free(ptr); error e;}
free(ptr);
}
but I'm not entirely certain if this can be automatically detected. In theory I
guess we could implement the safe
and unsafe
attributes and provide wrappers
for creating box
pointers that immediately fix the issue, but hmm.
Calling closure multiple times
This is a relatively small detail, same as in Rust. Depending on how a closure is used, it can have different properties. If the closure is called just once (probably the most common situation), it is allowed to move captured variables. If called multiple times, it is not, since a previous invocation might've already moved some variables.
Then there are also function pointers that don't have any closure attached to them. Currently I'm thinking of using the syntax
(int) - closure that can be called once
&(int) - closure that can be called multiple times
*(int) - raw function pointer, can be called multiple times
The semantics for passing these different types around seem to match variables fairly close, where a 'call' is a kind of destructive move. The exact semantics are still unimplemented, though.