aboutsummaryrefslogtreecommitdiff
path: root/README.md
blob: c3ecf85007f2d4fac00d690329519e06407ff6b0 (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
# FWD

`fwd` is a toy language where the idea is that functions/procedures can never
return values, instead all computation happens "forward" by heavily utilizing
closures.

## Why?

Informally: Objects have a lot of context to keep track of, are all references
valid, who owns what, etc. and there have been lots of approaches to control
this complexity. This is yet another experiment, based on the idea that we could
syntactically specify which scope an object is valid in.

Effectively, imagine that instead of being handed an object and a bunch of rules
about how you're allowed to use it, you just specify what code to run  and let
some implementer ensure that the context is safe for it.

Eventually I should probably write a better explanation, but this is good enough
for now.

## How?

Functions don't return values, instead they take some amount of closures to
execute. One pretty central feature of `fwd` is how these closures are expressed
syntactically, namely that they can be expressed outside of the function
argument list. As an example,

```
give_me_some_object() => obj1;
insert_something(obj1) => obj2;
do_something_with(obj2);
```

`=>` can be thought of as a closure operator, and in this case the body of the
closure is the rest of the function, so
```
give_me_some_object(|obj1|{
    insert_something(obj1, |obj2|{
        do_something_with(obj2);
    }
});
```

It's possible to specify the exact scope for a closure as well:
```
check_cond(x) => a1 {
    /* cond was true */
} => a2 {
    /* cond was false */
}
/* runs after check_cond, not 'within' it */
```

This 'scoping' in combination with move semantics seems like an easy way to
syntactically specify both ownership and lifetime of an object,
no need for lifetime annotations or extra stuff like that.
At least hopefully, this is still a very rough idea and I wouldn't be surprised
if I have an excessively optimistic view of this. Will have to play around with
the compiler as I develop it. Presumably I'll also want references, but they are
as of yet unimplemented.


## Examples

See `examples/`. Currently `fwd` just outputs the C++, so a 'full' compilation
sequence would be something like

```
fwd examples/uniq.fwd > /tmp/output.cpp
g++ -Ilib -std=c++20 /tmp/output.cpp -o /tmp/output
```

### Motivating case(s)

As an example, let's consider a hashmap. In a regular return-oriented language,
you could have four possibilities for inserting elements (assuming move
semantics and everything being mutable):
```
/* (1) owning map, owning element */
new_map = insert(map, element);

/* (2) owning map, reference element */
new_map = insert(map, &element);

/* (3) reference map, owning element */
insert(&map, element);

/* (4) reference map, reference element */
insert(&map, &element);
```

Of these, cases (1) and (3) is safe. The ownership of `element` is transferred
to `map`, so the `element` stays alive as long as the `map`.
(2) and (4) are unsafe, since they depend on how the map is used after the return of the
function.

However, what if the function doesn't return, but rather passes the map on to a
closure? The cases (1) and (3) are effectively the same, but case (2) now
becomes safe, since `new_map` is guaranteed to be destroyed before `insert`
returns, and `&element` is owned by someone whose lifetime is longer than
`insert` (unless some unsafe fuckery happens). (4) is still unsafe,
*technically* speaking it would be safe to use within the closure that it calls
but any code outside of that is effectively impossible to reason about. It could
theoretically be considered safe if the element is removed after the closure
exits, so the `&map` and `&element` are linked only within the function but not
outside it, but I'm not entirely certain how that could be codified in the
move semantics. Besides, at that point, I'm not sure there's a significant difference
between (2) and (4) in practical applications, but more testing is needed.

## Current state

The 'meat' of the project is still unimplemented, that being proper move semantics.
There's an initial parser and a small generator for C++ so that I can quickly
get up and running and test out programs. There's also `lib/fwdlib.hpp` which
acts as a bridge between C++ and `fwd`, and any functions that start with `fwd_`
are actually just C++ functions. This way I can start experimenting with the
language even in a very crude state, and I don't have to implement containers or
generics myself. At the moment I also kick pretty much all type checking to C++,
eventually I'd probably like to take care of it at a higher level.

## Future

If (big if) this experiment turns out to be useful on some level I'll probably
slowly try to make it more self-reliant. My target would be systems programming,
possibly even embedded systems, though I don't yet know if this 'paradigm' is
powerful enough or if I might need some escape hatches like `unsafe` in Rust.

### Really, no returns?

Semantically, yes, although it is useful to convert certain closure calls to
equivalent return statements where possible. This reduces stack usage if nothing
else, and existing compilers are more used to return-oriented programming so it
might also be possible that the produced assembly is more efficient.

As an example, see `examples/fib.fwd`. The naive C++ currently generated
(besides not actually compiling due to template instantiation depth, heh)
converts what's usually a binary tree of successive calls into a linear sequence
that takes up a huge amount of stack memory. However, we can note that a closure
call is equivalent to a return when

1. the function has only one closure parameter
2. all code paths end in the closure call
3. the closure call does not reference any local variables by reference
    (except references that were taken as parameters)
4. all nodes in paths to closure calls can be converted to equivalent returns

Our fibonacci function matches all these cases (assuming `fwd_if` would be a
statement instead of a function call as it is now, this was just quicker to do),
and could theoretically be turned into a more typical return-oriented function
for great benefit. Checking these requirements should be relatively
straightforward.

We could potentially loosen up these requirements, for example we could allow
multiple closure parameters and just return a variant, although the caller now
has to pattern match on the result. Taking things even further, we could start
treating functions more as macros and just replace the function call with the
function body, although this has some other, more fuzzy, requirements like not
allowing recursion.

We might even want to require that a function can be turned into a 'returning'
function (being pretentious we might prefer calling it 'folding' but anyway) so
that it can be interfaced from C code, for example.

### Function calls as arguments

At the moment, it's a bit cumbersome to directly pass closure arguments to other
functions:
```
get_some_value() => n;
do_something_with(n) => ...
```

If we don't really care about the named variable `n`, it might make more sense
to provide syntactic sugar like
```
do_something_with(get_some_value()) => ...
```

that translates to the same thing as above, with some internal variable 'name'.
This would also require that `get_some_value` has some limitations, like a
single closure argument that consists of a single value. This might be
particularly significant for condition checking, like
```
// assuming this is the form of if statements, as they're still not in the
// grammar
get_some_value() => cond;
if cond {...}
```

versus
```
if get_some_value () {...}
```

### Error handling

At the moment there's really no way to handle errors. You could use guard
statements, see `examples/guard.fwd`, but they are rather limited, in that the
caller has no idea if the guard was executed or not. You could potentially
terminate the program directly, which is fine in some cases, but we might want
some way to signal to the caller that something has gone wrong and they might
want to choose what happens.

This might be a case where exceptions or something like exceptions could be
used, though probably a more limited version of them, maybe something like
Haskell's `error` that only takes strings, or some other value type that cannot
possibly fail itself. Monadic (hope I'm using that word right) errors do seem
kind of inviting, where if one statement fails out, the error is just propagated
upwards.

Code that excplicitly wants to handle its errors would now look something like
(assuming `!>` means "handle error")
```
do_something() => n {
    /* ok */
}
!> e {
    /* error branch, if omitted just throws the error to whoever called us? */
    println("caught an error :((((");
    println(e); /* if the error value is just a constant string */
    error "now I have overwritten your error, lol";
    /* possibly also attaches file and line numbers or something? Or is that too heavy?*/
}
```

Some semantics are still a bit unclear, like how multiple calls should be
handled:

```
do_something() => n;
do_something_with(n) => m;
...
!> e {
    /* who does this belong to? Should this be allowed at all? */
    /* at the moment I'm leaning towards binding to the closest function call */
}
```


It might also be a bit confusing to follow the code of execution:
```
do_something() => n;
!> e {...}
do_something_with(n) => m;
...
```

would effectively be something like
```
err_t err = do_something(|n|{
    do_something_with(n, |m|{
        ...
    });
});
if (err) {...}
```

Now it's also a bit unclear if one should use error statements,
`std::optional` or `std::expected` to signify that something failed. I guess
it might depend on the context, but having to deal with multiple ways a function
might fail is a bit annoying in any language and something I'd like to avoid if
possible.

`error` should still run the local destructors, but if we have something like
```
create_some_object((*whatever_t) next)
{
    malloc(sizeof(whatever_t)) => ptr;
    init_whatever(ptr) => whatever;
    next(whatever);
    free(whatever);
}
```
(preferably the pointer from `malloc` would immediately be thrown into a `box`
or something, that way we could fairly easily handle its lifetime automatically
but anyway)

this would now leak memory on each `error`, whether that be from `init_whatever`
or `next`. Technically speaking a memory leak is safe, but it should preferably be avoided whenever possible.
The fix would be something like
```
create_some_object((*whatever_t) next)
{
    malloc(sizeof(whatever_t)) => ptr;

    init_whatever(ptr) => whatever;
    !> e {free(ptr); error e;}

    next(whatever);
    !> e {free(ptr); error e;}

    free(ptr);
}
```

but I'm not entirely certain if this can be automatically detected. In theory I
guess we could implement the `safe` and `unsafe` attributes and provide wrappers
for creating `box` pointers that immediately fix the issue, but hmm.

### Calling closure multiple times

This is a relatively small detail, same as in Rust. Depending on how a closure
is used, it can have different properties. If the closure is called just once
(probably the most common situation), it is allowed to move captured variables.
If called multiple times, it is not, since a previous invocation might've
already moved some variables.

Then there are also function pointers that don't have any closure attached to
them. Currently I'm thinking of using the syntax
```
 (int) - closure that can be called once
&(int) - closure that can be called multiple times
*(int) - raw function pointer, can be called multiple times
```

The semantics for passing these different types around seem to match variables
fairly close, where a 'call' is a kind of destructive move. The exact semantics
are still unimplemented, though.