doc/notes


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529

= Expressions =

0 1 2...  -> just push themselves on the stack
"string"  -> pushes "string" on the stack
<non-bareword><bareword> -> "bareword" <non-bareword>
  /bareword -> "bareword" /
  |bareword -> "bareword" |
  \bareword -> "bareword" \

bareword  -> lookup "bareword" in current scopes
  -> passive -> push value on the stack
  -> active  -> call value on current stack
  -> quote   -> call value on current stack, even in quoted mode

/ -> nop
"string" | -> resolve "string" in current scope, push value
"string" \ -> resolve "string" in current scope, call value on current stack

[ -> push array begin marker on stack
] -> since the topmost array marker, create array from stack values

( -> push tuple begin marker on stack
) -> since the topmost tuple marker, create tuple from stack values

{ -> push quote begin marker on stack, increase parser quote count
} -> since the topmost quote marker, everything becomes one closure (deduce type), decrease parser quote count

add -> adds
|add -> push add function pointer
\add -> adds

|f |g ; -> { f g }    (modulo subtle scoping differences)
{ ...1 } variable ; { ...2 } ; -> { ...1 variable } { ...2 } ; -> { ...1 variable ...2 }
1 2 |add * -> 3

|f * -> f

StructuredData.field -> StructuredData "field" . -> dereference field in structured data, if passive push, if active do
SD "field" .| -> dereference, push value
SD "field" .\ -> dereference, call

v "name" deff -> define name as active name, assign v
v "name" defv -> define name as passive name, assign v
v =name -> set current value of name to v, leave modes intact

< -> push new scope
> -> pop scope, structured data object now on stack
< 0 =a 0 =b > -> something which has a and b is on top of stack
< ... parent >' -> push scope with non-local parent (at the end only)

0 1 2 _    -> 0 1 2 2
0 1 2 -0   -> 0 1 2
0 1 2 -000 -> 0 1 2 2 2
0 1 2 -020 -> 2 0 2
0 1 2 -1   -> 0 1
0 1 2 -2   -> 0

exch = -01
pop = --
dup = _

-<digits> -> delete stack contents up to largest digit, recreate according to digits

= Characters =

!: co-routines and threads
": string quote
#: line comment, complex scope types
$: <open>
%: <open>
&: <open>
': type assignment
(: tuple begin
): tuple end
*: apply function
+: <open>
,: <open>
-: stack manipulation
.: field dereference
/: stringify
0-9: numbers
:: <open>
;: function composition
<: scope start
=: assignment
>: scope end
?: ternary operator, error functions
@: <open>
A-Z: bareword characters
[: array begin
\: callify
]: array end
^: <open>
_: stack manipulation
`: <open>
a-z: bareword characters
{: quote begin
|: passify
}: quote end
~: <open>

= Register Meanings =

r15: call stack and internal storage
rsp: data stack
r14: current scope
r13: current coroutine

= Stuff on the Stack =

0: zero
1: one
?: data
x: don't care

1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx????????????????????????????????
unboxed integer

00000000000000000110????????????????????????????????????????????
reference to heap object

0000000000000000000000000000000000000000000000000000000000000001
array begin marker

0000000000000000000000000000000000000000000000000000000000000010
quote begin marker

0101010101010101010101010101010101010101010101010101010101010101
bottom of stack marker

0000000000000000000000000000000000000000000000000000000000000100
error data marker (only for external debugging)

0000000000000000000000000000000000000000000000000000000000000101
dynamic variable binding marker (call stack only)
next two cells (upwards into the stack)
* reference to dynamic variable object
* reference to current value

= Memory Management =

== Currently implemented ==

Inspiration: http://wiki.luajit.org/new-garbage-collector

0x6???????????: Heap memory ("16 TB should be enough for everyone.")
0x5???????????: GC block bitmap
0x4???????????: GC mark bitmap
0x3???????????: miscellaneous allocations (initial assembly in particular)

== Musings about possibly better schemes ==

0x500?????????: GC block bitmap
0x501?????????: GC block bitmap
0x502?????????: GC mark bitmap
0x503?????????: GC mark bitmap
0x504?????????: GC free bitmap
0x505?????????: GC free bitmap
0x506?????????: GC "all cells used"-tree
0x507?????????: GC "all cells used"-tree
0x508?????????: GC "any cells used"-tree
0x509?????????: GC "any cells used"-tree

Large set of reachable, old objects
Large set of unreachable, new objects
Small set in between

=== How to find free space ===

A valid state:

1                16-cell element blocked
1       1         8-cell element blocked
1   0   1   1     4-cell element blocked
1 1 0 0 1 1 1 1   2-cell element blocked
1011000010101010  usage of cells

Tree-Layout:

E
6   D
2 5 9 C
013478AB

Right child: 1 sub
Left child: 1 height level sub shl sub 

Algorithm of allocation (of byte-count n):

Find correct level m = height-log_2(n/16) (from top).
Walk zeros in "all cells used"-tree down to level m-1.
Find zero in "any cells used"-tree on level m.
Set "all cells used"-bit on level m (or multiple bits on the lower levels according to object size)
  Propagate as necessary.
Set "any cells used"-bits on levels m to 0.

Overhead: 1/32.

== Object Memory Layout ==

=== Int ===
* Length in bytes (including header, always 16)
  bit 63-60: 0 0 0 0
  bit 59: reserved for GC
* value

=== String ===
* Length in bytes (including header)
  bit 63-60: 0 0 0 1
  bit 59: reserved for GC
* hash (0 if not yet calculated)
* Exact length
* data (UTF-8)

=== Float ===
* Length in bytes (including header, always 16)
  bit 63-60: 0 0 1 0
  bit 59: reserved for GC
* double precision value

=== Extension Area ===
* Length in bytes (including header)
  bit 63-60: 0 1 0 0
  bit 59: reserved for GC
* data
  [ <reference> ]*

=== Function ===
* Length in bytes (including header)
  bit 63-60: 0 1 0 1
  bit 59: reserved for GC
* captured scope pointer (0 if non-capturing function)
* type pointer (0 if untyped)
* code pointer

=== Function Code ===
* Length in bytes (including header)
  bit 63-60: 0 1 1 0
  bit 59: reserved for GC
  bit 58: optimized or being optimized or excluded from optimization
* Length of opcode block (rounded to 8 byte)
* [ <opcode> ]*
* [ <object pointer> ]*

=== Array ===
* Length in bytes (including header)
  bit 63-60: 0 1 1 1
  bit 59: reserved for GC
* [ <object pointer> ]*

=== Function Type ===
* Length in bytes (including header)
  bit 63-60: 1 0 0 0
  bit 59: reserved for GC
* Number of input stack elements
* [ <input type pointer> ]*
* Number of output stack elements
* [ <output type pointer> ]*
  ( TODO: Think about representing this directly as array pointers )
  ( TODO: Think about moving the counts to the beginning )

=== Scope ===
* Length in bytes (including header)
  bit 63-60: 1 0 0 1
  bit 59: reserved for GC
  bit 58: parent pointer exists (for scopes) (always 1 for now)
  bit 57: extension area pointer exists (always 1 for now)
* name table (mapping entry names to offsets)
* parent scope (0 if no parent)
* extension area pointer (0 if no extra members (yet))
* data
  [ <reference> ]*

=== Name Table (hashed implementation) ===
* Length in bytes (including header)
  bit 63-60: 1 0 1 0
  bit 59: reserved for GC
  bit 58: template table, needs copy on write
* Number of keys (also next scope entry index to assign)
* Hash buckets (16 bytes each)
  [
    <string reference>  (8 bytes)
    <execution mode>    (4 bytes)
    <scope entry index> (4 bytes)
  ]*

=== Stack ===
* Length in bytes (including header)
  bit 63-60: 1 0 1 1
  bit 59: reserved for GC
* Current stack pointer value
* <data ...>
* bottom of stack marker

=== Coroutine State ===
* Length in bytes (including header)
  bit 63-60: 1 1 0 0
  bit 59: reserved for GC
* Instruction pointer
* Current scope pointer
* Call stack object pointer (may be zero to indicate empty stack)
* Data stack object pointer (may be zero to indicate empty stack)

= Musings About The Optimizer =

Main problem: How do we find out if the names mean what we believe they do?

== Runtime Checking ==
* Optimize for the _current_ values of names
* Check whether changes have occured
  * maybe some kind of version numbering in the scopes
+ semantically identical to fully dynamic scopes
- slow

== Dynamic / Static / Constant Scopes, Guessing ==
* A scope can be static (i.e. names don't change activation mode and no new names)
* A scope can be constant (i.e. names don't change values and it is static)
* The compiler guesses what is correct
  * The user can override
* Conflicting runtime-changes raise errors
+ should be doable as long as no joker redefines =, defv, deff
+ should result in fast code for most methods
- surprising runtime errors

== Dynamic / Static / Constant Scopes, User Specified ==
* A scope can be static (i.e. names don't change activation mode and no new names)
* A scope can be constant (i.e. names don't change values and it is static)
* Default scopes are dynamic, but the user can specify otherwise
- many methods would not get optimized because the user forgot to flag the scope

== Dynamic / Static / Constant Names (implemented) ==
* A name can be static (i.e. it is never overwritten after being resolved in a child scope and remains at the same scope index)
* A name can be constant (i.e. its value never changes after being resolved in a child scope)
* An optimized version of the code can be generated whenever it is called (using current name values)
+ fine grained control, sensible defaults possible
- even more assignment semantics
=== Name Properties ===
* [d] deeply constant: objects referenced (indirectly) from this name can be assumed constant (constant implied)
* [c] constant: the object reference of this name can be assumed constant (code constant implied)
* [t] code / type constant: the function type can be assumed constant (including whether this function has no captured scope)
* [s] static: the path from local scope to where the name resolves can be assumed constant
          the name index within the scope can be assumed constant
          the activation level can be assumed constant
* [] dynamic: nothing can be assumed
* [v] inactive: push value
* [f] active: execute unless quoted
* [m] member: like active, but push scope resolving in first
* [q] quote-active: execute on first resolution (execution is instant)
+---------------------------------------------------+
|   |      |   s   |   t   |   st   |   c   |   d   |
+---+------+-------+-------+--------+-------+-------+
| v | defv | defvs | defvt | defvst | defvc | defvd |
| f | deff | deffs | defft | deffst | deffc | deffd |
| m | defm | defms | defmt | defmst | defmc | defmd |
| q | defq |       |       |        |       |       |
+---+------+-------+-------+--------+-------+-------+
* defaults: |defvs "==" deffd |deffs "=*" deffd |defms "=." deffd
* it is possible to globally turn checking (after dynamic resolution) on

== Sticky Resolution ==
* Active names are considered constant after being resolved for the first time
* Passive names are considered static after being resolved for the first time
- fails to correctly handle the { =*array ... } use case

= Musings about Types =

1 [ 2 3 ] add -> [ 3 4 ]
[ [ 1 ] [ 2 ] ] len -> 2     # scanning for applicable base type from top

A->int B->int add    ->  B->A->int
A->int A->int add    ->  A->int

# argument order to '' is from stack-top to stack-lower
[ 1 2 3 ] [ /foo /bar /quux ] { defv }' [ "" 1 ] [ ] '' *

[ /foo /bar ] [ 2 3 ] { "Key %s -> %d" format sys .out .writeall } [ "" "" ] [ ] '' *
[ /a /b /c ] _ { cat } *                        # -> [ /a /b /c /a /b /c ]
[ /a /b /c ] _ { cat } [ 1 1 ] [ "" ] '' *    # -> [ /aa /bb /cc ]
[ /a /b /c ] _ [ 0 ] [ "" ] '' cat              # -> [ [ /aa /ba /ca ] [ /ab /bb /cb ] [ /ac /bc /cc ] ]
[ /a /b /c ] _ |cat cross

Types are represented by concrete values of that type.
Scalars (i.e. integers, strings, floats) are all represented by integers.

Co-iterability is decided by type-match and
* integers: co-iteration only for equal non-zero integers

Question: Is it better to handle scalars uniformly or handle strings (or floats) as a separate type?
Question: How to deal with functions like eq, which can take all scalars?

== Variant: Scalar-Type (implemented) ==

eq [ sys .typed .SCALAR _ ] [ Int ] ''
{ 1 eq } { "foo" eq } and # [ sys .typed .SCALAR ] [ Int ] # -- -- 0
{ 1 eq } '1 { "foo" eq } '2 and # [ sys .typed .SCALAR _ ] [ Int ] # 1 eq -01 "foo" eq and

== Variant: Type-Variables ==

eq [ sys .typed .var _ ] [ Int ] ''
{ 1 eq } { "foo" eq } and # [ $X $Y ] [ Int ] # 1 eq -01 "foo" eq and

== Variant: Scalar Argument-Selector ==

=== Obvious Case 1 ===
1 [ 1 2 3 ] add => [ 2 3 4 ]
[ 1 2 3 ] 1 add => [ 2 3 4 ]
Because:
[ 1 1 1 ] [ 1 2 3 ] add => [ 2 3 4 ]

=== Obvious Case 2 ===
If f, g unrelated:
|f |g h => { f ==x g ==y y x h } *

|add |sub mul => { ==x ==y x y add x y sub mul } *

WTF??? Hence:
Say f, g have a type which specifies which stack position they'd like to consume, like so:
|f type => [ 2 0 ] . [ 0 ]
|g type => [ 1 0 ] . [ 0 ]
then
|f |g h => { ==z ==y ==x y x g ==G z x f ==F F G h } *

=== Obvious Case 3 ===
{ -- { -- { 1 } } } =*f (i.e. the constant function from two arguments to 1)
|f |f add => { -- { -- { 2 } } } (i.e. the constant function from two arguments to 2)

With the above:
|f type => [ 0 ] . [ [ 0 ] . 0 ]

=== Case 4 ===
{ _ } =*dup (clearly with type [ 0 ] . [ 0 0 ])

|dup |add ; => { _ add }
X |dup add => { ==y y dup ==D1 ==D2 X D2 add D1 }
|dup X add => { ==y y dup ==D1 ==D2 D2 X add D1 }

=== Case 5 ===


f: A->B->X
   |  |
   v  v
   a0 b0
   |
   v
   a1

= Musings about function composition operator =

Maybe ; should act differently when getting a string, or use another single char for category operations

Uses: Categorical operators over Set
Anyway: This should be handled by a library, use escape mechanism for ;

= Musings about the API =

sys .file
sys .net
sys .net .tcpip

  "127.0.0.1" 80 connect
  "www.google.de" 80 connect

sys .poll
  create -> p
  file userdata p .add
  file p .remove
  timeout p .poll

sys .linux

str
  ... "format" sprintf
  string "format" sscanf 
  end string prefix
    { 0 end string infix }
  start string postfix
    { start string len infix }
  start end string infix
    { start end range string * }

str .encode
  string from to encode

str .utf8

bin
  ... "format" bprintf
  string "format" bscanf

math
  [ 1 2 3 ] 16 unbase -> 1+2*16+3*16*16
  12345 10 base -> [ 5 4 3 2 1 ]

= Musing about asm regex engine =

Pseudocode: x86 assembly.
Program counter: Instruction offset relative to start of pseudocode.
Thread: Instruction counter + capture data
Threadlist:
[
  <string header>
  <size>
  <bitfield of contained PCs>
  [ # thread data
    <pc>
    <capture data>
  ]*
]

= Musing about co-routines =

* Includes data stack, call stack, rip

{ ... } !! # create co-routine with empty data and call stack
{ ...2 } { ...1 } !!' # create co-routine with full copy, starting with ...1, push it to stack, then continue execution at ...2
{ ... } !!" # create co-routine with shared data stack and empty call stack
CR      !!_ # clone co-routine, full copy of all stacks

... CR n ! # pass control and n stack elements to CR, passes "calling" CR object on top of stack
... CR *   # pass control to CR, don't switch data stack, passes "calling" CR object on top of stack
... f  n ! # pass control to function f, ignore n
... f *    # pass control to function f

{ { ==badError ==error
  badError error doStuffWhichFailsInVariousWays
} { "a bad error occured, but we caught it" dump } !!'
} { "a non-bad error occured, but we caught it" dump } !!'