RegEx stuff: http://sebfisch.github.com/haskell-regexp/ = Expressions = 0 1 2... -> just push themselves on the stack "string" -> pushes "string" on the stack -> "bareword" /bareword -> "bareword" / |bareword -> "bareword" | \bareword -> "bareword" \ bareword -> lookup "bareword" in current scopes -> passive -> push value on the stack -> active -> call value on current stack -> quote -> call value on current stack, even in quoted mode / -> nop "string" | -> resolve "string" in current scope, push value "string" \ -> resolve "string" in current scope, call value on current stack [ -> push array begin marker on stack ] -> since the topmost array marker, create array from stack values ( -> push tuple begin marker on stack ) -> since the topmost tuple marker, create tuple from stack values { -> push quote begin marker on stack, increase parser quote count } -> since the topmost quote marker, everything becomes one closure (deduce type), decrease parser quote count dup -> dups |dup -> push dup function pointer \dup -> dups |f |g ; -> { f g } { ...1 } variable ; { ...2 } ; -> { ...1 variable } { ...2 } ; -> { ...1 variable ...2 } 1 2 |add * -> 3 |f * -> f |f *10 g -> f g StructuredData.field -> StructuredData "field" . -> dereference field in structured data, if passive push, if active do SD "field" .| -> dereference, push value SD "field" .\ -> dereference, call x "string" = -> dereference, create (passive/active depends on type) field if not existent, set value to x x pointer = -> dereference, set value to x x /pointervar = -> sets the pointer itself x |pointervar = -> sets the value the pointer points to x \pointervar = -> if the pointer points to a value which is a pointer, assign where that pointer points to v "name" deff -> define name as active name, assign v v "name" defv -> define name as passive name, assign v < -> push new scope > -> pop scope, structured data object now on stack < 0 =a 0 =b > -> something which has a and b is on top of stack 0 1 2 _0 -> 0 1 2 2 0 1 2 _1 -> 0 1 2 1 0 1 2 _2 -> 0 1 2 0 0 1 2 -0 -> 2 0 1 2 -1 -> 1 0 1 2 -2 -> 0 exch = -01 pop = - nop = _ dup = _0 _ -> copy stack contents according to digits - -> delete stack contents up to largest digit or count of "-" whichever is larger, recreate according to digits 1 [ 2 3 ] add -> [ 3 4 ] [ [ 1 ] [ 2 ] ] length -> 2 # scanning for applicable base type from top A->int B->int add -> B->A->int A->int A->int add -> A->int = Characters = !: ": string quote #: line comment $: %: &: ': (: tuple begin ): tuple end *: apply function +: ,: -: stack manipulation .: field dereference /: stringify 0-9: numbers :: ;: function composition <: scope start =: assignment >: scope end ?: ternary operator @: A-Z: bareword characters [: array begin \: callify ]: array end ^: _: stack manipulation `: a-z: bareword characters {: quote begin |: passify }: quote end ~: = Memory Management = Inspiration: http://wiki.luajit.org/new-garbage-collector 0x6???????????: Heap memory ("16 TB should be enough for everyone.") 0x5???????????: GC black bitmap 0x4???????????: GC allocation maps Large set of reachable, old objects Large set of unreachable, new objects Small set in between == Object Memory Layout == === Int === * Length in bytes (including header, always 16) bit 63-60: 0 0 0 0 bit 59: reserved for GC * value === String === * Length in bytes (including header) bit 63-60: 0 0 0 1 bit 59: reserved for GC * hash (0 if not yet calculated) * Exact length * data (UTF-8) === Scope === * Length in bytes (including header) bit 63-60: 0 0 1 0 bit 59: reserved for GC bit 58: parent pointer exists (for scopes) (always 1 for now) bit 57: extension area pointer exists (always 1 for now) * name table (mapping entry names to offsets) * parent scope (0 if no parent) * extension area pointer (0 if no extra members (yet)) * data [ ]* === Name Table === ( TODO: create a hash-table based implementation here ) * Length in bytes (including header) bit 63-60: 0 0 1 1 bit 59: reserved for GC * End of fill in bytes (i.e. if this field == length, table is full) * data [ , ]* ( TODO: Oh, the performance... ) The 0th string maps to 0 and so on. === Extension Area === * Length in bytes (including header) bit 63-60: 0 1 0 0 bit 59: reserved for GC * data [ ]* === Function === * Length in bytes (including header) bit 63-60: 0 1 0 1 bit 59: reserved for GC * captured scope pointer (0 if non-capturing function) * type pointer (0 if untyped) * code pointer === Function Code === * Length in bytes (including header) bit 63-60: 0 1 1 0 bit 59: reserved for GC * [ ]* === Array === * Length in bytes (including header) bit 63-60: 0 1 1 1 bit 59: reserved for GC * [ ]* === Function Type === * Length in bytes (including header) bit 63-60: 1 0 0 0 bit 59: reserved for GC * Number of input stack elements * [ ]* * Number of output stack elements * [ ]* ( TODO: Think about representing this directly as array pointers ) ( TODO: Think about moving the counts to the beginning ) = Musings About The Optimizer = Main problem: How do we find out if the names mean what we believe they do? == Runtime Checking == * Optimize for the _current_ values of names * Check whether changes have occured * maybe some kind of version numbering in the scopes + semantically identical to fully dynamic scopes - slow == Dynamic / Static / Constant Scopes, Guessing == * A scope can be static (i.e. names don't change activation mode and no new names) * A scope can be constant (i.e. names don't change values and it is static) * The compiler guesses what is correct * The user can override * Conflicting runtime-changes raise errors + should be doable as long as no joker redefines =, defv, deff + should result in fast code for most methods - surprising runtime errors == Dynamic / Static / Constant Scopes, User Specified == * A scope can be static (i.e. names don't change activation mode and no new names) * A scope can be constant (i.e. names don't change values and it is static) * Default scopes are dynamic, but the user can specify otherwise - many methods would not get optimized because the user forgot to flag the scope == Dynamic / Static / Constant Names == * A name can be static (i.e. it is never overwritten after being resolved in a child scope and remains at the same scope index) * A name can be constant (i.e. its value never changes after being resolved in a child scope) * An optimized version of the code can be generated whenever it is called (using current name values) + fine grained control, sensible defaults possible - even more assignment semantics === Name Properties === * [d] deeply constant: objects referenced (indirectly) from this name can be assumed constant (constant implied) * [c] constant: the object reference of this name can be assumed constant (code constant and static implied) * [t] code / type constant: the final code block (not function) reference and function type can be assumed constant * [s] static: the path from local scope to where the name resolves can be assumed constant the name index within the scope can be assumed constant * [] dynamic: nothing can be assumed * [v] inactive: push value * [f] active: execute unless quoted * [q] quote-active: execute on first resolution (execution is instant) +---------------------------------------------------+ | | | s | t | st | c | d | +---+------+-------+-------+--------+-------+-------+ | v | defv | defvs | defvt | defvst | defvc | defvd | | f | deff | deffs | defft | deffst | deffc | deffd | | q | defq | | | | | | +---+------+-------+-------+--------+-------+-------+ * defaults: |defvst "==" deffd |deffst "=*" deffd * it is possible to globally turn checking (after dynamic resolution) on == Sticky Resolution == * Active names are considered constant after being resolved for the first time * Passive names are considered static after being resolved for the first time - fails to correctly handle the { =*array ... } use case = Musings about Types = [ /foo /bar ] [ 2 3 ] { "Key %s -> %d" format sys .out .writeall } [ "" "" ] [ ] typed * [ /a /b /c ] _ { cat } * # -> [ /aa /bb /cc ] [ /a /b /c ] _ { cat } [ "" "" ] [ "" ] typed * # -> [ /aa /bb /cc ] [ /a /b /c ] _ [ 0 ] [ "" ] typed cat # -> [ [ /aa /ba /ca ] [ /ab /bb /cb ] [ /ac /bc /cc ] ] [ /a /b /c ] _ |cat cross Types are represented by concrete values of that type. Co-iterability is decided by type-match and * integers: co-iterated type is bitwise and, if zero, no co-iteration * strings: co-iterated type is longest common substring, if epsilon, no co-iteration == Musings about function composition operator == Maybe ; should act differently when getting a string, or use another single char for category operations Uses: Categorical operators over Set Anyway: This should be handled by a library, use escape mechanism for ;