| author | Marshall Lochbaum <mwlochbaum@gmail.com> | 2022-04-17 22:08:30 -0400 |
|---|---|---|
| committer | Marshall Lochbaum <mwlochbaum@gmail.com> | 2022-04-17 22:08:30 -0400 |
| commit | 6d08e831d160f0293f4f3587fef57e81008132fc (patch) | |
| tree | 4e9116c25ef7f1b535fd3f7fff8112801615c8f3 | |
| parent | a1b60a18922e578a97e50efe1a2e29863b6f8d92 (diff) | |
Minor editing
| -rw-r--r-- | docs/implementation/compile/intro.html | 10 |
| -rw-r--r-- | implementation/compile/intro.md | 10 |
2 files changed, 10 insertions, 10 deletions
diff --git a/docs/implementation/compile/intro.html b/docs/implementation/compile/intro.html index 957baf32..2197dc4e 100644 --- a/docs/implementation/compile/intro.html +++ b/docs/implementation/compile/intro.html @@ -6,7 +6,7 @@ <div class="nav">(<a href="https://github.com/mlochbaum/BQN">github</a>) / <a href="../../index.html">BQN</a> / <a href="../index.html">implementation</a> / <a href="index.html">compile</a></div> <h1 id="array-language-compilation-in-context"><a class="header" href="#array-language-compilation-in-context">Array language compilation in context</a></h1> <p>Chances are that if you're reading this, you know the difference between an interpreter and a compiler, and what separates C from interpreted Python. Good! You have a lot more to learn to understand what compilation means for the array language paradigm. The most important is that, for an array language, compiling isn't automatically better in the way that Python compilers tend to run a lot faster than interpreters. The two strategies have different advantages, and a hybrid approach could be much better than either alone. Here we'll discuss what makes BQN fast, what keeps it from always being fast, and how compiling might help.</p> -<p>The most important piece of context is that you don't actually care about performance. On array programming forums there are many questions about how to speed up code or what the fastest approach would be. Nine times out of ten, or more, such a post is about a program that takes a fraction of a second—sometimes, under a millisecond, sometimes, under a microsecond. The program was already fast enough! In other cases the answer is almost never to switch languages, but to tweak the code slightly or use a different algorithm. 
The use cases that call for a general-purpose language and demand full hardware efficiency are rare, and it's much more important to make a language that is easy to write code in than one that runs it quickly.</p> +<p>Scratch that, the <em>most</em> important piece of context is that you don't actually care about performance. On array programming forums there are many questions about how to speed up code or what the fastest approach would be. Nine times out of ten, or more, such a post is about a program that takes a fraction of a second—sometimes, under a millisecond, sometimes, under a microsecond. The program was already fast enough! In other cases the answer is almost never to switch languages, but to tweak the code slightly or use a different algorithm. The use cases that call for a general-purpose language and demand full hardware efficiency are rare, and it's much more important to make a language that is easy to write code in than one that runs it quickly.</p> <h2 id="array-interpreters"><a class="header" href="#array-interpreters">Array interpreters</a></h2> <p>Learn to run the code before you can tree-walk the source.</p> <h3 id="fast-and-slow-evaluation"><a class="header" href="#fast-and-slow-evaluation">Fast and slow evaluation</a></h3> @@ -19,7 +19,7 @@ <p>C requires the programmer to specify the types used, and more generally a statically typed language is one where the type of a particular value is fixed based on the source code (or each copy of the value, with generics or other code generation mechanisms).</p> <p>BQN uses dynamic types, but in each array it tries to determine a common element type: for example it might notice that every element can be represented as a signed two-byte integer. The type is stored only once in the array header, and all elements are packed into memory using the appropriate type. 
So the two-byte integer array would be stored using a fixed-size header (which mostly consists of the shape and other non-type things), then two bytes for each element.</p> <p>I sometimes think of C's approach to types as <strong>temporally homogeneous</strong>, because once you write a function, a call one minute has the same types as a call the next, and BQN's as <strong>spatially homogeneous</strong>, because in an array each element has the same type as another one address over.</p> -<p>The temporal/spatial distinction isn't the same as static versus dynamic typing. Array elements in a statically-typed language are <em>both</em> temporally and spatially homogeneous, although in my opinion C compilers today only properly take advantage of the temporal part. And a dynamically-typed language can take advantage of temporal homogeneity with a combination of type checking and type inference, like a Javascript JIT compiler does.</p> +<p>The temporal/spatial distinction isn't the same as static versus dynamic typing. Array elements in a statically-typed language are <em>both</em> temporally and spatially homogeneous, although in my opinion C compilers today only properly take advantage of the temporal part. And a dynamically-typed language can take advantage of temporal homogeneity with a combination of type inference and runtime checks, like a Javascript JIT compiler does.</p> <h3 id="and-which-is-better"><a class="header" href="#and-which-is-better">And which is better</a></h3> <p>Because static typing lets the programmer say exactly what type the data should have, it ought to be faster, right? Well that's a bold hypothesis, C Programming Language, given that you spent the entire 90s working to convince programmers that it's better to let the compiler choose which assembly instructions are run and in what order.</p> <p>Array type detection can be both fast and accurate. 
First off, scanning a BQN array to get the appropriate type is very quick to do with vector instructions, particularly if it's an integer array. But also, this isn't usually necessary, because most primitives have an obvious result type. Structural manipulation leaves the type unchanged, and arithmetic can be done with built-in overflow checking, adding about 10% overhead to the operation in a typical case.</p> @@ -37,17 +37,17 @@ <p>Most array language compilers, and certainly the most successful ones, are based on static typing. Typed array languages have become a major subject of research in the last ten years or so, and the following two groups have arisen:</p> <ul> <li>ML-family (like Haskell): <a href="https://github.com/diku-dk/futhark">Futhark</a> has size-dependent types and <a href="https://github.com/google-research/dex-lang">Dex</a> has fully dependent types</li> -<li>MLIR and XLA, low-level drivers of deep learning frameworks like TensorFlow</li> +<li><a href="https://mlir.llvm.org/">MLIR</a> and XLA, low-level drivers of deep learning frameworks like TensorFlow</li> </ul> <p>One of the reasons to cover static types first is that all of these languages are promising compilation targets for BQN code when types have been determined. They're mostly designed for large arrays, with a significant focus on GPUs. This would make them a sort of a specialized tool in an array language, where most programs use many small and medium arrays even if the goal is to manipulate large ones.</p> <p>Three major efforts to apply ahead-of-time compilation to APL are <a href="https://www.snakeisland.com/apexup.htm">APEX</a>, <a href="https://github.com/melsman/apltail">apltail</a>, and <a href="https://github.com/Co-dfns/Co-dfns">Co-dfns</a>. It's worth noting that none has had any apparent uptake in the APL world. 
I think this is because they have to leave out significant functionality to compile ahead of time, and because performance doesn't actually matter to APL programmers, as mentioned in the introduction. From my reading it appears that APEX is statically typed, since each value in the source code can have only one type, and declarations (written as APL comments) are required to disambiguate if there are multiple possibilities. apltail and Co-dfns are dynamically typed, but apltail compiles to Typed Array Intermediate Language (TAIL), which, as the name insists, is statically typed. As far as I know TAIL is used only as a target for apltail. Some effort has been expended on the <a href="https://github.com/henrikurms/tail2futhark">tail2futhark</a> project, but it's no longer maintained.</p> <h3 id="compiling-with-dynamic-types"><a class="header" href="#compiling-with-dynamic-types">Compiling with dynamic types</a></h3> <p>Moving to dynamically-typed languages, the actual compilation isn't going to change that much. What we are interested in is types. When and how are they determined for values that haven't been created yet?</p> -<p>First I think it's worth discussing <a href="https://julialang.org/">Julia</a>, which I would describe as the most successful compiled dynamically-typed array language. Each array has a particular type, and it sticks with it: for example if you multiply two <code><span class='Function'>Int8</span></code> arrays then the results will wrap around rather than increasing the type. But functions can accept many different argument types. Julia does this by compiling a function again whenever it's called on types it hasn't seen before. The resulting function is fast, but the time spent compiling causes significant delays. The model of arrays with a fixed type chosen from many options is the same as NumPy, which follows a traditional interpreted model. But it's different from APL and BQN, which have only one number type and optimize using subsets. 
J and K sit somewhere in between, with a small number of logical types (such as separate integers and floats) and some possibility for optimization.</p> +<p>First I think it's worth discussing <a href="https://julialang.org/">Julia</a>, which I would describe as the most successful compiled dynamically-typed array language. Each array has a particular type, and it sticks with it: for example if you multiply two <code><span class='Function'>Int8</span></code> arrays then the results will wrap around rather than increasing the type. But functions can accept many different argument types. Julia does this by compiling a function again whenever it's called on types it hasn't seen before. The resulting function is fast, but the time spent compiling causes significant delays. The model of arrays with a fixed type chosen from many options is the same as NumPy, which follows a traditional interpreted model. But it's different from APL and BQN, which have only one number type and optimize using subsets. J and K sit somewhere in between, with a small number of logical types (such as separate integers and floats) and less subset use.</p> <p>The ahead-of-time compilers apltail and Co-dfns mentioned in the previous section take different approaches. apltail uses a powerful (but not dependent) type system with type inference to detect which types the program uses. Co-dfns compiles to ArrayFire code that is still somewhat dynamic, with switches on rank or types. It's possible the ArrayFire compiler can optimize some of them out. I think that while these impressive projects are definitely doing something worthwhile, ahead-of-time compilation on its own is ultimately not a good basis for an array language implementation (but it's just my opinion, and I may well be wrong! Don't let me stop you!). 
There's too much to gain by having access to the actual data at compilation time, and being able to fit it into a smaller type.</p> <p>However, I would be very interested in compiling BQN's stack-based IR using these ahead-of-time methods. The ArrayFire code from Co-dfns would be easiest to generate and can probably be adapted to BQN primitives (Dyalog has indicated in a non-committal way that they're interested in integrating Co-dfns into Dyalog APL as well). Other backends could provide better performance at the cost of type analysis, which brings us to the question of how to approach types in running dynamic code.</p> <p>A very early example of JIT compilation, <a href="https://aplwiki.com/wiki/APL%5C3000">APL\3000</a>, begins by compiling each function for the exact types it's called with on the first call and recompiles for more general types when its assumptions are broken. This keeps the program from spending all its time compiling, but also can't optimize a function well over multiple types; given how cheap memory is now I think it's better to compile multiple versions more readily.</p> -<p>I wrote in more detail about various strategies in the page on <a href="https://mlochbaum.github.io/BQN/implementation/compile/dynamic.html">dynamic compilation</a>. For one example of an advanced approach, my guess as to the best framework—if implementation costs are not taken into account—to deal with large arrays on a typical computer is to layer many strategies together:</p> +<p>I wrote in more detail about various strategies in the page on <a href="https://mlochbaum.github.io/BQN/implementation/compile/dynamic.html">dynamic compilation</a>. 
For one example of an advanced approach, my guess as to the best framework—given unlimited time to implement it—to deal with large arrays on a typical computer is to layer many strategies together:</p> <ul> <li>At the largest scale, execution is dynamic</li> <li>When possible, a few upcoming primitives are statically analyzed together and probably fused</li> diff --git a/implementation/compile/intro.md b/implementation/compile/intro.md index ca115d59..dc019eb4 100644 --- a/implementation/compile/intro.md +++ b/implementation/compile/intro.md @@ -4,7 +4,7 @@ Chances are that if you're reading this, you know the difference between an interpreter and a compiler, and what separates C from interpreted Python. Good! You have a lot more to learn to understand what compilation means for the array language paradigm. The most important is that, for an array language, compiling isn't automatically better in the way that Python compilers tend to run a lot faster than interpreters. The two strategies have different advantages, and a hybrid approach could be much better than either alone. Here we'll discuss what makes BQN fast, what keeps it from always being fast, and how compiling might help. -The most important piece of context is that you don't actually care about performance. On array programming forums there are many questions about how to speed up code or what the fastest approach would be. Nine times out of ten, or more, such a post is about a program that takes a fraction of a second—sometimes, under a millisecond, sometimes, under a microsecond. The program was already fast enough! In other cases the answer is almost never to switch languages, but to tweak the code slightly or use a different algorithm. The use cases that call for a general-purpose language and demand full hardware efficiency are rare, and it's much more important to make a language that is easy to write code in than one that runs it quickly. 
+Scratch that, the *most* important piece of context is that you don't actually care about performance. On array programming forums there are many questions about how to speed up code or what the fastest approach would be. Nine times out of ten, or more, such a post is about a program that takes a fraction of a second—sometimes, under a millisecond, sometimes, under a microsecond. The program was already fast enough! In other cases the answer is almost never to switch languages, but to tweak the code slightly or use a different algorithm. The use cases that call for a general-purpose language and demand full hardware efficiency are rare, and it's much more important to make a language that is easy to write code in than one that runs it quickly. ## Array interpreters @@ -30,7 +30,7 @@ BQN uses dynamic types, but in each array it tries to determine a common element I sometimes think of C's approach to types as **temporally homogeneous**, because once you write a function, a call one minute has the same types as a call the next, and BQN's as **spatially homogeneous**, because in an array each element has the same type as another one address over. -The temporal/spatial distinction isn't the same as static versus dynamic typing. Array elements in a statically-typed language are *both* temporally and spatially homogeneous, although in my opinion C compilers today only properly take advantage of the temporal part. And a dynamically-typed language can take advantage of temporal homogeneity with a combination of type checking and type inference, like a Javascript JIT compiler does. +The temporal/spatial distinction isn't the same as static versus dynamic typing. Array elements in a statically-typed language are *both* temporally and spatially homogeneous, although in my opinion C compilers today only properly take advantage of the temporal part. 
And a dynamically-typed language can take advantage of temporal homogeneity with a combination of type inference and runtime checks, like a Javascript JIT compiler does. ### And which is better @@ -63,7 +63,7 @@ I'm now discussing what most people mean when they say "compiled APL", which mea Most array language compilers, and certainly the most successful ones, are based on static typing. Typed array languages have become a major subject of research in the last ten years or so, and the following two groups have arisen: - ML-family (like Haskell): [Futhark](https://github.com/diku-dk/futhark) has size-dependent types and [Dex](https://github.com/google-research/dex-lang) has fully dependent types -- MLIR and XLA, low-level drivers of deep learning frameworks like TensorFlow +- [MLIR](https://mlir.llvm.org/) and XLA, low-level drivers of deep learning frameworks like TensorFlow One of the reasons to cover static types first is that all of these languages are promising compilation targets for BQN code when types have been determined. They're mostly designed for large arrays, with a significant focus on GPUs. This would make them a sort of a specialized tool in an array language, where most programs use many small and medium arrays even if the goal is to manipulate large ones. @@ -73,7 +73,7 @@ Three major efforts to apply ahead-of-time compilation to APL are [APEX](https:/ Moving to dynamically-typed languages, the actual compilation isn't going to change that much. What we are interested in is types. When and how are they determined for values that haven't been created yet? -First I think it's worth discussing [Julia](https://julialang.org/), which I would describe as the most successful compiled dynamically-typed array language. Each array has a particular type, and it sticks with it: for example if you multiply two `Int8` arrays then the results will wrap around rather than increasing the type. But functions can accept many different argument types. 
Julia does this by compiling a function again whenever it's called on types it hasn't seen before. The resulting function is fast, but the time spent compiling causes significant delays. The model of arrays with a fixed type chosen from many options is the same as NumPy, which follows a traditional interpreted model. But it's different from APL and BQN, which have only one number type and optimize using subsets. J and K sit somewhere in between, with a small number of logical types (such as separate integers and floats) and some possibility for optimization. +First I think it's worth discussing [Julia](https://julialang.org/), which I would describe as the most successful compiled dynamically-typed array language. Each array has a particular type, and it sticks with it: for example if you multiply two `Int8` arrays then the results will wrap around rather than increasing the type. But functions can accept many different argument types. Julia does this by compiling a function again whenever it's called on types it hasn't seen before. The resulting function is fast, but the time spent compiling causes significant delays. The model of arrays with a fixed type chosen from many options is the same as NumPy, which follows a traditional interpreted model. But it's different from APL and BQN, which have only one number type and optimize using subsets. J and K sit somewhere in between, with a small number of logical types (such as separate integers and floats) and less subset use. The ahead-of-time compilers apltail and Co-dfns mentioned in the previous section take different approaches. apltail uses a powerful (but not dependent) type system with type inference to detect which types the program uses. Co-dfns compiles to ArrayFire code that is still somewhat dynamic, with switches on rank or types. It's possible the ArrayFire compiler can optimize some of them out. 
I think that while these impressive projects are definitely doing something worthwhile, ahead-of-time compilation on its own is ultimately not a good basis for an array language implementation (but it's just my opinion, and I may well be wrong! Don't let me stop you!). There's too much to gain by having access to the actual data at compilation time, and being able to fit it into a smaller type. @@ -81,7 +81,7 @@ However, I would be very interested in compiling BQN's stack-based IR using thes A very early example of JIT compilation, [APL\3000](https://aplwiki.com/wiki/APL%5C3000), begins by compiling each function for the exact types it's called with on the first call and recompiles for more general types when its assumptions are broken. This keeps the program from spending all its time compiling, but also can't optimize a function well over multiple types; given how cheap memory is now I think it's better to compile multiple versions more readily. -I wrote in more detail about various strategies in the page on [dynamic compilation](https://mlochbaum.github.io/BQN/implementation/compile/dynamic.html). For one example of an advanced approach, my guess as to the best framework—if implementation costs are not taken into account—to deal with large arrays on a typical computer is to layer many strategies together: +I wrote in more detail about various strategies in the page on [dynamic compilation](https://mlochbaum.github.io/BQN/implementation/compile/dynamic.html). For one example of an advanced approach, my guess as to the best framework—given unlimited time to implement it—to deal with large arrays on a typical computer is to layer many strategies together: - At the largest scale, execution is dynamic - When possible, a few upcoming primitives are statically analyzed together and probably fused |
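The element-type scan and data-driven widening that the edited pages describe can be made concrete with a toy sketch. This is illustrative only, not CBQN's actual code (the real scan is vectorized C, and the `narrowest_type` name and `i8`/`i16`/`i32`/`f64` tags are invented here): BQN has one logical number type, and the narrow tags stand for the packed subset representations the text mentions.

```python
# Toy model of spatially homogeneous storage: scan a dynamically typed
# array once and pick the narrowest common storage type, the way the
# text says BQN packs every element of an array into one type.

def narrowest_type(values):
    """Pick a storage tag for a list of Python numbers.

    BQN's single logical number type is a double; i8/i16/i32 are the
    packed subset representations used when every element fits.
    """
    if any(not isinstance(v, int) for v in values):
        return "f64"                      # any non-integer forces doubles
    lo, hi = min(values), max(values)
    for name, bits in (("i8", 8), ("i16", 16), ("i32", 32)):
        if -(1 << (bits - 1)) <= lo and hi < (1 << (bits - 1)):
            return name
    return "f64"                          # too wide: fall back to doubles

# Unlike Julia's Int8 arrays, the result type comes from the data:
# adding two i8 arrays widens to i16 instead of wrapping around.
a, b = [100, 120], [50, 60]
s = [x + y for x, y in zip(a, b)]         # [150, 180]
print(narrowest_type(a), narrowest_type(s))   # → i8 i16
```

The overflow check mentioned in the text plays the same role at runtime: arithmetic is attempted in the narrow type, and the whole result is widened when any element escapes the range, rather than each array carrying a programmer-chosen fixed type as in NumPy or Julia.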
