So apparently, I didn't manage to frighten everybody away. Or you like having your minds blown. Thanks to Naresh and the team at Functional Conf for inviting me back; it's the third year I'm here, and I always learn a lot from these events. I find that if you learn just one new idea at one of these events, it's generally worth the trip. So if you're still looking for an APL guy for your team, I might be the man. I'm no longer the CTO of Dyalog, though. Jay Foad, who accompanied me and gave a workshop here last year, is now the CTO. He's a C programmer; I'm originally from the user side. I used APL for a long time, and I managed Dyalog's development team. For a while now I've called myself the CXO: I go out, listen to people's requirements, and come back and translate that into work for the CTO and his team. I have Roger Hui with me here, who is on the implementation team and also co-author of the J programming language, which you may have heard of and which is similar to APL. He's going to give a tutorial this afternoon with more examples of APL. So I'm going to be doing PowerPoint, and I think Roger is going to be live. Yeah, it's not all slides.

Anyway, in John's keynote this morning, he mentioned Backus's Turing Award acceptance talk, and that reminded me of a slide I used, I think, two years ago in my talk here. Backus was aware of APL back in 1977, and he pointed out that Iverson was looking for expressions of what we would now call new forms of parallelism. He did go on to mention a number of drawbacks of APL, which at that time was very procedural, and you could say that what Dyalog has been working hard on for the last 10 to 15 years is removing some of the drawbacks, the deficiencies, that Backus found in APL.

So, a little bit about the background of APL. I'm going to be talking about notation as a tool for parallel thought, but this talk will also act as an introduction to APL. How many people have seen one of my talks already? So it's worthwhile doing. If any of you are able to follow all of this, please talk to me about a job afterwards. It's going to move quite fast once I get going.

OK, so Iverson was a mathematician, and he was very focused on notation as a tool of thought: providing language that would make it easier for people to express thoughts, to free their brains from overly complex notations. I think these two quotes are from his Turing Award speech, two years after Backus's. Iverson came to mathematics a little later than most people: he was in the Air Force first, and only went to university afterwards. He started working on linear algebra, and he was very frustrated by the fact that mathematical notation uses a very wide variety of forms and that the precedence rules and so on are very complicated. After thinking about that for a while, he set out to organize it and come up with a better language that would allow him to express his thoughts more clearly. So he lined all of these forms up, thought about them for a while, and decided: A B is going to be A times B; that's e to the power x; x divided by y; the base-B logarithm of A. He wanted to use symbols, and you can see here that this one is a log seen from the end, right? The tree that's been chopped. So he used that for logarithm. And A to the power of the reciprocal of N is the Nth root of A.
For matrix multiplication he came up with this notation: it's a generalization, a plus-reduction over the element-wise multiplication of combinations of rows and columns. f g x he found to be good, so he kept it as it was. But he chose this as the precedence rule for all of APL, which is why APL executes from right to left: that's what mathematics does in this case, he had to pick one rule, and he felt this was the most general one. The third trigonometric function squared replaces tangent squared, and the sigmas and the pis become reductions: iota 6 is the index generator, so it generates an array with the numbers from 1 to 6, and then you have either the plus-reduction or the times-reduction over it. And finally, this one isn't going to fit on the right-hand side, I need a little bit more space; it looks like this.

Reading APL. If we take that slightly complex expression (go back, I'm going to be cut off, yeah?): divide N, the monadic divide, is the reciprocal of N. A to the power of the reciprocal of N is the Nth root. It's just exponentiation, the normal power function, but the exponent is the reciprocal of N, which is why it gives you the Nth root. There is only one function to do exponentiation; it's just that we're using the reciprocal of N as the exponent.

Yes? Well, you see, that's one of the interesting questions, isn't it: how does this mathematical notation work? I think in mathematical notation, tangent squared means taking the tangent and then squaring the result. Or did I get that wrong? Sorry? Right, because circle with 3 on its left is the third trigonometric function, so that's the tangent of x, and then I'm squaring the result of that. But then, exactly, you could equally argue that it means taking the tangent twice: tangent of tangent of x. Are you squaring the function, or are you squaring the result of applying the function? That ambiguity is what he set out to clean up, in a sense.

So, reading APL (when I put my hand down, it gets really loud), I would write the quadratic roots as: 2 times A, and here's commute, which Shashi showed us, so divided into; minus B; plus-or-minus, which is plus catenate minus; the square root, which is power 0.5 commuted; of B squared minus 4 times A times C. And that's about the maximum length of an APL expression that I would recommend. If it gets much longer than that, you can't read it from left to right. APL executes from right to left, but you should be able to read a good APL expression from left to right, as I just did. If you can't do that, you should probably rephrase it so that it becomes easier to understand.

OK, so the syntax of APL is very, very simple; there are really only the cases you see here. There's an array, which could be just a list of numbers juxtaposed: this is a three-element vector. Or you have a function followed by an argument, or a function between two arguments; functions are either prefix or infix. And then we have second-order functions, which we call operators. That's a little bit confusing, because you might use that word for a function, but for us, operators are second-order functions. If they're monadic, as we call them (another unfortunate term), they take one operand, and it's on the left. So plus-reduction is sort of the classic reduce from map-reduce, which APL has had since 1966.
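Before moving on: here, roughly, is what that quadratic-roots expression looks like in a session. This is a sketch; the variable names are mine, the output formatting is approximate, and I'm assuming Dyalog APL with default settings.

      A←1 ⋄ B←¯5 ⋄ C←6                    ⍝ x²-5x+6 = (x-2)(x-3)
      (2×A)÷⍨(-B)(+,-)0.5*⍨(B*2)-4×A×C   ⍝ (-B plus-or-minus √(B²-4AC)) over 2A
3 2

Read left to right, exactly as in the talk: 2A divided into, minus B, plus-and-minus, the square root of, B squared minus 4AC.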
Or an operator (this is the dot operator) takes two operands, and this is the generalized inner product: you multiply the elements element-wise, and then you do a plus-reduction. And then we have this anomalous syntax for indexing, where you have an array followed by square brackets. That's one of the things that's different about APL; or maybe Julia does this too? You can index by an array in Julia? Yeah. So there are other languages that implement that, but I think, again, APL pioneered it.

An important symbol that you'll see in a lot of the code is the lamp symbol, which introduces a comment. You may not recognize it as a lamp, but that's because you're all too young: when APL was invented, lamps looked like this, at least in mines. It illuminates the code. It's a lamp.

So here, let's have a look at what we call scalar functions: the set of functions that take items pair-wise, a scalar at a time. There's addition, and this is really the signature expression in APL: you just write two vectors with an infix function between them, and APL knows that obviously what you want to do is item-wise addition. You can also have just one number, one element, on one of the sides, and APL understands that you want to reuse that number. So 25 is less than 30, but it's greater than these two, so you get 0 0 1. And we don't distinguish Booleans as a separate type; truth values are just one-bit integers, so that's stored as a bit array in APL.

We have functions like maximum and multiplication. This time we are creating a 2-by-3 reshape of iota 6 (the integers from 1 to 6 reshaped into a 2-by-3 array), and then multiplying it all by 10. Even the random number generator accepts an array: for each element in the argument, it generates a random integer between the index origin and that element. We have this business with index origin: it can be 0 or 1, and the default is 1, because that's what most domain experts prefer, though of course anybody who has done real programming knows that 0 is the only correct value. You can also give a 0 here, and then you'll get a floating-point value back between 0 and 1.

OK, so if you're curious about this, there's a site called TryAPL.org, which is an online REPL. There's a tab with a cheat sheet where you can hover over all these symbols and see what they do, and you can type APL symbols; even if you don't have an APL keyboard, you can click on the symbols there and construct APL expressions.

Now, the rest of the talk is really a long list of functional composition forms that are in APL, all of which are parallelizable to some degree. The one that I guess most of you are going to be familiar with is reduction. In APL, if we have a matrix, the 3-by-4 reshape of iota 12, then the times-reduction of mat inserts a multiplication between all the elements of each row, so we get 24 for the first row and so on. If you put a bar through the slash, it's the reduction along the first, leading dimension, so we get the product down the columns. And there's a form where, if you have a multi-dimensional array, you can put square brackets after the slash to identify the dimension that you want reduced. The result is always an array of rank one less than the argument.
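A few of those in a session (a sketch, with the output formatting approximated and ⎕IO at its default of 1):

      1 2 3 + 10 20 30      ⍝ item-wise addition of two vectors
11 22 33
      25 < 10 20 30         ⍝ the single 25 is reused against each item
0 0 1
      10 × 2 3⍴⍳6           ⍝ 2-by-3 reshape of 1..6, all times 10
10 20 30
40 50 60
      ×/3 4⍴⍳12             ⍝ times-reduction along each row
24 1680 11880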
There's a very similar operator, scan. You can scan along the trailing dimension or the leading dimension or any selected dimension, and the scan gives you the partial results as you move along the vectors. Okay?

The outer product, that's the combination: the application of the function to all pairs of items from the left and right arguments. So if we had 1 and 2 on the left and 10, 100, 1000 on the right (the colour isn't very good there), you can see it's 1 times 10, 1 times 100, 1 times 1000, and so on. You get an array whose shape is the catenation of the shapes of the two arguments: from two two-dimensional arguments you'll get a four-dimensional result. It's all very predictable. Yeah?

Inner product, that's the dot with two functions, one on each side. That also takes all combinations, but of rows from the left argument and columns from the right argument. You get a result with as many rows as the first argument and as many columns as the last argument. But rather than just implementing the classical vector or matrix product, Iverson recognized that this is a completely general construct: the F-reduction of G applied between a row and a column goes into the corresponding position of the result. And it turns out there are a whole bunch of other function pairs that are useful. For example, the combination of and and equals (oh, it says any, it should say all; I think I switched from or to and in the middle of the night): the and.equals inner product gives you a 1, here at the top right of the result, telling you that this row is identical to that column. And you could do or.equals too. There are a number of functions that are useful to combine that way.

So now a brief step into what we call mixed functions. We've looked at scalar functions, the ones that just take an item at a time. There are a bunch of functions, typically structural operations on arrays, where you don't want to operate on the individual items but on the entire array in various combinations. And those are called mixed because the rules about how they take pieces out of their arguments are, well, mixed. You basically need to look each one up in the manual to understand how it applies.

Rotations: we have the 3-by-4 reshape of iota 12, our good friend. If we do the 2 0 ¯1 rotate, you can see from the vertical bar through the circle that the rotations go along the rows: this row has moved by 2, this one hasn't moved, corresponding to the 0, and the negative 1 has moved this row by one position the other way. So you can rotate all the rows by different amounts, and if you put the bar horizontally through the circle, then obviously you're rotating the other way: the columns, by various amounts.

We have functions called take and drop, which are like head and tail in your classical functional languages, but they apply to the whole matrix. So the negative 2 take is 2 rows taken from the bottom, corresponding to the ¯2. And you don't have to provide as many elements in the left argument as there are dimensions on the right; it pads with what's suitable. So the 1 drop says drop one row, and obviously that applies across all the columns, but we're not dropping any columns unless we extend that argument.
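Sketches of those forms, again with approximate session output:

      1 2 ∘.× 10 100 1000          ⍝ outer product: all combinations
10 100 1000
20 200 2000
      (2 2⍴1 2 3 4)+.×2 2⍴5 6 7 8  ⍝ inner product: matrix multiplication
19 22
43 50
      2 0 ¯1⌽3 4⍴⍳12               ⍝ rotate each row by a different amount
 3  4  1  2
 5  6  7  8
12  9 10 11
      ¯2↑3 4⍴⍳12                   ⍝ take the last two rows
5  6  7  8
9 10 11 12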
Transpose reverses the order of the dimensions, but there's a dyadic version where you can say: I want the 1 1 transpose, or the 1 3 2 transpose, where you explicitly reorder the dimensions. And if you have something there that's not a permutation (here we have 1 1, repeated), what you're actually telling the language is, as it takes each item, the order in which it should increase the indices to pick the next item. If you say 1 1, that means increase both dimensions together, so we get the diagonal: we can extract the diagonal from a matrix with a 1 1 transpose, and we can extract a diagonal plane from a three-dimensional cube with a 1 1 2 transpose, or a 1 2 2, or a 2 1 2, and so on.

Index-of: we saw the monadic version, the index generator, which just generates as many numbers as you put in the right argument, starting at the index origin. The dyadic version is a related function which looks the right argument up in the left argument. In the demo I did before the break, I was looking times up in a table of times to find the index: 1 is in the second position; 2 is not found, so that gives 7, which is one more than the number of items in the list; and 3 is in the first position.

We have sorting, although Iverson decided that sorting actually consists of two parts: grading and indexing. There's a grade-up and a grade-down. Grade tells you which order to pick the items in, and if you have a little anonymous function here which indexes the right argument by its own grade, then of course that sorts the numbers. The reason for breaking it up this way is that Iverson realized that very often you have keys in one array and data in another array, or you might have several arrays to reorder in the same fashion, so it was useful to be able to split those two operations up: grade by something, then reorder. But of course that's only useful in a language that supports array indexing and so on, which I guess is why you don't see it in other languages.

So, to summarize the basic parallel forms: we have all the scalar functions, including the circle function; we have reductions and scans; outer product; inner product; and the base-value function that I used in the demo, where I was interpreting the columns of a matrix in a specified, user-defined base. And then we have a bunch of mixed functions. We haven't looked at all of these (there's matrix inversion, or matrix division), but we've looked at most of the others.

Then in 1982, which is a while ago now, there was a major extension of APL: we switched from arrays that were strictly rectangular, with all items of the same type, to allowing any item of an array to be another array. So here we have a two-element vector whose first item is a three-element vector, and over here we have another two-element vector: one item is a simple integer vector, and the other is itself a two-element vector, with a 7 and then 8 and 9 in its second item. When we do multiplication, we start at the top level and work item-wise: there are two items at the top, so 1 2 3 is multiplied by 4 5 6, and then 10 is multiplied by the 7 and the 8 and the 9. Where we have only a single number, it pervades into the nested structure and is used all the way down: the 10 here is used for the 7 and for the 8 and the 9.
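In a session, these might look as follows. A sketch; the display of nested results is approximated and depends on your display settings:

      1 1⍉3 3⍴⍳9                     ⍝ non-permutation transpose: the main diagonal
1 5 9
      10 20 30 ⍳ 20 5                ⍝ dyadic iota: 20 is at 2; 5 is not found
2 4
      {⍵[⍋⍵]} 3 1 4 1 5              ⍝ sort = index the argument by its own grade
1 1 3 4 5
      (1 2 3) 10 × (4 5 6)(7 (8 9))  ⍝ the 10 pervades: (4 10 18)(70 (80 90))
 4 10 18  70  80 90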
This was at the time when relational databases were coming out, and you really had tuples of mixed data types to work with, so APL was extended to cope with that.

We've seen that map is implicit in the scalar functions. If you want to use any of the mixed functions, or you want to do something slightly out of the ordinary, you may need to explicitly use what we call each, which I guess you would call map in most functional languages. So if we did the plus-reduce of this two-element vector, the plus goes in between the two items, and we get 4 5 6 plus 7 8 9. But if what we actually wanted was the sum of each of these vectors, we would reduce each: then we go down a level and apply the plus-reduction to 4 5 6, getting 15, and to 7 8 9, getting the next one.

A bit later (I guess significantly later), Iverson and Hui, working on J, came up with the constructs called function trains. If you juxtapose a function and an array, you apply the function and get an array out of that; but if you juxtapose two functions without providing them with data, by putting parentheses around them, you get a construct where F G is called an atop: F applied atop G applied to the arguments. This is useful in a number of situations. One is really mostly a performance trick: if you say roll atop reshape, the interpreter can realize that this just means you need 10,000 random numbers between 1 and 6. Whereas if you did the roll of the 10,000 reshape of 6, the interpreter would first construct the 10,000-element array of sixes, which is an expensive thing to do, and then, without knowing that they were all the same, it would go and look at each one and roll the dice 10,000 times. With the train, it can really optimize both the use of the underlying random number generator and not create the intermediate arrays.

And then there's a form with three functions next to each other, which is called a fork. The monadic case (F G H) of omega is (F omega) G (H omega). If the middle function were plus, for example, this is actually identical to the meaning in mathematical notation: if you say f plus g, that means f(x) plus g(x). And that means that the mean, the average, can be expressed as a three-element train of sum, divide, and count (or tally, as we call it), because the mean is the sum divided by the count. This allows the interpreter to really optimize things, because parsing something like this, and recognizing in the AST that an averaging is going on, can save the interpreter a lot of work.

There's even a little funny one if you use comma as the middle function, so that you appear to catenate functions together. That's not what's happening: what this means is 1 plus 0.1 catenated with 1 minus 0.1, but it reads as if you're catenating the functions. So this expression, which was used in that quadratic equation at the beginning, means 1 plus-or-minus 0.1, right? And of course, when you're looking at parallelism: if the functions are pure, you could run the two outer functions of a fork in parallel, so the interpreter has an opportunity to do some parallelization.

We've seen a few of these anonymous functions so far; of course, you can name them. User-defined functions follow exactly the same conventions as the primitives, so they can be prefix or infix, and you can have user-defined operators, which are also prefix or infix, following exactly the same rules as the primitives.
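First, sketches of each and the trains just described (random output will obviously vary; the enclosed result displays with a leading space):

      +/ (4 5 6)(7 8 9)        ⍝ plus goes between the two items (enclosed result)
 11 13 15
      +/¨ (4 5 6)(7 8 9)       ⍝ reduce each: the sum within each vector
15 24
      5↑ 10000 (?⍴) 6          ⍝ 2-train (atop): roll atop reshape; varies
3 1 6 2 5
      (+/÷≢) 1 2 3 4           ⍝ fork: sum divided by tally, i.e. the mean
2.5
      1 (+,-) 0.1              ⍝ "plus or minus"
1.1 0.9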
The huge advantage of user-defined functions and operators following the same conventions as the primitives is, of course, that you can apply them with all of the operators, because the operators all understand functions that are either prefix or infix. So sometimes people say it's a bit of a weakness that APL can't have parenthesized argument lists, but the cost of that would be that the application of many of these operators would be lost, and so far, at least, we've decided to shy away from it.

So here's a Fibonacci where you're passing a two-element vector down. This is a recursive call; the right argument is how far you have left to go, so you decrement it by 1 as you pass down to the next level, and you drop the first item of the pair and push the new sum onto the end. This is a guard, which says: if you get to 0, then just return the first item. And APL supports tail recursion: it recognizes the tail call and doesn't start to build up a stack.

OK, so the power operator. I'm a little bit behind, so I'll be very quick now. It allows a function to be applied a number of times: applying "add 3" to the power 3 adds 3 three times, giving 9. You can bind one of the operands and get a monadic function that you could call twice. The right operand can also be a function, in which case that function is applied between the nth and the (n-1)th result, and when it returns true, you stop. So "1 plus the reciprocal of omega", power limit as we would call it, gives you the golden ratio. And it's parallelizable for certain functions.

Parallelism of these basic primitives we implement in the interpreter, for machines with multiple cores. I'm comparing here cases of addition, division, logarithm, and or, which is extended outside the Boolean domain to be the greatest common divisor on numbers. And we can see that on my laptop, which is an i7, we get no speed-up for plus, because it's just memory-bound: if you set all the cores to doing pluses, you get no speed-up. Then, as the complexity of the operation increases, we get up to close to a 4-times speed-up here; this is a case where the hyper-threading actually manages to help, but typically you'll get no more than a factor of 2 from this kind of parallelism.

Right, so APL was extended into object-oriented applications, because we wanted to do stuff like GUIs. Here I'm saying: create a form with a caption, with pixel coordinates, of a certain size. Then I can say b1 b2 b3 gets f.⎕NEW each, with one of these little lambda functions where I vary the button caption and the position; I pass in 1, 2, 3, and I get three buttons. And then I can do things like this, where at each level of applying the dot, if I get an array, then the next item in the expression applies to all the items. So f.(b1 b2 b3) is an array of three buttons, and then I ask for the caption and the position of each of the buttons, so I get a three-element array of two-element arrays with the caption and position. And if I say set the position element 1 to 50, I'm executing this expression inside the context of each item of the array that I've created over there, and I move all the buttons at once. This is a little bit of a party trick, but this kind of thing does actually happen in practice: you may want to reorder several items at once.
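Going back to the recursive example for a moment, here is a sketch of that Fibonacci function as a dfn, plus the power operator. The name fib is mine, and the golden-ratio spelling follows the description above; I'm assuming Dyalog APL with ⎕IO←1:

      fib←{⍺←0 1 ⋄ ⍵=0:⊃⍺ ⋄ (1↓⍺,+/⍺)∇⍵-1}   ⍝ guard, then a tail call
      fib¨⍳10
1 1 2 3 5 8 13 21 34 55
      ((3∘+)⍣3) 0               ⍝ power operator: apply "add 3" three times
9
      {1+÷⍵}⍣≡ 1                ⍝ function right operand: iterate to a fixpoint
1.618033989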
And you have similar things if you're interacting with a COM object model or a .NET object model. Many of you may be familiar with this: this is starting Excel as an OLE client, opening a workbook, getting out the Sheets collection and turning it into an array, because APL is very imperative and eager and doesn't really like lazy constructs. So I get a three-element array, and I can refer to the names. Here's the spreadsheet in question: it has three tabs. You can even say sheet.UsedRange.Value2, going two levels down, and that expression extracts all of the data out of an Excel spreadsheet, no matter how many tabs it has, if you find that useful.

So we've seen that in Dyalog APL this evaluates an expression in the context of each object. When we needed a way to specify asynchronous programming, we decided that a nice way to do it would be to define a form of namespace, as we call these things, where if you do an execution inside that space, it's actually running in a separate process. It will immediately return a future to you: your array is immediately fully populated with items, and you can do structural operations (take, drop, reshape, transpose) on it without blocking. But when you get to a primitive that actually needs to know the value of a specific item, the interpreter will block if that item hasn't been manufactured yet by the asynchronous call. So it allows subject matter experts to quite easily do asynchronous programming without getting into semaphores or message passing or things that they would find very complicated.

Here I've created three empty isolates, and then inside each one I've just done a delay for four seconds. I immediately get back a three-element array: you can see that I asked for the time here, and it was zero seconds here, a tiny delay to get to here; this happened immediately. But then, when I summed the array of delays, the interpreter needed the values. Well, the sum of the delays is 12 seconds, because they each delayed for four seconds and returned the amount of time they delayed, but the amount of time that passed was actually only four seconds. So we delayed for 12 seconds in a mere four seconds, which is pretty neat. Creating the isolates explicitly is sometimes useful if you want to maintain some state in them, have some long-lived global data out there.

But there's also a parallel operator. Say you had an expression like this, which counts the co-primes below N by computing the GCDs and seeing how many ones there are. An APL programmer can just insert the parallel operator anywhere, applying it to a function anywhere he thinks it might be safe and worthwhile to parallelize. From a mathematical perspective (and to us it's very important that APL retains the fact that it's a mathematical notation), you can disregard the parallel operators entirely: if the functions are pure, we get deterministic parallelism.

Right, so Dyalog APL has recently picked up a bunch of the work that was done in SHARP APL and J, with some more operators. One of the most important ones, perhaps, is the rank operator, which is a generalization of the scalar-function pairing rules and allows you to specify exactly how you want to pick up items from the right and left. If we say multiplication rank 1 0, that says: take rank-1 sub-arrays from the left and combine them with rank-0 arrays, scalars, from the right. So we can have a 2-by-3 matrix and a 2-element vector, and we start by combining this vector with that scalar to produce this part of the result, and then we move down to the next vector on the left and the next scalar on the right. It's like each-left and each-right all boiled into one, with extensions to any number of dimensions. I'll skip this example because we're running out of time. Actually, this is too important to skip.
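A sketch of the rank operator and that co-prime count (I'll leave the parallel operator out, since its exact spelling is specific to the isolate library). The ⊢ just separates the operator's 1 0 operand from the data:

      (2 3⍴⍳6) ×⍤1 0 ⊢ 10 100    ⍝ rows from the left, scalars from the right
 10  20  30
400 500 600
      {+/1=⍵∨⍳⍵} 10              ⍝ ∨ is GCD: count the co-primes of 10
4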
The combination of rank and the way the operators apply along dimensions allows us to direct a computation. This is our average computation from before: plus-reduce along the first dimension, divided by the number of items, which is also the tally along the first dimension. Here we have a three-dimensional array, the 2-by-2-by-4 reshape of iota 16. If we just apply the function as it is, it's going to go along the leading dimension: it adds the planes together, so 1 plus 9, divided by 2, is 5. If we say we want the average rank 2, it applies the function to the two-dimensional sub-arrays, so it takes these numbers and averages them, producing that. And finally, if we say average rank 1, it applies it to the vectors here, and we average the rows. So the same average expression can be directed along a selected dimension using the rank operator. We also extended iota to higher ranks, so you can look up a row in a matrix.

And then another one which is important enough to spend time on, even if we're running out of it, is the key operator. Key is similar to rank in that it applies your function to selected sub-arrays; the difference here is that the sub-arrays are the items of the right argument, the values, grouped by the distinct keys. So here we have red and blue. The first function here is just alpha omega: take the keys, which are provided as the left argument to your function, and the data, which is provided as the right argument, and return them as a tuple. So the first call to the function is with the key red and the values 10, 30, 40, and the second call is with blue and 20, 50. It's very similar to an SQL GROUP BY, as a functional construct in the language. And if we just wanted to count them, we could make the function the tally of the values, and that tells us we have three reds and two blues. Of course Roger, who implemented this, doesn't actually go off and call your user-defined function in these cases: both of these are patterns that occur very frequently in applications, so he recognizes the idiom, and bang, you're done. Faster than... well, there's no time for coffee when you're using APL.

So it's a 50-year-old language. We just celebrated: we had a big party earlier this week in Glasgow for the 50th anniversary. But we're not done yet. Here's a proposed operator design that we're still working on; it might be in version 16 of Dyalog APL next year, or it might take us a bit longer to agree on the design. Here's the identity function. That's not very interesting, but I need it to show how stencil works, because I'm saying identity function, stencil 3, on a three-element vector. What stencil does is apply your function to a moving window, where each of the items in the array has the opportunity to be the centre item, and this is the window size. If you say 3, that means you get one item on each side: the 1 needs something on its left, and the 3 needs something on its right, and we pad with the numeric fill element, 0. So in the result, this is the first invocation of the identity function, this is the second, this is the third, and we see each item in the middle; this just shows what the window is that your function is being called on. If you wanted to do a classical blur stencil operation, you might say: the quarter, half, quarter vector inner product with the data in the window, stencil 3, and that will apply this stencil. This is a one-dimensional example: it's padded with a 0 on the left, we get that number, and each one of these numbers corresponds to an item of the array.
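Sketches of key and the stencil examples. Key (⌸) is in the language; stencil (⌺) is the proposed operator, written here in the syntax the talk describes (it later shipped in Dyalog 16.0, but treat the spelling as tentative). The key data k and v are my stand-ins for the slide's values, and the display of nested results is approximated:

      k←'red' 'blue' 'red' 'red' 'blue'
      v←10 20 30 40 50
      k {⍺ (+/⍵)} ⌸ v           ⍝ each call gets one key and its values
 red   80
 blue  70
      k {≢⍵} ⌸ v                ⍝ counts per key: three reds, two blues
3 2
      ({⍵}⌺3) 1 2 3             ⍝ each row is one window, zero-padded
0 1 2
1 2 3
2 3 0
      ({0.25 0.5 0.25+.×⍵}⌺3) 10 20 30 40   ⍝ the one-dimensional blur
10 20 30 27.5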
Now, if you've looked APL up on the internet at all, you will have seen this expression for Conway's Game of Life, which is a little bit horrible, because there are two sets of rotates being done. These rotates, on the horizontal and the vertical, are just trying to do that stencil operation, to give you the data which is around each point, and then this logic here, an or-dot-and of the cell values with comparisons against 3 and 4, implements the rules. With stencil, we're going to be able to make Life even shorter in APL, because we can just say plus-reduce of the ravel of omega, stencil 3 3, on the array representing the board. And if you think about it a little bit longer (oops, only three minutes to go), you can recognize that Life is in fact also a more general stencil operation. If you define a weights array that weights the neighbours with a value of 2 and the centre with 1, then if you do the weighted plus-dot-times on the window and the result is 5, 6 or 7, the cell lives on. So you can build this array, which is also a very common technique in APL: you generate all the possible outcomes, and then you just say Life is the "good" table indexed by the plus-reduction of the weighted multiplication on a 3 3 stencil. We can get that to run fast both on a single CPU and, we think, also on a GPU. And I have an example of neural networks. How many people here are working on neural networks? Look the slides up: you can apply sigmoid weighting of neurons to a moving window and so on quite easily with this. We have models of this, so you don't have to wait for it to make it into the interpreter.

And with two minutes to go: indexing is non-functional at the moment, and the square brackets are really a very side-effect-oriented way of specifying things, so we're working on a new operator called at. You could say 42 at 2 on 1 2 3 4 5, and you'd get 42 in the second element; or multiply-by-2 at 1 and 5; or star at, where you have a function on the right, which needs to return a Boolean (it needs to be a logical function), and this one stars the items the function selects. And then you can even do crazy things like this: if you have a function that tells you whether a number is prime or not, you could say star at where-not-prime on the 10-by-10 reshape of iota 100, and see how the primes are scattered.
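Sketches of the stencil Life and the proposed at operator. I'm again assuming the spellings that later shipped in Dyalog 16.0, which had not been finalized at the time of the talk, and this Life formulation is one well-known way to write it with stencil, not necessarily the slide's:

      Life←{3=s-⍵∧4=s←{+/,⍵}⌺3 3⊢⍵}   ⍝ s: 3×3 window sums, including the centre
      Life 3 3⍴0 1 0 0 1 0 0 1 0      ⍝ a blinker flips from a column to a row
0 0 0
1 1 1
0 0 0
      42@2 ⊢ 1 2 3 4 5                ⍝ value at an index
1 42 3 4 5
      (×∘2)@1 5 ⊢ 1 2 3 4 5           ⍝ function at indices
2 2 3 4 10
      '*'@(2∘|) 1 2 3 4 5             ⍝ function right operand selects a mask
* 2 * 4 *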
I'm out of time, so I'm really just going to skip to the last slide. The main point I'm trying to make is that APL allows subject matter experts to participate directly in coding. They don't have to tell you a story that you then go and code, and you don't have to make a DSL for them: they can write their own DSL in this language. If time to market is important to you, you can do that, at least for the prototype, and then if it performs like a dog, you can always go in and help them speed it up. The futures and isolates give the users pretty direct control over parallelization. I'm working on trying to automate parallelization, but that's like the holy grail; people have been working on that for 50 years. In APL it's much easier to detect parallelism: most of the people working on parallelism today have these super-clever compilers that detect loops in C code and parallelize them, whereas here you can start at a much higher level, and then you can do all the clever compiler things as well. If you thought that was interesting, go to TryAPL. There's a full-day APL workshop here tomorrow, where Roger and I will be introducing APL and giving you time to play with it, or go to one of these web pages.

And I don't know: is the next speaker ready to come up, or do I have time for a question? OK, while he sets up, as long as I'm actively removing myself, I have time for a question or two, maybe. Yes, we have a microphone. And then one at the back.

Why use APL? Yes, well, I mean, that's still true, right? So maybe I didn't understand the question. We feel that we have a language with no precedence rules, where you can look at a line of code and, with two days of training, learn how to take it apart: you can decompose it, you can apply the kinds of rules that Debashish and others have talked about. You have all the functional qualities that have been discussed, and you have, in fact, something which I would argue is much closer to algebra, and much easier to reason about in that fashion, than Haskell or Erlang. I mean, they have all the properties, but the notation doesn't help you, as a tool of thought, to do the algebraic manipulation. And I think Roger... I mean, we'll have more time for questions in Roger's session, because he's going to be talking about APL; we'll steal some of his time. Naresh, can I ask you to manage the time here and decide how many more questions to allow? Oh right, we have a break now. There's 15 minutes? Oh, OK, so there's lots of time. Yes?

Is there any problem where you feel it just isn't possible in APL, where you have to go outside APL, or modify it? Has anything like that happened? Well, I like to note that APL is a general-purpose DSL, which of course is meaningless. It's a DSL for any problem which can be described easily in arrays, and then I'll claim that everything can be done with arrays. But of course that's not true: you have problems which really, legitimately require tree structures and lots of recursion and so on, and certainly APL interpreters are not a great tool for those, because you're dealing with very, very small arrays, and the cost of using an interpreter dominates. APL is a very good tool of thought for such problems (I think Roger has some examples which really show that it helps you think about the problem), but there you might want to rewrite in a scalar compiled language to get the performance, if you need to do a lot of tree traversals. So you can do it in APL, but it doesn't give you the benefits that I've been trying to sell to you.

One of the problems we have is that good idiomatic APL is shape-invariant and rank-invariant, so you can look at a function like that average function and have no idea whether it's going to be dealing with an empty array or a hundred million complex floating-point numbers. So we're slowly... the APL community is really quite isolated and doesn't spend anywhere near enough time at events like this, which is why it's so valuable for me to come here and talk to people. We're trying to get into the same kind of just-in-time compilation: this function was called on a large floating-point array, so let's JIT-compile it for that case. But unfortunately, I think we allow the user so much power, which lets them move very, very fast, that it's hard. We're thinking of allowing optional type declarations in the language, so that when people get to the point where it really needs to run fast, they can transition to a compiled, more static variant. We do also have some users who want type declarations for correctness, not just performance.
On floating-point numbers only, we decided we don't need to do that; the impact of that on real applications is typically quite small.

Is the implementation using separate processes? No. For the parallelization of floating-point operations, the interpreter is directly using multiple threads, so that's very lightweight. And we have a threshold, which you can configure, for the array size at which you want it to kick in.

The at operator? Yeah, I can't show it to you here. Does it return a view, or does it allocate a new array? Right, so that's one of the real tricks to having a high-performance APL interpreter: you need to try and recognize all the cases where the reference count of an array is one. We don't have mutable arrays; all the arrays are immutable. But of course, a lot of APL programmers are doing modelling where they have a 100-megabyte array and they change one number in the middle of it, so we spend a lot of time tracking ref counts and making sure that if the ref count is low, we can do all sorts of tricks. And "low" means one, yeah. So we haven't implemented at yet; that was also a model, and it's only going to be really valuable to our users if it can recognize the ref-count-one case and do it in place. Otherwise it gets too heavy.

How does the performance compare? It really depends on what you're doing. If you write loopy APL code that runs around and handles one number at a time, it might be 100 or 1,000 times slower than C, though it's faster than Python. If you use something like key, where we've written optimized cases... We have one-byte integers, two-byte integers and four-byte integers, and one-bit Booleans, and the interpreter in fact spends time trying to collapse data. Not only does it automatically promote data, it will actually, when it does a garbage collection and at various other times, actively demote data: if your floating-point array goes to all zeros, it will become a bit Boolean and consume 64 times less space. That has meant that people who have tried to write compilers for APL have really struggled to beat the performance of the interpreter, and also that hand-written C often won't compete with many APL applications, because we've figured out that it's a one-byte integer, and you'd have to be pretty crazy in C to start coding special cases for all those data types. We've been optimizing the interpreter for 35 years, so good luck.

OK, that's probably it. I'll collect all my stuff. Thank you.