Hello and welcome. It is October 30th, 2023, and we're here in Active Inference GuestStream number 62.1 with Michael Carl and friends. Today's talk is called "Deep Temporal Models of the Translation Process: A Translation Agent Grounded in Empirical Data". We're going to have a presentation by Michael, followed by a panel discussion. So if you're watching live, please feel free to write any questions in the live chat. Otherwise, we're looking forward to this presentation and discussion. Thank you again for joining, Michael and friends, and over to you for the presentation.

Yeah, good morning. In my talk, "Deep Temporal Models of the Translation Process", I would like to outline an idea for an architecture of the translation process. To do this, I have three points. First, I would like to introduce translation process research, what it is, and the data that we have collected; I will show the data-acquisition tools, visualization, and segmentation. Then I will come to an architecture that is based on partially observable Markov decision processes (POMDPs) and that uses the free energy principle. And in the end, I will talk about an experimental implementation and the initialization of the parameters.

Okay, before I start talking about translation process research, here is a small contextualization of this field. It started around 40 years ago, and it is all about investigating what happens in the mind of the translator: how do translators produce translations? How is it possible to have two languages in one mind? What makes translations difficult? And so on. This type of research started with Krings at the beginning of the 80s, using introspection and think-aloud protocols, which came with a laborious manual analysis. Then, about 10 years later, Arnt Lykke Jakobsen devised a key-logging tool called Translog that would record keystrokes.
The translator would sit in front of an editor and type in the translation, and the keystrokes would be logged. Then, 10 years later, in 2005, eye-tracking devices were added to this setup. With an eye tracker, the gaze on the screen is recorded in time, and with a key-logger, the typed output. So we have an eye tracker recording where on the screen a translator looks, and a key-logger recording what the translator writes; we have, in a sense, an input and an output.

Then, around 2010, we started collecting all these data into one database, putting together the data that we had collected by then. A couple of years ago we added a browser interface and Python and R interfaces, and I like to call this "translation data analytics", because with these Python tools and the large database, which I will talk about a little later, it resembles data analytics. Our research is on behavioral measures, meaning keystrokes and gaze data; there are of course also other kinds of data acquisition, such as brain activity, EEG, fMRI, and so on, but I will talk about the part where we have key-logging and eye-tracking data.

Okay, so how does it work? This is a picture of the kind of setup that we have used quite frequently in this research: what you see here on the screen is Translog-II, which is an editor.
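Since these two synchronized streams are central to everything that follows, here is a minimal sketch of how such keystroke and gaze events might be represented; the class and field names are illustrative inventions, not the actual schema of any of the tools mentioned.

```python
from dataclasses import dataclass

# Hypothetical event records for the two synchronized streams.
@dataclass
class Keystroke:
    time_ms: int       # timestamp of the key event
    char: str          # character inserted (or deleted)
    is_deletion: bool

@dataclass
class Fixation:
    time_ms: int       # onset of the fixation
    duration_ms: int
    window: str        # "source" or "target"
    word_index: int    # which word the gaze lands on

# A session is simply the two streams merged and sorted by time.
session = sorted(
    [Keystroke(1200, "L", False), Fixation(900, 240, "source", 0),
     Fixation(1500, 180, "target", 0), Keystroke(1350, "a", False)],
    key=lambda e: e.time_ms,
)
print([type(e).__name__ for e in session])
# → ['Fixation', 'Keystroke', 'Keystroke', 'Fixation']
```

The point is only that, once merged on a common timeline, the input (gaze) and output (keystrokes) can be sliced in any of the ways discussed below.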
At the top of the window there is a source-text window, and the translator is supposed to translate in the lower editing box. By typing there, the keystrokes are logged in time: every keystroke comes with a timestamp, and one can then look into the rhythm of translation, when a translator types fluently, when they pause, and so on. In addition, we have an eye tracker: light bulbs whose light is reflected in the eyes of the translator, and a camera device that records and calculates where on the screen the translator is looking. So we have the two things: the input to the translator, basically the data that goes into the translator's mind, and the keystrokes that come out.

This data can then be processed and represented in several ways, and one way is shown here: a translation progression graph. On the y-axis, on the left side, we have the source text, which consists of 160 words; it is an English source text that is translated into a target language. On the x-axis, horizontally, we have the translation process data in time. First we see that the translator quite tightly, quite nicely, reads the entire source text; the blue markers are fixations on the source text, and you can see how the translator reads through it. That is the orientation phase. Then we have a drafting phase, where the translator drafts, types in, the translation. The green dots are fixations on the target side, so blue dots are fixations on the source text and green dots are fixations on the target text, and you can see how the translator goes back and forth between the source and the target side. Behind these, the black dots are insertions and the red dots are deletions, so we can trace how the translation evolves. This is the drafting phase, and then at the end we have a revision phase,
where the translator again reads, in this case, almost the entire text, mostly with the eyes on the target side, so we see many green markers. This is one translation style, with these three phases nicely separated, but other translators behave quite differently. This translator here, for instance, reads about one and a half segments and then starts typing the translation; then, in segment three, reads the source segment and types it in, and then goes on quite fluently with only very little look-ahead in the source text, immediately reading and typing out the translation, with no larger orientation phase and, it seems, no revision whatsoever.

If we zoom into this part of the process, it looks like this. Again, on the left side we have the English source text and on the right side a translation, into Spanish in this case, and all these words are aligned: the English source words and the Spanish target words are aligned at a fine-grained level. For instance, we can see that two words on the Spanish side are aligned to three words on the source side. So we see the word alignments between the source and the target text, and in the graph the translation process data. In the first couple of seconds, the blue dots indicate reading of a source-text chunk, so we can see that the translator takes in some new information from the source text and then types out a translation with the eyes on the target side, reading these words. Then something happens: you can see something was typed, then a deletion, and so on; there seem to be some typos, a small problem. At that moment the translator moves their eyes from the source text to the target text,
presumably to check what happened on the screen, and then goes on smoothly typing. Here, again, something seems to happen, but it looks a little different, because maybe the translator has an intuition that something went wrong: we see these refixations on the target side, on "las normas", over this whole stretch, and then there is a deletion. Once this deletion is done and this translation hurdle, this small problem, is solved, the translator goes on smoothly.

Okay, so we have two kinds of streams here: a stream of keystrokes, where the translator types in the translation, and a stream of gaze data, where the translator reads. These two streams are synchronized, as you can see on this slide, but they can also be fragmented, segmented, in different ways. One way of fragmenting the stream into translation units, in this case, is by looking at the pauses. Here we can see there is a keystroke pause: no keystrokes occurred for more than one second, which is the white stretch, and the black boxes indicate coherent typing, with no keystroke gap of more than one second. We see this alternation of pausing and typing, pausing and typing, which is an observation for all kinds of text production: text production occurs in the form of pausing and typing. It is assumed that in the pauses the translators, or writers, take in some new information or think about a problem, and then there are these typing bursts in which the text is produced. A sequence of such a pause and a typing burst would be a translation unit, and Fabio Alves, here in this panel, started looking into those a couple of years ago, with some interesting results.
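The pause-based segmentation just described can be sketched in a few lines. The one-second threshold is the one mentioned above; the function name and input format are made up for illustration.

```python
# Sketch: segment a keystroke stream into typing bursts separated by pauses
# of more than `threshold_ms` (one second, as in the analyses described above).
def segment_bursts(timestamps_ms, threshold_ms=1000):
    """Return lists of keystroke timestamps forming coherent typing bursts."""
    bursts, current = [], []
    for t in timestamps_ms:
        if current and t - current[-1] > threshold_ms:
            bursts.append(current)   # pause detected: close the burst
            current = []
        current.append(t)
    if current:
        bursts.append(current)
    return bursts

# Keystrokes at 0.0s-0.5s, then a 2-second pause, then 2.5s-2.9s:
print(segment_bursts([0, 200, 500, 2500, 2700, 2900]))
# → [[0, 200, 500], [2500, 2700, 2900]]
```

Each pause-plus-burst pair in this output would correspond to one translation unit in the sense just described.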
In this case there is not only a sequence of pausing and typing; we also have the gaze going from the source to the target, to the source, to the target, and this looks quite independent: while the gaze moves between source and target, the hands type out stuff, or not. So we can also chop up this data into smaller units depending on the coordination of the eyes and the hands, and we have called these activity units. Here we see a stretch of time where the eyes are on the source text only; here a stretch of time where the eyes are on the target side, an activity unit of type 2; and then different units where the eyes go back and forth between the source and the target side of the screen while typing. Activity unit type 5 is typing while reading the source text, and type 6 is typing while reading the target text. So the eyes go back and forth between source and target while typing, or while reading with no typing. This is one way of chopping up the translation process data.

Recently we have also started annotating what we call higher-level translation states. The assumption is that the translator can be in a phase of orientation, where they take in information; or in a flow state, a state of fluent production, where the text is produced; or, as here, there can be hesitation: something surprising pops up, there is an intuition that something went wrong, and they come into a state of hesitation until the hurdle is solved, and then they go on again with fluent translation or orientation. This was an annotation that we did previously.

Before going into the modeling in terms of a hierarchical architecture, I would like to show how we have previously analyzed this data. Here you can see that we looked into these
activity units, these types of units. We can make a distinction between reading the source text, reading the target text, writing, and pausing, where there is no input, and it is a completely connected graph, as you can see: a translator can switch from any one state to another. This means we can look into bigrams of these activity units to figure out what typical situations occur. Some time ago a couple of people did this kind of research, looking into bigrams of activity units and the typical transitions between them, and we figured out that expert translators, for instance, move more frequently from source-text reading to typing, so they cycle around this way. When the text becomes more difficult, we see somewhat more cycles between source-text reading and target-text reading, and in post-editing the eyes are more often on the target side together with writing, so we see the translator cycling around that way.

Okay, so this was one type of analysis that has been conducted, but as you can see, all of this is on one level: it is just a flat bigram architecture. There were other ideas, namely that there are probably multiple concurrent processes going on in the translator's mind, and this is the monitor model that we suggested around 10 years ago, Moritz Schaeffer and I, and others too. The idea is that there are two different kinds of processes: a horizontal process, in which the translator engages in automatized routines while producing a translation, and, while these translations are being produced, a monitor that looms in the background, in the back of the mind, and checks whether the produced translation works out, whether it is in line with the goals and requirements of the translation.
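The bigram analysis of activity units can be sketched as follows; the unit codes and the toy sequence are invented, and only the idea of counting transitions between unit types comes from the analyses just described.

```python
from collections import Counter

# Sketch: estimate transition probabilities between activity-unit types
# (e.g. 1 = source reading, 2 = target reading, 4 = typing) from an
# observed sequence of units, as in the bigram analyses described above.
def transition_probs(units):
    bigrams = Counter(zip(units, units[1:]))
    totals = Counter(u for u, _ in bigrams.elements())
    return {(a, b): n / totals[a] for (a, b), n in bigrams.items()}

# A toy sequence cycling mostly between source reading (1) and typing (4),
# the "expert-like" pattern mentioned above:
probs = transition_probs([1, 4, 1, 4, 2, 1, 4])
print(probs[(1, 4)])
# → 1.0
```

Comparing such transition tables across conditions (from-scratch translation vs. post-editing, easy vs. difficult text) is exactly the kind of flat, single-level analysis the hierarchical architecture below is meant to go beyond.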
If something goes wrong, the monitor interferes and stops these horizontal processes, stepping in with remedies, with changes to the output. That is the monitor model, but it is not the only model that has suggested this kind of architecture; there are actually quite a few similar suggestions. Here is one from Anthony Pym, from 2017. He talks about a cruise mode and a bump mode. The cruise mode is the default mode, where the translator usually wants to be: a flow state. They will stay in this cruise mode until something goes wrong, a bump occurs, and then the translator is pushed into the bump mode, which requires some attention. He suggests a couple of strategies for how a translator can come out of this bump mode back into cruise mode, ordered from simple to more complex, from low effort to high effort, with the translator interfering more or less in the target text. There were other suggestions too, already from 1987, talking about different processing blocks, or Gutt, about s-mode and i-mode, so you can see there are a couple of similar ideas which people have named slightly differently.

Before going into the architecture, I still want to talk a little about the translation process database, which we have used in our study. The database is an effort that, as I said before, started around 2010, so 13 years ago, and it holds data from many different studies, translation PhD projects and similar: more than 90 studies are collected there, with more than 6,000 translation sessions. They all have key-logged data, so all these translation sessions are logged with keystrokes, not only in Translog-II but with different kinds of data-acquisition
tools, and mostly they also have eye-tracking data. Together, all these data make up several hundred hours of process data, from very many different translators, with translations into more than 10 different languages, mostly from English into the various languages you can see here, and in different modes as well. Mostly this is from-scratch translation, but there is also a lot of post-editing of machine translation: the situation where the translator, instead of producing a text from scratch, already has pre-translated output from a machine translation system and post-edits it, that is, takes out the errors and so on. We also have spoken translation, interpreting, sight translation, and dictation. This is the content of the database.

The database is publicly available; it can be downloaded from SourceForge free of charge, and all of these more than 6,000 sessions are aligned on the segment and word level. We have extracted features from this data, represented in 11 tables, which I won't go into here, with more than 300 features, so it is a very rich, publicly available database. If researchers are interested, they can also have a private account with a browser interface, upload their own data, and use our interface to Jupyter, with R and Python, and some toolkits to extract all this data, from the public part and from their private data, and integrate it.

Now I would like to come back to the data that we have annotated. We have these three states: orientation, flow state, and hesitation. In an experiment, we annotated six texts with these three states. We had two annotators, and they both annotated these six texts, each text about 150 words, so altogether around a thousand words. These six texts consisted of 1,815 activity units of type 1 to 8, so we
have six different types of activity units: type 1 is reading the source text, type 2 is reading the target text, then typing only, typing with the eyes on the source text, typing with the eyes on the target text, and pausing. We have a total of 1,815 activity units, and they were annotated with the O, S, and H labels by the two annotators. You can see that in the S state, the flow state, we mostly have units that go along with typing, so types 4, 5, and 6, whereas in the hesitation phase there is a roughly 42 percent chance that we see the eyes on the source or the target side only, and the orientation phase is a little undecided here.

We also checked how well the two annotators agree, the inter-annotator agreement, and they agreed best on the flow state. You can see that the inter-annotator agreement is best for fluent translation; that was the easiest to detect, the least controversial. According to this chart, they were most confused about the orientation phase, which one annotator would sometimes consider to be an S or an H state. So much for the data.

I would now like to talk about how we get this into an architecture, and first a word about the free energy principle and what it could mean for a translator. This picture shows a translator working on a text, with the text in the environment. The idea of the free energy principle is that a translator comes into a situation with some expectations: a translator is usually told what the text is about, what the target language is, the topic, the domain of the text, and so on, so they come with a distribution of predictions. Then they make observations, and there can be a discrepancy, a larger or a smaller gap, between
the observations and the predictions. The amount of discrepancy between the expectations and the observations is the free energy, and this is what should be minimized. There are two ways to do this. One is to interfere with the text: a translator can produce a translation according to their preferences, thereby changing the environment, and then observe something that they expected. And/or they can change their internal model, which would change the predictions. Ideally, a translator is in a situation where they have predictions, and actions that go along with their preferences; they observe something that they have predicted, minimally update their beliefs, and continue like this in a flow state, smoothly predicting continuations of the translation and observing that their predictions were correct.

This can be formalized with the free energy principle as follows. The q is the distribution of the predictions, and the p is the distribution of the observations. If these two distributions are identical, the divergence is zero. The other part is the evidence: a translator produces something, there is an action, this alpha here, and the translator tries to maximize the probability of observing the translation within the given context of the source text; that is the evidence. So a translator tries to maximize the evidence, or to minimize the divergence, in order to arrive in a flow state or to maintain the flow state.
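Spelled out in symbols, a standard way of writing this (my rendering, not a formula quoted from the slides), with $q$ the distribution of predictions over states $s$, $p$ the generative distribution, and $o$ the observations:

```latex
\underbrace{F}_{\text{free energy}}
  \;=\;
  \underbrace{D_{\mathrm{KL}}\bigl[\,q(s)\,\big\|\,p(s \mid o)\,\bigr]}_{\text{divergence: predictions vs.\ observations}}
  \;-\;
  \underbrace{\ln p(o)}_{\text{log evidence}}
```

Acting on the text (the $\alpha$ just mentioned) raises the evidence term $\ln p(o)$, and updating the internal model shrinks the divergence term; both reduce $F$, which is the formal counterpart of the two ways of closing the gap described above.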
How can this now be modeled in an architecture? I would like to go through the three states of the drafting phase, orientation, flow state, and hesitation, and look at how this could be modeled in a partially observable Markov model. We can assume that there is a kind of Markov process: an observation, then a transition into a successive flow state, then a transition into a state of hesitation. In the orientation phase, a translator reads a piece of source text, and by reading it they allocate internal resources: they think about what the translation could be, whether they should first look up some words in a lexicon or other resources; they read in a chunk of source text, get the gist of it and of how to translate it, and initialize internal resources that I will look into in a moment. Then they execute: they are in the flow state and produce the translation more or less fluently. While doing this, a monitor process runs in the background, and whenever something goes wrong, or they have the intuition that something should be different, the monitor process stops this fluent translation and kicks the translator out into another state, in this case the hesitation phase, where we again have an initialization of the resources, and so on. So on this level we have one kind of Markov process.

If we zoom into this part, then, according to the POMDP formulation, we have these four resources, the A, B, C, and D matrices. The translator initializes those matrices: they first check the translation, whether there is a translation hurdle in the observation, and then initialize translation equivalences and these matrices. How would that work? If we look again at this part, a translator comes out of the orientation phase and goes into this state of
fluent translation. Here we can see again that the translator reads, or types in the translation while the eyes stay at the places where they type, in this case on the source side, then the target side, and so on. If we look into this, we have a small segment, "the breakdown of traditional norms" in English, which is translated into Spanish, and you can see that the typing of the translation follows the Spanish word order: the translator translates in the Spanish word order while going back and forth with the eyes to find what the next word to be translated is.

How could that be formalized in this formalism? We can think of it as transitions over alignment groups. The translator has initialized these alignment groups in the order of the target-language syntax, that is, in the order of the final target-language outcome, and then there are initializations of the transition matrices, the likelihood of recognition of the observations, and so on; this is the initialization of the process. In addition, we have active inference, which predicts the sequence into the future. If a translator is at a certain stage, say in alignment group 2, producing a translation for this alignment group, the idea of active inference is that at each moment there are possible paths into the future, and the translator computes what the best, the most likely or most effective, path is to proceed, that is, what the transitions in the future are. This is called the expected free energy, and it is computed based on a set of preferences; I will come back to the initialization of these matrices a little later.
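The path evaluation just described can be sketched with the usual risk-plus-ambiguity reading of expected free energy. All matrices and numbers below are toy placeholders, not values estimated from the translation data.

```python
import numpy as np

# Toy sketch of one-step expected free energy G = risk + ambiguity for a
# single action, under standard active-inference assumptions.
def expected_free_energy(qs, A, B_action, C):
    qs_next = B_action @ qs              # predicted next state under the action
    qo = A @ qs_next                     # predicted observation distribution
    risk = np.sum(qo * (np.log(qo + 1e-16) - np.log(C + 1e-16)))
    H_A = -np.sum(A * np.log(A + 1e-16), axis=0)   # per-state observation entropy
    ambiguity = H_A @ qs_next
    return risk + ambiguity

n = 4                                    # four alignment groups (toy segment)
A = np.eye(n) * 0.9 + 0.1 / n            # near-identity likelihood, columns sum to 1
C = np.array([0.05, 0.15, 0.25, 0.55])   # preference for reaching the segment end
qs = np.array([1.0, 0.0, 0.0, 0.0])      # currently in alignment group 0
advance = np.roll(np.eye(n), 1, axis=0)  # "insert": move to the next group (wraps; fine for a toy)
stay = np.eye(n)                         # "read": stay in place
print(expected_free_energy(qs, A, advance, C) <
      expected_free_energy(qs, A, stay, C))
# → True
```

Advancing toward the preferred end of the segment has lower expected free energy than staying put, which is the formal sense in which the fluent forward path is "the best path to proceed".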
To give an overview: on one level we have the alignment groups and the transitions from one alignment group to the next, and on the other level we have the active-inference part, where a kind of best path is computed based on preferences. If you plug this into our previous architecture, it could look like this: we again have the allocation of resources, the checking of how the alignment groups could be initialized, then the initialization of the matrices, and a fluent state in which the translator produces the translation while monitoring what comes, whether they are in good shape. You can see that there are two levels: a state level, and an activity level where the activities actually take place.

Now assume the translator is kicked out of the flow state and ends up in a hesitation state. Something very similar could happen: first the initialization of these parameters. If you look back at this part of the process, the assumption is that the translator has a feeling that something is wrong with "las normas", with this part of the translation, and therefore looks back very often, maybe thinks about how to go on, applies some remedy, and then goes on. How could that be represented? I think it could be like this: during this phase, a translator would consider all kinds of adjustments to the model. It could imply adjusting the transitions, so maybe a completely different translation is chosen; adjusting the translation and, accordingly, the actions, what to do next; and, accordingly, also changing the beliefs about what the translator will observe when changing all these parameters. We can plug this into the overall model again, and we see these two levels, one state level and one action level, and a switching back and forth between these two states, or better, a concurrent activation of
these two levels. To sum up this part of my talk, here is the idea of the three states. We have an orientation phase, where a translator takes in new information; that is an epistemic affordance: they adjust the internal model and check whether all the resources are available to start with the translation. Then we have fluent production, the flow state, which is a pragmatic affordance, where the actual work is done: fluent typing, where the actions that take place are in line with future observations, so these actions do not produce surprise. And then we have the third state, hesitation, where a translator is somehow kicked out of the flow state because something went wrong, and the hurdle needs to be remedied; something needs to happen in order for them to go back to the flow state.

Okay, so this is basically the idea so far, and we have now tried, in a small experiment, to initialize the parameters of this partially observable Markov decision process. I would like to talk about this part of the process, where we have the initialization of the translation equivalences, the alignment groups, during this reading, and then the A, B, C, and D matrices are initialized. I would like to talk about how this could be done, and then in particular also look at the place where we use the E parameter to supervise, to indicate, whether everything goes on according to certain habits and goals.

First, when a translator allocates these resources, we can see something like this. Assume a situation where the person translates from English into Spanish: we have an English sentence, and, in the mind, the translator allocates resources and has an idea how to translate the sentence
into this Spanish one. We have alignment groups 1 to 8 in this case, but they are already ordered in the order of the target-language syntax. The first three words are a quite monotone translation, but then there is some reordering towards the end, so the translator mentally reorders these alignment groups into the Spanish order and thereby sets up this Markov process. That would perhaps be the first step.

Then we have, in this model, a prior: this D is the prior over the first state, and this prior is that the translator starts with the first word to translate in their allocated chain. All the probability mass is associated with the first word; there is actually no choice to start somewhere else. Of course, that could be different if they break off somewhere in the middle; then the probability mass would be somewhere else. But right at the beginning, all the probability is on word zero.

Next we have the initialization of the preferences. If we assume that the segment has eight words, zero to seven, then the preference is that the translator goes through the segment from the first word to the last word, where the last word has the highest probability, so this probability distribution prefers orderings that go through the graph in the order of the alignment groups.

Then we have the initialization of the transition matrices, the B matrices, which could look like this. For the moment, in this toy implementation, we have three kinds of actions: a translator can insert something, delete something, or read. So we have three kinds of transitions from one state, one alignment group, to the next one. When inserting, the translator would like to go from one alignment group to the next one:
so, for instance, if they are in alignment group 2, after the insertion they would find themselves in alignment group 3. But it could also be that the word is long, or the translator only writes a small part of the word, so that after a typing event, after the insertion, the translator still finds themselves in alignment group 2, because they didn't finish it; some probability mass is left over for this possibility. Or the word was very short, or it was a very strong collocation and the translator wrote more than one word in one go, so that after typing the translation for alignment group 2 they find themselves in alignment group 4. So some probability mass is distributed over these different possibilities.

Something very analogous holds for deletions: when deleting, the translator goes backwards, say from alignment group 4 to alignment group 3, but it could also be that half a word is deleted, or a couple of words are deleted, so there is a probability that the translator ends up in a different alignment group; they can cycle around the same alignment group, or jump over one, and so on. And when reading, nothing happens in the output, so they would stay in the same alignment group, but there is also a probability that they actually find themselves in a different alignment group.

Then there is the likelihood of recognition, the initialization of the A parameter: the probability of the observation indicating a certain alignment group given that they are in a certain state. Assume the translator is in alignment group 1; it could well be that they have input and observe that actually word number three has already been produced, or something else; that would be captured by this likelihood.
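Putting the pieces together, the initialization just walked through might look like this in code. The structure follows the description above, but the specific probability masses (0.8/0.1/0.1 and so on) are made-up placeholders, not the values used in the experiment.

```python
import numpy as np

# Illustrative initialization of the POMDP matrices for a segment of
# n alignment groups, following the scheme described above.
n = 8

# D: prior over the starting state, all mass on the first alignment group.
D = np.zeros(n); D[0] = 1.0

# C: preferences rise monotonically toward the last alignment group.
C = np.arange(1, n + 1, dtype=float); C /= C.sum()

# B for "insert": mostly advance one group, some mass for staying (word not
# finished) and for jumping one group ahead (a strong collocation).
B_insert = np.zeros((n, n))
for j in range(n):
    B_insert[min(j + 1, n - 1), j] += 0.8
    B_insert[j, j] += 0.1
    B_insert[min(j + 2, n - 1), j] += 0.1

# B for "delete": mostly move one group back, some mass for staying or
# jumping further back.
B_delete = np.zeros((n, n))
for j in range(n):
    B_delete[max(j - 1, 0), j] += 0.8
    B_delete[j, j] += 0.1
    B_delete[max(j - 2, 0), j] += 0.1

# B for "read": nothing changes in the output, so mostly stay in place.
B_read = np.eye(n) * 0.9 + 0.1 / n

# A: near-identity likelihood, the observation usually reveals the state the
# translator is actually in, with a little mass elsewhere.
A = np.eye(n) * 0.9 + 0.1 / n

for M in (B_insert, B_delete, B_read, A):
    assert np.allclose(M.sum(axis=0), 1.0)   # columns are distributions
```

The column-stochastic check at the end is just a sanity test that every action maps each alignment group to a proper distribution over successor groups.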
probability. But according to this initialization, the highest probability is that the observation always shows them to be in the state they are actually in, although it could be otherwise. Okay, so that would be the initialization of the A, B, C, D matrices. Now I would like to talk a little about this E matrix. In this architecture we have used this E parameter, I don't know whether it was intended to be used this way, to indicate whether a translator should perhaps jump out of, leave, this flow state and go into another state, in the following way. The idea is that the future is planned: the translator thinks about what the best possible path to pursue in the future is, and if there is no clear distinction, no clear idea of what happened in the past or what comes in the future, there would be some confusion, some kind of high entropy, perhaps. This would indicate that maybe something is going wrong, and the translator would leave that flow state and go into another state. I tried several ways of how that could be computed. One way would be to look into the past: assume we are in this alignment group two here, and this pi would then predict the next state, the next alignment group, taking into account the past, what was already produced. So we have the past, activity units one and two with their respective states, and from these past observations we would predict the next top-level state. It turned out that this doesn't work very well; this probability distribution just gives us chance-level distributions. But if we also take into account the future, meaning what is anticipated, how the path continues with activity units three and four, then this gives us a probability above chance for whether a translator should stay in this flow state or
go out and do something else. And that's basically what I would like to say. I just find it very surprising that the future, the anticipated future, helps a translator decide whether they should move from a flow state into, in this case, a hesitation. Okay, my conclusion would be this. This is a very preliminary implementation. The idea is that we would still need to implement a layer zero, where the actual insertions and deletions take place; this has not been done, or only very roughly, and I didn't present it. Then we would need an evaluation. We have this large database, more than 5,000, actually more than 6,000 sessions, and we could train all these parameters on the different sessions. We would then need an evaluation method to evaluate the output of this agent and see whether, and to what extent, it can simulate the real data that we have. Then, and this is what we are trying to work on right now, there is a finer-grained classification of hesitations. A hesitation could be due, as we have seen in this particular instantiation, to the translator rereading the target word very many times, but we also have instances where the translator goes back and forth from the source to the target side, obviously comparing something, or where they reread pieces of text, and so on. All these different patterns would indicate different cognitive activities, and maybe they could give some indication of what leads to the hesitation, what happens there, and how to model this in a more fine-grained way. Then, eventually, if we have an evaluation method, we could also implement some learning, so we can learn all these A, B, C, D matrices dependent on translation styles, expertise, and text difficulty, for instance. In this database we not only have the data but also metadata, so we have, for instance, information about
expertise: who it was who produced this translation. So we could model expertise, and we would assume, or rather we already know, that expert translators produce translations in different ways than novice translators do. Then we have different translation styles. Right at the beginning I showed two styles: the planner, who first carefully reads the text and then starts typing, or the head-starter, different translation styles, and we could see how to modify these A, B, C, D matrices in order to model them. Then we have this planning horizon, this pi: how far into the future does a translator look, and what are the effects? How, for instance, does linguistic structure play a role? There is some controversy in the translation community as to whether translators take in full linguistic phrases or only look some words ahead, and to what extent linguistic structure plays a role in the planning horizon. Then of course we have this whole other work package to annotate the data. As of now we have six texts annotated, all for English into Spanish, but we have very many different languages, and we also have different translation modes. We are also considering looking into post-editing right now: how does this whole idea scale over to post-editing modes? And there are certainly many more ways to analyze the agent in the future. That's my conclusion, and my end. Thank you very much for listening, and if you have comments or questions, please go ahead. Thank you, Michael, that was awesome, that was really cool, and it's cool to see how it has developed from the many times we talked about it in the textbook group. So, to kick off the discussion, how about Moritz and/or Fabio? Please feel free to say hello, introduce yourself, and give any opening remarks or thoughts, whichever one of you wants to go first. All right, can I start
please? All right. Okay, so hello Michael, Daniel, and everyone watching us. Well, you know me, Michael; I work with translation process data, and you mentioned some of the work I did on micro and macro translation units. But what I was really curious to discuss with you is this final idea you talk about, planning horizons. In your example, I mean from a human perspective, not from the kind of implementation you're proposing here, that's how we could see one thing in relation to the other. You talked about linguistic structure for the planning horizon; people usually talk about function and content words, which is perhaps easier for people to understand, but in the work I do, and you and Moritz have already taken it up, we talk about conceptual encodings, procedural encodings, and hybrid encodings. In terms of in-threshold processing, procedural encoding would constrain in-threshold processing, and an example you gave there was 'las normas tradicionales', the traditional norms: to what extent does syntactic priming, in a way, guide this kind of planning horizon there? The other thing would be conceptual encodings; these expand in-threshold processing in the sense that they can be enriched, so just as for 'las normas tradicionales' you could have guidelines or rules or instructions, this would be open. So, to what extent would this idea of conceptual and procedural encodings constraining or expanding in-threshold processing have an effect or an impact on the planning horizon that you propose at the end? Is that something you have taken into consideration, or something one should take into consideration as far as implementing your proposal is concerned? That is it for me. Yeah, certainly. So I would assume that these function words go along with a whole frame; maybe they are not standing alone, but they modify a frame. So yes,
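One way this question could eventually be made testable connects back to the entropy signal from the talk: the flow/hesitation decision was driven by the entropy of the predicted next-state distribution, predicted from the past alone versus from the past plus the anticipated future. A minimal sketch of that computation, with made-up distributions and an assumed threshold (none of these numbers come from the actual implementation):

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    return float(-np.sum(p * np.log(p + 1e-12)))

# Hypothetical predicted next-state distributions over 8 alignment groups:
# from the past alone (near chance) vs. past plus anticipated future
# activity units (peaked on one group).
pred_past_only = np.full(8, 1 / 8)
pred_with_future = np.array([.02, .02, .80, .04, .04, .04, .02, .02])

threshold = 1.5  # assumed cut-off, for illustration only
for name, p in [("past only", pred_past_only), ("past + future", pred_with_future)]:
    h = entropy(p)
    print(f"{name}: H = {h:.2f} -> {'hesitate' if h > threshold else 'flow'}")
```

Under a sketch like this, one could then ask whether distributions predicted over procedurally versus conceptually encoded material peak differently, i.e. whether the encoding type shifts the entropy and hence the flow/hesitation signal.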
we didn't really look into this. I mean, we are getting quite far ahead; this is something for the future. But yes, I think your idea is a good one: to see to what extent function words and content words would change the horizon. I would assume that a function word would imply the horizon reaches further, right? There are some ideas from construction grammar that a whole frame must be available before a translator can actually produce a translation. So they must have a whole frame, whatever that is, whether a noun phrase or a clause or a sentence, available before they can produce a translation. I was thinking along those lines, but of course this also involves your idea about function words and content words; they are somehow encoded in these frames. So this would maybe be one idea: one could see to what extent the horizon requires complete linguistic structures, or maybe only partial ones. We are talking here about the source segment, right, the source segment. But yeah, I guess there's much one can do and experiment with, and one can also look at the interaction of this horizon with the other kinds of initializations, how they interact. I believe there's much interesting stuff to test once this agent works a little better than it does right now. Yeah, most certainly, I would be interested in following it up, because I think there is a very interesting thing in there for us to look at. Okay, yes. So, I think it was 2011 when I saw my first ever progression graph, in Copenhagen, and it was like seeing the light: suddenly there was the possibility of working with a kind of data that wasn't available before. So I think the database, which back then wasn't a database of course, it was just scattered experiments and so on, has come a long way, and I think that's great, and soon we'll have an agent, possibly. Yeah, I hope so. Yes, exactly. So I
have one comprehension question. Your slides weren't numbered, but there is one where you essentially map the data onto the free energy principle, situating the translator as an agent in this equation, right, and source text reading takes the place of perception. Yes. Maybe I'm naive in that sense, but for reading, or generally comprehension of linguistic material, there's quite a bit of evidence that comprehension engages the very same production mechanisms that are at play during language production. So I've always found it difficult, theoretically or indeed empirically, to separate source text reading processes from target text production processes. Obviously, typing is typing and reading is reading, but still, regarding the kinds of processes that take place while reading the source text: we have co-activation, that is a reality, and it is, I guess, beyond doubt that target-language aspects play a role during source text reading. And on several slides in your talk you've intimated this quite close link between the two processes. So how do you deal with that in this agent, or in this framework? Yeah, so this separation into these two factors is only one possibility for how this free energy could be reduced; both parts contribute to the reduction. So there is some kind of anticipation in the reading, that's I guess what you're saying, no? You read something that you want to read in order to reduce that free energy, and I think that's implied here in this evidence term. Yeah, so I'm not really sure what your question is, but the idea would be that we have these two kinds of possibilities. It could be factored differently, or formalized differently, but basically there are these two possibilities: either you change the world to bring it closer to your preferences, or you change your internal model to bring it closer to what you observe, to how the external world or the observations present
themselves. Of course there's a kind of dependency between the two. So there's predictive processing, I guess that's what you're talking about, right? Predictive, exactly. I mean, in your publications you also talk about adapting the world as producing a certain target text, or adapting the target text once it's written, and so on. But that very same principle, I think, can apply to the source text reading process. I don't think it's a problem for this model or anything; I just wonder how you would translate that into your implementation. If I'm correct, the action part of your model is the actual typing, right? But if reading is an action, that might change how you implement the whole thing. Or not, maybe I misunderstand. Could I give a thought on that, please? Yeah. Well, there have been many active inference models of reading, and in those settings the actions, the affordances, are the saccades. So there are saccades, at an even finer scale than the progression graphs that were shown here, to differentiate letters, to reduce uncertainty about words, differentiation of words within a sentence, and so on within a broader narrative context. That's action as oculomotor movement, and then the hidden state, the external world state, is something like the semantic position of the sentence. On the outbound side from the agent, the actions could include typing. So one could either abstract or coarse-grain away and say we're just going to accept the words as observations, or one can go to a finer scale with the actions being the oculomotor circuitry, which might be important in a case like dyslexia, or just accept the word straightforwardly as an observation. And then it's really interesting to think about what the task of translation is in that setting, and how the language models we have today help us understand this kind of semantic transposition from one language to another, which many have written about and
thought about. Like, if there's a street called Main Street, do you translate it as Main Street, or do you change Main but leave Street, or do you leave one without the other? There are a lot of degrees of freedom in the understanding of the task. But you're absolutely right that in every active inference model you have a kind of inbound observation modality and an outbound action selection approach; and then for hierarchical, metacognitive models there can be action that is internal, like attention, so those don't need to be overt activities like typing or eye movement. So in that sense, in these models, I guess reading a text is understood in terms of generating that text, really, right? The saccade could be understood as not necessarily seeking out the most rewarding section of text, but reducing uncertainty. So the agent has some generative model of text, a deeper structure of language, or even the deeper structure of the world, their core knowledge priors, and then their activity is selected to reduce their uncertainty about the text. Their generative model is kind of reconstructing the semantics of the text as they go; it's like picking up an unknown object and touching it to familiarize oneself with, in this case, its semantic shape. And that's a generative, active process. It sounds to me like a deconstructive process of reading, right? The text is not a static entity; you're actually changing it by reading it, and so on. But I'm sorry, I interrupted you, Michael. Yeah, so I think this could happen in what I call this layer zero. We didn't really address this; this would be something on a lower level still. And I'm not really sure to what extent gaze data should be modeled there, along with typing. Certainly in this layer zero we have typing activities, but to what extent we should also model the gaze activities, maybe that's possible too, and how they interfere, maybe that could be modeled inside this layer
zero, down there, leaving the stuff I have presented so far basically unaffected. But perhaps it's also the case that if we implement this and get some ideas, it could affect the other layers; this architecture could be open, and things can be changed. I don't know, one would need to try all this. But also, our data is not really suited to the granularity you suggest. If we look into saccades and exact reading patterns and so on, the data we have is too noisy for those kinds of modeling. There's often drift, as you know, in this gaze data; the gaze-to-word mapping is not always very precise, only approximate. So I'm not really sure how these things could then be modeled in this lower-level layer zero. But maybe it could also be that if we have a set of very precise data, recorded with much more controlled experiments, this could inform the kind of processes you allude to, and on top of this there could be the other processes. I don't know, but this would be even a step further. Yeah, I mean, there is definitely noise in the data, but it's certainly less than 50 percent noise, right? I mean, it's difficult to assess. With headsets coming into broader use there may be millions of eye tracking, speech recognition, and production data sets, so articulating the structure and the kinds of adjacencies of these models is very important. Also, you could model that noise in the eye tracking assignment as a kind of sensor noise; of course noise is going to degrade the quality of inference relative to a noiseless observation, but there's still vastly more information than not. Right, I'll ask some questions from the live chat. Okay, Dave Douglas asks: English default noun phrase word order is precisely the opposite of that of Spanish, which we saw in the example. A naive
natural language processing theory, transformational grammar for example, would predict that translators use a stack in their brains to push and pop phrase components. Do your data show fulfillment of transformational grammar, or does the stack hypothesis match behavior for short phrases but break down with longer phrases? Well, in this architecture we don't talk about these linguistic theories, but it could be done. That leads back a little to the same question Fabio had: what about the prediction, the pi, what is the prediction horizon, and whether we use relevance theory with function words and content words, or a construction grammar with frames that should be fulfilled, or Chomskyan machinery; all of this could be tried, and I don't have any reservations with respect to any of those. All possible. Or we could just look for n-grams, irrespective of type. But yes, here is what we often see in the data, actually: if there is a reordering like this, so assume that inside a fairly monotonous translation a reordering of the words into the target language is suddenly required, then we can see that the translator looks ahead in the source language by exactly the amount of text that amounts to that reordering. So a translator has a very good feeling for the structure and for the restructuring, and for where to get the necessary data in the source text, where the eyes need to go to get the necessary data to continue the translation. It is amazing how precisely translators can look into this data and pick out the information they need in order to continue; in experienced translators, I find that very surprising. Go ahead, please. If I may, I have one other question, about how likely you think it is that the kind of agent you
will hopefully soon produce generalizes to tasks other than translation. Daniel, you brought this to mind, but, you know, the task of translation, there is the very old quote from Jakobson, not Arnt Lykke Jakobsen but Roman Jakobson, right: is comprehending the world an act of translation? I mean, that general. But to what other tasks can you see an application of this agent and the principles that govern it? Well, first of all I'm busy trying to understand the situation right now and applying it to our data, so I don't know. Basically, all kinds of text production, first of all. But I don't know, we are talking about linguistic events here, I guess. Well, what about coding, C# coding? Ah, I was not thinking of that. Yeah, well, that was a question from the live chat; upcycle club wrote: I also have a question, would it be possible to extend or modify this model to cover transcription? So transcription, where there's an auditory modality, a different sensory input at the kind of embodied layer of the model. But after it propagates into a semantic layer, there might be something like a more general linguistic nexus, where whether it came in through the auditory sense or the visual sense, and whether it was going out through typing or through speaking, there might be some shared skills. Or it may be a hypothesis to test: how do these different language skills relate to each other? One model would be that there is a single linguistic nexus, independent of sensory input and of action output; another, extreme hypothesis would be that there are totally different, or very different, cognitive architectures apt for these different linguistic tasks. Well, we actually have spoken data in our database, and the way we have done this is: there is a spoken signal, and we have assigned a timestamp for every
word. So we have a transcription of the spoken signal into text, with a timestamp assigned to each word, and then it really doesn't matter: in this database it just looks like a typed translation, because there's a timestamp and there's a word, a sequence of characters, and whether that was initially typed by the translator or spoken doesn't matter. We know, of course, in the analysis, and the analysis is different, and there are a couple of papers we have with the spoken data, recently one paper where we have sight translation with text and sight interpretation with text, a quite complicated setup. So it's possible, and I guess the assumption is that somehow similar processes take place in the mind of a translator whether they use this mode or that mode. Well, maybe not, maybe it's different, but in the database at least it looks the same, and I don't see a problem in modeling this. But then, this architecture is not actually meant as a translation system; it is not a replacement for a machine translation system, it's not meant to be like that. It's rather meant to assess the effort, the process of the translation. In the example I gave, I assume that there is a translation available, and this agent would then just reproduce how effortful it is to produce that particular translation: where does the eye go, for example, where are the pauses, where do problems occur, where does a translator get stuck, all this. Given that there is already a translation, it would not be a new way of doing machine translation but rather orthogonal to it, looking into the process of how the translation comes into being. That's what the agent would model; at least that's currently my idea. Or both of those orthogonalities could be embodied in an architecture that does translation as a cognitive process and observes its own translation metacognitively, to track its efforts or maybe to improve its
learning. Because definitely, as a language learner, English and otherwise, when you had a policy of "oh, just plan for longer", well, tell that to a beginner language learner; I may not have the agency to simply plan longer, or I might not have the linguistic affordances to conjugate this word in this way. So how do you think this kind of model of linguistic effort helps us understand language learning? Yeah, so there you have another application, and I can completely see that fit. How do language beginners learn this, what kinds of processes go on there, how can we model this in the agent, and what does progress in language learning consist of? What kinds of parameters change as a language learner goes from the very beginning stage to a more advanced stage, and how can we model this in the A, B, C, D matrices, and also in the lookahead parameters, and so on? I think that could be a completely fine example, different from translation, of how this agent could be used. Interesting, good idea. Who would own the agent? Own? Yes, intellectual property rights; I mean, this is worth money, right? Who benefits from this? Well, that's still a long time ahead, but until now the whole database has been made available freely and openly, though on a non-commercial basis, I think that's the license. First it would need to be developed, but if it's really worth money, I'm very happy if you want to buy a share. But this is a very interesting question, and it's pragmatic, not linguistic, conceptual, or ontological: this balance or blend of, well, what is open source in the active inference ecosystem. The packages like pymdp and the analytical framework and the education, these components are open source; then how does that work with what people develop? In what ways is the active inference generative-model development ecosystem similar to or different from, say, the Linux open source development environment? Maybe some people do just want
to put things out there with a Creative Commons license, or with a purely open or a non-commercial open license, but these are major questions that come up in every area of modeling. Yeah, I think it's a big question and there's no easy answer. Well, my impression is, I mean, we're far away from that situation right now, but there could be an open version of this agent, and then if somebody wanted to use it for special purposes, there could be, I guess, many business models like this, no? They rely on open source stuff, and if they fine-tune it for special applications, then the business interests come in. I think there are many ways and many templates for how to go about those things. Yeah, it's an interesting question. Just one last point on that: there can be an open source generative model that uses proprietary data, so then you can say, this is how the saccades work, this is how the attention mechanism works, but also we have this curated data set that is our product. On the other hand, there could be an open source data set, and someone could say, we have a generative model, that's our proprietary work. Or there's the quadrant with both being proprietary, or both being open. But that's just one way to slice up the space, and people who have academic backgrounds or other backgrounds might prefer or dwell in one of those areas. I'll read another comment from the live chat and see if any of you fellows have a thought. So Dave writes, and this will be a hard one to read: speed, pitch, and volume within individual spoken words, in addition to pauses, are critical to know what exactly is being said. How do we model these features of speech and language? On one hand, language models are showing us that you get a ton of semantics out of just the string of tokens, and yet this abrades away the pauses and the tonality, features that in terms of our recognition and generation of speech can be almost dominant factors. Yeah, so if you
want me to say something with respect to this: I guess, well, if you transcribe the speech manually, there is ELAN that could be used, and one could add those kinds of additional annotations in another track, for instance intonation or speed. I don't know whether there are tools to figure this out automatically; usually speech recognition systems only have the transcription of the words but no additional information. For the experiments that we transcribed, we used an IBM speech recognition system, the only one we found that produces a timestamp for each word that was produced. That was very interesting, because we could then compute the time when a word was spoken and heard, so we could compute an ear-voice span with this kind of information. But they would not produce this kind of additional prosodic information. From the timestamp you can of course see the speed, that would be somehow implicitly encoded, I guess, but for the pitch and all that, there is no such information. I don't know whether there are speech recognition systems that produce this kind of thing, but once it is there, it's just an additional feature for a word, and it could be taken into account just like any other suprasegmental feature, like, for instance, a phrase break or something like that. It could then also be compared with the linguistic structure and the prosodic structure and so on; all these nice things could be done if these features are there. From that point of view I don't see a problem with how it could be tackled in this approach; I see the problem more in where we get this data from, and whether we have tools to produce it automatically, or maybe semi-automatically. Interesting. Yeah, Dave writes in the chat: AssemblyAI also captures word onset and termination to millisecond granularity. So that's what we use for the Active Inference Journal: when we
transcribe and curate this guest stream, or any of our streams, we have that kind of timing data, and then it's kind of like reading between the lines: we could add another track, like pitch deviation or volume deviation, and then do some kind of affective inference. Okay, what if we only saw the volume deviations, or only the pacing patterning? And then, what if we have this kind of sensor fusion with the tokens and these prosodic elements of speech, and do model comparison? As you point out, this isn't trying to reconstruct a translator in order to make a translation algorithm, but it can give us some diagnostic information on the translation, the effort. Yeah. Fabio or Moritz, any other thoughts or questions? And where does the research go, or how can people get involved if they're curious about these topics? So, I outlined in the beginning a trajectory of this translation process research, and I think this agent is really a new way of modeling it; I see much potential there to actually model this. We have analyzed the translation process data we have for many years, and I think the time is ripe now, also with all your fantastic institute and the activities that go on in this institute; it fits somehow, I think it's a timely kind of thing. That's where the research goes, at least from my side, and anyone who is interested in joining in is very welcome. There are so many open ends that could be put together; I've tried to outline a few ideas I was thinking about, and of course the code that is there can be made available to anyone who would like to continue. Yeah, Fabio or Moritz, where do you go from here? Well, I'm interested in that level zero. I like the small things, and I think they are promising, I find them interesting, and I hope to contribute on that level. Thank you. Excellent.
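The "additional track" idea from this exchange could be sketched very simply: given per-word onset/offset timestamps (as in the timestamped transcriptions described above) and a sampled pitch track, attach a mean-pitch feature to each word. The function name, sampling rate, and data layout here are all assumptions for illustration, not any actual pipeline:

```python
import numpy as np

def word_pitch_features(words, pitch, sr=100):
    """words: (token, onset_s, offset_s) triples; pitch: 1-D track sampled at sr Hz.
    Returns a mean-pitch value per word -- a toy additional 'track'."""
    feats = {}
    for token, onset, offset in words:
        lo = int(onset * sr)
        hi = max(int(offset * sr), lo + 1)  # at least one sample per word
        feats[token] = float(np.mean(pitch[lo:hi]))
    return feats

# Toy example: a flat 120 Hz pitch track, 3 s at 100 Hz, two words.
pitch = np.full(300, 120.0)
words = [("hello", 0.0, 0.5), ("world", 0.6, 1.2)]
print(word_pitch_features(words, pitch))  # {'hello': 120.0, 'world': 120.0}
```

Deviation tracks (pitch or volume relative to a speaker baseline) would just be further per-word features computed the same way, which is what would feed the sensor-fusion and model-comparison idea.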
Whereas for me, I've always been interested in human in-threshold processing, and the question of the planning horizon and the tie-in there is what really attracts me at the moment: to see what the relationship would be between the planning horizon, or implementing that in the model, and the things we know about human in-threshold processing. Awesome. A closing note from Dave: this is the kind of high-volume, highly granular data needed to scrutinize and improve the active inference and free energy principle frameworks; expert behavior especially is really valuable to study. Cool. Well, thank you all, and thank you, Michael, for this great presentation and work. Till next time. Thank you so much for having us. Thank you. Bye.