Okay, well, let me start. Today pretty much follows the spirit of Tuesday's lecture, drawing out some consequences. Hopefully you'll find a couple of the things we talk about puzzling and curious: maybe unanticipated properties of general complex processes that we can now discuss, that we couldn't discuss without this framework, and that are also, I think, striking and fundamental. But first some logistics: projects. We're all supposed to be thinking about projects now, and the latter half of the homework due next week is to write up a project proposal. I've outlined, maybe in extreme detail, the things I'm looking for in your proposal. This is to help you think things through and focus the project; hopefully some of the questions you're asked to address will make the project more practical. This is a class on information processing and intrinsic computation in nonlinear dynamical complex systems. That's a pretty broad topic, and in fact I'm fairly catholic about what the particular phenomenon is. But you should be thinking about some sort of nonlinear dynamical system (it could be spatially extended, which we haven't talked about too much, but that's fine) that you have some questions about, that you're interested in, and for which you at least have a working hypothesis that dynamical systems theory, information theory, and computational mechanics will give you some insight. The proposal shouldn't be too long, two or three pages, and you can address the different points with just a few sentences or a short paragraph each. First, be clear about what your goal is: what you think you'll get out of it, what you'd like to learn, a very high-level description. Then, maybe the real key step: what system are you going to study? Some dynamical system that's nonlinear and time dependent. Once you've selected it, think through the state space. How many dimensions is the state space? What's the dynamic over it? Do you already have a model for this system? And what about the system's behavior or pattern-forming properties do you find interesting or notable; why has it been studied in the past? This doesn't have to be dissertation research. Like I said before, my role is usually to pull projects back from the precipice of impossibility, or dissertation-level work, to something practical you can do in about a month, and the answers to these questions help parameterize the difficulty. Then, what dynamical properties are interesting? Are you going to study a bifurcation sequence, chaotic behavior, time-dependent pattern formation, a phase transition? Also think about what we've been talking about in terms of information processing: information generation and storage, intrinsic computation. What information-processing properties might be relevant, might help you understand how the system works or even why it's interesting? And of course the methods question: what methods will you use? For example: "I will study this set of ordinary differential equations in four dimensions, the Hénon-Heiles oscillator; I'm going to use a fourth-order Runge-Kutta or an Adams-Bashforth-Moulton integrator; and I want to estimate the Lyapunov exponents."
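Just to make that style of methods answer concrete, here's a minimal sketch (my illustration, not course code) that integrates the Lorenz equations, standing in for whatever ODE you pick, with fourth-order Runge-Kutta and estimates the largest Lyapunov exponent by the standard two-trajectory renormalization trick:

```python
import numpy as np

def lorenz(v, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = v
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(f, v, dt):
    # one fourth-order Runge-Kutta step
    k1 = f(v)
    k2 = f(v + 0.5 * dt * k1)
    k3 = f(v + 0.5 * dt * k2)
    k4 = f(v + dt * k3)
    return v + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def largest_lyapunov(v0, dt=0.01, steps=100_000, d0=1e-8):
    v = np.asarray(v0, dtype=float)
    w = v + np.array([d0, 0.0, 0.0])      # nearby companion trajectory
    total = 0.0
    for _ in range(steps):
        v = rk4_step(lorenz, v, dt)
        w = rk4_step(lorenz, w, dt)
        d = np.linalg.norm(w - v)
        total += np.log(d / d0)
        w = v + (d0 / d) * (w - v)        # renormalize the separation
    return total / (steps * dt)           # nats per unit time

print(largest_lyapunov([1.0, 1.0, 1.0]))  # roughly 0.9 for these parameters
```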
So that, the integrator plus the Lyapunov estimate, would be an example answer to the methods question. Or: "I will calculate the block entropies from the symbolic dynamics of the system." Then venture some guesses as to what you're going to find. Hopefully you're setting this up so there's an interesting informational or computational question about the system; well, how do you think it will turn out? "The system exhibits phase transitions; it's believed to be a critical phenomenon with arbitrarily long-range correlations; I think I'm going to find my estimates of the excess entropy diverging." Something like that: make a little bit of a guess, stick your neck out. Then, practically, once this is fleshed out, think about what the steps are: do a literature search, write some code, learn the Computational Mechanics in Python (CMPy) package, improve my Sage skills, run the simulator I've written or borrow somebody's simulator, do a bunch of experiments, analyze the data, write up the results. Be a little explicit about these things; they all take time. And even though it always seems a little ridiculous at the beginning, it's extremely helpful, for each step, to write down how long you think it will take. Obviously you'll come back, probably every week, and revise your estimates, but it's good to lay out the explicit steps and go: one day, two days, I'm going to spend a week on this, four days on that. Then the important step is totaling it up, at which point you realize: oh, that's 17 weeks; okay, I need to rethink this, I need to make it simpler. And again, I will be the advocate for simpler if that helps you understand the basic ideas we've been talking about. Some people have gone on to do publishable work, work that's been published or that could be but that we haven't published yet, so there's a whole range; but the first cut is about a month's worth of effort, not working full time. You have other things to do, I appreciate that. So again, the main focus or topic should be something within information processing and computation in natural or engineered dynamical systems. It can be engineered; it doesn't have to be a natural system. We often get people from computer science here who study designed systems. Definitely talk to us. The proposal you're handing in for the homework is just the opening salvo in a dialogue, and we'll be talking with you: come see me in office hours, send me an email (when I have the proposals I will send you comments on them), and talk to Ryan or Alec or Karana; they all have experience doing these things. Let's definitely chat in the next week or two and get a good project set up. And what's the result of all this? Once the project is moving along, there will be a project report you'll give. Since we have a diaspora of people in the class that includes Berkeley, I found a location halfway between Berkeley and here, in Martinez, accessible by train, by Amtrak, at a friend's very nice house. My tentative plan is a half-day workshop, maybe 10 to 12 projects presented, and then a barbecue; make a little party of it. That will be in about a month; we'll go around and do the scheduling thing, but put it in the back of your mind if you have a free day to get there. I'll also be driving down
there, so I can take a few people; the people from Berkeley can easily take BART and then Amtrak up to Martinez, not too long a ride. Very often the moment of giving your project presentation forms a great deadline, and that's when the project really gets moving; you'll have another week or so afterward to finish the report. The written report should include documentation of the code; you can hand in a written report, or some people just make a website with all that stuff attached to it. You'll see at the bottom of the course home page there's a link (I think it's the last thing on the page) to example projects going back eight years, so you can get some sense of what's been done; even the topics alone may be worth revisiting. This is just to stimulate your thinking a little, maybe to indicate how broadly you can search. Here are some example topics. Generally: pick some complex system, a cellular automaton or the driven Duffing oscillator or the van der Pol oscillator, and estimate some information quantities. That, I'd say, is a straightforward project: start with a known nonlinear system that has different behaviors, such as chaotic ones; think about the symbolic dynamics; and once you get the simulation running, you can use all the facilities built into the Sage/CMPy server to estimate entropy from sequences, entropy rates, total predictability, and to build the epsilon machine (we'll talk about how to do that practically). Hopefully you'll see that you can actually learn a lot.
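To give a flavor of what "estimate entropy from sequences" amounts to, here's a from-scratch sketch; the class server's tools do this properly, so treat this as illustration only:

```python
from collections import Counter
import math
import random

def block_entropy(symbols, L):
    """Shannon entropy (bits) of the length-L word distribution."""
    counts = Counter(tuple(symbols[i:i + L]) for i in range(len(symbols) - L + 1))
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def entropy_rate_estimate(symbols, L):
    """Two-point slope H(L) - H(L-1): an entropy-rate estimate."""
    return block_entropy(symbols, L) - block_entropy(symbols, L - 1)

# sanity check: a fair coin should come out near 1 bit per symbol
seq = [random.randint(0, 1) for _ in range(100_000)]
print(entropy_rate_estimate(seq, 6))
```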
Another, related kind of project: I'm interested in a class of systems. Say I have a two-dimensional spin system, and I happen to know that this particular version of an Ising system, as a function of coupling strength and temperature, goes through a phase transition. That specifies a whole family of processes, and once you figure out how to estimate your chosen informational or computational statistic, you can do a survey. Of course, sometimes these things require a supercomputer, but there's enough horsepower behind the servers we have, and if it were really compute intensive we could help you move it over to a much bigger server we have; if you want good final statistics, long runs, many examples, many parameter settings, we could think about that once it's all running on the base cases. We talked a bit about cellular automata, and there are certainly other kinds of spatially extended systems. I mentioned lattices of one-dimensional maps, still in many ways a research area; you could do some kind of dynamical pattern analysis there, talk about information storage and flow in space and time. I already mentioned spin systems: many 1D spin systems can be solved analytically, so it's interesting to compare simulation results to the analytical ones; for 2D spin systems there are not so many analytical results. Or we could think about big topics. Mostly we've been talking about information and intrinsic computation, but what's the relationship to energy? If you picked an example dynamical system that's well characterized by energy, an energy-conserving system, you could explore the relationship between the energy used by the system and how it does its information processing. This is very much a contemporary topic. In particular, there's been some recent experimental work on trying to verify various bounds between energy and information processing, doing experiments that are essentially Maxwell's demon: using information to rectify thermal fluctuations into usable work. It's been a long-standing problem; it goes back to James Clerk Maxwell in the mid-1800s, and it's only in the last six years that people have started doing serious experimental work to test the theories now out there. A more general question is the relationship between phase transitions and computation. Like I mentioned before, the hypothesis is that systems that go into critical states, with long-range spatial and temporal correlations, store a lot of history. There are lots of such systems: statistical mechanical systems, spin systems (Potts models are one case), and probabilistic cellular automata, another class that goes through phase transitions as a function of what you might call a temperature. Another interesting area, revisited in the last 15 years as Moore's law starts to peter out: our computers don't get faster and faster anymore because they're too hot, which takes us back to what I was mentioning before. What people do now is use lots of processors spread out so they can cool; the increasing density of information processing has reached a thermal limit. So there is a fundamental connection between information processing and physical devices, the devices we have designed to do our computational bidding, and there's an aspect of these questions that is profoundly technological and important. Moore's law failing on current CMOS technology has led folks to think about alternatives for doing computation. The one you've probably heard most about is quantum computation: rather than implementing classical logic in many degrees of freedom, which are essentially classical physical systems, you build your computer out of physical elements that are fundamentally quantum mechanical. And there are theoretical claims that this kind of quantum computer can break all of our secret codes very, very fast. As for the technical issues, right now (the different claims vary) only about a dozen quantum bits, a few bits, seem to be experimentally feasible, which doesn't allow one to do too many interesting computations, but people are working hard on it. One of the technology challenges is keeping these quantum systems isolated from the environment around them: to the extent a quantum system interacts with the environment, it gets measured by the environment and collapses down to a classical state, thereby losing all the quantum properties that give it this computational leverage. That's one of the main challenges, and a nice review of all this would be great. People have proposed doing computation with DNA; there's a fairly recent proposal to use actual DNA as a database for storing all of our digital information. People have also started to revisit analog, continuous computation, thinking about what advantages it might have. It tends to be harder to design with; when you build chips, you tweak things to make them accurate;
but oftentimes analog computers can run quite fast, and while they may not be accurate in the same sense, they can be useful devices for exploring things. People look at stochastic computation in neural systems, and at using analogs of biological evolution to solve optimization problems: I have some function I want to optimize; I think of it as what's called a fitness landscape; I have a population of candidate solutions, individuals competing, and the better they do at offering partial solutions, the better their performance, the more children they get to have in the next generation. This is an algorithmic use of evolutionary procedures to optimize, a version of stochastic optimization (a toy sketch appears at the end of this list of topics). And of course, neurobiological systems: presumably this tissue between our ears evolved to store and process information. That's a presumption; somewhat oddly, even the hypothesis that it's storing and processing information is still a little controversial these days in neurobiology. But there are a lot of interesting proposals for how to adapt information theory to work in neural systems: how do you think of a spike train coming down an axon as storing information, or as representing some pattern in an input? There's quite a lot there; another possibility is just doing a review. There are interesting notions of self-assembly in nanotechnology: people are now building one-dimensional cellular automata out of small bits of DNA that assemble themselves, essentially in a one-dimensional space-cross-time diagram, and the claim is that when this gets developed they'll be able to do spatial computations with DNA. There are other kinds of self-assembly people are studying; we'll get a couple of lectures on complex one-dimensional materials in about a month. There's a lot of interesting new work in non-equilibrium thermodynamics. Or look at chemical pattern-forming systems: some of the examples I mentioned in the first lecture of the quarter described the Belousov-Zhabotinsky reaction. Really interesting, and the same sort of mechanism has been implicated in how biological organisms form. Think of how a fertilized egg turns into a blastula and so on; at some point you have an infant, and presumably some time-dependent process of pattern formation is forming that shape. A really interesting area. Obviously bioinformatics is a huge area related to information in biomolecules; for example, people have tried to estimate epsilon machines, or more simply entropy rates and stored information, in biosequences, and there are tons of publicly available databases out there. And in the social science sphere, in economics, people are now interested in more dynamical models of economies, of the world economy or even a stock exchange: how markets get formed, how markets get cleared, how that sets prices, how there are booms and busts. There are a lot of interesting dynamical models now being pursued in economics; also great topics. Here you could just review what's currently being talked about as a complex system and its information-processing properties, or write some code and develop a simulation of some self-organizing system. I already mentioned the classic statistical mechanical models: the XY model, basically an array of little clocks that interact magnetically; the Heisenberg model, spins on a sphere; the Potts model; cellular automata; map lattices; and population dynamics, ecological models.
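As promised above, a toy sketch of that evolutionary-optimization loop; this is my own illustration on a deliberately trivial "one-max" fitness landscape, nothing from the lecture:

```python
import random

def evolve(pop_size=50, length=20, generations=100, mut_rate=0.02):
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        fitness = [sum(ind) for ind in pop]          # "one-max": count the 1s
        # fitness-proportional selection: better solutions get more children
        parents = random.choices(pop, weights=[f + 1 for f in fitness], k=pop_size)
        # mutation: flip each bit with small probability
        pop = [[bit ^ (random.random() < mut_rate) for bit in p] for p in parents]
    return max(pop, key=sum)

print(sum(evolve()))   # should approach the optimum, 20
```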
Ecological models are quite interesting, typically nonlinear, and there's some debate about whether the chaos that's been seen in ecological models actually applies to real organisms and populations in the environment; an interesting, controversial topic. Then there's evolutionary dynamics. And networks: a huge topic; in fact, network science, as it's now called, sort of dwarfs what we call complex systems. We have neural networks, the Internet, the formation of fads via Twitter, click cascades, gene expression and metabolic networks; for all of these, people are now using dynamical models. People are interested in how networks themselves, say through how the nodes are connected, the communication topology, give computation or functionality to a system or take it away; does it add robustness? Lots of questions. Also transportation networks, traffic flow, the flow of food coming into a metropolitan area, the power grid, world trade: lots of different kinds of models one could look at. Another possibility: we have a lab across the way, and you can build, if you'd like, some sort of chaotic or pattern-forming system, an electronic circuit or (I'll mention one in a second) a mechanical device. I was just at the Exploratorium; they have a number of nice little chaotic exhibits you can play with. One is a pendulum you get to drive; another is a balancing strut that you drive periodically and it goes chaotic. They often don't tell you in the little explanation that it's chaotic, but there are a number of such examples installed there over the years. If someone here were a chemist, I'd love to have someone actually implement the Belousov-Zhabotinsky pattern-forming reaction; it's really very cool to see the thing actually oscillating in front of you. It's one thing to do simulations; it's another to see the real thing. Then video feedback, something I played with many, many years ago, back when I was trying to understand pattern-forming systems. It's a pretty simple idea: if you have a camcorder, all you have to do is plug it into a TV. What does a camcorder do? It takes a light field, projects it onto an imaging device, and the imaging device and electronics turn it into an electronic signal, actually a serial signal. And what does a TV do? It takes an electronic signal and turns it into a light field. So you point the camera at the TV screen. The image coming in gets converted by the camera into an electronic signal, which goes around and becomes an image on the screen, which then, in the optical domain, gets converted by the camera back into an electronic signal; around and around you go. It's an iterated system. The basic instructions: mess with all the knobs. The important things: take the camera and physically turn it, so the image gets rotated a little every time it comes around (that kind of stabilizes what you see); then crank up the contrast and the brightness, turn the hue over, and so on. After messing around (it takes a little patience), suddenly, hands off, no more messing around, you'll see images start to dance around on the screen spontaneously. It's a wonderful pattern-forming system; in fact, you can reproduce the kinds of dynamics, behaviors, and patterns generated in most of these other pattern-forming examples using this video feedback system. I've got a setup in our lab that one can play with, if you're interested in doing an experimental project.
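A toy numerical analog of that loop (my sketch; real video feedback has far richer optics and electronics): each "frame" is the previous frame rotated a little, with the contrast cranked up and clipped, iterated like a map:

```python
import numpy as np
from scipy.ndimage import rotate

def feedback_step(frame, angle=10.0, gain=1.8):
    out = rotate(frame, angle, reshape=False, mode="wrap")   # turn the camera
    return np.clip(gain * (out - 0.5) + 0.5, 0.0, 1.0)       # crank the contrast

frame = np.random.rand(128, 128)   # "snow" on the screen to seed the loop
for _ in range(200):               # iterate and watch patterns organize
    frame = feedback_step(frame)
```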
It's very, very rich. It's also kind of nice compared to chemistry, where there are all sorts of issues with doing the experiment: the timescale of the Belousov-Zhabotinsky reaction is on the order of 5, 10, 15 minutes, kind of slow; this thing runs at 30 frames a second, so you get lots of data to sample and analyze. Anyway, lots of possibilities. Then there are more mathematical projects, and these have actually been quite successful. One is looking at how chaotic behavior reacts when you add external noise: does the noise destroy it or enhance it? You can look at how external noise affects various kinds of bifurcations, or routes to chaos. That was important, for example, in discovering that what's called weak turbulence actually had a chaotic attractor in it, because any real experimental system isn't running at 64 bits of precision like a digital simulation; it has a physical embedding, and therefore thermal agitation. At the macro scale you see the nonlinear dynamics, a few modes coupling nonlinearly, and below that level there's some thermal effect you have to take into account. Another: look at how various nonlinear systems evolve probability densities. We talked about that for one-dimensional maps, starting with initial distributions and seeing how they change, and we also looked at the dot-spreading examples with the ODE simulations of the Lorenz and Rössler attractors. You can try the same kinds of studies there, but also track, since you have a probability distribution at each moment in time, how the entropy increases in that distribution: where is the information being generated? Or, maybe a more technical thing for a mathematician: how do you approximate invariant distributions? Remember, invariant distributions are a kind of fixed point in the space of distributions; they come back to themselves under the dynamic. So that's a fixed-point equation you can try to solve, or you can look at how you converge to it when you start with distributions very far away. There are various kinds of computational-mechanics and informational analyses of the evolution of probability distributions you can do. There are also other ways of looking at chaotic behavior, more classical ways than the side of trajectories and state space, such as Fourier analysis: assume the signal x(t) coming out of the Lorenz system is best described in terms of Fourier components. We sort of did that when we did the audio demo, because our ears hear in the frequency domain, and it sounded noisy. So there's a puzzle here: why, when I do Fourier analysis on a low-dimensional system like Rössler or Lorenz, in three dimensions, does it look noisy, broadband? What's going on? Why, with this chosen analysis method, does it look like it could be an infinite-dimensional system with all frequencies excited? How does that happen? So even what might seem a very conventional kind of project has interesting questions that arise immediately. And there are other kinds of analysis, wavelet analysis for instance; Fourier and wavelet analysis are used a lot in time series, all over the place, experimentally.
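Here's a minimal way to see the puzzle just posed (my sketch; the parameters are the classic Lorenz ones): integrate the system, take the spectrum of one coordinate, and notice that it's broadband.

```python
import numpy as np

def lorenz(v):
    x, y, z = v
    return np.array([10.0 * (y - x), x * (28.0 - z) - y, x * y - (8.0 / 3.0) * z])

def rk4(v, dt):
    k1 = lorenz(v); k2 = lorenz(v + dt / 2 * k1)
    k3 = lorenz(v + dt / 2 * k2); k4 = lorenz(v + dt * k3)
    return v + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

dt, n = 0.01, 2**15
v = np.array([1.0, 1.0, 1.0])
for _ in range(1000):            # discard the transient
    v = rk4(v, dt)
xs = np.empty(n)
for i in range(n):
    v = rk4(v, dt)
    xs[i] = v[0]                 # record x(t)

power = np.abs(np.fft.rfft(xs - xs.mean()))**2
freqs = np.fft.rfftfreq(n, d=dt)
# log-plot power vs. freqs: a continuous, broadband spectrum, not a few
# sharp peaks; three deterministic dimensions that look "noisy" to Fourier
```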
So there's even a kind of meta-lesson here. Take a known nonlinear system you're familiar with, its geometry and so on; we think we understand it, we've even estimated its entropy rate and how it evolves distributions. Then look at it with these techniques. Are the results, the interpretations you get from a Fourier analysis, really describing the system itself, or are you being misled by characteristics the analysis method is adding in? Even this simple exercise is kind of sobering to think through, and it makes you a little suspicious when you start to study an experimental system: are my chosen methods giving me a view of what's really going on, or are they coloring things? So even Fourier analysis holds interesting things. People now actually use nonlinear systems for data encryption, and also for music generation and all sorts of stuff. Anyway, those are just some topics I threw up there. If one were philosophically inclined: although we're doing a more mathematical, theoretical development of the ideas of intrinsic computation, it does bring up a number of basic philosophical issues, what I sometimes call experimental epistemology. This is what I think I do: trying to understand how we come to know something. We have to make some commitment to what a pattern is and how structured things are, and some commitment to a notion of randomness, in order to start analyzing how scientists build theories, or how adaptive organisms move through their environment and survive. So this brings us right up against a number of philosophical issues. What is causality? There's a semi-raging debate (though maybe it's a small community) with people in computer science and artificial intelligence who have one notion of causality. They're trying to recapture the notion that scientists do experiments on the world: it's only through interaction with a system that you can detect the causal dependence between what you do and the structure of the system. You can't really learn a system's causal structure unless you interact with it; you intervene, you make hypotheses: if I take this apart, then that won't happen. The problem, of course, is that this flies immediately in the face of everything we've been doing in terms of looking at systems on their own terms. To be concrete: if I take the Lorenz set of three differential equations and remove one of the terms, it's not the Lorenz system anymore. The Lorenz attractor is an emergent property that takes x, y, and z interacting through the vector field, the right-hand side, to produce that big attractor. I can't go in and intervene without destroying the thing; it's like the common criticism of reductionism applied to life. There's a problem there. This has definitely been brought up again in the big-data era: how can you, from just doing data analysis, conclude that the world is structured somehow? Well, we have our answer here, the one we're developing: we claim a way of going from the raw data to building that structure. But that picture is not yet well integrated with these other views. Then, purpose: what's the role of purpose in a system, and how does purpose emerge? Also very interesting. How does a system become smart enough to have a model of the world, and then be aware that it has
a model of the world, so that it can ascribe some goal state to the world it's interacting with? And of course, randomness. The interesting thing I want to call out here: there are very interesting psychological studies of how bad humans are at detecting randomness. There's famous work, by a well-known Stanford psychologist who has since passed away, on how people tend to dismiss events that seem too ordered as not being consistent with randomness. If I flip a coin five times, it's possible that I get five heads, but people say: no, no, something is wrong there. It's fine; it'll happen about one time in 32. Same thing with coincidence, our sense of wonder: oh, you've got the same birthday, that's amazing! One first wonders where that sense comes from; it's some sort of surprise relative to a model or expectations we have. There's a very nice paper by a friend of mine at Stanford, in the math department, Persi Diaconis, analyzing the mathematical likelihood of coincidences. It's based on the birthday problem: if you have a room of 35 people, you can basically stand up and claim that two people in the room share a birthday, and you'll be correct more than 80 percent of the time (push it to 47 people and you're above 95 percent), and everyone will go: wow, how did you do that? Work out the math and it's just a likely thing, because when you make a statement like that, you're talking about an equivalence class: it could be any pair of people, and there are lots of pairs. An interesting trade-off; a nice paper. I may or may not have it on the supplementary reading list, but if someone's interested: Persi ran off to the circus in high school, developed talent as a magician, and then spent many years debunking magicians, so he's a very colorful writer. It's a great paper.
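The birthday arithmetic behind that claim, as a quick check (my numbers, not the lecture's):

```python
def birthday_collision(n, days=365):
    """Probability some pair among n people shares a birthday,
    assuming `days` equally likely birthdays."""
    p_distinct = 1.0
    for k in range(n):
        p_distinct *= (days - k) / days
    return 1.0 - p_distinct

print(birthday_collision(23))   # ~0.507: even money at just 23 people
print(birthday_collision(35))   # ~0.814
print(birthday_collision(47))   # ~0.955: where the 95 percent claim holds
```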
Then prediction, which somehow has to go along with causality: what do we mean by prediction? There's also an interesting historical era based around the mathematician Norbert Wiener in the '40s and '50s, with Claude Shannon and John von Neumann and Alan Turing and all these interesting characters of the mid-20th century developing the field Wiener called cybernetics, which of course we hear of again; sometimes I think a little of what we're doing in this class is revisiting the original motivations of cybernetics, since control and information processing in engineered and biological systems was his interest. There's a wonderful biography of that period, focused mostly on Wiener, called Dark Hero of the Information Age; a great read if you find any of these topics interesting. And there's been a sort of historical puzzle about why there are no departments of cybernetics in the West while there are in the former Eastern Bloc countries; deep political and personal peculiarities determined a lot of the history of how we deal with information and computing. Anyway, it's fun to dig into that. So, some examples from past years. Christina, last winter quarter, a year ago, actually built an analog computer in the lab, a little van der Pol oscillator. She was interested in trying to understand the role of feedback, and what it meant for a system, when you finally got all the pieces hooked together, to have an emergent property, to self-organize. It's called the van der Pol oscillator after Balthasar van der Pol, who worked through the 1920s and '30s; one of his first interests was heart arrhythmias, and he built vacuum-tube-based oscillators, whose equations we studied last quarter, essentially as an analog computer, to study the instabilities that could happen as he varied the coupling between the sinus node and the atrium. Niki Sanderson was a math student; she did a basically mathematical project, thinking of epsilon machines as generators: as you vary the transition probabilities you change the distribution of words (in particular, you can turn off a transition), and that generates a different kind of process. In a sense she was looking at the relationships among different stochastic processes, using their epsilon machine representations and seeing how different epsilon machines are related to each other as you vary the transition probabilities. So there's actually an architecture to the space of epsilon machines, or equivalently to the space of structured stochastic processes. We're still talking to her; she went off to applied math at the University of Colorado Boulder. Another student, a physicist who works with Raissa D'Souza on networks, did a project studying phase transitions, looking at computation and complexity in systems with phase transitions; he's still working with us on some results along these lines that we're going to publish. Charlie Brummett, who also works with Raissa D'Souza, looked at networks of chaotic maps: like that lattice of one-dimensional logistic maps, but now with an arbitrary interconnection topology. He adapted a notion from the literature called directed information (is information flowing from node A to node B, or from node B to node A, and how does that relate to the dynamical behavior that supports it?), and then he made his network grow based on how information was flowing on it. Really complicated; probably so complicated it would easily turn into a dissertation topic. He's off doing other things now, but it was a nice, very exploratory first cut. At root he developed a little simulation system, logistic maps on a randomly connected communication network, and went around measuring various kinds of mutual information in the local symbolic dynamics, to see whether that correlated with the behavior, to help understand why one set of nodes would cluster together and synchronize while another group wouldn't. Luke Gritsky, a mathematician, basically developed that whole picture of epsilon machines as semigroups (I had just one slide on it last time); he was interested in the algebraic analysis of these machines and other properties of that kind, and wrote a very nice paper that maybe someday we'll publish. Paul Riechers, who was actually working with Christina (I didn't mention that), looked at extending computational mechanics to spacetime: rather than a one-dimensional time series, you have space and time, and how can we talk about the storage of information and the flow of information in spacetime? And Christina's other project, which she did during the spring, was to look at viny plants in her yard, passion flowers.
They grow these little tendrils that reach from one vine to another, which lets the plant as a whole climb up walls but also makes it more robust. She ended up discovering that the literature was wrong: in the literature, the tendrils were supposedly always holding on to two things, never growing free; well, in fact a lot of them are free. So she and Paul studied the pattern-formation dynamics as the tendrils search around, coil up, and become springy; there's probably a symbolic dynamics even for this coiling, left versus right. Typically the generic state is a helical coil, but every once in a while it goes from clockwise to counterclockwise, and where it does, there's a little region with a little loop, called (the technical term) a perversion. So we studied the perversions in passion flower tendrils. No, it's a really nice project; we're still working on it. And you remember Ryan; he looked at something similar to Paul's project: calculating block entropy in one space and one time dimension (a toy patch-entropy sketch appears at the end of this project roundup). We've been talking about block entropy so far for a word, looking at how the block entropy grows as we make the word longer; well, in two dimensions you have patches. How do you even define a block entropy there? You can define a distribution over patches and do p log p, but it turns out there are interesting questions about what limits you take, and so on; he was exploring that. Ben, a current student of mine, gave a kind of book report on the state of quantum computation, comparing and contrasting it with computational mechanics. Watson looked at spiking models; he's doing that now for dissertation research. Paul, from psychology, was interested in models of cognitive adaptation and learning, and how they relate to various kinds of uncertainty in the model. And Nick Travers, a mathematician, looked at the computational mechanics of elementary cellular automata, and also tried to come up with a metric between different epsilon machines. Anyway, these and more are listed through that link at the very bottom of the home page. [Question about requirements] Yeah, the requirement is flexible; basically you just have to use the vocabulary and devices of this course, the methods. Some dynamical system, something nonlinear, with nontrivial behavior; it can be chaotic, or it could be a spatial pattern-forming system; and hopefully that behavior suggests that information production, storage, and intrinsic computation are somehow involved. Projects are typically some analysis of that kind, compared to the dynamical behavior, asking how the dynamical behavior and structure of the system support it. And don't hesitate to be too simple. You could take just the driven van der Pol oscillator and study that; all of these systems are actually quite rich. And of course, if it's so simple that you get it all figured out, well, I have lots of suggestions for making things more complex. Don't worry, that's my job; that's my day job. Okay, any other questions on projects? Okay, good.
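As promised, here's a from-scratch toy version of the patch-entropy idea from Ryan's project; my sketch, and the interesting research questions about limits are exactly what it doesn't settle:

```python
import numpy as np
from collections import Counter
import math

def patch_entropy(field, L):
    """Shannon entropy (bits) of the distribution over LxL patches."""
    rows, cols = field.shape
    counts = Counter(tuple(field[i:i + L, j:j + L].ravel())
                     for i in range(rows - L + 1)
                     for j in range(cols - L + 1))
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

field = (np.random.rand(64, 64) < 0.5).astype(int)   # i.i.d. coin-flip field
print(patch_entropy(field, 2))   # near 4 bits for 2x2 patches of fair coins
# the open questions start when you normalize by L*L and ask how the limits
# in patch size and in data size interact
```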
All right, so now back to the main program: information diagrams for processes. At the end of the information theory section of the winter class, I introduced information diagrams. But now we have a new representation; in fact, we have a new set of random variables, namely the causal states. So I want to review a little of what I introduced (that was more than a month ago) about how we use these information diagrams. They're a very handy graphical tool that helps suggest mathematical properties, or at least hypotheses. In a sense they're not absolutely necessary, because at the end of the day, if you're doing a proof or calculating something, it's symbols on paper; but they're extremely suggestive and very helpful. By way of getting there, I first want to throw up a little question that's somewhat of a harbinger of what we're going to talk about in a couple of weeks, but that also motivates the use of the information diagram and a generalization we'll do next week. The question is: looking at these stationary processes, which properties are time symmetric and which are time asymmetric? We have some process, given by a distribution over the bi-infinite chain of random variables, with the past and the future here, left- and right-going arrows. Now I want us to think about the forward process. One way to say it: when I give the process to you, I'm giving you the forward process, the given direction; call it forward. What I mean is that the random variables have an index that increases as I go left to right. Now, this could be a spatial lattice too, in which case the terminology of "forward" and "backward" would be bad, and it would just mean looking at it going from left to right. What we're going to do is contrast this with the reverse process. The basic idea is that we just scan things in the opposite order. If I give you a bunch of data, maybe on a hard disk, and I don't tell you which direction I stored it in, you don't know as you scan through the data. I tell you it's a time series, but I forget to tell you whether time increases with memory address or decreases; some computers actually store things the opposite way, with decreasing memory address, when you load things in. So it's not obvious. So the forward process is just the one we've been talking about, the arrow going to the right; the reverse process is the arrow going to the left: I start with our chain of random variables and present it in the opposite way, so the indices decrease. The idea is very simple: I give you this chain of variables, you scan it this way and you scan it that way, and the question is which properties are the same and which are different. First, we can define a quantity called the forward entropy rate, the entropy rate of the forward process. This is, you might say, prediction mode: I'm trying to predict the next symbol given the past, some arbitrarily long past. That's what we've been working with; I'm just adding some notation, little arrows, pluses and minuses. You can also ask, very naturally: what's the entropy rate of the reverse process? That's easy to think about: I've been looking at the future, I come up to the present, and I try to predict the previous symbol. So in forward mode I have a past and I
predict the next symbol; in reverse mode I've looked at all the future and I try to predict the previous one. Okay, those are all well defined. Question: in general, if I give you some complex process, which direction is most unpredictable? I'm willing to entertain all responses. [Discussion] Exactly; and how would you know, if I don't tell you? Right. And as was just brought up, there could be no natural association with the direction I give it to you; it could differ. Sometimes you'll see it larger in one direction than the other, so your choice of which to call forward is arbitrary, but there could still be differences: some processes presented forward are more predictable forward and less predictable in reverse, and other processes vice versa. This issue of time asymmetry, or symmetry, does come up in thermodynamics. So here's the punch line: they're equal. Whether I'm scanning this way and trying to predict the next symbol of the future, or retrodicting, the entropy rates are the same, for a stationary process. [Question] Right: if it's non-stationary I could make up all sorts of weird things, like a giant Markov chain that after some point settles down to a DC constant value. But for a general stationary ergodic process, forward and reverse scans of the process have the same entropy rate; they're equally unpredictable. And it's not too hard; the proof isn't too long, so we'll sketch out why. Take our definition of the forward entropy rate, the uncertainty in the next symbol given the past, and now let me be a little more careful and look at a finite-length past: I'm going to think of blocks, histories of length L, and how they predict the next symbol. This is a conditional entropy, and we went through the argument in previous lectures that I can rewrite a conditional entropy as the difference of two block entropies; remember, when we talked about block entropy growth, there was a two-point slope estimate, and that was the argument going this way. So it's the difference between the block entropies at two different lengths, and through this little information identity it's equivalent to the conditional entropy. But the process is stationary, so I can do a time shift: I take this block here, the one of length L, and shift it so that its last symbol is at time zero. The entropy of this block here is the same as the entropy of that block there; as long as it's the same length, the origin of time makes no difference, due to stationarity. So I just replace that term, and the forward entropy rate is now the difference between these two block entropies: the first is the same as before, but I've shifted the second. Having shifted it, I can use the same identity again: there's an extra random variable here, the one whose uncertainty I'm estimating when I convert back to a conditional entropy, and I'm now conditioning on the block that got shifted. All I've really done is shift this in time and, in effect, reverse it: before, I was looking at a past and trying to get X0; now I'm looking at the next block and trying to get the previous symbol, essentially time reversed. Let me change notation to make it look visually easier: we're now conditioning on what is a future, this symbol located in time with the block that follows after it, and as I let that block get longer and longer, it essentially becomes a future attached at this time index. By stationarity I can shift everything down; I can set the time origin wherever I like, so set it so that we have the previous symbol conditioned on an infinite future. And that is the entropy rate of the reverse process. This depends on stationarity, as was pointed out: just really simple information identities and a little trick of shifting the time origin. So the entropy rates are time symmetric; both directions are equally unpredictable, or at least equally predictable asymptotically, since h_μ is the asymptotic, large-conditioning-history limit.
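To collect that argument in symbols (the same identities, just written out):

$$
\begin{aligned}
H[X_0 \mid X_{-L} \dots X_{-1}]
  &= H[X_{-L} \dots X_{0}] - H[X_{-L} \dots X_{-1}] \\
  &= H[X_{-L} \dots X_{0}] - H[X_{-L+1} \dots X_{0}]
     \qquad \text{(stationarity: shift the length-$L$ block)} \\
  &= H[X_{-L} \mid X_{-L+1} \dots X_{0}] \\
  &= H[X_{0} \mid X_{1} \dots X_{L}]
     \qquad \text{(stationarity: shift everything by $L$)}.
\end{aligned}
$$

Letting $L \to \infty$, the left-hand side converges to the forward entropy rate and the last line to the reverse entropy rate, so the two are equal.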
Okay, what about the excess entropy? Well, the excess entropies of the forward and reverse processes are equal too, and this proof is trivial. The excess entropy of the forward process is the information shared between the past and the future; the excess entropy of the reverse process is the mutual information between the future and the past. I've just swapped the arguments, and we know mutual information I[X;Y] is symmetric in X and Y: inside the log we just have the joint over the marginals, and it doesn't matter how we write them down. So it follows immediately. Neither the entropy rate nor the excess entropy indicates any kind of temporal asymmetry in a process; they're equal. Okay, fair enough; I don't know whether that's intuitive or not, but there you have it, facts of the matter: any stationary process has equal entropy rate and equal excess entropy in forward and reverse scans. But what about this new construct we're working with, the epsilon machine? We're going to talk about the forward and reverse epsilon machines we get from the forward scan and the reverse scan. Formally, the forward machine, M with a forward arrow, is the forward process mod the predictive equivalence relation; for the reverse machine, you start with the reverse-scan process and do the same. So the question is how these two are related, or, phrased the other way: is a process differently structured in forward and reverse time? [Comment] You've been here before, you're saying; given the way the previous things went, it's sort of like: come on. Right, good question. Here, when I apply the relation to make M-forward's causal states, I'm grouping pasts to predict the future; and here, for the reverse machine, I'm grouping futures to predict the past. So they are switched, yes, absolutely. There's a bit of choice about what gets reversed, so sometimes the vocabulary can be a tongue twister. Okay. So: can a process be differently structured? Or, the other way to say it: do I have to use different resources, different amounts of history or future, to predict at what we just agreed is an equal entropy rate?
Right, that's the question: M-forward and M-reverse, are they the same or different? Can they be the same or different? In general, they can be different; it depends on the process. I mean, if I give you 1 1 1 1 1 1 1, okay, come on, it's all the same; so yes, they can be equal. But the surprising thing, and again, just to emphasize, now that we have this structural view of a process, this is a new question we can ask: in general, the forward and reverse machines aren't the same. I find that strange, especially in light of the result for the entropy rate and the excess entropy. And also in the numbers, the amount of information stored: you might have made the more subtle guess that there'd be some very detailed structural difference, but that the amount of memory stored internally in the causal states would be the same. Well, that can happen; the machines can differ while that scalar number stays the same. But there are genuine structural variations, so I'll do the proof by example, going back to the logistic map; that's actually how we first discovered this. It turns out that you can see it at one of the Misiurewicz parameter settings. Remember, the logistic map has a control parameter r, and r times x times one minus x is the functional form. You can find one of these Misiurewicz parameters where the maximum becomes a fixed point after four iterates. If you think of the bifurcation diagram, that's where the veils are, where the veils all cross: you follow the iterates of the maximum, and the fourth iterate becomes equal to the fifth. With this equation you get exactly the parameter setting at which that happens. If you remember, at a Misiurewicz parameter setting the distributions on the interval are very nice, so you can actually calculate things pretty accurately. So fix this parameter setting in our logistic map on the interval, and use the generating binary partition at one-half. Here's what we get. There's a notion of forward because the map iterates forward: x_{n+1} equals a function of x_n, so we're iterating forward. If I look at the forward binary sequence I get out, I get this machine, to some approximation: four states, the start state recurrent, with one transition here of unit probability to go through. And you can calculate the forward entropy rate, about 0.8 bits per symbol, and the statistical complexity, by calculating the left eigenvector of the transition matrix to get the asymptotic state probabilities and taking p log p: that's about 1.8 bits. Okay, well, here's the data; this was actually done by simulation: do a million iterations, get a million bits, throw it into the reconstruction, and calculate your equivalence classes and transition structure. Now, you can scan it in the opposite direction, really easy to do algorithmically, and this is what you get in the reverse direction. It's not the same. There are four causal states, but now there are three recurrent causal states and one transient state, which is the start state: as soon as I see a zero, I just rattle around down here. Check our theorems: go back and calculate from this, and we get the same roughly 0.8 bits per symbol for the entropy rate, the unpredictability; but now p log p over three states gives about 1.4 bits of statistical complexity.
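The C_mu calculation just described, in a few lines; the transition matrix here is a made-up three-state example, not the machine on the slide:

```python
import numpy as np

# state-to-state transition probabilities of some epsilon machine
# (rows sum to 1; this particular matrix is illustrative only)
T = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [0.5, 0.5, 0.0]])

vals, vecs = np.linalg.eig(T.T)               # left eigenvectors of T
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi /= pi.sum()                                # asymptotic state probabilities
C_mu = -np.sum(pi * np.log2(pi))              # statistical complexity, in bits
print(pi, C_mu)
```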
Right. So what does this mean? It means that in the forward direction, asymptotically, looking at the recurrent states, I have to make more distinctions, using four causal states, remembering more from the history, in order to optimally predict at this rate, which is the same in both directions, 0.8 bits per symbol. If I scan in the opposite direction, less effort is required: the machine is in some sense smaller, smaller by this stored-information measure, the statistical complexity. It's easier (I don't know what word you want to use) to retrodict the symbolic dynamics at this Misiurewicz setting than to predict it. [Question: is this related to the Lyapunov exponent?] That's a good question. The direct relationship between the two analyses is this: because this Misiurewicz parameter setting has a distribution on the interval that is well behaved, absolutely continuous with respect to Lebesgue measure, the Lyapunov exponent is equal to the entropy rate of the symbolic dynamics. And that was the same both ways, so it somehow doesn't capture the asymmetry. Now, think about this one step more and you'll say: wait a second, what does it mean, back in the original dynamical system, to go in reverse time? That's not what we're talking about here. We're just scanning a process in forward and reverse; we're not inverting the map and running it in the other direction. That's a different set of questions, and in particular, if we did that, we would change things that were unstable, spreading under forward iteration, into things that contract. We're not doing that. But it's an excellent question, and let's be careful here: we're just using the map as a generator of a binary process and asking about the statistics of the word distribution, not pulling things back to the original dynamical system. That pull-back is an interesting question in its own right. We can measure this degree of temporal asymmetry, the causal irreversibility: it's just a scalar, the difference between the forward and reverse statistical complexities. And again, whether it's positive or negative doesn't accord with any natural notion of how the data is presented to you or of the original direction of time; there are examples that go both ways. But it's a handy thing to say: if this is zero, the process is causally reversible; if not, it's causally irreversible. In the case of the Misiurewicz process we had about a third of a bit of difference in state information: you have to use more state information to predict than to retrodict. So, to summarize: the information you need to remember in order to optimally predict and in order to optimally retrodict can differ, even though the amount of unpredictability is the same in both directions. So that's one puzzle, and it brings up a couple of questions. For example, if I gave you, quote, the forward machine: well, it generates the process, and if I generate the process I can scan it in the other direction and then get the epsilon machine from that. But is there a way of going directly from the forward machine to the reverse machine, an analytical way of doing this? The answer is yes, and that's what we're going to talk about next week.
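Before moving on, a quick empirical way to see this kind of temporal asymmetry in data (my sketch; the r value is an assumption, roughly the Misiurewicz-type setting discussed, and you'd solve the lecture's condition for the exact one): compare each word's probability with its reversal's. The entropy rates match exactly either way, but an irreversible process has words w with P(w) differing from P(w reversed).

```python
from collections import Counter

r, x = 3.9277, 0.4           # r is approximate; see the caveat above
for _ in range(1000):         # discard the transient
    x = r * x * (1 - x)

sym = []
for _ in range(1_000_000):
    x = r * x * (1 - x)
    sym.append(0 if x < 0.5 else 1)    # generating binary partition at 1/2

L = 5
counts = Counter(tuple(sym[i:i + L]) for i in range(len(sym) - L + 1))
n = sum(counts.values())
for w in sorted(counts, key=counts.get, reverse=True)[:5]:
    print(w, round(counts[w] / n, 4), round(counts[tuple(reversed(w))] / n, 4))
# rows whose two columns differ are the signature of causal irreversibility
```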
There are also some other interesting properties that I'll now illustrate, getting back to the information diagrams, along these lines of temporal asymmetry and of how hidden the state information is; this is really setting us up for the techniques we'll introduce next week. Okay, so, from a month or so ago: what are information diagrams? Remember, with just two variables it's like a Venn diagram. We have a random variable X, where the size of its region represents the amount of entropy in that variable; we have another random variable Y, colored in yellow; and the joint entropy H[X,Y] is represented by summing these pieces up. We have the conditional entropy, the uncertainty in X given Y: imagine the set-theoretic operation where I take my entropy in X and subtract out Y's contribution, so this conditional entropy is the red crescent here. Same thing over here for the uncertainty in Y given X: I take the information in Y and carve out X, giving this yellow crescent. The overlap is the mutual information. And we even had a distance measure, the sum of the two conditional entropies, the two outside crescents. Anyway, that's just review of the earliest information theory we did. What's more interesting now is to think about a three-variable I-diagram, which we also covered. We have three variables X, Y, and Z, and various information measures: single-variable marginal entropies, the three-variable joint, the three-way overlap (the three-way mutual information), the various conditional entropies and conditional mutual informations; all the possible information measures. There are seven atomic, smallest pieces. So again, in the Venn diagram picture, and we're building up to doing this for processes, we have these three variables X, Y, and Z and their sizes. First the atoms: we have the uncertainty in Y conditioned on X and Z, which means we take the uncertainty in Y and carve out the Z and X pieces; same thing here, this little wedge over here is the uncertainty in X conditioned on Y and Z, subtracting out the Y and Z pieces; and so on up here. These little shield-shaped pieces in here are, first of all, mutual informations, say between Y and Z, but with the X piece subtracted out, so it's conditioned on X; same thing down here, the mutual information between X and Y with the Z piece taken out; and so on for this one. So those are the so-called atoms, the smallest pieces. And then of course the center, the three-way overlap, is the mutual information among X, Y, and Z. So you have this nice graphical picture. We also talked about what happens when there's a Markov chain relationship among the three variables: X goes to Y goes to Z. The main property of a Markov chain, described narratively, is that if I know Y, it shields Z from X. Okay, and that has a consequence down here. What's the main consequence of a Markov chain? It means the information shared between X and Z, given that I know Y, is zero. That's the informational statement of shielding: this little piece down here is zero. So in the case of a Markov chain on three variables,
We also talked about what happens when there is a Markov chain relationship among the three variables: X goes to Y goes to Z. The main property of a Markov chain, described narratively, is that knowing Y shields Z from X. That has a consequence down here: the information shared between X and Z, given that I know Y, is zero. That's the informational statement of shielding, and it means this little piece at the bottom is zero. So in the case of a Markov chain on three variables everything simplifies; it's almost like a fan-fold, where I slid the upper variable down and got rid of that little piece. One consequence is that all of the mutual information between X and Z, this overlap, is contained within Y. That makes some sense given the notion of shielding: Y has to encompass that overlap, so that fixing Y fixes all of it, leaving only the two other pieces, which don't overlap and therefore share no mutual information. In addition, when we have a Markov chain like this, all of these areas, and their associated information measures, are nonnegative. Remember that in the general three-variable case we went through the example where the third variable was the sum mod two of the other two, Z = X + Y mod 2, and we showed that the central piece can be negative. The general statement was that these measures form a signed measure, which means you have to be careful: when you draw an area in your two-dimensional diagram, you have to indicate that it might be negative. I was playing around with 3-D plots where a negative piece was a divot and a positive one sat above the plane, but you do have to be careful. We also had another representation, maybe more useful when you have a lattice of random variables: exactly the same set of atoms, when they're positive, just spread out, and you can see how Y encompasses, for example, all of the mutual information, and fixing it keeps those two atoms separate. And in one of the homeworks we did the four-variable case.

Okay, that's the review. Now let's take this back to processes themselves. The first problem is: oh my god, the process is specified by a joint distribution over a bi-infinite chain of random variables. What a horrible mess; that's an infinite-dimensional information diagram, not useful. The trick is to treat the past and the future as aggregate random variables, as if each were an individual random variable. Now we have two of them, so we can analyze a process over past and future starting with a two-variable I-diagram and then applying the properties we know to simplify. So the past is a composite random variable, same with the future. There are all sorts of information quantities we can write down, but there are really just three atoms: the two conditional uncertainties, the uncertainty in the future given the past and the uncertainty in the past given the future, and the mutual information between past and future. Easy enough, and very familiar; we've barely changed anything compared to the two-variable case, except that the overlap, the mutual information between past and future, is now the excess entropy.

Okay, so that's the background. Now we have a new random variable: which causal state are we in? We can ask how the causal state is related to the future and the past, and that means using a three-variable I-diagram. So we bring in the epsilon machine.
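Here is a short sketch of the two facts just mentioned: the parity example, where the central three-way atom comes out to minus one bit, and a Markov chain X → Y → Z, where the conditional mutual information I[X;Z|Y] vanishes. The transition probabilities in the chain are made up for illustration.

```python
from itertools import product
from math import log2

def H(dist):
    """Shannon entropy in bits of {outcome_tuple: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, idx):
    """Marginalize a joint {tuple: prob} onto the coordinates in idx."""
    out = {}
    for k, p in joint.items():
        key = tuple(k[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

# 1) Parity: X and Y independent fair coins, Z = X xor Y.
xor = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}
# Central atom I[X;Y;Z] by inclusion-exclusion: comes out to -1 bit.
I3 = (sum(H(marginal(xor, (i,))) for i in range(3))
      - sum(H(marginal(xor, pair)) for pair in ((0, 1), (0, 2), (1, 2)))
      + H(xor))
print("I[X;Y;Z] for parity:", I3)  # -1.0

# 2) A Markov chain X -> Y -> Z with made-up noisy-copy probabilities.
chain = {}
for x in (0, 1):
    for y in (0, 1):
        p_y = 0.9 if y == x else 0.1          # Y noisily copies X
        for z in (0, 1):
            p_z = 0.8 if z == y else 0.2      # Z noisily copies Y
            chain[(x, y, z)] = 0.5 * p_y * p_z
# Shielding: I[X;Z|Y] = H[X,Y] + H[Y,Z] - H[Y] - H[X,Y,Z] = 0.
I_XZ_given_Y = (H(marginal(chain, (0, 1))) + H(marginal(chain, (1, 2)))
                - H(marginal(chain, (1,))) - H(chain))
print("I[X;Z|Y] for the chain:", round(I_XZ_given_Y, 12))  # 0.0
```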
We start with the general three-variable I-diagram and get rid of whatever pieces we can. The random variables are now the past, the future, and the causal state, and again there's the requisite long list of information quantities we could write down. With three variables in the I-diagram, there are two to the three minus one, or seven, atoms, the independent pieces. So let's build this up and apply what we know.

Here's our future marginal, here's our past, and the yellow represents the state information. As drawn, I've laid out the spatial relationships as if these were arbitrary random variables, just as in the general three-variable case, but we know more. In particular, consider this brown wedge: it's the mutual information between the past and the future, the big piece, the excess entropy, conditioned on the state information. What do we know about it? How much information does the past share with the future, given that I know the causal state? None; it's zero. That's shielding, one of the first properties we proved. So that atom is zero, and we actually have a Markov chain: past goes to causal state goes to future. As we said before, one of the theorems we proved is that the causal states are Markovian in this general sense. Good; we get rid of that piece and we have the fan-fold of our three random variables.

What else can we say? Look at this next wedge: the state information with the past removed, this crescent here. How uncertain am I about the causal state if I've seen the past? Not at all; that's exactly how we built the darn things. Given a past, the epsilon function, which is a function, takes that past and tells me the state I'm in. So that atom is also zero, and we've just shown that all of the state information is contained in the past. We built the states from the past, so it's somewhat intuitive, but one does have to prove it. In addition, if the process is not trivial, the piece over here, the past with the state information removed, is positive: knowing which state I'm currently in doesn't tell me which particular past took me there, so there is some uncertainty, given the state, about which past happened. That wedge has positive area, which means the state information must be smaller to that extent: contained within the past, but strictly smaller in the general case.

So let me redraw this a little more cleanly, changing the shapes just to make the boundaries work. We've argued that the state information, the statistical complexity, is wholly contained within the past but strictly smaller in the general case.
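A small numerical check of the two vanishing atoms just argued. The golden mean process is my stand-in example here, not one from the lecture; because it is Markov of order one, a length-L block is an adequate surrogate for the semi-infinite past, and the joint over (past, state, future) built from its epsilon machine makes both H[S | Past] and the shielding atom I[Past; Future | S] come out zero (up to float round-off).

```python
from itertools import product
from math import log2

# Epsilon machine of the golden mean process ("no two 0s in a row"):
# state A emits 1 w.p. 1/2 staying in A, or 0 w.p. 1/2 going to B;
# state B emits 1 w.p. 1 going back to A.
STEP = {('A', '1'): ('A', 0.5), ('A', '0'): ('B', 0.5), ('B', '1'): ('A', 1.0)}
PI = {'A': 2 / 3, 'B': 1 / 3}   # stationary state distribution
L = 4                           # block length standing in for past/future

def emit(state, word):
    """Probability that `state` emits `word`, plus the state reached."""
    p = 1.0
    for sym in word:
        if (state, sym) not in STEP:
            return 0.0, None    # forbidden word
        state, q = STEP[(state, sym)]
        p *= q
    return p, state

def H(dist):
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, idx):
    out = {}
    for k, p in joint.items():
        key = tuple(k[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

# Joint distribution over (past word, causal state, future word):
# futures are generated from the state alone, which is the shielding claim.
joint = {}
for past in product('01', repeat=L):
    for s0, p0 in PI.items():
        p_past, s = emit(s0, past)
        if p_past == 0.0:
            continue
        for future in product('01', repeat=L):
            p_fut, _ = emit(s, future)
            if p_fut > 0.0:
                key = (past, s, future)
                joint[key] = joint.get(key, 0.0) + p0 * p_past * p_fut

# H[S | Past] = 0: the epsilon function maps each past to a single state.
h_s_given_past = H(marginal(joint, (0, 1))) - H(marginal(joint, (0,)))
# I[Past; Future | S] = 0: shielding, so Past -> S -> Future is Markov.
shielding = (H(marginal(joint, (0, 1))) + H(marginal(joint, (1, 2)))
             - H(marginal(joint, (1,))) - H(joint))
print("H[S | Past]         =", round(h_s_given_past, 12))
print("I[Past; Future | S] =", round(shielding, 12))
```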
Then there's this atom over here: the future with the past subtracted off. And of course where the past and the future overlap, that's the excess entropy. But now you start to notice the relationship to the state information: the state information contains E. In fact, as before, I can replace histories with causal states; they give me essentially the same information. So E is not only the mutual information between the past and the future, it's the mutual information between the causal state and the future. Same move over here: the uncertainty in the future given the past equals the uncertainty in the future given the causal state. So the causal states are genuinely simplifying the diagram; there's quite a bit we can say. Likewise, this green wedge is the uncertainty in the past given the causal state.

Now there's one atom left over. What is it? Symbolically you can just write it out: it's the state information with the future subtracted off, the yellow crescent. And that's sort of strange. Let me put it this way: here's a particular future; which causal state am I in? I'm uncertain to some degree, to the extent that this is a positive quantity: given that I know the future, what's my uncertainty in the current state? Strange, but it does smack of the issues we discussed before about retrodicting, the time-asymmetry discussion.

There are a few more things to say. First, I want to think about this piece and correct what was, as pointed out before, slightly sloppy notation. This is the uncertainty in the future given the causal state. These causal states are supposed to be optimally predictive, and indeed we can show that this atom, and now I'll be careful and write the actual length of the future, the uncertainty in a future block of length L conditioned on the causal state, is just L times h_mu: knowing the causal state, I predict each next step with h_mu bits of uncertainty. How do we see that? It's straightforward and not too long. First go back from this quantity to what we had in the I-diagram: replace the causal state with the past. Then expand out what's meant: this is a block random variable of L individual symbols. I've dropped some indices; this is a past that ends just before the length-L block begins, and the block immediately follows it. Now we play some information-identity games with the chain rule for conditional entropy: I pull the first symbol X_0 out of the joint block, at the cost of adding an entropy term, the uncertainty in X_0 given the past. What remains is an (L - 1)-block predicted from a new past, the old past advanced one time step (I really should put an index on it), which comes right up to the remaining block. And so on: I just keep doing this, each time pulling out a single symbol.
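The chain-rule bookkeeping just described, written out; the notation here is mine, a sketch of the argument: $\overleftarrow{X}_t$ is the semi-infinite past ending just before time $t$, $X_{0:L}$ is the length-$L$ block, and the last steps use stationarity so that every term is the single-symbol uncertainty $h_\mu$.

```latex
H\big[X_{0:L} \,\big|\, \mathcal{S}_0\big]
  = H\big[X_{0:L} \,\big|\, \overleftarrow{X}_0\big]
  = \sum_{t=0}^{L-1} H\big[X_t \,\big|\, \overleftarrow{X}_0,\, X_{0:t}\big]
  = \sum_{t=0}^{L-1} H\big[X_t \,\big|\, \overleftarrow{X}_t\big]
  = \sum_{t=0}^{L-1} h_\mu
  = L\, h_\mu .
```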
Each step is the same kind of operation, producing terms like these, and each time I readjust the time origin of the past. I probably should have kept the indices, but it's fine: the process is stationary, so each term, the next symbol conditioned on the immediately preceding past, is the same regardless of its time origin, and there are L of them. That's L times the single-step uncertainty, which is just L times h_mu.

So we know a lot about this piece, and I don't have to lie to you anymore. When I drew something like this before, it was ridiculous: for a generic stochastic process this quantity is infinite, yet I drew it as a finite bubble. Now we know how it scales: it's foliated, adding an area of h_mu with every increment of L. You should almost imagine you're looking at the block-entropy growth, foliated over the histories: as long as we know the causal state, we're in the state of optimal prediction, and each new symbol is on average exactly that informative. It's foliated by strips of width h_mu all the way out, so we needn't worry about it being infinite; we know exactly how it grows.

But what about the mystery wedge, this bizarre thing: the uncertainty in the current state given the future? We saw it on Tuesday, but now we can give it some meaning. Remember we showed that the statistical complexity is always an upper bound on the excess entropy; it was a relatively straightforward pack-and-unpack-your-mutual-information exercise. E is the mutual information between the past and the future. Unpack it: pull out the future and subtract off the uncertainty in the future given the past. Then replace the past being conditioned on with causal states. The result looks like a mutual information again: E is the mutual information between the causal states and the future, which we had already concluded. Unpack once more: that's the uncertainty in the causal states minus the uncertainty in the causal states given the future. Before, we simply observed that this second quantity is nonnegative, dropped it, and concluded that E is bounded above by the first term, the statistical complexity. That's how we got the bound between the internal state information and the observed mutual information.

But now it's explicit: this wedge is playing precisely that role. Before, we ignored the mystery wedge; now it has shown up in the middle of our information diagram and we can no longer ignore it. It controls the difference between the internal state information and E: if the wedge is zero, they coincide. So we have a criterion for when the observed information and the internal state information are equal: that conditional entropy must vanish. It's still a little weird to think about my uncertainty in the current state given the future, but writing the thing out, and forgetting the inequality, I now have an explicit expression for this wedge.
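Written out, the unpacking goes like this, in the same notation as the block above, with the wedge $H[\mathcal{S} \mid \overrightarrow{X}]$ appearing as the explicit gap between $C_\mu$ and $E$; dropping it, since it is nonnegative, recovers the bound:

```latex
E = I\big[\overleftarrow{X};\, \overrightarrow{X}\big]
  = I\big[\mathcal{S};\, \overrightarrow{X}\big]
  = H[\mathcal{S}] - H\big[\mathcal{S} \,\big|\, \overrightarrow{X}\big]
  = C_\mu - H\big[\mathcal{S} \,\big|\, \overrightarrow{X}\big]
  \;\le\; C_\mu .
```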
It's still somewhat mysterious in interpretation, but I know it equals C_mu minus E, and it controls that difference. In a sense it measures how hidden the state information is, so we call it the crypticity. You can have a process with lots of internal state information where, as we said before with the cryptographic processes, E is arbitrarily close to zero, the output looking very much like a fair coin, and yet there is plenty of internal state information. What happens is that the process, in producing its observed symbols, spreads that state information over very long time scales, and, crudely speaking, the crypticity measures that. We'll come back to crypticity a lot more; it's a central idea in answering how hidden a hidden process is.

Let me finish with one final observation, maybe not as interesting as discovering the crypticity. Remember those two outside atoms in the process information diagram, and remember the metric we had: the distance between two random variables is the sum of their two conditional entropies. That's exactly the form here, so summing the outside wedges gives this funny quantity, the distance between the past and the future. It seems almost poetic: what is the distance between the past and the future? And we can say more. From the information diagram, the uncertainty in the past given the future is the uncertainty in the past given the causal state plus the crypticity; and the uncertainty in a future of length L given the past is just L times h_mu, which we derived. So we can write the distance out explicitly: a retrodictive term, the uncertainty in the past given the current causal state; the mystery wedge, the crypticity; and a term that grows linearly with how far we look into the future.

So there's quite a bit we now understand about the information diagram, or rather, it has helped us understand these things. To recap: we have the mystery wedge, with the tentative interpretation that it measures how hidden the state information is from the observations; and we talked about retrodiction, how some things are the same in reverse time while others, in particular the structure of the epsilon machines, can differ. This is all a setup; we need more tools, and that's the role of next week. The new objects are a generalization of the notion of causal state that we call mixed states, and the techniques they support will let us take a given epsilon machine and reverse it in time, getting, in effect, the forward machine of the reverse process explicitly and analytically. Then we won't worry about reversing time anymore, because we'll have a way of going back and forth when we switch the direction of time. So that's it for today, unless you have some questions.
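For reference, the past–future distance assembled above, in the notation of the earlier blocks, with $\chi$ standing for the crypticity wedge:

```latex
d\big(\overleftarrow{X},\, \overrightarrow{X}^{\,L}\big)
  = H\big[\overleftarrow{X} \,\big|\, \overrightarrow{X}^{\,L}\big]
    + H\big[\overrightarrow{X}^{\,L} \,\big|\, \overleftarrow{X}\big]
  = H\big[\overleftarrow{X} \,\big|\, \mathcal{S}\big] + \chi + L\, h_\mu ,
\qquad
\chi \equiv H\big[\mathcal{S} \,\big|\, \overrightarrow{X}\big] .
```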