You can divide the amino acids into different classes. Here are my 20 amino acids: glycine, alanine, serine, threonine, and so on. Each one is made up of carbon, oxygen, nitrogen and hydrogen, and in some cases sulfur. This part is the main chain, which is common to all of them, and this part over here is the side chain. The side chain is what distinguishes the different amino acids, and you can characterize them by its fundamental properties. There are a few classes. One class is the hydrophobic amino acids, which do not like to interact with water, so whenever you have hydrophobic amino acids they tend to cluster together. You can have hydrophilic amino acids, which like water, and therefore they tend to stay close to the water rather than cluster with each other. Similarly, you can have charged amino acids; I will upload a list of which ones are charged and which ones are not. You can take these amino acids and join them together to form what is called the protein backbone. So here is my sequence of amino acids that forms this protein backbone. Given a sequence, the protein can fold in different ways. At the first level, which is called the secondary structure, you can have alpha helices and beta sheets, depending on the sequence of amino acids and on whether the correct hydrogen bonds can form. You can then have a higher-level structure, which is called the tertiary structure. For example, you have alpha helices over here which pack against one another; this is the hemoglobin protein. How the protein folds determines its function. For example, the folded hemoglobin chain forms a pocket where the heme group can come and bind. And then there is the quaternary structure, where different polypeptide chains come together.
So, hemoglobin contains four of these heme oxygen-binding sites, and together they make up the hemoglobin protein, and there are an immense number of such proteins. That is all I want to say as far as proteins go. You have the primary sequence of the protein; from there you get the secondary structure, which is the alpha helices and beta sheets; and from there you get the tertiary structure, which is the actual folded structure of the protein, and so on. When we talk about protein folding later in the course, we will use these properties. Protein folding is still an open question. That is one of the themes I will come back to again and again in this course: no question in biology is completely solved, and in some sense everything is an open question. That is why it is a nice field, because there are a lot of areas in which you can contribute, but nothing is completely understood. Protein folding is of course one of the canonical hard problems, but the way one at least starts to think about it is to group the amino acids according to their properties. The simplest class of models depends only on whether each amino acid is hydrophobic, hydrophilic or charged, and on that basis you build what are called the HP models of Dill and Thomas and company. I will talk about that when I get to protein folding, at least a little, though not a lot. I have another video. Here is the central dogma that I was talking about, the transcription and translation process. It is actually a fascinating thing, so whether you know it already or have forgotten it, it is nice to see it once again. Here is my DNA strand; it contains these A's, T's, G's and C's. I have this machinery of proteins that comes and binds to the DNA. It unzips the DNA, which means it opens up the double helix and exposes a single strand.
Once you have a single strand, this machinery zips along the DNA, reads off the bases, and produces the mRNA. This yellow thing is the mRNA that is being produced. Let me pause here. In the DNA you have A's, T's, G's and C's; on this yellow ribbon, which is my RNA, you have A's, U's, G's and C's. That is where the uracil comes in. So it produces a copy of this stretch of DNA in the form of an RNA molecule. Now, the DNA contains introns and exons, which means that only some parts of this sequence are required to actually produce a protein. Say I have this sequence, A G T C C G, whatever. The gene starts at, let us say, this AUG, which as I said is my start codon, and it ends somewhere at a UAA, which is one of my stop codons. This whole stretch, this yellow thing, is going to be produced as one mRNA. It turns out that not all of it is required to produce the protein. You have regions which are not required, which are called introns, and regions which are required, which are called exons. When the mRNA is first produced, it has both the introns and the exons. The cell then snips out the parts that are not required. Here, for example, the parts colored green are the ones that are not going to be required for producing the protein. The cell splices out those parts using another piece of molecular machinery, which is called the spliceosome. The spliceosome forms here, brings together the two ends of the green segment, cuts the segment out, and joins the yellow parts to one another so that the exons are connected. It is a fascinating thing how the cell does this reproducibly, again and again, and we do not understand the complete mechanics of it.
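As a toy illustration of the transcription and splicing steps just described, here is a minimal sketch. The gene sequence and the intron coordinates are made up for illustration, and the function names `transcribe` and `splice` are hypothetical; a real spliceosome recognizes sequence signals rather than being handed coordinates.

```python
def transcribe(coding_strand: str) -> str:
    """Return the RNA copy of a DNA coding strand: same letters, with T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Cut out the half-open intervals [start, end) marked as introns and join the exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # keep the final exon
    return "".join(exons)


# Hypothetical toy gene: exon + intron + exon (lower case marks the intron for readability).
dna = "ATGGCC" + "gtaagtcag" + "TGGTAA"
pre_mrna = transcribe(dna)
mature_mrna = splice(pre_mrna, introns=[(6, 15)])

print("pre-mRNA:   ", pre_mrna)
print("mature mRNA:", mature_mrna)
```

The spliced message still starts at the AUG and ends at the UAA; only the stretch marked as an intron has been removed.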
So it will splice out that green part at some point and join the exons. Yes, that is it. It does that over the whole length of the gene. A gene will typically have many introns and exons, so the cell strips out all of these green regions and gives you a continuous yellow strand, which is all the coding part of the gene; the green parts are the non-coding parts. All those green parts have now been cut out, and you have this polymer that is entirely coding, entirely yellow. This yellow polymer is then exported from the nucleus. Out in the cell cytoplasm you have another piece of machinery, which is called the ribosome. The ribosome reads this mRNA, this messenger RNA, the yellow thing that came out, and it constructs the protein. These green things are transfer RNAs, which bring in the amino acids corresponding to the codons, and as they bring in the amino acids the ribosome constructs the protein one amino acid at a time, codon by codon. This red thing that comes out is your protein, and it will ultimately fold and form the tertiary structure of the protein. So this is the central dogma: DNA gives rise to RNA, which gives rise to proteins. That is the transcription and translation process. Here is the protein molecule coming out of the ribosome, and it will fold into whatever structure it is supposed to have for its particular function. The same ribosome will read any number of different mRNAs and produce the right protein, the one that is encoded in the sequence of that mRNA. (In answer to a question about the video:) Here, in the first part or in the second part? Not here, this one. Those green things? Those are the transfer RNAs. Each of these tRNAs carries an amino acid: one will carry, let us say, a methionine, another tRNA will carry, let us say, a glycine, and so on.
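The codon-by-codon reading done by the ribosome can be sketched as a table lookup. This is only a minimal illustration: `CODON_TABLE` below contains just a handful of entries from the standard genetic code (enough for the example), and the function name `translate` and the input messages are hypothetical.

```python
# A few entries of the standard genetic code; None marks a stop codon.
CODON_TABLE = {
    "AUG": "Met",  # also the start codon
    "GCC": "Ala",
    "UGG": "Trp",
    "GGC": "Gly",
    "UAA": None,
    "UAG": None,
    "UGA": None,
}


def translate(mrna: str) -> list[str]:
    """Read an mRNA three bases at a time from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, nothing is made
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon not in CODON_TABLE:
            raise KeyError(f"codon {codon} is not in this partial table")
        amino_acid = CODON_TABLE[codon]
        if amino_acid is None:  # stop codon: release the chain
            break
        protein.append(amino_acid)  # the "delivery" by the matching tRNA
    return protein


print(translate("AUGGCCUGGUAA"))
```

Each lookup plays the role of one tRNA bringing in the amino acid that matches the current codon.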
So corresponding to each amino acid you have this package, this transfer RNA, which brings in the correct amino acid. Once it brings it into the ribosome, the ribosome takes off the amino acid, say this methionine, and releases the transfer RNA. These are your Amazon deliveries, bringing the right package so that it can be incorporated into the protein according to the sequence. Now, this part at least we understand: DNA produces RNA, which produces proteins. But as I said, the cell contains many, many genes, and genes are regions of the DNA. So what is a gene? A gene is a region of the DNA that encodes a protein. You have many thousands of genes, and it is not necessary that all of them are active at the same time, or indeed active in the same cell type. How these genes are expressed is itself a field of study. These are experiments done on DNA microarrays, where each spot in the microarray corresponds to a single gene and the intensity of the fluorescent spot tells you how active that gene is. This is a black and white microarray, which you can convert to color, and then you can repeat the measurement over a period of time and for different genes. In this long picture over here (my pointer does not work), the x axis is time and the y axis is different genes. If I remember correctly, green is strongly on and red is strongly off; I could be mistaken, it could be the other way round. You will see that a gene stays on for some time, then switches off, and then turns back on at some other time; another one stays off, switches on, then turns off again. And you can look at how these gene expression patterns are correlated in time.
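To make "correlated in time" concrete, here is a minimal sketch that computes pairwise Pearson correlations between expression time series. The three genes and their expression values are invented for illustration; real microarray data would need normalization and far more care.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equally long time series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(var_x * var_y)


# Made-up expression levels at six time points (hypothetical genes).
expression = {
    "gene_a": [0.1, 0.9, 1.0, 0.2, 0.1, 0.8],
    "gene_b": [0.2, 0.8, 0.9, 0.1, 0.2, 0.9],  # switches on and off with gene_a
    "gene_c": [0.9, 0.1, 0.2, 0.8, 0.9, 0.1],  # on when gene_a is off
}

genes = list(expression)
correlation = {
    (g, h): pearson(expression[g], expression[h]) for g in genes for h in genes
}

for g in genes:
    print(g, [round(correlation[g, h], 2) for h in genes])
```

Genes that switch together give entries close to +1, and genes that switch in opposition give entries close to -1.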
That is, which genes switch on or off together with which other genes. You can build correlation matrices among genes by looking at this gene expression data as a function of time, and that again gives you functional information about how genes cross-talk with each other and how different proteins interact with each other. If one protein interacts with another protein, those two genes might be turned on at the same time, or they may be turned off at the same time. These things are enormously complicated, and we do not completely understand this data, but at least as far as modeling purposes go, a lot of data is available. The Gene Expression Omnibus, again, is a publicly accessible genomics database where you can go in and see what gene expression data is available for different organisms and different cell types. These are slightly older numbers; I am sure they have grown in the time since I took this slide. All right. Now that I have talked a little about the biology, let me say what I hope to achieve by the end of this course. We cannot cover all of this biology, that is a given; biology is enormously complicated, and I will cover maybe 5 percent. But what I hope you will take away from this course is an idea of how, given a biological problem that you are interested in, you should go about building a model that takes into account the quantitative data that is available and makes quantitative predictions. We will do different systems, but hopefully the ideas that we develop will be applicable to whatever other system you might be interested in.
Again, the generic idea is that you try to identify which parts of the system are relevant to the question you are asking, you set aside the other parts, the other complexities, you develop a mathematical model, and through that model you generate quantitative, testable predictions. What should this model satisfy? Of course it should satisfy basic physical laws; just because biology is complicated does not mean it can violate physics. So you will have whatever conservation laws and whatever laws of thermodynamics we know. The key thing is to identify the relevant features for the question you are trying to ask. If you are asking about chromosome folding, does it really matter what exactly the sequence is? Maybe it matters, maybe it does not; I do not know. You might say that because the sequence is at a scale of about one third of a nanometer and the packaging is at a scale of a micrometer, maybe the exact sequence details do not matter. You may be right, you may be wrong; I do not know. But that is one of the key things one needs to learn: how to identify the relevant features and the relevant spatial and temporal scales. Once you have identified those, you can guess the correct physical description and write down the mathematical equations. Once you have the equations, you can generate predictions. If you are right, great; if you are wrong, you tell yourself that you learned something, which is true, and you go back and build a new and hopefully better model. Just to show you what I mean by identifying the correct scales and the correct features of the problem, let me give one or two examples. Here is DNA, which we have been talking about.
At the very microscopic level, if I zoom in closely on the DNA, what I have is the sequence, right over there at the top. If you are asking questions at a very microscopic level, such as whether a protein will come and bind here or there, that depends on the exact sequence of the DNA, so you need to know the DNA sequence to make any predictions of that sort. You can abstract a little and say: I will reduce this sequence to two types of sites, binding sites and non-binding sites for the protein, and this protein might have thousands of binding sites over the genome. So instead of saying it is A's, T's, G's and C's, I will just convert it into a binding site, a non-binding site, a binding site, and so on. That is another level of abstraction, and if you are only interested in how much of a protein will bind, maybe that is enough. If you zoom out a little more, if you are not interested in such granular microscopic details, you could treat the DNA as a charged rod. The nucleotides carry exposed negative charges, so you can treat the DNA as a negatively charged double helix. Or, because it is a very long object, you could treat it as an elastic rod, or as beads connected by springs, and then you could ask how easy it is to bend the DNA. For example, because you have to package the DNA inside the nucleus, the DNA cannot be very stiff; you need to be able to bend it like this. But at what scales can you bend it? What is the persistence length of the DNA? For questions like these, maybe you do not need to know the exact sequence. Maybe you can just model it as beads and springs, and if you get the stiffness of the springs right, you might be able to capture the correct physical properties of the DNA.
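The first abstraction step above, from letters to binding and non-binding sites, can be sketched in a few lines. This is a deliberately crude illustration: the sequence and the motif are hypothetical, the function name `binding_site_map` is made up, and the assumption that a protein binds exactly where a fixed motif occurs is a simplification of real protein-DNA recognition.

```python
def binding_site_map(sequence: str, motif: str) -> list[int]:
    """Return one flag per position: 1 if a copy of `motif` starts there, else 0."""
    sequence = sequence.upper()
    motif = motif.upper()
    return [1 if sequence.startswith(motif, i) else 0 for i in range(len(sequence))]


# Hypothetical stretch of DNA and a hypothetical recognition motif.
sequence = "GGTATAAACCGTATAAAGC"
motif = "TATAAA"

sites = binding_site_map(sequence, motif)
positions = [i for i, flag in enumerate(sites) if flag]

print(sites)
print("binding sites start at positions:", positions)
```

After this step the model no longer needs the letters at all, only the list of site positions, which is the information a "how much protein binds" question would use.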
If you zoom out even more, you can talk of the DNA as a random walk in space, and that would get some things right and some things wrong. Even so, a random walk is not a completely bad model; let me show you this. Some of you have done random walks. How does the end-to-end distance of a random walk scale with the number of steps, or with time? Anyone? If I have a random walk made up of N steps, 1, 2, 3, 4 and so on, then the mean square end-to-end distance, R squared, goes as N to the power 1. Now look at the DNA of various organisms. This is the radius of gyration, which is closely related to the end-to-end distance, and you will see that its square grows as N for very different organisms. This is a virus, lambda phage; this is yeast; this is a fly chromosome; and then there is the human chromosome over there. Connect all of these dots. If I take this DNA polymer and ask how large it is, can I get an estimate of its size given the length of the DNA, which is N? It turns out that R_g squared goes roughly as N, and that is true across organisms. So depending on the question you are interested in, different levels of modeling may be appropriate. You need not have the exact microscopic details in order to ask a very macroscopic, coarse-grained question. You need to identify what spatial scales and what temporal scales are important for the question you are asking, and then do your modeling at that scale. I showed this for DNA; it is also true for proteins. Again, at the most microscopic level you have your amino acid sequence, whatever amino acids you have.
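Going back to the random-walk scaling for a moment, the statement that R squared grows as N can be checked numerically. The sketch below is an assumption-laden toy: unit steps on a cubic lattice, no self-avoidance, and a hypothetical function name `mean_square_end_to_end`. It says nothing about real chromosomes beyond illustrating the scaling law itself.

```python
import random


def mean_square_end_to_end(n_steps: int, n_walks: int = 1000, seed: int = 0) -> float:
    """Average squared end-to-end distance over many 3D lattice random walks."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_walks):
        x = y = z = 0
        for _ in range(n_steps):
            axis = rng.randrange(3)      # pick x, y or z
            step = rng.choice((-1, 1))   # step forward or backward by one unit
            if axis == 0:
                x += step
            elif axis == 1:
                y += step
            else:
                z += step
        total += x * x + y * y + z * z
    return total / n_walks


for n in (10, 100, 1000):
    r2 = mean_square_end_to_end(n)
    print(f"N = {n:5d}   <R^2> = {r2:9.1f}   <R^2>/N = {r2 / n:.2f}")
```

If the scaling holds, the last column should stay close to 1 as N changes by two orders of magnitude. For such an ideal chain the squared radius of gyration is proportional to the same quantity (it tends to R squared over 6 for long chains), so it follows the same linear growth in N.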
You could then abstract away a little of the complexity and just say whether each amino acid is hydrophobic or hydrophilic, and then you come to the class of models that are called HP models. You could abstract away a little more and talk only about the secondary structure, whether there is an alpha helix or a beta sheet. Or you could just talk about the functionality: maybe it binds to some other protein, so it acts as a receptor or a ligand, and then I can talk just about the energetics of this ligand-receptor binding. All of these are things that we will look at in greater detail as the course goes on. But the idea again is that the most crucial thing when modeling a biological problem is to identify which features are important to that problem and which features are not. If you can identify that correctly, you can go about building the correct model for your problem. We tend to think of biophysics as a fairly modern subject, but it turns out that people have been thinking about it for a long time. For example, Schrodinger, of quantum mechanics fame, thought about these same problems in his book What Is Life? and crystallized the question: these are events in space and time which are taking place within the boundaries of a living organism, and they cannot violate the laws of physics as I know them, so how can I account for them using the tools of physics and chemistry? That is the spirit in which we will approach this course: how do I take these complex processes that are taking place inside living organisms and account for them using whatever physics knowledge I have, or whatever chemistry or maths knowledge I have?
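The hydrophobic/hydrophilic abstraction mentioned above for proteins amounts to relabeling a sequence with two letters. The sketch below shows only that relabeling step, not an HP folding model. The set `HYDROPHOBIC` is one possible assignment chosen for illustration (different hydrophobicity scales classify some residues differently), and the peptide and the function name `to_hp` are hypothetical.

```python
# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

# Assumed grouping for illustration only; borderline residues vary between scales.
HYDROPHOBIC = set("AVLIMFWC")


def to_hp(sequence: str) -> str:
    """Map an amino acid sequence to a string of H (hydrophobic) and P (polar)."""
    sequence = sequence.upper()
    unknown = set(sequence) - AMINO_ACIDS
    if unknown:
        raise ValueError(f"not standard amino acid codes: {sorted(unknown)}")
    return "".join("H" if residue in HYDROPHOBIC else "P" for residue in sequence)


peptide = "MKVLAAGDE"  # hypothetical peptide
print(peptide)
print(to_hp(peptide))
```

Everything an HP-type model knows about the chain is contained in the second printed line; the chemical identity of each residue has been discarded on purpose.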
So, that is all for today. We will reconvene here on Friday.