My name is Bing Zhang. I'm a professor of molecular and human genetics at Baylor College of Medicine in Houston, Texas, in the United States. I'm the PI of a lab of about 10 people, and our focus is computational biology with applications to studying cancer. My lab is supported primarily by funding from the National Institutes of Health, and we're currently applying integrative bioinformatics methods to understand cancer and to figure out better ways to treat it.

I think there are probably two aspects to this. First, people have to understand the need: why do we need to do the integration? There should be some demonstration projects to show that there is value in integrating the data. Some government efforts, for example the CPTAC effort in the U.S., are doing something like this: bringing people together and demonstrating the value of integrating genomic and proteomic data. For example, we can learn something that we wouldn't be able to learn by looking at the data separately. Then we can get some papers published, for example, and people will appreciate the value of this.

The second part is that even if you want to integrate, you need to have the ability to integrate the data, right? Currently this happens through interaction between scientists with different types of expertise. But in the future we also need to think about education. Our next generation of scientists should be able to at least understand genomic data, proteomic data, and so on, and also understand both the computational part and the experimental part, so that they have a holistic view of the biology and can do a much better job than we can today.

Well, I think basically these are basic technologies. Whole-genome sequencing studies the DNA sequence, and transcriptome sequencing studies the transcribed mRNAs.
I would also throw in the proteomics part, which studies the proteins, the translated products of the genes. People sometimes ask me: which one is the better technology? Which one do you like? To me, a better way is to integrate all these technologies. Again, we want to get a holistic view of the system, and the way to do that is by analyzing the system at different molecular levels, if you can, and then using informatics approaches to integrate the data. This will give us a better view of the system and a better understanding of the disease, if you want to study a certain type of disease.

I think cis-eQTLs and trans-eQTLs are not mutually exclusive; a lot of the time they are the same or they are connected. Let's think about a transcription factor. If there is a QTL in the promoter region of the transcription factor, it will be a cis-eQTL, because a SNP in that region will affect the expression of the transcription factor itself. But the altered expression will change the activity of the transcription factor, and that will then change a lot of downstream genes regulated by it. In that way, the SNP will also be a trans-eQTL. So a lot of the time we can use the cis-eQTL to identify genes that are potentially important, and the trans-eQTL can tell us which downstream regulatory network this SNP is acting on. So I think they can actually be connected.

I think that is actually a very fun part of our research. In bioinformatics as a research area, one thing we do is develop algorithms and methods in order to solve problems, but eventually what we want to do is solve the biological problems. We want biologists who don't have the ability to program, for example, to have access to the tools. So a lot of effort in my lab has been spent in this direction.
Basically, we develop web-based applications so that, through a very user-friendly interface, people can access a huge amount of data and also have appropriate tools that they can use directly to analyze those data. One example is the LinkedOmics tool we recently published. It has started to get many users from the cancer research community. I think it's a really interesting part of the research to make your tools or methods directly available to biologists.

Actually, of the protein interaction data we can get today from the public data repositories, I would think most are static or more stable interactions rather than transient interactions. I would hope that more experiments can be done in this area to help us identify the transient interactions and the condition-specific interactions. Then we can better annotate the network and the interactions within it, and use the right network to interpret our data in the right conditions. So it's not that we already have a lot of transient interactions in the data; I think we just have very few of those interactions and we need to add more. But of course we also need good annotations in the database to let people know about that, so that they can identify the right interaction network for the specific condition they're interested in.

I think the primary protein sequence can provide a lot of information that you can use to predict protein-protein interactions, but studies have shown that if you only use that information you won't be able to reach very high prediction accuracy.
Leveraging other types of data can certainly improve the prediction accuracy. I also want to mention that technologies like deep learning, the more advanced machine learning technologies that are available today because of both software and hardware improvements, now enable us to better predict protein interactions, for example based on the primary sequence. But still, if we can incorporate other types of data, that can certainly improve the prediction accuracy. Especially when you want to predict condition-specific interactions, I don't think the primary sequence can give you a lot of information, and for that part specifically you want to incorporate more information.

That is true of most databases. Even in the protein interaction databases, the entities are actually genes: when we talk about protein-protein interaction databases, we're actually talking about gene-level data. I think the challenge is not that the database itself is gene-centric; it's that we just don't have enough information to distinguish the functions, interactions, and characteristics of the individual protein isoforms.
Again, in the future I hope the protein databases can be protein-isoform-centric, because different isoforms can actually have very different functions. But in order to achieve that, the sequence coverage of proteomics experiments, for example, has to be improved a lot. Currently, if you do a mass-spec-based experiment, the sequence coverage is actually pretty low; it could be less than 10 percent, and with that you won't be able to distinguish the different protein isoforms very well. That's why the interpretation is usually done at the gene level. It's not difficult to convert protein-level data to the gene level, because it's simply aggregated to the gene level, right? But I think it's more difficult to get the detailed data at the protein isoform level and to build databases centered around protein isoforms rather than genes.

My name is Karsten Krug. I'm a computational scientist at the Proteomics Platform of the Broad Institute of MIT and Harvard, and I'm interested in how we can integrate large-scale data sets that have been acquired using different omics technologies.

In a gene-centric analysis we study the gene or the gene product itself, meaning we want to compare its expression between two phenotypes. Let's say, in the context of cancer, we want to know whether a gene is specifically upregulated in a cancer compared to normal tissue, which would potentially introduce a new target for this specific cancer. If we study the entirety of all genes or gene products in the cell, we have to perform a statistical test that tells us which genes or proteins differ significantly between tumor and normal samples, and we end up with long lists of differentially expressed genes, which are sometimes very difficult to interpret.
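The gene-by-gene test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used by either speaker: the choice of an exact permutation test on the difference in means, the Benjamini-Hochberg adjustment for testing many genes at once, the function names, and the toy expression values (`GENE_A`, `GENE_B`) are all hypothetical and chosen for brevity.

```python
from itertools import combinations
from statistics import mean


def permutation_pvalue(tumor, normal):
    """Exact two-sided permutation p-value for the difference in group means."""
    pooled = list(tumor) + list(normal)
    n_tumor, n_total = len(tumor), len(pooled)
    observed = abs(mean(tumor) - mean(normal))
    total = sum(pooled)
    hits = count = 0
    # Enumerate every way of relabelling n_tumor samples as "tumor".
    for idx in combinations(range(n_total), n_tumor):
        group_sum = sum(pooled[i] for i in idx)
        diff = abs(group_sum / n_tumor - (total - group_sum) / (n_total - n_tumor))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
        count += 1
    return hits / count


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical log-scale expression values: (tumor samples, normal samples).
expression = {
    "GENE_A": ([8.1, 8.4, 7.9, 8.6], [5.0, 5.2, 4.8, 5.1]),
    "GENE_B": ([6.0, 6.1, 5.9, 6.2], [6.1, 5.9, 6.0, 6.2]),
}

genes = list(expression)
pvalues = [permutation_pvalue(*expression[g]) for g in genes]
adjusted = benjamini_hochberg(pvalues)
for gene, p, q in zip(genes, pvalues, adjusted):
    print(f"{gene}\tp={p:.3f}\tadjusted={q:.3f}")
```

With thousands of genes instead of two, the same loop yields the long list of candidates the speaker mentions; exhaustive enumeration is only practical for small sample sizes, so a real analysis would more likely rely on an established statistics package.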
So in order to better understand what is happening, for example in tumors, we would typically map these proteins or genes to pathways to see what is dysregulated in these tumors on a molecular level. There are several different databases that facilitate this kind of analysis: there is the Reactome database, there is the KEGG database, and there is the Molecular Signatures Database, or MSigDB.

If you ask me whether gene-centric analysis or pathway-centric analysis is better, I personally think that both types of analysis are equally important. Sometimes a gene-centric analysis alone cannot give you the correct answer, because the gene that you are interested in is not necessarily statistically significant, or it's only a marginal case. But if you look at a specific pathway and several members of this pathway are going in the same direction in the tumor sample, for example, this gives you more evidence that the pathway is regulated.

We know that many, many mutations, millions of mutations, have been associated with certain diseases and phenotypes, but only for very few do we actually know the molecular consequences that these mutations introduce. If a mutation affects a coding region of a gene, it might be non-synonymous, meaning it can introduce an amino acid change in the corresponding protein sequence. And if we think about post-translational modifications like phosphorylation of serines, threonines and tyrosines, a mutation can actually affect these phosphorylation sites. Serines, threonines and tyrosines are very abundant in the human proteome, and therefore it's very likely that these amino acids are affected by mutations.
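As a rough illustration of the point about serines, threonines and tyrosines, the sketch below counts how much of a sequence consists of these residues and checks whether a given amino acid substitution replaces one of them with a residue that cannot be phosphorylated. The function names and the nine-residue toy sequence are made up for this example, and the check only looks at residue identity: being S, T or Y does not by itself mean the residue is actually phosphorylated in a cell.

```python
PHOSPHO_ACCEPTORS = frozenset("STY")  # serine, threonine, tyrosine


def acceptor_fraction(protein_seq):
    """Fraction of residues in the sequence that are S, T or Y."""
    if not protein_seq:
        raise ValueError("empty sequence")
    return sum(res in PHOSPHO_ACCEPTORS for res in protein_seq) / len(protein_seq)


def loses_phospho_acceptor(protein_seq, position, new_residue):
    """True if the substitution at a 1-based position turns S/T/Y into another residue."""
    if not 1 <= position <= len(protein_seq):
        raise ValueError("position outside the sequence")
    old_residue = protein_seq[position - 1]
    return old_residue in PHOSPHO_ACCEPTORS and new_residue not in PHOSPHO_ACCEPTORS


# Hypothetical toy sequence, not a real protein.
toy_seq = "MASPTKYLG"

print(f"S/T/Y fraction: {acceptor_fraction(toy_seq):.2f}")
print("S3A removes an acceptor:", loses_phospho_acceptor(toy_seq, 3, "A"))
print("S3T removes an acceptor:", loses_phospho_acceptor(toy_seq, 3, "T"))
print("K6R removes an acceptor:", loses_phospho_acceptor(toy_seq, 6, "R"))
```

A serine-to-threonine change is reported as keeping an acceptor residue, although whether the same kinase still phosphorylates it is a separate question that this check does not address.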
Probably the simplest case of the impact of a mutation on phosphorylation sites is that a phosphorylation site gets mutated into a different amino acid, meaning it cannot be phosphorylated anymore. For example, a serine that is usually phosphorylated is mutated into a different amino acid and can no longer be phosphorylated. It's very crucial to understand what kind of downstream effects these kinds of mutations introduce.

That is probably the simplest form of such events. Other forms do not necessarily affect the PTM site directly but can happen in very close proximity. For phosphorylation, we know very well the enzymes that are responsible: kinases and phosphatases. Kinases, which phosphorylate their substrates, recognize a very specific stretch of amino acids that surrounds the PTM site. This is one of many mechanisms by which a kinase recognizes its substrates, and a kinase usually has between a couple and a hundred substrates. So if a mutation changes the amino acid composition of these flanking sequences, as we call them, around the PTM site, it has a direct effect on the kinase-substrate binding specificity. It might happen that a phosphosite that used to be phosphorylated by a specific kinase, like AKT1 for example, can no longer be phosphorylated by this particular kinase because the kinase cannot recognize its substrate site anymore.

Another, more complex example would be if the kinase recognition motif changes from kinase A to kinase B. In the wild-type, unmutated form, the phosphosite was phosphorylated by a certain kinase, and after the mutation the kinase recognition motif fits better to another kinase, which can now go and phosphorylate this site. All of these events are probably not well understood as of now, and I think it's very important to study these kinds of events in more detail.

I think phosphorylation is by far the best-studied post-translational modification to date, because we have the methods to study phosphorylation on a large scale. There are other post-translational modifications, like ubiquitination or lysine acetylation, for which we now also have the methods to study them at large scale, including in patient samples. Of course, we can very easily study whether a mutation directly affects these PTM sites, for example whether these lysines are replaced by another amino acid so that they cannot be ubiquitinated or acetylated anymore. But I think we still have very limited knowledge about the specific binding motifs of a lot of these acetyltransferases, for example. There are specific examples where we know the sequence motif, when we talk about histone modifications for example, but our knowledge is still very limited in this regard.

I think what is very important for a biologist is to be able to at least partially analyze their own data. Biology has moved away from hypothesis-driven, very targeted types of analysis or experiments toward data-driven, omics-type experiments, so the amount of data is on a completely different scale compared to 10 or 15 years ago. Even as a wet-lab biologist, it's very important to be able to analyze your own data, so you need to have some computational skills. I think an easy way to get started with any computational or data-science-driven analysis is a scripting language like R or Python. Both of these languages are very popular and very heavily used in data science in general, but also in computational biology in particular. I can only highly recommend that any student who studies biology get some skills in R or Python.
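To give a sense of how little scripting a first analysis can take, here is a small Python sketch of the flanking-sequence idea discussed earlier: it checks whether a phosphosite sits in a kinase recognition motif before and after a substitution in its neighborhood. Everything specific in it is an assumption made for illustration: the pattern R-x-R-x-x-[S/T] is a commonly cited, heavily simplified consensus for AKT-family kinases, the peptide is a made-up toy sequence, and the function names are hypothetical. Real kinase specificity is not captured by a single regular expression.

```python
import re

# Simplified AKT-like consensus: R-x-R-x-x-[S/T], with the phosphosite last.
AKT_LIKE_MOTIF = re.compile(r"R.R..[ST]")
MOTIF_LENGTH = 6


def site_matches_motif(protein_seq, site_position):
    """True if the residue at the 1-based site_position ends a motif match."""
    start = site_position - MOTIF_LENGTH
    if start < 0 or site_position > len(protein_seq):
        return False
    window = protein_seq[start:site_position]
    return AKT_LIKE_MOTIF.fullmatch(window) is not None


def apply_substitution(protein_seq, position, new_residue):
    """Return the sequence with the residue at the 1-based position replaced."""
    if not 1 <= position <= len(protein_seq):
        raise ValueError("position outside the sequence")
    return protein_seq[:position - 1] + new_residue + protein_seq[position:]


# Hypothetical toy peptide; the candidate phosphosite is the serine at position 8.
wild_type = "GGRARSLSAGG"
site = 8

flank_mutant = apply_substitution(wild_type, 5, "Q")    # R5Q, inside the motif
distal_mutant = apply_substitution(wild_type, 9, "V")   # A9V, outside the motif

print("wild type matches motif:", site_matches_motif(wild_type, site))
print("R5Q mutant matches motif:", site_matches_motif(flank_mutant, site))
print("A9V mutant matches motif:", site_matches_motif(distal_mutant, site))
```

The serine itself is untouched in both mutants; only the substitution inside the flanking sequence removes the motif match, which is the situation described above where a kinase can no longer recognize its substrate site.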