So basically my lab is a bioinformatics lab that does two things: one is analyzing next-gen sequencing data and the other is analyzing microarray data. Historically, I got my PhD developing some of the core methods used in analyzing microarray data, so it's something I've been doing for a lot longer than I want to remember, 12 or 13 years now.
To some extent microarrays are a technology that's not particularly evolving anymore. They've reached a very stable, very strong point, and people often ask, why do we still use microarrays? Actually, within my team we use microarrays every day. They're a quality control for next-gen sequencing studies. They're a cheap way of doing a large number of samples. They're a good sanity check, and they're more useful for certain types of data. So in some fundamental way they're important because they're still in major use, and they're important because they provide the origin of most of the ways in which we analyze sequencing data. There's a lot of flow-through: how we work with microarray data has informed how we work with sequencing data.
I'm going to walk you through that today. Mostly I'm going to talk about what you should be thinking about in the context of a microarray experiment and what it actually looks like. You start off with a platform and the samples that you want to run, and then you ask what you need to do to actually get the useful biology out of that. We'll flow through that over the course of the day. So we're basically going to be talking for maybe an hour or two, and at the end of that you will do a guided analysis of a microarray experiment. The first part might take a little longer than two hours, in which case we'll come back afterwards, do a little bit more, and then move into the workshop.
The key things that I'd like you to get out of this: I want there to be an understanding of the data itself. Where does it come from, what are the types of microarrays, and why do they have error? Why is this not a perfect measurement technology? What are the things that go wrong? That should allow you to appreciate the overall pipeline that we use to analyze the data, and then in the practical session you'll learn how to input the data and do the basic pre-processing.
We'll start off with a question for you. Most commonly, microarrays are described as measuring something about RNA, but what do they actually measure? What about cDNA that has been reverse transcribed from mRNA, detected by hybridization? Yeah. Anybody want to highlight anything else?
The most important thing to realize is that ultimately an array doesn't measure anything on an absolute basis. It gives you a relative abundance. A microarray can never compare two different genes to one another, and it can never compare different regions of the same gene in a quantitative way. It's only able to look at the same point across different samples. The other part of it is that they measure RNA abundance only in some way. They measure it for this transcript and this region of the gene. They wouldn't necessarily measure different splice forms or different variants, and they might not even measure what they're meant to measure.
So, outlining what microarrays are in a bit of detail: the definition of a microarray is here. It's a multiplex technology containing thousands of spots, each of which contains oligonucleotides. The oligonucleotide length can vary quite a bit, and each of those spots contains picomoles of the oligonucleotide for a specific sequence. Most classically we use it to quantitate RNA, but it's equally useful for DNA, and the applications are very varied. Normally we use them in a hypothesis-generating mode.
So the idea would be to identify genes or features that show some interesting characteristics over a set of experimental conditions, and that would allow you to take those into further follow-up studies to understand mechanistic detail. From a paper perspective, you'll normally see that the microarray screen is Figure 1, and Figure 2 onwards is mechanistic work.
There are some really obvious and important exceptions. The key one would be biomarker analysis. The idea would be that you run a series of microarrays on clinical specimens and then use those to discover some sort of predictive marker that can tell you about patient response. Rob would have talked about some of that data yesterday. Of course, there are also pathway analyses that are inherently linked to microarrays and to hypothesis generation. I'll come back to this at the end.
An example of a problem that I've seen is in liver specimens or some other tissue. It makes sense to start thinking: here is a potential source of bias, so what are the controls that I can run? Understanding what actually happens in an experiment is critical to doing good bioinformatics.
So about the simplest microarray in the world is this Affymetrix one, and there are several things on Affymetrix arrays that are really worth staring at. There's a series of spots; these actually go all the way down the side of the array. Those spots are what are called landing lights, which are one of my favorite topics. That's what we'll talk about on the microarray now.
Part of this is really looking at this, and you could ask the exact same questions about machine learning. Has anybody here used machine learning? Hands up. So everybody's used machine learning. All right, so where did you use machine learning today?
Today, has anybody done it today? Nobody has used machine learning today? Okay, so did anybody take an elevator today? Check the weather? Use Google? Amazon: if you liked this book, you might like that book. The Globe and Mail: the front page of the Globe and Mail uses machine learning to decide, from a couple of different features, which articles to show you. One of the features is geographic location. It's actually a nice way to describe how you can group things together across two criteria.
Okay, so we're going to talk now. As Michelle said, start getting the CEL files, the expression-level files, downloading, and then we'll walk into the practical right after this. Michelle, we go until 12:30? So we'll probably need 15 or 20 minutes right now to go through the rest of the didactic stuff.
What I want to do is give you an example of how the general pipeline gets changed or modified for a specific platform. This is the general pipeline, with all the steps that we spent lots of time talking about. Actually, every Affymetrix experiment is analyzed in pretty much the same way. You can tweak it a little bit, but by default they all happen exactly the same way. It's one channel: only a single sample is on each array, so there's no Cy3 and Cy5. There's just a single fluorescent label.
We basically ignore spot quality. For normalization, with Affymetrix arrays we do a simultaneous normalization that is both inter-array and intra-array; we collapse those together into one step. CEL files are the output of the Affymetrix quantitation and are our starting point. We do background correction. Then there's a step that's unique to Affymetrix arrays, probe-set summarization; I'll talk about that in a second. Then we do statistics, clustering and integration.
Arrays are kind of outdated by definition: they represent one point in time. We know this about the genes today, this is our knowledge of the human transcriptome, and that's what we base everything on. But it isn't finished: new splice variants will be discovered. And from the company's side, producing a new array means producing the software and the annotation as well. That's a lot of money and effort, so they would like to have a single product that could last a while.
Then there are mismatch probes, which carry a deliberate mismatch and were meant to identify cross-hybridization. As a matter of fact, modern arrays don't make these anymore.
Take Affymetrix's single best-selling array. Does anybody know what the U stands for? No one? It stands for UniGene. So a long time ago, one of the most common ways of getting knowledge of the transcriptome was doing what's called EST sequencing. You take random RNAs and sequence them, and you do this using traditional Sanger sequencing, but you could learn something about the structure of the transcriptome. Those sequences would then be assembled, based on our knowledge of the genome, into transcript clusters. So that was UniGene, and this is UniGene build 133. A quick idea of where we are today: UniGene is still available and is sort of still getting used.
I don't know how widely used it is today, but it's available for all sorts of species, including ones that don't get a lot of coverage from other resources, and that's one of its advantages. The human UniGene build is a representation of the transcriptome, and it's at build 236. So in other words, for a product that started in 1990 and is probably going through a version every month and a half or two months, that tells you how old the definition of the transcriptome is that was used to define Affymetrix's single best-selling array today. It's like 10 years old. And the net effect of that is that the array lags behind our current knowledge of the transcriptome.
These arrays have all sorts of wide usage today, and they represent a body of hundreds of thousands of arrays in the public literature that represent really interesting things. For example, there are thousands of breast cancer samples that have been profiled on these arrays. We want to be able to account for that. So say we find out that there's a splice variant right here that separates off this part of the gene, or we identify that one of these probes cross-hybridizes. It sounds like a lot of work to check each of these 25-base-pair probes. Who wants to go ahead and analyze their microarray like that?
Back to what we saw when we started off talking about scanning the array. Imagine that I designed an array on the reference human genome. For some samples, that array is not going to be a good representation of the genome. You'd imagine that the genome in every cell of somebody's body should be exactly the same, but how many point mutations might there be in a typical breast cancer? Now each of those point mutations, not just the ones that act at the protein level, but even a synonymous change that doesn't affect the protein, will affect hybridization to an array. Now you've got a problem in cancer samples.
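To make the point-mutation problem concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not anything from the course materials: the 60-base sequence is made up, the function names are hypothetical, the probes are tiled every 5 bases only for the sake of the example, and probes are compared on the same strand as the target (counting mismatches this way is equivalent to counting them against the complementary strand).

```python
def count_mismatches(probe: str, target: str) -> int:
    """Number of positions where a probe differs from an equal-length target."""
    if len(probe) != len(target):
        raise ValueError("probe and target must have the same length")
    return sum(p != t for p, t in zip(probe, target))


def probes_hit_by_mutation(reference: str, mutated: str,
                           probe_len: int = 25, step: int = 5):
    """Tile probes along the reference and compare each to the mutated copy.

    Returns a list of (start, mismatches), one entry per tiled probe.
    """
    results = []
    for start in range(0, len(reference) - probe_len + 1, step):
        probe = reference[start:start + probe_len]    # designed from the reference
        target = mutated[start:start + probe_len]     # what the sample actually carries
        results.append((start, count_mismatches(probe, target)))
    return results


# Toy 60-base "transcript" (made up) with one point mutation at position 30.
reference = "ACGT" * 15
mutated = reference[:30] + ("A" if reference[30] != "A" else "C") + reference[31:]

results = probes_hit_by_mutation(reference, mutated)
for start, mismatches in results:
    flag = "affected" if mismatches else "ok"
    print(f"probe {start:2d}-{start + 24:2d}: {mismatches} mismatch(es) -> {flag}")
```

A single base change touches every 25-mer that overlaps it, while probes elsewhere on the same gene are unaffected.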
You might have something that looks like it's different when it isn't. This is the benefit of using multiple probes across the gene: you're able to identify those types of hybridization artifacts.
This next one is particularly subtle: ozone. Imagine that the ozone level is correlated with your signal. Some array centers have ozone-free rooms to allow them to deal with it. I found this out in an interesting way: I was trying to do ChIP-chip experiments and I could not get anything to work in the summer. I thought I sucked, which I actually do, but it wasn't just that I sucked. Nobody could get anything to work in the summer because of the ozone.
So pre-processing is an attempt to remove those artifacts, but pre-processing is never a substitute for good experimental design. I'm not going to talk to you in detail about statistical design, but the principle is to balance as hard as you can across a whole series of different things. Balancing also applies over time: if you have money to do 50 arrays this year and 50 arrays next year, run a technical replicate to see if there has been some sort of a change. You can take multiple RNA samples from the same tumor, or go as far as taking multiple pieces of the tumor, so that you capture technical and biological variability together instead of just technical. Now, in the real world, if there's only money for a limited number of arrays, it's usually better spent on biological replication, because that's where the differences we care about are.
What happens if all your tumors are fixed one way and all your normal tissues are handled differently? Then you can't separate biology from processing. So raise these things with your collaborators up front. It's really worth it. Trust me, it's really worth it. Usually it works just by being upfront: these are the problems that we're going to face.
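The balancing idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `balanced_batches`, the sample IDs, and the two "batches" (for instance, this year's and next year's arrays) are all hypothetical, and a real design would also balance on other covariates such as tissue handling.

```python
import random
from collections import Counter, defaultdict


def balanced_batches(samples, n_batches, seed=0):
    """Assign (sample_id, group) pairs to batches, spreading each group evenly.

    Samples are shuffled within each group and then dealt round-robin across
    batches, so no batch ends up all tumor or all normal.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for sample_id, group in samples:
        by_group[group].append(sample_id)

    assignment = {}
    offset = 0
    for group in sorted(by_group):
        ids = by_group[group]
        rng.shuffle(ids)  # randomize which sample lands in which batch
        for i, sample_id in enumerate(ids):
            assignment[sample_id] = (offset + i) % n_batches
        offset += len(ids)  # continue the round-robin where the last group stopped
    return assignment


# Toy design: 6 tumors and 6 normals split across 2 batches.
samples = [(f"T{i}", "tumor") for i in range(6)] + \
          [(f"N{i}", "normal") for i in range(6)]
assignment = balanced_batches(samples, n_batches=2)
counts = Counter((assignment[sample_id], group) for sample_id, group in samples)
for (batch, group), n in sorted(counts.items()):
    print(f"batch {batch}: {n} {group}")
```

Because each batch contains the same mix of tumors and normals, a shift between batches shows up as a batch effect rather than being confused with the tumor-versus-normal difference.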
Surprisingly often there are people who want to save money and not do all the controls that they need to. All you can do is ask, and present the consequences, and people will usually listen. I find that my best wet-lab collaborators are the ones who actually listen. And similarly, people will sometimes tell you, you know what, that control won't work, or won't do this, because of some reason that you didn't realize. That's really good.
So what is going on? We're going to start looking at some Affymetrix data. There are two major ways of pre-processing Affymetrix data in the literature: RMA, the robust multi-array average, and MAS5. Each of these does full pre-processing, all the way from background correction through normalization to probe-set summarization. RMA is a little bit newer than MAS5, and they each have strengths and weaknesses.
Affymetrix's software package was called Microarray Analysis Suite, MAS, and version four of it had an algorithm called the average difference algorithm. Statisticians who were very good at what they did decided this was a bad algorithm, and at the same time Affymetrix said, you know, we could probably do better. Affymetrix learned something from that: if they don't develop the software, probably somebody else will. MAS5 is very much the last substantial novel algorithm that came out of Affymetrix; the work of those statisticians convinced them that there's no point in it.
In many cases RMA and MAS5 are pretty interchangeable. With RMA, multiple replicates will tend to be much tighter. MAS5 will tend to give better fold changes. And the fact of it is that you can give me a single array and I can do a MAS5 normalization. You can't quite do that with RMA. There are tweaks to modify it to make it work in that context, things like frozen RMA, but nevertheless it's hard for diagnostics. MAS5 might be preferred for kind of small