Hi everyone. My name is Göksel. I'm from Newcastle University. My PI, Anil Wipat, was meant to give this talk, but unfortunately he couldn't make it, so he sends his apologies. I will first remind you of some of the challenges in engineering biological systems, which are relevant to the rest of the talk. First of all, biological systems can be very complex and can have very large design spaces. Even for a DNA region of just 10 bases, there are around a million different solutions, and when we start thinking about molecular interactions, this complexity increases even further. So currently most applications are still small and designed manually, and obviously, as the size and complexity of these systems increase, it may not be possible to design them manually at all. Computational design and simulation techniques are therefore necessary for large-scale synthetic biology. There are already a number of genetic circuit design tools, and these tools essentially use mathematical models of biological parts to create complex and predictable systems. However, these tools often lack access to models, so it would be ideal to have repositories of models that are modular, reusable, and in standard formats. The information to build these models is already out there in different biological databases. So one way to efficiently facilitate the design of biological systems is to use the extensive amount of biological data that is already available and growing. Biological data are really huge: the GenBank database alone includes around 200 million sequence entries from more than 300,000 different species. Secondly, biological data are often stored in different biological databases, the data are often available in different formats, and the meaning of the data can also change between databases. The journal Nucleic Acids Research publishes a database issue every year and, as you can see, the number of databases is growing every year.
So the latest issue reports more than 1,600 biological databases. Somehow, the biological data that are relevant to the design of biological systems should be integrated and presented to computational tools in proper formats. One way to represent these biological data is obviously to use biological networks, which look like these hairballs. But when you zoom in, you can see nodes and edges between them. Nodes can represent any biological entity, such as genes, proteins, or promoters, and the edges are the relationships between those entities. In this case, we have very simple protein-protein interactions and the edges are some sort of associations. In this network here on the left-hand side, we have a different type of network: the difference is that the edges are labeled. Ideally, these labels, or terms as we sometimes call them, for the types of nodes or edges come from controlled vocabularies or ontologies. That means that even if the underlying data sources change, we can use these terms or labels to query the data. One way to represent these biological networks more formally for computers is to use existing semantic web technologies. These technologies have already become very popular for modeling, accessing, and exchanging data in the life sciences, and there are numerous databases providing their data in the Resource Description Framework, or RDF, format. In RDF, we have triples. I won't go into much detail, but triples consist of subjects, predicates, and objects. Subjects are the things that we want to talk about, the interesting bits, and they are related to some other resource via an edge, predicate, or label, whatever you call it. However, using RDF to query data requires the exact use of this graph pattern, so querying the data is more challenging. On the other hand, if we use ontologies, we can create more logical queries.
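The triple structure just described can be sketched in plain Python, using tuples instead of a real RDF store such as rdflib. All the entity names here are illustrative, not taken from an actual database.

```python
# RDF-style triples: (subject, predicate, object), as plain Python tuples.
# The entity names below are illustrative examples, not real database records.
triples = [
    ("MntR", "is_a", "TranscriptionFactor"),
    ("MntR", "binds_to", "mntA_operator"),
    ("mntA_operator", "part_of", "mntA_promoter"),
]

def query(triples, s=None, p=None, o=None):
    """Return all triples matching a graph pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Which entities does MntR bind to?
print(query(triples, s="MntR", p="binds_to"))
# → [('MntR', 'binds_to', 'mntA_operator')]
```

This also illustrates the limitation mentioned above: the query only succeeds if you know the exact graph pattern, whereas an ontology lets a reasoner infer matches you did not spell out.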
For example, using an ontology, I can infer that this biological process term is a parent term of this small molecule biosynthetic process term without traversing the whole graph structure. So it's easier to query using ontologies. Again, I won't go into details, but there are many semantic web resources for synthetic biology at the moment. For example, the Gene Ontology provides terms for molecular functions, biological processes, and cellular components, and the Sequence Ontology unifies the meaning of biological parts. There is a term in the Sequence Ontology for a promoter, for instance, so if you use that term, everyone and every tool will understand your application, and when you exchange data it can be understood easily. The Systems Biology Ontology can be used to annotate your synthetic biology models. And the Synthetic Biology Open Language, SBOL, is a Resource Description Framework-based data exchange format for synthetic biology designs. There are also examples of RDF databases, such as the Standard Biological Parts Knowledgebase, which allows querying iGEM parts. The ontological resources I have mentioned so far are very useful for unifying the meaning of data. However, we can use ontologies in a much more powerful way, to capture domain knowledge for our designs. In a manual design, this knowledge comes from a domain expert, but if we want to automate designs, ontologies can provide this domain knowledge for us. So we came up with an ontology and defined some terms. In this ontology, there are terms to describe basic biological parts, such as operators, spacer sequences (or shims, as we call them), terminators, and so on. We also have types for other biological components, such as enzymes, compounds, proteins, transcription factors, protein complexes, and so on. And we also have terms for reactions and pathways. Some of the other terms can be used to classify the rest of these terms.
For example, the EC number is used to classify enzymes and their corresponding biological reactions, and clusters of orthologous groups (COG) IDs classify proteins based on their orthology relationships. We then wanted to reuse existing resources as much as possible, so, as you can see, there are terms from the Sequence Ontology, the Gene Ontology, and also from the SBOL data model. Using these generic terms allows us to write queries even without knowing all the details of this ontology. Next, we defined some properties, the relationships between these terms. The boxes represent the terms, or classes, in this ontology and the edges are the types of relationships. Using these semantics, we then integrated data from several resources and created a knowledge base for synthetic biology, in this case particularly for Bacillus subtilis. The knowledge base includes information about sequences, annotations, metabolic pathways, gene regulatory networks, protein-protein interactions, and gene expression. Now that we have the unified meaning of terms and a knowledge base built using these defined terms, I can write queries that I could not execute before. One sample network from this knowledge base is the MntR transcription factor network, for example. You can immediately see the features of this transcription factor, such as the binding sequences that it binds to, the coding sequence encoding it, and its molecular functions, such as iron binding and manganese binding. When developing an ontology, we often start with some competency questions. These questions are really informal and are only used to verify and test the ontology. One particular example is: which parts can be used as inducible promoters? We had some discussions yesterday and today about the importance of such promoters for building logic gates to create complex designs.
So the answer, as captured in the ontology, is that we are looking for promoters with a single operator whose regulation type is set to positive. These are our target promoters. An ontological query would look like this: we basically create a new term called inducible promoter, with a term definition. It is both human readable and computer readable, and it's very intuitive and easy to understand. The query, or the class definition, is: it's a promoter, it has exactly one operator, and that operator has to be a positively regulated operator. I can then create another term and say that the positively regulated operator class is an operator with the regulation type set to positive. When I submit this whole knowledge base to an ontological reasoner, some of the promoters in my knowledge base are classified as inducible promoters. In this case, 51 promoters were classified as inducible. Here we can see a subset of these results, not all of them. The whole box, including the green and blue areas, represents a promoter, the blue areas represent the binding sites, and the length of the whole box is proportional to the length of the promoter. Using this approach, I can easily classify biological parts and use them in my computational tools. Examples include activator sites: I could classify around 200 activator sites and around 300 repressor sites, for example. I then classified 85 repressible promoters using the knowledge base. Then I looked for inducible promoters with two inputs. Those are really important because with them I can create complex logic gates; these kinds of promoters can act as AND or OR gates. I then searched for repressible promoters with two inputs, and those promoters can act as NAND or NOR gates. I also classified promoters based on sigma factors: here, for example, 67 of them were classified as sigma B promoters, and there are around 300 constitutive promoters in the knowledge base.
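The class definition above can be sketched as a simple classification rule in plain Python, standing in for what an OWL reasoner does over the knowledge base. The part records here are invented for illustration; real classification runs over the ontology, not hand-written dictionaries.

```python
# A sketch of the ontological classification: InduciblePromoter is a
# promoter with exactly one operator, and that operator is positively
# regulated. The part records below are illustrative, not real data.
parts = [
    {"name": "P1", "type": "promoter",
     "operators": [{"regulation": "positive"}]},
    {"name": "P2", "type": "promoter",
     "operators": [{"regulation": "negative"}]},     # repressible instead
    {"name": "P3", "type": "promoter",
     "operators": [{"regulation": "positive"},
                   {"regulation": "positive"}]},     # two inputs, not one
]

def is_inducible(part):
    """Apply the class definition: promoter AND exactly one operator
    AND that operator has regulation type 'positive'."""
    ops = part.get("operators", [])
    return (part["type"] == "promoter"
            and len(ops) == 1
            and ops[0]["regulation"] == "positive")

inducible = [p["name"] for p in parts if is_inducible(p)]
print(inducible)  # → ['P1']
```

A reasoner generalizes exactly this check: the same definition, written once in the ontology, classifies every promoter in the knowledge base, which is how the 51 inducible promoters were found.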
Then I wanted to look for suitable coding sequences. Often we are trying to find nucleotide sequences to achieve a certain behavior, and that behavior usually comes from a molecular function of a gene product. So I wanted to classify coding sequences based on their gene products' functions. I classified around 50 repressor-encoding coding sequences and around 40 activator-encoding ones. The two examples here are for the building blocks of two-component systems, which are very useful for developing communication systems or biological sensors: I could classify some of the coding sequences as response-regulator-encoding or kinase-encoding coding sequences. I can of course ask many other questions, and the ontology can answer them. Another example, regarding pathways, is: which pathways should be targeted for the overproduction of ammonium? As captured in the ontology, ammonium is produced by reactions that are members of the arginine and proline metabolism and purine metabolism pathways. A more complex query would be: which parts can be used to upregulate the production of ammonium? Again, the answer is in the knowledge base: ammonium, for example, is produced by a reaction consuming the carbamide compound, carbamide is produced by a reaction catalyzed by an enzyme, and this enzyme is encoded by the arcC coding sequence. I can also ask questions about protein-protein interactions; in this case, for example, how can the Spo0A protein, the master regulator of sporulation, be phosphorylated to trigger sporulation? We can retrieve a list of suitable answers for these different types of questions. So far, I have talked about the importance of data integration and how we can use biological networks to represent biological data. Then I talked about semantic web technologies to represent biological networks, and I presented an ontology to unify the meaning of terms for synthetic biology, which is very useful for automated data mining.
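The pathway query just described, finding parts that upregulate production of a compound, amounts to a graph walk over the knowledge-base relations: from the compound to the reactions producing it, through their precursors, to catalyzing enzymes and finally to coding sequences. A minimal sketch, with an invented toy graph (the relation and part names are illustrative placeholders, not the real knowledge base):

```python
# Toy knowledge graph for the ammonium example. All identifiers here
# are hypothetical stand-ins for the real knowledge-base entries.
graph = {
    "produces":     {"R1": ["ammonium"], "R2": ["carbamide"]},
    "consumes":     {"R1": ["carbamide"]},
    "catalysed_by": {"R2": ["EnzymeE"]},
    "encoded_by":   {"EnzymeE": ["arcC_cds"]},
}

def parts_upregulating(compound, graph):
    """Coding sequences encoding enzymes that catalyse reactions producing
    the compound, directly or via a precursor compound."""
    cdss, seen = set(), set()
    # start from reactions producing the compound directly
    stack = [r for r, prods in graph["produces"].items() if compound in prods]
    while stack:
        r = stack.pop()
        if r in seen:
            continue
        seen.add(r)
        for enzyme in graph["catalysed_by"].get(r, []):
            cdss.update(graph["encoded_by"].get(enzyme, []))
        # walk back through precursors: reactions producing what r consumes
        for precursor in graph["consumes"].get(r, []):
            stack += [r2 for r2, prods in graph["produces"].items()
                      if precursor in prods]
    return cdss

print(parts_upregulating("ammonium", graph))  # → {'arcC_cds'}
```

The chain mirrors the spoken answer: ammonium is produced by a reaction consuming carbamide, carbamide is produced by an enzyme-catalyzed reaction, and that enzyme's coding sequence is the part to overexpress.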
Once we have the unified meaning of things, a knowledge base, and data mining all set up, we can mine data to create models. Model-driven design of biological systems is especially important for large-scale synthetic biology, and these approaches have already been used in a number of other industries. In a typical model-driven design approach, you have a model representing the relationships between a system's elements, and you can then use that model to derive the system automatically. In the biological context, we can do the same. We can start composing models from a repository. Once we have a composed model, that model is basically our blueprint, and it can be used to derive the DNA sequence that encodes the behavior in the model. Using this approach, we can search the large design space of biological systems, or we can fine-tune our favorite biological system. For example, we can start with an initial design and say the target is to achieve 80% of the initial output. Or we can even change the whole system; we can evolve it by changing its topology. We don't have to just swap parts; we can introduce or delete parts in the system by using their models. In this case, the same system is evolved to give a new input-output behavior. To facilitate model-driven design, we developed an approach called standard virtual parts, or SVPs. SVPs are basically a mapping between genetic parts and their models. They are mathematical models of basic biological parts such as promoters, coding sequences, and ribosome binding sites. Each SVP encapsulates a number of modeling entities and only exposes some inputs and outputs. The important thing is that these inputs and outputs make the models composable. We then developed a repository of these models. You can search the repository; unfortunately, it is unavailable today, just for today, because of electrical maintenance back at the university.
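The idea that SVPs expose only inputs and outputs, and that this makes them composable, can be sketched as follows. This is a toy illustration of the composition principle, not the actual SVP repository API; the part names and signal names (PoPS, RiPS) are assumptions for the example.

```python
# Each part model hides its internal modeling entities and exposes only
# named input/output signals; composition wires an output of one model
# to an input of the next. Purely a sketch of the SVP idea.
class PartModel:
    def __init__(self, name, inputs, outputs):
        self.name, self.inputs, self.outputs = name, list(inputs), list(outputs)

def compose(models, wires):
    """Wire ((src, output), (dst, input)) pairs into one composite model.
    The composite exposes only the signals left unwired."""
    for (src, out), (dst, inp) in wires:
        assert out in src.outputs and inp in dst.inputs, "signal mismatch"
    wired_out = {(id(s), o) for (s, o), _ in wires}
    wired_in = {(id(d), i) for _, (d, i) in wires}
    ins = [i for m in models for i in m.inputs if (id(m), i) not in wired_in]
    outs = [o for m in models for o in m.outputs if (id(m), o) not in wired_out]
    return PartModel("composite", ins, outs)

# Hypothetical promoter -> RBS -> CDS chain.
promoter = PartModel("Pspa", inputs=["SpaR_P"], outputs=["PoPS"])
rbs = PartModel("rbs", inputs=["PoPS"], outputs=["RiPS"])
cds = PartModel("gfp", inputs=["RiPS"], outputs=["GFP"])

system = compose([promoter, rbs, cds],
                 [((promoter, "PoPS"), (rbs, "PoPS")),
                  ((rbs, "RiPS"), (cds, "RiPS"))])
print(system.inputs, system.outputs)  # → ['SpaR_P'] ['GFP']
```

The composite behaves as the black box described later in the talk: one biological input (here, phosphorylated SpaR), one output (GFP), with the internal wiring hidden.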
If you try tomorrow, you can hopefully do some searches if you want to. You can search by free text, by part type, by organism, or by specific features of the encoded gene products. The search then displays a list of SVPs, with each entry representing an SVP, and if you click on a row you see more details. In this case, it is an SVP, a standard virtual part, for ComP, a kinase protein; the page lists some of the internal interactions that are encapsulated within the SVP and its interactions with other parts. That's the important bit that allows us to walk the graph, start adding new parts, and generate a large model. When you click on an interaction link, you can see its details: it is a phosphorylation, here is its free-text description, the participating parts in the interaction, and some interaction constraints. Typically these constraints are the molecular forms of the participating parts and their stoichiometries. This information is available for humans, as you see, but we want to use the same information in computational design. So the models for parts and interactions are available in standard modeling languages; in this case, it's SBML, the Systems Biology Markup Language. Computational tools use this format to compose models. These models include the information that is necessary for simulation, so you can store the mathematical details, but to use models in model-driven design and to facilitate different workflows, you need to annotate your models. In our case, we embed information within the models about the inputs and outputs: which modeling entities represent these inputs and outputs, and what kind of biological signals they represent. If the entities represent, say, proteins, then what their molecular forms are and exactly what kinds of proteins they represent. This information is really useful.
And if you want to use models as blueprints to derive DNA sequences directly, the models should also include DNA sequences and the types of biological parts. Then the composed models include everything needed to get back to the DNA sequence. Starting from a computational specification, through simulations we can refine and create a model that we are happy with and turn it into a DNA sequence ready for synthesis. That's the idea. Recently we extended rule-based models to allow annotations; previously they didn't allow any external data. That work was recently published, if you want to have a look. Computational access to the repository is through a REST-based web service, which is quite lightweight and HTTP-based. The first link is for humans, the second one is for computers to access models, and the third one is for the genetic description of the virtual part in the SBOL format. We have plenty of other methods we can use to retrieve data. We then developed programmatic access to the repository in the form of a Java API. It returns Java objects for parts, interactions, and their models, and it also provides methods to compose these models. The final method is really useful if you want to provide a full specification: given the identifiers of virtual parts and their types, it returns a complete model ready for simulation. You can then simulate, for example, using COPASI or your simulator of choice. So here is an example of a biological system, now using these composable models. You can think of this system as a black box: the blue one here is the input and the green one is the output. The input is subtilin, a peptide lantibiotic used in quorum communication, and the output is GFP, green fluorescent protein. In between we have a two-component system. Subtilin triggers the phosphorylation of a kinase protein, SpaK, and when phosphorylated, the kinase protein phosphorylates the response regulator protein.
And the response regulator, once phosphorylated, can act as a transcription factor; in this case, it activates transcription from this promoter to express GFP. Here we can see the model composition process. The orange boxes represent models, and we just link the edges to make the model simulatable. Programmatically, we can simply start adding models: add a model of a promoter, add a model for the mRNA, link them, and carry on. Finally, we have a model to write to a file, which we then simulate. Of course, this API and all the models are aimed at computational tools, so our end users are basically tool developers. For biologists, there must be tools that use these models from the repository on their behalf. In this case, it is a tool called Symbet, which provides access to the repository. You can search the models, use this canvas to design your system, and finally export the model for simulations. We also have a web-based tool at the moment that is very similar to this. We then implemented three variants of the two-component system I was showing you earlier. That's the native version: this promoter is induced by the phosphorylated form of SpaR. This is the simulation result and that's the fluorescence measurement data; they correlate reasonably well. In the second variant, we replaced the first promoter with a weak constitutive promoter, and the results were again very similar, both in simulation and in the measurements. Then we wanted to change the topology a little: we also used this promoter here to provide some positive feedback. The switching on of the system took a bit of time, but once it switched on, it stayed on; it seems more active. We are still working on these systems, trying to refine our models by feeding the data from the experiments back into the models. We use a very simple mapping at the moment. We use ODEs.
So that's the rate of increase of the mRNA, and that's the rate of increase of the GFP mRNA. At steady state there is no change, therefore these terms are zero. We can then use mappings from the wild-type experiments, correlating these different values across variants, and we can also use some nominal values for degradation rates and so on. Eventually, we can map the relative transcription rates of the promoters across all these different experiments. Okay, so far I have talked about data integration and how we can mine data to create models, which are really useful to facilitate model-driven design of biological systems. The virtual parts repository is developed as part of a consortium project called the Flowers Consortium, with members in London and Cambridge. Ideally, the virtual parts repository should be using data from the repositories shown here, but at the moment each group produces different types of data, stored in different databases with different schemas, and of course each group is in a different location. Somehow these repositories should work together. This is also a common use case for synthetic biology tools. Ideally, the arrival of new data in any of these repositories should be broadcast; then some of the repositories can listen for relevant data. Using this approach, newly available data can be used as input by another repository, for example to transform data into other formats or to provide additional information. So we developed a tool called POLEN. It's a lightweight, cloud-based messaging system. Repositories publish a message when they have new data, and they can also receive messages on certain topics; the topics include provenance, repository, datasheet, part, and model. Messages are really lightweight, as I said: they only include a URI, which points to data, ideally in standard formats. One use case would be: let's say we have a new part in the VPR.
We publish a message saying: I have a new part. That message is picked up by the automated characterization pipeline and the part is characterized. Once it is characterized, the pipeline publishes a new message saying: I have a new datasheet. That message is picked up by the VPR here; the actual datasheet is retrieved, a model is created using the datasheet, and a new model message is submitted. So that's all automated. The repository here is also listening for new part messages, and in return it integrates data about that part and publishes a new message saying: I have integrated data and now you can use it. Of course, to facilitate these automated workflows, we need data standards. At the moment, the repository uses SBML and Kappa models: SBML for reaction-based models and Kappa for rule-based models. For the genetic descriptions of parts, we use SBOL, the Synthetic Biology Open Language. SBOL is a relatively recent data standard, maybe six or seven years old, I would say. The data model is quite simple; its core is the DnaComponent. You can create hierarchical parts through the sequence annotation entity, which basically tells you where a child entity is located within a parent DnaComponent, and you can associate DNA sequences. Recently, we demonstrated the usefulness of SBOL and data standards in a use case in which six different organizations and institutes participated. We initially started by specifying an abstract toggle switch template. Then, using some part libraries, this template was compiled into solutions. In the next step, each solution was modeled and the sequences were codon-optimized. Finally, the models, along with the genetic descriptions in the form of SBOL files, were submitted to the VPR here. The workflow was not automated, so we had to exchange the files manually through emails. The SBOL community is growing; at the moment, it includes more than 100 members from 15 different countries.
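The POLEN publish/subscribe pattern described above can be sketched with a tiny in-process broker. This is only an illustration of the messaging pattern; the real POLEN is a cloud-based service, and the topic names come from the talk while the URIs are hypothetical.

```python
from collections import defaultdict

# A minimal pub/sub broker: repositories publish lightweight messages
# (a topic plus a URI pointing at data in a standard format) and other
# repositories subscribe to the topics they care about.
class Broker:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, uri):
        # Messages carry only a URI; subscribers fetch the data themselves.
        for callback in self.subscribers[topic]:
            callback(uri)

broker = Broker()
received = []
# A characterization pipeline listening for new parts (illustrative).
broker.subscribe("part", lambda uri: received.append(("part", uri)))
# The VPR listening for new datasheets to build models from.
broker.subscribe("datasheet", lambda uri: received.append(("datasheet", uri)))

broker.publish("part", "http://example.org/parts/P1")        # hypothetical URI
broker.publish("model", "http://example.org/models/M1")      # no subscriber
print(received)  # only the 'part' message was delivered
```

In the actual workflow, each delivery triggers the next publication: a new part message leads to a datasheet message, which leads to a model message, all without manual intervention.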
And these members are from over 50 different institutes. SBOL is also part of COMBINE, an umbrella organization coordinating data standards development in systems and synthetic biology. Currently, the development of SBOL is facilitated by a chair and five SBOL editors. Our current SBOL chair is Herbert Sauro from the University of Washington; after four years, he is stepping down now, and Anil Wipat has been elected as the new chair, starting in January. I am one of the editors; three of us are from universities, and Jake from BBN Technologies and Kevin from Thermo Fisher are our industrial representatives. We have also had other editors from different institutes in the past. If you're interested, you can send an email to editors@sbolstandard.org with a few lines about yourself, and the editors can introduce you to other members of the group. And here we have the website, sbolstandard.org, where there is plenty of information. SBOL 1 was very useful, but it's not enough to exchange all the information we can have in a synthetic biology design. For example: what are the biological functions of design components? How do they interact? Who made the designs? What's the biological context in which this design may work? How do I associate experimental data? Is there a model of the system and, if so, where is it? Also, there are design paradigms being developed nowadays, and it would be useful to reuse these templates; SBOL 1 doesn't allow this either. So we extended SBOL 1.1 and recently published SBOL 2. It's available as a Request for Comments document from the BioBricks Foundation website. We simply introduced new entities; what is shown here is not the whole data model, but some of the new entities. They represent hierarchical descriptions of biological systems and interactions, and there are also entities to associate models. Using SBOL 2, I can now specify sequence constraints between two biological parts.
In this case, using the precedes constraint, I can infer the order of biological parts. I can also specify an exact location if I want to, or specify single cut sites. I can annotate my designs: I can say it was designed by me, for example, and I can say I have a datasheet for this part at this URL. It is also ideal now for exchanging template-based designs. Previously, I was showing you a design where the aim was to just change the first promoter. In this case, the whole template is imported as a subcomponent into my design here, and I have a second component, a promoter, that I want to substitute for this promoter. It results in this design. That means I can now reuse templates as they are. I can also represent biological interactions, and different types of biological components, not just DNA-based ones. Using SBOL 2, I can specify a compound, a small molecule, protein-based biological components, RNA-based ones, or protein complexes too. So I can now specify the whole system using SBOL 2. Here I have an example of hierarchical representation in SBOL 2. The receiver here is a module, and the reporter here is another module. The receiver module has an output; in this case, it is the phosphorylated form of SpaR. The reporter module has an input: the transcription factor. These modules are then used in a parent module, called the subtilin receiver, which imports the two modules as submodules. In SBOL 2, we can also specify mappings, so we can map an output from one submodule to an input of another. That means we can computationally scale up designs and represent them without losing any information. I can also associate models with modules. SBOL is not about modeling, so it doesn't provide a modeling formalism; there are languages for that, like SBML or Kappa. Instead, we have placeholders pointing to actual models. So I can, for example, associate an SBML model with a module.
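Inferring part order from precedes constraints, as mentioned above, is a topological sort: each constraint is an edge saying one part comes before another, and a valid ordering respects all of them. A sketch using Python's standard library (the part names are illustrative):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# SBOL 2-style 'precedes' constraints between parts (illustrative names):
# each pair means the first part precedes the second on the DNA strand.
precedes = [
    ("promoter", "rbs"),
    ("rbs", "cds"),
    ("cds", "terminator"),
]

ts = TopologicalSorter()
for before, after in precedes:
    ts.add(after, before)  # 'after' has 'before' as a predecessor

order = list(ts.static_order())
print(order)  # → ['promoter', 'rbs', 'cds', 'terminator']
```

With only partial constraints, the sort yields one of the valid orderings, which is exactly the point of precedes: it pins down relative order without requiring exact locations.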
Or I can have a Kappa model, which is a rule-based model. Next is an example of a CRISPR system. Interestingly, the same design template was used in the previous talk; it's really nice to see design patterns emerging. This design is from Nature Methods, from 2014. The Cas9 enzyme in this design is catalytically inactive, so it can't be used for genome editing. Instead, the Cas9-guide RNA complex is guided, with the help of the guide RNA, to a specific area in the promoter. This design pattern here is expressed as an SBOL module: we have a generic Cas9 enzyme, a generic guide RNA, and a generic target gene that is repressed by this complex. And here is a system that we are now designing using this template. In this system, a specific version of Cas9 is mapped to the generic Cas9 in the template, our own guide RNA, which can point the complex to this promoter, is mapped to the generic guide RNA, and the target gene is mapped to a fluorescent protein coding sequence, EYFP. There are SBOL libraries, all open source, so you can reuse them; they are all available at github.com/SynBioDex. The most mature of them is libSBOLj, a Java library with support for SBOL 1 and SBOL 2. At Newcastle, we have also been developing a JavaScript version and a Scala version. And the C library, as far as I know, is about to be released, maybe in two months' time, by Herbert Sauro's group. You can also contribute to these tools if you have the time and want to. Regarding the tools that we have at Newcastle, I won't go into details, but if you want to talk later on, I'm happy to go through them. One of these tools is called SBOL Stack. Basically, you can install SBOL Stack anywhere in the world, so you can have, for example, four or five instances of it and query all the databases using federated querying from one end. The other tool is called SBOLHub, which is more suitable for biologists.
You can upload your design, annotate it, and once the design is in the system, you can share it with other researchers. You can also make your designs public for everyone. And this is about visualization. SBOL has two parts, and I haven't mentioned the second one yet: one is the RDF-based data exchange format, and the second is how to visualize our designs. This is the standard representation using SBOL visual glyphs. VisBOL is a tool that takes an SBOL document and visualizes it. The input formats are SBOL 1, SBOL 2, GenBank, and also a custom JSON-based display list, and the outputs are an SVG-based document that you can download, or PNG files. One announcement before I finish my talk: next year's IWBDA conference, the 8th International Workshop on Bio-Design Automation, will be held in Newcastle. I hope you can make it, too. On the first day, the 15th, we will have a little SBOL workshop, on the second day we will have a kind of competition, and the last two days are the actual workshop sessions. So it's August 16th till 18th, in Newcastle upon Tyne. I would like to thank Anil Wipat and all the others who contributed to the research I presented, specifically the people at Newcastle from the ICOS group and our collaborators from the Flowers Consortium, including the people at Edinburgh, Cambridge, and Imperial College, and also my colleague from Auckland, and all the SBOL developers and our funding bodies, the NSF and EPSRC.