Federated analysis is the ability to access data for analysis without physically sharing it. Typically, a federated analysis system has a central node connected to several federated nodes that host the data. In federated analysis you perform the analysis at a distance: you tell the software to perform an analysis on the data, and in this way we say that we take the analysis to the data.

Normally there are three steps in setting up such a system. First, all of the data in each of the nodes in the federated system must be in the same format and follow the same rules. Second, the data need to be hosted on nodes that are connected together so that information can pass between them, in our case between the federated nodes and the central server. Third, specialized software tools need to be developed and installed on each of the nodes so that each node can perform the analysis in the same way.

When a user logs onto the system, they log onto the central node. They can then connect to one or more of the federated nodes and instruct the software to perform a particular analysis on those nodes. The federated nodes send back the results, which can then be interpreted by the user.

Federated analysis is typically used when there are restrictions on data sharing, for example for legal or ethical reasons. It is also very useful if you want to combine the results of several studies in order to increase statistical power.

In order to perform biomarker discovery in RHAPSODY, we have set up a federated analysis system that now contains data from over 10 clinical cohorts and almost 70,000 individuals. Software running on our system combines aggregate or summary data from these different cohorts and creates an image of a single cohort, which can then be analyzed.
We use mathematical tricks to do this without physically combining any of the data. Such an approach could represent a new future for clinical data research. The system is scalable, meaning that we can expand it with new federated nodes in the future and also add new data to existing nodes. This increases the statistical power of the system by increasing the number of individuals, and it also brings in new data as they become available. We are currently developing new algorithms, including machine learning tools, to exploit and analyze new big data, especially those that come from multi-omics technologies, whilst keeping the data safe at their local institutions. A federated system such as this allows powerful analyses to be performed at a distance without the need to copy data between institutions or countries, therefore protecting patients' rights and their privacy. This federated analysis system is currently being used to analyze data from multiple studies to discover biomarkers for risk prediction and disease progression in type 2 diabetes.
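
To make the idea of combining summary data a little more concrete, here is a minimal sketch in Python. It is an illustration only and not the software described above. The names `NodeSummary`, `summarize_locally` and `pool`, and the toy cohort values, are all hypothetical. Each federated node returns only a count, a sum and a sum of squares, and the central node uses those three numbers to recover the mean and variance of the combined cohort without ever seeing an individual record.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class NodeSummary:
    """Aggregate statistics a federated node returns; no individual rows."""
    n: int            # number of individuals
    total: float      # sum of the measured values
    total_sq: float   # sum of the squared values


def summarize_locally(values: Sequence[float]) -> NodeSummary:
    """Runs on a federated node, next to the data it hosts."""
    return NodeSummary(
        n=len(values),
        total=sum(values),
        total_sq=sum(v * v for v in values),
    )


def pool(summaries: List[NodeSummary]) -> dict:
    """Runs on the central node, using only the returned summaries."""
    n = sum(s.n for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    # Sample variance from sums: (sum(x^2) - n * mean^2) / (n - 1)
    variance = (total_sq - n * mean * mean) / (n - 1)
    return {"n": n, "mean": mean, "variance": variance}


# Hypothetical toy values standing in for three cohorts (assumption).
cohort_a = [5.1, 6.3, 5.8]
cohort_b = [7.0, 6.4]
cohort_c = [5.5, 6.1, 6.9, 5.7]

pooled = pool([summarize_locally(c) for c in (cohort_a, cohort_b, cohort_c)])
```

The pooled mean and variance are the same as those you would get by stacking the three cohorts into one table, which is the sense in which the summaries create an image of a single cohort. A real system would need further safeguards, for example refusing to return summaries for very small groups, and those are not shown here.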