Okay, so hi again, the next talk is starting now. We have Ralph Heinkel here. He started developing with Python in 1998, mainly for biocomputing, and nowadays he works as a freelance Python DevOps engineer. He's going to talk about how you can use Python and R for statistical analysis. So give a round of applause for Ralph and enjoy his talk.

Yeah, hello and good morning everybody, and welcome to this talk about how to build a bridge between Python and R. How many of you know R or are using it? Can you just give a quick show of hands? Okay, most of you, as far as I can see. For those who don't know it: R is basically a huge package, a tool for doing statistical analysis and calculating graphical representations of your data. It's open source and runs on all major platforms like Windows, Linux and Mac. The R language itself is, in my opinion, not so great; Python is a much better language to program in. The real power of R, I think, comes from the huge library of packages available around it, and you can download most of those packages from the CRAN network. The situation we faced when I started this project was that Python and R were basically completely separate ecosystems, and when we wanted to do a statistical analysis with data from Python,
we basically had to pack it into CSV files, transport them over to R, do the analysis there, and put the results back into Python, and that was not really very convenient for us. There are packages which solve that problem: at that time there was RPy, now there's rpy2. These are basically extension packages embedding R in Python: you compile R into a module, import it, and the rpy2 module then provides functionality to access R, do evaluations, and get your results directly in Python. It has a slight disadvantage, which was a real disadvantage for us: R runs in the same process as Python, or at least on the same machine. In our case Python was running a web application server, and when we wanted to do a heavy analysis in R, that really slowed down our web server. So we had to spread R out onto a different machine, and that's the approach we took. That's what I'm going to talk about: we wanted to build a bridge between Python and R, to be able to run R on a different computer, or even on a farm of computers.

The first piece of that bridge, the first socket, is Rserve. Rserve is a TCP/IP server for R developed by Simon Urbanek. It allows multiple simultaneous connections from an arbitrary number of clients, arbitrary as long as the machine can take it, of course. Every client that connects to Rserve via TCP/IP gets its own namespace, so all calculations are really done without side effects. Besides the Python one, clients are available for Java, C++, C# and so on, and there's a growing number of Rserve clients. Some of them come with the Rserve package directly; other clients you can download as third-party packages from the RForge server and install them.

The second piece of that bridge is pyRserve. That's the part I have been writing. It's a pure-Python client adapter for connecting via TCP/IP to Rserve. What does it do?
It serializes Python data objects, sends them over the network to R, R can do some calculation with them, and the serialized result data is parsed on the Python side, where native Python result objects are created. It allows you to evaluate arbitrary R commands on the R server, to trigger function calls in R, and to set and get variables in the R namespace. The latest addition to pyRserve is that it allows R code to trigger commands in your Python interpreter from the R side, and I will show that later.

The missing piece of that bridge is the protocol these two sockets use to talk to each other. That's the QAP1 protocol, the 'quad attributes protocol', which sounds much bigger than it is. I think it was invented by Simon just for the purpose of letting clients talk to Rserve. It's a bit like pickle in Python, but it allows exchanging serialized objects between R and Python, not just within the Python ecosphere. And it doesn't only carry serialized data.
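The pickle comparison can be made concrete. QAP1 itself is a binary, language-neutral format documented on the Rserve website; as a rough, Python-only analogy, this sketch shows the kind of job such a serialization protocol does:

```python
import pickle

# pickle plays roughly the same role within Python that QAP1 plays
# between Python and R: it recurses into nested data structures,
# turning them into a byte stream for the wire and back into objects.
payload = {'values': [1, 2, 3], 'label': 'demo'}
wire_bytes = pickle.dumps(payload)    # serialize for transport
restored = pickle.loads(wire_bytes)   # deserialize on the receiving side
print(restored == payload)            # prints True
```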
It also contains commands, so that the Rserve side knows what to do with the data you're sending over. It's a synchronous protocol: you send off a command together with your data to the R server, and you have to wait for what you get back, even if it's just a None object. You have to wait until the R connection has really finished the calculation, so you cannot send off a second command over the same connection. If you want to do parallel computing, you have to open multiple connections to Rserve, which is possible from the Python side.

Yeah, installation is quite easy. You download R as source from the R project server; it's not possible to use the pre-compiled packages, because for running Rserve you need to compile and link R with a special flag, --enable-R-shlib, otherwise Rserve cannot be loaded into the execution space of R. Rserve itself can be compiled directly by R, since R brings its own compiler for packages: that's R CMD INSTALL plus the package you downloaded before. Finally, the missing piece on the Python side is just a Python package, downloadable from the PyPI server. It runs on all modern Python versions from 2.6 up to 3.4. It needs numpy; if numpy is installed, that's fine, otherwise it will be installed on the fly.

Starting is also easy: the server side is just started with R CMD Rserve. Rserve opens a connection on the network, and by default it only listens on localhost. That's a security feature, because in older times Rserve didn't have a way to protect access to it; there was no password check. That's now built in on the Rserve side, but not yet on the pyRserve side, so that's why by default it only listens on the local host.
When you connect to Rserve running on your local machine, it's enough to call pyRserve.connect(); it goes to localhost on the default port. If you want to connect to a remote machine, just provide a hostname, and provide a port if Rserve is running on a non-default port on the server side. The connection itself has some attributes, so you can see where it is connected to. That's especially interesting if you have multiple parallel connections open on your Python side, so you can see which connection goes where. You can close a connection, and you can check whether a connection is closed.

So now we come to the first real steps: what can you do with such a connection to Rserve? The connector itself provides a method called eval that allows you to send arbitrary R expressions, R commands, to the R side, let R evaluate that string expression, and receive the result back as a native Python object on the Python side. Here I'm just summing up two numbers. You can also call functions in R: the c() operator in R creates a numeric vector on the R side, and since eval returns the result of that expression, what you get back on the Python side is that numeric vector, converted into a numpy array. That's something that will pop up here all the time.

Sometimes you don't want to return the result back from R. When you assign a very complex data structure to a variable on the R side, that's nothing you want to see on the Python side, because it would have to be serialized in R, passed through the network, and deserialized on the Python side. If you just want to assign it to a variable, you want to avoid that, so for that case there's a variant of the eval command called voidEval, which just executes the expression on the R side and doesn't return anything to Python. If you still want to see the value of the variable aVar, you can just use the eval command. Here are some more examples of string evaluation.
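As a compact sketch of the basic calls just described: the conn lines would need a live Rserve, so they are shown as comments (the connection defaults and variable names follow the talk, so treat the details as assumptions); the runnable part shows the kind of object an R numeric vector becomes on the Python side.

```python
import numpy as np

# With a live Rserve (by default on localhost), the calls would be:
#   import pyRserve
#   conn = pyRserve.connect()               # or connect('somehost', port)
#   conn.eval('1 + 2')                      # result returned to Python
#   conn.voidEval('aVar <- c(1, 2, 3)')     # evaluate, return nothing
#   conn.eval('aVar')                       # fetch the value after all
#   conn.close()

# An R numeric vector such as c(1, 2, 3) arrives as a numpy array:
vector = np.array([1.0, 2.0, 3.0])
print(vector.sum())   # prints 6.0
```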
You can even define a function on the R side: here the first eval creates a function called times2, which takes one argument, and the second eval executes that function, and the result is returned back to Python. You can even evaluate multi-line scripts: you define them in Python, store them in a string or whatever, send them over, execute them, and get the result back. I think that's pretty straightforward.

So using eval is the basic way of communicating with R. The connector provides a much more interesting attribute called r, which represents the namespace of your R interpreter running on the remote side. Via this attribute you can access and set variables in the R interpreter, and you can make function calls. You have to watch out: namespaces are separate for every connection, as I said before, but they also get deleted once your connection is closed. So you have to make sure you either save your workspace in R, or you just lose whatever is in there.

Just to show the difference between string evaluation and the namespace approach: these two commands do basically the same thing. A variable aVar is instantiated on the R side with the string 'abc'. The first approach is the string-evaluation variant; the second one does exactly the same thing, just very pythonically. It looks like 'abc' is assigned to a local variable in R, but it's actually serialized, sent over to R, and set in that namespace. It's even possible to set more complex data. Here's an example where I create a numpy array in Python, give it a shape, and assign that array to an attribute called aMatrix. That numpy array, too, is serialized and sent over to R, and a native R array is created on the R side. The last call, conn.eval('dim(aMatrix)'), shows that you can access that array in R and get its dimensions back as a result. So how are functions called in that pythonic way, using the r
namespace? To demonstrate, I'm creating three very simple functions. The first one doesn't take an argument and just returns a static string; the second one takes one argument, doubles the value, and returns it; and the last one takes keyword arguments, which you can do in R too, so it looks very pythonic already. And this is the way you call them: you just use your r namespace, call function zero and get its string, provide an argument to the second one, provide a keyword value to the last one, and get the result back. I think that's very easy to see and to understand.

A more complex thing: some R functions accept another function as an argument, maybe like the map function in Python, which accepts a data structure and a function to map over it. sapply in R is basically the same thing, just with the arguments in a different order: it takes an array and allows you to pass a function in R to be applied to it. That's also possible from Python: you can refer to a function that's sitting on the R side, like conn.r.times2. It's important not to pass references to Python functions; that doesn't make sense. Here I define a function called double in Python, and if you try to refer to that function, of course it's not possible to serialize functions from Python into R. You can serialize data, but not functions. So that gives you a name error, because double is just not defined on the R side.

This example also shows you that pyRserve can handle errors that are raised on the R side. When an expression is evaluated, I look at the result and can see if an error was raised; then I drag the error message from R over into Python and raise an exception carrying the message that R sends to me. So 'the name double is not defined' is basically what R tells me.

The next example shows you that things can be rather inefficient if you don't do it right.
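Before the inefficiency example, a quick local sanity check that needs no Rserve: since sapply maps a function over every element, what sapply(c(1, 2, 3), times2) returns in R can be reproduced in plain Python.

```python
import numpy as np

def times2(x):
    # Same behaviour as the times2 function defined on the R side
    return x * 2

arr = np.array([1, 2, 3])
# R's sapply(arr, times2) applies the function to each element:
result = np.array([times2(v) for v in arr])
print(result)   # prints [2 4 6]
```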
So here's what I'm doing: I create a numpy array and assign it to a variable arr on the R side, and then I make a function call with sapply, where I provide that array as an argument, referring to the times2 function, which gets applied to every element of the array. So why is that inefficient? What this really does is: the first line assigns the array on the R side; then I pull the array back over the network into Python; and the last line pushes the array back to R again. So the array is sent back and forth three times. To avoid that, there is the additional namespace attribute I referred to before, called ref, which allows you to reference a data object in R without actually pulling it over; it just provides you a proxy to it. In this example you now use that to reference an array which exists in R and supply it as an argument to the sapply function. That avoids the data being sent back and forth three times.

Out-of-band messages: that's one of the latest additions. It allows R code to send messages to your Python interpreter, which on the Python side trigger the call of a callback function that you can define. In order to make that work, you need to start Rserve with a special flag enabled in a config file; that's the 'oob enable' option which you can see here in the example, and you have to start Rserve with the --RS-conf command-line option so it uses that config file. That starts up the additional code in Rserve for callback messages. The way you set it up is very easy: you define a callback function in Python that takes two arguments. The message is basically what you want to see from R.
It's the payload of the actual callback, and the message code is an additional qualifier that helps you interpret what you have received in the message; it can be defined when the callback is triggered, and you'll see that in a moment. In order to have that callback accepted, it has to be assigned to a special attribute on the connector called oobCallback. So you just assign it, and whenever pyRserve receives an out-of-band message, that method will be called.

So here are two simple examples. To trigger a callback from R, you have to call self.oobSend. That 'self' has nothing to do with Python's self; it's just a namespace thing in R. I don't really understand why Simon implemented it that way, but that's the way to call it. In the first call, you see, I just send a message with no message code, and when I print out what I receive in the callback, you see the message code is zero by default. In the second call I override the zero and send a more qualified message code, and one of the next examples will show you why you would want to do that.

One possible application of callbacks is to provide feedback messages on your progress. Here you see a fake dummy big-job function with intermittent oobSend callbacks reporting how far your calculation has come. I set up a primitive callback method in Python, which just prints for this case, and when I call the big job, you see the callback messages being printed out while the R function is still running; and at the end you get the result back and can do anything with it.

Another very nice application is a method dispatcher, so you can make a callback from R and control which kind of callback method is then actually called.
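The dispatch mechanism itself is plain Python and can be sketched without a server. The two-argument callback signature follows the talk; the message-code constants and the commented registration line are assumptions for illustration.

```python
# Message codes mirroring hypothetical constants defined on the R side
MSG_STORE = 1
MSG_PRINT = 2

store = []

# One handler function per message code
handlers = {
    MSG_STORE: store.append,
    MSG_PRINT: print,
}

def oob_callback(message, message_code):
    # Look up the handler for this message code and pass it the payload.
    # With a live connection it would be registered roughly like:
    #   conn.oobCallback = oob_callback
    handlers[message_code](message)

# Simulate R calling self.oobSend("foo", MSG_STORE):
oob_callback('foo', MSG_STORE)
print(store)   # prints ['foo']
```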
For doing that, I define three constants on the R side, and on the Python side I set up a dictionary assigning three different functions to be called, depending on what kind of message code I receive. A very small dispatcher method is created as the callback function; it just accepts the message code, looks up the appropriate function in the function dictionary, and calls it with the message I received. And here you can see I make a callback, provide the argument 'foo', and I want to see the store method called, which just appends the message I receive to a list called store; and when I print the list, it has one element. So that's a very nice feature; try it out if you haven't seen it.

I'm coming to the end with a small discussion of this network approach. A good thing about it, compared to the rpy2 approach: when multiple people in your group, in your team, are all doing calculations, and you want to make sure everybody runs on the exact same R version and the exact same versions of all the R packages they're using, then having one single installation on the server is much easier to maintain than when every team member has to ensure that all run the same versions. And if you have compute-intensive stuff, it allows you to set up a real R compute farm and have a load balancer which distributes CPU-intensive jobs to different Rserve servers. The con side: of course you have to serialize all the data you're sending back and forth, and if you're sending huge amounts of data, that can be a real bottleneck for you. So that's always something you have to balance out yourself. The security aspect is the last thing: as I said before, the Rserve side nowadays supports credentials, so you can log in and log out; pyRserve doesn't have that yet, so for the moment it's best to just use it in-house.

So that's the talk for now. Thank you for your attention. Any questions, anyone?
Okay, so thanks for the talk, very interesting approach. First question: did I get it right that you have one session per connection, which keeps the state?

Yeah, one namespace, one session per connection.

Okay, so it's suitable for multiple users?

Exactly, yeah.

And can you say anything more about the serialization of the data you're using?

You mean the protocol? Yeah, okay. It's a binary format invented by Simon, and there's very elaborate documentation on his website. It basically works the same way as pickle does: it recurses into your data and goes down the tree. For simple data that's simple, but nested dictionaries, lists, tables, all that can be serialized too, and it takes basically the same approach; just that this serialization protocol is not Python-specific but Rserve-specific, so all clients and all servers can interact with it. Going into the technical details would just be too much for this talk, but everything can be looked up on the website.

Have you considered implementing any other clients, for languages other than Python, Ruby or just other languages? Because once you're serializing data, it doesn't matter what you use as a client.

I mean, a Ruby client already exists. You mean that we could exchange binary data between different languages, so have a network of connections? Okay, yeah, that's maybe an interesting approach; I haven't thought about that. It could be useful as a general way to exchange binary data between different languages on different systems. Yeah, that's a very interesting idea.

Okay, thank you very much. Then, are there any more questions? No? Okay. Yeah, so thanks very much for the talk.