The next talk we have is "Python and CFFI: Visualizing Network Traces" by Abhijit. Abhijit, the stage is yours.

Okay, thanks Rupal. Let's get started. Hi everyone, I hope everyone can see my screen and hear me; I'm just assuming that. This talk is going to be about a package called CFFI and how we can use it with Python to answer a question: how do you visualize network traces?

Before we get started, a little bit about myself. I run a consultancy company called hyphenOS Software Lab, which is mostly into system software. Python is my main programming language, mostly Django for backend systems and things like that. I also do some fun programming in Python; in fact, last time I gave a talk about one such project. Here are some pointers to contact me. I'm not on Twitter, and the slides will be available after this talk.

All right, let's get started. What am I hoping for out of this talk? I'm hoping that after listening to it you will be able to write a Python binding for your favorite C library. Doing that in an afternoon is probably an ambitious goal, but at least you should not feel intimidated by writing a Python extension. If I am able to achieve that, that will be pretty good. I'm also hoping that you'll become a little interested in Wireshark and packet processing in general. These are networking things.

More concretely, we'll start with a bit of background about the problem statement and how this got started. Then we'll look at CFFI, which stands for C Foreign Function Interface. We will start by looking at the choices for writing Python bindings, and then at the different stages of working with CFFI, which I call the development stage, the deployment stage and the runtime stage. Then some more practical advice about CFFI, in the form of what I call some not so frequently asked questions.
Finally, time permitting, we will have a quick demo. I'll mostly be taking questions offline, so after this I'll be available on Zulip chat or in one of the hallways, whichever is free.

So what happened? A friend of mine said that he wanted to dump some packets from a network interface into Elasticsearch so that he could do certain analytics on them. My natural first reaction was: why don't you do something like this? There is a utility called tshark, developed by the same people who develop Wireshark, and it gives you nice JSON output. Then just POST that to the Elasticsearch web API. Alternatively, Elastic has its own tool called Packetbeat, which allows one to dump packets into Elasticsearch.

It turns out that the first approach has a problem: it is really slow. You can see why quite clearly: you are taking the output of one process and feeding it as input to another, and a lot happens along the way. So that did not work well. The problem with Packetbeat was that, even though it is built for this very purpose, it did not support the protocols he was interested in. So that was not a choice either.

So what do we really need? We need the dissectors from Wireshark, in the Python world. We'll see in a minute what a dissector is. Why? Because then we can use the Python Elasticsearch bindings to dump into Elasticsearch, and using the dissectors in Python we can take the packets straight from the network interface. That's basically the idea. That's the problem statement.

Quickly, let's look at what a dissector is. A packet is just a stream of bits or bytes, however you want to see it. A dissector is a function.
It's a program which, in the case of Wireshark, is written in C. It takes that bit stream and converts it into a human- or machine-readable format. Here we are looking at JSON. You can then dump this JSON into, let's say, Elasticsearch, or do whatever you want with it. So what we want is this functionality, which is available in Wireshark as a C library, in the Python world. Our original problem statement, dumping packets into Elasticsearch, has now become developing Python bindings for Wireshark.

So what are the choices? Of course there is ctypes, which comes with the standard library. Then there is CFFI, which we are going to look at in more detail. There are a couple of other choices. Cython is quite popular with the pandas and NumPy people. SWIG, which was developed by Dave Beazley, is another one, and it is quite quirky. Or you could do all the heavy lifting yourself and write against the Python C API, but this is non-trivial for a fairly big library like Wireshark.

So what is CFFI? CFFI stands for C Foreign Function Interface. You have some functionality in the C world, in this case Wireshark, and you want to call that functionality from the Python world. CFFI helps you do that. One of the stated goals of CFFI, and I'll quote directly, is to be able to call C code from Python without learning a third language. This is very important. I have not looked at Cython in detail, but SWIG has its interface files, and that is like learning something new again. Or you have to learn a new API. CFFI lives up to this goal pretty well. So we have now made the choice of using CFFI for the Python bindings. Okay, so a disclaimer.
We'll be seeing a lot of code from a library that I have developed, called wishpy. It's open source and available. I'm trying to keep this as generic as possible, but in some places it's just good to use code that already exists.

Okay, let's get started looking at CFFI then. What are the main features? CFFI allows one to interface with a C library at the level of what is called the application binary interface (ABI) or the application programming interface (API). ABI mode is basically where you do a dlopen, and the functions in the C world can then be called directly from the Python module. The CFFI developers themselves don't recommend this way. Even though it is good enough for quickly trying things out, it is limited in what you can do, and it is actually slower.

Then there is the other one, API mode. What does that really mean? You start with something called a C definition for your Python binding; we'll see in a minute what those are. These C definitions are some types and functions from your underlying C library. Then, using a set of APIs that CFFI provides, you generate a C file, and that is compiled into a Python module. At a very high level, that is what it does.

Now we'll look in a little more detail at this API mode, along with what I call practical advice. Whenever we are working with a new library, our first goal is to get its functionality into the Python world. So what I'm going to talk about here is basically three stages of development.
First is the development stage, where you are figuring out how to get the functionality from C land into Python land. Once you've gotten that, the next is build and packaging, where you want to make your functionality in the Python world available to the rest of the world, through setuptools integration and things like that. And finally, what I call the runtime stage is essentially about how you think in terms of your own API. These are the three stages we are going to look at next.

In the development stage you start with a library whose API you want in the Python world. You start with something called a cdef in CFFI speak, which is basically a C definition. Before we go there, let's look at what a C library is. For a C library, the header files are actually the API of the library. And what do these header files contain? Structures, constants and types that someone has defined, or functions. So if you really look at it, a C library has two parts to its API. One is what I call the types API, and the other is the functions API, or funcs API.

Let's look at a simple example. Let's say you are trying to map a library called spam into Python, and the API in spam looks something like this. If you look at it, there are two parts here. There is a constant int bar, which loosely speaking is a constant rather than a type, but we can just file it under types. And spymall is a function, the functionality that the library provides. To get this into the CFFI world, we start by defining the C definitions.
So we split the spam definitions into types.h and funcs.h. The next step: CFFI provides a class called FFI. You create an FFI object and include these definitions into it.

There are a couple of advantages to doing it this way. In fairly big libraries like Wireshark, where there are easily about a dozen header files, actually a little more than that, separating types and funcs helps, because in some places you are only interested in the types and you don't want the funcs leaking into that definition. The second thing is that the overall readability of your C definition source improves a lot when you separate the code like this. It also lets us reorganize the code easily. What I have seen in some C bindings that people have written using CFFI is that they create one huge Python multi-line string and dump all the C definitions into it. That makes working with it a little difficult.

Once you have the C definitions, what is the next step? Verifying them. CFFI provides two APIs, verify and set_source, and they pretty much do the same thing: they take the cdef that you started with and generate a C file with some boilerplate code to help you compile it. So the next step is to actually build the C definitions on your machine, so that you can develop the Python side. One thing to remember: this requires your whole compiler toolchain, and even the Python development packages.
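
As a rough sketch of the types/funcs split and the FFI object described above: the `spam_t` struct and the `spam_cook` function are made-up names for the hypothetical spam library, and the definitions are inlined as strings only to keep the sketch self-contained, whereas the talk recommends keeping them in separate files.

```python
from cffi import FFI

# "Types" half of the hypothetical spam API: structs and typedefs only.
# In a real project this text would live in its own file (e.g. types.h).
SPAM_TYPES_CDEF = """
typedef struct {
    int eggs;
    double weight;
} spam_t;
"""

# "Funcs" half: functions the library exports (the name is an assumption).
SPAM_FUNCS_CDEF = """
int spam_cook(spam_t *s);
"""

ffi = FFI()
ffi.cdef(SPAM_TYPES_CDEF)   # types first, so that the funcs can refer to them
ffi.cdef(SPAM_FUNCS_CDEF)

# The declared types are usable from Python immediately; calling spam_cook
# would additionally need the real library to be loaded or compiled in.
s = ffi.new("spam_t *")     # zero-initialised, owned by the Python object `s`
s.eggs = 3
s.weight = 1.5
print(s.eggs, s.weight)
```

Keeping the two strings apart means a second binding that only needs the types can call `ffi.cdef(SPAM_TYPES_CDEF)` alone, without the function declarations leaking in.
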
So the compiler toolchain is something you need to remember, and of course you also need the dev version of the library that you are trying to wrap, because you need access to its header and source files.

Now a simple piece of advice. Even though verify and set_source provide very similar functionality, verify is limited in what it can do, whereas set_source allows you to set the path for your generated Python modules, et cetera. So set_source is what we want to use when we are going to distribute the package using setuptools. Generally a good idea is to put this into a separate wrapper builder module, which you can then use in setup.py. In a minute we will look at an example that shows exactly what we are doing. And we can use verify just to quickly check our definitions. So these are the two APIs we'll look at.

Here, if you take a look at the example, I started with an FFI object, which we have already seen. I added one or more C definitions. Then I'm giving a package name; this is going to be the full package name. Then I'm defining the pkg-config libraries. If you have used pkg-config in the past, it helps you generate the include and library flags, basically the -I and -L flags. And then I call set_source_pkgconfig with this. So I have primed the libpcap module with this setup.

Now, I have a pcap builder in the same path. When I run that module using python -m and so on, I can quickly verify that whatever I have packaged actually works and makes sense. I simply start by printing the generated pcap library.
Then I just try out some functions from that library. Once I have this part working, I have gotten some functionality that was in the C world into the Python world. That is the basic idea. And once this works, we are ready for the next stage.

A couple of things to remember here. Rather than trying to get everything into the Python world and working first, it's probably a good idea to try some very simple functionality. Most libraries have some kind of function that prints the version. If you are able to get that version function from the library into the Python world using verify or set_source, you are pretty much all set, because then all you really have to do is keep adding more functions and more types. I'm not saying it's very straightforward, but that is the workflow.

So what we have covered so far is how to get some functionality from the C world into the Python world. That was the development stage. Next is the build and distribution stage. Here we will look at setuptools integration. But even before that, a couple of things. When we write the source for the C definitions, all the C definitions go into the src directory. This fits well with the usual way of keeping C sources and libraries apart. Another advantage shows up when you run python setup.py: the wrapper builder that we looked at is what we include in setup.py.

So far we have looked at three APIs: the FFI class, verify and set_source. CFFI provides one more, called cffi_modules, which is primarily for setuptools integration.
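
A wrapper builder module along the lines described above might look like the following sketch. It is not the wishpy code: the module name `_spam_wrapper` and the function `spam_version` are assumptions, and a few lines of inline C stand in for a real library, which the talk links through `set_source_pkgconfig` instead. The sketch starts with a single version-style function, as suggested, and uses `emit_c_code` to show the generated C file without needing a compiler.

```python
import os
import tempfile

from cffi import FFI

ffibuilder = FFI()

# Start small: one "version" function is enough to prove the round trip.
ffibuilder.cdef("int spam_version(void);")

# Stand-in C source so the sketch has no external dependency. For a real
# library this would be '#include <spam.h>' plus library/include options.
ffibuilder.set_source(
    "_spam_wrapper",
    """
    static int spam_version(void) { return 3; }
    """,
)


def generate_c_source(directory):
    """Write the generated C file (our source plus CFFI boilerplate)."""
    path = os.path.join(directory, "_spam_wrapper.c")
    ffibuilder.emit_c_code(path)      # generates only, does not compile
    return path


def build(directory):
    """Compile the extension module; needs a C compiler and Python headers."""
    return ffibuilder.compile(tmpdir=directory, verbose=True)


with tempfile.TemporaryDirectory() as tmp:
    c_path = generate_c_source(tmp)
    print("generated", os.path.basename(c_path), os.path.getsize(c_path), "bytes")
```

After `build()` succeeds, `from _spam_wrapper import lib` followed by `lib.spam_version()` is the quick check that the function made it into Python. For the setuptools step, a builder like this is referenced from setup.py as `cffi_modules=["spam_builder.py:ffibuilder"]`, where `spam_builder.py` is an assumed file name for this module.
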
The wrapper builder module that we looked at in the first stage is what we use for the setuptools integration. The code looks something like this. Here I have two FFI modules, which is why there are two entries. If you really look at it, it's the same thing each time. One comes from a module called epan_builder.py, and the other is the libpcap FFI that we looked at, which is actually the FFI object. Once you pass these to CFFI through cffi_modules, CFFI takes care of building and making sure it works. This makes things really easy: for setuptools integration you don't have to do any heavy lifting. It is just like adding three lines to your setup.py to get it working. With that, we are coming to the end of what it takes to build and distribute.

Then comes the next part, which is the runtime stage. Here we will speak a little about how the users of your bindings are going to use them. Often the user of your binding is just you, perhaps only you, which is the case when we write for a practical use case. So what do we do? One thing we should do is provide our own API that wraps the library. We should not simply take all the APIs from the underlying library and make them available to people in the Python world. That's probably not a good idea. Of course, we should make them available just in case somebody is interested. But we should also think about what kind of API we are going to provide, and that is very important, because the users of our library will be more used to programming in Python than in C. So the API should let them think in Python, not in terms of the underlying libraries.
This is especially true for a few C libraries, and definitely true for libwireshark: their APIs are not very nice. You have to do a lot of tweaking in order to call them. That's a lot of detail, and it's better if it is hidden from the users of your library. For them, the wrapped library is just a detail they need not worry about. They only care about the API that you are making available.

For example, wishpy is essentially wrapping two libraries: libpcap and Wireshark. These are libraries written in C, and wishpy wraps them in Python. Of course, we make those APIs available as they are to users who are interested. But we define a couple of our own APIs, called Capturer and Dissector. What does a Capturer do? A Capturer is a class that is meant for capturing live packets from network devices or from an offline pcap file. A Dissector is a class for dissecting the packets and generating JSON from them. The users of this library only really need to care about the Capturer and the Dissector. They don't even need to know whether there is a libwireshark underneath.

There is another advantage when we work like this. If you remember, when we were packaging, we separated the src and the lib parts. This matters when we want to work with multiple versions of a library. It is particularly true of Wireshark, because certain distributions ship Wireshark library version 2.6 and others ship a newer version, 3.2. Ideally, your binding should support both of them.
Of course, supporting both does not mean both are supported at the same time. On a given machine, only one will be in use, but across different machines, multiple versions can be supported. Separating the code into src and lib allows us to do that as well. All our library-specific code, especially version-specific code, goes under src, and when we build the modules, our paths take care of building the lib part.

So in summary: define something called api.py, or use a more descriptive name. For example, in wishpy we call them capturer.py and dissector.py, but that doesn't matter. Then you have a wrapper.py module. Let users know this is an internal detail they need not worry about by naming it _wrapper. All the wrapping-specific things go inside _wrapper.py. One advantage of this: if tomorrow I come up with an even better library than libwireshark, I don't know how, but let's say I do, I can just replace the wrapped library with another one. My APIs don't change, and the users of my library are generally happy. So this is what we should think about when we think about APIs: they should be more Pythonic, and they should hide all the details inside _wrapper.py.

With that, what we have covered is: we got started, then we built and packaged, and then we thought a little about how the APIs should look. In the next part we will look at what I call not so frequently asked questions. Whenever we are working with C code, there are a few things we need to keep in mind. Garbage collection is one of them.
When I say that, I mean we need to make sure our objects remain in scope. We'll see with an example what I mean. The CFFI documentation is quite good about this, but it is important to know. Let's take an example. We have something called my function, which uses an API that CFFI provides called ffi.new. This is more like a C malloc. Let's say our function looks like this. While my function is in scope, val is there. But as soon as my function goes out of scope, val can potentially be destroyed, because it is no longer referenced anywhere. One way of dealing with this, if you are using a class or an object, which you very typically are, is to make val part of that object. As long as that object lives, your val stays alive. The CFFI documentation has a few examples of this. It is especially true when your Python code gets called as a callback, because there the contexts are completely different: you allocate in one context and use in another. So you want to make sure that whatever you allocated in one context is still available when you use it in the other.

Generally, whenever you are working with this, you will need some way of moving data between C and Python types, and ffi.memmove is available for that. It is like memcpy. I actually had to look at the sources to figure this out, which is why I have included it here. It's not easy to find.

All the C calls that are made are done with the GIL released, so don't play around with the Python.h API. It looks something like this: every C call gets wrapped between Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS. You might want to read a little more about what these macros do. These are macros, by the way.
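
The keep-alive advice and `ffi.memmove` can be sketched together as below. `PacketBuffer` is a made-up class for illustration, not part of wishpy or CFFI; the only CFFI calls used are `ffi.new`, `ffi.memmove` and `ffi.buffer`.

```python
from cffi import FFI

ffi = FFI()


class PacketBuffer:
    """Owns a C byte array for as long as the Python object lives."""

    def __init__(self, size):
        # Storing the cdata on self keeps the C memory alive. A bare local
        # variable would be freed as soon as the function returned.
        self._buf = ffi.new("unsigned char[]", size)
        self.size = size

    def fill(self, data):
        """Copy Python bytes into the C buffer; returns the bytes copied."""
        n = min(len(data), self.size)
        ffi.memmove(self._buf, data, n)       # Python -> C
        return n

    def to_bytes(self, n):
        """Copy the first n bytes of the C buffer back into Python bytes."""
        return ffi.buffer(self._buf, n)[:]    # C -> Python


pkt = PacketBuffer(8)
copied = pkt.fill(b"\x45\x00\x00\x54")
print(copied, pkt.to_bytes(copied))
```

The same pattern applies to callbacks: whatever was allocated with `ffi.new` before the C library calls back into Python should hang off an object that outlives the call.
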
We saw yesterday that a lot of this GIL handling is going to change, but CFFI will handle it transparently for us, so we don't have to worry about it. It is just something you need to know. It is also possible to use Python functions as callbacks; CFFI has pretty good documentation about that.

A little bit about calls in fast paths. This is a standard Python performance trick. You don't look the function up through its attribute chain on every call. You first resolve it once, at the class or object level, and then use that reference. Of course, before doing any such optimization, it's always a good idea to profile your code first. But these kinds of optimizations can sometimes give you a 5 to 10% performance improvement, and you will typically be using CFFI for things in the fast path.

A word about performance, and this is specific to this particular problem. This problem is a particular sweet spot for PyPy because of its just-in-time compiler. I have actually seen performance about 5 times faster than CPython. So what we saw in the talk yesterday holds true. In fact, in a couple of runs this actually does better than tshark, which is written in C. That's not really a fair comparison, but what I'm trying to say is that we arrive at an acceptable overhead compared to tshark written in C. So getting into the Python world has merit.

Let's quickly look at the demo. Remember our original problem statement: getting packets from an interface into Elasticsearch.
This is one example where I have taken packets from a pcap file into Elasticsearch. I have already done the transfer, and this is some visualization built on it. Those of you who know about TCP congestion control can quickly say, oh, this looks like the evolution of the TCP congestion window. Evolution, not evaluation. Yeah, I'm almost done. This is what we have gotten through wishpy.

Finally, let's look at the demo. I have two virtual environments here. In this one I am running tshark, which here is a utility that uses wishpy, on 20,000 packets. With standard CPython it takes something like 12 seconds. Then we have the PyPy-based one, and this takes about two to three seconds. So you can already see it is about four to four and a half times faster. Next, I am running the same thing using the tshark utility that ships with Wireshark; I have a quick shell script for this. In this particular run, our PyPy-based code is actually faster. This is not a fair comparison, because if you look at it, tshark is also generating output. But that's another reason why you need such functionality, because otherwise you have to rely entirely on whatever the tshark utility provides.

With that I am pretty much coming to the conclusion of my talk. Here is the code; feel free to check it out. There is pretty decent documentation, though of course it can still be better. I also have some example code, one example for Elasticsearch and another for Redis, and I will soon be adding one for Kafka. With that I am coming to the end of my talk, and I will be available in one of the hallways after the talk.
okay