Hello, and welcome to this very interesting discussion at the intersection of AI and software engineering that we call AI for Code. I'm here today with my colleague and friend Diego, a key industry analyst who has been a pioneer in bringing this discussion front and center for our colleagues and for developers around the world. So, Diego, welcome.

Thank you very much, Ruchir. Great to be here.

I think it would be great, Diego, if you really bring this topic home in terms of what is exciting about it from an industry-analysis point of view. Why has it become so important recently, and going forward, why will it be such a critical topic?

Yeah, so it all boils down to why we should talk about AI at its intersection with writing applications, or writing code, as Ruchir said. There are three big reasons. One: the demand for building great new applications is not going to diminish going forward. It has increased and will keep increasing as organizations try to move faster and introduce more digital into their businesses. Everybody talks about digital acceleration, and you can only achieve that by building more applications. So there is very high demand for new applications. That's one. Number two, from another perspective: the University of Cambridge did research which found that developers are spending more than 50% of their time fixing code and making it work, which adds up to around $312 billion, and that figure covers only the Linux and Unix ecosystems. If you look at other ecosystems, Microsoft, .NET, et cetera, the number is much, much bigger. So we are still spending a lot of money fixing the code that we write; there are lots of quality problems. And third, we can't ignore legacy. Almost everything we do today touches a legacy system.
And a lot of that legacy is written in COBOL. If you look at the financial services industry, COBOL is running very important core operational processes: when you swipe your card, roughly 85% of ATM transactions touch a backend written in COBOL, and 60 to 80% of financial transactions depend on COBOL. Reuters has estimated that between 200 and 250 billion lines of COBOL code still exist, and that there are roughly one million COBOL developers. COBOL is a 58-year-old language, so I think that developer count is cautiously high. But I did some very simple math: I took the cautious figure of 200 billion lines instead of 250, and the optimistic figure of one million COBOL programmers. Divide the two, and each COBOL programmer has 200,000 lines of code to maintain. That's a lot. I have developed software, and I know that is a lot for any one person to maintain.

Those are the three big issues in the industry. But there is more happening, which is that writing code itself has changed. When I started as a developer years back, it was much simpler: I opened an editor and wrote the code. Today it is nothing like that. New technologies pop up every other week or month: serverless, event-driven architectures, microservices, APIs, building with Agile, DevOps, DevSecOps. A developer needs to be a superhero, and we talk about cognitive overload for developers because there is so much to learn, so much to know, in order to write good code. So it is becoming more complex as well. Now, on the other side, we are also trying to make things a bit easier: there is this trend of low-code and no-code tools and environments that enable business people to write simple applications.
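Diego's back-of-the-envelope division can be checked in a couple of lines; this trivial sketch just reuses the Reuters figures he cites:

```python
# Reuters-cited estimates from the discussion above
cobol_lines = 200_000_000_000   # conservative end of the 200-250 billion range
cobol_devs = 1_000_000          # optimistic count of COBOL programmers

lines_per_dev = cobol_lines // cobol_devs
print(lines_per_dev)  # → 200000, i.e. 200,000 lines of COBOL per developer
```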
Now, that shifts the problem somewhere else. First of all, there are never going to be enough of those people anyway. But secondly, when you are really doing something complex enough at the enterprise level, those tools all end up having to be extended with hand-written code, which takes you back to the problem we just talked about: at the end of the day, you have to do some coding. So the problem is not going to be entirely fixed with just low code, no code, and so-called citizen developers, which are business people who write applications. And by the way, AI might contribute there as well, but that anticipates some of the things we are going to talk about.

So the question is, rightly: AI is doing a lot of things today. Lots of people are adopting AI for business processes and business operations. Can it also help us in the business of writing code? We at Forrester have been writing research around this for the last two years. In 2021, a colleague and I wrote about the emerging technology of AI writing enterprise code, and we came up with a pretty bold prediction for 2022: we think that by the end of next year, nearly all development tools will include an AI bot helping developers and development teams build code. What exactly that looks like is what today's conversation with Ruchir is about. So I think all of that gives the basis for why AI for code is a very important topic today.

Thank you, Diego. That was a wonderful introduction to the topic of AI for code and why it has become more relevant than ever before. You very rightfully predicted that AI will augment every development process we undertake in software today. But maybe you can peel the onion a little regarding what is behind those predictions?
What led you to those predictions? We can start the conversation from there and see what is going to happen as we move forward: what are the leading indicators, and how long might it take?

Yeah. When we started writing about the emerging technology, I was looking for prior art. For using and leveraging AI for code, there is a lot of prior art that we have seen happen and captured over the last few years. One example is using artificial intelligence to improve the way we test: companies like Applitools came up with what we call visual testing of web and mobile applications, and mabl is another interesting example that augments testers' automation capabilities. There is also prior art from research and small startups: Bayou, which generates API-based code, for example, or DeepCode, which improves code and also generates unit test cases. GPT-3 demos generated a web application from a plain-language description. So there were a number of examples of existing prior art, and then came the big announcements: IBM's AI for Code project, as well as Microsoft's GitHub Copilot. All these things together led us to the conclusion that this is now moving from the labs into the mainstream. We are, of course, at the beginning of a very long and very exciting journey.

Thank you. And as you started to allude, one thing became clear over the last decade. As Marc Andreessen said back in 2011, software is eating the world. A decade later, software has eaten the world, and we are at the beginning of AI eating software.
When we started on this journey for AI for code, we believed that just as software has eaten the world, AI will eat software, and that the intersection of these two very powerful technologies will result in a massive shift in developer productivity. As you are saying, new technologies pop up every day, so AI being able to augment every aspect of the software development lifecycle becomes extremely critical. That resulted in the project we now fondly call Project CodeNet.

As we looked at things, it was clear that the latest incarnation of AI rests on three major pillars. Obviously, the first is data: a massive amount of data came together, on top of which new algorithms were invented; and because of the very large amounts of data and the complexity of those algorithms, one needed very powerful hardware as well. This trifecta of data, algorithms, and hardware, or compute, combined and snowballed into these massive innovations. In fact, if I had to pick the one category that was critical at the beginning, it would be the data itself; it is said that there is no AI without data. That took us to the question: just as ImageNet was the dataset that snowballed AI into a massive societal disruption, helping society improve everything from healthcare to climate analysis to drug development, what is the dataset for the AI for code area? So we developed CodeNet, a massively large dataset of roughly half a billion lines of code written in 55 different programming languages, covering all kinds of problems from simple to complex. And I think this is what got the community really excited about making significant innovations and improvements.
We put that dataset out in the open as well, and it is the largest data source of its kind. Along with it, in one of our announcements earlier in the year, we also introduced some of the key algorithms that go with it, and we continue to roll things out as we move forward. So I wanted your opinion: what do you think might be the implications of projects like Project CodeNet and others that the community is developing?

Yeah, I'm basically seeing two very important threads. One is AI helping the individual developer, becoming a pair programmer: I'm writing my code and getting help from AI to write the best code, to do it faster and better, with higher quality. It may help me search for code similarities, and it may tell me, "you know what, this piece of code doesn't look like yours, but it does exactly the same thing and is more efficient; use it." So that's AI for developers. The other important approach is more enterprise-level, or team-wide, supporting a team. A lot of the AI for Code work I've seen you doing is actually helping the enterprise migrate; modernization is a big topic. Everybody has this legacy we talked about. How do you bring that old legacy forward? It doesn't have to be COBOL; it could also be Java. Technology we used five years ago is legacy today. So how can we support those teams at the enterprise level to go faster? I think that's where IBM's AI for Code sits. And maybe, Ruchir, you can explain. I mentioned GPT-3, this approach of training on a huge amount of data, which GitHub acquired the rights to use to build Copilot. Maybe you can tell us a bit about why you think AI for Code is different.
What is different in it? How different do you see those two approaches? Beyond addressing the developer versus addressing the team, what is the difference between leveraging all that data coming from GPT-3 and AI for Code's use of a curated dataset?

Excellent question, and an excellent point. Let's first look, at a very high level, at transformer models like GPT-3, which learn in an unsupervised fashion. The state of the art from a GPT-3 point of view is to treat code as a language. Just as there are human languages, English, Spanish, French, German, and others, there is machine language: code is a form of language, with symbols and a syntax. The state of the art of transformer models is essentially learning the symbols and predicting what might come next; it is sequence prediction. Now, there are similarities between human language and a machine language such as code, but there are differences as well, and I would say two main differences make the area of AI for code very exciting. Computer science has been innovating in programming languages for several decades now, and a significant amount of that innovation has gone into program analysis. The first difference: human language and machine language are similar in that both are symbolic, but code is executable. Unlike human language, I can give code an input and it will give me an output, and you can observe that behavior.
You can keep giving it inputs and keep getting outputs, which makes learning much more powerful if you have access to that behavior. If I can observe an application running in real life, acting on real-world data, and see its outputs, that gives me additional learning beyond treating machine language purely as text. That's one aspect.

The other aspect: code can be compiled into what we call an intermediate representation, a graph representation. This is the work of software compilers. Any machine language can be compiled into an underlying control- and data-flow graph, or into abstract syntax trees; there are various representations, but we fondly call it the intermediate representation. It is very powerful because on that basis you can reason across different languages, which gives us a very powerful mechanism for learning across languages as well. These two differences let us make AI for code much more powerful than transformer models alone, which treat code only as a language. Yes, code is a language, but it is much more than a language. And because of the progress made over the decades in program analysis, in finding the right intermediate representation, and in understanding the dynamic behavior of code, we firmly believe that this legacy needs to be combined with the latest and greatest of AI to give rise to this new area, the intersection of AI and code, and to make learning significantly more efficient than what we have been able to do so far.
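As a toy illustration of the two differences Ruchir describes, executability and parse-tree structure, here is a sketch using Python's standard-library `ast` module, a stand-in for the much richer compiler intermediate representations an industrial system would use:

```python
import ast

# Two syntactically different snippets that compute the same thing
src_a = "total = 0\nfor x in [1, 2, 3]:\n    total += x\n"
src_b = "total = sum([1, 2, 3])\n"

tree_a = ast.parse(src_a)
tree_b = ast.parse(src_b)

# The parse tree exposes structure (assignments, loops, calls) that a
# token-level "code as text" model never sees explicitly.
print(type(tree_a.body[0]).__name__)  # Assign
print(type(tree_a.body[1]).__name__)  # For
print(type(tree_b.body[0]).__name__)  # Assign

# And because code is executable, we can compare behavior, not just text:
env_a, env_b = {}, {}
exec(src_a, env_a)
exec(src_b, env_b)
print(env_a["total"] == env_b["total"])  # True: same observable behavior
```

The two trees look nothing alike as token sequences, yet execution shows the snippets are behaviorally equivalent, which is exactly the extra signal unavailable to a model that treats code purely as language.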
So, it's interesting: when you look at languages, you would think one difference between human languages and programming languages is that programming languages follow a precise structure. In human language we can say things in many different ways; we can also program an algorithm in many different ways, but we have to follow the syntax and grammar that the language has. We have been struggling for years to get natural language to the point where a computer can speak, listen, and understand all the nuances of a language semantically. It is getting better, but it is hard. So my question is: how much harder or easier is it to do this with programming languages?

I would actually give both answers, Diego. In some ways it is easier, because the grammar and the syntax are very well defined. Of course you can code a problem with multiple different algorithms, but either the code is right or it is not; there is no ambiguity. So the precision of programming languages makes it easier in some ways. On the other hand, it makes it harder, because if you generate code, for example, there is no such thing as "kind of right." Either you are right or you are wrong. And that is what makes it harder: it is like finding a needle in a haystack. There is one right answer, and it is not as if landing in the ballpark is okay. You need to be much more precise. In fact, you brought up the point of enterprises: I would say enterprises need consistency.
A lot of the state of the art of transformer models today is at a place where it may wow you many times: you type something in English, it generates code, and wow, this is really cool, AI can generate the code for me. But it is not consistent, and enterprises need consistency. I am not going to use something until I can trust that process repeatedly; otherwise it is entertaining, but it is not helping my productivity. So consistency makes it harder as well: there is one right answer for that implementation, and the syntax makes it much narrower and more precise. Hopefully that teases out exactly what is different about AI for code and where the latest technology stands.

One question, Diego, that I wanted to ask, and I know you have written a lot about it: there has been a lot of focus on AI-for-code techniques for code generation, and I just mentioned the wow factor of writing something in English and having it generate code for you. But there is no point in generating code, at least not in the enterprise, and I think this holds even in academia and elsewhere, if you cannot test that code. Whether you are modernizing code or generating a couple of lines of code, you write tests to test that code. This is the aspect I mentioned earlier: code has execution behavior. I know you have thought and written a lot about this, and I would love to get your perspective on it.
Yeah. When I started this research four or five years back, I looked at the entire software development lifecycle: analysis, design, development, test, integration, deployment. I did survey research to find out where practitioners were going to try to leverage AI in that lifecycle over the following years, and it turned out that testing was one of the very hot areas. As a matter of fact, three years ago I started writing research digging into how we can use AI for software testing. The reason I found so much focus here is that, as we all know, with Agile and DevOps, testing and quality have become first-class citizens in the software development lifecycle. They have always mattered, but these days they are a must.

I found many use cases, and the biggest and most important one is: can we be more intelligent in the way we test? Many people have thought about test automation as ruthless automation: let's automate everything, we have lots of computing power, let's execute all the automation just to make sure we cover all the tests. Well, not really. First of all, building and automating those test cases is costly and hard. Secondly, we can be much smarter about it. The use cases I have been finding show that AI can help you optimize what you should be testing: if there is a change, what are the related tests that need to be executed and passed based on that change to the code? And which tests should I execute at all?
You've got lots of regression tests, perhaps, and you make a change: which automation should you execute? Should you execute everything, simply because you have all that computing power? Well, guess what: even on the cloud, you are going to have to pay for it. So AI can help us optimize the automation, optimize what we automate, and optimize what we execute.

I mentioned visual application testing. You have all these rich interfaces deployed across many different web browsers, you have huge websites, and suddenly you make a change that should be replicated across all your deployed web applications. How can you test everything? Someone has to eyeball all the different nuances and changes, maybe the style, the color, or even the labels. It is simple stuff, but it becomes very hard at very large scale. Now AI-driven visual testing can do that in a matter of seconds and find exactly what the differences are, especially when you look at an application in a mobile app or in the browser.

Overall, it is really about optimizing the test strategy. I don't have a lot of time, and I have constraints on the money I want to spend in that time: what can I test, and where should I focus? Machine learning models can learn from all the data enterprises already have in testing, the existing test cases, the bugs, the code, and give suggestions about which areas to focus the strategy on. There is a whole series of use cases we have now seen applied, from a services perspective but also in development tools, trying to increase the level of automation while decreasing the amount of test code we write, because we are doing it in an optimal way.
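The change-based test selection Diego describes can be sketched with a toy coverage map. The file and test names below are invented for illustration, and a real system would mine this mapping from coverage data or call graphs rather than hard-coding it:

```python
# Hypothetical coverage map: which tests exercise which source files.
coverage = {
    "payments.py": {"test_checkout", "test_refund"},
    "catalog.py": {"test_search", "test_checkout"},
    "auth.py": {"test_login"},
}

def select_tests(changed_files):
    """Return only the tests impacted by a change, instead of the full suite."""
    selected = set()
    for f in changed_files:
        selected |= coverage.get(f, set())
    return sorted(selected)

print(select_tests(["catalog.py"]))
# → ['test_checkout', 'test_search']  -- 2 tests to run instead of all 4
```

The point is the payoff Diego names: executing only the impacted subset, rather than the whole regression suite, on every change.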
That's wonderful, and I actually have an example of how loudly and clearly the importance of that area came through for us, exactly on the point you mentioned of modernizing legacy code. We have focused a lot on COBOL-to-Java as a language translation, but, as you said, legacy code doesn't have to be COBOL; this example was Java. A large automotive client of ours had a mission-critical application they were trying to modernize, a $200 million asset, so it was really critical to their business: 3,500-plus Java files written with JAXB, SOAP, Struts, and many different technologies over the last ten years or so, about a million lines of code. They had been spinning their wheels on it for about a year, unable to understand what the code actually did. With our AI for Code effort, specifically the modernization work, we got involved: we were able to really understand the code, suggest partitions into 25 or so microservices to modernize it to a cloud-native implementation, and analyze and explain the code, all within roughly four weeks. Four weeks versus one year: that is the productivity boost AI can provide, in this case for modernizing legacy code.

But the reason I mention the example is what the client said right after: "This is great, but I had my unit tests, all of which existed before. You have now generated code for me. How do I take this and deploy it? How do I know all of this is correct, when my legacy tests, on top of which I did my releases, are over on the other side? How can AI help me bring all of that across as well?" So it became loud and clear that you cannot do code generation, despite the tremendous value it offers, without having proper testing.
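A heavily simplified sketch of the partitioning idea behind that engagement: cluster classes by their call relationships and propose each cluster as a candidate microservice. The class names and call edges here are invented, and production tools use far richer static and runtime signals than the connected-components grouping shown:

```python
from collections import defaultdict

# Hypothetical call graph of a monolith: class -> classes it calls.
calls = {
    "OrderController": ["OrderService"],
    "OrderService": ["OrderRepo"],
    "UserController": ["UserService"],
    "UserService": ["UserRepo"],
}

def candidate_partitions(calls):
    """Group classes into connected components of the (undirected) call graph."""
    graph = defaultdict(set)
    for src, dsts in calls.items():
        for dst in dsts:
            graph[src].add(dst)
            graph[dst].add(src)
    seen, parts = set(), []
    for node in sorted(graph):
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:                      # depth-first walk of one component
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(graph[n] - comp)
        seen |= comp
        parts.append(sorted(comp))
    return parts

print(candidate_partitions(calls))
# → [['OrderController', 'OrderRepo', 'OrderService'],
#    ['UserController', 'UserRepo', 'UserService']]
```

Each component becomes a candidate service boundary: classes that talk to each other stay together, classes that never interact are split apart.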
In fact, another area that became clear to us, with security now a central focus in many discussions, is using AI to detect vulnerabilities in software. Rarely does anyone write massive amounts of code from scratch these days: you go on the web and find something, there is open-source code here, some code on Quora or Stack Overflow over there; you copy, paste, and contextualize. It is a lot about integration, and you never know what vulnerabilities are lurking where. Yes, the National Vulnerability Database covers the libraries, but it does not cover all these forums. So AI helping us find vulnerabilities, in addition to bugs and other issues, is another area that emerged as we hear more and more from clients. I think several areas are emerging as very important as we move forward on this journey.

Another question as we wrap up: from your point of view, are there any other concrete examples you want to highlight in terms of getting value? And to close it off, give me the take-home point: what should enterprises be doing to prepare for the tsunami that is coming at the intersection of AI and software?

Right. I'll quickly mention three things I'm seeing. On one side, clients are reaching out with briefings and inquiries; they have their own roadmaps for when they need to adopt technologies, and I'm seeing large enterprises putting AI for code on the roadmap for the next two to three years. That was last year, and I think it needs to be accelerated now, but clients are starting to put this on their radar screens. The second
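A crude sketch of flagging risky patterns in copy-pasted snippets; the regexes below are hand-written stand-ins for what an AI-based detector would learn from labeled vulnerable code, and the snippet being scanned is invented:

```python
import re

# Hand-written patterns standing in for what a learned model would flag.
RISKY_PATTERNS = {
    r"\beval\s*\(": "eval() on untrusted input allows code injection",
    r"\bpickle\.loads\s*\(": "unpickling untrusted data can execute code",
    r"%s.*%\s*\(": "string-formatted SQL suggests injection risk",
}

def scan(snippet):
    """Return (line number, reason) pairs for each risky line in a snippet."""
    findings = []
    for lineno, line in enumerate(snippet.splitlines(), 1):
        for pattern, reason in RISKY_PATTERNS.items():
            if re.search(pattern, line):
                findings.append((lineno, reason))
    return findings

pasted = 'query = "SELECT * FROM users WHERE id = %s" % (uid,)\nresult = eval(user_expr)\n'
for lineno, reason in scan(pasted):
    print(lineno, reason)   # flags line 1 (SQL formatting) and line 2 (eval)
```

This only catches what its patterns describe, which is exactly Diego and Ruchir's point: forum snippets fall outside curated databases, so learned detectors that generalize beyond fixed rules are the interesting frontier.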
concrete example is on the testing side: a very large bank here in Europe, which I talked about at the European Forrester Forum, used AI to optimize and improve the way they tested their five-star mobile application. They used AI to make sure that, in the amount of time they had, they optimized all the testing. It is a very successful five-star app: we covered it in a big piece of research on which banking applications in Europe deliver a top user experience, and this application ended up with a very high user-experience rating in the report we wrote. There are also very large enterprises like Intel, which, together with MIT, built a system that does code similarity and some automated bug fixing; last year they started releasing it to their 2,000 developers. So companies are starting to move in the direction of leveraging what is coming out of the research and what is being productized. In the testing space I have many more examples, but we don't have time to pause on all of them.

I'll give my quick recommendations. First of all, clients should really look at the opportunity of introducing, let's call it, autonomous coding or autonomous testing. Look around and work with partners; a lot of this is going to come from vendors like yourselves. If you are working with a vendor, I would recommend reaching out, asking what they are doing in this space, and starting to experiment with it. I also think the unit test generation example you gave is a great one, because developers spend double their
time writing the unit tests on top of the coding itself. Getting prepared also means, from an enterprise perspective, getting much better at defining and formalizing what the system should do, a specification, as the input that can be given to these, let's call them, AI bots. These bots are going to be trained by the vendors, already trained by someone else, so enterprises can start leveraging them to support and augment the development team and the individual developer. I also think things like Copilot are a great example for developers to start with. Even those who want to do low code can, with a Copilot-type tool, start coding in a language that is not necessarily low code, because the Copilot can show them how certain code implements certain requirements. So there are a few things that are very much usable and leverageable today.

Wonderful points. I would add another area where I know a lot of enterprises are struggling, and it has really been our mission: application modernization. There are tools available out there today that they can go and leverage, like Mono2Micro from IBM. And in terms of leading-edge research, one thing I always suggest to clients is: pick something and start small. There is no silver bullet per se; this is a complex space, but fast progress is being made. Pick a project and start small, whether it is in testing or in application modernization. With our Red Hat colleagues we have another project called Konveyor; you can go to konveyor.io and look at it in much more detail. Within it there is a project called Tackle, and Tackle-Test, which combines application modernization and testing together, and that becomes extremely
critical as we move forward on the enterprise journey of application modernization. Across the software development lifecycle, the application modernization lifecycle, and testing, use cases abound: pick something, start small, and keep following the research. I am looking forward to the progress that will take place in this area, and to more discussions between us as we continue to move forward.

Thank you, Ruchir. I would just like to add that you touched on a very important area: right now there is a lot of migration to the cloud that needs to be done, so your modernization example is a very important use case.

Well, thank you, Diego. I appreciate you joining us. This was a wonderful discussion, and I'm looking forward to many more to come.

Thank you very much for having me. Thank you. Bye-bye.

Bye.