Hello everyone, and welcome to this presentation. My name is Ibrahim Haddad, and I am the Executive Director of the LF AI & Data Foundation, formerly known as the LF AI Foundation. I'll be speaking to you today about the role LF AI & Data is playing in supporting the open source AI and data ecosystem. First, one major piece of news: the LF AI Foundation is combining forces with ODPi under a new name, the LF AI & Data Foundation, representing both sets of efforts. LF AI has been focused on AI, machine learning, and deep learning, while ODPi has been focused on open source data. Let me briefly introduce both initiatives before I go deeper into the presentation. The ODPi project was launched in February of 2015 with the initial goal of announcing and supporting the development of the Open Data Platform, and since then it has iterated through multiple specifications and development efforts, including the Egeria project, the OpenDS4All project, and more recently the BI & AI project. As of January 2019, Egeria 1.0 had been released, which was a major release, along with a performance testing program, and later in 2019 OpenDS4All was accepted as an incubation project. As for LF AI, the foundation was launched in March of 2018 with one incubation project and its launching members, and throughout 2018 and 2019 we had a busy time growing the foundation, adding additional projects into incubation, launching various efforts, and getting into the mode of supporting collaboration across projects and throughout the ecosystem.
In 2020 we had incredible growth in the number of projects we host in the foundation, adding on average one new project every single month, in addition to the number of events you can see on the screen, all part of supporting our project communities in getting together and collaborating across the ecosystem. So, coming together, LF AI and ODPi bring developers from both organizations and projects under a single umbrella foundation, with the orchestration of a single Technical Advisory Council, in addition to a number of other active committees such as the Trusted AI Committee and the BI & AI Committee. All of these committees and projects will work together to support the open source AI and data ecosystem. From a guidance perspective, our end users highly appreciate a unified approach: guidance on interoperability across projects, integration, standards, and the future of AI and data analytics in general, as we see these domains continuing to grow in every single industry. Furthermore, coming together enables closer collaboration between our AI projects and data projects, and facilitates integration and interoperability across them. It has been a proven recipe that hosting projects under a single umbrella foundation enables that type of collaboration and builds a stronger ecosystem. And of course, having members that care about all of these different projects come together allows us a lot of efficiencies in the value services we offer to our projects. So let's have a look at the ecosystem. This is a screenshot of the LF AI & Data Foundation landscape, accessible from the lfai.foundation website.
As you can see, it's a thriving ecosystem that we capture in the landscape, and it is purely focused on what we call top-tier projects. We have about 12 to 15 different categories within that landscape, such as machine learning, deep learning, data, models, distributed computing, education, security and privacy, and so on, and within each of these categories we may have multiple subcategories. If you look at the top left, machine learning has subcategories for frameworks, platforms, and libraries. There are about 300 top-tier projects divided across these various categories, and what you will notice is that in some of these subcategories there are a lot of projects competing to provide similar functionality. Under machine learning platforms, for instance, there are maybe 10 to 12 different projects; under deep learning frameworks, there are 12 different projects; and there are nine-plus deep learning libraries offering similar functionality. So it's a very thriving ecosystem with a lot of projects, and this landscape represents the work of over 35,000 developers who are actively contributing to them. However, despite the growth and the high level of activity within the ecosystem, there are various challenges. Some relate to fragmentation and lack of integration across projects. Some relate to how projects are being governed. Others relate to the fact that a lot of these projects started as proprietary internal efforts contributing to a given product or feature, and over time companies realized it is better to open source those efforts and focus instead on data, models, and applications, which is where the value will come from, not necessarily from the actual platform, library, or framework.
Other challenges that have been experienced arise when a project becomes successful: who is going to manage the project assets, the trademark, the export control filings? Who is going to pay for the website, the various build systems, the AWS credits, and so on? Together, all of these challenges create what I refer to as a glass ceiling for investing in projects and adopting them, and this is how LF AI originally came about in March of 2018: to address these challenges and support a growing open source AI ecosystem. Of course, we're all aware of the general advantages of open source, from the ability to access and change the code, to its flexible license model, to the ability to influence development via contribution and peer review. From an open source AI perspective, however, there are very specific benefits that are unique to AI, because transparency and the open development model lend a lot of credibility to different areas within AI. We have fairness, which covers methods to detect and mitigate bias in models and data sets. There is robustness, methods to detect tampering with data sets and models. There is explainability, methods that help us enhance the interpretability of models. And there is lineage, which helps us ensure the provenance of data sets and AI models. All of these very important concepts in AI benefit from the open and transparent development and peer review that we have in the open source world. Underneath all of them we have open data: methods that allow us to clean, sort, and track the provenance of data, in addition to governance structures for doing all of these things. This is the combined effort of LF AI & Data: bringing both organizations together to further support the collaboration and exploration of AI, machine learning, deep learning, and data.

So why are we harmonizing and organizing in this way? To improve interoperability across the projects we used to host separately and will now host together; to enable closer collaboration across these projects with the support of a single Technical Advisory Council representing our collective membership; to provide unified guidance to our end users; and to achieve efficiencies, as I mentioned earlier. All of this will help us drive towards building a strong open source AI and data ecosystem.

How are we going to be structured? It is very similar to the LF AI Foundation structure, as we carry that structure forward. We have a governing board on the left-hand side, underneath which are several committees stood up by the governing board. In the middle of the screen you see the technical coordination body, the Technical Advisory Council, under which we have three committees: the ML Workflow and Interop Committee; the Trusted AI Committee, which is focused on trusted and responsible AI and ethics in general; and the BI & AI Committee. On the right-hand side are the hosted projects. Each hosted project in LF AI & Data has its own independent technical governance, and they are enabled by the variety of services we offer them.

So what are the major efforts we have in the foundation? We have an effort to provide a neutral environment and open governance for our projects that will foster collaboration. As you can see at the bottom of the screen, we are vendor-neutral and not-for-profit, with open governance and a lot of clarity around IP in how we work. We have an effort focused on harmonization and interoperability, driven by the Technical Advisory Council, which offers a lot of opportunities for projects to integrate and collaborate, not just within LF AI & Data but with other Linux Foundation projects. We have a focused effort on trusted and responsible AI, supported by three core projects in this area and by training that we offer for free on the edX platform. We have a focused effort on data, with a number of data-specific projects in addition to the BI & AI Committee, which drives specific efforts in that space. We offer an open source model marketplace and a number of supporting tools. And the last major effort is to provide funding and awareness, which combines a number of things under one umbrella: a funding model, a number of events around the projects we host individually and collectively, and a lot of services we offer the projects in terms of marketing, events, and program and project management.

Our membership represents some of the leading technology companies. We have three tiers of membership. The premier tier consists of the board-represented companies. The general tier is open to any company, and you can see there are a number of them. The third tier is what we refer to as the associate level of membership, dedicated to universities, government labs, research institutions, and other non-profits; it is actually a free membership with the full benefits of a general membership. Today, at the time of recording, we host 22 projects, and I estimate that by the end of the year we will have 25 projects hosted in LF AI & Data.
So what does it mean to be a foundation project? As a project, you will be hosted in a neutral environment, which increases the willingness of companies to adopt the project and to contribute to it. You will have the endorsement of our members via the Technical Advisory Council. We will support the project with an open and neutral governance model, which is critical and mandatory for all of our projects. We have full-time staff dedicated to supporting the projects from a project and program management perspective, as well as marketing, PR, events, and legal support, in addition to a presence in different geographies that allows us to support projects there in person.

One of my favorite slides to present shows which companies are hosting projects in LF AI & Data, and as you can see, it is a very interesting slide, showcasing top tech companies from across the whole world, from North America to Europe to Asia. We have a great lineup of companies hosting with us, and this is a great testament that these companies trust LF AI & Data to incubate their projects, grow their communities of users and contributors, and help them reach a leadership position within their technology category.

So why do these companies host projects with us? As the Linux Foundation, we are the leading commons for community assets. We have over 20 years of history in hosting projects, and we are recognized and known as a respected brand that supports open source communities and projects. We are the largest open source foundation, providing management of assets for over 400 hosted projects, and with over 1,500 members of the Linux Foundation we have access to a truly significant pool of resources and a large audience from which to draw users and contributors to the different projects.
So why would you want to incubate a project with LF AI & Data? We have a lot of effort around events. We manage the IP of the projects and provide a number of legal services to all of them. We provide training services, free and available via the edX platform. We have a number of certification efforts that have proven to be extremely useful, and we have the ability to design and execute both software and hardware testing and certification programs. We have a number of developer marketing efforts, in addition to very proven and tested developer operations. We also offer different services in relation to security, whereby we audit projects and conduct bug bounties, dependency analysis, and of course source code scanning for all of our projects. On top of that, we have very strong marketing and PR efforts in support of all the projects, ranging from blogs to announcements, white papers, posters, presentations, and promoting project releases and new features. In addition to the marketing efforts, we also have our event services. As you can see on the screen, we run different types of events: some focused on a specific project, and some focused on the foundation and its collective set of projects. We have the LF AI Summit (obviously, all of this branding is moving towards LF AI & Data), and we hold these in different geographies and alongside major events such as the Open Source Summit across all of its editions in Europe, China, and North America. If you're interested in incubating projects with us, our Technical Advisory Council is responsible for voting new projects into incubation in the foundation. They meet every two weeks and are generally booked four to six weeks in advance.
To learn more about proposing projects for incubation, please follow the link on the slide; I'm more than happy to have a call with you to discuss the specifics and the process itself. Joining LF AI & Data is very easy. You can follow the link on the chart, which will land you directly on the website, where you can learn about the different levels of membership, the fees, and the benefits. In addition, we have a presence across various channels, as you can see on the screen. To reach out to us directly, please feel free to send us a note at info@lfai.foundation (and, as I mentioned, we're moving towards info@lfaidata.foundation). Thank you very much for your time today. I really appreciate that you took the time to listen to my presentation, and if you'd like to reach out to me, get a copy of the slides, or have a discussion about incubating projects or joining LF AI & Data, please send a note to info@lfai.foundation and I will be more than happy to connect with you. Thank you.

Hello, and welcome to this portion of the LF AI Virtual Mini Summit 2020 for Open Source Summit Europe. This section is an update on the Linux Foundation AI Technical Advisory Council, the TAC, as well as the ML Workflow and Interop Committee, happening Thursday, October 29, 2020, from 6:15 to 6:30 Pacific time (recorded). I'm Jim Spohrer, Director of IBM's Cognitive Open Technologies Group, also referred to as our Center for Open Source Data and AI Technologies, CODAIT. I'll also mention that I'm the elected chairperson of the Technical Advisory Council for the current year, and a member of the ONNX Steering Committee. If you want to get in touch with me, you can reach me on LinkedIn, on Twitter, or on the LF AI Slack; those are all three great ways to get hold of me if you have any questions or comments or would like to explore anything I present today.
So let's begin with the fact that the Linux Foundation AI TAC, or Technical Advisory Council, meets once every two weeks, on Thursdays at 9 a.m. Eastern time. Who's welcome? Really, everyone interested in open source AI and data. We're super excited to have people who are working on open source projects join, and we also have people joining from companies, universities, and non-profits; all are welcome to come and find out about what we're doing in Linux Foundation AI & Data. And speaking of Linux Foundation AI & Data, we're super excited about the merger, which at the time of this recording hasn't been announced yet; by the time you're hearing this, we will have announced that Linux Foundation AI merged with ODPi, and the merged organization is called Linux Foundation AI & Data. Super excited about that. We also have a code of conduct that you can read about on our website. So who presents at the TAC meetings every other week? Certainly, we invite open source projects to present. A lot of the presenters are simply telling us about their open source project and making us aware of it, and some of them are interested in potentially hosting their project at Linux Foundation AI & Data someday. We also have committee presentations; there are a couple of different committees of LF AI that I'll tell you about in a little bit. And it's easy to find examples of presentations that have happened at the TAC, because we actually record them. We send out a deck every two weeks to our TAC general mailing list so people can see what will be presented, and we post recordings and minutes of all of our TAC calls, so you can look through and see, for example, that on October 8 we invited the Trusted AI projects, three projects in incubation at LF AI, to give us updates.
Sometimes a project presents because it has requested to become an incubation project: for example, on September 24 the Feast project did, the vote was successful, and it's now an incubation project. On August 13, the Horovod project, which had been an incubation project, presented its readiness to become a graduated project of Linux Foundation AI. So you can scroll down through those, and I really recommend joining a call, which is better than listening to a recording. You can also subscribe to the mailing list, tac-general at lists.lfai.foundation, and you can find all of this by going to wiki.lfai.foundation. If you just remember lfai.foundation, put "wiki" in front of it, or put "lists" in front of it, and you can find these various things; I'll review all of this at the end, and I've got a slide that recaps it. So, back to the presentation: what happens on these bi-weekly TAC calls? Again, we vote on when to host incubation projects and when to graduate projects. A great example is a project Uber brought us, Horovod, as an incubation project way back at the beginning. You can see it had steady growth, more contributors, and multiple organizations using it, and just recently, in August of 2020, it was ready for graduation. And we do have criteria: criteria on what a project has to be like to be hosted at the Linux Foundation. The companies have to transfer the trademark, and they have to transfer the login rights to the social media accounts and the GitHub repos. What we've found is that if you really want open source projects to grow and to be sustainable over the long term, they shouldn't be associated with an individual company; they should be in a foundation under multi-vendor open governance. That's really the best way to help projects grow and be sustainable over the long term. We also annually elect a chairperson to preside over these meetings.
I'm the current elected chairperson; Ofer Hermoni was the chairperson before me. Come spring, there will be another election and a new chairperson for the Linux Foundation AI Technical Advisory Council. So please do join the meetings, look at the recordings, and bring us projects you'd like to put into incubation. Occasionally, when a project moves in and is already pretty mature, it comes in at the graduated level. The ONNX project, for example, came in as a graduated project, and when we merged with ODPi, the Egeria project came in as a graduated project; both were already pretty mature. Typically, though, new projects come in from a single vendor looking for help from Linux Foundation AI to become multi-vendor and grow their usage, so it's more normal to come in as an incubation project, like Horovod did from Uber, and then graduate after a year or so of growth and the setup of an additional governance committee. These are the current TAC voting members. Actually, there will be even more: the merger between LF AI and ODPi won't be announced until October 26th, but it will add some new premier members to Linux Foundation AI & Data. If you're a premier member, you get a voting seat on the TAC, and if you're a graduated project, you also get a voting seat on the TAC. So these are the organizations, and here are the people with their contact emails: the ones voting on whether a project becomes an incubation project, whether a project becomes a graduated project, and who will be the next TAC chairperson. Now, there are also monthly meetings for two of our committees. There are other committees besides these; these are two of the technical committees.
There's the Trusted AI Committee, which is co-chaired by three people: one from IBM in North America, one from Orange in Europe, and one from Tencent in Asia. Trusted AI, responsible AI, is obviously a very important topic, and we wanted co-chairs from all the different geographies to address the different issues coming up in each of them. The other technical committee is the ML Workflow and Interop Committee. You can imagine, with a growing set of open source projects, how do they interoperate? It's a big issue. How do they sequence together into pipelines and workflows? A big issue. So I'll be talking a little more about this one; someone else will present the Trusted AI Committee shortly. Also, as a result of the LF AI and ODPi merger, we now have a new committee, the BI & AI (Business Intelligence and Artificial Intelligence) Committee, which is chaired by Cupid Chan. So let's talk just a little bit about the ML Workflow and Interop Committee, to give you a flavor. When Ofer was our chairperson in 2019, he mapped out this machine learning stack. We mapped all of the LF AI incubating and graduated projects onto it, mapped other interesting projects from the landscape onto it, and made quite a bit of progress thinking about how different projects fit together and interoperate. And by the way, that's a great way to start reaching graduation: when you start interoperating with other projects. That's also a criterion for coming into Linux Foundation AI as a hosted project; you have to show synergies with some of the other existing projects. So that was the work done in 2019. In 2020, our chairperson, from Huawei, has been introducing a new framework that we're using to start thinking about northbound applications, southbound frameworks, and the back end, and how we get deeper interoperation between the projects.
Now, here's one of my last slides, and it's really the call to action. If you're listening to this recording, stick around; I'll be showing up for the live Q&A. But I really encourage you to go out and study the Linux Foundation AI landscape, over 300 projects. Ibrahim has already told you about this, but each of these cards is interactive. For example, Feast just came in as an incubation project, so when you click on the Feast card it brings up information about Feast. The cards that are a little larger are the projects that are part of Linux Foundation AI & Data, the ones that are a little smaller come from the broader landscape, and you can sort the list by organization to see which organization each project is part of. Also, study the Linux Foundation AI technical projects. Going to https://lfai.foundation brings you here, and you can get information about the members, the charter, how to join, the code of conduct, how to contact us, the projects (all projects, the Trusted AI projects, the graduated projects), events, the people on the governing board, the elected officers, our staff, various resources, and of course a newsroom. So I really encourage you to explore this and join one of the calls. Also, again, review the recordings; I've already shown you that all of the TAC calls are recorded, the deck is sent around on the TAC general mailing list, and the minutes are recorded. If you're interested in a particular project on the landscape, or one that's hosted at LF AI, you might go check out its deck and recording. And if you're thinking about hosting a project at LF AI, that's a great way to learn what kind of slides you have to prepare and the steps involved. You can also find that information under Projects on the site: there's information about hosting your project, why you would want to host your project, and how to go about it.
There's information about the proposal process. It's very short, one to two pages, and it's not hard; typically it can be done in an hour or two, since a lot of the information comes from GitHub, and you can be on your way to getting a project reviewed for hosting and voted on by the Technical Advisory Council. Also, join the LF AI Slack. That's a great way to introduce yourself, get involved, and get updates, and if your organization is interested in membership, there's a link here for that too. So here are some reminder URLs. Again, if you remember lfai.foundation, that takes you to the website. Put "wiki" in front of that to get to the wiki where we have the recordings, go to github.com/lfai if you want to see what's on GitHub, and lists.lfai.foundation takes you to the mailing lists. And of course we also have event pages and calendars you can subscribe to. I'll be around for the live Q&A as well, so I'll stop the recording here and look forward to chatting with folks during the live session.

Hello, everyone. Thanks for joining us today. I'm Saishruthi Swaminathan, and I work as a data scientist and developer advocate in IBM CODAIT. CODAIT stands for the Center for Open Source Data and AI Technologies. Today I'm going to share the goals and the work being done under the LF AI Trusted AI Committee. Here's a chance for you all to join us in building a trusted and transparent AI community. I'm happy to share that I'm one of the committee members, and I'm sure that after this session you will join us and help us build a better community together. Let's get started. First, the goals of our committee. LF AI is an umbrella foundation of the Linux Foundation. LF AI, which is Linux Foundation AI, aims to create a sustainable open source ecosystem by open sourcing artificial intelligence, machine learning, and deep learning projects. LF AI Trusted AI is a global group under LF AI focusing on infusing trust and transparency into AI applications.
First, let's see the scope of the LF AI Trusted AI Committee. Number one: define policies, guidelines, tooling, and use cases by industry to create responsible and trusted AI applications. Number two: create a badging process for open source projects that meet the trusted AI policies and guidelines as defined by LF AI. Those two items are the scope of the committee. Next slide. Now let's look at the working groups. This committee has two working groups: the Principles Working Group and the Use Cases Working Group. First, the Principles Working Group. What do they do? This group defines ethics guidelines and principles for trusted AI projects. The next group is the Use Cases Working Group, which defines and implements trusted AI use cases within different AI projects and domains. So, just to reiterate, we have two groups: the Principles Working Group and the Use Cases Working Group. Next, trusted AI projects. Now we're getting into the projects under the LF AI Trusted AI group. To start with, IBM donated three of its trusted AI projects to LF AI in July 2020, and they are currently being incubated. The donated projects are the AI Fairness 360 toolkit, the AI Explainability 360 toolkit, and the Adversarial Robustness Toolbox, and they are currently being incubated under LF AI Trusted AI. So now let's quickly go over a very high-level overview of these three projects. We'll start with AIF360, which is AI Fairness 360. The AI Fairness 360 package is available in both Python and R. It contains a set of metrics and algorithms for detecting and mitigating bias in data and machine learning models: over 70 metrics for examining bias and 10 algorithms for mitigating it. Now, when we talk about fairness, the word fairness is not an easy thing to define. It varies, it is multifaceted, it depends on the context, and it is defined by social context.
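To give a flavor of what one of those 70 metrics measures, here is a minimal, dependency-free sketch of disparate impact, one of the standard fairness metrics AIF360 computes. This is not the AIF360 API; the function name, group labels, and toy data are illustrative assumptions, shown only to make the concept concrete.

```python
def disparate_impact(outcomes, groups, privileged):
    """Ratio of favorable-outcome rates: unprivileged group over privileged group.
    A value near 1.0 suggests parity; a common rule of thumb flags values
    below 0.8 as potential disparate impact."""
    priv = [o for o, g in zip(outcomes, groups) if g == privileged]
    unpriv = [o for o, g in zip(outcomes, groups) if g != privileged]
    rate_priv = sum(priv) / len(priv)       # favorable rate, privileged group
    rate_unpriv = sum(unpriv) / len(unpriv) # favorable rate, unprivileged group
    return rate_unpriv / rate_priv

# Toy data: 1 = favorable outcome (e.g. loan approved)
outcomes = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(disparate_impact(outcomes, groups, privileged="A"))  # 0.25 / 0.75 = 0.333...
```

A value of 0.33, as here, would fall well below the 0.8 rule of thumb, which is exactly the kind of signal the toolkit surfaces so you can then apply one of its mitigation algorithms.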
So the AIF360 toolkit is useful towards achieving fairness in many situations, but it can't fully capture fairness in all situations, right? The process requires a broad discussion with the multiple stakeholders in your development process, and it's good to get input from different stakeholders on your overall decision-making workflows. This toolkit is an excellent starting point for a broader discussion about fairness in your application, and it works best when the problem you want to solve is well defined. We have built tutorials demonstrating industrial use cases with the toolkit, and to make your life easy, we also provide guidance material that will help you choose a metric and algorithm to fit your application's needs. You can use this toolkit at any point in the pipeline. By this I mean you can use it during the data preparation phase. Say you don't have access to the data preparation process: you can then use it in the model development phase. Say you don't have access to either the data preparation or the model development phase, but you do have access once the model is trained: you can then use the toolkit on the trained model to examine and mitigate bias. There is a lot of interesting stuff in this toolkit, and I would highly encourage you to go to its page to learn more, and to contribute to the project as well. Now let's get into AIX360, which is AI Explainability 360. The AI Explainability 360 toolkit is an open source library to help explain AI and machine learning models and their predictions. It includes three classes of algorithms (local post-hoc, global post-hoc, and directly interpretable explainers) for models that use image, text, and structured or tabular data. It is a Python package that includes a comprehensive set of explainers, at both the global level and the local level.
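Before going into the specific techniques, here is a toy illustration of what a "local explanation" means in the simplest possible case: for a linear model, each feature's contribution to one prediction is just its weight times its value. AIX360's explainers are far more sophisticated; this sketch, with illustrative feature names and weights of my own choosing, only conveys the idea of attributing a single prediction to individual features.

```python
def local_contributions(weights, x):
    """Per-feature contribution to a linear model's score for one input x."""
    return {name: w * v for (name, w), v in zip(weights.items(), x)}

# A hypothetical linear scoring model and one data point (standardized features)
weights = {"income": 0.5, "debt": -0.8, "age": 0.1}
x = [2.0, 1.0, 3.0]

contrib = local_contributions(weights, x)
# Rank features by absolute influence on this particular prediction
ranked = sorted(contrib, key=lambda k: abs(contrib[k]), reverse=True)
print(ranked)  # ['income', 'debt', 'age']
```

The ranked list is a local explanation for this one data point; a different x would generally produce a different ranking, which is exactly why local explainers complement global ones.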
For example, one explainer uses contrastive techniques to explain model behavior in the vicinity of a target data point. It identifies the features which are most important for the prediction, as well as those least important for the prediction. It also displays the factors that influence a prediction in very simple terms. Finally, it provides explanations in terms of the top features that played a key role in the prediction. Again, if you want more details about the toolkit, please feel free to visit its page on the repository to learn more and to see how to use it. And the last one is the Adversarial Robustness Toolbox, a Python library for machine learning security, which we call ART for short. ART provides tools that enable developers and researchers to defend and evaluate machine learning models and applications against the adversarial threats of evasion, poisoning, extraction, and inference. ART supports all the popular machine learning frameworks like TensorFlow and Keras, all data types, and all machine learning tasks. So these are the three projects under LF AI Trusted AI that are currently being incubated. If you want to know more about these projects, and also the work being carried out under this committee, please join us at the online data science meetup happening on October 23rd. I have links in the presentation in case you want to join and learn more about these projects and the work. And here is something exciting: we have launched a YouTube series to walk through our projects as well as the committee's work. Please feel free to subscribe to the channel to keep yourself updated. There are a lot of new videos coming in, with experts talking about the toolkits, the projects, the workflows, the roadmap, what is currently happening in the committee, and so on.
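To illustrate the evasion threat that ART addresses, here is a hand-rolled fast-gradient-sign-style attack against a toy logistic regression model. The model, weights, and helper names are invented for illustration only; ART implements attacks and defenses of this family for real frameworks such as TensorFlow and Keras.

```python
# Toy sketch of an evasion attack (fast-gradient-sign style) against
# a hand-written logistic regression model. Illustrative only; ART
# provides real implementations of such attacks and their defenses.
import math

def predict(w, b, x):
    """Probability of class 1 under a logistic model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def fgsm_perturb(w, x, y_true, eps):
    """Step each feature by eps in the direction that increases the
    loss for the true label: the sign of the loss gradient w.r.t. x."""
    direction = 1.0 if y_true == 0 else -1.0
    return [xi + eps * direction * (1.0 if wi > 0 else -1.0)
            for wi, xi in zip(w, x)]

w, b = [2.0, -1.0], 0.0
x = [-0.1, 0.3]              # clean input, true label 0
p_clean = predict(w, b, x)   # below 0.5: correctly classified as 0

x_adv = fgsm_perturb(w, x, y_true=0, eps=0.5)
p_adv = predict(w, b, x_adv) # above 0.5: small perturbation flips it
```

A perturbation of only 0.5 per feature flips the classification, which is the core weakness evasion attacks exploit and robustness tooling is built to measure.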
So please feel free to subscribe and help us build — not just help us, I would say let's all build together — a trustworthy and transparent AI community. I hope all of this sounded super interesting and you're all excited. I'm sure you now want to be part of this discussion, right? You want to take part in building this community, and I have a way for you, as you can see on the screen. If you go to the wiki page of our LF AI Trusted AI Committee, you can follow the instructions under Mailing List and Meeting there. First, add yourself to our mailing list by following the instructions under Mailing List, and also subscribe to our calendar using the instructions provided under Meeting, so that you stay updated on the meeting schedule. The wiki page link is in the presentation. Feel free to subscribe and join us on our monthly calls. Next, there is a separate procedure if you want to be part of the Principles Working Group. We have a separate wiki page for the Principles Working Group; please contact Susan Malaika if you'd like to join. They have a detailed set of documents that explains the current work, the past work, and what they are planning to do in the future. It's a very interesting and informative set of documents, so please visit the wiki page to learn more about the Principles Working Group. And here is the Slack invite for three of our projects, in case you want to interact with the contributors, committers, and maintainers, and want to know how and where to use the toolkits. Feel free to join the Slack channels and start interacting with the other community members, as well as the committers, contributors, and maintainers of the projects. So we have come to the conclusion. I would like to close with these resources for you all to take back, so you can explore more about the committee and be part of this wonderful journey.
Thanks again for giving me this opportunity to connect with you all today, and I hope you have a great conference session. Thank you. Hello everyone, today I'm going to give a project update for Amundsen. First of all, let me give a bit of an intro about myself. My name is Tao. I'm a staff engineer on the Lyft Data Platform and Tools team. I'm an Apache Airflow PMC member and committer. At Lyft I'm working on different data products, including Amundsen, and also leading the data warehouse cost attribution effort. Previously I worked at LinkedIn and Oracle. So let me introduce Amundsen. What is Amundsen? In a nutshell, Amundsen is a data discovery and metadata platform for improving the productivity of data analysts, data scientists, and engineers when interacting with data. Amundsen is currently hosted at Linux Foundation AI, aka LF AI, as an incubation project with open governance and RFC processes. For those who are interested, I wrote a blog post about the whole experience. Amundsen allows you to search data. Here is the landing page for Amundsen. You can see there's a search bar allowing you to type any keyword, which will search over all kinds of data entities — currently data sets, users, and dashboards are supported. You can see all the tags, which help users group different data sets, and you can bookmark data sets as well. Lastly, it shows the popular tables that are used by most people in the organization. Searching for data sets: you can type any keyword and it will search the data sets, and the results are weighted based on data set usage. Once you find the data set you want, you click through and see its details — for example, the schema name, table name, and descriptions. We also support integration with Jira, which allows Amundsen to be a central portal for data quality issue reporting.
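The usage-weighted search just described can be sketched in a few lines. The real Amundsen search service is backed by Elasticsearch; the table names and usage counts below are made up for illustration.

```python
# Toy sketch of "results weighted by data set usage": rank keyword
# matches by a hypothetical query count. Amundsen's real search
# service delegates this to Elasticsearch.

tables = [
    {"name": "rides_daily",  "usage": 9500},
    {"name": "rides_raw",    "usage": 120},
    {"name": "driver_rides", "usage": 4300},
    {"name": "payments",     "usage": 8000},
]

def search(keyword, tables):
    """Return tables whose name contains the keyword, most-used first."""
    hits = [t for t in tables if keyword in t["name"]]
    return sorted(hits, key=lambda t: t["usage"], reverse=True)

results = search("rides", tables)
print([t["name"] for t in results])
# -> ['rides_daily', 'driver_rides', 'rides_raw']
```

Ranking by usage means the table most of the organization already trusts surfaces first, which is why popularity is such an effective relevance signal for data discovery.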
If any data quality Jira ticket is created, it links all the tickets there. The page shows the data range, last updated timestamp, tags, owner, and frequent users. It also supports arbitrary unstructured metadata via a service for programmatic descriptions. On the right-hand side you can see all the columns for the table, along with the column names, descriptions, and types. You can even request a description if one is not available. You can also see the GitHub source file, the preview, and the explore option. Once you click on a column, you can see its description in markdown format, as well as column statistics. You can see a data preview of the table. Lastly, for a data set, it can also show all the dashboards built using that data set. You can search dashboards as well: if I search for Amundsen, it lists all the dashboards that include the term Amundsen. Once you find the dashboard — a dashboard includes a lot of usage data and data sets — for example, in this one you see an analytics dashboard, its owner (me), when it was created, when it last ran successfully, the dashboard preview, and which tables it uses. You can also search for coworkers. For example, when I search for a veteran on the team, I can see his profile: what data sets he owns, what data sets he bookmarked, and what data sets he frequently uses. So here is the impact. Since launch, Amundsen has consistently had more than 700 weekly active users at Lyft — out of fewer than 5,000 total employees — and it has indexed more than 150k tables. In open source, it has seen huge adoption: our Slack community has more than 900 users, there are more than 150 companies in the community, and more than 20 companies are using it in production. Here is a bit of the landscape of our inclusive, diverse Amundsen community.
Amundsen is quite pluggable and can be extended for different use cases. Here are some short notes about how other companies use Amundsen. The first one uses it primarily for data discovery, integrated with their in-house data quality service as well as their analytics platform; they have a detailed blog post about it. ING built data discovery on top of Amundsen with Apache Atlas as the store. They care a lot about security, using Apache Atlas and Apache Ranger to support role-based access, and they contributed a lot of the Atlas integration to Amundsen. They also have a blog post about the whole integration. Workday built data discovery into their analytics platform named GOKU, with Amundsen as the landing page for GOKU. Square primarily uses Amundsen for security and compliance use cases; they contributed the Gremlin proxy and AWS Neptune integrations. Here are a few recent contributions from the community as well. Since we launched the dashboard entity, community members have contributed Redash, Tableau, Looker, and other dashboard integrations. What is on the project's roadmap for the next one or two quarters? In Q4 we are focused on supporting data lineage, working on the UX design for surfacing table lineage. The RFC on how to support data lineage is coming, and it allows people to push lineage in a generic manner. For ML features, 2021 Q1 will focus on surfacing ML features as a separate entity — for example, surfacing feature sets, surfacing features and their upstream data set lineage, and surfacing other metadata around ML features. Another theme is the metadata platform. Currently, to use the metadata, users primarily rely on the Amundsen front end.
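The table-lineage roadmap item can be pictured as a graph walk: given upstream edges between tables, answer "what does this table ultimately depend on?". The edges and table names below are hypothetical; Amundsen's actual design lives in its lineage RFC.

```python
# Toy sketch of the table-lineage idea: store upstream edges between
# tables and walk them transitively. Table names are invented; this
# is not Amundsen's actual data model.
from collections import deque

upstream = {
    "reports.weekly_kpis": ["core.trips", "core.payments"],
    "core.trips": ["raw.trip_events"],
    "core.payments": ["raw.payment_events"],
}

def all_upstream(table):
    """Breadth-first walk over upstream edges, returning every ancestor."""
    seen, queue = set(), deque([table])
    while queue:
        for parent in upstream.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

print(sorted(all_upstream("reports.weekly_kpis")))
```

The same walk run in the other direction answers the equally important question of which downstream reports break when a raw table changes.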
But in fact, there are many different services that want programmatic access via APIs to read and write the metadata. For example, we want to expose metadata to BI and SQL tooling: surface metadata while a user is composing SQL, automatically suggesting which tables they should join and on which columns, based on the metadata. We want to integrate with a data quality service so it can help surface data quality information. And we want to support hybrid pull-plus-push metadata ingestion — for example, building an SDK to push metadata to Amundsen through an API. Thank you. Hello, my name is Prasanth Pulavarthi. I'm one of the steering committee members for ONNX and also one of the co-founders. My day job at Microsoft is leading the AI frameworks team, which contributes extensively to ONNX and develops ONNX Runtime. Today I'll be sharing an update on ONNX, which is a graduated project of the LF AI. Let's start with a quick recap. While machine learning usage in production continues to grow, the reality is that there are still many challenges in taking ML models from research and development to production. There are many great frameworks to choose from; however, supporting and optimizing them on a variety of deployment targets is not easy — it requires a lot of work and takes a lot of time. This is where ONNX plays an important role. Having a standard model format allows data science teams to use their choice of tools while ensuring that their model can be represented in a common way that can be easily run and deployed. Having a standard model format also allows acceleration technology and tools to be developed against just one format instead of many, reaching a broader set of users. ONNX stands for Open Neural Network Exchange. ONNX defines the spec for representing models, and the spec supports both DNNs and traditional ML. ONNX has been a graduated project in the LF AI for almost a year now, thanks to its broad usage and support.
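The value of a standard model format, as described above, can be illustrated with a deliberately tiny stand-in — this is not the real ONNX format: an "exporter" writes a framework-neutral description of the computation, and any "runtime" that understands the format can execute it, regardless of which framework produced the model.

```python
# NOT real ONNX -- a minimal stand-in to show why a standard model
# format decouples training frameworks from deployment runtimes.
import json

def export_linear_model(weights, bias):
    """'Exporter': describe a linear model in a framework-neutral format."""
    return json.dumps({
        "format_version": 1,
        "graph": [
            {"op": "MatMul", "weights": weights},
            {"op": "Add", "bias": bias},
        ],
    })

def run(model_doc, x):
    """'Runtime': execute the neutral format with no knowledge of
    whichever framework trained the model."""
    out = x
    for node in json.loads(model_doc)["graph"]:
        if node["op"] == "MatMul":
            out = [sum(w * xi for w, xi in zip(row, out))
                   for row in node["weights"]]
        elif node["op"] == "Add":
            out = [o + b for o, b in zip(out, node["bias"])]
    return out

model = export_linear_model([[1.0, 2.0], [0.0, 1.0]], [0.5, -0.5])
print(run(model, [3.0, 4.0]))  # -> [11.5, 3.5]
```

Every runtime that understands the one format can execute every exported model, so accelerator vendors optimize one target instead of one per framework — the core argument the talk makes for ONNX.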
One of the strengths of ONNX is the community. As shown here, many companies support ONNX, and the list continues to grow as we regularly add new members. The great thing is that all these companies actually support ONNX in their products — it's not just a spec, it's actually implemented in shipping products. Here's a list of some of the products that support ONNX; you'll recognize many popular tools. The ones marked new have added ONNX support in the last six months or so. We're also happy to have Acumos, another LF AI graduated project, recently add support for ONNX, making it easy to create microservices for hosting ONNX models. In terms of metrics, we track some numbers to measure participation and usage. We expand these over time and have recently added new metrics for dependent repos and wheel downloads. We regularly hold community meetings to engage with our worldwide community. These forums are well attended and provide an opportunity to share updates about the project with the community, as well as to hear how the community is using ONNX. We very recently had one on October 14th, and you can see some of the presentations from a broad variety of organizations. All the presentations and recordings from this meetup are available on the website. So, as you saw, many companies and organizations are using ONNX. I'll briefly share our experience at Microsoft. At Microsoft, we use machine learning extensively in a wide variety of products — products and services that have significant scale and very demanding requirements. As our teams develop new models and seek to deploy them to production applications and services, they run into challenges. These are the same challenges that everyone across the industry faces. We have tight inference latency requirements. Our models are trained in Python but need to be deployed to production targets that don't support a Python interpreter.
We need to deploy to edge and IoT devices, which have many size and performance constraints. Sometimes the same model needs to be deployed on a diverse set of clients with different platforms and configurations. In several cases, we are building a platform that needs to support models provided by others in different formats. At Microsoft, teams have chosen to use ONNX and ONNX Runtime to solve these challenges of getting ML into production. ONNX Runtime is an open source engine for cross-platform accelerated machine learning. It supports the entire ONNX spec and is highly optimized. I'll share a few examples of how it's being used at Microsoft. Azure Cognitive Services makes use of ONNX and ONNX Runtime in scenarios ranging from computer vision to natural language processing to speech. For example, the speech service saw a 10x reduction in time to productize new models, in addition to a latency improvement. Due to the agility improvements, they were able to develop and deploy new models that increased accuracy as well. Some teams, like Azure Kinect, need to support deployment on a variety of edge devices ranging from Windows PCs to Linux-based IoT appliances. With ONNX Runtime, they're able to use the same model and the same APIs. Here we see ONNX Runtime running on a laptop and on small devices from Intel and NVIDIA. ONNX Runtime's extensibility mechanisms allow it to make use of the best acceleration available on each device, so the developer gets the benefits of a common software stack without having to compromise on performance. Windows ML is the API for Windows developers to integrate ML models into their applications and take advantage of hardware acceleration without having to worry about installing drivers or other toolkits. It uses ONNX as a standard format so that application developers can use the framework of their choice and get excellent performance on a variety of hardware devices.
One class of model that many people have started using, especially for NLP tasks, is the Transformer. Popular Transformers include BERT and GPT-2. These models yield excellent results but are very challenging to operationalize due to their size. ONNX supports these models, and ONNX Runtime delivers exceptional performance for them, as you can see in the charts here, taken from some recent blogs jointly authored by the ONNX Runtime team and Hugging Face, a company that specializes in Transformers. It's worth noting that ONNX Runtime is how teams at Microsoft operationalize their Transformer models. And if you thought inferencing Transformer models is hard, training them is even harder, which is why ONNX Runtime now supports accelerating the training of these models as well. This is available as a preview and is being used by teams at Microsoft as well as some of our customers. So I've talked about how many companies, including Microsoft, are using ONNX, so it's likely that you may already be using it too. But if you aren't already using ONNX, you can get started easily. First, the ONNX Model Zoo provides a variety of pre-trained models, most of them with detailed instructions on how to use the model. If pre-trained models are not sufficient, you can convert your own model to ONNX. Some frameworks, like PyTorch, have ONNX export built in; for other frameworks, there are tools that help you generate ONNX models. Once you have an ONNX model, you can run inference on it using an engine like ONNX Runtime. ONNX Runtime has a variety of language bindings; works on Linux, Windows, Mac, Android, and iOS; and integrates with many popular accelerators, including those from NVIDIA, Intel, AMD, and more. One quick way to get started is with the ONNX Docker container: you can follow the instructions on the website to pull down the container image and run through several example Jupyter notebooks. You've seen how ONNX is being used and how you can get started with it.
So what's next? Well, it's really up to the community. ONNX has open governance, and we invite everyone to participate. There are regular SIG and working group meetings, and technical discussions and code live on GitHub. We also have active channels on the LF AI Slack for announcements and discussions, and make sure to sign up for our mailing list. We use these forums to communicate about upcoming events like the community meetups, as well as technical discussions about the roadmap. In fact, we recently had several sessions to develop the roadmap for ONNX; we have a few more coming up, and we welcome you to join and share your input. Well, thanks so much for having me here. I'm very excited about ONNX and I hope you are too. I'll leave you with some key URLs to remember, and I'm happy to take any questions. Hi, everyone. My name is Bruce and I'm from the Angel project. Thanks to the OSS Summit organizers — it's a good opportunity today to introduce the recent progress of the Angel project. In this talk, I will first give an overview of the Angel system. Then I will share the features in the latest release, version 3.1, and the status of the Angel open source community. Lastly, I will preview the features in the coming new release. Angel is an open source machine learning framework. It can be summarized in four aspects. First, Angel has very high performance in model training, handling trillions of sparse feature dimensions, and it has been particularly optimized for advertising and recommendation scenarios, where it can be five times faster than other systems. Second, Angel is a full-stack system with a collection of more than 50 well-implemented algorithms, covering traditional machine learning, deep learning, graph embedding, and federated learning. Third, Angel is open source and graduated from the LF AI Foundation at the end of last year. So far, the Angel project has 11 sub-projects in total, has received 6,000 stars, and has been forked 1,500 times.
Finally, Angel has been widely deployed in production clusters and used in various businesses like advertising, financial services, social analysis, and so on. For example, there are 1,000 daily Angel tasks at Tencent. Angel has a full stack for the machine learning pipeline. More specifically, it can be easily integrated with other ETL tools, and it provides facilities from feature engineering and model training to model validation, plus auto machine learning for parameter tuning. Finally, Angel can serve the model for inference. On top of the basic framework, there are easy-to-use visual modeling, model management, and model serving platforms. There are two kinds of portals for end users: the Tesla portal internally, and TI-ONE on the cloud. Here is the system architecture of Angel. The bottom, fundamental layer is Angel's core math library. In the framework, the key module is the Angel parameter server, which stores the global model being updated during training. Besides the Angel native runtime, Angel is also integrated with other runtimes such as Spark, PyTorch, and so on. At the top level, Angel has auto machine learning and serving modules that can be accessed via interactive layers on the cloud. Recently, we released version 3.1. Here I show several key improvements and features. First, the parameter server has been improved significantly. On the server side, Angel supports automatic model sharding across multiple nodes and larger-scale models, as well as more optimizers. On the executor side, it can perform automatic training data parallelism and resource sharing among tasks. At the model level, Angel now provides user-customized parameter server functions. Meanwhile, Angel has both hash and range partitioners for different data cases. We have also made platform deployment of Angel easier: it can now be deployed in four ways — on a Hadoop cluster, in a Kubernetes environment, in a cloud-native manner, and as a microservice.
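The hash and range partitioners mentioned for the parameter server can be sketched as follows — illustrative only, not Angel's actual implementation: a range partitioner keeps contiguous parameter indices together on one server (good for dense models and batched fetches), while a hash partitioner scatters keys evenly (good for sparse, skewed feature IDs).

```python
# Toy sketch of two parameter-server partitioning strategies of the
# kind Angel offers. Details are invented for illustration.

def range_partition(key, n_params, n_servers):
    """Map contiguous blocks of the parameter space to each server."""
    block = -(-n_params // n_servers)  # ceiling division
    return key // block

def hash_partition(key, n_servers):
    """Scatter keys across servers regardless of locality."""
    return hash(key) % n_servers

# Neighbouring indices stay on one server under range partitioning,
# which makes fetching a dense slice of the model a single request.
print(range_partition(0, 1_000_000, 4), range_partition(1, 1_000_000, 4))
print(range_partition(999_999, 1_000_000, 4))  # last block -> server 3
```

The trade-off is load balance: with skewed sparse feature IDs, range partitioning can overload one server, which is why a hash partitioner exists for that data case.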
In this release, we also have a well-developed graph learning framework. In particular, we published a collection of well-implemented graph algorithms covering traditional graph learning, graph embedding, and graph deep learning. These algorithms can be used directly in production models by calling them with simple configurations. We also provide an operator API for graph manipulations, including building a graph and operating on vertices and edges. The graph learning algorithms have been widely used in a variety of applications: accuracy has improved while computing time has been reduced significantly. In the open source community, the numbers have increased in both contributions and committers. The project has received 6,000 stars and been forked 1,500 times, and the numbers of new committers and pull requests have increased since the beginning of this year. In the past months, we have organized four technical meetups, all held online due to COVID-19, and they attracted nearly 1,000 attendees each time. Angel also has close collaborations with other projects in the community. For example, we have installed the Acumos Clio release on our Tencent cloud node and onboarded three Angel models onto it. We also integrated Angel with an open source job scheduling and dashboard project named TRANS. This is a snapshot of job orchestration with TRANS, and this is a view of a history of hyperparameter tuning and its corresponding curve. In the coming new release, we will have a new module for federated learning. As shown in the figure on the right, two parties, A and B, have data with the same IDs but different features. For privacy reasons, we cannot physically merge the two data sets, but with the new federated learning technique, we can logically join the data sets to train a single model. Here we only exchange the global model parameters rather than the raw data, to preserve privacy, and all the communication is encrypted.
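The "exchange parameters, not raw data" idea can be sketched with a toy additive-masking scheme: each party masks its local update with random noise that cancels in the aggregate, so a coordinator learns only the average. Real federated systems use proper cryptography; this is only a stand-in for the encrypted exchange described in the talk, and all names and numbers are invented.

```python
# Toy sketch of privacy-preserving parameter aggregation between two
# parties: masks cancel pairwise, so the coordinator sees only the
# aggregate. Real federated learning uses actual cryptography.
import random

def mask_pair(update_a, update_b, rng):
    """Additive masks that cancel: A sends u_a + r, B sends u_b - r."""
    r = [rng.uniform(-1, 1) for _ in update_a]
    masked_a = [u + m for u, m in zip(update_a, r)]
    masked_b = [u - m for u, m in zip(update_b, r)]
    return masked_a, masked_b

def aggregate(masked_a, masked_b):
    """Coordinator averages what it receives; the masks cancel."""
    return [(x + y) / 2 for x, y in zip(masked_a, masked_b)]

rng = random.Random(0)
update_a = [0.2, -0.4]   # party A's local gradient (raw data stays local)
update_b = [0.6, 0.0]    # party B's local gradient
avg = aggregate(*mask_pair(update_a, update_b, rng))
print(avg)  # approximately [0.4, -0.2], the true average
```

Neither masked message reveals its party's update on its own, yet the model still trains on the joint signal — the same property the talk's encrypted parameter exchange is after.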
The federated learning module of Angel will be open sourced soon, with significant improvements in safety, performance, and usability, and it has been demonstrated in real applications. For example, the federated learning module was applied to financial anti-fraud, joining financial data and social data. A logistic regression model was jointly trained and showed an AUC improvement of up to 15% compared with a traditional local model. Okay, that is all for the recent updates of the Angel project, which is making continuous progress. If you have any questions about the Angel project, please send mail to me at this address. Thanks, bye. Hello, I'm Julien. I'm the project lead for Marquez and also the CTO and co-founder of Datakin. Today I'm going to give you an update on the project. To start with a quick agenda: I'm going to begin with a problem statement, then cover ongoing efforts, mention the OpenLineage effort we have going, and talk a little bit about our regular community meeting. First, the problem statement. Marquez came out of the need to create a healthy data ecosystem, and to create a healthy data ecosystem, you want to make sure teams are able to move independently, in an agile way, without creating friction. Usually, as an organization grows, more and more teams become responsible for their own transformations of data, their own consumption, and their own metrics, which creates a lot of friction around team interdependencies. One good way to fix that is to add better visibility into those dependencies and create explicit contracts. When it's about data, I've been describing this as Maslow's data hierarchy of needs. You know, Maslow's hierarchy of needs says that before looking for happiness, you need to have shelter, you need to have food, you need to be safe; then you can look into being happy and other things. The data hierarchy of needs works the same way: you first need your data to be available, then it needs to be fresh, and when it's fresh, you need to ensure it's of good quality.
Then you can look at how to use it to optimize your business or to grow new business opportunities. So this is really the underlying need: understanding whether data is available or not, whether it's on time or late, and what its quality is. This is done by collecting metadata, which is what Marquez is about. As for current ongoing efforts on the project, there is work on improving the SQL support — improving the SQL parsing, in particular, so that it works well for all the Postgres derivatives like Redshift and Snowflake, as well as Postgres itself. Marquez provides schema versioning: whenever a job runs, it keeps track of the new version of the schema and which version of the job changed it. Another ongoing effort is BigQuery support. The BigQuery API provides information through the job details — in particular, which data sets are being read from and which data set is being written to — and it enables capturing metadata like the query plan and the query profile, which are interesting pieces of information. There is also work on experimental Spark support; there's still work to do there. Basically, the way it works is by using a Java agent to inspect the logical plan as the job is running. Internally, Spark has a listener mechanism that lets you know when a job has started and finished, and what its logical plan and physical plan are, which lets us inspect the lineage and gather information about the logical plan and the query profile at the time it's running. This applies to all DataFrame-based jobs and Spark SQL, and there's also support for old-school RDD jobs. Another aspect of Marquez that is currently improving: there is some technical debt on the UI, and visual improvements are under way, both to the CSS and by revamping the current graph layout for the lineage. Those should bring drastic improvements to the current UI experience.
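The SQL-parsing work described above boils down to recovering input and output tables from statements. As a sketch — Marquez's real parser handles far more SQL than this regex toy, and the table names are invented:

```python
# Minimal sketch of SQL lineage extraction: pull the output table and
# input tables out of a simple INSERT ... SELECT. Marquez's actual
# parser is far more complete than this regex toy.
import re

def extract_lineage(sql):
    """Return (output_table, input_tables) for a simple INSERT...SELECT."""
    out = re.search(r"INSERT\s+INTO\s+([\w.]+)", sql, re.I).group(1)
    ins = re.findall(r"(?:FROM|JOIN)\s+([\w.]+)", sql, re.I)
    return out, sorted(set(ins))

sql = """
INSERT INTO analytics.daily_revenue
SELECT o.day, SUM(o.amount)
FROM core.orders o
JOIN core.currencies c ON o.currency = c.code
GROUP BY o.day
"""
print(extract_lineage(sql))
# -> ('analytics.daily_revenue', ['core.currencies', 'core.orders'])
```

Each extracted (inputs, output) pair becomes one edge in the lineage graph, which is what makes dialect coverage (Redshift, Snowflake, Postgres) matter so much for completeness.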
The last thing I wanted to talk about is our OpenLineage initiative. The goal is to standardize lineage and metadata collection, not just for Marquez but across the entire data ecosystem. The current problem today is that there's a lot of complexity. There are lots of different projects — Amundsen, which is also an LF AI project that we heard about today, DataHub, Marquez, Atlas, and others — that all need lineage from a bunch of different sources, whether those are things like pandas, or schedulers, or warehouses, or other SQL-type things. So there's a lot of duplication of effort, and a lot of catch-up work happens just to stay in sync with the way metadata is extracted from each of these sources. The effort we're starting is to have an open lineage standard, so that all the projects that use lineage can rely on a single standardized metadata spec. That way we can share the integration effort and avoid having to play catch-up, pushing changes into each project whenever an implementation changes. This is very similar to OpenTelemetry, in the sense that you have just an API that can be added as a simple dependency to all those projects to capture the lineage, and then it's up to the user which backend to use — you could write to a Marquez backend or an Amundsen backend, for example. That removes duplication of effort, and we also no longer have to play catch-up, because the standard becomes part of every project. As part of this lineage work, if we look at the delineation between Marquez and things like DataHub and Amundsen: those cover a broader spectrum than just lineage collection. If we take Marquez, for example, Marquez has storage, a UI, and also an exploration API, whereas OpenLineage is really about just the metadata and lineage collection, and you can then have multiple backends behind it. The last point I wanted to mention is that we now hold bi-weekly community meetings.
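A standardized lineage event of the kind OpenLineage proposes might look roughly like this. The field names here are illustrative, not the actual spec; the point is that every producer emits one common shape that any backend (Marquez, Amundsen, and so on) can consume.

```python
# Schematic sketch of a standardized lineage event in the spirit of
# the OpenLineage initiative. Field names are illustrative, not the
# real spec.
import json
from datetime import datetime, timezone

def run_event(event_type, job, inputs, outputs):
    """Build one lineage event for a job-run state transition."""
    return {
        "eventType": event_type,  # e.g. START or COMPLETE
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "job": {"namespace": "example", "name": job},
        "inputs": [{"namespace": "example", "name": n} for n in inputs],
        "outputs": [{"namespace": "example", "name": n} for n in outputs],
    }

event = run_event(
    "COMPLETE",
    "daily_revenue_job",
    inputs=["core.orders"],
    outputs=["analytics.daily_revenue"],
)
print(json.dumps(event, indent=2))
```

Because every scheduler or warehouse integration emits this one shape, a new backend only has to consume one format instead of writing (and maintaining) an extractor per source — the OpenTelemetry analogy from the talk.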
Everyone is invited. Notes are sent to the mailing list, as is the invitation. The list is reachable from the LF AI Foundation website, where you can find the Marquez technical-discuss mailing list, and it also has a calendar, so you can keep track of the invitation and the Zoom link. The next one is happening on October 29th at 10 a.m. Pacific time, and you're all welcome to join. The meeting can be used to ask questions, to bring up improvements that need to be made, or to discuss contributing to the project. Everybody is welcome, and we usually start with introductions of attendees and building the agenda. Thank you very much for your time. This was my Marquez update for October 2020. Have a great rest of your day. Bye-bye.