Hello everyone. I am delighted to welcome you today for a new virtual event co-hosted by the AI + Society Initiative at the University of Ottawa and the Berkman Klein Center for Internet & Society at Harvard University. For those who don't know me, my name is Florian Martin-Bariteau. This year I am a fellow at the Berkman Klein Center, and I am also an Associate Professor of Law and the University Research Chair in Technology and Society at the University of Ottawa, where I notably lead the AI + Society Initiative, for which we have hosted a series of conversations to discuss the ethical, legal and policy issues raised by the development and deployment of AI systems. The Berkman Klein Center, of course, has also been very active in this space and in fostering such conversations, and I'm really delighted that we're able to partner for today's event. So for our conversation today, I am delighted to be joined by Faith Majekolagbe, Michael Geist and Ruth Okediji to discuss copyright issues and recent developments around the world regarding text and data mining. To briefly present our speakers today: Dr. Faith Majekolagbe is a fellow at the Berkman Klein Center for Internet & Society at Harvard University and an Assistant Professor in the Faculty of Law at the University of Alberta. Dr. Michael Geist is the Canada Research Chair in Internet and E-commerce Law and a full professor in the Faculty of Law, Common Law Section, at the University of Ottawa. And you will also hear from Dr. Ruth L. Okediji, who is the Jeremiah Smith, Jr. Professor of Law at Harvard Law School and who is also co-director of the Berkman Klein Center for Internet & Society. We will start our event with short presentations by our speakers, and I will then help facilitate a conversation with our guests.
I invite attendees with us today to use the Q&A function of the webinar to ask any questions that you may have, and I will look at those questions and put them to our guests. So thank you again, everyone, for being with us today, and without further ado I will give the virtual floor to Professor Majekolagbe.
Thank you. Good day everyone, and thank you for coming to this event. I would like to start off by providing some context for today's discussion on copyright, text mining and AI training. In an age where the application of computer algorithms, large computing power and tools, and a high volume, velocity and variety of data generated in digital form collide, whether perfectly or not so perfectly, it's actually not surprising that computer scientists and technologists devised a way of applying computer algorithms to large data sets for data processing and the discovery of information and knowledge. In the last couple of years, text mining and AI training have become increasingly important, as many technological developments have been based on the automated analysis of text and data to generate valuable information and to make machines intelligent. In the EU copyright directive on the digital single market, text and data mining was defined broadly as any automated analytical technique aimed at analyzing text and data in digital form in order to generate information, and this information could include patterns, trends and correlations in the analyzed data. Text and data mining is thus really an activity that also occurs in AI training, because AI systems or machine learning models train on large data sets that are mined to generate information and knowledge that are used to make machines intelligent. You will agree with me that text mining has become much more pervasive in the era of big data and the internet of things, as digital systems generate, record and process data to facilitate interactions between interconnected devices and also now interactions between humans and machines.
Internet search engines and virtual assistants like Siri and Alexa mine data from the web and other services to provide outcomes to our search queries. Generative AI tools like ChatGPT are made possible by the availability and mining of the large data sets on which these language models are trained. Researchers, especially those in the digital humanities and biomedical sciences, are also now employing text and data mining techniques to analyze and process a wide range of data to derive information and insights that could be useful in solving key research challenges in the areas of health, agriculture, climate change and many more. Given the relevance of text mining and AI training to research and innovation, it is not surprising, again, that scholars and even policy makers are concerned about the potential of copyright law to stand in the way of these activities. Now, to the crux of our conversation today: how does copyright really interface with text mining and AI training? When text mining and AI training activities involve the use, in data sets, of works that are protected by copyright law, copyright may be implicated. Copyright law grants the copyright owner the exclusive rights to the reproduction, distribution, telecommunication and adaptation of the work. In essence, copyright law prohibits the doing of these acts without the authorization of the copyright owner, and TDM activities and AI training would necessarily involve the reproduction of the works that form the basis of the data sets to which these algorithms are applied to train AI systems or to derive information from large data sets. Put simply, there can be no TDM activity or AI training without the use of copyrighted materials, whether these materials are in the form of text, images, sounds or visuals, or a combination of any of these.
And the mere fact that these materials are readily available, that is, without a paywall, on the internet or in a library does not really obviate the need for permission, unless a negotiated license or an open license has been granted by the copyright owner, or a legal exception permitting the exploitation or use of the work without seeking permission exists in the relevant copyright law. Without a negotiated license, an open license, or an exception permitting the reproduction or even the adaptation of works for text and data mining and AI training purposes, copyright law could be implicated even if these works are readily accessible on the internet. And even for works that are readily accessible on the internet, there could be digital locks that scientists and technologists would have to circumvent to make reproductions and even adaptations possible, thereby again implicating anti-circumvention provisions in copyright laws. Now, whether the computational analysis and processing of digital or digitized content to generate and synthesize information is right or wrong, or even doubtful, the use of these computational techniques has become commonplace today, and people are quite amazed about the potential of AI. But countries are still grappling with how to react to this development in light of the use of copyrighted works for text and data mining and AI training purposes. Now, jurisdictions like the EU, Singapore, the UK and Japan have reacted to this copyright and text mining interface by making legislative changes in the form of text and data mining exceptions to their copyright laws. These exceptions either allow the use of works in text and data mining for both commercial and non-commercial purposes, provided the uses are non-expressive, or sometimes just allow the use of these works for non-commercial or research purposes.
And these are just a handful of the countries or jurisdictions that have reacted to the text and data mining and copyright interface. The majority of countries, however, are still struggling to decide whether the use of copyrighted works as inputs for text and data mining and AI training activities implicates, or should implicate, copyright laws. Now, part of this hesitance in deciding on this particular issue is not unconnected with the implications of any decision made by a country for innovation, especially the development of AI and other frontier technologies. Also, the volume of works involved in text and data mining and AI training activities makes it more challenging to decide whether obtaining permissions for the large range of copyrighted works involved, and for works that are already readily accessible, is feasible and even necessary. On the other hand, we also have concerns that the unlicensed use of works amounts to the theft of the intellectual labor of others, who are often neither acknowledged nor remunerated for the use of their works. Oftentimes it is in fact even impracticable to acknowledge the authors of the works that form the basis of text and data mining or AI training activities. This concern is even more heightened when AI training activities culminate in the development of generative AI tools like ChatGPT, because these tools could be commercialized and could even lead to the exclusion of those whose works form part of the training data for the tools. So on the one hand they are not being remunerated for the use of their works, and on the other hand they may even be excluded from the use of these tools because access barriers are being developed. Now the question remains whether copyright owners must be given control over, or remunerated for, all forms of exploitation of their works. This is a question that keeps coming up in copyright law.
Should we remunerate the copyright owner, or should we give control to the copyright owner, over every form of exploitation of their work? Or are there certain forms of exploitation that should be excluded from the scope of the exclusive control of the copyright owner in the best interest of the general public, and should the use of works for text mining and AI training activities fall within the scope of such acts or exploitations? Now, while this question has not been answered one way or the other in Canada in the context of text and data mining and AI training, there are suggestions that, to the extent that TDM and AI training activities involve readily accessible works on the internet or in libraries, the unlicensed use of works for such activities should be covered under extant copyright exceptions, especially when those activities do not lead to the communication or distribution of the whole or a substantial part of the copyrighted works that form the basis of the data corpus. Now, most commentators have relied on section 29 of the Canadian Copyright Act, which provides that fair dealing for the purpose of research, private study, education, parody or satire does not infringe copyright in Canada. However, whether reliance on this particular section can justify the unlicensed use of copyrighted works remains very unclear, especially because, unlike the United States, Canada has an exhaustive list model, which means that unless the fair dealing activity is for one of these purposes, or for the other purposes in section 29.1 and section 29.2, that is, criticism and news reporting, it could not be covered under the fair dealing provision in Canada.
And so whether every text and data mining activity or every AI training activity would necessarily be interpreted to amount to a research activity remains unclear, even in light of the Supreme Court's decisions that we should give a large and liberal interpretation to research and to the fair dealing purposes. It still remains unclear whether the Supreme Court of Canada would be willing to actually give such an expansive interpretation to the word research for the purpose of text mining and AI training, especially for those that are not involved in traditional research activities. Now, in 2021, Innovation, Science and Economic Development Canada (ISED) sought insights into copyright and related issues surrounding text and data mining and the use of copyrighted works as training data for the development of AI systems. While the consultation period has since ended, the copyright issues are yet to be resolved by means of legislative intervention, as there appears to be a desire to promote AI and text and data mining on the one hand, and to remunerate authors for works used in the course of these activities on the other. Now, whether both can be achieved in Canada remains to be seen, but the relevant question is: how do we ensure that copyright law does not stand as a barrier to innovation in Canada? In the context of African countries, copyright issues arising from text and data mining and AI training are also beginning to be raised, notably in the recent proposal by the African Group to the Standing Committee on Copyright and Related Rights (SCCR) at the World Intellectual Property Organization, which was adopted in March 2023. The African Group actually urged the SCCR to consider facilitating a discussion and exchange of views on limitations and exceptions for text and data mining research.
Now, the inclusion of limitations and exceptions for text and data mining research in this proposal suggests a desire by African countries to ensure that copyright does not actually stand in the way of text and data mining activities in the region. But the restriction to just this aspect of text and data mining activity, text and data mining research, actually reflects a cautious hesitation among African countries to dabble in the waters of exceptions and limitations for text and data mining and AI training activities. Now, African countries are interested in the potential of TDM activities for solving some of the key challenges of the continent and also for advancing research and innovation on the continent. But they are also concerned about the possibility of knowledge being mined and used in the creation of technological and knowledge goods that may eventually be inaccessible to them. And these concerns are very valid, not only in the face of colonialism, but also in the face of the continued misappropriation and misuse of cultural heritage, traditional knowledge and genetic resources that have been obtained from local communities in African countries. So African countries have valid reasons to be concerned about the implications of text and data mining and AI training that rely on text and data generated within the region, used for technological activities outside of the region, and used in the development of technological tools that are inaccessible to people within the region. And yet they are also interested in what can be done in terms of copyright norm-setting to further text and data mining in the region, perhaps in ways that address their concerns. So, even though African countries have these concerns, they also appreciate the benefit and the importance of text and data mining activities to research and innovation on the continent.
So, in today's conversation with Professors Okediji and Geist, and as ChatGPT and other generative AI tools rage on, I hope we will be able to discuss whether our copyright laws should try to catch up. Thank you, everyone.
Thank you so much, Faith, for this presentation and for explaining a bit of all the legislative forces at stake and what's going on around the world. So, maybe I will invite Professor Geist to offer some comments for five to ten minutes.
Yeah, I'd be happy to do that. Thanks, Florian, and thanks, Faith, for a really great presentation. I'm going to focus on just expanding on a few things, especially from a Canadian perspective, but I want to start by saying that this feels like a really important moment when it comes to this issue. In some ways, it feels to me almost like this generation's digital locks or anti-circumvention rules, where we ended up with almost 20 years' worth of debate around what felt like paracopyright and legal protections for digital technologies. And we saw how that unfolded both at WIPO and then in many other countries, in the DMCA in the United States and in Canada with the Copyright Modernization Act, and it has played out in many different places. And it seems to me that we are starting, I think, to see some of the same kinds of issues begin to emerge. And this will, I think, have similar kinds of stakes, perhaps even bigger stakes, if we think about the impact that AI is likely to have.
There has been, at least from my perspective, a pretty significant shift in the discussion, even around things like text and data mining, or, as it was described at one point in time in Canada, an IA exception, an information analysis exception. It has shifted from one that was viewed as seeking to facilitate AI, where there was a great deal of excitement about what AI would bring and concern that copyright could create a significant barrier to AI achieving what many were hoping for, to what I think is now a case where some are looking to copyright to stop some of the AI development. Rather than seeking to embrace it, one gets the sense that there is an increasing amount of fear about where AI is headed. We've seen, quite literally, some AI leaders talking about finding ways to put a stop to, or at least a temporary halt on, some of the development that we're seeing. And one gets the sense that copyright may be used as a cudgel here to try to facilitate some of those barriers. And I think that's a pretty significant source of concern. I'll turn to some of my specific views, I guess, once we get into a bit of the dialogue, but I want to highlight a little bit of how this discussion, I think, may be evolving, at least in Canada. Faith mentioned the consultation around text and data mining in Canada. I think it's fair to say that, given the way the Supreme Court of Canada has interpreted fair dealing consistently now for the better part of two decades, there's a pretty strong case that many of the kinds of uses that we see would still be covered by fair dealing.
We've seen a full change in the court in terms of its composition, and yet the principles that the Canadian Supreme Court has adopted around users' rights and around a flexible interpretation of fair dealing, and the way they have embodied that in a number of different cases, looking at some of the purposes that Faith identified, suggest to me that there's going to be at least a pretty reasonable case that many of the kinds of uses that might arise would be covered by fair dealing today. That said, consider those engaged in the business of developing these tools, either at a research level or, even more, at a commercialization level, and in Canada we've started to see much more of an emphasis on the commercialization side. I put out a regular, almost weekly podcast, and my guest this week is Aidan Gomez, who is the CEO and co-founder of Cohere. It's a Canadian AI company. There's a sizable Google investment as part of it. Aidan was involved in the "T" in ChatGPT, in developing transformers. He comes, I think, with a great deal of credibility in terms of where things have been moving, and that emphasis on shifting away from what was once, from a Canadian perspective, leadership on the research side of AI to leadership on the commercialization side means that there will be more attention on what the legal frameworks look like. And it seems to me that if you're looking to invest millions, hundreds of millions, even billions of dollars, as we have started to see in some of these cases, it's not good enough, in many instances, for some of those investors to say, well, we think that the Supreme Court might come up with a pretty reasonable interpretation of fair dealing. That becomes a bet-the-company kind of issue, potentially. And so there will be many who will be looking for greater certainty.
The consultation that Faith made reference to was part of a series of consultations that the government has launched around copyright. For those that aren't familiar with our long processes associated with this, we spent the better part of a decade battling over WIPO implementation, leading up to reforms that took place in 2012. And over the last decade, there have been some piecemeal changes, some of them driven by trade agreements; for example, we extended our term of copyright due to the USMCA, or CUSMA. So we've seen a bit of that, but there have been fewer of the larger kinds of reforms. This was put forward as one of them. But I have to say the cynic in me would suggest that if we look at some of the outcomes of those consultations and the willingness of the government to pay much attention to them, it's not clear to me that they're paying much attention, quite frankly. That certainly was the case with respect to copyright term extension. With respect to the text and data mining and AI consultation, the information analysis exception consultation, you had a wide range of perspectives. You certainly had a strong case made for the need for this kind of exception to provide that additional level of certainty. But I don't know that we've seen, at least so far, a significant amount of uptake within government. It's possible that this will become a priority, but we haven't been given, I don't think, a strong signal that this is something that we're going to get. And if it does come, it may come as part of a larger package that may raise its own set of issues in terms of where the government wants to head. They didn't spend much time mentioning it, but I do want to mention that there is another piece of AI regulation that is currently being discussed in Canada.
And I think the way in which it has begun to evolve actually highlights the likelihood that we will see an acceleration of this issue, although not necessarily one that seeks to facilitate greater AI development and use, but rather one that potentially sees this as a mechanism to impede some of that AI development, or at least to allow some people to cautiously hit the brakes a little bit. That bill is Bill C-27 in Canada, and it's a bill that has elements of privacy. In fact, it is billed primarily as a privacy reform bill, but there is separately a section that deals with AI regulation. And it's a bill that has not moved quickly, although that seems about to change. And it seems as if a big part of that is in fact the AI side of the equation. It's a piece of legislation whose AI provisions, at least, have not received uniform applause. In fact, up until a few days ago, they hadn't received much applause at all, quite frankly, with many suggesting that the bill itself was more virtue signaling than actual regulation. It doesn't provide much in the way of specifics. And there is a long timeline for regulations and for actually providing a bit of meat on the bone, so to speak, in terms of specifics. But I think in light of some of the kinds of campaigns we've started to see around ChatGPT, and that includes, I think, some of the copyright elements, it's clear the government's paying attention. Just last week, the responsible minister, François-Philippe Champagne, held a meeting of his AI task force. And as part of that, we've started to see, just in the last number of days, growing calls to move very, very quickly. I've talked to people who say the government may try to push this through by the summer. And that's a piece of legislation that I don't think is really worthy of pushing through. There is a need, I think, for some significant reform to it.
But I put all of this on the table before getting into some of the substantive questions, which I hope we have the chance to do as part of the upcoming discussion, just to highlight that the relevance of this, and the speed with which, at least in Canada, governments are beginning to turn their attention to this and are willing to regulate, I think demands our attention for those that are interested in and concerned with some of these issues. And in many ways the script has been flipped a little bit. The role of copyright here may ultimately be, at least in the view of some pushing forward on these regulations, less about seeking to facilitate AI and more about looking for ways to constrain some of its development, with copyright being identified as one of those means. I'll stop there and look forward to some of the additional discussions we'll have.
Perfect. Thank you so much, Michael. And I will try to stay in my role of facilitator regarding AIDA and not offer too many comments. So without further ado, I will hand it over to Ruth Okediji for her comments.
Thank you so much to Faith and Florian and everyone in the audience for this webinar and the opportunity, I think, to gather and really delve into some of the challenges that we're facing. I agree with Michael that this does feel like a moment in which the choices that we make will shape both the nature of the policy outcomes that we're likely to get and also the way that the subject matter of copyright in particular is likely to evolve in the face of these large language models and generative AI. I want to just say a number of things, because I think the conversation is going to be very important in helping us think through this. First, whenever we hit a technological frontier, there are always the first movers. And the assumption in fields other than law is that first movers have first-mover advantage, that they can shape the outcome of the debate.
They can select what values or norms we are likely going to advance and that they can begin to make it hard for other countries in this sense to do something different because there's always a pull towards harmonization. We saw this, of course, in the harmonization battles in the 1990s. We saw them in the digital locks battles that Michael has already alluded to and I suspect that we're going to see them here. So we already have, as Faith noted, we already have early movers, the UK, the EU, Singapore, really taking statutory measures to facilitate text and data mining. Then we have slow movers, but fast watchers, which is where I would put the US. In other words, the US watches what everyone else is doing, rarely moves legislatively as a first option and tends to allow these issues to percolate through the courts. Now, that leads me to my first major point and that is the nature of copyright lawmaking. And while the system that we have in place, which involves a serious exercise in political economy, often requires bargains between competing interests and rarely indulges the careful reflection and slow deliberation that may be required, while that model may work in our favor, as Michael has alluded, for those who may be interested more in restraining the rapid development of AI, I just want to say that this model of copyright lawmaking is fundamentally flawed. And I don't think that the fact that we may benefit from it in one instance over another ought to cause us to overlook that we are dealing with a legal discipline that has deeply divided policy goals and deeply conflicted normative objectives. And we need to always ask the question, are we addressing the problem with the right tool? Is copyright law the appropriate place to have the battle over access to text and data mining? 
So that's the first question I want to put on the table: that making copyright law in a manner that facilitates decision making without a clear sense of what the tradeoffs are and where the technology might lead us may still not be the wisest option. It was not the wisest option when we handled duration. It was not the wisest option when we got the WIPO Copyright Treaty in its initial stages. It is not the wisest option, certainly, as we are now looking at TDM and AI. So what does that mean? If this is the system of lawmaking that has now gripped not only the US but Canada and Australia and West African countries and countries all over the world, copyright lawmaking is a contest about what we fundamentally think the right innovation policy ought to be. We have turned copyright, rightly or wrongly, into a measure of innovation. And so the battle over whether we do innovation through exceptions and limitations or through the exclusive rights remains very present, even in an era in which access to publicly available information is really what is generating the need for attention to text and data mining. So the first thing, of course, is that text and data mining with works that are in the public domain presents no problem. You don't need an exception. You don't need to fight at the policy table, because the work is in the public domain. And so we have a tendency to ignore the old battles in favor of the new ones. But may I suggest that duration remains a problem? If the tension around TDM is over whether I get to freely access a work to do my research or have to seek permission, then it tells us that the length of copyright duration remains a huge question, which affects the scope of the rights that we are willing to give and also fundamentally affects the limitations and exceptions that we need.
If we had shorter copyright duration, there would be many more works injected into the public domain. If we had even tiered duration, where not everything is subject to the exact same uber-long duration, we might actually have a more robust set of works in the commons on which text and data mining researchers can operate. So I want to be sure that we are not indirectly and unintentionally reframing the problem: we have a fundamental problem with the duration of copyright. And we need to think about what this means in an era of rapidly changing creative options, rapidly changing creative mechanisms, and rapidly changing innovation frontiers that require large inputs of data in order to actually do the work of advancing the progress of science and the useful arts and enhancing human flourishing. That's the first thing. The second concern that this highlights for me is the limits of limitations and exceptions. Every new possibility of scientific progress requires data. We are unavoidably, unchangeably and irrevocably in a data-driven economy, a data-driven culture, a data-driven production system. If data is the basic input, and we recognize property rights in data, and then we build copyright interests on top of property rights in data, and then the only way we give keys to unlock the data is through limitations and exceptions, it means that for every innovative turn that requires access to data, we are dependent on this broken system of copyright lawmaking. We've got to go back to the Canadian Parliament. We've got to go back to Congress. We've got to do all of these things. In the case of the United States, as many of you know, the Authors Alliance was able to get a new exemption under section 1201 of the DMCA to enable text and data mining research on e-books and films.
When you think about the transaction costs of getting these permissions to do the work that copyright was designed to do, to enhance knowledge, to promote progress, to build the commons, if every battle requires an exception and that's the modality to which we become accustomed, then the very project of copyright law, I think, is imperiled. We have to be thinking, then, perhaps of models that include unfair competition, models that include fundamental rights to reverse engineer or to engage in cultural interchange, things that allow us to participate as a society in the data economy and to promote the advancement of research and progress in the useful arts. So I think we need to ask ourselves whether exceptions and limitations are becoming the default mode in which we advance the very goals of copyright, and whether that is the kind of environment in which we think that text and data mining will most flourish. Remember, of course, that when we're talking about text and data mining we're talking about large-scale computational analysis. This is an exception that needs to endure for the rest of our foreseeable future, because we're not going back on the way in which our economy is built on data for purposes of medical research, of scientific advancement, of learning and understanding new bodies of knowledge. There's no going back. So consider a section 1201 DMCA exemption that lasts for a certain number of years and for which we have to re-petition, or legislation that is drafted fairly narrowly and very quickly outlives its utility for scientific research. The question is: are we really being penny wise and pound foolish?
I want to challenge this, and I want us to think about this as we engage with policymakers and lawmakers about what makes sense for the kind of economy that is built on large data sets, for which the inputs may or may not be subject to copyright protection. Clearly we have challenges with inputs and we have challenges with outputs, and it strikes me that limitations and exceptions work very differently if you're thinking about an input versus an output in the context of AI training, for example. So that's a second major point that I want to make. The third point that I want to make, on top of duration, concerns one of the challenges we have faced with text and data mining. Consider the EU exception. In a paper that Thomas Margoni and Martin Kretschmer have written, they point out the inherent biases in the way in which the EU text and data mining exception exists. Only certain individuals get to exercise that exception. In a world in which we are often concerned about discrimination and downstream innovation, and about ensuring that small and local, small and medium enterprises are able to compete in this market, the idea that we would distinguish in an inartful way who can use this exception and who can't already tells us that the exceptions and limitations mode, in addition to being not the best way to do policy, also has an inherent potential to be discriminatory. So who's at the table? Who gets to use the exception? Under what circumstances is the exception in fact allowable? And that then, of course, brings me to the fourth point, about fair use. So, to Michael's point, and we're seeing that fair use is flexible: fair use certainly allows us to engage in uses with a zero royalty rate.
In other words, it's at your risk. And I think obviously fair use is, in a way, a policy tool that gets its greatest effectiveness because it doesn't operate as a property rule; in many ways it's a liability rule. You essentially go in, you use it, and then you figure out after the fact whether or not it is legal. My question is whether, given the promise of text and data mining for large language models, given the promise of AI, we want to rest on the uncertainty that pervades the more flexible approaches to limitations and exceptions. So as much as I am a fan of fair use, and we of course have the Warhol case before us, I question whether, even in the context of TDM, what we want is the uncertainty and the instability that can permeate ad hoc, case-by-case determinations of whether or not the exception is in fact allowable in a particular context. My sense is that, as a matter of innovation policy, this is likely not the sustainable outcome that we might want. My last point really is the question of how we might reimagine a text and data mining exception that isn't a victim of the instability and the uncertainty that marks fair use, that isn't subject to repeated efforts of renewal in the DMCA-style manner, and that isn't subject to narrow definitions in a way that actually limits who can use it and who can access it. So one possibility is: should we be thinking about a sui generis regime? As opposed to exceptions and limitations, as opposed to fair use, as opposed to DMCA rulemaking authority, should we be thinking about something fundamentally different that allows the experimentation, that precludes the discrimination, and that facilitates the kind of innovation that AI and text and data mining practices suggest is right at our very fingertips? And I'll stop there.
Perfect, thank you so much for those thoughtful comments. We have a lot of questions from the audience, and I also took so many notes while the three of you were speaking. But maybe first I would like to offer Faith the opportunity to reply to those comments or offer some more thinking based on what you just heard. Do you want me to pick a random question? No, no, no: if you have comments and thoughts about what Ruth and Michael just said, and then I will ask a few questions. Thank you for those. I mean, I hadn't quite thought of the fact that what's really happening now, especially in view of the situation in Canada, is like a reversal of the role of copyright law: using copyright law to restrain development in AI. And that's interesting, because what that would do is lead to the expansion of the exclusive rights that a copyright owner has over the exploitation of copyright works. And again, it takes us to a place where we end up with a system that is more rigid than the system we had when it comes to discussions relating to copyright, AI and data mining. I also want to refer to Professor Okediji's points relating to the limitations or the restrictions of exceptions and limitations as tools for facilitating access to data sets for large language modeling and other AI and text and data mining related activities.
I agree that we need to start thinking about sui generis systems, and that we need to be thinking about the limitations of exceptions and limitations, particularly because most exceptions and limitations facilitate use but do not necessarily facilitate copying, and text and data mining activities necessarily involve the making of copies that are being mined or that are being used to train AI systems. Also, copyright laws, or intellectual property laws, have so far tried to be kind of neutral when they necessarily should not be so. In the context of, say, discrimination or AI bias, if we rely on just copyright exceptions and limitations, they are really neutral to all of the bias and discrimination attached to the use of inadequate data sets or data sets that are not widely representative. So I agree that in dealing with the issues relating to AI we need to look beyond copyright law, and perhaps we ask Congress to do much more than it is capable of doing. I also want to refer to both Professor Geist's and Professor Okediji's comments relating to fair use and fair dealing provisions, which offer flexibility, and on which we could say that, in the US or in Canada, we could rely on fair use or fair dealing to justify these AI activities. But businesses or persons engaging in these activities for commercial purposes require a lot of certainty to actually go into these projects. Even researchers who are going to be engaging in multimillion-dollar projects want a level of certainty going into these projects, rather than just relying on an after-the-fact application of fair dealing or fair use provisions. So I agree that we need more certainty regarding whether we can actually engage in text and data mining activities, whether we can actually reproduce these works, whether we can actually circumvent these digital locks, rather than wait until, say, the Supreme Court decides on whether we can do it. And the Supreme Court would make that decision within a particular
context that may not, again, provide as much generalization for the use of these works beyond that context. So I'll stop there and allow the questions. Perfect, thank you so much, Faith. There have been a few questions, I think also picking up on what Professor Okediji said. So yes, fair dealing in Canada, fair use in the US, might serve as a defense, but we don't know, and we will not know until both Supreme Courts, I think, can offer an interpretation. So what should we do? There is a need for harmonization, of course, also to support innovation; we cannot have conflicting, different approaches. This is why, over the years, especially for copyright, trademark and patent, there has been a global harmonization. But if copyright is not the right forum to develop such a tool, which one would it be? Or is it too complex? Because, you know, the EU started, and Ruth, the paper that you mentioned I thought was very interesting in picking up on so many of the issues with the EU framework. So should we go this way and build maybe a better exception, with a more refined definition of text and data mining, or do we need to go more, as you say, toward unfair competition or otherwise? What could it look like? I know it's kind of asking the three of you to take out your crystal ball, but maybe, to get the conversation going: if you think we need to have such an exemption, or at least to support unlicensed use of copyrighted works for TDM, what would be the best way, if you were advising those governments? Michael, since I just finished speaking, why don't you start, and then I'll jump in after you. Okay, I'll do that. Well, I'm going to give a bit of a non-answer answer, I guess, which is to say that I feel a little bit like the house is about to be burning, and we're going to
miss this if we're spending our time focused primarily on how we get greater certainty through the text and data mining exception or some other exception. I think Ruth's comments about the framing as an exception are really apt, but I think it's more than that. I think there is a big push right now for more certainty, but it's the opposite kind of certainty. The certainty that we are starting to see a push for is a certainty of payment, and I think that creates some significant risks. We see it in news, for example, where the same lobbying that took place by Rupert Murdoch in Australia to obtain payments for linking to news, and that we're seeing play out in Bill C-18 here in Canada, is now playing out in AI, arguing that if you're using my news as part of your data set, you need to pay. The Authors Guild is looking for changes to the law that will basically stop, similarly, the use of works within these large language models without some form of compensation. So I think we are rapidly moving to a world in which the battle, or at least the political discourse around legal reform, is less around how we facilitate this by providing legal certainty to allow this to take place, and more around: well, if you are going to do this, you're going to have to pay. And it's not that there's an absolute aversion to payment. I think that this is a complex issue, more complex than, say, the search side, where we could identify clear societal benefits in terms of enhancing access to information and reasons to ensure that that activity could continue. But I fear that the kinds of concerns that we have with AI will be exacerbated by the kinds of lobbying that we're seeing right now. So if what we see are fewer data sets used, because the standard becomes "unless you pay, it doesn't go into your data set," the concerns we have around bias, where only certain kinds of information get included, are going to be exacerbated. We're going
to have less information, fewer ways to teach these systems, and what will be in there is the stuff that either they can afford or, more likely, stuff from people who will want to influence outcomes, potentially misinformation-type outcomes and otherwise, who say, "No, no, you can go ahead and use our stuff, no payment required." And what we end up with is, I think, risks of more misinformation coming out of these systems. We also run a risk, I think, of creating a less competitive environment if it's an environment where we say you've got to pay. We can get into how you even do this, when basically you're talking about some sort of global collective in which everybody's content is entitled to some sort of compensation. I don't really see how any of that is really workable, but we do know there will be groups that say, "Hey, we need compensation." If it is only the Googles and the Microsofts and a handful of other companies that are in a position to say, "We actually don't mind this, because we can use our economic power to ensure our future dominance in this space, because others simply won't be able to enter into this space, they won't have access to the large language models," I think it would be a terrible outcome both for the development of the technology and for ensuring greater competition. So it's not a direct answer, but I do think that the debate around legal certainty for the activities that we're talking about is playing out; it's just not playing out in quite the way that we, I think, envisioned a few years ago. I would just jump in. I've been looking at some of the questions, and I think they all revolve around what we do now, imperfectly, while we figure out what we might do in the long term, more perfectly. So I want to echo some of Michael's concerns that this is playing out now, and there are certain things we ought not to do. We may not know exactly what the best framework is, and with
innovation we rarely do, but we certainly know that there are certain things we should not bake into the system. So let me just make my priors pretty clear. If we fundamentally believe that text and data mining is something that adds value to the economy in which we live, that it is necessary for the production of goods and services and the production of knowledge that will enhance human flourishing, and that it is important to permit and facilitate access at a level that will give us the greatest outputs possible (in other words, an argument that mirrors our argument for intellectual property), then there's no question that we need to recognize it not as an exception, not as something that is exceptional, that under certain circumstances you can do for so long, by so many people. Rather, we need to recognize it as a limit to copyright. That framework, I think, is very different. I think of recognizing that copyright law was not designed to permit and to grant a monopoly over basic building blocks, patterns and practices for which we all have a consensus that our societal advancement requires access at an optimal level. We should not be doing it one paper cut at a time, and that is the issue, because every time we fight over exceptions, it's not the principle of the exception. Okay, we recognize it. It's actually the contours: how do you design the exception, who gets access to it, how long does it last, under what circumstances can you use it? And all of a sudden, what should have been the response as a policy or normative matter actually becomes part of the problem, and that, I think, we absolutely have to avoid. I also want to say that whatever we do with a regime that recognizes TDM as a limit to copyright, not just as a mere exception, even that's not enough. We still need to have ways, for example, in which we address contracts. What if access to the works requires you to agree that you're not going to in fact conduct the kind of large-scale computational analysis for which you
want access to the work? What do you do with the private ordering of access to the data commons? What do you do with that? And so the idea that we need a public policy that drives the kinds of legal options and legal tools that facilitate access in order for TDM to occur is my larger point: we're going to need not just one tool but a number of tools, and we're going to need a policy framework that says that what we're doing is creating an environment in which innovation can meaningfully advance, building on existing knowledge that is out there. For example, we did this with the idea-expression distinction: nowhere in the world do we allow a copyright owner to obtain monopoly control over an idea, even if it's wrapped up in a contract. Same thing with the limits to the distribution right: if you purchase your physical copy of a book, we don't allow the copyright owner to control what you do with that book in the privacy of your home. Same thing with the need for privacy rights: the reason we had some of the exceptions in the Berne Convention was to protect the right to privacy, research being in many ways part of an expression of the privacy right. So there is a collage of mechanisms that need to work in meaningful balance in order to create an environment in which TDM is operating, so that we can get the benefit of the advancement in scientific research and knowledge that we so, I think, fervently want to see. Perfect, thank you so much. I really like the distinction between an exception and a limitation to copyright. It also speaks to the interests at stake, and all the issues about access, as you mentioned, are not new in copyright. In academia we know this way too well with the licenses for databases. One of the attendees was mentioning in the chat that even documents that are in the public domain are not publicly accessible, and
universities are paying billions in license fees to have access to those documents. And Michael, as you were mentioning, it's also an issue of bias and misinformation, because, yes, biased data, often data from the North, or misinformation, is pretty accessible, while quality data for research, etc., is often behind a paywall. It's already an issue, but it's going to be an even bigger issue if we cannot train LLMs on those. But Faith, did you want to also maybe react on this? I want to make sure that we have all the perspectives and opinions in the conversation. Thank you. Actually, I wanted to go back to a point that I made in the presentation, and to get Ruth's and Michael's thoughts on it, which is something I've seen with African countries. African countries are interested in text and data mining activities, but they are also cautious regarding what the implications of text and data mining activities could be, especially if it involves mining data that comes from the continent to build technological tools that are not accessible to the continent. So how do you think African countries can address that concern with copyright law, given that this kind of research is happening within the context of copyright law? In saying, "Okay, we're going to allow text and data mining activities," what can we do within the context of copyright law to ensure that we also reap the benefits of text and data mining activities, and that we're not in turn excluded from the benefits, from the good that could arise from text and data mining activities? I think you raise a very good point about access to knowledge and data, and also about the powers at play. Someone in the Q&A was also explaining that some big corporations who have access, such as those managing catalogs for universities, can develop
digital locks to impede people's ability to do text and data mining in the catalog. You have, as Ruth was mentioning, contracts where, even though the use would maybe be fair use under the copyright framework, it would still be an infringement of the contract and the terms and conditions, so you still cannot do it. And during that time there are also those big corporations in the North who could just access all of the knowledge and extract knowledge from the majority world, and we've seen those issues before with traditional knowledge and Indigenous knowledge. So I think this is also something to keep in mind, making sure there is a global conversation on this. Do you think that maybe we need to have a new WIPO treaty, or should it be a WTO treaty? Or are we done with treaties, and do we need more of a collaboration between states? Is there room for international collaboration? I know that the three of you have been very involved in international negotiations and conversations, so I would love to have your thoughts on this.
In terms of an international treaty, I don't see us getting an international treaty on text and data mining right now, and certainly not at the pace at which we really need a response to the text and data mining issue. When the issue of text and data mining came up at the last SCCR, the African Group wanted to make it a point in the work program that would be addressed by the SCCR, but there were objections even to just having conversations relating to limitations and exceptions for text and data mining activities, so they had to concede to using the word "may": that we may have conversations relating to text and data mining activities. So even starting that conversation at the international level has already been contentious, let alone reaching any consensus that would produce an international treaty on that subject. WIPO is still in the process of pursuing an international treaty on exceptions and limitations for education and library uses, which are issues that have confronted copyright law for decades now. And so the issue of text and data mining may not get as much attention, and even if it does, it may not get it as quickly as we want or need at the international level, whether at WIPO or at the WTO. If I can follow up on that, I think that was a really good intervention. I must admit, I think the fallback invariably is to talk about this from a global perspective, and it's obvious that there's a need to take that into account. There was a recent letter published by a number of AI pioneers and others calling for this sort of halt at GPT-4 for now, and one of the responses that we've seen, particularly from people in the sector, is to say that if all you do is stop what takes place within democracies, or within the countries that are willing to go along, and basically cede the field to certain other countries that may not be
willing to play along, that may not be as democratic, that raises some real concerns about the evolution globally of where AI heads. So I think there is a need for these global discussions. I guess part of my question would be, and this kind of builds on Ruth's point about the frustration of framing so much of this around an exception, more broadly whether or not we should be looking at these issues through primarily a copyright lens, which, if we did this at WIPO, presumably is what we would be doing. And while I know, of course, that TDM is exactly that, to me this issue is so inextricably linked to the other elements when we talk about AI policy, whether that's around some of the biases that can arise and the human rights related issues that come out of AI, or the competition related issues. There are just so many factors that come into a discussion around AI regulation that using copyright as the prism for trying to address some of these issues strikes me as probably pretty problematic, not just because I think that the user and public interest sometimes gets short shrift in some of these fora when it comes to copyright, despite the best efforts of people like Ruth and others to try to ensure that those views are well represented. So I do think there is an opportunity now. Quite frankly, I think greater certainty is the sort of thing that could be traded as part of an effort to try to develop these systems with greater algorithmic transparency, stronger commitments around bias, around the kinds of things that we want to see to create some of the safeguards around AI. And part of the value exchange within that regulatory framework might well be greater certainty around access to the large language models. But we need to be looking at these issues, I think, in a somewhat more holistic fashion, so that we bring in many of these different issues,
and I'm not sure that we've got an ideal forum at this stage to deal with those questions, but it seems to me that's where the discussion has to go. Let me just say really quickly, because I know our time is wrapping up here: one thing to think about is that AI is a general-use technology, and you always regulate general-use technologies quite differently. Think of the radio, for example, and think of the way we have handled platforms. So when I mentioned sui generis treatment, it's because we need to understand this data economy as functioning in a very different way, or having the potential to function in a very different way, if we regulate it differently. AI policy is not going to be one policy; it's going to be multiple policies. AI in medicine, AI in law, AI in copyright, AI in art, AI in road construction: it will look very different, the modalities for liability will look different, the incentive structures will look different. My challenge is that we need to be asking what we can do differently so that we don't replicate the problems that we've had with the DMCA, with duration, with monopoly, with misuse of these technologies that are governed by multiple legal regimes. Wendy Gordon and I were talking recently, and she mentioned something that I've been thinking about, and that is that just saying something is in the public domain isn't enough, because you can always recapture it with technological protection measures, with contracts, with all sorts of different things. And so we need to think about even the public domain: what does it look like to say that something is in the public domain when you have a technology that has the potential to capture the public domain and essentially regenerate it so that it's no longer the public domain? So our policy priorities need to be clear, and I don't think we need to look for a uniform policy to address the possibilities that AI presents and the challenges that we have to overcome with it. Perfect, thank
you, thank you so much. You mentioned also the issue of what the legal framework and the regime are for the outputs of those systems, especially if we allow a full free-for-all for text and data mining; that is another question. But maybe as a last question to the three of you, because we've been discussing many interests at stake, issues of power, payments, money, etc. If we imagine building a new framework, one that would be more about data, not just copyright, not just privacy, but about data as a limitation to the copyright framework, maybe to the privacy framework, etc., then when we're discussing barriers to innovation, protecting the authors, protecting the privacy of people, so many things are at stake. Who should be around the table, then? I know it's a very complex question, and I'm sorry for this, but in a perfect world, let's dream for maybe a few seconds to end this conversation on a lovely note: how could we make this happen, if you had the power to make this happen? I mean, I would say pretty much everyone should be at the table. Authors of works that are included in data sets should be at the table. People generating data sets should be at the table. People whose data is being used or included in data sets, which is like you and me, everyone, should be represented at the table. Users of data sets should be represented at the table. Persons who would be affected by the use of such data sets should be represented at the table. And technologists should be represented at the table, because everyone is affected by all of these issues. The perspectives are valuable, the perspectives are important, and they should be represented at the table. Ruth, Michael? I agree. I mean, seriously, that's right. It's clear these issues have such broad applicability and touch on so many issues. As noted, AI for health is different than
AI in some other spaces, but all of this is part of that kind of dialogue that you're searching for. It seems to me part of the starting point is to identify where it is that we want to go with this. It helps, if you're trying to set out on a journey, in this case a policy journey, to know what the goal is, and I'm not sure that we've come to grips at this stage fully with that. I know I have my own view, which is a somewhat more optimistic take on AI: I think this is truly transformative in some incredibly exciting ways, and our challenge is to develop some of the safeguards and guardrails to ensure that we go in eyes open, but recognizing that there are some real opportunities there. If that's the driving or animating force behind this, well, then we start having conversations around policies that allow the development of these technologies in a manner that preserves equity, addresses bias and the like. But let's recognize that not everybody would come to that proverbial large global table with that same objective. I think part of the discussion needs to be around where we are headed here, recognizing that between sectors there may be different perspectives, and between countries there may be different perspectives, depending on where they even sit in the development of AI and whether or not they see more direct, say, economic benefit or other societal benefits, or what they see as risks in being left behind, and therefore some of their priorities are going to differ. Since I have the last word, I'm just going to say two things. One is that I don't think there's one table, and I think we should avoid that mistake.
I'm not sure that copyright lawyers should be speaking to the application of AI technologies in the medical space, and what that regulation should look like, and what it should look like in the global south versus the global west. I think there are multiple tables, and what I'm hopeful we will find are opportunities for even non-interested parties to be at the table. We start designing regulation with this fundamental flaw, consistently in IP, and that is to assume that the innovator always stays the innovator and the user always stays the user. But we know that, number one, they're often one and the same, and that they switch in a dynamic fashion. And so it's vital for all of us to ensure that the faces at the table represent different positions within the ecosystem of innovation and creativity. So I think there are many tables, and I think we should be intentional about creating many tables and ensuring that we're getting even the adversarial perspective right, so that we're making sure that the policies we make are policies that are robust.
The last comment I would make is that I also think that we need an institution. AI is a general-purpose technology, and text and data mining and the things that we want to encourage in order to generate more data are all about fueling the production of these productive assets. So my view is you need something like the Food and Drug Administration, an agency that is responsible for advancing the policy dialogue but ensuring that the outputs are safe enough, not without risk, but safe enough that they don't undo our democratic, political and legal intuitions about what it means to live in a society that is flourishing. So I don't think it's just negotiations between interested parties and everyone whose perspective matters. I think we do need an institutional anchor that can help facilitate the work that needs to be ongoing about what makes for a productive society in which data and text mining and artificial intelligence and computing models all function in a way that ensures that our basic civil, political and legal virtues are kept and preserved.
Perfect, thank you so much to the three of you for this amazing conversation. I know that people have also been very engaged online, so thank you to the participants for all the questions. I hope we've been able to unpack and maybe clarify a few points for you all. I'm sure you're ending this session with maybe more questions, but I hope also with some answers and a new perspective. So thank you all. Because people have been asking in the chat, I will just confirm that yes, this was recorded. We might do a few edits for some of the technical glitches, and then the video will be made available on YouTube and we'll send it to all of you. A few attendees have been asking about additional readings, some of the scholarship that the three of you have published on the topic, so we will try to create a kind of reference list for people who want to learn more about this, and we will send it to everyone who signed up for the event. So again, thank you so much, Faith, Michael and Ruth, for joining us today. I think this was a fantastic conversation, and I'm sure it's just the first in a series of global conversations on the topic. So thank you again, and have a lovely rest of the day. Bye-bye. Thank you so much. Thank you. Thank you, everyone. Bye-bye.