Did you want to do any introductory announcement? Thanks, I'll let you take it. All right. Can I get a verbal confirmation on my screen share? Yes. Awesome. Welcome, everybody, to today's session, part of our webinar series through Accessible Technology Services. My name is Dan Comden. I'll be joined later by Gaby DeYoung and Terrell Thompson. What I'm talking about today is alternatives to PDF, and a subtitle or alternate title might be "anything but PDF." So we'll be making the case for the problems that are inherent with PDF and then also looking at solutions. I learned a long time ago that you can't just show up with problems. You have to have solutions as well. I want to acknowledge my sons, who often make fun of me and some of my approaches to technology. Even though I've been working with it for a long time, I will acknowledge that there is a certain age component to some of the things that I'm saying. But I think in this particular instance, what we're talking about is applicable to all age groups. For those who can't see, there's a meme on the screen right now of an old painting: somebody working at a loom, with a couple of younger kids working below them. The person at the loom is saying, "The industrial age is ruining the country." And the child is saying, "Okay, loomer." So we'll talk a little bit about documents. What is a document? I think everybody has an idea of what it is. The Wikipedia page has a pretty solid definition, but I want to highlight the electronic matter: "a piece of written, printed, or electronic matter that provides information or evidence," and it goes on from there. So we're going to talk about, of course, electronic matter and the different kinds of electronic documents that we work with in this online world. So we've got Word files and, of course, PDFs. We've got plain text and almost plain text, which are the TXT and RTF formats, and PowerPoint files. EPUB is an up-and-coming format.
And then, of course, there are all kinds of other proprietary file formats. We're going to concentrate primarily on HTML and why HTML sort of rules this space, and we're going to ignore audio and video documents for today. So I really want to concentrate on the types of content that we consume online, and primarily that is text. Text is still the ruler of the digital information space. We also have images, which can be maps, technical drawings, charts, graphs, and so on. And then we also have STEM content, where we're dealing with things like equations and formulas and statistical information. All of this, of course, can be combined in the formats that we talked about. What I want to talk about today is sort of a hierarchy of the accessible learning tools that are available to us. We're in higher education, so we're talking primarily about learning or sharing information. And so what I want to talk about is this Hierarchy of Accessible Learning Tools Promoting Document Functionality, which I call HALT PDF for short. That, I guess, could be another alternate title for today's presentation. For those of us who work in the accessibility space, PDFs are a persistent, ongoing irritation, problem, challenge, what have you, to everyone who's involved with creating them, fixing them, or consuming them. So this is based on my experience and observations over 30 years of working with different file formats, talking about things that are inherently accessible. This is assuming that these file formats are properly done. HTML really is the best structured. Microsoft Word files are second. Below that we get things like RTF, which has a little bit of structure, but really not much, or plain text, which has no structure. And then below, way below that, we've got things like PowerPoint and PDF. And again, this is based on our observations of electronic documents, not only in the Canvas platform, but on web platforms in general.
So PDF is an acronym that stands for Portable Document Format. I've seen and come up with some other ideas for what those letters can actually mean. "Pretty dumb file" is one. "Probably doesn't flow" is another. I think the most accurate descriptor of what PDF is, though, is "print description format." It really is designed for documents that are going to make their way to a piece of paper. And a piece of paper is a very different thing from a screen or a monitor, or even an audio experience through a screen reader. PDFs really are made for print, and they're good at that. So don't get me wrong. I don't think PDFs are useless. I think they do have a place, but for the most part I don't think they have a place online. So we're going to look at some numbers. These are based on recent and historical interactions with our friends over in the Disability Resources for Students (DRS) office. They have quite a large team involved with fixing PDF documents, making them accessible for the students with disabilities that they serve. And I wanted to call them out for working with us on generating some of this information. I will say that over the past year, of course, everything has been different for everybody, and it's been different for them in their office as well. So some of the numbers from the last 12 months are out of sync with what we've observed over the last 15 years or so. But some of the numbers are the same, such as the number of students that they're working with. They're working with just over 1,600 students who are receiving services. I'll also point out that, based on research and survey responses that we've seen here and at other institutions, we know that count is low by perhaps as much as 50%. So we might be dealing with well over 3,000 students with disabilities on our campus. But we'll go with what's official for now.
So 1,600 are registered, and approximately 10% of those over the last year are requesting document remediation. And again, that number is down. But so is the number of students in general at the University of Washington, which I just found out is also down over the past year. So again, these recent numbers are a bit skewed due to the COVID-19 situation. But I've got some information from prior years, and we have no reason to believe that the trend has changed apart from the situation of students not being on campus. So when I say the prior year, I mean from summer 2020 to spring 2021, which just wrapped up. In that year there were over 2,000 requests for document remediation through DRS, and the average number of pages is about 20. We break things down into pages because, as we'll find out later from Gaby, looking per page is more important than looking per file. So just over the past year, we're looking at over 40,000 pages, and by far most of those are PDF files. Keep that number in mind as we carry through on this. Going back to some numbers provided a couple of years ago, we see that there was growth in requests for document remediation. I'm not going to read all of these tables here, but I will say they show that there was growth. One of the first things that we look at for PDFs and whether or not they're accessible is something called text selectability. So can you, as a sighted person, use your mouse and highlight individual characters within the document? If you click on the page and the whole page becomes selected, then we know that the document is completely inaccessible, because it's just a picture of text and not actually text. The numbers skew a little bit for this, based on our information from six or seven years ago, but we still see a significant number of documents that are entirely inaccessible. And then, going past text selectability, we want to look to see whether the document has structure.
So whether there are tags or bookmarks that have been inserted into the document. And we see that those numbers are very, very low. So even documents that are somewhat text selectable really don't have any structure. Over two thirds are just a giant gob of text. And there's no reason for us to think that has changed. Again, PDFs are expensive. And a lot of times the cost of PDFs is really not borne by the individual or department that is producing them. Somebody has to fix those. If a student with a print disability needs to be able to use their assistive technology tools to listen to the text within a PDF, somebody has to fix that file. Right now that's viewed as an accommodation by the Disability Resources for Students office. They're the ones that are doing this work. So essentially they're taking products that were made by other individuals and departments on campus and fixing them. And I think we're going to maybe have a whole side discussion on whether or not that should be the responsibility of a single entity on our campus. I would argue that the answer to that is probably not. Remediation of these files can really vary quite a lot. If it's a simple document with not very many pages, we're looking at about a minute per page to remediate, to fix, to make sure that the text is selectable and detectable and also has some structure to it. The moment we start dealing with more complex things like tables or images or math or science, that number quickly climbs. So the number of minutes per page to remediate can really go up. So looking at our 40,000 pages over the last year, if we multiply that out by the other Research 1 (R1) schools, of which there are 130 in the United States, we're looking at over 5 million PDF pages that are getting fixed every year nationwide. That's just the R1 schools. So keep that number in mind when Gaby talks about remediation costs. So over 5 million.
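The back-of-the-envelope estimate above can be written out as a short Python sketch. It uses only the figures quoted in the talk and assumes, as the speaker does, that every R1 school sees roughly the same volume as this one campus. The variable names are illustrative, and the labor figure is a lower bound that applies the "about a minute per page" rate for simple documents to every page.

```python
# Figures quoted in the talk; treating every school as identical is an assumption.
requests_per_year = 2_000       # "over 2,000 requests" through DRS
avg_pages_per_request = 20      # "the average number of pages is about 20"
r1_schools = 130                # R1 count as stated in the talk

pages_one_campus = requests_per_year * avg_pages_per_request
pages_nationwide = pages_one_campus * r1_schools

# Lower bound on labor: "about a minute per page" for simple documents only.
minutes_per_simple_page = 1
hours_one_campus = pages_one_campus * minutes_per_simple_page / 60

print(f"Pages on one campus per year: {pages_one_campus:,}")
print(f"Pages across R1 schools per year: {pages_nationwide:,}")
print(f"Minimum remediation hours on one campus: {hours_one_campus:,.0f}")
```

Because complex pages with tables, images, or math take longer than a minute each, the real labor figure would be higher than the last line suggests.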
The number of public universities and colleges, two-year and four-year schools, in the United States is a little over 1,600. And the total number of higher education institutions, as recognized by the National Center for Education Statistics, is nearly 4,000. So we could go even further out, but of course enrollment is not uniform across those. Pardon me. So we've got these costs to fix PDFs, and one of the sort of hidden costs is that you need to have special software. I like this image of an Adobe Acrobat reader that I found a while ago, which is an image of a gymnast holding a book and reading. That's your acrobat reader. That tool will not fix PDFs. If you're going to fix them, you have to have Acrobat Pro software, and that software costs money. The Acrobat Reader software is free, but not the software that will fix things. So then you're looking at either doing it yourself or outsourcing it, either on campus or off campus. That's really the question, but really the cost is time. Time. We've got a grumpy cat on the screen: "Time is something I don't have for you." We're all pressed for time. I would argue that pushing all of that time off onto a single acrobat reader is perhaps not a reasonable approach to dealing with the PDF problem. PDFs also pose a risk when it comes to the Office for Civil Rights (OCR) or the Department of Justice (DOJ). Inaccessible documents are a common theme in complaints about accessibility, and they figure in many of the complaints that OCR and DOJ receive. One of the first things they do when they're evaluating a university is go online and look at the online presence of that school. They're going to look at everything, and that everything does include PDF files. So we don't want that risk. Let's look at it from another direction. Let's not look at everything as a threat or risk. Let's look at just making the experience better.
How do we read our content digitally? We're doing it on screens, right? We're doing it in front of a computer. And that computer could be like the computer I have here in my home office, but we're finding increasingly that the computers students use all the time, and prefer to use, are their handheld computers, their mobile devices, and laptops. What is on the screen now are a couple of photographs. One is an image of the New York Times newspaper on a tiny little screen. And yes, many young people do have good vision and can read that small text. That ability doesn't usually stay with age, and it's also a challenge to read for just about everybody. A lot of students also use laptops, and not all those laptops have big screens to view this either. And we'll talk a little bit about some of the user costs of dealing with PDF files. When you have to disrupt your browsing session to open a PDF file that contains primarily text, you've just lost all of your navigation. You've lost your browser experience by having to open up that other piece of software, so that navigation goes away. It's not impossible for a user to bookmark information in a PDF file, but it's very challenging, and it's not a feature that's built into the PDF experience. And that is by design. That file format is not made to be editable. Even editing text with the powerful Acrobat Pro software is not an easy experience. Adobe's just not done it. The PDF format really started appearing, I think, in the early '90s, so it has been around for nearly 30 years, and Adobe still hasn't provided that functionality. Adobe is the company that was initially responsible for the PDF format. I will point out that a lot of people think it's proprietary. It no longer is. It's been an open format since 2008. But the primary tools that are used to deal with PDF files still come from Adobe.
As an aside, the Microsoft Word for Windows format is only about five years older, but it inherently has quite a bit more capability as far as accessibility. So again, going back to the slide, the user costs. No bookmarks, no navigation, unless the file creator has inserted the navigation in there in the form of what are called PDF bookmarks, which are not the same thing as user bookmarks. Getting text out of a PDF may or may not work. It really depends on how that PDF was created. Again, if it is an image of text, it's very difficult to get text out of it on the user side of things. In my mind, one of the biggest, most serious problems with PDF files is that the text doesn't reflow. So if you're looking at it on a smaller screen, you end up having to do a lot of horizontal scrolling, and it's very, very difficult to do that and retain your place in the document. I think, if nothing else, that lack of reflow really is the stake through the heart of the PDF file format. And also, a lot of people don't understand the situation of students with disabilities. People say, "Oh, all you need is this extra reading software." Well, students with disabilities often are already using extra software, and we're just layering them up with either browser plugins or additional applications. They already need that. In the case of students with visual impairments, they're using magnification or they're using screen reading. Many students with other print disabilities who do have vision are using TTS, or text-to-speech, software. So that's a lot to ask, to pile the cost of the PDF file on top of that. All that said, when students are asked what file format they want a fixed file in, they're often still asking for PDF files. They don't know about other ways to get information, and so they're still asking for PDFs. And so it is a question of training for these students.
And a lot of them will say, "I know what I like." I would counter that with, "They like what they know." And all of us have a tendency to do this. It's not just a disability-related thing. It can be hard or even feel a little painful to make a change. But we need to be thinking about what the best experience would be. So we've got a rule from ARIA. For those who have been to some of our other webinars, ARIA is a tool in the HTML world for creating Accessible Rich Internet Applications. And the first rule of ARIA is don't use ARIA when you don't have to, that is, when there already exists an HTML element that will do the job. And I would like to extend that and have it also be the first rule of PDF: don't use PDF when HTML will work. Jakob Nielsen is a well-known researcher in the usability space. In an article going back, I think, to 1996, he talks about his first law of computer documentation, which is that users don't read computer documentation. And then the corollary to that is that if they do read it, they're not reading it front to back. They're looking for specific things. And doing that in a PDF file can be very, very difficult. Nobody really reads a manual like you would a book. So we do have examples of places where we can do user documentation. I'm going to point one out. Let's see here. Stand by for just a moment. I want to make sure I've got this guide up. There we go. So what's on the screen now is user documentation for Workday. Early on, before Workday was deployed on campus, they initially wanted to do the user documentation in PDF files, and we convinced them that that was not the way to go. And so what we've been able to do is put this documentation in a place that's easy to find and easy to use. The text flows well, and it works well for everybody. So it's entirely possible to do this, and to do it successfully. And that's in the user guides in the Integrated Service Center.
So for those who are going to pick up this PowerPoint after the presentation, I've got a couple of links here for further reading or discussion. This first one was written just last year by the Nielsen Norman Group, so it's not stale. The one below it, which talks about PDFs being a strange, otherworldly, out-of-browser experience, is a bit older, but I think it's still relevant. And then another recent document, which talks about things more from a commercial, search engine optimization viewpoint, is this one titled "Why PDFs are mostly awful and what's the alternative." So how do we publish better documents? Well, part of it, I think, is getting some better training out there, which is part of what we're doing today. Another part is making the tools to fix existing PDFs better. We've got some good ones. The ability to export a good PDF is a well-known part of Microsoft Word, so we need to make sure that people are using it. Do we want to institute policies to discourage use of PDF? I don't have answers for that, but I think it's something for all of us to think about. Really, we want to talk more with our members of faculty, because they're the ones that are putting a lot of the educational materials online, and anybody who is supporting staff in the Canvas environment would do well to do the same. Terrell's going to go into some detail on creating good Canvas-based content a little bit later on. So all this said, you get the impression I don't care for PDFs, and you would be correct, but there is a place for them, and I didn't want to sell them completely short. If there is a document that should be printed, it's a great format for that. It really is great for something that's going to be printed. So that's things like posters and brochures, or a PDF file for anything that requires an actual physical signature.
So it's going to get printed out, it's going to get signed, maybe returned in person or by mail. PDF is an appropriate format for that. And of course there are some "official," and I use that word in quotes, or legal documents that are produced in PDF because there are expectations that these official documents look a certain way. And I'd like us all to push back on that, because I have been told that some of these forms are official when there's really no official reason for them to be official. It's just what people know and what they've been using. So I want to encourage folks to rethink what a document is and how we consume information now versus maybe 20 years ago. When we're coming up with a report or a brochure or any kind of information, we want to design it from the start to be viewed on a screen. And we really want to get beyond this "paper think" idea that just seems ingrained in so many people, young people as well. It's not an old-versus-young issue. A lot of that is how they're taught: what is this thing going to look like when it gets printed? Well, a lot of this stuff is never going to get printed. It's only going to be consumed on a screen. And so we want to get rid of this idea of a page metaphor for our documents, because the size of the page is not relevant. Viewing it on a handheld device versus, you know, the situation I have here, where I've got nearly three feet of screens, those are two very different things. The reading experience is very different. So what is a page? The visual style of our information is not more important than the content in that text. I think it's really important for us to keep that in mind. So I'm going to stop sharing now. We're going to hear from Gaby, who's going to talk about how to get information out of our PDFs, as well as a little more information on what it costs to do remediation outside of the DRS office. Go for it, Gaby. All right. Thanks, Dan.
Take a moment here. Share my screen. Okay. Everybody can see my screen? Yes. Excellent. So thanks, Dan. My name is Gaby DeYoung. I'm also a member of the IT accessibility team. When we were prepping for this, we were trying to figure out different solutions for, you know, what could we use instead of PDF? And so we thought, well, we can convert PDF to different formats. I'm going to talk about converting PDF to Word, and Terrell's going to talk about converting PDF to HTML. One of the biggest misconceptions about PDF is that it can't be edited or changed. But it's actually quite easy to take the information in a PDF document and edit or change it, just by exporting it into a different format. So that kind of blows that theory out of the water. I wanted to give you a little bit of information about the methodology for turning a PDF document into Word, and then I'll give you a little bit more information about what I found when I did that. Essentially, I just did a Google search for the term "convert PDF to Word for free," took the top four search results, and ran a PDF through each of them to see what the output is. So I would run the file through the PDF to Word converter, and then I'd open it back up again in Word and run the accessibility checker to see if there were any errors. And then I also performed a manual review of the styles to see what we can compare. The original document that I used for this conversion process is a completely tagged PDF. The image has alt text. The title is an H1, and the other supporting headings are tagged as H2. Lists are tagged as lists. The table has a column header. And the document is identified as English, with one sentence at the end that is identified as French. So the first tool that came up during the Google search is the Adobe Convert PDF to Word. And all of these solutions are web based.
I didn't want to download anything on my computer. I didn't want to give anybody my credit card information for a seven-day free trial or anything like that. I just wanted to quickly find a solution for taking a PDF document and converting it to a Word document. And so Adobe Convert PDF to Word was the first item that came up, and it's completely web based. For all of these, it's just a matter of taking your file and dropping it into the web browser. But Adobe did actually require that I create an Adobe account in order to use the service. It's still free. You just have to sign up with your email or something like that so that Adobe can send you annoying messages about buying products. So I did that. And then I opened the output in Microsoft Word and ran the accessibility checker. In the inspection results, I got a warning, and this is true for all of the outputs, to check the reading order of the tables. So even though the table was created in the PDF with a column header, the table header did not stick when it was converted back to Word. The H1 was actually marked as a title. And I've included a screenshot in here of the styles guide, which gives you a visual representation of the structure of this document. So you can see there that the heading level one was actually marked as a title, and the heading level twos were marked as heading level ones, which can be kind of confusing. Lists were still marked as lists. The document title was still intact. And the French language section there was also marked as English. The second option that came up was simply PDF. And this was completely free. I didn't have to put in my email or anything like that. So I could just drag the file into the browser, and then it performed its conversion process. And then I downloaded it and ran the accessibility check.
And again, it gave me the inspection result of a warning that I needed to check the reading order of the tables and make sure that the column headers were marked. Again, the same thing: the H1 was marked as a title, and the H2s were marked as heading one. So kind of a similar output to what Adobe had as well. The third search result was Free PDF Convert. This is another one that's free and does not require any sign-up or email or anything. When I ran the accessibility checker, the inspection results again gave me the warning for the table. And this time, and this is really interesting, the H1, instead of being marked as a title, was marked as normal text. But the H2s were still marked as heading one, lists were marked as lists, and the document title was intact, but everything was marked as English language. And then the fourth method was a product called PDF to DOCX. Again, this is another free one. And this produced probably the worst results out of all of them. The accessibility checker came up with the warning for the tables, but then it also came up with a warning for missing alt text for the image, and the image was not in line with the rest of the text either. And as you can see from this particular screenshot, all of the content for this output was marked as normal text. So essentially we have no structure for this particular output. That would be the least desirable of all of the outputs. So, conclusions. You might want to convert your PDF document to a Word document, and there are many reasons to do so, one of them being that accessible PDF is only really supported in the Windows environment, not so much in the Mac environment. If so, the conclusion would be to use the PDF to Word converter from Adobe Acrobat. But if you didn't want to sign up and get constant annoyances from Adobe, you could use simply PDF, as it does maintain most of the structure.
But then, of course, you'll need to touch up your tables, as those table headers are not converting. I wanted to share a little bit more information about the cost of remediation. I'm actually in the middle of a pretty big project right now. We are working on making Canvas model courses available to UW so people can review them and see what a model Canvas course looks like. We're in the process of taking these Canvas model courses and making the content accessible. And I wanted to share one of the courses with you so you could get a better idea of all of the work and the cost associated with retroactively making a Canvas course accessible. This particular Canvas course has an accessibility score of 57%. And you can see that there are 523 elements associated with this course: about 184 PDF documents, about 57 Word documents, and some other items there. And you can see that out of all of these documents, there are about 189 that have a very low score and do require remediation. I believe most of those are PDF. So I want to break this down a little bit more for you. For this particular course, there are about 54 Word documents, which equal 311 pages. For PowerPoint, there were 87 decks, which included 1,918 slides. For Excel, there were 36 workbooks with 61 worksheets. For PDF, there were 182 documents for a total of 2,424 pages. Now, we have a contract with a PDF remediation vendor called Open Access Technologies, and I sent all these documents to the service for a quote. We actually have a standard quote of $8 per page for remediation. But because there are so many documents, we got an even bigger discount, a volume discount of $6 per page for remediating all of these documents. And some of the PDFs were quizzes, and they had some form elements, or they should have had some form elements, in order to be used accurately.
And so there was an additional hourly cost for adding tool tips and form fields to some of these documents. And that was 62 and three-quarter hours at $25 an hour to add that additional information. So the total cost just for remediation of all of these documents is $29,852. And it took me about eight and a half hours to go through this Canvas course and audit all of the files, to pull them down off of the Canvas course, collect them, and then put them in a shared folder so that I can share them with our remediation service. That's a lot of my administrative time just to collect these documents. And I'm not done yet. I haven't received the completed files from the remediation service, so I still have to replace them in the Canvas course. So there's additional time that needs to be included there. So it does really add up in terms of monetary cost and time for retroactively making a Canvas course accessible and remediating PDF documents. Now, had this instructor thought about accessibility from the beginning and created accessible content while putting this course together, it may have taken a little bit longer, admittedly, to get this course together, but it would probably have saved the university a lot of money and a lot of time had they put the effort up front into making the content accessible. So that's pretty much all I wanted to share with you. And I'm going to go ahead and turn it over to Terrell. Yes, Gaby, just to clarify, that was one course of several, right? I've forgotten how many were in the set, somewhere from eight to ten. Yeah, I think there are actually nine courses. Yeah, nine courses. And you're seeing similar numbers in the other courses too. So this is not just an isolated incident. It's a pretty common trend in our online courses. So we've got just a little over 10 minutes left, close to 15 minutes left.
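The remediation total quoted for the model course can be reproduced from the figures given in the talk. This is a minimal Python sketch under two assumptions that the talk does not state outright: that slides and worksheets were billed at the same per-page rate as document pages, and that the extra form work came to 62.75 hours. The dictionary keys and variable names are illustrative only.

```python
from decimal import Decimal

# Page-equivalent counts quoted for the one Canvas course.
# Assumption: slides and worksheets are billed like pages.
units = {
    "Word pages": 311,
    "PowerPoint slides": 1_918,
    "Excel worksheets": 61,
    "PDF pages": 2_424,
}

rate_per_page = Decimal("6")    # volume-discounted rate, dollars per page
form_hours = Decimal("62.75")   # assumed reading of the quoted hours
hourly_rate = Decimal("25")     # dollars per hour for tool tips and form fields

total_units = sum(units.values())
page_cost = total_units * rate_per_page
form_cost = form_hours * hourly_rate
total_cost = page_cost + form_cost

print(f"Billable pages, slides, and worksheets: {total_units:,}")
print(f"Per-page remediation: ${page_cost:,.2f}")
print(f"Form field work: ${form_cost:,.2f}")
print(f"Total: ${total_cost:,.2f}")
```

Under those assumptions the total comes to $29,852.75, which agrees with the figure of $29,852 given in the talk. It does not include the staff time spent auditing, collecting, and replacing files.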
I just pasted into chat the URL of the archived webinar recordings. We'll share our slides there as well, and you also have this recording. I think I am going to go over, but I've got some good stuff. So I hope that you'll stick with me, because it's going to be fun all the way down to the final slide, I think. But if you do have a hard stop at four o'clock, this is being recorded, so you can catch up later. Let me share my screen. So as Dan mentioned, my goal is to get to HTML, because HTML really is the ultimate format. From the beginning it has had really good markup for structure. Headings have been there since the beginning. Alt text for images has been there since the beginning. So we're talking early '90s, and HTML has been accessible. And in HTML 4.0, which was many, many years ago, decades ago, they introduced a bunch of new elements that definitely set the bar for accessibility. This is where we got labels and legends and fieldsets for forms, and where we got table headers and the scope attribute and all the things that make tables accessible, and much, much more. And HTML5 is taking it even a step further with new semantic elements that are supported by screen readers. So ultimately, HTML is the best option. It works across operating systems. It reflows nicely. So HTML addresses all the criticisms that Dan had in the first portion of this presentation. And it's cross-platform, whereas accessible PDF, even though we're spending so much money and so much time to make PDFs accessible, really is a Windows-only solution. There is starting to be some support on mobile devices, both iOS and Android, for tagged PDF, but it still is pretty limited compared to what you can get with HTML. So I wanted to explore how you get there from PDF, because we've got tens, hundreds of thousands, millions of PDFs out there.
And a lot of the PDFs we're using in courses, in particular, are coming from third parties. And so we get a PDF from somebody because it's a good resource and we want to use it in our course. And that's the only format that it's available in. So how do you get that and convert it into HTML? Is there a way to do that effectively? And so that's what I've been doing some research on, trying to find a good strategy for doing this. And I approached it from two perspectives. First, what is the best way to convert from PDF to HTML? So similar to what Gaby was doing with Word, but I wanted to get to HTML. And second, what is the best way then to get that converted HTML onto the web? And the three environments that most of our web content is delivered in at the UW are the ones that I focused on: a Canvas page, a web page using WordPress, or a web page using Drupal. So I started with the same original source document that Gaby started with. And I actually used two different versions. You've probably seen this if you've attended some of our other trainings. We use this document pretty regularly. And in its PDF form, if it's tagged, then you've got the tag structure that Gaby was describing. You've got an image that has alt text. You've got heading one and heading two, so you've got two levels of headings. You've got a list. You've got tables that have explicit column headers identified in the PDF tag tree. And you've got a document that's identified as English, and one sentence as French. And so I looked at different methods for converting that to HTML to see which of these methods preserve the tag structure. First of all, I used Acrobat Pro DC. This is desktop 2020. And both Windows and Mac give you the exact same results. The option is Export to HTML Web Page. And you get an image in a separate file. So it creates a folder and puts all the assets in there and then links to it. And that's okay. It's harder to distribute that way.
But that works. And headings are preserved. But one heading in this example is in the wrong place. It actually put textbook above introduction of physics course syllabus, so it repositioned those. The list was coded as an unordered list, which is appropriate. However, the bullets didn't work for me. And I tried this in both operating systems and came up with the same results. So I don't know why the bullets did not appear, but that was a visual flaw. Column headers, even though they're tagged as table headers, as TH, in the PDF, are not exported properly. Those are exported as TD. And the document is tagged as English; the French content is not tagged. And the visual appearance is approximated, other than that it got those headings in the wrong place. But otherwise it more or less preserves the visual appearance using inline CSS. There's a lot of additional CSS in the markup in order to preserve the look. So okay, but not great. The second method was to upload to Canvas. And when we upload to Canvas, there was actually a question in chat, while you were talking, about how you gathered so much data. And I know you did a lot of stuff manually, but there also is the accessibility report in all Canvas courses, and it's available in the instructor menu. And that is made possible by a tool called Blackboard Ally. Ally is the name of the product. Blackboard is now the owner of that product, but it works in multiple learning management systems, including Canvas. And it does a few things. It checks the accessibility of materials that are uploaded into the course and provides instructors with feedback. But it also allows users to generate custom versions or alternative versions of everything that gets uploaded. And so if you upload a PDF, then users, students, can download an HTML version or versions in various other formats. And when you do that, the image is recreated as part of the HTML document.
And so it's not a separate file, which makes it a lot easier to distribute. It's using a base64 image source attribute. And so it encodes the image and that is in the source code. So it's part of the HTML document itself, not a separate file. Headings are preserved. So it got the heading ones right. Heading twos were all right. The list is coded as an unordered list. It doesn't try to stylize the bullets, so you don't end up with those funky broken font icons. It just lets the browser render the list as it will by default. Column headers are correctly tagged as THs. And I think they even had scope equals col, so it got the scope attribute on there. The document is tagged as English. The French content is not tagged. And so that actually was the problem I found no solution to. It sounds like Gaby found no solution for the language issue either. That's not communicated well across platforms. So it's kind of an isolated issue. If you have a multilingual document, then that would need to be addressed after you convert. But most documents probably are not going to face that unless you're working in a foreign language. And again, the HTML output, differently from what Acrobat created, does include a CSS block that provides some styling, but it doesn't rely so extensively on CSS. And the thinking with HTML is, particularly if you're going to plug it into another platform, if you're going to plug it into your Canvas course, or into WordPress or Drupal, you've already got a theme for that context, for that website. And ideally the document won't have a bunch of inline styles; it will just accept the theme and the document will plug in. It won't look like it originally looked, but it'll look like all the other pages within that website or within that course. And so that really is ideal, I think, to not have that kind of extra styling.
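To make the embedded image idea above concrete, here is a minimal Python sketch of how an image can be carried inside the src attribute as a base64 data URI instead of a separate file. It is an illustration only, not Ally's actual implementation; the function name, the placeholder bytes, and the alt text are all made up for the example.

```python
import base64
import html


def img_tag_with_data_uri(image_bytes: bytes, mime_type: str, alt_text: str) -> str:
    """Return an <img> element whose src carries the image itself as base64."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # Escape the alt text so quotes or angle brackets cannot break the attribute.
    safe_alt = html.escape(alt_text, quote=True)
    return f'<img src="data:{mime_type};base64,{encoded}" alt="{safe_alt}">'


if __name__ == "__main__":
    # Placeholder bytes stand in for a real image file read from disk.
    sample = b"placeholder image bytes"
    print(img_tag_with_data_uri(sample, "image/png", "Course logo"))
```

Because the image travels inside the markup, the whole document is a single file, which is why it is easier to distribute. It is also why an editor that filters pasted content can drop the image in one step.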
But Ally does add a little bit of CSS so that you have some of the same styling that you had in the original. I also looked at another tool that we provide. This is available at the URL in the top corner, tinyurl.com slash uw dash doc dash convert. That's for our UW document conversion service. This is powered by SensusAccess. And it's a third-party tool that we license. And the reason that we have it in place is because it takes DRS, Disability Resources for Students, some time to generate alternative formats on behalf of students. And so this is a service that students, or anybody with a UW NetID, can use to upload a document and get it back in a wide variety of formats through email, just converting it to an alternative format so that you can access it more easily. And we do have a number of students who use this on a regular basis. But for HTML, it doesn't produce nearly the level of output that Ally does within Canvas. Now, the output is readable. It actually will do OCR. So if you've got a scanned PDF that's just a picture of text, no actual text, it will convert that to text. So the document then is scanned and converted. It may have some errors depending on how bad the original is. But there is text there. However, there's no semantic structure. It doesn't even make any effort to read the PDF tag tree that's under the hood. Everything is essentially a paragraph. Now the table is tagged as a table, but the headers are TDs, not THs. And everything's a paragraph, all the headings and everything. It does bold headings if they originally were bold, but doesn't add any structure to them. So not a great tool for converting to HTML. Method 4a: there are lots and lots of PDF to HTML conversion tools. So if you do something similar to what Gaby did and just do a Google search for PDF to HTML conversion tools, you'll get dozens, maybe even hundreds of results.
And I didn't want to try them all. I wasn't quite brave enough to do that. I was afraid of what sort of malicious things I might be downloading to my computer. But I looked for top 10 conversion tool lists from credible sources and compared them, and found a few tools that were referenced in multiple places and felt that those were worth the risk to try. And I'm not naming any names here because I found that essentially they're all the same. They produce HTML that is an almost exact, if not exact, replica visually of the original. They're really focused on preserving visual appearance. But they don't have any semantics. I've got a screenshot here of some source code from one of these documents. And the whole thing is divs nested within divs nested within divs, very deep levels of divs. Everything in the entire HTML document is a div that has an enormous number of classes and inline styles added to it in order to make it look the way it looks. So my conclusion from that is the only way to get from PDF to HTML with tag structure in place is if you're using a tool that is specifically designed for that. So one that is focused on accessibility, like Ally, will get you where you need to get, and exporting from Acrobat gets you there as well, but not one of these other tools that is not designed for accessibility. So that's if you start with the tagged file. So recognizing that most of the PDFs out there are not tagged, that they were not designed with accessibility in mind, how do we get to tagged PDF or tagged HTML? So we have the same document available in an untagged format. So it is just text. The image has no alt text. Those headings are not really headings. They're just big bold text. There is no underlying tag structure at all. So accessibility is not possible in the PDF itself. So now we're relying on the conversion tools to assign tags intelligently.
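The contrast described above, div-only output versus tagged output, can be checked mechanically. The following is a rough Python sketch using only the standard library's `html.parser`. The class name, the sample markup, and the particular things counted are assumptions chosen for illustration; this is not a substitute for a real accessibility checker or for testing with a screen reader.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class StructureAudit(HTMLParser):
    """Count elements that carry structure for assistive technology."""

    def __init__(self) -> None:
        super().__init__()
        self.lang = None  # value of <html lang="...">, if present
        self.counts = {"headings": 0, "lists": 0, "th": 0,
                       "div": 0, "img": 0, "img_missing_alt": 0}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "html":
            self.lang = attributes.get("lang")
        elif tag in HEADING_TAGS:
            self.counts["headings"] += 1
        elif tag in {"ul", "ol"}:
            self.counts["lists"] += 1
        elif tag in {"th", "div"}:
            self.counts[tag] += 1
        elif tag == "img":
            self.counts["img"] += 1
            # Only a missing alt attribute is flagged; an empty alt can be
            # a deliberate choice for decorative images.
            if "alt" not in attributes:
                self.counts["img_missing_alt"] += 1


def audit(html_text: str) -> StructureAudit:
    parser = StructureAudit()
    parser.feed(html_text)
    parser.close()
    return parser


if __name__ == "__main__":
    # Hypothetical samples: one "div soup" page and one tagged page.
    div_soup = ('<html><body><div class="t1"><div>Syllabus</div></div>'
                '<div><img src="a.png"></div></body></html>')
    tagged = ('<html lang="en"><body><h1>Syllabus</h1><ul><li>Item</li></ul>'
              '<table><tr><th scope="col">Week</th></tr></table>'
              '<img src="a.png" alt="Logo"></body></html>')
    for name, markup in (("div soup", div_soup), ("tagged", tagged)):
        result = audit(markup)
        print(name, result.lang, result.counts)
```

A page that reports many divs but zero headings, lists, and table headers is a strong hint that the converter preserved appearance and discarded structure.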
And with Acrobat Pro, if we export to HTML, then it does create HTML with tag structure, and it does add alt text. So it does a good job. This is an area that has improved over the years. For the image, it doesn't make any effort to intelligently assign alt text. As you probably have seen, the science there is getting better, where in Microsoft Word, for instance, if you insert an image into a Word doc, it will add alt text using artificial intelligence. It's not always great and usually it needs to be edited, but in this case it just says alt equals image. Every image gets that alt text. The H1 and H2 in this sample document were correctly tagged. And your mileage may vary depending on the complexity of your document. I don't know what the algorithm is, but presumably they're looking at text size and position relative to other text and things like that. And in this case it was able to intelligently identify the headings properly. The list was correctly tagged as an unordered list. The column headers, it missed that one; it assigned them as TDs, not THs. And again, no language attributes. And again, similar to the original export from Acrobat, we had a visual appearance that was approximated using inline CSS. Again, identical results in both Mac and Windows. This is where Blackboard Ally really shines, I think: if you feed it an inaccessible PDF, it is able to intelligently convert that. It did not do anything with the image, and I need to play with this some more. I'm not quite sure why that is, but in the original, when it was tagged, it was able to take that image and encode it into the HTML document. In this case there was no image in the output from the untagged document, so apparently it's using the tag tree in order to understand something about the image, which kind of surprised me. But I don't know all the inner workings of the underlying structure of a PDF, and so apparently that's a challenge, getting an image out of an untagged PDF.
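Where a converter exports column headers as TD cells, as described above for the Acrobat export, the fix during touch-up is small. Here is a hedged Python sketch of that repair using the standard library's `xml.etree.ElementTree`. It assumes the table markup is well formed enough to parse as XML and that the first row really holds the column headers; both are assumptions to verify by hand for each document, and the function name and sample table are hypothetical.

```python
import xml.etree.ElementTree as ET


def promote_first_row_to_headers(table_markup: str) -> str:
    """Turn the first row's <td> cells into <th scope="col"> cells."""
    table = ET.fromstring(table_markup)
    first_row = table.find(".//tr")  # first row in document order
    if first_row is not None:
        for cell in first_row.findall("td"):
            cell.tag = "th"
            cell.set("scope", "col")
    return ET.tostring(table, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical converter output with header cells exported as <td>.
    exported = ("<table><tr><td>Week</td><td>Topic</td></tr>"
                "<tr><td>1</td><td>Motion</td></tr></table>")
    print(promote_first_row_to_headers(exported))
```

In a rich text editor the same change is usually made through the table properties dialog; the sketch only shows what the resulting markup should look like.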
The headings, H1 and H2, are correctly tagged. Again, that textbook heading for some reason is in the wrong place, so that happens in multiple tools. The list is correctly tagged, column headers are correctly tagged, language again it misses, and again, like the original Ally output, there was some CSS but not as much CSS as you get from Acrobat. But particularly what sets this apart is the column headers in the table. That was really a difference, that Ally is able to do that and Acrobat was not. And if the original document is tagged, Ally embeds the image in the HTML file, which also sets it apart in terms of the conversion tools. SensusAccess wasn't able to do much with the tagged PDF, and so I didn't expect it to do anything better with an untagged PDF, and the results actually are identical. So its method works consistently no matter what you feed it, and it's not great; there's no structure. So given all that, the answer to the first part of my question, how do you get from PDF to HTML: the best way, from what we've just walked through, is to use Ally. Ideally you start with a tagged PDF, but if that's not available, Ally still does a pretty good job of intelligently adding structure to the HTML document. So that process then: upload the PDF to a Canvas course, go to the alternate formats menu, and download the HTML version. If you don't have access to Canvas, then using Acrobat Pro DC is another option; export to HTML from there. Again, there are a couple of things, at least in the testing that I did, where Ally does a better job, but both of these could be viable. So then the second part of my research is: what is the best way to take that converted HTML and plug it into your online course? For these tests I used the Ally output, since I was sold on what it was able to do. So it's good tagged HTML; it's got all the right stuff, or most of the right stuff. One method, and we'll focus on Canvas first of all: if we copy the HTML
document, open up the Ally converted document in your browser, copy it, then paste it into the Canvas rich content editor, where you go in and create a new Canvas page and just paste that content directly into the rich content editor, not in the HTML editor but in the visual rich content editor, then what you end up with is that all the HTML elements and attributes are preserved. The H1 element is preserved, even though H1 in our Canvas environment is not available as an option. H1 is the title of the page, which you assign in another field, so ideally there shouldn't be an H1; the first level of headings would be H2. But if H1 is important in the original and you want to preserve that, then it does convert exactly as you copied it and pasted it. It will save that H1. The base64 image, if you're using the Ally version that includes that embedded image, is stripped out when you paste it into the rich text editor. It actually is there, you can see it, and you can check the alt text, make sure that's good, do all the things with image formatting that you can do with any image, but then that image is stripped out. So I actually have raised a ticket. I sent the ticket to help at UW and have been talking with the Canvas people. Our service owner for Canvas is out of the office until after Independence Day, but this has been escalated to Canvas, to Instructure, to see if it's even technically possible for them on their end to preserve that base64 image. And so I'll keep y'all posted as to whether it's possible to keep that. But anyway, that is stripped out, and so you would have to add images back if you use this method. The tables have no border, but those can easily be added. So you paste the content and then you do some touch-up, and you probably have to do a little bit of touch-up regardless of which method you use. One really interesting caveat, and it took me a while to test this thoroughly and really track it down, is that if you copy from any browser other than Firefox, there are a
couple of bullets here of everything I tried, copy from Chrome or Safari in macOS, copy from Chrome or Edge in Windows 10, and then paste that into Firefox, in either Mac or Windows, in the rich text editor in Canvas, then what you get is all the inline styles are added and preserved. And so I show the source code here: in that Ally doc, where it did have some inline styles, everything gets preserved. And if you start with the Acrobat exported HTML file that has a lot of inline styles, again it's the same thing; that content gets preserved, but only in Firefox and only if you copy from a browser other than Firefox. So if you copy from Firefox and paste into the rich text editor in Firefox, this doesn't happen. For whatever reason, Firefox seems to be taking that inline style content and preserving it from other browsers but not from itself. And I honestly don't have a good explanation for why that's happening or whether it's a feature or a bug, but it is happening. I tested it extensively and it reliably happens. So if anybody knows why, I would love to talk further about that. And hopefully it's not a bug that they're going to fix so that this goes away, because it might be useful. However, as Dan has pointed out, and I'm convinced too, the appearance really should not be a driving force here. Ideally, when you paste content into your Canvas course, it will look like all your other Canvas pages, not like a separate document, and it's the content that really reigns supreme. So, adding to WordPress: if you open the document up in a browser, copy it, and paste it into WordPress, and I did this with our accessible technology website, where we are using the hosted WordPress service from UMAC and this is the Boundless theme, it adopts the styles of the theme. And so the inline styles may actually be there, but they're overridden in most cases by the styles from the theme. And the base64 image is stripped out. I have not talked to UMAC to see whether that could be preserved there, so
but it looks like the theme. It's branded properly for the website, and that really is ideal; it should look like other pages on the site, so I don't think it's important that it preserve the styles. If you add to Drupal, copy and paste the Ally converted HTML into the Drupal editor, and this is the default Drupal editor, so I just used a clean copy of Drupal 8, nothing other than out-of-the-box Drupal, and pasted, this was the one place where the base64 encoded image is in fact preserved, and it has alt text because the alt text was there in the original. In all of these cases the structure that's there is preserved, inline styles are added a little bit, and again the rule of Firefox and other browsers, that relationship, is true regardless of what you're pasting into. The catch here, though, is text format. When you paste into Drupal there's a text format field, and the options there are Full HTML, Basic HTML, and Restricted HTML. With Restricted HTML, you have particular tags that are allowed while others are not. The chances that you have Full HTML enabled are probably pretty slim. I imagine most behind-the-scenes folks, webmasters, are tightening that up a little bit. But you do have to have Full HTML in place to preserve the appearance and to get that base64 image. And so my initial excitement may have worn off now that you know that. But if you do have access to a Drupal site where Full HTML is allowed, or maybe that is conditional based on privileges and certain users have Full HTML privileges, then talk to your webmaster about that, because maybe that could be opened up enough so that people can paste HTML documents, including the images from those HTML documents. So given all that, the ultimate recommended workflow is to, again, use Ally. So the first two bullets are the same as what I showed earlier: Ally is our best bet, and exporting from Acrobat Pro is the second best option. Then open the converted HTML page in the web browser, copy it, and paste it into the rich text editor of your
content management system or your learning management system, whatever it is. And if preserving the original appearance is important, then you need to do that other browser, non-Firefox to Firefox, trick. And then finally, no matter what you do, there's probably going to be some touch-up required. And so you check it. Make sure the headings are appropriate; if not, you fix the headings. That's probably the most important thing: just make sure you've got a good heading structure that forms an outline. Make sure that lists are coded as lists, and if not, you select them and push the list button in the rich text editor toolbar to make them into lists. Make sure your tables have headers. Make sure your images have alt text. And probably in most cases, unless you can get that base64 image to be preserved, you're going to need to re-upload images into the HTML page. So there may be a little bit of work as a final step to touch things up and make sure that it's a good, working, functional document with all the information you want. But that arguably is going to be a lot less work than all the work that goes into fixing an inaccessible PDF, as you've seen from my co-presenters here. So it looks like we didn't lose too many people. We still have a crowd of 13, and it looks like there's some content in chat. Have others been monitoring that to see if there are actual questions? Are there any questions? Yeah, there were a couple of questions and we eventually got to them. I of course had it backwards at one point, but your slides corrected me, and I made sure to point out that the pasting has to be into Firefox, which is just surreal. It's like, is this 1990 again? What's going on? Yeah, I wish I knew more, and I actually still plan to continue this investigation and see what I can learn, but I've never heard of this before. So I don't know if other people are even aware of it, if anybody at Firefox, anybody at Mozilla, is aware that it
has this magic functionality. All right, we've got a little bit of time. Well, we're way over time, but I think the stuff that Gaby and Terrell had to share was just fantastic, and I appreciate the collaboration once again. Any questions, you can ask them here if you want to hang out for a little bit more, or you can contact us directly. There is a question there, Terrell: so in terms of students enrolled in that course that has PDFs in Canvas, using the Ally tool could make those docs accessible? The potential is there. It certainly does its best and arguably does a better job than other automated tools, better than SensusAccess. But it also is garbage in, garbage out. And this is all based on one document, which is a pretty simple document. Visually it's really easy to tell what the structure is, so whatever Ally's algorithms are, they're not tested extensively by this document. But imagine if we throw a bunch of much more complex, ugly documents at it; it's going to be sort of hit and miss. It will be able to create accessible HTML out of those some of the time, but it's going to be problematic other times. And relying on that is not a substitute for creating accessible documents from the get-go. Yeah, really we need to go back to our content creators and press them on this whole question: why is it a PDF? Because if they think they're doing it because it's more secure or less changeable, they have, let's say, an incomplete understanding of what PDF is. It's not secure. It can be changed. And really, in the interest of making things more readable, easier to use, more accessible, if they're starting with a Word document, pasting into Canvas from that Word document is really a great way to go. It's native to the platform, it's going to flow well on all kinds of different devices, and it really is a superior way to present information. There's a comment about STEM fields using TeX and LaTeX. Yeah, that's kind of a whole other world. I will say that
presenting math in PDF files is still a, I don't want to say terrible, can I say terrible? It's not a good experience. It's really not a good platform for sharing STEM content. TeX and LaTeX are much better for writing and reading it, assuming people have the viewers for that. Canvas does have some good integration for LaTeX. And then we get involved with producing Braille from LaTeX as well, so that actually renders quite well when we take that next step of producing bumpy paper, or Braille documents, actual physical Braille documents. That's really outside the scope of this talk. I will say that LaTeX is actually pretty good. It's another markup that actually predates HTML. Also, in Canvas now you can use LaTeX to author math equations, and it then renders those in MathML, which is becoming pretty well supported by assistive technologies. So I haven't looked at converting math content. That was actually something I saw in an early prototype of Ally, before it got purchased by Blackboard, where they were doing math conversion. So I know there is some of that going on behind the scenes, but I haven't really explored that fully to see how successful that is overall. But I think some combination of all these things could result in accessible math within courses without a whole lot of extra effort. I know if you just type formulas directly into the rich text editor using the math tool within the Canvas rich text editor, then you get accessible math, so that's one nice way to go. There is a question about PowerPoint or Google Slides in a more accessible format that is also non-editable. I personally encourage people to set aside the idea of non-editable. You can make it difficult to edit, but the Canvas instructor essentially is the one who gets to say what the source is. But I haven't done a lot of instruction, so I don't know what all the ins and outs are. Maybe that's a whole other presentation. I mean, if
you want to make things non-editable, typically what you end up with is also inaccessible. Go ahead, Joe. I'm sorry, I just wanted to clarify that this is not a course. It's more like what you're doing now: you're going to send out these slides to everybody, but what format are you going to send them out in, and is that format accessible? We're doing something very similar with our webinars. We do the webinar and then we send the slides out, and they're not students, since it's not on Canvas. Do you have any suggestions in that circumstance? We're not going to send these out. All three of our decks are PowerPoint, and we will run the PowerPoint accessibility checker, and that will give us some tips if we overlooked adding alt text to some of these images and some other things that we need to look at. We're not concerned about the content being non-editable or other people being able to use it, and so that part of the question we can't relate to necessarily. But just in terms of accessibility, PowerPoint is a good format natively, and Google Slides. If you do ultimately decide you want to go to PDF with an interest in non-editable, again, you can convert from PDF, as Gaby demonstrated, to other formats, and so it's not really non-editable; people could still get to that content. But if you export from PowerPoint to PDF, then it creates a decently accessible tagged PDF. The alt text will be preserved. Each slide is represented by a heading, so it's really easy for screen reader users to navigate. If you export from Google Slides, Google does not generate a tagged PDF at all, for Docs or Slides, so definitely, if you're going to export, stay away from G Suite. Amy's got her hand up. Are you still with us? We are well over. Yes, I'm here. I'll try to keep it quick. I work with Joe, so our questions are related. But Dan, on your hierarchy that you had of the most desirable to undesirable electronic document formats, I think that PowerPoint was kind of down at the
bottom, close to the PDF, and so that kind of raised bells in my head and made me wonder if there was something we should be trying to export our PowerPoint files to, like a different kind of format. No, that's an excellent point, Amy. Thanks for catching that. PowerPoint can, like Terrell just said, be pretty accessible. It's not a very accessible platform to work within for anybody who needs to create content, but the content can be made very accessible. Again, what is the value in that PowerPoint, and would it be better served as being just an HTML file or set of HTML files? So the problem is that so much PowerPoint is created that is terrible for accessibility, because you can do all kinds of inaccessible things very, very easily using that tool, unfortunately. Also, Amy, you and I have talked about this recently, but we have done some recent research. It's like, Hadi left, what's up with that? He kept us around for two... Yeah, the last two days. Anyway, Hadi and Gaby and I sat down and did some extensive PowerPoint testing, just observing how he interacts with PowerPoint. And essentially we would go into slide view mode and would view it as a slide show using a screen reader, and the results with JAWS versus NVDA are completely opposite one another. And so we're working on gathering notes from that session and documenting what we found, and probably filing some bugs with screen reader developers to try to address that. For that reason I'm hesitant to say PowerPoint should be high on Dan's hierarchy list, and through no fault of Microsoft's, I think; it's just assistive technology support. Yeah, I think the other thing that we are realizing too is just that these are info sessions and group advising slides, and we're trying to stay consistent with the UW branding, and that is also something that gets very difficult to maintain, because you have to download and install special fonts to see them in their proper format. And so
students and prospective students aren't necessarily going to have those fonts downloaded, so then it all looks funky. They do publish standard alternative fonts, though; that's down on this page. That's what I use. Okay, Joe has his hand up. Yeah, and I'm sorry, we're still talking about PowerPoint, Amy and I, but this actually goes to a bigger point. One of your arguments against PDF was that you have to have one more layer of proprietary software to read a PDF file. The same is true of PowerPoint and Word. And if you're a UW student, yeah, you can get that for free, but we're dealing with a lot of prospective students outside the university. So is there a way to avoid using these other proprietary formats like Word and PowerPoint? I've been trying to find a way to convert a PowerPoint to HTML. Again, this is a webinar, so we are doing a PowerPoint presentation. I can't find a decent way to convert a PowerPoint to HTML. They used to have it, but they've taken it out. Do you have any other suggestions or ways to do that? That's actually another great topic for another session, I think, but there are quite a few free tools for doing HTML slides. It's an alternative to PowerPoint; it's not exporting from PowerPoint. But Slidy, I think, was one, slider.js. There are a lot of tools that use standard HTML, and then you add a JavaScript file to that that renders that standard HTML as an interactive slideshow, and then you style everything using CSS. For a period of my life I used that exclusively for slides, because nothing beats standard HTML with a little bit of CSS and JavaScript, and it works in your browser. But the challenge there was distributing it. People just expect PowerPoint, it's so universal, I guess, and they want a copy of my slides. And if I'm using one of those HTML tools, I have a bunch of files involved. So I could just send them the HTML file, which has all the content, but it's not going to be rendered
then as slides, and it's not going to have the visuals from the CSS, and so it just was less desirable from a sharing content perspective. But if it's going to be online, that might be a viable solution. I did for a while look at exporting from PowerPoint too, and I haven't looked at it recently, but what they used to do when you export to HTML wasn't good. It was nowhere near the quality of HTML that you get just creating it in HTML from scratch and using a tool like slider.js to render it. PowerPoint to PDF to HTML, where does the madness end? Can we all just have one format? The world would probably be less interesting. You know, I love ebooks. I'm on ebooks all the time. That's my preferred method for reading books. I still love actual print books, but when I'm reading nowadays it's all ebooks. EPUB, Kindle format, all that stuff works really well. Getting a book in PDF is a real soul crusher, because that is no fun. We are well over our scheduled time. Thank you, everybody, for working with us. It's always a fun conversation. Stay cool in the forthcoming hot, hot, hot weather. Stay cool. Thanks, everybody.