Ladies and gentlemen, welcome, and thank you for joining today's Finding a Needle in a Haystack: Enterprise-Wide FOIA Searches at CDC webinar. Before we begin, please ensure that you have opened the WebEx participant and chat panels by using the associated icons located at the bottom right-hand side of your screen. Please note all audio connections are currently muted, and this conference is being recorded. You are welcome to submit written questions throughout the webinar, which will be addressed at the Q&A sessions of the webinar. To submit a written question, select All Panelists from the drop-down menu in the chat panel, then enter your question in the message box provided and send. If you require technical assistance, please send a chat to the event producer. With that, I will turn the webinar over to Alina Simo, Director, Office of Government Information Services. Alina, please go ahead.
Thank you, Michelle. Good morning, everyone. My name is Alina Simo, and as the Director of the Office of Government Information Services at the National Archives and Records Administration, it is my pleasure to welcome all of you to our event today titled Finding a Needle in a Haystack: Enterprise-Wide FOIA Searches at the CDC. I hope everyone who is joining us today has been staying safe, healthy, and well. Shortly, I will go through some basic housekeeping rules and expectations for today's meeting. First, I would like to give you some background on today's event and how OGIS became involved. As many of you know, OGIS is the Federal FOIA Ombudsman, and in that role, we work to improve the FOIA process in a number of different ways: by reviewing agency compliance, by offering dispute resolution services to assist requesters and agencies, by chairing and managing bodies like the FOIA Advisory Committee, by co-chairing the Chief FOIA Officers Council, and more. In that role, OGIS has a unique perspective on FOIA programs across the federal government landscape.
For the last 14 months, we have been watching with interest the impact of the pandemic on agencies' FOIA programs. Just over a year ago, OGIS was pleased to host our first CDC-led webinar, FOIA Requests for CDC COVID-19 Records. Once again this year, the CDC FOIA program managers sought our assistance to speak directly to all of you about how the CDC conducts enterprise-wide searches in response to FOIA requests. You will be hearing today from Srinath Tutukhuri, who is the IT Project Manager for the CDC FOIA program. Srinath is joined by the CDC FOIA Director, Roger Ando, and the CDC Deputy FOIA Director, Bruno Viana. The PowerPoint for today's presentation is accessible on the OGIS website at archives.gov/ogis. We will also add it to the chat. Throughout this morning, we will be monitoring the chat function on WebEx. We are also simultaneously live-streaming on the NARA YouTube channel and monitoring the chat submitted on that platform. We will be taking questions throughout the presentation, so as you think of questions, please type them using the chat function on either platform. Our plan is to pause periodically to check in and see if there are any questions that have come in via chat. We will also open up our telephone lines on WebEx during those pauses to give attendees the opportunity to ask any questions orally. An important reminder with regard to your questions: please be aware that this is not the right time to ask questions about a specific FOIA request. We're happy to have all points of view shared, but please respect your fellow attendees and keep the conversation civil and on topic. We will do our best to answer all of your chat and telephone questions. If we do not get to your question, please don't worry. We will post any unanswered questions and answers on the OGIS website in the upcoming days. We are recording today's session and we will post a video and transcript of this event on the OGIS website as soon as they become available.
I also want to take this opportunity to speak to those of you joining us from other federal agency FOIA programs. The CDC FOIA program has been proactive in communicating with its stakeholders using this venue. OGIS is happy to help any other agency FOIA program host similar events. If you are interested, please send us a chat during today's event, call us at 202-741-5470, or email us at ogis@nara.gov. We look forward to hearing from you. At this time, I would like to welcome our main presenter today, Srinath Choudhury, who is also joined, as I mentioned earlier, by CDC's FOIA director Roger Ando and CDC's FOIA deputy director Bruno Viana. Srinath is the IT project manager in the CDC FOIA office. He primarily takes care of managing the enterprise searches, in addition to being responsible for FOIA's IT infrastructure at the CDC. He has been in this role for more than six months, during which he has explored various tools and options to improve enterprise searches. During this presentation, he will present firsthand information on the enterprise search process, the tools being used, potential issues, and finally, tips to scope search requests for optimal results. Srinath, over to you now.
Thank you, Alina. Good morning, everyone. I'm Srinath Choudhury, IT project manager here at the CDC FOIA office. Today, I will be doing a presentation on how we perform enterprise searches at the CDC FOIA office, in addition to the issues that we run into when we try to run these searches. And finally, I will share some tips and recommendations that we feel can help us get better search results. We would also like to take advice and input from the user community to come up with better search results, which will help everyone in the long run. Having said that, I would like to go to the next slide, which is the agenda of this meeting. Before I get started with the agenda, let me go over two important points.
The first is the capabilities that we have at the CDC FOIA office. Number one, we have access to search all the email addresses within CDC's domain. That comes to around 5,000 to 10,000 mailboxes. In addition to this capability, we can also search for documents on all the shared drives within CDC's network. Having said that, we do have some limitations. The first limitation is that we cannot run a wildcard search on any of the mailboxes. We also need to obtain mandatory approvals from the custodians of these mailboxes before we can perform any searches. The second is the same for the shared drives. We need to obtain approvals and be granted access to the shared drives before we can search for and locate any documents. Hopefully this gives you an understanding that we do have limitations when we run the search processes. We cannot simply run a search on all the mailboxes at CDC. We can only run searches on a restricted set of mailboxes or a group of mailboxes. We also cannot run searches on a whole division or a CIO if there are hundreds of people. Hopefully this gives a clear understanding before we delve deeper into the agenda of this meeting.
So the agenda of this meeting is divided into four categories. The first category is an overview of ES, which stands for enterprise search. The second category is how we categorize requests based on their technical complexity. The third is the issues that we run into when we perform these enterprise searches. And the last is the improvements that we would suggest based on the observations we have made when performing these enterprise searches. Finally, we will also have a Q&A session.
Before I move to the next slide, does anyone have any questions?
Ladies and gentlemen, if you'd like to ask a question via phone, please press pound two on your telephone keypad to enter the question queue. Once again, pressing pound two will enter you into the question queue, or you may enter your question into the chat box.
So right now we have no questions on chat. So go ahead.
Thank you, Alina. Let me go to the next slide. So the first category, the enterprise search overview, is split into three parts. The first part talks about the process flow. The second part talks about the tools that we use. And the third part is one of the most important features that we use to refine searches, which is known as dedupe and containment. Any questions on these three slides so far?
Once again, pressing pound two will enter you into the question queue. All right. There are no questions on the line and no questions in the chat.
Sure. Let me go to the next slide, the ES process flow. This slide gives a bird's eye view, or a complete understanding, of how the enterprise search process is performed at the FOIA office here. Even before an enterprise search reaches the technical team, it is analyzed and vetted by the FOIA analysts to make sure that the relevant information is present. The key information that we need to perform an enterprise search is, one, the custodian mailboxes and, two, the time span. Without these two pieces of information, we cannot proceed with any enterprise search. If the requester has not provided the custodian mailboxes, our FOIA analysts contact the relevant subject matter experts and get us the relevant custodian mailboxes. The same goes for the time span. If there is no time span, the subject matter experts provide us the time span information too.
So once this information is provided to us, we analyze the search request to see if it needs any keywords. Sometimes the keywords are provided by the requester. If there are no keywords, they sometimes come from our subject matter experts. And if no keywords are given at all, one of my team members goes through the search request, understands it, and comes up with the keywords to perform a search. Any questions so far on the analyze-search aspect of the enterprise search?
I do not see any questions on the line and no questions in chat.
Sure. Thank you. Let me go to the next step in the enterprise search process. Once we have the keywords, the custodian mailboxes, and the time span defined, we take this information and plug it into our primary search tool, which is the Microsoft Office 365 compliance tool. I will be going over the capabilities of this tool in later slides. For now, we simply enter these details as filters into the Office 365 compliance tool, which is a graphical user interface that hooks into the Microsoft Exchange server and brings out all the matching emails from the Exchange server. Once this information is available to us, we have some next steps that we follow. But before I go to the next slide, does anyone have any questions related to Office 365 so far?
No questions in chat. There are no questions on the phone.
Sure. Thank you. Generally, the next step that we follow is that we try to eliminate clutter. However, we do not eliminate any clutter if the requester has specifically stated that they are interested in subscriptions or newsletters; in that case we hold back and simply perform the search. But if the requester has explicitly given us instructions that they are not interested in any subscriptions or newsletters, we eliminate those too. I do see one question regarding what NUIX is.
So NUIX is a forensic software which is used for analyzing data, and I will be going over it in the next section, where I'll be talking about the different tools. So once we perform the search and get the necessary search results, if we have to eliminate any clutter, we go ahead and eliminate any subscription emails based on the records that we see. And then we re-run the search. After re-running the search, one of our analysts samples the data to make sure that the results meet the expectations within the scope of the request. We also capture some metrics, such as how many records have been captured for each keyword, how many records are coming from a single mailbox or from different custodians, and so forth. If the record count is low and we are certain that the search request is a simple one, without much ambiguity in the results that we see, we go ahead with the next steps of exporting the data, preparing the data, and finally presenting the data. However, if the results look ambiguous and we have a lot of data, I present it to the analyst with all the necessary insights so that they can make an informed decision, or contact the requester and ask whether they are willing to narrow the scope to bring down the number of records. Then we re-run the search to get better results. When we feel that the records are good enough, we go ahead and export this data into a PDF or a doc format, or as individual email messages, and sometimes into PST files. Occasionally, since the graphical user interface has some limitations where we can't really delve deeper into each record to see if the records are good, we take this data and put it into Outlook to get better insight into how the data looks.
Once we feel that the data is good enough, we go ahead and prepare this data. The preparation of the data is where we talk about dedupe and containment. I have a specific section to talk about dedupe and containment in detail, but in a nutshell, it eliminates lots of duplicate data and cuts down the volume of the results by around 20 to 40 percent, based on what we have seen so far. It helps us make the records more concise, and that way it saves our time as well as the requester's time, since we are not presenting them duplicate information. The last step is that we present this data: we take this data and put it into the specified location and also into our case management tool, which is known as FOIA Express. From here on, we pass the case back to the analyst, and the analyst takes it over from here until he brings the case to a logical conclusion and finally closes it out. That is an overview of how the whole enterprise search process is performed from a technical perspective at the CDC FOIA office. So does anyone have any questions regarding the process flow here?
Any questions in the chat so far? Yes, I do see one question. Srinath, go ahead. I was going to say, you see the same question I see, right?
Yes, I do see one question. Have you ever had any email fail to properly import into FOIA Express? Yes, very rarely we run into issues where some of the emails fail to load into FOIA Express. In that instance, we take the email message out and reformat it into PDF manually. So yes, we do run into it once in a while, but it is not very prevalent. Any additional questions?
That individual has clarified more specifically: do they come in in their native format rather than the proper format?
Maybe you could talk a little bit about the format that they come in.
Yes, so most of the emails come in the proper format. We don't get emails that are non-English or non-ASCII, so we never really run into that kind of issue. However, one issue we do run into occasionally is that when emails are encrypted, it prevents those emails from being converted into PDF documents. So we sometimes have to take those encrypted emails and figure out a way of either going back to the custodian to get that email for us, or decrypting that email, and then putting it back into FOIA Express in a different format to resolve the issue. So hopefully that answers the question.
All right, and there were no questions on the phone.
Yeah, I do see another question. Can you repeat the prepare data process? Yeah, sure, definitely. So what we do in the prepare data process is that once we have all the data exported from Office 365, which is usually either individual messages or a .pst file, which you can think of as a zip file with all the different messages, we take this data and put it into our case management software, FOIA Express, and we run this data through a process known as dedupe and containment. When we do the dedupe and containment, all duplicate messages are eliminated. To give you an example, let's say I'm running a search on five custodian mailboxes, and I have sent an email on which all five of them are CC'd. During the dedupe process, it takes only one unique email rather than the five copies. That way four records are eliminated from the total set.
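To make the dedupe example above concrete, here is a minimal Python sketch. It is not how FOIA Express or Office 365 implements deduplication; the `Email` record, its fields, and the `dedupe` function are hypothetical names chosen for this illustration, and it assumes that copies of one message found in different custodian mailboxes share the same message ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Email:
    message_id: str  # assumption: identical across all mailbox copies
    custodian: str   # mailbox in which this copy was found
    subject: str


def dedupe(emails):
    """Keep the first copy of each message ID and drop later copies."""
    seen = set()
    unique = []
    for email in emails:
        if email.message_id not in seen:
            seen.add(email.message_id)
            unique.append(email)
    return unique


# One message found in five custodian mailboxes (hypothetical addresses).
copies = [
    Email("msg-001", f"custodian{i}@example.gov", "Status update")
    for i in range(1, 6)
]
unique = dedupe(copies)
print(len(copies), "->", len(unique))  # 5 -> 1
```

In this sketch the four extra copies are dropped, which mirrors the "five emails become one" example; a real tool may compare more than a single identifier.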
And when I talk about containment, what it really means is this: let's say there is a conversation between myself and another user, and we have 15 different emails going back and forth. The containment process eliminates all the individual emails and keeps the last email of the email chain. So it saves us from having to go through each individual email, because it eliminates the emails which are contained within the final email. Typically, as I said earlier, what we have observed is that the dedupe and containment process reduces the record volumes by 20 to 40 percent while making sure that the scope of the search results is still intact. Hopefully this answers your question, Adrian.
Yes, for dedupe and containment we can use different software. Office 365 sometimes does some of the dedupe process. NUIX can do dedupe, and Outlook can also sometimes eliminate some records, but we primarily use FOIA Express for containment. So to answer your question, yes, all the records definitely go through the FOIA Express software for containment, and it's extensively used. Every search goes through containment. The only exception would be when we have five or ten records and there's no need to do the containment process.
Let me go to the next slide. I did go over some of the tools during the previous slide, but I can go over each of the different tools that we use here at the FOIA office to make sure that we are able to get the best results. Any search that is performed at the FOIA office first goes through the Microsoft 365 compliance tool. As I said, this tool is a graphical user interface, a web-based interface which has filters to perform searches.
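Returning to the containment step described earlier in this passage, the following Python sketch shows the general idea under a strong simplifying assumption: each reply quotes the full text of everything before it, so an earlier message's body appears verbatim inside a later one. The function name `apply_containment` and the sample thread are hypothetical; FOIA Express's actual matching logic is not described in the talk.

```python
def apply_containment(bodies):
    """Return only the bodies not fully contained in another, longer body."""
    kept = []
    for i, body in enumerate(bodies):
        contained = any(
            i != j and len(other) > len(body) and body in other
            for j, other in enumerate(bodies)
        )
        if not contained:
            kept.append(body)
    return kept


# Build a 15-message thread in which every reply quotes the previous message.
thread = []
quoted = ""
for n in range(1, 16):
    body = f"Reply {n:02d}\n" + quoted
    thread.append(body)
    quoted = body

kept = apply_containment(thread)
print(len(thread), "->", len(kept))  # 15 -> 1
```

Only the final message of the chain survives, since it contains all the others. Exact duplicates of equal length are deliberately left alone here, because removing those is the job of the dedupe step.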
The different filter options give us the ability to search on keywords, the subject of the email, the recipients of the email, the participants of the email, as well as who the sender is. The most important aspect is running the search on the custodians of the email and, finally, a date range. Without the custodians and the date range we do not do any search, because it's going to be a wild goose chase: the search can take forever and we're not going to get any productive results. As I said, the tool is very simple and it gives us insight. It's the first step for us to get all the data. Based on our observation, if the scope is well defined and the record count is low, our records are much more precise, and if we are confident at this level we skip running the records through any other tools. However, sometimes we get lots of data and feel that the search does not look that great, or the results are ambiguous. If the record count is less than about 50 or 100 records, we simply export all this data to a PST file and quickly analyze it in Outlook. That is a very quick way of looking at the records to see if the data looks good, or to eliminate any records which are not needed. We seldom go to the Outlook process, but if required we do it when the record volumes are low. For the next step, we do not go into FOIA Express to do any searches. Instead, we have been using a forensic software called NUIX, and this software has higher capabilities than the Office 365 software as well as Outlook. It can provide much more insight into the records, and it is capable of doing some containment and additional dedupes where the Office 365 process fails when the record volumes are much higher. It gives us insight into the data and helps us understand the record counts.
So what we see is that once we have a big set of around 10,000 records and we run it through NUIX, it cuts down the data volumes and the result is much more precise. It gives us a lot of options for insights: it groups the data into subsets, groups it by topic, and also shows us the different domains and how many emails are coming in from each domain. It also gives us a better ability to cut down the results. Let's say the requester is only interested in the emails sent from CDC. It helps us narrow down to those records. So it has additional search filter options which are not available in Office 365, and we use this tool on an as-needed basis. But it is definitely a powerful forensic software with which we can run and analyze a lot of data, not just Outlook emails but also a lot of hard-drive-based data, documents, and so forth. Any questions on this so far? On the tools specifically.
I do not see any questions in the phone queue. As a reminder, ladies and gentlemen, if you would like to get into the phone queue, pressing pound two on your telephone keypad will enter you into that queue.
I see no new questions in the chat.
Thank you. I can move to the next slide. This pretty much concludes... Okay, sorry. Let me talk about this. I did speak about it a few minutes back, but I can present a slide captured from the FOIA Express software which gives us an insight into how the dedupe went. In this instance we had 903 records which were captured after running the search through Office 365 as well as the NUIX software. We knew that these 903 records could be condensed further and, as I said, the dedupe process typically brings down the records by around 30%. So when we ran these 903 records through the dedupe process in FOIA Express, our interest was primarily in the green bar on the slide here.
We want to see how many records it comes down to. So from 903 records, the records were condensed to 630 records here. It also gives us the number of records which were eliminated as part of the containment process. In this instance around 270 records were eliminated as part of the containment process. It did not eliminate any duplicates. The reason is that all the duplicates had already been eliminated by Office 365 as well as NUIX. So we were able to condense the record volume from 903 to 630, which translates to a reduction of around 30 to 35% of records. Just to give you an overview, each record on average translates to around four pages. To put that in perspective, 600 records comes to roughly 2,500 pages of data which needs to be analyzed by the analyst and presented to the final requester. So the containment process helps us greatly in reducing the volume of records while keeping the scope exact, and it saves time for the analyst as well as for the end requester. Any questions on this? On the dedupe and containment?
There are no questions on the phone line. No chat questions.
Thank you. I can move to the next slide. So that pretty much concludes the first section, the overview or bird's eye view of the enterprise search process at CDC. The different aspects that I covered were the process flow, the different tools that we use, and the dedupe and containment process. Let me move to the next section, which is how the technical team categorizes enterprise searches. I would like to make it clear that we also have another categorization on the administrative side of enterprise searches, and I will not be going into that. Here the categorization is limited to the complexity that is involved from a technical standpoint when we try to get search results. I have categorized the searches into three different categories. One is the low intensity.
The next is the moderate intensity, and the last is the high intensity search. I would like to go to the next slide, where I will be talking about the low intensity search. When I say low intensity search, what it really means is that the search is very simple to perform, we are absolutely certain that we are getting the right results, and we can very quickly get the search done and close out the request in a timely fashion without any issues and without going back and forth with the requester. As the picture on the slide says, keep it simple. Generally, when requesters keep it simple, we know that the search is a low intensity search. So what is really a low intensity search? I have listed a few attributes which define it. First, the custodian mailboxes are defined. When I say the custodian mailboxes are defined, we know that we have to run a search on a few custodian mailboxes; it could be a director or an assistant director or the head of a division and things like that. So it's very clear on whom we are running the search. Next, we have a very short time span. The time span is very important, because the shorter the time span, the more accurate our results are and the more in line they are with the scope of the request. If it's a two-week search or a three-week search or a few days around an event, the search results are very precise. Next is the number of participants. If we know who the participants are, say a very limited set of participants, like a discussion between four or five individuals, then it helps us narrow down the search results and makes them more accurate. And the last is not having any ambiguous keywords.
When I say ambiguous keywords, I mean that we don't expect a keyword like "run a search on COVID" or "run a search on autism" or "run a search on AIDS" and such. Those searches are very vague and we could get tons of records. That's what I mean by an ambiguous keyword. Any questions on this so far?
We have one question on the chat. Do you use NUIX as your primary method to dedupe your records? And do you think this is a more efficient way to dedupe compared to the ADR/EDR tools within FOIA Express?
We do not use NUIX as the primary way to dedupe records. We do use NUIX for deduping, but the first pass of dedupe always happens in Office 365. If anything is missed during the dedupe process in Office 365, it is caught in NUIX. By and large, NUIX does a good job at dedupe, so we do not see any dedupe happening when we run the dedupe and containment process in FOIA Express. But FOIA Express does an excellent job with the containment process. NUIX probably also has the ability to do containment, but we haven't figured that out. As I said, we only started using it in the last three to four months. Does that answer your question? Okay. If there are no questions related to the low intensity search, I can show an example so that it gives an understanding of what I really mean by a low intensity search. Next slide, please. Here is an example of a low intensity search. I'll give you 15 seconds to go ahead and read the content of the request. All right. So the requester here has asked for all communication between the CDC's director and the office of the vice president, Mike Pence, between September 10th and October 1st. So the time span here is very short, about 20 days. We know who the custodian is: it is the director of the CDC. And we also know who the participants are. There are two participants. One is the director of CDC.
And the other could be anybody from the office of the vice president. It could be a secretary or anybody sending those emails to us. So we have our mailbox defined. We have our dates defined. And there is no need to do a keyword search here. All we do is make sure that the participants are anyone from the email domain of the office of the vice president. In this instance, all the participants would have a domain address of ovp.eop.gov. That would be a partial match, or a suffix, within the email address of any email coming in from the office of the vice president. That gives us very concise, accurate results. If the record count is very small, we just take those records and run them through the dedupe and containment process within FOIA Express. We're able to get the results out in a very short time span, and the analyst is able to close out the request. So what I mean to say is that it's a very simple search for us because the scope is very clear and there is no ambiguity, which makes it very easy for us to get the search done. So if our requester community can provide us searches which are very specific and low intensity, it helps us in the long run. Any questions on this example?
I do not see any questions in the phone queue. Nothing on the chat.
Thank you. I can move to the next slide. The next slide is about moderate intensity. When I say moderate intensity search, what I really mean is that the custodian mailboxes may or may not be defined, but we have a way of figuring out who the custodians are. The participants may or may not be known, but typically in a moderate intensity search we could have a larger number of participants too.
And we may also have to run searches on group mailboxes, like event-based mailboxes or response-based mailboxes and so forth. The search is not specific to a particular keyword. It could be a phrase search, or it could be a combination of keywords that need to be searched on. And generally the date range is larger, or it could be a much longer time span. So when I say moderate intensity search, what it really means is that the record count is much higher. Typically it is in the hundreds to thousands of records. We do know that we can definitely get the records here, but it involves some work on our end before we can really pinpoint the accurate results which are relevant to the scope of the request. Any questions on the moderate intensity search?
There are no questions on the line. No.
Thank you. I will go over an example of a moderate intensity search. I'll probably take a few more minutes to explain that example in more detail. Next slide, please. Thank you. I'll give you 15 seconds to go ahead and read the content of the request. All right. Let me get started with this request. The request came from a news reporter who was interested in the investigation that CDC had performed related to an incident where a few people were infected with COVID when travelling on a bus from Milwaukee all the way to Texas and, unfortunately, an individual passed away too. As for the event details, if you look at the request, it's around October 13, 2020 that the event happened. In this instance, we do not have any custodian mailboxes to search on, nor do we have a time frame in which to perform the search. However, the one thing that we have from the request is that we are able to pick out different keywords. One is COVID-19, then it's a commercial bus.
It has a reference to a particular bus company, El Toro, El Toro of Nido, then a place, which is Teneca Foods, and some of the different stops like Laredo, Chicago, Wisconsin and so on. So we do have sufficient keywords to at least start off the search here. So what happens here is that our analyst contacts the relevant subject matter experts and was able to identify the people who really performed this investigation. So we did get the custodians from them, and they also provided us a time frame for the search. So we had the custodians now and we also had the time frame. Any questions so far on this? No questions on the line. No questions? Yeah, thank you. Sure. Thank you. Please go to the next slide. So we did take the keywords, the custodian mailboxes and the date range, and we did perform a search using the Office 365 tool on the Exchange server. The way we did the search was that we had to come up with a concatenated phrase, a set of keywords where we had to use a Boolean search of AND or OR to draw some results and do some analysis on them. So what we really did was that we said, let's do a search based on bus and any of these keywords, Milwaukee or the different places here, San Antonio, Dallas and so forth, or motor coach and any of the different places here, within this particular date range. Based on the search that we ran, I don't remember the exact figure, but we got a few hundred records. And I believe we had around three or four custodians on whose mailboxes we had to perform the search. So we came up with around four or five hundred records, and we were quickly able to ascertain that yes, these records look in sync with what the requester is looking for. We usually do some sampling of a few records here and there to see if the records look relevant and the keywords look relevant. 
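The Boolean combination described here, a vehicle term AND any of the stops, with the vehicle variants joined by OR and the whole thing limited to a date range, can be sketched in a few lines of Python. This is only an illustration: the function names, the example date range, and the operator and date syntax are assumptions, loosely modeled on common keyword-query languages, and are not the exact query CDC ran or the syntax of any tool named in the webinar.

```python
def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they match as a phrase."""
    return f'"{term}"' if " " in term else term


def build_query(vehicle_terms, place_terms, start, end):
    """Build (vehicle AND (place OR place ...)) OR ... limited to a date range.

    The 'sent>=' / 'sent<=' date syntax is a hypothetical placeholder.
    """
    places = " OR ".join(quote(p) for p in place_terms)
    clauses = [f"({quote(v)} AND ({places}))" for v in vehicle_terms]
    return f"({' OR '.join(clauses)}) AND sent>={start} AND sent<={end}"


# Keywords taken from the bus example; the dates are made up for illustration.
query = build_query(
    vehicle_terms=["bus", "motor coach"],
    place_terms=["Milwaukee", "San Antonio", "Dallas"],
    start="2020-10-01",
    end="2020-11-30",
)
print(query)
```

Keeping the place list in one variable means that adding another stop such as Laredo widens every clause at once, which is roughly what makes a query like this easy to tune while sampling the results.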
So in this instance, we were able to identify that within the subject of each email, which says COVID-19 bus contact or land conveyance, bus investigation, those things which are highlighted in yellow show those particular keywords being captured. And we also did see some emails which were newsletters, news articles, reports and things like that, which were not really relevant to the investigation. So we had to eliminate those. We call them clutter because these are noise emails which are not really relevant to the investigation. So we had to eliminate newsletters as well as subscription emails and things like that, and we were able to cut down some of those records. But still we had a large number of records, and we did know that Office 365 can sometimes go a little haywire, where it's not really accurate. Office 365 generally is not going to be very accurate when we have multiple keywords, a combination of phrases and multiple custodians. So that is when we really need to go to the next level. So in this instance, after eliminating most of these noise emails, we took these records and we put them into Nuix. And Nuix did a much better job of eliminating some records which really didn't make sense. Because when we were doing phrase-based searches and combinations of keywords, Office 365 did come up with records which were not relevant. So Nuix cut down some of those records. So that is what a moderate intensity search is. And Nuix also gave us a lot of insights; it gave us groupings and topics. In general, we were confident that the records that were coming out of Nuix and what we were seeing were in line with what the requester was looking for. But this definitely needed much more effort. It was not a simple search. And we needed to make sure that these records were relevant. And we just went back to the requester again based on the insights that the analysts have provided. 
And once we got the necessary approval, they moved forward with the dedupe and containment process, which again removed around 30% of the records. So this is what a moderate intensity search really looks like. When I say moderate intensity search, the characteristics are that it has lots of mailboxes, the record volumes run into the hundreds, we definitely need to run this process through multiple software tools, and we definitely do some analysis. And it's more than likely that we have to go back and forth with the requester before we can finalize the records. Any questions? I do see three questions on the chat. So let me go with the first question. The first question was, are you referring to e-discovery when you're using the search tool in Office 365? Is it an e-discovery tool? Yes. It is the same thing. We do use the e-discovery tool left and right. Okay, great. And the next one, I'm not sure if you can answer or if this is going to be one for Roger or Bruno. Could you speak briefly on how subject matter experts and potential custodians are determined before creating a search query? Thank you. I can take that question. So it depends upon what the scope of the request is. CDC has set up an Emergency Operations Center to handle the coronavirus pandemic, and they have teams that are set up to address specific aspects of the pandemic. So you have, for example, folks who deal with the vaccine and the folks who deal with the no-sail order cruise group. So depending upon what the request is about, if it's COVID related, and in this situation we have no custodians provided because the requester probably doesn't know who the custodians are, we would send the request to the Emergency Operations Center and say, please give us the names of the folks who were involved with the topic that this request is interested in. 
And so they would then identify either custodians or a particular mailbox that's being used by a team that would reasonably contain the records requested. Does that answer the question? I don't see anything else in the chat, so I think we'll assume yes unless we hear something more. Thank you. Oh, I'm sorry, there was a follow-up. Sure. Thank you. And for non-COVID topics, is it the same process? For non-COVID topics, it's the same process, but we would identify the program office within CDC that is likely to have responsive records and send it to them and say, we've received this request. They can provide the documents, and if they want us to conduct the search, they would provide the names of the custodians whose mailboxes we would search against. Thank you. And sometimes, I'll give an example, the requester doesn't identify mailboxes, but identifies a whole group. So a requester comes and says, I want you to search the emails of all employees within NCRD for this keyword. Well, like Srinath said earlier, we can't perform the search against the mailboxes of an entire program office, right? So if you're asking us to search against custodians for an entire program office or division, we're going to come back to you and say, you're going to have to limit it. So if you can't limit it by name, you're going to have to limit it by topic enough that the program office can identify who are the folks who worked on this particular subject matter. And then they will provide a list of custodians. Thanks, Roger. No other questions? Sure. Thank you. And if there are no questions, I can move to the next slide. So I'm going to talk about a high-intensity search here. And if you look at the picture there, the person there is me, who is bald and lost his hair because of the type of requests I got. I'm just kidding. 
So the biggest characteristic of a high-intensity search is that the scope of the request is very vague. And we run into probably thousands of records, sometimes 10,000, sometimes 20,000, sometimes 30,000, and I have seen searches going to 80,000 records. So why do we get that many records? If you look at the characteristics here, sometimes we don't have a clear custodian mailbox defined, or we have too many custodian mailboxes defined. Next, sometimes we do not know the participants in the email conversation. When we do not know the participants, it's possible that there was a discussion with many people on this particular topic, and it could go to such an extent that it becomes very difficult to identify which emails are really relevant to the scope of the request. And sometimes we also do not get any keywords, so we have to frame our own keywords based on the request. Most of the time when a high-intensity search starts, it's an unknown unknown for us. Looking at the request, we can say that this is probably a high-intensity search, but it is once we run the search through Office 365 and see the volumes of records that we figure out that, yes, this is going to be a wild goose chase where we're going to get too many records. And what really makes it more complex is that sometimes we have requests where we have to use Boolean operators like OR and AND to concatenate the search terms, and the results can be very skewed. And finally, having too many attachments in the emails can also complicate things. Sometimes we see slides and presentations which contain the search words but have absolutely no relation to the scope of the request. So that is what I mean by a high-intensity search. Any questions related to the high-intensity search? Are there any questions on the phone? 
There's one. No, I can't see it there. I just wanted to add one thing. At least from my experience with these high-intensity searches, I think Srinath was pretty generous when he said that a record averages four pages. It could be one page, or it could be as much as five or six pages. And when we talk about an email string, that is just the email string itself without including attachments. So if it has three or four attachments, and these attachments also average four pages, you see how that escalates into a lot of pages just on that basis. So when we say we located 5,000 records, that doesn't translate directly into pages. It could be 5,000 pages, or 5,000 records could be 25,000 pages. It all depends upon the size of each record and how many attachments it contains. And what we've agreed with some requesters is to remove the attachments. So they just want the raw emails, and we would negotiate and say, you can always come back and ask for specific attachments after the fact. That could also help us process your request in a much more timely way if we don't have to process the entire record, including attachments and everything else. So that's all. Thank you very much, Roger, for reminding me of that issue. I forgot about it. Thank you. If there are no questions, we can go to the next slide. I will pause for 15 seconds so you can read. As everyone can read, this request is an example of a high-intensity search. We had a requester who was interested in all responsive records related to procedures, guidelines, and discussions around coming up with guidance on wearing face masks to slow down the spread of COVID-19. In this instance, there were no specific keywords given to us, and we had to come up with a set of keywords. Next, we had to identify the custodian mailboxes, as Roger has already mentioned. 
We get that information from the SME, who gave us guidance on the custodian mailboxes. And next is the date range, which either comes from the requester or is given to us by the SME. In this instance, the keywords were identified as face masks, face coverings, respirators, and N95s. These were the four keywords that we were given. So when we do a search here, what it really means is that I have to locate records which contain face mask or face masks, face covering or face coverings, respirator or respirators, or even respiratory, and finally the N95 masks. So we do specific searches as well as wildcard searches, and we had to concatenate a string to come up with the keyword search. Any questions so far? Any questions on the line? I can move to the next slide, which shows the results of the search. Based on the search that was performed here, we came up with 22,877 records. I'm only talking about unique emails; it's not the number of pages. And these were the insights that we got from the preliminary search that we did in Office 365. Mask came up with 12,000 records. Respirator, 12,000. Face and cover was 6,000. So the total was around 22,000 records. And keep in mind that these were only emails sent by these four users, who were high-level officials within CDC. We only had four high-level officials, and we are not even talking about emails which were sent to them. So the volume of records here was very high, 22,000 records. Looking at these results, I do know that, since it was four mailboxes, with dedupe and containment I could probably eliminate around 40% of the records. And the search analytics I'm providing here are only at the Office 365 level, which is probably around 80% accurate at this point, because I see so many records. 
So if I were to run these records through Nuix, dedupe and containment, I would probably come down to 10,000 or fewer. I cannot determine the exact number, but it's going to be less than that. But still, that's a huge number of records. So 10,000 records at even five pages each could translate to 50,000 pages. I don't think it is humanly possible for any of our analysts, or even the requester, to go through 50,000 pages of data, digest and comprehend this information, and come up with some reasonable analysis. So at this point, we simply make a determination, letting the analyst know that the scope is too broad and we definitely need to narrow down the request. These are the insights that I see and these are the keywords that I see. If the requester wants to make a determination based on how the insights look, I go ahead and share all the information. So the analyst goes back to the requester and tries to narrow down the scope in this instance. However, in some instances, let's say we come up with 7,000 or 8,000 records and there are lots of custodian mailboxes, like 15 or 16; then I do know that there's a potential for a lot of duplicates. In that instance, we do go through the dedupe process and probably run the records through Nuix, and if it's less than 1,000 records, then we give it a shot and go through the subsequent steps to prepare the data. Any questions on this so far? So I have one technical question. Someone asked, if applicable, how do you use Nuix in high-intensity searches? Sure. So the way we use Nuix in high-intensity searches is that after we get the first set of data from the Office 365 search, which is the eDiscovery search, we take the whole data as a PST file, or even as individual emails if it's in the thousands. 
So if it's in the tens of thousands, we don't take individual emails; we take the whole PST file, put that data into Nuix, and run the same search terms that we ran in Office 365. Nuix does a better job with elimination. It has a mechanism to eliminate some records which were probably missed in Office 365, and it brings the record count down. That's the first thing. And sometimes it can also eliminate some duplicates. So we definitely see a reduction. It just depends on the number of custodian mailboxes, the number of keywords and so forth, so we cannot say in advance what percentage Nuix can eliminate that was not eliminated by Office 365. The next thing is that Nuix has much higher analytical capabilities. It provides analysis of which mailbox has a lot of emails being sent, which domains or which organizations are sending all these emails, which email addresses are sending emails, which are in the CC, which are in the BCC, and which are in the To. It also provides us insights on the dates or timeframes around which we see a lot of emails going out. I would call them heat maps for that analysis. And it also separates the records into subsets of data, saying that, okay, if you're trying to search for mask, face and face mask, for these three keywords we see around 500 records. And then it has its own way of analyzing groups, so it uses different subsets of groups. So with all the analytics, it provides sufficient analytical information for the requester and the analyst to make an informed decision to narrow down the scope. That is how we primarily use Nuix in a high-intensity search. Thank you. I have another question that might be better for Roger or Bruno. Someone on the YouTube chat asked, do they need to request all related attachments if they want attachments? 
So I think that means, would you assume attachments unless you hear otherwise, or do people actually have to specifically ask for attachments if that's what they want? Great question. Unless you say you don't want attachments, the search would include attachments. So the default is yes. The default is yes, unless you say no. Great. And then the second question came in, and it has to do with records retention. How far back do archives go for a FOIA search? Do they follow records retention schedules and get destroyed on a schedule like paper files ordinarily would be? And do searches recover files that have been deleted by individuals because they were no longer needed and not required to be retained? Long question. I'm not a records retention expert, but this is what I'll say. When we receive a request for documents, primarily emails, where someone says I'm looking for all email correspondence from X starting from 2005 or from 2000, if we can search against the custodian's mailbox, we would start from that timeframe. If those records are still contained within their mailbox, they're going to be pulled. If they're not there, we won't be able to pull them. So can we pull data that has been deleted from a person's mailbox? Unless Srinath corrects me, I don't believe that the Office 365 tool he's describing can do that. If the emails are for somebody who is in the Capstone program, which is a few folks whose emails are basically archived forever, and I don't mean that literally, but pretty much forever, then we can search against any date range. So for example, a Capstone official's emails are archived even though he's gone. So 10 years from now, if someone makes a request for COVID related documents, we're going to find them because everything in his mailbox was captured. Okay. Yes, thank you. And I'll just speak for NARA and records management. Electronic records are scheduled like paper files generally. 
That is a true statement. Thank you. Yeah. Thank you, Alina. And thank you, Roger. Yes. I just want to add one statement to this, which is that there is a records retention policy within CDC, and each mailbox is treated differently. Our records liaison and other records retention staff within each division of CDC can give us clear directions on how many months or how many years a particular mailbox can be retained. So based on that, we can quickly let the requester know that these mailboxes' emails are not going to be found, or, as Roger has said, for a Capstone official the records retention period is much longer. And this is Roger again. I just wanted to add something on the Nuix tool because someone had a question about that. What the Nuix tool does, that the e-discovery tool in Office 365 that Srinath described doesn't do, is that it's able to better analyze the data. And by being able to properly analyze the data, it helps us actually find a needle in the haystack. That is what the Nuix tool is supposed to do. We don't use it for, let's say, deduplication, because the e-discovery feature in 365 can do that. It's more to analyze the data. So for example, what Srinath was talking about, heat maps, where most of the email traffic is coming from, it categorizes the records. We've had the Nuix tool for quite a while, but CDC has definitely increased its usage of e-discovery tools since COVID. And so we're still a work in progress, and we continue to explore the functionality of the system, but it is a much, much more robust system for analyzing data than the e-discovery tool is. It's helping us locate records. Thank you, Roger. Martha, you have a question on Twitter, right? Correct. We've been monitoring Twitter, and so we do have one question. What does CDC have available, or what will it make available, to help requesters better understand who the custodians would be for particular emails? Are there org charts or directories? 
They said it seems like CDC is placing the burden on the requester to know that. Well, my position is that in a lot of situations the FOIA requester may not know who the custodians are. And sometimes they do. To the extent that a FOIA requester does not know who the custodians are, I tell my team, we should not go back to them and ask them for names of custodians because they wouldn't know. For example, if somebody makes a FOIA request and says, I want any correspondence sent by the chief of staff for Governor Cuomo, this is the person's name, to anybody in CDC, well, they don't have to know who the recipient in CDC is. They've given you the name of the person who sent the email. And so then we can go to the EOC and say, hey, did anybody have any contact with the chief of staff for Cuomo? And I will say, the EOC is made up of employees who are detailed for a period of time and then they leave. So it's a revolving door. The people who are in the EOC today may not be there 60 days from now. So it continues to change. There's not a list of folks who are there for the entire duration of the pandemic. They go on detail for 30 to 60 days and they go back to their program office. Very few of them stay on for much longer periods. So that is part of the give and take. And to the extent that we are placing a burden on you to look for custodians, and I'm sure it has happened, I would own that and apologize for that. We might be doing that in a situation where we've identified that you would know who the custodians are because of what you say. And in a situation where you don't know who the custodians are, if you properly describe the subject matter, then it makes it easier for us to identify the custodians. 
I mean, for example, if you say, I want all correspondence between CDC and CBP with regard to some particular topic, right? If the topic is scoped enough, we will be able to identify the folks within CDC who had any discussion, maybe not everyone, but at least the heavy hitters who were involved in the discussion. With regard to whether there's going to be an org chart, again, I'm not sure an org chart necessarily will be helpful unless you're talking about the heads of the units, who don't change, but even then, they change. I mean, I think the leadership over the EOC has changed at least three times, I think, to date. So they change. So I think what is important is to be very clear about what it is you're asking for. You don't have to give us clarity on the custodians, but be clear on what it is that you're looking for. And then we can take it from there, and to the extent that even then it's not clear who the custodians are, we'll come back to you and ask you to refine your ask so that we can identify who is having a discussion about what you're asking for. Yeah. Thank you, Roger. And it was a good reminder of the title, Finding a Needle in a Haystack, which we forgot, actually. Yeah. And I do see one question here. Yes. How do we eliminate duplicates? It's already been covered in the discussion. We can use any of the tools, like Office 365 or even FOIAXpress, to eliminate duplicates. However, for containment, right now our capabilities are limited to using FOIAXpress. Thank you. Any questions on this? Yeah. High-intensity search. Got any questions on the line? And that covers the chat for now. Thank you. Yeah. So before I move to the next slide, what I would say is that any high-intensity search is bound up with lots of complexities, and a decision has to be made whether we move forward with the request or we hold back and send it back to the requester. 
So that decision is made based on the number of custodians, the type of emails that we see, whether the keywords are very generic, and so forth. So sometimes there is some discretion in when we have to go back to the requester to let them know that we cannot perform the search. Yeah. I can move to the next slide. So I have covered the different categories of searches based on the technical complexities that we have seen so far. The next section is the issues that we see when we perform searches. We have made an attempt to identify the problems and help the end users understand the problems that we face, to see if we can find some solutions and come up with better search results. So the first issue that was identified was broad scope. The second was high record count, and the third is average data quality. And the three of these are pretty much related. I can quickly go to the next slide, where I'll be talking about the broad scope of the search. I think by now most of you are pretty much aware of what broad scope really means. The characteristics of a broad scope are too many keywords, very generic keywords, like just searching on autism or searching on SARS or searching on COVID, too many mailboxes, and a very large date range. Sometimes we get requests where the date range is a few years or a few months, and there are far too many results, so it becomes really hard for us to identify the relevant records. Just to put it in perspective, look at the picture there. It's a rainy day and we have so many umbrellas there, but in reality, we only need one umbrella to satisfy the request here. In that sense, one umbrella is good enough for us to identify the records and narrow down the scope. Any questions related to this topic of broad scope? Any questions on the line? Nothing new, thank you. Yeah, next slide. 
And as we have already discussed for the high-intensity search, you see very high data volumes, and it really becomes very difficult for us to identify which is the right data and which is the wrong data unless the requester is really specific about what they're looking for. Some requesters are very good at telling us what they are really looking for, but sometimes I cannot get into the requester's mind to really understand what he or she is looking for. That is what makes it complex. When such a situation arises, it so happens that we get so much data that we cannot know which is the real data in it. Just to give a perspective, look at the picture: we have so many records there, and we do not know which is the right data in the picture. Any questions? There are no questions from the line. No? Yes. I can move to the next slide. And I think this is a relatively interesting topic. So I'm using this term called average data quality. When we do a search based on, say, three keywords, sometimes we do see records being retrieved, and when we analyze the records, we know that these records are not really what the end users are looking for. But since the request did not really specify that the requester needs these records or those records, we still have to deliver these records. I can give you an example of a request where we were asked to search for records in all mailboxes at CDC's Guatemala office. So we did the search on Guatemala and ICE; ICE stands for Immigration and Customs Enforcement. What we found was that we were getting all emails where the word Guatemala was showing up in the email signature and the word ICE was showing up in some attached document, a PDF or a Word document. And we absolutely knew that these records were not what the end users were looking for. So this is what it means. 
So we have the quantity of data here, but the quality is very poor because we are very certain that we are not getting the right records. As another example, somebody asked for response for COVID. When we run a search for response for COVID, we do have a specific branch in the EOC which is looking at the COVID response, so people have COVID-19 Response in their email signatures. So what happens is that all the emails with signatures of COVID-19 Response show up. And I do know that these are not the records that they're looking for, but I still have to deliver them because these records are what the requester requested. So what I'm trying to say is that the quality of the search results is poor because of the keywords that are being provided or because the scope of the request was not really clear, if that makes sense. Any questions on this? Currently no questions in the phone queue. There's one chat question. It's a bit broader. We can save it or we can take it now. I think it's for you, Roger. Okay, we can take it now. Okay. I've seen the CDC FOIA annual report. You received approximately 2,400 requests last year. How many FTEs do you have dedicated to doing FOIA searches for this number of requests? Dedicated to FOIA searches is one. That's it. We're working on getting a contractor to assist us, but right now it's just one person doing the searches. Okay. Thank you. Sure. I just wanted to add, as far as this average data quality goes, in a situation where the keyword that has been provided by a requester is so generic that it's going to be found in, for example, the signature block, one way to limit that would be to say the keyword should appear in the email content or in the subject. That would narrow it down, so that the word should appear in the body of the email or in the subject. Or, I think, we can do searches within a certain number of words. 
COVID within five or 10 words of no-sail order, or some other word, just so that we make sure we find whatever it is that you're looking for, right? Because at the end of the day, you, the requester, are seeking information that is useful to you. And to the extent that we are looking at and reviewing documents that are of no use to you, that is a waste of our time. That's a waste of your time. That results in a delay of the response to you, because at the end of the day, you want information that is useful to you. And a lot of times when it comes to e-discovery searches, you as a requester can do a lot to help us make sure that we have good data to provide to you, by the way you scope your request and to the extent that you make it easier for us to be much more precise in identifying the documents that are responsive to your request. Thank you, Roger. Thank you, Roger. If you don't have any additional questions, we can move to the next section. So far, I've done a lot of complaining about issues, and we have done some analysis and made some observations. We are glad that we have come up with some recommendations that we are willing to share with the end users, and we will also take any input or advice that you have for us so that we can come up with better searches. So hopefully this last section is going to be more intuitive and useful to all of you. Let me start with the first aspect of an improved EES search, a well-defined scope. So what does a well-defined scope really mean? I'll break this into three parts. When I say well-defined scope, what I mean is that we do not want any ambiguity in the scope. The requester needs to be very precise and concise about what he's looking for. As long as the requester is very precise and concise about what he's looking for, I'm very confident that we can get very good results. 
The second is that if a requester wants multiple searches performed within a single request, the recommendation is to split each search into its own line item within the request. It would be even better if each sub-search were made its own individual request. That way, each search is focused on one objective, which really helps us. The last recommendation is about newsletters and subscriptions: most of the searches I have observed pull in a lot of them. We do see many requesters explicitly stating that they do not need newsletters and subscriptions and are looking only for conversations, and that is really appreciated. When these three or four items are taken care of and the scope is well defined, the search becomes much more predictable. It saves us a lot of time and, as Roger has stated, it produces much more productive results and helps the requester get the right data. Any questions on this? There are currently no questions on the phone. Thank you. And no new chat questions. Okay, thank you. Let me go to the next item: limiting keywords. Sometimes we get requests that give us keywords and say, "We want you to search on these keywords." They give us one subset of keywords and another subset of keywords and say, put an OR between this subset and that subset, or an AND between this and that. When we have multiple keywords coming in, I know that the search results will be very diluted and that we will get a much more generic set of data. It's going to be a needle in a haystack, that's for sure. If the requester can be concise and precise, saying "I'm only looking for this keyword or this keyword," that really helps us narrow down the search.
The biggest recommendation, I would say, and this is the second recommendation, is to go with a phrase search rather than an AND or an OR search. I can give you an example of a phrase search. We had a request asking for "testing for COVID-19 in long-term care facilities." That is a very good phrase, but searching for that exact phrase will not necessarily return any records, or all of the records, because people can word the same thing in different ways. Take "testing for COVID-19 in skilled nursing facilities." The way the search was performed, rather than "testing" we used "test*", so the word can take a suffix: it could be test or testing. Next, we looked for that term within five or ten words of a reference to COVID, which could also appear as SARS, NCOV-19, or corona. Additionally, a skilled nursing facility could be referred to as SNF, long-term care facility, LTC, skilled nursing home, and so on. So we need to get creative with those words and build a phrase search that adds all the synonyms and captures them. My observation has been that, rather than an AND search of "COVID-19" and "skilled nursing facility" and "testing," when we did this phrase search, looking for words within a certain number of words of each other, we got much better and much more accurate results. So that is definitely recommended over an AND/OR search, because an AND/OR search can range very widely. If there is an email with a thousand pages, the first word could be at the start of the body of the email and the last word could be somewhere in an attached Excel or Word document. So that record may not be relevant.
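The difference between an AND search and the proximity-style phrase search described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the synonym lists, and the ten-word window are hypothetical, and this is not how the CDC's search tool is implemented.

```python
import re

# Assumed synonym list for the facility term, taken from the examples mentioned
# in the discussion (SNF, LTC, long-term care, skilled nursing, nursing home).
FACILITY_PATTERN = re.compile(
    r"\b(skilled nursing|nursing home|long[- ]term care|snf|ltc)\b"
)
# Assumed prefixes for references to COVID (COVID-19, SARS..., NCOV..., corona...).
COVID_PREFIXES = ("covid", "sars", "ncov", "corona")


def tokenize(text):
    """Split text into lowercase words, keeping hyphenated terms such as covid-19."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def is_test(token):
    # Wildcard "test*": any word that starts with "test" (test, testing, ...).
    return token.startswith("test")


def is_covid(token):
    return token.startswith(COVID_PREFIXES)


def mentions_facility(text):
    return FACILITY_PATTERN.search(text.lower()) is not None


def near(tokens, is_a, is_b, max_gap):
    """True if some word matching is_a lies within max_gap words of one matching is_b."""
    pos_a = [i for i, t in enumerate(tokens) if is_a(t)]
    pos_b = [i for i, t in enumerate(tokens) if is_b(t)]
    return any(abs(a - b) <= max_gap for a in pos_a for b in pos_b)


def and_match(text):
    """AND search: all three concepts appear anywhere in the record."""
    tokens = tokenize(text)
    return any(map(is_test, tokens)) and any(map(is_covid, tokens)) and mentions_facility(text)


def proximity_match(text, max_gap=10):
    """Proximity search: test* must fall within max_gap words of a COVID term."""
    tokens = tokenize(text)
    return near(tokens, is_test, is_covid, max_gap) and mentions_facility(text)


focused = "We are expanding testing for COVID-19 in each SNF next week."
scattered = (
    "Test of the new mail server. " + "filler " * 50
    + "COVID-19 staffing update. " + "filler " * 50 + "LTC roster attached."
)
```

Both sample records satisfy the AND search, but only `focused` satisfies the proximity search, because in `scattered` the three terms are spread across unrelated parts of a long record.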
That kind of irrelevant hit is something we can eliminate when we do a phrase search. The third thing is that if the requester is coming up with keywords, it is always better if they can prioritize which keyword takes precedence. If they give three keywords, I recommend ranking them: this is the first keyword, which takes the most precedence, this is the second, and this is the third, which takes the least. Because when we run the search and get too many records, the ranking lets us break them into subsets. That helps us present the information to the requester: for this first keyword you are seeing these records, and this is taking more precedence; if you want a different keyword to take precedence, we will give you that subset of records. So I'm trying to help the requester work out what they are really looking for, rather than having keywords joined with AND and so forth. Those are the things that really help us: prioritizing the keywords, doing a phrase search, and keeping the keywords as few as possible. Any questions on limiting keywords? There are currently no questions in the phone queue. I wanted to say something here, because I want to make clear to everyone who is listening that there is no requirement, when you submit a FOIA request, that you provide us with keywords. The point here is that if you do provide us with keywords, limit the number, because sometimes folks give us a whole page or two pages of keywords. One of the most important things you can do is to have a well-defined scope. If you have a well-defined scope, we will be able to find the right keywords ourselves, because the keyword that you might use might not be the term that the folks having the conversations would use internally.
So you might say "long-term care," and maybe they use the name of the facility, or they just say LTC or whatever it is. If the scope is well defined, that's a very good start. If you want to provide keywords, limit the number, but you're not required to give us keywords. You're also not required to give us custodians, but if you do want to give us a list of custodians, keep it short, because the more custodians you provide, the more records we are going to pull and the more duplicate records there will be. If there are 10 or 15 custodians and all of them are CC'd or are participants in a particular discussion, that one email string is going to be in 15 or 20 custodian mailboxes. So, just a point of clarification: you don't need to give us keywords and you don't need to give us a list of custodians, but if you do, limit it. Thank you, Roger. That was very useful information and a very good reminder. I can go to the next item, which is avoiding generic keywords. I can give an example here. I see a lot of requests coming in with "autism," and I had one request where we were asked to search the mailbox of a custodian who is an autism researcher. When we searched his mailbox, all his emails were about autism, so we came up with 30,000 autism-related emails within a span of three months. It's like searching a stockbroker's email for the word "stock." That is the type of request that is very generic. In this instance, the recommendation is: if you are giving generic keywords, please also provide some supplementary keywords that will help us narrow down the search.
So if somebody sends "autism" and we are searching the mailbox of an autism researcher, there is probably a medicine or a condition, something more specific, that identifies the subset of autism records you are looking for and can narrow down the search results. That really helps us in the long run. One more example is the beef processing plant guidelines; there, too, we had lots of keywords that were very generic. So if you give us generic keywords, make sure you also give at least one supplementary keyword to narrow down the results. I can move to the next slide, which is limiting the number of custodians, and Roger has already gone over it. The more custodians you have, the more emails and the more duplicates we need to go through, so I'm hoping I don't need to repeat that. The fewer custodians you give us, the fewer records you get, and the easier it is for us to really narrow down the search results. The last item is reducing the time span of these searches. Sometimes I see requests with a four- or five-year time span. Sometimes we find records and sometimes we don't, because of the records retention policy, which is different for each mailbox. However, we do notice that even when we run searches for a year or a couple of months, we can get 20,000 or 30,000 records.
So it is always better to limit the time span and be specific about the period you are looking for. If there was an event, a window of about a month around it, 15 days before and 15 days after, probably makes sense, because that is when there is a lot of noise and the most hits related to that activity. Or take a topic that comes up and that people talk about for just a month or two. If that one or two months can be identified, it really helps us identify the right set of records. These are some of the improvements that will really help us run better searches for requesters. If anybody has any questions for me on this section, I'm willing to answer them. So let's turn it over to Alina. We have a question, actually, from our side, and this is possibly also a question for Roger and Bruno: would you be able to talk a little bit about the role of the FOIA public liaison, and whether, when a very broad search is submitted, the requester is able to reach out to the FOIA public liaison, who would be willing to help the requester draft a well-scoped request? Yes, sure. I know the question is to Roger, but I do want to answer the question. Sure. So with CDC, yes, I've had requesters contact the FOIA public liaison, which I think is now Bruno, who is listed as the FOIA public liaison, or they have reached out to me. I'm more than happy to work with requesters to scope out the request, but from where I sit, it is much more advantageous for them to work with the analyst. When we get a FOIA request, it is assigned to an analyst, and that analyst handles the case from cradle to grave. At some point in the process, if it's COVID-related, I'm going to see that request and review it, and then it gets released.
Oftentimes the person who knows the day-to-day, the ins and outs, and who has the most detail about the request is the analyst. So my preference would be to start by working with the analyst. If there's an impasse and you have to escalate it, I'm more than happy to jump in. But if you start with the analyst, in most situations they are able to work with requesters to reformulate the request in a way that is satisfactory to both sides. Sometimes there is an impasse, and sometimes requesters might even reach an impasse with me; it just depends on what you're asking for. For example, if you say, "I want you to do a search against all the mailboxes of a particular program or division," we're going to have an impasse, because I'm going to say we can't search against 300 or 400 custodians. Cernath cannot push a button to do that. He would have to manually enter every single mailbox for every single employee in that program or division. That would be an unreasonable request, and it would take an unreasonable amount of time. So certainly, yes, you can contact the FOIA public liaison, you can contact me directly, or you can contact Bruno to help you reformulate the request, but the person you really should start with is the person assigned to your request, and that person's name is always in the acknowledgement letter that you receive. You have that person's contact information in your acknowledgement letter, and it's best to start with them. Okay, thanks. Mark, I think we have another question in the chat. Yes. It was explained earlier that containment tools pull the last email string. However, what happens if multiple strings are created, with recipients and CCs added or dropped and conversations going in multiple directions? Will the program keep those breakoff strings, or will they be eliminated by the program? That's a very good question, and I can take it. Yes.
If there is a break, meaning somebody changes the content of the email or adds a new recipient or deletes recipients, the chain is broken and another record is created. When the analysts look at the records, if they see it's the same thing and the containment just didn't catch it, they can go ahead and remove the record if required. But if the chain is broken, it does create a new record, so containment will not work in that instance. Yeah, just to amplify what you're saying: if all the email correspondence is not contained within one email string, then any separate emails are separate records. They are going to be pulled; they're not going to be eliminated. Okay. And then we had a follow-up, I think on the same basic topic: could you please discuss the topic of the most comprehensive email thread? Let me attempt to answer that question. I'm going to assume that when you say the most comprehensive email thread, you mean the email thread that contains every single piece of email correspondence about a particular topic, if that exists, because sometimes it may not. To the extent that an email thread contains every single discussion about that subject matter, I'll assume that is the most comprehensive one, and to the extent that the containment system identifies it, it pulls that record, so that the requester receives every single communication about that subject matter. But if one email string is not comprehensive and there are multiple ones, subsets that went in different directions, then those are going to have to be pulled, and they're not going to be considered dupes or near-dupes, because they're not. So when you say they're going to be pulled, you mean they will be part of the response? Absolutely. They will be part of the response. Exactly.
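The containment behavior described in this exchange can be sketched roughly as follows. This is a simplified model, not the actual tool: it assumes each email is represented by its participants plus the ordered messages it quotes, and it treats an email as "contained" only when a longer email with the same participants begins with exactly the same messages. The `Email` type and the function names are hypothetical.

```python
from typing import NamedTuple


class Email(NamedTuple):
    participants: frozenset  # everyone on the From/To/CC lines
    messages: tuple          # quoted messages, oldest first


def is_contained(inner, outer):
    """True if `outer` carries everything in `inner` plus later replies,
    with the same participants (a changed recipient list breaks the chain)."""
    return (
        inner.participants == outer.participants
        and len(inner.messages) < len(outer.messages)
        and outer.messages[: len(inner.messages)] == inner.messages
    )


def most_inclusive(emails):
    """Keep only the emails that are not contained in some longer email."""
    return [e for e in emails if not any(is_contained(e, other) for other in emails)]


group = frozenset({"person_a", "person_b", "person_c"})
first = Email(group, ("m1",))
reply = Email(group, ("m1", "m2"))
final = Email(group, ("m1", "m2", "m3"))
# A reply that goes in a different direction after the first message.
branch = Email(group, ("m1", "m2-alternate"))
# The whole thread forwarded to someone new: different participants, so a new chain.
forward = Email(frozenset({"person_a", "person_d"}), ("m1", "m2", "m3", "FYI"))

kept = most_inclusive([first, reply, final, branch, forward])
```

In this model `first` and `reply` drop out because `final` contains them, while `branch` and `forward` stay in the response set as separate records.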
And I want to add on to that as well. This is Bruno Vian at the CDC. This is from my experience using the tool, and Cernath, you can back me up. As far as duplicates and containment are concerned, the tool is very sensitive. I've had analysts come to me and say, "These are duplicates, why is it not catching them?" But with any sort of change here or there, if there's an attachment missing, if it's a forward, if there's any slight change, the tool is very sensitive and will include the email in the responsive document set. Let me give an example. Let's say Bruno and Alina and I had an email conversation about having this webinar. We had emails back and forth, and there's a final email thread of that discussion. Then I forward it to Cernath and just say "FYI." I don't even say anything else; I just forward the whole email string to Cernath. Now I have introduced Cernath, and that's a separate chain, because he was not part of our conversation. I just forwarded the whole email string between myself, Bruno, and Alina to Cernath, and we no longer have one comprehensive email. We've created two separate ones. Thank you, Roger, and thank you, Bruno, for reminding us. I'll just add one thing to what Bruno said: if somebody adds even a small line break within the email chain and forwards it to somebody else, it can create another chain altogether. So we'll end up having the same content or the same scope but, as Roger said, multiple subsets of data from the same information. Thanks, everyone. I don't see anything else in chat right now. Thank you. I would like to open this up to anybody within the user community who's willing to give us recommendations that can help us a little bit. We can probably have a few minutes of chat or discussion to see if they have any suggestions for us, and we can take those suggestions and discuss them internally within our CDC FOIA office.
Ladies and gentlemen, if you would like to make a comment over the phone or you have a question, you may press pound two on your telephone keypad to enter the queue. I think you've done such a great job answering questions as we've gone along. Everyone has been silent at this point, but we'll give everyone a couple of minutes to absorb, and I don't know if you want to ask Michelle to go to the next slide, where the contact information is. Yeah, sure. Perfect. Thank you for the comment in chat that the information session was very helpful for understanding your end in order to work together. Thank you very much. Yes, so if anybody has any additional questions related to technical aspects, please feel free to reach out to me, but if it is related to any business or administrative aspect, I recommend that you reach out to Roger or Bruno, and they should be able to answer. Martha, do we have any other questions on the chat platform? Nothing from our colleagues who are watching the YouTube chat right now. Thank you. I just saw another chat question come in: does the CDC have an analyst do manual responsiveness checks to further reduce duplicate emails and attachments within a thread? I will direct this question to Bruno. I'll take that one. So this is just the first part. Every set of records that Cernath pulls goes to an analyst, who goes through and processes the records before they are released to the requester. During that process, they look for duplicates, because the tool is not perfect. At the end of the day it's a computer; whatever you put in is what you're going to get out, so you still need that human eye to make sure that everything is responsive. Yes, for every package that goes out, a person will still look at it and do that analysis, and they look for duplicates. And again, as much as a computer is imperfect, we are too.
There may be duplicates that we miss, but we make every effort to catch them, and not just for the requester: it's easier on us if we catch the duplicates, because we have to go line by line in review. So we definitely do that review; after Cernath's process, everything goes through another review before release. In addition to the analyst who is assigned to review it, when I'm reviewing a COVID record I'm also looking at everything. If I see duplicate emails that are contained within a comprehensive thread, I might flag them as duplicates or I might just leave them in, but I have to make sure the processing is consistent. That's the biggest thing I have to watch for: that a separate email that is contained within a comprehensive email is not processed differently from the comprehensive one. I have to make sure that is done accurately. As far as attachments go, the tricky part is deciding when an attachment is a duplicate. Say I have email correspondence between CDC officials and they attach a document, "please review and edit the CDC school guidance," for example, and then that same school guidance draft is sent by, let's say, Dr.
Walensky, and she sends it to, let's say, the White House and says, "This is our current draft of the school guidance." I can't say that just because we have released it in the internal email string it's the same thing and therefore a duplicate. No, it's not, because that email chain to the White House is a specific email and that attachment belongs to that email string, so the document itself is not a duplicate. It's included even though it's the exact same document that Dr. Walensky was given internally by her staff. We're not going to mark it as a duplicate just because it's the same document attached to a different email. When we talk about removing attachments as duplicates, it means that the email and the attachment are both the same. Everything should be the same; otherwise it's in. If the email string is the same but the attachment is different, it's a new record. If the email string is different and the attachment is the same, as we saw earlier, it doesn't matter; it's still a different record. And Roger, this is Bruno again. This goes back to the question that Roger answered at the beginning of the presentation. In the FOIA world, the email and its associated attachments are considered a record. That's why the default is that if you make a request for emails, the attachments are going to come with them; if you say you don't want them, we can exclude them. But a record in this instance is the email and any associated attachments. So even if the body of the email is just a forward, or the attachment is the same with no changes made to it, if Roger sends a draft of a document to five different people, it goes to five different people: the attachment is the same, but the text may be different if it's forwarded or replied to, even though there are no changes to that attachment.
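The rule that a record is the email together with its attachments, and that two records are duplicates only when everything matches, can be illustrated with a small Python sketch. The `record_key` function, the sample addresses, and the choice of SHA-256 are assumptions made for illustration; they do not describe the CDC's actual de-duplication tool.

```python
import hashlib


def digest(data: bytes) -> str:
    """Hex SHA-256 digest, used here as a stand-in for 'identical content'."""
    return hashlib.sha256(data).hexdigest()


def record_key(body, recipients, attachments):
    """Build a comparison key for one record: the email text, who it went to,
    and every attachment. Two records are duplicates only if all parts match."""
    return (
        digest(body.encode("utf-8")),
        tuple(sorted(r.lower() for r in recipients)),
        tuple(sorted(digest(a) for a in attachments)),
    )


draft = b"School guidance, current draft"  # the same attachment bytes in both emails

# Hypothetical internal email asking staff to review the draft.
internal = record_key(
    "Please review and edit the school guidance.", ["staff@example.org"], [draft]
)
# Hypothetical external email sending the same draft to a different recipient.
external = record_key(
    "This is our current draft of the school guidance.", ["external@example.org"], [draft]
)
# An exact copy of the internal email, for example from a second custodian's mailbox.
internal_copy = record_key(
    "Please review and edit the school guidance.", ["staff@example.org"], [draft]
)
```

Here `internal` and `internal_copy` compare equal and would be treated as duplicates, while `external` is a different record even though its attachment digest is identical to the internal one.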
I think we have another couple of chat questions. Yes. So this one gets to communication between the analyst and the requester. Will the analyst reach out and say, "Your request is probably high intensity; can we talk about scope to get it to moderate or low?" Or will you do the search first, before you determine that it's high intensity? I guess the question is, when a request comes in, is there always a search conducted, or can it be determined to be high intensity before the search is conducted? I think that's the question. Yeah, this is Roger. At least from my experience, some requests will on their face be a high-intensity search without our having to do a search. But in some situations I've told my staff that before you go back to the requester and say this is overly broad or vague or burdensome, we need data to support that. So we should do a preliminary search to see what we pull, because it might turn out that there's not much discussion there. Sometimes we do the search and realize there weren't a lot of conversations around the subject matter: it seemed broad on its face, but there wasn't much conversation. If we do the search and then determine it's a high-intensity search, Cernath would make that known to the analyst, and the analyst would go back to the requester with that information to help them reformulate the request. But sometimes it's clear on its face, and I go back to this example: "I want all correspondence that the CDC had with the White House from January 1, 2020 through December 31, 2020." On its face that's going to be a high-intensity search, because there are going to be multiple people and multiple email domains, and we don't need Cernath to run the search to tell us that. So "it depends" is the answer. One question that someone had regarding duplicates: if the
recipient changes but the email thread is identical, the thread with the different recipient would be treated as a non-duplicate. Is that correct? The content is exactly the same, but, as we define it, it's a different email. Okay. I don't see anything else in the chat right now, unless I've missed something, Alina. No, I don't see anything else either. I think we've asked all the questions. Michelle, does anyone want to ask one orally on the phone? No, I do not see any questions or comments on the phone side. Okay. All right, Cernath, any other wrap-up words before we say goodbye to everyone and let them get on with their day? Yeah, sure, Alina. I would like to wrap this up by saying that we would like to avoid the needle-in-a-haystack situation, and we can as long as the scope is finalized and very concise. I think the biggest takeaway from this session is that if the requester can provide us the right scope, it makes their life and our life a lot easier. Thank you for giving me this opportunity to present at today's session, and I thank our partners at OGIS for this opportunity to present this information. Hopefully this was a helpful session, and it helps us even cut down on our search results. Thank you. Thanks. Roger and Bruno, any other parting thoughts before we say goodbye to our folks? Bruno, do you want to go first?
Sure, I will. I just want to say thank you again to OGIS, and I would recommend that any other FOIA offices reach out and use their services as well. They're great about advertising events and organizing, running, and moderating them, doing all the work, so they make us look good; we do the easy part. We really appreciate that. I would echo that, and I would encourage the FOIA offices listening in to take advantage of the opportunity that OGIS gives us to communicate with requesters about their requests. I think the more we can communicate, and the more we can let requesters know about the challenges we face and what we have to do, the better it is for all of us. And I want to say, for FOIA offices and agencies, that we take our job of responding to FOIA requests very seriously, and we work tirelessly every day to make sure that we respond to FOIA requests on time. Are we perfect? No. Are we close to being perfect? No. But we try our hardest every day to get there, and part of what we're trying to do here is help FOIA requesters understand that they can help us meet that goal of getting responses to them in a timely way. Great message, Roger, during Public Service Recognition Week. So, yes, we're all tireless government employees. Well, thank you all very much. Roger and Bruno, you've all done a great job of covering a lot of important material, and I think everyone will find it very helpful. They have your contact information for questions. I want to thank everyone for joining us today. I hope everyone and their families remain safe, healthy, and resilient. Take care, everyone, and have a great day. Bye. Thank you all. Bye-bye. That concludes our conference. Thank you for using event services. You may now disconnect.