 Hi everyone, it's Monica Wahee, your Public Health Data Science Mentor. Today I'm going to talk to you about a dirty subject we try to avoid in public health, and that is the cost of queries. So you might be saying, what is she talking about? The cost of queries? We don't pay for queries in public health and health care? Well, actually we do. We just often don't realize it. For example, every time you find yourself sitting there waiting for SAS code to run, you are experiencing the cost of a query. Every time you feel like banging on your computer because SQL or R or Python code runs slow, we are basically talking about query cost. Let's talk more about what I mean by cost. In databaseing, cost literally means the level of burden placed on electronic resources needed in order to execute the operation. And the more electronic resources needed, the longer the operation will take. So we do not like costly operations and we try to avoid them if we can. On the slide, I compare two operations you can do in the back end. The first one on the left side is where you query or read data like running a select query in SQL or doing a proc print of some of your dataset in SAS. The second operation on the right side of the slide is where you actually enter new data or you change the values of existing data. In those cases, you are essentially adding new data. Let's go back to the left side of the slide and learn about how read operations work with data. There are roughly three steps to a read operation. First, the query criteria need to be specified. In SQL, if you were running a select query, you'd likely be setting these criteria under a where clause. You would probably use a where clause in SAS too. Second, the query gets executed, meaning the criteria are applied to back end data and the results are retrieved. Then finally, in step three, the data are displayed. Remember, in a read operation, no data are being written to disk or saved with new values. They are just being read. So if you have a normal form relational SQL database that is optimized and you've linked up your primary and foreign keys properly, you can run a select query quite quickly. In fact, you can save that select query as something called a view and then treat the view like a table. Only it's not a table. It's like when you filter an Excel spreadsheet. It's not a new spreadsheet, but it allows you to just view a few of the records at a time. In SAS, admittedly, it's a lot slower to run commands like proc print just to look at your data. This is mainly because SAS does not really have much in the way of indexing capabilities. SAS has a proc SQL that lets you set indexes like primary and foreign keys, but personally, I've never successfully used the strategy to improve IO in SAS. Still, you can optimize IO in SAS using a few tricks. First, if you sort on a variable, it acts as an index, so if you set criteria on that variable, your query will go faster. Next, if you do a strategic placement of if and where commands for criteria, then you can improve IO as a read operation in SAS. Now let's compare the cost of a read operation, which has a non-zero cost, to the cost of entering and updating data. Let's look at the three steps involved in updating data. First, the system has to get the new data. What this means is you have to fill out the data entry front end with the new data, and then hit submit. In the second step, the data flow to the back end where they hit validation rules. In other words, the data you entered in the front end are now evaluated to see if they are compliant with the data structure of the back end and are allowed to actually be saved in the back end. Now let's go on to step three. If the data are found to be compliant with validation rules, then the data are updated in the back end. They are literally written to a physical disk, which actually takes some time when you think about it, but let's say they are non-compliant with validation rules. That means a fatal edit. In other words, the edit dies or errors out. The edit doesn't actually happen because it's against the validation rules. So the user on the front end should get an error message back saying their data did not write to disk. So let's now step back and compare read operations, even slow ones in unoptimized environments like SAS, to write operations, which is entering and editing data. Which of these operations is busier, which has more going on in it? Definitely the write operations. One of the main things that makes writing costly is evaluating the validation rules. But the thing that actually definitely makes it more costly than read operations is writing to disk. A fatal edit is much faster than a successful write to disk. And even a fatal edit usually goes slower than a read operation. So if this is kind of a new concept for you and you think, hey, I should really know more about this kind of information, then I encourage you to take my online applications basics course. This course is aimed at professionals with a background in public health and health analytics who want to learn about applications so they can better understand data coming from applications. This course is actually one of the first courses you will take if you join my group data science mentoring program. If you are a health data analyst looking to tweak your career in a more data science direction, consider joining the program. That way I can give you more hands on mentoring when it comes to figuring out your query costs and hopefully getting the costs down. I'll put links to the course and the program in the description to this video. Thanks so much for watching this video to the end and taking some time to learn about query cost. If you found this video worthwhile, please give it a thumbs up. And if anything I said sparked an idea in your head, please share it with us in the comments. And beyond that, have a beautyous and superb day. Thank you for watching this video, which is part of the public health to data science rebrand program. If you are interested in joining the program, please sign up for a 30 minute zoom interview using the link in the description.