Welcome to "Are Barcodes on Ballots Bad?" My name is Kevin Skoglund. I'm a voting system security researcher who has studied barcodes for several years and reported the vulnerabilities I found. Today I'm going to discuss how voting systems store votes in barcodes, explain how to decode them, and explore their attack surface from a security perspective. There are many places where barcodes may be used in elections. You may see them in the margins of ballots, where they help the tabulator map ovals to a set of contests and candidates. You may see them on mail-in ballots, on poll books, or used to keep track of equipment. These barcodes may have vulnerabilities, maybe even ones that we'll discuss, but we aren't going to focus on them. We will focus on barcodes on ballots that store vote selections. In 2004, the Populex Slate was the first ballot marking device to store vote selections inside a barcode, but it was not widely used. Barcodes on ballots really took off a decade later with the introduction of the ES&S ExpressVote. It was quickly adopted by many states. Like many ballot marking devices, it was used by voters with disabilities, but ES&S also marketed it to a wider audience, for use in early voting centers and for universal use, where every voter in a precinct votes on a touchscreen. You can see in the photo that the ballot has a section of barcodes at the top and a list of the selected candidates printed below. Since then, similar products have been introduced by other vendors. This category of voting systems is still new, but it has spread quickly. These systems are used by several whole states and a number of populous cities. There are many pros and cons to using these devices, but I want to spend our time on the security issues around the barcodes specifically. For anyone who's not familiar with ballot marking devices, it may help to have a quick overview of how they're used.
A voter uses a touchscreen computer known as a ballot marking device, or BMD for short, to make vote selections. The voter prints those selections on a printer. The printer outputs a paper ballot which contains barcodes and a text summary of the selections. The voter has an opportunity to examine the text summary and then casts the ballot by feeding it into a tabulator. The tabulator reads the barcodes and attributes votes to candidates. Then the ballot is impounded into the ballot box. There are variations, but this is the general process. Now that we have some background, let's learn how to decode the barcodes on a few different systems. We will look at the ES&S ExpressVote and ExpressVote XL, the Dominion ImageCast X, and the Unisyn FreedomVote Tablet. I picked these as popular examples that illustrate three different ways barcodes are used. They make good case studies and they can act as stand-ins for other systems that use similar concepts, maybe even systems that are not yet on the market. The ES&S ExpressVote ballot uses barcodes in the Code 128C format. The long barcode at the top contains the ballot style, which sets which contests and candidates are being voted on. It also contains a count of the number of barcodes that follow in the three columns below it. Those are the vote selections. This ballot has 15 barcodes in five rows. If you look at the text below the barcodes, you'll find that there are 16 contests. One contest had no selection, so no barcode is stored for it. The barcodes can be scanned by any barcode reader. However, smartphones have difficulty because there are so many barcodes and they're so close together that the phone can't pick out just one. There are other obstacles to decoding these barcodes. For one thing, the barcodes are not printed in the same order as the contests in the text. For example, the sixth barcode is a vote for the second candidate in the summary. That makes it difficult to match them up and account for them all.
And the barcodes do not decode to text. They contain six-digit numbers, like 093411. That number is a candidate identifier. The tabulator, which processes the barcode, will match that identifier to a candidate's tally and record a vote. The candidate identifier also matches the ovals on a mark-sense ballot, which you may know as a full ballot or a hand-marked paper ballot. I'm going to call it a mark-sense ballot because the scanner senses the marks that are in the ovals. With that ballot in hand, we can work out the meaning behind the identifier. The timing marks around the outside edge create a grid of coordinates, and we can locate the ovals by those coordinates. The six digits follow a pattern: two digits for column number, two digits for row number, one digit for page number, and one digit for side number. So on the grid, 091011 would be the ninth column, the tenth row of page one, side one, which is the candidate Hope S. Lost. Two positions lower on the grid is candidate ID 091211, which is Rufus Lee King. Below that is a write-in position at 091411. Write-in votes would store the number only, not the candidate's name. The barcodes are strictly six-digit numbers. Overall, it's a clever way to allow the tabulator to scan two very different types of ballots, but only work with one set of identifiers. Scanning a mark-sense ballot returns grid positions, and scanning barcodes also returns grid positions. On the Dominion ImageCast X ballot, the barcode is in the QR code format. The BMD version prints on an 8.5x11 sheet of paper. The VVPAT version prints on a narrow spool of paper inside a box. The QR code includes the ballot style, like the long barcode did on the ES&S ballot. It contains the vote selections. It encodes any write-in candidate names, and it has an HMAC signature. For long ballots, it can print more than one QR code if it needs to. Most barcode scanners and smartphones will not read these barcodes.
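Returning to the six-digit ES&S candidate identifiers, the column, row, page, and side pattern described above can be sketched in a few lines of Python. This is only an illustration of the pattern as stated in the talk, not vendor code. The names `GridPosition` and `decode_candidate_id` are hypothetical. The strict six-digit check reflects the statement that these barcodes are strictly six-digit numbers.

```python
import re
from typing import NamedTuple


class GridPosition(NamedTuple):
    """Hypothetical container for one decoded candidate identifier."""
    column: int
    row: int
    page: int
    side: int


# Accept exactly six ASCII digits and nothing else.
_SIX_DIGITS = re.compile(r"[0-9]{6}")


def decode_candidate_id(identifier: str) -> GridPosition:
    """Split an identifier into column (2), row (2), page (1), side (1)."""
    if not _SIX_DIGITS.fullmatch(identifier):
        raise ValueError(f"not a six-digit identifier: {identifier!r}")
    return GridPosition(
        column=int(identifier[0:2]),
        row=int(identifier[2:4]),
        page=int(identifier[4]),
        side=int(identifier[5]),
    )


# The example from the talk: ninth column, tenth row, page one, side one.
print(decode_candidate_id("091011"))
```

Knowing which candidate sits at a grid position still requires a mark-sense ballot or similar reference for that ballot style.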
Most QR codes contain an identifier that's for an application to use, such as for package tracking, or they may contain a URL. You may have used one in a restaurant in order to look up the menu. These QR codes are not standard because they contain binary data with a different structure instead of just characters. If you want to decode them, you can use a couple of websites or a tool called zbarimg. Any reader needs to extract raw bytes, either as binary or hexadecimal, in order to work with them. If we take a Dominion ballot barcode and we run it through zbarimg, it will return bytes of data as hexadecimal numbers. Every two hex digits is one byte, or eight bits. The spaces separate those bytes. To understand the vote selections, we need the binary, so I used some Ruby code to convert the hex digits to binary two at a time. I colored the first four bytes differently so you can see how the two versions line up. It's really just condensing and expanding the same information. Now that we have binary data, what does it mean? We can parse it into several sections. The green parts are metadata. Many of them are just separators. The blue parts are the precinct and ballot style. If you convert them to decimal numbers, you might recognize them. The red part contains the vote selections. And the gray part is a 32-byte HMAC, which is a keyed hash of the selections data. You can think of the HMAC like a fingerprint for this set of selections. It's not a fingerprint for this particular ballot; any ballot with the same choices would have the same fingerprint. Let's focus on the vote selections. As with the ES&S ballot, we need a mark-sense ballot or similar reference to make sense of this encoding. If we made a list of the contests and then a list of the candidates in each contest in order, it would line up perfectly with the list of zeros and ones. I just took away the spaces and added line returns. Each one indicates a vote. Each zero represents no vote.
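The two steps just described, expanding hex bytes to binary and lining the bits up against an ordered list of contests and candidates, can be sketched in Python. This is a simplified illustration and not the code used in the talk. The function names, the sample bytes, and the contest and candidate names are all hypothetical, and a real barcode also carries the metadata, separators, and HMAC described above, which this sketch ignores.

```python
def hex_to_bits(hex_dump: str) -> str:
    """Expand space-separated hex bytes into space-separated 8-bit groups."""
    return " ".join(f"{int(byte, 16):08b}" for byte in hex_dump.split())


def map_selections(bits: str, contests: list) -> dict:
    """Consume one bit per candidate, in ballot order: 1 = vote, 0 = no vote.

    `contests` is a list of (contest name, [candidate names]) pairs taken
    from a reference ballot for the same ballot style.
    """
    results = {}
    position = 0
    for contest, candidates in contests:
        chunk = bits[position:position + len(candidates)]
        if len(chunk) < len(candidates):
            raise ValueError("ran out of bits before the end of the ballot")
        results[contest] = [
            name for name, bit in zip(candidates, chunk) if bit == "1"
        ]
        position += len(candidates)
    return results


# Hypothetical reference list and selection bits.
reference = [
    ("Court Clerk", ["Candidate A", "Candidate B", "Candidate C"]),
    ("Measure 1", ["Yes", "No"]),
]
print(hex_to_bits("a0 01"))
print(map_selections("10001", reference))
```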
For example, the court clerk contest has three candidates. The binary tells us that the first one has a vote and the other two do not. This is a straightforward example. They get more complicated with multiple ballot sheets, write-ins, and ranked choice voting, but the basic pattern is the same. Let's look at our last example, the ballot from the Unisyn FreedomVote Tablet. It has a non-standard barcode. At the very bottom, you see a barcode in the Code 128 format, which contains the ballot style. Above that, the vote selections are marked in a grid pattern. This is similar to the punch cards of the past or to a system called InkaVote, which is how I refer to these. You can also think of them as compressed vote targets. Vote targets are what we call the ovals or boxes on a full ballot. Imagine removing all the text from a full ballot and then compressing the remaining boxes down to a small bit of real estate. To illustrate, here's a Unisyn mark-sense ballot on the left alongside the grid of selections and the text summary. The outermost marks in the grid are timing marks, just the same as the timing marks around the full ballot. In both cases, they help the scanner locate the vote targets. The vote selections are inside the orange box. There are four of them. The text summary lists five contests, but the governor's contest had no selection. It helps a lot if I add boxes to the grid to show you the vote choices that were not marked. The votes are recorded starting in the upper left and moving down each column. The first box is empty. It corresponds to the straight party Republican vote on the full ballot. The second box is filled. We can look at the full ballot and see that this is a vote for the straight party Democratic choice, and that's what the text summary shows, too. We move to the next column, and the box is filled. On the full ballot, that would be a vote for Daniel Boone, which is also what the summary shows, and so on.
Notice that the text summary also includes position numbers that match the positions in the barcode. So is this even a barcode? Well, technically probably not, but it is one for our purposes. It encodes the vote selections in a machine-readable format that voters cannot read or verify, and it can be attacked. We'll get to see how a custom, roll-your-own barcode stands up to those attacks. Here's a table that compares the key features of these three examples. None of the barcodes that we examined are readable by a voter with a smartphone in the voting booth, and even if the voter has a fancier barcode scanner, all of them need a full ballot to use as a reference when decoding. None of them are encrypted, and they vary widely in their use of integrity protection measures. Now let's talk about potential attacks on barcodes and on barcode scanners. In this section, I'm not claiming vulnerabilities in any specific systems. I will point out characteristics of some systems to illustrate how issues might be handled. I'm not examining the feasibility or the likelihood of exploitation, which are very important considerations. For one thing, I'm not looking at what other defenses may exist, and I'm not planning to give enough information to perform an attack. This will be an examination of the attack surface and the areas that have potential risks. You can think of it as advice that I'd give to a pen tester who is tasked with examining a system that uses barcodes. These are the places that I would look; here are the rocks that I'd turn over. I also hope to surface some of the risks so that manufacturers can consider them, mitigate them, and build better systems. And I'd like to help jurisdictions know what to ask during purchasing and to help them secure the systems that they already own. My overall goal is to help the defenders, not the attackers. I believe we can be mature and build better systems without panicking and weaving things into conspiracy theories.
The process diagram is also a good diagram of the attack surface. Every point has attack potential. Obviously, it's better to attack closer to the left side, before barcodes are printed. But there are opportunities to edit or destroy barcodes later, even after the ballot has been tabulated and stored. Note that some systems perform ballot creation inside one device, while others use separate devices. Position number two could represent a USB cable between a tablet and a printer. Some systems combine tabulation and storage into one device, while others, such as central count tabulation, split them up. And some all-in-one hybrid systems contain everything from one to seven inside a single device. There are many ways that we might launch attacks on these areas. I won't spend time explaining them because you've likely heard them elsewhere. They include supply chain attacks, insiders, malware, machine-in-the-middle attacks, and remote access. One important ingredient in many of the attacks we'll look at is access to a printer. Positions one through three are before printing and have access to that printer. At position four, usually the ballot is under the voter's control, but there may be an opportunity for poll workers to handle the ballot, or the voter may insert the ballot into a device to verify their selections, as one vendor routinely demonstrates. At position five, most tabulators do not have a printer, though some may add one to support post-election audits by printing ballot identifiers. Ideally, that printing would be in a different color and restricted to the margins. Most tabulators scan a ballot and drop it into storage immediately, but several all-in-one hybrid machines have a single paper path, so that the cast ballot passes a software-controlled printer at position six on the way. And finally, a printer isn't the only way to mark on these ballots. Several systems print on thermal paper. Applying heat, friction, or scraping will mark on those ballots.
You can make a mark by swiping a fingernail across them. And just like with any type of paper ballot, there may be opportunities for manipulation during transport, storage, and post-election audits. The first attack that we'll look at is the most straightforward: printing a different barcode. This is an attack during creation, and it applies to every type of barcode. It's the strongest attack because the attacker has the most flexibility about what gets printed and has the least chance of detection. A hacked BMD would be the ideal target because the BMD has everything it needs to generate barcodes. That's its regular job. By contrast, the printer or a machine-in-the-middle attack would need to know the barcode format, the data structure, and possibly additional data to create ballots for the voting measures. If the attacker replaced only the barcode, the voter would have no way of detecting it and would cast the ballot. Or the attacker could create a much better forgery by modifying both the barcode and the text, since studies show that voters rarely check the text. Either way, it's the barcode that gets counted on election night. Next we have a replay attack. During creation of a ballot, an attacker takes a barcode that they don't like and replaces it with one that they do like. This is different from printing any barcode because the attacker doesn't have to navigate all the complexities of building a barcode. It's just mimicry of a known valid barcode. The attacker doesn't need to know the structure or any secrets. They just have to know that one barcode is favorable and then recognize unfavorable ones to swap out. Attack number two is suitable for an attacker at the printer, at positions two and three, who may not possess all the information that the BMD has. Next, we'll look at two similar attacks together. First, if there's an undervote in a contest, where the voter has not voted or has not cast as many votes as they're allowed, the attacker can add their preferred vote to that contest.
In these examples, you'll see that the voter could have voted for one more candidate in each of these contests. Those could be opportunities to steal an undervote. It would net one vote for a favored candidate, or many votes in the case of a straight party contest. An attacker could also add a surplus vote to create an overvote. If too many candidates are picked, all the votes in the contest may become invalid. In the example, you can see that each contest has one vote too many. It would not net any new votes for a favored candidate, but it could subtract votes from a disfavored candidate. These examples use boxes, not barcodes. So how would barcodes be affected? The InkaVote style that we saw with the FreedomVote Tablet is very vulnerable. Adding a vote is as simple as adding a square to the grid. On thermal paper, I might be able to do it with a small scraper. And because of their similarity, mark-sense ballots have this same vulnerability. One difference, though, is that mark-sense ballots have context around the marks, so the voter may notice the change. With barcodes, they don't. Code 128 may be vulnerable, depending on the implementation. It helps that each barcode is an individual vote. The ExpressVote style of ballot may leave blank spaces if the number of barcodes is not evenly divisible by 3. ES&S has reportedly mitigated this in later software by filling in the blank spaces with a row of Xs. ES&S also records a count of the barcodes on the ballot. I don't think that's a strong integrity measure, but it does offer a basic defense. Adding votes to the QR code would be more difficult. The format has a complex structure and each vote is not as distinct. The ImageCast X implementation also includes an HMAC, so changing selections would change the expected HMAC. And honestly, if an attacker has enough access to create a matching HMAC, then it's no longer distinct from attack number one, where an attacker can print any barcode at all.
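To illustrate why an HMAC makes added or changed selections detectable, here is a minimal Python sketch using the standard `hmac` module. It is not the ImageCast X implementation. HMAC-SHA-256 is an assumption, chosen only because it produces a 32-byte value like the one described earlier, and the key, the selection bytes, and the function names are hypothetical.

```python
import hashlib
import hmac


def selections_tag(key: bytes, selections: bytes) -> bytes:
    """Compute a 32-byte HMAC over the selections (SHA-256 is assumed)."""
    return hmac.new(key, selections, hashlib.sha256).digest()


def is_untampered(key: bytes, selections: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(selections_tag(key, selections), tag)


key = b"hypothetical-secret-key"      # assumption: a secret held by the system
selections = bytes([0b10001000])      # hypothetical selection bits
tag = selections_tag(key, selections)

print(len(tag))                                        # 32 bytes
print(is_untampered(key, selections, tag))             # unchanged selections
print(is_untampered(key, bytes([0b10001100]), tag))    # one vote added
```

Two ballots with the same selections produce the same tag, which matches the fingerprint analogy. The protection depends entirely on the key staying secret: anyone holding it can produce a valid tag for any selections.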
Instead of adding a vote, an attacker could modify a printed barcode to have different values. This is not possible with InkaVote or QR codes. One reason is that modification must be additive. You can add a black mark to one box, but you can't take away an existing black mark. And we've already seen that the QR code has a complex format and may have integrity checking on those selections. However, Code 128 is potentially vulnerable. The ExpressVote implementation confirms the count of the barcodes, but not their content. Let's look at a Code 128 barcode. It has a start character, the data, a check character, and a stop character. The six-digit candidate identifier is stored in pairs as three barcode characters. In 093411, this is the 34 character. Don't think of barcodes as being fat and thin lines or spaces. There are 11 positions that can each be either black or white. Two of the same color together look like a thicker line. 34 is a good candidate for editing because it has a lot of white space where we can add black lines. In 093011, the 30 character is not a good candidate for editing because it has lots of black lines already. And modification is additive. We can't remove those lines. But we can change 34 to 30 just by adding two lines. But this new barcode won't be valid. After the data portion of the barcode is a check character. Its purpose is to ensure that the barcode scanner read the data correctly. The scanner passes the data into a fast algorithm and gets a value between 0 and 102, which is compared against the check character. If the data doesn't match the check character, then the scan fails. The check character for 093411 is 9. The check character for 093011 is 1. So if we modified the 34 to become 30, the expected check character would be 1, and it's not. But the check character may also be modified. If we add two lines to the 9, it becomes a 1, and the resulting barcode becomes valid.
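The check character arithmetic above can be reproduced with the standard Code 128 rule: the start value plus each data value multiplied by its position, modulo 103. The Python sketch below assumes code set C, whose start character has the value 105, and shows why a scanner that only verifies the check character would not notice the edit described. The function names are hypothetical, and the four 11-position patterns are copied from the Code 128 symbol table only for the values mentioned in the talk.

```python
START_CODE_C = 105  # value of the Code 128 "Start C" character


def code128c_check_value(digits: str) -> int:
    """Check value for an even-length digit string encoded in code set C."""
    if not (digits.isascii() and digits.isdigit()) or len(digits) % 2:
        raise ValueError("expected an even number of ASCII digits")
    values = [int(digits[i:i + 2]) for i in range(0, len(digits), 2)]
    weighted = sum(pos * value for pos, value in enumerate(values, start=1))
    return (START_CODE_C + weighted) % 103


# 11-position patterns (1 = black, 0 = white) for the four values discussed.
PATTERNS = {
    1: "11001101100",
    9: "11001001000",
    30: "11011011000",
    34: "10001011000",
}


def reachable_by_adding_black(before: str, after: str) -> bool:
    """True if `after` keeps every black position of `before` and adds more."""
    keeps_black = all(a == "1" for b, a in zip(before, after) if b == "1")
    return before != after and keeps_black


print(code128c_check_value("093411"))                          # 9
print(code128c_check_value("093011"))                          # 1
print(reachable_by_adding_black(PATTERNS[34], PATTERNS[30]))   # True
print(reachable_by_adding_black(PATTERNS[30], PATTERNS[34]))   # False
```

The check character guards against misreads, not against an adversary, which is why it cannot stand in for a keyed integrity measure.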
If every row of candidate names is plus 2 on the grid, the odds are good that these two coordinate positions will be candidates in the same contest. I wrote a program that runs through all the editing possibilities. Around 31% of grid positions are editable like this. Of course, not all columns in the grid are used frequently. Column 1 is often filled with ballot instructions. Column 9 is frequently used for the first set of contests. And on page 1, side 1, 48 out of the 99 rows are editable to different positions. Some positions have several possible edits that could be made. 093411 can be changed to 9 other grid positions. 100811 can be changed to 26. Others have 1 or none. Modifying Code 128 barcodes has constraints, but it is possible. I found one example at the bottom of the first column where a straight party vote for one party could be changed to a completely different party. One modified barcode could change the votes in many contests. A less refined attack would be to invalidate barcodes. They could be printed as invalid, but appear perfectly real. They could be modified so they become invalid, or they could simply be defaced. The result might be that the barcode is ignored, the ballot might be rejected, or it might create an undervote opportunity. Imagine if you could invalidate one whole QR code while adding an alternate QR code. Or it could even be used to hide evidence of manipulation. We'll now shift our focus to attacks on barcode scanners using barcodes. An attacker could introduce a barcode which modifies the scanner settings. Commercial off-the-shelf scanners often allow configuration by scanning special barcodes. These barcodes are not secret. They're usually printed at the back of the manual. These might be handheld scanners or fixed-mount scanners. However, most if not all of the voting machine tabulators on the market today make a full image of the ballot and then use software to read the barcodes from the image.
Software can be configured in other ways, so it has no need of these configuration barcodes. So while a scanner reconfiguration attack on ballot tabulation is possible, better targets would be other parts of the system, like the built-in scanners used for ballot activation, or the ones used to transfer ballot selections from a phone. What could be reconfigured? There may be as many as 50 different settings. An attacker could create a denial of service, allow additional barcode types to be scanned, change how duplicate or invalid barcodes are handled, allow data to span several barcodes, or enable user-defined triggers so that scanning a barcode triggers an action like a macro. Reconfiguration might help with our next attack, injection attacks. Some of you are familiar with other types of injection attacks. An attacker sends malicious input that gets interpreted by the system. A barcode is a keyboard, as Harri Hursti has frequently said. Barcodes can include characters commonly used for injections. They can shift to different character sets to access special characters. Unlike reconfiguration attacks, injection applies equally to any computer process that doesn't fully sanitize its input. Both hardware scanners and software scanners may be vulnerable. One might be able to use Code 128 with fewer than 10 characters to send the keystrokes that are needed to exit the running program and launch a command line interface. QR codes and other formats may hold hundreds of characters, enough for a small program. This is an attack that could have a lot of impact. The last set of attacks we'll look at are disinformation attacks. Disinformation comes in all shapes and sizes, but barcodes are vulnerable because of their lack of transparency. They're not verifiable by the voter even if the voter has a barcode reader in hand. Certainly, some voters ignore them or just accept them, but many voters routinely express fear, uncertainty, and doubt about them.
The seed for disinformation to latch on to is this: if your vote is in a barcode and the barcodes are what the system counts, how do you know your ballot was cast and counted as you intended? Now, my opinion, which is only backed by subjective and anecdotal evidence, is that barcodes provoke mistrust more than paperless DRE systems do. That's odd, since paperless systems should be less trustworthy, but paperless DRE systems don't wave any red flags during the voting process. A voter is not provoked to consider how the voting machine does its work. Barcodes raise the question of trust with a prominently placed visual. Voters can't help but notice unreadable glyphs on their ballot, and they have to decide how they feel about them. It sets the stage for disinformation attacks. All right, let's look now at how defenders can detect and mitigate these types of attacks. We'll start with some things manufacturers of voting systems could do. Manufacturers can improve the overall security of BMDs, protect barcode scanners like they would any other port, and sanitize and allowlist all data from barcodes to protect against injection attacks. They can prevent scanner reconfiguration in the field. Their systems should not re-expose printed ballots to printers. There should not be a single paper path that gets reused. Most importantly, they should design systems to confirm the integrity and authenticity of data coming from barcodes, and in a strong way that considers adversarial attacks. That may mean public key cryptography, and it may require working out good key management practices. Working to make barcodes more transparent and interoperable would also help to make them more trustworthy. Jurisdictions can do a lot to protect the systems they already have. They should think first about resilience planning. A huge vulnerability is that a malfunctioning ballot marking device can't write barcodes and a malfunctioning tabulator can't read them.
If a machine needs to be taken out of service, jurisdictions should have a backup plan that's ready to go. Jurisdictions should ensure that ballot marking devices have good physical security and chain of custody measures, especially during voting machine sleepovers in a polling place. Don't assume that the BMD is less critical than the tabulator. They should encourage voter verification of all BMD-printed ballots to detect errors and anomalies. This can be done with signage and with poll worker prompting. It can't detect everything, but studies show that it does help. Poll workers can be trained on potential barcode issues, and jurisdictions can advocate for legislation and policies that define the official vote. This is important because one issue barcodes create, which hasn't existed previously, is that a single piece of paper contains two separate representations of a vote: the barcode version and the text version. If they match, there's not a problem, but if the barcodes and the human-readable text differ, it needs to be legally clear which one should govern. In most places, the law doesn't consider that possibility yet. Accordingly, any recounts of the ballots should use the text as the official vote. And by far the best way to mitigate any attacks that change election outcomes is with post-election audits. They can catch tabulation errors of all kinds. The gold standard is risk-limiting audits, which are efficient and are tied to the margin of victory, so that you test more ballots when a race is close than when it's not. Any audit of ballots with barcodes should examine the human-readable text on the paper by the hand-eye method, and not the barcodes. The more contests that are audited, the more protection an election has. So in conclusion, are barcodes on ballots bad? As we've seen, barcodes introduce new attack surface and new risks. Many of those risks can be mitigated or managed, but we have to actually do it, and there are gaps at the moment.
The hardest issue to mitigate is the lack of transparency. This is the primary reason I'm personally not in favor of barcodes. It's not the risks or the attacks. Barcodes make voters uneasy. I've heard it from many voters. They worry that the system could be doing sleight of hand or pulling a fast one on them. You may have that gut feeling yourself. I understand it. To me, that runs counter to our goal of building trust in elections. There are two voting system manufacturers who do not put votes in barcodes: Hart and Clear Ballot. So barcodes aren't a necessity. In 2019, the state of Colorado announced that after 2021, it would no longer certify voting systems that tabulate using barcodes. As a result, Dominion modified the ImageCast X BMD used in Colorado so that it fills in the ovals on a mark-sense ballot, instead of using a QR code like the ballot that we deconstructed. Security is not the only consideration in making a choice like Colorado's. But with one action, the state took barcodes off of the ballots and reduced all of the risks that we've looked at today. Thank you very much.