Hello everyone, I appreciate that you are here at this late hour. My name is Vsevolod Stakhov, and I'm the author of Rspamd, an open source spam filtering system, and I've also been a FreeBSD developer for more than 10 years. In this talk, I will cover one particular integration of Rspamd: its integration into FreeBSD email. This talk won't be about the performance of Rspamd, because these relays don't process more than one million messages per day, and I consider this volume very low, so I don't care about performance at all.
So the task was how to build spam filtering, because nobody likes spam, and especially nobody likes spam in mailing lists. Mailing lists are the main source of information, and you hate to archive spam messages and so on and so forth. So that's quite an important task, and here are some explanations about what was difficult.
So let's start with the architecture of FreeBSD mail. There are a couple of MXes, and I'll draw some of the mail flows here. We have freefall, where all developer accounts live, and there are also aliases. Email comes to MX-1, and it integrates with freefall to handle local email. Then we have Mailman, which operates via MX-1, and MX-2 is used for outbound spam. Sorry, for outbound. Yes, spam as well. Then MX-1 talks to MX-2. MX-2 can send back delivery notifications via MX-1. Freefall can send email via MX-2. Mailman can send emails to freefall for local users. And that was my opinion about this whole system.
So luckily, we had help from the postmasters, in particular from Remko Lodder and Kirill Ponomarev, and they actually built Rspamd into these complicated mail flows. I just made small adjustments, actually. So the main problem of the email architecture is that we have very complicated mail flows. There are many relays, many aliases.
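As a rough illustration of why these flows are hard to reason about, the relays mentioned above can be written down as a small directed graph. This is only a sketch: the node names and the direction of each edge are assumptions reconstructed from the spoken description, not the real routing tables.

```python
# Hypothetical model of the mail flows described in the talk.
# Every edge here is an assumption based on the spoken description.
FLOWS = {
    "mx1": ["freefall", "mx2"],      # inbound mail, local delivery, hand-off to mx2
    "mx2": ["mx1"],                  # delivery notifications come back via mx1
    "freefall": ["mx2"],             # developers send mail out via mx2
    "mailman": ["mx1", "freefall"],  # list traffic and mail for local users
}

def find_cycle(graph, start):
    """Return one path that leaves `start` and comes back to it, or None."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        for nxt in graph.get(node, []):
            if nxt == start:
                return path + [nxt]
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    return None

print(find_cycle(FLOWS, "mx1"))
```

In this model a message that enters MX-1 can come back to MX-1, which is the kind of loop that makes a relay see the same message more than once.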
Lots of users prefer to use something like Gmail instead of local clients, and that actually complicates the situation. Email also takes very non-trivial routes. In some cases I could see something like 16 Received headers on a single message because it constantly ping-ponged between different relays. And there are lots of legacy decisions, because FreeBSD mail is almost as old as FreeBSD, so lots of decisions are from the 90s, when we had something like UUCP.
So the decision for filtering spam and some other stuff was to add Rspamd, because I'm a FreeBSD developer and the author of Rspamd, and I suggested that the postmasters try this project, because it's free and open source, to resolve our issues.
So here's something about Rspamd. It has a web interface. That's not a FreeBSD relay, that's just my random relay. Internally it's represented as a complicated system of different layers. It integrates with the MTA using milter in our case: we use Postfix, and hence we use milter. In our setup we don't use a separate proxy layer; we combine the proxy and scan layers, so it operates in self-scan mode. Internally it uses different things, and actually all of them, with the exception of Bayes, are used for now, so we even use neural networks. I'll talk about that a bit later because it's a tricky topic.
So what problems have we tried to resolve with Rspamd? The first one is spam in the mailing lists. We don't like spam, nobody likes spam. Another problem, which was also quite important, was that we break DKIM signatures when messages pass through our Mailman. And we wanted some sort of control over our mail flows, because we had no understanding at all of how mail goes and how to make it better. Well, not this one.
My personal motivation was mainly to reduce the amount of spam in my mail client. And these are some samples of messages that I constantly received from FreeBSD mailing lists.
And what's more important, I couldn't filter them, because in the profile of my own local Rspamd, mail from FreeBSD lists is usually trusted. The vast majority of emails from FreeBSD lists are actually ham, and they have a very low, negative score. Also, FreeBSD relays are listed in some DNS whitelists. So basically it's impossible to use any RBLs and other stuff, because the senders are hidden behind the FreeBSD relay. That's why I constantly kept asking: let's install Rspamd, let's install Rspamd and try to do something better, because my filter can filter these emails and your filter cannot.
So now I still receive some spam from FreeBSD lists, because we are currently quite cautious about rejecting email. But what you can see is that there is a special symbol called spam flag. It means that the FreeBSD relay marked this message as junk, and Rspamd can recognize this and treat this email separately. So this email was in my junk folder, luckily. And that's quite good. In future we plan to reduce the reject score and probably to increase the junk score to filter more emails.
Because currently that's the profile of today's statistics. The vast majority of emails are ham. But these statistics are a bit skewed, because for each email that comes in to a mailing list we have something like 200 to 1000 emails going out. And because of this complicated schema we see these messages many times. Since we send these emails to all our subscribers, we have a situation where we multiply one ham message and send it many times. Obviously, if a message has been rejected, we don't send it to subscribers. That's why this profile is a bit difficult to interpret: it's not true that the vast majority of emails are ham. We have lots of spam, but due to this multiplication effect from mailing lists we don't see it in these stats.
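The multiplication effect can be made concrete with a little arithmetic. The sketch below is a simplified model with made-up numbers: it assumes every ham message is accepted and scanned once on the way in and once per outgoing copy, and every spam message is rejected after a single scan. The inbound counts are hypothetical; only the 200 to 1000 fan-out range comes from the talk.

```python
def apparent_ham_share(ham_in, spam_in, fanout):
    """Share of ham among all scanned messages.

    Simplifying assumptions: each accepted ham message is scanned once
    inbound and once per outgoing copy; rejected spam is scanned once.
    """
    ham_scans = ham_in * (1 + fanout)
    spam_scans = spam_in
    return ham_scans / (ham_scans + spam_scans)

# Hypothetical inbound traffic: 90% spam, 10% ham.
incoming_ham, incoming_spam = 100, 900
for fanout in (200, 1000):
    share = apparent_ham_share(incoming_ham, incoming_spam, fanout)
    print(f"fan-out {fanout}: {share:.1%} of scanned messages look like ham")
```

Even with inbound traffic that is 90% spam, the scanned-message statistics in this model show well over 90% ham, which is the skew described above.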
Another problem with these statistics concerns neural networks, because neural networks assume that your traffic is somehow balanced. The model learns using half spam samples and half ham samples. And in this situation the neural network is also not very useful. So this is also quite a complicated topic. The plans for the future are to improve the neural network so as to reduce the number of identical training vectors. Because currently it just pushes everything: if a message has a sufficiently large negative score, it just says, let's train the neural network on it. Actually that's not a very good decision, especially for this traffic profile. So we have improved the situation with spam, but there is obviously room to improve it further. I'll talk about it a bit later.
Another problem that we resolve with Rspamd is signing messages. In many cases we have the following situation. We have something like Gmail or another sender, and this sender signs their messages using their private key. Mailing lists usually modify messages: they add some sort of disclaimer, or "this message has been delivered by some mailing list, blah, blah, blah". And that actually breaks DKIM signatures. What's worse is that Gmail in particular signs the Sender header, and Sender is a special email header that is always rewritten by Mailman. So all messages from Gmail have broken DKIM signatures. This situation is not very good, because many users don't like receiving messages that come from untrusted sources, because you don't expect anything good from that. So the trivial solution is to add another DKIM signature. And Rspamd is used for this as well. Obviously there are other solutions, like OpenDKIM. But in our case we have Rspamd, so why not use it for DKIM signing.
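To see why an appended footer breaks the original signature, it helps to look at the body hash (the `bh=` tag) alone. The sketch below computes it for DKIM "simple" body canonicalization with SHA-256. It deliberately ignores header signing and the actual RSA signature, and the message text and footer are made up.

```python
import base64
import hashlib

def simple_body_hash(body: bytes) -> str:
    """bh= value for DKIM "simple" body canonicalization with SHA-256.

    "simple" only drops empty lines at the end of the body, so the
    canonical form ends with exactly one CRLF.
    """
    canonical = body.rstrip(b"\r\n") + b"\r\n"
    digest = hashlib.sha256(canonical).digest()
    return base64.b64encode(digest).decode("ascii")

# Hypothetical message body and list footer.
original = b"Hello list,\r\nhere is my patch.\r\n"
with_footer = original + b"_______________\r\nexample mailing list footer\r\n"

print(simple_body_hash(original))
print(simple_body_hash(with_footer))
```

The two printed values differ, so a verifier comparing the recomputed hash with the sender's `bh=` value fails as soon as the list adds its footer, while extra empty lines at the very end would not change the hash.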
And in this case all emails that come from FreeBSD are digitally signed by the FreeBSD relay. It makes some users happy. Actually, there is one thing here. Currently we sign all emails. Even if we decide that an email is junk, we still sign it. I think that should be fixed at some point, so that we don't sign messages that are not trusted and that are considered spam, because it doesn't make any sense to sign spam.
Another technique is to use ARC. Unlike DKIM, ARC provides a way to organize chains of trust in email. In a nutshell, each relay adds a special header. Actually there are three headers. And each relay increases a special counter. So basically this is unlike a DKIM signature, where you don't know at which point a relay added its signature. Obviously, it's much like Received headers. But usually Received headers are not signed by DKIM, for obvious reasons, because they can be added in any order. As for ARC, there are some senders that are using it, in particular Gmail, then Yahoo, and as far as I remember Microsoft Office 365. And what's more important, all these providers that use ARC also recognize ARC. So basically, if the DKIM signature is broken, a message can still be verified, because there is some trusted relay, for instance the FreeBSD relay, that says: this email was valid before I modified it. So the idea here is that you can take a message and modify it, because you are a trusted relay, and you can tell other relays that the initial message had a valid signature. And other relays, if they trust you, can obviously say: okay, I trust FreeBSD, I know that FreeBSD won't lie about the initial signature, and I can improve our sending experience by adding this signature. So that's how it works.
This is an email that was passed from my domain to a FreeBSD list and then was received by Gmail. So Gmail actually signed it one more time.
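The three headers and the counter can be inspected with a few lines of code. The sketch below only reads the `i=` instance tag from each ARC header and checks that the sets are numbered without gaps (RFC 8617 numbers them starting from 1). It does not verify any signature, and the sample headers, domains and signature values are made up.

```python
from email import message_from_string

ARC_HEADERS = ("ARC-Authentication-Results", "ARC-Message-Signature", "ARC-Seal")

def instance_of(header_value):
    """Return the integer i= tag of one ARC header value, or None."""
    for tag in header_value.split(";"):
        key, _, value = tag.partition("=")
        if key.strip() == "i" and value.strip().isdigit():
            return int(value.strip())
    return None

def arc_instances(raw_message):
    """Map each ARC header name to the sorted list of i= values found."""
    msg = message_from_string(raw_message)
    return {
        name: sorted(
            i for i in map(instance_of, msg.get_all(name, [])) if i is not None
        )
        for name in ARC_HEADERS
    }

def chain_is_contiguous(raw_message):
    """True if all three header types carry instances 1..N with no gaps."""
    instances = arc_instances(raw_message)
    expected = list(range(1, len(instances["ARC-Seal"]) + 1))
    return bool(expected) and all(v == expected for v in instances.values())

# Hypothetical message with two ARC sets (placeholder signature data).
SAMPLE = (
    "ARC-Seal: i=2; a=rsa-sha256; cv=pass; d=lists.example.org; s=sel; b=BBBB\n"
    "ARC-Message-Signature: i=2; a=rsa-sha256; d=lists.example.org; s=sel;\n"
    " h=from; bh=XXXX; b=BBBB\n"
    "ARC-Authentication-Results: i=2; lists.example.org;\n"
    " dkim=pass header.i=@sender.example\n"
    "ARC-Seal: i=1; a=rsa-sha256; cv=none; d=sender.example; s=sel; b=AAAA\n"
    "ARC-Message-Signature: i=1; a=rsa-sha256; d=sender.example; s=sel;\n"
    " h=from; bh=XXXX; b=AAAA\n"
    "ARC-Authentication-Results: i=1; sender.example; dkim=none\n"
    "From: dev@sender.example\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

print(arc_instances(SAMPLE))
print(chain_is_contiguous(SAMPLE))
```

A real verifier additionally validates every ARC-Seal and the newest ARC-Message-Signature cryptographically; the numbering check is only the structural part of that.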
And in this case you see that i=2 means that there is the third relay on the way. And there are also authentication results. You can see that the body hash from the originator did not verify, because it was broken by Mailman. But the ARC verification and the DKIM verification afterwards were successful. And this helps to interoperate with email service providers, and it helps to improve delivery, basically.
So what are the issues with signing? The main problem, again, is that we can see messages many times. In some cases we can see a single message on MX-1 something like 3 or 4 times. Yes. Accordingly, we need to enable and disable signing and checks depending on whether we have already seen the message. And what's not done yet is that we should not sign spam. If we decide that a message is spam, we should stop the signing process and pass it on without signatures, because, well, it's stupid.
So that's how it's resolved in Rspamd. There is a small snippet. All configuration is stored in the local.d folder, so we don't modify any vanilla configuration. That's the recommended way to configure Rspamd, because whenever I get bug reports, in many cases users have just modified the vanilla config and don't know how to deal with the upgrade. And that's the way it should be done. So here we just define some rules and enable and disable some symbols according to the path the message has taken when we see it. For instance, we don't add multiple DKIM signatures. That's quite important, because otherwise, if we see a message multiple times and add multiple signatures from the same domain, then one of these signatures won't be valid. That's the idea. That's not true for ARC, by the way.
Yes. That's not a missing DKIM signature. I mean, yes, I know, but the question here is that from the perspective of an ESP it depends, because an invalid DKIM signature means that your DMARC verification, if you have some sort of DMARC rules, can totally fail.
And if you have no DKIM signature... well, I've implemented DMARC in Rspamd, and I know that, well, the RFC says to treat it this way, but sometimes it shouldn't be treated this way, because basically if you receive a message with a broken DKIM signature or without a DKIM signature from PayPal... oh yeah, you're right. So I'm just saying the same thing in other words. Sorry. But in this case it just ensures that you don't have two signatures, because one of those signatures will be valid and the other will not. ARC works in this situation, but DKIM doesn't.
So now some words about what we want to do, because currently we have a working prototype, but there are still ways to improve it. First of all, we have a big pain point now. There is a special script on freefall that sends email for developers, and unfortunately this script adds a disclaimer just after the body of the message, and as you can see, this is a MIME message. In a MIME message you cannot do that, because in a MIME message you need to insert the signature inside the MIME parts, and if you have multipart/alternative parts you need to insert the signature into both the text and the HTML part. That's actually not easy, because there are many situations where you need to insert the signature in the right place, and that's why I decided that it could be a good task for Rspamd, because I can modify the message from there. So that's again a work-in-progress project that can modify a message. It knows about multipart messages, it can insert signatures for text and for HTML parts, it knows how to deal with DSNs, it knows how to deal with 7-bit encoding, it knows how to deal with quoted-printable, UTF-8 and so on and so forth, so it's pretty sophisticated.
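The idea of putting the footer inside each MIME part, rather than after the whole message, can be sketched with Python's standard `email` package. This is not the Rspamd implementation, only an illustration of the technique: the footer texts and the sample message are hypothetical, and the HTML handling is deliberately naive (it looks for a literal `</body>` tag).

```python
from email.message import EmailMessage

# Hypothetical footer texts.
FOOTER_TEXT = "\n--\nSent via a hypothetical developer relay\n"
FOOTER_HTML = "<hr><p>Sent via a hypothetical developer relay</p>"

def add_footer(msg):
    """Append a footer to every text/plain and text/html leaf part.

    Each touched part is re-encoded as UTF-8 with quoted-printable
    transfer encoding, regardless of its original encoding.
    """
    for part in msg.walk():
        if part.is_multipart() or part.get_content_disposition() == "attachment":
            continue
        ctype = part.get_content_type()
        if ctype == "text/plain":
            text = part.get_content() + FOOTER_TEXT
            part.set_content(text, subtype="plain", charset="utf-8",
                             cte="quoted-printable")
        elif ctype == "text/html":
            html = part.get_content()
            if "</body>" in html:
                html = html.replace("</body>", FOOTER_HTML + "</body>", 1)
            else:
                html += FOOTER_HTML
            part.set_content(html, subtype="html", charset="utf-8",
                             cte="quoted-printable")

def build_sample():
    """Build a small multipart/alternative message to experiment with."""
    msg = EmailMessage()
    msg["From"] = "dev@example.org"
    msg["To"] = "list@example.org"
    msg["Subject"] = "test"
    msg.set_content("Hello list.\n")
    msg.add_alternative("<html><body><p>Hello list.</p></body></html>",
                        subtype="html")
    return msg

msg = build_sample()
add_footer(msg)
print(msg.as_string())
```

A production version would also have to deal with things mentioned above that this sketch ignores, such as DSNs and signed or encrypted parts.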
It can also rewrite the subject, because in some cases you need to do this. So literally it can do the things that are typically done in Mailman, but this list is special: it's not handled by Mailman. So that's what we want to do, and we're also thinking about including DKIM signatures here as well, so that every mailing list is signed. This part is also missing now.
We don't use Bayesian statistics, and that's actually bad, because basically it's good to have personal Bayes for our users. We have lots of developers who are good enough and who are eager to train their spam filter and who can do their spam filtering better. Unfortunately, it won't work for everybody, because all people have different preferences. The general idea is actually quite simple: we plan to add a generic Bayes that will be used for everybody, with some sort of generic training corpus, and on top of it there should be some sort of personalized statistics. It can be implemented in Rspamd: it stores everything in Redis, and it can check first one thing and then the other and adjust things according to users' preferences. So that's one of the biggest plans that we have.
We also want to present better statistics. This one is from my spam trap, it's not from FreeBSD, but we want to have something similar for FreeBSD: to view how many messages we receive, how many things we are sending, the popularity of mailing lists, and so on and so forth. This is done by a project called ClickHouse, and this is the dashboard for it, called Redash. Basically it also works almost out of the box. The only BSD-specific issue is that this system is more Linux-oriented, so it will probably require some manual modifications. It is written in Go, but I suppose it should work. ClickHouse itself has been ported to FreeBSD, so if you want a column-oriented database with lots of analytical features and very high performance in terms of storage, you can check out ClickHouse. It's ported to FreeBSD. Also, you can do
some sort of analytics, so it's SQL-like.
So actually that's all. It was a very short talk about how we integrated Rspamd in FreeBSD. My goal was not to describe lots of things about Rspamd, because I can talk about that for a very long time; in this particular presentation I wanted to explain this practical implementation. So if you have any questions, please ask.
So we can just use these aliases as a map in Rspamd. It's possible to read it as a hash, so we can just add additional aliases for learning. That's how it's done. Basically, you can just say something like "learn spam", there's some specific key that is available for developers, and we will just accept this message and learn spam for him, just for him.
Yes. That's a complicated question, because basically we have false positives. All spam filtering systems have false positives, and if you reject, you won't see it. And in many cases, unfortunately, there are too many broken senders who won't react to DSNs, and that's real life. The idea is actually to sign it and leave it to the user's consideration. Probably we made some sort of statistical mistake; then we relearn this message as ham and we are done. If we reject it, we cannot do anything. The only alternative to this is some kind of quarantining. But quarantining... we do support quarantining, but Postfix requires additional daemons, as far as I remember, to do quarantining. That's why we just add a flag and tell our users: just use this flag to mark these messages as spam in your Sieve rules. That's all.
The servers which receive... That's possible, yes, that's possible. But we plan to reduce the reject score, and we don't plan to reduce it down to the grey level, so there will definitely be some sort of grey zone. Obviously we are trying to reduce this grey zone as well, but currently our reject threshold is too high. If we reduce the reject threshold, this problem should be resolved, because nobody would reduce reputation for grey mail.
Yes, that's great. Yes?
Does it
deal with the string you actually put in and the encoding of the MIME part?
So what it does: internally in Rspamd this text is decoded, so this code operates on decoded parts. And for text parts it's recommended to use quoted-printable in any case, so that's the general recommendation, and that's why this code encodes in quoted-printable, forgetting about the original encoding. Why do so? Because first of all, if we have 8-bit encoding, it doesn't make any sense, because many relays don't support 8-bit encoding by default, so they don't announce it, and what should we actually do with 8-bit encoding when sending via these relays? So 8-bit is not an option. Base64 is done by some... well, I would have said broken ESPs, but actually many people are sending text in Base64, and it adds lots of overhead. So the only sane solution, from my point of view, is just to encode everything in quoted-printable. Furthermore, it encodes everything in UTF-8, so if the original encoding was something like KOI8-R, it will recode it to UTF-8. That's a limitation, but I don't have to support this whole zoo of encodings, because some encodings are absolutely brain-damaged, to tell the truth.
Yes, it will recode to UTF-8. But it works for UTF-8, so that's the point here. Actually, in GB-something and Shift JIS you can encode Cyrillic letters as well, so they are fully Unicode-compatible. But again, that's what I'm talking about: brain-damaged encodings. Right, Danian?
And the second has a policy of none. If you break a signature with a policy of none, there are no implications. And adding another DKIM signature for your domain, which says it... does that not compensate for breaking the other signature?
Well, it depends, but the only solution for this particular problem is actually ARC. I don't see any way to fix broken DMARC when you have to break it because of, like, standards. For instance, you need to write this GDPR note, I think everybody knows about that, but you have to do it.
And you have to break the From address? Change the sender, yes, that's another option, but sender rewriting is actually something that is also quite difficult to implement properly.
Doable. Doable, but 8 seconds.
Yes. Thank you very much.