The problem of backscatter, part 8 - Why is it so hard to stop?
I came across the following diagram at this site, and it nicely summarizes the issue of backscatter spam:
Getting a single piece of backscatter spam is one thing; getting dozens, hundreds or even thousands of them is a major problem. Spammers may be nefarious in spamming you indirectly, but what's more annoying is that legitimate hosts are the ones sending you piles of messages that clutter up your inbox.
So why is backscatter so difficult to defend against? Here are some reasons:
IP reputation analysis doesn't work - Spammers who send from botnets have a weakness: public RBLs like Spamhaus list the bots' IP addresses, so content filters can reject all mail from them. In the case of backscatter, the sending mail server is not a bot. It is a legitimate server that is known to send good mail, so it doesn't belong on a blacklist (on the good blacklists, anyway).
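To see why a DNSBL catches bots but not bouncing servers, here is a minimal Python sketch of how a DNS-based blocklist is queried: reverse the IPv4 octets, append the list's zone and do an ordinary DNS lookup. An answer means "listed"; NXDOMAIN means "not listed". The default zone zen.spamhaus.org is only an example, the function names are hypothetical, and a production check would also interpret the specific return code and respect the list's usage policy.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str = "zen.spamhaus.org") -> str:
    """Build the DNS name used to look up an IPv4 address in a DNSBL.

    DNSBLs are queried by reversing the IPv4 octets and appending the
    list's zone, so 192.0.2.99 becomes 99.2.0.192.<zone>.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str = "zen.spamhaus.org") -> bool:
    """Return True if the DNSBL answers for this IP, i.e. it is listed."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True  # any A record means "listed"
    except socket.gaierror:
        return False  # NXDOMAIN (or a lookup failure): treat as not listed
```

A bot's IP tends to come back listed. The IP of a legitimate server that bounced a forged message normally does not, so this check never fires on backscatter.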
Identifying mail servers that send nothing but NDR backscatter is one thing, but if you block every mail server that sends you backscatter, you will end up blocking a lot of legitimate mail.
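A hypothetical sketch of the "exclusively NDR" test: tally, per source IP, how many messages arrived and how many of them were NDRs, and flag only the sources where every message was a bounce. The min_messages threshold and all names here are assumptions for illustration.

```python
from collections import defaultdict


def exclusive_ndr_sources(events, min_messages=10):
    """Return source IPs whose observed mail consists entirely of NDRs.

    events: iterable of (source_ip, is_ndr) pairs, one per message received.
    min_messages: hypothetical floor so a handful of bounces is not enough
    to flag a server.
    """
    totals = defaultdict(int)
    ndrs = defaultdict(int)
    for ip, is_ndr in events:
        totals[ip] += 1
        if is_ndr:
            ndrs[ip] += 1
    return sorted(
        ip for ip, total in totals.items()
        if total >= min_messages and ndrs[ip] == total
    )
```

A server that bounces a lot but also delivers plenty of real mail is never flagged, which is exactly the trade-off: blocking it would cost you its legitimate mail too.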
Sender reputation doesn't work - or rather, regular sender reputation doesn't work. When MTAs send backscatter, they usually send it with an empty SMTP MAIL FROM, the null sender <>. That way, if the NDR itself cannot be delivered, the receiving MTA won't bounce it back yet again; you can't bounce a message to the null sender.
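The null sender is visible right in the SMTP envelope, and RFC 5321 calls for NDRs to be sent with it. Below is a simplified Python sketch (hypothetical helper names, and a deliberately loose regex rather than a full RFC 5321 parser) that pulls the reverse-path out of a MAIL FROM command and reports whether it is empty.

```python
import re

# Matches "MAIL FROM:<address>" plus optional ESMTP parameters (e.g. SIZE=1234).
# Real servers tolerate more variations; this is a simplified sketch.
_MAIL_FROM = re.compile(r"^MAIL FROM:\s*<([^>]*)>(?:\s+.*)?$", re.IGNORECASE)


def reverse_path(command: str) -> str:
    """Return the reverse-path of a MAIL FROM command ('' for the null sender)."""
    match = _MAIL_FROM.match(command.strip())
    if match is None:
        raise ValueError(f"not a MAIL FROM command: {command!r}")
    return match.group(1)


def is_null_sender(command: str) -> bool:
    """True when the envelope sender is <>, as it is for most NDRs."""
    return reverse_path(command) == ""
```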
Traditional sender reputation assumes that the inbound message comes to you directly from its sender. So you run an SPF check on the SMTP MAIL FROM, but since the sender is empty, there is no sender domain to verify. (SPF falls back to checking the HELO name, but that only tells you about the bouncing server, which is legitimate anyway.) The spam filter is forced to rely on something else.
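For reference, SPF (RFC 7208, section 2.4) handles an empty MAIL FROM by substituting postmaster@ followed by the HELO domain. The Python sketch below, with hypothetical function names, shows which domain ends up being checked. It only illustrates identity selection, not a full SPF evaluation.

```python
def spf_mail_from_identity(reverse_path: str, helo: str) -> str:
    """Pick the identity an SPF check evaluates for the MAIL FROM step.

    Per RFC 7208 section 2.4, a null reverse-path is replaced by
    postmaster@<HELO domain>.
    """
    if reverse_path == "":
        return f"postmaster@{helo}"
    return reverse_path


def spf_domain(identity: str) -> str:
    """The domain whose SPF record is consulted (the part after the last '@')."""
    return identity.rsplit("@", 1)[-1].lower()
```

For a bounce from mx1.bouncer.example, the check lands on mx1.bouncer.example, the legitimate server that generated the NDR, and says nothing about the spammer who forged your address in the first place.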
Content filtering is more difficult - NDR messages are a pain to handle. You can't do regular sender or IP reputation analysis on NDRs during the SMTP conversation, so you have to accept the message body. Then, if you still want to run those checks, you need to parse the body of the message (not the headers of the NDR itself) to extract the information you need, such as the headers of the original message that triggered the bounce.
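Standards-compliant bounces (RFC 3464 delivery status notifications) are multipart/report messages whose last part carries either the whole original message (message/rfc822) or just its headers (text/rfc822-headers). A minimal Python sketch using the standard email package might dig out those returned headers like this; returned_headers is a hypothetical name, and non-standard bounces simply yield None.

```python
from email import message_from_string
from email.message import Message
from typing import Optional


def returned_headers(ndr: Message) -> Optional[Message]:
    """Return the headers of the original message carried inside an NDR.

    Looks for a message/rfc822 part (full original) or a
    text/rfc822-headers part (headers only). Returns None if neither exists.
    """
    for part in ndr.walk():
        ctype = part.get_content_type()
        if ctype == "message/rfc822":
            payload = part.get_payload()
            # The email package parses message/rfc822 into a one-element list.
            if isinstance(payload, list) and payload:
                return payload[0]
        elif ctype == "text/rfc822-headers":
            raw = part.get_payload(decode=True).decode("utf-8", "replace")
            return message_from_string(raw)
    return None
```

Once you have the returned headers, you can pull out the Received lines and run the IP or sender checks you would normally have done at SMTP time.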
To do this, your spam filter needs specialized logic to recognize that a message is a bounce and to treat it differently when extracting tokens. This is trickier than it sounds because different MTAs format their bounces differently. Remember the RFC guidelines? Hopefully, the bouncing MTA sends back the original message, including its headers. If it doesn't, or if it alters or munges them along the way, header analysis is not going to work very well.
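Because so many bounces don't follow the standard, recognizing one usually comes down to heuristics. The Python sketch below is one hypothetical way to combine them: it trusts a proper multipart/report; report-type=delivery-status declaration outright, and otherwise requires at least two weaker signals (a null Return-Path, a MAILER-DAEMON or postmaster From address, a bounce-like Subject). The patterns and the two-signal rule are assumptions, not a tested rule set.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Hypothetical patterns for illustration; real bounces vary widely by MTA.
_BOUNCE_SENDERS = re.compile(r"^(mailer-daemon|postmaster)@", re.IGNORECASE)
_BOUNCE_SUBJECTS = re.compile(
    r"undeliver|delivery status notification|returned mail|delivery failure",
    re.IGNORECASE,
)


def looks_like_bounce(raw: str) -> bool:
    """Heuristically decide whether a raw RFC 5322 message is a bounce."""
    msg = message_from_string(raw)

    # Standards-compliant DSNs declare themselves in the Content-Type.
    report_type = msg.get_param("report-type")
    if (msg.get_content_type() == "multipart/report"
            and isinstance(report_type, str)
            and report_type.lower() == "delivery-status"):
        return True

    # Otherwise count weaker signals and require at least two of them.
    signals = 0
    if str(msg.get("Return-Path", "")).strip() == "<>":
        signals += 1  # null envelope sender recorded by the final MTA
    _, from_addr = parseaddr(str(msg.get("From", "")))
    if _BOUNCE_SENDERS.match(from_addr):
        signals += 1
    if _BOUNCE_SUBJECTS.search(str(msg.get("Subject", ""))):
        signals += 1
    return signals >= 2
```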
Regular content filtering is more difficult - If you don't want to go to the trouble of extracting all those tokens and re-running the checks you would normally do during the SMTP conversation (i.e., SPF or DKIM checks, or IP reputation), you could fall back to regular content filtering. Your inbound spam filter simply examines the message content and, if it finds spam, filters out the message regardless of whether or not it is an NDR.
The problem here is that content filters often detect the spam and mark the message as spam, but they also detect that the message is a bounce and de-spamify the earlier classification. In other words, the filter is smart enough to recognize that this is a bounce, and possibly a legitimate one, so it lowers the overall spam score of the message even though spam is attached. This leads to inconsistent filtering of backscatter: not all of it gets through to the user's inbox, but some of it does.
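A toy calculation shows how a bounce credit produces the inconsistency. The threshold and adjustment below are made-up numbers, not taken from any real filter: the same spam payload that scores 6.5 is caught when it arrives directly but slips under the threshold once it is wrapped in a bounce, while a spammier payload scoring 9.0 is caught either way.

```python
# Hypothetical weights for illustration only; real filters combine many
# more signals with tuned values.
SPAM_THRESHOLD = 5.0
BOUNCE_ADJUSTMENT = -3.0  # credit for "this looks like a legitimate NDR"


def final_score(content_score: float, is_bounce: bool) -> float:
    """Combine the content score with a bounce credit, as some filters do."""
    return content_score + (BOUNCE_ADJUSTMENT if is_bounce else 0.0)


def is_spam(content_score: float, is_bounce: bool) -> bool:
    """Classify the message against the (hypothetical) spam threshold."""
    return final_score(content_score, is_bounce) >= SPAM_THRESHOLD
```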
That summarizes the problem of backscatter. Relying on regular inbound mail filtering to detect and filter backscatter is problematic because NDR messages are different from regular mail: they are notifications. In order to catch them, you need to do something different from the way you normally catch spam.