Dec 17 2004
HOW TO COMBAT SPAM WITH OPEN SOURCE
At around 3000 spam messages per month, reading email has gone beyond minor annoyance and into the realm of major pain. At first, I was pretty happy with Mozilla’s built-in anti-spam filter, a Bayesian algorithm that, to date, catches almost 90% of the spam I receive. However, now that I am getting around 100 spam emails per day, even the small percentage that creeps past Mozilla’s filter is starting to outnumber legitimate email. (13-Apr-2004: I have recently been receiving upwards of 1000 spam messages per day and this solution is still holding strong.)
What’s the plan?
This is “Phase 2”. I want to really turn the screws on spam, but I’m not quite ready to implement strict whitelisting (only accepting mail from addresses that I expect it to come from). For this, I have chosen to try out Spam Assassin. It runs each message through an amazing array of tests, adds up a score for every test that matches, and marks the message as spam once the total crosses a threshold. I’m also trying Vipul’s Razor, a collaborative spam detection network, which Spam Assassin can use as one of its tests. More possibilities include installing DCC, which Spam Assassin can also use as a test, and Ricochet for reporting of junk mail.
It is not *terribly* difficult to install Spam Assassin if you happen to be receiving email from a UN*X-ish server that you can install software on. (Admittedly, most people do not fall in this category.) A quick Google search started me off at this great page on configuring systemwide Postfix+Procmail+SpamAssassin, which answers the question: So I have all the software installed - now what? After setting everything up, I ran into a reference (on the Spam Assassin Documentation page, logically enough) describing How to setup Postfix+custom filters+SpamAssassin+AnomySanitizer, which is a slightly more complex setup, but a worthwhile read.
How’s it working?
See the Updates and Changes section at the end of this document. Basically, I am now getting only a few spam messages per day with this setup, if that. There have not yet been any false positives, which is perhaps even more important than catching every last spam email. On average, what gets through is certainly less than 25% of the volume that got past Mozilla’s filter alone, and it feels more like 10%. The initial results after training the Spam Assassin Bayesian classifier were nothing short of phenomenal: 205 spam messages over the next 36 hours, zero slipped by Spam Assassin, and zero false positives.
0) Turn off loading of external images in your email client
To prevent spammers from using “Web Bugs” (an <img> tag with your email address encoded in the src attribute, which allows them to confirm that you looked at their email and that the email address is valid), turn off auto-loading of external images. In Mozilla/Netscape mail, this setting is found under Edit->Preferences->Privacy & Security->Images. Check the “Do not load remote images in Mail & Newsgroup messages” checkbox. Currently, you must also select the “Accept images that come from the originating server only” radio button, which unfortunately affects the browser as well. After making these changes, click OK.
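To make the mechanism concrete: a web bug is usually nothing more than a tiny remote image whose URL identifies the recipient. The rough sketch below lists the remote image URLs found in an HTML message body. The function name and the sample URL are made up for illustration, and the pattern is deliberately simplistic (it assumes the src attribute uses double quotes).

```shell
# list_remote_images: print the URL of every <img> tag on stdin whose src
# points at a remote http(s) server. Hypothetical helper, not part of
# Mozilla or Spam Assassin; assumes src="..." uses double quotes.
list_remote_images() {
    grep -Eio '<img[^>]*src="https?://[^"]+"' |
        sed -E 's/.*[Ss][Rr][Cc]="([^"]+)"/\1/'
}

# A made-up web bug: a 1x1 image with the recipient's address in the URL.
sample='<p>Hello</p><img width="1" height="1" src="http://tracker.example/open.gif?id=you@example.com">'
printf '%s\n' "$sample" | list_remote_images
```

Any URL this prints is one your mail client would fetch, and so reveal to the sender, if remote images were left enabled.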
1) Install SpamAssassin
I’m running RedHat 7.2 and prefer to use RPMs for administrative simplicity, so I went to the Download Page for Spam Assassin and clicked on the RPMs link. The README.txt said none were available for my version of RedHat (7.2), so I grabbed the source RPM. This requires a two-step process:
1) Use rpmbuild to build the source and generate the binary RPMs
> rpmbuild --rebuild spamassassin-2.53-1.src.rpm
2) Install the binary RPMs
> cd /usr/src/redhat/RPMS/i386
# rpm -i perl-Mail-SpamAssassin-2.53-1.i386.rpm
I ran into difficulty here as the Mail::SpamAssassin module requires HTML::Parser. I didn’t want to bother with RPMs for this, as the dependencies between perl modules are complex and finding RPMs to match them all is unlikely, or at least very painful. So, I used CPAN to install the module instead. If you haven’t used CPAN before, you’ll have to go through a few questions to configure it for your system before you can run the install command:
# perl -MCPAN -e shell
# > install HTML::Parser
…
While in CPAN, I upgraded all of the other required modules to the latest versions. Below is the list of modules required for Spam Assassin (SA) and modules required for Razor (RZ). There was quite a list of modules that had to be installed or upgraded to satisfy all the dependencies. I don’t remember what order these went in - I just installed those recommended for SA or RZ, and read through the messages that scrolled by, adding dependencies (Dep) or recommendations (Rec) as I went.
ExtUtils::MakeMaker (SA)
File::Spec (SA, couldn’t upgrade as perl < 5.8)
Pod::Usage (SA)
HTML::Parser (SA)
Sys::Syslog (SA, couldn't upgrade as perl < 5.8)
DB_File (SA)
Net::DNS (SA, RZ - FAILs tests, say no to 'test live DNS?')
Digest::SHA1 (RZ)
Net::Ping (RZ)
Time::HiRes (RZ)
Test::More (RZ)
Digest::Nilsimsa (RZ)
Digest::MD5 (RZ)
Digest::HMAC (RZ)
URI (RZ)
Data::HashUtils (Dep)
Devel::CoreStack (Rec)
Getopt::Long (Dep)
MD5 (Rec)
MIME::Base64 (Dep)
PodParser (Dep)
Storable (Rec)
Test (Rec)
Test::Harness (Dep)
Test::Simple (Dep)
...
# > install URI
# > quit
Now, we definitely have HTML::Parser, but RPM doesn’t know about it - so force the installation (or we could install Mail::SpamAssassin with CPAN and force install of the spamassassin and spamassassin-tools RPMs)
# rpm -i --nodeps perl-Mail-SpamAssassin-2.53-1.i386.rpm
# rpm -i spamassassin-2.53-1.i386.rpm
# rpm -i spamassassin-tools-2.53-1.i386.rpm
These RPMs are nice enough to install a spamd startup script in /etc/init.d, so you just have to set spamd to run on startup, then start it up.
# /sbin/chkconfig --add spamassassin
# /sbin/runlevel
N 3 <-- use this number, or your default runlevel, next:
# /sbin/chkconfig --level 3 spamassassin on
# /etc/init.d/spamassassin start
2) Install Vipul's Razor
I could not easily find RH 7.2 RPMs for this either, so I went ahead and downloaded the latest razor-agents from the Razor Sourceforge download page. Skip getting razor-agents-sdk as all of the perl modules were installed in the previous step. From here, simply untar, and do the standard perl app installation:
> tar -zxvf razor-agents-2.22.tar.gz
> cd razor-agents-2.22
> perl Makefile.PL
> make
> make test
# make install
Then, following the razor install instructions:
# razor-client
> razor-admin -create
3) Tell procmail to use SpamAssassin
This step is pretty easy. If you don’t have an /etc/procmailrc, create it and add a command to process incoming mail with spamc:
# cat > /etc/procmailrc
:0fw
| /usr/bin/spamc
4) Tell postfix to use procmail, and reload
Now you have to tell postfix to use procmail. This may already be the case. To check, open the file /etc/postfix/main.cf, search for the string mailbox_command, and make sure the file contains this line:
mailbox_command = /usr/bin/procmail
Then, have postfix reload its config files:
# /usr/sbin/postfix reload
5) Test your setup!
Send some test emails, spam and not, to verify that your new setup works as expected!
Updates and Changes
—-5/9/2003, 12:00PM: Installed Spam Assassin
Over the next three days:
204 spam messages
Slipped by Spam Assassin: 7 (2 x criticsVoice)
Of those, slipped by Mozilla: 3
False Positives, Spam Assassin: 0
False Positives, Mozilla: 0
—-5/12/2003, 12:01AM: Trained SA on 1000’s of spam/ham saved messages
> sa-learn --spam --mbox
> sa-learn --ham --mbox
Over the next 36 hours:
205 spam messages
Slipped by Spam Assassin: 0
Of those, slipped by Mozilla: 0
Slipped past Mozilla but not Spam Assassin: 2
(The Spam Assassin-wrapped email appeared in my Inbox)
False Positives, Spam Assassin: 0
False Positives, Mozilla: 2 (sprintpcs, SFBay Movie Goers)
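For comparison with the "almost 90%" I was getting from Mozilla alone, the counts above can be turned into catch rates. This is just arithmetic on the numbers in this log; the function name is my own.

```shell
# catch_rate TOTAL SLIPPED: percentage of spam caught, to one decimal place.
catch_rate() {
    awk -v total="$1" -v slipped="$2" \
        'BEGIN { printf "%.1f\n", 100 * (total - slipped) / total }'
}

catch_rate 204 7   # Spam Assassin alone, first three days
catch_rate 204 3   # Spam Assassin plus Mozilla's filter, same period
catch_rate 205 0   # after training the Bayesian classifier
```

That works out to 96.6%, 98.5%, and 100.0% respectively.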
—-5/13/2003, 12:15PM: Turned off Mozilla spam filter, setup filter for SA
As evidenced by the previous day, the Mozilla default spam filter was causing more problems than it was solving, so I decided to turn it off (Tools->Junk Mail Controls…->uncheck ‘Enable Junk Mail Controls’), and rely entirely on Spam Assassin’s filtering. This required setting up a Mozilla email filter to move Spam Assassin-marked spam to a Junk folder, similar to how Mozilla’s default spam filter does things. (Tools->Message Filters->New)
Set up a Mozilla filter to catch Spam Assassin-marked spam:
Match all of:
Custom Header: “X-Spam-Flag” is “YES”
Sender isn’t in my address book Personal Address Book
Sender isn’t in my address book Collected Addresses
Also, a filter to catch spam email spoofed “from me”:
(needed since I’m in my address book)
Match all of:
Custom Header: “X-Spam-Flag” is “YES”
Sender ends with:
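Both filters key off the X-Spam-Flag header that Spam Assassin adds to messages it considers spam. If you want to check for that header outside of Mozilla (for instance, when testing the setup from a shell), something like the following sketch would do. The function name is hypothetical, and it only looks at the header block, so the same text appearing in a message body does not count.

```shell
# is_flagged_spam: succeed if the message on stdin carries "X-Spam-Flag: YES"
# in its header block (everything before the first blank line).
is_flagged_spam() {
    awk 'BEGIN { found = 0 }
         /^$/ { exit }                       # blank line: end of headers
         tolower($0) ~ /^x-spam-flag:[ \t]*yes[ \t]*$/ { found = 1; exit }
         END { exit !found }'
}

# A made-up two-header message that has already been marked.
printf 'From: someone@example.com\nX-Spam-Flag: YES\n\nBuy now!\n' |
    is_flagged_spam && echo "marked as spam"
```

On a machine with the setup above, piping a saved message through spamc first (spamc < message.txt | is_flagged_spam, where message.txt is whatever test message you saved) should tell you whether Spam Assassin would mark it.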
Over the next day:
False Positives, Spam Assassin: 0
Slipped by Spam Assassin: 2
Tiffany Bubbles, SA Bayes messed up, Razor caught
Taught SA Bayes with sa-learn
Critics Voice, Bayes messed up
Taught SA Bayes with sa-learn
—-5/15/2003: Configured Postfix to block known spammer domain
Update to main.cf:
smtpd_client_restrictions = check_client_access hash:/etc/postfix/access
smtpd_sender_restrictions = check_sender_access hash:/etc/postfix/access
Update to access:
criticsvoice.com 550 Go Away
.criticsvoice.com 550 Go Away
Rebuild map and reload Postfix config
# postmap /etc/postfix/access
# /usr/sbin/postfix reload
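Since every blocked domain needs two entries (the domain itself and a leading-dot entry for its subdomains), the access file can be generated from a plain list of domains. A small sketch, assuming one domain per line on standard input; the function name is my own invention.

```shell
# make_access_entries: read one domain per line on stdin and print a reject
# entry for the domain itself and for its subdomains (leading dot).
make_access_entries() {
    while IFS= read -r domain; do
        [ -n "$domain" ] || continue      # skip blank lines
        printf '%s 550 Go Away\n' "$domain"
        printf '.%s 550 Go Away\n' "$domain"
    done
}

printf '%s\n' criticsvoice.com savingshaus.com | make_access_entries
```

The output can be reviewed and appended to /etc/postfix/access, followed by the same postmap and postfix reload commands as above.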
I tested this setup by manually sending some test messages to the server, and they were blocked. Also, later in the day, when criticsvoice.com tried to connect for its daily delivery of spam, it was rejected. Success!
Manual Test:
> telnet
HELO mailcv-ne1.criticsvoice.com
MAIL FROM: prof@criticsvoice.com
RCPT TO:
DATA
This is a test
.
QUIT
—-5/19/2003
criticsvoice.com is no longer attempting delivery to our server, so I added some more spammer domains on the off chance they will follow suit:
Update to access:
criticsvoice.com 550 Go Away
.criticsvoice.com 550 Go Away
savingshaus.com 550 Go Away
.savingshaus.com 550 Go Away
lolslideshow.com 550 Go Away
.lolslideshow.com 550 Go Away
enquired.net 550 Go Away
.enquired.net 550 Go Away
offerdelivery.com 550 Go Away
.offerdelivery.com 550 Go Away
acumenmedia.com 550 Go Away
.acumenmedia.com 550 Go Away
# postmap /etc/postfix/access
# /usr/sbin/postfix reload
—-5/20/2003 Update Spam Assassin to Latest Version, Install DCC, Update learn/report script
Noticed there was a new version of Spam Assassin out now, so downloaded the latest source RPMs, built, and upgraded - easy!
> wget http://spamassassin.org/released/RPMs/spamassassin-
# rpmbuild --rebuild spamassassin-
# cd /usr/src/redhat/RPMS/i386
# rpm -U --nodeps perl-Mail-SpamAssassin-
# rpm -U spamassassin-
Shutting down spamd: [ OK ]
Starting spamd: [ OK ]
# rpm -U spamassassin-tools-
Going through messages, I noticed that missed spam was frequently a result of being incorrectly marked as good email by the Bayesian classifier, even though it was still 90%+ on Razor. It seems Razor does not carry much weight in determining the spam score. I went through the Spam Assassin config files, and in /usr/share/spamassassin/50_scores.cf noticed that Razor has at most a score of under 2, whereas DCC gets a score of over 3 and Pyzor gets up to 4.4! I decided to download, build, and install DCC (didn’t bother to look for RPMs). Pyzor requires Python 2.2.1, which would break me out of stock RedHat 7.2, so I’m holding off for now…
> wget http://www.rhyolite.com/anti-spam/dcc/source/dcc-dccd.tar.Z
> tar -zxvf dcc-dccd.tar.Z
> cd dcc-dccd-
> ./configure
# make install
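To see why the per-test scores matter so much, remember that Spam Assassin adds up the score of every test a message hits and compares the total to a threshold (5 by default). The sketch below mimics that sum. The rule names and scores are made-up stand-ins in the ranges mentioned above, not values copied from 50_scores.cf, and the function names are my own.

```shell
# total_score: add up the second column of "RULE score" lines on stdin.
total_score() {
    awk '{ sum += $2 } END { printf "%.1f\n", sum }'
}

# verdict REQUIRED: print "spam" if the summed scores reach REQUIRED, else "ham".
verdict() {
    awk -v required="$1" '{ sum += $2 }
        END { if (sum >= required) print "spam"; else print "ham" }'
}

# Illustrative scores only: a Razor hit, a Bayes "looks like ham" result,
# and one other matching rule.
hits='RAZOR 1.9
BAYES_HAM -1.5
OTHER_RULE 2.0'

printf '%s\n' "$hits" | total_score                # 2.4
printf '%s\n' "$hits" | verdict 5                  # ham
printf '%s\nDCC 3.1\n' "$hits" | verdict 5         # spam (total 5.5)
```

With numbers like these, a Razor hit alone cannot outvote a mistaken Bayes result, while an additional DCC hit pushes the same message over the line.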
While browsing the SpamAssassin FAQ, I whipped up a little script to bulk teach the Spam Assassin Bayesian classifier while at the same time reporting the spam to the various collective checksum services. You might also want to check out this much more complete spam handling script.
#!/usr/bin/perl
use strict;
use warnings;

my $confirmed_ham_mbox  = 'mail/confirmed_ham';
my $confirmed_spam_mbox = 'mail/confirmed_spam';
my $processed_ham_mbox  = 'mail/processed_ham';
my $processed_spam_mbox = 'mail/processed_spam';

if ( -s $confirmed_spam_mbox ) {
    # Report and learn from all the manually confirmed spam
    ystem("formail +1 -s spamassassin -r < $confirmed_spam_mbox");
    movemail($confirmed_spam_mbox, $processed_spam_mbox);
} else {
    print "No confirmed spam mbox ($confirmed_spam_mbox) found.\n";
}

if ( -s $confirmed_ham_mbox ) {
    # Learn from all manually confirmed ham
    ystem("formail +1 -s sa-learn --ham --no-rebuild --single < $confirmed_ham_mbox");
    ystem('sa-learn --rebuild');
    movemail($confirmed_ham_mbox, $processed_ham_mbox);
} else {
    print "No false positive mbox ($confirmed_ham_mbox) found.\n";
}

# Move confirmed mail to another mbox so we don't process it again.
# spamassassin handles multiple reporting correctly (ignores), but
# this will speed things up. Note: $tobox mbox must already exist!
sub movemail {
    my ($frombox, $tobox) = @_;
    # Append everything after the first message to the processed mbox
    ystem("formail +1 -s < $frombox >> $tobox");
    ystem("mv $frombox $frombox.bak");
    # Recreate the source mbox with only its first message
    ystem("formail -1 -s < $frombox.bak >> $frombox");
    # ystem("rm -r $frombox.bak");
}

# Print a shell command, then run it.
sub ystem {
    my $cmd = shift;
    print "Calling: $cmd\n";
    system $cmd;
}
In order to report spam to Razor, you must create an identity to be used in ranking confidence in your ratings.
> razor-admin -register
Error 202 while performing register, aborting.
I got an error the first time, but tried the command again, just for the heck of it, and it worked that time…
> razor-admin -register
Register successful. Identity stored in /home/
—-5/23/2003 Where did the spam go?
Wow, the latest version is awesome! I’ve only received one spam in the past three days. I’m actually starting to miss it! ;) My blocked domains list has grown to the following:
Contents of /etc/postfix/access:
criticsvoice.com 550 Go Away
.criticsvoice.com 550 Go Away
savingshaus.com 550 Go Away
.savingshaus.com 550 Go Away
lolslideshow.com 550 Go Away
.lolslideshow.com 550 Go Away
enquired.net 550 Go Away
.enquired.net 550 Go Away
offerdelivery.com 550 Go Away
.offerdelivery.com 550 Go Away
acumenmedia.com 550 Go Away
.acumenmedia.com 550 Go Away
asp-platform.com 550 Go Away
.asp-platform.com 550 Go Away
floortwo.com 550 Go Away
.floortwo.com 550 Go Away
etracks.com 550 Go Away
.etracks.com 550 Go Away
shoppersville.net 550 Go Away
.shoppersville.net 550 Go Away
dbhits.com 550 Go Away
.dbhits.com 550 Go Away
spyvalues.com 550 Go Away
.spyvalues.com 550 Go Away
springeoffers.com 550 Go Away
.springeoffers.com 550 Go Away
thatsoundsgood.net 550 Go Away
.thatsoundsgood.net 550 Go Away
optnetwork.net 550 Go Away
.optnetwork.net 550 Go Away
neatfunstuff.net 550 Go Away
.neatfunstuff.net 550 Go Away
pandabearperks.com 550 Go Away
.pandabearperks.com 550 Go Away
reminderpro.com 550 Go Away
.reminderpro.com 550 Go Away
123greetings.info 550 Go Away
.123greetings.info 550 Go Away
online-shop-exchange.com 550 Go Away
.online-shop-exchange.com 550 Go Away
getbiggerfaster.org 550 Go Away
.getbiggerfaster.org 550 Go Away
clix.pt 550 Go Away
.clix.pt 550 Go Away
# postmap /etc/postfix/access
# /usr/sbin/postfix reload
—-6/1/2003 Still holding strong, bye-bye blocklist
Still no false positives, and getting at most a spam or two every few days. I got tired of constantly updating my list of blocked domains, so I just got rid of it! Spam Assassin still catches almost all of the rogue emails anyway, and it’s a much more efficient use of my time to just confirm a few more spam messages than to add an entry to the blocked domains file and rebuild the map every time a repeat offender crops up.
Contents of /etc/postfix/access: (now empty)
# postmap /etc/postfix/access
# /usr/sbin/postfix reload
—-1/12/2004 Latest version strongest yet except Habeas weakness
After seeing the number of spams slowly creep upwards to a few a day (under a torrent of now nearly 200 spam messages per day, I wasn’t surprised), I upgraded to the latest Spam Assassin a few weeks back. All spam was again silenced (this time at the expense of a couple of false positives in as many weeks - both were mass email forwards from friends). However, I’ve noticed a recent rash of identical messages creeping through. Examining the headers, it seems a Spam Assassin rule is being used against it: every single one of them contained these headers.
X-Habeas-SWE-1: winter into spring
X-Habeas-SWE-2: brightly anticipated
X-Habeas-SWE-3: like Habeas SWE (tm)
X-Habeas-SWE-4: Copyright 2002 Habeas (tm)
X-Habeas-SWE-5: Sender Warranted Email (SWE) (tm). The sender of this
X-Habeas-SWE-6: email in exchange for a license for this Habeas
X-Habeas-SWE-7: warrant mark warrants that this is a Habeas Compliant
X-Habeas-SWE-8: Message (HCM) and not spam. Please report use of this
X-Habeas-SWE-9: mark in spam to
If you’ve read the Habeas site, the fact that these are being included in spam now should not come as a surprise. In fact, I’d wager that more spam than legitimate messages will bear said headers over the life of the system. I have reported all the messages, but I won’t waste my time doing so for long. That, of course, will be the downfall of the Habeas system. If only they had decided to use a cryptographically signed header that SA could verify, I *might* expect it to work…
Message sent to Habeas after forwarding illegal spam messages:
I get a *lot* of spam.
I currently use Spam Assassin (SA) as my spam filter. It is *very* good.
In the past few days, 100% of the spam that has slipped through the filter has contained your headers. In fact, SA has a “learning” Bayesian filter and every message containing your headers now will come through with 99% “learned” probability of being spam.
Does Habeas keep statistics on the number of spam emails reported versus the number of legitimate email bearing your headers? I’m beginning to think that the numbers would point towards classifying all mail containing your headers as spam.
Why didn’t you include a mechanism to cryptographically sign the headers so that smart spam filters could verify the email is legit?
Hoping for your success,
-Eric
—-4/13/2004 Massive Flood: dictionary attacks
The nature of my spam problem changed this week. I went from getting 100 to 200 spams per day to getting 2500 spams in the past 2 days. A large number of them are similar in nature to messages I’ve received before, but about 2/3 are due to a dictionary attack against one of my domains. One or more spammers are sending the same message to dozens of random names @ domain. I currently have a ‘catchall’ setup where all of these messages are accepted and delivered to my account. I’m seriously contemplating refusing all unknown recipients. In general, this has not caused much additional work for me, as Spam Assassin has held strong, only allowing a handful per day to sneak by.
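Before dropping the catchall, it would be worth knowing how much incoming mail is actually addressed to names I never use. A rough sketch of that check follows, assuming you can pull one recipient address per line out of your mail log or mbox. The function name, the file of known names, and the addresses are all hypothetical.

```shell
# unknown_recipients KNOWN_FILE: read recipient addresses on stdin (one per
# line) and print those whose local part (the bit before the @) is not
# listed in KNOWN_FILE. Matching is case-insensitive.
unknown_recipients() {
    awk -F'@' -v known_file="$1" '
        BEGIN { while ((getline name < known_file) > 0) known[tolower(name)] = 1 }
        !(tolower($1) in known)'
}

# Made-up example: two real mailboxes, three incoming recipients.
known=$(mktemp)
printf '%s\n' eric postmaster > "$known"
printf '%s\n' eric@example.com aaron@example.com zoe@example.com |
    unknown_recipients "$known"
rm -f "$known"
```

In this example the two dictionary-style names (aaron and zoe) are printed, which is the mail that refusing unknown recipients would turn away.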
