October 2003

SpamAssassin Bayes training in a single-user procmail setup

Update 2005-04-17: Please also check out using IMAP for learning, the wiki entry ProcmailToForwardMail (which contains a slightly updated script), and the very detailed directions at SingleUserUnixInstall.

SpamAssassin is currently the most effective spam filter I know of. For me, it correctly marks several hundred messages a day as spam, with nearly no false positives and only a couple of false negatives. Those false negatives (spam that gets through) should be avoidable by training the Bayes classifier that SpamAssassin uses. Unfortunately, using Outlook with an Exchange server as my mailer makes this incredibly hard to do.

Note that Bayes does not need to be hand-trained in order to work well. The magic of SpamAssassin is that Bayes bootstraps its learning off the several hundred non-Bayes rules, including the DNS blocklist checks. So spammy messages that hit certain rules train Bayes to find similar spam in the future, even spam that doesn’t hit those rules. Thus, the purpose of these procmail rules is simply to enable mistake-based training, which catches the small percentage of false negatives that might otherwise slip through.

Like many SpamAssassin users, I forward my mail through a Unix account, where I’ve configured procmail to filter the message through SpamAssassin and then forward it to my private address on another machine.

The trick for Bayes training is to add some extra procmail rules that specify special processing for training messages. The following is based on having a catchall address for all mail sent to example.com, so I can trigger Bayes training by sending mail to spam@example.com or ham@example.com. It is left as an exercise for the reader to create an alternative script that triggers on a passphrase added to the subject and uses formail to remove that passphrase before passing the message to sa-learn.
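As a rough illustration of the idea only (this is a minimal sketch, not the actual .procmailrc from this post), the training rules could look something like the following. It assumes that sa-learn and spamassassin are on the PATH, that sa-learn reads the message from standard input, and that me@private.example.net is a made-up stand-in for the private forwarding address.

```procmail
# Hypothetical sketch: not the author's real .procmailrc.

# Mail addressed to spam@example.com is fed to sa-learn as spam
# and goes no further.
:0
* ^TO_spam@example\.com
| sa-learn --spam

# Mail addressed to ham@example.com is fed to sa-learn as ham
# (a false positive being corrected).
:0
* ^TO_ham@example\.com
| sa-learn --ham

# Everything else: pass the message through SpamAssassin as a filter ...
:0fw
| spamassassin

# ... and forward the result to the private address (placeholder).
:0
! me@private.example.net
```

The training recipes come first so that a training message is consumed by sa-learn and never reaches the filter-and-forward rules.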

Note that this setup still works only passably with Outlook and Exchange, because even resending a message causes a new Message-ID header to be created and the old Received headers to be lost, although other headers are carried over. To trigger Bayes learning from Outlook on a false negative, choose Action: Resend this Message (you have to remove any From and CC entries and change the To field to spam@example.com). Nearly every other mailer (except AOL) supports real redirects; see the bottom of this site.

Here’s the .procmailrc:
Continue Reading »

Hacking

Police Subdue a Tiger in Harlem Apartment

I must be a little strange if this story only increases my appreciation of NYC:

To the sounds of enormous jungle roars, a police sniper rappelled down the side of a Harlem apartment building yesterday and fired tranquilizer darts through an open fifth-floor window to subdue — seat belts, please — a 350-pound Bengal tiger.

The daring, and creative, bit of sharpshooting helped end an episode in which the New York Police Department, unaccustomed to bagging big game, nonetheless managed to sedate the beast. Officials planned to send the tiger, temporarily being held at the Center for Animal Care and Control on 110th Street, to a conservancy in Ohio….

It was shortly before 4:30 p.m. when the police sniper, Officer Martin Duffy, armed with a dart gun and a rifle with live ammunition, began to rappel down toward the window. He fired one dart a few minutes later, which drew a knee-shaking roar from inside the apartment….

As hundreds of onlookers gathered on the street, some began to wonder if this urban big cat would get along so well in the less cosmopolitan reaches of Ohio. “My concern is that the city cat won’t make it in the country,” said Lynnette Braxton, 49. “He’s going to have no jazz, no hip-hop. He’s going to miss the Harlem Renaissance.”

No explanation of how the cat got there (along with a 4-5 ft. alligator-like reptile called a caiman). Presumably, the owner (now in custody) brought them in when they were much smaller. Compare the size of hand and paw:

[Image: tiger.jpg]

Cities