SpamAssassin to Identify Spam

Outdated page.

Updated information at: Classic Linux.

Also please see: https://discourse.rahul.net/.


The SpamAssassin spam-detection program scans all incoming email in the classic_linux environment. SpamAssassin only assigns a spam score to each message; it does not block any message or move any message into a spam folder. You can act on that score in whatever way you prefer by adding recipes to your .procmailrc file.

This implementation of SpamAssassin includes Bayesian (i.e., probabilistic) spam filtering that uses a central database.

Every email message you receive will carry a spam rating added by SpamAssassin. Look for headers beginning with 'X-Spam-', for example:

  X-Spam-Flag:
  X-Spam-Report:
  X-Spam-Status:
  X-Spam-Level:

Using these headers, you can add recipes to your .procmailrc file that refile or delete spam according to your preferences.
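As an illustration of acting on these headers, the sketch below writes a sample procmail recipe to a scratch file so it can be reviewed before anything is copied into .procmailrc. The scratch filename and the ./Mail/spam folder are hypothetical, and the recipe assumes SpamAssassin's usual header values ("X-Spam-Flag: YES" for spam, one asterisk per point in X-Spam-Level); check the headers on your own mail before relying on it.

```shell
#!/bin/sh
# Write a sample spam-filing recipe to a scratch file for review.
# SAMPLE and the ./Mail/spam folder are hypothetical names, not site defaults.
SAMPLE="${SAMPLE:-procmailrc.spam-example}"

cat > "$SAMPLE" <<'EOF'
# File anything SpamAssassin flagged as spam into a separate mailbox.
:0:
* ^X-Spam-Flag: YES
./Mail/spam

# Alternative: act only on high scores (10 or more stars in X-Spam-Level).
# :0:
# * ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*
# ./Mail/spam
EOF

echo "Sample recipe written to $SAMPLE"
```

Filing into a mailbox rather than deleting outright keeps misclassified mail recoverable while you tune the threshold.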

Note: Mail arriving for certain Nojunk domains, such as boxmail.com, might not be scanned by SpamAssassin at this time.

You can adjust your SpamAssassin preferences by editing the user_prefs file in your .spamassassin directory. If the file does not already exist, SpamAssassin will create one for you automatically. You can also use a Usermin web interface to update your SpamAssassin preferences, as described on the page for oxygen.rahul.net.

Use the above web interface to adjust the following settings to your satisfaction:

Allowed and Denied Addresses -> 
  From: addresses to never classify as spam
  From: addresses to always classify as spam
Spam Classification ->
  Hits above which a message is considered spam
  Languages in email that are not considered potential spam
  Character sets in email that are not considered potential spam
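
The settings listed above can also be written directly in user_prefs. The sketch below writes a sample fragment to a scratch file instead of touching the real ~/.spamassassin/user_prefs. The mapping from the web-interface labels to option names is an assumption based on standard SpamAssassin configuration options, the addresses are placeholders, and ok_languages only takes effect if the language-detection (TextCat) plugin is loaded on the server.

```shell
#!/bin/sh
# Write a sample user_prefs fragment to a scratch file for review.
# PREFS_SAMPLE is a hypothetical filename; the addresses are placeholders.
PREFS_SAMPLE="${PREFS_SAMPLE:-user_prefs.example}"

cat > "$PREFS_SAMPLE" <<'EOF'
# From: addresses to never classify as spam
whitelist_from  friend@example.com

# From: addresses to always classify as spam
blacklist_from  *@spammer.example

# Hits above which a message is considered spam
required_score  5.0

# Languages and character sets not treated as potential spam
ok_languages    en
ok_locales      en
EOF

echo "Sample preferences written to $PREFS_SAMPLE"
```

Lowering required_score catches more spam at the cost of more false positives; raising it does the opposite.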

SpamAssassin has many other settings, but we recommend you change them only if you consider yourself to be an advanced SpamAssassin user.

SpamAssassin: How to Enable

Since SpamAssassin is automatically active, you need do nothing to enable it. However, if in the past you had explicitly enabled SpamAssassin by adding any lines into your .procmailrc file, you should remove those lines, to avoid making SpamAssassin run twice.

If you have lines like the following in your .procmailrc file, or any other lines that invoke “spamassassin” or “spamc”, you should remove them or comment them out.

# spamassassin beta-test {{{

# back up
:0c:
  ./Mail/Backupmail

# filter
:0fw
  | spamc

# }}}
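
To check whether your .procmailrc still contains such lines, a search along the following lines may help. This is a minimal sketch: the function name find_sa_lines is made up for illustration, and it only looks for the two command names mentioned above on lines that are not already commented out.

```shell
#!/bin/sh
# List uncommented lines in a procmailrc that mention spamassassin or spamc.
# find_sa_lines is a hypothetical helper name; the path defaults to ~/.procmailrc.
find_sa_lines() {
    rc="${1:-$HOME/.procmailrc}"
    # grep -n prefixes each hit with its line number; the second grep drops
    # hits whose first non-blank character is '#', i.e. existing comments.
    grep -n -E 'spamassassin|spamc' "$rc" | grep -v -E '^[0-9]+:[[:space:]]*#'
}

# Usage: find_sa_lines            (checks ~/.procmailrc)
#        find_sa_lines some/file  (checks another file)
```

If the function prints nothing, there are no active lines invoking either command. Any line it does print belongs to a recipe that should be removed or commented out as a whole, including its ':0' line.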

SpamAssassin: Bayesian Filtering

Bayesian filtering involves SpamAssassin learning which email you consider to be spam and which email you consider to be non-spam. SpamAssassin will automatically try to initialize the Bayesian database from some of your email messages by guessing which ones are spam and which ones are not. However, this will not give the best results. For best results, train SpamAssassin yourself; Bayesian filtering becomes fully effective only after you have done so.

Collect non-spam messages in a mailbox, and then use this command:

% sa-learn --progress --mbox --ham <filename>

where <filename> is the filename of the mailbox containing non-spam (termed 'ham').

Collect spam messages in another mailbox, and then use this command:

% sa-learn --progress --mbox --spam <filename>

The sa-learn command will automatically ignore duplicates, so you can efficiently run it on the same mailbox(es) over and over again.
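
Because repeated runs are cheap, the two commands can be wrapped in a small helper and rerun whenever the mailboxes grow. The sketch below is one way to do that; the function name learn_mbox, the SA_LEARN override (handy for a dry run with SA_LEARN=echo), and the mailbox paths in the usage lines are all hypothetical.

```shell
#!/bin/sh
# Feed one mbox file to sa-learn as either ham or spam.
# learn_mbox and SA_LEARN are hypothetical names used for this sketch.
learn_mbox() {
    kind="$1"   # "ham" or "spam"
    mbox="$2"   # path to an mbox-format mailbox

    case "$kind" in
        ham|spam) ;;
        *) echo "usage: learn_mbox ham|spam <filename>" >&2; return 2 ;;
    esac

    if [ ! -r "$mbox" ]; then
        echo "skipping $kind: cannot read $mbox" >&2
        return 1
    fi

    # Same invocation as the commands above, with the class filled in.
    "${SA_LEARN:-sa-learn}" --progress --mbox "--$kind" "$mbox"
}

# Usage (paths are examples only):
#   learn_mbox ham  "$HOME/Mail/ham"
#   learn_mbox spam "$HOME/Mail/spam"
```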

After sa-learn has seen a few hundred spam messages and a few hundred non-spam messages, Bayesian filtering will become active. You will then notice that the X-Spam-Status lines in your email include tokens such as BAYES_99, indicating the Bayesian probability that the message is spam.
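
One way to confirm that Bayesian filtering has become active is to tally the BAYES_ tokens in a mailbox of recently received mail. The sketch below is a rough check, not an official tool: the function name bayes_tokens is made up, and it assumes an mbox-format file in which each message starts with a "From " line and the X-Spam-Status header may be folded across continuation lines.

```shell
#!/bin/sh
# Count BAYES_nn tokens found in X-Spam-Status headers of an mbox file.
# bayes_tokens is a hypothetical helper name.
bayes_tokens() {
    awk '
        # A "From " line starts a new message and its header section.
        /^From /                 { in_hdr = 1; in_status = 0; next }
        # The first blank line ends the headers; the body is ignored.
        in_hdr && /^$/           { in_hdr = 0; in_status = 0; next }
        # Track whether the current header is X-Spam-Status.
        in_hdr && /^X-Spam-Status:/ { in_status = 1 }
        in_hdr && /^[^ \t]/ && !/^X-Spam-Status:/ { in_status = 0 }
        # Continuation lines begin with whitespace and keep in_status set.
        in_status && match($0, /BAYES_[0-9]+/) {
            count[substr($0, RSTART, RLENGTH)]++
        }
        END { for (t in count) print count[t], t }
    ' "$1" | sort
}

# Usage: bayes_tokens "$HOME/Mail/some-mailbox"
```

Each output line is a count followed by a token. If nothing is printed for mail that arrived after training, the Bayesian rules are probably not firing yet.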

SpamAssassin Security

SpamAssassin uses a shared database and a “spamd” daemon that changes to your username before processing your mail. Due to the nature of the implementation, a dedicated adversary logged into the machine could cause spamd to read your user_prefs file, or possibly add addresses to your whitelist, or read or write your Bayesian tokens. Since your Bayesian tokens are stored in a hashed form, it is unlikely that anybody else will be able to tell what text is contained in your email.

In practice we don't expect any of this to be a problem. In no case does SpamAssassin allow anybody to access any of your data other than your .spamassassin directory and your entries in the Bayesian database.

SpamAssassin: Upgrade from Private Installation

The following page is obsolete, but is available for reference. Please see:

old/classic/spamassassin_to_identify_spam.txt · Last modified: 2021/01/30 04:54 by admin