hints:teaching_system_recognize_spam_not

Differences

This shows you the differences between two versions of the page.

hints:teaching_system_recognize_spam_not [2021/02/21 07:23]
admin [Teaching the system from the command line]
hints:teaching_system_recognize_spam_not [2021/03/06 15:19]
admin [Teaching the system to recognize spam or not-spam]
Line 17: Line 17:
 | 2.0 BAYES_80 BODY: Bayes spam\\ probability is 80 to 95% [score: 0.8325]  | Spam with 83.25% certainty,\\ spam score increased by 2.0    | | 2.0 BAYES_80 BODY: Bayes spam\\ probability is 80 to 95% [score: 0.8325]  | Spam with 83.25% certainty,\\ spam score increased by 2.0    |
 | BAYES_HAM(-2.73) [98.80%]                                                 | Not-spam with 98.8% certainty,\\ spam score decreased by 2.73  | | BAYES_HAM(-2.73) [98.80%]                                                 | Not-spam with 98.8% certainty,\\ spam score decreased by 2.73  |
 +
 +Note that anything recognized as spam with high certainty is immediately rejected during incoming SMTP, so you will never see it (see [[/hints/classic_linux_mail_flow|Classic Linux mail flow]]).  Most of the time you will see these BAYES headers only when email was not recognized as spam with a high enough certainty.
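
As a rough illustration of how the two header shapes in the table above can be read, the following shell sketch extracts the rule name, the score change, and the reported certainty. The function name `bayes_summary` and the wording of its output are hypothetical, chosen for this example only; it assumes the header text has already been unfolded onto a single line and handles nothing beyond these two shapes.

```shell
#!/bin/sh
# Hypothetical helper (not part of any mail tool): summarize one Bayes line
# taken from a message's spam report headers.
bayes_summary() {
    printf '%s\n' "$1" | sed -n -E \
        -e 's/^ *(-?[0-9.]+) +(BAYES_[0-9]+) .*\[score: ([0-9.]+)\].*$/\2: score change \1, probability \3/p' \
        -e 's/^ *(BAYES_[A-Z]+)\((-?[0-9.]+)\) *\[([0-9.]+)%\].*$/\1: score change \2, certainty \3%/p'
}

# First shape: the rule score comes first, the probability is in [score: ...].
bayes_summary '2.0 BAYES_80 BODY: Bayes spam probability is 80 to 95% [score: 0.8325]'

# Second shape: the score change is in parentheses, the certainty in brackets.
bayes_summary 'BAYES_HAM(-2.73) [98.80%]'
```

A probability of 0.8325 corresponds to the 83.25% certainty shown in the table; a line that matches neither shape produces no output.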
 ===== How the system learns ===== ===== How the system learns =====
  
Line 35: Line 37:
 Or select a bunch of ham (not-spam) messages, and forward them **as an attachment** to the ham-recognizing email address: <hamtrap@rahul.net>. Or select a bunch of ham (not-spam) messages, and forward them **as an attachment** to the ham-recognizing email address: <hamtrap@rahul.net>.
  
-These email addresses will automatically tokenize the contents of whatever they get, an add these tokens into the database as spam or ham tokens respectively.+These email addresses will automatically tokenize the contents of whatever they get, and add these tokens into the database as spam or ham tokens respectively.
  
 **Ham tokens are important.** They let the Bayesian system learn how to recognize legitimate email, thus subtracting from its spam score, and making it less likely to be erroneously classified as spam. Make it a point to send at least as much ham as you send spam. **Ham tokens are important.** They let the Bayesian system learn how to recognize legitimate email, thus subtracting from its spam score, and making it less likely to be erroneously classified as spam. Make it a point to send at least as much ham as you send spam.
Line 46: Line 48:
  
 The most useful messages with which to teach the system **are those messages that the system recognized incorrectly**. If the system already correctly recognized spam or ham, it won't learn a lot. If the system incorrectly recognized ham as spam or spam as ham, that is where it most needs to learn. The most useful messages with which to teach the system **are those messages that the system recognized incorrectly**. If the system already correctly recognized spam or ham, it won't learn a lot. If the system incorrectly recognized ham as spam or spam as ham, that is where it most needs to learn.
 +
 +Sending the same thing twice does no harm. The system will recognize duplicate content and not learn from it twice.
  
 If you accidentally send spam to the hamtrap address, or vice versa, just send the same content again this time to the correct address. The system will unlearn the incorrect content and re-learn it the right way. If you accidentally send spam to the hamtrap address, or vice versa, just send the same content again this time to the correct address. The system will unlearn the incorrect content and re-learn it the right way.
Line 60: Line 64:
 Sending the same thing twice does no harm. The system will recognize duplicate content and not learn from it twice. Sending the same thing twice does no harm. The system will recognize duplicate content and not learn from it twice.
  
-If you accidentally feed them spam instead of ham or vice versa, just repeat with the correct command. The system will unlearn the incorrect content and re-learn it the right way.+If you accidentally feed these commands spam instead of ham or vice versa, just repeat with the correct command. The system will unlearn the incorrect content and re-learn it the right way.
  
 ===== Debugging the learning ===== ===== Debugging the learning =====
Line 77: Line 81:
 |    ... with verbose output |'' sa-learn -D %%--%%ham filename ''| |    ... with verbose output |'' sa-learn -D %%--%%ham filename ''|
  
- 
-All these commands will recognize duplicate content and not learn from it twice. 
  
 These commands (unlike '' spamtrap '' and '' hamtrap '') will also take one or more directory names, and they will cause every file in each directory to be processed. These commands (unlike '' spamtrap '' and '' hamtrap '') will also take one or more directory names, and they will cause every file in each directory to be processed.
Line 84: Line 86:
 If no filename is specified, these commands will read from their standard input. If no filename is specified, these commands will read from their standard input.
  
-If you accidentally feed them spam instead of ham or vice versa, just repeat with the correct command. The system will unlearn the incorrect content and re-learn it the right way.+Sending the same thing twice does no harm. The system will recognize duplicate content and not learn from it twice. 
 + 
 +If you accidentally feed these commands spam instead of ham or vice versa, just repeat with the correct command. The system will unlearn the incorrect content and re-learn it the right way.
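
The command-line workflow described above can be sketched as a small wrapper around `sa-learn`. The function name `teach`, the `DRY_RUN` variable, and the file and directory names are all hypothetical, invented for this example. By default the sketch only prints the command it would run, so it is safe to try on a machine without `sa-learn` installed.

```shell
#!/bin/sh
# Hypothetical wrapper: teach spam|ham [file-or-directory...]
# With DRY_RUN=1 (the default here) it only prints the sa-learn command.
teach() {
    kind=$1
    case $kind in
        spam|ham) ;;
        *) echo "usage: teach spam|ham [file-or-directory...]" >&2; return 2 ;;
    esac
    shift
    # Rebuild the argument list as the full sa-learn command line.
    set -- sa-learn "--$kind" "$@"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One message that was wrongly delivered as legitimate mail (hypothetical file name).
teach spam missed-spam.eml

# A whole directory of legitimate messages (hypothetical directory name).
teach ham good-mail

# Fed to the wrong command by mistake? Repeat with the correct one.
teach ham newsletter.eml
teach spam newsletter.eml
```

Setting `DRY_RUN=0` runs the real command. With no file or directory arguments, the wrapper passes none to `sa-learn`, which then reads the message from standard input as described above.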
hints/teaching_system_recognize_spam_not.txt · Last modified: 2021/03/06 21:07 by admin