hints:teaching_system_recognize_spam_not

Current revision: 2021/03/06 21:07 by admin [How you can teach the system]. Previous revision: 2021/02/21 07:23 by admin [Teaching the system from the command line].
| 2.0 BAYES_80 BODY: Bayes spam\\ probability is 80 to 95% [score: 0.8325]  | Spam with 83.25% certainty,\\ spam score increased by 2.0    |
| BAYES_HAM(-2.73) [98.80%]                                                 | Not-spam with 98.8% certainty,\\ spam score decreased by 2.73  |

Note that any message recognized as spam with high certainty is rejected during the incoming SMTP transaction, so you will never see it (see [[/hints/classic_linux_mail_flow|Classic Linux mail flow]]). Most of the time you will see these BAYES headers only on email that was not recognized as spam with high enough certainty.
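If you save a message with its full headers to a file, a small shell function can list whatever BAYES results it carries, in either of the forms shown in the table above. This is a minimal sketch, assuming a ''grep'' that supports ''-o'' (GNU and BSD grep both do); the function name ''show_bayes'' is hypothetical. It searches every header line because the header that carries these results differs between setups.

```shell
#!/bin/sh
# Hypothetical helper: print the BAYES result(s) found in the header section
# of a saved message. Which header carries them differs between setups, so
# every header line is searched rather than one specific header name.
# Requires a grep with -o (GNU and BSD grep both have it).
show_bayes() {
    # Headers end at the first empty line; sed stops reading there, so
    # "BAYES_..." text inside the message body is ignored.
    sed '/^$/q' "$1" |
        grep -oE 'BAYES_[A-Z0-9]+(\([-0-9.]+\))?( \[[0-9.]+%])?' |
        sort -u
}
```

For example, ''show_bayes saved.eml'' prints lines such as ''BAYES_80'' or ''BAYES_HAM(-2.73) [98.80%]''.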

===== How the system learns =====

Or select a bunch of ham (not-spam) messages, and forward them **as an attachment** to the ham-recognizing email address: <hamtrap@rahul.net>.

These email addresses automatically tokenize the contents of whatever they receive and add those tokens to the database as spam or ham tokens, respectively.
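"Forward as an attachment" means the complete original message, headers included, travels inside the new message as a ''message/rfc822'' MIME part. The sketch below builds such a message from a saved original, assuming you have the original in a file and a sendmail-compatible command to send the result; the function name ''wrap_as_attachment'' is hypothetical, and your mail client does the same job with a menu choice.

```shell
#!/bin/sh
# Hypothetical helper: wrap one saved message (a file holding the complete
# original, headers included) as a message/rfc822 attachment inside a new
# multipart/mixed message, and print the result on standard output.
wrap_as_attachment() {
    to=$1      # spamtrap@rahul.net or hamtrap@rahul.net
    file=$2    # the saved original message
    # A boundary string that is very unlikely to occur in the attachment.
    boundary="=_wrap_$$_$(date +%s)"
    printf 'To: %s\n' "$to"
    printf 'Subject: %s\n' 'Forwarded message'
    printf 'MIME-Version: 1.0\n'
    printf 'Content-Type: multipart/mixed; boundary="%s"\n' "$boundary"
    # The empty line ends the outer headers; then the attachment part starts.
    printf '\n%s\n' "--$boundary"
    printf 'Content-Type: message/rfc822\n'
    printf 'Content-Disposition: attachment\n\n'
    cat "$file"
    # Closing boundary marks the end of the multipart body.
    printf '\n%s\n' "--$boundary--"
}
```

To send it, pipe the output to the sendmail-compatible command, which reads the recipient from the ''To:'' header when given ''-t'': ''wrap_as_attachment hamtrap@rahul.net saved.eml | sendmail -t -i''.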
  
**Ham tokens are important.** They let the Bayesian system learn how to recognize legitimate email, which lowers that email's spam score and makes it less likely to be misclassified as spam. Make it a point to send at least as much ham as you send spam.
If you are forwarding a single message to <spamtrap@rahul.net> or <hamtrap@rahul.net>, you can send it as an attachment or use the **bounce** or **resend** feature that some mail clients offer. All of these work equally well.

A normal forward (not as an attachment) often does not include the complete headers, so it is less useful, but it is still better than nothing.
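From a shell, the closest equivalent of **bounce** or **resend** is to hand the saved original, unchanged, to a sendmail-compatible command with the trap address given as the envelope recipient. A minimal sketch, assuming such a command is installed (Postfix, Exim and Sendmail all provide one); the function name ''bounce_message'' and the ''SENDMAIL'' override are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: "bounce" a saved message, i.e. re-deliver it unchanged
# to a new envelope recipient. Without -t, sendmail takes the recipient from
# the command line and ignores the To:/Cc: headers inside the message.
# -i stops a line containing only "." from ending the message early.
# SENDMAIL can be overridden; it defaults to the system's sendmail command.
bounce_message() {
    "${SENDMAIL:-sendmail}" -i "$1" < "$2"
}
```

For example, ''bounce_message spamtrap@rahul.net saved.eml''. Because the headers are passed through untouched, the system sees the message exactly as it first arrived.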
  
**Budget your time.** Life is short. Spend no more than, say, 2–3 minutes per week sending spam or ham. You can do it once a week, or you could spend just 5–10 seconds a day quickly forwarding two or three selected messages.

The most useful messages with which to teach the system **are those messages that the system recognized incorrectly**. If the system already recognized spam or ham correctly, it won't learn much from it. If the system incorrectly recognized ham as spam or spam as ham, that is where it most needs to learn.

Sending the same thing twice does no harm. The system will recognize duplicate content and not learn from it twice.

If you accidentally send spam to the hamtrap address, or vice versa, just send the same content again, this time to the correct address. The system will unlearn the incorrect content and re-learn it the right way.
Sending the same thing twice does no harm. The system will recognize duplicate content and not learn from it twice.

If you accidentally feed these commands spam instead of ham or vice versa, just repeat with the correct command. The system will unlearn the incorrect content and re-learn it the right way.
  
===== Debugging the learning =====
|    ... with verbose output |'' sa-learn -D %%--%%ham filename ''|
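The verbose variant writes its debug lines to standard error, and there are a lot of them. A sketch that keeps the full debug output in a log file and shows only the Bayes-related lines, assuming SpamAssassin's convention of tagging those lines ''bayes:''; the names ''learn_debug'' and ''SA_LEARN'' are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run "sa-learn -D" on one message, keep the complete
# debug output (which goes to standard error) in a log file, and print only
# the Bayes-related debug lines. sa-learn's normal summary line still goes
# to standard output as usual.
# SA_LEARN can be overridden; by default the real sa-learn is run.
learn_debug() {
    mode=$1    # --ham or --spam
    file=$2    # the message to learn from
    log=$3     # where to keep the full debug output
    "${SA_LEARN:-sa-learn}" -D "$mode" "$file" 2> "$log"
    grep 'bayes:' "$log"
}
```

For example, ''learn_debug %%--%%ham filename debug.log'' shows the Bayes lines on screen and leaves everything else in ''debug.log'' for a closer look.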
  
  
These commands (unlike '' spamtrap '' and '' hamtrap '') will also take one or more directory names, and they will cause every file in each directory to be processed.
If no filename is specified, these commands will read from their standard input.
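The kinds of input described above (directories, single files, standard input) can be combined in one small wrapper. This is a sketch under stated assumptions: it calls ''sa-learn'' directly, and it adds sa-learn's ''%%--%%mbox'' option when a single file starts with an mbox "From " line, because otherwise sa-learn treats a file as one message. The names ''learn_from'' and ''SA_LEARN'' are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around sa-learn. A directory is passed straight
# through (every file in it is one message); a single file that starts with
# an mbox "From " line gets --mbox so sa-learn splits it into messages; with
# no path at all, sa-learn reads the message from standard input.
# SA_LEARN can be overridden; by default the real sa-learn is run.
learn_from() {
    mode=$1        # --ham or --spam
    shift
    if [ $# -eq 0 ]; then
        "${SA_LEARN:-sa-learn}" "$mode"          # reads standard input
        return
    fi
    for path in "$@"; do
        if [ -f "$path" ] && head -n 1 "$path" | grep -q '^From '; then
            "${SA_LEARN:-sa-learn}" "$mode" --mbox "$path"
        else
            "${SA_LEARN:-sa-learn}" "$mode" "$path"
        fi
    done
}
```

For example, ''learn_from %%--%%ham ~/Mail/ham-samples'' would learn every file in that (hypothetical) folder, and ''gzip -dc saved.eml.gz | learn_from %%--%%spam'' learns one message from standard input.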
  
Sending the same thing twice does no harm. The system will recognize duplicate content and not learn from it twice.

If you accidentally feed these commands spam instead of ham or vice versa, just repeat with the correct command. The system will unlearn the incorrect content and re-learn it the right way.
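To check whether ham is keeping up with spam, and whether a relearn took effect, ''sa-learn %%--%%dump magic'' reports how many messages the database has learned as spam (''nspam'') and as ham (''nham''); after relearning one message from spam to ham you would expect ''nspam'' to drop by one and ''nham'' to rise by one. The parser below is a sketch assuming the usual layout of that output, where the count is in the third column; ''bayes_counts'' is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read "sa-learn --dump magic" output on standard input
# and print the number of messages learned as spam and as ham. In that
# output the third column holds the value and the text after
# "non-token data:" says what the value is.
bayes_counts() {
    awk '
        /non-token data: nspam/ { spam = $3 }
        /non-token data: nham/  { ham = $3 }
        END { printf "spam=%d ham=%d\n", spam, ham }
    '
}
```

Usage: ''sa-learn %%--%%dump magic | bayes_counts''. If the ham count is well below the spam count, that is a sign to send more ham.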
hints/teaching_system_recognize_spam_not.1613920987.txt.gz · Last modified: 2021/02/21 07:23 by admin