This shows you the differences between two versions of the page.
hints:teaching_system_recognize_spam_not [2021/02/21 07:23] admin [Teaching the system from the command line] |
hints:teaching_system_recognize_spam_not [2021/03/06 21:07] admin [How you can teach the system] |
||
---|---|---|---|
Line 17: | Line 17: | ||
| 2.0 BAYES_80 BODY: Bayes spam\\ probability is 80 to 95% [score: | | 2.0 BAYES_80 BODY: Bayes spam\\ probability is 80 to 95% [score: | ||
| BAYES_HAM(-2.73) [98.80%] | | BAYES_HAM(-2.73) [98.80%] | ||
+ | |||
+ | Note that anything recognized as spam with high certainty is immediately rejected during incoming SMTP, so you will never see it (see [[/ | ||
===== How the system learns ===== | ===== How the system learns ===== | ||
Line 35: | Line 37: | ||
Or select a bunch of ham (not-spam) messages, and forward them **as an attachment** to the ham-recognizing email address: < | Or select a bunch of ham (not-spam) messages, and forward them **as an attachment** to the ham-recognizing email address: < | ||
- | These email addresses will automatically tokenize the contents of whatever they get, an add these tokens into the database as spam or ham tokens respectively. | + | These email addresses will automatically tokenize the contents of whatever they get, and add these tokens into the database as spam or ham tokens respectively. |
**Ham tokens are important.** They let the Bayesian system learn how to recognize legitimate email, thus subtracting from its spam score, and making it less likely to be erroneously classified as spam. Make it a point to send at least as much ham as you send spam. | **Ham tokens are important.** They let the Bayesian system learn how to recognize legitimate email, thus subtracting from its spam score, and making it less likely to be erroneously classified as spam. Make it a point to send at least as much ham as you send spam. | ||
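The way ham tokens pull a score down can be illustrated with a toy model. This is only a minimal sketch, assuming a naive per-token log-odds score with add-one smoothing; it is not the mail filter's actual algorithm, and the names ''ToyBayes'', ''learn'' and ''spam_log_odds'' are hypothetical.

```python
import math
import re
from collections import Counter


class ToyBayes:
    """Toy token-based learner (illustrative only, not the real filter)."""

    def __init__(self):
        self.spam_tokens = Counter()  # token -> number of spam messages containing it
        self.ham_tokens = Counter()   # token -> number of ham messages containing it
        self.spam_msgs = 0
        self.ham_msgs = 0

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase word-like runs, counted once per message.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def learn(self, text, is_spam):
        tokens = self.tokenize(text)
        if is_spam:
            self.spam_tokens.update(tokens)
            self.spam_msgs += 1
        else:
            self.ham_tokens.update(tokens)
            self.ham_msgs += 1

    def spam_log_odds(self, text):
        # Positive means "looks like spam", negative means "looks like ham".
        score = 0.0
        for tok in self.tokenize(text):
            p_spam = (self.spam_tokens[tok] + 1) / (self.spam_msgs + 2)
            p_ham = (self.ham_tokens[tok] + 1) / (self.ham_msgs + 2)
            score += math.log(p_spam / p_ham)
        return score


model = ToyBayes()
model.learn("cheap pills buy now", is_spam=True)
before = model.spam_log_odds("invoice attached")
model.learn("invoice attached for monday", is_spam=False)
after = model.spam_log_odds("invoice attached")
print(before > after)  # learning the ham message lowered the score
```

In this toy model, teaching one legitimate message is enough to lower the score of similar mail, which is why sending ham matters as much as sending spam.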
Line 41: | Line 43: | ||
If you are forwarding a single message to < | If you are forwarding a single message to < | ||
- | A normal forward not as an attachment often does not include the complete headers, so might not be as useful | + | A normal forward not as an attachment often does not include the complete headers, so it is less useful, but it's still better than nothing. |
**Budget your time.** Life is short. Spend no more than, say, 2–3 minutes per week sending spam or ham. You can do it once a week, or you could spend just 5–10 seconds a day quickly forwarding two or three selected messages. | **Budget your time.** Life is short. Spend no more than, say, 2–3 minutes per week sending spam or ham. You can do it once a week, or you could spend just 5–10 seconds a day quickly forwarding two or three selected messages. | ||
The most useful messages with which to teach the system **are those messages that the system recognized incorrectly**. If the system already correctly recognized spam or ham, it won't learn a lot. If the system incorrectly recognized ham as spam or spam as ham, that is where it most needs to learn. | The most useful messages with which to teach the system **are those messages that the system recognized incorrectly**. If the system already correctly recognized spam or ham, it won't learn a lot. If the system incorrectly recognized ham as spam or spam as ham, that is where it most needs to learn. | ||
+ | |||
+ | Sending the same thing twice does no harm. The system will recognize duplicate content and not learn from it twice. | ||
If you accidentally send spam to the hamtrap address, or vice versa, just send the same content again, this time to the correct address. The system will unlearn the incorrect content and re-learn it the right way. | If you accidentally send spam to the hamtrap address, or vice versa, just send the same content again, this time to the correct address. The system will unlearn the incorrect content and re-learn it the right way. | ||
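The duplicate and re-learn behaviour described above can be sketched as follows. This is a hedged illustration only: it assumes the system remembers a digest of each message it has learned, which may not match how the real filter tracks content. The class ''LearnedStore'' and its return values are hypothetical.

```python
import hashlib


class LearnedStore:
    """Toy record of what has been learned (illustrative, not the real filter)."""

    def __init__(self):
        self.seen = {}                       # message digest -> "spam" or "ham"
        self.counts = {"spam": 0, "ham": 0}  # messages currently learned per label

    def learn(self, message: bytes, label: str) -> str:
        if label not in self.counts:
            raise ValueError("label must be 'spam' or 'ham'")
        digest = hashlib.sha256(message).hexdigest()
        previous = self.seen.get(digest)
        if previous == label:
            # Same content, same label: nothing is learned a second time.
            return "duplicate"
        if previous is not None:
            # Same content, other label: undo the earlier learning first.
            self.counts[previous] -= 1
            outcome = "relearned"
        else:
            outcome = "learned"
        self.seen[digest] = label
        self.counts[label] += 1
        return outcome


store = LearnedStore()
print(store.learn(b"Buy cheap pills", "ham"))   # mistake: spam sent as ham
print(store.learn(b"Buy cheap pills", "ham"))   # sent twice, no effect
print(store.learn(b"Buy cheap pills", "spam"))  # corrected
```

Under these assumptions the second submission is ignored and the third replaces the mistaken ham entry with a spam entry, so a correction never leaves the message counted under both labels.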
Line 60: | Line 64: | ||
Sending the same thing twice does no harm. The system will recognize duplicate content and not learn from it twice. | Sending the same thing twice does no harm. The system will recognize duplicate content and not learn from it twice. | ||
- | If you accidentally feed them spam instead of ham or vice versa, just repeat with the correct command. The system will unlearn the incorrect content and re-learn it the right way. | + | If you accidentally feed these commands |
===== Debugging the learning ===== | ===== Debugging the learning ===== | ||
Line 77: | Line 81: | ||
| ... with verbose output |'' | | ... with verbose output |'' | ||
- | |||
- | All these commands will recognize duplicate content and not learn from it twice. | ||
These commands (unlike '' | These commands (unlike '' | ||
Line 84: | Line 86: | ||
If no filename is specified, these commands will read from their standard input. | If no filename is specified, these commands will read from their standard input. | ||
- | If you accidentally feed them spam instead of ham or vice versa, just repeat with the correct command. The system will unlearn the incorrect content and re-learn it the right way. | + | Sending the same thing twice does no harm. The system will recognize duplicate content and not learn from it twice. |
+ | |||
+ | If you accidentally feed these commands |