SpamAssassin Bayes DB Token limit [message #118028] |
Mon, 08 December 2014 21:41  |
Machete
Messages: 187 Registered: February 2012 Location: United States
I've read that the Bayes DB only holds so many 'tokens', and that there's supposedly a place to increase this limit, but I haven't found it in Kerio's implementation of SpamAssassin.
Since SPAM routinely makes it past the filters regardless of what I set, I think my next step toward a longer-term solution is changing how many tokens the DB stores.
See here: http://spamassassin.apache.org/full/3.1.x/doc/sa-learn.html#expiration
|
|
|
Re: SpamAssassin Bayes DB Token limit [message #118131 is a reply to message #118028] |
Fri, 12 December 2014 00:02   |
MarkK
Messages: 342 Registered: April 2007
MAKE A BACKUP COPY FIRST!!!! Kerio probably does not endorse making changes to this file, and it will be overwritten at the next Kerio update.
I believe the file you are looking for is .\plugins\spamserver\spamassassin\site\lib\mail\SpamAssassin\Config.pm
I would also suggest creating some of your own custom SpamAssassin scores and rules. I have written a post on this, as well as posted a good starting custom rule set. It has worked wonders for me.
http://forums.kerio.com/mv/msg/27477/0/
|
|
|
Re: SpamAssassin Bayes DB Token limit [message #118216 is a reply to message #118131] |
Mon, 15 December 2014 18:07   |
Machete
Messages: 187 Registered: February 2012 Location: United States
Thanks Mark - you've helped me in the past with the rules, and the ones I've written and tweaked have helped immensely - but as you know, fighting SPAM isn't a set-it-and-forget-it process.
I've got it down to where I'm only blocking 40% of SPAM, and the other 60% ends up in the Likely Spam folder. That's better than the inbox, but finding 1-2 HAM messages a day among 100+ in Likely Spam is aggravating for users. I can't spend 2 hours a day creating/tweaking rules and then restarting the mail service each time for the new rules to take effect. And that's what I've been doing.
And this problem subsides each time I start with a fresh Bayes.db file. In other words, I go from 40% Block & 60% Likely Spam to 80% Block with a fresh Bayes DB.
|
|
|
Re: SpamAssassin Bayes DB Token limit [message #118225 is a reply to message #118216] |
Mon, 15 December 2014 19:14   |
MarkK
Messages: 342 Registered: April 2007
It did take some time to get things tweaked. Unfortunately, from what I can tell, it will never be perfect (100% of spam detected and no legitimate mail mismarked). I do know that I'm at a point where I don't have to do much now. When I do see unmarked spam, I'll look and see if I can tweak a rule, but I don't restart the server right away just for that one change.
In Connect's custom rules, I have added an Allow exception for various email addresses or domains that are valid but get caught by a rule.
I do have coworkers that complain spam is getting caught and put in their spam folder instead of deleted. I just explain that the spam filter is doing its job, allowing them the opportunity to catch a good email that has been mismarked. For us, that is maybe 1 a month.
Not sure what to say on your Bayes issue.
|
|
|
Re: SpamAssassin Bayes DB Token limit [message #118227 is a reply to message #118225] |
Mon, 15 December 2014 19:45   |
Machete
Messages: 187 Registered: February 2012 Location: United States
After you provided the suggested place to look (which provided some additional description of the variables I was curious about), I started digging.
- Kerio's implementation has auto-expire off
- The Bayes.DB file itself is over 50MB (and only a year old)
- Using a SQLBrowser, I see over 800,000 tokens in the DB (whatever the upper limit is set to, it's not holding the DB to the 150,000 default limit)
So the Bayes DB being full of tokens doesn't appear to be my issue, unless the overall size of the DB is an issue.
I'm curious how much disk space other users see their bayes.db consuming? All of my previous bayes.db files were in the 10MB-27MB range.
I've followed your other posts on custom SA rules, so I'll keep plugging along there until someone who knows more about expiring tokens, etc. and the bayes.db provides more details.
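For anyone who wants to repeat the token count above without a GUI browser, here is a minimal Python sketch. It assumes bayes.db is an SQLite file (the bayes.db-journal file and the SQL browser mentioned in this thread suggest so, but that is not confirmed here), and it makes no assumption about Kerio's table names: it simply lists every table with its row count. The function name table_row_counts is made up for illustration. Run it against a copy of the file, not the live one.

```python
import sqlite3
import sys
from pathlib import Path


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in an SQLite file."""
    # Open read-only so the database cannot be modified by accident.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]
        counts = {}
        for name in names:
            # Quote the identifier; table names come from the file itself.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()


if __name__ == "__main__":
    # Usage: python count_rows.py path\to\copy-of-bayes.db
    for table, rows in table_row_counts(sys.argv[1]).items():
        print(f"{table}: {rows}")
```

Whichever table holds the tokens should show a count in the same range as the 800,000 figure quoted above.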
|
|
|
|
Re: SpamAssassin Bayes DB Token limit [message #118253 is a reply to message #118229] |
Tue, 16 December 2014 18:36   |
Machete
Messages: 187 Registered: February 2012 Location: United States
Thanks Mark - my journal size is the same as yours, and I recognize that the volume of SPAM, number of users, etc. will affect the size of the bayes.db as much as time does.
I really appreciate you chiming in and providing some help and assistance.
|
|
|
Re: SpamAssassin Bayes DB Token limit [message #122153 is a reply to message #118253] |
Thu, 18 June 2015 22:28   |
McIrish
Messages: 256 Registered: October 2011
Sorry to raise this thread from the dead but I have a question about bayes.db. Mine is now at 2.7GB which seems like it might be excessive. I'd hate to have to start over again with the learning process. What's recommended?
|
|
|
Re: SpamAssassin Bayes DB Token limit [message #122176 is a reply to message #122153] |
Fri, 19 June 2015 19:18   |
MarkK
Messages: 342 Registered: April 2007
Wow, 2.7GB seems huge. My database is 13 months old and only 17.5MB, which is the biggest it has ever been for my installation. As mentioned before, I'm guessing the volume of spam and the number of users probably play a role in the size of this. I really don't know anything directly about the Bayes filtering.
Do you think this is causing an issue? If so, what you can try is to stop Connect, copy the existing files into a holding folder, such as .\MailServer\store\spamassassin\bayes\20150619, and then delete the files in the .\bayes folder. When you start Connect again, it will create fresh files. Then if you are seeing adverse effects from deleting the Bayes databases, you can stop Connect again and put back the old files.
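The copy-then-delete step can be scripted. Below is a rough Python sketch, not anything Kerio provides: it moves every file in the bayes folder into a dated subfolder, which has the same end result as copying and then deleting. The function name archive_bayes_files is made up for illustration, and the script does not stop or start Connect, so the service must already be stopped before it runs.

```python
import shutil
from datetime import date
from pathlib import Path


def archive_bayes_files(bayes_dir, stamp=None):
    """Move every file in bayes_dir into a dated subfolder; return that folder."""
    bayes_dir = Path(bayes_dir)
    # Default folder name looks like 20150619.
    stamp = stamp or date.today().strftime("%Y%m%d")
    holding = bayes_dir / stamp
    # Fail loudly rather than mix files into an earlier archive.
    holding.mkdir(exist_ok=False)
    for item in list(bayes_dir.iterdir()):
        if item.is_file():  # skips the holding folder and older archives
            shutil.move(str(item), str(holding / item.name))
    return holding
```

To roll back, stop Connect again and move the files from the dated folder back into the bayes folder.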
|
|
|
Re: SpamAssassin Bayes DB Token limit [message #122179 is a reply to message #122176] |
Fri, 19 June 2015 19:30   |
ksnyder
Messages: 557 Registered: August 2014 Location: USA
To add to this: once wiped, the Bayes DB begins learning again after around 200 spam emails. This shouldn't take long at all in most cases and is well worth the minor short-term inconvenience.
Ken Snyder
|
|
|
|
|
Re: SpamAssassin Bayes DB Token limit [message #123206 is a reply to message #122190] |
Fri, 31 July 2015 19:14   |
McIrish
Messages: 256 Registered: October 2011
I stopped the Kerio service, deleted the bayes.db (after making a backup), and tried to restart Kerio. It wouldn't start. Event Viewer shows that it couldn't start the spam filter. I'm not sure what I might have done wrong. Got any ideas?
|
|
|
Re: SpamAssassin Bayes DB Token limit [message #123207 is a reply to message #123206] |
Fri, 31 July 2015 19:19   |
MarkK
Messages: 342 Registered: April 2007
What about the bayes.db-journal file?
Typically, what I do is create a dated folder (.\bayes\20150731), move the 3 files into it so that the .\bayes\ folder is empty, and restart the server.
I'm wondering whether, if you didn't delete the journal file, the server is finding it and looking for the missing db file. Just a guess.
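A quick way to test that guess is to look for a journal file left behind without its database. The Python sketch below assumes the journal is named after the database with a "-journal" suffix, as with the bayes.db-journal file mentioned above; the function name find_orphaned_journals is made up for illustration. Whether a leftover journal is really what stops the spam filter from starting is not confirmed in this thread.

```python
from pathlib import Path


def find_orphaned_journals(bayes_dir):
    """Return journal files whose matching database file is missing."""
    suffix = "-journal"
    orphans = []
    for journal in sorted(Path(bayes_dir).glob("*" + suffix)):
        # bayes.db-journal -> bayes.db
        db_file = journal.with_name(journal.name[:-len(suffix)])
        if not db_file.exists():
            orphans.append(journal)
    return orphans
```

If the function returns anything, moving those files out of the folder along with the rest, with the service stopped, would leave the .\bayes\ folder empty as described above.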
|
|
|
|