GFI Software

SpamAssassin Bayes DB Token limit [message #118028] Mon, 08 December 2014 21:41
Machete
Messages: 187
Registered: February 2012
Location: United States
I've read that the Bayes DB only holds so many 'tokens' - and there's supposedly a place to increase this limit, but I haven't found it with Kerio's implementation of Spam Assassin.

Since I routinely have SPAM making it past the filters regardless of what I set, I think my next step in pursuing a longer-term solution to my increasing SPAM is modifying how many tokens the DB stores.
See here: http://spamassassin.apache.org/full/3.1.x/doc/sa-learn.html#expiration
Re: SpamAssassin Bayes DB Token limit [message #118131 is a reply to message #118028] Fri, 12 December 2014 00:02
MarkK
Messages: 342
Registered: April 2007
MAKE A BACKUP COPY FIRST!!!! Also, Kerio probably does not endorse making changes to this file. Also, it will be overwritten at the next Kerio update.

I believe the file you are looking for is .\plugins\spamserver\spamassassin\site\lib\mail\SpamAssassin\Config.pm

I would suggest creating some of your own custom Spam Assassin scores and rules. I have written a post on this, as well as posted a good starting custom rule set. It has worked wonders for me.

http://forums.kerio.com/mv/msg/27477/0/
Re: SpamAssassin Bayes DB Token limit [message #118216 is a reply to message #118131] Mon, 15 December 2014 18:07
Machete
Messages: 187
Registered: February 2012
Location: United States
Thanks Mark - You've helped me in the past with the rules, and I've written and tweaked and it's helped immensely - but as you know, SPAM isn't a set it and forget it process.

I've got it down to where I'm only blocking 40% of SPAM and the other 60% ends up in the Likely Spam folder - better than the inbox, but finding 1-2 HAM messages a day among 100+ in Likely Spam is aggravating for users. I can't spend 2 hours a day creating/tweaking rules and then restarting the mail service each time for the new rules to take effect. And that's what I've been doing.

And this problem subsides each time I start with a fresh Bayes.db file. In other words, I go from 40% block and 60% Likely Spam to 80% block with a fresh Bayes DB.
Re: SpamAssassin Bayes DB Token limit [message #118225 is a reply to message #118216] Mon, 15 December 2014 19:14
MarkK
Messages: 342
Registered: April 2007
It did take some time to get things tweaked. Unfortunately, from what I can tell, it will never be perfect, with 100% of spam detected and no legitimate mail mismarked. I do know that I'm at a point where I don't have to do much now. When I do see unmarked spam, I'll look to see if I can tweak a rule, but I don't restart the server right away just for that one change.

In Connect's custom rules, I have added an Allow exception for various email addresses or domains that are valid but get caught by a rule.

I do have coworkers that complain spam is getting caught and put in their spam folder instead of deleted. I just explain that the spam filter is doing its job, allowing them the opportunity to catch a good email that has been mismarked. For us, that is maybe 1 a month.

Not sure what to say on your Bayes issue.
Re: SpamAssassin Bayes DB Token limit [message #118227 is a reply to message #118225] Mon, 15 December 2014 19:45
Machete
Messages: 187
Registered: February 2012
Location: United States
After you provided the suggested place to look (which provided some additional description of the variables I was curious about) I started digging.
-Kerio's implementation has auto-expire off
-The Bayes.DB file itself is over 50 MB (and only a year old)
-Using a SQL browser, I see over 800,000 tokens in the DB (whatever the upper limit is set to, it's not holding the DB to the 150,000 default limit)

So the Bayes DB being full of tokens doesn't appear to be my issue unless the overall size of the DB is an issue.
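
For anyone who wants to repeat the token count without a GUI SQL browser, here is a minimal Python sketch. It assumes bayes.db is an ordinary SQLite file (the bayes.db-journal file next to it suggests so) and it does not assume any table names, since Kerio's schema isn't documented in this thread; it simply lists every table with its row count. The function name is my own, and it is best run against a copy of the file or with Connect stopped.

```python
import sqlite3
import sys
from pathlib import Path


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    # Open read-only so the inspection can never modify the database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]
        counts = {}
        for name in names:
            # Quote the identifier in case a table name contains odd characters.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()


if __name__ == "__main__":
    # Usage: python count_rows.py path\to\copy\of\bayes.db
    for table, rows in table_row_counts(sys.argv[1]).items():
        print(f"{table}: {rows} rows")
```

Whichever table holds the tokens should stand out by its row count.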

I'm curious how much disk space other users see their bayes.db consuming. All of my previous bayes.db files were in the 10MB-27MB range.

I've followed your other posts on custom SA rules, so I'll keep plugging along there until someone who knows more about expiring tokens, etc. and the bayes.db provides more details.
Re: SpamAssassin Bayes DB Token limit [message #118229 is a reply to message #118227] Mon, 15 December 2014 19:58
MarkK
Messages: 342
Registered: April 2007
Mine: 7 months old
bayes.db 15,236KB
bayes.db-journal 1,954KB
Re: SpamAssassin Bayes DB Token limit [message #118253 is a reply to message #118229] Tue, 16 December 2014 18:36
Machete
Messages: 187
Registered: February 2012
Location: United States
Thanks Mark - My journal size is the same as yours, and I recognize that the volume of SPAM, number of users, etc. will affect the size of the bayes.db as much as time does.

I really appreciate you chiming in and providing some help and assistance.
Re: SpamAssassin Bayes DB Token limit [message #122153 is a reply to message #118253] Thu, 18 June 2015 22:28
McIrish
Messages: 256
Registered: October 2011
Sorry to raise this thread from the dead but I have a question about bayes.db. Mine is now at 2.7GB which seems like it might be excessive. I'd hate to have to start over again with the learning process. What's recommended?
Re: SpamAssassin Bayes DB Token limit [message #122176 is a reply to message #122153] Fri, 19 June 2015 19:18
MarkK
Messages: 342
Registered: April 2007
Wow, 2.7GB seems huge. I'm at 13 months old and only 17.5MB for the database, which is the biggest it has ever been for my installation. As mentioned before, I'm guessing the volume of spam and the number of users probably play a role in the size of this. I really don't know much about the Bayes filtering itself.

Do you think this is causing an issue? If so, what you can try is to stop Connect, copy the existing files into a holding folder, such as .\MailServer\store\spamassassin\bayes\20150619, and then delete the files in the .\bayes folder. When you start Connect again, it will create fresh files. Then, if you see adverse effects from deleting the Bayes databases, you can stop Connect again and put back the old files.
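
The copy-then-delete step above can be sketched in Python as a single move into a dated subfolder. This is only an illustration under assumptions: the function name is hypothetical, the folder path you pass in must match your own installation, and the script neither stops nor starts Connect, so the service has to be stopped by hand first.

```python
import shutil
from datetime import date
from pathlib import Path


def archive_bayes_files(bayes_dir, stamp=None):
    """Move every file in bayes_dir into a dated subfolder; return that folder."""
    bayes_dir = Path(bayes_dir)
    # Default to today's date in the same YYYYMMDD style as the example folder.
    stamp = stamp or date.today().strftime("%Y%m%d")
    holding = bayes_dir / stamp
    # Fail loudly rather than mixing files into an earlier archive.
    holding.mkdir(exist_ok=False)
    for entry in list(bayes_dir.iterdir()):
        # Only plain files are moved; subfolders (older archives) stay put.
        if entry.is_file():
            shutil.move(str(entry), str(holding / entry.name))
    return holding


if __name__ == "__main__":
    import sys
    # Usage: python archive_bayes.py <path to the bayes folder>
    print("Archived to", archive_bayes_files(sys.argv[1]))
```

To roll back, stop Connect again and move the files from the dated folder back up one level.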
Re: SpamAssassin Bayes DB Token limit [message #122179 is a reply to message #122176] Fri, 19 June 2015 19:30
ksnyder
Messages: 557
Registered: August 2014
Location: USA
To add to this: once wiped, the Bayes DB begins learning again after around 200 spam emails. This shouldn't take long at all in most cases and is well worth the minor short-term inconvenience.

Ken Snyder
Re: SpamAssassin Bayes DB Token limit [message #122185 is a reply to message #118028] Fri, 19 June 2015 20:48
McIrish
Messages: 256
Registered: October 2011
Ken,
Is the size of my database a problem? I'd hate to start over again only to find that it was never part of the problem.
Re: SpamAssassin Bayes DB Token limit [message #122190 is a reply to message #122185] Fri, 19 June 2015 22:02
ksnyder
Messages: 557
Registered: August 2014
Location: USA
I don't know that the size, per se, is what really matters. A combination of the age of data in the Bayes filter and the sophistication of spammers can spoil the database. See http://kb.kerio.com/product/kerio-connect/server-configuration/antispam/optimizing-spam-protection-in-kerio-connect-265.html and the section, "Managing SpamAssassin Bayes". In there you'll see a recommendation to check the Bayes score of detected, undetected, and legitimate mails to determine if you need to reset it.
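
As a rough way to do that check across a batch of saved messages, here is a small Python sketch that tallies which Bayes rules fired. It rests on assumptions: that each message carries the SpamAssassin test names in an X-Spam-Status header (the header name and layout in your installation may differ, so check a real message first) and that the Bayes result shows up as one of the standard SpamAssassin rule names of the form BAYES_00 through BAYES_99. The function names are my own.

```python
import re
from collections import Counter
from email import message_from_string

# Standard SpamAssassin Bayes rule names look like BAYES_00 ... BAYES_99.
BAYES_RULE = re.compile(r"\bBAYES_\d+\b")


def bayes_rules(raw_message, header="X-Spam-Status"):
    """Return the BAYES_* rule names listed in the given header of one message."""
    msg = message_from_string(raw_message)
    found = []
    # Only the named header is scanned, never the message body.
    for value in msg.get_all(header, []):
        found.extend(BAYES_RULE.findall(value))
    return found


def tally(raw_messages):
    """Count how often each BAYES_* rule appears across many messages."""
    return Counter(rule for raw in raw_messages for rule in bayes_rules(raw))
```

If missed spam keeps landing on low rules such as BAYES_00 while legitimate mail lands on high ones, that would be consistent with a spoiled database.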

At the end of the day, if the Bayes filter hasn't been reset in over a year, there's a good chance you'll benefit from a reset. The current size of your file is potentially slowing the system down and confusing the filter.


Ken Snyder
Re: SpamAssassin Bayes DB Token limit [message #123206 is a reply to message #122190] Fri, 31 July 2015 19:14
McIrish
Messages: 256
Registered: October 2011
I stopped the Kerio service and then deleted the bayes.db (after making a backup) and tried to restart Kerio. It wouldn't start. Event Viewer shows that it couldn't start the spam filter. I'm not sure what I might have done wrong. Got any ideas?
Re: SpamAssassin Bayes DB Token limit [message #123207 is a reply to message #123206] Fri, 31 July 2015 19:19
MarkK
Messages: 342
Registered: April 2007
What about the bayes.db-journal file?
Typically, what I do is create a dated folder (.\bayes\20150731), move the 3 files into it so that the .\bayes\ folder is otherwise empty, and restart the server.

I'm wondering whether, because you didn't delete the journal file, it is finding the journal and then looking for the missing db file. Just a guess.
Re: SpamAssassin Bayes DB Token limit [message #123208 is a reply to message #123207] Fri, 31 July 2015 19:24
McIrish
Messages: 256
Registered: October 2011
Hi Mark,
I didn't delete the autowhitelist. On my next attempt, I stopped the service and then just renamed the bayes directory. That time it worked. Man... I was freaking out when it wouldn't start up. Whew!

[Updated on: Fri, 31 July 2015 19:24]
