Optimizing disk utilization on a 3.2TB store [message #136781]
Tue, 19 September 2017 02:35
Hartz
Messages: 10 Registered: June 2014 Location: Australia
Hi all,
Looking for suggestions on optimizing storage for our Kerio Connect server.
Domains: 24
Users: 412
Mailbox Store: 3.2TB
Scenario: every user must keep every message back until the dawn of time, just in case.
We are about to migrate to a new server with additional storage on a Dell Compellent SAN. CentOS with ZFS pool for mail store is what I am investigating (we already run CentOS).
I really want to try deduplication but ZFS dedupe seems out of reach. 961 million 4k blocks for our existing 3.2TB store would require 307GB of memory to store the dedupe tables (providing my calcs are correct). My test of a single 200GB domain produced a dedupe ratio of 1.2 (shy of the 2.0 recommended for dedupe) so it seems to not be worth it either way; I thought this dedupe ratio seemed low though.
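Roughly, the arithmetic behind those numbers looks like this (a back-of-envelope sketch in Python, assuming the commonly quoted ~320 bytes of in-core dedupe table entry per unique block; the block count is the one from our store):

DDT_BYTES_PER_BLOCK = 320      # assumed in-core cost per unique block (approximate)
blocks = 961_000_000           # ~961 million 4K blocks in the existing 3.2TB store

ddt_bytes = blocks * DDT_BYTES_PER_BLOCK
print(f"Estimated dedupe table size: {ddt_bytes / 1e9:.1f} GB")   # ~307.5 GB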
ZFS compression seems like an instant yes with improved disk I/O performance and only a hit to CPU load, which is acceptable in our scenario.
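If anyone wants a rough feel for the likely compression ratio before committing, something like this sampling script gives a ballpark (a sketch only; zlib here is just a stand-in for ZFS's LZ4, and the sample path is hypothetical):

import os
import zlib

SAMPLE_DIR = "/store/mail/example.com"   # hypothetical sample directory
MAX_FILES = 5000                         # keep the sample quick

raw = packed = files = 0
for root, _dirs, names in os.walk(SAMPLE_DIR):
    if files >= MAX_FILES:
        break
    for name in names:
        if files >= MAX_FILES:
            break
        path = os.path.join(root, name)
        try:
            data = open(path, "rb").read()
        except OSError:
            continue
        raw += len(data)
        packed += len(zlib.compress(data, 1))   # fast level, LZ4-ish in spirit only
        files += 1

if packed:
    print(f"sampled {files} files: {raw / 1e6:.0f} MB raw, estimated ratio {raw / packed:.2f}x")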
Is there anything I am missing? Any alternate solutions? Some built in dedupe would be amazing.
Regards,
H.
[Updated on: Tue, 19 September 2017 03:27]
Re: Optimizing disk utilization on a 3.2TB store [message #136804 is a reply to message #136781]
Tue, 19 September 2017 23:52
Bud Durland
Messages: 586 Registered: December 2013 Location: Plattsburgh, NY
I'm not 100% certain, but from what I've seen of how Kerio stores messages on the server, I'm not sure a dedupe application would be all that effective anyway. During a recent "all is well" GFI roadmap webinar, they said a new message store database is coming in early 2018 (?) that will yield radically higher performance for searches and such. Maybe they will actually use a real dbms rather than flat text files.
How do you plan to migrate the message store? We used rsync and a creative script for a similar sized store to minimize user downtime.
Re: Optimizing disk utilization on a 3.2TB store [message #136805 is a reply to message #136804]
Wed, 20 September 2017 00:07
Hartz
Messages: 10 Registered: June 2014 Location: Australia
Hi Bud,
I plan to rsync over n days and then have a final sync overnight for cutover. We already use rsync for backups rather than the built-in backup process as it's quicker and just as effective.
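For anyone doing the same, the pattern is roughly this (a sketch with hypothetical paths and host; the flags are plain rsync, wrapped in Python only for illustration):

import subprocess

SRC = "/opt/kerio/mailserver/store/"             # example source path
DST = "newserver:/opt/kerio/mailserver/store/"   # example destination host:path

def sync(final=False):
    # -a preserves permissions/times, -H preserves hard links,
    # --numeric-ids avoids uid/gid remapping between hosts.
    cmd = ["rsync", "-aH", "--numeric-ids", "--info=progress2", SRC, DST]
    if final:
        cmd.append("--delete")   # only on the cutover pass, with the mail service stopped
    subprocess.run(cmd, check=True)

sync()               # pre-seed pass; repeat over the n days while the old server is live
# sync(final=True)   # final overnight pass at cutover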
Interesting news about the dbms in the roadmap. Definitely won't be rolling that update out on day 1.
Re: Optimizing disk utilization on a 3.2TB store [message #136816 is a reply to message #136805]
Wed, 20 September 2017 14:43
Bud Durland
Messages: 586 Registered: December 2013 Location: Plattsburgh, NY
Just to be clear, GFI did NOT specifically say there was a dbms-based message store in the future, just that there would be a new datastore mechanism that would provide radically improved performance. I just don't see how that will happen without some type of dbms.
Re: Optimizing disk utilization on a 3.2TB store [message #145761 is a reply to message #136781]
Fri, 10 May 2019 20:01
robvas
Messages: 4 Registered: November 2018 Location: USA
Hartz wrote on Mon, 18 September 2017 20:35:
Hi all,
We are about to migrate to a new server with additional storage on a Dell Compellent SAN. CentOS with ZFS pool for mail store is what I am investigating (we already run CentOS).
I really want to try deduplication but ZFS dedupe seems out of reach. 961 million 4k blocks for our existing 3.2TB store would require 307GB of memory to store the dedupe tables (providing my calcs are correct). My test of a single 200GB domain produced a dedupe ratio of 1.2 (shy of the 2.0 recommended for dedupe) so it seems to not be worth it either way; I thought this dedupe ratio seemed low though.
ZFS compression seems like an instant yes with improved disk I/O performance and only a hit to CPU load, which is acceptable in our scenario.
Is there anything I am missing? Any alternate solutions? Some built in dedupe would be amazing.
Old thread, but I figured I would chime in:
From the ZFS website:
RAM Rules of Thumb
If this is all too complicated for you, then let's try to find a few rules of thumb:
For every TB of pool data, you should expect 5 GB of dedup table data, assuming an average block size of 64K.
This means you should plan for at least 20GB of system RAM per TB of pool data, if you want to keep the dedup table in RAM, plus any extra memory for other metadata, plus an extra GB for the OS.
4K blocks would be too small to use IMO. You should be good with 64-96GB of RAM on the storage side.
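Plugging the 3.2TB store into that rule of thumb (a quick sketch, taking the 64K average block size the rule assumes):

POOL_TB = 3.2
ddt_gb = 5 * POOL_TB        # ~5 GB of dedupe table per TB of pool data
ram_gb = 20 * POOL_TB + 1   # ~20 GB of RAM per TB, plus ~1 GB for the OS

print(f"dedupe table: ~{ddt_gb:.0f} GB")    # ~16 GB
print(f"suggested RAM: ~{ram_gb:.0f} GB")   # ~65 GB, in line with the 64-96GB figure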