DirHarvesting behaviour in 14.1 - slow mail delivery
DirHarvesting behaviour in 14.1 - slow mail delivery - 12.Oct.2009 10:33:11 PM
Jacob Luebbers
Posts: 63
Joined: 8.Sep.2004
Has anyone else noticed a change in the way ME v14.1 handles incoming SMTP connections when running DirHarvesting in SMTP Transport filtering mode, compared with v14?

We have two W2K3/IIS6 boxes in an NLB cluster running both ME and MSEC. Since upgrading them from ME 14 to 14.1, one of the two nodes has shown a noticeable delay in delivering inbound mail, and further diagnostics appear to show that this is because it does not close DirHarvesting SMTP connections as aggressively as the other node.

For reference: yesterday we received around 2.7 million inbound email messages (this is fairly typical), 98.7% of which were DirHarvesting spam. The "bad" node averages around 200 concurrent SMTP connections, whereas the "good" node averages 35 over the same time frame.

I've checked the NLB weights: the rule is set to equal (50/50), and the various TCP connection counters in PerfMon show both nodes receiving a similar number of new connections after a clean simultaneous boot. However, the "bad" node issues far fewer TCP connection resets than the "good" node over that time frame. The "Current Sessions" list in the IIS SMTP manager also shows obvious DirHarvesting connections hanging around much longer on the bad node than on the good node.

I understand that v14.1 changed the way DirHarvesting in SMTP mode works. Previously it would simply close the connection as soon as it identified it as DirHarvesting. In 14.1 it responds with a 5.5.4 error and DOESN'T close the connection, leaving it up to the client end (or any other SMTP max connection age limit) to clean up.

My assessment of our situation is that one of the nodes is still behaving like v14 did (perhaps due to a bad upgrade), with the aggressive SMTP connection closing behaviour, whereas the other node is behaving "correctly" as per the new 14.1 behaviour.

I've raised this with local GFI support, and eventually the case was closed with this behaviour marked as a bug, but with no ETA or visibility on when we can get a hotfix. This leaves us with a partially crippled cluster, with one node behaving like v14 did. If the slow closing of DirHarvesting connections is how v14.1 is supposed to work, I'd prefer to downgrade back to v14.

So a general question for anyone out there using v14: what does a typical incoming DirHarvesting SMTP conversation look like on your boxes? Ours looks roughly like this:

client: HELO/EHLO spamhost.somewhere.com
server: 250 – Hello, proceed
client: MAIL FROM:<spammer@somewhere.com>
server: 250 – 2.1.0 spammer@somewhere.com Sender OK
client: RCPT TO:<fakeuser@ourdomain.com>
server: 250 – 2.1.5 fakeuser@ourdomain.com
client: DATA
server: 501 – 5.5.4 Unrecognized parameter

At this point the connection is NOT closed. Unfortunately I don't have a capture of the v14 behaviour to compare.

Regards,
Jacob
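For anyone who wants to repeat this conversation against their own server, here is a minimal Python sketch (not a GFI tool). It sends the same HELO, MAIL FROM, RCPT TO and DATA sequence over a raw socket, then waits to see whether the server drops the connection. The function names, the `probe.example.com` HELO name and the 15-second wait are illustrative assumptions.

```python
import socket


def read_line(sock):
    """Read one CRLF-terminated line, byte by byte, so nothing is left buffered."""
    buf = bytearray()
    while not buf.endswith(b"\n"):
        chunk = sock.recv(1)
        if not chunk:
            raise ConnectionError("server closed the connection")
        buf += chunk
    return buf.decode("ascii", "replace").rstrip("\r\n")


def read_reply(sock):
    """Read one SMTP reply (possibly multi-line) and return (code, text)."""
    lines = []
    while True:
        text = read_line(sock)
        lines.append(text)
        # "250-..." means more lines follow; "250 ..." is the final line.
        if len(text) < 4 or text[3] != "-":
            return int(text[:3]), "\n".join(lines)


def peer_closed(sock, wait_seconds):
    """Return True if the peer closes the socket within wait_seconds."""
    sock.settimeout(wait_seconds)
    try:
        data = sock.recv(1)
    except socket.timeout:
        return False  # nothing arrived: the connection is still open
    return data == b""  # an empty read means the peer closed


def probe(host, sender, recipient, helo_name="probe.example.com",
          port=25, wait_seconds=15):
    """Run the conversation and report the replies plus the close behaviour."""
    with socket.create_connection((host, port), timeout=30) as sock:
        transcript = [read_reply(sock)]  # server greeting
        commands = (
            f"HELO {helo_name}",
            f"MAIL FROM:<{sender}>",
            f"RCPT TO:<{recipient}>",
            "DATA",
        )
        for command in commands:
            sock.sendall(command.encode("ascii") + b"\r\n")
            transcript.append(read_reply(sock))
        return transcript, peer_closed(sock, wait_seconds)
```

Calling `probe()` with a node's address, a sender and a non-existent recipient returns the list of replies and a flag: `True` if the server hung up within the wait, `False` if the connection was left open as described above.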
< Message edited by Jacob Luebbers -- 12.Oct.2009 10:36:54 PM >
RE: DirHarvesting behaviour in 14.1 - slow mail delivery - 13.Oct.2009 1:39:14 PM
RSP
Posts: 1270
Joined: 31.Oct.2006
From: The East Riding of Yorkshire, UK
Jacob,

When I did some testing on this before, the SMTP conversation was similar to the one you observed, except that the connection was severed. The new behaviour you're seeing is how ME appears to respond as of ME 14.1, and it is correct in that the client should decide when to close the connection unless a timeout is reached (10 minutes by default, I think).

I don't think its behaviour is optimal, as it accepts all recipients and then gives the 501 5.5.4 error against the DATA command. This could be interpreted as cunning, in that the client doesn't know which recipients are bad, but that assumes the client is a spammer. I believe it should instead return a 550 5.1.1 against invalid recipients initially. Obviously, if there are no valid recipients, it should respond to the DATA command with something like:

554 5.5.1 Error: no valid recipients

If the connection times out, the appropriate response is something like:

421 4.4.2 myhost.co.uk Error: timeout exceeded

These changes should not affect the timely delivery of legitimate emails, though, if you have sufficient concurrent connections available in your configuration. You say your diagnostics appear to show that the delay is due to connections not being closed aggressively, but how did you reach this conclusion?
_____________________________
Disclaimer: I don't work for GFI, I just use their products.
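The reply scheme suggested in this post can be sketched in a few lines of Python. This only illustrates the proposed decision logic and is not how ME is implemented: the function names are hypothetical, and the "User unknown" wording is an assumed reply text.

```python
def rcpt_reply(recipient, valid_recipients):
    """Reject unknown recipients at RCPT TO instead of accepting them all."""
    if recipient.lower() in valid_recipients:
        return f"250 2.1.5 {recipient}"
    # Reply text after the status codes is an assumption for illustration.
    return f"550 5.1.1 {recipient} User unknown"


def data_reply(accepted_count):
    """Refuse DATA only when no recipient was accepted earlier."""
    if accepted_count == 0:
        return "554 5.5.1 Error: no valid recipients"
    return "354 Start mail input; end with <CRLF>.<CRLF>"


def timeout_reply(hostname):
    """Reply sent just before dropping an idle connection."""
    return f"421 4.4.2 {hostname} Error: timeout exceeded"
```

With this scheme a harvesting client learns at RCPT TO that an address is invalid, which is the trade-off noted above against the current behaviour of accepting every recipient.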
RE: DirHarvesting behaviour in 14.1 - slow mail delivery - 13.Oct.2009 7:43:34 PM
Jacob Luebbers
Posts: 63
Joined: 8.Sep.2004
RSP - Thanks for the response.

I'm actually not 100% sure of my suspicion yet (about the more aggressive closing of connections on one node). I wanted to check with others to see what a normal DirHarvester SMTP conversation looks like in v14.1 before proceeding further with my diagnostics.

I've been looking at the "Connections Active" and "Connections Reset" counters under "TCPv4" in PerfMon after rebooting both nodes simultaneously, watching them climb as new inbound connections were received. The Connections Active counters climbed on both nodes at roughly the same rate, whereas the Connections Reset counter lagged behind on the bad node compared to the good, with that gap widening over time. Also, when looking at the current SMTP session list in IIS Manager (I was repeatedly refreshing the view on both nodes side by side), I could see that most of the connections on the bad node were present for a longer period. The "good" node's DirHarvesting connections were culled very quickly from the list, whereas the "bad" node's connections remained until the client disconnected them.

However, I've just tested a manual SMTP DirHarvesting conversation against both nodes individually, and BOTH behave as I described in my previous post, with the connection NOT being closed after the 5.5.4 error. Yet the good node is still culling connections from the list much quicker than the bad, and mail delivery through the bad one is lagging.

So my initial assumption seems faulty, but the fact remains that as soon as I bring the bad node back into the cluster, its current SMTP connection count rises to more than double that of the good node within about a minute, and stays at least that high. I can only get them to rough parity by changing the NLB weight to 80/20 in favour of the good node.

This problem only started with the v14.1 upgrade on the cluster, so I'm thinking it has to be related to ME in some way. Any suggestions as to where to look next?

The two nodes are on identical hardware, and the various IIS SMTP settings are the same on both (msgs/connection, session size, msg size, max connections, connections/domain, etc.).

Regards,
Jacob
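One way to put numbers on the side-by-side comparison described above, instead of refreshing the IIS session list by hand, is to tally the TCP states of port 25 connections on each node at the same moment. The sketch below is only an illustration: it assumes the Windows `netstat -an` column layout (protocol, local address, foreign address, state), and `count_states` is a hypothetical helper name.

```python
from collections import Counter


def count_states(netstat_text, local_port=25):
    """Tally TCP connection states for one local port from netstat output."""
    states = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected TCP row: proto, local address, foreign address, state.
        if len(fields) < 4 or not fields[0].upper().startswith("TCP"):
            continue
        # The port is whatever follows the last colon of the local address.
        if fields[1].rsplit(":", 1)[-1] == str(local_port):
            states[fields[3]] += 1
    return states
```

Feeding it the captured output of `netstat -an` from both nodes gives comparable ESTABLISHED and TIME_WAIT counts, which may help show whether the bad node really holds sessions open longer or simply accumulates more of them.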