Weblog entry #9 for dkg

Pros and Cons of secondary MX records
Posted by dkg on Sat 21 Oct 2006 at 19:10
Many sites use multiple MX records in DNS. But i feel like i'm seeing more and more that have just a single MX record. Why choose one strategy over the other?

Given that MTAs are increasingly complicated these days (with various spam filtering techniques), what are some good arguments for (or against) having multiple MX records for a relatively small domain (<1000 users)? Here are a couple of notes of my own (which i'm not wedded to: please tell me if you disagree!):

For:

  • more control over mail delivery: if your primary MTA is down or unreachable, you still have a machine you control that will accept mail deliveries on your behalf, rather than trusting the remote mailer to retry properly.
  • it's "the standard" way to do things.
  • redundancy is good.

Against:

  • synchronizing settings between primary and secondary MTAs is complicated and potentially error-prone. If settings are not synchronized, the secondary MX could end up accepting messages for delivery that the primary would not have accepted.
  • simplicity is good.
  • queues on the secondary MX provide yet another place for mail to be lost or mangled in an already-complicated protocol.
  • i've heard many reports of spammers preferring the secondary mail exchangers over the primaries, though i'm not clear why that is.

Your thoughts?

 

Comments on this Entry

Posted by daemon (155.232.xx.xx) on Sat 21 Oct 2006 at 20:26
I don't think the size of the domain (as in number of users) is really an issue when considering if you need a secondary MX or not (although there will, of course, be a point where it's just not viable). I think the more important question is how valuable your mail is to you...

Of your "For" reasons, I think the last one, redundancy, has the highest priority, but it is also a "best practise" to have a backup MX system to fall back on, or even to load balance with on a high volume domain/hosting setup.

As for why spammers would like to target secondary MX hosts, I think it's down to the impression that secondaries are often remote in relation to the primary host, and often run by different sysadmins, or for personal domains, friends. As such, the secondaries in such situations are thought to be unlikely to have as strict a policy for the acceptance of mail compared to primary hosts, and are seen as soft targets.

Cheers.


Posted by dkg (216.254.xx.xx) on Sat 21 Oct 2006 at 23:15
there's the load balancing concern for high volume domains, as you say. In this situation, i can see using multiple MX records for load balancing. i can also think of several other load-balancing techniques (e.g. multiple IPs for a single A record, pointed to by a single MX) which probably each have their own set of tradeoffs.
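
As a rough illustration of the load-balancing point, here is a minimal sketch of how a sending MTA is generally expected to order MX targets: lowest preference value first, with hosts of equal preference tried in random order. The function name, the record tuples and the hostnames under example.org are assumptions made up for illustration, not any particular MTA's code.

```python
import random
from collections import defaultdict


def delivery_order(mx_records, rng=random):
    """Return MX hostnames in the order a sender would try them.

    mx_records is a list of (preference, hostname) pairs; lower
    preference values are tried first, and ties are shuffled.
    """
    by_preference = defaultdict(list)
    for preference, host in mx_records:
        by_preference[preference].append(host)

    ordered = []
    for preference in sorted(by_preference):
        hosts = list(by_preference[preference])
        rng.shuffle(hosts)  # equal preference: spread load randomly
        ordered.extend(hosts)
    return ordered


if __name__ == "__main__":
    # Hypothetical records: two equal primaries and one backup.
    records = [(10, "mx1.example.org"), (10, "mx2.example.org"),
               (20, "backup.example.org")]
    print(delivery_order(records))
```

With two records at the same preference and one at a higher value, well-behaved senders should split between the first two and only reach the third when both fail. That is the difference between using MX records for load balancing and using them for a backup.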

But for smaller domains, load-balancing doesn't seem like a very good argument. So we're back to redundancy as the primary reason i think (i don't consider "best practice" to be a reason in and of itself: presumably it's "best practice" because it offers redundancy, right? If there's another reason, what is it?).

But redundancy cuts both ways when you ask "how valuable your mail is to you?" Multiple mail exchangers now means two machines that you need to keep running, and you (presumably) need to synchronize their administration somehow -- are there good tools for this? Otherwise, your secondaries will be the soft targets you describe for spammers. For domains doing SMTP-time content filtering (as more and more seem to be doing), it means your filters need to be sync'ed as well. And if those filters are user-tunable, it sounds like even more of a potential for trouble.

Many mail server administrators are now adopting greylisting, which demands SMTP-compliant behavior from the remote mail hosts, including sensible retry and delay timings. If your worst-case downtime is on the order of a day or two, and you already expect this level of responsible behavior from the remote mail hosts you interact with, how does the redundancy help you avoid losing mail?

Secretly, of course, i'm as lazy as the next systems administrator, and i'm looking for reasons why secondary mail exchangers aren't necessary. If they aren't, then i can spend my limited hours doing other work that my users will appreciate, rather than administering a second mail server. So i'm looking for the knock-down argument that says i need to have a secondary if i want some particular sort of reliability.

i'm still not sure that I see any significant extra reliability introduced by this redundancy that makes it worth the extra complexity. Any other arguments? am i missing something obvious (or subtle)?


Posted by daemon (155.232.xx.xx) on Sun 22 Oct 2006 at 13:52

Well, we're all a bit lazy at heart, that's what makes Debian-Administration so good -- others have done a lot of the thinking for us already ;-)

> i'm still not sure that I see any significant extra reliability introduced by this redundancy that makes it worth the extra complexity. Any other arguments? am i missing something obvious (or subtle)?

I guess it really depends on your environment. I doubt you're missing anything, let alone anything obvious -- different environments are widely variable: some suit secondary MXes, others don't really need them, or at least can get away without one, as the extra work involved would outweigh the benefits.

Of course, if your head would like to have a secondary MX, but your "lazy gene" is fighting against it, you could always use something like cfengine to maintain your site's mail configuration -- that way, once set up, you only have to make configuration changes once, and cfengine would roll changes out to your MX hosts... One of the biggest problems with being a sys-admin these days is all the possible options to choose from when solving problems ;-)

Cheers.


Posted by simonw (84.45.xx.xx) on Sun 22 Oct 2006 at 17:40
> it's "the standard" way to do things.

Various serious men in serious white coats have advised against secondary MXes for a long time. It is one of those things people think they need without thinking of the details. So I don't think it is the standard way any more. I think you need to define precisely what a secondary MX is to achieve for you, then you'll know if you need one.

Remember they date from the days when many people's Internet connections were unreliable, and even then there was a hack made to various SMTP servers, so that the primary could ask the secondary to resend email (no good having email on a secondary if it is using exponential back-off for delivery to the primary, and delays the email longer than it would have done otherwise).

Once people just queued any email for their domain on the secondary. Current spam levels make this totally unworkable: as a minimum one needs to sync the list of valid email addresses, so one can reject undeliverable email. One also needs to synchronise the spam filtering set-up, or the secondary will still dump rubbish on the primary. Obviously, if like me you use greylisting, you also need to devise a greylisting scheme that fails over between servers, which isn't that hard.
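
One way to picture a greylisting scheme that fails over between servers is a greylist whose state lives in a store both MXes can reach. The sketch below is only an illustration under stated assumptions, not the scheme actually used here: a plain dict stands in for the shared store (in practice something replicated between the hosts), and the class name, the 300-second delay and the reply strings are hypothetical.

```python
class Greylist:
    """Greylist keyed on (client IP, envelope sender, recipient)."""

    def __init__(self, store, min_delay=300):
        self.store = store          # shared between all MX hosts
        self.min_delay = min_delay  # seconds a new triplet must wait

    def check(self, client_ip, sender, recipient, now):
        """Return the SMTP reply for this delivery attempt.

        `now` is a timestamp in seconds, passed in explicitly so the
        behaviour is easy to test.
        """
        key = (client_ip, sender.lower(), recipient.lower())
        # Remember when this triplet was first seen, on whichever MX.
        first_seen = self.store.setdefault(key, now)
        if now - first_seen >= self.min_delay:
            return "250 OK"
        return "451 4.7.1 Greylisted, please try again later"


if __name__ == "__main__":
    shared = {}  # stand-in for a store replicated between the MXes
    primary = Greylist(shared)
    secondary = Greylist(shared)
    triplet = ("192.0.2.10", "alice@example.net", "bob@example.org")
    print(primary.check(*triplet, now=0))      # first attempt is deferred
    print(secondary.check(*triplet, now=600))  # retry on the other MX
```

Because both instances consult the same store, a sender that is deferred by one MX and retries against the other is not forced to start its waiting period again.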

Relying on remote hosts to retry delivery is pretty reliable. I know: I use one MX and greylisting on hundreds of domains. Face it: if a remote host doesn't retry when it fails, there will be occasions when it fails to deliver email whatever you do, such as when its own route to the Internet is down. There is a reason the RFC specifies the need to retry, and you can reasonably expect RFC compliant behaviour in email servers (i.e. you can blame the sender if such email isn't delivered).
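
To make the retry argument concrete, here is a minimal sketch of a sender-side retry schedule with exponential back-off and a queue lifetime. All the numbers are assumptions chosen for illustration (first retry after 15 minutes, doubling up to a 4-hour cap, giving up after 5 days); real MTAs are configurable and differ from one another.

```python
def retry_schedule(initial=15 * 60, factor=2, cap=4 * 3600,
                   lifetime=5 * 86400):
    """Yield retry times, in seconds after the first failed attempt."""
    elapsed, delay = 0, initial
    while elapsed + delay <= lifetime:
        elapsed += delay
        yield elapsed
        delay = min(delay * factor, cap)  # back off, up to the cap


def delivered_after_outage(outage_seconds, **schedule_options):
    """Return the retry time at which delivery succeeds.

    Returns None if the sender's queue lifetime runs out first, in
    which case the sender bounces the message to its author.
    """
    for attempt in retry_schedule(**schedule_options):
        if attempt >= outage_seconds:
            return attempt
    return None


if __name__ == "__main__":
    print(delivered_after_outage(2 * 86400))  # two-day outage
    print(delivered_after_outage(7 * 86400))  # week-long outage
```

Under these assumed settings a two-day outage only delays mail, while a week-long outage outlives the sender's queue. That second case is the one where a secondary MX, or adding one during the outage, changes the outcome.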

So the traditional secondary MX is almost entirely about coping with an outage of the network connectivity to the primary.

If your primary MX dies, with or without a secondary MX, you need to replace it. In that case the only difference is whether your queue expiry tells the senders their email wasn't delivered, or the sending servers do. (I believe people would get a more consistent email experience if all their messages were returned after the same period, which means queuing on the sending server results in a more consistent experience during problems.)

If you have 1000 active users (we have rather more users, but most of them don't send a lot of email and simply have email with their webhosting), you probably ought to consider having the email server somewhere with a redundant Internet connection (BGP), whether that is a server in a data center in an ISP arrangement, or a server on a big corporate site. This might not remove the need for a secondary MX, but it may be a higher priority item, especially since it can keep everything online during a network outage or glitch, not just SMTP.

Similarly, if availability of email is important, one probably wants to deploy two primary MX servers, which are either symmetrical (active/active), or a fail-over (active/standby), so that you can continue actually delivering email when one server fails, rather than just queuing it. Apart from the redundant shared storage for the ultimate delivery destination, it is fairly easy to build SMTP and POP/IMAP/Webmail servers up in parallel fashion.

Once you've deployed these kinds of arrangements you may feel a secondary MX is an unnecessary complication.

I think it is possible to argue that with a number of secondary MXes, one could cope with extreme peaks in demand better (as long as they aren't sustained), and one can reduce the chance of a sending server being "unlucky" in failing to get a connection on multiple occasions due to the primary being busy. Although my experience is that SMTP doesn't usually have these extreme fluctuations in demand: even a fairly big "joe job" usually only results in thousands of emails an hour, which even a modestly specced email server can handle without breaking into a sweat. Greylisting also helps spread out the backscatter from joe jobs; it'd be interesting to measure that and see if it is a net gain (obviously backscatter will pass greylisting eventually, and there is work in saying "come back later").

I've seen too many misconfigured secondary MXes reject email (including from big ISPs that should know better) that would otherwise have been delivered fine, to believe that most admins are disciplined enough to keep a secondary server correctly configured and up to date. Not so much the complexity, as the administrative cohesion.

I believe some MTAs used to have a "secondary a domain if you are listed as a secondary MX" feature. Before spam this was an excellent idea, since it stopped some trivial types of abuse but avoided the "relaying denied" messages due to someone forgetting to add a domain. These days it is just an invitation to be used as the spammers' workhorse.

When we had 2 MXes, we saw about a third of all email delivered direct to the secondary; it was all spam except during genuine outages of the primary (rare). I don't know if it is deliberate on the part of spammers. We also see a fair amount of email delivery attempted to the A record (ignoring the MX record) of domains, which is an RFC violation: unless you get a response that positively says no MXes exist, an MTA must not deliver to the A record. Again that may be badly written spambots, or just the discovery by the spammer that sometimes it works. Spammers don't care. Interestingly, the direct-to-A-record delivery had more genuine email attempted than the secondary MX, but again it is an RFC violation and the sender's problem if their email server doesn't conform to the relevant RFCs.
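
The A-record rule can be sketched as a small decision function. This is a simplified illustration, not a resolver: the `mx_result` encoding (None for a failed lookup, an empty list for a definite "no MX records" answer) and the function name are assumptions made for the example.

```python
def delivery_targets(domain, mx_result):
    """Decide where a compliant sender may attempt delivery.

    mx_result models the outcome of an MX lookup (assumed encoding):
      None         -> the lookup failed or timed out
      []           -> DNS positively answered that no MX records exist
      [(p, h),...] -> MX records as (preference, hostname) pairs
    """
    if mx_result is None:
        return []        # unknown: keep the mail queued and retry later
    if not mx_result:
        return [domain]  # no MX at all: fall back to the domain's address
    # MX records exist: use only them, lowest preference value first.
    return [host for _, host in sorted(mx_result)]


if __name__ == "__main__":
    print(delivery_targets("example.org", [(20, "backup.example.org"),
                                           (10, "mx1.example.org")]))
    print(delivery_targets("example.org", []))
    print(delivery_targets("example.org", None))
```

A sender that connects straight to the domain's address while MX records exist is taking a branch that this function never returns.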


Posted by jamiemcc (69.114.xx.xx) on Mon 23 Oct 2006 at 01:23
I'm focusing on the "For" arguments as well, because I'm also against secondary MX records for reasons of added complexity and am looking for a reason to change my mind.

And, I agree that it's different to focus on relatively small systems in which load balancing isn't an issue (because I think that certainly could change things).

So, on the issue of why we should have secondary mx servers for relatively small domains...

Let's start by punting the "standard way of doing things" reason. I'm a HUGE fan of doing things in the standard way so that if person A sets up a system, person B can maintain it. I don't think this situation applies to secondary mx records. It takes person B about 2 seconds to determine that there is no secondary mx record. Done. Now it's even easier for person B to figure out how things operate because there's only one system involved (see "simplicity is good" under arguments against). Furthermore, I also agree with simonw in questioning whether this is still standard anyway.

I think "redundancy" and "control" are two versions of the same reason. That single reason is: do you want to have full control over the redundancy or is it ok to rely on the sending server to handle the backing up of sent mail?

If we had unlimited resources I think we would set up a backup mx server. It really does give you full control over redundancy and you could devote all the resources at your disposal to ensure that it is properly configured and monitored.

However, since we don't have unlimited resources (and we have enough resources to be reasonably sure we can bring up a new server in less than 24 hours), skipping the secondary mx server seems like a better move.


Posted by dominic (142.58.xx.xx) on Mon 23 Oct 2006 at 17:37
This is a matter of suiting your needs. As simple as that. Must incoming mail be queued even when there's a total failure at the primary site?

If you answered "Yes" to this question, then you need secondary (and possibly tertiary) MX records.

If you answered "No" or "Maybe", then consider that it does add complexity and updating mail configurations at two distinct sites is going to incur some cost (time and energy mostly).


Even for my personal domain where I only have a dozen users, mail has to be queued because several users, myself included, consider email very important. So even for my small site, I run a secondary MX.


Posted by dkg (216.254.xx.xx) on Mon 23 Oct 2006 at 17:46
I consider e-mail very important also, of course. it's a primary means of communication with other humans for some of us!

Why is immediate queueing on your own machine so important for you? One alternative would be to let the remote mailer try again later when you are back up. this is equivalent to queuing on the remote MTA instead of queuing on your own machine. Why is this a worse alternative than a secondary MX?

Do you use greylisting on either machine as a spamfiltering technique?

How do you co-ordinate configuration information (user lists, spam filter parameters, virus definitions, etc) between your two MX's?


Posted by dominic (142.58.xx.xx) on Mon 23 Oct 2006 at 18:40
I have been through a couple of outages that have been very long (as long as a week). Queueing mail on a secondary server has allowed me to set up temporary measures for retrieving mail, like forwarding certain recipients to a gmail account.

I don't do greylisting on either machine, though I may as well on the secondary machine, since mail isn't being delivered immediately anyhow... Hmm, that's a good suggestion.

As for co-ordination, since my domain is very small, I don't bother. I just let everything spool on the secondary and let the primary sort it all out later. The only thing I sync is the list of domains, which I do by hand.

If I had to sync, my "real" users are in an LDAP directory so the secondary mail server could be set up to use the LDAP directory (or I could set up a replica and use that). But I don't have full control over the secondary server so I don't go any further than just listing the domains and spooling all mail.

We only do a couple hundred email messages a day so even the 50-70% spam isn't a huge burden.

- dom


Posted by adamshand (198.95.xx.xx) on Tue 24 Oct 2006 at 02:17
That's exactly right. The primary value of a secondary MX is to give you added control when things go wrong.

* If your primary MX goes down for more than 5 days (gack!), having a secondary MX allows you to adjust the timeouts and keep the mail queued and safe until the primary is back and running.

* In a time critical mail environment, having a secondary MX allows you to immediately dequeue messages from your secondary to your primary MX once an issue with the primary is resolved (rather than waiting arbitrary amounts of time for mail servers all over the world to dequeue your mail).

That said, I've just removed the secondary MX from my personal server because I was getting a significant amount of spam from it and the tradeoff didn't seem worth it.

Adam.


Posted by dominic (24.80.xx.xx) on Tue 24 Oct 2006 at 02:41
Does mail from your secondary server get processed for spam when it shows up at your primary server anyhow? Even if you want to cut back spam on your secondary, a simple SpamAssassin installation that rejects anything scoring over 10 should bring you down to a volume that isn't too costly.

And I guess one could always add a secondary MX after the fact if there's a long service outage.


Posted by adamshand (198.95.xx.xx) on Tue 24 Oct 2006 at 02:56
On my primary I'm running Postfix, SBL+XBL, greylisting, amavisd-new and SpamAssassin (with network checks). So it's fairly thorough ;-)

The problem I was having was that the majority of the spam that did make it through was coming via my secondary (which is run by a friend so I don't have full control over it). I believe that this was because certain checks can't work when mail is coming from a secondary, which would lower the SpamAssassin score just enough that it would sneak in.

I wasn't getting a *lot* of spam from my secondary, just enough that it was annoying. I'm still evaluating the effects of removing it (I implemented greylisting shortly after so cause/effect is a bit blurry at this point) but it seems to have made a significant difference.

The clue that made me investigate this came from running mailgraph. I noticed that I was getting regular spikes of incoming mail every hour and with a bit of poking figured out that it was the queue runner on my secondary kicking off.

Adam.


Posted by simonw (84.45.xx.xx) on Wed 25 Oct 2006 at 00:29
> As for co-ordination, since my domain is very small, I don't bother. I just let everything spool on the secondary and let the primary sort it all out later.

Which makes you part of the problem (don't take it personally, I'm part of the problem, just I struggle harder to not be).

So someone fakes my email address as sender, delivers to your secondary, and because you can't be arsed, I get the bounce when your primary says "no such address". Gee thanks.

You're a backscatter source.

This isn't acceptable email practice any more, even if qmail does it by default.

Easiest way to stop abusing the rest of us, is to remove your secondary MX.


Posted by dominic (142.58.xx.xx) on Wed 25 Oct 2006 at 01:15
No offence taken. This is very much core to the spam issue.

However, the problem is not whether or not I check right away, since you would get that bounce from either host (if the spam scored under 10 and more than 6 or whatever the current settings are). The problem is that the source address is not verifiable. There's nothing I can do to ensure that mail addressed from you has anything at all to do with you in the first place.

I've set up SASL authenticated SMTP so that all mail is at least legitimately authenticated before my mail server will relay it out. I still haven't figured out how to enforce that the recipient might be able to verify this to any degree of satisfaction.


Posted by simonw (84.45.xx.xx) on Wed 25 Oct 2006 at 02:00
My spam filtering doesn't create backscatter because I do it before the SMTP transaction completes.

This is a problem with your set up, not a generic issue with SMTP. Sure you aren't alone (and my servers aren't yet perfect) but this is avoidable backscatter.

If you reject the email before the SMTP transaction completes, no DSN report is generated by you. Sure the sending server might generate one, but odds are it is a spambot and won't do that.

The problem here is accepting email you won't subsequently deliver. If you always deliver what you accept, and reject what you won't deliver before the transaction completes, you don't generate backscatter.
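
The difference between the two policies can be shown with a minimal sketch. It does not speak SMTP; it only models who ends up receiving a bounce. The function name, the addresses and the `check_at_smtp_time` flag are assumptions invented for the example.

```python
def handle(messages, valid_recipients, check_at_smtp_time):
    """Model what happens to mail for unknown recipients.

    messages is a list of (claimed_sender, recipient) pairs.
    Returns (rejected_recipients, bounce_targets):
      rejected_recipients -> refused with a 5xx reply during the SMTP
                             transaction, so this server sends no DSN
      bounce_targets      -> claimed senders this server later mails a
                             bounce to, which is backscatter if forged
    """
    rejected, bounce_targets = [], []
    for claimed_sender, recipient in messages:
        if recipient.lower() in valid_recipients:
            continue  # deliverable mail is accepted and delivered
        if check_at_smtp_time:
            rejected.append(recipient)
        else:
            # Accepted first, found undeliverable later.
            bounce_targets.append(claimed_sender)
    return rejected, bounce_targets


if __name__ == "__main__":
    valid = {"dom@example.org"}
    mail = [("forged@victim.example", "nobody@example.org"),
            ("friend@example.net", "dom@example.org")]
    print(handle(mail, valid, check_at_smtp_time=True))
    print(handle(mail, valid, check_at_smtp_time=False))
```

A secondary that spools everything behaves like the second call: the forged sender address, not the spammer, receives the bounce. A secondary that knows the valid recipient list can behave like the first.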

The antispam workers are busy building lists of machines that generate backscatter too readily. Expect all your DSN reports to be ignored by big email sites if you continue to generate them too freely. Then the next time a message comes back with an error you couldn't handle early on (like "overquota" when you try to deliver from queue to disk), the DSN will be rejected because your email server is a bad Internet denizen and has been listed as a backscatter source, and the sender will never know their email was rejected.

Backscatter is a major issue. Something like half the email passing our initial filters at work is backscatter at the moment (having grown noticeably in the last couple of weeks). Although this is probably because we are efficient at killing email from sources that aren't behaving like genuine email servers, leaving the backscatter arriving from genuine email servers to be dealt with.

Of course it would be far simpler if people didn't generate the backscatter in the first place.

The "I'm not very big" argument doesn't really wash either. When you get hit by thousands of backscattered emails each day, because thousands of server admins thought "I'm not very big", it becomes clear the behaviours leading to backscatter don't scale well.

Similarly the spammers don't know how big you are, and if they start a dictionary attack against your domain using your secondary server, you'll suddenly be queuing a lot of email to bounce. I've seen gigabytes of such email accumulate over a weekend on a secondary MX that didn't know who had an email address and who didn't.
