Killin’ Zombie Bots DNS Style

The fun post title is the only thing fun about this. Unless you are in hosting or server admin, or run your own DNS servers, this is going to be complicated, technical and boring.

First [is first ~LK]: an apology to any “upstream” DNS servers that our pet zombie bots may have passed bad requests to.  Like us, you probably didn’t know it was happening.

“What’s happening?!”
~ Dana (Dominique Dunne, Poltergeist, 1982)

Thousands… hundreds of thousands… millions… hundreds of millions of bad DNS requests.

Let me start at the beginning (weblogger here). All of the servers at MLD (computermedic.org) are “new”.

While checking on the new equipment (scanning logs) to see how it was doing, I found a new message in /var/log/messages: named[1584]: clients-per-query decreased to 61 (it has gone as low as the 20s). Never heard of it (the old servers had bind/named logging disabled). A little searching turned up # rndc querylog, which toggles query logging on for bind.

# tail -f /var/log/messages
Boom.  Screens scrolling by too fast to see what’s going on.
Jun 27 06:55:58 nineseven named[1584]: client 183.178.216.210#12673: query: 1rip IN A + (70.63.178.157)
Jun 27 07:29:28 nineseven named[1584]: client 213.128.75.196#41712: query: 1rip.com IN ANY +E (70.63.178.157)
…and a bunch of query: DDoS.asia (how inconspicuous)

The immediate problems (why these goofy old internet attacks work):
1. Nobody watch-dogs their name servers (if they, or we, did, we would have known this was happening). By default, bind/named does not log queries. Most people running a name server run it as a sideline to a web, email or app server; it’s just part of the setup and quickly forgotten.
2. Name servers (mostly) pass queries “upstream” when they do not have an answer themselves. This is the big deal: Bot -> ns1.computermedic.org: “What is the IP address of 1rip?” ns1 does not know, so it tries to be overly helpful (“I’ll find out and get back to you”) and asks an upstream server, “Can you tell me the address of 1rip so I can tell Bot?” And what if the upstream server doesn’t have an answer either? The passing of queries upstream continues until either a timeout occurs or some server gives an answer, even if the answer is: 1rip doesn’t have any DNS, thank you.
3. Multiply #2 by thousands of times per second. The only ‘symptom’ on a new/modern server with a new/modern OS: “clients-per-query decreased” messages in the log.

#4: How do ‘we’ know #1 is true? We didn’t catch this for all these years, and our upstreams never called or sent email to say “did you know that your servers are bombing ours with upstream queries?” We are unwitting zombies in the zombie-internet-apocalypse.

The decided-upon solution: stop this madness. Call the internet police, have them write a ticket and issue a “strongest terms” warning to all of the other did-not-know-they-were-zombies zombies. They can just slap the old bracelets on those rogue bots; they need to be jailed anyway.

There is no such thing as the internet police, so it falls to the server admins (us) to work out what we can do with the tools we have. Solution: fail2ban, the iptables firewall, and spoofed zones.

The problems with the solution (.plan): fail2ban, iptables, and bind (real zones, let alone spoofs) are extremely complicated to set up and use, and equally frustrating in their levels of “should work, but doesn’t work.” The other problem with the .plan: we use a “web interface server manager” that wants to control everything about the server and is prone to overwriting changes you make directly on the machine. The server manager has its own firewall rules and implements them using iptables. That is “sort of” good, because iptables is already up and running and known to be working.

.plan in action part 1 – fail2ban

 See who, what, where, why and how about fail2ban here: http://www.fail2ban.org

Then search your favorite flavor of search engine for fail2ban howtos, and for those evil-incarnate regexes (regex -> Python style -> interpreted by fail2ban). Then come back here for the step-by-step.

Our test server for this .plan: 2-core Intel processor (old), CentOS 6-point-something x86_64, bind and fail2ban installed with yum. Will the steps work on yours (anything other than a basically out-of-the-box CentOS 6)? Probably not as-is; every OS distributor likes to move files around or change some trivial word (directories, folders, libraries?) to make their OS “better”. The ‘paths’ and ‘stuff’ below are for our server; they “should be” nearly the same on yours.

First [is FIRST, Lilly Kate says], make sure bind/named is logging queries. We turned it on “by hand” and have not yet configured the .conf ( /etc/named.conf or /etc/named.conf.local ) to do the logging on boot.

# rndc querylog

Now, tail -f /var/log/messages and make sure you start to see some [named] entries.

If you do not see any bad-guy messages or floods, stop: don’t sweat it, and check your logs again next month.

The first-worst zombie-bot-flood was/is queries for: 1rip.com
After 2 hours of logging: # grep -c '1rip' /var/log/messages
100025
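grep -c counts every line that contains “1rip” anywhere, so 1rip and 1rip.com get lumped together. If you want to see which names (and which clients) are doing the flooding, a small tally helps. This is a minimal sketch, assuming your named query lines look like ours (the sample lines near the top of this post); count_queries and count_file are hypothetical helper names, not part of bind or fail2ban.

```python
import re
from collections import Counter

# Matches BIND query-log lines as syslog wrote them on our CentOS 6 box, e.g.
#   Jun 27 07:29:28 nineseven named[1584]: client 213.128.75.196#41712: query: 1rip.com IN ANY +E (70.63.178.157)
# Assumption: your syslog/BIND versions log the same layout; tweak it if not.
QUERY_RE = re.compile(
    r"named\[\d+\]: client (?P<ip>[\d.]+)#\d+: query: (?P<name>\S+) IN (?P<qtype>\S+)"
)


def count_queries(lines):
    """Tally query-log lines by queried name and by client IP."""
    by_name, by_client = Counter(), Counter()
    for line in lines:
        m = QUERY_RE.search(line)
        if m:
            by_name[m.group("name")] += 1
            by_client[m.group("ip")] += 1
    return by_name, by_client


def count_file(path="/var/log/messages"):
    """Convenience wrapper: tally a whole log file (needs read access)."""
    with open(path, errors="replace") as fh:
        return count_queries(fh)
```

Usage: names, clients = count_file(), then names.most_common(10) lists the ten most-asked names. For scale: 100025 hits in 2 hours is 100025 / 7200, or roughly 14 queries per second for 1rip alone.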

Now, once you start trying to figure out fail2ban, you’re going to find about a million weblogger (moron) pages that tell you: RTFM. The Linux nerds love that; translated from nix-nerd to English, RTFM means “I don’t know either, but I like writing that on boards and web pages.”

Here’s TFM for this particular problem and this particular fail2ban .plan (of course, with the particular setup/OS mentioned previously):

# vim /etc/fail2ban/filter.d/named-flood.conf
(new file… press insert to get into insert mode)
[Definition]
failregex = .* named\[.*\]: client <HOST>\#.*: query: 1rip\.com IN *
ignoreregex =
:wq

I spent better than 2 hours testing, trying, RTFM-ing and playing with that failregex and fail2ban-regex (their testing tool). If your log lines look at all different from ours (scroll way up), this regex won’t work for you; you have to tweak it and test it. Copy a few lines from your log to a file ( if you don’t know how to do that, you really should not be messing with servers and firewalls ), and make sure you include some ‘good’ DNS queries in the file so you can confirm they are not matched.
# vim /tmp/testfail2regex.txt (new file, insert/paste some lines, :wq).
# fail2ban-regex /tmp/testfail2regex.txt /etc/fail2ban/filter.d/named-flood.conf
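fail2ban-regex is the real test. To see roughly what it does with the filter, here is a sketch in plain Python: swap <HOST> for a named capture group, compile the failregex as a Python regex, search each line, and report the captured address. The HOST_PATTERN below is a simplified stand-in we made up, not fail2ban’s actual <HOST> definition (which also handles hostnames and more), so treat this as an illustration only.

```python
import re

# Simplified stand-in for fail2ban's <HOST> tag. Assumption: the real
# definition is more involved (hostnames, IPv6-mapped prefixes, etc.).
HOST_PATTERN = r"(?P<host>[\w\-.]+)"

# The failregex from filter.d/named-flood.conf, with the dot escaped.
FAILREGEX = r".* named\[.*\]: client <HOST>\#.*: query: 1rip\.com IN *"


def compile_failregex(failregex):
    """Swap <HOST> for a named group and compile, roughly as fail2ban does."""
    return re.compile(failregex.replace("<HOST>", HOST_PATTERN))


def matching_hosts(failregex, lines):
    """Return the host captured from every line the failregex matches."""
    rx = compile_failregex(failregex)
    hosts = []
    for line in lines:
        m = rx.search(line)
        if m:
            hosts.append(m.group("host"))
    return hosts
```

Feed it the same lines you put in /tmp/testfail2regex.txt: every zombie line should give back an address, and every ‘good’ query line should give back nothing.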

OK, I’ll assume that if you are still reading, your test reported some number of hits / matches.

( add the following at the bottom of the file )
( edit 7.1.13 – use the example at the bottom of this page )

# vim /etc/fail2ban/jail.conf
[named-flood]
enabled = true
filter = named-flood
logpath = /var/log/messages
action = iptables-allports[name=dnsflood]
bantime = 600 ; start with 10 mins
findtime = 1
maxretry = 1
( :wq to write and quit )
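How maxretry, findtime and bantime work together, as we understand it: a host is banned once it has maxretry matching log lines within findtime seconds, and the ban lasts bantime seconds. Below is a simplified model of that counting rule (simulate_bans is our own hypothetical helper). It ignores fail2ban’s log-polling delay and its “already banned” bookkeeping, so it shows the idea, not fail2ban’s exact timing.

```python
from collections import defaultdict


def simulate_bans(events, maxretry, findtime, bantime):
    """Simplified fail2ban jail: who gets banned, and when.

    events: (timestamp_in_seconds, host) pairs for lines the filter matched,
    in time order. Returns (ban_at, unban_at, host) tuples.
    Assumption: ignores fail2ban's log-polling delay and its "already banned"
    bookkeeping; only the maxretry/findtime/bantime counting rule is modelled.
    """
    recent = defaultdict(list)   # host -> match times still inside findtime
    banned_until = {}            # host -> time its current ban expires
    bans = []
    for t, host in events:
        if banned_until.get(host, float("-inf")) > t:
            continue  # while banned, iptables drops the traffic: no new log lines
        window = [s for s in recent[host] if s > t - findtime]
        window.append(t)
        if len(window) >= maxretry:
            banned_until[host] = t + bantime
            bans.append((t, t + bantime, host))
            window = []
        recent[host] = window
    return bans
```

With maxretry = 1, a single match is enough, so a bot that never stops is banned again the moment its ban expires. At bantime = 600 that is at most 86400 / 600 = 144 ban cycles per host per day; at 7200 it is at most 12.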

Do NOT miss or skip this step: fail2ban’s iptables actions default to protocol = tcp. DNS queries arrive over udp, so if you don’t fix the action file, fail2ban bans only the offender’s tcp traffic and the DNS flood continues.

# vim /etc/fail2ban/action.d/iptables-allports.conf
#protocol = tcp (this # is not the nix command prompt, it’s a comment)
protocol = all
( :wq to write and quit )
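Why protocol matters, as we understand the iptables-allports action: it adds a jump rule from the INPUT chain into a fail2ban-&lt;name&gt; chain (fail2ban-dnsflood here), and that jump rule carries -p &lt;protocol&gt;; the per-IP DROP rules live inside the fail2ban chain. So a packet only meets the ban list if its protocol matches the jump rule. The sketch below models just that two-step decision in Python (it is not iptables, and the function names are hypothetical).

```python
def reaches_ban_chain(packet_proto, jump_protocol):
    """INPUT-chain jump rule: only matching protocols enter the fail2ban chain."""
    return jump_protocol == "all" or jump_protocol == packet_proto


def is_dropped(packet_proto, src_ip, jump_protocol, banned_ips):
    """True if a packet would hit one of the jail's DROP rules in this model."""
    return reaches_ban_chain(packet_proto, jump_protocol) and src_ip in banned_ips
```

With protocol = tcp, is_dropped("udp", zombie_ip, "tcp", banned) is False: the zombie is “banned” and its UDP DNS queries still sail through. With protocol = all the same packet is dropped.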

# /etc/init.d/fail2ban restart

That’s (almost) it. You should start seeing ( tail -f /var/log/fail2ban.log ) bans and unbans.

Why the 10-minute ( bantime = 600 ) wall?  Why not forever, or a month? Because, reminder: our servers were sending these requests to upstream DNS resolvers for years.  We never got locked/blocked from anywhere.  Ten minutes is a “good/fair start” – hopefully if these are requests from real servers (not all zombie bots) they’ll end up fixed.  And, another “to remember” is that if my “upstream providers” ban me forever, on all ports, how am I going to get legitimate DNS requests resolved?  It’s a very complex problem/solution scenario (in the real world).

The last step of part 1 (fail2ban) is to fix an “out of the box” misconfiguration of fail2ban’s own logs. The default conf (configuration file) has logging set to /var/log/fail2ban.log. That’s AOK with us. The default /etc/logrotate.d/fail2ban (installed with fail2ban) has a line that tells fail2ban to log to SYSLOG (/var/log/messages) after rotation. We don’t want that, so:

# vim /etc/logrotate.d/fail2ban
(change the size from 30k to a bigger/better number – don’t want 100s of fail2ban-date.log files)
size 500k
(6.29.13 correction: don’t delete [dd] the two postrotate lines; change the postrotate command to:)
postrotate
/usr/local/bin/fail2ban-client set logtarget /var/log/fail2ban.log 2>/dev/null || true
( :wq )

( edit 7.1.13 – use the example at the bottom of the page )
Your finished /etc/logrotate.d/fail2ban file should be:
/var/log/fail2ban.log {
    missingok
    notifempty
    size 500k
    create 0600 root root
    postrotate
      /usr/local/bin/fail2ban-client set logtarget /var/log/fail2ban.log 2>/dev/null || true
    endscript
}

 

.plan part 2: spoofed zone – stop sending upstream

The only “hitch” with the fail2ban solution (using fail2ban for anything) is that it is an “earn our dis-trust” practice.  Meaning: you let anyone and everyone in until they do something (accidentally or not) that gets them thrown out.  Running internet servers is a lot like running a bar – just more drunks.

At our “digital Cheers” we run a clicky-easy administration program.  Clicky-add a DNS Zone.  Here, again, problems.  Clicky-easy does spell checking and IP validation and all kinds of neat “don’t let someone make server-death mistakes” because they think clicking a mouse makes them an RTFM-posting Certified Server AdMInIsTraTOR.  If we write our own zone files or mod the .conf files of bind/named – Clicky-easy will overwrite our “mastery” the next time we make changes.

So.  Work around our own watch dogs.

Why, again?  Because we want our servers to answer the zombie-bots or “downstream” requests ( for 1rip.com in particular ) with “I know the answer to that!!!” (but it’s a CIA dis-information campaign) rather than becoming an in-stream zombie ourselves.  Let me try to rephrase that:
Zombie-Bot asks us: 1rip.com?
Right now, since fail2ban is up and only suffers a few seconds of lag time:
we (named) find no local ‘zone file’ (“I dunno”), so we pass those few 1rip.com?s upstream.
What “we” want to happen:
we (named) find a local ‘zone file’ and answer the bad-zombie-bot (or less-informed downstream server):
1rip.com? I know them – look deep inside yourself, Clarice! ( spoof the reply, 1rip.com = 127.0.0.1 )
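The two paths above (plus a third, refusing outright, which is what BIND’s recursion and allow-recursion options are for) can be sketched as a toy model. This is our own simplification: it ignores caching (a real resolver caches answers, including negative ones, for their TTL, so not every repeat query is forwarded) and treats any name inside a local zone other than the apex as “no such name”. The function name resolve is hypothetical.

```python
def resolve(qname, local_zones, recursion_allowed, upstream_log):
    """Toy model of which path the name server takes for one query.

    local_zones: zone name -> address served at the zone apex (our spoof).
    upstream_log: list; every query we would forward upstream is appended.
    Returns (outcome, address) where outcome is "authoritative",
    "refused" or "forwarded".
    """
    name = qname.rstrip(".").lower()
    for zone, address in local_zones.items():
        zone = zone.rstrip(".").lower()
        if name == zone or name.endswith("." + zone):
            # We are authoritative: answer from our own zone data (None
            # stands in for "no such name in this zone"). Nothing goes upstream.
            return ("authoritative", address if name == zone else None)
    if not recursion_allowed:
        return ("refused", None)
    upstream_log.append(name)  # "I'll find out and get back to you"
    return ("forwarded", None)
```

Without a 1rip.com zone, resolve("1rip.com", {}, True, log) lands in the upstream log; with {"1rip.com": "127.0.0.1"} it is answered locally and the upstream log stays empty.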

So. Work around our own watch dogs: Clicky-easy will not let you enter (lo, loopback, localhost, 127.0.0.1) or any other “bad” stuff into a zone file.  Reformulate the .plan, write it out in advance (WTFM so you can RTFM), try it on a non-production server, then go at it, fast, because you’re messing with a production server now.

Stop named first; you don’t want it giving the wizard’s ‘good answers’ to those ‘bad requests’.
# /etc/init.d/named stop

Clicky-Easy-Server-Admin->
Add a Zone (Wizard), all real information to pass ‘muster’ (spell check and validation)
Save Zone, logout of Clicky-Easy, exit Clicky-Easy
Wait for it… there it is…
/var/named/1rip.com.zone
*** not the real file name, don’t want to give away too much

# vim /var/named/1rip.com.zone (use your real file name)
$TTL        3600
@       IN      SOA     ns1.1rip.com. null.1rip.com. (
2013062706       ; serial, today's date + today's serial #
7200              ; refresh, seconds
540              ; retry, seconds
604800              ; expire, seconds
86400 )            ; minimum, seconds
;
1rip.com. 3600 A        127.0.0.1
1rip.com. 3600      NS        ns1.1rip.com.
ns1 3600 A        127.0.0.1
( :wq to write and quit )
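The serial is the common YYYYMMDDnn convention (date plus a two-digit change number for the day). If you end up hand-making more spoof zones, a tiny generator keeps the serial and layout consistent. This is a hypothetical helper (spoof_zone and soa_serial are our names) that reproduces the records and SOA timers of the 1rip.com zone above; the domain and sink address are yours to choose.

```python
import datetime


def soa_serial(day, revision):
    """YYYYMMDDnn serial: the date plus a two-digit change number for that day."""
    if not 0 <= revision <= 99:
        raise ValueError("revision must be between 0 and 99")
    return int(day.strftime("%Y%m%d")) * 100 + revision


def spoof_zone(domain, day, revision=1, ttl=3600, sink="127.0.0.1"):
    """Zone-file text that answers `domain` (and its ns1) with `sink`.

    Timers mirror the 1rip.com zone above; domain and sink are yours to pick.
    """
    serial = soa_serial(day, revision)
    lines = [
        f"$TTL {ttl}",
        f"@ IN SOA ns1.{domain}. null.{domain}. (",
        f"    {serial} ; serial",
        "    7200 ; refresh, seconds",
        "    540 ; retry, seconds",
        "    604800 ; expire, seconds",
        "    86400 ) ; minimum, seconds",
        f"{domain}. {ttl} IN A {sink}",
        f"{domain}. {ttl} IN NS ns1.{domain}.",
        f"ns1 {ttl} IN A {sink}",
    ]
    return "\n".join(lines) + "\n"
```

Whatever produces the file, running bind’s named-checkzone 1rip.com /var/named/1rip.com.zone (with your real zone name and file) before starting named again is a cheap sanity check.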

# /etc/init.d/named start

Using a very different clicky-easy database admin program, go into the db for clicky-easy-server-admin and change/remove its version of the ‘bad zone’ data.
Note to self: DO NOT use clicky-easy to open/view/edit/save Spoofed-Zone Domains/DNS.

******** postrotate ********

6.28.13: The .plan in .effect is working. Instead of ~100,000 1rips per hour (2 ns servers), it has fallen off to about ~100/hour at the primary. Yes, there are a lot of bloated log files (so what, 3TB drives are in the US$100 range right now); yes, there is some CPU overhead to fail2ban (0.0, 0.1, 0.0 at last check); yes, the 10-minute kick is probably not enough (90% of the bans in fail2ban.log happen one second after that zombie-bot IP was unbanned). But, hey, 10% went away forever in one day.
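A figure like “90% re-banned one second after the unban” can be measured from fail2ban.log instead of eyeballed. A sketch, assuming fail2ban 0.8-style action lines such as 2013-06-28 06:55:58,123 fail2ban.actions: WARNING [named-flood] Ban 1.2.3.4 (we have not checked every version’s layout, so adjust ACTION_RE to your log). reban_gaps and share_within are hypothetical helper names.

```python
import re
from datetime import datetime

# Assumed fail2ban 0.8-style action lines (check yours), e.g.
#   2013-06-28 06:55:58,123 fail2ban.actions: WARNING [named-flood] Ban 1.2.3.4
ACTION_RE = re.compile(
    r"^(?P<ts>\d{4}-\d\d-\d\d \d\d:\d\d:\d\d),\d+ fail2ban\.actions\s*:\s*\w+\s+"
    r"\[(?P<jail>[^\]]+)\]\s+(?P<action>Ban|Unban)\s+(?P<ip>\S+)"
)


def reban_gaps(lines, jail="named-flood"):
    """Seconds between each Unban of an IP and that IP's next Ban in `jail`."""
    last_unban = {}
    gaps = []
    for line in lines:
        m = ACTION_RE.match(line)
        if not m or m.group("jail") != jail:
            continue  # not a Ban/Unban line for this jail
        when = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
        ip = m.group("ip")
        if m.group("action") == "Unban":
            last_unban[ip] = when
        elif ip in last_unban:
            gaps.append((when - last_unban.pop(ip)).total_seconds())
    return gaps


def share_within(gaps, seconds):
    """Fraction of re-bans that came `seconds` or less after the unban."""
    if not gaps:
        return 0.0
    return sum(1 for g in gaps if g <= seconds) / len(gaps)
```

Usage: with open("/var/log/fail2ban.log") as fh: print(share_within(reban_gaps(fh), 1)). Timestamps are read to the second, so “one second later” shows up as a gap of 1.0.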

The “way up side”: absolutely no more “passed upstream requests” for 1rip from us to our upstreams.  No more “our side” delays waiting for responses or timeouts from upstream.  Last night clients-per-query reduced to 99.

A couple of other things need to be done (or could be): jail a couple more of the offending requests (no need to spoof DNS for DDoS.asia, with only about 1000 attempts in the last 24 hours).

The simplest solution is to copy/paste/edit the fail2ban filter file:
# vim /etc/fail2ban/filter.d/named-flood.conf
(yy to yank the failregex line, p to paste a copy, change the badguy name, and indent the copy: fail2ban only reads indented lines as continuations of failregex)
[Definition]
failregex = .* named\[.*\]: client <HOST>\#.*: query: 1rip\.com IN *
            .* named\[.*\]: client <HOST>\#.*: query: DDoS\.asia IN *
ignoreregex =
:wq

# /etc/init.d/fail2ban restart -or- # fail2ban-client restart

There are ways to ignore case and to macro-expand the regexes or combine them with alternation, (this|that), but I’m not vastly interested in that right now. There are really not that many different variations coming in (absolutely zero ddos.asia; all DDoS.asia) [ grep -c 'DDoS.asia' /var/log/messages = 1070 and grep -i -c 'DDoS.asia' /var/log/messages = 1070 ]. If you want to “get all fancy”, see fail2ban’s apache-badbots.conf file.
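For the curious, here is what alternation and case-insensitivity buy you, shown in plain Python regex (which is what fail2ban interprets). One pattern with an alternation group matches the same lines as one pattern per bad name, and re.IGNORECASE would also catch a ddos.asia if one ever showed up. The HOST group is a simplified IPv4-only stand-in for fail2ban’s <HOST> (our assumption), and matched_hosts is a hypothetical helper. Whether an inline case flag such as (?i) behaves inside your fail2ban version’s failregex is something to check with fail2ban-regex before trusting it.

```python
import re

# Simplified IPv4-only stand-in for fail2ban's <HOST> tag (assumption).
HOST = r"(?P<host>[\d.]+)"
PREFIX = rf"named\[\d+\]: client {HOST}#\d+: query: "

# One failregex line per bad name (what the filter file above does) ...
SEPARATE = [PREFIX + r"1rip\.com IN ", PREFIX + r"DDoS\.asia IN "]
# ... versus one line with an alternation group.
COMBINED = [PREFIX + r"(?:1rip\.com|DDoS\.asia) IN "]


def matched_hosts(patterns, lines, ignore_case=False):
    """Hosts from lines matching any pattern (first match wins per line)."""
    flags = re.IGNORECASE if ignore_case else 0
    compiled = [re.compile(p, flags) for p in patterns]
    hosts = []
    for line in lines:
        for rx in compiled:
            m = rx.search(line)
            if m:
                hosts.append(m.group("host"))
                break
    return hosts
```

Given the grep counts above (1070 either way), case-insensitivity buys nothing today; the alternation is the part that keeps the filter file short.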

– – – – – 6.28.13 Late Update – – – – –

A 2nd one appears today (been in the logs a couple of (hundred thousand) times): isc.org

Found: http://my.opera.com/jlouisbiz/blog/2013/05/14/blocking-amplified-dns-attack-by-the-ip-address-and-using-linux-firewall-softwar

# vim /etc/fail2ban/filter.d/named-flood.conf
(yy to yank the failregex line, p to paste a copy, change the badguy name, and indent the copy)
[Definition]
failregex = .* named\[.*\]: client <HOST>\#.*: query: 1rip\.com IN *
            .* named\[.*\]: client <HOST>\#.*: query: DDoS\.asia IN *
            .* named\[.*\]: client <HOST>\#.*: query: isc\.org IN *
ignoreregex =
:wq

# /etc/init.d/fail2ban restart -or- # fail2ban-client restart

Stretched the ban time out to 20 minutes.

July 1, 2013: Some notes about how this has continued:
– it has only gotten worse (another post coming)
– had ‘timing’ issues with fail2ban (blahblah already banned 1000s of times)
– stretched the jail time out to 2 hours (sorry infected-good-guys id’d as zombie-bad-guys)
– got errors trying to use apache-bad-bots style ‘expansion’ so ended up with:

/etc/fail2ban/jail.conf

# DLW 6.27.13 - should use .local file? in a hurry to stop flood attack, sync/ddos
[named-flood]
enabled = true
filter  = named-flood
logpath = /var/log/messages
action  = iptables-allports[name=dnsflood]
bantime = 7200  ; start with 10 mins, bumped to 20, bumped to 2 hours 6.30
findtime = 2
maxretry = 1

The findtime of “2” stopped fail2ban from tripping over its own feet (stuck in an “…already banned” loop).

/etc/fail2ban/filter.d/named-flood.conf

[Definition]
failregex = .* named\[.*\]: client <HOST>\#.*: query: (1rip\.com|isc\.org) IN *
ignoreregex =

B-b-b-buh-but wait, there’s more! fail2ban’s default install and documentation have another ‘glitch’:
the logs don’t rotate cleanly because the postrotate script has the wrong path to the right file.

[root@servername~]# /usr/local/bin/fail2ban-client set logtarget /var/log/fail2ban.log
-bash: /usr/local/bin/fail2ban-client: No such file or directory
[root@servername~]# /usr/bin/fail2ban-client set logtarget /var/log/fail2ban.log
Current logging target is:
`- /var/log/fail2ban.log
[root@servername ~]# tail -f /var/log/fail2ban.log
2013-07-01 23:19:28,262 fail2ban.server : INFO
------------>   Changed logging target to /var/log/fail2ban.log for Fail2ban v0.8.8

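Notice the absence of /local/ in the path. Because the shipped postrotate line ends in 2>/dev/null || true, the “No such file or directory” error was silently thrown away, which is why it went unnoticed. One more tidbit fixed.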
/etc/logrotate.d/fail2ban

/var/log/fail2ban.log {
    missingok
    notifempty
    size 500k
    create 0600 root root
    postrotate
      /usr/bin/fail2ban-client set logtarget /var/log/fail2ban.log 2>/dev/null || true
    endscript
}

/etc/logrotate.conf holds the defaults for rotate and the other settings.
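The /usr/local/bin vs /usr/bin mix-up is easy to avoid by asking the machine where fail2ban-client actually lives before writing the postrotate line (from a shell, which fail2ban-client does the same job). A small sketch using Python’s standard shutil.which; find_fail2ban_client and postrotate_line are hypothetical helper names, and the command string simply mirrors the postrotate line above.

```python
import shutil


def find_fail2ban_client(search_path=None):
    """Full path of the fail2ban-client executable, or None if not found.

    search_path: optional os.pathsep-separated list of directories;
    None means "use $PATH", just like `which fail2ban-client`.
    """
    return shutil.which("fail2ban-client", path=search_path)


def postrotate_line(client_path, logfile="/var/log/fail2ban.log"):
    """The logrotate postrotate command, built from a path you verified."""
    if not client_path:
        raise ValueError("fail2ban-client not found; fix the path first")
    return f"{client_path} set logtarget {logfile} 2>/dev/null || true"
```

Usage: print(postrotate_line(find_fail2ban_client())) on the server. Refusing to build the line when the client is missing is deliberate: the 2>/dev/null || true in the real line would otherwise hide the mistake again.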

!!! Note: isc.org is the real deal, not some left-over bad-bot junk. They make bind (the DNS name server) and all kinds of good stuff so that we can have these internets without typing IP addresses for everything. Because they are ‘real’ (unlike 1rip.com), we did not attempt to ‘spoof’ their zone. The floods were amazing; now we’re passing upstream only the few thousand requests that sneak past fail2ban. But, sorry bad-zombie-bots, we don’t serve your kind here (aka we don’t run a free-for-all DNS server; you zombies will have to flood someone else).

There is still another post coming about this madness, but it’s a different whopper altogether.

 

3 thoughts on “Killin’ Zombie Bots DNS Style”

  1. Pingback: ZombieBots Part 2 or… | ComputerMedic (dotOrg) Web Servers

  2. Pingback: Beall's Outlet

    1. computermedicorg Post author

      Rick-a-Rack-a-No-Good WP: can’t figure out how to make it show the full original… here’s the snip:
      I absolutely love your blog and find the majority of your post’s to be exactly what I’m looking for. Do you offer guest writers to write content for yourself? I wouldn’t mind writing a post or elaborating on a number of the subjects you write about her…

      Send a sample, old-fashioned email style (support [email symbol shift+2] computermedic.org), and we’ll add it to the post or give it its own. Thanks.
