ZombieBots Part 2 or…

Sharknado!

Equally exciting, terrifying, low-budget and prone to sequels.

So bad it’s good movie lovers, click the link above and see if you can survive that whirlwind of bites.

Server admins, stay right here and get ready for DNS-Zombie-Bots Two: More Tech-Talk and .configs Than You Can Stand!  (Or, “Bored To Death!” Or, “You can have the whole seat, but you only need the edge!”)

Or, I had to document it so I can take it from server to server without trusting my memory, so I thought I would share.

It started with a ‘Hay Bay-Bay’ – or a ‘clients-per-query’ message.

Lots of tweaks, tunes, service this restart, and /etc/init.d/that restart later: ‘clients-per-query’ (increased/decreased) messages, lots of them.  (Somehow sync’d between servers; trying to figure that BIND9 magic out would be like reaching into the mouth of one of those sharknado sharks and pulling its heart out.  It is what it is: do the fixes you can do and worry about the enigmatic synchronicity later.)

Here’s the setup again so when you try these things on a server with a point-zero-zero-one version difference you’ll know why it doesn’t work:
~ CentOS x86_64 6.4 (Installed, updated [yum update] June, 2013)
~ bind / named
* # rndc status: version: 9.8.2rc1-RedHat-9.8.2-0.17.rc1.el6_4.4
* # yum list bind: bind.x86_64 32:9.8.2-0.17.rc1.el6_4.4
~ fail2ban
* # yum list fail2ban: fail2ban.noarch 0.8.8-3.el6 ( 0.8.10-1.el6 available )

Since I last wrote about it ( Killin Zombie Bots ), some seemingly minor but very important changes have gone in, mainly to the bind/named and related conf files.
/etc/resolv.conf: nameserver 127.0.0.1
~ all ‘in-server’ services should ask ‘self’ for DNS; when self doesn’t know, it “recurses” (goes upstream) and caches, so that for a time (cache TTLs and expiries) ‘self’ does know the answer.
/etc/named.conf (in the options { } block): querylog yes;
~ turn query logging on from boot (that semi-colon ‘;’ is very necessary)
/etc/fail2ban/jail.conf: ignoreip = 127.0.0.1/8
~ ‘confirmed’ (it is the default in the [default] section) – Self: don’t ban we.
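Pulled together, the three changes look roughly like this (a sketch based on the CentOS 6.4 package defaults above; your existing files will have more in them):

```
# /etc/resolv.conf: ask self first
nameserver 127.0.0.1

# /etc/named.conf: inside the options { } block, semi-colon and all
options {
    ...
    querylog yes;
};

# /etc/fail2ban/jail.conf: in the [DEFAULT] section
ignoreip = 127.0.0.1/8
```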

‘clients-per-query’, groans the almost-healed zombie server

Ask the modern zombie-to-English interpreter ( google ) what that means and the interpreter says:
About 2,440,000 results  (0.33 seconds)

Go on a vision quest and consult the shaman: add more RAM.

Don’t run down that 2.44-million-results rabbit hole.  Smack your head and think “I should have thought of that” when the light bulb turns on in there, behind the sign that reads:

The cache of a caching DNS server on a moderate-to-heavy-load server can get quite large.

That’s RAM, bay-bays, nothing else.  Don’t believe yourself?  # free.  About 125MB of 2GB left, and look lower: about 105648[k] used in Swap:.  This is on the ‘dedicated’ DNS server.  Pop-Nerd-Quiz: when does swapping happen?  Right, when ‘real memory’ (RAM) is full.
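A quick sanity check, sketched here against a made-up `free -m` snapshot (the figures are illustrative, modeled on the readings above; they are not real output from the server):

```python
# Parse a snapshot of `free -m` output; the numbers below are modeled on
# the ~125MB-free / ~105MB-in-swap readings described in the post
free_output = """\
             total       used       free     shared    buffers     cached
Mem:          2002       1877        125          0         42        310
-/+ buffers/cache:       1525        477
Swap:         4095        105       3990"""

rows = {line.split()[0].rstrip(':'): line.split()[1:]
        for line in free_output.splitlines()[1:]}
free_ram_mb = int(rows['Mem'][2])    # the 'free' column
used_swap_mb = int(rows['Swap'][1])  # the 'used' column

# Swap in use while free RAM is nearly gone: the cache has outgrown RAM
if used_swap_mb > 0 and free_ram_mb < 256:
    print('RAM-starved: add more RAM')
```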

Jump over to the we-have-it-no-matter-what-it-is-and-cheap shop of the new millennium ( eBay ) and [Buy It Now] on 8GB of the wrong RAM for your server.  Smack your head again because everything is ‘the hard way’ (like being the only seal in a Sharknado), put the wrong RAM up for sale, and [Buy It Now-Now] on 8GB of hopefully the right RAM for your server.  When it shows up (it will be right this time!), hope that 8GB is enough for a caching DNS server.

Summary: clients-per-query = add more RAM.

There is a nightmare-nado of other things you can try to tweak or tune or limit and shutdown -r now… or you can RAM up and watch all those Zombie-Language messages vanish from /var/log/.  If you are ‘flush with GB’ and nothing is in swap: it’s 2.44-million curtains for you, tough guy (in 0.33 seconds)!

Z0mb13 t4Lk (or ID-10-T bot-writer cOdEs) and fail2ban

1rip, 1Rip, 1rIP, and so on.  Case-insensitive, or ignore-case: sounds so easy.  So you try a little (?i) and a little \/\.IhateRegEx (interpreted through .py and other things, depending on revision or build number), and pretty soon you are standing in the eye of a shark-icane with your Mary Poppins umbrella, waiting for the winds to carry you to a hopefully quick and not-too-painful shark-shutdown (in the air).

Begin here (in the /etc/fail2ban/filter.d directory):
[root@server filter.d]# cp named-flood.conf named-ignoretest.conf
[root@server filter.d]# vim named-ignoretest.conf

It currently reads (sorry for the WordPress word-wrap):

failregex = .* named\[.*\]: client <HOST>\#.*: query: (1rip\.com|isc\.org|\.) (IN|ANY) *

Based on the only search result that made sense ( https://github.com/fail2ban/fail2ban/issues/48 ) and ( http://www.tutorialspoint.com/python/python_reg_expressions.htm ) [and about 100 trial-and-error failures], change it to:

 failregex = .* named\[.*\]: client <HOST>\#.*: query: ((?i)1rip|1rip\.com|isc\.org|\.) (IN|ANY) *

The important thing here (besides that this is not tested against any other versions): ((?i)[pipe-separated list]).  The ‘ignore-case’ (?i) toggle works on all of the entries in the [pipe-separated list].  Another thing: I didn’t test, and don’t care, whether the case-insensitive compare carries over to the (IN|ANY).
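A caveat of my own, not from the original post: fail2ban 0.8.x runs on Python 2, where an inline (?i) anywhere in a pattern makes the whole pattern case-insensitive (so there it does carry over to the (IN|ANY)).  Python 3.11+ rejects that mid-pattern form and wants the scoped group (?i:...), which confines the flag to the listed alternatives.  A minimal sketch of the scoped behavior:

```python
import re

# Scoped form: case-insensitivity covers every alternative inside the
# group, but not the (IN|ANY) that follows it
pattern = re.compile(r'query: (?i:1rip|1rip\.com|isc\.org|\.) (IN|ANY)')

assert pattern.search('query: 1rip IN ANY')          # plain lower case
assert pattern.search('query: 1RiP IN ANY')          # zombie-caps, still caught
assert pattern.search('query: 1rIp.CoM IN ANY')      # domain form, still caught
assert pattern.search('query: 1rip in any') is None  # (IN|ANY) stays case-sensitive
print('scoped (?i:...) behaves as advertised')
```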

Because the only ‘spoof’ in there is 1rip.com (case no longer matters), some of those isc.org queries are still getting answered, and the (space)1rip(space) queries [1rip without a domain extension] are still doing something (as yet unknown) to the cache and the upstream.  What is known: they are now successfully triggering fail2ban to shut those servers/IPs down after a couple of hits and send the rest of their millions of attempts to /dev/null.

Doing packet/byte count watches ( # iptables -n -L -v --line-numbers ) reveals that once ‘dumped’ into the ‘fail2ban filter table’, the bad zombie-bots (flooding with requests) are ‘dropping’ many hundreds of thousands of requests (packets) and GBs of data per hour.

2    3000K  864M fail2ban-dnsflood    all  -- *  *   0.0.0.0/0   0.0.0.0/0
3    1829K  792M fail2ban-maillogins  all  -- *  *   0.0.0.0/0   0.0.0.0/0

“It’s only” 72MB (about 10% by bytes), but fully 39% of all packet-traffic is being killed by this fail2ban zombie-net – on this one particular server.  Not sure how to ‘math it out’, but it is also a server-unload, because that many (1171K = 1.2-million) queries/requests are not being cache-pulled or sent upstream – on this one particular server.  (The iptables numbers above were reset 60 minutes previous.)

Slowly I turned, step by step, inch by inch
(shark by shark twisting in the wind)

I ran off on a statistics tangent and never completed the fail2ban new-regex howto.

The newest /etc/fail2ban/filter.d/named-flood.conf needs to be brought up to date:
# vim /etc/fail2ban/filter.d/named-flood.conf

[Definition]
failregex = .* named\[.*\]: client <HOST>\#.*: query: ((?i)1rip|1rip\.com|isc\.org|\.) (IN|ANY) *
ignoreregex =

:wq (write and quit)

Make a test file:
# vim /tmp/testfile.txt (press Insert when it loads as ‘New File’)

Jul 11 05:40:22 server named[1301]: client 1.1.1.1#1: query: 1rip IN ANY +E ([ip of server])
Jul 11 05:40:22 server named[1301]: client 2.2.2.2#2: query: 1rip IN ANY +E ([ip of server])
Jul 11 05:40:23 server named[1301]: client 3.3.3.3#3: query: 1Rip IN ANY +E ([ip of server])
Jul 11 05:40:23 server named[1301]: client 4.4.4.4#4: query: 1rIp IN ANY +E ([ip of server])
Jul 11 05:40:24 server named[1301]: client 5.5.5.5#5: query: 1riP IN ANY +E ([ip of server])
Jul 11 05:40:24 server named[1301]: client 6.6.6.6#6: query: 1rip IN ANY +E ([ip of server])
Jul 11 05:40:24 server named[1301]: client 7.7.7.7#7: query: 1rip.com IN ANY +E ([ip of server])
Jul 11 05:40:24 server named[1301]: client 8.8.8.8#8: query: 1rIp.com IN ANY +E ([ip of server])
Jul 11 05:40:24 server named[1301]: client 9.9.9.9#9: query: linenine.com IN ANY +E ([ip of server])

:wq (write and quit)

[root@server filter.d]# fail2ban-regex /tmp/testfile.txt named-flood.conf

Should get 8 for the “number of match”.

Compare to grep-ing (note the spaces and escaped dots inside the single quotes):
grep -c -i ' 1rip ' /tmp/testfile.txt : 6
grep -c -i ' 1rip\.com ' /tmp/testfile.txt : 2
grep -c -i ' \. ' /tmp/testfile.txt : 0
grep -c -i ' isc\.org ' /tmp/testfile.txt : 0
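The same count can be scripted in plain Python, handy when fail2ban-regex isn’t nearby.  This is a simplified stand-in: the <HOST> tag is swapped for a bare (?P&lt;host&gt;\S+) (fail2ban’s real substitution is more elaborate), and (?i:...) stands in for the Python-2-era ((?i)...) form:

```python
import re

# Simplified stand-in for the named-flood.conf failregex
failregex = re.compile(
    r'.* named\[.*\]: client (?P<host>\S+)#\d+: query: '
    r'(?i:1rip|1rip\.com|isc\.org|\.) (IN|ANY)')

testfile = """\
Jul 11 05:40:22 server named[1301]: client 1.1.1.1#1: query: 1rip IN ANY +E
Jul 11 05:40:22 server named[1301]: client 2.2.2.2#2: query: 1rip IN ANY +E
Jul 11 05:40:23 server named[1301]: client 3.3.3.3#3: query: 1Rip IN ANY +E
Jul 11 05:40:23 server named[1301]: client 4.4.4.4#4: query: 1rIp IN ANY +E
Jul 11 05:40:24 server named[1301]: client 5.5.5.5#5: query: 1riP IN ANY +E
Jul 11 05:40:24 server named[1301]: client 6.6.6.6#6: query: 1rip IN ANY +E
Jul 11 05:40:24 server named[1301]: client 7.7.7.7#7: query: 1rip.com IN ANY +E
Jul 11 05:40:24 server named[1301]: client 8.8.8.8#8: query: 1rIp.com IN ANY +E
Jul 11 05:40:24 server named[1301]: client 9.9.9.9#9: query: linenine.com IN ANY +E
"""

hosts = [m.group('host') for line in testfile.splitlines()
         if (m := failregex.search(line))]
print(len(hosts))  # 8: every line but linenine.com
```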

Test and compare against the real thing:

Make a working copy:
[root@server filter.d]# cp /var/log/messages /tmp/x.txt

[root@server filter.d]# fail2ban-regex /tmp/x.txt named-flood.conf
Takes a while on this server, then:
Success, the total number of match is 106225

Compare to grep-ing (note the spaces and escaped dots inside the single quotes):
grep -c -i ' 1rip ' /tmp/x.txt : 81034
grep -c -i ' 1rip\.com ' /tmp/x.txt : 2977
grep -c -i ' \. ' /tmp/x.txt : 15917
grep -c -i ' isc\.org ' /tmp/x.txt : 6297
-------------------- 106225, all added together
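And a two-line cross-check of that arithmetic (these counts are from this particular run; yours will differ):

```python
# grep counts from the /tmp/x.txt comparison above
grep_counts = {'1rip': 81034, '1rip.com': 2977, '.': 15917, 'isc.org': 6297}
total = sum(grep_counts.values())
print(total)  # 106225, matching fail2ban-regex's "total number of match"
```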

Sure looks like this is working; rm all those test files in /tmp/, then:

# /etc/init.d/fail2ban restart

Because of the number of sharks in this nado, you might (we had to) manually block some IPs while fail2ban gets back in the race.  Once fail2ban is all caught up and ready to go up against the whirlwind of feeding-frenzied zombie-shark-bots, manually release those and let fail2ban do its thing.

One last piece of housekeeping in this bad movie: reset the counters.
# iptables -Z

About one minute later, late one Saturday afternoon:

num   pkts   bytes    target (rest snipped)
1     2059    175K    fail2ban-dnsflood
2      352   66583    fail2ban-maillogins

A whopping 17% of packet-traffic is NOT a DNS-DDoS-Flood packet.