Bad Actor Script

This script helps you work out which IP addresses to ban from your Foswiki site. You can enforce the ban either in an external firewall (e.g. AWS VPC network ACLs) or by judicious use of fail2ban(1) and your local firewall. The script analyzes the /var/log/apache2/other_vhosts_access.log* files for IP addresses that are abusing your site by making massive numbers of requests in a short period of time. In my experience so far, the attacking addresses all share the same /16 prefix, typically out of China, Vietnam, or Singapore.

Usage is as follows:

Usage: ./find_bad_actors.sh f:i:hIT
    You must pass a -f filename, which must be an other_vhosts_access_log from apache2.
    If you pass only the -f filename, you will get back a list of /16 ip addresses and the counts
    of the times they appear in the file, sorted with the largest number of addresses at the bottom.
    If you pass a "-i ip_address", you will get back the requests made from that ip address. Make
    your own mind up about crawlers, but attacks are pretty easy to see from this.
    -h of course brings you this happy message.
    -i ip_address  -I will show you the full IP address of all the /16 ip addresses you are looking at.
    -i ip_address  -T will show you the full IP address and the log timestamp of all the /16 addresses you are looking at.
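
The option string f:i:hIT in the usage message is in getopts(1) form: -f and -i take an argument, while -h, -I and -T are plain switches. A minimal sketch of parsing such a string follows. It is not the attached script's code; the function name parse_args, the variables logfile, prefix and mode, and the mode names are assumptions made for illustration.

```shell
# Hypothetical option parser for the optstring "f:i:hIT".
# Not taken from find_bad_actors.sh; all names here are illustrative.
parse_args() {
    logfile=""
    prefix=""
    mode=""
    OPTIND=1    # reset so the function can be called more than once
    while getopts "f:i:hIT" opt; do
        case "$opt" in
            f) logfile=$OPTARG ;;      # -f filename (required)
            i) prefix=$OPTARG ;;       # -i /16 prefix, e.g. 47.242
            I) mode="addresses" ;;     # -I: full addresses with counts
            T) mode="timestamps" ;;    # -T: full addresses with timestamps
            h) mode="help" ;;          # -h: caller prints the usage text
            *) return 2 ;;             # unknown option or missing argument
        esac
    done
    if [ "$mode" = "help" ]; then
        return 0
    fi
    if [ -z "$logfile" ]; then
        echo "a -f filename is required" >&2
        return 2
    fi
    # With no -I or -T, fall back to a default based on whether -i was given.
    if [ -z "$mode" ]; then
        if [ -n "$prefix" ]; then mode="requests"; else mode="summary"; fi
    fi
}
```

Calling parse_args "$@" at the top of a script would leave the three variables set for the rest of the script to act on.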

Running the script against an other_vhosts_access.log file alone gets you a list of /16 prefixes and how many times each has hit your site:
$ find_bad_actors.sh -f other_vhosts_access.log
      1 216.244
      1 60.188
      2 114.119
      3 216.73
      3 42.236
      5 108.88
      5 223.109
    582 47.242
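
A count like the one above can be approximated with awk, sort and uniq. The sketch below is a guess at the technique, not the attached script's code. It assumes the Debian/Ubuntu vhost_combined log format, in which the client address is the second whitespace-separated field; the function name count_by_slash16 is made up for illustration.

```shell
# Hypothetical helper: count requests per /16 prefix, smallest count first.
# Assumes the vhost_combined format, where field 2 is the client IP address.
count_by_slash16() {
    # $1 = path to the access log
    awk '{ split($2, octet, "."); print octet[1] "." octet[2] }' "$1" |
        sort | uniq -c | sort -n
}
```

Something like count_by_slash16 other_vhosts_access.log would then print one line per prefix, with the busiest prefix at the bottom. IPv6 clients are not handled by this sketch.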

A run against an obvious attacker looks something like this (only the last few lines from my terminal are shown):


$ find_bad_actors.sh -f other_vhosts_access.log -i 47.242

[14/Aug/2025:13:31:19+0000]"HEAD/foswiki/view/TWiki/TWikiForms?amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=4%3Btable%3D3%3Bup%3D0&amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=3%3Btable%3D3%3Bup%3D0&amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=3%3Btable%3D3%3Bup%3D0&amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=1%3Btable%3D4%3Bup%3D0&amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=2%3Btable%3D4%3Bup%3D0&amp%3Bamp%3Bamp%3Bamp%3Bsortcol=1%3Btable%3D3%3Bup%3D0&amp%3Bamp%3Bamp%3Bsortcol=4%3Btable%3D3%3Bup%3D0&amp%3Bamp%3Bsortcol=1%3Btable%3D4%3Bup%3D0&amp%3Bsortcol=2%3Btable%3D1%3Bup%3D0HTTP/1.1"5042351"-""Mozilla/5.0(WindowsNT6.3;Win64;x64)AppleWebKit/537.36(KHTML,likeGecko)Chrome/77.0.3535.183Safari/537.36"
[14/Aug/2025:13:31:19+0000]"HEAD/foswiki/view/TWiki/TWikiForms?amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=4%3Btable%3D2%3Bup%3D0&amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=5%3Btable%3D2%3Bup%3D0&amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=1%3Btable%3D1%3Bup%3D0&amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=2%3Btable%3D1%3Bup%3D0&amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=4%3Btable%3D2%3Bup%3D0&amp%3Bamp%3Bamp%3Bamp%3Bsortcol=0%3Btable%3D2%3Bup%3D0&amp%3Bamp%3Bamp%3Bsortcol=0%3Btable%3D3%3Bup%3D0&amp%3Bamp%3Bsortcol=1%3Btable%3D4%3Bup%3D0&amp%3Bsortcol=1%3Btable%3D3%3Bup%3D0HTTP/1.1"5042350"-""Mozilla/5.0(WindowsNT6.2;Win64;x64)AppleWebKit/537.36(KHTML,likeGecko)Chrome/78.0.2089.53Safari/537.36"
[14/Aug/2025:13:31:19+0000]"HEAD/foswiki/view/TWiki/TWikiForms?amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=3%3Btable%3D2%3Bup%3D0&amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=3%3Btable%3D2%3Bup%3D0&amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=1%3Btable%3D4%3Bup%3D0&amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=0%3Btable%3D2%3Bup%3D0&amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=3%3Btable%3D1%3Bup%3D0&amp%3Bamp%3Bamp%3Bamp%3Bsortcol=3%3Btable%3D2%3Bup%3D0&amp%3Bamp%3Bamp%3Bsortcol=2%3Btable%3D3%3Bup%3D0&amp%3Bamp%3Bsortcol=0%3Btable%3D1%3Bup%3D0&amp%3Bsortcol=0%3Btable%3D4%3Bup%3D0HTTP/1.1"5042352"-""Mozilla/5.0(WindowsNT6.1;Win64;x64)AppleWebKit/537.36(KHTML,likeGecko)Chrome/61.0.3786.102Safari/537.36"
[14/Aug/2025:13:31:19+0000]"HEAD/foswiki/view/TWiki/TWikiForms?amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=4%3Btable%3D3%3Bup%3D0&amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=1%3Btable%3D4%3Bup%3D0&amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=4%3Btable%3D3%3Bup%3D0&amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=2%3Btable%3D2%3Bup%3D0&amp%3Bamp%3Bamp%3Bamp%3Bsortcol=4%3Btable%3D2%3Bup%3D0&amp%3Bamp%3Bamp%3Bsortcol=1%3Btable%3D1%3Bup%3D0&amp%3Bamp%3Bsortcol=0%3Btable%3D1%3Bup%3D0&amp%3Bsortcol=0%3Btable%3D4%3Bup%3D0HTTP/1.1"5042351"-""Mozilla/5.0(WindowsNT10.0;Win64;x64)AppleWebKit/537.36(KHTML,likeGecko)Chrome/70.0.2791.136Safari/537.36"

Note not only the weird HEAD requests, but also the ruthlessly close-together timestamps: all four arrive in the same second. That pushes the load average reported by uptime(1) into double digits, at which point the system becomes unresponsive and eventually falls over.
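
To put a number on how tightly packed the requests are, one option is to count requests per second for a given prefix. This is only a sketch under the same vhost_combined assumption (client address in field 2, timestamp in field 5); requests_per_second is a hypothetical name and the attached script does not necessarily offer this.

```shell
# Hypothetical helper: requests per second from one /16 prefix,
# busiest second last. Assumes the vhost_combined log format.
requests_per_second() {
    # $1 = path to the access log, $2 = /16 prefix such as 47.242
    awk -v prefix="$2." '
        # index(...) == 1 means the address starts with the prefix
        index($2, prefix) == 1 { print substr($5, 2) }  # drop the leading "["
    ' "$1" | sort | uniq -c | sort -n
}
```

A crawler that paces itself should show mostly ones and twos here, while a flood shows large counts against single seconds.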

But a (more) benign crawler will look more like this:

$ find_bad_actors.sh -f other_vhosts_access.log -i 216.73
[14/Aug/2025:13:11:25+0000]"GET/robots.txtHTTP/1.1"301585"-""Mozilla/5.0AppleWebKit/537.36(KHTML,likeGecko;compatible;ClaudeBot/1.0;+claudebot@anthropic.com)"
[14/Aug/2025:13:19:26+0000]"GET/robots.txtHTTP/1.1"200590"-""Mozilla/5.0AppleWebKit/537.36(KHTML,likeGecko;compatible;ClaudeBot/1.0;+claudebot@anthropic.com)"
[14/Aug/2025:13:19:57+0000]"GET/sitemap.xmlHTTP/1.1"404494"-""Mozilla/5.0AppleWebKit/537.36(KHTML,likeGecko;compatible;ClaudeBot/1.0;+claudebot@anthropic.com)"
You can get a look at the actual IP addresses, sorted by count, with the -I flag:

$ find_bad_actors.sh -f other_vhosts_access.log -i 66.249 -I
      1 66.249.66.164
      1 66.249.66.165
      1 66.249.69.172
      2 66.249.68.35
      2 66.249.68.37
      2 66.249.68.38
      2 66.249.79.168
      3 66.249.79.166
      4 66.249.79.167
      5 66.249.68.36
     19 66.249.66.192
     28 66.249.66.200
     48 66.249.66.199
    136 66.249.79.195
    151 66.249.79.194
    182 66.249.79.193
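
A rough equivalent of the -I listing, again assuming the vhost_combined format with the client address in field 2, might look like the following. The name count_full_addresses is an assumption; this is not the attached script's code.

```shell
# Hypothetical helper: count requests per full address within one /16 prefix.
count_full_addresses() {
    # $1 = path to the access log, $2 = /16 prefix such as 66.249
    # The trailing dot stops 66.249 from also matching 66.2490.x.x-style text.
    awk -v prefix="$2." 'index($2, prefix) == 1 { print $2 }' "$1" |
        sort | uniq -c | sort -n
}
```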

And you can see what the timestamps are like with the -T flag:

$ find_bad_actors.sh -f other_vhosts_access.log -i 66.249 -T

66.249.79.194 [25/Aug/2025:00:37:14
66.249.79.193 [25/Aug/2025:01:01:27
66.249.79.195 [25/Aug/2025:02:32:14
66.249.79.195 [25/Aug/2025:02:38:00
66.249.79.195 [25/Aug/2025:03:07:15
66.249.79.193 [25/Aug/2025:03:52:16
66.249.79.193 [25/Aug/2025:03:53:17
66.249.79.194 [25/Aug/2025:03:53:20
66.249.79.195 [25/Aug/2025:03:53:20
66.249.79.193 [25/Aug/2025:03:53:20
66.249.79.194 [25/Aug/2025:03:53:20
66.249.79.194 [25/Aug/2025:03:53:21
66.249.79.193 [25/Aug/2025:03:59:19
66.249.79.194 [25/Aug/2025:04:00:20
66.249.79.194 [25/Aug/2025:04:04:22
66.249.79.168 [25/Aug/2025:04:38:16
66.249.79.166 [25/Aug/2025:04:38:16
66.249.79.166 [25/Aug/2025:04:38:17
66.249.79.193 [25/Aug/2025:05:23:16
66.249.79.194 [25/Aug/2025:06:08:17

# ( and so forth )
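
Output in this shape falls out of the vhost_combined format fairly directly, since the fifth whitespace-separated field is the timestamp without its timezone. A sketch, with list_timestamps as a hypothetical name and no claim that the attached script works this way:

```shell
# Hypothetical helper: print "address [timestamp" for one /16 prefix,
# in log order. Assumes the vhost_combined log format.
list_timestamps() {
    # $1 = path to the access log, $2 = /16 prefix such as 66.249
    awk -v prefix="$2." 'index($2, prefix) == 1 { print $2, $5 }' "$1"
}
```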

You can make your own decisions about how benign an AI crawler is, but at least it's not bringing down my site.
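
Once you have decided a prefix has to go, it needs to be turned into a firewall rule. The sketch below assumes the local firewall is iptables and only prints the command so you can read it before running it as root; print_block_rule is a hypothetical helper and is not part of the attached script. It checks the shape of the prefix (two dot-separated numbers) but not that each number is 255 or less.

```shell
# Hypothetical helper: print (do not run) an iptables rule that drops
# all traffic from a /16 prefix such as 47.242.
print_block_rule() {
    case "$1" in
        # Reject anything that is not digits and dots, has more than
        # one dot, or starts or ends with a dot.
        *[!0-9.]*|*.*.*|.*|*.)
            echo "not a /16 prefix: $1" >&2
            return 1 ;;
        *.*)
            printf 'iptables -I INPUT -s %s.0.0/16 -j DROP\n' "$1" ;;
        *)
            echo "not a /16 prefix: $1" >&2
            return 1 ;;
    esac
}
```

For the attacker shown earlier, print_block_rule 47.242 prints iptables -I INPUT -s 47.242.0.0/16 -j DROP. Bear in mind that a /16 covers 65,536 addresses, so legitimate visitors in the same range are blocked too.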

The script is of course released under the GPLv3, blah blah blah.

-- CharlesShapiro 27 August 2025
Attachment: find_bad_actors.sh (2 K, 27 August 2025, CharlesShapiro). A script to analyze other_vhosts logs for IP addresses to block. Now with more options!
Topic revision: r2 - 27 August 2025, CharlesShapiro