Bad Actor Script
This is a script to figure out which IP addresses to ban from your Foswiki site. You can do the banning either in an external firewall (e.g. the AWS VPC network ACLs) or by judicious use of fail2ban(1) and your local firewall. The script analyzes the /var/log/apache2/other_vhosts_access.log* files for IP addresses that are abusing your site by making massive numbers of requests in a short period of time. So far, in my experience, the attackers in any one flood all share the same /16 prefix (the first two octets of the address), typically out of China, Vietnam, or Singapore.
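Once you have decided a /16 is hostile, the two-octet prefix has to be expanded into CIDR notation before a firewall will take it. Here is a minimal sketch, assuming iptables is your local firewall. The helper names prefix_to_cidr and print_ban_rule are made up for illustration and are not part of find_bad_actors.sh. It only prints the rule, so you can look it over before running it as root.

```shell
#!/bin/sh
# Hypothetical helpers, not part of find_bad_actors.sh.

# Turn a two-octet prefix such as 47.242 into CIDR notation (47.242.0.0/16).
prefix_to_cidr() {
    printf '%s.0.0/16\n' "$1"
}

# Print (do not run) an iptables rule that drops traffic from the /16.
# Assumes iptables; adapt the rule for nftables, ufw, or an AWS ACL entry.
print_ban_rule() {
    printf 'iptables -I INPUT -s %s -j DROP\n' "$(prefix_to_cidr "$1")"
}

print_ban_rule 47.242
```

Banning a whole /16 blocks 65,536 addresses at once, so check what else lives in that range before you commit to it.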
Usage is as follows:
Usage: ./find_bad_actors.sh f:i:hIT
You must pass a -f filename, which must be an other_vhosts_access_log from apache2.
If you pass only the -f filename, you will get back a list of /16 ip addresses and the counts
of the times they appear in the file, sorted with the largest number of addresses at the bottom.
If you pass a "-i ip_address", you will get back the requests made from that ip address. Make
your own mind up about crawlers, but attacks are pretty easy to see from this.
-h of course brings you this happy message.
-i ip_address -I will show you the full IP address of all the /16 ip addresses you are looking at.
-i ip_address -T will show you the full IP address and the log timestamp of all the /16 addresses you are looking at.
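The f:i:hIT in the usage line is a getopts option string: -f and -i take an argument, while -h, -I, and -T are plain switches. The following is only a sketch of how such a string is typically consumed. The function and variable names are my own guesses for illustration, not lifted from the real script.

```shell
#!/bin/sh
# Sketch only: how an option string like "f:i:hIT" is typically parsed.
# Names below are illustrative, not taken from find_bad_actors.sh.
parse_args() {
    logfile="" prefix="" show_ips=0 show_times=0 show_help=0
    OPTIND=1    # reset so the function can be called more than once
    while getopts "f:i:hIT" opt; do
        case "$opt" in
            f) logfile=$OPTARG ;;
            i) prefix=$OPTARG ;;
            I) show_ips=1 ;;
            T) show_times=1 ;;
            h) show_help=1 ;;
            *) return 2 ;;
        esac
    done
    # -f is mandatory unless the user only asked for help.
    if [ -z "$logfile" ] && [ "$show_help" -eq 0 ]; then
        echo "Usage: $0 f:i:hIT" >&2
        return 2
    fi
}

parse_args -f other_vhosts_access.log -i 66.249 -I
echo "file=$logfile prefix=$prefix show_ips=$show_ips show_times=$show_times"
```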
Running the script with only -f gets you a list of /16 prefixes and how many times each has hit your site:
$ find_bad_actors.sh -f other_vhosts_access.log
1 216.244
1 60.188
2 114.119
3 216.73
3 42.236
5 108.88
5 223.109
582 47.242
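A count like the one above can be approximated with a short awk pipeline. This is a sketch, not the script's actual code. It assumes the log uses Apache's vhost_combined layout, where the first field is vhost:port and the second is the client address, and it ignores IPv6. The function name count_prefixes and the sample log lines (documentation addresses) are made up.

```shell
#!/bin/sh
# Sketch: count hits per /16 prefix, smallest count first.
# Assumes vhost_combined: field 1 is vhost:port, field 2 is the client address.
count_prefixes() {
    awk '{ split($2, octet, "."); print octet[1] "." octet[2] }' "$1" |
        sort | uniq -c | sort -n
}

# Made-up sample log using documentation addresses.
sample=$(mktemp)
cat > "$sample" <<'EOF'
wiki.example.org:443 203.0.113.5 - - [14/Aug/2025:13:31:19 +0000] "HEAD /a HTTP/1.1" 504 2351 "-" "ua"
wiki.example.org:443 203.0.113.9 - - [14/Aug/2025:13:31:19 +0000] "HEAD /b HTTP/1.1" 504 2350 "-" "ua"
wiki.example.org:443 198.51.100.7 - - [14/Aug/2025:13:11:25 +0000] "GET /robots.txt HTTP/1.1" 301 585 "-" "ua"
EOF
count_prefixes "$sample"
rm -f "$sample"
```

On the sample, 198.51 is counted once and 203.0 twice, with the larger count on the last line.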
A run against an obvious attacker looks something like this (these are only the last few lines on my terminal):
$ find_bad_actors.sh -f other_vhosts_access.log -i 47.242
[14/Aug/2025:13:31:19+0000]"HEAD/foswiki/view/TWiki/TWikiForms?amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=4%3Btable%3D3%3Bup%3D0&%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=3%3Btable%3D3%3Bup%3D0&%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=3%3Btable%3D3%3Bup%3D0&%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=1%3Btable%3D4%3Bup%3D0&%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=2%3Btable%3D4%3Bup%3D0&%3Bamp%3Bamp%3Bamp%3Bsortcol=1%3Btable%3D3%3Bup%3D0&%3Bamp%3Bamp%3Bsortcol=4%3Btable%3D3%3Bup%3D0&%3Bamp%3Bsortcol=1%3Btable%3D4%3Bup%3D0&%3Bsortcol=2%3Btable%3D1%3Bup%3D0HTTP/1.1"5042351"-""Mozilla/5.0(WindowsNT6.3;Win64;x64)AppleWebKit/537.36(KHTML,likeGecko)Chrome/77.0.3535.183Safari/537.36"
[14/Aug/2025:13:31:19+0000]"HEAD/foswiki/view/TWiki/TWikiForms?amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=4%3Btable%3D2%3Bup%3D0&%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=5%3Btable%3D2%3Bup%3D0&%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=1%3Btable%3D1%3Bup%3D0&%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=2%3Btable%3D1%3Bup%3D0&%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=4%3Btable%3D2%3Bup%3D0&%3Bamp%3Bamp%3Bamp%3Bsortcol=0%3Btable%3D2%3Bup%3D0&%3Bamp%3Bamp%3Bsortcol=0%3Btable%3D3%3Bup%3D0&%3Bamp%3Bsortcol=1%3Btable%3D4%3Bup%3D0&%3Bsortcol=1%3Btable%3D3%3Bup%3D0HTTP/1.1"5042350"-""Mozilla/5.0(WindowsNT6.2;Win64;x64)AppleWebKit/537.36(KHTML,likeGecko)Chrome/78.0.2089.53Safari/537.36"
[14/Aug/2025:13:31:19+0000]"HEAD/foswiki/view/TWiki/TWikiForms?amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=3%3Btable%3D2%3Bup%3D0&%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=3%3Btable%3D2%3Bup%3D0&%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=1%3Btable%3D4%3Bup%3D0&%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=0%3Btable%3D2%3Bup%3D0&%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=3%3Btable%3D1%3Bup%3D0&%3Bamp%3Bamp%3Bamp%3Bsortcol=3%3Btable%3D2%3Bup%3D0&%3Bamp%3Bamp%3Bsortcol=2%3Btable%3D3%3Bup%3D0&%3Bamp%3Bsortcol=0%3Btable%3D1%3Bup%3D0&%3Bsortcol=0%3Btable%3D4%3Bup%3D0HTTP/1.1"5042352"-""Mozilla/5.0(WindowsNT6.1;Win64;x64)AppleWebKit/537.36(KHTML,likeGecko)Chrome/61.0.3786.102Safari/537.36"
[14/Aug/2025:13:31:19+0000]"HEAD/foswiki/view/TWiki/TWikiForms?amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=4%3Btable%3D3%3Bup%3D0&%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=1%3Btable%3D4%3Bup%3D0&%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=4%3Btable%3D3%3Bup%3D0&%3Bamp%3Bamp%3Bamp%3Bamp%3Bsortcol=2%3Btable%3D2%3Bup%3D0&%3Bamp%3Bamp%3Bamp%3Bsortcol=4%3Btable%3D2%3Bup%3D0&%3Bamp%3Bamp%3Bsortcol=1%3Btable%3D1%3Bup%3D0&%3Bamp%3Bsortcol=0%3Btable%3D1%3Bup%3D0&%3Bsortcol=0%3Btable%3D4%3Bup%3D0HTTP/1.1"5042351"-""Mozilla/5.0(WindowsNT10.0;Win64;x64)AppleWebKit/537.36(KHTML,likeGecko)Chrome/70.0.2791.136Safari/537.36"
Note not only the weird HEAD requests, but also the ruthlessly close-together timestamps. That drives the load average reported by uptime(1) up into the double-digit range, where the system becomes unresponsive and eventually falls over.
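If eyeballing the timestamps gets tedious, you can count requests per second for a prefix. This is a rough sketch under the same assumption as before (vhost_combined layout, so field 2 is the client address and field 5 is the timestamp with a leading bracket). The function name busiest_seconds and the sample lines are hypothetical.

```shell
#!/bin/sh
# Sketch: show the busiest seconds for one /16 prefix.
# Assumes vhost_combined: field 2 is the client address, field 5 is the
# timestamp with a leading "[" (the time zone lands in field 6).
busiest_seconds() {
    # $1 = log file, $2 = two-octet prefix such as 47.242
    awk -v p="$2." 'index($2, p) == 1 { print substr($5, 2) }' "$1" |
        sort | uniq -c | sort -n | tail -n 5
}

# Made-up sample log using documentation addresses.
sample=$(mktemp)
cat > "$sample" <<'EOF'
wiki.example.org:443 203.0.113.5 - - [14/Aug/2025:13:31:19 +0000] "HEAD /a HTTP/1.1" 504 2351 "-" "ua"
wiki.example.org:443 203.0.113.9 - - [14/Aug/2025:13:31:19 +0000] "HEAD /b HTTP/1.1" 504 2350 "-" "ua"
wiki.example.org:443 203.0.113.9 - - [14/Aug/2025:13:31:20 +0000] "HEAD /c HTTP/1.1" 504 2352 "-" "ua"
EOF
busiest_seconds "$sample" 203.0
rm -f "$sample"
```

The prefix is matched literally with index() rather than as a regular expression, so the dots in 47.242 cannot match other characters and 47.242 will not pick up 47.24x addresses by accident.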
But a (more) benign crawler will look more like this:
$ find_bad_actors.sh -f other_vhosts_access.log -i 216.73
[14/Aug/2025:13:11:25+0000]"GET/robots.txtHTTP/1.1"301585"-""Mozilla/5.0AppleWebKit/537.36(KHTML,likeGecko;compatible;ClaudeBot/1.0;+claudebot@anthropic.com)"
[14/Aug/2025:13:19:26+0000]"GET/robots.txtHTTP/1.1"200590"-""Mozilla/5.0AppleWebKit/537.36(KHTML,likeGecko;compatible;ClaudeBot/1.0;+claudebot@anthropic.com)"
[14/Aug/2025:13:19:57+0000]"GET/sitemap.xmlHTTP/1.1"404494"-""Mozilla/5.0AppleWebKit/537.36(KHTML,likeGecko;compatible;ClaudeBot/1.0;+claudebot@anthropic.com)"
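Crawlers that identify themselves are easier to judge if you tally the User-Agent strings for a prefix. The sketch below is not something the script does, as far as this page says. It assumes the vhost_combined layout and that the request line contains no escaped double quotes. The function name user_agents_for and the sample data are made up.

```shell
#!/bin/sh
# Sketch: tally User-Agent strings for one /16 prefix.
# Splitting a vhost_combined line on double quotes makes the User-Agent the
# 6th piece; the client address is the 2nd word of the 1st piece.
user_agents_for() {
    # $1 = log file, $2 = two-octet prefix such as 216.73
    awk -F'"' -v p="$2." '
        { split($1, head, " ") }
        index(head[2], p) == 1 { print $6 }
    ' "$1" | sort | uniq -c | sort -n
}

# Made-up sample log using documentation addresses and invented agents.
sample=$(mktemp)
cat > "$sample" <<'EOF'
wiki.example.org:443 198.51.100.7 - - [14/Aug/2025:13:11:25 +0000] "GET /robots.txt HTTP/1.1" 301 585 "-" "ExampleBot/1.0"
wiki.example.org:443 198.51.100.8 - - [14/Aug/2025:13:19:26 +0000] "GET /robots.txt HTTP/1.1" 200 590 "-" "ExampleBot/1.0"
wiki.example.org:443 203.0.113.5 - - [14/Aug/2025:13:31:19 +0000] "HEAD /a HTTP/1.1" 504 2351 "-" "OtherAgent/2.0"
EOF
user_agents_for "$sample" 198.51
rm -f "$sample"
```

A prefix that sends one consistent, self-identifying agent and asks for robots.txt first is behaving like a crawler. A prefix that rotates through dozens of browser versions in the same second is not.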
You can get a look at the actual IP addresses, sorted by count, with the -I flag:
$ find_bad_actors.sh -f other_vhosts_access.log -i 66.249 -I
1 66.249.66.164
1 66.249.66.165
1 66.249.69.172
2 66.249.68.35
2 66.249.68.37
2 66.249.68.38
2 66.249.79.168
3 66.249.79.166
4 66.249.79.167
5 66.249.68.36
19 66.249.66.192
28 66.249.66.200
48 66.249.66.199
136 66.249.79.195
151 66.249.79.194
182 66.249.79.193
And you can see what the timestamps are like with the -T flag:
$ find_bad_actors.sh -f other_vhosts_access.log -i 66.249 -T
66.249.79.194 [25/Aug/2025:00:37:14
66.249.79.193 [25/Aug/2025:01:01:27
66.249.79.195 [25/Aug/2025:02:32:14
66.249.79.195 [25/Aug/2025:02:38:00
66.249.79.195 [25/Aug/2025:03:07:15
66.249.79.193 [25/Aug/2025:03:52:16
66.249.79.193 [25/Aug/2025:03:53:17
66.249.79.194 [25/Aug/2025:03:53:20
66.249.79.195 [25/Aug/2025:03:53:20
66.249.79.193 [25/Aug/2025:03:53:20
66.249.79.194 [25/Aug/2025:03:53:20
66.249.79.194 [25/Aug/2025:03:53:21
66.249.79.193 [25/Aug/2025:03:59:19
66.249.79.194 [25/Aug/2025:04:00:20
66.249.79.194 [25/Aug/2025:04:04:22
66.249.79.168 [25/Aug/2025:04:38:16
66.249.79.166 [25/Aug/2025:04:38:16
66.249.79.166 [25/Aug/2025:04:38:17
66.249.79.193 [25/Aug/2025:05:23:16
66.249.79.194 [25/Aug/2025:06:08:17
# ( and so forth )
You can make your own decisions about how benign an AI crawler is, but at least it's not bringing down my site.
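For the curious, the -I and -T views shown earlier can be approximated in a couple of lines of awk each. This is a sketch under the assumption that the log is in the vhost_combined layout (field 2 is the client address, field 5 is the timestamp with its leading bracket). The function names addresses_in and timeline_of are illustrative and are not the script's own.

```shell
#!/bin/sh
# Sketch of the -I and -T style views; function names are illustrative only.
# Assumes vhost_combined: field 2 is the client address, field 5 is the
# timestamp with its leading "[".

# Roughly the -I view: full addresses within a prefix, counted and sorted.
addresses_in() {
    # $1 = log file, $2 = two-octet prefix such as 66.249
    awk -v p="$2." 'index($2, p) == 1 { print $2 }' "$1" |
        sort | uniq -c | sort -n
}

# Roughly the -T view: address and timestamp, in log order.
timeline_of() {
    awk -v p="$2." 'index($2, p) == 1 { print $2, $5 }' "$1"
}

# Made-up sample log using documentation addresses.
sample=$(mktemp)
cat > "$sample" <<'EOF'
wiki.example.org:443 203.0.113.5 - - [25/Aug/2025:00:37:14 +0000] "GET /a HTTP/1.1" 200 100 "-" "ua"
wiki.example.org:443 203.0.113.9 - - [25/Aug/2025:01:01:27 +0000] "GET /b HTTP/1.1" 200 100 "-" "ua"
wiki.example.org:443 203.0.113.9 - - [25/Aug/2025:02:32:14 +0000] "GET /c HTTP/1.1" 200 100 "-" "ua"
EOF
addresses_in "$sample" 203.0
timeline_of "$sample" 203.0
rm -f "$sample"
```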
The script is of course released under the GPLv3, blah blah blah.
--
CharlesShapiro 27 August 2025