Allow HetrixTools through Cloudflare's Bot Fight Mode

Hi.
I've been receiving a lot of requests from bots today and my server is becoming slow, so I'm trying to enable Bot Fight Mode in Cloudflare, but the HetrixTools bot is being blocked.

I tried adding a firewall rule to allow the bot, but CF keeps enforcing the blocking rule.
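For reference, the allow rule I tried was along these lines, with the action set to Allow (the IPs below are documentation placeholders; HetrixTools publishes the actual list of its monitoring IPs):

```
(http.user_agent contains "HetrixTools") or (ip.src in {203.0.113.10 203.0.113.11})
```

Even with this matching first in the firewall rules list, Bot Fight Mode still blocks the requests.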

Any ideas?

Comments

  • hey OG
    edited January 2021

    Check the firewall overview and look at the reason the IP was blocked.

  • Andrei Services Provider

    This is an issue with CloudFlare and the way their firewall works. We've had several such complaints before.

    It looks like CloudFlare's Bot Fight Mode will simply ignore or trump any 'allow' firewall rules... which doesn't make sense.

    I've done all I could on our part, applied for our monitoring locations to be whitelisted in their Bot Fight Mode, but haven't heard anything back from CloudFlare in over a month.

    @hey said:
    Check the firewall overview and look at the reason the IP was blocked.

    I'd also be interested in seeing this.

    Cheers.

  • Shoot CF in the face.
    Figure out what URLs the bots are calling and block them with nginx, or if it's just 404s, have fail2ban go over the error log and ban them by IP.
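    One way to sketch the nginx + fail2ban combo (paths and thresholds are examples; the nginx-404 filter is custom, it doesn't ship with fail2ban):

    ```nginx
    # nginx: deny the specific paths the bots hammer outright
    # (replace with the actual URLs you see in the access log)
    location ~ ^/(xmlrpc\.php|wp-login\.php) {
        deny all;
    }
    ```

    ```ini
    # /etc/fail2ban/filter.d/nginx-404.conf (custom filter)
    [Definition]
    failregex = ^<HOST> .* "(GET|POST|HEAD) [^"]*" 404

    # /etc/fail2ban/jail.local
    [nginx-404]
    enabled  = true
    port     = http,https
    filter   = nginx-404
    logpath  = /var/log/nginx/access.log
    maxretry = 20
    findtime = 60
    bantime  = 3600
    ```

    With these example thresholds, an IP that produces 20 404s within 60 seconds gets banned for an hour.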

  • alwyzon Hosting Provider

    If bots slow down your server, it's not always because there are too many bots crawling around, but because they hit some "expensive" resource, such as a long-running PHP script or something causing complex database queries. It could just as well be GoogleBot following some links to something you didn't consider findable; and I guess you wouldn't want to block Google, right? ;)

    Instead of using a firewall, you might consider figuring out which "expensive" route/path they hit and:

    • either rate limit access to this resource, or
    • implement some caching that reduces the load caused by this resource.

    A good place to start would be to look at the server's access log.
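    As a sketch of the caching option, an nginx FastCGI micro-cache in front of an expensive PHP route (the path, socket, and cache names here are examples):

    ```nginx
    # http context: cache responses for a short time, so repeated
    # bot hits are served from cache instead of re-running PHP.
    fastcgi_cache_path /var/cache/nginx levels=1:2 keys_zone=microcache:10m
                       max_size=100m inactive=10m;

    server {
        location ~ ^/expensive-report {
            fastcgi_cache microcache;
            fastcgi_cache_key "$scheme$request_method$host$request_uri";
            fastcgi_cache_valid 200 60s;       # even 60s absorbs a crawl burst
            fastcgi_cache_use_stale updating;  # serve stale while refreshing
            include fastcgi_params;
            fastcgi_pass unix:/run/php/php-fpm.sock;
        }
    }
    ```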


  • @alwyzon said:
    If bots slow down your server, it's not always because there are too many bots crawling around, but because they hit some "expensive" resource, such as a long-running PHP script or something causing complex database queries. It could just as well be GoogleBot following some links to something you didn't consider findable; and I guess you wouldn't want to block Google, right? ;)

    I have a page that displays a map of available bus stops: https://yoursunny.com/p/rideon-today/
    The URI contains a query argument for the date, because bus service differs on different dates.

    Google crawled thousands of pages, because each date results in a different URI.
    I initially added noindex on pages with a specified date other than "today". Google stopped indexing the excessive pages, but the crawling continued.
    Then I added rel=nofollow to the links, and Google stopped crawling the unnecessary pages.
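    Concretely, the two fixes look like this (the query parameter name is an example):

    ```html
    <!-- Fix 1: on pages for dates other than "today" -->
    <meta name="robots" content="noindex">

    <!-- Fix 2: on links pointing at other dates -->
    <a href="?date=2021-01-09" rel="nofollow">next day</a>
    ```

    noindex keeps a fetched page out of the index; rel=nofollow stops the crawler from following the link in the first place, which is what actually cut the request volume.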

    PS. I usually avoid databases, but this page uses SQLite, and it's the most complicated SQL query I've written in a decade:
    https://bitbucket.org/yoursunny/yoursunny-website/src/0616b70e484bbb736185a7debb2e3addf8153fe5/www/p/rideon-today/gtfs-db.inc.php#lines-5


  • @Andrei said:

    @hey said:
    Check the firewall overview and look at the reason the IP was blocked.

    I'd also be interested in seeing this.

    Well... the reason is Bot Fight Mode :p And the actions taken include Block, JS Challenge, and more. I tried to bypass all of them without success. @Andrei is right.

    @alwyzon said:
    If bots slow down your server, it's not always because there are too many bots crawling around, but because they hit some "expensive" resource, such as a long-running PHP script or something causing complex database queries. It could just as well be GoogleBot following some links to something you didn't consider findable; and I guess you wouldn't want to block Google, right? ;)

    Instead of using a firewall, you might consider figuring out which "expensive" route/path they hit and:

    • either rate limit access to this resource, or
    • implement some caching that reduces the load caused by this resource.

    A good place to start would be to look at the server's access log.

    I understand, but there is not much room for optimization with WordPress + plugins + multipurpose themes. These are not the good bots and they don't respect robots.txt. Caching is enabled, but they are scraping years-old archives very fast, content that LSCache hasn't cached yet. Wordfence makes everything slow, and ModSecurity throws so many false positives that the backend becomes unusable.

    In the meantime I've disabled Bot Fight Mode and the attack has stopped. I'll check some rate-limiting options for the old content.
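    A sketch of rate limiting the old archives in nginx (this assumes WordPress date-based permalinks like /2015/06/post-name/; the zone name and rates are placeholders to tune):

    ```nginx
    # http context: track clients by IP, allow 1 request/second to archives
    limit_req_zone $binary_remote_addr zone=archives:10m rate=1r/s;

    server {
        # date archives start with the year under this permalink structure
        location ~ ^/20[0-2][0-9]/ {
            limit_req zone=archives burst=10;  # queue short bursts, 503 the rest
            try_files $uri $uri/ /index.php?$args;
        }
    }
    ```

    Real visitors rarely page through old archives faster than this, so the limit mostly hits scrapers while leaving LSCache to handle the hot content.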

    Thank you!
