Thread: [How To] - Block Bad Bots from accessing your Website

  1. #1
    GlowHost-James, Master Glow Jedi (joined Apr 2012, 318 posts)

    What is a Bad Bot?
    Bad bots are the bots or spiders that do more harm than good to your website.

    One example of a bad bot is an email harvester, which scans your web page code for email addresses that can then be used to send spam. Another example is an unwanted bot that consumes too much bandwidth or drives up the load on your server, slowing it down or, at worst, taking it completely offline due to overload.

    While the worst of the "Bad Bots" will ignore your robots.txt directives completely, some bots do not intend to be bad but may still be unwanted by you. Bots that ignore your robots.txt file need to be blocked with user-agent directives in your .htaccess file, which is covered briefly at the end of this post.

    Bots that are not intentionally malicious, but sometimes cause problems anyway, can be handled in your robots.txt file. For example, if you have a site based in the USA, you may not want bots from non-English-speaking countries coming in and eating up your bandwidth or other resources. Many bots will follow your rules, and this simple guide can help you control which bots access your site.

    How to block Bad Bots
    Follow these steps to block the bad bots and spiders from accessing your website.

    Step 1:
    Open your favorite text editor and create a file called robots.txt.

    Step 2:
    Place the following code in this file.
    Code:
    # Deny all robots that we do not specifically want to allow
    User-agent: *
    Disallow: /
    
    # Allow these robots only
    User-agent: googlebot
    Allow: /
    The code above asks all bots to stay away from your website, with the exception of Google (googlebot). Bots that honor robots.txt will comply; bots that ignore it are not stopped by this file.
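
    Before uploading, the rules can be sanity-checked with a parser. The sketch below uses Python's standard urllib.robotparser module; the bot name SomeOtherBot and the path /page.html are made-up examples, and real crawlers may treat edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content from Step 2.
ROBOTS_TXT = """\
# Deny all robots that we do not specifically want to allow
User-agent: *
Disallow: /

# Allow these robots only
User-agent: googlebot
Allow: /
"""


def is_allowed(robots_txt: str, bot_name: str, path: str) -> bool:
    """Return True if bot_name may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot_name, path)


if __name__ == "__main__":
    # "SomeOtherBot" and "/page.html" are hypothetical, for illustration only.
    for bot in ("Googlebot", "SomeOtherBot"):
        print(bot, is_allowed(ROBOTS_TXT, bot, "/page.html"))
```

    This only shows how a compliant parser reads the file. It says nothing about bots that never read robots.txt at all.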

    See the end of this post for more search engines and robots that are safe to add to your robots.txt file.

    Step 3:
    Save the file and upload it to your public_html directory. You can upload it via FTP or through the cPanel file manager.

    More Good Bots to allow
    The example above only uses Googlebot. There are others that you may want to add to your robots.txt file. Here are a few.

    • Googlebot-News - Google News
    • Googlebot-Image - Google Images
    • Googlebot-Mobile - Google Mobile
    • MSNBot - Microsoft MSN
    • Teoma - Teoma Search
    • bingbot - Bing Search
    • Slurp - Yahoo! Search
    • Scooter - AltaVista Search
    • Scrubby - Scrub the Web


    You can add them into the robots.txt file in the following format:
    Code:
    User-agent: BOTNAME
    Allow: /
    Replace BOTNAME with the name of one of the bots listed above.
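
    If you maintain a longer list of allowed bots, the file can also be generated rather than typed by hand. This is a minimal sketch; the function name build_robots_txt is hypothetical and not part of any tool mentioned in this guide.

```python
def build_robots_txt(allowed_bots):
    """Build a robots.txt that denies every robot except those in allowed_bots."""
    lines = [
        "# Deny all robots that we do not specifically want to allow",
        "User-agent: *",
        "Disallow: /",
        "",
        "# Allow these robots only",
    ]
    for bot in allowed_bots:
        # One group per bot, separated by a blank line.
        lines += [f"User-agent: {bot}", "Allow: /", ""]
    # Collapse the trailing blank line into a single final newline.
    return "\n".join(lines).rstrip("\n") + "\n"


if __name__ == "__main__":
    # Print the result; save it as robots.txt before uploading.
    print(build_robots_txt(["slurp", "bingbot", "googlebot"]), end="")
```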

    So one example of a robots.txt file that bans all robots except Yahoo, Bing, and Google might look like this:

    Code:
    # Deny all robots that we do not specifically want to allow
    User-agent: *
    Disallow: /
    
    # Allow these robots only
    User-agent: slurp
    Allow: /
    
    User-agent: bingbot
    Allow: /
    
    User-agent: googlebot
    Allow: /
    If robots.txt doesn't help, you can block bots in your .htaccess file instead. First, we need to find out how to identify the bot. Check your raw access logs using the appropriate option in cPanel. The "User Agent" string in the logs is what we need. For example, the line below contains the string YandexBot:
    Code:
    HTTP/1.1" 200 927 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
    This is the string we need. To block the Yandex bot, add the following to your .htaccess file:

    Code:
    BrowserMatchNoCase YandexBot bad_bot
    Order Deny,Allow
    Deny from env=bad_bot
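    Other bots can be blocked in the same way by adding one BrowserMatchNoCase directive per bot. Note that Order and Deny are Apache 2.2-style directives; on Apache 2.4 they work only if the mod_access_compat module is loaded.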
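
    To see how often a given bot actually hits your site before blocking it, the raw access log can be scanned with a short script. The sketch below assumes Apache's "combined" log format, where the user agent is the last quoted field on each line. The sample lines are made up for illustration, and the search is a plain case-insensitive substring match, which is simpler than the regular expressions BrowserMatchNoCase accepts.

```python
import re
from collections import Counter

# Matches the last double-quoted field on a line, which is the user agent
# in Apache's "combined" log format (an assumption about your log format).
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def user_agent_of(log_line):
    """Return the user-agent string of one access-log line, or None."""
    match = USER_AGENT_RE.search(log_line)
    return match.group(1) if match else None


def count_matching(log_lines, pattern):
    """Count requests per user agent that contains pattern, ignoring case."""
    needle = re.compile(re.escape(pattern), re.IGNORECASE)
    counts = Counter()
    for line in log_lines:
        agent = user_agent_of(line)
        if agent and needle.search(agent):
            counts[agent] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical log lines, made up for illustration.
    sample = [
        '192.0.2.10 - - [05/May/2016:10:00:00 +0000] "GET / HTTP/1.1" 200 927 "-" '
        '"Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"',
        '192.0.2.20 - - [05/May/2016:10:00:05 +0000] "GET /about HTTP/1.1" 200 512 "-" '
        '"Mozilla/5.0 (Windows NT 10.0; rv:46.0) Gecko/20100101 Firefox/46.0"',
    ]
    for agent, hits in count_matching(sample, "yandexbot").items():
        print(hits, agent)
```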

    If you have any further questions, please feel free to register and post a reply in this thread.
    Last edited by Alexander; 05-05-2016 at 10:34 AM.

  2. #2
    mandi007va, Newbie (joined May 2016, 1 post)

    How to decide whether a bot is good or bad

    Sir, how do I find out whether a bot is good or bad? My server's resource usage has gone up because of some bots like archivebots and a few others. How do I decide whether a bot is good or bad?

  3. #3
    David I, Newbie (joined Jun 2010, 1,245 posts)

    Hello,

    Search for the bot's name on Google and then decide whether it is a good or a bad bot.

  4. #4
    H_T, Newbie (joined Jul 2016, 1 post)

    Is this the right order?

    Hi James,

    The robots.txt of one of my client's websites looks like this:

    Code:
    User-agent: Googlebot
    Allow: /
    
    User-agent: Slurp
    Allow: /
    
    User-Agent: msnbot
    Allow: /
    
    User-agent: *
    Disallow: /
    So, I'm not sure whether this is the right order to allow only Googlebot, Slurp, and MSNbot to crawl the whole website and to disallow the spambots.

    Also, I'd like to know: does order matter in robots.txt?

  5. #5
    Alexander, Technical Analyst (joined Jul 2007, 1,772 posts)

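    Most search engines recommend putting Allow directives first, because they follow a "first matching rule wins" approach. Google, by contrast, does not follow this rule; it picks the most specific rule, based on the length of the matching path. The order of the User-agent groups themselves should not matter to crawlers that follow the standard, since each one uses the group that most specifically matches its own name and falls back to * otherwise. In any case, you can always check your robots.txt file in Google Webmaster Tools.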
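
    The difference between the two approaches can be sketched in a few lines of Python. This is a simplified model, not any search engine's actual code: rules are (allow, path prefix) pairs for a single User-agent group, wildcards are ignored, and the /public/ path is a made-up example.

```python
def first_match(rules, path):
    """Apply the first rule whose path prefix matches ("first rule counts")."""
    for allow, prefix in rules:
        if path.startswith(prefix):
            return allow
    return True  # no rule matched: allowed by default


def longest_match(rules, path):
    """Apply the matching rule with the longest path; Allow wins a tie."""
    best = None
    for allow, prefix in rules:
        if path.startswith(prefix):
            key = (len(prefix), allow)
            if best is None or key > best:
                best = key
    return True if best is None else best[1]


# Same two rules, written in a different order.
ALLOW_FIRST = [(True, "/public/"), (False, "/")]
DISALLOW_FIRST = [(False, "/"), (True, "/public/")]

if __name__ == "__main__":
    path = "/public/page.html"  # hypothetical path
    for name, rules in (("allow first", ALLOW_FIRST),
                        ("disallow first", DISALLOW_FIRST)):
        print(name, first_match(rules, path), longest_match(rules, path))
```

    With the longest-match approach both orderings give the same answer, while a first-match parser blocks /public/ when Disallow: / comes first. That is why putting Allow lines first is the safer habit.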

    Hope this helps.
