
Apache: Rewrite to IF

#1
stryder
Currently I'm looking at a slight change in a configuration file for the Apache webserver. I'm doing this on a development system before applying it to the actual site.

One of the main reasons for this is the sparse examples and documentation on how to utilise the <If>, <ElseIf> and <Else> additions to Apache. This means I'm having to mess with the settings to see what works and what causes 500 Server Errors. As I go through I thought I would post what I've worked out and how things can slowly be shifted away from mod_rewrite, as it might be helpful to other webmasters. (I apologise to all who think this is "gobbledygook".)

A simple example:

Redirect example.com to www.example.com if the file or directory exists

Code:
<If "(-f %{REQUEST_FILENAME} || -d %{REQUEST_FILENAME}) && (%{HTTP_HOST} == 'example.com')">
Redirect permanent "/" "Http://www.example.com/"
</If>


This might not seem like much; however, my previous lax rewrites were causing redirects even when a file or directory didn't exist, which isn't really necessary (in fact it increases the amount of resources used by the server, albeit marginally).

With the above If statement, the 404 ErrorDocument will handle any requests that don't match the criteria (unless otherwise specified in other If statements).
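If you don't already have one defined, the error document itself is a single directive (the path here is just a placeholder for whatever error page your site uses):
Code:
# Hypothetical path - point this at your own error page
ErrorDocument 404 /errors/404.html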

If you intend to use it with HTTPS, then it makes sense to follow it up with:
Code:
# ElseIf: if the attempted request uses HTTP (and exists) redirect to HTTPS
<ElseIf "%{REQUEST_SCHEME} == 'http' && (-f %{REQUEST_FILENAME} || -d %{REQUEST_FILENAME})">
    Redirect permanent "/" "https://www.example.com/"
</ElseIf>

Just make sure that both redirects begin with https:// (the first example above deliberately doesn't have the s).
It just tests whether the page was accessed using http and whether the file being requested exists. (It will default to a 404 on failure, which again avoids a redirect.) This method should be compatible with sites behind Cloudflare that don't even have SSL certificates themselves.

Restricting Request methods to only GET, POST and HEAD

Code:
<If "(%{REQUEST_METHOD} != 'GET') || (%{REQUEST_METHOD} != 'POST') || (%{REQUEST_METHOD} != 'HEAD')">
Redirect 405 -
</If>

This took me a little while to work out, since I was trying to do it in a more concise manner by checking the request method against an array (to shorten the conditional).
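For what it's worth, the expression engine does have an in operator that tests against a list, so a more concise version of the same check might look something like this (an untested sketch):
Code:
# Match any request method that is NOT in the allowed list
<If "!(%{REQUEST_METHOD} in { 'GET', 'POST', 'HEAD' })">
    Redirect 405 "/"
</If>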

Other request methods can be used by some servers/services; however, I try to tighten this where possible.

An addition that isn't an If statement can be used to take it further:
Code:
<Limit PUT DELETE CONNECT OPTIONS PATCH PROPFIND PROPPATCH MKCOL COPY MOVE LOCK UNLOCK>
    Deny from all
</Limit>

I'll try to add some more as I work out the correct conditionals.

edit:
Code:
<LimitExcept GET POST HEAD>
    Require all denied
</LimitExcept>
LimitExcept is like Limit, except you list the methods you want to allow and everything else gets what's between the tags applied to it. Notice this uses Require all denied as opposed to Deny from all (since the latter is now deprecated).
#2
stryder
Don't allow POSTing with less than HTTP/1.1


Code:
<If "(%{REQUEST_METHOD} == 'POST') && (! %{THE_REQUEST} =~ m#^POST(.*)HTTP/1.1$#)">
Deny from All
</If>
This checks that the request is actually a POST and that it's not using HTTP/1.1 (this will likely change with HTTP/2.0 around the corner). Any POST attempt not using 1.1 will trigger a 403 Forbidden via the Require all denied rule.

This example makes use of regex (regular expressions) to test the raw request information.
#3
stryder
It seems I wasn't the only one looking at adding SSL to their site, so I decided to write a rather lengthy response to a question about it, based on what I've done here:

https://stackoverflow.com/a/39433261/4136214

In the article (as that's what it became) I touched on how the HSTS protocol requires the elevation to follow a particular pattern:

http://example.com needs to be 301 redirected to https://example.com before being 301 redirected yet again to https://www.example.com.
It unfortunately negates the version I was using to reduce how many redirects occur.
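For reference, the two-hop version might look something like this in <If> form (a sketch only, covering just the bare host; example.com is a placeholder and the Strict-Transport-Security header itself still has to be set separately once the site is serving HTTPS):
Code:
# Hop 1: elevate plain HTTP on the bare host to HTTPS on the same host
<If "%{HTTPS} == 'off' && %{HTTP_HOST} == 'example.com'">
    Redirect permanent "/" "https://example.com/"
</If>
# Hop 2: once on HTTPS, 301 again to the canonical www host
<ElseIf "%{HTTPS} == 'on' && %{HTTP_HOST} == 'example.com'">
    Redirect permanent "/" "https://www.example.com/"
</ElseIf>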
#4
stryder
Traffic Calming - Slowing Down Robots

As you might or might not be aware, robots (spiders, crawlers, agents) can be a regular pain in the *insert expletive here*.
While some take a site's robots.txt into consideration to identify where to crawl and how often, others can be downright abusive and eat up as much bandwidth as they can. This can lead to site instability where the server literally can't handle all the requests.

Most webmaster-related articles or examples tend to use robots.txt for the good bots and block the requests of "bad" bots; I however have been looking at a slightly different approach. While there are indeed bad bots (ones that attempt to exploit, consume resources or scrape the site for data), others are more roguish than bad. Rogue bots in this instance are ones that just need better control methods applied to them.



So I came up with a traffic calming method using SetEnvIfNoCase, RewriteCond and server environment variables:
Code:
<IfModule mod_setenvif.c>
    # Flag known rogue bots by their User-Agent so they can be throttled
    SetEnvIfNoCase User-Agent "(roguebot1|roguebot2)" throttlebot=1
</IfModule>

<IfModule mod_rewrite.c>
    # Initialise the Rewrite Engine if not already initialised
    RewriteEngine on
    RewriteBase /

    # If it's between the hours of 0 to 8, 12, or 16 to 23
    # set an environment variable of trafficcalm
    RewriteCond %{TIME_HOUR} >00
    RewriteCond %{TIME_HOUR} <08
    RewriteRule ^ - [E=trafficcalm:1]

    RewriteCond %{TIME_HOUR} =12
    RewriteRule ^ - [E=trafficcalm:1]

    RewriteCond %{TIME_HOUR} >16
    RewriteCond %{TIME_HOUR} <23
    RewriteRule ^ - [E=trafficcalm:1]

    # If trafficcalm and throttlebot are set, the request isn't for robots.txt,
    # and it's between 10 and 50 seconds of the minute,
    # respond with a 503, else set the environment variable normaltraffic
    # Throttled bots only have access for 20 seconds per minute
    RewriteCond %{ENV:trafficcalm} =1
    RewriteCond %{ENV:throttlebot} =1
    RewriteCond %{REQUEST_URI} !^/robots\.txt$ [NC]
    RewriteCond %{TIME_SEC} >10
    RewriteCond %{TIME_SEC} <50
    RewriteRule ^ - [E=throttled:1,R=503,L]

    RewriteRule ^ - [E=normaltraffic:1]
</IfModule>
What it basically does: using SetEnvIfNoCase we identify a robot (in this example by its User-Agent) and create a server environment variable of "throttlebot" (to flag that it needs throttling).

Then a rewrite is used first to work out what time of day it is in relation to a traffic pattern (in this instance we're looking to cover the high-load hours of the day); during these times a server environment variable of "trafficcalm" is set.

Then a rewrite checks for both "throttlebot" and "trafficcalm" together and that the request isn't for robots.txt (that should never be calmed); it also checks whether it's between 10 and 50 seconds into the minute. If that is the case we throttle the bot with a 503 Service Unavailable, otherwise a server environment variable of "normaltraffic" is set.

Additionally, sending a Retry-After header with the value of 43 on the 503 response and adding a Crawl-delay of "43" to robots.txt means that should a robot retry after 43 seconds it will eventually cycle around to accessing the site during the "normaltraffic" window.
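The Retry-After part can be handled with mod_headers, keyed off the "throttled" variable the rewrite sets above (a sketch, assuming mod_headers is loaded; the Crawl-delay line belongs in robots.txt rather than the Apache config):
Code:
<IfModule mod_headers.c>
    # Tell throttled bots when to come back (43 seconds)
    Header always set Retry-After "43" env=throttled
</IfModule>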

The reason to calm rather than block is that the robot can still spider the site, so it won't affect SEO as much. The robot also isn't blocked for hours, as that would cause problems; instead an intermittent block on a per-minute basis has a calming effect. Most good robots (which aren't affected by this measure) are able to retry crawling after a duration of time, provided the resource they are looking for isn't missing for hours.