Apache: Rewrite to IF - Printable Version
Apache: Rewrite to IF - stryder - Sep 7, 2016

Currently I'm looking at a slight change in a configuration file for the Apache webserver. I'm doing this on a development system before applying it to this actual site. One of the main reasons for this is the sparse examples and documentation on how to utilise the <If>, <ElseIf> and <Else> additions to Apache. This means I'm having to experiment with the settings to see what works and what causes 500 Server Errors. As I go, I thought I would post what I've worked out and how things can slowly be shifted away from mod_rewrite, as it might be helpful to other webmasters. (I apologise to all that think this is "gobbledygook".)

A simple example: redirect example.com to www.example.com if the file or directory exists.

Code:
<If "(-f %{REQUEST_FILENAME} || -d %{REQUEST_FILENAME}) && (%{HTTP_HOST} == 'example.com')">

This might not seem like much, however previously my lax rewrites were causing redirects even when a file or directory didn't exist, which isn't really necessary (in fact it increases the amount of resources used by the server, albeit marginally). With the above If statement, the 404 ErrorDocument will handle any requests that don't match the criteria (unless otherwise specified in other If statements).

If you intend to use it with HTTPS, then it makes sense to follow it up with:

Code:
#ElseIf: the attempted request uses HTTP (and exists), redirect to HTTPS

Just make sure that both redirects begin https:// (the first example doesn't have an s in it). It just tests whether the page was accessed using HTTP and whether the file being requested exists. (It will default to a 404 on failure, which again avoids redirects.) A completed sketch of this If/ElseIf pair appears further down. This method should be compatible with sites using Cloudflare that don't even have SSL certificates on the server themselves.

Restricting request methods to only GET, POST and HEAD:

Code:
<If "(%{REQUEST_METHOD} != 'GET') && (%{REQUEST_METHOD} != 'POST') && (%{REQUEST_METHOD} != 'HEAD')">

(Note the tests have to be joined with &&, not ||; since any method differs from at least two of the three, an || version would match every request.) This took me a little while to work out, since I was trying to do it in a more concise manner by checking the request method against an array (to shorten the conditional). Other request methods can be used by some servers/services, however I try to tighten it where possible.

An addition that isn't an If statement might be capable of taking it further:

Code:
<Limit PUT DELETE CONNECT OPTIONS PATCH PROPFIND PROPPATCH MKCOL COPY MOVE LOCK UNLOCK>

I'll try to add some more as I work out the correct conditionals.

edit:

Code:
<LimitExcept GET POST HEAD>

RE: Apache: Rewrite to IF - stryder - Sep 8, 2016

Don't allow POSTing with less than HTTP/1.1:

Code:
<If "(%{REQUEST_METHOD} == 'POST') && (! %{THE_REQUEST} =~ m#^POST(.*)HTTP/1.1$#)">

This example makes use of regex (regular expressions) to test the raw request line.

RE: Apache: Rewrite to IF - stryder - Sep 11, 2016

It seems I wasn't the only one looking at SSL inclusion for their site, so I decided to write a rather lengthy response to a question about it, thanks to what I've done here: https://stackoverflow.com/a/39433261/4136214

In the article (as that's what it became) I touched on how the HSTS protocol requires elevation to follow a particular pattern: http://example.com needs to be elevated through a 301 redirect to https://example.com before being 301-redirected yet again to https://www.example.com. It unfortunately negates the version I was using to reduce how many redirections occur.
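To make the first pair of examples above concrete, here is a minimal completed sketch. The posts only show the opening conditions, so the Redirect lines and the ElseIf expression are assumptions (mod_alias, with www.example.com standing in for your own host):

Code:
<If "(-f %{REQUEST_FILENAME} || -d %{REQUEST_FILENAME}) && (%{HTTP_HOST} == 'example.com')">
    # Bare domain and the target exists: send the client to the www host
    Redirect permanent "/" "https://www.example.com/"
</If>
<ElseIf "%{HTTPS} == 'off' && (-f %{REQUEST_FILENAME} || -d %{REQUEST_FILENAME})">
    # Right host but plain HTTP, and the target exists: elevate to HTTPS
    Redirect permanent "/" "https://www.example.com/"
</ElseIf>

Anything that matches neither branch falls through to the normal handler (and the 404 ErrorDocument on a miss), which is the point of the technique.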
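For the method restriction, the post doesn't show the body of the block; a sketch assuming the request is simply denied with a 403 (the <LimitExcept> form from the edit enforces the same policy without an expression):

Code:
<If "(%{REQUEST_METHOD} != 'GET') && (%{REQUEST_METHOD} != 'POST') && (%{REQUEST_METHOD} != 'HEAD')">
    # Any method other than GET, POST or HEAD is refused
    Require all denied
</If>

# Equivalent, per the edit above:
<LimitExcept GET POST HEAD>
    Require all denied
</LimitExcept>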
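The HTTP/1.1 POST check can be written slightly more simply with the expression parser's !~ (does not match) operator; again the Require all denied body is an assumption, as the post doesn't show what happens on a match:

Code:
<If "%{REQUEST_METHOD} == 'POST' && %{THE_REQUEST} !~ m#^POST(.*)HTTP/1\.1$#">
    # A POST that didn't arrive as HTTP/1.1: refuse it
    Require all denied
</If>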
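And the HSTS-compliant chain from the Sep 11 post, sketched as If/ElseIf branches with the usual example.com placeholders. Each hop is a separate 301: first HTTP is elevated to HTTPS on the same host, then the bare HTTPS domain is moved to www:

Code:
<If "%{HTTPS} == 'off' && %{HTTP_HOST} == 'example.com'">
    # Step 1a: elevate the bare domain to HTTPS on the same host
    Redirect permanent "/" "https://example.com/"
</If>
<ElseIf "%{HTTPS} == 'off' && %{HTTP_HOST} == 'www.example.com'">
    # Step 1b: elevate the www host to HTTPS on the same host
    Redirect permanent "/" "https://www.example.com/"
</ElseIf>
<ElseIf "%{HTTP_HOST} == 'example.com'">
    # Step 2: already on HTTPS, move the bare domain to www
    Redirect permanent "/" "https://www.example.com/"
</ElseIf>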
RE: Apache: Rewrite to IF - stryder - Nov 18, 2017

Traffic Calming - Slowing Down Robots

As you might or might not be aware, robots (spiders, crawlers, agents) can be a regular pain in the *insert expletive here*. While some attempt to take a site's robots.txt into consideration to identify where to crawl and how often, others can be downright abusive and eat up as much bandwidth as they can. This can lead to site instability where the server literally can't handle all the requests.

Most webmaster-related articles and examples tend to use the robots.txt for good bots and block the requests of "bad" bots. I, however, have been looking at a slightly different approach. While indeed there are bad bots (ones that attempt to exploit, consume resources or scrape the site for data), others are actually more roguish than bad. Rogue bots, in that instance, are ones that just need better control methods applied to them. So I came up with a traffic calming method using SetEnvIfNoCase, RewriteCond and server environment strings (a sketch of the configuration appears at the end of this post).

First, a rewrite works out what time of day it is in relation to a traffic pattern (in this instance we're looking to cover the high-load hours in the day); during these times a server environment variable of "trafficcalm" is set. Then a rewrite checks for both "throttlebot" and "trafficcalm" together, that the request isn't for robots.txt (that should never be calmed), and that it falls within 10 and 50 seconds past the minute. If that is the case, we throttle the bot with a 503 Service Unavailable; otherwise a server environment variable of "normaltraffic" is set.

Additionally, having a Retry-After header with the value of 43 on the 503 response and adding a Crawl-delay with the value of "43" to robots.txt means that should a robot retry after 43 seconds, it will eventually cycle around to accessing during the "normaltraffic" window.

The reason to calm rather than block is that it still means a robot can spider the site, so it won't affect SEO as much. The robot can't be blocked for hours either, as that would cause problems, so an intermittent block on a per-minute basis has a calming effect. Most good robots (which aren't affected by this measure) tend to be able to retry crawling after a duration of time, provided the resource they are looking for isn't missing for hours.
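The code block for this post did not survive, so the following is a reconstructed sketch of the approach described above, assuming Apache 2.4 with mod_setenvif, mod_rewrite and mod_headers. The user-agent names, the 09:00-17:59 high-load window and the environment variable names are illustrative assumptions, not stryder's actual values:

Code:
RewriteEngine On

# Tag rogue (not bad) crawlers by user agent - these names are placeholders
SetEnvIfNoCase User-Agent "(ExampleBot|OtherCrawler)" throttlebot

# During the assumed high-load hours (09:00-17:59) flag the request
RewriteCond %{TIME_HOUR} ^(09|1[0-7])$
RewriteRule ^ - [E=trafficcalm:1]

# Tagged bot + calm window + not robots.txt + 10-50 seconds past the minute:
# throttle it with a 503
RewriteCond %{ENV:throttlebot} =1
RewriteCond %{ENV:trafficcalm} =1
RewriteCond %{REQUEST_URI} !^/robots\.txt$
RewriteCond %{TIME_SEC} -ge 10
RewriteCond %{TIME_SEC} -le 50
RewriteRule ^ - [R=503,L]

# Anything else from a tagged bot counts as normal traffic
RewriteCond %{ENV:throttlebot} =1
RewriteRule ^ - [E=normaltraffic:1]

# Ask the calmed bot to retry in 43 seconds (needs 2.4.10+ for the
# expression condition); the matching Crawl-delay: 43 line lives in robots.txt
Header always set Retry-After "43" "expr=%{REQUEST_STATUS} == 503"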