I am aware of the canonical question and have read it, yet I cannot find the answers to a couple of things there.

Here are my conditions and rules to drop www and force https:

RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
RewriteRule ^(.*)$ https://%1/$1 [R=301,L,NE]

RewriteCond %{HTTPS} off
RewriteCond %{HTTP:X-Forwarded-Proto} !https
RewriteRule .* https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L,NE]

I understand what I am trying to match. However, the substitution strings are a bit unclear to me. What I don't understand is:

  1. How did my hostname (without www.) end up in %1?
  2. Why isn't the query string lost when the second rule is applied?

The reason behind the second question is that the manual explicitly states (highlighted by me):

REQUEST_URI

The path component of the requested URI, such as "/index.html". This notably excludes the query string which is available as its own variable named QUERY_STRING.

Džuris

1 Answer

I assume these directives are working OK and you are just after an explanation as to why?

  1. How did my hostname (without www.) end up in %1?

%1 is a backreference to the first captured group in the last matched CondPattern. So, given the following condition:

RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]

The regex (i.e. the CondPattern) ^www\.(.*)$ is matched against the HTTP_HOST server variable. The match succeeds when HTTP_HOST is www. followed by anything. That "anything" is inside a capturing group (a parenthesised subpattern), i.e. (.*) rather than plain .*. Whatever the (.*) group matches is saved in the %1 backreference and can be used later in the RewriteRule substitution. For example, given a request for www.example.com/something, the condition effectively becomes:

RewriteCond www.example.com ^www\.(.*)$ [NC]

%1 will therefore contain example.com.
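To make the two kinds of backreference concrete, here is the first rule from the question again with comments added. This is only an annotated sketch: the request URL in the comments is illustrative, and it assumes the rule lives in a .htaccess file (per-directory context).

```apache
# Illustrative request: http://www.example.com/something?a=1

# %1 = "example.com"
# (first capturing group of the last matched RewriteCond pattern)
RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]

# $1 = "something"
# (first capturing group of the RewriteRule pattern; in .htaccess the
# leading slash is stripped from the URL-path before matching)
RewriteRule ^(.*)$ https://%1/$1 [R=301,L,NE]

# Resulting redirect: https://example.com/something?a=1
```

Note that in a server or virtual host context the matched URL-path starts with a slash, so $1 would be /something and the substitution above would produce a double slash.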

  2. Why isn't the query string lost when the second rule is applied?

Because if you don't explicitly include a query string in the RewriteRule substitution, the query string from the request is automatically appended to the end of the resulting substitution.

However, if you include a query string on the end of the substitution, even just an empty one (a ? followed by nothing), then the query string from the request is not appended. For example:

RewriteRule .* https://%{HTTP_HOST}%{REQUEST_URI}? [R=301,L,NE]

This will result in the query string being stripped from the request (note the trailing ?). Alternatively, on Apache 2.4+ you can use the QSD (Query String Discard) flag to prevent the query string being appended.
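For comparison, the QSD variant of the second rule might look like the following. It is a sketch that reuses the conditions from the question unchanged and assumes Apache 2.4 or later, since the QSD flag is not available in 2.2.

```apache
# Apache 2.4+ only: the QSD flag discards the original query string,
# with the same effect as ending the substitution with "?".
RewriteCond %{HTTPS} off
RewriteCond %{HTTP:X-Forwarded-Proto} !https
RewriteRule .* https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L,NE,QSD]
```

Without QSD (and without a trailing ?), the same rule passes the original query string through to the redirect target, which is the behaviour asked about.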

Aside: I also removed the parentheses from the RewriteRule pattern. You don't need a captured group here, since you are using the REQUEST_URI server variable instead. (The captured URL-path would otherwise be available in the $1 backreference; note the $ prefix. Storing backreferences you don't need is just a waste of resources and hampers readability.)

RewriteCond %{HTTP:X-Forwarded-Proto} !https

I assume your server is behind a proxy server that is setting the X-Forwarded-Proto header?

MrWhite