rewrite and map - interfering regexps

Norman Gray

Greetings.

I'm trying to do some fairly intricate URI rewriting, and the behaviour
of the 'rewrite' statement does not correspond to anything I can explain
from the docs.

The goal is that /A/foo/modified is rewritten to /_newroot/foo and
/B/foo/modified to /_defaultroot/foo.  I hope to achieve this with

     map $uri $modroot {
         default _defaultroot;
         ~^/A _newroot;
     }

and

     location ~ /modified$ {
         rewrite ^(.+)/modified$ /$modroot/-$1-;
     }

(in the real config, /_newroot is reverse-proxied to a web service, so
that the URIs it handles end up grafted on to selected trees; the 'map'
is intended to select/limit which URIs are passed on to this service).

The complete nginx.conf is at the bottom.

Looking in the error log, when I retrieve /hello/modified I find

2020/04/26 12:23:28 [notice] 63328#0: *5 "^(.+)/modified$" matches
"/hello/modified", client: 127.0.0.1, server: localhost, request: "GET
/hello/modified HTTP/1.1", host: "localhost"
2020/04/26 12:23:28 [notice] 63328#0: *5 rewritten data:
"/_defaultroot/-/hello-", args: "", client: 127.0.0.1, server:
localhost, request: "GET /hello/modified HTTP/1.1", host: "localhost"

...which is fine: the map defines $modroot as _defaultroot, and the
rewrite captures the /hello.

But retrieving /A/hello/modified,

2020/04/26 13:00:51 [notice] 63828#0: *6 "^(.+)/modified$" matches
"/A/hello/modified", client: 127.0.0.1, server: localhost, request: "GET
/A/hello/modified HTTP/1.1", host: "localhost"
2020/04/26 13:00:51 [notice] 63828#0: *6 rewritten data: "/_newroot/--",
args: "", client: 127.0.0.1, server: localhost, request: "GET
/A/hello/modified HTTP/1.1", host: "localhost"

I would expect this to be rewritten to /_newroot/-/A/hello-

Here, the map has defined $modroot as _newroot (which is correct).  The
'rewrite' _has_ matched, but the $1 in that line appears to be empty.
Note the '+' in the regexp: there is supposed to be a string of non-zero
length in there (i.e., this rules out that I'm inadvertently matching,
and replacing, an empty string, as a result of being somehow confused
about where in the string '^' is matching).

It's as if the regexp match in the 'map' is somehow interfering with the
group-capturing in the 'rewrite'.

As a workaround, I can get this to work with ^(?<newprefix>/A) in the
'map', and using $newprefix in the 'rewrite', but that's fiddly/ugly and
more confusing than localising the rewriting to the 'rewrite' statement.

Am I misunderstanding how 'rewrite' matches things, or is there an issue
here?

Best wishes,

Norman






% ../sbin/nginx -V
nginx version: nginx/1.18.0
built by clang 11.0.3 (clang-1103.0.32.29)
configure arguments: --prefix=/Data/tools/nginx-1.18
--with-pcre=../pcre-8.44

Both nginx and pcre built, as shown, from source.

This is on macOS 10.15.3, but I get the same results with (packaged)
nginx/1.16.1 on FreeBSD 12.1


Complete nginx.conf:

worker_processes  1;

events {
     worker_connections  1024;
}


http {
     include       mime.types;
     default_type  application/octet-stream;

     error_log logs/error.log debug;
     rewrite_log on;

     sendfile        on;
     keepalive_timeout  65;

     # selected URIs are dynamically 'rehomed'
     map $uri $modroot {
         default _defaultroot;
         ~^/A _newroot;
     }

     server {
         listen       80;
         server_name  localhost;

         location / {
             root   html;
             index  index.html index.htm;
         }
         location ~ /modified$ {
             rewrite ^(.+)/modified$ /$modroot/-$1-;
         }
         location /_defaultroot { # not a 'rehomed' one
             internal;
             error_page 404 /404-private.html;
         }
         location /_newroot {
             internal;
             root html/x;
         }

         error_page   500 502 503 504  /50x.html;
         location = /50x.html {
             root   html;
         }
     }
}


--
Norman Gray  :  https://nxg.me.uk
_______________________________________________
nginx mailing list
[hidden email]
http://mailman.nginx.org/mailman/listinfo/nginx

Re: rewrite and map - interfering regexps

Maxim Dounin
Hello!

On Sun, Apr 26, 2020 at 01:49:19PM +0100, Norman Gray wrote:

> Greetings.
>
> I'm trying to do some fairly intricate URI rewriting, and the behaviour of
> the 'rewrite' statement does not correspond to anything I can explain from
> the docs.
>
> The goal is that /A/foo/modified is rewritten to /_newroot/foo and
> /B/foo/modified to /_defaultroot/foo.  I hope to achieve this with
>
>     map $uri $modroot {
>         default _defaultroot;
>         ~^/A _newroot;
>     }
>
> and
>
>     location ~ /modified$ {
>         rewrite ^(.+)/modified$ /$modroot/-$1-;
>     }
>
> (in the real config, /_newroot is reverse-proxied to a web service, so that
> the URIs it handles end up grafted on to selected trees; the 'map' is
> intended to select/limit which URIs are passed on to this service).
>
> The complete nginx.conf is at the bottom.
>
> Looking in the error log, when I retrieve /hello/modified I find
>
> 2020/04/26 12:23:28 [notice] 63328#0: *5 "^(.+)/modified$" matches
> "/hello/modified", client: 127.0.0.1, server: localhost, request: "GET
> /hello/modified HTTP/1.1", host: "localhost"
> 2020/04/26 12:23:28 [notice] 63328#0: *5 rewritten data:
> "/_defaultroot/-/hello-", args: "", client: 127.0.0.1, server: localhost,
> request: "GET /hello/modified HTTP/1.1", host: "localhost"
>
> ...which is fine: the map defines $modroot as /_defaultroot, and the rewrite
> captures the /hello.
>
> But retrieving /A/hello/modified,
>
> 2020/04/26 13:00:51 [notice] 63828#0: *6 "^(.+)/modified$" matches
> "/A/hello/modified", client: 127.0.0.1, server: localhost, request: "GET
> /A/hello/modified HTTP/1.1", host: "localhost"
> 2020/04/26 13:00:51 [notice] 63828#0: *6 rewritten data: "/_newroot/--",
> args: "", client: 127.0.0.1, server: localhost, request: "GET
> /A/hello/modified HTTP/1.1", host: "localhost"
>
> I would expect this to be rewritten to /_newroot/-/A/hello-
>
> Here, the map has defined $modroot as _newroot (which is correct).  The
> 'rewrite' _has_ matched, but the $1 in that line appears to be empty.  Note
> the '+' in the regexp: there is supposed to be a string of non-zero length
> in there (i.e., this rules out that I'm inadvertently matching, and
> replacing, an empty string, as a result of being somehow confused about
> where in the string '^' is matching).
>
> It's as if the regexp match in the 'map' is somehow interfering with the
> group-capturing in the 'rewrite'.
>
> As a workaround, I can get this to work with ^(?<newprefix>/A) in the 'map',
> and using $newprefix in the 'rewrite', but that's fiddly/ugly and more
> confusing than localising the rewriting to the 'rewrite' statement.
>
> Am I misunderstanding how 'rewrite' matches things, or is there an issue
> here?

The issue is that $1..$N variables as used by the second argument
of the rewrite directive are from the last regular expression
matched.  And the last regular expression is not the one from the
first argument of the rewrite directive when using a variable
from map with regular expressions.  Relevant ticket is here:

https://trac.nginx.org/nginx/ticket/564
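
As a reading of the explanation above (a sketch, not a statement of
nginx internals), the order of events for /A/hello/modified in the
configuration from the original post can be annotated like this.  The
numbered comments are illustrative additions:

     map $uri $modroot {
         default _defaultroot;
         ~^/A    _newroot;   # (2) runs when $modroot is evaluated
     }

     location ~ /modified$ {
         # (1) ^(.+)/modified$ matches; at this point $1 would be /A/hello
         # (2) building the replacement evaluates $modroot, which runs ~^/A
         # (3) ~^/A is now the last regexp matched; it has no groups,
         #     so $1 expands to nothing and the result is /_newroot/--
         rewrite ^(.+)/modified$ /$modroot/-$1-;
     }

For /hello/modified the map regexp does not match, which is consistent
with the captures from the rewrite regexp surviving and producing the
/_defaultroot/-/hello- line in the log.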

Unfortunately, there is no obvious solution.  On the other hand,
this is something relatively easy to work around.

--
Maxim Dounin
http://mdounin.ru/

Re: rewrite and map - interfering regexps

Norman Gray

Maxim, hello.

On 26 Apr 2020, at 17:28, Maxim Dounin wrote:

> Relevant ticket is here:
>
> https://trac.nginx.org/nginx/ticket/564
>
> Unfortunately, there is no obvious solution.  On the other hand,
> this is something relatively easy to work around.

Aha, so it _is_ the map regexp and the rewrite regexp mutually
interfering!  Thanks for the speedy insight.

Looking through the comments in the ticket, I agree with you that 'the
current behaviour is bad, and should be fixed'.  If only on a principle
of least surprise.

Until it is fixed, however, it would be extremely useful if, in the
description of the 'map' stanza (ie, in
<https://nginx.org/en/docs/http/ngx_http_map_module.html>) it mentioned
that the regexp in 'map' can interfere with the regexp in a 'rewrite'
directive, in such a way that positional groups in the latter don't
work.  It could note that this is a (temporary?) defect, but that until
it is fixed, using named groups in the 'rewrite' regexp is a good
workaround, and give an example.
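
A minimal sketch of what such an example might look like, assuming the
configuration from the start of this thread; the group name 'stem' is a
hypothetical choice, and any name not used elsewhere in the
configuration should do:

     map $uri $modroot {
         default _defaultroot;
         ~^/A    _newroot;
     }

     location ~ /modified$ {
         # named group instead of $1: $stem is set when this regexp
         # matches, and is not a positional capture of the map regexp
         rewrite ^(?<stem>.+)/modified$ /$modroot/-$stem-;
     }

With this, /A/hello/modified should be rewritten to
/_newroot/-/A/hello-, as originally expected.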

It would be better here than in the documentation of 'rewrite', as that
would keep the 'rewrite' documentation relatively simple.  It only needs
to be seen by people using 'rewrite' and 'map' together, who might be
assumed to be marginally more sophisticated users.

Best wishes,

Norman


--
Norman Gray  :  https://nxg.me.uk

Re: rewrite and map - interfering regexps

J.R.
In reply to this post by Norman Gray
> Until it is fixed, however, it would be extremely useful if, in the
> description of the 'map' stanza it mentioned
> that the regexp in 'map' can interfere with the regexp in a 'rewrite'
> directive, in such a way that positional groups in the latter don't
> work.

Yeah, I just realized I posted a question a couple of hours after yours,
and the answer was the same: a positional capture in a map causing
issues with the directives evaluated after it...

I would agree that adding a note in the map directive documentation
would probably go a long way to help eliminate a lot of these
redundant troubleshooting issues.