Allow internal redirect to URI x, but deny external request for x?

Re: Allow internal redirect to URI x, but deny external request for x?

J. Lewis Muir
On 09/02, Francis Daly wrote:

> nginx does not "do" php. nginx does not care what your fastcgi server
> will do with the key/value pairs that it sends. nginx cares that the
> fastcgi server gives a valid response to the request that nginx makes.
>
> Typically, your fastcgi server will use the value associated with
> SCRIPT_FILENAME as "the name of the file to execute". If your fastcgi
> server fails to find / read / execute that file, it will return its own
> error indication.
>
> (So your "if", or the more common "try_files", is just an early-out,
> to sometimes avoid involving the fastcgi server. It may happen that the
> file is present when nginx looks for it, but is absent when the fastcgi
> server looks for it -- so that case does have to be handled anyway.)
>
> In this case, if $document_root is /srv/www/my-app/current/ and
> $realpath_root is /srv/www/my-app/releases/1.0.2/, and the script
> name is test.php, then with one config, nginx would send the string
> "/srv/www/my-app/current/test.php", and with the other config nginx
> would send the string "/srv/www/my-app/releases/1.0.2/test.php".
>
> (That is "pathname1" vs "pathname2".)
 
Understood.

> So if "one request" involves the fastcgi server reading
> "/srv/www/my-app/current/test.php", and then reading a bunch of other
> files in the same directory -- then I guess that unfortunate timing
> could lead to it reading some files from releases/1.0.1 and some from
> releases/1.0.2. (Assuming that it opens the directory afresh each time --
> which can't be ruled out.)
 
Right, that's what I was trying to avoid by using $realpath_root.  I
assumed that $realpath_root was set at the beginning of the location
processing.  That way, I could be guaranteed that it would not change
for the duration of the request handling within nginx.  And since nginx
would give that value (i.e., the path with the symlinks resolved) to
the FastCGI server, the FastCGI server would be using that same path
for the whole request and wouldn't know anything about the "current"
symlink that can change at any moment.  But perhaps that's an invalid
assumption, and the path is resolved every time $realpath_root is
expanded as a variable?  I hope not, but that would be really important
to understand.
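
A minimal sketch (in Python, not nginx config) of the idea described
above: resolve the "current" symlink once when a request starts, and
use only the resolved path from then on. The directory layout mirrors
the example in this thread; switch_current and resolve_once are
hypothetical helper names, a POSIX file system is assumed, and the
sketch says nothing about when nginx itself evaluates $realpath_root.

```python
import os
import tempfile


def switch_current(app_root, version):
    """Atomically repoint <app_root>/current at releases/<version> (POSIX)."""
    tmp_link = os.path.join(app_root, "current.tmp")
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    # The relative target is resolved against the link's own directory.
    os.symlink(os.path.join("releases", version), tmp_link)
    # Renaming over the old link replaces it in a single step.
    os.replace(tmp_link, os.path.join(app_root, "current"))


def resolve_once(app_root):
    """Resolve the symlink a single time, as a request handler would."""
    return os.path.realpath(os.path.join(app_root, "current"))


# Demo in a throwaway directory (assumed layout, mirroring the thread).
app_root = tempfile.mkdtemp()
for v in ("1.0.1", "1.0.2"):
    os.makedirs(os.path.join(app_root, "releases", v))

switch_current(app_root, "1.0.1")
request_root = resolve_once(app_root)   # a request starts under 1.0.1
switch_current(app_root, "1.0.2")       # a deploy happens mid-request
later_root = resolve_once(app_root)     # a new request sees 1.0.2
```

The request that resolved its root before the switch keeps working
entirely inside releases/1.0.1, which is the guarantee being asked
about here. It only holds if the path is resolved once per request.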

> But if "the app" involves a http request to part1.php and then a http
> request to part2.php (or: a second http request to part1.php), I don't
> think that the symlink+realpath thing will prevent those two requests
> going to different release versions.

Hmm, good point.

I'm not sure how to do a seamless web app update deploy, then.  Maybe
it's not possible without additional constraints.

I'm assuming the app is hosted on a single nginx server.  Although, I'd
be curious how this is typically solved for a multiple-server case as
well (e.g., a load balancer with multiple identical instances of the web
app running on two or more servers).  The idea is to have no downtime.

I suppose I could encode the app version in the URI (e.g.,
/my-app/1.0.2) or in the request header.  I've seen REST APIs versioned
in the URI or in the request header, but I'm not sure web apps do that.

Or I could try to ensure that my web app updates are *always* backward
compatible if they are to keep using the same URI.  I could do that for
any web apps I write, but I can't control that for any that I don't
write and am just deploying.

Another idea I previously toyed with was to deploy the web
app in a directory structure similar to the symlink+realpath
approach but without the symlink; the path to the app root is
versioned.  In the nginx config, use the versioned path to the
app root (e.g., /srv/www/my-app/releases/1.0.2).  To deploy a new
version of the app, install to a new versioned app root (e.g.,
/srv/www/my-app/releases/1.0.3), change the app root in the nginx
config, and cause nginx to reload the config.  Note that I'm
intentionally keeping the file system path versioned so that the path
changes when a new version is deployed to avoid the need to flush any
caching that might be going on at the FastCGI server or elsewhere.

Will there be downtime for a split second when the config is reloaded?
What will happen?  Will nginx refuse connections?  Will it accept
connections but just not respond while it's reloading the config?

But this approach has the same issue that you pointed out for the
symlink+realpath approach in that I don't see a way to prevent the case
where the app involves an HTTP request to part1.php that goes to one
release and then an HTTP request to part2.php that goes to the updated
release if the deploy happens at just the right (wrong) time.

What's the best strategy for deploying a new version of an app, then?
Expect that all app updates are backward compatible?  Expect that the
app passes around a version identifier with each request that the app
will use to detect when the version has changed and force the user to
log in again or something?  Version the URI?

Thanks!

Lewis
_______________________________________________
nginx mailing list
[hidden email]
http://mailman.nginx.org/mailman/listinfo/nginx

Re: Allow internal redirect to URI x, but deny external request for x?

J. Lewis Muir
On 09/03, J. Lewis Muir wrote:

> On 09/02, Francis Daly wrote:
> > But if "the app" involves a http request to part1.php and then a http
> > request to part2.php (or: a second http request to part1.php), I don't
> > think that the symlink+realpath thing will prevent those two requests
> > going to different release versions.
>
> Hmm, good point.
>
> I'm not sure how to do a seamless web app update deploy, then.  Maybe
> it's not possible without additional constraints.

After searching the web and failing to find anything addressing this
(maybe it's out there, but I couldn't find it), I'm inclined to believe
that there are roughly two choices: either the web app maintains
backward compatibility in its request API, or it doesn't.

The web app that maintains backward compatibility in its request API
will work with the symlink+realpath approach, assuming the FastCGI
server either does no caching or caches based on the file path.  (The
path-based caching works because the path changes when an app update is
deployed because the version is encoded in the path.)  Note, however,
that even for an app that maintains backward compatibility like this,
rolling back a deploy to a previous release would not work unless it
was a patch update (as defined in the Semantic Versioning scheme).  For
example, you could safely roll back from 1.0.3 to 1.0.2, but not from
1.1.0 to 1.0.3, and not from 2.0.0 to 1.2.3.
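
The rollback rule above can be sketched as a small check. This is only
an illustration of the rule as stated in this message, assuming plain
MAJOR.MINOR.PATCH version strings; parse_version and is_safe_rollback
are hypothetical names.

```python
def parse_version(text):
    """Split "MAJOR.MINOR.PATCH" into a tuple of three ints."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def is_safe_rollback(deployed, target):
    """True only for a patch-level rollback, per the rule stated above."""
    d_major, d_minor, d_patch = parse_version(deployed)
    t_major, t_minor, t_patch = parse_version(target)
    # Major and minor must match; only the patch number may go down.
    return (d_major, d_minor) == (t_major, t_minor) and t_patch <= d_patch


checks = {
    ("1.0.3", "1.0.2"): is_safe_rollback("1.0.3", "1.0.2"),
    ("1.1.0", "1.0.3"): is_safe_rollback("1.1.0", "1.0.3"),
    ("2.0.0", "1.2.3"): is_safe_rollback("2.0.0", "1.2.3"),
}
```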

The web app that does *not* maintain backward compatibility in its
request API will *not* work with the symlink+realpath approach.  It
might work by chance depending on the timing of the deploy, the timing
of the requests, and which requests were in flight at the time of the
deploy.  Or you could orchestrate the deploy as follows: shut down the
nginx server; wait for the maximum time that should ever elapse between
the "part1.php" request and the "part2.php" request (which may be
impossible to determine, or may be infinite), so that every pending
"part2.php" request is attempted and fails because it cannot connect
to the nginx server; deploy the app update; and then start the nginx
server again.  This approach will never be 100% correct.

I'd love to be enlightened on other choices, but this is my
understanding as of now, and I think I'll proceed with the
symlink+realpath approach under the expectation that the web apps I
deploy maintain backward compatibility in their request API, or they
might just break for some users when I deploy an update in which case I
might choose my deploy time to be the time of least demand on average.

Regards,

Lewis

Re: Allow internal redirect to URI x, but deny external request for x?

J. Lewis Muir
On 08/30, j94305 wrote:

> I've been following this, and I would take a slightly different approach.
>
> 1. Serve all apps under /{app}/releases/{version}/{path} as you have them
> organized in the deployment structure in the file system.
>
> 2. Forget about symbolic links and other makeshift versioning/defaulting in
> the file system.
>
> 3. Use a keyval mapping to handle redirections (307) of
> /{app}/current/{stuff} to /{app}/releases/{currentVersion}/{stuff}, where
> the keyval mapping provides {app} => {currentVersion}. You can update and
> manage this during deployment.

Sorry, I forgot about your post!  Thank you for your suggestions!

Is this a keyval?

  https://nginx.org/en/docs/http/ngx_http_keyval_module.html

> We usually include this in a CI/CD pipeline after deployment to dynamically
> switch to the last version (using a curl request to the NGINX API). If you
> can't use keyvals, use a static map and dynamically generate that "map"
> directive's mapping. Restart NGINX to reflect changes. Keyvals let you do
> this on the fly.

Is this a static map?

  https://nginx.org/en/docs/http/ngx_http_map_module.html

And by "dynamically generate" do you mean generate the map directive as
a config file that would be included from the main config and then cause
nginx to reload its config?
 
> The major advantage of this approach is with updates. You are most likely
> going to run into issues with browser or proxy caching if you provide
> different versions of files/apps under the same path. By having a canonical
> form that respects the version structure, you are avoiding this altogether.
> Yet, you have the flexibility to run hotfixes (replace existing files in an
> existing version without creating a new one), or experimental versions
> (which won't update the "current" pointer).

Interesting.  What I was trying to do with $realpath_root, I thought
was similar to what you're describing.  However, when you mention
browser or proxy caching, then I'm not sure.  Are you suggesting
serving from a different URI for each version of the app?  If not,
then I don't understand how your proposal behaves differently than the
symlink+realpath idea.  (But this may be because you wrote this on Aug
30, and the symlink+realpath idea had not been clearly stated yet.)

> I would try to keep the complexity low.

Agreed!  However, changing a symlink (albeit with some nginx config
changes to use $realpath_root and such) is pretty simple to me, so it's
a little harder for me to see using a keyval or a static map as keeping
the complexity low.  But if I understand your proposal correctly, it
would be more straightforward in terms of not needing to use symlinks at
all and not needing to worry about $realpath_root vs. $document_root.
Instead, you just use variables, and to update the variables, you just
use the API if using a keyval, or cause nginx to reload its config if
using the static map.

Thank you for the suggestions!

Regards,

Lewis

Re: Allow internal redirect to URI x, but deny external request for x?

Jürgen Wagner (DVT)

Hi Lewis,

  the idea is to have a deployment process that always places apps (or whatever artifacts) in a distinct location that is determined once at deployment time. This determines the address at which the app can be reached in the URL namespace of NGINX. So, if the convention is to place an app in a directory

{webroot}/{app}/releases/{version}/...

served as

https://{server}/{app}/releases/{version}/...

you would have a single, official URL prefix for each app version to be served from.

Now, you want to be able to say what is the "current" version and reflect this in the URL namespace as well. In the file system, that's a symbolic link. In the URL namespace of NGINX, that could be a redirection (status code 307). Both approaches would work. For the redirection you need a location

/{app}/current

which redirects any request for paths starting with this to the actual version you want to serve:

/{app}/releases/{latestVersion}

This can be achieved with a dynamically-generated stub you include in a "map" directive (requiring NGINX reload in case of changes) or a "keyval" map that can be changed via the NGINX API on the fly as you need it (not requiring reloads). The mapping will get the app name and determine the path of the latest version where the redirection should go to.
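
A minimal sketch of the "dynamically-generated stub" variant, assuming a deployment script writes the output to a file that the main nginx config includes and then triggers a reload (for example with "nginx -s reload"). The variable names $app_name and $current_version and the function render_version_map are hypothetical; how $app_name gets its value (for instance from a named capture in a location regex) is not shown, and app names containing special characters would need quoting.

```python
def render_version_map(current_versions):
    """Render an nginx "map" block from a {app: current_version} dict."""
    lines = ["map $app_name $current_version {", '    default "";']
    # Sort the entries so the generated file is stable between deploys.
    for app, version in sorted(current_versions.items()):
        lines.append(f"    {app} {version};")
    lines.append("}")
    return "\n".join(lines) + "\n"


# Hypothetical apps and versions, for illustration only.
snippet = render_version_map({"my-app": "1.0.2", "other-app": "2.3.0"})
print(snippet)
```

With the keyval variant the same {app} => {currentVersion} table would instead be updated through the NGINX API, with no generated file and no reload.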

The issue about browser and proxy caches: if over time you serve multiple versions of an app from the same URLs, browsers (or proxies) may consider their cached copies of some files current enough that they do not refetch them. In some cases, you would end up with some files loaded into the browser being of an old version and some already of a newer one. This can be avoided entirely by giving each version of the app a distinct canonical prefix that will never be re-used. The "current" redirection is simply a pointer to the right location for the latest version, but as it is an external redirection, the browser will ultimately load the app from the official "releases" path with the version number in it.

Cheers,

--j.




Re: Allow internal redirect to URI x, but deny external request for x?

J. Lewis Muir
On 09/04, Jürgen Wagner (DVT) wrote:
> Now, you want to be able to say what is the "current" version and reflect
> this in the URL namespace as well. In the file system, that's a symbolic
> link. In the URL namespace of NGINX, that could be a redirection (status
> code 307). Both approaches would work. For the redirection you need a
> location

Got it!  Thank you!  So this approach versions the URI.

> /{app}/current
>
> which redirects any request for paths starting with this to the actual
> version you want to serve:
>
> /{app}/releases/{latestVersion}
>
> This can be achieved with a dynamically-generated stub you include in a
> "map" directive (requiring NGINX reload in case of changes) or a "keyval"
> map that can be changed via the NGINX API on the fly as you need it (not
> requiring reloads). The mapping will get the app name and determine the path
> of the latest version where the redirection should go to.

Got it.

> The issue about browser and proxy caches: if over time you serve multiple
> versions of an app from the same URLs, browsers (or proxies) may consider
> their cached copies of some files current enough that they do not refetch
> them. In some cases, you would end up with some files loaded into the
> browser being of an old version and some already of a newer one. This can be
> avoided entirely by giving each version of the app a distinct canonical
> prefix that will never be re-used. The "current" redirection is simply a
> pointer to the right location for the latest version, but as it is an
> external redirection, the browser will ultimately load the app from the
> official "releases" path with the version number in it.

Wouldn't the 307 redirection mean that for *every* request, nginx has to
issue a 307 and then the client has to request the versioned URI, which
nginx then has to serve; so a double request for every resource?

I agree that this approach solves the browser and proxy cache problem,
though.

How does this solve the request-chain problem where "part1.php" executes
in one version of the app, but then "part2.php" executes in the updated
version because the updated version was deployed in between?  I presume
it doesn't, which is OK, but I want to make sure I understand.

Do you know of any mainstream web apps that are deployed this way
(i.e., 307 redirect to versioned URI)?

Thank you!

Lewis

Re: Allow internal redirect to URI x, but deny external request for x?

Jürgen Wagner (DVT)
Hi Lewis,

   no, that won't cause double requests.

/myapp/current/blah.html

307 => /myapp/releases/1.2.0/blah.html

and from there on (as we did not redirect internally, but rather
externally), any further accesses will happen under the true "releases"
path (ideally, as relative URLs).

That's only one redirection overhead in the beginning.

The redirection will forward any path under "current", i.e.,

/myapp/current/index.html => /myapp/releases/1.2.0/index.html

/myapp/current/images/icon.jpg => /myapp/releases/1.2.0/images/icon.jpg

and so on. Only the first request will be a redirection. All subsequent
requests would use the true path.
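
The mapping described above can be sketched as a pure function, here in
Python rather than nginx config. It is only an illustration of the path
rewrite behind the 307; redirect_target is a hypothetical name and the
version table is assumed to come from the map or keyval discussed
earlier in the thread.

```python
def redirect_target(path, current_versions):
    """Return the versioned path for a /{app}/current/... request, else None."""
    parts = path.split("/", 3)  # ["", app, "current", rest]
    if len(parts) < 3 or parts[0] != "" or parts[2] != "current":
        return None  # not an entry-point URL, so no redirect is issued
    app = parts[1]
    version = current_versions.get(app)
    if version is None:
        return None  # unknown app
    rest = parts[3] if len(parts) > 3 else ""
    return f"/{app}/releases/{version}/{rest}"


versions = {"myapp": "1.2.0"}
first = redirect_target("/myapp/current/index.html", versions)
second = redirect_target("/myapp/current/images/icon.jpg", versions)
# A request already under "releases" is served directly, with no redirect.
direct = redirect_target("/myapp/releases/1.2.0/index.html", versions)
```

Because relative links in the returned page resolve against the
"releases" URL, follow-up requests never pass through this function
again.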

We use this approach with a number of applications, e.g., multiple
Jenkins or GitLab installations behind one NGINX, but also with
front-end components being deployed with a CI/CD pipeline in Amazon S3,
which also switches the "current" link to whatever is then the latest
version of the artifact. The good thing is: if a user has loaded version
1.2.0, all links into the releases/1.2.0 path will continue to work,
even if you upload a new version 1.2.1 and make that the "current"
version. Any URLs with the "current" part in them will not be used as a
reference except in the initial process of accessing the latest version
of an app. From there, everything will always and only use the canonical
form in the "private" releases path. That's the nature of a redirection.

This is the effect you get by having the HTTP equivalent of a symbolic
link in the NGINX (visible to the browser), not in the file system
(which is opaque to users). The file system link will (over time) serve
different contents under the same URL, so in fact, addressing changes
with every deployment. The suggested approach keeps URL addressing
constant and just changes the entry pointer on a new deployment.

I agree that this is not the solution that first comes to one's mind, but
it does solve a number of nasty versioning issues we have run into over
time. Your mileage may vary :-)

Good luck!

--Jürgen




Re: Allow internal redirect to URI x, but deny external request for x?

J. Lewis Muir
On 09/04, Jürgen Wagner (DVT) wrote:

> This is the effect you get by having the HTTP equivalent of a symbolic link
> in the NGINX (visible to the browser), not in the file system (which is
> opaque to users). The file system link will (over time) serve different
> contents under the same URL, so in fact, addressing changes with every
> deployment. The suggested approach keeps URL addressing constant and just
> changes the entry pointer on a new deployment.
>
> I agree that this is not the solution that first comes to one's mind, but it
> does solve a number of nasty versioning issues we have run into over time.
> Your mileage may vary :-)

Thank you for the further explanation!  Indeed it seems like a
compelling solution!

What about web search engine indexing; do you do anything to avoid
search engines indexing the versioned URLs?  I suppose that if you only
publish the unversioned entry-point URLs, search engines will respect
that?  (Maybe wishful thinking.)  Or will they follow a 307 redirect and
index those URLs?

For example, it would seem undesirable to do a web search for "my-app"
and get a list of, say, the "index.php" for each version (e.g.,
"/my-app/releases/1.0.0/index.php", "/my-app/releases/1.0.1/index.php",
"/my-app/releases/1.0.2/index.php", etc.).

So, perhaps you use a "/robots.txt" to exclude "/my-app/releases/"?

DuckDuckGo seems to respect "/robots.txt" for controlling what gets indexed

  https://help.duckduckgo.com/duckduckgo-help-pages/results/duckduckbot/

But Google says "/robots.txt" is not for keeping a web page out of their
index

  https://support.google.com/webmasters/answer/6062608

and that you should use a "noindex" directive instead.

So maybe you use both a "/robots.txt" and the robots meta tag with
content="noindex" in the served resources or perhaps "X-Robots-Tag:
noindex" in the HTTP header response?
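
A minimal sketch of what the "/robots.txt" exclusion would mean for a
crawler that honors it, using Python's standard urllib.robotparser. The
robots.txt content and the "/my-app/current/" entry-point path are
assumptions for illustration. As noted above, this does not by itself
keep a page out of every index, and it says nothing about how a given
crawler treats a 307 that leads into the excluded path.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt excluding only the versioned paths.
ROBOTS_TXT = """\
User-agent: *
Disallow: /my-app/releases/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

versioned = "/my-app/releases/1.0.0/index.php"
entry_point = "/my-app/current/index.php"

# A compliant crawler may fetch the entry point but not the versioned URL.
blocked = not parser.can_fetch("*", versioned)
allowed = parser.can_fetch("*", entry_point)
```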

Regards,

Lewis

Re: Allow internal redirect to URI x, but deny external request for x?

zakirenish
Robots exclusion is generally quite unreliable. Exclusions based on user
agents are also not really reliable. You can try all of the options for
robots exclusion and may still get undesired crawlers on your site.

The only way you can keep robots out is to require authentication for those
parts you don't want to have crawled.

--j.

Posted at Nginx Forum: https://forum.nginx.org/read.php?2,285463,285599#msg-285599
