Nginx upload module

Nginx has an upload module (in the nginx-extra package, not in the nginx-full one) that is very nice when you want to allow users to upload large files (think multi-GB video files).  The way it works is that you define a location to which posts may be made where the post may contain an upload.  The module parses out the embedded files from the body, stores them in files in a directory, and then lets you forward the metadata to your Django form. The nginx config looks like this:

    location /upload {
        
        upload_pass /uploads/upload_nginx/;
        upload_store /var/www/media/upload;
        upload_max_file_size 16000M;
        upload_store_access user:rw group:rw all:r;
        upload_cleanup 400-599;

        upload_set_form_field $upload_field_name.name "$upload_file_name";
        upload_set_form_field $upload_field_name.content_type "$upload_content_type";
        upload_set_form_field $upload_field_name.path "$upload_tmp_path";

        upload_pass_form_field "^csrfmiddlewaretoken$";
    }

upload_pass there is the (local) URL to which you are going to send control after the upload completes.  The upload_set_form_field stuff lets you specify how your file metadata is going to be passed, in this case, if your file field is named "file" then you'll get "file.name" "file.content_type" and "file.path" parameters in your Django form.  The upload_pass_form_field stuff lets you forward other fields selectively, you'll at least want to include the csrf token.

On the Django side, you need to construct File instances that point at the uploaded files and have the metadata parsed out of the parameters you passed.

class NginxUploadedFile(uploadedfile.UploadedFile):
    def __init__(self, path, name, content_type, size, charset):
        file = open( path, 'rb' )
        super( NginxUploadedFile, self ).__init__( file, name, content_type, size, charset )
    def temporary_file_path(self):
        return self.file.name
    def close(self):
        try:
            return self.file.close()
        except OSError, e:
            if e.errno != 2:
                # Means the file was moved or deleted before the tempfile
                # could unlink it.  Still sets self.file.close_called and
                # calls self.file.file.close() before the exception
                raise

You need to be careful, obviously, to validate the filenames here as actually being inside your uploads folder (and avoid directory traversals, etc), as you don't want a malicious post uploading your passwords/configs/etc. The effect is that your uploads stream directly to disk in nginx, and then your form's standard save() will attempt to do a basic move/rename of the file into the final location (that only works if your upload folder is on the same filesystem as the final destination, of course).

There are some details to deal with, such as cases where the form's validation fails, which in Django is normally returned as a 200 rather than a 302. Nginx won't clean up the uploads in that case, so you need to manually go through and unlink the files in those cases.  Still, the mechanism allows even a Raspberry Pi to accept large-file uploads without swamping as pointless copies of the files are done between Nginx and the proxy server (i.e. gunicorn/django).  The upload module has some progress/restart functionality too, but I haven't looked into that yet.

[Update] One thing you should note: the nginx-upload-module is not compatible with newer nginx servers (1.3.9+), so while it's currently the only pre-packages solution I've found that does 0-copy uploads, it's not long for this world.

[Update 2] Turns out that the nginx 1.3.9+ has the ability to turn off the post body, which means it should be possible to avoid the copy to the proxy (you'll still wind up reading the data into ram and then out again, though)...

[Update 3] And it turns out that this does *not* work with ModelForms as described, because the __get__ method on FileFields will wind up wrapping your uploaded file in a FieldFile instance, which doesn't have a temporary_file_path() method. You can work around it by setting the value explicitly, but that sucks more than a bit.

Comments

  1. nginx rocks

    nginx rocks on 07/06/2013 1:58 a.m. #

    nice post on nginx upload module.

Comments are closed.

Pingbacks

Pingbacks are closed.

Trackbacks