Do I really need 0-copy?

The Nginx Upload Module's key feature (and optimization) is that, as nginx streams a multipart POST to disk, it writes each embedded file directly to its own file handle, so you can then simply move those files to their final location. Every replacement I've seen so far involves at least a couple of copies. They either stream the whole POST (with all of its body parts) into a single file, or they stream it bit-by-bit but still pass it through the proxy mechanism.
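
Once the module has written each file part to its own temporary file, the backend only has to move it. Below is a minimal Python sketch. It assumes the module is configured to pass each file's temporary path to the backend as an ordinary form field, and the function name move_into_place is hypothetical:

```python
import errno
import os
import shutil


def move_into_place(temp_path: str, final_path: str) -> str:
    """Move an upload that nginx already wrote to disk into its final home."""
    # Create the destination directory if it does not exist yet.
    os.makedirs(os.path.dirname(final_path) or ".", exist_ok=True)
    try:
        # Same filesystem: this is a rename, so no file data is copied.
        os.replace(temp_path, final_path)
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
        # Different filesystem: the bytes really do have to be copied.
        shutil.move(temp_path, final_path)
    return final_path
```

The zero-copy property only holds when the module's temporary directory sits on the same filesystem as the final destination. Otherwise the "move" quietly becomes a copy.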

The only other solutions I've seen that avoid copying the files around are things like Flash uploaders and WebSockets. Basically, you can write a daemon that receives just the uploads, but then you're stuck supporting at least a couple of mechanisms, since older browsers still need to be able to upload files. Still, I had to stop myself from writing a WebSockets implementation just because... well... it looks fun :)

I started writing a Django WSGIHandler subclass that could use the nginx "client_body_in_file_only clean;" plus "proxy_set_body off;" approach to avoid one copy, but you still end up reading the data off the disk just to write it back out again. You're also doing your body parsing in (slow) Python instead of (fast) C. It also seems somewhat grotty to start rewriting WSGI for this kind of thing, so I pulled back.
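
The input-swapping part of that idea can be sketched as a plain WSGI middleware. This is swapped in here for the WSGIHandler subclass. The class name and the X-File header are assumptions, with nginx expected to forward the saved body's path via "proxy_set_header X-File $request_body_file;". It also shows where the remaining copy comes from:

```python
import os


class RequestBodyFileMiddleware:
    """Hypothetical WSGI middleware: hand the app a body nginx already saved.

    Assumes nginx stores the request body with client_body_in_file_only and
    forwards the file's path in an X-File header. That path is trusted, so it
    must always be set by nginx and never accepted from the client.
    """

    def __init__(self, app, header="HTTP_X_FILE"):
        self.app = app
        self.header = header

    def __call__(self, environ, start_response):
        path = environ.get(self.header)
        if not path:
            # Not an nginx-saved body: pass the request through untouched.
            return self.app(environ, start_response)
        body = open(path, "rb")
        environ["wsgi.input"] = body
        # The proxied request carried a stub body, so report the real size.
        environ["CONTENT_LENGTH"] = str(os.path.getsize(path))
        try:
            # The framework still reads and parses the whole multipart body
            # in Python and writes each file part out again: one copy saved,
            # not all of them.
            return self.app(environ, start_response)
        finally:
            body.close()
```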

jquery-file-upload supports "chunked" uploads (on modern browsers only), and I could write (and even started writing) a Django app to direct those chunks into disk files, but that increases total load and adds a lot of latency from round-trips and the like. You likely want to split each upload into 60 or so chunks to get good progress feedback (so a 2GB file would be uploaded in roughly 33MB chunks). On a slow (and, given what it's doing, overworked) server, that's likely to add 30-60s to the upload, which is on the same scale as the whole upload takes with the upload module.
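
The chunk arithmetic and the server-side reassembly can be sketched as follows. It assumes each chunk arrives with a standard HTTP Content-Range header, which jQuery File Upload's chunked mode is documented to send (check the plugin docs for your version). The function names are hypothetical, and validation, locking and cleanup are left out:

```python
import os
import re

# Matches e.g. "bytes 0-33333333/2000000000" (HTTP Content-Range syntax).
_RANGE_RE = re.compile(r"bytes (\d+)-(\d+)/(\d+)")


def chunk_size(total_bytes: int, chunks: int = 60) -> int:
    """Chunk size needed to split an upload into about `chunks` requests."""
    return -(-total_bytes // chunks)  # ceiling division


def write_chunk(dest_path: str, content_range: str, data: bytes) -> bool:
    """Write one chunk at its offset; True if it ends at the final byte.

    That return value only means "upload complete" if chunks arrive in order.
    """
    m = _RANGE_RE.fullmatch(content_range.strip())
    if m is None:
        raise ValueError(f"bad Content-Range: {content_range!r}")
    start, end, total = (int(g) for g in m.groups())
    if end - start + 1 != len(data):
        raise ValueError("chunk length does not match Content-Range")
    # Open without truncating so previously written chunks survive.
    mode = "r+b" if os.path.exists(dest_path) else "wb"
    with open(dest_path, mode) as f:
        f.seek(start)
        f.write(data)
    return end + 1 == total
```

Each chunk is a separate request with its own parsing and disk write, which is where the extra load and round-trip latency come from.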

WebDAV is another approach I'm considering: add some proxy-based authentication and then let users upload files with whatever WebDAV client they like. That should, I would hope, stream directly to disk.

At this point, however, having somewhat burned out on caring about 0-copy uploads, I think I'm just going to keep using the upload module until I find something more satisfying.

Comments

  1. Ben Timby

    Ben Timby on 06/10/2013 3:01 p.m.

    Our Django application handles many thousands of uploads a day. We use the Nginx Upload module currently. I wanted to bring to your attention a wrapper I created that takes care of 0-copy uploads and presents them to your Django application the same way as regular uploads.

    https://github.com/smartfile/django-transfer/

    For us, this allows the same upload handling code to work under Nginx and the Django development server.
