Python: it's a scripting language you can use to do basically anything. Err... most things. Some are a real pain in the ass out of the box.
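Before we start, a caveat: I'm working with Python 2.7. If you're using Python 3.x, you may be able to use urllib3, which I've heard good things about. If you're too lazy to look into it, the approach below should still carry over, but the code itself is Python 2 only: mimetools was removed in Python 3, and httplib and urlparse were renamed to http.client and urllib.parse.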
I found myself needing to upload a file via Python. Like any expert developer, I began searching the compendium of human knowledge that is StackOverflow.
I found many results, all of which looked promising at first glance.
- Using MultipartPostHandler to POST form-data with Python
- Send file using POST from a Python script
- How to send a “multipart/form-data” with requests in python?
I quickly realized that almost all of the answers relied on the incredible "requests" module, "poster", or other third-party modules. Any sane developer would bask in their single-handed victory and move on to the next item on their to-do list. I'm a glutton for punishment, though. I needed to do my multipart upload like a real developer: Python standard libraries only.
Luckily, I found a simple-looking snippet that only used urllib2, which I was already familiar with. Huzzah! With a few test files in hand, I started testing my shiny new script. Alas, it was all for naught. The multipart upload script worked for some files and failed horribly for others.
This was the error message I was getting:

UnicodeDecodeError: 'utf8' codec can't decode byte 0x8d in position 516: invalid start byte

It clued me in that the failing files were binary files rather than simple text documents. The simple script seemed to be concatenating the file data directly into a string, and that is where my binary files threw up. Ah, the joys of file encoding.
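Why do binary files break a string-joining uploader when text files don't? The minimal sketch below shows the difference. It runs on Python 2.7 and 3. The byte values and both helper functions are made up for illustration and are not taken from the original snippet.

```python
# A stand-in for a few bytes of a binary file such as a .zip. 0x8d is not a
# valid first byte of a UTF-8 sequence, which is exactly what the error says.
ZIP_LIKE_BYTES = b"PK\x03\x04\x8d\x00"


def join_as_text(header, file_data):
    """Naive approach: decode the file data to text before joining it."""
    return header + file_data.decode("utf-8")  # UnicodeDecodeError on binary data


def join_as_bytes(header, file_data):
    """Safe approach: keep the header and the file data as bytes throughout."""
    return header + file_data
```

A plain-text file happens to decode cleanly, so the naive version looks fine until the first real binary file shows up.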
I tried a quick and proven fix: when in doubt, force "utf-8". In Python 2.7 the built-in open function doesn't let you specify an encoding, so I switched to the standard library's codecs module. I tried a few different file encodings. Then I did a naive search for "How to detect the encoding of a file" and felt like a real idiot when I saw the answer:
Files generally indicate their encoding with a file header. … However, even reading the header you can never be sure what encoding a file is really using.
Great, back to square one.
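In practice, the "header" in that answer is usually a byte-order mark (BOM). The minimal sketch below sniffs for one. `sniff_bom` is a hypothetical helper, not a standard library function. It also shows the catch: a missing BOM tells you nothing, and a .zip or an image has no text encoding to find at all.

```python
import codecs

# Byte-order marks mapped to codecs that strip the BOM when decoding.
# Check the 4-byte UTF-32 marks first: the UTF-32-LE mark begins with the
# same two bytes as the UTF-16-LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data):
    """Return a codec name if `data` starts with a BOM, else None.

    None does NOT mean "not text": most UTF-8 files have no BOM, and binary
    files (zips, images) have no text encoding to detect.
    """
    for bom, codec_name in _BOMS:
        if data.startswith(bom):
            return codec_name
    return None
```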
The most obvious solution was to rewrite the uploader script so that it stored the file data in a binary buffer, which is much more sensible. I hacked together a new version of the uploader and made sure to store the form data in a BytesIO rather than joining everything into a string. Again, no joy. I got the same error, only now it came from deep inside urllib2. Ugh. That meant urllib2 was internally converting my beautiful binary buffer back into a string. Son of a...
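Here is the "binary buffer" idea on its own, as a minimal sketch. Every piece of the multipart/form-data body is written to an io.BytesIO as bytes, so the file contents are never decoded. `encode_multipart` and its argument layout are hypothetical. The part layout follows the multipart/form-data format described in RFC 7578. The sketch targets Python 3 and also runs on 2.7 for ASCII field values.

```python
import io


def encode_multipart(fields, files, boundary):
    """Encode form fields and files as a multipart/form-data body (bytes).

    fields:   list of (name, value) text pairs.
    files:    list of (field_name, filename, content_type, data), data as bytes.
    boundary: ASCII text that must not appear anywhere inside the parts.
    """
    crlf = b"\r\n"
    delimiter = b"--" + boundary.encode("ascii")
    buf = io.BytesIO()
    for name, value in fields:
        buf.write(delimiter + crlf)
        buf.write(('Content-Disposition: form-data; name="%s"' % name).encode("utf-8") + crlf)
        buf.write(crlf)  # blank line separates the part headers from its body
        buf.write(value.encode("utf-8") + crlf)
    for field_name, filename, content_type, data in files:
        buf.write(delimiter + crlf)
        disposition = 'Content-Disposition: form-data; name="%s"; filename="%s"' % (
            field_name, filename)
        buf.write(disposition.encode("utf-8") + crlf)
        buf.write(("Content-Type: %s" % content_type).encode("ascii") + crlf)
        buf.write(crlf)
        buf.write(data)  # raw file bytes, written as-is and never decoded
        buf.write(crlf)
    buf.write(delimiter + b"--" + crlf)  # closing delimiter ends the body
    return buf.getvalue()
```

Because the body is already bytes, whatever sends it must not try to turn it back into text. That is what urllib2 appeared to be doing here.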
Screw it. I'll just rewrite it using httplib.
```python
# Python 2.7 only: mimetools, httplib and urlparse do not exist in Python 3.
import httplib
import io
import json
import mimetools
import mimetypes
import os
import socket
import urlparse


class MultiPartForm(object):
    """Accumulate the data to be used when posting a form."""

    def __init__(self):
        self.form_fields = []
        self.files = []
        self.boundary = mimetools.choose_boundary()

    def get_content_type(self):
        return 'multipart/form-data; boundary=%s' % self.boundary

    def add_field(self, name, value):
        """Add a simple field to the form data (use str, not unicode, in 2.7)."""
        self.form_fields.append((name, value))

    def add_file(self, fieldname, filename, fileHandle, mimetype=None):
        """Add a file to be uploaded; fileHandle must be opened in binary mode."""
        body = fileHandle.read()
        if mimetype is None:
            # guess_type() returns a (type, encoding) tuple; we only want the type.
            mimetype = mimetypes.guess_type(filename)[0] or 'application/octet-stream'
        self.files.append((fieldname, filename, mimetype, body))

    def get_binary(self):
        """Return a binary buffer containing the form data, including attached files."""
        part_boundary = '--' + self.boundary
        binary = io.BytesIO()
        needs_crlf = False  # a CRLF goes before every boundary except the first

        # Add the form fields. Values must be byte strings (str): a unicode
        # value would make the join produce unicode and BytesIO.write() fail.
        for name, value in self.form_fields:
            if needs_crlf:
                binary.write('\r\n')
            needs_crlf = True
            block = [part_boundary,
                     'Content-Disposition: form-data; name="%s"' % name,
                     '',
                     value]
            binary.write('\r\n'.join(block))

        # Add the files to upload
        for field_name, filename, content_type, body in self.files:
            if needs_crlf:
                binary.write('\r\n')
            needs_crlf = True
            block = [part_boundary,
                     'Content-Disposition: form-data; name="%s"; filename="%s"' %
                     (field_name, filename),
                     'Content-Type: %s' % content_type,
                     '']
            binary.write('\r\n'.join(block))
            binary.write('\r\n')  # blank line between part headers and file data
            binary.write(body)    # raw file bytes, never decoded

        # Add the closing boundary marker
        binary.write('\r\n--' + self.boundary + '--\r\n')
        return binary


def upload(url, filepath, field_name="file"):
    # field_name is an assumed default: use whatever your endpoint expects.
    form = MultiPartForm()
    form.add_field("form_field", "my awesome data")
    # Open in binary mode so the file's bytes are never decoded.
    with open(filepath, "rb") as file_handle:
        form.add_file(field_name, os.path.basename(filepath), fileHandle=file_handle)

    # Build and send the request
    scheme, netloc, path, params, query, fragment = urlparse.urlparse(url)
    form_buffer = form.get_binary().getvalue()
    try:
        conn = httplib.HTTPConnection(netloc)
        conn.connect()
        conn.putrequest("POST", path or "/")
        conn.putheader('Content-Type', form.get_content_type())
        conn.putheader('Content-Length', str(len(form_buffer)))
        conn.endheaders()
        conn.send(form_buffer)
        r = conn.getresponse()
    except socket.error:
        raise SystemExit(1)

    if r.status == 200:
        return json.loads(r.read())
    print('Upload failed (%s): %s' % (r.status, r.reason))


print(upload("http://www.example.com/endpoint", "/path/to/my/file.zip"))
```
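For Python 3 readers, the httplib half of this maps onto http.client. The sketch below is minimal and assumes you already have the body as bytes plus its boundary, for example from a bytes-only encoder. The MultiPartForm class above won't run on Python 3 as-is. `prepare_upload` and `post_multipart` are hypothetical helpers, and the URL is the placeholder used earlier.

```python
import http.client
from urllib.parse import urlsplit


def prepare_upload(url, body, boundary):
    """Split the URL and build the headers for a multipart POST."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query  # keep the query string instead of dropping it
    headers = {
        "Content-Type": "multipart/form-data; boundary=%s" % boundary,
        "Content-Length": str(len(body)),
    }
    return parts.scheme, parts.netloc, path, headers


def post_multipart(url, body, boundary, timeout=30):
    """POST an already-encoded multipart body; returns (status, response bytes)."""
    scheme, netloc, path, headers = prepare_upload(url, body, boundary)
    conn_cls = (http.client.HTTPSConnection if scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(netloc, timeout=timeout)
    try:
        # body is bytes, so http.client sends it untouched: no implicit decoding.
        conn.request("POST", path, body=body, headers=headers)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

Splitting out `prepare_upload` keeps the URL and header handling testable without a network connection.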