Sending bz2-compressed data as a UTF-8 string via gunicorn with Python 2.7
I am trying to use gunicorn to send a UTF-8 encoded string, the result of bz2 compression, as the response to a request.
Here is the code on the gunicorn server side:
    import bz2

    def app(environ, start_response):
        data = "hello, world!" * 10
        compressed_data = bz2.compress(data)
        start_response("200 OK", [
            ("Content-Type", "text/plain"),
            ('charset', 'utf-8'),
            ("Content-Length", str(len(compressed_data))),
            ('Access-Control-Allow-Headers', '*'),
            ('Access-Control-Allow-Origin', '*'),
            # ('Content-Transfer-Encoding', 'base64'),
        ])
        return iter([compressed_data])
When I try to request it on the client side using the Python requests package like this:
    import bz2
    import requests

    res = requests.get('http://127.0.0.1:8000')
    bz2.decompress(res.text)
it raises an exception:

    UnicodeEncodeError: 'ascii' codec can't encode character u'\xab' in position 11: ordinal not in range(128)

saying that the response text can't be encoded as ASCII. When I print the text of the response I get:
    print(res.text)
    >> u'bzh91ay&syy\xabm\x99\x00\x00\x13\x97\x80`\x04\x00@\x00\x80\x06\x04\x90\x00 \x00\xa5p\xd0\xda\x10\x03\x0e\xd3\xd4\xdai4\x9bo\x93\x13\x13\xc2b~\x9c\x17re8p\x90y\xabm\x99'
and when I print the compressed data directly:

    import bz2
    print(bz2.compress("hello, world!" * 10))
    >> 'bzh91ay&syy\xabm\x99\x00\x00\x13\x97\x80`\x04\x00@\x00\x80\x06\x04\x90\x00 \x00\xa5p\xd0\xda\x10\x03\x0e\xd3\xd4\xdai4\x9bo\x93\x13\x13\xc2b~\x9c\x17re8p\x90y\xabm\x99'
The only difference is the unicode prefix (u). I could solve the issue by tweaking the data on the client side to make the response string decodable, but I am wondering how I can solve the problem on the server side.
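A likely explanation of the error, sketched in Python 3 with the network call simulated: in the server code above, ('charset', 'utf-8') is sent as a separate header literally named charset rather than as a parameter of Content-Type, so requests sees text/plain with no charset and falls back to ISO-8859-1 when building res.text. That decoding maps every byte to one code point (hence u'\xab'), and Python 2's bz2.decompress then implicitly encodes the unicode string with the default ASCII codec, which fails. Because ISO-8859-1 is a one-to-one byte mapping, encoding res.text back with it recovers the original bytes, which is the client-side tweak; reading res.content avoids the round trip entirely. The variables raw and text are stand-ins for res.content and res.text.

```python
import bz2

# Stand-ins for the HTTP response (no server involved): raw plays res.content,
# text plays res.text as decoded with the assumed ISO-8859-1 fallback.
raw = bz2.compress(b'hello, world!' * 10)
text = raw.decode('iso-8859-1')

# Client-side workaround: ISO-8859-1 maps bytes 0-255 one-to-one onto
# code points U+0000-U+00FF, so encoding back restores the exact bytes.
recovered = text.encode('iso-8859-1')
assert recovered == raw

# Simpler in real code: decompress the bytes (res.content) directly.
print(bz2.decompress(raw).decode('ascii'))
```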
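You can't send bzip2-compressed data as UTF-8: it is binary data, not text. (On the client side this also means reading res.content, which is bytes, rather than res.text, which requests has already decoded to unicode.)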
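If a text-safe payload is genuinely required (the question's commented-out Content-Transfer-Encoding: base64 line hints at this; note that Content-Transfer-Encoding is a MIME header that HTTP itself does not use), one option is to base64-encode the compressed bytes. The result is pure ASCII, and therefore also valid UTF-8, at the cost of roughly 4/3 the size. A Python 3 sketch; pack_text_safe and unpack_text_safe are hypothetical helper names, and the client still has to know to reverse both steps.

```python
import base64
import bz2


def pack_text_safe(text):
    """bzip2-compress text (as UTF-8 bytes), then base64-encode it into an ASCII str."""
    compressed = bz2.compress(text.encode('utf-8'))
    return base64.b64encode(compressed).decode('ascii')


def unpack_text_safe(payload):
    """Undo pack_text_safe: base64-decode, bzip2-decompress, then decode UTF-8."""
    return bz2.decompress(base64.b64decode(payload)).decode('utf-8')


message = 'hello \N{SNOWMAN}\n' * 10
payload = pack_text_safe(message)   # safe to send as text/plain; charset=utf-8
print(len(payload))
print(unpack_text_safe(payload) == message)
```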
If the HTTP client accepts the bzip2 Content-Encoding (bzip2 is not a standard HTTP content coding), you could send UTF-8 encoded text compressed using bzip2:
    #!/usr/bin/env python
    import bz2

    def app(environ, start_response):
        status = '200 OK'
        headers = [('Content-Type', 'text/plain; charset=utf-8')]
        data = (u'hello \N{SNOWMAN}\n' * 10).encode('utf-8')
        if 'bzip2' in environ.get('HTTP_ACCEPT_ENCODING', ''):
            # use bzip2 if requested
            data = bz2.compress(data)
            headers.append(('Content-Encoding', 'bzip2'))
        headers.append(('Content-Length', str(len(data))))
        start_response(status, headers)
        return [data]  # WSGI expects an iterable of byte strings
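The substring test 'bzip2' in ... also matches a header such as bzip2;q=0, which explicitly refuses bzip2. A slightly stricter check that honours q-values and the * wildcard might look like this (Python 3; accepts_encoding is a hypothetical helper, and the parsing is a simplification, not a full implementation of the Accept-Encoding grammar).

```python
def accepts_encoding(accept_encoding, coding):
    """Return True if `coding` is acceptable according to an Accept-Encoding value.

    Simplified: items are "name" or "name;q=value"; q=0 means "not acceptable";
    "*" applies only to codings that are not listed explicitly.
    """
    wildcard_q = None
    for item in accept_encoding.split(','):
        parts = [p.strip() for p in item.split(';')]
        name = parts[0].lower()
        if not name:
            continue  # tolerate empty items such as a trailing comma
        q = 1.0
        for param in parts[1:]:
            key, _, value = param.partition('=')
            if key.strip().lower() == 'q':
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as a refusal
        if name == coding.lower():
            return q > 0
        if name == '*':
            wildcard_q = q
    return wildcard_q is not None and wildcard_q > 0
```

Inside app, the condition would become accepts_encoding(environ.get('HTTP_ACCEPT_ENCODING', ''), 'bzip2').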
Example

Uncompressed response:
    $ http -v 127.0.0.1:8000
    GET / HTTP/1.1
    Accept: */*
    Accept-Encoding: gzip, deflate
    Connection: keep-alive
    Host: 127.0.0.1:8000
    User-Agent: HTTPie/0.9.2

    HTTP/1.1 200 OK
    Connection: close
    Content-Length: 100
    Content-Type: text/plain; charset=utf-8
    Date: Sun, 17 May 2015 18:47:50 GMT
    Server: gunicorn/19.3.0

    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
A bzip2-compressed response if the client specifies that it accepts bzip2:
    $ http -v 127.0.0.1:8000 Accept-Encoding:bzip2
    GET / HTTP/1.1
    Accept: */*
    Accept-Encoding: bzip2
    Connection: keep-alive
    Host: 127.0.0.1:8000
    User-Agent: HTTPie/0.9.2

    HTTP/1.1 200 OK
    Connection: close
    Content-Encoding: bzip2
    Content-Length: 65
    Content-Type: text/plain; charset=utf-8
    Date: Sun, 17 May 2015 18:48:23 GMT
    Server: gunicorn/19.3.0

    +-----------------------------------------+
    | NOTE: binary data not shown in terminal |
    +-----------------------------------------+
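Both cases can also be checked without gunicorn or httpie by calling the WSGI callable in-process with a hand-built environ. A Python 3 sketch: app repeats the answer's server logic so the block is self-contained, call_app is a hypothetical test helper, and its environ is deliberately minimal (enough for this app, not a complete WSGI environment).

```python
import bz2


def app(environ, start_response):
    # Same negotiation logic as the answer's server, in Python 3 syntax.
    status = '200 OK'
    headers = [('Content-Type', 'text/plain; charset=utf-8')]
    data = ('hello \N{SNOWMAN}\n' * 10).encode('utf-8')
    if 'bzip2' in environ.get('HTTP_ACCEPT_ENCODING', ''):  # use bzip2 if requested
        data = bz2.compress(data)
        headers.append(('Content-Encoding', 'bzip2'))
    headers.append(('Content-Length', str(len(data))))
    start_response(status, headers)
    return [data]


def call_app(wsgi_app, accept_encoding=None):
    """Call a WSGI app in-process and return (status, headers dict, body bytes)."""
    environ = {'REQUEST_METHOD': 'GET', 'PATH_INFO': '/'}  # minimal, not a full environ
    if accept_encoding is not None:
        # WSGI servers expose the Accept-Encoding request header under this key.
        environ['HTTP_ACCEPT_ENCODING'] = accept_encoding
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = dict(headers)
        return lambda data: None  # write() callable, unused by this app

    body = b''.join(wsgi_app(environ, start_response))
    return captured['status'], captured['headers'], body


# Mirrors the two httpie runs: default Accept-Encoding, then bzip2 requested.
for accept in ('gzip, deflate', 'bzip2'):
    status, headers, body = call_app(app, accept)
    print(accept, '->', headers.get('Content-Encoding', 'identity'), len(body), 'bytes')
```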
Here's the corresponding HTTP client, which uses the requests library:
    #!/usr/bin/env python
    from __future__ import print_function
    import bz2
    import requests  # $ pip install requests

    r = requests.get('http://localhost:8000',
                     headers={'Accept-Encoding': 'gzip, deflate, bzip2'})
    content = r.content
    print(len(content))
    if r.headers.get('Content-Encoding', '').endswith('bzip2'):
        # requests doesn't understand bzip2
        content = bz2.decompress(content)
    print(len(content))
    text = content.decode(r.encoding)
    print(len(text))
    print(text, end='')
Output:

    65
    100
    80
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
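The three numbers in that output are the compressed size (65 bytes), the decompressed UTF-8 size (100 bytes) and the decoded length (80 characters). The gap between the last two is plain UTF-8 arithmetic, since U+2603 SNOWMAN takes three bytes. A quick Python 3 check:

```python
line = 'hello \N{SNOWMAN}\n'   # one line of the response body
encoded = line.encode('utf-8')

print(len(line))      # 8 characters: 'hello ' (6) + snowman (1) + newline (1)
print(len(encoded))   # 10 bytes: 6 + 3 for U+2603 (b'\xe2\x98\x83') + 1
print(len(line * 10), len(encoded * 10))   # 80 100
```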
Otherwise (if the client does not send the non-standard Accept-Encoding), you should send the data as application/octet-stream, as @icedtrees suggested:
    #!/usr/bin/env python
    import bz2

    def app(environ, start_response):
        status = '200 OK'
        headers = [('Content-Type', 'application/octet-stream')]
        data = bz2.compress((u'hello \N{SNOWMAN}\n' * 10).encode('utf-8'))
        headers.append(('Content-Length', str(len(data))))
        start_response(status, headers)
        return [data]  # WSGI expects an iterable of byte strings
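The standard library's wsgiref.validate can check an application like this against the WSGI rules, including the rule that the app returns an iterable of bytestrings rather than a bare string. (In Python 2 a bare str happens to work, sent one character per iteration; in Python 3 iterating bytes yields integers.) A Python 3 sketch: app is a port of the server above, run_validated is a hypothetical helper, and the environ comes from wsgiref.util.setup_testing_defaults.

```python
import bz2
from wsgiref.util import setup_testing_defaults
from wsgiref.validate import validator


def app(environ, start_response):
    # Python 3 port of the application/octet-stream server above.
    data = bz2.compress(('hello \N{SNOWMAN}\n' * 10).encode('utf-8'))
    start_response('200 OK', [('Content-Type', 'application/octet-stream'),
                              ('Content-Length', str(len(data)))])
    return [data]  # an iterable holding one bytestring, not the bytes object itself


def run_validated(wsgi_app):
    """Run wsgi_app once under wsgiref's WSGI checker; return (status, headers, body)."""
    environ = {'QUERY_STRING': ''}  # set explicitly to avoid a validator warning
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = dict(headers)
        return lambda data: None  # WSGI start_response must return a write() callable

    result = validator(wsgi_app)(environ, start_response)
    try:
        body = b''.join(result)
    finally:
        result.close()  # the validator requires the response iterable to be closed
    return captured['status'], captured['headers'], body
```

Running run_validated on a variant that returns the compressed bytes directly raises AssertionError from the validator.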
Example:
    $ http 127.0.0.1:8000
    HTTP/1.1 200 OK
    Connection: close
    Content-Length: 65
    Content-Type: application/octet-stream
    Date: Sun, 17 May 2015 18:53:55 GMT
    Server: gunicorn/19.3.0

    +-----------------------------------------+
    | NOTE: binary data not shown in terminal |
    +-----------------------------------------+
bzcat accepts bzip2-compressed input:
    $ http 127.0.0.1:8000 | bzcat
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
The data is shown correctly because the terminal uses UTF-8 encoding.
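A rough Python counterpart of piping the octet-stream response through bzcat into a UTF-8 terminal is to decompress the body incrementally and decode it with an incremental decoder, so that a multi-byte character such as ☃ is never split at a chunk boundary. A Python 3 sketch; bzcat_text is a hypothetical name, and it assumes the body holds a single bzip2 stream (BZ2Decompressor raises EOFError if more data follows the end of the stream).

```python
import bz2
import codecs


def bzcat_text(chunks, encoding='utf-8'):
    """Yield decoded text from an iterable of bzip2-compressed byte chunks."""
    decompressor = bz2.BZ2Decompressor()
    decoder = codecs.getincrementaldecoder(encoding)()
    for chunk in chunks:
        if not chunk:
            continue  # skip empty chunks
        # decompress() may return b'' until a whole bzip2 block has arrived;
        # the incremental decoder buffers any trailing partial UTF-8 sequence.
        text = decoder.decode(decompressor.decompress(chunk))
        if text:
            yield text
    if not decompressor.eof:
        raise ValueError('bzip2 stream ended early')
    tail = decoder.decode(b'', final=True)  # raises UnicodeDecodeError on a cut-off character
    if tail:
        yield tail
```

With requests, the chunks could come from r = requests.get('http://localhost:8000', stream=True) followed by bzcat_text(r.iter_content(chunk_size=8192)), which avoids holding the whole compressed body in memory.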