Sending bz2-compressed data as utf-8 string via gunicorn with Python 2.7


I am trying to use gunicorn to send a UTF-8 encoded string, which is the result of bz2 compression, as the response to a request.

Here is my code on the gunicorn server side:

    import bz2

    def app(environ, start_response):
        data = "hello, world!" * 10
        compressed_data = bz2.compress(data)
        start_response("200 OK", [("Content-Type", "text/plain"),
                                  ('charset', 'utf-8'),
                                  ("Content-Length", str(len(compressed_data))),
                                  ('Access-Control-Allow-Headers', '*'),
                                  ('Access-Control-Allow-Origin', '*'),
                                  # ('Content-Transfer-Encoding', 'base64'),
                                  ])
        return iter([compressed_data])

When I try to request it from the client side using the Python requests package, like this:

    import bz2
    import requests

    res = requests.get('http://127.0.0.1:8000')
    bz2.decompress(res.text)

it raises an exception:

    UnicodeEncodeError: 'ascii' codec can't encode character u'\xab' in position 11: ordinal not in range(128)

saying that the response cannot be decoded. When I try to print the text of the response:

    print(res.text)
    >> u'BZh91AY&SYy\xabm\x99\x00\x00\x13\x97\x80`\x04\x00@\x00\x80\x06\x04\x90\x00 \x00\xa5p\xd0\xda\x10\x03\x0e\xd3\xd4\xdai4\x9bo\x93\x13\x13\xc2b~\x9c\x17rE8P\x90y\xabm\x99'

When printing the encoded text:

    import bz2
    print(bz2.compress("hello, world!" * 10))
    >> 'BZh91AY&SYy\xabm\x99\x00\x00\x13\x97\x80`\x04\x00@\x00\x80\x06\x04\x90\x00 \x00\xa5p\xd0\xda\x10\x03\x0e\xd3\xd4\xdai4\x9bo\x93\x13\x13\xc2b~\x9c\x17rE8P\x90y\xabm\x99'

The only difference is the unicode sign (the u prefix). I can solve the issue by tweaking the data on the client side to make the response string decodable, but I am wondering how I can solve this problem on the server side.

You can't send bzip2-compressed data as UTF-8. It is binary data, not text.
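To make that concrete, here is a small sketch, written so that it should also run on Python 3. The helper name is_valid_utf8 is made up for illustration, and the fragment is four bytes copied from the dump in the question. Decoding with latin-1 is an assumption about how the client ended up with a unicode string, not something the question states.

```python
import bz2


def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


text = u"hello, world!" * 10
compressed = bz2.compress(text.encode("utf-8"))

# Every bzip2 stream starts with the magic bytes "BZh"; what follows is binary.
starts_with_magic = compressed[:3] == b"BZh"

# Four bytes as they appear in the dump in the question. 0xab is a UTF-8
# continuation byte with no lead byte in front of it, so decoding fails.
fragment = b"y\xabm\x99"
fragment_is_text = is_valid_utf8(fragment)

# Assumption: the client decoded the body with a one-byte codec such as
# latin-1. Encoding that unicode string back to ASCII then fails, which is
# the kind of UnicodeEncodeError shown in the question.
try:
    fragment.decode("latin-1").encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Treating the payload as bytes from end to end round-trips without trouble.
restored = bz2.decompress(compressed).decode("utf-8")
```

The practical consequence is that the client should read the raw bytes of the response, not its decoded text, before decompressing.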

If the HTTP client accepts the bzip2 content encoding (bzip2 is not a standard one), then you could send UTF-8 encoded text compressed using bzip2:

    #!/usr/bin/env python
    import bz2

    def app(environ, start_response):
        status = '200 OK'
        headers = [('Content-Type', 'text/plain; charset=utf-8')]
        data = (u'hello \N{SNOWMAN}\n' * 10).encode('utf-8')

        if 'bzip2' in environ.get('HTTP_ACCEPT_ENCODING', ''): # use bzip2 if requested
            data = bz2.compress(data)
            headers.append(('Content-Encoding', 'bzip2'))

        headers.append(('Content-Length', str(len(data))))
        start_response(status, headers)
        return [data]  # WSGI expects an iterable of byte strings
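The substring test on the Accept-Encoding header is enough for a demo, but it would also match values such as "x-bzip2" or "bzip2;q=0". A stricter check could look roughly like the sketch below. accepts_encoding is a hypothetical helper, not part of gunicorn or WSGI, and it deliberately ignores the "*" wildcard.

```python
def accepts_encoding(header_value, coding):
    """Return True if `coding` is listed in an Accept-Encoding value with q > 0."""
    for item in header_value.split(","):
        parts = [p.strip() for p in item.split(";")]
        name = parts[0].lower()
        if name != coding.lower():
            continue
        quality = 1.0  # a coding listed without a q parameter counts as q=1
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    quality = float(value.strip())
                except ValueError:
                    quality = 0.0  # treat a malformed q value as "not acceptable"
        return quality > 0
    return False
```

In the app above, the condition would then read accepts_encoding(environ.get('HTTP_ACCEPT_ENCODING', ''), 'bzip2').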

Example

Uncompressed response:

    $ http -v 127.0.0.1:8000
    GET / HTTP/1.1
    Accept: */*
    Accept-Encoding: gzip, deflate
    Connection: keep-alive
    Host: 127.0.0.1:8000
    User-Agent: HTTPie/0.9.2

    HTTP/1.1 200 OK
    Connection: close
    Content-Length: 100
    Content-Type: text/plain; charset=utf-8
    Date: Sun, 17 May 2015 18:47:50 GMT
    Server: gunicorn/19.3.0

    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃

bzip2-compressed response if the client specifies that it accepts bzip2:

    $ http -v 127.0.0.1:8000 Accept-Encoding:bzip2
    GET / HTTP/1.1
    Accept: */*
    Accept-Encoding: bzip2
    Connection: keep-alive
    Host: 127.0.0.1:8000
    User-Agent: HTTPie/0.9.2

    HTTP/1.1 200 OK
    Connection: close
    Content-Encoding: bzip2
    Content-Length: 65
    Content-Type: text/plain; charset=utf-8
    Date: Sun, 17 May 2015 18:48:23 GMT
    Server: gunicorn/19.3.0

    +-----------------------------------------+
    | NOTE: binary data not shown in terminal |
    +-----------------------------------------+
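Both responses can also be checked without starting gunicorn, by calling the WSGI callable directly. The following is a minimal sketch that uses the standard library's wsgiref.util.setup_testing_defaults to build the request environment. call_app is a hypothetical helper, and app is a copy of the server logic so that the block is self-contained.

```python
import bz2
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    # Copy of the server logic, repeated here so the sketch stands alone.
    headers = [("Content-Type", "text/plain; charset=utf-8")]
    data = (u"hello \N{SNOWMAN}\n" * 10).encode("utf-8")
    if "bzip2" in environ.get("HTTP_ACCEPT_ENCODING", ""):
        data = bz2.compress(data)
        headers.append(("Content-Encoding", "bzip2"))
    headers.append(("Content-Length", str(len(data))))
    start_response("200 OK", headers)
    return [data]


def call_app(accept_encoding=None):
    """Invoke the WSGI app in-process; return (status, headers dict, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal valid GET request
    if accept_encoding is not None:
        environ["HTTP_ACCEPT_ENCODING"] = accept_encoding
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


plain_status, plain_headers, plain_body = call_app()
bz_status, bz_headers, bz_body = call_app("bzip2")

# The compressed body must decompress back to the uncompressed one.
round_trip_ok = bz2.decompress(bz_body) == plain_body
```

The uncompressed body is 100 bytes, matching the Content-Length in the first example, and the compressed variant carries the Content-Encoding header only when the request asks for bzip2.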

Here's the corresponding HTTP client that uses the requests library:

    #!/usr/bin/env python
    from __future__ import print_function
    import bz2
    import requests # $ pip install requests

    r = requests.get('http://localhost:8000',
                     headers={'Accept-Encoding': 'gzip, deflate, bzip2'})
    content = r.content  # raw bytes, not decoded text
    print(len(content))
    if r.headers['content-encoding'].endswith('bzip2'): # requests doesn't understand bzip2
        content = bz2.decompress(content)
    print(len(content))
    text = content.decode(r.encoding)
    print(len(text))
    print(text, end='')

Output

    65
    100
    80
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃

Otherwise (no non-standard accept-encoding), you should send the data as application/octet-stream, as @icedtrees suggested:

    #!/usr/bin/env python
    import bz2

    def app(environ, start_response):
        status = '200 OK'
        headers = [('Content-Type', 'application/octet-stream')]
        data = bz2.compress((u'hello \N{SNOWMAN}\n' * 10).encode('utf-8'))

        headers.append(('Content-Length', str(len(data))))
        start_response(status, headers)
        return [data]  # WSGI expects an iterable of byte strings

Example

    $ http 127.0.0.1:8000
    HTTP/1.1 200 OK
    Connection: close
    Content-Length: 65
    Content-Type: application/octet-stream
    Date: Sun, 17 May 2015 18:53:55 GMT
    Server: gunicorn/19.3.0

    +-----------------------------------------+
    | NOTE: binary data not shown in terminal |
    +-----------------------------------------+

bzcat accepts bzip2 content:

    $ http 127.0.0.1:8000 | bzcat
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃
    hello ☃

The data is shown correctly because the terminal uses UTF-8 encoding.
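What bzcat does here can be mirrored in Python on the client. Since application/octet-stream carries no charset, the client has to know the text encoding by agreement with the server. This is a minimal sketch under that assumption: decode_bz2_text is a hypothetical helper name, and the payload is built locally as a stand-in for the bytes a client would receive (for example r.content in requests).

```python
import bz2


def decode_bz2_text(raw, encoding="utf-8"):
    """Decompress a bzip2 payload and decode the result to text."""
    return bz2.decompress(raw).decode(encoding)


# Stand-in for the response body; a real client would pass the raw bytes
# it received, never an already-decoded string.
payload = bz2.compress((u"hello \N{SNOWMAN}\n" * 10).encode("utf-8"))
text = decode_bz2_text(payload)
```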

