Python - PyCharm console: Unicode to readable string


I'm studying Python with this tutorial.

The problem: when I try to output Cyrillic characters, the PyCharm console shows Unicode escapes instead of readable text.


    import operator

    import requests
    from bs4 import BeautifulSoup


    def start(url):
        word_list = []
        source_code = requests.get(url).text
        # Name the parser explicitly to avoid bs4's "no parser specified" warning.
        soup = BeautifulSoup(source_code, "html.parser")
        for post_text in soup.find_all('a', {'class': 'b-tasks__item__title js-set-visited'}):
            content = post_text.string
            if content is None:  # .string is None when the tag has several children
                continue
            words = content.lower().split()
            for each_word in words:
                word_list.append(each_word)
        clean_up_list(word_list)


    def clean_up_list(word_list):
        clean_word_list = []
        # Backslash is escaped so the string literal is valid.
        symbols = "!@#$%^&*()_+{}|:<>?,./;'[]\\=-\""
        for word in word_list:
            for symbol in symbols:
                word = word.replace(symbol, "")
            if len(word) > 0:
                clean_word_list.append(word)
        create_dictionary(clean_word_list)


    def create_dictionary(clean_word_list):
        word_count = {}
        for word in clean_word_list:
            if word in word_count:
                word_count[word] += 1
            else:
                word_count[word] = 1
        # Print words in ascending order of frequency.
        for key, value in sorted(word_count.items(), key=operator.itemgetter(1)):
            print(key, value)
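For comparison, the clean-up and counting steps can be condensed with collections.Counter and str.translate. This is a minimal sketch, not the tutorial's method: it assumes the titles have already been extracted into a list of strings (the sample titles are hypothetical) and reuses the same punctuation set.

```python
from collections import Counter

# Same punctuation set as clean_up_list in the question (backslash escaped).
SYMBOLS = "!@#$%^&*()_+{}|:<>?,./;'[]\\=-\""
# Translation table that deletes every character in SYMBOLS.
STRIP_SYMBOLS = str.maketrans("", "", SYMBOLS)


def count_words(titles):
    """Lower-case and split each title, strip punctuation, count the words."""
    counts = Counter()
    for title in titles:
        for word in title.lower().split():
            word = word.translate(STRIP_SYMBOLS)
            if word:  # skip tokens that were pure punctuation
                counts[word] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample titles; in the question they come from BeautifulSoup.
    sample = ["Доставка документов", "Доставка цветов!"]
    counts = count_words(sample)
    # Ascending by count, like sorted(..., key=operator.itemgetter(1)).
    for word, count in sorted(counts.items(), key=lambda item: item[1]):
        print(word, count)
```

Counter removes the manual "if word in word_count" branch, and str.translate strips all symbols in one pass instead of one replace() call per symbol.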

When I change print(key, value) to print(key.decode('utf8'), value), I get "UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)".

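The traceback is consistent with Python 2, where calling .decode() on a value that is already unicode first encodes it with the default ASCII codec, so Cyrillic text fails with UnicodeEncodeError before any decoding happens. In Python 3 the two directions belong to separate types: str.encode() returns bytes, bytes.decode() returns str, and str has no decode() method at all. A short Python 3 sketch with an illustrative word:

```python
# Python 3 sketch: text (str) and encoded data (bytes) are distinct types.
word = "привет"  # illustrative Cyrillic word

encoded = word.encode("utf-8")     # str -> bytes
decoded = encoded.decode("utf-8")  # bytes -> str

print(type(encoded).__name__, type(decoded).__name__)
print(decoded == word)

# str has no .decode() in Python 3; calling word.decode(...) raises AttributeError.
print(hasattr(word, "decode"))
```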

    start('https://youdo.com/tasks-all-opened-all-moscow-1')

There are suggestions on the internet to change the encoding in files, but I don't get it. Can't I just read the text in the console? I'm on OS X.
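If the console shows sequences like \u043f instead of letters, the text is probably being displayed through repr() rather than printed as a string. Assuming the script runs under Python 2 without from __future__ import print_function, print(key, value) prints a tuple, and a tuple shows its items with repr(), which escapes non-ASCII characters. Python 3's repr() keeps printable Cyrillic, so this sketch uses ascii() to reproduce the escaped form next to the readable one:

```python
# Python 3 sketch contrasting readable output with escaped (repr-style) output.
key, value = "привет", 3  # illustrative word and count

readable = f"{key} {value}"    # what print(key, value) shows in Python 3
escaped = ascii((key, value))  # like Python 2's tuple output, minus the u prefix

print(readable)
print(escaped)
```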

UPD: with key.encode("utf-8"):

[Screenshot: PyCharm console output]
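Encoding goes the other way: in Python 3, key.encode("utf-8") turns text into bytes, and a bytes object is displayed as b'...' with \x escapes (two bytes per Cyrillic letter in UTF-8), so it is no more readable than before. A sketch with an illustrative word:

```python
word = "привет"  # illustrative Cyrillic word

as_bytes = word.encode("utf-8")
print(as_bytes)                   # bytes repr with \x escapes
print(len(word), len(as_bytes))   # 6 letters, 12 bytes
print(as_bytes.decode("utf-8"))   # decoding gives readable text back
```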

UTF-8 can be painful. I created a file with one line in Latin characters and one in Russian, and ran the following code:

    # encoding: utf-8
    with open("testing.txt", "r", encoding='utf-8') as f:
        line = f.read()
        print(line)

This outputs the following in PyCharm:

[Screenshot: PyCharm console output]

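Note the two encoding entries: the # encoding: utf-8 declaration at the top of the file and the encoding='utf-8' argument to open().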
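The same experiment can be reproduced without a pre-existing testing.txt. This sketch writes a hypothetical two-line file (one Latin line, one Russian line) to a temporary directory and reads it back. Passing encoding="utf-8" to both open() calls matters because, without it, Python 3 falls back to the platform's locale encoding, which need not be UTF-8.

```python
import os
import tempfile

# Hypothetical file contents: one Latin line and one Russian line.
LINES = ["Hello, world", "Привет, мир"]


def write_and_read(path):
    """Write LINES as UTF-8, then read them back with the same explicit encoding."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(LINES))
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        text = write_and_read(os.path.join(tmp, "testing.txt"))
    print(text)
```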

Since you are getting data from a web page, you must also make sure you use the right encoding. The following code

    # encoding: utf-8
    import requests

    r = requests.get('http://www.pravda.ru/')
    r.encoding = 'utf-8'
    print(r.text)

outputs this in PyCharm:

[Screenshot: PyCharm console output]

Please note that you must set the encoding to match the page's own encoding.
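Why the encoding has to match: the server sends bytes, and decoding UTF-8 bytes with a different codec often produces garbage silently instead of raising an error. With requests, r.text is r.content decoded using r.encoding, which is why setting r.encoding changes the output; r.apparent_encoding offers a guess detected from the content itself. The sketch below avoids the network and uses an in-memory stand-in for a page body; cp1251 is only an example of a wrong guess.

```python
# In-memory stand-in for a page body that the server sent as UTF-8.
body = "Правда".encode("utf-8")

correct = body.decode("utf-8")
# These particular bytes are all valid cp1251, so no error is raised,
# but the result is mojibake rather than the original word.
wrong = body.decode("cp1251")

print(correct)
print(wrong)
```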

