Python - PyCharm console: Unicode to readable string
I am studying Python with this tutorial.
The problem appears when I try to print Cyrillic (Unicode) characters in the PyCharm console.
    import requests
    from bs4 import BeautifulSoup
    import operator
    import codecs


    def start(url):
        # Collect the lowercased words from every matching link on the page
        word_list = []
        source_code = requests.get(url).text
        soup = BeautifulSoup(source_code)
        for post_text in soup.findAll('a', {'class': 'b-tasks__item__title js-set-visited'}):
            content = post_text.string
            words = content.lower().split()
            for each_word in words:
                word_list.append(each_word)
        clean_up_list(word_list)


    def clean_up_list(word_list):
        # Strip punctuation and drop words that end up empty
        clean_word_list = []
        for word in word_list:
            symbols = "!@#$%^&*()_+{}|:<>?,./;'[]\\=-\""
            for i in range(0, len(symbols)):
                word = word.replace(symbols[i], "")
            if len(word) > 0:
                clean_word_list.append(word)
        create_dictionary(clean_word_list)


    def create_dictionary(clean_word_list):
        # Count occurrences and print them in ascending order of frequency
        word_count = {}
        for word in clean_word_list:
            if word in word_count:
                word_count[word] += 1
            else:
                word_count[word] = 1
        for key, value in sorted(word_count.items(), key=operator.itemgetter(1)):
            print(key, value)
When I change print(key, value) to print(key.decode('utf8'), value), I get "UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-7: ordinal not in range(128)".
    start('https://youdo.com/tasks-all-opened-all-moscow-1')

There are suggestions on the internet about changing the encoding in some files, but I don't get it. Why can't I read the output in the console? I am on OS X.
UPD: key.encode("utf-8")
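The error above comes down to the difference between encode and decode. A minimal sketch, assuming Python 3 (where str is already Unicode text and only bytes objects have decode); the sample word is made up for illustration and is not taken from the scraped page:

```python
# Assumption: Python 3. The sample word is illustrative only.
word = "привет"

# encode: text (str) -> bytes
encoded = word.encode("utf-8")

# decode: bytes -> text (str)
decoded = encoded.decode("utf-8")

print(type(encoded).__name__, len(encoded))
print(decoded == word)

# Encoding Cyrillic text to ASCII fails, which is the same kind of
# failure reported in the traceback above.
try:
    word.encode("ascii")
    ascii_error = ""
except UnicodeEncodeError as err:
    ascii_error = str(err)

print(ascii_error)
```

Under that assumption, printing the str directly is enough, while printing the result of encode shows a bytes literal rather than readable text.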
UTF-8 can be painful. I created a file with one line in Latin characters and one in Russian ones. The following code:
    # encoding: utf-8
    with open("testing.txt", "r", encoding='utf-8') as f:
        line = f.read()
        print(line)
prints both lines in the PyCharm console.
Note the two encoding entries.
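The two entries do different jobs: the `# encoding: utf-8` comment declares the encoding of the source file itself, while the `encoding` argument of open controls how the file's bytes are decoded into text. A small round-trip sketch, assuming Python 3; the sample text and the temporary file are stand-ins for the real testing.txt:

```python
import os
import tempfile

# Assumption: these two lines stand in for the contents of testing.txt.
sample = "hello\nпривет\n"

# Use a temporary directory so the sketch does not touch real files.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "testing.txt")

    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)

    # Passing encoding explicitly avoids relying on the platform default.
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()

    # Read the raw bytes as stored on disk for comparison.
    with open(path, "rb") as f:
        raw = f.read()

print(text)
print(len(text), len(raw))
```

The character count and the byte count differ because each Cyrillic letter takes more than one byte in UTF-8.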
Since you are getting the data from a web page, you must make sure you use the right encoding there as well. The following code
    # encoding: utf-8
    import requests

    r = requests.get('http://www.pravda.ru/')
    r.encoding = 'utf-8'  # tell requests how to decode the response body
    print(r.text)
prints the page text in the PyCharm console.
Please note that you must set the encoding to match the one used by the page.
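What a mismatch looks like can be shown without any network access. In this sketch the bytes only simulate a response body; the choice of windows-1251 (cp1251) as the page's encoding is an assumption for illustration, not something stated about any real site:

```python
# Assumption: these bytes simulate the body of a page served as windows-1251.
page_text = "Правда"
body = page_text.encode("cp1251")

# Decoding with the encoding the page really uses recovers the text.
right = body.decode("cp1251")

# latin-1 never raises, but it produces unreadable characters (mojibake).
wrong = body.decode("latin-1")

# UTF-8 rejects these bytes outright.
try:
    body.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

print(right)
print(wrong)
print(utf8_failed)
```

With requests, assigning r.encoding before reading r.text plays the same role as choosing the codec passed to decode here.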