Parse HTML table data to JSON and save to text file in Python 2.7

July 15, 2015

I'm trying to extract data on crime rates across states from a web page. The link to the page is http://www.disastercenter.com/crime/uscrime.htm

I am able to get the data into a text file, but I would like the response in JSON format. How can I do this in Python?

Here is my code:

    import urllib
    import re
    from bs4 import BeautifulSoup

    link = "http://www.disastercenter.com/crime/uscrime.htm"
    f = urllib.urlopen(link)
    myfile = f.read()
    soup = BeautifulSoup(myfile)
    soup1 = soup.find('table', width="100%")
    soup3 = str(soup1)
    # Strip every tag, leaving only the text content of the table
    result = re.sub("<.*?>", "", soup3)
    print(result)
    output = open("output.txt", "w")
    output.write(result)
    output.close()

The following code gets the data from the 2 tables and outputs all of it as a JSON-formatted string.

Working example (Python 2.7.9):

    from lxml import html
    import requests
    import re as regular_expression
    import json

    page = requests.get("http://www.disastercenter.com/crime/uscrime.htm")
    tree = html.fromstring(page.text)

    # The two data tables, located by their position in the page layout
    tables = [tree.xpath('//table/tbody/tr[2]/td/center/center/font/table/tbody'),
              tree.xpath('//table/tbody/tr[5]/td/center/center/font/table/tbody')]

    tabs = []

    for table in tables:
        tab = []
        for row in table:
            for col in row:
                var = col.text_content()
                var = var.strip().replace(" ", "")
                var = var.split('\n')
                # Keep only data rows, which start with a four-digit year
                if regular_expression.match(r'^\d{4}$', var[0].strip()):
                    tab_row = {}
                    tab_row["year"] = var[0].strip()
                    tab_row["population"] = var[1].strip()
                    tab_row["total"] = var[2].strip()
                    tab_row["violent"] = var[3].strip()
                    tab_row["property"] = var[4].strip()
                    tab_row["murder"] = var[5].strip()
                    tab_row["forcible_rape"] = var[6].strip()
                    tab_row["robbery"] = var[7].strip()
                    tab_row["aggravated_assault"] = var[8].strip()
                    tab_row["burglary"] = var[9].strip()
                    tab_row["larceny_theft"] = var[10].strip()
                    tab_row["vehicle_theft"] = var[11].strip()
                    tab.append(tab_row)
        tabs.append(tab)

    # Serialize the list of tables (each a list of row dicts) to JSON
    json_data = json.dumps(tabs)

    output = open("output.txt", "w")
    output.write(json_data)
    output.close()
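The row-to-dictionary step in the example above can also be written more compactly by pairing a list of field names with the cell values. The following is only a minimal sketch of that idea, not part of the original answer: `FIELDS`, `row_to_record`, and `sample_rows` are hypothetical names, the sample values are made up for illustration rather than real crime statistics, and it assumes each data row has already been split into a list of cell strings.

```python
import json
import re

# Field names in column order, taken from the keys used in the example above.
FIELDS = [
    "year", "population", "total", "violent", "property", "murder",
    "forcible_rape", "robbery", "aggravated_assault", "burglary",
    "larceny_theft", "vehicle_theft",
]


def row_to_record(cells):
    """Return a dict for one data row, or None if it is not a data row."""
    cells = [c.strip() for c in cells]
    # A data row has one cell per field and starts with a four-digit year.
    if len(cells) < len(FIELDS) or not re.match(r"^\d{4}$", cells[0]):
        return None
    return dict(zip(FIELDS, cells))


# Made-up rows for illustration only (not real statistics).
sample_rows = [
    ["Year", "Population", "Total"],  # header row, skipped
    ["1999", "100", "12", "2", "10", "1", "1", "3", "4", "5", "6", "7"],
]

# Convert every row and drop the ones that were rejected.
records = [r for r in (row_to_record(row) for row in sample_rows) if r]
json_data = json.dumps(records, indent=2, sort_keys=True)
print(json_data)
```

Keeping the field names in one list means a column can be renamed or added in a single place, and the length check avoids an index error on short rows such as headers.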
