sorting - Python 3: Make tie-breaking lambda sort more Pythonic?


As an exercise in Python lambdas (just so I can learn how to use them more properly), I gave myself an assignment to sort some strings based on something other than their natural string order.

I scraped Apache for version number strings and came up with a lambda to sort them based on numbers I extracted with regexes. It works, but I think it can be better; I just don't know how to improve it so that it's more robust.

    from lxml import html
    import requests
    import re

    # Send a request to the page and parse out the list of HTML links.
    jmeter_archive_url = 'https://archive.apache.org/dist/jmeter/binaries/'
    jmeter_archive_get = requests.get(url=jmeter_archive_url)
    page_tree = html.fromstring(jmeter_archive_get.text)
    list_of_links = page_tree.xpath('//a[@href]/text()')

    # Filter out the non-md5s. There are a lot of links, and
    # it's more data than is needed for this exercise.
    jmeter_md5_list = list(filter(lambda x: x.endswith('.tgz.md5'), list_of_links))

    # Here's where the 'magic' happens. Use two different regexes to rip the first
    # and second number out of the string and turn them into integers.
    # Return them in the order they were grabbed, allowing the tie break.
    # Raw strings keep the backslashes in the patterns intact.
    jmeter_md5_list.sort(key=lambda val: (int(re.search(r'(\d+)\.\d+', val).group(1)),
                                          int(re.search(r'\d+\.(\d+)', val).group(1))))
    print(jmeter_md5_list)

This does have the desired effect; the output is:

['jakarta-jmeter-2.5.1.tgz.md5', 'apache-jmeter-2.6.tgz.md5', 'apache-jmeter-2.7.tgz.md5', 'apache-jmeter-2.8.tgz.md5', 'apache-jmeter-2.9.tgz.md5', 'apache-jmeter-2.10.tgz.md5', 'apache-jmeter-2.11.tgz.md5', 'apache-jmeter-2.12.tgz.md5', 'apache-jmeter-2.13.tgz.md5']

So you can see that the strings are sorted into an order that makes sense: lowest version first and highest version last. The immediate problems I see with my solution are two-fold.

  • First, I have to create two different regexes to get the numbers I want instead of capturing groups 1 and 2. Mainly because I know there are no multiline lambdas, I don't know how to reuse a single regex match object instead of creating a second.
  • Secondly, this only works as long as the version numbers are two numbers separated by a single period. The first element is 2.5.1, which is sorted into the correct place, but the current method wouldn't know how to tie-break 2.5.2, or 2.5.3, or any string with an arbitrary number of version points.

So it works, but there's got to be a better way to do it. How can I improve this?
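To make the first point concrete, here is a minimal sketch of reusing a single match, assuming it is acceptable to move the key into a named helper. The names `VERSION_RE` and `two_part_key` are hypothetical and not part of the code above. It still only looks at two numbers, so it shares the second problem:

```python
import re

# One compiled pattern with two capturing groups, reused for every string.
# (VERSION_RE and two_part_key are hypothetical names for illustration.)
VERSION_RE = re.compile(r'(\d+)\.(\d+)')

def two_part_key(name):
    """Return (major, minor) as integers from the first 'N.N' in name."""
    match = VERSION_RE.search(name)
    # groups() yields both captured strings from the single match.
    return tuple(int(part) for part in match.groups())

names = [
    'apache-jmeter-2.10.tgz.md5',
    'jakarta-jmeter-2.5.1.tgz.md5',
    'apache-jmeter-2.6.tgz.md5',
]
names.sort(key=two_part_key)
```

With this key, 2.5.1 and 2.5.2 would both map to `(2, 5)`, so they would tie and keep their original relative order.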

This is not a full answer, but it gets you far along the road to one.

The return value of the key function can be a tuple, and tuples sort naturally. You want the output from the key function to be:

    ((2, 5, 1), 'jakarta-jmeter')
    ((2, 6), 'apache-jmeter')
    etc.
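One way to produce keys of that shape is sketched below. It assumes every file name looks like `<name>-<dotted version>.tgz.md5`, which may not hold for everything in the archive; `NAME_VERSION_RE` and `version_key` are hypothetical names chosen for this illustration.

```python
import re

# Assumed file name shape: <name>-<dotted version>.tgz.md5
# (NAME_VERSION_RE and version_key are hypothetical names for illustration.)
NAME_VERSION_RE = re.compile(r'^(.*)-(\d+(?:\.\d+)*)\.tgz\.md5$')

def version_key(filename):
    """Return ((major, minor, ...), name) for a file name of the assumed shape."""
    match = NAME_VERSION_RE.match(filename)
    if match is None:
        # Fail loudly rather than guess at a key for an unexpected name.
        raise ValueError(f'unexpected file name: {filename!r}')
    name, version = match.groups()
    # Any number of dot-separated parts becomes a tuple of integers.
    return tuple(int(part) for part in version.split('.')), name

files = [
    'apache-jmeter-2.10.tgz.md5',
    'jakarta-jmeter-2.5.2.tgz.md5',
    'apache-jmeter-2.6.tgz.md5',
    'jakarta-jmeter-2.5.1.tgz.md5',
]
files.sort(key=version_key)
```

Because tuples compare element by element, `(2, 5, 1)` sorts before `(2, 5, 2)`, which sorts before `(2, 6)` and `(2, 10)`, so an arbitrary number of version points is handled without extra tie-breaking logic.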

Do note that this is a poor use case for a lambda regardless.

