Python to combine lines in a txt file
I have a question about combining lines in a txt file.
The file contents are below (movie subtitles). I want to combine the subtitles (the English words and sentences) in each paragraph into 1 line, instead of showing them on 1, 2 or 3 separate lines.
Could you please advise which method is feasible in Python? Many thanks.
    1
    00:00:23,343 --> 00:00:25,678
    been while since here in front of you.

    2
    00:00:25,762 --> 00:00:28,847
    maybe i'll favour , stick cards.

    3
    00:00:31,935 --> 00:00:34,603
    there's been speculation involved in events occurred on freeway , rooftop...

    4
    00:00:36,189 --> 00:00:39,233
    sorry, mr stark, expect believe

    5
    00:00:39,317 --> 00:00:42,903
    bodyguard in suit conveniently appeared,

    6
    00:00:42,987 --> 00:00:45,698
    despite fact sorely despise bodyguards?

    7
    00:00:45,782 --> 00:00:46,907
    yes.

    8
    00:00:46,991 --> 00:00:51,662
    , mysterious bodyguard somehow equipped
Intuitive solution
A simple solution based on the 4 types of lines you can have:
- an empty line
- a number indicating position (no letters)
- a subtitle timing line (with a specific pattern; no letters)
- text
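The four cases above can be told apart with a few checks. The following is only a minimal sketch: `classify_line` is a hypothetical helper name, and detecting the timing line by its `-->` arrow is an assumption about the file format.

```python
import re

def classify_line(line):
    """Return 'empty', 'number', 'timing' or 'text' for one subtitle line.

    Hypothetical helper for illustration; returns 'other' for anything
    that matches none of the four cases.
    """
    stripped = line.strip()
    if stripped == '':
        return 'empty'
    if re.search('[a-zA-Z]', stripped):
        return 'text'       # any letter means it is subtitle text
    if '-->' in stripped:
        return 'timing'     # e.g. 00:00:23,343 --> 00:00:25,678
    if stripped.isdigit():
        return 'number'     # position of the subtitle
    return 'other'          # e.g. a line containing only punctuation


for sample_line in ['', '7', '00:00:45,782 --> 00:00:46,907', 'yes.']:
    print(repr(sample_line), classify_line(sample_line))
```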
You can loop over the lines, classify each one, and act accordingly.
In fact, the "action" for a non-text, non-empty line (timing or numeric) is the same. Thus:
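    import re

    with open('yourfile.txt') as f:
        exampletext = f.read()

    new = ''

    for line in exampletext.split('\n'):
        if line == '':
            # empty line: end the joined text and keep the blank separator
            new += '\n\n'
        elif re.search('[a-zA-Z]', line):  # check if there is text
            # text line: stay on the same output line
            new += line + ' '
        else:
            # number or timing line: keep it on its own line
            new += line + '\n'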
Result:
    >>> print(new)
    1
    00:00:23,343 --> 00:00:25,678
    been while since here in front of you.

    2
    00:00:25,762 --> 00:00:28,847
    maybe i'll favour , stick cards.

    ...
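The same idea can also be wrapped in a function that collects the text lines of each paragraph and joins them once, which avoids the trailing space left by `line + ' '`. This is a sketch rather than the code from the answer: the name `combine_subtitle_lines` and the sample string are made up for illustration, and the position of the line breaks in the sample is an assumption.

```python
import re

def combine_subtitle_lines(text):
    """Join consecutive text lines of each subtitle into a single line."""
    out = []        # finished output lines
    pending = []    # text lines of the current subtitle, not yet joined

    def flush():
        # Emit the collected text lines as one line.
        if pending:
            out.append(' '.join(pending))
            pending.clear()

    for line in text.split('\n'):
        if re.search('[a-zA-Z]', line):
            pending.append(line.strip())
        else:
            # Empty, numeric or timing line: close the current text first.
            flush()
            out.append(line)
    flush()
    return '\n'.join(out)


# Hypothetical sample; the position of the line breaks is an assumption.
sample = (
    "1\n"
    "00:00:23,343 --> 00:00:25,678\n"
    "first half of the sentence\n"
    "second half.\n"
    "\n"
    "2\n"
    "00:00:45,782 --> 00:00:46,907\n"
    "yes.\n"
)
print(combine_subtitle_lines(sample))
```

Number, timing and blank lines pass through unchanged, so the blank line between subtitles is preserved as it is in the input.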
Regex explained:
- [] indicates any of the characters inside
- a-z indicates the range of characters a-z
- A-Z indicates the range of characters A-Z
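A quick way to check how the character class behaves is to run it against one line of each non-empty type. In this small sketch the sample lines are taken from the subtitles above and the variable name `letters` is arbitrary. `re.search` returns a match object when a letter is found and `None` otherwise.

```python
import re

letters = re.compile('[a-zA-Z]')

for line in ['7', '00:00:45,782 --> 00:00:46,907', 'yes.']:
    # bool() turns the match object (or None) into True/False
    print(repr(line), bool(letters.search(line)))

# '7' False
# '00:00:45,782 --> 00:00:46,907' False
# 'yes.' True
```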