Splitting a CSV with awk: how to take carriage returns into account?


I have a file:

    field1|field2|field3|f41;f42|f5
    field1|field2|field3|f41|f5|
    field1|field2|field3|f41;f42;f43|f5

I want to parse it and obtain:

    field1|field2|field3|f41|f5
    field1|field2|field3|f42|f5
    ...

In short, I want to sub-split field 4 on the semicolon. My awk script is the following:

    awk < myfile.txt -F\| '{
      n=split($4,a,";");
      print $1
      for(i=0; ++i <= n;)
        print $1"|"$2"|"$3"|"a[i]"|"$5"|";
    }'

It works, however on lines not ending in "|" the first character of the following line disappears! For example, given the file above I get:

    field1|field2|field3|f41|f5
    ield1|field2|field3|f42|f5

I think this is due to the fact that there is no "|" at the end of the line. Is there a way to tell awk to take the carriage return into account?

  1. Don't write loops using wacky syntax like for(i=0; ++i <= n;), as it obfuscates your code (e.g. you need to think about whether i is 0 or 1 the first time through the loop, since it's not stated). Write them the way they're intended to be written, for (init;condition;increment): for(i=1;i <= n;i++).
  2. Don't redirect input to awk, e.g. awk < file 'script'; let awk open the file itself, awk 'script' file, so you have access to the file name (FILENAME) in your script.
  3. Don't add spurious semicolons throughout your script; it's not C.
  4. Don't print a hard-coded field separator multiple times, e.g. print $1"|"$2"|"$3"|"a[i]"|"$5; use OFS, which is designed for exactly that, instead: OFS="|";...;print $1,$2,$3,a[i],$5.
  5. Don't use strings in a regexp context unless you have an excellent reason to obfuscate, complicate, and reduce the efficiency of your code; e.g. instead of split($4,a,";") you should use split($4,a,/;/).
  6. Use white space/indentation; it's surprisingly cheap.
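To check that the loop rewrite in guideline 1 doesn't change behaviour, here is a small sketch that prints the indices each loop style visits. The helper name loop_indices is hypothetical, not part of the original answer. Both loops start at 1; the original form only implies that through the pre-increment in its condition.

```shell
#!/bin/sh
# Sketch: compare the indices visited by the two loop styles from
# guideline 1. "loop_indices" is a hypothetical helper name.
loop_indices() {
    # $1: the loop bound n
    awk -v n="$1" 'BEGIN {
        # Original style: ++i runs before the first comparison,
        # so the body first executes with i == 1, not 0.
        printf "wacky:"
        for (i = 0; ++i <= n;) printf " %d", i
        printf "\n"

        # Conventional style: the starting value is stated explicitly.
        printf "normal:"
        for (i = 1; i <= n; i++) printf " %d", i
        printf "\n"
    }'
}

loop_indices 3
```

For n=3 both lines list 1 2 3, so switching to the conventional form keeps the script's behaviour while making the starting index obvious.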

So step 1 is to rewrite your script:

    awk < myfile.txt -F\| '{
      n=split($4,a,";");
      print $1
      for(i=0; ++i <= n;)
        print $1"|"$2"|"$3"|"a[i]"|"$5"|";
    }'

as:

    awk '
    BEGIN { FS=OFS="|" }
    {
        n=split($4,a,/;/)
        print $1
        for(i=1; i<=n; i++)
            print $1, $2, $3, a[i], $5, ""
    }
    ' myfile.txt

From that, after fixing the for loop syntax, you can see you're printing the first field twice, the first time on a line of its own, so you can change it to:

    $ awk '
    BEGIN { FS=OFS="|" }
    {
        n=split($4,a,/;/)
        for(i=1; i<=n; i++)
            print $1, $2, $3, a[i], $5, ""
    }
    ' myfile.txt
    field1|field2|field3|f41|f5|
    field1|field2|field3|f42|f5|
    field1|field2|field3|f41|f5|
    field1|field2|field3|f41|f5|
    field1|field2|field3|f42|f5|
    field1|field2|field3|f43|f5|

So, is that what you wanted? Unfortunately, you used the same values in the same field positions on all your input lines, so we can't tell which output lines/fields are coming from which input lines/fields, and you didn't post the full expected output, so we can't tell if the above is the expected output or not. It's also not clear whether you want to print an empty field at the end of every output line, or whether or not you want to hard-code the number of output fields.
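If it turns out you don't want to hard-code the number of output fields, one option is sketched below. It assumes every field other than the fourth should be passed through unchanged. It assigns each piece back to $4 and prints the whole record. Assigning to a field makes awk rebuild $0 using OFS, so lines that end in "|" keep their trailing empty field and lines that don't, don't. The name split_field4 is just illustrative.

```shell
#!/bin/sh
# Sketch: split field 4 on ";" without hard-coding how many fields
# are printed. Assumption: all other fields are echoed unchanged.
# "split_field4" is a hypothetical helper name.
split_field4() {
    awk '
    BEGIN { FS = OFS = "|" }
    {
        n = split($4, a, /;/)
        for (i = 1; i <= n; i++) {
            # Assigning to a field makes awk rebuild $0 with OFS,
            # so every other field (however many) stays in place.
            $4 = a[i]
            print
        }
    }
    ' "$@"
}

printf '%s\n' 'field1|field2|field3|f41;f42|f5' \
              'field1|field2|field3|f41|f5|' \
              'field1|field2|field3|f41;f42;f43|f5' | split_field4
```

With the sample input, the first two output lines match the expected output shown in the question (no trailing "|"), while the output for the second input line keeps its trailing "|". As with the script above, a line whose fourth field is empty produces no output.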

Oh, and if characters are disappearing in your output, it's because you have control-Ms or other spurious control characters in your input file. Use cat -v to see them, and dos2unix or similar to remove them if they're control-Ms.
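If you'd rather not preprocess the file, another approach is to strip a trailing control-M inside awk before using the fields. This is an alternative to dos2unix, not something the answer requires. sub(/\r$/, "") removes it from $0, and because $0 changed, awk re-splits the record so $5 no longer ends in a carriage return. In the sketch below the function name is hypothetical, and the printf line only simulates a DOS-format input line.

```shell
#!/bin/sh
# Sketch: drop a trailing carriage return (control-M) inside awk.
# "split_field4_crlf" is a hypothetical helper name.
split_field4_crlf() {
    awk '
    BEGIN { FS = OFS = "|" }
    {
        # Remove a CR at the end of the record; modifying $0 makes
        # awk re-split it, so $5 is "f5" rather than "f5\r".
        sub(/\r$/, "")
        n = split($4, a, /;/)
        for (i = 1; i <= n; i++)
            print $1, $2, $3, a[i], $5, ""
    }
    ' "$@"
}

# Simulated DOS line (CR LF ending); cat -v would show ^M if any CR survived.
printf 'field1|field2|field3|f41;f42|f5\r\n' | split_field4_crlf | cat -v
```

Lines that already end in a plain newline pass through unaffected, since the regexp only matches a carriage return at the very end of the record.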

