string - How to calculate and substitute values in a specific column of tabular data?
Given the following input:
    mccc processed: unknown event at: tue, 14 oct 2014 12:02:26 cst
    station, mccc delay, std, cc coeff, cc std, pol , t0_times , delay_times
    zj.uno1 -0.7964 0.0051 0.9690 0.0139 0 graw.bhz 301.1263 -1.8041
    zj.dose -0.7065 0.0072 0.9760 0.0133 0 knyn.bhz 301.3372 -1.9249
    zj.tres 0.9675 0.0072 0.9548 0.0292 0 leon.bhz 301.2611 -0.1749
    phase: p
    pde 2013 7 15 14 6 58.00 -60.867 -25.143 31.0 0.0 7.3
I want to remove the mean of the 9th column (delay_times) from each of the delay_times. This requires summing the 9th column values, dividing by the number of values, and subtracting that mean from each of the values (-1.8041, -1.9249, -0.1749).
I am confused about how to begin this endeavor. I've provided a starting script below:
    #!/usr/bin/perl
    use strict;
    use warnings;

    open my $file, '<', "file.txt" or die $!;
    while (<$file>) {
        # fields are 0-indexed: station is field 0, delay_times is field 8
        my ($name, $time) = (split /\s+/, $_)[0,8];
        # calculate the mean of the 9th column for every row that begins with zj,
        # and subtract the mean from each value (time) in the 9th column.
    }
    # output a new file with the mean removed from each "time" in the 9th column
Would this be easier in awk, or Perl? Thank you.
Your attempted Perl solution was pretty spot on - you just didn't finish it :-)
An "extended one-liner" follows:
    perl -anE 'push @f,[@F] }{ for (@f){ $s += $_->[8] , $n++ if $_->[0] =~ /zj/ } say $_->[0] =~ /zj/ ? ( "@{$_}[0..7] ", $_->[8]-($s/$n) ) : "@$_" for @f' data.txt
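This is slightly shortened by using -an (see perlrun), but it is not too golfish. Like the awk solution from @john1024, it goes through the data twice - in this case with two for loops. It uses the ternary operator (<cond> ? <if true> : <if false>) to print out - or say - each of the lines, either as is (@$_), or with the field substitution.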
Output:
    mccc processed: unknown event at: tue, 14 oct 2014 12:02:26 cst
    station, mccc delay, std, cc coeff, cc std, pol , t0_times , delay_times
    zj.uno1 -0.7964 0.0051 0.9690 0.0139 0 graw.bhz 301.1263 -0.5028
    zj.dose -0.7065 0.0072 0.9760 0.0133 0 knyn.bhz 301.3372 -0.6236
    zj.tres 0.9675 0.0072 0.9548 0.0292 0 leon.bhz 301.2611 1.1264
    phase: p
    pde 2013 7 15 14 6 58.00 -60.867 -25.143 31.0 0.0 7.3
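The substituted values can be checked by hand: the three delay_times sum to -3.9039, so their mean is -1.3013, and subtracting that from each value gives the last column of the output. A small sketch of the arithmetic, written as a shell function around awk (the name show_mean and the four-decimal format are assumptions made for illustration, not part of the answer):

```shell
# Hypothetical helper: prints the mean of the three sample delay_times
# and each value with that mean removed.
show_mean() {
  printf '%s\n' -1.8041 -1.9249 -0.1749 |
  awk '{ v[NR] = $1; sum += $1 }    # remember each value, keep a running sum
       END {
         mean = sum / NR            # NR is the number of values read
         printf "mean: %.4f\n", mean
         for (i = 1; i <= NR; i++) printf "%.4f -> %.4f\n", v[i], v[i] - mean
       }'
}

show_mean
# mean: -1.3013
# -1.8041 -> -0.5028
# -1.9249 -> -0.6236
# -0.1749 -> 1.1264
```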
As a script it might read as follows:
    use v5.16;    # enables strict and say

    my (@timedata, $rec, $sum, $n);

    # read every line and store its whitespace-separated fields
    while (<DATA>) {
        push @timedata, [ split(" ") ];
    }

    # first loop: sum the 9th field of the zj rows and count them
    foreach $rec (@timedata) {
        $sum += $rec->[8], $n++ if $rec->[0] =~ /zj/;
    }

    # second loop: print each row, substituting the mean-removed value in zj rows
    foreach $rec (@timedata) {
        say $rec->[0] =~ /zj/ ? ( "@{$rec}[0..7] ", $rec->[8]-($sum/$n) ) : "@$rec";
    }

    __DATA__
    mccc processed: unknown event at: tue, 14 oct 2014 12:02:26 cst
    station, mccc delay, std, cc coeff, cc std, pol , t0_times , delay_times
    zj.uno1 -0.7964 0.0051 0.9690 0.0139 0 graw.bhz 301.1263 -1.8041
    zj.dose -0.7065 0.0072 0.9760 0.0133 0 knyn.bhz 301.3372 -1.9249
    zj.tres 0.9675 0.0072 0.9548 0.0292 0 leon.bhz 301.2611 -0.1749
    phase: p
    pde 2013 7 15 14 6 58.00 -60.867 -25.143 31.0 0.0 7.3
There may be a way to avoid the two loops (combining them into a while or map instead of for wouldn't count though), but creating the sum and average in one pass and then substituting them in makes the script clear and simple.
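For comparison with the awk approach mentioned earlier, the two passes can also be made over the file itself instead of over an array held in memory. The following is only a sketch of that idea, not @john1024's actual answer; the function name demean_col9, the file name data.txt, the anchored /^zj/ match and the %.4f output format are all assumptions.

```shell
# Sample input taken from the question (lowercase, as shown there).
cat > data.txt <<'EOF'
mccc processed: unknown event at: tue, 14 oct 2014 12:02:26 cst
station, mccc delay, std, cc coeff, cc std, pol , t0_times , delay_times
zj.uno1 -0.7964 0.0051 0.9690 0.0139 0 graw.bhz 301.1263 -1.8041
zj.dose -0.7065 0.0072 0.9760 0.0133 0 knyn.bhz 301.3372 -1.9249
zj.tres 0.9675 0.0072 0.9548 0.0292 0 leon.bhz 301.2611 -0.1749
phase: p
pde 2013 7 15 14 6 58.00 -60.867 -25.143 31.0 0.0 7.3
EOF

# Hypothetical helper: the file is named twice so that awk reads it twice.
demean_col9() {
  awk '
    # First pass (NR == FNR holds only for the first copy): sum column 9.
    NR == FNR {
      if ($1 ~ /^zj/) { sum += $9; n++ }
      next
    }
    # Second pass: replace column 9 with the value minus the mean.
    $1 ~ /^zj/ { $9 = sprintf("%.4f", $9 - sum / n) }
    { print }
  ' "$1" "$1"
}

demean_col9 data.txt
```

Because the file is opened twice, this form needs a real file name; it would not work on input arriving through a pipe, where collecting the lines first (as the Perl script does with @timedata) is the usual way around that.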