sorting - Bug in bash sort with different columns?


I am working with a file that contains 3 values: an ID (they happen to be protein IDs, in case you're curious), a value, and another value. It is tab delimited and looks like this:

    a2m         0.979569315988908       1
    aacs        0.925340159491081       1
    aagab       0.982296215686199       1
    aak1        0.736903840140103       1
    aamp        0.00589711816127862     0.138868449447202
    aars2       1                       1
    aars        3.13300124295614e-05    0.00212792325492566
    aarsd1      0.527417792161261       1
    aasdh       0.869909252023668       1
    aasdhppt    0.763918221284724       1
    aatf        0.691907759125663       1
    abat        0.989693691462661       1
    abca1       0.601194017450064       1
    abca5       1                       1
    abca6       1                       1

I am interested in sorting these IDs in alphabetical order and extracting the various values. However, I noticed that sort orders the IDs differently depending on which column I am extracting. When I execute:

    cut --fields=1,2 input.txt | sort --key=1

The resulting file is:

    a2m         0.979569315988908
    aacs        0.925340159491081
    aagab       0.982296215686199
    aak1        0.736903840140103
    aamp        0.00589711816127862
    aars2       1
    aars        3.13300124295614e-05
    aarsd1      0.527417792161261
    aasdh       0.869909252023668
    aasdhppt    0.763918221284724
    aatf        0.691907759125663
    abat        0.989693691462661
    abca1       0.601194017450064
    abca5       1
    abca6       1

But when I execute:

    cut --fields=1,3 input.txt | sort --key=1

I get:

    a2m         1
    aacs        1
    aagab       1
    aak1        1
    aamp        0.138868449447202
    aars        0.00212792325492566
    aars2       1
    aarsd1      1
    aasdh       1
    aasdhppt    1
    aatf        1
    abat        1
    abca1       1
    abca5       1
    abca6       1

Notice that the positions of aars and aars2 are switched, which shouldn't happen since I am sorting based on the first column. I've never seen this behavior from sort, and I've been using bash for a while now. Is this a bug, or am I doing something wrong?

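The --key=1 option tells sort to use the "fields" from the first one through the end of the line as the sort key, so the extracted value column takes part in the comparison too. As @rici observed first, the default is a locale-sensitive sort, and in many locales whitespace is ignored for collation purposes. That's what seems to be happening here: with the whitespace ignored, the aars2 line compares roughly as "aars21", which sorts before "aars3.13..." when you extract column 2 but after "aars0.0021..." when you extract column 3.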

If you want to sort only on the protein IDs, do this:

    cut --fields=1,2 input.txt | sort --key=1,1
    cut --fields=1,3 input.txt | sort --key=1,1
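As a quick sanity check, here is a minimal sketch that rebuilds three rows of the question's data in a scratch file and confirms that, with the key restricted to field 1, both extractions give the same ID order. The scratch file and the variable names (`input`, `ids_col2`, `ids_col3`) are made up for illustration, and it assumes GNU `cut`, `sort`, and `mktemp`.

```shell
#!/usr/bin/env bash
# Hypothetical scratch file holding three rows from the question (tab delimited).
input=$(mktemp)
printf 'aars2\t1\t1\n' > "$input"
printf 'aars\t3.13300124295614e-05\t0.00212792325492566\n' >> "$input"
printf 'aamp\t0.00589711816127862\t0.138868449447202\n' >> "$input"

# Sort on field 1 only, then keep just the ID column so the two orders can be compared.
ids_col2=$(cut --fields=1,2 "$input" | sort --key=1,1 | cut --fields=1)
ids_col3=$(cut --fields=1,3 "$input" | sort --key=1,1 | cut --fields=1)

printf '%s\n' "$ids_col2"
rm -f "$input"
```

Because the key stops at the end of field 1, the value column no longer influences where aars and aars2 land, so `ids_col2` and `ids_col3` should be identical.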

@rici explains how to approach the problem by specifying a collation order that accounts for whitespace.
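One common way to get a collation order that does not skip whitespace is to force the C locale for the sort command only. The sketch below shows that idea; it is not necessarily the exact approach @rici describes, and the scratch file and variable names (`input`, `c_col2`, `c_col3`) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical scratch file with the two rows that swapped places in the question.
input=$(mktemp)
printf 'aars2\t1\t1\n' > "$input"
printf 'aars\t3.13300124295614e-05\t0.00212792325492566\n' >> "$input"

# LC_ALL=C makes sort compare raw bytes, so the tab after "aars" (byte 9)
# sorts before the "2" in "aars2" (byte 50), whichever value column follows.
c_col2=$(cut --fields=1,2 "$input" | LC_ALL=C sort --key=1 | cut --fields=1)
c_col3=$(cut --fields=1,3 "$input" | LC_ALL=C sort --key=1 | cut --fields=1)

printf '%s\n' "$c_col2" "$c_col3"
rm -f "$input"
```

Setting `LC_ALL=C` on the command line affects only that one `sort` invocation, so the rest of the shell session keeps its usual locale. Note that byte order also changes how uppercase, lowercase, and non-ASCII characters are ranked, which may or may not be what you want for other data.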

