shell - Recursively search directory of binary files for hexadecimal sequence?
The current commands I'm using to search for hex values (say 0a 8b 02) involve:
find . -type f -not -name "*.png" -exec xxd -p {} \; | grep "0a8b02" || xargs -0 -P 4
Is it possible to improve this, given the following goals?
- search files recursively
- display the offset and filename
- exclude files with certain extensions (the example above does not search .png files)
- speed: the search needs to handle 200,000 files (around 50 KB to 1 MB each) in a directory totaling ~2 GB.
I'm not confident that xargs is actually working with 4 processors. I'm also having difficulty printing the filename when grep finds a match, since its input is piped from xxd. Any suggestions?
If:
- you have GNU grep
- and the hex bytes you search for never contain newlines (0x0a) [1]
  - if they contain NUL (0x00), you must provide the grep search string via a file (-f) rather than as a direct argument,
then the following command should get you there, using the example of searching for 0e 8b 02:
LC_ALL=C find . -type f -not -name "*.png" -exec grep -FHoab $'\x0e\x8b\x02' {} + | LC_ALL=C cut -d: -f1-2
The grep command produces output lines as follows:
<filename>:<byte-offset>:<matched-bytes>
which LC_ALL=C cut -d: -f1-2 then reduces to <filename>:<byte-offset>.
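To make the output format concrete, here is a minimal sketch that runs the same pipeline against a throwaway directory. The file names sample.bin and skipped.png are hypothetical, GNU grep is assumed, and the pattern is built with octal printf escapes (the same bytes as $'\x0e\x8b\x02') so that the sketch does not depend on ANSI C quoting being available.

```shell
# Throwaway directory with two small binary files (hypothetical names).
demo_dir=$(mktemp -d)
# sample.bin: 4 filler bytes, then 0e 8b 02 starting at byte offset 4.
printf 'ABCD\016\213\002EF' > "$demo_dir/sample.bin"
# skipped.png also contains the sequence, but find excludes *.png.
printf '\016\213\002' > "$demo_dir/skipped.png"

# Octal escapes yield the same three bytes as $'\x0e\x8b\x02' in bash.
pattern=$(printf '\016\213\002')

# Same pipeline as above: grep prints <filename>:<byte-offset>:<matched-bytes>,
# and cut keeps only the first two colon-separated fields.
result=$(cd "$demo_dir" &&
  LC_ALL=C find . -type f -not -name "*.png" \
    -exec grep -FHoab -e "$pattern" {} + | LC_ALL=C cut -d: -f1-2)

printf '%s\n' "$result"
rm -rf "$demo_dir"
```

With GNU grep this should print ./sample.bin:4, and nothing for the excluded .png file.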
The command almost works with BSD grep, except that the byte offset reported is invariably that of the start of the line on which the pattern matched.
In other words, the byte offset is correct only if no newlines precede the match in the file.
Also, BSD grep doesn't support NUL (0x00) bytes as part of the search string, not even when it is provided via a file with -f.
- Note that there'll be no parallel processing, but only a few grep invocations, based on using find's -exec ... +, which, like xargs, passes as many filenames as will fit on a command line to grep at once.
- By letting grep search for the byte sequence directly, there is no need for xxd:
  - The sequence is specified as an ANSI C-quoted string, which means that the escape sequences are expanded to literals by the shell, enabling grep to search for the resulting string as a literal (via -F), which is faster. The linked article is from the bash manual, but such strings work in zsh (and ksh) too.
  - A GNU grep alternative is to use -P (support for PCREs, Perl-compatible regular expressions) with non-pre-expanded escape sequences, but this is slower: grep -PHoab '\x{0e}\x{8b}\x{02}'
- LC_ALL=C ensures that grep treats each byte as its own character, without applying any encoding rules.
- -F treats the search string as a literal (rather than a regex).
- -H prepends the relevant input filename to each output line; note that grep does this implicitly when given more than one filename argument.
- -o reports only the matched strings (byte sequences), not the whole line (the concept of a line has no meaning in binary files anyway) [2].
- -a treats binary files as if they were text files (without this, grep would print only the text "Binary file <filename> matches" for binary input files with matches).
- -b reports the byte offsets of matches.
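If parallelism is wanted after all, as with the xargs -P 4 attempt in the question, one possible variant hands the filenames to xargs instead of -exec ... +. This is only a sketch under assumptions: it relies on find's -print0 and on xargs supporting -0, -P and -n (GNU findutils does), the batch size of 1000 is an arbitrary illustrative value, and the file names are hypothetical. Output from concurrently running grep processes arrives in no particular order, so the result is sorted here.

```shell
# Throwaway directory with hypothetical test files.
demo_dir=$(mktemp -d)
printf 'AB\016\213\002' > "$demo_dir/a.bin"   # sequence at byte offset 2
printf '\016\213\002'   > "$demo_dir/b.bin"   # sequence at byte offset 0
printf '\016\213\002'   > "$demo_dir/c.png"   # excluded by find

# Same bytes as $'\x0e\x8b\x02' in bash.
pattern=$(printf '\016\213\002')

# -print0 / -0 pass filenames NUL-separated, so unusual names are safe.
# -P 4 allows up to 4 grep processes at once; -n 1000 caps the number of
# filenames per grep invocation so that the work is split into batches.
parallel_result=$(cd "$demo_dir" &&
  LC_ALL=C find . -type f -not -name "*.png" -print0 |
  LC_ALL=C xargs -0 -P 4 -n 1000 grep -FHoab -e "$pattern" |
  LC_ALL=C cut -d: -f1-2 | sort)

printf '%s\n' "$parallel_result"
rm -rf "$demo_dir"
```

Whether this is actually faster than the single-process version depends on whether the search is limited by CPU or by disk reads, so it is worth timing both on the real data set.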
If it's sufficient to find at most one match in a given input file, add -m 1 (grep then stops reading a file after its first matching line).
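The effect of -m 1 can be seen on a small file that contains the sequence twice. This is a sketch assuming GNU grep; the file name two.bin is hypothetical.

```shell
demo_dir=$(mktemp -d)
# two.bin holds the sequence twice, separated by a newline:
# at byte offset 1 (after 'x') and at byte offset 6 (after the newline and 'y').
printf 'x\016\213\002\ny\016\213\002' > "$demo_dir/two.bin"

# Same bytes as $'\x0e\x8b\x02' in bash.
pattern=$(printf '\016\213\002')

# Without -m 1: every match is reported.
all_matches=$(cd "$demo_dir" &&
  LC_ALL=C grep -FHoab -e "$pattern" ./two.bin | LC_ALL=C cut -d: -f1-2)

# With -m 1: grep stops reading the file after the first matching line.
first_match=$(cd "$demo_dir" &&
  LC_ALL=C grep -FHoab -m 1 -e "$pattern" ./two.bin | LC_ALL=C cut -d: -f1-2)

printf 'all:\n%s\nfirst only:\n%s\n' "$all_matches" "$first_match"
rm -rf "$demo_dir"
```

Here the unrestricted search should report offsets 1 and 6, while the -m 1 search reports only offset 1.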
[1] Newlines cannot be used, because grep invariably treats newlines in a search-pattern string as separating multiple search patterns. Also, grep is line-based, so you can't match across lines. GNU grep's --null-data option, which splits the input by NUL bytes, could help, but only if your search byte sequence doesn't also comprise NUL bytes; you'd also have to represent your byte values as escape sequences in a regex combined with -P, because you'll need to use the escape sequence \n in lieu of actual newlines.
[2] -o is needed to make -b report the byte offset of the match, as opposed to that of the beginning of the line (as stated, BSD grep always does the latter, unfortunately). Additionally, it is beneficial to report only the matches themselves here, because an attempt to print the entire line would result in unpredictably long output lines, given that there's no concept of lines in binary files. Either way, however, outputting bytes from a binary file may cause strange rendering behavior in the terminal.