hadoop - Pig load multiple sequential files
Assume there are multiple files in a directory. If one passes the directory to the Pig load statement:
a = load '/somedir/'
it loads all the files at once (I think in order, but I'm not sure). If the file names are dynamic and in sequence, e.g. named by date, how can one call Pig load so that the files are read in order? Or can the Unix directory listing command ls be used?
/somedir$ ls
20150101.csv 20150102.csv 20150104.csv .......
# Pig should load the files at once while keeping their order
The Pig load statement is used to read input data from a specified location. Suppose the Pig command is:
a = load '/data/examples/file.txt';
This tells Pig to read the data from file.txt, located in /data/examples/.
Now suppose the Pig command is:
a = load '/data/examples/';
and the directory contains multiple files, say:
20150101.csv 20150102.csv 20150104.csv
This tells Pig to read the data from the directory /data/examples/ as a whole.
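In this case, Pig finds all files under the directory you specify and uses them as input for the load statement, reading them starting from the first file. Note, however, that a Pig relation is an unordered bag of tuples, so the load by itself does not guarantee the order of the rows in later steps.
If the directory you specify contains other directories, the files in those directories are included as well.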
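If the order matters downstream, one option is to sort explicitly instead of relying on the read order. The sketch below is only an illustration under some assumptions: it assumes Pig 0.12 or later (where PigStorage accepts the -tagFile option, which prepends the source file name to each row), comma-separated data, and hypothetical field names col1 and col2.

```pig
-- Sketch only: assumes Pig 0.12+ (for -tagFile) and comma-separated files
-- named by date, e.g. 20150101.csv. The field names col1 and col2 are hypothetical.
a = LOAD '/data/examples/' USING PigStorage(',', '-tagFile')
    AS (filename:chararray, col1:chararray, col2:chararray);

-- To restrict the input to some files, a Hadoop glob can be used in the path instead,
-- for example: LOAD '/data/examples/201501*.csv'

-- Relations are unordered, so sort explicitly by the date-based file name.
sorted = ORDER a BY filename ASC;

DUMP sorted;
```

Because the file names are zero-padded dates, sorting them as strings gives chronological order. This only orders rows by file; the order of rows that come from the same file is still not guaranteed unless the data itself has a field to sort on.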
The links below are useful for understanding the load function in depth:
http://pig.apache.org/docs/r0.8.1/udf.html#load+functions
http://chimera.labs.oreilly.com/books/1234000001811/ch05.html#pl_load