hadoop - Pig load multiple sequential files -


Assume a directory contains multiple files. If one passes the directory to Pig's load, as in a = load '/somedir/', it loads all the files at once (I think in order, but I'm not sure). If the file names are dynamic and sequential, e.g. named by date, how can one make Pig load them in order? Or can a Unix directory-listing command such as ls be used?

/somedir$ ls
20150101.csv 20150102.csv 20150104.csv .......
# Pig should load the files at once while keeping the order

The Pig load statement reads input data from a specified location. Suppose the Pig command is:

a = load '/data/examples/file.txt'; 

This tells Pig to read the data from file.txt, located in /data/examples/.

Now suppose the Pig command is a = load '/data/examples/'; and the directory contains multiple files, say:

20150101.csv 20150102.csv 20150104.csv 

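This tells Pig to read from the directory /data/examples/. In this case, Pig finds all the files under the directory you specify and uses them as input for the load statement. Note that Pig does not guarantee that the files are read sequentially starting from the first one: the input is divided into splits that may be processed in parallel, and a relation is an unordered bag. If the order matters, sort explicitly with ORDER BY after loading.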
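Since the file names in the question encode the date, one way to get a date-ordered relation is to tag each record with the name of the file it came from and sort on that tag. The following is only a sketch: it assumes Pig 0.12 or later, where PigStorage accepts the '-tagFile' option, and it assumes the files are comma-delimited.

```pig
-- Sketch only: assumes Pig 0.12+ ('-tagFile' option) and comma-delimited input.
-- '-tagFile' prepends the source file name to every record as the first field ($0).
a = LOAD '/data/examples/' USING PigStorage(',', '-tagFile');

-- Names such as 20150101.csv sort lexicographically in date order,
-- so ordering by the file-name field orders the records by date.
b = ORDER a BY $0;

DUMP b;
```

This orders records by file only. The relative order of rows inside a single file is not preserved unless the data carries its own sort key.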

If the directory you specify contains other directories, the files in those directories are included as well.
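To load only some of the date-named files rather than everything under the directory, the load path can be a glob, since Pig passes the location to Hadoop, which expands glob patterns. The paths below reuse the file names from the question. Treating the files as comma-delimited is an assumption.

```pig
-- Sketch only: glob patterns are expanded by Hadoop's file system layer.
-- Load every January 2015 file in the directory.
jan = LOAD '/data/examples/201501*.csv' USING PigStorage(',');

-- Load an explicit set of files using brace alternation.
some_days = LOAD '/data/examples/{20150101,20150102}.csv' USING PigStorage(',');
```

As with a plain directory load, a glob selects which files are read but not the order in which their records arrive.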

The links below are useful for understanding the load function in depth:

http://pig.apache.org/docs/r0.8.1/udf.html#load+functions

http://chimera.labs.oreilly.com/books/1234000001811/ch05.html#pl_load

http://pig.apache.org/docs/r0.8.1/piglatin_ref2.html#load

