python - Hadoop non-splittable TextInputFormat


Is there a way to have a whole file sent to a mapper without it being split?

I have read this, and I'm wondering if there is a way of doing the same thing without having to generate an intermediate file. Ideally, there would be an existing option on the Hadoop command line.

I am using the streaming facility with Python scripts on Amazon EMR.

Just set the configuration property mapred.min.split.size to something huge (10G):

-D mapred.min.split.size=10737418240
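Here is a minimal sketch of where that option goes when launching a streaming job from Python. It assumes a placeholder jar name (`hadoop-streaming.jar`) and hypothetical S3 paths; the real jar location depends on your Hadoop/EMR install. Generic options like `-D` have to come before the streaming options (`-input`, `-mapper`, ...). Also keep in mind that the minimum split size only stops splitting for files smaller than that value: a file larger than 10 GiB would still be cut into roughly 10 GiB pieces. On newer Hadoop releases the same setting is, as far as I know, also available under the name `mapreduce.input.fileinputformat.split.minsize`.

```python
import shlex

# 10 GiB in bytes: 10 * 1024**3 == 10737418240, the value used above.
MIN_SPLIT_SIZE = 10 * 1024 ** 3


def streaming_command(input_path, output_path, mapper,
                      streaming_jar="hadoop-streaming.jar"):
    """Build a Hadoop streaming command line that requests huge splits.

    `streaming_jar` is a placeholder; adjust it for your installation.
    """
    return [
        "hadoop", "jar", streaming_jar,
        # Generic options such as -D must precede the streaming options.
        "-D", f"mapred.min.split.size={MIN_SPLIT_SIZE}",
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
    ]


if __name__ == "__main__":
    # Hypothetical bucket and script names, for illustration only.
    cmd = streaming_command("s3://my-bucket/input/",
                            "s3://my-bucket/output/",
                            "mapper.py")
    print(shlex.join(cmd))
```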

Or compress the input file with a codec that isn't splittable (gzip). For a file with a .gz extension, TextInputFormat returns false from its isSplitable(FileSystem, Path) method, so the whole file goes to a single mapper.
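If you prefer the gzip route, the compression step can be done from Python before uploading the input. This is a rough sketch using only the standard library. The function name `gzip_for_hadoop` is made up for illustration. The `.gz` suffix is what matters, because Hadoop picks the codec from the file extension. The Python mapper itself does not change: Hadoop decompresses the file and the mapper still reads plain text lines on stdin.

```python
import gzip
import shutil
from pathlib import Path


def gzip_for_hadoop(src):
    """Compress `src` to `src.gz` so TextInputFormat won't split it.

    Hypothetical helper: returns the path of the compressed file.
    """
    src = Path(src)
    dst = src.with_name(src.name + ".gz")
    # Stream the bytes through gzip instead of loading the whole file.
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print(gzip_for_hadoop(name))
```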

