python - Hadoop Non-splittable TextInputFormat
Is there a way to have a whole file sent to a mapper without it being split?
I have read this and was wondering if there is a way of doing the same thing without having to generate an intermediate file. Ideally, it would be an existing option on the Hadoop command line.
I am using the streaming facility with Python scripts on Amazon EMR.
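Just set the configuration property mapred.min.split.size to something huge (10 GB):
-D mapred.min.split.size=10737418240
This only keeps a file whole if it is smaller than that value; a larger file can still be split.
Or compress the input file using a codec that isn't splittable (such as gzip). For a file with a .gz extension, TextInputFormat returns false from its isSplitable(FileSystem, Path) method.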
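If you go the gzip route, something like the following could produce the compressed copy locally before you upload it. This is only a rough sketch: the helper name gzip_copy and the default ".gz" output path are my own choices for illustration, not anything provided by Hadoop or EMR.

```python
import gzip
import shutil


def gzip_copy(src_path, dst_path=None):
    """Write a gzip-compressed copy of src_path and return the new path.

    gzip_copy is a hypothetical helper name; the ".gz" suffix matters
    because the codec is selected from the file extension.
    """
    if dst_path is None:
        dst_path = src_path + ".gz"
    # Stream the bytes across so large inputs are not loaded into memory.
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path
```

Calling gzip_copy("input.txt") would write input.txt.gz next to the original; the compressed file is then what you would pass as the job input.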