python - Hadoop Non-splittable TextInputFormat
Is there a way to have a whole file sent to a mapper without it being split?
I have read this, but I am wondering if there is a way of doing the same thing without having to generate an intermediate file. Ideally, it would be an existing option on the Hadoop command line.
I am using the streaming facility with Python scripts on Amazon EMR.
Just set the configuration property mapred.min.split.size to something huge (10 GB):
-D mapred.min.split.size=10737418240
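For reference, the value above is 10 GiB expressed in bytes (10 * 1024^3). If you build the streaming command from a Python script, a minimal sketch of generating that argument could look like this. The helper name min_split_size_args is hypothetical and not part of Hadoop; only the property name comes from the answer.

```python
def min_split_size_args(gib=10):
    """Return generic-option arguments that raise the minimum split size.

    Hypothetical helper: it only formats the argument, it does not call Hadoop.
    """
    size_bytes = gib * 1024 ** 3  # GiB -> bytes
    return ["-D", "mapred.min.split.size={}".format(size_bytes)]


# Print the argument as it would appear on the command line.
print(" ".join(min_split_size_args(10)))
# prints: -D mapred.min.split.size=10737418240
```

Any input file smaller than this size should then end up in a single split, so pick a value larger than your biggest file.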
Or compress the input file using a codec that isn't splittable (gzip). With a .gz extension, TextInputFormat will return false from its isSplitable(FileSystem, Path) method.
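If you go the gzip route, the compression can be done from Python before uploading the input. This is only a sketch using the standard library; the function name gzip_input and the example path are made up for illustration, and you still need to upload the resulting .gz file to wherever your job reads its input.

```python
import gzip
import shutil


def gzip_input(src_path, dst_path=None):
    """Compress src_path with gzip and return the path of the .gz file.

    Hypothetical helper: the .gz extension is what marks the file as
    gzip-compressed for the input format.
    """
    dst_path = dst_path or src_path + ".gz"
    # Stream the copy so large inputs are not loaded into memory at once.
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path


# Example usage (the path is an assumption):
# gzip_input("input.txt")  # writes input.txt.gz next to the original
```

Note that this does create a second, compressed copy of the file, so the min split size option is the closer fit if you want to avoid any extra file.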