kmrshell

NAME

kmrshell - map-reduce by shell command pipeline

SYNOPSIS

kmrshell -m mapper -r reducer file

DESCRIPTION

kmrshell performs map-reduce by pipelining shell commands, an approach known as "streaming" in Hadoop. It forks and execs a mapper, a shuffler, and a reducer as separate processes and joins them through pipes. kmrshell reads a file and passes its contents to the mapper. When a directory name is given instead of a file, kmrshell reads all regular files directly under that directory (it does not recurse into subdirectories) and passes them to the mapper. The shuffler is implemented with KMR.
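
The three-stage pipeline described above can be approximated in Python as a rough sketch. This is not how kmrshell is implemented; it only illustrates the idea of joining a mapper, a shuffler, and a reducer through pipes. The function name run_pipeline and its parameters are hypothetical, and the shuffler here is whatever command the caller supplies (for example sort), standing in for the KMR-based shuffler.

```python
import subprocess


def run_pipeline(mapper_cmd, shuffler_cmd, reducer_cmd, input_path):
    """Run mapper | shuffler | reducer over one input file and return
    the reducer's STDOUT as bytes. Each command is an argv list."""
    with open(input_path, "rb") as src:
        # The mapper reads the input file on STDIN.
        mapper = subprocess.Popen(
            mapper_cmd, stdin=src, stdout=subprocess.PIPE)
        # The shuffler reads the mapper's output through a pipe.
        shuffler = subprocess.Popen(
            shuffler_cmd, stdin=mapper.stdout, stdout=subprocess.PIPE)
        # Close the parent's copy so the mapper sees a broken pipe
        # if the shuffler exits early.
        mapper.stdout.close()
        # The reducer reads the shuffler's output through a second pipe.
        reducer = subprocess.Popen(
            reducer_cmd, stdin=shuffler.stdout, stdout=subprocess.PIPE)
        shuffler.stdout.close()
        output, _ = reducer.communicate()
        mapper.wait()
        shuffler.wait()
    return output
```

On a POSIX system, a call such as run_pipeline(["cat"], ["sort"], ["cat"], "input.txt") would pass the file through unchanged except for the sorting done by the stand-in shuffler.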

OPTIONS

The following options are supported:

-m mapper

Specifies a mapper program. The program may take arguments, which are separated by whitespace.

Mapper specification: A mapper reads data from STDIN and writes key-value data to STDOUT. The output is a sequence of lines of the form "key value\n", where the key and the value are separated by whitespace.
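
As an illustration of the mapper specification, here is a minimal word-count mapper in Python. The script and the function name map_lines are hypothetical; the only parts taken from the specification are reading from STDIN and writing "key value\n" lines to STDOUT.

```python
#!/usr/bin/env python3
"""Hypothetical word-count mapper: emits "word 1" for each input word."""
import sys


def map_lines(lines):
    """Yield one "key value" record per word found in the input lines."""
    for line in lines:
        for word in line.split():
            # The key (the word) and the value (1) are separated
            # by a single space.
            yield f"{word} 1"


if __name__ == "__main__":
    for record in map_lines(sys.stdin):
        sys.stdout.write(record + "\n")
```

Because keys and values are separated by whitespace, a mapper written this way should not emit keys that themselves contain whitespace.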

-r reducer

Specifies a reducer program. The program may take arguments, which are separated by whitespace.

Reducer specification: A reducer reads key-value data from STDIN and writes the result to STDOUT. The input is a sequence of lines of the form "key value\n", where the key and the value are separated by whitespace. Lines with the same key appear consecutively.
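
As an illustration of the reducer specification, here is a minimal summing reducer in Python. The script and the function name reduce_lines are hypothetical, and the sketch assumes that every value is an integer and every line is well-formed. It relies on the guarantee that lines with the same key arrive consecutively, so it can group adjacent lines without holding the whole input in memory.

```python
#!/usr/bin/env python3
"""Hypothetical reducer: sums the integer values seen for each key."""
import sys
from itertools import groupby


def reduce_lines(lines):
    """Yield one "key total" record per run of consecutive equal keys."""
    # Split each non-empty line into [key, value] at the first whitespace.
    records = (line.split(None, 1) for line in lines if line.strip())
    # groupby only merges adjacent records, which matches the guarantee
    # that identical keys arrive on consecutive lines.
    for key, group in groupby(records, key=lambda rec: rec[0]):
        total = sum(int(rec[1]) for rec in group)
        yield f"{key} {total}"


if __name__ == "__main__":
    for record in reduce_lines(sys.stdin):
        sys.stdout.write(record + "\n")
```

Given the input lines "a 1", "a 1", and "b 1", this sketch would produce "a 2" and "b 1".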