COMMAND（sorter）

sorter(USP)

Name

sorter : Split a file based on a key

(The file doesn't need to be sorted on the key)

Synopsis

Usage : sorter [options] <filename> <file>

Options : -d : delete the key field from output records

          -a : append to existing output files

          -z : compress output files with gzip

          -s : storage size

Version : Tue Jan 9 09:02:34 JST 2024

Edition : 1

Description

Read <file> and write records that share the same value in

the key field to separate files using <filename>. For example, to

write records with the same value in field 2 to separate files,

specify "data.%2" as <filename>. The output files will be named

data.<value of field 2>. Unlike keycut, sorter does not require

the input file to be sorted on the key field. Also, records are

written to each output file in the order they appear in the input

file, so the output files are not sorted. The key field is specified

in <filename> using %<field number>, but you can also specify

substrings such as %5.2 or %5.1.3.
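The splitting behavior described above can be approximated with a short awk command. This is only a sketch of what "sorter data.%1 data" does for a key in field 1, not how sorter itself is implemented; the scratch directory and the shortened sample records are assumptions made for the illustration.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Shortened sample records (assumed for this sketch), unsorted on field 1.
cat > data <<'EOF'
04 Connecticut 13 Hartford
01 Texas 03 Houston
03 New_Jersey 10 Newark
01 Texas 01 Austin
EOF

# One output file per distinct value of field 1.
# Within a single awk run, ">" truncates a file on first use and then
# keeps writing to it, so records stay in input order.
awk '{ out = "data." $1; print > out }' data

cat data.01
```

As with sorter, the input is read once and the records in each output file keep the order they had in the input.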

Example 1

$ cat data

04 Connecticut 13 Hartford 92 56 83 96 75

01 Texas 03 Houston 82 0 23 84 10

03 New_Jersey 10 Newark 52 91 44 9 0

02 New_York 04 Manhattan 30 50 71 36 30

01 Texas 01 Austin 91 59 20 76 54

03 New_Jersey 12 Trenton 95 60 35 93 76

04 Connecticut 16 Bridgetown 45 21 24 39 03

02 New_York 05 Brooklyn 78 13 44 28 51

$ sorter data.%1 data

$ ls -l data.*

-rw-r--r-- 1 usp usp 87 Feb 19 11:14 data.01 ↑

-rw-r--r-- 1 usp usp 82 Feb 19 11:14 data.02 Split into

-rw-r--r-- 1 usp usp 77 Feb 19 11:14 data.03 four files

-rw-r--r-- 1 usp usp 91 Feb 19 11:14 data.04 ↓

$ cat data.01

01 Texas 03 Houston 82 0 23 84 10

01 Texas 01 Austin 91 59 20 76 54

$ cat data.02

02 New_York 04 Manhattan 30 50 71 36 30

02 New_York 05 Brooklyn 78 13 44 28 51

$ cat data.03

03 New_Jersey 10 Newark 52 91 44 9 0

03 New_Jersey 12 Trenton 95 60 35 93 76

$ cat data.04

04 Connecticut 13 Hartford 92 56 83 96 75

04 Connecticut 16 Bridgetown 45 21 24 39 03

Example 2 (Using a Substring)

$ sorter data.%1.2.1 data

$ ls -l data.*

-rw-r--r-- 1 usp usp 87 Feb 19 11:15 data.1

-rw-r--r-- 1 usp usp 82 Feb 19 11:15 data.2

-rw-r--r-- 1 usp usp 77 Feb 19 11:15 data.3

-rw-r--r-- 1 usp usp 91 Feb 19 11:15 data.4
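Judging from the file names above, %1.2.1 appears to mean "field 1, starting at character 2, length 1", so the key 01 becomes 1. Assuming that reading, the same file names can be reproduced with awk's substr(); the scratch directory and shortened records below are assumptions for the sketch.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Shortened sample records (assumed for this sketch).
cat > data <<'EOF'
04 Connecticut 13 Hartford
01 Texas 03 Houston
01 Texas 01 Austin
EOF

# Use one character of field 1, starting at position 2, as the key.
# Whole records are still written; only the file name uses the substring.
awk '{ out = "data." substr($1, 2, 1); print > out }' data

ls data.*
```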

Example 3 (Using -a option)

If you specify the -a option, records are appended to existing

output files instead of replacing them. If an output file doesn't

already exist, it is created. If you don't specify this option and

the file already exists, it is overwritten.

$ sorter data.%1 data

$ sorter -a data.%1 data

$ ls -l data.*

-rw-r--r-- 1 usp usp 174 Feb 19 11:16 data.01

-rw-r--r-- 1 usp usp 164 Feb 19 11:16 data.02

-rw-r--r-- 1 usp usp 154 Feb 19 11:16 data.03

-rw-r--r-- 1 usp usp 182 Feb 19 11:16 data.04

$ cat data.01

01 Texas 03 Houston 82 0 23 84 10

01 Texas 01 Austin 91 59 20 76 54

01 Texas 03 Houston 82 0 23 84 10

01 Texas 01 Austin 91 59 20 76 54
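The difference between overwriting and appending can be mimicked in awk by switching from ">" to ">>". The helper name split_append, the scratch directory, and the shortened records are hypothetical and only serve this sketch.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Shortened sample records (assumed for this sketch).
cat > data <<'EOF'
01 Texas 03 Houston
02 New_York 04 Manhattan
01 Texas 01 Austin
EOF

# ">>" appends to data.<key> and creates the file if it is missing,
# which matches the behavior described for -a.
split_append() { awk '{ out = "data." $1; print >> out }' "$1"; }

# Running the split twice doubles the records in every output file.
split_append data
split_append data

cat data.01
```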

Example 4 (Using the -d option)

If you specify the -d option, the records are written to the

output files without the key field. Even if the key field is

specified as a substring (such as %1.2.1) the entire key field

(in this example, field 1) is skipped.

$ sorter -d data.%1 data

$ ls -l data.*

-rw-r--r-- 1 usp usp 81 Feb 19 13:13 data.01

-rw-r--r-- 1 usp usp 76 Feb 19 13:13 data.02

-rw-r--r-- 1 usp usp 71 Feb 19 13:13 data.03

-rw-r--r-- 1 usp usp 85 Feb 19 13:13 data.04

$ cat data.01

Texas 03 Houston 82 0 23 84 10

Texas 01 Austin 91 59 20 76 54
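The effect of -d can be sketched in awk by taking the key from field 1 and then stripping that field from the record before writing it. The sketch assumes a key in field 1 and single-space separators; the scratch directory and shortened records are also assumptions.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Shortened sample records (assumed for this sketch).
cat > data <<'EOF'
01 Texas 03 Houston
02 New_York 04 Manhattan
01 Texas 01 Austin
EOF

# Remember the output name first, then drop the whole first field
# (and the separator after it) from the record before printing.
awk '{ out = "data." $1; sub(/^[^ ]+ +/, ""); print > out }' data

cat data.01
```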

Example 5 (Using the -z option)

If you specify the -z option, the output files are compressed

using gzip.

$ sorter -z data.%1.gz data

$ ls -l data.*

-rw-r--r-- 1 usp usp 98 Feb 19 13:17 data.01.gz

-rw-r--r-- 1 usp usp 94 Feb 19 13:17 data.02.gz

-rw-r--r-- 1 usp usp 82 Feb 19 13:17 data.03.gz

-rw-r--r-- 1 usp usp 100 Feb 19 13:17 data.04.gz

$ gunzip < data.01.gz

01 Texas 03 Houston 82 0 23 84 10

01 Texas 01 Austin 91 59 20 76 54

Note 1

sorter.c uses zlib. When compiling, make sure to use the following:

$ cc -static -O3 -o /home/TOOL/sorter sorter.c -lz

Note 2

If you use the -a option and the -z option together, the newly

compressed data is appended to the existing compressed file. The

resulting file can still be decompressed correctly with gunzip.
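This works because a gzip file may consist of several compressed members placed back to back, and gunzip decompresses them as one stream. The following sketch shows this with plain gzip rather than sorter; the file name and records are taken from the examples above, shortened for illustration.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Compress two records separately; the second gzip member is
# appended to the file that already holds the first one.
printf '01 Texas 03 Houston\n' | gzip -c >  data.01.gz
printf '01 Texas 01 Austin\n'  | gzip -c >> data.01.gz

# gunzip reads both members and emits both records in order.
gunzip < data.01.gz
```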

Note 3

This command normally reads the entire input file into memory. However,

when available memory is low, or when more than half of physical memory

is in use, it writes the data read so far out to files and then continues.