keycut : Split a file based on a key field (Key must be sorted)
Usage : keycut [options] <filename> <file>
Options : -d : delete key
-a : append file
-z : compress
Version : Tue Jan 9 09:02:34 JST 2024
Edition : 1
Reads <file> and splits it into multiple files, one for each distinct
value of the key field specified in <filename>.
For example, to split the file so that records with the same value in
the 2nd field go to the same file, specify the filename as "data.%2".
The output files will be named data.(2nd field value). The input must
be sorted on the key field, because keycut writes out a file each time
the value of the key field changes.
The key field specified in <filename> is written as "%(Field #)".
You can also specify a substring of the field, such as "%5.2" or
"%5.1.3".
$ cat data
01 Massachusetts 03 Springfield 82 0 23 84 10
01 Massachusetts 01 Boston 91 59 20 76 54
02 New_York 04 Manhattan 30 50 71 36 30
02 New_York 05 Brooklyn 78 13 44 28 51
03 New_Jersey 10 Newark 52 91 44 9 0
03 New_Jersey 12 Moorestown 95 60 35 93 76
04 Pennsylvania 13 Philadelphia 92 56 83 96 75
04 Pennsylvania 16 Hershey 45 21 24 39 03
$ keycut data.%1 data
$ ls -l data.*
-rw-r--r-- 1 usp usp 87 Feb 19 11:14 data.01 ↑
-rw-r--r-- 1 usp usp 82 Feb 19 11:14 data.02 Split into 4
-rw-r--r-- 1 usp usp 77 Feb 19 11:14 data.03 files
-rw-r--r-- 1 usp usp 91 Feb 19 11:14 data.04 ↓
$ cat data.01
01 Massachusetts 03 Springfield 82 0 23 84 10
01 Massachusetts 01 Boston 91 59 20 76 54
$ cat data.02
02 New_York 04 Manhattan 30 50 71 36 30
02 New_York 05 Brooklyn 78 13 44 28 51
$ cat data.03
03 New_Jersey 10 Newark 52 91 44 9 0
03 New_Jersey 12 Moorestown 95 60 35 93 76
$ cat data.04
04 Pennsylvania 13 Philadelphia 92 56 83 96 75
04 Pennsylvania 16 Hershey 45 21 24 39 03
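The splitting rule can be approximated with standard awk for readers
who want to see the logic spelled out. This is a rough sketch of the
behavior described above, not keycut's implementation; the scratch
directory "work" and the awk variable "prefix" are names assumed for
illustration.

```shell
# Rough awk approximation of `keycut data.%1 data` (not keycut's own code).
work=$(mktemp -d)    # scratch directory, assumed for illustration

cat > "$work/data" <<'EOF'
01 Massachusetts 03 Springfield 82 0 23 84 10
01 Massachusetts 01 Boston 91 59 20 76 54
02 New_York 04 Manhattan 30 50 71 36 30
02 New_York 05 Brooklyn 78 13 44 28 51
EOF

# Start a new output file whenever the 1st field changes. The key is
# compared as a string ($1 "") so that "01" and "1" stay distinct.
awk -v prefix="$work/data." '
  NR == 1 || ($1 "") != prev {
    if (out != "") close(out)
    out = prefix $1
    prev = $1 ""
  }
  { print > out }
' "$work/data"

cat "$work/data.02"
```

In this sketch, a key value that reappears later in unsorted input
reopens its file with ">" and truncates what was written earlier,
which illustrates why the key should be sorted before splitting.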
$ keycut data.%1.2.1 data
$ ls -l data.*
-rw-r--r-- 1 usp usp 87 Feb 19 11:15 data.1
-rw-r--r-- 1 usp usp 82 Feb 19 11:15 data.2
-rw-r--r-- 1 usp usp 77 Feb 19 11:15 data.3
-rw-r--r-- 1 usp usp 91 Feb 19 11:15 data.4
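Judging from the example above, "%1.2.1" appears to mean "1st field,
starting at character 2, length 1". Under that assumption, the same
file naming can be sketched with awk's substr(). The "work" directory
and "prefix" variable are hypothetical names, and this is not
keycut's own code.

```shell
# Illustrative only: name output files by a substring of the key field.
work=$(mktemp -d)    # scratch directory, assumed for illustration
printf '%s\n' '01 Massachusetts 01 Boston' \
              '02 New_York 04 Manhattan' > "$work/data"

# substr($1, 2, 1): 1st field, from character 2, one character long.
awk -v prefix="$work/data." '
  { key = substr($1, 2, 1) }
  NR == 1 || key != prev {
    if (out != "") close(out)
    out = prefix key
    prev = key
  }
  { print > out }
' "$work/data"

ls "$work"    # lists data, data.1 and data.2
```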
When you specify the -a option, the split records are appended to
the output files instead of replacing them.
If an output file doesn't exist, it is created. If you do not
use the -a option, existing files are overwritten.
$ keycut data.%1 data
$ keycut -a data.%1 data
$ ls -l data.*
-rw-r--r-- 1 usp usp 174 Feb 19 11:16 data.01
-rw-r--r-- 1 usp usp 164 Feb 19 11:16 data.02
-rw-r--r-- 1 usp usp 154 Feb 19 11:16 data.03
-rw-r--r-- 1 usp usp 182 Feb 19 11:16 data.04
$ cat data.01
01 Massachusetts 03 Springfield 82 0 23 84 10
01 Massachusetts 01 Boston 91 59 20 76 54
01 Massachusetts 03 Springfield 82 0 23 84 10
01 Massachusetts 01 Boston 91 59 20 76 54
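The difference between overwriting and appending can be illustrated
with awk's ">>" redirection. This is a minimal sketch under the
assumption that -a behaves like a plain append; the helper function
split_append and the "work" directory are hypothetical names.

```shell
# Illustrative only: append-mode splitting, run twice on the same input.
work=$(mktemp -d)    # scratch directory, assumed for illustration
printf '%s\n' '01 a' '01 b' '02 c' > "$work/data"

split_append() {
  awk -v prefix="$work/data." '
    NR == 1 || ($1 "") != prev {
      if (out != "") close(out)
      out = prefix $1
      prev = $1 ""
    }
    { print >> out }
  ' "$work/data"
}

split_append    # creates data.01 and data.02
split_append    # appends the same records again
wc -l < "$work/data.01"    # 4 lines: the two records, written twice
```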
If you specify the -d option, each record is written to its output
file with the key field removed. Even if the key is specified as a
substring (such as %1.2.1), the entire key field is removed.
$ keycut -d data.%1 data
$ ls -l data.*
-rw-r--r-- 1 usp usp 81 Feb 19 13:13 data.01
-rw-r--r-- 1 usp usp 76 Feb 19 13:13 data.02
-rw-r--r-- 1 usp usp 71 Feb 19 13:13 data.03
-rw-r--r-- 1 usp usp 85 Feb 19 13:13 data.04
$ cat data.01
Massachusetts 03 Springfield 82 0 23 84 10
Massachusetts 01 Boston 91 59 20 76 54
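The rule that the whole key field is dropped, even when only a
substring names the file, can be sketched in awk as follows. This is
an approximation under stated assumptions (space-separated fields,
key in the 1st field), not keycut's implementation; "work" and
"prefix" are hypothetical names.

```shell
# Illustrative only: name files by a substring, but drop the whole field.
work=$(mktemp -d)    # scratch directory, assumed for illustration
printf '%s\n' '01 Massachusetts 03 Springfield' \
              '01 Massachusetts 01 Boston' \
              '02 New_York 04 Manhattan' > "$work/data"

awk -v prefix="$work/data." '
  { key = substr($1, 2, 1) }    # file name uses only part of field 1
  NR == 1 || key != prev {
    if (out != "") close(out)
    out = prefix key
    prev = key
  }
  {
    line = $0
    sub(/^ *[^ ]+ */, "", line)    # remove the entire 1st field
    print line > out
  }
' "$work/data"

cat "$work/data.1"
```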
If you specify the -z option, the output files will be compressed
using gzip.
$ keycut -z data.%1.gz data
$ ls -l data.*
-rw-r--r-- 1 usp usp 98 Feb 19 13:17 data.01.gz
-rw-r--r-- 1 usp usp 94 Feb 19 13:17 data.02.gz
-rw-r--r-- 1 usp usp 82 Feb 19 13:17 data.03.gz
-rw-r--r-- 1 usp usp 100 Feb 19 13:17 data.04.gz
$ gunzip < data.01.gz
01 Massachusetts 03 Springfield 82 0 23 84 10
01 Massachusetts 01 Boston 91 59 20 76 54
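A comparable effect can be obtained by piping each group through the
gzip command from awk. This sketch assumes gzip is installed and is
not how keycut itself compresses (keycut.c uses zlib); "work",
"prefix" and "cmd" are hypothetical names.

```shell
# Illustrative only: split by key and compress each group with gzip.
work=$(mktemp -d)    # scratch directory, assumed for illustration
printf '%s\n' '01 a' '01 b' '02 c' > "$work/data"

awk -v prefix="$work/data." '
  NR == 1 || ($1 "") != prev {
    if (cmd != "") close(cmd)    # finish the previous gzip process
    cmd = "gzip -c > \"" prefix $1 ".gz\""
    prev = $1 ""
  }
  { print | cmd }
  END { if (cmd != "") close(cmd) }
' "$work/data"

gunzip < "$work/data.01.gz"
```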
keycut.c uses zlib, so link against it when compiling:
$ cc keycut.c -lz -o /home/TOOL/keycut
If you use the -a option and the -z option together, you can append
to an existing compressed file. The resulting file can be properly
decompressed with gunzip.
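Appending works for compressed files because a gzip file may consist
of several compressed members placed one after another, and gunzip
decompresses them as a single stream. The sketch below shows this
with the gzip command alone; whether keycut appends a new member in
exactly this way is an assumption, and "work" is a hypothetical name.

```shell
# Illustrative only: two gzip members concatenated into one file.
work=$(mktemp -d)    # scratch directory, assumed for illustration

printf '01 a\n' | gzip -c >  "$work/data.01.gz"    # first member
printf '01 b\n' | gzip -c >> "$work/data.01.gz"    # appended member

gunzip -c "$work/data.01.gz"    # prints both records, in order
```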