keycut(USP)

Name

keycut : Split a file based on a key field (Key must be sorted)

Synopsis

Usage   : keycut [options] <filename> <file>
Options : -d : delete key
          -a : append file
          -z : compress
Version : Tue Jan  9 09:02:34 JST 2024
Edition : 1

Description

Reads <file> and splits it into multiple files, one for each distinct value of the key field specified in <filename>. For example, to split the file so that records with the same value in the 2nd field go to the same file, specify the filename as "data.%2". The output files are then named data.(2nd field value).

The input must be sorted on the key field, because keycut writes out a file each time the value of the key field changes.

The key field in <filename> is written as "%(field number)". You can also specify a substring of the field, as in "%5.2" or "%5.1.3". For instance, "%1.2.1" in Example 2 takes 1 character starting at the 2nd character of field 1.
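
The splitting rule described above can be modeled roughly in Python. This is only an illustrative sketch, not keycut's actual implementation: the function name split_by_field is hypothetical, it handles only the plain "%N" form of the key, and it returns the would-be output files as a dictionary instead of writing them. It groups adjacent records only, which mirrors why the input is expected to be sorted on the key.

```python
from itertools import groupby

def split_by_field(lines, prefix, field_no):
    """Model of `keycut prefix.%N`: group consecutive records by field N.

    Hypothetical helper for illustration; returns {filename: [records]}
    instead of writing files.
    """
    def key(line):
        # Fields are whitespace-separated and numbered from 1.
        return line.split()[field_no - 1]

    files = {}
    # groupby merges only *adjacent* records with equal keys. With unsorted
    # input a repeated key would start a second group; in this dict-based
    # model that group replaces the earlier entry.
    for value, group in groupby(lines, key=key):
        files[f"{prefix}.{value}"] = list(group)
    return files

records = [
    "01 Massachusetts 03 Springfield",
    "01 Massachusetts 01 Boston",
    "02 New_York 04 Manhattan",
    "02 New_York 05 Brooklyn",
]
result = split_by_field(records, "data", 1)
print(sorted(result))  # ['data.01', 'data.02']
```

Returning a dictionary keeps the sketch free of side effects; a real tool would open one output file per group instead.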

Example 1

$ cat data
01 Massachusetts 03 Springfield  82 0 23 84 10
01 Massachusetts 01 Boston       91 59 20 76 54
02 New_York      04 Manhattan    30 50 71 36 30
02 New_York      05 Brooklyn     78 13 44 28 51
03 New_Jersey    10 Newark       52 91 44 9  0
03 New_Jersey    12 Moorestown   95 60 35 93 76
04 Pennsylvania  13 Philadelphia 92 56 83 96 75
04 Pennsylvania  16 Hershey      45 21 24 39 03
$ keycut data.%1 data
$ ls -l data.*
-rw-r--r-- 1 usp usp 87 Feb 19 11:14 data.01       ↑
-rw-r--r-- 1 usp usp 82 Feb 19 11:14 data.02 Split into 4
-rw-r--r-- 1 usp usp 77 Feb 19 11:14 data.03 files
-rw-r--r-- 1 usp usp 91 Feb 19 11:14 data.04       ↓
$ cat data.01
01 Massachusetts 03 Springfield  82 0 23 84 10
01 Massachusetts 01 Boston       91 59 20 76 54
$ cat data.02
02 New_York      04 Manhattan    30 50 71 36 30
02 New_York      05 Brooklyn     78 13 44 28 51
$ cat data.03
03 New_Jersey    10 Newark       52 91 44 9  0
03 New_Jersey    12 Moorestown   95 60 35 93 76
$ cat data.04
04 Pennsylvania  13 Philadelphia 92 56 83 96 75
04 Pennsylvania  16 Hershey      45 21 24 39 03

Example 2 (Using substrings)

$ keycut data.%1.2.1 data
$ ls -l data.*
-rw-r--r-- 1 usp usp 87 Feb 19 11:15 data.1
-rw-r--r-- 1 usp usp 82 Feb 19 11:15 data.2
-rw-r--r-- 1 usp usp 77 Feb 19 11:15 data.3
-rw-r--r-- 1 usp usp 91 Feb 19 11:15 data.4
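
To make the substring notation concrete, here is a small Python sketch of how an output name could be derived from one record. It is a model under assumptions, not keycut's own code: the names KEY_PATTERN and expand_name are hypothetical, positions are counted in characters, and the two-part form (such as "%5.2") is assumed to run to the end of the field.

```python
import re

# Matches %N, %N.S, or %N.S.L (field number, start position, length).
KEY_PATTERN = re.compile(r"%(\d+)(?:\.(\d+)(?:\.(\d+))?)?")

def expand_name(template, record):
    """Hypothetical model of how an output name is built from one record."""
    fields = record.split()

    def substitute(match):
        field_no = int(match.group(1))
        value = fields[field_no - 1]             # field numbers are 1-based
        if match.group(2) is None:
            return value                         # plain %N: whole field
        start = int(match.group(2)) - 1          # positions are 1-based
        if match.group(3) is None:
            return value[start:]                 # assumed: to end of field
        return value[start:start + int(match.group(3))]

    return KEY_PATTERN.sub(substitute, template)

record = "01 Massachusetts 03 Springfield 82 0 23 84 10"
print(expand_name("data.%1", record))      # data.01
print(expand_name("data.%1.2.1", record))  # data.1
```

Because the pattern only consumes ".digits" after the field number, a trailing suffix such as ".gz" in "data.%1.gz" is left untouched by this sketch.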

Example 3 (Using -a option)

With the -a option, the split records are appended to the output files. If an output file does not exist, it is created. Without the -a option, existing files are overwritten.

$ keycut data.%1 data
$ keycut -a data.%1 data
$ ls -l data.*
-rw-r--r-- 1 usp usp 174 Feb 19 11:16 data.01
-rw-r--r-- 1 usp usp 164 Feb 19 11:16 data.02
-rw-r--r-- 1 usp usp 154 Feb 19 11:16 data.03
-rw-r--r-- 1 usp usp 182 Feb 19 11:16 data.04
$ cat data.01
01 Massachusetts 03 Springfield 82 0  23 84 10
01 Massachusetts 01 Boston      91 59 20 76 54
01 Massachusetts 03 Springfield 82 0  23 84 10
01 Massachusetts 01 Boston      91 59 20 76 54
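
The difference between appending and overwriting can be illustrated with ordinary file modes in Python. This is a sketch of the behavior described above, not of keycut's internals; write_group is a hypothetical helper and the records are shortened sample data.

```python
import os
import tempfile

def write_group(path, records, append=False):
    """Hypothetical helper: write one group of records to its output file."""
    mode = "a" if append else "w"   # "a" keeps existing content, "w" truncates
    with open(path, mode) as f:
        for record in records:
            f.write(record + "\n")

group = ["01 Massachusetts 03 Springfield", "01 Massachusetts 01 Boston"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.01")
    write_group(path, group)                # like: keycut data.%1 data
    size_once = os.path.getsize(path)
    write_group(path, group, append=True)   # like: keycut -a data.%1 data
    size_twice = os.path.getsize(path)
    write_group(path, group)                # without -a the file is replaced
    size_again = os.path.getsize(path)

print(size_twice == 2 * size_once, size_again == size_once)  # True True
```

The doubled size after the appending call corresponds to the doubled file sizes shown in the ls output of this example.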

Example 4 (Using -d option)

With the -d option, each record is written to its output file with the key field removed. Even if the key is specified as a substring (such as %1.2.1), the entire key field is removed.

$ keycut -d data.%1 data
$ ls -l data.*
-rw-r--r-- 1 usp usp 81 Feb 19 13:13 data.01
-rw-r--r-- 1 usp usp 76 Feb 19 13:13 data.02
-rw-r--r-- 1 usp usp 71 Feb 19 13:13 data.03
-rw-r--r-- 1 usp usp 85 Feb 19 13:13 data.04
$ cat data.01
Massachusetts 03 Springfield 82 0  23 84 10
Massachusetts 01 Boston      91 59 20 76 54
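
The effect of -d on a single record can be sketched as follows. The helper drop_key_field is hypothetical and only models the described result: it depends on the field number alone, so "%1" and "%1.2.1" would both remove all of field 1. Unlike the output shown above, this sketch normalizes the spacing between fields.

```python
def drop_key_field(record, field_no):
    """Hypothetical model of -d: remove the whole key field from a record."""
    fields = record.split()
    del fields[field_no - 1]        # field numbers are 1-based
    return " ".join(fields)         # note: this sketch normalizes spacing

record = "01 Massachusetts 03 Springfield 82 0 23 84 10"
print(drop_key_field(record, 1))
# Massachusetts 03 Springfield 82 0 23 84 10
```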

Example 5 (Using -z option)

With the -z option, the output files are compressed with gzip.

$ keycut -z data.%1.gz data
$ ls -l data.*
-rw-r--r-- 1 usp usp  98 Feb 19 13:17 data.01.gz
-rw-r--r-- 1 usp usp  94 Feb 19 13:17 data.02.gz
-rw-r--r-- 1 usp usp  82 Feb 19 13:17 data.03.gz
-rw-r--r-- 1 usp usp 100 Feb 19 13:17 data.04.gz
$ gunzip < data.01.gz
01 Massachusetts 03 Springfield 82 0  23 84 10
01 Massachusetts 01 Boston      91 59 20 76 54

Note 1

keycut.c uses zlib, so link against it when compiling, as shown below:

$ cc keycut.c -lz -o /home/TOOL/keycut

Note 2

If you use the -a and -z options together, records are appended to existing compressed files. The resulting files can still be decompressed properly with gunzip.
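
Appending to a compressed file works in general because the gzip format permits several compressed members in one file, and decompressors read them in sequence. The Python sketch below demonstrates that property with the standard gzip module. Whether keycut appends in exactly this way is an assumption, and the sample records are shortened.

```python
import gzip
import os
import tempfile

first = b"01 Massachusetts 03 Springfield\n"
second = b"01 Massachusetts 01 Boston\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.01.gz")
    with gzip.open(path, "wb") as f:   # initial compressed file
        f.write(first)
    with gzip.open(path, "ab") as f:   # append a second gzip member
        f.write(second)
    with gzip.open(path, "rb") as f:   # reader walks through every member
        restored = f.read()

print(restored == first + second)  # True
```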