tagkeycut(USP)

Name

tagkeycut : Split a file based on a key

         (The file doesn't need to be sorted on the key)

Synopsis

Usage   : tagkeycut [options] <filename> <file>

Options : -d : delete key

          -a : append file

          -z : compress

Version : Tue Jan  9 09:02:34 JST 2024

Edition : 1

Description

Reads <file> and splits it into multiple files, one for each distinct

value of the key field specified in <filename>.

For example, if you want to split the file into multiple files where

the field with tag name KEY contains the same value, specify the

filename as "data.%KEY".  The names of the output files will be

data.(KEY field value).  The key field must be sorted to use tagkeycut.

(Files are output when the value of the field changes.)

The key field is specified in <filename> as %<tag name>.  You can also

use a substring of the key value, such as %KEY.2 or %KEY.1.3.

You can put braces {} around the tag name, as in "%{KEY}", to make

its extent clear.  If the character after the tag name is not one of

"- . / % }" then you must use braces.

Example 1

$ cat data

PNUM PREF CNUM CITY D1 D2 D3 D4 D5

01 New_Hampshire 03 Nashua 82 0 23 84 10

01 New_Hampshire 01 Manchester 91 59 20 76 54

02 Massachusetts 04 Boston 30 50 71 36 30

02 Massachusetts 05 Worcester 78 13 44 28 51

03 Vermont 10 Burlington 52 91 44 9 0

03 Vermont 12 Rutland 95 60 35 93 76

04 New_York 13 Brooklyn 92 56 83 96 75

04 New_York 16 Manhattan 45 21 24 39 03

$ tagkeycut data.%1 data

$ ls -l data.*

-rw-r--r-- 1 usp usp 122  May 18 16:54 data.01       ↑

-rw-r--r-- 1 usp usp 117  May 18 16:54 data.02 Split into

-rw-r--r-- 1 usp usp 112  May 18 16:54 data.03 four files

-rw-r--r-- 1 usp usp 126  May 18 16:54 data.04       ↓

$ cat data.01

PNUM PREF CNUM CITY D1 D2 D3 D4 D5

01 New_Hampshire 03 Nashua 82 0 23 84 10

01 New_Hampshire 01 Manchester 91 59 20 76 54

$ cat data.02

PNUM PREF CNUM CITY D1 D2 D3 D4 D5

02 Massachusetts 04 Boston 30 50 71 36 30

02 Massachusetts 05 Worcester 78 13 44 28 51

$ cat data.03

PNUM PREF CNUM CITY D1 D2 D3 D4 D5

03 Vermont 10 Burlington 52 91 44 9 0

03 Vermont 12 Rutland 95 60 35 93 76

$ cat data.04

PNUM PREF CNUM CITY D1 D2 D3 D4 D5

04 New_York 13 Brooklyn 92 56 83 96 75

04 New_York 16 Manhattan 45 21 24 39 03
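
For readers without the command at hand, the split shown in Example 1
can be approximated with awk.  This is only a sketch under assumptions,
not how tagkeycut itself is implemented: the tag name PNUM and the
prefix "data." are hard-coded, the sample data is shortened, and the
header line is copied into every output file as the listings above show.

```shell
# Hypothetical stand-in for the split in Example 1 (not the real tool).
workdir=$(mktemp -d) && cd "$workdir" || exit 1

cat > data <<'EOF'
PNUM PREF CNUM CITY D1 D2 D3 D4 D5
01 New_Hampshire 03 Nashua 82 0 23 84 10
01 New_Hampshire 01 Manchester 91 59 20 76 54
02 Massachusetts 04 Boston 30 50 71 36 30
02 Massachusetts 05 Worcester 78 13 44 28 51
EOF

awk -v tag=PNUM -v prefix=data. '
NR == 1 {                       # header row: find the column whose tag matches
    for (i = 1; i <= NF; i++) if ($i == tag) col = i
    if (!col) { print "tag not found: " tag > "/dev/stderr"; exit 1 }
    header = $0
    next
}
{
    out = prefix $col           # e.g. data.01
    if (!(out in seen)) {       # first record for this key: start the file with the header
        print header > out
        seen[out] = 1
    }
    print > out
}' data

ls data.*
```

The sketch keeps one output file open per key value, which is fine for
a handful of keys but may hit the open-file limit on large inputs.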

Example 2 (Substring specified)

$ tagkeycut data.%1.2.1 data

$ ls -l data.*

-rw-r--r-- 1 usp usp 122  May 18 16:58 data.1

-rw-r--r-- 1 usp usp 117  May 18 16:58 data.2

-rw-r--r-- 1 usp usp 112  May 18 16:58 data.3

-rw-r--r-- 1 usp usp 126  May 18 16:58 data.4
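
The output of Example 2 suggests that the two numbers after the key are
a 1-based start position and a length (start 2, length 1 turns "01"
into "1").  The helper below is a hypothetical illustration of that
reading, not part of Tukubai; treating a missing length as "through the
end of the value" is an assumption.

```shell
# key_substring VALUE START [LENGTH]
# Hypothetical helper: prints the part of VALUE used as the key.
key_substring() {
    if [ -n "${3:-}" ]; then
        printf '%s\n' "$1" | awk -v s="$2" -v n="$3" '{ print substr($0, s, n) }'
    else
        # Assumption: without a length, take everything from START onward.
        printf '%s\n' "$1" | awk -v s="$2" '{ print substr($0, s) }'
    fi
}

# Mirror the file names produced in Example 2.
for pnum in 01 02 03 04; do
    echo "data.$(key_substring "$pnum" 2 1)"
done
```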

Example 3 (-a option specified)

If the -a option is specified, the lines are appended to the output

file.

If the output file doesn't exist, a new one is created.

If you do not specify this option, the file is overwritten.

$ tagkeycut data.%1 data

$ tagkeycut -a data.%1 data

$ ls -l data.*

-rw-r--r-- 1 usp usp 209  May 18 17:00 data.01

-rw-r--r-- 1 usp usp 199  May 18 17:00 data.02

-rw-r--r-- 1 usp usp 189  May 18 17:00 data.03

-rw-r--r-- 1 usp usp 217  May 18 17:00 data.04

$ cat data.01

PNUM PREF CNUM CITY D1 D2 D3 D4 D5

01 New_Hampshire 03 Nashua 82 0 23 84 10

01 New_Hampshire 01 Manchester 91 59 20 76 54

01 New_Hampshire 03 Nashua 82 0 23 84 10

01 New_Hampshire 01 Manchester 91 59 20 76 54
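
Note that the appended file above contains the header line only once.
A minimal awk sketch of that behavior follows, assuming the rule is
"write the header only when the output file does not exist yet".  The
helper name split_append and the shortened sample data are
hypothetical, and the key is simply taken from the first column.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1

cat > data <<'EOF'
PNUM PREF CNUM CITY
01 New_Hampshire 03 Nashua
02 Massachusetts 04 Boston
EOF

# split_append: hypothetical helper emulating the -a run in Example 3.
split_append() {
    awk -v prefix=data. '
    NR == 1 { header = $0; next }       # PNUM is column 1 in this sample
    {
        out = prefix $1
        if (!(out in seen)) {
            seen[out] = 1
            # write the header only when the output file does not exist yet
            if (system("test -e \"" out "\"") != 0) print header >> out
        }
        print >> out                    # always append, never truncate
    }' data
}

split_append    # first run creates data.01 and data.02
split_append    # second run appends the records again, without a second header
cat data.01
```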

Example 4 (-d option specified)

If you specify the -d option, each record is written to the output

file with the key field removed.  Even if the key is specified as a

substring (such as "%KEY.2.1"), the entire key field is removed.

$ tagkeycut -d data.%1 data

$ ls -l data.*

-rw-r--r-- 1 usp usp 111  May 18 17:03 data.01

-rw-r--r-- 1 usp usp 106  May 18 17:03 data.02

-rw-r--r-- 1 usp usp 101  May 18 17:03 data.03

-rw-r--r-- 1 usp usp 115  May 18 17:03 data.04

$ cat data.01

PREF CNUM CITY D1 D2 D3 D4 D5

New_Hampshire 03 Nashua 82 0 23 84 10

New_Hampshire 01 Manchester 91 59 20 76 54
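
The effect of -d can be approximated with awk as well.  As before this
is a sketch, not the tool's implementation: the tag name PNUM and the
prefix "data." are hard-coded, the sample data is shortened, and fields
are rejoined with single spaces.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1

cat > data <<'EOF'
PNUM PREF CNUM CITY D1 D2 D3 D4 D5
01 New_Hampshire 03 Nashua 82 0 23 84 10
01 New_Hampshire 01 Manchester 91 59 20 76 54
02 Massachusetts 04 Boston 30 50 71 36 30
EOF

awk -v tag=PNUM -v prefix=data. '
# Return the current record with field number col left out.
function without(col,    i, s, sep) {
    s = ""; sep = ""
    for (i = 1; i <= NF; i++)
        if (i != col) { s = s sep $i; sep = " " }
    return s
}
NR == 1 {                       # header row: locate the key column, drop its tag
    for (i = 1; i <= NF; i++) if ($i == tag) col = i
    if (!col) exit 1
    header = without(col)
    next
}
{
    out = prefix $col           # the full key value still names the file
    if (!(out in seen)) { print header > out; seen[out] = 1 }
    print without(col) > out
}' data

cat data.01
```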

Example 5 (-z option specified)

If you specify the -z option, the output files will be compressed

using gzip.

$ tagkeycut -z data.%1.gz data

$ ls -l data.*

-rw-r--r-- 1 usp usp 131  May 18 17:05 data.01.gz

-rw-r--r-- 1 usp usp 126  May 18 17:05 data.02.gz

-rw-r--r-- 1 usp usp 115  May 18 17:05 data.03.gz

-rw-r--r-- 1 usp usp 132  May 18 17:05 data.04.gz

$ gunzip < data.01.gz

PNUM PREF CNUM CITY D1 D2 D3 D4 D5

01 New_Hampshire 03 Nashua 82 0 23 84 10

01 New_Hampshire 01 Manchester 91 59 20 76 54

Note 1

tagkeycut.c uses zlib.  When compiling, please use:

$ cc -static -O3 -o /home/TOOL/tagkeycut tagkeycut.c -lz

Note 2

If you use the -a option and the -z option together, you can append

to an existing compressed file.  The resulting file can be properly

decompressed with gunzip.
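
Appending works because the gzip format permits a file to consist of
several compressed members placed one after another, and gunzip
decompresses them in sequence.  The sketch below demonstrates this with
the gzip command alone; whether tagkeycut appends in exactly this way
is an assumption, and the file name and records are only samples.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# First run: create the compressed file.
printf '01 New_Hampshire 03 Nashua\n' | gzip > data.01.gz

# Later run: append a second, independent gzip member to the same file.
printf '01 New_Hampshire 01 Manchester\n' | gzip >> data.01.gz

# gunzip reads both members and emits the records in order.
gunzip < data.01.gz
```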