utf8nude(USP)

Name

utf8nude : Removes control characters and invalid UTF-8 characters

Synopsis

Usage   : utf8nude <file>

Options : -e

          -i

          -d<string>

Version : Tue Jan  9 09:02:34 JST 2024

Edition : 1

Description

Intended for use after POST data has been converted with cgi-name, this
command deletes control characters and invalid UTF-8 characters. The
following are removed:

* Control characters other than Tab, NewLine and Space
  (0x00 - 0x08, 0x0b - 0x1b, 0x7f, 0xc2 0x80 - 0xc2 0x9b)
* Bytes 0x80 - 0xbf, except where they are trailing bytes of a multi-byte character
* Redundant (overlong) encodings
* 5- and 6-byte UTF-8 sequences from the old standard
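The first rule in the list above can be approximated with standard tools. The sketch below is not how utf8nude is implemented; it only covers the single-byte control characters (0x00 - 0x08, 0x0b - 0x1b, 0x7f) and does nothing about the 0xc2 0x80 - 0xc2 0x9b range or invalid UTF-8 sequences. The function name strip_ascii_controls is hypothetical.

```shell
# Rough approximation of the single-byte part of the first rule only.
# strip_ascii_controls is a hypothetical helper, not part of Tukubai.
strip_ascii_controls() {
    # Octal ranges: \000-\010 = 0x00-0x08, \013-\033 = 0x0b-0x1b, \177 = 0x7f.
    # Tab (0x09), NewLine (0x0a) and Space (0x20) are left untouched.
    # LC_ALL=C makes tr operate on bytes rather than locale characters.
    LC_ALL=C tr -d '\000-\010\013-\033\177'
}

# Input bytes: "ab", NUL, "cd", DEL, "ef", newline (the same as data1 in Example 1).
# od prints the surviving bytes in hex.
printf 'ab\000cd\177ef\n' | strip_ascii_controls | od -An -tx1
```

The surviving bytes are 61 62 63 64 65 66 0a; the exact spacing of the od output depends on the platform.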

Example 1

Control characters such as NUL and DEL are removed.

$ xdump -v data1

61 62 00 63 64 7F 65 66 0A : ab.cd.ef.

$ utf8nude data1 | xdump -v

61 62 63 64 65 66 0A : abcdef.

Multi-byte characters whose trailing byte falls outside the range 0x80 - 0xbf
are also removed.

$ xdump -v data2

EF BF C0 0A : ....

$ utf8nude data2 | xdump -v

0A : .
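To reproduce this example, the two input files can be recreated byte for byte with printf. This is a sketch that assumes the file contents are exactly the bytes shown in the dumps above; od is used in place of xdump in case the latter is not installed.

```shell
# Recreate the sample inputs of Example 1 from the hex dumps.
# printf takes octal escapes: \000 = 0x00, \177 = 0x7F,
# \357 = 0xEF, \277 = 0xBF, \300 = 0xC0.
printf 'ab\000cd\177ef\n' > data1    # 61 62 00 63 64 7F 65 66 0A
printf '\357\277\300\n'   > data2    # EF BF C0 0A (0xC0 is not a valid trailing byte)

# Dump both files in hex to confirm their contents.
od -An -tx1 data1
od -An -tx1 data2
```

With these files in place, the utf8nude commands in the transcripts can be run as written.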

Example 2

If the -i option is specified, this command exits with exit status 1 when
any characters have been removed. If the -e option is specified, it exits
immediately when the first character to be removed is encountered.

$ utf8nude data1 >out1

$ echo $?

0

$ xdump -v out1

61 62 63 64 65 66 0A : abcdef.

$ utf8nude -i data2 >out2

$ echo $?

1

$ xdump -v out2

0A : .

$ utf8nude -e data1 >out3

$ echo $?

1

$ xdump -v out3

61 62 : ab
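In a script, the exit status of -i can be used to reject input that needed cleaning. The following is a minimal sketch, assuming utf8nude is on PATH and that -i behaves as described above (cleaned output on stdout, exit status 1 when something was removed). The wrapper name sanitize_strict is hypothetical.

```shell
# sanitize_strict is a hypothetical wrapper, not part of Tukubai.
# Usage: sanitize_strict <input-file> <output-file>
sanitize_strict() {
    src=$1
    dst=$2
    if utf8nude -i "$src" > "$dst"; then
        # Exit status 0: nothing had to be removed.
        return 0
    else
        # Non-zero status: characters were removed (or utf8nude itself failed).
        echo "sanitize_strict: $src contained characters that were removed" >&2
        return 1
    fi
}
```

A caller could then write, for example, `sanitize_strict data1 out1 || exit 1` to stop processing as soon as a suspicious input is seen.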

Example 3

The -d<string> option replaces each removed character with <string> instead of deleting it.

$ utf8nude -d__ data1 | xdump -v

61 62 5F 5F 63 64 5F 5F 65 66 0A : ab__cd__ef.