Linux Big5 to GB converter



Omega Blue
29th May 2006, 07:56 AM
I just need a simple command-line program that converts Big5 text to GB. Does anybody know of one?

I can't find anything on Freshmeat. There are three on SourceForge, but the most relevant one is at version 0.0.1. Hm.

Google didn't turn up much, either.

TIA.

markkuk
29th May 2006, 08:36 AM
iconv can do that; it's included in Fedora by default.

man iconv
iconv --from-code BIG-5 --to-code GB --output outputfile inputfile

Omega Blue
29th May 2006, 09:16 AM
Thanks, I'll try it out.

Omega Blue
30th May 2006, 04:22 AM
It doesn't work. I tried with the following sample:



龍天後土

in a file "foo.txt" like this:



iconv -f BIG5 -t GB foo.txt

and got "iconv: illegal input sequence at position 0."

Any ideas?

markkuk
30th May 2006, 09:26 AM
The error means that the input file "foo.txt" begins with a byte sequence that isn't valid in the chosen input encoding. Are you sure foo.txt is in BIG-5? Maybe it's in BIG5-HKSCS?
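
A minimal sketch of what that error means, written in Python rather than with iconv. It assumes, purely for illustration, that the sample was saved as UTF-8 instead of Big5; `try_decode` is a hypothetical helper, and Python's built-in `big5` codec may not match glibc's BIG5 table byte for byte.

```python
sample = "龍天後土"

# Assumption for illustration: the file was saved as UTF-8, not Big5.
data = sample.encode("utf-8")


def try_decode(raw: bytes, encoding: str) -> str:
    """Return the decoded text, or a short description of the failure."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start is the byte offset where the decoder gave up
        return f"invalid {encoding} input at byte offset {exc.start}"


# Decoding with the encoding the bytes were written in round-trips cleanly.
print(try_decode(data, "utf-8") == sample)

# Decoding the same bytes as Big5 hits a byte pair the codec does not accept.
print(try_decode(data, "big5"))
```

The same bytes are fine under one encoding and an "illegal input sequence" under another, which is why the first thing to check is what encoding the file is really in.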

Omega Blue
30th May 2006, 09:32 AM
Okay. If I run just "iconv foo.txt", it copies the input to the output correctly. When I run "iconv -f big5 foo.txt", it outputs the first character incorrectly, then aborts with "illegal input sequence at position 2." If I use "iconv -f big5-hkscs foo.txt" instead, it outputs all 4 characters, although incorrectly, then aborts with "illegal input sequence at position 8."

Is there a utility to find out the encoding of the file?

Edit: I figured it out; it's in UTF-8. However, when I add the output part, I still get the original error. Any ideas?

Omega Blue
30th May 2006, 09:46 AM
Further experimentation shows that iconv can't convert to GB from utf-8, but can convert to Big5 from utf-8. Very strange.
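
Two things may be at play here, neither confirmed in the thread. First, in the IANA charset registry the name GB is an alias for BS_4730, the British 7-bit character set, not a Chinese encoding, so a name such as GB2312, GBK or GB18030 may be what is needed ("iconv -l" lists the names the local build accepts). Second, the sample is written in traditional characters, and GB2312 only covers simplified ones. The sketch below uses Python's built-in codecs, not iconv, to illustrate the second point; `can_encode` and the two-entry `TRAD_TO_SIMP` table are hypothetical and cover only this sample.

```python
sample = "龍天後土"  # traditional characters from the thread


def can_encode(text: str, encoding: str) -> bool:
    """Report whether every character of text exists in the target encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical two-entry table, just enough for this sample.
# A real traditional-to-simplified converter needs a full mapping.
TRAD_TO_SIMP = str.maketrans({"龍": "龙", "後": "后"})

simplified = sample.translate(TRAD_TO_SIMP)

print(can_encode(sample, "big5"))        # traditional text, Big5 target
print(can_encode(sample, "gb2312"))      # traditional text, GB2312 target
print(can_encode(sample, "gbk"))         # GBK also covers traditional forms
print(can_encode(simplified, "gb2312"))  # simplified text, GB2312 target
```

If the goal is simplified Chinese output, a charset conversion alone is not enough; the characters themselves have to be mapped first. If the goal is only a GB-family byte encoding of the same traditional text, GBK or GB18030 is the more likely target.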

markkuk
30th May 2006, 10:32 AM
Quoting Omega Blue: "Is there a utility to find out the encoding of the file?"

See here: http://www.mandarintools.com/sinodetect.html (needs Java).
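
Where Java isn't available, a rough first check is trial decoding. This is a minimal sketch, not a replacement for a real detector: `guess_encoding` and the `CANDIDATES` list are hypothetical, and the codecs are Python's built-in ones.

```python
from typing import Optional, Sequence

# Order matters: UTF-8 is strict and rarely accepts other encodings' bytes,
# so it goes first. The legacy double-byte encodings accept many byte strings.
CANDIDATES = ("utf-8", "big5", "gb2312")


def guess_encoding(raw: bytes,
                   candidates: Sequence[str] = CANDIDATES) -> Optional[str]:
    """Return the first candidate that decodes raw without error, else None."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


print(guess_encoding("龍天後土".encode("utf-8")))
print(guess_encoding("龍天後土".encode("big5")))
```

A successful decode only shows that the bytes are legal in that encoding, not that the text was written in it. Big5 and the GB encodings share much of the same byte range, so this approach cannot reliably tell them apart; a statistical detector such as the one linked above is the better tool for that.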