Non-ascii characters in filenames



ordinary
14th July 2007, 12:08 AM
I have an FC6 box that exports an ext3 filesystem via NFS and Samba. The filesystem is used to distribute music files around my network. In several cases, the music files have non-ASCII characters in their filenames: acute accents (in Béla Fleck, for instance), umlauts, and the like. The files were named, variously, by Sound Juicer on Linux, Creative Media-Source on Winders, and maybe a couple of other programs.

Here's the problem: some files with non-ASCII characters are not visible. They don't show up in Nautilus, and ls (at the command line) doesn't list them either. However, I know they're there from this:


[phil@frederic all_formats]$ find . -name stuff -print
find: ./wmas/José-Luis Garcia, Anthony Halstead, Leonard Slatkin; English Chamber Orchestra: No such file or directory
find: ./wmas/Frédéric Chopin: No such file or directory
find: ./wmas/Antonin Dvorák: No such file or directory
find: ./oggs/Georg Friedrich Händel: No such file or directory
find: ./oggs/Schönhertz & Scott: No such file or directory
[phil@frederic all_formats]$

I'd like to rename these files and get rid of the accents and umlauts and so forth. So far I can't find a way to do that, though.

My questions are:

Isn't there a way to use an escape sequence to refer to a non-ASCII character numerically? Could I use that to rename the files?

Is there a way to make files whose names contain non-ASCII characters available and visible? Is it a matter of setting environment variables (LANG, LC_*, or something)?

Other recommendations?
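On the first question, a minimal sketch, assuming the names are stored as UTF-8: POSIX `printf` expands octal escapes in its format string, so a name can be built from its bytes without typing the accented letter. In UTF-8, "á" is the two bytes 0xC3 0xA1 (octal 303 241); if a Windows program wrote the name in Latin-1, it is the single byte 0xE1 (octal 341) instead. In bash, `$'Antonin Dvor\xc3\xa1k'` is an equivalent spelling. The scratch directory is only there to make the example self-contained. This helps when a name is merely hard to type; it cannot reach an entry that the filesystem itself fails to look up.

```shell
#!/bin/sh
# Demo setup (assumption): a scratch directory stands in for ./wmas.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch "$(printf 'Antonin Dvor\303\241k')"   # "á" as UTF-8 bytes 0xC3 0xA1

# Build the old name from octal byte escapes and rename it to plain ASCII.
# For a Latin-1 name, the escape would be 'Antonin Dvor\341k' (0xE1).
mv -- "$(printf 'Antonin Dvor\303\241k')" 'Antonin Dvorak'
ls
```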

My machine is:


[phil@frederic ~]$ uname -a
Linux frederic 2.6.20-1.2944.fc6xen #1 SMP Tue Apr 10 18:03:37 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
[phil@frederic ~]$

Thanks,
Phil

wneumann
14th July 2007, 12:14 AM
Use quotes:

mv "Antonin Dvorák" Antonin_Dvorak

will do what you want

ordinary
14th July 2007, 01:08 AM
No, wneumann, I like the simplicity of your idea, but it just doesn't work. Because of the non-ASCII characters, the file is more or less invisible. Well, sort of, anyway: it is visible enough that filename completion completes it, but not visible enough for mv or ls to find it.

Take a look at:


[phil@frederic wmas]$ mv "Antonin Dvorák" Antonin\ Dvorak
mv: cannot stat `Antonin Dvorák': No such file or directory
[phil@frederic wmas]$ mv 'Antonin Dvorák' Antonin\ Dvorak
mv: cannot stat `Antonin Dvorák': No such file or directory
[phil@frederic wmas]$ mv Antonin\ Dvorák Antonin\ Dvorak
mv: cannot stat `Antonin Dvorák': No such file or directory
[phil@frederic wmas]$ ls Antonin\ Dvorák
ls: Antonin Dvorák: No such file or directory
[phil@frederic wmas]$

In each command, I used filename completion to get the file name (i.e. Antonin\ Dvorák), but still, the file is invisible. Any other ideas?

Please note that the file really is there; otherwise, what name is bash completing, and what name is find (in the example in my previous post) reporting as "No such file..."?

Thanks, though,
Phil
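A common workaround when a name can't be typed reliably is to address the file by its inode number, which `ls -i` prints next to each name. A minimal sketch, assuming GNU find (`-maxdepth`, `-inum`, `-quit`); `rename_by_inode` is a hypothetical helper name. Note that `mv` still performs an ordinary lookup by name underneath, so this won't help if the directory itself is damaged, which is what the fsck run later in the thread points to.

```shell
#!/bin/sh
# Hypothetical helper: rename the entry in directory $1 whose inode number
# is $2 to the new name $3, without typing the old name.
# -quit stops find after the first successful mv.
rename_by_inode() {
  find "$1" -maxdepth 1 -inum "$2" -exec mv -- {} "$1/$3" \; -quit
}

# Demo setup (assumption): a scratch directory with one accented name.
demo=$(mktemp -d)
touch "$demo/$(printf 'Antonin Dvor\303\241k')"

# ls -i prints "<inode> <name>"; take the inode of the only entry.
ino=$(ls -i "$demo" | awk '{print $1}')
rename_by_inode "$demo" "$ino" 'Antonin Dvorak'
ls "$demo"
```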

Hlingler
14th July 2007, 05:53 AM
LANG is an environment variable:
[Vince@presario ~]$ declare|grep LANG
LANG=en_US.UTF-8
[Vince@presario ~]$

But I don't know the exact value to set it to that would give you what you want (export LANG=?????). Maybe try changing your global language setting? I also assume you'd need fonts that support it, so...?
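LANG and the LC_* variables only change how programs decode and display names; on ext3 the kernel stores a filename as an opaque string of bytes. Dumping those bytes shows what is actually on disk regardless of locale, including whether a name was written as UTF-8 or as Latin-1. A minimal sketch using `od`; `show_name_bytes` is a hypothetical helper name, and the two demo files are assumptions made for illustration. A quicker alternative is `LC_ALL=C ls -b`, which prints non-ASCII bytes as octal escapes.

```shell
#!/bin/sh
# Hypothetical helper: print each entry in directory $1 as hex bytes,
# so the stored encoding is visible independent of LANG/LC_*.
show_name_bytes() {
  for f in "$1"/*; do
    # An unmatched glob is left as the literal pattern; skip it.
    if [ "$f" = "$1/*" ] && [ ! -e "$f" ]; then
      continue
    fi
    name=${f##*/}
    # od -An -tx1: hex bytes without offsets; tr joins od's lines.
    printf '%s' "$name" | od -An -tx1 | tr -s ' \n' ' '
    printf '\n'
  done
}

# Demo setup (assumption): the same letter stored two different ways.
demo=$(mktemp -d)
touch "$demo/$(printf 'Dvor\303\241k')"   # UTF-8 "á": c3 a1
touch "$demo/$(printf 'Dvor\341k')"       # Latin-1 "á": e1
show_name_bytes "$demo"
```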

wneumann
15th July 2007, 09:30 AM
Funny. Works for me (I did 'touch "Antonin Dvorák"' to create the file and could then move and rename it at will).

Anyway, second suggestion: I've always found that dired in emacs will handle filenames that nothing else can touch. (Start emacs and then type Esc-x dired.)

lazlow
15th July 2007, 09:34 AM
You might try turning on Show Hidden Files under the View menu in Nautilus, or running:

ls -a

Just a couple of guesses.

ordinary
16th July 2007, 07:19 PM
Well, of course you're right, wneumann: I can touch and mv "Antonin Dvorák" too, and it works just as expected. My filesystem seems to be corrupt somehow. Interestingly, the corruption seems to affect only files with non-ASCII characters in their names, though not all of them.
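The symptom here, names that appear in a directory listing (completion, find, dired) but fail with "No such file or directory" when opened, can be checked for systematically: read the names with a glob, which goes through readdir, then test each one with `[ -e ]`, which looks it up by name. A minimal sketch; `list_unstatable` is a hypothetical helper name. On a healthy directory it prints nothing; on a directory like the ones fsck flags below, it should list the affected entries.

```shell
#!/bin/sh
# Hypothetical helper: list entries of directory $1 that a glob (readdir)
# returns but that cannot be looked up by name.
list_unstatable() {
  (
    cd "$1" || exit 1
    for f in * .*; do
      case $f in .|..) continue ;; esac
      # An unmatched glob is left as the literal pattern; skip it unless a
      # file with that literal name really exists.
      if [ "$f" = '*' ] || [ "$f" = '.*' ]; then
        [ -e "$f" ] || [ -L "$f" ] || continue
      fi
      # -e follows symlinks and -L catches dangling ones; if both fail,
      # the name was listed but the lookup failed.
      if [ ! -e "$f" ] && [ ! -L "$f" ]; then
        printf 'listed but not found: %s/%s\n' "$1" "$f"
      fi
    done
  )
}

# Usage, with a path taken from the thread:
#   list_unstatable /media/disk-3/all_formats/oggs
```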

(Please note that I had to move my diagnostic activities with this problem to an Ubuntu box. My FC6 box seems to have a flaky motherboard (an MSI K9N SLI Platinum). It always has been, but now I'm having CMOS checksum errors, and I don't want to fight with that while I fight with this. The filesystem in question is on an external USB disk, so it is easy enough to move from box to box.)

I unmounted the filesystem and ran fsck:

phil@selma:~$ sudo fsck -V /dev/sdb2 -f
fsck 1.40-WIP (14-Nov-2006)
[/sbin/fsck.ext3 (1) -- /dev/sdb2] fsck.ext3 -f /dev/sdb2
e2fsck 1.40-WIP (14-Nov-2006)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Problem in HTREE directory inode 2588675: node (2) has bad min hash
Invalid HTREE directory inode 2588675 (/all_formats/mp3s). Clear<y>? yes

Problem in HTREE directory inode 2592776: node (4) has bad min hash
Problem in HTREE directory inode 2592776: node (5) has bad min hash
Problem in HTREE directory inode 2592776: node (7) has bad max hash
Invalid HTREE directory inode 2592776 (/all_formats/oggs). Clear<y>? yes

Problem in HTREE directory inode 6000105: node (2) has bad min hash
Problem in HTREE directory inode 6000105: node (4) has bad min hash
Invalid HTREE directory inode 6000105 (/all_formats/wmas). Clear<y>? yes

Pass 3: Checking directory connectivity
Pass 3A: Optimizing directories
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/sdb2: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sdb2: 22683/24133632 files (10.4% non-contiguous), 25879068/48249219 blocks
phil@selma:~$

Then I remounted the disk and had the same problems. I unmounted it again and ran:

phil@selma:~$ sudo fsck -V /dev/sdb2 -f -p -v
Password:
fsck 1.40-WIP (14-Nov-2006)
[/sbin/fsck.ext3 (1) -- /dev/sdb2] fsck.ext3 -f -p -v /dev/sdb2

22683 inodes used (0.09%)
2357 non-contiguous inodes (10.4%)
# of inodes with ind/dind/tind blocks: 17917/9772/0
25879068 blocks used (53.64%)
0 bad blocks
1 large file

18216 regular files
4458 directories
0 character device files
0 block device files
0 fifos
0 links
0 symbolic links (0 fast symbolic links)
0 sockets
--------
22674 files
phil@selma:

So that looked okay, but my problem still existed, so I tried:


phil@selma:~$ sudo fsck -V /dev/sdb2 -f -c -c -v
Password:
fsck 1.40-WIP (14-Nov-2006)
[/sbin/fsck.ext3 (1) -- /dev/sdb2] fsck.ext3 -f -c -c -v /dev/sdb2
e2fsck 1.40-WIP (14-Nov-2006)
Checking for bad blocks (non-destructive read-write test)
Testing with random pattern: ^X 8442496/ 48249218
Interrupt caught, cleaning up
phil@selma:

But, because the filesystem is on an external USB disk, that was going to take a week and a half, so I stopped it.

In any case, the problems persist.

Thanks to those who read this, and especially those who replied.

Further suggestions are welcome. I dread re-ripping all those CDs.

Phil

ordinary
18th July 2007, 07:28 PM
Well, I gave up. From what I can figure out, UTF-8 characters should work fine in filenames, so I don't understand what created my problems. I tinkered with debugfs, hd, and different locales but didn't manage to fix the filenames. The affected files were backed up, bad filenames and all, so I'll restore them and try to keep the bad filenames from propagating back to the restored filesystem.
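One way to keep the problem names from coming back is to rename accented entries to plain ASCII, the goal stated at the start of the thread, before or after the restore. A minimal sketch: the substitution table in `to_ascii` (a hypothetical helper) is an illustrative assumption covering only letters that appear in this thread; the loop works on one directory level and assumes the names are UTF-8. Each pattern is a literal UTF-8 byte sequence, so sed matches it the same way under any locale. Existing targets are never overwritten.

```shell
#!/bin/sh
# Hypothetical helper: replace a few accented letters with ASCII.
# The table is illustrative, not exhaustive.
to_ascii() {
  printf '%s' "$1" | sed \
    -e 's/á/a/g' -e 's/é/e/g' -e 's/ä/a/g' -e 's/å/a/g' \
    -e 's/ö/o/g' -e 's/ü/u/g' -e 's/Ø/O/g'
}

# Demo setup (assumption): directory names taken from this thread.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir 'Antonin Dvorák' 'Frédéric Chopin' 'Georg Friedrich Händel' \
      'Schönhertz & Scott' 'Marty Robbins'

# Rename each entry in the current directory whose ASCII form differs.
for f in *; do
  new=$(to_ascii "$f")
  [ "$new" = "$f" ] && continue
  if [ -e "$new" ] || [ -L "$new" ]; then
    printf 'skipping %s: %s already exists\n' "$f" "$new" >&2
  else
    mv -- "$f" "$new"
  fi
done
ls
```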

Thanks everybody.

wneumann
20th July 2007, 10:41 PM
Did you try the emacs method I suggested? It has worked on the craziest filenames for me. Quick intro to emacs dired:

In a terminal, enter "emacs directory_name" (or simply "emacs ." if you are in the directory in question) to open emacs in dired mode on that directory. Then type h to get instructions if you don't know emacs. You can move around in the directory, type R on a file to rename it, d to mark it for deletion, x to execute pending deletions, etc. Finally, when you are all done, Ctrl-X Ctrl-C exits.

ordinary
22nd July 2007, 11:43 PM
Wneumann,

Frankly, I didn't. This was a very circuitous investigation, and mostly a fruitless one. I did a lot of rooting around with debugfs(8) and hd(1). Eventually I became convinced that the problem was in the filesystem itself, at some level. This is outside my expertise, so I'm not certain. But debugfs wouldn't even let me access the directories via inode numbers.

However, I have the files backed up on a different filesystem, and the same problems exist there, so I tried it today after reading your last post.

Emacs dired showed this in the directory listing:



drwxr-xr-x 3 phil phil 4096 2007-03-19 20:33 Ma Rainey And Her Georgia Band
drwxr-xr-x 3 phil phil 4096 2007-03-19 20:33 Marian McPartland
drwxr-xr-x 3 phil phil 4096 2007-03-19 20:33 Mark O'Connor
drwxr-xr-x 7 phil phil 4096 2007-03-28 15:23 Marty Robbins
drwxr-xr-x 3 phil phil 4096 2007-03-19 20:33 matchbox twenty
?--------- ? ? ? ? ? /media/disk-3/all_formats/oggs/./Béla Fleck and the Flecktones
?--------- ? ? ? ? ? /media/disk-3/all_formats/oggs/./Øystein Sevåg
drwxr-xr-x 4 phil phil 4096 2007-03-19 20:32 Memphis Minnie
drwxr-xr-x 4 phil phil 4096 2007-03-19 20:33 Mendelssohn
drwxr-xr-x 3 phil phil 4096 2007-03-19 20:33 Meri Wilson
drwxr-xr-x 3 phil phil 4096 2007-03-19 20:33 Merle Haggard


and when an affected file was selected, emacs responded:


No file on this line

Although emacs has surprised me over the years, I guess one can't expect even emacs to overcome faults of the underlying filesystem.

Thanks for your interest. I'm off to re-create that backup.

Phil

P.S. You've probably seen this; if not, you may enjoy it.


A young man studying in the temple went seeking the priest. He asked the priest 'Master, does Emacs possess the Buddha nature?' The priest had resided in the temple for a good many years, and was very wise. He thought for a while, and then answered: 'I don't see why not, it's got bloody well everything else.' The young man then achieved enlightenment.