What filename characters does Mac OS X support?
Post What filename characters does Mac OS X support? 
I'm trying to make rdiff-backup a little friendlier on Mac OS X now.
Does anyone know offhand what characters are allowable in a Mac OS X
filename?

I'm asking because I need to figure out exactly when we don't need to
quote characters. As long as the source directory is non-empty (and
has a filename that has a letter in it), then we can tell whether that
directory is case-sensitive. But it's harder to tell whether or not a
source (and thus read-only) directory supports characters like a colon
or a backslash.

So if Mac OS X supports everything except case sensitivity (and '/'
and NULL), that is easy to handle. But if other support is missing,
things could become more complicated.
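A minimal sketch of the read-only case-sensitivity test described above, in Python. The helper name `is_case_sensitive` is hypothetical, and it assumes that looking up the swapped-case spelling of an existing entry is enough to tell whether the directory folds case. This is an illustration, not rdiff-backup's actual code.

```python
import os


def is_case_sensitive(directory):
    """Guess, without writing anything, whether `directory` is case-sensitive.

    Hypothetical helper. Returns True or False, or None when no entry has a
    name containing a cased letter (so there is nothing to test with).
    """
    for name in os.listdir(directory):
        swapped = name.swapcase()
        if swapped == name:
            continue  # e.g. "123": swapping case changes nothing
        original = os.path.join(directory, name)
        variant = os.path.join(directory, swapped)
        try:
            variant_stat = os.lstat(variant)
        except FileNotFoundError:
            # The other spelling does not resolve at all: case matters.
            return True
        # Same device and inode means both spellings name one file, so the
        # filesystem folds case; two different files means it does not.
        return not os.path.samestat(os.lstat(original), variant_stat)
    return None
```

On a default HFS+ home directory this would be expected to return False. Names such as "ß", whose swapped case has a different length, could mislead it, so a real implementation would want to prefer plain ASCII letters.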


--
Ben Escoto

Post What filename characters does Mac OS X support? 
On 20 Oct 2005, at 21:26, Ben Escoto wrote:

> I'm trying to make rdiff-backup a little friendlier on Mac OS X now.
> Does anyone know offhand what characters are allowable in a Mac OS X
> filename?
>
> I'm asking because I need to figure out exactly when we don't need to
> quote characters. As long as the source directory is non-empty (and
> has a filename that has a letter in it), then we can tell whether that
> directory is case-sensitive. But it's harder to tell whether or not a
> source (and thus read-only) directory supports characters like a colon
> or a backslash.
>
> So if Mac OS X supports everything except case sensitivity (and '/'
> and NULL), that is easy to handle. But if other support is missing,
> things could become more complicated.

At the GUI level, it seems that OS X allows any character except a
colon and NULL. "/" characters are legal, but the GUI translates
them to ":" behind the scenes at the unix level. So a file named
"crazy/name.txt" at the GUI level is actually named "crazy:name.txt"
at the unix level.
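The translation Kevin describes amounts to swapping the two separators. A sketch in Python; the function names are hypothetical, and the reverse direction (a ':' in a Unix-level name being shown as '/' in the Finder) is assumed to be the mirror image of the rule above.

```python
def gui_to_posix(name: str) -> str:
    """Finder-level name -> Unix-level name: '/' is stored as ':'."""
    return name.replace("/", ":")


def posix_to_gui(name: str) -> str:
    """Unix-level name -> Finder-level name: ':' is displayed as '/'."""
    return name.replace(":", "/")


print(gui_to_posix("crazy/name.txt"))  # crazy:name.txt
```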

This strange system of translating "/" to ":" is due to the fact that
OS 9 and earlier used ":" as the path specifier - the equivalent of
"/" in unix, or "\" in DOS. "/" was legal in filenames in OS 9.
When OS X came along, they needed a way to handle users who had
legacy files with a "/" in them, so they came up with this scheme to
translate them to ":".

At the unix level where rdiff-backup works, every character appears
to be legal except "/" and NULL.
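Kevin's Unix-level rule can be written as a check on a single name component, treated as raw bytes the way a C program sees it. This is a sketch with a hypothetical function name; besides '/' and NULL it also rejects the empty name and the reserved entries "." and "..", which cannot be ordinary file names on any POSIX system. It does not model any extra restrictions a particular filesystem may add.

```python
def is_legal_posix_name(name: bytes) -> bool:
    """True if `name` could be a single path component at the Unix level."""
    if name in (b"", b".", b".."):
        return False  # empty name and the reserved directory entries
    # '/' separates path components and NUL terminates C strings,
    # so neither can appear inside a component.
    return b"/" not in name and b"\0" not in name
```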

The above info is based on what the OS X Help system says, and what I
can find on Google. It is not guaranteed to be 100% complete, but I
am 99.9% sure it is good info.

http://docs.info.apple.com/article.html?path=Mac/10.4/en/mh552.html
http://lists.seas.upenn.edu/pipermail/unison-hackers/2005-March/000006.html

Kevin Horton
Ottawa, Canada

Post What filename characters does Mac OS X support? 
Ben Escoto wrote:

> I'm asking because I need to figure out exactly when we don't need to
> quote characters. As long as the source directory is non-empty (and
> has a filename that has a letter in it), then we can tell whether that
> directory is case-sensitive. But it's harder to tell whether or not a
> source (and thus read-only) directory supports characters like a colon
> or a backslash.

OS X Tiger comes with several filesystems:
Mac OS Extended, a.k.a. HFS+ (with and without journaling)
Mac OS Extended (case-sensitive), a.k.a. HFSX (with and without journaling)
UNIX File System, a.k.a. UFS

I think it can also access all Windows filesystems, but NTFS can only
be read.

The normally used HFS+ filesystem is not case-sensitive, but it is
case-preserving.
The new HFSX filesystem is case-sensitive.

Optimal would be to check source and destination and quote only if the
source is case-sensitive and the destination is only case-preserving.
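Carsten's rule reduces to a one-line predicate. A sketch with a hypothetical function name; it only covers case, not the other character restrictions discussed in this thread.

```python
def needs_case_quoting(source_case_sensitive: bool,
                       dest_case_sensitive: bool) -> bool:
    """Quote case only if the source can hold names that differ only by case
    and the destination cannot keep them apart."""
    return source_case_sensitive and not dest_case_sensitive


# HFS+ -> HFS+: both case-insensitive, so no quoting is needed.
print(needs_case_quoting(False, False))  # False
# HFSX -> HFS+: "SomeFile" and "somefile" could collide.
print(needs_case_quoting(True, False))   # True
```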

Carsten

Post What filename characters does Mac OS X support? 
On 21/10/2005, at 11:26 AM, Ben Escoto wrote:

> I'm trying to make rdiff-backup a little friendlier on Mac OS X now.
> Does anyone know offhand what characters are allowable in a Mac OS X
> filename?

Technote 1150 describes the HFS Plus volume format in great detail:
http://developer.apple.com/technotes/tn/tn1150.html

In particular, there's a table with a list of illegal characters
here: http://developer.apple.com/technotes/tn/tn1150table.html

> I'm asking because I need to figure out exactly when we don't need to
> quote characters. As long as the source directory is non-empty (and
> has a filename that has a letter in it), then we can tell whether that
> directory is case-sensitive. But it's harder to tell whether or not a
> source (and thus read-only) directory supports characters like a colon
> or a backslash.

Maybe it's because I'm new to rdiff-backup, but I can't understand
why you need to determine the capabilities of the source file system?

Post What filename characters does Mac OS X support? 
"Carsten Lorenz" <clorenz < at > hamburg.fcb.com>
wrote the following on Fri, 21 Oct 2005 10:30:55 +0200

> Optimal would be to check source and destination and quote only if the
> source is case-sensitive and the destination is only case-preserving.

Yes, that is the plan. I thought there might be a complication
because we might have needed to quote ':' or whatever even if both
systems were case insensitive, but it looks like that's not the case.


--
Ben Escoto

Post What filename characters does Mac OS X support? 
Alastair Rankine <arsptr < at > optusnet.com.au>
wrote the following on Sat, 22 Oct 2005 08:29:26 +1000

> Technote 1150 describes the HFS Plus volume format in great detail:
> http://developer.apple.com/technotes/tn/tn1150.html
>
> In particular, there's a table with a list of illegal characters
> here: http://developer.apple.com/technotes/tn/tn1150table.html

Thanks for the references. Unfortunately they're in unicode and I
don't know enough to translate them to ascii offhand. Kevin Horton's
message suggests that all the standard unix characters should be fine
though.

If anyone wants to be more precise about this, by looking at
Alastair's table and translating it into normal unix calls (which deal
with C char *'s), let me know what you come up with.

> Maybe it's because I'm new to rdiff-backup, but I can't understand
> why you need to determine the capabilities of the source file
> system?

Under the old system we didn't check the source, just the destination
(as in your scheme). This worked ok, but led to unnecessary quoting.
For instance in a Mac OS X -> Mac OS X backup, rdiff-backup would
quote all uppercase characters.

If we determine the capabilities of the source too, then we can quote
only what needs to be.


--
Ben Escoto

Post What filename characters does Mac OS X support? 
On 22/10/2005, at 12:35 PM, Ben Escoto wrote:
>> In particular, there's a table with a list of illegal characters
>> here: http://developer.apple.com/technotes/tn/tn1150table.html
>
> Thanks for the references. Unfortunately they're in unicode and I
> don't know enough to translate them to ascii offhand. Kevin Horton's
> message suggests that all the standard unix characters should be fine
> though.

Ben, I don't know what you mean by "translate [unicode characters] to ascii"? This just isn't possible, but perhaps you mean translate these characters to UTF-8 (i.e. char * in C)? In which case you should look at the "encode" Python string methods, and/or the libiconv C library.

However: After some further investigation I'm not entirely sure you need to worry about that table of illegal unicode characters I quoted earlier. I just ran the following experiment:

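#!/usr/bin/python
# -*- coding: utf-8 -*-
# Create three files whose names start with "é", spelled three ways.
open(u"é composed char", "w").close()                  # literal precomposed é (U+00E9)
open(u"\u00e9 escaped composed", "w").close()          # the same character, as an escape
open(u"\u0065\u0301 escaped decomposed", "w").close()  # "e" + COMBINING ACUTE ACCENT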

This resulted in the é character being successfully inserted into each of the three output filenames. (I'd include output of "ls" here, but it doesn't seem to be unicode aware). So even though U+00E9 is explicitly designated as an illegal character by the filesystem specification, it looks like the OS is silently taking care of the required decomposition into the U+0065, U+0301 sequence on disk.

So although the on-disk format requires some Unicode characters to be decomposed, in practice it doesn't seem to make any difference: the OS takes care of the correct on-disk representation. Interestingly, when the names are read back, the OS returns the decomposed form for all three files, including the ones created with the precomposed character:

os.listdir(u".")
[u'e\u0301 composed char', u'e\u0301 escaped composed', u'e\u0301 escaped decomposed']

This is not important for rdiff-backup, just an interesting aside.
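The same effect can be reproduced without touching the filesystem, using Python's `unicodedata` module. The comparison helper is hypothetical, and NFD is only an approximation here: HFS+ stores its own variant of Unicode canonical decomposition, and it also ignores case, which this sketch does not.

```python
import unicodedata

precomposed = "\u00e9"   # é as one code point (U+00E9)
decomposed = "e\u0301"   # e + COMBINING ACUTE ACCENT (U+0301)

# As plain strings, the two spellings are different...
print(precomposed == decomposed)                                 # False
# ...but they have the same canonical decomposition.
print(unicodedata.normalize("NFD", precomposed) == decomposed)   # True


def same_name_modulo_normalization(a: str, b: str) -> bool:
    """Compare two file names after decomposing both (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)
```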

Anyway, it seems that any Unicode character is usable in Mac OS X filenames.

>> Maybe it's because I'm new to rdiff-backup, but I can't understand
>> why you need to determine the capabilities of the source file
>> system?
>
> Under the old system we didn't check the source, just the destination
> (as in your scheme). This worked ok, but led to unnecessary quoting.
> For instance in a Mac OS X -> Mac OS X backup, rdiff-backup would
> quote all uppercase characters.

I'm sorry I still don't get it. If the destination filesystem is case *preserving* (which in this case it is), surely this removes the need for unnecessary quoting?

Post What filename characters does Mac OS X support? 
On 23 Oct 2005, at 01:56, Alastair Rankine wrote:

> On 22/10/2005, at 12:35 PM, Ben Escoto wrote:
>> Under the old system we didn't check the source, just the destination
>> (as in your scheme). This worked ok, but led to unnecessary quoting.
>> For instance in a Mac OS X -> Mac OS X backup, rdiff-backup would
>> quote all uppercase characters.
>
> I'm sorry I still don't get it. If the destination filesystem is
> case *preserving* (which in this case it is), surely this removes
> the need for unnecessary quoting?

The HFS+ file system is case preserving, but case insensitive. For
example, a file named "SomeFile" will overwrite a file named "somefile",
so these two filenames cannot coexist.

Imagine that the source has a file system that is case sensitive, so
it could have both those file names. If the user does a backup onto
an HFS+ volume we have a problem unless we somehow deal with the case
issue. Rdiff-backup deals with this by quoting the upper case
characters. However, it does it in some situations where it is not
necessary, i.e. if both the source and destination are HFS+.
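A sketch of what "quoting the upper case characters" can look like. The ';' escape character and the three-digit decimal code are assumptions made for this illustration, modelled loosely on rdiff-backup's approach; the function names are hypothetical. The escape character itself must also be quoted so that the mapping can be reversed.

```python
import re

QUOTE_CHAR = ";"  # assumed escape character for this sketch


def quote_uppercase(name: str) -> str:
    """Replace A-Z (and the escape character) with ';' plus a 3-digit code."""
    out = []
    for ch in name:
        if "A" <= ch <= "Z" or ch == QUOTE_CHAR:
            out.append("%s%03d" % (QUOTE_CHAR, ord(ch)))
        else:
            out.append(ch)
    return "".join(out)


def unquote(name: str) -> str:
    """Invert quote_uppercase()."""
    return re.sub(re.escape(QUOTE_CHAR) + r"(\d{3})",
                  lambda m: chr(int(m.group(1))), name)


# "SomeFile" and "somefile" no longer collide on a case-insensitive volume.
print(quote_uppercase("SomeFile"))  # ;083ome;070ile
```

The cost Ben mentions is visible here: every capital letter grows to four characters, which is wasted effort when the source could never have held both spellings in the first place.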

Kevin Horton
Ottawa, Canada

Post What filename characters does Mac OS X support? 
Alastair Rankine <arsptr < at > optusnet.com.au>
wrote the following on Sun, 23 Oct 2005 15:56:53 +1000

> Ben, I don't know what you mean by "translate [unicode characters] to
> ascii"? This just isn't possible, but perhaps you mean translate
> these characters to UTF-8 (ie char * in C)? In which case you should
> look at the "encode" python string methods, and/or the libiconv C
> library.

Well I was just hoping to remain ignorant of unicode. Regardless of
what the unicode descriptions of the files are, if the files can be
processed with standard unix functions like open(char *, int) then
their filenames get represented as a collection of bytes (which I
improperly called ascii).

So I was hoping to deal with filenames just as byte arrays, and not
worry what they represent and if they are unicode or whatever.
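The byte-array approach is easy to sketch in Python 3: passing a bytes path to `os.listdir` returns raw bytes names with no decoding step. The catch is that the same visible name can have different byte sequences, as the two spellings of é from Alastair's experiment show. The helper name is hypothetical.

```python
import os


def list_raw_names(directory: bytes = b".") -> list:
    """Return directory entries as raw bytes, like C char * filenames."""
    return os.listdir(directory)  # bytes in, bytes out


# The same visible "é" as two different UTF-8 byte strings.
composed = "\u00e9".encode("utf-8")
decomposed = "e\u0301".encode("utf-8")
print(composed)    # b'\xc3\xa9'
print(decomposed)  # b'e\xcc\x81'
```

Comparing names byte for byte therefore works only as long as both sides agree on the normalization form.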


--
Ben Escoto

Post What filename characters does Mac OS X support? 
"Carsten Lorenz" <clorenz < at > hamburg.fcb.com>
wrote the following on Mon, 24 Oct 2005 15:40:30 +0200

> The lowest-numbered character mentioned in this table is 0x00C0, which
> is À. If you enter "touch \300", the answer is "touch: À: Invalid
> argument". It must be replaced by the alternative Unicode encoding
> 0x0041 0x0300, which is A followed by a combining grave accent (`).

Hmm, so the quoting behavior in 1.1.0 isn't foolproof, in that there
could be some filesystem that was case insensitive, yet still
supported a filename of À ("\300"). If you tried to back this
filesystem up to an HFS+ system, then rdiff-backup wouldn't quote at
all, and would barf on the À files.

I don't think there is a read-only way to test for this stuff though,
and in practice I don't know if this will ever come up.
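The filesystem's capability may not be testable read-only, but individual source names can be checked without writing anything. A sketch, assuming (as Carsten's `touch` result suggests) that HFS+ rejects names whose bytes are not valid UTF-8; the function name is hypothetical, and passing this check is necessary but not sufficient.

```python
def is_valid_utf8(name: bytes) -> bool:
    """True if the raw filename bytes decode as UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(b"\300"))                    # False: a lone 0xC0 byte is never valid UTF-8
print(is_valid_utf8("A\u0300".encode("utf-8")))  # True: decomposed À is b'A\xcc\x80'
```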


--
Ben Escoto

Post What filename characters does Mac OS X support? 
On 25/10/2005, at 2:29 PM, Ben Escoto wrote:

> So I was hoping to deal with filenames just as byte arrays, and not
> worry what they represent and if they are unicode or whatever.

UTF-8 lets you do that, as long as you don't make the assumption
that one byte == one character. ASCII characters take one byte, but
characters from the more esoteric parts of the Unicode character set
take more: up to four bytes each in current UTF-8 (the original design
allowed up to six).
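A quick way to see the variable width: encode a few characters and count the bytes. The sample characters are arbitrary choices for illustration.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


for ch in ["A", "\u00e9", "\u20ac", "\U0001d11e"]:  # A, é, €, musical G clef
    print("U+%04X -> %d byte(s)" % (ord(ch), utf8_length(ch)))
```

This is also why truncating a filename to a fixed number of bytes can split a character in half.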
