Unicode Encoding For Filesystem In Mac OS X Not Correct In Python?
Solution 1:
Mac OS X stores filenames as a special kind of decomposed UTF-8 (a variant of Unicode normalization form D). If you need to, for example, read filenames and write them to a "normal" (composed) UTF-8 file, you must normalize them first:
import unicodedata
filename = unicodedata.normalize('NFC', unicode(filename, 'utf-8')).encode('utf-8')
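The one-liner above is Python 2 (it relies on the `unicode()` builtin). A rough Python 3 sketch of the same idea might look like the following; the helper name `to_nfc` and the sample filename are made up for illustration.

```python
import unicodedata


def to_nfc(name):
    """Return a filename as an NFC-normalized str.

    Hypothetical helper: accepts either str or UTF-8 encoded bytes.
    """
    if isinstance(name, bytes):
        name = name.decode("utf-8")
    return unicodedata.normalize("NFC", name)


# Illustrative name as a decomposing filesystem might return it:
# 'o' followed by U+0308 COMBINING DIAERESIS.
decomposed = "o\u0308.txt"
composed = to_nfc(decomposed)

print(len(decomposed), len(composed))  # 6 5
```

If the result has to go into a UTF-8 file as bytes, `composed.encode("utf-8")` would be the last step, mirroring the `.encode('utf-8')` in the original line.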
Solution 2:
sys.getfilesystemencoding() is giving you the correct answer (the encoding), but it does not tell you the Unicode normalisation form.
In particular, the HFS+ filesystem uses UTF-8 encoding and a normalisation form close to "D" (which requires composed characters like ö to be decomposed into o followed by a combining ¨). HFS+ is also tied to the normalisation form as it existed in Unicode version 3.2, as detailed in Apple's documentation for the HFS+ format.
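To make the encoding-versus-normalisation distinction concrete, here is a small sketch (Python 3, standard library only). It uses standard NFD as a stand-in for the HFS+ variant, which is an approximation, and the sample character is just for illustration.

```python
import sys
import unicodedata

# The encoding is reported, but it says nothing about normalisation.
print(sys.getfilesystemencoding())

composed = "\u00f6"  # 'ö' as a single code point (NFC)
decomposed = unicodedata.normalize("NFD", composed)  # 'o' + U+0308

# Both strings are valid in the same encoding, yet the bytes differ.
composed_bytes = composed.encode("utf-8")      # b'\xc3\xb6'
decomposed_bytes = decomposed.encode("utf-8")  # b'o\xcc\x88'

print(composed == decomposed)  # False
```

So two filenames that look identical on screen can compare unequal unless both are brought to the same normalisation form first.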
Python's unicodedata.normalize function converts between forms, and if you call it on the ucd_3_2_0 object instead, you can constrain it to Unicode version 3.2:
import unicodedata
filename = unicodedata.ucd_3_2_0.normalize('NFC', unicode(filename, 'utf-8')).encode('utf-8')
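For Python 3, a minimal sketch of the same approach could look like this. `unicodedata.ucd_3_2_0` is part of the standard library; the helper `same_filename` and the sample name are hypothetical, and comparing in NFD under Unicode 3.2 only approximates what HFS+ does.

```python
import unicodedata

# Same API as the unicodedata module, but frozen at Unicode 3.2.
ucd32 = unicodedata.ucd_3_2_0
print(ucd32.unidata_version)  # 3.2.0

# Illustrative name in decomposed form ('o' + U+0308).
name_from_disk = "Bjo\u0308rk.mp3"

composed = ucd32.normalize("NFC", name_from_disk)
decomposed = ucd32.normalize("NFD", composed)


def same_filename(a, b):
    """Hypothetical helper: compare names after normalising both to NFD."""
    return ucd32.normalize("NFD", a) == ucd32.normalize("NFD", b)


print(same_filename(name_from_disk, composed))  # True
```

Normalising both sides before comparing avoids depending on which form a particular API happened to return.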