Bob Klahn wrote:
How should I handle phrases such as "Déjà
vu"? Externally, the é and à are recorded as hex
82 and hex 85 respectively; internally, they
should presumably be hex E9 and hex E0
respectively. How do I get "Déjà vu" back from
Unicode to extended ASCII?
···
---
Judging by the values you give (é: hex 82, à: hex 85), the data *probably*
comes from the cp850 table, used in the DOS world. The vertical bar, hex B3, is
not a letter but a "box-drawing character" used in the DOS world.
See http://fr.wikipedia.org/wiki/Page_de_code_850
However, I should say cp850 does not fit exactly with your proposed
"extended ASCII" table.
In a DOS box on my Windows machine, set up for Western European
languages (cp850), Python yields this (note the 82 and 85):
>>> s = 'Déjà vu'
>>> s
'D\x82j\x85 vu'
>>>
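The code page guess can also be checked directly by decoding the byte values from the question with a few candidate tables and comparing the results. The sketch below is written in Python 3 syntax (the sessions in this message are Python 2), and the list of candidate tables is my own choice, not something taken from the question.

```python
# Byte values from the question: 0x82, 0x85 and the "vertical bar" 0xB3.
raw = bytes([0x82, 0x85, 0xB3])

# Candidate tables (an assumption; add whichever ones you suspect).
candidates = ["cp850", "cp437", "cp1252"]

decoded = {}
for name in candidates:
    # errors="replace" keeps the loop going if a byte is undefined in a table.
    decoded[name] = raw.decode(name, errors="replace")
    # Show the resulting code points so the output does not depend
    # on what the console itself can display.
    print(name, [hex(ord(ch)) for ch in decoded[name]])
```

For these three bytes, cp850 and cp437 both give é, à and a box-drawing bar, so the values alone cannot tell the two DOS tables apart; cp1252 gives entirely different characters.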
Now, on Windows with cp1252 as the table, I can mimic the
'Déjà vu' string and convert it to a unicode object.
>>> s = 'Déjà vu'
>>> s
'D\xe9j\xe0 vu'
>>> isinstance(s, str)
True
>>> u = s.decode('cp1252')
>>> isinstance(u, unicode)
True
Once the unicode object is created, it can be encoded
into something else.
>>> s = u.encode('cp850')
>>> s
'D\x82j\x85 vu'
>>> isinstance(s, str)
True
82 and 85 again!
or
>>> u.encode('cp1252')
'D\xe9j\xe0 vu'
>>> u.encode('iso-8859-1')
'D\xe9j\xe0 vu'
>>> u.encode('utf-8')
'D\xc3\xa9j\xc3\xa0 vu'
>>> u.encode('utf-16')
'\xff\xfeD\x00\xe9\x00j\x00\xe0\x00 \x00v\x00u\x00'
>>> u.encode('raw_unicode_escape')
'D\xe9j\xe0 vu'
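The decode/encode pair shown in these sessions can be wrapped into a small helper. This is only a sketch, in Python 3 syntax (where byte strings are `bytes` and text is `str`); the name `transcode` and its parameters are hypothetical and not part of any library.

```python
def transcode(data, source, target, errors="strict"):
    """Re-encode the byte string `data` from table `source` to table `target`.

    `transcode` is a hypothetical helper name. With errors="strict", a
    character missing from the target table raises UnicodeEncodeError;
    errors="replace" substitutes "?" instead.
    """
    text = data.decode(source)            # bytes -> Unicode text
    return text.encode(target, errors)    # Unicode text -> bytes


dos_bytes = b"D\x82j\x85 vu"              # "Déjà vu" as stored under cp850
win_bytes = transcode(dos_bytes, "cp850", "cp1252")
print(win_bytes)                          # the same text in the Windows table
print(transcode(win_bytes, "cp1252", "cp850") == dos_bytes)

# The box-drawing character 0xB3 exists in cp850 but not in cp1252,
# so it needs a non-strict error handler to get through.
print(transcode(b"\xb3", "cp850", "cp1252", errors="replace"))
```

Going through Unicode in the middle means the helper never needs a direct cp850-to-cp1252 mapping; any pair of tables Python knows about will work, as long as the target table contains the characters involved.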
A side note: this encoding/decoding work is done at the Python
level and has nothing to do with whether you use the ANSI or the
unicode build of wxPython. Which build you work with is up to you.
Hope that helps.
Jean-Michel Fauth, Switzerland