Mystery: wx.grid, a filter function, and Unicode

Bob Klahn wrote:

How should I handle phrases such as "Déjà
vu"? Externally, the é and à are recorded as hex
82 and hex 85 respectively; internally, they
should presumably be hex E9 and hex E0
respectively. How do I get "Déjà vu" back from
Unicode to extended ASCII?


---

From what I read (é = hex 82, à = hex 85), your data *probably* comes
from code page cp850, used in the DOS world. The vertical bar, hex B3,
is not a text character but a box-drawing character from the same DOS
code page.
See http://fr.wikipedia.org/wiki/Page_de_code_850
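A quick way to check the cp850 guess, sketched in Python 3 (where byte strings are `bytes` and text is `str`, unlike the Python 2 sessions in this thread): decode the raw bytes and look at what comes out. The sample byte string is hypothetical, built only from the bytes mentioned above.

```python
import unicodedata

# Hypothetical sample: "Déjà vu" as stored externally, followed by the hex B3 bar.
raw = b"D\x82j\x85 vu \xb3"

text = raw.decode("cp850")   # bytes -> str using the DOS code page
print(ascii(text))           # 'D\xe9j\xe0 vu \u2502'

# The last character is a box-drawing glyph, not a letter.
print(unicodedata.name(text[-1]))  # BOX DRAWINGS LIGHT VERTICAL
```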

However, I should point out that cp850 does not match your proposed
"extended ASCII" table exactly.

In a DOS box on my Windows machine, set up for Western European
languages (cp850), Python yields this (note the 82 and 85):

>>> s = 'Déjà vu'
>>> s
'D\x82j\x85 vu'
>>>
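Since bytes 82 and 85 are é and à in more than one DOS code page, matching a few bytes only narrows the field. A small Python 3 helper (hypothetical, not from the thread) can list which candidate codecs decode a known sample to the expected text:

```python
def matching_codecs(raw, expected, candidates):
    """Return the candidate codecs that decode raw to exactly the expected text."""
    matches = []
    for codec in candidates:
        try:
            if raw.decode(codec) == expected:
                matches.append(codec)
        except UnicodeDecodeError:
            pass  # a byte is undefined in this code page: not a match
    return matches

candidates = ["cp437", "cp850", "cp1252", "latin-1"]
print(matching_codecs(b"D\x82j\x85 vu", "D\u00e9j\u00e0 vu", candidates))
# ['cp437', 'cp850']

# A byte where the two DOS code pages differ tells them apart.
print(ascii(b"\x9b".decode("cp437")), ascii(b"\x9b".decode("cp850")))
# '\xa2' '\xf8'
```

With only 82 and 85 to go on, cp437 (the original IBM PC code page) fits as well as cp850; a sample containing a byte such as 9B (¢ in cp437, ø in cp850) would settle it.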

Now, on Windows using cp1252 as the code page, I can reproduce the
'Déjà vu' string and convert it to a unicode object:

>>> s = 'Déjà vu'
>>> s
'D\xe9j\xe0 vu'
>>> isinstance(s, str)
True
>>> u = s.decode('cp1252')
>>> isinstance(u, unicode)
True

Once the unicode object exists, it can be encoded into another
code page:

>>> s = u.encode('cp850')
>>> s
'D\x82j\x85 vu'
>>> isinstance(s, str)
True

82 and 85 again!
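In Python 3 the same round trip looks like this; `bytes` takes the place of the Python 2 `str`, and `str` takes the place of `unicode`. This is a sketch of the steps above, not code from the thread:

```python
windows_bytes = b"D\xe9j\xe0 vu"          # 'Déjà vu' as cp1252 bytes
text = windows_bytes.decode("cp1252")      # bytes -> str (Unicode)
dos_bytes = text.encode("cp850")           # str -> bytes for the DOS side
print(dos_bytes)                           # b'D\x82j\x85 vu'

# Decoding with the same code page gives the original text back.
assert dos_bytes.decode("cp850") == text
```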

Or, encoding to other targets:

>>> u.encode('cp1252')
'D\xe9j\xe0 vu'
>>> u.encode('iso-8859-1')
'D\xe9j\xe0 vu'
>>> u.encode('utf-8')
'D\xc3\xa9j\xc3\xa0 vu'
>>> u.encode('utf-16')
'\xff\xfeD\x00\xe9\x00j\x00\xe0\x00 \x00v\x00u\x00'
>>> u.encode('raw_unicode_escape')
'D\xe9j\xe0 vu'
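One caveat to converting "into something else": the target code page must contain every character in the text. A Python 3 sketch, using a hypothetical euro sign (which cp850 lacks but cp1252 has), shows the error and the `errors=` escape hatch:

```python
text = "D\u00e9j\u00e0 vu \u20ac"   # hypothetical: 'Déjà vu' plus a euro sign

try:
    text.encode("cp850")             # cp850 has no euro sign
except UnicodeEncodeError as exc:
    print("cp850 cannot encode:", ascii(exc.object[exc.start:exc.end]))

print(text.encode("cp850", errors="replace"))  # b'D\x82j\x85 vu ?'
print(text.encode("cp1252"))                   # b'D\xe9j\xe0 vu \x80'
```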

A side note: this encoding/decoding is done at the Python level and
has nothing to do with whether you use the ANSI or the Unicode build
of wxPython. Which build you work with is up to you.
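Doing the conversion at the Python level usually means decoding once where bytes enter the program and encoding once where they leave, so the GUI code in between sees only Unicode text. A Python 3 sketch of that pattern, with hypothetical helper names and an in-memory stand-in for a cp850 file:

```python
import io

def read_legacy_lines(raw_bytes, encoding="cp850"):
    """Decode legacy bytes once, at the input boundary; return str lines."""
    # TextIOWrapper decodes and, with its default newline=None, turns \r\n into \n.
    with io.TextIOWrapper(io.BytesIO(raw_bytes), encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]

def write_legacy_text(lines, encoding="cp850"):
    """Encode str lines once, at the output boundary, with DOS line endings."""
    return "".join(line + "\r\n" for line in lines).encode(encoding)

data = b"D\x82j\x85 vu\r\nna\x8bve\r\n"   # hypothetical cp850 file contents
lines = read_legacy_lines(data)
print(ascii(lines))                        # ['D\xe9j\xe0 vu', 'na\xefve']
print(write_legacy_text(lines) == data)    # True
```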

Hope that helps.

Jean-Michel Fauth, Switzerland

Thanks, Jean-Michel; cp850 was just what I needed. Recently it’s
become clearer than ever that trying to use anything but Unicode with
wxPython would continue to give me untold grief. So now, after
quite a number of coding changes this evening, my application, which
wasn’t using explicit Unicode anywhere, uses Unicode
throughout.

BTW, one of the coding changes was to replace my makefilter function with
this CharFilter class:

```python
class CharFilter(object):
    """
    Given a string of Unicode characters to (a) keep or (b) delete,
    build a filtering function that, applied to any string s,
    returns a copy of s containing
        (a) only the characters to be kept, or
        (b) all but the characters to be deleted.
    """
    def __init__(self, chars, delete=True):
        # Store code points, since unicode.translate() looks up ordinals.
        self.chars  = set(map(ord, chars))
        self.delete = delete

    def __getitem__(self, n):
        # translate() calls this for each code point n;
        # returning None removes the character.
        if self.delete:
            if n in self.chars: return None
        else:
            if n not in self.chars: return None
        return unichr(n)   # keep the character unchanged

    def __call__(self, s):
        return unicode(s).translate(self)
```

Thanks again for your help! I wasn’t aware of cp850.

Bob
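For readers on Python 3, where `unicode` and `unichr` no longer exist, the CharFilter idea carries over almost unchanged: `str.translate()` accepts any object with `__getitem__`, deletes characters mapped to `None`, and keeps a character whose code point is mapped to itself. This port is a sketch, not Bob's code:

```python
class CharFilter:
    """Keep only, or delete, a given set of characters (Python 3 sketch)."""

    def __init__(self, chars, delete=True):
        self.chars = set(map(ord, chars))  # translate() looks up code points
        self.delete = delete

    def __getitem__(self, n):
        # Drop n when it is in the set and we delete, or not in the set and we keep.
        if (n in self.chars) == self.delete:
            return None
        return n  # map the code point to itself: character unchanged

    def __call__(self, s):
        return str(s).translate(self)

strip_bars = CharFilter("\u2502")                     # delete the box-drawing bar
digits_only = CharFilter("0123456789", delete=False)  # keep only digits
print(ascii(strip_bars("D\u00e9j\u00e0 vu \u2502")))  # 'D\xe9j\xe0 vu '
print(digits_only("Tel. 01-23"))                      # 0123
```

For the delete case alone, `str.maketrans("", "", chars)` builds an equivalent table; the class remains handy for the keep case, where a plain table would otherwise have to list every other code point.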


At 05:46 PM 1/3/2008, Jean-Michel wrote:

[...]

