wxPython and Unicode and widgets

Hi All,

Chris Mellon wrote:
>> Christopher Barker wrote:
>>
>> To make that clear with your example:
>>
>> 2. "Right" way:
>> msg = u"Could be a fatal string"
>> wx.MessageBox( msg, "", wx.OK )
>>
>> so pass only unicode objects to wxPython.
>>
>> Where did your "could be a fatal string" come from? when you get that is
>> when you should handle the decoding, as Chris (the other one) said.
>>
>> ----
>>
>> Basically you are right. But you forget, in my mind, one important effect
>> which is coming from the Python side. Python "speaks" better iso-8859-1 than
>> cp1252 (win platform). This is especially true for the str <--> unicode
>> conversions.
>>
>
> Sorry, but this just isn't true. Python "speaks" both of them perfectly well.
>
>> If one works in a pure <str>-type mode, eg cp1252, on a win platform, using the
>> wxPython ansi build, then there is a proper cp1252-ANSI mapping and it avoids
>> some annoying side effects.
>>
>
> "str" is a sequence of bytes. The default conversion to use when
> converting to unicode is ascii,

Not always. Python's default can be changed from the site.py file. And
for automatic conversions done in wxPython (passing a string to a
wxString parameter in a Unicode build, or passing a Unicode object in an
ANSI build), if sys.getdefaultencoding() is still "ascii" then wxPython
will use the locale's encoding instead (locale.getpreferredencoding()
where available, otherwise locale.getdefaultlocale()[1]). This means
that when the programmer needs to deal with strings that are not
strictly ascii, most of the time wxPython will Do The Right Thing with
the conversion, because it will use the current system locale's default
encoding.
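To illustrate why the choice of conversion encoding matters, here is a minimal sketch. It is written for Python 3, where bytes plays the role of the str type discussed in this thread and str plays the role of unicode. The byte string is a made-up example, not something taken from wxPython.

```python
# Hypothetical byte string; 0x80 is the euro sign in cp1252.
raw = b"Price: 10 \x80"

as_cp1252 = raw.decode("cp1252")      # 0x80 becomes U+20AC EURO SIGN
as_latin1 = raw.decode("iso-8859-1")  # 0x80 becomes U+0080, a C1 control character

try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False  # ascii rejects any byte above 0x7F

print(repr(as_cp1252), repr(as_latin1), ascii_ok)
```

Both codecs work without complaint, but they disagree about what the byte means, which is why the conversion has to use the encoding the data was really written in.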

For the curious here is the actual code for deciding what encoding to use:

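# Excerpt from wxPython's startup code: _sys is the sys module under an
# alias and wx is the wx package itself.
default = _sys.getdefaultencoding()
if default == 'ascii':
    # Python's default was not changed, so ask the locale instead.
    import locale
    import codecs
    try:
        if hasattr(locale, 'getpreferredencoding'):
            default = locale.getpreferredencoding()
        else:
            default = locale.getdefaultlocale()[1]
        # Make sure Python actually has a codec for that name.
        codecs.lookup(default)
    except (ValueError, LookupError, TypeError):
        # Unknown or missing locale encoding: fall back to Python's default.
        default = _sys.getdefaultencoding()
    del locale
    del codecs
if default:
    wx.SetDefaultPyEncoding(default)
del default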
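The same selection logic can be tried out without wxPython installed. The sketch below is only an approximation of the snippet above: choose_conversion_encoding and its python_default parameter are hypothetical names introduced for illustration. On Python 3, sys.getdefaultencoding() is normally "utf-8", so the locale branch only runs when "ascii" is passed in explicitly.

```python
import codecs
import locale
import sys


def choose_conversion_encoding(python_default=None):
    """Hypothetical helper mirroring the selection logic quoted above."""
    default = python_default or sys.getdefaultencoding()
    if default == "ascii":
        try:
            default = locale.getpreferredencoding()
            codecs.lookup(default)  # raises LookupError for unknown codecs
        except (ValueError, LookupError, TypeError):
            default = "ascii"  # keep the original default if the locale is unusable
    return default


print(choose_conversion_encoding())         # Python's own default
print(choose_conversion_encoding("ascii"))  # the locale's encoding, if usable
```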

You can find out what conversion encoding wxPython is using with
wx.GetDefaultPyEncoding, and you can change it if you want with
wx.SetDefaultPyEncoding.

Now that we are close to abandoning ansi builds (as far as I understood,
which makes me less than happy anyway), there are a couple of things
that astonish me a bit, while being ironically sad (or sadly ironic?):

- It is very difficult (impossible?) to set up an encoding which will
support *all* the possible characters in the known world languages. I
used utf-8 for GUI2Exe but I remember reading that it could fail anyway
on some occasions (but my memory could fail here);

- It should be enough to put something like:

# -*- coding: utf-8 -*-

at the beginning of a script to force
Python/wxPython/numpy/matplotlib/whatever site-package you want to
transparently encode/decode everything without developer
intervention. If I wish to distribute my application in China, Russia
or Germany, my opinion is that I should not have to spend more than the
blink of an eye thinking about encodings.
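As a point of reference for the second item, the coding line only tells Python how to decode the literals in that one source file. Bytes arriving from outside still have to be decoded explicitly. A minimal Python 3 sketch, in which the cp1252 "file" is simulated and is purely an assumption for illustration:

```python
# -*- coding: utf-8 -*-
import io

# The coding line above only affects how the literals in this file are read.
label = "Grüße"

# Bytes from outside (a file, a socket, a database) carry their own encoding.
# Here a cp1252-encoded file is simulated with an in-memory buffer.
external = io.BytesIO("Grüße".encode("cp1252"))
raw = external.read()

text = raw.decode("cp1252")  # decode once, at the input boundary
assert text == label
out = text.encode("utf-8")   # encode once, at the output boundary
```

Decoding at the input boundary and encoding at the output boundary keeps everything in between as Unicode text, which matches the "pass only unicode objects to wxPython" advice earlier in the thread.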

All this stuff about sys.getdefaultencoding(),
wx.GetDefaultPyEncoding(), # -*- coding: whatever -*-,
locale.setlocale(), codecs, BOM, is extremely confusing if you are
not a Python guru (which I am not). There are many resources on the
web to read about it, but sometimes they only add to the
confusion.

I am curious to see what will happen when I start moving my
database-based app to unicode (remembering the GUI2Exe encoding
nightmare, God Save Andrea) :-)

Andrea.

"Imagination Is The Only Weapon In The War Against Reality."
http://xoomer.alice.it/infinity77/

On Dec 20, 2007 7:52 PM, Robin Dunn wrote:

> On Dec 20, 2007 4:39 AM, jmf <jfauth@bluewin.ch> wrote:

Andrea Gavana wrote:

> - It is very difficult (impossible?) to setup an encoding which will
> support *all* the possible characters in the known world languages. I
> used utf-8 for GUI2Exe but I remember I read it could fail anyway in
> some occasions (but my memory could fail here);

I'm curious, because that's certainly the goal of Unicode, and utf-8 is supposed to support it. Besides, where could data come from that can't be encoded as utf-8?
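For what it is worth, a small Python 3 sketch of the cases in question. The sample characters are arbitrary choices, not taken from GUI2Exe. It suggests that failures come from data that is not valid Unicode text or not valid UTF-8 to begin with, rather than from characters UTF-8 cannot represent:

```python
# Every real character survives a UTF-8 round trip, including non-BMP ones.
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]
round_trip_ok = all(s.encode("utf-8").decode("utf-8") == s for s in samples)

# A lone surrogate is not a character on its own and cannot be encoded.
try:
    "\ud800".encode("utf-8")
    surrogate_encodes = True
except UnicodeEncodeError:
    surrogate_encodes = False

# Bytes written in a legacy encoding (here cp1252) often fail to decode as UTF-8.
try:
    b"caf\xe9".decode("utf-8")
    legacy_decodes = True
except UnicodeDecodeError:
    legacy_decodes = False

print(round_trip_ok, surrogate_encodes, legacy_decodes)
```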

There is a Python issue that still confuses me. Internally, Python can use either UCS-2 or UCS-4, depending on how it is compiled. It's my understanding that UCS-2 can hold most, but not all, of the code points, so what happens when you try to use one that it can't hold?
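One way to see what a 16-bit representation does with such a code point is to encode it as UTF-16 and inspect the code units. The sketch below runs on Python 3 and uses an arbitrarily chosen character. It shows the encoding only. As far as I know, 16-bit ("narrow") builds of Python stored such characters the same way, as a surrogate pair, so len() reported 2 for them.

```python
clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the Basic Multilingual Plane

# Split the UTF-16 encoding into its 16-bit code units.
utf16 = clef.encode("utf-16-be")
code_units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]

# The same character fits in a single 32-bit unit.
utf32 = clef.encode("utf-32-be")

print([hex(u) for u in code_units], len(utf32))
```

The character is not lost in the 16-bit form. It is split into a high surrogate and a low surrogate that only make sense together.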

And what does wx use internally in Unicode builds?

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@noaa.gov

Kevin,

Kevin Ollivier wrote:

....

BTW, a while back ActiveGrid asked me to write up some help for understanding/working with Unicode better in their wxPython app, and they gave me permission to make the document public. It's available here: http://kevino.theolliviers.com/python-unicode.html

This is very good information, maybe it could be included in the wxPython wiki.

One thing it does not cover is how to deal with databases. I am just converting my app, which uses Firebird SQL, to Unicode, and there are a few things one has to watch out for, e.g. the BLOB encoding, the string column encoding and, last but not least, the connection encoding. I don't know what other databases do, but it might be helpful to add another page to the wiki with some information on the different databases in relation to Unicode handling.
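As a rough illustration of the text column versus BLOB distinction, here is a sketch using the sqlite3 module from the standard library as a stand-in. It is not Firebird, and the table and column names are made up. Firebird's connection encoding has no direct counterpart here and is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (title TEXT, body BLOB)")

title = "R\u00e9sum\u00e9"                   # text column: pass Unicode text
body = "\u4e2d\u6587 notes".encode("utf-8")  # BLOB column: pass bytes you encoded yourself

conn.execute("INSERT INTO notes VALUES (?, ?)", (title, body))
row_title, row_body = conn.execute("SELECT title, body FROM notes").fetchone()

# The driver returns BLOBs as raw bytes, so the application must remember
# which encoding it used when writing them.
decoded_body = row_body.decode("utf-8")
conn.close()
```

The text column round-trips as Unicode, while the BLOB comes back as bytes and has to be decoded by the application.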

Werner