I do see a potential problem, however, in that the Unicode objects used by
wx (and typically by the Python builds too) are based on the size of the C
wchar_t type, and on Windows that is 16 bits (i.e. UTF-16/UCS-2).
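You can ask `ctypes` for the size of the C `wchar_t` on a given platform. This is a minimal sketch, and the helper name `wchar_bits` is only illustrative. On Windows you would expect it to report 16. Linux and OS X typically report 32.

```python
import ctypes

def wchar_bits():
    """Return the width of the platform's C wchar_t type, in bits."""
    # ctypes.c_wchar is mapped onto the platform's C wchar_t
    return ctypes.sizeof(ctypes.c_wchar) * 8

print(wchar_bits())
```

The Python interpreter's own Unicode width is a separate build-time choice and need not match `wchar_t`.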
yup -- I've been poking into this a bit -- the Windows API uses UTF-16
(not UCS-2).
"... Python can be compiled to use them [UTF-32 strings] instead of UTF-16. Use of UTF-32
strings on Windows (where wchar_t is 16 bits) is almost non-existent."
Python, on the other hand, uses either UCS-2 or UCS-4, depending on
how it is built -- I suspect the standard binaries for Windows use
UCS-2. Sorry, I don't have a system handy to check right now.
You can see if Python has been built to support narrow or wide unicode
values by looking at the value of sys.maxunicode, or by whether
unichr(0x10001) raises an exception.
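Both checks described above fit into a pair of helper functions. This is a minimal sketch, and the function names are hypothetical. It is written so that it also runs on Python 3, where `unichr` was renamed `chr`.

```python
import sys

try:
    make_char = unichr  # Python 2
except NameError:
    make_char = chr  # Python 3 renamed unichr() to chr()

def is_wide_build():
    """True if a single string element can hold any code point up to U+10FFFF."""
    return sys.maxunicode > 0xFFFF

def can_make_astral_char():
    """Run the unichr(0x10001) test; a narrow build raises ValueError."""
    try:
        make_char(0x10001)
    except ValueError:
        return False
    return True

print(sys.maxunicode, is_wide_build(), can_make_astral_char())
```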
interesting, the OS X build appears to be 16-bit:
In [22]: sys.maxunicode
Out[22]: 65535
In [23]: unichr(0x10001)
On Thu, Dec 13, 2012 at 3:11 PM, Robin Dunn <robin@alldunn.com> wrote:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-23-ae7c3efc97d4> in <module>()
----> 1 unichr(0x10001)
ValueError: unichr() arg not in range(0x10000) (narrow Python build)
The trick is that if your Python is built with 16-bit unicode, it
does not properly handle code points outside the 16-bit range -- as
above, it can raise an exception if you try -- not sure what happens
with other ways to create a unicode object (decode, for instance).
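On the decode question: as far as I know, decoding does not raise on a narrow build. The decoder stores a character above U+FFFF as a UTF-16 surrogate pair, so `len()` reports 2 for that single character. The sketch below counts UTF-16 code units on any interpreter, which is the number a narrow build's `len()` would give for the same text. The helper name is hypothetical.

```python
def utf16_code_units(text):
    """Number of 16-bit code units needed to store `text` as UTF-16.

    On a narrow (UCS-2) Python 2 build this equals len(text), because
    characters above U+FFFF are stored as two surrogate code units.
    """
    # 'utf-16-le' writes no BOM, so every 2 bytes is exactly one code unit
    return len(text.encode('utf-16-le')) // 2

sample = b'\xf0\x90\x80\x81'.decode('utf-8')  # U+10001 encoded as UTF-8
print(len(sample), utf16_code_units(sample))
```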
I'm not sure, but my guess is that wx will still be able to use those
code points beyond the 16-bit values,
probably -- the Windows API does.
but you may have to do some recoding
of the Unicode before passing it to wx.
But how? Isn't the Unicode build of wxPython designed to take unicode
objects? With a 16-bit build of Python, you don't have unicode objects
that can handle those high values -- you could put UTF-16 in a bytes
object (old-style string) yourself, but what would wxPython do with
that?
(IIRC UTF-16 uses two 16-bit values (a surrogate pair) to represent
the code points beyond 16 bits, much like UTF-8 uses multiple bytes to
represent code points beyond what an 8-bit value can reach.)
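The arithmetic is the standard UTF-16 scheme (RFC 2781). Subtract 0x10000 to get a 20-bit number. Put its top 10 bits in a high surrogate (0xD800-0xDBFF) and its bottom 10 bits in a low surrogate (0xDC00-0xDFFF). Here is a small sketch with illustrative function names:

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into its UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low

def from_surrogate_pair(high, low):
    """Recombine a UTF-16 surrogate pair into a single code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

print([hex(u) for u in to_surrogate_pair(0x10001)])  # ['0xd800', '0xdc01']
```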
exactly.
Does VB have built-in support for UTF-32 or are you doing something
there like recoding as UTF-16?
I seriously doubt it -- I can't imagine why it would use anything other
than the Windows-standard UTF-16 (internally) -- one can only hope
that it has an opaque unicode string object, and has some methods to
encode/decode on IO -- just like Python, etc.
Anyway, if the OP has some data encoded in UTF-32 (or any encoding),
s/he should do the normal Python thing -- decode it into a Python
unicode object, then pass that around to wxPython and everywhere else.
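In code, "decode it into a unicode object" is a single call at the point where the bytes come in. This is a minimal sketch, assuming the input is raw UTF-32 bytes. The helper name `decode_utf32` is made up for illustration.

```python
def decode_utf32(raw, byteorder=None):
    """Decode UTF-32 bytes into a unicode object.

    byteorder: None to rely on a BOM at the start of `raw`,
    or 'little' / 'big' when the data is known to carry no BOM.
    """
    codec = {None: 'utf-32', 'little': 'utf-32-le', 'big': 'utf-32-be'}[byteorder]
    return raw.decode(codec)

data = u'caf\xe9 \U0001F600'.encode('utf-32')  # the 'utf-32' codec writes a BOM first
text = decode_utf32(data)
```

From there on, `text` is an ordinary unicode object to hand to wxPython and everything else. Encoding back to bytes happens only on output.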
Chances are it will work fine. And if not, you're fighting with
Python, not wx.
This is all cleaned up in Python 3.3 -- I guess we should all go there some day!
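Concretely, Python 3.3 introduced PEP 393 (the flexible string representation), and with it there are no narrow or wide builds any more. `sys.maxunicode` is always 0x10FFFF, and a character above U+FFFF is a single element of a `str`. Here is a quick check, meant to be run under Python 3.3 or later:

```python
import sys

# PEP 393: every Python 3.3+ build covers the full Unicode range
print(hex(sys.maxunicode))               # 0x10ffff

s = chr(0x10001)                         # no ValueError, unlike unichr() on a narrow build
print(len(s))                            # 1
print(len(s.encode('utf-16-le')) // 2)   # 2 (a surrogate pair once encoded as UTF-16)
```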
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov