wxPYTHON and UTF-32 support

Dear list,

  I am in the process of writing an application which needs to display a font for a language in the UTF-32 portion of Unicode. Currently, we are writing in Visual BASIC (primarily because it will allow us to display this language); we are dissatisfied and would rather use PYTHON. I have used older versions of wxPYTHON and would like to use it to build a GUI. I have been having trouble finding out from the documentation how to code such a display; e.g., what information do I need to give to wxPYTHON to tell it that I want to use a utf-32 font? I would appreciate any help and even more so if someone could give me a code snippet.

Thank you,

Bernard Sypniewski

Department of Computer Science

Rowan University - Camden Campus

Not sure what you mean by the UTF-32 portion -- Unicode is Unicode,
and UTF-32 is a particular encoding.

Have you tried the unicode version of wxPython -- does it not work for you?

NOTE: under the hood, Python can be built with either 16-bit or 32-bit
unicode objects -- I'm pretty sure that the 16-bit versions do not
support the entirety of Unicode -- the code points that need more than 16
bits to represent -- that may be your issue, though it's pretty rare
to need those.

I'm pretty sure that the standard Windows builds of Python 2.* are
using 16 bits (the common Linux builds use 32 bits)

The latest version 3 python has cleaned all that mess up, so some day
we'll be done with it.

-Chris

On Thu, Dec 13, 2012 at 9:01 AM, Sypniewski, Bernard Paul <Sypniewski@rowan.edu> wrote:

     I am in the process of writing an application which needs to display a
font for a language in the UTF-32 portion of Unicode.

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@noaa.gov

You can specify the facename when creating the wx.Font if you want to ensure that it uses a specific font.
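
A minimal sketch of the facename suggestion, assuming wxPython is installed and a wx.App already exists by the time the function is called. The helper name `make_font` and its defaults are illustrative and not from this thread. The arguments are passed positionally because, as far as I know, the keyword used for the face name is not the same in every wxPython release.

```python
def make_font(face_name, point_size=12):
    """Return a wx.Font that requests a specific face by name.

    Assumes wxPython is installed and a wx.App has already been created.
    """
    import wx  # imported lazily so this module loads even without wxPython

    return wx.Font(
        point_size,
        wx.FONTFAMILY_DEFAULT,
        wx.FONTSTYLE_NORMAL,
        wx.FONTWEIGHT_NORMAL,
        False,      # underline
        face_name,  # face name of a font that covers the script you need
    )
```

It would be applied with something like `some_window.SetFont(make_font("Name Of Your Font"))`. Whether the glyphs actually show up still depends on the font being installed and on the Unicode handling discussed in the rest of this thread.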

I do see a potential problem, however, in that the Unicode objects used by wx (and typically in the Python builds too) are based on the size of the C wchar_t type, and on Windows that is 16-bits (or UTF-16/UCS-2). On Wikipedia's UTF-32 page it says, "On Unix systems, UTF-32 strings are sometimes used for storage, due to the type wchar_t being defined as 32-bits. Python can be compiled to use them instead of UTF-16. Use of UTF-32 strings on Windows (where wchar_t is 16 bits) is almost non-existent."

You can see if Python has been built to support narrow or wide unicode values by looking at the value of sys.maxunicode, or by whether unichr(0x10001) raises an exception.
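
The check described above can be sketched as follows. The function name `unicode_build` is made up for illustration. Narrow Python 2 builds report 0xFFFF (65535) for sys.maxunicode, while wide builds and Python 3.3 or later report 0x10FFFF.

```python
import sys


def unicode_build():
    """Return 'narrow' if this interpreter stores text as 16-bit units,
    otherwise 'wide'."""
    return "narrow" if sys.maxunicode == 0xFFFF else "wide"


print(hex(sys.maxunicode), unicode_build())
```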

I'm not sure but my guess is that wx will still be able to use those code-points beyond the 16-bit values, but you may have to do some recoding of the Unicode before passing it to wx. (IIRC UTF-16 will use multiple values to represent the code points beyond 16-bit, like how UTF-8 will use multiple bytes to represent code points beyond what an 8-bit value can reach.) Does VB have built-in support for UTF-32 or are you doing something there like recoding as UTF-16?

--
Robin Dunn
Software Craftsman

On Thu, Dec 13, 2012 at 3:11 PM, Robin Dunn <robin@alldunn.com> wrote:

I do see a potential problem, however, in that the Unicode objects used by
wx (and typically in the Python builds too) are based on the size of the C
wchar_t type, and on Windows that is 16-bits (or UTF-16/UCS-2).

yup -- I've been poking into this a bit -- the Windows API uses UTF-16
(not UCS-16)

Python can be compiled to use them instead of UTF-16. Use of UTF-32
strings on Windows (where wchar_t is 16 bits) is almost non-existent."

Python, on the other hand, used either UCS-16 or UCS-32, depending on
how it is built -- I suspect the standard binaries for Windows use
UCS-16. Sorry, I don't have a system handy to check right now.

You can see if Python has been built to support narrow or wide unicode
values by looking at the value of sys.maxunicode, or by whether
unichr(0x10001) raises an exception.

interesting, the OS X build appears to be 16 bit:

In [22]: sys.maxunicode
Out[22]: 65535

In [23]: unichr(0x10001)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-23-ae7c3efc97d4> in <module>()
----> 1 unichr(0x10001)

ValueError: unichr() arg not in range(0x10000) (narrow Python build)

The trick is that if your Python is built with 16-bit unicode, it
does not properly handle code points outside the 16-bit range -- as
above, it can raise an exception if you try -- not sure what happens
with other ways to create a unicode object (decode, for instance).

I'm not sure but my guess is that wx will still be able to use those
code-points beyond the 16-bit values,

probably -- the Windows API does.

but you may have to do some recoding
of the Unicode before passing it to wx.

but how? isn't the unicode build of wx python designed to take unicode
objects? With a 16 bit build of python, you don't have unicode objects
that can handle those high values -- you could put utf-16 yourself in
a bytes object (old-style string), but what would wxPython do with
that?

(IIRC UTF-16 will use multiple
values to represent the code points beyond 16-bit, like how UTF-8 will use
multiple bytes to represent code points beyond what an 8-bit value can
reach.)

exactly.

Does VB have built-in support for UTF-32 or are you doing something
there like recoding as UTF-16?

I seriously doubt it -- I can't imagine why it would use anything other
than the Windows-standard UTF-16 (internally) -- one can only hope
that it has an opaque unicode string object, and has some methods to
encode/decode on IO -- just like Python, etc.

Anyway, if the OP has some data encoded in UTF-32 (or any encoding),
s/he should do the normal Python thing -- decode it into a Python
unicode object, then pass that around to wxPython and everywhere else.
Chances are it will work fine. And if not, you're fighting with
Python, not wx.
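
The "decode first" advice might look like the sketch below. It is written for Python 3; on Python 2 the result of decode() would be a `unicode` object, and on a narrow build it could not hold the high code point as a single character. The sample bytes and the name `decode_utf32` are invented for illustration.

```python
def decode_utf32(data):
    """Decode UTF-32 bytes (with a byte order mark) into a text string."""
    return data.decode("utf-32")


# Little-endian BOM, then U+10001 and U+0041, four bytes each.
raw = b"\xff\xfe\x00\x00" + b"\x01\x00\x01\x00" + b"\x41\x00\x00\x00"

text = decode_utf32(raw)
code_points = [hex(ord(ch)) for ch in text]
print(code_points)
```

The resulting `text` is the object that would then be handed to a wx widget, for example through a label or value setter.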

This is all cleaned up in Python 3.3 -- I guess we should all go there some day!

-Chris

How to do it is "left as an exercise for the reader..." ;) I don't know enough about Unicode to say for sure, but I would guess that there must be code out there somewhere to convert a sequence of 32-bit values (not necessarily a Unicode object) into the corresponding UTF-16 values, using more than one 16-bit value for the glyphs that need them. Then my assumption is that passing that to a wx method which then passes it off to Windows will do the right thing.
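
For reference, the conversion itself is the standard UTF-16 surrogate-pair arithmetic. The sketch below does it by hand on a list of integer code points; the function name `to_utf16_units` is an illustrative choice, and this says nothing about what wx does with the result.

```python
def to_utf16_units(code_points):
    """Convert an iterable of Unicode code points (ints) to UTF-16 code units."""
    units = []
    for cp in code_points:
        if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError("not a valid Unicode scalar value: %#x" % cp)
        if cp < 0x10000:
            # Fits in a single 16-bit unit.
            units.append(cp)
        else:
            # Split the 20-bit offset into a high and a low surrogate.
            offset = cp - 0x10000
            units.append(0xD800 + (offset >> 10))
            units.append(0xDC00 + (offset & 0x3FF))
    return units


print([hex(u) for u in to_utf16_units([0x41, 0x10001])])
```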

On 12/14/12 10:04 AM, Chris Barker - NOAA Federal wrote:

On Thu, Dec 13, 2012 at 3:11 PM, Robin Dunn <robin@alldunn.com> wrote:

but you may have to do some recoding
of the Unicode before passing it to wx.

but how? isn't the unicode build of wx python designed to take unicode
objects? With a 16 bit build of python, you don't have unicode objects
that can handle those high values -- you could put utf-16 yourself in
a bytes object (old-style string), but what would wxPython do with
that?

--
Robin Dunn
Software Craftsman

How to do it is "left as an exercise for the reader..." ;) I don't know enough about Unicode to say for sure, but I would guess that there must be code out there somewhere to convert a sequence of 32-bit values (not necessarily a Unicode object) into the corresponding UTF-16 values, using more than one 16-bit value for the glyphs that need them.

Sure, that's a simple:

.encode("utf-16")
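
A small sketch of that call, assuming Python 3 string literals. One detail worth knowing: the plain "utf-16" codec prepends a byte order mark, while "utf-16-le" and "utf-16-be" do not. The variable names are illustrative.

```python
text = "A\U00010001"  # one BMP character, one character above U+FFFF

with_bom = text.encode("utf-16")        # BOM + data, native byte order
without_bom = text.encode("utf-16-le")  # data only, little-endian

# The high character becomes two 16-bit units (a surrogate pair).
print(len(without_bom) // 2, "code units for", len(text), "characters")
```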

Then my assumption is that passing that to a wx method which then passes it off to Windows will do the right thing.

Do you mean a C++ method? If so, then yes, that is likely to work.

However, I think we had a discussion on this list (or the dev list)
about this, and I thought wxPython assumes a py2 string is ANSI
(latin1?). So if you pass in UTF-16, you'll get a mess, or an error.

This makes me wonder if wxPython is doing the right thing now -- if it's
passing the raw bytes of a 16-bit Unicode object off to Windows APIs,
that would be wrong (UTF-16 is not UCS-2), but it would work most of
the time, particularly with European languages. According to Wikipedia,
they are the same 97% of the time.

In fact, in the Windows world, there are apparently a lot of folks
that assume UTF-16 strings are always 2 bytes per character and
apparently don't get bitten often enough to figure it out. One reason
lots of folks think UTF-16 is evil!

Note: I miswrote earlier -- there is no UCS-16, only UCS-2 (2
bytes) and UTF-16 (16 bits).

- Chris