Unicode fun

Hi!

First off, some praise for wxPython -- the code I've been writing in the last few weeks has been fun and moving right along. Cross-platform UI development is fun for a change!

On to my questions -- I am the creator of MusicBrainz (http://musicbrainz.org) and I am writing a cross-platform music tagging application, similar to the MusicBrainz Tagger, as part of my grant work for the Helix Community project (http://musicbrainz.helixcommunity.org).

This application retrieves music metadata from the server, which speaks nothing but UTF-8, and if at all possible I would like to have the entire tagging application use nothing but Unicode end-to-end. Upon studying up on the Unicode issues with this, I've come to realize that wxWidgets/wxPython can either be built Unicode or, er, not. I have the feeling that I cannot force my users to install the Unicode version (especially if they already have the non-Unicode wxWidgets installed). Thus I think it would be most advantageous to write my application to use the same encoding that wxWidgets is using.

Q1: How do I determine what that encoding is? If it is not Unicode, will it be 8859-1 or are other encodings possible?

On my Debian box I've got the non-Unicode version of wxPython installed and it throws an exception every time I try to pass non-ASCII characters to one of the wxPython widgets. Setting my default encoding to 8859-1 using a sitecustomize.py improves this a lot. However, in my deployed application I don't want to force the user to install this file -- it's both a pain in the rear as well as against my religious beliefs with respect to installing applications on an end user's machine.

Q2: Are there any drawbacks to setting the default encoding site-wide via sitecustomize.py?

I tried putting sitecustomize.py into the same dir as the main wxPython application, but Python ignores this, presumably since the Python path doesn't include . by default. And since the default encoding needs to be set before Python is fully initialized, it becomes a bit challenging to set the encoding for just one application. I assume that some sort of wrapper script that modifies the path and then executes the real application is necessary.

Q3: Do I need to create a wrapper script to change the default encoding for just one application? If so, do you have any tips on how to do this, and if not, how can I accomplish this?

Thanks for any tips you might have!

--

--ruaok Somewhere in Texas a village is missing its idiot.

Robert Kaye -- rob@eorbit.net -- http://mayhem-chaos.net

This application retrieves music metadata from the server, which speaks
nothing but UTF-8, and if at all possible I would like to have the entire
tagging application use nothing but Unicode end-to-end. Upon studying
up on the Unicode issues with this, I've come to realize that
wxWidgets/wxPython can either be built Unicode or, er, not. I have the
feeling that I cannot force my users to install the Unicode version
(especially if they already have the non-Unicode wxWidgets installed).
Thus I think it would be most advantageous to write my application to
use the same encoding that wxWidgets is using.

Q1: How do I determine what that encoding is? If it is not Unicode,
will it be 8859-1 or are other encodings possible?

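Try locale.getlocale() or locale.nl_langinfo(locale.CODESET) to find the
encoding of the current locale (nl_langinfo is only available on some
Unix platforms). wx.USE_UNICODE tells you whether wxPython was built
with Unicode support.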
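
A minimal sketch of the locale lookup suggested above, using only the standard library. The helper name guess_locale_encoding and the ISO 8859-1 fallback are assumptions for illustration, not part of wxPython or of the original advice. The wx.USE_UNICODE check is left out so the snippet runs without wxPython installed.

```python
import locale


def guess_locale_encoding(fallback="iso-8859-1"):
    """Return the encoding reported by the current locale, or `fallback`.

    The function name and the fallback value are illustrative assumptions.
    """
    # Adopt the user's locale from the environment (LANG, LC_ALL, ...).
    # Note that this changes process-wide locale state.
    try:
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        pass  # unsupported locale; stay with the default "C" locale

    encoding = None
    # nl_langinfo and CODESET exist only on some Unix platforms.
    if hasattr(locale, "nl_langinfo") and hasattr(locale, "CODESET"):
        encoding = locale.nl_langinfo(locale.CODESET)
    if not encoding:
        try:
            # getlocale() returns (language code, encoding); either may be None.
            encoding = locale.getlocale()[1]
        except ValueError:
            encoding = None  # locale string could not be parsed
    return encoding or fallback


if __name__ == "__main__":
    print(guess_locale_encoding())
```

The result depends on the machine's locale settings, so it is only a starting point for deciding which encoding a non-Unicode wxPython build expects.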

On my Debian box I've got the non-Unicode version of wxPython installed
and it throws an exception every time I try to pass non-ASCII
characters to one of the wxPython widgets. Setting my default encoding
to 8859-1 using a sitecustomize.py improves this a lot. However, in my
deployed application I don't want to force the user to install this
file -- it's both a pain in the rear as well as against my religious
beliefs with respect to installing applications on an end user's
machine.

Try unicode("text", "utf8").encode("encoding") to convert your UTF-8
text to the encoding the app is using (which you know from the second
element of the tuple returned by locale.getlocale()).

I'd try using these tools instead of using sitecustomize.py.
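
As a rough sketch of that conversion: the function below decodes UTF-8 data and re-encodes it for a non-Unicode widget. The name utf8_to_widget_bytes is made up for this example, and the "replace" error handler is an added assumption so that characters missing from the target encoding become "?" instead of raising an exception. Calling .decode("utf-8") on the raw bytes is equivalent to unicode(data, "utf8") and also works on later Python versions.

```python
def utf8_to_widget_bytes(utf8_bytes, target_encoding):
    """Re-encode UTF-8 data from the server into `target_encoding`.

    Hypothetical helper for illustration; not part of wxPython.
    """
    # Raw UTF-8 bytes -> Unicode text.
    text = utf8_bytes.decode("utf-8")
    # Unicode text -> bytes in the encoding the widgets expect.
    # "replace" substitutes "?" for characters the target encoding lacks.
    return text.encode(target_encoding, "replace")


if __name__ == "__main__":
    # "Bj\xc3\xb6rk" is the UTF-8 byte sequence for an artist name with o-umlaut.
    print(repr(utf8_to_widget_bytes(b"Bj\xc3\xb6rk", "iso-8859-1")))
```

Silent replacement loses information, so an application that writes tags back to files would probably want to keep the original Unicode text around and use the converted bytes only for display.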

Robert Kaye wrote:

I have the
feeling that I cannot force my users to install the unicode version
(especially if they already have the non-unicode wxWidgets installed).
Thus I think it would be most advantageous to write my application to
use the same encoding that wxWidgets is using.

The other alternative is to bundle everything up so that there are
no external dependencies. You should also be aware that there
are several more combinations of wxWidgets since Python can be
compiled using 2 byte or 4 byte Unicode characters, and the
2.4 and 2.5 versions of wxWidgets are not sufficiently compatible
with each other.
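
For the 2-byte versus 4-byte distinction, a small sketch of how a script could tell which kind of Python build it is running on. It relies on sys.maxunicode, which is 0xFFFF on narrow (2-byte) builds and 0x10FFFF on wide (4-byte) builds. The function name is an assumption for illustration. Python 3.3 and later no longer have this distinction and always report 0x10FFFF.

```python
import sys


def unicode_storage_width(maxunicode=sys.maxunicode):
    """Return 2 for a narrow (2-byte) build and 4 for a wide (4-byte) build.

    Illustrative helper; the parameter exists so the logic can be checked
    without needing both kinds of interpreter.
    """
    return 2 if maxunicode == 0xFFFF else 4


if __name__ == "__main__":
    print(unicode_storage_width())
```

A bundling script could use a check like this to pick the matching prebuilt wxPython binary.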

If you want an example of an application bundled up, look at
http://dotamatic.sourceforge.net

You can download for Linux, Windows and Mac and as a user
not care about any other dependencies (including Python).
It also gives you an idea how big the installation is
for something larger than hello world (for example online
help is also included).

I have been looking at how to get my BitPim app onto
Gentoo (I am moving to Gentoo from Redhat as they no longer
want my money), and although dependency handling is easier,
it is still a pain due to version issues.

Roger

The other alternative is to bundle everything up so that there are
no external dependencies. You should also be aware that there
are several more combinations of wxWidgets since Python can be
compiled using 2 byte or 4 byte Unicode characters, and the
2.4 and 2.5 versions of wxWidgets are not sufficiently compatible
with each other.

Good thinking. So, please sanity check this strategy then:

1. Binary distributions are totally bundled as you suggested -- these will contain 2/4-byte Unicode-enabled wxWidgets. Only versions of the application that people installed themselves are subject to further considerations. For those:
2. Check the wx.USE_UNICODE setting. If that is enabled, use Unicode internally.
3. If not, check the current locale. Use that locale's encoding if set, and if none is specified, assume 8859-1.
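
Steps 2 and 3 of the strategy above could be sketched roughly as follows. The function name choose_text_encoding and the convention of returning None to mean "pass Unicode strings straight through" are assumptions made for this example. The caller is expected to supply the wx.USE_UNICODE flag and the locale's encoding.

```python
def choose_text_encoding(use_unicode, locale_encoding=None, default="iso-8859-1"):
    """Pick the encoding to use for text handed to the widgets.

    Returns None when the toolkit is Unicode-enabled (no conversion needed),
    otherwise the locale's encoding, otherwise `default`.
    Hypothetical helper for illustration.
    """
    if use_unicode:
        return None
    return locale_encoding or default


if __name__ == "__main__":
    # In the real application the arguments would come from something like
    # wx.USE_UNICODE and locale.getlocale()[1]; literal values are used here
    # so the sketch runs without wxPython installed.
    print(choose_text_encoding(False, None))
```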

If you want an example of an application bundled up, look at
http://dotamatic.sourceforge.net

Ohh -- thanks for the tip -- this is still a big question mark, and seeing someone blaze that trail before me is reassuring. I'll take a closer look.

Thanks for your help!

On May 27, 2004, at 2:05 PM, Roger Binns wrote:


Roger Binns writes:

If you want an example of an application bundled up, look at
http://dotamatic.sourceforge.net

Thanks for that, Roger. I just read makedist.py and it was very
easy to follow: now I know how to distribute my wxPython stuff
to Mac, Win, and Lin, something I've been dreading.

Is dotamatic going to move to wxPython 2.5?

--
Paul
http://www.paulmcnett.com

Paul McNett wrote:

Is dotamatic going to move to wxPython 2.5?

I was initially planning on supporting wxPython 2.4 and 2.5
for both the projects (BitPim and dotamatic). Unfortunately
the API changes in 2.5 were painful enough that I couldn't
support both at the same time without *lots* of if statements
switching on wxPython version. That basically meant I had
to support one or the other. I also figured (lucky guess)
that the API might change again in the next wxPython 2.5 release.
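
To illustrate the kind of version switch being described, here is a minimal sketch. wxPython exposes its version as the tuple wx.VERSION, which starts with the major and minor numbers. The helper name uses_25_api is hypothetical, and the sketch takes the tuple as an argument so it runs without wxPython installed.

```python
def uses_25_api(version):
    """Return True if a wx.VERSION-style tuple is 2.5 or newer.

    Hypothetical helper; pass wx.VERSION in a real application.
    """
    return tuple(version[:2]) >= (2, 5)


if __name__ == "__main__":
    for sample in [(2, 4, 2, 4, ""), (2, 5, 1, 5, "")]:
        if uses_25_api(sample):
            print(sample, "-> take the 2.5 code path")
        else:
            print(sample, "-> take the 2.4 code path")
```

Every call whose signature differs between the two releases would need a branch like this, which is why supporting both at once gets unwieldy.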

There were also lots of bug reports on this list against 2.5
which would have very adversely affected the applications.
Consequently I am waiting till the next 2.5 release.

Roger