Unicode yet again

Charles_Hartman · April 25, 2005, 1:51pm

I'm using the unicode build of wxPython 2.5.4.1 (with Mac Python 2.4.1). I have a list of lists of unicode strings, and I want to make a displayable version. If

cw = [[u'ex', u'HI', u'bit'], [u'TRAIN']] # from "exhibit", "train"

then this line

print str(' / '.join(' '.join(str(s) for s in w) for w in cw))

produces more or less what I want:

ex HI bit / TRAIN

(Phew.)

The trouble -- again! -- comes when one of the strings contains an accented character (code point above 127). The 'print' line I gave above raises a can't-decode exception. This revision

print str(' / '.join(' '.join(str(s.encode('utf-8')) for s in w) for w in cw))

gets rid of the exception. But the 's' containing the accented character doesn't print with that character; it prints with some ascii-but-nonalphanumeric garbage.
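A possible reading of both symptoms, offered as a sketch and not a confirmed diagnosis: in Python 2, str() on a unicode object encodes with the default codec (normally ASCII), which fails on an accented character, while UTF-8 output shows as two odd symbols if the display assumes another encoding. The sample word u'caf\xe9', the helper name display, and Mac Roman as the display encoding are all assumptions made for illustration.

```python
# Sketch only: u'caf\xe9' is a made-up sample word, and Mac Roman is an
# assumed display encoding, not something the post confirms.
cw = [[u'ex', u'HI', u'bit'], [u'caf\xe9']]

def display(words):
    """Join syllables with spaces and words with ' / ', staying in unicode."""
    return u' / '.join(u' '.join(w) for w in words)

text = display(cw)  # still a unicode object, nothing encoded yet

# Encoding to ASCII (what Python 2's str() attempts) fails on the accent.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8 turns the single accented character into two bytes...
utf8_bytes = text.encode('utf-8')

# ...which a Mac Roman display would render as two unrelated symbols.
garbled = utf8_bytes.decode('mac_roman')
```

If that assumption holds, joining in unicode and encoding once at the point of output, with whatever encoding the output window actually expects, may be enough to print the accent correctly.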

Obviously I'm still missing something in this maze of encode/decode/unicode/ascii stuff. (Am I the only one who finds this bizarre?) I hit on the 'encode' addition more or less by accident. Can anyone guide me toward a more complete solution that (as a bonus) makes sense?

Charles Hartman