Unicode confusion

There is a real problem with this code. What is the 8F character supposed to be? No printable character is assigned to that value in any of the ISO-8859 variants (there it falls in the C1 control range) or in Unicode (U+008F is a control character), it is undefined in the Windows cp1252 encoding, and a lone 8F byte is not valid UTF-8. What encoding do you have defined in your -*- coding -*- string? When you call "encode" to convert the Unicode strings to 8-bit, how is it going to produce an 8F? What encoding do you get back from wx.GetDefaultPyEncoding()?
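If you want to check how a particular codec treats that byte, a sketch along these lines may help. It is written in Python 3 syntax, the list of codecs is just a sample I picked, and mac_roman is included only as a guess at where an accented e at 8F might come from.

```python
# Sketch: how a few codecs interpret the single byte 0x8F.
import unicodedata

RAW = b"\x8f"

def describe(codec):
    """Return a short description of what RAW decodes to under codec."""
    try:
        ch = RAW.decode(codec)
    except UnicodeDecodeError:
        return "not decodable"
    # Control characters have no Unicode name, so fall back to the category.
    return unicodedata.name(ch, "category " + unicodedata.category(ch))

# The codec names below are an illustrative sample, not an exhaustive list.
results = {codec: describe(codec)
           for codec in ("latin-1", "cp1252", "utf-8", "mac_roman")}

for codec, meaning in results.items():
    print(codec, "->", meaning)
```

The point is only that the same byte value can be a letter, a control character, or an error, depending on which codec you name.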

It is quite dangerous to embed characters above 127 in a constant string in a Python program. Such characters have NO inherent meaning on their own: they have to be associated with an encoding. If 8F is a character in Elvish, for example, then you had better make darned sure you convert all of your incoming strings to Elvish before comparing (wordstring = wordstring.encode('Elvish')). You can't rely on the default encoding if your program requires a specific encoding.
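Here is a minimal sketch of that idea, in Python 3 syntax: decode the incoming bytes with an encoding you name explicitly, then match on Unicode text rather than on raw byte values. The names SOURCE_ENCODING and find_vowel_runs are made up for illustration, and mac_roman is only my assumption about where the data comes from; substitute whatever encoding your input really uses.

```python
import re

# Assumption: the incoming bytes are Mac Roman. Name the encoding explicitly
# instead of relying on a platform default.
SOURCE_ENCODING = "mac_roman"

# Spell the accented letter as a Unicode escape so the pattern does not
# depend on the encoding of this source file.
VOWEL_RUN = re.compile("[ae\u00e8iouy]+")

def find_vowel_runs(raw_bytes, encoding=SOURCE_ENCODING):
    """Decode raw_bytes with the given encoding and return its vowel runs."""
    text = raw_bytes.decode(encoding)
    return VOWEL_RUN.findall(text)

# Under Mac Roman, byte 0x8F is e-grave, so this is "creme" with an accent.
runs = find_vowel_runs(b"cr\x8fme")
print(runs)
```

Decoding the same bytes as latin-1 instead turns 8F into a control character, and the accented letter is no longer matched, which is exactly why the encoding has to be stated rather than assumed.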

On Fri, 8 Apr 2005 22:43:22 -0400, Charles Hartman <charles.hartman@conncoll.edu> wrote:

Footnote: I discovered that I can use (for example)
sre.compile(r"[ae\x8fiouy]+")
to match the accented e -- but only after I *also* do
defEnd = wx.GetDefaultPyEncoding()
wordstring = wordstring.encode(defEnd)
and send that encoded wordstring to the regex matching code, where it *will* catch the >128 character. This seems *awfully* roundabout. Will it work cross-platform? It means I have to import wx into that module though it's used for nothing else. Am I missing some simpler approach?

--
- Tim Roberts, timr@probo.com
  Providenza & Boekelheide, Inc.