Unicode confusion

Charles_Hartman · April 9, 2005, 2:20am

That's very clear -- thanks! "utf-8" does not seem to work, so I tried this:

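  import wx

  defEnc = wx.GetDefaultPyEncoding()  # encoding wxPython uses between str and unicode
  data = mySTC.GetText()              # text from the styled text control
  data = data.encode(defEnc)          # convert to a byte string before writing
  myfile.write(data)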

and as far as I can tell it works fine. When I get to the Windows machine, I hope it will work there too.
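
For reference, the encode-then-write step can be sketched without wx at all. This is only an illustration in present-day Python, not the code from my program: save_text and load_text are hypothetical helper names, and "latin-1" is an assumed stand-in for whatever wx.GetDefaultPyEncoding() returns on a given machine.

```python
import os
import tempfile


def save_text(path, text, encoding):
    """Encode text to bytes with the named codec and write it in binary mode."""
    # encode() raises UnicodeEncodeError if a character has no
    # representation in the chosen codec.
    data = text.encode(encoding)
    with open(path, "wb") as f:
        f.write(data)
    return data


def load_text(path, encoding):
    """Read the bytes back and decode them with the same codec."""
    with open(path, "rb") as f:
        return f.read().decode(encoding)


# \xe8 is the grave-accented e; "latin-1" is an assumption for illustration.
sample = u"caf\xe8 cr\xe8me"
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
written = save_text(path, sample, "latin-1")
restored = load_text(path, "latin-1")
```

The round trip only works if the same encoding is used for writing and reading, which is why a default that differs between machines is worth checking.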

Now another problem crops up. I have a regex that looks for vowels, and its character list includes the grave-accented e (è). That character in the text is no longer matched by the regex. If I look at myregex.pattern, I see the accented e as \ufffd, that is, as unknown garbage. As far as I can tell from the Python docs (and experiment confirms it), the sre.UNICODE flag isn't relevant here. Do I need to encode the target string (as I did with the string I wanted to write to disk) before Python modules like sre can read it properly? I tried that, and it doesn't seem to make any difference. Still confused!
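
Here is a small sketch of one way this symptom could arise. It is a guess, not a diagnosis: it assumes the pattern's source bytes were decoded with the wrong codec, so the accented e turned into the replacement character U+FFFD before the regex was ever compiled. It uses the present-day re module, and the variable names are made up for illustration.

```python
import re

# Target text, with the grave-accented e written as an escape.
text = u"tr\xe8s bien"

# Pattern built as text: the accented e is a real character in the class.
good = re.compile(u"[aeiou\xe8]")
good_matches = good.findall(text)

# Assumed failure mode: the same pattern stored as Latin-1 bytes but
# decoded as UTF-8. The lone byte 0xE8 is invalid UTF-8, so with
# errors="replace" it becomes U+FFFD.
garbled_pattern = b"[aeiou\xe8]".decode("utf-8", errors="replace")
bad = re.compile(garbled_pattern)
bad_matches = bad.findall(text)
```

In this sketch good_matches contains the accented e and bad_matches does not, because the character class in the garbled pattern holds U+FFFD instead. If that is what is happening, the fix would belong where the pattern is created, not in the target string or the regex flags.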

Charles Hartman
Professor of English, Poet in Residence
*the Scandroid* is available at: http://cherry.conncoll.edu/cohar/Programs
http://villex.blogspot.com