RichTextCtrl XML parsing error

Hello all,

I am writing an app that will use an RTC (RichTextCtrl) as its editor. Development began with Python 2.5 and wxPython 2.8.7.1 on Linux (Ubuntu 8.04) and Windows XP, with no more trouble than one expects when trying out a new component.

As part of the app, I get some HTML from a database, transform it into RTC-kosher XML, and load it into the control via LoadStream. Everything kept working even after I updated wxPython to 2.8.8.0 on both machines. But when my Linux box was updated to 2.8.8.1, the RTC suddenly stopped recognizing my XML as valid and showed the message "XML parsing error: 'not well-formed (invalid token)' at line 1". The Windows box (at work, still on 2.8.8.0) runs the same code without a glitch. The line number is not useful because I pass the XML as a single long line.
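
A side note for anyone debugging the same message: when the XML is a single line, the column offset is more useful than the line number. The sketch below uses Python's standard-library expat bindings to pre-check a byte string and report the column. It is only an illustration, not the RTC's own parser, and `locate_xml_error` is a hypothetical helper name. It assumes a current Python interpreter rather than the 2.5 mentioned above.

```python
import xml.parsers.expat


def locate_xml_error(raw_bytes):
    """Return (line, column, message) for the first parse error, or None.

    Hypothetical helper: it runs the stdlib expat parser over raw_bytes
    and reports where parsing stopped.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        # True marks this chunk as the final (and only) piece of input.
        parser.Parse(raw_bytes, True)
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return (err.lineno, err.offset, message)
    return None


# A bare ampersand is not well-formed XML, so this reports a position.
print(locate_xml_error(b"<doc><p>a & b</p></doc>"))
```

Running the same check on the exact bytes that are about to be loaded into the control may show which character the parser objects to.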

After some trial and error, I narrowed the problem down: the RTC no longer accepts unicode strings (u"text"), only regular ones ("text").

Attached is a simple program that shows the unexpected behavior. It opens a frame with an RTC loaded with some text, plus a button that replaces the text with a unicode string (containing no non-ASCII characters).

Should I file a bug report, or is there something I am missing? The problem is easy to work around by converting the string to non-unicode before loading it into the RTC, but I was surprised by the change in behavior.
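
The workaround described here can be sketched roughly as follows. This is a minimal illustration, assuming the XML declares UTF-8 and that the byte string is what eventually gets handed to the control. `to_declared_bytes` is a hypothetical helper name, and the wxPython loading step is deliberately left out.

```python
def to_declared_bytes(xml_text, encoding="utf-8"):
    """Return xml_text as a byte string in the given encoding.

    Hypothetical helper: if the input is already a byte string it is
    returned unchanged; otherwise the text is encoded so that the bytes
    match the encoding named in the XML declaration.
    """
    if isinstance(xml_text, bytes):
        return xml_text
    return xml_text.encode(encoding)


# Example document; the content is made up for illustration.
xml_text = u'<?xml version="1.0" encoding="UTF-8"?><doc>caf\u00e9</doc>'
payload = to_declared_bytes(xml_text)
# payload is now a plain byte string that can be wrapped in a stream
# and loaded into the control.
```

Encoding explicitly, rather than calling `str()` on the text, also keeps working once the content contains non-ASCII characters.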

Regards,

        Walter

Frame1.py (3.67 KB)

Walter Mario Gardella Sambeth wrote:

    After some trial and error, I narrowed the problem down: the RTC no longer accepts unicode strings (u"text"), only regular ones ("text").

        s = u'<?xml version="1.0" encoding="UTF-8"?>...

Well, if it's a unicode object, then its encoding cannot be UTF-8, right?
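
To illustrate the point: an encoding declaration describes the bytes the parser receives, so it only holds while those bytes really are in that encoding. The sketch below uses Python's standard-library expat bindings, not the RTC itself. It assumes, purely for illustration, that an already-decoded string reaches the parser in a wide in-memory form (four bytes per character here). What wxPython 2.8.8.1 actually does internally is not established in this thread.

```python
import xml.parsers.expat

# Example document; the content is made up for illustration.
XML_TEXT = u'<?xml version="1.0" encoding="UTF-8"?><doc>hello</doc>'


def parses_ok(raw_bytes):
    """Return True if expat accepts raw_bytes as a complete document."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(raw_bytes, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


# Bytes that really are UTF-8 agree with the declaration.
utf8_ok = parses_ok(XML_TEXT.encode("utf-8"))

# The same characters stored four bytes each: the declaration still
# says UTF-8, but it no longer describes the bytes being parsed.
wide_ok = parses_ok(XML_TEXT.encode("utf-32-le"))

print(utf8_ok, wide_ok)
```

The first call is accepted and the second is rejected, even though both byte strings encode the same characters.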

    Should I file a bug report, or is there something I am missing? The problem is easy to work around by converting the string to non-unicode before loading it into the RTC, but I was surprised by the change in behavior.

The parser is just telling you that you lied to it. ;-)


--
Robin Dunn
Software Craftsman
http://wxPython.org Java give you jitters? Relax with wxPython!

So the parser is stricter now than in 2.8.8.0, and does not accept a
  declared encoding in already-unicode strings. I think I can live with that.

Thanks for your response.

Regards,

        Walter

On Thu, 21-08-2008 at 00:20 -0700, Robin Dunn wrote:

The parser is just telling you that you lied to it. ;-)

Walter Mario Gardella Sambeth wrote:


So the parser is stricter now than in 2.8.8.0, and does not accept a
  declared encoding in already-unicode strings.

Yes, that's my guess. I didn't track down and look at the specific change, but it makes sense.

--
Robin Dunn