Error loading xml in RichTextCtrl

Hello,

I created an editor using wx 2.8.8.1. When I try to open, on Linux, an XML file that was created on Windows, the application crashes; the same happens with the richtext editor demo that ships with wx. The message shown in the console is “(python:5007): Pango-WARNING **: Invalid UTF-8 string passed to pango_layout_set_text()”. The XML contains the character “é”. Is it a bug? Is there a workaround?

Thanks.

Thiago Franco Moraes wrote:

I created an editor using wx 2.8.8.1. When I try to open, on Linux, an
XML file that was created on Windows, the application crashes; the
same happens with the richtext editor demo that ships with wx. The
message shown in the console is "(python:5007): Pango-WARNING **:
Invalid UTF-8 string passed to pango_layout_set_text()". The XML
contains the character "é". Is it a bug?

How did you create the XML file? What tool did you use?

The issue is that you can't just say "the XML contains the character
'é'", because in an 8-bit file "é" has no single standard meaning.
The letter "e" is always 0x65 in an 8-bit file, but "é" is encoded
differently depending on your character set.

For example, "é" is defined in Latin-1 as the single byte 0xE9. It is
also defined in UTF-8, but as the two-byte sequence 0xC3 0xA9. So, if
your Windows system created the file with 0xE9, and you try to load
that into your Linux system, which expects UTF-8, the lone 0xE9 is not
a valid UTF-8 sequence and will be rejected as an invalid character.
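
To make the difference concrete, here is a small Python 3 sketch (not taken from the original application) that prints the bytes for "é" in both encodings and shows that the Latin-1 byte cannot be decoded as UTF-8. The variable names are only for illustration.

```python
# "\u00e9" is the Unicode code point for "é".
e_acute = "\u00e9"

latin1_bytes = e_acute.encode("latin-1")  # one byte
utf8_bytes = e_acute.encode("utf-8")      # two bytes

print(latin1_bytes.hex())
print(utf8_bytes.hex())

# A lone Latin-1 byte for "é" is not a valid UTF-8 sequence,
# so a strict UTF-8 decoder refuses it.
try:
    latin1_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

print("valid as UTF-8:", decoded_ok)
```

This is the same kind of rejection that Pango reports when it is handed the raw Latin-1 byte.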

Is there a workaround?

You need to be consistent. You can convert the file to UTF-8 using
iconv, as long as you know how it was created to begin with.
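
On the command line this would be something like "iconv -f ISO-8859-1 -t UTF-8 in.xml > out.xml" (the file names are placeholders). If you would rather do it from Python, a minimal sketch of the same idea follows. It assumes Python 3 and assumes the source really is ISO-8859-1; reencode is a hypothetical helper name, not part of wx.

```python
def reencode(data: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Decode bytes from the known source encoding and re-encode as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


# Sample bytes as a Latin-1 editor would write them (0xE9 is "é").
sample = b"Isto \xe9 apenas um teste:"
converted = reencode(sample)

print(converted)
```

Note that this only converts the bytes. If the file carries an XML declaration naming the old encoding, that declaration would have to be updated as well.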

--
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.

Tim Roberts, I created it with the editor that I'm developing. It is
encoded in ISO-8859-1. See this piece of the XML:

<?xml version="1.0" encoding="ISO-8859-1"?>
<richtext version="1.0.0.0" xmlns="http://www.wxwidgets.org">
  <paragraphlayout textcolor="#000000" fontsize="8" fontstyle="90"
fontweight="90" fontunderlined="0" fontface="MS Shell Dlg 2"
alignment="1" parspacingafter="10" parspacingbefore="0"
linespacing="10">
    <paragraph>
      <text>"Isto "</text>
      <symbol>-23</symbol>
      <text>" apenas um teste:"</text>

The character "é" is stored as that <symbol>-23</symbol>.
Thanks!

Well, in C parlance, the ISO-8859-1 encoding of "é" is 0xe9. (char)((unsigned char)0xe9) is -23 when char is signed. So, I'm guessing that whatever is writing that file is not writing the byte itself, but is formatting the signed char value as a decimal string instead?
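
The arithmetic can be checked in Python 3 with the standard struct module. This is just a sketch to show where -23 comes from, not a statement about what the wx XML writer actually does internally.

```python
import struct

raw = b"\xe9"  # the ISO-8859-1 byte for "é"

# Interpret the same byte as a signed and as an unsigned 8-bit value.
as_signed = struct.unpack("b", raw)[0]
as_unsigned = struct.unpack("B", raw)[0]

# Going the other way: mask the signed value back into the 0..255 range.
recovered = as_signed & 0xFF
recovered_char = bytes([recovered]).decode("iso-8859-1")

print(as_signed, as_unsigned, hex(recovered), recovered_char)
```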

--Grant


Thiago Franco Moraes wrote:

Tim Roberts, I created it with the editor that I'm developing. It is
encoded in ISO-8859-1. See this piece of the XML:

<?xml version="1.0" encoding="ISO-8859-1"?>
<richtext version="1.0.0.0" xmlns="http://www.wxwidgets.org">
  <paragraphlayout textcolor="#000000" fontsize="8" fontstyle="90"
fontweight="90" fontunderlined="0" fontface="MS Shell Dlg 2"
alignment="1" parspacingafter="10" parspacingbefore="0"
linespacing="10">
    <paragraph>
      <text>"Isto "</text>
      <symbol>-23</symbol>
      <text>" apenas um teste:"</text>

The character "é" is stored as that <symbol>-23</symbol>.

Please make a ticket for this at http://trac.wxwidgets.org/

--
Robin Dunn
Software Craftsman
http://wxPython.org Java give you jitters? Relax with wxPython!