RichTextCtrl UTF-8 encoding

I'm trying to display Chinese text in a RichTextCtrl, using UTF-8 encoding.

I'm doing:

        for h in rt.RichTextBuffer.GetHandlers():
            h.SetEncoding("utf-8")

But I get garbage when I load a UTF-8 encoded file. I've attached a
simple script and text file that demonstrate the problem.

Thanks

Mark

utf8.txt (16 Bytes)

rtc_encoding.py (1.58 KB)

Hi, I am probably missing something (I don't use rtc very extensively
and don't really understand the usage of handlers etc.), but it seems
that the wxPython demo app for rtc displays the same garbage as your
app; i.e. the UTF-8 file is read as windows-1252.
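
To illustrate the kind of garbage that results when UTF-8 bytes are
decoded as windows-1252, here is a minimal sketch in plain Python. It
does not touch wx at all; the sample string is the one used in this
thread, and the variable names are only illustrative.

```python
# -*- coding: utf-8 -*-
# Sketch: what a windows-1252 reader makes of UTF-8 encoded Chinese text.
text = u"我在学中文"

utf8_bytes = text.encode("utf-8")       # three bytes per character here
garbled = utf8_bytes.decode("cp1252")   # each byte becomes one wrong character

# As long as no byte was dropped or replaced, the damage is reversible:
recovered = garbled.encode("cp1252").decode("utf-8")
```

Every byte of the UTF-8 data turns into its own Latin character (the
first one is "æ", from byte 0xE6), so five Chinese characters show up
as fifteen unrelated symbols.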

On the other hand, using a straightforward
                self.notepad.SetValue(unicode(open(path, "r").read(), "utf-8", "strict"))
instead of
                self.notepad.LoadFile(path, fileType)

I get 我在学中文 in the richtext field.
(Of course, normally you would use more robust code around open(...),
e.g. with ..., or try ... except, or codecs.open(...).)
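
A more defensive version of that workaround might look like the sketch
below. The function names read_utf8_text and load_into_control are
hypothetical, and ctrl is assumed to be any object with a
SetValue(text) method, such as a RichTextCtrl. io.open is used because
it accepts an encoding argument on both Python 2 and Python 3.

```python
import io


def read_utf8_text(path):
    """Return the contents of path decoded strictly as UTF-8."""
    # 'with' closes the file even if decoding fails part-way through.
    with io.open(path, "r", encoding="utf-8", errors="strict") as f:
        return f.read()


def load_into_control(ctrl, path):
    """Put the decoded text into ctrl; return False if reading failed.

    ctrl is an assumption: any object exposing SetValue(text).
    """
    try:
        text = read_utf8_text(path)
    except (IOError, OSError, UnicodeDecodeError):
        # Missing file, permission problem, or bytes that are not valid UTF-8.
        return False
    ctrl.SetValue(text)
    return True
```

In the handler above this would be called as
load_into_control(self.notepad, path), with the caller deciding how to
report a False result.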

hth,
  vbr

2010/2/13 Mark Reed <markreed99@gmail.com>:

--
To unsubscribe, send email to wxPython-users+unsubscribe@googlegroups.com
or visit http://groups.google.com/group/wxPython-users?hl=en

Thanks,

I've modified it to use codecs when loading a .txt file and to use the
XMLHandler for .xml files, and it works, even if I remove the
SetEncoding("utf-8") call for the XMLHandler. So it appears that the
handlers are ignoring the encoding and the XMLHandler is simply
recognizing the Unicode and encoding the data as UTF-8. I'll write up
a bug or take a look myself later.

                if path.endswith(".txt"):
                    self.Load(path) # Uses codecs.open and rtc.SetValue()
                else:
                    self.notepad.LoadFile(path, fileType)
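
The Load method itself is not shown in this message. As a rough sketch
of the dispatch described above, assuming a control object that offers
SetValue(text) and LoadFile(path, file_type), it could be written as a
standalone function like this. The name load_any is hypothetical.

```python
import codecs
import os


def load_any(ctrl, path, file_type):
    """Load plain text via codecs; defer to the control's handlers otherwise.

    ctrl is assumed to provide SetValue(text) and LoadFile(path, file_type).
    """
    ext = os.path.splitext(path)[1].lower()
    if ext == ".txt":
        # Decode explicitly so the text handler's encoding is never involved.
        with codecs.open(path, "r", encoding="utf-8") as f:
            ctrl.SetValue(f.read())
        return True
    # .xml and anything else goes through the control's own file handlers.
    return ctrl.LoadFile(path, file_type)
```

Comparing the extension with os.path.splitext avoids matching names
that merely end in the letters "txt".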

Mark

Mark,

I suspect your problem is with your codecs and file handlers, not,
strictly speaking, with the RichTextCtrl. I've worked with Chinese
characters internally with the RichTextCtrl with no difficulties.

If you want to look at what I do, look at the Demo program that goes
with my RTF Parser, described at
http://www.transana.org/developers/PyRTFParser/.

David
