StyledTextCtrl UnicodeDecodeError

Hello,

  I am not sure whether this problem is specific to wx, to STC, to
Python on Windows, or to something else.
  I received a bug report from someone who tried to display some
text in an STC control but got this:

   File "H:\test\act_udtfinpay\udt_ui\udtgui.py", line 354, in _display_file
     self.uview.SetText(intext)
   File "C:\Python25\Lib\site-packages\wx-2.8-msw-unicode\wx\stc.py", line 2934, in SetText
     return _stc.StyledTextCtrl_SetText(*args, **kwargs)
   File "C:\Python25\lib\encodings\cp1252.py", line 15, in decode
     return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 112004: character maps to <undefined>

  It works on Linux.
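
To see what is actually sitting at the reported offset, the failing decode can be reproduced outside the GUI and the exception inspected. The following is only a minimal sketch, assuming the text arrives as a byte string; the helper name `describe_decode_error` and the sample data are made up for illustration, and the code is written so that it runs on both Python 2.6+ and Python 3:

```python
def describe_decode_error(raw, encoding, context=10):
    """Return None if raw decodes cleanly, else a dict describing the failure."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first undecodable byte,
        # exc.object is the byte string that was being decoded.
        lo = max(exc.start - context, 0)
        hi = exc.end + context
        return {
            'offset': exc.start,
            'byte': exc.object[exc.start:exc.end],
            'context': exc.object[lo:hi],
            'reason': exc.reason,
        }
    return None


# Hypothetical sample: plain ASCII with a stray 0x8f byte in the middle.
sample = b'price: 10' + b'\x8f' + b' units'
info = describe_decode_error(sample, 'cp1252')
print(info['offset'])
print(repr(info['byte']))
print(repr(info['context']))
```

Looking at the bytes around the offset often gives a hint about which encoding the file was really written in.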

--
-- Guilherme H. Polo Goncalves

The default encoding for the locale (apparently cp1252, based on the traceback) has no mapping for the byte \x8f, so it cannot convert it to a Unicode value.

  >>> st = '\x8f'
  >>> st
  '\x8f'
  >>> print st
  è
  >>> wx.GetDefaultPyEncoding()
  'mac-roman'
  >>> st.decode('cp1252')
  Traceback (most recent call last):
    File "<input>", line 1, in <module>
    File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/encodings/cp1252.py", line 15, in decode
      return codecs.charmap_decode(input,errors,decoding_table)
  UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 0: character maps to <undefined>
  >>>

You'll probably need to convert the string value to unicode yourself before passing it to the STC. That way you can use a more permissive conversion mode, or you can choose a different codec if you happen to know the encoding of the string.
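
Both options mentioned above (a known codec, or a more permissive mode) can be combined in one small helper. This is only a sketch: `to_text`, its parameters and the sample byte strings are invented for illustration and are not part of wxPython, and `.decode()` on bytes literals is used so the code runs on both Python 2.6+ and Python 3:

```python
def to_text(raw, candidates=('utf-8', 'cp1252'), fallback='cp1252'):
    """Decode raw bytes, trying each candidate codec strictly before giving up.

    If none of the candidates decodes the data cleanly, decode with the
    fallback codec and substitute U+FFFD for every undecodable byte.
    """
    for name in candidates:
        try:
            return raw.decode(name)
        except UnicodeDecodeError:
            continue
    return raw.decode(fallback, 'replace')


print(repr(to_text(b'caf\xc3\xa9')))            # valid UTF-8
print(repr(to_text(b'caf\xe9')))                # not UTF-8, but valid cp1252
print(repr(to_text(b'bad \x8f byte')))          # valid in neither codec
print(repr(to_text(b'\x8f', ('mac-roman',))))   # the real codec is known
```

The resulting unicode object can then be handed to `SetText` without any implicit conversion taking place.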

--
Robin Dunn
Software Craftsman
http://wxPython.org Java give you jitters? Relax with wxPython!

Isn't it possible to patch STC so that it automatically substitutes
those "bad" chars with something like '?' or anything else?

---------------------------------------------------------------------
To unsubscribe, e-mail: wxPython-users-unsubscribe@lists.wxwidgets.org
For additional commands, e-mail: wxPython-users-help@lists.wxwidgets.org

--
-- Guilherme H. Polo Goncalves

This is actually happening in Python's str/unicode conversion, not in
STC proper, and it's because of Python's philosophy towards error
handling: errors should not pass silently (unless explicitly
silenced). You're passing something that it doesn't know how to deal
with, so instead of trying to guess what you meant, it fails so that
you can decide.

Convert your data to a Python unicode object before passing it to STC,
using the "replace" error handling strategy to get the behavior you
want:

unicode('\x8f', errors='replace')
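
The standard error handlers differ in what they do with the undecodable byte. A small comparison sketch follows; the sample data is made up, and `.decode()` on a bytes literal is used instead of `unicode()` so that the snippet also runs on Python 3:

```python
raw = b'ab\x8fcd'

# 'strict' (the default) raises instead of guessing.
try:
    raw.decode('ascii')
    strict_result = 'decoded'
except UnicodeDecodeError:
    strict_result = 'UnicodeDecodeError'

# 'replace' substitutes U+FFFD REPLACEMENT CHARACTER for each bad byte.
replaced = raw.decode('ascii', 'replace')

# 'ignore' silently drops the bad bytes.
ignored = raw.decode('ascii', 'ignore')

print(strict_result)
print(repr(replaced))
print(repr(ignored))
```

With 'replace' the damage stays visible in the displayed text, whereas 'ignore' hides it, which is usually the less helpful choice for a viewer.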

No, it's not STC that is doing the conversion. It's the standard
wxPython wxString wrapper that is used everywhere in wxPython, and it
would not be appropriate to do a relaxed conversion in most places.
If you want a relaxed conversion then you need to do it yourself.

See the UnicodeBuild page on the wxPyWiki for more info.

--
Robin Dunn
Software Craftsman
http://wxPython.org Java give you jitters? Relax with wxPython!

Sorry for taking so long to reply, but I get a segfault now.

$ python stc_segfault.py
wx.__version__ 2.9.0.0
wx.PlatformInfo ('__WXGTK__', 'wxGTK', 'unicode', 'gtk2',
'wx-assertions-on', 'SWIG-1.3.29')
Segmentation fault (core dumped)

With this sample code:

import wx
import wx.stc as stc

class STC(stc.StyledTextCtrl):
    def __init__(self, parent):
        stc.StyledTextCtrl.__init__(self, parent, -1)

class Win(wx.Frame):
    def __init__(self):
        wx.Frame.__init__(self, None, -1)

        stc = STC(self)
        stc.SetText(unicode('\x8f', errors='replace')) # segfault

        box = wx.BoxSizer(wx.VERTICAL)
        box.Add(stc, 0, wx.EXPAND, 0)

        self.SetSizer(box)
        box.Fit(self)
        self.SetAutoLayout(True)

if __name__ == "__main__":
    print 'wx.__version__', wx.__version__
    print 'wx.PlatformInfo', wx.PlatformInfo
    app = wx.App()
    frame = Win()
    frame.Show()
    app.MainLoop()

--
-- Guilherme H. Polo Goncalves

Interesting. On Windows, I get the "missing glyph" box, which is the
behavior I'd expect. What happens if you set the text to u'\ufffd'?

Simply passing u'\ufffd' gives me a UnicodeDecodeError, while passing
u'\ufffd'.encode('utf-8') works as expected.

--
-- Guilherme H. Polo Goncalves

!! u'\ufffd' shouldn't give a UnicodeDecodeError, because it should
never be decoded. Can you show me an example here?

Just to clarify, doing:

  mytext = unicode(mytext, errors='replace').encode('utf-8')
  stc.SetText(mytext)

solved it.

Thanks for your time
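
What this workaround does to the data, step by step: every undecodable byte becomes U+FFFD, and U+FFFD encodes in UTF-8 as the three bytes EF BF BD, so the control receives a valid UTF-8 byte string. The sketch below uses made-up sample data and `.decode()` in place of `unicode()` so that it also runs on Python 3; it assumes the input is decoded as ASCII, which is what `unicode(mytext, errors='replace')` does under Python 2's default settings:

```python
mytext = b'total \x8f 42'

# Step 1: lenient decode; the stray byte becomes U+FFFD.
as_unicode = mytext.decode('ascii', 'replace')

# Step 2: re-encode as UTF-8; U+FFFD turns into three bytes.
as_utf8 = as_unicode.encode('utf-8')

print(repr(as_unicode))
print(repr(as_utf8))
```

Note that under an ASCII decode every byte above 0x7f is replaced, so accented characters are lost as well; naming the real codec, if it is known, keeps them.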

--
-- Guilherme H. Polo Goncalves

--
-- Guilherme H. Polo Goncalves

> > > > Guilherme Polo wrote:
> > > > >> Guilherme Polo wrote:
> > > > >>> Hello,
> > > > >>>
> > > > >>> I am not sure if this is a problem related to wx only, STC only,
> > > > >>> python and windows or something else (why not?)
> > > > >>> I received a bug report from a person that tried to display some
> > > > >>> text in a STC control, but got this:
> > > > >>>
> > > > >>> File "H:\test\act_udtfinpay\udt_ui\udtgui.py", line 354, in _display_file
> > > > >>> self.uview.SetText(intext)
> > > > >>> File "C:\Python25\Lib\site-packages\wx-2.8-msw-unicode\wx\stc.py",
> > > > >>> line 2934, in SetText
> > > > >>> return _stc.StyledTextCtrl_SetText(*args, **kwargs)
> > > > >>> File "C:\Python25\lib\encodings\cp1252.py", line 15, in decode
> > > > >>> return codecs.charmap_decode(input,errors,decoding_table)
> > > > >>> UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position
> > > > >>> 112004: character maps to <undefined>
> > > > >>>
> > > > >>> It works on Linux.
> > > > >>>
> > > > >> The default encoding for the locale (apparently cp1252 based on the
> > > > >> traceback) doesn't know how to convert the \x8f character to a Unicode
> > > > >> value.
> > > > >>
> > > > >> >>> st = '\x8f'
> > > > >> >>> st
> > > > >> '\x8f'
> > > > >> >>> print st
> > > > >> è
> > > > >> >>> wx.GetDefaultPyEncoding()
> > > > >> 'mac-roman'
> > > > >> >>> st.decode('cp1252')
> > > > >> Traceback (most recent call last):
> > > > >> File "<input>", line 1, in <module>
> > > > >> File
> > > > >> "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/encodings/cp1252.py",
> > > > >> line 15, in decode
> > > > >> return codecs.charmap_decode(input,errors,decoding_table)
> > > > >> UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position
> > > > >> 0: character maps to <undefined>
> > > > >> >>>
> > > > >>
> > > > >>
> > > > >> You'll probably need to convert the string value to unicode yourself
> > > > >> before passing it to the STC. That way you can use a more permissive
> > > > >> conversion mode, or you can choose a different codec if you happen to
> > > > >> know the encoding of the string.
> > > > >>
> > > > >
> > > > > Isn't it possible to patch STC in a way that it would substitute those
> > > > > "bad" chars with something like '?' or anything else automatically ?
> > > >
> > > > No, it's not STC that is doing the conversion. It's the standard
> > > > wxPython wxString wrapper that is used everywhere in wxPython, and it
> > > > would not be appropriate to do a relaxed conversion in most places.
> > > > If you want a relaxed conversion then you need to do it yourself.
> > > >
> > > > See UnicodeBuild - wxPyWiki for more info
> > > >
> > > >
> > > > --
> > > > Robin Dunn
> > > > Software Craftsman
> > > > http://wxPython.org Java give you jitters? Relax with wxPython!
> > > >
> > >
> > > Sorry for taking so long to reply, but I now get a segfault.
> > >
> > > $ python stc_segfault.py
> > > wx.__version__ 2.9.0.0
> > > wx.PlatformInfo ('__WXGTK__', 'wxGTK', 'unicode', 'gtk2',
> > > 'wx-assertions-on', 'SWIG-1.3.29')
> > > Segmentation fault (core dumped)
> > >
> > > With this sample code:
> > >
> > > import wx
> > > import wx.stc as stc
> > >
> > > class STC(stc.StyledTextCtrl):
> > >     def __init__(self, parent):
> > >         stc.StyledTextCtrl.__init__(self, parent, -1)
> > >
> > >
> > > class Win(wx.Frame):
> > >     def __init__(self):
> > >         wx.Frame.__init__(self, None, -1)
> > >
> > >         stc = STC(self)
> > >         stc.SetText(unicode('\x8f', errors='replace')) # segfault
> > >
> > >         box = wx.BoxSizer(wx.VERTICAL)
> > >         box.Add(stc, 0, wx.EXPAND, 0)
> > >
> > >         self.SetSizer(box)
> > >         box.Fit(self)
> > >         self.SetAutoLayout(True)
> > >
> > >
> > > if __name__ == "__main__":
> > >     print 'wx.__version__', wx.__version__
> > >     print 'wx.PlatformInfo', wx.PlatformInfo
> > >     app = wx.App()
> > >     frame = Win()
> > >     frame.Show()
> > >     app.MainLoop()
> > >
> >
> > Interesting. On Windows, I get the "missing glyph" box, which is the
> > behavior I'd expect. What happens if you set the text to u'\ufffd'?
> >
>
> Simply passing u'\ufffd' gives me a UnicodeDecodeError, while passing
> u'\ufffd'.encode('utf-8') works as expected.
>

!! u'\ufffd' shouldn't give a UnicodeDecodeError, because it should
never be decoded. Can you show me an example here?

$ python b1.py
wx.__version__ 2.9.0.0
wx.PlatformInfo ('__WXGTK__', 'wxGTK', 'unicode', 'gtk2',
'wx-assertions-on', 'SWIG-1.3.29')
Traceback (most recent call last):
  File "b1.py", line 28, in <module>
    frame = Win()
  File "b1.py", line 14, in __init__
    stc.SetText(u'\ufffd')
  File "/home/polo/wxWidgets/wxPython/wx/stc.py", line 2935, in SetText
    return _stc.StyledTextCtrl_SetText(*args, **kwargs)
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in
position 0: ordinal not in range(128)

The example is the same as the one posted previously; only the SetText call changed.
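
For reference, the difference between the two error types can be reproduced without wx at all. The following is only a minimal sketch, written so that it also runs on current Python versions. It makes no claim about how the wxString wrapper converts its argument internally; it simply uses the 'ascii' codec named in the traceback above.

```python
# Encoding turns text (unicode) into bytes; decoding goes the other way.
# U+FFFD is already text, so only an *encode* step can fail on it.
replacement = u'\ufffd'

# The 'ascii' codec named in the traceback cannot represent U+FFFD.
try:
    replacement.encode('ascii')
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# UTF-8 can represent every code point, which is consistent with the
# explicit .encode('utf-8') call working.
utf8_bytes = replacement.encode('utf-8')

print(type(ascii_error).__name__)
print(repr(utf8_bytes))
```

The error type therefore depends only on the direction of the conversion, not on the control the text is passed to.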

···

2007/10/10, Chris Mellon <arkanes@gmail.com>:

On 10/10/07, Guilherme Polo <ggpolo@gmail.com> wrote:
> 2007/10/10, Chris Mellon <arkanes@gmail.com>:
> > On 10/10/07, Guilherme Polo <ggpolo@gmail.com> wrote:
> > > 2007/10/5, Robin Dunn <robin@alldunn.com>:
> > > > > 2007/10/3, Robin Dunn <robin@alldunn.com>:

---------------------------------------------------------------------
To unsubscribe, e-mail: wxPython-users-unsubscribe@lists.wxwidgets.org
For additional commands, e-mail: wxPython-users-help@lists.wxwidgets.org

--
-- Guilherme H. Polo Goncalves

It is a UnicodeEncodeError; my mistake, I used the wrong word in that email.

···

2007/10/10, Guilherme Polo <ggpolo@gmail.com>:

2007/10/10, Chris Mellon <arkanes@gmail.com>:
> On 10/10/07, Guilherme Polo <ggpolo@gmail.com> wrote:
> > 2007/10/10, Chris Mellon <arkanes@gmail.com>:
> > > On 10/10/07, Guilherme Polo <ggpolo@gmail.com> wrote:
> > > > 2007/10/5, Robin Dunn <robin@alldunn.com>:
> > > > > > 2007/10/3, Robin Dunn <robin@alldunn.com>:

--
-- Guilherme H. Polo Goncalves

Guilherme Polo wrote:

Sorry for taking so long to reply, but I now get a segfault.

$ python stc_segfault.py
wx.__version__ 2.9.0.0
wx.PlatformInfo ('__WXGTK__', 'wxGTK', 'unicode', 'gtk2',
'wx-assertions-on', 'SWIG-1.3.29')
Segmentation fault (core dumped)

What about the 2.8.6.0 version (or the current code from the 2.8 branch)? I haven't yet spent much time keeping the 2.9 version (svn trunk) updated and tested. Since there have been a number of Unicode-related changes to wxString in that version, I'm not too surprised that there might be some bugs that haven't been found yet.
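
Whatever the state of the 2.9 build, the original cp1252 failure can be avoided by decoding the text before it reaches SetText, as suggested earlier in the thread. The following is only a minimal sketch, assuming the file contents arrive as a byte string. The helper name to_text and its list of candidate encodings are made up for illustration and are not part of wx. On Python 2.5, drop the b prefix from the literal.

```python
def to_text(raw, encodings=('utf-8', 'cp1252')):
    """Decode a byte string for display without raising on bad bytes.

    Hypothetical helper: the name and the default encodings are
    assumptions for this sketch, not wxPython API.
    """
    for name in encodings:
        try:
            # Strict decoding first, so a codec that really matches wins.
            return raw.decode(name)
        except UnicodeDecodeError:
            continue
    # Last resort: keep what decodes and mark the rest with U+FFFD.
    return raw.decode(encodings[-1], 'replace')


text = to_text(b'caf\xe9 \x8f')
# In the application this would be followed by: self.uview.SetText(text)
print(repr(text))
```

Trying UTF-8 first is a design choice: it is strict enough that text in another encoding rarely decodes by accident, so the cp1252 fallback is used only when it is needed.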

···

--
Robin Dunn
Software Craftsman
http://wxPython.org Java give you jitters? Relax with wxPython!