wx.GetDefaultPyEncoding() failing me

Hi all,
I have so far relied on wx.GetDefaultPyEncoding() to prevent unicode
encoding problems from crashing my application, since they creep up
from time to time in the data my app looks at. However, today, this
method failed me for the first time. Here is the traceback:
Traceback (most recent call last):
  File "c:\prog\bookshare\lbc.py", line 605, in DefaultHandler
    if self.CustomHandler: return self.CustomHandler(self, event, sName)
  File "c:\prog\bookshare\dialogs.py", line 70, in eventHandler
    return SearchResultsDialog(title="Most Recent Books", favoriteable=False, results=results)
  File "c:\prog\bookshare\dialogs.py", line 148, in __init__
    for res in self.results: self.resultText.append(res.title.encode(wx.GetDefaultPyEncoding())+", by "+res.authorStr.encode(wx.GetDefaultPyEncoding()))
  File "c:\python27\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u0107' in position 33: character maps to <undefined>

I have no way of knowing what character this is since it is pulling
from a website's api, but I have never seen this fail before. I
thought the whole point of GetDefaultPyEncoding() was to prevent this
sort of thing?

···

--
Have a great day,
Alex (msg sent from GMail website)
mehgcap@gmail.com; http://www.facebook.com/mehgcap

It tells you what character it is failing on:

\u0107
This is a c with an acute accent (ć); see Unicode Character 'LATIN SMALL LETTER C WITH ACUTE' (U+0107).

Some sample lines would help ...
It looks like you are trying to encode a unicode character to unicode though:

"UnicodeEncodeError: 'charmap' codec can't encode character u'\u0107' in position 33: character maps to <undefined>"

Cheers

···

--
--------------------------------------------------
Tobias Weber
CEO

The ROG Corporation GmbH
Donaustaufer Str. 200
93059 Regensburg
Tel: +49 941 4610 57 55
Fax: +49 941 4610 57 56

www.roglink.com

Geschäftsführer: Tobias Weber
Registergericht: Amtsgericht Regensburg - HRB 8954
UStID DE225905250 - Steuer-Nr.184/59359
--------------------------------------------------

There are several possible kinds of answers. I will choose
the least pleasant one.

> I have so far relied on wx.GetDefaultPyEncoding() to ...

This is a design mistake. Work in unicode: decode
early and encode late. That is the rule. Not only will it
not fail, it can't fail.

Your problem is not at the Python or wxPython level; your
problem is at the "coding of the characters" level.
Python, which handles all of this natively and in
an extremely good and powerful way, only reflects it.

>>> import unicodedata
>>> unicodedata.name(u'\u0107')
LATIN SMALL LETTER C WITH ACUTE
>>> z = ['cp1252', 'mac-roman', 'iso-8859-1', 'cp437', 'cp850', 'cp858']
>>> for e in z:
...     try:
...         u'\u0107'.encode(e)
...     except UnicodeEncodeError:
...         "coding {} expectedly fails".format(e)
...
coding cp1252 expectedly fails
coding mac-roman expectedly fails
coding iso-8859-1 expectedly fails
coding cp437 expectedly fails
coding cp850 expectedly fails
coding cp858 expectedly fails

> I have no way of knowing what character this is since it
> is pulling from a website's api, but I have never seen
> this fail before.

Terrible to say. You have probably been writing bad code for
years.

-----------

Tip (only a bad workaround):

>>> sys.stdout.encoding
cp1252
>>> u'\N{LATIN SMALL LETTER C WITH ACUTE}'\
... .encode(sys.stdout.encoding, 'replace')
?
>>> repr(u'\N{LATIN SMALL LETTER C WITH ACUTE}'\
... .encode(sys.stdout.encoding, 'ignore'))
''

jmf
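
For readers following along, the 'replace' and 'ignore' workaround in the tip above can be compared with two other standard codec error handlers. This is only a minimal sketch: the sample string and the helper name encode_lossy are made up for illustration, cp1252 is assumed because it is the encoding named in the traceback, and the code is written so that it should also run on Python 3 even though the thread uses Python 2.7.

```python
def encode_lossy(text, encoding='cp1252'):
    """Encode text with several error handlers and return {handler: bytes}.

    encode_lossy is a hypothetical helper, not part of wx or the stdlib.
    """
    handlers = ('replace', 'ignore', 'backslashreplace', 'xmlcharrefreplace')
    return {h: text.encode(encoding, h) for h in handlers}


# Hypothetical sample ending in U+0107, the character from the traceback.
results = encode_lossy(u'Ivi\u0107')

for handler in sorted(results):
    print(handler, repr(results[handler]))
```

All four handlers avoid the exception, but 'replace' and 'ignore' lose the character, while 'backslashreplace' and 'xmlcharrefreplace' keep it in an escaped form. None of them is a substitute for keeping the text as unicode.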

Here's the thing. I started this, as I imagine most people do, with
simple tests on the command line before I wrote the gui. In the
command line interface, some characters in the data I am pulling would
cause unicode encode errors, even though I was not encoding at all.
When I encoded, things worked. When I wrote the gui, I kept that
"lesson" in mind and so used the
String.encode(wx.GetDefaultPyEncoding()) method to encode any data
from the website I was going to output, and things always worked.
After your message, I removed all calls to this function, and now I
get no errors at all. I wonder why I got errors when not encoding for
the command line, then no errors when not encoding for my gui. Well,
at least it works now, thanks! Still, I am interested to hear just
what I was doing wrong.

···

--
To unsubscribe, send email to wxPython-users+unsubscribe@googlegroups.com
or visit http://groups.google.com/group/wxPython-users?hl=en

--
Have a great day,
Alex (msg sent from GMail website)
mehgcap@gmail.com; http://www.facebook.com/mehgcap

Because Python automatically tries to encode any unicode objects for output to the terminal using the default encoding, which is usually ascii. By explicitly using wx.GetDefaultPyEncoding(), you were more likely to be using the default encoding for your locale instead, which can handle more exotic content than ascii, so you got fewer errors.
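
A small sketch of the difference described above, assuming cp1252 stands in for the locale encoding (as in the traceback) and using two sample characters chosen for illustration. The helper can_encode is hypothetical, and the code is written so that it should also run on Python 3.

```python
def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Sample characters: e-acute is in cp1252, c-acute (U+0107) is not.
samples = {'e-acute': u'\u00e9', 'c-acute': u'\u0107'}
encodings = ('ascii', 'cp1252', 'utf-8')

# For each sample, list the encodings that can represent it.
table = {name: [enc for enc in encodings if can_encode(ch, enc)]
         for name, ch in samples.items()}

for name in sorted(table):
    print(name, table[name])
```

A locale encoding such as cp1252 covers more characters than ascii, which would explain why the explicit encode() calls appeared to work for a long time, but it still cannot represent every character the web API may return.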

···

--
Robin Dunn
Software Craftsman

Ah, so wx already handles encoding then, unlike the terminal, so when
I tried to force encoding instead of letting Python and/or the os take
care of it in my gui, I restricted things and caused problems since I
was not letting Python just get on with outputting the optimal way?

···

--
Have a great day,
Alex (msg sent from GMail website)
mehgcap@gmail.com; http://www.facebook.com/mehgcap

Not quite. wxPython uses unicode internally (assuming you use a unicode build for 2.8 and prior) so there is no encoding needed if you give it unicode objects for widget values or whatever. Hence what JMF was suggesting earlier is basically to always use unicode and only encode/decode when needed at the "boundaries" of your application (such as when reading/writing data to files, network connections, etc.)
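
The "decode early, encode late" rule discussed in this thread could look roughly like the sketch below. Everything here is an assumption made for illustration: fetch_raw stands in for the web API call, the API is assumed to deliver UTF-8 bytes, and the function names are invented. No wx calls are made, so the sketch runs without a GUI; it is written so that it should also run on Python 3.

```python
import io


def fetch_raw():
    """Hypothetical stand-in for the web API; assumed to return UTF-8 bytes."""
    return u'Ivi\u0107, Marko'.encode('utf-8')


def load_author(raw, encoding='utf-8'):
    # Input boundary: decode bytes to unicode exactly once.
    return raw.decode(encoding)


def format_result(title, author):
    # Inside the application: unicode only, no encode() calls.
    return title + u', by ' + author


def save(line, stream, encoding='utf-8'):
    # Output boundary: encode once, with an encoding that covers all of Unicode.
    stream.write(line.encode(encoding))


author = load_author(fetch_raw())
line = format_result(u'Sample Title', author)

buf = io.BytesIO()  # stands in for a file or network connection
save(line, buf)
```

In a GUI, the unicode value of line would be handed to the widget as it is, which is the case Robin describes above; encoding only happens in save, at the edge of the program.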

···

--
Robin Dunn
Software Craftsman