Hi all,
I have so far relied on wx.GetDefaultPyEncoding() to keep Unicode
encoding problems from crashing my application, since they crop up
from time to time in the data my app looks at. However, today, this
method failed me for the first time. Here is the traceback:
Traceback (most recent call last):
  File "c:\prog\bookshare\lbc.py", line 605, in DefaultHandler
    if self.CustomHandler: return self.CustomHandler(self, event, sName)
  File "c:\prog\bookshare\dialogs.py", line 70, in eventHandler
    return SearchResultsDialog(title="Most Recent Books", favoriteable=False, results=results)
  File "c:\prog\bookshare\dialogs.py", line 148, in __init__
    for res in self.results: self.resultText.append(res.title.encode(wx.GetDefaultPyEncoding())+", by "+res.authorStr.encode(wx.GetDefaultPyEncoding()))
  File "c:\python27\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u0107' in position 33: character maps to <undefined>
I have no way of knowing what character this is, since the data comes
from a website's API, but I have never seen this fail before. I
thought the whole point of GetDefaultPyEncoding() was to prevent this
sort of thing?
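A minimal reproduction, sketched in Python 3 (not code from the thread): as the traceback shows, wx.GetDefaultPyEncoding() resolved to cp1252 here, and cp1252 is a single-byte Western European code page with no slot for U+0107, so encoding to it can fail no matter how the call is made. The try_encode helper is hypothetical.

```python
def try_encode(text, encoding):
    """Hypothetical helper: return the encoded bytes, or None if
    `encoding` cannot represent every character in `text`."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

c_acute = u'\u0107'  # LATIN SMALL LETTER C WITH ACUTE

# cp1252 (the encoding used in the traceback) has no byte for this character...
print(try_encode(c_acute, 'cp1252'))        # None
# ...while UTF-8 can encode every Unicode character.
print(try_encode(c_acute, 'utf-8'))         # b'\xc4\x87'
# An error handler avoids the exception, but silently loses the character.
print(c_acute.encode('cp1252', 'replace'))  # b'?'
```

The 'replace' handler only hides the problem; the fix discussed later in the thread is to stop encoding at this point at all.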
Some sample lines would help ...
It looks like you are trying to encode a unicode character to unicode though:
"
UnicodeEncodeError: 'charmap' codec can't encode character u'\u0107' in position 33: character maps to <undefined>
"
Cheers
···
On 04.10.11 05:30, Alex Hall wrote:
--
--------------------------------------------------
Tobias Weber
CEO
There are several possible kinds of answers. I will choose
the less pleasant one.
"I have so far relied on wx.GetDefaultPyEncoding() to ..."
This is a design mistake. Work in Unicode: decode early
and encode late. This is the rule. Not only will it
not fail, it can't fail.
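The "decode early, encode late" rule can be sketched against the line that failed in the traceback. This is a minimal Python 3 sketch under a stated assumption: the website's API is taken, hypothetically, to return UTF-8 bytes. Book, decode_book and format_result are illustrative names, not Alex's actual code.

```python
from collections import namedtuple

# Hypothetical stand-in for the objects in self.results.
Book = namedtuple('Book', ['title', 'authorStr'])

def decode_book(raw_title, raw_author, encoding='utf-8'):
    """Input boundary: turn the API's bytes into unicode exactly once.

    Assumes UTF-8; real code should take the charset from the response
    (e.g. the HTTP Content-Type header) rather than hard-coding it.
    """
    return Book(raw_title.decode(encoding), raw_author.decode(encoding))

def format_result(book):
    """Inside the program: unicode + unicode, no .encode() calls."""
    return book.title + u', by ' + book.authorStr

# Sample bytes as they might arrive from the network (b'\xc4\x87' is U+0107 in UTF-8).
book = decode_book(b'Title with \xc4\x87', b'Author Name')
line = format_result(book)  # still unicode, safe to pass to a Unicode wx build

# Output boundary: encode only when bytes are really needed, and to an
# encoding that can represent every character (UTF-8 here).
data = line.encode('utf-8')
```

The design choice is that no function in the middle of the program ever calls .encode(); only the two boundary steps touch bytes.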
Your problem is not at the Python or wxPython level; your
problem is at the level of character encoding.
Python, which handles all of this natively and
extremely well, only reflects it.
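import unicodedata
print(unicodedata.name(u'\u0107'))  # LATIN SMALL LETTER C WITH ACUTE
z = ['cp1252', 'mac-roman', 'iso-8859-1', 'cp437', 'cp850', 'cp858']
for e in z:
    # Loop body reconstructed (the original message was cut off here):
    # try each codec and report whether it can represent the character.
    try:
        print(e + ': ' + repr(u'\u0107'.encode(e)))
    except UnicodeEncodeError:
        print(e + ': cannot encode U+0107')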
Here's the thing. I started this, as I imagine most people do, with
simple tests on the command line before I wrote the GUI. On the
command line, some characters in the data I was pulling would
cause Unicode encode errors, even though I was not encoding at all.
When I encoded, things worked. When I wrote the GUI, I kept that
"lesson" in mind and so used .encode(wx.GetDefaultPyEncoding())
on any data from the website I was going to output, and things always worked.
After your message, I removed all calls to this function, and now I
get no errors at all. I wonder why I got errors when not encoding on
the command line, but no errors when not encoding in my GUI. Well,
at least it works now, thanks! Still, I am interested to hear just
what I was doing wrong.
···
On 10/4/11, jmfauth <wxjmfauth@gmail.com> wrote:
On 4 oct, 05:30, Alex Hall <mehg...@gmail.com> wrote:
Because Python automatically tries to encode any unicode object written to the terminal, using the default encoding, which is usually ASCII. By explicitly using wx.GetDefaultPyEncoding() you were more likely to be using the default encoding for your locale instead, which can handle more exotic content than ASCII, so you got fewer errors.
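The terminal behaviour can be illustrated without a real console. In this Python 3 sketch (an illustration, not anything from the thread), io.TextIOWrapper over an in-memory buffer stands in for the terminal: like a console's text stream, it encodes every string written to it with one fixed encoding, so output fails whenever that encoding cannot represent a character. The write_to_stream helper is hypothetical.

```python
import io

def write_to_stream(text, encoding):
    """Hypothetical helper: write text to a text stream that uses
    `encoding`, as a console does. Returns the bytes produced, or None
    if `encoding` cannot represent some character in `text`."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    try:
        stream.write(text)
        stream.flush()
    except UnicodeEncodeError:
        return None
    return raw.getvalue()

c_acute = u'\u0107'
print(write_to_stream(c_acute, 'ascii'))   # None (the usual ASCII default)
print(write_to_stream(c_acute, 'cp1252'))  # None (the encoding in the traceback)
print(write_to_stream(c_acute, 'utf-8'))   # b'\xc4\x87'
```

The only variable is the stream's encoding: a wider encoding fails less often, which matches the "fewer errors" observation, but only an encoding that covers all of Unicode never fails.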
···
On 10/4/11 5:35 PM, Alex Hall wrote:
Ah, so wx already handles encoding, unlike the terminal? So when
I tried to force an encoding instead of letting Python and/or the OS take
care of it in my GUI, I restricted things and caused problems, since I
was not letting Python just get on with outputting in the optimal way?
···
On 10/4/11, Robin Dunn <robin@alldunn.com> wrote:
Not quite. wxPython uses Unicode internally (assuming, for 2.8 and earlier, that you use a Unicode build), so no encoding is needed if you give it unicode objects for widget values or whatever. Hence what JMF was suggesting earlier is basically to always use unicode and only encode/decode when needed at the "boundaries" of your application (such as when reading/writing data to files, network connections, etc.).
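The "boundaries" advice can be sketched for the file case. io.open (in the standard library since Python 2.6, and the same function as the built-in open in Python 3) decodes when reading and encodes when writing, so the code in between only handles unicode. The choice of UTF-8 and the save_results/load_results helpers are assumptions made for this example.

```python
import io
import os
import tempfile

def save_results(path, lines):
    """Output boundary: unicode in memory -> UTF-8 bytes on disk."""
    with io.open(path, 'w', encoding='utf-8') as f:
        for line in lines:
            f.write(line + u'\n')

def load_results(path):
    """Input boundary: UTF-8 bytes on disk -> unicode in memory."""
    with io.open(path, 'r', encoding='utf-8') as f:
        return [line.rstrip(u'\n') for line in f]

lines = [u'Title with \u0107, by Author Name']
path = os.path.join(tempfile.mkdtemp(), 'results.txt')
save_results(path, lines)
assert load_results(path) == lines  # round trip with no explicit .encode() call
```

Network connections follow the same pattern: decode the received bytes once, on arrival, and encode once, just before sending.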
···
On 10/4/11 8:26 PM, Alex Hall wrote: