[wxPython-users] coding error in maskededitctrl wxPython 2.5.1.5u

Robin,

Mignon Laurent wrote:

An error occurs when I type an accented character in a
MaskedTextCtrl with locale.setlocale(locale.LC_ALL, '') in effect:

Traceback (most recent call last):
  File "C:\soft\PYTHON23\lib\site-packages\wx\lib\maskededit.py", line 2797, in _OnChar
    if keep_processing and self._isCharAllowed( chr(key), pos, checkRegex = True ):
  File "C:\soft\PYTHON23\lib\site-packages\wx\lib\maskededit.py", line 4336, in _isCharAllowed
    newvalue, ignore, ignore, ignore, ignore = self._insertKey(char, at, sel_start, sel_to, value, allowAutoSelect=True)
  File "C:\soft\PYTHON23\lib\site-packages\wx\lib\maskededit.py", line 4790, in _insertKey
    newtext = left + char + right
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)

In fact, at line 4790 the string concatenation is made between two
unicode objects, 'left' and 'right', and an 8-bit str object, 'char'.
Indeed, char is initialized at line 2797 of the _OnChar(self, event)
method of the MaskedEditMixin class:

    char = chr(key)  # key is int

I solved this problem by replacing line 4790 (newtext = left +
char + right) with these 3 lines:

    if type(char) is not UnicodeType:
        char = char.decode("iso-8859-1")
    newtext = left + char + right

Is this a good solution, and is this problem solved in the new release?

You then wrote back:

I think so. Try it out and see.

I don't know whether this is a "good" solution; I *do* know that
others have worked around this same issue differently, but I'm not
sure how their solution works!

You once wrote (in response to someone having the same issue):

In the unicode builds of wxPython all strings passed to wxWidgets are
converted to unicode using the default encoding, and all wxStrings
returned from wxWidgets methods are returned as Python unicode
objects. You can see errors like the above when passing values to
wxWidgets methods that don't belong in the ascii encoding. In this
case it probably means that either left or right (or both) are
unicode objects and char is a string that it is trying to coerce to
unicode in order to do the concatenation.

The same workarounds for passing strings to the C++ methods would
apply here. Either decode the string to a unicode object before
using it, or set the default encoding for Python to latin1 or another
appropriate setting.
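
A minimal sketch of the failure and of the "decode first" workaround described above. It is written for Python 3 as an assumption (the thread itself concerns Python 2.3): here bytes stands in for Python 2's 8-bit str, and str stands in for unicode. The helper names concat_like_python2 and concat_decoded, and the latin-1 default, are illustrative only and are not part of maskededit.py.

```python
# Python 3 sketch: bytes plays the role of Python 2's 8-bit str,
# and str plays the role of Python 2's unicode.

def concat_like_python2(left, char, right):
    """Mimic Python 2's implicit coercion, which decoded the byte string as ASCII."""
    return left + char.decode("ascii") + right


def concat_decoded(left, char, right, encoding="latin-1"):
    """Decode the byte string explicitly (encoding is an assumption) before joining."""
    if isinstance(char, bytes):
        char = char.decode(encoding)
    return left + char + right


key = 0xE9            # key code reported for an e-acute in Latin-1 / cp1252
char = bytes([key])   # the value chr(key) produced in Python 2

try:
    concat_like_python2("caf", char, "")
except UnicodeDecodeError as exc:
    error_message = str(exc)   # same wording as in the traceback above

fixed = concat_decoded("caf", char, "")
```

Whether latin-1 is the right encoding depends on the platform and locale; it is used here only because the proposed patch used iso-8859-1.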

But I couldn't tell if this was meant as something the control should
be doing or something that the user should be doing outside of it.
Which did you mean, and if inside the control's code, how?

For the same issue, Jean-Michel Fauth wrote:

I am not a MaskEditControls user, but I took a look at this
problem because I'm very sensitive to all the "char coding" issues
in Python and/or wxPython. I made some tests with the MaskedTextCtrl.

1) It seems the ctrl works fine, meaning it works with all
ANSI chars, when no mask is used (mask = "").
2) When using a mask, things become a little more complicated
if you use international chars on a win platform. ANSI chars with
code > 127 do not work.
To circumvent this, one way is to use the locale module as described
in the demo. I do not like this approach, because a wxPython ctrl
should work without such a trick. Besides, it does not work
with all ANSI chars!
3) If you carefully read the doc, by default
the allowed valid chars are the chars defined in the Python string
module: string.letters, string.punctuation, string.digits.
Unfortunately, these strings are not windows compliant; they do not
correspond to the chars in the ANSI code range 32 to 255.
4) The good news, as has been suggested, is that you can use the
includeChars attribute. But instead of inserting a few more chars, it
is better to introduce the whole palette of ANSI chars.

ansichars = ""
for i in xrange(32, 256):
    ansichars += chr(i)

ctrl = med.MaskedTextCtrl(self, -1, ..., mask=...., includeChars=ansichars)

It works fine with all "windows chars": direct key inputs, modified
key inputs including AltGr+X, Alt + 0nnn, dead keys, ...
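
To illustrate why widening the allowed set changes what the control accepts, here is a rough Python 3 sketch. The is_allowed helper is hypothetical and only models "default valid chars plus includeChars"; it is not the control's actual validation code. Note also that in Python 3 chr(i) yields Unicode code points, so the range 128 to 159 corresponds to Latin-1 control codes rather than to the printable characters Windows code page 1252 puts there.

```python
import string

# Default valid characters as described in point 3 (ASCII-only in Python 3).
default_allowed = string.ascii_letters + string.digits + string.punctuation

# The "whole palette" from point 4, built without repeated concatenation.
ansichars = "".join(chr(i) for i in range(32, 256))


def is_allowed(char, include_chars=""):
    """Hypothetical check: accept default chars plus any explicitly included ones."""
    return char in default_allowed or char in include_chars


rejected_by_default = not is_allowed("\xe9")       # e-acute is not in the default set
accepted_with_palette = is_allowed("\xe9", ansichars)
```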

Now I'm puzzled, because I don't understand why the decoding is no
longer necessary if you use the "includeChars". The code above that
is otherwise tracing back has not changed; the left and right above
are coming from the text control itself, and char is chr(x) where x
is the keycode pressed. (!?!) Should I add something to the code to
make this less of an issue?

I'm also wondering if I should make a new "mask character" that means
"ansichars" as above, or if I should redefine 'X' to mean ansichars (as
above), rather than string.letters + string.digits + string.punctuation?

Let me know,
/Will Sadkin

[Sorry for not responding to your other message about this, it got forgotten in my inbox]

Will Sadkin wrote:
[...]

You once wrote (in response to someone having the same issue):

In the unicode builds of wxPython all strings passed to wxWidgets are
converted to unicode using the default encoding, and all wxStrings
returned from wxWidgets methods are returned as Python unicode
objects. You can see errors like the above when passing values to
wxWidgets methods that don't belong in the ascii encoding. In this
case it probably means that either left or right (or both) are
unicode objects and char is a string that it is trying to coerce to
unicode in order to do the concatenation.

The same workarounds for passing strings to the C++ methods would
apply here. Either decode the string to a unicode object before
using it, or set the default encoding for Python to latin1 or another
appropriate setting.

But I couldn't tell if this was meant as something the control should
be doing or something that the user should be doing outside of it.
Which did you mean, and if inside the control's code, how?

I don't know for sure what the best approach would be, and it would probably take more understanding of unicode and the 8-bit encoding schemes to really know...

Keep in mind, however, that when using a unicode build of wxPython all values given to the textctrl will be converted to unicode if they aren't already, and the result of every GetValue will be a unicode object. So it seems to make sense that the masked controls should also deal only with unicode objects internally. You can easily tell if you are on a unicode build of wxPython by checking

  if 'unicode' in wx.PlatformInfo:
    ...
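
One way the control could apply that check is sketched below, again in Python 3 terms (bytes for the 8-bit string, str for unicode). Everything here is an assumption for illustration: key_to_text is a hypothetical helper, platform_info is passed in as a stand-in for wx.PlatformInfo so the sketch runs without wxPython, and latin-1 is only a placeholder for whatever encoding suits the platform.

```python
def key_to_text(key, platform_info, encoding="latin-1"):
    """Turn an 8-bit key code into the kind of string the control should hold.

    platform_info stands in for wx.PlatformInfo (a tuple of strings).
    Assumes 0 <= key <= 255; larger key codes need different handling.
    """
    raw = bytes([key])
    if "unicode" in platform_info:
        # Unicode build: keep everything as text internally.
        return raw.decode(encoding)
    # ANSI build: leave the 8-bit value untouched.
    return raw


unicode_build = ("__WXMSW__", "wxMSW", "unicode")
ansi_build = ("__WXMSW__", "wxMSW", "ansi")

as_text = key_to_text(0xE9, unicode_build)
as_bytes = key_to_text(0xE9, ansi_build)
```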

For the same issue, Jean-Michel Fauth wrote:

I am not a MaskEditControls user, but I took a look at this
problem because I'm very sensitive to all the "char coding" issues
in Python and/or wxPython. I made some tests with the MaskedTextCtrl.

1) It seems the ctrl works fine, meaning it works with all
ANSI chars, when no mask is used (mask = "").
2) When using a mask, things become a little more complicated
if you use international chars on a win platform. ANSI chars with
code > 127 do not work.
To circumvent this, one way is to use the locale module as described
in the demo. I do not like this approach, because a wxPython ctrl
should work without such a trick. Besides, it does not work
with all ANSI chars!
3) If you carefully read the doc, by default
the allowed valid chars are the chars defined in the Python string
module: string.letters, string.punctuation, string.digits.
Unfortunately, these strings are not windows compliant; they do not
correspond to the chars in the ANSI code range 32 to 255.
4) The good news, as has been suggested, is that you can use the
includeChars attribute. But instead of inserting a few more chars, it
is better to introduce the whole palette of ANSI chars.

ansichars = ""
for i in xrange(32, 256):
   ansichars += chr(i)

ctrl = med.MaskedTextCtrl(self, -1, ..., mask=...., includeChars=ansichars)

It works fine with all "windows chars": direct key inputs, modified
key inputs including AltGr+X, Alt + 0nnn, dead keys, ...

Now I'm puzzled, because I don't understand why the decoding is no
longer necessary if you use the "includeChars". The code above that
is otherwise tracing back has not changed; the left and right above
are coming from the text control itself, and char is chr(x) where x
is the keycode pressed. (!?!) Should I add something to the code to
make this less of an issue?

I think that this is separate from the unicode issue. By default string.letters is the English alphabet, but in other locales there are other letters that are valid for entering words, names, etc. To complicate things, even though the ascii encoding only defines characters for positions 0-127, the upper 128 positions do have characters as well, which can be entered, and some of which (but not all) are letters...

Anyway, depending on the encoding (or code page) used, the upper 128 characters may change. If you change the locale via Python, the string.letters value will change as well. For example:

>>> import string
>>> string.letters
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>>
>>> import locale
>>> locale.setlocale(locale.LC_ALL, "")
'English_United States.1252'
>>> string.letters
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz\x83\x8a\x8c\x8e\x9a\x9c\x9e\x9f\xaa\xb5\xba\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'

>>> locale.getlocale()
['English_United States', '1252']
>>> locale.setlocale(locale.LC_ALL, "C")
'C'
>>> string.letters
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
>>> locale.getlocale()
(None, None)
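
The same dependence on the code page can be sketched without touching the process locale. The snippet below is a Python 3 approximation (string.letters no longer exists there), and code_page_letters is a hypothetical helper. It relies on Unicode's notion of a letter via str.isalpha(), which may differ slightly from what the C library reports for a given locale, so the result should not be expected to match the session above character for character.

```python
def code_page_letters(encoding):
    """Return the characters of an 8-bit encoding that Unicode classifies as letters."""
    # Byte values the encoding leaves undefined are skipped.
    text = bytes(range(256)).decode(encoding, errors="ignore")
    return "".join(ch for ch in text if ch.isalpha())


ascii_letters = code_page_letters("ascii")     # only A-Z and a-z survive
cp1252_letters = code_page_letters("cp1252")   # also includes accented letters

has_e_acute = "\xe9" in cp1252_letters         # e-acute counts as a letter
has_times_sign = "\xd7" in cp1252_letters      # the multiplication sign does not
```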

I'm also wondering if I should make a new "mask character" that means
"ansichars" as above, or if I should redefine 'X' to mean ansichars (as
above), rather than string.letters + string.digits + string.punctuation?

It depends on whether you want 'X' to mean anything is allowed, or to mean just letters, digits and punctuation. If the latter, then you could just document that the locale should be set so that string.letters properly reflects what the letters are for that locale's encoding.


--
Robin Dunn
Software Craftsman
http://wxPython.org Java give you jitters? Relax with wxPython!