Hi,
After the last comments, I'm still confused.
Can somebody confirm my understanding with this
practical example? I wish to pass the following
*text*, "éléphant" to "wxPhoenix". If I'm passing "éléphant" as
1) "éléphant", type 'str', coding cp1252, iso-8859-1,
iso-8859-15, cp850, mac-roman. This will fail because
"éléphant"
- is not an ASCII byte string
- is not a UTF-8 byte string
- is not a Python 'unicode' object
Correct.
You've actually highlighted one reason why I think dropping auto-conversion support for the locale's default encoding is a good idea. There are so many overlaps between encodings that some byte strings are valid in several of them, and some programmers may assume that if their test cases work with one encoding they'll work with the others. But there are also enough differences that bugs always creep in when the locale's default encoding differs from the one they tested with. By officially supporting only unicode, with automatic conversion from ASCII and UTF-8, I think we will eliminate a lot of potential bugs with no loss of functionality, at the cost of only slightly reduced convenience.
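A minimal sketch of the rule described above, written for Python 3 (there, the email's 'str' byte strings are `bytes` and 'unicode' is `str`). `to_unicode` is a hypothetical stand-in for illustration, not the actual wxPhoenix typemap, which is C++. The sketch also runs the case 1 byte strings through the rule and shows how a single byte reads differently under different legacy encodings. `mac_roman` is Python's codec name for mac-roman.

```python
import unicodedata


def to_unicode(value):
    """Hypothetical sketch of the proposed rule: accept text as-is, decode
    bytes as UTF-8 (which also covers pure ASCII), reject everything else."""
    if isinstance(value, str):
        return value  # Python 2 'unicode'
    if isinstance(value, (bytes, bytearray)):
        # The locale's default encoding is never consulted.
        return bytes(value).decode("utf-8")  # raises UnicodeDecodeError if invalid
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")


# Case 1: the same text encoded with each legacy encoding is rejected.
for enc in ("cp1252", "iso-8859-1", "iso-8859-15", "cp850", "mac_roman"):
    data = "éléphant".encode(enc)
    try:
        to_unicode(data)
        print(f"{enc:12} {data!r}: accepted")
    except UnicodeDecodeError:
        print(f"{enc:12} {data!r}: rejected")

# The overlap problem: one byte, several plausible readings depending on the locale.
ambiguous = b"\xe9"
for enc in ("cp1252", "cp850", "mac_roman"):
    print(enc, unicodedata.name(ambiguous.decode(enc)))
```

The last loop prints three different character names for the same byte, which is exactly the kind of silent disagreement that a locale-based fallback would hide.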
2) "\xc3\xa9l\xc3\xa9phant", type 'str', utf-8.
This succeeds because
- it is of type 'str'
- it is a UTF-8 byte string
Correct.
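A quick Python 3 check of case 2 (Python 3 `bytes` standing in for the Python 2 'str' of the email): the byte string is exactly the UTF-8 encoding of the text, and pure ASCII bytes are valid UTF-8 as well, which is why accepting UTF-8 also covers ASCII.

```python
text = "éléphant"
utf8_bytes = b"\xc3\xa9l\xc3\xa9phant"

# Each é (U+00E9) becomes the two-byte UTF-8 sequence C3 A9; ASCII letters stay one byte.
assert text.encode("utf-8") == utf8_bytes
assert utf8_bytes.decode("utf-8") == text

# Pure ASCII bytes are also valid UTF-8 and decode to the same characters.
assert b"elephant".decode("utf-8") == b"elephant".decode("ascii") == "elephant"

print(len(text), "characters,", len(utf8_bytes), "bytes")
```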
3) u"éléphant", type 'unicode' (the source coding does not matter)
This succeeds because
- it is a Python 'unicode' object
Correct.
4) "\x00\xe9\x00l\x00\xe9\x00p\x00h\x00a\x00n\x00t",
type 'str', utf-16-be
This fails because it
- is not an ASCII byte string
- is not a UTF-8 byte string
- is not a Python 'unicode' object
Correct.
5) "\x00\x00\x00\xe9\x00\x00\x00l\x00\x00\x00\xe9\x00\x00\x00p\x00\x00\x00h\x00\x00\x00a\x00\x00\x00n\x00\x00\x00t",
type 'str', utf-32-be
This fails for the same reasons as 4).
Correct.
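Cases 4 and 5 can be checked the same way. This Python 3 sketch (the helper `first_bad_utf8_offset` is hypothetical) shows where a UTF-8 decode trips over the wide encodings, and that an explicit decode with the right codec recovers the text. Under the proposed rule, that explicit conversion to unicode would presumably be the caller's job.

```python
text = "éléphant"
utf16_be = b"\x00\xe9\x00l\x00\xe9\x00p\x00h\x00a\x00n\x00t"  # case 4
utf32_be = text.encode("utf-32-be")                           # case 5


def first_bad_utf8_offset(data: bytes):
    """Return the offset of the first invalid UTF-8 byte, or None if valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


for enc, data in (("utf-16-be", utf16_be), ("utf-32-be", utf32_be)):
    # The NUL bytes are valid UTF-8 on their own, but 0xE9 announces a
    # multi-byte sequence and is followed by 0x00, not a continuation byte.
    print(f"{enc}: first invalid UTF-8 byte at offset {first_bad_utf8_offset(data)}")
    # Decoding explicitly with the correct codec recovers the text.
    assert data.decode(enc) == text
```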
Note: an ASCII byte string is a string containing only
bytes that represent valid ASCII code points
(characters).
Yes; in other words, bytes whose values fit in the lowest 7 bits (0x00 to 0x7F).
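The 7-bit definition translates directly into a check. In this Python 3 sketch, `is_ascii_bytes` is a hypothetical helper; `bytes.isascii()` (Python 3.7 and later) performs the same test and is used here only to confirm the helper agrees with it.

```python
def is_ascii_bytes(data: bytes) -> bool:
    """True if every byte fits in the lowest 7 bits (0x00 to 0x7F)."""
    return all(byte < 0x80 for byte in data)


samples = [b"elephant", b"\xc3\xa9l\xc3\xa9phant", b"\xe9l\xe9phant"]
for data in samples:
    # bytes.isascii() performs the same check.
    assert is_ascii_bytes(data) == data.isascii()
    print(data, is_ascii_bytes(data))
```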
I've been working on the new code for the wxString conversion a bit this weekend. I'll attach the unittest TestCase I'm using to verify the conversions. I used your example texts from above.
BTW, with the reduced complexity and some newer functionality in wxString, I've reduced the typemap conversion code to about 30 lines of easily understood C++.
test_string.py (2.55 KB)
On 11/20/10 3:49 AM, jmfauth wrote:
--
Robin Dunn
Software Craftsman