Right, that's exactly the arbitrariness I've been worrying about. In some of the test text files I'm dealing with, \x8f is what shows up as e-with-grave-accent -- shows up, I mean, in BBEdit, in the STC panel in my program, in the WingIDE editor.
The string returned by wx.GetDefaultPyEncoding, on my development machine (haven't had a chance to check the Windows XP machine) is 'mac-roman'. How portable does that seem? Not bloody very.
So how *should* I go about specifying a regex to search -- on both OSX and Windows -- in text files that may contain a few accented characters -- for those accented characters? I need to treat 'e' and 'è' quite differently. (Did that second one show up as e-with-grave-accent in this message??)
Charles Hartman
Yes, it did appear as an e-accent-grave on both MacOS X and RedHat FC2.
But, for what it is worth, this is Gmail and I am using Mozilla
Firefox to read the message.
/Jean Brouwers
---------------------------------------------------------------------
To unsubscribe, e-mail: wxPython-users-unsubscribe@lists.wxwidgets.org
For additional commands, e-mail: wxPython-users-help@lists.wxwidgets.org
Right, that's exactly the arbitrariness I've been worrying about. In
some of the test text files I'm dealing with, \x8f is what shows up as
e-with-grave-accent -- shows up, I mean, in BBEdit, in the STC panel in
my program, in the WingIDE editor.
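It is well known that Windows (code page 1252) puts printable characters
in the 0x80-0x9F range, which ISO-8859-1 reserves for control codes, but
the common accented letters such as é and è keep the same codes there as
in ISO-8859-1. The Macintosh used to have its own character set (MacRoman,
where 0x8F is è), but recent versions of MacOS should be compatible with
utf-8 AFAIK.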
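To make the difference concrete, here is a minimal sketch (Python 3 syntax, standard codecs only) that shows which bytes stand for è under a few encodings. The helper name hex_bytes is made up for this illustration; the codec names are the ones Python ships with.

```python
# U+00E8 is LATIN SMALL LETTER E WITH GRAVE, written as an escape so the
# encoding of this source file does not matter.
E_GRAVE = u"\u00e8"

def hex_bytes(text, codec):
    """Return the bytes of `text` under `codec` as space-separated hex."""
    return " ".join("%02x" % b for b in bytearray(text.encode(codec)))

for codec in ("mac_roman", "latin_1", "cp1252", "utf_8"):
    print("%-10s %s" % (codec, hex_bytes(E_GRAVE, codec)))

# mac_roman gives 8f, latin_1 and cp1252 give e8, utf_8 gives c3 a8.
```

So the \x8f in those test files is consistent with MacRoman data, and the same character would be a different byte (or two bytes) in a file saved on Windows or as utf-8.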
I suggest you abandon legacy encodings (especially proprietary Windows
or Mac ones) and convert your files to utf-8. If you have a Unix shell
you can use the "iconv" command to convert files from/to different
character sets.
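If no shell is handy, the same conversion can be sketched in Python. This is only an illustration: the function name convert_to_utf8 is hypothetical, and the source encoding (mac_roman by default here) is an assumption you have to supply, since a plain text file does not record its own encoding.

```python
import io

def convert_to_utf8(src_path, dst_path, src_encoding="mac_roman"):
    """Re-encode a text file as UTF-8.

    src_encoding is an assumption about the input file, not something
    this function can detect.
    """
    # newline="" turns off newline translation, so old Mac "\r" line
    # endings are carried over unchanged.
    with io.open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with io.open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

Writing to a separate destination path keeps the original file around in case the guessed source encoding turns out to be wrong.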
As for your regexp, I think you should do the match in Unicode using a
Unicode regexp, e.g.:
>>> import re
>>> pattern = u"[éè]"
>>> pattern
u'[\xe9\xe8]'
>>> print re.search(pattern, u"éléphant")
<_sre.SRE_Match object at 0xb7b0d7c8>
>>> print re.search(pattern, u"escargot")
None
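Applied to file contents, the idea is to decode the raw bytes first and only then run the Unicode pattern. A rough sketch in Python 3 syntax follows; the names ACCENTED_E and find_accented_e are invented for the example, and the encoding passed in is whatever you know (or assume) the file uses.

```python
import re

# Hypothetical pattern: e-grave and e-acute, written as escapes so the
# encoding of this source file does not matter.
ACCENTED_E = re.compile(u"[\u00e8\u00e9]")

def find_accented_e(raw_bytes, encoding):
    """Decode raw file bytes, then return (offset, character) per match."""
    text = raw_bytes.decode(encoding)
    return [(m.start(), m.group()) for m in ACCENTED_E.finditer(text)]

# The same word stored under two encodings gives the same matches,
# because the search happens after decoding.
mac_bytes = b"\x8el\x8ephant"                        # MacRoman: 0x8E is e-acute
utf8_bytes = u"\u00e9l\u00e9phant".encode("utf-8")
same = find_accented_e(mac_bytes, "mac_roman") == find_accented_e(utf8_bytes, "utf-8")
print(same)
```

A plain 'e' never matches this pattern, so 'e' and 'è' stay distinct regardless of the platform the file came from.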
Regards
Antoine.
On Monday, 11 April 2005 at 19:14 -0400, Charles Hartman wrote:
Hi Charles,
Charles Hartman wrote:
Right, that's exactly the arbitrariness I've been worrying about. In some of the test text files I'm dealing with, \x8f is what shows up as e-with-grave-accent -- shows up, I mean, in BBEdit, in the STC panel in my program, in the WingIDE editor.
The string returned by wx.GetDefaultPyEncoding, on my development machine (haven't had a chance to check the Windows XP machine) is 'mac-roman'. How portable does that seem? Not bloody very.
So how *should* I go about specifying a regex to search -- on both OSX and Windows -- in text files that may contain a few accented characters -- for those accented characters? I need to treat 'e' and 'è' quite differently. (Did that second one show up as e-with-grave-accent in this message??)
Yep, on Windows XP with Mozilla e-mail.