Jean-Michel,
I don't think I can really help you much. A lot of our wx.STC code was
written by an undergrad I employed who had the audacity to graduate and get
a real job. I've taken over the code for the Unicode conversion, but my
understanding of the details at the level you're asking about is, ummmmm,
imperfect.
I can't help you, but I thank you for your very
interesting piece of code. I have a lot of problems with the
Unicode STC; that is the main reason I stick with the ANSI
wxPython build.
My application is aimed at academics who need to analyze the content of
video data they've collected. Since I'm in the field of education, I need
to support both Windows and Mac versions of the program. (An unusually high
proportion of education researchers use Macs.) We have a multi-user version
which facilitates collaborative analysis, allowing multiple researchers to
work on the same data set simultaneously. As a result, I have users sharing
data cross-platform in real time. Since Windows and the Mac use different
8-bit ("ANSI") code pages for accented characters, Unicode is my only option.
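To make the cross-platform problem concrete, here is a small sketch in Python 3, which is newer than this thread. It assumes that Python's standard `cp1252` and `mac_roman` codecs stand in for the Western-European Windows and Mac "ANSI" encodings:

```python
# The same accented character gets different byte values in the
# Western-European Windows code page and in classic Mac OS Roman.
text = "é"

win_bytes = text.encode("cp1252")     # Windows "ANSI" (Western European)
mac_bytes = text.encode("mac_roman")  # classic Mac OS Roman

print(win_bytes, mac_bytes)  # b'\xe9' b'\x8e'

# Bytes saved on Windows but read back as Mac Roman decode to a different
# character, which is what a shared cross-platform data set would suffer.
misread = win_bytes.decode("mac_roman")
print(misread == text)  # False

# UTF-8 yields a single byte sequence that decodes the same everywhere.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes.decode("utf-8") == text)  # True
```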
I'm now in my 4th month of working on converting the app to Unicode, and it
has not gone at all smoothly, so I understand your choice to stick with
ANSI. Things were much easier when we could do that.
I have a lot of questions, but let's start with a few:
- Do you know a way to get a correct character position?
<stc>.GetCurrentPos() returns the caret position, but it
counts that position in bytes. I would like to have
the real position of the character at the caret.
Example: éléphant. If my caret is before the "l",
GetCurrentPos() returns 2, because the "é" counts as two
bytes. I wish to have 1. A caret position just before the "p"
returns 5, where I would want 3.
Sorry, can't help you. We don't try to look at character positions that
way, nor do we do anything that requires character counting.
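For readers who do need character positions, one approach is to decode the bytes that come before a byte offset and count the resulting characters. This only works if the control's buffer is UTF-8, which is an assumption here, although it matches the two-bytes-per-"é" behaviour described above. The sketch below is plain Python 3. `byte_to_char_pos` and `char_to_byte_pos` are hypothetical helpers, and getting the raw bytes out of the control is left to the caller:

```python
def byte_to_char_pos(raw: bytes, byte_pos: int) -> int:
    """Convert a byte offset into a UTF-8 buffer to a character offset.

    Assumes byte_pos lies on a character boundary (as a caret position
    does); otherwise decoding the prefix raises UnicodeDecodeError.
    """
    return len(raw[:byte_pos].decode("utf-8"))


def char_to_byte_pos(text: str, char_pos: int) -> int:
    """Convert a character offset back into a UTF-8 byte offset."""
    return len(text[:char_pos].encode("utf-8"))


raw = "éléphant".encode("utf-8")
print(byte_to_char_pos(raw, 2))         # 1: caret just before the first "l"
print(byte_to_char_pos(raw, 5))         # 3: caret just before the "p"
print(char_to_byte_pos("éléphant", 3))  # 5
```

Decoding the prefix costs time proportional to the caret position. That is fine for occasional lookups but worth caching if positions are converted on every keystroke in a long transcript.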
- What is the internal Unicode encoding scheme of the STC control?
I don't know. Under different circumstances, we use UTF-8, Latin-1, and
KOI8-R with the wx.STC, and I've never had trouble that I'd attribute to
the wx.STC's internal encoding, so I've never looked into it.
- More Pythonic: how do I get the length in bytes of a
unicode string? u = u'abc\N{COMET}def' (\N{COMET} == \u2604)
len(u) == 7, but what is its length in bytes?
I'm not in a position where I need to know the length in actual characters.
When I use lengths, I need the length of the encoded string, so the value
the function returns works for me. We do all of our interacting with the
content of the wx.STC control using FindText() to determine a position.
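On the Python side of the question, a Unicode string only has a byte length once it is encoded, and that length depends on the encoding you choose. Here is a short Python 3 check using the same string:

```python
u = "abc\N{COMET}def"  # U+2604 COMET is a single character

print(len(u))                      # 7 characters
print(len(u.encode("utf-8")))      # 9: COMET takes 3 bytes in UTF-8
print(len(u.encode("utf-16-le")))  # 14: 2 bytes per BMP character, no BOM
print(len(u.encode("utf-16")))     # 16: plain "utf-16" prepends a 2-byte BOM
```

If the byte count is meant to match a byte-based position in the control, encode with whatever encoding the control actually uses internally. UTF-8 is only an assumption here.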
You are welcome to explore my wx.STC code at length if you want. Start at
http://www.transana.org, and go to the "Development" > "Source Code" link.
A lot of the wx.STC-specific code is in the RichTextEditCtrl.py and
TranscriptEditor.py files, which you can get from our SourceForge CVS.
Sorry I can't be more helpful.
David