from ansi to unicode, problem with stc

Jean-Michel Fauth wrote:

Hi,

I have downloaded the Unicode version of wxPython and
installed it on my two boxes (win98 and win2k). I
wanted to understand the issues psi has with the
Unicode version. I managed to narrow down the problem:
it lies in the stc control.

I created some test applications to test
the stc in the Unicode environment: caret position,
styling, text length and so on.

From that, I realised that the stc has some problems
when using chars with code > 127 (à, é, è...).

As far as I understand, the "properties" of the
stc control (caret position, text length) are not
based on the char as a glyph, but on the
underlying byte representation of the char.
Example: in a word like éléphant, the p has a stc
position of 4 and the length is 10.

Did I understand correctly, or did I make a mistake?

You understand correctly. In a Unicode build Scintilla uses UTF-8 for the document buffer, so characters can have various byte lengths. I think it would have made more sense and been easier to use UCS-2, or at least whatever wchar_t represents on the platform, but that would obviously greatly increase the amount of memory used by the document buffer. Unfortunately the current situation doesn't always map well conceptually to how wide characters are used in wxWidgets and wxPython.
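
To make the byte-versus-character difference concrete, here is a minimal sketch in plain Python 3. It makes no wxPython or Scintilla calls; the helper names char_to_byte_offset and byte_to_char_offset are made up for this illustration and are not part of any stc API. It only assumes, as described above, that positions are counted in UTF-8 bytes.

```python
# Illustration only: plain Python 3, no wxPython or Scintilla calls.
# Both helper names are hypothetical, invented for this example.

def char_to_byte_offset(text, char_index):
    """Return the UTF-8 byte offset at which character char_index starts."""
    return len(text[:char_index].encode("utf-8"))


def byte_to_char_offset(text, byte_offset):
    """Return the character index that corresponds to a UTF-8 byte offset.

    byte_offset must fall on a character boundary; if it lands in the
    middle of a multi-byte character, decode() raises UnicodeDecodeError.
    """
    return len(text.encode("utf-8")[:byte_offset].decode("utf-8"))


# "éléphant", written with escapes so the precomposed U+00E9 is used.
word = "\u00e9l\u00e9phant"

char_length = len(word)                  # 8 characters
byte_length = len(word.encode("utf-8"))  # 10 bytes: each "é" takes two bytes
```

Converting in both directions like this is one way to translate between Python string indices and byte-based positions, at the cost of re-encoding part of the text on every call.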

--
Robin Dunn
Software Craftsman
http://wxPython.org Java give you jitters? Relax with wxPython!