Problem with iewin.LoadString on loading utf-8 encoded pages

Hong Yuan wrote:

Since you are using a unicode build, wxPython converts the file data to a unicode object before passing it to LoadString, so the problem is probably in the automatic conversion or in how IE interprets what it is passed. Try decoding the utf-8 yourself and changing the Content-Type to match.

Thanks Robin,

I tried to narrow down the problem by writing just:

htmlctrl.LoadString(u'<html><body>\u554a</body></html>')

IE displayed:

JU

which seems to be the utf-16 encoded input string.
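
A quick check in plain Python is consistent with that reading. This is only a minimal sketch, written in Python 3 syntax and not involving wxPython or IE at all; the variable names are illustrative. It encodes the test character both ways and compares the raw bytes:

```python
# U+554A is the character used in the LoadString test above.
text = u'\u554a'

# Little-endian UTF-16, the byte order Windows uses for wide strings.
utf16_bytes = text.encode('utf-16-le')

# UTF-8, the encoding of the data coming from the database.
utf8_bytes = text.encode('utf-8')

# The two UTF-16 code-unit bytes are 0x4A and 0x55, the ASCII codes for 'J' and 'U'.
print(utf16_bytes)   # b'JU'
print(utf8_bytes)    # b'\xe5\x95\x8a'
```

So a reader that treats the UTF-16 bytes of that one character as single-byte text would show exactly "JU".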

Why does IE end up receiving a 'utf-16' encoded string? My Python system default encoding is 'utf-8'.

Because you have a unicode build of wxPython, the unicode string is being passed "as is" to IE. The Python default encoding is only used to convert strings to unicode in the unicode build of wxPython, or to convert unicode to a string in the ansi build of wxPython.

Probably what you need is a way to pass the utf-8 encoded string that you fetch from your database directly to IE, with no translation to or from unicode. Try using the LoadStream method instead. This appears to work for me in a PyShell (using my 2.5.4.1 workspace, I didn't try it with 2.5.3.1):

>>> import wx
>>> f = wx.Frame(None)
>>> import wx.lib.iewin as iewin
>>> i = iewin.IEHtmlWindow(f)
>>> f.Show()
>>> from cStringIO import StringIO
>>> s = '<html><meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8"><body>Hello: \xe5\x95\x8a </body></html>'
>>> stream = StringIO(s)
>>> i.LoadStream(stream)
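
If the page text starts out as a unicode object rather than as raw bytes, the same kind of byte stream can be prepared by encoding it explicitly first. The following is a minimal sketch in Python 3 syntax, where io.BytesIO plays the role that cStringIO.StringIO plays in the session above. The helper name make_utf8_page is hypothetical, and the LoadStream call is left commented out because it needs Windows and wxPython:

```python
import io

def make_utf8_page(body_text):
    """Wrap body_text in a minimal HTML page and return it as UTF-8 bytes.

    Hypothetical helper for illustration; not part of wx.lib.iewin.
    """
    page = (u'<html><meta HTTP-EQUIV="Content-Type" '
            u'CONTENT="text/html; charset=utf-8">'
            u'<body>' + body_text + u'</body></html>')
    # Encode explicitly so the bytes match the charset declared in the meta tag.
    return page.encode('utf-8')

page_bytes = make_utf8_page(u'Hello: \u554a')

# A file-like object over the raw bytes, suitable for a stream-based loader.
stream = io.BytesIO(page_bytes)

# i.LoadStream(stream)   # as in the PyShell session above
```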

--
Robin Dunn
Software Craftsman
http://wxPython.org Java give you jitters? Relax with wxPython!

Thanks Robin. I tried your method and found that it only works with the ansi build. In fact, with the ansi build my original problem with LoadString is also gone, since wxPython does not try to do any conversion automatically.

With the unicode version however, LoadStream as used in the code above leads to an application crash:

Error Signature:
AppName: python.exe AppVer: 0.0.0.0 ModName: unknown
ModVer: 0.0.0.0 Offset: 0096efc0

Apparently the behavior of lib.iewin in the unicode build needs some improvement. Isn't it a bug that LoadString passes a unicode object 'as is' to IE? Is it a problem in wxPython or in wxWidgets?

For now, can you suggest a more graceful workaround than first writing the string from the database into a temporary file and then using LoadUrl to load it?
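
For reference, the temporary-file workaround described here could look roughly like the sketch below. It is written in Python 3 syntax and only covers the file handling. The helper name write_temp_html and the sample page are hypothetical, and the LoadUrl call is left as a comment because it needs the IE control:

```python
import os
import tempfile
from pathlib import Path

def write_temp_html(page_bytes):
    """Write raw page bytes to a temporary .html file.

    Returns (path, url), where url is a file:// URL for the same file.
    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp(suffix='.html')
    # Binary mode: the utf-8 bytes reach the disk without any re-encoding.
    with os.fdopen(fd, 'wb') as f:
        f.write(page_bytes)
    return path, Path(path).as_uri()

# Stand-in for the utf-8 encoded page fetched from the database.
page_bytes = (b'<html><meta HTTP-EQUIV="Content-Type" '
              b'CONTENT="text/html; charset=utf-8">'
              b'<body>Hello: \xe5\x95\x8a </body></html>')

path, url = write_temp_html(page_bytes)

# htmlctrl.LoadUrl(url)   # the call mentioned above; not executed here
# The caller is responsible for removing the file (os.remove(path)) afterwards.
```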

--
HONG Yuan
Homemaster Trading Co., Ltd.
No. 601, Bldg. 41, 288 Shuangyang Rd. (N)
Shanghai 200433, P.R.C.
Tel: +86 21 55056553
Fax: +86 21 55067325
E-mail: hongyuan@homemaster.cn