How to convert HTML to wx.richtext's XML?

I have several hundred small HTML files. I want to convert these files to wx.richtext’s XML format.
I tried using the RichTextBuffer:

buf = wx.richtext.RichTextBuffer()
ok = buf.LoadFile('/tmp/q.html', type=wx.richtext.RICHTEXT_TYPE_HTML)
print(f'load q {ok}')
ok = buf.SaveFile('/tmp/q.xml', type=wx.richtext.RICHTEXT_TYPE_XML)
print(f'save q {ok}')
del buf
buf = wx.richtext.RichTextBuffer()
ok = buf.LoadFile('/tmp/d.html', type=wx.richtext.RICHTEXT_TYPE_HTML)
print(f'load d {ok}')
ok = buf.SaveFile('/tmp/d.xml', type=wx.richtext.RICHTEXT_TYPE_XML)
print(f'save d {ok}')

But in all cases it output False and didn’t load or save anything.

I know I could use one of Python’s HTML parsers, but don’t want to reinvent the wheel and surely this has been done?

Incidentally, the SaveFile docs just say “Not all handlers will implement file saving.” They really ought to specify which formats are loadable and which savable.

Is the wx richtext XML format documented somewhere? It seems like I’m going to have to write an html.HTMLParser subclass to get HTML into a RichTextBuffer and knowing the format’s specification would help a lot.

I’m pretty sure somebody has done this in the past (more than a few years ago, IIRC) but i haven’t had time to search for it yet. You might be able to dig it up with some googling.

I tried looking before both on google and in this list but came up empty. I’m amazed that reading HTML into wx.richtext.RichTextBuffer isn’t standard but maybe I’m alone in wanting that. Anyway, I’ll try to do it myself — fortunately for me my HTML files aren’t very fancy.

And now I’ve found the specification wx richtext specification.