A few more Windows locale-related remarks

Hello,
I’ve been away from wxPython for a few weeks and I see that I missed some interesting work on locale-related issues, including a new implementation of wx.App.InitLocale (#1702).
I would like to share some thoughts here, apologizing if I step in only when everything has been done; also, many things I’m about to say could be already well-known.

First of all, yes, Windows locale names are tricky in Python. Indeed, this is less Windows’ fault than Python’s: for instance, Windows (10) understands both hyphenated and underscored tags:

>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'en_US')
'en_US'
>>> locale.localeconv()
{'int_curr_symbol': 'USD', 'currency_symbol': '$', ...}
>>> locale.setlocale(locale.LC_ALL, 'en-GB')
'en-GB'
>>> locale.localeconv()
{'int_curr_symbol': 'GBP', 'currency_symbol': '£', ...}

Unfortunately, Python then hits a few bumps down the road:

>>> locale.getlocale()
ValueError: unknown locale: en-GB

But really, every locale-dependent operation will fail if Python hits an “unknown locale” (unknown to Python, that is!):

>>> import time
>>> time.strptime('12:00', '%H:%M')
ValueError: unknown locale: en-GB

This has always been the case with Python’s locale module on Windows. The main reference for understanding what’s going on is this long-running bug: https://bugs.python.org/issue37945. I would really recommend reading this thread, especially Eryk Sun’s brilliant explanations about locale names in Windows.
In short, the problem boils down to locale._parse_localename / locale.normalize being broken on Windows.
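
A minimal sketch of that breakage, using only the standard library, so it should not need Windows to reproduce. Note that locale._parse_localename is a private helper, so relying on it is an assumption about current CPython versions, and can_parse is just a hypothetical name I made up for illustration.

```python
import locale


def can_parse(tag):
    """Return True if Python's locale module can parse `tag` back."""
    try:
        # Private helper used by locale.getlocale(); it may change between versions.
        locale._parse_localename(tag)
        return True
    except ValueError:
        return False


# normalize() only recognises the underscored aliases in its internal table;
# any other tag is handed back unchanged, and parsing then fails.
for tag in ("en_GB", "en-GB"):
    print(tag, "->", locale.normalize(tag), "| parseable:", can_parse(tag))
```

The hyphenated tag is perfectly valid for the C runtime on Windows, but Python has no alias for it, which is where the “unknown locale” error comes from.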

For instance, consider Zobal’s problem when importing Pandas after the wx.App has been created: it is not really a wxPython issue! To get into this sort of trouble with Python on Windows you don’t need to get wxPython involved:

>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'en-GB')
'en-GB'
>>> import pandas
ValueError: unknown locale: en-GB

Apparently, importing Pandas involves some locale checking… and that’s all it takes for Python to go down in flames if the tag is “unknown”. (And yes, Pandas could well do a little damage recovery here… I would file an issue on their bug tracker as well.)

My second remark is about what has changed in Python 3.8. The short answer is: very little has changed. I addressed this topic here: https://bugs.python.org/issue38805#msg373896. In short, bpo-34485 now sets locale.LC_CTYPE at startup on Windows. As a side effect, locale.getlocale() now always returns a locale, even before one was explicitly set with locale.setlocale(). This is because the casual call locale.getlocale() with no arguments happens to default to locale.getlocale(category=LC_CTYPE). But if you query any other category, you will see that nothing else has changed.
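
To check this on your own machine, you can query each category separately. The sketch below relies on locale.setlocale(category) called without a locale argument, which only queries the current setting. The name category_report is a hypothetical helper, not part of any library.

```python
import locale

CATEGORIES = ("LC_CTYPE", "LC_COLLATE", "LC_TIME", "LC_MONETARY", "LC_NUMERIC")


def category_report():
    """Map each category name to the raw locale string currently set for it."""
    # setlocale() without a locale argument queries; it does not modify anything.
    return {name: locale.setlocale(getattr(locale, name)) for name in CATEGORIES}


for name, value in category_report().items():
    print(f"{name:12} {value}")
```

Run it in a fresh interpreter, before any setlocale call. If my reading above is right, on Windows with Python 3.8 only LC_CTYPE should differ from 'C'.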

However, all of this has little, if anything, to do with wxPython itself. To be clear, this simple code:

>>> import time, wx
>>> app = wx.App()
>>> time.strptime('12:00', '%H:%M')  # I live in Italy, btw
ValueError: unknown locale: it-IT

will fail with both Python 3.7 and 3.8 on Windows (under the current wx.App.InitLocale, of course).

As a result, I find the new “Possible locale Mismatch on Windows” section to be misleading in its first part.

Moving to wxWidgets, it must be said that it has its fair share of long-standing issues: see https://trac.wxwidgets.org/ticket/11594, where we learn that apparently wxWidgets does the Windows locale queries the wrong way.

Finally we have wxPython, where a (dubious) query made by wxWidgets yields a locale tag that cannot be parsed back by the (faulty) Python locale machinery.
Ideally, one should never use Python for locale-aware operations once the wx.App has been created. For instance, use wx.DateTime and not Python datetime, and so on. Of course, this is not always possible: sometimes you have to, say, import Pandas (!).

Now, trying to fix these errors at wxPython’s level is a rather quixotic undertaking. The mitigation effort in wx.App.InitLocale is welcome of course, but it must be clear that it is only a partial fix.
TBH, I’m not a fan of wx.App.InitLocale: it seems to me that wxPython is taking on problems that are not its own; furthermore, the “fix” stays hidden within the implementation details of the wx.App, as a rather invasive side-effect. I would prefer an explicitly opt-in solution, if any.

That said, under the new proposed wx.App.InitLocale implementation, the sequence

>>> app = wx.App()
>>> time.strptime('12:00', '%H:%M')

now works as expected, and it is indeed a welcome mitigation.
Of course, one should be aware that the “Python/wxWidgets locale war” is still raging: if we change the locale at runtime, we are back to square one:

>>> import wx, locale, time
>>> app = wx.App()
>>> locale.getlocale()  # I'm from Italy
('it_IT', 'ISO8859-1')
>>> wxloc = wx.Locale(wx.LANGUAGE_ENGLISH)
>>> locale.getlocale()
ValueError: unknown locale: en-GB
>>> time.strptime('12:00', '%H:%M')
ValueError: unknown locale: en-GB

Furthermore, the new implementation queries the default locale with locale.getdefaultlocale, which is a safer choice I guess, but it will also erase any previous locale set by Python:

>>> import wx, locale
>>> locale.setlocale(locale.LC_ALL, 'en_US')
'en_US'
>>> locale.localeconv()
{'int_curr_symbol': 'USD', 'currency_symbol': '$', ...}
>>> app = wx.App()
>>> locale.localeconv()
{'int_curr_symbol': 'EUR', 'currency_symbol': '€', ...}

Of course one could well argue that there is little point in trying to account for any previously set locale on Windows anyway. Most of the time, this locale will be just the default locale, being previously set with the common

>>> locale.setlocale(locale.LC_ALL, '')
'Italian_Italy.1252'

and this, on Windows, sets the locale name to a long string that wxPython has no way to understand:

>>> wx.Locale.FindLanguageInfo('Italian_Italy') # -> 'None', of course!

That said, the current implementation at least makes an effort to honour a possible current locale, which I guess could pay off on non-Windows systems.

Also, there is no way to leave the locale unset - except by overriding wx.App.InitLocale, of course. I don’t know how I feel about this: to me, setting a locale should always be a conscious choice.
Even worse, the new implementation indeed leaves the locale unset - on any non-Windows platform! This is a cross-platform compatibility issue and can’t be right…

Finally, as I’ve already said, I’m not really convinced that this code should be hidden in the wx.App startup machinery. If we must undertake an effort of keeping wx.Locale and Python locale in sync, then I would prefer to have a wxPython-only subclass (say, wx.PyLocale) instead, and move there the needed logic. This way you could use wx.PyLocale explicitly (and hope for the best) if you need to keep the locales in sync. As an added bonus, you would re-sync the locales every time a new wx.PyLocale is instantiated, thus allowing changing the locale at runtime.
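
To make the idea a bit more concrete, here is a rough sketch of the syncing logic alone, with no wx involved. Everything in it is an assumption on my part: sync_python_locale is a hypothetical name, and "can be set and then parsed back by locale.getlocale()" is just one possible definition of "in sync".

```python
import locale


def sync_python_locale(tag, category=locale.LC_ALL):
    """Try to set Python's locale from a wx-style tag.

    Returns the name that was actually set, or None if no candidate
    could be both set and parsed back by Python.
    """
    previous = locale.setlocale(category)  # query only, nothing changes yet
    # Prefer the underscored form, which Python's alias table knows about.
    candidates = list(dict.fromkeys([tag.replace("-", "_"), tag]))
    for candidate in candidates:
        try:
            locale.setlocale(category, candidate)
            locale.getlocale()  # raises ValueError on an "unknown locale"
            return candidate
        except (locale.Error, ValueError):
            continue
    # Nothing worked: put back whatever was set before.
    locale.setlocale(category, previous)
    return None
```

A hypothetical wx.PyLocale could then call something like sync_python_locale(self.GetCanonicalName()) in its constructor, and re-run it whenever a new instance is created.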

And that’s all I’m able to say on the topic at the moment, I’m afraid. I hope I can find the time to come up with something more concrete, but the truth is, there is no silver bullet here.
Best,
riccardo

Thanks for all the research and details @ricpol.