Update on the Wiki

Hi all,

As I'm sure almost everybody has noticed by now, the wxPyWiki has been down for a few weeks, with just a message about spammer attacks in its place. Even though editing had been disabled for a few months, the spambots were not giving up on trying to add new pages. Between that, a couple of rogue spiders, and normal usage, the CPU and comms load that MoinMoin was putting on the system was enough for the hosting provider to notice and to threaten to shut down the account because of the disruption it was causing to the other servers on the same node.

I've moved things to a different hosting account with more RAM, allotted bandwidth, etc., and I finally got things ready to try and do something about the wiki. I spent much of the weekend working on porting the content to MediaWiki, since it has some better security options and extensions (like requiring a confirmed email address to edit, integrating the new reCAPTCHA tool from Google, etc.). Unfortunately the porting attempt was a dismal failure. :frowning: All the conversion tools I was able to find were old and unmaintained, and MediaWiki has moved on enough since they were written that even the best of the tools was no longer fully compatible with MW. I was able to get the pages created, but there were problems with almost all of them. Almost every page would need to be edited by hand to fix markup issues, fix all links to images and attachments, and copy off-site images into attachments and fix their links. Some converted pages would need to be tossed out and recreated because the conversion script lost some of the content, and every attachment would need to be uploaded into the wiki by hand. Blech!

So, instead, I'm going to try reinstating the MoinMoin wiki with a few changes and see how that goes. The changes are:

1. I removed all user accounts. You'll need to create a new account if you want to subscribe to page changes or to edit pages.

2. To edit pages you will need to be added to the TrustedEditorsGroup. This is to help keep the bots out. You will need to contact somebody already in the TrustedEditorsGroup and ask them to add you to the group (which just entails adding your user ID to the TrustedEditorsGroup page).

3. I lowered the limits on the surge protection feature a little bit. So if any one user or IP address is accessing too many pages per minute then they'll get a warning, and then be shut out if it happens too often.
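
The surge protection idea in item 3 can be illustrated with a small sliding-window counter. This is only a sketch of the general technique, not MoinMoin's actual implementation: the `SurgeGuard` class, its parameters, and the "ok"/"warn"/"locked" results are all names made up for illustration, and in this sketch a lockout never expires.

```python
import time
from collections import defaultdict, deque


class SurgeGuard:
    """Sliding-window request counter keyed by user name or IP address.

    Hypothetical illustration only; not taken from MoinMoin.
    """

    def __init__(self, max_requests, window_secs, max_warnings,
                 clock=time.monotonic):
        self.max_requests = max_requests  # requests allowed per window
        self.window_secs = window_secs    # length of the window in seconds
        self.max_warnings = max_warnings  # warnings tolerated before lockout
        self.clock = clock                # injectable so tests can fake time
        self.hits = defaultdict(deque)    # key -> timestamps of recent requests
        self.warnings = defaultdict(int)  # key -> warnings issued so far

    def check(self, key):
        """Record one request from `key` and return "ok", "warn" or "locked"."""
        if self.warnings[key] > self.max_warnings:
            return "locked"
        now = self.clock()
        recent = self.hits[key]
        # Forget requests that have fallen out of the window.
        while recent and now - recent[0] >= self.window_secs:
            recent.popleft()
        recent.append(now)
        if len(recent) > self.max_requests:
            self.warnings[key] += 1
            recent.clear()  # start a fresh window after each warning
            if self.warnings[key] > self.max_warnings:
                return "locked"
            return "warn"
        return "ok"
```

Lowering the limits, as described above, would correspond to a smaller `max_requests` or a longer `window_secs` in this sketch.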

As of now the wiki.wxpython.org site is live again, but I'll be keeping an eye on it for a while. If the stupid bots and spiders are still being stupid and the resource utilization shoots up too high then I'll need to take it down again and we'll need to figure out some other approach. Maybe we can have an online fix-the-wiki group sprint on some Saturday to go through the MediaWiki copy of the content and fix it up as described above.

If anybody has been involved with a successful MoinMoin --> MediaWiki conversion (or MoinMoin --> anything else), please let me know how it was done.

--
Robin Dunn
Software Craftsman
http://wxPython.org

Dear Sir, please help me,

I have a grid which uses gridlib.PyGridTableBase as data source. I want to merge some cells with the method below, so that cell (1,10) will span across 5 columns and 1 row:

self.SetCellSize(1, 10, 1, 5)

But when I call this method, the grid gets messed up and becomes unresponsive. Please help.

On Tuesday, January 13, 2015 at 5:49:07 AM UTC+2, Robin Dunn wrote:

[...]

Hi Steve,

You should not 'hijack' a thread. :wink:

Werner

Robin Dunn wrote:

As of now the wiki.wxpython.org site is live again, but I'll be keeping
an eye on it for a while. If the stupid bots and spiders are still being
stupid and the resource utilization shoots up too high then I'll need to
take it down again and we'll need to figure out some other approach.

Bad news, good news, better news, more better news...

Bad news: There were almost 900 new user accounts added to the wiki in about 24 hours. I'm guessing that at least 99.5% were bots. (There were even 25 of them in the few minutes the site was live before I announced it on these mail lists!)

Good news: Although they did create accounts for themselves, none of them were able to add or modify pages. However, I have a feeling that unrestrained adding of accounts could possibly be the root of the resources problem we had before, as there were tens of thousands of accounts in that instance of MoinMoin. (It was something like 60k IIRC.)

Better news: Even with all that bot activity the resource consumption on the server was never even close to problem levels.

More better news: I decided to take a crack at adding reCAPTCHA support to MoinMoin myself, and it turned out to be rather easy to do!

So I am going to wipe out the user accounts again, say I'm sorry to the half a percent of you who already created new accounts and are not bots, and turn on reCAPTCHAs for everything that had a TextCha before. It may be that, after seeing how things go for a while, we can relax that to just the new account page, and maybe also remove the requirement of being a member of the TrustedEditorsGroup. (Fingers crossed!)
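
For readers curious what the server side of a reCAPTCHA check involves, here is a minimal sketch. It is not the MoinMoin patch described above (which is not shown in this thread) and is written for Python 3. It assumes the reCAPTCHA v2 flow, where the browser submits a token in the `g-recaptcha-response` form field and the server confirms it against Google's `siteverify` endpoint. The function names and the injectable `post` parameter are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def _http_post(url, fields):
    """POST form-encoded fields and return the decoded JSON reply."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    with urllib.request.urlopen(url, data=data, timeout=10) as reply:
        return json.loads(reply.read().decode("utf-8"))


def recaptcha_passed(secret_key, client_token, remote_ip=None, post=_http_post):
    """Return True only if the verification service confirms the token.

    `client_token` is the value the browser submitted in the
    `g-recaptcha-response` form field. `post` is a hypothetical hook
    that lets the HTTP call be replaced, e.g. in tests.
    """
    if not client_token:
        return False  # form was submitted without solving the challenge
    fields = {"secret": secret_key, "response": client_token}
    if remote_ip:
        fields["remoteip"] = remote_ip  # optional extra signal
    try:
        result = post(VERIFY_URL, fields)
    except (OSError, ValueError):
        return False  # network or JSON failure: treat as not verified
    return bool(result.get("success"))
```

A wiki action handler would call something like `recaptcha_passed(secret, form_value, client_ip)` before accepting a new account or an edit, and reject the request when it returns False.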

--
Robin Dunn
Software Craftsman

Wow. I usually never quite realize how pervasive bots are at this point.
What do they want? To modify the pages to be advertisements?

Thanks for doing this and here's hoping the reCAPTCHA will keep them at bay
(I keep imagining the sentinels from the *Matrix* movies on their way...).

On Wed, Jan 14, 2015 at 1:41 AM, Robin Dunn <robin@alldunn.com> wrote:

[...]

C M wrote:

Wow. I usually never quite realize how pervasive bots are at this point.

Yeah, it's a real PITA. We had tons of troubles with spammers on the wxWidgets Trac too. Some of the available automatic countermeasures for Trac helped, but nothing we tried stopped all of it. Currently we're requiring human moderation for all tickets and comments from users who are not in the developers group.

What do they want? To modify the pages to be advertisements?

Sorta. IIUC they don't really care if the pages are ever read by a human, just that the keywords and links to their sites are seen by Google, Bing and other search engines. The more links in different domains that there are to a particular site, the higher it will rank in search results.

And it's not just the things one would normally think of as being distributed via spam, like porn or anatomy enlargement miracle drugs. In the past few days while watching the logs I've seen attempts to create pages for sports medicine facilities, anti-microbial coatings for kitchen appliances, polycystic ovarian disorder and other cancer-related topics, music venues in Lubbock, Texas, outdoor pool furniture, weight loss, pet products, weddings, home improvement, etc. Plus a bunch that were non-English. I'm guessing that when companies or consultants offer you search engine optimization (SEO) services, this is how at least the less reputable and/or lazy ones do it.

Thanks for doing this and here's hoping the reCAPTCHA will keep them at
bay (I keep imagining the sentinels from the *Matrix* movies on their
way...).

It's funny you mention that. In Google's intro video about reCAPTCHA their character representing the bots reminded me of the sentinels too, although they don't actually look like them. https://www.youtube.com/watch?v=jwslDn3ImM0

--
Robin Dunn
Software Craftsman

Thanks for doing this Robin!

The benefit of adding reCAPTCHA to MoinMoin is that spammers won’t be expecting it. I bet their current systems are designed to work with standard MoinMoin installations. It should keep them away at least for a while. :wink:

-Haris

On Jan 14, 2015, at 8:44 AM, Robin Dunn <robin@alldunn.com> wrote:

[...]


Hi Robin,

A pity that these spammers don't find more productive hobbies!

I created my login. If I can be of any help with this, please let me know, e.g. to help maintain the TrustedEditorsGroup.

Werner