any way to speed up Tree.ExpandAll()?

Bryan_Oakley · May 10, 2011, 6:59pm

I have a wx.TreeCtrl that I'm loading up with about 8 megs worth of
hierarchical data, roughly 200,000 tree items. This loads up in a
respectable time -- maybe 5 seconds -- but if I call the ExpandAll
method the time is measured in double-digit _minutes_. Is the TreeCtrl
the most efficient widget there is when it comes to large datasets? I
want the ability to see all the data at once, so a virtual tree won't
do me any good.

Tim_Roberts · May 10, 2011, 7:09pm

That's clearly not true. You can't see 200,000 items at once. You
don't have enough pixels. You can see about 100 items at once. The
advantage of a virtual tree is that your code can decide how to look the
data up efficiently, instead of relying on the tree's internal data
structure.

--
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.

Bryan_Oakley · May 10, 2011, 7:15pm

OK, I stand corrected. What I meant to say was that I wanted to be
able to scroll through all 200,000 nodes quickly. I was assuming that
leveraging the tree's (C-based?) data structure would be faster than
anything I could do with a Python data structure. Perhaps that's a bad
assumption.

Che_M · May 10, 2011, 7:33pm

200,000 expanded nodes? Is your user Lieutenant Commander Data? Even
with the idea of scrolling through that quickly, if you calculate it
out to, say, scrolling the whole set in 20 seconds, that is 10,000 nodes
*per second*. Even if you consider 200 seconds (a little over three
minutes) "quick" for scrolling through it, that is still 1000
nodes/second. To scroll through it at a rate at which a non-Data user
could actually read it would take hours.
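
The arithmetic above can be checked with a few lines of Python. The reading rate of 5 nodes per second is an assumption made for illustration, not a figure from this thread.

```python
TOTAL_NODES = 200_000


def nodes_per_second(total_nodes, seconds):
    """Scroll rate needed to cover total_nodes in the given time."""
    return total_nodes / seconds


def hours_to_read(total_nodes, readable_rate):
    """Hours needed to scroll total_nodes at readable_rate nodes/second."""
    return total_nodes / readable_rate / 3600


print(nodes_per_second(TOTAL_NODES, 20))    # 10000.0
print(nodes_per_second(TOTAL_NODES, 200))   # 1000.0
# ASSUMPTION: a person can skim roughly 5 nodes per second.
print(round(hours_to_read(TOTAL_NODES, 5), 1))  # 11.1
```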

Unless I am really misunderstanding something, this just can't be the
best choice for displaying such a mongo data set. Depending on what
the goal is, there must be better ways. Is there some obvious manner
in which the 200,000 items should be grouped?

Che

Marc_Tompkins · May 10, 2011, 7:46pm

We tend to assume that anything we write in Python is automatically going to be orders of magnitude slower than C… but it ain’t necessarily so. In the standard interpreter, Python’s built-in containers (lists, dicts) are themselves implemented in C, so unless you make a shambolic mess of your structure (or use XML - the horror, the horror!), your performance is likely to be pretty good - certainly as compared to the creeping paralysis you describe.

I’m going to go out on a limb here and hypothesize that the reason ExpandAll is so dam’ slow is that the control is keeping track of GUI information - x/y position, visibility, etc. - for each node in addition to whatever data you yourself are storing there.* By contrast, a virtual tree control acts as a “window” onto your underlying dataset, only loading/displaying the part that’s supposed to be visible at the moment; I suspect that you’ll find it actually does exactly what you want, and that keeping the data and GUI separate will give you a significant boost. On the other hand, using a virtual control can require a bit of a rethink of your code.

Just my $.02…

* If I’m wrong about that, I’m sure someone will tell me immediately.
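
The "window onto the dataset" idea can be sketched in plain Python, independent of any widget. This is only an illustration under assumptions: the `(label, children)` tuple layout and the function names are hypothetical, and a real virtual control would use its own callbacks.

```python
from itertools import islice


def iter_rows(node, depth=0):
    """Lazily yield (depth, label) for every node in display order."""
    label, children = node
    yield depth, label
    for child in children:
        yield from iter_rows(child, depth + 1)


def visible_rows(root, first_row, rows_on_screen):
    """Materialize only the rows that would currently be on screen."""
    return list(islice(iter_rows(root), first_row, first_row + rows_on_screen))


# Hypothetical miniature data set: (label, [children])
tree = ("root", [("a", [("a1", [])]), ("b", [])])
print(visible_rows(tree, 1, 2))  # [(1, 'a'), (2, 'a1')]
```

Note that `islice` still walks past the rows before `first_row`, so for very large trees you would probably want to cache subtree sizes so that the walk can skip whole branches.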

Bryan_Oakley · May 10, 2011, 8:15pm

Well, the nodes _are_ grouped, which is why I'm using a tree. It's
hierarchical XML data that represents a semi-complex data model.
Normally there will be a few top-most nodes, each with a few to a few
thousand children, but each of those could have dozens of children,
and some of those may have dozens of children. It adds up quickly.

And, no, this isn't the best way to deal with it. As things go, I
don't have the resources to solve the problem "right" (it's an
internal tool, so good enough is Good Enough). We've found that the XML
editors we have access to fall down with so much data (I'm looking at
you, Eclipse), so I'm trying to find a quick and easy solution. My
thought was to use a TreeCtrl so that I could deal with just the raw
data (up to maybe 15 megs) rather than the XML data with all the tags
(up to 40-something megs).

I was hoping I could throw the data in the tree, and give the users a
search box to whittle down the dataset to find what they are looking
for. I know, though, that users will want to expand everything and
scroll through looking for patterns in the data.
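
The search-box idea could work directly on the parsed XML rather than on the tree widget. The following is a minimal sketch, assuming the data fits in memory as an ElementTree; `find_matches` and the sample document are hypothetical, not part of the actual tool.

```python
import xml.etree.ElementTree as ET


def find_matches(root, needle):
    """Return the tag path of every element whose text contains needle."""
    needle = needle.lower()
    hits = []

    def walk(elem, path):
        here = path + [elem.tag]
        # elem.text is None for elements without text content
        if needle in (elem.text or "").lower():
            hits.append("/".join(here))
        for child in elem:
            walk(child, here)

    walk(root, [])
    return hits


# Hypothetical sample document
SAMPLE = (
    "<model><group><item>alpha</item><item>beta</item></group>"
    "<note>Alpha again</note></model>"
)
print(find_matches(ET.fromstring(SAMPLE), "alpha"))
# ['model/group/item', 'model/note']
```

Only the matching branches would then need to be added to the TreeCtrl, which keeps the number of live tree items small.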

I'm still formulating a plan of attack, and dumping everything into a
TreeCtrl was a shortcut I was hoping would work. I was pleasantly
surprised to see I could load up the data quickly, and manually
manipulating nodes worked fine. It was just such a shock to see
something that seems as simple as expanding all nodes took a
completely unacceptable amount of time to perform.

Robin · May 10, 2011, 11:29pm

Plus the sending and processing of many thousands of events for node expansions, sizing, scrolling, etc.

On 5/10/11 12:46 PM, Marc Tompkins wrote:

I'm going to go out on a limb here and hypothesize that the reason
ExpandAll is so dam' slow is that the control is keeping track of GUI
information - x/y position, visibility, etc. - for each node in addition
to whatever data you yourself are storing there*.

--
Robin Dunn
Software Craftsman

Robin · May 10, 2011, 11:31pm

"Virtual" mode in a wx.TreeCtrl basically consists of delaying the addition of child nodes until the user expands the parent, and (optionally) removing the children when the parent is collapsed. So that doesn't buy you much if the user wants to see the whole tree fully expanded as you'll still have to have all of it in memory and the widget will still have to track all of the item information as mentioned before.

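The delayed-loading scheme described above might look roughly like this. It is a sketch under assumptions: `LazyTreeLoader` and the nested-dict model are hypothetical, and the `SetItemData`/`GetItemData` names follow current wxPython (older releases used `SetPyData`/`GetPyData`). The class only needs an object with the TreeCtrl methods it calls, so the logic can be exercised without a running GUI.

```python
class LazyTreeLoader:
    """Add child items only when a parent is expanded; drop them on collapse."""

    def __init__(self, tree, root_label, model):
        # model: nested dict mapping label -> dict of children ({} = leaf)
        self.tree = tree
        self._register(tree.AddRoot(root_label), model)

    def _register(self, item, children):
        # Remember the not-yet-loaded children on the item itself.
        self.tree.SetItemData(item, children)
        # Show the expander button without creating any child items yet.
        self.tree.SetItemHasChildren(item, bool(children))

    def on_expanding(self, event):
        item = event.GetItem()
        if self.tree.GetChildrenCount(item, False) == 0:
            for label, sub in self.tree.GetItemData(item).items():
                self._register(self.tree.AppendItem(item, label), sub)

    def on_collapsed(self, event):
        item = event.GetItem()
        self.tree.DeleteChildren(item)
        # Keep the expander so the item can be expanded again later.
        self.tree.SetItemHasChildren(item, bool(self.tree.GetItemData(item)))


# Wiring it to a real control (requires wxPython and a running app):
#   loader = LazyTreeLoader(tree, "root", model)
#   tree.Bind(wx.EVT_TREE_ITEM_EXPANDING, loader.on_expanding)
#   tree.Bind(wx.EVT_TREE_ITEM_COLLAPSED, loader.on_collapsed)
```

As noted above, this does not help with a full expand-all: every item still ends up in the control once everything is open.
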
A good general rule of thumb is that, if you can avoid it, do not load more than a "reasonable" number of items into any of the native widgets, because in most cases they are designed (either from a UI perspective, a data perspective, or both) to deal with smaller, bite-sized numbers of items.

The new DataViewCtrl in 2.9 may be a better solution for you in some ways. It can show either tabular data (like wx.ListCtrl in report mode) or hierarchical data (like a Tree[List]Ctrl), and in both modes you don't pre-load any data into it; instead it asks you for the data as needed for showing on screen. Hierarchical data relationships are a bit trickier in full virtual mode like that, but it is doable.

But there may be an even better answer for you. You mention that your users may want to do an expand-all in order to "look for patterns" in the data. Depending on the nature of the data there will probably be a not too difficult way to represent it graphically (such as a heat map) that would allow your users to fly over it at the 100,000 foot level instead of trying to find patterns while flying over 37 miles worth of data at the 100 foot level. You could allow them to progressively zoom in and see more details, until they get to the point where a wx.TreeCtrl is practical.
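
As a starting point for such an overview, one could aggregate the XML before drawing anything. The sketch below only computes per-branch element counts; `subtree_sizes` and the sample document are hypothetical, and turning the numbers into a heat map or chart is left open.

```python
import xml.etree.ElementTree as ET


def subtree_sizes(root):
    """Return (tag, element_count) for each top-level child of root."""
    # child.iter() yields the child itself plus all of its descendants
    return [(child.tag, sum(1 for _ in child.iter())) for child in root]


# Hypothetical sample document
SAMPLE = "<model><a><x/><x/><x><y/></x></a><b><x/></b></model>"
sizes = subtree_sizes(ET.fromstring(SAMPLE))
print(sizes)  # [('a', 5), ('b', 2)]
```

The same function can be applied again to any branch the user zooms into, which matches the progressive zoom described above.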

Thomas_Zehbe · May 11, 2011, 7:09am

Maybe the "Flicker-Free Drawing" page on the WxWiki could help. Or try calling Hide() before expanding all, and Show() afterwards.
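
The Hide()/Show() suggestion could be wrapped so that the control is always shown again, even if expanding raises. This is a sketch only; `hidden` and `expand_all_hidden` are hypothetical helper names, and whether this actually shortens ExpandAll for 200,000 items has not been measured here.

```python
from contextlib import contextmanager


@contextmanager
def hidden(window):
    """Hide a window for the duration of a block, then show it again."""
    window.Hide()
    try:
        yield window
    finally:
        window.Show()


def expand_all_hidden(tree):
    """Expand every item while the control is not visible on screen."""
    with hidden(tree):
        tree.ExpandAll()
```

wx.Window also provides Freeze() and Thaw(), which suppress repainting in a similar bracketed fashion; that alternative is an addition here, not something proposed in the thread.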

Regards,

Thomas

--
Thomas Zehbe
Dipl.-Ing
