String pooling

One way to achieve this would be to build a dict with each line as the key. The value can be anything, maybe the number of
instances of that line.
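A minimal sketch of the dict approach, written for Python 3. It assumes the log buffer must keep lines in their original order. The dict keeps one copy of each distinct line, the buffer holds references to those copies, and a second dict records the per-line count suggested above. The name `pool_lines` is hypothetical.

```python
def pool_lines(lines):
    """Return (buffer, counts) where equal lines share one string object."""
    pool = {}    # each distinct line maps to its canonical (first-seen) copy
    counts = {}  # number of times each distinct line occurred
    buffer = []  # the log buffer, in original order
    for line in lines:
        canonical = pool.setdefault(line, line)
        counts[canonical] = counts.get(canonical, 0) + 1
        buffer.append(canonical)
    return buffer, counts
```

Later duplicates are not stored in the buffer, so they can be freed once the caller drops them. The text therefore grows with the number of distinct lines, but the buffer still costs one reference per line.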

Another thing to try is to 'intern' each string. That avoids storing the same string in different objects. Make sure that the
first time each line is created you use

   line = intern(<string>).
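Here is the same idea using intern(), as a sketch. In Python 2, the version discussed in this thread, intern() is a builtin. In Python 3 it lives at `sys.intern`, which the code below uses. `read_interned` is a hypothetical helper name.

```python
import io
import sys

def read_interned(stream):
    """Read every line from a text stream, interning each as it is created."""
    # sys.intern returns the existing interned copy when an equal string is
    # already interned, so duplicate lines collapse to a single object.
    return [sys.intern(line) for line in stream]

buffer = read_interned(io.StringIO("start\nok\nok\nstop\n"))
```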

/Jean Brouwers

I tried this using intern() and performance dropped through the floor,
as well as memory usage soaring (more than storing everything
individually), I have no idea why. I hadn't thought of the dict
implementation, I'll give that a try.

Thanks a lot.

On Wed, 11 Aug 2004 21:13:12 -0500, Jean Brouwers <jbrouwers@prophicy.com> wrote:

Chris Mellon wrote:

> I have an application that needs to store a log buffer of up to hundreds
> of thousands of lines, many of which may be identical. Does anyone
> know of any implementations of string pooling I might be able to use
> to cut the memory footprint for this down to something reasonable?

>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: wxPython-users-unsubscribe@lists.wxwidgets.org
> For additional commands, e-mail: wxPython-users-help@lists.wxwidgets.org
>

Which platform is this? That may be a whole different issue.

We saw the opposite: performance improved and memory usage decreased after just interning frequently occurring strings. In
our case, we read a 200+ MB log file into memory as one large string and then slice smaller strings from that. The sliced
strings are all interned. This is Python 2.3 on RedHat Linux 8.0.
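The next block sketches the approach described above. It assumes UTF-8 log text and Python 3 (`sys.intern`). The whole file is read as one string, each line is sliced out, and every slice is interned. The helper names `intern_slices` and `load_log` are hypothetical.

```python
import sys

def intern_slices(text):
    """Slice one large string into lines, interning each slice.

    Equal lines end up sharing a single string object; the returned list
    still holds one reference per line.
    """
    lines = []
    start = 0
    while start < len(text):
        end = text.find("\n", start)
        end = len(text) if end == -1 else end + 1  # keep the trailing "\n"
        lines.append(sys.intern(text[start:end]))
        start = end
    return lines


def load_log(path):
    """Read a whole log file into memory as one string, then slice it."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return intern_slices(f.read())
```

In CPython each slice is a copy. Once `load_log` returns, nothing references the large string, so it can be freed, and only the distinct lines and the list of references remain.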

I was using 2.3 on WinXP. I didn't look very far into what was
happening, just assumed intern() wasn't suited to what I was doing and
moved on. I'll take a closer look. Thanks again.
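One way to take a closer look is to measure both variants directly. The sketch below uses Python 3's `tracemalloc` module. The synthetic workload (100,000 lines drawn from 10 distinct values) and the helper names are assumptions. Substitute real log lines to get meaningful numbers for a given platform.

```python
import sys
import tracemalloc

def build_buffer(n, distinct, use_intern):
    """Build a buffer of n lines whose values repeat every `distinct` lines."""
    buffer = []
    for i in range(n):
        line = "log line %d\n" % (i % distinct)  # a fresh string object each time
        buffer.append(sys.intern(line) if use_intern else line)
    return buffer

def peak_bytes(use_intern, n=100_000, distinct=10):
    """Peak memory traced while building the buffer, in bytes."""
    tracemalloc.start()
    build_buffer(n, distinct, use_intern)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

if __name__ == "__main__":
    print("plain:   ", peak_bytes(use_intern=False))
    print("interned:", peak_bytes(use_intern=True))
```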

On Wed, 11 Aug 2004 21:43:38 -0500, Jean Brouwers <jbrouwers@prophicy.com> wrote:

/Jean Brouwers


Make sure you’ve read the documentation on
intern() and that you are using it correctly. You need to call intern()
each time you want to access the string, of course, since that’s the only
way Python knows to do the lookup. If you do something like

   s = intern('mystring')
   x = 'mystring'

the second reference won’t refer to the interned
instance of the string. Perhaps some of your code is doing this sort
of thing. For later (i.e., non-initial) references you need
to do

   x = intern('mystring')

since the intern() function either returns
the reference to the already interned string or creates and returns the
reference if the string has not already been interned.
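Here is a small Python 3 sketch of the point above, using `sys.intern`, where Python 3 keeps the old builtin intern(). The strings are built at run time on purpose. CPython typically interns identifier-like string literals such as 'mystring' on its own at compile time, which can make an experiment that uses only literals misleading. The helper `build` is hypothetical.

```python
import sys

def build():
    # Build the value at run time so the compiler cannot share a constant.
    return "".join(["my", "string"])

s = sys.intern(build())  # first reference: interns the string
x = build()              # equal value, but usually a separate object
y = sys.intern(build())  # later reference: returns the interned s

print(x == s)  # True: equal values
print(y is s)  # True: the same object
```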

Also, you may want to be careful about exactly
which string you are interning so that you know when to call intern() and
when not to. Otherwise, you may be interning all your strings,
and you might not want to be doing that. Note that interned strings
are not garbage collected.
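Selective interning can be as simple as putting a predicate in front of the intern call. The predicate in this sketch (intern only lines of at most 80 characters) is a hypothetical policy. Choose one that matches the strings that actually recur in your data. Note also that the Python documentation for intern() states that since Python 2.3, interned strings are no longer immortal. To benefit from interning, you must keep a reference to the value intern() returns.

```python
import sys

def make_pooler(should_intern):
    """Return a function that interns only strings selected by should_intern."""
    def pool(s):
        return sys.intern(s) if should_intern(s) else s
    return pool

def is_short(s):
    # Hypothetical policy: short lines tend to be repetitive status messages,
    # while long lines (e.g. tracebacks) are more often unique.
    return len(s) <= 80

pool = make_pooler(is_short)
```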

I’ve used interning very effectively. If
it’s not working well for you, I suspect that something isn’t being done
quite correctly. But there are circumstances in which it won’t be
the best approach. Fundamentally, if you have a set of strings that
you know you’ll be accessing repeatedly over time and that you won’t want
to be deleting or otherwise munging, interning should be a win. Otherwise,
maybe not.

The suggested dict implementation is just
your own implementation of intern() and consequently should be much slower.
But it may expose what your problem is with intern().
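Here is a sketch of such a hand-rolled intern() in Python 3. The class name and its hit/miss counters are hypothetical additions for diagnosis. A low hit count means few lines actually repeat, in which case pooling of any kind will not save much memory. Unlike `sys.intern`, this pool keeps every distinct string alive for as long as the `StringPool` object exists.

```python
class StringPool:
    """A hand-rolled intern(): one canonical copy per distinct string."""

    def __init__(self):
        self._pool = {}   # maps each distinct string to its canonical copy
        self.hits = 0     # lookups that found an existing copy
        self.misses = 0   # lookups that added a new distinct string

    def intern(self, s):
        canonical = self._pool.get(s)
        if canonical is None:
            self._pool[s] = s
            self.misses += 1
            return s
        self.hits += 1
        return canonical

    def __len__(self):
        return len(self._pool)
```

For example, after `buffer = [pool.intern(line) for line in lines]`, comparing `pool.hits` with `len(pool)` shows how much sharing took place.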

Gary H. Merrill

Director and Principal Scientist, New Applications

Data Exploration Sciences

GlaxoSmithKline Inc.

(919) 483-8456

“Chris Mellon” <arkanes@gmail.com>
11-Aug-2004 22:23
Please respond to wxPython-users@lists.wxwidgets.org
To: wxpython-users@lists.wxwidgets.org
Subject: Re: [wxPython-users] String pooling
