wxPython Application File Format

I am working on a Python desktop application in wxPython, and I'm worried about the file format for the files the application saves (like a .DOC file for MS Office).

Instead of using the struct module (struct.pack and struct.unpack) to write binary data to a file, I used pickle/cPickle, since the data structure to be stored is very complex and was constantly changing during development. Since using cPickle was a one-liner, I took the easy route first.

I took the following into account:

- Names I don't control: I make sure not to store them. For example, even though a cnumpy.core.multiarray can be serialized, I convert it to a Python list.

- Class Names: They MUST remain the same after each application update or de-serialization of old files will fail.

- Class Variable Names: They MUST remain the same after each application update or de-serialization of old data could fail.

- Can't serialize PySWIG objects: I modified __getstate__ to replace these objects with a tuple of their important arguments, and modified __setstate__ to read those arguments and produce the right wxPython/PySWIG object (this had to be done for drag/drop operations within the program anyway).

- Application updates (Adding variables to objects): If any objects which were serialized with a previous version of the application are missing a variable upon de-serialization, it is set to a default value in __setstate__:

>> def __setstate__(self, state):
>>     # Set missing values to a default so files from older versions still load
>>     if "NewValue" not in state:
>>         state["NewValue"] = 'Some default'
>>     self.__dict__.update(state)
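The last two points can be sketched together. This is only a minimal sketch in Python 3 (so pickle rather than cPickle), and FakeSwigColour, Shape, and line_width are hypothetical names standing in for a real wxPython object, an application class, and an attribute added in a later release.

```python
import pickle


class FakeSwigColour:
    """Hypothetical stand-in for a SWIG-wrapped object that refuses to pickle."""

    def __init__(self, r, g, b):
        self.rgb = (r, g, b)

    def __reduce_ex__(self, protocol):
        raise TypeError("cannot pickle FakeSwigColour")


class Shape:
    def __init__(self, name, colour):
        self.name = name
        self.colour = colour
        self.line_width = 1  # attribute added in a later (hypothetical) release

    def __getstate__(self):
        state = self.__dict__.copy()
        # Store only the constructor arguments, not the wrapped object itself.
        state["colour"] = self.colour.rgb
        return state

    def __setstate__(self, state):
        # Files written before line_width existed get a default value.
        state.setdefault("line_width", 1)
        # Rebuild the wrapped object from its saved arguments.
        state["colour"] = FakeSwigColour(*state["colour"])
        self.__dict__.update(state)


# Round trip through pickle: the colour survives as a plain tuple in the file.
shape = Shape("box", FakeSwigColour(255, 0, 0))
restored = pickle.loads(pickle.dumps(shape))

# Simulate the state of a file written by an older version without line_width.
old_state = {"name": "old box", "colour": (0, 0, 255)}
old_shape = Shape.__new__(Shape)
old_shape.__setstate__(old_state)
```

Note that the pickle still records the module and class name of Shape, so the renaming concerns listed above apply to this sketch as well.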

Some of these appear to be reasonable things to do anyway between application builds. But other than the gorilla in the room of not being able to easily port this file format to other environments (a different Python program, C++, Java, whatever), does anyone see this biting me later?

Or once development is done, should I just do the work to use struct to write the file?
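For comparison, a hand-rolled struct layout might look something like the sketch below. The magic bytes, the version field, and the record layout are all made up for illustration; the point is that every field has to be placed by hand, which is what makes a changing data model painful with this approach.

```python
import struct

# Hypothetical layout: 4-byte magic, unsigned 16-bit version, unsigned 32-bit
# count, followed by `count` little-endian doubles.
HEADER = struct.Struct("<4sHI")
MAGIC = b"DEMO"  # made-up file signature


def dump_values(values, version=1):
    """Pack a list of floats behind a small fixed header."""
    header = HEADER.pack(MAGIC, version, len(values))
    body = struct.pack("<%dd" % len(values), *values)
    return header + body


def load_values(data):
    """Unpack the header, check the signature, then read the doubles."""
    magic, version, count = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("not a recognised file")
    values = struct.unpack_from("<%dd" % count, data, HEADER.size)
    return version, list(values)


blob = dump_values([1.5, 2.0, -3.25])
version, values = load_values(blob)
```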

thanks in advance,
Ben

I generally say that unless file size is a major concern, you should
not use binary file formats. The only gain is in compactness; you lose
everywhere else.

However, you should definitely have some explicit
serialization/deserialization system; just blindly loading pickled
Python objects is probably not safe. Sanitize your inputs!
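To make the risk concrete: unpickling can call any importable callable named in the file. The sketch below uses a deliberately harmless callable (int) to show the mechanism, then restricts loading by overriding Unpickler.find_class, which is the approach the pickle documentation describes for restricting globals. Sneaky, NoGlobalsUnpickler, and restricted_loads are hypothetical names.

```python
import io
import pickle


class Sneaky:
    """On load, pickle will call the callable returned here with these args."""

    def __reduce__(self):
        # Harmless stand-in; a hostile file could name a dangerous callable.
        return (int, ("42",))


payload = pickle.dumps(Sneaky())
result = pickle.loads(payload)  # int("42") runs during loading


class NoGlobalsUnpickler(pickle.Unpickler):
    """Refuse every class or function lookup; plain containers still load."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(
            "global %s.%s is forbidden" % (module, name)
        )


def restricted_loads(data):
    return NoGlobalsUnpickler(io.BytesIO(data)).load()
```

With this in place, restricted_loads accepts a pickle of dicts, lists, strings, and numbers, but raises pickle.UnpicklingError for the payload above.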

My suggestion? Use JSON. It's a simple format, it's human-readable,
Python has a built-in library for loading and writing it, and you can
represent almost any data structure using combinations of scalars,
lists, and dicts.
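A minimal sketch of what that looks like with the standard json module. The document contents and the format_version key are made-up examples, not a recommended schema.

```python
import json

# Hypothetical document: only dicts, lists, strings, numbers, booleans, None.
document = {
    "format_version": 1,
    "title": "Example drawing",
    "shapes": [
        {"kind": "rect", "origin": [0, 0], "size": [10, 5], "visible": True},
        {"kind": "circle", "origin": [3, 4], "radius": 2.5, "label": None},
    ],
}

# Human-readable text that can be written to disk and read back.
text = json.dumps(document, indent=2, sort_keys=True)
loaded = json.loads(text)
```

Keep in mind that JSON has no tuple type, so tuples are written as lists and come back as lists.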

-Chris

On Tue, Sep 25, 2012 at 11:11 AM, Benjamin Jessup <bsj@abzinc.com> wrote:

I am working on a Python desktop application in wxPython and I'm worried
about the file format for files that the application saves (like a .DOC file
for MS Office).

<snip>

Some of these appear to be reasonable things to do anyway between
application builds but other than the gorilla in the room of not being able
to easily port this file format to other environments (Different python
program, C++, Java, whatever), does anyone see this biting me later?

Or once development is done, should I just do the work to use struct to
write the file?

JSON looks like the obvious choice, considering it ports into so many languages.

JSON supports objects, but I would like to remove the reliance on class definitions and module names. If I use __getstate__ to get the dictionaries for objects (instead of pickling the objects themselves) and store the dictionaries in a specific order, then I can port the data to any language that reads JSON, map the dict keys to whatever variables I want, and I am good to go.

With the class definitions & module names included I was worried about letting the user edit files with a text editor, but without them the user can do a lot less damage.
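The plan above could be sketched roughly as follows. Shape, SHAPE_KEYS, shape_to_record, and shape_from_record are hypothetical names; the idea is just that the file holds plain keys, and the mapping to attributes lives in the application, so a hand-edited file can be checked before anything is built from it.

```python
import json


class Shape:
    def __init__(self, name, width, height):
        self.name = name
        self.width = width
        self.height = height

    def __getstate__(self):
        return {"name": self.name, "width": self.width, "height": self.height}


# File key -> attribute name; the file never mentions a class or module.
SHAPE_KEYS = {"name": "name", "w": "width", "h": "height"}


def shape_to_record(shape):
    """Turn an object's state into a plain dict using the file's own keys."""
    state = shape.__getstate__()
    return {key: state[attr] for key, attr in SHAPE_KEYS.items()}


def shape_from_record(record):
    """Rebuild a Shape, rejecting keys a hand-edited file should not contain."""
    unknown = set(record) - set(SHAPE_KEYS)
    if unknown:
        raise ValueError("unexpected keys: %s" % sorted(unknown))
    kwargs = {attr: record[key] for key, attr in SHAPE_KEYS.items()}
    return Shape(**kwargs)


text = json.dumps([shape_to_record(Shape("box", 4, 3))])
shapes = [shape_from_record(record) for record in json.loads(text)]
```

Because the attribute names are decoupled from the file keys, a class or attribute can be renamed later by changing only the mapping.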

Thanks Chris!
Ben

On 9/25/2012 2:43 PM, Chris Weisiger wrote:

I generally say that unless file size is a major concern, you should
not use binary file formats. The only gain is in compactness; you lose
everywhere else.

However, you should definitely have some explicit
serialization/deserialization system; just blindly loading pickled
Python objects is probably not safe. Sanitize your inputs!

My suggestion? Use JSON. It's a simple format, it's human-readable,
Python has a built-in library for loading and writing it, and you can
represent almost any data structure using combinations of scalars,
lists, and dicts.

-Chris

Although you may continue to get good answers here, AFAICT this is a purely Python question, so you probably want to ask it on the Python list rather than on a wxPython list.