Need help converting PDF to Word

Hello my friends,

I am sorry to ask a question that is not about wxPython, but I know most of you are very good with Python.

My question is:

I need to convert a PDF file to a Word (MS .doc) file for an open-source project. Is there any open-source project I can use?
Or is there an easy way to do it? I know pyPdf can handle the PDF part, but I have no idea about the Word part; I am a Linux programmer and know little about Windows.

I found that MS has OpenXML, but it seems to support only C# and similar programming languages…

Thanks very much.

Regards,
usr root

Don’t bother with .doc format if you can help it - instead, try using RTF, which I believe is an open standard, and should be supported by any modern word processor (even MS Office). I started using PyRTF for this recently, with good results so far. It doesn’t appear to be actively developed any more, but it’s open-source (GPL/LGPL) and written entirely in Python, and looks like it would be easy to extend if necessary.

http://pyrtf.sourceforge.net/
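
To give an idea of how little is needed for a basic RTF file, here is a minimal sketch in plain Python. It does not use PyRTF; the function names `rtf_escape` and `make_rtf` are made up for this illustration, and the font and size are arbitrary assumptions. It only covers plain paragraphs of text, nothing like tables or images.

```python
def rtf_escape(text):
    """Escape a string so it can be placed in the body of an RTF document."""
    out = []
    for ch in text:
        if ch in "\\{}":
            # Backslash and braces are RTF syntax and must be escaped.
            out.append("\\" + ch)
        elif ch == "\n":
            out.append("\\par\n")
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # Non-ASCII: emit \uN per UTF-16 code unit (signed 16-bit decimal),
            # followed by "?" as the fallback for readers without Unicode support.
            units = ch.encode("utf-16-le")
            for i in range(0, len(units), 2):
                unit = int.from_bytes(units[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536
                out.append("\\u%d?" % unit)
    return "".join(out)


def make_rtf(paragraphs):
    """Build a complete RTF document (as a str) from a list of paragraphs."""
    # Header: RTF version 1, ANSI charset, one font (assumed: Times New Roman).
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\n\\f0\\fs24 "
    body = "\\par\n".join(rtf_escape(p) for p in paragraphs)
    return header + body + "\\par\n}"


if __name__ == "__main__":
    print(make_rtf(["Hello {world}", "caf\u00e9"]))
```

The result is pure ASCII, so it can be written with `open(path, "w", encoding="ascii")` and opened in a word processor.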

-Nat

···

On Wed, Nov 17, 2010 at 6:13 AM, usr root usr.root@gmail.com wrote:

Hi Nat,

Thanks very much for your reply, it helps me a lot. But in fact I have more requirements than that: it’s a B/S (browser/server) project, and I need to create the file on the server and open it in browsers.
With RTF, when I open it in Firefox, it always shows me a dialog box asking whether to open it or save it, which I don’t want.

Is there any way to open the rtf file in browser?

Thanks,
Usr root

···

On Wed, Nov 17, 2010 at 11:06 PM, Nat Echols nathaniel.echols@gmail.com wrote:

To unsubscribe, send email to wxPython-users+unsubscribe@googlegroups.com

or visit http://groups.google.com/group/wxPython-users?hl=en

You can check these to convert .doc files:

http://www.artofsolving.com/opensource/pyodconverter
http://dag.wieers.com/home-made/unoconv/

These commands use OpenOffice in headless mode. You can check:
http://code.google.com/p/openmeetings/wiki/OpenOfficeConverter

You can also check AbiWord, for example the article "Convert MS Word Files to Other formats using Abiword" on All about Linux.

Ricardo

···

On Wed, Nov 17, 2010 at 3:32 PM, usr root <usr.root@gmail.com> wrote:

Forgot to mention antiword. It's a little outdated but still a neat
utility - http://www.winfield.demon.nl/

Ricardo

···

On Wed, Nov 17, 2010 at 4:57 PM, Ricardo Pedroso <rmdpedroso@gmail.com> wrote:

Pretty darn OT, but...

Is there any way to open the rtf file in browser?

Well, you can't open MSWord format in the browser, either. (though there may be an embedded plugin I don't know about).

In general, people can click on a Word file in a browser, and if their browser has been set up right, it will be brought up in Word. This is the same for any non-native file type. So it should be pretty straightforward to set up your browser to open an rtf in Word.

In firefox on OS-X, for instance, when I click on a file type it doesn't know what to do with, it brings up a dialog that lets me choose to download it or select an app to open it with, and if I do that, there is a checkbox for "do this every time with this filetype", or something like that.

But it sounds like you have conflicting requirements:

If you want people to be able to simply view it in the browser, you would be better off converting the pdf to HTML, or just leaving it as PDF, as most people have their browsers set up to handle pdf easily.

If you want folks to be able to easily open and then edit, etc. the doc in Word, then converting to Word format is a fine idea. RTF is OK, but I'd be inclined to really give people a Word file (*.doc or *.docx).

Personally, I have major objections to application-specific, proprietary formats, but the truth is, casual users often don't really know what a "file type" is, or that Word can open/edit other types than its native one, and they don't know how or want to set up their browsers to deal with rtf, etc. So if your requirement is for naive users to get a Word doc, you should give them a Word doc.

So how to generate it? RTF may still be a good way -- but I'd take the next step and run the rtf through either Word or Open Office or something to convert to a *.doc -- that has got to be scriptable somehow.
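
One way that scripting might look, as a sketch only: it assumes a LibreOffice install whose `soffice` binary is on the PATH and accepts the `--headless`, `--convert-to` and `--outdir` options (older OpenOffice installs may need a wrapper such as unoconv instead). The function names are made up for this illustration.

```python
import subprocess


def build_convert_command(src_path, out_dir, target_format="doc",
                          office_binary="soffice"):
    """Return the argument list for a headless office conversion."""
    return [
        office_binary,
        "--headless",                     # no GUI, suitable for a server
        "--convert-to", target_format,    # e.g. "doc" or "docx"
        "--outdir", out_dir,
        src_path,
    ]


def convert(src_path, out_dir, target_format="doc", timeout=120):
    """Run the conversion; raises CalledProcessError if the office binary fails."""
    subprocess.run(
        build_convert_command(src_path, out_dir, target_format),
        check=True,
        timeout=timeout,
    )
```

Calling `convert("report.rtf", "/tmp/out")` would then be expected to leave a `report.doc` in `/tmp/out`, assuming the office install can read the input.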

Do you need to handle just a particular set of PDF files, or any arbitrary one? The latter is going to be really hard, as PDF is an inherently different data model than word processors -- it's really just instructions as to how to draw a page, not a structured view of the contents. For simple docs that are mostly text, you can often re-create the text structure, but that doesn't work in general.

It also depends on how much of the PDF you want to preserve - just the text is not too hard.
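
For the text-only case, re-creating the structure mostly means turning the extracted lines back into paragraphs. Here is a rough sketch of that step, assuming the text has already been pulled out of the PDF as a list of lines, with blank lines between paragraphs. The function name and the heuristics are assumptions for illustration; in particular, the hyphen rule will wrongly join words that really are hyphenated at a line end.

```python
def rebuild_paragraphs(lines):
    """Merge hard-wrapped lines into paragraphs.

    Heuristics (assumptions, not a general solution):
    - a blank line ends the current paragraph;
    - a line ending in "-" is treated as a word split across lines;
    - otherwise consecutive lines are joined with a single space.
    """
    paragraphs = []
    current = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            if current:
                paragraphs.append(current)
                current = ""
            continue
        if current.endswith("-"):
            # Drop the hyphen and glue the two halves of the word together.
            current = current[:-1] + line
        elif current:
            current += " " + line
        else:
            current = line
    if current:
        paragraphs.append(current)
    return paragraphs


if __name__ == "__main__":
    sample = ["PDF stores draw-", "ing instructions,", "not structure.",
              "", "Second paragraph."]
    for paragraph in rebuild_paragraphs(sample):
        print(paragraph)
```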

If you have a particular set of PDFs to convert, you may be able to go straight to MS's XML format -- I understand it's a pretty complicated mess, but it may not be bad to generate just the subset you need, and there are lots of tools for writing XML with Python.
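
To illustrate the "just the subset you need" idea, here is a minimal sketch that writes plain paragraphs into a *.docx using only the standard library. A .docx is a zip package; this writes the three parts that, as far as I know, make up the smallest usable document. The function names are made up for this example, and there is no styling, no tables and no images.

```python
import io
import zipfile
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Declares the content type of each part in the package.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

# Points the package at its main document part.
RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships '
    'xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" '
    'Target="word/document.xml"/>'
    '</Relationships>'
)


def build_document_xml(paragraphs):
    """Return word/document.xml with one plain-text run per paragraph."""
    body = "".join(
        '<w:p><w:r><w:t xml:space="preserve">%s</w:t></w:r></w:p>' % escape(p)
        for p in paragraphs
    )
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<w:document xmlns:w="%s"><w:body>%s</w:body></w:document>' % (W_NS, body)
    )


def write_docx(paragraphs, target):
    """Write the package to a file path or a binary file-like object."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", RELS)
        package.writestr("word/document.xml", build_document_xml(paragraphs))


if __name__ == "__main__":
    buffer = io.BytesIO()
    write_docx(["First paragraph.", "Second paragraph."], buffer)
    print(len(buffer.getvalue()), "bytes written")
```

Passing a path such as `"out.docx"` instead of the `BytesIO` buffer writes the file to disk.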

Good luck!

-Chris

···

On 11/17/10 7:32 AM, usr root wrote:

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@noaa.gov

Most PDF-to-MS-Word-RTF-file converters are really terrible. Also, I have found Open Office's import and conversion of non-native file types to be unacceptable on the MSW platform. Open Office has many serious bugs running on the MSW platform.

I have tried very many PDF-to-MS-DOC (RTF, actually) converter programs on the MS Windows platform and have found only a single one that is any good at all:

Solid Documents: PDF To Word

The reason this program is relatively expensive is that Solid Documents knows its competition is essentially nonexistent. That is, most of its competitors have cheaper products, but they don’t work properly! It’s easy to create a PDF document, but it’s really hard to convert it properly to RTF format.

All converters of this type produce what are supposedly MS RTF format files, readable and editable by MS Word and MS WordPad. However, while WordPad will read and display these files, do not edit .RTF files with it, as it alters their appearance in ways you probably will not want. Use only MS Word to edit the .RTF files, if they need editing at all.

Once you have your file in RTF format you can use MS Word to convert it to HTML format, which is viewable in any web browser.

Ray

···

On Wed, Nov 17, 2010 at 10:32 AM, usr root usr.root@gmail.com wrote:

Oops... I misread this thread. I read it as a .doc to .pdf conversion.
Sorry about the noise.

Ricardo

···

On Wed, Nov 17, 2010 at 4:57 PM, Ricardo Pedroso <rmdpedroso@gmail.com> wrote:

Most PDF-to-MS-Word-RTF-file converters are really terrible.

I'm not surprised -- determining the logical structure of an arbitrary PDF is a really hard problem.

Once you have you file in RTF format you can *use MS Word to convert it
to HTML format* which is viewable in any web browser.

If I wanted HTML, the last thing I'd do is use Word to generate it. Word-generated HTML is horrible.

Though if you need both Word format and HTML, it might be the easiest thing to do.

-Chris

···

On 11/17/10 9:33 AM, Ray Pasco wrote:

Open Office has many serious bugs running on the MSW platform

It should get better: The Document Foundation has forked it as LibreOffice, which is expected to have many improvements by its first release. Don’t give up on open-source office software yet!
