
Unlock PDFs with Word 2013

We’ve all received PDF files with content that we wanted to reuse, and most of us have been disappointed by how hard it is to get rich content out of a PDF. For example, if you copy table rows from a PDF viewer and paste them into Word, you frequently end up with a single collapsed line of text, as in the figure below. In essence, most existing PDF viewers limit people who use PDFs to a “look but don’t touch” experience.

Screenshot of a table that is copied and pasted into Word.

PDF Reflow, a new feature in the upcoming release of Word, changes the landscape by letting you convert PDFs into editable Word documents.

The Goal

The goal of PDF Reflow is to convert PDF content into Word documents that preserve the original layout intent and that flow correctly across pages as you edit or read them. In other words, when you convert a PDF in Word, the elements in the document should act as if you had created them in Word. A list from a PDF, for instance, will act just like any other list in Word: press Enter at the end of a bulleted paragraph and a new bullet is created.
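In a PDF, a bullet is usually just another glyph drawn in front of the paragraph text, so making a list behave like a Word list means recognizing that glyph and turning it into list formatting instead of keeping it as a literal character. Here is a minimal sketch of that idea in Python, assuming the lines have already been extracted as strings. The `BULLETS` set, the `ListItem` type, and `classify` are hypothetical illustrations, not PDF Reflow’s actual rules:

```python
from dataclasses import dataclass

# Hypothetical set of glyphs treated as bullet markers.
BULLETS = ("\u2022", "\u25aa", "\u2013", "-")

@dataclass
class ListItem:
    text: str  # paragraph text with the literal bullet glyph removed

def classify(lines):
    """Split extracted lines into plain paragraphs (str) and list items (ListItem)."""
    blocks = []
    for line in lines:
        stripped = line.lstrip()
        # A bullet counts only when it is followed by a space ("-5" is not a bullet).
        marker = next((b for b in BULLETS if stripped.startswith(b + " ")), None)
        if marker is None:
            blocks.append(line)
        else:
            # Drop the glyph: the word processor draws the bullet itself,
            # so pressing Enter at the end of the item can start a new one.
            blocks.append(ListItem(stripped[len(marker):].strip()))
    return blocks

for block in classify(["Terms of sale",
                       "\u2022 Payment due in 30 days",
                       "\u2022 Prices exclude tax"]):
    print(block)
```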

The PDF Reflow feature is not intended to replace a reader, such as Windows 8’s Reader; rather, it is a converter that gives you a new level of access to your content. It works with any PDF, but because the contents are laid out again, the results are best with documents that are mostly text, such as legal and business documents. If a PDF contains mostly images and diagrams, as in a presentation or a brochure, converting it is much more likely to produce layout issues like the column shift in the example below.

For example, take a look at the document below. Some of the text in the first column wraps to the next line differently, and a line from the top of the original second column has moved to the end of the first. All the original content is there, but because the PDF Reflow process values the ability to edit the content over picture-perfect alignment, some of the content is repositioned.
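Line breaks are one place where this shows up: once text is laid out again, the breaks are recomputed for the new line width and fonts rather than copied from the PDF. A rough illustration with Python’s standard `textwrap` module, using character counts as a stand-in for real font metrics (an assumption; a word processor measures actual glyph widths):

```python
import textwrap

# One paragraph of body text, as it might be recovered from a PDF column.
paragraph = (
    "The parties agree that all invoices are payable within thirty days "
    "of receipt and that late payments accrue interest at the stated rate."
)

# Wrapping the same words at two slightly different widths moves the breaks.
original = textwrap.wrap(paragraph, width=40)
reflowed = textwrap.wrap(paragraph, width=44)

for line in reflowed:
    print(line)
```

Every word survives (joining either list of lines with spaces gives back the same paragraph), but the lines end in different places.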

Screenshot of a PDF that has gone through the reflow feature and opened in Word.

That isn’t to say we won’t try our best to convert any PDF file you hand us! For instance, let’s take a look at a PDF of a PowerPoint slide. PDF Reflow converts the file and preserves all the content; however, the text ends up in text boxes and won’t reflow nicely across pages if you start typing in it.

Screenshot of a PDF created from a PPT slide and opened in Word

Keep in mind that PDF Reflow creates a copy of your content during the conversion. If the results aren’t what you expect, your original PDF file remains intact.

How it works

PDF is a fixed-layout file format, which means the file stores where text, images, and graphics are placed on a page, but not necessarily the relationships among them. Most PDFs have no notion of content structure elements such as paragraphs, tables, or columns. In our table rows example, there isn’t enough information in the PDF file for us to know that these words belong in separate table cells. All we can see is that the pieces of text are positioned one after another.
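To make this concrete, here is a hand-written fragment of a PDF page content stream (real streams are usually compressed and use many more operators), parsed with a deliberately simplified Python sketch. `BT`/`ET` begin and end a text object, `Tf` selects a font, `Td` sets the text position, and `Tj` shows a string. The stream contents and the helper names are illustrative assumptions:

```python
import re

# A simplified, uncompressed content stream for two table rows. Each text
# object (BT ... ET) draws one run; because every run starts a fresh text
# object, each Td effectively places it at an absolute (x, y) in points.
# Nothing here says which runs form a row or a cell.
CONTENT_STREAM = """
BT /F1 11 Tf 72 700 Td (Product) Tj ET
BT /F1 11 Tf 200 700 Td (Qty) Tj ET
BT /F1 11 Tf 260 700 Td (Price) Tj ET
BT /F1 11 Tf 72 686 Td (Widget) Tj ET
BT /F1 11 Tf 200 686 Td (4) Tj ET
BT /F1 11 Tf 260 686 Td (2.50) Tj ET
"""

# Matches "x y Td (text) Tj". Simplified: no escaped parentheses, no TJ
# arrays, no Tm text matrices, one run per text object.
RUN = re.compile(r"(-?[\d.]+)\s+(-?[\d.]+)\s+Td\s+\(([^)]*)\)\s*Tj")

def text_runs(stream):
    """Return (x, y, text) tuples in the order the runs are drawn."""
    return [(float(x), float(y), text) for x, y, text in RUN.findall(stream)]

def copy_as_plain_text(runs):
    """Mimic a naive copy: emit the text in drawing order, space-separated."""
    return " ".join(text for _, _, text in runs)

runs = text_runs(CONTENT_STREAM)
print(copy_as_plain_text(runs))
```

The gaps that make the cells visible on the page are not characters in the file, so a naive copy of both rows comes out as one line of words.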

You can see the table structure and its text on the surface of the document, but underneath, the PDF usually stores the table as a set of absolutely positioned lines. (PDF uses the same type of lines to represent underlines, strikethroughs, and even graphs.) Sort of like this:

Picture of how Word sees table content in a PDF file

There is typically no indication in the PDF file that links text content with these lines or that these lines and text logically represent cells in a table.
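A simplified content-stream fragment shows the graphics side of the problem. The `m`, `l`, and `S` operators move to a point, append a straight segment, and stroke it. The sketch below (coordinates and helper names are made up for illustration) pulls out the segments and sorts them by orientation; nothing in the data distinguishes the table borders from the underline drawn by the last segment:

```python
import re

# Path operators from a simplified content stream: "x1 y1 m x2 y2 l S"
# strokes one straight segment. The first seven segments outline a small
# table; the last one underlines a word elsewhere on the page, yet it is
# drawn with exactly the same operators.
PATHS = """
72 712 m 320 712 l S
72 698 m 320 698 l S
72 684 m 320 684 l S
72 684 m 72 712 l S
195 684 m 195 712 l S
255 684 m 255 712 l S
320 684 m 320 712 l S
400 648 m 452 648 l S
"""

SEGMENT = re.compile(
    r"(-?[\d.]+)\s+(-?[\d.]+)\s+m\s+(-?[\d.]+)\s+(-?[\d.]+)\s+l\s+S"
)

def segments(stream):
    """Return each stroked segment as ((x1, y1), (x2, y2))."""
    return [
        ((float(a), float(b)), (float(c), float(d)))
        for a, b, c, d in SEGMENT.findall(stream)
    ]

def split_rulings(segs, tol=0.5):
    """Separate horizontal and vertical segments; to the file, both are just geometry."""
    horizontal = [s for s in segs if abs(s[0][1] - s[1][1]) <= tol]
    vertical = [s for s in segs if abs(s[0][0] - s[1][0]) <= tol]
    return horizontal, vertical

h, v = split_rulings(segments(PATHS))
print(len(h), "horizontal,", len(v), "vertical")
```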

When you open a PDF file in Word 2013, PDF Reflow constructs a Word 2013 document from it, opening the door to easy editing and content reuse. It does this by using a system of complex rules to determine which Word objects (such as headings, lists, and tables) would best represent the original PDF. The figure below shows what our table example looks like when PDF Reflow uses these heuristics to reconstruct the table structure and content from the lines and text.
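One simple heuristic of this kind is to treat the vertical and horizontal rules as cell boundaries and drop each text run into the cell that encloses its starting point. The Python sketch below illustrates that assumption only; it is not PDF Reflow’s actual algorithm, and the grid, the runs, and `build_table` are hypothetical:

```python
from bisect import bisect_right

# Hypothetical grid recovered from ruling lines (PDF points; y grows upward).
COLUMN_EDGES = [72, 195, 255, 320]   # x of the vertical rules, left to right
ROW_EDGES = [712, 698, 684]          # y of the horizontal rules, top to bottom

# Text runs as (x, y, text), in whatever order the PDF happens to draw them.
RUNS = [
    (260, 686, "2.50"), (72, 700, "Product"), (200, 700, "Qty"),
    (72, 686, "Widget"), (260, 700, "Price"), (200, 686, "4"),
]

def build_table(runs, col_edges, row_edges):
    """Place each run in the cell whose rules enclose its start point."""
    n_rows, n_cols = len(row_edges) - 1, len(col_edges) - 1
    table = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    ascending_y = sorted(row_edges)
    for x, y, text in runs:
        col = bisect_right(col_edges, x) - 1
        # Rows are numbered from the top, but bisect needs ascending edges.
        row = n_rows - bisect_right(ascending_y, y)
        if 0 <= row < n_rows and 0 <= col < n_cols:
            # Runs outside the grid are ignored; runs sharing a cell are joined.
            table[row][col] = f"{table[row][col]} {text}".strip()
    return table

for row in build_table(RUNS, COLUMN_EDGES, ROW_EDGES):
    print(row)
```

Because cells are assigned by position, the drawing order of the runs no longer matters. A fuller version would also have to cope with merged cells, tables drawn without rules, and cells whose text spans several runs that need reordering.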

Screenshot of how a table from a PDF file looks after opening in Word

Getting Started

PDF Reflow is built directly into Word 2013, so you can open your PDF like any other document. On the ribbon, click FILE, and then go to the Open tab in Backstage view. Navigate to the PDF’s location and select the file you would like to convert! Your content, formerly locked up in a PDF, is now yours to work with again.

Screenshot of the Open place in Word

The Team

The PDF Reflow team spent the past couple of years thinking about how to turn PDF files into Word documents.

Picture of the PDF Reflow team
 

Join the conversation

8 comments
  1. It is great work... well appreciated.

    I have a quick question: is it possible to use this in SharePoint 2013? We are looking at this technique for document management.

    What about the precision of converting PDF to Word if the PDF text has been written by hand?

    • Hello!

      Thanks for your comment!

      PDF Reflow is currently available only in the Word client, and you cannot use it in SharePoint 2013.

      As for PDF files where the text has been written by hand, the resulting Word document will depend on how that handwriting is stored in the PDF. Most likely the handwriting is stored as images or vector graphics, meaning that it will end up in the Word document exactly as that: images or vector graphics.

      I hope that answers your questions. Please let me know if you have any additional questions.

      Thanks!
      Milos

  2. Cool feature! love this!

    I just want to ask: can an admin disable this feature? Some users don’t want their PDFs to be converted.

    • Hello John!

      Thanks for your question.

      PDF Reflow is a Word client feature and is installed by default, so there is no option to disable it during the Office installation and configuration process.

      Please also note that by default Word does not take over the PDF extension, even if there is no other PDF handler installed on the system. The only way Word can become the default PDF handler is if a user or admin manually sets Word as the default PDF handler. In other words, by default a PDF is converted to DOCX only at the user’s explicit request, when the user opens the PDF in Word.

      Thanks!
      Milos

  3. Killer feature! Awesome!
    I can’t wait to see if this works better for longer documents than other software for that purpose. E.g., detecting (numbered) multi-level headings, indents, complex table layouts, footers, etc. I’ll install the preview asap. Great work!

  4. Not sure that my post worked, so... What do you do if it doesn’t work? I’m on the Office 2013 Preview client, and I opened a PDF with text, a table, and phone numbers, but all I get is one large image.

    • Hello Jeremy,

      You might have opened a scanned PDF that is constructed from images; currently PDF Reflow is not able to extract text or graphics from images. The other possibility, if this is not a scanned PDF, is that that part of the document is graphics-heavy, in which case PDF Reflow might render that part of the document as an image.

      Thanks!
      Milos
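    Whether a page is likely to be a scan can be checked roughly from its decompressed content stream: a scanned page usually contains no text-showing operators at all, only a `Do` operator that paints an image XObject across the page. A simplified Python sketch of that check (the function name and labels are hypothetical; it looks only for the `Tj` and `TJ` text operators, ignoring `'` and `"` and any text nested inside form XObjects):

```python
import re

# Text-showing operators Tj and TJ (the ' and " operators are ignored here).
TEXT_OP = re.compile(r"\bT[jJ]\b")
# "/Name Do" paints an external object, such as an image XObject.
PAINT_XOBJECT = re.compile(r"/\w+\s+Do\b")

def page_kind(stream):
    """Roughly classify one decompressed page content stream."""
    if TEXT_OP.search(stream):
        return "text"
    if PAINT_XOBJECT.search(stream):
        # No text operators, but an external object is painted: typical of scans.
        return "image-only (likely scanned)"
    # Only paths (or nothing at all) were drawn.
    return "graphics-only"

# A typical scanned page: scale the image to the page size, then paint it.
print(page_kind("q 612 0 0 792 0 0 cm /Im0 Do Q"))
```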
