Word Automation Services: What It Does

Following up on my first post about Word Automation Services, I wanted to continue by talking about the functionality offered (and not offered) by the service, how it's exposed, and the types of solutions you will be able to build on top of it.

What the Service Does

Functionally, the service is very simple – this is intentional, as we wanted to address the pain points that we've heard loud and clear from you over the past few years, while keeping performance and scale at the top of our priorities (which meant avoiding the temptation to bring over everything "just because").

With that mindset, we really only set out to tackle the two most common requests that we hear:

  1. I have a bunch of Word documents. I want to convert them to PDF on the server in bulk (e.g. DOCX to PDF).
  2. I have a template and some data. I want to merge the two and create a set of PDF files; one per merge result (e.g. mail merge to PDF).

Now, when we hear that, the output format's not always PDF (but it's probably the most common). As we translated that to features, it meant that the server needed to do one thing really well: file conversions. Accordingly, Word Automation Services supports conversions to/from almost all of the formats the Word client understands:

File formats the service can read:

  • Office Open XML (DOCX, DOCM, DOTX, DOTM)
  • Word 97-2003 Document (DOC) and Word 97-2003 Template (DOT)

    • We also support older versions of Word as far back as Word 2.0 for Windows (!)

  • Rich Text Format (RTF)
  • Single File Web Page (MHTML)
  • HTML
  • Word 2003 XML
  • Word 2007/2010 XML

File formats the service can write:

  • PDF
  • XPS
  • Office Open XML (DOCX, DOCM)
  • Word 97-2003 Document (DOC)
  • Rich Text Format (RTF)
  • Single File Web Page (MHTML)
  • Word 2007/2010 XML

This also meant that we needed to support all of the features that are part of loading/saving documents, i.e.:

  • XML data mapping – you can place updated XML in the document, and content controls will automatically be updated
  • Fields – the service (or the file) can be set to recalculate fields automatically during conversion
  • Alternative format chunks (altChunk) – you can embed documents (DOCX, HTML, RTF, DOC) within a DOCX file, and have the service merge in the content automatically
  • Upgrade – you can specify whether the file should be upgraded as part of loading it on the server
  • Add Thumbnail images on save
  • Etc.

How It's Exposed

To expose this capability, we also thought small (and hopefully simple) – the service exists as a managed API you can utilize on SharePoint, allowing you to build on top of it as appropriate for your solutions – maybe that's a WCF service, maybe a custom workflow activity, etc.

That API breaks down into two basic objects:

  • ConversionJob – the object that encapsulates one or more conversions that you want to perform as a logical unit
  • ConversionJobStatus – the object that allows you to query the status of a ConversionJob while/after it's processed

With the first, you ask us to convert files on the server and put the result back on the server; with the second, you query the progress of that conversion process.

Example

As an example, consider a server solution in which I want to allow users to schedule self-service conversions: they can right-click on a file in SharePoint and request an XPS version of that file.

On my ASPX page for the conversion, the button handler might contain the following code:

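// Needs: using Microsoft.SharePoint; using Microsoft.Office.Word.Server.Conversions;
public void Convert_Click(…)
{
    // The string is the name of the Word Automation Services application on the farm.
    ConversionJob job = new ConversionJob("Word Automation Services");
    // Read and write the files with the current user's credentials.
    job.UserToken = SPContext.Current.Site.UserToken;
    // In the released SharePoint 2010 object model, conversion options are set through job.Settings.
    job.Settings.UpdateFields = true;
    job.Settings.OutputFormat = SaveFormat.XPS;
    job.AddFile("http://contoso.com/input/foo.docx","http://contoso.com/output/foo.xps");
    // Start() only queues the job; the conversion itself runs asynchronously.
    job.Start();
}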

And that's all that's required – I create a ConversionJob object to encapsulate the action, tell it to convert to XPS and update fields using my credentials to read/write the files, tell it the file to convert, and use Start() to kick off the process.
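The same pattern stretches to the bulk scenario (the first request in the list near the top): instead of one file, the job can be handed a whole folder. What follows is only a sketch, assuming the released SharePoint 2010 object model, where options are set through job.Settings and AddFolder takes an input folder, an output folder, and a recursion flag. The BulkConversion class, the method name, and all three URL parameters are hypothetical placeholders.

```csharp
using System;
using Microsoft.Office.Word.Server.Conversions;
using Microsoft.SharePoint;

public static class BulkConversion
{
    // Queues every document in the input folder for conversion to PDF
    // and returns the job ID so the caller can check status later.
    // All three URLs are placeholders supplied by the caller.
    public static Guid ConvertFolderToPdf(string siteUrl, string inputFolderUrl, string outputFolderUrl)
    {
        using (SPSite site = new SPSite(siteUrl))
        using (SPWeb web = site.OpenWeb())
        {
            SPFolder input = web.GetFolder(inputFolderUrl);
            SPFolder output = web.GetFolder(outputFolderUrl);

            ConversionJob job = new ConversionJob("Word Automation Services");
            job.UserToken = site.UserToken;
            job.Settings.OutputFormat = SaveFormat.PDF;

            // false = do not descend into subfolders.
            job.AddFolder(input, output, false);

            // Queues the job; conversions run asynchronously.
            job.Start();
            return job.JobId;
        }
    }
}
```

Returning the job ID rather than waiting keeps the call cheap for the page that triggers it.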

Once it's running, I can easily query the status of that conversion – the job.JobId property specifies a unique GUID for that job that I could have stored and reused, e.g.:

public void CheckStatus(Guid jobId)
{
    ConversionJobStatus status = new ConversionJobStatus("Word Automation Services", jobId, null);
    if (status.Count == status.Succeeded)
    {
        //success!
        //do something 
    }
    else if (status.Count == status.Failed)
    {
        //failure :(
        //do something else
    }
    …
}

Just by creating a ConversionJobStatus object, I immediately know where that item is in the system (Succeeded, Failed, InProgress, NotStarted) and can react appropriately.
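The two comparisons in CheckStatus only cover the all-succeeded and all-failed cases. Here is a minimal sketch of one way to fold the four counters named above into a single answer. JobOutcome and JobOutcomes.Classify are hypothetical names made up for illustration and are not part of the service's API; the helper takes plain integers so it can be tried without a SharePoint server.

```csharp
using System;

// Hypothetical summary of a job's state, for illustration only.
public enum JobOutcome { Pending, AllSucceeded, AllFailed, Mixed }

public static class JobOutcomes
{
    // The arguments are meant to be the counters of the same names
    // read from a ConversionJobStatus object.
    public static JobOutcome Classify(int count, int succeeded, int failed, int inProgress, int notStarted)
    {
        // Anything still queued or running means the job is not done yet.
        if (inProgress + notStarted > 0) return JobOutcome.Pending;
        // Note: a job with zero items lands here (vacuously "all succeeded").
        if (succeeded == count) return JobOutcome.AllSucceeded;
        if (failed == count) return JobOutcome.AllFailed;
        // Finished, but with a mix of results.
        return JobOutcome.Mixed;
    }
}

// With the CheckStatus example, the call would look like:
// JobOutcome outcome = JobOutcomes.Classify(
//     status.Count, status.Succeeded, status.Failed, status.InProgress, status.NotStarted);
```

Since conversions cannot be forced to run synchronously, a caller would typically run a check like this on a timer or on the next page load rather than in a tight loop.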

That example's probably two-thirds of the API – the goal really was to keep it simple and focus on doing those two things really well.

Back to the Open XML SDK

Now, the one thing I didn't directly address in this post was the "merging documents with data" piece above.

That part of our solution isn't just the service itself – it's actually solved in combination with the Open XML SDK. I'm going to talk about the SDK a lot when I talk about the server; as I said in the first post, it's the combination of the two that provides the end-to-end story that we believe replaces the need to automate the client applications.

In this case, you'd use the SDK to clone the template and inject the data (a task well suited to manipulation of the file format), and use the service to convert the resulting files to PDF/XPS.
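To make the "clone the template and inject the data" step concrete, here is a rough sketch using the Open XML SDK (the DocumentFormat.OpenXml assembly). It assumes the template is a DOCX whose content controls are bound to a single custom XML part, so that overwriting that part's XML is enough to change what the controls show. The TemplateMerge class, its method, and its parameters are hypothetical names for illustration.

```csharp
using System.IO;
using System.Linq;
using System.Text;
using DocumentFormat.OpenXml.Packaging;

public static class TemplateMerge
{
    // Copies the template to outputPath and replaces the XML held in the
    // copy's custom XML part with dataXml.
    // Assumption: the template contains exactly one custom XML part and
    // its content controls are data-bound to it.
    public static void CreateMergedCopy(string templatePath, string outputPath, string dataXml)
    {
        // Clone the template so the original is never modified.
        File.Copy(templatePath, outputPath, true);

        using (WordprocessingDocument doc = WordprocessingDocument.Open(outputPath, true))
        {
            CustomXmlPart dataPart = doc.MainDocumentPart.CustomXmlParts.First();

            using (MemoryStream data = new MemoryStream(Encoding.UTF8.GetBytes(dataXml)))
            {
                // Overwrite the part's content with the new data.
                dataPart.FeedData(data);
            }
        }
    }
}
```

Calling this once per data record yields one DOCX per merge result, and those files can then be queued on a ConversionJob for conversion to PDF or XPS.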

I hope that was a useful introduction to what we're doing and how you'll be able to work with it – in the next post, I'll talk more about our architecture and how we're leveraging the strengths of the SharePoint platform.

- Tristan

Office Blogs Comments

Comments: (22)

  • Hi,

    Word Automation Services looks very promising.

    One question, though.

    Is it possible to perform the conversion synchronously

    and retrieve the content of the converted document ?

  • @FIF: You can't force conversions to happen synchronously, but you can monitor status and know immediately when they're done. Once it's done, it's trivial to grab the content of the new file. Tristan

  • I need to download this software to use it.

  • Is it possible to get the conversion job to delete the originals after converting them to another format? I want to be able to convert documents to PDF and then delete the original documents. It'd be great if the service could do the deleting for me, otherwise it gets a bit trickier because of the async processing...

  • Your product has crashed 4-5 times on me in one day... what a piece of crap!

  • Very nice to see MS finally supplying this type of functionality itself, instead of having us revert to custom built solutions or half-baked solutions commercially available. Personally I'm very interested in the template based document generation. I created a custom solution (http://flexdoc.codeplex.com), but since the custom xml feature will be removed soon from Word 2007 and future versions, I'm looking for a decent alternative.

    Therefore I'm really interested in an article on this subject!! Oh btw: you need to support ODF (.odt) as output format as well, otherwise a lot of government agencies will not be able to use it.

  • @Robert: Cool project on codeplex. Content controls are a good alternative to use when it comes to helping push data into Word documents. Take a look at the following post for some more details. blogs.technet.com/.../using-content-controls-vs-custom-xml-elements.aspx. In addition, I would recommend checking out some Open XML SDK posts here blogs.msdn.com/brian_jones for other ideas on how to move forward.

  • In fleXdoc I use custom XML tags as a query-language. Eg. when I place a ValueOf-tag in a document, it renders data in the document that is retrieved from the data-XML using an xpath-query that is supplied with the ValueOf-tag itself. This is why I think a standard content control will not do: I need to be able to specify 'arguments' to the control. Also the custom XML tag UI currently in Word 2K3/2K7 is pretty cool:

    - it displays a picklist when the underlying schema specifies an enumeration

    - it validates against schema, thus checking required element properties

    - the tag-properties are easily accessible (right-click -> Attributes), while content controls need more clicks (select control, click 'Properties'-button): for large templates this becomes annoying.

    So as far as I know, the only way a content control based solution would work is when I can customize them or create my own content controls. Can you give me any guidance or sample on if this is possible and how to do so? Would I need the building block content control?

  • @amit: We'd love it if you join the Beta and give it a try - just browse here to get started: sharepoint2010.microsoft.com/.../Trial.aspx

    @Flynn: This is not possible out of the box; you'd need to monitor the conversion status and do this yourself, or have a workflow that watches for the output file and deletes the input file when it appears.

    @Steve: Are you talking about Word Automation Services specifically? If so, would love to dig into the problems you're having.

  • Can the Word team give me a definite answer on whether true small caps will be supported in Word 2010 or not? Word is taking minuscule steps towards supporting proper typography. I was so excited when you added support for ligatures but equally disappointed when I discovered it still fakes small caps. Please see this article: www.osnews.com/.../The_Problem_with_Typography_Complexity_on_the_Web. Get the X-height and stroke weight right before RTM. Such a small feature but will make a huge difference to sales of Word.

  • Typographer-- No. Word 2010 will not support OpenType Small Caps. We will consider this as a future feature as more fonts provide worldwide (rather than just English or Latin) support for such features. -Stuart

    Word PM

  • @Robert te Kaat I am in the same boat as you. I use custom XML a lot in our application and we are stuck now. Replacing custom XML with content controls is not straightforward. Were you able to find anything in this regard? I have seen your fleXdoc. Are you planning on replacing your custom XML with something else? Neel

  • @Neel:

    No plans yet. I've spent a *lot* of my own free time on fleXdoc and redesigning it is just too much, especially since I don't make a dime on it! I don't think content controls would work for fleXdoc; it would work for single fields, but not for more complex constructions like repeating content (like table rows): the template design-experience from Word would be way too confusing for users. By the way: since both OOXML and WordML (2003) are XML-based, replacing custom-XML tags with content controls could be done by creating an xslt-based conversion tool. Once you create the tool, the amount of templates is no longer an issue. Also: MS is not yet finished with the whole custom-XML/i4i thing, so there's still hope custom XML will stay (actually: come back).

  • Re: PDF (1) Specifically, please confirm that Word Automation Services can batch create SECURE PDF. More generally, that it'll understand joboptions files (or their equivalent). (2) What version of PDF? Remember, only the latest version (OK, the last 2 at a push) work with portfolios. (3) The Word 2007 PDF plug-in & SP offering are desperately sad pieces of work. Before I take a serious look at the 2010 offerings for Word & SharePoint, please promise me you've done better than that. ;-)

  • I am about to embark on something similar to what fleXdoc creator Robert te Kaat described. I need to export data from a database to a document based on a pre-defined template (an insurance policy pack in my case). However, I need to do it synchronously, i.e. when a client requests this on the web. There is a lot of custom, bespoke stuff out there for this; however, I want to use something generic, supported, and XML API based (SOA type), which is the reason I am here. Can anyone guide me to such an example or resource? Does Word Automation Services solve my queries? Thanks
