In my first two posts in the series on Word Automation Services, I talked about what it is and what it does – in this post, I wanted to drill in on how the service works from an architectural standpoint, and what that means for solutions built on top of it.
The most important component of Word Automation Services is getting a core engine with 100% fidelity to desktop Word running on the server – accordingly, much of our effort was focused on this task. If you've ever tried to use desktop Word on the server, you're acutely aware of the work that went into this – we needed to "unlearn" many of the assumptions of the desktop, e.g.:
This means architecture changes that run the gamut from huge, obvious ones (e.g. ensuring that we never write to the hard disk, in order to avoid I/O contention when running several processes in parallel) to small, unexpected ones (e.g. ensuring that we never recalculate the AUTHOR field, since there's no "user" associated with the server conversion).
What this means for you: we've built an engine that's truly optimized for server – it's faster than client in terms of raw speed, and it scales up to multiple cores (as we eliminated both resource contention and cases where the app assumed it lived "alone" – access to normal.dotm being one example that's familiar to folks who've tried to do this before) and across server farms through load balancing.
Having this engine is one step, but we also needed to integrate it into SharePoint Server 2010, enabling us to work within a server ecosystem with other Office services.
To do this, we needed an architecture that enabled us to both:
The result is a system that's asynchronous in nature (something I've alluded to in previous posts). Essentially, the system works like this:
That has two important consequences for solutions:
Dissecting those consequences a little further:
The asynchronous nature of the service means you need to set up your solutions to use either list events or the job status API to find out when a conversion is complete. For example, if I wanted to delete the original file once the converted one was written, as commenter Flynn suggested, I would need to do something like this:
    using System.Threading;
    using Microsoft.SharePoint;
    using Microsoft.Office.Word.Server.Conversions;

    public void ConvertAndDelete(string[] inputFiles, string[] outputFiles)
    {
        const string serviceName = "Word Automation Services";

        // Start the conversion
        ConversionJob job = new ConversionJob(serviceName);
        job.UserToken = SPContext.Current.Site.UserToken;
        for (int i = 0; i < inputFiles.Length; i++)
            job.AddFile(inputFiles[i], outputFiles[i]);
        job.Start();

        bool done = false;
        while (!done)
        {
            Thread.Sleep(5000);
            // Check the status of the job we just started
            ConversionJobStatus status = new ConversionJobStatus(serviceName, job.JobId, null);
            if (status.Count == (status.Succeeded + status.Failed + status.Canceled)) // everything done
            {
                done = true;
                // Only delete the input files of successful conversions
                foreach (ConversionItemInfo item in status.GetItems(ItemTypes.Succeeded))
                    SPContext.Current.Web.Files.Delete(item.InputFile);
            }
        }
    }
Now, clearly using Thread.Sleep isn't something you'd want to do if this is going to happen on many threads simultaneously on the server, but you get the idea – a workflow with a Delay activity is another example of a solution to this situation.
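As one possible alternative to the blocking loop, here is a minimal sketch of a timer-based poller. It is not part of the Word Automation Services API: CompletionPoller, isComplete, and onComplete are hypothetical names introduced for illustration. The idea is that the completion check (for example, the ConversionJobStatus comparison above) runs on a timer callback, so no thread sits in Thread.Sleep between checks.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of the Word Automation Services API.
// Runs a caller-supplied completion check on a timer instead of
// blocking a thread with Thread.Sleep between checks.
public sealed class CompletionPoller : IDisposable
{
    private readonly Func<bool> isComplete;
    private readonly Action onComplete;
    private readonly Timer timer;
    private int finished; // 0 = still polling, 1 = onComplete already fired

    public CompletionPoller(Func<bool> isComplete, Action onComplete, TimeSpan interval)
    {
        this.isComplete = isComplete;
        this.onComplete = onComplete;
        // Create the timer stopped, then start it, so the callback
        // can never run before the timer field has been assigned.
        this.timer = new Timer(Check, null, Timeout.Infinite, Timeout.Infinite);
        this.timer.Change(interval, interval);
    }

    private void Check(object state)
    {
        if (!isComplete())
            return;

        // Overlapping ticks are possible; make sure onComplete runs once.
        if (Interlocked.Exchange(ref finished, 1) == 0)
        {
            timer.Dispose();
            onComplete();
        }
    }

    public void Dispose()
    {
        timer.Dispose();
    }
}
```

In the scenario above, isComplete would wrap the status check and onComplete would delete the successfully converted input files. Whether a timer is appropriate depends on where the code is hosted; list events or a workflow remain the options named in this post.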
The maximum throughput of the service is essentially mathematically defined at configuration time:
By default, these values are:
You can tune the frequency as low as one minute, or increase the number of files or the number of worker processes, to raise this maximum, depending on how you want to trade off higher throughput against higher CPU utilization – you might keep it low if the conversion process is low-priority and the server is used for many other tasks, or crank it up if throughput is paramount and the server is dedicated to Word Automation Services.
For server health, we recommend that two constraints be followed in this equation:
Of course, by adding CPU cores and/or application servers, this still allows for an unbounded maximum throughput.
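To make the arithmetic concrete, here is a rough sketch of how the three settings mentioned above (frequency, files per process, worker processes) might combine into a daily ceiling. The model is an assumption: it supposes each worker process starts up to a fixed number of files once per interval. ThroughputEstimate and MaxFilesPerDay are hypothetical names, and the values in Main are illustrative, not the service defaults.

```csharp
using System;

public static class ThroughputEstimate
{
    // Assumed model: every frequencyMinutes, each worker process
    // starts up to filesPerProcess conversions.
    public static double MaxFilesPerDay(int frequencyMinutes, int filesPerProcess, int workerProcesses)
    {
        if (frequencyMinutes < 1)
            throw new ArgumentOutOfRangeException("frequencyMinutes");

        const double MinutesPerDay = 24 * 60;
        double batchesPerDay = MinutesPerDay / frequencyMinutes;
        return batchesPerDay * filesPerProcess * workerProcesses;
    }

    public static void Main()
    {
        // Illustrative values only: a 15-minute interval gives 96 batches
        // per day; 96 * 100 files * 2 processes = 19200.
        Console.WriteLine(MaxFilesPerDay(15, 100, 2));
    }
}
```

Under this model, halving the interval or doubling either of the other two settings doubles the ceiling, which is why adding cores or application servers scales the maximum.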
That's a high-level overview of how the system works – in the next post, I'll drill into a couple of scenarios that illustrate typical uses of the service.
- Tristan
Comments: (16)
Everything is fine, but a slightly more condensed version would be better.
My company has developed a product on top of SharePoint (can also be used without SharePoint) to do PDF conversion of the various Office formats (Word, Excel, PowerPoint, InfoPath, Visio, Publisher, etc.). At some stage it makes sense for us to add SharePoint 2010's native Word PDF conversion service as an option. However, as our service may be installed on a separate machine that has no knowledge of SharePoint, how can we access the Office PDF conversion? Can it be accessed via the Job object model only, or is there also a remote Web Services interface? Are you planning any further options such as PDF security and watermarking, or is that something we'll need to continue to do in our own application? Our Web Services interface is described at www.muhimbi.com/.../converting-office-files-to-pdf-format.html
Is it possible to use the engine itself, without the use of the queueing mechanism? And assuming the core engine itself does not technically rely on the existence of a SharePoint instance + site collection: can it be used from outside SharePoint? It would be nice if a non-SharePoint solution could also use this feature.
@Jeroen: Out of the box, you must use the provided OM to run the service; it's fairly trivial to write your own web service on top of that API, though, and we imagine that many folks will do just that to get access to it from off-machine. I'll try to blog an example of that in the future.
@Robert: It is not possible - right now, the system only works via the queue. It's also not usable outside of SharePoint - building into SharePoint provided us with a significant number of benefits, as the platform is very well-developed. Good feedback for the future, though.
I found this site to be helpful. Microsoft is a fair company that has had great influence on my schooling. I look forward to trying this version and seeing if it is the right fit for me.
Hi Tristan. I have managed to set up a SharePoint 2010 server with the sole intention of creating conversion jobs. I have managed to connect to this server using the SharePoint DLLs, so I am happy that part is working. I am now trying to use the code posted above in this blog but am unsure which libraries I need to include to use the ConversionJob object. Many thanks for your help, Lewis.
Hi, I found the libraries I was looking for but am having trouble getting this to work. I'm assuming the above code is running in the context of SharePoint. I am trying to run the code from a client machine and am struggling to find out how to point it to the server to set up a conversion job there. Any help is greatly appreciated.
Will there be a way to get a page count of either the source document, or the converted documents?
@Lewis: The API for the service can only be called from the machine running SharePoint itself - to start conversions from a client machine, you'd need to write some simple web services that you could call remotely; they should be easy to create. I'll add an example of this to my list of future blog topics.
@Balazs: If you convert the file to XPS, for example, counting the number of pages in the document should be trivial by parsing the XML.
+1 for getting this unmarried from SharePoint, please!
@Michael Teper: Thanks for the feedback - if the service was still part of SharePoint in the future, but could read/write files from non-SharePoint locations (so you could use the SharePoint machine simply for conversions, and not file storage), would that be sufficient to solve your scenario?
I've tried this on a few environments, stand-alone, farm, Beta 2, RC, Server 2008 and Server 2008 R2 (I have a big tree of VM snapshots). On each I get the same error when calling the ConversionJob constructor - NullReference exception. It's coming from internal static WordServiceProxy FindServiceProxyCore(SPFarm farm, out string errorString) { WordServiceProxy proxy = farm.ServiceProxies.GetValue (); I believe I have followed the instructions on setting this up to the letter, but no dice. Can you suggest anything I could try to verify that this service actually works? Note: I believe the setups are clean because I am building demos for a wide variety of service application features and they are all working. Thanks in advance!
I forgot to change the build target to x64.
I am trying to do equations but I cannot figure out how to get my bottom denominator to work for me. I am having problems with it. I have worked and worked with it until I get frustrated with it.