Share: Simple Word document templating using Ruby and XML

kevinhua · September 18, 2013 · 2840 views

Cited from: http://tomasvarsavsky.com/2009/04/04/simple-word-document-templating-using-ruby-and-xml/

In my current project we have a requirement to merge simple data into Microsoft Word document templates. Ruby comes with the WIN32OLE library, which can manipulate Office documents. WIN32OLE has a few major downsides: it only runs on Windows, it requires Microsoft Office to be installed, and it works by sending commands to Word itself to perform operations. Using Word as a back-end system for a web application used by 50 people made us nervous, so a different approach was needed. We came up with a combination of Ruby, the Office Open XML file format, XML processing with Nokogiri and native Zip libraries that works.

Office Open XML file formats

The new Office file formats (.docx, .xlsx, .pptx files) are basically a zipped collection of XML files. We focused on Word files (.docx), but this approach would work with any of the other file types as well. The specification for the format weighs in at several thousand pages. Producing a file from scratch without a purpose-built library that handles all the intricacies of the format would be quite a task. Instead, we drafted the templates in Word and placed markers to tell our templating engine where to insert values.

We created document properties which reference data values and added these as fields into the document at the places where the values should be inserted. For example, we could have fields like:

label_tag #{data[:user].name}
label_tag #{data[:user].address}
label_tag #{data[:booking].number}
label_tag #{data[:booking].items.collect{|i| i.name}.join(',')}

If it looks a bit like Ruby code, it’s because it is! The expressions get evaluated by our templating engine and the results are inserted into the document. Ruby in Word documents, a world first?

Opening the documents

To read and create documents we need to unzip and re-zip the document. We had trouble using Ruby’s standard RubyZip library. For some reason Word gave a nasty warning when opening files created with RubyZip. Our application has to run on Windows, Linux and Mac, so we created an adapter that delegates to the standard operating system zip executables based on the host platform. To keep it fast, we extract and re-add only the files that we need to work on. This is important because some documents can become very large when they contain embedded objects such as images.

Processing the template

The document content can be found in the file word/document.xml inside the zip archive. The fields in the template come out as fldSimple tags that look like this:

Template Field: User Name
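As a rough illustration of how such fields can be located, the sketch below parses a small, invented sample of word/document.xml and picks out the fldSimple elements whose w:instr attribute mentions label_tag. The sample markup, including the DOCPROPERTY-style instruction, is an assumption about the general shape of a simple field and is not the original template. REXML from Ruby's bundled libraries is swapped in for Nokogiri here so that the sketch runs without extra gems.

```ruby
require "rexml/document"

# Namespace URI for the w: prefix in WordprocessingML documents.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Invented sample content (assumption); real documents carry far more markup.
# The single-quoted heredoc keeps #{...} and the backslash literal.
SAMPLE_XML = <<~'XML'
  <w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
    <w:body>
      <w:p>
        <w:fldSimple w:instr=" DOCPROPERTY &quot;label_tag #{data[:user].name}&quot; \* MERGEFORMAT ">
          <w:r><w:t>Template Field: User Name</w:t></w:r>
        </w:fldSimple>
      </w:p>
      <w:p>
        <w:fldSimple w:instr=" PAGE ">
          <w:r><w:t>1</w:t></w:r>
        </w:fldSimple>
      </w:p>
    </w:body>
  </w:document>
XML

doc = REXML::Document.new(SAMPLE_XML)

# Collect every simple field, then keep only those that carry a template marker.
fields = REXML::XPath.match(doc, "//w:fldSimple", { "w" => W_NS })
template_fields = fields.select do |el|
  el.attributes["w:instr"].to_s.include?("label_tag")
end

# Print the field instruction of each template field.
template_fields.each do |el|
  puts el.attributes["w:instr"].strip
end
```

Filtering in Ruby after a plain element lookup keeps the XPath expression trivial; with Nokogiri the same filter can be written directly as an XPath predicate.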

To process document.xml we simply need to find all the fields that have the text label_tag in the w:instr attribute:

xml.xpath("//w:fldSimple[contains(@w:instr, 'label_tag')]").each do |element|
  # process each element here
end

The rest is simple. We extract the expression in the element text using a regular expression, evaluate it and insert it back into the XML which ends up looking like this:

Tomas Varsavsky
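The extract-and-evaluate step can be sketched in plain Ruby. This is a minimal illustration, not the engine from the post: the helper names extract_expression and evaluate_expression, the sample data, and the exact form of the instruction string are all hypothetical. Because eval runs whatever Ruby the template contains, a setup like this is only reasonable when templates come from trusted authors.

```ruby
# Hypothetical sample data; the real application's data hash is not shown in the post.
User = Struct.new(:name, :address)
data = { user: User.new("Tomas Varsavsky", "1 Example Street") }

# Assumed shape of a field instruction. Single quotes keep #{...} literal.
instr = ' DOCPROPERTY "label_tag #{data[:user].name}" \* MERGEFORMAT '

# Returns the Ruby expression found inside label_tag #{...}, or nil if absent.
# The greedy match runs to the last "}" so expressions containing blocks,
# such as collect{|i| i.name}, are captured whole.
def extract_expression(instr)
  match = instr.match(/label_tag\s+\#\{(.*)\}/m)
  match && match[1]
end

# Evaluates the expression with `data` in scope and returns the result as text.
def evaluate_expression(expression, data)
  # `data` is visible to the evaluated code through this method's binding.
  eval(expression, binding).to_s
end

expression = extract_expression(instr)
value = evaluate_expression(expression, data)
puts value
```

The resulting string is what would replace the field's display text in the XML.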

We add the attribute fldLock with value true to make the field read-only so the user cannot change it when they open the document.

We also have tags to create lists, insert rows into tables and duplicate sections in the document. These are a bit more complicated in their XML manipulation. Beware, we had a few issues dealing with Word’s nasty XML, which can vary a bit between versions and sometimes does unexpected things with formatting.

Conclusion

This approach worked really well for us and I would recommend it for simple field merging.
