A Case Study in Website Maintenance (CMS) Using XML
|
PROJECT PURPOSE:
- Provide a simple means for managing website content (CMS) for use by people
with little or no training in HTML or, for that matter, web pages.
|
DESIGN GOALS:
- Maintain overall integrity of the look and feel of the website while editing.
- Provide as much WYSIWYG editing as practical.
- Provide easy, but secure, access to the page editing forms.
- Use intuitive and unobtrusive editing interface elements.
- Ensure page is previewed before publication.
- Ensure all images used on a page are also published to the server.
- Archive previous versions of both the base content (.XML) and final website page (.HTML) files.
|
SAMPLE WEBSITE:
- To view and use the technologies discussed on an actual sample website, go to:
|
http://www.ElectronicSolutionsCo.com/xmledit/demo.html
|
Simple editing instructions and caveats are provided along with the demo.
|
WHAT YOU CAN EXPECT FROM THIS CASE STUDY:
- The thoughts of and lessons learned by developers who started with a simple (mis?)understanding of
XML and XSLT technologies, forged ahead regardless, and successfully put together a technology package
satisfying the project requirements. Were we over budget and behind schedule? You betcha.
Fortunately, what we lost in time and money on this first project was viewed from the beginning as an
investment in XML technologies — an investment that returned a firm understanding of the usefulness
and applicability of XML and XSLT for our use along many future avenues.
|
WHAT YOU SHOULD NOT EXPECT FROM THIS CASE STUDY:
|
PARTICIPANTS:
- Electronic Solutions Company — ESC is
an engineering consulting firm with a
broad skill set
currently focusing on the home automation and management market. ESC is the "we" in this case study
providing all XML, XSLT, and related JavaScript and C++ programming.
- The Harrington Group — The Corporate Communications
Specialist providing training, advertising, and promotional programs for broadcast, video, DVD,
multimedia, print and public relations. THG provided all website design and the foundation HTML and DHTML
used for the original client website and thus the sample pages included here.
|
TECHNOLOGIES:
- XML — holds underlying website content.
- XML Schema — provides validation of XML content.
- XSLT — transforms XML to XHTML for either an XHTML website page or
for a content editing form.
- XHTML — HTML with rules; provides cross-browser compatible version of website page for viewing
by all site visitors.
- CSS — removes the bulk of a page's formatting instructions to a separate file
where global changes can be easily implemented.
- Microsoft XML Parser (MSXML) Version 3 — incorporated in Internet Explorer;
used for client edit form view and page preview.
- Microsoft XML Parser (MSXML) Version 4 — used on web server for final page
transformation.
- JavaScript — provides client handling of page editing input controls including
DOM manipulation, XSLT transformations, and page publication.
- Microsoft ActiveX Data Object (ADO) database access — not used for any actual database access; instead
ADO provided an easy (and possibly the only) access method to binary image files on the local client for uploading to the
server as part of the XML.
- Internet Information Server (IIS) V5 on Windows 2000 — website server.
- Internet Server Application Programming Interface (ISAPI) — provides interface
between IIS and ESC's XMLEdit DLL.
- ESC's XMLEdit ISAPI DLL — provides server side transformation of XML into final XHTML
page; also archives previous versions of XML and XHTML files.
- Internet Explorer Menu Extensions — provides right-click context menu access to
the page editing form.
|
DEVELOPMENT SEQUENCE:
- Locate sample code to perform XSLT transformations on XML using MSXML, and develop
a minimal test case to verify reasonable sanity of the concept.
The sample code was Microsoft's MSXSL command line processor.
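A sanity check along these lines needs nothing more than a trivial XML file and a trivial stylesheet. The file names and content below are illustrative only, not the project files:

    <?xml version="1.0"?>
    <!-- test.xml : trivial content file -->
    <Page>
      <Title>Sanity Check</Title>
    </Page>

    <!-- test.xsl : trivial transform to an HTML page -->
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="html"/>
      <xsl:template match="/Page">
        <html>
          <head><title><xsl:value-of select="Title"/></title></head>
          <body><h1><xsl:value-of select="Title"/></h1></body>
        </html>
      </xsl:template>
    </xsl:stylesheet>

Feeding the pair through MSXSL (something like msxsl test.xml test.xsl -o test.html) and opening the result in a browser is enough to prove the whole chain end to end.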
- Design and implement the website's basic graphics package and populate pages with representative
content. This was performed by a website design firm with no instruction
regarding the eventual use of XML to maintain website content. We had worked with
the design firm before and expected their results would include a consistent look and
feel among pages that would lend itself to simpler XSLT transformations. And, indeed,
that is what was delivered.
- Identify the primary types of pages included on the website. On this project,
there were three: the home page, interior content pages, and email contact forms.
- Identify and separate variable content structures from base page elements.
Not surprisingly, the base page elements included the <head> element
and its contents, the navigation menu structures, and the page footer.
- Identify commonalities and differences among base page elements for
the different primary types of pages. On this project, the home page had slightly enhanced
graphics at the top of the page compared to the inner pages, and obviously, the
<title>, <meta keywords="...">, and <meta description="..."> elements
would change from page to page.
- Identify commonalities and differences among variable content structures for
all types of pages.
- Develop XML Schema representing variable content structures. The XML Schema
continued to evolve throughout the project.
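As a rough illustration of what a variable content structure looks like in schema form (the element names here are simplified stand-ins, not the project schema):

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <!-- A hypothetical content section: any mix of text blocks and images -->
      <xs:element name="Section">
        <xs:complexType>
          <xs:choice minOccurs="0" maxOccurs="unbounded">
            <xs:element name="Paragraph" type="xs:string"/>
            <xs:element name="BoldIndent" type="xs:string"/>
            <xs:element ref="Image"/>
          </xs:choice>
        </xs:complexType>
      </xs:element>
      <!-- An image carries its caption as content and its options as attributes -->
      <xs:element name="Image">
        <xs:complexType>
          <xs:simpleContent>
            <xs:extension base="xs:string">
              <xs:attribute name="src" type="xs:string"/>
              <xs:attribute name="delete" type="xs:boolean"/>
            </xs:extension>
          </xs:simpleContent>
        </xs:complexType>
      </xs:element>
    </xs:schema>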
- Migrate sample page(s) from HTML to XML content files – a fairly straightforward
process. Familiarity with the XML Schema allowed us to quickly cut and paste from the
original HTML pages into the corresponding XML files.
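A cut-down content file gives the flavor (the names here are illustrative, not the project vocabulary); the xml-stylesheet processing instruction at the top is what later lets IE build the edit form on the fly:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="editform.xsl"?>
    <Page>
      <Title>About Our Services</Title>
      <Keywords>home automation, consulting</Keywords>
      <Description>Sample interior content page</Description>
      <Section>
        <Paragraph>Text cut and pasted from the original HTML page.</Paragraph>
        <Image src="images/office.jpg">Our facility</Image>
      </Section>
    </Page>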
- Develop XSLT transforms to convert XML content files to XHTML web pages. Let the learning
curve begin... (For such a short item in this list, this took a significant chunk of time.)
- Copy and modify the sample HTML pages to display form elements to edit the page content.
Blocks of text became <textarea> elements and single lines of text became <input type="text"...> elements.
Page images required substantially more information, and a group of appropriate input elements was defined to support this.
For the most part, style information for web page elements was not an editable option. Instead, this information was typically
carried in the XML element type, e.g. Paragraph vs. BoldIndent. As the number of style options increased, there
seemed to be a natural breakover from XML element types with fixed styles to element types with attributes that define
optional style characteristics, e.g. an image caption's size, color, and position.
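In XSLT terms the mapping is small; a sketch with hypothetical element and class names:

    <!-- In the web page transform, a Paragraph becomes a styled <p> -->
    <xsl:template match="Paragraph">
      <p class="bodyText"><xsl:value-of select="."/></p>
    </xsl:template>

    <!-- In the edit form transform, the same Paragraph becomes a <textarea> -->
    <xsl:template match="Paragraph">
      <textarea class="bodyText" rows="6" cols="60">
        <xsl:value-of select="."/>
      </textarea>
    </xsl:template>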
- Copy the original XSLT transforms to another set of transforms incorporating the HTML modifications to display edit forms as
designed above. The initial cut at this actually took very little time, much less than determining
what form elements to put where for each XML element.
- Develop JavaScript support for all edit form elements. Although already familiar with basic JavaScript, not being
familiar with manipulating the XML DOM slowed the initial progress on this.
As familiarity grew with manipulating the XML DOM, the JavaScript development gathered momentum.
Development bugs in the JavaScript actually facilitated learning a substantial amount about both the HTML and XML DOMs, since the debugger
made it easy to pry into nodes and traverse most of the node tree.
- With a newfound understanding of XML, XSLT, XML DOM, and the integration and interactions
of these technologies with each other, redefine the variable content structures, the supporting
XML Schema, and both flavors of XSLT transformations: web page and edit form. (Every good project should
have this step, don't you think?)
- Develop the ISAPI DLL to support server-side page publication and archiving. ISAPI DLLs allow C/C++ programmers to
manipulate the server and its delivery of content. To date, everything that we have attempted using ISAPI has been possible.
The downside to ISAPI DLLs is that many web hosting companies take an extremely cautious approach to allowing them on their
servers, and deservedly so: everything that we have attempted using ISAPI has been possible.
- Polish, polish, and polish the edit form user interface. Here we discovered we could apply the CSS styles to the various
text edit boxes to enhance WYSIWYG. Looking back, the concept seems somewhat obvious. It was just one of those things that
wasn't tried, for some reason, until now.
The hint text above the text entry boxes was put in place to let the user know what was
expected as input.
Also this is where the little red arrows with the mouseover action were implemented. Again, this greatly improved the
WYSIWYG feel of the edit form by eliminating the large list boxes from the page except when the user needed them.
And finally, additional JavaScript and ISAPI C++ code was implemented to work around IE input element navigation issues and IE DOM
manipulation crashes.
XSLT really shined (all that polish, you know) for this sort of after-the-design-stage modification.
A much higher confidence with and understanding of XSLT made it a simple matter to modify the XSLT to
try different UI tweaks, and since the edit form page is built on the fly by IE using the stylesheet processing instruction
in the XML file, testing the XSLT changes was simply a matter of clicking the IE Refresh button.
|
SUCCESSES: |
- No doubt about it. XSLT delivers as promised. Fundamentally, we had three XSLT files:
- one for common website elements and XSLT function templates,
- one for edit page display, and
- one for webpage publication.
Using additional wrapper files (used for nothing more than setting a global variable indicating if the
XML being transformed should receive the home page graphics treatment),
the XML content was easily transformed into whatever XHTML was required for the task at hand.
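The wrapper idea is almost embarrassingly small; with made-up file and variable names it amounts to:

    <!-- In common.xsl: the default is the interior-page treatment -->
    <xsl:variable name="isHomePage" select="false()"/>

    <!-- homepage.xsl: a wrapper whose only job is to flip that switch -->
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:import href="common.xsl"/>
      <xsl:variable name="isHomePage" select="true()"/>
    </xsl:stylesheet>

Because xsl:import gives the importing stylesheet precedence, the wrapper's value of the variable wins, and the shared templates simply test $isHomePage to decide which graphics treatment to emit.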
- Tying XHTML form elements to the underlying XML DOM node through an addressing scheme
that indexes a node by starting at the root node and working down through the node
tree until you get to the current node (basically count preceding siblings at each
level of the node tree). All those JavaScript handlers for the edit page
form elements become much easier when you have the ability to easily access
ancestral nodes without passing them throughout your code as multiple parameters. Knowing
the id of the XHTML element being processed allowed us to find the corresponding XML node with ease,
and vice versa.
This became extremely important when inserting or deleting nodes within the XML DOM tree. First you had
to transform the parent node of the new or deleted node to derive the new XHTML to display on the
edit page. Next you had to find the XHTML "div" element corresponding to the newly transformed XML parent
node. After you had this, it was a simple matter to update the ".innerHTML" of the div and
display the modified XML for further editing.
The use of the addressing scheme allowed us to easily get the parent node from the newly inserted or deleted node.
More importantly, it allowed us to search through the XHTML DOM for the first available ancestral "div" node
that could be used to update the XHTML. Fortunately, through all the XML manipulations and transformations, the addressing
scheme worked consistently and the updated XHTML simply reflected the new addresses assigned to the original nodes.
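On the XSLT side, generating that address for each form element is a short walk down the ancestor chain. The template and element names in this sketch are hypothetical:

    <!-- Builds an address such as "0.3.1" by counting preceding siblings
         at each level, from the root element down to the current node -->
    <xsl:template name="nodeAddress">
      <xsl:for-each select="ancestor-or-self::*">
        <xsl:if test="position() &gt; 1">.</xsl:if>
        <xsl:value-of select="count(preceding-sibling::*)"/>
      </xsl:for-each>
    </xsl:template>

    <!-- Stamped onto each edit control so the JavaScript can find its XML node -->
    <xsl:template match="Paragraph">
      <textarea rows="6" cols="60">
        <xsl:attribute name="id">
          <xsl:call-template name="nodeAddress"/>
        </xsl:attribute>
        <xsl:value-of select="."/>
      </textarea>
    </xsl:template>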
- The use of a "delete" attribute allowed many delete operations to be reversed without the loss of
information contained in any child nodes. Setting the delete attribute allowed the XSLT transform to simply
insert a placeholder on the edit page to restore the deleted node. Deleted nodes were later scrubbed from the
XML when the page was published.
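On the edit form side this is a one-template affair (a sketch, with a made-up class name); the publication transform scrubs the flagged nodes, as shown in the identity transform sketch further below:

    <!-- Any element flagged for deletion is shown only as a restorable placeholder -->
    <xsl:template match="*[@delete='true']">
      <div class="deletedItem">[deleted - click to restore]</div>
    </xsl:template>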
- The use of an XSLT identity transform with appropriate modifications to filter deleted, empty, and unnecessary elements from
the modified XML when used to publish a new page. This was the smallest of transforms, but that made it no less tricky. This is
a fundamental methodology in XSLT, and many, many examples of modified identity transforms are to be found on the Internet.
Just to strengthen this point, it wasn't until these sample pages were being put together that we discovered you could
copy processing-instructions in an identity transform (or any other transform) as well.
Another gotcha came when handling
simple-content XHTML nodes embedded within other XML nodes. Special handling had to be included to pass <br />'s and <hr />'s
through to the output file.
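Pulled together, the publication filter is only a few templates layered over the identity transform. This sketch follows the delete-flag and br/hr conventions described above; it is not the project file itself:

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <!-- Identity rule: copies elements, attributes, text, comments, and
           processing instructions straight through -->
      <xsl:template match="@*|node()">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>

      <!-- Scrub nodes flagged for deletion and empty leftovers,
           but let <br/> and <hr/> pass through -->
      <xsl:template match="*[@delete='true'] | *[not(node())][not(self::br or self::hr)]"/>

      <!-- Drop the delete flag itself from anything that survives -->
      <xsl:template match="@delete"/>
    </xsl:stylesheet>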
- When a global change to navigation elements on the website occurred, it was extremely easy to rebuild all the XHTML pages
to incorporate the change. The use of Microsoft's MSXSL command line utility and an appropriate batch file allowed a complete
rebuild in an extremely short time compared to what it would take for hand editing every file.
- A small detail such as automatically changing the "This page was last updated on..." statement at the bottom
of the page was possible using XSLT and MSXML's support of JavaScript functions within the transform. A page gets published;
the publication date appears at the bottom of the page without user intervention.
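MSXML's msxsl:script element is what makes this possible. A sketch, with a hypothetical footer template and script prefix:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:msxsl="urn:schemas-microsoft-com:xslt"
        xmlns:esc="urn:esc-script"
        exclude-result-prefixes="msxsl esc">

      <!-- A small JScript helper compiled by MSXML and callable from XPath -->
      <msxsl:script language="JScript" implements-prefix="esc">
        function today() {
          var d = new Date();
          return (d.getMonth() + 1) + "/" + d.getDate() + "/" + d.getFullYear();
        }
      </msxsl:script>

      <xsl:template name="pageFooter">
        <p class="footer">This page was last updated on <xsl:value-of select="esc:today()"/>.</p>
      </xsl:template>
    </xsl:stylesheet>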
- Internet Explorer's built-in and proper handling of XML files and embedded XSLT
specifications certainly simplified access to page editing forms.
- The ease with which IE's right-click context menu can be extended. Simply create
an HTML file with appropriate scripting inside and enter its location in the registry.
A downloadable .reg file is provided to add the "XMLEdit Edit Page" menu
item to the context menu. After downloading, navigate to the destination directory and double-click
on XMLEdit.reg to register the IE context menu entry. For easy editing of any demo .html page, right-click
to display IE's context menu, and then select the XMLEdit Edit Page menu item.
|
TRIALS & TRIBULATIONS:
- First and foremost, XSLT is painful to learn. How the simple and elegant concepts of XSLT can become a learning curve cliff
is truly amazing. Between figuring out the somewhat mystical (and magical) XPath syntax and how to use named templates
as callable "functions", one can chew through a substantial amount of the schedule. (For the record, 15+ years of
FORTRAN, PASCAL, multiple assembly languages, and C/C++ expertise was brought to bear on the project. XSLT/XPath was one of
the more challenging new experiences.)
- XML namespaces. Another simple concept with a major pain factor. The problem is not in understanding their purpose and use; the problem is keeping
them out of your output. It took until just recently to kill various and sundry namespace declarations sprinkled throughout the XHTML output.
Really nothing more than discovering (and truly understanding) the exclude-result-prefixes attribute in the <xsl:stylesheet...>
element. Yet another simple concept, but just like Friendly's seven-scoop super sundae, too many simple concepts in one bowl
can still slow you down.
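For anyone fighting the same battle, the cure really is that small. A sketch (the extension namespace here is just an example of one that would otherwise leak into the output):

    <!-- Without exclude-result-prefixes, declarations such as
         xmlns:msxsl="urn:schemas-microsoft-com:xslt" get copied onto every
         literal result element and litter the XHTML output -->
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:msxsl="urn:schemas-microsoft-com:xslt"
        exclude-result-prefixes="msxsl">

      <xsl:template match="Paragraph">
        <p class="bodyText"><xsl:value-of select="."/></p>
      </xsl:template>
    </xsl:stylesheet>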
- XML, XSD, and XSLT processing can lead to, shall we say, idiosyncrasies. Many were misconceptions on our part or bugs
in our code that needed to get sorted out; many were defects in others' code. To this day, the addition and deletion of
"a number" of XML elements with updates to the IE display of these elements crashes IE. This led to the "Save"
button on the page editing forms.
The specifications for these standards are voluminous to say the least — a testament to the drafters' attempts to cover
all the possibilities. Unfortunately, as with any legal contract, the larger it becomes, the more vulnerable it becomes to
misinterpretation by different parties. Add to this large, multi-programmer software development efforts, and it's no wonder
many XSLT developers use multiple engines to test their transforms. Sanity checks in an insane environment are de rigueur.
- Really basic information on how to handle different XML integration issues is hard to find on the Internet.
It took more than a day just to
find out how to access the XML DOM in JavaScript. There were plenty of examples of what to do with it once it was had, but simply accessing
it when it was automatically loaded by IE was lost among these details. (For anyone on the quest for this answer: document.XMLDocument is the root node for the
XML DOM and document.XSLDocument is the root node for the associated XSLT. These get you what's easily available, which is not necessarily what you need! Some XML DOM
methods are only available with a Msxml.FreeThreadedDOMDocument object; document.XMLDocument returns a straight Msxml.DOMDocument.)
Many XML/XSLT/JavaScript-related web searches yield way too many results to find the proverbial needle, and most results deal with more subtle issues than what you should have learned in XML Usage 101.
The various XML DOM and XSLT tutorials found on the Internet dealt primarily with the individual technologies. A tutorial on tying them all together within a
JavaScript framework was never found. For our C/C++ ISAPI DLL, we started with and modified Microsoft's MSXSL command line utility.
- The page editing concept fell apart when faced with pages that contained forms themselves (e.g. email contact pages). It's unclear how large a headache it would be to support
these within an XML definition and provide an editing mechanism for them. Quite frankly, the hangover from learning XSLT left no desire to even begin to tackle the concept of
forms-within-forms XHTML. Another day...
|
LESSONS LEARNED:
- Use variables.
- Use lots of variables. It may or may not be efficient; no testing was done either
way. Until the transformation comes along that requires the ultimate in efficiency,
variables can greatly clarify the XSLT. Even if an XPath expression is used only once, if
the expression is "long", assigning it to a meaningfully named variable with fewer
characters will go a long way toward readability in your XSLT.
Another use for variables (and this one probably suffers more of a performance penalty)
is to break highly convoluted XPath expressions into pieces with the results placed
into appropriately named variables. There appear to be a dozen or so experts on
the Mulberry
Technologies XSLT discussion list who excel at pulling XPath expressions apart.
The rest of us may be better off with self-documenting baby steps.
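A contrived before-and-after (the element and attribute names are invented for the example) shows the idea:

    <!-- One long expression, repeated wherever it is needed... -->
    <xsl:value-of select="count(/Page/Section[not(@delete='true')]/Image[@caption-position='right'])"/>

    <!-- ...versus baby steps with self-documenting names -->
    <xsl:variable name="liveSections" select="/Page/Section[not(@delete='true')]"/>
    <xsl:variable name="rightCaptionImages" select="$liveSections/Image[@caption-position='right']"/>
    <xsl:value-of select="count($rightCaptionImages)"/>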
- Learn to use named templates as functions for assigning values to variables.
It's a whole new concept when it comes to returning results from
a named template (Ok, so all of XSLT is a whole new concept), but once the light bulb
goes on, function templates can make initializing all those variables much easier.
The key is to remember that whatever would have gone to the output stream now ends
up in a variable. Your XSLT processor's node-set extension function will be your
friend when it comes to further processing any returned XML fragments.
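A sketch of the pattern using MSXML's node-set() extension; the template and element names are illustrative:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:msxsl="urn:schemas-microsoft-com:xslt"
        exclude-result-prefixes="msxsl">

      <!-- A named template used as a "function": what it would have written
           to the output lands in the calling variable instead -->
      <xsl:template name="imageInfo">
        <xsl:param name="image"/>
        <src><xsl:value-of select="$image/@src"/></src>
        <alt><xsl:value-of select="$image"/></alt>
      </xsl:template>

      <xsl:template match="Image">
        <xsl:variable name="info">
          <xsl:call-template name="imageInfo">
            <xsl:with-param name="image" select="."/>
          </xsl:call-template>
        </xsl:variable>
        <!-- $info is a result tree fragment; node-set() makes it navigable again -->
        <img src="{msxsl:node-set($info)/src}" alt="{msxsl:node-set($info)/alt}"/>
      </xsl:template>
    </xsl:stylesheet>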
- For those comfortable with text editing
and the concepts of markup, it quickly became apparent that editing XML content
is much preferred to editing actual HTML.
|
RESOURCES:
- XSLT Tutorial — The ZVON XSLT Tutorial is
straightforward and to the point. Great for understanding XSLT usage after a familiarization with basic XSLT concepts.
- XSLT Email Discussion List — Everybody's favorite, the Mulberry Technologies
XSLT discussion list gets you answers to the subtleties of XSLT (and these can be very subtle indeed...)
- XML, XSLT, & XPath References — TopXML
has references for Microsoft's XML DOM, XSLT, and XPath. These are often more readable than the original source documents
upon which they are based. Be aware that the readability is achieved by sacrificing some detail.
- EXSLT — From the EXSLT website: "EXSLT is a community initiative to provide extensions to XSLT."
EXSLT is the XSLT equivalent of a C-Runtime library. It defines elements and functions that resolve
many common, but sticky, XSLT situations. In addition to the definitions (and probably even more important),
different implementations
are available for use in different XSLT processing environments.
- The actual Microsoft XML DOM
Reference freely admits when it strays from the official W3C DOM specification. This reference combined with a good JavaScript debugger lets you find
little treasures hidden throughout the DOM objects.
|