A Case Study in Website Maintenance (CMS) Using XML
|
PROJECT PURPOSE:
- Provide a simple means for managing website content (CMS) for use by people
with little or no training in HTML or, for that matter, web pages.
|
DESIGN GOALS:
- Maintain overall integrity of the look and feel of the website while editing.
- Provide as much WYSIWYG editing as practical.
- Provide easy, but secure, access to the page editing forms.
- Use intuitive and unobtrusive editing interface elements.
- Ensure page is previewed before publication.
- Ensure all images used on a page are also published to the server.
- Archive previous versions of both the base content (.XML) and final website page (.HTML) files.
|
SAMPLE WEBSITE:
- To view and use the technologies discussed on an actual sample website, go to:
|
http://www.ElectronicSolutionsCo.com/xmledit/demo.html
|
Simple editing instructions and caveats are provided along with the demo.
|
WHAT YOU CAN EXPECT FROM THIS CASE STUDY:
- The thoughts of and lessons learned by developers who started with a simple (mis?)understanding of
XML and XSLT technologies, forged ahead regardless, and successfully put together a technology package
satisfying the project requirements. Were we over budget and behind schedule? You betcha.
Fortunately, what we lost in time and money on this first project was viewed from the beginning as an
investment in XML technologies — an investment that returned a firm understanding of the usefulness
and applicability of XML and XSLT for our use along many future avenues.
|
WHAT YOU SHOULD NOT EXPECT FROM THIS CASE STUDY:
|
PARTICIPANTS:
- Electronic Solutions Company — ESC is
an engineering consulting firm with a
broad skill set
currently focusing on the home automation and management market. ESC is the "we" in this case study
providing all XML, XSLT, and related JavaScript and C++ programming.
- The Harrington Group — The Corporate Communications
Specialist providing training, advertising, and promotional programs for broadcast, video, DVD,
multimedia, print and public relations. THG provided all website design and the foundation HTML and DHTML
used for the original client website and thus the sample pages included here.
|
TECHNOLOGIES:
- XML — holds underlying website content.
- XML Schema — provides validation of XML content.
- XSLT — transforms XML to XHTML for either an XHTML website page or
for a content editing form.
- XHTML — HTML with rules; provides cross-browser compatible version of website page for viewing
by all site visitors.
- CSS — removes the bulk of a page's formatting instructions to a separate file
where global changes can be easily implemented.
- Microsoft XML Parser (MSXML) Version 3 — incorporated in Internet Explorer;
used for client edit form view and page preview.
- Microsoft XML Parser (MSXML) Version 4 — used on web server for final page
transformation.
- JavaScript — provides client handling of page editing input controls including
DOM manipulation, XSLT transformations, and page publication.
- Microsoft ActiveX Data Object (ADO) database access — not used for any actual database access; instead
ADO provided an easy (and possibly the only) access method to binary image files on the local client for uploading to the
server as part of the XML.
- Internet Information Server (IIS) V5 on Windows 2000 — website server.
- Internet Server Application Programming Interface (ISAPI) — provides interface
between IIS and ESC's XMLEdit DLL.
- ESC's XMLEdit ISAPI DLL — provides server side transformation of XML into final XHTML
page; also archives previous versions of XML and XHTML files.
- Internet Explorer Menu Extensions — provides right-click context menu access to
the page editing form.
|
DEVELOPMENT SEQUENCE:
- Locate sample code to perform XSLT transformations on XML using MSXML, and develop
a minimal test case to verify reasonable sanity of the concept.
The sample code was Microsoft's MSXSL command line processor.
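A sanity check along these lines needs nothing more than a trivial XML file and a trivial stylesheet. The file names and content below are illustrative only, not the project files:

    <?xml version="1.0"?>
    <!-- test.xml : trivial content file -->
    <Page>
      <Title>Sanity Check</Title>
    </Page>

    <!-- test.xsl : trivial transform to an HTML page -->
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="html"/>
      <xsl:template match="/Page">
        <html>
          <head><title><xsl:value-of select="Title"/></title></head>
          <body><h1><xsl:value-of select="Title"/></h1></body>
        </html>
      </xsl:template>
    </xsl:stylesheet>

Feeding the pair through MSXSL (something like msxsl test.xml test.xsl -o test.html) and opening the result in a browser is enough to prove the whole chain end to end.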
- Design and implement the website's basic graphics package and populate pages with representative
content. This was performed by a website design firm with no instruction
regarding the eventual use of XML to maintain website content. We had worked with
the design firm before and expected their results would include a consistent look and
feel among pages that would lend itself to simpler XSLT transformations. And, indeed,
that is what was delivered.
- Identify the primary types of pages included on the website. On this project,
there were three: the home page, interior content pages, and email contact forms.
- Identify and separate variable content structures from base page elements.
Not surprisingly, the base page elements included the <head> element
and its contents, the navigation menu structures, and the page footer.
- Identify commonalities and differences among base page elements for
the different primary types of pages. On this project, the home page had slightly enhanced
graphics at the top of the page compared to the inner pages, and obviously, the
<title>, <meta keywords="...">, and <meta description="..."> elements
would change from page to page.
- Identify commonalities and differences among variable content structures for
all types of pages.
- Develop XML Schema representing variable content structures. The XML Schema
continued to evolve throughout the project.
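As a rough illustration of what a variable content structure looks like in schema form (the element names here are simplified stand-ins, not the project schema):

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <!-- A hypothetical content section: any mix of text blocks and images -->
      <xs:element name="Section">
        <xs:complexType>
          <xs:choice minOccurs="0" maxOccurs="unbounded">
            <xs:element name="Paragraph" type="xs:string"/>
            <xs:element name="BoldIndent" type="xs:string"/>
            <xs:element ref="Image"/>
          </xs:choice>
        </xs:complexType>
      </xs:element>
      <!-- An image carries its caption as content and its options as attributes -->
      <xs:element name="Image">
        <xs:complexType>
          <xs:simpleContent>
            <xs:extension base="xs:string">
              <xs:attribute name="src" type="xs:string"/>
              <xs:attribute name="delete" type="xs:boolean"/>
            </xs:extension>
          </xs:simpleContent>
        </xs:complexType>
      </xs:element>
    </xs:schema>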
- Migrate sample page(s) from HTML to XML content files – a fairly straightforward
process. Familiarity with the XML Schema allowed us to quickly cut and paste from the
original HTML pages into the corresponding XML files.
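A cut-down content file gives the flavor (the names here are illustrative, not the project vocabulary); the xml-stylesheet processing instruction at the top is what later lets IE build the edit form on the fly:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="editform.xsl"?>
    <Page>
      <Title>About Our Services</Title>
      <Keywords>home automation, consulting</Keywords>
      <Description>Sample interior content page</Description>
      <Section>
        <Paragraph>Text cut and pasted from the original HTML page.</Paragraph>
        <Image src="images/office.jpg">Our facility</Image>
      </Section>
    </Page>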
- Develop XSLT transforms to convert XML content files to XHTML web pages. Let the learning
curve begin... (For such a short item in this list, this took a significant chunk of time.)
- Copy and modify the sample HTML pages to display form elements to edit the page content.
Blocks of text became <textarea> elements and single lines of text became <input type="text"...> elements.
Page images required substantially more information, and a group of appropriate input elements was defined to support this.
For the most part, style information for web page elements was not an editable option. Instead, this information was typically
carried in the XML element type, e.g. Paragraph vs. BoldIndent. As the number of style options increased, there
seemed to be a natural breakover from XML element types with fixed styles to element types with attributes that define
optional style characteristics, e.g. an image caption's size, color, and position.
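In XSLT terms the mapping is small; a sketch with hypothetical element and class names:

    <!-- In the web page transform, a Paragraph becomes a styled <p> -->
    <xsl:template match="Paragraph">
      <p class="bodyText"><xsl:value-of select="."/></p>
    </xsl:template>

    <!-- In the edit form transform, the same Paragraph becomes a <textarea> -->
    <xsl:template match="Paragraph">
      <textarea class="bodyText" rows="6" cols="60">
        <xsl:value-of select="."/>
      </textarea>
    </xsl:template>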
- Copy the original XSLT transforms to another set of transforms incorporating the HTML modifications to display edit forms as
designed above. The initial cut at this actually took very little time, much less than determining
what form elements to put where for each XML element.
- Develop JavaScript support for all edit form elements. Although already familiar with basic JavaScript, not being
familiar with manipulating the XML DOM slowed the initial progress on this.
As familiarity grew with manipulating the XML DOM, the JavaScript development gathered momentum.
Development bugs in the JavaScript actually facilitated learning a substantial amount about both the HTML and XML DOMs, since the debugger
made it easy to pry into nodes and traverse most of the node tree.
- With a newfound understanding of XML, XSLT, XML DOM, and the integration and interactions
of these technologies with each other, redefine the variable content structures, the supporting
XML Schema, and both flavors of XSLT transformations: web page and edit form. (Every good project should
have this step, don't you think?)
- Develop the ISAPI DLL to support server-side page publication and archiving. ISAPI DLLs allow C/C++ programmers to
manipulate the server and its delivery of content. To date, everything that we have attempted using ISAPI has been possible.
The downside to ISAPI DLLs is that many web hosting companies take an extremely cautious approach to allowing them on their
servers, and deservedly so: everything that we have attempted using ISAPI has been possible.
- Polish, polish, and polish the edit form user interface. Here we discovered we could apply the CSS styles to the various
text edit boxes to enhance WYSIWYG. Looking back, the concept seems somewhat obvious. It was just one of those things that
wasn't tried, for some reason, until now.
The hint text above the text entry boxes was put in place to let the user know what was
expected as input.
Also this is where the little red arrows with the mouseover action were implemented. Again, this greatly improved the
WYSIWYG feel of the edit form by eliminating the large list boxes from the page except when the user needed them.
And finally, additional JavaScript and ISAPI C++ code was implemented to work around IE input element navigation issues and IE DOM
manipulation crashes.
XSLT really shined (all that polish, you know) for this sort of after-the-design-stage modification.
A much higher confidence with and understanding of XSLT made it a simple matter to modify the XSLT to
try different UI tweaks, and since the edit form page is built on the fly by IE using the stylesheet processing instruction
in the XML file, testing the XSLT changes was simply a matter of clicking the IE Refresh button.
|
SUCCESSES: |
- No doubt about it. XSLT delivers as promised. Fundamentally, we had three XSLT files:
- one for common website elements and XSLT function templates,
- one for edit page display, and
- one for webpage publication.
Using additional wrapper files (used for nothing more than setting a global variable indicating if the
XML being transformed should receive the home page graphics treatment),
the XML content was easily transformed into whatever XHTML was required for the task at hand.
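The wrapper idea is almost embarrassingly small; with made-up file and variable names it amounts to:

    <!-- In common.xsl: the default is the interior-page treatment -->
    <xsl:variable name="isHomePage" select="false()"/>

    <!-- homepage.xsl: a wrapper whose only job is to flip that switch -->
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:import href="common.xsl"/>
      <xsl:variable name="isHomePage" select="true()"/>
    </xsl:stylesheet>

Because xsl:import gives the importing stylesheet precedence, the wrapper's value of the variable wins, and the shared templates simply test $isHomePage to decide which graphics treatment to emit.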
- Tying XHTML form elements to the underlying XML DOM node through an addressing scheme
that indexes a node by starting at the root node and working down through the node
tree until you get to the current node (basically count preceding siblings at each
level of the node tree). All those JavaScript handlers for the edit page
form elements become much easier when you have the ability to easily access
ancestral nodes without passing them throughout your code as multiple parameters. Knowing
the id of the XHTML element being processed allowed us to find the corresponding XML node with ease,
and vice versa.
This became extremely important when inserting or deleting nodes within the XML DOM tree. First you had
to transform the parent node of the new or deleted node to derive the new XHTML to display on the
edit page. Next you had to find the XHTML "div" element corresponding to the newly transformed XML parent
node. After you had this, it was a simple matter to update the ".innerHTML" of the div and
display the modified XML for further editing.
The use of the addressing scheme allowed us to easily get the parent node from the newly inserted or deleted node.
More importantly, it allowed us to search through the XHTML DOM for the first available ancestral "div" node
that could be used to update the XHTML. Fortunately, through all the XML manipulations and transformations, the addressing
scheme worked consistently and the updated XHTML simply reflected the new addresses assigned to the original nodes.
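On the XSLT side, generating that address for each form element is a short walk down the ancestor chain. The template and element names in this sketch are hypothetical:

    <!-- Builds an address such as "0.3.1" by counting preceding siblings
         at each level, from the root element down to the current node -->
    <xsl:template name="nodeAddress">
      <xsl:for-each select="ancestor-or-self::*">
        <xsl:if test="position() &gt; 1">.</xsl:if>
        <xsl:value-of select="count(preceding-sibling::*)"/>
      </xsl:for-each>
    </xsl:template>

    <!-- Stamped onto each edit control so the JavaScript can find its XML node -->
    <xsl:template match="Paragraph">
      <textarea rows="6" cols="60">
        <xsl:attribute name="id">
          <xsl:call-template name="nodeAddress"/>
        </xsl:attribute>
        <xsl:value-of select="."/>
      </textarea>
    </xsl:template>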
- The use of a "delete" attribute allowed many delete operations to be reversed without the loss of
information contained in any child nodes. Setting the delete attribute allowed the XSLT transform to simply
insert a placeholder on the edit page to restore the deleted node. Deleted nodes were later scrubbed from the
XML when the page was published.
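On the edit form side this is a one-template affair (a sketch, with a made-up class name); the publication transform scrubs the flagged nodes, as shown in the identity transform sketch further below:

    <!-- Any element flagged for deletion is shown only as a restorable placeholder -->
    <xsl:template match="*[@delete='true']">
      <div class="deletedItem">[deleted - click to restore]</div>
    </xsl:template>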
- The use of an XSLT identity transform with appropriate modifications to filter deleted, empty, and unnecessary elements from
the modified XML when used to publish a new page. This was the smallest of transforms, but that made it no less tricky. This is
a fundamental methodology in XSLT, and many, many examples of modified identity transforms are to be found on the Internet.
Just to strengthen this point, it wasn't until these sample pages were being put together that we discovered you could
copy processing-instructions in an identity transform (or any other transform) as well.
Another gotcha came when handling
simple-content XHTML nodes embedded within other XML nodes. Special handling had to be included to pass <br />'s and <hr />'s
through to the output file.
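Pulled together, the publication filter is only a few templates layered over the identity transform. This sketch follows the delete-flag and br/hr conventions described above; it is not the project file itself:

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <!-- Identity rule: copies elements, attributes, text, comments, and
           processing instructions straight through -->
      <xsl:template match="@*|node()">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>

      <!-- Scrub nodes flagged for deletion and empty leftovers,
           but let <br/> and <hr/> pass through -->
      <xsl:template match="*[@delete='true'] | *[not(node())][not(self::br or self::hr)]"/>

      <!-- Drop the delete flag itself from anything that survives -->
      <xsl:template match="@delete"/>
    </xsl:stylesheet>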
- When a global change to navigation elements on the website occurred, it was extremely easy to rebuild all the XHTML pages
to incorporate the change. The use of Microsoft's MSXSL command line utility and an appropriate batch file allowed a complete
rebuild in an extremely short time compared to what it would take for hand editing every file.
- A small detail such as automatically changing the "This page was last updated on..." statement at the bottom
of the page was possible using XSLT and MSXML's support of JavaScript functions within the transform. A page gets published;
the publication date appears at the bottom of the page without user intervention.
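MSXML's msxsl:script element is what makes this possible. A sketch, with a hypothetical footer template and script prefix:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:msxsl="urn:schemas-microsoft-com:xslt"
        xmlns:esc="urn:esc-script"
        exclude-result-prefixes="msxsl esc">

      <!-- A small JScript helper compiled by MSXML and callable from XPath -->
      <msxsl:script language="JScript" implements-prefix="esc">
        function today() {
          var d = new Date();
          return (d.getMonth() + 1) + "/" + d.getDate() + "/" + d.getFullYear();
        }
      </msxsl:script>

      <xsl:template name="pageFooter">
        <p class="footer">This page was last updated on <xsl:value-of select="esc:today()"/>.</p>
      </xsl:template>
    </xsl:stylesheet>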
- Internet Explorer's built-in and proper handling of XML files and embedded XSLT
specifications certainly simplified access to page editing forms.
- The ease with which IE's right-click context menu can be extended. Simply create
an HTML file with appropriate scripting inside and enter its location in the registry.
A downloadable .reg file is provided to add the "XMLEdit Edit Page" menu
item to the context menu. After downloading, navigate to the destination directory and double-click
on XMLEdit.reg to register the IE context menu entry. For easy editing of any demo .html page, right-click
to display IE's context menu, and then select the XMLEdit Edit Page menu item.
|
TRIALS & TRIBULATIONS:
- First and foremost, XSLT is painful to learn. How the simple and elegant concepts of XSLT can become a learning curve cliff
is truly amazing. Between figuring out the somewhat mystical (and magical) XPath syntax and how to use named templates
as callable "functions", one can chew through a substantial amount of the schedule. (For the record, 15+ years of
FORTRAN, PASCAL, multiple assembly languages, and C/C++ expertise was brought to bear on the project. XSLT/XPath was one of
the more challenging new experiences.)
- XML namespaces. Another simple concept with a major pain factor. The problem is not in understanding their purpose and use; the problem is keeping
them out of your output. It took until just recently to kill various and sundry namespace declarations sprinkled throughout the XHTML output.
Really nothing more than discovering (and truly understanding) the exclude-result-prefixes attribute in the <xsl:stylesheet...>
element. Yet another simple concept, but just like Friendly's seven-scoop super sundae, too many simple concepts in one bowl
can still slow you down.
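For anyone fighting the same battle, the cure really is that small. A sketch (the extension namespace here is just an example of one that would otherwise leak into the output):

    <!-- Without exclude-result-prefixes, declarations such as
         xmlns:msxsl="urn:schemas-microsoft-com:xslt" get copied onto every
         literal result element and litter the XHTML output -->
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:msxsl="urn:schemas-microsoft-com:xslt"
        exclude-result-prefixes="msxsl">

      <xsl:template match="Paragraph">
        <p class="bodyText"><xsl:value-of select="."/></p>
      </xsl:template>
    </xsl:stylesheet>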
- XML, XSD, and XSLT processing can lead to, shall we say, idiosyncrasies. Many were misconceptions on our part or bugs
in our code that needed to get sorted out; many were defects in others' code. To this day, the addition and deletion of
"a number" of XML elements with updates to the IE display of these elements crashes IE. This led to the "Save"
button on the page editing forms.
The specifications for these standards are voluminous to say the least — a testament to the drafters' attempts to cover
all the possibilities. Unfortunately, as with any legal contract, the larger it becomes, the more vulnerable it becomes to
misinterpretation by different parties. Add to this large, multi-programmer software development efforts, and it's no wonder
many XSLT developers use multiple engines to test their transforms. Sanity checks in an insane environment are de rigueur.
- Really basic information on how to handle different XML integration issues is hard to find on the Internet.
It took more than a day just to
find out how to access the XML DOM in JavaScript. There were plenty of examples of what to do with it once it was had, but simply accessing
it when it was automatically loaded by IE was lost among these details. (For anyone on the quest for this answer: document.XMLDocument is the root node for the
XML DOM and document.XSLDocument is the root node for the associated XSLT. These get you what's easily available, which is not necessarily what you need! Some XML DOM
methods are only available with a Msxml.FreeThreadedDOMDocument object; document.XMLDocument returns a straight Msxml.DOMDocument.)
Many XML/XSLT/JavaScript-related web searches yield way too many results to find the proverbial needle, and most results deal with more subtle issues than what you should have learned in XML Usage 101.
The various XML DOM and XSLT tutorials found on the Internet dealt primarily with the individual technologies. A tutorial on tying them all together within a
JavaScript framework was never found. For our C/C++ ISAPI DLL, we started with and modified Microsoft's MSXSL command line utility.
- The page editing concept fell apart when faced with pages that contained forms themselves (e.g. email contact pages). It's unclear how large a headache it would be to support
these within an XML definition and provide an editing mechanism for them. Quite frankly, the hangover from learning XSLT left no desire to even begin to tackle the concept of
forms-within-forms XHTML. Another day...
|
LESSONS LEARNED:
- Use variables.
- Use lots of variables. It may or may not be efficient; no testing was done either
way. Until the transformation comes along that requires the ultimate in efficiency,
variables can greatly clarify the XSLT. Even if an XPath expression is used only once, if
the expression is "long", assigning it to a meaningfully named variable with fewer
characters will go a long way toward readability in your XSLT.
Another use for variables (and this one probably suffers more of a performance penalty)
is to break highly convoluted XPath expressions into pieces with the results placed
into appropriately named variables. There appear to be a dozen or so experts on
the Mulberry
Technologies XSLT discussion list who excel at pulling XPath expressions apart.
The rest of us may be better off with self-documenting baby steps.
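A contrived before-and-after (the element and attribute names are invented for the example) shows the idea:

    <!-- One long expression, repeated wherever it is needed... -->
    <xsl:value-of select="count(/Page/Section[not(@delete='true')]/Image[@caption-position='right'])"/>

    <!-- ...versus baby steps with self-documenting names -->
    <xsl:variable name="liveSections" select="/Page/Section[not(@delete='true')]"/>
    <xsl:variable name="rightCaptionImages" select="$liveSections/Image[@caption-position='right']"/>
    <xsl:value-of select="count($rightCaptionImages)"/>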
- Learn to use named templates as functions for assigning values to variables.
It's a whole new concept when it comes to returning results from
a named template (Ok, so all of XSLT is a whole new concept), but once the light bulb
goes on, function templates can make initializing all those variables much easier.
The key is to remember that whatever would have gone to the output stream now ends
up in a variable. Your XSLT processor's node-set extension function will be your
friend when it comes to further processing any returned XML fragments.
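A sketch of the pattern using MSXML's node-set() extension; the template and element names are illustrative:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:msxsl="urn:schemas-microsoft-com:xslt"
        exclude-result-prefixes="msxsl">

      <!-- A named template used as a "function": what it would have written
           to the output lands in the calling variable instead -->
      <xsl:template name="imageInfo">
        <xsl:param name="image"/>
        <src><xsl:value-of select="$image/@src"/></src>
        <alt><xsl:value-of select="$image"/></alt>
      </xsl:template>

      <xsl:template match="Image">
        <xsl:variable name="info">
          <xsl:call-template name="imageInfo">
            <xsl:with-param name="image" select="."/>
          </xsl:call-template>
        </xsl:variable>
        <!-- $info is a result tree fragment; node-set() makes it navigable again -->
        <img src="{msxsl:node-set($info)/src}" alt="{msxsl:node-set($info)/alt}"/>
      </xsl:template>
    </xsl:stylesheet>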
- For those comfortable with text editing
and the concepts of markup, it quickly became apparent that editing XML content
is much preferred to editing actual HTML.
|
RESOURCES:
- XSLT Tutorial — The ZVON XSLT Tutorial is
straightforward and to the point. Great for understanding XSLT usage after a familiarization with basic XSLT concepts.
- XSLT Email Discussion List — Everybody's favorite, the Mulberry Technologies
XSLT discussion list gets you answers to the subtleties of XSLT (and these can be very subtle indeed...)
- XML, XSLT, & XPath References — TopXML
has references for Microsoft's XML DOM, XSLT, and XPath. These are often more readable than the original source documents
upon which they are based. Be aware that the readability is achieved by sacrificing some detail.
- EXSLT — From the EXSLT website: "EXSLT is a community initiative to provide extensions to XSLT."
EXSLT is the XSLT equivalent of a C-Runtime library. It defines elements and functions that resolve
many common, but sticky, XSLT situations. In addition to the definitions (and probably even more important),
different implementations
are available for use in different XSLT processing environments.
- The actual Microsoft XML DOM
Reference freely admits when it strays from the official W3C DOM specification. This reference combined with a good JavaScript debugger lets you find
little treasures hidden throughout the DOM objects.
|