
\documentclass{acm_proc_article-sp}

\usepackage[bookmarks=true, colorlinks=true, linkcolor=black, citecolor=black, urlcolor=blue,breaklinks=true]{hyperref}

\dontusepackage{graphicx}

\conferenceinfo{WikiSym}{2005 San Diego, California USA}
%%\setpagenumber{50}
\CopyrightYear{2005}
%%\crdata{0-12345-67-8/90/01} % Allows default copyright data (X-XXXXX-XX-X/XX/XX)
\begin{latex}

\title{WikiGateway: a library for interoperability and accelerated wiki development}
%%\subtitle{[Extended Abstract]}

\numberofauthors{1} \author{ \alignauthor Bayle Shanks
\affaddr{http://communitywiki.org/BayleShanks}
\affaddr{Computational Neurobiology}
\affaddr{University of California}
\affaddr{San Diego, La Jolla, CA 92093}
\email{bshanks at ucsd.edu} }

\date{28 April 2005} \end{latex}

\maketitle \begin{abstract} WikiGateway is an open-source suite of tools for automated interaction with wikis.

All WikiGateway tools are compatible with a number of different wiki engines. Developers can use WikiGateway to hide the differences between wiki engines and build applications which interoperate with many different wiki engines.

\end{abstract}

\category{D.2}{Software engineering}{Interoperability}
%% ??: \category{H.5.3}{Information Interfaces and Presentation}{Group and Organization Interfaces}
%% ??: H.5.3 Group and Organization Interfaces
%% ??: \category{H.4}{Information systems applications}{Communications Applications}
%% ??: %% \category{H.2}{Database management}{Heterogeneous Databases}
\terms{Standardization, Design}

\keywords{wiki, interwiki, interoperability, WikiGateway, client-side wiki, WikiClient, middleware, Atom, WebDAV, WikiRPCInterface, wiki XMLRPC}

\section*{Related resources} Related resources may be found at \url{http://purl.net/bshanks/work/papers/wikigateway_wikisym05/}.

\section{Introduction} WikiGateway is a suite of tools that allows developers to write programs that act as clients to wiki servers. WikiGateway provides a unified interface to different types of wiki engines. "Wiki engine" refers to a type of wiki server software; for example, MoinMoin is a wiki engine.

There are a few different ways of communicating with wiki servers using WikiGateway tools. These options fall into two categories: either the client software uses WikiGateway components installed on the local machine, or it communicates with a WikiGateway web service installed elsewhere.

The simplest way to interact with wikis is for software running on the client machine to call a WikiGateway library function or tool. WikiGateway provides modules for Python and Perl, and also a command-line client. Appendix A gives a demonstration of various WikiGateway Python module functions.

However, WikiGateway also allows people to run web servers that act as gateway servers; these intermediaries accept requests from the client using a standardized protocol, and translate them into requests to the actual wiki server (which may be running on yet another machine). WikiGateway provides tools to run servers that understand the WebDAV, Atom, or WikiRPCInterface protocols \cite{webdav.org,rfc2518,whitehead:ecscw99,atomwiki,gregorio:atom-xml04,WikiRPCInterface2}. WikiGateway also provides a server that exposes an XML-RPC interface to the Python WikiGateway module.

WikiGateway includes a few "demo applications" to show how it can be used. These include a despamming bot, a script to copy an entire wiki onto another wiki, and a script to upload a hierarchical directory structure of text files to a wiki (treating each text file as wiki markup source for a single wiki page).

\section{Motivations}\label{sec:motivations} There are quite a few reasons that a project like WikiGateway is needed. First, developers are already writing "screen-scraping"-style code to interact with wiki servers; it would be more efficient for the community to write this code "once and only once" in a single library. Second, there are standardized protocols that wiki servers could support, but often don't; WikiGateway provides gateway servers to allow clients which speak standard protocols to communicate with wiki servers. Third, making wikis more interoperable will aid communications not only with other types of tools, but also between different wiki sites. For example, giving programs the ability to automatically talk to wiki servers is a prerequisite for automated WikiPageInterchange. Finally, by allowing new ideas to be implemented as "universal" client-side tools rather than as server-side features, WikiGateway will help with the problem of the division of developer effort among different wiki engines. These motivations are elaborated upon below.

\subsection{A central collection of screen-scraping code} Developers are already spending time implementing interfaces to wiki engines \cite{WikiRPCInterface,WWW::Wikipedia,WWW:Mediawiki,Pywikipediabot,jacoby:UnifiedRecentChanges,OddMuse:AutomaticPostingAndUploading,WikiMinion}. As long as developers are spending their time writing what is essentially "screen scraping" code to talk to specific wiki engines, why not collect all of these efforts into a single library?

In addition to making it easy for developers to know where to go to find wiki interface code, WikiGateway has the added advantage of providing clients with a single, unified API to use, regardless of which specific type of software the wiki server is running.

In the future, perhaps developers who want to write screen-scraping code for a particular wiki engine will spend their time writing a driver for WikiGateway.

%% took out:, rather than creating yet another freestanding tool which only knows how to talk to one type of wiki engine.

\subsection{Interoperability with standard protocols} Many useful software clients use protocols such as WebDAV or Atom. However, there are few wiki engines which support these protocols. WikiGateway provides a way for a DAV or Atom client to access a wiki server, even if the wiki engine running on the server does not know anything about DAV or Atom.

%% for accessing collections of documents have been designed, for example WebDAV? and Atom. However, wiki engine developers have not put a high priority on making wiki servers compatible with these protocols. This is too bad, because there are many software tools which use standardized protocols like DAV which could be useful for wikis. WikiGateway? provides gateway servers which exposes a DAV or Atom interface to a target wiki server, even if the wiki engine itself does not know anything about DAV or Atom.

\subsection{Wiki page interchange} One much-discussed feature of the future is the ability to programmatically move or copy a page from one wiki to another one, even if the source and destination wiki use different wiki software \cite{InterWikiWiki:WikiPageInterchange,Wiki:WikiInterchangeFormat,MeatBall:WikiInterchangeFormat,Altheim:IWML,interwiki-discuss,MeatBall:WikiXmlDtd,CommunityWiki:PlanBWikiModules,Twiki:RenderOnceReadMostly}. This would require two things:

  1. The existence of software that can read and write to a variety of different types of wiki engines.
  2. The existence of software that knows how to convert the wiki markup between a variety of different types of wiki engine markup styles.

WikiGateway currently can do (1), and the plan is for it also to do (2) in the future (see Section \ref{sec:future}, "Future Work").
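
The two requirements can be pictured with a small sketch. Everything in it is hypothetical: {\tt InMemoryWiki} merely stands in for a WikiGateway object (only the method names {\tt getPage} and {\tt putPage} are taken from this paper, and their argument lists are assumed), and {\tt convert\_markup} is a placeholder for the markup converter that does not exist yet.

```python
class InMemoryWiki:
    """Hypothetical stand-in for a remote wiki reached through WikiGateway."""

    def __init__(self, pages=None):
        self.pages = dict(pages or {})

    def getPage(self, name):
        return self.pages[name]

    def putPage(self, name, text):
        self.pages[name] = text


def convert_markup(text):
    """Placeholder for requirement (2).

    A real converter would translate between the markup dialects of the
    two engines; this sketch simply returns the text unchanged.
    """
    return text


def copy_page(source, destination, name, convert=convert_markup):
    """Requirement (1): read a page from one wiki and write it to another."""
    destination.putPage(name, convert(source.getPage(name)))


src = InMemoryWiki({"FrontPage": "Welcome to the source wiki."})
dst = InMemoryWiki()
copy_page(src, dst, "FrontPage")
```

Keeping the converter as a separate argument reflects the split described above: reading and writing can work today, while conversion can be plugged in later.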

\subsection{De-fragment the developer effort} There are over 250 different wiki engines\footnote{252 was the result of a very superficial count of the number of entries on http://c2.com/cgi/wiki?WikiEngines, on April 21, 2005, in the "sorted by language of implementation" section.}. There are even quite a few different "popular" wiki engines\footnote{This is hard to quantify, but \cite{Wiki:WikiEnginePopularity} lists 9 wiki engines with over 100,000 Google hits; \cite{WorldWideWiki:PopularWikis} lists 5 popular wiki engines; \cite{rune:TheStateOfWiki} lists 5 popular wiki engines; but these two 5-engine lists have only 2 engines in common.}. This means that even though there are many wiki software developers, this development effort is split into hundreds of small projects (see Figure \ref{serverSideAndFragmentation}) \cite{CommunityWiki:TooManyWikiEngines}.

One way to solve this problem would be to convince developers to pool their efforts and focus development on a small number of wiki engines. This is difficult because it would involve a lot of individual developers and small teams giving up creative control and joining large organizations.

Another answer is to allow developers to write a single implementation of some feature that will be compatible with multiple wiki engines. This can be done by implementing features client-side rather than server-side.

\subsection{Client-side wiki feature development} With the ability to download and upload the page source from remote wiki servers, it is now possible to implement wiki user interfaces (UI) partially or completely on the client. Advanced features such as a revert button, SubscribedPages, "filtered Recent changes", a collaborative rating system for wiki pages, or "unified recent changes"\footnote{ "Unified recent changes" is a single list of the RecentChanges of multiple wikis.}, could all be written without altering the software running at the server.
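
As an illustration of one such feature, here is a minimal sketch of "unified recent changes" done entirely on the client. It is not taken from the WikiGateway code base: {\tt FakeWiki} is a stand-in that returns canned data, and the {\tt 'date'} key is an assumption made for this sketch (only the {\tt 'name'} key of a change record appears in this paper).

```python
class FakeWiki:
    """Hypothetical stand-in for a WikiGateway object; returns canned data."""

    def __init__(self, changes):
        self._changes = changes

    def getRecentChanges(self, since):
        # A real driver would fetch and parse the remote RecentChanges page.
        return [c for c in self._changes if c["date"] >= since]


def unified_recent_changes(wikis, since):
    """Merge the recent changes of several wikis into one newest-first list.

    `wikis` maps a label to a wiki object. Dates are ISO strings here, so
    plain string comparison orders them correctly.
    """
    merged = []
    for label, wiki in wikis.items():
        for change in wiki.getRecentChanges(since):
            merged.append({"wiki": label, **change})
    return sorted(merged, key=lambda c: c["date"], reverse=True)


wikis = {
    "A": FakeWiki([{"name": "FrontPage", "date": "2005-04-12"},
                   {"name": "OldPage", "date": "2005-03-01"}]),
    "B": FakeWiki([{"name": "SandBox", "date": "2005-04-20"}]),
}
unified = unified_recent_changes(wikis, "2005-04-11")
```

The server software is untouched; the merging happens wherever the client runs, much as an RSS aggregator combines feeds.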

%% Many wiki features can, in fact, be implemented on the client, provided that there is a way for software on the client to send requests to the wiki server directly. For example, a "revert button" feature means putting a shortcut in the UI for "fetch the page source corresponding to the specified version, and save that text as the new page text". A SubscribedPages?\footnote{or "filtered Recent changes"} feature can be applied by filtering the changes at the client side. A "unified recent changes"\footnote{Unified Recent Changes means listing the RecentChanges? of multiple wikis in one listing.} feature can be applied at the client side in a similar fashion to an RSS aggregator. A rating system for pages can be applied by using a rating server separate from the wiki server, in conjunction with an interface on the client. So, many features which we would usually assume would belong inside a wiki engine could also be implemented as separate, client-side tools.

There are also some features that seem to "naturally" belong in the client. Examples include a browsing interface based on an animated, clickable, graphical/spatial "map" of nearby wiki pages; or a Refactoring Browser, that is, a program specifically designed for refactoring text in wiki pages, with special tools for moving blocks of text around between and within pages \cite{MeatBall:WikiRefactoringBrowser}.

The catch is that, before WikiGateway, it was difficult for client-side programs to communicate with the wiki server. Wiki servers are designed for human clients, not programs. Different wiki servers have different "protocols" for talking to them, each one using its own idiosyncratic set of CGI forms. This meant that it was very difficult to write a client-side feature that would be compatible with multiple types of wiki engines. Until now, this difficulty has hindered the development of client-side tools for wikis.

With WikiGateway, client-side tools which interoperate with many different types of wiki engines can be easily developed. This will allow developers to focus their time on interoperable client-side tools, rather than on wiki-engine specific server-side features. In this way, the fragmentation of the developer community can be overcome (see Figure \ref{clientSideVsFragmentation}).

\begin{figure*} \begin{graph} size="7,6"

subgraph dev { rank=same dev1 [label="wiki\ndeveloper\n#1", shape=plaintext] dev2 [label="wiki\ndeveloper\n#2", shape=plaintext] dev3 [label="wiki\ndeveloper\n#3", shape=plaintext] }

subgraph clusterServer1 {

#label="UseMod"
shape=box
color=black

UseMod [shape=plaintext] UseMod -> s1feature1 [style=invis]

s1feature1 [label="feature D"] }

subgraph clusterServer2 {

#label="MoinMoin"
shape=box
color=black

MoinMoin [shape=plaintext] MoinMoin -> s2feature2 [style=invis]

s2feature2 [label="feature E"] }

subgraph clusterServer3 {

#label="OddMuse"
shape=box
color=black

OddMuse [shape=plaintext] OddMuse -> s3feature3 [style=invis]

s3feature3 [label="feature F"] }

dev1->s1feature1 dev2->s2feature2 dev3->s3feature3

user [shape=diamond, color=green3]

#user->s2feature2
user->MoinMoin [color=green3]

\end{graph} \caption{Because there are so many wiki engines, each engine only has a few developers working on it. The sum total of developer effort is large, but the developer effort spent on any single wiki engine is relatively small. So although there are many features implemented somewhere, each wiki offers only a limited set of features. Users who visit a particular wiki can only take advantage of a small proportion of potential features.} \label{serverSideAndFragmentation} \end{figure*}

%%\begin{figure} %%\b egin{graph} %%size="3.8,3.8" %%dev1 [label="wiki\ndeveloper\n#1"] %%dev2 [label="wiki\ndeveloper\n#2"] %%dev3 [label="wiki\ndeveloper\n#3"] %% %% %%subgraph clusterTool { %%#label="client-side tool" %%color=black %%shape=diamond %% %%tool [label="client-side tool",shape=plaintext] %% %%"feature D" %%"feature E" %%"feature F" %% %%WikiGateway? %%} %% %%dev1->"feature D" %%dev2->"feature E" %%dev3->"feature F" %% %%tool->UseMod? %%tool->MoinMoin? %%tool->OddMuse? %% %%user [shape=diamond, color=green3] %% %%user->dev1 [style=invis] %%user->dev2 [style=invis] %%user->dev3 [style=invis] %% %%user->tool [color=green3] %% %%\e nd{graph} %%\caption{} %%\label{clientSideVsFragmentation-one-tool} %%\end{figure}

\begin{figure*}

\begin{graph} node [fontsize=18] size="7,7" dev1 [label="wiki\ndeveloper\n#1", shape=plaintext] dev2 [label="wiki\ndeveloper\n#2", shape=plaintext] dev3 [label="wiki\ndeveloper\n#3", shape=plaintext]

subgraph clusterTool1 { color=black tool1 [label="client-side tool \#1",shape=plaintext] "feature D"

#wg1 [label="WikiGateway", shape=octagon]

bottom1 [style=invis, height=.02, shape=plaintext, label=""] tool1->"feature D" [style=invis] "feature D"->bottom1 [style=invis] }

subgraph clusterTool2 { color=black tool2 [label="client-side tool \#2",shape=plaintext] "feature E"

#wg2 [label="WikiGateway", shape=octagon]

bottom2 [style=invis, height=.02, shape=plaintext, label=""] tool2->"feature E" [style=invis] "feature E"->bottom2 [style=invis] }

subgraph clusterTool3 { color=black tool3 [label="client-side tool \#3",shape=plaintext] "feature F"

#wg3 [label="WikiGateway", shape=octagon]

bottom3 [style=invis, height=.02, shape=plaintext, label=""] tool3->"feature F" [style=invis] "feature F"->bottom3 [style=invis] }

dev1->"feature D" dev2->"feature E" dev3->"feature F"

fav [label="User's favorite wiki", shape=house]

#wg1->fav
#wg2->fav
#wg3->fav
#tool1->fav
#tool2->fav
#tool3->fav

user [shape=diamond, color=green3]

user->dev1 [style=invis] user->dev2 [style=invis] user->dev3 [style=invis] bottom1->fav bottom2->fav bottom3->fav

user->tool1 user->tool2 user->tool3

\end{graph} \caption{With WikiGateway, developers can work on tools which interoperate with many wiki engines. Developers' time is better spent because the software they create can be used by anyone, not just the users of a particular wiki engine. Users who visit a particular wiki can take advantage of a wide selection of tools.} \label{clientSideVsFragmentation} \end{figure*}

\subsection{Freedom for users to choose their wiki software} Today, if your favorite wiki doesn't implement a particular new feature, there is little that you can individually do. You must use the UI that your favorite wiki runs. The developers of the wiki engine and the administrator of your wiki must both take action to enable the end-user to use a given feature.

WikiGateway will allow individual users to use a client-side wiki UI of their choice to interact with their favorite wikis. No longer must all members of a community be bound to the same UI. However, this has disadvantages as well as advantages; see Section \ref{sec:lossOfCentralControl}, "Loss of central control over UI".

%% it doesn't help that 50% of all wikis implement that feature.

%% If user's favorite wiki doesn't implement that user's favorite new feature, it doesn't help that 50% of all wikis implement that feature.

\subsection{Accelerate the adoption of new wiki features} Today, when someone imagines a new wiki feature, it can be extremely difficult for that feature to become widely available. This is because three hurdles must be overcome:

  1. The development teams of many wiki engines must be convinced to include the feature.
  2. The feature must be implemented in many wiki engines.
  3. Many wiki hosts (wiki server administrators) must be convinced to upgrade their wiki server software to a newer version incorporating the feature.

Figure \ref{newFeatureHurdles} illustrates these problems.

\begin{figure} \begin{graph} node [fontsize = 30]

rankdir=LR size="3.3,6" idea [label="new feature idea"] idea->we1 idea->we2 idea->we3

we1 [label="wiki engine #1\nmust\nimplement\nfeature"] we1->a11 we1->a12 we1->a13 a11 [label="wiki administrator\nmust\nupgrade server"] a12 [label="wiki administrator\nmust\nupgrade server"] a13 [label="wiki administrator\nmust\nupgrade server"]

we2 [label="wiki engine #2\nmust\nimplement\nfeature"] we2->a21 we2->a22 we2->a23 a21 [label="wiki administrator\nmust\nupgrade server"] a22 [label="wiki administrator\nmust\nupgrade server"] a23 [label="wiki administrator\nmust\nupgrade server"]

we3 [label="wiki engine #3\nmust\nimplement\nfeature"] $\ldots$ [shape=plaintext] we3->$\ldots$

\end{graph} \caption{A lot of work has to be done in order to make a new wiki feature widely accessible. Many wiki engine development teams must implement the feature, and many site administrators must upgrade their wiki server.} \label{newFeatureHurdles} \end{figure}

Adding features on the client side rather than the server side, using WikiGateway, addresses all of these problems.

  1. Because the software is client-side, the development process does not need to involve the developers of the server-side wiki engines.
  2. Because WikiGateway hides the differences between wiki engines from the client-side application, the feature can be implemented only once, and yet work with many wiki engines.
  3. Because the software is client-side, the wiki server administrator does not need to do anything to enable the new functionality.

With WikiGateway, new features will be able to be used without consulting fifty developer communities, without writing fifty implementations, and without waiting for fifty thousand wiki administrators to upgrade (contrast Figure \ref{newFeatureWithWG} with Figure \ref{newFeatureHurdles}). People who don't care about new features can continue to use the standard UIs provided by wiki servers, while "power users" will be able to use client-side software providing them with the latest and greatest. Therefore, new feature ideas will be more quickly dispersed throughout the total wiki user base.

\begin{figure} \begin{graph} rankdir=LR size="3.3,6" idea [label="new feature\nidea"] idea->tool

tool [label="a client-side\ntool must\nimplement\nthe feature"] tool->we1 tool->we2 tool->we3

we1 [label="wiki engine #1\nworks with tool"] we2 [label="wiki engine #2\nworks with tool"] we3 [label=$\ldots$, shape=plaintext]

\end{graph} \caption{With client-side development, neither the developers of all of the wiki engines, nor the administrators of all of the wiki sites need to do anything. A client-side tool must be developed only once, and then users can use the tool on many wikis.} \label{newFeatureWithWG} \end{figure}

\section{Arguments against WikiGateway} \subsection{Security} It has always been possible to write bots that attack or spam wikis. However, it seems that very few have done so; though some observers attribute the recent surge in spam to bots, I believe that humans are on the other end. It is likely that the availability of WikiGateway will assist in the development of some malicious wiki bots. It is possible, although unlikely, that it could even catalyze an avalanche of harmful wiki bots.

However, not developing WikiGateway would not help in the long run. Eventually, wiki attack bots would be written either way. Eventually, wiki servers must implement security systems that allow wiki communities to be highly resistant to robotic attack. WikiGateway may force wikis to become secure sooner, but there was never any way to avoid the need to become secure.

In fact, it's possible that if WikiGateway were never developed, the cracker or spammer communities might eventually have developed a similar set of tools to assist in writing malicious bots.

On the other hand, developing WikiGateway provides a number of advantages for wiki communities (see Section \ref{sec:motivations}, "Motivations"). It is even possible that WikiGateway itself could assist in the development of software for wiki security; for example, the WikiGateway project has produced a spam cleaning bot. Unfortunately, security differs from the many wiki features that could be handled at the client: it is fundamentally best handled by the wiki server software. This is because wiki servers have the last word on which changes get made to the page database.

An additional argument in favor of developing WikiGateway is that it would be easy for individual site administrators to "turn off support for editing via WikiGateway". WikiGateway is hard-wired with algorithms to recognize the relevant components of the edit forms of each supported wiki. A wiki administrator could simply alter the wiki's code so as to make the edit form unrecognizable by WikiGateway. One could argue that the attacker could always rewrite WikiGateway to parse the new edit form; however, if s/he is willing to do that, then s/he could have done it without WikiGateway in the first place.

However, I would hope that site administrators don't take this route, as it deprives their legitimate users of the opportunity to make use of tools built on WikiGateway. A better solution would be to make WikiGateway unusable for anonymous or newly registered users, but to make it usable for trusted users with user accounts.

%% todo: clean up this unweildy sentence with too many commas: Unfortunately, unlike many aspects of wikis which could be done at the client, security is, fundamentally, best handled by wiki server software.

In summary, since the fundamental security problems exist either way, we may as well develop WikiGateway technology and reap the unique benefits that it can provide.

%% todo: summary kind of non-sequiter here

\subsection{Loss of central control over UI}\label{sec:lossOfCentralControl} One of the motivations discussed above is "Freedom for users to choose their wiki software". But there are disadvantages as well as advantages to this.

\subsubsection{No way to constrain the user} When designing software to support online communities, it may not always be optimal to make every potential action as easy as possible for the user. Sometimes it may be better to make an action annoying or difficult to do for the sake of the group \cite{MeatBall:PricklyHedge}. An extreme example is that a wiki interface with a "Delete all pages" button is not good for the community, even though all it does is provide a shortcut to something that any user could do anyway.

When the entire community is forced to use one UI, it is possible to make some actions annoying or difficult by design. However, when each user can choose their own UI, then it is possible that some users will choose to use UIs that provide shortcuts for actions that "should" be difficult.

\subsubsection{Loss of common context} In wiki communities, the UI is part of the common context that the whole community shares \cite{MeatBall:CommonContext}. Common context is essential for communication, understanding, and empathy. If everyone views a wiki through a different UI, the unity of the community may be seriously threatened.

Imagine, for example, that some people used a standard wiki server web interface, and others used an interface whose RecentChanges displayed only those changes which were "rated highly" by their friends, while a third group of users viewed the wiki through a wiki-to-email gateway that emailed diffs of changes to a user-specified list of subscribed pages. One can imagine that these three groups may evolve very different views of what is going on at a given time, and of what the communal norms are.

These two concerns may seem destabilizing, but they are really just instances of decentralization of power, from the wiki administrator to the wiki users. Wikis are unique in that they are extremely decentralized websites, compared to traditional websites in which the webmaster has unique power to write the web content. Yet wikis have taught us that even extreme decentralization can be efficient and secure.

In summary, wiki communities may have to adapt and manage the consequences of decentralizing power over the UI caused by WikiGateway. However, it is likely that they will be able to do so, and that the benefits of WikiGateway will outweigh the costs.

\section{Architecture of WikiGateway} The core of WikiGateway is the Python module\footnote{WikiGateway's core was originally written in Perl, but it was recently rewritten in Python.}. All of the other components are either wrappers around or callers of the Python module, although some of them depend on it indirectly. If the reader has not already done so, s/he is strongly urged to glance at Appendix A in order to get a concrete feel for the sort of functionality that the core Python WikiGateway module provides.

\subsection{The Python module} The Python module is based on a collection of wiki engine-specific \emph{drivers} which implement basic \begin{latex}I/O\end{latex} functions for accessing the engines which they handle.

The design goals of the Python module are

  1. The module should be easy to use.
  2. The drivers should be easy to develop, even if the developer is a one-off contributor who has little knowledge of the WikiGateway framework.

\subsubsection*{How the caller sees the module} The Python module is installed using the standard Python distutils installation procedure.

A caller can interact with the WikiGateway module either in an object-oriented fashion, or in a procedural fashion. For one-off interactions with a remote wiki, the procedural style is more concise:

\begin{verbatim}
WikiGateway.getRecentChanges(
    'http://interwiki.sourceforge.net/cgi-bin/wiki.pl',
    'oddmuse1',
    'April 11, 2005')
\end{verbatim}

The first two arguments in the procedural syntax are always the URL of the remote wiki, and its wiki engine type. After this come any arguments specific to the function; in the example, getRecentChanges takes a timestamp to indicate how far back you wish to receive changes.

For extended interaction with a remote wiki, the object-oriented style is more convenient:

\begin{verbatim}
wg = WikiGateway.WikiGateway(
    'http://interwiki.sourceforge.net/cgi-bin/wiki.pl',
    'oddmuse1')
for change in wg.getRecentChanges('April 11, 2005'):
    pageName = change['name']
    print(wg.getPage(pageName))
\end{verbatim}

WikiGateway may raise exceptions or errors in the course of execution. Exception classes may be found in the module WikiGateway.Errors. Examples are {\tt ReadError}, {\tt EditError} and its subclass {\tt EditConflictError}.
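
To show how a caller might use this hierarchy, here is a minimal sketch. The three classes are local stand-ins named after those in {\tt WikiGateway.Errors}, so that the example runs without the module installed; {\tt save\_with\_retry} and its retry policy are illustrations only and are not part of WikiGateway.

```python
class ReadError(Exception):
    """Stand-in for WikiGateway.Errors.ReadError."""


class EditError(Exception):
    """Stand-in for WikiGateway.Errors.EditError."""


class EditConflictError(EditError):
    """Stand-in for the EditConflictError subclass of EditError."""


def save_with_retry(put_page, name, text, retries=1):
    """Call put_page(name, text), retrying when an edit conflict occurs.

    Returns the index of the attempt that succeeded. Other EditErrors are
    not caught and propagate to the caller.
    """
    for attempt in range(retries + 1):
        try:
            put_page(name, text)
            return attempt
        except EditConflictError:
            if attempt == retries:
                raise


calls = []


def flaky_put(name, text):
    # Simulates a wiki that reports a conflict on the first save only.
    calls.append(name)
    if len(calls) == 1:
        raise EditConflictError("page changed during edit")


attempts_needed = save_with_retry(flaky_put, "SandBox", "hello")
```

Because {\tt EditConflictError} is a subclass of {\tt EditError}, a caller that does not care about the distinction can catch {\tt EditError} alone. A real client would probably re-fetch and merge the page before retrying.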

\subsubsection*{Under the hood}

The procedural-style functions are implemented by instantiating an object and then calling the appropriate method on that object.

WikiGateway objects are constructed using two arguments, the URL and the wiki engine identifier. They are actually constructed using a class factory. At run-time, the class factory looks at the wiki engine identifier (and possibly the URL) and decides which of many wiki engine-specific drivers will be used. Each driver is a class. The class factory function {\tt WikiGateway.WikiGateway} instantiates an object of the appropriate class and returns it to the caller.
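
The factory and the procedural wrappers can be sketched as follows. This is a simplified illustration rather than the module's actual code: the driver classes return canned strings, the lookup table {\tt \_DRIVERS} is an assumed mechanism, and the identifier {\tt 'usemod1'} is made up (only {\tt 'oddmuse1'} appears in this paper).

```python
class _WikiGatewayBase:
    """Stand-in for the shared base class of all drivers."""

    def __init__(self, url):
        self.url = url


class UseModDriver(_WikiGatewayBase):
    def getPage(self, name):
        # A real driver would fetch the page source from self.url.
        return "usemod page " + name


class OddMuseDriver(_WikiGatewayBase):
    def getPage(self, name):
        return "oddmuse page " + name


# Engine identifier -> driver class (assumed lookup mechanism).
_DRIVERS = {"usemod1": UseModDriver, "oddmuse1": OddMuseDriver}


def WikiGateway(url, engine):
    """Class factory: pick the driver class for `engine` and instantiate it."""
    try:
        driver_class = _DRIVERS[engine]
    except KeyError:
        raise ValueError("no driver for wiki engine %r" % engine)
    return driver_class(url)


def getPage(url, engine, name):
    """Procedural style: build a throwaway object, then call its method."""
    return WikiGateway(url, engine).getPage(name)


wg = WikiGateway("http://example.org/wiki.pl", "oddmuse1")
```

The caller never names a driver class directly, which is what lets new drivers be added without changing client code.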

All driver classes are subclasses of
{\tt WikiGateway.\_WikiGatewayBase}, which provides common utility routines which may utilize and alter instance variables of the WikiGateway object.

{\tt WikiGateway.\_WikiGatewayBase} is a subclass of {\tt WikiGateway.\_HighLevelFunctions}, which provides default implementations of any functionality that may be built in terms of calls to lower-level methods. An example is {\tt revertToVersion}, which is built in terms of the low-level methods {\tt getPageVersion} and {\tt putPage}.

Another common module is {\tt WikiGateway.\_utils}, which provides static utility functions for use by the driver modules. For example, the function {\tt WikiGateway.\_utils.getURL\_orReadError} fetches a URL or raises a {\tt WikiGateway.Errors.ReadError} if something goes wrong. See Figure \ref{pythonModuleClasses}.

\begin{figure*} \begin{graph} size="7,6" node [shape=box]

"_utils" "Errors" cf [label="WikiGateway class factory\ninstantiates driver classes"]

subgraph clusterWG { "_HighLevelFunctions" -> "_WikiGatewayBase" "_WikiGatewayBase"->"UseMod 1.0 driver" "_WikiGatewayBase"->"UseMod .91 driver" "_WikiGatewayBase"->"UseMod .92 driver" "_WikiGatewayBase"->"OddMuse driver" "_WikiGatewayBase"->"MoinMoin driver" }

cf->"UseMod 1.0 driver" [style=dotted] cf->"UseMod .91 driver" [style=dotted] cf->"UseMod .92 driver" [style=dotted] cf->"OddMuse driver" [style=dotted] cf->"MoinMoin driver" [style=dotted]

cf->"_HighLevelFunctions" [style=invis]

\end{graph} \caption{The internal classes of the Python WikiGateway module. Solid lines indicate class inheritance. Dotted lines illustrate that the class factory selects, instantiates, and returns a driver object when the caller creates a "WikiGateway" object. All classes can use {\tt \_utils} and {\tt Errors}.} \label{pythonModuleClasses} \end{figure*}

The driver modules themselves contain implementations of all of the low-level methods which are specific to each wiki engine.
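
To give a feel for what such a low-level method has to do, the sketch below picks the relevant parts out of an HTML edit form using Python's standard {\tt html.parser} module. The sample form is invented for this example and does not reproduce the form of any particular wiki engine; the real drivers may recognize forms quite differently.

```python
from html.parser import HTMLParser


class EditFormParser(HTMLParser):
    """Collect hidden form fields and the textarea holding the page source."""

    def __init__(self):
        super().__init__()
        self.hidden = {}
        self.textarea_name = None
        self.text = ""
        self._in_textarea = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("type") == "hidden":
            self.hidden[attrs.get("name")] = attrs.get("value", "")
        elif tag == "textarea":
            self.textarea_name = attrs.get("name")
            self._in_textarea = True

    def handle_endtag(self, tag):
        if tag == "textarea":
            self._in_textarea = False

    def handle_data(self, data):
        if self._in_textarea:
            self.text += data


# Made-up edit form, loosely in the style of a CGI wiki.
SAMPLE_FORM = """
<form method="post" action="wiki.pl">
<input type="hidden" name="title" value="SandBox">
<input type="hidden" name="oldtime" value="1113177600">
<textarea name="text">Hello, wiki.</textarea>
<input type="submit" name="Save" value="Save">
</form>
"""

parser = EditFormParser()
parser.feed(SAMPLE_FORM)
parser.close()
```

To save a page, a driver would post the hidden fields back unchanged together with the new text under the textarea's field name.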

\subsection{The Perl module} The Perl module Wiki::Gateway uses {\tt Inline::Python} to wrap the Python module {\tt WikiGateway}. It adds functionality to the {\tt Inline::Python} wrapper by individually wrapping each function with code to intercept Python exceptions and store information about them which can be retrieved by calling the function {\tt Wiki::Gateway::getLastExceptionType()}. In addition, it works around an Inline::Python bug with the treatment of Unicode by encoding all Unicode into ASCII before it passes through the Inline::Python interface.
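
The interception pattern can be illustrated in Python (the language used for the examples in this paper) even though the real wrapper is written in Perl. This is an analogy under stated assumptions: the decorator, the module-level variable, and the choice to return {\tt None} after a failure are all hypothetical; only the name {\tt getLastExceptionType} is borrowed from the Perl module.

```python
import functools

_last_exception_type = None


def remember_exception(func):
    """Wrap func so that a raised exception is recorded instead of propagated."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        global _last_exception_type
        _last_exception_type = None  # reset before every call
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            _last_exception_type = type(exc).__name__
            return None

    return wrapper


def getLastExceptionType():
    """Name of the exception raised by the most recent wrapped call, if any."""
    return _last_exception_type


class ReadError(Exception):
    """Stand-in for WikiGateway.Errors.ReadError."""


@remember_exception
def getPage(name):
    # Canned behavior so the sketch runs without a wiki server.
    if name == "Missing":
        raise ReadError("could not fetch page")
    return "text of " + name


ok_result = getPage("Home")
ok_type = getLastExceptionType()
failed_result = getPage("Missing")
failed_type = getLastExceptionType()
```

Storing the exception type on the side lets a caller in another language check for failure with an ordinary function call.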

The Perl module can be downloaded and installed from CPAN \cite{CPAN}.

\subsection{The command-line client} Appendix B shows some example commands to the command-line client. Note that, for convenience, the command-line client can refer to a preferences file named {\tt .intermap} to resolve InterWiki-like shortcuts to URLs.
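
The idea of shortcut resolution can be sketched as follows. The file format is an assumption (one "prefix, whitespace, base URL" pair per line, as in conventional InterWiki maps); the actual format of {\tt .intermap} is not described here, and the entries and function names below are made up.

```python
def parse_intermap(text):
    """Parse 'Prefix URL' lines into a dict; blank lines and # comments are skipped."""
    shortcuts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        prefix, url = line.split(None, 1)
        shortcuts[prefix] = url
    return shortcuts


def resolve(reference, shortcuts):
    """Expand 'Prefix:PageName' to a full URL; return other input unchanged."""
    prefix, sep, page = reference.partition(":")
    if sep and prefix in shortcuts:
        return shortcuts[prefix] + page
    return reference


# Made-up entries for illustration.
INTERMAP = """
# prefix   base URL
Example    http://wiki.example.org/wiki.pl?
Other      http://other.example.org/
"""

shortcuts = parse_intermap(INTERMAP)
```

A full URL such as {\tt http://other.example.org/} passes through untouched, because {\tt http} is not a known prefix.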

The command-line client is written in Perl using the Perl Wiki::Gateway module.

\begin{figure} \begin{graph} size="3.3,3" client g [label="WebDAV gateway server"] wiki [label="Wiki server\n(running standard\nwiki engine)"]

client -> g [dir=both, label="WebDAV"] g->wiki [dir=both, label="Wiki server's\nstandard web\ninterface"]

\end{graph} \caption{The gateway servers allow clients to use standard protocols to interact with wiki servers. The wiki servers don't have to implement the protocols themselves; the gateway server acts as a translator to translate the standard protocol into actions on the wiki server's normal web interface. In this example, the gateway server is making a normal wiki server act as a DAV resource.} \label{gatewayServerExample} \end{figure}

\subsection{Gateway servers} Instead of forcing the client to use special WikiGateway? libraries in order to communicate with wiki servers, there are times when one would want to make WikiGateway? transparent to the client and allow the client to use a standard protocol to communicate with the wiki server. This is the motivation behind the gateway servers, which are independently running servers that act as middlemen between the client and the wiki server. For example, consider the WebDAV? gateway server. The client sends WebDAV? requests to the WebDAV? gateway. The WebDAV? gateway translates these requests into the HTTP form requests used by the wiki server. From the client's point of view, it is communicating with a wiki server that knows WebDAV?. From the server's point of view, it is communicating with a human using its standard web interface. Figure \ref{gatewayServerExample} illustrates this.

This method allows one to use client software that wasn't intentionally designed for wikis. This is useful because there is more software written for protocols such as WebDAV? and Atom than there is software written especially for wikis.

\subsubsection*{Example usage: mounting a remote wiki as a filesystem}

For example, I have used {\tt davfs}, a program that mounts a remote WebDAV? resource as a filesystem, to mount a remote wiki running OddMuse?. I was able to use the standard command {\tt ls} to list the pages on the wiki, and to read and write the wiki pages with text editors as if they were files on my hard drive. When I operated on a file, {\tt davfs} translated the operation into a WebDAV? request, which was sent to a WebDAV? gateway server that is part of the WikiGateway? project. The WebDAV? gateway server translated the incoming DAV request into an HTTP request that the OddMuse? server could understand. So, even though {\tt davfs} didn't know about wikis and OddMuse? doesn't speak WebDAV?, and even though I did not have administrative access to the remote server running OddMuse?, I was able to use {\tt davfs}.
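The translation step performed by a gateway server can be sketched in a few lines. This is a deliberately simplified illustration, not the project's DAV server: {\tt FakeWiki} is an in-memory stand-in for a gateway object, {\tt handle\_dav\_request} is a hypothetical function, and only three request methods are handled.

```python
class FakeWiki:
    """In-memory stand-in for a gateway object (illustration only)."""

    def __init__(self):
        self.pages = {}

    def getAllPages(self):
        return sorted(self.pages)

    def getPage(self, name):
        return self.pages[name]

    def putPage(self, name, text):
        self.pages[name] = text


def handle_dav_request(wiki, method, path, body=None):
    """Translate one simplified WebDAV request into a wiki operation."""
    name = path.strip("/")
    if method == "PROPFIND" and name == "":
        # Listing the root collection corresponds to listing all pages.
        return wiki.getAllPages()
    if method == "GET":
        return wiki.getPage(name)
    if method == "PUT":
        wiki.putPage(name, body)
        return None
    raise ValueError("unsupported method: %s" % method)
```

In the {\tt davfs} scenario, running {\tt ls} on the mount point would correspond to the {\tt PROPFIND} branch, and saving a file in a text editor to the {\tt PUT} branch.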

\subsubsection*{Third-party web service providers}

Note that with a gateway server, neither the client nor the server needs to be modified, or even to know that it is dealing with WikiGateway?. The client only needs to use a standard protocol such as WebDAV?. The server just has to keep doing what it always does: serving users through a web interface.

This approach allows a third party, that is, someone who is in charge of neither the client nor the server, to effectively "WebDAV? enable" the wiki server. Any member of a wiki community can provide this service without the need for the wiki server administrator to do anything.

%%% at the client-side to enable the client to communicate with

%% note that third-party can host

\subsubsection*{The DAV server} The WebDAV? server makes it appear as if a remote wiki server is a DAV resource. It is a WebDAV? server built on top of PythonDAV? and the WikiGateway? Python module.

\subsubsection*{The Atom server} The Atom server makes it appear as if a remote wiki server is Atom enabled. The version of the Atom specification used is \cite{gregorio:AtomAPI?-9} (more recent versions of the Atom specification now exist \cite{ietf:AtomPub?-3}). It is an Atom server built on top of XML::Atom::Server2 and the WikiGateway? Perl module.

XML::Atom::Server2 was built for this project and provides an "Atom server" class which may be used as a superclass to build an Atom server. The class contains most of the needed logic; all that the caller needs to add is the backend. In this case, the backend consists of calls to the wiki server using WikiGateway?. XML::Atom::Server2 is built on top of XML::Atom::Server, which uses a more bare-bones approach.

\subsubsection*{The WikiRPCInterface?2 server} The WikiRPCInterface?2 server makes it appear as if a remote wiki server supports the nascent WikiRPCInterface? protocol, version 2. It is an Atom server built on top of XML::Atom::Server2 and the WikiGateway? Perl module.

\subsubsection*{The XMLRPC server} The XMLRPC server exposes the functions of the WikiGateway? Python module as an XMLRPC interface. The server administrator presets the target wiki during configuration. The client can then call XMLRPC methods like getPage and putPage on the server.

The XMLRPC server and the WikiRPCInterface?2 server are actually the same program, used with different configuration options.
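A server of this shape can be sketched with the XML-RPC support in the Python standard library. This is an illustration under stated assumptions, not the project's server: the class {\tt PresetWikiService}, the function {\tt make\_server}, and the dictionary used as a backend are hypothetical, and a real deployment would delegate to the Python module instead of a dictionary.

```python
from xmlrpc.server import SimpleXMLRPCServer


class PresetWikiService:
    """Exposes getPage/putPage for one wiki fixed at configuration time."""

    def __init__(self, backend):
        # The target wiki is preset by the administrator. Here it is a plain
        # dict standing in for a gateway object (assumption for illustration).
        self._backend = backend

    def getPage(self, name):
        return self._backend[name]

    def putPage(self, name, text):
        self._backend[name] = text
        return True  # return a value that XML-RPC can always marshal


def make_server(backend, host="localhost", port=8000):
    """Create (but do not start) an XML-RPC server for the preset wiki."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    # Public methods of the instance become callable XML-RPC methods.
    server.register_instance(PresetWikiService(backend))
    return server
```

Calling {\tt make\_server(backend).serve\_forever()} would start serving; a client could then invoke {\tt getPage} and {\tt putPage} through any XML-RPC library.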

\subsection{Demo applications} Some software tools have been created to demonstrate the wide range of uses to which WikiGateway? may be put. These tools also provide "sample code" using WikiGateway? for other developers to look at.

\subsubsection*{wikicp} {\tt wikicp} is a script to copy all of the pages from a source wiki onto a target wiki.

\subsubsection*{spamclean} {\tt spamclean} is a bot that detects and reverts spam on a remote wiki. {\tt spamclean} identifies spam by matching page content against a regular-expression-based content blacklist, which is downloaded from a user-specified location on the Internet. The blacklist is in OddMuse? format (see \cite{CommunityWiki:BannedContentDiscussion}), and one place to get one is \cite{OddMuse:BannedContent}. The bot can be configured to ask a user about each piece of spam (interactive mode), or to revert spam automatically.

Other configuration options include the length of time to check for spam, whether to delete pages which have no spamless versions, and the text to put in the summary line upon reversion. The summary line may include variables such as information about the version to which the page is reverting.

{\tt spamclean} reverts spammed pages to the most recent spam-less version found\footnote{Some users have requested a feature where the IP addresses of those who generated the spam are remembered, and the page is then reverted to the latest spam-free version edited by a non-spammer. A similar bot that already has this feature is WikiMinion? \cite{WikiMinion?}.}.

{\tt spamclean} produces a log which notes, for each page with spam, the offending spam, the diff between the spammed version and the clean version, and the action taken.
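The core of this behavior, matching against a blacklist and choosing the version to revert to, can be sketched as follows. The sketch assumes a simplified blacklist with one regular expression per line, where blank lines and lines beginning with {\tt \#} are ignored; that simplification and all function names are assumptions, not a description of the OddMuse format or of the actual implementation.

```python
import re


def compile_blacklist(lines):
    """Compile one regular expression per non-blank, non-comment line."""
    patterns = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(re.compile(line))
    return patterns


def is_spam(text, patterns):
    """A text counts as spam if any blacklist pattern matches it."""
    return any(p.search(text) for p in patterns)


def latest_clean_version(history, patterns):
    """Return the highest version number whose text is spam-free.

    history maps version numbers to page text. None is returned when every
    version contains spam, which is the case where deletion may be configured.
    """
    for version in sorted(history, reverse=True):
        if not is_spam(history[version], patterns):
            return version
    return None
```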

\subsubsection*{pushWebsiteToWiki} {\tt pushWebsiteToWiki} is a script that uploads a directory of text files into a wiki. Each file in the directory is treated as a text file containing the wiki markup source of a wiki page. Subdirectories are also recursively uploaded. The delimiter '-' is used in page names to indicate from which subdirectory the page came. For example, if {\tt website} is the directory being uploaded, then a file named {\tt website/books.txt} would become the markup source for the wiki page "{\tt Books}". A file in a subdirectory with path {\tt website/ideas/robertsRules.txt} would become the wiki page "{\tt ideas-robertsRules}".

In addition, {\tt pushWebsiteToWiki} creates indices for each subdirectory. For example, the page "{\tt ideas}" would contain an auto-generated index of the subdirectory {\tt website/ideas}.
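The mapping from file paths to page names can be sketched as a small function. The name {\tt page\_name} is hypothetical, paths are assumed to use forward slashes, and the sketch only joins path components with the delimiter and drops the file extension; it does not attempt any capitalization of page names.

```python
import posixpath


def page_name(root, path, delimiter="-"):
    """Derive a wiki page name from a file path below the upload root."""
    relative = posixpath.relpath(path, root)      # e.g. ideas/robertsRules.txt
    stem, _extension = posixpath.splitext(relative)
    # Subdirectory names become delimiter-separated prefixes.
    return delimiter.join(stem.split("/"))
```

For instance, {\tt page\_name("website", "website/ideas/robertsRules.txt")} yields {\tt ideas-robertsRules}.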

\subsection{Unit tests} Many of the core parts of WikiGateway? are equipped with a unit testing framework. Specifically, the most important functions in the Python module, the Perl module, the command-line wiki client, the WikiRPCInterface?2 gateway server, and the Atom gateway server all have unit tests.

The unit test framework is object-oriented and hierarchical. The top-most class is {\tt TestReadableWriteableCollection?}, which is an abstract class for testing access to a collection of documents. Lower-level classes implement the read/write primitives depending on which component is being tested. Since most WikiGateway? components are concerned with reading and writing collections of documents, this allows much of the unit testing code to be reused across different WikiGateway? components.
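The reuse pattern can be sketched with the standard {\tt unittest} module. This is an analogue of the design, not the project's test code: the mixin {\tt ReadableWriteableCollectionTests} and the subclass {\tt DictCollectionTest} are hypothetical, and a dictionary stands in for the component under test.

```python
import unittest


class ReadableWriteableCollectionTests:
    """Abstract tests for any document collection, via two primitives."""

    def read(self, name):
        raise NotImplementedError

    def write(self, name, text):
        raise NotImplementedError

    def test_write_then_read(self):
        # Written once here, reused by every concrete subclass.
        self.write("SandBox", "test text")
        self.assertEqual(self.read("SandBox"), "test text")


class DictCollectionTest(ReadableWriteableCollectionTests, unittest.TestCase):
    """Concrete tests: supplies the primitives for one component."""

    def setUp(self):
        self.store = {}

    def read(self, name):
        return self.store[name]

    def write(self, name, text):
        self.store[name] = text
```

Testing another component, such as a gateway server, would then require only a new subclass that implements {\tt read} and {\tt write} for that component.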

\section{Future work}\label{sec:future} \subsection{Support for more wiki engines} Although a major goal of WikiGateway? is to achieve portability across many different wiki engines, there are currently drivers for only three types of wiki engines (UseMod?\footnote{There are three drivers for UseMod?, for versions 0.91, 0.92, and 1.0.}, OddMuse?, and MoinMoin?). I felt that building an extensible framework which made it easy to write drivers, which had unit tests, and which provided a way to let clients use protocols such as DAV or Atom was a more pressing need than initially supporting a large number of wiki engines.

However, now that the framework is in place, I plan to add at least three more drivers by October 2005, and many more in the long run.

\subsection{Wiki page interchange} Because WikiGateway? is a central collection of algorithms for reading and writing to various types of wiki engines, it is also a natural place to collect algorithms for converting wiki markup between wiki engines. This would allow WikiGateway? to offer a "copy wiki page" command that could copy a wiki page from one wiki to another and automatically convert markup styles as necessary.

There is no consensus on how best to do this. One could imagine writing converters for each pair of wiki engines, or one could imagine converting all markup types to a single "canonical" markup, and then converting from the canonical markup to the target markup type. One advantage of using a canonical markup type is that only $O(n)$ conversion algorithms need to be written, as opposed to $O(n^2)$ algorithms if each pair of wiki engines must have a specialized conversion algorithm (where $n$ is the number of supported wiki engines) \cite{ayers:email7189385}. Some have suggested that this canonical standard should be XHTML (which is generated by most wikis already), and that the conversion algorithms should be written as XSLTs \cite{InterWikiWiki:XhtmlInterWikiMarkupStandard?,ayers:decoder04}.
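To make these counts concrete, assume that each converter is directional. Direct conversion then needs one converter per ordered pair of engines, while a canonical intermediary needs one converter into and one converter out of the canonical markup for each engine. Writing $N_{\mathrm{direct}}$ and $N_{\mathrm{canonical}}$ (symbols introduced here only for illustration) for the number of converters required,
\begin{displaymath}
N_{\mathrm{direct}} = n(n-1), \qquad N_{\mathrm{canonical}} = 2n .
\end{displaymath}
For example, with $n = 10$ engines, direct conversion requires $90$ converters, whereas the canonical approach requires $20$.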

WikiGateway? could accommodate either strategy, or a mixture of both. Since each driver is independent of the others, different wiki engines could have different conversion routines for their markup. Depending on the particular source and target style, a conversion algorithm could be either "direct" or make use of a "canonical" intermediary.

In either case, because of the object-oriented structure of the core Python module, functions that encapsulate common motifs, such as XSLT processing, could be made available to all drivers.

\subsection{Auto-detection of wiki engine type} Currently, a client using most WikiGateway? components must tell WikiGateway? what sort of software is running on the wiki server. This is inconvenient. It should be possible for WikiGateway? to automatically determine what type of wiki server it is dealing with by sending it a series of test requests and analyzing the responses.

For many wiki engines it would probably be sufficient to analyze the HTML of the front page of the wiki. For some, the HTML of the edit form might be needed.
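A first approximation of such detection is a lookup of characteristic substrings in the front page HTML. The sketch below is purely illustrative: the function {\tt detect\_engine} is hypothetical, and the marker strings are placeholders, not real fingerprints of any wiki engine.

```python
# Hypothetical (engine type, marker substring) pairs. These markers are
# placeholders and do not correspond to the output of any real wiki engine.
EXAMPLE_SIGNATURES = [
    ("engine-a", "<!-- placeholder marker A -->"),
    ("engine-b", "<!-- placeholder marker B -->"),
]


def detect_engine(front_page_html, signatures=EXAMPLE_SIGNATURES):
    """Return the first engine type whose marker occurs in the HTML."""
    for engine_type, marker in signatures:
        if marker in front_page_html:
            return engine_type
    # No marker matched; a fuller implementation might try the edit form next.
    return None
```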

\subsection{More demo applications} \subsubsection*{WikiWindow?} WikiWindow? is the name for a feature that would allow multiple wiki engines to be used with a single wiki. For example, MeatballWiki? runs the UseMod? wiki engine. However, with WikiWindow?, a user could opt to view and edit MeatballWiki? through a MoinMoin? server. This would allow users to choose which type of wiki engine software they prefer.

How would this be done? MeatballWiki? would be hosted on a standard UseMod? wiki engine. On another server, someone would run a specially modified MoinMoin? engine. The MoinMoin? engine would be modified so that, instead of querying its own database for the wiki page source, it would instead query the real MeatballWiki? UseMod? server, using WikiGateway? (see Figure \ref{WikiWindow?}). The UseMod? server would return the rendered page text, and the MoinMoin? engine would then add its own header and footer, etc.

\begin{figure} \begin{graph} size="3.3,9"; node [fontsize=24]

subgraph clusterU { label="UseMod? server"; color=black;

uUI [label = "UI"]; upg [label="page\ndatabase"];

upg -> uUI [dir="both", color=blue] }

subgraph clusterMM { label="Modified MoinMoin?\nserver"; mmUI [label = "UI"]; mmwg [label = "WikiGateway?\ninterface\nto remote\nUseMod server", shape=octagon];

mmwg -> mmUI [dir="both", color=red]

}

mmwg -> uUI [dir="none", style="dashed", color=red] uMM [label="user\nwho likes\nMoinMoin", color=red] uU [label="user\nwho likes\nUseMod", color=blue] uU -> uUI [dir=both, color=blue]; uMM -> mmUI [dir=both, color=red];

\end{graph} \caption{An example of the WikiWindow? concept. A user interacts with a modified MoinMoin? server; the page database is stored on a UseMod? server. Red lines indicate the communications paths involving the red user. Blue lines indicate the communications paths involving the blue user. The dotted line represents a WikiGateway?-based interface between the modified MoinMoin? server and the UseMod? server.} \label{WikiWindow?} \end{figure}

\subsubsection*{Wiki mode for emacs} Alex Schroeder, David Hansen, Pierre Gaston, and Deepak Goel have written an emacs mode especially for browsing and editing wikis \cite{SimpleWikiEditMode?}. It is compatible with OddMuse? and UseMod? wikis. By replacing the UseMod?/OddMuse?-specific code with calls to WikiGateway?, I will make this emacs mode compatible with all WikiGateway?-supported wikis.

\subsubsection*{Wiki client} At some point, I will implement a very simple wiki client to demonstrate the potential that WikiGateway? holds for wiki clients.

\subsubsection*{Unified, filtered RecentChanges?} Users who regularly read a couple of busy wikis have a lot to keep up with. A tool which aggregates changes from many wikis and which allows the user to apply flexible, powerful filtering criteria could allow a busy user to better focus her or his time.
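The aggregation step of such a tool can be sketched as follows. The sketch assumes change records shaped like the dictionaries shown in Appendix A (with keys such as {\tt name}, {\tt importance}, and {\tt lastModified}) and assumes that all timestamps are ISO 8601 strings with the same UTC offset, so that string order matches time order. The function {\tt merge\_recent\_changes} and the added {\tt wiki} key are hypothetical.

```python
def merge_recent_changes(changes_by_wiki, keep=lambda change: True):
    """Merge per-wiki change lists into one list, newest change first.

    changes_by_wiki maps a wiki label to a list of change dictionaries.
    keep is a user-supplied filter deciding which changes to retain.
    """
    merged = []
    for wiki, changes in changes_by_wiki.items():
        for change in changes:
            if keep(change):
                entry = dict(change)   # copy so the input is not modified
                entry["wiki"] = wiki   # remember where the change came from
                merged.append(entry)
    merged.sort(key=lambda c: c["lastModified"], reverse=True)
    return merged
```

A filter such as {\tt lambda c: c["importance"] == "major"} would hide minor edits across all of the aggregated wikis at once.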

\subsubsection*{WikiSync?} It would be useful to have a tool that allows one to "check out" a copy of a wiki for offline editing, to edit the wiki pages as text files using standard word processing applications, and then to "resync" the local database with the online wiki \cite{Wiki:WikiSync?}. This would be useful not only for people who are actually working offline, such as commuters, but also for people who would like to refactor a few related pages with a standard word processor.

A tool like this already exists, but only for MediaWiki \cite{WWW:Mediawiki}.

\subsection{More unit tests} Presently, there are unit tests for the most fundamental WikiGateway? components, but not for all components (such as the demo applications). Eventually, all components will have unit tests.

\section{Conclusions} WikiGateway? is a library which allows software to access wiki servers. The method of access may be a client-side tool (Perl, Python, or command-line) or a standard protocol such as DAV, Atom, or WikiRPCInterface?. It is relatively easy for a developer to write a driver for WikiGateway? to make it compatible with a particular wiki engine.

WikiGateway? will make wiki servers more interoperable with other software, including other wiki servers, wiki client software, and wiki-agnostic software using generic protocols such as WebDAV?.

%%By making (page interchange)

WikiGateway? makes it possible to add new features on client-side tools without modifying the wiki server. Since these tools will be compatible with many different wiki servers, it will be possible to overcome the fragmentation of the developer community into hundreds of different wiki engines. This will accelerate the technological evolution of wiki.

\section{Acknowledgments} L. M. Orchard and David Jacoby contributed the code on which the original Perl version of WikiGateway? was based.

Isolani contributed the code upon which the Atom gateway server was based. Alex Schroeder suggested building the Atom gateway server. Christian Scholz wrote Python DAVserver, and Benjamin Trott wrote XML::Atom::Server.

Although I have heard that WikiRPCInterface? started on JSPWiki, I don't know who originated it, but thank you to them.

Lion Kimbro helped with testing.

Dana Dahlstrom, David Cary, Alex Schroeder, and Mary Dilley helped to edit this paper.

Thanks also to the members of MeatballWiki? and CommunityWiki? for many helpful discussions.

%%todo: read wiki pages i've written similar to motivations, compare clarity

%%todo: read IWS:motivations and compare

\bibliographystyle{abbrv2} \bibliography{wikiGateway.bib}

\pagebreak

\begin{figure*}[h!] \begin{center} \section*{APPENDIX} \end{center} \section*{A. Demonstration of the Python WikiGateway? module}

For readability, longer responses were truncated, blank lines added, and a subset of a longer session was selected for inclusion.

\begin{verbatim}
Python 2.3.4c1 (#2, May 13 2004, 21:46:36)
[GCC 3.3.3 (Debian 20040429)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import WikiGateway
>>> print WikiGateway.getPage(r'http://interwiki.sourceforge.net/cgi-bin/wiki.pl', 'oddmuse1', 'SandBox')
This is a test of WikiGateway. Test 2.

>>> wg = WikiGateway.WikiGateway('http://interwiki.sourceforge.net/cgi-bin/wiki.pl', 'oddmuse1')

>>> wg.getAllPages()
['AboutInterWikiSoftware', 'AdvantagesOfAGateway', 'AndrewGray', 'AtomGateway', 'AtomServerModule', ...(truncated)...]

>>> wg.getRecentChanges('April 11, 2005')
[{'comment': u'Automated update of spam blacklist', 'name': u'BannedContent', 'importance': u'major', 'lastModified': u'2005-04-19T12:11:52+00:00', 'version': u'57'},
 {'comment': u'revert to revision 39', 'name': u'SpamClean', 'importance': u'major', 'lastModified': u'2005-04-17T08:00:54+00:00', 'version': u'41'},
 {'name': u'SandBox', 'importance': u'major', 'lastModified': u'2005-04-16T17:58:31+00:00', 'version': u'439'},
 {'name': u'WikiSandBox', 'importance': u'major', 'lastModified': u'2005-04-16T10:31:32+00:00', 'version': u'1'}]

>>> wg.putPage('SandBox', 'py wg test')

>>> wg.getPageInfo('SandBox')
{'date': <DateTime object for '2005-04-21 16:19:00.00' at 407596b0>, 'comment': '', 'version': 440, 'author': 'user-10cmeae.cable.mindspring.com'}

>>> wg.getPageInfoVersion('SandBox', 429)
{'date': <DateTime object for '2005-04-11 15:03:00.00' at 4077c4b8>, 'comment': '', ...(truncated)...}

>>> wg.getPageHistoryInfo('SandBox')
{416: {'date': <DateTime object for '2005-04-05 16:25:00.00' at 4077c7c8>, 'comment': '', 'version': 416, 'author': 'user-10cmeae.cable.mindspring.com'},
 417: {'date': <DateTime object for '2005-04-10 04:46:00.00' at 4077c790>, 'comment': '', 'version': 417, 'author': 'user-10cmeae.cable.mindspring.com'},
 ...(truncated)...}

>>> wg.getPageVersion('SandBox',439)
'This is a test of WikiGateway. Test 2.\n'

>>> wg.getPageHTMLVersion('SandBox',439)
'This is a test of WikiGateway?. Test 2.

'
\end{verbatim} \end{figure*}

\begin{figure*}[h!] \section*{B. Sample calls to the commandline wiki client "wikiclient"} InterWiki? prefixes are resolved using a file ".intermap" in the user's home directory.

\begin{verbatim}
wikiclient --type=usemod1 read MeatBall:SandBox
wikiclient --type=usemod1 read http://www.usemod.com/cgi-bin/mb.pl:SandBox
wikiclient --type=usemod1 write http://interwiki.sourceforge.net/cgi-bin/wiki.pl:SandBox --summary="just a test"
<the text to be written is read from STDIN>
wikiclient --type=usemod1 rc MeatBall
wikiclient --type=usemod1 rc:11 MeatBall
wikiclient --type=usemod1 allpages http://interwiki.sourceforge.net/cgi-bin/wiki.pl
wikiclient --type=usemod1 info MeatBall:SandBox
wikiclient --type=usemod1 info MeatBall:SandBox --version=2384
wikiclient --type=usemod1 info MeatBall:SandBox --version=last
\end{verbatim} \end{figure*}