notes-simpleDAV

Introduction

These are notes towards a protocol intended to compete with Atompub and WebDAV. Here is a listing of features that the protocol defines, grouped into sets of features called "profile levels" for convenience:

In addition, the Blog/Wiki Basic Profile is defined as the following subset of the features above:

This protocol is intended to have a short specification (at the expense of comprehensiveness), and to be easily implementable in a web application (i.e. server-side), provided that the web application can handle HTTP PUT and can control the headers issued by the server.

Here is a concise summary of the protocol itself.

Examples

Level 1 example

The client asks for a listing:

GET / HTTP/1.1
Host: example.com
Accept: text/uri-list2

The server provides one:

HTTP/1.1 200 OK
Date: Wed, 25 Feb 2009 06:28:25 GMT
Server: Apache/2.2.9 (Debian)
Content-Length: 64
Content-Type: text/uri-list2
Available-Content-Types: text/html, text/uri-list2

http://example.com/item1 Item 1
http://example.com/item2 Item 2

The client creates a new document:

PUT /item3 HTTP/1.1
Host: example.com
Content-Type: text/html
Content-Length: 41

<html><body><h1>test3</h1></body></html>

The server says OK:

HTTP/1.1 201 Created
Date: Wed, 25 Feb 2009 06:28:25 GMT
Available-Content-Types: text/html;processed=0
Put-Accept: text/html, text/wiki-creole
Content-Length: 0
Location: http://example.com/item3

The client replaces the document:

PUT /item3 HTTP/1.1
Host: example.com
Content-Type: text/wiki-creole
Content-Length: 9

**bold**

The server says OK:

HTTP/1.1 200 OK
Date: Wed, 25 Feb 2009 06:28:25 GMT
Available-Content-Types: text/html;processed=1, text/wiki-creole;processed=0
Put-Accept: text/html, text/wiki-creole
Content-Length: 0
Location: http://example.com/item3

Level 2A example

The client asks for the default representation:

GET /item3 HTTP/1.1
Host: example.com

The server provides it:

HTTP/1.1 200 OK
Date: Wed, 25 Feb 2009 06:28:25 GMT
Server: Apache/2.2.9 (Debian)
Content-Length: 48
Content-Type: text/html
Put-Accept: text/html, text/wiki-creole
Available-Content-Types: text/html;processed=1, text/wiki-creole;processed=0

<html><body><strong>bold</strong></body></html>

The client asks for the source representation:

GET /item3 HTTP/1.1
Host: example.com
Accept: text/wiki-creole;processed=0

The server provides it:

HTTP/1.1 200 OK
Date: Wed, 25 Feb 2009 06:28:25 GMT
Server: Apache/2.2.9 (Debian)
Put-Accept: text/html, text/wiki-creole
Content-Length: 9
Content-Type: text/wiki-creole
Available-Content-Types: text/html;processed=1, text/wiki-creole;processed=0

**bold**

Description

Features are independent

Different web applications will choose to implement different parts of this standard. For instance, an application exposing a read-only API might implement the features of listing a collection, server-advertised multiple representations of a document, search queries, and document-associated metadata, but not implement creation or updating of documents, whereas another application might implement creating, updating, listing, deletion of documents, server-advertised multiple representations, and versioning, but not search queries or metadata. Therefore, except when dependencies are explicitly stated, each feature of this standard may be thought of as a little standard of its own, and software may claim to comply with some features of this standard without asserting compliance with other features.

For convenience, this specification will use the word MUST when describing individual features; but this should be read as, "IF software implements this particular feature, THEN it MUST ...".

Profiles

The "profiles" are merely useful abbreviations for the sake of human discussion; you can implement some features in a profile without implementing all of the others, and you can implement features in level 2x profiles without implementing profile level 1. To describe this, you speak in terms of individual features, rather than profiles.

However, if you do choose to speak in terms of profiles: to assert that software implements a profile means that it implements every feature in that profile, as well as every feature in every profile in the lower levels. So, if you say that X implements profile 2B, you are saying that it implements every feature in 2B as well as every feature in profile level 1.
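The rule above can be sketched in a few lines of Python. The feature names are hypothetical labels derived from the section headings of these notes, and the level of a profile is assumed to be the leading digit of its name; this is an illustration, not part of the protocol.

```python
# Hypothetical feature labels, one per feature heading in these notes.
PROFILE_FEATURES = {
    "1": {"list", "create-at-client-uri", "update"},
    "2A": {"multiple-representations", "source-vs-processed"},
    "2B": {"newsfeed", "search"},
    "2C": {"metadata", "delete", "create-at-server-uri"},
    "3": {"revision-history"},
}


def level_of(profile):
    """Numeric level of a profile name, e.g. '2B' -> 2 (assumed naming scheme)."""
    return int(profile[0])


def required_features(profile):
    """Features of `profile` plus those of every profile at a lower level."""
    needed = set(PROFILE_FEATURES[profile])
    for name, features in PROFILE_FEATURES.items():
        if level_of(name) < level_of(profile):
            needed |= features
    return needed


def implements_profile(implemented, profile):
    """True if the set of implemented features covers the whole profile."""
    return required_features(profile) <= set(implemented)
```

Note that claiming 2B pulls in level 1 but not 2A or 2C, since those are at the same level rather than a lower one.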

Profile level 1

List a collection of documents

To request a list of URIs from a collection URL, a client sends an HTTP GET request with an Accept header with value "text/uri-list2".

The server MUST respond with a document of Content-Type text/uri-list2, or an error message.

The format text/uri-list is defined in http://tools.ietf.org/html/rfc2483#section-5. Briefly, it is a set of lines, with each line either a comment (a line starting with '#'), or a URI. The format text/uri-list2 is the same, except that each URI might optionally be followed by a space, and then a title.
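A minimal sketch of a client-side parser for this format, in Python. The function name is hypothetical, and skipping blank lines is an assumption that the description above does not state.

```python
def parse_uri_list2(text):
    """Parse text/uri-list2 into (uri, title) pairs; title is None when absent."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line (assumed ignorable) or comment
        # The URI runs up to the first space; the rest of the line is the title.
        uri, _, title = line.partition(" ")
        entries.append((uri, title.strip() or None))
    return entries
```

Because the title is simply the rest of the line, a plain text/uri-list document parses with this function too; every title just comes back as None.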

The server MAY offer other representations of the list at the collection URI, for example, an XML feed document that gives metadata about each URI on the list. If so, the client can use the Accept header to choose a representation type.

Error conditions

The server MUST return the following errors in the following situations:

404 Not Found: the URL does not exist
406 Not Acceptable: the URL is not associated with a collection

If the URL is a collection URI, but the server does not associate it with a collection, then the server MUST reply with a 406. If the URL is a collection URI, but the collection is currently empty, then the server MUST reply with a 200 OK and an empty document.

The client should be aware that servers which do not follow this protocol might respond with error 406, or they might respond with a document which is not of type text/uri-list2.
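The server-side decision described above might look like the following Python sketch. The in-memory `known_urls` and `collections` stores are hypothetical stand-ins for whatever the web application actually uses.

```python
def list_collection(url, known_urls, collections):
    """Return (status, body) for a GET with Accept: text/uri-list2.

    known_urls:  set of URLs that exist on the server (hypothetical store).
    collections: dict mapping a collection URL to a list of (uri, title) pairs.
    """
    if url not in known_urls:
        return 404, ""  # the URL does not exist
    if url not in collections:
        return 406, ""  # the URL is not associated with a collection
    lines = ["%s %s" % (uri, title) if title else uri
             for uri, title in collections[url]]
    # An empty collection yields 200 OK with an empty document.
    return 200, "".join(line + "\n" for line in lines)
```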

Create a document at a client-specified URI

See HTTP PUT.

Update a document

See HTTP PUT.

If the client PUTs a document with a Content-Type that the target URI does not support, the server MUST issue an error 415 Unsupported Media Type.

In addition to the HTTP 1.1 status codes, the following status codes from WebDAV may be returned: 423 Locked, 422 Unprocessable Entity, 507 Insufficient Storage.

Profile level 2A

Server-advertised multiple representations of a document

For URIs which make available multiple representations, a response to HTTP GET MAY contain the new header field Available-Content-Types whose value contains a comma-separated list of possibly parameterized mime-types of alternative representations which are available. Media type parameters MAY be included.

A type which is listed in Available-Content-Types MUST be returned upon a GET request with an Accept header field containing only that type. That is to say, a type advertised in Available-Content-Types MUST actually be available. However, other types may also be available which are not included in Available-Content-Types.

The type of the default representation, that is, the representation that the server will choose to send when it gets a GET request without an Accept header, SHOULD be the first element in the Available-Content-Types list.
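A client could read the header with something like this Python sketch. The function names are hypothetical. Splitting on commas assumes that no parameter value contains a quoted comma, and treating the first entry as the default relies on the server following the SHOULD above.

```python
def parse_type_list(header_value):
    """Split a comma-separated header value into media type strings.

    Whitespace inside each entry is removed. Assumes no quoted commas.
    """
    return ["".join(item.split()) for item in header_value.split(",") if item.strip()]


def default_type(available_header):
    """First advertised type, which SHOULD be the default representation."""
    types = parse_type_list(available_header)
    return types[0] if types else None
```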

Simplified interpretation of the Accept header

If the server provides the Available-Content-Types header at some URL, then, regardless of whether a particular client has actually retrieved that header in the past, the server MAY choose to interpret Accept headers in GET requests to that URL in a simpler way than that given by the HTTP 1.1 spec, replacing that interpretation with the following one.

Each type in the comma-separated Accept header need only be recognized if, considered as a string, after whitespace is removed, it case-insensitively matches one of the values provided in the Available-Content-Types header, OR it case-insensitively matches at least one of the values provided in the Available-Content-Types header when the latter is stripped of parameters.

However, even if no types are recognized, if the Accept header contains the media type */*, then the server MUST serve its default document (just as if no Accept header were present).

If the server chooses not to parse other media ranges (for example, "text/*" or "*/html"), and especially if there are asterisks in the Accept header, the server should strongly consider serving its default representation rather than issuing a 406 Not Acceptable if it cannot recognize an Accept header. Otherwise, clients who request a media range which could have been satisfied will get a 406.

An example of where this interpretation differs from the HTTP 1.1 spec is as follows.

At URI X, the server may provide:

 Available-Content-Types: text/html, text/plain;foo=0;bar=3, text/plain;foo=1;bar=3

If the client sends a request with:

 Accept: text/plain;bar=3;foo=0

Then, according to this protocol (but not according to HTTP 1.1), the server may act as if this were an unknown MIME type (that is, it may reply with error 406, or it may reply with a representation of whatever type it chooses). The client is expected to string-match exactly one of the choices given by the server; for example, this would have been recognized:

  Accept: text/plain;foo=0;bar=3

However, the client is also permitted to simply send

 Accept: text/plain

in which case the server must choose among the available text/plain representations.
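The simplified interpretation can be sketched in Python as follows. All function names are hypothetical. The sketch assumes the default representation is the first advertised type, and it returns None for the case where the server is free to answer 406 or to serve whatever it chooses.

```python
def normalize(media_type):
    """Strip all whitespace and lowercase, for case-insensitive string matching."""
    return "".join(media_type.split()).lower()


def split_types(header_value):
    """Split a comma-separated header value into normalized type strings."""
    return [normalize(item) for item in header_value.split(",") if item.strip()]


def recognized_types(accept_header, available_header):
    """Advertised types selected by the Accept header, in Accept order."""
    available = split_types(available_header)
    selected = []
    for wanted in split_types(accept_header):
        for offered in available:
            bare = offered.split(";", 1)[0]  # offered type stripped of parameters
            if wanted in (offered, bare) and offered not in selected:
                selected.append(offered)
    return selected


def choose_representation(accept_header, available_header):
    """Type to serve, or None when the server may answer 406 or pick freely."""
    available = split_types(available_header)
    if accept_header is None:
        return available[0]  # assumption: the default is listed first (a SHOULD)
    matches = recognized_types(accept_header, available_header)
    if matches:
        return matches[0]
    if "*/*" in split_types(accept_header):
        return available[0]  # */* always gets the default document
    return None
```

With the header values from the example, "text/plain;bar=3;foo=0" selects nothing, "text/plain;foo=0;bar=3" selects exactly one type, and a bare "text/plain" selects both text/plain variants, leaving the choice between them to the server (this sketch simply takes the first).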

A client that knows that it wants a certain media type, but doesn't understand the different parameterized versions of that type which it is being offered, MAY include all available instances of that type in its Accept header, if it sends an Accept at all, or it MAY just include that media type with no parameters. For example, at URI X, the server may provide

 Available-Content-Types: text/html, text/plain;foo=0, text/plain;foo=1

The client may know that it wants text/plain, but may not know what "foo" means. In this case, the client should send either:

 Accept: text/plain;foo=0, text/plain;foo=1

or

 Accept: text/plain
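On the client side, building such an Accept value might look like this Python sketch. The function name is hypothetical; it copies the server's strings verbatim so that they will string-match, and falls back to the bare type when no variant is advertised.

```python
def accept_for(base_type, available_header):
    """Build an Accept value listing every advertised variant of base_type."""
    variants = [
        item.strip()
        for item in available_header.split(",")
        # Compare only the part before the first ';', case-insensitively.
        if item.split(";", 1)[0].strip().lower() == base_type.lower()
    ]
    return ", ".join(variants) if variants else base_type
```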

Put-Accept; PUT and POST

The header field Put-Accept can be used by the server, in an analogous manner, to indicate that it can handle a PUT of a document of certain type(s). Likewise, the server may refuse to recognize PUTs with a Content-Type that does not string-match one of the choices in Put-Accept.

Available-Content-Types and Put-Accept MAY be provided in responses to PUT and POST, also.
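A strict server might check an incoming PUT as in the Python sketch below. The function name is hypothetical, and exact string matching is the strictest reading; a server is free to be more lenient. The 415 status is the one required for unsupported types under "Update a document".

```python
def put_status(content_type, put_accept_header, already_exists):
    """Status code for a PUT, using exact (case-insensitive) string matching."""
    accepted = {
        "".join(item.split()).lower()
        for item in put_accept_header.split(",")
        if item.strip()
    }
    if "".join(content_type.split()).lower() not in accepted:
        return 415  # Unsupported Media Type
    return 200 if already_exists else 201  # replaced vs. newly created
```

One consequence of strict matching is that a Content-Type carrying an unadvertised parameter, such as a charset, would be refused.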

Distinguish "source representations" from "processed representations"

This feature depends on the previous feature, "Server-advertised multiple representations of a document".

Some documents have "source" representations which are given by the client, and "processed" representations which are computed by the server based on source representations.

The media type parameters "processed=0" and "processed=1" may be used to augment the media type in the value of the Accept: header or the Available-Content-Types: header.

If a URI makes available some representations which are processed and some which are not, then all items in the Available-Content-Types list SHOULD be parameterized with the "processed" attribute.

By definition, if a client submits a document, it cannot yet have been "processed". Therefore, the Put-Accept header SHOULD NOT contain any "processed" attributes.

Effect on PUTs

If a URI is writable, the server MUST be capable of accepting PUTs to that URI with any type that is advertised with the "processed=0" parameter in Available-Content-Types. For example, if the server advertises:

 Available-Content-Types: text/html;processed=1, text/plain;processed=0

then it must be capable of accepting a PUT with Content-Type: text/plain;processed=0.

This does not imply that the server has to accept any PARTICULAR PUT; for example, perhaps the user is unauthorized, the disk is full, or the submitted document has a syntax error (in this last case, error 422 Unprocessable Entity should be returned).
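The set of types that a writable URI has to accept can be derived mechanically from the header, as in this Python sketch (the function name is hypothetical):

```python
def required_put_types(available_header):
    """Types advertised with processed=0, which a writable URI must accept via PUT."""
    required = []
    for item in available_header.split(","):
        parts = [part.strip().lower() for part in item.split(";")]
        if "processed=0" in parts[1:]:  # parts[0] is the media type itself
            required.append(";".join(parts))
    return required
```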

Profile level 2B

Locate a newsfeed

The Recent-Additions-URI header can be provided upon a GET. If provided, it MUST contain a URL which, upon a GET, MUST provide a list, possibly an incomplete list, of recent additions to the set of documents being searched, sorted by date order from most recent to oldest. This is the sort of information found in a blog's feed.

The Recent-Changes-URI header can be provided upon a GET. If provided, it MUST contain a URL which, upon a GET, MUST provide a list, possibly an incomplete list, of recent changes to the set of documents being searched, sorted by date order from most recent to oldest. This is the sort of information found in a wiki's RecentChanges page.

The difference between Recent-Additions-URI and Recent-Changes-URI is similar to the difference between file creation date and file modification date. If a document has just been created, it appears on both feeds. But if it was created a long time ago and has recently been modified, it may appear only on the Recent-Changes feed.

As noted, the lists may be incomplete; for example, some servers may choose to not list what they consider "minor" changes, and some servers may choose to only emit lists that are not longer than some maximum length.

The feeds SHOULD be offered in the text/uri-list2 format, but MAY be offered in any format, for example, XML feeds of various sorts.

If the format in which a feed is offered specifies an HTML-specific autodiscovery mechanism for clients to find the feed, then this SHOULD be used in addition to the above headers.

If these headers are issued in response to a GET on a collection URL L, then the collection over which recent changes or recent additions are being listed SHOULD be identical to the collection of L.
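Both feeds can be generated from the same document records, differing only in the sort key. The Python sketch below assumes hypothetical records with "uri", "title", "created" and "modified" fields, and a server-chosen maximum length.

```python
def feed(documents, key, limit=20):
    """text/uri-list2 body of the most recent documents, newest first.

    key is "created" for Recent-Additions or "modified" for Recent-Changes.
    """
    recent = sorted(documents, key=lambda doc: doc[key], reverse=True)[:limit]
    return "".join("%s %s\n" % (doc["uri"], doc["title"]) for doc in recent)
```

With a short limit, a document created long ago but modified recently shows up in the "modified" feed and drops out of the "created" feed.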

Submit a search query

The Search-Query-URI-Template header can be provided upon a GET. It contains a URI template which SHOULD have the variable "query" and which MAY have other variables.

The URI Template MUST evaluate to a URL which can be GET'd in order to receive the results of the specified search query. The server MAY interpret the search query, and other template variables, however it wishes. The server MAY interpret the query the same way it would interpret a query typed into a search box by a user.

The search results MUST be available from that URL in format text/uri-list2. The search results MAY be made available in other representations, for example, XML feeds of various sorts.
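A client might expand the template as in the Python sketch below. It handles only plain {name} variables, which is an assumption about the template syntax, and the template URLs in the usage note are made up for illustration.

```python
from urllib.parse import quote


def expand_search_template(template, query, **other):
    """Replace each {name} in the template with its percent-encoded value."""
    variables = dict(other, query=query)
    result = template
    for name, value in variables.items():
        # safe="" also encodes "/" so a value cannot alter the URL structure.
        result = result.replace("{" + name + "}", quote(str(value), safe=""))
    return result
```

For example, expanding "http://example.com/search?q={query}" with the query "wiki creole" gives a URL that can then be fetched with Accept: text/uri-list2.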

For more advanced and well-specified search behavior, the Search-Service-URI header can be provided upon a GET. This header MUST contain a URI which indicates how the client should submit a search query, and the semantics of the query. It MAY contain a URL pointing to an OpenSearch service document, or to some other machine-readable service document for some search-related standard. It MUST NOT contain a URL for a resource that only provides human-readable instructions and/or an HTML search form intended to be operated manually.

If these headers are issued in response to a GET on a collection URL L, then the collection over which the search is conducted SHOULD be identical to the collection of L.

Profile level 2C

Read and write metadata about documents

Reading metadata

The Document-Meta-URI header can be provided upon a GET to any URL L. It contains a single URI which is associated with metadata for L.

If this URI is an HTTP URL, then the contents of that URL MUST contain metadata associated with L.

The metadata MAY be represented in any format, and it MAY be offered in multiple formats. Metadata SHOULD be offered in the Atom Entry document format, with MIME type application/atom+xml;type=entry.

Writing metadata

If this metadata can be updated, then it MUST be able to be updated via an HTTP PUT.

In case of a complete success, either a 200 OK, 201 Created, or 204 No Content should be returned.

The server MAY refuse to accept the suggested updates to a subset of fields, while accepting the updates to other fields. In this case, the new status code 207 Multi-Status MUST be returned. The body of a 207 Multi-Status MAY be empty, contrary to its use in WebDAV. However, IF the body of a 207 response is of Content-Type text/xml or application/xml, then it MUST conform to the WebDAV Multi-Status format.
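The choice of status code could be sketched in Python as follows. The field names and the notion of a set of writable fields are hypothetical. The case where every field is refused is not covered by the text above, so the sketch returns None for the status there rather than guess.

```python
def metadata_put_status(submitted, writable_fields):
    """Return (status, accepted_fields) for a metadata PUT.

    submitted:       dict of field name -> new value sent by the client.
    writable_fields: set of field names the server is willing to update.
    """
    accepted = {name: value for name, value in submitted.items()
                if name in writable_fields}
    if len(accepted) == len(submitted):
        return 200, accepted  # complete success (201 or 204 would also do)
    if accepted:
        return 207, accepted  # some fields accepted, others refused
    return None, accepted  # nothing accepted: not specified in these notes
```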

Delete a document

Use HTTP DELETE.

Create a document at a server-specified URI

Use HTTP POST on a collection URI.

Clients MAY attempt to POST a document to any collection URI, as an attempt to submit a document creation request. Clients MUST NOT assume that such an attempt succeeded unless a 201 Created is returned. Collection URIs MUST NOT have harmful effects if an arbitrary document is POSTed to them.
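A client can apply this rule with a check like the Python sketch below. The function name is hypothetical, and reading the new URI from a Location header follows the Level 1 example rather than anything stated in this section.

```python
def post_result(status, headers):
    """Return (created, uri) for the response to a POST on a collection URI."""
    if status != 201:
        return False, None  # anything but 201 Created is not a success
    for name, value in headers.items():
        if name.lower() == "location":  # header names are case-insensitive
            return True, value
    return True, None  # created, but the server did not say where
```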

Profile level 3

Get a revision history for a document or collection

The History-URI header can be provided upon a GET request to any URL L. It contains a single URL to which a GET request can be sent in order to retrieve a list of historical versions of the document at L.

The history MUST be offered in the text/uri-list2 format. The history MAY also be offered in other formats.

Historical versions MAY be omitted from the history list (for example, perhaps the server only keeps track of the last 2 revisions; or perhaps it only wishes to emit a list of "major" revisions, even though "minor" revisions are available through other means).

If a URI in the history list is a URL, then that URL must respond to a GET request with the indicated historical version of the document. However, any URI in the history list MAY be a URN.

Revisions which can no longer be retrieved

Revisions which can no longer be retrieved SHOULD be identified by some URN.

If an old revision can no longer be retrieved, the URI in the history list MAY be a URN using the scheme "history", with a URN of the following form:

urn:history:{L}:{id}

where {L} is replaced by the canonical URI of the current version of the document, and {id} is replaced by some identifier for the particular version indicated.

The client SHOULD interpret any URN in the URN scheme "history" as no longer available.
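Building and recognizing these URNs might look like the Python sketch below. The function names are hypothetical. Since {L} itself contains colons, the sketch splits on the last colon, which assumes that the version identifier contains none.

```python
PREFIX = "urn:history:"


def make_history_urn(current_uri, version_id):
    """Build urn:history:{L}:{id} for a revision that can no longer be retrieved."""
    return "%s%s:%s" % (PREFIX, current_uri, version_id)


def parse_history_urn(uri):
    """Return (current_uri, version_id), or None if uri is not a history URN."""
    if not uri.lower().startswith(PREFIX):
        return None
    # Split on the LAST colon; assumes the version id has no colon in it.
    current_uri, sep, version_id = uri[len(PREFIX):].rpartition(":")
    if not sep:
        return None
    return current_uri, version_id


def is_unavailable_revision(uri):
    """True if a history-list entry marks a revision that is no longer available."""
    return parse_history_urn(uri) is not None
```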

Appendix A: Definitions

Individual URIs (not entire servers) shall be said to support features of this protocol. A server MAY contain some URIs which support this protocol, and some which do not. However, it is not correct to say that a website or web application "supports" this protocol unless all URIs that it controls which make sense with that feature support it, and it is not correct to say that a software product "supports" this protocol unless it can be configured in such a way.

When this protocol is used to communicate the information that a URI supports a feature, that feature is said to be __advertised__ by the URI.

A __collection URI__ is any URI which ends in the character '/'.

A collection URI may be associated with a __collection of URIs__ (abbreviated __collection__), which is a set of URIs.

Appendix B: Index of methods, headers, status codes, and formats

HTTP methods

The behavior of GET is constrained in features:

The behavior of DELETE is used without alteration in:

The behavior of POST is constrained in features:

The behavior of PUT is constrained in features:

and is used without alteration in:

Headers

The behavior of the Accept header is redefined in the feature:

The Accept header is explicitly referenced in features:

The Available-Content-Types header is defined in the feature "server-advertised multiple representations of a document".

The Document-Meta-URI header is defined in feature "read and write metadata about documents".

The History-URI header is defined in feature "get a revision history for a document or collection".

The Search-Service-URI header is defined in feature "submit a search query".

The Search-Query-URI-Template header is defined in feature "submit a search query".

Formats

A comma-separated list of "possibly parameterized mime-types" is defined as a comma-separated list of:

       possibly parameterized mime-type = mime-type *( ";" parameter )
       parameter = fieldname "=" value

An example is "text/html, text/wiki-creole;processed=0".

The "processed" media type parameter is defined in feature "distinguish "source representations" from "processed representations"".

The format text/uri-list is defined in http://tools.ietf.org/html/rfc2483#section-5. The format text/uri-list2 is the same, except that each URI is optionally followed by a space, and then the rest of the line is a title associated with that URL. The titles are not URL-encoded. The titles use the charset encoding specified in the response headers.

text/uri-list2 is used in features:

Appendix C: Design rationale

Ease of implementation on the server

At the time of this writing, there is a huge proliferation of document-oriented CMS server software ("web applications" including traditional CMSs, but also blogs and wikis), but not so many client programs that know how to talk to multiple servers. Typically, each server provides its own API for editing and listing documents.

Therefore, the hurdle that a standard in this area needs to overcome is convincing the web application programmers to implement it.

Metadata agnosticism

At the time of this writing, there is not as yet a consensus among web developers about which standard format should be used to provide basic metadata about web documents. Amongst other contenders, there are RSS formats (both RDF and non-RDF), Atom, and WebDAV. Surely it would be best if a standard like this one could cut the Gordian knot and specify a single metadata format. But many web application implementors have already chosen a metadata format, and programmed the code necessary to provide their metadata in that format.

It's easy for a web application programmer to add a new HTTP header field that tells the client where to go to get a feed. But if this standard required them to provide the feed itself in a different format, that might add just enough work to turn off people from implementing it on the server-side software. As explained in the previous subsection, getting the server-side programmers on board is crucial.

So, instead we don't tell the server-side programmers which format to use. Pathetically underspecified (how silly it seems to define how to get the metadata if you don't define how you can use it once you get it), yes. Difficult for the client programmer (who has to support all popular feed formats), yes. But we have to make it easy to implement on the server-side even at the expense of additional difficulty on the client-side.

You might say that by not defining the metadata format, we're ducking out of the most important part of the job and refusing to do the hard work of standard-making. We feel that the client programmer would vastly prefer to have a standard protocol for getting metadata, and then to have to deal with a multiplicity of formats, over having no protocol at all -- and it seems to us that if the protocol does pick a format, it won't be widely implemented, and hence we would be stuck in the status quo, which is no protocol at all.

There is another reason to be metadata agnostic. Although every consumer and producer application needs to have a superficial familiarity with some of the semantics of a metadata format, applications can use libraries to actually generate and parse the format, hence most application programmers won't have to read most of the details in the specification. However, application programmers will have to know most of the other details of a protocol that their application implements. Therefore, in order to keep the spec short and non-scary so that application programmers will want to implement it, it is desirable to keep the specification of the metadata format separate from the spec for the rest of the protocol.

Why text/uri-list2?

Clearly, text/uri-list2 is almost the simplest possible way to transmit a bare list of URIs followed by a title. Why not

As explained in the previous section, for practical reasons this standard cannot mandate a metadata format. The use of text/uri-list2 ensures that all clients will at least still be able to get a list of the set of URIs in any collection, along with some sort of title. text/uri-list2 is really easy to generate (Python example: 'print("\n".join("%s %s" % (item["uri"], item["title"]) for item in pagelist))'), and it's so simple that there are almost no design choices for people to disagree on.

In addition, there are many circumstances in which metadata is not needed. For these circumstances, the time and memory saved by not generating and parsing XML-based listings can sometimes be significant.

Why HTTP PUT?

If getting widespread server-side implementation is so darn important, why use HTTP PUT? Some web application situations make it difficult for the application programmer to handle PUTs, or to handle PUTs to every URL.

The reason is that, by introducing workarounds for these situations, you make the protocol significantly more complex. Either you introduce a slightly more complicated workaround that everyone has to use (see, for example, the complexity of creating documents at a specific location in Atompub, as described in the blog entry http://blog.ianbicking.org/2009/01/11/atompub-instead-o-webdav/ ; you have to use a service document to find the document creation URL, and an extra HTTP header to tell the server where to put the new document), or you introduce a "fallback" mechanism, which makes the spec longer and scary-looking.

Either way, our guess is that you lose more people than you gain.

Service discovery

A separate service document or HTTP OPTIONS response is not needed because a client can determine whether a server supports each feature just by attempting to use it:

to see if a server supports | a client requests | and if it is supported, the response will contain | otherwise
list | GET w/ Accept: text/uri-list2 | a text/uri-list2 document | error 406, or something other than text/uri-list2 is returned
create at client-specified URI | PUT | response 201 | error
update | PUT | response 2xx | error
find multiple representations | GET | Available-Content-Types and/or Put-Accept headers | no such headers
processed representations | GET | Available-Content-Types and/or Put-Accept headers with ;processed=0 or ;processed=1 parameters | no such headers
read/write metadata | GET | Document-Meta-URI header | no such header
create at server-specified URI | POST | response 201 | ???
delete | DELETE | response 2xx | error
get revision history | GET | History-URI header | no such header
submit search query | GET | Search-Query-URI-Template and/or Search-Service-URI header | no such headers
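For the features that are discovered through GET response headers, a client could run a check like the Python sketch below. The function name and the feature labels are hypothetical; the header names are the ones listed above.

```python
# Header name (lowercased) -> feature label, following the GET rows above.
HEADER_FEATURES = {
    "available-content-types": "multiple representations",
    "put-accept": "multiple representations",
    "document-meta-uri": "read/write metadata",
    "history-uri": "revision history",
    "search-query-uri-template": "search query",
    "search-service-uri": "search query",
}


def advertised_features(headers):
    """Set of features advertised by the headers of a GET response."""
    found = set()
    for name, value in headers.items():
        key = name.lower()  # header names are case-insensitive
        if key in HEADER_FEATURES:
            found.add(HEADER_FEATURES[key])
        if key in ("available-content-types", "put-accept"):
            if "processed=" in value.replace(" ", "").lower():
                found.add("processed representations")
    return found
```

Features that are probed by attempting a PUT, POST or DELETE are left out, since checking for them changes state on the server.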

Why new HTTP headers? Why not HTML link elements?

It is our position that new HTTP headers are justified for providing generic document-handling functionality.

Both HTTP headers and HTML link elements provide a way to communicate a small amount of metadata about a document to the client, but HTML link elements can only be used with documents with a text/html MIME-type. Therefore, for generic document-handling operations, HTTP headers are preferred.

Why do you bother to say "use HTTP PUT"? That's not your feature, that's HTTP's

Other competing standards, even though they are also built on top of HTTP, specify different ways of accomplishing document creation (for example, Atompub; see http://tools.ietf.org/html/rfc5023#section-9.2). Therefore, for clarity, we must be explicit.

Why do you alter the interpretation of the Accept header?

It's complicated for the server to have to actually parse the types provided in the Accept header; the server must parse mime types, their parameters, and in addition be able to interpret media ranges. Our view is that many server implementors simply don't (and won't) bother. If the server advertises available content types, it is not too burdensome to ask the client to choose from only among the alternatives given.

Why do you say "we"?

Sorry, I'm trying to be slightly formal. This was written by Bayle Shanks.

todo

Comply with http://tools.ietf.org/html/rfc2774#section-4 ?

Comply with http://tools.ietf.org/wg/httpbis/draft-ietf-httpbis-method-registrations/ ?

Comply with http://tools.ietf.org/wg/httpbis/draft-ietf-httpbis-p2-semantics/ ?

Check compliance with http://tools.ietf.org/html/draft-ietf-httpbis-p3-payload-05

Use http://tools.ietf.org/html/draft-ietf-httpbis-p5-range-05 for partial listing of collections?

what's up with IRIs?