(Repost) RSS 2.0 Specification

(Reposted from: http://blogs.law.harvard.edu/tech/rss)

RSS at Harvard Law

Syndication technology hosted by the Berkman Center

RSS 2.0 Specification

Contents

What is RSS?

Sample files

About this document

Required channel elements

Optional channel elements

Elements of <item>

Comments

Extending RSS

Roadmap

License and authorship

What is RSS?

 


RSS is a Web content syndication format.

Its name is an acronym for Really Simple Syndication.

RSS is a dialect of XML. All RSS files must conform to the XML 1.0 specification, as published on the World Wide Web Consortium (W3C) website.

A summary of RSS version history.

At the top level, an RSS document is a <rss> element with a mandatory attribute called version, which specifies the version of RSS that the document conforms to. If it conforms to this specification, the version attribute must be 2.0.

Subordinate to the <rss> element is a single <channel> element, which contains information about the channel (metadata) and its contents.
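As a rough illustration of the structure just described, the sketch below checks the top level of a feed with Python's standard xml.etree.ElementTree module. The sample feed and the function name check_top_level are made up for this example; they are not part of the specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal feed, written for this example only.
SAMPLE = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Channel</title>
    <link>http://www.example.com/</link>
    <description>An example channel.</description>
  </channel>
</rss>"""


def check_top_level(xml_text):
    """Return a list of problems with the top-level structure (empty if none)."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "rss":
        problems.append("root element is not <rss>")
    if root.get("version") != "2.0":
        problems.append("version attribute is not 2.0")
    if len(root.findall("channel")) != 1:
        problems.append("expected exactly one <channel>")
    return problems


print(check_top_level(SAMPLE))
```

An empty list means the document has a <rss> root, a version of 2.0, and a single <channel>.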

Sample files

Here are sample files for: RSS 0.91, 0.92 and 2.0.

Note that the sample files may point to documents and services that no longer exist. The 0.91 sample was created when the 0.91 docs were written. Maintaining a trail of samples seems like a good idea.

About this document

This document represents the status of RSS as of the Fall of 2002, version 2.0.1.

It incorporates all changes and additions, starting with the basic spec for RSS 0.91 (June 2000) and includes new features introduced in RSS 0.92 (December 2000) and RSS 0.94 (August 2002).

Change notes are here.

First we document the required and optional sub-elements of <channel>; and then document the sub-elements of <item>. The final sections answer frequently asked questions, and provide a roadmap for future evolution, and guidelines for extending RSS.

Required channel elements

Here's a list of the required channel elements, each with a brief description, an example, and where available, a pointer to a more complete description.

title -- The name of the channel. It's how people refer to your service. If you have an HTML website that contains the same information as your RSS file, the title of your channel should be the same as the title of your website. Example: GoUpstate.com News Headlines

link -- The URL to the HTML website corresponding to the channel. Example: http://www.goupstate.com/

description -- Phrase or sentence describing the channel. Example: The latest news from GoUpstate.com, a Spartanburg Herald-Journal Web site.
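A feed consumer might check for the three required channel elements along these lines. This is only a sketch: the helper name missing_required is hypothetical, and treating an empty element as missing is an assumption of this example, not a rule stated in the specification.

```python
import xml.etree.ElementTree as ET

REQUIRED = ("title", "link", "description")


def missing_required(channel):
    """Return the required channel elements that are absent or empty."""
    # Assumption: an element with no text counts as missing.
    return [name for name in REQUIRED
            if not (channel.findtext(name) or "").strip()]


# A channel built from the examples above, with <description> left out.
channel = ET.fromstring(
    "<channel>"
    "<title>GoUpstate.com News Headlines</title>"
    "<link>http://www.goupstate.com/</link>"
    "</channel>"
)
print(missing_required(channel))
```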

Optional channel elements

Here's a list of optional channel elements.

language -- The language the channel is written in. This allows aggregators to group all Italian language sites, for example, on a single page. Netscape published a list of allowable values for this element; you may also use values defined by the W3C. Example: en-us

copyright -- Copyright notice for content in the channel. Example: Copyright 2002, Spartanburg Herald-Journal

managingEditor -- Email address for person responsible for editorial content. Example: geo@herald.com (George Matesky)

webMaster -- Email address for person responsible for technical issues relating to channel. Example: betty@herald.com (Betty Guernsey)

pubDate -- The publication date for the content in the channel. For example, the New York Times publishes on a daily basis, so the publication date flips once every 24 hours. That's when the pubDate of the channel changes. All date-times in RSS conform to the Date and Time Specification of RFC 822, with the exception that the year may be expressed with two characters or four characters (four preferred). Example: Sat, 07 Sep 2002 00:00:01 GMT

lastBuildDate -- The last time the content of the channel changed. Example: Sat, 07 Sep 2002 09:42:31 GMT

category -- Specify one or more categories that the channel belongs to. Follows the same rules as the <item>-level category element. Example: <category>Newspapers</category>

generator -- A string indicating the program used to generate the channel. Example: MightyInHouse Content System v2.3

docs -- A URL that points to the documentation for the format used in the RSS file. It's probably a pointer to this page. It's for people who might stumble across an RSS file on a Web server 25 years from now and wonder what it is. Example: http://blogs.law.harvard.edu/tech/rss

cloud -- Allows processes to register with a cloud to be notified of updates to the channel, implementing a lightweight publish-subscribe protocol for RSS feeds. See the <cloud> section below. Example: <cloud domain="rpc.sys.com" port="80" path="/RPC2" registerProcedure="pingMe" protocol="soap"/>

ttl -- ttl stands for time to live. It's a number of minutes that indicates how long a channel can be cached before refreshing from the source. See the <ttl> section below. Example: <ttl>60</ttl>

image -- Specifies a GIF, JPEG or PNG image that can be displayed with the channel. See the <image> section below.

rating -- The PICS rating for the channel.

textInput -- Specifies a text input box that can be displayed with the channel. See the <textInput> section below.

skipHours -- A hint for aggregators telling them which hours they can skip.

skipDays -- A hint for aggregators telling them which days they can skip.
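The date format used by pubDate and lastBuildDate can be read with Python's standard email.utils module, as sketched below. Note that this module follows RFC 2822, the successor to RFC 822, and it expands two-digit years using its own rule; the function name parse_rss_date is hypothetical.

```python
from email.utils import parsedate_to_datetime


def parse_rss_date(text):
    """Parse an RFC 822 style date-time into a timezone-aware datetime."""
    return parsedate_to_datetime(text)


# The same instant written with a four-digit and a two-digit year.
four_digit = parse_rss_date("Sat, 07 Sep 2002 00:00:01 GMT")
two_digit = parse_rss_date("Sat, 07 Sep 02 00:00:01 GMT")

print(four_digit.isoformat())
print(four_digit == two_digit)
```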

<image> sub-element of <channel>

<image> is an optional sub-element of <channel>, which contains three required and three optional sub-elements.

<url> is the URL of a GIF, JPEG or PNG image that represents the channel.

<title> describes the image; it's used in the ALT attribute of the HTML <img> tag when the channel is rendered in HTML.

<link> is the URL of the site; when the channel is rendered, the image is a link to the site. (Note: in practice the image <title> and <link> should have the same value as the channel's <title> and <link>.)

Optional elements include <width> and <height>, numbers, indicating the width and height of the image in pixels. <description> contains text that is included in the TITLE attribute of the link formed around the image in the HTML rendering.

Maximum value for width is 144, default value is 88.

Maximum value for height is 400, default value is 31.
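The defaults and maximums above could be applied as in the following sketch. The function name image_size is hypothetical, and rejecting out-of-range values (rather than clamping them) is an assumption of this example; the specification does not say what a reader should do with oversized values.

```python
MAX_WIDTH, DEFAULT_WIDTH = 144, 88
MAX_HEIGHT, DEFAULT_HEIGHT = 400, 31


def image_size(width=None, height=None):
    """Return (width, height) in pixels, applying the defaults and maximums."""
    w = DEFAULT_WIDTH if width is None else int(width)
    h = DEFAULT_HEIGHT if height is None else int(height)
    # Assumption: values outside the allowed range are treated as errors.
    if not 0 < w <= MAX_WIDTH:
        raise ValueError("width must be between 1 and %d" % MAX_WIDTH)
    if not 0 < h <= MAX_HEIGHT:
        raise ValueError("height must be between 1 and %d" % MAX_HEIGHT)
    return w, h


print(image_size())             # neither <width> nor <height> given
print(image_size("120", "60"))  # values as they would appear in element text
```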

<cloud> sub-element of <channel>

<cloud> is an optional sub-element of <channel>.

It specifies a web service that supports the rssCloud interface which can be implemented in HTTP-POST, XML-RPC or SOAP 1.1.

Its purpose is to allow processes to register with a cloud to be notified of updates to the channel, implementing a lightweight publish-subscribe protocol for RSS feeds.

<cloud domain="rpc.sys.com" port="80" path="/RPC2" registerProcedure="myCloud.rssPleaseNotify" protocol="xml-rpc" />

In this example, to request notification on the channel it appears in, you would send an XML-RPC message to rpc.sys.com on port 80, with a path of /RPC2. The procedure to call is myCloud.rssPleaseNotify.
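To make the example concrete, the sketch below reads the attributes of the <cloud> element and assembles the endpoint described above. It assumes a plain http:// scheme, which the element itself does not state, and it stops short of sending anything, because the parameters of the notification call are defined in the rssCloud documentation rather than here. The function name cloud_endpoint is hypothetical.

```python
import xml.etree.ElementTree as ET


def cloud_endpoint(cloud):
    """Return (url, procedure, protocol) for a <cloud> element."""
    # Assumption: the service is reached over plain HTTP.
    url = "http://{}:{}{}".format(
        cloud.get("domain"), cloud.get("port"), cloud.get("path"))
    return url, cloud.get("registerProcedure"), cloud.get("protocol")


cloud = ET.fromstring(
    '<cloud domain="rpc.sys.com" port="80" path="/RPC2" '
    'registerProcedure="myCloud.rssPleaseNotify" protocol="xml-rpc" />'
)
print(cloud_endpoint(cloud))
```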

A full explanation of this element and the rssCloud interface is here.

<ttl> sub-element of <channel>

<ttl> is an optional sub-element of <channel>.

ttl stands for time to live. It's a number of minutes that indicates how long a channel can be cached before refreshing from the source. This makes it possible for RSS sources to be managed by a file-sharing network such as Gnutella.

Example: <ttl>60</ttl>
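An aggregator could honor ttl with a check like the one below. This is a minimal sketch; the function name needs_refresh and the way the fetch time is stored are assumptions of the example.

```python
from datetime import datetime, timedelta


def needs_refresh(fetched_at, ttl_minutes, now):
    """True once the cached copy is at least ttl_minutes old."""
    return now - fetched_at >= timedelta(minutes=int(ttl_minutes))


fetched_at = datetime(2002, 9, 7, 9, 0, 0)
print(needs_refresh(fetched_at, "60", datetime(2002, 9, 7, 9, 30, 0)))
print(needs_refresh(fetched_at, "60", datetime(2002, 9, 7, 10, 0, 0)))
```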

<textInput> sub-element of <channel>

A channel may optionally contain a <textInput> sub-element, which contains four required sub-elements.

<title> -- The label of the Submit button in the text input area.

<description> -- Explains the text input area.

<name> -- The name of the text object in the text input area.

<link> -- The URL of the CGI script that processes text input requests.

The purpose of the <textInput> element is something of a mystery. You can use it to specify a search engine box. Or to allow a reader to provide feedback. Most aggregators ignore it.

--------------------------------------------------------------------------------

Elements of <item>

A channel may contain any number of <item>s. An item may represent a "story", much like a story in a newspaper or magazine; if so, its description is a synopsis of the story, and the link points to the full story. An item may also be complete in itself; if so, the description contains the text (entity-encoded HTML is allowed; see examples), and the link and title may be omitted. All elements of an item are optional; however, at least one of title or description must be present.

title -- The title of the item. Example: Venice Film Festival Tries to Quit Sinking

link -- The URL of the item. Example: http://nytimes.com/2004/12/07FEST.html

description -- The item synopsis. Example: Some of the most heated chatter at the Venice Film Festival this week was about the way that the arrival of the stars at the Palazzo del Cinema was being staged.

author -- Email address of the author of the item. See the <author> section below.

category -- Includes the item in one or more categories. See the <category> section below.

comments -- URL of a page for comments relating to the item. See the <comments> section below.

enclosure -- Describes a media object that is attached to the item. See the <enclosure> section below.

guid -- A string that uniquely identifies the item. See the <guid> section below.

pubDate -- Indicates when the item was published. See the <pubDate> section below.

source -- The RSS channel that the item came from. See the <source> section below.
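The one hard requirement on an item, that at least one of title or description is present, can be checked as in this sketch (the function name item_is_valid is hypothetical):

```python
import xml.etree.ElementTree as ET


def item_is_valid(item):
    """An item needs at least one of <title> or <description>."""
    return (item.find("title") is not None
            or item.find("description") is not None)


with_title = ET.fromstring(
    "<item><title>Venice Film Festival Tries to Quit Sinking</title></item>")
link_only = ET.fromstring(
    "<item><link>http://nytimes.com/2004/12/07FEST.html</link></item>")

print(item_is_valid(with_title))
print(item_is_valid(link_only))
```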

<source> sub-element of <item>

<source> is an optional sub-element of <item>.

Its value is the name of the RSS channel that the item came from, derived from its <title>. It has one required attribute, url, which links to the XMLization of the source.

<source url="http://www.tomalak.org/links2.xml">Tomalak's Realm</source>

The purpose of this element is to propagate credit for links, to publicize the sources of news items. It can be used in the Post command of an aggregator. It should be generated automatically when forwarding an item from an aggregator to a weblog authoring tool.

<enclosure> sub-element of <item>

<enclosure> is an optional sub-element of <item>.

It has three required attributes. url says where the enclosure is located, length says how big it is in bytes, and type says what its type is, a standard MIME type.

The url must be an http url.

<enclosure url="http://www.scripting.com/mp3s/weatherReportSuite.mp3" length="12216320" type="audio/mpeg" />
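A reader might validate the three required attributes roughly as follows. The function name parse_enclosure is hypothetical, and the check accepts only the literal http scheme, which is a strict reading of the rule above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def parse_enclosure(element):
    """Return the enclosure attributes as a dict, or raise ValueError."""
    missing = [name for name in ("url", "length", "type")
               if element.get(name) is None]
    if missing:
        raise ValueError("missing required attribute(s): " + ", ".join(missing))
    url = element.get("url")
    if urlparse(url).scheme != "http":
        raise ValueError("enclosure url must be an http url")
    # length is the size in bytes, so it is converted to an integer.
    return {"url": url,
            "length": int(element.get("length")),
            "type": element.get("type")}


enclosure = ET.fromstring(
    '<enclosure url="http://www.scripting.com/mp3s/weatherReportSuite.mp3" '
    'length="12216320" type="audio/mpeg" />'
)
info = parse_enclosure(enclosure)
print(info["length"], info["type"])
```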

A use-case narrative for this element is here.

<category> sub-element of <item>

<category> is an optional sub-element of <item>.

It has one optional attribute, domain, a string that identifies a categorization taxonomy.

The value of the element is a forward-slash-separated string that identifies a hierarchic location in the indicated taxonomy. Processors may establish conventions for the interpretation of categories. Two examples are provided below:

<category>Grateful Dead</category>

<category domain="http://www.fool.com/cusips">MSFT</category>

You may include as many category elements as you need to, for different domains, and to have an item cross-referenced in different parts of the same domain.

<pubDate> sub-element of <item>

<pubDate> is an optional sub-element of <item>.

Its value is a date, indicating when the item was published. If it's a date in the future, aggregators may choose to not display the item until that date.

<pubDate>Sun, 19 May 2002 15:21:36 GMT</pubDate>
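The optional "hold until the publication date" behavior might look like this in an aggregator. It is a sketch only; the function name is_displayable is hypothetical, and the comparison time is passed in explicitly so the example does not depend on the current clock.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_displayable(pub_date_text, now):
    """True if the item's pubDate is not in the future relative to now."""
    return parsedate_to_datetime(pub_date_text) <= now


pub_date = "Sun, 19 May 2002 15:21:36 GMT"
before = datetime(2002, 5, 19, 12, 0, 0, tzinfo=timezone.utc)
after = datetime(2002, 5, 19, 16, 0, 0, tzinfo=timezone.utc)

print(is_displayable(pub_date, before))
print(is_displayable(pub_date, after))
```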

<guid> sub-element of <item>

<guid> is an optional sub-element of <item>.

guid stands for globally unique identifier. It's a string that uniquely identifies the item. When present, an aggregator may choose to use this string to determine if an item is new.

<guid>http://some.server.com/weblogItem3207</guid>

There are no rules for the syntax of a guid. Aggregators must view them as a string. It's up to the source of the feed to establish the uniqueness of the string.

If the guid element has an attribute named "isPermaLink" with a value of true, the reader may assume that it is a permalink to the item, that is, a url that can be opened in a Web browser, that points to the full item described by the <item> element. An example:

<guid isPermaLink="true">http://inessential.com/2002/09/01.php#a2</guid>

isPermaLink is optional, its default value is true. If its value is false, the guid may not be assumed to be a url, or a url to anything in particular.
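The two rules above, guid as an opaque string for spotting new items and isPermaLink defaulting to true, can be sketched as follows. The function names guid_info and new_items are hypothetical, and passing through items that have no guid is an assumption of this example.

```python
import xml.etree.ElementTree as ET


def guid_info(item):
    """Return (guid string or None, whether it may be treated as a permalink)."""
    guid = item.find("guid")
    if guid is None:
        return None, False
    # isPermaLink is optional and defaults to true.
    return guid.text, guid.get("isPermaLink", "true") == "true"


def new_items(items, seen):
    """Return items whose guid has not been seen before; update seen in place."""
    fresh = []
    for item in items:
        value, _ = guid_info(item)
        if value is not None:
            if value in seen:
                continue
            seen.add(value)
        # Assumption: items without a guid are always passed through.
        fresh.append(item)
    return fresh


channel = ET.fromstring(
    "<channel>"
    "<item><guid>http://some.server.com/weblogItem3207</guid></item>"
    '<item><guid isPermaLink="false">tag-4403</guid></item>'
    "</channel>"
)
items = channel.findall("item")
seen = set()
print(len(new_items(items, seen)))  # first pass over the feed
print(len(new_items(items, seen)))  # second pass over the same feed
```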

<comments> sub-element of <item>

<comments> is an optional sub-element of <item>.

If present, it is the url of the comments page for the item.

<comments>http://ekzemplo.com/entry/4403/comments</comments>

More about comments here.

<author> sub-element of <item>

<author> is an optional sub-element of <item>.

It's the email address of the author of the item. For newspapers and magazines syndicating via RSS, the author is the person who wrote the article that the <item> describes. For collaborative weblogs, the author of the item might be different from the managing editor or webmaster. For a weblog authored by a single individual it would make sense to omit the <author> element.

<author>lawyer@boyer.net (Lawyer Boyer)</author>

Comments

RSS places restrictions on the first non-whitespace characters of the data in <link> and <url> elements. The data in these elements must begin with an IANA-registered URI scheme, such as http://, https://, news://, mailto: and ftp://. Prior to RSS 2.0, the specification only allowed http:// and ftp://, however, in practice other URI schemes were in use by content developers and supported by aggregators. Aggregators may have limits on the URI schemes they support. Content developers should not assume that all aggregators support all schemes.

In RSS 0.91, various elements are restricted to 500 or 100 characters. There can be no more than 15 <item>s in a 0.91 <channel>. There are no string-length or XML-level limits in RSS 0.92 and greater. Processors may impose their own limits, and generators may have preferences that say no more than a certain number of <item>s can appear in a channel, or that strings are limited in length.

In RSS 2.0, a provision is made for linking a channel to its identifier in a cataloging system, using the channel-level category feature, described above. For example, to link a channel to its Syndic8 identifier, include a category element as a sub-element of <channel>, with domain "Syndic8", and value the identifier for your channel in the Syndic8 database. The appropriate category element for Scripting News would be <category domain="Syndic8">1765</category>.

A frequently asked question about <guid>s is how do they compare to <link>s. Aren't they the same thing? Yes, in some content systems, and no in others. In some systems, <link> is a permalink to a weblog item. However, in other systems, each <item> is a synopsis of a longer article, <link> points to the article, and <guid> is the permalink to the weblog entry. In all cases, it's recommended that you provide the guid, and if possible make it a permalink. This enables aggregators to not repeat items, even if there have been editing changes.

If you have questions about the RSS 2.0 format, please post them on the RSS2-Support mail list, hosted by Sjoerd Visscher. This is not a debating list, but serves as a support resource for users, authors and developers who are creating and using content in RSS 2.0 format.

Extending RSS

RSS originated in 1999, and has strived to be a simple, easy to understand format, with relatively modest goals. After it became a popular format, developers wanted to extend it using modules defined in namespaces, as specified by the W3C.

RSS 2.0 adds that capability, following a simple rule. An RSS feed may contain elements not described on this page, only if those elements are defined in a namespace.

The elements defined in this document are not themselves members of a namespace, so that RSS 2.0 can remain compatible with previous versions in the following sense -- a version 0.91 or 0.92 file is also a valid 2.0 file. If the elements of RSS 2.0 were in a namespace, this constraint would break, a version 0.9x file would not be a valid 2.0 file.
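The sketch below shows how a reader can tell core elements from namespaced extensions. The Dublin Core creator element and the sample feed are illustrations chosen for this example; the specification above does not name any particular module.

```python
import xml.etree.ElementTree as ET

# Dublin Core is used here only as an example of a namespaced module.
DC = "http://purl.org/dc/elements/1.1/"

FEED = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example</title>
    <link>http://www.example.com/</link>
    <description>Example channel</description>
    <item>
      <title>First entry</title>
      <dc:creator>Jane Doe</dc:creator>
    </item>
  </channel>
</rss>"""

item = ET.fromstring(FEED).find("channel/item")

# ElementTree writes namespaced tags as "{namespace-uri}localname",
# while the core RSS elements have no namespace at all.
core = [child.tag for child in item if not child.tag.startswith("{")]
extensions = [child.tag for child in item if child.tag.startswith("{")]
creator = item.findtext("{%s}creator" % DC)

print(core, extensions, creator)
```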

Roadmap

RSS is by no means a perfect format, but it is very popular and widely supported. Having a settled spec is something RSS has needed for a long time. The purpose of this work is to help it become an unchanging thing, to foster growth in the market that is developing around it, and to clear the path for innovation in new syndication formats. Therefore, the RSS spec is, for all practical purposes, frozen at version 2.0.1. We anticipate possible 2.0.2 or 2.0.3 versions, etc. only for the purpose of clarifying the specification, not for adding new features to the format. Subsequent work should happen in modules, using namespaces, and in completely new syndication formats, with new names.

License and authorship

RSS 2.0 is offered by the Berkman Center for Internet & Society at Harvard Law School under the terms of the Attribution/Share Alike Creative Commons license. The author of this document is Dave Winer, founder of UserLand software, and fellow at Berkman Center.

 

Unless otherwise labeled by its originating author, the content found on this site is made available under the terms of an Attribution/Share Alike Creative Commons license, with the exception that no rights are granted -- since they are not ours to grant -- in any logo, graphic design, trademarks or trade names, including the Harvard name. Last update: Sunday, January 30, 2005 at 6:14:58 PM. Webmaster: Rogers Cadenhead.

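Findings from comparing cnzz visit statistics with my own site's access logs

  For example, my site's log4j records show 491 guest-account logins between early yesterday morning (02:23:12,484) and early this morning (02:14:13,140), but cnzz's statistics for yesterday show only about 20 unique IPs and unique visitors, each viewing about 4 to 5 pages. That is a 25-fold difference.

It looks like 80% of the logins come from search engines, and cnzz must be excluding search-engine visits. My log4j log also shows runs of guest logins only a few seconds apart, which are probably search engines. What I don't understand is why a search engine would log in over and over, since a single login should give it access to everything on the site. Can search engines not keep a session?

  In addition, the ads I set up through google adsense get about 190 impressions per 24 hours. Does that mean about 190 pages were viewed in a day? That is only about twice cnzz's figure of 100 page views. So it seems google adsense does not count search-engine visits either.

  There is too much I don't know. All I can do is note it down here.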

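Implementing two-way binding between learndiary and my blog

The idea to implement: synchronize a user's posts on other sites based on an rss address the user provides (http://www.learndiary.com/disDiaryContentAction.do?searchDiaryID=1381&goalID=1381&naviStr=a10a2313).

If rss readers can do this, why can't learndiary?

www.43things.com has already implemented one-way binding from itself to several blogs; see the excerpt below (from: http://www.43things.com/about/view/faq):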

...

Can I post to my blog from 43 Things?

You bet, but first you’ll have to configure an external blog (do that here). You’ll be guided through the set-up process and at the end you can try a test post to make sure everything works.

Once that’s done, you can have any entry on 43 Things show up on your blog as well.

We currently support the following blogs:

Blogger

Typepad

Movable Type

Live Journal

Word Press

You can set up as many blogs as you like. We haven’t made a way for you to delete one yet. Sorry.

Can I post to 43 Things from my blog?

Not yet. But we are working on our API now. More details soon. Check out our API and recommend it to your favorite blogging software developer.

...

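Although www.43things.com has not implemented binding from a blog back to itself, I seem to have seen this done somewhere online. And if an rss reader can do it, why can't a website?

The goal is two-way binding between learndiary and my blog on matrix (titled "Building learndiary": http://blog.matrix.org.cn/page/littlebat), with no deadline, working until it is done.

Planned steps:

1. Add rss2.0 support to learndiary;

2. Bind from learndiary to the blog;

3. Bind from the blog to learndiary;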
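As a rough idea of what step 1 involves, here is a sketch that renders a list of diary entries as an RSS 2.0 feed. It is written in Python purely for illustration and is not learndiary's code; the function name build_feed, the entry fields, and the example.com URLs are all hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_feed(title, link, description, entries):
    """Render entries (dicts with title, link, published) as RSS 2.0 text."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for entry in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "link").text = entry["link"]
        # The entry URL doubles as the guid, so it acts as a permalink.
        ET.SubElement(item, "guid").text = entry["link"]
        # usegmt=True requires a datetime whose tzinfo is timezone.utc.
        ET.SubElement(item, "pubDate").text = format_datetime(
            entry["published"], usegmt=True)
    return ET.tostring(rss, encoding="unicode")


entries = [{
    "title": "Example diary entry",
    "link": "http://www.example.com/diary/1",
    "published": datetime(2006, 4, 17, tzinfo=timezone.utc),
}]
feed_xml = build_feed("Example diary", "http://www.example.com/",
                      "Entries from an example diary.", entries)
print(feed_xml)
```

Steps 2 and 3 would then need something on each side to read the other side's feed and skip entries it has already imported, for example by comparing guid values.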

Two issues have been filed for this goal in the development community's issue tracker:

1. issue 11: binding learndiary and my roller's blog in both two directions (http://learndiary.tigris.org/issues/show_bug.cgi?id=11)

2. issue 12: add rss2.0 feed support into learndiary (http://learndiary.tigris.org/issues/show_bug.cgi?id=12)

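learndiary's mail system cannot send bulk email

  I tried twice to send a system announcement to all registered users. The first time I used the webmaster@learndiary.com account. The log showed that every message was sent, but the guest account I had registered never received it. Probably none of the users after roughly the 200th received it either. Of the messages sent to the first 200 users, nearly 100 were not delivered: they were either rejected or failed to send.

  The second time, this morning, I used the learndiary@tom.com account, and the result was even worse. Sending stopped after 20 messages, and the log also shows only 20 sent.

  I can now be sure that learndiary's mail system cannot do the job it is supposed to do. I don't know where the problem lies, or what approach would guarantee that every message to a valid address is sent successfully. This is a problem that needs solving.

  But it is not the most urgent problem right now, so I'm just noting it here for the moment.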

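Preparing to apply for google adsense; strangely, learndiary's google pr=4?

  I'm broke; I can't even pay for my internet connection. Today I applied for a google adsense account. Let's see whether it gets approved.

  I find it strange that learndiary's google pr value is 4, when the well-known java portal sites are only at 5 to 6. Whatever learndiary's pr value is, the site has not achieved what I hoped for, so the number is of no use.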

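A caveat when using static constants

  There are two *.java files: one defines static constants, and the other uses them. The file that defines the constants has different contents on my local machine and on the hosted server; the file that uses them is the same in both places. I updated the file that uses the constants locally, and after uploading it to the hosted server I found that it was using the local constant values.

  Decompiling the *.class file of the class that uses the constants showed that every reference to a static constant had been replaced by the local constant value.

  It turns out that static constants in java are fixed at compile time: the compiler copies the value of a compile-time constant (a static final primitive or String initialized with a constant expression) into every class that uses it, rather than looking it up at run time. So when the constants change, the classes that use them have to be recompiled against the new values. It seems I need to study the basics of how java runs.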

Strange behavior when using tags

       <html:select property="parentID" name="oldAdvice">

         <html:options collection="processGoalsList" property="articleID" labelProperty="articleName"/>

       </html:select>

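  The code above displays the objects in a List of same-typed objects as a drop-down list, with the object "oldAdvice" selected by default. The documentation describes this usage clearly, yet twice now I have run into strange behavior that I still cannot explain: I wrote the code the correct way, but at run time it never produced the right result. I don't know whether the problem is with eclipse or with tomcat.

  From now on, to be safe, whenever I have written code I'm not familiar with, before running it I will: 1. delete the application's work directory; 2. recompile the sources; 3. restart the tomcat server.

  Surely nothing can go wrong after that?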

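learndiary site survey about blogs

    (Note: moved to the documents area by littlebat on April 17, 2006.)

  First of all, thank you very much for visiting the learndiary site.

  Here is a short survey about blogs (also known as weblogs or online diaries):

  How many blogs have you signed up for online? Roughly what percentage of your blog entries have received replies?

  My sample answer: 4, 1%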