[wordup] RSS for all major newspapers and websites

Tue Oct 15 00:07:19 EDT 2002

A little off-topic for wordup, but I thought there might be some people
here who would be interested in helping out.

Adam.

---------- Forwarded message ----------
Date: 18 Sep 2002 01:31:21 -0700
From: Kevin A. Burton <burton at openprivacy.org>
Reply-To: p2pj at infoanarchy.org
To: p2pj at infoanarchy.org
Subject: [P2PJ] RSS for all major newspapers and websites

OK...

Technically this isn't P2P Journalism but it certainly leads us down the
path to a more decentralized network.

I am working on a tool (as part of my Reptile [2] project) that can take
any website (CNN, NYTimes, Pravda, People's Daily (a Chinese newspaper),
etc.) and produce a *high*-quality RSS 1.0 feed, with verbose descriptions,
using the mod_content [3] module.
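For anyone who hasn't looked at mod_content yet: an RSS 1.0 item carries the full article body in a `content:encoded` element, next to the usual title, link and description. The Java sketch below renders one such item. The class and method names are hypothetical (this is not Reptile's actual code), and it assumes the enclosing `rdf:RDF` element already declares the RSS 1.0, `rdf` and `content` namespaces.

```java
/**
 * Hypothetical helper: renders one RSS 1.0 <item> whose full HTML body
 * goes into a mod_content <content:encoded> element.
 */
final class Rss10Item {
    private Rss10Item() {}

    /** Escapes the five XML special characters for element text and attribute values. */
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /**
     * Builds an <item>. Assumes the enclosing <rdf:RDF> declares
     * xmlns="http://purl.org/rss/1.0/", the rdf prefix, and
     * xmlns:content="http://purl.org/rss/1.0/modules/content/".
     */
    static String item(String link, String title, String summary, String bodyHtml) {
        String url = escapeXml(link);
        return "<item rdf:about=\"" + url + "\">\n"
            + "  <title>" + escapeXml(title) + "</title>\n"
            + "  <link>" + url + "</link>\n"
            + "  <description>" + escapeXml(summary) + "</description>\n"
            // The HTML body is entity-escaped rather than wrapped in CDATA,
            // so a stray "]]>" in an article cannot break the feed.
            + "  <content:encoded>" + escapeXml(bodyHtml) + "</content:encoded>\n"
            + "</item>\n";
    }

    public static void main(String[] args) {
        System.out.print(item("http://www.example.com/story?id=1&page=2",
            "Example headline", "Short summary.", "<p>Full article text.</p>"));
    }
}
```

Escaping the body instead of using CDATA is just one choice; either form is legal XML for `content:encoded`.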

For the most part it is functional.

Here is what I currently have:

http://reptile.openprivacy.org/rss-content-parse.html

For the record: yes, I am concerned about the legality of these URLs.
At least in the US they could be illegal under the DMCA.  I need to
talk to the EFF about this and see what they say.

What this enables is a system of RSS aggregators that can monitor
newspapers around the world for the latest articles, so readers can view
the full content within their favorite aggregator.

For most people this greatly increases their ability to manage news.
For example, I can just log in in the morning, turn on my aggregator, and
see the 50-100 most recent articles since I went to sleep.  I don't have to
check all 100 of my websites, and my aggregator supports advanced
functionality such as offline mirroring and export to my PDA [1].

The major problem is that we need people to help develop filters for these
sites.  The filters are just URLs that a Java Servlet uses to figure out
how to filter the website.

For example:

http://reptile.peerfear.org/reptile/servlet/sitefilter/http/www.cnn.com/

Works with CNN.

A more complex one:

http://reptile.peerfear.org/reptile/servlet/sitefilter/http/www.sfgate.com/base/cgi-bin/article.cgi

Works with the San Francisco Chronicle.
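Judging only from the two example URLs, the filter address seems to be the servlet base followed by the target's scheme, host and path. Here is a Java sketch of that mapping in both directions. The class name is hypothetical, the layout is inferred from just these two examples, and ports and query strings are deliberately not handled.

```java
import java.net.URI;

/*
 * Hypothetical helper for the sitefilter URL layout seen in the examples:
 *   BASE/scheme/host/path
 * Ports and query strings are ignored (an assumption, not Reptile's rule).
 */
final class SiteFilterUrl {
    // Servlet base copied from the example URLs in this post.
    static final String BASE = "http://reptile.peerfear.org/reptile/servlet/sitefilter";

    private SiteFilterUrl() {}

    /** Maps a target page URL to the filter URL that would serve its feed. */
    static String forTarget(String target) {
        URI uri = URI.create(target);
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";  // treat "http://www.cnn.com" like "http://www.cnn.com/"
        }
        return BASE + "/" + uri.getScheme() + "/" + uri.getHost() + path;
    }

    /** Reverses the mapping, given the part of the path after ".../sitefilter". */
    static String targetFromPathInfo(String pathInfo) {
        // "/http/www.sfgate.com/base/x.cgi" -> ["", "http", "www.sfgate.com", "base/x.cgi"]
        String[] parts = pathInfo.split("/", 4);
        if (parts.length < 3 || parts[1].isEmpty() || parts[2].isEmpty()) {
            throw new IllegalArgumentException("expected /scheme/host[/path]: " + pathInfo);
        }
        String rest = parts.length == 4 ? "/" + parts[3] : "/";
        return parts[1] + "://" + parts[2] + rest;
    }

    public static void main(String[] args) {
        String filterUrl = forTarget("http://www.sfgate.com/base/cgi-bin/article.cgi");
        System.out.println(filterUrl);
        System.out.println(targetFromPathInfo(filterUrl.substring(BASE.length())));
    }
}
```

If the servlet is mapped at `.../sitefilter`, the tail passed to `targetFromPathInfo` would be what `HttpServletRequest.getPathInfo()` returns, though I haven't checked how Reptile actually reads it.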

Right now we have about 7 of our 12 major targets supported.  Once I fix
a few bugs, the others should work just fine.

If this sounds cool and you know a little about HTML and regexp, we could
*really* use your help.  This will only scale if we have sponsors
supporting these URLs.  Ideally we would get a few hundred RSS channels
out of this, so we need volunteers; maybe 6-12 would be ideal.
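Since a filter is mostly HTML plus regexp work, here is a rough Java sketch of the kind of thing a volunteer might write: pull the links out of a front page and keep only those whose href matches a site-specific pattern. The class name, the anchor-matching regex and the date-style article pattern are all assumptions for illustration. Reptile's real filter format may look quite different, and a regex will never be as robust as a real HTML parser on messy pages.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical regexp-based site filter: finds article links on a front page. */
final class RegexSiteFilter {
    // Matches <a ... href="..." ...>text</a>; deliberately simple, not a full HTML parser.
    private static final Pattern ANCHOR = Pattern.compile(
        "<a\\s[^>]*?href\\s*=\\s*\"([^\"]+)\"[^>]*>(.*?)</a>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Used to strip inline markup such as <b> from the link text.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    private final Pattern articleHref;

    /** articleHrefRegex is the site-specific part a volunteer would supply. */
    RegexSiteFilter(String articleHrefRegex) {
        this.articleHref = Pattern.compile(articleHrefRegex);
    }

    /** Returns href -> headline for matching links, in page order, first occurrence wins. */
    Map<String, String> extract(String html) {
        Map<String, String> found = new LinkedHashMap<>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            String href = m.group(1);
            String title = TAG.matcher(m.group(2)).replaceAll("").trim();
            if (articleHref.matcher(href).find() && !title.isEmpty()) {
                found.putIfAbsent(href, title);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String page = "<a href=\"/about.html\">About us</a>"
            + "<a class=\"hl\" href=\"/2002/09/17/story1.html\"><b>First headline</b></a>"
            + "<a href=\"/2002/09/17/story2.html\">Second headline</a>";
        // Assumed example pattern: article URLs start with /yyyy/mm/dd/.
        RegexSiteFilter filter = new RegexSiteFilter("^/\\d{4}/\\d{2}/\\d{2}/");
        filter.extract(page).forEach((href, title) -> System.out.println(title + " -> " + href));
    }
}
```

The relative hrefs it returns would still need to be resolved against the page URL (for example with `java.net.URI.resolve`) before going into an RSS item.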

If you want to help, just sign up to the Reptile mailing list [4].

PS. Could anyone outside of the US help me host a mirror of these URLs?
At the very minimum the source code should always be open.

PS2. It would also help if you linked to this.  We will only find the
volunteers if they hear about it.  The more press this gets the better!

1. http://www.peerfear.org/offnews
2. http://reptile.openprivacy.org
3. http://web.resource.org/rss/1.0/modules/content/
4. http://mail.openprivacy.org/mailman/listinfo/reptile/

--
Kevin A. Burton ( burton at apache.org, burton at openprivacy.org, burton at peerfear.org )
             Location - San Francisco, CA, Cell - 415.595.9965
        Jabber - burtonator at jabber.org,  Web - http://www.peerfear.org/
        GPG fingerprint: 4D20 40A0 C734 307E C7B4  DCAA 0303 3AC5 BD9D 7C4D
         IRC - openprojects.net #infoanarchy | #p2p-hackers | #reptile

To fight and conquer in all your battles is not supreme excellence; supreme
excellence consists in breaking the enemy's resistance without fighting.
    - Sun Tzu, 300 B.C.