Weblog entry #8 for JulienV

Aggregating external articles
Posted by JulienV on Sun 11 Mar 2007 at 19:50
I am currently preparing an article for my own website, and was thinking that the subject could interest readers of d-a.org. I already have some (interesting?) articles that could be submitted here.

However, I am reluctant to post them on d-a.org, since any change would have to be made in two places (my own website and d-a.org).

I know Steve is looking for more submissions: why not allow people to aggregate here the articles they write for their own websites? I mean, if my CMS is able to provide an XML feed for the articles I write for my pages, they could quite simply appear here, removing the need to change the submitted articles in two places.

This would allow more submissions here, while preserving our own webspace ;-)

What do you guys think of it?

Cheers,
Julien

PS: the same could apply to weblogs (although less interesting than articles, this would allow more general stuff to be posted here and appear on planet.d-a.org).

 

Comments on this Entry

Posted by ajt (204.193.xx.xx) on Mon 12 Mar 2007 at 12:51
It's an interesting and challenging suggestion. If we publish articles on our own sites in XML, we can use XSLT (or something similar) to generate RSS feeds and rendered XHTML. That way both sites (source and aggregator) can render the same XML in their own XHTML/CSS style. The aggregator site would only need to know when the source XML has been updated, then download the fresh XML and transform it locally to house style.

It's the next logical step in syndication: rather than just using RSS feeds for little snippets, publish the whole approved articles!

I think it's an excellent idea, it just may be a challenge to implement.

--
"It's Not Magic, It's Work"
Adam

Posted by JulienV (90.13.xx.xx) on Mon 12 Mar 2007 at 16:55
To be truly honest, I haven't thought about technical aspects of this proposal. I thought this could be done the same way Planet does.

You could set up a special feed for articles meant to be published on d-a.org, and everything would be done automatically. Or Steve might prefer the user to submit a link to a feed which contains only one article, for review before accepting it.

I am not too familiar with RSS & Co. but thought it could be done quite easily, provided the current CMSes are able to provide such feeds (WordPress can't, as pages are not included in the feeds; I haven't checked the other blog systems/CMSes).

Anyway, I am glad to see someone approves this proposal. I am pretty sure this could help the site to get new submitters.

Cheers,
Julien

Posted by dkg (216.254.xx.xx) on Mon 12 Mar 2007 at 18:17
I also think this is an interesting suggestion, but think it poses implementation challenges. For one thing, the article approval process here approves a static version of an article. This allows the site editor (that would be Steve) to vet the article in its entirety before publishing.

If there's simply a URL which is expected to be pulled into d-a.org, an editor might approve version 1 of that URL, but then later find it replaced with an article that doesn't meet the standards of d-a.org. This kind of slippage could be anything from a blatant advertising link farm (e.g. a compromised remote server, or an unscrupulous author) to minor changes in a script which cause big security holes.

While issues like these are unresolved, you can still post links to your articles in your weblog, which is aggregated and syndicated (albeit without front-page placement).

Posted by JulienV (90.13.xx.xx) on Mon 12 Mar 2007 at 19:27
These are real threats, but they already exist for comments, which are post-moderated. Obviously, submitters should be registered, and maybe even granted a special status, to prevent newcomers from syndicating their articles, etc.

Maybe those syndicated articles could also be downgraded to a special category, not appearing on the front page (thus avoiding bad publicity for d-a.org), just in case something wrong happens.

It is also evident that the "report this article" link should not be hidden ;-)

I am of a quite optimistic nature, and do believe in honesty (provided some measures are set up anyway!).

Maybe Steve could use one of his domains to run a test?

Cheers,
Julien

PS: sorry if I seem to be finding excuses against your argument, but I strongly believe this would make d-a.org more dynamic and carry on the spirit in which Steve founded this website.

Posted by dkg (216.254.xx.xx) on Mon 12 Mar 2007 at 19:59
Those are good ideas, Julien. And i don't think you're making excuses, we're just hashing out a way that things might work.

I agree with you that being optimistic and having freer policies (with rollback coupled with auditable historical trails) is a better way to go in general (interesting thoughts by Joey Hess about this). And i definitely think that being able to submit your own specific articles (hosted elsewhere) for inclusion in d-a.org is a really good idea: no one wants to maintain two copies of a work, and an author's work really does belong on their own web presence, even if it also belongs on d-a.org.

One model that just occurred to me is to have yawns cache a copy of the syndicated article, and check regularly (whatever that means) for updates via HTTP against the original source. If there actually are updates, the updates themselves are treated as edits pending moderation on d-a.org, and wouldn't be published on d-a.org until they were reviewed by the site admin.
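The model described above can be sketched roughly as follows. This is only an illustration, not how yawns works: the SyndicatedArticle class, its field names, and the choice to remember rejected revisions by hash are all assumptions made for the example. Fetching over HTTP is left to the caller, who passes in whatever text was downloaded.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


def digest(text: str) -> str:
    """Fingerprint a revision so rejected ones can be remembered cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class SyndicatedArticle:
    """Hypothetical record for one article pulled from an upstream URL."""
    source_url: str                 # permanent URL chosen by the author
    published: str                  # cached copy currently shown on the site
    pending: Optional[str] = None   # newest upstream revision awaiting review
    rejected: set = field(default_factory=set)  # digests of refused revisions

    def check(self, fetched: str) -> bool:
        """Queue `fetched` for moderation; return True if something was queued."""
        if fetched == self.published or fetched == self.pending:
            return False            # nothing new upstream
        if digest(fetched) in self.rejected:
            return False            # editor already refused this revision
        self.pending = fetched      # published copy stays untouched
        return True

    def approve(self) -> None:
        """Editor accepts the pending revision: it replaces the cached copy."""
        if self.pending is not None:
            self.published = self.pending
            self.pending = None

    def reject(self) -> None:
        """Editor refuses the pending revision: keep serving the cached copy."""
        if self.pending is not None:
            self.rejected.add(digest(self.pending))
            self.pending = None
```

The point of the sketch is that an upstream change never alters the published text by itself; only approve() does, which matches the idea of treating updates as edits pending moderation.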

Things that would be necessary to make this work conveniently:

  • good publishing policy by the upstream author:
    • don't offer dynamic content for syndication.
    • don't include your site's users' comments in the syndicated text
    • choose a real, permanent URL
    • make sure your web server is publishing appropriate changed-on headers
  • convenient diff markup in yawns for the site editor to review changes to an article pulled from a syndicated source
  • clear site policy about what's allowed and not allowed, to make decisions about accepting/rejecting edits easier.
  • clear upstream policy about syndication permissions (e.g. this article may be republished freely by d-a.org, when d-a.org becomes aware of changes to this article, d-a.org {may,may not} continue to cache older revisions if the new revision is unacceptable, etc): maybe this could be done at syndicated article submission time?
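As a rough illustration of the "convenient diff markup" item, Python's standard difflib module can produce a unified diff between the cached copy and a freshly fetched revision. The function name review_diff and the "cached"/"upstream" labels are made up for this sketch; yawns itself may do something quite different.

```python
import difflib


def review_diff(cached: str, fetched: str) -> str:
    """Return a unified diff of the cached article against the upstream text.

    An empty string means the two versions are identical.
    """
    lines = difflib.unified_diff(
        cached.splitlines(),
        fetched.splitlines(),
        fromfile="cached",     # label for the copy held on the aggregating site
        tofile="upstream",     # label for the newly fetched revision
        lineterm="",           # inputs have no trailing newlines, so add none
    )
    return "\n".join(lines)
```

An editor would then see removed lines prefixed with "-" and added lines prefixed with "+", which is usually enough to spot a new link or a changed command in a script.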

This is all from the point of view of users submitting specific articles for inclusion on d-a.org, of course.

Users submitting an article stream for inclusion is a whole other prospective can of worms.

Posted by Steve (62.30.xx.xx) on Mon 12 Mar 2007 at 21:00
I haven't had time to think about this sufficiently yet, but I did want to make one quick comment before I forgot.

The most obvious problem I see with this is the nature of RSS feeds. (Mostly for article importing, but the same thing applies for importing weblog entries too).

Typically an RSS feed will list the most recent N "things". (eg. Most recent 10 weblog entries, most recent 10 articles, etc).

This causes an immediate problem in accepting a single article from an external source, because you need two identifiers:

  • The RSS/Atom/XML feed URI.
  • The reference to the specific entry within it. (The "permalink"/"guid")

I could imagine updating the article table to include those two details, and every few hours post-accept it could pull the feed and extract the entry, updating the published article if changed.
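To make the two-identifier lookup concrete, here is a minimal sketch that picks one entry out of a feed by its guid. It assumes a plain RSS 2.0 feed without namespaces (Atom would need namespace-aware lookups), and the function name find_entry and the returned dictionary keys are invented for the example.

```python
import xml.etree.ElementTree as ET


def find_entry(feed_xml: str, guid: str):
    """Return the entry whose <guid> matches, or None if it is not in the feed.

    None is also what you get once the entry has fallen off the end of the
    "most recent N" list, so callers must treat it as "no information",
    not as "the article was deleted".
    """
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        item_guid = (item.findtext("guid") or "").strip()
        if item_guid == guid:
            return {
                "title": item.findtext("title", default=""),
                "body": item.findtext("description", default=""),
            }
    return None
```

The feed URI tells the site where to poll, and the guid tells it which of the N entries to keep; both would have to be stored alongside the article.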

But this would mean that the site-specific facilities would no longer work. It would be wrong to use the "edit weblog" entry to edit a specific entry if that were to be overwritten in the future. Ditto for articles.

I guess those aren't major problems.

But there would still be the overhead of parsing the feeds every 6/12/24 hours to check for changes, until the permalink'd entry fell off the RSS feed, and the owner would still end up with two sources of discussion: comments on the original (if supported by the publishing platform) and comments here.

I think I'm not convinced of the benefit of supporting it.

One thing I do want to support is the acceptance of submissions via PGP-signed emails - that is something I'm actively going to work on once this site is upgraded to Etch. (Ditto for PGP-signed notification mails)

I guess it isn't a bad idea, but with the approval for articles, the problems with finding the entry, and general overhead of mixing distinct types of content it gets tricky fast.

For weblogs I think the case is much simpler to make, and the code becomes simpler:

  • Poll the feed
  • If there is an entry with a permalink/GUID we already have for the user then we update the text.
  • If not we add a new entry.
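The three steps above amount to an update-or-insert keyed on the permalink/GUID. A minimal sketch, assuming the feed has already been polled and parsed into (guid, text) pairs and that one user's stored entries are held in a dictionary; both of those shapes, and the name sync_weblog, are assumptions for illustration only.

```python
def sync_weblog(stored: dict, feed_entries: list) -> tuple:
    """Merge polled feed entries into one user's stored weblog entries.

    `stored` maps guid -> text and is modified in place.
    `feed_entries` is a list of (guid, text) pairs from the latest poll.
    Returns (added, updated): the guids that were new or whose text changed.
    """
    added, updated = [], []
    for guid, text in feed_entries:
        if guid in stored:
            # Known permalink/GUID: update the text only if it changed.
            if stored[guid] != text:
                stored[guid] = text
                updated.append(guid)
        else:
            # Unknown permalink/GUID: add a new entry.
            stored[guid] = text
            added.append(guid)
    return added, updated
```

Entries that are stored but no longer present in the feed are deliberately left alone, since a feed only lists the most recent N items.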

No approval, and the reporting issue works just as well as it did in the past.

Steve

Posted by ajt (84.12.xx.xx) on Mon 12 Mar 2007 at 22:06
I was thinking of RSS done right!

Articles are placed on a source web server in XML (ideally digitally signed); newer versions can be added as and when the site owner wants.

The aggregating site has a link to the XML source; assuming it's approved, it's imported, XSLTed to XHTML and shown on the aggregating site. Submission would be similar to today: it's approved just as articles are now.

If the end user changes their XML version, the date and signature details will change. The aggregating site can either automatically download and render it, replacing its copy, or perhaps the editors get a note that a given feed has an update and it's manually approved. It could even show a diff like CPAN does for Perl modules.

That way the author only maintains one copy and this site gets extra articles - or that's the theory. To make it work though we'd need to use a standard XML language, there are plenty to pick from, or we could make our own. If someone wants to use a new one then that's okay as long as they provide an XSLT to get it to the format we need.

It's all doable in Perl (or similar), with XSLT, HTTP and GPG.

--
"It's Not Magic, It's Work"
Adam
