Google news cache

A bunch of Belgian publishers just won a suit against Google News. What's going on here is kind of interesting. The newspapers allow free access to their articles for a limited time and then move them to a paid archive. Google News indexed (and cached parts of) the articles while they were free, and the cached copies remain accessible even after free access has ended.

Google has responded by not even indexing (let alone caching) the relevant sites. That's not really a win for Google or the publishers. A better compromise would be for the cache entry to expire at the same time as the article goes subscription-only. This could be done manually on a per-site basis or with some new HTTP/HTML indicator that tells Google when to remove the cache entry (as far as I know, the current HTTP caching technology doesn't really support this, though I suppose you could repeatedly probe the site to see when permission was revoked).
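To make the probing idea concrete, here is a rough sketch of what a cache sweep might look like. Everything in it is an assumption for illustration: the `purge_revoked` helper, the `fetch_status` callable (a stand-in for a real HTTP request), and especially the choice to treat 401/402/403 as "this article is now behind the paywall". A real site might signal that quite differently.

```python
from typing import Callable, Dict, List

# Assumption: these status codes mean free access has been withdrawn.
REVOKED_STATUSES = {401, 402, 403}


def purge_revoked(cache: Dict[str, str],
                  fetch_status: Callable[[str], int]) -> List[str]:
    """Re-probe every cached URL and drop entries whose access was revoked.

    `cache` maps URL -> cached snippet. `fetch_status` is a hypothetical
    callable that returns the HTTP status code for a URL.
    Returns the list of URLs that were purged.
    """
    purged = []
    for url in list(cache):  # copy the keys so we can delete while iterating
        if fetch_status(url) in REVOKED_STATUSES:
            del cache[url]
            purged.append(url)
    return purged
```

The obvious drawback is the one implied above: the cache has to keep polling, and a stale copy stays visible until the next sweep notices the change.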



In what way does the HTTP caching system currently in place not support expiry? It sure seems like it does to me.

Eric, the HTTP cache system works fine for this. Just set the expiry date correctly.

The difference here is that you need an attribute 'Must discard' or something on the expiry date.

So, my claim is that your point about mandatory expiry is on the right track. The HTTP cache expiry semantics are oriented towards "this is how long you can trust this document to be correct without checking back" as opposed to "this data is invalid/must be discarded after time t". (Note that DNS has both kinds of expiry lifetime.) So, it's quite possible you would have a page that's dynamically generated, and thus the first value (the freshness lifetime) is zero or near-zero, whereas the second (the discard deadline) is semi-infinite. That's the distinction HTTP doesn't support well.
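A small sketch of the two lifetimes may help. The `max_age` field mirrors the HTTP freshness idea; the `discard_after` field is purely hypothetical (it is the "must discard" attribute suggested above, not anything HTTP defines), as are the `CacheEntry` and `lookup` names.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    body: str
    stored_at: float      # time the copy was fetched (seconds)
    max_age: float        # freshness lifetime: trust without re-checking
    discard_after: float  # hypothetical hard deadline (absolute time)


def lookup(entry: CacheEntry, now: float) -> str:
    """Return 'serve', 'revalidate', or 'discard' for a cached entry."""
    if now >= entry.discard_after:
        return "discard"      # the copy may no longer be shown at all
    if now - entry.stored_at < entry.max_age:
        return "serve"        # still fresh, no need to check back
    return "revalidate"       # stale but permitted: check with the origin
```

A dynamically generated page would have `max_age` near zero and a very large `discard_after`; a newspaper article headed for the paid archive would be the reverse case, where the hard deadline is the value that matters.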

'Cache-Control: max-age=4343' seems to imply just that.
