Custom RSS Bridge for Dense Discovery
Dense Discovery's RSS feed includes neither the publication date (pubDate) nor the issue's content.
I don't mind that they don't include the content. If they want me to visit their page, I can do that. However, my RSS reader sometimes lists the issues in almost random-looking order because of the missing publication date. Or at least, that's what I think is going on.
This annoyance seemed the perfect opportunity to create a custom bridge for RSS Bridge. So I did.
Getting the list of issues and grabbing their content was close to being fun.
On the archive page, all issues are listed in a select tag.
<select id="dynamic_select">
    <option value="">Browse Archive</option>
    <option value="https://www.densediscovery.com/archive/188/">Issue #188</option>
    <option value="https://www.densediscovery.com/archive/187/">Issue #187</option>
    <!-- ... -->
</select>
With the getSimpleHTMLDOMCached helper function, requesting the page and extracting the data was straightforward. Under the hood, it uses a pretty old-school library called simple_html_dom that makes DOM selection and manipulation easy.
private function issuesInfo(): array
{
    $html = getSimpleHTMLDOMCached(self::ARCHIVE_URL);
    // Skip the first option: it is the "Browse Archive" placeholder.
    $optionHtmlElements = array_slice($html->find('#dynamic_select option'), 1);
    $issuesInfo = [];
    foreach ($optionHtmlElements as $htmlElement) {
        $issuesInfo[] = [
            'title' => $htmlElement->innertext,
            'url' => $htmlElement->getAttribute('value'),
        ];
    }
    return $issuesInfo;
}
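For readers who want to try the extraction outside RSS Bridge, here is a minimal standalone sketch. It swaps simple_html_dom for PHP's built-in DOM extension, so it is not the code the bridge runs. The function name issuesInfoFromHtml and the inline sample markup are assumptions made for illustration.

```php
<?php
// Sample markup mirroring the archive page's select tag shown above.
$archiveHtml = <<<'HTML'
<select id="dynamic_select">
<option value="">Browse Archive</option>
<option value="https://www.densediscovery.com/archive/188/">Issue #188</option>
<option value="https://www.densediscovery.com/archive/187/">Issue #187</option>
</select>
HTML;

// Hypothetical helper: returns a list of ['title' => ..., 'url' => ...] arrays.
function issuesInfoFromHtml(string $html): array
{
    $document = new DOMDocument();
    // Real-world markup is rarely valid, so parser warnings are suppressed.
    @$document->loadHTML($html);
    $xpath = new DOMXPath($document);

    $issuesInfo = [];
    foreach ($xpath->query('//select[@id="dynamic_select"]/option') as $option) {
        $url = $option->getAttribute('value');
        if ($url === '') {
            continue; // the "Browse Archive" placeholder has an empty value
        }
        $issuesInfo[] = [
            'title' => trim($option->textContent),
            'url' => $url,
        ];
    }
    return $issuesInfo;
}

$issues = issuesInfoFromHtml($archiveHtml);
```

Skipping options with an empty value, instead of always dropping the first one, keeps the sketch working if the placeholder ever moves or disappears.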
I left the content of the issues mostly untouched; I just removed the comments section and fixed the image paths. For the path fixing, I used the defaultLinkTo helper function.
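private function issueHtmlContent(string $url): string
{
    $html = getSimpleHTMLDOMCached($url);
    $comments = $html->find('#comments', 0);
    // find() returns null when nothing matches, so guard before removing.
    if ($comments !== null) {
        $comments->remove();
    }
    // Turn relative links and image paths into absolute URLs.
    return (string)defaultLinkTo($html, $url);
}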
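To make the two clean-up steps concrete, here is a rough standalone sketch using PHP's built-in DOM extension instead of simple_html_dom and defaultLinkTo. It only rewrites root-relative image paths (those starting with a single slash), which is a simplification and not a description of what defaultLinkTo does. The function name cleanedIssueHtml and the sample markup, including the image path, are hypothetical. It assumes PHP 8 for str_starts_with.

```php
<?php
// Hypothetical helper: strips the comments section and absolutizes image paths.
function cleanedIssueHtml(string $html, string $baseUrl): string
{
    $document = new DOMDocument();
    @$document->loadHTML($html); // suppress warnings about imperfect markup
    $xpath = new DOMXPath($document);

    // Remove the comments section, if the page has one.
    foreach (iterator_to_array($xpath->query('//*[@id="comments"]')) as $comments) {
        $comments->parentNode->removeChild($comments);
    }

    // Build "scheme://host" from the issue URL.
    $parts = parse_url($baseUrl);
    $origin = $parts['scheme'] . '://' . $parts['host'];

    // Prefix root-relative image paths with the origin.
    foreach ($xpath->query('//img[@src]') as $image) {
        $src = $image->getAttribute('src');
        if (str_starts_with($src, '/') && !str_starts_with($src, '//')) {
            $image->setAttribute('src', $origin . $src);
        }
    }

    return $document->saveHTML();
}

// Made-up sample page; the image path is invented for this example.
$sample = '<div><img src="/images/cover.jpg"><p>Hello</p></div>'
    . '<div id="comments">Reader comments</div>';
$cleaned = cleanedIssueHtml($sample, 'https://www.densediscovery.com/archive/188/');
```

Note that saveHTML returns a full document, so the result is wrapped in html and body tags.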
Publication dates gave me a bit of a headache. The dates aren't mentioned anywhere, not even in a meta tag in the page source.
Since I can't extract the dates, I'm assigning them myself. This works because the newsletter is weekly and I know the date of the latest issue (188).
Issue 187 gets a date one week before issue 188, issue 189 one week after it, and so on. The dates are probably not exact, but they should be close, and they solve the ordering problem I had.
private function issueTimestamp(int $issueNr): int
{
    // Distance from the base issue in whole weeks (one issue per week).
    $issueNrRelativeToBaseIssue = abs(self::ISSUE_NR_188 - $issueNr);
    $dateRelativeToBaseIssueDate = new DateInterval("P{$issueNrRelativeToBaseIssue}W");
    $baseIssueDate = new DateTimeImmutable(self::ISSUE_NR_188_DATE);
    // Newer issues are dated after the base issue, older ones before it.
    if (self::ISSUE_NR_188 < $issueNr) {
        return $baseIssueDate->add($dateRelativeToBaseIssueDate)->getTimestamp();
    }
    return $baseIssueDate->sub($dateRelativeToBaseIssueDate)->getTimestamp();
}
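The same arithmetic can be tried outside the bridge. The sketch below is a variant and not the bridge's code: it uses a signed week offset with DateTimeImmutable::modify instead of the add/sub branches, and it also shows one possible way to get the issue number from an archive URL. The base date is a placeholder and not the real publication date of issue 188, and the function and constant names are assumptions.

```php
<?php
const BASE_ISSUE_NR = 188;
// Placeholder only: replace with the real publication date of issue 188.
const BASE_ISSUE_DATE = '2022-01-04 00:00:00 UTC';

// Hypothetical helper: pulls the number out of ".../archive/188/".
function issueNrFromUrl(string $url): ?int
{
    if (preg_match('~/archive/(\d+)/?$~', $url, $matches) !== 1) {
        return null; // URL does not look like an issue link
    }
    return (int) $matches[1];
}

// Estimates the publication time as one week per issue from the base issue.
function estimatedIssueTimestamp(int $issueNr): int
{
    $offsetInWeeks = $issueNr - BASE_ISSUE_NR; // negative for older issues
    $baseDate = new DateTimeImmutable(BASE_ISSUE_DATE);
    // "%+d" always prints the sign, giving e.g. "-1 weeks" or "+3 weeks".
    return $baseDate->modify(sprintf('%+d weeks', $offsetInWeeks))->getTimestamp();
}

$issueNr = issueNrFromUrl('https://www.densediscovery.com/archive/187/');
$timestamp = estimatedIssueTimestamp($issueNr);
```

With the placeholder date, issue 187 lands exactly seven days before the base date. Pinning the base date to UTC keeps every step at 604800 seconds, regardless of daylight saving time in the server's time zone.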
Overall, I liked the developer experience. The common problems have a ready-made solution, and the documentation is helpful. It's something that I'll probably use in the future too.