Custom RSS Bridge for Dense Discovery

Dense Discovery's RSS feed includes neither the publication date (pubDate) nor the issue's content.

I don't mind that they don't include the content. If they want me to visit their page, I can do that. However, my RSS reader sometimes lists the issues in a seemingly random order, and I suspect the missing publication date is the cause.

This annoyance seemed like the perfect opportunity to create a custom bridge for RSS Bridge. So I did.


Getting the list of issues and grabbing their content was close to being fun.

On the archive page, all issues are listed in a select tag.

<select id="dynamic_select">
    <option value="">Browse Archive</option>
    <option value="https://www.densediscovery.com/archive/188/">Issue #188</option>
    <option value="https://www.densediscovery.com/archive/187/">Issue #187</option>
    <!-- ... -->
</select>

With the getSimpleHTMLDOMCached helper function, requesting the page and extracting the data was straightforward. Under the hood, it uses a fairly old-school library called simple_html_dom, which makes DOM selection and manipulation easy.

private function issuesInfo(): array
{
    $html = getSimpleHTMLDOMCached(self::ARCHIVE_URL);
    // Skip the first option, which is the "Browse Archive" placeholder.
    $optionHtmlElements = array_slice($html->find('#dynamic_select option'), 1);

    $issuesInfo = [];

    foreach ($optionHtmlElements as $htmlElement) {
        $issuesInfo[] = [
            'title' => $htmlElement->innertext,
            'url' => $htmlElement->getAttribute('value'),
        ];
    }

    return $issuesInfo;
}
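
To try the same extraction outside of RSS Bridge, here is a rough standalone sketch. It swaps simple_html_dom for PHP's built-in DOMDocument and DOMXPath, so it is not the code the bridge runs. The function name issuesInfoFromHtml and the inline sample markup are made up for illustration, and it assumes the DOM extension is enabled.

```php
<?php
// Standalone sketch: uses PHP's built-in DOM extension, not simple_html_dom.
// issuesInfoFromHtml() is a hypothetical helper name, not part of RSS Bridge.
function issuesInfoFromHtml(string $html): array
{
    $dom = new DOMDocument();
    // Real-world markup is rarely perfect, so silence libxml's parse warnings.
    @$dom->loadHTML($html);

    $xpath = new DOMXPath($dom);
    $issuesInfo = [];

    foreach ($xpath->query('//select[@id="dynamic_select"]/option') as $option) {
        $url = $option->getAttribute('value');

        // The "Browse Archive" placeholder has an empty value, so skip it.
        if ($url === '') {
            continue;
        }

        $issuesInfo[] = [
            'title' => trim($option->textContent),
            'url' => $url,
        ];
    }

    return $issuesInfo;
}

// Sample markup copied from the archive page snippet above.
$sample = <<<HTML
<select id="dynamic_select">
    <option value="">Browse Archive</option>
    <option value="https://www.densediscovery.com/archive/188/">Issue #188</option>
    <option value="https://www.densediscovery.com/archive/187/">Issue #187</option>
</select>
HTML;

$issues = issuesInfoFromHtml($sample);
echo $issues[0]['title'], ' -> ', $issues[0]['url'], PHP_EOL;
```

One difference from the bridge code: this sketch skips options with an empty value instead of dropping the first element by position, which would keep working if the placeholder ever moved.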

I left the content of the issues mostly untouched; I just removed the comments section and fixed the image paths. For the path fixing, I used the defaultLinkTo helper function.

private function issueHtmlContent(string $url): string
{
    $html = getSimpleHTMLDOMCached($url);

    // find() returns null when nothing matches, so guard before removing.
    $comments = $html->find('#comments', 0);
    if ($comments !== null) {
        $comments->remove();
    }

    return (string)defaultLinkTo($html, $url);
}

Publication dates gave me a bit of a headache. The dates aren't mentioned anywhere, not even in a meta tag in the source code.

Since I can't extract the dates, and because this is a weekly newsletter and I know the date of the last (188) issue, I'm assigning them myself.

I'm setting the date of issue 187 to one week before issue 188, the date of issue 189 to one week after it, and so on. It's probably not accurate, but it should be close, and it solves the ordering problem I had.

private function issueTimestamp(int $issueNr): int
{
    $issueNrRelativeToBaseIssue = abs(self::ISSUE_NR_188 - $issueNr);
    $dateRelativeToBaseIssueDate = new DateInterval("P{$issueNrRelativeToBaseIssue}W");

    $baseIssueDate = new DateTimeImmutable(self::ISSUE_NR_188_DATE);

    if (self::ISSUE_NR_188 < $issueNr) {
        return $baseIssueDate->add($dateRelativeToBaseIssueDate)->getTimestamp();
    }

    return $baseIssueDate->sub($dateRelativeToBaseIssueDate)->getTimestamp();
}
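
To make the arithmetic concrete, here is a small standalone sketch of the same idea. The base date below is a placeholder chosen for illustration, not the real date of issue 188, and the names BASE_ISSUE_NR, BASE_ISSUE_DATE, issueNrFromUrl and estimatedIssueDate are hypothetical. It also assumes the issue number can be read from the archive URL, which the URLs in the select tag suggest.

```php
<?php
// Placeholder values for illustration only; the real base date is not shown here.
const BASE_ISSUE_NR = 188;
const BASE_ISSUE_DATE = '2022-05-17 00:00:00 UTC';

// Pulls the issue number out of an archive URL such as .../archive/187/.
// Returns null when the URL does not look like an archive URL.
function issueNrFromUrl(string $url): ?int
{
    return preg_match('~/archive/(\d+)/?$~', $url, $matches) === 1
        ? (int) $matches[1]
        : null;
}

// Same idea as issueTimestamp(), but with a signed week offset,
// so no separate add()/sub() branches are needed.
function estimatedIssueDate(int $issueNr): DateTimeImmutable
{
    $weeks = $issueNr - BASE_ISSUE_NR;

    return (new DateTimeImmutable(BASE_ISSUE_DATE))
        ->modify(sprintf('%+d weeks', $weeks));
}

foreach ([187, 188, 189] as $nr) {
    echo $nr, ': ', estimatedIssueDate($nr)->format('Y-m-d'), PHP_EOL;
}
```

With the placeholder base date, issue 187 lands seven days before it and issue 189 seven days after it. Calling getTimestamp() on the result gives the integer that issueTimestamp() returns.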

Overall, I liked the developer experience. Common problems have ready-made solutions, and the documentation is helpful. I'll probably use RSS Bridge again in the future.