
Add email mentions to validphys index

Emanuele Roberto Nocera requested to merge emailindex into master

Created by: Zaharid

Add a script that parses the emails, finds mentions of validphys reports, and associates each report ID with the URL and title of the email that mentions it. There is no way to get an email's URL from the email as received, so we scan the HTML of the archives instead, crawling over each message in each month.
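A minimal sketch of the crawl, using only the Python standard library. It assumes Mailman/pipermail-style archives, where each month page links to messages as numbered `NNNNNN.html` files, and assumes report links look like `https://vp.nnpdf.science/<id>/`. Both patterns, and the names `build_index` and `parse_page`, are hypothetical and not taken from the actual script. The HTML fetcher is passed in as a plain callable, so the parsing logic does not depend on any particular HTTP client.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed URL layout of validphys reports; the real server layout may differ.
REPORT_RE = re.compile(r"https?://vp\.nnpdf\.science/([A-Za-z0-9_=-]+)/?")
# Assumed pipermail-style message file names, e.g. "000123.html".
MESSAGE_RE = re.compile(r"\d+\.html")


class _LinkAndTitleParser(HTMLParser):
    """Collect every <a href> value and the <title> text of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_page(html):
    """Return (links, title) for an HTML page."""
    parser = _LinkAndTitleParser()
    parser.feed(html)
    parser.close()
    return parser.links, parser.title.strip()


def build_index(month_urls, fetch):
    """Map report id -> list of (email_url, email_title).

    `fetch` is any callable that returns the HTML text of a URL.
    """
    index = {}
    for month_url in month_urls:
        month_links, _ = parse_page(fetch(month_url))
        for href in month_links:
            if not MESSAGE_RE.fullmatch(href):
                continue  # skip thread/author/subject navigation links
            email_url = urljoin(month_url, href)
            links, title = parse_page(fetch(email_url))
            for link in links:
                match = REPORT_RE.match(link)
                if match:
                    entries = index.setdefault(match.group(1), [])
                    if (email_url, title) not in entries:
                        entries.append((email_url, title))
    return index
```

Passing something like `requests.get(url).text` as `fetch` would make this a real (sequential) crawler; in tests a dictionary lookup is enough.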

The script tries to ignore links that appear in quoted sections, but this only works if those sections have already been rendered as a blockquote HTML element in the email archives.
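One way to implement that filtering is to track how deeply nested the parser currently is inside `<blockquote>` elements and drop any link found at a depth greater than zero. The sketch below shows this idea with the standard-library `HTMLParser`. The class and function names are hypothetical. As noted above, quotes that the archive left as plain `> ` text lines are not detected.

```python
from html.parser import HTMLParser


class UnquotedLinkParser(HTMLParser):
    """Collect <a href> values, skipping any that sit inside a <blockquote>."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._quote_depth = 0  # current nesting level of <blockquote> elements

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote":
            self._quote_depth += 1
        elif tag == "a" and self._quote_depth == 0:
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        # Guard against stray closing tags in malformed archive HTML.
        if tag == "blockquote" and self._quote_depth > 0:
            self._quote_depth -= 1


def unquoted_links(html):
    """Return the links of an email body that are not inside quoted text."""
    parser = UnquotedLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

Counting depth rather than using a boolean flag keeps nested quotes (a reply to a reply) correctly excluded until the outermost `</blockquote>` closes.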

We use this information to link to the emails from the index page, adding an email emoji link for each email that mentions a given report. It could also be used for other things, such as displaying the email in the template.
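A small sketch of how the index page could render those links from a mapping of report ID to `(email_url, email_title)` pairs. The function name, the mapping shape and the use of the email title as a hover tooltip are illustrative assumptions, not the template actually used by validphys. Values are escaped with `html.escape` because email titles are arbitrary user text.

```python
from html import escape


def email_links_html(report_id, index):
    """Return one 📧 link per email mentioning `report_id` ("" if none).

    `index` maps report id -> list of (email_url, email_title) pairs.
    """
    return " ".join(
        f'<a href="{escape(url)}" title="{escape(title)}">📧</a>'
        for url, title in index.get(report_id, [])
    )
```

Returning an empty string for reports that were never mentioned lets the template insert the result unconditionally.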

One annoying aspect is that this is an embarrassingly parallel task (we could be processing some emails while waiting for others to download), but I am hitting a bug I don't understand when trying to do this with curio and asks (https://github.com/theelous3/asks/issues/118), so it stays sequential for now. Because crawling is slow, we add a cache that remembers already-seen emails.

At the moment index-emails needs to be run independently of index-reports (I run it once a day), but that may not be optimal.
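A minimal sketch of such a cache, assuming it is a JSON file keyed by email URL whose values are the already-extracted results (for example the title and the report IDs found). The class name, file format and `get_or_compute` method are hypothetical. On a daily run, only emails posted since the last run miss the cache and trigger a download.

```python
import json
from pathlib import Path


class SeenEmailCache:
    """Persist per-email results in a JSON file so each email is fetched once.

    Cached values must be JSON-serializable (dicts, lists, strings, ...).
    """

    def __init__(self, path):
        self.path = Path(path)
        if self.path.exists():
            self.data = json.loads(self.path.read_text())
        else:
            self.data = {}

    def get_or_compute(self, url, compute):
        """Return the cached result for `url`, calling compute(url) only on a miss."""
        if url not in self.data:
            self.data[url] = compute(url)
        return self.data[url]

    def save(self):
        """Write the cache back to disk (call once at the end of a run)."""
        self.path.write_text(json.dumps(self.data))
```

In a sequential crawler, `compute` would be the function that downloads and parses one message page; calling `save()` after the loop keeps the next run from refetching the same emails.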
