Add email mentions to validphys index
Created by: Zaharid
Add a script that parses the emails, finds mentions of validphys reports, and associates each report id with the URL and title of the email that mentions it. Because there is no way to get an email's URL from the email as received, we instead scan the HTML of the mailing list archives, crawling over each message in each month.
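The crawling step starts from a month index page and collects the links to the individual messages. The sketch below assumes a Pipermail-style archive, where each month's page links to messages named like `000123.html`; the `MESSAGE_HREF` pattern and the `message_urls` helper are hypothetical and not taken from the actual script. It only parses HTML that has already been downloaded, so it runs without network access.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical pattern for individual message pages (Pipermail-style naming).
MESSAGE_HREF = re.compile(r"^\d+\.html$")


class _LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def message_urls(month_url, month_html):
    """Return absolute, de-duplicated URLs of messages linked from a month page."""
    parser = _LinkCollector()
    parser.feed(month_html)
    parser.close()
    urls = []
    for href in parser.hrefs:
        if MESSAGE_HREF.match(href):
            url = urljoin(month_url, href)  # resolve relative to the month page
            if url not in urls:
                urls.append(url)
    return urls
```

Each returned URL is what later gets associated with the reports mentioned in that message, since the URL cannot be recovered from the email itself.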
The script tries to ignore links that appear in quoted sections, but this only works if those sections have already been rendered as a blockquote HTML element in the email archives.
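A minimal sketch of how report mentions could be extracted from a message page while skipping quoted text: the parser tracks how deeply it is nested inside blockquote elements and only records report ids at depth zero, and it takes the email title from the page's title element. The report URL pattern `REPORT_RE` (assumed here to be `vp.nnpdf.science/<id>`) and the names `ReportMentionParser` and `find_mentions` are illustrative assumptions, not the script's actual code.

```python
import re
from html.parser import HTMLParser

# Assumed URL pattern for validphys reports; adjust to the real report server.
REPORT_RE = re.compile(r"https?://vp\.nnpdf\.science/([A-Za-z0-9_=-]+)")


class ReportMentionParser(HTMLParser):
    """Collect report ids mentioned outside <blockquote>, plus the page title."""

    def __init__(self):
        super().__init__()
        self.quote_depth = 0  # >0 while inside (possibly nested) quoted text
        self.in_title = False
        self.title = ""
        self.report_ids = set()

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote":
            self.quote_depth += 1
        elif tag == "title":
            self.in_title = True
        elif tag == "a" and self.quote_depth == 0:
            match = REPORT_RE.match(dict(attrs).get("href") or "")
            if match:
                self.report_ids.add(match.group(1))

    def handle_endtag(self, tag):
        if tag == "blockquote" and self.quote_depth > 0:
            self.quote_depth -= 1
        elif tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data
        elif self.quote_depth == 0:
            # Also catch bare URLs in the message body that were not linkified.
            self.report_ids.update(REPORT_RE.findall(data))


def find_mentions(email_url, html):
    """Map each report id mentioned in the email to (email_url, email_title)."""
    parser = ReportMentionParser()
    parser.feed(html)
    parser.close()
    title = parser.title.strip()
    return {rid: (email_url, title) for rid in sorted(parser.report_ids)}
```

As the text notes, quoted material that the archive did not wrap in a blockquote is indistinguishable from new text here, so its links would still be counted.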
We use this information to link to the emails from the index page, by adding an email emoji link for each email that mentions a given report. It could also be used for other things, such as displaying the email in the template.
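For the index page, each report's mentions can be turned into one emoji link per email. The helper below is a hypothetical sketch: it assumes the mentions are stored as a mapping from report id to a list of `(email_url, title)` pairs, and it does not reflect how the real index template is built. Escaping keeps titles containing quotes or angle brackets from breaking the HTML attribute.

```python
from html import escape


def email_links(report_id, mentions):
    """Return one 📧 anchor per email mentioning report_id, or '' if none.

    `mentions` is assumed to map report id -> list of (email_url, title).
    """
    links = []
    for url, title in mentions.get(report_id, []):
        # The email title becomes the hover text of the emoji link.
        links.append(f'<a href="{escape(url)}" title="{escape(title)}">📧</a>')
    return " ".join(links)
```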
One annoying aspect is that this is an embarrassingly parallel task (we could be processing some emails while waiting for others to download), but I am hitting a bug I don't understand when trying to do this with curio and asks (https://github.com/theelous3/asks/issues/118), so it will stay sequential for the moment. Because it is slow, we add a cache to remember emails that have already been seen. At the moment index-emails needs to be run independently from index-reports (I run it once a day), but that may not be optimal.
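The caching idea can be sketched as a JSON file keyed by message URL: on each run, only URLs missing from the cache are downloaded and parsed, sequentially. The cache format, the file path, and the `update_index` function are assumptions for illustration; `fetch` and `extract` are passed in as callables so the sketch does not depend on any particular HTTP library.

```python
import json
from pathlib import Path


def load_cache(path):
    """Return the cached {message_url: result} mapping, or {} on first run."""
    p = Path(path)
    if p.exists():
        return json.loads(p.read_text())
    return {}


def update_index(message_urls, fetch, extract, cache_path):
    """Download and parse only messages not yet in the cache, one at a time.

    fetch(url) -> html and extract(url, html) -> JSON-serializable result
    are hypothetical callables supplied by the caller.
    """
    cache = load_cache(cache_path)
    for url in message_urls:
        if url in cache:
            continue  # already seen on a previous run: skip the slow download
        cache[url] = extract(url, fetch(url))
    Path(cache_path).write_text(json.dumps(cache))
    return cache
```

With a daily run, only the messages posted since the previous run are fetched, which keeps the sequential approach tolerable until the parallel version works.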