I made a New Year's resolution to write some posts on my Buy Me a Coffee page for my supporters.
This is a pretty small one, updating you on what I am working on today for WikiTree Sourcer.
If you use Sourcer to create citations from the GRO (England & Wales General Register Office) for birth and death registrations you may have noticed that the citation includes a link to the UKBMD site that gives a description of the registration district.
This link is optional, by the way. There is a Sourcer option under Citation / GRO: "Add a link to the registration district page on ukbmd.org.uk".
The difficulty in doing this is that the district name in the GRO index often doesn't correspond directly to the district name used on the UKBMD site. The GRO index contains many variations of names for each district. Sometimes there is something like "MOULTON IN THE COUNTY OF NORTHAMPTON". Moulton has never been a registration district - it is part of the Kettering district, and there are various other places called Moulton in other registration districts around the country. Other times there is extra text like "UNION" added to the name of the registration district.
I decided early on to try to add these links as they seemed useful. I developed a set of rules and special cases to try my best to map the name in the GRO index to a district on UKBMD. Sometimes this fails and a broken link is created. Over the years I have done regular WikiTree+ searches to find any broken links and fix them up and also add more special cases to the Sourcer code to avoid creating them again. Most of the ones now being created are due to typos in the GRO index.
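To give a rough idea of what "rules and special cases" means here, this is a minimal sketch in TypeScript. It is not the actual Sourcer code: the function name, the map, and the single "UNION" rule are all made up for illustration, and the real rule set is much larger. The Moulton entry is the example mentioned above.

```typescript
// Hypothetical special cases: a full GRO index string mapped straight to a
// district name. Keyed on the whole string, since a bare "MOULTON" would be
// ambiguous (there are places called Moulton in several districts).
const SPECIAL_CASES: ReadonlyMap<string, string> = new Map([
  ["MOULTON IN THE COUNTY OF NORTHAMPTON", "kettering"],
]);

// Illustrative only: turn a GRO index district name into a candidate
// UKBMD district name. Special cases win; otherwise apply general rules.
function groNameToDistrictName(groName: string): string {
  // Normalise case and collapse runs of whitespace before any lookup.
  const cleaned = groName.trim().toUpperCase().replace(/\s+/g, " ");

  const special = SPECIAL_CASES.get(cleaned);
  if (special !== undefined) {
    return special;
  }

  // Example general rule (an assumption): drop a trailing " UNION".
  return cleaned.replace(/ UNION$/, "").toLowerCase();
}
```

With these made-up rules, both "KETTERING UNION" and "MOULTON IN THE COUNTY OF NORTHAMPTON" come out as "kettering". A typo in the index would pass straight through unchanged, which is how a broken link ends up being created.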
I'm currently doing another one of these passes through the broken link suggestions and have decided that I will stop Sourcer creating any more broken links in the future. To do this I will run the GRO name through my existing rules and special cases to generate a name for UKBMD, as before. The new part is that I have built a list of the 1000+ district pages on the UKBMD site, and I will check whether the name I came up with actually exists in that list. If it doesn't, I will not create a link in the citation.
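The check against the list could look something like the sketch below. Again this is just an illustration in TypeScript, not the real Sourcer code: the set only holds a couple of sample names rather than the full list, the function and constant names are hypothetical, and the URL pattern is an assumption about how the district pages are addressed.

```typescript
// Hypothetical sample; the real list has 1000+ district page names.
const KNOWN_DISTRICTS: ReadonlySet<string> = new Set([
  "kettering",
  "northampton",
]);

// Assumed URL pattern for the district pages, for illustration only.
const DISTRICT_URL_BASE = "https://www.ukbmd.org.uk/reg/districts/";

// Return a link only when the candidate name is a known district page.
// Returning undefined means "leave the link out of the citation".
function buildDistrictLink(candidateName: string): string | undefined {
  if (!KNOWN_DISTRICTS.has(candidateName)) {
    return undefined; // better no link than a broken one
  }
  return DISTRICT_URL_BASE + encodeURIComponent(candidateName) + ".html";
}
```

So a candidate name of "kettering" would get a link, while a typo such as "ketering" would quietly get none.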
So the good news is that no more broken links should be created by Sourcer (once I release this version). The downside is that I won't have a way of finding the cases where my rules break down. It seems a good tradeoff at this point though since most of the broken links being created now are due to typos in the GRO index and I can't really expand my special cases to include every possible typo! I have added quite a lot of typo special cases already.
That's my update for today. Hopefully someone might find it interesting :)