Hi, daryll!
I think those strings look good. We could get false positives if someone decided to put those strings into their username or their pet's name, but... given what those strings are, I find that really unlikely lol (worth noting that pet names can have valid HTML without problems, although it'll just show up as plaintext).
... I think we might also get false positives during events, which we should take into account since we're gonna be looking at 270 million pages spread out over time -gulps-. Every time we load a page during an event, there's a chance that the page will load with a banner (just a regular image) at the bottom of the page, and banner images are also hosted on static.chickensmoothie.com. Examples:
Summer 2017 - https://static.chickensmoothie.com/pic. ... 680B578640
Summer 2016 - https://static.chickensmoothie.com/tran ... O6Ac8_tGqg
(I'm pretty sure I got these links from banners that I was about to click.) I don't know if there's other stuff in the <img> tags for banners.
The banners don't show up outside of events, though. So maybe the program can just be run outside of events if you don't want to add additional checks to the program. Or we could hope that the next event has trans in the url like it did for Summer 2016 haha
edit: or you could just refresh the page until there's no banner (or just claim the banner lol) before you save the page, but I feel like it might get mentally tiring to constantly be on alert for a banner
I assume that the deleted pets are identified by process of elimination (having none of those string constants on the page)? Maybe another workaround for the banner-staticimg conflict is to identify staticimg pets by elimination instead, since deleted pets show this message:
"Sorry, that pet could not be found. Maybe you can adopt a new pet instead :)."
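Here's a rough sketch of what I mean by flipping the elimination around. It checks for the deleted-pet message first, then for the other string constants, and only calls a page staticimg if nothing else matched, so it never has to search for static.chickensmoothie.com at all. The marker dictionary is a made-up placeholder (I'm not copying your actual constants here), and the function name is just for illustration.

```python
# Start of the message shown on deleted pet pages (quoted above).
DELETED_MESSAGE = "Sorry, that pet could not be found."

# HYPOTHETICAL placeholder: swap in the real string constants and labels.
KNOWN_TYPE_MARKERS = {
    "rarity-marker-placeholder": "has_rarity",
}


def classify_page(html: str) -> str:
    """Classify a saved pet page, leaving staticimg as the fallback."""
    if DELETED_MESSAGE in html:
        return "deleted"
    for marker, label in KNOWN_TYPE_MARKERS.items():
        if marker in html:
            return label
    # Nothing matched, so by elimination this is a staticimg pet.
    # An event banner cannot trigger a false positive here because
    # the static image hostname is never searched for.
    return "staticimg"
```

The catch is that anything unexpected (a page that failed to load properly, for example) would also fall through to staticimg, so it might be worth spot-checking a few of those.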
I'm not sure how much this suggestion will help, but maybe you can narrow down the key/legend of pet types to compare against based on a pet's adoption date*? For example, if the pet was adopted in March 2009, then you don't have to compare it against July 2008 pet types like the Sunjewel, Moonswirl, etc. right?** (Exact boundaries might be difficult to determine, especially for 2008 pets, and you'll have to be comparing to basically the full list for any December 18 pet as well as some December 19, 2009 and December 24, 2011 and December 25, 2011 pets, but still)
* we shouldn't do pet ID because of recreated pets (we can't count the actual recreated pets unless they were recreated before 2012, but oh well, I guess?)
** this means we'll miss that October 2008 Sunjewel that does have a rarity as well as... some banana floating around from a non-rerelease date with a rarity I think?, but eh, that's so infrequent that I can probably count how often it happened on just my fingers (whereas recreated pets, while still super rare and probably still negligible, are a little more common than the real oddballs I just mentioned)
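To make the adoption date idea a bit more concrete, here's a minimal sketch. The release windows are invented for illustration (I only know the Sunjewel and Moonswirl are July 2008, the exact days are guesses, and the March 2009 entry is a dummy), and the function names are hypothetical. It also plays it safe by using the full list for every pet on the special dates, even though only some of the December 19, 2009 and December 24/25, 2011 pets actually need it.

```python
from datetime import date

# HYPOTHETICAL release windows; real boundaries would need to be worked out.
RELEASE_WINDOWS = {
    "Sunjewel": (date(2008, 7, 1), date(2008, 7, 31)),
    "Moonswirl": (date(2008, 7, 1), date(2008, 7, 31)),
    "Example March 2009 pet": (date(2009, 3, 1), date(2009, 3, 31)),
}

# Specific dates where a pet may need the full list.
FULL_LIST_DATES = {date(2009, 12, 19), date(2011, 12, 24), date(2011, 12, 25)}


def needs_full_list(adopted: date) -> bool:
    """True for any December 18, plus the specific dates listed above."""
    if adopted.month == 12 and adopted.day == 18:
        return True
    return adopted in FULL_LIST_DATES


def candidate_types(adopted: date, windows=RELEASE_WINDOWS) -> list:
    """Return the pet types worth comparing against for this adoption date."""
    if needs_full_list(adopted):
        return sorted(windows)
    return sorted(
        name
        for name, (start, end) in windows.items()
        if start <= adopted <= end
    )
```

With this, a pet adopted in March 2009 never gets compared against the July 2008 types, which is the whole point. Oddballs like that October 2008 Sunjewel would be missed, same as described above.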
This is a mess and not nearly as substantial as the quantity of text suggests oh god I'm sorry
edit: Re: file name
Is there a way to automate the file names or something? Like maybe have something go through the files in chronological order of when they were saved, and rename every file to be either the same (to get Windows to add the (1) (2) etc) or directly rename to what you want?
and yeah, I don't think we'll need to look at the pet pages again once we know what type of pet an ID is