Sotheby's, the art auction house, started moving towards digital auctions a little over nine years ago, publishing high-quality scans (about 2000px on their longest side) of the works of art it has sold online. There was a blessed period in 2011 when some intern started uploading the images at 4000px, but it was short-lived. If you know where to find this blog, you've likely already heard me mention this a million times by now.
We can start by checking out their catalog of past sales. If you open the Network tab in your favourite modern browser's developer tools and navigate to their auction archive, you'll spot an interesting GET call for a JSON file: ajax.auctions.json.
The file only retrieves a maximum of 500 auctions at a time. We can work around this by doing multiple requests and changing the end date to the earliest returned auction.
    auction_json = requests.get(url, headers=headers).text
    auction_json = json.loads(auction_json)
    # Sotheby's limits the number of results to 500 per request:
    # Once we get less than that we can finish up.
    if len(auction_json['events']) < 500:
        end_date = 0
    else:
        end_date = auction_json['events'][-1]['startTimeStampInMilliSecs']
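To show how that check fits into a full loop, here is a rough sketch of the paging logic. It is not the exact code from my program: `collect_events` and `fetch_page` are names made up for this example, and `fetch_page(end_date)` stands in for whatever wraps the `requests.get` call and returns the `events` list for a given end date. The duplicate check and the stuck-loop guard are extra precautions I'm assuming are useful, since the earliest auction of one page may come back again at the top of the next.

```python
import json

PAGE_SIZE = 500  # per-request cap mentioned above


def collect_events(fetch_page, end_date, page_size=PAGE_SIZE):
    """Page backwards through the archive until a short page comes back.

    fetch_page is a hypothetical callable: given an end date in
    milliseconds, it returns one page of event dicts, newest first.
    """
    events = []
    seen = set()
    while end_date:
        page = fetch_page(end_date)
        for event in page:
            # Dicts aren't hashable, so use a canonical JSON dump as the key.
            key = json.dumps(event, sort_keys=True)
            if key not in seen:
                seen.add(key)
                events.append(event)
        if len(page) < page_size:
            # Fewer results than the cap: nothing older is left.
            end_date = 0
        else:
            next_end = page[-1]['startTimeStampInMilliSecs']
            # Stop if the end date didn't move, to avoid looping forever.
            end_date = 0 if next_end == end_date else next_end
    return events
```

Taking the fetcher as an argument keeps the loop easy to try out against fake data before pointing it at the real endpoint.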
It turns out this doesn't always apply: some auctions have no detail page or no overview. Those are usually older auctions that were never digitized, however.
Now that we can access arbitrary auctions and their contents, we can just save the auction data to a JSON file and start downloading the images in each auction. Lots that remain under copyright, lots that aren't photographed, and lots that are wine (there's an "isWine" flag in the JSON just for that!) have a placeholder image, so we can just ignore them while downloading other lots:
    for lot in lots_dict['lots']:
        # don't load blank images, don't load copyright placeholders, don't drink and drive
        if "underCopyright" in lot["image"] or "lot.jpg" in lot["image"] or lot["isWine"] == "true":
            continue
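The same check can be pulled out into a small helper, which makes it easier to test without downloading anything. This is only a sketch: `is_placeholder` and `downloadable_lots` are names invented for this example, and treating a missing image or a real boolean `isWine` as a skip is an assumption on my part, since the snippet above only compares against the string "true".

```python
def is_placeholder(lot):
    """Return True for lots that only have a placeholder image."""
    image = lot.get("image") or ""
    if not image:
        # Assumption: a lot with no image entry has nothing to download.
        return True
    if "underCopyright" in image or "lot.jpg" in image:
        return True
    # Assumption: accept a JSON boolean as well as the string "true".
    return lot.get("isWine") in (True, "true")


def downloadable_lots(lots_dict):
    """Keep only the lots whose images are worth fetching."""
    return [lot for lot in lots_dict.get("lots", []) if not is_placeholder(lot)]
```

Note that the "lot.jpg" substring test also matches any file name that merely ends in "lot.jpg", exactly as the original condition does.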
There are still a few missing features in the program that I might tackle if I feel some pressing desire for their inclusion. Just don't hold your breath for them:
- filtering based on more than just auction title
- avoiding the downloading of lots that are being resold from previous auctions
- better file names (the current-slug-format-isn't-very-convenient.jpg)
- turning the program into a Python package
For now I've been using the files I've scraped as randomized desktop backgrounds. I may not publicize it far and wide, but if you've found this text, I hope you find my program useful.