2018/09/07/scrapebys

The Sotheby's art auction house started moving towards digital auctions a little over 9 years ago, with high-quality scans, about 2000px on their longest side, of various works of art that Sotheby's has sold online. There was a blessed period in 2011 where some intern started uploading the images at 4000px, but it was short-lived. If you know where to go to read this blog then you've likely already heard me mention this a million times by now.

Earlier last year, I had been going through the auctions by hand, picking and choosing paintings that I wanted to download, but this year I came back to their website to see if I could automate the process and just grab all of their paintings at once. Please note: my imaginary legal department wanted me to pass on the fact that scraping the Sotheby's website is against their terms of use (check condition 7d), so I suggest you do this at your own discretion. On the bright side, their robots.txt file suggests that they only care for the protection of their receipts, which actually contain what one might consider valuable data. While programming a scraper isn't particularly difficult, I've catalogued some brief notes on my process of cracking Sotheby's faberge egg.

The code, as it stands, is hosted as a Bitbucket repo.

We can start by checking out their catalog of past sales. If you open up the Network tab in your favourite modern browser's console and navigate to their auction archive, you'll spot an interesting GET call for a JSON file: ajax.auctions.json.

The file only retrieves a maximum of 500 auctions at a time. We can work around this by doing multiple requests and changing the end date to the earliest returned auction.

auction_json = requests.get(url, headers=headers).text
auction_json = json.loads(auction_json)
# Sotheby's limits the number of results to 500 per request:
# Once we get less than that we can finish up.
if len(auction_json['events']) < 500:
    end_date = 0
else:
    end_date = auction_json['events'][-1]['startTimeStampInMilliSecs']
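
For illustration, here is a minimal sketch of how that check might sit inside a full paging loop. It assumes the events come back newest first, so that the last one is the earliest. `fetch_page` is a hypothetical helper, not part of the original code, standing in for the GET request to ajax.auctions.json.

```python
from typing import Callable, Dict, Iterator

PAGE_LIMIT = 500  # per the text above: at most 500 auctions per request


def iter_auctions(fetch_page: Callable[[int], Dict], end_date: int) -> Iterator[Dict]:
    """Yield auctions page by page, walking backwards in time.

    fetch_page(end_date) is a hypothetical helper: it should perform the
    request with the given end date and return the decoded JSON as a dict.
    """
    while end_date:
        events = fetch_page(end_date)["events"]
        yield from events
        if len(events) < PAGE_LIMIT:
            end_date = 0  # short page: nothing older is left
        else:
            # The next request ends where the earliest returned auction starts.
            end_date = events[-1]["startTimeStampInMilliSecs"]
```

If the real endpoint treats the end date as inclusive, the auction on the boundary would show up twice, so de-duplicating by some auction identifier would be a sensible extra step.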

When visiting the URL for an auction, the items on sale (referred to as "lots") pop up on a list. But how is this list populated? I figured there was another API call to get the data sent via a JSON file, but I wasn't quite right. I burned through most of the JSON and JS files loaded alongside the auction page with no results. It was only after explicitly searching for the name of a lot listed on the auction that I found out the data for each lot was stored as a JSON-formatted string within a JavaScript array.
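
A rough sketch of pulling that data out of the page source follows. The variable name `lotData` is a made-up placeholder (the real page uses its own name), and the sketch assumes the array literal happens to be valid JSON.

```python
import json
import re
from typing import Dict, List


def extract_lots(html: str, var_name: str = "lotData") -> List[Dict]:
    """Pull lot records out of an inline script.

    var_name is a hypothetical placeholder, not the name Sotheby's uses.
    """
    match = re.search(r"\b" + re.escape(var_name) + r"\s*=\s*", html)
    if match is None:
        return []  # no lot data embedded in this page
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (the rest of the script and page).
    array, _ = json.JSONDecoder().raw_decode(html, match.end())
    # Elements may themselves be JSON-formatted strings; decode those too.
    return [json.loads(item) if isinstance(item, str) else item for item in array]
```

If the array uses JavaScript-only syntax such as single-quoted strings, `raw_decode` raises `json.JSONDecodeError` and a more forgiving parser would be needed.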

It turns out this doesn't always apply: some auctions have no detail page or no overview. Those are usually older auctions that were never digitized.

Now that we can access arbitrary auctions and their contents, we can just save the auction data to a JSON file and start downloading the images in each auction. Lots that remain under copyright, lots that aren't photographed, and lots that are wine (there's an "isWine" flag in the JSON just for that!) have a placeholder image, so we can just ignore them while downloading other lots:

for lot in lots_dict['lots']:
    # don't load blank images, don't load copyright placeholders, don't drink and drive
    if "underCopyright" in lot["image"] or "lot.jpg" in lot["image"] or lot["isWine"] == "true":
        continue
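
Put together, the save-and-filter step might look something like the sketch below. The function names and the `auction.json` file name are illustrative choices, not taken from the original code, and the `.get` calls add a guard for missing keys that the snippet above does not have.

```python
import json
from pathlib import Path
from typing import Dict, List


def wanted_lots(lots_dict: Dict) -> List[Dict]:
    """Return only the lots whose image is worth downloading."""
    keep = []
    for lot in lots_dict["lots"]:
        image = lot.get("image", "")
        # Skip missing images and the two placeholder images.
        if not image or "underCopyright" in image or "lot.jpg" in image:
            continue
        # The flag is compared as a string, as in the snippet above.
        if lot.get("isWine") == "true":
            continue
        keep.append(lot)
    return keep


def save_auction(lots_dict: Dict, folder: Path) -> Path:
    """Write the raw auction data next to where the images will go."""
    folder.mkdir(parents=True, exist_ok=True)
    out = folder / "auction.json"  # hypothetical file name
    out.write_text(json.dumps(lots_dict, indent=2), encoding="utf-8")
    return out
```

Keeping the raw JSON on disk means the filtering rules can be changed later without requesting the auction page again.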

There are still a few missing features in the program that I might tackle if I feel some pressing desire for their inclusion. Just don't hold your breath for them:

  • filtering based on more than just auction title
  • avoiding the downloading of lots that are being resold from previous auctions
  • better file names (the current-slug-format-isn't-very-convenient.jpg)
  • turning the program into a python package

For now I've been using the files I've scraped as randomized desktop backgrounds. I may not publicize far and wide, but if you've found this text, I hope you find my program useful.

2018/04/05/web-bby

I wanted to make a writeup about a more recent web-baby of mine, React Forever. Registered users of the site can post a link and an emoji which represents the user's "reaction" to it, whatever that means. It's a cynical take on the ability to react to Slack/Discord messages, GitHub comments, and Facebook posts. What if the only thing you could do on a social network was react? React Forever answers this rightfully little-asked question. Most of its traffic has been strangers at house parties and Omegle matches on my more bored evenings that didn't tell me I looked like [insert bald pop culture figure here].

So why don't I have a whole blog post's worth of things to say about React Forever? In short, it's not really that interesting, the tech stack is especially unimpressive, and not enough people have used it for some strange mass social behaviour to arise, which was my lofty ideal scenario for a site I basically put 12 hours of effort into. I will tell you one thing, though: if you're serving a ton of small images on your frontend, put them all in one large image as a spritesheet and present them as background images + displacement in div tags. Guaranteed improvement in load times. This wasn't obvious to me: I actually stole the idea from Discord's emoji picker!
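
To make the "displacement" part concrete, here is a small sketch that computes the inline style for one tile. It assumes a spritesheet laid out as a grid of square tiles, numbered left to right and top to bottom. The function name and the `emoji.png` file name are invented for the example.

```python
def sprite_style(index: int, columns: int, tile_px: int, sheet_url: str) -> str:
    """Inline CSS that shows tile number `index` of a grid spritesheet."""
    row, col = divmod(index, columns)
    # Negative offsets slide the sheet so the wanted tile sits under the div.
    x = -col * tile_px
    y = -row * tile_px
    return (
        f"width: {tile_px}px; height: {tile_px}px; "
        f"background-image: url('{sheet_url}'); "
        f"background-position: {x}px {y}px;"
    )


# Usage: one div per emoji, all sharing a single downloaded image.
html = f'<div style="{sprite_style(5, 4, 32, "emoji.png")}"></div>'
```

The browser fetches the sheet once and every div only shifts its view of it, which is where the saving in requests comes from.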

I've been mixing dance music off of my laptop more often. Soulseek, CD ripping, and vinyl samplers have helped me hoard house, disco, funk, R&B, etc etc. DJing is quite fun, and my amateur pursuit of it has been pleasant in private, but like most of my amateur pursuits it wound up needing an external outlet for validation ASAP. To feed the need, I installed icecast on the Cynical Valley server, put up a barebones webpage with a chat box, and started streaming audio to whatever randos I could find online. This really basic clone of n10as has made me a small handful of e-friends. Just don't expect a show schedule any time soon.

I think I'll try for more meandering, free associative writing soon. Something along the lines of just talking about whatever I see online. Why should my bull be limited to twitter? Keep your eyes peeled, dear reader.

2018/01/28/patchy

I cracked open a beer and spent the night inside just writing small bugfixes for the site and adding pagination (not that you could tell until there are at least ten posts up).

To my knowledge, I've programmed something like three working blogs all by myself that I actually wound up using. Let's interpret "using" loosely: between the three of them, I've averaged two posts before dropping them entirely. My desire to write posts in the blogs I've programmed usually gives way to my desire to fix the many bugs I left lying around as I haphazardly made the blogs themselves. Eventually, the codebase becomes a mess that I don't want to refactor, and then shame kicks in. How could I use my own site, I think to myself, knowing that I played Doctor Frankenstein with the backend and made a monster?

Monsters they were. My first blog was written after I took my first basic web development course in university. Back then, all I had learned about backend programming was in that course, and it wasn't much: I walked out of it knowing how to write inline PHP and navigate XML files, so I stuck to those two "proficiencies". All of the posts I wrote were originally written up as .txt files, and when I wanted to upload them I ran a Python script to append the post to a master XML file containing all the posts. As I recall, either the filename itself or the first line was used as the title. It was a total mess, but looking back, I'm pretty impressed that I managed to pull it off. A custom copy of the code I wrote is still in use, minus the Python script for uploading, by a former barmaid from my local dive who posts poetry.

My second attempt at a blog was cleaner and much more recent. Once the summer of 2017 rolled in, I wanted to have a project I could show off to potential employers at career fairs in my last year at university, so my thoughts drifted towards making a blog that followed Model-View-Controller principles at least remotely and made use of a proper database. I have a simultaneous love for Python and distaste for the Django web framework (one day I'll learn to eat my vegetables), so I went for the lighter Flask web framework and got to work. I made the mistake of writing raw SQL queries in the engine code, which became a pain when it was time to write more complicated stuff like the tagging system I implemented. Sure, everything worked, but by then my backend was becoming convoluted and hard to manage once again. The code still sees use today for another friend's blog: they're using it for poetry too.

Fast-forward to the present and I have a job (for now), as well as a healthy spoonful of ennui and a tendency to let my mind wander. In this iteration of the blog, I've gotten rid of tags and databases in a nod to the fact that I don't really need them if all I wanna do is write. I recently came across some good advice on writing, so I'm keeping it in mind as long as I can, and it's that nobody wants to read your shit. I find that a liberating thought.

2018/01/25/vicarious-ugly-man

I spent a sizeable chunk of my adolescence hooked up to assorted video toys, my favourite of which were first-person shooters and RPGs set in bleak locales where the player character had to survive using their grit, determination, and ability to exploit predictable enemy AI patterns. STALKER, Fallout 3 and New Vegas, Bioshock, Left 4 Dead, Dark Souls, etc. fed a shared cultural fantasy among young men of going it alone as Chief Badass in a cruel and violent world. Long after that phase should have expired in me (and in bundles of other young men around me), I've retained an immature fascination with the most grim titles.

The locus (for me, anyway) was Far Cry 2. Outside of a big portion of the game around two thirds in, I found its pacing addictive. You head into a war-torn country and visit the town that's under cease fire, grab the first job that comes your way from either of the two fighting factions, and start fires and blow a bunch of shit up, further destabilizing the country. There's a nice (if a little on-the-nose) Heart of Darkness moment building up towards the end of the game where how much of an asshole you are is really drilled into you as you play.

I especially liked many of the buddies. You get a choice to be one of nine hardened mercenaries with shady pasts in on-site security and whatnot, but your choice just amounts to your arms looking different in-game while holding a variety of apparently off-model firearms. The characters you didn't choose to play as then litter the world and mumble at you about their shady pasts whenever you meet with them, handing them something like a briefcase full of drugs or gold or whatever: "oh yeah, just like my time in the IDF." "My friend in Bulgaria? He could move this stuff in a week." Getting those little bits of their horrid character was a great source of fascination for me, with the best horrid character of course being the game's antagonist, The Jackal, whose moral treatises were conveniently scattered across the game world and waxed poetic about how selling weapons made by German union workers to warlords is just as abhorrent as selling radios made by Bangladeshi kids to Wal-Mart.

Around the same time as I finished Far Cry 2, I read a Wired article about John McAfee. He's a leathery old man with a storied curriculum vitae which includes time spent in the seventies playing Power Gringo and dealing drugs in Central America before developing a mediocre piece of antivirus software:

In Belize, offending the Police Commissioner will immediately get a policeman fired, with no repercussions to the Commissioner, and, depending on the offense, may even get the officer "erased". So it gives an officer serious pause when you say: "The drugs belong to Commissioner (insert name). I am delivering them to a friend for him". If spoken with authority and condescension, they can have a dramatic effect. No policeman in his right mind would try to validate the story. Resident Gringos, for odd reasons, are prized as friends by wealthy and prominent locals, so it would not be out of the question to be close with the Country's Police Commissioner.

Things have been adding up for me since. I've basically been fetishizing criminals and soldiers who do shitty things in dangerous places, places that my cushy upbringing and lifestyle allow me to never approach. One of my favourite films is Sorcerer, about a handful of criminals who all accept a suicide mission to escape the unnamed jungle town where they're lying low. I've got photos of soldiers from a smattering of places: the siege of Sarajevo, the old Spanish civil war, the Russia-Chechnya border, next to a bunch of unimpressed Afghan kids. I thought that Kane and Lynch 2 essentially being LiveLeak: The Game was great, and its amateur video style and effects were a great way of getting its themes across.

At this point I'm thinking these are some major red flags. I'm sharing this interest with neocon reactionaries, after all. At least I don't think Brad Pitt was cool in Fight Club or that the Rhodesians were right, but outside of my politics I'm not so far removed in aesthetic tastes from the people who do. At the time of writing, if you visit /fa/, the fashion board on 4chan, some of its posters are busy trying to push a new style called terrorwave: wearing military surplus items with cheap-looking clothes like thrifted Levis and Adidas beaters with ridiculous soles.

I've been against every -core or -wave invented by the board since I first visited, but this is the first time I've seen an /fa/-invented idea follow fashion industry guidelines: find some dangerous or disenfranchised group of people from the past, like gangsters, skinheads, bikers, what-have-you and then appropriate what they wear as the New Thing. I wonder if this style will see some traction. It's another opportunity to pay homage to the adolescent Chief Badass.

2018/01/24/hello-world

At it again with rebuilding the site. I basically did this in a day so that I wouldn't play video games tonight. Gonna give writing a shot again. Stay tuned.

The site's built with Flask, like its previous iteration, except now it's actually serving static files from the server so I don't need to bother with any pesky SQL.

Argh, but I just checked and this site just looks like a bad ripoff of Real Life Mag haha