I’m going to take you on an optimization journey, showing how to make a web crawler, image recognition, and a machine learning model fast. For this, we need a hypothetical problem.
The challenge
There’s a website where tickets to see your favourite artist are released daily at 3 pm. But every time you try, you’re too slow. You need to be first; how fast could the process be if you fully automated it?
You refresh the web page at precisely 3 pm, click through a page to select the date, another page to choose your seat at the concert, and finally fill in a captcha to submit your booking. Each page must be visited; you cannot skip to the end of the process.
Analysis
To improve our chances of getting a ticket, we need to understand where the time goes.
- Network latency – the server is hosted in the US and we’re in London, so a round trip to the server costs about 100ms: 100ms x 4 requests = 400ms.
- Page load – time to fully load a single page in Chrome (DOM ready): 80ms x 3 pages = 240ms (the last page’s load isn’t time-sensitive, as we’ve already submitted the captcha).
- Protocols – DNS lookup, TCP 3-way handshake, and SSL: 350ms.
- Entering the captcha – manually typing 8 characters: 2,500ms.
- Submitting each page – clicking the buttons on a page: 1,000ms x 3 pages = 3,000ms.
- Total time: ~6,500ms.
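As a quick sanity check on that total, here is the same budget added up. Nothing here is measured; these are just the estimates from the list above.

```typescript
// Rough time budget for one manual booking attempt, using the estimates above.
const networkMs   = 100 * 4;    // 4 round trips at 100ms each
const pageLoadMs  = 80 * 3;     // DOM-ready for the 3 time-sensitive pages
const protocolsMs = 350;        // DNS lookup + TCP handshake + SSL
const captchaMs   = 2_500;      // manually typing 8 characters
const clickingMs  = 1_000 * 3;  // clicking through 3 pages by hand

const totalMs = networkMs + pageLoadMs + protocolsMs + captchaMs + clickingMs;
console.log(`Total: ~${totalMs}ms`);  // 6,490ms, i.e. roughly 6,500ms
```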
The initial optimization – the low-hanging fruit.
The best way to improve our chances is to focus on the largest chunk of time that takes the least effort to reduce. For us, that is the human interaction: clicking the buttons on each web page.
Submitting the form – from 3,000ms to 60ms
Optimizing the click-through of the web pages is an easy place to start. With minimal scripting via an extension like Greasemonkey, we can automate moving through the pages, cutting each page interaction from 1,000ms to around 20ms. There is an additional – but hard-to-measure – benefit of scripting: the page can be refreshed at exactly the moment the tickets are released, rather than manually hitting the refresh button at exactly 3 pm (well, 3 pm minus half the round-trip latency of 50ms).
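Here’s a minimal sketch of what such a userscript might look like. The site URL, the element selectors, and the 10ms polling interval are all assumptions; a real script would need selectors reverse-engineered from the actual booking pages.

```typescript
// ==UserScript==
// @name   auto-booker (sketch)
// @match  https://tickets.example.com/*
// ==/UserScript==

// Click an element as soon as it appears, polling every 10ms.
function clickWhenReady(selector: string): void {
  const el = document.querySelector<HTMLElement>(selector);
  if (el) {
    el.click();
  } else {
    setTimeout(() => clickWhenReady(selector), 10);
  }
}

// Reload at 3 pm minus half the round-trip latency (~50ms), so the request
// reaches the server just as the tickets are released.
const release = new Date();
release.setHours(15, 0, 0, 0);
const fireAt = release.getTime() - 50;
if (Date.now() < fireAt) {
  setTimeout(() => location.reload(), fireAt - Date.now());
}

// Click straight through the date and seat pages (hypothetical selectors);
// the captcha page still needs a human at this stage.
clickWhenReady('#date-picker .first-available');
clickWhenReady('#seat-map .best-available');
clickWhenReady('button[type=submit]');
```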
Protocols DNS/SSL/TCP 3-way Handshake – 350ms to 0ms
Although protocols are a small proportion of the overall time at around 350ms, reducing their impact is straightforward. In the browser, we can cache the DNS lookup. We can also keep the connection open so the TCP 3-way handshake and SSL negotiation don’t have to be repeated. A connection stays open for about 60 seconds, so making a request 30 seconds before release keeps it alive, eliminating 350ms of potential latency.
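A sketch of that warm-up request follows; the URL is hypothetical. Fired roughly 30 seconds before release, it forces the DNS lookup, TCP handshake, and SSL negotiation to happen ahead of time, and the browser reuses the still-open connection at 3 pm.

```typescript
// Warm the connection ~30s before release so DNS, the TCP handshake, and
// the SSL negotiation are already done when the real request goes out.
// The browser keeps the idle connection pooled, so the 3 pm request reuses it.
const WARM_UP_URL = 'https://tickets.example.com/';  // hypothetical

const release = new Date();
release.setHours(15, 0, 0, 0);
const warmUpAt = release.getTime() - 30_000;

setTimeout(() => {
  fetch(WARM_UP_URL, { method: 'HEAD' })
    .catch(() => { /* the response doesn’t matter, only the open socket */ });
}, Math.max(0, warmUpAt - Date.now()));
```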
Initial optimization run.
We’ve saved 3,290ms – roughly three seconds: 2,940ms from automating the page clicks and 350ms from the protocols. A decent result for a general optimization pass, but we can go much faster.
Second attempt – Decoding the image
The ticket booking process requires filling in a captcha to secure your ticket. This is the next focus for optimization, as it’s the next largest block of time.
Above is the captcha – an 8-character randomly generated string of alphanumeric characters placed on a static background. There is only one way to decode a captcha rapidly and consistently: Teach a machine to do it for you.
Here’s the process we’ll follow to do that: take the captcha, separate the letters from the background, split the image into individual letters, and send each letter to a neural net to make a prediction.
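Here’s a sketch of the “split apart the letters” step. It assumes the glyphs are darker than the static background and that the eight characters never touch – both assumptions about this particular captcha – and uses a simple column projection over the pixel data of a canvas-drawn image:

```typescript
// Split a captcha into per-character images using a column projection.
// Assumes dark glyphs on a lighter background and characters that never
// touch; the brightness threshold of 100 is an assumption.
function segmentCaptcha(img: ImageData): ImageData[] {
  const { width, height, data } = img;

  // 1. Binarise: mark a pixel as "ink" if it’s darker than the threshold.
  const ink = new Uint8Array(width * height);
  for (let i = 0; i < width * height; i++) {
    const r = data[i * 4], g = data[i * 4 + 1], b = data[i * 4 + 2];
    ink[i] = (r + g + b) / 3 < 100 ? 1 : 0;
  }

  // 2. Column projection: count ink pixels in each column.
  const cols = new Array<number>(width).fill(0);
  for (let x = 0; x < width; x++) {
    for (let y = 0; y < height; y++) cols[x] += ink[y * width + x];
  }

  // 3. Split on empty columns: each run of inked columns is one letter.
  const letters: ImageData[] = [];
  let start = -1;
  for (let x = 0; x <= width; x++) {
    const hasInk = x < width && cols[x] > 0;
    if (hasInk && start < 0) start = x;
    if (!hasInk && start >= 0) {
      const w = x - start;
      const slice = new ImageData(w, height);
      for (let y = 0; y < height; y++) {
        for (let dx = 0; dx < w; dx++) {
          for (let c = 0; c < 4; c++) {
            slice.data[(y * w + dx) * 4 + c] =
              data[(y * width + start + dx) * 4 + c];
          }
        }
      }
      letters.push(slice);
      start = -1;
    }
  }
  return letters; // ideally 8 slices, one per character, ready for the model
}
```

Each slice would then be resized to a fixed input size and fed to the neural net, one prediction per character.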
Training data
To train a machine learning model on this captcha, you’ll need around 50,000 labelled images. Whilst you can get plenty of captchas from the ticket website, the process won’t scale, because every image needs to be labelled – that is, renaming the file from download.png to the characters it contains, e.g. 8ah3muxe.png. To manually label it would