30K Page Views for $0.21: A Serverless Story

Note: For regular readers of this space who play Fantasy Movie League, you probably want to stop reading right now, as this post will definitively prove my in-game moniker.  If you’re a tech person, however, feel free to continue.  The views expressed here are mine and in no way reflect the opinions of any past, present, or future employer.

I’m old enough to remember, after many years of writing command line programs in BASIC and C, feeling the awe of possibility upon being introduced to the event-driven programming model that the Mac provided.  My first web application went into production in January of 1996 and couldn’t use <table> tags because they weren’t part of the HTML spec yet, so I was greeted with that same awe of possibility when the DOM, and then CSS, became ways we could build rich, reactive web UIs.  We used to have to wait months for access to new resources in the data center; then VMs let us get our hands dirty in minutes, and containers cut that to seconds.

Having seen paradigm shifts before, I think I’m seeing one again: Serverless.

I know some people really hate that term, since there are indeed servers running in the stack; it’s just that I don’t have to know they are there.  I certainly don’t have to manage them, so I prefer “serverless” to “no ops”.  But this article isn’t about semantics as much as it is about how I used this technology to generate 29,918 page views (yes, I rounded up for the catchy article title) in July of 2016, for an audience of 5,385 unique users, on $0.21 of AWS charges.

[Image: July page views]

[Image: July AWS bill]

Technically that’s a $0.22 bill from AWS, but notice that $0.01 of it is from EC2.  That’s a remnant of some day-job experimenting where I forgot to remove a volume.  For Fantasy Movie League purposes, it was $0.21, which raises the question: “What is Fantasy Movie League?”

My Problem Domain: Fantasy Movie League

In short, Fantasy Movie League (FML, <insert your joke here about alternative meanings of that acronym>) is a fantasy game for people who don’t like sports.  Each week, you get to run a fantasy cineplex of 8 screens, have a budget of $1000 fantasy dollars, and can fill those screens from a slate of 15 movies, each of which is priced based on how it is expected to perform the following weekend.

[Image: A Fantasy Movie League cineplex]

Your score each week is based on the actual box office returns of the movies you selected, and there are contests for Fandango gift cards, t-shirts, and trips to movie openings or even the Oscars.  FML is the brainchild of ESPN Senior Fantasy Analyst (and former screenwriter) Matthew Berry.  When I first started playing, there were about 6,000 registered users; as I type this, there are roughly 24,000.

A typical FML week starts on Monday, when the new slate of movies and prices becomes available at around 5:00p Pacific.  Professional forecasts, intended to help real-life theater owners with staff planning, arrive Wednesday evening; final theater and showtime counts come on Thursday; and the deadline to have your cineplex entered for the week is 9:00a Pacific on Friday.  Scores are tabulated on Monday and the whole cycle starts over again.

What I provide to the player base are tools that make cineplex decisions easier.  Most notable, and the focus of the rest of this article, is the Lineup Calculator.

Given a set of forecasts for the individual movies, which you can alter on the right-hand side, the Lineup Calculator does the bin-packing math to tell you the best combination of movies to play, shown on the left-hand side.  I seed the Lineup Calculator with different methods on different days of the week, but it is common for each player to create their own.
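To make the bin-packing concrete, here is a minimal sketch of the kind of search involved: pick movies for exactly 8 screens under a $1000 budget, with repeats allowed, maximizing forecast box office.  This is my own illustrative dynamic-programming version, not the site’s actual code, and it ignores FML details like the empty-screen penalty:

```python
def best_lineup(movies, screens=8, budget=1000):
    """movies: name -> (price, forecast box office).
    Returns (best total forecast, sorted lineup) or None if infeasible."""
    NEG = float("-inf")
    # dp[s][b] = best total forecast using exactly s screens and at most b budget
    dp = [[NEG] * (budget + 1) for _ in range(screens + 1)]
    choice = [[None] * (budget + 1) for _ in range(screens + 1)]
    for b in range(budget + 1):
        dp[0][b] = 0.0
    for s in range(1, screens + 1):
        for b in range(budget + 1):
            for name, (price, forecast) in movies.items():
                if price <= b and dp[s - 1][b - price] != NEG:
                    cand = dp[s - 1][b - price] + forecast
                    if cand > dp[s][b]:
                        dp[s][b] = cand
                        choice[s][b] = (name, price)
    # walk the choice table back to recover the actual lineup
    lineup, b = [], budget
    for s in range(screens, 0, -1):
        if choice[s][b] is None:
            return None
        name, price = choice[s][b]
        lineup.append(name)
        b -= price
    return dp[screens][budget], sorted(lineup)
```

The table is small (9 screens × 1001 budget levels × 15 movies), so brute-force DP like this finishes instantly; a real implementation mostly differs in how it models FML’s scoring wrinkles.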

Lineup Calculator Architecture

The Lineup Calculator is composed of a set of AWS Lambda functions organized into different classifications, the first of which I call Collectors.

[Image: Collector architecture]

Data is typically made available at different days and times throughout the week by different external sources, so each Collector is triggered by a CloudWatch cron job.  Upon starting, each Collector loads a configuration file from an S3 bucket that drives its behavior, typically calling an API or scraping HTML off a web page.  Either way, the Collector generates a JSON file that gets stored into another S3 bucket, then uses SES to send a notification that it has completed successfully.  The different data sources sometimes contain errors, which are easy to fix manually by editing the resulting JSON files directly.

Other, similarly structured Lambda functions, which I call Derivers, are triggered by the creation of the last JSON file each one needs.  A Deriver takes multiple JSON files created by the Collectors and derives some other intermediary file, typically a .js file to be consumed by the front end later.

[Image: Deriver architecture]
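The core of a Deriver is a pure transformation from several Collector payloads into one .js file.  A minimal sketch, assuming each Collector produces a movie-to-forecast map (the variable name and data shape are illustrative, not the site’s actual format):

```python
import json

def derive_js(sources, var_name="forecasts"):
    """sources: source name -> {movie: forecast}.  Pivot to movie -> {source: forecast}
    and emit a .js assignment the front end can include with a <script> tag."""
    by_movie = {}
    for source, forecasts in sources.items():
        for movie, value in forecasts.items():
            by_movie.setdefault(movie, {})[source] = value
    return "var %s = %s;\n" % (var_name, json.dumps(by_movie, sort_keys=True))
```

Emitting a .js assignment rather than raw JSON means the static page needs no XHR or server-side step at all; the data rides in with the page assets.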

Finally, there is a set of Lambda functions, which I call Generators, that produce HTML files referencing the newly created .js files.

[Image: Generator architecture]
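A Generator can be little more than a template that stamps out a page shell referencing the derived .js files.  A hypothetical sketch (the real pages are fuller Bootstrap layouts):

```python
def generate_page(title, js_files):
    """Emit a minimal page shell that pulls in Bootstrap and the derived .js files."""
    scripts = "\n".join('  <script src="%s"></script>' % f for f in js_files)
    return "\n".join([
        "<!DOCTYPE html>",
        "<html>",
        "<head>",
        "  <title>%s</title>" % title,
        '  <link rel="stylesheet" href="bootstrap.min.css">',
        scripts,
        "</head>",
        "<body>",
        '  <div id="calculator"></div>',  # hypothetical mount point for the front end
        "</body>",
        "</html>",
    ])
```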

I chain these together to load different kinds of data into the Lineup Calculator at different times of the week.  On Monday, for example, data from the Fantasy Movie League API (CoreGame Week), BoxOfficeMojo.com returns for the previous week (Actuals), and ProBoxOffice.com’s Long Range Forecast (LRF) are used.

[Image: Monday data flow]
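The Monday chain is kicked off by CloudWatch scheduled rules.  A sketch of what those schedule expressions could look like (function names and exact minutes are made up; note that CloudWatch cron fields are in UTC, so roughly 5:00p Pacific on Monday lands early Tuesday UTC, give or take daylight saving):

```json
{
  "coregame-week-collector": { "schedule": "cron(15 1 ? * TUE *)" },
  "actuals-collector":       { "schedule": "cron(30 1 ? * TUE *)" },
  "lrf-collector":           { "schedule": "cron(45 1 ? * TUE *)" }
}
```

Only the Collectors need explicit schedules; the Derivers and Generators fire off the S3 object-created events as the files land.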

But on Wednesday, ProBoxOffice.com (PBO) and ShowBuzzDaily.com (SBD, <insert your joke here about alternative meanings of that acronym>) make their forecasts for the upcoming weekend.

[Image: Wednesday data flow]

All told, for the Lineup Calculator that’s 8 different Lambda functions pulling data from 5 different sources and automatically generating updates to the statically hosted S3 bucket, which uses Bootstrap to make it all look nice.  For the entire site, I have close to 20 Lambda functions for the various pages, with 8 data sources.

What I’ve Found is Cool (and Not) About Lambda

Boiling what I’ve done down to its essentials, I’m using Lambda as a free batch server, since I’m well below the free tier of 1M requests per month, and S3 as a low-cost web host where my primary cost is egress.  If you look closely at my AWS bill, I also have some data transfer cost because, during July, my user-facing S3 bucket was hosted in one region but my Lambda-targeted data bucket was in another.  That’s being fixed and should lower my bill even more in the future.

While you could argue that I’m using Lambda beyond its intended use as an IoT back end, it gives a single, part-time developer the ability to manage a complicated data consolidation scheme in a way that would simply be impossible otherwise.  There’s no need to check uptime, log size, network connectivity, or anything else I’d have to do if I were managing my own EC2 instance as a batch host, not to mention the additional costs I’d incur.

Here are some specifics on what I like about it:

  • IDE integration into Eclipse – As a middle-aged developer who has spent the last 4 years in sales and marketing, I didn’t want to have to learn a new language while I was also learning Lambda, so I opted to build my functions in Java.  The AWS plug-in for Eclipse makes it ridiculously easy to create a project for a new function and handles all the packaging with a few button clicks.  This flattened my learning curve substantially.
  • CloudWatch integration – Some people on the Lambda forums have discussed the limitations of CloudWatch, but at my low scale I found it worked seamlessly and, combined with a console logging library I built to standardize incremental status formatting throughout all my functions, made it very easy to “see” what my functions were doing.
  • The Free Tier – My whole scheme doesn’t work without the free tier.

Some things, though, are a drag:

  • Java warming complaints are real – The JVM loading time is a non-starter if you want to use Lambda behind API Gateway for live user transactions.  For my purposes this doesn’t matter, since all my functions are batch, but I found it takes around 9 seconds of overhead at a 256 MB memory size.  It’s less if you go bigger, but Python and Node are better choices if you need speed, and I plan on checking out the work over at serverless.com when I need interactive features.
  • The size of the AWS SDK .jar file – If you use any AWS SDK calls within your function or select one of the prebuilt function signatures that use AWS objects (S3 triggers, for example), the Eclipse AWS wizard will automatically put the 34MB+ .jar file in your project.  That might not sound like a lot, but it’ll re-upload the whole thing every time you deploy your function during development and, while I didn’t test this exhaustively, it also seemed to slow my warming time.  I cracked it open and removed everything I wasn’t using, so my final function size was closer to 6MB.
  • Single target triggers from S3 – Drop a file into an S3 bucket folder and you can trigger a Lambda function invocation.  The problem is, you can only trigger one function, and I had a situation where I wanted two things to happen in parallel in response to a single event.  I took the easy cheat and wrote the file twice, to two similarly named folders.
  • Exceptions should be events – I wanted the ability to have an exception thrown from my function be an event I could use to invoke another function.  While you can do this from within your own code, it seems like a pretty standard use case that the framework should handle.
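On the SDK size point, an alternative to hand-trimming the .jar is to depend only on the per-service SDK modules, since the AWS SDK for Java publishes one Maven artifact per service.  A sketch of the relevant pom.xml fragment, assuming only S3 and SES are used (the version shown is illustrative):

```xml
<dependencies>
  <dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-s3</artifactId>
    <version>1.11.22</version>
  </dependency>
  <dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-ses</artifactId>
    <version>1.11.22</version>
  </dependency>
</dependencies>
```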

In sum, I really enjoy this programming model and think it is the future of computing.  When I think about all the time the server daemons I’ve written (or written to) over the years have spent doing nothing, loop after loop after loop, it’s clear the next leap in compute efficiency needs to come from on-demand container consumption like what AWS has done with Lambda.  There are some other projects out there, most notably Mantl.io, that have a chance to do something similar without vendor lock-in to AWS, and I plan on keeping a close eye on them as this exciting trend matures.

11 thoughts on “30K Page Views for $0.21: A Serverless Story”

  1. Since your biggest cost is egress, look into putting CloudFlare (a free CDN) in place. By caching the HTML and JS files with a few-hour refresh, you may drive your cost down to near $0.


    1. The weekly crawling of the CloudFlare free tier may be a deal breaker since my main content changes more frequently than that, but I bet I can make something work there. Great suggestion!

