Fun with Nginx as an API cache
Over the past couple of years, I’ve felt like what is currently called “front-end” web development is a tremendous blind spot for me. It just so happens that current world events leave me curious about some data that is widely available via public APIs. My personal desire to understand some of the information about the world that those APIs expose makes this a nice, organic opportunity to learn more about the front-end. What I learned today was an interesting tangent. I suspect that others have had this problem and have more efficient ways to address it.
Background
Usually, with these APIs, the drill is that you need to sign up for an API key and use that for each request. Free or low-cost plans for these keys are often severely rate-limited. That’s unfortunate when you’re trying to explore them directly from the front-end. Unless you’re very careful, one wrong turn while debugging rendering behavior can hammer the server and exhaust your quota before you get close to figuring out what you’ve done wrong. Having to wait out your time limit can really ruin your day.
Since almost all of my requests (and every one of the requests that was eating into my quotas on other people’s servers) are just GET requests, this seems easy to solve with a caching proxy. Throwing Squid on my network and sending all of my browser requests (and some of my server requests) through it felt like it would be too disruptive, though.
Web searches on the matter came up lean. This blog post from 2013 described the problem well, but the configurations recommended haven’t aged very well. Here’s what I finally set up.
Details
I have an ESXi server on my LAN. VMware’s Photon is a nice, small distribution with just enough to play well with Docker. It boots very quickly on an ESXi server, and I like it for keeping my development containers off my laptop. Everything here would work just fine with local containers, though.
I started with the official nginx docker image and used volumes to expose config files, a little static content (unrelated to the main topic here) and a filesystem where requests and responses could be cached. The log volume makes it easier to get at debug entries than the standard docker log dance.
nginx:
  image: "nginx"
  ports:
    - 443:443
  volumes:
    - /srv/nginx-conf:/etc/nginx/certs
    - /srv/nginx-conf/nginx.conf:/etc/nginx/nginx.conf:ro
    - /srv/static-content:/usr/share/nginx/html
    - /srv/api-cache:/var/cache/nginx
    - /srv/nginx-log:/var/log/nginx
Since this machine is on my LAN, I used the nearlyfreespeech plugin I put together for certbot last fall to get a Let’s Encrypt certificate for this internal server. Then I copied that, along with a config file, up to the container host. The configuration was hard to arrive at because CloudFront produces some inscrutable errors when the nginx proxy is its client.
I think the comments in my nginx configuration mostly stand alone. The thing I wasted the most time on is that SSL version mismatch errors can be misleading: they also show up when the proxy is not configured properly for SNI, which the configuration below takes care of.
# required by nginx even if the defaults are acceptable
events {}

http {
    # a writable location on the filesystem for cache entries
    proxy_cache_path /var/cache/nginx/propublica-campaign-finance levels=1 keys_zone=propublica-campaign-finance:1024m inactive=1440m;

    server {
        # the path inside the container where the static content volume is mounted
        root /usr/share/nginx/html;
        index index.html index.htm;
        listen 443 ssl;
        server_name senatespy-backend.tuxpup.com;

        # issued by LetsEncrypt for the above name
        ssl_certificate /etc/nginx/certs/fullchain.pem;
        ssl_certificate_key /etc/nginx/certs/privkey.pem;

        location / {
        }

        location /propublica-campaign-finance {
            error_log /var/log/nginx/debug.log debug;

            # let clients talk to the proxy any time. caching will be handled here instead of by clients.
            add_header Cache-Control "no-cache, must-revalidate, max-age=0";

            # corresponds to a proxy_cache_path keys_zone above
            proxy_cache propublica-campaign-finance;
            # send down stale data if the cache is updating
            proxy_cache_use_stale updating;
            proxy_cache_lock on;
            # cache any status code for a day
            proxy_cache_valid any 1440m;
            # disregard these headers set by some back-ends
            proxy_ignore_headers X-Accel-Expires Expires Cache-Control;

            # upstream API
            proxy_pass https://api.example.com/campaign-finance;
            proxy_set_header X-API-Key pasteYourRealAPIKeyHere;
            # the default proxy doesn't set the right Host: header
            proxy_set_header Host api.example.com;

            # cloudfront supports current TLS
            proxy_ssl_protocols TLSv1.3 TLSv1.2;
            # tell the proxy TLS client the host name for SNI
            proxy_ssl_name api.example.com;
            # tell the proxy TLS client to send SNI
            proxy_ssl_server_name on;
            # make sure one of these overlaps with what cloudfront offers.
            proxy_ssl_ciphers TLS13-CHACHA20-POLY1305-SHA256:TLS13-AES-256-GCM-SHA384:TLS13-AES-128-GCM-SHA256:EECDH+CHACHA20:EECDH+AESGCM:EECDH+AES;
        }
    }
}
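One way to check that responses really are coming out of the cache, rather than quietly burning quota upstream, is to expose nginx's $upstream_cache_status variable as a response header. This is only a sketch of an addition to the location block above, not part of the setup as I described it; the header name X-Cache-Status is an arbitrary choice of mine.

```nginx
location /propublica-campaign-finance {
    # ... all of the directives from the configuration above stay as they are ...

    # report how the cache handled this request (MISS, HIT, EXPIRED, STALE, UPDATING, ...)
    # "always" adds the header to error responses too, since any status code is cached here
    add_header X-Cache-Status $upstream_cache_status always;
}
```

With something like this in place, the first request for a given URL should come back marked MISS, and a repeat of the same request within the day should come back marked HIT. The browser's network tab is enough to see the header, so there is no need to dig through the debug log for the common case.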