Atom Feed

Debian and Ubuntu Releases

February 13, 2021

Debian Releases

Version Name Release End of Standard Support
11 Bullseye TBA TBA
10 Buster 2019-07-06 2022
9 Stretch 2017-06-17 2020-07-18
8 Jessie 2015-04-25 2018-06-17
7 Wheezy 2013-05-04 2016-06-04
6 Squeeze 2011-02-06 2014-07-19

Ubuntu Releases

Version Name Release End of Standard Support
20.10 Groovy Gorilla 2020-10-22 2021-07
20.04 Focal Fossa 2020-08-06 2025-04
18.04 Bionic Beaver 2018-04-26 2023-04
16.04 Xenial Xerus 2016-04-21 2021-04
14.04 Trusty Tahr 2014-04-17 2019-04
12.04 Precise Pangolin 2012-04-26 2017-04-28
10.04 Lucid Lynx 2010-04-29 2015-04-30
8.04 Hardy Heron 2008-04-24 2013-05-09


Setting Up FastAI Fastbook on a Fresh Ubuntu Instance

January 31, 2021

This is how to set up the fastai environment on a fresh Ubuntu instance, for those of us who have a computer with a good Nvidia graphics card and Ubuntu and don’t want to use a cloud-based platform.

  1. Install nvidia CUDA drivers. So far, the most dependable guide that I’ve found has been this askubuntu post.
  2. Verify that the CUDA drivers have been installed correctly following this guide.
  3. Download fastbook with git clone [email protected]:fastai/fastbook.git && cd fastbook.
  4. Install a recent version of Python: sudo apt update && sudo apt install python3.9 python3.9-venv python3-setuptools
  5. Create a virtual environment: python3.9 -m venv env
  6. Activate the virtual environment: source env/bin/activate
  7. Install a build requisite: pip install wheel
  8. Install packages: pip install -r requirements.txt
  9. Launch the notebook with jupyter notebook


Tip for Developer Tools Startups

January 30, 2021

I recently had a really annoying experience with a startup in the developer tools space. I’ve been looking for a better CI platform for a while now (I’ve used Codeship, CircleCI, Travis, and Jenkins) and I found a promising CI platform that relied significantly on open source repositories as plugins to run its build pipelines. Many of these repositories are owned under the company’s own Github organization. Onboarding a few repositories of mine during an initial trial period, I found that there was a major missing feature in one of these plugins which would block me as well as other users from using this CI system effectively. I even found an open Github issue from another user a year earlier that reported the same missing feature. I then went into the plugin, learned the architecture, wrote a patch, added tests and documentation, and submitted five pull requests to fix this missing feature.

I’ve been waiting for three weeks for the company to merge or at least comment on these pull requests. In the meantime, my two week trial period ended which means I can’t even test changes to my pull requests. I therefore have significant doubts whether I should be using this company’s products.

Here’s a tip for dev tools startups: your customers and yourself are likely Github users, and if your customers contribute free feedback or features to your company through Github, you had better be responsive.


A Better Go Defer

October 20, 2020

Go has a defer statement built into the language which allows a function to be executed at the end of another function. This is pretty useful in particular for cleanup (e.g. closing a file handle) or for recovering from errors (because Go code usually contains a lot of if err != nil { return err } and scattering cleanup code everywhere can be visually distracting).

// Real working Go code
// Prints "Start\nmain\nClose"
package main

import "fmt"

type Instrumenter struct{}

func (i *Instrumenter) Start() *Instrumenter {
  fmt.Println("Start")
  return i
}

func (i *Instrumenter) Close() {
  fmt.Println("Close")
}

func main() {
  i := Instrumenter{}
  defer i.Start().Close()
  fmt.Println("main")
}

The funny thing is, although Go has first-class support for functions, defer statements take a function call as an argument, not just a function name - i.e. defer run() and not defer run. While this may be a minor annoyance while programming, it can be pretty unintuitive when combined with a factory pattern that might use defer with a double function call: the first call runs immediately and only the last call is deferred, which is exactly what the example above relies on.

I humbly propose (with heavy doubt it’ll be implemented) that the defer syntax be changed so that defer accepts a function rather than a function call as an argument, thereby making defer accept a continuation that can be invoked at the end of a function call (this does cause some problems with variable mutations that may happen later on, which become even more complicated when exception handling is introduced, but hopefully it can be worked out). I also propose that defer be changed from a built-in statement into a function that can accept a continuation as a parameter.


Covid-19 Economy Predictions

October 13, 2020

(Software-Engineering) Office culture

  • Informal mentorship (whiteboard sessions, (useful) code reviews) will become rare as remote work adds friction to the process.
  • A lack of organic mentorship will make companies less able to efficiently extract value from junior employees, who require more guidance. Many companies will move to a more Netflix-style model and value senior employees. This will make senior employees more valuable (i.e. paid more) but increase operational risk for those companies as knowledge and leadership become more concentrated. Junior engineers and new grads will have an even harder time finding their first few jobs.
  • The risk of people being interpreted incorrectly will increase. This will result in unfocused teams at best and gossip at worst. This hasn’t happened yet because newly remote companies still have teams who know each other from before becoming remote.
  • Impressions (and biases) will form a more significant part of how people work with each other because of limited alternative information.
  • Management has a long way to go in building best practices and tools for remote management. Many management philosophies are still predicated on strong social interactions which break down when working remotely. There is a billion dollar company in building a tool to bridge this gap (and this company will make Zoom look like Cisco).

Macro-economic effects

  • There will be a mass migration of workers from high cost of living to low cost of living areas. While this will create burdens on the real estate market, the more interesting burdens will come from secondary effects like cities and states rebalancing their budgets.
  • Many companies are using Covid-19 as an excuse for layoffs. Smart companies are using the current “employer’s market” to headhunt future employees.
  • Many companies will use this as an opportunity to shed office-related expenses. This will further accelerate the distribution of employees.


Health effects

  • A highly effective vaccine (on the order of the smallpox vaccine) won’t be developed anytime soon. General antiviral and experimental therapies will continue to be researched and used until interest wanes.
  • At some point, the medical establishment is going to come looking for their paycheck. How the insurance and pharmaceutical industry will react will be interesting.
  • There are long-term effects from Covid-19. These effects are so far not well-known but at worst could cause slightly increased morbidity for years among those who have been infected by the virus in the past.
  • There’s going to be an interesting “freakonomics” study in 5 years around the health effects of those who partook in quarantine (e.g. increase in BMI, heart disease, mental stress). Most places didn’t quarantine for long enough for any long term effects to build up though.
  • There’s going to be an interesting “freakonomics” study in 10 years around the health effects from dampened employment rates from Covid. It will be hard to find signal in this noisy data though.


Basic Docker Monitoring

July 4, 2020

  • docker container ls - List all containers and some configuration/status metadata
  • docker ps - same as docker container ls
  • docker stats - top but for docker containers
  • docker top <container> - Snapshot of resources in a single docker container
  • docker inspect <container> - Configuration of a single docker container
  • docker system df - Disk usage of docker entities


Switching From Go Dep to Go Mod

May 30, 2020

On switching from Go’s dep tool to mod:

  • dep ensure turns into go mod download
  • dep ensure -update turns into go-mod-upgrade with the go-mod-upgrade tool
  • dep check turns into go mod verify
  • Gopkg.toml turns into go.mod
  • Gopkg.lock turns into go.sum
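For reference, the replacement files look roughly like this minimal sketch (the module path and pinned dependency are made-up examples, not from the original setup):

```
module

go 1.14

require v1.5.0
```

go.sum is generated automatically and pins a cryptographic hash for each dependency, so it shouldn’t be edited by hand.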


Upgrading LibMySQLClient in Python MySQLDB/MySQLClient

May 25, 2020

After upgrading from Ubuntu 18.04 to Ubuntu 20.04, I ran into this error when trying to import python’s mysqlclient:

ImportError: cannot open shared object file: No such file or directory

After spending a while debugging it, I found the python mysqlclient package is a fork of MySQLdb which compiles a _mysql.*.so file which in turn references However, libmysqlclient gets updated every so often (with the most recent update from 20 to 21) which makes mysqlclient lose track of the library version when installed with pip. After trying various ways of clearing build caches, I was able to find a workaround by:

  1. Ensuring libmysqlclient is installed (sudo apt install libmysqlclient-dev)
  2. Cloning the mysqlclient from github (git clone [email protected]:PyMySQL/mysqlclient-python.git)
  3. Manually building mysqlclient (make build)
  4. Copying the generated _mysql.*.so to my virtualenv

I’m still trying to find a better way of doing this so I can get a working mysqlclient after running pip install mysqlclient.

Edit 2020-06-19:

Since I had to do this again recently, here’s a script to automate the above workaround:


#!/bin/bash
# After installing mysqlclient on ubuntu 20.04, run this script to manually
# rebuild mysqlclient against the system libmysqlclient
# This assumes that you have an active python virtual environment

set -exuo pipefail

# sudo apt install libmysqlclient-dev
git clone [email protected]:PyMySQL/mysqlclient-python.git
cd mysqlclient-python
make build
cp MySQLdb/_mysql.*.so $VIRTUAL_ENV/lib/python3*/site-packages/MySQLdb/
cd ..
rm -rf mysqlclient-python


Developing Django in Production

May 15, 2020

Django is still my favorite web framework. While it comes with a neat system for rolling forward and back relational database schema migrations, you still have to populate your test and local databases with fixture data. Instead, you can use this script (configuration required) to connect your local django app to production to help test your code against production data:


# Connect your local django app to a production database

set -ex


# The local port that you want to use to forward to your production database
# This is set to the mysql default port, incremented by 1 so that it doesn't conflict with your local database

# The remote host (or IP) that can access your production database
# This is probably where your app is hosted

# The ip that your production app uses to connect to your production database

# The port that your production app uses to connect to your production database

# Create an ssh tunnel from localhost to production

# Reset everything when this script is killed
function cleanupSSH {
    pkill -f "$REMOTE_IP"
    rm .env
    ln -s .env.development .env
trap cleanupSSH EXIT

# Assuming you keep configs in .env, switch the symlink to a prodlocal config
# This is likely a mixture of development and production configs
rm .env
ln -s .env.prodlocal .env

# Start up django
./ runserver



March 5, 2020

Shamelessly stealing off of Hacker News

Microservices are a design philosophy that people confuse as a deployment strategy


Sendmail Wrapper for Mailgun

March 1, 2020

If you use the sendmail linux CLI and you want to route outgoing emails through mailgun, write this file, make it executable, and add it to your path before the actual executable is found:


#!/bin/bash
# Shim for netdata to send emails through mailgun
# filename: sendmail
# suggested location: /usr/local/bin/
# (reconstructed: fill in the <angle bracket> values; the flags below match a
# sendmail implementation that supports -S/-au/-ap, such as busybox sendmail)

# Installation:
# 1.  Write the contents of this script to a file called "sendmail"
# 2.  Fill in the mailgun smtp email and password from your mailgun dashboard
# 3.  `chmod +x sendmail`
# 4.  `sudo mv -n sendmail /usr/local/bin`

MAILGUN_EMAIL="<mailgun smtp email>"
MAILGUN_PASSWORD="<mailgun smtp password>"

# shellcheck disable=SC2068
busybox sendmail \
    -S \
    -au "$MAILGUN_EMAIL" \
    -ap "$MAILGUN_PASSWORD" \
    $@


Python Release Support Timeline

December 26, 2019

Since I’ve had a hard time determining when versions of Python are pre-release, supported, or deprecated, here’s a table of all recent python versions:

Version Release End of security fixes
2.7 2010-07 2020-01
3.4 2014-03 2019-03
3.5 2015-09 2020-09
3.6 2016-12 2021-12
3.7 2018-06 2023-06
3.8 2019-10 2024-10
3.9 2020-10 2025-10


Use the Default Flake8 Ignores

December 14, 2019

Flake8 provides a way to ignore PEP8 rules through its --ignore and --extend-ignore flags. The former overwrites a default list of errors and warnings, including W503 and W504 which are mutually incompatible. Therefore, it’s easier to just use --extend-ignore and not use --ignore.
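If you keep flake8 settings in a config file, the same advice applies to the config key; a minimal setup.cfg might look like this (E203 is just an illustrative extra ignore, not a recommendation):

```
[flake8]
# extend-ignore adds to flake8's default ignore list (which already contains
# the mutually incompatible W503 and W504) instead of replacing it
extend-ignore = E203
```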


Making Pip Require a Virtualenv

December 5, 2019

Using python pip to install packages without using a virtualenv is generally considered an antipattern. Add this into your ~/.bashrc to make pip require an activated virtualenv before running.

# Do not pip install when not in a virtual environment
export PIP_REQUIRE_VIRTUALENV=true


Engineering Toolbox

November 30, 2019

If you want to waste a few hours, take a look at The Engineering Toolbox.

(If you want to waste thirty minutes, take a look at This To That.)


Node Timezones

November 1, 2019

Today I debugged some issues with javascript’s time zone support. Unlike most other parts of the Node.js standard library, time zone conversion data changes from time to time based on different countries’ whims. Usually, these changes are for minor countries or are minor changes to time zone boundaries, but recently, Brazil decided to end daylight savings with six months notice.

Looking at node specifically, it looks like nodejs’s Intl library depends on ICU, which depends on tzdata. However, even the most current stable version of node as of this writing (v13.0.1) uses ICU version 64.2, which ships the outdated tzdata 2019a. Come November 3, no stable release of Node will correctly calculate Brazil’s time zone.


Sampling Samples

August 21, 2019

If you had a set of p90 samples, how would you get a p90 of the overall original data? Would you take the p90 of your p90 samples? Would you take the p50 median of your set of p90 samples? It turns out that neither are correct and it’s actually impossible to reliably recover original percentiles from derived percentile data:

If your original data is grouped into a [10, 10, 10, 10] sample and many [0] samples, your p90 samples should be [10, 0, 0, 0, 0...] and your p90(p90(data)) would turn out to be 0 (your p50(p90(data)) would turn out to be 0 too).

Even with equal sized samples, p90 samples aren’t useful for finding an overall. If we had 10 sets of samples, [1->10], [11->20], etc. until [91->100], our overall p90 would be 90, but our p90(p90(data)) would be 89.
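Both examples above can be reproduced with a quick sketch (using the nearest-rank percentile, one of several percentile conventions):

```python
import math

def p90(values):
    # Nearest-rank 90th percentile
    ordered = sorted(values)
    return ordered[math.ceil(0.9 * len(ordered)) - 1]

# First example: one [10, 10, 10, 10] sample and many [0] samples
first = [[10, 10, 10, 10]] + [[0]] * 9
print(p90([p90(s) for s in first]))      # 0

# Second example: ten equal-sized samples [1..10], [11..20], ..., [91..100]
samples = [list(range(i, i + 10)) for i in range(1, 100, 10)]
overall = [x for sample in samples for x in sample]
print(p90(overall))                      # 90
print(p90([p90(s) for s in samples]))    # 89
```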


Rotating a NxN Matrix in One Line of Python

July 27, 2019

data = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
data = [[r[i] for r in data[::-1]] for i in range(len(data))]
[print(r) for r in data]
# Prints:
# [7, 4, 1]
# [8, 5, 2]
# [9, 6, 3]


iTerm2 Search History

July 19, 2019

Like a bunch of people, I keep a dotfiles repository to version control and synchronize my Linux and MacOS configs. One of the things that I store in there is my iTerm2 configuration so that I can synchronize things like color schemes. It looks like several other people do the same thing.

However, I noticed that iTerm saves searches that you’ve run in the terminal window so iTerm can show history, which means that people are also publishing searches that they’ve made in a private terminal window. These configs can be found inside a file called com.googlecode.iterm2.plist under a config key called NoSyncSearchHistory so I wrote a script to scrape Github for these searches. The results of the search were mostly boring, with terms like kubernetes or user, though I did find someone’s AWS IAM key (I’m currently waiting for a reply from the person about the leakage).


Nginx Auth With IP Whitelists

June 29, 2019

I was looking into a way to configure Nginx to require basic HTTP authentication but to skip authentication for specific IP addresses. I was also running Nginx behind Cloudflare, which obscures the caller IP address. The normal way of reading IP addresses wouldn’t work, so I had to switch up the IP address whitelist to read from a Cloudflare-set header CF-Connecting-IP.

# Create a map of IP addresses to auth configuration
map $http_cf_connecting_ip $auth {
    # Whitelisted ip address has auth off
    "<Whitelisted IP Address>"  "off";
    # Otherwise, auth is enabled
    default                     "Authentication Required";
}

server {
    # ...

    # Enable HTTP authentication
    auth_basic           $auth;
    # Set a file with username/password data
    auth_basic_user_file <path to auth file>;
}



Bash Strict Mode

May 11, 2019

Bash is well-known to be a hard language to write in, with many somewhat nonintuitive syntax requirements and edge cases. However, unlike some languages like Perl or Python, bash is available on basically every Unix machine and is the lingua franca for systems scripting, making it very hard to avoid. Therefore, adding this snippet to the top of every script will avoid issues and make errors easier to debug: -e exits on the first failing command, -x prints each command before running it, -u errors on references to unset variables, and -o pipefail makes a pipeline fail if any command in it fails.

set -exuo pipefail

Bonus: use shellcheck to lint bash code and suggest fixes.


Optimizing Asus Routers for Serving Websites With Cloudflare

May 5, 2019

I’ve been serving this site and several others from a personal physical server frontended by Cloudflare rather than using a cloud provider like AWS. One of the things that’s always been a bother has been dealing with dropped packets and in particular, 522 errors from cloudflare. I’ve tried a lot of different things such as changing the netdev budget but one of the things that I think really solved it was looking through my router settings (I have an Asus RT-AC68U) and modifying the firewall to remove DoS protection. It turns out that the DoS protection has been dropping packets for legitimate traffic, and since I use Cloudflare, all web traffic is encrypted and comes from a limited set of Cloudflare IP addresses which probably makes it hard for DoS to recognize the traffic as legitimate.


Browserify, Mochify, Nyc, Envify, and Dotenv

April 1, 2019

I found that it’s possible to mix several javascript libraries together to generate production and test bundles for browsers. I was specifically looking for a way to use:

  • browserify - bundling multiple javascript files that require each other together.
  • envify - Interpolating environment variables into javascript code
  • dotenv - Setting environment variables from a file
  • nyc - An istanbul CLI for instrumenting code and getting test coverage
  • mochify - A pipeline to run mocha tests in a headless browser with browserify

For a production build with a browserify js script, you can use:

// Reconstructed sketch - the entry point name is an assumption
require('dotenv').config();
const browserify = require('browserify');

browserify('index.js')
    .transform('envify')
    .bundle()
    .pipe(process.stdout);


which will generate a bundled javascript output with dotenv variables interpolated with envify. For a test build directly as a shell command, you can use:

nyc --require dotenv/config mochify --transform envify

which will run in-browser tests after applying browserify and envify (with dotenv variables).


Scraping Images From Tumblr

February 24, 2019

With my Reaction.Pics project, I had to scrape a bunch of tumblr accounts for data to assemble its database. Since I was trying to not hotlink to thousands of images, I made local copies of images (about 22 GB raw). However, given the uncurated nature of tumblr posts, I found that there were tons of broken images. Going through them, I noticed a few common themes, including:

  • empty files (I assume from 404s)
  • malformed files (just binary crap)
  • HTML (also mostly 404s from sites that don’t obey Accept HTTP headers)
  • Non-standard images like .raw and .tiff

After processing the database several times with multiple scripts that checked various heuristics like file extension and guessing MIME encoding, I found that the single most useful way of checking images is having Python Pillow parse the image binary:

# Given an image path
path = "abcd.gif"

# Have PIL verify the image (this raises an exception for broken files)
from PIL import Image
image =
image.verify()
After filtering images and removing duplicates, I was able to bring the image database down to 8 GB.

Thanks to these sites for providing data:


There Are Too Many NPM Packages

February 10, 2019

I was trying to add modal popups to a website today and trying not to reinvent the wheel and instead use a pre-existing modal package from NPM. However, upon searching for a good package, I found that NPM has 2366 modal packages.

There’s a huge amount of duplicated work here, including lots of packages that integrate with react, bootstrap, vue, browserify, or whatever. Trying to sort packages by popularity or quality gives little benefit and the package that actually owns the "modal" name seems to be dead with people publishing their own dead forks.

My suspicion is that this is a systemic issue stemming from the node community’s encouragement of creating tons of micro-packages, multiplied by the ever-increasing number of new javascript web frameworks, but other package indices have similar problems. I feel that package managers should do a better job at either recommending well-supported packages or creating a higher barrier of entry for people publishing new packages. Maybe someone should try namesquatting package names and see where that goes?


Programmers Writing Legal Documents

January 31, 2019

Programmers writing legal documents is like rolling your own crypto. Get an expert to do it.


Solidity Review

November 17, 2018

I took Solidity for a test drive a few months back to try to understand the hype behind it, ethereum, and the crypto space in general. While I’m no expert and I haven’t been keeping up with later developments, I did have a few thoughts on the design of the language.

  • Setting a version at the top of Solidity source code seems like a nice idea and should help with making sure the language spec stays flexible enough for rapid iteration. However, the lack of a fully formed dependency management system is a bigger flaw than a nice language version management system.
  • The fact that all functions default to public visibility rather than a safer alternative like internal, private, or requiring an explicit visibility seems dangerous to me, especially given the “contract” goals of Solidity.
  • The overall syntax seems to borrow from several languages but most heavily from javascript, java, and python. It is nevertheless still quite clean and intuitive.
  • The stdlib is still quite small and mostly consists of ethereum-specific logic and some mathematical functions. Of course, Solidity isn’t meant to be a general purpose programming language so it doesn’t need much more.
  • The lack of testing frameworks on Solidity (and the minimal testing of Solidity itself) really scares me.
  • There seem to be multiple bindings for other languages to call Solidity functions. This sounds like a nice idea, but they seem to be written as wrappers. Is there benefit here from having other more popular languages running Ethereum operations natively instead of wrapping Solidity?



November 9, 2018

I found a neat tool that digs into your CPU setup called likwid. This is the topology for this server:

$ likwid-topology
CPU name:   Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz
CPU type:   Intel Core Haswell processor
CPU stepping:   3
Hardware Thread Topology
Sockets:        1
Cores per socket:   4
Threads per core:   1
HWThread    Thread      Core        Socket      Available
0       0       0       0       *
1       0       1       0       *
2       0       2       0       *
3       0       3       0       *
Socket 0:       ( 0 1 2 3 )
Cache Topology
Level:          1
Size:           32 kB
Cache groups:       ( 0 ) ( 1 ) ( 2 ) ( 3 )
Level:          2
Size:           256 kB
Cache groups:       ( 0 ) ( 1 ) ( 2 ) ( 3 )
Level:          3
Size:           6 MB
Cache groups:       ( 0 1 2 3 )
NUMA Topology
NUMA domains:       1
Domain:         0
Processors:     ( 0 1 2 3 )
Distances:      10
Free memory:        4729.45 MB
Total memory:       15710.8 MB


My First Server's IP

November 9, 2018

As a freshman at MIT, I decided that it would be a fun way to get a head start on my CS classes by setting up a webserver and playing around with ubuntu. As I learned later, a typical MIT Course 6 curriculum doesn’t care much for practical devops abilities (more on that subject in a later note), but I still remember the IP address that was first allocated to me for setting up my server.

I noticed recently that MIT sold off large portions of its class A IP address range, which included my old IP address. I assume that the sold IP addresses will be made available as part of AWS’s Elastic IP Addresses pool for people to attach to AWS hosts and expose to the wider internet. While my old IP address still seems unallocated, I wish the best of luck to whoever gets it (unless I happen to grab it first).


Installing Netdata

September 23, 2018

  1. sudo apt install netdata
  2. Check that netdata is running on the default port 19999. Note that netdata does not by default respond to requests on public interfaces. I therefore used an SSH tunnel from my laptop (ssh -fN -L 19999:localhost:19999 <server>) to connect to netdata (localhost:19999).
  3. sudo apt install apache2-utils nginx
  4. Write an nginx config file to proxy traffic to netdata.
  5. Generate a username/password: sudo htpasswd -c netdata-access $USER
  6. sudo service nginx reload
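The nginx config from step 4 might look something like this minimal sketch (the port, server name, and htpasswd file location are assumptions to adapt):

```nginx
# /etc/nginx/sites-enabled/netdata
server {
    listen 80;
    server_name;

    location / {
        # Require a username/password from the file generated in step 5
        auth_basic           "netdata";
        auth_basic_user_file /etc/nginx/netdata-access;

        # Proxy traffic to the local netdata instance
        proxy_pass ;
        proxy_set_header Host $host;
    }
}
```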


Interrobang Versus Shebang

July 10, 2018

Interrobang - A punctuation mark used to combine a question mark and exclamation mark


Shebang - A character sequence that is used to specify the interpreter for a unix script


Bad Interview Questions

July 8, 2018

I recently saw an advertisement for a tech recruiting company with a sample interview question about the implementation details of floating-point arithmetic.

I don’t know what interviewing signal one could gain by asking about such details.


Showing Users in Different Databases

July 7, 2018






MySQL:

SELECT user, host FROM mysql.user;


Some MIT (Undergraduate) Admissions Interview Advice

July 4, 2018

I’ve been doing MIT admissions interviews for two years now and I’m planning on doing a third this Fall. So far, I’ve had a fun time talking to about 15 high school seniors about their interests and college plans. I’ve noticed a few common themes in interviews:

  1. Students are pretty reticent. Although nervousness is certainly a contributing factor here, it’s hard to have an interesting one-sided conversation.
  2. Students say they have interests in some subject area, but when asked to go deeper - to explain what about an area interests them, or how they’ve scratched that itch by investing time learning about it - they don’t have many examples. Of course, not everyone is afforded opportunities that match their interests, and it’s much easier to demonstrate interest in something like computer science than nuclear engineering. However, saying that one is interested in something but not doing anything about it doesn’t give much credence to the former.
  3. Many students either concentrate solely on academics or participate in way too many extracurriculars. Either way, both extremes limit real exposure to different ways of thinking and as a side effect create pretty cookie-cutter stories during interviews.


Optimize the Develop-Test-Debug Cycle

April 22, 2018

(This is essentially a rewrite of core-metric-developer-productivity).

Increasing developer efficiency is oftentimes one of the highest priority goals of engineering orgs at companies ranging from startups to megacorps. With the high price of developers and notoriously bad adherence to deadlines, it’s quite normal for a company to want to speed up development time. Many organizations try various process tricks that read like they were written by an MBA student - agile, scrum, kanban, and lean, to name a few. While many of those processes are good ideas, I feel many people lose focus on the engineering issues that cause bad developer productivity.

I think that a core principle for optimizing and scaling developer efficiency is that many companies operate on a loose form of test driven development, and that most programming work is spinning within a cycle of “development” (writing production code to satisfy some business problem), “testing” (asserting the correctness of “development”), and “debugging” (making fixes to “development” given the feedback from “testing”). Many developments in computer science and software engineering have been aimed at speeding up (e.g. usage of IDEs) or short-circuiting (e.g. strongly typed languages) this cycle. I would recommend that any company which relies on an effective engineering department pay attention to the speed at which developers can move through this develop-test-debug cycle.


Example of Python Subprocess

March 23, 2018

I needed to test out a thing with the subprocess module and wrote an example of using it to run multiple processes in parallel, then reading all the stdouts and stderrs at the end:

import subprocess
import time

# is a stand-in name for an executable script whose contents
# ("sleep 2") are shown below
commands = ['./'] * 5
outputs = [None] * len(commands)
processes = [None] * len(commands)

start = time.time()

for i, command in enumerate(commands):
    process = subprocess.Popen([command], stdout=subprocess.PIPE)
    processes[i] = process

for i, process in enumerate(processes):
    outputs[i] = process.communicate()
    print(i, outputs[i])

print('elapsed seconds: ', time.time() - start)

where contains:

#!/bin/bash
sleep 2


Spotted in Taiwan

January 20, 2018

Eslite bookstores (and most other bookstores in Asia) have really good stationery sections.

There are a lot of mopeds. Some people put blue or red lights on their personal mopeds so they look like police and get right of way.


Fixing "Fatal Error: Python.h: No Such File or Directory"

December 16, 2017

Some python packages, notably uWSGI and mypy, require access to a Python.h file to compile C bindings and they’ll fail with an ugly fatal error: Python.h: No such file or directory error if one can’t be located on your system. Stackoverflow gives a pretty good answer on fixing this, but I want to amend the top answer in case your system has multiple versions of python 3 (e.g. you’re using an Ubuntu PPA).

If you do have multiple python versions available, (on an Ubuntu system) explicitly specify the python version of the dev tools package, e.g.

sudo apt install python3.6-dev


Cassandra Primary Keys

December 11, 2017

Cassandra schemas can be a bit hard to design and are especially important to design correctly because of the distributed nature of Cassandra. Many new users of Cassandra try to design schemas similar to relational databases because of CQL’s similar syntax to SQL.


Simple Key

CREATE TABLE example (
  key uuid PRIMARY KEY
);

Composite Key

CREATE TABLE example (
  key1 text,  // partition key:  determines how data is partitioned across nodes
  key2 int,   // clustering key: determines how data is sorted within a partition
  PRIMARY KEY(key1, key2)
);

The goals for designing cassandra keys (stolen from the datastax documentation) are:

  1. Spread data evenly around the cluster
  2. Minimize the number of partitions read


MyPy Review

November 2, 2017

I recently added type annotations to two of my projects git-browse and git-reviewers using mypy and found it to be relatively enjoyable. Adding types to python definitely helps to make code self-documenting and effectively increases the number of tests in your code. There are a few large issues though for anyone trying to add type annotations:

  1. In order to use python’s type annotation syntax (rather than type comments), your code must be Python 3 only. (Yes, you should use python 3 regardless of whether you’re adding type annotations)

  2. You must have library stubs of your imports so that MyPy can infer types. So far, there are very few library stubs available and even some extremely popular packages like Flask aren’t covered. This limits type checking to packages with few if any external dependencies.

Adding in type annotations, I also ran into a few issues:

  1. The docs are quite good but given python typing’s obscurity, it’s still hard to find answers for more esoteric features.

  2. The syntax for default values, e.g. (from the mypy docs):

def greeting(name: str, prefix: str = 'Mr.') -> str:
    return 'Hello, {} {}'.format(name, prefix)

puts the default value after the type annotation. For a person who hasn’t worked with the type syntax before, at first glance it looks like a string value is being assigned to str within a dictionary.

  3. MyPy requires newly instantiated empty iterables (lists, sets, dictionaries) to include annotations so it can type check elements. However, the native python syntax has no support for it, which requires adding type comments, resulting in:

data = []  # type: List[int]

  4. The comment syntax has a bug where its types require imports, which set off linters like Flake8 as an unused import. From the above example, this ends up requiring odd code to pass both the flake8 linter and mypy:

from typing import List  # NOQA
data = []  # type: List[int]
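As a side note, Python 3.6’s variable annotations (PEP 526) sidestep both issues above by letting you annotate the variable directly, with no type comment and no NOQA:

```python
from typing import List

# Python 3.6+ variable annotation: no type comment needed
data: List[int] = []
data.append(1)
print(data)  # [1]
```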

MyPy is still under heavy development with significant hands-on support from Guido van Rossum himself. Overall though, adding in types was still a relatively easy and useful exercise and helped prompt some refactorings.

PS - Turns out that python-markdown2 has a bug when rendering code fences inside of lists.


Griping About Time Zones

October 26, 2017

Daylight savings is ending in two weeks and I’m going to gripe about it. Not the usual gripe about people having to adjust their clocks and schedules (I, for one, actually like daylight savings), but about how basically everyone writes time zones incorrectly. When people give a time with a time zone, such as “five o’clock in San Francisco”, many people just write “5 PDT” or “5 PST” without giving a thought to where that “D” or “S” comes from. Those acronyms stand for “Daylight” and “Standard,” respectively, so you should only use “PDT” to refer to times in the summer and “PST” to refer to times in the winter. If you want to be generic, you can use “PT” for everything and let context take care of the exact time zone.


Bundling Python Packages With PyInstaller and Requests

September 23, 2017

I recently tried using PyInstaller to bundle python applications as a single binary executable. PyInstaller was relatively easy to use and its documentation is pretty good. However, I ran into a bit of trouble bundling the python requests package because of problems with requests looking for a trusted certificates file, usually emitting an error like OSError: Could not find a suitable TLS CA certificate bundle, invalid path: .... In a typical installation, the certifi package includes a set of trusted CA certificates, but when PyInstaller bundles the requests and certifi packages, certifi can’t provide a file path for requests to use.

The way to fix this is to set the REQUESTS_CA_BUNDLE environment variable (documentation) within your code before using requests:

import os
import pkgutil
import tempfile

import requests

# Read the cert data bundled alongside certifi
cert_data = pkgutil.get_data('certifi', 'cacert.pem')

# Write the cert data to a temporary file
handle = tempfile.NamedTemporaryFile(delete=False)
handle.write(cert_data)
handle.close()

# Set the temporary file name to an environment variable for the requests package
os.environ['REQUESTS_CA_BUNDLE'] =

# Make requests using the requests package

# Clean up the temp file


Go Receiver Pointers vs. Values

September 4, 2017

When writing a method in Go, should you use a pointer or a value receiver?

Type Use
Basic Value
Map Value
Func Value
Chan Value
Slice (no reslicing/reallocating) Value
Small Struct/Array Value
Concurrent mutations Value if possible
Is Mutated By Method Pointer
Large Struct/Array Pointer
Contains a sync.Mutex Pointer
Contains Pointers Pointer
🤷 Pointer

Distilled from Golang Code Review Comments


Fixing Latency

September 1, 2017

Last night, I finally discovered and fixed the reason for very high latencies on my website. Many of the data-heavy pages have had multi-second response times, and although a Django/MySQL site on a minimally provisioned server isn’t the epitome of performance engineering, I’ve always bet it could run faster. After four years of optimizing parts of the website, I finally found a way to reduce latencies by an order of magnitude and bring response times under a second.

These were some of the things that I tried which gave relatively minimal benefit:

  • Adding memcached to cache model data and view partials
  • Adding a Cloudflare CDN
  • Upgrading the server, particularly increasing CPU cores and memory
  • Optimizing the MySQL configuration
  • Rate limiting web crawlers
  • Denormalizing database models

What I did last night: checking django-silk, I noticed that certain pages made multiple simple but slow SQL queries that filtered by indexed fields. Some of these queries were taking on the order of hundreds of milliseconds, over 10X the latency of the same query on a local unoptimized virtual machine. Digging deeper with EXPLAIN queries and checking the database schema, I found several indices were missing. Although the indices were declared in the models, they were never added (or were dropped some time ago), probably by Django South, Django’s old migration tool. Evidently, one should not rely too much on ORMs, and manually checking your MySQL schemas can result in some amazing latency improvements:

StatsOnIce Latencies
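The effect of a missing index is easy to see with EXPLAIN. Here’s a rough sketch using SQLite’s EXPLAIN QUERY PLAN as a stand-in for MySQL’s EXPLAIN (the stats table and its columns are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE stats (season TEXT, score INTEGER)')

query = 'SELECT * FROM stats WHERE season = ?'

# Without an index, the planner falls back to a full table scan
before = conn.execute('EXPLAIN QUERY PLAN ' + query, ('2013',)).fetchall()
print(before[0][-1])  # e.g. "SCAN stats"

# With the index, the same query becomes an index search
conn.execute('CREATE INDEX idx_season ON stats (season)')
after = conn.execute('EXPLAIN QUERY PLAN ' + query, ('2013',)).fetchall()
print(after[0][-1])  # e.g. "SEARCH stats USING INDEX idx_season (season=?)"
```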


Showing Schemas in Different Databases

August 26, 2017


Cassandra:

describe keyspace <keyspace>;

MySQL:

SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_schema = '<DATABASE NAME>';
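SQLite keeps the same kind of information in its sqlite_master table, which stores the original CREATE TABLE statements verbatim; a quick stdlib sketch (the example table is made up):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE example (id INTEGER PRIMARY KEY, name TEXT)')

# sqlite_master holds one row per schema object, with its CREATE statement
schemas = [sql for (sql,) in
           conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'")]
print(schemas[0])
```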


Straight Lines

June 2, 2017

There are a ton of straight lines commonly seen in English text.

  • - - the hyphen, used to join breaks within words or compound words
  • – - the en-dash, used for spans of numbers or compound adjectives
  • — - the em-dash, used in place of colons, commas, and parentheses
  • _ - the underscore, originally used to underline words by typewriters
  • | - the vertical bar of many uses.

This doesn’t even include the dozens of singular lines available in Unicode.


Emerson on Intellect

May 29, 2017

Before Pycon, I visited Powell’s City of Books and stumbled into the fiction aisle of D through H authors. I sampled some good books by Dostoyevsky, Dumas, and Hemingway but a particular passage stuck in my memory:

Every human being has a choice between truth and repose. Take which you please, you cannot have both. Between these, as a pendulum, man oscillates. He in whom the love of repose predominates will accept the first creed, the first philosophy, the first political party he meets– most likely his father’s. He gets rest, commodity, and reputation; but he shuts the door on truth. He in whom a love of truth predominates will keep himself aloof from all moorings and afloat. He will abstain from dogmatism and recognize all the opposite negations between which, as walls, his being is swung. He submits to the inconvenience of suspense and imperfect opinion but he is a candidate for truth, as the other is not, and respects the highest law of being.

Ralph Waldo Emerson Essays: First Series Essay XI Intellect


Core Metric for Developer Productivity

May 21, 2017

A lot of software companies are concerned about the concept of developer productivity and maximizing the amount of output each engineer produces. The problem is that few companies measure productivity in any quantitative manner, instead using subjective metrics like developer happiness or irrelevant metrics like lines of code. Additionally, companies frequently confuse product management processes, like agile development, with processes that actually help software engineering. This results in software being developed slowly and quite a lot of grumbling when the scrum master burns hours of each engineer’s time on sprint planning.

Looking around, I think that the core metric for developer productivity should be the average frequency of the edit/test/debug cycle. In all but the most intellectually challenging of programming problems, there is a clear direction of what needs to be built but the limiting factor is making sure the programmer’s code works as intended. Optimizing the edit/test/debug cycle therefore makes developers produce value faster. Many features in IDEs and modern development practices are aimed at shortening steps in the cycle, reducing the number of cycle iterations needed to complete the development work, or helping developers stay within the cycle. Indeed, Joel Spolsky’s famous test for programming environments can be summarized as checking the health of a development team’s edit/test/debug cycle. I therefore hope that when you evaluate processes and products to help be more productive, you think of how it will benefit your edit/test/debug cycle.


How to Capture a Camera Image With Python

May 7, 2017

While working on sky-color, I found that taking a photo using a webcam with python was pretty hard. opencv has some pretty opaque documentation since it’s primarily written for C developers, and simplecv is dead and doesn’t support python 3. Stackoverflow is also filled with outdated, incorrect answers. I therefore had to figure out a way to take a photo and save it to a file myself using python 3.6 and MacOS.

Prerequisites: Install numpy and opencv. My requirements.txt file looks like:

numpy
opencv-python
import time
import cv2

camera_id = 0
file_name = 'image.png'
cam = cv2.VideoCapture(camera_id)
time.sleep(1)  # Give some time for the webcam to automatically adjust brightness levels
ret_val, img =  # Capture a single frame
cv2.imwrite(file_name, img)
cam.release()

Further reference: OpenCV API


Python Has a Ridiculous Number of Inotify Implementations

May 2, 2017

Mostly stolen from watchdog’s readme:

Looking through a few of these, I think I recommend watchdog and inotify_simple.


Projects: Gentle-Alerts

April 27, 2017

Gentle-Alerts is a chrome extension that I built to fix the problem of noisy popup alerts in Chrome. Using Google Calendar a lot, I used to get a popup alert before every event that I was invited to. Fiddling with its built-in “browser notifications”, I wasn’t very satisfied because of its pop-over UX. I therefore decided to create Gentle-Alerts to solve this problem for Calendar and all other websites.

Gentle-Alerts works by overriding the window.alert built-in function with a custom function that shows a browser modal. In building Gentle-Alerts, I had some fun with some different frontend programming rules. Rather than the usual problem of writing javascript code that has to be compatible with different browsers with a known environment, writing the javascript for Gentle-Alerts required me to write javascript code compatible specifically for Chrome but running against the javascript environment of any website. I therefore kept the code pretty simple and used only vanilla javascript without any third-party dependencies.

Thanks to Chris Lewis, David Hamme, Song Feng, and Scott Kennedy for testing the extension.


Creating a New PyPI Release

April 24, 2017

As a reminder to myself of the magic incantations for uploading a repository to PyPI:

pip install twine
python sdist bdist_wheel
twine upload dist/*


Eva Air USB Ports

April 24, 2017

I just got off an Eva Air flight which had in-seat USB ports, not only for power but also for data. I found that when I plugged in USB keys, the system could navigate through FAT32 and NTFS memory sticks, which makes me think that the in-flight entertainment system was based on an embedded Windows OS. Several of the games in the system also had multiplayer modes, which means there must be some LAN within the plane, and since the plane’s audio announcement system could pipe audio through the seatback system, that must be connected as well.

Although I doubt the Boeing 777’s designers would also link up flight-critical systems like avionics, there is something to be said about the possibilities arising from putting some sufficiently determined hacker on a plane with Wi-Fi, an electrical socket, physical access to a Windows-backed USB port, and twelve-plus hours of boredom.

From the Eva Air Website:

In-seat USB Port

If you are traveling on our selected B777-300ER (Royal Laurel Class) and A330-300 aircrafts (Premium Laurel Class), you can navigate through PDF files, photos and other multimedia content stored in your storage devices (iPod, USB flash drive**, AV connector-enabled device, etc.) on your seat-back screen. Instructions are shown on the screen once connected.


Projects: Git-Browse

March 18, 2017

I’ve recently worked on a project called Git-Browse to help look up information in github and uber’s phabricator. Quite often, I’ve found the need to look up information about a git repository in order to share code with people, find history, or file issues. Having to manually look up the repository on github or phabricator takes excessive time and can easily lead to incorrect information from looking at forks. Git-Browse solves the problem by introspecting a git repository’s .git/config file and automatically opening the git repository in a browser. Git-Browse can then be integrated in your local or global .gitconfig as an alias so you can open repository objects with git browse <path>.

While working on git-browse, I found that it’s similar to github hub’s browse command, but git-browse makes it a lot easier to support additional repository hosts. Hub doesn’t support opening arbitrary branches or commits, though it does support opening issues and wikis.

Git-Browse requires python 3 to run. Install it by following the Readme Instructions.


Cassandra Compaction Strategies

March 5, 2017

When setting up Cassandra tables, you should specify the compaction strategy Cassandra should use to store data internally. To do so, just add

WITH compaction = { 'class': '<compactionName>' }

to an ALTER TABLE or CREATE TABLE command.

Name Acronym Used For
SizeTieredCompactionStrategy STCS Insert-Heavy Tables
LeveledCompactionStrategy LCS Read-Heavy Tables
DateTieredCompactionStrategy DTCS Time Series Data


Code Is Like Tissue Paper

January 25, 2017

Code is like Tissue Paper, it falls apart after one use

Code is like Tissue Paper, there are holes everywhere

Code is like Tissue Paper, it sucks to have to use someone else’s

Code is like Tissue Paper, you get a new one, even for the same problems

Code is like Tissue Paper, you should not feel bad when you throw it away

Code is like Tissue Paper, there are many layers

Code is like Tissue Paper, other people won’t like to use yours


Seen in a Bathroom Stall at MIT

January 24, 2017

Do you compute?
No, I come poop


Underused Python Package: Webbrowser

January 21, 2017

While I was working on git-browse (post coming soon), I found out about python’s webbrowser package. It’s a super-simple way of opening a URL in one of many different browsers. Python’s standard library is pretty awesome.


Pax ?

January 5, 2017

There’s an idea in (popular) political science called Pax Romana: the idea that for several hundred years, Ancient Rome’s preponderance of power created an atmosphere of relative peace both inside and outside its borders. This was supported by a massive, well-organized military whose job it was to enforce Roman law inside its borders and suppress any enemies, internal or external.

This idea has been applied to other cases, in particular Pax Britannica and Pax Americana. In all three of these cases, the hegemon has had a dominating military in a specific battlespace compared to any rival. Rome had superior land armies, Britain had superior navies, and the US has a superior air (and space) force. States were thus geared for supporting these forces and reaping the economic benefits they gave. Rome’s armies sustained Rome’s economies through slaves, Britain through mercantile trade, and the US through rapid global soft and hard power projection.

The question therefore is what does the future hold for hegemonic peace? We’re probably in the middle of a Pax Americana, so over the next few decades we’ll see 1) continued Pax Americana, 2) a shift of power to a different hegemon (as happened to Pax Britannica at the end of the 19th century), or 3) the world order devolving into several localized powers that do not necessarily cooperate for mutual peace and gain (as happened after Pax Romana in the Middle Ages). Since military and economic power are tightly linked and interdependent, I believe the new economic shift to the Internet will require a new hegemon to develop and maintain technical and information superiority to challenge the current world order. Many countries, particularly Russia and China, have developed these cyberwarfare capabilities and successfully tested them against adversaries. The US also has significant capabilities, but cyberwarfare is so far in such a nascent state that it’s hard to establish dominance. Hopefully, it won’t require a real test like a repeat of the Napoleonic wars or WWI/WWII.


Golang Review

January 2, 2017

I’ve been using Go for several projects at Uber and personally. From my experience, I’ve developed some opinions on the Go programming language, from both objective and subjective points of view. Compared to other languages, I find that Go has much to be desired in terms of its language design.

Let’s start off with the nice points. Go is pretty opinionated about its development setup with standardized layouts of packages and build systems. Language-wise, it has a simple, easy-to-learn syntax that can be easily learned by anybody with backgrounds in C, Java, or Python. It’s statically typed, expressive while legible, and has understandable concurrency primitives.

On the other hand, Go has a lot of downsides in terms of both environment and language. Environment first. A glaring issue with Go is dependency management. Go’s development layout assumes you’re working on a monorepo (more on this later). If you’re not working on a monorepo and/or you don’t have direct control over your dependencies, you’re going to have a bad time trying to keep your dependencies up to date without breaking things. Many tools have been written to work around this, but the core issue is that many Go packages aren’t themselves versioned. Semantic versioning and even change logs seem to be new and controversial in many Go communities.

My other gripe with Go’s environment is the work it takes to set up and maintain a new project. A Go repo not only requires the entire $GOPATH to be set up, with attendant dependencies and repositories, but also requires Makefiles and a lot of testing boilerplate. Arguably, in a monorepo-style development process, this is a cost that scales O(1) instead of O(n), but I feel that in many Go projects, the amount of Bash and scripting-language (most commonly python) code exceeds the amount of actual Go code. Maintaining this (often untested and hacky) code is an extra source of work for a Go developer.

As for the language itself, I’ll freely echo the common criticism about Go’s lack of generics and while I have gotten used to the lack of exceptions, this all seems to oversimplify the language. Many nice programming idioms found in other languages (e.g. monads, classes, duck typing) are unavailable in Go due to the restricted feature set.

From these points, I find that Go is a fine language for large teams of average programmers creating large monorepo systems. Go caters to this demographic of programmers by being essentially a compilable, statically-typed BASIC while ignoring the last several decades of Programming Language Theory.


Wadler's Law

December 15, 2016

In any language design, the total time spent discussing
a feature in this list is proportional to two raised to
the power of its position.

0. Semantics
1. Syntax
2. Lexical syntax
3. Lexical syntax of comments


Tunnel V2

December 8, 2016

I posted a handy single-line trick to forward connections from one ip/port address to another. I recently had a problem where I needed to forward to a port on a local host but the service was only listening on the public interface. OpenSSH however tries to be clever and rewrites the public ip to localhost, which isn’t overridable (the tunnel entrance is overridable but not the exit, which is what I was trying to change).

I therefore present tunnel v2, using NodeJS instead of SSH:

'use strict';

var net = require('net');
var process = require('process');
var console = require('console');

// parse "80" and "localhost:80" or even ""
var addrRegex = /^(([a-zA-Z\-\.0-9]+):)?(\d+)$/;

var addr = {
    from: addrRegex.exec(process.argv[2]),
    to: addrRegex.exec(process.argv[3])
};

if (!addr.from || ! {
    console.log('Usage: <from> <to>'); // eslint-disable-line no-console
    throw new Error('Not enough arguments');
}

net.createServer(function onServer(from) {
    var to = net.createConnection({
        host:[2],
        port:[3]
    });
    from.pipe(to);
    to.pipe(from);
}).listen(addr.from[3], addr.from[2]);

Adapted from Andrey Sidorov



Multicolor Pens

December 5, 2016

I have a particular interest in multicolor pens and I’ve developed a taste for them over the years since using the common BIC 4-Color Ballpoint Pen. Those pens only have four basic colors, and the actual tips and ink result in inconsistent sticky ink flow. BIC also makes 4-Color pens with finer points but that only results in spotty writing.

Next up, I’ve also used the Zebra Multi Color Pen, which is a similar pen but with an additional pencil included. The ink and ballpoint are also slightly better than the BIC’s.

For those who optimize for options, there are 6-color and 10-color pens. If the ridiculousness of the number of colors doesn’t keep you from using the pens every day, the problem with having so many colors is that their individual ink sticks are more off-center. This causes more bending of the ink sticks and can make the side of a ballpoint tip scrape along the paper. It also makes writing more spongy because of the increased room and the bend within the pen.

The pen that I use now is the Uni Jetstream Pen. It contains the four standard black, red, green, and blue colors, plus a pencil and eraser. It has some reasonable weight, the writing is consistent, and it looks pretty professional.


SSH Tunnel

September 18, 2016

This is a handy command to memorize:

ssh -fN -L $port1:$host1:$port2 $host2

This allows you to make requests to $host1 on $port1 to instead hit $host2 on $port2.


That Time I Was a Whitehat Hacker

September 18, 2016

I’ve been trying to find a replacement for github streaks after they removed them a few months ago. I was pretty happy to find GithubOriginalStreak which had browser plugins for Chrome, Firefox, and Opera. After installing the plugin and noticing it wasn’t correctly reporting streak lengths, I dug into its source code and was surprised to see it was using github gists as a datastore for streak information.

This presented a few problems:

  1. Github gists aren’t supposed to be used as a high performance database. Github probably rate limits access to its data.
  2. The packaged browser extensions contain read/write keys to the account that owns the github gist. The GithubOriginalStreak repository itself doesn’t have the keys but the keys are easily extractable from the extensions anyways.
  3. Neither the gist nor the code does any validation of incoming data before the supposed gist lengths are displayed inline in the Github page.

This last problem was the most critical. A malicious attacker could have gotten write-privileges by downloading and unpacking the extension, then modified the gist to inject an XSS attack into someone else’s browser. The best part is that the gist contains a list of all people who use the extension so you could target a specific person for XSS.

I talked to the author afterwards and thankfully he was receptive of the feedback. The extension is still using Github gists but is now doing some data validation. With the new profile design, extensions like these shouldn’t be needed anymore.


Comparison of Country and Company GDPs

September 8, 2016

I’ve noticed that many companies have yearly revenues on the order of those of many non-insignificant countries. With countries though, yearly revenues are usually called gross domestic product (GDP). I therefore present a comparison of corporate revenues and national GDPs:

Company? Rank within type GDP (billions of USD) Name
1 18,558 USA
2 11,383 China
23 509 Taiwan
Y 1 482 Walmart
24 474 Poland
37 306 Israel
Y 6 305 Samsung
38 302 Denmark
39 295 Singapore
Y 7 273 Royal Dutch Shell
Y 8 270 Vitol
Y 9 268 ExxonMobil
40 266 South Africa
42 253 Colombia
Y 11 245 Volkswagen
43 235 Chile
44 234 Finland
Y 12 234 Apple
45 226 Bangladesh

This table does not include state-owned companies. Fun fact: Vitol, which has a revenue of $270B, is headquartered in Switzerland, which has a GDP of $652B.



Sketching Science

September 8, 2016

Sketching Science


Tech Hiring Misperceptions at Different Companies

July 22, 2016

I’ve seen many companies’ technical interview processes and I feel many of them are wrong. For almost any advice about technical interviewing, you’re likely to hear the opposite advice from a different person. What people don’t realize is that different types of companies need different types of engineers, and that you can’t select the correct type of engineer if you’re interviewing for the same qualities.

Many people, especially those firmly in the Silicon Valley Blogosphere, opine that you should only hire ninja rockstar jedi engineers (nobody calls them that anymore, but the mindset is still there) for your startup (because you obviously work at a startup, right?) whether your startup is trying to sell a static program analysis engine or be an Uber for X company. The fact is that for most companies, good software will not save a failing company and bad software will not sink a successful company (case in point: Yahoo and your typical government contractor, respectively). Therefore, if you’re in the category of companies whose success doesn’t depend on the strength of your engineering organization (which is most companies), I think you should stop trying to attract really good engineers - they’re going to get bored writing yet another CRUD app and you’re going to be paying a lot more money.


Calculating Rails Database Connections

June 26, 2016

I recently ran into a problem with calculating the number of database connections used by Rails. It turns out that for a typical production environment, it’s actually hard to find the maximum number of connections that would be made. MySQL and PostgreSQL also have relatively low default maximum connection limits (151 and 100, respectively) which means it’s really easy to get an error like “PG::ConnectionBad: FATAL: sorry, too many clients already.”

After some digging, I believe the maximum number of open connections is found by multiplying the “pool” value in config/database.yml by the number of processes (workers in Puma). If you’re running Sidekiq or another background job processor, you’ll also need to add the number of background processes to your web server’s process count.
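As a back-of-the-envelope sketch (all numbers are made up for illustration), the arithmetic looks like this:

```python
# All values are illustrative, not Rails defaults
pool_size = 5          # "pool" in config/database.yml
puma_workers = 4       # web server processes
sidekiq_processes = 2  # background job processes

# Each process can open up to pool_size connections,
# so the worst case is the pool times the total process count
max_connections = pool_size * (puma_workers + sidekiq_processes)
print(max_connections)  # 30
```

At 30 connections, this setup stays comfortably under MySQL’s default limit of 151 and PostgreSQL’s default of 100; a few more workers or a bigger pool closes that gap quickly.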


DevOps Reactions

June 12, 2016


Tuning Postgres

June 9, 2016

I was trying to tune a PostgreSQL database and found that default settings for PostgreSQL are optimized for really old computers. If you want to fix it, you can spend an hour reading through documentation, or you can use pgtune. If CLI isn’t your thing, you can try the web version.



June 4, 2016

Romanesco Broccoli

(Romanesco broccoli + Fibonacci)