Notes
Debian and Ubuntu Releases
Debian Releases
Version | Name | Release | End of Standard Support |
---|---|---|---|
11 | Bullseye | TBA | TBA |
10 | Buster | 2019-07-06 | 2022 |
9 | Stretch | 2017-06-17 | 2020-07-18 |
8 | Jessie | 2015-04-25 | 2018-06-17 |
7 | Wheezy | 2013-05-04 | 2016-06-04 |
6 | Squeeze | 2011-02-06 | 2014-07-19 |
Ubuntu Releases
Version | Name | Release | End of Standard Support |
---|---|---|---|
20.10 | Groovy Gorilla | 2020-10-22 | 2021-07 |
20.04 | Focal Fossa | 2020-08-06 | 2025-04 |
18.04 | Bionic Beaver | 2018-04-26 | 2023-04 |
16.04 | Xenial Xerus | 2016-04-21 | 2021-04 |
14.04 | Trusty Tahr | 2014-04-17 | 2019-04 |
12.04 | Precise Pangolin | 2012-04-26 | 2017-04-28 |
10.04 | Lucid Lynx | 2010-04-29 | 2015-04-30 |
8.04 | Hardy Heron | 2008-04-24 | 2013-05-09 |
Setting Up FastAI Fastbook on a Fresh Ubuntu Instance
This is how to set up the fastai environment on a fresh Ubuntu instance, for those of us who have a computer with a good Nvidia graphics card and Ubuntu and don’t want to use a cloud-based platform.
- Install Nvidia CUDA drivers. So far, the most dependable guide that I’ve found has been this askubuntu post.
- Verify that the CUDA drivers have been installed correctly following this guide.
- Download fastbook:
git clone git@github.com:fastai/fastbook.git && cd fastbook
- Install the latest version of Python:
sudo apt update && sudo apt install python3.9 python3.9-venv python3-setuptools
- Create a virtual environment:
python3.9 -m venv env
- Activate the virtual environment:
source env/bin/activate
- Install a build prerequisite:
pip install wheel
- Install packages:
pip install -r requirements.txt
- Launch the notebook:
jupyter notebook
Tip for Developer Tools Startups
I recently had a really annoying experience with a startup in the developer tools space. I’ve been looking for a better CI platform for a while now (I’ve used Codeship, CircleCI, Travis, and Jenkins) and I found a promising CI platform that relied significantly on open source repositories as plugins to run its build pipelines. Many of these repositories live under the company’s own Github organization. While onboarding a few of my repositories during an initial trial period, I found a major missing feature in one of these plugins that would block me, as well as other users, from using the CI system effectively. I even found an open Github issue from another user a year earlier that reported the same missing feature. I then went into the plugin, learned the architecture, wrote a patch, added tests and documentation, and submitted five pull requests to fix this missing feature.
I’ve been waiting for three weeks for the company to merge or at least comment on these pull requests. In the meantime, my two-week trial period ended, which means I can’t even test changes to my pull requests. I therefore have significant doubts about whether I should be using this company’s products.
Here’s a tip for dev tools startups: you and your customers are likely Github users, and if your customers contribute free feedback or features to your company through Github, you had better be responsive.
A Better Go Defer
Go has a `defer` statement built into the language which allows a function to be executed at the end of another function. This is particularly useful for cleanup (e.g. closing a file handle) or for recovering from errors (because Go code usually contains a lot of `if err != nil { return err }`, and scattering cleanup code everywhere can be visually distracting).
// Real working go code
// Prints out "Start\nmain\nClose\n"
// https://repl.it/@varsnap/Go-Defer-Example
package main
import "fmt"
type Instrumenter struct {}
func (i *Instrumenter) Start() *Instrumenter {
fmt.Println("Start")
return i
}
func (i *Instrumenter) Close() {
fmt.Println("Close")
}
func main() {
i := Instrumenter{}
defer i.Start().Close()
fmt.Println("main")
}
The funny thing is, although Go has first-class support for functions, `defer` statements take a function call as an argument, not just a function declaration or function name - i.e. `defer run()` and not `defer run`. While this may be a minor annoyance while programming, it can be pretty unintuitive when combined with a factory pattern that might use `defer` with a double function call.
I humbly propose (with heavy doubt it’ll be implemented) that the `defer` syntax be changed so that `defer` accepts a function rather than a function call as an argument, thereby making `defer` accept a continuation that can be invoked at the end of a function call (this does cause some problems with variable mutations that may happen later on, which become even more complicated when exception handling is introduced, but hopefully it can be worked out). I also propose that `defer` be changed from a built-in statement into a function that can accept a continuation as a parameter.
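For comparison, and only as a rough analogy in another language (not Go syntax): Python’s contextlib.ExitStack already exposes this continuation style, registering a function plus its arguments to be invoked at scope exit:

import contextlib

def close(name):
    print('Close', name)

with contextlib.ExitStack() as stack:
    # Register the function and its arguments now; it is invoked on exit
    stack.callback(close, 'instrumenter')
    print('main')
# Prints "main" then "Close instrumenter"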
Covid-19 Economy Predictions
(Software-Engineering) Office culture
- Informal mentorship (whiteboard sessions, (useful) code reviews) will become rare as remote work adds friction to the process.
- Lack of organic mentorship will make companies less efficient at extracting value from junior employees who require more guidance. Many companies will move to a more Netflix-style model and value senior employees. This will make senior employees more valuable (i.e. paid more) but increase operational risk for those companies as knowledge and leadership become more concentrated. Junior engineers and new grads will have an even harder time finding their first few jobs.
- The risk of people being interpreted incorrectly will increase. This will result in unfocused teams at best and gossip at worst. This hasn’t happened yet because newly remote companies still have teams who know each other from before becoming remote.
- Impressions (and biases) will form a more significant part of how people work with each other because of limited alternative information.
- Management has a long way to go in building best practices and tools for remote management. Many management philosophies are still predicated on strong social interactions which break down when working remotely. There is a billion-dollar company to be built in bridging this gap (and this company will make Zoom look like Cisco).
Macro-economic effects
- There will be a mass migration of workers from high cost of living to low cost of living areas. While this will create burdens on the real estate market, the more interesting burdens will come from secondary effects like cities and states rebalancing their budgets.
- Many companies are using Covid-19 as an excuse for layoffs. Smart companies are using the current “employers’ market” to headhunt future employees.
- Many companies will use this as an opportunity to shed office-related expenses. This will further accelerate the distribution of employees.
Medicine
- A vaccine that is very effective (on the order of the smallpox vaccine) won’t be developed anytime soon. General antiviral and experimental therapies will continue to be researched and used until interest wanes.
- At some point, the medical establishment is going to come looking for their paycheck. How the insurance and pharmaceutical industry will react will be interesting.
- There are long-term effects from Covid-19. These effects are so far not well-known but at worst could cause slightly increased morbidity for years among those who have been infected by the virus in the past.
- There’s going to be an interesting “freakonomics” study in 5 years around the health effects of those who partook in quarantine (e.g. increase in BMI, heart disease, mental stress). Most places didn’t quarantine for long enough for any long term effects to build up though.
- There’s going to be an interesting “freakonomics” study in 10 years around the health effects from dampened employment rates from Covid. It will be hard to find signal in this noisy data though.
Basic Docker Monitoring
- `docker container ls` - List all containers and some configuration/status metadata
- `docker ps` - Same as `docker container ls`
- `docker stats` - `top` but for docker containers
- `docker top <container>` - Snapshot of resources in a single docker container
- `docker inspect <container>` - Configuration of a single docker container
- `docker system df` - Disk usage of docker entities
Switching From Go Dep to Go Mod
On switching from Go’s `dep` tool to `mod`:
- `dep ensure` turns into `go mod download`
- `dep ensure -update` turns into `go-mod-upgrade` with the go-mod-upgrade tool
- `dep check` turns into `go mod verify`
- `Gopkg.toml` turns into `go.mod`
- `Gopkg.lock` turns into `go.sum`
Upgrading LibMySQLClient in Python MySQLDB/MySQLClient
After upgrading from Ubuntu 18.04 to Ubuntu 20.04, I ran into this error when trying to import python’s `mysqlclient`:

ImportError: libmysqlclient.so.20: cannot open shared object file: No such file or directory

After spending a while debugging it, I found that the python `mysqlclient` package is a fork of `MySQLdb` which compiles a `_mysql.*.so` file which in turn references `libmysqlclient.so.*`. However, `libmysqlclient.so` gets updated every so often (most recently from `20` to `21`), which makes `mysqlclient` lose track of the `libmysqlclient.so` version when installed with pip. After trying various ways of clearing build caches, I was able to find a workaround by:

- Ensuring `libmysqlclient` is installed (`sudo apt install libmysqlclient-dev`)
- Cloning mysqlclient from github (`git clone git@github.com:PyMySQL/mysqlclient-python.git`)
- Manually building mysqlclient (`make build`)
- Copying the generated `_mysql.*.so` to my virtualenv

I’m still trying to find a better way of doing this so I can get a working `mysqlclient` after running `pip install mysqlclient`.
Edit 2020-06-19:
Since I had to do this again recently, here’s a script to automate the above workaround:
#!/bin/bash
# After installing mysqlclient on ubuntu 20.04, run this script to manually
# downgrade libmysqlclient from v20 to v19
# This assumes that you have an active python virtual environment
set -exuo pipefail
IFS=$'\n\t'
# sudo apt install libmysqlclient-dev
git clone git@github.com:PyMySQL/mysqlclient-python.git
cd mysqlclient-python
make build
cp MySQLdb/_mysql.*.so $VIRTUAL_ENV/lib/python3*/site-packages/MySQLdb/
cd ..
rm -rf mysqlclient-python
Developing Django in Production
Django is still my favorite web framework. While it comes with a neat system for rolling forward and back relational database schema migrations, you still have to populate your test and local databases with fixture data. Instead, you can use this script (configuration required) to connect your local django app to production to help test your code against production data:
#!/bin/bash
# Connect your local django app to a production database
set -ex
LOCAL_IP=127.0.0.1
# The local port that you want to use to forward to your production database
# This is set to the mysql default port, incremented by 1 so that it doesn't conflict with your local database
LOCAL_PORT=3307
# The remote host (or IP) that can access your production database
# This is probably where your app is hosted
REMOTE_HOST=example.com
# The ip that your production app uses to connect to your production database
REMOTE_IP=172.24.0.3
# The port that your production app uses to connect to your production database
REMOTE_PORT=3306
# Create an ssh tunnel from localhost to production
ssh -fNL $LOCAL_IP:$LOCAL_PORT:$REMOTE_IP:$REMOTE_PORT $REMOTE_HOST
# Reset everything when this script is killed
function cleanupSSH {
pkill -f "$REMOTE_IP"
rm .env
ln -s .env.development .env
}
trap cleanupSSH EXIT
# Assuming you keep configs in .env, switch the symlink to a prodlocal config
# This is likely a mixture of development and production configs
rm .env
ln -s .env.prodlocal .env
# Start up django
./manage.py runserver
Quote
Shamelessly stealing off of Hacker News
Sendmail Wrapper for Mailgun
If you use the `sendmail` linux CLI and you want to route outgoing emails through mailgun, write this file, make it executable, and add it to your path before the actual executable is found:
#!/bin/bash
# Shim for netdata to send emails through mailgun
# filename: sendmail
# suggested location: /usr/local/bin/
# Installation:
# 1. Write the contents of this script to a file called "sendmail"
# 2. Fill in the mailgun smtp email and password from https://app.mailgun.com/app/sending/domains/albertyw.com/credentials
# 3. Replace SENDMAIL_PATH_REPLACE_ME with the path to the real sendmail binary
# 4. `chmod +x sendmail`
# 5. `sudo mv -n sendmail /usr/local/bin`
SENDMAIL_PATH="SENDMAIL_PATH_REPLACE_ME"
MAILGUN_EMAIL="REPLACE_ME"
MAILGUN_PASSWORD="REPLACE_ME"
# shellcheck disable=SC2068
"$SENDMAIL_PATH" \
-S smtp.mailgun.org \
-au "$MAILGUN_EMAIL" \
-ap "$MAILGUN_PASSWORD" \
$@
Python Release Support Timeline
Since I’ve had a hard time determining when versions of Python are pre-release, supported, or deprecated, here’s a table of all recent python versions:
Version | Release | End of security fixes |
---|---|---|
2.7 | 2010-07 | 2020-01 |
3.4 | 2014-03 | 2019-03 |
3.5 | 2015-09 | 2020-09 |
3.6 | 2016-12 | 2021-12 |
3.7 | 2018-06 | 2023-06 |
3.8 | 2019-10 | 2024-10 |
3.9 | 2020-10 | 2025-10 |
Use the Default Flake8 Ignores
Flake8 provides a way to ignore PEP8 rules through its `--ignore` and `--extend-ignore` flags. The former overwrites the default list of ignored errors and warnings, which includes `W503` and `W504`, two rules that are mutually incompatible. Therefore, it’s easier to just use `--extend-ignore` and not use `--ignore`.
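For example, a hypothetical setup.cfg that keeps the default ignores while adding one extra rule might look like:

[flake8]
extend-ignore = E203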
Making Pip Require a Virtualenv
Using python pip to install packages without using a virtualenv is generally considered an antipattern. Add this to your `~/.bashrc` to make pip require an activated virtualenv before running.
# Do not pip install when not in a virtual environment
# https://docs.python-guide.org/dev/pip-virtualenv/#requiring-an-active-virtual-environment-for-pip
export PIP_REQUIRE_VIRTUALENV=true
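With this set, running pip install outside of a virtualenv should fail with an error along the lines of “Could not find an activated virtualenv (required)” instead of installing globally.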
Engineering Toolbox
If you want to waste a few hours, take a look at The Engineering Toolbox.
(If you want to waste thirty minutes, take a look at This To That.)
Node Timezones
Today I debugged some issues with javascript’s time zone support. Unlike most other parts of the Node.js standard libraries, time zone conversion data changes from time to time based on different countries’ whims. Usually, these changes are for smaller countries or are minor changes to time zone boundaries, but recently, Brazil decided to end daylight savings with six months notice.
Looking at node specifically, it looks like nodejs’s `Intl` library depends on ICU, which depends on tzdata. However, even the most current stable version of node as of this writing (v13.0.1) uses ICU version 64.2, which depends on tzdata 2019a, which is outdated. On November 3, no stable release of Node will correctly calculate Brazil’s time zone.
Sampling Samples
If you had a set of p90 samples, how would you get a p90 of the overall original data? Would you take the p90 of your p90 samples? Would you take the p50 median of your set of p90 samples? It turns out that neither is correct and it’s actually impossible to reliably recover original percentiles from derived percentile data:
If your original data is grouped into one `[10, 10, 10, 10]` sample and many `[0]` samples, your p90 samples should be `[10, 0, 0, 0, 0...]` and your `p90(p90(data))` would turn out to be 0 (your `p50(p90(data))` would turn out to be 0 too).
Even with equal-sized samples, p90 samples aren’t useful for finding an overall percentile. If we had 10 sets of samples, `[1->10]`, `[11->20]`, etc. until `[91->100]`, our overall p90 would be 90, but our `p90(p90(data))` would be 89.
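Here’s a minimal pure-Python sketch of that second example (using one simple nearest-rank-style percentile definition; exact values can shift under other definitions):

def p90(values):
    # One simple percentile definition: the value 90% of the way
    # through the sorted list, rounding the index down
    ordered = sorted(values)
    return ordered[int(0.9 * (len(ordered) - 1))]

# Ten equal-sized groups: [1..10], [11..20], ..., [91..100]
groups = [list(range(start, start + 10)) for start in range(1, 100, 10)]
data = [value for group in groups for value in group]

print(p90(data))                              # 90, the true overall p90
print(p90([p90(group) for group in groups]))  # 89, from aggregating p90s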
Rotating a NxN Matrix in One Line of Python
data = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
# Rotate 90 degrees clockwise: reverse the rows, then read off each column
rotated = [[row[i] for row in data[::-1]] for i in range(len(data))]
for row in rotated:
    print(row)
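An equivalent clockwise rotation (yielding tuples instead of lists) reverses the rows and transposes with zip:

list(zip(*data[::-1]))
# [(7, 4, 1), (8, 5, 2), (9, 6, 3)]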
iTerm2 Search History
Like a bunch of people, I keep a dotfiles repository to version control and synchronize my Linux and MacOS configs. One of the things that I store in there is my iTerm2 configuration so that I can synchronize things like color schemes. It looks like several other people do the same thing.
However, I noticed that iTerm saves searches that you’ve run in the terminal window so iTerm can show history, which means that people are also publishing searches that they’ve made in a private terminal window. These configs can be found inside a file called `com.googlecode.iterm2.plist` under a config key called `NoSyncSearchHistory`, so I wrote a script to scrape Github for these searches. The results of the search were mostly boring, with terms like `kubernetes` or `user`, though I did find someone’s AWS IAM key (I’m currently waiting for a reply from the person about the leakage).
Nginx Auth With IP Whitelists
I was looking into a way to configure Nginx to require basic HTTP authentication but to skip authentication for specific IP addresses. I was also running Nginx behind Cloudflare, which obscures the caller IP address. The normal way of reading IP addresses wouldn’t work, so I had to switch the IP address whitelist to read from the Cloudflare-set header `CF-Connecting-IP`.
# Create a map of IP addresses to auth configuration
map $http_cf_connecting_ip $auth {
# Whitelisted ip address has auth off
"<Whitelisted IP Address>" "off";
# Otherwise, auth is enabled
default "Authentication Required";
}
server {
...
# Enable HTTP authentication
auth_basic $auth;
# Set a file with username/password data
auth_basic_user_file <path to auth file>;
...
}
Bash Strict Mode
Bash is well-known to be a hard language to write in, with many somewhat nonintuitive syntax requirements and edge cases. However, unlike some languages like Perl or Python, bash is available on basically every Unix machine and is the lingua franca for systems scripting, making it very hard to avoid. Therefore, adding this snippet to the top of every script will avoid issues and make errors easier to debug.
# -e: exit on errors; -x: print each command; -u: error on unset variables; -o pipefail: fail a pipeline if any stage fails
set -exuo pipefail
# Split words on newlines and tabs rather than spaces
IFS=$'\n\t'
Bonus: use `shellcheck` to lint bash code and suggest fixes.
Optimizing Asus Routers for Serving Websites With Cloudflare
I’ve been serving this site and several others from a personal physical server frontended by Cloudflare rather than using a cloud provider like AWS. One of the things that’s always been a bother has been dealing with dropped packets and in particular, 522 errors from Cloudflare. I’ve tried a lot of different things such as changing the netdev budget, but one of the things that I think really solved it was looking through my router settings (I have an Asus RT-AC68U) and modifying the firewall to remove DoS protection. It turns out that the DoS protection had been dropping packets for legitimate traffic, and since I use Cloudflare, all web traffic is encrypted and comes from a limited set of Cloudflare IP addresses, which probably makes it hard for the DoS protection to recognize the traffic as legitimate.
Browserify, Mochify, Nyc, Envify, and Dotenv
I found that it’s possible to mix several javascript libraries together to generate production and test bundles for browsers. I was specifically looking for a way to use
- browserify - bundling multiple javascript files that `require` each other together
- envify - interpolating environment variables into javascript code
- dotenv - setting environment variables from a file
- nyc - an istanbul CLI for instrumenting code and getting test coverage
- mochify - a pipeline to run mocha tests in a headless browser with browserify
For a production build with a browserify js script, you can use:
const browserify = require('browserify');
require('dotenv').config();
browserify('target.js')
.transform('envify')
.bundle()
.pipe(process.stdout);
which will generate a bundled javascript output with dotenv variables interpolated with envify. For a test build directly as a shell command, you can use:
nyc --require dotenv/config mochify --transform envify
which will run in-browser tests after applying browserify and envify (with dotenv variables).
Scraping Images From Tumblr
With my Reaction.Pics project, I had to scrape a bunch of tumblr accounts for data to assemble its database. Since I was trying to not hotlink to thousands of images, I made local copies of images (about 22 GB raw). However, given the uncurated nature of tumblr posts, I found that there were tons of broken images. Going through them, I noticed a few common themes, including:
- Empty files (I assume from 404s)
- Malformed files (just binary crap)
- HTML (also mostly 404s from sites that don’t obey `Accept` HTTP headers)
- Non-standard images like `.raw` and `.tiff`
After processing the database several times with multiple scripts that checked various heuristics like file extension and guessing MIME encoding, I found that the single most useful way of checking images is having Python Pillow parse the image binary:
from PIL import Image

# Given an image path
path = "abcd.gif"
# Have PIL verify that the file parses as an image
Image.open(path).verify()
After filtering images and removing duplicates, I was able to bring the image database down to 8 GB.
Thanks to these sites for providing data:
There Are Too Many NPM Packages
I was trying to add modal popups to a website today and trying not to reinvent the wheel and instead use a pre-existing modal package from NPM. However, upon searching for a good package, I found that NPM has 2366 modal packages.
There’s a huge amount of duplicated work here, including lots of packages that integrate with react, bootstrap, vue, browserify, or whatever. Trying to sort packages by popularity or quality gives little benefit and the package that actually owns the "modal" name seems to be dead with people publishing their own dead forks.
My suspicion is that this is a systemic issue stemming from the node community’s encouragement of creating tons of micro-packages, multiplied by the ever-increasing number of new javascript web frameworks, though other package indices have similar problems. I feel that package managers should do a better job of either recommending well-supported packages or creating a higher barrier to entry for people publishing new packages. Maybe someone should try namesquatting package names and see where that goes?
Programmers Writing Legal Documents
Programmers writing legal documents is like rolling your own crypto. Get an expert to do it.
Solidity Review
I took Solidity for a test drive a few months back to try to understand the hype behind it, ethereum, and the crypto space in general. While I’m no expert and I haven’t been keeping up with later developments, I did have a few thoughts on the design of the language.
- Setting a version at the top of Solidity source code seems like a nice idea and should help with making sure the language spec stays flexible enough for rapid iteration. However, the lack of a fully formed dependency management system is a bigger flaw than a nice language version management system.
- The fact that all functions default to `public` visibility, rather than a safer alternative like `internal` or `private` or requiring an explicit visibility, seems dangerous to me, especially given the “contract” goals of Solidity.
- The overall syntax seems to borrow from several languages but most heavily from javascript, java, and python. It is nevertheless still quite clean and intuitive.
- The stdlib is still quite small and mostly consists of ethereum-specific logic and some mathematical functions. Of course, Solidity isn’t meant to be a general purpose programming language so it doesn’t need much more.
- The lack of testing frameworks on Solidity (and the minimal testing of Solidity itself) really scares me.
- There seem to be multiple bindings for other languages to call Solidity functions. This sounds like a nice idea, but they seem to be written as wrappers. Is there benefit here from having other more popular languages running Ethereum operations natively instead of wrapping Solidity?
Likwid
I found a neat tool that digs into your CPU setup called likwid. This is the topology for this server:
$ likwid-topology
--------------------------------------------------------------------------------
CPU name: Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz
CPU type: Intel Core Haswell processor
CPU stepping: 3
********************************************************************************
Hardware Thread Topology
********************************************************************************
Sockets: 1
Cores per socket: 4
Threads per core: 1
--------------------------------------------------------------------------------
HWThread Thread Core Socket Available
0 0 0 0 *
1 0 1 0 *
2 0 2 0 *
3 0 3 0 *
--------------------------------------------------------------------------------
Socket 0: ( 0 1 2 3 )
--------------------------------------------------------------------------------
********************************************************************************
Cache Topology
********************************************************************************
Level: 1
Size: 32 kB
Cache groups: ( 0 ) ( 1 ) ( 2 ) ( 3 )
--------------------------------------------------------------------------------
Level: 2
Size: 256 kB
Cache groups: ( 0 ) ( 1 ) ( 2 ) ( 3 )
--------------------------------------------------------------------------------
Level: 3
Size: 6 MB
Cache groups: ( 0 1 2 3 )
--------------------------------------------------------------------------------
********************************************************************************
NUMA Topology
********************************************************************************
NUMA domains: 1
--------------------------------------------------------------------------------
Domain: 0
Processors: ( 0 1 2 3 )
Distances: 10
Free memory: 4729.45 MB
Total memory: 15710.8 MB
--------------------------------------------------------------------------------
My First Server's IP
As a freshman at MIT, I decided that a fun way to get a head start on my CS classes would be to set up a webserver and play around with it. As I learned later, a typical MIT Course 6 curriculum doesn’t care much for practical devops abilities (more on that subject in a later note), but I still remember the IP address that was first allocated to me for setting up my server at albertyw.mit.edu - `18.239.1.127`.
I noticed recently that MIT sold off large portions of its `18.0.0.0/8` class A IP address range, which included the entire `18.239.0.0/16` range containing my old IP address. I assume that the sold IP addresses will be made available randomly as part of AWS’s Elastic IP Addresses pool for people to attach to AWS hosts and expose to the wider internet. While `18.239.1.127` still seems unallocated, I wish the best of luck to whoever gets the IP address (unless I happen to grab it first).
Installing Netdata
- `sudo apt install netdata`
- Check that netdata is running on the default port 19999. Note that netdata does not by default respond to requests on public interfaces. I therefore used an SSH tunnel from my laptop (`ssh -fN -L 19999:localhost:19999 albertyw.com`) to connect to netdata (`localhost:19999`).
- `sudo apt install apache2-utils nginx`
- Write an nginx config file to proxy traffic to netdata.
- Generate a username/password: `sudo htpasswd -c netdata-access $USER`
- `sudo service nginx reload`
Interrobang Versus Shebang
- `‽` Interrobang - A punctuation mark used to combine a question mark and exclamation mark
- `#!` Shebang - A character sequence that is used to specify the interpreter for a unix script
Bad Interview Questions
I recently saw an advertisement for a tech recruiting company featuring a sample interview question about the implementation details of floating-point arithmetic. I don’t know what interviewing signal one could gain by asking about that.
PermalinkShowing Users in Different Databases
Cassandra
LIST USERS;
PostgreSQL
\du
MySQL
SELECT user, host FROM mysql.user;
Some MIT (Undergraduate) Admissions Interview Advice
I’ve been doing MIT admissions interviews for two years now and I’m planning on doing a third this Fall. So far, I’ve had a fun time talking to about 15 high school seniors about their interests and college plans. I’ve noticed a few common themes in interviews:
- Students are pretty reticent. Although nervousness is certainly a contributing factor here, it’s hard to have an interesting one-sided conversation.
- Students say they have interests in some subject area, but when asked to go deeper, to explain what about an area interests them, or how they’ve pursued those interests by investing time in them, they don’t have many examples. Of course, not everyone is afforded opportunities that match their interests, and it’s much easier to demonstrate interest in something like computer science than nuclear engineering. However, saying that one is interested in something but not doing anything about it doesn’t give much credence to the claim.
- Many students concentrate too hard on academics or participate in way too many extracurriculars. Either way, both extremes limit real exposure to different ways of thinking and, as a side effect, create pretty cookie-cutter stories during interviews.
Optimize the Develop-Test-Debug Cycle
(This is essentially a rewrite of core-metric-developer-productivity).
Increasing developer efficiency is oftentimes one of the highest priority goals of engineering orgs at companies ranging from startups to megacorps. With the high price of developers and notoriously bad adherence to deadlines, it’s quite normal for a company to want to speed up development time. Many organizations try various process tricks that read like they were written by an MBA student - agile, scrum, kanban, lean, to name a few. While many of those processes are good ideas, I feel many people lose focus on the engineering issues that cause bad developer productivity.
I think that a core principle for optimizing and scaling developer efficiency is that many companies operate on a loose form of test driven development, and that most programming work is spinning within a cycle of “development” (writing production code to satisfy some business problem), “testing” (asserting the correctness of “development”), and “debugging” (making fixes to “development” given the feedback from “testing”). Many developments in computer science and software engineering have been aimed at speeding up (e.g. usage of IDEs) or short-circuiting (e.g. strongly typed languages) this cycle. I would recommend that any company which relies on an effective engineering department pay attention to the speed at which developers can move through this develop-test-debug cycle.
Example of Python Subprocess
I needed to test out a thing with the subprocess module and wrote an example of using it to run multiple processes in parallel, then reading all of their output at the end:
process.py:
import subprocess
import time
commands = ['./time.sh'] * 5
outputs = [None] * len(commands)
processes = [None] * len(commands)
start = time.time()
for i, command in enumerate(commands):
process = subprocess.Popen([command], stdout=subprocess.PIPE)
processes[i] = process
for i, process in enumerate(processes):
outputs[i] = process.communicate()
print(i, outputs[i])
print('elapsed seconds: ', time.time() - start)
time.sh:
#!/bin/bash
sleep 2
date
Spotted in Taiwan
Eslite bookstores (and most other bookstores in Asia) have really good stationery sections.
There are a lot of mopeds. Some people put blue or red lights on their personal mopeds so they look like police and get right of way.
Fixing "Fatal Error: Python.h: No Such File or Directory"
Some python packages, notably uWSGI and mypy, require access to a `Python.h` file to compile C bindings, and they’ll fail with an ugly `fatal error: Python.h: No such file or directory` error if one can’t be located on your system. Stackoverflow gives a pretty good answer on fixing this, but I want to amend the top answer in case your system has multiple versions of python 3 (e.g. you’re using an Ubuntu PPA).
If you do have multiple python versions available, (on an Ubuntu system) explicitly specify the python version of the dev tools package, e.g.
sudo apt install python3.6-dev
Cassandra Primary Keys
Cassandra schemas can be a bit hard to design and are especially important to design correctly because of the distributed nature of Cassandra. Many new users of Cassandra try to design schemas similar to relational databases because of CQL’s similar syntax to SQL.
Simple
CREATE TABLE example (
key uuid PRIMARY KEY
)
Composite Key
CREATE TABLE example (
key1 text, // partition key: determines how data is partitioned across nodes
key2 int, // clustering key: determines how data is sorted within a partition
PRIMARY KEY(key1, key2)
)
The goals for designing cassandra keys (stolen from the datastax documentation) are:
- Spread data evenly around the cluster
- Minimize the number of partitions read
MyPy Review
I recently added type annotations to two of my projects git-browse and git-reviewers using mypy and found it to be relatively enjoyable. Adding types to python definitely helps to make code self-documenting and effectively increases the number of tests in your code. There are a few large issues though for anyone trying to add type annotations:
In order to use python’s type annotation syntax (rather than type comments), your code must be Python 3 only. (Yes, you should use python 3 regardless of whether you’re adding type annotations)
You must have library stubs of your imports so that MyPy can infer types. So far, there are very few library stubs available and even some extremely popular packages like Flask aren’t covered. This limits type checking to packages with few if any external dependencies.
Adding in type annotations, I also ran into a few issues:
The docs are quite good but given python typing’s obscurity, it’s still hard to find answers for more esoteric features.
The syntax for default values, e.g. (from the mypy docs):
def greeting(name: str, prefix: str = 'Mr.') -> str:
    return 'Hello, {} {}'.format(prefix, name)
puts the default value after the type annotation. For a person who hasn’t worked with the type syntax before, at first glance it looks like a string value is being assigned to `str` within a dictionary.
- MyPy requires newly instantiated empty iterables (lists, sets, dictionaries) to include annotations so it can type check elements. However, the native python syntax has no support for it which requires adding type comments, resulting in:
data = [] # type: List[int]
- The comment syntax has a bug where its types require imports which set off linters like Flake8 as an unused import. From the above example, this ends up requiring odd code to pass both the flake8 linter and mypy:
from typing import List  # NOQA
data = [] # type: List[int]
MyPy is still under heavy development with significant hands-on support from Guido van Rossum himself. Overall though, adding in types was still a relatively easy and useful exercise and helped prompt some refactorings.
PS - Turns out that python-markdown2 has a bug when rendering code fences inside of lists.
Griping About Time Zones
Daylight savings is ending in two weeks and I’m going to gripe about it. Not the usual gripe about people having to adjust their clocks and schedules (I, for one, actually like daylight savings), but about how basically everyone writes time zones incorrectly. When people give a time with a time zone, such as “five o’clock in San Francisco”, many people just write “5 PDT” or “5 PST” without giving a thought to where that “D” or “S” comes from. Those letters stand for “Daylight” and “Standard,” respectively, so you should only use “PDT” to refer to times in the summer and “PST” to refer to times in the winter. If you want to be generic, you can use “PT” for everything and let context take care of the exact time zone.
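If you’re unsure which abbreviation applies to a given date, Python’s zoneinfo module (standard library in 3.9+) will tell you:

from datetime import datetime
from zoneinfo import ZoneInfo

pacific = ZoneInfo('America/Los_Angeles')

# The same wall-clock time gets a different abbreviation by season
print(datetime(2020, 7, 1, 17, 0, tzinfo=pacific).tzname())  # PDT
print(datetime(2020, 1, 1, 17, 0, tzinfo=pacific).tzname())  # PST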
Bundling Python Packages With PyInstaller and Requests
I recently tried using PyInstaller to bundle python applications as a single binary executable. Pyinstaller was relatively easy to use and its documentation is pretty good. However, I ran into a bit of trouble bundling the python requests package because of problems with requests looking for a trusted certificates file, usually emitting an error like `OSError: Could not find a suitable TLS CA certificate bundle, invalid path: ...`.
In a typical installation, the `certifi` package includes a set of trusted CA certificates, but when PyInstaller bundles the `requests` and `certifi` packages, `certifi` can’t provide a file path for `requests` to use.
The way to fix this is to set the `REQUESTS_CA_BUNDLE` environment variable (documentation) within your code before using requests:
import os
import pkgutil
import tempfile

import requests

# Read the cert data
cert_data = pkgutil.get_data('certifi', 'cacert.pem')
# Write the cert data to a temporary file
handle = tempfile.NamedTemporaryFile(delete=False)
handle.write(cert_data)
handle.flush()
# Set the temporary file name to an environment variable for the requests package
os.environ['REQUESTS_CA_BUNDLE'] = handle.name
# Make requests using the requests package
requests.get('https://www.albertyw.com/')
# Clean up the temp file
handle.close()
os.unlink(handle.name)
Go Receiver Pointers vs. Values
When writing a method in Go, should you use a pointer or a value receiver?
Type | Use |
---|---|
Basic | Value |
Map | Value |
Func | Value |
Chan | Value |
Slice (no reslicing/reallocating) | Value |
Small Struct/Array | Value |
Concurrent mutations | Value if possible |
Is Mutated By Method | Pointer |
Large Struct/Array | Pointer |
Contains a sync.Mutex |
Pointer |
Contains Pointers | Pointer |
🤷 | Pointer |
Distilled from Golang Code Review Comments
Fixing statsonice.com Latency
Last night, I finally discovered and fixed the reason for very high latencies on statsonice.com. Many of the data-heavy pages on statsonice.com have had multi-second response times, and although a Django/MySQL site on a minimally provisioned server isn’t the epitome of performance engineering, I’ve always believed it should run faster. After four years of optimizing parts of the website, I finally found a way to reduce latencies by an order of magnitude and bring response times under a second.
These were some of the things that I tried which gave relatively minimal benefit:
- Adding memcached to cache model data and view partials
- Adding a Cloudflare CDN
- Upgrading the server, particularly increasing CPU cores and memory
- Optimizing the MySQL configuration
- Rate limiting web crawlers
- Denormalizing database models
Last night, while checking django-silk, I noticed that on certain pages, multiple simple but slow SQL queries were being made, filtering on indexed fields. Some of these queries were taking on the order of hundreds of milliseconds, over 10X the latency of the same query on a local unoptimized virtual machine. Digging deeper with EXPLAIN queries and checking the database schema, I found several indices were missing. Although the indices were included in the models, they were never added (or were dropped some time ago), probably by Django South, Django’s old migration tool. Evidently, one should not rely too much on ORMs, and manually checking your MySQL schemas can result in some amazing latency improvements.
Showing Schemas in Different Databases
Cassandra
describe keyspace <keyspace>;
PostgreSQL
\dn
MySQL
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, COLUMN_TYPE, COLUMN_COMMENT, ORDINAL_POSITION
FROM information_schema.columns
WHERE table_schema = '<DATABASE NAME>'
ORDER BY TABLE_NAME, ORDINAL_POSITION;
Straight Lines
There are a ton of straight lines commonly seen in English text.
- `‐` - the hyphen, used to join breaks within words or compound words
- `–` - the en-dash, used for spans of numbers or compound adjectives
- `—` - the em-dash, used in place of colons, commas, and parentheses
- `_` - the underscore, originally used to underline words by typewriters
- `|` - the vertical bar of many uses
This doesn’t even include the dozens of singular lines available in Unicode.
Emerson on Intellect
Before Pycon, I visited Powell’s City of Books and stumbled into the fiction aisle of D through H authors. I sampled some good books by Dostoyevsky, Dumas, and Hemingway but a particular passage stuck in my memory:
Every human being has a choice between truth and repose. Take which you please, you cannot have both. Between these, as a pendulum, man oscillates. He in whom the love of repose predominates will accept the first creed, the first philosophy, the first political party he meets– most likely his father’s. He gets rest, commodity, and reputation; but he shuts the door on truth. He in whom a love of truth predominates will keep himself aloof from all moorings and afloat. He will abstain from dogmatism and recognize all the opposite negations between which, as walls, his being is swung. He submits to the inconvenience of suspense and imperfect opinion but he is a candidate for truth, as the other is not, and respects the highest law of being.
Ralph Waldo Emerson, Essays: First Series, Essay XI: Intellect
Core Metric for Developer Productivity
A lot of software companies are concerned about the concept of developer productivity and maximizing the amount of output each engineer produces. The problem is that few companies measure productivity in any quantitative manner, instead using subjective metrics like developer happiness or irrelevant metrics like lines of code. Additionally, companies frequently confuse product management processes, like agile development, with things that actually help software engineering productivity. This results in software being developed slowly and quite a lot of grumbling when the scrum master burns hours of each engineer’s time on sprint planning.
Looking around, I think that the core metric for developer productivity should be the average frequency of the edit/test/debug cycle. In all but the most intellectually challenging of programming problems, there is a clear direction of what needs to be built but the limiting factor is making sure the programmer’s code works as intended. Optimizing the edit/test/debug cycle therefore makes developers produce value faster. Many features in IDEs and modern development practices are aimed at shortening steps in the cycle, reducing the number of cycle iterations needed to complete the development work, or helping developers stay within the cycle. Indeed, Joel Spolsky’s famous test for programming environments can be summarized as checking the health of a development team’s edit/test/debug cycle. I therefore hope that when you evaluate processes and products to help be more productive, you think of how it will benefit your edit/test/debug cycle.
How to Capture a Camera Image With Python
While working on sky-color, I found that taking a photo using a webcam with python was pretty hard. opencv has some pretty opaque documentation since it’s primarily written for C developers, and simplecv is dead and doesn’t support python 3. Stackoverflow is also filled with outdated, incorrect answers. I therefore had to figure out a way to take a photo and save it to a file myself using python 3.6 and MacOS.
Prerequisites: Install numpy and opencv. My `requirements.txt` file looks like:
numpy==1.12.1
opencv-python==3.2.0.7
Code:
import time

import cv2

camera_id = 0
file_name = 'image.png'

cam = cv2.VideoCapture(camera_id)
time.sleep(1)  # Give some time for the webcam to automatically adjust brightness levels
ret_val, img = cam.read()
cam.release()  # Release the webcam handle
cv2.imwrite(file_name, img)
Further reference: OpenCV API
Python Has a Ridiculous Number of Inotify Implementations
Mostly stolen from watchdog’s readme:
- pnotify
- unison fsmonitor
- fsmonitor
- guard
- pyinotify
- inotify-tools
- jnotify
- treewalker
- file.monitor
- pyfilesystem
- watchdog
- inotify_simple
Looking through a few of these, I think I recommend watchdog and inotify_simple.
Projects: Gentle-Alerts
Gentle-Alerts is a chrome extension that I built to fix the problem of noisy popup alerts in Chrome. Using Google Calendar a lot, I used to get a popup alert before every event that I was invited to. Fiddling with its built-in “browser notifications”, I wasn’t very satisfied because of its pop-over UX. I therefore decided to create Gentle-Alerts to solve this problem for Calendar and all other websites.
Gentle-Alerts works by overriding the `window.alert` built-in function with a custom function that shows a browser modal. In building Gentle-Alerts, I had some fun with some different frontend programming rules. Rather than the usual problem of writing javascript code that has to be compatible with different browsers in a known environment, writing the javascript for Gentle-Alerts required me to write javascript code compatible specifically with Chrome but running against the javascript environment of any website. I therefore kept the code pretty simple and used only vanilla javascript without any third-party dependencies.
- Code: Github
- Extension: Chrome Web Store
- Library: NPM
Thanks to Chris Lewis, David Hamme, Song Feng, and Scott Kennedy for testing the extension.
Creating a New PyPI Release
As a reminder to myself for the magic incantations for uploading a repository to PyPI:
pip install twine
python setup.py sdist bdist_wheel
twine upload dist/*
Eva Air USB Ports
I just got off an Eva Air flight which had in-seat USB ports not only for power but also for data. I found that when I plugged in USB keys, it could navigate through FAT32 and NTFS memory sticks which makes me think that the in-flight entertainment system was based off of an embedded Windows OS. Several of the games in the system also had multiplayer modes which would mean that there must be some LAN within the plane and since the plane’s audio announcement system could pipe audio through the seatback system, that must be connected as well.
Although I doubt the Boeing 777’s designers would also link up flight-critical systems like avionics, there is something to be said about the possibilities arising from putting some sufficiently determined hacker on a plane with Wi-Fi, an electrical socket, physical access to a Windows-backed USB port, and twelve-plus hours of boredom.
From the Eva Air Website:
In-seat USB Port
If you are traveling on our selected B777-300ER (Royal Laurel Class) and A330-300 aircrafts (Premium Laurel Class), you can navigate through PDF files, photos and other multimedia content stored in your storage devices (iPod, USB flash drive**, AV connector-enabled device, etc.) on your seat-back screen. Instructions are shown on the screen once connected.
Projects: Git-Browse
I’ve recently worked on a project called Git-Browse to help look up information in github and uber’s phabricator.
Quite often, I’ve found the need to look up information about a git repository in order to share code with people, find history, or file issues. Having to manually look up the repository on github or phabricator takes excessive time and can easily lead to incorrect information from looking at forks. Git-Browse solves the problem by introspecting a git repository’s `.git/config` file and automatically opening the git repository in a browser. Git-Browse can then be integrated into your local or global `.gitconfig` as an alias so you can open repository objects with `git browse <path>`.
While working on git-browse, I found that it is similar to github hub’s browse command, but git-browse would be a lot easier to extend to support additional repository hosts. Hub doesn’t support opening arbitrary branches or commits either, though it does support opening issues and wikis.
Git-Browse requires python 3 to run. Install it by following the Readme Instructions.
Cassandra Compaction Strategies
When setting up Cassandra tables, you should specify the compaction strategy Cassandra should use to store data internally. To do so, just add `WITH compaction = { 'class': '<compactionName>' }` to an `ALTER TABLE` or `CREATE TABLE` command; a full example follows the table below.
Name | Acronym | Used For |
---|---|---|
SizeTieredCompactionStrategy | STCS | Insert-Heavy Tables |
LeveledCompactionStrategy | LCS | Read-Heavy Tables |
DateTieredCompactionStrategy | DTCS | Time Series Data |
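For example, switching a read-heavy table to leveled compaction (a minimal sketch using the strategy names above) would look like:

ALTER TABLE example WITH compaction = { 'class': 'LeveledCompactionStrategy' };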
Code Is Like Tissue Paper
Code is like Tissue Paper, it falls apart after one use
Code is like Tissue Paper, there are holes everywhere
Code is like Tissue Paper, it sucks to have to use someone else’s
Code is like Tissue Paper, you get a new one, even for the same problems
Code is like Tissue Paper, you should not feel bad when you throw it away
Code is like Tissue Paper, there are many layers
Code is like Tissue Paper, other people won’t like to use yours
Seen in a Bathroom Stall at MIT
Do you compute?
No, I come poop
Underused Python Package: Webbrowser
While I was working on git-browse (post coming soon), I found out about python’s webbrowser package. It’s a super-simple way of opening a URL in one of many different browsers. Python’s standard library is pretty awesome.
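The whole thing fits in a couple of lines:

import webbrowser

# Open a URL in the user's default browser
webbrowser.open('https://www.albertyw.com/')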
Pax ?
There’s an idea in (popular) political science about Pax Romana, the idea that for several hundred years, Ancient Rome’s preponderance of power created an atmosphere of relative peace both inside and outside its borders. This was supported by a massive, well-organized military whose job it was to enforce Roman law inside its borders and suppress any enemies, internal or external.
This idea has been applied to other cases, in particular Pax Britannica and Pax Americana. In all three of these cases, the hegemon has had a dominating military in a specific battlespace compared to any rival. Rome had superior land armies, Britain had superior navies, and the US has a superior air (and space) force. States were thus geared for supporting these forces and reaping the economic benefits they gave. Rome’s armies sustained Rome’s economies through slaves, Britain through mercantile trade, and the US through rapid global soft and hard power projection.
The question therefore is what does the future hold for hegemonic peace? We’re probably in the middle of a Pax Americana, so during the next few decades we’ll have 1) continued Pax Americana, 2) a shift of power to a different hegemon (as happened to Pax Britannica at the end of the 19th century), or 3) a devolution of the world order into several localized powers that do not necessarily cooperate for mutual peace and gain (as happened after Pax Romana in the Middle Ages). Since military and economic power are tightly linked and interdependent, I believe the new economic shift to the Internet will require a new hegemon to develop and maintain technical and information superiority to challenge the current world order. Many countries, particularly Russia and China, have developed these cyberwarfare capabilities and successfully tested them against adversaries. The US also has significant capabilities, but cyberwarfare is so far in such a nascent state that it’s hard to establish dominance. Hopefully, it won’t require a real test like a repeat of the Napoleonic wars or WWI/WWII.
Golang Review
I’ve been using Go for several projects at Uber and personally. From my experience, I’ve developed some opinions on the Go programming language, from both objective and subjective points of view. Compared to other languages, I find that Go has much to be desired in terms of its language design.
Let’s start off with the nice points. Go is pretty opinionated about its development setup with standardized layouts of packages and build systems. Language-wise, it has a simple, easy-to-learn syntax that can be easily learned by anybody with backgrounds in C, Java, or Python. It’s statically typed, expressive while legible, and has understandable concurrency primitives.
On the other hand, Go has a lot of downsides in terms of environment and language. Environment first. A glaring issue with Go is dependency management. Go’s development layout assumes you’re working on a monorepo (more on this later). If you’re not working on a monorepo and/or you don’t have direct control over your dependencies, you’re going to have a bad time trying to keep your dependencies up to date without breaking things. Many tools have been written to work around this, but the core issue is that many Go packages aren’t themselves versioned. Semantic versioning and even change logs seem to be new and controversial concepts in many Go communities.
My other gripe with Go’s environment is the work it takes to set up and maintain a new project. A go repo not only requires the entire `$GOPATH` to be set up, with attendant dependencies and repositories, but also requires `Makefile`s and a lot of testing boilerplate. Arguably, in a monorepo-style development process, this is a cost that scales `O(1)` instead of `O(n)`, but I feel that in many Go projects, the amount of Bash and scripting language (most commonly python) code exceeds the amount of actual Go code. Maintaining this (often untested and hacky) code is an extra source of work for a Go developer.
As for the language itself, I’ll freely echo the common criticism about Go’s lack of generics and while I have gotten used to the lack of exceptions, this all seems to oversimplify the language. Many nice programming idioms found in other languages (e.g. monads, classes, duck typing) are unavailable in Go due to the restricted feature set.
From these points, I find that Go is a fine language for large teams of average programmers creating large monorepo systems. Go caters to this demographic of programmers by being essentially a compilable, statically-typed BASIC while ignoring the last several decades of Programming Language Theory.
Wadler's Law
In any language design, the total time spent discussing
a feature in this list is proportional to two raised to
the power of its position.
0. Semantics
1. Syntax
2. Lexical syntax
3. Lexical syntax of comments
Tunnel V2
I posted a handy single-line trick to forward connections from one ip/port address to another. I recently had a problem where I needed to forward to a port on a local host, but the service was only listening on the public interface. OpenSSH however tries to be clever and rewrites the public ip to localhost, which isn’t overridable (the tunnel entrance is overridable but not the exit, which is what I was trying to change).
I therefore present tunnel v2, using NodeJS instead of SSH:
'use strict';
var net = require('net');
var process = require('process');
var console = require('console');
// parse "80" and "localhost:80" or even "42mEANINg-life.com:80"
var addrRegex = /^(([a-zA-Z\-\.0-9]+):)?(\d+)$/;
var addr = {
from: addrRegex.exec(process.argv[2]),
to: addrRegex.exec(process.argv[3])
};
if (!addr.from || !addr.to) {
console.log('Usage: <from> <to>'); // eslint-disable-line no-console
throw new Error('Not enough arguments');
}
net.createServer(function onServer(from) {
var to = net.createConnection({
host: addr.to[2],
port: addr.to[3]
});
from.pipe(to);
to.pipe(from);
}).listen(addr.from[3], addr.from[2]);
Adapted from Andrey Sidorov
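Assuming the script is saved as tunnel.js (a name chosen here for illustration), forwarding local port 8080 to a host’s public interface looks like `node tunnel.js 8080 192.168.1.5:80`.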
MultiPens
I have a particular interest in multicolor pens and I’ve developed a taste for them over the years since using the common BIC 4-Color Ballpoint Pen. Those pens only have four basic colors, and the actual tips and ink result in inconsistent sticky ink flow. BIC also makes 4-Color pens with finer points but that only results in spotty writing.
Next up, I’ve also used the Zebra Multi Color Pen which is a similar pen but with an additional pencil included. The ink and ballpoint is also slightly better than the BIC.
For those who optimize for options, there are 6-color and 10-color pens. If the ridiculousness of the number of colors doesn’t keep you from using the pens every day, the problem with having so many colors is that their individual ink sticks are more off-center. This causes more bending of the ink stick and can make the side of a ballpoint tip scrape along paper. It also makes writing more spongy because of the increased room and bend within the pen.
The pen that I use now is the Uni Jetstream Pen. It contains the four standard black, red, green, and blue colors, plus a pencil and eraser. It has some reasonable weight, the writing is consistent, and it looks pretty professional.
SSH Tunnel
This is a handy command to memorize:
ssh -fN -L $port1:$host1:$port2 $host2
This allows you to make requests to localhost on `$port1` that are forwarded through `$host2` to `$host1` on `$port2`.
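For example (with hypothetical hostnames), `ssh -fN -L 5432:db.internal:5432 bastion.example.com` lets you reach `db.internal:5432` as `localhost:5432`, tunneled through `bastion.example.com`.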
That Time I Was a Whitehat Hacker
I’ve been trying to find a replacement for github streaks after they removed them a few months ago. I was pretty happy to find GithubOriginalStreak which had browser plugins for Chrome, Firefox, and Opera. After installing the plugin and noticing it wasn’t correctly reporting streak lengths, I dug into its source code and was surprised to see it was using github gists as a datastore for streak information.
This presented a few problems:
- Github gists aren’t supposed to be used as a high performance database. Github probably rate limits access to its data.
- The packaged browser extensions contain read/write keys to the account that owns the github gist. The GithubOriginalStreak repository itself doesn’t have the keys but the keys are easily extractable from the extensions anyways.
- Neither the gist nor the code does any validation of incoming data before the supposed gist lengths are displayed inline in the Github page.
This last problem was the most critical. A malicious attacker could have gotten write privileges by downloading and unpacking the extension, then modifying the gist to inject an XSS attack into someone else’s browser. The best part is that the gist contains a list of all people who use the extension, so you could target a specific person for XSS.
I talked to the author afterwards and thankfully he was receptive of the feedback. The extension is still using Github gists but is now doing some data validation. With the new profile design, extensions like these shouldn’t be needed anymore.
Comparison of Country and Company GDPs
I’ve noticed that many companies have yearly revenues on the order of those of many (non-insignificant) countries. With countries though, yearly revenues are usually called gross domestic product (GDP). I therefore present a comparison of national revenues and corporate GDPs:
Company? | Rank within type | GDP (billions of USD) | Name |
---|---|---|---|
 | 1 | 18,558 | USA |
 | 2 | 11,383 | China |
 | 23 | 509 | Taiwan |
Y | 1 | 482 | Walmart |
 | 24 | 474 | Poland |
 | 37 | 306 | Israel |
Y | 6 | 305 | Samsung |
 | 38 | 302 | Denmark |
 | 39 | 295 | Singapore |
Y | 7 | 273 | Royal Dutch Shell |
Y | 8 | 270 | Vitol |
Y | 9 | 268 | ExxonMobil |
 | 40 | 266 | South Africa |
 | 42 | 253 | Colombia |
Y | 11 | 245 | Volkswagen |
 | 43 | 235 | Chile |
 | 44 | 234 | Finland |
Y | 12 | 234 | Apple |
 | 45 | 226 | Bangladesh |
This table does not include state-owned companies. Fun fact - Vitol, which has a revenue of $270B, is headquartered in Switzerland, which has a GDP of $652B.
Sources:
Sketching Science
Tech Hiring Misperceptions at Different Companies
I’ve seen many companies’ technical interview processes and I feel many of them are wrong. For almost any advice about technical interviewing, you’re likely to hear the opposite advice from a different person. What people don’t realize is that different types of companies need different types of engineers, and that you can’t select the correct type of engineer if you’re interviewing for the same qualities.
Many people, especially those firmly in the Silicon Valley Blogosphere, opine that you should only hire ninja rockstar jedi engineers (nobody calls them that anymore, but the mindset is still there) for your startup (because you obviously work at a startup, right?) whether your startup is trying to sell a static program analysis engine or be an Uber for X company. The fact is that for most companies, good software will not save a failing company and bad software will not sink a successful company (case in point: Yahoo and your typical government contractor, respectively). Therefore, if you’re in the category of companies whose success doesn’t depend on the strength of your engineering organization (which is most companies), I think you should stop trying to attract really good engineers - they’re going to get bored writing yet another CRUD app and you’re going to be paying a lot more money.
Calculating Rails Database Connections
I recently ran into a problem with calculating the number of database connections used by Rails. It turns out that for a typical production environment, it’s actually hard to find the maximum number of connections that would be made. MySQL and PostgreSQL also have relatively low default maximum connection limits (151 and 100, respectively) which means it’s really easy to get an error like “PG::ConnectionBad: FATAL: sorry, too many clients already.”
After some digging, I believe the formula for getting the maximum number of open connections is to multiply the “pool” value in config/database.yml by the number of processes (workers in Puma). If you’re running sidekiq or another background job processor, you’ll also need to add the number of background processors to your web server’s process count.
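As a minimal sketch of the arithmetic (all numbers hypothetical):

# Hypothetical deployment: pool size 5 in config/database.yml,
# 4 Puma workers, and 2 Sidekiq processes
pool_size = 5
puma_workers = 4
sidekiq_processes = 2

max_connections = pool_size * (puma_workers + sidekiq_processes)
print(max_connections)  # 30, well under MySQL's default limit of 151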
DevOps Reactions
Tuning Postgres
I was trying to tune a PostgreSQL database and found that the default settings for PostgreSQL are optimized for really old computers. If you want to fix that, you can spend an hour reading through documentation, or you can use pgtune. If the CLI isn’t your thing, you can try the web version.
Fibonaccoli
(Romanesco broccoli + Fibonacci)