request / response

a blog about the web, Go, and building things

(by Matt Silverlock)


Building Go Projects on CircleCI

•••

If you follow me on Twitter, you may have noticed I was looking to migrate the Gorilla Toolkit from TravisCI to CircleCI as our build-system-of-choice after TravisCI was bought out & fired a bunch of senior engineers. We’d been using TravisCI for a while, and appreciated the simple config, but realized it was time to move on.

I also spent some time validating a few options (Semaphore, BuildKite, Cirrus) but landed on CircleCI for its popularity across open-source projects, relatively sane (if a little large) config API, and deep GitHub integration.

Requirements

I had two core requirements I needed to check off:

  1. The build system should make it easy to build multiple Go versions from the same config: our packages are widely used by a range of different Go programmers, and have been around since the early Go releases. As a result, we work hard to support older Go versions (where possible) and use build tags to prevent newer Go APIs from getting in the way of that (there’s a short sketch of the build-tag approach just after this list).

  2. Figuring out what went wrong should be easy: a sane UI, clear build/error logs, and deep GitHub PR integration so that a contributor can be empowered to debug their own failing builds. Overall build performance falls into this too: faster builds make for a faster feedback loop, so a contributor is more inclined to fix it now.
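
As promised above, here’s a minimal sketch of that build-tag approach. It’s illustrative only - the file and package names are hypothetical, not from a gorilla package - but it shows how a file that depends on a Go 1.9+ API can be skipped entirely by older compilers:

// +build go1.9

// cache_go19.go (hypothetical): only compiled on Go 1.9 and newer,
// where sync.Map was introduced. A sibling file tagged with
// "// +build !go1.9" would carry a mutex-based fallback for older
// releases.
package cache

import "sync"

// store is safe for concurrent use without an explicit mutex.
var store sync.Map

// Put records a value against a key.
func Put(key, value string) {
    store.Store(key, value)
}

// Get returns the value stored for key, if any.
func Get(key string) (string, bool) {
    v, ok := store.Load(key)
    if !ok {
        return "", false
    }
    return v.(string), true
}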

The Config

Without further ado, here’s what the current (June 2019) .circleci/config.yml looks like for gorilla/mux - with a ton of comments to step you through it.

version: 2.0

jobs:
  # Base test configuration for Go library tests. Each distinct version should
  # inherit this base, and override (at least) the container image used.
  "test": &test
    docker:
      - image: circleci/golang:latest
    working_directory: /go/src/github.com/gorilla/mux
    steps: &steps
      # Our build steps: we checkout the repo, fetch our deps, lint, and finally
      # run "go test" on the package.
      - checkout
      # Logs the version in our build logs, for posterity
      - run: go version
      - run:
          name: "Fetch dependencies"
          command: >
            go get -t -v ./...
      # Only run gofmt, vet & lint against the latest Go version
      - run:
          name: "Run golint"
          command: >
            if [ "${LATEST}" = true ] && [ -z "${SKIP_GOLINT}" ]; then
              go get -u golang.org/x/lint/golint
              golint ./...
            fi
      - run:
          name: "Run gofmt"
          command: >
            if [[ "${LATEST}" = true ]]; then
              diff -u <(echo -n) <(gofmt -d -e .)
            fi
      - run:
          name: "Run go vet"
          command: >
            if [[ "${LATEST}" = true ]]; then
              go vet -v ./...
            fi
      - run: go test -v -race ./...

  "latest":
    <<: *test
    environment:
      LATEST: true

  "1.12":
    # This is the neat trick: the <<: *test references the, well, &test reference
    # we created above. If we want to override any sub-object, we just need to supply
    # that object, and not copy+paste the entire object (at risk of typo, or misconfig)
    <<: *test
    docker:
      - image: circleci/golang:1.12

  "1.11":
    <<: *test
    docker:
      # Version tags here fetch the latest Go patch release by default
      # e.g. "golang:1.11" fetches 1.11.11 automatically.
      - image: circleci/golang:1.11

  "1.10":
    <<: *test
    docker:
      - image: circleci/golang:1.10

  "1.9":
    <<: *test
    docker:
      - image: circleci/golang:1.9

  "1.8":
    <<: *test
    docker:
      - image: circleci/golang:1.8

  "1.7":
    <<: *test
    docker:
      - image: circleci/golang:1.7

workflows:
  version: 2
  build:
    jobs:
      # Self-explanatory: we run all of our builds. We could potentially
      # add a "requires: latest" to the specific version tags, if we wanted to
      # attempt to build latest first (only).
      - "latest"
      - "1.12"
      - "1.11"
      - "1.10"
      - "1.9"
      - "1.8"
      - "1.7"

Pretty straightforward, huh? We define a base job configuration, anchor it at &test, and then merge it back in with <<: *test, overriding just the bits we need to (the Docker image, env vars) without having to repeat ourselves.

By default, the jobs in our workflows.build list run in parallel, so we don’t need to do anything special there. A workflow with sequential build steps can set a requires value to indicate the jobs that must run before it (docs).
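
For example, a hypothetical tweak to the workflow above that gates the versioned builds on “latest” passing first would look something like this:

workflows:
  version: 2
  build:
    jobs:
      - "latest"
      # Each versioned job now waits for "latest" to pass first
      - "1.12":
          requires:
            - "latest"
      - "1.11":
          requires:
            - "latest"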

Note: If you’re interested in what the previous TravisCI config looked like vs. the new CircleCI config, see here.

Go Modules?

If you have a project defined as a Go Module - that is, there’s a go.mod present - then you can make a couple of minor adjustments to the Job definition:

    # Setting the env var will allow the rest of the Go toolchain to
    # correctly enable Module support
    environment:
      GO111MODULE: "on"
    steps: &steps
      - checkout
      - run: go version
      - run:
          name: "Fetch dependencies"
          command: >
            go mod download
      - run: # Rest of the steps here...

  "latest":
    <<: *test
    environment:
      LATEST: "true"
      # Since we re-define the "environment" here, we need to add "GO111MODULE=on" back
      GO111MODULE: "on"

  "1.12":
    <<: *test
    docker:
      - image: circleci/golang:1.12

If you’re also vendoring dependencies with go mod vendor, then you’ll want to make sure you pass the -mod=vendor flag to go test or go build as per the Module docs.
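
For example (a sketch, assuming a committed vendor/ directory), the fetch and test steps would become something like:

      # With vendored dependencies there's nothing to fetch: pass
      # -mod=vendor so the toolchain reads from vendor/ directly.
      - run: go test -mod=vendor -v -race ./...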

Other Tips

A few things I discovered along the way:

  • Building from forks is not enabled by default - e.g. when a contributor submits a PR from their fork (the typical workflow). You’ll need to turn it on explicitly.
  • Enable GitHub Checks to get deeper GitHub integration and make it easier to see build status from within the Pull Request UI itself (example).
  • Updating the CI config on 10+ projects is not fun, and so I wrote a quick Go program that templates the config.yml and generates it for the given list of repos.
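
That generator boils down to Go’s text/template package: here’s a minimal sketch (not the actual tool - the template filename and repo list are stand-ins):

package main

import (
    "os"
    "path/filepath"
    "text/template"
)

func main() {
    // config.yml.tmpl and the repo list below are hypothetical stand-ins.
    tmpl := template.Must(template.ParseFiles("config.yml.tmpl"))
    for _, repo := range []string{"mux", "handlers", "websocket"} {
        dir := filepath.Join(repo, ".circleci")
        if err := os.MkdirAll(dir, 0755); err != nil {
            panic(err)
        }
        f, err := os.Create(filepath.Join(dir, "config.yml"))
        if err != nil {
            panic(err)
        }
        // Render the shared template with any per-repo values.
        if err := tmpl.Execute(f, struct{ Repo string }{repo}); err != nil {
            panic(err)
        }
        f.Close()
    }
}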

In the end, it took a couple of days to craft a decent CircleCI config (see: large API surface), but thankfully the CircleCI folks were pretty helpful on that front. I’m definitely happy with the move away from Travis, and hopefully our contributors are too!


Connecting to a Coral TPU Dev Board with Windows

•••

The Coral Dev Board is a TPU-enabled development board for testing out machine learning models that need near-real-time inference: for instance, image classification or object detection on video feeds, where a CPU would struggle to keep up.

However, the dev board’s setup instructions only document a native Linux flashing process. It’s entirely possible to flash the boot image from native Windows (without WSL): here’s how!

Pre-requisites

You’ll need to install a few things: this is mostly a process of clicking “next” a few times (the USB drivers) and unzipping a folder (the Android Platform Tools).

You should also be moderately familiar with serial consoles & have read through the Coral’s setup instructions to familiarize yourself with the process.

Note: It’s important to make sure you’re using a data-capable USB-C cable when connecting to the USB-C data port. Like many things USB-C / USB 3.x, this can be non-obvious at first. You’ll know it’s working when a “⚠ USB Download Gadget” shows up in the Device Manager. If you use a power-only cable, nothing will show up, and it’ll seem as if the OS isn’t seeing the device.

Connecting to the Serial Console

Mostly identical to the Coral setup instructions:

  1. Connect to the dev board’s micro-USB port, and identify the COM port the device is attached to in the Device Manager by looking under “Ports (COM & LPT)” for the “CP2105 USB to UART (Standard)” device. In my case, it was COM3.
  2. Power on the board by connecting the USB-C power cable to the power port (furthest from the HDMI port).
  3. Open PuTTY, select “Serial” as the connection option, set the COM port to the one you identified above, and the data rate to 115200bps. For confirmation, the serial comms settings should be at 8 data bits, no parity bits, 1 stop bit and XON/XOFF flow control.

The serial port on the dev board accepts other settings, but I’m documenting an explicit list for those who don’t have a background in serial comms.

You should now be at the dev board’s uboot prompt, and ready to flash the bootloader & disk image. If not, check that the board is powered on, that the COM port is correct, and that the Device Manager lists the device.

Flashing the Board

Connect the USB-C data cable to the dev board, and the other end to your PC.

In the Device Manager, you’ll see a “USB Download Gadget” appear with a warning symbol. Right click, choose “Update Driver”, select “Browse my computer for driver software” and then “Let me pick from a list of available drivers from my computer”. In the driver browser, choose “WinUsb Device” from the left side, and “ADB Device” (Android Debugger) from the right. Click “Next” and accept the warning. The Device Manager will refresh, and show the device under “Universal Serial Bus devices”.

To confirm it’s configured correctly and visible to the OS, head back to your command prompt and enter:

λ fastboot devices
122041d6ef944da7        fastboot

If you don’t see anything, confirm the device is still showing in the Device Manager, and that you have the latest version of fastboot from the Android Platform Tools (linked above).

From here, you’ll need to download and unzip the bootloader image and the disk image (identical to the official instructions), and confirm you see the contents below:

λ curl -O https://dl.google.com/aiyprojects/mendel/enterprise/mendel-enterprise-beaker-22.zip
λ unzip mendel-enterprise-beaker-22.zip
λ cd mendel-enterprise-beaker-22
λ ls
    boot_arm64.img  partition-table-16gb.img  partition-table-8gb.img  rootfs_arm64.img
    flash.sh*       partition-table-64gb.img  recovery.img             u-boot.imx

Unfortunately, the flash.sh script is a Bash script, which won’t work for us - but we can easily replicate what it does:

λ tail -n 15 flash.sh
fi

# Flash bootloader
${FASTBOOT_CMD} flash bootloader0 ${PRODUCT_OUT}/u-boot.imx
${FASTBOOT_CMD} reboot-bootloader

# Flash partition table
${FASTBOOT_CMD} flash gpt ${PRODUCT_OUT}/${PART_IMAGE}
${FASTBOOT_CMD} reboot-bootloader

# Flash filesystems
${FASTBOOT_CMD} erase misc
${FASTBOOT_CMD} flash boot ${PRODUCT_OUT}/boot_${USERSPACE_ARCH}.img
${FASTBOOT_CMD} flash rootfs ${PRODUCT_OUT}/rootfs_${USERSPACE_ARCH}.img
${FASTBOOT_CMD} reboot

Where we see “FASTBOOT_CMD” we simply run fastboot - and where we see USERSPACE_ARCH we only have one choice for the dev board: arm64. We can work with this.

In the serial console (e.g. in PuTTY), put the dev board into fastboot mode:

fastboot 0

Then, in the command prompt and from within the mendel-enterprise-beaker-22 directory, invoke the following commands. You should leave the serial console connected: you’ll see the progress of each step.

fastboot flash bootloader0 u-boot.imx
fastboot reboot-bootloader
 
fastboot flash gpt partition-table-8gb.img
fastboot reboot-bootloader

fastboot erase misc
fastboot flash boot boot_arm64.img
fastboot flash rootfs rootfs_arm64.img
fastboot reboot

When the device reboots, you’ll get a more familiar Linux login prompt in the serial console! Enter mendel (username) and mendel (password) to log in, and then follow the steps within the official documentation to set up network connectivity! You’ll then be able to log into the board remotely via SSH, and will only need to connect it to power unless you want to flash it again.

Beyond that: enjoy experimenting & building things on your Coral Dev Board! And if you run into issues, or find something unclear in these instructions, you can reach me on Twitter at @elithrar.


Updating Kubernetes Deployments on a ConfigMap Change

•••

Update (June 2019): kubectl v1.15 now provides a rollout restart sub-command that allows you to restart Pods in a Deployment - taking into account your surge/unavailability config - and thus have them pick up changes to a referenced ConfigMap, Secret or similar. It’s worth noting that you can use this with clusters older than v1.15, as it’s implemented in the client.

Example usage: kubectl rollout restart deploy/admission-control to restart a specific deployment. Easy as that!

One initially non-obvious thing to me about Kubernetes was that changing a ConfigMap (a set of configuration values) is not detected as a change to Deployments (how a Pod, or set of Pods, should be deployed onto the cluster) or Pods that reference that configuration. If you expect otherwise, you can end up with unintentionally stale configuration that persists until the Pod spec itself changes. This could include freshly created Pods due to an autoscaling event, or even restarts after a crash, resulting in misconfiguration and unexpected behaviour across the cluster.

Note: This doesn’t impact ConfigMaps mounted as volumes, which are periodically synced by the kubelet running on each node.

Updating the ConfigMap and running kubectl apply -f deployment.yaml results in a no-op, which makes sense if you consider the impacts of an unintended config change and rollout in a larger deployment.

But, there are certainly cases where we want to:

  • Update a ConfigMap
  • Have our Deployment reference that specific ConfigMap version (in a version-control & CI friendly way)
  • Roll out a new revision of our Deployment

So how can we accomplish that? It turns out to be fairly straightforward, but let’s step through an example.

Example

Our ConfigMap, applied to our Kubernetes cluster:

➜  less demo-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: demo-config
  namespace: default
data:
  READ_TIMEOUT_SECONDS: "15"
  WRITE_TIMEOUT_SECONDS: "15"
  NAME: "elithrar"
➜  kubectl apply -f demo-config.yaml
configmap/demo-config created

And here’s our Deployment before we’ve referenced this version of our ConfigMap - notice the spec.template.metadata.annotations.configHash key we’ve added. It’s important to note that modifying a top-level Deployment’s metadata.annotations value is not sufficient: a Deployment will only re-create our Pods when the underlying Pod template (spec.template) changes.

This is how we’ll couple the Deployment with our ConfigMap, triggering a change in our Deployment only when our ConfigMap actually changes.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-deployment
  labels:
    app: config-demo-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: config-demo-app
  template:
    metadata:
      labels:
        app: config-demo-app
      annotations:
        # The field we'll use to couple our ConfigMap and Deployment
        configHash: ""
    spec:
      containers:
      - name: config-demo-app
        image: gcr.io/optimum-rock-145719/config-demo-app
        ports:
        - containerPort: 80
        envFrom:
        # The ConfigMap we want to use
        - configMapRef:
            name: demo-config
        # Extra-curricular: We can make the hash of our ConfigMap available at a
        # (e.g.) debug endpoint via a fieldRef. Note that the fieldPath below is
        # resolved from the Pod's own perspective.
        env:
          - name: CONFIG_HASH
            valueFrom:
              fieldRef:
                fieldPath: metadata.annotations['configHash']

With these two pieces in mind, let’s create a SHA-256 hash of our ConfigMap. Because this hash is deterministic (the same input == same output), the hash only changes when we change our configuration: making this a step we can unconditionally run as part of our deployment (CI/CD) pipeline into our Kubernetes cluster.

Note that I’m using yq (a CLI tool for YAML docs, like jq is to JSON) to modify our Deployment YAML at a specific path.

➜  yq w demo-deployment.yaml spec.template.metadata.annotations.configHash \
>  $(kubectl get cm/demo-config -oyaml | sha256sum | awk '{print $1}')
...
spec:
  ...
  template:
    metadata:
      ...
      annotations:
        configHash: 4431f6d28fdf60c8140d28c42cde331a76269ac7a0e6af01d0de0fa8392c1145

We can now re-deploy our Deployment, and because our spec.template changed, Kubernetes will detect it as a change and re-create our Pods.
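
For example (the exact output will vary with your cluster state):

➜  kubectl apply -f demo-deployment.yaml
deployment.apps/demo-deployment configured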

As a bonus, if we want to make a shortcut for this during development/local iteration, we can wrap this flow in a useful shell function:

# Invoke as hash-deploy-config deployment.yaml configHash myConfigMap
hash-deploy-config() {
  yq w $1 spec.template.metadata.annotations.$2 \
  $(kubectl get cm/$3 -oyaml | sha256sum | awk '{print $1}')
}

Accidentally From macOS to Windows and WSL

•••

Update (June 2019): Much of this post remains true, and is still relevant ~six months later (slow IO perf, WSL feeling like a shim), but there are some major improvements just around the corner.

Specifically, with WSL2 moving to a VM-based architecture, a lot of the perf woes are scheduled to disappear. With VS Code’s Remote extension, the “two halves of the same system” problem - where you have to duplicate your toolchain - is effectively gone (this has been my favorite improvement so far, by a long shot). On the terminal front, we’re almost there: Alacritty still (unfortunately) struggles with Unicode glyph rendering on Windows, but Microsoft has open-sourced their own Windows Terminal, and it’s actually really good, even in this preview state.

I’d say that, six months after writing this post, WSL (as it exists in June 2019) is not a replacement for every dev environment just yet. But there have been meaningful steps to make it better, and I’m fighting the “shim” less and less now with WSL2 & the Remote extension. macOS is still likely the best ‘default’ choice for many, but it’s good to have options.

It’s been ~5 months since I’ve used macOS proper, after 13+ years of personal use and a handful of work-use. This began when I started using my Windows “gaming” desktop & WSL (Windows Subsystem for Linux) for maintaining OSS projects & other dev-work—in-between dungeons or rounds of Overwatch—purely out of the convenience of being on the same machine.

It came to a head when I realized my 12” MacBook was collecting dust, that I wasn’t using it at work (ChromeOS + Crostini), and when I saw the Surface Pro 6 on sale. I decided to see if I could live with WSL closer to full-time and, critically, go without macOS. And so I put it up on Craigslist, sold it that weekend, and unpacked the Surface Pro a week later.

I did it partially as an experiment: Windows has seen some significant improvements as an OSS development platform over the last couple of years. Could I use it for writing Go, [an increasing amount of] data science / SQL / ML explorations, and testing new cloud infrastructure? Could it really compete with the macOS developer experience, which although not perfect, is pretty darned good? I figured it wouldn’t hurt to try it out, seeing as I was most of the way there already - and it’d be a worthwhile process to document for other developers curious about WSL.

If you’re considering the switch, or are just curious as to what it’s like—including how WSL integrates with Windows, what tool choices you have, and importantly, what you’re going to miss from macOS—then read on.

Side-note: I wrote a short guide around my original WSL-based setup a while ago. Some of this article revises the tool choices I made at the time; the rest of it talks around the general Windows + WSL-experience and how it compares to macOS.

“The Shim”

In short: you effectively have “1.5” computers to deal with, and it feels like it at times.

Linux & Windows co-exist via the WSL layer, and although it’s generally pretty great (if not technically impressive), there are parts where the facade peels back to reveal some less-than-great interactions.

Jessie Frazelle wrote a great post on how WSL internals work (Windows <-> Linux syscall translation), and touches on some of the challenges I speak to below.

The first, and most obvious, is the way the filesystems interact. You can write to Windows from WSL - e.g. /mnt/c/Users/Matt/Dropbox/ writes to my Dropbox and works as expected, but you can’t read/write files from Windows -> WSL. Thus, accessing Windows from WSL is the “happy” path: anything you download via Chrome, in your Dropbox, on an external drive, etc - is accessible via /mnt/<driveletter>. It’s when you’ve cloned a git repo, use wget/curl -O to pull something down, or are iterating on a $language package in WSL and want to use a Windows-native tool that you’re destined to shuffle things around. I’ve symlinked my core working folders back into the Windows filesystem to make this part a little more livable - e.g. ln -s $USERPROFILE/repos $HOME/repos.

You notice this filesystem gap the most when dealing with Windows-native editors but WSL-based toolchains: in my case, that’s VS Code on Windows and the Go toolchain inside WSL. VS Code doesn’t know how to look for your toolchain & packages inside WSL, and so you either need to live inside of Windows (losing your Linux tooling), or install VS Code inside of WSL, which means losing the ability to open files outside of WSL + native Windows integration. The ‘partial’ solution is to use a shared $GOPATH within the Windows filesystem, which at least means your packages only need to be fetched once, but you’ll need to be wary of potential differences should a package change implementation across OSes (inc. the standard lib!). This is far less of a solution for systems programmers. There’s an open issue for this as it relates to vscode-go, but it still speaks to the “1.5 computers” problem I mentioned earlier.

Overall? It’s usable, you learn to live with it, but it adds friction to my day-to-day.

Terminal Emulators

I’ve bounced between a few terminal emulators here. None are perfect, and all of them make me yearn for iTerm2 on macOS. I wish the situation were better.

The situation is improving though, and with the ConPTY API in the October 2018 Windows 10 build (1809) making it much easier to integrate existing terminal emulators, it can only improve.

What I’ve tried so far:

  • Cmder (ConEmu): fast & configurable, but poor Unicode support, tmux glitches & some emulation/escaping issues. Some improvements coming via ConPTY.
  • Hyper.js: Cross-platform due to its Electron underpinnings, with lots of third-party plugins. Same underlying emulator as VS Code (xterm.js), but it tends to be very slow to launch and to spawn new shells, and doesn’t keep up with lots of terminal output. I used Hyper for most of this year because, despite the perf issues, it was the least buggy.
  • wsltty (Mintty): Fast. Moderately configurable, but config DSL is a pain & docs are lacking. Not a bad option for most, and is the only one with mouse support for tmux out-of-the-box.
  • Terminus: Similar to Hyper.js in that it’s Electron-based, but faster, and easier to configure. Good font rendering, doesn’t break under tmux, and has a solid tab UI. It’s still innately limited by its Electron roots in that it can be slow to launch, but it handles high-velocity output much better than Hyper.
  • Alacritty: A (very) fast, minimalist cross-OS emulator with a well-documented configuration. Windows support relies on winpty-agent, and font rendering (esp. Unicode fallback) is far from perfect. Support for the aforementioned ConPTY API is in the works, and font rendering changes are coming.

I’m using Terminus for now, but I’m hopeful about Alacritty becoming my default terminal by end of year. Terminus is “good enough despite the bugs”, which has been a good way to sum up how most tools work under WSL.

Automation & Package Management

There were (are) myriad ways to bootstrap a new Mac: usually some combination of Homebrew, a shell script calling defaults write to set preferences, and installation of your dotfiles. Certainly, there are ways to do this on Windows—but something lightweight that doesn’t involve directly hacking at registry keys via PowerShell and has a solid community to crib from has been historically lacking.

Thankfully, there are ways to do this on Windows now, covering both OS-level configuration and desktop package management (via Chocolatey). The answer is Boxstarter, a wrapper around Chocolatey itself, as well as a number of convenience functions for modifying Windows Explorer settings, enabling WSL, and removing the (honestly pretty horrible amount of) bundled applications that Windows comes with. Why does my first-party Microsoft hardware come with a Fitbit app and Candy Crush? (rhetorical; it’s $$$).

Here’s a snippet of what my Boxstarter script looks like:

# Pre
Disable-UAC

# Set PC name
$computername = "junior"
if ($env:computername -ne $computername) {
    Rename-Computer -NewName $computername
}

# Set DNS upstreams
Set-DNSClientServerAddress -InterfaceIndex $(Get-NetAdapter | Where-object {$_.Name -like "*Wi-Fi*" } | Select-Object -ExpandProperty InterfaceIndex) -ServerAddresses "8.8.8.8", "1.1.1.1", "2001:4860:4860::8888", "2001:4860:4860::8844"

# Set environment variables
setx GOPATH "$env:USERPROFILE\go"
setx WSLENV "$env:WSLENV`:GOPATH/p:USERPROFILE/p"

# Install applications
choco install -y sysinternals
choco install -y vscode
choco install -y googlechrome.dev
choco install -y 1password
choco install -y docker-for-windows
choco install -y cmdermini
choco install -y discord
choco install -y spotify
choco install -y dropbox
choco install -y adobereader
choco install -y 7zip.install
choco install -y firacode

# WSL
choco install -y Microsoft-Hyper-V-All -source windowsFeatures
choco install -y Microsoft-Windows-Subsystem-Linux -source windowsfeatures
Invoke-WebRequest -Uri https://aka.ms/wsl-ubuntu-1804 -OutFile ~/Ubuntu.appx -UseBasicParsing
Add-AppxPackage -Path ~/Ubuntu.appx

RefreshEnv
Ubuntu1804 install --root
Ubuntu1804 run apt update
Ubuntu1804 run apt upgrade

# System-level configuration
Disable-BingSearch
Disable-GameBarTips

Set-WindowsExplorerOptions -EnableShowHiddenFilesFoldersDrives -EnableShowProtectedOSFiles -EnableShowFileExtensions
Set-TaskbarOptions -Size Small -Dock Bottom -Combine Full -Lock
Set-TaskbarOptions -Size Small -Dock Bottom -Combine Full -AlwaysShowIconsOn

You’re still going to need to write some PowerShell for more advanced things (e.g. setting DNS servers), but you might also consider that a blessing, given its power.

Within WSL I’m using Linuxbrew, a fork of Homebrew (one that’s on track to merge back into it), in cases where I need more cutting-edge packages beyond the Ubuntu repositories. Using the same brew install workflow I’m used to on macOS is pretty nice, and makes for a friendlier development environment without having to add package-specific repositories or build from source.

Docker

Not much has changed from last time: it works, with a few minor problems.

The docker CLI inside WSL can talk to Docker for Windows (the daemon), so you get Hyper-V benefits there. The catch is that the CLI doesn’t know how to validate the certificates used by the daemon, and thus you either need to disable TLS for connections over localhost (bad), or do a cert-generation dance and edit the Docker for Windows config file by hand to use these new certs. It’d be great if the Docker daemon did this for you, so you could just set DOCKER_CERT_PATH=/mnt/c/ProgramData/Docker/pki and have things work securely.
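
If you do go through that dance, pointing the WSL-side CLI at the daemon is just a matter of environment variables. A sketch, assuming you’ve exposed the daemon on localhost:2376 and copied the client certs to the path below:

# In ~/.bashrc (or similar) inside WSL. Assumes the daemon is exposed
# on localhost:2376 and the client certs live at this path.
export DOCKER_HOST=tcp://localhost:2376
export DOCKER_TLS_VERIFY=1
export DOCKER_CERT_PATH=/mnt/c/ProgramData/Docker/pki

# "docker version" should now talk to the Windows daemon over TLS.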

As a reminder, you don’t get Hyper-V support without Windows Pro, which impacts both Linux Containers on Windows and Windows Containers on Windows (unless you want to use VirtualBox).

What I Miss

I miss FileVault and Apple’s push towards securing the device, especially with their recent Secure Enclave-based improvements: a benefit of verticalizing, really. Windows’ BitLocker continues to be untrustworthy, and I’d be far more worried about a lost Windows machine vs. a lost macOS machine. BitLocker is also awkwardly positioned as a Windows 10 Pro-only feature, which, in 2018, is very much the wrong thing to nickel-and-dime users over. It’s frustrating to buy a Surface Pro and then have to dole out $99 for the Windows Pro upgrade.

macOS’ community of power-user tooling is also unsurpassed: the aforementioned Alfred App as a powerful search tool, great screen-capture tools, Preview.app (the Windows PDF editor landscape is not good), Quick Look, some fantastic design tools, Automator (still good!), easy keyboard shortcut customization (no RegEdit or third-party tools), consistent keyboard shortcuts, upper quartile battery life due to tight software-hardware integration, and a single filesystem no matter whether you’re in a Cocoa app on macOS or a cross-compiled GNU tool inside iTerm2. There’s room for improvement here in both Windows-itself & WSL-land, but much of it is around developer community, and that’s a hard win.

I also want to say that I don’t share the “macOS is dead” sentiment that others do, and that hasn’t been the driver for this change. It’s just that some alternatives have finally started to close the gap, both in terms of software experience & hardware quality/support, and I was in a position to experiment with them.

Why Not All-In on Linux?

I’ll keep this short: I still depend on Lightroom, writing tools (Notion, Evernote prior), a solid default desktop environment, first-party hardware support (be it a MacBook or Surface) & battery life, and most of all, my time. I respect those who’ve invested the time into maintaining & automating a full Linux environment they can use daily, but I just don’t have the time for that investment nor am I ready to make the trade-offs required for it. To each their own.

So, Are You Going to Stick with WSL?

Before I answer: I’d love to see a few things improve, and although I think they will, some improvements will be challenging given that the WSL and Windows environments are distinct. Specifically:

  • Better interaction between filesystems; if I could access my WSL root partition via a (default, NFS) mount in Windows, then I’d have access both ways. Something like //wsl/ or //linux would be fantastic. For contrast, the Linux container environment within ChromeOS (“Crostini”) exposes your files into the native ChromeOS environment, and thus makes working on data across both OS’ a less disruptive process.
  • Improved VS Code interactions with WSL-based tools: pointing at language servers and file paths within the WSL environment would be key to this.
  • A continued march towards a solid terminal emulator or two; I’m hopeful here thanks to the ConPTY changes. Microsoft contributing resources here would likely benefit the viability of WSL.

So, am I going to continue to use WSL as a dev environment?

The answer is a (reserved) yes, because most of the dev-work I do in it is OSS, exploratory or web-based, with tools that I mostly control. If I’d been dealing with the heavily Dockerized environment at my old job, and writing/debugging lots of Lua, the answer might be closer to “no”.

WSL needs another six months of tools development (ConPTY being core to that), and although I’d thought that 6+ months ago, and had hoped the experience would be a little more polished now, at least Microsoft has continued to invest resources into it. I’m not quite convinced that a Linux toolchain makes my life easier than the Darwin-based one in macOS, but here I am.

Still, try asking me again in another 6 months?


Diving Into FiveThirtyEight's "Russian Troll Tweets" Dataset with BigQuery

•••

FiveThirtyEight recently released a dataset of what is believed to be ~3 million tweets associated with “Russian trolls”. These tweets are designed to spread misinformation (let’s not mince words: lies) and ultimately influence voters. If you haven’t read the linked article, I highly suggest you do that before continuing on.

Exploring a ~700MB+ CSV file isn’t hugely practical (it’s since been sharded into < 100MB chunks), and so I’ve made the tweets available as a public dataset via Google’s BigQuery analytics engine. BigQuery has a sizeable free tier of 1TB per month, which should allow a fair bit of exploration, even if you’re a student or if paid services present a challenge for you.

Note: This isn’t a BigQuery & SQL tutorial: for that, take a look at the documentation.

If you’re already familiar with BigQuery & accessing public datasets, then you can simply run the below to start exploring the data:

#standardSQL
SELECT
  author,
  COUNT(*) AS tweets,
  followers
FROM
  `silverlock-bigquery.public_datasets.fivethirtyeight_troll_tweets`
GROUP BY
  author,
  followers
ORDER BY
  tweets DESC,
  followers DESC

For everyone else: read on.

Accessing the Dataset

We’re going to use the BigQuery web UI, so navigate to the BigQuery interface and select the project you want to access it from. You’ll see the fivethirtyeight_russian_troll_tweets dataset appear on the left-hand side, in the Resource tab. From there, you can inspect the table russian_troll_tweets, look at the schema (also pasted below), and see a preview of the data.
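
If you prefer the command line, the bq CLI that ships with the Google Cloud SDK can run the same queries - a sketch, assuming you’ve initialized the SDK against your own project:

bq query --use_legacy_sql=false \
'SELECT author, COUNT(*) AS tweets
FROM `silverlock-bigquery.public_datasets.fivethirtyeight_troll_tweets`
GROUP BY author ORDER BY tweets DESC LIMIT 10'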

name type mode
external_author_id FLOAT NULLABLE
author STRING NULLABLE
content STRING NULLABLE
region STRING NULLABLE
language STRING NULLABLE
publish_date TIMESTAMP NULLABLE
harvested_date TIMESTAMP NULLABLE
following INTEGER NULLABLE
followers INTEGER NULLABLE
updates INTEGER NULLABLE
post_type STRING NULLABLE
account_type STRING NULLABLE
new_june_2018 INTEGER NULLABLE
retweet INTEGER NULLABLE
account_category STRING NULLABLE

So with the data above, what can we do? We can look at how these tweets were amplified (updates), what language the tweet was posted in (what audience was it for?), and the direct audience of the account (followers). We don’t get details on the followers themselves however, which makes it hard to know how impactful the reach was: is it trolls/bots followed by other trolls, or members of the general Twitter populace?

Analyzing It

OK, let’s take a quick look at the data to get you thinking about it. We’ll answer:

  • Was there a specific account with a non-negligible fraction of tweets?
  • Which months saw the most activity?
  • Which tweets were the most amplified in each language?

-- Was there a specific account with a non-negligible fraction of tweets?
SELECT
  author,
  COUNT(*) AS count,
  FORMAT("%.2f", COUNT(*) / (
    SELECT
      COUNT(*)
    FROM
      `silverlock-bigquery.public_datasets.fivethirtyeight_troll_tweets`) * 100) AS percent
FROM
  `silverlock-bigquery.public_datasets.fivethirtyeight_troll_tweets`
GROUP BY
  author
ORDER BY
  percent DESC
LIMIT
  10

The EXQUOTE account was definitely a sizeable contributor, although there’s not an order-of-magnitude difference across the top 10.

author count percent
EXQUOTE 59652 2.01
SCREAMYMONKEY 44041 1.48
WORLDNEWSPOLI 36974 1.24
AMELIEBALDWIN 35371 1.19
TODAYPITTSBURGH 33602 1.13
SPECIALAFFAIR 32588 1.10
SEATTLE_POST 30800 1.04
FINDDIET 29038 0.98
KANSASDAILYNEWS 28890 0.97
ROOMOFRUMOR 28360 0.95

-- Which months saw the most activity?
SELECT
  FORMAT("%d-%d", EXTRACT(month
    FROM
      publish_date), EXTRACT(year
    FROM
      publish_date) ) AS date,
  COUNT(*) AS count
FROM
  `silverlock-bigquery.public_datasets.fivethirtyeight_troll_tweets`
GROUP BY
  date
ORDER BY
  count DESC
LIMIT
    10

Unsurprisingly here, we see October 2016 (just prior to the election on Nov 8th) feature prominently, as well as August 2017, in which the North Korea conversation escalated immensely.

date count
8-2017 191528
12-2016 155560
10-2016 152115
7-2015 145504
4-2017 136013
1-2017 135811
11-2015 132306
3-2017 128483
11-2016 123374
8-2015 119454

-- Which tweets were the most amplified (likes, retweets) by language?
SELECT
  language,
  content,
  updates
FROM (
  SELECT
    language,
    content,
    updates,
    RANK() OVER (PARTITION BY language ORDER BY updates DESC) AS tweet_rank
  FROM
    `silverlock-bigquery.public_datasets.fivethirtyeight_troll_tweets`
  GROUP BY
    language,
    updates,
    content ) troll_tweets
WHERE
  tweet_rank = 1
GROUP BY
  language,
  content,
  updates
ORDER BY
  updates DESC
LIMIT
  10

I’ll leave analyzing these tweets as an exercise for the reader, but they certainly appear to prey on hot-button issues in a few places. Note that I’ve truncated the output here for brevity. Also, be mindful of any links you follow: I have not vetted them.

language truncated_content updates
English ‘@JustinTrudeau Mr. Trudeau, Canadian citizens dem 166113
Turkish KARMA, KARMA, KARMA!!! https://t.co/Eh5XUyILeJ 165833
Catalan ‘@HCDotNet Excellent! 🇺🇸👍🏻😆’ 165751
Farsi (Persian) Shameful https://t.co/rll2JrUzRI 165468
Dutch Trump’s tweets. #ThingsITrustMoreThanCNN https:/ 165407
Norwegian #2018PredictionsIn5Words Pro-Trump landslide 165371
Vietnamese So sad. @TitosVodka rocks!! https://t.co/sWtLlZxL5 164288
Lithuanian Stump for Trump @Stump4TrumpPac https://t.co/S0NS9 164082
Estonian #QAnon @Q #FOLLOWTHEWHITERABBIT 🐇 #FLYSIDFLY# 163448
Croatian ‘@FoxNews @rayann2320 @POTUS Bravo Mr President!!’ 163126

Wrap

There’s a lot of data to explore here, but it’s also worth keeping in mind that three million tweets is only a small fraction of the tweets associated with this kind of content, and this kind of bounded data collection may have some subjectivity to it.

If you have any questions about the dataset itself, you should open an issue on FiveThirtyEight’s GitHub repository. As for questions about exploring it via BigQuery: feel free to tweet @elithrar with your questions or explorations!


