#golang: The Next Great Teaching Language

The first few months spent learning a programming language are incredibly important. If a teacher finds the perfect balance of difficulty, creativity, and magic, she’ll reward her students with a beautiful skill. If the learning process becomes too difficult, however, the student might forsake programming, missing out on the opportunity of a lifetime. Thus, it’s paramount that a student’s first programming language provides an almost perfect experience: it should be easy for a complete novice to pick up, while also offering a broad view of computing.

Many languages have spent time as academia’s favorite teaching language. For much of the ’80s and ’90s, a Lisp dialect, Scheme, held that title: its minimal syntax, combined with excellent teaching materials (SICP), made it a favorite at many universities. In the late ’90s, Java took its place by offering unparalleled portability, familiar C-like syntax, and broad applicability. Today, many universities introduce students to programming with dynamic languages, such as Python, so that students are not bogged down with too much “low-level” work. My alma mater, Harvey Mudd College, hedges its bets by teaching all three languages in the first year of the computer science program. Now, a very viable competitor is starting to emerge.

In 2009, Google introduced the programming language Go, which promised compiled-language performance with dynamic-language productivity. The language also has a serious pedigree: two of its co-creators are Rob Pike and Ken Thompson, who were heavily involved in the invention of the UTF-8 encoding and the Plan 9 operating system. Although Go was designed to solve computing problems at Google’s scale, it also introduces many key features that, in my opinion, make it well suited to be the next great teaching language.

Minimal Syntax

Go has a minimal syntax. It is so minimal, in fact, that the entire specification of the language fits on a single web page. This makes a huge difference when teaching, because students can learn the complete language quickly. When I was a sophomore in college learning C++, I never felt like I actually knew the language. After every lecture, we would learn another obscure C++ feature that only created confusion when it was actually time to program. New language features take practice to learn correctly, and if you shove too many of them in front of a novice, they’ll quickly become overwhelmed.
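
To give a sense of just how small the language is, here is a complete, runnable Go program of the sort a student might write in their first hour. (The message is my own arbitrary choice, not from any particular curriculum.)

A Complete Go Program
package main

import "fmt"

// main is the entry point of every Go program.
func main() {
    fmt.Println("Hello, class!")
}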

Static Typing

If you’re coming from a dynamic language background, static typing might seem tedious and unnecessary. As your program becomes larger and more complex, however, static typing becomes an indispensable aid for refactoring and determining correctness; the compiler becomes an asset. In dynamic programming languages, you depend on unit tests for ensuring correctness throughout the development process. Although testing is still required in static languages, you can skip a lot of boilerplate tests because the compiler verifies that the types in your program line up, catching a whole class of mistakes before the code ever runs. This results in confident programming, of which new programmers need plenty.
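
As a small illustration, here is a deliberately broken sketch of my own. The mistake below never reaches runtime; the compiler rejects the program outright, which is exactly the kind of early feedback a new programmer benefits from.

A Mistake the Compiler Catches
package main

import "fmt"

func main() {
    age := 21              // age is inferred to be an int
    fmt.Println(age + "!") // compile error: mismatched types int and string
}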

Concurrent Programming

The end of Moore’s Law is approaching: we’re starting to reach physical limits on how fast we can make our processors. Consequently, vendors now scale out processor architectures over multiple cores. Taking advantage of modern multicore and multiprocessor hardware, however, is not easy: doing so involves refactoring existing code to be thread- or process-aware—a considerable task for new programmers. Go abandons these traditional models and implements its own style of concurrency using goroutines and channels, which essentially function as lightweight threads and typed queues. By providing these primitives as first-class members of the language, Go makes concurrent programming feel like a natural extension of ordinary code. Armed with these concepts, new programmers can create modern, performant applications much earlier.
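
To make this concrete, here is a minimal sketch of goroutines and channels working together; the function and variable names are my own invention. Each call to square runs concurrently, and the channel both collects the results and synchronizes the goroutines.

Goroutines and Channels
package main

import "fmt"

// square computes n*n and sends the result down the channel.
func square(n int, results chan<- int) {
    results <- n * n
}

func main() {
    results := make(chan int)
    for i := 1; i <= 3; i++ {
        go square(i, results) // each call runs in its own goroutine
    }
    for i := 0; i < 3; i++ {
        fmt.Println(<-results) // receive blocks until a result is ready; order may vary
    }
}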

Developer Tools

Developer tools are easy to ignore when learning a programming language. Tools such as profilers, formatters, race detectors, and documentation generators are not critical when writing basic programs, but they become essential very quickly. By bundling all of these tools with the standard distribution, Go provides every facility necessary to jump into serious software development.
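
For a taste, here are a few of the bundled tools, run against a hypothetical main.go in a hypothetical project. Every one of them ships with the standard distribution; nothing extra to install.

A Few Bundled Tools
gofmt -w main.go     # rewrite the file in the one canonical Go style
go vet ./...         # report suspicious constructs the compiler allows
go test -race ./...  # run the tests with the race detector enabled
godoc fmt            # print the documentation for the fmt package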

Comprehensive Standard Library

Out of the box, Go is suited to a wide variety of programming tasks. Everything from a web application to a database can be implemented without turning to third-party libraries. This is advantageous because aspiring programmers do not have to concern themselves with the tedious task of finding and updating dependencies. In Go, generally bug-free implementations of many common programming libraries are built in. And since those implementations ship with the language, you can depend on their interfaces remaining stable over the long term.
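
As a quick sketch of what “out of the box” means in practice, the following is a complete web server using only the standard library’s net/http package. (The port and response text are arbitrary choices of mine.)

A Web Server from the Standard Library
package main

import (
    "fmt"
    "log"
    "net/http"
)

// hello responds to every request with a short greeting.
func hello(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintln(w, "Served entirely by the standard library")
}

func main() {
    http.HandleFunc("/", hello)
    log.Fatal(http.ListenAndServe(":8080", nil)) // serve on port 8080 until an error occurs
}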

It Factor

Programming in Go feels right. The language feels well architected, and the libraries offer precise documentation and clean interfaces. I know this is hard to explain and quantify, so I’ll leave it as an exercise for the reader to confirm. I have a feeling you’ll agree.


Scaling Static Sites to 2,000 Concurrent Connections (or How to Survive Hacker News)

Every serious developer has heard of the C10k problem, the networking challenge of serving 10,000 simultaneous connections (essentially, users) on a single web server. While accomplishing this feat is an impressive technical achievement and a cost-effective solution for web sites of a certain scale, it requires many infrastructural changes. For some, solving the C10k problem might involve switching web servers; for others, it might involve using a different programming language or paradigm.

In this blog post, I will not claim to solve the C10k problem. Instead, I will show you how to support 2,000 concurrent connections on a static site by tuning Linux (Ubuntu 14.04) and Apache (2.4.7). Armed with a $20-a-month DigitalOcean droplet, I was able to sustain 2,000 concurrent connections and 1,900 hits per second on this blog. Over an entire day, that throughput translates to more than 164,000,000 total requests (1,900 hits per second × 86,400 seconds)—more than enough to deal with seasonal load or a popular article on Hacker News.

Open File Limits

In the UNIX abstraction of a computer, all input and output operations are performed on files. This design decision presents programmers with a unified I/O interface, leaving the hard work to the operating system. As a result, each new network connection we want to accommodate corresponds to an open file, and the operating system maintains a per-user limit on the number of files that can be “open” at any given time. You can check this limit on the command line by running (as root):

Open File Limit
ulimit -n

If you’re running a stock Ubuntu machine, the output of this will likely be 1,024. You can then check to see how many file descriptors you already have open by running:

Open Files for Current User
lsof | grep $USER | wc -l

On a stock Ubuntu 14.04 machine, this last command outputs 568, leaving me only 456 additional file descriptors. If I want to support 2,000 simultaneous HTTP connections, I’m more than 1,500 short!

To change this setting, we want to raise the root user’s open files limit (Apache is started by root, and its worker processes inherit the limit). To do so, add the following lines to /etc/security/limits.conf:

root   soft    nofile  100000
root   hard    nofile  100000

The first setting is considered the “soft”, or initial, value. A running process can adjust its own open files limit up to the “hard” limit, which corresponds to the second setting. We’ll set these to 100,000 to allow the root user to open up to 100,000 files at a time if it really wants to, which should be more than enough for handling 2,000 concurrent connections.
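
You can see the distinction for yourself from a shell. (The 4,096 below is an arbitrary value I chose for illustration; anything up to the hard limit works.)

Soft vs. Hard Limits
ulimit -Sn       # print the soft limit, the value enforced right now
ulimit -Hn       # print the hard limit, the ceiling a process may raise itself to
ulimit -n 4096   # raise this shell's soft limit, allowed up to the hard limit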

In order for these settings to actually be enforced, we’ll have to add a new module to the system’s PAM (Pluggable Authentication Modules) configuration. In /etc/pam.d/common-session, add this line:

session required pam_limits.so

This module will ensure that the root user’s open file limits are set to the settings in /etc/security/limits.conf upon login.

After restarting Ubuntu, ulimit -n should now output 100,000.

Tuning Apache

In order to have Apache scale to this number of connections, we’ll use its worker MPM (multi-processing module). This module makes Apache both a multi-process and a multi-threaded server. For each incoming request, Apache uses a free thread on one of its running processes; when all threads are busy, Apache spawns a new process. This design uses fewer system resources than a process-per-request model because threads are cheaper to create and do not carry the memory overhead of a whole new process.

To enable this module, run

Enable MPM Module
sudo a2enmod mpm_worker
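
One caveat: Apache 2.4 allows only one MPM to be loaded at a time, so if a2enmod reports a conflict, you’ll need to disable whichever MPM is currently enabled first. On a stock install this is often the event MPM, but check /etc/apache2/mods-enabled to see which one you actually have:

Disable a Conflicting MPM
sudo a2dismod mpm_event   # substitute the MPM your install actually has enabled
sudo a2enmod mpm_worker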

Now, you’ll want to modify /etc/apache2/mods-enabled/mpm_worker.conf to match this:

<IfModule mpm_worker_module>
        # Use 2 processes initially
        StartServers 2

        # Use up to 32 processes
        ServerLimit 32

        # Always have 32 available threads
        MinSpareThreads 32

        # Keep a max of 256 available threads
        MaxSpareThreads 256

        # Enforce a maximum of 64 threads per process
        ThreadLimit 64

        # Use exactly 64 threads per process
        ThreadsPerChild 64

        # Use up to 2048 total threads
        MaxRequestWorkers 2048

        # Keep processes serving requests indefinitely
        MaxConnectionsPerChild 0
</IfModule>

Afterward, restart Apache with sudo service apache2 restart. Theoretically, we now have Apache configured to serve up to 2,048 simultaneous connections.
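
Rather than taking the configuration on faith, you can ask Apache directly which MPM loaded; it should report “Server MPM: worker”:

Verify the Active MPM
apache2ctl -V | grep -i mpm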

Benchmarking with Blitz.io

Blitz.io is a wonderful site for benchmarking web servers: it generates real Internet load from many, possibly international, client machines, and it offers a free tier. Using the service, we can very quickly throw 2,000 simultaneous connections at a web site.

After a test run with 2,000 simultaneous connections from California, here’s what the results showed:

The test started with a single connection and progressed to 2,000 over the course of 2 minutes. Overall, there were 0 timeouts and 0 errors. On an unconfigured machine receiving 1,000 simultaneous Blitz.io connections, I received an error rate of 13.0%, and 6% of requests timed out. Configuration makes a big difference.
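
If you’d like a rough sanity check of a configuration like this without setting up a Blitz.io test, ApacheBench (the ab tool that ships with Apache) can generate concurrent load from a second machine. This is single-source traffic, so it’s an approximation of real-world load at best, the URL below is a placeholder, and the client machine needs its own open file limit raised too.

A Local Approximation with ApacheBench
ab -n 20000 -c 2000 http://your-site.example/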

This post should leave you with the right tools to achieve similar scalability feats. Although I didn’t achieve 10,000 simultaneous connections with Apache, I was able to sustain a more realistic 2,000 connections, which is probably more than enough for Hacker News.