Some ideas and thoughts on recruiting for tech positions, no technical background required.

Our take on the “final interview” for developers

At what point in the tech interview process do you determine whether a candidate is the right fit for your team? He or she has passed the initial screenings and interviews, and seems to possess the right skills to fill the needed role. If we consider this a stage-gate process, what should the final gate be?

At Unacast, we are experimenting with an interesting take. The question we asked ourselves is basically: how do we know that the candidate will fit into our everyday life at Unacastle and the way we work? This led to a rough analysis of what we actually do during a typical workday.

Being a startup in a highly exploratory space, those of us working on the tech side spend a lot of time researching, testing and discussing different approaches to the problems at hand before implementing production code. This means that we often need to learn new languages, frameworks and paradigms.

Pair programming

A common approach in development is pair programming, which is basically what it sounds like: two developers working on the same problem on one computer. One could also call it pair problem-solving, and research has shown that working in pairs often yields better solutions faster. This is especially true when learning new things or attacking unknown problems.

We therefore decided to invite candidates who had passed all the other stages to a “night at the Unacastle”, where the candidate would be paired up with one of our developers to work for a couple of hours on a given problem.

An important twist is that the problem to work on is unknown both to our developer and to the candidate. That means that both start pretty much from scratch and have to approach the problem together, stitching together a solution by googling, reading documentation and discussing. And usually a pretty solid dose of Stack Overflow.

In our experience, after a little while everybody seems to forget that this is actually an interview, and people let their guard down - let’s get some shit done! It has been interesting to see how people react when they are being questioned, or how they argue that one approach is better than another. In our opinion, the real goal of an interview is to let candidates show their true selves, and their skills. We have found that this interview structure makes that happen because it creates a relaxed environment. Considering that some developers can have an analytical and introverted personality (and by the way, being introverted or extroverted is neither a positive nor a negative trait - they are just different), in addition to the natural nervousness connected to applying for a new job, an approach that creates a playing field that feels natural is a huge win.

Some guidelines and ethical perspectives

We value the candidate’s time and effort, so it would be unethical to work on internal products or features, which would basically mean that candidates did free work for us. Our solution has been to build something just for fun that could be open sourced at some point. Obviously it can be something useful to us, but it should also have the potential to be useful to other people, and not be tightly integrated with or related to our systems or codebase. Usually, we build something in a new, exciting language or framework that we may have looked at in our spare time but haven’t found the time to dig into. It’s a win-win!

One could fear that the candidate will find that we are not at the level they expect, but that is ok. An interview process goes both ways; it is just as much about the candidate getting a real impression of our company, our people and how we do things.

Takeaways and learnings

On the practical side, you need at least one developer internally who is available. Ideally a few more, because then the team can just hang around, listen in on the conversation casually and drop in if they feel like it. Looking at recruiting for non-tech positions in general, case assignments are a widely used method to assess a candidate’s real-life abilities, albeit in a somewhat artificial setting. Coding like we do in “A night at the Unacastle” is in many ways easier to assess, since it is much closer to the actual work. We are a young company that is continuously developing in all aspects, recruiting included. But we feel that we have found something precious here, and we will definitely continue with this practice and keep refining it based on feedback from candidates and our own learnings.

Lastly, although not entirely related to the “Night at the Unacastle”, one of the most important learnings is that sourcing candidates is really, really hard work! Considering that the type of candidates we are looking for (i.e. the A players) usually are happy where they are, it’s all about using networks, seminars, social media and whatever means possible to get hold of the right people. We have also found that working with recruiters is a great help in the actual interview process, but there is no excuse for not making sourcing a team responsibility that everybody feels committed to.

A last little secret that we want to share is that most developers get at least a couple of recruiter calls or emails per month that are easily dismissed. It has a lot more punch to start the conversation with “Hey, I work as a Platform Engineer at Unacast. I think you have done some really interesting stuff, wanna grab a coffee?”

If you feel that you are in the target group and that the last question speaks to you, don’t hesitate to reach out! We would love to spend a night at Unacastle with you.

How do you make sense of all those terabytes of data stuck in your BigQuery database?

Here’s what we tested

We here at Unacast sit on loads of data in several BigQuery databases and have tried several ways of visualizing that data to understand it better. These efforts have mostly been custom JavaScript code as part of our admin UI, but when we read about Re:dash we were eager to test how advanced a visualization we could achieve with an “off the shelf” solution like that. We wanted both charts showing all kinds of numerical statistics retrieved from that data and maps showing us geographical patterns. Re:dash supports this right out of the box, so what were we waiting for?

Getting up and running

Since we run all our systems on Google Cloud we were really happy to discover that Re:dash offers a pre-built image for Google Compute Engine, and they even have one with BigQuery capabilities preconfigured. This means that when we fire up Re:dash in one of our Google Cloud projects, the BigQuery databases in the same project are automatically available as data sources ready to be queried. Awesomeness!!

Apart from booting the GCE image itself, we had to open some firewall ports (80/443) using the gcloud compute firewall-rules create command, add a certificate to the nginx instance running inside the Re:dash image to enable HTTPS, and lastly add a DNS record for easy access.
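For reference, a firewall rule opening those ports could look something like the sketch below; the rule name and target tag are placeholders, so adjust them to your own setup.

  $> gcloud compute firewall-rules create redash-allow-http \
  --allow tcp:80,tcp:443 --target-tags=<redash-instance-tag>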

The final touch was to add authentication using Google Apps so we could log in using our Unacast Google accounts. This also makes access and user control a breeze.

The power of queries

As the name implies, the power of BigQuery lies in queries on big datasets. To write these queries we can (luckily) just use our old friend SQL so we don’t have to learn some new weird query language. The documentation is nothing less than excellent. There’s a detailed section on Query Syntax and then there’s a really extensive list of Functions that spans from simple COUNT() and SUM() via REGEXP_EXTRACT() on Strings and all kinds of Date manipulations like DATE_DIFF(). There’s also beta support for standard SQL syntax

which is compliant with the SQL 2011 standard and has extensions that support querying nested and repeated data

but that’s sadly not supported in Re:dash yet (at least not in the version included in the GCE image we use).

In Re:dash you can utilize all of BigQuery’s querying power and you can (and should) save those queries with descriptive names to use later for visualizations in dashboards. Here’s a screenshot of the query editor and the observant reader will notice that I’ve used Google’s public nyc-tlc:yellow dataset in this example. It’s a dataset containing lots and lots of data about NYC Yellow Cab trips and I’ll use them in my examples because they’re kind of similar to our beacon interaction data as they contain lat/long coordinates and timestamps for when the interaction occurred.
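The screenshot doesn’t carry over here, but a query in that spirit, written in BigQuery’s legacy SQL dialect against the public dataset, might look something like this (the column names are assumed from the public trips table):

  SELECT pickup_datetime, pickup_latitude, pickup_longitude
  FROM [nyc-tlc:yellow.trips]
  LIMIT 1000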

1000 cab trips

It’s worth noting, however, that you don’t get any autocomplete functionality in Re:dash, so if you want to explore the different functions of BigQuery using the tab key you should use the “native” query editor instead. Just ⌘-C/⌘-V the finished query into Re:dash and start visualizing.

Visualize it

Every query view in Re:dash has a section at the bottom where you can create visualizations of the data returned by that specific query. We can choose between these visualization types: [Boxplot, Chart, Cohort, Counter, Map], and here’s how 100 cab trips look on a map:

100 cab trips

When you get a handful of these charts and maps you might want to gather them in a dashboard to e.g. showcase them on a monitor in the office. Re:dash has a dashboard generator where you can choose to add widgets based on the visualizations you have made from your different queries. You can even rename and rearrange these widgets to create the most informative view. Here’s an example dashboard with the map we saw earlier and a graph showing the number of trips for each day in a month. The graph makes it easy to see that the traffic fluctuates throughout the week, with a peak on Fridays.

dashboard

So what’s the conclusion?

Re:dash has been a pleasant experience so far, and it has helped us gain more insight into the vast amount of data we have. We discover new ways to query the data because it’s easier to picture a specific graph or map that we want to produce rather than just numbers in columns. We intend to use this as an internal tool to quickly generate visualizations and dashboards of specific datasets to better understand how they relate to and differ from the other datasets we have.

There are some rough edges, however, that have been bothering us a bit. The prebuilt GCE images aren’t entirely up to date with the latest releases, unfortunately. The documentation mentions a way to upgrade to the newest release, but we haven’t gotten around to that yet. The lack of support for standard SQL syntax in BigQuery is also a little disappointing since that syntax has even better documentation and the feature set seems larger, but it’s not that big of a deal. The biggest problem we have been facing is that the UI sometimes freezes and crashes that tab in the browser. We haven’t pinpointed exactly what causes it yet, whether it’s the size of the result set or the size of the dataset we’re querying. It’s really annoying regardless of the cause because it’s hard to predict which queries will cause Re:dash to crash. Hopefully, this will be solved when we figure out how to upgrade to a newer version or the Re:dash team releases an updated image.

Introduction

Today, I’m writing about concurrency and concurrency patterns in Go. In this blog post I will outline why I think concurrency is important and how it can be implemented in Go using channels and goroutines.

Disclaimer: This post is heavily inspired by “Go Concurrency Patterns”, a talk by Rob Pike.

Why is concurrency important?

Web services today are largely dependent on I/O, whether from disk, a database or an external service. Running these operations sequentially and waiting for them to finish will result in a slow and underperforming system. Most modern web frameworks solve the basic issues for you; that is, without any setup they handle each HTTP request concurrently. But if you need to do something out of the ordinary, like calling a few external services and combining the results, you are mostly on your own.

The two most common models for concurrency that I’ve used are the shared-memory model using threads, as in Java, and callbacks, as used in asynchronous environments like Node.js. I believe that both approaches can be insanely powerful when done right, but also that they’re insanely hard to get right. The shared-memory model shares state/messages through memory using locks and is error-prone to say the least. And asynchronous programming is, at least in my experience, a hard programming paradigm to reason about and especially to master.

Concurrency in Go

Go solves concurrency in a different manner. It’s similar to threads, but instead of sharing messages through memory, Go shares memory through messages. Go uses goroutines to achieve concurrency and channels for passing data between them. We will dig into these two concepts a bit further.

Goroutines

Goroutines are a simple abstraction for running things (functions) concurrently. This is achieved by prepending go to a function call, e.g.:
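The original snippet isn’t reproduced here, but a minimal sketch of the idea could look like this (the say function is made up for illustration):

  package main

  import (
    "fmt"
    "time"
  )

  func say(s string) {
    fmt.Println(s)
  }

  func main() {
    // Prepending "go" runs the call in a new goroutine, so main
    // continues immediately without waiting for it to finish.
    go say("hello from a goroutine")

    // Sleep briefly so the goroutine gets a chance to run before main
    // exits. Real code would synchronise with channels instead.
    time.Sleep(100 * time.Millisecond)
  }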

A good example of the concept can be found here

Channels

Channels are the construct for passing data between goroutines in Go. A channel blocks on both the sending and the receiving side until both are ready, meaning a channel can be used both for synchronising goroutines and for passing data between them. Below is a simple example of how to use channels in Go. The basic idea is that data flows in the same direction as the arrow.
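Again, the original example isn’t included here, so this is a minimal sketch of the concept:

  package main

  import "fmt"

  func main() {
    ch := make(chan string)

    // The goroutine sends into the channel; the arrow points in the
    // direction the data flows.
    go func() {
      ch <- "ping"
    }()

    // Receiving blocks until the goroutine has sent, so the two are
    // synchronised at this point.
    msg := <-ch
    fmt.Println(msg)
  }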

In the example below we see how channels and goroutines can be used to create a function utilising concurrency that is easy to understand and reason about.

Example: using goroutines and channels

First, let’s assume we want to create a service that queries three external services and returns the combined results. Let’s call these three services Facebook, Twitter and Github. For simplicity, we fake the communication with each of these services, such that the result of each service can be found at https://localhost/{facebook, twitter, github}, respectively.

The behaviour we want from GetAllTheThings is to fetch data from all the services defined and combine the results into a list. Let’s start with a naive approach.
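The code from the original post isn’t reproduced here, so the following is a sketch of what the naive version might look like; the package layout is an assumption, while GetAllTheThings, Get(path string) and the service URLs follow the post:

  package naive

  import (
    "io/ioutil"
    "net/http"
  )

  var services = []string{
    "https://localhost/facebook",
    "https://localhost/twitter",
    "https://localhost/github",
  }

  // Get fetches a single resource and returns its body as a string.
  func Get(path string) (string, error) {
    resp, err := http.Get(path)
    if err != nil {
      return "", err
    }
    defer resp.Body.Close()

    body, err := ioutil.ReadAll(resp.Body)
    if err != nil {
      return "", err
    }
    return string(body), nil
  }

  // GetAllTheThings queries each service one by one, waiting for the
  // previous call to finish before issuing the next.
  func GetAllTheThings() ([]string, error) {
    var results []string
    for _, svc := range services {
      res, err := Get(svc)
      if err != nil {
        return nil, err
      }
      results = append(results, res)
    }
    return results, nil
  }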

Above we see an example implementation of the naive approach; in other words, we query each service sequentially. That means that the call to the Github service has to wait for the Facebook service, and the Twitter service needs to wait on both Github and Facebook. Since these services do not depend on each other, we can improve this by performing the requests concurrently. Enter channels and goroutines.

(PS: I’ve ignored handling errors in the concurrent examples. Don’t do this at home. It’s just for pure readability).
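Here is a sketch of what the concurrent version might look like; the import path is hypothetical, and the structure follows the description below:

  package concurrent

  // Hypothetical import path for the naive package sketched above.
  import "github.com/example/getallthethings/naive"

  var services = []string{
    "https://localhost/facebook",
    "https://localhost/twitter",
    "https://localhost/github",
  }

  // GetAllTheThings issues one goroutine per service and collects the
  // results over a channel. The signature matches the naive version,
  // but errors are ignored for readability, as noted above.
  func GetAllTheThings() ([]string, error) {
    ch := make(chan string)

    for _, svc := range services {
      // Pass the loop variable as an argument so each goroutine
      // captures its own copy.
      go func(path string) {
        res, _ := naive.Get(path)
        ch <- res
      }(svc)
    }

    // Collect exactly one result per goroutine. The results may
    // arrive in any order.
    results := make([]string, 0, len(services))
    for range services {
      results = append(results, <-ch)
    }
    return results, nil
  }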

We’ve now modified the naive approach using channels and goroutines. We see that each call is issued inside a goroutine, and that the results are collected in the for-loop at the end. The code can be read sequentially and is therefore easy to reason about. It’s also explicitly concurrent, since we explicitly issue several goroutines. The only caveat is that the results may not be returned in the same order as the goroutines were issued.

Notice that we are still able to use the naive approach for fetching a single resource, naive.Get(path string), and that the signature of the function is exactly the same as before. That is powerful! But does it actually run faster?

In main.go we put everything together and measure execution time to see if it’s actually faster.
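A sketch of such a main.go, with hypothetical import paths for the two versions above:

  package main

  import (
    "fmt"
    "time"

    // Hypothetical import paths for the two sketches above.
    "github.com/example/getallthethings/concurrent"
    "github.com/example/getallthethings/naive"
  )

  func main() {
    // Time the sequential version.
    start := time.Now()
    naive.GetAllTheThings()
    fmt.Println("naive took:", time.Since(start))

    // Time the concurrent version.
    start = time.Now()
    concurrent.GetAllTheThings()
    fmt.Println("concurrent took:", time.Since(start))
  }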

The conclusion is yes, it runs faster. Actually, it runs an order of magnitude faster. If you want to run these experiments yourself, or are just curious about the implementation, the full example project can be found here.

Closing notes

We have shown that it’s easy to utilise concurrency in Go using channels and goroutines. However, this post has simplified a lot, and the caveats you may encounter using channels and goroutines are not fully addressed here. So use channels and goroutines with caution; they can cause a lot of headache if overused. The general advice is to always start by building something naive before optimising.

I hope you have enjoyed reading this post. If I’ve done something unidiomatic, please tell me so in the comments below or on Twitter (@gronnbeck). I’m still learning and having fun with Go, and I’m always eager to learn from you as well.

The 12 factor app is a methodology that unifies the composition and interface of web applications. Additionally, it addresses other aspects of web applications such as scalability and continuous deployment.

The 12 factors

  1. Codebase
  2. Dependencies
  3. Config
  4. Backing services
  5. Build, release, run
  6. Processes
  7. Port binding
  8. Concurrency
  9. Disposability
  10. Dev/prod parity
  11. Logs
  12. Admin processes

Source: 12factor.net

A building block for microservices and container orchestration

There is a wide range of reasons why adoption of the microservices pattern is common today — continuous integration, independent service scalability and organizational scalability are some. Moreover, the 12 factor app is crucial to creating microservices, as it ensures the ability to perform continuous integration and to scale services independently.

Microservices require an infrastructure that offers simple service orchestration and deployment management. Therefore, container and orchestration products like Docker, Kubernetes and friends are absolutely pivotal to making microservices a sensible approach. Container technology packages an application or data into portable containers, whereas orchestration tools manage running clusters of such containers or other portable executables, providing clear interfaces for deployment, resilience and scalability.

To create valuable application containers, 12 factor apps again come in handy by exhibiting traits like the use of backing services, reliance on port binding, and disposability.

Further, container orchestration products are deeply tied to the composition and interface of 12 factor apps. Commonly, these products expect, in addition to the container traits, apps to handle config through the environment, scale via stateless processes and write logs to stdout.
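As a minimal illustration of those three expectations, here is a sketch of a tiny web app in Go; the PORT variable and its fallback value are assumptions for the example:

  package main

  import (
    "log"
    "net/http"
    "os"
  )

  func main() {
    // Config through the environment (factor III): no config files
    // baked into the codebase.
    port := os.Getenv("PORT")
    if port == "" {
      port = "8080" // assumed fallback for local development
    }

    // Logs to stdout (factor XI): the execution environment is
    // responsible for routing and storing the stream.
    log.SetOutput(os.Stdout)

    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
      log.Printf("handling %s %s", r.Method, r.URL.Path)
      w.Write([]byte("hello"))
    })

    // A stateless process exporting its service via port binding
    // (factors VI and VII), ready to be scaled horizontally.
    log.Fatal(http.ListenAndServe(":"+port, nil))
  }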

Looking at the most mature web application platforms of today, App Engine, Heroku and the like, they are bound to 12 factor apps in much the same way as most container orchestration products. In fact, the 12 factor app was introduced by Heroku themselves.

Developer experience in a polyglot world

The state of web application development is evolving at a heartening pace. As a result, many aspects are heavily fragmented, most notably the number of programming languages and frameworks. Fortunately, an increasing adoption of applications adhering to the 12 factor methodology helps keep the ecosystem as a whole sane. Without the common ground of the 12 factor app, creating general tools would likely be an excruciating task. Not to mention, the 12 factor common ground eases the mental load for developers moving from one framework to another — something that is especially valuable in a microservices setting.

Up until recently Kubernetes clusters running on GCP have only supported one machine-type per cluster. It is easy to imagine situations where it would be beneficial to have different types of machines available to applications with different demands. This post will detail how to add a new node-pool and ensure that specific pods are deployed to the preferred nodes.


Why not a cluster with different machines?

There is at least one good reason to run a cluster with a homogeneous machine pool: it is the simplest thing. And up to a certain level, that is the smartest thing to do. If all your applications running on k8s have roughly the same demands for e.g. CPU and memory, it is also something you can do for a long time.

What pushed us to explore heterogeneous clusters was mainly two things:

  1. We had some apps demanding a much higher amount of memory than others
  2. We had some apps that needed to run on privileged GCP machines.

We could solve #2 by giving all machines the privileges needed, and we did for a while. But solving #1 by upgrading all machines in the cluster to high memory instances would be very expensive.

Enter node-pools.

Node pools

Node pools are a fairly new, and very poorly documented, alpha feature on Google Container Engine that lets you run nodes with different machine types in the same cluster. Earlier you were stuck with the initial machine type, but with node pools you are also able to migrate your cluster from one machine type to another. This is a great feature, as migrating all your apps from one cluster to another is not something I would recommend doing more than once.

All clusters come with a default pool, and all new pools need to have a minimum size of 3 nodes.

Creating a new node pool

Creating a node pool is pretty straightforward; use the following command:

  $> gcloud alpha container node-pools create <name-of-pool> \
  --machine-type=<machine-type> --cluster=<name-of-cluster>

Scheduling pods to specific types of nodes

To schedule a pod to a specific node or a set of nodes, one can use a nodeSelector in the pod spec. The nodeSelector needs to refer to a label on the node, and that’s pretty much it. An alpha feature in Kubernetes 1.2 is node affinity, but more on that in a later post.

There are a couple of ways to approach the selection of nodes. We could add custom labels to the nodes with the kubectl label node <node> <label>=<value> command, and use this label as the nodeSelector in the pod spec. The disadvantage of this approach is that you have to add the labels to new nodes as you resize the node pool. The other and simpler solution is just to refer to the node pool itself when scheduling the pods.

Let us imagine that we added a node-pool with high memory machines to our cluster, and we called the new node-pool highmem-pool. When creating node-pools on GKE, a label is automatically added. If we do a kubectl describe node <node-name> we can see that the node has the following label: cloud.google.com/gke-nodepool=highmem-pool.

To ensure that a pod is scheduled to the node pool, we need to add that label in the nodeSelector like this:

  apiVersion: v1
  kind: Pod
  metadata:
    name: nginx
  spec:
    containers:
    - name: nginx
      image: nginx
      imagePullPolicy: Always
    nodeSelector:
      cloud.google.com/gke-nodepool: highmem-pool

Summary

Node pools are a great new feature on GKE, one that makes Kubernetes much more flexible and lets you run different kinds of workloads with different requirements.