Gaurang Pawar
It was one week before my Meta Production Engineer interview, and I was looking for a website to practice my debugging and SRE skills. Meta has these debugging rounds where you are given a real-world situation, such as a broken server or an unresponsive service, and you have to provide a step-by-step analysis of what you would do in that situation.
Unlike normal coding interviews, where you can simply solve the top 50 DSA/algo questions and clear the interview, there are not many websites for debugging interviews where you can tackle simulated production outages.
So, just like any other developer with lots of free time, I decided to create one.

It started with a very basic webpage: a simple terminal emulated in the browser with the xterm library. The input and output are sent via a socket connection to a backend server, which then routes them to a running container over an SSH connection. The container has a very basic structure:
- entrypoint.sh sets up sshd and other required processes
- init.sh initializes the required state for the problem
- submit.sh is triggered after the user clicks the submit button in the browser; it checks the current state of the user directory and echoes success or failure on the stdout stream
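For a sense of how thin that success check is, here is a rough sketch of the idea behind submit.sh. The real script is plain shell and problem-specific; the workspace path and the success condition below are invented purely for illustration:

```python
#!/usr/bin/env python3
# Illustrative stand-in for submit.sh: verify the expected end state of the
# user's workspace and print "success" or "failure" on stdout.
# (The actual check is a shell script; the paths and criterion here are hypothetical.)
import os
import sys

WORKDIR = "/home/user/problem"                        # hypothetical user workspace
EXPECTED_FILE = os.path.join(WORKDIR, "service.pid")  # hypothetical success marker

def solved() -> bool:
    # Example criterion: the broken service was restarted and left a pid file
    # pointing at a live process.
    if not os.path.isfile(EXPECTED_FILE):
        return False
    with open(EXPECTED_FILE) as f:
        pid = f.read().strip()
    return pid.isdigit() and os.path.exists(f"/proc/{pid}")

if __name__ == "__main__":
    print("success" if solved() else "failure")
    sys.exit(0)
```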

Once I had the backbone set up, I needed to think about scaling. This was not going to be the next ChatGPT receiving millions of users in a week, but I still needed to plan for horizontal scaling in case it grew. The Node.js monolith backend could handle heavy traffic, so I was not worried about that. I also trusted Postgres for performance, since most of the required queries were simple reads with little to no joins. The only part that could become a bottleneck was the containers spun up for each problem.
I needed to spread these containers across multiple nodes for better performance and reliability, so I went with a basic master-slave architecture. Each node worker, when started, registers itself with a master service, which in this case is the backend monolith. The master then performs a basic health check on the node worker and places it inside a priority queue. The priority queue is used to find the best node worker for each incoming connection request, so it doubles as a load balancer.
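To make that concrete, here is a minimal sketch of the registration and scheduling flow. The real master lives inside the Node.js monolith; this Python version, with a made-up /health endpoint and a container-count score, is only meant to illustrate how a worker gets health-checked, queued, and then picked for an incoming session:

```python
import heapq
import time
from typing import Optional

import requests  # assumed HTTP health check; the real check may look different

class WorkerRegistry:
    """Toy master-side registry: health-check workers and hand out the least-loaded one."""

    def __init__(self):
        # Entries are (score, registered_at, worker_url); lower score = better candidate.
        self._heap = []

    def register(self, worker_url: str) -> bool:
        # Basic health check before the worker is admitted to the pool.
        try:
            resp = requests.get(f"{worker_url}/health", timeout=2)
            resp.raise_for_status()
        except requests.RequestException:
            return False
        load = resp.json().get("running_containers", 0)  # hypothetical payload
        heapq.heappush(self._heap, (load, time.time(), worker_url))
        return True

    def acquire(self) -> Optional[str]:
        # Pop the best worker, then push it back with one more container accounted
        # for, so the queue doubles as a very crude load balancer.
        if not self._heap:
            return None
        load, _, worker_url = heapq.heappop(self._heap)
        heapq.heappush(self._heap, (load + 1, time.time(), worker_url))
        return worker_url
```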
Another challenge was finding the right container manager: something lightweight with an API to start and stop containers. I could not find one, so, like any other unemployed engineer with lots of time on his hands, I decided to create my own. A basic Flask API wrapper around the Docker client took less than an hour.
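A wrapper like that really can fit in a screenful of code. The sketch below uses Flask and the Docker SDK for Python; the image name, routes, and port handling are my guesses at what such an API might expose, not the actual service:

```python
from flask import Flask, jsonify
import docker  # Docker SDK for Python

app = Flask(__name__)
client = docker.from_env()

PROBLEM_IMAGE = "sttrace/problem-base"  # hypothetical image name

@app.post("/containers")
def start_container():
    # Start a problem container with sshd exposed on a random host port.
    container = client.containers.run(
        PROBLEM_IMAGE,
        detach=True,
        ports={"22/tcp": None},  # let Docker pick a free host port
    )
    container.reload()  # refresh attrs so the assigned port is visible
    port = container.attrs["NetworkSettings"]["Ports"]["22/tcp"][0]["HostPort"]
    return jsonify({"id": container.id, "ssh_port": port}), 201

@app.delete("/containers/<cid>")
def stop_container(cid):
    # Force-remove stops the container if it is still running.
    client.containers.get(cid).remove(force=True)
    return jsonify({"stopped": cid})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```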
Once I had all the basic parts, I duct-taped together a working website. Next was hosting. Looking at my options, I realized I did not have enough spare money to throw at EC2 boxes, but I did have two powerful Dell OptiPlex boxes in my homelab, both running Linux. With a few bash scripts, the homelab was all set to run the site. I got an Elastic IP, spun up a t2.micro box that routes all traffic to my homelab via an SSH tunnel, and sttrace.com was live.

The whole thing took about a week to build, and I got around 40 users within the first two days of launch.
So, what is next?
I try to create at least one problem every day, but the website is still incomplete. The idea behind sttrace.com was bridging the gap between recruiters and developers. The next step is to build an internal job board where recruiters can post jobs. I also want this website to be a place where developers come to upskill instead of grinding problems only for job interviews.
I'm just getting started, and there's a lot more to build. If you're a developer looking to sharpen your debugging skills or a recruiter interested in meaningful assessments, check out sttrace.com and be part of this journey. Your feedback and participation can help shape it into a valuable platform for everyone.