Building Argus
Insights and learnings from creating a logging service
You know that feeling when you start a project thinking “this shouldn’t be too hard,” only to discover it’s much more complicated than expected? These turned out to be good problems to have. That’s exactly what happened when I decided to build Argus, a logging service. I had built many small applications before, but I had never had to think about scale. This project taught me that logging at scale is a whole different beast - and reminded me why it’s so much more fun to learn new technologies when you actually need them rather than because they’re trending.
Starting Simple
I started as simply as possible with Go and a project structure borrowed from melkey. Like all my previous projects, I reached for PostgreSQL without much thought - it wasn’t until after implementing authentication, when I began estimating log volumes, that I started questioning this choice.
The Database Journey
The search for the right database was interesting:
- DynamoDB seemed like too much work to set up and maintain
- ClickHouse looked promising but felt too costly for the initial scope
- Then I discovered Cassandra
What I loved about Cassandra was that CQL felt familiar enough coming from SQL that I didn’t have to learn a completely new query language. Cassandra’s distributed nature was fascinating to think about, even if it was sometimes frustrating when things didn’t work exactly like PostgreSQL. Understanding why those differences exist, as consequences of the distributed design, was actually quite enlightening.
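A concrete example of that familiar-but-different feeling: a CQL table looks almost like SQL, but the primary key doubles as the data-distribution scheme, so queries that don’t name the partition key are rejected by default. The schema below is a hypothetical sketch to illustrate the point, not Argus’s actual schema.

```sql
-- Hypothetical log table: SQL-like syntax, but the PRIMARY KEY
-- also decides which node each row lives on.
CREATE TABLE logs (
    app_id    uuid,      -- partition key: one app's logs stay together
    logged_at timeuuid,  -- clustering column: sorted within the partition
    level     text,
    message   text,
    PRIMARY KEY ((app_id), logged_at)
) WITH CLUSTERING ORDER BY (logged_at DESC);

-- Fine: the partition key is pinned, so this reads from one partition.
SELECT * FROM logs WHERE app_id = ? AND logged_at > ? LIMIT 100;

-- Rejected by default: filtering without the partition key would mean
-- scanning the whole cluster - something PostgreSQL happily allows.
-- SELECT * FROM logs WHERE level = 'error';
```

That restriction is the distributed design showing through: Cassandra refuses queries it can’t route to a bounded set of nodes.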
Communication Evolution
Since I was already thinking about optimizing for scale, I remembered watching a video about how LinkedIn improved their API performance by 60% using protocol buffers. My first instinct was to use gRPC - I had played with Twirp before but initially dismissed it. After spending time with gRPC and writing a proof of concept, I revisited Twirp’s documentation. What sealed the deal was Twirp’s ability to provide both HTTP/JSON and HTTP/protobuf endpoints from a single implementation.
Current Challenges
Now I’m tackling background jobs for log analytics in a distributed system. Initially, I thought goroutines would be sufficient, but the distributed nature of the system requires more thought. I’m considering approaches like:
- Creating a dedicated table for background jobs with task locking
- Message queues (though I’m hesitant to add another storage system)
I’m currently leaning toward the locks table approach since I already have Cassandra in place.
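The locks-table idea can be sketched in CQL using lightweight transactions: a worker claims a job by inserting a lease row that only one `IF NOT EXISTS` insert can win, with a TTL so a crashed worker’s lease expires on its own. Table, job, and owner values below are hypothetical, and it’s worth confirming the exact lightweight-transaction semantics against the Amazon Keyspaces documentation.

```sql
-- Hypothetical lease table for background jobs (names are illustrative).
CREATE TABLE job_locks (
    job_name    text PRIMARY KEY,
    owner       uuid,
    acquired_at timestamp
);

-- Claim the job. IF NOT EXISTS makes this a lightweight transaction:
-- exactly one worker gets [applied] = true. The TTL acts as a lease,
-- so the lock frees itself if the worker dies before finishing.
INSERT INTO job_locks (job_name, owner, acquired_at)
VALUES ('hourly_log_rollup', 123e4567-e89b-12d3-a456-426614174000, toTimestamp(now()))
IF NOT EXISTS
USING TTL 300;
```

The appeal is operational: no new queue infrastructure, just one more table in the database I already run.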
Another thing I’m thinking about is how to send logs from the client. Amazon Keyspaces (managed Cassandra) caps batch inserts at 30 rows, so I’m now deciding how often to flush and how many logs to send in each batch.
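However the flush policy shakes out, the client has to split its buffer into batches that respect the 30-row cap. A minimal sketch of that splitting step, with illustrative names rather than the actual argus-client API:

```go
package main

import "fmt"

// chunk splits buffered logs into batches of at most maxBatch rows,
// matching Amazon Keyspaces' 30-row batch-insert limit. A real client
// would also flush on a timer so small buffers don't sit forever.
func chunk(logs []string, maxBatch int) [][]string {
	var batches [][]string
	for len(logs) > 0 {
		n := maxBatch
		if len(logs) < n {
			n = len(logs)
		}
		batches = append(batches, logs[:n])
		logs = logs[n:]
	}
	return batches
}

func main() {
	// 75 buffered logs split into 30 + 30 + 15.
	buffered := make([]string, 75)
	for i := range buffered {
		buffered[i] = fmt.Sprintf("log-%d", i)
	}
	for _, b := range chunk(buffered, 30) {
		fmt.Println(len(b))
	}
}
```

Each batch then becomes one unlogged batch insert, so the trade-off reduces to latency (flush often) versus write efficiency (fill batches).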
Project Structure
Argus is split into four main repositories:
- argus-core: The main service handling log ingestion and storage
- argus-client: A Go client library for easy integration
- argus-web: The web interface for log visualization and management
- argus-demo: A sample application demonstrating real-world usage
Future Directions
While Argus currently focuses solely on logging, the architecture allows for future expansion. Potential features include:
- Distributed tracing integration
- Metrics collection
- Log analytics and visualization
- Custom alert rules
Conclusion
Building Argus has been an exercise in making pragmatic architectural decisions. While there’s still room for improvement, the current version represents a solid foundation for a lightweight, efficient logging service. The key takeaway has been that focusing on core functionality and making decisions based on actual requirements leads to better outcomes.
Want to learn more? Check out the project documentation or try the demo application yourself.
Questions or comments? Send an email.