Skip to content
Home » Technology Decision Log

Technology Decision Log

Introduction

The company I worked at recently shut down. I worked there for over four and a half years, and I was involved in many of the decisions that influenced the engineering organization’s technology choices. Some of those decisions paid great dividends down the line, and some did not. I found myself thinking: If I were to join a new organization, would I make the same bets, or what would I change?

Disclaimer

This is purely my subjective opinion. Every organization’s context is different. Understanding that context and the long term vision is crucial to making sound decisions.

Decision Log

ChoiceDecision Notes
AWSGoodTo name a few benefits: AWS allowed us to scale dynamically to the business growth and demands. It was a cornerstone of our DevOps culture and kept our infrastructure teams lean. AWS was responsible for 2-3 incidents that crippled the business which is impressive for the duration we used them. I would stay out of us-east-1, however.
FargateGoodWe run most of the production systems on Fargate. Fargate enabled us to delegate most of maintenance work to AWS (ec2 batching, upgrades,..etc) and have the SREs focus on other value-enabling streams.
KubernetesBadThere are legitimate use-cases for k8s. However I believe that most of the organizations that think they need k8s, don’t need it. Supporting k8s platform would have required massive bandwidth from the infrastructure teams for the ongoing maintenance and KTLO. Public Cloud providers have solutions that work for most organizations unless there is a business need that demands k8s.
PostgresGoodWe run one of the biggest transactional DBs I’ve ever worked with in my career (~10TB-14TB). It was a solid choice, and I’d definitely use it again
AuroraBadExpensive!! We were able to scale well with vanilla Postgres high throughput (read and write) systems.
RedisGoodGreat In-memory database and cache. Never use it as your main db however (well…never say never)
ElasticcacheBadExpensive! Most of the usecases we had didn’t really need ElasticSearch. If I ever needed a full-text-search or realtime metrics analytics then ES is a great tool for that.
SNS, SQS, EventbusGoodThey are reliable and reasonably easy to setup.
KinesisBadToo expensive if all you need is a regular message bus, try SNS, SQS, Eventbus first. There are legitimate usecases for Kinesis and Firehose and those are strong products in their respective areas
DynamodbBadIt is ok for a simple key-value storage. However, for the love of god, don’t use it to hammer every nail. It struggles to adapt to changing business requirements and can get expensive real quick if you aren’t careful.
CloudfrontGoodWhile it doesn’t have as many features as Akami and Cloudflare,it does integrate well with AWS and can do some powerful routing when combined with Lambdas.
AWS WAFGoodAfter launching WAF, we started blocking one third of production traffic outside ofbusiness hours. Overall easy to use with AWS Firewall Manager to ensure WAF protection is enabled for every endpoint in your environment. The AWS-managed rules worked well..
ACMGoodSetting up SSL certificates has never been easier.
VPC LatticeGoodFor connecting applications across multiple VPCs, I highly recommmend VPC Lattice for most usecases.
VPC Private EndpointGoodPerfect for connecting to datastores across multiple VPCs
VPC PeeringBadIt’s been offset by a few product launches over the past few years (Transit Gateway, Lattice, Private Endpoints)
VPC Transit GatewayBadTry Lattice and Private Endpoints first.
TerraformGoodThis was one of the best bets we made. We started using Terraform to manage AWS infrastructure, but eventually expanded to Github, Launchdarkly, Datadog and configuration management in many providers.
CloudformationBadUnreliable, unreadable and limited to AWS
Serverless Lambda AppsBadLambda is excellent for building custom integrations between AWS services and small one-off scripts. However, as with Dynamodb, things get ugly when you hammer every nail with a lambda function. Building complex business application workflows on Lambda will give you a handicap. It gets expensive, complicated, and challenging to maintain really quickly.
CDKBadInfrastucture and configuration management should be easy and simple. CDK stacks can sometime get complicated and allow engineers to shoot themselves in the foot by building infrastructure configuration that are as complicated as the business logic itself
Terraform Cloud (TFC)BadIt’s a glorified and expensive Terraform CICD. Overtime, it became too expensive. Their new pricing model that charges you by the number of resources is absolutely insane, and they know it. It also leads to having two cicd platforms (infra cicd and services cicd) which isn’t ideal and too much cognitive load. You could replace it with Terraform Cli + S3 backend + Your regular CICD platform
Terraform SentinelBadI’m all for compliance as policy. However, Sentinel is coupled with TFC. Nowadays, You can replace it with OpenPolicyAgent
PythonGoodGiven Python’s popularity, hiring Python engineers is easy. Many of the vendors on this list treat it as a first-class citizen with support at first launch.
RubyUndecidedMany of the engineers we hired had to learn Ruby on Day 1. Given how opinionated the Rails ecosystem is, this meant that engineers would often run into some friction for the first few weeks. In addition, many of the vendors we worked with didn’t support Ruby on the same level as Python or Java. That being said, in the hands of capable engineers familiar with how Ruby (and especially Rails) operates, it is a performant solution that can accelerate development velocity/.
GithubGoodWhile GH is great, Gitlab is a really good alternative. It offers full devops package inlcuding cicd, sast, test coverage,…etc and would be a better choice. However, Github Copilot is uncontested
BuildkiteBadAt the time (2020), it was the best solution in the market for hybrid CICD platforms. You host the compute and Buildkite hosts the control plane. BK is a great vendor and I greatly respect what they built. As of 2024, GH Action added support for self-hosted agents and their library of GH Actions is uncontested.
Github ActionsGoodGreat community support and massive library of ready-to-use actions. They have added self-hosted agents in recent years. Their great integration with Github platform is a huge plus
ArtifactoryGoodThere are good alternatives depending on which languages you need to support (GIthub Packages, AWS CodeArtifact). Unfortunately, neither supports both Ruby and Python.
OpsgenieBadI can’t remember the last time Opsgenie released anything new that I was excited about. I’d go for PagerDuty instead or DD Incident Management tool. I wouldn’t be surprised if Datadog expanded their catalogue to provide both incident managenet (currently supported) and oncall (not supported yet) functionality 
DatadogGoodIt is expensive but it is the best in the market. It is no longer just a monitoring tool but rather a platform that ingest most of the engineering operations telemetries with best in the market integrations. Datadog was one of the most innovative vendors we worked with
HoneybadgerBadGood errors tracker, but in the spirit of consolidating vendors, DD can replace it
SplunkBadI have nothing nice to say about them: bad support, bad product, bad company and rotten sales department.
TrivyGoodGreat tool for enforcing terraform and infrastructure best practices
SnykBadit does the job, but it lacks some core functionality like native github integration (work inprogress). DD can replace it and there are free open source tools that would work well for small companies 
LogRocketBadNot a bad product; but DD has replaced it
Swagger HubBadFewer features than Postman
PostmanGoodIt can be expensive but it is a good product overall.
OpenVPNUndecidedIt gets the job done and is reliable. The free version however is “not enterprisey” enough. Another alternative that i never explored myself but was recommended to me is https://tailscale.com/pricing
LaunchDarklyGoodWe introduced it too late but I had positive experience overall. Great Terraform support
Sophos EDRBadHaving seen what Defender offers, it is difficult to consider Sophos ever again
SlackGoodI love the integrations and level of customization Slack provides.
JiraGoodIt is bulky, it is slow but it is customizable. We used it to track incidents postmortem analysis, services, teams and all sorts of metadata.
ConfluenceGoodI love what Confluence provides. However, knowledge management at scale is challenging for every product and organization. As years go by, the knowledge keeps growing, and it takes hard and deliberate effort to maintain it (remove irrelevant pages, organize content, simplify knowledge discovery, etc.). I’m not aware of a product in the market that has solved these problems yet but I am hopeful that someone will finally crack them.
ZoomGoodOverall, it is stable and reliable, and Zoom keeps innovating. The meeting summary feature has so much potential.
EntraIDGoodI would not want to manage on-prem Active Directory. It is an enterprise-grade identity provider. It can be complicated to manage, but overall, the positives outweigh the negatives.
Microsoft DefenderGoodOne of the best products out there. Defender can ingest signals from many systems with minimal configuration and provide full end-to-end security monitoring. It is expensive, but I think it is worth every penny.
Exchange OnlineGoodReasonably reliable (With the exception of when your email gets quarantined and it is not very clear why it got flagged). When combined with the Microsoft entire eco-system you get an enterprise grade package (DLPs, Compliance, Security,..etc).
Auth0GoodIdentity management is challenging, and Auth0 provides an initial boost to help you get started. However, as the company’s requirements evolved, we had to build additional internal abstractions on top of Auth0.
MerakiGoodReliable hardware, cloudbased and easy to setup and maintain.
Arbua BadGiven a redo chance, sticking with one vendor such as Meraki would be prime
Rapid7BadTheir product is more geared towards on-prem enterprises. We had few bad experiences with their CSOC. I’d like to give CrowdStrike a shot.
New RelicBadMy experience is based on an older pricing model they changed four years ago. That being said, Datadog’s platform offers more features beyond observability platform (Khajiit has wares if you got the coins)

Tags: