Security is a never-ending process, not a validation step! (2026 edition)
Four years later, half of what I wrote is embarrassing. The other half is still true. Your WAF is still useless, your SIEM still spams your on-call, and your security team still cannot code. Let's fix all of that.
If you follow this blog, you know we are the IT firefighters. And no, I still don't enjoy 3 AM phone calls, no matter how many times I tell myself I'll get used to it.
Most of those calls are about infrastructure falling apart. Some are about security. And security calls always leave a bigger crater.
Four years ago I wrote the first version of this article. I re-read it before starting this update, and a lot of it still holds. But our stack has evolved, our opinions have hardened, and frankly, the industry has doubled down on exactly the behaviors I was complaining about back then. So let's do another pass.
This article is still about infrastructure security. Code security is still "coming in a second article", which, four years later, has become a running joke internally.

Security is not a checklist
Before we get into tooling, let me get something off my chest.

Security is not a form you fill once a year for your insurance. Security is not an ISO certificate hanging on the wall of reception. Security is not a WAF you bought because PCI requirement 6.6 told you to. Security is not a SOC 2 report you wave at procurement.
Security is a process. A boring, repetitive, never-finished, daily process of reading changelogs, patching servers, killing dead services, rotating credentials, reading logs that nobody wants to read, and refusing shortcuts that everybody wants to take.
If your security posture is a binder, you are not secure. You are documented. Those are two very different things.
If your security team cannot code, it is not a security team

I want to be extremely clear on this, because I have had this conversation too many times with CISOs of regulated-sector clients:
If the people you call "security engineers" cannot read code, write scripts, automate their own checks, and patch a playbook, they are not a security team. They are paper pushers with an expensive title.
A modern attacker writes code. Their payloads are code. Their C2 is code. Their lateral movement is scripted. Their infra is deployed in seconds.
You cannot defend against that with a team whose main output is a quarterly PowerPoint and a shared Excel of "risks". You need people who can write a small Python script against an API in 10 minutes, read the source of the software they are supposed to protect, and ship a patch through your CI/CD.
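To make "a small Python script against an API in 10 minutes" concrete, here is the kind of thing I mean. The endpoint and JSON shape are invented for illustration; the point is that a security engineer should be able to knock this out without opening a ticket:

```python
# Hypothetical ten-minute script: pull firewall rules from an internal API
# and flag anything left open to the world. Endpoint and JSON shape are
# made up for illustration.
import json

def wide_open_rules(rules):
    """Return rules that allow traffic from any source address."""
    return [r for r in rules if r.get("source") == "0.0.0.0/0"]

# In real life this payload would come from something like
# urllib.request.urlopen("https://fw.internal/api/rules").
payload = json.loads("""
[
  {"id": 1, "source": "0.0.0.0/0", "port": 22},
  {"id": 2, "source": "10.0.0.0/8", "port": 80}
]
""")

for rule in wide_open_rules(payload):
    print(f"wide-open rule {rule['id']} on port {rule['port']}")
```

Ten minutes, one file, no vendor. That is the bar.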
If your security team needs to "open a ticket with the infra team" to get a firewall rule changed, your attacker already won. They are not gated by tickets.
And this brings me to the next point.
Teams that rely only on tools are already dead

I have audited shops with every box ticked: CDN WAF, internal WAF, SIEM from a vendor whose name ends in "X", EDR on every endpoint, SOAR, DLP, UEBA, ZTNA, and any other 3-letter acronym you can think of. Millions of dollars a year. Full SOC, 24/7.
They got popped by a developer committing a .env file to a public repo.
Tools are a force multiplier for people who understand what they are doing. For people who do not, tools are a very expensive placebo. They generate a feeling of safety, a lot of dashboards, and zero actual defense.
The attackers do not care about your vendor stack. They care about your mistakes. And mistakes are found by humans reading code, reading logs, and thinking, not by a blinky dashboard bought from Gartner's top right quadrant.
You do not need the Rolls Royce

Every time we start a due diligence engagement, we see the same pattern: a company with 12 engineers has bought enterprise security software designed for a bank with 12,000 employees. It is misconfigured, half deployed, nobody knows how to use it, and the license renewal is next month.
Keep it simple. Keep it understandable. Keep the pace.
- Simple tools you fully understand beat complex tools you half-understand.
- On-prem hardware is stupidly cheap now. A refurbished 1U server with 256GB of RAM costs less than one month of the equivalent AWS bill.
- Storage is cheaper than it has ever been. Log everything. Keep it for years. Grep later.
- AI is a massive productivity win for security work: log triage, anomaly summarization, writing detection rules, reviewing IaC diffs. Use it. Just do not trust it blindly.
- The fewer moving parts in your stack, the fewer things can break or get misconfigured at 2 AM.
Our philosophy at Kalvad has not changed: no Kubernetes, no Helm charts stacked 7 layers deep, no "platform team of 40 people". Just boring, replaceable, observable Unix boxes, with code to manage them.

Still a rare video of an admin configuring a whole network by hand. Still the root cause of most incidents we are called for.
Our actual stack in 2026
Let's get practical. Here is what we run today, what changed, and why.

Router and Security: bare FreeBSD

In 2021, we were big OPNsense fans. We still have respect for the project, but our skills have caught up, and the abstraction started to get in the way more than it helped.
We now run bare FreeBSD on our edge boxes. pf directly, not behind a WebUI. CARP for failover. Suricata on the side.
The config lives in git and Jinn. pf.conf, interfaces, the whole thing. It is deployed with PyInfra. Peer reviewed through merge requests. We know exactly what every rule does, because we wrote every rule.
Why this move? Two reasons:
- Control. We hit the ceiling of what the OPNsense UI and API could express. For most people, OPNsense is fine, great even. For our usage, we needed raw FreeBSD.
- Understanding. If you cannot write a pf ruleset by hand, you should not be configuring a firewall, period. Running bare FreeBSD forces the team to actually understand what is happening on the wire.
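For reference, this is roughly the level of ruleset we expect everyone on the team to be able to write from scratch. Interface names and networks below are illustrative, not our production config:

```
# Illustrative pf.conf skeleton - interfaces and networks are examples.
ext_if = "ix0"
mgmt_net = "10.0.10.0/24"

set skip on lo0
scrub in all fragment reassemble

block in log all                      # default deny inbound, and log it
pass out quick on $ext_if keep state  # stateful outbound

# SSH only from the management network, with a connection rate cap
pass in on $ext_if proto tcp from $mgmt_net to ($ext_if) port 22 \
    keep state (max-src-conn-rate 5/30)
```

If reading that makes you nervous, that is exactly the understanding gap this move is meant to close.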
On the hardware side, we moved away from expensive boxes. We now standardize on small, cheap, replaceable mini PCs with 10Gbps NICs. If one dies, we image a spare in 5 minutes, CARP picks up, done. No 6-week RMA with a vendor. No "approved hardware list" from procurement. Just a shelf of identical boxes and a deployment script.
Total cost per box: a fraction of what we used to spend. Total peace of mind: significantly higher.
DNS and DNS protection: Blocky
Pi-hole served us well for years, but we outgrew it. We moved to Blocky.
Why the switch:
- Single Go binary. No PHP, no full web stack, no "please reboot your Raspberry Pi".
- Native Prometheus metrics.
- Proper conditional forwarding, per-client policies, and blacklist groups that actually scale.
- Configured entirely through a YAML file. No clicking around.
- Redundancy is trivial: deploy N instances, ship them the same config, put them behind anycast or just list them all in DHCP.
It does exactly one job, it does it well, and when it breaks (rarely) you can read the source.
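To give an idea of the footprint, a minimal Blocky config fits on one screen. The key names below are from memory of Blocky's documentation and have changed between versions, so treat this as a sketch and check it against the release you deploy:

```yaml
# Sketch of a minimal Blocky config - verify key names against your version.
upstreams:
  groups:
    default:
      - 9.9.9.9
      - 1.1.1.1
blocking:
  denylists:
    ads:
      - https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
  clientGroupsBlock:
    default:
      - ads
ports:
  dns: 53
prometheus:
  enable: true
```

One YAML file, in git, deployed to N identical instances. That is the whole operational story.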
WAF: still useless

I am not going to soften this: after another four years of seeing WAFs in production, my opinion has gotten worse, not better.
WAFs in the real world are either:
- In "log only" mode because blocking breaks the app, therefore useless.
- In blocking mode with rules so permissive they would not stop a curl with -A "Mozilla", therefore useless.
- Blocking legitimate traffic and generating support tickets, therefore worse than useless.
Every dollar you spend on a WAF is a dollar you did not spend on code review, dependency scanning, SAST, DAST, or paying a good pentester. PCI 6.6 lets you do either. Do the code review. It is cheaper, more effective, and it also happens to make your product better. A WAF does none of those things.
Ditch it. Review your code. Update your dependencies. Run a SAST in CI. Fail the build on criticals.
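If you want a starting point, a single CI job gets you most of the way. This sketch uses Semgrep in a GitLab-style pipeline purely as an example; the flags may differ between versions, and any SAST that exits non-zero on findings works the same way:

```yaml
# Illustrative CI job: fail the pipeline on high-severity SAST findings.
# Semgrep is just an example tool here, not an endorsement.
sast:
  stage: test
  image: semgrep/semgrep
  script:
    - semgrep scan --config auto --severity ERROR --error
```

That job costs nothing and catches real bugs. Compare that to your WAF invoice.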
SIEM: we built our own
This is the section I am most proud of.

In 2021 I was telling people to ditch their SIEM and use a traditional log system with Alerta on top. The principle was right. The tooling is now better.
We built our own SIEM on top of OpenObserve. Yes, built. As in, we wrote code. Shocking concept, I know.
The stack:
- OpenObserve as the storage and query layer. S3-compatible backend, columnar, absurdly cheap per GB, fast enough.
- Logs shipped from every box via Vector (on Alpine) or Fluent Bit (on FreeBSD).
- Detection rules written in Python, running on a schedule, querying OpenObserve's API, raising alerts only when something actually matches a tuned pattern.
- Alerta is out. We moved to a custom rule engine written in Crystal, with ntfy.sh on top for delivery.
- Jinn (more on that in a second) as the glue that manages the whole pipeline.
Total cost: a small fraction of what a Splunk, QRadar or Sentinel deployment would cost, for a setup that matches our workflow exactly, with no per-GB ingestion anxiety. We keep logs for years because storage is cheap. Dashboards are in OpenObserve. Nothing fancy, nothing vendor locked.
The important point: we only alert on successful attacks or credible indicators of one. Failed logins, denied firewall packets, scanner noise, all of that goes to the log layer and is queryable, but nobody gets paged for it. If your on-call gets paged for every blocked SYN on port 22, your on-call stops reading pages. Congratulations, you have trained your team to ignore alerts.
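As a sketch of what that alerting philosophy looks like inside a rule, assuming an invented event shape (this is not OpenObserve's schema, just the shape of the idea):

```python
# Sketch of a detection rule: noise stays queryable in the log layer,
# but only credible success indicators page anyone. Event fields are
# illustrative, not a real schema.

PAGE_WORTHY = {"ssh_login_success", "sudo_success", "new_authorized_key"}

def should_page(event):
    """Page only for successful-attack indicators from unexpected sources."""
    return (
        event.get("outcome") == "success"
        and event.get("type") in PAGE_WORTHY
        and event.get("src_ip") not in event.get("known_ips", [])
    )

events = [
    {"type": "ssh_login_fail", "outcome": "failure", "src_ip": "203.0.113.9"},
    {"type": "ssh_login_success", "outcome": "success",
     "src_ip": "203.0.113.9", "known_ips": ["10.0.0.5"]},
]
print([e["type"] for e in events if should_page(e)])  # -> ['ssh_login_success']
```

The failed login is still in the logs if we ever need it. Nobody's phone buzzed for it.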
Infra management: Kalvad Jinn
In 2021 we used Nautobot as source of truth and Ansible to deploy. That architecture was correct. The tools have changed.

We built Kalvad Jinn, our own infrastructure orchestrator. It is the source of truth for our inventory, the driver for our automation, the audit log of what is deployed where, and the control plane for our clients' environments.
Jinn knows every box, every service, every version, every dependency. When a CVE drops, we ask Jinn "who is vulnerable", we get a list in 2 seconds, we trigger the patch playbook, done. Same workflow as before, but tighter, faster, and fully ours.
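The "who is vulnerable" query is conceptually trivial once the inventory exists. Jinn's real data model is not public, so the fields and helper below are invented for illustration, but the shape of the idea is this:

```python
# Hypothetical sketch of the "who is vulnerable" query: a flat inventory
# filtered against a CVE's first fixed release. Field names are invented.

def parse(version):
    """Turn '3.0.12' into a comparable tuple (3, 0, 12)."""
    return tuple(int(x) for x in version.split("."))

def vulnerable_hosts(inventory, package, fixed_in):
    """Hosts running `package` at a version older than the fix."""
    return [
        h["host"] for h in inventory
        if h["package"] == package and parse(h["version"]) < parse(fixed_in)
    ]

inventory = [
    {"host": "edge1", "package": "openssl", "version": "3.0.12"},
    {"host": "db1",   "package": "openssl", "version": "3.0.15"},
]
print(vulnerable_hosts(inventory, "openssl", "3.0.14"))  # -> ['edge1']
```

The hard part was never the query. The hard part is having an inventory that is actually true, which is what Jinn enforces.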
Ansible is out, PyInfra is in
I said in the 2021 article that we were starting to like Ansible less and less. That trend did not reverse. It accelerated.

We are now fully on PyInfra. I am a core contributor to the project (contributor #3 if you are counting). I also submitted a FOSDEM 2026 talk on exactly this topic, PyInfra vs Ansible/YAML.
The short version of why we moved:
- Screw YAML. Configuration should be code, not a templated DSL pretending to be data.
- Python is a real language with real tooling: debuggers, type hints, tests, linters, IDE support.
- PyInfra is an order of magnitude faster than Ansible. Deployments that took 20 minutes now take 2.
- No agent. Just SSH and Python.
- You can actually reason about what a deploy will do, because it is just a Python program.
If you are still writing YAML loops with Jinja templates to simulate conditionals, stop. There is a better way. Rewrite your automation in something that is actually a language.
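To illustrate the "just a Python program" point without dragging in PyInfra specifics: the logic that YAML forces into Jinja loops becomes a plain function you can read, step through in a debugger, and unit-test. Hosts and packages below are invented:

```python
# What an Ansible YAML loop with Jinja conditionals becomes in plain
# Python. Illustrative only: in a real PyInfra deploy these would be
# operation calls, but the control flow is ordinary Python either way.

hosts = [
    {"name": "edge1", "os": "freebsd"},
    {"name": "app1", "os": "alpine"},
    {"name": "app2", "os": "alpine"},
]

def planned_ops(hosts):
    """Compute the package operation a deploy would run on each host."""
    ops = []
    for h in hosts:
        mgr = "pkg" if h["os"] == "freebsd" else "apk"
        ops.append((h["name"], f"{mgr} install vector"))
    return ops

for name, op in planned_ops(hosts):
    print(name, "->", op)
```

Try writing a test for the equivalent Jinja template. Then write `assert planned_ops(...)` for this one. That difference is the whole argument.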
OS: Alpine and FreeBSD, with Void on the horizon
In 2021 we were mixed. In 2026 we are exclusively Alpine Linux and FreeBSD.
Alpine for the vast majority of Linux workloads. Small, fast, musl, no systemd, excellent package manager, trivial to harden. A base image is under 100MB. Security audits are tractable because there is almost nothing installed by default.
FreeBSD for edge networking, storage, and anything where ZFS, pf, and jails make our lives easier.
We are currently evaluating Void Linux with a custom kernel for a subset of workloads. The runit init, the xbps package manager, and the ability to ship a tuned kernel for specific hardware are all attractive. Expect a blog post when that migration happens.
What we do not run: Ubuntu, RHEL, Debian-with-systemd, anything with snap or flatpak, anything that tries to be "user friendly" at the expense of being transparent.
Emergency scenario, 2026 edition

The web is on fire. Critical CVE in a widely used library. How do we handle it?
- Alert drops (mailing lists, RSS, Alpine security announcements, FreeBSD-SA, a few Mastodon accounts that are faster than any vendor feed).
- We query Jinn for affected versions. In 5 seconds we have a list of every vulnerable host across every client.
- We run the relevant PyInfra deploy against that list. Because PyInfra is fast and parallel, 50 hosts patched takes a couple of minutes.
- We verify via OpenObserve queries that the new version is running everywhere.
- We write a short client-facing postmortem from our notes. Done.
Total time, from alert to fully patched fleet: under 15 minutes for most CVEs. Sometimes faster.
For comparison, the same incident in a shop with no inventory, no automation, and a security team that cannot code: 3 to 5 days, a lot of Slack panic, a Jira epic with 40 tickets, and a 50% chance that two servers are forgotten and found by the attacker first.
The gap is not about money. It is about mindset.
Conclusion
Rewriting this article four years later, the headline has not changed: security is a never-ending process, not a validation step. The tools have evolved, our skills have sharpened, but the failure modes in the industry are exactly the same. People still buy Rolls Royces they cannot drive. People still hire security teams that cannot code. People still think a SIEM will save them.
Start simple. Keep it simple. Write code. Read code. Automate everything. Log everything. Alert on what matters. Patch relentlessly. Understand every box in your network.
If you do those things with a 3-person team on Alpine and FreeBSD, you will beat a 30-person team on a $20M vendor stack. Every time.
Next article, I really will write the code security one. Probably.