A lot of "scaling" advice optimizes for problems most teams will never have. Yuva Portal taught me the opposite: keep it boring, watch the right things, and a single VPS will cheerfully serve six figures of users.
The shape of the load
Bursty. Most of the day was idle. Logins came in waves around scholarship deadlines. A naive read of CPU graphs would say the box was lightly used; a 5-second p99 would tell you the truth.
We tracked p99 latency per route from day one. If you only watch averages, the long tail will eat you alive without warning.
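To make the gap between averages and the tail concrete, here is a minimal sketch of a per-route tracker. It is not the code we ran: the class name, the route, and the sample numbers are all made up for illustration, and a real service would use a bounded window or a histogram rather than keeping every sample in memory.

```javascript
// Minimal per-route latency tracker (illustrative; all names are hypothetical).
class LatencyTracker {
  constructor() {
    this.samplesByRoute = new Map(); // route -> array of durations in ms
  }

  record(route, durationMs) {
    if (!this.samplesByRoute.has(route)) this.samplesByRoute.set(route, []);
    this.samplesByRoute.get(route).push(durationMs);
  }

  // Nearest-rank percentile: the smallest sample with at least p% of samples <= it.
  percentile(route, p) {
    const samples = this.samplesByRoute.get(route) ?? [];
    if (samples.length === 0) return null;
    const sorted = [...samples].sort((a, b) => a - b);
    const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
    return sorted[rank - 1];
  }

  mean(route) {
    const samples = this.samplesByRoute.get(route) ?? [];
    if (samples.length === 0) return null;
    return samples.reduce((sum, ms) => sum + ms, 0) / samples.length;
  }
}

// Made-up burst: 98 fast logins and 2 that took five seconds.
const tracker = new LatencyTracker();
for (let i = 0; i < 98; i++) tracker.record("/login", 50);
tracker.record("/login", 5000);
tracker.record("/login", 5000);

console.log(tracker.mean("/login"));           // 149
console.log(tracker.percentile("/login", 50)); // 50
console.log(tracker.percentile("/login", 99)); // 5000
```

A mean of 149 ms looks healthy. The p99 shows that some users waited five seconds.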
Indexes solve more problems than caches do
The 30% drop in data-processing time mentioned in the resume came almost entirely from one diagnostic: turning on MongoDB's slow query logging (the database profiler) and indexing the three queries that showed up most often. No Redis, no read replicas, just compound indexes that matched the actual access patterns.
A useful rule: before you reach for a cache, look at the query plan. Caches add a coherence problem. A correct index doesn't.
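"Look at the query plan" can be partly automated. The sketch below walks the winning plan from an explain() result and flags a full collection scan. The helper names, the sample plans, and the collection and field names in the comments are hypothetical. It assumes the usual explain shape, with stage, inputStage, and inputStages keys, and that newer servers may nest the tree under queryPlan.

```javascript
// Collect every stage name in an explain() winning-plan tree.
function collectStages(plan) {
  if (!plan || typeof plan !== "object") return [];
  const children = [];
  if (plan.inputStage) children.push(plan.inputStage);
  if (Array.isArray(plan.inputStages)) children.push(...plan.inputStages);
  return [plan.stage, ...children.flatMap(collectStages)];
}

// True when the winning plan contains a full collection scan (COLLSCAN).
function hasCollectionScan(explainOutput) {
  const winning = explainOutput.queryPlanner.winningPlan;
  // Assumption: some server versions wrap the stage tree in `queryPlan`.
  const root = winning.queryPlan ?? winning;
  return collectStages(root).includes("COLLSCAN");
}

// Hand-written sample plans. In mongosh the real thing would come from e.g.
//   db.applications.find({ userId: 1, deadline: { $gte: someDate } }).explain()
// where `applications`, `userId`, and `deadline` are hypothetical names.
const unindexedPlan = {
  queryPlanner: { winningPlan: { stage: "COLLSCAN" } },
};
const indexedPlan = {
  queryPlanner: {
    winningPlan: {
      stage: "FETCH",
      inputStage: { stage: "IXSCAN", indexName: "userId_1_deadline_1" },
    },
  },
};

console.log(hasCollectionScan(unindexedPlan)); // true
console.log(hasCollectionScan(indexedPlan));   // false
```

In this hypothetical case the fix would be a compound index such as db.applications.createIndex({ userId: 1, deadline: 1 }), with key order chosen to match how the query filters and sorts.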
PM2 is fine, actually
The reflexive answer to "Node.js in production" is now Kubernetes. For a single VPS with two services, that is a 50-fold complexity increase for zero new capability. PM2 in cluster mode with instances set to max, plus pm2 startup for systemd integration, ran for 24 months without an incident that wasn't self-inflicted.
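For reference, a setup like this fits in one small PM2 ecosystem file. This is a sketch, not our actual config: the service names and script paths are hypothetical, and running the second service in fork mode is my assumption.

```javascript
// ecosystem.config.js
// Service names and script paths below are hypothetical placeholders.
const config = {
  apps: [
    {
      name: "portal-api",
      script: "./api/server.js",
      exec_mode: "cluster", // PM2 load-balances across worker processes
      instances: "max",     // one worker per available CPU core
    },
    {
      name: "portal-worker",
      script: "./worker/index.js",
      exec_mode: "fork",    // assumption: a single background process
      instances: 1,
    },
  ],
};

module.exports = config;

// Typical bring-up on the VPS:
//   pm2 start ecosystem.config.js
//   pm2 save              (remember the process list)
//   pm2 startup systemd   (prints the command that installs the boot unit)
```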
What I'd do differently
Two things:
- Structured logs from day zero. We had freeform console.log for the first year. When a real bug landed, grepping logs across PM2 instances was painful.
- A single dashboard, not three. I had one for Mongo, one for the host, one for app metrics. Inevitably you only check the one that's currently green.
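On the first point, structured logging does not need a library to get started. Here is a minimal sketch of what I would write today. The function names and fields are hypothetical. The instance field relies on NODE_APP_INSTANCE, the environment variable PM2 sets for each process it manages.

```javascript
// Build one JSON log line per event so logs can be filtered by field, not grepped.
function formatLogLine(level, message, fields = {}) {
  return JSON.stringify({
    ...fields, // caller context first, so the core keys below always win
    time: new Date().toISOString(),
    level,
    // Set by PM2 for each managed process; null when run outside PM2.
    instance: process.env.NODE_APP_INSTANCE ?? null,
    message,
  });
}

function logInfo(message, fields) {
  console.log(formatLogLine("info", message, fields));
}

function logError(message, fields) {
  console.error(formatLogLine("error", message, fields));
}

// Example call; the route and user id are made up.
logInfo("login succeeded", { route: "/login", userId: "u_123", durationMs: 42 });
```

With one JSON object per line, "every error on /login from instance 2" becomes a filter on fields instead of a regex across several log files.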
Boring infra is a feature. The work I'm proudest of is the work that didn't page anyone.