So before the gatekeeping starts: my first crack at optimization was in 1987, on a 1 MHz Apple //e.
1. I wrote the code in BASIC
2. I wrote the code in assembly
3. I got a further improvement because loads and stores to the first page of memory (the zero page) took two clock cycles instead of three.
But this isn’t 1987; this is 2026. I “vibe coded” my first project this year. I designed the AWS architecture from an empty account using IaC, I chose every service, I verified every permission, I chose and designed the orchestration and the concurrency model, and I gathered the requirements. What I didn’t do is look at a single line of Python or infrastructure code, aside from the permissions that Codex generated.
Now to answer your questions:
How did I validate the correctness? Just as if I had written it myself. I had Codex create a shell script to run end-to-end tests of every scenario I cared about, and when one broke, I went back to Codex to fix it. I was very detailed about the scenarios.
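The actual script was shell against a deployed AWS stack; a minimal sketch of the same scenario-driven idea in Python, with a hypothetical stand-in for the system under test (the scenarios and names here are illustrative, not the real project's):

```python
# Hypothetical stand-in for the deployed system; the real tests hit live endpoints.
def process_transaction(amount):
    if amount <= 0:
        return {"status": "rejected"}
    return {"status": "accepted", "amount": amount}

# A detailed scenario table: (name, input, expected result).
SCENARIOS = [
    ("accepts a normal payment", 100, {"status": "accepted", "amount": 100}),
    ("rejects a zero amount", 0, {"status": "rejected"}),
    ("rejects a negative amount", -5, {"status": "rejected"}),
]

def run_scenarios():
    """Run every scenario, print PASS/FAIL, and return the failures."""
    failures = []
    for name, amount, expected in SCENARIOS:
        got = process_transaction(amount)
        ok = got == expected
        if not ok:
            failures.append((name, expected, got))
        print(("PASS" if ok else "FAIL") + ": " + name)
    return failures

failures = run_scenarios()
```

The point is the scenario table, not the runner: when a scenario breaks, the failing name tells you exactly what to hand back to the agent.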
The web front end I used was built by another developer, and I haven’t touched web dev in a decade. I told Codex what changes I needed, then verified them by deploying the app and testing it manually.
How did I validate the performance? Again, just as I would for something I wrote myself. I tested it first with a few hundred transactions to verify the functionality, then stress tested it with a real-world volume of transactions. The first iteration broke horribly, not because of Claude Code, but because it was a bad design.
But here’s the beauty: the bad implementation took me a day instead of the three or four days it would have taken by hand. I then redesigned it, dropped the AWS service, and built a design that was much more scalable; that took another day. I knew in theory how it worked under the hood, but not in practice. Again, I validated scalability by testing the result.
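A stress test of that shape can be sketched simply: fire transactions at a fixed concurrency and look at latency percentiles. Everything here is a placeholder (the sleep stands in for a real API round trip); it shows the structure, not the project's actual harness:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def one_transaction(i):
    """Hypothetical stand-in for one transaction against the deployed system."""
    start = time.perf_counter()
    time.sleep(0.001)  # simulate a round trip; a real test would call the API
    return time.perf_counter() - start

def stress_test(n_transactions, concurrency):
    """Run n_transactions at a fixed concurrency; return p50 and p99 latency."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(one_transaction, range(n_transactions)))
    p50 = latencies[len(latencies) // 2]
    p99 = latencies[int(len(latencies) * 0.99)]
    return p50, p99

p50, p99 = stress_test(n_transactions=500, concurrency=50)
print(f"p50={p50 * 1000:.1f}ms p99={p99 * 1000:.1f}ms")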
The architectural quality? I validated it by synthesizing real-world traffic. ChatGPT in thinking mode did find a subtle concurrency bug. That was my fault, though: I designed the concurrency implementation; Codex just did what I told it to do.
Subtle bugs happen whether a person writes the code or an agent does. You do the best you can with your tests, and when they surface, you fix them.
How do I prevent technical debt? All large implementations have technical debt. Again, just as when I lead a team: I componentize everything behind clean interfaces. That makes life easier for coding agents and people alike.
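In Python terms, that componentization might look like the sketch below. The names are hypothetical, not from the project; the point is that orchestration code depends only on a small interface, so any one component can be reworked, by a person or an agent, without touching the rest:

```python
from typing import Protocol

class TransactionStore(Protocol):
    """Hypothetical component boundary: a small, clean interface."""
    def save(self, tx_id: str, amount: int) -> None: ...
    def total(self) -> int: ...

class InMemoryStore:
    """One implementation; a cloud-backed store could swap in unchanged."""
    def __init__(self):
        self._rows = {}

    def save(self, tx_id, amount):
        self._rows[tx_id] = amount

    def total(self):
        return sum(self._rows.values())

def settle(store: TransactionStore, txs):
    # Depends only on the interface, never on a concrete implementation.
    for tx_id, amount in txs:
        store.save(tx_id, amount)
    return store.total()

store = InMemoryStore()
print(settle(store, [("a", 100), ("b", 250)]))  # prints 350
```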