Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Speaking of agents and tests, here's a fun one I had the other day: while refactoring a large code base I told the agent to do something precise to a specific module, refactor with the new change, then ensure the tests are passing.

The test suite is slow and has many moving parts; the tests I asked it to run take ~5 minutes. The thing decided to kill the test run, then it made up another command it said was the 'tests' so when I looked at the agent console in the IDE everything seemed fine collapsed, i.e. 'Tests ran successfully'.

Obviously the code changes also had a subtle bug that I only saw when pushing its refactoring to CI (and more waiting). At least there were tests to catch the problem.



So it took a shortcut as it was too lazy and it lied to your face about it. AGI is here for good.


I think that it's something that model providers don't want to fix, because the amount of times that Claude Code just decided to delete tests that were not passing before I added a memory saying that it would need to ask for my permission to do that was staggering. It stopped happening after the memory, so I believe that it could be easily fixed by a system prompt.


Your Claude Code actually respects CLAUDE.md?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: