This would seem to imply that the model doesn't actually "understand" (whatever that means for these systems) that it has a "system prompt" separate from user input.
Yes, this is how they work. All the LLM can do is take text and generate the text that’s likely to follow. So for a chatbot, the system “prompt” is really just an introduction explaining how the chat works and what delimiters to use and the user’s “chat” is just appended to that, and then the code asks the LLM what’s next after the system prompt plus the user’s chat.