Are you sure it wasn't just instruction alignment? Most compilers targeting x86 insert NOPs ahead of a loop's jump target so that the first instruction of the loop body starts on an 8- or 16-byte boundary. See e.g. https://reverseengineering.stackexchange.com/a/2930.
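To make the arithmetic concrete, here is a rough sketch of how many padding bytes it takes to reach the next boundary. The function name `nop_padding` and the example addresses are made up for illustration; this is not how any particular compiler implements it (GCC, for instance, exposes the behaviour through `-falign-loops` and the assembler does the actual padding).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of padding bytes (e.g. NOPs) needed so that
   the instruction at `addr` starts on the next multiple of `alignment`.
   `alignment` must be a power of two, such as 8 or 16. */
size_t nop_padding(uintptr_t addr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* Unsigned negation wraps, so (-addr) mod alignment is the distance
       up to the next aligned address (0 if already aligned). */
    return (size_t)(-addr & (uintptr_t)(alignment - 1));
}
```

So a loop head that would otherwise land at 0x1003 gets 13 bytes of padding for 16-byte alignment, and one already at 0x1010 gets none. That is why you can see anywhere from zero to fifteen bytes of seemingly pointless NOPs right before a loop.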