The original LoRA paper does exactly this using GPT2 and BERT type language models. They are explicit about it being designed for LLMs, but since it seems to work really well for stable diffusion it would be interesting to see metrics in the image domain.