3. Fine-Tune LLaMA 13B With QLoRA on Amazon SageMaker




  • PEFT (Parameter-Efficient Fine-Tuning) is a new open-source library from Hugging Face that enables efficient adaptation of pre-trained language models (PLMs) to various downstream applications without fine-tuning all of the model's parameters.
  • QLoRA is a new technique that reduces the memory footprint of large language models during fine-tuning without sacrificing performance. In short, QLoRA works as follows:
    • Quantize the pretrained model to 4 bits and freeze it.
    • Attach small, trainable adapter layers (LoRA).
    • Fine-tune only the adapter layers, while using the frozen quantized model for context.
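
The three QLoRA steps can be illustrated with a toy, pure-Python sketch of a single linear layer. This is only an assumption-laden illustration, not the actual QLoRA implementation: it uses simple absmax rounding to 4-bit integer codes instead of the NF4 data type, a single training example, and a hypothetical class name (`QLoRALinear`). In practice the quantization and adapters would come from libraries such as bitsandbytes and PEFT.

```python
import random

def quantize_4bit(row):
    """Absmax-quantize a list of floats to integer codes in [-7, 7] (fits in 4 bits)."""
    scale = max(abs(w) for w in row) / 7.0 or 1.0
    return [round(w / scale) for w in row], scale

class QLoRALinear:
    """Toy linear layer: frozen 4-bit base weights plus a trainable low-rank adapter."""

    def __init__(self, weight, rank=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        # Step 1: quantize each row to 4-bit codes; these are never updated (frozen).
        self.codes, self.scales = zip(*(quantize_4bit(r) for r in weight))
        # Step 2: attach the adapter. B starts at zero, so the adapter is a no-op at first.
        self.A = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scaling = alpha / rank

    def base(self, x):
        # Dequantize on the fly: scale * (codes . x) for each output row.
        return [s * sum(c * xi for c, xi in zip(row, x))
                for row, s in zip(self.codes, self.scales)]

    def hidden(self, x):
        # Low-rank projection A x.
        return [sum(a * xi for a, xi in zip(row, x)) for row in self.A]

    def forward(self, x):
        h = self.hidden(x)
        delta = [self.scaling * sum(b * hj for b, hj in zip(row, h)) for row in self.B]
        return [y + d for y, d in zip(self.base(x), delta)]

    def train_step(self, x, target, lr=0.01):
        # Step 3: one gradient step on 0.5 * squared error, updating A and B only.
        h = self.hidden(x)
        err = [y - t for y, t in zip(self.forward(x), target)]
        # Gradient w.r.t. h, computed before B is modified.
        grad_h = [self.scaling * sum(e * row[j] for e, row in zip(err, self.B))
                  for j in range(len(h))]
        for i, e in enumerate(err):
            for j, hj in enumerate(h):
                self.B[i][j] -= lr * e * self.scaling * hj
        for j, g in enumerate(grad_h):
            for k, xk in enumerate(x):
                self.A[j][k] -= lr * g * xk
        return 0.5 * sum(e * e for e in err)

# Usage: random "pretrained" weights, one made-up training example.
rng = random.Random(1)
W = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
layer = QLoRALinear(W)
frozen = [list(r) for r in layer.codes]   # snapshot of the quantized base weights
x, target = [1.0, -0.5, 0.25, 2.0], [0.5, -1.0, 0.0]
losses = [layer.train_step(x, target) for _ in range(200)]
```

After training, the quantized codes are unchanged and only the adapter matrices `A` and `B` have moved, which is the point of the method: the memory-heavy base model stays in low precision while a small number of full-precision parameters absorb the task-specific update.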