
Convert


Model Conversion and Quantization

This document provides technical guidance on converting and quantizing large language models (LLMs) for deployment on diverse hardware, including but not limited to NVIDIA Jetson, AMD, Intel, and Qualcomm platforms. It covers both foundation LLMs and fine-tuned LLMs.

Currently, this version only supports converting models to the GGUF format. Support for more platforms and formats will be added in the future.

Model Sources

  • Foundation LLMs

  • Fine-tuned LLMs

Target Platforms and Formats

This guide covers the following target platforms and formats:

  • GGUF: A binary format for efficient on-device model execution (primarily on CPU), used by the llama.cpp library; see the conversion sketch below.
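For reference, this kind of conversion is conceptually what llama.cpp's convert_hf_to_gguf.py script does when pointed at a Hugging Face-style model directory. Below is a minimal sketch, not GenAI Studio's own implementation; the checkout location and model paths are illustrative assumptions:

```python
import subprocess
from pathlib import Path

# Hypothetical paths -- adjust to your environment.
LLAMA_CPP_DIR = Path("~/llama.cpp").expanduser()            # local llama.cpp checkout
MODEL_DIR = Path("~/models/my-finetuned-llm").expanduser()  # HF-style model directory
OUT_FILE = MODEL_DIR / "my-finetuned-llm-f16.gguf"

# convert_hf_to_gguf.py reads the Hugging Face config and weights and
# writes a single GGUF file; --outtype f16 keeps half-precision weights,
# leaving quantization as a separate, optional step.
subprocess.run(
    [
        "python",
        str(LLAMA_CPP_DIR / "convert_hf_to_gguf.py"),
        str(MODEL_DIR),
        "--outfile", str(OUT_FILE),
        "--outtype", "f16",
    ],
    check=True,
)
print(f"Wrote {OUT_FILE}")
```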

Conversion Process

  1. Select Source Model: Choose from the available foundation LLMs or fine-tuned LLMs.

  2. Model Quantization: (Optional) Apply quantization techniques to reduce model size and improve inference speed.

Quantization Parameters

Quantization is the process of converting model weights from floating-point numbers (e.g., FP32) to lower-precision formats (e.g., INT8). This can significantly reduce model size and improve inference speed, but may slightly decrease accuracy.
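As a concrete illustration of the trade-off, the self-contained sketch below (not GenAI Studio code) applies symmetric 8-bit quantization to a toy weight tensor and measures the round-trip error:

```python
import numpy as np

# Toy FP32 weight tensor standing in for one layer of an LLM.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.02, size=(1024, 1024)).astype(np.float32)

# Symmetric INT8 quantization: map [-max|w|, +max|w|] onto [-127, 127].
scale = np.abs(weights).max() / 127.0
quantized = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# Dequantize and measure the precision lost to the 8-bit representation.
restored = quantized.astype(np.float32) * scale
mean_err = np.abs(weights - restored).mean()

print(f"size reduction: {weights.nbytes / quantized.nbytes:.0f}x")  # 4x (FP32 -> INT8)
print(f"mean absolute error: {mean_err:.2e}")
```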

Common quantization types include:

  • q4_k_m: A 4-bit k-quant ("medium") method; produces the smaller file of the two, at a modest cost in accuracy.

  • q6_k: A 6-bit k-quant method; produces a larger file than q4_k_m but stays closer to the original model's quality.
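These type names follow llama.cpp's conventions, where quantization is applied by the llama-quantize tool, which rewrites an existing GGUF file at lower precision. A hedged sketch, reusing the hypothetical paths from the conversion example above:

```python
import subprocess
from pathlib import Path

LLAMA_CPP_DIR = Path("~/llama.cpp").expanduser()  # hypothetical local checkout
SRC = Path("~/models/my-finetuned-llm/my-finetuned-llm-f16.gguf").expanduser()
DST = SRC.with_name("my-finetuned-llm-q4_k_m.gguf")

# llama-quantize takes an input GGUF, an output path, and a quantization
# type name such as Q4_K_M or Q6_K.
subprocess.run(
    [str(LLAMA_CPP_DIR / "build" / "bin" / "llama-quantize"),
     str(SRC), str(DST), "Q4_K_M"],
    check=True,
)
```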

Instructions

  1. Name: Enter a name for the converted model (letters, numbers, . - _ only).

  2. Description: Provide an optional description for the model (limit 20 characters).

  3. Source Model: Select the base model from the dropdown menu.

  4. Quantization Type: Select the desired quantization type from the dropdown menu.

  5. Convert: Click the "Convert" button to begin the conversion process.
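After the conversion completes, the resulting GGUF file can be sanity-checked outside the Studio with llama.cpp's llama-cli binary. This is a hedged sketch; the build path and model location are assumptions about a typical local setup:

```python
import subprocess
from pathlib import Path

LLAMA_CPP_DIR = Path("~/llama.cpp").expanduser()  # hypothetical local checkout
MODEL = Path("~/models/my-finetuned-llm/my-finetuned-llm-q4_k_m.gguf").expanduser()

# Generate a short completion; if the model loads and produces coherent
# text, the conversion and quantization round-trip succeeded.
subprocess.run(
    [
        str(LLAMA_CPP_DIR / "build" / "bin" / "llama-cli"),
        "-m", str(MODEL),
        "-p", "Hello, world!",
        "-n", "32",
    ],
    check=True,
)
```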