Validation

Model Validation provides a tool for side-by-side comparison of multiple LLM models, including those that have been fine-tuned. This feature allows you to assess model performance at different training stages (epochs) by evaluating their responses to a given set of questions.

Validation Interface

Configuration Options for Model Validation

  • Max Token Input: Sets an upper limit on the number of tokens the model may generate per response.

  • Temperature Control: Provides a slider to adjust the randomness of the model's outputs — lower values yield more deterministic responses, higher values more creative ones.

  • Top-p Sampling: Restricts generation to the smallest set of candidate tokens whose cumulative probability exceeds p, letting users trade response diversity against coherence.
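These three controls correspond to the standard sampling parameters exposed by most LLM inference APIs. The sketch below illustrates how they might be combined in a request payload; the field names are assumptions, not the platform's documented API.

```python
# Hypothetical validation request payload; the field names are assumptions,
# not this platform's documented API.
validation_config = {
    "max_tokens": 512,   # upper bound on generated tokens
    "temperature": 0.7,  # 0.0 = deterministic, higher = more creative
    "top_p": 0.9,        # keep smallest token set with cumulative prob >= 0.9
}

# Lower temperature and a top_p of 1.0 make side-by-side runs more
# reproducible, which is usually desirable when comparing training epochs.
deterministic_config = {**validation_config, "temperature": 0.0, "top_p": 1.0}
```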

Model Selection Drop-down

  • Clicking the model list lets you choose from the available models, which include both pre-trained base models and custom models that have been fine-tuned for specific tasks.

  • Example Models:

    • Foundation Model: meta-llama/Llama-3.1-8B-Instruct

    • Fine-tuned Model: AIR-8B-Epoch 4

Side-by-Side Output Comparison

  • The interface displays outputs from different models or configurations in parallel for easy comparison.

  • Each column corresponds to a model with its specific parameters.

Question Input Area

  • A dedicated input field for entering questions or prompts to be validated.

  • Users can add questions using the text input area at the bottom of the interface or upload a JSON file to provide a batch of questions.

The uploaded file must be valid JSON: a single array (indicated by square brackets []) of strings, each string representing one question.

For example:
[
  "What processor is integrated into the AIR-100 system?",
  "What graphics engine is used for HDMI-1 and HDMI-2 outputs in the AIR-100?",
  "How many HDMI outputs does the AIR-100 support, and what are their versions?"
]
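A question file in this format can be sanity-checked before upload. A minimal sketch in Python (the file name is an assumption):

```python
import json

def load_questions(path):
    """Load a batch of questions and verify it is a flat list of strings."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list) or not all(isinstance(q, str) for q in data):
        raise ValueError("expected a JSON array of question strings")
    return data

# Example: write a question batch to disk, then load it back.
questions = [
    "What processor is integrated into the AIR-100 system?",
    "How many HDMI outputs does the AIR-100 support, and what are their versions?",
]
with open("questions.json", "w", encoding="utf-8") as f:
    json.dump(questions, f)

print(load_questions("questions.json"))
```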

Action Buttons

  • Submit Button: Initiates the validation process based on the current configuration.

  • Reset Button: Clears all question inputs.

Validate and Compare the Models

Upon clicking Submit, the system queries both models with the provided questions. Expect a brief delay while the models are loaded onto the GPU. Once the models have generated their responses, a "like" icon appears next to each answer; click it for any response you are satisfied with. The system records the number of "likes" each model receives, and this tally is used later when selecting a model for quantization.

  • Case I: Foundation Model vs Fine-tuned Model (Epoch 4)

  • Case II: Fine-tuned Model (Epoch 1) vs Fine-tuned Model (Epoch 4)
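The per-model "like" tally described above can be modeled as a simple counter keyed by model name. A sketch with a stubbed query function (all function and variable names here are hypothetical; the real platform handles this server-side):

```python
from collections import Counter

def query_model(model_name, question):
    """Stub for the real inference call; returns a placeholder answer."""
    return f"[{model_name}] answer to: {question}"

def run_validation(models, questions):
    """Query every model with every question, side by side."""
    return {q: {m: query_model(m, q) for m in models} for q in questions}

likes = Counter()

def like(model_name):
    """Record a thumbs-up for one model's answer."""
    likes[model_name] += 1

# Case I: foundation model vs fine-tuned epoch 4.
models = ["meta-llama/Llama-3.1-8B-Instruct", "AIR-8B-Epoch 4"]
results = run_validation(
    models, ["What processor is integrated into the AIR-100 system?"]
)
like("AIR-8B-Epoch 4")       # reviewer preferred the fine-tuned answer
print(likes.most_common(1))  # the tally later guides model selection
```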

Action Buttons

  • Download: Download the questions and their corresponding model-generated answers as a CSV file.

  • Create (Model Quantization):

    • Create Workspace with this inference: Quantizes the selected model and creates a new Workspace that uses the quantized model directly. The default quantization format is q4_k_m; additional quantization options are available in advanced mode. The model selection list shows fine-tuned models from each training epoch, along with a statistical summary of user ratings for each model's responses to aid your selection.

Q4_K_M quantization is designed to:

  • Reduce memory usage: By lowering the precision of weights, models can run on devices with limited RAM.

  • Improve inference speed: Quantized models require fewer computational resources, making them faster.

  • Maintain acceptable accuracy: Advanced quantization methods like K-type aim to minimize the loss in model performance caused by reduced precision.

For example, in llama.cpp, a Q4_K_M model uses 4-bit quantization with K-type optimization, striking a balance between memory efficiency and model accuracy.
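The memory savings can be estimated from bits per weight. A rough back-of-envelope calculation for an 8B-parameter model, assuming Q4_K_M averages about 4.5 bits per weight (some tensors are kept at higher precision, and exact figures vary by llama.cpp version):

```python
params = 8e9  # 8B-parameter model, e.g. Llama-3.1-8B

def weight_bytes(bits_per_weight):
    """Approximate weight storage in GiB for a given average precision."""
    return params * bits_per_weight / 8 / 2**30

fp16   = weight_bytes(16)    # roughly 14.9 GiB
q4_k_m = weight_bytes(4.5)   # roughly 4.2 GiB -- assumed average bits/weight

print(f"FP16: {fp16:.1f} GiB, Q4_K_M: {q4_k_m:.1f} GiB")
```

This is weight storage only; runtime memory also includes the KV cache and activations, so actual requirements are somewhat higher.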

Breaking down Q4_K_M:

  • Q:

    • Represents Quantization, a process of reducing the precision of model weights (e.g., from 16-bit floating point to 4-bit integers).

    • Quantization reduces the memory footprint and speeds up inference, especially on resource-constrained devices.

  • K:

    • Refers to a K-type quantization method, which is a specific algorithm or approach used to optimize the quantization process.

    • K-type quantization typically focuses on minimizing perplexity loss (a measure of model performance) while maintaining efficiency. It is often more advanced than simpler quantization methods.

  • M:

    • Likely stands for a mode or configuration within the quantization method. For example:

      • M could mean "Medium," indicating a balance between performance and efficiency.

      • Other suffixes (e.g., S for "Small" or L for "Large") might represent different trade-offs between speed, memory usage, and accuracy.

  • Import to Ollama's Inference Repo: Places the quantized model into a repository that Ollama can access and serve.

Once the model has been quantized, you can locate it in the AI Provider section, under LLM and then Ollama.
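For reference, importing a quantized GGUF file into Ollama manually uses a Modelfile; the file name and path below are illustrative assumptions:

```
# Modelfile -- model path is an illustrative assumption
FROM ./AIR-8B-Epoch4-q4_k_m.gguf
PARAMETER temperature 0.7
```

Running `ollama create air-8b -f Modelfile` would register the model so that `ollama run air-8b` serves it; the platform's Import action performs the equivalent step automatically.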
