Full Parameter
Last updated
After selecting Add Task and assigning a task name, the training workflow begins. The process is divided into four main steps: Settings, Training, Validation, and Finished.
In this step, users configure the model and dataset for training, as well as define the training parameters.
Select the model to fine-tune from the list.
If all models are grayed out, they have not yet been downloaded. To download a model, use the Model Management interface to download the desired model.
Choose an existing dataset from the list.
If the dataset list is empty, create a dataset first: in the Dataset Management interface, you can upload a dataset or use the LLM to generate one from your input files (e.g., PDF or Word documents).
Configure the following parameters:
Batch Size
Meaning: The number of data samples the model processes at a time during training. It’s like studying 10 pages of a book in one sitting; those 10 pages are the Batch Size.
Key Considerations:
Too small: The model may not learn efficiently, and training can become unstable.
Too large: It requires more memory (e.g., GPU VRAM) and might slow down training.
Total Batch Size
Meaning: If you’re training with multiple GPUs, this is the sum of Batch Sizes across all GPUs. For example, if each GPU processes 32 samples and you have 4 GPUs, the Total Batch Size is 32 × 4 = 128.
Key Considerations:
Overall size impacts learning: Larger sizes can stabilize training but might require adjustments to other parameters like the Learning Rate.
Maximum Sequence Length
Meaning: The maximum number of tokens (words, subwords, or characters) the model processes in one input. Think of it as the maximum length of a sentence or paragraph the model can read at once.
Key Considerations:
Longer sequences: Provide more context but require more memory and computational power.
Shorter sequences: Are faster to process but may lose important context.
Learning Rate
Meaning: How much the model adjusts its parameters with each training step. It’s like deciding how big of a step you take when walking toward your goal.
Key Considerations:
Too high: The model might overshoot the optimal solution, leading to instability.
Too low: Training becomes slow, and the model might get stuck at a suboptimal solution.
Epoch
Meaning: One complete pass through the entire training dataset. If you have a book with 100 pages, reading all 100 pages once is one Epoch.
Key Considerations:
Too few epochs: The model may underfit (not learn enough from the data).
Too many epochs: The model may overfit (memorize the data instead of generalizing well).
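To see how these parameters fit together, here is a minimal sketch in plain Python, using hypothetical values chosen for illustration (not defaults of this product), that derives the resulting training schedule:

```python
import math

# Hypothetical configuration values for illustration only.
batch_size_per_gpu = 32      # Batch Size: samples each GPU processes per step
num_gpus = 4
total_batch_size = batch_size_per_gpu * num_gpus  # 32 x 4 = 128
dataset_size = 10_000        # number of samples in the training dataset
epochs = 3                   # complete passes through the dataset
learning_rate = 2e-5         # step size for each parameter update

# Steps needed to see every sample once (one Epoch), and overall.
steps_per_epoch = math.ceil(dataset_size / total_batch_size)
total_steps = steps_per_epoch * epochs

print(total_batch_size)   # 128
print(steps_per_epoch)    # ceil(10000 / 128) = 79
print(total_steps)        # 79 * 3 = 237
```

This also shows why Total Batch Size and Learning Rate interact: doubling the number of GPUs halves the steps per epoch, so each step covers more data and the learning rate may need adjusting.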
Once all configurations are complete, proceed to the next step by clicking Start Training.
A progress bar indicates the real-time status of training, along with the elapsed time (e.g., 15h 17m 13s). Users can click the Stop button to immediately halt the training process.
GPU Utilization: Includes real-time and maximum values for GPU usage (e.g., 0% out of 100%).
VRAM Usage: Indicates both current (e.g., 0.89%) and peak (e.g., 93%) memory usage.
Temperature: Tracks the current (e.g., 31°C) and peak (e.g., 71°C) GPU temperature.
Fan Speed: Displays real-time fan speed as a percentage of maximum capacity (e.g., 30%) and its peak (e.g., 40%).
CPU Utilization: Shows real-time (e.g., 0%) and peak (e.g., 93%) CPU usage.
Memory Utilization: Shows current (e.g., 2%) and peak (e.g., 17%) system memory usage.
AI SSD Usage: Monitors SSD usage specifically allocated for AI operations (e.g., 21%) and its peak (e.g., 20%).
A dynamic graph tracks the loss rate over epochs and prominently highlights loss improvements. This visualization allows users to quickly assess training effectiveness and convergence trends.
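Assuming per-step losses are available, the per-epoch points such a graph plots could be derived as below. This is a hypothetical sketch with made-up loss values, not this tool's code:

```python
# Hypothetical per-step training losses, grouped into epochs of 4 steps each.
step_losses = [2.0, 1.5, 1.5, 1.0,    # epoch 1
               1.0, 1.0, 0.5, 0.5,    # epoch 2
               0.5, 0.5, 0.25, 0.25]  # epoch 3
steps_per_epoch = 4

# Average the step losses within each epoch to get one point per epoch.
epoch_losses = [
    sum(step_losses[i:i + steps_per_epoch]) / steps_per_epoch
    for i in range(0, len(step_losses), steps_per_epoch)
]
print(epoch_losses)  # [1.5, 0.75, 0.375]
```

A steadily decreasing sequence like this is what a healthy convergence trend looks like on the graph; a flat or rising tail over later epochs would suggest underfitting or overfitting, as described under Epoch above.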
Real-time logs provide granular information about training iterations, including specific timestamps and operations performed (e.g., Forward, Backward, Save Model_Checkpoint).
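As an illustration of consuming such logs programmatically, the sketch below splits a log line into a timestamp and an operation. The exact log format shown here is an assumption for the example, not documented behavior:

```python
import re

# Assumed line format for illustration: "<date> <time> <operation>".
LOG_PATTERN = re.compile(r"^(?P<timestamp>\S+ \S+)\s+(?P<operation>.+)$")

lines = [
    "2024-01-01 12:00:01 Forward",
    "2024-01-01 12:00:02 Backward",
    "2024-01-01 12:00:15 Save Model_Checkpoint",
]

events = []
for line in lines:
    match = LOG_PATTERN.match(line)
    if match:
        events.append((match.group("timestamp"), match.group("operation")))

print(events[-1])  # ('2024-01-01 12:00:15', 'Save Model_Checkpoint')
```

Parsing the downloaded log files this way makes it easy to, for example, count checkpoint saves or measure the time between Forward and Backward operations.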
Validation Overview
Model Validation offers tools for side-by-side comparison of multiple Large Language Models (LLMs), including fine-tuned models. It evaluates model performance across different training stages (e.g., epochs) by analyzing responses to a given set of questions.
For detailed instructions and examples on using this feature effectively, see the Validation operation page, which provides step-by-step procedures, best practices, and troubleshooting tips.
Upon successful model validation and quantization, you will be redirected to either the model repository (ollama) or your designated workspace.
Click the icon next to "Model" to open the Model Management window.
Click the icon next to "Dataset" to open the Dataset Management window.
Users can click the icon next to the Log section to instantly download detailed log files.