diff --git a/README-ja.md b/README-ja.md new file mode 100644 index 0000000..c661a16 --- /dev/null +++ b/README-ja.md @@ -0,0 +1,138 @@ +## リポジトリについて +Stable Diffusionの学習、画像生成、その他のスクリプトを入れたリポジトリです。 + +[README in English](./README.md) ←更新情報はこちらにあります + +GUIやPowerShellスクリプトなど、より使いやすくする機能が[bmaltais氏のリポジトリ](https://github.com/bmaltais/kohya_ss)で提供されています(英語です)のであわせてご覧ください。bmaltais氏に感謝します。 + +以下のスクリプトがあります。 + +* DreamBooth、U-NetおよびText Encoderの学習をサポート +* fine-tuning、同上 +* 画像生成 +* モデル変換(Stable Diffision ckpt/safetensorsとDiffusersの相互変換) + +## 使用法について + +当リポジトリ内およびnote.comに記事がありますのでそちらをご覧ください(将来的にはすべてこちらへ移すかもしれません)。 + +* [DreamBoothの学習について](./train_db_README-ja.md) +* [fine-tuningのガイド](./fine_tune_README_ja.md): +BLIPによるキャプショニングと、DeepDanbooruまたはWD14 taggerによるタグ付けを含みます +* [LoRAの学習について](./train_network_README-ja.md) +* [Textual Inversionの学習について](./train_ti_README-ja.md) +* note.com [画像生成スクリプト](https://note.com/kohya_ss/n/n2693183a798e) +* note.com [モデル変換スクリプト](https://note.com/kohya_ss/n/n374f316fe4ad) + +## Windowsでの動作に必要なプログラム + +Python 3.10.6およびGitが必要です。 + +- Python 3.10.6: https://www.python.org/ftp/python/3.10.6/python-3.10.6-amd64.exe +- git: https://git-scm.com/download/win + +PowerShellを使う場合、venvを使えるようにするためには以下の手順でセキュリティ設定を変更してください。 +(venvに限らずスクリプトの実行が可能になりますので注意してください。) + +- PowerShellを管理者として開きます。 +- 「Set-ExecutionPolicy Unrestricted」と入力し、Yと答えます。 +- 管理者のPowerShellを閉じます。 + +## Windows環境でのインストール + +以下の例ではPyTorchは1.12.1/CUDA 11.6版をインストールします。CUDA 11.3版やPyTorch 1.13を使う場合は適宜書き換えください。 + +(なお、python -m venv~の行で「python」とだけ表示された場合、py -m venv~のようにpythonをpyに変更してください。) + +通常の(管理者ではない)PowerShellを開き以下を順に実行します。 + +```powershell +git clone https://github.com/kohya-ss/sd-scripts.git +cd sd-scripts + +python -m venv venv +.\venv\Scripts\activate + +pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116 +pip install --upgrade -r requirements.txt +pip install -U -I --no-deps https://github.com/C43H66N12O12S2/stable-diffusion-webui/releases/download/f/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl + +cp .\bitsandbytes_windows\*.dll .\venv\Lib\site-packages\bitsandbytes\ +cp .\bitsandbytes_windows\cextension.py .\venv\Lib\site-packages\bitsandbytes\cextension.py +cp .\bitsandbytes_windows\main.py .\venv\Lib\site-packages\bitsandbytes\cuda_setup\main.py + +accelerate config +``` + +コマンドプロンプトでは以下になります。 + + +```bat +git clone https://github.com/kohya-ss/sd-scripts.git +cd sd-scripts + +python -m venv venv +.\venv\Scripts\activate + +pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116 +pip install --upgrade -r requirements.txt +pip install -U -I --no-deps https://github.com/C43H66N12O12S2/stable-diffusion-webui/releases/download/f/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl + +copy /y .\bitsandbytes_windows\*.dll .\venv\Lib\site-packages\bitsandbytes\ +copy /y .\bitsandbytes_windows\cextension.py .\venv\Lib\site-packages\bitsandbytes\cextension.py +copy /y .\bitsandbytes_windows\main.py .\venv\Lib\site-packages\bitsandbytes\cuda_setup\main.py + +accelerate config +``` + +(注:``python -m venv venv`` のほうが ``python -m venv --system-site-packages venv`` より安全そうなため書き換えました。globalなpythonにパッケージがインストールしてあると、後者だといろいろと問題が起きます。) + +accelerate configの質問には以下のように答えてください。(bf16で学習する場合、最後の質問にはbf16と答えてください。) + +※0.15.0から日本語環境では選択のためにカーソルキーを押すと落ちます(……)。数字キーの0、1、2……で選択できますので、そちらを使ってください。 + +```txt +- This machine +- No distributed training +- NO +- NO +- NO +- all +- fp16 +``` + +※場合によって ``ValueError: fp16 mixed 
precision requires a GPU`` というエラーが出ることがあるようです。この場合、6番目の質問( +``What GPU(s) (by id) should be used for training on this machine as a comma-separated list? [all]:``)に「0」と答えてください。(id `0`のGPUが使われます。) + +### PyTorchとxformersのバージョンについて + +他のバージョンでは学習がうまくいかない場合があるようです。特に他の理由がなければ指定のバージョンをお使いください。 + +## アップグレード + +新しいリリースがあった場合、以下のコマンドで更新できます。 + +```powershell +cd sd-scripts +git pull +.\venv\Scripts\activate +pip install --upgrade -r requirements.txt +``` + +コマンドが成功すれば新しいバージョンが使用できます。 + +## 謝意 + +LoRAの実装は[cloneofsimo氏のリポジトリ](https://github.com/cloneofsimo/lora)を基にしたものです。感謝申し上げます。 + +## ライセンス + +スクリプトのライセンスはASL 2.0ですが(Diffusersおよびcloneofsimo氏のリポジトリ由来のものも同様)、一部他のライセンスのコードを含みます。 + +[Memory Efficient Attention Pytorch](https://github.com/lucidrains/memory-efficient-attention-pytorch): MIT + +[bitsandbytes](https://github.com/TimDettmers/bitsandbytes): MIT + +[BLIP](https://github.com/salesforce/BLIP): BSD-3-Clause + + diff --git a/README.md b/README.md index 6148d85..860d821 100644 --- a/README.md +++ b/README.md @@ -6,8 +6,9 @@ This repository repository is providing a Gradio GUI for kohya's Stable Diffusio Python 3.10.6+ and Git: -- Python 3.10.6+: https://www.python.org/ftp/python/3.10.9/python-3.10.9-amd64.exe +- Install Python 3.10 using https://www.python.org/ftp/python/3.10.9/python-3.10.9-amd64.exe (make sure to tick the box to add Python to the environment path) - git: https://git-scm.com/download/win +- Visual Studio 2015, 2017, 2019, and 2022 redistributable: https://aka.ms/vs/17/release/vc_redist.x64.exe ## Installation @@ -23,7 +24,7 @@ Open a regular user Powershell terminal and type the following inside: git clone https://github.com/bmaltais/kohya_ss.git cd kohya_ss -python -m venv --system-site-packages venv +python -m venv venv .\venv\Scripts\activate pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116 @@ -40,7 +41,7 @@ ### Optional: CUDNN 8.6 -This step is optional but can improve the learning speed for NVidia 4090 owners... +This step is optional but can improve the learning speed for NVidia 30X0/40X0 owners... It allows larger training batch size and faster training speed Due to the filesize I can't host the DLLs needed for CUDNN 8.6 on Github, I strongly advise you download them for a speed boost in sample generation (almost 50% on 4090) you can download them from here: https://b1.thefileditch.ch/mwxKTEtelILoIbMbruuM.zip @@ -130,6 +131,9 @@ Then redo the installation instruction within the kohya_ss venv. ## Change history +* 2023/01/26 (v20.5.0): + - Add new `Dreambooth TI` tab for training of Textual Inversion embeddings + - Add Textual Inversion training. Documentation is [here](./train_ti_README-ja.md) (in Japanese.) * 2023/01/22 (v20.4.1): - Add new tool to verify LoRA weights produced by the trainer.
Can be found under "Dreambooth LoRA/Tools/Verify LoRA" * 2023/01/22 (v20.4.0): diff --git a/kohya_gui.py b/kohya_gui.py index 5224be2..bc7faa8 100644 --- a/kohya_gui.py +++ b/kohya_gui.py @@ -3,6 +3,7 @@ import os import argparse from dreambooth_gui import dreambooth_tab from finetune_gui import finetune_tab +from textual_inversion_gui import ti_tab from library.utilities import utilities_tab from library.extract_lora_gui import gradio_extract_lora_tab from library.merge_lora_gui import gradio_merge_lora_tab @@ -30,6 +31,8 @@ def UI(username, password): ) = dreambooth_tab() with gr.Tab('Dreambooth LoRA'): lora_tab() + with gr.Tab('Dreambooth TI'): + ti_tab() with gr.Tab('Finetune'): finetune_tab() with gr.Tab('Utilities'): diff --git a/library/common_gui.py b/library/common_gui.py index eaff569..4b5e8dc 100644 --- a/library/common_gui.py +++ b/library/common_gui.py @@ -424,8 +424,8 @@ def gradio_training(learning_rate_value='1e-6', lr_scheduler_value='constant', l minimum=1, maximum=os.cpu_count(), step=1, - label='Number of CPU threads per process', - value=os.cpu_count(), + label='Number of CPU threads per core', + value=2, ) seed = gr.Textbox(label='Seed', value=1234) with gr.Row(): diff --git a/library/train_util.py b/library/train_util.py index 0fdbadc..85b58d7 100644 --- a/library/train_util.py +++ b/library/train_util.py @@ -12,6 +12,7 @@ import math import os import random import hashlib +from io import BytesIO from tqdm import tqdm import torch @@ -25,6 +26,7 @@ from PIL import Image import cv2 from einops import rearrange from torch import einsum +import safetensors.torch import library.model_util as model_util @@ -85,6 +87,7 @@ class BaseDataset(torch.utils.data.Dataset): self.enable_bucket = False self.min_bucket_reso = None self.max_bucket_reso = None + self.bucket_info = None self.tokenizer_max_length = self.tokenizer.model_max_length if max_token_length is None else max_token_length + 2 @@ -110,9 +113,14 @@ class BaseDataset(torch.utils.data.Dataset): self.image_data: dict[str, ImageInfo] = {} + self.replacements = {} + def disable_token_padding(self): self.token_padding_disabled = True + def add_replacement(self, str_from, str_to): + self.replacements[str_from] = str_to + def process_caption(self, caption): if self.shuffle_caption: tokens = caption.strip().split(",") @@ -125,6 +133,17 @@ class BaseDataset(torch.utils.data.Dataset): random.shuffle(tokens) tokens = keep_tokens + tokens caption = ",".join(tokens).strip() + + for str_from, str_to in self.replacements.items(): + if str_from == "": + # replace all + if type(str_to) == list: + caption = random.choice(str_to) + else: + caption = str_to + else: + caption = caption.replace(str_from, str_to) + return caption def get_input_ids(self, caption): @@ -217,11 +236,17 @@ class BaseDataset(torch.utils.data.Dataset): self.buckets[bucket_index].append(image_info.image_key) if self.enable_bucket: + self.bucket_info = {"buckets": {}} print("number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)") for i, (reso, img_keys) in enumerate(zip(bucket_resos, self.buckets)): + self.bucket_info["buckets"][i] = {"resolution": reso, "count": len(img_keys)} print(f"bucket {i}: resolution {reso}, count: {len(img_keys)}") + img_ar_errors = np.array(img_ar_errors) - print(f"mean ar error (without repeats): {np.mean(np.abs(img_ar_errors))}") + mean_img_ar_error = np.mean(np.abs(img_ar_errors)) + self.bucket_info["mean_img_ar_error"] = mean_img_ar_error + print(f"mean ar error (without repeats): {mean_img_ar_error}") + # 参照用indexを作る 
self.buckets_indices: list(BucketBatchIndex) = [] @@ -599,7 +624,7 @@ class FineTuningDataset(BaseDataset): else: # わりといい加減だがいい方法が思いつかん abs_path = glob_images(train_data_dir, image_key) - assert len(abs_path) >= 1, f"no image / 画像がありません: {abs_path}" + assert len(abs_path) >= 1, f"no image / 画像がありません: {image_key}" abs_path = abs_path[0] caption = img_md.get('caption') @@ -706,15 +731,17 @@ class FineTuningDataset(BaseDataset): return npz_file_norm, npz_file_flip -def debug_dataset(train_dataset): +def debug_dataset(train_dataset, show_input_ids=False): print(f"Total dataset length (steps) / データセットの長さ(ステップ数): {len(train_dataset)}") print("Escape for exit. / Escキーで中断、終了します") k = 0 for example in train_dataset: if example['latents'] is not None: print("sample has latents from npz file") - for j, (ik, cap, lw) in enumerate(zip(example['image_keys'], example['captions'], example['loss_weights'])): + for j, (ik, cap, lw, iid) in enumerate(zip(example['image_keys'], example['captions'], example['loss_weights'], example['input_ids'])): print(f'{ik}, size: {train_dataset.image_data[ik].image_size}, caption: "{cap}", loss weight: {lw}') + if show_input_ids: + print(f"input ids: {iid}") if example['images'] is not None: im = example['images'][j] im = ((im.numpy() + 1.0) * 127.5).astype(np.uint8) @@ -790,6 +817,49 @@ def calculate_sha256(filename): return hash_sha256.hexdigest() +def precalculate_safetensors_hashes(tensors, metadata): + """Precalculate the model hashes needed by sd-webui-additional-networks to + save time on indexing the model later.""" + + # Because writing user metadata to the file can change the result of + # sd_models.model_hash(), only retain the training metadata for purposes of + # calculating the hash, as they are meant to be immutable + metadata = {k: v for k, v in metadata.items() if k.startswith("ss_")} + + bytes = safetensors.torch.save(tensors, metadata) + b = BytesIO(bytes) + + model_hash = addnet_hash_safetensors(b) + legacy_hash = addnet_hash_legacy(b) + return model_hash, legacy_hash + + +def addnet_hash_legacy(b): + """Old model hash used by sd-webui-additional-networks for .safetensors format files""" + m = hashlib.sha256() + + b.seek(0x100000) + m.update(b.read(0x10000)) + return m.hexdigest()[0:8] + + +def addnet_hash_safetensors(b): + """New model hash used by sd-webui-additional-networks for .safetensors format files""" + hash_sha256 = hashlib.sha256() + blksize = 1024 * 1024 + + b.seek(0) + header = b.read(8) + n = int.from_bytes(header, "little") + + offset = n + 8 + b.seek(offset) + for chunk in iter(lambda: b.read(blksize), b""): + hash_sha256.update(chunk) + + return hash_sha256.hexdigest() + + # flash attention forwards and backwards # https://arxiv.org/abs/2205.14135 @@ -1057,6 +1127,8 @@ def add_training_arguments(parser: argparse.ArgumentParser, support_dreambooth: choices=[None, "float", "fp16", "bf16"], help="precision in saving / 保存時に精度を変更して保存する") parser.add_argument("--save_every_n_epochs", type=int, default=None, help="save checkpoint every N epochs / 学習中のモデルを指定エポックごとに保存する") + parser.add_argument("--save_n_epoch_ratio", type=int, default=None, + help="save checkpoint N epoch ratio (for example 5 means save at least 5 files total) / 学習中のモデルを指定のエポック割合で保存する(たとえば5を指定すると最低5個のファイルが保存される)") parser.add_argument("--save_last_n_epochs", type=int, default=None, help="save last N checkpoints / 最大Nエポック保存する") parser.add_argument("--save_last_n_epochs_state", type=int, default=None, help="save last N checkpoints of state (overrides the value of 
--save_last_n_epochs)/ 最大Nエポックstateを保存する(--save_last_n_epochsの指定を上書きします)") diff --git a/lora_gui.py b/lora_gui.py index 5c62d08..14ab60b 100644 --- a/lora_gui.py +++ b/lora_gui.py @@ -275,6 +275,9 @@ def train_model( msgbox('Output folder path is missing') return + if not os.path.exists(output_dir): + os.makedirs(output_dir) + if stop_text_encoder_training_pct > 0: msgbox('Output "stop text encoder training" is not yet supported. Ignoring') stop_text_encoder_training_pct = 0 diff --git a/networks/lora.py b/networks/lora.py index 9243f1e..174feda 100644 --- a/networks/lora.py +++ b/networks/lora.py @@ -7,6 +7,8 @@ import math import os import torch +from library import train_util + class LoRAModule(torch.nn.Module): """ @@ -31,7 +33,7 @@ class LoRAModule(torch.nn.Module): self.lora_up = torch.nn.Linear(lora_dim, out_dim, bias=False) if type(alpha) == torch.Tensor: - alpha = alpha.detach().numpy() + alpha = alpha.detach().float().numpy() # without casting, bf16 causes error alpha = lora_dim if alpha is None or alpha == 0 else alpha self.scale = alpha / self.lora_dim self.register_buffer('alpha', torch.tensor(alpha)) # 定数として扱える @@ -221,6 +223,14 @@ class LoRANetwork(torch.nn.Module): if os.path.splitext(file)[1] == '.safetensors': from safetensors.torch import save_file + + # Precalculate model hashes to save time on indexing + if metadata is None: + metadata = {} + model_hash, legacy_hash = train_util.precalculate_safetensors_hashes(state_dict, metadata) + metadata["sshs_model_hash"] = model_hash + metadata["sshs_legacy_hash"] = legacy_hash + save_file(state_dict, file, metadata) else: torch.save(state_dict, file) diff --git a/textual_inversion_gui.py b/textual_inversion_gui.py new file mode 100644 index 0000000..86d026c --- /dev/null +++ b/textual_inversion_gui.py @@ -0,0 +1,777 @@ +# v1: initial release +# v2: add open and save folder icons +# v3: Add new Utilities tab for Dreambooth folder preparation +# v3.1: Adding captionning of images to utilities + +import gradio as gr +import json +import math +import os +import subprocess +import pathlib +import argparse +from library.common_gui import ( + get_folder_path, + remove_doublequote, + get_file_path, + get_any_file_path, + get_saveasfile_path, + color_aug_changed, + save_inference_file, + gradio_advanced_training, + run_cmd_advanced_training, + run_cmd_training, + gradio_training, + gradio_config, + gradio_source_model, +) +from library.dreambooth_folder_creation_gui import ( + gradio_dreambooth_folder_creation_tab, +) +from library.utilities import utilities_tab +from easygui import msgbox + +folder_symbol = '\U0001f4c2' # 📂 +refresh_symbol = '\U0001f504' # 🔄 +save_style_symbol = '\U0001f4be' # 💾 +document_symbol = '\U0001F4C4' # 📄 + + +def save_configuration( + save_as, + file_path, + pretrained_model_name_or_path, + v2, + v_parameterization, + logging_dir, + train_data_dir, + reg_data_dir, + output_dir, + max_resolution, + learning_rate, + lr_scheduler, + lr_warmup, + train_batch_size, + epoch, + save_every_n_epochs, + mixed_precision, + save_precision, + seed, + num_cpu_threads_per_process, + cache_latents, + caption_extension, + enable_bucket, + gradient_checkpointing, + full_fp16, + no_token_padding, + stop_text_encoder_training, + use_8bit_adam, + xformers, + save_model_as, + shuffle_caption, + save_state, + resume, + prior_loss_weight, + color_aug, + flip_aug, + clip_skip, + vae, + output_name, + max_token_length, + max_train_epochs, + max_data_loader_n_workers, + mem_eff_attn, + gradient_accumulation_steps, + model_list, 
token_string, init_word, num_vectors_per_token, max_train_steps, weights, template, +): + # Get list of function parameters and values + parameters = list(locals().items()) + + original_file_path = file_path + + save_as_bool = True if save_as.get('label') == 'True' else False + + if save_as_bool: + print('Save as...') + file_path = get_saveasfile_path(file_path) + else: + print('Save...') + if file_path == None or file_path == '': + file_path = get_saveasfile_path(file_path) + + # print(file_path) + + if file_path == None or file_path == '': + return original_file_path # In case a file_path was provided and the user decide to cancel the open action + + # Return the values of the variables as a dictionary + variables = { + name: value + for name, value in parameters # locals().items() + if name + not in [ + 'file_path', + 'save_as', + ] + } + + # Save the data to the selected file + with open(file_path, 'w') as file: + json.dump(variables, file, indent=2) + + return file_path + + +def open_configuration( + file_path, + pretrained_model_name_or_path, + v2, + v_parameterization, + logging_dir, + train_data_dir, + reg_data_dir, + output_dir, + max_resolution, + learning_rate, + lr_scheduler, + lr_warmup, + train_batch_size, + epoch, + save_every_n_epochs, + mixed_precision, + save_precision, + seed, + num_cpu_threads_per_process, + cache_latents, + caption_extension, + enable_bucket, + gradient_checkpointing, + full_fp16, + no_token_padding, + stop_text_encoder_training, + use_8bit_adam, + xformers, + save_model_as, + shuffle_caption, + save_state, + resume, + prior_loss_weight, + color_aug, + flip_aug, + clip_skip, + vae, + output_name, + max_token_length, + max_train_epochs, + max_data_loader_n_workers, + mem_eff_attn, + gradient_accumulation_steps, + model_list, token_string, init_word, num_vectors_per_token, max_train_steps, weights, template, +): + # Get list of function parameters and values + parameters = list(locals().items()) + + original_file_path = file_path + file_path = get_file_path(file_path) + + if not file_path == '' and not file_path == None: + # load variables from JSON file + with open(file_path, 'r') as f: + my_data_db = json.load(f) + print('Loading config...') + else: + file_path = original_file_path # In case a file_path was provided and the user decide to cancel the open action + my_data_db = {} + + values = [file_path] + for key, value in parameters: + # Set the value in the dictionary to the corresponding value in `my_data`, or the default value if not found + if not key in ['file_path']: + values.append(my_data_db.get(key, value)) + return tuple(values) + + +def train_model( + pretrained_model_name_or_path, + v2, + v_parameterization, + logging_dir, + train_data_dir, + reg_data_dir, + output_dir, + max_resolution, + learning_rate, + lr_scheduler, + lr_warmup, + train_batch_size, + epoch, + save_every_n_epochs, + mixed_precision, + save_precision, + seed, + num_cpu_threads_per_process, + cache_latents, + caption_extension, + enable_bucket, + gradient_checkpointing, + full_fp16, + no_token_padding, + stop_text_encoder_training_pct, + use_8bit_adam, + xformers, + save_model_as, + shuffle_caption, + save_state, + resume, + prior_loss_weight, + color_aug, + flip_aug, + clip_skip, + vae, + output_name, + max_token_length, + max_train_epochs, + max_data_loader_n_workers, + mem_eff_attn, + gradient_accumulation_steps, + model_list, # Keep this. 
Yes, it is unused here but required given the common list used + token_string, init_word, num_vectors_per_token, max_train_steps, weights, template, +): + if pretrained_model_name_or_path == '': + msgbox('Source model information is missing') + return + + if train_data_dir == '': + msgbox('Image folder path is missing') + return + + if not os.path.exists(train_data_dir): + msgbox('Image folder does not exist') + return + + if reg_data_dir != '': + if not os.path.exists(reg_data_dir): + msgbox('Regularisation folder does not exist') + return + + if output_dir == '': + msgbox('Output folder path is missing') + return + + if token_string == '': + msgbox('Token string is missing') + return + + if init_word == '': + msgbox('Init word is missing') + return + + if not os.path.exists(output_dir): + os.makedirs(output_dir) + + # Get a list of all subfolders in train_data_dir + subfolders = [ + f + for f in os.listdir(train_data_dir) + if os.path.isdir(os.path.join(train_data_dir, f)) + ] + + total_steps = 0 + + # Loop through each subfolder and extract the number of repeats + for folder in subfolders: + # Extract the number of repeats from the folder name + repeats = int(folder.split('_')[0]) + + # Count the number of images in the folder + num_images = len( + [ + f + for f in os.listdir(os.path.join(train_data_dir, folder)) + if f.endswith('.jpg') + or f.endswith('.jpeg') + or f.endswith('.png') + or f.endswith('.webp') + ] + ) + + # Calculate the total number of steps for this folder + steps = repeats * num_images + total_steps += steps + + # Print the result + print(f'Folder {folder}: {steps} steps') + + # Print the result + # print(f"{total_steps} total steps") + + if reg_data_dir == '': + reg_factor = 1 + else: + print( + 'Regularisation images are used... Will double the number of steps required...' 
+ ) + reg_factor = 2 + + # calculate max_train_steps + if max_train_steps == '': + max_train_steps = int( + math.ceil( + float(total_steps) + / int(train_batch_size) + * int(epoch) + * int(reg_factor) + ) + ) + else: + max_train_steps = int(max_train_steps) + + print(f'max_train_steps = {max_train_steps}') + + # calculate stop encoder training + if stop_text_encoder_training_pct == None: + stop_text_encoder_training = 0 + else: + stop_text_encoder_training = math.ceil( + float(max_train_steps) / 100 * int(stop_text_encoder_training_pct) + ) + print(f'stop_text_encoder_training = {stop_text_encoder_training}') + + lr_warmup_steps = round(float(int(lr_warmup) * int(max_train_steps) / 100)) + print(f'lr_warmup_steps = {lr_warmup_steps}') + + run_cmd = f'accelerate launch --num_cpu_threads_per_process={num_cpu_threads_per_process} "train_textual_inversion.py"' + if v2: + run_cmd += ' --v2' + if v_parameterization: + run_cmd += ' --v_parameterization' + if enable_bucket: + run_cmd += ' --enable_bucket' + if no_token_padding: + run_cmd += ' --no_token_padding' + run_cmd += ( + f' --pretrained_model_name_or_path="{pretrained_model_name_or_path}"' + ) + run_cmd += f' --train_data_dir="{train_data_dir}"' + if len(reg_data_dir): + run_cmd += f' --reg_data_dir="{reg_data_dir}"' + run_cmd += f' --resolution={max_resolution}' + run_cmd += f' --output_dir="{output_dir}"' + run_cmd += f' --logging_dir="{logging_dir}"' + if not stop_text_encoder_training == 0: + run_cmd += ( + f' --stop_text_encoder_training={stop_text_encoder_training}' + ) + if not save_model_as == 'same as source model': + run_cmd += f' --save_model_as={save_model_as}' + # if not resume == '': + # run_cmd += f' --resume={resume}' + if not float(prior_loss_weight) == 1.0: + run_cmd += f' --prior_loss_weight={prior_loss_weight}' + if not vae == '': + run_cmd += f' --vae="{vae}"' + if not output_name == '': + run_cmd += f' --output_name="{output_name}"' + if int(max_token_length) > 75: + run_cmd += f' --max_token_length={max_token_length}' + if not max_train_epochs == '': + run_cmd += f' --max_train_epochs="{max_train_epochs}"' + if not max_data_loader_n_workers == '': + run_cmd += ( + f' --max_data_loader_n_workers="{max_data_loader_n_workers}"' + ) + if int(gradient_accumulation_steps) > 1: + run_cmd += f' --gradient_accumulation_steps={int(gradient_accumulation_steps)}' + + run_cmd += run_cmd_training( + learning_rate=learning_rate, + lr_scheduler=lr_scheduler, + lr_warmup_steps=lr_warmup_steps, + train_batch_size=train_batch_size, + max_train_steps=max_train_steps, + save_every_n_epochs=save_every_n_epochs, + mixed_precision=mixed_precision, + save_precision=save_precision, + seed=seed, + caption_extension=caption_extension, + cache_latents=cache_latents, + ) + + run_cmd += run_cmd_advanced_training( + max_train_epochs=max_train_epochs, + max_data_loader_n_workers=max_data_loader_n_workers, + max_token_length=max_token_length, + resume=resume, + save_state=save_state, + mem_eff_attn=mem_eff_attn, + clip_skip=clip_skip, + flip_aug=flip_aug, + color_aug=color_aug, + shuffle_caption=shuffle_caption, + gradient_checkpointing=gradient_checkpointing, + full_fp16=full_fp16, + xformers=xformers, + use_8bit_adam=use_8bit_adam, + ) + run_cmd += f' --token_string={token_string}' + run_cmd += f' --init_word={init_word}' + run_cmd += f' --num_vectors_per_token={num_vectors_per_token}' + if not weights == '': + run_cmd += f' --weights="{weights}"' + if template == 'object template': + run_cmd += f' --use_object_template' + elif template == 'style 
template': + run_cmd += f' --use_style_template' + + print(run_cmd) + # Run the command + subprocess.run(run_cmd) + + # check if output_dir/last is a folder... therefore it is a diffuser model + last_dir = pathlib.Path(f'{output_dir}/{output_name}') + + if not last_dir.is_dir(): + # Copy inference model for v2 if required + save_inference_file(output_dir, v2, v_parameterization, output_name) + + +def UI(username, password): + css = '' + + if os.path.exists('./style.css'): + with open(os.path.join('./style.css'), 'r', encoding='utf8') as file: + print('Load CSS...') + css += file.read() + '\n' + + interface = gr.Blocks(css=css) + + with interface: + with gr.Tab('Dreambooth TI'): + ( + train_data_dir_input, + reg_data_dir_input, + output_dir_input, + logging_dir_input, + ) = ti_tab() + with gr.Tab('Utilities'): + utilities_tab( + train_data_dir_input=train_data_dir_input, + reg_data_dir_input=reg_data_dir_input, + output_dir_input=output_dir_input, + logging_dir_input=logging_dir_input, + enable_copy_info_button=True, + ) + + # Show the interface + if not username == '': + interface.launch(auth=(username, password)) + else: + interface.launch() + + +def ti_tab( + train_data_dir=gr.Textbox(), + reg_data_dir=gr.Textbox(), + output_dir=gr.Textbox(), + logging_dir=gr.Textbox(), +): + dummy_db_true = gr.Label(value=True, visible=False) + dummy_db_false = gr.Label(value=False, visible=False) + gr.Markdown('Train a TI using kohya textual inversion python code...') + ( + button_open_config, + button_save_config, + button_save_as_config, + config_file_name, + ) = gradio_config() + + ( + pretrained_model_name_or_path, + v2, + v_parameterization, + save_model_as, + model_list, + ) = gradio_source_model() + + with gr.Tab('Folders'): + with gr.Row(): + train_data_dir = gr.Textbox( + label='Image folder', + placeholder='Folder where the training folders containing the images are located', + ) + train_data_dir_input_folder = gr.Button( + '📂', elem_id='open_folder_small' + ) + train_data_dir_input_folder.click( + get_folder_path, outputs=train_data_dir + ) + reg_data_dir = gr.Textbox( + label='Regularisation folder', + placeholder='(Optional) Folder where where the regularization folders containing the images are located', + ) + reg_data_dir_input_folder = gr.Button( + '📂', elem_id='open_folder_small' + ) + reg_data_dir_input_folder.click( + get_folder_path, outputs=reg_data_dir + ) + with gr.Row(): + output_dir = gr.Textbox( + label='Model output folder', + placeholder='Folder to output trained model', + ) + output_dir_input_folder = gr.Button( + '📂', elem_id='open_folder_small' + ) + output_dir_input_folder.click(get_folder_path, outputs=output_dir) + logging_dir = gr.Textbox( + label='Logging folder', + placeholder='Optional: enable logging and output TensorBoard log to this folder', + ) + logging_dir_input_folder = gr.Button( + '📂', elem_id='open_folder_small' + ) + logging_dir_input_folder.click( + get_folder_path, outputs=logging_dir + ) + with gr.Row(): + output_name = gr.Textbox( + label='Model output name', + placeholder='Name of the model to output', + value='last', + interactive=True, + ) + train_data_dir.change( + remove_doublequote, + inputs=[train_data_dir], + outputs=[train_data_dir], + ) + reg_data_dir.change( + remove_doublequote, + inputs=[reg_data_dir], + outputs=[reg_data_dir], + ) + output_dir.change( + remove_doublequote, + inputs=[output_dir], + outputs=[output_dir], + ) + logging_dir.change( + remove_doublequote, + inputs=[logging_dir], + outputs=[logging_dir], + ) + with 
gr.Tab('Training parameters'): + with gr.Row(): + weights = gr.Textbox( + label='Resume TI training', + placeholder='(Optional) Path to existing TI embeding file to keep training', + ) + weights_file_input = gr.Button( + '📂', elem_id='open_folder_small' + ) + weights_file_input.click(get_file_path, outputs=weights) + with gr.Row(): + token_string = gr.Textbox( + label='Token string', + placeholder='eg: cat', + ) + init_word = gr.Textbox( + label='Init word', + value='*', + ) + num_vectors_per_token = gr.Slider( + minimum=1, + maximum=75, + value=1, + step=1, + label='Vectors', + ) + max_train_steps = gr.Textbox( + label='Max train steps', + placeholder='(Optional) Maximum number of steps', + ) + template = gr.Dropdown( + label='Template', + choices=[ + 'caption', + 'object template', + 'style template', + ], + value='caption', + ) + ( + learning_rate, + lr_scheduler, + lr_warmup, + train_batch_size, + epoch, + save_every_n_epochs, + mixed_precision, + save_precision, + num_cpu_threads_per_process, + seed, + caption_extension, + cache_latents, + ) = gradio_training( + learning_rate_value='1e-5', + lr_scheduler_value='cosine', + lr_warmup_value='10', + ) + with gr.Row(): + max_resolution = gr.Textbox( + label='Max resolution', + value='512,512', + placeholder='512,512', + ) + stop_text_encoder_training = gr.Slider( + minimum=0, + maximum=100, + value=0, + step=1, + label='Stop text encoder training', + ) + enable_bucket = gr.Checkbox(label='Enable buckets', value=True) + with gr.Accordion('Advanced Configuration', open=False): + with gr.Row(): + no_token_padding = gr.Checkbox( + label='No token padding', value=False + ) + gradient_accumulation_steps = gr.Number( + label='Gradient accumulate steps', value='1' + ) + with gr.Row(): + prior_loss_weight = gr.Number( + label='Prior loss weight', value=1.0 + ) + vae = gr.Textbox( + label='VAE', + placeholder='(Optiona) path to checkpoint of vae to replace for training', + ) + vae_button = gr.Button('📂', elem_id='open_folder_small') + vae_button.click(get_any_file_path, outputs=vae) + ( + use_8bit_adam, + xformers, + full_fp16, + gradient_checkpointing, + shuffle_caption, + color_aug, + flip_aug, + clip_skip, + mem_eff_attn, + save_state, + resume, + max_token_length, + max_train_epochs, + max_data_loader_n_workers, + ) = gradio_advanced_training() + color_aug.change( + color_aug_changed, + inputs=[color_aug], + outputs=[cache_latents], + ) + with gr.Tab('Tools'): + gr.Markdown( + 'This section provide Dreambooth tools to help setup your dataset...' 
+ ) + gradio_dreambooth_folder_creation_tab( + train_data_dir_input=train_data_dir, + reg_data_dir_input=reg_data_dir, + output_dir_input=output_dir, + logging_dir_input=logging_dir, + ) + + button_run = gr.Button('Train TI') + + settings_list = [ + pretrained_model_name_or_path, + v2, + v_parameterization, + logging_dir, + train_data_dir, + reg_data_dir, + output_dir, + max_resolution, + learning_rate, + lr_scheduler, + lr_warmup, + train_batch_size, + epoch, + save_every_n_epochs, + mixed_precision, + save_precision, + seed, + num_cpu_threads_per_process, + cache_latents, + caption_extension, + enable_bucket, + gradient_checkpointing, + full_fp16, + no_token_padding, + stop_text_encoder_training, + use_8bit_adam, + xformers, + save_model_as, + shuffle_caption, + save_state, + resume, + prior_loss_weight, + color_aug, + flip_aug, + clip_skip, + vae, + output_name, + max_token_length, + max_train_epochs, + max_data_loader_n_workers, + mem_eff_attn, + gradient_accumulation_steps, + model_list, + token_string, init_word, num_vectors_per_token, max_train_steps, weights, template, + ] + + button_open_config.click( + open_configuration, + inputs=[config_file_name] + settings_list, + outputs=[config_file_name] + settings_list, + ) + + button_save_config.click( + save_configuration, + inputs=[dummy_db_false, config_file_name] + settings_list, + outputs=[config_file_name], + ) + + button_save_as_config.click( + save_configuration, + inputs=[dummy_db_true, config_file_name] + settings_list, + outputs=[config_file_name], + ) + + button_run.click( + train_model, + inputs=settings_list, + ) + + return ( + train_data_dir, + reg_data_dir, + output_dir, + logging_dir, + ) + + +if __name__ == '__main__': + # torch.cuda.set_per_process_memory_fraction(0.48) + parser = argparse.ArgumentParser() + parser.add_argument( + '--username', type=str, default='', help='Username for authentication' + ) + parser.add_argument( + '--password', type=str, default='', help='Password for authentication' + ) + + args = parser.parse_args() + + UI(username=args.username, password=args.password) diff --git a/train_db_README-ja.md b/train_db_README-ja.md index 53ee715..85ae35a 100644 --- a/train_db_README-ja.md +++ b/train_db_README-ja.md @@ -72,7 +72,7 @@ identifierとclassを使い、たとえば「shs dog」などでモデルを学 ※LoRA等の追加ネットワークを学習する場合のコマンドは ``train_db.py`` ではなく ``train_network.py`` となります。また追加でnetwork_\*オプションが必要となりますので、LoRAのガイドを参照してください。 ``` -accelerate launch --num_cpu_threads_per_process 8 train_db.py +accelerate launch --num_cpu_threads_per_process 1 train_db.py --pretrained_model_name_or_path=<.ckptまたは.safetensordまたはDiffusers版モデルのディレクトリ> --train_data_dir=<学習用データのディレクトリ> --reg_data_dir=<正則化画像のディレクトリ> @@ -89,7 +89,7 @@ accelerate launch --num_cpu_threads_per_process 8 train_db.py --gradient_checkpointing ``` -num_cpu_threads_per_processにはCPUコア数を指定するとよいようです。 +num_cpu_threads_per_processには通常は1を指定するとよいようです。 pretrained_model_name_or_pathに追加学習を行う元となるモデルを指定します。Stable Diffusionのcheckpointファイル(.ckptまたは.safetensors)、Diffusersのローカルディスクにあるモデルディレクトリ、DiffusersのモデルID("stabilityai/stable-diffusion-2"など)が指定できます。学習後のモデルの保存形式はデフォルトでは元のモデルと同じになります(save_model_asオプションで変更できます)。 @@ -159,7 +159,7 @@ v2.xモデルでWebUIで画像生成する場合、モデルの仕様が記述 ![image](https://user-images.githubusercontent.com/52813779/210776915-061d79c3-6582-42c2-8884-8b91d2f07313.png) -各yamlファイルは[https://github.com/Stability-AI/stablediffusion/tree/main/configs/stable-diffusion](Stability AIのSD2.0のリポジトリ)にあります。 +各yamlファイルは[Stability 
AIのSD2.0のリポジトリ](https://github.com/Stability-AI/stablediffusion/tree/main/configs/stable-diffusion)にあります。 # その他の学習オプション diff --git a/train_network.py b/train_network.py index d60ae9a..8a8acc7 100644 --- a/train_network.py +++ b/train_network.py @@ -212,6 +212,8 @@ def train(args): # epoch数を計算する num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps) num_train_epochs = math.ceil(args.max_train_steps / num_update_steps_per_epoch) + if (args.save_n_epoch_ratio is not None) and (args.save_n_epoch_ratio > 0): + args.save_every_n_epochs = math.floor(num_train_epochs / args.save_n_epoch_ratio) or 1 # 学習する total_batch_size = args.train_batch_size * accelerator.num_processes * args.gradient_accumulation_steps @@ -264,6 +266,7 @@ def train(args): "ss_keep_tokens": args.keep_tokens, "ss_dataset_dirs": json.dumps(train_dataset.dataset_dirs_info), "ss_reg_dataset_dirs": json.dumps(train_dataset.reg_dataset_dirs_info), + "ss_bucket_info": json.dumps(train_dataset.bucket_info), "ss_training_comment": args.training_comment # will not be updated after training } @@ -437,8 +440,8 @@ if __name__ == '__main__': train_util.add_training_arguments(parser, True) parser.add_argument("--no_metadata", action='store_true', help="do not save metadata in output model / メタデータを出力先モデルに保存しない") - parser.add_argument("--save_model_as", type=str, default="pt", choices=[None, "ckpt", "pt", "safetensors"], - help="format to save the model (default is .pt) / モデル保存時の形式(デフォルトはpt)") + parser.add_argument("--save_model_as", type=str, default="safetensors", choices=[None, "ckpt", "pt", "safetensors"], + help="format to save the model (default is .safetensors) / モデル保存時の形式(デフォルトはsafetensors)") parser.add_argument("--unet_lr", type=float, default=None, help="learning rate for U-Net / U-Netの学習率") parser.add_argument("--text_encoder_lr", type=float, default=None, help="learning rate for Text Encoder / Text Encoderの学習率") diff --git a/train_network_README-ja.md b/train_network_README-ja.md index 8e329e9..e67d8cd 100644 --- a/train_network_README-ja.md +++ b/train_network_README-ja.md @@ -24,7 +24,7 @@ DreamBoothの手法(identifier(sksなど)とclass、オプションで正 [DreamBoothのガイド](./train_db_README-ja.md) を参照してデータを用意してください。 -学習するとき、train_db.pyの代わりにtrain_network.pyを指定してください。 +学習するとき、train_db.pyの代わりにtrain_network.pyを指定してください。そして「LoRAの学習のためのオプション」にあるようにLoRA関連のオプション(``network_dim``や``network_alpha``など)を追加してください。 ほぼすべてのオプション(Stable Diffusionのモデル保存関係を除く)が使えますが、stop_text_encoder_trainingはサポートしていません。 @@ -32,7 +32,7 @@ DreamBoothの手法(identifier(sksなど)とclass、オプションで正 [fine-tuningのガイド](./fine_tune_README_ja.md) を参照し、各手順を実行してください。 -学習するとき、fine_tune.pyの代わりにtrain_network.pyを指定してください。ほぼすべてのオプション(モデル保存関係を除く)がそのまま使えます。 +学習するとき、fine_tune.pyの代わりにtrain_network.pyを指定してください。ほぼすべてのオプション(モデル保存関係を除く)がそのまま使えます。そして「LoRAの学習のためのオプション」にあるようにLoRA関連のオプション(``network_dim``や``network_alpha``など)を追加してください。 なお「latentsの事前取得」は行わなくても動作します。VAEから学習時(またはキャッシュ時)にlatentを取得するため学習速度は遅くなりますが、代わりにcolor_augが使えるようになります。 @@ -45,7 +45,7 @@ train_network.pyでは--network_moduleオプションに、学習対象のモジ 以下はコマンドラインの例です(DreamBooth手法)。 ``` -accelerate launch --num_cpu_threads_per_process 12 train_network.py +accelerate launch --num_cpu_threads_per_process 1 train_network.py --pretrained_model_name_or_path=..\models\model.ckpt --train_data_dir=..\data\db\char1 --output_dir=..\lora_train1 --reg_data_dir=..\data\db\reg1 --prior_loss_weight=1.0 @@ -60,7 +60,9 @@ accelerate launch --num_cpu_threads_per_process 12 train_network.py その他、以下のオプションが指定できます。 * --network_dim - * 
LoRAの次元数を指定します(``--networkdim=4``など)。省略時は4になります。数が多いほど表現力は増しますが、学習に必要なメモリ、時間は増えます。また闇雲に増やしても良くないようです。 + * LoRAのRANKを指定します(``--networkdim=4``など)。省略時は4になります。数が多いほど表現力は増しますが、学習に必要なメモリ、時間は増えます。また闇雲に増やしても良くないようです。 +* --network_alpha + * アンダーフローを防ぎ安定して学習するための ``alpha`` 値を指定します。デフォルトは1です。``network_dim``と同じ値を指定すると以前のバージョンと同じ動作になります。 * --network_weights * 学習前に学習済みのLoRAの重みを読み込み、そこから追加で学習します。 * --network_train_unet_only @@ -126,7 +128,7 @@ python networks\merge_lora.py --ratiosにそれぞれのモデルの比率(どのくらい重みを元モデルに反映するか)を0~1.0の数値で指定します。二つのモデルを一対一でマージす場合は、「0.5 0.5」になります。「1.0 1.0」では合計の重みが大きくなりすぎて、恐らく結果はあまり望ましくないものになると思われます。 -v1で学習したLoRAとv2で学習したLoRA、次元数の異なるLoRAはマージできません。U-NetだけのLoRAとU-Net+Text EncoderのLoRAはマージできるはずですが、結果は未知数です。 +v1で学習したLoRAとv2で学習したLoRA、rank(次元数)や``alpha``の異なるLoRAはマージできません。U-NetだけのLoRAとU-Net+Text EncoderのLoRAはマージできるはずですが、結果は未知数です。 ### その他のオプション diff --git a/train_textual_inversion.py b/train_textual_inversion.py new file mode 100644 index 0000000..35b4ede --- /dev/null +++ b/train_textual_inversion.py @@ -0,0 +1,498 @@ +import importlib +import argparse +import gc +import math +import os + +from tqdm import tqdm +import torch +from accelerate.utils import set_seed +import diffusers +from diffusers import DDPMScheduler + +import library.train_util as train_util +from library.train_util import DreamBoothDataset, FineTuningDataset + +imagenet_templates_small = [ + "a photo of a {}", + "a rendering of a {}", + "a cropped photo of the {}", + "the photo of a {}", + "a photo of a clean {}", + "a photo of a dirty {}", + "a dark photo of the {}", + "a photo of my {}", + "a photo of the cool {}", + "a close-up photo of a {}", + "a bright photo of the {}", + "a cropped photo of a {}", + "a photo of the {}", + "a good photo of the {}", + "a photo of one {}", + "a close-up photo of the {}", + "a rendition of the {}", + "a photo of the clean {}", + "a rendition of a {}", + "a photo of a nice {}", + "a good photo of a {}", + "a photo of the nice {}", + "a photo of the small {}", + "a photo of the weird {}", + "a photo of the large {}", + "a photo of a cool {}", + "a photo of a small {}", +] + +imagenet_style_templates_small = [ + "a painting in the style of {}", + "a rendering in the style of {}", + "a cropped painting in the style of {}", + "the painting in the style of {}", + "a clean painting in the style of {}", + "a dirty painting in the style of {}", + "a dark painting in the style of {}", + "a picture in the style of {}", + "a cool painting in the style of {}", + "a close-up painting in the style of {}", + "a bright painting in the style of {}", + "a cropped painting in the style of {}", + "a good painting in the style of {}", + "a close-up painting in the style of {}", + "a rendition in the style of {}", + "a nice painting in the style of {}", + "a small painting in the style of {}", + "a weird painting in the style of {}", + "a large painting in the style of {}", +] + + +def collate_fn(examples): + return examples[0] + + +def train(args): + if args.output_name is None: + args.output_name = args.token_string + use_template = args.use_object_template or args.use_style_template + + train_util.verify_training_args(args) + train_util.prepare_dataset_args(args, True) + + cache_latents = args.cache_latents + use_dreambooth_method = args.in_json is None + + if args.seed is not None: + set_seed(args.seed) + + tokenizer = train_util.load_tokenizer(args) + + # acceleratorを準備する + print("prepare accelerator") + accelerator, unwrap_model = train_util.prepare_accelerator(args) + + # mixed precisionに対応した型を用意しておき適宜castする 
+ weight_dtype, save_dtype = train_util.prepare_dtype(args) + + # モデルを読み込む + text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype) + + # Convert the init_word to token_id + if args.init_word is not None: + init_token_id = tokenizer.encode(args.init_word, add_special_tokens=False) + assert len( + init_token_id) == 1, f"init word {args.init_word} is not converted to single token / 初期化単語が二つ以上のトークンに変換されます。別の単語を使ってください" + init_token_id = init_token_id[0] + else: + init_token_id = None + + # add new word to tokenizer, count is num_vectors_per_token + token_strings = [args.token_string] + [f"{args.token_string}{i+1}" for i in range(args.num_vectors_per_token - 1)] + num_added_tokens = tokenizer.add_tokens(token_strings) + assert num_added_tokens == args.num_vectors_per_token, f"tokenizer has same word to token string. please use another one / 指定したargs.token_stringは既に存在します。別の単語を使ってください: {args.token_string}" + + token_ids = tokenizer.convert_tokens_to_ids(token_strings) + print(f"tokens are added: {token_ids}") + assert min(token_ids) == token_ids[0] and token_ids[-1] == token_ids[0] + len(token_ids) - 1, f"token ids is not ordered" + assert len(tokenizer) - 1 == token_ids[-1], f"token ids is not end of tokenize: {len(tokenizer)}" + + # Resize the token embeddings as we are adding new special tokens to the tokenizer + text_encoder.resize_token_embeddings(len(tokenizer)) + + # Initialise the newly added placeholder token with the embeddings of the initializer token + token_embeds = text_encoder.get_input_embeddings().weight.data + if init_token_id is not None: + for token_id in token_ids: + token_embeds[token_id] = token_embeds[init_token_id] + # print(token_id, token_embeds[token_id].mean(), token_embeds[token_id].min()) + + # load weights + if args.weights is not None: + embeddings = load_weights(args.weights) + assert len(token_ids) == len( + embeddings), f"num_vectors_per_token is mismatch for weights / 指定した重みとnum_vectors_per_tokenの値が異なります: {len(embeddings)}" + # print(token_ids, embeddings.size()) + for token_id, embedding in zip(token_ids, embeddings): + token_embeds[token_id] = embedding + # print(token_id, token_embeds[token_id].mean(), token_embeds[token_id].min()) + print(f"weighs loaded") + + print(f"create embeddings for {args.num_vectors_per_token} tokens, for {args.token_string}") + + # データセットを準備する + if use_dreambooth_method: + print("Use DreamBooth method.") + train_dataset = DreamBoothDataset(args.train_batch_size, args.train_data_dir, args.reg_data_dir, + tokenizer, args.max_token_length, args.caption_extension, args.shuffle_caption, args.keep_tokens, + args.resolution, args.enable_bucket, args.min_bucket_reso, args.max_bucket_reso, args.prior_loss_weight, + args.flip_aug, args.color_aug, args.face_crop_aug_range, args.random_crop, args.debug_dataset) + else: + print("Train with captions.") + train_dataset = FineTuningDataset(args.in_json, args.train_batch_size, args.train_data_dir, + tokenizer, args.max_token_length, args.shuffle_caption, args.keep_tokens, + args.resolution, args.enable_bucket, args.min_bucket_reso, args.max_bucket_reso, + args.flip_aug, args.color_aug, args.face_crop_aug_range, args.random_crop, + args.dataset_repeats, args.debug_dataset) + + # make captions: tokenstring tokenstring1 tokenstring2 ...tokenstringn という文字列に書き換える超乱暴な実装 + if use_template: + print("use template for training captions. 
is object: {args.use_object_template}") + templates = imagenet_templates_small if args.use_object_template else imagenet_style_templates_small + replace_to = " ".join(token_strings) + captions = [] + for tmpl in templates: + captions.append(tmpl.format(replace_to)) + train_dataset.add_replacement("", captions) + elif args.num_vectors_per_token > 1: + replace_to = " ".join(token_strings) + train_dataset.add_replacement(args.token_string, replace_to) + + train_dataset.make_buckets() + + if args.debug_dataset: + train_util.debug_dataset(train_dataset, show_input_ids=True) + return + if len(train_dataset) == 0: + print("No data found. Please verify arguments / 画像がありません。引数指定を確認してください") + return + + # モデルに xformers とか memory efficient attention を組み込む + train_util.replace_unet_modules(unet, args.mem_eff_attn, args.xformers) + + # 学習を準備する + if cache_latents: + vae.to(accelerator.device, dtype=weight_dtype) + vae.requires_grad_(False) + vae.eval() + with torch.no_grad(): + train_dataset.cache_latents(vae) + vae.to("cpu") + if torch.cuda.is_available(): + torch.cuda.empty_cache() + gc.collect() + + if args.gradient_checkpointing: + unet.enable_gradient_checkpointing() + text_encoder.gradient_checkpointing_enable() + + # 学習に必要なクラスを準備する + print("prepare optimizer, data loader etc.") + + # 8-bit Adamを使う + if args.use_8bit_adam: + try: + import bitsandbytes as bnb + except ImportError: + raise ImportError("No bitsand bytes / bitsandbytesがインストールされていないようです") + print("use 8-bit Adam optimizer") + optimizer_class = bnb.optim.AdamW8bit + else: + optimizer_class = torch.optim.AdamW + + trainable_params = text_encoder.get_input_embeddings().parameters() + + # betaやweight decayはdiffusers DreamBoothもDreamBooth SDもデフォルト値のようなのでオプションはとりあえず省略 + optimizer = optimizer_class(trainable_params, lr=args.learning_rate) + + # dataloaderを準備する + # DataLoaderのプロセス数:0はメインプロセスになる + n_workers = min(args.max_data_loader_n_workers, os.cpu_count() - 1) # cpu_count-1 ただし最大で指定された数まで + train_dataloader = torch.utils.data.DataLoader( + train_dataset, batch_size=1, shuffle=False, collate_fn=collate_fn, num_workers=n_workers) + + # 学習ステップ数を計算する + if args.max_train_epochs is not None: + args.max_train_steps = args.max_train_epochs * len(train_dataloader) + print(f"override steps. 
steps for {args.max_train_epochs} epochs is / 指定エポックまでのステップ数: {args.max_train_steps}") + + # lr schedulerを用意する + lr_scheduler = diffusers.optimization.get_scheduler( + args.lr_scheduler, optimizer, num_warmup_steps=args.lr_warmup_steps, num_training_steps=args.max_train_steps * args.gradient_accumulation_steps) + + # acceleratorがなんかよろしくやってくれるらしい + text_encoder, optimizer, train_dataloader, lr_scheduler = accelerator.prepare( + text_encoder, optimizer, train_dataloader, lr_scheduler) + + index_no_updates = torch.arange(len(tokenizer)) < token_ids[0] + print(len(index_no_updates), torch.sum(index_no_updates)) + orig_embeds_params = unwrap_model(text_encoder).get_input_embeddings().weight.data.detach().clone() + + # Freeze all parameters except for the token embeddings in text encoder + text_encoder.requires_grad_(True) + text_encoder.text_model.encoder.requires_grad_(False) + text_encoder.text_model.final_layer_norm.requires_grad_(False) + text_encoder.text_model.embeddings.position_embedding.requires_grad_(False) + # text_encoder.text_model.embeddings.token_embedding.requires_grad_(True) + + unet.requires_grad_(False) + unet.to(accelerator.device, dtype=weight_dtype) + if args.gradient_checkpointing: # according to TI example in Diffusers, train is required + unet.train() + else: + unet.eval() + + if not cache_latents: + vae.requires_grad_(False) + vae.eval() + vae.to(accelerator.device, dtype=weight_dtype) + + # 実験的機能:勾配も含めたfp16学習を行う PyTorchにパッチを当ててfp16でのgrad scaleを有効にする + if args.full_fp16: + train_util.patch_accelerator_for_fp16_training(accelerator) + text_encoder.to(weight_dtype) + + # resumeする + if args.resume is not None: + print(f"resume training from state: {args.resume}") + accelerator.load_state(args.resume) + + # epoch数を計算する + num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps) + num_train_epochs = math.ceil(args.max_train_steps / num_update_steps_per_epoch) + if (args.save_n_epoch_ratio is not None) and (args.save_n_epoch_ratio > 0): + args.save_every_n_epochs = math.floor(num_train_epochs / args.save_n_epoch_ratio) or 1 + + # 学習する + total_batch_size = args.train_batch_size * accelerator.num_processes * args.gradient_accumulation_steps + print("running training / 学習開始") + print(f" num train images * repeats / 学習画像の数×繰り返し回数: {train_dataset.num_train_images}") + print(f" num reg images / 正則化画像の数: {train_dataset.num_reg_images}") + print(f" num batches per epoch / 1epochのバッチ数: {len(train_dataloader)}") + print(f" num epochs / epoch数: {num_train_epochs}") + print(f" batch size per device / バッチサイズ: {args.train_batch_size}") + print(f" total train batch size (with parallel & distributed & accumulation) / 総バッチサイズ(並列学習、勾配合計含む): {total_batch_size}") + print(f" gradient ccumulation steps / 勾配を合計するステップ数 = {args.gradient_accumulation_steps}") + print(f" total optimization steps / 学習ステップ数: {args.max_train_steps}") + + progress_bar = tqdm(range(args.max_train_steps), smoothing=0, disable=not accelerator.is_local_main_process, desc="steps") + global_step = 0 + + noise_scheduler = DDPMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", + num_train_timesteps=1000, clip_sample=False) + + if accelerator.is_main_process: + accelerator.init_trackers("textual_inversion") + + for epoch in range(num_train_epochs): + print(f"epoch {epoch+1}/{num_train_epochs}") + + text_encoder.train() + + loss_total = 0 + bef_epo_embs = unwrap_model(text_encoder).get_input_embeddings().weight[token_ids].data.detach().clone() + for step, batch in 
enumerate(train_dataloader): + with accelerator.accumulate(text_encoder): + with torch.no_grad(): + if "latents" in batch and batch["latents"] is not None: + latents = batch["latents"].to(accelerator.device) + else: + # latentに変換 + latents = vae.encode(batch["images"].to(dtype=weight_dtype)).latent_dist.sample() + latents = latents * 0.18215 + b_size = latents.shape[0] + + # Get the text embedding for conditioning + input_ids = batch["input_ids"].to(accelerator.device) + encoder_hidden_states = train_util.get_hidden_states(args, input_ids, tokenizer, text_encoder, torch.float) # weight_dtype) use float instead of fp16/bf16 because text encoder is float + + # Sample noise that we'll add to the latents + noise = torch.randn_like(latents, device=latents.device) + + # Sample a random timestep for each image + timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps, (b_size,), device=latents.device) + timesteps = timesteps.long() + + # Add noise to the latents according to the noise magnitude at each timestep + # (this is the forward diffusion process) + noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps) + + # Predict the noise residual + noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample + + if args.v_parameterization: + # v-parameterization training + target = noise_scheduler.get_velocity(latents, noise, timesteps) + else: + target = noise + + loss = torch.nn.functional.mse_loss(noise_pred.float(), target.float(), reduction="none") + loss = loss.mean([1, 2, 3]) + + loss_weights = batch["loss_weights"] # 各sampleごとのweight + loss = loss * loss_weights + + loss = loss.mean() # 平均なのでbatch_sizeで割る必要なし + + accelerator.backward(loss) + if accelerator.sync_gradients: + params_to_clip = text_encoder.get_input_embeddings().parameters() + accelerator.clip_grad_norm_(params_to_clip, 1.0) # args.max_grad_norm) + + optimizer.step() + lr_scheduler.step() + optimizer.zero_grad(set_to_none=True) + + # Let's make sure we don't update any embedding weights besides the newly added token + with torch.no_grad(): + unwrap_model(text_encoder).get_input_embeddings().weight[index_no_updates] = orig_embeds_params[index_no_updates] + + # Checks if the accelerator has performed an optimization step behind the scenes + if accelerator.sync_gradients: + progress_bar.update(1) + global_step += 1 + + current_loss = loss.detach().item() + if args.logging_dir is not None: + logs = {"loss": current_loss, "lr": lr_scheduler.get_last_lr()[0]} + accelerator.log(logs, step=global_step) + + loss_total += current_loss + avr_loss = loss_total / (step+1) + logs = {"loss": avr_loss} # , "lr": lr_scheduler.get_last_lr()[0]} + progress_bar.set_postfix(**logs) + + if global_step >= args.max_train_steps: + break + + if args.logging_dir is not None: + logs = {"loss/epoch": loss_total / len(train_dataloader)} + accelerator.log(logs, step=epoch+1) + + accelerator.wait_for_everyone() + + updated_embs = unwrap_model(text_encoder).get_input_embeddings().weight[token_ids].data.detach().clone() + d = updated_embs - bef_epo_embs + print(bef_epo_embs.size(), updated_embs.size(), d.mean(), d.min()) + + if args.save_every_n_epochs is not None: + model_name = train_util.DEFAULT_EPOCH_NAME if args.output_name is None else args.output_name + + def save_func(): + ckpt_name = train_util.EPOCH_FILE_NAME.format(model_name, epoch + 1) + '.' 
+ args.save_model_as + ckpt_file = os.path.join(args.output_dir, ckpt_name) + print(f"saving checkpoint: {ckpt_file}") + save_weights(ckpt_file, updated_embs, save_dtype) + + def remove_old_func(old_epoch_no): + old_ckpt_name = train_util.EPOCH_FILE_NAME.format(model_name, old_epoch_no) + '.' + args.save_model_as + old_ckpt_file = os.path.join(args.output_dir, old_ckpt_name) + if os.path.exists(old_ckpt_file): + print(f"removing old checkpoint: {old_ckpt_file}") + os.remove(old_ckpt_file) + + saving = train_util.save_on_epoch_end(args, save_func, remove_old_func, epoch + 1, num_train_epochs) + if saving and args.save_state: + train_util.save_state_on_epoch_end(args, accelerator, model_name, epoch + 1) + + # end of epoch + + is_main_process = accelerator.is_main_process + if is_main_process: + text_encoder = unwrap_model(text_encoder) + + accelerator.end_training() + + if args.save_state: + train_util.save_state_on_train_end(args, accelerator) + + updated_embs = text_encoder.get_input_embeddings().weight[token_ids].data.detach().clone() + + del accelerator # この後メモリを使うのでこれは消す + + if is_main_process: + os.makedirs(args.output_dir, exist_ok=True) + + model_name = train_util.DEFAULT_LAST_OUTPUT_NAME if args.output_name is None else args.output_name + ckpt_name = model_name + '.' + args.save_model_as + ckpt_file = os.path.join(args.output_dir, ckpt_name) + + print(f"save trained model to {ckpt_file}") + save_weights(ckpt_file, updated_embs, save_dtype) + print("model saved.") + + +def save_weights(file, updated_embs, save_dtype): + state_dict = {"emb_params": updated_embs} + + if save_dtype is not None: + for key in list(state_dict.keys()): + v = state_dict[key] + v = v.detach().clone().to("cpu").to(save_dtype) + state_dict[key] = v + + if os.path.splitext(file)[1] == '.safetensors': + from safetensors.torch import save_file + save_file(state_dict, file) + else: + torch.save(state_dict, file) # can be loaded in Web UI + + +def load_weights(file): + if os.path.splitext(file)[1] == '.safetensors': + from safetensors.torch import load_file + data = load_file(file) + else: + # compatible to Web UI's file format + data = torch.load(file, map_location='cpu') + if type(data) != dict: + raise ValueError(f"weight file is not dict / 重みファイルがdict形式ではありません: {file}") + + if 'string_to_param' in data: # textual inversion embeddings + data = data['string_to_param'] + if hasattr(data, '_parameters'): # support old PyTorch? 
+ data = getattr(data, '_parameters') + + emb = next(iter(data.values())) + if type(emb) != torch.Tensor: + raise ValueError(f"weight file does not contains Tensor / 重みファイルのデータがTensorではありません: {file}") + + if len(emb.size()) == 1: + emb = emb.unsqueeze(0) + + return emb + + +if __name__ == '__main__': + parser = argparse.ArgumentParser() + + train_util.add_sd_models_arguments(parser) + train_util.add_dataset_arguments(parser, True, True) + train_util.add_training_arguments(parser, True) + + parser.add_argument("--save_model_as", type=str, default="pt", choices=[None, "ckpt", "pt", "safetensors"], + help="format to save the model (default is .pt) / モデル保存時の形式(デフォルトはpt)") + + parser.add_argument("--weights", type=str, default=None, + help="embedding weights to initialize / 学習するネットワークの初期重み") + parser.add_argument("--num_vectors_per_token", type=int, default=1, + help='number of vectors per token / トークンに割り当てるembeddingsの要素数') + parser.add_argument("--token_string", type=str, default=None, + help="token string used in training, must not exist in tokenizer / 学習時に使用されるトークン文字列、tokenizerに存在しない文字であること") + parser.add_argument("--init_word", type=str, default=None, + help="word to initialize vector / ベクトルを初期化に使用する単語、tokenizerで一語になること") + parser.add_argument("--use_object_template", action='store_true', + help="ignore caption and use default templates for object / キャプションは使わずデフォルトの物体用テンプレートで学習する") + parser.add_argument("--use_style_template", action='store_true', + help="ignore caption and use default templates for stype / キャプションは使わずデフォルトのスタイル用テンプレートで学習する") + + args = parser.parse_args() + train(args) diff --git a/train_ti_README-ja.md b/train_ti_README-ja.md new file mode 100644 index 0000000..90989ec --- /dev/null +++ b/train_ti_README-ja.md @@ -0,0 +1,63 @@ +## Textual Inversionの学習について + +[Textual Inversion](https://textual-inversion.github.io/)です。実装に当たっては https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion を大いに参考にしました。 + +学習したモデルはWeb UIでもそのまま使えます。 + +なお恐らくSD2.xにも対応していますが現時点では未テストです。 + +## 学習方法 + +``train_textual_inversion.py`` を用います。 + +データの準備については ``train_network.py`` と全く同じですので、[そちらのドキュメント](./train_network_README-ja.md)を参照してください。 + +## オプション + +以下はコマンドラインの例です(DreamBooth手法)。 + +``` +accelerate launch --num_cpu_threads_per_process 1 train_textual_inversion.py + --pretrained_model_name_or_path=..\models\model.ckpt + --train_data_dir=..\data\db\char1 --output_dir=..\ti_train1 + --resolution=448,640 --train_batch_size=1 --learning_rate=1e-4 + --max_train_steps=400 --use_8bit_adam --xformers --mixed_precision=fp16 + --save_every_n_epochs=1 --save_model_as=safetensors --clip_skip=2 --seed=42 --color_aug + --token_string=mychar4 --init_word=cute --num_vectors_per_token=4 +``` + +``--token_string`` に学習時のトークン文字列を指定します。__学習時のプロンプトは、この文字列を含むようにしてください(token_stringがmychar4なら、``mychar4 1girl`` など)__。プロンプトのこの文字列の部分が、Textual Inversionの新しいtokenに置換されて学習されます。 + +プロンプトにトークン文字列が含まれているかどうかは、``--debug_dataset`` で置換後のtoken idが表示されますので、以下のように ``49408`` 以降のtokenが存在するかどうかで確認できます。 + +``` +input ids: tensor([[49406, 49408, 49409, 49410, 49411, 49412, 49413, 49414, 49415, 49407, + 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, + 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, + 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, + 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, + 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, + 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, + 
49407, 49407, 49407, 49407, 49407, 49407, 49407]])
+```
+
+tokenizerがすでに持っている単語(一般的な単語)は使用できません。
+
+``--init_word`` にembeddingsを初期化するときのコピー元トークンの文字列を指定します。学ばせたい概念が近いものを選ぶとよいようです。二つ以上のトークンになる文字列は指定できません。
+
+``--num_vectors_per_token`` にいくつのトークンをこの学習で使うかを指定します。多いほうが表現力が増しますが、その分多くのトークンを消費します。たとえばnum_vectors_per_token=8の場合、指定したトークン文字列は(一般的なプロンプトの77トークン制限のうち)8トークンを消費します。
+
+
+その他、以下のオプションが指定できます。
+
+* --weights
+  * 学習前に学習済みのembeddingsを読み込み、そこから追加で学習します。
+* --use_object_template
+  * キャプションではなく既定の物体用テンプレート文字列(``a photo of a {}``など)で学習します。公式実装と同じになります。キャプションは無視されます。
+* --use_style_template
+  * キャプションではなく既定のスタイル用テンプレート文字列で学習します(``a painting in the style of {}``など)。公式実装と同じになります。キャプションは無視されます。
+
+## 当リポジトリ内の画像生成スクリプトで生成する
+
+gen_img_diffusers.pyに、``--textual_inversion_embeddings`` オプションで学習したembeddingsファイルを指定してください(複数可)。プロンプトでembeddingsファイルのファイル名(拡張子を除く)を使うと、そのembeddingsが適用されます。
+
diff --git a/train_ti_README.md b/train_ti_README.md
new file mode 100644
index 0000000..ba03d55
--- /dev/null
+++ b/train_ti_README.md
@@ -0,0 +1,62 @@
+## About Textual Inversion training
+
+This is an implementation of [Textual Inversion](https://textual-inversion.github.io/). The implementation draws heavily on https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion.
+
+The trained embeddings can be used as is in the Web UI.
+
+It is probably also compatible with SD2.x, but this has not been tested yet.
+
+## Training method
+
+Use ``train_textual_inversion.py``.
+
+Data preparation is exactly the same as for ``train_network.py``, so please refer to [that documentation](./train_network_README-en.md).
+
+## Options
+
+Below is an example command line (DreamBooth method).
+
+```
+accelerate launch --num_cpu_threads_per_process 1 train_textual_inversion.py
+    --pretrained_model_name_or_path=..\models\model.ckpt
+    --train_data_dir=..\data\db\char1 --output_dir=..\ti_train1
+    --resolution=448,640 --train_batch_size=1 --learning_rate=1e-4
+    --max_train_steps=400 --use_8bit_adam --xformers --mixed_precision=fp16
+    --save_every_n_epochs=1 --save_model_as=safetensors --clip_skip=2 --seed=42 --color_aug
+    --token_string=mychar4 --init_word=cute --num_vectors_per_token=4
+```
+
+``--token_string`` specifies the token string used during training. __Your training prompts must contain this string (e.g. ``mychar4 1girl`` if token_string is mychar4)__. This part of the prompt is replaced with the new Textual Inversion tokens and trained.
+
+Passing ``--debug_dataset`` displays the token ids after substitution, so you can confirm that the prompt contains tokens with ids ``49408`` and above, as shown below.
+
+```
+input ids: tensor([[49406, 49408, 49409, 49410, 49411, 49412, 49413, 49414, 49415, 49407,
+         49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
+         49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
+         49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
+         49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
+         49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
+         49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
+         49407, 49407, 49407, 49407, 49407, 49407, 49407]])
+```
+
+Words that already exist in the tokenizer (common words) cannot be used as the token string.
+
+``--init_word`` specifies the token string whose embedding is copied to initialize the new embeddings. Choosing a word whose concept is close to what you want the model to learn seems to work well. A string that tokenizes into two or more tokens cannot be used.
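+
+If you are unsure whether a candidate ``--token_string`` or ``--init_word`` meets these constraints, you can check the tokenization up front. The following is a minimal sketch, assuming the standard CLIP tokenizer used by SD1.x models (``openai/clip-vit-large-patch14``) and the ``transformers`` library.
+
+```
+from transformers import CLIPTokenizer
+
+# Assumption: SD1.x models use the CLIP ViT-L/14 tokenizer (BOS=49406, EOS=49407)
+tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
+
+def num_tokens(text: str) -> int:
+    # input_ids include BOS and EOS, so subtract 2 to count only the text's own tokens
+    return len(tokenizer(text).input_ids) - 2
+
+print(num_tokens("cute"))       # expect 1  -> usable as --init_word
+print(num_tokens("cute girl"))  # expect 2  -> not usable as --init_word (two or more tokens)
+print(num_tokens("mychar4"))    # expect >1 -> not an existing single token, so usable as --token_string
+```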
+
+``--num_vectors_per_token`` specifies how many tokens the embedding uses. More vectors give the embedding more expressive power, but they consume more tokens. For example, with num_vectors_per_token=8, the specified token string occupies 8 of the 77 tokens available in a typical prompt.
+
+
+The following options are also available.
+
+* --weights
+  * Load previously trained embeddings before training and continue training from them.
+* --use_object_template
+  * Train with the default object template strings (such as ``a photo of a {}``) instead of captions, as in the official implementation. Captions are ignored.
+* --use_style_template
+  * Train with the default style template strings (such as ``a painting in the style of {}``) instead of captions, as in the official implementation. Captions are ignored.
+
+## Generating with the image generation script in this repository
+
+Pass the trained embeddings file to gen_img_diffusers.py with the ``--textual_inversion_embeddings`` option (multiple files can be specified). Using the filename of an embeddings file (without the extension) in the prompt applies that embedding.
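+
+As a quick sanity check after training, you can also inspect the saved embeddings file directly. The following is a minimal sketch assuming a hypothetical output path and the safetensors format; the ``emb_params`` key is the one written by ``train_textual_inversion.py``.
+
+```
+from safetensors.torch import load_file
+
+# Hypothetical path: adjust to your --output_dir, --output_name and --save_model_as settings
+emb = load_file("ti_train1/my_embedding.safetensors")["emb_params"]
+
+# For SD1.x the embedding dimension is 768, so training with --num_vectors_per_token=4
+# should give a shape of [4, 768]
+print(emb.shape, emb.dtype)
+```
+
+The same ``emb_params`` tensor is read back when you continue training from a previous result with ``--weights``.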