This repository contains Stable Diffusion models trained from scratch and will be continuously updated with new checkpoints. The following list provides an overview of all currently available models.

- New stable diffusion finetune (Stable unCLIP 2.1, Hugging Face) at 768x768 resolution, based on SD2.1-768. This model allows for image variations and mixing operations as described in *Hierarchical Text-Conditional Image Generation with CLIP Latents* and, thanks to its modularity, can be combined with other models such as KARLO. It comes in two variants, Stable unCLIP-L and Stable unCLIP-H, which are conditioned on CLIP ViT-L and ViT-H image embeddings, respectively. Instructions are available here. A public demo of SD-unCLIP is already available at /stable-diffusion-reimagine.
- New stable diffusion model (Stable Diffusion 2.1-v, Hugging Face) at 768x768 resolution and (Stable Diffusion 2.1-base, Hugging Face) at 512x512 resolution, both based on the same number of parameters and architecture as 2.0 and fine-tuned on 2.0 with a less restrictive NSFW filtering of the LAION-5B dataset. By default, the attention operation of the model is evaluated at full precision when xformers is not installed. To enable fp16 (which can cause numerical instabilities with the vanilla attention module on the v2.1 model), prefix the usual invocation of your script with `ATTN_PRECISION=fp16 python`.
- New stable diffusion model (Stable Diffusion 2.0-v) at 768x768 resolution. Same number of parameters in the U-Net as 1.5, but uses OpenCLIP-ViT/H as the text encoder and is trained from scratch. SD 2.0-v is a so-called v-prediction model.
- The above model is finetuned from SD 2.0-base, which was trained as a standard noise-prediction model on 512x512 images and is also made available.
- Added a x4 upscaling latent text-guided diffusion model.
- New depth-guided stable diffusion model, finetuned from SD 2.0-base. The model is conditioned on monocular depth estimates inferred via MiDaS and can be used for structure-preserving img2img and shape-conditional synthesis.
- A text-guided inpainting model, finetuned from SD 2.0-base.

We follow the original repository and provide basic inference scripts to sample from the models.

The original Stable Diffusion model was created in a collaboration with CompVis and RunwayML and builds upon the work *High-Resolution Image Synthesis with Latent Diffusion Models*. Stable Diffusion is a latent text-to-image diffusion model.

You can update an existing latent diffusion environment for use with these checkpoints. Upon successful installation of xformers, the code will automatically default to memory efficient attention for the self- and cross-attention layers in the U-Net and autoencoder.
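To see what the memory-efficient path computes, here is a minimal sketch of calling xformers' fused attention directly; the shapes, device, and dtype below are illustrative assumptions, not values taken from this repository.

```python
# Minimal sketch of the memory-efficient attention call that the network
# falls back to when xformers is available. Shapes/dtypes are hypothetical.
import torch
import xformers.ops as xops

B, M, H, K = 2, 4096, 8, 64            # batch, tokens, heads, head dim (assumed)
device, dtype = "cuda", torch.float16  # the fused kernels target CUDA

q = torch.randn(B, M, H, K, device=device, dtype=dtype)
k = torch.randn(B, M, H, K, device=device, dtype=dtype)
v = torch.randn(B, M, H, K, device=device, dtype=dtype)

# Computes softmax(q @ k^T / sqrt(K)) @ v without materializing the full
# M x M attention matrix, which is what keeps memory usage low.
out = xops.memory_efficient_attention(q, k, v)
print(out.shape)  # torch.Size([2, 4096, 8, 64])
```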
Stable Diffusion models are general text-to-image diffusion models and therefore mirror biases and (mis-)conceptions that are present in their training data. Although efforts were made to reduce the inclusion of explicit pornographic material, we do not recommend using the provided weights for services or products without additional safety mechanisms and considerations. The weights are research artifacts and should be treated as such. Details on the training procedure and data, as well as the intended use of the model, can be found in the corresponding model card. The weights are available via the StabilityAI organization at Hugging Face under the CreativeML Open RAIL++-M License.

Stable Diffusion v2 refers to a specific configuration of the model architecture that uses a downsampling-factor 8 autoencoder with an 865M UNet and OpenCLIP ViT-H/14 text encoder for the diffusion model. The SD 2-v model produces 768x768 px outputs.

Evaluations with different classifier-free guidance scales (1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0) and 50 DDIM sampling steps show the relative improvements of the checkpoints.
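A guidance-scale sweep of this kind can also be reproduced with the diffusers library rather than this repository's own sampling scripts. The sketch below is a minimal example under assumptions: the `stabilityai/stable-diffusion-2-1` checkpoint id, a CUDA device, and hypothetical output file names.

```python
# Sketch of a classifier-free guidance sweep with 50 DDIM steps using
# diffusers. Checkpoint id, prompt, and output paths are assumptions.
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",  # assumed Hugging Face id for the 2.1-v weights
    torch_dtype=torch.float16,
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)  # sample with DDIM

prompt = "a photograph of an astronaut riding a horse"
for scale in (1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0):
    image = pipe(
        prompt,
        guidance_scale=scale,    # classifier-free guidance weight
        num_inference_steps=50,  # 50 DDIM sampling steps, as in the evaluation
        generator=torch.Generator("cuda").manual_seed(0),  # fixed seed for comparability
    ).images[0]
    image.save(f"astronaut_cfg_{scale}.png")
```

Fixing the seed across the sweep makes the relative effect of the guidance scale directly comparable between images.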