Txt2Img
This page is a basic guide to Txt2Img in the WebUI. If you want to learn how to write prompts, please read the Prompt Editing
related content.
If you are looking for application examples, see the "Practical Guide" page.
Basic parameters
cfg scale
controls how closely the output follows the prompt: the higher the value, the more literally the prompt is interpreted; the lower it is, the more room the model has to improvise. In terms of actual model behaviour, a low CFG scale (6-8) tends toward low saturation, line-drawing-like, somewhat cluttered results, while a high value (18-22) tends toward high saturation and a CG-like style.
An excessively high CFG causes color distortion; keep CFG between 5 and 15.
denoise strength
An img2img-specific parameter, ranging from 0 to 1. The higher the value, the less the AI refers to the original image (while the number of iterations actually run increases). Personally, I like low CFG with high denoise for heavy redrawing, and high CFG with low denoise for changing details.
If you have doubts about other parameters (e.g. seed
), please check the WebUI parameter descriptions on the model debugging page.
Negative prompts
At generation time, the WebUI (SD) web application will steer away from content related to the negative prompt.
Negative prompts are a way of using Stable Diffusion that lets the user specify what they do not want to see, without any additional burden or requirement on the model. It works by using the user-specified text instead of the empty string as the unconditional_conditioning
when sampling.
Here is the (simplified) code from txt2img.py.
prompts = ["a castle in a forest"]
batch_size = 1
c = model.get_learned_conditioning(prompts)
uc = model.get_learned_conditioning(batch_size * [""])
samples_ddim, _ = sampler.sample(conditioning=c, unconditional_conditioning=uc, [...])
This starts the sampler, which iterates the following process:
Denoise the image so that it looks more like your prompt (the conditioning).
Denoise the image so that it looks more like an empty prompt (the unconditional conditioning).
Look at the difference between the two and use it to produce a set of changes to the noisy image (different samplers handle this part differently), as sketched below.
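As a rough sketch of that last step, classifier-free guidance combines the two noise predictions, with the CFG scale as the weight. The variable names below are illustrative and are not the real txt2img.py variables:
# Sketch of classifier-free guidance (illustrative names, not the actual txt2img.py code)
# eps_cond   : noise predicted with the prompt conditioning (c)
# eps_uncond : noise predicted with the unconditional conditioning (uc)
# cfg_scale  : the value of the CFG Scale slider
eps = eps_uncond + cfg_scale * (eps_cond - eps_uncond)
# the sampler then uses eps to compute this step's change to the noisy image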
- An example using negative prompts
prompts = ["a castle in a forest"]
negative_prompts = ["grainy, fog"]
c = model.get_learned_conditioning(prompts)
uc = model.get_learned_conditioning(negative_prompts)
samples_ddim, _ = sampler.sample(conditioning=c, unconditional_conditioning=uc, [...])
For example, watermarks and text content can be excluded using the following negative prompt:
lowres, bad anatomy, bad hands, text, error, missing fingers,
extra digit, fewer digits, cropped, worst quality, low quality,
normal quality, jpeg artifacts, signature, watermark, username, blurry
Also see this example
ugly, fat, obese, chubby, (((deformed))), [blurry], bad anatomy,
disfigured, poorly drawn face, mutation, mutated, (extra_limb),
(ugly), (poorly drawn hands fingers), messy drawing, morbid,
mutilated, tranny, trans, trannsexual, [out of frame], (bad proportions),
(poorly drawn body), (poorly drawn legs), worst quality, low quality,
normal quality, text, censored, gown, latex, pencil
CFG Scale (prompt fit) / Denoising strength (noise reduction)
CFG Scale
cfg scale
is how much the prompt is referenced during prediction; the higher it is, the more closely the prediction fits the prompt.
Denoising strength (noise reduction)
Denoising strength
determines how much of the original image content the algorithm keeps. A low value helps preserve the original's drawing style but weakens the img2img effect; the higher the value, the less the AI references the original image (and the more iterations are actually run).
When working from an input image, a low denoising
strength means correcting the original image, while a high denoising
strength makes the result largely unrelated to the original. Roughly speaking, the threshold is about 0.7: above 0.7 the result is basically unrelated to the original image, and below 0.3 only slight changes are made.
In the actual implementation, the number of sampling steps executed is Denoising strength * Sampling Steps.
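A quick illustration of that rule (the values are just examples):
# Sketch: effective img2img steps, following the rule above
sampling_steps = 50        # the Sampling Steps slider
denoising_strength = 0.6   # the Denoising strength slider
effective_steps = int(sampling_steps * denoising_strength)
print(effective_steps)     # 30 denoising steps are actually executed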
Image Prompt
By default, the program embeds the prompt, parameters, and model metadata in the generated image. For an original, uncompressed image, we can drag the file into the PNG Info
tab to view the prompt (tokens).
Or use the Online Tools to view it.
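If you prefer a script, the same information lives in a PNG text chunk. Here is a minimal sketch using Pillow, assuming an unmodified PNG saved by the WebUI (the file name is made up):
# Sketch: read the generation parameters embedded by the WebUI (assumes an uncompressed PNG)
from PIL import Image

img = Image.open("00001-1234567890.png")                 # hypothetical file name
print(img.info.get("parameters", "no metadata found"))   # prompt, negative prompt, steps, CFG, seed, ...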
Reverse Tokens
Here are some image reverse-parameter services that can extract the relevant parameters from an image; they are not necessarily accurate.
Prompt Search Engine
You can visit the following links for some excellent examples of parameters.
Batch count & batch size
batch count
specifies how many batches of images to generate.
batch size
means the size of each batch, i.e. how many images are generated at once.
Simply put, batch size determines the number of samples processed at a time, which affects both how well the hardware is utilized and how fast generation runs.
Note also that batch size
is about finding the best balance between memory efficiency
and memory capacity
, since GPU resources are not fully utilized when generating a single 512x512 image. Larger images get less benefit from it.
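A quick sketch of how the two multiply (the values are just examples):
# Sketch: total images produced by one click of Generate
batch_count = 4   # how many batches are run one after another
batch_size = 2    # how many images are generated in parallel per batch
total_images = batch_count * batch_size
print(total_images)  # 8 images in total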
Info
Iteration means feeding results back repeatedly: we train a neural network over several iterations to reach the desired goal or result. The result of each iteration is used as the initial value for the next iteration. One iteration = one forward pass + one backward pass.
Fixing image breakdown at large sizes
Highres.fix
is a convenient option you can enable by ticking the checkbox on the txt2img
page.
By default, txt2img
produces very chaotic images at very high resolutions. This option works around that by first partially rendering your image at a lower resolution, upscaling it, and then adding detail at the higher resolution, so you are not forced into small-image compositions.
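Conceptually it is a two-pass pipeline. The helper names below are hypothetical and are not the WebUI's actual functions:
# Conceptual sketch of Highres. fix (hypothetical helper functions, not the WebUI API)
low_res = txt2img(prompt, width=512, height=512, steps=30)             # 1. compose at a safe resolution
upscaled = upscale(low_res, scale=2)                                    # 2. enlarge the result, e.g. to 1024x1024
final = img2img(prompt, init_image=upscaled, denoising_strength=0.6)    # 3. add detail at the higher resolution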
Note on size
For example, if the canvas is too wide, more people than intended may appear in the picture.
Tip
To keep the pose, the framing, and the characters free of deformation, you sometimes need to limit the number of words. With multiple characters, handle spatial relationships and prompt priority in this order: number of people -> character appearance -> environment style -> character state.
Sizes above 1024 may give undesirable results! We recommend using a small size + a moderately high step count + super-resolution upscaling of the image (see Advanced).
Steps (iteration steps)
More iteration steps may give better results, with more detail and sharpening, but also take longer to generate. In practice, the difference between 30 and 50 steps is almost indistinguishable.
Too many iteration steps can also be counterproductive, bringing little or no improvement.
When generating from an image, a weaker denoising strength normally requires fewer iteration steps (that is how the mechanism works). You can change a setting so that the program performs exactly the number of iteration steps specified by the slider.
Higher-order samplers such as DPM-Solver++
are more efficient and take fewer steps.
In actual inference, the exact number of steps executed is Denoising strength * Sampling Steps.
Samplers
Among first-order samplers, the ones that currently work well are Euler
, Euler a
(more detailed), and DDIM
; samplers labeled ++
are higher-order solvers, giving more detail in fewer steps and running faster.
Newbies are recommended to use Euler a
Euler a
is creative; different step counts can produce different pictures.
PS: raising the step count too high (> 30) will not give better results
DDIM
converges quickly, but is relatively inefficient because it needs many steps to get good results; it is well suited for redrawing (img2img).
LMS
and PLMS
are derivatives of Euler
that use a related but slightly different method (averaging over several steps to improve accuracy). Stable results can be obtained in about 30 steps.
PLMS
is effectively LMS (a classical method) adapted to better handle the singularities in the neural network structure.
DPM2
is a fancy method that aims to improve on DDIM by reducing the number of steps needed to get good results. It requires two denoising runs per step, so it is about twice as slow as DDIM. If you are experimenting with debugging prompts, this sampler does not work very well.
Euler
is the simplest, and therefore one of the fastest
The higher-order sampler is DPM-Solver++
, which you can try as DPM++ 2S a
. Experiments have shown that DPM-Solver++ can generate high-quality samples in only 15 to 20 steps for guided sampling of both pixel-space and latent-space DPMs.
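To compare samplers outside the WebUI, here is a minimal sketch using the diffusers library; this is an assumption on my part (the WebUI itself does not use this API), and the model name is just an example:
# Sketch: comparing samplers with the diffusers library (not the WebUI's internal API)
import torch
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

# Euler a: first order, creative, results change with the step count
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
image_euler_a = pipe("a castle in a forest", num_inference_steps=30).images[0]

# DPM-Solver++: higher order, good quality in 15-20 steps
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
image_dpmpp = pipe("a castle in a forest", num_inference_steps=20).images[0]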
- Different results for different steps and samplers
(Preview images: Preview 1 | Preview 2)
from https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/4363
Seed debugging
The actual seed integer is not important. It simply initializes a random number generator that defines the starting point of the diffusion; it is a random initial value.
A good seed can carry things like composition and color across various prompts, samplers, and CFG values, but its usefulness is limited.
With the same steps, CFG, seed, and parameters (prompts), you get essentially the same picture!
The same seed will produce the same image for the same model and back-end implementation, as long as all other parameters are kept the same.
Note, however, that different graphics cards may give unexpectedly different results (because of things like floating-point precision).
The 10xx series looks quite different from all other cards; see here.
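As a sketch of what "initializes a random number generator" means in practice (this uses PyTorch directly and is illustrative, not the WebUI's code):
# Sketch: the seed only fixes the starting noise; same seed + same parameters -> same starting point
import torch

seed = 1234567890
generator = torch.Generator("cpu").manual_seed(seed)
start_noise = torch.randn((1, 4, 64, 64), generator=generator)  # latent noise for a 512x512 image
# running this again with the same seed reproduces exactly the same start_noise tensor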
Merging Seeds
This technique is implemented by the UI. It is a way of merging different seeds for the same prompt.
Click the Extra
button next to the seed input to enable this feature.
It varies the initial noise to try to merge features from the different seeds.
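A rough sketch of the idea, using a simple linear blend (the WebUI's actual variation-seed implementation may differ):
# Sketch: blending the starting noise of two seeds (simple linear interpolation, illustrative only)
import torch

noise_a = torch.randn((1, 4, 64, 64), generator=torch.Generator("cpu").manual_seed(1111))
noise_b = torch.randn((1, 4, 64, 64), generator=torch.Generator("cpu").manual_seed(2222))
strength = 0.3  # like the Variation strength slider: 0 = pure seed A, 1 = pure seed B
merged_noise = (1 - strength) * noise_a + strength * noise_b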